In the parallel processing literature, the term embarrassingly parallel (EP) refers to applications whose computation may be split into chunks that can be processed independently of each other. For instance, finding the sum of the elements of a vector has the EP property, whereas complex sorting algorithms do not. The above-described method has a big advantage in that it can, in essence, convert a non-EP statistical problem into an EP one, as the computations of the $\tilde{\theta}_i$ above can be done independently of each other; hence the term SA: converting non-EP problems to (statistically equivalent) EP ones.
R code implementing SA is available in [13].
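As a concrete illustration of the chunk-and-average scheme, here is a minimal R sketch; it is not the code from [13], and the function name ca() and the estimator argument estfun are illustrative choices:

ca <- function(data, r, estfun) {
  # split the row indices 1..n into r roughly equal chunks
  n <- nrow(data)
  chunks <- split(seq_len(n), rep(seq_len(r), length.out = n))
  # the per-chunk estimates (the theta-tilde_i) are independent of one
  # another, so this lapply() could be replaced by a parallel map such
  # as parallel::mclapply()
  tildes <- lapply(chunks, function(idx) estfun(data[idx, , drop = FALSE]))
  thetatilde <- do.call(rbind, tildes)  # r x p matrix of chunk estimates
  colMeans(thetatilde)                  # theta-bar, the averaged estimator
}

# example: chunked-and-averaged linear-regression coefficients
# ca(mtcars, 4, function(d) coef(lm(mpg ~ wt, data = d)))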
2.4.2 Validity of the SA Method
The speedup in computation would of course be useless if $\bar{\theta}$ were less statistically accurate than $\hat{\theta}$. Fortunately, that is not the case. In [13], it was shown that the SA method works well for any asymptotically normal estimator, in the sense that SA achieves the same statistical accuracy as that estimator:

Fix r while letting n and k go to $\infty$. If $\hat{\theta}$ is asymptotically multivariate normally distributed with asymptotic covariance matrix $\Sigma$, then the same holds for $\bar{\theta}$.
In other words, $\bar{\theta}$ has the same statistical accuracy as that of $\hat{\theta}$, and thus our speedup in computation does not come at the expense of statistical accuracy. This result is intuitively clear, and is proven using characteristic functions in [13].
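As a toy empirical check of this (not one of the experiments reported in [13]), one can compare $\hat{\theta}$ and $\bar{\theta}$ for a simple nonlinear, asymptotically normal estimator, here the exponential-rate MLE $1/\bar{X}$, using the hypothetical ca() sketched above:

set.seed(1)
x <- matrix(rexp(1e6, rate = 2), ncol = 1)
ratemle <- function(d) 1 / mean(d[, 1])  # MLE of the exponential rate
thetahat <- ratemle(x)                   # theta-hat: full-data estimate
thetabar <- ca(x, r = 10, ratemle)       # theta-bar: chunked and averaged
abs(thetabar - thetahat) / thetahat      # proportional difference; tiny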
It is clear that the SA method also works if p is growing, as long as one has asymptotically normal estimators. For this reason, a rough criterion for usability of the method might be taken to be $p < \sqrt{n}$, the Portnoy criterion discussed above; a more conservative criterion would be to apply the above to the groups, that is, require $p < \sqrt{n/r}$.∗
Concerning the asymptotic nature of the SA method, note that it is used only in situations in which n is large enough for SA to bring a computational speedup, settings in which n should be large enough for the asymptotics to have taken hold. SA was applied to a variety of estimators in [13], and excellent agreement between $\bar{\theta}$ and $\hat{\theta}$ was found, with the proportional difference always being under 0.001 and in most cases much smaller.
Note that another reason for restricting use of the method to Big-n is bias, which may be
more prominent in smaller samples. See Section 2.5.3 for a discussion of this in the principal
components analysis (PCA) setting.
Since the data groups are independent, it is straightforward to calculate an estimated covariance matrix for $\bar{\theta}$; we simply take the sample covariance of the r quantities $\tilde{\theta}_i$, divided by r. The R code available in [13] does this.
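A minimal sketch of that calculation (the code in [13] performs it for you; covthetabar() is an illustrative name), taking as input an r x p matrix whose row i holds $\tilde{\theta}_i$, as produced inside the ca() sketch above:

covthetabar <- function(thetatilde) {
  # thetatilde: r x p matrix, row i holding the chunk estimate theta-tilde_i
  r <- nrow(thetatilde)
  # theta-bar averages r independent estimates, so its covariance is
  # estimated by the sample covariance of those estimates divided by r
  cov(thetatilde) / r
}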
2.4.3 Complexity Analysis
Suppose the work associated with computing $\hat{\theta}$ is $O(n^c)$. This would be the case, for instance, in the computation of any U-statistic, in which one must apply some function to all possible pairs $(X_i, X_j)$ of observations in our data, so that we must do an $O(n^2)$ amount of computation.
If the r chunks are handled simultaneously on a parallel machine, say a multicore one, SA reduces the time complexity of an $O(n^c)$ problem to roughly $O(n^c/r^c)$.
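For instance, for a hypothetical pairwise U-statistic computation ($c = 2$) on $n = 10^6$ observations split into $r = 10$ chunks,

\[
O(n^c) = O\!\left(10^{12}\right)
\quad\longrightarrow\quad
O\!\left((n/r)^c\right) = O\!\left(10^{10}\right),
\]

a reduction in parallel running time by a factor of $r^c = 100$.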
Thus, statistical applications with computational time complexity greater than or equal to $O(n)$ will be major beneficiaries of SA. Examples of such applications investigated in [13] include quantile regression, estimation of hazard functions with censored data, and so on. For c > 1, SA brings a superlinear speedup, meaning a speedup that is faster than linear in r; the latter would only reduce the time to $O(n^c/r)$. Another noteworthy point is that
∗ One sometimes hears that Tukey recommended that one use not more than $\sqrt{n}$ predictor variables in regression analyses, but the author has not found a reference for this.