Annu. Rev. Astron. Astrophys. 1994. 32: 371-418
Copyright © 1994. All rights reserved

Measuring redshift-independent distances to many galaxies at large
distances is the key to large-scale dynamics (review:
Jacoby *et al.* 1992). The
simplest method
assumes that a certain class of objects is a ``standard candle'', in the
sense that a distance-dependent observable is
distributed intrinsically at random with small variance about a universal
mean. The apparent luminosity of an object (∝ *r*^{-2}) or its
angular diameter (∝ *r*^{-1})
can serve as this quantity. In a pioneering
study, Rubin *et al.* (1976a, b)
used the brightness of giant Sc spirals to discover a net motion
for the shell at 35-60 h^{-1}Mpc that agrees within the errors with
more modern results, but the large uncertainties in this simple
distance indicator made this result controversial at the time.

So far, the most useful distance indicators for LSS have been of the
TF kind, based on intrinsic
relations between *two* quantities: a
distance-dependent quantity such as the flux
∝ *L / r*^{2}, and
a distance-independent quantity σ, the maximum rotation velocity
of spirals (TF) or the velocity
dispersion in ellipticals (FJ). The
intrinsic relations are power laws, *L* ∝ σ^{β}, i.e.,
*M*(η) = *a* - *b*η,
where *M* ≡ -2.5 log *L* + *const* is the absolute
magnitude and η ≡ log σ (so that *b* = 2.5β). The slope *b* can be
determined empirically in clusters, where all the galaxies are assumed to
be at the same distance, typically yielding
β ≈ 3 - 4, depending on the
luminosity band
(e.g. β_{I} ≈ 3, β_{H} ≈ 4). Then, for any other galaxy with observed η and
apparent magnitude *m* ≡ -2.5 log (*L / r*^{2}) + *const*, one can
determine a *relative* distance via 5 log *r* = *m* -
*M*(η).
There exists a fundamental freedom in determining the *zero point*,
*a*, which fixes the distances at absolute values
(in km s^{-1}, not to be confused with *H*, which translates them
to Mpc).
Changing *a*, i.e., multiplying the distances
by a factor (1 + ε)
while the redshifts are fixed, is equivalent to
adding a monopole Hubble-like component -ε*r* to **v**,
and an offset ∝ 3ε to δ
(Equation 4).
It has been arbitrarily determined in several data sets, e.g. by
assuming *u* = 0 for the Coma cluster, but *a* is better
determined by minimizing the variance of the recovered
peculiar velocity field in a large ``fair'' volume. The original
TF
technique has been improved by moving from blue to near-infrared
photometry (H band,
Aaronson *et al.* 1979) and
recently to CCD R and I
bands, where spiral galaxies are more transparent and therefore the
intrinsic scatter is reduced to σ_{m} ~ 0.33 mag,
corresponding to a relative distance error of
Δ = (ln 10/5) σ_{m} ≈ 0.15.
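
As a minimal sketch of the relations above, the following Python fragment evaluates *M*(η) = *a* - *b*η, the relative distance from 5 log *r* = *m* - *M*(η), and the relative distance error Δ = (ln 10/5) σ_{m}. The function names and the numerical calibration values (`A_ZERO`, `B_SLOPE`, and the sample galaxy) are hypothetical and chosen only for illustration; they are not a published calibration.

```python
import math


def tf_absolute_magnitude(eta, a, b):
    """Forward TF relation M(eta) = a - b * eta, with eta = log10(sigma)."""
    return a - b * eta


def tf_distance(m, eta, a, b):
    """Relative distance from 5 log r = m - M(eta).

    The units of r are set entirely by the zero point a.
    """
    mu = m - tf_absolute_magnitude(eta, a, b)
    return 10.0 ** (mu / 5.0)


def relative_distance_error(sigma_m):
    """Delta = (ln 10 / 5) * sigma_m for a magnitude scatter sigma_m."""
    return math.log(10.0) / 5.0 * sigma_m


# Hypothetical calibration and galaxy, for illustration only.
A_ZERO, B_SLOPE = 11.86, 7.5
r_before = tf_distance(12.0, 2.3, A_ZERO, B_SLOPE)
r_after = tf_distance(12.0, 2.3, A_ZERO - 0.1, B_SLOPE)

print("relative distance error:", relative_distance_error(0.33))
# Shifting the zero point rescales every distance by the same factor.
print("distance ratio after a 0.1 mag zero-point shift:", r_after / r_before)
```

The last two lines illustrate the zero-point freedom: a shift in *a* multiplies all inferred distances by a common factor, which is why it acts like a Hubble-like monopole term.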

A distance indicator of similar quality for ellipticals
has proved harder to
achieve. Minimum variance, corresponding to Δ = 0.21, was found
for a revised FJ relation
involving three physical quantities, a power law relating
*D*, *I*, and σ, with
*D* the diameter and
*I* ∝ *L / D*^{2} the surface brightness
(Dressler *et al.* 1987;
Djorgovski & Davis 1987).
The exponents were found to be approximately
-5/6 for *I* and 4/3 for σ, i.e., *D* ∝ *I*^{-5/6} σ^{4/3}. By defining from the
photometry a ``diameter'' at a fixed value of
enclosed *I*, termed *D*_{n}, the
relation returns to a simple form, the *D*_{n} - σ relation.

The physical origin of the scaling relations is not fully
understood, reflecting our limited understanding of galaxy formation.
What matters for the purpose of distance measurements is the mean
empirical relation and its variance. However, one can point at an
important physical difference between the two relations
(Gunn 1989),
which is relevant to the testing for environmental effects
(Section 6.3). The *D*_{n} - σ relation is naturally
explained by virial equilibrium.

There is some hope for reducing the error in the
TF method to the
~ 10% range by certain modifications, e.g. by
restricting attention to galaxies of normal morphology
(Raychaudhury 1994). The most
accurate estimator to
date is based on surface-brightness fluctuations (SBF) in
ellipticals (Tonry 1991), where the standard candle is the luminosity
function of bright stars in the old population. These
stars show up as distance-dependent fluctuations in sensitive
surface-brightness measurements. The technique is being applied
successfully out to ~ 30 h^{-1}Mpc (e.g.
Dressler 1994),
with the improved accuracy of ~ 8%
enabling high-resolution non-linear analysis,
and it can be of great value for LSS if applied at larger
distances. The need to remove sources of unwanted fluctuations such as
globular clusters requires high-resolution
observations, which could be achieved by HST or adaptive
optics.

The prospects for the future can be evaluated by estimating
the length scale over which LSS dynamics can be studied using a
distance indicator of relative error Δ.
The error in a velocity derived from *N* galaxies at a distance
~ *r* is σ_{V} ~ *r*Δ / √*N*.
Let the mean sampling density be n̄.
Let the desired quantity be the mean velocity *V* in
spheres of radius *R*, and assume that its true rms value is
*V*_{20} at *R* = 20 h^{-1}Mpc and ∝ *R*^{-(n+1)}
on larger scales,
with *n* the effective power index of the fluctuation spectrum
near *R*. Then the relative error in *V* is

σ_{V} / *V* ≈ 0.033
(n̄ / 0.01)^{-1/2} (Δ / 0.15) (*V*_{20} / 500)^{-1}
(*R* / 20)^{n+1/2} *r / R*,

where distances are measured in h^{-1}Mpc. The observations
indicate that *V*_{20} ~ 500 km s^{-1} and *n*
~ -0.5 for *R* = 20 - 60 h^{-1}Mpc
(Section 7.1). Thus, with ideal sampling
of n̄ ~ 0.01 (h^{-1}Mpc)^{-3},
the relative error is always only a
few percent of *r / R*. This means that LSS motions can in principle
be meaningfully studied at all distances *r* with smoothing
*R* ~ 0.1*r*, as long as *n* ~ -0.5 at the desired
*R*. Since *n* seems to be negative out
to ~ 100 h^{-1}Mpc (Section 7.1),
dense deep TF samples are
potentially useful out to several hundred megaparsecs.
However, several technical difficulties pose a serious challenge
at such distances. For example,
the calibration requires faint cluster galaxies, which are harder to
identify, aperture effects become severe, and
the spectroscopic capability is limited.
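
The scaling above can be checked numerically. The sketch below assumes that *N* is the number of galaxies inside the smoothing sphere, *N* = n̄ (4π/3) *R*^{3}, and that distances convert to velocities at 100 km s^{-1} per h^{-1}Mpc; with these assumptions the coefficient 0.033 is recovered. The function name and its default arguments are illustrative, not part of any published code.

```python
import math


def relative_velocity_error(R, r, nbar=0.01, delta=0.15, v20=500.0, n=-0.5):
    """sigma_V / V for the mean velocity in a sphere of radius R at distance r.

    R and r are in h^-1 Mpc, nbar in (h^-1 Mpc)^-3, v20 in km/s.
    Assumption: N is the galaxy count inside the sphere of radius R.
    """
    n_gal = nbar * (4.0 / 3.0) * math.pi * R ** 3
    # r * delta is the distance error of one galaxy; 100 km/s per h^-1 Mpc.
    sigma_v = 100.0 * r * delta / math.sqrt(n_gal)
    # True rms bulk velocity, scaling as R^-(n+1) away from R = 20.
    v_rms = v20 * (R / 20.0) ** (-(n + 1.0))
    return sigma_v / v_rms


print("R = 20, r = 20:", relative_velocity_error(20.0, 20.0))
print("R = 30, r = 300:", relative_velocity_error(30.0, 300.0))
```

For *n* = -0.5 the dependence on *R* alone cancels, so the result depends only on the ratio *r / R*, as stated in the text.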

The random scatter in the distance estimator is a source of severe
systematic biases in the inferred distances and peculiar velocities,
which are generally termed ``Malmquist'' biases but should
carefully be distinguished from each other (e.g.
Lynden-Bell *et al.* 1988;
Willick 1994a, b).

The calibration of the
TF relation is affected
by the *selection bias* (or *calibration bias*).
A magnitude limit in the selection of the sample used for calibration
at a fixed *true* distance (e.g. in a cluster)
tilts the ``forward''
TF regression line of
*M* on η
towards bright *M* at small η values.
The bias extends to all values of η when objects at a large
range of distances are used for the calibration.
This bias is inevitable when the dependent quantity is explicitly involved
in the selection process, and it occurs to a certain extent even in the
``inverse'' relation η(*M*) due to existing dependences of the
selection on η.
Fortunately, the selection bias can be corrected once the selection
function is known (e.g.
Willick 1991;
1994a).
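
The tilt of the forward regression can be illustrated with a small Monte Carlo experiment. This is a toy sketch, not the correction algorithm cited above: mock galaxies at a single distance follow *M* = *a* - *b*η with Gaussian scatter, a magnitude limit removes the faint ones, and an ordinary least-squares fit of *M* on η is compared with the true slope. All parameter values (`a`, `b`, `sigma_m`, the η range, the limit) are hypothetical.

```python
import random


def fit_line(x, y):
    """Ordinary least squares of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def forward_tf_slope(m_limit=None, a=-21.0, b=7.5, sigma_m=0.4,
                     n_gal=20000, seed=2):
    """Fitted slope b of M = a - b * eta for a mock cluster sample.

    Galaxies fainter than m_limit (M >= m_limit) are dropped when a limit
    is given; at a fixed distance this mimics an apparent-magnitude cut.
    """
    rng = random.Random(seed)
    etas, mags = [], []
    for _ in range(n_gal):
        eta = rng.uniform(-0.3, 0.3)  # eta measured about an arbitrary mean
        mag = a - b * eta + rng.gauss(0.0, sigma_m)
        if m_limit is None or mag < m_limit:
            etas.append(eta)
            mags.append(mag)
    _, slope = fit_line(etas, mags)
    return -slope


print("slope, complete sample:", forward_tf_slope())
print("slope, magnitude-limited sample:", forward_tf_slope(m_limit=-21.0))
```

In the limited sample only the galaxies scattered bright survive at small η, so the fitted line is shallower than the true relation.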

The TF inferred distance,
*d*, and the mean peculiar velocity at a given
*d*, suffer from an *inferred-distance bias*, which we term
hereafter ``M'' bias. I comment later
(Section 4.4) on a possible way to avoid
the M bias by performing an inverse analysis in *z*-space,
at the expense of a more complicated procedure and other biases.
Here I focus on a statistical way for correcting the M bias
within the simpler forward
TF procedure in *d*-space.
This bias can also be corrected in an inverse
TF analysis in
*d*-space, using the selection function *S (d)* which is in principle
derivable from the sample itself
(Landy & Szalay 1992).

The current POTENT procedure uses the forward
TF relation in *d*-space.
If *M* is distributed normally for a given η, with
standard deviation σ_{m}, then the
TF-inferred distance *d* of
a galaxy at a true distance *r*
is distributed log-normally about *r*, with relative error
Δ ≈ 0.46σ_{m}.
Given *d*, the expectation value of *r* is (e.g.
Willick 1991):

*E (r* | *d*) = ∫_{0}^{∞} *r P(r* | *d*) d*r* / ∫_{0}^{∞} *P(r* | *d*) d*r*

= ∫_{0}^{∞} *r*^{3} *n (r)* exp{-[ln(*r* / *d*)]^{2} / 2Δ^{2}} d*r* / ∫_{0}^{∞} *r*^{2} *n (r)* exp{-[ln(*r* / *d*)]^{2} / 2Δ^{2}} d*r*,     (10)

where *n (r)* is the number density in the underlying distribution from
which galaxies were selected (by quantities that do not explicitly depend
on *r*). The deviation of *E(r* | *d)* from *d*
reflects the bias.
The homogeneous part (HM) arises from the geometry of space:
the inferred distance *d* underestimates *r*
because it is more likely to have been scattered by errors from *r > d*
than from *r < d*, the volume element being ∝ *r*^{2}.
If *n* = *const*, Equation (10) reduces to
*E (r* | *d)* = *d* exp(3.5 Δ^{2}), in which the
inferred distances are simply multiplied by a
constant factor (about 1.08, i.e., an 8% increase, for Δ = 0.15),
equivalent to changing the zero-point of the
TF relation.
The HM bias has been regularly corrected this way since
Burstein *et al.* (1986).
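
The homogeneous limit can be verified by integrating Equation (10) numerically. The sketch below uses the substitution *x* = ln(*r / d*) and a simple trapezoidal rule; the function name, the step count, and the integration width are arbitrary choices made for illustration, and `n_of_r` is a hypothetical hook for supplying a density profile.

```python
import math


def expected_true_distance(d, delta, n_of_r=lambda r: 1.0,
                           steps=4000, width=8.0):
    """Evaluate E(r | d) of Equation (10) by trapezoidal integration.

    Integrates over x = ln(r / d) from -width*delta to +width*delta.
    """
    h = 2.0 * width * delta / steps
    num = 0.0
    den = 0.0
    for i in range(steps + 1):
        x = -width * delta + i * h
        r = d * math.exp(x)
        weight = 0.5 if i in (0, steps) else 1.0
        gauss = weight * n_of_r(r) * math.exp(-0.5 * (x / delta) ** 2)
        # dr = r dx, so r^3 n dr becomes r^4 n dx and r^2 n dr becomes r^3 n dx.
        num += gauss * r ** 4
        den += gauss * r ** 3
    return num / den


d = 3000.0  # inferred distance in km/s, an arbitrary example value
numeric = expected_true_distance(d, 0.15)
analytic = d * math.exp(3.5 * 0.15 ** 2)
print("numerical E(r|d):", numeric)
print("d * exp(3.5 Delta^2):", analytic)
```

The two printed values should agree closely for constant *n*; passing a different `n_of_r` shows how density gradients shift the correction.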

Fluctuations in *n (r)* are responsible for the inhomogeneous bias
(IM), which is worse because it systematically enhances the inferred
density perturbations and the value of Ω inferred from them. If
*n (r)* is varying slowly with *r*, and if Δ << 1, then Equation (10)
reduces to *E (r* | *d*) = *d* [1 + 3.5Δ^{2} + Δ^{2} (d ln *n* / d ln *r*)_{r=d}],
showing the dependence on Δ and the gradients of *n (r)*. To illustrate, consider
a lump of galaxies at one point *r* with *u* = 0.
Their inferred distances are randomly scattered to
the foreground and background of *r*. With all galaxies having
the same *z*, the inferred *u* on either side of *r*
mimic a spurious infall towards *r*, which is interpreted dynamically
as a spurious overdensity at *r*.
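
The lump illustration can be reproduced with a few lines of Python. In this toy sketch all galaxies sit at one true distance with *u* = 0, their inferred distances are scattered log-normally with relative error Δ, and the inferred radial peculiar velocity is *u* = *z* - *d* (both in km s^{-1}). The function name and parameter values are illustrative assumptions.

```python
import math
import random


def lump_inferred_velocities(r_true=3000.0, delta=0.15, n_gal=20000, seed=1):
    """Mean inferred u in front of and behind a lump of galaxies with u = 0.

    Returns (mean u for d < r_true, mean u for d >= r_true).
    """
    rng = random.Random(seed)
    z = r_true  # with u = 0 the redshift equals the true distance
    foreground, background = [], []
    for _ in range(n_gal):
        d = r_true * math.exp(rng.gauss(0.0, delta))  # log-normal scatter
        u = z - d  # inferred radial peculiar velocity
        if d < r_true:
            foreground.append(u)
        else:
            background.append(u)
    return (sum(foreground) / len(foreground),
            sum(background) / len(background))


u_front, u_back = lump_inferred_velocities()
print("mean inferred u in front of the lump:", u_front)
print("mean inferred u behind the lump:", u_back)
```

Galaxies placed in the foreground appear to move away from the observer and those placed in the background appear to move toward the observer, so both sides seem to fall toward the lump.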

In the current data for POTENT analysis
(Section 4.2) the IM bias is corrected
in two steps. First, the galaxies are heavily grouped in *z*-space
(Willick *et al.* 1994), reducing the distance error of each group
of *N* members to Δ / √*N* and thus
significantly weakening the bias. Then, the noisy inferred distance of each
object, *d*, is replaced by *E (r* | *d*) (Equation 10),
with an assumed *n (r)* properly corrected for
grouping. This procedure has been tested using realistic mock data from
*N*-body simulations (Kolatt *et al.*, in prep.), showing that
IM bias can be reduced to a few percent. The practical uncertainty
is in *n (r)*, which can be approximated by the high-resolution
density field of IRAS or optical galaxies
(Section 5), or by the recovered
mass-density itself in an iterative procedure under some assumption about
how galaxies trace mass. The second-step correction to δ
recovered by POTENT is < 20% even at the highest peaks
(Dekel *et al.* 1994).

Several samples of galaxies with
TF or *D*_{n} - σ measurements have
accumulated in the last decade. Assuming that all galaxies trace the
same underlying velocity field
(Section 6.3), the analysis of large-scale
motions greatly benefits from merging the different samples into one
self-consistent catalog. The observers differ in their selection
procedure, the quantities they measure, the method of measurement and
the TF calibration techniques,
which cause systematic errors and make the
merger non-trivial. The original merged set, compiled by D. Burstein
(Mark II) and used
in the first application of POTENT
(Bertschinger

As carried out by this group, merger of catalogs
involves the following major steps: *(a)* Standardizing the
selection criteria, e.g. rejecting galaxies of high inclination or low
η, which are
suspected of large errors, and sharpening any *z*
cutoff. *(b)* Rederiving a provisional
TF calibration for each data set
using Willick's algorithm (1994a) which simultaneously groups, fits and
corrects for selection bias,
and then verifying that
inverse-TF distances to clusters
are similar to the
forward-TF distances.
*(c)* Starting with one data set,
adding each new set in succession using the galaxies in common to adjust
the TF parameters of the new set
if necessary.
*(d)* Using only one measurement per galaxy even if it was observed by
more than one observer to ensure
well defined errors, and using multiple observations for a ``cluster''
only if the overlap is small (e.g. < 50%).
*(e)* Adding the ellipticals from Mark II, allowing for a slight
zero-point shift (Section 6.3). Such a
careful calibration and
merger procedure is *crucial* for reliable
results - in several cases it produced
TF distances substantially
different from those quoted by the original authors.
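
Steps *(c)* and *(d)* can be caricatured in code. The sketch below is a deliberately simplified stand-in, not the procedure used by the group: it assumes each catalog is a mapping from a galaxy name to a distance modulus, estimates a single zero-point shift as the mean difference over galaxies in common, and keeps only one measurement per galaxy. The function names and the toy data are hypothetical.

```python
def zero_point_offset(mu_ref, mu_new):
    """Mean distance-modulus difference (new - reference) over common galaxies."""
    common = mu_ref.keys() & mu_new.keys()
    if not common:
        raise ValueError("no galaxies in common; cannot tie the zero points")
    return sum(mu_new[name] - mu_ref[name] for name in common) / len(common)


def merge_catalogs(mu_ref, mu_new):
    """Add a new catalog to the reference set after removing its zero-point shift.

    Galaxies already in the reference set keep their reference measurement,
    so each galaxy enters the merged catalog only once.
    """
    shift = zero_point_offset(mu_ref, mu_new)
    merged = dict(mu_ref)
    for name, mu in mu_new.items():
        if name not in merged:
            merged[name] = mu - shift
    return merged, shift


# Toy data for illustration only.
reference = {"g1": 33.0, "g2": 34.0}
new_set = {"g2": 34.2, "g3": 35.2}
merged, shift = merge_catalogs(reference, new_set)
print("zero-point shift:", shift)
print("merged catalog:", merged)
```

A realistic merger would also refit the slope and scatter of the new set and propagate the errors; the single additive shift here only shows how the overlap ties the two zero points together.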