Large Scale Structure Observations

4. GALAXY SURVEY BASICS

In this section we explore some of the practical ideas and methods used to analyse a galaxy survey and make cosmological measurements. The clustering signal is a 3-dimensional signal, so ideally we wish to know the distribution of matter (or galaxies) in 3D. However, while the angular positions of galaxies are in general easy to measure, the radial distances are not so easy. In order to estimate distances to galaxies, large surveys typically use the spectrum of light emitted by galaxies, including absorption and emission lines and more general features, all resulting from the integrated stellar light. By comparing these features to rest-frame models, the redshifts of the galaxies can be estimated and, using the Hubble expansion rate, their distances.

It is also possible to fit observed broad-band colours with templates or training samples of emitted light profiles, and thus estimate galaxy recession velocities. These "photometric redshifts" - redshifts estimated from broad-band colours only - vary in quality between galaxy samples. For example, they can have offsets from the true redshifts with standard deviations of σ_z ~ 0.03(1 + z) for red galaxies with strong 4000 Å breaks (Ross et al. 2011), while more general populations can give estimates with σ_z ~ 0.05(1 + z). The level of precision depends on the photometric quality of the data used, the number of bands, etc. If spectra can be obtained for galaxies, then absorption and emission lines can be used to estimate redshifts with typical errors of σ_z ~ 0.0001 to 0.001(1 + z), allowing clustering along the line-of-sight to be more accurately measured.
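To give a feel for what these redshift errors mean in terms of distance, the following minimal sketch converts a redshift into a comoving distance and a redshift error into a radial distance error. It assumes a flat ΛCDM model with illustrative parameters (H_0 = 70 km/s/Mpc, Ω_m = 0.3) that are not taken from the text, and the function names are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def hubble_rate(z, h0=70.0, omega_m=0.3):
    """H(z) in km/s/Mpc for an ASSUMED flat LCDM model (radiation neglected)."""
    return h0 * math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))


def comoving_distance(z, h0=70.0, omega_m=0.3, steps=1000):
    """Line-of-sight comoving distance in Mpc: c * integral_0^z dz' / H(z')."""
    dz = z / steps
    # Midpoint rule over `steps` equal redshift intervals
    total = sum(1.0 / hubble_rate((i + 0.5) * dz, h0, omega_m) for i in range(steps))
    return C_KM_S * total * dz


def radial_error(z, sigma_z, h0=70.0, omega_m=0.3):
    """Comoving radial error implied by a redshift error: c * sigma_z / H(z)."""
    return C_KM_S * sigma_z / hubble_rate(z, h0, omega_m)


z = 0.5
for label, frac in [("photometric (red galaxies)", 0.03), ("spectroscopic", 0.001)]:
    sigma_z = frac * (1.0 + z)
    print(f"{label}: r = {comoving_distance(z):.0f} Mpc, "
          f"sigma_r = {radial_error(z, sigma_z):.1f} Mpc")
```

Under these assumptions the radial error scales linearly with σ_z, so a photometric error of 0.03(1 + z) smears the line-of-sight position thirty times more than a spectroscopic error of 0.001(1 + z).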

4.1. Overview of galaxy surveys

Recent advances in galaxy surveys have largely been driven by new instrumentation that enables multiple galaxy spectra to be obtained simultaneously. Using these multi-object spectrographs (MOS), a number of survey teams have created maps of many hundreds of thousands or millions of galaxies. Key wide-field facilities include the AAOmega instrument on the Anglo-Australian Telescope, which has been used to conduct the 2-degree Field Galaxy Redshift Survey (2dFGRS; Colless et al. 2003) and WiggleZ (Drinkwater et al. 2010) surveys, and the Sloan Telescope, which has been used for the Sloan Digital Sky Survey (SDSS; York et al. 2000).

In the following sections we will use results from the Baryon Oscillation Spectroscopic Survey (BOSS; Dawson et al. 2013), part of the SDSS-III project (Eisenstein et al. 2011), to demonstrate possible measurements. BOSS is primarily a spectroscopic survey, designed to obtain spectra, and derive redshifts from them, for 1.2 million galaxies over an extragalactic footprint covering 10000 square degrees. In each observation, 1000 spectra are recovered simultaneously, using aluminium plates with drilled holes to locate manually plugged optical fibres that feed a pair of double spectrographs. Each observation is performed in a series of 900-second exposures, integrating until a minimum signal-to-noise ratio is achieved for the faint galaxy targets. The resulting data set has a high redshift completeness of > 97 per cent over the full survey footprint.

4.2. Measuring over-densities

As described in Section 1.1, we need to translate from an observed galaxy density to a dimensionless over-density. For a galaxy survey, this means understanding where you could have observed galaxies (called the survey mask), not just where galaxies were observed. For current surveys, the mask can be split into independent radial and angular components, where the angular component of the mask includes the varying completeness and purity of observations and the radial component depends on the galaxy selection criteria. For a magnitude-limited catalogue, the radial distribution can be determined by integrating under a model luminosity function (Cole 2011), while for a colour-selected sample, a fit to the galaxy distribution is usually performed (Anderson et al. 2012). The mask is commonly quantified by means of a random catalogue - a catalogue of spatial positions that Poisson sample the expected density \bar{\rho} (see Eq. 1), as outlined by the mask. This catalogue has the same spatial selection function as the galaxies but no clustering.
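The role of the random catalogue can be illustrated with a minimal one-dimensional sketch. The selection function, cell size and catalogue sizes below are hypothetical choices made purely for illustration. The randoms are drawn by rejection sampling from the mask, and the over-density in each cell is then estimated as the ratio of galaxy counts to rescaled random counts, minus one.

```python
import random


def make_randoms(n_points, selection, length, rng):
    """Rejection-sample positions in [0, length) with probability proportional
    to selection(x), a HYPOTHETICAL completeness function with values in [0, 1]."""
    points = []
    while len(points) < n_points:
        x = rng.uniform(0.0, length)
        if rng.random() < selection(x):
            points.append(x)
    return points


def counts_in_cells(points, length, n_cells):
    """Number of points falling in each of n_cells equal-width cells."""
    counts = [0] * n_cells
    for x in points:
        counts[min(int(x / length * n_cells), n_cells - 1)] += 1
    return counts


def overdensity(galaxies, randoms, length, n_cells):
    """delta = n_g / (n_r / alpha) - 1 per cell, with alpha the ratio of random
    to galaxy numbers; None marks cells where the mask contains no randoms."""
    alpha = len(randoms) / len(galaxies)
    n_g = counts_in_cells(galaxies, length, n_cells)
    n_r = counts_in_cells(randoms, length, n_cells)
    return [alpha * g / r - 1.0 if r > 0 else None for g, r in zip(n_g, n_r)]


rng = random.Random(42)
selection = lambda x: 1.0 if x < 50.0 else 0.5  # hypothetical mask: half depth for x >= 50
galaxies = make_randoms(2000, selection, 100.0, rng)   # unclustered stand-in for real data
randoms = make_randoms(20000, selection, 100.0, rng)
print(overdensity(galaxies, randoms, 100.0, 4))
```

Because the stand-in galaxies here are themselves unclustered, the recovered over-densities scatter around zero even though the raw galaxy counts differ by a factor of two across the mask.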

If the correct mask has been used, the estimate of δ_g(x) is unbiased, but has spatially varying noise, and we only have estimates within the volume covered by the survey. In order to optimally measure the amplitude of clustering, we need to apply weights. If we assume that the galaxy field forms a Poisson sampling of a field with expected power spectrum \bar{P}(k), then the optimal weights to use are

w(x) = \frac{1}{1 + \bar{n}(x) \bar{P}(k)}     (25)

where \bar{n}(x) is the expected density of galaxies (Feldman et al. 1994). The dependence on both \bar{n}(x) and \bar{P}(k) in these weights balances the shot-noise and sample-variance components of the measurement error. For multiple samples of galaxies covering the same volume, but with different expected clustering strengths (each sample is given a subscript i), the optimal weights (Percival et al. 2004) to apply are

w_i(x) = \frac{\bar{P}_i(k)}{1 + \sum_j \bar{n}_j(x) \bar{P}_j(k)}     (26)

These weights provide the same balance between shot-noise and sample variance as Eq. 25, but now also balance this against the clustering strength of each sample. For example, galaxies with high clustering strength are up-weighted, as these carry a higher signal-to-noise. These weighting schemes assume that galaxies Poisson sample the underlying matter field, but this is only an approximation. In fact, galaxies live in matter haloes, and haloes do not Poisson sample the mass in the Universe: on large scales the matter has a distribution with sub-Poisson variance. Thus applying a weighting based on the mass of the haloes within which the galaxies reside can reduce the shot noise in surveys (Seljak et al. 2009).
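As a small numerical illustration of the single-sample weights, the sketch below evaluates the standard Feldman et al. (1994) form, w = 1 / (1 + \bar{n} \bar{P}), for a range of densities. The amplitude \bar{P} = 20000 h^-3 Mpc^3 and the densities are illustrative assumptions, not values quoted in the text, and the function name is hypothetical.

```python
def fkp_weight(nbar, p_expected):
    """Weight 1 / (1 + nbar * P) for expected density nbar (h^3 Mpc^-3)
    and expected power p_expected (h^-3 Mpc^3)."""
    return 1.0 / (1.0 + nbar * p_expected)


P_EXPECTED = 2.0e4  # ASSUMED power spectrum amplitude in h^-3 Mpc^3

for nbar in (1.0e-6, 3.0e-4, 1.0e-2):  # illustrative densities in h^3 Mpc^-3
    w = fkp_weight(nbar, P_EXPECTED)
    print(f"nbar = {nbar:.0e}: nbar*P = {nbar * P_EXPECTED:g}, w = {w:.4f}")
```

The two limits show the balance described above: where \bar{n}\bar{P} << 1 the measurement is shot-noise limited and every galaxy receives nearly unit weight, while where \bar{n}\bar{P} >> 1 it is sample-variance limited and the weight falls as 1/\bar{n}, so that each volume element contributes equally.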

4.3. Measuring the power spectrum

If we only wish to measure the isotropically averaged power spectrum, then it is possible to simply measure this directly from a Fourier transform of the over-density field (Feldman et al. 1994). Suppose that we have a galaxy catalogue with density n_g(x), and a random catalogue describing the mask with density n_r(x) containing α times as many objects. Following Feldman et al. (1994), we start from the un-normalised over-density field

F(x) = n_g(x) - \frac{n_r(x)}{\alpha}     (27)

Taking the Fourier transform of this field, and calculating the power gives

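\langle |F(k)|^2 \rangle = \int \frac{d^3k'}{(2\pi)^3} [P(k') - P(0) \delta_D(k')] |G(k - k')|^2 + (1 + \frac{1}{\alpha}) \int d^3x \, \bar{n}(x)     (28)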

where G(k) is the Fourier transform of the window function, defined by

G(k) = \int \bar{n}(x) e^{i k \cdot x} d^3x     (29)

and the final term in Eq. 28 gives the shot noise. We see that the effect of the survey mask, which is multiplicative in real-space (think of seeing the Universe through a window only revealing the patch observed), becomes a convolution in Fourier-space. There is a shot noise contribution on all scales, which can be subtracted from the power spectrum estimates, although it still contributes to the error. The fact that we have to estimate the mean density of galaxies from the sample itself means that the k = 0 mode is artificially set to zero, equivalent to subtracting a single Dirac delta function from the centre of the un-convolved power. The equivalent correction for correlation function estimates is commonly known as the "integral constraint".
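The structure of this estimate can be sketched in a deliberately simplified setting: a one-dimensional periodic box with a uniform mask and unit weights, so that the window convolution is trivial and only the shot-noise subtraction and normalisation remain. These simplifications, and the function name, are assumptions made for illustration. The field F(x) = n_g(x) - n_r(x)/α is transformed by direct summation over the points, the shot noise (1 + 1/α) ∫ \bar{n} dx is subtracted, and the result is divided by ∫ \bar{n}^2 dx.

```python
import cmath
import math
import random


def fkp_power_1d(galaxies, randoms, length, n_modes):
    """Shot-noise-subtracted power at k_j = 2*pi*j/length for j = 1..n_modes,
    ASSUMING unit weights and a uniform mask in a periodic 1D box."""
    alpha = len(randoms) / len(galaxies)        # randoms are alpha times more numerous
    shot = (1.0 + 1.0 / alpha) * len(galaxies)  # (1 + 1/alpha) * integral of nbar
    norm = len(galaxies) ** 2 / length          # integral of nbar^2 over the box
    power = []
    for j in range(1, n_modes + 1):
        k = 2.0 * math.pi * j / length
        # Direct Fourier transform of F(x) = n_g(x) - n_r(x) / alpha
        f_k = sum(cmath.exp(1j * k * x) for x in galaxies)
        f_k -= sum(cmath.exp(1j * k * x) for x in randoms) / alpha
        power.append((k, (abs(f_k) ** 2 - shot) / norm))
    return power


rng = random.Random(1)
length = 100.0
galaxies = [rng.uniform(0.0, length) for _ in range(200)]   # unclustered stand-in
randoms = [rng.uniform(0.0, length) for _ in range(2000)]
for k, p in fkp_power_1d(galaxies, randoms, length, 5):
    print(f"k = {k:.3f}, P = {p:+.3f}")
```

For unclustered points the recovered power scatters around zero, with a mode-to-mode scatter set by the shot noise that was subtracted: the subtraction removes the mean contribution but not its effect on the error.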

The comoving power spectrum, as measured from the Data Release 9 (DR9) BOSS data using the Feldman et al. (1994) technique, is shown in Fig. 3. The inset shows the ratio of the measured power to a smooth fit, isolating the signature of baryons (see Section 2.3), which is clearly visible (Anderson et al. 2012).

Figure 3. The spherically averaged BOSS DR9 power spectrum for the CMASS galaxy sample (solid circles with 1σ errors), calculated using the method of Feldman et al. (1994). The vertical dotted lines show the range of scales fitted (0.02 < k < 0.3 h Mpc^-1), and the inset shows the BAO within this k-range, determined by dividing both model and data by the smooth best-fit model. Plot from Anderson et al. (2012).

We cannot easily measure the anisotropic power spectrum as a function of µ (see Section 1.5) using Fourier methods because the line-of-sight varies across a survey, and does not line up with a Cartesian grid. Thus, to measure the anisotropic power spectrum, one needs either to perform the transform manually, so that a different line-of-sight can be used for each galaxy pair (Yamamoto et al. 2006), or to decompose into a basis that is itself separable along and across the spatially varying line-of-sight (e.g. Heavens and Taylor 1995). As surveys increase in size and statistical errors reduce, methods such as these will become increasingly important.

4.4. Measuring the correlation function

In order to estimate the correlation function, we can directly make use of the idea of "throwing down sticks" described in Section 1.2. If the mask is quantified by a random catalogue as in Section 4.3, then we can form an estimate of the correlation function, \bar{\xi}(r),

\bar{\xi}(r) = \frac{DD(r)}{RR(r)} - 1     (30)

where DD(r) is the number of galaxy-galaxy pairs within a bin with centre r, normalised to the maximum possible number of galaxy-galaxy pairs (i.e. for n galaxies the maximum number of distinct pairs is n(n - 1) / 2). RR(r) is the normalised number of random-random pairs, and we can similarly define DR(r) as the normalised number of galaxy-random pairs.

This estimate of ξ(r) is biased and we find that

\langle 1 + \bar{\xi}(r) \rangle = \frac{1 + \xi(r)}{1 + \xi_\Omega(r)}     (31)

where ξ_Ω(r) is the mean of the two-point correlation function over the mask (Landy and Szalay 1993): this corrects for the fact that the true mean density of galaxies \bar{n}(r) is estimated from the sample itself, normalising the total number of pairs. For small samples, the factor [1 + ξ_Ω(r)], called the "integral constraint", limits the information that can be extracted from \bar{\xi}(r) - in the limit where we only observe pairs of a single separation, ξ_Ω(r) ≃ ξ(r), and \bar{\xi}(r) contains no information.

Because the galaxy and random catalogues are uncorrelated, ⟨DR(r)⟩ = ⟨RR(r)⟩, and we can consider a number of alternatives to Eq. 30. In particular

\bar{\xi}(r) = \frac{DD(r) - 2 DR(r) + RR(r)}{RR(r)}     (32)

has been shown to have good statistical properties (Landy and Szalay 1993), although Eq. 31 still holds for this estimator (i.e. it still "suffers" from the integral constraint). Pair counting methods can be trivially extended to bin pairs in separation and angle to the line-of-sight, giving estimates of the anisotropic correlation function. It is also possible to directly integrate over µ, weighted by a Legendre polynomial to give estimates of the multipoles, or by top-hat windows in µ (called "Wedges"; Kazin et al. 2012). These moments of the correlation function provide a mechanism to compress the anisotropic correlation function while still retaining the majority of the available information.
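The pair-counting estimators can be sketched with a brute-force implementation, suitable only for small catalogues. The example below is a minimal illustration in two dimensions with hypothetical function names, bin edges and catalogue sizes. It computes normalised DD, DR and RR counts and combines them into the simple DD/RR - 1 estimate and the Landy and Szalay (1993) estimate, (DD - 2DR + RR)/RR.

```python
import bisect
import math
import random


def _add_pair(counts, r, edges):
    """Increment the half-open bin [edges[i], edges[i+1]) containing r, if any."""
    idx = bisect.bisect_right(edges, r) - 1
    if 0 <= idx < len(counts):
        counts[idx] += 1


def auto_pairs(points, edges):
    """Binned distinct-pair counts, normalised by n(n - 1)/2."""
    counts = [0] * (len(edges) - 1)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            _add_pair(counts, math.dist(points[i], points[j]), edges)
    return [c / (n * (n - 1) / 2) for c in counts]


def cross_pairs(a, b, edges):
    """Binned cross-pair counts, normalised by len(a) * len(b)."""
    counts = [0] * (len(edges) - 1)
    for p in a:
        for q in b:
            _add_pair(counts, math.dist(p, q), edges)
    return [c / (len(a) * len(b)) for c in counts]


def natural_estimator(dd, rr):
    """DD/RR - 1 per bin; None where RR is empty."""
    return [d / r - 1.0 if r > 0 else None for d, r in zip(dd, rr)]


def landy_szalay(dd, dr, rr):
    """(DD - 2DR + RR)/RR per bin; None where RR is empty."""
    return [(d - 2.0 * x + r) / r if r > 0 else None for d, x, r in zip(dd, dr, rr)]


rng = random.Random(7)
data = [(rng.random(), rng.random()) for _ in range(150)]   # unclustered stand-in
rand = [(rng.random(), rng.random()) for _ in range(300)]   # random catalogue
edges = [0.05, 0.15, 0.25, 0.35, 0.45]                      # illustrative bins
dd, dr, rr = auto_pairs(data, edges), cross_pairs(data, rand, edges), auto_pairs(rand, edges)
print(natural_estimator(dd, rr))
print(landy_szalay(dd, dr, rr))
```

Since the stand-in data are unclustered, both estimates scatter around zero. For real catalogues the double loops would normally be replaced by a tree- or grid-based pair counter, as the cost here grows with the square of the catalogue size.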

An example correlation function, calculated from the BOSS DR9 CMASS sample (Anderson et al. 2012), is given in Fig. 4. The BAO "bump" caused by the physics described in Section 2.3 is clearly visible. On large scales, the correlation function reduces in amplitude (note that r² is plotted, enhancing the appearance of the large-scale noise), showing that the galaxies are evenly distributed. In fact, conservation of pair number means that the correlation function changes sign on very large scales.

Figure 4. The spherically averaged BOSS DR9 correlation function for the CMASS galaxy sample (solid circles with 1σ errors), calculated using the method of Landy and Szalay (1993). Plot from Anderson et al. (2012).

4.5. Reconstructing the linear density

In an evolved density field, pairs of over-densities initially separated by the BAO scale are subject to bulk motions that "smear" the initial pair separations. The matter flows and peculiar velocities act on intermediate scales (~ 20 Mpc h^-1), and therefore suppress small-scale oscillations in the matter power spectrum and smooth the BAO feature in the correlation function (Eisenstein et al. 2007b, Crocce and Scoccimarro 2008, Matsubara 2008b, Matsubara 2008a). As galaxies trace the matter density field, the BAO in galaxy surveys are also damped. Eisenstein et al. (2007a) suggested that this smoothing can be reversed, in effect using the phase information within the density field to reconstruct linear behaviour. Although not a new idea (e.g. Peebles 1989, Peebles 1990, Nusser and Dekel 1992, Gramann 1993), the dramatic effect on BAO recovery had not been previously realised, and the majority of the benefit was shown to be recovered from a simple reconstruction prescription.

"Reconstruction" has been used to sharpen the BAO feature and improve distance constraints on mock data (Padmanabhan and White 2009, Noh et al. 2009, Seo et al. 2010, Mehta et al. 2011), and was recently applied to the SDSS-II Luminous Red Galaxy (LRG) sample (Padmanabhan et al. 2012). The reconstruction was particularly effective in this case, providing a 1.9 per cent distance measurement at z = 0.35, decreasing the error by a factor of 1.7 compared with the pre-reconstruction measurement. On the BOSS DR9 CMASS sample, however, reconstruction was not as effective, and did not significantly reduce the error on the BAO scale measurement. Anderson et al. (2012) showed, using mock catalogues, that this is expected, and that the effectiveness of reconstruction varies across mocks. The average effect of reconstruction on the BOSS DR9 CMASS mocks is shown in Fig. 5.

Figure 5. The ratio between the recovered power spectrum and a smooth fit for the 600 BOSS DR9 CMASS mock catalogues, before and after applying a simple reconstruction algorithm. We see that, on average, reconstruction acts to improve the BAO signal on small scales. Plot from Anderson et al. (2012).

4.6. Ly-α forest surveys

Light from distant quasars passes through intervening clouds of gas that absorb ultraviolet light, predominantly at the wavelength of the Ly-α transition in neutral hydrogen (122 nm). The absorbing clouds are all at lower redshift than the quasar, so the absorption lines lie on the shorter wavelength side of the quasar emission line. Thus, if we divide the quasar spectrum by the expected continuum for each quasar, we are left with a 1D absorber over-density profile, which can be used to estimate the matter over-density field along the line-of-sight in much the same way that galaxies can be used directly, although the link between absorber density and matter density is more complicated than that between galaxies and the mass. A key advantage of the Ly-α forest is that a single spectrum gives information about multiple structures along the line of sight, so this method makes highly efficient use of telescope time. Using quasars with z > 2.15, BOSS has made BAO measurements using the Ly-α forest in quasar spectra, extending BAO measurements into the redshift range 2 < z < 3.5 (Busca et al. 2013, Slosar et al. 2013, Kirkby et al. 2013).
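The mapping from a quasar spectrum to a line-of-sight profile can be sketched as follows. The rest wavelength of 121.567 nm is the standard Ly-α value; the pixel wavelengths, fluxes and continuum are invented for illustration. The contrast definition used, δ_F = F / (C \bar{F}) - 1 with \bar{F} the mean transmitted fraction, is one common convention and is an assumption here rather than something specified in the text.

```python
LYA_REST_NM = 121.567  # rest-frame Ly-alpha wavelength in nm


def absorber_redshift(observed_nm):
    """Redshift of gas absorbing Ly-alpha at the given observed wavelength."""
    return observed_nm / LYA_REST_NM - 1.0


def flux_contrast(flux, continuum, mean_transmission):
    """delta_F = F / (C * Fbar) - 1 per pixel (an ASSUMED convention)."""
    return [f / (c * mean_transmission) - 1.0 for f, c in zip(flux, continuum)]


z_quasar = 2.5
emission_nm = LYA_REST_NM * (1.0 + z_quasar)  # observed Ly-alpha emission line

# Invented forest pixels, all blueward of the quasar emission line
observed_nm = [380.0, 400.0, 420.0]
flux = [0.60, 0.90, 0.75]
continuum = [1.0, 1.0, 1.0]  # hypothetical continuum estimate

transmission = [f / c for f, c in zip(flux, continuum)]
mean_transmission = sum(transmission) / len(transmission)

for lam, delta in zip(observed_nm, flux_contrast(flux, continuum, mean_transmission)):
    assert lam < emission_nm  # absorbers lie at lower redshift than the quasar
    print(f"lambda = {lam:.0f} nm, z_abs = {absorber_redshift(lam):.3f}, delta_F = {delta:+.3f}")
```

Each pixel of a single spectrum thus samples a different redshift along the line of sight, which is why one quasar observation probes many structures at once.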