The tendency of nebulae to cluster was discovered by Charles Messier and William Herschel, who constructed the first systematic catalogs of these objects. This tendency became more apparent as larger and larger samples of galaxies were compiled in the 19th and early 20th centuries. Studies of the most prominent concentrations of nebulae, the clusters of galaxies, were revolutionized in the 1920s by Edwin Hubble's proof that spiral and elliptical nebulae were bona fide galaxies like the Milky Way located at large distances from us (Hubble 1925, 1926), which implied that clusters of galaxies are systems of enormous size. Just a few years later, measurements of galaxy velocities in regions of clusters made by Hubble & Humason (1931), combined with the assumption of virial equilibrium of galaxy motions, were used to show that the total gravitating masses of the Coma (Zwicky 1933; see also Zwicky 1937) and Virgo (Smith 1936) clusters were enormous as well.
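As a rough illustration of the virial argument, the sketch below evaluates an order-of-magnitude mass M ~ alpha sigma^2 R / G. The prefactor alpha = 3 and the input values (a line-of-sight velocity dispersion of 1000 km/s and a radius of 1 Mpc) are illustrative assumptions, not the numbers used in the original analyses; the function name is hypothetical.

```python
# Order-of-magnitude virial mass estimate: M ~ alpha * sigma^2 * R / G.
# ASSUMPTION: alpha = 3 is an illustrative prefactor; its actual value depends
# on the mass profile and on the velocity anisotropy of the galaxy orbits.

G_MPC_KMS2_PER_MSUN = 4.30091e-9  # Newton's constant in Mpc (km/s)^2 / Msun


def virial_mass(sigma_kms, radius_mpc, alpha=3.0):
    """Return a rough virial mass in solar masses.

    sigma_kms  -- line-of-sight velocity dispersion in km/s
    radius_mpc -- characteristic cluster radius in Mpc
    alpha      -- dimensionless prefactor (assumed, see note above)
    """
    return alpha * sigma_kms**2 * radius_mpc / G_MPC_KMS2_PER_MSUN


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the scale of the result.
    mass = virial_mass(sigma_kms=1000.0, radius_mpc=1.0)
    print(f"M_vir ~ {mass:.1e} Msun")
```

With these inputs the estimate comes out at several times 10^14 solar masses, which conveys why the inferred cluster masses were regarded as enormous. Because the mass scales as sigma^2, a factor of two error in the velocity dispersion changes the result by a factor of four.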
The masses implied by the measured velocity dispersions were found to exceed the combined mass of all the stars in cluster galaxies by factors of ~200-400, which prompted Zwicky to postulate the existence of large amounts of "dark matter" (DM), inventing this widely used term in the process. Although the evidence for dark matter in clusters was disputed in the subsequent decades, as it was realized that the stellar masses of galaxies had been underestimated in the early studies, dark matter was ultimately confirmed by the discovery of an extended hot intracluster medium (ICM), emitting at X-ray energies via thermal bremsstrahlung, that was found to smoothly fill the intergalactic space within the Coma cluster (Gursky et al. 1971, Meekins et al. 1971, Kellogg et al. 1972, Forman et al. 1972, Cavaliere, Gursky & Tucker 1971). The X-ray emission of the ICM has not only accounted for a part of the missing mass (as was conjectured on theoretical grounds by Limber 1959, van Albada 1960), but also allows the detection of clusters out to z > 1 (Rosati, Borgani & Norman 2002). Furthermore, measurement of the ICM temperature has provided an independent confirmation that the depth of the gravitational potential of clusters requires an additional dark component. It was also quickly realized that inverse Compton scattering of cosmic microwave background (CMB) photons off thermal electrons of the hot intergalactic plasma should lead to distortions in the CMB spectrum, equivalent to black body temperature variations of about 10^-4 to 10^-5 [the Sunyaev-Zel'dovich (SZ) effect; Sunyaev & Zeldovich 1970, 1972b, 1980]. This effect has now been measured in hundreds of clusters (e.g., Carlstrom, Holder & Reese 2002).
Given such remarkable properties, it is no surprise that the quest to understand the formation and evolution of galaxy clusters has become one of the central efforts in modern astrophysics over the past several decades. Early pioneering models of the collapse of initial density fluctuations in the expanding Universe showed that systems resembling the Coma cluster can indeed form (van Albada 1960, van Albada 1961, Peebles 1970, White 1976). Gott & Gunn (1971, see also Sunyaev & Zeldovich 1972a) showed that the hot gas observed in the Coma cluster in X-rays can be explained within such a collapse scenario by heating of the infalling gas in strong accretion shocks. Subsequently, the emergence of the hierarchical model of structure formation (Press & Schechter 1974, Gott & Rees 1975, White & Rees 1978), combined with the cold dark matter (CDM) cosmological scenario (Bond, Szalay & Turner 1982, Blumenthal et al. 1984), provided a powerful framework for the interpretation of multi-wavelength cluster observations. At the same time, rapid advances in computing power and new, efficient numerical algorithms have allowed fully three-dimensional ab initio numerical calculations of cluster formation within a self-consistent cosmological context, both in the dissipationless regime (Klypin & Shandarin 1983, Efstathiou et al. 1985) and including a dissipational baryonic component (Evrard 1988, Evrard 1990).
In the past two decades, theoretical studies of cluster formation have blossomed into a vibrant and mature scientific field. As we detail in the subsequent sections, a standard scenario of cluster formation has emerged, and theoretical studies have identified the most important processes that shape the observed properties of clusters and their evolution, which has enabled the use of clusters as powerful cosmological probes (see, e.g., Allen, Evrard & Mantz 2011 for a recent review). At the same time, observations of clusters at different redshifts have highlighted several key discrepancies between models and observations, which are particularly salient in the central regions (cores) of clusters.
In the current paradigm of structure formation, clusters are thought to form via a hierarchical sequence of mergers and accretion of smaller systems, driven by gravity and by the DM that dominates the gravitational field. Theoretical models of clusters employ a variety of techniques, determined by the particular aspect of cluster formation they aim to understand. Many of the bulk properties of clusters are thought to be determined solely by the initial conditions, the dissipationless DM that dominates the cluster mass budget, and gravity. Thus, cluster formation is often approximated in models as DM-driven dissipationless collapse from cosmological initial conditions in an expanding Universe. Such models are quite successful in predicting the existence and functional form of correlations between cluster properties, as well as their abundance and clustering, as we discuss in detail in Section 3. One of the most remarkable models of this kind is a simple self-similar model of clusters (Kaiser 1986, see Section 3.9 below). Despite its simplicity, the predictions of this model are quite close to the results of observations and have, in fact, been quite useful in providing baseline expectations for the evolution of cluster scaling relations. Studies of the abundance and spatial distribution of clusters using dissipationless cosmological simulations show that these statistics retain a remarkable memory of the initial conditions.
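To make the notion of a baseline scaling relation concrete, the sketch below evaluates the self-similar expectation for the gas temperature, T proportional to [M E(z)]^(2/3). It assumes that cluster mass is defined within a fixed overdensity relative to the critical density and that the background is a flat LCDM cosmology, with E(z) = H(z)/H0. The value Omega_m = 0.3 and the function names are illustrative assumptions.

```python
import math


def hubble_ratio(z, omega_m=0.3):
    """E(z) = H(z)/H0 for a flat LCDM background (ASSUMED cosmology)."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))


def temperature_ratio(mass_ratio, z, omega_m=0.3):
    """Self-similar T/T_ref for a cluster with M/M_ref = mass_ratio at redshift z.

    The reference cluster is taken at z = 0. ASSUMPTION: mass is defined within
    a fixed overdensity relative to the critical density, so T ~ (M * E)^(2/3).
    """
    return (mass_ratio * hubble_ratio(z, omega_m)) ** (2.0 / 3.0)


if __name__ == "__main__":
    # Hypothetical comparison: a cluster 8 times more massive, at z = 0 and z = 1.
    for z in (0.0, 1.0):
        print(f"z = {z:.1f}: T/T_ref = {temperature_ratio(8.0, z):.2f}")
```

At fixed mass the predicted temperature rises with redshift because the mean density within the chosen overdensity grows as E(z)^2. Measured departures from this baseline are one way to gauge the impact of non-gravitational processes.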
The full description of cluster formation requires detailed modeling of the non-linear processes of collapse and the dissipative physics of baryons. The gas is heated to high, X-ray emitting temperatures by adiabatic compression and shocks during collapse, and settles into hydrostatic equilibrium within the cluster potential well. Once the gas is sufficiently dense, it cools, a process that can feed both star formation and accretion onto the supermassive black holes (SMBHs) harbored by massive cluster galaxies. The cooling and the formation of stars and SMBHs can then result in energetic feedback from supernovae (SNe) or active galactic nuclei (AGN), which can inject substantial amounts of heat into the ICM and spread heavy elements throughout the cluster volume.
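The hydrostatic equilibrium condition mentioned above links the gas temperature and density profiles to the enclosed mass through M(<r) = -[kT r / (G mu m_p)] (dln rho/dln r + dln T/dln r) for a spherically symmetric system. The sketch below evaluates this expression under stated assumptions: spherical symmetry, purely thermal pressure support, a mean molecular weight mu = 0.6, and hypothetical input values (kT = 8 keV, r = 1 Mpc, an isothermal gas with logarithmic density slope -2).

```python
# Hydrostatic mass estimate for a spherically symmetric ICM in SI units.
G = 6.67430e-11               # Newton's constant, m^3 kg^-1 s^-2
M_PROTON = 1.67262192e-27     # proton mass, kg
KEV = 1.602176634e-16         # 1 keV in joules
MPC = 3.0856775814913673e22   # 1 Mpc in meters
MSUN = 1.98847e30             # solar mass, kg


def hydrostatic_mass(kt_kev, r_mpc, dlnrho_dlnr, dlnt_dlnr=0.0, mu=0.6):
    """Return the enclosed mass in solar masses.

    kt_kev      -- gas temperature kT at radius r, in keV
    r_mpc       -- radius in Mpc
    dlnrho_dlnr -- logarithmic slope of the gas density at r (negative)
    dlnt_dlnr   -- logarithmic slope of the temperature at r (0 if isothermal)
    mu          -- mean molecular weight (ASSUMED value for ionized plasma)
    """
    slope = dlnrho_dlnr + dlnt_dlnr
    mass_kg = -(kt_kev * KEV) * (r_mpc * MPC) * slope / (G * mu * M_PROTON)
    return mass_kg / MSUN


if __name__ == "__main__":
    # Hypothetical, Coma-like inputs chosen only for illustration.
    mass = hydrostatic_mass(kt_kev=8.0, r_mpc=1.0, dlnrho_dlnr=-2.0)
    print(f"M(<1 Mpc) ~ {mass:.1e} Msun")
```

The estimate depends linearly on the temperature and on the local logarithmic slopes, so it is only as reliable as the assumption that thermal pressure alone balances gravity.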
Galaxy clusters are therefore veritable crossroads of astrophysics and cosmology: while the abundance and spatial distribution of clusters bear indelible imprints of the background cosmology, the law of gravity, and the initial conditions, the nearly closed-box nature of deep cluster potentials makes them ideal laboratories for studying the processes operating during galaxy formation and their effects on the surrounding intergalactic medium.
In this review we discuss the main developments and results in the quest to understand the formation and evolution of galaxy clusters. Given the limited space available for this review and the vast amount of literature and research directions related to galaxy clusters, we have no choice but to limit the focus of our review, as well as the number of cited studies. Specifically, we focus on the most basic and well-established elements of the standard paradigm of DM-driven hierarchical structure formation within the framework of CDM cosmology as it pertains to galaxy clusters. We focus mainly on the theoretical predictions for the total cluster mass distribution and the properties of the hot intracluster gas, and only briefly discuss results pertaining to the evolution of the stellar component of clusters, the understanding of which is still very much a work in progress. In comparing model predictions to real clusters, we mostly focus on comparisons with X-ray observations, which have provided the bulk of our knowledge of ICM properties so far. In Section 5, we briefly discuss how cluster formation differs in models with non-Gaussian initial conditions and modified gravity. Specifically, we focus on the information that statistics sensitive to the cluster formation process, such as cluster abundance and clustering, can provide about primordial non-Gaussianity and possible deviations of gravity from General Relativity. We refer readers to the recent reviews on cosmological uses of galaxy clusters by Allen, Evrard & Mantz (2011) and Weinberg et al. (2012) for a more extensive discussion of this topic.