Grid Computing

As high-energy physics experiments grow larger in scope, they require more computing power to process and analyze data. Laboratories purchase rooms full of computer nodes for experiments to use. But many experiments need even more capacity during peak periods, while others do not need all of their computing power all of the time.

In the early 2000s, members of Fermilab’s Computing Division looked ahead to experiments like those at the Large Hadron Collider, which would collect more data than any computing center in existence could process. They wanted to optimize the use of Fermilab resources for these and current experiments, reduce the manpower needed for operations, and enable Fermilab to take part in a large consortium grid called the Open Science Grid. To do so, they initiated a project known as FermiGrid. During busy times, when experiments like CDF and DZero needed more capacity than was dedicated to them, those experiments could share capacity that other experiments were not using. Similarly, when CDF, DZero and CMS resources sat idle, other experiments were able to use them.

Grid computing is a form of distributed computing in which multiple clusters of nodes work together to complete tasks. Physicists submit jobs, computer programs that extract physics results from data, to the grid. The grid determines which resources are free and uses those nodes to process the job.
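The matchmaking step described above can be sketched in a few lines of Python. This is a deliberately simplified toy, not the actual software FermiGrid uses (real grid middleware such as HTCondor handles queuing, priorities and fault tolerance); the node names and classes here are purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class Node:
    """One compute node in a cluster (hypothetical)."""
    name: str
    busy: bool = False

@dataclass
class Grid:
    """A toy grid: scans its nodes and assigns a job to the first free one."""
    nodes: list

    def submit(self, job: str):
        for node in self.nodes:
            if not node.busy:
                node.busy = True   # mark the node as occupied by this job
                return node.name   # report where the job landed
        return None                # no free capacity right now

grid = Grid([Node("cdf-01", busy=True), Node("dzero-01")])
print(grid.submit("analysis_job"))  # assigned to the first idle node: dzero-01
```

In a real grid, the "which resources are free" decision also weighs ownership and fair-share policies, which is what lets idle CDF or DZero capacity be loaned out to other experiments.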

All of Fermilab’s grid resources are organized into a single, local architecture called FermiGrid Gateway. Fermilab physicists can submit jobs to the FermiGrid Gateway, and those jobs may run on Fermilab computers, or they may run on computers at other institutions through the Open Science Grid. Similarly, jobs submitted by researchers at other institutions through the Open Science Grid may end up running on Fermilab computers. Fermilab also works with two other grids, TeraGrid and EGEE, to support physicists’ needs.

Grid computing is essential to experiments at the Large Hadron Collider. CERN sends its data to different computing centers, called Tier 1 centers, around the world, which in turn share the data with other centers, called Tier 2 centers, which in turn may share it with even more centers, called Tier 3 centers. Fermilab runs a Tier 1 center, which means that it receives its experimental data directly from CERN. Fermilab is responsible for storing, processing and redistributing a significant portion of the data from the CMS experiment.
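The tiered fan-out described above can be pictured as a small tree: data flows from CERN (Tier 0) down through Tier 1 centers like Fermilab to lower tiers. A minimal sketch, assuming a made-up topology (the center names below Fermilab are illustrative, not a real site list):

```python
# Hypothetical tier topology: each center lists the centers it
# redistributes data to, one level down.
TIERS = {
    "CERN": ["Fermilab", "OtherTier1"],       # Tier 0 -> Tier 1
    "Fermilab": ["SiteA", "SiteB"],           # Tier 1 -> Tier 2
    "SiteA": ["SiteC"],                       # Tier 2 -> Tier 3
}

def distribute(center: str, received=None):
    """Record every center that ends up holding a copy of the data."""
    if received is None:
        received = []
    received.append(center)
    for downstream in TIERS.get(center, []):
        distribute(downstream, received)      # each tier shares onward
    return received

print(distribute("CERN"))
# ['CERN', 'Fermilab', 'SiteA', 'SiteC', 'SiteB', 'OtherTier1']
```

The point of the hierarchy is that CERN only has to ship each dataset to a handful of Tier 1 centers; those centers, Fermilab among them, take on the storage, processing and onward distribution for everyone below them.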