I’m just wrapping up an observing run at the IRAM 30m radio telescope in Southern Spain. We have been trying to detect emission from molecules in galaxies undergoing mergers with other galaxies. These molecules are in gas clouds within the galaxies and because they have energy they rotate. Quantum mechanics dictates that they can only rotate at specific rates, with each rate corresponding to a different energy level. When the molecules jump from one rotation rate to another, they either emit or absorb a photon (emitting one when they drop to a lower energy level, absorbing one when they move up to a higher level). Because they can only rotate at specific rates, only specific differences in energy levels are allowed. By understanding the properties of a molecule you can predict the wavelengths of photons that will be emitted as that molecule changes rotation states.
So how do you observe these molecules? Because the photons are only emitted at specific energies, the emission from a molecule will be at a known frequency. For example, the carbon monoxide molecule (CO) emits a photon at 115.27120 GHz when it drops to the lowest rotational energy state from the state just slightly higher in energy. (This is called the J 1-0 transition.) The wavelength of this photon is about 2.6 mm, so you need a radio telescope to observe it. Other molecules will emit at different frequencies depending on their structure, but generally they emit at millimeter wavelengths.
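To make the numbers concrete, here is a minimal Python sketch. The frequency-to-wavelength conversion is exact physics (wavelength = c / frequency). The second function is an assumption on my part: it uses the textbook rigid-rotor approximation, in which the J to J-1 line sits at roughly 2 x B x J, with an approximate rotational constant for CO of B = 57.636 GHz. Real molecules stretch slightly as they spin, so the true line frequencies are a little lower than this estimate. The function names are made up for illustration.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum (exact SI value)
B_CO_GHZ = 57.636          # ASSUMED approximate rotational constant of CO


def wavelength_mm(frequency_ghz: float) -> float:
    """Convert a frequency in GHz to a vacuum wavelength in millimeters."""
    return C_M_PER_S / (frequency_ghz * 1e9) * 1e3


def rigid_rotor_line_ghz(j_upper: int, b_ghz: float = B_CO_GHZ) -> float:
    """Rigid-rotor estimate of the J_upper -> J_upper - 1 line frequency.

    This ignores centrifugal stretching, so it is only an approximation.
    """
    if j_upper < 1:
        raise ValueError("j_upper must be at least 1")
    return 2.0 * b_ghz * j_upper


# The measured CO (J 1-0) frequency quoted in the text.
print(f"CO (J 1-0): {wavelength_mm(115.27120):.2f} mm")

# Rigid-rotor estimates for the first few rungs of the rotational ladder.
for j in (1, 2, 3):
    freq = rigid_rotor_line_ghz(j)
    print(f"J {j}-{j - 1}: ~{freq:.3f} GHz, ~{wavelength_mm(freq):.2f} mm")
```

The estimate for J 1-0 lands at 115.272 GHz, within about a megahertz of the measured value, which is why knowing a molecule's properties lets you predict where to tune the receiver.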
So you point your radio telescope at a galaxy and you might see emission from CO (J 1-0) as a spike at 115.27120 GHz, while the emission at nearby frequencies appears relatively flat. But how do you know it’s a real detection and you’re not just seeing noise in the system that is masquerading as emission from a molecule?
One way to characterize this is via the “signal to noise” ratio. Basically, what is the noise in the measurement and how many times larger than it is the signal you want to measure? The noise is often called “sigma”, giving rise to phrases like “3-sigma” and “5-sigma” (a signal 3 and 5 times the noise, respectively). If you heard the press conference from CERN about the Higgs boson, you probably saw similar phrases. Generally, a “3-sigma” result warrants further investigation, but you want a higher level of confidence before claiming a discovery.
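As a rough sketch of how such a ratio might be computed, the Python below estimates sigma as the standard deviation of the channels away from the expected line, then divides the peak height (above the average off-line level) by that sigma. This is one simple recipe, not necessarily how any particular reduction package does it, and the function name and the toy spectrum are invented for illustration.

```python
import statistics


def signal_to_noise(spectrum, line_channels):
    """Peak signal-to-noise ratio of a line in a 1-D spectrum.

    spectrum      -- list of brightness values, one per frequency channel
    line_channels -- indices of the channels where the line is expected
    """
    line = set(line_channels)
    # Channels away from the line define the baseline level and the noise.
    off_line = [v for i, v in enumerate(spectrum) if i not in line]
    baseline = statistics.fmean(off_line)
    sigma = statistics.stdev(off_line)
    # Peak height above the baseline, in units of sigma.
    peak = max(spectrum[i] for i in line)
    return (peak - baseline) / sigma


# Toy spectrum (made-up numbers): small wiggles plus one spike in channel 4.
toy = [0.1, -0.1, 0.1, -0.1, 0.5, 0.1, -0.1]
snr = signal_to_noise(toy, line_channels=[4])
print(f"signal to noise: {snr:.1f} sigma")
```

For this toy spectrum the spike comes out at about 4.6 sigma: interesting, but short of the 5-sigma level usually wanted before claiming a discovery.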
For many astronomical observations, you can reduce the noise in your measurement by taking more measurements and combining them. If your signal is real, it should stay, while the noise (which hopefully is random) will average away to zero as you combine more measurements. Typically the noise in your measurement goes down as the inverse of the square root of the length of your measurements. So if you collect data on a galaxy for 1 minute, you can measure some noise level. Now if you stare at that same galaxy for a total of 4 minutes, the noise you measure will be half that of the 1 minute data set.
Because the noise decreases as the square root, you gain a lot by going from 1 minute to 4 minutes (you cut the noise in half), and 4 minutes isn’t much time to spend. And if you go from 1 minute to 1 hour, the noise goes down by almost a factor of 8. But going from 1 minute to 2 hours only gives you a factor of 11 improvement. So at some point you start to experience diminishing returns in your ability to limit the noise.
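The arithmetic behind those factors is just a square root, as this short Python sketch shows. It assumes the ideal case described above, where the noise is random and scales as one over the square root of the observing time; the function name is hypothetical.

```python
import math


def noise_improvement(t_minutes: float, t_ref_minutes: float = 1.0) -> float:
    """Factor by which the noise drops going from t_ref to t minutes.

    Assumes ideal random noise that scales as 1 / sqrt(observing time).
    """
    if t_minutes <= 0 or t_ref_minutes <= 0:
        raise ValueError("observing times must be positive")
    return math.sqrt(t_minutes / t_ref_minutes)


for t in (4, 60, 120):
    factor = noise_improvement(t)
    print(f"{t:>3} min: noise lower by a factor of {factor:.2f}")
```

This prints factors of 2.00, 7.75 and 10.95. Doubling the time from 1 hour to 2 hours buys only an extra factor of about 1.4, which is the diminishing return in action.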
So we can reduce the noise by observing for longer and combining the measurements. But we’re really interested in the ratio of the signal to the noise. The signal is determined by the physics (e.g., how many molecules there are emitting photons at that frequency), so it stays constant no matter how long you integrate*. Integrating longer reduces the noise while the signal stays constant, so your signal to noise ratio increases.
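You can check this reasoning with a small simulation. The Python sketch below builds fake scans, each one a flat baseline with a single-channel "line" of fixed height plus Gaussian random noise, and then averages different numbers of them. Everything here is an assumption made for illustration (the channel count, line height, noise level and function names); it is not the data or the software from the observing run.

```python
import random
import statistics


def simulate_scan(n_channels, line_channel, line_height, sigma, rng):
    """One fake scan: Gaussian noise everywhere plus a line in one channel."""
    scan = [rng.gauss(0.0, sigma) for _ in range(n_channels)]
    scan[line_channel] += line_height
    return scan


def average_scans(scans):
    """Average a list of scans channel by channel."""
    n = len(scans)
    return [sum(channel_values) / n for channel_values in zip(*scans)]


def offline_noise(spectrum, line_channel):
    """Noise (standard deviation) measured away from the line channel."""
    off_line = [v for i, v in enumerate(spectrum) if i != line_channel]
    return statistics.stdev(off_line)


# Made-up setup: the line is only as tall as the noise in a single scan.
N_CHANNELS, LINE_CHANNEL, LINE_HEIGHT, SIGMA = 2000, 1000, 1.0, 1.0
rng = random.Random(42)  # fixed seed so the run is repeatable
scans = [
    simulate_scan(N_CHANNELS, LINE_CHANNEL, LINE_HEIGHT, SIGMA, rng)
    for _ in range(16)
]

for n in (1, 4, 16):
    averaged = average_scans(scans[:n])
    noise = offline_noise(averaged, LINE_CHANNEL)
    print(f"{n:>2} scans: noise = {noise:.3f}, "
          f"line channel = {averaged[LINE_CHANNEL]:.3f}")
```

The measured noise should fall by roughly a factor of 2 each time the number of scans is quadrupled, while the value in the line channel stays scattered around the same height of 1.0. The line does not get brighter; the noise around it shrinks.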
I have a few plots from the observing run to demonstrate how taking more measurements and adding the data together improves your signal to noise. The specific molecule and name of the galaxy observed have been omitted to protect the guilty parties, but we are expecting to see spikes in the data from molecules emitting at just below 85000 MHz and around 85500 MHz. The x-axis is the frequency of the radio waves being observed and the y-axis is the brightness at that frequency. The y-axis range is the same for all figures, so you can see the noise level changing as we add more data.
This first figure shows data from 1 “scan” (6 minutes of observing). Note there are lots of spikes in the data, but they are somewhat randomly scattered. If you measured the noise, none of those spikes would likely be more than 1 or 2 sigma above the average noise. Another way to look at it is that no prominent spikes stand out at 85000 MHz or 85500 MHz.

Now we observe for another scan (total of 12 minutes observing). You can see the data looks a bit less noisy. There are fewer spikes. By averaging two scans together we’ve reduced the noise by a factor of 1.4 (square root of 2). There’s a hint of spikes at the frequencies we expect, but there are spikes at other frequencies that look somewhat similar. So we can’t be sure we’ve actually detected the two molecules.

So, we collect more data. This time, 4 scans (24 minutes observing). Now our noise is half that of the first figure. The situation looks better: the spikes at 85000 MHz and 85500 MHz still remain, but the other spikes are diminishing, being averaged out as we add more data.

Now the full data set, 28 scans (168 minutes). Our noise is more than 5 times lower than in a single scan. The signals at 85000 MHz and 85500 MHz are still there and now stand out clearly. So we can now be confident we’ve detected emission from these two molecules.

Notice how the well-detected features at 85000 MHz and 85500 MHz keep the same amplitude (height) regardless of the number of scans.
But we may also have detected another molecule. Can you see what frequency it’s emitting at?
(* – I’m talking about the rate per second here, not the accumulated number of “events”.)

