Astronomy 302

Lecture 3 & 4

Noise and Error Analysis




1.0 Uncertainty

There will always be sources of noise in any measurement.

there are random errors (cannot quite read the ruler accurately)
and there are systematic errors (ruler has a slightly incorrect scale)

random errors improve with the number of independent observations while systematic errors
do not.

Sometimes we can estimate our random error, while it is harder to estimate our systematic error.

2.0 The Normal or Gaussian Distribution


If we make an infinite number of measurements then we will approach the correct distribution for our
measurements, and therefore estimate the correct mean value for the object we are trying to measure.

Great, if you have a lot of time on your hands....

In reality there is a short cut to having to make an infinite number of measurements!

The central limit theorem proves that:

"that if the sum of the variables has a finite variance, then it will be approximately normally distributed."

WOW

That means that to first order a Normal or Gaussian distribution is what we would obtain if we made  an infinite number of observations of a quantity the same way (finite variance).

So lets look at the Normalized Gaussian distribution:

where the Probability density


This gives the "standard normal distribution" curve (see below) when sigma =1.0 and mu=0



Fig 1: from here shows a Gaussian distribution.

A pure gaussian has some nice characteristics:

Mean = mu
Median = mu
Mode = mu
Standard Deviation = sigma
Variance = sigma2


2.1 The Poisson distribution

It expresses the probability of a number of events occurring in a fixed period of time if these events occur with a known average rate, and are independent of the time since the last event.

It turns out that photons obey Poisson statistics (or counting statistics or shot noise). The probability that there are exactly
k photons collected in say a pixel is:



where
λ (lambda) is a positive real number, equal to the expected number of occurrences that occur during the given interval.

Example, if the photons arrive occur on average every 4 seconds (0.25 Hz) and you are interested in the number of photons collected in  16 seconds, you would use as model a Poisson distribution with lambda =λ = 16/4 = 4
From the image below (green lambda=4 curve) we see that there is a ~20% chance to collect 4 photons in 16 seconds (rate 0.25 Hz), this drops to ~4% to collect 8 photons (rate 0.5 Hz) in one 16 sec interval.

Fig 2: from here which shows the probability of collecting photons (k is the x-axis).

A pure Poissonian has some nice characteristics:

Mean =
λ   (lambda)                                 or N photons
Standard Deviation (SD)=
λ1/2       or Root lambda or root N photons
Variance =
λ  (lambda)                             or N photons

The rather interesting fact that the mean = variance has great application in astronomy.

Example: What is the signal to noise (S/N) expected if one collects 100 photons?
S/N = Mean/Standard Deviation
= N/rootN
= square root (N)  ---- Very important result!
= 10

that means that if we want to get a S/N=100 result (10x better)
we need to collect 10,000 photons and that takes
100x times longer than a S/N=10 result...
the fact that S/N varies as root(time)  drives the development of bigger telescopes (why?)

3.0 DISCRETE MEASUREMENTS


3.1 The Mean

If we have only n measurements of a value then the arithmetic mean is:


if we have weights (like by S/N ratios) then we have the weighted mean:



3.2 The Standard Deviation


The Standard Deviation of a population distribution is the measure of the spread of its values.
The Standard Deviation (sigma) is the square root of the variance.

Like for a gaussian it is sigma.

For a discrete number of measurements:




Typically we try to estimate what sigma is from our measurements. If we wish to estimate the standard deviation of the population then the
sample (or estimated)standard deviation (s) is used where:



only as N-> infinity does s->sigma

Example: We measure the distance between 2 stars in 3 different images.
x1=2.8"
x2=3.1"
x3=2.9"
so the mean is 2.933" and s=0.023". This can be written 2.933+/-0.023"
(note that s is a better estimate of width of the distribution than sigma=0.015" which typically underestimates it)
-- try estimating sigma from one measurement. How does it compare to s? which is better?


4.0 Propagation of Errors


Now that we've learned a bit about how to calculate uncertainties we need to understand how they propagate
through a function.

In general if we are interested in the error in z and z=f(w,x,y, ...) and w, x, y etc. are all independent variables than the exact solution is (from Vern Lindberg's site): 



Now if we evaluate the errors using the simpler average values w,x,y etc. we can write the above equation as:

equ 1

4.1 Addition

so in the case z=x+y+... or z=x-y-... it is clear that (assuming x and y are independent):

equ 2


4.2 multiplication

In the case of z=xy or z=x/y
equ 3


Example 1:
 What is the percentage error in flux if we know V = 12.5+/-0.1 mag ?

first recall how magnitudes work:

m1 - m2 = -2.5*log10(flux1/flux2)

and we could define a zeropoint to the magnitude system (like the star Vega is assigned a V=0 mag)
therefore:

mag(star) = -2.5*log10(flux(star)/flux(Vega))   
 (typically Vega has V band flux of 3540 Jy or 3.44e-08 W/m2/um)

so a flux in units of Vega fluxes is:
flux= f(mag) =10**((mag)/-2.5)

example: if mag = 10 then we know our star is 10,000x fainter than Vega in that filter

and a calibrated flux (say in Jy) is given by
flux= f(mag) =10**((mag-zeropoint)/-2.5)

we should note the
important relation that since m1-m2=-2.5*log10(f1/f2)
then a small pertibation on m say m+epsilon yields:
m - (m-epsilon) ~ -2.5*log10(f/f(1+epsilon))
this works out mathematically since -2.5*log10(1/(1+epsilon)) ~ epsilon where epsilon is small
(this is exact for epsilon=0.176238185 mag
and a reasonable approx. for 0<epsilon<0.5 mag
then we expand that to see that:
(flux*(1+epsilon))/flux ~ f(mag+epsilon)/f(mag)     (where epsilon is small)

in other words:
 if the the error V is +/-0.176 mag then delta FluxV is ~17.6%

(at epsilon=0.5 mag this linear approximation starts to add ~20% error;
whereas at epsilon=0.1 mag there is only a 3% error in this approximation)

This assumption makes our error analysis much simpler since

delta(fluxV)/fluxV ~ delta(V)  in Mag   
(for deltamagV<0.5 mag)                                         equ 4 





Example 2:
we often have to deal with the calculation of color errors in astronomy (as you will in Project#1)
If V=12.5+/-0.1 mag and B=13.5+/-0.1 mag what is the color of the star and its errors?
 
simply this is given by equ 2 as

delta(B-V) = (delta(V)**2 + delta(B)**2)**0.5

It is interesting to note that B-V "color"
(a subtraction of logarithms)
is the same as the flux ratio between Flux V and Flux B!

B-V =
-2.5*log10(fluxB/fluxV) = -2.5*log10(10**(B/-2.5)/10**(V/-2.5))

so delta(B-V) = delta(fluxB/fluxV)

Now from equ 3 we see delta(fluxB/fluxV) is:
delta(fluxB/fluxV)/(fluxB/fluxV) = ((delta(fluxV)/fluxV)**2.0 + (delta(fluxB)/fluxB)**2.0)**0.5

now since delta(fluxV)/fluxV ~ delta(V) (for small deltaV's magnitudes) we can rewrite the above equation as:
delta(fluxB/fluxV)/(fluxB/fluxV) ~ (delta(V)**2 + delta(B)**2)**0.5 = delta(B-V) !! as you would expect


 

4.3 Products of Powers

in the case of z=xmyn
equ 5


5.0 Optimizing S/N


In a very simple example lets assume that we are trying to detect a star against a "sky" background.

on the CCD pixel (or pixels) that has the star there are N photons collected
If we move the telescope out of the way and integrate for the same time we collect Ns photons from the sky.

lets say we do x frames on the source and y frames on the sky

what we want to know is how much time should we spend on the sky and how much time on the star to optimize S/N?

Here our Signal = xN - yNs (these are the photons from the star)
Here our Noise = (delta(xN)**2.0 + delta(yNs)**2.0)**0.5 = (x**2*N + y**2*Ns)**0.5      (from equ 2 and Poisson statistics)

So S/N = (xN - yNs) /
(x**2*N + y**2*Ns)**0.5

Now assume we have only 10 x 1 min frames (so x + y =10)

if N >> Ns (bright star case, or photon noise-limited case)

 than S/N ~  (xN)**0.5

and so the optimal S/N is obtained where x=9 and y=1

if N-Ns=epsilon where epsilon<<1 (faint star case, or background limited case)
 
we need to minimize the sky noise as much as possible, but we have to balance this with
as much data on the object as possible....

therefore the best mix is to have x=y=5 frames each!
it is annoying but you must spend 50% of the observing time off the object to get the best image of it...

Prove this by maximizing of the S/N of the flux from the star
Flux is photons/sec which is a rate. So maximize the S/N of Rstar (the rate of photons received or flux)
assume Rstar = N/t - Ns/ts where t is the time spent on object+sky and ts=time on sky
note the total observing time T=t+ts. Show that to maximize the S/N t=ts=T/2.

6.0 Curve fitting


If we have data and a model

6.1 Least squares


If we have series of data (xi, yi) which we wish to fit with a function f(x) then
we can minimize the S (the sum of the squares of the difference)
at the minimal value of S we have found the best function f(x) to fit the data...


For example IRAF uses the polyfit task to perform a weighted least squares fit
for the general function  f(x) = a0 + a1*x + a2*x**2 + a3*x**3 + ...
if the order parameter in polyfit is set to 1 it is line fit, 2=parabolic fit etc...



6.2 Chi squared goodness of fit test


If we have n independent Oi data points and Ei as our expectation function to fit this data
then the below formula gives us the Chi squared of our fit.



First we have to make sure we know how many true degrees of freedom are in our data
--- like if we've subtracted the mean off of our Oi data we have only n-1 degrees of freedom.

Then we can use table of chi-squared like the one here
to calculate the probability that the Null Hypothesis (E is such that (Oi-Ei)2  is minimized)

For example if we find with 11 independent data points (where have removed the mean -- loss of one degree)
then we have 10 degrees of freedom
and if we find that chi-squared is less (or equal) than 3.940 then there is a 95% chance that the Null Hypothesis is
correct and you likely have a good fit. If the chi-squared was less than 2.558 than it is a very good fit (99% chance
that
the Null Hypothesis is correct).

On the other hand... if we had a value of >18.31 then we can be 95% confident that the Null Hypothesis is WRONG and we have an incorrect expectation function....

To learn more about the Chi-squared test look here.