Astronomy 302
Lecture 3 & 4
Noise and Error Analysis
1.0 Uncertainty
There will always be sources of noise in any measurement.
there are random errors (cannot quite read the ruler accurately)
and there are systematic errors (ruler has a slightly incorrect scale)
random errors improve with the number of independent observations while
systematic
errors
do not.
Sometimes we can estimate our random error, while it is harder to
estimate our systematic error.
2.0 The Normal or Gaussian Distribution
If we make an infinite number of measurements then we will approach the
correct distribution for our
measurements, and therefore estimate the correct mean value for the
object we are trying to measure.
Great, if you have a lot of time on your hands....
In reality there is a short cut to having to make an infinite number of
measurements!
The central
limit theorem proves that:
"that if the sum of the variables has a finite variance,
then it will be approximately normally distributed."
WOW
That means that to first order a Normal or Gaussian distribution is
what we would obtain if we made an infinite number of
observations of a quantity the same way (finite
variance).
So lets look at the Normalized Gaussian distribution:
where the Probability density

This gives the "standard normal distribution" curve (see below) when
sigma =1.0 and mu=0

Fig 1: from here
shows a Gaussian distribution.
A pure gaussian has some nice characteristics:
Mean = mu
Median = mu
Mode = mu
Standard Deviation = sigma
Variance = sigma2
2.1 The Poisson distribution
It expresses the probability of a number of events occurring
in a fixed
period of time if these events occur with a known average rate, and are
independent of the time since the last event.
It turns out that photons obey Poisson statistics (or counting
statistics or shot noise). The probability that there are exactly
k photons collected in say a pixel is:

where λ (lambda) is a
positive real
number,
equal to the expected number of occurrences that occur during the given
interval.
Example, if the photons arrive
occur on average every 4 seconds
(0.25 Hz) and you are interested in the number of photons collected
in 16 seconds, you would use as model a Poisson distribution with
lambda =λ = 16/4 = 4
From the image below (green
lambda=4 curve) we see that there is a ~20% chance to collect 4 photons
in 16
seconds (rate 0.25 Hz), this drops to ~4% to collect 8 photons (rate
0.5 Hz) in one 16 sec interval.

Fig 2: from here
which shows the probability of collecting photons (k is the x-axis).
A pure Poissonian has some nice characteristics:
Mean = λ (lambda)
or N photons
Standard Deviation (SD)= λ1/2
or Root lambda or root N photons
Variance = λ (lambda)
or N photons
The rather interesting fact that the mean = variance has great
application in astronomy.
Example: What is the signal to
noise (S/N) expected if one collects 100 photons?
S/N = Mean/Standard Deviation
= N/rootN
= square root (N) ---- Very important result!
= 10
that means that if we want to get a S/N=100 result (10x better)
we need to collect 10,000 photons and that takes
100x times longer than a S/N=10 result...
the fact that S/N varies as root(time) drives the development of
bigger telescopes (why?)
3.0 DISCRETE MEASUREMENTS
3.1 The Mean
If we have only n measurements of a value then the arithmetic
mean is:

if we have weights (like by S/N ratios) then we have the weighted mean:

3.2 The Standard Deviation
The Standard Deviation of a population distribution is the measure of
the spread of its values.
The Standard Deviation (sigma) is the square root of the variance.
Like for a gaussian it is sigma.
For a discrete number of measurements:

Typically we try to estimate what sigma is from our measurements. If we
wish to estimate the standard deviation of the population then the
sample (or estimated)standard deviation (s) is used where:

only as N-> infinity does s->sigma
Example: We
measure the distance between 2 stars in 3 different images.
x1=2.8"
x2=3.1"
x3=2.9"
so the mean is 2.933" and s=0.023". This can be written 2.933+/-0.023"
(note that s is a better estimate of width of the distribution than
sigma=0.015" which typically underestimates it)
-- try estimating sigma from one measurement. How does it compare to s?
which is better?
4.0 Propagation of Errors
Now that we've learned a bit about how to calculate uncertainties we
need to understand how they propagate
through a function.
In general if we are interested in the error in z and z=f(w,x,y, ...)
and w, x, y etc. are all independent
variables than the exact solution is (from Vern Lindberg's site):

Now if we evaluate the errors using the simpler average values w,x,y
etc. we can write the above equation as:
equ 1
4.1 Addition
so in the case z=x+y+... or z=x-y-... it is clear that
(assuming x and y are independent):
equ 2
4.2 multiplication
In the case of z=xy or z=x/y
equ 3
Example 1:
What is the percentage error in flux
if we know V = 12.5+/-0.1 mag ?
first recall
how magnitudes work:
m1 - m2 = -2.5*log10(flux1/flux2)
and we could define a zeropoint to the magnitude system (like the star
Vega is assigned a V=0 mag)
therefore:
mag(star) = -2.5*log10(flux(star)/flux(Vega))
(typically Vega has V band flux of 3540 Jy or 3.44e-08 W/m2/um)
so a flux in units of Vega fluxes is:
flux= f(mag)
=10**((mag)/-2.5)
example: if
mag = 10 then we know our star is 10,000x fainter than Vega in that
filter
and a calibrated flux (say in Jy) is given by
flux= f(mag) =10**((mag-zeropoint)/-2.5)
we should note
the
important relation that since m1-m2=-2.5*log10(f1/f2)
then a small pertibation on m say m+epsilon yields:
m - (m-epsilon) ~ -2.5*log10(f/f(1+epsilon))
this works out mathematically since -2.5*log10(1/(1+epsilon)) ~ epsilon
where epsilon is small
(this is exact for epsilon=0.176238185 mag
and a reasonable approx. for 0<epsilon<0.5 mag
then we expand that to see that:
(flux*(1+epsilon))/flux ~ f(mag+epsilon)/f(mag)
(where epsilon is small)
in other words:
if the the error V is +/-0.176 mag then delta FluxV is ~17.6%
(at epsilon=0.5 mag this linear approximation starts to add ~20% error;
whereas at epsilon=0.1 mag there is only a 3% error in this
approximation)
This assumption makes our error analysis much simpler since
delta(fluxV)/fluxV ~ delta(V) in Mag
(for
deltamagV<0.5
mag)
equ 4
Example 2:
we often have to deal with the
calculation of color errors in astronomy (as you will in Project#1)
If V=12.5+/-0.1 mag and
B=13.5+/-0.1 mag what is the color of the star and its errors?
simply this is given by equ 2
as
delta(B-V) =
(delta(V)**2 + delta(B)**2)**0.5
It is interesting to note that B-V "color"
(a subtraction of logarithms)
is the same as the flux ratio between Flux V and Flux B!
B-V = -2.5*log10(fluxB/fluxV)
= -2.5*log10(10**(B/-2.5)/10**(V/-2.5))
so delta(B-V) = delta(fluxB/fluxV)
Now from equ 3 we see delta(fluxB/fluxV) is:
delta(fluxB/fluxV)/(fluxB/fluxV)
= ((delta(fluxV)/fluxV)**2.0 + (delta(fluxB)/fluxB)**2.0)**0.5
now since delta(fluxV)/fluxV ~ delta(V) (for small deltaV's magnitudes)
we can rewrite the above equation as:
delta(fluxB/fluxV)/(fluxB/fluxV)
~ (delta(V)**2
+ delta(B)**2)**0.5 = delta(B-V) !! as you would
expect
4.3 Products of Powers
in the case of z=xmyn
equ 5
5.0 Optimizing S/N
In a very simple example lets assume that we are trying to detect a
star against a "sky" background.
on the CCD pixel (or pixels) that has the star there are N photons
collected
If we move the telescope out of the way and integrate for the same time
we collect Ns photons from the sky.
lets say we do x frames on the source and y frames on the sky
what we want to know is how much time should we spend on the sky and
how much time on the star to optimize S/N?
Here our Signal = xN - yNs (these are the photons from the star)
Here our Noise = (delta(xN)**2.0 + delta(yNs)**2.0)**0.5 = (x**2*N +
y**2*Ns)**0.5 (from equ 2 and Poisson
statistics)
So S/N = (xN - yNs) / (x**2*N + y**2*Ns)**0.5
Now assume we have only 10 x 1 min frames (so x + y =10)
if N >> Ns (bright star case, or photon noise-limited
case)
than S/N ~
(xN)**0.5
and so the optimal S/N is obtained where x=9 and y=1
if N-Ns=epsilon where epsilon<<1 (faint star case, or
background limited case)
we need to minimize the sky
noise as much as possible, but we have to balance this with
as much data on the object as possible....
therefore the best mix is to have x=y=5 frames each!
it is annoying but you must spend 50% of the observing time off the
object to get the best image of it...
Prove this by maximizing of
the S/N of the flux from the star
Flux is photons/sec which is a
rate. So maximize the S/N of Rstar (the rate of photons received or
flux)
assume Rstar = N/t - Ns/ts
where t is the time spent on object+sky and ts=time on sky
note the total observing time
T=t+ts. Show that to maximize the S/N t=ts=T/2.
6.0 Curve fitting
If we have data and a model
6.1 Least squares
If we have series of data (xi,
yi) which we wish to fit with a function f(x) then
we can minimize the S (the sum of the squares of the difference)
at the minimal value of S we have found the best function f(x) to fit
the data...

For example IRAF uses the polyfit task
to perform a weighted least squares fit
for the general function f(x) = a0 + a1*x + a2*x**2 + a3*x**3 +
...
if the order parameter in polyfit
is set to 1 it is line fit, 2=parabolic fit etc...
6.2 Chi squared goodness of fit test
If we have n independent Oi data points and Ei as our expectation
function to fit this data
then the below formula gives us the Chi squared of our fit.

First we have to make sure we know how many true degrees of freedom are
in our data
--- like if we've subtracted the mean off of our Oi data we have only n-1
degrees of freedom.
Then we can use table of chi-squared like the one here
to calculate the probability that the Null Hypothesis (E is such that
(Oi-Ei)2 is
minimized)
For example if we find with 11 independent
data points (where have removed the mean -- loss of one degree)
then we have 10 degrees of freedom
and if we find that chi-squared is less (or equal) than 3.940 then
there is a 95% chance that the Null Hypothesis is
correct and you likely have a good fit. If the chi-squared was less
than 2.558 than it is a very good fit (99% chance
that the Null Hypothesis is correct).
On the other hand... if we had a value of >18.31 then we can be 95%
confident that the Null Hypothesis is WRONG and we have an incorrect
expectation function....
To learn more about the Chi-squared test look here.