
Re: [IPTA-cs] Bayesian upper limits



Hi, all.  It seems to me there is a more general problem about the
presentation of Bayesian results.  Maybe this should be discussed in
some different forum, but let me at least start here.

I think the real question is what the reader of a Bayesian analysis
wants to learn.  First consider the case where there are no nuisance
parameters.  There are some model parameters M, and we have made an
observation yielding some data D.  We are going to present it in a
paper.

I say the reader wants to know (something proportional to) the
likelihood function P(D|M).  This will tell them what our observations
have to say about the model parameters M.  But this isn't what we do.
Instead we pick some prior P(M) and tell the reader the posterior 
P(M|D) propto P(D|M) P(M).  I don't know why we do that.  Why should the
reader care about the prior we decided to use?  They may have
their own ideas about what model parameters are likely.
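To make that concrete, here is a toy sketch in Python (the Gaussian
likelihood and both priors are invented, purely for illustration): two
readers who agree on the likelihood but bring different priors end up
with different posteriors.

    import numpy as np

    # One model parameter m; pretend the data give m = 1.0 +/- 0.5.
    m = np.linspace(0.0, 3.0, 301)
    likelihood = np.exp(-0.5 * ((m - 1.0) / 0.5) ** 2)  # P(D|M), up to a constant

    # Two readers with different (made-up) priors on m.
    prior_flat = np.ones_like(m)                  # uniform in m
    prior_log = 1.0 / np.clip(m, 1e-3, None)      # uniform in log m

    post_flat = likelihood * prior_flat
    post_log = likelihood * prior_log
    post_flat /= np.trapz(post_flat, m)           # normalized P(M|D), reader 1
    post_log /= np.trapz(post_log, m)             # normalized P(M|D), reader 2

    # Same data, same likelihood, two different posteriors; only the
    # likelihood is the common property of every reader.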

Now consider the more realistic case where there are some nuisance
parameters N.  (The distinction between interesting and nuisance is in
the eye of the beholder.  If you want to measure the GWB amplitude, then
A_GWB is an interesting parameter and intrinsic pulsar noises are
nuisances.  But if you want to measure the cosmic string energy scale, then
A_GWB is a nuisance parameter.)  The reader certainly does not want to
know the full likelihood P(D|M,N), because they don't care about N.
We usually present the posterior
P(M|D) propto integral dN P(D|M,N) P(N) P(M).
But wouldn't it make more sense to present integral dN P(D|M,N) P(N)
instead?  This uses the prior for the nuisance parameters, and
that's right.  Suppose the reader is interested in A_GWB.  In order to
learn something about that, we have to make some guesses about
intrinsic pulsar noises and integrate over them.  The reader does not
have any opinion about these guesses and just hopes that we will do our
best to say what our observations mean given this uncertainty.  But it
does not use the prior on A_GWB, about which the reader is entitled
to their own opinion.
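Concretely, the quantity I am proposing we report is the partially
marginalized likelihood L(M) = integral dN P(D|M,N) P(N).  A rough
numerical sketch in Python, with an invented Gaussian likelihood and
nuisance prior standing in for the real ones:

    import numpy as np

    # Grids over the interesting parameter M (say A_GWB) and a single
    # nuisance parameter N (say an intrinsic pulsar noise amplitude).
    M = np.linspace(0.0, 2.0, 201)
    N = np.linspace(0.0, 2.0, 201)
    MM, NN = np.meshgrid(M, N, indexing='ij')

    # Invented joint likelihood P(D|M,N): pretend the data prefer M + N ~ 1.
    like = np.exp(-0.5 * ((MM + NN - 1.0) / 0.3) ** 2)

    # Our prior on the nuisance parameter (also invented), normalized.
    prior_N = np.exp(-0.5 * ((N - 0.5) / 0.2) ** 2)
    prior_N /= np.trapz(prior_N, N)

    # L(M) = integral dN P(D|M,N) P(N).  No prior on M is applied;
    # that choice is left to the reader.
    L_M = np.trapz(like * prior_N[np.newaxis, :], N, axis=1)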

The same idea informs my feeling about upper limits.  Suppose we are
considering GWB plus strings.  Then M is just Gmu.  We should give the
reader P(D|Gmu).  What does this function look like?  For very small
Gmu, it is just the likelihood for the model without strings, so it is a
constant as Gmu->0.  For larger Gmu, it may rise if the string spectrum
fits the data, but it will definitely fall when Gmu is too large.
Probably it will fall monotonically.  An interesting Gmu is the one
where P(D|Gmu) = 0.05 P(D|0).  That means that strings at this scale
make our actual observations 20 times less likely than no strings.  I
would say our observations rule out that Gmu or larger at the 95% level.
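To show the arithmetic of that prescription, another toy sketch (the
falling shape of P(D|Gmu) here is invented; in practice it would come
from the marginalized likelihood above):

    import numpy as np

    # Invented stand-in for P(D|Gmu): flat as Gmu -> 0, falling once Gmu
    # is large enough to conflict with the data.
    Gmu = np.logspace(-12, -8, 400)
    L = np.exp(-0.5 * (Gmu / 3e-10) ** 2)

    L0 = L[0]   # effectively P(D|0), since L is flat at the small-Gmu end

    # 95% limit: smallest Gmu where P(D|Gmu) < 0.05 * P(D|0), i.e. where
    # strings make the observed data 20 times less likely than no strings.
    limit = Gmu[np.argmax(L < 0.05 * L0)]
    print("rule out Gmu >= %.2e at the 95%% level" % limit)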

What do you think?  Am I missing something?

                                        Ken