
Re: [IPTA-cs] Bayesian upper limits



Dear Ken / All,

Thanks a lot for your messages.

@Ken, The proposal in your last email ("find the point on the Gmu axis where the likelihood has dropped to 5% of its value in the Gmu ---> 0 limit") sounds like you'd like us to use a likelihood ratio, or the "relative likelihood" [ https://en.wikipedia.org/wiki/Relative_likelihood ], in order to find the upper bound on Gmu. Let me make a few comments on this proposal:

1.) Using the relative likelihood is of course a viable option for determining an upper bound on Gmu, even though it would introduce some frequentist elements into our otherwise Bayesian analysis. We need to think about whether we actually want to mix elements of Bayesian and frequentist statistics in the same paper.

2.) In order to normalize the likelihood ratio, it is common practice to compare the likelihood at some generic parameter values to the likelihood at the parameter values that maximize it. For us, this means that we should not necessarily compare the likelihood at large values of Gmu with the likelihood in the Gmu ---> 0 limit, but rather with the maximum likelihood along the Gmu axis, which we expect to be located at some nonzero Gmu.

3.) The 95% interval that we are interested in does not follow from a factor of 0.05 in the likelihood ratio, but roughly from a factor of 0.1465. To see this, note that, by Wilks' theorem, (-2) times the natural log of the likelihood ratio asymptotically follows a chi-squared distribution. A likelihood ratio of 0.1465 then translates into a chi-squared difference of -2*ln(0.1465) = 3.841, which is precisely the critical value for a p-value of p = 0.05 in a chi-squared distribution with one degree of freedom.
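
In case it is useful, here is a minimal check of this arithmetic in Python (assuming numpy and scipy are available):

    import numpy as np
    from scipy.stats import chi2

    delta_chi2 = chi2.ppf(0.95, df=1)   # 95% critical value for 1 dof -> 3.841
    ratio = np.exp(-0.5 * delta_chi2)   # corresponding likelihood ratio -> 0.1465
    print(delta_chi2, ratio)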

So, in summary, the upper limit you are proposing could be determined by the following two-step algorithm (a rough Python sketch follows below):

i) Identify the "mode" of the Gmu histogram, i.e., the Gmu value that yields the maximum likelihood.

ii) Find the Gmu value in the histogram (to the right of the peak) where the likelihood has dropped to a fraction 0.1465 of its maximum value.
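
A minimal sketch of these two steps, assuming we have the (marginalized) likelihood evaluated on a grid of Gmu values; the variable names are hypothetical:

    import numpy as np

    def relative_likelihood_upper_limit(gmu_grid, like_vals, ratio=0.1465):
        # Step i): identify the mode, i.e., the grid point of maximum likelihood
        i_max = np.argmax(like_vals)
        # Step ii): scan to the right of the peak for the first grid point
        # where the relative likelihood drops below the threshold
        rel = like_vals[i_max:] / like_vals[i_max]
        below = np.where(rel < ratio)[0]
        if below.size == 0:
            raise ValueError("likelihood never drops below the threshold")
        k = below[0]
        # linear interpolation between the two bracketing grid points
        x0, x1 = gmu_grid[i_max + k - 1], gmu_grid[i_max + k]
        y0, y1 = rel[k - 1], rel[k]
        return x0 + (ratio - y0) * (x1 - x0) / (y1 - y0)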

The advantage of this upper limit is that it corresponds to a local statement: at the position of the upper bound, the likelihood has dropped to a fraction 0.1465 of its maximum, i.e., the relative likelihood equals 0.1465. For the purposes of this statement, it is not necessary to know anything about the values of the likelihood at intermediate Gmu values. This needs to be contrasted with the Bayesian credible interval, which follows from integrating the posterior probability density until you reach a total probability of 0.95 and which hence corresponds to a global statement.

At the same time, we need to keep in mind that the upper bound constructed in this way is a frequentist confidence interval, which comes with its own potential for misconceptions. Recall that the 95% frequentist confidence interval is not a statement about which Gmu values are more or less probable; that would be the job of the Bayesian credible interval. The statement behind the 95% frequentist confidence interval is the following: Suppose we have 100 observers in 100 copies of the Universe, and in all 100 copies the true value of Gmu is fixed to one and the same number, say, Gmu = 5 * 10^-12. Now let all 100 observers take data and construct their own confidence intervals in the same way, following the same rules of frequentist statistics. Then, if all 100 observers have done their homework correctly, 95 of them will have constructed a confidence interval that includes the true value of Gmu, and 5 of them will have constructed one that does not, even though they did everything correctly. Again, we have to ask ourselves whether this is exactly the kind of statement we want to make.

Long story short: At the moment, I can imagine some 2^N possible ways of constructing an upper limit on Gmu, where N is the number of binary choices we have to make:

- Binary choice #1: Prior on A_GWB: uniform or log-uniform

- Binary choice #2: Prior on Gmu: uniform or log-uniform

- Binary choice #3: Bayesian credible interval based on the highest posterior density (integrate the regions of largest posterior density until the integral returns 0.95; a short sketch of this construction follows below) or frequentist confidence interval based on a likelihood ratio of 0.1465
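
For concreteness, a minimal sketch of the HPD construction from a binned 1D posterior; the variable names are hypothetical:

    import numpy as np

    def hpd_mask(post_density, bin_widths, cred=0.95):
        # Greedily accumulate the bins with the largest posterior density
        # until the enclosed probability reaches `cred`; returns a boolean
        # mask marking the bins inside the HPD region.
        prob = post_density * bin_widths
        prob = prob / prob.sum()                # normalize to unit total mass
        order = np.argsort(post_density)[::-1]  # densest bins first
        mask = np.zeros(prob.size, dtype=bool)
        total = 0.0
        for i in order:
            mask[i] = True
            total += prob[i]
            if total >= cred:
                break
        return mask

    # For a unimodal posterior, the upper limit on Gmu would then be
    # gmu_bin_centers[hpd_mask(post_density, bin_widths)].max()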

@Andrea, @Tobias, @all, how about we try to compute these 2^3 = 8 numbers for one of our stable cosmic-string models, so that we can check how much they actually differ from each other? Just pick, e.g., the `stable_c + smbhb` model and look at the eight different possibilities listed above (a bookkeeping sketch follows below).
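
Organizing the eight combinations could be as simple as the following sketch; `compute_upper_limit` is a hypothetical helper that would wrap our existing analysis chain:

    from itertools import product

    for prior_a, prior_gmu, method in product(
            ("uniform", "log-uniform"),         # binary choice #1: prior on A_GWB
            ("uniform", "log-uniform"),         # binary choice #2: prior on Gmu
            ("hpd-0.95", "rel-like-0.1465")):   # binary choice #3: interval type
        # hypothetical helper wrapping our analysis of the stable_c + smbhb model
        limit = compute_upper_limit("stable_c + smbhb", prior_a, prior_gmu, method)
        print(prior_a, prior_gmu, method, limit)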

On top of this, I could also imagine a fourth binary choice:

- Binary choice #4: Work only in terms of the 1D posterior for Gmu after marginalizing over A_GWB, or work in terms of the 2D posterior for Gmu and A_GWB. In the latter case, one would then have to search for a likelihood ratio of ~ 0.0500 in the 2D parameter space spanned by Gmu and A_GWB. This value corresponds to a chi-squared difference of -2*ln(0.0500) = 5.991, which is the critical value for a p-value of p = 0.05 in a chi-squared distribution with two degrees of freedom.
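
The same arithmetic check as above, now for two degrees of freedom (again assuming scipy):

    import numpy as np
    from scipy.stats import chi2

    delta_chi2 = chi2.ppf(0.95, df=2)   # 95% critical value for 2 dof -> 5.991
    ratio = np.exp(-0.5 * delta_chi2)   # corresponding likelihood ratio -> 0.0500

Incidentally, for two degrees of freedom the ratio comes out as exactly 0.05, since the chi-squared distribution with two degrees of freedom is an exponential distribution with CDF 1 - exp(-x/2).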

Furthermore, instead of marginalizing over A_GWB, one might also construct a profile likelihood for Gmu, and so on and so forth. But this is perhaps leading a bit too far.
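
Just to illustrate the distinction between marginalizing and profiling: a minimal sketch, assuming the 2D likelihood has been evaluated on a grid (all variable names hypothetical):

    import numpy as np

    # like_2d : 2D likelihood on a grid, axis 0 = A_GWB, axis 1 = Gmu
    da = a_gwb_grid[1] - a_gwb_grid[0]     # assumes a uniform A_GWB grid
    marg_like = like_2d.sum(axis=0) * da   # marginal: integrate over A_GWB
    prof_like = like_2d.max(axis=0)        # profile: maximize over A_GWB

So, in total, I can imagine 8, or at most 16, different ways of constructing an upper bound on Gmu: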

01) Uniform prior on A_GWB, uniform prior on Gmu, Bayesian credible interval based on the highest posterior density, 1D distribution of Gmu values

02) Log-uniform prior on A_GWB, uniform prior on Gmu, Bayesian credible interval based on the highest posterior density, 1D distribution of Gmu values

03) Uniform prior on A_GWB, log-uniform prior on Gmu, Bayesian credible interval based on the highest posterior density, 1D distribution of Gmu values

04) Log-uniform prior on A_GWB, log-uniform prior on Gmu, Bayesian credible interval based on the highest posterior density, 1D distribution of Gmu values

05) Uniform prior on A_GWB, uniform prior on Gmu, frequentist confidence interval based on the relative likelihood (ratio 0.1465), 1D distribution of Gmu values

06) Log-uniform prior on A_GWB, uniform prior on Gmu, frequentist confidence interval based on the relative likelihood (ratio 0.1465), 1D distribution of Gmu values

07) Uniform prior on A_GWB, log-uniform prior on Gmu, frequentist confidence interval based on the relative likelihood (ratio 0.1465), 1D distribution of Gmu values

08) Log-uniform prior on A_GWB, log-uniform prior on Gmu, frequentist confidence interval based on the relative likelihood (ratio 0.1465), 1D distribution of Gmu values

09) Uniform prior on A_GWB, uniform prior on Gmu, Bayesian credible interval based on the highest posterior density, 2D distribution of Gmu and A_GWB values

10) Log-uniform prior on A_GWB, uniform prior on Gmu, Bayesian credible interval based on the highest posterior density, 2D distribution of Gmu and A_GWB values

11) Uniform prior on A_GWB, log-uniform prior on Gmu, Bayesian credible interval based on the highest posterior density, 2D distribution of Gmu and A_GWB values

12) Log-uniform prior on A_GWB, log-uniform prior on Gmu, Bayesian credible interval based on the highest posterior density, 2D distribution of Gmu and A_GWB values

13) Uniform prior on A_GWB, uniform prior on Gmu, frequentist confidence interval based on the relative likelihood (ratio 0.0500), 2D distribution of Gmu and A_GWB values

14) Log-uniform prior on A_GWB, uniform prior on Gmu, frequentist confidence interval based on the relative likelihood (ratio 0.0500), 2D distribution of Gmu and A_GWB values

15) Uniform prior on A_GWB, log-uniform prior on Gmu, frequentist confidence interval based on the relative likelihood (ratio 0.0500), 2D distribution of Gmu and A_GWB values

16) Log-uniform prior on A_GWB, log-uniform prior on Gmu, frequentist confidence interval based on the relative likelihood (ratio 0.0500), 2D distribution of Gmu and A_GWB values

In view of these options, my questions for you are:

- How many of these 16 different upper bounds should we calculate explicitly?
- Do you have a favorite among these 16 possibilities?
- Is there yet another option that is not contained in the list above?

Let me know what you think!

Best regards, Kai.

On 9/9/22 17:25, Ken Olum via IPTA-cosmic-strings wrote:
Much of what I said in my last message is ameliorated by the use of
uninformative priors.  If we give something proportional to P(D|M) P(M),
but P(M) is a constant, then it is also proportional to P(D|M).  So the
same information is readily accessible.

This lets us know what kind of prior to use.  If M is discrete, we
should use a prior with equal weight on all possibilities.  If M is
continuous, then P(D|M) is a probability density function over M, and we
should use a constant function, i.e., a uniform prior, in the parameters
of M.  If the parameter is, say, log10_A_GWB, we should use a uniform
prior in log10_A_GWB, which is a log-uniform prior in A_GWB.  (If the
parameter is A_GWB, we should use a uniform prior in that.  But if we
are then going to give a graph whose horizontal axis is log A_GWB, we
should rescale the density function so that it corresponds to the axis,
producing the same result as having used a logarithmic parameter.)

So nothing I said is very important for presenting posteriors, except to
say that you should always use an uninformative prior even if you have
more information.  But there is still the question of upper limits.
Suppose we give P(D|logGmu) or equivalently P(D|logGmu) P(logGmu).  I
continue to think that the interesting number is the point at which this
quantity drops to 5% of its value at Gmu=0, not the value that encloses
95% of the probability.  The latter depends on the lower cutoff on
logGmu.  The former does not.

                                        Ken