What is the reasoning behind the "anthropic shadow" effect?

Suppose that every million years on the dot, some catastrophic event either happens or does not happen with probability P or (1-P) respectively. Suppose that if the event happens at one of these times, it destroys all life, permanently, with probability Q. Suppose that Q is known, but P is not, and we initially adopt a prior for it which is uniform between 0 and 1.

Given a perfect historical record of when the event has or has not occurred, we could update our prior for P based on this evidence to obtain a posterior for P whose mode is (# of times event has occurred) / (# of times event could have occurred), and which becomes sharply peaked there as the record grows. I will refer to this as the “naive estimate”.
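As a minimal illustration of the naive estimate, the posterior can be evaluated on a grid. This sketch assumes the uniform prior above and a binomial likelihood. Under those assumptions the posterior is a Beta(k + 1, n - k + 1) density, whose mode is exactly k/n. The function name `naive_posterior` and the counts used below are hypothetical.

```python
import numpy as np

def naive_posterior(k, n, grid_size=10001):
    """Hypothetical helper: posterior density for P on a grid, assuming a
    uniform prior on [0, 1] and k occurrences out of n opportunities."""
    p = np.linspace(0.0, 1.0, grid_size)
    dp = p[1] - p[0]
    # Uniform prior times binomial likelihood. The binomial coefficient is
    # omitted because it does not depend on p and cancels on normalisation.
    unnorm = p**k * (1.0 - p) ** (n - k)
    density = unnorm / (unnorm.sum() * dp)
    return p, density

# Illustrative record: 12 events in 500 opportunities.
p, density = naive_posterior(k=12, n=500)
print(p[np.argmax(density)])  # close to k/n = 0.024
```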

In this paper, the naive estimate is argued to be wrong because of an effect called “anthropic shadow”. In particular, it is supposed to be an underestimate. My understanding of the argument is the following: if you pick a fixed value of P and simulate history a large number of times, then in the cases where an observer like us evolves, the observer’s calculation of (# of times event has occurred) / (# of times event could have occurred) will on average be significantly below the true value of P. This is because observers are more likely to evolve after periods of unusually low catastrophic activity. In making this argument, the authors take a frequentist approach to estimating P (P is treated as a fixed unknown parameter rather than a random variable with some prior distribution), but my understanding is that a fully Bayesian approach is also supposed to differ from the naive estimate of the previous paragraph.
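The fixed-P simulation described above can be sketched as follows. This assumes, for illustration only, that an observer exists if and only if none of the events destroyed life, so that the observer sees the full record of `n_slots` opportunities. The function names and parameter values are hypothetical. Under this assumption, conditioning Binomial(n, P) on survival (probability (1 - Q)^N) gives another binomial with success probability P(1 - Q)/(1 - PQ), so the survivors' average naive estimate is pulled below P.

```python
import numpy as np

def survivor_naive_estimates(p_true, q, n_slots, n_runs, seed=0):
    """Simulate n_runs histories at fixed P and return the naive estimate
    N / n_slots from every history in which life survived."""
    rng = np.random.default_rng(seed)
    # Number of catastrophic events in each simulated history.
    events = rng.binomial(n_slots, p_true, size=n_runs)
    # Each event independently destroys life with probability q, so a history
    # with N events survives with probability (1 - q)**N.
    survived = rng.random(n_runs) < (1.0 - q) ** events
    return events[survived] / n_slots

def survivor_mean(p_true, q):
    # Closed form: the survival-weighted Binomial(n, p) is
    # Binomial(n, p(1 - q) / (1 - p q)), whose mean fraction is that ratio.
    return p_true * (1.0 - q) / (1.0 - p_true * q)

est = survivor_naive_estimates(p_true=0.3, q=0.5, n_slots=20, n_runs=200_000)
print(est.mean(), survivor_mean(0.3, 0.5))  # both well below the true P = 0.3
```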

But consider an analogous non-anthropic scenario. Suppose we flip a biased coin a hundred times, which lands heads with probability P (unknown). Whenever this coin lands heads, we immediately flip a second biased coin which lands heads with probability Q (known). If we ever get two heads, one from each coin, we paint a blue state marker red, and it remains red from then on. After the hundred tosses of Coin #1, we find that the state marker is blue, and Coin #1 has landed heads N times. How should we estimate P?

In this scenario, it is true that if you run a large number of simulations at fixed P, and look at the naive estimate (N/100) from cases which end blue, they will on average be below the true value of P, for the same reasons as the previous scenario. Nevertheless, in this scenario, I think the naive estimate is still correct. If N is already given, then the colour of the state marker should give you no additional evidence for the value of P, because the colour only depends on P through N. What the simulation argument misses by working within the blue state outcomes at fixed P is that you are more likely to finish in a blue state when P is smaller.
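The Bayesian side of this argument can be checked numerically. Given N, the marker stays blue only if all N flips of Coin #2 land tails, which has probability (1 - Q)^N and does not involve P; the factor therefore cancels on normalisation. Summed over N, however, the chance of ending blue is (1 - PQ)^100, which does fall as P grows. The function names, grid resolution, and example counts below are hypothetical; this is a sketch under the uniform-prior assumption, not a general method.

```python
import numpy as np

def posterior_given_record(n_heads, n_flips, q, condition_on_blue, grid_size=2001):
    """Grid posterior for P (uniform prior) given N heads of Coin #1,
    optionally also conditioning on the marker ending blue."""
    p = np.linspace(0.0, 1.0, grid_size)
    dp = p[1] - p[0]
    # Likelihood of N heads in n_flips tosses of Coin #1 (coefficient dropped).
    unnorm = p**n_heads * (1.0 - p) ** (n_flips - n_heads)
    if condition_on_blue:
        # Given N, blue requires every one of the N Coin #2 flips to be tails.
        unnorm = unnorm * (1.0 - q) ** n_heads
    return p, unnorm / (unnorm.sum() * dp)

def prob_blue(p, q, n_flips):
    # Marginal chance of ending blue, not conditioning on N: (1 - p q)**n_flips.
    return (1.0 - p * q) ** n_flips

p, naive = posterior_given_record(30, 100, q=0.5, condition_on_blue=False)
_, with_blue = posterior_given_record(30, 100, q=0.5, condition_on_blue=True)
print(np.max(np.abs(naive - with_blue)))  # essentially zero
print(prob_blue(0.2, 0.5, 100) > prob_blue(0.4, 0.5, 100))  # True
```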

So the first part of my question is: What is the difference between the existence/non-existence distinction, and the red/blue distinction, which makes anthropic shadow happen in the former case but not the latter?

And the second part is: How can the anthropic shadow argument be phrased in a fully Bayesian way? How should I obtain a posterior for P given some prior, the historical record, and the fact of my existence?
