Updating on the passage of time and conditional prediction curves
What this is: A technical post about how I believe binary forecasts should be made. There are probably some minor mathematical mistakes, but I doubt there are major mistakes.
Introduction
Probabilistic forecasts can rationally be updated without anything happening except the passage of time. This appears to be a violation of conservation of expected evidence but it isn’t so, as the passage of time is often evidence in itself.
To analyze how the passage of time affects binary forecasts I use conditional prediction curves $p_s(t)$: your forecast at time $t$ given the information available at time $s \leq t$. It's often possible to construct well-motivated conditional prediction curves automatically, and I provide some details in the context of the "Will x happen by time τ?" type of question.
The benefits of thinking with conditional prediction curves are large, both from the point of view of the individual forecaster, the forecast aggregator, and the scorer.
The forecaster has no need to repeatedly update his predictions when no new information has come in,
The aggregator can aggregate the probabilistic forecasts at any point in time,
The scorer can score a continuous stream of predictions instead of a dislocated bunch of them.
Some downsides of using prediction curves include:
Different kinds of questions require different models. Questions of the sort “Will the event x happen before the event y?” shouldn’t be handled in the same way as the “Will x happen by time τ?” kind of question.
Making informal forecasts using conditional prediction curves is harder than making point forecasts. Tutorials and non-technical explanations would probably be required.
It would take some effort to create technical solutions for them.
But prediction curves help us in understanding forecasting too.
Any rational forecaster’s conditional prediction curve will be decreasing if the question looks like “Will x happen by time τ?”
But the conditional prediction curve will be constant for questions like “Will the event x happen on time τ?”
More complicated kinds of questions won't be as regular. Questions of the form "Will the event x happen before the event y?" can have arbitrary conditional prediction curves.
Prediction curves
Define the variables

X: the binary outcome variable, success if equal to 1,
T: the event time, i.e., the time when the outcome X becomes known.

You can think about X and T using Metaculus questions. For instance, if the question is "Will a non-state actor develop their own nuclear weapon by 2030?", X would be 1 if a non-state actor develops a nuke by 2030 and 0 otherwise. The random variable T equals 2030 if X=0 and the point in time the nuke is developed if X=1.
A prediction curve for X, T is a random function $p(t)\in[0,1]$ that forecasts the outcome X at every time point t. If t>T, I'll assume that p(t)=0 if X=0 and 1 otherwise, so you can't make any stupid predictions after the event time to screw yourself over.
We can score prediction curves using the integrated scoring rule $s'(p,X)=\int_{t_0}^{\infty} s(p(t),X)\,dt$, where s is any proper scoring rule (with 0 being best) and $t_0$ is the starting time. Then s′ is a proper scoring rule for prediction curves (the exact formulation of what this means is a little technical; see the appendix for a proof), meaning it will always be beneficial to supply the prediction curve you believe in the most. The scoring rule might be strictly proper too, with the proper definitions and assumptions, but I haven't investigated that yet. One reason to use an integrated scoring rule is to incentivize honest reporting even when the event time T is far away.
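To make the integrated Brier score concrete, here is a minimal Python sketch (my own illustration, not part of the post); the prediction curve, outcome, and integration grid in the example are made up, and the integral is truncated at the resolution time since the score is zero afterwards.

import numpy as np

def integrated_brier_score(prediction_curve, outcome, t0, t_end, n_grid=1000):
    # Approximates s'(p, X) = integral from t0 to t_end of (p(t) - X)^2 dt
    # with the trapezoid rule. prediction_curve maps a time to a probability.
    grid = np.linspace(t0, t_end, n_grid)
    scores = np.array([(prediction_curve(t) - outcome) ** 2 for t in grid])
    return float(np.sum((scores[1:] + scores[:-1]) / 2 * np.diff(grid)))

# Made-up prediction curve drifting from 0.8 toward 1, question resolving negatively at t = 1.
curve = lambda t: 0.8 ** (1.0 - t)
print(integrated_brier_score(curve, outcome=0, t0=0.0, t_end=1.0))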
Prediction curves are not used by forecasting sites such as Metaculus. That might be because supplying prediction curves is too much to ask of their audience. But it is possible to construct reasonable prediction curves without too much additional work.
The rational forecaster
Define the information set

$F_t$: the information about X, T available at time t.

A rational forecaster with information set $F_t$ is one who makes probabilistic forecasts at each time point t using his best available evidence $F_t$. Define the rational prediction curve as the stochastic process $p(t)=P(X=1\mid F_t)$. Here p(t) is random since $F_t$ is random. When s is a proper scoring rule, p(t) is the optimal prediction given the information set $F_t$ according to s′, in the sense that $$p(t)=\operatorname{argmin}_{f\in\mathcal{F}}\int_{t_0}^{\infty}E[s(f(t),X)\mid F_t]\,dt,$$ where $\mathcal{F}$ is a suitable class of random functions.
The conditional prediction curve
Define the conditional prediction curve as the expected forecast at time t given the information at time s, but conditioned on the event not yet having happened: $$p_s(t)=E(p(s)\mid T>t)=E(E[X\mid F_s]\mid T>t)=P(X=1\mid F_s, T>t).$$ Then $p_s(t)$ is the best possible prediction curve based on $F_s$ in the sense that $$p_s(t)=\operatorname{argmin}_f\int_s^{\infty}E[s(f(t),X)\mid F_s, T>t]\,dt.$$ You can interpret $p_s(t)$ as
the rational prediction at time t of a forecaster who missed all the information from time s to time t.
the rational prediction at time t of a lazy forecaster who did not bother to look for any new information after time s.
the actually rational prediction at time t when information arrives in bursts, not continuously, and the last bit of information became available at time s.
In practice we need the conditional prediction curve because no one is able to update continuously. Call it bounded rationality if you want. The idea is to have each forecaster update their prediction curve whenever they make a forecast, yielding the final prediction curve. If a forecaster provides conditional prediction curves at the times $t_0=s_0<s_1<\cdots<s_k\leq T$, the final prediction curve is $$p(t)=\begin{cases} p_{s_i}(t) & \text{when } s_i<t<s_{i+1},\\ X & \text{when } t>T.\end{cases}$$
The prediction curve below contains two updates, one at $s_1=0.6$ and one at $s_2=0.8$. The curves in between updates are conditional prediction curves: the black curve is the conditional prediction curve $p_0(t)$, the red is $p_{0.6}(t)$, the blue is $p_{0.8}(t)$. Together they form your final prediction curve. The event time is T=1, but that is random and unknown to the forecaster. The conditional forecasts are made using the constant hazard model, discussed in a later section.
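Here is a minimal Python sketch of this piecewise construction (my own illustration). The update times 0.6 and 0.8 and the event time T = 1 follow the figure; the forecast values made at the updates and the resolution X = 1 are assumed, and the conditional curves use the constant hazard model from the later section.

import numpy as np

TAU = 1.0  # resolution deadline

def constant_hazard_curve(t, s, p_at_s, tau=TAU):
    # Conditional prediction curve p_s(t) = p(s)^((tau - t)/(tau - s)),
    # derived in the "Constant hazard" section below.
    return p_at_s ** ((tau - t) / (tau - s))

def final_prediction_curve(t, update_times, forecasts, event_time, outcome):
    # Piecewise curve: the latest conditional curve at or before t,
    # and the resolved outcome after the event time.
    # Assumes t is not earlier than the first update time.
    if t > event_time:
        return float(outcome)
    i = max(j for j, u in enumerate(update_times) if u <= t)
    return constant_hazard_curve(t, update_times[i], forecasts[i])

# Assumed forecasts 0.5, 0.4, 0.7 made at times 0, 0.6, 0.8; event at T = 1 with X = 1.
updates, forecasts = [0.0, 0.6, 0.8], [0.5, 0.4, 0.7]
for t in np.linspace(0.0, 1.2, 7):
    print(round(t, 2), round(final_prediction_curve(t, updates, forecasts, 1.0, 1), 3))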
Three common question categories
Questions of the sort described above are probably too general to work with, but most can be placed into one of three categories.
Type 1: “Will the event x fail to happen by time τ?”
Most questions on Metaculus can be written in the form "Will x happen by time τ?". Examples include "Will a coup or regime change take place in Russia in 2022 or 2023?" and "Will Putin and Zelenskyy meet to discuss the peaceful resolution of the Russian-Ukrainian conflict before 2023?". We will look at questions formulated as "Will the event x fail to happen by time τ?", as this makes the mathematics slightly cleaner.
For these questions we don't need to model the probability of X at all, yielding the prediction curve $p(t)=P(X=1\mid F_t)=P(T\geq\tau\mid F_t)$ and the conditional prediction curve $p_s(t)=P(T\geq\tau\mid F_s, T>t)$.
Proposition 1
The conditional prediction curve $p_s(t)$ is non-decreasing in t for every s. Moreover, $p_s(t)$ is strictly increasing in t under the very minor condition that the hazard rate of T is strictly positive. (This means that there's a possibility of the question resolving at every time point t.)
Thus a rational forecaster will always expect the probability of a positive resolution to increase over time. Expecting the probability to decrease is irrational.
Aside from being non-decreasing and starting at $p(s)=p_s(s)$, there are no restrictions on the conditional prediction curve. There are plenty of examples of conditional prediction curves for this kind of question in the next section.
Type 2: “Will the event x happen on time τ?”
Some questions on Metaculus can be written in this form. Examples include "Will Ontario's Conservative Party (PC) win a majority in the election on 2022-06-02?" and "Will Volodymyr Zelenskyy be named Time Person of the Year in 2022?". In these questions the resolution date is fixed at τ, so time has no influence except through the information source $F_t$. Thus the conditional prediction curve is constant in t, and we're in the intuitive setting where we cannot expect our prediction to change in the future.
Proposition 2
For questions of type "Will the event x happen on time τ?", the conditional prediction curve is constant: $p_s(t)=P(X=1\mid F_s, T>t)=p(s)$.
Type 3: “Will the event x happen before the event y?”
Questions of this nature are uncommon on Metaculus. The only example I found in my search was "Alexei Navalny to become president or prime minister of Russia in his lifetime?" This question resolves positively if Navalny becoming PM/president (x) happens before Navalny dying (y). Models for problems of this nature are known as competing risk models.
To model it, define the two times S and R together with $X=1[S<R]$ and $T=\min(S,R)$. Then $$p_s(t)=P(X=1\mid F_s, T>t)=P(S<R\mid F_s, S>t, R>t).$$ There is no general regularity in $p_s(t)$ unless we know something special about the hazard rates of S and R.
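As an illustration (my own sketch, not from the post), the conditional prediction curve for a competing-risks question can be computed numerically from the two hazard rates, assuming S and R are independent and using the identity $p_s(t)=\int_t^\infty S_R(u)\,h_S(u)\,S_S(u)\,du\,/\,(S_S(t)S_R(t))$ that also appears in the proof of Proposition 3. The example hazards are made up.

import numpy as np
from scipy.integrate import quad

def survival(hazard, t):
    # S(t) = exp(-integral_0^t h(x) dx), computed numerically.
    return np.exp(-quad(hazard, 0.0, t)[0])

def competing_risk_curve(t, hazard_S, hazard_R):
    # p_s(t) = P(S < R | S > t, R > t) for independent S and R.
    integrand = lambda u: survival(hazard_R, u) * hazard_S(u) * survival(hazard_S, u)
    numerator = quad(integrand, t, np.inf)[0]
    return numerator / (survival(hazard_S, t) * survival(hazard_R, t))

# Made-up hazards: the x-event hazard decreases over time, the y-event hazard increases.
h_S = lambda u: 0.5 * np.exp(-u)
h_R = lambda u: 0.1 + 0.05 * u
for t in [0.0, 1.0, 2.0, 5.0]:
    print(t, round(competing_risk_curve(t, h_S, h_R), 3))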
Proposition 3
Let f(t) be any differentiable function taking values in (0,1). Then there is a model for S, R and an information set $F_s$ so that $p_s(t)=f(t)$.
On one hand, the additional complexity suggests that questions of the "Will the event x happen before the event y?" form should be avoided. On the other hand, they are quite easy to model, provided you're willing to supply the prediction curve or hazard rate for both variables S and R. This can be done using the techniques in the next section.
Parametric conditional prediction curves in “Will the event x fail to happen by time τ?” types of questions
Suppose we know the conditional hazard function at time s and denote it $h_s(t)=f(t\mid T>t, F_s)$. Then we can write the conditional prediction curve as $$p_s(t)=\exp\left[-\int_t^\tau h_s(x)\,dx\right];$$ see the appendix for the proof. This formulation of the conditional prediction curve is helpful as it's relatively easy to interpret hazard rates. Much of the literature on survival analysis / time-to-event data is formulated in terms of hazard rates too. If you're willing to assume a parametric form for the hazard rate you can construct conditional prediction curves (semi-)automatically. We'll take a closer look at three examples: constant hazards, Weibull hazards, and Gompertz—Makeham hazards.
Constant hazard
Suppose we may assume the hazard rate is constant, i.e., $h_s(t)=\lambda_s$ with $\lambda_s$ unknown. Using the equation $p_s(t)=\exp[-\int_t^\tau h_s(x)\,dx]$ we see that $p_s(t)=e^{-\lambda_s(\tau-t)}$. If you know the point forecast $p(s)=p_s(s)$, we may use it to derive $\lambda_s$. Solving for $\lambda_s$, we find that $\lambda_s=-\frac{\log p(s)}{\tau-s}$, so the implied conditional prediction curve is $$p_s(t)=e^{\log p(s)\,\frac{\tau-t}{\tau-s}}=p(s)^{\frac{\tau-t}{\tau-s}}.$$
Example
Suppose the current date is in the middle of 2022 and we consider the question "Will Putin and Zelenskyy not meet to discuss the peaceful resolution of the Russian-Ukrainian conflict before 2023?". Then we can put $\tau=1$ and $s=0.5$. In the plot below we show $p(s)=0.2, 0.6, 0.8$, where 0.8 was the Metaculus prediction at the time. When p(s) is reasonably large, the conditional prediction curve is almost linear, making $p(s)+(1-p(s))\frac{t-s}{\tau-s}$ a reasonable approximation to $p_s(t)$.
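A minimal Python sketch of the constant-hazard curve and the near-linear approximation (my own illustration, using the numbers from this example: τ = 1, s = 0.5, and the Metaculus point forecast p(s) = 0.8):

def constant_hazard_prediction(t, s, p_s, tau):
    # p_s(t) = p(s)^((tau - t)/(tau - s)), implied by the constant hazard
    # lambda_s = -log(p(s))/(tau - s).
    return p_s ** ((tau - t) / (tau - s))

def linear_approximation(t, s, p_s, tau):
    # The approximation p(s) + (1 - p(s))(t - s)/(tau - s) discussed above.
    return p_s + (1 - p_s) * (t - s) / (tau - s)

s, tau, p_s = 0.5, 1.0, 0.8
for t in [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]:
    print(t, round(constant_hazard_prediction(t, s, p_s, tau), 3),
          round(linear_approximation(t, s, p_s, tau), 3))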
The benefit of assuming a constant hazard rate lies in its simplicity.
The forecaster doesn't have to put in any extra work in the constant hazard model; everything happens automatically.
From the aggregator's point of view, the constant hazard model allows you to do principled aggregation using only one data point per forecaster, as you can derive the most up-to-date prediction for every forecaster.
The scorer can calculate principled scores using the scoring rule s(p(T),X) straight from the data.
Weibull hazard
The Weibull hazard is usually written in the form $h(t)=bkt^{k-1}$. It is used to model increasing (when $k>1$) or decreasing ($k<1$) hazard rates. The conditional prediction curve is $p_s(t)=\exp[-b(\tau^k-t^k)]$.
To use the Weibull hazard you can provide a point estimate at the current time and then either
visually modify the curve until you’re pleased with the look,
provide another point estimate and deduce the values of b, k mathematically,
provide more than two points and use e.g. least squares to find the best-fitting curve.
Visual modification
Take the logarithm of p(s) and solve for b: $$b=-\frac{\log p(s)}{\tau^k-s^k}.$$ The conditional prediction curve can then be written in terms of k and p(s): $$p_s(t)=p(s)^{\frac{\tau^k-t^k}{\tau^k-s^k}}.$$
Now you can plot the hazard rate and the conditional prediction curve while sliding k around. You can stop at the k you’re most comfortable with.
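A small Python sketch of this parametrization (my own illustration; the interactive slider is omitted, you would simply recompute the curve for different values of k):

import numpy as np

def weibull_prediction(t, s, p_s, tau, k):
    # Weibull conditional prediction curve p_s(t) = p(s)^((tau^k - t^k)/(tau^k - s^k)),
    # parametrized by the shape k and the point forecast p(s) made at time s.
    return p_s ** ((tau ** k - t ** k) / (tau ** k - s ** k))

# Try a few shapes for a forecast p(s) = 0.8 made at s = 0.5 with deadline tau = 1.
t_grid = np.linspace(0.5, 1.0, 6)
for k in [0.5, 1.0, 2.0, 3.0]:
    print(k, np.round(weibull_prediction(t_grid, 0.5, 0.8, 1.0, k), 3))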
Two predictions
Suppose $r>t$ and $1>p_s(r)>p_s(t)>0$. We need to solve $$\frac{\log p_s(t)}{\tau^k-t^k}=\frac{\log p_s(r)}{\tau^k-r^k}.$$ This is equivalent to solving $$\frac{\tau^k-r^k}{\tau^k-t^k}=\frac{\log p_s(r)}{\log p_s(t)}.$$ Since $1>p_s(r)>p_s(t)>0$ we have $1>\frac{\log p_s(r)}{\log p_s(t)}>0$. In addition, $0<\frac{\tau^k-r^k}{\tau^k-t^k}<1$ is increasing in k, as can be verified by taking its derivative, and it tends to 0 as $k\to-\infty$ and to 1 as $k\to\infty$, so the equality has a solution that can be found using root-finding.
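A sketch of the root-finding step (my own illustration; the two forecast values and the search bracket are made up, and the bracket must contain the root while avoiding k = 0, where the left-hand side is 0/0):

import numpy as np
from scipy.optimize import brentq

def solve_weibull_shape(t, r, p_t, p_r, tau, k_lo, k_hi):
    # Find the shape k so that the Weibull curve passes through (t, p_t) and (r, p_r),
    # i.e., solve (tau^k - r^k)/(tau^k - t^k) = log(p_r)/log(p_t) for k.
    target = np.log(p_r) / np.log(p_t)
    g = lambda k: (tau ** k - r ** k) / (tau ** k - t ** k) - target
    return brentq(g, k_lo, k_hi)

# Made-up example: forecasts p_s(0.5) = 0.8 and p_s(0.75) = 0.9 with deadline tau = 1.
k = solve_weibull_shape(0.5, 0.75, 0.8, 0.9, 1.0, k_lo=0.1, k_hi=10.0)
b = -np.log(0.8) / (1.0 ** k - 0.5 ** k)
print("k =", round(k, 3), "b =", round(b, 3))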
Example: “Will India have at least 200 nuclear warheads at the end of 2023?”
This is plausibly a question with an increasing hazard rate. The description says that "As of May 2021, the Federation of American Scientists estimated India as having 160 nuclear warheads." In order to reach 200 warheads, they first have to reach 161, then 162, and so on, making it more likely that they finally reach 200 at any given instant as time goes on.
Due to the way I’ve formulated the mathematics, we have to analyze the opposite question “Will India have less than 200 nuclear warheads at the end of 2023?” instead.
Suppose I make the forecast $p(s)=0.95$ at time $s=0$, equal to the last day of 2022, and suppose that $\tau=1$ corresponds to the last day of 2023. Then $$p(s)^{\frac{\tau^k-t^k}{\tau^k-s^k}}=0.95^{(1-t^k)}.$$ The plot below shows the resulting prediction curves for $k\in\{1/3,1/2,1,2,3\}$. To interpret the red line, observe that the prediction barely changes when t is small enough. This reflects that the probability of India reaching 200 warheads is small in the short term. However, as t approaches 1 and they still haven't reached 200 warheads, the probability of them not reaching that number increases rapidly.
Gompertz—Makeham hazard
The Gompertz—Makeham hazard has the form $h(t)=\alpha e^{\beta t}+\lambda$. From $p_s(t)=\exp[-\int_t^\tau h_s(x)\,dx]$ we find that $$p_s(t)=\exp\left[-\int_t^\tau\left(\alpha e^{\beta x}+\lambda\right)dx\right]=\exp\left[-\frac{\alpha}{\beta}\left(e^{\beta\tau}-e^{\beta t}\right)\right]\exp[-\lambda(\tau-t)].$$ The Gompertz—Makeham hazard has an age-dependent term $\alpha e^{\beta t}$ (the Gompertz term) and an age-independent term $\lambda$ (the Makeham term). We can potentially think of them independently. In some cases there are multiple sources of both age-dependent and age-independent terms, making it a multi-Gompertz—Makeham hazard. If we have k Gompertz components, the k-Gompertz—Makeham hazard is $$h(t)=\sum_{i=1}^k\alpha_i e^{\beta_i t}+\lambda,$$ with conditional prediction curve $$p_s(t)=\exp\left[-\int_t^\tau\left(\sum_{i=1}^k\alpha_i e^{\beta_i x}+\lambda\right)dx\right]=\exp[-\lambda(\tau-t)]\prod_{i=1}^k\exp\left[-\frac{\alpha_i}{\beta_i}\left(e^{\beta_i\tau}-e^{\beta_i t}\right)\right].$$
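A minimal Python sketch of this conditional prediction curve (my own illustration; the parameter values in the example are arbitrary):

import numpy as np

def gompertz_makeham_prediction(t, tau, alphas, betas, lam):
    # Conditional prediction curve for a k-component Gompertz-Makeham hazard
    # h(x) = sum_i alpha_i * exp(beta_i * x) + lambda, integrated from t to tau.
    integral = lam * (tau - t)
    for a, b in zip(alphas, betas):
        integral += (a / b) * (np.exp(b * tau) - np.exp(b * t))
    return np.exp(-integral)

# Arbitrary example with one decreasing and one increasing Gompertz component.
print(gompertz_makeham_prediction(0.0, 1.0, alphas=[0.1, 0.02], betas=[-1.0, 0.2], lam=0.01))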
Example: "Will Putin stay in power until August 11th 2030?"
We can divide the hazard into four parts: mortality, a time-independent hazard for being kicked out of power, a time-independent hazard for a coup, and a time-dependent hazard for a coup.
Mortality. This document estimates a Gamma—Gompertz—Makeham model on US data and finds parameters $\beta\approx 0.1$, $\lambda=0.001$, and $\alpha_{30}=0.00035$ (this means the baseline age is 30, i.e., mortality starts increasing with age only at age 30). This is not the right country nor the right model, but the parameters should be close enough. Since there are some rumors of Putin being sick, I'll modify the constant hazard to $\lambda=0.01$. Since $e^{0.1\cdot 39}\approx 50$, the Gompertz part of the hazard is $0.00035e^{0.1(t+39)}\approx 0.02e^{0.1t}$.
Time-independent hazard for a coup. I haven’t found a good source on this, but it’s probably not too hard to find following the leads in e.g. this paper. I’m guessing a 1% yearly ambient risk of a coup.
Time-dependent hazard for a coup. For instance, one might reasonably think this one will decrease with temporal distance from the start of the Ukraine war. Let's say the Ukraine conflict adds an annual hazard of 5% right now, expected to decrease to 1% in two years' time. Thus $\alpha=0.05$ and $0.05e^{2\beta}=0.01$, which implies $\beta=\log(1/5)/2\approx -0.8$.
We end up with the hazard rate $0.05e^{-0.8t}+0.02e^{0.1t}+0.03$, a sum of two Gompertz components and one Makeham component.
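To see how this compares with the constant hazard model discussed in the next paragraph, here is a sketch (my own, reusing the gompertz_makeham_prediction function from the previous code sketch) that computes the curve over a horizon of roughly eight years together with a constant hazard curve matched to the same starting forecast:

import numpy as np

def gompertz_makeham_prediction(t, tau, alphas, betas, lam):
    integral = lam * (tau - t)
    for a, b in zip(alphas, betas):
        integral += (a / b) * (np.exp(b * tau) - np.exp(b * t))
    return np.exp(-integral)

# Roughly 8 years from mid-2022 to August 2030 (an assumed horizon).
tau = 8.0
alphas, betas, lam = [0.05, 0.02], [-0.8, 0.1], 0.03

t_grid = np.linspace(0.0, tau, 9)
gm_curve = np.array([gompertz_makeham_prediction(t, tau, alphas, betas, lam) for t in t_grid])

# Constant hazard curve with the same point forecast p(0).
p0 = gm_curve[0]
constant_curve = p0 ** ((tau - t_grid) / tau)

print(np.round(gm_curve, 3))
print(np.round(constant_curve, 3))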
It appears that my complicated Gompertz—Makeham modelling has been for naught, as the prediction curve is virtually identical to the constant hazard prediction curve. I don't know if we should expect this to happen in general or not. It might be because the two Gompertz components cancel each other out.
As a side effect, this analysis also yields a density for Date Putin Exits Presidency of Russia. The expression for the survival curve is $$P(T>t)=\exp\left[-\sum_{i=1}^k\frac{\alpha_i}{\beta_i}\left(e^{\beta_i t}-1\right)-\lambda t\right],$$ which can be differentiated to find the density f(t) as seen below.
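Carrying out the differentiation explicitly, using the standard survival-analysis identity $f(t)=h(t)P(T>t)$ (added here for completeness): $$f(t)=\left(\sum_{i=1}^k\alpha_i e^{\beta_i t}+\lambda\right)\exp\left[-\sum_{i=1}^k\frac{\alpha_i}{\beta_i}\left(e^{\beta_i t}-1\right)-\lambda t\right].$$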
Concluding thoughts
I feel quite confident that conditional prediction curves are the best option for handling the time problem in binary forecasts. There are some alternatives, such as providing the entire distribution p(x,t), but that looks quite cumbersome. There are many benefits to using conditional prediction curves (for forecasters, aggregators, and scorers), they are not too difficult to implement for forecasting platforms, and it should be possible to develop good tutorials that make forecasters comfortable with them.
It would be great to find out whether the complicated hazard functions are worth the hassle, or whether the constant hazard is enough for most purposes. The Putin example suggests a constant hazard rate might be enough, as the complicated multi-Gompertz—Makeham prediction curve is virtually the same as the constant hazard prediction curve based on the same p(s)!
I don’t have too many hints for how to choose among the different hazard functions. But you might use empirics as a guide. For instance, the Gompertz—Makeham hazard appears to fit mortality data better than the Weibull—Makeham hazard but the difference appears to be marginal. If you’re dealing with questions such as “Will Putin be ousted as president of Russia by 2030?”, such observations might help you. There are also theoretical reasons to prefer one over the other in some cases, but I don’t know if they are useful.
It could be reasonable to mix the Weibull and Gompertz components too, for instance following the same kind of reasoning as in the Putin example above. There are infinitely many hazard functions I haven’t talked about at all, such as the log-normal hazard. Some of these may have nice interpretations that could help the forecaster.
Appendix
Proof that s′w is proper
We show that the weighted version $s'_w(q,X)=\int_{t_0}^{\infty}w(t)s(q(t),X)\,dt$ is a proper scoring rule for any positive weighting function w. Let p(t) denote the true probability $P(X=1\mid F_t)$, where $F_t$ is the information observed until time t. Let q(t) be any other stochastic process adapted to $F_t$. Since $w(t)s(q(t),X)$ is non-negative, we can apply Fubini's theorem to get
$$E\left[\int_{t_0}^{\infty}w(t)s(q(t),X)\,dt\right]=\int_{t_0}^{\infty}E[w(t)s(q(t),X)]\,dt=\int_{t_0}^{\infty}E\big[w(t)E[s(q(t),X)\mid F_t]\big]\,dt,$$ where the second equality follows from iterated expectations. Since p(t) is the true probability of X=1 conditioned on $F_t$, and s is a proper scoring rule, we have $$E[s(q(t),X)\mid F_t]\geq E[s(p(t),X)\mid F_t].$$ It follows that $$E\left[\int_{t_0}^{\infty}w(t)s(q(t),X)\,dt\right]\geq E\left[\int_{t_0}^{\infty}w(t)s(p(t),X)\,dt\right],$$ hence $s'_w$ is a proper scoring rule.
Comment on the scoring rule
The scoring rule $s'(p,X)=\int_{t_0}^{\infty}s(p(t),X)\,dt$ has the weakness that early forecasters are penalized. If the scoring rule is bounded above, such as the Brier score, early forecasting can be incentivized by setting $p(t)=1-X$ for all time points t before the forecaster made their first forecast. Other than that, it appears to me to be a reasonable scoring rule for evaluating forecasts in time. There are other potential scoring rules, such as $s(p(T),X)$, which do not appear to be proper for prediction curves; but it might also be that prediction curves aren't the correct abstraction.
Proof that $p_s(t)=\exp[-\int_t^\tau h_s(x)\,dx]$.
We know that $$p_s(t)=P(T=\tau\mid T>t, F_s)=1-P(T<\tau\mid T>t, F_s).$$ Using the equality $S(t)=\exp[-\int_0^t h(x)\,dx]$, where S(t) is the survival function and h(t) the hazard rate, we find that $$P(T<\tau\mid T>t, F_s)=\frac{P(T<\tau\mid F_s)-P(T<t\mid F_s)}{P(T>t\mid F_s)}=\frac{\left(1-\exp\left[-\int_0^\tau h_s(x)\,dx\right]\right)-\left(1-\exp\left[-\int_0^t h_s(x)\,dx\right]\right)}{\exp\left[-\int_0^t h_s(x)\,dx\right]}=1-\exp\left[-\int_t^\tau h_s(x)\,dx\right].$$ The equality $p_s(t)=\exp[-\int_t^\tau h_s(x)\,dx]$ follows from the definition of $p_s(t)$.
Proof of Proposition 1, that $p_s(t)$ is non-decreasing in t for every s.
Suppose that $r>t$. Using $p_s(t)=\exp[-\int_t^\tau h_s(x)\,dx]$ we find that $$p_s(r)/p_s(t)=\exp\left[-\int_r^\tau h_s(x)\,dx+\int_t^\tau h_s(x)\,dx\right]=\exp\left[\int_t^r h_s(x)\,dx\right].$$ Since $h_s(x)\geq 0$, $p_s(r)/p_s(t)\geq 1$, hence $p_s(r)\geq p_s(t)$. In the same way, if the hazard rate is strictly positive, we have $h_s(x)>0$ for all x, so $p_s(r)/p_s(t)>1$, hence $p_s(r)>p_s(t)$.
Proof of Proposition 3
We can ignore the dependence on $F_s$ and work directly with probability measures, assuming S and R are independent with hazard rates $h_S$ and $h_R$. In this case $P(S<R\mid S>t, R>t)=p_s(t)$, and we see that $$P(S<R\mid S>t, R>t)=\frac{P(t<S<R)}{P(S>t)P(R>t)}=\frac{P(t<S<R)}{\exp\left[-\int_0^t h_S(x)\,dx\right]\exp\left[-\int_0^t h_R(x)\,dx\right]}.$$

We find that $$P(t<S<R)=\int_t^\infty P(s<R)f_S(s)\,ds=\int_t^\infty \exp\left[-\int_0^s h_R(x)\,dx\right]f_S(s)\,ds=\int_t^\infty \exp\left[-\int_0^s h_R(x)\,dx\right]h_S(s)\exp\left[-\int_0^s h_S(x)\,dx\right]ds,$$ where $f_S$ is the density of S. Thus we need to equate $$\frac{\int_t^\infty \exp\left[-\int_0^s h_R(x)\,dx\right]h_S(s)\exp\left[-\int_0^s h_S(x)\,dx\right]ds}{\exp\left[-\int_0^t h_S(x)\,dx\right]\exp\left[-\int_0^t h_R(x)\,dx\right]}=f(t).$$

Multiply both sides by $\exp[-\int_0^t h_S(x)\,dx]\exp[-\int_0^t h_R(x)\,dx]$ to obtain $$\int_t^\infty \exp\left[-\int_0^s h_R(x)\,dx\right]h_S(s)\exp\left[-\int_0^s h_S(x)\,dx\right]ds=f(t)\exp\left[-\int_0^t h_S(x)\,dx\right]\exp\left[-\int_0^t h_R(x)\,dx\right],$$ and differentiate with respect to t to get $$\exp\left[-\int_0^t h_R(x)\,dx\right]h_S(t)\exp\left[-\int_0^t h_S(x)\,dx\right]=f(t)\left(h_S(t)+h_R(t)\right)\exp\left[-\int_0^t h_S(x)\,dx\right]\exp\left[-\int_0^t h_R(x)\,dx\right]-f'(t)\exp\left[-\int_0^t h_S(x)\,dx\right]\exp\left[-\int_0^t h_R(x)\,dx\right].$$ Multiply both sides by $\exp[\int_0^t h_S(x)\,dx]\exp[\int_0^t h_R(x)\,dx]$ to obtain $h_S(t)=f(t)(h_S(t)+h_R(t))-f'(t)$, which can be rearranged to $$h_R(t)=\frac{h_S(t)+f'(t)}{f(t)}-h_S(t).$$ The function $h_R(t)$ is a hazard function if and only if it is non-negative, hence we require $\frac{h_S(t)+f'(t)}{f(t)}\geq h_S(t)$, i.e., $h_S(t)\geq f(t)h_S(t)-f'(t)$. Solving the equality $h_S(t)=f(t)h_S(t)-f'(t)$ yields $h_S(t)=-f'(t)/(1-f(t))$, but this function is negative when $f'(t)$ is positive, hence it's not in general a hazard function.

We can fix this by defining $$h_S(t)=\frac{\max(-f'(t),0)}{1-f(t)},$$ for if $-f'(t)$ is non-positive, $h_S(t)=0$ while $f(t)h_S(t)-f'(t)=-f'(t)\leq 0$.
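As a small illustration of this construction (my own sketch, not part of the post), the two hazards can be computed numerically from a target curve f; the target below is made up and the derivative is taken numerically.

import numpy as np

def hazards_from_target(f, t, eps=1e-5):
    # Construct h_S(t) and h_R(t) so that P(S < R | S > t, R > t) = f(t),
    # following the proof: h_S = max(-f', 0)/(1 - f) and h_R = (h_S + f')/f - h_S.
    f_t = f(t)
    f_prime = (f(t + eps) - f(t - eps)) / (2 * eps)  # numerical derivative
    h_S = max(-f_prime, 0.0) / (1.0 - f_t)
    h_R = (h_S + f_prime) / f_t - h_S
    return h_S, h_R

# Made-up target conditional prediction curve taking values in (0, 1).
f = lambda t: 0.3 + 0.2 * np.sin(t)
for t in [0.5, 1.5, 2.5, 3.5]:
    print(t, hazards_from_target(f, t))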