Why It Works

Quick Sidenote

The audio recording isn’t great with functions yet, and there will be a lot of those. Listener’s discretion is advised.

Also, the time estimate is a little off: I’d estimate it takes around 17 minutes to read, including breaks and pauses. Not including them, I’d say about 9 minutes.

Also, this makes little to no sense on its own, so read This one first, if you haven’t already. It’ll take around 8 minutes.

Also, it has come to my attention that this does not cover the discrete case, and the probability distribution functions are a bit wonky, as the method currently provided is optimized purely for comparing abstract functions. At the moment, the method treats probability distributions* as though there is a uniformly distributed random input ($x$) and a non-uniform output ($f_{i}(x)$), and the probability distribution* of $f_{i}(x)$ is treated as $|f_{i}^{'}(f_{i}^{-1}(x))|$* for the continuous function $f_{i}(x)$ ($f_{i}^{-1}(x)$ being the inverse function of $f_{i}(x)$). ~~I am working to fix this.~~ (I am currently working on more ambitious and time-effective projects, and this article will likely not be updated until many months from now, if ever.)

WARNING: Only read this if you want to, ~~or are skeptical about whether it works, and need to be sure~~. It doesn’t really work.

Conceptualizing the Problem:

The problem at hand is to find the expected value of the highest output when we “roll” a number of functions a certain number of times. A “roll” in this context refers to generating an input in a function’s domain. Each function, $f_{1}(x)$, $f_{2}(x)$, and $f_{3}(x)$ ^[1], is confined to a specific input range and rolled a set number of times, as listed below:

Minimum input	Maximum input	Number of times we “roll” the input	Function
$a_{1}$	$b_{1}$	$c_{1}$	$f_{1} (x)$
$a_{2}$	$b_{2}$	$c_{2}$	$f_{2} (x)$
$a_{3}$	$b_{3}$	$c_{3}$	$f_{3} (x)$

The coefficients $c_{1}, c_{2}, c_{3}$ represent the number of times we evaluate each respective function.
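To make the setup concrete, here is a minimal brute-force sketch in Python. It is not the method this article describes: it simply rolls uniformly random inputs and averages the highest output over many trials, which gives a number to compare against. The name `simulate_best_roll` and the `specs` layout (one `(f, a, b, c)` tuple per table row) are assumptions made for illustration, and the sketch assumes at least one roll in total.

```python
import random


def simulate_best_roll(specs, trials=20000, seed=0):
    """Estimate the expected highest output by repeated random rolls.

    specs: list of (f, a, b, c) tuples, one per table row:
           function, minimum input, maximum input, number of rolls.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = 0.0
    for _trial in range(trials):
        # Roll every function c times and keep only the largest output.
        best = max(
            f(rng.uniform(a, b))
            for f, a, b, c in specs
            for _roll in range(c)
        )
        total += best
    return total / trials


# Three copies of f(x) = x on [0, 1], each rolled once.
estimate = simulate_best_roll([(lambda x: x, 0.0, 1.0, 1)] * 3)
print(estimate)
```

For this input the estimate should land near 0.75, the expected maximum of three uniform rolls on $[0, 1]$.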

The sgn Function:

Before diving deep, let’s clarify the sgn(x) function. The sgn(x) or “signum” function is simple but powerful. It returns:

−1 if x<0
0 if x=0
1 if x>0.

So, $\operatorname{sgn}(x - f_{h}(x_{h}))$ checks if $x$ is larger than our function’s output. By adding 1 and dividing by 2, the original function’s outcomes of −1, 0, and 1 transform to 0, 0.5, and 1 respectively. Essentially, this turns the signum function into a (nearly) binary indicator for whether $f_{h}(x_{h}) < x$: 1 if it is, 0 if it isn’t, and 0.5 only in the case of an exact tie.
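The transformation above is small enough to write out directly. This is a minimal sketch; `sgn` and `below_indicator` are names chosen here for illustration.

```python
def sgn(x):
    """Signum: -1 for negative x, 0 for zero, 1 for positive x."""
    if x < 0:
        return -1
    if x > 0:
        return 1
    return 0


def below_indicator(x, output):
    """(sgn(x - output) + 1) / 2: 1 if output < x, 0 if output > x, 0.5 on a tie."""
    return (sgn(x - output) + 1) / 2


print(below_indicator(5, 3))  # 1.0, because 3 < 5
print(below_indicator(3, 5))  # 0.0, because 5 > 3
print(below_indicator(3, 3))  # 0.5, exact tie
```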

Functions $f_{a}(x)$, $f_{b}(x)$, and $f_{c}(x)$

These functions integrate the modified sgn function over the domain of their respective functions $f_{h}$ ($f_{a}$ pairs with $f_{1}$, $f_{b}$ with $f_{2}$, and $f_{c}$ with $f_{3}$). Essentially, they measure the proportion (and, by extension, the probability) of $f_{h}$’s domain where its value is less than $x$.

$f_{i}(x) = \frac{\int_{a_{h}}^{b_{h}} \frac{\operatorname{sgn}(x - f_{h}(x_{h})) + 1}{2} \, d x_{h}}{| b_{h} - a_{h} |}$
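One way to get a feel for this formula is to approximate the integral numerically. The sketch below uses a plain midpoint Riemann sum; the name `proportion_below`, the `steps` parameter, and the assumption that the minimum input is smaller than the maximum input are all illustrative choices, not part of the formula.

```python
def sgn(x):
    """Signum: -1, 0, or 1."""
    return (x > 0) - (x < 0)


def proportion_below(f, a, b, x, steps=10000):
    """Approximate the share of the domain [a, b] where f's output is below x."""
    width = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x_h = a + (k + 0.5) * width  # midpoint of the k-th slice
        total += (sgn(x - f(x_h)) + 1) / 2 * width
    return total / abs(b - a)


# f(x) = x^2 on [0, 2] is below 1 on the half of the domain where x < 1,
# so this should print a value close to 0.5.
print(proportion_below(lambda t: t * t, 0.0, 2.0, 1.0))
```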

Small break

Congratulations if you’ve made it this far! As a reward, here’s a Monty Python video that I found funny.

This is optional and works best if you try not to worry about what you’ve read so far. If you think you would, I suggest you re-read it, or read the recap.

Quick recap

We randomly generate inputs for the functions and keep the largest output.

To get the expected value of this, we multiply each output by the probability that it’s the highest, using the functions above (denoted $f_{i}$), and then “add” all of those weighted outputs together.

Bringing it Together in $f_{d}$

Understanding the Mechanism of $f_{d}$ :

$f_{d}$ calculates the expected value for each function $f_{1}(x)$, $f_{2}(x)$, and $f_{3}(x)$ individually and then multiplies each by how many times we “roll” it ($c_{1}$, $c_{2}$, and $c_{3}$).

Component Analysis:

The full function is

$f_{d} = \left(\int_{a_{1}}^{b_{1}} f_{a}{(f_{1}(x))}_{1}^{\frac{c_{1} - 1 + |c_{1} - 1|}{2}} \, f_{b}{(f_{1}(x))}_{1}^{c_{2}} \, f_{c}{(f_{1}(x))}_{1}^{c_{3}} \, f_{1}(x) \, dx\right) c_{1} + \left(\int_{a_{2}}^{b_{2}} f_{a}{(f_{2}(x))}_{2}^{c_{1}} \, f_{b}{(f_{2}(x))}_{2}^{\frac{c_{2} - 1 + |c_{2} - 1|}{2}} \, f_{c}{(f_{2}(x))}_{2}^{c_{3}} \, f_{2}(x) \, dx\right) c_{2} + \left(\int_{a_{3}}^{b_{3}} f_{a}{(f_{3}(x))}_{3}^{c_{1}} \, f_{b}{(f_{3}(x))}_{3}^{c_{2}} \, f_{c}{(f_{3}(x))}_{3}^{\frac{c_{3} - 1 + |c_{3} - 1|}{2}} \, f_{3}(x) \, dx\right) c_{3}$

Let’s focus on the first term:

$\left(\int_{a_{1}}^{b_{1}} f_{a}{(f_{1}(x))}_{1}^{\frac{c_{1} - 1 + |c_{1} - 1|}{2}} \, f_{b}{(f_{1}(x))}_{1}^{c_{2}} \, f_{c}{(f_{1}(x))}_{1}^{c_{3}} \, f_{1}(x) \, dx\right) c_{1}$

This integral gives the expected value of $f_{1} (x)$ weighted by the probability that $f_{1} (x)$ has the highest output among the three functions for all valid inputs x.

Role of the Exponents:

The exponents on the functions $f_{a}$, $f_{b}$, and $f_{c}$ serve a crucial role. Given that each of these functions essentially returns a probability, raising one to an exponent is the equivalent of multiplying that probability by itself that many times, once per roll.

$f_{a}{(f_{1}(x))}_{1}^{\frac{c_{1} - 1 + |c_{1} - 1|}{2}}$: Represents the probability that $f_{1}(x)$ is more than the value of $f_{1}(x_{a})$, $c_{1} - 1$ times ($c_{1}$ is the number of times we “roll” an input for the function $f_{1}(x)$). The reason for the $-1$ is that we don’t need to calculate the chance of $f_{1}(x)$ being higher than itself. The reason for the form $\frac{c_{1} - 1 + |c_{1} - 1|}{2}$ is that it outputs $c_{1} - 1$ whenever $c_{1}$ is at least $1$, and $0$ whenever $c_{1}$ is less than $1$. We do this because we don’t want a negative exponent, as this may result in dividing by $0$, which is undefined.
$f_{b}{(f_{1}(x))}_{1}^{c_{2}}$ and $f_{c}{(f_{1}(x))}_{1}^{c_{3}}$: Represent the probabilities that $f_{1}(x)$ surpasses $f_{2}(x_{b})$ and $f_{3}(x_{c})$ respectively, the necessary number of times ($c_{2}$ and $c_{3}$).
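The exponent trick above is a way of writing “$c_{1} - 1$, but never below $0$” without a max function. A small Python check, with `clamped_exponent` as a hypothetical helper name:

```python
def clamped_exponent(c):
    """Return (c - 1 + |c - 1|) / 2, which equals max(c - 1, 0)."""
    return (c - 1 + abs(c - 1)) / 2


# Print c, the clamped exponent, and max(c - 1, 0) side by side.
for c in (0, 1, 2, 5):
    print(c, clamped_exponent(c), max(c - 1, 0))
```

The last two columns agree for every value of `c`, including `c = 0`, where the plain exponent `c - 1` would have been negative.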

Enhancing your understanding

Probability of Being the Highest:

By multiplying these terms, we calculate the joint probability that $f_{1}(x)$ is the highest output among all the rolls of the three functions, with each function counted as many times as its coefficient says. You would do the same with any additional functions, such as a hypothetical $f_{4}(x)$, $f_{5}(x)$, …^[1]

Weighting by the Function Value:

Finally, we multiply this probability by the function’s value $f_{1} (x)$ to weigh it by the actual outcome of the function, the same way you would always multiply probability by value.

Summing Over the Domain:

By integrating over the domain of $f_{1} (x)$ , we capture the aggregated effect across all possible inputs.
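As a quick sanity check, here is a worked example under assumptions chosen purely for illustration: take $f_{1}(x) = f_{2}(x) = f_{3}(x) = x$ on the domain $[0, 1]$, each rolled once, so $c_{1} = c_{2} = c_{3} = 1$. Then $f_{a}$, $f_{b}$, and $f_{c}$ each return $x$ for $0 \le x \le 1$, the first exponent is $\frac{1 - 1 + |1 - 1|}{2} = 0$, and the first term (including its multiplication by $c_{1} = 1$) becomes

$\left(\int_{0}^{1} x^{0} \cdot x^{1} \cdot x^{1} \cdot x \, dx\right) \cdot 1 = \int_{0}^{1} x^{3} \, dx = \frac{1}{4}$

The terms for $f_{2}$ and $f_{3}$ come out the same, so $f_{d} = \frac{3}{4}$, which matches the known expected maximum of three uniform rolls on $[0, 1]$.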

Applying the Coefficients $c_{i}$ :

After obtaining the expected contribution of each function, we multiply each result by its respective coefficient $c_{1}$, $c_{2}$, or $c_{3}$. These coefficients account for the number of times we evaluate each respective function in our simulation.

We can then use the same logic for $f_{2}$ and $f_{3}$ .
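Putting the pieces together, the whole of $f_{d}$ can be approximated numerically. The sketch below is one possible reading of the formula, not a definitive implementation: it uses midpoint Riemann sums, treats the exponents as ordinary powers, and accepts any number of functions rather than exactly three. The names `proportion_below`, `clamped_exponent`, `f_d`, and the `specs` layout are assumptions made for illustration. As written, the outer integrals in $f_{d}$ are not divided by the width of the domain, so the example keeps every domain at width 1.

```python
def sgn(x):
    """Signum: -1, 0, or 1."""
    return (x > 0) - (x < 0)


def proportion_below(f, a, b, x, steps):
    """Share of [a, b] where f's output is below x (midpoint rule)."""
    width = (b - a) / steps
    hits = sum((sgn(x - f(a + (k + 0.5) * width)) + 1) / 2 for k in range(steps))
    return hits * width / abs(b - a)


def clamped_exponent(c):
    """(c - 1 + |c - 1|) / 2, i.e. c - 1 but never below 0."""
    return (c - 1 + abs(c - 1)) / 2


def f_d(specs, steps=200):
    """Approximate f_d for a list of (f, a, b, c) tuples."""
    total = 0.0
    for h, (f_h, a_h, b_h, c_h) in enumerate(specs):
        width = (b_h - a_h) / steps
        integral = 0.0
        for k in range(steps):
            value = f_h(a_h + (k + 0.5) * width)
            # Probability that every other roll falls below this value.
            weight = 1.0
            for j, (f_j, a_j, b_j, c_j) in enumerate(specs):
                exponent = clamped_exponent(c_j) if j == h else c_j
                weight *= proportion_below(f_j, a_j, b_j, value, steps) ** exponent
            integral += weight * value * width
        total += integral * c_h  # multiply by the number of rolls
    return total


# Three copies of f(x) = x on [0, 1], each rolled once: should be close to 0.75.
print(f_d([(lambda x: x, 0, 1, 1)] * 3))
```

The nested loops make this slow for large `steps`, but it keeps each part of the code in one-to-one correspondence with a part of the formula.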

If you have any questions, I suggest you ask ChatGPT or me. If you have any suggestions for how I can make this clearer, a better way of finding the expected value of the best option, or any wording that could be done differently, tell me (ONLY if you want). No pressure.

If there’s anything incorrect, please tell me.

Congratulations! You made it to the end!

As a special reward, have another funny video. You deserve it.

^[1] Three functions is an arbitrary quantity. You can use as many as you want. (Here’s how)