Probability of falling down after n steps for a random number

I’m going to describe a problem (that I am trying to solve) without referring to Collatz but I am pretty sure you are going to see the link.

Let’s say a glider just took off (propelled by very strong wind). This glider moves forward; most of the time it goes down, but sometimes it meets upward airstreams and goes up! To simplify, let’s say that every hour it has a 50% chance to lose X meters and a 50% chance to gain Y meters (X>Y; for example you can take X=1 and Y=log(3/2)/log(2)).
If at any point during its flight the glider goes under its starting height, it lands and its flight ends there (nobody gets hurt). There is no other limit (it is driven by a self-sufficient robot who is very happy to be here).

After N hours, what is the probability P that the glider is still flying?

For small values of N, it’s easy to calculate with a tree graph: P(0)=1, P(1)=1/2, P(2)=P(3)=1/4, P(4)=3/16… but what I would like is an asymptotic equivalent: how exactly does it decrease as N tends toward infinity?
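A minimal sketch of that tree computation in Python (the helper is mine, assuming X = 1, Y = log(3/2)/log(2), and reading “under its starting height” as strictly below it):

```python
import math

def survival_prob(n_steps, x=1.0, y=math.log(3, 2) - 1):
    # y = log2(3/2) ~ 0.585; track every reachable height with its probability
    heights = {0.0: 1.0}              # height above start -> probability
    for _ in range(n_steps):
        nxt = {}
        for h, p in heights.items():
            for h2 in (h - x, h + y):
                if h2 >= 0:           # strictly below the start => it lands
                    nxt[h2] = nxt.get(h2, 0.0) + p / 2
        heights = nxt
    return sum(heights.values())      # probability of still flying
```

This reproduces the values above: `survival_prob(1)` gives 1/2 and `survival_prob(4)` gives 3/16.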


Under a random-walk scenario, what’s the chance that start number n hits 1 on (or before) the mth step? Great question. You might check “Stochastic Models for the 3x+1 and 5x+1 Problems and Related Problems” by Kontorovich and Lagarias, or see what you come up with… The nice thing is that you can check against empirical results. If you have a formula for

P(n, m) = probability n hits 1 on (or before) the mth step of the random walk,

then for any given (n, m) pair, you can run 10,000 random walks starting at n, and tabulate what percentage of them hit 1 on (or before) the mth step … then see if that’s close to the value P(n, m).
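A rough sketch of that experiment in Python (names are mine; each step applies x → x/2 or x → (3x+1)/2 with probability 1/2, stopping once x drops to 1 or below):

```python
import random

def estimate_p(n, m, walks=10_000, seed=42):
    # fraction of `walks` random walks from n that hit 1 within m steps
    rng = random.Random(seed)
    hits = 0
    for _ in range(walks):
        x = float(n)
        for _ in range(m):
            x = x / 2 if rng.random() < 0.5 else (3 * x + 1) / 2
            if x <= 1:
                hits += 1
                break
    return hits / walks
```

Run it for a grid of (n, m) pairs and compare the fractions against the candidate P(n, m).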


Gravity is entropy in your example. I took a look at this strategy and rewrote the sequence as a trivial unbiased PRNG (pseudo-random number generator). The trick is to show the PRNG is unbiased, so that the numbers of odd and even transitions balance out and an infinitely long run is not possible.

Search for “Entropy Make the Collatz Sequence Go Down” and see if it helps.

This is something I play with a lot – decomposing things into physical systems or “machine engineering” problems. It’s generally served me well. That said, the problem you pose is very hard. I played with it, and I think it depends on both a polynomial and an exponential component. If that’s right (which I don’t know for sure – it’s subtle), then the glider’s chance of still being airborne after N hours falls off like N^-3/2 * some exponential (insert number here)^N. This won’t work for Collatz, by the way, which is deterministic and state-dependent.


I think I have an idea that could make it work for Collatz, even though it’s deterministic, but I don’t know how to explain it. I’ll tell you more if that yields anything interesting.

Just started reading Entropy Make the Collatz Sequence Go Down and I can tell the authors had fun writing this (emphasis mine) :slight_smile:

> The computational mechanics of the Collatz sequence are analyzed
> to determine the odds of taking an even or odd step. Using the
> following definition of the sequence we shall show the average odds
> of taking either step are even.


Nice you caught that. I usually censor myself, but sometimes I can’t help it.


Quick kook take. A gambler starts with $n and at each step either wins $1 (45% chance) or loses $1 (55% chance). The expected number of steps to reach $0 is \cfrac{n}{0.55-0.45} = 10n. I guess the intuition is that you lose 10 cents per step on average, but the proof is kinda complicated. The formula is confirmed by simulation:

| Initial stake n | Formula mean | Simulated mean | Simulated median |
|---:|---:|---:|---:|
| 1 | 10 | 10.03 | 1 |
| 5 | 50 | 49.96 | 25 |
| 10 | 100 | 100.2 | 68 |
| 20 | 200 | 199.1 | 160 |
| 50 | 500 | 499.5 | 454 |

The first line seemed counter-intuitive to me – look, Mom, if I start with $1, on average I can hold out 10 steps before going broke (?!) – so I also computed the median.
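For anyone who wants to reproduce the table, a sketch of such a simulation (the helper name is mine; the 45/55 step odds are from the setup above):

```python
import random

def mean_ruin_time(n, p_win=0.45, trials=10_000, seed=7):
    # gambler starts with $n; wins $1 with prob p_win, loses $1 otherwise;
    # play until broke, return the average number of steps over all trials
    rng = random.Random(seed)
    total_steps = 0
    for _ in range(trials):
        stake, steps = n, 0
        while stake > 0:
            stake += 1 if rng.random() < p_win else -1
            steps += 1
        total_steps += steps
    return total_steps / trials
```

Collecting the per-trial step counts in a list instead gives the median as well.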

For a 3n+1 random walk in log space, we have a 50% chance of losing $1 and a 50% chance of winning \approx $0.585. That means on average we lose about 20.7 cents per step. So you might guess it would take \cfrac{\log_2 n}{0.207} = 4.82 \log_2 n steps to go broke. That’s again not a proof. Result:

| n | 4.82 \log_2 n | Simulated mean | Simulated median |
|---:|---:|---:|---:|
| 100 | 32.02 | 34.06 | 28 |
| 1000 | 48.04 | 50.17 | 44 |
| 1m | 96.07 | 97.91 | 92 |

I got confused as to why Lagarias says 6.95 \log n instead of 4.82 \log n (“Stochastic Models for the 3x+1…”), but he is using natural log because he is a mathematician, haha!
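The 3n+1 rows can be reproduced with the same kind of loop run in log space (my sketch; steps are +log2(3/2) or −1 with probability 1/2 each, stopping once the walk drops below 0):

```python
import math
import random

def mean_flight(n, trials=10_000, seed=3):
    # average number of steps for the log2-space walk started at log2(n)
    # to drop below 0; steps are +log2(3/2) or -1, each with prob 1/2
    rng = random.Random(seed)
    up = math.log(3, 2) - 1
    total_steps = 0
    for _ in range(trials):
        h, steps = math.log2(n), 0
        while h >= 0:
            h += up if rng.random() < 0.5 else -1.0
            steps += 1
        total_steps += steps
    return total_steps / trials
```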


This is a different problem: I am looking for the probability that a given n flies for at least m steps, while that gambler problem is more akin to looking for the expectation of the flight length for a given number.

If I knew my glider would on average have a flight length of 10, that wouldn’t tell me much about the probability of it still being airborne after, let’s say, 10 steps: that could be 1/2 if the distribution were some kind of Gaussian around 10; it could be 1/10000 if the glider had a 1/10000 chance of a very long flight and otherwise a rather short one; or it could be close to one if the flight length were almost always 11 except for some moderately rare occurrences where it was 1.

I am having quite a hard time with Kontorovich & Lagarias (I’m not used to those notations), but it’s very interesting anyway.


Is it very hard the way Collatz is very hard? I’d appreciate it a lot if you had some explanation of how you came up with “N^-3/2 * some exponential (insert number here)^N”.
It seems to me that if we know we are looking for N^-3/2 * A^MN, then it could be only a matter of adjusting the parameters for specific values of M.

Ah, I totally get you. I answered a much simpler question. You want to know what’s the chance the gambler’s going to be broke by the time his mom comes to pick him up at 9 o’clock. :nine_o_clock:


I found this for the probability of a random walk hitting zero within t steps, starting at h:

$$\Pr(h, t) = \Phi\!\left(\frac{-h - \mu t}{\sigma\sqrt{t}}\right) + e^{2\mu h/\sigma^2}\, \Phi\!\left(\frac{-h + \mu t}{\sigma\sqrt{t}}\right)$$

where \Phi is the standard normal cumulative distribution function (CDF), which is supposedly built into math packages:

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du$$

I don’t know anything else about this thing. But it could give you an idea of how hard the problem is, and/or you could run some simulations to see if it works in practice. In the 3n+1 case:

h = \log_2(n)
A = 0.585
B = -1.0
\mu = \frac{1}{2}(A+B) = -0.208
\sigma^2 = \frac{1}{2}\left((A-\mu)^2 + (B-\mu)^2\right) = 0.628
\sigma = 0.792

The second term seems to be ignorable/negligible, since e^{2 \mu \log_2(n) / \sigma^2} = n^{-0.92}, which vanishes as n increases.


I am not sure the second term is negligible, since h goes to infinity… I’m looking at it (tomorrow).

This is subtle, and it’s what I was referencing above when I said the problem is “hard.” (Not “Collatz hard,” but hard for me to be fully sure of.) I think the correct equation to use includes the exponential part, not just the polynomial, so the full equation would be:

P(\text{the probability the glider is still flying after }N\text{ hours}) \sim C N^{-3/2} [M(\theta^*)]^N

where M(\theta^*) = \min_{\theta} M(\theta) \approx 0.965 in this case

There are different ways to think about problems like this, and I’m not 100% sure which one is totally correct. But I think for the probability of survival up to some large finite time N with S_0=0 and negative drift, the “most likely path” is governed by a Cramér-type large deviation, leading to \rho=M(\theta^*) as the exponential factor. You could possibly frame it in more of the gambler’s-ruin approach mathkook used above (which is polynomial, without the exponential piece).

I guess the summary is that there’s a LOT going on in a problem like this, and different forms you could use depending on how you interpret certain pieces (M(\theta) is convex in \theta, there’s a thing called a sharp asymptote, etc.).
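To sanity-check the \approx 0.965 figure: with steps of +\log_2(3/2) and -1 (probability 1/2 each), M(\theta) = \frac{1}{2}(e^{0.585\theta} + e^{-\theta}) is convex, so a ternary search finds its minimum (a quick sketch, function names mine):

```python
import math

def mgf(theta, up=math.log(3, 2) - 1, down=-1.0):
    # M(theta) = E[exp(theta * step)] for steps +log2(3/2) or -1, prob 1/2 each
    return (math.exp(theta * up) + math.exp(theta * down)) / 2

def min_mgf(lo=0.0, hi=2.0, iters=100):
    # M is convex in theta, so ternary search converges to min over theta
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if mgf(m1) < mgf(m2):
            hi = m2
        else:
            lo = m1
    return mgf((lo + hi) / 2)
```

This lands at roughly 0.9659, attained near \theta^* \approx 0.338.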

Ok, here is my reasoning.
I am actually interested in the probability of not hitting zero (P = 1-Pr). That doesn’t change much.
Also I use ν=-μ just to avoid having a negative variable.

By developing your expression, I get

$$P = 1-\Pr = \frac{1}{2}\left(2 - \operatorname{erfc}\frac{h-\nu t}{\sigma\sqrt{2t}} - e^{-2\nu h/\sigma^2}\,\operatorname{erfc}\frac{h+\nu t}{\sigma\sqrt{2t}}\right)$$

$$= \frac{1}{2}\left(\operatorname{erfc}\frac{-h+\nu t}{\sigma\sqrt{2t}} - e^{-2\nu h/\sigma^2}\,\operatorname{erfc}\frac{h+\nu t}{\sigma\sqrt{2t}}\right)$$

$$= \frac{1}{2}\left(1-e^{-2\nu h/\sigma^2}\right)\operatorname{erfc}\frac{h+\nu t}{\sigma\sqrt{2t}} + S$$

where S is \frac{1}{\sqrt{\pi}} times the integral of e^{-x^2} over \left[\frac{-h+\nu t}{\sigma\sqrt{2t}},\ \frac{h+\nu t}{\sigma\sqrt{2t}}\right], so

$$\frac{2h}{\sigma\sqrt{2\pi t}}\, e^{-(h+\nu t)^2/(2\sigma^2 t)} \;\le\; S \;\le\; \frac{2h}{\sigma\sqrt{2\pi t}}\, e^{-(-h+\nu t)^2/(2\sigma^2 t)}$$

Meanwhile, when t tends toward infinity, \operatorname{erfc}\frac{h+\nu t}{\sigma\sqrt{2t}} is equivalent to \sigma\sqrt{\frac{2}{\pi t}}\, e^{-(h+\nu t)^2/(2\sigma^2 t)}, thus \frac{1}{2}\left(1-e^{-2\nu h/\sigma^2}\right)\operatorname{erfc}\frac{h+\nu t}{\sigma\sqrt{2t}} is equivalent to

$$\frac{\sigma}{\sqrt{2\pi t}}\, e^{-(h+\nu t)^2/(2\sigma^2 t)}\left(1-e^{-2\nu h/\sigma^2}\right)$$

(my lessons are rusty but I am pretty sure I am allowed to sum equivalents here)

Then we get P equivalent to

$$\frac{2h}{\sigma\sqrt{2\pi t}}\, e^{-(h+\nu t)^2/(2\sigma^2 t)} + \frac{\sigma}{\sqrt{2\pi t}}\, e^{-(h+\nu t)^2/(2\sigma^2 t)}\left(1-e^{-2\nu h/\sigma^2}\right)$$

$$= \frac{\sigma}{\sqrt{2\pi t}}\, e^{-(h+\nu t)^2/(2\sigma^2 t)}\left(3-e^{-2\nu h/\sigma^2}\right)$$

$$= \frac{\sigma}{\sqrt{2\pi t}}\, e^{-\nu^2 t/(2\sigma^2)}\left(3-e^{-2\nu h/\sigma^2}\right) \quad\text{(not sure that's valid)}$$

Numerical application: P \to \frac{0.316}{\sqrt{t}}\, e^{-0.034 t}\left(3-e^{-0.66 h}\right)

If I had found a faster decrease, it could have meant a contradiction with the existence of an ever-growing sequence: for such a sequence to exist, an infinity of generators of such sequences must also exist, and that infinity has to have at least a minimal density.

Great – thought I’d throw in some experiments. The formula gives the probability of a random walk starting at \log_2(n) hitting zero within m steps. I put this next to a simulation of 10,000 random Collatz-like walks starting at n, iteratively picking \frac{3n+1}{2} or \frac{n}{2} at random (not considering evens or odds). Seems like the right formula for the job.

| n | m | formula | 10k walks, (3n+1)/2 | 10k walks, 3n/2 |
|---:|---:|---:|---:|---:|
| 1000 | 80 | 0.825407 | 0.8270 | 0.8877 |
| 1000 | 40 | 0.369868 | 0.3812 | 0.4365 |
| 400 | 20 | 0.102421 | 0.0982 | 0.1463 |
| 100001 | 100 | 0.699405 | 0.7047 | 0.7534 |
| 100000001 | 100 | 0.231215 | 0.2353 | 0.2695 |

The last column picks \frac{3n}{2} or \frac{n}{2} at random, instead of \frac{3n+1}{2} or \frac{n}{2}. I thought the formula would better model the last column – it’s kinda strange that it doesn’t. Whether to stop the simulator at n<1 or n \leq 1 also affects the numbers a little.

PS. Using only the first term of the formula seems fine – the second term has an n^{-0.92} factor that makes it vanish (comparatively):

| n | m | formula | formula, 1st term only |
|---:|---:|---:|---:|
| 1000 | 80 | 0.825407 | 0.825407 |
| 1000 | 40 | 0.369868 | 0.369868 |
| 400 | 20 | 0.102420 | 0.102421 |
| 100001 | 100 | 0.699405 | 0.699405 |
| 100000001 | 100 | 0.231215 | 0.231215 |

PPS. It’s also possible to replace the simulator with the actual even/odd Collatz process – maybe just ask “how many numbers from n-500 to n+500 actually hit 1 within m steps.” I remember K&L’s article cautioning that the numbers in such a range wouldn’t behave independently, though, with the 8n+4 and 8n+5 trajectories quickly coalescing, for example.
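That deterministic check is also a short loop (a sketch; it uses the accelerated map n → (3n+1)/2 for odd n and n → n/2 for even n, and counts how many starting values in a range reach 1 within m steps):

```python
def collatz_hit_fraction(n_lo, n_hi, m):
    # fraction of n in [n_lo, n_hi] whose accelerated Collatz trajectory
    # (odd: (3n+1)/2, even: n/2) reaches 1 within m steps
    hits = 0
    for n in range(n_lo, n_hi + 1):
        x = n
        for _ in range(m):
            if x <= 1:
                break
            x = (3 * x + 1) // 2 if x % 2 else x // 2
        if x <= 1:
            hits += 1
    return hits / (n_hi - n_lo + 1)
```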

They actually kinda would behave independently, but only if you consider a full block of 2^m consecutive numbers at the same time (the first m steps are determined by n mod 2^m). If I take, let’s say, the numbers from 14500 to 14503, I know one will go “up-up”, one “down-down”, one “up-down” and one “down-up”.
