(No real question here, just background for possible other posts in this area.)
Looking at the Collatz tree, people ask, “How many numbers are in the tree?” Probably all of the numbers … but maybe not. Well, at least some of them are in there. How many for sure? All the powers of 2 are in there, so … infinitely many?
That’s kind of unsatisfying, so people instead ask, “How many numbers from 1 to x are definitely in the tree?” Then they look for a formula f(x) for arbitrary x.
Again, probably all of them (f(x)=x), but nobody can show that yet. At least 99% of them, f(x) \geq 0.99x? At least 1% of them, f(x) \geq 0.01x? Turns out, for large x, not even one-percent of numbers up to x are known to be in the Collatz tree.
If you dig through the literature, you pretty quickly come across the limit of current knowledge, something like “at least x^{0.84} of them”. To a math kook, that looks like 84%, but it’s way less, so I’m writing instead the similar but more evocative “at least \sqrt{x} of them” (x^{0.5}). That gives the idea that as x gets bigger, we can only prove that a vanishingly small percentage of numbers from 1 to x are (for sure) there in the tree. For example, among numbers less than 10^{1000}, we know that 10^{500} are in the tree, which is 0.00000...00001% of them.
The proofs are … you know … so I wanted to get a rough intuition for why.
People wind up ascending the tree, level by level, asking at each level, “How many numbers between 1 and x do we see at that level?”
-
All the powers of 2 are “height-0 numbers”, because they get to 1 without passing through any odds. For sure, there are at least \log_2 x height-0 numbers less than x.
-
Above them are the height-1 numbers, which involve one odd in their trajectory, such as 5 \rightarrow 16 \rightsquigarrow 1, or 21 \rightarrow 64 \rightsquigarrow 1.
-
Above those are the height-2 numbers, like 13 \rightsquigarrow 5 \rightsquigarrow 1.
The plan is to consider all numbers less than x and sum up how many (for sure) are height-0, height-1, height-2, and so on, without any pre-set bound.
Here’s a histogram that counts height-h numbers less than x = 1,000,000:
- There are at least \log_2 x height-0 numbers less than x.
- With some work, we can find that there are at least \frac{1}{2} (\log_2 (3x+1)) - 1 height-1 numbers less than x.
- For height-2 numbers, there are at least \cfrac{\frac{1}{2}(\log_2 x + 6.16-8)^2}{12} plus \cfrac{\frac{1}{2}(\log_2 x + 1.16-8)^2}{12} of them.
If we sum those up, we’re still in the \log x-type range, not yet in the \sqrt{x} range. To get there, we need to sum over infinite heights. (So far, we’ve definitely missed start number 27, for example, which is a height-41 number!)
The trouble comes when you try to push this to height-3, height-4, etc., because the Collatz tree is so pattern-less. So you wind up having to “pretend” that the tree keeps following a certain regular pattern in such a way that guarantees an undercount of height-h numbers.
In the end, you might get something like this (and this is almost surely not really-real):
\sum_{h=0}^\infty \cfrac{(\frac{\log_2 x}{6} - h)^h}{h!} > x^{0.13}
which says, considering numbers between 1 and x (for any x), the Collatz tree contains at least the square root of the square root of x of them.
In pretending that the tree has a regular pattern, we wind up severely undercounting the actual counts of height-h numbers in the tree:
(Reference: Math Kook book, Chapter 11)

