How many numbers between 1 and x appear in the Collatz tree?

mathkook · August 2, 2025, 9:19pm

(No real question here, just background for possible other posts in this area.)

Looking at the Collatz tree, people ask, “How many numbers are in the tree?” Probably all of the numbers … but maybe not. Well, at least some of them are in there. How many for sure? All the powers of 2 are in there, so … infinitely many?

That’s kind of unsatisfying, so people instead ask, “How many numbers from 1 to x are definitely in the tree?” Then they look for a formula f(x) for arbitrary x.

Again, probably all of them (f(x)=x), but nobody can show that yet. At least 99% of them, f(x) \geq 0.99x? At least 1% of them, f(x) \geq 0.01x? Turns out, for large x, not even one-percent of numbers up to x are known to be in the Collatz tree.

If you dig through the literature, you pretty quickly come across the limit of current knowledge, something like “at least x^{0.84} of them”. To a math kook, that looks like 84%, but it’s way less, so I’m writing instead the similar but more evocative “at least \sqrt{x} of them” (x^{0.5}). That gives the idea that as x gets bigger, we can only prove that a vanishingly small percentage of numbers from 1 to x are (for sure) there in the tree. For example, among numbers less than 10^{1000}, we know that 10^{500} are in the tree, which is 0.00000...00001% of them.

The proofs are … you know … so I wanted to get a rough intuition for why.

People wind up ascending the tree, level by level, asking at each level, “How many numbers between 1 and x do we see at that level?”

All the powers of 2 are “height-0 numbers”, because they get to 1 without passing through any odds. For sure, there are at least \log_2 x height-0 numbers less than x.
Above them are the height-1 numbers, which involve one odd in their trajectory, such as 5 \rightarrow 16 \rightsquigarrow 1, or 21 \rightarrow 64 \rightsquigarrow 1.
Above those are the height-2 numbers, like 13 \rightsquigarrow 5 \rightsquigarrow 1.

The plan is to consider all numbers less than x and sum up how many (for sure) are height-0, height-1, height-2, and so on, without any pre-set bound.

Here’s a histogram that counts height-h numbers less than x = 1,000,000:

There are at least \log_2 x height-0 numbers less than x.
With some work, we can find that there are at least \frac{1}{2} (\log_2 (3x+1)) - 1 height-1 numbers less than x.
For height-2 numbers, there are at least \cfrac{\frac{1}{2}(\log_2 x + 6.16-8)^2}{12} plus \cfrac{\frac{1}{2}(\log_2 x + 1.16-8)^2}{12} of them.

If we sum those up, we’re still in the \log x-type range, not yet in the \sqrt{x} range. To get there, we need to sum over infinite heights. (So far, we’ve definitely missed start number 27, for example, which is a height-41 number!)

The trouble comes when you try to push this to height-3, height-4, etc., because the Collatz tree is so pattern-less. So you wind up having to “pretend” that the tree keeps following a certain regular pattern in such a way that guarantees an undercount of height-h numbers.

In the end, you might get something like this (and this is almost surely not really-real):

\sum_{h=0}^\infty \cfrac{(\frac{\log_2 x}{6} - h)^h}{h!} > x^{0.13}

which says, considering numbers between 1 and x (for any x), the Collatz tree contains at least the square root of the square root of x of them.

In pretending that the tree has a regular pattern, we wind up severely undercounting the actual counts of height-h numbers in the tree:

(Reference: Math Kook book, Chapter 11)

stargazer07817 · August 8, 2025, 4:20pm

Wirsching does a good job walking through these ideas. Overall, To get a polynomial bound, you have to sum over many heights (up to c\log x) and control how residues mod 3^k distribute as you climb the inverse tree. Or more generally, for each fixed height h, the number of n\le x at that height is only about (\log x)^h in size. Said another way, height 0 contributes \asymp \log x, height 1 contributes \asymp (\log x)^2, etc. You have to go to unbounded heights to get a polynomial in x. I like this problem because it wraps several “classic” approaches to understanding the conjecture all into one package that has a concrete outcome to understand.

mathkook · August 8, 2025, 4:46pm

@stargazer07817 Many thanks! I don’t understand why I can’t read proofs yet. I can read theorems, and I can skim a proof for the main idea, but beyond that I have to work it out. Wirsching’s book mocks me from my shelf!

Anyway, looks like the right idea.

A question: did you say that you can get an O(n^k) bound by summing

over heights from 1 to c \log x, or
over unbounded heights?

It seems like maybe it’s easier mathematically to do unbounded height, although to get a result as weak as O(n^k), it seems like some finite number of heights should be enough.

PS. By O(n^k) bound, I mean something like “at least c n^{0.5} start numbers from 1 to n are in the Collatz tree.”

stargazer07817 · August 8, 2025, 6:30pm

To get any bound of the form “at least c\,x^{k} with k>0,” you have to let heights grow with x (essentially up to a constant multiple of \log x).

If you stop at any fixed number of heights, you can’t give x^{k} for any k>0. Summing all the way to infinity is really just a convenience, since the sum is automatically zero beyond O(\log x) anyway, so saying “unbounded heights” and up to c\log x are the same.