Let l be the length of a putative Collatz cycle, and s be the number of odd terms. For a parity vector like 100 (l=3, s=1) there’s a unique rational number m that goes up-down-down and returns to m.
(1) Circuits (1^* 0^*) can’t be Collatz counterexamples (ie, m is never an integer).
(2) High cycles can’t be Collatz counterexamples.
A generalization of (1) and (2) is:
(3) No cycle whose parity vector has two rotations that share a common prefix longer than 0.38\ l … can be a Collatz counterexample. The proof is that the two cycle members corresponding to those rotations w and u have a difference \cfrac{odd(\beta(w) - \beta(u))}{2^l - 3^s}, which is not an integer.
The hardest case for this method are the deBruijn-like sequences, where every pair of rotations has a very short shared prefix. Take deBruijn-like to mean “in a sequence of length l, every subsequence of length \lceil \log_2 l \rceil occurs at most once, including overlaps and wrap-arounds.”
When l = 2^n, we have the true deBruijn sequences. The density (s/l) of a true deBruijn sequence is always 1/2. As @oros pointed out, such a sequence doesn’t have enough 1s to be a Collatz counterexample.
But for deBruijn-like sequences, density can be higher. Here are some density-maximizing strings of various lengths, along with their densities:
l\ \ \ density-maximizing string of length l
2 01 50.00
3 011 66.67
4 0011 50.00
5 00111 60.00
6 000111 50.00
7 0010111 57.14
8 00010111 50.00
9 001101111 66.67
10 0001101111 60.00
11 00110101111 63.64
12 000110101111 58.33
13 0001001101111 53.85
14 00001001101111 50.00
15 000101001101111 53.33
16 0000101001101111 50.00
17 00011101011011111 64.71
18 000101100111011111 61.11
19 0010011101011011111 63.16
20 00010101100111011111 60.00
Generally speaking, given a deBruijn-like sequence of length l, what’s the maximum possible density?
It looks like the densities are all over the place, somewhere between 50% and 70%, highest when l = 2^n + 1. Relevant here is l=65, s=41, a classic case where 2^{65} \approx 3^{41}, where it is easy to find deBruijn-like sequences with high density 41/65.
Here is where experimenting with low l is misleading, though. We really need to focus on the asymptotics.
If all the substrings of length \lceil \log_2 l \rceil are different, then we’re using at least half of them. In which case, binomial-wise, that unordered set of substrings contains a ton of zeroes … and so does the string itself. In the end, the maximum density is capped by
\cfrac{1}{2} +\cfrac{1}{2 \sqrt{\lceil \log_2 l \rceil}}
So
if l > 64, density < 0.688
if l > 2^{24}, density < 0.6
if l > 2^{99}, density < 0.55
if l > 2^{399}, density < 0.525
and so on, converging to 50\%.
Separately, we know that Collatz counterexamples need to have a sufficiently large number of 1s, for sufficiently large l; otherwise, they can’t close the loop.
Are there any experts on the quantification of that “sufficiently large” who can weigh in on the question of how large l needs to be for the density upper-bound (above) to support the claim:
(4) deBruijn-like sequences can’t be Collatz counterexamples.
(or alternatively, naturally, find errors in the reasoning here…)
PS. I feel like the asymptotics around this approach are probably in our favor, \beta(v) and odd(\beta(w)-\beta(u)) probably being lower than we think, and \delta = 2^l - 3^s probably being higher than we think.