5-type, or 16-type?

The post by @Fibra about unmarked cells reminds me of another question …

If the Collatz conjecture is true, then all start numbers go to 1. In that case, every start number m falls into one of three categories:

  • ones that go m ... 5, 8, 4, 2, 1 (“5-type”).
  • ones that go m ... 16, 8, 4, 2, 1 (“16-type”).
  • the finite set of numbers {1, 2, 4, 8} that hit neither 5 nor 16.

If m is a power of 2, it’s definitely a “16-type” number (at least for m>8). What if m is a power of three? What if m is prime? Do other numerical features correlate with these sets?

How many start numbers between 1 and 1000 are “5-type” versus “16-type”?
How about between 1 and 10^9?
How about between 1 and (arbitrarily given) n? Are there bounds?

Do the “5-type” and “16-type” sets both have positive density, or are almost all “5-type”?

1 Like

My paper “Binary expression of ancestors in the Collatz graph” gives a systematic way (using regular expressions) to describe the set of Collatz-ancestors of x such that there are k odd terms on the way to x (not including x).

It’s a mess :slight_smile:

My library coreli used to compute them for you (using an old version, e.g. commit hash 8a12ea40)

Examples

5-type

The ancestors of 5 that see 3 odd terms (other than 5) on the way are described by the following regex:

  (001011110110100001)*001011110110100(000111)*00(01)*01(0)*
| (001011110110100001)*00101111011010000(100011)*100(01)*01(0)*
| (001011110110100001)*001011110(110001)*1100(01)*01(0)*
| (001011110110100001)*00101(111000)*11100(01)*01(0)*
| (001011110110100001)*001(011100)*011100(01)*01(0)*
| (001011110110100001)*00101111011(001110)*0(01)*01(0)*
| (001011110110100001)*001011110110100(000111)*0001(10)*1(0)*
| (001011110110100001)*00101111011010000(100011)*10001(10)*1(0)*
| (001011110110100001)*001011110(110001)*110001(10)*1(0)*
| (001011110110100001)*00101(111000)*1(10)*1(0)*
| (001011110110100001)*001(011100)*01(10)*1(0)*
| (001011110110100001)*00101111011(001110)*001(10)*1(0)*

For instance, taking from the top branch we can construct 001011110110100 00 0101 01 corresponding to integer 1553429 of which Collatz sequence until reaching 5 is (odd integers marked with *):

* 1553429  
  2330144
  1165072
  582536
  291268
  145634
* 72817 
  109226
* 54613
  81920
  40960
  20480
  10240
  5120
  2560
  1280
  640
  320
  160
  80
  40
  20
  10
* 5

16-type

The regex for 16 and 3 odd integers is:

  (100101111011010000)*1001011110110100(000111)*00(01)*01(0)*
| (100101111011010000)*100101111011010000(100011)*100(01)*01(0)*
| (100101111011010000)*1001011110(110001)*1100(01)*01(0)*
| (100101111011010000)*100101(111000)*11100(01)*01(0)*
| (100101111011010000)*1001(011100)*011100(01)*01(0)*
| (100101111011010000)*100101111011(001110)*0(01)*01(0)*
| (100101111011010000)*1001011110110100(000111)*0001(10)*1(0)*
| (100101111011010000)*100101111011010000(100011)*10001(10)*1(0)*
| (100101111011010000)*1001011110(110001)*110001(10)*1(0)*
| (100101111011010000)*100101(111000)*1(10)*1(0)*
| (100101111011010000)*1001(011100)*01(10)*1(0)*
| (100101111011010000)*100101111011(001110)*001(10)*1(0)*

For instance, taking an integer from the second branch you can construct:

100101111011010000 100101111011010000 100011 100011 100 01 01 01 0

Corresponding to integer 170803185867525674 of which Collatz sequence until seeing 16 is (odd integers marked *):

  170803185867525674
* 85401592933762837
  128102389400644256
  64051194700322128
  32025597350161064
  16012798675080532
  8006399337540266
* 4003199668770133
  6004799503155200
  3002399751577600
  1501199875788800
  750599937894400
  375299968947200
  187649984473600
  93824992236800
  46912496118400
  23456248059200
  11728124029600
  5864062014800
  2932031007400
  1466015503700
  733007751850
* 366503875925
  549755813888
  274877906944
  137438953472
  68719476736
  34359738368
  17179869184
  8589934592
  4294967296
  2147483648
  1073741824
  536870912
  268435456
  134217728
  67108864
  33554432
  16777216
  8388608
  4194304
  2097152
  1048576
  524288
  262144
  131072
  65536
  32768
  16384
  8192
  4096
  2048
  1024
  512
  256
  128
  64
  32
  16

A few points:

  • The number of branches of the regex is the same for all x (not multiple of 3) when you fix k, namely 2^{(k-1)} 3^{(k-1)(k-2)/2}, here with k=2 we get 2^2 \times 3 = 12 branches (See Theorem 1 there is an off-by-one error in the statement..), which is the number of branches we got in practice for 5 and 16 (separated by | above).

  • The size (i.e. number of characters) of the regex is also roughly the same at fixed k (when x is not a multiple of 3)


Coming back to your question:

Do the “5-type” and “16-type” sets both have positive density, or are almost all “5-type”?

The structure of the predecessor set of 5 and 16 is the same: infinite union of the regular expressions for each value of k, and each of these regex have the same structure. Hence, without being an expert about density arguments but given this 1-1 correspondence and knowing the structure of these regexes, I’m pretty sure it implies the densities are both positive.

2 Likes

This is my first post. It is an area that I have looked at. Starting values that lead to 5 and to 16 will have the same densities, off set by a small starting adjustment.
1 2 4 8 16 32 64 128 256
85 170 340
21 42 84
5 10 20 40 80 160
53
13 26 52
3 6 12 24
The pattern of connecting branches has a structure dependent on a branch type that is defined by the root mod 3 value. Branch 5 is type 2 as 5 mod 3 is 2. It has a first following even node 10 that connect a type 3 branch with 3 mod 3 as 0. The extension of branch 1 past 16 has the next node at 64 which connects a type 3 branch of value 21. The connection pattern for branch types in the subtrees that connect after node 10 on branch 5 is identical to those that connect after node 64 on branch 1. The subtree with a root 20 connects to 10 by the the division rule. An identical structured subtree of the same size and density and distribution of branches of the same type has a root at 128 and connects to node 64 on branch 1. Both subtrees are infinite in size but for a specified depth measured from the respective subtree roots of 20 and 128, both will have the same number of total terms in all connected branches . This is due to the fact that a set of branches that connect to consecutive nodes on a type 1 or type 2 branch (I call them colinked branch sets), cycle branch types in type 1, type 2, type 3. order. There are six permutations of two branch types that connect colinked branch set that begin with one of the three different types. The whole set repeats modulo 18.

@jeff317 and @cosmo , great! Every time I start crawling up the Collatz tree, I think I see a repetition, then it looks like chaos again! These posts clear a bunch of the chaos away.

Here are some empirical numbers to add:

Among two-digit start numbers, 91.1% are “5-type”, and 8.9% are “16-type”.

Among three-digit numbers, 93.8% are “5-type”.

Among four-digit numbers, 94.1% are “5-type”.

Among a sampling of numbers less than 2^{30}, 93.7% are “5-type”.

Density says, consider the start numbers ranging from 1 to n, for various n … then as n increases, does the percentage of “5-type” start numbers (versus “16-type”), get closer and closer to some fixed constant?

It might converge on 50% (as easily imagined), or on 94.129283%, or on 100% (as composites do, versus primes, even though there’s an infinity of primes), or on 0% (as powers of 2 do) …

If we stick to the relative percentage of “5-type” versus “16-type”, then we can gloss over the possibility of other types of numbers, such as ones that get stuck in huge as-yet-undiscovered cycles, or even the possibility that a vanishing percentage of numbers ever reach 1 (the best we know is that something like O(n^{0.84}) of them do, which approaches 0% as n increases).

Might I clarify the method for defining the meaning of percentage when applied to numbers in the Collatz tree. All branches of the tree are infinite in extent. One third are type 3 branches with root mod 3 equal to 0. These are bare branches with no connection nodes. All connection nodes are on type 1 and 2 branches as defined by the root mod 3 value.
Orbits in the tree zigzag in an explicit way, there is a pattern. For any node there may be larger and smaller numbers connecting as child branches. The number of child values in a subtree grows exponentially from the root of the subtree. We are comparing two exponential growths with x axis offsets and the one with a lower root value will have a higher percentage than the delayed start graph. It will stabilize to a set percentage for ranges defined on a linear interval of the natural numbers. BUT! If the subtree root we are comparing are nodes on similar branch types with a similar branch type connection, the subtree topology will be identical. In this case the number of terms that exist in the subtree to a nominated depth of levels will stabilize at 50% for each subtree. The subtree with the larger root will have terms that have a higher average value. Defining the percentage by the number of term up to a defined value, the percentage will be higher for the subtree with a lower root value on average. The value of the percentage will grow larger as the difference in the root values of the two subtrees increases. I don;t know if this helps.

1 Like

Right on.

I’m also gonna guess that around 95% of all start numbers from 1 to n are “5-type”, no matter how big n gets.

I do believe in a one-to-one correspondence among respective members of the 5-tree and the 16-tree, though it could be like the one-to-one correspondence between primes and composites (match the first prime with the first composite, etc), where the percentage of primes between 1 and n still approaches 0%, as n increases … similar to the one-to-one correspondence between natural numbers and perfect squares.

1 Like

According to @cosmo , the correspondence is straightforward when the number of odd terms is 3. Take an ancestor of 5 with this property, add a power of 2 whose exponent is the number of digits (in base 2) plus 2 and you will obtain an ancestor of 16. Both numbers have the same parity vector up to 5 and 32, respectively. This comes from the fact that 5+3^3=32. For example, the integer 11 reaches 5 and its orbit encounters two other odd values (17 and 13 in that order). Add 11 and 2^6 to get 75 and you have its corresponding integer in branch 16. More generally, if k is the number of digits and q the number of odd terms on k+2 iterations, then T^{k+2}(n+2^{k+2}) = T^{k+2}(n) + 3^q which is easy to establish. The hard part is the regex found by @cosmo which gives the exact base 2 representations of the integers.

This explains why branches 5 and 32 in the Collatz inverse tree are perfectly isomorphic up to 3 odd terms (see Fig. 1 in this great paper The 3x + 1 Problem and its Generalizations from Lagarias ). I didn’t know why up to now :slight_smile:

However, this marvelous symmetry is broken when the number of odd terms increases (e.g., take 7 which is an immediate predecessor of 11 in branch 5 and has no corresponding integer with same parity vector in branch 16). So I suppose the correspondence becomes more complicated with four odd terms or more. At least, the parity vector is no longer preserved.

Here, it might be interesting to study whether the branches 5 and 5+3^4=86 are isomorphic up to 4 odd terms… But 86 is not in branch 16 as shown by the Collatz sequence 86, 43, 65, 98, 49, 74, 37, 56, 28, 14, 7, 11, 17, 26, 13, 20, 10, 5.

For a better description of the structure of the Collatz inverse tree, there are some usefull references:

Right, if the Collatz tree were regular, it wouldn’t look like this!

My original question about the relative counts of 5-type versus 16-type numbers seems hard. I like how @jeff317 calls the 16-branch a “delayed start graph”.

It might be within reach to answer the original question, though. For example, here’s how we know that between 1 and x, at least \sqrt{x} of those numbers appear somewhere in the Collatz tree rooted at 1.

Maybe similar reasoning could be done to obtain: (1) a count f_5(x) of numbers less than x that definitely appear somewhere in the 5-branch, (2) a similar count f_{16}(x), and (3) a convergent \frac{f_5(x)}{f_{16}(x)} as x goes to infinity?

The Collatz tree is basically just a binary tree padded out with even values that are not connection nodes, and with type 3 branches that have no nodes (they have a factor of 3 in the root of the branch) . One third of even numbers in the Collatz tree are 6k+4 format and are are connection nodes.

Can I use the binary tree as a simplified model of the Collatz system, to explain my view of density? Branch 1 is a power series of powers of 2, the same as for the Collatz system. Column 2 has the terms 2 and 3 and the two subtrees at 2 and 3 are identical sizes. In each column of the expanding binary tree subtree 2 has the first half of the range of values between two consecutive powers of 2 and subtree 3 has the second half of the same range. As this is true for all following columns, the percentage of values in subtree 2 and subtree 3 is 50% for any depth down the tree.

Subtree 2 contains subtrees 4 and 5. Each has one fourth of the column. The percentage of terms when comparing subtree 3 with subtree 4 will be 66% as subtree 3 has half a column and subtree 4 has one fourth of a column in all following columns.

The Collatz tree connects a new branch at every second term on branches with root mod 3 equal to 1 or 2. Type 1 and 2 branches have nodes out of phase with each other. On average, at larger depths each subtree will contain new branches added at a rate of 2/3 of a branch for each increment of one term along the branch multiplied by the number of current branches in the subtree. On average, there will be equal numbers of all three branch types, as branches that are added to a single branch will cycle between the three types. Once a percentage is established, the value will not drift for greater range but will stabilize on one value that depends on the depth of the root of the subtrees being compared. The subtrees both grow exponentially at the same rate. I hope this helps.

1 Like

@oros the correspondence is straightforward for an arbitrary number of odd terms:

See proof of Theorem 1:

  • The number of branches is fixed by k (with, here, k+1 the number of odd terms), and there are \pi_k = |(\mathbb{Z}/3^k\mathbb{Z})^*| = 2 \times 3^{k-1} of them.
  • Branch i corresponds to the ability of a suffix to reach 2_k^{-i} which is the i th power of the multiplicative inverse of 2 in \mathbb{Z}/3^k\mathbb{Z}
  • Hence, for instance if you take 4 odd terms, or any odd term, the correspondence still exists: for each integer y_0 able to reach x_0 (not multiple of three) you can construct its “symmetric” y_1 for x_1 (not multiple of three) by choosing from the “same” branch where “same” is defined by choosing the same power of 2^{-1}_k (“the symmetric” is uniquely defined once you impose to take the same number of repetitions for groups marked with a *)
  • @oros : defined that way, the parity vector p_0 going from y_0 to x_0 and the parity vector p_1 going from y_1 to x_1 will only differ in its trailing number of even terms, i.e. there exists m < \pi_{k+1} such that p_0 = p_1 (0)^m or p_1 = p_0 (0)^m, and m, technically speaking is the difference |i_0 - i_1| when writing x_0 = 2^{i_0}_{k+1} and x_1 = 2^{i_1}_{k+1} with minimal powers.

Intuitively, I believe that what you said for k=3 is related to the fact that 5 and 16 are next to each other when iterating powers of 2^{-1}_3 = 14:

[1, 14, 7, 17, 22, 11, 19, 23, 25, 26, 13, 20, 10, 5, 16, 8, 4, 2]

This figure also helps:

(note that this paper uses convention ↓ = x/2 and ← = (3x+1)/2, which is the opposite of what I’ve later used in my thesis… sorry..)

@mathkook : I never said the Collatz graph was regular. I (my article) said that the Collatz graph, or more precisely, the predecessor set of any integer, is an infinite union of regular languages (each language corresponding to a successive “odd-distance” (i.e. number of odd terms between two numbers in a Collatz sequence)).

@cosmo : so it is easy for you compute the corresponding integer of 7 in branch 16, right? I would like to see it in base 2…

@oros there is a confusion here: I said that there was a symmetry between both branches number i in the predecessor regexes (at same k) of two different numbers (non multiple of 3).

For instance, talking about the symmetry between branch number 4 of \texttt{reg}_k(5) and branch number 4 of \texttt{reg}_k(16): parity vectors will be the same, up to the endings 0s, as claimed above.

I was not comparing two branches of the same regex which is what I think you are asking for?

1 Like

@oros or maybe I should read “branch 16” in your previous message as “the regex of 16”?

Because we have 7 \in \texttt{reg}_4(5) and so you are asking for the corresponding y \in \texttt{reg}_4(16) ?

Let me compute it :slight_smile:

Yes and I expect the least significant digits of this number in base 2 to read : ...1110^* (according to your previous answer)

1 Like

@oros here it is:

  • 7 goes to 5 with parity vector [1,1,1,0,1,0,0]
  • 2^{-1}_4 = 41 and 5 = 2_4^{-31} and 16 = 2_4^{-50}
  • 50 - 31 = 19

Hence, the parity vector we care about is: [1,1,1,0,1,0,0] + [0]*19, which brings 13256071 to 16.

And on 26 bits (total length of the parity vector) the binary representation of 13256071 is 00110010100100010110000111.

Is this what you asked?

The corresponding Collatz sequence is (odd integers marked *):

[13256071, *
 19884107, *
 29826161, *
 44739242,
 22369621, *
 33554432,
 16777216,
 8388608,
 4194304,
 2097152,
 1048576,
 524288,
 262144,
 131072,
 65536,
 32768,
 16384,
 8192,
 4096,
 2048,
 1024,
 512,
 256,
 128,
 64,
 32,
 16]

@oros the key idea is simple: the output of a parity vector with k odd terms lands in (\mathbb{Z}/3^k\mathbb{Z})^* and adding 0s at the end of parity vector corresponds to keep multiplying the output by 2_k^{-1}, which is a generator of the group, hence, at some point you will encounter whichever element you want.

And of course, you expect me saying that, but it is very easy to see with the tiles, see Chapter 1.6 exactly on that topic.

Ok, thank you!
It is surprising that the corresponding integer for 7 is as large as 13256071.

To summarize, the respective parity vectors of the corresponding integers are not that different, we just need to append a number of trailing zeros on one branch. But this is sufficient to break the isomorphism between branch 5 and branch 32.

Here’s a way to rephrase the question at the top of this thread.

The computer picks a random start number n between 1 and 10^{10,000}.

  • If n hits 1 by passing through 16, I pay you $5.
  • If n hits 1 by passing through 5, you pay me $1.

Would you take this bet?

My guess is that you would win $5.