This post is more on the computer side, less on the math side.
Here are start numbers (in binary), labeled {\color{red}0} if their trajectory ends in 5-8-4-2-1 and {\color{red}1} if it ends in 16-8-4-2-1. (This presumes the shortcut map, where an odd n goes to (3n+1)/2, so that 5 steps directly to 8.)
1001010100101011111 {\color{red}0} (start number 305503 in decimal)
11010110011010001001 {\color{red}0}
111100100110011011 {\color{red}0}
1000011101111111 {\color{red}0}
100100011101100001 {\color{red}0}
110111110000101 {\color{red}1}
1111101101100101 {\color{red}1}
1000110101111111 {\color{red}1}
1010010101100101111 {\color{red}0}
11101001101101111 {\color{red}1}
…
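A minimal sketch of how labeled examples like these could be generated, assuming the shortcut map (odd n goes to (3n+1)/2), which is the reading under which 5 steps directly to 8. The names `shortcut_step`, `label_start`, and `make_examples` are made up for illustration, and the step cap and sampling range are arbitrary choices.

```python
import random

def shortcut_step(n):
    """One step of the shortcut map: n/2 if n is even, (3n+1)/2 if n is odd."""
    return n // 2 if n % 2 == 0 else (3 * n + 1) // 2

def label_start(n, max_steps=10_000):
    """Return 0 if the trajectory enters 8 from 5, 1 if it enters 8 from 16.

    Returns None for 1, 2, 4, 8 (already in the tail) or if max_steps runs out.
    """
    if n < 1:
        raise ValueError("start number must be a positive integer")
    for _ in range(max_steps):
        if n in (1, 2, 4, 8):
            return None
        nxt = shortcut_step(n)
        if nxt == 8:
            # 5 and 16 are the only predecessors of 8 under the shortcut map.
            return 0 if n == 5 else 1
        n = nxt
    return None

def make_examples(count, bits=20, seed=0):
    """Sample `count` (binary string, label) pairs with start numbers below 2**bits."""
    rng = random.Random(seed)
    examples = []
    while len(examples) < count:
        n = rng.randrange(9, 1 << bits)  # skip the tail values 1..8
        label = label_start(n)
        if label is not None:
            examples.append((format(n, "b"), label))
    return examples
```

Calling `make_examples(1000, bits=20, seed=0)` would give a reproducible batch of (binary string, label) pairs; a different seed gives a disjoint-ish batch for testing.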
And here are parity sequences (not start numbers!), labeled {\color{red}1} if they correspond to integer cycle shapes for some 3n+d problem (with \lvert d \rvert < 2^{\text{length}}-3^{\text{weight}}, where length is the number of bits and weight is the number of 1s), and {\color{red}0} otherwise. This is the concept of a near-cycle.
00010001111 {\color{red}1} (3n+95 cycle)
111111011101101001011110000 {\color{red}1} (3n+5 cycle)
11101010 {\color{red}0} (3n+13 cycle, yes, but 13 \not\lt 2^8-3^5 = 13)
1111111100000 {\color{red}0}
1111111010000 {\color{red}0}
0010100111111 {\color{red}1} (3n+233 cycle)
0011001101111 {\color{red}1} (3n+233 cycle)
0001011111101 {\color{red}0}
0001111101101 {\color{red}0}
…
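One way such labels could be computed, as a sketch under my own reading of the definition: read the parity sequence left to right under the shortcut map (1 = odd step (3n+d)/2, 0 = halving), form the usual numerator β and the denominator D = 2^length − 3^weight of the rational cycle, and take d = D / gcd(β, D) as the smallest d that makes the cycle integral. This reproduces the 95, 13, and 233 values quoted in the list above, but the function names and the treatment of edge cases (repeated patterns, D < 0) are assumptions, not something stated in the post.

```python
from math import gcd

def cycle_numerator(parity):
    """beta = sum over the 1s of 3^(w-1-j) * 2^(i_j), with i_j the position of the j-th 1."""
    ones = [i for i, bit in enumerate(parity) if bit == "1"]
    w = len(ones)
    return sum(3 ** (w - 1 - j) * 2 ** i for j, i in enumerate(ones))

def minimal_d(parity):
    """Smallest-magnitude d for which the cycle with this parity sequence is integral in 3n+d."""
    if set(parity) - {"0", "1"} or "1" not in parity:
        raise ValueError("expected a 0/1 string containing at least one 1")
    D = 2 ** len(parity) - 3 ** parity.count("1")
    beta = cycle_numerator(parity)
    # The rational cycle starts at d * beta / D, so d must cancel D / gcd(beta, D).
    return D // gcd(beta, D)

def near_cycle_label(parity):
    """1 if |d| < 2^length - 3^weight (read literally, so D <= 0 gives 0), else 0."""
    D = 2 ** len(parity) - 3 ** parity.count("1")
    return 1 if abs(minimal_d(parity)) < D else 0

print(minimal_d("00010001111"), near_cycle_label("00010001111"))  # 95 1
print(minimal_d("11101010"), near_cycle_label("11101010"))        # 13 0
```

Note that this sketch does not treat repeated patterns (a shorter parity sequence written out several times) specially; whether those should count as near-cycles is left open here.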
In each case, we have a potentially-infinite supply of train/test examples for a classifier or AI system to locate predictive relationships between input sequence and output class.
Maybe the AI system doesn’t learn the whole concept, but if it does better than chance (on test data), it may have found a piece of the puzzle. Just for example, it might trivially learn that any power of 2 from 16 up (any input of the form 10^* with at least four zeros) will go through 16 rather than 5 … or, less trivially, that no high cycle parity vector will be a near-cycle.
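A candidate rule like the power-of-2 one can be checked directly against the generator. Here is a small sketch that measures how often inputs matching a rule really carry the predicted label; `label_start`, `rule_precision`, and `is_power_of_two` are hypothetical helper names, and the labeler again assumes the shortcut map and start numbers above 8.

```python
import re

def label_start(n):
    """0 if the shortcut trajectory enters 8 from 5, 1 if it enters from 16 (needs n > 8)."""
    if n <= 8:
        raise ValueError("this sketch only handles start numbers above 8")
    while True:
        nxt = n // 2 if n % 2 == 0 else (3 * n + 1) // 2
        if nxt == 8:
            return 0 if n == 5 else 1
        n = nxt

def is_power_of_two(bits):
    """True if the binary string has the form 1 followed by zeros."""
    return re.fullmatch(r"10*", bits) is not None

def rule_precision(rule, predicted_label, numbers):
    """Fraction of the numbers matched by `rule` whose true label equals predicted_label."""
    matched = [n for n in numbers if rule(format(n, "b"))]
    if not matched:
        return None  # the rule never fired on this sample
    hits = sum(label_start(n) == predicted_label for n in matched)
    return hits / len(matched)

print(rule_precision(is_power_of_two, 1, range(9, 1 << 16)))  # 1.0
```

The same `rule_precision` call works for any rule a classifier suggests, as long as it can be written as a predicate on the binary string.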
Interested to know if anyone’s tried this type of classification and/or machine-assisted discovery, or if there are other data sets that might be generated and tried.