Good suggestion, especially when it’s time to look at extrapolation to instances larger than those in the training set. I’m ignoring that for now – as a quick check, though, taking the base-24 TST predictor trained on 100,000 start numbers between 1 and 10^{12} and applying it to the original test set of small start numbers between 80,001 and 100,000 gives an accuracy of 0\%.
For TST, I upped the training set to one million start numbers between 1 and 10^{12}. For comparison, the previous result was:
- 100k training start numbers and their TSTs, written in base-24.
- Result: 41\% accuracy on train data, 10\% on test data
- Example instances:
  - 2 8 1 15 13 9 0 15 19 | 13 13 | 7 16 | Wrong
  - 2 12 16 2 0 8 7 5 9 | 5 7 | 6 10 | Wrong
  - 5 15 16 22 3 6 12 6 2 | 7 22 | 7 22 | Right
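For reference, instances in that format can be generated with a short sketch. I'm assuming here that TST means the total stopping time of the Collatz map (number of steps to reach 1), and that both the start number and its TST are written as base-24 digit sequences, most significant digit first:

```python
def total_stopping_time(n: int) -> int:
    """Collatz total stopping time: steps until n reaches 1."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

def to_base24(n: int) -> list[int]:
    """Digits of n in base 24, most significant first."""
    digits = []
    while n:
        n, d = divmod(n, 24)
        digits.append(d)
    return digits[::-1] or [0]

def make_instance(n: int) -> str:
    """One 'input | target' line in the format shown above."""
    x = " ".join(map(str, to_base24(n)))
    y = " ".join(map(str, to_base24(total_stopping_time(n))))
    return f"{x} | {y}"
```

For example, `make_instance(27)` gives `1 3 | 4 15`, since 27 is `1 3` in base 24 and its total stopping time is 111, which is `4 15` in base 24.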
With 10x training data, using the same test set, we get:
| Epoch | Loss | Train Acc | Test Acc |
|---|---|---|---|
| 10 | 1.088 | 8.91% | 9.64% |
| 50 | 1.068 | 10.38% | 10.81% |
| 100 | 1.044 | 11.00% | 11.11% |
| 150 | 1.041 | 11.51% | 11.84% |
| 200 | 1.044 | 11.82% | 11.58% |
| 250 | 1.027 | 12.09% | 11.91% |
| 300 | 1.025 | 12.30% | 12.30% |
This time it doesn’t overfit, and the test accuracy steadily improves (and is still improving). The question is: what is it learning? Some kind of interpolation?
I’ll ponder on this some more.