Discovery via classifiers and/or AI?

Good suggestion, especially when it's time to look at extrapolation to instances larger than those in the training set. I'm ignoring that for now; for example, taking the base-24 TST predictor trained on 100,000 start numbers between 1 and 10^{12} and applying it to the original test set of small start numbers between 80,001 and 100,000 gives an accuracy of 0%.

For TST, I upped the training to one million start numbers between 1 and 10^{12}. For comparison, the previous result was:

  • 100k training start numbers and their TSTs, written in base-24.
  • Result: 41% accuracy on training data, 10% on test data.
  • Example instances:
    – 2 8 1 15 13 9 0 15 19 | 13 13 | 7 16 | Wrong
    – 2 12 16 2 0 8 7 5 9 | 5 7 | 6 10 | Wrong
    – 5 15 16 22 3 6 12 6 2 | 7 22 | 7 22 | Right
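For concreteness, here is a minimal sketch of how such training instances could be built, assuming "TST" means the Collatz total stopping time (steps until the trajectory reaches 1) and that both the start number and the label are written as base-24 digit strings, as the example instances above suggest. The function names are mine, not from the experiment's actual code:

```python
def collatz_tst(n: int) -> int:
    """Total stopping time: Collatz steps until n reaches 1 (assumed meaning of TST)."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

def to_base24(n: int) -> list[int]:
    """Digits of n in base 24, most significant first."""
    digits = []
    while n:
        digits.append(n % 24)
        n //= 24
    return digits[::-1] or [0]

def make_instance(start: int) -> tuple[list[int], list[int]]:
    """One (input digits, label digits) pair, both in base 24."""
    return to_base24(start), to_base24(collatz_tst(start))

# e.g. an instance for a small start number from the original test range
digits, label = make_instance(80001)
```

Under this reading, an instance like `5 15 16 22 3 6 12 6 2 | 7 22` pairs the base-24 digits of a start number near 10^{12} with a two-digit base-24 TST of 7 × 24 + 22 = 190 steps.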

With 10x training data, using the same test set, we get:

| Epoch | Loss  | Train Acc | Test Acc |
|------:|------:|----------:|---------:|
| 10    | 1.088 | 8.91%     | 9.64%    |
| 50    | 1.068 | 10.38%    | 10.81%   |
| 100   | 1.044 | 11.00%    | 11.11%   |
| 150   | 1.041 | 11.51%    | 11.84%   |
| 200   | 1.044 | 11.82%    | 11.58%   |
| 250   | 1.027 | 12.09%    | 11.91%   |
| 300   | 1.025 | 12.30%    | 12.30%   |

This time it doesn't overfit, and the test accuracy steadily improves (and is still improving). The question is: what is it learning? Some kind of interpolation?

I’ll ponder on this some more.
