Just to be clear: on the very hardest final test, which combined ultra-hard statistical reasoning with formal reasoning, it maxed out at between 95% and 98% and never got above 99%. But if you look at the difficulty of that test, you'll see there's something a wee bit special about this model. The real test will come with non-synthetic data. The model has a particular knack for picking out the signatures of algos used to obscure data, so I'm also interested to see how it handles true randomness and noise.