Modified: December 28, 2025
continual learning objectives
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

in addition to architectures, infra, etc, the bigger question for a continual learning system is: what are we even doing here?
what is the objective for 'test-time training'? for a learning system in general? no rewards are observed at test time. so we're not doing explicit RL. and we're also not doing pure prediction in the naive sense
do we just think of it as an extension of context window? ie, are we just meta-learning? we have a system whose behavior is parameterized and we just train it on a barrage of different long-context tasks. eg: coding in large codebases, long-running agentic tasks (proving hard math, etc)
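the meta-learning framing above can be sketched concretely. below is a toy first-order meta-learning loop (in the spirit of first-order MAML, not any specific continual-learning recipe): a single scalar parameter is pre-trained across many sampled tasks so that a few inner-loop gradient steps adapt it quickly to any one task. the task distribution, step sizes, and step counts are all made up for illustration.

```python
import random

def make_task():
    """A task is fitting y = w*x with a task-specific true w (assumed range [1, 3])."""
    true_w = random.uniform(1.0, 3.0)
    xs = [random.uniform(-1, 1) for _ in range(20)]
    ys = [true_w * x for x in xs]
    return xs, ys

def loss_grad(w, xs, ys):
    """Gradient of mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def adapt(w, xs, ys, steps=5, lr=0.5):
    """Inner loop: a few gradient steps on one task (the 'test-time' part)."""
    for _ in range(steps):
        w -= lr * loss_grad(w, xs, ys)
    return w

def meta_train(meta_steps=200, meta_lr=0.1):
    """Outer loop: nudge the initialization toward post-adaptation solutions,
    so it becomes a good starting point for the whole task distribution."""
    w0 = 0.0
    for _ in range(meta_steps):
        xs, ys = make_task()
        w_adapted = adapt(w0, xs, ys)
        # First-order update: move the init toward the adapted solution.
        w0 += meta_lr * (w_adapted - w0)
    return w0

random.seed(0)
w0 = meta_train()
print(w0)
```

the analogy to the note above: "long-context tasks" play the role of `make_task`, and the hope is that the outer loop produces a system whose in-context (inner-loop) behavior is good across the task distribution. scaling this from a scalar to a model acting over months is exactly the open problem.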
basically, we'll have metrics and hillclimb on them. then we'll saturate the metrics and we'll need new metrics
ultimately of course the objective is survival / reproduction. do people like the system and want to continue using it? but for design purposes we need proxies for that.
the difficulty with meta-learning is that it gets more expensive as the context gets longer.
so you have to hope that what we learn at shorter contexts generalizes to longer ones
does the bitter lesson run into trouble here? we want methods that scale with increasing compute and data. but once we're trying to train things to act on months- or years-long timelines, we actually can't easily scale the amount of data, because it takes months or years to do each rollout.
coding and math still seem like the answers here: we can say, "write an operating system", "prove this complicated theorem", etc - tasks that represent years of work - and potentially simulate them in a much shorter amount of time. and maybe the learning capabilities we get here generalize, or at least begin to generalize.
this leaves open the question of how we train learning agents to do these sorts of ambitious tasks.