How do you make a computer program show curiosity? You randomly draw from a beta distribution. Yes, I'm serious.

A core problem in agentic learning is navigating the explore/exploit tradeoff. Agents have many options to choose from when trying to engage users, and they constantly need to balance taking advantage of lessons already learned against trying options that haven't yet been tested. If an agent only ever explores, it never optimizes. If an agent only ever exploits, it prematurely optimizes and ends up at a local maximum: something that's better than some options, but not nearly as good as it could be.

Aampe agents store two different weights for every possible action in each of their action spaces: a probability of influence (roughly analogous to an expected success rate), and a measure of signal strength, which encodes how much evidence is backing that probability. The agent needs both so it can hedge its bets when it has to operate on low evidence, and double down when it can operate on high evidence.

This video also references another video on Interrupted Time Series analysis. You can find that here:
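Here's a minimal sketch of the beta-draw idea (Thompson sampling) in Python. The action names and counts are invented for illustration, and the two weights are reduced to simple success/failure tallies; Aampe's actual probability-of-influence and signal-strength weights are richer than this. But the core mechanism is the same: each action's evidence defines a beta distribution, the agent draws a random sample from each, and it acts on the highest draw. Strong evidence produces narrow distributions (the agent doubles down); weak evidence produces wide ones (the agent stays curious).

```python
import random

def choose_action(stats):
    """Draw from each action's Beta posterior and pick the highest draw.

    stats maps an action name to (successes, failures). The +1 on each
    parameter is a uniform prior, so untried actions still get sampled.
    """
    draws = {
        action: random.betavariate(successes + 1, failures + 1)
        for action, (successes, failures) in stats.items()
    }
    return max(draws, key=draws.get)

# Two actions with the same observed 90% success rate but very different
# signal strength, plus one weak performer.
stats = {
    "push_morning": (90, 10),   # strong signal: 100 trials
    "push_evening": (9, 1),     # weak signal: 10 trials
    "email_weekly": (2, 8),     # weak signal, low success rate
}

random.seed(0)
counts = {action: 0 for action in stats}
for _ in range(10_000):
    counts[choose_action(stats)] += 1
print(counts)
```

Run this and you'll see the well-evidenced 90% action chosen most often, but the barely-tried 90% action still gets a meaningful share of picks, because its wide beta distribution sometimes produces the highest draw. That's the curiosity: randomness scaled to uncertainty, not randomness for its own sake.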