Beta distribution draws and signal strength

A probability is not a point estimate

Schaun Wheeler

Oct 04, 2024

How do you make a computer program show curiosity? You randomly draw from a beta distribution. Yes, I'm serious.

A core problem in agentic learning is the navigation of the explore/exploit tradeoff. Agents have many options to choose from when trying to engage users, and they always need to balance the goal of taking advantage of lessons already learned with the goal of trying options that haven't yet been tried. If an agent only ever explores, it never optimizes. If an agent only ever exploits, it prematurely optimizes and ends up at a local maximum - something that's better than some options, but not nearly as good as it could be.

Aampe agents store two different weights for every possible action in each of their action spaces - a probability of influence (roughly analogous to an expected success rate), but also a measure of signal strength, which encodes how much evidence is backing the probability. The agent needs both of those so it can hedge its bets when it has to operate on low evidence, and can double down when it's able to operate on high evidence.

Also, this video references another video on Interrupted Time Series analysis. You can find that here:

How agents view behavioral signals

Schaun Wheeler

October 4, 2024

Agents that work on the basis of behavioral data need to be able to make judgements about how successful (or not) their actions are. Unlike an A/B test or a multi-armed bandit, where you use success rates over many users to determine the relative value of different options, an agent needs to be able to try one specific action with one specific user and …
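Returning to the beta draws described above, here is a minimal Python sketch of how a probability and a signal strength could jointly drive a random draw. It is an illustration under stated assumptions, not Aampe's actual implementation: the post does not say how the two weights map onto beta parameters, so the mapping used here (alpha = probability × signal strength, beta = (1 − probability) × signal strength) is an assumption, and the names `ActionWeights`, `choose_action`, and `spread` are hypothetical. Picking the action with the largest draw is the standard Thompson-sampling pattern, swapped in here as a plausible way to use the draws.

```python
import random
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ActionWeights:
    """Two weights per action, as described in the post. Names are hypothetical."""
    probability: float      # expected success rate, strictly between 0 and 1
    signal_strength: float  # evidence behind it, > 0 (assumed to act as a pseudo-count)

    def beta_params(self) -> Tuple[float, float]:
        # ASSUMED mapping: mean = alpha / (alpha + beta), evidence = alpha + beta.
        alpha = self.probability * self.signal_strength
        beta = (1.0 - self.probability) * self.signal_strength
        return alpha, beta

    def draw(self, rng: random.Random) -> float:
        # One random draw from the beta distribution for this action.
        alpha, beta = self.beta_params()
        return rng.betavariate(alpha, beta)


def choose_action(actions: Dict[str, ActionWeights], rng: random.Random) -> str:
    """Draw once per action and pick the largest draw (Thompson-style sampling)."""
    draws = {name: weights.draw(rng) for name, weights in actions.items()}
    return max(draws, key=draws.get)


def spread(weights: ActionWeights, rng: random.Random, n: int = 2000) -> float:
    """Empirical standard deviation of n draws, to show how evidence narrows them."""
    samples = [weights.draw(rng) for _ in range(n)]
    mean = sum(samples) / n
    return (sum((s - mean) ** 2 for s in samples) / n) ** 0.5
```

With the same probability, an action with low signal strength produces widely scattered draws, so it sometimes wins the comparison and gets explored. An action with high signal strength produces draws clustered near its probability, so the agent effectively exploits what it already knows.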
