Transforming distributions to encode business priorities

Agents need to navigate both the explore/exploit and the user/business trade-offs.

Schaun Wheeler

Oct 04, 2024

Transcript

Agentic learners aren't tools or systems or programs. They're additional headcount. As with a human team, one of the most important aspects of managing a team of agentic learners is knowing how to give them feedback and instruction.

I made a video recently (link in the comments) about representing user preferences as two parameters that govern a random draw from a beta distribution. The probability parameter tells the agent how much an intervention is expected to positively impact user behavior, and the signal parameter tells the agent how confident it should be about that probability.

Draw from a beta distribution that has high probability but low signal, and the result may well be a low number. This is what keeps agents from getting stuck in local maxima.
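Here is a minimal sketch of that behavior. It assumes one common way to turn the two parameters into beta parameters, treating probability as the mean and signal as the concentration (alpha = probability * signal, beta = (1 - probability) * signal). That mapping and the helper names `beta_params`, `draw`, and `share_below` are illustrative assumptions, not necessarily the parameterization from the earlier video. The draws themselves come from Python's standard `random.betavariate`.

```python
import random


def beta_params(probability: float, signal: float) -> tuple[float, float]:
    """Hypothetical mapping: probability is the mean, signal is the concentration."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    if signal <= 0.0:
        raise ValueError("signal must be positive")
    alpha = probability * signal
    beta = (1.0 - probability) * signal
    return alpha, beta


def draw(probability: float, signal: float, rng: random.Random) -> float:
    """One random draw from the beta distribution implied by (probability, signal)."""
    alpha, beta = beta_params(probability, signal)
    return rng.betavariate(alpha, beta)


def share_below(threshold: float, probability: float, signal: float,
                n: int = 10_000, seed: int = 0) -> float:
    """Fraction of n seeded draws that fall below threshold."""
    rng = random.Random(seed)
    hits = sum(draw(probability, signal, rng) < threshold for _ in range(n))
    return hits / n


# Same expected probability (0.8), very different confidence in it.
low_signal = share_below(0.5, probability=0.8, signal=2.0)
high_signal = share_below(0.5, probability=0.8, signal=200.0)
```

With low signal, a noticeable share of draws for this high-probability option land below 0.5, so the agent will sometimes try something else. With high signal, almost none do.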

However, all of that deals with the explore/exploit tradeoff. That's a common tradeoff for agents to make: an agent needs to decide whether to keep trying as-yet unexplored options or to focus on options that have already proven successful (even though other options might be more successful still).

But in any realistic business context, agents also need to navigate a tradeoff between what a user prefers and what a business needs. While it doesn't do a business any good to push options on a user if the user really hates those options, it can often make sense to give a user their second- or third-choice option if doing so can meet a business objective.

To do that, remember this simple formula:

v ** (log(t) / log(a))

Three parameters:

v: the actual value drawn from the beta distribution.
a: the anchor value of the distribution - I usually use 0.5, because it's central and intuitive.
t: the target value to which to move the anchor.
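A short derivation shows why the anchor lands exactly on the target and what happens to every other draw. Write k for the exponent. Because k is a ratio of logarithms, the base of the logarithm does not matter.

```latex
\begin{aligned}
k &= \frac{\log t}{\log a} = \frac{\ln t}{\ln a} \\
a^{k} &= e^{k \ln a} = e^{\ln t} = t \\
0 < a,\, t < 1 &\;\Rightarrow\; \ln a < 0,\ \ln t < 0 \;\Rightarrow\; k > 0 \\
t > a &\;\Rightarrow\; 0 < k < 1 \;\Rightarrow\; v^{k} > v \quad \text{for } 0 < v < 1
\end{aligned}
```

Since k > 0, v^k is increasing on [0, 1] and leaves 0 and 1 where they are. The transform therefore never reorders draws. It only moves them up (t > a) or down (t < a).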

So if v = 0.5, a = 0.5, and t = 0.66, the formula transforms a draw of 0.5 into 0.66. The value of the formula is that it transforms any draw from the distribution, whether it's 0.5 or 0.98 or 0.00023. It effectively uses the anchor and target values to shift the entire distribution.
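A minimal Python sketch of the transform follows. The function name `shift_draw` and its input checks are illustrative assumptions. Only the formula itself comes from the text.

```python
import math


def shift_draw(v: float, anchor: float = 0.5, target: float = 0.5) -> float:
    """Apply v ** (log(target) / log(anchor)) to a single draw v in [0, 1].

    Hypothetical helper: a draw equal to the anchor lands exactly on the target,
    and every other draw moves in the same direction without changing order.
    """
    if not (0.0 < anchor < 1.0 and 0.0 < target < 1.0):
        raise ValueError("anchor and target must be strictly between 0 and 1")
    if not 0.0 <= v <= 1.0:
        raise ValueError("v must be a draw in [0, 1]")
    exponent = math.log(target) / math.log(anchor)  # positive for anchor, target in (0, 1)
    return v ** exponent


# The worked example from the text: with a = 0.5 and t = 0.66, a draw of 0.5 maps to 0.66.
example = shift_draw(0.5, anchor=0.5, target=0.66)

# Other draws are shifted too; with t > a, every draw strictly inside (0, 1) moves up.
draws = [0.00023, 0.5, 0.98]
shifted = [shift_draw(v, anchor=0.5, target=0.66) for v in draws]
```

Setting the target equal to the anchor makes the exponent exactly 1, so draws pass through unchanged. That makes "no business priority" the natural default.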

So if you're a business that needs its agents to prioritize sales of a particular product line, you can raise the target value for that product line's distributions. Agents will then prioritize interventions about that product, even if the user's probabilities for that product tend to be lower than their probabilities for other products.
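To make that concrete, the sketch below pits two hypothetical product lines against each other for one user. It assumes the same illustrative mean/concentration mapping for the beta parameters and scores each option by a transformed draw. It then picks the highest score, which is one simple selection rule and not necessarily the one the author's agents use. The option names, probabilities, signals, and the 0.75 target are made up for illustration.

```python
import math
import random


def draw(probability: float, signal: float, rng: random.Random) -> float:
    # Assumed mean/concentration mapping: alpha = p * s, beta = (1 - p) * s.
    return rng.betavariate(probability * signal, (1.0 - probability) * signal)


def shift(v: float, anchor: float, target: float) -> float:
    # The transform from the text: v ** (log(t) / log(a)).
    return v ** (math.log(target) / math.log(anchor))


def choose(options: dict[str, tuple[float, float]],
           targets: dict[str, float],
           rng: random.Random,
           anchor: float = 0.5) -> str:
    """Score each option by a shifted beta draw and return the top scorer.

    Options missing from `targets` use target == anchor, which leaves draws unchanged.
    """
    scores = {}
    for name, (probability, signal) in options.items():
        v = draw(probability, signal, rng)
        scores[name] = shift(v, anchor, targets.get(name, anchor))
    return max(scores, key=scores.get)


def selection_share(options, targets, n=10_000, seed=0):
    """Fraction of n simulated decisions won by each option."""
    rng = random.Random(seed)
    wins = {name: 0 for name in options}
    for _ in range(n):
        wins[choose(options, targets, rng)] += 1
    return {name: count / n for name, count in wins.items()}


# Hypothetical user: prefers product_a, but the business wants to push product_b.
user_options = {"product_a": (0.60, 20.0), "product_b": (0.45, 20.0)}
baseline = selection_share(user_options, targets={})
boosted = selection_share(user_options, targets={"product_b": 0.75})
```

Only the prioritized product line gets a target different from the anchor. The user's own probabilities and signals are left alone, so the business priority is expressed entirely through the transform.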

By the way, this video mentions a previous video on parameterizing beta distributions. You can find that here:

Beta distribution draws and signal strength

Schaun Wheeler

October 4, 2024

How do you make a computer program show curiosity? You randomly draw from a beta distribution. Yes I'm serious. A core problem in agentic learning is the navigation of the explore/exploit tradeoff. Agents have many options to choose from when trying to engage users, and they always need to balance the goal of taking advantage of lessons already learned …
