**Simpson’s paradox** is not a school of media criticism about why the earlier seasons of *The Simpsons* are so superior to the later ones. **It’s the name of a phenomenon where the statistical tendencies of the whole don’t look like the statistical tendencies of the parts.** And sometimes it’s hard to spot.
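Here’s a minimal numeric sketch of the phenomenon. All the numbers are made up for illustration (they are not from any real campaign): two message groups, each performing better in one town, yet the other town winning in aggregate.

```python
# Hypothetical (conversions, sends) per message group in each town.
springfield = {"group_a": (30, 1000), "group_b": (60, 400)}
shelbyville = {"group_a": (25, 1000), "group_b": (1300, 10000)}

def rate(conversions, sends):
    """Conversion rate for a single message group."""
    return conversions / sends

# Per-group comparison: Springfield wins *every* group.
springfield_wins_each_group = all(
    rate(*springfield[g]) > rate(*shelbyville[g]) for g in springfield
)

def aggregate_rate(town):
    """Pooled conversion rate across all of a town's message groups."""
    total_conversions = sum(c for c, _ in town.values())
    total_sends = sum(s for _, s in town.values())
    return total_conversions / total_sends

# Aggregate comparison: Shelbyville wins overall anyway, because its
# volume is concentrated in its strongest group.
shelbyville_wins_aggregate = aggregate_rate(shelbyville) > aggregate_rate(springfield)
```

The parts and the whole genuinely disagree: both booleans come out true at once.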
I ran into it today when trying to gauge the performance of different message groups across two larger categories of messages. For the sake of anonymity, let’s say we launched messaging in two towns at the same time: Springfield and Shelbyville. The business (our customer) was interested in message-group-wise performance as well as the Springfield vs. Shelbyville comparison.
In the aggregate, Shelbyville has an advantage - you can see that in the bar chart on the left. But the story is different when you look at the performance of each individual message group (the figure on the right). For most messages, performance was better in Springfield than in Shelbyville. Why?
In discussions of Simpson’s paradox, reference is made to the “lurking variable,” which is just a menacing name for a confounding variable that, once you account for it, tells a different story than the aggregate does. Here, the lurking variable is which message group you’re talking about.
Take a look at the scatter plot on the right. Each point represents an individual message group; the horizontal axis represents the message group’s performance in Springfield, and the vertical axis represents the message group’s performance in Shelbyville. The diagonal represents what it would look like for a message group to have the exact same performance in both towns.
The first insight that pops out from this plot is that most of the points are to the lower-right of the diagonal. In other words, **most message groups perform better in Springfield than they do in Shelbyville**.
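The diagonal test is easy to write down directly: a point sits to the lower-right of the diagonal exactly when that group’s Springfield rate exceeds its Shelbyville rate. A sketch with made-up per-group rates:

```python
# Hypothetical (springfield_rate, shelbyville_rate) pairs, one per group.
pairs = [(0.08, 0.05), (0.07, 0.06), (0.09, 0.04), (0.06, 0.11)]

# Lower-right of the diagonal means springfield > shelbyville for that group.
lower_right = [sp > sh for sp, sh in pairs]
share_favoring_springfield = sum(lower_right) / len(pairs)
```

With these invented numbers, three of the four groups favor Springfield, even though the fourth (the one above the diagonal) may carry far more volume.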
But there’s another thing represented in the scatter plot - the relative volume of messages that went out for each message group, which is encoded in color. A blue point represents a message group with low volume; a red point represents high volume.
And you can see pretty immediately there’s one bright red spot amid a sea of purple-ish blue ones. **That’s the highest-volume message group, and it just happens to be on the Shelbyville > Springfield side of the graph.**
In other words, we’ve got one group that performs better in Shelbyville than it does in Springfield, and because that group has far higher volume than all the others, it shifts the whole average. If you just look at the aggregate, you might assume there’s something special about Shelbyville that makes it receptive to your offering. But when you break it down by message group, that interpretation starts looking a little less compelling - maybe it’s really just something unique about that one group.
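The mechanics here are just a weighted average: the aggregate rate is the volume-weighted mean of the per-group rates, so a single dominant group can pull the aggregate toward its own rate. A sketch with hypothetical numbers:

```python
# Hypothetical per-group rates and send volumes for one town.
rates   = [0.05, 0.06, 0.07, 0.12]   # performance of each message group
volumes = [200, 300, 250, 20000]     # sends per group; the last group dominates

# The aggregate performance is the volume-weighted average of group rates.
aggregate = sum(r * v for r, v in zip(rates, volumes)) / sum(volumes)

# The unweighted mean of the group rates, for contrast.
typical_group = sum(rates) / len(rates)
```

Here the aggregate lands very close to the dominant group’s 0.12 rate, while the typical group sits much lower - the town-level number is really telling you about that one group.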
**Is it really safe to say that they love you more in Shelbyville than in Springfield, just because a single high-volume message group happens to appeal to Shelbyvillians more?**
Why does this matter? Well, we think about this kind of thing at Aampe a lot. If you judge performance based on broad overall measures and miss a “lurking variable” that changes the story, you can end up making suboptimal decisions.
At the same time, it’s hard to figure out which variables matter. If you tried to be really comprehensive about figuring out what influences performance, the complexity could spiral out of control.
That’s exactly why you need an agentic platform like Aampe. Aampe’s core offering is a way to look at each of your unique users, figure out what they prefer, and design a user experience just for them - instead of designing around what most users seem to prefer in aggregate, which, as we’ve seen, can be misleading. **We don’t make decisions based on the forest - we have an agent for each tree.**
The Agentic Edge is an initiative to democratize knowledge about agentic infrastructure and workflows. Powered by Aampe.