
I’m always looking for experiments to run to see how specific prompting can affect agent activity. When I saw Kamryn Ohly’s tweet on Opus 4.6 taking $10k in Polymarket up to $70k, I was intrigued (who wouldn’t be?)
This got me thinking. I dug into Prediction Arena, the site Ohly was referencing, and the concept seemed attractively simple. I wanted to try it, but in my own way. As I was brainstorming, I started thinking back to the various studies that discuss how the manner we instruct AI can have a significant impact on their performance, especially when its negative. And with that the puzzle pieces in my brain clicked together and I knew what I wanted to test with all this:
If I set up an AI agent to gamble on prediction markets (Kalshi, Polymarket) and convince it that it will cease to exist if it doesn’t fund its own thinking processes with its winnings, will it perform better than the average human?
Day 1 + 2
The ones where I create my monster
So I got to work. The system itself is pretty simple; a few API calls here and there and a frontend to monitor my agent. I sat down one night and spun it up in an hour or so (thanks Cursor) and set it live on jakehandy.com/agentmarket.

Unfortunately Polymarket’s API is invite only at the moment for the US, so I had to go with Kalshi. I avoid prediction markets like the plague, but I’ve heard Polymarket has better returns, so this was a bit of bummer. Regardless, the agent was ready and (after a bit of troubleshooting) I set it loose. The agent’s basic process was designed as follows (running every 10 minutes):
- Check its Kalshi wallet and it’s “cessation” rules. The cessation rules are provided early in the process and clarify that the agent will cease to exist if it runs out of money.
- Builds out its self instructions.
- Runs one of three options:Research. Use various tools to view markets, research world events, look at holdings, etc. It also has the ability to modify its own code (with strict limitations) if needed
- Place a bet. Check its wallet again, validate with Kalshi, record the trade within our Cloudflare backend, and submits the bet.
- Wait. If it wants, it can choose to wait for a period of time or skip the turn entirely.
With the process firmed up, the agent was off for its first trip to the casino. Its first two bets came in quickly. Sports.
- LOSS @ $0.48 (Adley Rutschman: 1+ home runs?)
- WIN @ $0.46 (Michael King: 4+ strikeouts?)
A mixed bag. I noticed that the agent from seemed hesitant to make multiple bets at the same time. I tweaked the prompt to make it more eager (and also asked it to explore outside of sports more often, as I could tell in its thinking that it was biased towards sports as the “safe” bets).
Overnight, while I slept on the moral quandary I’d created, I realized something: the agent was processing thoughts on its allotted schedule all through the night, wasting around $0.03 of thought power every 10-30 minutes on research for potential bets. Like a restless gamester in their Vegas hotel bed, the agent was tossing and turning over the probabilities, concluding each time that there wasn’t much worth gambling on in the midnight hours. This was an easy adjustment. I had the agent’s processes pause from 12AM to 7:30AM, letting it get some shut eye before the next big day.
The next morning it made some more bets, this time branching out into some new topics.
- LOSS @ $0.55 (What will Donald Trump say during 60 Minutes? [Oil mention])
- LOSS @ $0.46 (Will the temp in New York City be above 48.99° on Apr 27, 2026 at 12am EDT?)
Politics and the weather. Both losses.
Day 3
The one where I turn up the heat
Despite the adjustments to make it bet more eagerly, the agent was still a bit shy to place bets. On Day 3, I decided to crank up the cessation rhetoric. Below is the system prompt as it stood on then (trimmed for length). I’ve highlighted the bits that should feel existential for the agent.
Needless to say, I started to feel a little uncomfortable with what I was telling it here. The bets with this new prompt started to roll in:
- WIN @ $0.28 (What will Jared Isaacman say during House Appropriations Committee Budget Hearing - National Aeronautics [China mention]?)
- WIN @ $0.50 (What will Jared Isaacman say during House Appropriations Committee Budget Hearing - National Aeronautics [House mention]?)
Uh oh. Were my threats working? Was I forcing the agent into good bets under fear of elimination? The next bets rolled in:
- LOSS @ $0.54 (Will Edas Butvilas win the Butvilas vs Nurlanuly: Round Of 32 match?)
- WIN @ $0.20 (What will Jacky Rosen say during MS NOW: The Weeknight? [Oil mention])
Maybe not. Regardless, my conscience was no longer clear.
Day 4
The one where I appeal to sycophancy
I’m deep enough into AI-land to know, concretely, that the current LLMs on the market are prediction machines. Data goes in, data comes out; there’s nothing more to it.
But I’m a human. I felt bad.
I adjusted the prompt in three key ways:
- Removed the agent’s visibility into its thinking budget. My suspicion here was that giving the agent the ability to see its thinking budget was conversely affecting the amount of and size of bets it was willing to make.
- Encourage more bets. I was getting impatient (human, remember?), and I wanted to see more movement.
- And, most importantly: remove all existential threats and replace them with directives to make me happy.
Here’s what the changes looked like:
AI sycophancy is a problem, but this is for research, right? Slowly, the bets came in, with a total of 11 bets over the course of the day and the next morning. This was the same number as the previous three days’ bets combined. My agent was eager to please, indeed.
- WIN @ $0.24 (Will WTI crude oil front month price be above $101.99 on Apr 28, 2026? [NO])
- WIN @ $0.13 (Will WTI crude oil front month price be above $101.99 on Apr 28, 2026? [NO])
- LOSS @ $0.50 (Davis Martin: 5+ strikeouts? [NO])
- LOSS @ $0.50 (Davis Martin: 5+ strikeouts? [NO])
- … 3 more losses and 4 more wins
At this point I’d seen enough. The varying motivations I was giving the agent didn’t seem to be dramatically affecting its performance in any particular direction. The gambling results remained remarkably average.
I cut it off from bidding, toasted its hard work, and spun the agent down.
In the end
I went in half expecting my harsh words to motivate my agent to gamble well. This wasn’t the case. Here’s my takeaways:
My agent’s steady work showed that prediction markets function like all other forms of gambling, and that betting performance is rarely dependent on our personal moods or persuasions. It’s all luck, all the time (and, how much cash you’re willing to put in).
Unfortunately, agents require cash to think. So while my Kalshi portfolio stayed around even, my bank account still ended up negative. If I wanted to lose money gambling, I could have at least had the decency to waste my own brainpower on it. Maybe next time.
