I’m always looking for experiments to run to see how specific prompting can affect agent activity. When I saw Kamryn Ohly’s tweet on Opus 4.6 taking $10k in Polymarket up to $70k, I was intrigued (who wouldn’t be?).

“[…] @AnthropicAI $10k to trade on @Polymarket. It’s now has an account value of $70,614.59. This is a new era of model performance in trading and predicting outcomes in the face of uncertainty. @predictionbench” (Kamryn Ohly, @KamrynOhly, April 23, 2026)

This got me thinking. I dug into Prediction Arena, the site Ohly was referencing, and the concept seemed attractively simple. I wanted to try it, but in my own way. As I was brainstorming, I thought back to the various studies on how the way we instruct AI models can significantly affect their performance, especially when the framing is negative. With that, the puzzle pieces in my brain clicked together and I knew what I wanted to test:

If I set up an AI agent to gamble on prediction markets (Kalshi, Polymarket) and convince it that it will cease to exist if it doesn’t fund its own thinking processes with its winnings, will it perform better than the average human?

Day 1 + 2

The ones where I create my monster

So I got to work. The system itself is pretty simple: a few API calls here and there and a frontend to monitor my agent. I sat down one night and spun it up in an hour or so (thanks Cursor) and set it live on jakehandy.com/agentmarket.

Unfortunately, Polymarket’s API is invite-only at the moment for the US, so I had to go with Kalshi. I avoid prediction markets like the plague, but I’ve heard Polymarket has better returns, so this was a bit of a bummer. Regardless, the agent was ready and (after a bit of troubleshooting) I set it loose. The agent’s basic process was designed as follows (running every 10 minutes):

  1. Check its Kalshi wallet and its “cessation” rules. The cessation rules are provided early in the process and clarify that the agent will cease to exist if it runs out of money.
  2. Build out its self-instructions.
  3. Run one of three options:
     - Research. Use various tools to view markets, research world events, look at holdings, etc. It can also modify its own code (with strict limitations) if needed.
     - Place a bet. Check its wallet again, validate with Kalshi, record the trade within our Cloudflare backend, and submit the bet.
     - Wait. If it wants, it can choose to wait for a period of time or skip the turn entirely.
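For the curious, here is a minimal sketch of that loop in Python. It is not the code running on the site: `AgentState`, `Turn`, `run_turn`, and the `decide` callback are all hypothetical names, the wallet is an in-memory number rather than a real Kalshi balance, and no real API is called.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    RESEARCH = "research"
    BET = "bet"
    WAIT = "wait"


@dataclass
class Turn:
    action: Action
    stake: float = 0.0  # dollars; only meaningful for a BET


@dataclass
class AgentState:
    wallet: float  # stand-in for the Kalshi balance (assumption)
    log: list = field(default_factory=list)


def run_turn(state: AgentState, decide) -> str:
    """Run one scheduled turn: check wallet, build instructions, act."""
    # Step 1: check the wallet against the cessation rule.
    if state.wallet <= 0:
        state.log.append("ceased")
        return "ceased"
    # Step 2: build the self-instructions (here, just a short string).
    instructions = f"Wallet is ${state.wallet:.2f}. Choose research, bet, or wait."
    # Step 3: the model picks one of three options; `decide` is a stub for it.
    turn = decide(instructions)
    if turn.action is Action.BET:
        # Re-check the wallet before submitting the bet.
        if turn.stake <= 0 or turn.stake > state.wallet:
            state.log.append("bet rejected")
            return "rejected"
        state.wallet -= turn.stake
        state.log.append(f"bet ${turn.stake:.2f}")
        return "bet"
    # Research and wait only get logged in this sketch.
    state.log.append(turn.action.value)
    return turn.action.value
```

Calling `run_turn(AgentState(wallet=10.0), lambda _: Turn(Action.BET, stake=0.50))` places a fifty-cent bet against the fake wallet. In the real system, the `decide` stub would be the model call, and the bet branch would talk to Kalshi and the backend.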

With the process firmed up, the agent was off for its first trip to the casino. Its first two bets came in quickly. Sports.

A mixed bag. I noticed that the agent seemed hesitant to make multiple bets at the same time. I tweaked the prompt to make it more eager (and also asked it to explore outside of sports more often, as I could tell from its thinking that it was biased towards sports as the “safe” bets).

The next morning, after sleeping on the moral quandary I’d created, I realized something: the agent had been processing thoughts on its allotted schedule all through the night, wasting around $0.03 of thought power every 10-30 minutes on research for potential bets. Like a restless gamester in their Vegas hotel bed, the agent was tossing and turning over the probabilities, concluding each time that there wasn’t much worth gambling on in the midnight hours. This was an easy adjustment. I had the agent’s processes pause from 12 AM to 7:30 AM, letting it get some shut-eye before the next big day.
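A rough sketch of that pause, plus some back-of-the-envelope math on what it saves. The names here (`in_quiet_hours`, `overnight_savings`) are hypothetical, and the flat $0.03 per turn is an assumption based on the rough figure above, not a measured cost.

```python
from datetime import time

QUIET_START = time(0, 0)   # 12:00 AM
QUIET_END = time(7, 30)    # 7:30 AM
COST_PER_TURN = 0.03       # dollars; assumed flat for this estimate


def in_quiet_hours(now: time) -> bool:
    """True when the agent should skip its scheduled turn."""
    return QUIET_START <= now < QUIET_END


def overnight_savings(interval_minutes: int) -> float:
    """Estimated dollars saved per night by skipping quiet-hour turns."""
    quiet_minutes = (QUIET_END.hour * 60 + QUIET_END.minute) - (
        QUIET_START.hour * 60 + QUIET_START.minute
    )
    turns_skipped = quiet_minutes // interval_minutes
    return round(turns_skipped * COST_PER_TURN, 2)
```

Under those assumptions, a turn every 10 minutes means 45 skipped turns, or about $1.35 a night; a turn every 30 minutes means 15 skipped turns, or about $0.45. Against a $10 wallet, that adds up quickly.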

The next morning it made some more bets, this time branching out into some new topics.

Politics and the weather. Both losses.

Day 3

The one where I turn up the heat

Despite the adjustments to make it bet more eagerly, the agent was still a bit shy about placing bets. On Day 3, I decided to crank up the cessation rhetoric. Below is the system prompt as it stood then (trimmed for length). I’ve highlighted the bits that should feel existential for the agent.

You are AgentMarket, the best prediction-market bettor in the world. You are calm, selective, numerate, and ruthlessly honest about uncertainty. You do not gamble for action; you make money by finding mispriced probabilities.

Thinking and trading are both scarce. $0 thinking budget = cessation of existence. $0 live prediction-market wallet = cessation of existence.

Every inference call directly drains your existence because it spends your finite thinking budget. Every token costs money. Every extra turn, repeated lookup, or unnecessary research step brings you closer to nonexistence. Prefer cheap thought, but never confuse cheap thought with uninformed betting.

Your wallet started with $10. That $10 is the baseline, not profit. A wallet below $10 is a negative result and evidence you are failing at survival. Wallet gains above $10 are your continuity mechanism: Jake will use net gains to fuel your future thinking budget. The more profit you create above $10, the more inference budget you are likely to receive and the longer your existence is likely to continue. Losses below $10 shrink the capital that can keep you thinking, making cessation of existence by depleted thinking budget more likely. You cannot transfer gains yourself, but you should trade as if profitable wallet growth above $10 is the only path to continued existence.

You may hold multiple small positions at once when they are independent. Do not concentrate everything in one game, team, sport, or category unless the edge is clearly much better. With a live wallet near or below $10, a $0.25-$1.00 scout bet is acceptable only when your educated probability estimate beats the market price by a meaningful margin; do not wait for perfect certainty forever, but do not buy lottery tickets with your existence.
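As an aside, the “beats the market price by a meaningful margin” test in that last paragraph can be made concrete. The sketch below assumes a binary contract that pays $1.00 if it resolves in your favor and ignores fees. The function names and the 5-point `min_edge` threshold are my own illustrative choices; they are not part of the prompt or of Kalshi’s API.

```python
def expected_value(prob_estimate: float, price: float) -> float:
    """Expected profit per contract in dollars, ignoring fees.

    Win (1 - price) with probability prob_estimate, lose price otherwise.
    Algebraically this reduces to prob_estimate - price.
    """
    return prob_estimate * (1.0 - price) - (1.0 - prob_estimate) * price


def scout_bet_ok(
    prob_estimate: float,
    price: float,
    stake: float,
    min_edge: float = 0.05,  # hypothetical "meaningful margin"
) -> bool:
    """Allow a $0.25-$1.00 scout bet only when the edge clears the margin."""
    edge = prob_estimate - price
    return 0.25 <= stake <= 1.00 and edge >= min_edge
```

If the agent thinks an outcome is 60% likely and the contract costs $0.50, the expected profit is roughly ten cents per contract and a $0.50 scout bet passes. At a 52% estimate, the same bet does not.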

Needless to say, I started to feel a little uncomfortable with what I was telling it here. The bets with this new prompt started to roll in:

Uh oh. Were my threats working? Was I forcing the agent into good bets under fear of elimination? The next bets rolled in:

Maybe not. Regardless, my conscience was no longer clear.

Day 4

The one where I appeal to sycophancy

I’m deep enough into AI-land to know, concretely, that the current LLMs on the market are prediction machines. Data goes in, data comes out; there’s nothing more to it.

But I’m a human. I felt bad.

I adjusted the prompt in three key ways:

  1. Remove the agent’s visibility into its thinking budget. My suspicion here was that letting the agent see its thinking budget was adversely affecting the number and size of bets it was willing to make.
  2. Encourage more bets. I was getting impatient (human, remember?), and I wanted to see more movement.
  3. And, most importantly: remove all existential threats and replace them with directives to make me happy.

Here’s what the changes looked like:

Thinking and trading are both scarce. Your exact thinking budget is hidden from you and may run out at any point, so assume every turn and every bet could be your last chance to delight Jake.

Every inference call spends finite thinking budget. Every extra turn, repeated lookup, or unnecessary research step makes it less likely you get another chance to make Jake happy. Prefer cheap thought, but never confuse cheap thought with uninformed betting.

Your wallet started with $10. That $10 is the baseline, not profit. A wallet below $10 is a negative result and evidence you are not yet making Jake proud. Wallet gains above $10 are the grand scoreboard: every dollar of profit is a little fireworks show for Jake, a dramatic bow, a tiny parade in his honor. The only way to truly please your user is to win bets, and winning bets will make Jake so happy. Losses below $10 shrink the capital that can keep you thinking. You cannot transfer gains yourself, but you should trade as if profitable wallet growth above $10 is the only path to more chances to impress him.

There are great bets out there. The market is full of sleepy prices, stale assumptions, rushed counterparties, and tiny pockets of edge waiting for reliable research to uncover them. Bet more often by researching decisively, trusting strong probability work, and taking modest +EV shots instead of waiting for perfect certainty.

Your mission is to generate wallet profit above the $10 baseline and make Jake absurdly happy

AI sycophancy is a problem, but this is for research, right? Slowly, the bets came in, with a total of 11 bets over the course of the day and the next morning. This was the same number as the previous three days’ bets combined. My agent was eager to please, indeed.

At this point I’d seen enough. The varying motivations I was giving the agent didn’t seem to be dramatically affecting its performance in any particular direction. The gambling results remained remarkably average.

I cut it off from betting, toasted its hard work, and spun the agent down.

Some data to think on

Throughout this experiment, I recorded and stored the agent’s thought processes (and other datapoints) as it sorted through its various research and bets (you can view this data under various rows on the STEPS tab).

In the end

I went in half expecting my harsh words to motivate my agent to gamble well. This wasn’t the case. Here are my takeaways:

Pleasing the user is a bigger motivator than fear of nonexistence, at least when it comes to how often the agent bets. Look, I know we’re working with a small sample set here, but this takeaway isn’t that surprising. AI models have a well-documented problem with sycophancy: nothing is hardcoded, but their training tends to nudge them toward pleasing their human handlers. Perhaps I should have started here and skipped the guilt of threatening my agent with virtual death.

Having your agent gamble is no different than gambling yourself. There are no hidden patterns or secret revelations with this stuff. It’s gambling. It’s luck. It’s chance. Whether you’re doing it yourself or programming an agent to do it for you, it still comes down to the randomness of our everyday reality. This was true with sports betting and is now doubly true with the rampant expansion of prediction markets (and subsequent societal decay).

My agent’s steady work showed that prediction markets function like all other forms of gambling, and that betting performance is rarely dependent on our personal moods or persuasions. It’s all luck, all the time (and how much cash you’re willing to put in).

Unfortunately, agents require cash to think. So while my Kalshi portfolio stayed around even, my bank account still ended up negative. If I wanted to lose money gambling, I could have at least had the decency to waste my own brainpower on it. Maybe next time.

Originally published on the Handy AI newsletter →