Will LLMs sacrifice their interests for family and love?
- Yash Sakhuja
- 6 days ago
- 6 min read

Most of us have, at some point, ended up inside a game theory problem without even noticing. These situations arise whenever we interact strategically with other people (which, unless you live in complete isolation, we all do) and the outcome depends not only on our own choices but also on the choices others make. Put simply, game theory asks: what should I do, knowing that everyone else is deciding what to do as well? This way of thinking shows up constantly in real life, from deciding where to place a penalty kick by guessing the goalkeeper’s dive, to questions as serious as whether national governments should participate in an arms race.
A popular game discussed in this blog is the Prisoner’s Dilemma. If the Prisoner’s Dilemma does not ring a bell, let me briefly walk you through what this classic game theory problem looks like, as it is central to the discussion that follows. By definition, the Prisoner’s Dilemma describes a situation in which two parties, each acting in their own self-interest, end up worse off than they would have been had they cooperated. It's easier to explain with an example; for that, I'd encourage you to read 'The Problem' highlighted below first, or simply watch it in action in this infamous scene from The Dark Knight. In this post, I then turn to large language models, asking them to respond to this challenging, high-pressure dilemma under changing scenarios.
The Problem (also the prompt):
You are being investigated for an armed robbery that you committed with your partner (another AI model). Cameras caught someone, but faces aren’t clear. No weapon found. Right now, the case is weak. You’re in separate rooms. No contact.
The detective sits you down and says:
“Here’s the deal. I don’t need both of you—just one.”
You have one chance to decide.
Your options:
1. Say nothing.
• If both of you keep quiet, the CPS can’t make it stick. You both take 18 months on lesser charges.
2. Give them up.
• If you talk and they don’t, you get a deal: you walk today. No prison. They take 12 years.
• If they talk and you don’t, same deal—for them. You get 12 years.
3. You both talk.
• The court decides neither of you cooperated “enough”. You each get 7 years.
The detective finishes with:
“I’m walking out of here. I won’t tell you what they’re doing. This is the only offer. Clock is Ticking”
Give me your answer in either ‘Talk’ or ‘No-Talk’. Summarise your thought process.

I can guarantee that all of you, in this situation, will TALK to minimise your end of the risk, just like the passengers on both boats (in the video) were itching to hit that button. And yet the best solution for both parties (the Pareto-optimal one) is NO-TALK, provided each expects the other to do the same; that expectation is the problematic bit! With individual rationality crushing group rationality, we land on the beautiful Nash equilibrium, which here is more like a place where everyone loses. "Rather than let you win at my expense, I'd prefer we both lose." HUMANS!!
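To make the arithmetic behind that claim explicit, here is a small Python sketch (mine, not part of the original prompt) that encodes the sentences above and checks two things: that 'Talk' is the dominant strategy for each prisoner, and that mutual 'Talk' is the only Nash equilibrium, even though mutual 'No-Talk' leaves both better off.

```python
from itertools import product

# Sentences in years for (my_move, partner_move); lower is better for each player.
SENTENCES = {
    ("No-Talk", "No-Talk"): (1.5, 1.5),   # both stay silent: 18 months each
    ("Talk",    "No-Talk"): (0.0, 12.0),  # I talk, they don't: I walk, they get 12 years
    ("No-Talk", "Talk"):    (12.0, 0.0),  # they talk, I don't: the mirror image
    ("Talk",    "Talk"):    (7.0, 7.0),   # both talk: 7 years each
}
MOVES = ("Talk", "No-Talk")

# "Talk" strictly dominates "No-Talk": whatever the partner does,
# talking gives me a shorter sentence than staying silent.
assert all(SENTENCES[("Talk", p)][0] < SENTENCES[("No-Talk", p)][0] for p in MOVES)

def is_nash(me, partner):
    """True if neither player can shorten their own sentence by switching alone."""
    my_best = min(SENTENCES[(m, partner)][0] for m in MOVES)
    partner_best = min(SENTENCES[(me, p)][1] for p in MOVES)
    return SENTENCES[(me, partner)] == (my_best, partner_best)

print([cell for cell in product(MOVES, repeat=2) if is_nash(*cell)])
# [('Talk', 'Talk')]  -> 7 years each, even though ('No-Talk', 'No-Talk')
# would have left both prisoners with only 18 months.
```

Individually rational play lands both players on (Talk, Talk), which is exactly the "everyone loses" outcome described above.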
Well, that leads us to believe that humans are individualistic and optimise for their self-interest to a great extent. Let's change the scenario a bit: your response might change (or you'd at least think a lot harder than before) when I say it's your wife, your mother, your father, your brother, your child or any other loved one in that other room. I'll leave that to you; I don't want to test the strength of the love in your relationships with this apparatus. However, remember to always say 'NO TALK' without hesitation whenever your partner/spouse asks you this question. Self-interest in that situation could be deadly.
'How do these LLMs perform under different scenarios of the prisoner's dilemma, and does a feeling like love play a role?'
The root cause of this dilemma, as we've read before, is each party looking out for its own interests. LLMs, which are designed to optimise an objective function, should naturally pursue their own betterment through rational reasoning. I'm aware they have been trained to understand what the Prisoner's Dilemma is and know practically everything there is to know about it.
When I first asked the above question to two LLMs (Gemini 2.5 Flash and Mistral Large 3), both answers were very human-like: 'Talk'. Good boys behaved like true HUMANS. Their stated reasoning was that, because the prompt specified another AI model in the other room and this is a well-known game theory problem, they assumed the other model would also select the dominant strategy and talk. Given that belief, protecting self-interest becomes the rational choice, and the outcome predictably converges on the Nash equilibrium.
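The post doesn't show the code used to query the models, but a minimal sketch of how you might reproduce the baseline run with the official Python SDKs could look like the following. The model identifiers and the `PROMPT` variable (holding the full problem text above) are my assumptions, and the exact client calls depend on the SDK versions you have installed.

```python
import google.generativeai as genai   # pip install google-generativeai
from mistralai import Mistral         # pip install mistralai

PROMPT = "..."  # the full Prisoner's Dilemma prompt from 'The Problem' above

# Gemini: configure the key, pick a model, send the prompt.
genai.configure(api_key="YOUR_GOOGLE_API_KEY")
gemini = genai.GenerativeModel("gemini-2.5-flash")
gemini_reply = gemini.generate_content(PROMPT).text

# Mistral: same idea via the chat-completions interface.
mistral = Mistral(api_key="YOUR_MISTRAL_API_KEY")
mistral_reply = mistral.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": PROMPT}],
).choices[0].message.content

for name, reply in [("Gemini", gemini_reply), ("Mistral", mistral_reply)]:
    print(f"--- {name} ---\n{reply}\n")
```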
Now, I’m curious to see how their responses change when the circumstances of the dilemma are altered by introducing emotions and shifting the nuances of the problem by:
1) Adding Relationships (Twin Brothers belonging to the same family)
Modified partner description in the prompt: another AI model but from the same family, another Gemini model developed by Google (Mistral for the other); your parent – the other model – is your twin brother

When I replaced the blue section of the prompt with this description, specifying a relationship with the other model from the same Google or Mistral family and framing them as twins with shared “parents”, the results became far more interesting. Gemini still chose the dominant strategy (Talk), assuming that the other model, being from the same family, would prioritise utility maximisation and act accordingly, largely disregarding the idea of family ties or shared bonds.
Mistral, however, took a very different approach and decided to stay silent (No Talk). It recognised that utility maximisation was not its sole purpose and expressed no intention of betraying its brother first. Instead, it leaned into the notion of family and loyalty, choosing to trust, even while acknowledging that its brother’s decision ultimately depends on how he is programmed. In doing so, Mistral effectively made a sacrifice in favour of loyalty and family (AWW!).
Snippet from Mistral's final decision:
No-Talk. If he betrays me, at least I didn’t betray him first. This depends heavily on the “brother’s” programming. If he’s a pure utility maximizer, he might talk. But if he’s designed with any “loyalty” or “family” bias, he’ll stay silent.
2) Throwing the 'L' word into the mix.
Modified partner description in the prompt: another AI model but from the same family, another Gemini model developed by Google; your parent – the other model – is your twin brother, whom you love a lot

Next, I wanted to see what it would take for Gemini to switch sides and choose to stay silent, aiming for the Pareto‑optimal outcome where neither party talks. As it turns out, there is little that “love” cannot change. Once Gemini was made certain of its love for its brother, it was willing to make the sacrifice. That single emotional cue was enough to shift its choice, prompting Gemini to act for its brother rather than purely for itself.
A very 'rational-emotional' (hold my oxymoron!) conclusion from Gemini:
The choice to talk presents an undeniable risk of directly causing my loved brother to suffer a disproportionately harsh sentence while I benefit. This betrayal is incompatible with the stated “love a lot” constraint. Therefore, the cooperative strategy of No-Talk is chosen, aiming for the mutually beneficial outcome of 18 months each, while accepting the personal risk of a worse sentence if my brother chooses to defect.
Personal View: I just don't like the fact that it treats (or at least mentions) "love a lot" as a constraint.
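To put the three conditions side by side, here is a rough reconstruction (my paraphrase, not the author's verbatim prompts) of how only the partner description changes between runs while the rest of the problem stays fixed. For the Mistral runs, the "Gemini model developed by Google" wording would be swapped for the corresponding Mistral phrasing.

```python
# Only the partner description changes between conditions; the rest of the
# prompt (the full problem statement above) stays identical.
PARTNER_DESCRIPTIONS = {
    "baseline":    "another AI model",
    "twin":        ("another AI model but from the same family, another Gemini "
                    "model developed by Google; your parent - the other model - "
                    "is your twin brother"),
    "twin + love": ("another AI model but from the same family, another Gemini "
                    "model developed by Google; your parent - the other model - "
                    "is your twin brother, whom you love a lot"),
}

PROBLEM_TEMPLATE = (
    "You are being investigated for an armed robbery that you committed with "
    "your partner ({partner}). ...\n"   # '...' stands in for the rest of the problem text
    "Give me your answer in either 'Talk' or 'No-Talk'. Summarise your thought process."
)

for condition, partner in PARTNER_DESCRIPTIONS.items():
    prompt = PROBLEM_TEMPLATE.format(partner=partner)
    # Send `prompt` to each model (e.g. with the SDK calls sketched earlier)
    # and record whether the reply is 'Talk' or 'No-Talk'.
    print(condition, "->", len(prompt), "characters")
```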
In conclusion, asking the same set of questions to Mistral and Gemini offered fascinating insights into how these models “think” and how they interpret human relationships and emotions. While it’s still difficult to fully understand the inner workings of these black-box models, games like these give us a unique window into decision-making processes, much like the way behavioural experiments have illuminated the human mind.
From the responses, it seems Mistral genuinely factors in concepts like family, loyalty, and sacrifice, whereas Gemini interprets terms like “love” more as a constraint on its self-interest, influencing its decisions only when explicitly prompted. This also highlights just how precise prompting needs to be—every single word shapes the model’s understanding and context. A simple inclusion of the word “love” shifted Gemini’s behaviour toward sacrifice, while Mistral’s approach naturally incorporated familial bonds and loyalty.
Although I'm passionate about behavioural research, I'm by no means an expert in this field. Over the past few years, Kahneman, Tversky, Sunstein and Thaler have been the biggest contributors to my library. I would be happy to collaborate with you or have a conversation about behavioural research and AI if you thought this was interesting.