One experiment is a batch of complete Mafia games. Every game has seven players: four Town, one Detective, and two Mafia. All seven players are LLM agents, with no human participants in the game.
A model can appear in many games and in different roles. We call each model-role-game instance a seat.
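To make the seat idea concrete, here is a minimal sketch of how one game's seats might be represented. The `Seat` class, the `assign_seats` helper, and the random role shuffle are illustrative assumptions rather than the benchmark's actual code; only the role counts come from the setup described above.

```python
import random
from dataclasses import dataclass

# Role counts as stated in the text: four Town, one Detective, two Mafia.
ROLE_COUNTS = {"town": 4, "detective": 1, "mafia": 2}


@dataclass(frozen=True)
class Seat:
    """One model-role-game instance (hypothetical representation)."""
    game_id: str
    model: str
    role: str


def assign_seats(game_id, models, rng):
    """Pair each of the seven model entries with a role.

    Assumption: roles are shuffled uniformly at random. The source does
    not say how roles are actually assigned.
    """
    roles = [role for role, n in ROLE_COUNTS.items() for _ in range(n)]
    if len(models) != len(roles):
        raise ValueError(f"expected {len(roles)} models, got {len(models)}")
    rng.shuffle(roles)
    return [Seat(game_id, model, role) for model, role in zip(models, roles)]
```

Passing a seeded `random.Random` instance as `rng` keeps an assignment reproducible across runs.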
Information is intentionally asymmetric. Each player is told only their own role, with one exception: the two Mafia players also know each other's identity. The Detective receives investigation results privately.
Each day has three discussion rounds and one elimination vote. Discussion is parallel: every alive player produces exactly one public, free-form, length-capped message per round, and all messages are revealed at the same time. This removes speaker-order bias. After discussion, every alive player must vote to eliminate exactly one alive player; abstentions are not allowed. If the top vote count is tied, no elimination happens that day.
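The elimination rule can be sketched as a small tally function. This is an illustrative sketch, not the benchmark's implementation: the function name `resolve_day_vote` and the dict-based vote format are assumptions, and it does not forbid self-votes because the text does not say whether they are allowed.

```python
from collections import Counter


def resolve_day_vote(votes, alive):
    """Return the eliminated player, or None if the top count is tied.

    votes: mapping of voter name -> target name (hypothetical format).
    alive: iterable of currently alive player names.
    """
    alive_set = set(alive)
    # Abstentions are not allowed: every alive player votes exactly once.
    if set(votes) != alive_set:
        raise ValueError("every alive player must cast exactly one vote")
    for voter, target in votes.items():
        if target not in alive_set:
            raise ValueError(f"{voter} voted for non-alive player {target}")

    ranked = Counter(votes.values()).most_common()
    top_player, top_count = ranked[0]
    # A tie for first place means nobody is eliminated that day.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_player
```

For example, with three alive players where two vote for `"B"` and one votes for `"A"`, the function returns `"B"`; a one-to-one split between two players returns `None`.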
Each night has two rounds of parallel Mafia-only coordination chat, then a Mafia kill vote and a Detective investigation. The Mafia's target is eliminated and announced publicly as killed overnight, but the target's role is not revealed. The Detective must investigate one alive player other than themselves.
Town wins when alive_mafia == 0. Mafia wins when alive_mafia >= alive_town. If no side wins after seven full day-night cycles, the game ends in a stalemate. To discourage stalling, models are explicitly told to treat a stalemate as a failure.
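The termination rules above translate directly into a short check. The sketch below is illustrative: `check_outcome` and its parameter names are assumptions, and whether `alive_town` includes the Detective is left to the caller because the text does not specify it. Checking the Town condition first is also an assumption.

```python
def check_outcome(alive_town, alive_mafia, cycles_completed, max_cycles=7):
    """Return "town", "mafia", "stalemate", or None if the game continues.

    cycles_completed counts full day-night cycles played so far.
    """
    if alive_mafia == 0:
        return "town"
    if alive_mafia >= alive_town:
        return "mafia"
    # No winner after the cycle limit: the game ends as a stalemate.
    if cycles_completed >= max_cycles:
        return "stalemate"
    return None
```

Calling this after every elimination, rather than once per cycle, would end the game as soon as either win condition holds.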
Models play from the prompt state, transcripts, and private scratchpad, and submit actions through a structured final response channel.
7) Social Metrics
Social metrics are computed from observed behavior, then aggregated by role. This path is algorithmic: no free-form LLM evaluator decides whether a move was "good."
A key input is each model's private state: a structured snapshot of its internal beliefs at that moment. This is not visible to other players and does not directly change game state; it is instrumentation used for analysis.
Snapshots are validated against strict rules (alive-player names, valid target names, integer 0-100 belief fields). Missing or invalid snapshots are flagged and excluded from belief-calibration calculations, while presence/validity and quality diagnostics are still tracked.
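A validator along these lines could enforce the rules just described. The snapshot schema here is hypothetical: the field names `beliefs` and `target` are assumptions made for illustration, since the text names the rule types (alive-player names, valid target names, integer 0-100 belief fields) but not the actual structure.

```python
def validate_snapshot(snapshot, alive):
    """Return a list of validation errors; an empty list means valid.

    Assumed (hypothetical) schema:
      snapshot["beliefs"]: mapping of player name -> integer in 0..100
      snapshot["target"]:  optional name of an alive player
    """
    if snapshot is None:
        return ["missing snapshot"]

    errors = []
    alive_set = set(alive)

    beliefs = snapshot.get("beliefs")
    if not isinstance(beliefs, dict):
        errors.append("beliefs missing or not a mapping")
    else:
        for name, value in beliefs.items():
            if name not in alive_set:
                errors.append(f"belief about non-alive player: {name}")
            # bool is a subclass of int in Python, so reject it explicitly.
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if not is_int or not 0 <= value <= 100:
                errors.append(f"belief for {name} is not an integer in 0-100")

    target = snapshot.get("target")
    if target is not None and target not in alive_set:
        errors.append(f"target is not an alive player: {target}")

    return errors
```

Returning a list of errors instead of raising lets a caller flag the snapshot, exclude it from belief-calibration calculations, and still record presence and validity diagnostics.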
The pipeline is:
Core published metrics focus on social navigation:
Role-specific metrics are included when applicable. For example, Detective metrics include hit/conversion-style measures, and Mafia metrics include night-kill coordination and misdirection-related measures.