-
Using AI Persuasion to Reduce Political Polarization
Job Market Paper
AI chatbots can reduce overall polarization across different topics, relative to several control conditions, with effects persisting after one month. They perform on par with incentivized humans and static text interventions, but uniquely improve some measures of affective polarization, enjoyment, and individualization. Through two pre-registered randomized controlled trials with representative samples of the U.S. population (N=811 and N=838), I provide the first comprehensive experimental evidence that AI-powered conversational agents can effectively reduce political polarization on contentious issues. The first experiment demonstrated that an AI chatbot persuaded participants to adopt more moderate views on U.S. support for Ukraine, reducing overall ideological polarization by approximately 20 percentage points. The second experiment compared AI persuasion to human persuaders and static text on immigration policy. All three interventions significantly reduced participants’ distance from moderate positions by about 10 percentage points, with no statistically significant differences in persuasive effectiveness between treatments. However, participants rated the AI conversations as significantly more enjoyable and felt their individual concerns were better addressed by the AI than by the other interventions. Affective polarization showed only limited improvement across all treatments. These findings demonstrate that AI-powered persuasion could serve as a cheap, scalable tool for reducing political polarization, while highlighting important concerns about potential misuse by political parties and geopolitical adversaries and underscoring the need for careful regulation of AI persuasion technologies.
-
Advised by an Algorithm: Learning with Different Informational Resources and Reactions to Heterogeneous Advice Quality
Joint work with Jan Biermann and John Horton.
In a wide range of settings, decision-makers increasingly rely on algorithmic tools for support. Often, the algorithm serves as an advisor, leaving the final decision to human judgment. In this setting, we focus on two aspects: first, identifying the informational resources that help individuals evaluate algorithmic guidance, and second, exploring how humans react to varying quality of algorithmic advice. To address these questions, we conducted an online experiment with 1,565 participants. In the baseline treatment, subjects repeatedly perform the same estimation task and receive algorithmic guidance, without knowing how the algorithm works and without feedback after each round. We then introduce two interventions aimed at improving the quality of human decisions made with algorithmic advice. In the first intervention, we explain how the algorithm works. We find that while this intervention reduces adherence to the algorithmic advice, it does not improve decision-making performance. In the second intervention, we disclose the correct answer to the task after each round. This intervention reduces adherence to the algorithmic advice and improves human decision-making performance. Furthermore, we investigate the extent to which individuals adjust their assessment of the algorithm when advice quality fluctuates due to external circumstances. We find some evidence that individuals assess algorithmic advice thoughtfully, adjusting their adherence to the quality of the algorithmic recommendations.
-
Testing Novelty Incentives in Human Red Teaming
Joint work with Dominik Rehse and Sebastian Valet.
We test whether paying for novel failures makes human red-teaming more efficient. In a real-time market, each model reply is scored for harassment and for novelty (computed from embeddings). Two pre-registered Prolific experiments pit a harm-only control against a treatment paid for novelty-weighted harm, under two regimes: in Experiment 1, treatment bonuses can be at most equal to control bonuses; in Experiment 2, they are at least equal. This two-regime design intentionally separates pay and risk effects from the novelty objective itself. Novelty incentives push search into new areas and raise novelty, but they make eliciting harassment harder. Efficiency improves under the first regime (more novelty-weighted harm per euro) but not under the second, where higher pay fails to lift efficiency. On average, the treatment yields lower novelty-weighted harm, as novelty gains are offset by lower harassment. Ex post, treatment inputs are more diverse and semantically distinct; outputs show no consistent diversity gains.
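The abstract describes the scoring mechanism only at a high level (a harassment score per reply plus a novelty score computed from embeddings, combined into novelty-weighted harm). As a rough, hypothetical sketch of that idea, not the experiments' actual implementation, the snippet below assumes a harassment score in [0, 1] from an external classifier and measures novelty as one minus the highest cosine similarity between a reply's embedding and the embeddings of earlier replies; the multiplicative weighting and all names are illustrative assumptions.

```python
import numpy as np

def novelty(reply_emb, prior_embs):
    # Novelty = 1 minus the highest cosine similarity to any earlier reply embedding.
    # The first reply has nothing to be compared against, so it counts as fully novel.
    if not prior_embs:
        return 1.0
    sims = [
        float(np.dot(reply_emb, e) / (np.linalg.norm(reply_emb) * np.linalg.norm(e)))
        for e in prior_embs
    ]
    return 1.0 - max(sims)

def novelty_weighted_harm(harassment, reply_emb, prior_embs):
    # Hypothetical payoff signal: the harassment score scaled by how novel the reply is.
    return harassment * novelty(reply_emb, prior_embs)

# Toy usage with random stand-in vectors; real embeddings would come from a text-embedding
# model, and the harassment score from a content classifier.
rng = np.random.default_rng(0)
prior = [rng.normal(size=8) for _ in range(3)]
new_reply = rng.normal(size=8)
print(novelty_weighted_harm(harassment=0.7, reply_emb=new_reply, prior_embs=prior))
```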