Collection: U.S. Government Science and Technology Reports

Foresighted Policy Gradient Reinforcement Learning: Solving Large-Scale Social Dilemmas with Rational Altruistic Punishment

Abstract

Many important and difficult problems can be modeled as 'social dilemmas', like Hardin's Tragedy of the Commons or the classic iterated Prisoner's Dilemma. It is well known that in these problems, it can be rational for self-interested agents to promote and sustain cooperation by altruistically dispensing costly punishment to other agents, thus maximizing their own long-term reward. However, self-interested agents using most current multi-agent reinforcement learning algorithms will not sustain cooperation in social dilemmas: the algorithms do not sufficiently capture how the agent's interactions with other agents affect its own reward. More recent, foresighted algorithms specifically account for such expected consequences, and have been shown to work well for the small-scale Prisoner's Dilemma. However, this approach quickly becomes intractable for larger social dilemmas. Here, we build on this work and develop a 'teach/learn' stateless foresighted policy gradient reinforcement learning algorithm that applies to social dilemmas with negative, unilateral side-payments, in the form of costly punishment. In this setting, the algorithm allows agents to learn the most rewarding actions to take with respect to both the dilemma (Cooperate/Defect) and the 'teaching' of other agents' behavior through the dispensing of punishment. Unlike other algorithms, we show that this approach scales well to large settings like the Tragedy of the Commons. We show for a variety of settings that large groups of self-interested agents using this algorithm will robustly find and sustain cooperation in social dilemmas where adaptive agents can punish the behavior of other similarly adaptive agents.
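The incentive shift described in the abstract can be illustrated with a small payoff calculation. The sketch below is not the report's algorithm: it is a one-round N-player public goods game in which every cooperator punishes every defector, and the function name `payoffs` and all parameter values (`multiplier`, `cost`, `fine`, `fee`) are assumptions chosen only for illustration.

```python
def payoffs(cooperate, multiplier=1.6, cost=1.0, fine=0.0, fee=0.0):
    """Per-agent payoffs for one round of an N-player public goods game.

    cooperate: list of bools; True means the agent contributes `cost`.
    Assumption for this sketch: every cooperator punishes every defector.
    Each punishment costs the punisher `fee` and the defector `fine`.
    """
    n = len(cooperate)
    n_coop = sum(cooperate)
    n_def = n - n_coop
    # The multiplied contributions are split equally among all agents.
    share = multiplier * cost * n_coop / n
    result = []
    for c in cooperate:
        if c:
            result.append(share - cost - fee * n_def)
        else:
            result.append(share - fine * n_coop)
    return result


# Four agents; the last one either defects or cooperates.
one_defector = [True, True, True, False]
all_cooperate = [True, True, True, True]

no_punishment = payoffs(one_defector)
with_punishment = payoffs(one_defector, fine=0.5, fee=0.1)
full_cooperation = payoffs(all_cooperate)

print("defector, no punishment:  ", no_punishment[3])
print("defector, with punishment:", with_punishment[3])
print("same agent if cooperating:", full_cooperation[3])
```

With these assumed numbers, the lone defector earns more than it would by cooperating when there is no punishment, and less once the fine is applied. The punishing cooperators also earn slightly less in that round than they would without punishing, which is the sense in which the punishment is costly to them.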

Bibliographic record

Authors: 't Hoen, P. J.; Bohte, S. M.; La Poutre, J. A.
Year: 2008
Pages: 1-22
Total pages: 22
Original format: PDF
Language: English
Chinese Library Classification: Industrial technology
Keywords: Social dilemmas; Rational altruistic punishment; Reinforced learning; Algorithms; Self-interested agents

