CLEANing the Reward: Counterfactual Actions to Remove Exploratory Action Noise in Multiagent Learning

Learning in multiagent systems can be slow because agents must learn both how to behave in a complex environment and how to account for the actions of other agents. An agent's inability to distinguish between the true environmental dynamics and those caused by the stochastic exploratory actions of other agents creates noise in its reward signal. This learning noise can have unforeseen and often undesirable effects on the resultant system performance. We define such noise as exploratory action noise, demonstrate the critical impact it can have on the learning process in multiagent settings, and introduce a reward structure to effectively remove such noise from each agent's reward signal. In particular, we introduce Coordinated Learning without Exploratory Action Noise (CLEAN) rewards and empirically demonstrate their benefits.
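The core idea in the abstract, that an agent can evaluate exploratory actions privately via counterfactuals instead of executing them in the shared environment, can be illustrated with a minimal sketch. This is not the paper's implementation; the toy global utility `G` and the function names `greedy` and `clean_style_reward` are illustrative assumptions, and the difference-style counterfactual term is one plausible instantiation of the idea.

```python
def G(joint_action):
    """Toy global utility (illustrative stand-in for the team objective):
    higher when agents choose fewer distinct actions, i.e. coordinate."""
    return -len(set(joint_action))

def clean_style_reward(joint_greedy, i, explored_action):
    """Sketch of a CLEAN-style private counterfactual evaluation.

    All agents execute their greedy actions in the environment, so no
    agent's exploration perturbs the rewards the others observe. Agent i
    then scores a candidate exploratory action offline by substituting it
    into the observed greedy joint action.
    """
    counterfactual = list(joint_greedy)
    counterfactual[i] = explored_action
    # Difference-style signal: the effect of agent i's explored action
    # relative to its greedy action, holding all other agents fixed.
    return G(counterfactual) - G(joint_greedy)

# Example: agent 1 privately evaluates switching to action 0 while the
# executed greedy joint action was (0, 1, 0).
reward = clean_style_reward([0, 1, 0], 1, 0)
```

Because the counterfactual is computed offline from the executed greedy joint action, the exploratory candidate never appears in any other agent's experience, which is the mechanism by which exploratory action noise is removed.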
Document ID
20140013383
Acquisition Source
Ames Research Center
Document Type
Conference Paper
Authors
HolmesParker, Chris (Parflux LLC Salem, OR.)
Taylor, Mathew E. (Washington State Univ. Pullman, WA, United States)
Tumer, Kagan (Oregon State Univ. Corvallis, OR, United States)
Agogino, Adrian (California Univ. Moffett Field, CA, United States)
Date Acquired
November 6, 2014
Publication Date
May 5, 2014
Subject Category
Cybernetics, Artificial Intelligence And Robotics
Report/Patent Number
ARC-E-DAA-TN13699
Meeting Information
Meeting: International Conference on Autonomous Agents and Multiagent Systems