NTRS - NASA Technical Reports Server

CLEANing the Reward: Counterfactual Actions to Remove Exploratory Action Noise in Multiagent Learning
Learning in multiagent systems can be slow because agents must learn both how to behave in a complex environment and how to account for the actions of other agents. The inability of an agent to distinguish between the true environmental dynamics and those caused by the stochastic exploratory actions of other agents creates noise in each agent's reward signal. This learning noise can have unforeseen and often undesirable effects on the resultant system performance. We define such noise as exploratory action noise, demonstrate the critical impact it can have on the learning process in multiagent settings, and introduce a reward structure to effectively remove such noise from each agent's reward signal. In particular, we introduce Coordinated Learning without Exploratory Action Noise (CLEAN) rewards and empirically demonstrate their benefits.
Document ID
20140013383
Acquisition Source
Ames Research Center
Document Type
Conference Paper
Authors
HolmesParker, Chris
(Parflux LLC Salem, OR.)
Taylor, Matthew E.
(Washington State Univ. Pullman, WA, United States)
Tumer, Kagan
(Oregon State Univ. Corvallis, OR, United States)
Agogino, Adrian
(California Univ. Moffett Field, CA, United States)
Date Acquired
November 6, 2014
Publication Date
May 5, 2014
Subject Category
Cybernetics, Artificial Intelligence And Robotics
Report/Patent Number
ARC-E-DAA-TN13699
Meeting Information
Meeting: International Conference on Autonomous Agents and Multiagent Systems
Location: Paris, France
Country: France
Start Date: May 5, 2014
End Date: May 9, 2014
Sponsors: Association for Computing Machinery
Funding Number(s)
CONTRACT_GRANT: NAS2-03144
Distribution Limits
Public
Copyright
Public Use Permitted.
Keywords
Reinforcement Learning
Multiagent Systems
Optimization