NTRS - NASA Technical Reports Server

Using neural networks and Dyna algorithm for integrated planning, reacting and learning in systems

The traditional AI answer to the decision-making problem for a robot is planning. However, planning is usually CPU-intensive and depends on the availability and accuracy of a world model. The Dyna system, described generally in earlier work, uses trial and error to learn a world model, which is simultaneously used to plan reactions resulting in optimal action sequences; it is an attempt to integrate planning, reactive, and learning systems. The architecture of Dyna is presented and its blocks are described. The system has three main components. The first is the world model, the robot's internal representation of the world: its input is the current state and the action taken in that state, and its output is the corresponding reward and resulting state. The second module is the policy, which observes the current state and outputs the action to be executed by the robot. At the beginning of program execution the policy is stochastic, and through learning it progressively becomes deterministic. The policy chooses an action according to the output of an evaluation function, the third module of the system. The evaluation function takes as input the current state, the action taken in that state, the resulting state, and a reward generated by the world that is proportional to the current distance from the goal state. The work originally proposed was: (1) to implement a simple 2-D world in which a 'robot' navigates around obstacles and learns the path to a goal, using lookup tables; (2) to substitute neural networks for the world model and the Q estimate function; and (3) to apply the algorithm to a more complex world where the use of a neural network would be fully justified. This paper describes the system design and the results achieved.
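The three-module loop described above (world model, policy, evaluation function) can be sketched as a minimal tabular Dyna-Q implementation. This is an illustrative sketch only, not the paper's code: the one-dimensional corridor world, the hyperparameters, and all function names are assumptions, standing in for the paper's 2-D obstacle world.

```python
import random

# Hypothetical 1-D corridor world used only for illustration.
N_STATES = 6          # states 0..5, goal at state 5
ACTIONS = (-1, +1)    # step left / step right
GOAL = N_STATES - 1

def step(s, a):
    """The real world: returns (next_state, reward)."""
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0)

def dyna_q(episodes=50, planning_steps=10, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    # Evaluation function Q, stored as a lookup table (stage 1 of the paper).
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    # Learned world model: (state, action) -> (reward, next_state).
    model = {}
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # Policy: epsilon-greedy, i.e. stochastic early on and
            # effectively deterministic once Q has sharpened.
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])
            s2, r = step(s, a)
            # Direct learning from the real experience.
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in ACTIONS) - Q[(s, a)])
            model[(s, a)] = (r, s2)
            # Planning: replay imagined transitions drawn from the world model.
            for _ in range(planning_steps):
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, x)] for x in ACTIONS) - Q[(ps, pa)])
            s = s2
    return Q

Q = dyna_q()
# The greedy policy should point right (toward the goal) from every non-goal state.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)]
print(policy)
```

The planning loop is what distinguishes Dyna from plain Q-learning: each real step is amplified by several simulated updates drawn from the learned model, so the value estimates propagate toward the goal much faster than with direct experience alone.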
First, we implement the world model with a neural network and leave Q implemented as a lookup table. Next, we use a lookup table for the world model and implement the Q function with a neural network. Time limitations prevented combining the two approaches. The final section discusses the results and gives directions for future work.
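Replacing the tabular world model with a neural network amounts to regressing (state, action) onto (reward, next state). The sketch below shows that idea with a tiny one-hidden-layer network trained by backpropagation on the corridor world's transitions; the architecture, encoding, and all sizes are illustrative assumptions, not the paper's network.

```python
import math
import random

N_STATES = 6          # corridor states 0..5, goal at 5 (same toy world as above)
ACTIONS = (-1, +1)

def transitions():
    """Enumerate the corridor world's true (s, a) -> (reward, s') pairs."""
    data = []
    for s in range(N_STATES):
        for a in ACTIONS:
            s2 = min(max(s + a, 0), N_STATES - 1)
            data.append((s, a, 1.0 if s2 == N_STATES - 1 else 0.0, s2))
    return data

def encode(s, a):
    """One-hot state plus a sign bit for the action."""
    x = [0.0] * N_STATES + [1.0 if a > 0 else -1.0]
    x[s] = 1.0
    return x

class WorldModelNet:
    """Maps (s, a) to (predicted reward, predicted next state scaled to [0, 1])."""
    def __init__(self, n_in, n_hid=8, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(2)]

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.W1]
        y = [sum(w * hi for w, hi in zip(row, h)) for row in self.W2]
        return h, y

    def loss(self, x, target):
        _, y = self.forward(x)
        return sum((yi - ti) ** 2 for yi, ti in zip(y, target))

    def train_step(self, x, target, lr=0.1):
        h, y = self.forward(x)
        err = [yi - ti for yi, ti in zip(y, target)]
        # Hidden-layer gradients must use the pre-update output weights.
        g = [sum(err[k] * self.W2[k][j] for k in range(2)) * (1 - h[j] ** 2)
             for j in range(len(h))]
        for k in range(2):                       # output layer update
            for j in range(len(h)):
                self.W2[k][j] -= lr * err[k] * h[j]
        for j in range(len(h)):                  # hidden layer update
            for i in range(len(x)):
                self.W1[j][i] -= lr * g[j] * x[i]

data = transitions()
net = WorldModelNet(n_in=N_STATES + 1)
targets = [(r, s2 / (N_STATES - 1)) for _, _, r, s2 in data]
inputs = [encode(s, a) for s, a, _, _ in data]

before = sum(net.loss(x, t) for x, t in zip(inputs, targets))
for _ in range(500):
    for x, t in zip(inputs, targets):
        net.train_step(x, t)
after = sum(net.loss(x, t) for x, t in zip(inputs, targets))
print(round(before, 3), round(after, 3))
```

Once trained, the network can serve the same role the lookup-table model plays in the Dyna planning loop, with the advantage that it generalizes to states it has not visited — the motivation the abstract gives for stage (3), a more complex world.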
Document ID
19930015554
Acquisition Source
Legacy CDMS
Document Type
Contractor Report (CR)
Authors
Lima, Pedro
(Rensselaer Polytechnic Inst. Troy, NY, United States)
Beard, Randal
(Rensselaer Polytechnic Inst. Troy, NY, United States)
Date Acquired
September 6, 2013
Publication Date
August 1, 1992
Subject Category
Cybernetics
Report/Patent Number
NASA-CR-193018
RPI-CIRSSE-122
NAS 1.26:193018
Accession Number
93N24743
Funding Number(s)
CONTRACT_GRANT: NAGW-1333
Distribution Limits
Public
Copyright
Work of the US Gov. Public Use Permitted.