- Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
- Path: sparky!uunet!paladin.american.edu!auvm!BEN.DCIEM.DND.CA!MMT
- Message-ID: <9208182241.AA20328@chroma.dciem.dnd.ca>
- Newsgroups: bit.listserv.csg-l
- Date: Tue, 18 Aug 1992 18:41:25 EDT
- Sender: "Control Systems Group Network (CSGnet)" <CSG-L@UIUCVMD.BITNET>
- From: mmt@BEN.DCIEM.DND.CA
- Subject: Hierarchic Dynamic Programming
- Lines: 67
-
- [Martin Taylor 920818 18:40]
-
- Just after posting that quoted paragraph from sci.cognitive, I looked at the
- next posting. I don't think HDP is like PCT at all, but judge for yourselves.
- Here's the posting:
-
- Martin
- ================
-
- Heuristic Dynamic Programming in a Realistic Biological Context
- Harry R. Erwin
- erwin@trwacs.fp.trw.com
-
- As I showed at the 1982 Animal Behavior Workshop in Guelph, Ontario,
- the optimum strategy for playing a discrete game against nature involving
- information collection is a simple threshold strategy. The player uses
- Bayesian statistics to maintain an estimate of his probability of success,
- and compares that estimate against a threshold at each decision point.
- If the probability of success remains above the threshold, he continues
- the game; otherwise, he quits. The threshold can be calculated by treating
- the game as a problem in dynamic programming. (John Bather, Pers. Com.,
- 1983)
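
The threshold rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Erwin's actual model: it assumes two hypotheses (success or failure), that each observation arrives as a pair of likelihoods under those hypotheses, and that the threshold is supplied from outside rather than derived by dynamic programming. The names `bayes_update` and `play` are hypothetical.

```python
def bayes_update(p_success, like_success, like_failure):
    """Posterior probability of success after one observation (Bayes' rule).

    like_success / like_failure are the (assumed known) probabilities of the
    observation given that the game will succeed / fail.
    """
    numerator = p_success * like_success
    denominator = numerator + (1.0 - p_success) * like_failure
    return numerator / denominator


def play(prior, likelihood_pairs, threshold):
    """Continue while the estimate stays above the threshold; quit otherwise.

    Returns (decision, decision_point, estimate). The threshold is taken as
    given here; the post says it would come from a dynamic-programming
    treatment of the game, which this sketch does not attempt.
    """
    estimate = prior
    for step, (like_success, like_failure) in enumerate(likelihood_pairs, start=1):
        estimate = bayes_update(estimate, like_success, like_failure)
        if estimate <= threshold:
            return ("quit", step, estimate)
    return ("continue", len(likelihood_pairs), estimate)
```

For example, `play(0.5, [(0.2, 0.8)], 0.3)` quits at the first decision point, because one unfavourable observation pulls the estimate from 0.5 down to about 0.2, which is below the threshold of 0.3.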
-
- In a biological context, this strategy lends itself to implementation using
- HDP. The critic network would provide the current threshold value as a local
- goal value, and the action network would compare the current probability
- against that value. If the current probability exceeded the threshold, the
- preferred action would be to continue to collect information; otherwise it
- would be to quit. Note that the critic network responds to the perceived
- payoffs and risks of the game and not to the current situation. Both critic
- and action networks would be prior to the motor cortex, which would then
- treat both as a combined critic network and attempt to reduce fear to
- nominal levels.
-
- Current payoffs---\
-                    O-- local goal value--------------\
- Target category---/ (A)                   feedback    \
-                                              ---       \
- Target condition--\                         |   |       \
-                    \                        V   |        \ (D)
- Self condition----->0--initial estimate --->0-current est>0
-                    / (B)                   / (C)           \
- Environment-------/                       /                 \
-                                          /                decision
- Information collected and processed-----/      (expressed as fear level)
-                                                                \ (E)
- Motor options-------------------------------------------------->0->motor
-                                                                    cortex
-
- Note that there are a number of places where training would occur. Subsystem A
- needs to learn how to calculate the local goal values corresponding to various
- payoffs and intensities of the game (primarily defined by target category).
- I suspect most species have this hard-coded in the genome. (The local goal
- values are not obvious functions of the inputs!) Subsystem B can be trained
- more easily--in mammals, that is part of the role of play and parental
- teaching. Subsystems C and D are probably hard-coded, even in man. Subsystem C
- implements logistic functions, while Subsystem D does a simple comparison.
- Subsystem E probably uses fear level to affect the preference functions for
- various actions used by the motor controller, although it may select a desired
- fear level and output partials to the motor controller instead. (I suspect that
- version is more correct, because the corresponding 2-person game can't be
- handled by outputting simple fear level, and man does play the 2-person game.)
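
One possible reading of Subsystems C and D, offered as a rough sketch and not as the author's specification: a Bayesian update written in log-odds form is itself a logistic function of the accumulated evidence, so C can be modelled as a logistic of the previous estimate's log-odds plus a log-likelihood ratio, and D as a plain comparison against the local goal value from A. The function names and the choice of a log-likelihood ratio as C's input are assumptions made for illustration.

```python
import math


def logistic(x):
    """Standard logistic function, mapping log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Inverse of the logistic function (probability to log-odds)."""
    return math.log(p / (1.0 - p))


def subsystem_c(current_estimate, log_likelihood_ratio):
    """Hypothetical C: fold new evidence into the estimate via a logistic.

    Equivalent to Bayes' rule when the evidence is expressed as
    log(P(obs | success) / P(obs | failure)).
    """
    return logistic(logit(current_estimate) + log_likelihood_ratio)


def subsystem_d(current_estimate, local_goal_value):
    """Hypothetical D: simple comparison against the local goal value."""
    return "continue" if current_estimate > local_goal_value else "quit"
```

With an initial estimate of 0.5 and evidence four times likelier under success, `subsystem_c(0.5, math.log(4))` gives approximately 0.8, and `subsystem_d` then returns "continue" for any local goal value below that.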
-
- Cheers,
- --
- Harry Erwin
- Internet: erwin@trwacs.fp.trw.com
-