NetNews Usenet Archive 1992 #18

Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
Path: sparky!uunet!paladin.american.edu!auvm!BEN.DCIEM.DND.CA!MMT
Message-ID: <9208182241.AA20328@chroma.dciem.dnd.ca>
Newsgroups: bit.listserv.csg-l
Date: Tue, 18 Aug 1992 18:41:25 EDT
Sender: "Control Systems Group Network (CSGnet)" <CSG-L@UIUCVMD.BITNET>
From: mmt@BEN.DCIEM.DND.CA
Subject: Hierarchic Dynamic Programming
Lines: 67

[Martin Taylor 920818 18:40]

Just after posting that quoted paragraph from sci.cognitive, I looked at
the next posting. I don't think HDP is like PCT at all, but judge for
yourselves. Here's the posting:

Martin
================

Heuristic Dynamic Programming in a Realistic Biological Context

Harry R. Erwin
erwin@trwacs.fp.trw.com

As I showed at the 1982 Animal Behavior Workshop in Guelph, Ontario, the
optimum strategy for playing a discrete game against nature involving
information collection is a simple threshold strategy. The player uses
Bayesian statistics to maintain an estimate of his probability of
success, and compares that estimate against a threshold at each decision
point. If the probability of success remains above the threshold, he
continues the game; otherwise, he quits. The threshold can be calculated
by treating the game as a problem in dynamic programming. (John Bather,
pers. comm., 1983)

In a biological context, this strategy lends itself to implementation
using HDP. The critic network would provide the current threshold value
as a local goal value, and the action network would compare the current
probability against that value. If the current probability exceeded the
threshold, the preferred action would be to continue to collect
information; otherwise it would be to quit. Note that the critic network
responds to the perceived payoffs and risks of the game and not to the
current situation. Both critic and action networks would be prior to the
motor cortex, which would then treat both as a combined critic network
and attempt to reduce fear to nominal levels.
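[Editor's sketch, not part of the original posting.] The threshold
strategy described above can be illustrated in a few lines of Python.
This is only a minimal sketch under stated assumptions: the posting does
not say which Bayesian model the player uses, so a Beta-Bernoulli
estimate of the probability of success is assumed here, and the
threshold is taken as a given input rather than derived by dynamic
programming. The function name play_threshold_game and its parameters
are hypothetical.

```python
def play_threshold_game(observations, threshold, alpha=1.0, beta=1.0):
    """Play an information-collection game with a simple threshold rule.

    Assumptions (not from the posting): the belief about the probability
    of success is a Beta(alpha, beta) distribution, each observation is a
    boolean (True = favourable evidence), and `threshold` is supplied by
    the caller instead of being computed by dynamic programming.

    Returns (decision, steps_taken, final_estimate).
    """
    steps = 0
    for favourable in observations:
        # Bayesian update of the Beta belief with one observation.
        if favourable:
            alpha += 1.0
        else:
            beta += 1.0
        steps += 1

        # Current estimate of the probability of success (posterior mean).
        estimate = alpha / (alpha + beta)

        # Decision point: continue only while the estimate stays above
        # the threshold; otherwise quit the game.
        if estimate <= threshold:
            return "quit", steps, estimate

    # Ran out of observations without dropping to the threshold.
    return "continue", steps, alpha / (alpha + beta)


if __name__ == "__main__":
    print(play_threshold_game([True, False, False, False], threshold=0.45))
```

With a uniform prior, one favourable observation followed by
unfavourable ones pulls the estimate down step by step until it reaches
the threshold, at which point the player quits.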

Current payoffs---\
                   O-- local goal value--------------\
Target category---/ (A)                 feedback      \
                                           ---         \
Target condition--\                        | |          \
                   \                       V |           \ (D)
Self condition----->0--initial estimate --->0-current est>0
                   / (B)                   / (C)           \
Environment-------/                       /                 \
                                         /               decision
Information collected and processed-----/        (expressed as fear level)
                                                               \ (E)
Motor options-------------------------------------------------->0->motor cortex

Note that there are a number of places where training would occur.
Subsystem A needs to learn how to calculate the local goal values
corresponding to various payoffs and intensities of the game (primarily
defined by target category). I suspect most species have this hard-coded
in the genome. (The local goal values are not obvious functions of the
inputs!) Subsystem B can be trained more easily--in mammals, that is part
of the role of play and parental teaching. Subsystems C and D are
probably hard-coded, even in man. Subsystem C implements logistic
functions, while Subsystem D does a simple comparison. Subsystem E
probably uses fear level to affect the preference functions for various
actions used by the motor controller, although it may select a desired
fear level and output partials to the motor controller instead. (I
suspect that version is more correct, because the corresponding 2-person
game can't be handled by outputting simple fear level, and man does play
the 2-person game.)

Cheers,
--
Harry Erwin
Internet: erwin@trwacs.fp.trw.com
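
[Editor's sketch, not part of the original posting.] The roles given to
Subsystems C and D above (a logistic function feeding a simple
comparison) can be illustrated as follows. The posting does not say how
the logistic function enters, so this sketch assumes one common reading:
the initial estimate is converted to log-odds, each piece of collected
information arrives as a log-likelihood ratio and is added in turn, and
the logistic function maps the running total back to a probability. The
names subsystem_c and subsystem_d and their parameters are hypothetical.

```python
import math


def logistic(x):
    """Standard logistic function, mapping log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def subsystem_c(initial_estimate, log_likelihood_ratios):
    """Turn an initial estimate plus collected evidence into a current estimate.

    Assumption: evidence is supplied as log-likelihood ratios, so Bayes'
    rule becomes addition in log-odds space. `initial_estimate` must lie
    strictly between 0 and 1.
    """
    log_odds = math.log(initial_estimate / (1.0 - initial_estimate))
    for llr in log_likelihood_ratios:
        # Each step reuses the previous running estimate, loosely
        # mirroring the feedback loop drawn around (C) in the diagram.
        log_odds += llr
    return logistic(log_odds)


def subsystem_d(current_estimate, local_goal_value):
    """Simple comparison: continue only if the estimate exceeds the goal value."""
    return "continue" if current_estimate > local_goal_value else "quit"


if __name__ == "__main__":
    estimate = subsystem_c(0.5, [math.log(3.0)])
    print(estimate, subsystem_d(estimate, 0.6))
```

Working in log-odds keeps the update to a single addition per piece of
evidence. That is a design choice of this sketch, not something the
posting specifies.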