This paper analyzes the long-term behavior of the REINFORCE and related algorithms (Williams 1986, 1988, 1992) for generalized learning automata (Narendra and Thathachar 1989) for the associative reinforcement learning problem (Barto and Anandan 1985). The learning system considered here is a feedforward connectionist network of generalized learning automata units. We show that REINFORCE is a gradient ascent algorithm but can exhibit unbounded behavior. A modified version of this algorithm, based on constrained optimization techniques, is suggested to overcome this disadvantage. The modified algorithm is shown to exhibit local optimization properties. A global version of the algorithm, based on constant temperature heat bath techniques, is also described and shown to converge to the global maximum. All algorithms are analyzed using weak convergence techniques.

This content is only available as a PDF.
You do not currently have access to this content.