Q-learning associates states and actions of a Markov Decision Process to expected future reward through online learning. In practice, however, when the state space is large and experience is still limited, the algorithm will not find a match between current state and experience unless some details describing states are ignored. On the other hand, reducing state information affects long term performance because decisions will need to be made on less informative inputs. We propose a variation of Q-learning that gradually enriches state descriptions, after enough experience is accumulated. This is coupled with an ad-hoc exploration strategy that aims at collecting key information that allows the algorithm to enrich state descriptions earlier. Experimental results obtained by applying our algorithm to the arcade game Pac-Man show that our approach significantly outperforms Q-learning during the learning process while not penalizing long-term performance.
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
Tel.: +1 703 830 6300
Fax: +1 703 830 2300 firstname.lastname@example.org
(Corporate matters and books only) IOS Press c/o Accucoms US, Inc.
For North America Sales and Customer Service
West Point Commons
Lansdale PA 19446
Tel.: +1 866 855 8967
Fax: +1 215 660 5042 email@example.com