== rl-texplore-ros-pkg References ==

This page lists references for the code provided in the rl-texplore-ros-pkg, including the [[reinforcement_learning]] stack and all of its packages.

 * Beeson P, O'Quin J, Gillan B, Nimmagadda T, Ristroph M, Li D, Stone P (2008) Multiagent interactions in urban driving. Journal of Physical Agents 2(1):15–30
 * Brafman R, Tennenholtz M (2001) R-Max - a general polynomial time algorithm for near-optimal reinforcement learning. In: Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI), pp 953–958
 * Breiman L (2001) Random forests. Machine Learning 45(1):5–32
 * Dietterich T (1998) The MAXQ method for hierarchical reinforcement learning. In: Proceedings of the Fifteenth International Conference on Machine Learning (ICML), pp 118–126
 * Hester T, Stone P (2010) [[http://www.cs.utexas.edu/~pstone/Papers/bib2html/b2hd-ICDL10-hester.html|Real time targeted exploration in large domains]]. In: Proceedings of the Ninth International Conference on Development and Learning (ICDL)
 * Hester T, Quinlan M, Stone P (2012) [[http://www.cs.utexas.edu/~pstone/Papers/bib2html/b2hd-ICRA12-hester.html|A real-time model-based reinforcement learning architecture for robot control]]. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)
 * Hester T, Stone P (2012) [[http://www.cs.utexas.edu/~pstone/Papers/bib2html/b2hd-ICDL12-hester.html|Intrinsically Motivated Model Learning for a Developing Curious Agent]]. In: Proceedings of the Eleventh International Conference on Development and Learning (ICDL)
 * Hester T, Stone P (2013) [[http://www.springerlink.com/content/nj1t3273037pr607/|TEXPLORE: Real-Time Sample-Efficient Reinforcement Learning for Robots]]. Machine Learning 90(3):385–429
 * Kocsis L, Szepesvari C (2006) Bandit based Monte-Carlo planning. In: Proceedings of the Seventeenth European Conference on Machine Learning (ECML)
 * Konidaris G, Barto A (2007) Building Portable Options: Skill Transfer in Reinforcement Learning. In: Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI)
 * !McCallum A (1996) Learning to use selective attention and short-term memory in sequential tasks. In: From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior
 * Moore A, Atkeson C (1993) Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning 13:103–130
 * Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, Wheeler R, Ng A (2009) ROS: an open-source robot operating system. In: ICRA Workshop on Open Source Software
 * Quinlan R (1986) Induction of decision trees. Machine Learning 1:81–106
 * Quinlan R (1992) Learning with continuous classes. In: 5th Australian Joint Conference on Artificial Intelligence, World Scientific, Singapore, pp 343–348
 * Rummery G, Niranjan M (1994) On-line Q-learning using connectionist systems. Tech. Rep. CUED/F-INFENG/TR 166, Cambridge University Engineering Department
 * Strehl A, Diuk C, Littman M (2007) Efficient structure learning in factored-state MDPs. In: Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, pp 645–650
 * Sutton R (1990) Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In: Proceedings of the Seventh International Conference on Machine Learning (ICML), pp 216–224
 * Sutton R, Barto A (1998) Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA
 * Tanner B, White A (2009) RL-Glue: Language-independent software for reinforcement-learning experiments. Journal of Machine Learning Research 10:2133–2136
 * Watkins C (1989) Learning from delayed rewards. PhD thesis, University of Cambridge