Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning

Abstract

Progress in deep reinforcement learning (RL) research is largely enabled by benchmark task environments. However, analyzing the nature of those environments is often overlooked. In particular, we still do not have agreeable ways to measure the difficulty or solvability of a task, given that each has fundamentally different actions, observations, dynamics, rewards, and can be tackled with diverse RL algorithms. In this work, we propose policy information capacity (PIC) – the mutual information between policy parameters and episodic return – and policy-optimal information capacity (POIC) – between policy parameters and episodic optimality – as two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty. Evaluating our metrics across toy environments as well as continuous control benchmark tasks from OpenAI Gym and DeepMind Control Suite, we empirically demonstrate that these information-theoretic metrics have higher correlations with normalized task solvability scores than a variety of alternatives. Lastly, we show that these metrics can also be used for fast and compute-efficient optimizations of key design parameters such as reward shaping, policy architectures, and MDP properties for better solvability by RL algorithms without ever running full RL experiments.
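
As a rough illustration of the definition above, PIC can be written as I(R; Θ) = H(R) − E_Θ[H(R | Θ)], the entropy of the episodic return minus its average entropy once the policy parameters are fixed. The sketch below estimates this quantity by sampling parameters, running several episodes per sample, and binning the returns into a histogram. The equal-width binning, the plug-in entropy estimator, and every name in the code (`estimate_pic`, `sample_params`, `rollout_return`, and the two toy tasks) are assumptions made for illustration, not the authors' implementation.

```python
import math
import random
from collections import Counter


def discretize(value, low, high, num_bins):
    """Map a scalar return to an equal-width bin index in [0, num_bins - 1]."""
    if high <= low:
        return 0
    index = int((value - low) / (high - low) * num_bins)
    return min(max(index, 0), num_bins - 1)


def entropy(bin_indices):
    """Plug-in (histogram) entropy in nats of a list of bin indices."""
    total = len(bin_indices)
    counts = Counter(bin_indices)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def estimate_pic(sample_params, rollout_return, num_params=100,
                 num_episodes=50, num_bins=10, seed=0):
    """Estimate I(R; Theta) = H(R) - E_theta[H(R | theta)] from samples.

    sample_params(rng) draws policy parameters from a prior (hypothetical).
    rollout_return(theta, rng) returns one episodic return (hypothetical).
    """
    rng = random.Random(seed)
    returns = []
    for _ in range(num_params):
        theta = sample_params(rng)
        returns.append([rollout_return(theta, rng) for _ in range(num_episodes)])

    # Use one shared set of bins so marginal and conditional entropies match.
    flat = [r for per_theta in returns for r in per_theta]
    low, high = min(flat), max(flat)
    binned = [[discretize(r, low, high, num_bins) for r in per_theta]
              for per_theta in returns]

    marginal = entropy([b for per_theta in binned for b in per_theta])
    conditional = sum(entropy(per_theta) for per_theta in binned) / len(binned)
    return marginal - conditional


def uniform_prior(rng):
    """Toy prior over a single scalar policy parameter."""
    return rng.uniform(-1.0, 1.0)


def informative_task(theta, rng):
    """Toy task: the return is mostly determined by the parameter."""
    return theta + rng.gauss(0.0, 0.05)


def uninformative_task(theta, rng):
    """Toy task: the return is pure noise, independent of the parameter."""
    return rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    print("informative task:  ", estimate_pic(uniform_prior, informative_task))
    print("uninformative task:", estimate_pic(uniform_prior, uninformative_task))
```

In this toy setting the first task should score noticeably higher than the second, because changing the parameter changes the return distribution. The histogram estimator is biased upward for small sample sizes, so the second score will typically be small but not exactly zero.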

Published in
International Conference on Machine Learning
古田 拓毅
Doctoral student
松嶋 達也
Project Researcher

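I am interested in developing adaptive robots that can coexist with humans, and in gaining a constructive understanding of life-likeness and intelligence by building such robots.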

顧 世翔
Visiting Associate Professor