Delayed Rewards Calibration via Reward Empirical Sufficiency.
Yixuan Liu, Hu Wang, Xiaowei Wang, Xiaoyue Sun, Liuyue Jiang, Minhui Xue
Published in: CoRR (2021)
Keyphrases
- reinforcement learning
- bandit problems
- reward function
- expected reward
- camera calibration
- markov decision processes
- decision problems
- total reward
- average reward
- stereo camera
- theoretical analysis
- computer vision
- discounted reward
- focal length
- hand eye coordination
- multiarmed bandit
- multi armed bandits
- real time
- free riding
- camera parameters
- information theoretic
- optimal policy
- machine learning