Publication: Zero-shot Learning of Hint Policy via Reinforcement Learning and Program Synthesis.