Characterizing Policy Divergence for Personalized Meta-Reinforcement Learning