This model uses REINFORCE with a learned baseline (value net), entropy regularization,
batch updates, observation normalization, orthogonal initialization, and gradient clipping.
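As a rough illustration of how three of these pieces fit together, the sketch below computes discounted returns, combines the policy, baseline, and entropy terms into one scalar loss, and clips a gradient vector by its global norm. It is a minimal pure-Python sketch, not this model's actual training code: the function names, the coefficients (`gamma`, `entropy_coef`, `value_coef`, `max_norm`), and the use of plain floats instead of an autograd framework are all assumptions made for illustration.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return-to-go G_t = r_t + gamma * G_{t+1} for a single episode."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, returns, values, entropies,
                   entropy_coef=0.01, value_coef=0.5):
    """Scalar REINFORCE-with-baseline loss over a batch of timesteps.

    All inputs are equal-length lists of floats. The coefficients are
    hypothetical defaults, not values taken from the model card.
    """
    n = len(returns)
    # Advantage = return minus the learned baseline (value estimate).
    advantages = [g - v for g, v in zip(returns, values)]
    # Policy term: raise log-probability of actions with positive advantage.
    policy_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    # Baseline term: regress the value estimate toward the observed return.
    value_loss = sum((v - g) ** 2 for g, v in zip(returns, values)) / n
    # Entropy bonus: subtracting it from the loss rewards exploration.
    entropy = sum(entropies) / n
    return policy_loss + value_coef * value_loss - entropy_coef * entropy


def clip_by_global_norm(grads, max_norm=0.5):
    """Rescale a flat gradient list so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (norm + 1e-8))
    return [g * scale for g in grads]
```

In an autograd framework the advantage would normally be detached from the graph in the policy term, so that the value net is trained only through the baseline term.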