For my reasoning behind the model's architecture and training regime, see [here](https://amachinewithorgans.wordpress.com/).