Snake Played by a Deep Reinforcement Learning Agent
With bloopers

