A DQN agent learns to interact with a bridge to reach a goal. The Gym environment is force-based and no longer episodic: when the agent reaches the goal, the goal flips between north and south instead of ending the episode. Also adds many results from experiments with an ANN agent based on an O'Reilly URL. Serves as a starting point for implementing multi-agent interaction and testing emergent tool use.
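
The force-based, non-episodic behaviour described above can be illustrated with a minimal sketch. This is not the repository's environment: the class `FlipGoalWorld`, the helper `push_towards_goal`, the 1-D north-south axis, and every constant (mass, damping, time step, goal radius) are hypothetical and chosen only for illustration. It also assumes that what flips is the goal position, and it leaves out the bridge and the DQN agent entirely.

```python
from dataclasses import dataclass


@dataclass
class FlipGoalWorld:
    """Hypothetical 1-D stand-in: the agent moves along a north-south axis."""
    half_length: float = 5.0   # goal sits at +half_length (north) or -half_length (south)
    mass: float = 1.0
    damping: float = 0.1
    dt: float = 0.1
    goal_radius: float = 0.5
    pos: float = 0.0
    vel: float = 0.0
    goal_sign: int = 1         # +1 = north goal, -1 = south goal

    @property
    def goal_pos(self) -> float:
        return self.goal_sign * self.half_length

    def step(self, force: float):
        # Force-based control: the action is a force, which is integrated
        # into velocity and then position (simple Euler integration).
        accel = (force - self.damping * self.vel) / self.mass
        self.vel += accel * self.dt
        self.pos += self.vel * self.dt

        reached = abs(self.pos - self.goal_pos) <= self.goal_radius
        reward = 1.0 if reached else 0.0
        if reached:
            # Non-episodic: flip the goal to the other end instead of resetting.
            self.goal_sign *= -1

        done = False  # the environment never terminates
        return (self.pos, self.vel, self.goal_sign), reward, done


def push_towards_goal(env: FlipGoalWorld) -> float:
    """Placeholder policy (not a DQN): unit force in the direction of the goal."""
    return 1.0 if env.goal_pos > env.pos else -1.0


if __name__ == "__main__":
    env = FlipGoalWorld()
    goals_reached = 0
    for _ in range(2000):
        _, reward, _ = env.step(push_towards_goal(env))
        goals_reached += int(reward > 0)
    print("goals reached:", goals_reached)
```

Because `done` is always `False`, a learning agent in this kind of setup would rely on the discounted reward stream rather than on episode boundaries, and the goal side (`goal_sign` here) has to be part of the observation for the task to stay solvable after each flip.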