KerasReinforcementLearningProjects - (EPUB full-text download)

File size: 7.58 MB.
File format: EPUB.
Book content:

Continuous control with deep reinforcement learning
In this example, we will address the problem of swinging up an inverted pendulum, a classic problem in control theory. In this version of the problem, the pendulum starts in a random position, and the goal is to swing it up so that it stays upright. Torque limits prevent the agent from swinging the pendulum up directly. The following diagram shows the problem:
The problem is addressed using an environment available in the OpenAI Gym library (Pendulum-v0) with the help of the DDPG agent of the keras-rl library (DDPGAgent).
OpenAI Gym is a library that helps us to implement algorithms based on reinforcement learning. It includes a growing collection of benchmark problems that expose a common interface, and a website where people can share their results and compare algorithm performance. For the moment, we will limit ourselves to using the OpenAI Gym library; we will deepen the concepts we are about to look at in Chapter 7, Dynamic Modeling of a Segway as an Inverted Pendulum System. The Pendulum-v0 environment is very similar to the CartPole environment (which we will use in the following chapter), but with an essential difference: we are moving from a discrete action space (CartPole) to a continuous one (Pendulum-v0).
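To make this difference concrete, here is a minimal sketch (not taken from the book) that builds both environments and prints their action spaces; the CartPole-v0 environment ID is an assumption, since the excerpt only names the CartPole environment:
import gym

# Pendulum-v0 exposes a continuous action space: a single torque value,
# bounded in the standard Gym implementation to the interval [-2, 2].
pendulum = gym.make('Pendulum-v0')
print(pendulum.action_space)        # Box(1,)  -> one continuous torque value
print(pendulum.observation_space)   # Box(3,)  -> cos(theta), sin(theta), angular velocity

# CartPole, by contrast, has a discrete action space: push the cart left or right.
cartpole = gym.make('CartPole-v0')  # assumed environment ID for the CartPole problem
print(cartpole.action_space)        # Discrete(2)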
The DDPG agent is based on an adaptation of Deep Q-learning to the continuous action domain. It is a model-free, actor–critic algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture, and hyperparameters, this algorithm effectively solves a range of simulated physics tasks, including classic problems such as the inverted pendulum.
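As an illustration of how such an agent can be put together in practice, the following is a minimal sketch of wiring keras-rl's DDPGAgent to Pendulum-v0. It follows the standard keras-rl workflow, but the layer sizes, replay-memory limit, Ornstein-Uhlenbeck noise parameters, and training lengths are illustrative assumptions, not necessarily the values used in the book:
import gym
from keras.models import Sequential, Model
from keras.layers import Dense, Flatten, Input, Concatenate
from keras.optimizers import Adam
from rl.agents import DDPGAgent
from rl.memory import SequentialMemory
from rl.random import OrnsteinUhlenbeckProcess

env = gym.make('Pendulum-v0')
nb_actions = env.action_space.shape[0]   # one continuous torque value

# Actor network: maps an observation to a continuous action.
actor = Sequential()
actor.add(Flatten(input_shape=(1,) + env.observation_space.shape))
actor.add(Dense(16, activation='relu'))
actor.add(Dense(16, activation='relu'))
actor.add(Dense(nb_actions, activation='linear'))

# Critic network: maps an (observation, action) pair to a Q-value.
action_input = Input(shape=(nb_actions,), name='action_input')
observation_input = Input(shape=(1,) + env.observation_space.shape, name='observation_input')
x = Concatenate()([action_input, Flatten()(observation_input)])
x = Dense(32, activation='relu')(x)
x = Dense(32, activation='relu')(x)
x = Dense(1, activation='linear')(x)
critic = Model(inputs=[action_input, observation_input], outputs=x)

# Replay memory and exploration noise (hyperparameter values are assumptions).
memory = SequentialMemory(limit=100000, window_length=1)
random_process = OrnsteinUhlenbeckProcess(size=nb_actions, theta=0.15, mu=0.0, sigma=0.3)

agent = DDPGAgent(nb_actions=nb_actions, actor=actor, critic=critic,
                  critic_action_input=action_input, memory=memory,
                  nb_steps_warmup_critic=100, nb_steps_warmup_actor=100,
                  random_process=random_process, gamma=0.99, target_model_update=1e-3)
agent.compile(Adam(lr=0.001, clipnorm=1.0), metrics=['mae'])
agent.fit(env, nb_steps=50000, visualize=False, verbose=1, nb_max_episode_steps=200)
agent.test(env, nb_episodes=5, visualize=True, nb_max_episode_steps=200)
Here, agent.fit runs the DDPG training loop directly against the Gym environment, and agent.test evaluates the learned policy; the step and episode counts are indicative only.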
Actor–critic methods implement a generalized policy iteration, alternating between a policy evaluation step and a policy improvement step. There are two closely related processes: actor improvement, which aims to improve the current policy, and critic evaluation, which evaluates the current policy. If the critic is learned by a bootstrapping method, the variance is reduced, so learning is more stable than with pure policy gradient methods.
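This alternation can be summarized with a schematic sketch of a single update; the function below is purely illustrative (value, update_critic, and update_actor are hypothetical placeholders, not keras-rl APIs), and simply makes the two interleaved steps explicit:
# Schematic one-step actor-critic update (illustrative only, not the book's code).
# The critic is evaluated by bootstrapping: the target uses the critic's own
# estimate of the next state, which reduces variance compared with using full
# Monte Carlo returns, as in pure policy gradient methods.
def actor_critic_step(state, action, reward, next_state, done,
                      value, update_critic, update_actor, gamma=0.99):
    # Policy evaluation (critic): one-step temporal-difference target.
    td_target = reward if done else reward + gamma * value(next_state)
    td_error = td_target - value(state)   # advantage estimate for this transition
    update_critic(state, td_target)       # move V(state) toward the bootstrapped target
    # Policy improvement (actor): reinforce the action in proportion to the advantage.
    update_actor(state, action, td_error)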
Let's analyze the code in detail. As always, we will start by importing the libraries necessary for our calculations, as follows:
import numpy as np
import gym
As shown in the preceding code, first we import the numpy library, which will be used to set the seed value. Then, we import ............
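The excerpt is cut off at this point. As a hedged illustration of the step the text refers to, setting the seed with numpy might look like the following (the seed value is an assumption chosen only for illustration):
seed_value = 123            # assumed value, not taken from the book
np.random.seed(seed_value)  # make numpy's random number generation reproducible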

Book illustrations:
Book "KerasReinforcementLearningProjects" - Illustration 1
Book "KerasReinforcementLearningProjects" - Illustration 2

The above is a preview of the book's content. To read the full text, please download the EPUB source file. Happy reading.
