Getting ready

We will train a linear neural network to solve the 'CartPole-v0' environment (https://github.com/openai/gym/wiki/CartPole-v0). The goal is to balance the pole on the cart. The observation state consists of four continuous-valued parameters: Cart Position [-2.4, 2.4], Cart Velocity [-∞, ∞], Pole Angle [~-41.8°, ~41.8°], and Pole Velocity at Tip [-∞, ∞]. The pole is balanced by pushing the cart either to the left or to the right, so the action space consists of two possible actions. You can see the CartPole-v0 environment space:

Now, for Q learning, we need a way to convert the continuous-valued observation states into a fixed-length feature vector that a linear model can work with. This is achieved using the class FeatureTransformer. The class first generates 20,000 random samples of observation space examples, which are standardized using the scikit-learn StandardScaler class. Then scikit-learn's RBFSampler is employed with different gamma values (that is, different kernel widths) to cover different parts of the observation space. When the FeatureTransformer class is instantiated, the random observation space examples are used to fit the RBFSampler instances through the fit_transform method.
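To make the role of RBFSampler more concrete, the following minimal NumPy sketch mimics the kind of random-feature map it fits: random projections followed by a cosine, whose inner products approximate the RBF kernel exp(-gamma * ||x - y||^2). The helper name rbf_random_features, the seed, and the two sample observations are illustrative assumptions and not part of the recipe, which relies on scikit-learn's own implementation.

```python
import numpy as np


def rbf_random_features(X, gamma, n_components, seed=0):
    """Map rows of X to random Fourier features for the kernel exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Random projection directions; a larger gamma gives a narrower kernel.
    weights = np.sqrt(2.0 * gamma) * rng.normal(size=(n_features, n_components))
    # Random phase offsets in [0, 2*pi).
    offsets = rng.uniform(0.0, 2.0 * np.pi, size=n_components)
    return np.sqrt(2.0 / n_components) * np.cos(X @ weights + offsets)


# Two hypothetical, already-standardized observations (four values each).
x = np.array([[0.1, -0.2, 0.05, 0.3]])
y = np.array([[0.4, 0.1, -0.1, 0.0]])

gamma = 0.5
# The same seed is used for both calls so that both points share one feature map.
zx = rbf_random_features(x, gamma, n_components=500)
zy = rbf_random_features(y, gamma, n_components=500)

approx = (zx @ zy.T).item()                     # inner product of the random features
exact = np.exp(-gamma * np.sum((x - y) ** 2))   # true RBF kernel value
print(zx.shape, approx, exact)
```

The approximate and exact values should be close but not identical, and the gap tends to shrink as n_components grows. Using several gamma values side by side, as the recipe does, gives the linear model features at several length scales.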

Later, the transform method is employed to transform the continuous observation space to this featurized representation:

import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler


class FeatureTransformer:
    def __init__(self, env):
        # Random samples stand in for observations; env itself is not queried here.
        obs_examples = np.random.random((20000, 4))
        print(obs_examples.shape)
        scaler = StandardScaler()
        scaler.fit(obs_examples)

        # Used to convert a state to a featurized representation.
        # We use RBF kernels with different variances to cover different parts of the space
        featurizer = FeatureUnion([
            ("cart_position", RBFSampler(gamma=0.02, n_components=500)),
            ("cart_velocity", RBFSampler(gamma=1.0, n_components=500)),
            ("pole_angle", RBFSampler(gamma=0.5, n_components=500)),
            ("pole_velocity", RBFSampler(gamma=0.1, n_components=500))
        ])
        feature_examples = featurizer.fit_transform(scaler.transform(obs_examples))
        print(feature_examples.shape)

        self.dimensions = feature_examples.shape[1]
        self.scaler = scaler
        self.featurizer = featurizer

    def transform(self, observations):
        # Expects a 2D array of shape (n_samples, 4).
        scaled = self.scaler.transform(observations)
        return self.featurizer.transform(scaled)
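
As a hedged usage sketch, the snippet below rebuilds the same scaler and feature union outside the class and transforms a single observation. It assumes scikit-learn is installed. The sample observation, the fixed random_state values, and the reduced number of random samples (1,000 instead of 20,000) are illustrative choices made only to keep the sketch light and repeatable.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

# Fewer random samples than the recipe's 20,000, purely to keep the sketch light.
rng = np.random.default_rng(0)
obs_examples = rng.random((1000, 4))

scaler = StandardScaler().fit(obs_examples)
featurizer = FeatureUnion([
    ("cart_position", RBFSampler(gamma=0.02, n_components=500, random_state=0)),
    ("cart_velocity", RBFSampler(gamma=1.0, n_components=500, random_state=1)),
    ("pole_angle", RBFSampler(gamma=0.5, n_components=500, random_state=2)),
    ("pole_velocity", RBFSampler(gamma=0.1, n_components=500, random_state=3)),
])
featurizer.fit(scaler.transform(obs_examples))

# A single hypothetical observation: position, velocity, angle, tip velocity.
observation = np.array([0.02, -0.01, 0.03, 0.04])

# The scaler and featurizer expect 2D input, so reshape to (1, 4) first.
features = featurizer.transform(scaler.transform(observation.reshape(1, -1)))
print(features.shape)  # (1, 2000): four samplers x 500 components each
```

Note that every RBFSampler in the FeatureUnion receives the full four-dimensional scaled observation. The names such as "cart_position" are only labels for the union's steps, so each sampler differs from the others by its gamma value rather than by the state variable it sees.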
