Reinforcement learning

As human beings, we all learn from past experience. We did not become so charming by accident. Years of compliments and criticism have helped shape who we are today. You learn what makes people happy by interacting with friends, family, or even strangers, and you figure out how to ride a bike by trying out different muscle movements until it just clicks. When you perform an action, you are sometimes rewarded immediately. For example, finding a shopping mall nearby might yield instant gratification. Other times, the reward does not appear right away, such as when you travel a long distance to find an exceptional place to eat. This is what reinforcement learning (RL) is all about.

RL is thus a technique in which the model itself learns from a series of actions or behaviors. Sample complexity, that is, the number of training samples an algorithm needs to learn a target function successfully, is very important in reinforcement learning. Moreover, at each step toward the ultimate goal, the model should act so as to maximize the reward function while interacting with an external environment, as demonstrated in the following figure:

Figure 5: Reinforcement learning
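
The interaction described above can be illustrated with a minimal sketch of the agent-environment loop. Everything here is an assumption made for illustration: `CorridorEnv`, `run_episode`, and the two policies are hypothetical names, not part of any library. The agent observes a state, picks an action, and the environment answers with a new state and a reward.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a corridor of cells 0..length-1.

    The agent starts in cell 0 and gets a reward of 1.0 when it
    reaches the last cell; every other step yields 0.0.
    """

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (move left) or +1 (move right);
        # the position is clamped to stay inside the corridor.
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total reward collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # agent chooses an action
        state, reward, done = env.step(action)   # environment responds
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    # Ignores the state and moves left or right at random.
    return random.choice([-1, 1])


def always_right(state):
    # A fixed policy that always moves toward the goal.
    return 1
```

Calling `run_episode(CorridorEnv(), always_right)` collects the reward in four steps, while `random_policy` may or may not reach the goal within the step limit. A learning algorithm would sit between these two extremes, adjusting the policy from the rewards it observes.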

Reinforcement learning techniques are being used in many areas. Here is a very short list:

Advertising, where RL helps in learning to rank and in using one-shot learning for emerging items, so that new users bring in more revenue
Teaching robots new tasks, while retaining prior knowledge
Deriving complex hierarchical schemes, from chess gambits to trading strategies
Routing problems, for example, managing a shipping fleet and deciding which trucks or truckers to assign to which cargo
In robotics, the algorithm must choose the robot's next action based on a set of sensor readings
It is also a natural fit for Internet of Things (IoT) applications, where a computer program interacts with a dynamic environment in which it must achieve a certain goal without an explicit mentor
One of the simplest RL problems is the n-armed bandit. There are n slot machines, each with a different fixed payout probability. The goal is to maximize profit by learning which machine has the best payout and choosing it as often as possible
An emerging application area is stock market trading, where a trader acts like a reinforcement learning agent: buying or selling a particular stock (the action) changes the trader's state by generating a profit or loss (the reward)
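
The n-armed bandit problem from the list above can be sketched in a few lines. The example below uses an epsilon-greedy strategy, which is one common approach and not necessarily the one this text has in mind. The function names, the payout probabilities, and the parameter defaults are all assumptions chosen for illustration.

```python
import random


def pull(arm_probs, arm, rng):
    """Simulate one pull: pay 1.0 with the arm's fixed probability, else 0.0."""
    return 1.0 if rng.random() < arm_probs[arm] else 0.0


def epsilon_greedy_bandit(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Play a simulated n-armed bandit with an epsilon-greedy strategy.

    arm_probs holds the payout probabilities, which are hidden from the
    agent and used only by the simulator. Returns the estimated payout
    per arm, the pull counts per arm, and the total reward collected.
    """
    rng = random.Random(seed)
    n = len(arm_probs)
    counts = [0] * n        # how often each arm was pulled
    estimates = [0.0] * n   # running average reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm.
            arm = rng.randrange(n)
        else:
            # Exploit: pick the arm with the highest estimate so far.
            arm = max(range(n), key=lambda a: estimates[a])

        reward = pull(arm_probs, arm, rng)
        counts[arm] += 1
        # Incremental update of the running average for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward
```

For example, `epsilon_greedy_bandit([0.2, 0.5, 0.8])` simulates three machines. The `epsilon` parameter controls the trade-off between exploring machines that may turn out to be better and exploiting the machine that currently looks best.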
