Reinforcement Learning

In reinforcement learning, we teach the model to adapt its behaviour by telling it how well it is doing at a given task. Unlike supervised learning, we don't provide correct input/output pairs during the learning process; the model only receives a reward signal for the actions it takes.

Examples

Multi-armed bandit

We can illustrate the multi-armed bandit problem as follows.

A model is faced with a series of slot machines (one-armed bandits), each with an unknown probability distribution of rewards. The goal is to maximize the cumulative reward over time by balancing two strategies:

Exploration: The model tries different slot machines to gather information and estimate their respective reward distributions, aiming to discover which machine offers the highest payoff.
Exploitation: The model focuses on the slot machine that is currently believed to provide the highest expected reward, based on the information it has gathered so far, in order to maximize its immediate gains.

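The challenge is to balance exploration (learning about the machines) and exploitation (leveraging the best-known machine) efficiently: too little exploration risks settling on a suboptimal machine, while too much wastes pulls on machines already known to pay poorly.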
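
One common way to strike this balance is the epsilon-greedy strategy, which is not named in the text above and is used here only as an illustration. The sketch below assumes each machine pays a reward of 1 with a fixed hidden probability and 0 otherwise; the function name `run_epsilon_greedy` and the parameters `true_probs`, `steps`, `epsilon`, and `seed` are hypothetical names chosen for this example.

```python
import random


def run_epsilon_greedy(true_probs, steps=10_000, epsilon=0.1, seed=0):
    """Play a Bernoulli multi-armed bandit with an epsilon-greedy policy.

    true_probs: hidden win probability of each machine (assumed for the demo).
    Returns the estimated win rate per machine, the pull count per machine,
    and the total reward collected.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms        # how many times each machine was pulled
    estimates = [0.0] * n_arms   # running average reward per machine
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: pick a machine uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: pick the machine with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment pays 1 with the machine's hidden probability.
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0

        # Incremental update of the running average for the chosen machine.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_epsilon_greedy([0.2, 0.5, 0.8])
    print("estimated win rates:", [round(e, 2) for e in estimates])
    print("pulls per machine:", counts)
    print("total reward:", total)
```

With a small `epsilon`, most pulls go to the machine currently believed to be best, while the occasional random pull keeps the estimates of the other machines from going stale. Larger values of `epsilon` explore more at the cost of immediate reward.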
