CARLOS'S LEARNED STRATEGY
Algorithm: ε-Greedy
Parameter: ε = 0.1
Estimated Q-Values
Training Progress
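Carlos learns which slot machine has the highest expected payout by balancing exploration (trying different machines) and exploitation (playing the best known machine). With ε = 0.1, he picks a machine at random 10% of the time and otherwise plays the machine with the highest estimated Q-value. Higher Q-values indicate machines Carlos believes are more rewarding.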
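
The strategy described here can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this panel: the function name `epsilon_greedy_bandit`, the Gaussian payouts, the example payout means, and the sample-average Q-value update are all assumptions made for the example.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Run an epsilon-greedy agent on a simulated set of slot machines.

    true_means: hypothetical expected payout of each machine (unknown to the agent).
    Returns the agent's estimated Q-values and how often each machine was played.
    """
    rng = random.Random(seed)
    n_machines = len(true_means)
    q_values = [0.0] * n_machines  # estimated payout per machine
    pulls = [0] * n_machines       # number of times each machine was played

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick any machine uniformly at random.
            machine = rng.randrange(n_machines)
        else:
            # Exploit: pick the machine with the highest current estimate.
            machine = max(range(n_machines), key=lambda m: q_values[m])

        # Assumed reward model: Gaussian noise around the machine's true mean.
        reward = rng.gauss(true_means[machine], 1.0)

        # Sample-average update: move the estimate toward the observed reward.
        pulls[machine] += 1
        q_values[machine] += (reward - q_values[machine]) / pulls[machine]

    return q_values, pulls


q_values, pulls = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print("Estimated Q-values:", [round(q, 2) for q in q_values])
print("Pulls per machine:", pulls)
```

In this sketch, each Q-value is the running average of the rewards seen from that machine, so the estimates become more reliable the more often a machine is played. Occasional random choices keep every machine's estimate from going stale.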