FxMath RL Studio — Interactive Guide

Reinforcement learning (RL) is a machine learning paradigm in which an agent learns to make decisions by interacting with its environment. In trading:

Agent = Your trading bot
Environment = Historical or live market data (price, volume, indicators)
Action = Buy, Sell, or Hold
Reward = Profit or loss from each trade

The agent explores thousands of trading scenarios during training, learning which actions lead to profit and which cause losses — without you writing a single trading rule.
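The four terms above can be sketched as a toy loop. This is a minimal illustration, not the Studio's code: the reward definition (one-bar profit or loss on a one-unit position), the made-up prices, and the names `step_reward` and `run_episode` are all assumptions.

```python
import random

ACTIONS = ("buy", "sell", "hold")

def step_reward(action, price_now, price_next):
    """Reward for one bar: profit or loss of a one-unit position held for one bar."""
    change = price_next - price_now
    if action == "buy":
        return change
    if action == "sell":
        return -change
    return 0.0  # hold: no position, no profit or loss

def run_episode(prices, policy):
    """Walk through the bars once, asking the policy (the agent) for an action at each bar."""
    total = 0.0
    for t in range(len(prices) - 1):
        action = policy(t, prices)
        total += step_reward(action, prices[t], prices[t + 1])
    return total

def random_policy(t, prices):
    """An untrained agent: picks any action with equal probability."""
    return random.choice(ACTIONS)

prices = [100.0, 101.0, 100.5, 102.0]  # made-up closing prices
always_buy_total = run_episode(prices, lambda t, p: "buy")
print(always_buy_total)  # 1.0 - 0.5 + 1.5 = 2.0
print(run_episode(prices, random_policy))  # varies from run to run
```

Training replaces `random_policy` with a policy that improves from the rewards it observes.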

Make sure MetaTrader 5 is installed and running on your PC.
Select your Symbol (e.g. XAUUSDb, EURUSDb).
Choose Number of Bars — how much historical data to load.
Pick a Timeframe (H1 for hourly, M15 for 15-min bars, etc.).
Set the path to your MT5 terminal64.exe or click "Detect Instances" to auto-find it.
Click "Connect". A green [Connected] status confirms success.

⚠️ If "Detect Instances" shows nothing, confirm that MT5 is installed in its default location, or browse to terminal64.exe manually.
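The Studio performs these steps through its interface. To check the same connection from a script, here is a rough sketch using the MetaTrader5 Python package (Windows only, requires a running terminal). Whether the Studio uses this package internally is an assumption, and `load_bars` and `timeframe_constant_name` are hypothetical helper names.

```python
def timeframe_constant_name(label):
    """Map a label such as "H1" or "M15" to the package's constant name."""
    return "TIMEFRAME_" + label.upper()

def load_bars(symbol, timeframe_label, n_bars, terminal_path=None):
    """Connect to MT5, fetch the most recent n_bars for symbol, then disconnect."""
    # Imported lazily so this file can be read on machines without the package.
    import MetaTrader5 as mt5

    ok = mt5.initialize(path=terminal_path) if terminal_path else mt5.initialize()
    if not ok:
        raise ConnectionError(f"MT5 initialize failed: {mt5.last_error()}")
    try:
        # The symbol must match the broker's Market Watch name exactly.
        if not mt5.symbol_select(symbol, True):
            raise ValueError(f"Symbol not available: {symbol}")
        timeframe = getattr(mt5, timeframe_constant_name(timeframe_label))
        # Position 0 is the current bar; the result is a NumPy structured array or None.
        rates = mt5.copy_rates_from_pos(symbol, timeframe, 0, n_bars)
        if rates is None or len(rates) < n_bars:
            raise RuntimeError(f"Fewer than {n_bars} bars returned for {symbol}")
        return rates
    finally:
        mt5.shutdown()
```

A call such as `load_bars("XAUUSDb", "H1", 5000, r"C:\path\to\terminal64.exe")` mirrors the Symbol, Timeframe, Number of Bars, and path fields; the path shown is a placeholder.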

Parameter	Meaning	Typical Value
`Alpha (α)`	Learning rate — how quickly the agent adapts to new information. Higher = faster learning but less stable.	0.1
`Gamma (γ)`	Discount factor — how much the agent values future rewards vs. immediate profit. Higher = more forward-looking.	0.95
`Epsilon (ε)`	Exploration rate — chance the agent picks a random action instead of the "best" known one. Higher = more exploration.	0.3
`Eps Decay`	Multiplier applied to epsilon after each episode, so the agent explores less over time.	0.995
`Min Eps`	Floor for epsilon — ensures the agent never stops exploring entirely.	0.01
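To make the roles of alpha, gamma, and epsilon concrete, here is a sketch of epsilon-greedy action selection with a tabular Q-learning update. The guide does not state which algorithm the Studio uses, so tabular Q-learning is an assumption, as are the state labels and function names.

```python
import random
from collections import defaultdict

ACTIONS = ("buy", "sell", "hold")

def make_q_table():
    """Q-table: for each state, an estimated value per action, starting at zero."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def choose_action(q, state, epsilon, rng=random):
    """With probability epsilon explore at random, otherwise take the best known action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = q[state]
    return max(ACTIONS, key=values.get)

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """Move the estimate toward reward + gamma * (best value of the next state)."""
    best_next = max(q[next_state].values())
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]

q = make_q_table()
first = q_update(q, "s0", "buy", reward=1.0, next_state="s1")
print(first)  # 0.1 * (1.0 + 0.95 * 0.0 - 0.0), i.e. about 0.1
print(choose_action(q, "s0", epsilon=0.0))  # greedy choice: "buy"
```

A larger alpha moves each estimate further per update, and a larger gamma gives the next state's value more weight in the target.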

Rule of thumb: Start with the defaults. If the agent's rewards stay flat, increase epsilon or the learning rate. If rewards swing erratically, lower them.
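When adjusting epsilon, it helps to know how long exploration actually lasts. The sketch below assumes the decay is applied once per episode as `epsilon = max(min_eps, epsilon * decay)`, which matches the parameter descriptions but is not confirmed as the Studio's exact formula. The function names are illustrative.

```python
import math

def epsilon_schedule(start=0.3, decay=0.995, floor=0.01, episodes=1000):
    """Epsilon used in each episode under multiplicative decay with a floor."""
    eps = start
    schedule = []
    for _ in range(episodes):
        schedule.append(eps)
        eps = max(floor, eps * decay)
    return schedule

def episodes_to_floor(start=0.3, decay=0.995, floor=0.01):
    """Smallest n such that start * decay**n <= floor."""
    return math.ceil(math.log(floor / start) / math.log(decay))

schedule = epsilon_schedule()
print(schedule[0])          # 0.3
print(episodes_to_floor())  # 679
print(schedule[-1])         # 0.01
```

With the typical values from the table, epsilon reaches its floor after 679 episodes, so a 500-episode run ends while the agent is still exploring noticeably more than the minimum.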

Training time depends on:

Number of episodes — 500–1000 is typical for H1 data.
Bars per episode — 100 bars per episode means each episode simulates 100 trading decisions.
Your CPU — training is compute-heavy. Expect 5–30 minutes for 1000 episodes on a modern CPU.

When to stop: Watch the Reward Chart. Once the blue reward curve has trended upward and the pink 10-episode average levels off near its highest values, training has likely converged. If the curve is still noisy after 1000 episodes, increase the episode count.

Feature	FREE	PRO
Symbols	Single	Multi-symbol portfolio
Timeframes	Single	Multi-timeframe analysis
Grid Trading	✗	✔ Grid + Martingale
Risk Management	Basic (Max DD)	Advanced (trailing SL, TP, position sizing)
Email Alerts	✗	✔ Trade & error notifications
Model Persistence	Save/Load single	Versioned model checkpoints
Logging	Console	Full CSV export + dashboard

Upgrade at fxmath.com.

Blue line = Raw reward per episode. Spiky = the agent is still exploring.
Pink dashed line = 10-episode rolling average. Smoother — use this to judge progress.

Good sign: Both lines trend upward over time → the agent is learning profitable patterns.

Bad sign: Flat or declining rewards after many episodes → check parameters or data quality.

Normal: Some variance is expected. The agent still takes random actions with probability epsilon, and these sometimes lose money even when the learned strategy is sound.
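If you export per-episode rewards and want to reproduce the pink line yourself, a 10-episode rolling average can be computed as sketched below. How the Studio treats the first nine episodes is not documented; averaging over however many episodes exist so far is an assumption.

```python
def rolling_average(rewards, window=10):
    """Mean of the most recent `window` rewards at each episode (fewer at the start)."""
    averages = []
    running = 0.0
    for i, r in enumerate(rewards):
        running += r
        if i >= window:
            running -= rewards[i - window]  # drop the reward that left the window
        averages.append(running / min(i + 1, window))
    return averages

print(rolling_average([1, 2, 3, 4], window=2))  # [1.0, 1.5, 2.5, 3.5]
```

The averaged series lags the raw rewards by a few episodes, which is why it is the steadier guide to progress.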

After training, click "Save Model" in the Controls panel.
Choose a location and filename (e.g. xauusd_h1_rl.pkl).
To reload later: click "Load Model", select the file, and the agent will resume from where it left off.
Use "Save Settings" to persist your current parameters (symbol, timeframe, alpha, etc.) for quick setup next time.

⚠️ Models are tied to the symbol and timeframe they were trained on. Loading a model trained on EURUSD H1 while connected to XAUUSD M15 will produce poor results.
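One way to guard against this kind of mismatch is to store the symbol and timeframe alongside the model and check them on load. The sketch below assumes a pickle file (suggested by the .pkl extension) holding a plain dictionary. The Studio's actual file layout is not documented here, and `save_model` and `load_model` are hypothetical names.

```python
import os
import pickle
import tempfile

def save_model(path, q_table, symbol, timeframe):
    """Write the learned values together with the market they were trained on."""
    payload = {"symbol": symbol, "timeframe": timeframe, "q_table": q_table}
    with open(path, "wb") as f:
        pickle.dump(payload, f)

def load_model(path, symbol, timeframe):
    """Load a model, refusing it if it was trained on a different symbol or timeframe."""
    with open(path, "rb") as f:
        payload = pickle.load(f)  # only unpickle files you trust
    trained_on = (payload["symbol"], payload["timeframe"])
    if trained_on != (symbol, timeframe):
        raise ValueError(f"Model trained on {trained_on}, not {(symbol, timeframe)}")
    return payload["q_table"]

demo_path = os.path.join(tempfile.mkdtemp(), "xauusd_h1_rl.pkl")
save_model(demo_path, {"s0": {"buy": 0.38}}, "XAUUSD", "H1")
restored = load_model(demo_path, "XAUUSD", "H1")

try:
    load_model(demo_path, "EURUSD", "M15")
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(restored, mismatch_rejected)
```

Raising an error on mismatch turns a silent source of poor results into an explicit message.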

Error Message	Likely Cause	Solution
`No connection to MT5`	MT5 not running or path is wrong.	Open MT5, click "Detect Instances" or browse manually.
`Symbol not found`	Symbol name differs in your broker's Market Watch.	Check Market Watch in MT5 and type the exact name (e.g. `XAUUSDb`).
`Insufficient bars`	Not enough historical data for the requested bars × episodes.	Reduce bar count or download more history in MT5.
`Trade timeout`	Broker rejected the order or the server was busy.	Check that trading is allowed on the account, then increase the slippage tolerance.
`Out of memory`	Data set too large (too many bars × episodes).	Reduce bars per episode or max episodes.

Still stuck? Email [email protected] with a screenshot of the log.
