Reinforcement Learning is a machine learning paradigm where an agent learns to make decisions by interacting with its environment. In trading:
The agent explores thousands of trading scenarios during training, learning which actions lead to profit and which cause losses — without you writing a single trading rule.
terminal64.exe or click "Detect Instances" to auto-find it.

[Connected] status confirms success.

⚠️ If "Detect Instances" shows nothing, check that MT5 is installed in the default location or browse manually.
| Parameter | Meaning | Typical Value |
|---|---|---|
| Alpha (α) | Learning rate: how quickly the agent adapts to new information. Higher = faster learning but less stable. | 0.1 |
| Gamma (γ) | Discount factor: how much the agent values future rewards vs. immediate profit. Higher = more forward-looking. | 0.95 |
| Epsilon (ε) | Exploration rate: chance the agent picks a random action instead of the "best" known one. Higher = more exploration. | 0.3 |
| Eps Decay | Epsilon shrinks each episode by this multiplier, so the agent explores less over time. | 0.995 |
| Min Eps | Floor for epsilon: ensures the agent never stops exploring entirely. | 0.01 |
Rule of thumb: Start with defaults. If the agent's rewards stay flat, increase epsilon or the learning rate. If rewards swing erratically from episode to episode, lower them.
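To make the five parameters concrete, here is a minimal sketch of how they typically interact in a tabular Q-learning agent. It assumes Q-learning with an epsilon-greedy policy, which the parameter names suggest but this guide does not confirm. The action set and all function names are hypothetical.

```python
import random
from collections import defaultdict

# Typical values from the parameter table above.
ALPHA = 0.1        # learning rate
GAMMA = 0.95       # discount factor
EPSILON = 0.3      # initial exploration rate
EPS_DECAY = 0.995  # per-episode multiplier
MIN_EPS = 0.01     # exploration floor

ACTIONS = ("buy", "sell", "hold")  # hypothetical action set


def choose_action(q_table, state, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else the best known one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])


def q_update(q_table, state, action, reward, next_state, alpha=ALPHA, gamma=GAMMA):
    """One Q-learning step: move Q(s, a) toward reward + gamma * max Q(s', a')."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


def decay_epsilon(epsilon, decay=EPS_DECAY, floor=MIN_EPS):
    """Shrink epsilon once per episode, never below the floor."""
    return max(floor, epsilon * decay)


def episodes_until_floor(epsilon=EPSILON, decay=EPS_DECAY, floor=MIN_EPS):
    """Count how many episodes pass before epsilon reaches its floor."""
    n = 0
    while epsilon > floor:
        epsilon = decay_epsilon(epsilon, decay, floor)
        n += 1
    return n


q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
```

With the typical values, `episodes_until_floor()` comes out at roughly 680 episodes, so a run much shorter than that never reaches the mostly-greedy phase. That is one reason short runs can look erratic.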
Training time depends on:
When to stop: Watch the Reward Chart. If the blue reward curve trends upward and the pink 10-episode average stabilizes near the top, training has converged. If the curve is still noisy after 1000 episodes, increase the episode count.
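If you export per-episode rewards, the visual check above can be approximated in code. The sketch below computes the 10-episode average and applies a simple flatness heuristic. The `lookback` and `tolerance` values and both function names are illustrative assumptions, not settings of the application.

```python
def moving_average(rewards, window=10):
    """Trailing mean over `window` episodes; one value per complete window."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(rewards[i - window:i]) / window
        for i in range(window, len(rewards) + 1)
    ]


def has_plateaued(rewards, window=10, lookback=50, tolerance=0.02):
    """Heuristic: the averaged curve counts as flat when its range over the
    last `lookback` points is small relative to its magnitude."""
    avg = moving_average(rewards, window)
    if len(avg) < lookback:
        return False  # not enough history to judge
    recent = avg[-lookback:]
    spread = max(recent) - min(recent)
    scale = max(abs(x) for x in recent) or 1.0  # avoid dividing by zero
    return spread / scale <= tolerance
```

A plateau only says the curve has stopped moving. Whether it flattened at a good level still needs a look at the chart, since a flat line near zero or below it is the "bad sign" case, not convergence.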
| Feature | FREE | PRO |
|---|---|---|
| Symbols | Single | Multi-symbol portfolio |
| Timeframes | Single | Multi-timeframe analysis |
| Grid Trading | ✗ | ✔ Grid + Martingale |
| Risk Management | Basic (Max DD) | Advanced (trailing SL, TP, position sizing) |
| Email Alerts | ✗ | ✔ Trade & error notifications |
| Model Persistence | Save/Load single | Versioned model checkpoints |
| Logging | Console | Full CSV export + dashboard |
Upgrade at fxmath.com.
Good sign: Both lines trend upward over time → the agent is learning profitable patterns.
Bad sign: Flat or declining rewards after many episodes → check parameters or data quality.
Normal: Some variance is expected. With probability epsilon the agent tries a random action, and those exploratory trades sometimes lose money even when the learned strategy is sound.
xauusd_h1_rl.pkl).

⚠️ Models are tied to the symbol and timeframe they were trained on. Loading a model trained on EURUSD H1 while connected to XAUUSD M15 will produce poor results.
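One way to guard against that mismatch in your own scripts is to store the symbol and timeframe next to the model and check them on load. The sketch below is an illustration under assumptions: the payload layout and function names are hypothetical, and the application's actual .pkl format is not documented here. Only the filename pattern follows the example above.

```python
import pickle
from pathlib import Path


def model_filename(symbol, timeframe):
    """Build a name like 'xauusd_h1_rl.pkl' (pattern taken from the example above)."""
    return f"{symbol.lower()}_{timeframe.lower()}_rl.pkl"


def save_model(q_table, symbol, timeframe, directory="."):
    """Pickle the model together with the market it was trained on."""
    path = Path(directory) / model_filename(symbol, timeframe)
    payload = {
        "symbol": symbol.upper(),
        "timeframe": timeframe.upper(),
        "q_table": dict(q_table),
    }
    with path.open("wb") as fh:
        pickle.dump(payload, fh)
    return path


def load_model(path, symbol, timeframe):
    """Load a model, refusing one trained on a different symbol or timeframe."""
    with Path(path).open("rb") as fh:
        payload = pickle.load(fh)
    trained_on = (payload["symbol"], payload["timeframe"])
    wanted = (symbol.upper(), timeframe.upper())
    if trained_on != wanted:
        raise ValueError(f"model trained on {trained_on}, but connected to {wanted}")
    return payload["q_table"]
```

Because `pickle.load` can execute arbitrary code, only load model files that you created yourself or that come from a source you trust.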
| Error Message | Likely Cause | Solution |
|---|---|---|
| No connection to MT5 | MT5 not running or path is wrong. | Open MT5, click "Detect Instances" or browse manually. |
| Symbol not found | Symbol name differs in your broker's Market Watch. | Check Market Watch in MT5 and type the exact name (e.g. XAUUSDb). |
| Insufficient bars | Not enough historical data for the requested bars × episodes. | Reduce bar count or download more history in MT5. |
| Trade timeout | Broker rejected the order or server was busy. | Check if manual trading is allowed. Increase slippage. |
| Out of memory | Data set too large (too many bars × episodes). | Reduce bars per episode or max episodes. |
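The first three rows can also be checked outside the application. The sketch below assumes the official `MetaTrader5` Python package (Windows only), which this guide does not state the application uses. The `diagnose` function is hypothetical, and the module is passed in as an argument so the logic can be exercised without a terminal.

```python
def diagnose(mt5, symbol, bars_needed, terminal_path=None, timeframe=None):
    """Return a list of problems found; an empty list means all checks passed.

    `mt5` is the imported MetaTrader5 module (or a stand-in with the same calls).
    """
    # The terminal path is passed positionally, as the package documents it.
    connected = mt5.initialize(terminal_path) if terminal_path else mt5.initialize()
    if not connected:
        return [f"No connection to MT5: {mt5.last_error()}"]

    problems = []
    try:
        # symbol_info returns None when the broker does not offer this exact name.
        if mt5.symbol_info(symbol) is None:
            return [f"Symbol not found: {symbol!r} (check Market Watch for the exact name)"]
        mt5.symbol_select(symbol, True)  # make sure it is shown in Market Watch

        tf = mt5.TIMEFRAME_H1 if timeframe is None else timeframe
        rates = mt5.copy_rates_from_pos(symbol, tf, 0, bars_needed)
        got = 0 if rates is None else len(rates)
        if got < bars_needed:
            problems.append(f"Insufficient bars: wanted {bars_needed}, got {got}")
    finally:
        mt5.shutdown()  # always release the terminal connection
    return problems
```

On a machine with the package and terminal installed, a call would look like `diagnose(mt5, "XAUUSD", 5000)` after `import MetaTrader5 as mt5`. The symbol and bar count here are placeholders.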
Still stuck? Email [email protected] with a screenshot of the log.