From: Application of reinforcement learning for segmentation of transrectal ultrasound images
Initialize Q(s, a) arbitrarily |
Repeat (for each episode): |
Initialize state s |
Repeat (for each step of episode): |
Choose action a from state s using policy derived from Q (e.g., ε-greedy) |
Take action a, observe reward r, next state s' |
Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)] |
s ← s'; |
until s is terminal |
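
The loop above can be sketched in Python as tabular Q-learning with the standard update Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]. This is a minimal illustration and not the paper's implementation: `ChainEnv` is a hypothetical toy environment (not the ultrasound segmentation task), and the values of `alpha`, `gamma`, `epsilon`, and the zero initialization of Q are assumptions chosen only to make the example run.

```python
import random
from collections import defaultdict


class ChainEnv:
    """Hypothetical toy task: states 0..n-1 on a line; reaching n-1 ends the episode."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.actions = [0, 1]  # 0 = move left, 1 = move right

    def reset(self):
        return 0  # every episode starts in state 0

    def step(self, state, action):
        next_state = max(0, state - 1) if action == 0 else state + 1
        done = next_state == self.n_states - 1
        reward = 1.0 if done else 0.0  # reward only on reaching the terminal state
        return next_state, reward, done


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Policy derived from Q: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(Q[(state, a)] for a in actions)
    # Break ties at random so untrained states are still explored.
    return rng.choice([a for a in actions if Q[(state, a)] == best])


def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # Initialize Q(s, a); zero is one arbitrary choice
    for _ in range(episodes):  # Repeat (for each episode)
        s = env.reset()  # Initialize state s
        done = False
        while not done:  # Repeat (for each step of episode) until s is terminal
            a = epsilon_greedy(Q, s, env.actions, epsilon, rng)
            s_next, r, done = env.step(s, a)  # Take action a, observe r and s'
            # A terminal next state contributes no future value.
            future = 0.0 if done else max(Q[(s_next, b)] for b in env.actions)
            Q[(s, a)] += alpha * (r + gamma * future - Q[(s, a)])
            s = s_next  # s <- s'
    return Q


if __name__ == "__main__":
    Q = q_learning(ChainEnv())
    for state in range(4):
        print(state, Q[(state, 0)], Q[(state, 1)])
```

After training on this toy chain, the greedy action in every non-terminal state should be "right", since that is the only route to the reward.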