Fig. 1 | BMC Medical Imaging

Fig. 1

From: Reinforcement learning using Deep \(Q\) networks and \(Q\) learning accurately localizes brain tumors on MRI with very small training sets

Environment and reward scheme for training. a The initial state (\(s_{1}\)) for all episodes, with the agent in the upper left corner. b–d The rewards for each of the three possible actions in different states. b When the agent neither overlaps nor sits next to the lesion, staying in place receives the largest penalty (reward of −2), and moving receives a smaller penalty (reward of −0.5). c The rewards for the possible actions in the state just to the left of the mass. Moving toward the lesion, so that the agent coincides with it, receives +1, the largest reward and the only positive one in this state. d The state in which the agent coincides with the lesion. Here the agent should stay in place, so this action receives a reward of +1
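
The reward scheme in the caption can be sketched as a small function. This is an illustrative sketch only, not the authors' code. It assumes a one-dimensional row of integer positions, a lesion occupying a single position, and hypothetical action names `STAY`, `LEFT`, and `RIGHT`. The caption gives values only for staying or moving away from the lesion (b), moving onto the lesion (c), and staying on the lesion (d). The rewards for the remaining cases, such as staying next to the lesion or moving off it, are assumed here to follow the same −2 and −0.5 penalties.

```python
# Hypothetical action encoding; the caption only says there are three actions.
STAY, LEFT, RIGHT = 0, 1, 2


def step_reward(agent_pos: int, lesion_pos: int, action: int) -> float:
    """Return the reward for taking `action` from `agent_pos`.

    Assumes a 1D row of integer positions and a single-position lesion.
    Grid boundaries are not modeled in this sketch.
    """
    on_lesion = agent_pos == lesion_pos

    if action == STAY:
        # Panel d: staying on the lesion earns +1.
        # Panel b: staying anywhere else earns the largest penalty, -2.
        # (Staying while adjacent to the lesion is assumed to be -2 as well.)
        return 1.0 if on_lesion else -2.0

    # The agent moves one position left or right.
    next_pos = agent_pos + (1 if action == RIGHT else -1)

    if next_pos == lesion_pos:
        # Panel c: moving onto the lesion earns +1.
        return 1.0

    # Panel b: any other move earns the smaller penalty, -0.5.
    # Assumption: moving off the lesion is penalized the same way.
    return -0.5
```

For example, with the lesion at position 5, `step_reward(4, 5, RIGHT)` corresponds to panel c and `step_reward(5, 5, STAY)` to panel d. Penalizing staying more heavily than moving, as the caption describes, pushes the agent to keep moving until it reaches the lesion.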
