Table 1 Q-Learning algorithm

From: Application of reinforcement learning for segmentation of transrectal ultrasound images

Initialize Q(s, a) arbitrarily

Repeat (for each episode):

   Initialize state s

   Repeat (for each step of episode):

      Choose action a from state s using policy derived from Q (e.g., ε-greedy)

      Take action a, observe reward r and next state s'

      Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') - Q(s, a)]

      s ← s'

   until s is terminal
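
For readers who want to experiment with the update rule in Table 1, the following is a minimal sketch of tabular Q-learning in Python. The toy 1-D corridor environment and all hyperparameter values (α, γ, ε, number of episodes) are illustrative assumptions and are not taken from the paper; the segmentation-specific states, actions, and rewards described in the article would replace them in practice.

```python
# Minimal tabular Q-learning sketch (illustrative; not the paper's implementation).
# The toy environment and all hyperparameters below are assumptions.
import random
from collections import defaultdict

# --- Toy environment: a 1-D corridor; reaching the right end gives reward +1 ---
N_STATES = 6                 # states 0..5; state 5 is terminal (goal)
ACTIONS = [-1, +1]           # move left or move right

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

# --- Q-learning hyperparameters (illustrative values) ---
ALPHA = 0.1      # learning rate α
GAMMA = 0.9      # discount factor γ
EPSILON = 0.1    # exploration rate for the ε-greedy policy
EPISODES = 500

# Initialize Q(s, a) arbitrarily (here: all zeros)
Q = defaultdict(float)

def epsilon_greedy(state):
    """Choose an action from the current state using an ε-greedy policy."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for _ in range(EPISODES):
    s = 0                                  # Initialize state s
    done = False
    while not done:                        # Repeat for each step of the episode
        a = epsilon_greedy(s)              # Choose a from s via ε-greedy policy
        s_next, r, done = step(s, a)       # Take action a, observe r and s'
        # Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') - Q(s, a)]
        best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s_next                         # s ← s'

# After training, the greedy policy should move right from every non-terminal state.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)})
```

The sketch follows the table line by line: an ε-greedy action choice, the temporal-difference update toward r + γ max_a' Q(s', a'), and the state transition s ← s', repeated until a terminal state is reached.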