I have been steeped in math for as long as I remember, so I frequently have a ha...

sarabande · on April 27, 2018

Things that weren't obvious to me as a non-mathematician:

    f:S×A↦S

(I assumed it meant: the model is represented by function f, whose inputs can be any combination of S and A [domain], and will produce an output value in S [codomain])

Why do we set the Q to 0 below?

    The constraint that Q(s_t,a_t,s_{t+K},K)=0 enforces the feasibility of the trajectory

yorwba · on April 27, 2018

> the model is represented by function f, whose inputs can be any combination of S and A [domain], and will produce an output value in S [codomain]

Exactly. f:S×A↦S is a function signature, just like in a programming language. Basically, the model tells you which state you end up in after taking a certain action in the given state.
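
To make the programming-language analogy concrete, here is a minimal sketch in Python type hints. The `State` and `Action` types and the toy grid dynamics are made up for illustration and are not from the article; only the shape of the signature matters.

```python
from typing import Callable, Tuple

# Hypothetical toy types: a state is a grid position, an action is a step.
State = Tuple[int, int]
Action = Tuple[int, int]

# f : S x A -> S, written as a Python type.
Model = Callable[[State, Action], State]


def f(s: State, a: Action) -> State:
    """Toy deterministic model: next state = current position plus the step."""
    return (s[0] + a[0], s[1] + a[1])


model: Model = f
next_state = model((0, 0), (1, 0))  # the state reached from (0, 0) after stepping right
```

Any function that takes a (state, action) pair and returns a state fits the `Model` type, which is all the notation f:S×A↦S is saying.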

> Why do we set the Q to 0 below?

Q is introduced as:

> A temporal difference model (TDM), which we will write as Q(s,a,s_g,τ), is a function that, given a state s∈S, action a∈A, and goal state s_g∈S, predicts how close an agent can get to the goal within τ time steps. Intuitively, a TDM answers the question, “If I try to bike to San Francisco in 30 minutes, how close will I get?”

That means that setting Q(s_t,a_t,s_{t+K},K)=0 is the same as enforcing that the state s_{t+K} can actually be reached (distance 0) from s_t in K time steps. Without the constraint, it would be possible to plan a trajectory that can't be executed because the intermediate goals are too far away.
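
A toy sketch of that feasibility check, under assumptions that are mine rather than the article's: states are integers on a line, the agent moves at most `MAX_SPEED` units per step, Q is the negative of the distance still left to the goal, and τ counts the first action as one of the steps. A real TDM is a learned function, not a closed-form formula like this one.

```python
from typing import List

MAX_SPEED = 1  # assumed: the agent covers at most one unit per time step


def q(s: int, a: int, goal: int, tau: int) -> int:
    """Toy stand-in for a TDM: negative distance to `goal` that remains after
    `tau` steps, where the first step is action `a` and the rest move straight
    toward the goal. A value of 0 means the goal is reachable in time."""
    after_first = s + a
    remaining = abs(goal - after_first) - (tau - 1) * MAX_SPEED
    return -max(0, remaining)


def is_feasible(waypoints: List[int], first_actions: List[int], k: int) -> bool:
    """Check Q(s_t, a_t, s_{t+K}, K) == 0 for every consecutive pair of waypoints."""
    pairs = zip(waypoints, first_actions, waypoints[1:])
    return all(q(s, a, g, k) == 0 for s, a, g in pairs)


# Waypoints 3 units apart can be reached in K = 3 steps at speed 1.
ok_plan = is_feasible([0, 3, 6], [1, 1], k=3)

# Waypoints 5 units apart cannot: Q is -2 for the first segment, so the plan is rejected.
bad_plan = is_feasible([0, 5, 10], [1, 1], k=3)
```

The second plan is exactly the kind of trajectory the constraint rules out: it looks fine on paper, but each intermediate goal is farther away than the agent can travel in K steps.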

sarabande · on April 27, 2018

Ah, thanks for the explanation!