Optimizing Production Manufacturing Using
2006-1-11 · Here, V* is the optimal value function and p* is the optimal average reward. Note that we are assuming unichain SMDPs, where the average reward is constant across states. Many real-world SMDPs, including the elevator task (Crites & Barto 1996) and the production inventory task and transfer line problems