Abstract
This paper addresses a stochastic economic lot scheduling problem (SELSP) for a single-machine make-to-stock production system in which the demands and the processing times for N types of products are random. Sequence-independent setup times and costs are explicitly considered and may differ across product types. The SELSP requires deciding when, what, and how much (the lot size) to produce so that the long-run average total cost, comprising setup, holding and backorder costs, is minimised. We develop a mathematical model and propose two reinforcement learning (RL) algorithms for real-time decision-making, in which a decision agent is assigned to the machine and improves the accuracy of its action-selection decisions via a ‘learning’ process. The first is a Q-learning algorithm for a semi-Markov decision process (QLS); the second is a Q-learning algorithm with a learning-improvement heuristic (QLIH), designed to further improve the performance of QLS. We compare the performance of QLS and QLIH with two benchmark policies: a Brownian policy and the first-come-first-served policy. The numerical results show that QLIH outperforms QLS and both benchmark policies.
Keywords: