As the digital transformation of bioprocessing progresses, several studies have proposed data-driven methods for obtaining a substrate feeding strategy that minimizes the operating cost of a semi-batch bioreactor. However, a naive application of model-free reinforcement learning (RL) is likely to fail to improve on the existing control policy because the amount of available data is limited. In this article, we propose an algorithm that integrates a double deep Q-network with model predictive control. The proposed method learns the action-value function in an off-policy fashion and solves a model-based optimal control problem whose terminal cost is given by the learned action-value function. In a simulation study, the proposed method, a model-based method, and model-free methods are applied to an industrial-scale penicillin process. The results show that the proposed method outperforms the other methods and learns from less data than the model-free RL algorithms.
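
To make the combination of a learned action-value function with a model-based optimal control problem concrete, one common way to write such a scheme is sketched below. The notation is assumed here for illustration and is not taken from the article: state $x_k$, feed input $u_k$, process model $f$, stage cost $\ell$, prediction horizon $N$, discount factor $\gamma$, online network parameters $\theta$, and target network parameters $\theta^{-}$. Because the objective is a cost, the sketch uses minimization where the usual reward-based formulation uses maximization.

\[
\min_{u_0,\dots,u_{N-1}} \; \sum_{k=0}^{N-1} \gamma^{k}\,\ell(x_k,u_k) \;+\; \gamma^{N} \min_{u} Q_{\theta}(x_N,u)
\quad \text{s.t.} \quad x_{k+1} = f(x_k,u_k), \quad x_0 = x(t)
\]

\[
y \;=\; \ell(x,u) \;+\; \gamma\, Q_{\theta^{-}}\!\Big(x',\, \arg\min_{u'} Q_{\theta}(x',u')\Big)
\]

In this sketch the first expression is the finite-horizon problem solved at each sampling instant, with the action-value function standing in for the cost beyond the horizon. The second is the double Q-learning regression target for a stored transition $(x,u,\ell,x')$: the online network selects the next action and the target network evaluates it. Since the target depends only on stored transitions, it can be computed off-policy from previously collected data.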