This study presents a graphics processing unit (GPU) implementation of coarse mesh finite difference (CMFD) acceleration in STREAM3D-GPU using the directive-based OpenACC framework. The offloading process follows a structured approach of assessment, parallelization, and optimization, with data structures reorganized to maximize GPU efficiency. Performance evaluations on three-dimensional OPR-1000 reactor models demonstrate up to a 22-fold reduction in CMFD run time compared to the CPU version, with the greatest improvements observed in the linear system solver and flux convergence routines owing to extensive parallelization and concurrent execution. Numerical verification using depletion simulations of the BEAVRS benchmark confirms that the GPU implementation maintains high fidelity, with eigenvalue deviations within 5 pcm and maximum differences in power distribution below 0.6%.
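The two verification metrics quoted above can be illustrated with a minimal sketch. It assumes that the eigenvalue deviation is the difference in the multiplication factor scaled by 10^5 (1 pcm = 10^-5) and that the power difference is the largest relative difference over all positions; the study may define either quantity differently. The function names and the input values are hypothetical and are not results from the study.

```python
def eigenvalue_diff_pcm(k_test: float, k_ref: float) -> float:
    """Eigenvalue difference in pcm, assuming 1 pcm = 1e-5 in k (an assumption)."""
    return (k_test - k_ref) * 1.0e5


def max_rel_power_diff_percent(p_test, p_ref):
    """Largest relative power difference in percent.

    Positions with zero reference power are skipped to avoid division by zero.
    """
    return max(
        abs(t - r) / r * 100.0
        for t, r in zip(p_test, p_ref)
        if r > 0.0
    )


# Hypothetical values for illustration only; not data from the study.
k_cpu, k_gpu = 1.00000, 1.00003
p_cpu = [0.80, 1.00, 1.20]
p_gpu = [0.80, 1.004, 1.20]

dk = eigenvalue_diff_pcm(k_gpu, k_cpu)          # roughly 3 pcm
dp = max_rel_power_diff_percent(p_gpu, p_cpu)   # roughly 0.4 %
```

With these illustrative inputs, both metrics fall inside the tolerances reported above (5 pcm and 0.6%).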