This paper presents a novel multi-agent reinforcement learning (MARL) approach that incorporates agent priorities to address the weapon-target assignment (WTA) problem under constraints such as heterogeneous engagement time windows. The proposed approach begins by formulating WTA with multiple heterogeneous agents as a decentralized Markov decision process (Dec-MDP). Training then uses a hierarchical structure comprising an agent selector and a target selector, which sequentially determine the assignment order: the agent selector picks the preferred shooter, and the target selector chooses that shooter's target. In designed experiments, the proposed model generates high-quality assignment plans within a short execution time and achieves superior performance across various scenarios, attaining the lowest threat survivability with a clear advantage over baseline methods, especially in tightly constrained scenarios. Ablation studies and qualitative analyses illustrate how key components influence performance and reveal the learned mechanisms behind agent and target selection. Finally, transferability tests confirm the model's applicability to unseen problem cases, where training and testing environments differ, indicating its potential for real-world adaptation across diverse scenarios.
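The hierarchical agent-then-target selection loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: simple greedy heuristics stand in for the learned selector policies, and all names (`select_agent`, `select_target`, the state fields) and the one-shot-kill assumption are assumptions introduced for the example.

```python
def select_agent(available_agents, state):
    # Placeholder for the learned agent-selector policy:
    # here, greedily prefer the agent with the tightest time window.
    return min(available_agents, key=lambda a: state["window"][a])

def select_target(agent, targets, state):
    # Placeholder for the learned target-selector policy:
    # here, pick the surviving target with the highest threat value.
    alive = [t for t in targets if state["alive"][t]]
    return max(alive, key=lambda t: state["threat"][t]) if alive else None

def assign(agents, targets, state):
    """Sequentially build an assignment plan: pick a shooter, then its target."""
    plan, remaining = [], list(agents)
    while remaining:
        shooter = select_agent(remaining, state)
        remaining.remove(shooter)
        target = select_target(shooter, targets, state)
        if target is not None:
            plan.append((shooter, target))
            state["alive"][target] = False  # illustrative one-shot-kill assumption
    return plan

state = {
    "window": {"w1": 2, "w2": 5},        # engagement time windows (shorter = tighter)
    "alive": {"t1": True, "t2": True},
    "threat": {"t1": 0.9, "t2": 0.4},
}
print(assign(["w1", "w2"], ["t1", "t2"], state))  # → [('w1', 't1'), ('w2', 't2')]
```

In the trained model, the two placeholder heuristics would be replaced by the agent-selector and target-selector policies, but the sequential decision structure is the same.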