JOURNAL OF MANUFACTURING PROCESSES, vol. 156, no. A, pp. 63-78
Abstract
Weld seam tracking is a critical capability for automated laser welding systems, requiring high precision and adaptive control in environments where manual programming is infeasible. Existing approaches often rely on rule-based logic or task-specific models, limiting their ability to support end-to-end automation. This study proposes a novel vision-to-code framework that directly generates executable control code from a single weld seam image, enabling fully automated seam tracking without handcrafted image processing or predefined alignment logic. A domain-specific dataset was constructed by annotating grayscale weld seam images with executable C# code, enabling the model to learn a direct mapping from visual input to machine-level instructions. The proposed deep learning architecture, featuring a CNN-based visual encoder, a non-autoregressive Transformer decoder, and a custom tokenizer for code generation, was trained entirely from scratch to capture the structural and semantic characteristics of the welding task. The system was validated on a butt-joint welding task using a multimode fiber laser applied to aluminum alloy specimens with varying weld geometries and surface textures. The model achieved a BLEU-4 score of 0.94851 and a pass@1 rate of 99.62%, and demonstrated robust generalization to unseen seam geometries and material textures. These results underscore the novelty and practical utility of the proposed approach, which bridges image understanding and control code generation in an end-to-end framework for vision-driven welding automation.
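The encoder-decoder pipeline described above (CNN visual encoder, non-autoregressive Transformer decoder, token classification head) can be sketched as follows. This is a minimal illustrative PyTorch sketch, not the paper's actual implementation: the class name `SeamToCodeModel`, all layer sizes, the vocabulary size, and the fixed output length are assumptions chosen for clarity. The key non-autoregressive property is that all code-token positions are decoded in parallel from learned positional queries, with no causal mask and no step-by-step feedback of previous tokens.

```python
import torch
import torch.nn as nn

class SeamToCodeModel(nn.Module):
    """Hypothetical sketch of a vision-to-code model: a CNN encodes a
    grayscale seam image into a feature sequence, and a non-autoregressive
    Transformer decoder emits logits for every code token in parallel.
    All dimensions here are illustrative, not the paper's configuration."""

    def __init__(self, vocab_size=256, d_model=128, max_code_len=64):
        super().__init__()
        # CNN visual encoder: 1-channel image -> d_model feature map
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1),
        )
        # One learned query per output position: decoding is parallel
        # (non-autoregressive), not token-by-token.
        self.queries = nn.Parameter(torch.randn(max_code_len, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab_size)  # per-token code logits

    def forward(self, image):                       # image: (B, 1, H, W)
        feats = self.encoder(image)                 # (B, d_model, h, w)
        memory = feats.flatten(2).transpose(1, 2)   # (B, h*w, d_model)
        tgt = self.queries.unsqueeze(0).expand(image.size(0), -1, -1)
        decoded = self.decoder(tgt, memory)         # no causal mask applied
        return self.head(decoded)                   # (B, max_code_len, vocab)

model = SeamToCodeModel()
logits = model(torch.zeros(2, 1, 64, 64))           # two dummy 64x64 images
print(tuple(logits.shape))                          # (2, 64, 256)
```

At inference, an argmax over the vocabulary dimension would yield one token ID per position in a single forward pass, which a detokenizer could then map back to executable control-code text; the custom tokenizer and C# detokenization step are outside the scope of this sketch.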