Automatically generated security patches have become increasingly prevalent as automated vulnerability repair techniques advance. However, accurately validating whether such patches preserve the original program's functionality remains challenging. Existing test-based evaluation approaches often misclassify patches when test cases are absent or insufficient. In addition, prior functionality preservation checking methods typically exclude error-handling blocks from the scope of validation, assuming that they do not affect the original functionality. This assumption yields incorrect validation results when patches introduce flawed error-handling logic, a pattern frequently observed in automatically generated security patches. To address these limitations, we propose ExtractCompare, a functionality validation method based on execution flow equivalence for non-crashing inputs. ExtractCompare constructs preconditions at the repair location for both the original and patched programs, and determines functional validity by checking the implication between them. We evaluate ExtractCompare on 84 security patches from L-AVRBench and an in-house dataset. In this evaluation, ExtractCompare correctly classified as invalid 27 patches that a prior functionality preservation checking method had misclassified as valid. This result demonstrates that ExtractCompare improves the reliability of functionality validation by reducing false positives in the assessment of automatically generated security patches.
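The implication check between the two preconditions can be illustrated with a minimal sketch. It is not the ExtractCompare implementation: it assumes the direction "every input the original program handles without crashing must still pass the patched repair location", it models each precondition as a plain Python predicate, and it replaces any solver-based reasoning with brute-force enumeration over a small finite domain. The names `find_counterexample`, `pre_orig`, `pre_good`, `pre_bad`, and `BUF_SIZE` are hypothetical and chosen only for illustration.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Optional

Env = Dict[str, int]          # variable name -> concrete value
Pred = Callable[[Env], bool]  # a precondition at the repair location


def find_counterexample(
    pre_orig: Pred, pre_patch: Pred, domains: Dict[str, Iterable[int]]
) -> Optional[Env]:
    """Search for an input satisfying pre_orig but not pre_patch.

    Returns None when pre_orig implies pre_patch on the whole
    (finite, assumed) domain, otherwise the first violating input.
    """
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        env = dict(zip(names, values))
        if pre_orig(env) and not pre_patch(env):
            return env
    return None


# Hypothetical scenario: an out-of-bounds access buf[i] on an 8-element buffer.
BUF_SIZE = 8


def pre_orig(env: Env) -> bool:
    # Inputs for which the original program does not crash.
    return 0 <= env["i"] < BUF_SIZE


def pre_good(env: Env) -> bool:
    # Patch rejects exactly the crashing indices.
    return 0 <= env["i"] < BUF_SIZE


def pre_bad(env: Env) -> bool:
    # Flawed error handling: the patch also rejects valid indices 4..7.
    return 0 <= env["i"] < 4


domains = {"i": range(-16, 17)}
print(find_counterexample(pre_orig, pre_good, domains))  # None
print(find_counterexample(pre_orig, pre_bad, domains))   # {'i': 4}
```

In this sketch the second patch is flagged because index 4 is a non-crashing input of the original program that the patched guard now diverts to error handling, which is the kind of over-restrictive error-handling logic the abstract describes.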
Publisher: Ulsan National Institute of Science and Technology