LLM Performance for Code Generation on Noisy Tasks

Radzim Sendyka, Christian Cabrera, Andrei Paleyes, Diana Robinson, Neil D. Lawrence, 2025.

Abstract

This paper investigates the ability of large language models (LLMs) to recognise and solve tasks which have been obfuscated beyond recognition. Focusing on competitive programming and benchmark tasks (LeetCode and MATH), we compare performance across multiple models and obfuscation methods. We introduce the concept of eager pattern matching and discuss implications for benchmarking, dataset contamination, and automated software systems.