Boffins at the University of Cambridge discovered a bug that affects most computer code compilers and many software development environments.
The problem lies in a component of the digital text encoding standard Unicode, which allows computers to exchange information regardless of the language used. Unicode currently defines more than 143,000 characters across 154 different scripts.
However, Unicode's bidirectional or "Bidi" algorithm has a pretty big issue. The algorithm handles the display of text that mixes scripts with different display orders, such as Arabic (read right to left) and English (left to right). Computer systems need a deterministic way of resolving conflicting directionality in text, and the algorithm provides one. It also supports a control character called the "Bidi override," which can be used to force text to display in a chosen direction.
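To make the directionality idea concrete, here is a minimal Python sketch that looks up the Bidi class Unicode assigns to a few characters. The sample characters and the helper name `bidi_class` are our own illustration, not something taken from the research; the lookup uses the standard library's `unicodedata` module.

```python
import unicodedata

# Sample characters picked for illustration only (an assumption, not from the research).
samples = {
    "Latin letter a": "a",
    "Hebrew letter alef": "\u05d0",
    "Arabic letter alef": "\u0627",
    "Right-to-left override": "\u202e",
}


def bidi_class(ch: str) -> str:
    """Return the Unicode bidirectional class of a single character."""
    return unicodedata.bidirectional(ch)


for label, ch in samples.items():
    # Latin letters are class "L", Hebrew "R", Arabic "AL"; the override is "RLO".
    print(f"{label}: U+{ord(ch):04X} -> {bidi_class(ch)}")
```

These per-character classes are the raw input the Bidi algorithm works from; the override character is the odd one out because it carries no visible glyph of its own.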
The boffins note that the default ordering set by the Bidi Algorithm is not always sufficient.
Bidi override control characters enable switching the display ordering of groups of characters. Bidi overrides enable even single-script characters to be displayed in an order different from their logical encoding. As the researchers point out, this fact has previously been exploited to disguise the file extensions of malware disseminated via email.
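The file-extension trick can be sketched in a few lines of Python. The filename below is a made-up example, and `logical_extension` is a hypothetical helper; the point is only that the extension the operating system acts on comes from the stored (logical) order, while a Bidi-aware viewer would typically render the tail of the name reversed.

```python
import os

RLO = "\u202e"  # U+202E RIGHT-TO-LEFT OVERRIDE

# Hypothetical filename: the characters after the override are stored as "fdp.exe",
# but a Bidi-aware display would typically show them reversed, as "exe.pdf".
name = "invoice_" + RLO + "fdp.exe"


def logical_extension(filename: str) -> str:
    """Return the extension based on the stored character order."""
    return os.path.splitext(filename)[1]


def has_override(filename: str) -> bool:
    """Report whether the name contains a right-to-left override."""
    return RLO in filename


print(logical_extension(name))  # the extension the system sees
print(has_override(name))       # whether the name deserves suspicion
print(ascii(name))              # escaped form, which makes the hidden character visible
```

Printing the escaped form with `ascii()` is one simple way to expose the invisible character; a mail filter might instead reject such names outright, though that is a policy choice rather than anything the research prescribes.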
However, most programming languages let you put these Bidi overrides in comments and strings. Comments are a problem because compilers and interpreters ignore all text within them, including control characters. String literals are a problem because most languages allow them to contain arbitrary characters, again including control characters.
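One way a reviewer or build pipeline might look for such characters is to scan source text for the Bidi embedding, override, and isolate controls. The sketch below is an assumption-laden illustration rather than the researchers' own tooling: the function name `find_bidi_controls` and the sample snippet are invented, and the set covers the nine directional formatting characters U+202A to U+202E and U+2066 to U+2069.

```python
import unicodedata

# Bidi embedding, override, and isolate controls (U+202A-U+202E, U+2066-U+2069).
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def find_bidi_controls(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character name) for every Bidi control in the text."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits


# Invented sample: a comment that hides a right-to-left override.
sample = 'access = "user"\n# check \u202e admin only\n'

for line_no, col, name in find_bidi_controls(sample):
    print(f"line {line_no}, column {col}: {name}")
```

A check like this flags every occurrence, including legitimate ones in files that genuinely contain right-to-left text, so in practice it would likely need an allow-list or a human in the loop.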
Ross Anderson, a professor of computer security at Cambridge and co-author of the research, said that this means source code that appears innocuous to a human reviewer "can actually do something nasty".
So projects like Linux and WebKit, which accept contributions from random people, subject them to manual review, and then incorporate them into critical code, are exposed, because the rendered source code looks perfectly acceptable to the reviewer.
"If the change in logic is subtle enough to go undetected in subsequent testing, an adversary could introduce targeted vulnerabilities without being detected", he said.
Bidi override characters persist through the copy-and-paste functions on most modern browsers, editors, and operating systems.