Cache-timing attacks on AES.
[via: Schneier on Security]
Daniel J. Bernstein published a profound paper on AES Timing attack. There is very little new in the article from cryptology or cyptography point of view. The side-channel timing attacks of this nature are well known for decades. Moreover, the attacks are not against AES algorithm as such, but against a specific software implementation (OpenSSL on Pentium III).
However it's clear that it is applicable to any other modern CPU, and to any software implementation of any cryptographic algorithm which uses S-boxes. This and other things that I will point out later, led me to conclude that the article and its implications are truly profound. But first of all a quote from the article:
Was there some careless mistake in the OpenSSL implementation of AES? No. The problem lies in AES itself: it is extremely difficult to write constant-time high-speed AES software for common general-purpose CPUs. The underlying problem is that it is extremely difficult to load an array entry in time that does not depend on the entry's index. I chose to focus on OpenSSL because OpenSSL is one of today's most popular cryptographic toolkits. (Constant time low-speed AES software is fairly easy to write but is also unacceptable for many applications. Rijndael would obviously not have survived the fist round of AES competition if it had been implemented in this way.) Is AES the only cryptographic function vulnerable to this type of attack? No. The relevant feature of AES software, namely its heavy reliance upon S-boxes, is shared by cryptographic software for many other functions.
Emphasis is mine. If you don't know the particulars of clock-resolution performance of modern CPUs, just take his word for it. If you do, most likely you don't know half of it. Read the article for all the gory details. The bad news is that an attacker doesn't even need to understand any of the detils. He just need to know that time depends on index, and simply measure it.
Any software algorithm on any modern CPU which involves a table lookup, like S-boxes, for example, is subject to timing attacks. Since many algorithms, including AES, can be implemented without table lookups, it wouldn't be a terribly big deal, if it weren't for one fact: apparently cryptographers participating in AES standardization were completely unaware of the fact that table lookups are a vulnerability. Again quoting from the article: (note that  refers to "Report on the Development of the Advanced Encryption Standard (AES)" and  refers to "Resistance Against Implementation Attacks: A Comparative Study of the AES Proposals")
7 Errors in the AES standardization process
RAM cache hits can produce timing characteristics," Kocher wrote in 1996 in [16, Section 11], after explaining in detail how to extract secret keys from timing leaks in another operation. A decade later, we still have cryptographic software, in particular, AES libraries, whose timing obviously varies with, among other things, input-dependent RAM cache hits. How did this happen?
Was the National Institute of Standards and Technology unaware of timing attacks during the development of AES? No. In its Report on the development of the Advanced Encryption Standard," NIST spent several pages discussing side-channel attacks, specifically timing attacks and power attacks. It explicitly considered the difficulty of defending various operations against these attacks. For example, NIST stated in [19, Section 5.1.5] that MARS was difficult to defend" against these attacks.
Did NIST decide, after evaluating timing attacks, that those attacks were unimportant? No. Exactly the opposite occurred, as discussed below.
So what went wrong? Answer: NIST failed to recognize that table lookups do not take constant time. Table lookup: not vulnerable to timing attacks," NIST stated in [19, Section 3.6.2]. NIST's statement was, and is, incorrect.
NIST went on to consider the slowness of AES implementations designed to protect against side-channel attacks. For example, NIST stated that providing some defense" for MARS meant severe performance degradation." NIST stated in [19, Section 5.3.5] that Rijndael gained a major speed advantage over its competitors when such protections are considered." This statement was based directly on the incorrect notion that table lookups take constant time. NIST made the same comment in its summary assessments of the finalists," and again in its concluding paragraph explaining the selection of Rijndael as AES. See [19, Section 6.5] and [19, Section 7].
NIST was not the first to make these comments. In a paper Resistance against implementation attacks: a comparative study of the AES proposals," the Rijndael designers commented that a comparative study of the efficiency of the different algorithms on platforms that allow the described attacks should take into account the measures to be taken to thwart these attacks"; incorrectly stated that table lookup is not susceptible to a timing attack"; and concluded that Rijndael was favorable," i.e., relatively easy to secure." See [9, Section 5], [9, Section 3.3], and [9, Section 4.12].
In response to this paper, one of the Rijndael designers characterized cache-timing attacks as irrelevant for cryptographic design." This characterization is out of whack with what actually occurred in the design of the Advanced Encryption Standard. Cryptographic designers should and the AES designers did, at great length consider the difficulty of protecting against timing attacks. Where the AES designers erred was in asserting that table lookups take constant time.
It is hard to guess what might have happened if this error had been pointed out during the AES standardization process. None of the fifteen AES candidates provide high performance on all popular general-purpose CPUs|except by using instructions with input-dependent timings, notably S-box lookups.
As often happens, bugs and vulnerabilities often sneak in in the interface between systems and disciplines. Here it happened on the boundary of three disciplines: cryptography, software, and hardware. Out of 3 pair-wise boundaries, I can speak authoritatively about one: software developers don't understand performance of modern CPUs. It's just too expensive to learn. And by the time you do, the knowledge is obsolete. And I doubt that the other two boundaries are any better.