Current Change Malware

One of my favorite projects from grad school was working on change-in-current malware back in 2002. Nerds seem to really love the story, so here goes! It’s been a long while since I’ve worked on hardware, so please let me know in the comments if I’ve messed anything up.

Resonance

You may recall from high school physics a phenomenon known as resonance, where a force repeatedly applied at just the right frequency produces a surprisingly large response in certain systems. Think about how kicking your legs on a swing-set at precisely the right moment sends you higher and higher – that’s resonance. There are many other sometimes useful, sometimes hilarious examples of resonance in the world as well.

You may also recall from physics class that electrical circuits have a resonant frequency, and if you vary how much current goes through a circuit at the right frequency you can cause large swings in voltage.
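
For an idealized LC circuit, that resonant frequency is set by the inductance and capacitance. The numbers below are made up to give a plausible order of magnitude, not measurements of any real chip:

    f_res = 1 / (2π · √(L·C))
    e.g. L ≈ 0.5 nH, C ≈ 10 nF  →  f_res ≈ 1 / (2π · √(0.5e-9 · 10e-9)) ≈ 71 MHz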

“Hey…” you’re thinking to yourself, “computers are electrical circuits. What does this all mean for computers?” Well, a few things:

  • The max voltage chips can handle has generally been dropping over the years as transistors get smaller. Recommended max voltage on modern Intel processors tops out at 1.7ish volts for example (p. 168) and lower numbers are very common. Any voltage above that max can cause the circuits to malfunction or even break permanently. This is why static electricity can ruin electronics.

  • The inductance of chips has generally been on the rise. All other things being equal, larger inductance means that the same change in current causes a larger voltage swing. This is known as “the L dI/dt problem” in chip design (there’s a worked example just after this list).

  • To avoid dI/dt problems, engineers include Voltage Regulation Modules (VRMs) to keep the voltage steady when current changes (good discussion on the circuits to do that here). A VRM’s response when it detects a voltage change is limited to roughly 1 MHz, so decoupling capacitors and an Over-Voltage Protection (OVP) circuit are there to keep faster transients from completely blowing things up.

  • The resonant frequency of modern processors is usually in the 50-200 MHz range.

  • Modern clock rates are typically in the ballpark of 2.5 – 5 GHz.
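
To put a rough number on the L dI/dt point from the list (both values here are made up purely for illustration):

    V = L · dI/dt
    e.g. 2 nH of effective loop inductance and a 2 A current swing over 10 ns:
    V ≈ 2e-9 · (2 / 10e-9) = 0.4 V

That’s a sizeable fraction of a supply that tops out around a volt and a half, which is the whole problem.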

So let’s put this all together: take a hypothetical 3 GHz processor with a 100 MHz resonant frequency. If we constructed a program that alternates between drawing a lot of current and drawing very little every 15 cycles, we would hit that processor’s resonant frequency and cause the voltage to swing out of control. The VRM’s response is a couple of orders of magnitude too slow to kick in, so either the OVP would shut down the system or we might even permanently break the chip! Technology trends (higher inductance, smaller transistors, etc.) mean that this should become easier over time.
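
Spelling out that arithmetic:

    3 GHz / 100 MHz = 30 clock cycles per resonance period
    30 cycles per period = 15 cycles of high-draw code + 15 cycles of low-draw code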

Fun aside: there are examples of real programs unintentionally stumbling close to a chip’s resonant frequency, but I’m unaware of any other efforts to construct programs specifically to hit it and cause problems.

Programs to Draw Current

The first thing to ask is: do we really need a program to draw the current? Most (all?) modern chips let you control the voltage directly as a way to save power, but the latency of switching the voltage that way is far too high – on the order of 1 millisecond, which caps you somewhere below a kilohertz – so you’ll never be able to alter it fast enough to hit the resonant frequency. So, yes, we really need to write some assembly code to do this.

OK, so how might we do that? This is very, very dependent on the chip architecture. The general idea is that you want to flip as many/few of the transistors as possible for high/low current draw. There’s a lot of current draw in a chip you have no control over, like the chip’s clock tree and static power that just leaks through transistors. We want to focus on the current we can control – namely bit flips.

I’m going to show results from the Pentium III I was testing on at the time, so let’s break out our trusty architecture guide. I’ll focus the discussion on drawing the most current for just a moment, but we’ll get back to drawing very little current. Here are the limitations of that particular architecture.

The key takeaway is that in the steady state we can only push 3 instructions through the pipeline each cycle, and only 16 bytes of instruction fetch per cycle to feed them. They can’t just be any instructions either; this architecture has 5 execution “ports” that each handle only certain types of instructions, e.g., port 2 is only for loads and port 1 is the only one that can handle branches.

Where are the bits to flip? Here’s a die photo from the Pentium III (source).

The biggest blocks are the L2 cache and L1 caches (DCU/IFU), so there are a lot of bits there to flip. It’s also worth looking at the blocks which are more easily ‘flippable’ such as the execution (IEU/FEU) units. Those don’t have as many transistors, but it’s pretty straightforward to construct instructions that will flip a lot of them (e.g., adding -1 and 1) where it’s not necessarily as straightforward to flip a lot of bits in the register alias table (not important if you don’t know what that is). The SIMD unit has even more flippable bits, but those instructions are pretty big and we have that 16 bytes/cycle limitation in the fetch stage.
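
As a toy model of why adding -1 and 1 is a good choice: it forces a carry to ripple into every bit position of the adder. Real ALUs aren’t simple ripple adders, so treat the little C sketch below as intuition rather than a power model:

    /* which bit positions receive a carry when the adder computes a + b?             */
    /* for binary addition, carry_in_i = a_i xor b_i xor sum_i                        */
    #include <stdio.h>
    #include <stdint.h>

    static void show_carries(uint32_t a, uint32_t b) {
        uint32_t sum     = a + b;          /* wraps mod 2^32, which is fine here      */
        uint32_t carries = a ^ b ^ sum;    /* bit i set => a carry rippled into bit i */
        printf("0x%08x + 0x%08x: carries into %d bit positions\n",
               (unsigned)a, (unsigned)b, __builtin_popcount(carries));
    }

    int main(void) {
        show_carries(0xFFFFFFFFu, 1u);     /* -1 + 1: carries into all 31 upper bits  */
        show_carries(1u, 1u);              /* 1 + 1: a single carry, into bit 1       */
        return 0;
    }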

My partner and I started by trying to understand the current draw of individual instructions, constructing programs that were just very long loops of a single instruction, with an IPC of 1 where possible. Perf had just been released and was a godsend for making sure the programs were actually doing what we thought they were doing.
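
I no longer have the originals, but the single-instruction loops were shaped roughly like the sketch below (GCC inline assembly; the instruction under test, the unroll count, and the iteration count are all placeholders):

    /* run one instruction in a heavily unrolled loop so that fetch/branch overhead   */
    /* is negligible and the measured current is dominated by the instruction itself  */
    int main(void) {
        for (long i = 0; i < 400000000L; ++i) {   /* long enough to reach steady state */
            __asm__ volatile(
                ".rept 1024         \n\t"
                "addl $1, %%eax     \n\t"          /* <-- the instruction under test   */
                ".endr"
                : : : "eax", "cc");
        }
        return 0;
    }

On a modern Linux box you would sanity-check the IPC with something like perf stat -e cycles,instructions ./a.out.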

Measuring the actual current drawn by a program can be done directly with a power meter, but I was in college when I worked on this, so it was cheaper to just solder a resistor into the computer’s power supply cable and use a multimeter that was lying around. The programs have to run for a few minutes before you get a good measurement because motherboards have capacitors that act as charge reservoirs, so those need to be emptied before the current draw from the power cable reaches a steady state.

Results: loads and stores that hit in the L2 cache consumed the most current by far, a little over 4.9 Amps in the steady state. L1 hits were closer to 4.1 Amps. Arithmetic operations were mostly in the 3.4 – 3.8 Amp range no matter what we tried. We could generate roughly a 0.2 Amp difference by carefully choosing operands that maximize bit flips versus operands that minimize them. All these numbers might seem high, but remember these programs were designed for an IPC of 1 with no pipeline stalls, meaning the fetch, decode, etc. stages are working much harder than they would in a typical program.

Drawing very little current is actually trickier. The most effective way is to fire off a slow instruction, like a cache miss or a divide, that all following instructions need the result of. That stalls the entire pipeline waiting for the result, but has the downside that you can’t really alter the length of the stall much, so it won’t work if you need fine-grained control over exactly how many cycles you spend in low-current mode. Of all the instructions we tested, CLC seemed to draw the lowest current. One slightly counter-intuitive result: NOPs drew quite a bit of current. They have no dependencies and can issue on two execution ports on this architecture, so the chip can really chew through them quickly, keeping the non-execution portions of the pipeline very busy.
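
Here’s a sketch of that dependency-stall trick: a chain of divides where each one needs the previous quotient, so the back end mostly sits and waits on the divider. This is a reconstruction, not the original code; the starting value, divisor, and chain length are arbitrary, and CLC shows up in the full sketch in the next section.

    /* stall the pipeline on purpose: every divl needs the quotient of the one        */
    /* before it, so almost nothing downstream can issue while the divider grinds     */
    static void low_draw_stall(void) {
        __asm__ volatile(
            "movl $100000, %%eax  \n\t"
            "movl $3, %%ecx       \n\t"
            ".rept 8              \n\t"
            "xorl %%edx, %%edx    \n\t"   /* clear the high half so divl can't fault  */
            "divl %%ecx           \n\t"   /* eax = eax / 3; depends on the last divl  */
            ".endr"
            : : : "eax", "ecx", "edx", "cc");
    }

    int main(void) {
        for (long i = 0; i < 100000000L; ++i)
            low_draw_stall();
        return 0;
    }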

Constructing the Malware

Armed with that information we wrote a program that alternated high-draw instructions (a load that hits L2, an independent store that hits L2, and an add) with low-draw instructions (just a CLC). We actually wrote a program to write these programs since we wanted to try several different frequencies of high/low draw. I wish I could find the code, and if it turns up I’ll definitely post it.
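
I can’t reconstruct the generated code exactly, but the inner loop of one variant would have been shaped something like the sketch below. The buffer size, offsets, and repeat counts are placeholders (the real generator swept the counts to hit different frequencies, and spread accesses across cache lines within each burst), and it assumes GCC inline assembly on x86:

    /* one generated variant: a burst of high-draw work (L2-hitting load, independent */
    /* L2-hitting store, add) followed by a burst of low-draw work (CLC)              */
    #include <stdint.h>
    #include <stdlib.h>

    #define BUF_BYTES (128 * 1024)   /* bigger than the PIII's 16 KB L1 data cache,   */
                                     /* smaller than its 256 KB L2, so accesses that  */
                                     /* miss L1 still hit in L2                       */
    int main(void) {
        uint8_t *buf = malloc(BUF_BYTES);
        if (!buf) return 1;

        /* runs until you kill it */
        for (size_t off = 0; ; off = (off + 4096) % (BUF_BYTES / 2)) {
            const uint8_t *lp = buf + off;                    /* load target   */
            uint8_t       *sp = buf + BUF_BYTES / 2 + off;    /* store target  */
            __asm__ volatile(
                /* high-draw burst (counts were tuned per target frequency)    */
                ".rept 5                    \n\t"
                "movl (%0), %%eax           \n\t"
                "movl $0xAAAAAAAA, (%1)     \n\t"
                "addl $0x55555555, %%edx    \n\t"
                ".endr                      \n\t"
                /* low-draw burst: CLC was the lowest-draw instruction we found */
                ".rept 15                   \n\t"
                "clc                        \n\t"
                ".endr"
                :
                : "r"(lp), "r"(sp)
                : "eax", "edx", "cc", "memory");
        }
    }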

The processor we were playing with had a 1 GHz clock, so we could alternate high/low draw at 500 MHz (1 high-draw cycle/1 low-draw cycle), 250 MHz (2 high/2 low), 166.7 MHz (3/3), etc. We tried everything from 500 MHz down to 20 MHz. Unfortunately, we weren’t able to cause any noticeable chip failures at the time. It’s possible that the 1 GHz clock didn’t provide enough granularity to hit the resonant frequency as precisely as necessary. It’s also possible that the difference between our high- and low-draw sections (roughly 2 Amps) was not large enough to cause noticeable voltage swings.

I’ve always wondered whether it might be possible on modern chips, though. Clock frequencies are much higher than they used to be, inductances are higher, chips draw more current at peak operation (which should make it easier to cause larger swings), and operating voltages are lower (which should make it easier to cause problems on-chip). I’d love to hear if you give this a shot and discover anything interesting.

2 responses to “Current Change Malware”

  1. nickblackatl

    i’m delighted to find the full story! i’ve repeated this to numerous people since 2010, but without all the details.

  2. nickblackatl

    by the way, you rejected SIMD instructions due to front end stalls in decode. i wonder how that equation changes once you throw the loop stream detector in there, and start working with decoded uops. instructions fed from the LSD don’t need to go through the frontend decoder, so unless you were counting on the flip activity there (which i’d think SIMD would more than make up for), it ought be a win?
