June 26, 2026 · 12 min read

Why Your PWM Glitches and How a Shadow Register Fixes It

systemverilog
pwm
fpga
digital-design
shadow-register
motor-control

Writing the duty cycle register mid-period causes either a runt pulse or a stretched pulse in the period where the write lands, and most counter-and-compare tutorials never mention it. I've seen this take down an H-bridge motor driver and introduce audible pops in a PWM DAC. In both cases the bug survived simulation because the testbench only drove static duty values. The fix is a shadow register that buffers the new duty until the counter wraps. On sky130 it costs 2.3× the cell area of the naive design (928 µm² versus 399 µm², 280 cells versus 128), but the shadow version actually runs faster: 787 MHz versus 741 MHz, most likely because its compare reads a registered duty value instead of a free input port. The only real cost is area, and even that is modest in absolute terms.

The standard PWM peripheral everyone reaches for is a free-running counter with a combinational compare. A WIDTH-bit counter increments every clock, wraps at 2^WIDTH, and the output is high whenever the count is below the duty register. That structure is exactly what you want — on an FPGA it maps to a handful of flip-flops and a fast comparator. The glitch mechanism isn't obvious from the RTL alone, so here's what that design does when the duty register changes mid-period:

// pwm_naive.sv — Naive PWM: duty cycle register updated immediately.
//
// PROBLEM: If duty_i changes while the counter is mid-period, the compare
// value changes under a running counter.  Two bad cases:
//
//  Case A: duty grows mid-period (64 → 192, i.e. 25% → 75%, WIDTH = 8).
//    Counter is at 100 (output already low because 100 >= 64).
//    New duty is 192, so 100 < 192 → output snaps HIGH again.
//    Result: a stretched period.  The output was high for counts 0..63,
//    rises a second time at count 100 and stays high through count 191,
//    so this period delivers 156 high counts split across two pulses.
//
//  Case B: duty shrinks mid-period (192 → 64, i.e. 75% → 25%).
//    Counter is at 100 (output HIGH because 100 < 192).
//    New duty is 64, so 100 >= 64 → output snaps LOW immediately.
//    Result: a runt pulse 100 counts wide, cut short of the 192 this
//    period started with and matching neither the old nor the new duty.
//
// Either glitch can destroy an H-bridge (momentary shoot-through current)
// or corrupt an audio PWM DAC (one wrong-width pulse = audible click).

module pwm_naive #(
    parameter int WIDTH = 8   // counter/duty width; PWM period = 2^WIDTH cycles
) (
    input  logic             clk,
    input  logic             rst_n,      // active-low synchronous reset
    input  logic [WIDTH-1:0] duty_i,     // desired duty (0 = 0%, 2^WIDTH-1 ≈ 100%)
    output logic             pwm_o
);

    logic [WIDTH-1:0] cnt;

    always_ff @(posedge clk) begin
        if (!rst_n) begin
            cnt   <= '0;
            pwm_o <= 1'b0;
        end else begin
            cnt   <= cnt + 1'b1;            // free-running, wraps at 2^WIDTH
            // Compare is registered on the same clock edge — duty_i feeds
            // directly into the active compare every cycle.  Any write to
            // duty_i takes effect immediately, even mid-period.
            pwm_o <= (cnt < duty_i) ? 1'b1 : 1'b0;
        end
    end

endmodule

The line that matters is pwm_o <= (cnt < duty_i). Every clock cycle, duty_i feeds directly into the comparison. The output is registered, which is correct for clean transitions, but the input to that compare — the duty value itself — is whatever the software wrote last, regardless of where the counter is in its period. That is the problem.

Walk through Case A, the stretched pulse. Suppose WIDTH=8, so the period is 256 counts. The current duty is 25% (duty_i = 64): the output is high for counts 0 through 63 and low for 64 through 255. A software interrupt fires at count 100 and writes 75% (duty_i = 192). On the very next rising edge, the compare evaluates 100 < 192, which is true, so the output snaps high again even though this period's pulse already ended at count 64. It stays high through count 191 and drops at 192. The load sees two pulses in one period, 64 counts and then 92 counts, for 156 high counts in total: not 64, not 192, but something in between that depends on exactly when the write landed. (Had the write landed at count 60, while the output was still high, the pulse would simply have stretched to 192 counts.) An H-bridge driven by that pulse could see the high-side switch stay on while the low side tries to turn on: shoot-through current, potentially destructive.
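A quick way to check this arithmetic without a simulator is a cycle-accurate Python model of pwm_naive. It is a sketch under stated assumptions, not the RTL: one loop iteration stands for one rising clock edge, the compare reads the pre-edge counter (as the nonblocking assignment does), reset has just released so cnt starts at 0, and simulate_naive, pulses, and write_cycle (the first edge at which duty_i carries the new value) are hypothetical names made up for this illustration.

```python
# Cycle-accurate sketch of pwm_naive (a behavioral model, not the RTL).
# One loop iteration is one rising clock edge; the compare reads the
# pre-edge cnt, as the nonblocking assignment in the always_ff block does.

def simulate_naive(duty0, new_duty, write_cycle, cycles, width=8):
    """Return pwm_o after each edge.

    duty_i equals duty0 before edge `write_cycle` and new_duty from then on.
    cnt is 0 before edge 0, so trace[e] is the compare result for count
    e mod 2**width.
    """
    mask = (1 << width) - 1
    cnt, trace = 0, []
    for edge in range(cycles):
        duty_i = new_duty if edge >= write_cycle else duty0
        trace.append(int(cnt < duty_i))  # pwm_o <= (cnt < duty_i)
        cnt = (cnt + 1) & mask           # cnt <= cnt + 1, wraps at 2**width
    return trace


def pulses(trace):
    """Widths, in clock cycles, of each run of consecutive 1s."""
    widths, run = [], 0
    for bit in trace:
        if bit:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


PERIOD = 256  # WIDTH = 8

# 25% (64) -> 75% (192), first seen at count 60 while the output is still high.
early = simulate_naive(64, 192, write_cycle=60, cycles=2 * PERIOD)
print(pulses(early[:PERIOD]))  # [192]: one pulse, stretched to the new duty

# Same write first seen at count 100, after the 64-count pulse has ended.
late = simulate_naive(64, 192, write_cycle=100, cycles=2 * PERIOD)
print(pulses(late[:PERIOD]))   # [64, 92]: a second pulse in the same period
print(sum(late[:PERIOD]))      # 156 high counts in this period
print(pulses(late[PERIOD:]))   # [192]: clean from the next period on
```

Sweeping write_cycle across the period moves the first period's total high time anywhere between the old 64 and the new 192 counts, which is why the glitch looks different every time it happens.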

Case B is the runt pulse, and it's worse because the failure is silent. Duty is 75% (192 counts high), the counter is at 100, and the output is high. Software writes 25% (duty_i = 64). The compare evaluates 100 < 64, which is false, so the output drops immediately and stays low for the remaining 156 counts of the period. The load sees a single 100-count pulse: shorter than the 192 this period started with, and equal to neither the old nor the new duty. Had the write landed before count 64, the pulse would simply have been the new 64-count duty arriving one period early, so whether you see a glitch at all depends on where in the period the write falls. Only one period is wrong, so a voltmeter reading the average duty cycle won't show it. But if you're driving a servo, the jitter is there, and if you're doing audio, there is a click. The RP2040_PWM library got a bug report about exactly this: a user was changing duty dynamically and seeing runt pulses, and the root cause was a software write landing mid-period.
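The same kind of Python model, redefined here so it runs on its own, shows how much Case B depends on write timing. It is again a hypothetical sketch with the same assumptions (cnt starts at 0, one iteration per clock edge, duty_i switching at write_cycle). It sweeps every count at which the 192 → 64 write could first be seen and records the width of that period's pulse.

```python
# Cycle-accurate sketch of pwm_naive, redefined so this block runs alone.
def simulate_naive(duty0, new_duty, write_cycle, cycles, width=8):
    """pwm_o after each edge; duty_i switches from duty0 to new_duty at
    edge `write_cycle`. cnt starts at 0, so trace[e] compares count
    e mod 2**width against duty_i."""
    mask = (1 << width) - 1
    cnt, trace = 0, []
    for edge in range(cycles):
        duty_i = new_duty if edge >= write_cycle else duty0
        trace.append(int(cnt < duty_i))  # registered compare on pre-edge cnt
        cnt = (cnt + 1) & mask
    return trace


PERIOD = 256

# Case B: 75% (192) -> 25% (64), first seen at count 100.
trace = simulate_naive(192, 64, write_cycle=100, cycles=2 * PERIOD)
print(sum(trace[:PERIOD]))  # 100: one pulse, matching neither 192 nor 64
print(sum(trace[PERIOD:]))  # 64: the following period is clean

# Width of that period's pulse for every count c at which the write could
# first be seen. A falling duty never re-raises the output, so each period
# holds one pulse and its width is just the sum.
widths = [sum(simulate_naive(192, 64, c, PERIOD)) for c in range(PERIOD)]
print(widths[60], widths[100], widths[200])  # 64 100 192
```

The sweep reduces to min(max(c, 64), 192) for a write first seen at count c. The pulse is one of the two legitimate widths only when the write lands before count 64 or after count 192; anywhere in between it is wrong.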

ZipCPU's "Reinventing PWM" article from 2017 states the fix plainly: update the compare only at counter == 0. STM32's general-purpose timers implement the same principle in hardware. With the OCxPE (output compare preload enable) bit set in TIMx_CCMRx, writes to a capture/compare register land in a preload register, and the preload value transfers to the active register (which ST calls the shadow register) only at the update event, which is the period boundary. The ARPE (auto-reload preload enable) bit in TIMx_CR1 does the same for the auto-reload register that sets the period. The SystemVerilog version does exactly this:

// pwm_shadow.sv — Glitch-free PWM with shadow (double-buffer) register.
//
// FIX: duty_i is written into a shadow register at any time.  The shadow
// value is only transferred into the active compare register at the exact
// moment the counter wraps to zero (end-of-period / period boundary).
//
// On that edge the registered output is guaranteed to be LOW: the compare
// reads cnt == all-ones, and (all-ones < active) is false for every
// WIDTH-bit value of active.  The transfer is glitch-safe by construction,
// and the new duty cycle takes effect cleanly on the very next period.
//
// AREA COST: two WIDTH-bit registers that pwm_naive lacks (shadow and
// active, 16 flip-flops at WIDTH=8), plus their load enables, reset logic,
// and the all-ones detect on cnt.  Measured on sky130 at WIDTH=8: 928.4 µm²
// and 280 cells, versus 399.1 µm² and 128 cells for pwm_naive.  Fmax rose
// from 741 to 787 MHz, most likely because the compare now reads a register
// instead of the duty_i input port.

module pwm_shadow #(
    parameter int WIDTH = 8   // counter/duty width; PWM period = 2^WIDTH cycles
) (
    input  logic             clk,
    input  logic             rst_n,      // active-low synchronous reset
    input  logic             duty_wr_i,  // pulse high for one cycle to latch new duty
    input  logic [WIDTH-1:0] duty_i,     // new duty to pre-load (captured in shadow)
    output logic             pwm_o
);

    logic [WIDTH-1:0] cnt;
    logic [WIDTH-1:0] shadow;   // written by software / controller at any time
    logic [WIDTH-1:0] active;   // only updated at period boundary (cnt == 0)

    // Shadow register: capture the new duty value whenever the caller asserts
    // duty_wr_i.  This write can happen at any point in the PWM period.
    always_ff @(posedge clk) begin
        if (!rst_n)
            shadow <= '0;
        else if (duty_wr_i)
            shadow <= duty_i;
    end

    // Counter and active register: at the period wrap (cnt transitions from
    // all-ones to zero on the *next* clock), latch shadow → active so the
    // new duty applies from the very first count of the new period.
    // cnt == '1 (all-ones) is the last count of the current period.
    always_ff @(posedge clk) begin
        if (!rst_n) begin
            cnt    <= '0;
            active <= '0;
            pwm_o  <= 1'b0;
        end else begin
            cnt <= cnt + 1'b1;

            // Latch shadow into active at end-of-period so it takes effect
            // starting at count 0 of the next period — guaranteed glitch-free.
            if (cnt == {WIDTH{1'b1}})
                active <= shadow;

            // Compare: output high while counter is below active duty value.
            pwm_o <= (cnt < active) ? 1'b1 : 1'b0;
        end
    end

endmodule

Three WIDTH-bit registers: cnt, shadow, and active. Software writes to shadow through duty_wr_i at any time, which is safe because shadow feeds nothing but active. At the last count of the period (cnt == {WIDTH{1'b1}}, all ones), shadow transfers into active. On that same edge the counter wraps to zero, so from the next edge on the compare runs against the new active value. The transfer is glitch-safe because of what happens on the cnt == all-ones edge, the one where active is updated. On that edge the compare evaluates ({WIDTH{1'b1}} < active), reading the pre-edge value of active. For any WIDTH-bit value of active, that comparison is false: no WIDTH-bit value is greater than all ones. So pwm_o is registered LOW on that edge, deterministically, whatever duty active holds. The new duty cycle takes effect cleanly on the very first count of the new period, with no partial pulses.
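To check the glitch-safety argument exhaustively rather than by inspection, here is a cycle-accurate Python model of pwm_shadow. The assumptions are mine, not the author's: simulate_shadow and pulses are hypothetical helpers, the model starts in steady state with shadow and active already holding the old duty, and duty_wr_i pulses for exactly one cycle at write_cycle. It tries every write position except the final count, in both directions of duty change.

```python
# Cycle-accurate sketch of pwm_shadow (a behavioral model, not the RTL).
# Every next_* value is computed from pre-edge state before anything is
# updated, mirroring the nonblocking assignments in both always_ff blocks.

def simulate_shadow(duty0, new_duty, write_cycle, cycles, width=8):
    """Return pwm_o after each edge; duty_wr_i is high only at write_cycle.

    Assumption: shadow and active already hold duty0 and cnt is 0 before
    edge 0, so trace[e] is the compare result for count e mod 2**width.
    """
    all_ones = (1 << width) - 1
    cnt, shadow, active = 0, duty0, duty0
    trace = []
    for edge in range(cycles):
        duty_wr_i = edge == write_cycle
        next_shadow = new_duty if duty_wr_i else shadow
        next_active = shadow if cnt == all_ones else active  # end-of-period load
        next_pwm = int(cnt < active)                         # pre-edge active
        cnt = (cnt + 1) & all_ones
        shadow, active = next_shadow, next_active
        trace.append(next_pwm)
    return trace


def pulses(trace):
    """Widths, in clock cycles, of each run of consecutive 1s."""
    widths, run = [], 0
    for bit in trace:
        if bit:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


PERIOD = 256
for old, new in [(64, 192), (192, 64)]:
    for c in range(PERIOD - 1):                  # write seen at counts 0..254
        trace = simulate_shadow(old, new, c, 2 * PERIOD)
        assert pulses(trace[:PERIOD]) == [old]   # current period untouched
        assert pulses(trace[PERIOD:]) == [new]   # new duty from count 0
        assert trace[PERIOD - 1] == 0            # low on the all-ones count
print("clean periods for every write position")
```

The final count is left out on purpose: a write on that exact cycle still produces clean periods, but with extra latency, as the edge case below explains.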

One edge case: what if duty_wr_i is asserted at the exact same cycle that cnt == all-ones? In SystemVerilog NBA semantics, both always_ff blocks read their inputs at the clock edge before any assignments propagate. So shadow captures duty_i — the new value — but active captures the pre-edge value of shadow, which is the old duty. The new duty lands in shadow on this edge and then transfers to active one full period later, at the next cnt==all-ones boundary. The result is still glitch-safe — active holds a valid duty value throughout — but the simultaneous write carries one extra period of latency: the fresh duty doesn't appear on the output until two period boundaries from now rather than one. If your controller fires duty_wr_i right at the end of a period and expects the next period to reflect the new value, it won't — it'll take one more period. Plan the write timing accordingly.
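The one-period penalty is easy to confirm with the same style of Python model, repeated here so the block is self-contained. It is a sketch: simulate_shadow and period_widths are hypothetical helpers, and the model starts in steady state with the old duty already loaded into shadow and active. It compares a write first seen at count 254 with one at count 255 and reports the high time of each of the next three periods.

```python
def simulate_shadow(duty0, new_duty, write_cycle, cycles, width=8):
    """Cycle-accurate sketch of pwm_shadow; duty_wr_i pulses at write_cycle.

    All next_* values read pre-edge state (nonblocking semantics). Assumes
    shadow == active == duty0 and cnt == 0 before edge 0.
    """
    all_ones = (1 << width) - 1
    cnt, shadow, active = 0, duty0, duty0
    trace = []
    for edge in range(cycles):
        next_shadow = new_duty if edge == write_cycle else shadow
        next_active = shadow if cnt == all_ones else active  # old shadow value
        trace.append(int(cnt < active))
        cnt = (cnt + 1) & all_ones
        shadow, active = next_shadow, next_active
    return trace


def period_widths(trace, period=256):
    """High cycles in each consecutive PWM period of the trace."""
    return [sum(trace[i:i + period]) for i in range(0, len(trace), period)]


PERIOD = 256
on_254 = simulate_shadow(64, 192, write_cycle=254, cycles=3 * PERIOD)
on_255 = simulate_shadow(64, 192, write_cycle=255, cycles=3 * PERIOD)
print(period_widths(on_254))  # [64, 192, 192]: new duty in the next period
print(period_widths(on_255))  # [64, 64, 192]: one extra period of latency
```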

The boundary cases for duty itself are worth understanding. Setting duty to zero makes the output always low: cnt < 0 is never true for an unsigned counter, so pwm_o never asserts, in both the naive and shadow designs. Setting duty to 2^WIDTH - 1 (all ones, 255 for WIDTH=8) makes the output high for counts 0 through 254 and low only for the single cycle at count 255, so neither design can produce a true 100% duty; at steady state the two behave identically. The dangerous case in the naive design is a write to all ones while the output is already low, anywhere from count 1 to 254: the output snaps high immediately, stays high to the end of the period, drops for that one cycle at count 255, and then stays high through the whole next period. Depending on the load, up to nearly two periods of almost unbroken high time is exactly the shoot-through scenario. The shadow design avoids it: the all-ones value sits in shadow until the period wraps, then becomes active cleanly at count 0.
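The boundary cases reduce to the single compare cnt < duty, which a few lines of Python can enumerate. This is a sketch, not the RTL: steady_period and simulate_naive are hypothetical helpers. The first models either design once a duty value is active; the second is a cycle-accurate model of pwm_naive with cnt starting at 0 and duty_i switching at write_cycle.

```python
WIDTH = 8
PERIOD = 1 << WIDTH


def steady_period(duty, width=WIDTH):
    """pwm_o for counts 0..2**width-1 once `duty` is the compare value."""
    return [int(cnt < duty) for cnt in range(1 << width)]


def simulate_naive(duty0, new_duty, write_cycle, cycles, width=WIDTH):
    """Cycle-accurate sketch of pwm_naive; duty_i switches at write_cycle."""
    mask = (1 << width) - 1
    cnt, trace = 0, []
    for edge in range(cycles):
        duty_i = new_duty if edge >= write_cycle else duty0
        trace.append(int(cnt < duty_i))  # registered compare, pre-edge cnt
        cnt = (cnt + 1) & mask
    return trace


print(sum(steady_period(0)))     # 0: duty 0 never drives the output high
full = steady_period(PERIOD - 1)
print(sum(full), full.index(0))  # 255 255: the only low cycle is count 255

# Naive design: 64 -> 255 written at count 100, after the 64-count pulse ended.
trace = simulate_naive(64, PERIOD - 1, write_cycle=100, cycles=2 * PERIOD)
low_after_write = [e for e in range(100, 2 * PERIOD) if trace[e] == 0]
print(low_after_write)           # [255, 511]: one-cycle gaps at each count 255
```

From the write onward, 410 of the next 412 cycles are high; the only breaks are the single-cycle lows at count 255.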

I synthesized both modules on sky130 standard cells at WIDTH=8:

Variant	Area (µm²)	Cells	Fmax (MHz)	Critical Path (ns)
pwm_naive	399.1	128	741	1.35
pwm_shadow	928.4	280	787	1.27

The shadow register is 2.3× the area of the naive design: 132.6% more area and 152 more cells, nowhere near the "one extra register, about 5% overhead" you might guess from the RTL. Counting flip-flops explains part of the gap. pwm_naive compares directly against the duty_i port, so its only registers are cnt and pwm_o, 9 flops at WIDTH=8. pwm_shadow adds two WIDTH-bit registers, shadow and active, for 25 flops in total, and every new bit also needs load-enable and synchronous-reset logic in front of it, plus the all-ones detect on cnt that drives active's enable. The 280-versus-128 cell count reflects that extra logic around the new state; it is not 8 flip-flops bolted onto an otherwise identical netlist.

The frequency side is what I didn't expect. The shadow design runs 6.3% faster: 787 MHz versus 741 MHz, from a critical path 0.08 ns shorter. The most likely explanation is the compare input. In pwm_shadow the compare reads active, which is a flip-flop output, so the tool sees a clean register-to-register path into the compare logic. In pwm_naive the compare reads duty_i, an input port, and static timing analysis has to budget an input delay for whatever external logic drives it, which lengthens the apparent path. Keeping the compare input fully registered removes that uncertainty, and in this case the tool closes timing tighter.
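The derived figures in the last two paragraphs follow from the table and from counting registers in the RTL. Here is a short Python check, assuming Fmax is simply the reciprocal of the reported critical path (the table's numbers are consistent with that); fmax_mhz and flops are hypothetical helper names, and nothing here is re-measured.

```python
# Values copied from the sky130, WIDTH=8 synthesis table.
naive = {"area_um2": 399.1, "cells": 128, "crit_ns": 1.35}
shadow = {"area_um2": 928.4, "cells": 280, "crit_ns": 1.27}


def fmax_mhz(crit_ns):
    """Fmax implied by a critical path given in nanoseconds, in MHz."""
    return 1000.0 / crit_ns


def flops(width, wide_regs):
    """Flip-flops in the RTL: `wide_regs` WIDTH-bit registers plus pwm_o."""
    return wide_regs * width + 1


ratio = shadow["area_um2"] / naive["area_um2"]
print(round(ratio, 2))                      # 2.33
print(round(100 * (ratio - 1), 1))          # 132.6 (% more area)
print(shadow["cells"] - naive["cells"])     # 152
print(round(fmax_mhz(naive["crit_ns"])), round(fmax_mhz(shadow["crit_ns"])))  # 741 787
print(round(100 * (naive["crit_ns"] / shadow["crit_ns"] - 1), 1))  # 6.3
print(flops(8, 1), flops(8, 3))             # 9 25: cnt only vs cnt, shadow, active
```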

I would ship pwm_shadow over pwm_naive for any application where duty is written dynamically, without hesitation. The area is real — 928 µm² versus 399 µm² is not nothing on a small ASIC — but the failure mode of pwm_naive is unbounded in terms of damage: one wrong-width pulse into a motor driver at an unfortunate moment, and you're debugging charred FETs. If you are absolutely area-constrained and your duty cycle is truly static after initialization, pwm_naive is fine. If duty changes at runtime, use the shadow register. The 6% Fmax improvement is a nice bonus but not the reason — correctness is the reason.

If you want to run the synthesis yourself and dig into the timing reports, try the design on Logicode — you can paste both modules and get the sky130 area and Fmax numbers without setting up a local toolchain.