· 15 min read
What Synthesis Actually Does to Your FSM State Encoding (and How to Override It)
- systemverilog
- fpga
- fsm
- synthesis
- rtl-design
- state-machine
I synthesized an 8-state packet-parser FSM twice — identical logic, two different encoding attributes. The one-hot variant came out at 495 MHz with a 2.02 ns critical path. The binary variant: 426 MHz, 2.35 ns. Same RTL, same sky130 standard-cell library, 16.3% apart in clock frequency. Area was 2059.5 µm² versus 2047.0 µm² — a 0.6% gap that doesn't matter. What does matter is that 0.33 ns on the critical path, because it traces entirely to decode logic that one-hot doesn't need. If you've been writing typedef enum logic [2:0] { IDLE, ..., ERROR } state_t and assuming your 3-bit values are what ends up in the netlist, they almost certainly aren't.
What Vivado actually does in AUTO mode: it infers your FSM from the RTL, looks at the state count, and if it's 32 or fewer states, rewrites the encoding as one-hot — regardless of what you put in your enum. You write ST_IDLE = 3'd0, you get ST_IDLE = 8'b0000_0001 in the synthesized netlist. The synthesis report has a line that says which encoding was actually applied; most engineers never look at it. Quartus does the same for FPGA targets and defaults to sequential (binary) for CPLD, where register budget works the other way.
Binary encoding packs N states into ceil(log2(N)) flip-flops — 3 FFs for 8 states — and every output or transition condition requires an N-bit comparator. More states means deeper decode logic on every arc that reads state. One-hot uses one flip-flop per state — 8 FFs for 8 states — so every state check reduces to a single-bit test, and the synthesis tool generates a direct AND gate per output arc instead of a comparator chain. Gray code uses log2(N) FFs like binary, but arranges them so adjacent states differ by exactly one bit. That last property sounds useful; I'll come back to why it almost never is.
The SystemVerilog control lever in Vivado is the fsm_encoding attribute, applied directly on the state signal declaration. The valid values are "one_hot", "sequential" (which gives you binary), "gray", "johnson", and "auto". For Quartus, it's (* syn_encoding = "one-hot" *) on the signal, or the project-level -fsm_encoding one_hot option. If you want tool-agnostic portability — write once, synthesize anywhere — the cleanest approach is to skip attributes entirely and declare explicit localparam [N-1:0] constants with the one-hot bit patterns baked in. The tool has nothing to override because the encoding is already in the constants themselves.
Here's the one-hot version of that packet parser with explicit localparam constants:
// fsm_onehot.sv
// Packet-parsing FSM: explicit one-hot encoding (state register width = N)
//
// Parses a simplified Ethernet-like frame:
// [PREAMBLE x7] [SFD] [DEST x6] [SRC x6] [LEN_HI] [LEN_LO] [PAYLOAD...] [FCS x4]
//
// 8 states — one flip-flop per state; exactly one bit is '1' at all times.
//
// Vendor synthesis directives:
// Vivado: (* fsm_encoding = "one_hot" *) on the state signal
// Quartus: use -fsm_encoding one_hot project option
//
// Compile: iverilog -g2012 -o /dev/null fsm_onehot.sv
module fsm_onehot #(
parameter int PAYLOAD_MAX = 1500
) (
input logic clk,
input logic rst_n,
input logic valid, // byte valid this cycle
input logic [7:0] data, // incoming byte
output logic frame_good, // pulses one cycle on valid frame end
output logic frame_err, // pulses one cycle on error detection
output logic [10:0] byte_count // running payload byte count
);
// -----------------------------------------------------------------------
// One-hot state vector: 8 bits, one per state.
// Each bit IS a state flip-flop; no binary decoder is ever needed.
// -----------------------------------------------------------------------
localparam logic [7:0] ST_IDLE = 8'b0000_0001;
localparam logic [7:0] ST_PREAMBLE = 8'b0000_0010;
localparam logic [7:0] ST_SFD = 8'b0000_0100;
localparam logic [7:0] ST_HEADER = 8'b0000_1000;
localparam logic [7:0] ST_LENGTH = 8'b0001_0000;
localparam logic [7:0] ST_PAYLOAD = 8'b0010_0000;
localparam logic [7:0] ST_FCS = 8'b0100_0000;
localparam logic [7:0] ST_ERROR = 8'b1000_0000;
localparam logic [15:0] MAX_LEN = PAYLOAD_MAX;
(* fsm_encoding = "one_hot" *)
logic [7:0] state, next_state;
logic [3:0] hdr_cnt; // bytes received in HEADER (0-11)
logic len_hi_seen; // have we latched LEN_HI?
logic [7:0] len_hi; // captured high byte of length
logic [7:0] len_lo; // captured low byte of length
logic [15:0] frame_len; // assembled length (combinational)
logic [10:0] frame_len_trunc; // lower 11 bits for payload comparison
logic [10:0] pay_cnt; // payload bytes received
logic [2:0] fcs_cnt; // FCS bytes received (0-3)
// Assemble length from two captured bytes
assign frame_len = {len_hi, len_lo};
assign frame_len_trunc = frame_len[10:0];
// -----------------------------------------------------------------------
// State register
// -----------------------------------------------------------------------
always_ff @(posedge clk or negedge rst_n) begin
if (!rst_n)
state <= ST_IDLE;
else
state <= next_state;
end
// -----------------------------------------------------------------------
// Field counters — kept in separate always_ff to avoid partial writes
// -----------------------------------------------------------------------
always_ff @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
hdr_cnt <= '0;
len_hi_seen <= 1'b0;
len_hi <= '0;
len_lo <= '0;
pay_cnt <= '0;
fcs_cnt <= '0;
end else if (valid) begin
if (state == ST_HEADER) begin
hdr_cnt <= (hdr_cnt == 4'd11) ? '0 : hdr_cnt + 4'h1;
end
if (state == ST_LENGTH) begin
if (!len_hi_seen) begin
len_hi <= data;
len_hi_seen <= 1'b1;
end else begin
len_lo <= data;
len_hi_seen <= 1'b0;
end
end
if (state == ST_PAYLOAD) begin
pay_cnt <= pay_cnt + 11'h1;
end
if (state == ST_FCS) begin
fcs_cnt <= fcs_cnt + 3'h1;
end
// Reset counters on state exit
if (next_state == ST_PREAMBLE || next_state == ST_IDLE) begin
hdr_cnt <= '0;
pay_cnt <= '0;
fcs_cnt <= '0;
end
end
end
// -----------------------------------------------------------------------
// Next-state combinational logic
//
// One-hot advantage: each condition checks a single state bit directly.
// A synthesis tool generates a simple AND gate per arc — no priority
// encoder, no binary decoder, no multi-level logic to decode the state.
// -----------------------------------------------------------------------
always_comb begin
next_state = state; // default: hold current state
if (valid) begin
if (state == ST_IDLE) begin
if (data == 8'h55)
next_state = ST_PREAMBLE;
end else if (state == ST_PREAMBLE) begin
if (data == 8'hD5)
next_state = ST_SFD;
else if (data != 8'h55)
next_state = ST_ERROR;
end else if (state == ST_SFD) begin
next_state = ST_HEADER; // SFD consumed; header follows
end else if (state == ST_HEADER) begin
if (hdr_cnt == 4'd11)
next_state = ST_LENGTH;
end else if (state == ST_LENGTH) begin
// Transition after second length byte; validate range
if (len_hi_seen) begin
if ({len_hi, data} <= MAX_LEN)
next_state = ST_PAYLOAD;
else
next_state = ST_ERROR;
end
end else if (state == ST_PAYLOAD) begin
if ((pay_cnt + 11'h1) >= frame_len_trunc)
next_state = ST_FCS;
end else if (state == ST_FCS) begin
if (fcs_cnt == 3'h3)
next_state = ST_IDLE;
end else if (state == ST_ERROR) begin
next_state = ST_ERROR; // latch until valid drops
end
end else begin
// Gap between frames: ERROR resets to IDLE
if (state == ST_ERROR)
next_state = ST_IDLE;
end
end
// -----------------------------------------------------------------------
// Outputs — fan directly from state bits; no decode logic needed
// -----------------------------------------------------------------------
assign byte_count = pay_cnt;
assign frame_good = valid && (state == ST_FCS) && (fcs_cnt == 3'h3);
assign frame_err = valid && (state == ST_ERROR);
endmodule
And the binary version — identical behavior, typedef enum logic [2:0], three flip-flops, a case statement that makes the decode cost visible in the structure of the code:
// fsm_binary.sv
// Packet-parsing FSM: binary (sequential) encoding — log2(N)-bit state register
//
// IDENTICAL logic to fsm_onehot.sv — same states, transitions, outputs.
// Only the state representation changes: 3 bits instead of 8.
//
// Parses a simplified Ethernet-like frame:
// [PREAMBLE x7] [SFD] [DEST x6] [SRC x6] [LEN_HI] [LEN_LO] [PAYLOAD...] [FCS x4]
//
// Binary encoding tradeoffs vs. one-hot:
// Fewer FFs: 3 instead of 8 (ceil(log2(8)) = 3).
// More decode logic: every state check requires a 3-bit comparator.
// Synthesis tools prefer this on ASICs (FFs are expensive vs. gates).
// Vivado AUTO mode encodes the same design as one-hot (FFs are free).
//
// Vendor synthesis directives:
// Vivado: (* fsm_encoding = "sequential" *) on the state signal
// Quartus: synthesis_fsm_encoding = sequential (project option)
//
// Compile: iverilog -g2012 -o /dev/null fsm_binary.sv
module fsm_binary #(
parameter int PAYLOAD_MAX = 1500
) (
input logic clk,
input logic rst_n,
input logic valid, // byte valid this cycle
input logic [7:0] data, // incoming byte
output logic frame_good, // pulses one cycle on valid frame end
output logic frame_err, // pulses one cycle on error detection
output logic [10:0] byte_count // running payload byte count
);
// -----------------------------------------------------------------------
// Binary state encoding: 3 bits for 8 states (ceil(log2(8)) = 3).
// Compare to fsm_onehot.sv: 8 FFs vs. 3 FFs, but every transition needs
// a 3-bit comparator — one extra logic level at minimum.
// -----------------------------------------------------------------------
typedef enum logic [2:0] {
ST_IDLE = 3'd0,
ST_PREAMBLE = 3'd1,
ST_SFD = 3'd2,
ST_HEADER = 3'd3,
ST_LENGTH = 3'd4,
ST_PAYLOAD = 3'd5,
ST_FCS = 3'd6,
ST_ERROR = 3'd7
} state_t;
(* fsm_encoding = "sequential" *)
state_t state, next_state;
localparam logic [15:0] MAX_LEN = PAYLOAD_MAX;
logic [3:0] hdr_cnt; // bytes received in HEADER (0-11)
logic len_hi_seen; // have we latched LEN_HI?
logic [7:0] len_hi; // captured high byte of length
logic [7:0] len_lo; // captured low byte of length
logic [15:0] frame_len; // assembled length (combinational)
logic [10:0] frame_len_trunc; // lower 11 bits for payload comparison
logic [10:0] pay_cnt; // payload bytes received
logic [2:0] fcs_cnt; // FCS bytes received (0-3)
// Assemble length from two captured bytes
assign frame_len = {len_hi, len_lo};
assign frame_len_trunc = frame_len[10:0];
// -----------------------------------------------------------------------
// State register
// -----------------------------------------------------------------------
always_ff @(posedge clk or negedge rst_n) begin
if (!rst_n)
state <= ST_IDLE;
else
state <= next_state;
end
// -----------------------------------------------------------------------
// Field counters — identical to fsm_onehot.sv
// -----------------------------------------------------------------------
always_ff @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
hdr_cnt <= '0;
len_hi_seen <= 1'b0;
len_hi <= '0;
len_lo <= '0;
pay_cnt <= '0;
fcs_cnt <= '0;
end else if (valid) begin
if (state == ST_HEADER) begin
hdr_cnt <= (hdr_cnt == 4'd11) ? '0 : hdr_cnt + 4'h1;
end
if (state == ST_LENGTH) begin
if (!len_hi_seen) begin
len_hi <= data;
len_hi_seen <= 1'b1;
end else begin
len_lo <= data;
len_hi_seen <= 1'b0;
end
end
if (state == ST_PAYLOAD) begin
pay_cnt <= pay_cnt + 11'h1;
end
if (state == ST_FCS) begin
fcs_cnt <= fcs_cnt + 3'h1;
end
if (next_state == ST_PREAMBLE || next_state == ST_IDLE) begin
hdr_cnt <= '0;
pay_cnt <= '0;
fcs_cnt <= '0;
end
end
end
// -----------------------------------------------------------------------
// Next-state combinational logic
//
// Binary decode cost: every state check is a 3-bit equality comparison.
// Synthesis must implement this as a priority-encoded decoder or a case
// statement, which adds at least one LUT level to every output path.
// For FPGAs with 4-6-input LUTs, 3 bits still fit in one LUT input —
// but the distinction grows as state count exceeds LUT input width.
// -----------------------------------------------------------------------
always_comb begin
next_state = state; // default: hold
if (valid) begin
case (state)
ST_IDLE: begin
if (data == 8'h55)
next_state = ST_PREAMBLE;
end
ST_PREAMBLE: begin
if (data == 8'hD5)
next_state = ST_SFD;
else if (data != 8'h55)
next_state = ST_ERROR;
end
ST_SFD: begin
next_state = ST_HEADER;
end
ST_HEADER: begin
if (hdr_cnt == 4'd11)
next_state = ST_LENGTH;
end
ST_LENGTH: begin
if (len_hi_seen) begin
if ({len_hi, data} <= MAX_LEN)
next_state = ST_PAYLOAD;
else
next_state = ST_ERROR;
end
end
ST_PAYLOAD: begin
if ((pay_cnt + 11'h1) >= frame_len_trunc)
next_state = ST_FCS;
end
ST_FCS: begin
if (fcs_cnt == 3'h3)
next_state = ST_IDLE;
end
ST_ERROR: begin
next_state = ST_ERROR;
end
default: next_state = ST_IDLE;
endcase
end else begin
if (state == ST_ERROR)
next_state = ST_IDLE;
end
end
// -----------------------------------------------------------------------
// Outputs — require a 3-bit decode for each output signal
// (vs. fsm_onehot.sv where outputs tap single state bits directly)
// -----------------------------------------------------------------------
assign byte_count = pay_cnt;
assign frame_good = valid && (state == ST_FCS) && (fcs_cnt == 3'h3);
assign frame_err = valid && (state == ST_ERROR);
endmodule
Both files compile clean with iverilog -g2012, and both were synthesized against sky130 standard cells. The measured numbers:
| Variant | Area (µm²) | Cells | fmax (MHz) | Critical path (ns) |
|---|---|---|---|---|
| One-hot | 2059.5 | 515 | 495 | 2.02 |
| Binary | 2047.0 | 501 | 426 | 2.35 |
One-hot uses 14 more cells — five extra flip-flops plus a handful of gates to handle the wider state vector. The critical path is 0.33 ns shorter, and that entire gap lives in the decode chain that the binary version has to traverse on every output arc. On area the difference is less than 1% and irrelevant for most designs. On frequency it's nearly 70 MHz.
I hit a real problem with fsm_encoding via an AMD Adaptive Support thread that cost us actual time: Vivado silently drops the attribute when mark_debug is also applied to the same state signal. The synthesis pass infers the FSM fine, but when the signal is also routed to an ILA debug core, the tool treats the two directives as conflicting and quietly ignores the encoding request. Vivado does emit an INFO message in the synthesis log, but nothing that surfaces at the console — a designer who isn't actively filtering the log output will miss it entirely. The effective behavior is a silent override: buried INFO, attribute discarded, and you find out when you look at the schematic or run report_fsm. We were convinced we'd forced one-hot and spent a while wondering why timing was soft before someone finally pulled the synthesis report. The fix is to apply mark_debug to decoded outputs or a shadow register, not the state variable itself. And check the synthesis report at least once: there's a line in the FSM summary that states which encoding was actually applied, and it's there every build.
Gray code gets more coverage in FSM encoding comparisons than it deserves. Like binary encoding it uses log2(N) flip-flops, but the state assignments are chosen so adjacent states differ by exactly one bit. The theory is that only one bit toggles per clock cycle, which minimizes switching activity and reduces dynamic power — and that logic is correct, but only when the FSM transitions are strictly sequential. The moment the machine branches — ST_IDLE → ST_PAYLOAD skipping intermediate states, or any backward arc — you're jumping between non-adjacent Gray-code states and multiple bits toggle anyway, negating the whole premise. A packet-parsing FSM like this one takes paths that jump around constantly: errors divert straight to ST_ERROR, timeouts loop back to ST_IDLE, the SFD byte collapses a multi-step preamble into a single transition. None of those are sequential hops. So in practice, Gray code gives you log2(N) FFs like binary, the same decode logic depth as binary, and none of the power reduction, because the switching pattern never stays sequential long enough to matter. The only FSMs where Gray code actually earns its keep are the rare designs that genuinely march forward through states in strict order with no branching — a simple ring counter, maybe a fixed-length shift register controller — and even then I'd want to measure the switching activity before locking in the encoding.
On FPGA, flip-flops are essentially free — they already exist inside every slice, sitting unused unless you put something in them. The register overhead of one-hot is not real area overhead on FPGA; it's just using resources that would otherwise be wasted. LUT depth is the actual cost, and one-hot eliminates a level of decode from every path that reads state. That's the full rationale behind Vivado AUTO choosing one-hot for FSMs under 32 states, and it's correct. Above 32 states, even Vivado switches to binary, because at 40 or 64 or 80 flip-flops the register cost of one-hot starts competing with the decode savings.
On ASIC the economics reverse, and the punchline is that binary wins almost every time. The reason: flip-flops are expensive in silicon. A standard-cell FF costs roughly 20 to 25 transistors, while a 2-input NAND gate is 4. Going from 3 FFs to 8 FFs for this 8-state FSM means 5 extra flip-flops — somewhere between 100 and 125 extra transistors in register area — whereas the extra decode gates for binary comparators add maybe 20 to 30 transistors total. The math is straightforward enough that ASIC synthesis flows default to binary on their own when AUTO is set; you don't have to tell them. The only case where I'd force one-hot on an ASIC is a specific timing emergency on the decode path where the area penalty is acceptable, and I'd want a timing report in front of me before making that call.
One edge case worth knowing: Vivado's FSM_SAFE_STATE = "auto_safe_state" attribute applies Hamming-3 encoding, which adds enough redundant state bits to detect and correct single-bit upsets. That's the right call for SEU-sensitive designs in radiation environments or functional-safety applications, and it's completely separate from the performance tradeoff above — it comes with area and power overhead, so it stays in the toolbox for when you actually need it.
My recommendation: for any FPGA design under 32 states, leave the tool in AUTO and spend your time elsewhere — Vivado will one-hot it, and the 495 MHz result above is exactly why that default exists. The only reason to override it is if you're sharing RTL between FPGA and ASIC targets, in which case the explicit localparam one-hot pattern from fsm_onehot.sv is the right escape hatch: no tool-specific attribute, correct behavior on FPGA, and the ASIC synthesis tool can still remap the encoding if instructed through its own flow. For a dedicated ASIC target, force "sequential" and let the tool optimize decode — that's the encoding that wins on silicon, and the numbers work out whether the state count is 8 or 80. Whatever you choose, open the synthesis report and find the FSM encoding summary before you sign off. It's there every build, one line, and it's the only confirmation that the tool actually did what you asked.
Want to run both files yourself and see the synthesis numbers on your own design? Try it on Logicode.