RISC-V - Lessons from My First Processor

Explore my hands-on journey of building a simple RISC-V processor and discover key lessons on design planning, thorough test bench verification, and efficient signal grouping with SystemVerilog.

Introduction

I have always believed that the best way to truly understand how something works is to build it yourself—even when following a tutorial. There’s nothing as enlightening as the hands-on experience of writing or creating a solution, whether it’s a WordPress blog or an HTTP server. The same holds true for processors. These complex digital devices hide many secrets and mechanisms that are not immediately obvious to users or security researchers.

A few months ago, I decided to challenge myself by building a very simple processor based on the RISC-V architecture, using the basic RV32I instruction set—the simplest variant of RISC-V.

My Project: ucrv32

Thus, my processor, ucrv32, was born. It features a five-stage instruction pipeline consisting of:

  • IF (Instruction Fetch)
  • ID (Instruction Decode)
  • EX (Execution)
  • MEM (Memory Access/Write)
  • WB (Write Back)

Its design is intentionally simple and far from perfect. After completing most of the project (the processor is not finished yet; a few data hazards remain unhandled), I realized there are many things I would have done differently and better. If you’re interested in the implementation details of this processor in SystemVerilog (a hardware description language), I invite you to check out the repository. This post serves as a kind of diary for me, recording lessons I want to pass on to my future self and to other readers.

Lesson 1 – Plan Your Design

I’ve always had a mindset of, “Why bother planning if nothing ever goes exactly as designed?” While it’s true that early in a project it’s hard to predict every obstacle that might affect the final outcome, sketching out your digital design on paper offers a crucial advantage: it lets you catch mistakes early and gives you a concrete plan to stick to.

Moreover, it’s much easier to introduce potential changes in the design phase and then implement them in code, rather than trying to rework the code directly—especially when dealing with numerous signals, multiple modules, and various stages of instruction processing. Consider the many signals that must flow between stages: the current PC (Program Counter, known as the Instruction Pointer in x86), the active instruction, and the control signals for the ALU, memory, and registers. For instance, signals generated during instruction decoding that are meant for the final Write Back stage must traverse the ID, EX, MEM, and finally WB stages.

Take this example: the Write Back stage needs to know the index of the register where the result should be stored, a signal produced by the controller during decoding. I overlooked this in the initial coding, and adding it later required changes across the ID, EX, MEM, and WB stages, each one a chance to introduce a connection error or drop a signal along the way.
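To make the cost of that oversight concrete, here is a minimal sketch of how a destination register index might be carried from decode to write back. The module and signal names (rd_pipeline, rd_id_i, rd_wb_o) are hypothetical and not taken from ucrv32; the point is only that each stage boundary needs its own copy of the signal.

```systemverilog
// Hypothetical sketch: forwarding the destination register index (rd)
// from the ID stage to the WB stage. Names are illustrative only.
module rd_pipeline (
  input  logic       clk_i,
  input  logic       n_rst,    // active-low asynchronous reset
  input  logic [4:0] rd_id_i,  // rd index produced during decode (ID)
  output logic [4:0] rd_wb_o   // rd index as seen by write back (WB)
);
  // Copies held in the ID/EX and EX/MEM pipeline registers.
  logic [4:0] rd_ex;
  logic [4:0] rd_mem;

  always_ff @(posedge clk_i or negedge n_rst) begin
    if (!n_rst) begin
      rd_ex   <= '0;
      rd_mem  <= '0;
      rd_wb_o <= '0;
    end else begin
      rd_ex   <= rd_id_i;  // ID/EX boundary
      rd_mem  <= rd_ex;    // EX/MEM boundary
      rd_wb_o <= rd_mem;   // MEM/WB boundary
    end
  end
endmodule
```

Forgetting a single signal like this means touching three pipeline registers and every port list in between, which is exactly the kind of change a paper sketch tends to expose before any code exists.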

Summary: A well-thought-out design plan is essential—it not only helps catch errors early but also prevents the ripple effect of modifications throughout the entire system.

Lesson 2 – Test Benches and Verification

I’ve always believed that rigorous testing is as crucial as solid design. Just as planning your design on paper helps catch mistakes early, crafting detailed test benches with proper assertions is key to ensuring your hardware behaves exactly as expected.

By writing extensive test benches, you can simulate your circuit both in isolation and during interactions with other modules. This approach not only verifies the core functionality of your design but also tests its communication with other systems—ensuring that every signal and state transition aligns with your expectations.

For instance, I use Verilator to convert my HDL code into C++, enabling tick-by-tick simulation of the hardware. This tool allows me to treat hardware like software, letting me write precise test cases that check each simulation cycle. With Verilator, I can assert that every component of my design performs correctly, which has proven invaluable for early error detection and efficient debugging.

I’ve always maintained that test benches should be written for every module individually. Yet, even though I believed in thorough testing, I assumed that a simple module wouldn’t require much scrutiny. As an example, during the development of my processor, I didn’t notice until the very end that my registers were constantly being reset.

I had written a simple piece of assembly code intended to store 5, 10, and 15 in register x1, one after another:

.section .text
.globl _start
_start:
    addi x1, x0, 5
    addi x1, x1, 5
    addi x1, x1, 5

When I checked the waveform, I observed that the register was perpetually holding the value 5. After about ten minutes of debugging, I finally discovered that the registers were handling the reset signal incorrectly. The reset is active-low, but the condition treated it as active-high:

  always_ff @(posedge clk_i or negedge n_rst) begin
    if (n_rst) begin

The correct implementation should be:

  always_ff @(posedge clk_i or negedge n_rst) begin
    if (!n_rst) begin

Even though I was committed to writing test benches for every aspect of my design, I assumed that such a simple module wouldn’t have issues. Yet, this experience proved that even the simplest designs can harbor hidden bugs.
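For illustration, here is a minimal sketch of the kind of test bench that would have caught this bug in seconds. The register module (reg32) and its ports are hypothetical stand-ins, not the actual ucrv32 register file, and the test bench assumes a simulator that supports timing delays.

```systemverilog
// Hypothetical 32-bit register with an active-low asynchronous reset.
module reg32 (
  input  logic        clk_i,
  input  logic        n_rst,
  input  logic        we_i,
  input  logic [31:0] d_i,
  output logic [31:0] q_o
);
  always_ff @(posedge clk_i or negedge n_rst) begin
    if (!n_rst)    q_o <= '0;   // reset only while n_rst is low
    else if (we_i) q_o <= d_i;  // otherwise capture data on write enable
  end
endmodule

// Self-checking test bench: write, hold, then reset.
module reg32_tb;
  logic        clk_i = 1'b0;
  logic        n_rst = 1'b0;    // start with reset asserted
  logic        we_i  = 1'b0;
  logic [31:0] d_i   = '0;
  logic [31:0] q_o;

  reg32 dut (.*);

  always #5 clk_i = ~clk_i;     // 10 time-unit clock period

  initial begin
    #12 n_rst = 1'b1;           // release reset between clock edges
    we_i = 1'b1;
    d_i  = 32'd5;
    @(posedge clk_i); #1;
    // With the inverted condition `if (n_rst)`, this check fails.
    assert (q_o == 32'd5) else $fatal(1, "write lost: q_o=%0d", q_o);

    we_i = 1'b0;
    @(posedge clk_i); #1;
    assert (q_o == 32'd5) else $fatal(1, "value not held: q_o=%0d", q_o);

    n_rst = 1'b0; #1;           // asynchronous reset clears the register
    assert (q_o == 32'd0) else $fatal(1, "reset failed: q_o=%0d", q_o);

    $display("reg32_tb passed");
    $finish;
  end
endmodule
```

The first assertion is the important one: it checks that a value written while reset is released is actually stored, which is precisely the behavior the inverted condition breaks.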

Summary: Always write test benches for each module—even for those that seem trivial. They can help uncover subtle issues early, saving time and frustration down the line.

Lesson 3 – Group Signals

SystemVerilog allows you to create interfaces that bundle related signals for connecting two modules. This means you can group all the necessary connections into a single, manageable block. For instance, here’s an interface I created to connect a processor with RAM:

interface ram_interface #(
  parameter DATA_WIDTH = 32,
  parameter ADDR_WIDTH = 32
)(
  input logic clk_i_a,
  input logic clk_i_b
);
  logic [3:0] we_i_a;
  logic en_i_a;
  logic [DATA_WIDTH-1:0] data_i_a;
  logic [DATA_WIDTH-1:0] data_o_a;
  logic [ADDR_WIDTH-1:0] addr_i_a;
  [...]

  modport master (
    input clk_i_a,
    output we_i_a,
    output en_i_a,
    output data_i_a,
    input data_o_a,
    output addr_i_a,
    [...]
  );

  modport slave (
    input clk_i_a,
    input we_i_a,
    input en_i_a,
    input data_i_a,
    output data_o_a,
    input addr_i_a,
    [...]
  );
endinterface

Here’s a brief overview of the ports:

  • Clock Inputs: clk_i_a and clk_i_b synchronize operations for each port.
  • Data Signals: data_i and data_o handle data transfers.
  • Control Signals: we and en manage write operations and enable the ports.
  • Address Lines: addr_i specifies the memory location to access.

The modports define the roles for the connected modules:

  • The master modport drives outputs (like write enables and addresses) and receives inputs (like data outputs).
  • The slave modport does the opposite.

While I originally developed this interface for RAM, the same approach is ideal for connecting pipeline stages and registers. Grouping signals like this reduces wiring complexity and helps prevent errors—a benefit that addresses issues mentioned in Lesson 1.

There’s just one issue: Verilator doesn’t play well with interfaces, so you need to flatten them into plain signals at the top-level module first.
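As a rough sketch of how the same idea could apply to a pipeline boundary, the example below bundles a few ID-to-EX signals into one interface while keeping plain ports on the top-level module. All names here (id_ex_if, id_stage, ex_stage, core_top) are hypothetical and chosen for illustration; they are not the modules in ucrv32.

```systemverilog
// Hypothetical interface bundling signals that cross the ID/EX boundary.
interface id_ex_if;
  logic [31:0] pc;
  logic [31:0] instr;
  logic [4:0]  rd;

  modport producer (output pc, instr, rd);  // ID side drives the signals
  modport consumer (input  pc, instr, rd);  // EX side reads them
endinterface

// Decode side: registers the signals into the interface.
module id_stage (
  input logic        clk_i,
  input logic [31:0] pc_i,
  input logic [31:0] instr_i,
  id_ex_if.producer  out
);
  always_ff @(posedge clk_i) begin
    out.pc    <= pc_i;
    out.instr <= instr_i;
    out.rd    <= instr_i[11:7];  // rd field of an RV32I instruction
  end
endmodule

// Execute side: only reads from the interface.
module ex_stage (
  id_ex_if.consumer  in,
  output logic [4:0] rd_o
);
  assign rd_o = in.rd;
endmodule

// Top level uses plain ports; the interface is instantiated inside.
module core_top (
  input  logic        clk_i,
  input  logic [31:0] pc_i,
  input  logic [31:0] instr_i,
  output logic [4:0]  rd_o
);
  id_ex_if id_ex ();

  id_stage u_id (.clk_i(clk_i), .pc_i(pc_i), .instr_i(instr_i), .out(id_ex));
  ex_stage u_ex (.in(id_ex), .rd_o(rd_o));
endmodule
```

With this layout, adding a new signal between the two stages means editing the interface and the two modules that actually use it, while the instantiation in core_top stays untouched.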

Summary: Grouping signals into interfaces not only simplifies design modifications but also enhances overall system organization and reliability.

Summary

In this article, I shared my hands-on journey of building a simple RISC-V processor, ucrv32, using the RV32I instruction set. Through this project, I learned invaluable lessons that extend beyond this specific design:

  • Plan Your Design: Early planning and sketching your design on paper help catch potential errors before they propagate through complex modules. This approach minimizes the risk of cascading issues across multiple pipeline stages.
  • Test Bench and Verification: Writing detailed test benches for each module is critical. As demonstrated by a subtle bug in my register reset logic, thorough testing—even for seemingly simple modules—can save significant debugging time. Tools like Verilator enable tick-by-tick simulation, allowing for precise verification of hardware behavior.
  • Group Signals: Using SystemVerilog interfaces to group related signals simplifies connections between modules, such as between a processor and RAM. This not only reduces wiring complexity but also enhances system organization and makes future modifications easier.

Overall, these lessons underscore the importance of meticulous design, rigorous testing, and smart signal organization in digital hardware projects.

This post is licensed under CC BY 4.0 by the author.