EVM Through Huff

Intro

Huff is a low-level programming language designed for developing highly optimized smart contracts that run on the Ethereum Virtual Machine (EVM). Huff does not hide the inner workings of the EVM and instead exposes its programming stack to the developer for manual manipulation.

The Aztec Protocol (opens new window)team originally created Huff to write Weierstrudel, an on-chain elliptical curve arithmetic library that requires incredibly optimized code that neither Solidity nor Yul could provide.

While EVM experts can use Huff to write highly-efficient smart contracts for use in production, it can also serve as a way for beginners to learn more about the EVM.

If you're looking for an in-depth guide on how to write and understand Huff, check out the tutorials.

Add Two

Sample code:

#define function addTwo(uint256,uint256) view returns(uint256)

#define macro MAIN() = takes(0) returns(0) {

    // Get the function selector
    0x00
    calldataload
    0xE0
    shr

    // Jump to the implementation of the ADD_TWO function if the calldata matches the function selector
    __FUNC_SIG(addTwo) eq addTwo jumpi

    addTwo:
        ADD_TWO()
}

#define macro ADD_TWO() = takes(0) returns(0) {
    0x04 calldataload     // load first 32 bytes onto the stack - number 1
    0x24 calldataload     // load second 32 bytes onto the stack - number 2
    add                   // add number 1 and 2 and put the result onto the stack

    0x00 mstore           // place the result in memory
    0x20 0x00 return      // return the result
}

Define function ABI:

#define function addTwo(uint256,uint256) view returns(uint256)

Define the "main function" (macro), specifying it will take 0 things from the stack and return 0 thing back to the stack:

#define macro MAIN() = takes(0) returns(0) {
    ...
}

In other words, when entering the contract the stack will be empty. Upon completion we will not be leaving anything on the stack; therefore, takes() and returns() will both be 0.

Get function selector:

    // Get the function selector
    0x00
    calldataload
    0xE0
    shr

Here 0xE0 = 224 = 256bit - 32bit = 32byte - 4byte. This right shift is meant to extract the first 4 bytes of the calldata, which is the function selector.

The expression 0x00 calldataload 0xE0 shr is Huff's standard way of extracting function selector from calldata. You can just memorize it and use it in your project as a convention.

Jump to addTwo if the function selector matches:

    // Jump to the implementation of the ADD_TWO function if the calldata matches the function selector
    __FUNC_SIG(addTwo) eq addTwo jumpi

    addTwo:
        ADD_TWO()

__FUNC_SIG() is a Huff built-in function for computing function signature (function selector).

The actual implementation of function ADD_TWO():

#define macro ADD_TWO() = takes(0) returns(0) {
    0x04 calldataload     // load first 32 bytes onto the stack - number 1
    0x24 calldataload     // load second 32 bytes onto the stack - number 2
    add                   // add number 1 and 2 and put the result onto the stack

    0x00 mstore           // place the result in memory
    0x20 0x00 return      // return the result
}

calldataload starts from 0x04 since the first 4 bytes is function selector and that part is skipped. The actual parameters of ADD_TWO() starts from the 5th byte.

Hello World

Sample code:

#define macro MAIN() = takes(0) returns(0) {
    // store dynamic offset of 0x20 at 0x00
    0x20                                     // [0x20]
    0x00                                     // [0x00, 0x20]
    mstore                                   // []

    // store string length of 0x0d at 0x20
    0x0d                                     // [0x0d]
    0x20                                     // [0x20, 0x0d]
    mstore                                   // []

    // store bytes for "Hello, world!" at 0x40
    __RIGHTPAD(0x48656c6c6f2c20776f726c6421) // ["Hello, world!"]
    0x40                                     // [0x40, "Hello, world!"]
    mstore                                   // []

    // return full 96 byte value
    0x60                                     // [0x60]
    0x00                                     // [0x00, 0x60]
    return                                   // []
}

As strings are dynamic types it is not as simple as returning the UTF-8 values for "Hello, world!" (0x48656c6c6f2c20776f726c6421). In the ABI standard, dynamic types are encoded in 3 parts, each which takes a full word (32 bytes) of memory:

  1. Offset in memory (a pointer) -> left padded

  2. Length of the string -> left padded

  3. The actual content of the string -> right padded

Suppose we are working with the string "Hello, world!", then the memory will be looking like:

Memory loc      Data
0x00            0000000000000000000000000000000000000000000000000000000000000020 // Offset (pointer)
0x20            000000000000000000000000000000000000000000000000000000000000000d // Length
0x40            48656c6c6f2c20776f726c642100000000000000000000000000000000000000 // "Hello, world!" in hex

Once you understand this construction, the main macro code is self-explanatory.

Moving one step further, there is a way to merge this 3-part construction into 2. This method is called the "Seaport method". Recall that in the "normal method" we have left padded, left padded, and right padded. This means the 2nd and the 3rd entry have adjacent non-zero data. The Seaport method combine the 2nd and the 3rd entry into a single left padded entry.

For example, suppose we are working with the string "TKN". Pictorially:

Simple Storage

Sample code:

// Interface
#define function setValue(uint256) nonpayable returns ()
#define function getValue() nonpayable returns (uint256)

// Storage
#define constant VALUE = FREE_STORAGE_POINTER()

// External function macros

// setValue(uint256)
#define macro SET_VALUE() = takes(0) returns(0) {
    // Read uint256 from calldata, remember to read from byte 4 to allow for the function selector! 
    0x04            // [0x04]
    calldataload    // [value]

    // Get pointer and store
    [VALUE]         // [value_ptr, value]
    sstore          // []
}

// getValue()
#define macro GET_VALUE() = takes(0) returns(0) {
    // Read uint256 from storage
    [VALUE]         // [value_ptr]
    sload           // [value]

    // Store the return value in memory
    0x00            // [0x00, value]
    mstore          // []

    // Return the first 32 bytes of memory containing our uint256
    0x20            // [0x20]
    0x00            // [0x00, 0x20]
    return          // []
}

// Main
#define macro MAIN() = takes(0) returns(0) {
    // Get the function selector
    0x00 calldataload 0xe0 shr

    dup1 __FUNC_SIG(setValue) eq setValue jumpi // Compare function selector to setValue(uint256)
    dup1 __FUNC_SIG(getValue) eq getValue jumpi // Compare the function selector to getValue()

    // dispatch
    setValue:
        SET_VALUE()
    getValue:
        GET_VALUE()

    0x00 0x00 revert
}

Huff implements the FREE_STORAGE_POINTER() keyword for us to keep track of storage slots. For example:

#define constant STORAGE_SLOT0 = FREE_STORAGE_POINTER()
#define constant STORAGE_SLOT1 = FREE_STORAGE_POINTER()
#define constant STORAGE_SLOT2 = FREE_STORAGE_POINTER()

Later on we can use STORAGE_SLOT0, STORAGE_SLOT1, and STORAGE_SLOT2 to refer to different storage slots.

SET_VALUE() macro:

// setValue(uint256)
#define macro SET_VALUE() = takes(0) returns(0) {
    // Read uint256 from calldata, remember to read from byte 4 to allow for the function selector! 
    0x04            // [0x04]
    calldataload    // [value]

    // Get pointer and store
    [VALUE]         // [value_ptr, value]
    sstore          // []
}

The square bracket is the "reference" operator. It means get the address of storage slot VALUE.

GET_VALUE() macro:

// getValue()
#define macro GET_VALUE() = takes(0) returns(0) {
    // Read uint256 from storage
    [VALUE]         // [value_ptr]
    sload           // [value]

    // Store the return value in memory
    0x00            // [0x00, value]
    mstore          // []

    // Return the first 32 bytes of memory containing our uint256
    0x20            // [0x20]
    0x00            // [0x00, 0x20]
    return          // []
}

Here we take a thing from storage, put it into memory and return it.

MAIN macro:

// Main
#define macro MAIN() = takes(0) returns(0) {
    // Get the function selector
    0x00 calldataload 0xe0 shr

    dup1 __FUNC_SIG(setValue) eq setValue jumpi // Compare function selector to setValue(uint256)
    dup1 __FUNC_SIG(getValue) eq getValue jumpi // Compare the function selector to getValue()

    // dispatch
    setValue:
        SET_VALUE()
    getValue:
        GET_VALUE()

    0x00 0x00 revert
}

This is just a function dispatcher that tries to match SET_VALUE() or GET_VALUE() based on the calldata. If nothing matches it is going to revert.

Function Dispatching

Linear Dispatching

Sample code:

// Interface
#define function allowance(address,address) view returns (uint256)
#define function approve(address,uint256) nonpayable returns ()
#define function balanceOf(address) view returns (uint256)
#define function DOMAIN_SEPARATOR() view returns (bytes32)
#define function nonces(address) view returns (uint256)
#define function permit(address,address,uint256,uint256,uint8,bytes32,bytes32) nonpayable returns ()
#define function totalSupply() view returns (uint256)
#define function transfer(address,uint256) nonpayable returns ()
#define function transferFrom(address,address,uint256) nonpayable returns ()
#define function decimals() nonpayable returns (uint256)
#define function name() nonpayable returns (string)
#define function symbol() nonpayable returns (string)

// Function Dispatching
#define macro MAIN() = takes (1) returns (1) {
    // Identify which function is being called.
    0x00 calldataload 0xE0 shr          // [func_sig]

    dup1 __FUNC_SIG(permit)             eq permitJump           jumpi
    dup1 __FUNC_SIG(nonces)             eq noncesJump           jumpi

    dup1 __FUNC_SIG(name)               eq nameJump             jumpi
    dup1 __FUNC_SIG(symbol)             eq symbolJump           jumpi
    dup1 __FUNC_SIG(decimals)           eq decimalsJump         jumpi
    dup1 __FUNC_SIG(DOMAIN_SEPARATOR)   eq domainSeparatorJump  jumpi

    dup1 __FUNC_SIG(totalSupply)        eq totalSupplyJump      jumpi
    dup1 __FUNC_SIG(balanceOf)          eq balanceOfJump        jumpi
    dup1 __FUNC_SIG(allowance)          eq allowanceJump        jumpi

    dup1 __FUNC_SIG(transfer)           eq transferJump         jumpi
    dup1 __FUNC_SIG(transferFrom)       eq transferFromJump     jumpi
    dup1 __FUNC_SIG(approve)            eq approveJump          jumpi

    // Revert if no match is found.
    0x00 dup1 revert

    allowanceJump:
        ALLOWANCE()
    approveJump:
        APPROVE()
    balanceOfJump:
        BALANCE_OF()
    decimalsJump:
        DECIMALS()
    domainSeparatorJump:
        DOMAIN_SEPARATOR()
    nameJump:
        NAME()
    noncesJump:
        NONCES()
    permitJump:
        PERMIT()
    symbolJump:
        SYMBOL()
    totalSupplyJump:
        TOTAL_SUPPLY()
    transferFromJump:
        TRANSFER_FROM()
    transferJump:
        TRANSFER()
}

This is basically a large jump table that redirects the control flow to each function if the function selector in calldata matches one of the functions.

One important thing to note is the following line of code:

    // Revert if no match is found.
    0x00 dup1 revert

The idea is similar to switch statements in C: if you don't add break; between each case, then all the code after that line will be executed line by line. Without 0x00 dup1 revert, all the macros will be executed until a return condition is found.

Linear dispatching seems naive, however this is exactly how Vyper and Solidity* implement linear dispatching. If you want it to be cheaper to call, just move it higher up in the contract!

* Solidity only implements this method if there are less than 4 functions in a contract.

Binary Search Dispatching

Sample code:

// Define Interface
#define function allowance(address,address) view returns (uint256)
#define function approve(address,uint256) nonpayable returns ()
#define function balanceOf(address) view returns (uint256)
#define function DOMAIN_SEPARATOR() view returns (bytes32)
#define function nonces(address) view returns (uint256)
#define function permit(address,address,uint256,uint256,uint8,bytes32,bytes32) nonpayable returns ()
#define function totalSupply() view returns (uint256)
#define function transfer(address,uint256) nonpayable returns ()
#define function transferFrom(address,address,uint256) nonpayable returns ()
#define function decimals() nonpayable returns (uint256)
#define function name() nonpayable returns (string)
#define function symbol() nonpayable returns (string)

// Function Dispatching
#define macro MAIN() = takes (1) returns (1) {
    // Identify which function is being called.
    // [func sig]
    0x00 calldataload 0xE0 shr

    // The function selector of the pivot (number of selectors / 2)
    dup1 __FUNC_SIG(balanceOf) lt pivot0 jumpi

        // pivot 2
        dup1 __FUNC_SIG(totalSupply) lt pivot00 jumpi

            // 1
            dup1 __FUNC_SIG(name)               eq nameJump             jumpi

            // 2
            dup1 __FUNC_SIG(approve)            eq approveJump          jumpi

            // 3
            dup1 __FUNC_SIG(totalSupply)        eq totalSupplyJump      jumpi

            not_found jump

        pivot00:

            // 4
            dup1 __FUNC_SIG(transferFrom)       eq transferFromJump     jumpi

            // 5
            dup1 __FUNC_SIG(decimals)           eq decimalsJump         jumpi

            // 6
            dup1 __FUNC_SIG(DOMAIN_SEPARATOR)   eq domainSeparatorJump  jumpi

            not_found jump

    pivot0:

        dup1 __FUNC_SIG(symbol) lt pivot11 jumpi


            // 7
            dup1 __FUNC_SIG(balanceOf)          eq balanceOfJump        jumpi

            // 8
            dup1 __FUNC_SIG(nonces)             eq noncesJump           jumpi

            // 9
            dup1 __FUNC_SIG(symbol)             eq symbolJump           jumpi

            not_found jump

        pivot11:

            // 10
            dup1 __FUNC_SIG(transfer)           eq transferJump         jumpi

            // 11
            dup1  __FUNC_SIG(permit)             eq permitJump           jumpi

            // 12
            dup1 __FUNC_SIG(allowance)          eq allowanceJump        jumpi

    not_found:

    // Revert if no match is found.
    0x00 dup1 revert

    allowanceJump:
        ALLOWANCE()
    approveJump:
        APPROVE()
    balanceOfJump:
        BALANCE_OF()
    decimalsJump:
        DECIMALS()
    domainSeparatorJump:
        DOMAIN_SEPARATOR()
    nameJump:
        NAME()
    noncesJump:
        NONCES()
    permitJump:
        PERMIT()
    symbolJump:
        SYMBOL()
    totalSupplyJump:
        TOTAL_SUPPLY()
    transferFromJump:
        TRANSFER_FROM()
    transferJump:
        TRANSFER()
}

The idea is dividing the jump table into several parts using "pivots". Binary search dispatching is great when you have many many functions in the contract. Otherwise, stick to linear dispatching since it is a lot easier to implement.

Fallback and Receive Functions

Fallback function

Suppose we are implementing a fallback function that returns 1:

#define macro FALLBACK() = {
    0x01 0x00 mstore
    0x20 0x00 return
}

In the MAIN macro, this fallback function should be inserted at the end of all lookups. For example:

#define macro MAIN() = takes (1) returns (1) {
    // Identify which function is being called.
    // [func sig]
    0x00 calldataload 0xE0 shr

    dup1 __FUNC_SIG(permit)             eq permitJump           jumpi

    ...

    dup1 __FUNC_SIG(approve)            eq approveJump          jumpi

    FALLBACK()

   permitJump:
        PERMIT()

    ...

    approveJump:
        APPROVE()
}

Receive function

If you want to implement receive function on top of the fallback function, do a callvalue check before FALLBACK():

#define macro MAIN() = takes (1) returns (1) {
    // Identify which function is being called.
    // [func sig]
    0x00 calldataload 0xE0 shr

    dup1 __FUNC_SIG(permit)             eq permitJump           jumpi

    ...

    dup1 __FUNC_SIG(approve)            eq approveJump          jumpi

    # Jump into the receive function if msg.value is not zero
    callvalue receive jumpi

    FALLBACK()

    receive:
        RECEIVE()

    permitJump:
        PERMIT()

    ...

    approveJump:
        APPROVE()
}

foundry-huff

If you have an existing Foundry project, you can simply install the necessary dependencies by running:

forge install huff-language/foundry-huff

You also must add the following line to your foundry.toml file to ensure that the foundry-huff library has access to your environment in order to compile the contract:

ffi = true

You can then use HuffDeployer contract to compile and deploy your Huff contracts for you using the deploy function. Here's a quick example:

import { HuffDeployer } from "foundry-huff/HuffDeployer.sol";

contract HuffDeploymentExample {
    function deploy() external returns(address) {
        return new HuffDeployer().deploy("MyContract");
    }
}

Extra Mile Challenge

In Devtooligan's presentation at Spearbit, he covered the "Collatz Puzzle" challenge in QuillCTF:

Here is my writeup for this challenge:

Last updated