# writing wasm as raw bytes

In this article, we'll make a wasm library that exports a single function by writing a simple helper program to generate the raw wasm bytes.

This article is the first in a series about creating a minimal self-hosted language that compiles to wasm.

If you want to write low-level wasm by hand, the text format ("wat") is much nicer for authoring programs. In my case, though, I am interested in writing a compile-to-wasm language, where the raw bytes are a more useful output target.

Outside of the webassembly spec itself (in the "binary format" section), I couldn't find any good info about how to hand-hack binary wasm at the byte level, which is why this article exists!

The wasm spec is very complete, but it's not at all obvious how to get started with making your own wasm binary output. And I had to do a lot of guesswork coming up with the demo that turned into this article, so hopefully future people reading this are saved a bit of trouble.

The generator program is written in javascript so it can run on this page at the bottom as a demo. However, it shouldn't be hard to port this concept to your favorite language.

# wasm file format overview

A .wasm file is a "wasm module". A wasm module starts with a 4-byte magic number ("\0asm") followed by a 4-byte little endian u32 version number.

In this article we'll use wasm v1, so the first 8 bytes in the file are then:

0x00 0x61 0x73 0x6d 0x01 0x00 0x00 0x00

or we can write this in a more readable way using an inline string for the "asm" literal:

0x00 "asm" 0x01 0x00 0x00 0x00

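After the magic number and wasm version comes a sequence of sections that runs to the end of the file. Every section is optional, but the standard sections must appear in a fixed order, which for the sections used in this article is increasing section id. You will get errors if you include a standard section more than once or out of order. Only custom sections (section id of 0x00) may repeat, and they may appear anywhere.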

Each section begins with a one-byte section id followed by a u32 byte length of the section contents (in uleb encoding, more on that soon), followed by the section contents as a byte array.
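The layout described so far (an 8-byte header, then a section id, a uleb length, and the contents for each section) can be walked with a short loop. The following is only a sketch for inspecting generated output: `readUleb` and `listSections` are hypothetical helper names that are not part of the generator in this article, and the sample input is just a header followed by one type section.

```javascript
// Hypothetical helper: read an unsigned LEB128 value starting at `pos`
// and return the value together with the next read position.
function readUleb(buf, pos) {
  let value = 0
  let shift = 0
  while (true) {
    const b = buf[pos++]
    value += (b & 0x7f) * 2 ** shift // multiply instead of << to stay unsigned
    shift += 7
    if ((b & 0x80) === 0) return { value, pos }
  }
}

// Hypothetical helper: check the 8-byte header, then list each section's
// id and content length without interpreting the contents.
function listSections(buf) {
  const header = [0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]
  for (let i = 0; i < header.length; i++) {
    if (buf[i] !== header[i]) throw new Error('not a wasm v1 module')
  }
  const sections = []
  let pos = header.length
  while (pos < buf.length) {
    const id = buf[pos++]
    const len = readUleb(buf, pos)
    sections.push({ id, size: len.value })
    pos = len.pos + len.value // skip over the section contents
  }
  return sections
}

// header followed by a single 6-byte type section (id 0x01)
const sample = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f,
])
console.log(JSON.stringify(listSections(sample))) // [{"id":1,"size":6}]
```

Because every section carries its own length, a reader like this can skip sections it does not understand, which is also what makes custom sections safe to add.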

The section names with their corresponding one-byte section id that we will be using are:

- type section (0x01)
- function section (0x03)
- export section (0x07)
- code section (0x0a)

# helper functions

Before we get around to generating raw wasm bytes, we will need some functions.

## leb128 encoding

The wasm spec uses a compact representation for integers throughout called LEB128. You can read more about LEB128 on wikipedia. The wikipedia article even includes C-like pseudocode and javascript implementations for both encoding and decoding (for this article we only need an encoder).

LEB128 can represent signed (sleb) and unsigned (uleb) integers.

In this article we are only concerned with 32-bit integers so these implementations may not be appropriate for larger integers.

Here are the sleb() and uleb() functions we will use, each returning an array of bytes:

function uleb(x) {
  let bytes = []
  if (x < 0) x = 0
  do {
    let b = x & 0x7f
    x >>>= 7 // unsigned shift: a signed >> never reaches 0 for x >= 2**31
    if (x != 0) b |= 0x80
    bytes.push(b)
  } while (x != 0)
  return bytes
}

function sleb(x) {
  let bytes = []
  let more = 1
  let neg = (x < 0)
  let size = 32 // 32 bit integer
  while (more) {
    let b = x & 0x7f
    x >>= 7
    if (neg) x |= (~0 << (size - 7))
    let sb = (b >> 6) & 1
    if ((x === 0 && sb === 0) || (x === -1 && sb === 1)) {
      more = 0
    } else {
      b |= 0x80
    }
    bytes.push(b)
  }
  return bytes
}

Values that fit in 7 bits (0 to 127 unsigned, -64 to 63 signed) take a single byte in LEB128, and larger magnitudes consume more bytes:

> uleb(50)
[ 50 ]
> uleb(3000)
[ 184, 23 ]
> sleb(-37)
[ 91 ]
> sleb(-50000)
[ 176, 249, 124 ]
> sleb(1337)
[ 185, 10 ]
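Although the generator only needs encoders, a decoder is a handy way to double-check outputs like the ones above. This is a minimal sketch limited to 32-bit values; `decodeUleb` and `decodeSleb` are hypothetical names introduced here for illustration, not functions used elsewhere in this article.

```javascript
// Hypothetical helper: decode an unsigned LEB128 byte array.
function decodeUleb(bytes) {
  let value = 0
  for (let i = 0; i < bytes.length; i++) {
    value += (bytes[i] & 0x7f) * 2 ** (7 * i) // low 7 bits are payload
    if ((bytes[i] & 0x80) === 0) break // high bit clear: last byte
  }
  return value
}

// Hypothetical helper: decode a signed LEB128 byte array (32-bit only).
function decodeSleb(bytes) {
  let value = 0
  let shift = 0
  for (let i = 0; i < bytes.length; i++) {
    value |= (bytes[i] & 0x7f) << shift
    shift += 7
    if ((bytes[i] & 0x80) === 0) {
      // bit 6 of the last byte is the sign bit: extend it up to 32 bits
      if ((bytes[i] & 0x40) !== 0 && shift < 32) value |= ~0 << shift
      break
    }
  }
  return value
}

console.log(decodeUleb([184, 23]))       // 3000
console.log(decodeSleb([91]))            // -37
console.log(decodeSleb([176, 249, 124])) // -50000
console.log(decodeSleb([185, 10]))       // 1337
```

Note how 91 decodes to -37 as a signed value but would be 91 as an unsigned one: the same bytes mean different things depending on which encoding the spec calls for at that position.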

## sizing and flattening

For convenience, our wasm generator will be built from nested arrays, byte integers, and strings. We will need two functions to turn that nested structure into a flat array of bytes: sizeof() and write().

sizeof() calculates the number of bytes for the nested structure in the flat byte array representation:

const te = new TextEncoder

function sizeof(p) {
  let size = 0
  if (typeof p === 'number') { // byte
    size += 1
  } else if (typeof p === 'string') {
    size += te.encode(p).length
  } else if (Array.isArray(p)) {
    for (let i = 0; i < p.length; i++) {
      size += sizeof(p[i])
    }
  } else {
    throw new Error(`unexpected input: ${p}`)
  }
  return size
}

while write() sets the bytes for the nested representation into an appropriately-sized Uint8Array:

function write(out, offset, p) {
  if (typeof p === 'number') { // byte
    out[offset++] = p
  } else if (typeof p === 'string') {
    offset += te.encodeInto(p, out.subarray(offset)).written
  } else if (Array.isArray(p)) {
    for (let i = 0; i < p.length; i++) {
      offset = write(out, offset, p[i])
    }
  } else {
    throw new Error(`unexpected input: ${p}`)
  }
  return offset
}

## vectors and byte arrays

Finally, we need to implement the byte format for wasm vectors and byte arrays.

A byte array is specified with the uleb number of bytes followed by the bytes themselves:

function bytes(items) { return uleb(sizeof(items)).concat(items) }

A vector is specified with the uleb number of elements followed by the vector contents:

function vec(items) { return uleb(items.length).concat(items) }
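The difference between the two is easy to mix up: bytes() counts flattened bytes, while vec() counts top-level elements, so a multi-byte element has to be wrapped in its own array. The sketch below shows this with condensed copies of the helpers from this section (repeated so the snippet runs on its own); `flatten` and `fnType` are names made up for this example.

```javascript
const enc = new TextEncoder()

// condensed copies of the helpers above, so the snippet runs on its own
function uleb(x) {
  const out = []
  do {
    let b = x & 0x7f
    x >>>= 7
    if (x !== 0) b |= 0x80
    out.push(b)
  } while (x !== 0)
  return out
}
function sizeof(p) {
  if (typeof p === 'number') return 1
  if (typeof p === 'string') return enc.encode(p).length
  return p.reduce((n, item) => n + sizeof(item), 0)
}
function write(out, offset, p) {
  if (typeof p === 'number') { out[offset] = p; return offset + 1 }
  if (typeof p === 'string') return offset + enc.encodeInto(p, out.subarray(offset)).written
  for (const item of p) offset = write(out, offset, item)
  return offset
}
function bytes(items) { return uleb(sizeof(items)).concat(items) }
function vec(items) { return uleb(items.length).concat(items) }

// hypothetical convenience wrapper: size, allocate, and write in one call
function flatten(p) {
  const out = new Uint8Array(sizeof(p))
  write(out, 0, p)
  return Array.from(out)
}

const fnType = [0x60, 0x01, 0x7f, 0x01, 0x7f] // one element that is 5 bytes long

console.log(flatten(bytes(fnType)).join(' '))  // 5 96 1 127 1 127  (5 bytes)
console.log(flatten(vec([fnType])).join(' '))  // 1 96 1 127 1 127  (1 element)
console.log(flatten(bytes('h\u00e9llo'))[0])   // 6: 'é' takes 2 bytes in utf-8
```

The last line is why bytes() goes through sizeof() rather than a string's length: the prefix has to count encoded bytes, not characters.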

# sections

## type section (0x01)

The type section is a vector of types. To define a function type, use the code 0x60 for functions followed by a vector of the types of the function arguments followed by a vector of the types of the function's return values.

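Our function will take one i32 as an argument and will return one i32. Wasm's i32 is a plain 32-bit integer with no signedness of its own; individual instructions decide whether to treat it as signed or unsigned. The type code for i32 is 0x7f. So we have: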

let typeSection = [
  0x01, bytes([
    vec([
      [ 0x60, vec([0x7f]), vec([0x7f]) ], // fn[0]: [i32] -> [i32]
    ]),
  ])
]
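Other signatures follow the same pattern. As a rough sketch, a small helper can build the bytes for one function type from lists of type names. `functype` and `valtype` are hypothetical names for this example; the type codes are the ones given in the wasm binary format, and the helper assumes fewer than 128 parameters and results so that each count fits in a single uleb byte.

```javascript
// value type codes from the wasm binary format
const valtype = { i32: 0x7f, i64: 0x7e, f32: 0x7d, f64: 0x7c }

// Hypothetical helper: encode one function type as a flat byte array.
// Assumes fewer than 128 params/results, so each count is one uleb byte.
function functype(params, results) {
  return [
    0x60, // function type marker
    params.length, ...params.map(t => valtype[t]),
    results.length, ...results.map(t => valtype[t]),
  ]
}

const hex = bs => bs.map(b => '0x' + b.toString(16).padStart(2, '0')).join(' ')

console.log(hex(functype(['i32'], ['i32'])))        // 0x60 0x01 0x7f 0x01 0x7f
console.log(hex(functype(['i32', 'i32'], ['i64']))) // 0x60 0x02 0x7f 0x7f 0x01 0x7e
console.log(hex(functype([], [])))                  // 0x60 0x00 0x00
```

The first line reproduces the single entry used in the type section above.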

## function section (0x03)

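The function section is a vector of type indices, one entry per function defined in the module. Each entry is an index into the type section and gives the signature of the function whose body sits at the same position in the code section.

We previously defined a function type as the first item in the type section, which is index 0, so the function section becomes: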

let functionSection = [
  0x03, bytes([
    vec([
      [ uleb(0) ], // fn[0]
    ]),
  ])
]

## export section (0x07)

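The export section allows us to access our function from outside wasm. This section is a vector of export records, each of which contains a name as a byte array of utf-8 and an "export description". For functions, the export description starts with 0x00 and then contains the function index of the function to export. Function indices count imported functions first and then the functions listed in the function section; our module has no imports, so the index is simply the position in the function section.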

To export the first and only function from the function section (index 0), we have:

let exportSection = [
  0x07, bytes([
    vec([
      [ bytes('f'), 0x00, uleb(0) ], // fn[0]
    ]),
  ])
]
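To make the record layout concrete, here is a sketch of a helper that builds a single export record as flat bytes. `exportEntry` and `exportKind` are hypothetical names for this example; the kind codes are the ones from the wasm binary format, and the helper assumes the name is shorter than 128 bytes and the index is below 128 so that each fits in one uleb byte.

```javascript
const encoder = new TextEncoder()

// export description kinds from the wasm binary format
const exportKind = { func: 0x00, table: 0x01, memory: 0x02, global: 0x03 }

// Hypothetical helper: one export record as a flat byte array.
// Assumes name length < 128 bytes and index < 128 (single uleb bytes).
function exportEntry(name, kind, index) {
  const nameBytes = Array.from(encoder.encode(name)) // utf-8 bytes of the name
  return [nameBytes.length, ...nameBytes, exportKind[kind], index]
}

console.log(exportEntry('f', 'func', 0).join(' '))      // 1 102 0 0
console.log(exportEntry('mul111', 'func', 0).join(' ')) // 6 109 117 108 49 49 49 0 0
```

The first line matches the record in the export section above: a 1-byte name "f" (102), the function kind, and function index 0.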

## code section (0x0a)

Wasm is a stack machine: instructions pop their operands from a stack and push their results onto it. The instructions themselves are a variable number of bytes long. There are a lot of good resources about wasm instructions, so I won't go too much into the topic here. You can read more in the "instructions" chapter (5.4) of the "binary format" section of the wasm spec.

Each item in the code section is a byte array holding one function body: a vector of local declarations followed by the instructions. One thing that is easy to get wrong here is that this vector does not list the arguments. Arguments come from the function's type and occupy the first local indices, so our argument is local 0. The vector only declares extra locals, with each entry being a uleb count followed by a type code. Our function needs none, so the vector is empty, which encodes as the single byte 0x00. Instructions also carry no type prefix: local.get is just 0x20 followed by the uleb index of the local.

In this code section, we define a function (at index 0) that takes an i32 as an argument, multiplies the argument by the constant 111, and returns the result. The value 0x0b is the "end" opcode that terminates the function body so that the next one may begin. In this case we only define a single function.

let codeSection = [
  0x0a, bytes([
    vec([
      bytes([
        vec([]), // extra locals: none (the argument is already local 0)
        [
          0x20, 0x00, // local.get 0: push argument[0] onto the stack
          0x41, sleb(111), // i32.const 111: push the constant 111 onto the stack
          0x6c, // i32.mul: pops two values from the stack,
            // multiplies them together, and pushes the result onto the stack
          0x0f, // return (optional here: reaching end also returns the top of the stack)
          0x0b // end
        ],
      ]),
    ]),
  ])
]

# full program

Putting all of the previous code together plus a bit of glue to instantiate our wasm code and call its exported function, we get:

[ program.js ]:

// generate the raw bytes for a wasm library with one function export,
// call that function with an argument, and print the output

const te = new TextEncoder

function uleb(x) {
  let bytes = []
  if (x < 0) x = 0
  do {
    let b = x & 0x7f
    x >>= 7
    if (x != 0) b |= 0x80
    bytes.push(b)
  } while (x != 0)
  return bytes
}

function sleb(x) {
  let bytes = []
  let more = 1
  let neg = (x < 0)
  let size = 32 // 32 bit integer
  while (more) {
    let b = x & 0x7f
    x >>= 7
    if (neg) x |= (~0 << (size - 7))
    let sb = (b >> 6) & 1
    if ((x === 0 && sb === 0) || (x === -1 && sb === 1)) {
      more = 0
    } else {
      b |= 0x80
    }
    bytes.push(b)
  }
  return bytes
}

function sizeof(p) {
  let size = 0
  if (typeof p === 'number') { // byte
    size += 1
  } else if (typeof p === 'string') {
    size += te.encode(p).length
  } else if (Array.isArray(p)) {
    for (let i = 0; i < p.length; i++) {
      size += sizeof(p[i])
    }
  } else {
    throw new Error(`unexpected input: ${p}`)
  }
  return size
}

function write(out, offset, p) {
  if (typeof p === 'number') { // byte
    out[offset++] = p
  } else if (typeof p === 'string') {
    offset += te.encodeInto(p, out.subarray(offset)).written
  } else if (Array.isArray(p)) {
    for (let i = 0; i < p.length; i++) {
      offset = write(out, offset, p[i])
    }
  } else {
    throw new Error(`unexpected input: ${p}`)
  }
  return offset
}

function bytes(items) { return uleb(sizeof(items)).concat(items) }
function vec(items) { return uleb(items.length).concat(items) }

let program = [
  0x00, 'asm', 0x01, 0x00, 0x00, 0x00,
  0x01, bytes([ // type section
    vec([
      [ 0x60, vec([0x7f]), vec([0x7f]) ], // fn[0]: [i32] -> [i32]
    ]),
  ]),
  0x03, bytes([ // function section
    vec([
      [ uleb(0) ], // fn[0]
    ]),
  ]),
  0x07, bytes([ // export section
    vec([
      [ bytes('f'), 0x00, uleb(0) ], // fn[0]
    ]),
  ]),
  0x0a, bytes([ // code section
    vec([
      bytes([
        vec([]), // extra locals: none (the argument is already local 0)
        [
          0x20, 0x00, // local.get 0: push argument[0] onto the stack
          0x41, sleb(111), // i32.const 111: push the constant 111 onto the stack
          0x6c, // i32.mul: pops two values from the stack,
            // multiplies them together, and pushes the result onto the stack
          0x0f, // return (optional here: reaching end also returns the top of the stack)
          0x0b // end
        ],
      ]),
    ]),
  ])
]

;(async function () {
  let data = new Uint8Array(sizeof(program))
  write(data, 0, program)
  // at this point you could serialize `data`
  // and then load data and instantiate a webassembly instance from it later:
  let w = await WebAssembly.instantiate(data)
  console.log(w.instance.exports.f(9)) // prints 999
})()

## generated wasm

When we print the wasm data for the program:

console.log(Array.from(data).map(x => '0x'+x.toString(16).padStart(2,'0')).join(' '))

then we get:

0x00 0x61 0x73 0x6d 0x01 0x00 0x00 0x00 0x01 0x06 0x01 0x60 0x01 0x7f 0x01 0x7f
0x03 0x02 0x01 0x00 0x07 0x05 0x01 0x01 0x66 0x00 0x00 0x0a 0x0b 0x01 0x09 0x00
0x20 0x00 0x41 0xef 0x00 0x6c 0x0f 0x0b

Annotated, this is:

0x00 0x61 0x73 0x6d # magic number
0x01 0x00 0x00 0x00 # wasm version 1

0x01 0x06 # type section with 6 bytes
  0x01 # types vector of length 1
    0x60 # function type
      0x01 0x7f # function takes 1 argument: i32
      0x01 0x7f # function returns 1 type: i32

0x03 0x02 # function section with 2 bytes
  0x01 0x00 # vector of length 1: function signature is at type index zero

0x07 0x05 # export section with 5 bytes:
  0x01 # vector of length 1
    0x01 0x66 # export name: byte array of length 1, contents: "f"
    0x00 # export description: function type
    0x00 # export description: function index 0

0x0a 0x0b # code section with 11 (0x0b) bytes
  0x01 # vector of length 1 (one code item)
    0x09 # 9 bytes in this code item
      0x00 # empty vector of extra locals
      0x20 0x00 # local.get 0
      0x41 0xef 0x00 # i32.const 111 (111 in decimal is 0xef 0x00 in sleb)
      0x6c # i32.mul
      0x0f # return
      0x0b # end

# demo
