tvl-depot/users/Profpatsch/netencode/README.md
Profpatsch b4cfddfc80 fix(netencode/README): fix the example of ignored fields
Forgot this example when I changed the spec to ignore earlier
duplicated fields.

Change-Id: I9bc8d3e27201afd0d256aa4771b6420059fc68a7
Reviewed-on: https://cl.tvl.fyi/c/depot/+/8949
Tested-by: BuildkiteCI
Reviewed-by: Profpatsch <mail@profpatsch.de>
2023-07-14 08:03:14 +00:00


# netencode 0.1-unreleased
[bencode][] and [netstring][]-inspired pipe format that should be trivial to generate correctly in every context (only requires a `byte_length()` and a `printf()`), easy to parse (100 lines of code or less), mostly human-decipherable for easy debugging, and support nested record and sum types.
## scalars
Scalars have the format `[type prefix][size]:[value],`,
where size is a natural number without leading zeroes.
### unit
The unit (`u`) has only one value.
* The unit is: `u,`
### numbers
Naturals (`n`) and Integers (`i`), with a maximum size in bits.
Bit sizes are specified in 2^n increments, 1 to 9 (`n1`..`n9`, `i1`..`i9`).
* Natural `1234` that fits in 32 bits (2^5): `n5:1234,`
* Integer `-42` that fits in 8 bits (2^3): `i3:-42,`
* Integer `23` that fits in 64 bits (2^6): `i6:23,`
* Integer `-1` that fits in 512 bits (2^9): `i9:-1,`
* Natural `0` that fits in 1 bit (2^1): `n1:0,`
An implementation can define the biggest numbers it supports, and has to throw an error for anything bigger. It has to support everything smaller, so for example if you support up to `i6`/`n6`, you have to support `i1`/`n1` through `i5`/`n5` as well. An implementation could support up to the current architecture's word size, for example.
Floats are not supported; you can implement fixed-size decimals or ratios using integers.
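A minimal sketch of how numbers could be generated, in Python. The function names `encode_natural` and `encode_integer` are hypothetical and not part of netencode. The sketch only validates the size exponent and the sign, and leaves out the check that the value actually fits into the declared bit size.

```python
def encode_natural(size_exp: int, value: int) -> bytes:
    """Encode a natural as n<size_exp>:<value>, (hypothetical helper)."""
    if not 1 <= size_exp <= 9:
        raise ValueError("size exponent must be between 1 and 9")
    if value < 0:
        raise ValueError("naturals cannot be negative")
    return b"n%d:%d," % (size_exp, value)


def encode_integer(size_exp: int, value: int) -> bytes:
    """Encode an integer as i<size_exp>:<value>, (hypothetical helper)."""
    if not 1 <= size_exp <= 9:
        raise ValueError("size exponent must be between 1 and 9")
    return b"i%d:%d," % (size_exp, value)
```

A real implementation would additionally reject size exponents above the largest one it chooses to support.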
### booleans
A boolean is represented as `n1`.
* `n1:0,`: false
* `n1:1,`: true
TODO: should we add `f,` and `t,`?
### text
Text (`t`) that *must* be encoded as UTF-8, starting with its length in bytes:
* The string `hello world` (11 bytes): `t11:hello world,`
* The string `今日は` (9 bytes): `t9:今日は,`
* The string `:,` (2 bytes): `t2::,,`
* The empty string `` (0 bytes): `t0:,`
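The length prefix counts bytes of the UTF-8 encoding, not characters. A short Python sketch of that distinction, with `encode_text` as a hypothetical helper name:

```python
def encode_text(value: str) -> bytes:
    """Encode a string as t<byte length>:<utf-8 bytes>, (hypothetical helper)."""
    payload = value.encode("utf-8")
    # len(payload) is the byte length; len(value) would count code points.
    return b"t%d:%b," % (len(payload), payload)
```

For `今日は`, `len(value)` is 3 while the encoded payload is 9 bytes long, which is why the prefix has to be computed after encoding.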
### binary
Arbitrary binary strings (`b`) that can contain any data, starting with its length in bytes.
* The ASCII string `hello world` as binary data (11 bytes): `b11:hello world,`
* The empty binary string (0 bytes): `b0:,`
* The bytestring with `^D` (1 byte): `b1:␄,` (where `␄` stands for the single raw byte `0x04`)
Since the binary strings are length-prefixed, they can contain `\0` and no escaping is required. Care has to be taken in languages with `\0`-terminated bytestrings.
Use text (`t`) if you have UTF-8 encoded data.
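To illustrate why no escaping is needed, here is a small Python sketch that writes and reads binary values purely by length. `encode_binary` and `decode_binary` are hypothetical names, and the decoder handles only the `b` type.

```python
from typing import Tuple


def encode_binary(payload: bytes) -> bytes:
    """Encode raw bytes as b<length>:<bytes>, (hypothetical helper)."""
    return b"b%d:%b," % (len(payload), payload)


def decode_binary(data: bytes) -> Tuple[bytes, bytes]:
    """Decode one binary value; return (payload, remaining input)."""
    if not data.startswith(b"b"):
        raise ValueError("expected 'b' prefix")
    size_str, sep, rest = data[1:].partition(b":")
    if not sep or not size_str.isdigit():
        raise ValueError("malformed length field")
    if len(size_str) > 1 and size_str.startswith(b"0"):
        raise ValueError("length must not have leading zeroes")
    size = int(size_str)
    # Slice by length; the payload is never scanned for delimiters.
    payload, tail = rest[:size], rest[size:]
    if len(payload) != size or not tail.startswith(b","):
        raise ValueError("truncated value or missing ','")
    return payload, tail[1:]
```

Because the payload is sliced by length, bytes such as `\0`, `:` or `,` inside it are passed through untouched. This sketch does not bound the length field; see the parser security considerations for that.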
## tagged values
### tags
A tag (`<`) gives a value a name. The tag is UTF-8 encoded, starting with its length in bytes and proceeding with the value.
* The tag `foo` (3 bytes) tagging the text `hello` (5 bytes): `<3:foo|t5:hello,`
* The tag `` (0 bytes) tagging the 8-bit integer 0: `<0:|i3:0,`
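A tag can be generated by prefixing an already-encoded value. A Python sketch, where `encode_tag` is a hypothetical helper that takes the tag name and the encoded bytes of the tagged value:

```python
def encode_tag(name: str, encoded_value: bytes) -> bytes:
    """Encode <length>:<name>|<value> (hypothetical helper).

    `encoded_value` must already be a complete netencode value.
    """
    tag = name.encode("utf-8")
    return b"<%d:%b|%b" % (len(tag), tag, encoded_value)
```

Note that the tag itself has no closing character; the tagged value's own terminator ends it.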
### records (products/records), also maps
A record (`{`) is a concatenation of tags (`<`). It needs to be closed with `}`.
If tag names repeat, the *earlier* ones should be ignored.
Using the last tag corresponds with the way most languages handle converting a list of tuples to Maps, by using a for-loop and Map.insert without checking the contents first. Otherwise you'd have to reverse the list first or remember which keys you already inserted.
Ordering of tags in a record does not matter.
Similar to text, records start with the length of their *whole encoded content*, in bytes. This makes it possible to treat their contents as opaque bytestrings.
* There is no empty record. (TODO: make the empty record the unit type, remove `u,`?)
* A record with one empty field, `foo`: `{9:<3:foo|u,}`
* A record with two fields, `foo` and `x`: `{21:<3:foo|u,<1:x|t3:baz,}`
* The same record: `{21:<1:x|t3:baz,<3:foo|u,}`
* The same record (earlier occurrences of fields are ignored): `{28:<1:x|u,<1:x|t3:baz,<3:foo|u,}`
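The following Python sketch shows how a record's length prefix can be computed, and how the "last occurrence wins" rule falls out of a plain insert loop. `encode_record` and `fields_to_map` are hypothetical names, and fields are passed as `(name, encoded value)` pairs whose values are already encoded.

```python
from typing import Dict, List, Tuple


def encode_record(fields: List[Tuple[str, bytes]]) -> bytes:
    """Encode {<length>:<tags...>} (hypothetical helper)."""
    if not fields:
        raise ValueError("netencode has no empty record")
    parts = []
    for name, encoded_value in fields:
        tag = name.encode("utf-8")
        parts.append(b"<%d:%b|%b" % (len(tag), tag, encoded_value))
    body = b"".join(parts)
    # The prefix is the byte length of everything between ':' and '}'.
    return b"{%d:%b}" % (len(body), body)


def fields_to_map(fields: List[Tuple[str, bytes]]) -> Dict[str, bytes]:
    """Collapse parsed fields into a map; later duplicates overwrite earlier ones."""
    result: Dict[str, bytes] = {}
    for name, encoded_value in fields:
        result[name] = encoded_value
    return result
```

With the fields `x` (unit), `x` (text `baz`) and `foo` (unit), `fields_to_map` keeps the text `baz` for `x`, so no reversal or bookkeeping of seen keys is needed.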
### sums (tagged unions)
Simply a tagged value. The tag marker `<` indicates it is a sum if it appears outside of a record.
## lists
A list (`[`) imposes an ordering on a sequence of values. It needs to be closed with `]`. Values in it are simply concatenated.
Similar to records, lists start with the length of their whole encoded content.
* The empty list: `[0:]`
* The list with one element, the string `foo`: `[7:t3:foo,]`
* The list with text `foo` followed by i3 `-42`: `[14:t3:foo,i3:-42,]`
* The list with `Some` and `None` tags: `[35:<4:Some|t3:foo,<4:None|u,<4:None|u,]`
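Lists can be generated the same way as records, by concatenating already-encoded values and prefixing the total byte length. A Python sketch with the hypothetical helper `encode_list`:

```python
from typing import List


def encode_list(encoded_items: List[bytes]) -> bytes:
    """Encode [<length>:<items...>] from already-encoded values (hypothetical helper)."""
    body = b"".join(encoded_items)
    return b"[%d:%b]" % (len(body), body)
```

Unlike records, an empty list is allowed and encodes with length `0`.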
## parser security considerations
The length field is a decimal number that is not length-restricted,
meaning an attacker could send an extremely long (or never-ending) length field,
thus overflowing your parser if you are not careful.
You should thus put a practical limit on the number of digits in length fields,
which implicitly enforces a limit on how long the value itself can be.
Start by defining a max value length in bytes.
Then count the number of decimal digits in that number.
So if your max length is 1024 bytes, your length field can be a maximum `count_digits(1024) == 4` bytes long.
A length field of at most 4 digits bounds the value at 9999 bytes,
so to never parse anything longer than 1024 bytes you also have to compare the parsed length against your actual maximum.
The whole encoded value is then at most that maximum plus 1 byte for the type tag, 4 bytes for the length, and 2 bytes for the separator & ending character.
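A Python sketch of such a bounded length parser. The names `MAX_VALUE_LEN`, `MAX_LEN_DIGITS` and `parse_length` are assumptions made for this example, and 1024 is just the limit used in the text above. The parser looks at no more than `MAX_LEN_DIGITS + 1` bytes before it either finds the `:` or gives up.

```python
from typing import Tuple

MAX_VALUE_LEN = 1024                      # assumed application limit, in bytes
MAX_LEN_DIGITS = len(str(MAX_VALUE_LEN))  # number of decimal digits in the limit


def parse_length(data: bytes, start: int = 0) -> Tuple[int, int]:
    """Parse a decimal length ending in ':' at data[start:].

    Returns (length, index of the first byte after ':').
    """
    # Only inspect as many bytes as a valid length field plus ':' can occupy.
    window = data[start:start + MAX_LEN_DIGITS + 1]
    colon = window.find(b":")
    if colon == -1:
        raise ValueError("length field too long or unterminated")
    digits = window[:colon]
    if not digits.isdigit():
        raise ValueError("length field must consist of decimal digits")
    if len(digits) > 1 and digits.startswith(b"0"):
        raise ValueError("length must not have leading zeroes")
    length = int(digits)
    # The digit limit alone would still admit values up to 9999.
    if length > MAX_VALUE_LEN:
        raise ValueError("value longer than the configured maximum")
    return length, start + colon + 1
```

For example, `parse_length(b"t11:hello world,", 1)` skips the type prefix and returns the length 11 together with the offset at which the value starts.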
## motivation
TODO
## guarantees
TODO: do I want a unique representation (a bijection, like bencode)? This would put more restrictions on the generator, like sorting records in lexicographic order, but would make it possible to compare values without decoding.
[bencode]: https://en.wikipedia.org/wiki/Bencode
[netstring]: https://en.wikipedia.org/wiki/Netstring