feat(users/Profpatsch/netencode): ignore earlier record entries

It turns out that the netencode spec's requirement to ignore *later*
entries meant that every parser had to do an extra check for each
element, instead of just overriding the key in the hash map.

This makes the simple implementation the wrong one, which can cause
very subtle problems in parsers (see also the infamous “JSON duplicate
record entry” problem, which has been used for various exploits in the
past).

To be fair, exploits are still possible, but at least a `Map.fromList`
will be the right implementation (provided it folds from the left) now
instead of the wrong one.

Examples of the trivial implementation now being right:

Python:

    > dict([("foo", 1), ("foo", 2)])
    {'foo': 2}

Rust:

    > println!("{:?}", HashMap::from([
      ("foo", 1),
      ("foo", 2)
    ]));
    {"foo": 2}

Haskell:

    > Data.Map.fromList [ ("foo", 1), ("foo", 2) ]
    fromList [("foo",2)]
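
For contrast, here is a minimal sketch of both rules as plain folds over
a list of entries. The function names `collect_last_wins` and
`collect_first_wins` are hypothetical and not part of the netencode
code; the sketch only illustrates the extra per-element check the old
rule needed.

```rust
use std::collections::HashMap;

/// New rule (last entry wins): a plain fold, no lookup before inserting.
fn collect_last_wins<'a>(entries: &[(&'a str, i64)]) -> HashMap<&'a str, i64> {
    let mut acc = HashMap::new();
    for &(tag, val) in entries {
        // `insert` overwrites any earlier value stored under `tag`.
        acc.insert(tag, val);
    }
    acc
}

/// Old rule (first entry wins): every element needs a presence check.
fn collect_first_wins<'a>(entries: &[(&'a str, i64)]) -> HashMap<&'a str, i64> {
    let mut acc = HashMap::new();
    for &(tag, val) in entries {
        // `or_insert` stores `val` only if `tag` is not present yet.
        acc.entry(tag).or_insert(val);
    }
    acc
}
```

For the entries `[("a", 0), ("b", 0), ("a", -1)]`, `collect_last_wins`
maps `a` to `-1`, while `collect_first_wins` maps `a` to `0`.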

Change-Id: Ife9593956f4718e5e720f4f348c227e4f3a71e2d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/5108
Tested-by: BuildkiteCI
Reviewed-by: Profpatsch <mail@profpatsch.de>
Reviewed-by: sterni <sternenseemann@systemli.org>
Autosubmit: Profpatsch <mail@profpatsch.de>
Profpatsch 2022-01-29 12:50:19 +01:00
parent 82ba42c439
commit ed68ba6751
2 changed files with 8 additions and 6 deletions

@@ -73,7 +73,11 @@ A tag (`<`) gives a value a name. The tag is UTF-8 encoded, starting with its le
 ### records (products/records), also maps
 A record (`{`) is a concatenation of tags (`<`). It needs to be closed with `}`.
-If tag names repeat the later ones should be ignored. Ordering does not matter.
+If tag names repeat the *earlier* ones should be ignored.
+Using the last tag corresponds with the way most languages handle converting a list of tuples to Maps, by using a for-loop and Map.insert without checking the contents first. Otherwise youd have to revert the list first or remember which keys you already inserted.
+Ordering of tags in a record does not matter.
 Similar to text, records start with the length of their *whole encoded content*, in bytes. This makes it possible to treat their contents as opaque bytestrings.

@@ -405,11 +405,9 @@ pub mod parse {
 inner_no_empty_string(tag_g(&inner)),
 HashMap::new(),
 |mut acc: HashMap<_, _>, Tag { tag, mut val }| {
-// ignore duplicated tag names that appear later
+// ignore earlier tags with the same name
 // according to netencode spec
-if !acc.contains_key(tag) {
-acc.insert(tag, *val);
-}
+let _ = acc.insert(tag, *val);
 acc
 },
 ),
@@ -633,7 +631,7 @@ pub mod parse {
 record_t("{25:<1:a|u,<1:b|u,<1:a|i1:-1,}".as_bytes()),
 Ok((
 "".as_bytes(),
-vec![("a".to_owned(), T::Unit), ("b".to_owned(), T::Unit),]
+vec![("a".to_owned(), T::I3(-1)), ("b".to_owned(), T::Unit),]
 .into_iter()
 .collect::<HashMap<_, _>>()
 )),