feat(users/Profpatsch/netencode): ignore earlier record entries

It turns out that the netencode spec requiring to ignore *later* entries meant that every parser has to do an extra check for each element, instead of just overriding the key in the hash map. This leads to a situation where the simple implementation is the wrong one, which would lead to very subtle problems in parsers (see also the infamous “json duplicate record entry” problem which has been used for various exploits in the past). To be fair, exploits are still possible, but at least a `Map.fromList` will be the right implementation (provided it folds from the left) now instead of the wrong one. Examples of the trivial implementation being now right: Python: > dict([("foo", 1), ("foo", 2)]) {'foo': 2} Rust: > println!("{:?}", HashMap::from([ ("foo", 1), ("foo", 2) ])); {"foo": 2} Haskell: > Data.Map.fromList [ ("foo", 1), ("foo", 2) ] fromList [("foo",2)] Change-Id: Ife9593956f4718e5e720f4f348c227e4f3a71e2d Reviewed-on: https://cl.tvl.fyi/c/depot/+/5108 Tested-by: BuildkiteCI Reviewed-by: Profpatsch <mail@profpatsch.de> Reviewed-by: sterni <sternenseemann@systemli.org> Autosubmit: Profpatsch <mail@profpatsch.de>
2022-01-29 12:50:19 +01:00 · 2022-01-29 12:50:19 +01:00 · ed68ba6751
commit ed68ba6751
parent 82ba42c439
2 changed files with 8 additions and 6 deletions
--- a/users/Profpatsch/netencode/README.md
+++ b/users/Profpatsch/netencode/README.md
@ -73,7 +73,11 @@ A tag (`<`) gives a value a name. The tag is UTF-8 encoded, starting with its le
 ### records (products/records), also maps

 A record (`{`) is a concatenation of tags (`<`). It needs to be closed with `}`.
-If tag names repeat the later ones should be ignored. Ordering does not matter.
+
+If tag names repeat the *earlier* ones should be ignored.
+Using the last tag corresponds with the way most languages handle converting a list of tuples to Maps, by using a for-loop and Map.insert without checking the contents first. Otherwise you’d have to revert the list first or remember which keys you already inserted.
+
+Ordering of tags in a record does not matter.

 Similar to text, records start with the length of their *whole encoded content*, in bytes. This makes it possible to treat their contents as opaque bytestrings.

--- a/users/Profpatsch/netencode/netencode.rs
+++ b/users/Profpatsch/netencode/netencode.rs
@ -405,11 +405,9 @@ pub mod parse {
                    inner_no_empty_string(tag_g(&inner)),
                    HashMap::new(),
                    |mut acc: HashMap<_, _>, Tag { tag, mut val }| {
-                        // ignore duplicated tag names that appear later
+                        // ignore earlier tags with the same name
                        // according to netencode spec
-                        if !acc.contains_key(tag) {
-                            acc.insert(tag, *val);
-                        }
+                        let _ = acc.insert(tag, *val);
                        acc
                    },
                ),
@ -633,7 +631,7 @@ pub mod parse {
                record_t("{25:<1:a|u,<1:b|u,<1:a|i1:-1,}".as_bytes()),
                Ok((
                    "".as_bytes(),
-                    vec![("a".to_owned(), T::Unit), ("b".to_owned(), T::Unit),]
+                    vec![("a".to_owned(), T::I3(-1)), ("b".to_owned(), T::Unit),]
                        .into_iter()
                        .collect::<HashMap<_, _>>()
                )),