feat(users/sterni/nix/utf8): pure nix utf-8 decoder
users.sterni.nix.utf8 implements UTF-8 decoding in pure nix. We implement
the decoding as a simple state machine which is fed one byte at a time.
Whole strings can be decoded by calling step once per byte, in order.
This is done in decode, which uses builtins.foldl' to get around
recursion restrictions, plus a neat trick using builtins.deepSeq that
puck showed me: it limits the size of the thunks built up in a foldl'
(which can also cause a stack overflow).

This makes decoding arbitrarily large UTF-8 files into codepoints using
nix theoretically possible, but it is not really practical: decoding a
36 KB LaTeX file I had lying around takes ~160s on my laptop.

Change-Id: Iab8c973dac89074ec280b4880a7408e0b3d19bc7
Reviewed-on: https://cl.tvl.fyi/c/depot/+/2590
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
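The approach described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code added by this commit: the state layout (acc, pending, out), the error messages and the choice to take a list of integer byte values instead of a string are all assumptions made for the example. It does no validation of overlong encodings or surrogates.

```nix
let
  # Inclusive range check, same definition as the helper in the diff.
  inRange = a: b: x: x >= a && x <= b;

  # Hypothetical state: codepoint accumulated so far, number of
  # continuation bytes still expected, and codepoints decoded so far.
  initial = { acc = 0; pending = 0; out = [ ]; };

  # Feed a single byte (an integer in 0..255) into the state machine.
  step = state: byte:
    if state.pending == 0 then
      (if inRange 0 127 byte then
        state // { out = state.out ++ [ byte ]; }
      else if inRange 192 223 byte then   # lead byte of a 2-byte sequence
        state // { acc = byte - 192; pending = 1; }
      else if inRange 224 239 byte then   # lead byte of a 3-byte sequence
        state // { acc = byte - 224; pending = 2; }
      else if inRange 240 247 byte then   # lead byte of a 4-byte sequence
        state // { acc = byte - 240; pending = 3; }
      else
        throw "invalid leading byte ${toString byte}")
    else if inRange 128 191 byte then
      # Continuation byte: shift in its 6 payload bits.
      let acc = state.acc * 64 + (byte - 128); in
      if state.pending == 1
      then { acc = 0; pending = 0; out = state.out ++ [ acc ]; }
      else { inherit acc; inherit (state) out; pending = state.pending - 1; }
    else
      throw "expected continuation byte, got ${toString byte}";

  # Fold step over all bytes. deepSeq fully evaluates each intermediate
  # state so that no chain of nested thunks builds up across the fold.
  decode = bytes:
    let
      final = builtins.foldl'
        (state: byte:
          let next = step state byte;
          in builtins.deepSeq next next)
        initial
        bytes;
    in
    if final.pending != 0
    then throw "truncated input"
    else final.out;
in
{
  # The bytes of "aä€" in UTF-8; evaluates to [ 97 228 8364 ].
  example = decode [ 97 195 164 226 130 172 ];
}
```

Appending to out with ++ copies the list on every completed codepoint, so this sketch is quadratic in the number of codepoints; it is meant to show the shape of the fold, not to be fast.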
parent 5ae1d3fd7b
commit b810c46a45
3 changed files with 332 additions and 0 deletions
@@ -97,6 +97,8 @@ let
   # i. e. they truncate towards 0
   mod = a: b: let res = a / b; in a - (res * b);
 
+  inRange = a: b: x: x >= a && x <= b;
+
 in {
   inherit
     maxBound
@@ -117,5 +119,6 @@ in {
     bitXor
     toHex
     fromHex
+    inRange
     ;
 }
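As a small illustration of the two helpers visible in this hunk, the expression below copies their definitions and evaluates them on a few inputs. The attribute names and the sample values are made up for the example; the byte ranges are the standard UTF-8 ones for ASCII and continuation bytes.

```nix
let
  # Definitions copied from the diff.
  mod = a: b: let res = a / b; in a - (res * b);
  inRange = a: b: x: x >= a && x <= b;
in
{
  # Integer division truncates towards 0, so the result of mod takes
  # the sign of the dividend.
  positive = mod 7 2;       # 1
  negative = mod (-7) 2;    # -1

  # Both bounds of inRange are inclusive.
  isAscii = inRange 0 127 164;           # false
  isContinuation = inRange 128 191 164;  # true
  upperBound = inRange 128 191 191;      # true
}
```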