This table maps the grammemes for individual word forms (*not* for
lemmata in either corpus!) to the corresponding grammemes from the
other dataset.
These have drastically different shapes, so the mapping is not
perfect, but will help in determining which forms are intended to be
the same on both sides.
Change-Id: Ib0717e2f7a79d96bcb5e955a20f551e391fcd759
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7918
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
Autosubmit: tazjin <tazjin@tvl.su>
This CL fixes the bug where output of a nix
evaluation is not set.
Change-Id: I8ae2759a7ec26e1de2e57dd43302129347a8c302
Signed-off-by: Aaqa Ishtyaq <aaqaishtyaq@gmail.com>
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7896
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
The original dataset contains translations into different languages,
but only the English ones are imported here.
Note that translations are for lemmata only.
Change-Id: Ifb9c32c25fda44c38ad899efca9d205c520c0fa3
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7895
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
This is the full morphological set table for all the words from the
lemmata table, which they don't call it that.
Change-Id: I6f5be673c5f59f11e36bd8c8c935844a7d4fd170
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7894
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
This is actually the lemmata table of this corpus, not the forms of
all words (they're in a separate table).
Change-Id: I89a2c2817ccce840f47406fa2a636f4ed3f49154
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7893
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
This is the second dataset I want to integrate as it contains some
more practically useful, but somewhat less structured, information.
Change-Id: Ib46b2597a33e76f59e030f889a0961ecc5a144eb
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7873
Tested-by: BuildkiteCI
Autosubmit: tazjin <tazjin@tvl.su>
Reviewed-by: tazjin <tazjin@tvl.su>
I'm changing strategies to importing both OC and another dataset
before continuing to normalise the data, as it might be easier to do
in a set of table-constructing queries inside of SQLite with all raw
data in place.
Change-Id: I26b41af80586fc1bfd8e26a6be20579068a82507
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7872
Autosubmit: tazjin <tazjin@tvl.su>
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
This makes the actual imported database of the ~whole Russian
language (all lemmas, grammemes, forms etc.) a Nix build target which
is built in CI.
This still needs schema normalisation (it's fairly directly mapped to
the raw data), but it's already starting to be a useful data set.
This also happens to be a pretty cool demonstration of the power of
Nix. You can do `nix-build -A corp.russian.data-import.database` and
out comes a perfectly valid SQLite database with a valid external data
import!
Change-Id: I5d6d15e67d0e4a7ff590fad06252be34f5d561fd
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7866
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
Otherwise up to 1000 elements might be missing.
Change-Id: I20d6238424eec27f0e758e7737c9c31bcb81b23d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7862
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
This is an initial and kind of dumb table structure, but there's some
massaging that needs to be done before this makes more sense.
Change-Id: I441288b684ef86be507099bcc4ebf984598789c8
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7861
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
Adds the beginning of a tool which can import OpenCorpora data into a
SQLite database. This is quite a lot of toil and there's probably a
better way to do this, but overall becoming this intimately familiar
with the data structures is quite helpful for understanding what I
can/can't do with only this dataset.
Change-Id: Ieab33a8ce07ea4ac87917b9c8132226bbc6523b1
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7859
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
This is currently hosted by the company, and I'm assigning my
copyright to the company, which also runs an ad placement on the page.
Note that the NixOS module for hosting it has not been moved yet.
Change-Id: Iba9e1cab9370faa79e43c3344fbfbbbabead50b3
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7857
Reviewed-by: tazjin <tazjin@tvl.su>
Autosubmit: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
This uses the `im::OrdMap` for `NixAttrs` to enable sharing of memory
between different iterations of a map.
This slightly speeds up eval, but not significantly. Future work might
include benchmarking whether using a `HashMap` and only ordering in
cases where order is actually required would help.
This switches to a fork of `im` that fixes some bugs with its OrdMap
implementation.
Change-Id: I2f6a5ff471b6d508c1e8a98b13f889f49c0d9537
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7676
Reviewed-by: sterni <sternenseemann@systemli.org>
Tested-by: BuildkiteCI
This is a persistent, structurally sharing data structure which is
more efficient in some of our use-cases. I have verified the
efficiency improvement using `hyperfine` repeatedly over expressions
on nixpkgs.
Lists are not the most performance-critical structure in Nix (that
would be attribute sets), but we can already see a small (~5-10%)
improvement.
Note that there are a handful of cases where we still go via `Vec`
that need to be fixed, most notable for `builtins.sort` which can not
currently be implemented directly using `im::Vector` because of a
restrictive type bound.
Change-Id: I237cc50cbd7629a046e5a5e4601fbb40355e551d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7670
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
As expected, this ends up being significantly nicer to use than the
previous API.
While doing this, I've combined the error fields into one. This is
because there would only ever be one of those anyways, and combining
them ensures that we have consistent formatting (for example,
parser errors would previously not be run through the pretty
formatter but are now).
Change-Id: I6074ec8a4a3901ea82d5d07174b76a345210967b
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7547
Autosubmit: tazjin <tazjin@tvl.su>
Reviewed-by: grfn <grfn@gws.fyi>
Tested-by: BuildkiteCI
Introduces granular dependency builds using crate2nix, bootstrapped
off the generated configuration from the newly introduced
workspace (see cl/7533).
This commit checks in the generated Cargo.nix file which can be
regenerated with a parameterless invocation of `crate2nix generate` in
`//tvix`. I tried generating this in IFD, but it turned out to be
harder than what seemed worthwhile for now.
In this setup, the various build targets for Rust projects end up
being attributes of the imported `Cargo.nix` file at the `tvix.crates`
attribute. These still lack configuration, however, which has been
fixed in the various `default.nix` files of individual projects.
Note that we (temporarily) lose the ability to build tvix-eval's
benchmarks in CI. I haven't figured out what magic incantation summons
them from the void again ...
The `eval-okay-readDir` tests from both test suites have been disabled
because they fail for unknown reasons when run in this new derivation.
Somebody will have to debug it!
Change-Id: I2014614ccb9c8951aedbd71df7966ca191a13695
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7538
Autosubmit: tazjin <tazjin@tvl.su>
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
A few weeks ago, oberblastmeister did a release to crates.io so we can
stop importing it via GitHub.
Change-Id: I9d5fa5cd281685779c71b12fed45ed201a1db17e
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7532
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
CL/6867 added support for builtins.import, which required a cyclic
reference import->globals->builtins->import. This was implemented
using a RefCell, which makes it possible to mutate the builtins
during evaluation. The commit message for CL/6867 expressed a
desire to eliminate this possibility:
This opens up a potentially dangerous footgun in which we could
mutate the builtins at runtime leading to different compiler
invocations seeing different builtins, so it'd be nice to have
some kind of "finalised" status for them or some such, but I'm not
sure how to represent that atm.
This CL replaces the RefCell with Rc::new_cyclic(), making the
globals/builtins immutable once again. At VM runtime (once opcodes
start executing) everything is the same as before this CL, except
that the Rc<RefCell<>> introduced by CL/6867 is turned into an
rc::Weak<>.
The function passed to Rc::new_cyclic works very similarly to
overlays in nixpkgs: a function takes its own result as an argument.
However instead of laziness "breaking the cycle", Rust's
Rc::new_cyclic() instead uses an rc::Weak. This is done to prevent
memory leaks rather than divergence.
This CL also resolves the following TODO from CL/6867:
// TODO: encapsulate this import weirdness in builtins
The main disadvantage of this CL is the fact that the VM now must
ensure that it holds a strong reference to the globals while a
program is executing; failure to do so will cause a panic when the
weak reference in the builtins is upgrade()d.
In theory it should be possible to create strong reference cycles
the same way Rc::new_cyclic() creates weak cycles, but these cycles
would cause a permanent memory leak -- without either an rc::Weak or
RefCell there is no way to break the cycle. At some point we will
have to implement some form of cycle collection; whatever library we
choose for that purpose is likely to provide an "immutable strong
reference cycle" primitive similar to Rc::new_cyclic(), and we
should be able to simply drop it in.
Signed-off-by: Adam Joseph <adam@westernsemico.com>
Change-Id: I34bb5821628eb97e426bdb880b02e2097402adb7
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7097
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
For optional outputs (runtime trace & AST) this has a slightly nicer
user experience.
Note that the code of this is a bit verbose because doing a naive
implementation hits dumb behaviours of browsers that result in
infinite loops.
Thanks Profpatsch for the suggestion.
Change-Id: I8945a8e722f0ad8735829807fb5e39e2101f378c
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7006
Reviewed-by: j4m3s <james.landrein@gmail.com>
Autosubmit: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
This uses the JSON serialisation of the AST introduced earlier to
display a text box with the serialised AST to users. Useful for
debugging.
Change-Id: Ibc400eaf5ca87fa5072d5c044942505331c3bb40
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7005
Autosubmit: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
Reviewed-by: Adam Joseph <adam@westernsemico.com>
Reviewed-by: tazjin <tazjin@tvl.su>
This implements serde::Serialize for the rnix AST through a wrapper
type, and exposes a function for serialising the AST into
a (pretty-printed JSON) string representation.
This can be used to debug issues with the AST, and to display an AST
reprsentation in tools like tvixbolt.
Serialize is implemented manually because we don't own any of the
structs and the way to traverse them is not easily derived
automatically, and this is quite verbose. We might be able to condense
it a little bit, but at the same time it's also fairly straightforward.
Change-Id: I922df43cfc25636f3c8baee7944c75ade516055c
Reviewed-on: https://cl.tvl.fyi/c/depot/+/6943
Autosubmit: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
Reviewed-by: Adam Joseph <adam@westernsemico.com>
Reviewed-by: tazjin <tazjin@tvl.su>
Implement an *initial* version of builtins.match, using the rust `regex`
crate for regular expressions. The rust regex crate definitely has
different semantics than nix's regular expressions - but we'd like to
see how far we can get before the incompatibility starts to matter.
This consciously leaves out any sort of memo for compiled regular
expressions (which upstream nix also has) for the sake of expediency -
in the future we should implement that so we don't have to compile the
same regular expression multiple times.
Change-Id: I5b718635831ec83397940e417a9047c4342b6fa1
Reviewed-on: https://cl.tvl.fyi/c/depot/+/6989
Tested-by: BuildkiteCI
Reviewed-by: Adam Joseph <adam@westernsemico.com>
Reviewed-by: tazjin <tazjin@tvl.su>
Using `serde_json` for parsing JSON here, plus an `impl FromJSON for
Value`. The latter is primarily to stay "dependency light" for now -
likely going with an actual serde `Deserialize` impl in the future is
going to be way better as it allows saving significantly on intermediary
allocations.
Change-Id: I152a0448ff7c87cf7ebaac927c38912b99de1c18
Reviewed-on: https://cl.tvl.fyi/c/depot/+/6920
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
This commit implements (lazy) resolution of `<...>` paths via either the
NIX_PATH environment variable, or the -I command-line flag - both
handled via EvalOptions. As a result, EvalOptions can no longer derive
Copy, meaning we have to clone it at each line of the repl - this is
probably not a huge deal as repl performance is not exactly an inner
loop and we're not cloning very much.
Internally, this works by creating a thunk which pushes a constant
containing the string inside the brackets to the stack, then a new
opcode to resolve that path via the `NixPath`. To get that opcode to
work, we now have to pass in the NixPath when constructing the VM.
This (intentionally) leaves out proper implementation of path resolution
via `findFile` (cppnix just calls whatever identifier called findFile is
in scope!!!) as that's widely considered a bit of a misfeature, but if
we do decide to implement that down the road it likely wouldn't be more
than a few extra ops within the thunk introduced here.
Change-Id: Ibc979b7e425b65cbe88599940520239a4a10cee2
Reviewed-on: https://cl.tvl.fyi/c/depot/+/6918
Autosubmit: grfn <grfn@gws.fyi>
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
This lets the VM emit warnings when it encounters situations that
should only be warned about at runtime.
For starters, this is used to pass through compilation warnings that
come up when `import` is used.
Change-Id: I0c4bc8c534d699999887c430d93629fadfa662c4
Reviewed-on: https://cl.tvl.fyi/c/depot/+/6868
Reviewed-by: sterni <sternenseemann@systemli.org>
Tested-by: BuildkiteCI
Adding `import` to builtins causes causes a bootstrap cycle because
the `import` builtin needs to be initialised with the set of globals
before being inserted into the globals, which also must contain
itself.
To break out of the cycle this hack wraps the builtins passed to the
compiler in an `Rc` (probably sensible anyways, as they will end up
getting cloned a bunch), containing a RefCell which gives us mutable
access to the builtins.
This opens up a potentially dangerous footgun in which we could mutate
the builtins at runtime leading to different compiler invocations
seeing different builtins, so it'd be nice to have some kind of
"finalised" status for them or some such, but I'm not sure how to
represent that atm.
Change-Id: I25f8d4d2a7e8472d401c8ba2f4bbf9d86ab2abcb
Reviewed-on: https://cl.tvl.fyi/c/depot/+/6867
Tested-by: BuildkiteCI
Reviewed-by: grfn <grfn@gws.fyi>
This type hides away the lower-level handling of most codemap data
structures, especially to library consumers (see corresponding changes
in tvixbolt).
This will help with implement `import` by giving us central control
over how the codemap works.
Change-Id: Ifcea36776879725871b30c518aeb96ab5fda035a
Reviewed-on: https://cl.tvl.fyi/c/depot/+/6855
Tested-by: BuildkiteCI
Reviewed-by: wpcarro <wpcarro@gmail.com>
There's basically nothing that needs *ownership* of an AST
node (which is just a little box full of references to other things
anyways), so we can thread this through as references all the way.
Change-Id: I35a1348a50c0e8e07d51dfc18847829379166fbf
Reviewed-on: https://cl.tvl.fyi/c/depot/+/6853
Tested-by: BuildkiteCI
Reviewed-by: grfn <grfn@gws.fyi>
Upstream nixpkgs removed a lot of aliases this time, so we needed to do
the following transformations. It's a real shame that aliases only
really become discoverable easily when they are removed.
* runCommandNoCC -> runCommand
* gmailieer -> lieer
We also need to work around the fact that home-manager hasn't catched
on to this rename.
* mysql -> mariadb
* pkgconfig -> pkg-config
This also affects our Nix fork which needs to be bumped.
* prometheus_client -> prometheus-client
* rxvt_unicode -> rxvt-unicode-unwrapped
* nix-review -> nixpkgs-review
* oauth2_proxy -> oauth2-proxy
Additionally, some Go-related builders decided to drop support for
passing the sha256 hash in directly, so we need to use the generic hash
arguments.
Change-Id: I84aaa225ef18962937f8616a9ff064822f0d5dc3
Reviewed-on: https://cl.tvl.fyi/c/depot/+/6792
Autosubmit: sterni <sternenseemann@systemli.org>
Tested-by: BuildkiteCI
Reviewed-by: grfn <grfn@gws.fyi>
Reviewed-by: flokli <flokli@flokli.de>
Reviewed-by: tazjin <tazjin@tvl.su>
Reviewed-by: wpcarro <wpcarro@gmail.com>
This updates rnix-parser to a version where inherits provide an
iterator over `ast::Attr` instead of `ast::Ident`, which mirrors the
behaviour of Nix (inherits can have (statically known) strings as
their identifiers).
This actually required some fairly significant code reshuffling in the
compiler, as there was an implicit assumption in many places that we
would have an `ast::Ident` node available when dealing with variable
access (which is then explicitly only not true in this case).
Change-Id: I12f1e786c0030c85107b1aa409bd49adb5465546
Reviewed-on: https://cl.tvl.fyi/c/depot/+/6747
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
This disconnects ownership of the `File` reference in a compiler from
the calling scope, which is required for when we implement `import`.
`import` will need to carry an `Rc<RefCell<CodeMap>>` (or maybe, in
the future, Arc) to give us the ability to add new detected code
files at runtime.
Note that the choice of `Arc` over `Rc` here is not ours - it's the
codemap crate's.
Change-Id: I3aeca4ffc167acbd1701846a332d93550b56ba7d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/6630
Tested-by: BuildkiteCI
Reviewed-by: grfn <grfn@gws.fyi>
This makes it possible to enter something into tvixbolt and then share
the link with someone else.
Suggested by Profpatsch originally.
Change-Id: I9886e76a7b821070f13ea7005df09188821e091d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/6636
Tested-by: BuildkiteCI
Reviewed-by: Profpatsch <mail@profpatsch.de>
This will be used to set/get query parameters for making shareable links.
Change-Id: I05ccf8cab2521564710523ccd3b25ec26f435dd5
Reviewed-on: https://cl.tvl.fyi/c/depot/+/6633
Tested-by: BuildkiteCI
Reviewed-by: Profpatsch <mail@profpatsch.de>
This bumps rnix-parser to a commit that should be unaffected by the
Nix >= 2.4 bug that prevents it from cloning repositories with filters.
Change-Id: Ie01da95245ec6740fa889eb710819e512202f665
Reviewed-on: https://cl.tvl.fyi/c/depot/+/6634
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
As previously mentioned on IRC, this is why tvixbolt is under //corp.
The majority of people in our community probably block ads anyways,
but might as well ...
The ad account is linked to the TVL legal entity.
The ad is configured not to use any personalised data. In testing it's
showing me lamps and shoes. This is the same kind of ad as on my
grammar page, predlozhnik.ru
Change-Id: I172881ed5d5ceb1fdeb2298b8f822d0c2a6518a8
Reviewed-on: https://cl.tvl.fyi/c/depot/+/6558
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
Unfortunately the codemap-diagnostic crate doesn't provide a way to
get colour control characters written to an arbitrary writer, so this
is black & white only, but we can look at this later if we introduce
something even fancier. For now it's reasonable.
Change-Id: I1c7655cc4b254f77768b5931bc95fa13b3bd7e12
Reviewed-on: https://cl.tvl.fyi/c/depot/+/6533
Reviewed-by: grfn <grfn@gws.fyi>
Tested-by: BuildkiteCI
This is a crate for source-span based error reporting. Since all of
our spans are already codemap spans, it is a good starting point.
We have to figure out quite a bit of logic for neat error printing;
later on if we want fancier presentation we might want to look at one
of the other libraries in this space like miette.
Change-Id: I4e28886af1ed199b7112d9dbf063c9f29b612bf1
Reviewed-on: https://cl.tvl.fyi/c/depot/+/6531
Autosubmit: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
Reviewed-by: sterni <sternenseemann@systemli.org>
Reviewed-by: grfn <grfn@gws.fyi>