Commit graph

8 commits

Vincent Ambo
0ed6583edc feat(corp/data-import): let users specify output path
Change-Id: I61ad021c7a5318b099f3adc8bc6aedef65500974
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7865
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
2023-01-18 15:44:06 +00:00
Vincent Ambo
476e312c06 feat(corp/data-import): parse and import links
Change-Id: Iebdbc8f884f28064d7b00b8f8808b5030fa3d05c
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7864
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-01-18 15:44:06 +00:00
Vincent Ambo
dc55ea3201 feat(corp/data-import): parse and import link types
Change-Id: Iae01d1dc6894117dc693b4690d8bc79861212ae6
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7863
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
2023-01-18 15:44:06 +00:00
Vincent Ambo
3f0b1d8e0b fix(corp/data-import): commit the final transaction, too
Otherwise up to 1000 elements might be missing.

Change-Id: I20d6238424eec27f0e758e7737c9c31bcb81b23d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7862
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
2023-01-18 15:44:06 +00:00
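The fix in 3f0b1d8e0b implies that the importer batches its inserts, committing once every 1000 elements. Without a final commit, the last partial batch is lost. Below is a minimal Python sketch of that pattern. The batch size of 1000 is inferred from the commit message. The `lemmas` table, its columns and `import_rows` are hypothetical illustrations, not the tool's actual code.

```python
import sqlite3

# Assumption: batch size inferred from "up to 1000 elements might be missing".
BATCH_SIZE = 1000


def import_rows(conn: sqlite3.Connection, rows) -> None:
    """Insert (id, lemma) rows into a hypothetical `lemmas` table,
    committing after every BATCH_SIZE inserts."""
    cur = conn.cursor()
    pending = 0
    for row in rows:
        # sqlite3 implicitly opens a transaction before the first INSERT.
        cur.execute("INSERT INTO lemmas (id, lemma) VALUES (?, ?)", row)
        pending += 1
        if pending == BATCH_SIZE:
            conn.commit()
            pending = 0
    # The fix: commit the final, possibly partial batch as well. Otherwise
    # up to BATCH_SIZE - 1 rows remain uncommitted and are discarded if the
    # connection is closed without a commit.
    conn.commit()
```

For 2500 input rows, the loop commits twice inside the loop. The trailing `conn.commit()` then persists the remaining 500 rows. If the row count is an exact multiple of the batch size, the trailing commit is simply a no-op.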
Vincent Ambo
6986aa5824 feat(corp/data-import): insert OpenCorpora data into SQLite
This is an initial and kind of dumb table structure, but there's some
massaging that needs to be done before this makes more sense.

Change-Id: I441288b684ef86be507099bcc4ebf984598789c8
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7861
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-01-18 15:44:06 +00:00
Vincent Ambo
485c3cc912 feat(corp/data-import): parse lemmas from OpenCorpora dump
Change-Id: I1e4efcfc8e555f61578b563411d5e6ed9590d8e8
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7860
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-01-18 01:10:37 +00:00
Vincent Ambo
ee7616d956 feat(corp/russian/data-import): new OpenCorpora data import tool
Adds the beginning of a tool which can import OpenCorpora data into a
SQLite database. This is quite a lot of toil and there's probably a
better way to do this, but overall becoming this intimately familiar
with the data structures is quite helpful for understanding what I
can/can't do with only this dataset.

Change-Id: Ieab33a8ce07ea4ac87917b9c8132226bbc6523b1
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7859
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-01-18 01:10:37 +00:00
Vincent Ambo
aa96e25bbc chore(tazjin/predlozhnik): move to //corp
This is currently hosted by the company, and I'm assigning my
copyright to the company, which also runs an ad placement on the page.

Note that the NixOS module for hosting it has not been moved yet.

Change-Id: Iba9e1cab9370faa79e43c3344fbfbbbabead50b3
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7857
Reviewed-by: tazjin <tazjin@tvl.su>
Autosubmit: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-01-17 18:23:52 +00:00