7 commits

Author SHA1 Message Date
Vincent Ambo
0ed6583edc feat(corp/data-import): let users specify output path
Change-Id: I61ad021c7a5318b099f3adc8bc6aedef65500974
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7865
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
2023-01-18 15:44:06 +00:00
Vincent Ambo
476e312c06 feat(corp/data-import): parse and import links
Change-Id: Iebdbc8f884f28064d7b00b8f8808b5030fa3d05c
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7864
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-01-18 15:44:06 +00:00
Vincent Ambo
dc55ea3201 feat(corp/data-import): parse and import link types
Change-Id: Iae01d1dc6894117dc693b4690d8bc79861212ae6
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7863
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
2023-01-18 15:44:06 +00:00
Vincent Ambo
3f0b1d8e0b fix(corp/data-import): commit the final transaction, too
Otherwise up to 1000 elements might be missing.

Change-Id: I20d6238424eec27f0e758e7737c9c31bcb81b23d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7862
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
2023-01-18 15:44:06 +00:00
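
Note on 3f0b1d8e0b: the message implies that the importer commits in batches and that rows left over after the last full batch were never committed. A minimal sketch of that pattern follows, in Python with the standard `sqlite3` module. The tool's actual language, schema, and batch size are not shown in this log, so the `lemmas` table, the `import_rows` function, and `BATCH_SIZE = 1000` (inferred from "up to 1000 elements") are all assumptions made for illustration.

```python
import sqlite3

# Assumed batch size, inferred from "up to 1000 elements" in the commit message.
BATCH_SIZE = 1000


def import_rows(conn, rows, batch_size=BATCH_SIZE):
    """Insert (id, lemma) rows, committing after each full batch and once at the end."""
    # Hypothetical table; the real schema is not part of this log.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lemmas (id INTEGER PRIMARY KEY, lemma TEXT)"
    )
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO lemmas (id, lemma) VALUES (?, ?)", row)
        pending += 1
        if pending == batch_size:
            conn.commit()
            pending = 0
    # The fix: commit whatever remains after the last full batch.
    # Without this, up to batch_size - 1 trailing rows are rolled back
    # when the connection is closed.
    if pending:
        conn.commit()
```

With 2500 input rows, for example, the loop commits twice (after rows 1000 and 2000), and the trailing 500 rows are only persisted by the final commit.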
Vincent Ambo
6986aa5824 feat(corp/data-import): insert OpenCorpora data into SQLite
This is an initial and kind of dumb table structure, but there's some
massaging that needs to be done before this makes more sense.

Change-Id: I441288b684ef86be507099bcc4ebf984598789c8
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7861
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-01-18 15:44:06 +00:00
Vincent Ambo
485c3cc912 feat(corp/data-import): parse lemmas from OpenCorpora dump
Change-Id: I1e4efcfc8e555f61578b563411d5e6ed9590d8e8
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7860
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-01-18 01:10:37 +00:00
Vincent Ambo
ee7616d956 feat(corp/russian/data-import): new OpenCorpora data import tool
Adds the beginning of a tool which can import OpenCorpora data into a
SQLite database. This is quite a lot of toil and there's probably a
better way to do this, but overall becoming this intimately familiar
with the data structures is quite helpful for understanding what I
can/can't do with only this dataset.

Change-Id: Ieab33a8ce07ea4ac87917b9c8132226bbc6523b1
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7859
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-01-18 01:10:37 +00:00