Commit graph

3 commits

Author SHA1 Message Date
Vincent Ambo
6986aa5824 feat(corp/data-import): insert OpenCorpora data into SQLite
This is an initial and kind of dumb table structure, but there's some
massaging that needs to be done before this makes more sense.

Change-Id: I441288b684ef86be507099bcc4ebf984598789c8
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7861
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-01-18 15:44:06 +00:00
Vincent Ambo
485c3cc912 feat(corp/data-import): parse lemmas from OpenCorpora dump
Change-Id: I1e4efcfc8e555f61578b563411d5e6ed9590d8e8
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7860
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-01-18 01:10:37 +00:00
Vincent Ambo
ee7616d956 feat(corp/russian/data-import): new OpenCorpora data import tool
Adds the beginning of a tool which can import OpenCorpora data into a
SQLite database. This is quite a lot of toil and there's probably a
better way to do this, but overall becoming this intimately familiar
with the data structures is quite helpful for understanding what I
can/can't do with only this dataset.

Change-Id: Ieab33a8ce07ea4ac87917b9c8132226bbc6523b1
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7859
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-01-18 01:10:37 +00:00