feat(corp/data-import): map OR word types to sets of OC grammemes

Change-Id: I674f3a66fcd65314431a2ebd747e3830aa2dd7a1
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7924
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
Autosubmit: tazjin <tazjin@tvl.su>
Vincent Ambo 2023-01-25 01:36:35 +03:00 committed by clbot
parent 80723b708d
commit 192dac5a74


@@ -1,5 +1,18 @@
//! Manual mapping of some data structures in OC/OR corpora.
/// Maps the *names* of OpenRussian word types (the `word_type` field
/// in the `or_words` table) to the *set* of OpenCorpora grammemes
/// commonly attached to lemmata of this type in OC.
///
/// Some word types just don't map over, and are omitted. Many words
/// also have an empty word type.
pub const WORD_TYPES_GRAMMEME_MAP: &'static [(&'static str, &'static [&'static str])] = &[
("adjective", &["ADJF"]),
("adverb", &["ADVB"]),
("noun", &["NOUN"]),
("verb", &["INFN"]), // or "VERB" ...
];
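The commit adds only the table itself. The sketch below shows one way an importer could consume it, under stated assumptions. It assumes OR word types are looked up by their exact name. The helpers `grammemes_for_word_type` and `lemma_matches_word_type` are hypothetical and do not appear in this commit. The "lemma must carry every mapped grammeme" rule is also an assumption. Unmapped and empty word types yield `None`, which reflects the doc comment's note that some types are omitted.

```rust
/// Copy of the table from this commit, repeated so the sketch compiles on its own.
pub const WORD_TYPES_GRAMMEME_MAP: &'static [(&'static str, &'static [&'static str])] = &[
    ("adjective", &["ADJF"]),
    ("adverb", &["ADVB"]),
    ("noun", &["NOUN"]),
    ("verb", &["INFN"]), // or "VERB" ...
];

/// Hypothetical helper: returns the OC grammemes mapped from an OR word type,
/// or `None` if the type is omitted from the table (including the empty type).
pub fn grammemes_for_word_type(word_type: &str) -> Option<&'static [&'static str]> {
    WORD_TYPES_GRAMMEME_MAP
        .iter()
        .find(|(name, _)| *name == word_type)
        .map(|(_, grammemes)| *grammemes)
}

/// Hypothetical check (assumed semantics): an OC lemma matches an OR word type
/// if it carries *every* grammeme in the mapped set. Unmapped types never match.
pub fn lemma_matches_word_type(word_type: &str, lemma_grammemes: &[&str]) -> bool {
    match grammemes_for_word_type(word_type) {
        Some(expected) => expected.iter().all(|g| lemma_grammemes.contains(g)),
        None => false,
    }
}
```

A linear scan is adequate for a four-entry slice. If the table grows, building a `HashMap` once would be the obvious alternative. Because "verb" maps only to `INFN`, this sketch would not match a lemma tagged only `VERB`, which is the ambiguity the inline comment points at.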
/// Maps the *names* of OpenRussian grammemes (the `form_type` fields
/// in the `or_word_forms` table) to the *set* of OpenCorpora
/// grammemes attached to the corresponding lemma in the `oc_lemmas`