feat(corp/data-import): map OR word types to sets of OC grammemes

Change-Id: I674f3a66fcd65314431a2ebd747e3830aa2dd7a1
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7924
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
Autosubmit: tazjin <tazjin@tvl.su>
parent 80723b708d
commit 192dac5a74
1 changed file with 13 additions and 0 deletions
@@ -1,5 +1,18 @@
//! Manual mapping of some data structures in OC/OR corpora.

/// Maps the *names* of OpenRussian word types (the `word_type` field
/// in the `or_words` table) to the *set* of OpenCorpora grammemes
/// commonly attached to lemmata of this type in OC.
///
/// Some word types just don't map over, and are omitted. Many words
/// also have an empty word type.
pub const WORD_TYPES_GRAMMEME_MAP: &'static [(&'static str, &'static [&'static str])] = &[
    ("adjective", &["ADJF"]),
    ("adverb", &["ADVB"]),
    ("noun", &["NOUN"]),
    ("verb", &["INFN"]), // or "VERB" ...
];

/// Maps the *names* of OpenRussian grammemes (the `form_type` fields
/// in the `or_word_forms` table) to the *set* of OpenCorpora
/// grammemes attached to them corresponding lemma in the `oc_lemmas`
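
The doc comment on `WORD_TYPES_GRAMMEME_MAP` describes a lookup from an OpenRussian word type name to a set of OpenCorpora grammemes, with unmapped and empty word types simply absent. A minimal sketch of how such a table might be queried follows. The helper `grammemes_for_word_type` is hypothetical and not part of this commit; the constant is copied from the hunk above so the snippet stands alone.

```rust
/// Copied verbatim from the diff so that this sketch is self-contained.
pub const WORD_TYPES_GRAMMEME_MAP: &'static [(&'static str, &'static [&'static str])] = &[
    ("adjective", &["ADJF"]),
    ("adverb", &["ADVB"]),
    ("noun", &["NOUN"]),
    ("verb", &["INFN"]), // or "VERB" ...
];

/// Hypothetical helper (an assumption, not code from the commit):
/// returns the OC grammemes mapped to an OR word type name, or `None`
/// if the word type is omitted from the table or is empty.
pub fn grammemes_for_word_type(word_type: &str) -> Option<&'static [&'static str]> {
    WORD_TYPES_GRAMMEME_MAP
        .iter()
        // A linear scan is fine here: the table has only a few entries.
        .find(|(name, _)| *name == word_type)
        .map(|(_, grammemes)| *grammemes)
}
```

Under these assumptions, `grammemes_for_word_type("verb")` yields the slice containing `"INFN"`, while an empty or unlisted word type yields `None`, which matches the comment's note that some word types are omitted.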