feat(corp/data-import): map OR word types to sets of OC grammemes
Change-Id: I674f3a66fcd65314431a2ebd747e3830aa2dd7a1
Reviewed-on: https://cl.tvl.fyi/c/depot/+/7924
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
Autosubmit: tazjin <tazjin@tvl.su>
parent 80723b708d
commit 192dac5a74
1 changed file with 13 additions and 0 deletions
@@ -1,5 +1,18 @@
 //! Manual mapping of some data structures in OC/OR corpora.
 
+/// Maps the *names* of OpenRussian word types (the `word_type` field
+/// in the `or_words` table) to the *set* of OpenCorpora grammemes
+/// commonly attached to lemmata of this type in OC.
+///
+/// Some word types just don't map over, and are omitted. Many words
+/// also have an empty word type.
+pub const WORD_TYPES_GRAMMEME_MAP: &'static [(&'static str, &'static [&'static str])] = &[
+    ("adjective", &["ADJF"]),
+    ("adverb", &["ADVB"]),
+    ("noun", &["NOUN"]),
+    ("verb", &["INFN"]), // or "VERB" ...
+];
+
 /// Maps the *names* of OpenRussian grammemes (the `form_type` fields
 /// in the `or_word_forms` table) to the *set* of OpenCorpora
 /// grammemes attached to them corresponding lemma in the `oc_lemmas`
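For illustration only, here is a minimal sketch of how a caller might consult the new mapping. The helper `grammemes_for_word_type` is hypothetical and not part of this commit. The constant is repeated, with the redundant `'static` annotations elided, so that the snippet compiles on its own.

```rust
/// Copy of the mapping added in this commit, repeated here so the
/// sketch is self-contained. Lifetimes in a `const` default to `'static`.
pub const WORD_TYPES_GRAMMEME_MAP: &[(&str, &[&str])] = &[
    ("adjective", &["ADJF"]),
    ("adverb", &["ADVB"]),
    ("noun", &["NOUN"]),
    ("verb", &["INFN"]), // or "VERB" ...
];

/// Hypothetical helper (an assumption, not in the commit): linear scan
/// over the slice. Returns `None` for word types that are omitted from
/// the mapping, including the empty word type.
pub fn grammemes_for_word_type(word_type: &str) -> Option<&'static [&'static str]> {
    WORD_TYPES_GRAMMEME_MAP
        .iter()
        .find(|(name, _)| *name == word_type)
        .map(|(_, grammemes)| *grammemes)
}
```

With only four entries a linear scan is likely sufficient; a caller doing many lookups could build a `HashMap` from the slice once instead.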