BELA API reference
For most people, bela.read_eaf() is the first thing to look at. This function returns a bela.Bela2 object for manipulating a BELA transcript directly:
>>> import bela
>>> b2 = bela.read_eaf("my_bela_filename.eaf")
Now you can use the created b2 object to process BELA data:
>>> for person in b2.persons:
...     print(person.name, person.code)
...     for u in person.utterances:
...         print(u, u.from_ts, u.to_ts, u.duration)
...         if u.translation:
...             print(u.translation)
...         for c in u.chunks:
...             print(f" - {c} [{c.language}]")
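Once utterances are available, per-speaker statistics are straightforward to derive. The sketch below does not call bela: it works on plain (speaker_code, from_ts, to_ts) tuples standing in for the values printed above, and it assumes the timestamps are numbers in milliseconds. The helper name speaking_time and the sample data are hypothetical.

```python
from collections import defaultdict


def speaking_time(utterances):
    """Sum utterance durations per speaker (hypothetical helper).

    `utterances` is an iterable of (speaker_code, from_ts, to_ts) tuples,
    assumed to hold numeric millisecond timestamps; these are stand-ins
    for bela utterance objects, not the bela API itself.
    """
    totals = defaultdict(float)
    for code, from_ts, to_ts in utterances:
        totals[code] += to_ts - from_ts
    return dict(totals)


# Hypothetical data: two speakers, three utterances
rows = [("MOT", 0, 1200), ("CHI", 1500, 2100), ("MOT", 2300, 3000)]
print(speaking_time(rows))  # {'MOT': 1900.0, 'CHI': 600.0}
```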
The bela module
- bela.read_eaf(eaf_path, **kwargs)
Read an EAF file as a Bela2 object
- Parameters
eaf_path (str-like object or a Path object) – Path to the EAF file
- Returns
A Bela2 object
- Return type
bela.Bela2
- bela.from_elan(elan, eaf_path=':memory:', **kwargs)
Create a BELA-con version 2.x object from a speach.elan.ELANDoc object
The lex module
This module provides lexical analysis functions (e.g. counting tokens and calculating type-token ratios).
New users should start with bela.lex.CorpusLexicalAnalyser.
>>> from bela.lex import CorpusLexicalAnalyser
>>> source = "my_bela_filename.eaf"  # e.g. the name of the transcript being analysed
>>> analyser = CorpusLexicalAnalyser()
>>> for person in b2.persons:
...     for u in person.utterances:
...         analyser.add(u.text, u.language, source=source, speaker=person.code)
...
>>> analyser.analyse()
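To give a feel for the kind of measure involved, here is a stdlib-only sketch of token counting and a type-token ratio (distinct word forms divided by total tokens). It is not bela's implementation: the whitespace tokenisation, the lower-casing and the type_token_ratio name are assumptions made for illustration.

```python
from collections import Counter


def type_token_ratio(texts):
    """Return (token_count, type_count, ratio) for a list of utterance texts.

    Hypothetical helper: tokenisation is a naive whitespace split with
    lower-casing, so punctuation stays attached to words.
    """
    counts = Counter(tok.lower() for text in texts for tok in text.split())
    tokens = sum(counts.values())  # total running words
    types = len(counts)            # distinct word forms
    ratio = types / tokens if tokens else 0.0
    return tokens, types, ratio


print(type_token_ratio(["I want milk", "want more milk"]))
# (6, 4, 0.6666666666666666)
```

A real analyser would also need to decide how to treat punctuation, mixed languages and non-word annotations, which this sketch ignores.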
BELA-con version 2.0 API
The official BELA convention. It should be used by default for all new transcripts.
- class bela.Bela2(elan, path=':memory:', allow_empty=False, nlp_tokenizer=False, word_only=True, ellipsis=True, validate_baby_languages=False, ansi_languages=('English', 'Vocal Sounds', 'Malay', 'Red Dot', ':v:airstream', ':v:crying', ':v:vocalizations'), auto_tokenize=True, split_punc=True, remove_punc=True, **kwargs)
BELA-convention version 2
- find_turns(threshold=1500)
Find potential turn-takings
- Parameters
threshold (float) – Maximum delay between utterances, in milliseconds
- Returns
A list of utterance pairs, each a 2-tuple (from_utterance, to_utterance)
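The idea behind turn detection can be sketched without bela. This version assumes each utterance is a (speaker, from_ts, to_ts) tuple in milliseconds, sorts utterances by start time, and pairs each one with the next when the speaker changes and the gap between them is at most threshold. Whether Bela2.find_turns handles overlaps, same-speaker pairs or non-adjacent utterances the same way is not documented here, so treat find_turns_sketch (a hypothetical name) as an approximation only.

```python
def find_turns_sketch(utterances, threshold=1500):
    """Hypothetical approximation of turn detection.

    utterances: list of (speaker, from_ts, to_ts) tuples, times in ms.
    Returns (from_utterance, to_utterance) pairs of consecutive utterances
    (in start order) where the speaker changes and the next utterance starts
    at most `threshold` ms after the previous one ends. Overlapping speech
    yields a negative gap and is therefore included.
    """
    ordered = sorted(utterances, key=lambda u: u[1])
    turns = []
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt[1] - prev[2]
        if prev[0] != nxt[0] and gap <= threshold:
            turns.append((prev, nxt))
    return turns


rows = [("MOT", 0, 1200), ("CHI", 1500, 2100), ("CHI", 2200, 2500), ("MOT", 5000, 6000)]
print(find_turns_sketch(rows))
# [(('MOT', 0, 1200), ('CHI', 1500, 2100))]
```

Here the CHI-to-CHI pair is skipped because the speaker does not change, and the final CHI-to-MOT pair is skipped because its 2500 ms gap exceeds the default threshold.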
- static from_elan(elan, eaf_path=':memory:', **kwargs)
Create a BELA-con version 2.x object from a speach.elan.ELANDoc object
- parse_name(tier)
(Internal) Parse the participant name and tier type from a tier object, then update the tier object.
This function is internal and should not be used outside of this class.
- Parameters
tier (speach.elan.ELANTier) – The tier object to parse
- static read_eaf(eaf_path, **kwargs)
Read an EAF file as a Bela2 object
- Parameters
eaf_path (str-like object or a Path object) – Path to the EAF file
- Returns
A Bela2 object
- Return type
bela.Bela2
- to_language_mix(to_ts=None, auto_compute=True)
Collapse utterances to generate a language mix timeline
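The timeline format is not described here, so the following is only a guess at the general idea of collapsing utterances by language. It takes (from_ts, to_ts, language) intervals, sorts them by start time, and merges consecutive intervals that share a language and touch or overlap. The collapse_languages name and the tuple layout are assumptions; Bela2.to_language_mix may behave differently, for example around overlapping speech in different languages.

```python
def collapse_languages(intervals):
    """Hypothetical sketch of a language-mix timeline.

    intervals: iterable of (from_ts, to_ts, language) tuples.
    Consecutive intervals (in start order) with the same language that
    touch or overlap are merged into one segment; everything else is kept.
    Returns a list of (from_ts, to_ts, language) segments.
    """
    timeline = []
    for start, end, lang in sorted(intervals):
        if timeline and timeline[-1][2] == lang and start <= timeline[-1][1]:
            prev_start, prev_end, _ = timeline[-1]
            timeline[-1] = (prev_start, max(prev_end, end), lang)
        else:
            timeline.append((start, end, lang))
    return timeline


spans = [(0, 1000, "English"), (1000, 1800, "English"),
         (2000, 2600, "Malay"), (2500, 3000, "Malay")]
print(collapse_languages(spans))
# [(0, 1800, 'English'), (2000, 3000, 'Malay')]
```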
- property annotation
Get an annotation object by ID
- property participant_codes
Immutable list of participant codes
- property person_map
Map each participant (i.e. person code) to its person object
- property persons
All Person objects in this BELA object
- property roots
Direct access to all underlying ELAN root tiers
BELA-con version 1.0 API
Bela1 has been deprecated since March 2020 and remains available for backward compatibility only. Please do not use it for anything other than BLIP’s PILOT10 corpus.