👸 BELA¶
BELA (BLIP ELAN Language Annotation) is a pathway for creating and analysing multilingual transcripts using the BELA convention and the ELAN software.
Getting started¶
BELA is available on PyPI and can be installed using pip:
pip install bela
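To confirm that the package is importable before running the examples, a quick check along these lines may help. It is a generic sketch using only the standard library; the helper name `is_installed` is invented for illustration and is not part of BELA.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the named top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Reports whether `pip install bela` made the package visible to this interpreter.
    print("bela installed:", is_installed("bela"))
```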
Sample code¶
The following code snippet reads a BELA transcript and prints out all participants and their utterances & chunks.
import bela
b2 = bela.read_eaf("my_bela_filename.eaf")
for person in b2.persons:
    print(person.name, person.code)
    for u in person.utterances:
        print(u, u.from_ts, u.to_ts, u.duration)
        if u.translation:
            print(u.translation)
        for c in u.chunks:
            print(f" - {c} [{c.language}]")
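The same traversal can be used to aggregate values instead of printing them. The sketch below sums `u.duration` per participant code. It relies only on the attribute names used in the snippet above (`persons`, `code`, `utterances`, `duration`); the function name and the stand-in data are hypothetical, and the durations are in whatever unit the library reports.

```python
from collections import defaultdict
from types import SimpleNamespace


def total_duration_by_person(doc):
    """Sum utterance durations per participant code.

    `doc` is any object exposing .persons -> .utterances -> .duration.
    """
    totals = defaultdict(float)
    for person in doc.persons:
        for u in person.utterances:
            totals[person.code] += u.duration
    return dict(totals)


# Stand-in data (hypothetical), shaped like the attributes used above.
demo = SimpleNamespace(persons=[
    SimpleNamespace(code="P1", utterances=[
        SimpleNamespace(duration=1.5),
        SimpleNamespace(duration=2.0),
    ]),
    SimpleNamespace(code="P2", utterances=[
        SimpleNamespace(duration=0.5),
    ]),
])
print(total_duration_by_person(demo))
```

With a real transcript, the object returned by `bela.read_eaf()` would take the place of `demo`.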
BELA Tutorials¶
To be updated.
For the BELA API reference, please visit the BELA API reference page.
BELA API reference¶
For most people, bela.read_eaf() is the first thing to look at.
This function returns a bela.Bela2 object for manipulating a BELA transcript directly:
>>> import bela
>>> b2 = bela.read_eaf("my_bela_filename.eaf")
Now you can use the created b2 object to process BELA data.
>>> for person in b2.persons:
...     print(person.name, person.code)
...     for u in person.utterances:
...         print(u, u.from_ts, u.to_ts, u.duration)
...         if u.translation:
...             print(u.translation)
...         for c in u.chunks:
...             print(f" - {c} [{c.language}]")
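As a further illustration of processing the data, the loop above can be turned into a per-participant tally of chunk languages. This is a sketch under the assumption that each chunk exposes a `language` attribute as shown; the function name and the stand-in objects are hypothetical and are not part of the BELA API.

```python
from collections import Counter
from types import SimpleNamespace


def chunk_languages_by_person(doc):
    """Count chunks per language for each participant code."""
    counts = {}
    for person in doc.persons:
        counter = Counter()
        for u in person.utterances:
            for c in u.chunks:
                counter[c.language] += 1
        counts[person.code] = dict(counter)
    return counts


def _chunk(language):
    # Hypothetical stand-in for a chunk object.
    return SimpleNamespace(language=language)


demo = SimpleNamespace(persons=[
    SimpleNamespace(code="P1", utterances=[
        SimpleNamespace(chunks=[_chunk("English"), _chunk("Malay")]),
        SimpleNamespace(chunks=[_chunk("English")]),
    ]),
    SimpleNamespace(code="P2", utterances=[
        SimpleNamespace(chunks=[_chunk("Malay")]),
    ]),
])
language_counts = chunk_languages_by_person(demo)
print(language_counts)
```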
The bela module¶
- bela.read_eaf(eaf_path, **kwargs)¶
Read an EAF file as a Bela2 object
- Parameters
eaf_path (str-like object or a Path object) – Path to the EAF file
- Returns
A Bela2 object
- Return type
bela.Bela2
- bela.from_elan(elan, eaf_path=':memory:', **kwargs)¶
Create a BELA-con version 2.x object from a speach.elan.ELANDoc object
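Because bela.read_eaf() accepts either a str-like object or a Path, a folder of transcripts can be gathered with pathlib before reading. The helper below is a sketch only: `find_eaf_files` is a hypothetical name, and the demonstration uses a temporary folder with empty placeholder files rather than real transcripts.

```python
import tempfile
from pathlib import Path


def find_eaf_files(folder):
    """Return all .eaf files directly inside `folder`, sorted by name."""
    return sorted(Path(folder).glob("*.eaf"))


# Demonstration with empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.eaf", "a.eaf", "notes.txt"):
        (Path(tmp) / name).touch()
    found = [p.name for p in find_eaf_files(tmp)]
print(found)
```

Each returned path could then be passed to bela.read_eaf() in a loop.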
The lex module¶
This module provides lexicon analysis functions
(e.g. counting tokens, calculating class-token ratio, et cetera).
New users should start with bela.lex.CorpusLexicalAnalyser.
>>> from bela.lex import CorpusLexicalAnalyser
>>> analyser = CorpusLexicalAnalyser()
>>> source = "my_bela_filename.eaf"  # assumed: a label for where the utterances come from
>>> for person in b2.persons:
...     for u in person.utterances:
...         analyser.add(u.text, u.language, source=source, speaker=person.code)
>>> analyser.analyse()
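To give a feel for the kind of statistics involved, here is a standalone sketch that counts tokens and distinct tokens per language and reports their ratio. It does not use or reproduce CorpusLexicalAnalyser. The function name, the naive whitespace tokenisation, and the reading of the ratio as distinct tokens over total tokens are all assumptions of this example.

```python
from collections import Counter, defaultdict


def token_stats(utterances):
    """Compute token count, distinct-token count and their ratio per language.

    `utterances` is an iterable of (text, language) pairs. Tokenisation is a
    naive lowercase whitespace split, which is an assumption of this sketch.
    """
    counters = defaultdict(Counter)
    for text, language in utterances:
        counters[language].update(text.lower().split())
    stats = {}
    for language, counter in counters.items():
        tokens = sum(counter.values())
        distinct = len(counter)
        stats[language] = {
            "tokens": tokens,
            "distinct": distinct,
            "ratio": distinct / tokens if tokens else 0.0,
        }
    return stats


# Made-up sample utterances for illustration.
sample = [
    ("the cat saw the dog", "English"),
    ("saya suka kucing", "Malay"),
]
stats = token_stats(sample)
print(stats)
```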
BELA-con version 2.0 API¶
The official Bela convention. By default, this should be used for new transcripts.
- class bela.Bela2(elan, path=':memory:', allow_empty=False, nlp_tokenizer=False, word_only=True, ellipsis=True, validate_baby_languages=False, ansi_languages=('English', 'Vocal Sounds', 'Malay', 'Red Dot', ':v:airstream', ':v:crying', ':v:vocalizations'), auto_tokenize=True, split_punc=True, remove_punc=True, **kwargs)[source]¶
BELA-convention version 2
- find_turns(threshold=1500)[source]¶
Find potential turn-takings
- Parameters
threshold (float) – Delay between utterances in milliseconds
- Returns
List of utterance pairs, each a 2-tuple (from-utterance object, to-utterance object)
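The idea behind a delay threshold can be sketched independently of the library. The example below is not the Bela2 implementation: it assumes a turn is a pair of consecutive utterances by different speakers whose gap is at most `threshold` milliseconds, and it uses a hypothetical `Utt` stand-in with integer millisecond timestamps.

```python
from typing import List, NamedTuple, Tuple


class Utt(NamedTuple):
    """Hypothetical stand-in for an utterance; times are in milliseconds."""
    speaker: str
    from_ts: int
    to_ts: int


def find_turns_sketch(utterances: List[Utt], threshold: float = 1500) -> List[Tuple[Utt, Utt]]:
    """Pair consecutive utterances by different speakers whose gap is within threshold."""
    ordered = sorted(utterances, key=lambda u: u.from_ts)
    turns = []
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt.from_ts - prev.to_ts  # negative when the utterances overlap
        if prev.speaker != nxt.speaker and gap <= threshold:
            turns.append((prev, nxt))
    return turns


utts = [
    Utt("P1", 0, 1000),
    Utt("P2", 1400, 2500),  # starts 400 ms after P1 stops: counted as a turn
    Utt("P2", 2600, 3000),  # same speaker as before: not a turn
    Utt("P1", 6000, 7000),  # 3000 ms gap: too long for the default threshold
]
turns = find_turns_sketch(utts)
print(turns)
```

In this sketch, overlapping utterances have a negative gap and therefore always fall within the threshold.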
- static from_elan(elan, eaf_path=':memory:', **kwargs)[source]¶
Create a BELA-con version 2.x object from a speach.elan.ELANDoc object
- parse_name(tier)[source]¶
(Internal) Parse participant name and tier type from a tier object and then update the tier object
This function is internal and should not be used outside of this class.
- Parameters
tier (speach.elan.ELANTier) – The tier object to parse
- static read_eaf(eaf_path, **kwargs)[source]¶
Read an EAF file as a Bela2 object
- Parameters
eaf_path (str-like object or a Path object) – Path to the EAF file
- Returns
A Bela2 object
- Return type
bela.Bela2
- to_language_mix(to_ts=None, auto_compute=True)[source]¶
Collapse utterances to generate a language mix timeline
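One plausible reading of "collapsing" is merging touching or overlapping spans in the same language into a single span. The sketch below illustrates only that reading and makes no claim about how to_language_mix() works or what its `to_ts` and `auto_compute` parameters do. The function name and the tuple layout are hypothetical.

```python
def language_mix_sketch(segments):
    """Merge (from_ts, to_ts, language) segments into a timeline.

    Segments are processed in time order; a segment that touches or overlaps
    the previous span in the same language is absorbed into that span.
    """
    timeline = []
    for start, end, language in sorted(segments):
        if timeline and timeline[-1][2] == language and start <= timeline[-1][1]:
            prev_start, prev_end, _ = timeline[-1]
            timeline[-1] = (prev_start, max(prev_end, end), language)
        else:
            timeline.append((start, end, language))
    return timeline


# Made-up segments (milliseconds) for illustration.
segments = [
    (0, 1000, "English"),
    (1000, 1800, "English"),
    (1800, 2500, "Malay"),
    (3000, 3500, "Malay"),
    (2400, 2600, "Malay"),
]
timeline = language_mix_sketch(segments)
print(timeline)
```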
- property annotation¶
Get an annotation object by ID
- property participant_codes¶
Immutable list of participant codes
- property person_map¶
Map participant (i.e. person code) to person object
- property persons¶
All Person objects in this BELA object
- property roots¶
Direct access to all underlying ELAN root tiers
BELA-con version 1.0 API¶
Bela1 has been deprecated since March 2020. It remains available for backward compatibility only. Please do not use it for anything other than BLIP’s PILOT10 corpus.
BELA Changelog¶
BELA 2.0.0a22 [WIP]¶
Added tokenize() function to utterances and chunks
Added the first working prototype of BELA builder (2022-03-29)
Added Bela2.save() function
Kickstarted BELA documentation
Added BELA documentation: https://bela.readthedocs.io/
Added ANSI & baby language checking rules
Use speach >= 0.1a15.post1
Exposed read_eaf() and from_elan() to module level
Exposed media_file, media_url, relative_media_url properties
Fixed None utterances & chunks for not well-formed transcripts
BELA 2.0.0a21¶
Use speach > 0.1a14 to support Python 3.10 and 3.11
Updated annotation mapping mechanism
Warn users if OMW-1.4 dataset is missing
Clean up ~ characters after plus-to-space expansion
BELA 2.0.0a19¶
2022-01-26
Released bela-2.0.0a19 to PyPI: https://pypi.org/project/bela/2.0.0a19/