BELA API reference

For most people, bela.read_eaf() is the first thing to look at. This function returns a bela.Bela2 object for manipulating a BELA transcript directly:

>>> import bela
>>> b2 = bela.read_eaf("my_bela_filename.eaf")

Now you can use the created b2 object to process BELA data.

>>> for person in b2.persons:
...     print(person.name, person.code)
...     for u in person.utterances:
...         print(u, u.from_ts, u.to_ts, u.duration)
...         if u.translation:
...             print(u.translation)
...         for c in u.chunks:
...             print(f"  - {c} [{c.language}]")

The bela module

bela.read_eaf(eaf_path, **kwargs)

Read an EAF file as a Bela2 object

Parameters

eaf_path (str-like object or a Path object) – Path to the EAF file

Returns

A Bela2 object

Return type

bela.Bela2

bela.from_elan(elan, eaf_path=':memory:', **kwargs)

Create a BELA-con version 2.x object from a speach.elan.ELANDoc object

The lex module

This module provides lexicon analysis functions (e.g. counting tokens, calculating type-token ratio, etc.). New users should start with bela.lex.CorpusLexicalAnalyser.

>>> from bela.lex import CorpusLexicalAnalyser
>>> analyser = CorpusLexicalAnalyser()
>>> source = "my_bela_filename.eaf"  # a label identifying this transcript
>>> for person in b2.persons:
...     for u in person.utterances:
...         analyser.add(u.text, u.language, source=source, speaker=person.code)
>>> analyser.analyse()
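
For intuition, the type-token ratio at the heart of this kind of lexical analysis can be sketched in a few lines. The grouping and sample data below are illustrative only, not the analyser's actual implementation:

```python
from collections import defaultdict

def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# Group hypothetical utterance tokens by speaker code, then score each speaker.
corpus = defaultdict(list)
corpus["CHI"] += ["ball", "ball", "red", "ball"]
corpus["MOT"] += ["look", "at", "the", "red", "ball"]

ratios = {speaker: type_token_ratio(toks) for speaker, toks in corpus.items()}
print(ratios)  # CHI: 2 types / 4 tokens = 0.5; MOT: 5 types / 5 tokens = 1.0
```

A higher ratio indicates more varied vocabulary relative to the amount of speech.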
class bela.lex.CorpusLexicalAnalyser(filepath=':memory:', lang_lex_map=None, word_only=False, lemmatizer=True, **kwargs)[source]

Analyse a corpus text

analyse(external_tokenizer=True)[source]

Analyse all available profiles (i.e. speakers)

read(**kwargs)[source]

Read the CSV file content specified by self.filepath

to_dict()[source]

Export analysed result as a JSON-ready object

BELA-con version 2.0 API

The official BELA convention. Use this version for all new transcripts.

class bela.Bela2(elan, path=':memory:', allow_empty=False, nlp_tokenizer=False, word_only=True, ellipsis=True, validate_baby_languages=False, ansi_languages=('English', 'Vocal Sounds', 'Malay', 'Red Dot', ':v:airstream', ':v:crying', ':v:vocalizations'), auto_tokenize=True, split_punc=True, remove_punc=True, **kwargs)[source]

BELA-convention version 2

find_turns(threshold=1500)[source]

Find potential turn-takings

Parameters

threshold (float) – Maximum delay between utterances, in milliseconds

Returns

A list of utterance pairs, each a 2-tuple of (from utterance, to utterance) objects
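
The pairing logic can be pictured with plain tuples. This is an illustrative re-implementation, not the class's actual code; it assumes utterances carry a speaker code plus from/to timestamps in milliseconds:

```python
def find_turns_sketch(utterances, threshold=1500):
    """Pair consecutive utterances by different speakers whose gap
    (next start minus previous end) is within `threshold` ms."""
    ordered = sorted(utterances, key=lambda u: u[1])  # sort by from_ts
    pairs = []
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt[1] - prev[2]
        if prev[0] != nxt[0] and 0 <= gap <= threshold:
            pairs.append((prev, nxt))
    return pairs

talk = [
    ("MOT", 0, 1200),     # mother speaks first
    ("CHI", 1900, 2500),  # child replies 700 ms later -> a turn
    ("CHI", 6000, 6400),  # same speaker, not a turn
    ("MOT", 9000, 9300),  # 2600 ms gap, over the threshold
]
print(find_turns_sketch(talk))  # one (MOT, CHI) pair
```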

static from_elan(elan, eaf_path=':memory:', **kwargs)[source]

Create a BELA-con version 2.x object from a speach.elan.ELANDoc object

parse_name(tier)[source]

(Internal) Parse participant name and tier type from a tier object and then update the tier object

This function is internal and should not be used outside of this class.

Parameters

tier (speach.elan.ELANTier) – The tier object to parse

static read_eaf(eaf_path, **kwargs)[source]

Read an EAF file as a Bela2 object

Parameters

eaf_path (str-like object or a Path object) – Path to the EAF file

Returns

A Bela2 object

Return type

bela.Bela2

to_language_mix(to_ts=None, auto_compute=True)[source]

Collapse utterances to generate a language mix timeline
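
One plausible reading of such a collapse is summed speaking time per language, optionally cut off at to_ts. The sketch below is an assumption about the output shape, not the method's actual return value:

```python
from collections import Counter

def language_mix_sketch(utterances, to_ts=None):
    """Sum per-language speaking time (ms) for utterances starting before to_ts."""
    mix = Counter()
    for lang, from_ts, end_ts in utterances:
        if to_ts is None or from_ts < to_ts:
            mix[lang] += end_ts - from_ts
    return dict(mix)

utts = [("English", 0, 1000), ("Malay", 1500, 3500), ("English", 4000, 4500)]
print(language_mix_sketch(utts))              # {'English': 1500, 'Malay': 2000}
print(language_mix_sketch(utts, to_ts=2000))  # first two utterances only
```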

tokenize()[source]

Tokenize all utterances
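
The split_punc and remove_punc options on the constructor suggest tokenisation along the following lines. This sketch is illustrative only and does not reproduce the library's actual tokenizer:

```python
import re

def tokenize_sketch(text, split_punc=True, remove_punc=True):
    """Whitespace-tokenise, optionally splitting off and dropping punctuation."""
    if split_punc:
        # Words (allowing internal apostrophes) or single punctuation marks.
        tokens = re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)
    else:
        tokens = text.split()
    if remove_punc:
        tokens = [t for t in tokens if re.search(r"\w", t)]
    return tokens

print(tokenize_sketch("Look, a red ball!"))  # ['Look', 'a', 'red', 'ball']
```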

property annotation

Get an annotation object by ID

property participant_codes

Immutable list of participant codes

property person_map

Map from participant code (i.e. person code) to Person object

property persons

All Person objects in this BELA object

property roots

Direct access to all underlying ELAN root tiers

BELA-con version 1.0 API

Bela1 has been deprecated since March 2020. It remains available for backward compatibility only. Please do not use it for anything other than BLIP’s PILOT10 corpus.

class bela.Bela1[source]

This class represents BELA convention version 1

static read(filepath, autotag=True)[source]

Read an ELAN CSV file

to_language_mix(to_ts=None, auto_compute=True)[source]

Collapse utterances to generate a language mix timeline