This class parses transcribed (or untranscribed) TRS files, producing output in the form of Segment objects (which contain Utterance objects).


Public Member Functions
def	__init__
	Constructor.

def	get_errors
	Retrieves the errors and warnings found by this parser in the form of an ErrorCollector object, which provides methods to look up various errors/warnings by type.

def	re_parse
	Resets internal data structures and parses the TRS file a second time.

def	parse
	Parses the TRS file, returning a list of Segments.

def	get_utter_by_id
	Retrieves an Utterance object (residing in one of this TRS parser's segments) by its utterance id attribute.

Public Attributes
	logger

	filename

	error_collector

	segments

	speakers

	utter_index

	parsed

	link_sm

	total_utters

	tree

Static Public Attributes
string	TRANS_LINE_REGEX = '^\s*([^\|]*?)\s*('

string	TRANS_OVERLAP_REGEX = '\s*<.*>\s*'

Private Member Functions
def	_init_data_structures
	Sets up data structures used to iterate through the XML and track segments, utterances, errors, etc.

def	_parse
	This method performs the actual parsing work that produces a list of Segment objects from the XML.

def	_parse_utters
	Creates Utterance objects for a given XML turn element.

def	_parse_speech_data
	Extracts Utterance attributes (e.g. transcriber codes, transcription phrase, etc.) from the text following a <sync> element.

def	_assign_speaker
	Determines the speaker for an Utterance, and sets the Utterance speaker attribute to an appropriate Speaker object.

def	_assign_utter_attribs
	Performs the actual assignment of utterance attributes (like transcription phrase, codes, etc.), based upon a line from the TRS file.

def	_parse_speakers
	Grabs a list of all of the speakers in the TRS file, from the <Speakers> tag (which appears near the top).

Detailed Description

This class parses transcribed (or untranscribed) TRS files, producing output in the form of Segment objects (which contain Utterance objects).

The tasks of assigning Utterance start/end times and linking segments into chains are passed off to the parsers.state_machines.StateMachines class.

Definition at line 20 of file trs_parser.py.

Constructor & Destructor Documentation

def parsers.trs_parser.TRSParser.__init__ ( self, filename )

Constructor.

Parameters

self
filename	(string) full path to TRS file to parse

Definition at line 31 of file trs_parser.py.

Member Function Documentation

def parsers.trs_parser.TRSParser._assign_speaker ( self, el, utter )

private

Determines the speaker for an Utterance, and sets the Utterance speaker attribute to an appropriate Speaker object.

Parameters

self
el	(etree Element object) The XML element (with either a "sync" or a "who" tag) that corresponds to utter
utter	(Utterance) The Utterance object to assign a speaker to

Definition at line 243 of file trs_parser.py.
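To illustrate how a "sync" or "who" element can be mapped to a speaker, here is a minimal sketch using the standard library's xml.etree.ElementTree. It assumes the usual Transcriber conventions: a <Turn> lists one or more space-separated speaker ids in its speaker attribute, and <Who nb="N"/> selects the N-th (1-based) of them. The helper name speaker_id_for is hypothetical, and unlike _assign_speaker it returns a speaker id string rather than setting a Speaker object on an Utterance.

```python
import xml.etree.ElementTree as ET


def speaker_id_for(turn, el):
    """Return the speaker id that the text after `el` belongs to (sketch).

    Assumption: Transcriber-style TRS, where <Turn speaker="spk1 spk2">
    lists speakers and <Who nb="N"/> picks the N-th one (1-based).
    """
    ids = turn.get("speaker", "").split()
    if not ids:
        return None  # the turn has no speaker attribute
    if el.tag == "Who":
        nb = int(el.get("nb", "1"))
        return ids[nb - 1] if 1 <= nb <= len(ids) else None
    # For <Sync> (or any other element), fall back to the turn's first listed speaker
    return ids[0]


# Example: an overlapping-speech turn with two speakers
turn = ET.fromstring('<Turn speaker="spk1 spk2"><Sync time="0"/>'
                     '<Who nb="1"/>hi<Who nb="2"/>hello</Turn>')
for who in turn.findall("Who"):
    print(who.get("nb"), speaker_id_for(turn, who))
```

In the real parser the returned id would then be resolved against the Speaker objects built by _parse_speakers.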

def parsers.trs_parser.TRSParser._assign_utter_attribs ( self, utter, line, remove_bad_trans_codes )

private

Performs the actual assignment of utterance attributes (like transcription phrase, codes, etc.), based upon a line from the TRS file.

Parameters

self
utter	(Utterance) the object we are assigning attributes to
line	(string) the text following a "sync" or "who" element. This contains LENA codes, plus transcriber-added data (and more)

Definition at line 258 of file trs_parser.py.
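Only the beginning of TRANS_LINE_REGEX ('^\s*([^\|]*?)\s*(') is shown in this documentation, so the following is a hedged sketch of how a line might be split into a transcription phrase and codes. It assumes, hypothetically, that everything from the first '|' onward is a run of pipe-delimited codes; the completed pattern LINE_RE and the helper split_line are illustrative stand-ins, not the class's actual regex or method.

```python
import re

# Documented prefix '^\s*([^\|]*?)\s*(' completed with a HYPOTHETICAL tail:
# group 2 is assumed to be "|code1|code2|..." up to the end of the line.
LINE_RE = re.compile(r'^\s*([^\|]*?)\s*(\|.*)?$')


def split_line(line):
    """Split one transcription line into (phrase, [codes]) (sketch)."""
    m = LINE_RE.match(line)
    if m is None:
        # e.g. embedded newlines; treat the whole text as a phrase
        return line.strip(), []
    phrase = m.group(1)
    codes_part = m.group(2) or ""
    # Drop empty pieces produced by leading/trailing pipes and whitespace
    codes = [c.strip() for c in codes_part.split("|") if c.strip()]
    return phrase, codes


print(split_line("  hello there |T|N|  "))
```

The lazy ([^\|]*?) group matters here: it lets the following \s* absorb the whitespace before the first pipe, so the captured phrase comes out without trailing spaces.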

def parsers.trs_parser.TRSParser._init_data_structures ( self )

private

Sets up data structures used to iterate through the XML and track segments, utterances, errors, etc.

Parameters

self

Definition at line 40 of file trs_parser.py.

def parsers.trs_parser.TRSParser._parse ( self, progress_update_fcn, seg_filters, remove_bad_trans_codes )

private

This method performs the actual parsing work that produces a list of Segment objects from the XML.

Parameters

self
progress_update_fcn	(function) see identical parameter description in parse()
seg_filters	(list) list of SegFilter objects to apply to the segments as they are created. Anything that these filters exclude will not be in the returned list (their changes are made permanent).

Returns: (list) list of Segment objects

Definition at line 142 of file trs_parser.py.
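The permanent effect of seg_filters can be pictured as a single pass that keeps only the segments every filter accepts, so anything rejected never reaches the returned list. The sketch below models filters as plain callables returning True to keep a segment. This is an assumption, since the SegFilter interface is not documented here, and the Segment dataclass is a hypothetical stand-in for the project's class.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Segment:  # hypothetical stand-in for the project's Segment class
    start: float
    end: float
    utters: List[str] = field(default_factory=list)


def apply_filters(segments: List[Segment],
                  seg_filters: List[Callable[[Segment], bool]]) -> List[Segment]:
    """Keep only segments accepted by every filter; rejected ones are dropped for good."""
    return [seg for seg in segments if all(f(seg) for f in seg_filters)]


segs = [Segment(0, 5), Segment(5, 30), Segment(30, 31)]
# Example filter: discard segments shorter than 2 seconds
kept = apply_filters(segs, [lambda s: s.end - s.start >= 2])
print([(s.start, s.end) for s in kept])
```

With an empty filter list, all() over no filters is True, so every segment is kept, which matches the seg_filters=[] default of parse().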

def parsers.trs_parser.TRSParser._parse_speakers ( self )

private

Grabs a list of all of the speakers in the TRS file, from the <Speakers> tag (which appears near the top).

Creates Speaker objects for them and stores them in the self.speakers list.

Parameters

self

Definition at line 306 of file trs_parser.py.
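A minimal sketch of reading the <Speakers> block with the standard library's xml.etree.ElementTree, assuming the Transcriber layout <Trans><Speakers><Speaker id="..." .../></Speakers>...</Trans>. The name parse_speakers is hypothetical, and it returns a dict of raw attributes rather than the Speaker objects that _parse_speakers stores in self.speakers.

```python
import xml.etree.ElementTree as ET


def parse_speakers(root):
    """Return {speaker id: attribute dict} from <Trans>/<Speakers> (sketch)."""
    speakers = {}
    # ElementPath: direct <Speaker> children of the top-level <Speakers> element
    for spk in root.iterfind("./Speakers/Speaker"):
        speakers[spk.get("id")] = dict(spk.attrib)
    return speakers


root = ET.fromstring(
    '<Trans><Speakers>'
    '<Speaker id="spk1" name="MAN"/><Speaker id="spk2" name="FAN"/>'
    '</Speakers><Episode/></Trans>'
)
print(parse_speakers(root))
```

Keying by id makes the later lookup from a <Turn speaker="..."> attribute a simple dictionary access.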

def parsers.trs_parser.TRSParser._parse_speech_data ( self, seg, el, remove_bad_trans_codes )

private

Extracts Utterance attributes (e.g. transcriber codes, transcription phrase, etc.) from the text following a <sync> element.

Parameters

self
seg	(Segment) A Segment object that Utterances created from this text should appear within.
el	(etree Element) An XML "text element" from the etree library, containing the data immediately following a <sync> tag. This may span multiple lines.

Returns: (list) List of Utterance objects with their attributes set. Multiple Utterance objects are created from the text if it spans multiple lines (different speakers) or uses the '.' operator (see the transcriber manual).

Definition at line 202 of file trs_parser.py.
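In ElementTree-style APIs, the text that immediately follows an empty element such as <Sync/> is stored in that element's tail attribute. The sketch below shows that mechanism and the "one utterance per non-empty line" idea using the standard library's xml.etree.ElementTree. The helper sync_texts is hypothetical, and it does not handle the '.' operator or <Who> elements, whose tails would hold their own text.

```python
import xml.etree.ElementTree as ET


def sync_texts(turn):
    """Yield (sync time, [non-empty lines]) for each <Sync> in a <Turn> (sketch)."""
    for sync in turn.iter("Sync"):
        text = sync.tail or ""  # text between this <Sync/> and the next element
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        yield float(sync.get("time")), lines


turn = ET.fromstring('<Turn><Sync time="0.0"/>\nhello\nhi there\n'
                     '<Sync time="2.5"/>\n</Turn>')
for time, lines in sync_texts(turn):
    print(time, lines)  # each line would become one Utterance
```
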

def parsers.trs_parser.TRSParser._parse_utters ( self, seg, turn, remove_bad_trans_codes )

private

Creates Utterance objects for a given XML turn element.

Parameters

self
seg	(Segment) the parent Segment object
turn	(Element) an etree.Element object representing the XML node

Returns: (list) list of Utterance objects

Definition at line 183 of file trs_parser.py.

def parsers.trs_parser.TRSParser.get_errors ( self )

Retrieves the errors and warnings found by this parser in the form of an ErrorCollector object, which provides methods to look up various errors/warnings by type.

Parameters

self

Returns: (ErrorCollector) an object that can be used to look up errors/warnings by type (see errors.ErrorCollector class)

Definition at line 62 of file trs_parser.py.
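The "look up errors/warnings by type" idea can be sketched as a collector that buckets messages by a type key. ErrorCollectorSketch and its add and get_errors_by_type methods are hypothetical names chosen for illustration; consult the errors.ErrorCollector class for its real interface.

```python
from collections import defaultdict


class ErrorCollectorSketch:
    """HYPOTHETICAL stand-in for errors.ErrorCollector: groups messages by type."""

    def __init__(self):
        self._by_type = defaultdict(list)

    def add(self, err_type, msg):
        self._by_type[err_type].append(msg)

    def get_errors_by_type(self, err_type):
        # .get() avoids creating empty buckets for unknown types
        return list(self._by_type.get(err_type, []))


ec = ErrorCollectorSketch()
ec.add("warning", "missing code")
ec.add("error", "bad speaker")
print(ec.get_errors_by_type("warning"))
```
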

def parsers.trs_parser.TRSParser.get_utter_by_id ( self, utter_id )

Retrieves an Utterance object (residing in one of this TRS parser's segments) by its utterance id attribute.

Parameters

self
utter_id	(int) utterance id to search for

Returns: (Utterance) the requested Utterance object, or None if not found

Definition at line 120 of file trs_parser.py.
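An id lookup like this is typically backed by an index built once during parsing, which turns each call into a dictionary access instead of a scan over every segment. The sketch below assumes the utter_index attribute works this way (a dict keyed by utterance id), which this documentation does not confirm; the Utterance dataclass and both helper functions are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Utterance:  # hypothetical stand-in; only the id field matters here
    id: int
    trans_phrase: str = ""


def build_index(utters: Iterable[Utterance]) -> Dict[int, Utterance]:
    """Map utterance id -> Utterance (assumed shape of utter_index)."""
    return {u.id: u for u in utters}


def get_utter_by_id(index: Dict[int, Utterance], utter_id: int) -> Optional[Utterance]:
    # Returns None when the id is unknown, as documented for the real method
    return index.get(utter_id)


index = build_index([Utterance(3, "hi"), Utterance(7, "bye")])
print(get_utter_by_id(index, 7))
```
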

def parsers.trs_parser.TRSParser.parse ( self, progress_update_fcn = None, progress_next_phase_fcn = None, validate = True, seg_filters = [], remove_bad_trans_codes = True )

Parses the TRS file, returning a list of Segments.

Parameters

self
progress_update_fcn	(function=None) function accepting a value in [0,1] to display as a progress bar (see utils.ProgressDialog). This value indicates the level of completeness of the current phase.
progress_next_phase_fcn	(function=None) moves the progress bar to the next phase, which causes new text to be displayed in the bar (see utils.ProgressDialog)
validate	(boolean=True) set to True if you want the parser to check for errors (these can be retrieved with get_errors()), False otherwise
seg_filters	(list=[]) list of SegFilter objects. These filters are applied to the internal segments list permanently (i.e. anything they filter out will not be returned by this parser)

Returns: (list) list of Segment objects

Definition at line 89 of file trs_parser.py.
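The progress_update_fcn contract (a callable receiving the completeness of the current phase as a value in [0,1]) can be sketched as follows. run_phase is a hypothetical helper showing how a parsing phase might report progress; any callable accepting a float, such as a method of a progress dialog, could be passed in.

```python
def run_phase(items, process, progress_update_fcn=None):
    """Process items, reporting this phase's completion as a fraction in [0, 1]."""
    results = []
    total = len(items)
    for i, item in enumerate(items, start=1):
        results.append(process(item))
        if progress_update_fcn is not None:
            progress_update_fcn(i / total)  # 1.0 once the last item is done
    return results


seen = []
out = run_phase([1, 2, 3, 4], lambda x: x * 2, seen.append)
print(out, seen)
```

Passing None (the default) simply skips reporting, which mirrors the optional progress_update_fcn parameter of parse().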

def parsers.trs_parser.TRSParser.re_parse ( self, progress_update_fcn = None, progress_next_phase_fcn = None, validate = True, seg_filters = [], remove_bad_trans_codes = True )

Resets internal data structures and parses the TRS file a second time.

Useful if the file has changed since the last parse. All cached segments/utterances from the last parse are cleared.

Parameters

self
progress_update_fcn	(function=None) function accepting a value in [0,1] to display as a progress bar (see utils.ProgressDialog). This value indicates the level of completeness of the current phase.
progress_next_phase_fcn	(function=None) moves the progress bar to the next phase, which causes new text to be displayed in the bar (see utils.ProgressDialog)
validate	(boolean=True) set to True if you want the parser to check for errors (these can be retrieved with get_errors()), False otherwise
seg_filters	(list=[]) list of SegFilter objects. These filters are applied to the segments list permanently (i.e. anything they filter out will not be returned by this parser)

Returns: (list) list of Segment objects

Definition at line 73 of file trs_parser.py.

Member Data Documentation

parsers.trs_parser.TRSParser.error_collector

Definition at line 41 of file trs_parser.py.

parsers.trs_parser.TRSParser.filename

Definition at line 33 of file trs_parser.py.

parsers.trs_parser.TRSParser.link_sm

Definition at line 46 of file trs_parser.py.

parsers.trs_parser.TRSParser.logger

Definition at line 32 of file trs_parser.py.

parsers.trs_parser.TRSParser.parsed

Definition at line 45 of file trs_parser.py.

parsers.trs_parser.TRSParser.segments

Definition at line 42 of file trs_parser.py.

parsers.trs_parser.TRSParser.speakers

Definition at line 43 of file trs_parser.py.

parsers.trs_parser.TRSParser.total_utters

Definition at line 47 of file trs_parser.py.

string parsers.trs_parser.TRSParser.TRANS_LINE_REGEX = '^\s*([^\|]*?)\s*('

static

Definition at line 23 of file trs_parser.py.

string parsers.trs_parser.TRSParser.TRANS_OVERLAP_REGEX = '\s*<.*>\s*'

static

Definition at line 26 of file trs_parser.py.
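The documented pattern '\s*<.*>\s*' matches an angle-bracketed span together with its surrounding whitespace. How the parser uses the match is not described here, so the sketch below only shows what the pattern matches, using Python's re module. Note that .* is greedy: two bracketed markers on one line are matched as a single span running from the first '<' to the last '>'.

```python
import re

TRANS_OVERLAP_REGEX = r'\s*<.*>\s*'  # pattern text as documented above


def overlap_span(phrase):
    """Return the first text matched by TRANS_OVERLAP_REGEX, or None."""
    m = re.search(TRANS_OVERLAP_REGEX, phrase)
    return m.group(0) if m else None


print(repr(overlap_span("we <laughing> went")))
print(repr(overlap_span("a <b> c <d> e")))  # greedy: spans both markers
```

If each marker needed to be matched separately, a non-greedy variant such as '\s*<.*?>\s*' would do that; whether the original greedy behavior is intended cannot be determined from this page.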

parsers.trs_parser.TRSParser.tree

Definition at line 48 of file trs_parser.py.

parsers.trs_parser.TRSParser.utter_index

Definition at line 44 of file trs_parser.py.

The documentation for this class was generated from the following file:

C:/Users/Wayne/Documents/baby-lab/bll_app/parsers/trs_parser.py
