Baby Language Lab Scripts
A collection of data processing tools.
parsers.trs_parser.TRSParser Class Reference

This class parses transcribed (or untranscribed) TRS files, producing output in the form of Segment objects (which contain Utterance objects).

Public Member Functions

def __init__
 Constructor.
 
def get_errors
 Retrieves the errors and warnings found by this parser in the form of an ErrorCollector object, which provides methods to look up errors/warnings by type.
 
def re_parse
 Resets internal data structures and parses the TRS file a second time.
 
def parse
 Parses the TRS file, returning a list of Segments.
 
def get_utter_by_id
 Retrieves an Utterance object (residing in one of this TRS parser's segments) by its utterance id attribute.
 

Public Attributes

 logger
 
 filename
 
 error_collector
 
 segments
 
 speakers
 
 utter_index
 
 parsed
 
 link_sm
 
 total_utters
 
 tree
 

Static Public Attributes

string TRANS_LINE_REGEX = '^\s*([^\|]*?)\s*('
 
string TRANS_OVERLAP_REGEX = '\s*<.*>\s*'
 

Private Member Functions

def _init_data_structures
 Sets up data structures used to iterate through the XML and track segments, utterances, errors, etc.
 
def _parse
 This method performs the actual parsing work that produces a list of Segment objects from the XML.
 
def _parse_utters
 Creates Utterance objects for a given XML turn element.
 
def _parse_speech_data
 Extracts Utterance attributes (e.g. transcriber codes, transcription phrase, etc.) from the text following a <sync> element.
 
def _assign_speaker
 Determines the speaker for an Utterance, and sets the Utterance speaker attribute to an appropriate Speaker object.
 
def _assign_utter_attribs
 Performs the actual assignment of utterance attributes (like transcription phrase, codes, etc.), based upon a line from the TRS file.
 
def _parse_speakers
 Grabs a list of all of the speakers in the TRS file, from the <Speakers> tag (which appears near the top).
 

Detailed Description

This class parses transcribed (or untranscribed) TRS files, producing output in the form of Segment objects (which contain Utterance objects).

The tasks of assigning Utterance start/end times and linking segments into chains are passed off to the parsers.state_machines.StateMachines class.

Definition at line 20 of file trs_parser.py.

Constructor & Destructor Documentation

def parsers.trs_parser.TRSParser.__init__ (   self,
  filename 
)

Constructor.

Parameters
self
filename(string) full path to TRS file to parse

Definition at line 31 of file trs_parser.py.

Member Function Documentation

def parsers.trs_parser.TRSParser._assign_speaker (   self,
  el,
  utter 
)
private

Determines the speaker for an Utterance, and sets the Utterance speaker attribute to an appropriate Speaker object.

Parameters
self
el(etree Element object) The XML element (with either a "sync" or a "who" tag) that corresponds to utter
utter(Utterance) The Utterance object to assign a speaker to

Definition at line 243 of file trs_parser.py.

def parsers.trs_parser.TRSParser._assign_utter_attribs (   self,
  utter,
  line,
  remove_bad_trans_codes 
)
private

Performs the actual assignment of utterance attributes (like transcription phrase, codes, etc.), based upon a line from the TRS file.

Parameters
self
utter(Utterance) the object we are assigning attributes to
line(string) the text following a "sync" or "who" element. This contains LENA codes, plus transcriber-added data (and more).

Definition at line 258 of file trs_parser.py.

def parsers.trs_parser.TRSParser._init_data_structures (   self)
private

Sets up data structures used to iterate through the XML and track segments, utterances, errors, etc.

Parameters
self

Definition at line 40 of file trs_parser.py.

def parsers.trs_parser.TRSParser._parse (   self,
  progress_update_fcn,
  seg_filters,
  remove_bad_trans_codes 
)
private

This method performs the actual parsing work that produces a list of Segment objects from the XML.

Parameters
self
progress_update_fcn(function) see identical parameter description in parse()
seg_filters(list) list of SegFilter objects to apply to the segments as they are created. Anything that these filters exclude will not be in the returned list (their changes are made permanent).
Returns
(list) list of Segment objects

Definition at line 142 of file trs_parser.py.

def parsers.trs_parser.TRSParser._parse_speakers (   self)
private

Grabs a list of all of the speakers in the TRS file, from the <Speakers> tag (which appears near the top).

Creates Speaker objects for them and stores them in the self.speakers list.

Parameters
self

Definition at line 306 of file trs_parser.py.
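
As a rough illustration of the step described above, the following sketch reads speaker entries from a <Speakers> element using Python's standard xml.etree.ElementTree module. The sample XML, the "id" and "name" attribute names, and the SpeakerStub class are assumptions made for this example; the real Speaker class and the body of _parse_speakers are not shown on this page and may differ.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

# Hypothetical stand-in for the real Speaker class, which is not documented here.
SpeakerStub = namedtuple("SpeakerStub", ["speaker_id", "name"])

# Assumed sample document; real TRS files may carry more attributes per speaker.
SAMPLE_TRS = """<Trans>
  <Speakers>
    <Speaker id="spk1" name="MOTHER"/>
    <Speaker id="spk2" name="CHILD"/>
  </Speakers>
  <Episode/>
</Trans>"""


def read_speakers(root):
    """Return one SpeakerStub per <Speaker> child of the <Speakers> element."""
    speakers_el = root.find("Speakers")
    if speakers_el is None:
        # No <Speakers> block at all: report an empty list rather than failing.
        return []
    return [
        SpeakerStub(el.get("id"), el.get("name"))
        for el in speakers_el.findall("Speaker")
    ]


speakers = read_speakers(ET.fromstring(SAMPLE_TRS))
```

Returning an empty list when the <Speakers> block is missing is a choice of this sketch; the real parser may instead record an error in its ErrorCollector.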

def parsers.trs_parser.TRSParser._parse_speech_data (   self,
  seg,
  el,
  remove_bad_trans_codes 
)
private

Extracts Utterance attributes (e.g. transcriber codes, transcription phrase, etc.) from the text following a <sync> element.

Parameters
self
seg(Segment) A Segment object that Utterances created from this text should appear within.
el(etree Element) An XML "text element" from the etree library, containing the data immediately following a <sync> tag. This may span multiple lines.
Returns
(list) List of Utterance objects with their attributes set. Multiple Utterance objects are created from the text if it spans multiple lines (different speakers) or uses the '.' operator (see the transcriber manual).

Definition at line 202 of file trs_parser.py.
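
In ElementTree, the text that follows an element's tag is exposed through that element's tail attribute, so "the text following a <sync> element" can be read roughly as sketched below. The sample turn, the "time" attribute, and the lines_after_sync helper are assumptions for illustration. The sketch only splits the text into lines; it does not build Utterance objects and does not handle the '.' operator mentioned above.

```python
import xml.etree.ElementTree as ET

# Assumed sample turn; the text after each <Sync/> tag is stored in its .tail.
SAMPLE_TURN = """<Turn speaker="spk1 spk2">
<Sync time="1.5"/>
first line
second line
<Sync time="3.0"/>
third line
</Turn>"""


def lines_after_sync(turn):
    """Pair each Sync time with the non-empty text lines that follow the tag."""
    result = []
    for el in turn.iter("Sync"):
        tail = el.tail or ""  # tail is None when nothing follows the tag
        lines = [ln.strip() for ln in tail.splitlines() if ln.strip()]
        result.append((el.get("time"), lines))
    return result


sync_lines = lines_after_sync(ET.fromstring(SAMPLE_TURN))
```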

def parsers.trs_parser.TRSParser._parse_utters (   self,
  seg,
  turn,
  remove_bad_trans_codes 
)
private

Creates Utterance objects for a given XML turn element.

Parameters
self
seg(Segment) the parent Segment object
turn(Element) an etree.Element object representing the XML node
Returns
(list) list of Utterance objects

Definition at line 183 of file trs_parser.py.

def parsers.trs_parser.TRSParser.get_errors (   self)

Retrieves the errors and warnings found by this parser in the form of an ErrorCollector object, which provides methods to look up errors/warnings by type.

Parameters
self
Returns
(ErrorCollector) this object can be used to look up errors/warnings by type (see errors.ErrorCollector class)

Definition at line 62 of file trs_parser.py.

def parsers.trs_parser.TRSParser.get_utter_by_id (   self,
  utter_id 
)

Retrieves an Utterance object (residing in one of this TRS parser's segments) by its utterance id attribute.

Parameters
self
utter_id(int) utterance id to search for
Returns
(Utterance) the requested Utterance object, or None if not found

Definition at line 120 of file trs_parser.py.
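
The lookup contract described above (return the matching object, or None when the id is unknown) can be sketched with a dictionary index. Whether the real utter_index attribute is a dictionary is not stated on this page; UtteranceStub, build_index, and lookup are hypothetical names used only for this example.

```python
class UtteranceStub:
    """Hypothetical stand-in for the real Utterance class."""

    def __init__(self, utter_id, text):
        self.id = utter_id
        self.text = text


def build_index(utterances):
    """Map each utterance id to its object for constant-time lookup."""
    return {u.id: u for u in utterances}


def lookup(index, utter_id):
    """Return the utterance with the given id, or None if it is not indexed."""
    return index.get(utter_id)


index = build_index([UtteranceStub(1, "hello"), UtteranceStub(2, "bye")])
found = lookup(index, 2)
missing = lookup(index, 99)
```

Callers should test the result against None before using it, since an unknown id is reported that way rather than by raising an exception.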

def parsers.trs_parser.TRSParser.parse (   self,
  progress_update_fcn = None,
  progress_next_phase_fcn = None,
  validate = True,
  seg_filters = [],
  remove_bad_trans_codes = True 
)

Parses the TRS file, returning a list of Segments.

Parameters
self
progress_update_fcn(function=None) function accepting a value in [0,1] to display as a progress bar (see utils.ProgressDialog). This value indicates how complete the current phase is.
progress_next_phase_fcn(function=None) moves the progress bar to the next phase, which causes new text to be displayed in the bar (see utils.ProgressDialog)
validate(boolean=True) set to True if you want the parser to check for errors (which can be retrieved with get_errors()), False otherwise
seg_filters(list=[]) list of SegFilter objects. These filters are applied to the internal segments list in a permanent manner (i.e. anything they filter out will not be returned by this parser)
Returns
(list) list of Segment objects

Definition at line 89 of file trs_parser.py.
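
To show how the two progress callbacks are meant to be supplied, here is a minimal sketch written for Python 3. StubTRSParser is a stand-in that only mirrors the documented parse() signature; it does not read any file, its segments are placeholder strings, and the order in which it invokes the callbacks is an assumption. With the real class, the import would presumably come from parsers.trs_parser.

```python
class StubTRSParser:
    """Stand-in mirroring the documented parse() signature; not the real class."""

    def __init__(self, filename):
        self.filename = filename

    def parse(self, progress_update_fcn=None, progress_next_phase_fcn=None,
              validate=True, seg_filters=None, remove_bad_trans_codes=True):
        # Placeholder "segments"; the real parser returns Segment objects.
        segments = ["seg-%d" % i for i in range(4)]
        for done, _ in enumerate(segments, start=1):
            if progress_update_fcn is not None:
                # Report completeness of the current phase as a value in [0, 1].
                progress_update_fcn(done / len(segments))
        if progress_next_phase_fcn is not None:
            progress_next_phase_fcn()
        return segments


progress_log = []
phase_log = []

parser = StubTRSParser("example.trs")
segments = parser.parse(
    progress_update_fcn=progress_log.append,
    progress_next_phase_fcn=lambda: phase_log.append("next phase"),
)
```

The stub uses None as the default for seg_filters instead of the documented [] so that no list object is shared between calls; that is a choice of this sketch, not a description of the real method.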

def parsers.trs_parser.TRSParser.re_parse (   self,
  progress_update_fcn = None,
  progress_next_phase_fcn = None,
  validate = True,
  seg_filters = [],
  remove_bad_trans_codes = True 
)

Resets internal data structures and parses the TRS file a second time.

Useful if the file has changed since the last parse. All cached segments/utterances from the last parse are cleared.

Parameters
self
progress_update_fcn(function=None) function accepting a value in [0,1] to display as a progress bar (see utils.ProgressDialog). This value indicates how complete the current phase is.
progress_next_phase_fcn(function=None) moves the progress bar to the next phase, which causes new text to be displayed in the bar (see utils.ProgressDialog)
validate(boolean=True) set to True if you want the parser to check for errors (which can be retrieved with get_errors()), False otherwise
seg_filters(list=[]) list of SegFilter objects. These filters are applied to the segments list in a permanent manner (i.e. anything they filter out will not be returned by this parser)
Returns
(list) list of Segment objects

Definition at line 73 of file trs_parser.py.

Member Data Documentation

parsers.trs_parser.TRSParser.error_collector

Definition at line 41 of file trs_parser.py.

parsers.trs_parser.TRSParser.filename

Definition at line 33 of file trs_parser.py.

parsers.trs_parser.TRSParser.link_sm

Definition at line 46 of file trs_parser.py.

parsers.trs_parser.TRSParser.logger

Definition at line 32 of file trs_parser.py.

parsers.trs_parser.TRSParser.parsed

Definition at line 45 of file trs_parser.py.

parsers.trs_parser.TRSParser.segments

Definition at line 42 of file trs_parser.py.

parsers.trs_parser.TRSParser.speakers

Definition at line 43 of file trs_parser.py.

parsers.trs_parser.TRSParser.total_utters

Definition at line 47 of file trs_parser.py.

string parsers.trs_parser.TRSParser.TRANS_LINE_REGEX = '^\s*([^\|]*?)\s*('
static

Definition at line 23 of file trs_parser.py.

string parsers.trs_parser.TRSParser.TRANS_OVERLAP_REGEX = '\s*<.*>\s*'
static

Definition at line 26 of file trs_parser.py.
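
The behaviour of the TRANS_OVERLAP_REGEX pattern can be explored with Python's re module. How the parser itself applies the pattern is not shown on this page; strip_overlap below is a hypothetical helper that simply replaces each match with a single space. Note that ".*" is greedy, so one match runs from the first "<" on a line to the last ">" on that line.

```python
import re

# Pattern copied from the class attribute documented above.
TRANS_OVERLAP_REGEX = r'\s*<.*>\s*'


def strip_overlap(text):
    """Replace every match of the overlap pattern with one space (illustrative)."""
    return re.sub(TRANS_OVERLAP_REGEX, ' ', text).strip()


single = strip_overlap('hello <overlap> world')
# Greedy matching: everything between the first '<' and the last '>' is removed.
greedy = strip_overlap('a <x> b <y> c')
has_marker = re.search(TRANS_OVERLAP_REGEX, 'no markers here') is not None
```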

parsers.trs_parser.TRSParser.tree

Definition at line 48 of file trs_parser.py.

parsers.trs_parser.TRSParser.utter_index

Definition at line 44 of file trs_parser.py.


The documentation for this class was generated from the following file: