This class parses transcribed (or untranscribed) TRS files, producing output in the form of Segment objects (which contain Utterance objects).
| def | __init__ |
| | Constructor. |
| def | get_errors |
| | Retrieves the errors and warnings found by this parser in the form of an ErrorCollector object, which provides methods to look up errors/warnings by type. |
| def | re_parse |
| | Resets internal data structures and parses the TRS file a second time. |
| def | parse |
| | Parses the TRS file, returning a list of Segments. |
| def | get_utter_by_id |
| | Retrieves an Utterance object (residing in one of this TRS parser's segments) by its utterance id attribute. |
|
| def | _init_data_structures |
| | Sets up data structures used to iterate through the XML and track segments, utterances, errors, etc. |
| def | _parse |
| | Performs the actual parsing work that produces a list of Segment objects from the XML. |
| def | _parse_utters |
| | Creates Utterance objects for a given XML turn element. |
| def | _parse_speech_data |
| | Extracts Utterance attributes (e.g. transcriber codes, transcription phrase, etc.) from the text following a <sync> element. |
| def | _assign_speaker |
| | Determines the speaker for an Utterance, and sets the Utterance speaker attribute to an appropriate Speaker object. |
| def | _assign_utter_attribs |
| | Performs the actual assignment of utterance attributes (like transcription phrase, codes, etc.), based upon a line from the TRS file. |
| def | _parse_speakers |
| | Grabs a list of all of the speakers in the TRS file, from the <Speakers> tag (which appears near the top). |
The tasks of assigning Utterance start/end times and linking segments into chains are passed off to the parsers.state_machines.StateMachines class.
Definition at line 20 of file trs_parser.py.
| def parsers.trs_parser.TRSParser.__init__ (self, filename) |
Constructor.
- Parameters
-
| self | |
| filename | (string) full path to TRS file to parse |
Definition at line 31 of file trs_parser.py.
| def parsers.trs_parser.TRSParser._assign_speaker (self, el, utter) | private |
Determines the speaker for an Utterance, and sets the Utterance speaker attribute to an appropriate Speaker object.
- Parameters
-
| self | |
| el | (etree Element object) The XML element (with either a "sync" or a "who" tag) that corresponds to utter |
| utter | (Utterance) The Utterance object to assign a speaker to |
Definition at line 243 of file trs_parser.py.
| def parsers.trs_parser.TRSParser._assign_utter_attribs (self, utter, line, remove_bad_trans_codes) | private |
Performs the actual assignment of utterance attributes (like transcription phrase, codes, etc.), based upon a line from the TRS file.
- Parameters
-
| self | |
| utter | (Utterance) the object we are assigning attributes to |
| line | (string) the text following a "sync" or "who" element. This contains LENA codes, plus transcriber-added data (and more) |
Definition at line 258 of file trs_parser.py.
| def parsers.trs_parser.TRSParser._init_data_structures (self) | private |
Sets up data structures used to iterate through the XML and track segments, utterances, errors, etc.
Definition at line 40 of file trs_parser.py.
| def parsers.trs_parser.TRSParser._parse (self, progress_update_fcn, seg_filters, remove_bad_trans_codes) | private |
Performs the actual parsing work that produces a list of Segment objects from the XML.
- Parameters
-
| self | |
| progress_update_fcn | (function) see identical parameter description in parse() |
| seg_filters | (list) list of SegFilter objects to apply to the segments as they are created. Anything that these filters exclude will not be in the returned list (their changes are made permanent). |
- Returns
- (list) list of Segment objects
Definition at line 142 of file trs_parser.py.
| def parsers.trs_parser.TRSParser._parse_speakers (self) | private |
Grabs a list of all of the speakers in the TRS file, from the <Speakers> tag (which appears near the top).
Creates Speaker objects for them and stores them in the self.speakers list.
Definition at line 306 of file trs_parser.py.
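As a rough illustration of what reading the <Speakers> tag involves, the sketch below walks a tiny hand-written TRS fragment with the standard-library `xml.etree.ElementTree`. The fragment, the `read_speakers` helper, and the use of plain `(id, name)` tuples in place of Speaker objects are all assumptions made for this example; the actual method may use a different etree implementation and builds Speaker objects instead.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration only (not taken from a real recording).
TRS_SAMPLE = """<Trans>
  <Speakers>
    <Speaker id="spk1" name="FAN"/>
    <Speaker id="spk2" name="CHN"/>
  </Speakers>
</Trans>"""


def read_speakers(root):
    """Return (id, name) pairs for every <Speaker> under <Speakers>.

    Hypothetical helper: the real parser stores Speaker objects in
    self.speakers rather than returning tuples.
    """
    speakers_el = root.find("Speakers")
    if speakers_el is None:
        return []  # no speaker table present in this file
    return [(el.get("id"), el.get("name")) for el in speakers_el.findall("Speaker")]


root = ET.fromstring(TRS_SAMPLE)
speakers = read_speakers(root)
print(speakers)
```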
| def parsers.trs_parser.TRSParser._parse_speech_data (self, seg, el, remove_bad_trans_codes) | private |
Extracts Utterance attributes (e.g. transcriber codes, transcription phrase, etc.) from the text following a <sync> element.
- Parameters
-
| self | |
| seg | (Segment) A Segment object that Utterances created from this text should appear within. |
| el | (etree Element) An XML "text element" from the etree library, containing the data immediately following a <sync> tag. This may span multiple lines. |
- Returns
- (list) List of Utterance objects with their attributes set. Multiple Utterance objects are created from the text if it spans multiple lines (different speakers) or uses the '.' operator (see the transcriber manual).
Definition at line 202 of file trs_parser.py.
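The "text following a <sync> element" corresponds to what etree libraries expose as an element's `tail`. The sketch below shows that idea on a made-up turn, splitting the tail into one entry per non-empty line. It assumes the standard-library `xml.etree.ElementTree`; the sample XML and the `lines_after_sync` helper are hypothetical, and the sketch does not attempt speaker assignment, code extraction, or the '.' operator described in the transcriber manual.

```python
import xml.etree.ElementTree as ET

# Made-up turn for illustration; real files carry LENA codes and transcriber data.
TURN_SAMPLE = """<Turn speaker="spk1" startTime="0.0" endTime="4.2">
<Sync time="0.0"/>
first line of text
second line of text
<Sync time="2.1"/>
another line
</Turn>"""


def lines_after_sync(turn):
    """Yield (sync_time, line) for each non-empty line following a <Sync> element."""
    for el in turn.iter("Sync"):
        # el.tail holds the text between this element's end and the next tag.
        for line in (el.tail or "").splitlines():
            line = line.strip()
            if line:
                yield float(el.get("time")), line


turn = ET.fromstring(TURN_SAMPLE)
pairs = list(lines_after_sync(turn))
for sync_time, text in pairs:
    print(sync_time, text)
```

Each yielded line is the kind of input that a per-line step such as _assign_utter_attribs would then interpret.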
| def parsers.trs_parser.TRSParser._parse_utters (self, seg, turn, remove_bad_trans_codes) | private |
Creates Utterance objects for a given XML turn element.
- Parameters
-
| self | |
| seg | (Segment) the parent Segment object |
| turn | (Element) an etree.Element object representing the XML node |
- Returns
- (list) list of Utterance objects
Definition at line 183 of file trs_parser.py.
| def parsers.trs_parser.TRSParser.get_errors (self) |
Retrieves the errors and warnings found by this parser in the form of an ErrorCollector object, which provides methods to look up various errors/warnings by type.
- Returns
- (ErrorCollector) - this object can be used to look up errors/warnings by type (see errors.ErrorCollector class)
Definition at line 62 of file trs_parser.py.
| def parsers.trs_parser.TRSParser.get_utter_by_id (self, utter_id) |
Retrieves an Utterance object (residing in one of this TRS parser's segments) by its utterance id attribute.
- Parameters
-
| self | |
| utter_id | (int) utterance id to search for |
- Returns
- (Utterance) the requested Utterance object, or None if not found
Definition at line 120 of file trs_parser.py.
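The "return the object, or None if not found" contract can be sketched with a dictionary keyed by utterance id. How the real class performs the lookup is not shown on this page, so everything below is an assumption: the `Utterance` and `Segment` dataclasses are minimal stand-ins, and `build_utter_lookup` and `find_utter_by_id` are hypothetical helper names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Utterance:
    # Hypothetical stand-in; the real Utterance class has many more attributes.
    id: int
    trans_phrase: str = ""


@dataclass
class Segment:
    # Hypothetical stand-in; "utters" is an assumed attribute name.
    utters: List[Utterance] = field(default_factory=list)


def build_utter_lookup(segments: List[Segment]) -> Dict[int, Utterance]:
    """Map utterance id -> Utterance across all segments."""
    return {u.id: u for seg in segments for u in seg.utters}


def find_utter_by_id(lookup: Dict[int, Utterance], utter_id: int) -> Optional[Utterance]:
    """Return the matching Utterance, or None if the id is unknown."""
    return lookup.get(utter_id)


segments = [
    Segment([Utterance(0, "hello"), Utterance(1, "hi there")]),
    Segment([Utterance(2, "bye")]),
]
lookup = build_utter_lookup(segments)
found = find_utter_by_id(lookup, 2)
missing = find_utter_by_id(lookup, 99)
print(found, missing)
```

Callers should check for None before using the result, since an unknown id is not reported as an exception.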
| def parsers.trs_parser.TRSParser.parse (self, progress_update_fcn = None, progress_next_phase_fcn = None, validate = True, seg_filters = [], remove_bad_trans_codes = True) |
Parses the TRS file, returning a list of Segments.
- Parameters
-
| self | |
| progress_update_fcn | (function=None) function accepting a value in [0,1] to display as a progress bar - see utils.ProgressDialog. This value is used to indicate the level of completeness of the current phase |
| progress_next_phase_fcn | (function=None) moves the progress bar to the next phase, which causes new text to be displayed in the bar - see utils.ProgressDialog |
| validate | (boolean=True) set to True if you want the parser to check for errors (which can be retrieved with get_errors()), False otherwise |
| seg_filters | (list=[]) list of SegFilter objects. These filters are applied to the internal segments list in a permanent manner (i.e. anything they filter out will not be returned by this parser) |
- Returns
- (list) list of Segment objects
Definition at line 89 of file trs_parser.py.
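To make the two callback parameters concrete, the sketch below builds a pair of callables with the documented shapes: one accepting a completeness value in [0,1], one taking no arguments and signalling a phase change. Since the package itself is not imported here, `fake_parse` is a hypothetical stand-in that only drives the callbacks; it does not read any file, and the number and order of calls it makes are assumptions.

```python
def make_progress_recorder():
    """Return (update_fcn, next_phase_fcn, log); log records every callback call."""
    log = []

    def update(fraction):
        # Clamp defensively; the documentation promises a value in [0, 1].
        log.append(("update", min(1.0, max(0.0, fraction))))

    def next_phase():
        log.append(("next_phase", None))

    return update, next_phase, log


def fake_parse(progress_update_fcn=None, progress_next_phase_fcn=None, total=4):
    """Hypothetical stand-in for parse(): reports progress, returns no segments."""
    for i in range(total):
        if progress_update_fcn is not None:
            progress_update_fcn((i + 1) / total)
    if progress_next_phase_fcn is not None:
        progress_next_phase_fcn()
    return []


update, next_phase, log = make_progress_recorder()
segments = fake_parse(update, next_phase)
print(log)
```

With the real class, the same two callables would presumably be passed as `parse(progress_update_fcn=update, progress_next_phase_fcn=next_phase)`, following the signature above.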
| def parsers.trs_parser.TRSParser.re_parse (self, progress_update_fcn = None, progress_next_phase_fcn = None, validate = True, seg_filters = [], remove_bad_trans_codes = True) |
Resets internal data structures and parses the TRS file a second time.
Useful if the file has changed since the last parse. All cached segments/utterances from the last parse are cleared.
- Parameters
-
| self | |
| progress_update_fcn | (function=None) function accepting a value in [0,1] to display as a progress bar - see utils.ProgressDialog. This value is used to indicate the level of completeness of the current phase |
| progress_next_phase_fcn | (function=None) moves the progress bar to the next phase, which causes new text to be displayed in the bar - see utils.ProgressDialog |
| validate | (boolean=True) set to True if you want the parser to check for errors (which can be retrieved with get_errors()), False otherwise |
| seg_filters | (list=[]) list of SegFilter objects. These filters are applied to the segments list in a permanent manner (i.e. anything they filter out will not be returned by this parser) |
- Returns
- (list) list of Segment objects
Definition at line 73 of file trs_parser.py.
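Both parse() and re_parse() declare `seg_filters = []`. In Python a default list is created once, at function definition time, and shared by every call that omits the argument. Whether that matters here depends on whether the parser ever mutates the list, which this page does not say. The sketch below uses two hypothetical functions (unrelated to the parser's code) to show the general behaviour and the usual `None` sentinel alternative.

```python
def collect_shared(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so appends accumulate across calls that omit "bucket".
    bucket.append(item)
    return bucket


def collect_fresh(item, bucket=None):
    # A None sentinel gives each call its own new list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = collect_shared("a")
second = collect_shared("b")
fresh_first = collect_fresh("a")
fresh_second = collect_fresh("b")

print(first is second)
print(fresh_first, fresh_second)
```

Passing an explicit list for seg_filters on every call sidesteps the question entirely.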
| parsers.trs_parser.TRSParser.error_collector |
| parsers.trs_parser.TRSParser.filename |
| parsers.trs_parser.TRSParser.link_sm |
| parsers.trs_parser.TRSParser.logger |
| parsers.trs_parser.TRSParser.parsed |
| parsers.trs_parser.TRSParser.segments |
| parsers.trs_parser.TRSParser.speakers |
| parsers.trs_parser.TRSParser.total_utters |
| string parsers.trs_parser.TRSParser.TRANS_LINE_REGEX = '^\s*([^\|]*?)\s*(' | static |
| string parsers.trs_parser.TRSParser.TRANS_OVERLAP_REGEX = '\s*<.*>\s*' | static |
| parsers.trs_parser.TRSParser.tree |
| parsers.trs_parser.TRSParser.utter_index |
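The TRANS_OVERLAP_REGEX pattern listed above can be tried out directly with Python's `re` module. The sketch below only explores what the pattern `\s*<.*>\s*` matches; how the parser applies it (match, search, or substitution) is not documented here, so the sample strings and the calls are assumptions. Note that `.*` is greedy, so a match runs from the first `<` to the last `>` on the line.

```python
import re

# Pattern copied from the attribute listing; a raw string keeps the backslashes intact.
TRANS_OVERLAP_REGEX = r'\s*<.*>\s*'

# Angle-bracketed text with surrounding whitespace matches in full.
whole = re.fullmatch(TRANS_OVERLAP_REGEX, " <overlap> ")

# Text without angle brackets does not match anywhere.
none_found = re.search(TRANS_OVERLAP_REGEX, "plain text")

# Greedy ".*": one match spans from the first "<" to the last ">",
# swallowing the text in between and the whitespace on either side.
stripped = re.sub(TRANS_OVERLAP_REGEX, "", "x <a> y <b> z")

print(whole is not None, none_found, stripped)
```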
The documentation for this class was generated from the following file: