![]() |
Baby Language Lab Scripts
A collection of data processing tools.
|
This class provides a way to convert a csv file into an in-memory database. More...
Public Member Functions | |
def | __init__ |
Constructor. More... | |
def | csv_insert |
Inserts a row of data straight from a csv file. More... | |
def | csv_select_by_index |
Selects data from the database. More... | |
def | get_db_col_names |
Fetches a list of all of the DB column names from the DB table. More... | |
def | csv_select_by_name |
Selects data from the database. More... | |
def | csv_update_by_index |
Generates and executes a SQL update statement. More... | |
def | csv_delete_by_index |
Generates and executes a SQL delete statement. More... | |
def | csv_delete_by_name |
Generates and executes a SQL delete statement. More... | |
def | csv_update_by_name |
Generates and executes a SQL update statement. More... | |
def | get_db_col_name |
def | add_column |
Adds a named column to the this database's single table. More... | |
def | write_to_file |
Writes out the data in this CSVDatabase contains to a csv file. More... | |
def | set_blanks_to_null |
Sets all values in all columns to NULL if they are equal to the empty string. More... | |
![]() | |
def | __init__ |
Constructor. More... | |
def | dump_to_file |
Writes this in-memory database to a *.db file on disk. More... | |
![]() | |
def | __init__ |
Constructor. More... | |
def | close |
Closes the database connection, releasing any resources that it was using. More... | |
def | execute_stmt |
Executes a verbatim SQL statement. More... | |
def | execute_script |
Reads and executes a sql script file. More... | |
def | insert |
Inserts one or more rows into a table in the database. More... | |
def | select |
Selects rows from a table in the database. More... | |
def | delete |
Deletes rows from a table in the database. More... | |
def | update_timestamp_col |
Updates the value of a timestamp column to the current date/time. More... | |
def | update |
Updates rows in a table. More... | |
def | write_to_file |
Writes out all of the data in a table to a csv file. More... | |
Static Public Member Functions | |
def | create_with_rows |
Convenience method to create a database from a list of rows that have already been read from a csv file. More... | |
def | create_with_files |
Convenience method to create a database from a bunch of files that all have the same columns (and column order). More... | |
Public Attributes | |
name_lookup | |
header_row | |
col_datatypes | |
Static Public Attributes | |
string | TABLE_NAME = 'data' |
string | COL_PREFIX = 'col' |
dictionary | DATA_TYPES |
tuple | ORDER_DIRS = Enum(['ASC', 'DESC'], ['ASC', 'DESC']) |
Private Member Functions | |
def | _create_data_table |
Creates the single table used by this in-memory database. More... | |
def | _get_col_indices |
Gets a list of the indices of columns you specify by name. More... | |
def | _build_where_clause |
Translates CSV column names to DB column names to build a SQL where clause like 'x = y'. More... | |
def | _remove_cols |
Takes a row (list) of data and removes the specified columns from the list. More... | |
This class provides a way to convert a csv file into an in-memory database.
This allows you to do SQL-operations on the data, which in some circumstances can be much more convenient than implementing search (query), sort, or data modification methods in straight Python. The cost for doing this is a little extra time inserting all the data - however, this cost isn't usually too bad because in-memory databases are up to an order of magnitude faster than file-based ones.
Sample usage:
Definition at line 39 of file csv_database.py.
def db.csv_database.CSVDatabase.__init__ | ( | self, | |
header_row, | |||
col_datatypes | |||
) |
Constructor.
self | |
header_row | (list of Strings) List of names of the columns in the csv file (generally this is just the top row of the csv file |
col_datatypes | (list) List of data type functions (these must be keys from the static CSVDatabase.DATA_TYPES dictionary) corresponding to the columns in your csv file. |
Definition at line 56 of file csv_database.py.
|
private |
Translates CSV column names to DB column names to build a SQL where clause like 'x = y'.
This will presumably be attached to a SQL statement to form '... where x = y'.
self | |
where_body | (String) body of the where clause, containing placeholders (s) for the column names (Eg. 's = ?') |
where_cols | (list) list of csv names of columns. These will be translated to DB column names and placed in the where_body (these go in the s placeholders). |
Definition at line 204 of file csv_database.py.
|
private |
Creates the single table used by this in-memory database.
self |
Definition at line 105 of file csv_database.py.
|
private |
Gets a list of the indices of columns you specify by name.
self | |
col_names | (list) list of column names (Strings) |
Definition at line 190 of file csv_database.py.
|
private |
Takes a row (list) of data and removes the specified columns from the list.
This is a convenience method and does not affect the data in the DB in any way.
self | |
row | (list) a row of values from the database or csv file |
col_indices | (list) list of the column indices that you want to remove from the row |
Definition at line 382 of file csv_database.py.
def db.csv_database.CSVDatabase.add_column | ( | self, | |
name, | |||
datatype | |||
) |
Adds a named column to the this database's single table.
The values in the newly added column are all set to NULL.
self | |
name | (string) name of the column to add. |
datatype | (fcn) a data-type function (must be in the keys of the static CSVDatabase.DATA_TYPES dictionary) corresponding to the type of data that this column will hold |
Definition at line 364 of file csv_database.py.
|
static |
Convenience method to create a database from a bunch of files that all have the same columns (and column order).
Note: This method will complain if it finds that any of the files you pass it have different column names.
file_list | (List) list of paths (Strings) to the files whose data you want to put into the new database |
col_datatypes | (list) List of data type functions (these must be keys from the static CSVDatabase.DATA_TYPES dictionary) corresponding to the columns in your csv file. |
Definition at line 82 of file csv_database.py.
|
static |
Convenience method to create a database from a list of rows that have already been read from a csv file.
rows | (list) List of lists, where inner lists are rows of csv data. First row must be column names. |
col_datatypes | (list) List of data type functions (these must be keys from the static CSVDatabase.DATA_TYPES dictionary) corresponding to the columns in your csv file. |
Definition at line 68 of file csv_database.py.
def db.csv_database.CSVDatabase.csv_delete_by_index | ( | self, | |
where_cond = None , |
|||
params = [] |
|||
) |
Generates and executes a SQL delete statement.
This method uses the indices of the columns in the csv file. See csv_select_by_index() for a description of the parameters.
self | |
where_cond | (String=None) |
params | (list) |
Definition at line 300 of file csv_database.py.
def db.csv_database.CSVDatabase.csv_delete_by_name | ( | self, | |
where_body = None , |
|||
where_cols = None , |
|||
params = [] |
|||
) |
Generates and executes a SQL delete statement.
This method uses the names of the columns in the csv file. See csv_select_by_name() for a description of the parameters.
self | |
where_body | (String=None) |
where_cols | (list=None) |
params | (list) |
Definition at line 318 of file csv_database.py.
def db.csv_database.CSVDatabase.csv_insert | ( | self, | |
csv_row | |||
) |
Inserts a row of data straight from a csv file.
The data must have the types given in the col_datatypes param that was passed to the constructor or this will cause an exception.
self | |
csv_row | (list) list of data from a csv file (corresponding to a single row) |
Definition at line 118 of file csv_database.py.
def db.csv_database.CSVDatabase.csv_select_by_index | ( | self, | |
col_indices = None , |
|||
where_cond = None , |
|||
params = [] , |
|||
order_by_indices = None , |
|||
order_by_dirs = None , |
|||
dump_sql = False , |
|||
fcn_indices = None , |
|||
fcns = None , |
|||
group_by_index = None |
|||
) |
Selects data from the database.
This method works using the indices of the columns in the CSV file that you'd like to select.
col_indices | (list=None) a list of the indices of the columns you'd like to select. |
where_cond | (String=None) see database.select() |
params | (list=[]) see database.select() |
order_by_indices | (list=None) list of indices of columns you'd like to order by |
order_by_dirs | (list=None) list of strings (each must be either 'ASC', 'DESC') that indicate the direction in which you want to order the columns specified by order_by_indices. |
dump_sql | (boolean=False) if True, prints the generated SQL statement to stdout |
fcn_indices | (list=None) list of indices of columns you'd like to apply SQL aggregate functions to ('avg', 'sum', etc.). |
fcns | (list=None) list of Strings (SQL functions like 'avg', 'sum', etc.) to apply to the columns specified by fcn_indices |
group_by_index | (int=None) index of column you'd like to group by |
Definition at line 146 of file csv_database.py.
def db.csv_database.CSVDatabase.csv_select_by_name | ( | self, | |
col_names = None , |
|||
where_body = None , |
|||
where_cols = None , |
|||
params = [] , |
|||
order_by = None , |
|||
order_by_dirs = None , |
|||
dump_sql = False , |
|||
fcn_col_names = None , |
|||
fcns = None , |
|||
group_by = None |
|||
) |
Selects data from the database.
This method works using the CSV names of the columns you want to select.
col_names | (list=None) a list of the csv names of the columns you'd like to select. |
where_cond | (String=None) a string like ('s = ?'). This will be translated into 'where age = 14' using the where_cols and params parameters (see below). |
where_cols | (list) list of csv names of columns to put in the s placeholders in where_cond. |
params | (list=[]) see database.select(). These values will be placed in the ? placeholders in where_cond. |
order_by | (list=None) list of csv names of columns you'd like to order by |
order_by_dirs | (list=None) list of strings (each must be either 'ASC', 'DESC') that indicate the direction in which you want to order the columns specified by order_by. |
dump_sql | (boolean=False) if True, prints the generated SQL statement to stdout |
fcn_col_names | (list=None) list of csv names of columns you'd like to apply SQL aggregate functions to ('avg', 'sum', etc.). |
fcns | (list=None) list of Strings (SQL functions like 'avg', 'sum', etc.) to apply to the columns specified by fcn_col_names |
group_by | (int=None) csv name of column you'd like to group by |
Definition at line 247 of file csv_database.py.
def db.csv_database.CSVDatabase.csv_update_by_index | ( | self, | |
col_indices, | |||
where_cond = None , |
|||
params = [] , |
|||
dump_sql = False |
|||
) |
Generates and executes a SQL update statement.
This method uses the indices of the columns in the csv file. See csv_select_by_index() for a description of the parameters.
self | |
col_indices | (list) |
where_cond | (String=None) |
params | (list) |
dump_sql | (boolean=False) |
Definition at line 280 of file csv_database.py.
def db.csv_database.CSVDatabase.csv_update_by_name | ( | self, | |
col_names, | |||
where_body = None , |
|||
where_cols = None , |
|||
params = [] , |
|||
dump_sql = False |
|||
) |
Generates and executes a SQL update statement.
This method uses the names of the columns in the csv file. See csv_select_by_name() for a description of the parameters. Sample usage: col_names = ['File_Name', 'Child_Gender'] where_body = 's > ? and s > ?' where_cols = ('FAN, 'CHN') params = (23.3, 45.5, 20.0, 10.0) Using these parameters will generate the following SQL: 'update data set col0=23.3, col3=45.5 where col10 > 20.0 and col11 > 10.0;' Assuming 'File_Name' -> col0, 'Child_Gender' -> col3, 'FAN' -> col10, 'CHN' -> col11.
self | |
where_body | (String=None) |
where_cols | (list=None) |
params | (list) |
dump_sql | (boolean=False) |
Definition at line 340 of file csv_database.py.
def db.csv_database.CSVDatabase.get_db_col_name | ( | self, | |
text_name | |||
) |
Definition at line 356 of file csv_database.py.
def db.csv_database.CSVDatabase.get_db_col_names | ( | self | ) |
Fetches a list of all of the DB column names from the DB table.
self |
Definition at line 217 of file csv_database.py.
def db.csv_database.CSVDatabase.set_blanks_to_null | ( | self | ) |
Sets all values in all columns to NULL if they are equal to the empty string.
self |
Definition at line 414 of file csv_database.py.
def db.csv_database.CSVDatabase.write_to_file | ( | self, | |
path, | |||
omit_col_indices = [] |
|||
) |
Writes out the data in this CSVDatabase contains to a csv file.
A header row (column names) will be written at the top of the file.
self | |
path | (String) path to the csv file to write to |
omit_col_indices | (list=[]) list of indices of columns that you want to omit from the exported csv file. |
Definition at line 395 of file csv_database.py.
db.csv_database.CSVDatabase.col_datatypes |
Definition at line 61 of file csv_database.py.
|
static |
Definition at line 41 of file csv_database.py.
|
static |
Definition at line 42 of file csv_database.py.
db.csv_database.CSVDatabase.header_row |
Definition at line 60 of file csv_database.py.
db.csv_database.CSVDatabase.name_lookup |
Definition at line 59 of file csv_database.py.
|
static |
Definition at line 50 of file csv_database.py.
|
static |
Definition at line 40 of file csv_database.py.