Baby Language Lab Scripts
A collection of data processing tools.
 All Classes Namespaces Files Functions Variables Pages
db.csv_database.CSVDatabase Class Reference

This class provides a way to convert a csv file into an in-memory database. More...

Inheritance diagram for db.csv_database.CSVDatabase:

Public Member Functions

def __init__
 Constructor. More...
 
def csv_insert
 Inserts a row of data straight from a csv file. More...
 
def csv_select_by_index
 Selects data from the database. More...
 
def get_db_col_names
 Fetches a list of all of the DB column names from the DB table. More...
 
def csv_select_by_name
 Selects data from the database. More...
 
def csv_update_by_index
 Generates and executes a SQL update statement. More...
 
def csv_delete_by_index
 Generates and executes a SQL delete statement. More...
 
def csv_delete_by_name
 Generates and executes a SQL delete statement. More...
 
def csv_update_by_name
 Generates and executes a SQL update statement. More...
 
def get_db_col_name
 
def add_column
 Adds a named column to the this database's single table. More...
 
def write_to_file
 Writes out the data in this CSVDatabase contains to a csv file. More...
 
def set_blanks_to_null
 Sets all values in all columns to NULL if they are equal to the empty string. More...
 
- Public Member Functions inherited from db.mem_database.MemDatabase
def __init__
 Constructor. More...
 
def dump_to_file
 Writes this in-memory database to a *.db file on disk. More...
 
- Public Member Functions inherited from db.database.Database
def __init__
 Constructor. More...
 
def close
 Closes the database connection, releasing any resources that it was using. More...
 
def execute_stmt
 Executes a verbatim SQL statement. More...
 
def execute_script
 Reads and executes a sql script file. More...
 
def insert
 Inserts one or more rows into a table in the database. More...
 
def select
 Selects rows from a table in the database. More...
 
def delete
 Deletes rows from a table in the database. More...
 
def update_timestamp_col
 Updates the value of a timestamp column to the current date/time. More...
 
def update
 Updates rows in a table. More...
 
def write_to_file
 Writes out all of the data in a table to a csv file. More...
 

Static Public Member Functions

def create_with_rows
 Convenience method to create a database from a list of rows that have already been read from a csv file. More...
 
def create_with_files
 Convenience method to create a database from a bunch of files that all have the same columns (and column order). More...
 

Public Attributes

 name_lookup
 
 header_row
 
 col_datatypes
 

Static Public Attributes

string TABLE_NAME = 'data'
 
string COL_PREFIX = 'col'
 
dictionary DATA_TYPES
 
tuple ORDER_DIRS = Enum(['ASC', 'DESC'], ['ASC', 'DESC'])
 

Private Member Functions

def _create_data_table
 Creates the single table used by this in-memory database. More...
 
def _get_col_indices
 Gets a list of the indices of columns you specify by name. More...
 
def _build_where_clause
 Translates CSV column names to DB column names to build a SQL where clause like 'x = y'. More...
 
def _remove_cols
 Takes a row (list) of data and removes the specified columns from the list. More...
 

Detailed Description

This class provides a way to convert a csv file into an in-memory database.

This allows you to do SQL-operations on the data, which in some circumstances can be much more convenient than implementing search (query), sort, or data modification methods in straight Python. The cost for doing this is a little extra time inserting all the data - however, this cost isn't usually too bad because in-memory databases are up to an order of magnitude faster than file-based ones.

Sample usage:

1 #create a list of data types corresponding to the format of (a row in) your csv file
2 data_types = [str, int, float, bool, str, str, str]
3 
4 #this will create a single in-memory table with the contents of multiple csv files (files must have the same columns)
5 db = CSVDatabase.create_with_files(['myFile1.csv', 'myFile2.csv'], data_types)
6 
7 #select Age, Gender from table where Age > 12
8 rows = db.csv_select_by_name(col_names=['Age', 'Gender'],
9  where_body='%s > ?',
10  where_cols=['Age'], params=[12])
11 
12 #you can append new named columns to the db that do not exist in the csv file
13 db.add_column('Extra column', bool)
14 
15 #See also csv_insert(), csv_delete_by_name(), csv_update_by_name()
16 
17 #You can write out the database to a csv file like this:
18 db.write_to_file('path/to/database.db', omit_col_indices=[0]) #this will omit the first column from the exported spreadsheet

Definition at line 39 of file csv_database.py.

Constructor & Destructor Documentation

def db.csv_database.CSVDatabase.__init__ (   self,
  header_row,
  col_datatypes 
)

Constructor.

Parameters
self
header_row(list of Strings) List of names of the columns in the csv file (generally this is just the top row of the csv file
col_datatypes(list) List of data type functions (these must be keys from the static CSVDatabase.DATA_TYPES dictionary) corresponding to the columns in your csv file.

Definition at line 56 of file csv_database.py.

Member Function Documentation

def db.csv_database.CSVDatabase._build_where_clause (   self,
  where_body,
  where_cols 
)
private

Translates CSV column names to DB column names to build a SQL where clause like 'x = y'.

This will presumably be attached to a SQL statement to form '... where x = y'.

Parameters
self
where_body(String) body of the where clause, containing placeholders (s) for the column names (Eg. 's = ?')
where_cols(list) list of csv names of columns. These will be translated to DB column names and placed in the where_body (these go in the s placeholders).
Returns
(String) the clause. This contains db column names.

Definition at line 204 of file csv_database.py.

def db.csv_database.CSVDatabase._create_data_table (   self)
private

Creates the single table used by this in-memory database.

Parameters
self

Definition at line 105 of file csv_database.py.

def db.csv_database.CSVDatabase._get_col_indices (   self,
  col_names 
)
private

Gets a list of the indices of columns you specify by name.

Parameters
self
col_names(list) list of column names (Strings)
Returns
(list) list of the indices that correspond to the column names you specified

Definition at line 190 of file csv_database.py.

def db.csv_database.CSVDatabase._remove_cols (   self,
  row,
  col_indices 
)
private

Takes a row (list) of data and removes the specified columns from the list.

This is a convenience method and does not affect the data in the DB in any way.

Parameters
self
row(list) a row of values from the database or csv file
col_indices(list) list of the column indices that you want to remove from the row
Returns
(list) a new row that does not contain any of the columns specified using col_indices.

Definition at line 382 of file csv_database.py.

def db.csv_database.CSVDatabase.add_column (   self,
  name,
  datatype 
)

Adds a named column to the this database's single table.

The values in the newly added column are all set to NULL.

Parameters
self
name(string) name of the column to add.
datatype(fcn) a data-type function (must be in the keys of the static CSVDatabase.DATA_TYPES dictionary) corresponding to the type of data that this column will hold

Definition at line 364 of file csv_database.py.

def db.csv_database.CSVDatabase.create_with_files (   file_list,
  col_datatypes 
)
static

Convenience method to create a database from a bunch of files that all have the same columns (and column order).

Note: This method will complain if it finds that any of the files you pass it have different column names.

Parameters
file_list(List) list of paths (Strings) to the files whose data you want to put into the new database
col_datatypes(list) List of data type functions (these must be keys from the static CSVDatabase.DATA_TYPES dictionary) corresponding to the columns in your csv file.
Returns
(CSVDatabase) a CSVDatabase object containing the rows from all of the files. This won't have a primary key, so if the data is large you may want to add some indices before you start querying.

Definition at line 82 of file csv_database.py.

def db.csv_database.CSVDatabase.create_with_rows (   rows,
  col_datatypes 
)
static

Convenience method to create a database from a list of rows that have already been read from a csv file.

Parameters
rows(list) List of lists, where inner lists are rows of csv data. First row must be column names.
col_datatypes(list) List of data type functions (these must be keys from the static CSVDatabase.DATA_TYPES dictionary) corresponding to the columns in your csv file.

Definition at line 68 of file csv_database.py.

def db.csv_database.CSVDatabase.csv_delete_by_index (   self,
  where_cond = None,
  params = [] 
)

Generates and executes a SQL delete statement.

This method uses the indices of the columns in the csv file. See csv_select_by_index() for a description of the parameters.

Parameters
self
where_cond(String=None)
params(list)
Returns
(int) number of rows deleted

Definition at line 300 of file csv_database.py.

def db.csv_database.CSVDatabase.csv_delete_by_name (   self,
  where_body = None,
  where_cols = None,
  params = [] 
)

Generates and executes a SQL delete statement.

This method uses the names of the columns in the csv file. See csv_select_by_name() for a description of the parameters.

Parameters
self
where_body(String=None)
where_cols(list=None)
params(list)
Returns
(int) number of rows deleted

Definition at line 318 of file csv_database.py.

def db.csv_database.CSVDatabase.csv_insert (   self,
  csv_row 
)

Inserts a row of data straight from a csv file.

The data must have the types given in the col_datatypes param that was passed to the constructor or this will cause an exception.

Parameters
self
csv_row(list) list of data from a csv file (corresponding to a single row)
Returns
(list) list of all the primary key values of the inserted rows

Definition at line 118 of file csv_database.py.

def db.csv_database.CSVDatabase.csv_select_by_index (   self,
  col_indices = None,
  where_cond = None,
  params = [],
  order_by_indices = None,
  order_by_dirs = None,
  dump_sql = False,
  fcn_indices = None,
  fcns = None,
  group_by_index = None 
)

Selects data from the database.

This method works using the indices of the columns in the CSV file that you'd like to select.

Parameters
col_indices(list=None) a list of the indices of the columns you'd like to select.
where_cond(String=None) see database.select()
params(list=[]) see database.select()
order_by_indices(list=None) list of indices of columns you'd like to order by
order_by_dirs(list=None) list of strings (each must be either 'ASC', 'DESC') that indicate the direction in which you want to order the columns specified by order_by_indices.
dump_sql(boolean=False) if True, prints the generated SQL statement to stdout
fcn_indices(list=None) list of indices of columns you'd like to apply SQL aggregate functions to ('avg', 'sum', etc.).
fcns(list=None) list of Strings (SQL functions like 'avg', 'sum', etc.) to apply to the columns specified by fcn_indices
group_by_index(int=None) index of column you'd like to group by
Returns
(list) list of lists, one sub-list per row. Each sublist contains data in the order specified by col_indices.

Definition at line 146 of file csv_database.py.

def db.csv_database.CSVDatabase.csv_select_by_name (   self,
  col_names = None,
  where_body = None,
  where_cols = None,
  params = [],
  order_by = None,
  order_by_dirs = None,
  dump_sql = False,
  fcn_col_names = None,
  fcns = None,
  group_by = None 
)

Selects data from the database.

This method works using the CSV names of the columns you want to select.

Parameters
col_names(list=None) a list of the csv names of the columns you'd like to select.
where_cond(String=None) a string like ('s = ?'). This will be translated into 'where age = 14' using the where_cols and params parameters (see below).
where_cols(list) list of csv names of columns to put in the s placeholders in where_cond.
params(list=[]) see database.select(). These values will be placed in the ? placeholders in where_cond.
order_by(list=None) list of csv names of columns you'd like to order by
order_by_dirs(list=None) list of strings (each must be either 'ASC', 'DESC') that indicate the direction in which you want to order the columns specified by order_by.
dump_sql(boolean=False) if True, prints the generated SQL statement to stdout
fcn_col_names(list=None) list of csv names of columns you'd like to apply SQL aggregate functions to ('avg', 'sum', etc.).
fcns(list=None) list of Strings (SQL functions like 'avg', 'sum', etc.) to apply to the columns specified by fcn_col_names
group_by(int=None) csv name of column you'd like to group by
Returns
(list) list of lists, one sub-list per row. Each sublist contains data in the order specified by col_names.

Definition at line 247 of file csv_database.py.

def db.csv_database.CSVDatabase.csv_update_by_index (   self,
  col_indices,
  where_cond = None,
  params = [],
  dump_sql = False 
)

Generates and executes a SQL update statement.

This method uses the indices of the columns in the csv file. See csv_select_by_index() for a description of the parameters.

Parameters
self
col_indices(list)
where_cond(String=None)
params(list)
dump_sql(boolean=False)
Returns
(int) number of rows updated

Definition at line 280 of file csv_database.py.

def db.csv_database.CSVDatabase.csv_update_by_name (   self,
  col_names,
  where_body = None,
  where_cols = None,
  params = [],
  dump_sql = False 
)

Generates and executes a SQL update statement.

This method uses the names of the columns in the csv file. See csv_select_by_name() for a description of the parameters. Sample usage: col_names = ['File_Name', 'Child_Gender'] where_body = 's > ? and s > ?' where_cols = ('FAN, 'CHN') params = (23.3, 45.5, 20.0, 10.0) Using these parameters will generate the following SQL: 'update data set col0=23.3, col3=45.5 where col10 > 20.0 and col11 > 10.0;' Assuming 'File_Name' -> col0, 'Child_Gender' -> col3, 'FAN' -> col10, 'CHN' -> col11.

Parameters
self
where_body(String=None)
where_cols(list=None)
params(list)
dump_sql(boolean=False)

Definition at line 340 of file csv_database.py.

def db.csv_database.CSVDatabase.get_db_col_name (   self,
  text_name 
)

Definition at line 356 of file csv_database.py.

def db.csv_database.CSVDatabase.get_db_col_names (   self)

Fetches a list of all of the DB column names from the DB table.

Parameters
self
Returns
(list)

Definition at line 217 of file csv_database.py.

def db.csv_database.CSVDatabase.set_blanks_to_null (   self)

Sets all values in all columns to NULL if they are equal to the empty string.

Parameters
self

Definition at line 414 of file csv_database.py.

def db.csv_database.CSVDatabase.write_to_file (   self,
  path,
  omit_col_indices = [] 
)

Writes out the data in this CSVDatabase contains to a csv file.

A header row (column names) will be written at the top of the file.

Parameters
self
path(String) path to the csv file to write to
omit_col_indices(list=[]) list of indices of columns that you want to omit from the exported csv file.

Definition at line 395 of file csv_database.py.

Member Data Documentation

db.csv_database.CSVDatabase.col_datatypes

Definition at line 61 of file csv_database.py.

string db.csv_database.CSVDatabase.COL_PREFIX = 'col'
static

Definition at line 41 of file csv_database.py.

dictionary db.csv_database.CSVDatabase.DATA_TYPES
static
Initial value:
1 = {
2  str: 'TEXT',
3  float: 'REAL',
4  int: 'INTEGER',
5  bool: 'INTEGER', #sqlite does not have a separate boolean type. It's no trouble to convert these to integers for DB operations.
6  datetime: 'TEXT', #sqlite doesn't have a datetime type, but provides functions that allows us to interpret strings as dates.
7  #These should be entered in the form 'yyyy-mm-dd hh:mm:ss.sss'
8  }

Definition at line 42 of file csv_database.py.

db.csv_database.CSVDatabase.header_row

Definition at line 60 of file csv_database.py.

db.csv_database.CSVDatabase.name_lookup

Definition at line 59 of file csv_database.py.

tuple db.csv_database.CSVDatabase.ORDER_DIRS = Enum(['ASC', 'DESC'], ['ASC', 'DESC'])
static

Definition at line 50 of file csv_database.py.

string db.csv_database.CSVDatabase.TABLE_NAME = 'data'
static

Definition at line 40 of file csv_database.py.


The documentation for this class was generated from the following file: