Python CSV to SQLite - python

I am "converting" a large (~1.6GB) CSV file and inserting specific fields of the CSV into a SQLite database. Essentially my code looks like:
import csv, sqlite3
conn = sqlite3.connect( "path/to/file.db" )
conn.text_factory = str #bugger 8-bit bytestrings
cur = conn.cur()
cur.execute('CREATE TABLE IF NOT EXISTS mytable (field2 VARCHAR, field4 VARCHAR)')
reader = csv.reader(open(filecsv.txt, "rb"))
for field1, field2, field3, field4, field5 in reader:
cur.execute('INSERT OR IGNORE INTO mytable (field2, field4) VALUES (?,?)', (field2, field4))
Everything works as I expect it to with the exception... IT TAKES AN INCREDIBLE AMOUNT OF TIME TO PROCESS. Am I coding it incorrectly? Is there a better way to achieve a higher performance and accomplish what I'm needing (simply convert a few fields of a CSV into SQLite table)?
**EDIT -- I tried directly importing the csv into sqlite as suggested but it turns out my file has commas in fields (e.g. "My title, comma"). That's creating errors with the import. It appears there are too many of those occurrences to manually edit the file...
any other thoughts??**

Chris is right - use transactions; divide the data into chunks and then store it.
"... Unless already in a transaction, each SQL statement has a new transaction started for it. This is very expensive, since it requires reopening, writing to, and closing the journal file for each statement. This can be avoided by wrapping sequences of SQL statements with BEGIN TRANSACTION; and END TRANSACTION; statements. This speedup is also obtained for statements which don't alter the database." - Source: http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html
"... there is another trick you can use to speed up SQLite: transactions. Whenever you have to do multiple database writes, put them inside a transaction. Instead of writing to (and locking) the file each and every time a write query is issued, the write will only happen once when the transaction completes." - Source: How Scalable is SQLite?
import csv, sqlite3, time
def chunks(data, rows=10000):
""" Divides the data into 10000 rows each """
for i in xrange(0, len(data), rows):
yield data[i:i+rows]
if __name__ == "__main__":
t = time.time()
conn = sqlite3.connect( "path/to/file.db" )
conn.text_factory = str #bugger 8-bit bytestrings
cur = conn.cur()
cur.execute('CREATE TABLE IF NOT EXISTS mytable (field2 VARCHAR, field4 VARCHAR)')
csvData = csv.reader(open(filecsv.txt, "rb"))
divData = chunks(csvData) # divide into 10000 rows each
for chunk in divData:
cur.execute('BEGIN TRANSACTION')
for field1, field2, field3, field4, field5 in chunk:
cur.execute('INSERT OR IGNORE INTO mytable (field2, field4) VALUES (?,?)', (field2, field4))
cur.execute('COMMIT')
print "\n Time Taken: %.3f sec" % (time.time()-t)

It's possible to import the CSV directly:
sqlite> .separator ","
sqlite> .import filecsv.txt mytable
http://www.sqlite.org/cvstrac/wiki?p=ImportingFiles

As it's been said (Chris and Sam), transactions do improve a lot insert performance.
Please, let me recommend another option, to use a suite of Python utilities to work with CSV, csvkit.
To install:
pip install csvkit
To solve your problem
csvsql --db sqlite:///path/to/file.db --insert --table mytable filecsv.txt

Try using transactions.
begin
insert 50,000 rows
commit
That will commit data periodically rather than once per row.

Pandas makes it easy to load big files into databases in chunks. Read the CSV file into a Pandas DataFrame and then use the Pandas SQL writer (so Pandas does all the hard work). Here's how to load the data in 100,000 row chunks.
import pandas as pd
orders = pd.read_csv('path/to/your/file.csv')
orders.to_sql('orders', conn, if_exists='append', index = False, chunksize=100000)
Modern Pandas versions are very performant. Don't reinvent the wheel. See here for more info.

Related

Upload an entire CSV into SQL Server [duplicate]

Below is my code that I'd like some help with.
I am having to run it over 1,300,000 rows meaning it takes up to 40 minutes to insert ~300,000 rows.
I figure bulk insert is the route to go to speed it up?
Or is it because I'm iterating over the rows via for data in reader: portion?
#Opens the prepped csv file
with open (os.path.join(newpath,outfile), 'r') as f:
#hooks csv reader to file
reader = csv.reader(f)
#pulls out the columns (which match the SQL table)
columns = next(reader)
#trims any extra spaces
columns = [x.strip(' ') for x in columns]
#starts SQL statement
query = 'bulk insert into SpikeData123({0}) values ({1})'
#puts column names in SQL query 'query'
query = query.format(','.join(columns), ','.join('?' * len(columns)))
print 'Query is: %s' % query
#starts curser from cnxn (which works)
cursor = cnxn.cursor()
#uploads everything by row
for data in reader:
cursor.execute(query, data)
cursor.commit()
I am dynamically picking my column headers on purpose (as I would like to create the most pythonic code possible).
SpikeData123 is the table name.
As noted in a comment to another answer, the T-SQL BULK INSERT command will only work if the file to be imported is on the same machine as the SQL Server instance or is in an SMB/CIFS network location that the SQL Server instance can read. Thus it may not be applicable in the case where the source file is on a remote client.
pyodbc 4.0.19 added a Cursor#fast_executemany feature which may be helpful in that case. fast_executemany is "off" by default, and the following test code ...
cnxn = pyodbc.connect(conn_str, autocommit=True)
crsr = cnxn.cursor()
crsr.execute("TRUNCATE TABLE fast_executemany_test")
sql = "INSERT INTO fast_executemany_test (txtcol) VALUES (?)"
params = [(f'txt{i:06d}',) for i in range(1000)]
t0 = time.time()
crsr.executemany(sql, params)
print(f'{time.time() - t0:.1f} seconds')
... took approximately 22 seconds to execute on my test machine. Simply adding crsr.fast_executemany = True ...
cnxn = pyodbc.connect(conn_str, autocommit=True)
crsr = cnxn.cursor()
crsr.execute("TRUNCATE TABLE fast_executemany_test")
crsr.fast_executemany = True # new in pyodbc 4.0.19
sql = "INSERT INTO fast_executemany_test (txtcol) VALUES (?)"
params = [(f'txt{i:06d}',) for i in range(1000)]
t0 = time.time()
crsr.executemany(sql, params)
print(f'{time.time() - t0:.1f} seconds')
... reduced the execution time to just over 1 second.
Update - May 2022: bcpandas and bcpyaz are wrappers for Microsoft's bcp utility.
Update - April 2019: As noted in the comment from #SimonLang, BULK INSERT under SQL Server 2017 and later apparently does support text qualifiers in CSV files (ref: here).
BULK INSERT will almost certainly be much faster than reading the source file row-by-row and doing a regular INSERT for each row. However, both BULK INSERT and BCP have a significant limitation regarding CSV files in that they cannot handle text qualifiers (ref: here). That is, if your CSV file does not have qualified text strings in it ...
1,Gord Thompson,2015-04-15
2,Bob Loblaw,2015-04-07
... then you can BULK INSERT it, but if it contains text qualifiers (because some text values contains commas) ...
1,"Thompson, Gord",2015-04-15
2,"Loblaw, Bob",2015-04-07
... then BULK INSERT cannot handle it. Still, it might be faster overall to pre-process such a CSV file into a pipe-delimited file ...
1|Thompson, Gord|2015-04-15
2|Loblaw, Bob|2015-04-07
... or a tab-delimited file (where → represents the tab character) ...
1→Thompson, Gord→2015-04-15
2→Loblaw, Bob→2015-04-07
... and then BULK INSERT that file. For the latter (tab-delimited) file the BULK INSERT code would look something like this:
import pypyodbc
conn_str = "DSN=myDb_SQLEXPRESS;"
cnxn = pypyodbc.connect(conn_str)
crsr = cnxn.cursor()
sql = """
BULK INSERT myDb.dbo.SpikeData123
FROM 'C:\\__tmp\\biTest.txt' WITH (
FIELDTERMINATOR='\\t',
ROWTERMINATOR='\\n'
);
"""
crsr.execute(sql)
cnxn.commit()
crsr.close()
cnxn.close()
Note: As mentioned in a comment, executing a BULK INSERT statement is only applicable if the SQL Server instance can directly read the source file. For cases where the source file is on a remote client, see this answer.
yes bulk insert is right path for loading large files into a DB. At a glance I would say that the reason it takes so long is as you mentioned you are looping over each row of data from the file which effectively means are removing the benefits of using a bulk insert and making it like a normal insert. Just remember that as it's name implies that it is used to insert chucks of data.
I would remove loop and try again.
Also I'd double check your syntax for bulk insert as it doesn't look correct to me. check the sql that is generated by pyodbc as I have a feeling that it might only be executing a normal insert
Alternatively if it is still slow I would try using bulk insert directly from sql and either load the whole file into a temp table with bulk insert then insert the relevant column into the right tables. or use a mix of bulk insert and bcp to get the specific columns inserted or OPENROWSET.
This problem was frustrating me and I didn't see much improvement using fast_executemany until I found this post on SO. Specifically, Bryan Bailliache's comment regarding max varchar. I had been using SQLAlchemy and even ensuring better datatype parameters did not fix the issue for me; however, switching to pyodbc did. I also took Michael Moura's advice of using a temp table and found it shaved of even more time. I wrote a function in case anyone might find it useful. I wrote it to take either a list or list of lists for the insert. It took my insert of the same data using SQLAlchemy and Pandas to_sql from taking upwards of sometimes 40 minutes down to just under 4 seconds. I may have been misusing my former method though.
connection
def mssql_conn():
conn = pyodbc.connect(driver='{ODBC Driver 17 for SQL Server}',
server=os.environ.get('MS_SQL_SERVER'),
database='EHT',
uid=os.environ.get('MS_SQL_UN'),
pwd=os.environ.get('MS_SQL_PW'),
autocommit=True)
return conn
Insert function
def mssql_insert(table,val_lst,truncate=False,temp_table=False):
'''Use as direct connection to database to insert data, especially for
large inserts. Takes either a single list (for one row),
or list of list (for multiple rows). Can either append to table
(default) or if truncate=True, replace existing.'''
conn = mssql_conn()
cursor = conn.cursor()
cursor.fast_executemany = True
tt = False
qm = '?,'
if isinstance(val_lst[0],list):
rows = len(val_lst)
params = qm * len(val_lst[0])
else:
rows = 1
params = qm * len(val_lst)
val_lst = [val_lst]
params = params[:-1]
if truncate:
cursor.execute(f"TRUNCATE TABLE {table}")
if temp_table:
#create a temp table with same schema
start_time = time.time()
cursor.execute(f"SELECT * INTO ##{table} FROM {table} WHERE 1=0")
table = f"##{table}"
#set flag to indicate temp table was used
tt = True
else:
start_time = time.time()
#insert into either existing table or newly created temp table
stmt = f"INSERT INTO {table} VALUES ({params})"
cursor.executemany(stmt,val_lst)
if tt:
#remove temp moniker and insert from temp table
dest_table = table[2:]
cursor.execute(f"INSERT INTO {dest_table} SELECT * FROM {table}")
print('Temp table used!')
print(f'{rows} rows inserted into the {dest_table} table in {time.time() -
start_time} seconds')
else:
print('No temp table used!')
print(f'{rows} rows inserted into the {table} table in {time.time() -
start_time} seconds')
cursor.close()
conn.close()
And my console results first using a temp table and then not using one (in both cases, the table contained data at the time of execution and Truncate=True):
No temp table used!
18204 rows inserted into the CUCMDeviceScrape_WithForwards table in 10.595500707626343
seconds
Temp table used!
18204 rows inserted into the CUCMDeviceScrape_WithForwards table in 3.810380458831787
seconds
FWIW, I gave a few methods of inserting to SQL Server some testing of my own. I was actually able to get the fastest results by using SQL Server Batches and using pyodbcCursor.execute statements. I did not test the save to csv and BULK INSERT, I wonder how it compares.
Here's my blog on the testing I did:
http://jonmorisissqlblog.blogspot.com/2021/05/python-pyodbc-and-batch-inserts-to-sql.html
adding to Gord Thompson's answer:
# add the below line for controlling batch size of insert
cursor.fast_executemany_rows = batch_size # by default it is 1000

basic pyodbc bulk insert

In a python script, I need to run a query on one datasource and insert each row from that query into a table on a different datasource. I'd normally do this with a single insert/select statement with a tsql linked server join but I don't have a linked server connection to this particular datasource.
I'm having trouble finding a simple pyodbc example of this. Here's how I'd do it but I'm guessing executing an insert statement inside a loop is pretty slow.
result = ds1Cursor.execute(selectSql)
for row in result:
insertSql = "insert into TableName (Col1, Col2, Col3) values (?, ?, ?)"
ds2Cursor.execute(insertSql, row[0], row[1], row[2])
ds2Cursor.commit()
Is there a better bulk way to insert records with pyodbc? Or is this a relatively efficient way to do this anyways. I'm using SqlServer 2012, and the latest pyodbc and python versions.
The best way to handle this is to use the pyodbc function executemany.
ds1Cursor.execute(selectSql)
result = ds1Cursor.fetchall()
ds2Cursor.executemany('INSERT INTO [TableName] (Col1, Col2, Col3) VALUES (?, ?, ?)', result)
ds2Cursor.commit()
Here's a function that can do the bulk insert into SQL Server database.
import pyodbc
import contextlib
def bulk_insert(table_name, file_path):
string = "BULK INSERT {} FROM '{}' (WITH FORMAT = 'CSV');"
with contextlib.closing(pyodbc.connect("MYCONN")) as conn:
with contextlib.closing(conn.cursor()) as cursor:
cursor.execute(string.format(table_name, file_path))
conn.commit()
This definitely works.
UPDATE: I've noticed at the comments, as well as coding regularly, that pyodbc is better supported than pypyodbc.
NEW UPDATE: remove conn.close() since the with statement handles that automatically.
Since the discontinuation of the pymssql library (which seems to be under development again) we started using the cTDS library developed by the smart people at Zillow and for our surprise it supports the FreeTDS Bulk Insert.
As the name suggests cTDS is written in C on top of FreeTDS library, which makes it fast, really fast. IMHO this is the best way to bulk insert into SQL Server since the ODBC driver does not support bulk insert and executemany or fast_executemany as suggested aren't really bulk insert operations. The BCP tool and T-SQL Bulk Insert has it limitations since it needs the file to be accessible by the SQL Server which can be a deal breaker in many scenarios.
Bellow a naive implementation of Bulk Inserting a CSV file. Please, forgive me for any bug, I wrote this from mind without testing.
I don't know why but for my server which uses Latin1_General_CI_AS I needed to wrap the data which goes into NVarChar columns with ctds.SqlVarChar. I opened an issue about this but developers said the naming is correct, so I changed my code to keep me mentally health.
import csv
import ctds
def _to_varchar(txt: str) -> ctds.VARCHAR:
"""
Wraps strings into ctds.NVARCHAR.
"""
if txt == "null":
return None
return ctds.SqlNVarChar(txt)
def _to_nvarchar(txt: str) -> ctds.VARCHAR:
"""
Wraps strings into ctds.VARCHAR.
"""
if txt == "null":
return None
return ctds.SqlVarChar(txt.encode("utf-16le"))
def read(file):
"""
Open CSV File.
Each line is a column:value dict.
https://docs.python.org/3/library/csv.html?highlight=csv#csv.DictReader
"""
with open(file, newline='') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
yield row
def transform(row):
"""
Do transformations to data before loading.
Data specified for bulk insertion into text columns (e.g. VARCHAR,
NVARCHAR, TEXT) is not encoded on the client in any way by FreeTDS.
Because of this behavior it is possible to insert textual data with
an invalid encoding and cause the column data to become corrupted.
To prevent this, it is recommended the caller explicitly wrap the
the object with either ctds.SqlVarChar (for CHAR, VARCHAR or TEXT
columns) or ctds.SqlNVarChar (for NCHAR, NVARCHAR or NTEXT columns).
For non-Unicode columns, the value should be first encoded to
column’s encoding (e.g. latin-1). By default ctds.SqlVarChar will
encode str objects to utf-8, which is likely incorrect for most SQL
Server configurations.
https://zillow.github.io/ctds/bulk_insert.html#text-columns
"""
row["col1"] = _to_datetime(row["col1"])
row["col2"] = _to_int(row["col2"])
row["col3"] = _to_nvarchar(row["col3"])
row["col4"] = _to_varchar(row["col4"])
return row
def load(rows):
stime = time.time()
with ctds.connect(**DBCONFIG) as conn:
with conn.cursor() as curs:
curs.execute("TRUNCATE TABLE MYSCHEMA.MYTABLE")
loaded_lines = conn.bulk_insert("MYSCHEMA.MYTABLE", map(transform, rows))
etime = time.time()
print(loaded_lines, " rows loaded in ", etime - stime)
if __name__ == "__main__":
load(read('data.csv'))
You should use executemany with the cursor.fast_executemany = True, to improve the performance.
pyodbc's default behaviour is to run many inserts, but this is inefficient. By applying fast_executemany, you can drastically improve performance.
Here is an example:
connection = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server}',host='host', database='db', user='usr', password='foo')
cursor = connection.cursor()
# I'm the important line
cursor.fast_executemany = True
sql = "insert into TableName (Col1, Col2, Col3) values (?, ?, ?)"
tuples=[('foo','bar', 'ham'), ('hoo','far', 'bam')]
cursor.executemany(sql, tuples)
cursor.commit()
cursor.close()
connection.close()
Docs.
Note that this has been available since 4.0.19 Oct 23, 2017
Helpful function for generating the SQL required for using execute_many():
def generate_bulk_insert_sql(self, data:pd.DataFrame, table_name) -> str:
table_sql = str([c for c in data.columns]).replace("'","").replace("[", "").replace("]", "")
return f'INSERT INTO {table_name} ({table_sql}) VALUES ({("?,"*len(data.columns))[:-1]})

How to reference Python object and place it into MySQL?

What I'm trying to achieve is placing data generated with a python script into a MySQL database. So far I have been able to generate the data with Python, and print it, but I'm not too sure how I can place that into the mysql table. I've read that you can use MySQLdb as a connector between the two, and currently I have been able to place data into the table, but only data that I manually type.
Hopefully the code makes sense, but I am trying to place the values from BidSize, AskSize, BidPrice, and AskPrice into the table.
from wrapper_v3 import IBWrapper, IBclient
from swigibpy import Contract as IBcontract
if __name__=="__main__":
"""
This simple example returns streaming price data
"""
callback = IBWrapper()
client=IBclient(callback)
ibcontract = IBcontract()
ibcontract.secType = "FUT"
ibcontract.expiry="201612"
ibcontract.symbol="GE"
ibcontract.exchange="GLOBEX"
ans=client.get_IB_market_data(ibcontract)
print "Bid size, Ask size; Bid price; Ask price"
print ans
import MySQLdb as mdb
con = mdb.connect('localhost', 'testuser', 'test623', 'testdb');
with con:
cur = con.cursor()
cur.execute("DROP TABLE IF EXISTS Histdata")
cur.execute("CREATE TABLE Histdata(Id INT PRIMARY KEY AUTO_INCREMENT, \
Name VARCHAR(25), BidSize VARCHAR(20), AskSize VARCHAR(20), BidPrice VARCHAR(20), AskPrice VARCHAR(20))")
cur.execute("INSERT INTO Histdata(Name,BidSize,AskSize,BidPrice,AskPrice) VALUES('test','test','test','test','test'))")
Parameterize your query like this:
# assuming ans is a list
bid_size, ask_size, bid_price, ask_price = ans
cur.execute('''
INSERT INTO Histdata
(Name,BidSize,AskSize,BidPrice,AskPrice)
VALUES(%s,%s,%s,%s,%s)''', ('test', bid_size, ask_size, bid_price, ask_price))
As Fabricator correctly pointed out, please parametrize your queries to prevent SQL injections.
Yet, the reason why you are not seeing the data written to your table is because you did not commit the transaction. Your cursor is adding queries to the transaction, but it is only after you instruct a commit that the data is actually written to the database.
Add con.commit() after adding your cursor queries to finalize the transaction.
In older versions of the MySQLdb adapter for Python data would be committed automatically. But this is considered "bad practice" as the programmer's control over transactions should be honored. This flow has also been laid out in Python's DB API, which all database adapters are asked to honor.
As a side note: Don't forget to close both your cursor and your connection when you are done:
cursor.close()
conn.close()
Otherwise your connection will stay open in an idle state and you could potentially deplete your pool of connection slots.

oursql extremely slow in inserting data

I am trying to store some data generated by a python script in a MySQL database. Essentially I am using the commands:
con = oursql.connect(user="user", host="host", passwd="passwd",
db="testdb")
c = con.cursor()
c.executemany(insertsimoutput, zippedsimoutput)
con.commit()
c.close()
where,
insertsimoutput = '''insert into simoutput
(repnum,
timepd,
...) values (?, ?, ...?)'''
About 30,000 rows are inserted and there are about 15 columns. The above takes about 7 minutes. If I use MySQLdb instead of oursql, it takes about 2 seconds. Why this huge difference? Is this supposed to be done some other way in oursql, our oursql is just plain slow? If there is a better way to insert this data with oursql, I would appreciate if you can let me know.
Thank you.
The difference is that MySQLdb does some hackery to your query while oursql does not...
Taking this:
cursor.executemany("INSERT INTO sometable VALUES (%s, %s, %s)",
[[1,2,3],[4,5,6],[7,8,9]])
MySQLdb translates it before running into this:
cursor.execute("INSERT INTO sometable VALUES (1,2,3),(4,5,6),(7,8,9)")
But if you do:
cursor.executemany("INSERT INTO sometable VALUES (?, ?, ?)",
[[1,2,3],[4,5,6],[7,8,9]])
In oursql, it gets translated into something like this pseudocode:
stmt = prepare("INSERT INTO sometable VALUES (?, ?, ?)")
for params in [[1,2,3],[4,5,6],[7,8,9]]:
stmt.execute(*params)
So if you want to emulate what mysqldb is doing but benefit from prepared statements and other goodness with oursql, you need to do this:
from itertools import chain
data = [[1,2,3],[4,5,6],[7,8,9]]
one_val = "({})".format(','.join("?" for i in data[0]))
vals_clause = ','.join(one_val for i in data)
cursor.execute("INSERT INTO sometable VALUES {}".format(vals_clause),
chain.from_iterable(data))
I bet oursql will be faster when you do this :-)
Also, if you think its ugly, you are right. But just remember MySQL db is doing something uglier internally - its using regular expressions to parse your INSERT statement and break off the parameterized part and THEN doing what I suggested you do for oursql.
I would say to check if oursql supports a bulk insert sql command to boost performance.
Oursql does support bulk insert statements. I've written code to do so, using the sqlalchemy wrapper.
For pure oursql, something like this should be fine:
with open('tmp.csv', 'wb') as tmp:
for item in zippedsimoutput:
tmp.write("{0}\n".format(item))
c.execute("""LOAD DATA LOCAL INFILE 'tmp.csv' INTO TABLE flags FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n' ;""")
Note that the rows must be in the same order as the columns on the database.

Python copy MySQL table to SQLite3

I've got a MySQL table with about ~10m rows. I created a parallel schema in SQLite3, and I'd like to copy the table somehow. Using Python seems like an acceptable solution, but this way --
# ...
mysqlcursor.execute('SELECT * FROM tbl')
rows = mysqlcursor.fetchall() # or mysqlcursor.fetchone()
for row in rows:
# ... insert row via sqlite3 cursor
...is incredibly slow (hangs at .execute(), I wouldn't know for how long).
I'd only have to do this once, so I don't mind if it takes a couple of hours, but is there a different way to do this? Using a different tool rather than Python is also acceptable.
The simplest way might be to use mysqldump to get a SQL file of the whole db, then use the SQLite command-line tool to execute the file.
You don't show exactly how you insert rows, but you mention execute().
You might try executemany()* instead.
For example:
import sqlite3
conn = sqlite3.connect('mydb')
c = conn.cursor()
# one '?' placeholder for each column you're inserting
# "rows" needs to be a sequence of values, e.g. ((1,'a'), (2,'b'), (3,'c'))
c.executemany("INSERT INTO tbl VALUES (?,?);", rows)
conn.commit()
*executemany() as described in the Python DB-API:
.executemany(operation,seq_of_parameters)
Prepare a database operation (query or
command) and then execute it against
all parameter sequences or mappings
found in the sequence
seq_of_parameters.
You can export a flat file from mysql using select into outfile and import those with sqlite's .import:
mysql> SELECT * INTO OUTFILE '/tmp/export.txt' FROM sometable;
sqlite> .separator "\t"
sqlite> .import /tmp/export.txt sometable
This handles the data export/import but not copying the schema, of course.
If you really want to do this with python (maybe to transform the data), I would use a MySQLdb.cursors.SSCursor to iterate over the data - otherwise the mysql resultset gets cached in memory which is why your query is hanging on execute. So that would look something like:
import MySQLdb
import MySQLdb.cursors
connection = MySQLdb.connect(...)
cursor = connection.cursor(MySQLdb.cursors.SSCursor)
cursor.execute('SELECT * FROM tbl')
for row in cursor:
# do something with row and add to sqlite database
That will be much slower than the export/import approach.

Categories

Resources