Insert multiple tab-delimited text files into MySQL with Python?

I am trying to create a program that takes a number of tab-delimited text files and works through them one at a time, entering the data they hold into a MySQL database. There are several text files, like movies.txt, which looks like this:
1 Avatar
3 Iron Man
3 Star Trek
and actors.txt, which looks the same, etc. Each text file has upwards of one hundred entries, each with an id and a corresponding value as seen above. I have found a number of code examples on this site and others, but I can't quite get my head around how to implement them in this situation.
So far my code looks something like this ...
import MySQLdb
database_connection = MySQLdb.connect(host='localhost', user='root', passwd='')
cursor = database_connection.cursor()
cursor.execute('CREATE DATABASE library')
cursor.execute('USE library')
cursor.execute('''CREATE TABLE popularity (
    PersonNumber INT,
    Category VARCHAR(25),
    Value VARCHAR(60)
)
''')
def data_entry(categories):
    pass  # body not written yet
Every time I try to get the other code I have found working with this, I just get lost completely. Hoping someone can help me out by either showing me what I need to do or pointing me in the direction of some more information.
Examples of the code I have been trying to adapt to my situation are:
import csv
import MySQLdb
conn = MySQLdb.connect(host="localhost", user="usr", passwd="pass", db="databasename")
c = conn.cursor()
with open("a.txt") as f:
    for row in csv.reader(f):
        print(row)
        # pass the row as the second argument, not inside the SQL string
        c.execute("INSERT INTO a (first, last) VALUES (%s, %s)", row)
conn.commit()  # commit on the connection; cursors have no commit()
c.close()
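A minimal sketch of how the snippets above might be adapted to the files described in the question. Assumptions not taken from the thread: each line is `<id><TAB><value>` with no header row, the file name without its extension (e.g. `movies`) is stored in `Category`, and the target is the `popularity` table created earlier. The helper names `read_tab_file` and `load_tab_files` are hypothetical; the cursor can be any DB-API cursor that uses `%s` placeholders, such as one from MySQLdb.

```python
import csv
import os


def read_tab_file(path):
    """Yield (PersonNumber, Category, Value) tuples from one tab-delimited file.

    Hypothetical helper. Assumes lines look like "<id><TAB><value>" and uses the
    file name without its extension (e.g. "movies") as the Category.
    """
    category = os.path.splitext(os.path.basename(path))[0]
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE: treat quote characters in titles as ordinary text
        for fields in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(fields) < 2:  # skip blank or malformed lines
                continue
            yield int(fields[0]), category, fields[1]


def load_tab_files(cursor, paths):
    """Insert every row of every file through a DB-API cursor.

    Hypothetical helper. Uses %s placeholders (the MySQLdb parameter style)
    and returns the number of rows sent to the database.
    """
    sql = "INSERT INTO popularity (PersonNumber, Category, Value) VALUES (%s, %s, %s)"
    total = 0
    for path in paths:
        rows = list(read_tab_file(path))
        cursor.executemany(sql, rows)  # one call per file instead of one per line
        total += len(rows)
    return total
```

With the question's connection this might be called as `load_tab_files(cursor, ['movies.txt', 'actors.txt'])`, followed by `database_connection.commit()`. The caller still owns the transaction, so a failure part-way through can be rolled back as a whole.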
and:
Python File Read + Write

MySQL can read TSV files directly using the mysqlimport utility or by executing the LOAD DATA INFILE SQL command. This will be faster than processing the file in python and inserting it, but you may want to learn how to do both. Good luck!
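A sketch of the LOAD DATA route from Python, assuming the question's `popularity` table and `<id><TAB><value>` lines. `load_data_statement` is a hypothetical helper that only builds the statement and its parameters. `LOCAL` makes the client send the file, which with MySQLdb typically means connecting with `local_infile=1`, and the server must also permit local infile loading.

```python
import os


def load_data_statement(path, table="popularity"):
    """Return (sql, params) for loading one tab-delimited file with LOAD DATA.

    Hypothetical helper. Assumes "<id><TAB><value>" lines and uses the file
    name without its extension as Category. The table name is inserted
    directly into the SQL, so it must be a trusted value.
    """
    category = os.path.splitext(os.path.basename(path))[0]
    sql = (
        "LOAD DATA LOCAL INFILE %s INTO TABLE {} "
        "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n' "
        "(PersonNumber, Value) SET Category = %s"
    ).format(table)
    return sql, (os.path.abspath(path), category)
```

For example, `cursor.execute(*load_data_statement('movies.txt'))` for each file, then `commit()` on the connection. mysqlimport does the same job from the shell, but it derives the table name from the file name (movies.txt goes into a table called movies), so it fits less well when every file targets one table.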

Related

SQLITE multiple files

Sorry about this basic question, I'm kind of new to SQLite, but I was wondering if there's any way I can open two database files in the same Python command. I connect with db = sqlite3.connect('./cogs/database/users.sqlite'),
but when I do this in my command, it doesn't let me do the same thing to open another file in that command. For example, I want to:
open db = sqlite3.connect('./cogs/database/users.sqlite') and read something from it, then
open db = sqlite3.connect('./cogs/database/anotherfile.sqlite') and insert into it,
but it always accepts the first file only and ignores the second one.
Assign db1 so it connects to users.sqlite, and db2 so it connects to anotherfile.sqlite. Then you can, e.g., SELECT from one and INSERT into the other, with a temp var bridging the two.
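A minimal sketch of that two-connection approach. The table names `users` and `copied_users` and their columns are hypothetical placeholders; `rows` is the variable bridging the two connections.

```python
import sqlite3


def copy_rows(src_path, dst_path):
    """Read rows through one connection and insert them through another.

    Table and column names are hypothetical; adjust them to the real schema.
    Returns the number of rows copied.
    """
    db1 = sqlite3.connect(src_path)
    db2 = sqlite3.connect(dst_path)
    try:
        # SELECT from the first file into an ordinary Python list
        rows = db1.execute("SELECT id, name FROM users").fetchall()
        db2.execute("CREATE TABLE IF NOT EXISTS copied_users (id INTEGER, name TEXT)")
        # INSERT the bridged rows into the second file
        db2.executemany("INSERT INTO copied_users (id, name) VALUES (?, ?)", rows)
        db2.commit()
        return len(rows)
    finally:
        db1.close()
        db2.close()
```

This holds all selected rows in memory at once, which is fine for small tables; for large ones the rows could be fetched and inserted in batches.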
SQLite databases are single-file based, so no: sqlite3.connect builds a connection object to a single database file.
Even if you build two connection objects, you can't execute queries across them.
If you really need the data from two files at a time, you need to merge that data into one database, or not use SQLite.
You can execute queries across two SQLite files, but you will need to execute an ATTACH command on the first connection cursor.
import sqlite3

conn = sqlite3.connect("users.sqlite")
cur = conn.cursor()
cmd = "ATTACH DATABASE 'anotherfile.sqlite' AS otra"
try:
    cur.execute(cmd)
    query = """
        SELECT
            t1.Id, t1.Name, t2.Address
        FROM personnel t1
        LEFT JOIN otra.location t2
            ON t2.PersonId = t1.Id
        WHERE t1.Status = 'current'
        ORDER BY t1.Name;
    """
    cur.execute(query)
    rows = cur.fetchall()
except sqlite3.Error as err:
    print("SQLite error:", err)  # replace with real error handling
finally:
    conn.close()
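Applied to the original question (read from one file, insert into the other), ATTACH lets a single statement do both. A sketch, with hypothetical table names `users` and `users_copy`:

```python
import sqlite3


def copy_with_attach(main_path, other_path):
    """Copy rows from main.users into otra.users_copy with one INSERT ... SELECT.

    Table names are hypothetical. ATTACH makes the second file visible on the
    same connection under the schema name "otra". Returns the rows inserted.
    """
    conn = sqlite3.connect(main_path)
    try:
        # the file name may be bound as a parameter; ATTACH must run outside a transaction
        conn.execute("ATTACH DATABASE ? AS otra", (other_path,))
        conn.execute("CREATE TABLE IF NOT EXISTS otra.users_copy (id INTEGER, name TEXT)")
        cur = conn.execute(
            "INSERT INTO otra.users_copy (id, name) SELECT id, name FROM main.users"
        )
        conn.commit()
        return cur.rowcount
    finally:
        conn.close()
```

Because both tables are visible on one connection, the copy runs entirely inside SQLite without pulling rows into Python, and a single `commit()` covers both files.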

Unable to import CSV data into MySQL database with Python

I use the following piece of Python source code to import data from a CSV file into MySQL.
The installed versions in use are MySQL 5.7 and Python 2.7.
Furthermore, the table has 3 columns: one ID column (auto increment) and two text columns, Firstname and Lastname.
The problem seems to be around the SQL statement, in particular the "%s" bit.
I have literally tried every kind of quote character: ", ' and `, e.g. 's', "s", ("s") etc.
I have also searched for and used various source code snippets. All of them failed. I even doubted the quality of my CSV file; however, importing it via e.g. MySQL Workbench works fine.
Ploughing my way through, I managed to get data into MySQL when I added, for instance, some plain example values in the source code. So technically everything seems to be fine; it is just the parsing that fails...
Please help. Most probably I am overlooking something simple, but it is driving me utterly crazy...
import mysql.connector as mysql
import csv
db = mysql.connect(
    host = "xxxx",
    user = "xxxx",
    passwd = "xxxx",
    database = "abc"
)
cursor = db.cursor()
ifile = open('/tmp/import.csv', "rb")
read = csv.reader(ifile)
for row in ifile:
    print row
    sql = "INSERT INTO Names(Firstname, Lastname) VALUES(%s, %s)"
    cursor.execute(sql, row)
    db.commit()
    print(cursor.rowcount, "record inserted")
What I would expect to happen is that the data in the CSV is parsed into MySQL.
The error message I receive is:
mysql.connector.errors.ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '%s, %s, %s)' at line 1
Thanks for any hint, clue and/or solution!!
Assuming you are reading the CSV file correctly and passing the same number of parameters that the query expects, try replacing this line
cursor.execute(sql, row)
to
cursor.execute(sql, tuple(row))
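The thread never pins down the cause, so this is only a guess: in the posted code the loop reads `for row in ifile`, which iterates over raw text lines rather than over the `csv.reader` object `read`. Each `row` is then one string instead of a list of fields, which does not match the two `%s` placeholders. This sketch (Python 3, with an in-memory file and made-up names standing in for /tmp/import.csv) shows the difference:

```python
import csv
import io

# Stand-in for /tmp/import.csv; the contents are made up for illustration.
sample = io.StringIO("John,Smith\nJane,Doe\n")

# Iterating the file object itself yields raw lines (plain strings).
raw_rows = [line for line in sample]

sample.seek(0)  # rewind so the reader sees the same data
# Iterating the csv.reader yields one list of fields per line.
parsed_rows = [row for row in csv.reader(sample)]

sql = "INSERT INTO Names(Firstname, Lastname) VALUES(%s, %s)"
# One tuple per row, as cursor.execute(sql, tuple(row)) or executemany would expect.
params = [tuple(row) for row in parsed_rows]
```

With the reader in place, `cursor.executemany(sql, params)` followed by `db.commit()` would insert every row in one call.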
Thanks for the answer, and apologies for this late reply.
The issue has been resolved in the meantime.
It had to do with my CSV file. What exactly, I do not know yet. However, I can continue with importing.
Thanks again and have a nice day!
Best regards,
Detlev

Python 3.6 connection to MS SQL Server for large dataframe

I am a new Python coder and also a new data scientist, so please forgive anything that sounds foolish here. I'll leave out the details unless anyone's curious, but basically I need to connect to Microsoft SQL Server and upload a relatively large Pandas DataFrame (~500k rows), and as the project currently stands I need to do this almost every day.
It doesn't have to be a Pandas DF; I've read about using odo for CSV files, but I haven't been able to get anything to work. The issue I'm having is that I can't bulk insert the DF because the file isn't on the same machine as the SQL Server instance. I'm consistently getting errors like the following:
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Incorrect syntax near the keyword 'IF'. (156) (SQLExecDirectW)")
As I've attempted different SQL statements, you can replace IF with whatever the first COL_NAME in the CREATE statement has been. I'm using SQLAlchemy to create the engine and connect to the database. This may go without saying, but the pd.to_sql() method is just way too slow for the amount of data I'm moving, which is why I need something faster.
I'm using Python 3.6, by the way. Below are most of the things I've tried that haven't been successful.
import io
import pandas as pd
from sqlalchemy import create_engine
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(100, 1)), columns=['test_col'])
address = 'mssql+pyodbc://uid:pw@server/path/database?driver=SQL+Server'
engine = create_engine(address)
connection = engine.raw_connection()
cursor = connection.cursor()
# Attempt 1 <- This failed to even create a table at the cursor.execute statement, so my issues could be way in the beginning here, but I know that I have a connection to the SQL Server because I can use pd.to_sql() to create tables successfully (just incredibly slowly for my tables of interest)
create_statement = """
DROP TABLE test_table
CREATE TABLE test_table (test_col)
"""
cursor.execute(create_statement)
test_insert = '''
INSERT INTO test_table
(test_col)
values ('abs');
'''
cursor.execute(test_insert)
# Attempt 2 <- From the iabdb WordPress blog I came across
def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))
records = [str(tuple(x)) for x in take_rates.values]
insert_ = """
INSERT INTO test_table
("A")
VALUES
"""
for batch in chunker(records, 2): # This would be set to 1000 in practice I hope
    print(batch)
    rows = str(batch).strip('[]')
    print(rows)
    insert_rows = insert_ + rows
    print(insert_rows)
    cursor.execute(insert_rows)
    #connection.commit() # don't know when I would need to commit
connection.close()
# Attempt 3 <- From a related Stack Exchange post
# create the table but first drop if it already exists
command = """DROP TABLE IF EXISTS test_table
CREATE TABLE test_table # these columns are from my real dataset
"Serial Number" serial primary key,
"Dealer Code" text,
"FSHIP_DT" timestamp without time zone,
;"""
cursor.execute(command)
connection.commit()
# stream the data using 'to_csv' and StringIO(); then use sql's 'copy_from' function
output = io.StringIO()
# ignore the index
take_rates.to_csv(output, sep='~', header=False, index=False)
# jump to start of stream
output.seek(0)
contents = output.getvalue()
cur = connection.cursor()
# null values become ''
cur.copy_from(output, 'Config_Take_Rates_TEST', null="")
connection.commit()
cur.close()
It seems to me that MS SQL Server is just not a nice Database to play around with...
I want to apologize for the rough formatting - I've been at this script for weeks now but just finally decided to try to organize something for StackOverflow. Thank you very much for any help anyone can offer!
If you only need to replace the existing table, truncate it and use the bcp utility to upload the data. It's much faster.
from subprocess import call
cursor.execute("TRUNCATE TABLE test_table")  # empty the table through the existing connection
connection.commit()
take_rates.to_csv('take_rates.csv', sep='\t', index=False, header=False)  # without header=False, bcp would load the header row as data
call(['bcp', 'test_table', 'in', 'take_rates.csv', '-S', server, '-U', user, '-P', password, '-d', database,
      '-c', '-t', r'\t', '-r', r'\n', '-e', 'bcp_errors.log'])  # -e names an error file; this file name is illustrative
You will need to install the bcp utility (yum install mssql-tools on CentOS/RedHat).
'DROP TABLE IF EXISTS test_table' is only valid T-SQL from SQL Server 2016 onward, so on older versions it is a syntax error.
Instead, you can do something like this:
if (object_id('test_table') is not null)
DROP TABLE test_table
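Neither answer revisits Attempt 2. For reference, here is a sketch of that chunked multi-row INSERT using `?` placeholders (pyodbc's parameter style) instead of pasting `str(tuple(...))` into the SQL. `build_batch_insert` is a hypothetical helper. SQL Server caps one request at 2,100 parameters and a single VALUES list at 1,000 rows, so the batch size times the number of columns has to stay within those limits.

```python
def chunker(seq, size):
    """Split a sequence into consecutive slices of at most `size` items."""
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))


def build_batch_insert(table, columns, batch):
    """Return (sql, params) for one multi-row INSERT with ? placeholders.

    Hypothetical helper. Values are bound as parameters, so the driver
    handles quoting and NULLs. Table and column names are inserted directly
    and must be trusted.
    """
    row_marks = "(" + ", ".join("?" for _ in columns) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join(row_marks for _ in batch)
    )
    # flatten the rows into one parameter list, in placeholder order
    params = [value for row in batch for value in row]
    return sql, params
```

For example, `for batch in chunker(take_rates.values.tolist(), 500): cursor.execute(*build_batch_insert('test_table', ['test_col'], batch))`, then `connection.commit()` once at the end. With several columns the batch size would need to shrink accordingly.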

How can I select part of sqlite database using python

I have a very big database and I want to send part of that database (1/1000) to someone I am collaborating with to perform test runs. How can I (a) select 1/1000 of the total rows (or something similar) and (b) save the selection as a new .db file.
This is my current code, but I am stuck.
import sqlite3
import json
from pprint import pprint
conn = sqlite3.connect('C:/data/responses.db')
c = conn.cursor()
c.execute("SELECT * FROM responses;")
Create another database with a table structure similar to the original one. Then sample records from the original database and insert them into the new database:
import sqlite3
conn = sqlite3.connect("responses.db")
sample_conn = sqlite3.connect("responses_sample.db")
c = conn.cursor()
c_sample = sample_conn.cursor()
rows = c.execute("select no, nm from responses")
sample_rows = [r for i, r in enumerate(rows) if i % 1000 == 0]  # keep every 1000th row (1/1000 of the rows)
# create sample table with similar structure
c_sample.execute("create table responses(no int, nm varchar(100))")
# bound parameters handle quoting (e.g. names containing ')
c_sample.executemany("insert into responses (no, nm) values (?, ?)", sample_rows)
c_sample.close()
sample_conn.commit()
sample_conn.close()
conn.close()
The simplest way to do this would be:
Copy the database file in your file system as you would any other file (e.g. Ctrl+C then Ctrl+V in Windows to make responses-partial.db or something).
Then open this new copy in an SQLite editor such as http://sqlitebrowser.org/ and run a delete query to remove however many rows you want to. Then you might want to run Compact Database from the File menu.
Close the SQLite editor and confirm that the file size is smaller.
Email the copy.
Unless you need to create a repeatable system, I wouldn't bother doing this in Python. But you could perform similar steps in Python (copy the file, open it, run the delete query, etc.) if you need to.
The easiest way to do this is to
make a copy of the database file;
delete 999/1000th of the data, either by keeping the first few rows:
DELETE FROM responses WHERE SomeID > 1000;
or, if you want really random samples:
DELETE FROM responses
WHERE rowid NOT IN (SELECT rowid
FROM responses
ORDER BY random()
LIMIT (SELECT count(*)/1000 FROM responses));
run VACUUM to reduce the file size.
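The same copy, DELETE, and VACUUM steps can be scripted in Python when a repeatable process is needed. A sketch, where `make_sample_copy` is a hypothetical helper, `table` must be a trusted name (identifiers cannot be bound as parameters), and the table is assumed to be an ordinary rowid table:

```python
import shutil
import sqlite3


def make_sample_copy(src_path, dst_path, table="responses", fraction=1000):
    """Copy an SQLite file, keep a random ~1/fraction of `table`, then VACUUM.

    Hypothetical helper; returns the number of rows left in the copy.
    """
    shutil.copyfile(src_path, dst_path)  # the original file is never modified
    conn = sqlite3.connect(dst_path)
    try:
        conn.execute(
            "DELETE FROM {t} WHERE rowid NOT IN "
            "(SELECT rowid FROM {t} ORDER BY random() "
            "LIMIT (SELECT count(*) / ? FROM {t}))".format(t=table),
            (fraction,),
        )
        conn.commit()
        conn.execute("VACUUM")  # rebuild the file so deleted rows stop taking space
        return conn.execute("SELECT count(*) FROM {}".format(table)).fetchone()[0]
    finally:
        conn.close()
```

`count(*) / ?` is integer division in SQLite, so a table with fewer rows than `fraction` ends up empty; a larger fraction or a fixed LIMIT would suit small tables better.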

SQLite Database and Python

I have been given an SQLite file to examine using Python. I have imported the sqlite3 module and attempted to connect to the database, but I'm not having any luck. I am wondering whether I have to actually open the file with "r" as well as connecting to it, i.e. f = open("History.sqlite","r+")? Please see below:
import sqlite3
conn = sqlite3.connect("history.sqlite")
curs = conn.cursor()
results = curs.execute ("Select * From History.sqlite;")
I keep getting this message when I go to run results:
Operational Error: no such table: History.sqlite
An SQLite file is a single data file that can contain one or more tables of data. You appear to be trying to SELECT from the filename instead of the name of one of the tables inside the file.
To learn what tables are in your database you can use any of these techniques:
Download and use the command line tool sqlite3.
Download any one of a number of GUI tools for looking at SQLite files.
Write a SELECT statement against the special table sqlite_master to list the tables.
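A sketch of the third option from Python: `list_tables` (a hypothetical helper name) queries sqlite_master for table names. No separate `open()` call is needed, because `sqlite3.connect` opens the file itself.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of the tables stored in an SQLite file, sorted by name."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()
```

One pitfall worth checking: `sqlite3.connect` creates a new, empty database if the path does not exist, so a mistyped path (the question uses both History.sqlite and history.sqlite, which differ on case-sensitive file systems) yields an empty table list rather than an error. Once the real table name is known, the query becomes `SELECT * FROM <that table name>`.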
