Unable to import CSV data into MySQL database with Python

The following piece of Python source code is what I use to import data from a CSV file into MySQL.
The installed versions in use are MySQL 5.7 and Python 2.7.
Furthermore, the table has 3 columns: one ID column (auto increment) and two text columns, Firstname and Lastname.
The problem seems to occur around the SQL statement, in particular the "%s" bit.
I have literally tried every combination of punctuation like ", ' and `, e.g. 's', "s", ("s"), etc.
I have also searched for and used various source code snippets, all of which failed. I even doubted the quality of my CSV file; however, importing it via e.g. MySQL Workbench works fine.
Ploughing my way through, I did manage to get data into MySQL when I add some plain example values directly in the source code. So technically everything seems to be fine; it is just the parsing that fails...
Please help. Most probably I am overlooking something simple, but it drives me utterly crazy...
import mysql.connector as mysql
import csv

db = mysql.connect(
    host = "xxxx",
    user = "xxxx",
    passwd = "xxxx",
    database = "abc"
)
cursor = db.cursor()
ifile = open('/tmp/import.csv', "rb")
read = csv.reader(ifile)
for row in ifile:
    print row
    sql = "INSERT INTO Names(Firstname, Lastname) VALUES(%s, %s)"
    cursor.execute(sql, row)
    db.commit()
    print(cursor.rowcount, "record inserted")
What I would expect to happen is that the data in the CSV is parsed into MySQL.
The error message I receive is:
mysql.connector.errors.ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '%s, %s, %s)' at line 1
Thanks for any hint, clue and/or solution!!

Assuming you are reading the CSV file correctly and supplying the number of parameters the query requires, try replacing this line
cursor.execute(sql, row)
with
cursor.execute(sql, tuple(row))
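For completeness, here is a minimal sketch of the whole loop under that assumption, iterating over the csv.reader object rather than the raw file handle and committing once after the loop (table and column names taken from the question):
import csv
import mysql.connector as mysql

db = mysql.connect(host="xxxx", user="xxxx", passwd="xxxx", database="abc")
cursor = db.cursor()

sql = "INSERT INTO Names(Firstname, Lastname) VALUES (%s, %s)"
with open('/tmp/import.csv', 'rb') as ifile:  # 'rb' for the Python 2 csv module
    reader = csv.reader(ifile)
    for row in reader:  # iterate over the reader, not the file object
        cursor.execute(sql, tuple(row))  # each row supplies exactly two values

db.commit()  # commit once after all rows are inserted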

Thanks for the answer, and apologies for this late reply.
The issue has been resolved in the meantime.
It has to do with my CSV file; what exactly, I do not know yet. However, I can continue with importing.
Thanks again and have a nice day!
Best regards,
Detlev

Related

Sqlite3 in Python not fetching data

I have a sqlite3 DB with a few rows in it. When I try to fetch data from it in Python, fetchall returns an empty list.
import sqlite3

con = sqlite3.connect("commands.db")
cursor = con.cursor()
con.execute("SELECT * FROM commands;")
existing = cursor.fetchall()
print(existing)
# Prints []
The data is being inserted fine in a different part of the project. I verified this by opening the DB in "DB Browser for SQLite" and running the following query, which returned the data in the table with no problem.
SELECT * FROM commands;
Has anyone come across a similar issue?
Thank you for the help in advance!
Just change con.execute to cursor.execute and it should work. Connection.execute() is a shortcut that creates its own temporary cursor and returns it, so the separate cursor you call fetchall() on never ran the query and has nothing to fetch.
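In other words, a minimal version of the snippet from the question with that one change:
import sqlite3

con = sqlite3.connect("commands.db")
cursor = con.cursor()
cursor.execute("SELECT * FROM commands;")  # run the query on the cursor you fetch from
existing = cursor.fetchall()
print(existing)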

Psycopg2 relation db does not exist

I recently started using a MacBook because my laptop was changed at work, and right after that I started having problems with some of the code I use to upload a dataframe to a PostgreSQL database.
import psycopg2
from io import StringIO

def create_connection(user, password):
    return psycopg2.connect(
        host='HOST',
        database='DBNAME',
        user=user,
        password=password)

conn = create_connection(user, password)
table = "data_analytics.tbl_summary_wingmans_rt"

buffer = StringIO()
df.to_csv(buffer, header=False, index=False)
buffer.seek(0)

cursor = conn.cursor()
cursor.copy_from(buffer, table, sep=",", null="")
conn.commit()
cursor.close()
As you can see, the code is quite simple, and even before the change of equipment it ran without major problems on Windows. But as soon as I run this same code on the Mac it throws the following error:
Error: relation "data_analytics.tbl_summary_wingmans_rt" does not exist
In several posts I saw that it could be the use of double quotes, but I have already tried the following and still do not get a positive result.
"data_analytics."tbl_summary_wingmans_rt""
""data_analytics"."tbl_summary_wingmans_rt""
'data_analytics."tbl_summary_wingmans_rt"'
The behaviour of copy_from changed in psycopg2 2.9 to properly quote the table name, which means that you can no longer supply a schema-qualified table name that way; you have to use copy_expert instead.
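A minimal sketch of that approach, reusing the buffer from the question; with copy_expert you write the COPY statement yourself, so the schema-qualified name stays intact:
cursor = conn.cursor()
cursor.copy_expert(
    "COPY data_analytics.tbl_summary_wingmans_rt FROM STDIN WITH (FORMAT csv, NULL '')",
    buffer,
)
conn.commit()
cursor.close()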
You now have to separate the schema and the table before sending them to the Postgres parser.
When you send "data_analytics.tbl_summary_wingmans_rt" it is a single string and cannot be parsed.
Use '"data_analytics"."tbl_summary_wingmans_rt"' instead; this will be parsed as "schema"."table", which PostgreSQL can resolve.

Python 3.6 connection to MS SQL Server for large dataframe

I am a new Python coder and also a new data scientist so please forgive any foolish sounding things here. I'll keep the details out unless anyone's curious but basically I need to connect to Microsoft SQL Server and upload a Pandas DF that is relatively large (~500k rows) and I need to do this almost every day as the project currently stands.
It doesn't have to be a Pandas DF - I've read about using odo for csv files but I haven't been able to get anything to work. The issue I'm having is that I can't bulk insert the DF because the file isn't on the same machine as the SQL Server instance. I'm consistently getting errors like the following:
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC SQL
Server Driver][SQL Server]Incorrect syntax near the keyword 'IF'.
(156) (SQLExecDirectW)")
As I've attempted different SQL statements, you can replace 'IF' with whatever happens to be the first keyword or column name in the statement. I'm using SQLAlchemy to create the engine and connect to the database. This may go without saying, but the pd.to_sql() method is just way too slow for how much data I'm moving, so that's why I need something faster.
I'm using Python 3.6 by the way. I've put down here most of the things that I've tried that haven't been successful.
import pandas as pd
from sqlalchemy import create_engine
import numpy as np
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 1)), columns=['test_col'])
address = 'mssql+pyodbc://uid:pw#server/path/database?driver=SQL Server'
engine = create_engine(address)
connection = engine.raw_connection()
cursor = connection.cursor()
# Attempt 1
# This failed to even create a table at the cursor.execute statement, so my issues could be
# way back at the beginning here, but I know that I have a connection to the SQL Server
# because I can use pd.to_sql() to create tables successfully (just incredibly slowly for
# my tables of interest).
create_statement = """
DROP TABLE test_table
CREATE TABLE test_table (test_col)
"""
cursor.execute(create_statement)
test_insert = '''
INSERT INTO test_table
(test_col)
values ('abs');
'''
cursor.execute(test_insert)
# Attempt 2 <- from the iabdb WordPress blog I came across
def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

records = [str(tuple(x)) for x in take_rates.values]
insert_ = """
INSERT INTO test_table
("A")
VALUES
"""
for batch in chunker(records, 2):  # This would be set to 1000 in practice I hope
    print(batch)
    rows = str(batch).strip('[]')
    print(rows)
    insert_rows = insert_ + rows
    print(insert_rows)
    cursor.execute(insert_rows)
    # conn.commit()  # don't know when I would need to commit
conn.close()
# Attempt 3 <- from a related Stack Exchange post
# create the table, but first drop it if it already exists
command = """DROP TABLE IF EXISTS test_table
CREATE TABLE test_table # these columns are from my real dataset
"Serial Number" serial primary key,
"Dealer Code" text,
"FSHIP_DT" timestamp without time zone,
;"""
cursor.execute(command)
connection.commit()
# stream the data using 'to_csv' and StringIO(); then use sql's 'copy_from' function
output = io.StringIO()
# ignore the index
take_rates.to_csv(output, sep='~', header=False, index=False)
# jump to start of stream
output.seek(0)
contents = output.getvalue()
cur = connection.cursor()
# null values become ''
cur.copy_from(output, 'Config_Take_Rates_TEST', null="")
connection.commit()
cur.close()
It seems to me that MS SQL Server is just not a nice database to play around with...
I want to apologize for the rough formatting; I've been at this script for weeks now, but finally decided to try to organize something for Stack Overflow. Thank you very much for any help anyone can offer!
If you only need to replace the existing table, truncate it and use the bcp utility to upload the table. It's much faster.
from subprocess import call
command = "TRUNCATE TABLE test_table"
cursor.execute(command)  # run the TRUNCATE through the existing pyodbc connection
connection.commit()
take_rates.to_csv('take_rates.csv', sep='\t', index=False)
call('bcp {t} in {f} -S {s} -U {u} -P {p} -d {db} -c -t "{sep}" -r "{nl}"'.format(
    t='test_table', f='take_rates.csv', s=server, u=user, p=password, db=database,
    sep='\\t', nl='\\n'), shell=True)  # bcp understands the \t and \n escape sequences
You will need to install bcp utility (yum install mssql-tools on CentOS/RedHat).
'DROP TABLE IF EXISTS test_table' just looks like invalid T-SQL syntax on your server (the IF EXISTS clause is only supported from SQL Server 2016 onwards).
You can do something like this instead:
if (object_id('test_table') is not null)
    DROP TABLE test_table
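For example, run through the pyodbc cursor from the question (a sketch; assumes the same connection is still open):
drop_if_exists = """
if (object_id('test_table') is not null)
    DROP TABLE test_table
"""
cursor.execute(drop_if_exists)
connection.commit()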

Insert multiple tab-delimited text files into MySQL with Python?

I am trying to create a program that takes a number of tab-delimited text files and works through them one at a time, entering the data they hold into a MySQL database. There are several text files, like movies.txt, which looks like this:
1 Avatar
3 Iron Man
3 Star Trek
and actors.txt, which looks the same, etc. Each text file has upwards of one hundred entries, each with an id and a corresponding value as seen above. I have found a number of code examples on this site and others, but I can't quite get my head around how to implement them in this situation.
So far my code looks something like this...
import MySQLdb

database_connection = MySQLdb.connect(host='localhost', user='root', passwd='')
cursor = database_connection.cursor()
cursor.execute('CREATE DATABASE library')
cursor.execute('USE library')
cursor.execute('''CREATE TABLE popularity (
    PersonNumber INT,
    Category VARCHAR(25),
    Value VARCHAR(60)
    )
''')

def data_entry(categories):
Every time I try to get the other code I have found working with this, I just get lost completely. Hoping someone can help me out by either showing me what I need to do or pointing me in the direction of some more information.
Examples of the code I have been trying to adapt to my situation are:
import MySQLdb, csv, sys

conn = MySQLdb.connect(host="localhost", user="usr", passwd="pass", db="databasename")
c = conn.cursor()
csv_data = csv.reader(file("a.txt"))
for row in csv_data:
    print row
    c.execute("INSERT INTO a (first, last) VALUES (%s, %s)", row)
conn.commit()
c.close()
and:
Python File Read + Write
MySQL can read TSV files directly using the mysqlimport utility or by executing the LOAD DATA INFILE SQL command. This will be faster than processing the file in Python and inserting rows one at a time, but you may want to learn how to do both. Good luck!
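For the LOAD DATA INFILE route, a minimal sketch with MySQLdb, assuming the popularity table from the question and a server that permits LOAD DATA LOCAL INFILE:
import MySQLdb

conn = MySQLdb.connect(host='localhost', user='root', passwd='',
                       db='library', local_infile=1)
cursor = conn.cursor()
cursor.execute(r"""
    LOAD DATA LOCAL INFILE 'movies.txt'
    INTO TABLE popularity
    FIELDS TERMINATED BY '\t'
    (PersonNumber, Value)
    SET Category = 'movies'
""")
conn.commit()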

Any other way to import data files (like .csv) in the Python sqlite3 module? [not inserting one by one]

In sqlite3's client CLI, there is ".import FILE TABLE_name" to do it.
But I do not want to install the sqlite3 CLI on my server at present.
In the Python sqlite3 module, we can create and edit a DB.
But I have not found a way to import a data file into a TABLE, except by inserting rows one by one.
Is there any other way?
You could insert them all in one shot using the executemany command instead of inserting one by one.
Let's say I have users.csv with the following contents:
"Hugo","Boss"
"Calvin","Klein"
Basically, open it with the csv module and pass the reader to the .executemany function:
import csv, sqlite3

persons = csv.reader(open("users.csv"))
con = sqlite3.connect(":memory:")
con.execute("create table person(firstname, lastname)")
con.executemany("insert into person(firstname, lastname) values (?, ?)", persons)
for row in con.execute("select firstname, lastname from person"):
    print row
    # (u'Hugo', u'Boss')
    # (u'Calvin', u'Klein')
