Reading a delimited text file in Python

I have a txt file with many MySQL inserts (1.5 million).
I need to read this file with Python, split it at each ';' into individual queries, and run each query with Python. How can I split the file at each ';' and run the queries?
So far my code is:
import MySQLdb

db = MySQLdb.connect(host="localhost",
                     user="root",
                     passwd="da66ro",
                     db="test")
f = open('E:/estudos/projetos/tricae/tests_python.txt')

First open the file:
with open('youfilename.sql', 'r') as f:
    fileAsString = f.read().replace("\n", "")

sqlStatements = fileAsString.split(";")
Then to run the query:
cursor = db.cursor()
for statement in sqlStatements:
    try:
        cursor.execute(statement)
        db.commit()
    except:
        db.rollback()
But of course you must realize this is a terrible idea. What happens when you have a quoted ";" character in a string you are inserting? You'll have to be a bit more clever than what you posed as a question - in general it's a terrible idea to assume anything about any data you are inserting into a database.
Or even worse than a broken query: what about malicious code? SQL injection? Never trust input you haven't sanitized.
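As an aside (my own sketch, not part of the original answer): if you control how the INSERT statements are generated, letting the driver bind the values avoids both problems, since quoting and escaping are handled for you. The table and column names below are made up for illustration:
# Hypothetical illustration: parameterized inserts with MySQLdb.
# The driver quotes each value, so an embedded ';' or quote cannot
# break the statement or inject SQL.
rows = [
    ("2015-01-01 10:00:00", 1, "42.0"),
    ("2015-01-01 10:05:00", 2, "17.3"),
]
sql = "INSERT INTO readings (`ts`, `sensor_id`, `value`) VALUES (%s, %s, %s)"

cursor = db.cursor()
try:
    cursor.executemany(sql, rows)
    db.commit()
except Exception:
    db.rollback()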

OK, so first you need to read the file and split it on ';', which can be done with the split() function. Then you can loop over the statements or pick which queries to execute (or just execute the whole file without splitting). You can find numerous examples of each of these steps and I'm sure it will be easy enough to combine them into what you need; a rough streaming sketch follows the quoted question below.
I need to read this file with python and divide this file at
each ';' for each query and run the query with python.
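Here is that sketch (my own illustration, not from either answer): it streams the file one statement at a time, so all ~1.5 million inserts never sit in memory at once. It still breaks on ';' inside quoted string literals, as warned above.
def iter_statements(path):
    # Yield one ';'-terminated statement at a time instead of reading
    # the whole 1.5-million-statement file into a single string.
    buf = []
    with open(path) as f:
        for line in f:
            while ';' in line:
                head, line = line.split(';', 1)
                buf.append(head)
                statement = ''.join(buf).strip()
                buf = []
                if statement:
                    yield statement
            buf.append(line)
    tail = ''.join(buf).strip()
    if tail:
        yield tail

for statement in iter_statements('tests_python.txt'):
    cursor.execute(statement)
db.commit()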

I am new to Python. Here is a workaround:
import io

# Read the dump, strip a possible UTF-8 BOM and lowercase it (Python 2).
myfile = open('69_ptc_group_mappingfile_mysql.sql')
data = myfile.read().decode("utf-8-sig").encode("utf-8").lower()

query_list = []
if 'delimiter' not in data:
    query_list = data.strip().split(";")
else:
    # Split the file on DELIMITER blocks, then on the custom '//' delimiter.
    tempv = data.rstrip().split('delimiter')
    for i in tempv:
        if i.strip().startswith("//"):
            i = i.rstrip().split("//")
            for a in i:
                if len(a) != 0:
                    query_list.append(a.strip())
        else:
            corr = i.rstrip().split(";")
            for i in corr:
                if len(i.rstrip()) != 0:
                    query_list.append(i.rstrip())

print query_list
for j in query_list:
    cursor.execute(j)

Related

psycopg2.ProgrammingError: syntax error at or near "select"

I have a Python script which reads a SQL file and executes the SQL command stored in it. But when executing it I get the error below:
psycopg2.ProgrammingError: syntax error at or near "select"
LINE 1: select * from image
the sql file content is:
select * from image
which is simple and should be correct.
The code throwing the error (the last line, specifically):
cur = conn.cursor()
string = open(script, 'r', encoding='utf-8').read()  # script is the sql file
cur.execute(string)
Can anyone advise?
----update-----
Below is the function in the Python script. I won't post the whole script since it is too long.
def list_(csv, sql=None, script=None, host=None, dbname=None, user=None, pwd=None):
    print(sql)
    print(script)
    if (sql):
        print("sql")
        with conn2db(host, dbname, user, pwd) as conn:
            cur = conn.cursor()
            cur.execute(sql)
    if (script):
        print("script")
        with conn2db(host, dbname, user, pwd) as conn:
            cur = conn.cursor()
            string = open(script, 'r', encoding='utf-8').read()
            print(string)
            cur.execute(string)
            #cur.execute(open(script, 'r', encoding='utf-8').read())
    with open(csv, 'w') as file:
        for record in cur:
            mystr = str(record)[1:-2] if str(record)[-1] == ',' else str(record)[1:-1]
            file.write(mystr + '\n')
            #file.write('\n')
Seeing the connection string as well as your schemas and tables in your database would help. Please confirm that these are all correct. Additionally, running .strip() on the SQL string after reading it from the file or adding a semi-colon to the end of the SQL string is worth a try.
Today I hit this issue again, and I deleted the original file, created a new one, and typed the sql command. Everything works like a charm now.
My guess is that the original file contained some invisible characters which caused this issue, but why they were there still puzzles me.
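A hedged guess at what that fix looks like in code (my sketch, not from the answers above): if the invisible characters were a UTF-8 BOM or stray whitespace, reading the file with utf-8-sig and stripping the text before executing should avoid the syntax error. The file name is a placeholder and conn is the existing psycopg2 connection:
# Sketch: strip a BOM and surrounding whitespace before handing the SQL to psycopg2.
with open('query.sql', 'r', encoding='utf-8-sig') as fh:  # 'utf-8-sig' drops a leading BOM
    sql_text = fh.read().strip()

cur = conn.cursor()
cur.execute(sql_text)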

What is the best way to dump MySQL table data to csv and convert character encoding?

I have a table with about 200 columns. I need to take a dump of the daily transaction data for ETL purposes. It's a MySQL DB. I tried that with Python, both using a pandas DataFrame and the basic write-to-CSV-file method. I even tried to look for the same functionality using a shell script; I saw one such example for an Oracle database using sqlplus. Following are my Python codes with the two approaches:
Using Pandas:
import MySQLdb as mdb
import pandas as pd
host = ""
user = ''
pass_ = ''
db = ''
query = 'SELECT * FROM TABLE1'
conn = mdb.connect(host=host,
                   user=user, passwd=pass_,
                   db=db)
df = pd.read_sql(query, con=conn)
df.to_csv('resume_bank.csv', sep=',')
Using basic python file write:
import MySQLdb
import csv
import datetime
currentDate = datetime.datetime.now().date()
host = ""
user = ''
pass_ = ''
db = ''
table = ''
con = MySQLdb.connect(user=user, passwd=pass_, host=host, db=db, charset='utf8')
cursor = con.cursor()
query = "SELECT * FROM %s;" % table
cursor.execute(query)
with open('Data_on_%s.csv' % currentDate, 'w') as f:
    writer = csv.writer(f)
    for row in cursor.fetchall():
        writer.writerow(row)
print('Done')
The table has about 300,000 records. It's taking too much time with both the python codes.
Also, there's an issue with encoding here. The DB result set has some latin-1 characters for which I'm getting errors like: UnicodeEncodeError: 'ascii' codec can't encode character '\x96' in position 1078: ordinal not in range(128).
I need to save the CSV in Unicode format. Can you please help me with the best approach to perform this task?
A Unix based or Python based solution will work for me. This script needs to be run daily to dump daily data.
You can achieve that just by leveraging MySQL. For example:
SELECT * FROM your_table WHERE...
INTO OUTFILE 'your_file.csv'
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\n';
If you need to schedule the query, put it into a file (e.g., csv_dump.sql) and create a cron task like this one:
00 00 * * * mysql -h your_host -u user -ppassword < /foo/bar/csv_dump.sql
For strings this will use the default character encoding which happens to be ASCII, and this fails when you have non-ASCII characters. You want unicode instead of str.
rows = cursor.fetchall()
f = open('Data_on_%s.csv' % currentDate, 'w')
myFile = csv.writer(f)
for row in rows:
    # encode each field as UTF-8 before handing it to the csv writer
    myFile.writerow([unicode(s).encode("utf-8") for s in row])
f.close()
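If the script ends up running on Python 3 (where unicode() no longer exists), a minimal sketch of the same idea is to open the file with an explicit UTF-8 encoding and let the csv module do the rest; the file name and newline handling are my assumptions:
# Python 3 sketch: text-mode file, explicit UTF-8 encoding,
# newline='' as the csv module recommends.
with open('Data_on_%s.csv' % currentDate, 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    for row in cursor.fetchall():
        writer.writerow(row)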
You can use mysqldump for this task. (Source for command)
mysqldump -u username -p --tab=/path/to/directory dbname table_name --fields-terminated-by=','
The arguments are as follows:
-u username for the username
-p to indicate that a password should be used
-ppassword to give the password via command line
--tab Produce tab-separated data files
For more command-line switches see https://dev.mysql.com/doc/refman/5.5/en/mysqldump.html
To run it on a regular basis, create a cron task like written in the other answers.

Python PYODBC - Previous SQL was not a query

I have the following Python code; it reads through a text file line by line and takes characters x to y of each line as the variable "contract".
import os
import pyodbc

cnxn = pyodbc.connect(r'DRIVER={SQL Server};CENSORED;Trusted_Connection=yes;')
cursor = cnxn.cursor()

claimsfile = open('claims.txt', 'r')
for line in claimsfile:
    #ldata = claimsfile.readline()
    contract = line[18:26]
    print(contract)
    cursor.execute("USE calms SELECT XREF_PLAN_CODE FROM calms_schema.APP_QUOTE WHERE APPLICATION_ID = " + str(contract))
    print(cursor.fetchall())
When including the line cursor.fetchall(), the following error is returned:
Programming Error: Previous SQL was not a query.
The query runs in SSMS; when I replace str(contract) with the actual value of the variable, results are returned as expected.
Based on the data, the query will return one value as a result formatted as NVARCHAR(4).
Most other examples have variables declared prior to the loop, and the proposed solution is to SET NOCOUNT ON; this does not apply to my problem, so I am slightly lost.
P.S. I have also put the query in its own standalone file, without the loop to iterate through the file, in case this was causing the problem, but without success.
In your SQL query you are actually issuing two commands, USE and SELECT, and the cursor is not set up for multiple statements. Plus, with database connections, you should be selecting the database schema in the connection string (i.e., the DATABASE argument), so T-SQL's USE is not needed.
Consider the following adjustment with parameterization where APPLICATION_ID is assumed to be integer type. Add credentials as needed:
constr = 'DRIVER={SQL Server};SERVER=CENSORED;Trusted_Connection=yes;' \
         'DATABASE=calms;UID=username;PWD=password'

cnxn = pyodbc.connect(constr)
cur = cnxn.cursor()

with open('claims.txt', 'r') as f:
    for line in f:
        contract = line[18:26]
        print(contract)

        # EXECUTE QUERY
        cur.execute("SELECT XREF_PLAN_CODE FROM APP_QUOTE WHERE APPLICATION_ID = ?",
                    [int(contract)])

        # FETCH ROWS ITERATIVELY
        for row in cur.fetchall():
            print(row)

cur.close()
cnxn.close()

Fast MySQL Import

Writing a script to convert raw data for MySQL import, I have so far worked with a temporary text file, which I later imported manually using the LOAD DATA INFILE... command.
Now I included the import command into the python script:
db = mysql.connector.connect(user='root', password='root',
                             host='localhost',
                             database='myDB')
cursor = db.cursor()

query = """
    LOAD DATA INFILE 'temp.txt' INTO TABLE myDB.values
    FIELDS TERMINATED BY ',' LINES TERMINATED BY ';';
"""
cursor.execute(query)
cursor.close()

db.commit()
db.close()
This works but temp.txt has to be in the database directory which isn't suitable for my needs.
The next approach is dropping the temporary file and committing directly:
db = mysql.connector.connect(user='root', password='root',
                             host='localhost',
                             database='myDB')
sql = "INSERT INTO values(`timestamp`,`id`,`value`,`status`) VALUES(%s,%s,%s,%s)"
cursor = db.cursor()

for line in lines:
    mode, year, julian, time, *values = line.split(",")
    del values[5]

    date = datetime.strptime(year + julian, "%Y%j").strftime("%Y-%m-%d")
    time = datetime.strptime(time.rjust(4, "0"), "%H%M").strftime("%H:%M:%S")
    timestamp = "%s %s" % (date, time)

    for i, value in enumerate(values[:20], 1):
        args = (timestamp, str(i + 28), value, mode)
        cursor.execute(sql, args)

db.commit()
This works as well, but takes around four times as long, which is too much. (The same for construct was used in the first version to generate temp.txt.)
My conclusion is that I need a file and the LOAD DATA INFILE command to be faster. To be free to choose where the text file is placed, the LOCAL option seems useful. But with MySQL Connector (1.1.7) there is the known error:
mysql.connector.errors.ProgrammingError: 1148 (42000): The used command is not allowed with this MySQL version
So far I've seen that using MySQLdb instead of MySQL Connector can be a workaround. Activity on MySQLdb however seems low and Python 3.3 support will probably never come.
Is LOAD DATA LOCAL INFILE the way to go and if so is there a working connector for python 3.3 available?
EDIT: After development, the database will run on a server and the script on a client.
I may have missed something important, but can't you just specify the full filename in the first chunk of code?
LOAD DATA INFILE '/full/path/to/temp.txt'
Note the path must be a path on the server.
To use LOAD DATA INFILE with any accessible file, you have to set the LOCAL_FILES client flag when creating the connection:
import mysql.connector
from mysql.connector.constants import ClientFlag
db = mysql.connector.connect(client_flags=[ClientFlag.LOCAL_FILES], <other arguments>)
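A minimal end-to-end sketch of that (my own illustration; the credentials, file path and table are assumptions based on the question):
import mysql.connector
from mysql.connector.constants import ClientFlag

# Placeholder credentials and paths, for illustration only.
db = mysql.connector.connect(user='root', password='root',
                             host='localhost', database='myDB',
                             client_flags=[ClientFlag.LOCAL_FILES])
cursor = db.cursor()
cursor.execute("""
    LOAD DATA LOCAL INFILE '/home/me/temp.txt'
    INTO TABLE myDB.values
    FIELDS TERMINATED BY ',' LINES TERMINATED BY ';'
""")
db.commit()
cursor.close()
db.close()
Note that the server must also allow local infile (the local_infile system variable); otherwise the same 1148 error appears.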

python - does cx_Oracle allow you to force all columns to be cx_Oracle.STRING?

This is a small snippet of Python code (not the entire thing) to write results to a file. But because the table I'm querying has some TIMESTAMP(6) WITH LOCAL TIME ZONE datatypes, the file stores the values in a different format, i.e. '2000-5-15 0.59.8.843679000' instead of '15-MAY-00 10.59.08.843679000 AM'. Is there a way to force it to write to the file as if the datatype were a VARCHAR (i.e. cx_Oracle.STRING or otherwise), so that the file has the same content as querying through a client tool?
import csv

db = cx_Oracle.connect(..<MY CONNECT STRING>.)
cursor = db.cursor()

file = open('C:/blah.csv', "w")
writer = csv.writer(file)  # the writer was defined elsewhere in the full script

r = cursor.execute(<MY SQL>)
for row in cursor:
    writer.writerow(row)
Could you use to_char inside your query? That way it will be forced to STRING type.
r = cursor.execute("select to_char( thetime, 'DD-MON-RR HH24.MI.SSXFF' ) from my_table")
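If you would rather not wrap every column in to_char, cx_Oracle also supports output type handlers; here is a hedged sketch (not from the answer above) that asks the driver to fetch every column as a string. The connect string and table name are placeholders, and the resulting text format still follows the session's NLS settings, so it may need an ALTER SESSION to match the client tool exactly:
import cx_Oracle

def return_strings(cursor, name, default_type, size, precision, scale):
    # Fetch every column as a string, whatever its database type.
    return cursor.var(cx_Oracle.STRING, 4000, arraysize=cursor.arraysize)

db = cx_Oracle.connect("user/password@host/service")  # placeholder connect string
db.outputtypehandler = return_strings

cur = db.cursor()
cur.execute("SELECT * FROM my_table")  # placeholder query
for row in cur:
    print(row)  # every column now arrives as a str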
