Fast MySQL Import - Python

While writing a script to convert raw data for MySQL import, I have so far worked with a temporary text file which I later imported manually using the LOAD DATA INFILE... command.
Now I have included the import command in the Python script:
db = mysql.connector.connect(user='root', password='root',
                             host='localhost',
                             database='myDB')
cursor = db.cursor()
query = """
    LOAD DATA INFILE 'temp.txt' INTO TABLE myDB.values
    FIELDS TERMINATED BY ',' LINES TERMINATED BY ';';
"""
cursor.execute(query)
cursor.close()
db.commit()
db.close()
This works, but temp.txt has to be in the database directory, which isn't suitable for my needs.
The next approach is dropping the file and committing directly:
from datetime import datetime  # needed for the date/time parsing below

db = mysql.connector.connect(user='root', password='root',
                             host='localhost',
                             database='myDB')
sql = "INSERT INTO values(`timestamp`,`id`,`value`,`status`) VALUES(%s,%s,%s,%s)"
cursor = db.cursor()
for line in lines:  # lines: the raw input lines previously written to temp.txt
    mode, year, julian, time, *values = line.split(",")
    del values[5]
    date = datetime.strptime(year + julian, "%Y%j").strftime("%Y-%m-%d")
    time = datetime.strptime(time.rjust(4, "0"), "%H%M").strftime("%H:%M:%S")
    timestamp = "%s %s" % (date, time)
    for i, value in enumerate(values[:20], 1):
        args = (timestamp, str(i + 28), value, mode)
        cursor.execute(sql, args)
db.commit()
This works as well but takes around four times as long, which is too much. (The same for loop was used in the first version to generate temp.txt.)
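(For reference, the usual way to cut down the per-row overhead would be to batch the rows and send them with cursor.executemany, which Connector/Python folds into multi-row INSERT statements. A sketch, not something tried in the question, reusing sql, cursor, and lines from the loop above:)
rows = []
for line in lines:
    mode, year, julian, time, *values = line.split(",")
    del values[5]
    date = datetime.strptime(year + julian, "%Y%j").strftime("%Y-%m-%d")
    time = datetime.strptime(time.rjust(4, "0"), "%H%M").strftime("%H:%M:%S")
    timestamp = "%s %s" % (date, time)
    for i, value in enumerate(values[:20], 1):
        rows.append((timestamp, str(i + 28), value, mode))
cursor.executemany(sql, rows)  # one multi-row statement instead of one round trip per row
db.commit()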
My conclusion is that I need a file and the LOAD DATA INFILE command to get the speed I need. To be free to choose where the text file is placed, the LOCAL option seems useful. But with MySQL Connector (1.1.7) there is the known error:
mysql.connector.errors.ProgrammingError: 1148 (42000): The used command is not allowed with this MySQL version
So far I've seen that using MySQLdb instead of MySQL Connector can be a workaround. However, activity on MySQLdb seems low, and Python 3.3 support will probably never come.
Is LOAD DATA LOCAL INFILE the way to go, and if so, is there a working connector for Python 3.3 available?
EDIT: After development, the database will run on a server and the script on a client.

I may have missed something important, but can't you just specify the full filename in the first chunk of code?
LOAD DATA INFILE '/full/path/to/temp.txt'
Note the path must be a path on the server.

To use LOAD DATA LOCAL INFILE with any file the client can access, you have to set the LOCAL_FILES client flag when creating the connection:
import mysql.connector
from mysql.connector.constants import ClientFlag
db = mysql.connector.connect(client_flags=[ClientFlag.LOCAL_FILES], <other arguments>)
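Putting it together with the code from the question, a minimal sketch (reusing the question's credentials; the server must also allow local_infile, and newer Connector/Python versions accept allow_local_infile=True instead of the flag):
import mysql.connector
from mysql.connector.constants import ClientFlag

db = mysql.connector.connect(user='root', password='root',
                             host='localhost', database='myDB',
                             client_flags=[ClientFlag.LOCAL_FILES])
cursor = db.cursor()
# LOCAL resolves the path on the client, so temp.txt can live anywhere the script can read
cursor.execute("""
    LOAD DATA LOCAL INFILE 'temp.txt' INTO TABLE myDB.values
    FIELDS TERMINATED BY ',' LINES TERMINATED BY ';'
""")
db.commit()
cursor.close()
db.close()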

Related

Accessing SQLite DB w/ Python and getting malformed DBs

I have some Python code that copies a SQLite db across sftp. However, it is a highly active db, so many times I run into a malformed db. I'm thinking of these possible options, but I don't know how to implement them because I am newer to Python.
1. An alternate method of getting the sqlite db copied?
2. Maybe there is a way to query the sqlite file from the device? Not sure if that would work, since sqlite is more of a local db; I'm not sure how I could query it like I could with mysql etc.
3. Create a loop? I could call the function again in the exception, but I'm not sure how to retry the rest of the code.
Also, the malformed db issue can possibly occur in other sections, I'm thinking? Maybe I need to run a PRAGMA quick_check?
This is commonly what I am seeing. The other catch is why I am seeing it as often as I am, because if I load the sqlite file from my main machine, the queries run fine.
(venv) dulanic@mediaserver:/opt/python_scripts/rpi$ cd /opt/python_scripts/rpi ; /usr/bin/env /opt/python_scripts/rpi/venv/bin/python /home/dulanic/.vscode-server/extensions/ms-python.python-2021.2.636928669/pythonFiles/lib/python/debugpy/launcher 37599 -- /opt/python_scripts/rpi/rpdb.py
An error occurred: database disk image is malformed
This is my current code:
#!/usr/bin/env python3
import psycopg2, sqlite3, sys, paramiko, os, socket, time

scpuser = os.getenv('scpuser')
scppw = os.getenv('scppw')
sqdb = os.getenv('sqdb')
sqlike = os.getenv('sqlike')
pgdb = os.getenv('pgdb')
pguser = os.getenv('pguser')
pgpswd = os.getenv('pgpswd')
pghost = os.getenv('pghost')
pgport = os.getenv('pgport')
pgschema = os.getenv('pgschema')

database = r"./pihole.db"
pihole = socket.gethostbyname('pi.hole')
tabnames = []
tabgrab = ''

def pullsqlite():
    sftp.get('/etc/pihole/pihole-FTL.db', 'pihole.db')
    sftp.close()

# SFTP pull config
ssh_client = paramiko.SSHClient()
ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh_client.connect(hostname=pihole, username=scpuser, password=scppw)
sftp = ssh_client.open_sftp()

# Pull SQlite
pullsqlite()

# Load sqlite tables to list
consq = sqlite3.connect(sqdb)
cursq = consq.cursor()
cursq.execute(f"SELECT name FROM sqlite_master WHERE type='table' AND name in ({sqlike})")
tabgrab = cursq.fetchall()

# postgres connection
conpg = psycopg2.connect(database=pgdb, user=pguser, password=pgpswd,
                         host=pghost, port=pgport)

# Load data to postgres from sqlite
for item in tabgrab:
    tabnames.append(item[0])

start = time.perf_counter()
for table in tabnames:
    curpg = conpg.cursor()
    if table == 'queries':
        curpg.execute(f"SELECT max(id) FROM {table};")
        max_id = curpg.fetchone()[0]
        cursq.execute(f"SELECT * FROM {table} where id > {max_id};")
    else:
        cursq.execute(f"SELECT * FROM {table};")
    try:
        rows = cursq.fetchall()
    except sqlite3.Error as e:
        print("An error occurred:", e.args[0])
    colcount = len(rows[0])
    pholder = ('%s,' * colcount)[:-1]
    try:
        curpg.execute(f"SET search_path TO {pgschema};")
        curpg.executemany(f"INSERT INTO {table} VALUES ({pholder}) ON CONFLICT DO NOTHING;", rows)
        conpg.commit()
        print(f'Inserted {len(rows)} rows into {table}')
    except psycopg2.DatabaseError as e:
        print(f'Error {e}')
        sys.exit(1)

if 'start' in locals():
    elapsed = time.perf_counter() - start
    print(f'Time {elapsed:0.4}')

consq.close()
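A possible shape for option 3 from the question: a retry loop around the pull that uses PRAGMA quick_check to verify each copy. This is a sketch; pull_and_verify is a hypothetical helper, and it assumes pullsqlite is reworked so it can be called repeatedly (the version above closes the sftp connection after the first get):
import sqlite3, time

def pull_and_verify(max_retries=3, delay=5):
    """Hypothetical helper: re-pull the db until PRAGMA quick_check passes."""
    for attempt in range(1, max_retries + 1):
        pullsqlite()  # re-download via sftp, as in the script above
        con = sqlite3.connect('pihole.db')
        try:
            if con.execute('PRAGMA quick_check;').fetchone()[0] == 'ok':
                return con  # intact copy; hand back the open connection
        except sqlite3.DatabaseError as e:
            print(f'Attempt {attempt}: {e}')
        con.close()
        time.sleep(delay)  # give the busy source db a moment before retrying
    raise RuntimeError('No intact copy of the database after retries')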

MySQL - LOAD DATA LOCAL

Running a MySQL (8.0) database on an Ubuntu (20.04) VPS. My current objective is to load a .CSV automatically into a table via a Python script. The script is theoretically correct and should work; what fails is actually getting the data from the CSV into the table.
dbupdate.py:
import mysql.connector
import os
import string

db = mysql.connector.connect(
    host="localhost",
    user="root",
    passwd="********",
    db="Rack_Info"
)

sqlLoadData = "LOAD DATA LOCAL INFILE '/home/OSA_ADVA_Dashboard/Processed_CSV/DownloadedCSV.csv' INTO TABLE BerT FIELDS TERMINATED BY ',' ENCLOSED BY '*' IGNORE 1 LINES;"

try:
    curs = db.cursor()
    curs.execute(sqlLoadData)
    db.commit()
    print("SQL execution complete")
    resultSet = curs.fetchall()
except IOError:
    print("Error incurred: ")
    db.rollback()

db.close()
print("Data loading complete.\n")
I have consulted the official documentation and enabled local_infile on both the server and client, configuring it both in my.cnf and in SQL.
The my.cnf file:
#
# The MySQL database server configuration file.
#
# You can copy this to one of:
# - "/etc/mysql/my.cnf" to set global options,
# - "~/.my.cnf" to set user-specific options.
#
# One can use all long options that the program supports.
# Run program with --help to get a list of available options and with
# --print-defaults to see which it would actually understand and use.
#
# For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html
#
# * IMPORTANT: Additional settings that can override those from this file!
# The files must end with '.cnf', otherwise they'll be ignored.
#
!includedir /etc/mysql/conf.d/
!includedir /etc/mysql/mysql.conf.d/
[client]
local_infile=1
[mysql]
local_infile=1
[mysqld]
local_infile=1
I have restarted both the PHP and MySQL services, as well as the server, to no avail. At a loss here as to what to do. Any help would be much appreciated.
If I'm not mistaken, PHP has its own config file where you have to enable load data local infile.
I investigated the php.ini file and uncommented the load data lines; still nothing.
Turns out one of mysqld's variables, secure_file_priv, was pointing at an empty/default directory. All I had to do was change it to the directory where my files were located. All working now.
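For reference, both switches can be checked from the same Python connection (a sketch; note that secure_file_priv is read-only at runtime, so changing it means editing my.cnf under [mysqld] and restarting the server):
curs = db.cursor()
for var in ('local_infile', 'secure_file_priv'):
    curs.execute(f"SHOW GLOBAL VARIABLES LIKE '{var}'")
    # expect e.g. ('local_infile', 'ON') and ('secure_file_priv', '/path/to/files/')
    print(curs.fetchone())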

How do I import a MySQL database in a Python script?

I've seen some similar questions about this on StackOverflow but haven't found an answer that works; see http://stackoverflow.com/questions/4408714/execute-sql-file-with-python-mysqldb and http://stackoverflow.com/questions/10593876/execute-sql-file-in-python-with-mysqldb?lq=1
Here is my code:
import pymysql
import sys
import access  # holds credentials
import mysql_connector  # connects to MySQL, is fully functional

class CreateDB(object):
    def __init__(self):
        self.cursor = None
        self.conn = pymysql.connect(host, user, passwd)

    def create_database(self):
        try:
            with self.conn.cursor() as cursor:
                for line in open('file.sql'):
                    cursor.execute(line)
                self.conn.commit()
        except Warning as warn:
            f = open(access.Credentials().error_log, 'a')
            f.write('Warning: %s ' % warn + '\nStop.\n')
            sys.exit()

create = CreateDB()
create.create_database()
When I run my script I get the following error:
pymysql.err.InternalError: (1065, 'Query was empty')
My .sql file is successfully loaded when I import directly through MySQL and there is a single query on each line of the file. Does anybody have a solution for this? I have followed the suggestions on other posts but have not had any success.
Take care of empty lines at the end of the file by:
if line.strip(): cursor.execute(line)
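In the question's create_database method, that guard fits like this (a sketch, with conn standing in for self.conn):
with conn.cursor() as cursor:
    for line in open('file.sql'):
        if line.strip():  # blank lines are what trigger "Query was empty"
            cursor.execute(line)
conn.commit()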
You can execute all the SQL in the file at once by using the official MySQL Connector/Python and the multi parameter of its cursor.execute method.
Quote from the second link:
If multi is set to True, execute() is able to execute multiple statements specified in the operation string. It returns an iterator that enables processing the result of each statement.
Example code from the link, slightly modified:
import mysql.connector

file = open('script.sql')
sql = file.read()

cnx = mysql.connector.connect(user='u', password='p', host='h', database='d')
cursor = cnx.cursor()

for result in cursor.execute(sql, multi=True):
    if result.with_rows:
        print("Rows produced by statement '{}':".format(result.statement))
        print(result.fetchall())
    else:
        print("Number of rows affected by statement '{}': {}".format(
            result.statement, result.rowcount))

cnx.close()

Sybase sybpydb queries not returning anything

I am currently connecting to a Sybase 15.7 server using sybpydb. It seems to connect fine:
import sys
sys.path.append('/dba/sybase/ase/15.7/OCS-15_0/python/python26_64r/lib')
sys.path.append('/dba/sybase/ase/15.7/OCS-15_0/lib')
import sybpydb
conn = sybpydb.connect(user='usr', password='pass', servername='serv')
is working fine. Changing any of my connection details results in a connection error.
I then select a database:
curr = conn.cursor()
curr.execute('use db_1')
However, now when I try to run queries, it always returns None:
print curr.execute('select * from table_1')
I have tried running the use and select queries in the same execute, I have tried including go commands after each, I have tried using curr.connection.commit() after each, all with no success. I have confirmed, using dbartisan and isql, that the same queries I am using return entries.
Why am I not getting results from my queries in python?
EDIT:
Just some additional info. In order to get the sybpydb import to work, I had to change two environment variables. I added the lib paths (the same ones that I added to sys.path) to $LD_LIBRARY_PATH, i.e.:
setenv LD_LIBRARY_PATH "$LD_LIBRARY_PATH":/dba/sybase/ase/15.7/OCS-15_0/python/python26_64r/lib:/dba/sybase/ase/15.7/OCS-15_0/lib
and I had to change the SYBASE path from 12.5 to 15.7. All this was done in csh.
If I print conn.error(), after every curr.execute(), I get:
("Server message: number(5701) severity(10) state(2) line(0)\n\tChanged database context to 'master'.\n\n", 5701)
I completely understand why you might be confused by the documentation. It doesn't seem to be on par with other db extensions (e.g. psycopg2).
When connecting with most standard db extensions you can specify a database. Then, when you want to get the data back from a SELECT query, you either use fetch (an ok way to do it) or the iterator (the more pythonic way to do it).
import sybpydb as sybase

conn = sybase.connect(user='usr', password='pass', servername='serv')
cur = conn.cursor()
cur.execute("use db_1")
cur.execute("SELECT * FROM table_1")
print "Query Returned %d row(s)" % cur.rowcount
for row in cur:
    print row

# Alternate less-pythonic way to read query results
# for row in cur.fetchall():
#     print row
Give that a try and let us know if it works.
Python 3.x working solution:
import sybpydb

try:
    conn = sybpydb.connect(dsn="Servername=serv;Username=usr;Password=pass")
    cur = conn.cursor()
    cur.execute('select * from db_1..table_1')

    # table header
    header = tuple(col[0] for col in cur.description)
    print('\t'.join(header))
    print('-' * 60)

    res = cur.fetchall()
    for row in res:
        line = '\t'.join(str(col) for col in row)
        print(line)

    cur.close()
    conn.close()
except sybpydb.Error:
    for err in cur.connection.messages:
        print(f'Error {err[0]}, Value {err[1]}')

Using an sqlite3 database with WAL enabled - Python

I'm trying to modify the two database files used by Google Drive to redirect my sync folder via a script (snapshot.db and sync_config.db). While I can open the files in certain sqlite browsers (not all), I can't get Python to execute a query. I just get the message: sqlite3.DatabaseError: file is encrypted or is not a database
Apparently Google is using a write-ahead logging (WAL) configuration on the databases, and it can be turned off by running PRAGMA journal_mode=DELETE; against the database (according to sqlite.org), but I can't figure out how to run that against the database if Python can't read it.
Here's what I have (I tried executing the PRAGMA command, committing, and then reopening, but it didn't work):
import sqlite3

snapShot = r'C:\Documents and Settings\user\Local Settings\Application Data\Google\Drive\snapshot.db'
sync_conf = r'C:\Documents and Settings\user\Local Settings\Application Data\Google\Drive\sync_config.db'
sync_folder_path = r'H:\Google Drive'

conn = sqlite3.connect(snapShot)
cursor = conn.cursor()

#cursor.execute('PRAGMA journal_mode=DELETE;')
#conn.commit()
#conn = sqlite3.connect(snapShot)
#cursor = conn.cursor()

query = "UPDATE local_entry SET filename = '\\?\\" + sync_folder_path + "' WHERE filename ='\\?\C:Users\\admin\Google Drive'"
print query
cursor.execute(query)
Problem solved. I just downloaded the latest version of sqlite from http://www.sqlite.org/download.html and overwrote the old .dll in my python27/DLLs directory. Works fine now.
What a nuisance.
I don't think the journal_mode pragma should keep sqlite3 from being able to open the db at all. Perhaps you're using an excessively old version of the sqlite3 lib? What version of Python are you using, and what version of the sqlite3 library?
import sqlite3
print sqlite3.version          # version of the sqlite3 (pysqlite) module
print sqlite3.sqlite_version   # version of the underlying SQLite library
