Python: MySQLdb LOAD DATA INFILE silently fails

I am attempting to use a Python script to import a CSV file into a MySQL database.
It seems to fail silently.
Here is my code:
#!/usr/bin/python
import MySQLdb

class DB:
    host = 'localhost'
    user = 'root'
    password = '**************'
    sqldb = 'agriculture'
    conn = None

    def connect(self):
        self.conn = MySQLdb.connect(self.host, self.user, self.password, self.sqldb)

    def query(self, sql, params=None):
        try:
            cursor = self.conn.cursor()
            if params is not None:
                cursor.execute(sql, params)
            else:
                cursor.execute(sql)
        except (AttributeError, MySQLdb.OperationalError):
            self.connect()
            cursor = self.conn.cursor()
            if params is not None:
                cursor.execute(sql, params)
            else:
                cursor.execute(sql)
        print vars(cursor)
        return cursor

    def load_data_infile(self, f, table, options=""):
        sql = """LOAD DATA LOCAL INFILE '%s' INTO TABLE %s FIELDS TERMINATED BY ',';""" % (f, table)
        self.query(sql)

db = DB()
pathToFile = "/home/ariggi/722140-93805-sqltest.csv"
table_name = "agriculture.degreedays"
db.load_data_infile(pathToFile, table_name)
In an attempt to debug this situation I am dumping the cursor object to the screen within the "query()" method. Here is the output:
{'_result': None, 'description': None, 'rownumber': 0, 'messages': [],
'_executed': "LOAD DATA LOCAL INFILE '/home/ariggi/722140-93805-sqltest.csv' INTO TABLE degreedays FIELDS TERMINATED BY ',';",
'errorhandler': <...>, 'rowcount': 500L, 'connection': <...>, 'description_flags': None,
'arraysize': 1, '_info': 'Records: 500 Deleted: 0 Skipped: 0 Warnings: 0',
'lastrowid': 0L, '_last_executed': "LOAD DATA LOCAL INFILE '/home/ariggi/722140-93805-sqltest.csv' INTO TABLE agriculture.degreedays FIELDS TERMINATED BY ',';",
'_warnings': 0, '_rows': ()}
If I take the "_last_executed" query, which is
LOAD DATA LOCAL INFILE '/home/ariggi/722140-93805-sqltest.csv' INTO TABLE agriculture.degreedays FIELDS TERMINATED BY ',';
and run it through the mysql console it works as expected and fills the table with rows. However when I execute this script my database table remains empty.
I am pretty stumped and could use some help.

Try calling db.conn.commit() at the end of your code to make the changes permanent. Python DB-API modules do not enable "autocommit" by default, so until you issue a commit the DB module regards your changes as part of an incomplete transaction.
As @AirThomas points out in a comment, it helps to use a "context manager" - though I'd say the correct formulation is
with conn as curs:
    do_something_with(curs)
because in MySQLdb the connection itself acts as the context manager: it hands you a cursor and automatically commits any changes unless the controlled code raises an exception.
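For instance, a minimal sketch of the fix applied to the script from the question (only the final commit line is new):
db = DB()
pathToFile = "/home/ariggi/722140-93805-sqltest.csv"
table_name = "agriculture.degreedays"
db.load_data_infile(pathToFile, table_name)
# LOAD DATA ran inside an open transaction; commit it so the
# 500 loaded rows actually become permanent.
db.conn.commit()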

Related

Prepared statements with pymysql - who is correct?

I have created a test database called test. Inside it is a table called testTable with an auto-increment id value and a name field that takes a varchar(30).
The PREPARE statement queries (4 of them) execute fine when copied into phpMyAdmin, but I get this error: 2021-01-08 18:26:53,022 (MainThread) [ERROR] (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'SET\n @name = 'fred';\nEXECUTE\n statement USING @name;\nDEALLOCATE\nPREPARE\n ' at line 5")
The test code:
import pymysql
import logging

class TestClass():
    def __init__(self):
        # mysql connections
        self.mySQLHostName = "localhost"
        self.mySQLHostPort = 3306
        self.mySQLuserName = "userName"
        self.mySQLpassword = "pass"
        self.MySQLauthchandb = "mysql"

    def QueryMYSQL(self, query):
        try:
            #logging.info("QueryMYSQL : " + str(query))  # Uncomment to print all mysql queries sent
            conn = pymysql.connect(host=self.mySQLHostName, port=self.mySQLHostPort, user=self.mySQLuserName, passwd=self.mySQLpassword, db=self.MySQLauthchandb, charset='utf8')
            conn.autocommit(True)
            cursor = conn.cursor()
            if cursor:
                returnSuccess = cursor.execute(query)
            if cursor:
                returnValue = cursor.fetchall()
                #logging.info("return value : " + str(returnValue))  # Uncomment to print all returned mysql queries
            if cursor:
                cursor.close()
            if conn:
                conn.close()
            return returnValue
        except Exception as e:
            logging.error("Problem in ConnectTomySQL")
            logging.error(query)
            logging.error(e)
            return False

# Default error logging log file location:
logging.basicConfig(format='%(asctime)s (%(threadName)-10s) [%(levelname)s] %(message)s', filename='ERROR.log', filemode="w", level=logging.DEBUG)
logging.info("Logging Started")

test = TestClass()
result = test.QueryMYSQL("Describe test.testTable")
print(result)

query = """
PREPARE
    statement
FROM
    'INSERT INTO test.testTable (id, name) VALUES (NULL , ?)';
SET
    @name = 'fred';
EXECUTE
    statement USING @name;
DEALLOCATE
PREPARE
    statement;
"""
result = test.QueryMYSQL(query)
print(result)
I'm assuming this is a library issue rather than a MySQL issue? I am trying to use prepared statements to prevent code injection from user input; as I understand it, prepared statements are the best way to do this, rather than trying to pre-filter user input and missing something.
I asked this question on the project's GitHub, but one of the authors (methane / Inada Naoki) replied with this:
========
Multistatement can be used by attacker when there is a query injection vulnerability. So it is disabled by default.
as I understand this prepared statements are the best way
You are totally wrong. Your use of prepared statement doesn't protect you from SQL injection at all. If you enable multistatement, your "prepared statement" can be attacked by SQL injection.
But I am not free tech support nor free teacher for you. OSS maintainers are not. Please don't ask here.
and he closed the issue.
Is he correct?
The author of the book I am reading, Robin Nixon's "Learning PHP, MySQL and JavaScript" (O'Reilly, 5th edition), appears to be under the same misconception. I quote: "Let me introduce the best and recommended way to interact with MySQL, which is pretty much bulletproof in terms of security." It's in the "Using Placeholders" section, p. 260. Is he wrong?
Because I bought this book to improve my security practices, and now I'm not sure what is correct.
I found out from the developer of pymysql that the library does not support the PREPARE MySQL statement. Also, the pymysql library by default does not execute multi-statements.
I understand that my first attempt at substituting values into the INSERT statement is inherently unsafe if multi-statements are enabled. They can be turned on by passing client_flag=pymysql.constants.CLIENT.MULTI_STATEMENTS to the connect constructor.
The pymysql library does, however, allow placeholders to be used in MySQL queries via the cursor.execute(query, (tuple)) method.
To demonstrate this I wrote the following test code example.
import pymysql
import logging

class TestClass():
    def __init__(self):
        # mysql connections
        self.mySQLHostName = "localhost"
        self.mySQLHostPort = 3306
        self.mySQLuserName = "name"
        self.mySQLpassword = "pw"
        self.MySQLauthchandb = "mysql"

    def QueryMYSQL(self, query, data=()):
        try:
            logging.info("QueryMYSQL : " + str(query))  # Uncomment to print all mysql queries sent
            # Code injection requires multi-statements to be allowed; this is off
            # in pymysql by default and has to be set on manually.
            conn = pymysql.connect(host=self.mySQLHostName, port=self.mySQLHostPort, user=self.mySQLuserName, passwd=self.mySQLpassword, db=self.MySQLauthchandb, charset='utf8', client_flag=pymysql.constants.CLIENT.MULTI_STATEMENTS)
            conn.autocommit(True)
            cursor = conn.cursor()
            if cursor:
                if data:
                    returnSuccess = cursor.execute(query, data)
                else:
                    returnSuccess = cursor.execute(query)
            if cursor:
                returnValue = cursor.fetchall()
                logging.info("return value : " + str(returnValue))  # Uncomment to print all returned mysql queries
            if cursor:
                cursor.close()
            if conn:
                conn.close()
            return returnValue
        except Exception as e:
            logging.error("Problem in ConnectTomySQL")
            logging.error(e)
            logging.error(query)
            if data:
                logging.error("Data {}".format(str(data)))
            return False

# Default error logging log file location:
logging.basicConfig(format='%(asctime)s (%(threadName)-10s) [%(levelname)s] %(message)s', filename='ERROR.log', filemode="w", level=logging.DEBUG)
logging.info("Logging Started")

def usePlaceholder(userInput):
    query = "INSERT INTO test.testTable (id, name) VALUES (NULL , %s)"
    data = (userInput,)
    result = test.QueryMYSQL(query, data)
    print(result)

def useSubstitution(userInput):
    query = "INSERT INTO test.testTable (id, name) VALUES (NULL , '{}')".format(userInput)  # this is unsafe
    result = test.QueryMYSQL(query)
    print(result)

test = TestClass()

# Create the test database and testTable.
query = "CREATE DATABASE test"
test.QueryMYSQL(query)
query = "CREATE TABLE `test`.`testTable` ( `id` INT NOT NULL AUTO_INCREMENT , `name` VARCHAR(256) NULL DEFAULT NULL , PRIMARY KEY (`id`)) ENGINE = InnoDB;"
test.QueryMYSQL(query)

# Simulated user input.
legitUserEntry = "Ringo"
injectionAttempt = "333' ); INSERT INTO test.testTable (id, name) VALUES (NULL , 666);#"  # A simulated user sql injection attempt.

useSubstitution(legitUserEntry)    # this will also insert Ringo - but could be unsafe.
usePlaceholder(legitUserEntry)     # this will insert Ringo - but is safer.
useSubstitution(injectionAttempt)  # this will inject the input code and execute it.
usePlaceholder(injectionAttempt)   # this will insert the input into the database without executing the injected code.
So from this exercise, I shall henceforth improve my security by keeping multi-statements set to off (the default) AND by using placeholders with a data tuple rather than string substitution.

Does the Postgres COPY command append to or replace the table I import into?

The code I am using is this:
import psycopg2
import pandas as pd
import sys

def pg_load_table(file_path, table_name, dbname, host, port, user, pwd):
    '''
    This function uploads a csv to a target table
    '''
    try:
        conn = psycopg2.connect(dbname=dbname, host=host, port=port,
                                user=user, password=pwd)
        print("Connecting to Database")
        cur = conn.cursor()
        f = open(file_path, "r")
        # Truncate the table first
        cur.execute("Truncate {} Cascade;".format(table_name))
        print("Truncated {}".format(table_name))
        # Load table from the file with header
        cur.copy_expert("copy {} from STDIN CSV HEADER QUOTE '\"'".format(table_name), f)
        cur.execute("commit;")
        print("Loaded data into {}".format(table_name))
        conn.close()
        print("DB connection closed.")
    except Exception as e:
        print("Error: {}".format(str(e)))
        sys.exit(1)

# Execution Example
file_path = '/tmp/restaurants.csv'
table_name = 'usermanaged.restaurants'
dbname = 'db name'
host = 'host url'
port = '5432'
user = 'username'
pwd = 'password'
pg_load_table(file_path, table_name, dbname, host, port, user, pwd)
I expected it to append to my data, but the input file ended up replacing my table.
How can I edit this line:
cur.copy_expert("copy {} from STDIN CSV HEADER QUOTE '\"'".format(table_name), f)
(or more of the code, if necessary) to make the command append instead of replace? Alternatively, could this support the syntax of an UPDATE SQL command based on a WHERE clause?
As Mike Organek notes in the comments, this line removes all data from your table:
cur.execute("Truncate {} Cascade;".format(table_name))
Remove that, and you'll find your data will be appended by the COPY operation.
Note: this means that if your CSV data combined with the existing data in the table violates any constraints (say, unique keys), the entire transaction will fail and you'll get no new data in your table. If you need to perform an "upsert", see: How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?
(FYI: your question was flagged as potentially invalid due to a typo, but given the explicit nature of the truncate call combined with your last paragraph, I suspect this logic was constructed to avoid just such a constraint violation; COPY is best left to bulk loads, with more flexible approaches used for updates.)
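For reference, a minimal sketch of the loader with the truncate removed, so that COPY appends to whatever rows the table already holds (the pg_append_table name is just for illustration; everything else reuses the names from the function above):
import psycopg2

def pg_append_table(file_path, table_name, dbname, host, port, user, pwd):
    '''Append a CSV file to an existing table instead of replacing its contents.'''
    conn = psycopg2.connect(dbname=dbname, host=host, port=port,
                            user=user, password=pwd)
    cur = conn.cursor()
    with open(file_path, "r") as f:
        # No TRUNCATE here: COPY only ever adds rows, so existing data is untouched.
        cur.copy_expert("copy {} from STDIN CSV HEADER QUOTE '\"'".format(table_name), f)
    conn.commit()
    conn.close()
For the WHERE-clause style updates you mention, COPY into a staging table first and then run an UPDATE ... FROM (or INSERT ... ON CONFLICT) against the target table; COPY itself cannot update rows in place.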

sqlite3 add row to table doesn't add, no errors

I have an SQLite3 database that I want to add to with Python. This is the code I have to add a row:
import sqlite3
from sqlite3 import Error
from datetime import datetime

def create_connection(db_file):
    """ create a database connection to a SQLite database """
    conn = None
    try:
        conn = sqlite3.connect(db_file)
        return conn
    except Error as e:
        print(e)

def add_password(conn, data):
    """
    Create an entry into the password database
    """
    try:
        sql = 'INSERT INTO passwords(added,username,password,website,email) VALUES(?,?,?,?,?)'
        cur = conn.cursor()
        cur.execute(sql, data)
        print('done')
        return cur.lastrowid
    except Error as e:
        print(e)

connection = create_connection('passwords.db')
data = (datetime.now(), 'SomeUsername', 'password123', 'stackoverflow.com', 'some@email.com')
add_password(connection, data)
When I run it, it prints done and ends; there are no errors. However, when I open the database to view the table, it has no entries.
If I open the database and run the same SQL code
INSERT INTO passwords(added,username,password,website,email)
VALUES('13-5-2020', 'SomeUsername', 'password123', 'stackoverflow.com', 'some@email.com')
it adds to the table. So it must be a problem with my python code. How do I get it to add?
Just call conn.commit() after executing the query. It should work.
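For example, here is the add_password() function from the question with the single added line (a minimal sketch; everything else is unchanged):
def add_password(conn, data):
    """
    Create an entry into the password database
    """
    try:
        sql = 'INSERT INTO passwords(added,username,password,website,email) VALUES(?,?,?,?,?)'
        cur = conn.cursor()
        cur.execute(sql, data)
        conn.commit()  # persist the INSERT; without this the row is lost when the script exits
        print('done')
        return cur.lastrowid
    except Error as e:
        print(e)
Alternatively, use the connection as a context manager (with conn:), which commits automatically when the block succeeds.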

Cannot write to a table in Redshift

I am trying to copy a file from S3 to a Redshift table, but I am unable to do so. However, I can read from the table, so I know that my connection is okay.
Please help me to figure out the problem.
def upload_redshift():
    conn_string = passd.redshift_login['login']  # the connection string containing dbname, username etc.
    con = psycopg2.connect(conn_string)
    sql = """FROM 's3://datawarehouse/my_S3_file' credentials 'aws_access_key_id=***;aws_secret_access_key=***' csv ; ;"""
    try:
        con = psycopg2.connect(conn_string)
        logging.info("Connection Successful!")
    except:
        raise ValueError("Unable to connect to Redshift")
    cur = con.cursor()
    try:
        cur.execute(sql)
        logging.info("Copy to redshift executed successfully")
    except:
        raise ValueError("Failed to execute copy command")
    con.close()
I am getting the "Copy to redshift executed successfully" message, but nothing is happening in my table.
Try the following,
sql = "copy table_name FROM 's3://datawarehouse/my_S3_file' credentials 'aws_access_key_id=***;aws_secret_access_key=***' csv ;"
Also, try creating the connection under the "connections tab" and using PostgresHook with aws_access_key_id and the secret key as variables, something like below, which lets Airflow store the details encrypted:
pg_db = PostgresHook(postgres_conn_id='<<connection_id>>')
src_conn = pg_db.get_conn()
src_cursor = src_conn.cursor()
src_cursor.execute(sql)
src_conn.commit()
src_cursor.close()
Also, you can use the s3_to_redshift_operator operator and execute it as a task,
from airflow.operators.s3_to_redshift_operator import S3ToRedshiftTransfer
T1 = S3ToRedshiftTransfer(
    schema='',
    table='',
    s3_bucket='',
    s3_key='',
    redshift_conn_id='',  # reference to a specific redshift database
    aws_conn_id='',       # reference to a specific S3 connection
)

SQLite insert query not working with Python?

I have been trying to insert data into the database using the following code in python:
import sqlite3 as db
conn = db.connect('insertlinks.db')
cursor = conn.cursor()
db.autocommit(True)
a="asd"
b="adasd"
cursor.execute("Insert into links (link,id) values (?,?)",(a,b))
conn.close()
The code runs without any errors, but no update to the database takes place. I tried adding conn.commit() but it gives an error saying module not found. Please help?
You do have to commit after inserting:
cursor.execute("Insert into links (link,id) values (?,?)",(a,b))
conn.commit()
or use the connection as a context manager:
with conn:
cursor.execute("Insert into links (link,id) values (?,?)", (a, b))
or enable autocommit by passing isolation_level=None to the connect() call:
conn = db.connect('insertlinks.db', isolation_level=None)
See Controlling Transactions.
It may be a bit late, but setting autocommit = True saved my time! It is especially useful if you have a script running bulk actions such as update/insert/delete...
Reference: https://docs.python.org/2/library/sqlite3.html#sqlite3.Connection.isolation_level
This is the way I usually do it in my scripts:
import sqlite3

def get_connection():
    # isolation_level=None puts the connection in autocommit mode
    conn = sqlite3.connect('../db.sqlite3', isolation_level=None)
    cursor = conn.cursor()
    return conn, cursor

def get_jobs():
    conn, cursor = get_connection()
    if conn is None:
        raise sqlite3.DatabaseError("Could not get connection")
I hope it helps you!
