MySQL Python connector turns INT into int64, but not back again?

I am using the MySQL Python connector to manipulate a database, but I am running into issues when my queries involve the INT database type. When an INT column is retrieved from the database, it seems to end up as a Python int64. That's fine, except that the value doesn't convert back into a usable MySQL type when I pass it to a later query.
Here's a reduced example:
My MySQL table 'test' has an Id column with datatype INT and a Name column. My Python code is below; the second execute (an UPDATE query) fails with this exception:
Exception Thrown: Failed processing format-parameters; Python 'int64' cannot be converted to a MySQL type
If I explicitly convert the firstId parameter (which is reported as type <class 'numpy.int64'>) using int(firstId), as suggested in another SO answer, the code runs successfully. I would have, perhaps naively, assumed that if the connector managed the conversion in one direction, it would manage it in the other. As it is, I don't necessarily know the types I am getting back from my actual queries (I'm using Python ... I shouldn't have to know). Does this mean I will have to type-check all my Python variables before running MySQL queries?
I tried changing the table column datatype from INT to BIGINT (a 64-bit INT), but I got the same conversion error. Is there perhaps a 32-bit / 64-bit mismatch in the MySQL connector package I am using (mysql-connector-python 8.0.23)?
import mysql.connector as msc
import pandas as pd

def main():
    dbConn = msc.connect(user='********', password='********',
                         host='127.0.0.1',
                         database='********')

    # Open a cursor
    cursor = dbConn.cursor()

    # Find Id of given name
    cursor.execute('SELECT * from test WHERE Name = %s', ['Hector'])
    headers = cursor.column_names
    queryVals = list()
    for row in cursor:
        queryVals.append(row)
    cursor.close()
    dfQueryResult = pd.DataFrame(queryVals, columns=headers)
    print(dfQueryResult)

    # Change name
    firstId = dfQueryResult['Id'].iloc[0]
    print('firstId is of type: ', type(firstId))
    cursor = dbConn.cursor()
    cursor.execute('UPDATE test SET Name = %s WHERE Id = %s', ['Graham', firstId])  # This line gives the error
    print(cursor.rowcount, ' rows updated')
    cursor.close()

    dbConn.commit()
    dbConn.close()

main()

First off, hat-tip to @NonoLondon for their comments and investigative work.
A pandas DataFrame stores numbers using NumPy types. In this case, the DataFrame constructor was taking a Python int from the MySQL result and converting it into a numpy.int64 object. When this variable was passed back through the connector, it could not convert the numpy.int64 into a plain Python int.
From other SO articles, I discovered the item() method, available on all NumPy data types, which converts them to native Python types. Since all NumPy scalar types derive from the base class numpy.generic, I'm now using the following utility function whenever I extract variables from DataFrames:
import numpy as np

def pyTypeFromNp(val):
    if isinstance(val, np.generic):
        return val.item()
    return val
Hence the amended line is now:
firstId = pyTypeFromNp(dfQueryResult['Id'].iloc[0])
and the code runs as expected
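If several parameters come out of a DataFrame at once, the same helper can be applied across the whole parameter list before calling execute(), so individual type checks aren't needed. A minimal sketch (execute_clean is just an illustrative name, not part of the connector API):

def execute_clean(cursor, query, params):
    # convert any NumPy scalars to native Python types before binding
    clean = [pyTypeFromNp(p) for p in params]
    return cursor.execute(query, clean)

# e.g. execute_clean(cursor, 'UPDATE test SET Name = %s WHERE Id = %s', ['Graham', firstId])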

Related

Python MySQL connector returns bytearray instead of string for prepared statements

Given MySQL table like such:
create table demo (id int, name varchar(100)) collate utf8mb4_general_ci;
insert into demo values (1,'abcdef');
And Python script:
import mysql.connector

db = mysql.connector.connect(host='xx', user='xx', password='xx', database='xx')
cursor = db.cursor()
cursor.execute('select * from demo')
for row in cursor:
    print(row)
This produces the expected result:
(1, 'abcdef')
If I however change the cursor to a prepared cursor:
cursor = db.cursor(prepared=True)
the result is unexpected:
(1, bytearray(b'abcdef'))
I'm using Python 3.8.0 and mysql.connector version 2.2.9.
In the release notes of MySQL connector 2.1.8 (https://dev.mysql.com/doc/relnotes/connector-python/en/news-2-1-8.html) I read
When using prepared statements, string columns were returned as bytearrays instead of strings. The returned value is now a string decoded using the connection's charset (defaults to 'utf8'), or as a bytearray if this conversion fails. (Bug #27364914)
so I did not expect the behavior in the version I'm using.
What am I missing?
The text should probably read:
The returned value is now a string encoded using the connection's charset (defaults to 'utf8'), or as a bytearray if this conversion fails. (Bug #27364914).
>>> 'abcdef'.encode('utf8') == b'abcdef'
True
>>>
So, when using cursor = db.cursor(prepared=True), the driver is doing what the documentation says it will do: only if the encoding fails will it return a bytearray; otherwise expect a byte string (that is the change being described). But I see no reason to specify prepared=True here. You can use parameterized statements without it and get the results you have come to expect, if you are only using the prepared statement as a mechanism to avoid SQL injection attacks and not for repetitive execution.
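For example, a regular (non-prepared) cursor still binds parameters separately from the SQL text, which is what protects against SQL injection. A minimal sketch against the demo table above:

cursor = db.cursor()
# %s placeholders are filled in by the driver, not by Python string formatting
cursor.execute('select * from demo where id = %s', (1,))
print(cursor.fetchall())   # [(1, 'abcdef')] -- strings come back as str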
Update
I did a small benchmark with and without using prepared=True retrieving 10,882 rows:
import mysql.connector

def foo(db):
    cursor = db.cursor(prepared=True)
    for i in range(10883):
        cursor.execute('select Company from company where PK_Company = %s', (i,))
        rows = cursor.fetchall()
    print(rows[0][0])  # print last fetched row

db = mysql.connector.connect(user='xx', password='xx', database='xx')
foo(db)
Results:
With `prepared=True`: 2.0 seconds for function `foo`
Without `prepared=True`: 1.6 seconds for function `foo`
Using pymysql: 1.5 seconds for function `foo`
It would seem that prepared=True runs more slowly.
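If prepared=True is genuinely needed (for example, for repeated execution of the same statement), the bytearray values can be decoded on the way out. A minimal sketch, assuming the connection charset is utf8:

cursor = db.cursor(prepared=True)
cursor.execute('select * from demo where id = %s', (1,))
# decode any bytes/bytearray columns, leave everything else untouched
rows = [
    tuple(v.decode('utf-8') if isinstance(v, (bytes, bytearray)) else v for v in row)
    for row in cursor.fetchall()
]
print(rows)   # [(1, 'abcdef')]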

pandas/sqlalchemy/pyodbc: Result object does not return rows from stored proc when UPDATE statement appears before SELECT

I'm using SQL Server 2014, pandas 0.23.4, sqlalchemy 1.2.11, pyodbc 4.0.24, and Python 3.7.0. I have a very simple stored procedure that performs an UPDATE on a table and then a SELECT on it:
CREATE PROCEDURE my_proc_1
    @v2 INT
AS
BEGIN
    UPDATE my_table_1
    SET v2 = @v2
    ;
    SELECT * FROM my_table_1
    ;
END
GO
This runs fine in MS SQL Server Management Studio. However, when I try to invoke it via Python using this code:
import pandas as pd
from sqlalchemy import create_engine

if __name__ == "__main__":
    conn_str = 'mssql+pyodbc://@MODEL_TESTING'
    engine = create_engine(conn_str)
    with engine.connect() as conn:
        df = pd.read_sql_query("EXEC my_proc_1 33", conn)
        print(df)
I get the following error:
sqlalchemy.exc.ResourceClosedError: This result object does not return rows. It has been closed automatically.
(Please let me know if you want full stack trace, I will update if so)
When I remove the UPDATE from the stored proc, the code runs and the results are returned. Note also that selecting from a table other than the one being updated does not make a difference; I get the same error. Any help is much appreciated.
The issue is that the UPDATE statement is returning a row count, which is a scalar value, and the rows returned by the SELECT statement are "stuck" behind the row count where pyodbc cannot "see" them (without additional machinations).
It is considered a best practice to ensure that our stored procedures always start with a SET NOCOUNT ON; statement to suppress the returning of row count values from DML statements (UPDATE, DELETE, etc.) and allow the stored procedure to just return the rows from the SELECT statement.
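If the stored procedure itself cannot be changed, a common workaround (worth verifying against your own setup) is to prepend SET NOCOUNT ON; to the batch that pandas sends:

# hedged sketch: suppress DML row counts for this batch so pandas can reach
# the result set produced by the SELECT inside the procedure
df = pd.read_sql_query("SET NOCOUNT ON; EXEC my_proc_1 33", conn)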
I got the same error for a different reason: I was using SQLAlchemy's newer select() syntax to get the entries of a table and had forgotten to pass the table class I wanted the values from, which produced this error. Adding the table class as an argument fixed it.
The code that led to the error:
query = select().where(Assessment.created_by == assessment.created_by)
The fix was simply to add the table class name; sometimes the issue is just in the syntax:
query = select(Assessment).where(Assessment.created_by == assessment.created_by)

Converting JSON into Python Dict with Postgresql data imported with SQLAlchemy

I've got a little bit of a tricky question here regarding converting JSON strings into Python data dictionaries for analysis in Pandas. I've read a bunch of other questions on this but none seem to work for my case.
Previously, I was simply using CSVs (and Pandas' read_csv function) to perform my analysis, but now I've moved to pulling data directly from PostgreSQL.
I have no problem using SQLAlchemy to connect to my engine and run my queries. My whole script runs the same as it did when I was pulling the data from CSVs. That is, until it gets to the part where I'm trying to convert one of the columns (namely, the 'config' column in the sample text below) from JSON into a Python dictionary. The ultimate goal of converting it into a dict is to be able to count the number of responses under the "options" field within the "config" column.
df = pd.read_sql_query('SELECT questions.id, config from questions ', engine)
df = df['config'].apply(json.loads)
df = pd.DataFrame(df.tolist())
df['num_options'] = np.array([len(row) for row in df.options])
When I run this, I get the error "TypeError: expected string or buffer". I tried converting the data in the 'config' column from object to string, but that didn't do the trick (I get another error, something like "ValueError: Expecting property name...").
If it helps, here's a snippet of data from one cell in the 'config' column (the code should return the result '6' for this snippet, since there are 6 options):
{"graph_by":"series","options":["Strongbow Case Card/Price Card","Strongbow Case Stacker","Strongbow Pole Topper","Strongbow Base wrap","Other Strongbow POS","None"]}
My guess is that SQLAlchemy does something weird to JSON strings when it pulls them from the database? Something that doesn't happen when I'm just pulling CSVs from the database?
In recent Psycopg versions the PostgreSQL json(b) adaptation to Python is transparent, and Psycopg is the default SQLAlchemy driver for PostgreSQL, so the 'config' column already comes back as Python dicts:
df = df['config']['options']
From the Psycopg manual:
Psycopg can adapt Python objects to and from the PostgreSQL json and jsonb types. With PostgreSQL 9.2 and following versions adaptation is available out-of-the-box. To use JSON data with previous database versions (either with the 9.1 json extension, but even if you want to convert text fields to JSON) you can use the register_json() function.
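Given that, the json.loads step from the question is unnecessary; the options can be counted directly on the returned dicts. A minimal sketch, assuming the same questions table with a json/jsonb config column:

df = pd.read_sql_query('SELECT questions.id, config FROM questions', engine)
# each cell in 'config' is already a Python dict, so just measure the list inside it
df['num_options'] = df['config'].apply(lambda cfg: len(cfg.get('options', [])))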
Using just a SQLAlchemy query:
q = session.query(
    Question.id,
    func.jsonb_array_length(Question.config["options"]).label("len")
)
Pure SQL and pandas' read_sql_query:
sql = """\
SELECT questions.id,
jsonb_array_length(questions.config -> 'options') as len
FROM questions
"""
df = pd.read_sql_query(sql, engine)
Combine both (my favourite):
# take `q` from the above
df = pd.read_sql(q.statement, q.session.bind)

Issues with pyodbc numeric values being labelled as 'None' instead of NULL or empty

I am currently taking numeric values (amongst many other string and numeric values) from a set of access databases and uploading them to a single MS SQL Server database.
I am using 32-bit Python 3.3 and the respective pyodbc package.
I was wondering if there is a way to capture the fact that the numeric field is empty in the Access database without the driver returning the string 'None' instead*. The syntax used is as follows:
access_con = pyodbc.connect(connection_string)
access_cur = access_con.cursor()
access_SQL = 'SELECT * FROM ' + source_table
rows = access_cur.execute(access_SQL).fetchall()

for row in rows:
    [Statement uploading each row to SQL Server using an INSERT INTO statement]
Any help would be appreciated; whether as a solution or as a more direct way to transfer the data.
*EDIT: 'None' is only a string because I turned it into one to add it to the INSERT INTO statement. Using row.replace('None','NULL') replaced all of the 'None' instances with 'NULL', which the ODBC driver interpreted as a NULL value.
None is a Python object, not a string. It is the equivalent of NULL in SQL Server, or an "empty" column value in Access.
For example, for an Access table whose Number column has its first value left empty, the relevant Python code produces:
...
>>> cursor = connection.cursor()
>>> rows = cursor.execute('select * from Table1').fetchall()
>>> print(rows)
[(None, ), (1, )]
This sample confirms the empty Access value is returned as None.
This PyODBC Documentation provides a good explanation of how ODBC and Python data types are mapped.
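So rather than formatting the values into the INSERT INTO string (which is where the literal text 'None' comes from), pass the row values as parameters and pyodbc will send None as NULL. A minimal sketch, where sql_server_cur, sql_server_con and dbo.target_table are hypothetical names:

# qmark placeholders let pyodbc map Python None to SQL NULL
insert_sql = "INSERT INTO dbo.target_table VALUES (?, ?)"   # one ? per column
for row in rows:
    sql_server_cur.execute(insert_sql, list(row))
sql_server_con.commit()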

Using mysqldb and sqlite3 in the same Python 2.7 script: Should I throw in the towel?

I'm writing a Python script that is meant to pull from, process, and update to a MySQL database.
I originally started attacking this problem with comma-separated value dumps of the MySQL databases, which I'd throw into an sqlite database (using sqlite3). I'd do the processing in Python (2.7), create a CSV file of the output, which I'd upload back to the MySQL database with another script.
Well, then I thought I'd try to just pull/push to the MySQL database directly from the Python script. So I installed MySQLdb, and went to town on that.
What I'm finding now is that the rows I pull from MySQL and insert into the sqlite database aren't JOINing the way they did before. The representations of the integers now have an L tacked on the end, and decimal values are expressed as something like Decimal('4.00').
Basically, things that JOINed nicely when I was inserting them from the CSV files aren't working so well now.
My question: Am I asking for a world of pain continuing down this path, or is there an easy way to get MySQLdb and sqlite3 libraries to play together? If there isn't, then I'll install a MySQL server and refactor my code to use MySQL only.
Each database backend supports different types of data. The sqlite3 and MySQLdb Python modules try to help you out by doing appropriate type conversions based on the field types. So, if your MySQL database has a DECIMAL field, MySQLdb will return that field automatically as a Python Decimal object.
You can request MySQLdb (and sqlite if you want) to do appropriate type conversion between database and Python types. It's up to you to determine what type conversion is appropriate. For example, since your database has a DECIMAL field, how are you going to represent that value in sqlite which doesn't have a native DECIMAL field? You'll probably end up using a REAL, but of course this isn't the same thing as a DECIMAL which will maintain the required precision.
Since you were already converting from csv data, I suspect you've been using the Python float type, indicating that you're happy to convert MySQL decimal fields to float. In this case, you can then request that MySQLdb do a conversion from DECIMAL to float on all field results.
Here is an example bit of code which creates two tables, one each in mysqldb and sqlite. The MySQL version has a DECIMAL field. You can see in the query_dbs function how to create your own conversion functions.
#!/usr/bin/env python
import os
import sqlite3
import MySQLdb
from MySQLdb.constants import FIELD_TYPE

user = os.getenv('USER')

def create_mysql_table():
    conn = MySQLdb.connect(user=user, db='foo')
    c = conn.cursor()
    c.execute("DROP TABLE stocks")
    c.execute("CREATE TABLE stocks"
              "(date text, trans text, symbol text, qty real, price Decimal(10,2) UNSIGNED NOT NULL)")
    c.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")
    conn.commit()

def create_sqlite_table():
    conn = sqlite3.connect('test.db')
    c = conn.cursor()
    c.execute("DROP TABLE stocks")
    c.execute("CREATE TABLE stocks"
              "(date text, trans text, symbol text, qty real, price real)")
    c.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")
    conn.commit()

def query_dbs(use_type_converters):
    conn = sqlite3.connect('test.db')
    c = conn.cursor()
    for row in c.execute('SELECT * FROM stocks'):
        print 'SQLITE: %s' % str(row)

    type_converters = MySQLdb.converters.conversions.copy()
    if use_type_converters:
        type_converters.update({
            FIELD_TYPE.DECIMAL: float,
            FIELD_TYPE.NEWDECIMAL: float,
        })

    conn = MySQLdb.connect(user=user, db='foo', conv=type_converters)
    c = conn.cursor()
    c.execute('SELECT * FROM stocks')
    for row in c.fetchall():
        print 'MYSQLDB: %s' % str(row)

create_sqlite_table()
create_mysql_table()

print "Without type conversion:"
query_dbs(False)
print "With type conversion:"
query_dbs(True)
This script produces the following output on my machine:
Without type conversion:
SQLITE: (u'2006-01-05', u'BUY', u'RHAT', 100.0, 35.14)
MYSQLDB: ('2006-01-05', 'BUY', 'RHAT', 100.0, Decimal('35.14'))
With type conversion:
SQLITE: (u'2006-01-05', u'BUY', u'RHAT', 100.0, 35.14)
MYSQLDB: ('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14)
What this shows is that by default MySQLdb returns Decimal types, but it can be coerced to return a different type, suitable for use with sqlite.
Then, once you have all of the types normalized between the two databases, you should no longer have problems with joins.
See the Python MySQLdb docs for more details.
There is no conflict between sqlite3 and MySQLdb, so you should be able to use them in the same program. However, you might also consider using SQLAlchemy, which provides a higher-level interface to both kinds of databases.
As far as why you are actually seeing this problem, the symptoms you describe suggest that you're incorrectly converting numbers to strings - in particular, that you're using repr() rather than str().
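A quick illustration of that repr()/str() difference under Python 2.7 (which is what the question uses):

from decimal import Decimal

# repr() leaks Python-specific notation into any SQL text built by hand
print repr(4L), repr(Decimal('4.00'))   # 4L Decimal('4.00')
# str() gives the plain literals that compare and JOIN as expected
print str(4L), str(Decimal('4.00'))     # 4 4.00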
