Inserting and retrieving binary data into MySQL - python

I'm using the MySQLdb package for interacting with MySQL. I'm having trouble getting the proper type conversions.
I am using a 16-byte binary uuid as a primary key for the table and have a mediumblob holding zlib compressed json information.
I'm using the following schema:
CREATE TABLE repositories (
added_id int auto_increment not null,
id binary(16) not null,
data mediumblob not null,
create_date int not null,
update_date int not null,
PRIMARY KEY (added_id),
UNIQUE(id)
) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ENGINE=InnoDB;
Then I create a new row in the table using the following code:
data = zlib.compress(json.dumps({'hello': 'how are you :D'}))
row_id = uuid.uuid4().hex
added_id = cursor.execute(
    'INSERT INTO repositories (id, data, create_date, update_date) '
    'VALUES (%s, %s, %s, %s)',
    (binascii.a2b_hex(row_id), data, time.time(), time.time())
)
Then to retrieve data I use a similar query:
query = cursor.execute(
    'SELECT added_id, id, data, create_date, update_date '
    'FROM repositories WHERE id = %s',
    (binascii.a2b_hex(row_id),)
)
Then the query returns an empty result.
Any help would be appreciated. Also, as an aside, is it better to store unix epoch dates as integers or TIMESTAMP?
NOTE: I am not having problems inserting the data, just trying to retrieve it from the database. The row exists when I check via mysqlclient.
Thanks a lot!

One tip: you should be able to call uuid.uuid4().bytes to get the raw bytes. As for timestamps, if you want to perform time/date manipulation in SQL, it's often easier to deal with a real TIMESTAMP type.
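A quick sketch of the difference between the two forms (both produced by the standard uuid module):
import binascii
import uuid

u = uuid.uuid4()
hex_form = u.hex    # 32-character hex string
raw_form = u.bytes  # the same 128-bit value as 16 raw bytes - what a BINARY(16) column stores
assert raw_form == binascii.a2b_hex(hex_form)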
I created a test table to try to reproduce what you're seeing:
CREATE TABLE xyz (
added_id INT AUTO_INCREMENT NOT NULL,
id BINARY(16) NOT NULL,
PRIMARY KEY (added_id),
UNIQUE (id)
) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ENGINE=InnoDB;
My script is able to insert and query for the rows using the binary field as a
key without problem. Perhaps you are incorrectly fetching / iterating over the
results returned by the cursor?
import binascii
import MySQLdb
import uuid
conn = MySQLdb.connect(host='localhost')
key = uuid.uuid4()
print 'inserting', repr(key.bytes)
r = conn.cursor()
r.execute('INSERT INTO xyz (id) VALUES (%s)', key.bytes)
conn.commit()
print 'selecting', repr(key.bytes)
r.execute('SELECT added_id, id FROM xyz WHERE id = %s', key.bytes)
for row in r.fetchall():
    print row[0], binascii.b2a_hex(row[1])
Output:
% python qu.py
inserting '\x96\xc5\xa4\xc3Z+L\xf0\x86\x1e\x05\xebt\xf7\\\xd5'
selecting '\x96\xc5\xa4\xc3Z+L\xf0\x86\x1e\x05\xebt\xf7\\\xd5'
1 96c5a4c35a2b4cf0861e05eb74f75cd5
% python qu.py
inserting '\xac\xc9,jn\xb2O@\xbb\xa27h\xcd<B\xda'
selecting '\xac\xc9,jn\xb2O@\xbb\xa27h\xcd<B\xda'
2 acc92c6a6eb24f40bba23768cd3c42da

To supplement existing answers, there's also an issue with the following warning when dealing with binary strings in queries:
Warning: (1300, "Invalid utf8 character string: 'ABCDEF'")
It is reproduced by the following:
cursor.execute('''
CREATE TABLE `table`(
`bin_field` BINARY(16) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
''')
bin_value = uuid.uuid4().bytes
cursor.execute('INSERT INTO `table`(bin_field) VALUES(%s)', (bin_value,))
Whenever MySQL sees that a string literal in a query isn't valid against the current character_set_connection, it emits this warning. There are several ways to deal with it:
Explicitly set the _binary character set introducer (see the sketch after this list):
INSERT INTO `table`(bin_field) VALUES(_binary %s)
Manually construct the query with a hexadecimal literal:
INSERT INTO `table`(bin_field) VALUES(x'abcdef')
Change the connection charset if you're only working with binary strings.
For more details see MySQL Bug 79317.
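For the first option, the insert from above becomes (a sketch, reusing the cursor from the snippet above):
import uuid

bin_value = uuid.uuid4().bytes
# The parameter is still bound as usual; only the SQL text gains the _binary
# introducer, so MySQL never tries to validate the bytes as utf8.
cursor.execute('INSERT INTO `table`(bin_field) VALUES(_binary %s)', (bin_value,))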
Update
As @charlax pointed out, there's a binary_prefix flag which can be passed to the connection's initialiser to automatically prepend the _binary prefix when interpolating arguments. It's supported by recent versions of both mysqlclient and PyMySQL.
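A connection-level sketch (PyMySQL shown here; mysqlclient accepts the same keyword, and the host/user/password/database values are placeholders):
import uuid
import pymysql

# binary_prefix=True makes the driver emit _binary before interpolated byte strings,
# silencing the 'Invalid utf8 character string' warning for BINARY/BLOB parameters.
conn = pymysql.connect(host='localhost', user='user', password='secret',
                       database='test', binary_prefix=True)
cursor = conn.cursor()
cursor.execute('INSERT INTO `table`(bin_field) VALUES(%s)', (uuid.uuid4().bytes,))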

MySQL BIGINT inconsistent for inserts?

On Ubuntu, running MySQL 5.6.
I created a Python program that performs all my operations.
My app creates tables dynamically. There are many, and a few are very similar. For example, here are two:
create table tst.intgn_party_test_load (
party_id bigint unsigned NOT NULL,
party_supertype varchar(15) NOT NULL,
carrier_party_id bigint unsigned NOT NULL,
full_name varchar(500),
lda_actv_ind integer,
lda_file_id integer,
lda_created_by varchar(100),
lda_created_on datetime,
lda_updated_by varchar(100),
lda_updated_on datetime,
PRIMARY KEY(party_id,party_supertype,carrier_party_id)
)
and
create table tst.intgn_party_relationship (
parent_party_id bigint unsigned NOT NULL,
child_party_id bigint unsigned NOT NULL,
relationship_type varchar(10),
lda_actv_ind integer,
lda_file_id integer,
lda_created_by varchar(100),
lda_created_on datetime,
lda_updated_by varchar(100),
lda_updated_on datetime,
PRIMARY KEY(parent_party_id,child_party_id,relationship_type)
)
My program also dynamically populates the tables. I construct the party id fields using source data converted to a BIGINT.
For example, the insert it constructs for the first table is:
INSERT INTO intgn_party_test_load (
party_supertype,
carrier_party_id,
party_id,
full_name,
lda_actv_ind,
lda_file_id)
SELECT
'Agency' as s0,
0 as s1,
CONV(SUBSTRING(CAST(SHA(CONCAT(full_name,ga)) AS CHAR), 1, 16), 16, 10) as s2,
CONCAT(full_name,'-',ga) as s3,
lda_actv_ind,
lda_file_id
FROM tst.raw_listing_20210118175114
ON DUPLICATE KEY
UPDATE
full_name = VALUES(full_name),
lda_actv_ind = VALUES(lda_actv_ind),
lda_file_id = VALUES(lda_file_id) ;
and for the second table the insert constructed looks very similar, and is based on the exact same source data:
INSERT INTO tst.intgn_party_relationship (
parent_party_id,
relationship_type,
child_party_id,
lda_actv_ind,
lda_file_id)
SELECT (Select party_id
from intgn_party
where full_name = 'xxx') as s0,
'Location' as s1,
CONV(SUBSTRING(CAST(SHA(CONCAT(full_name,ga)) AS CHAR), 1, 16), 16, 10) as s2,
lda_actv_ind,
lda_file_id
FROM tst.raw_listing_20210118175114
ON DUPLICATE KEY
UPDATE
lda_actv_ind = VALUES(lda_actv_ind),
lda_file_id = VALUES(lda_file_id)
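For reference, that CONV/SHA expression boils down to the following (a rough Python sketch, assuming the concatenated values are plain UTF-8 strings):
import hashlib

def party_id(full_name, ga):
    # SHA-1 the concatenation, keep the first 16 hex digits, read them as base 16.
    digest = hashlib.sha1((full_name + ga).encode('utf8')).hexdigest()
    return int(digest[:16], 16)

# 16 hex digits cover 0 .. 2**64 - 1, so the result is normally 18-20 decimal digits,
# which is why a correct load produces party_id values of that length.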
Now... the first table (intgn_party_test_load) is the issue. I can drop it, even recreate it manually.. no matter what I do, the data inserted into it via Python has the BIGINT party_id truncated to just 16 digits.
EVERY OTHER TABLE that uses the exact same formula to populate the party_id creates BIGINT numbers that are between 18 and 20 digits long. I can see all the same source records loaded in the tables, and I see the truncated values in the first table (intgn_party_test_load). For example, the first table has a record with party id = 7129232523783260, while the second table (and many others) has the same record loaded with [child] party id = 7129232523783260081.
The exact same formula, executed the exact same way from Python.. but this table gets shorter BIGINTs.
Interestingly, I tried manually running the insert into this table (not using the Python program), and it inserts the full BIGINT values.
So I'm confused why the Python program has 'chosen' this table to not work correctly, while it works fine on all other tables.
Is there some strange scenario where values get truncated?
BTW, my Python program uses SQLAlchemy to run the creations/inserts. Since it works manually, I have to assume it's related to SQLAlchemy.. but I have no idea why it works on all tables but this one.
[edit]
To add: the SQL commands are executed through SQLAlchemy using db_connection.execute(sql)
[edit - adding more code detail]
from sqlalchemy import create_engine, exc
engine = create_engine(
    connection_string,
    pool_size=6, max_overflow=10, encoding='latin1', isolation_level='AUTOCOMMIT'
)
db_connection = engine.connect()
sql = """INSERT INTO intgn_party_test_load (
    party_supertype,
    carrier_party_id,
    party_id,
    full_name,
    lda_actv_ind,
    lda_file_id)
SELECT
    'Agency' as s0,
    0 as s1,
    CONV(SUBSTRING(CAST(SHA(CONCAT(full_name,ga)) AS CHAR), 1, 16), 16, 10) as s2,
    CONCAT(full_name,'-',ga) as s3,
    lda_actv_ind,
    lda_file_id
FROM tst.raw_listing_20210118175114
ON DUPLICATE KEY
UPDATE
    full_name = VALUES(full_name),
    lda_actv_ind = VALUES(lda_actv_ind),
    lda_file_id = VALUES(lda_file_id) ;"""
result = db_connection.execute(sql)
That's as best I can reduce it to (the code is much more complicated as it dynamically creates the statement amongst other things).. but from my logging, I see the exact statement it is executing (as above), and I see the result in the BIGINT columns afterwards. All tables but this one, and only when run through the app.
So it doesn't happen to the other tables, even through the app..
Very confusing.. I was hoping someone just knew of a bug in MySQL 5.6 around BIGINTs as it pertains to maybe the destination table's key construct or total record length.. or some other crazy reason. Interestingly, I do see that if I do a DISTINCT on a BIGINT column holding >18-digit values, it comes back as 16 digits - I guess DISTINCT doesn't handle the full BIGINT..
I was kind of hoping this hints at an issue, but I don't get why the other tables would work fine...
[EDIT - adding some of the things I see SQLAlchemy apparently running around the actual run of my query.. just in the crazy case they impact anything for the one table??]
SET AUTOCOMMIT = 0
SET AUTOCOMMIT = 1
SET NAMES utf8mb4
SHOW VARIABLES LIKE 'sql_mode'
SHOW VARIABLES LIKE 'lower_case_table_names'
SELECT VERSION()
SELECT DATABASE()
SELECT @@tx_isolation
show collation where `Charset` = 'utf8mb4' and `Collation` = 'utf8mb4_bin'
SELECT CAST('test plain returns' AS CHAR(60)) AS anon_1
SELECT CAST('test unicode returns' AS CHAR(60)) AS anon_1
SELECT CAST('test collated returns' AS CHAR CHARACTER SET utf8mb4) COLLATE utf8mb4_bin AS anon_1
ROLLBACK
SET NAMES utf8mb4
Hard to say the order or anything like that.. there are a ton that get run in the same microsecond.
After racking my brains for days, coming at it from all angles, I could not figure out why one table out of many had issues with truncating the SHA'd value.
In the end, I have redesigned how I hold my IDs, and I no longer bother converting to BIGINT. It all works fine when I leave it as CHAR:
CAST(SHA(CONCAT(full_name,ga)) AS CHAR)
So I changed all my ID columns to varchar(40) and use the above style. All good now. Joins will use varchar instead of BIGINT, which I'm OK with.
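In Python terms, the new keys are simply the full 40-character SHA-1 hex digest (same assumptions as the sketch above):
import hashlib

def party_id_char(full_name, ga):
    # Full 40-character SHA-1 hex digest, stored in a varchar(40) column.
    return hashlib.sha1((full_name + ga).encode('utf8')).hexdigest()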

Inserting a dictionary into a sqlite3 database - python

I'm trying to store a dictionary in a SQLite database (Python) and be able to pull it from the DB and use it like any other dict in Python, but it doesn't seem to work.
I've tried using json.dumps() and still no success.
This is my table:
cursor.execute("""
CREATE TABLE IF NOT EXISTS users(
email TEXT PRIMARY KEY NOT NULL,
password TEXT NOT NULL,
ip_dict VARIANT NOT NULL);
""")
and I would like to insert a dict into the "ip_dict" column, for instance:
ip_dict = {"pc1": "192.168.1.2", "router": "192.168.1.1"}
I tried doing:
ip_dict = json.dumps(ip_dict)
cursor.execute(f"""
INSERT INTO users(email,password,ip_dict)
VALUES("example#gmail.com", "123456789", {ip_dict})
""")
but no success.
Thanks
First, I would not overlay the dictionary with its string (i.e. JSON) representation; I would use a separate variable just to keep things straight. But since you now have a string representation and you want to store that value in a TEXT column, nothing further needs to be done (I don't know what you were trying to accomplish by surrounding the string with {}):
ip_dict_json = json.dumps(ip_dict)
cursor.execute(f"""
INSERT INTO users(email,password,ip_dict)
VALUES('example@gmail.com', '123456789', ?)
""", (ip_dict_json,))
Note that I am using single quotes to represent string literals as this is the more common and portable method to do so and that the string value for the dictionary is being passed as a parameter to a prepared statement.
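To get the dictionary back out, fetch the TEXT column and decode it (a small sketch against the same table; the email value is just the one from the question):
cursor.execute("SELECT ip_dict FROM users WHERE email = ?", ('example@gmail.com',))
row = cursor.fetchone()
ip_dict = json.loads(row[0])  # back to a normal Python dict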
Update
If you want to get more sophisticated, you could define the column type to be a special type such as dictionary and then specify an adapter and converter for this type:
cursor.execute("""
CREATE TABLE IF NOT EXISTS users(
email TEXT PRIMARY KEY NOT NULL,
password TEXT NOT NULL,
ip_dict dictionary NOT NULL);
""")
import json
sqlite3.register_adapter(dict, lambda d: json.dumps(d).encode('utf8'))
sqlite3.register_converter("dictionary", lambda d: json.loads(d.decode('utf8')))
cursor.execute(f"""
INSERT INTO users(email,password,ip_dict)
VALUES('example@gmail.com', '123456789', ?)
""", (ip_dict,)) # passing a dictionary and not a string
cursor.execute('SELECT * FROM users')
rows = cursor.fetchall()
for row in rows:
    print(row[2])  # this is a dictionary and not a string
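One caveat for the converter approach: it only runs if the connection is opened with detect_types, so that sqlite3 actually consults the declared column type ("dictionary"). A minimal sketch (the users.db filename is an assumption):
import sqlite3

# PARSE_DECLTYPES makes sqlite3 look at the declared type of each result column
# and apply the matching registered converter to every fetched value.
conn = sqlite3.connect('users.db', detect_types=sqlite3.PARSE_DECLTYPES)
cursor = conn.cursor()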

Inserting a value into a MySQL DB using python

Here is my MySQL table schema:
Table: booking
Columns:
id int(11) PK AI
apt_id varchar(200)
checkin_date date
checkout_date date
price decimal(10,0)
deposit decimal(10,0)
adults int(11)
source_id int(11)
confirmationCode varchar(100)
client_id int(11)
booking_date datetime
note mediumtext
Related tables: property (apt_id → apt_id), booking_source (source_id → id)
I am trying to insert the values using Python, so here is what I have done:
sql = "INSERT INTO `nycaptBS`.`booking` (`apt_id`, `checkin_date`, `checkout_date`, `price`,`deposite` `adults`, `source_id`, `confirmationCode`, `client_id`, `booking_date`) VALUES ('%s','%s','%s','%s','%s','%d','%d','%s','%d','%s' )" % (self.apt_id,self.start_at,self.end_at,self.final_price,self.deposit,self.adults,self.source_id,self.notes,self.client_id,self.booking_date,self.notes)
x.execute(sql)
But while executing the above script I am getting this error:
sql = "INSERT INTO `nycaptBS`.`booking` (`apt_id`, `checkin_date`, `checkout_date`, `price`,`deposite` `adults`, `source_id`, `confirmationCode`, `client_id`, `booking_date`) VALUES ('%s','%s','%s','%s','%s','%d','%d','%s','%d','%s' )" % (self.apt_id,self.start_at,self.end_at,self.final_price,self.deposit,self.adults,self.source_id,self.notes,self.client_id,self.booking_date,self.notes)
TypeError: %d format: a number is required, not NoneType
I think my string formatters are not correct. Please help me out.
It looks like one of booking_date, notes, or source_id is None (also, you are inserting the notes value twice?). You could check/validate each value before inserting.
Also, please use parameterized queries, NOT string formatting:
Usually your SQL operations will need to use values from Python
variables. You shouldn’t assemble your query using Python’s string
operations because doing so is insecure; it makes your program
vulnerable to an SQL injection attack (see http://xkcd.com/327/ for
humorous example of what can go wrong).
Instead, use the DB-API's parameter substitution. Put a placeholder wherever you want to use a value (%s for MySQLdb; some other drivers such as sqlite3 use ?), and then provide a tuple of values as the second argument to the cursor's execute() method.
Something like:
x.execute("INSERT INTO thing (test_one, test_two) VALUES (%s, %s)", (python_var_one, python_var_two))

pygresql - insert and return serial

I'm using PyGreSQL to access my DB. In the use-case I'm currently working on, I am trying to insert a record into a table and return the last rowid... aka the value that the DB created for my ID field:
create table job_runners (
id SERIAL PRIMARY KEY,
hostname varchar(100) not null,
is_available boolean default FALSE
);
sql = "insert into job_runners (hostname) values ('localhost')"
When I used db.insert(), which made the most sense, I received an "AttributeError". And when I tried db.query(sql) I got nothing but an OID.
Q: Using PyGreSQL what is the best way to insert records and return the value of the ID field without doing any additional reads or queries?
INSERT INTO job_runners
(hostname,is_available) VALUES ('localhost',true)
RETURNING id
That said, I have no idea about pygresql, but by what you've already written, I guess it's db.query() that you want to use here.
The documentation in PyGreSQL says that if you call dbconn.query() with an insert/update statement, it will return the OID. It goes on to say something about lists of OIDs when there are multiple rows involved.
First of all, I found that the OID features did not work. I suppose knowing the version numbers of the libs and tools would have helped; however, I was not trying to return the OID.
Finally, by appending "returning id", as suggested by @hacker, PyGreSQL simply did the right thing and returned a record set with the ID in the resulting dictionary (see code below).
sql = "insert into job_runners (hostname) values ('localhost') returning id"
rv = dbconn.query(sql)
id = rv.dictresult()[0]['id']
Assuming you have a cursor object cur:
cur.execute("INSERT INTO job_runners (hostname) VALUES (%(hostname)s) RETURNING id",
{'hostname': 'localhost'})
id = cur.fetchone()[0]
This ensures PyGreSQL correctly escapes the input string, preventing SQL injection.

How do you safely and efficiently get the row id after an insert with mysql using MySQLdb in python?

I have a simple table in mysql with the following fields:
id -- Primary key, int, autoincrement
name -- varchar(50)
description -- varchar(256)
Using MySQLdb, a python module, I want to insert a name and description into the table, and get back the id.
In pseudocode:
db = MySQLdb.connection(...)
queryString = "INSERT into tablename (name, description) VALUES" % (a_name, a_desc);"
db.execute(queryString);
newID = ???
I think it might be
newID = db.insert_id()
Edit by Original Poster
Turns out, in the version of MySQLdb that I am using (1.2.2), you would do the following:
conn = MySQLdb.connect(host=...)
c = conn.cursor()
c.execute("INSERT INTO...")
newID = c.lastrowid
I am leaving this as the correct answer, since it got me pointed in the right direction.
I don't know if there's a MySQLdb-specific API for this, but in general you can obtain the last inserted id by SELECTing LAST_INSERT_ID().
It is tracked on a per-connection basis, so you don't risk race conditions if some other client performs an insert as well.
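For example (a sketch against the question's table; the db handle and a_name/a_desc variables are the ones from the pseudocode above):
cursor = db.cursor()
cursor.execute("INSERT INTO tablename (name, description) VALUES (%s, %s)",
               (a_name, a_desc))
# LAST_INSERT_ID() is tracked per connection, so this is safe with concurrent clients.
cursor.execute("SELECT LAST_INSERT_ID()")
new_id = cursor.fetchone()[0]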
You could also call
conn.insert_id()
The easiest way of all is to wrap your insert with a select count query into a single stored procedure and call that in your code. You would pass in the parameters needed to the stored procedure and it would then select your row count.
