MySQL BIGINT inconsistent for inserts? - python

On Ubuntu, running MySQL 5.6.
I created a Python program that performs all my operations.
My app creates tables dynamically. There are many, and a few are very similar. For example, here are two:
create table tst.intgn_party_test_load (
party_id bigint unsigned NOT NULL,
party_supertype varchar(15) NOT NULL,
carrier_party_id bigint unsigned NOT NULL,
full_name varchar(500),
lda_actv_ind integer,
lda_file_id integer,
lda_created_by varchar(100),
lda_created_on datetime,
lda_updated_by varchar(100),
lda_updated_on datetime,
PRIMARY KEY(party_id,party_supertype,carrier_party_id)
)
and
create table tst.intgn_party_relationship (
parent_party_id bigint unsigned NOT NULL,
child_party_id bigint unsigned NOT NULL,
relationship_type varchar(10),
lda_actv_ind integer,
lda_file_id integer,
lda_created_by varchar(100),
lda_created_on datetime,
lda_updated_by varchar(100),
lda_updated_on datetime,
PRIMARY KEY(parent_party_id,child_party_id,relationship_type)
)
My program also dynamically populates the tables. I construct the party id fields from source data converted to a BIGINT.
For example, the insert it constructs for the first table is:
INSERT INTO intgn_party_test_load (
party_supertype,
carrier_party_id,
party_id,
full_name,
lda_actv_ind,
lda_file_id)
SELECT
'Agency' as s0,
0 as s1,
CONV(SUBSTRING(CAST(SHA(CONCAT(full_name,ga)) AS CHAR), 1, 16), 16, 10) as s2,
CONCAT(full_name,'-',ga) as s3,
lda_actv_ind,
lda_file_id
FROM tst.raw_listing_20210118175114
ON DUPLICATE KEY
UPDATE
full_name = VALUES(full_name),
lda_actv_ind = VALUES(lda_actv_ind),
lda_file_id = VALUES(lda_file_id) ;
and for the second table the insert constructed looks very similar, and is based on the exact same source data:
INSERT INTO tst.intgn_party_relationship (
parent_party_id,
relationship_type,
child_party_id,
lda_actv_ind,
lda_file_id)
SELECT (Select party_id
from intgn_party
where full_name = 'xxx') as s0,
'Location' as s1,
CONV(SUBSTRING(CAST(SHA(CONCAT(full_name,ga)) AS CHAR), 1, 16), 16, 10) as s2,
lda_actv_ind,
lda_file_id
FROM tst.raw_listing_20210118175114
ON DUPLICATE KEY
UPDATE
lda_actv_ind = VALUES(lda_actv_ind),
lda_file_id = VALUES(lda_file_id)
Now, the first table (intgn_party_test_load) is the issue. I can drop it and even recreate it manually; no matter what I do, the data inserted into it via Python has the BIGINT party_id truncated to just 16 digits.
EVERY OTHER TABLE that uses the exact same formula to populate the party_id gets BIGINT numbers that are between 18 and 20 digits long. I can see all the same source records loaded in the tables, and I see the truncated values only in the first table (intgn_party_test_load). For example, the first table has a record with party id = 7129232523783260, while the second table (and many others) has the same record loaded with [child]party id = 7129232523783260081.
The exact same formula, executed the exact same way from Python, yet this table gets shorter BIGINTs.
Interestingly, if I run the insert into this table manually (not via the Python program), it inserts the full BIGINT values.
So I'm confused why the Python program has 'chosen' this table to not work correctly, while it works fine on all other tables.
Is there some strange scenario where values get truncated?
BTW, my Python program uses SQLAlchemy to run the creations/inserts. Since it works manually, I have to assume it's related to SQLAlchemy, but I have no idea why it works on all tables but this one.
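For reference, here is a minimal Python sketch (my own illustration, assuming hashlib and a latin1 encoding to mirror the connection charset) of what the CONV(SUBSTRING(SHA(...), 1, 16), 16, 10) expression should produce: the first 16 hex characters of a SHA-1 digest read as a base-16 number, i.e. a value up to 2**64 - 1, which is 20 decimal digits and exactly the BIGINT UNSIGNED range:
import hashlib

def derive_party_id(full_name, ga):
    # First 16 hex chars of SHA-1, read as base 16 - mirrors the SQL expression above.
    digest_hex = hashlib.sha1((full_name + ga).encode('latin1')).hexdigest()
    return int(digest_hex[:16], 16)  # 0 .. 2**64 - 1, typically 18-20 decimal digits

print(derive_party_id('Some Agency', 'NY'))  # hypothetical sample values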
[edit]
To add: the SQL commands are executed through SQLAlchemy using db_connection.execute(sql).
[edit - adding more code detail]
from sqlalchemy import create_engine, exc

engine = create_engine(
    connection_string,
    pool_size=6, max_overflow=10, encoding='latin1', isolation_level='AUTOCOMMIT'
)
connection = engine.connect()

sql = """INSERT INTO intgn_party_test_load (
    party_supertype,
    carrier_party_id,
    party_id,
    full_name,
    lda_actv_ind,
    lda_file_id)
SELECT
    'Agency' as s0,
    0 as s1,
    CONV(SUBSTRING(CAST(SHA(CONCAT(full_name,ga)) AS CHAR), 1, 16), 16, 10) as s2,
    CONCAT(full_name,'-',ga) as s3,
    lda_actv_ind,
    lda_file_id
FROM tst.raw_listing_20210118175114
ON DUPLICATE KEY
UPDATE
    full_name = VALUES(full_name),
    lda_actv_ind = VALUES(lda_actv_ind),
    lda_file_id = VALUES(lda_file_id) ;"""

result = connection.execute(sql)
That's as far as I can reduce it (the code is much more complicated, as it dynamically creates the statement amongst other things), but from my logging I see the exact statement it is executing (as above), and I see the result in the BIGINT columns afterwards. All tables but this one, and only when run through the app.
So it doesn't happen to the other tables, even through the app.
Very confusing. I was hoping someone just knew of a bug in MySQL 5.6 around BIGINTs, perhaps related to the destination table's key construct or total record length, or some other crazy reason. Interestingly, if I do a DISTINCT on a BIGINT column whose values are more than 18 digits long, it comes back as 16 digits - I guess DISTINCT doesn't handle the full BIGINT?
I was kinda hoping this hints at an issue, but I don't get why the other tables would work fine...
[EDIT - adding some of the things I see SQLAlchemy running around the actual run of my query, just in the crazy case they impact anything for this one table:]
SET AUTOCOMMIT = 0
SET AUTOCOMMIT = 1
SET NAMES utf8mb4
SHOW VARIABLES LIKE 'sql_mode'
SHOW VARIABLES LIKE 'lower_case_table_names'
SELECT VERSION()
SELECT DATABASE()
SELECT @@tx_isolation
show collation where `Charset` = 'utf8mb4' and `Collation` = 'utf8mb4_bin'
SELECT CAST('test plain returns' AS CHAR(60)) AS anon_1
SELECT CAST('test unicode returns' AS CHAR(60)) AS anon_1
SELECT CAST('test collated returns' AS CHAR CHARACTER SET utf8mb4) COLLATE utf8mb4_bin AS anon_1
ROLLBACK
SET NAMES utf8mb4
It's hard to say the exact order; there are a ton that get run within the same microsecond.

After racking my brains for days and coming at it from all angles, I could not figure out why one table out of many had issues with truncating the SHA'd value.
In the end, I have redesigned how I hold my ids, and I no longer bother converting to BIGINT. It all works fine when I leave it as CHAR:
CAST(SHA(CONCAT(full_name,ga)) AS CHAR)
So I changed all my id columns to varchar(40) and use the above style. All good now. Joins will use varchar instead of bigint, which I'm okay with.
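For illustration, a minimal sketch of that workaround through the same SQLAlchemy connection (the table name tst.id_demo is hypothetical; SHA() always returns 40 hex characters, so varchar(40) holds the full value):
connection.execute("""
    CREATE TABLE IF NOT EXISTS tst.id_demo (
        party_id  varchar(40) NOT NULL,
        full_name varchar(500),
        PRIMARY KEY (party_id)
    )
""")
connection.execute("""
    INSERT IGNORE INTO tst.id_demo (party_id, full_name)
    SELECT CAST(SHA(CONCAT(full_name, ga)) AS CHAR), CONCAT(full_name, '-', ga)
    FROM tst.raw_listing_20210118175114
""")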

Related

Forcing autoincrement on Numeric primary key with Sqlalchemy and SQL Server

I have an existing SQL Server (2012) DB with many tables having a primary key of Numeric(9, 0) type. For all intents and purposes they are integers.
The ORM mapping (generated using sqlacodegen) looks like:
class SomeTable(Base):
    __tablename__ = 'SOME_TABLE'
    __table_args__ = {'schema': 'dbo'}

    SOME_TABLE_ID = Column(Numeric(9, 0), primary_key=True)
    some_more_fields_here = XXX
Sample code to insert data:
some_table = SomeTable(_not_specifying_SOME_TABLE_ID_explicitly_)
session.add(some_table)
session.flush() # <---BOOM, FlushError here
When I try to insert data into such tables, my app crashes on session.flush() with the following error:
sqlalchemy.orm.exc.FlushError: Instance SomeTable ... has a NULL identity key. If this is an auto-generated value, check that the database table allows generation of new primary key values, and that the mapped Column object is configured to expect these generated values. Ensure also that this flush() is not occurring at an inappropriate time, such as within a load() event.
If I replace Numeric with BigInteger then everything works fine. I did some digging and the query generated with Numeric is like this:
INSERT INTO dbo.[SOME_TABLE] (_columns_without_SOME_TABLE_ID)
VALUES (...)
It seems like a valid query from an SQL point of view, but SQLAlchemy raises the above exception there.
The query generated using BigInteger is as follows:
INSERT INTO dbo.[SOME_TABLE] (_columns_without_SOME_TABLE_ID)
OUTPUT inserted.[SOME_TABLE_ID]
VALUES (...)
I also found this piece of documentation about the autoincrement property. And sure enough, it explains the behavior I observe, i.e. autoincrement only works with integers.
So my question is whether there is some kind of workaround to make autoincrement work with Numeric columns without converting them to BigInteger?
My system configuration is - Centos 7 64 bit, Python 3.5.2, Sqlalchemy 1.1.4, pymssql 2.2.0, SQL Server 2012.
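No Numeric-preserving answer is recorded here, but for completeness, a minimal sketch of the mapping the question itself observes to work: map the IDENTITY column as BigInteger on the Python side while the database column stays NUMERIC(9, 0). This is an illustration of that observation, not a confirmed way to keep Numeric in the mapping:
from sqlalchemy import Column, BigInteger
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class SomeTable(Base):
    __tablename__ = 'SOME_TABLE'
    __table_args__ = {'schema': 'dbo'}

    # Mapped as BigInteger so SQLAlchemy treats it as an integer autoincrement PK
    # and emits OUTPUT inserted.[SOME_TABLE_ID]; the DB column remains NUMERIC(9, 0).
    SOME_TABLE_ID = Column(BigInteger, primary_key=True, autoincrement=True)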

PostgreSQL INSERT based on SELECT results using Python

I have been trying to mess around with this issue for the past week and have not been able to work around it. I have a PostgreSQL database which keeps track of players playing matches in a tournament. I am working on a function which reports the results of matches. The way it reports the results of matches is by simply updating the database (I'm not worrying about the actual reporting system at the moment).
Here is the function which does the reporting:
def reportMatch(winner, loser):
    """Records the outcome of a single match between two players.

    Args:
      winner: the id number of the player who won
      loser: the id number of the player who lost
    """
    connection = connect()
    cursor = connection.cursor()
    match_played = 1
    insert_statement = "INSERT INTO matches (winner_id, loser_id) VALUES (%s, %s)"
    cursor.execute(insert_statement, (winner, loser))
    cursor.execute("INSERT INTO players (match_count) VALUES (%s) SELECT players.id FROM players where (id = winner)" (match_played,))  # here is where I have my issue at runtime
    connection.commit()
    cursor.execute("INSERT INTO players(match_count) SELECT (id) FROM players VALUES (%s, %s)", (match_played, loser,))
    connection.commit()
    connection.close()
The marked line above is where I get an error. To be more precise and to pinpoint it: cursor.execute("INSERT INTO players(match_count) VALUES (%s) SELECT (id) FROM players WHERE (id = %s)",(match_played, winner,))
The error given is the following:
File "/vagrant/tournament/tournament.py", line 103, in reportMatch
cursor.execute(insert_statement_two, (match_played, winner))
psycopg2.ProgrammingError: syntax error at or near "SELECT"
LINE 1: INSERT INTO players(match_count) VALUES (1) SELECT (id) FROM...
If it helps, here is my schema:
CREATE TABLE players (
id serial PRIMARY KEY,
name varchar(50),
match_count int DEFAULT 0,
wins int DEFAULT 0,
losses int DEFAULT 0,
bye_count int
);
CREATE TABLE matches (
winner_id serial references players(id),
loser_id serial references players(id),
did_battle BOOLEAN DEFAULT FALSE,
match_id serial PRIMARY KEY
);
I have some experience with MySQL databases, but am fairly new to PostgreSQL. I spent a lot of time poking around with guides and tutorials online but haven't had the best of luck. Any help would be appreciated!
You can't both specify VALUES and use SELECT in an INSERT statement. It's one or the other.
See the definition of INSERT in the Postgres doc for more details.
You also do not need to specify the id value, as serial is a special sequence.
That line is also lacking a comma after the INSERT string.
In order to specify particular values, you can narrow down the SELECT with a WHERE, and/or leverage RETURNING from the earlier INSERTs, but the specifics of that will depend on exactly how you want to link them together. (I'm not quite sure exactly what you're going for in terms of linking the tables together from the code above.)
I suspect using RETURNING to get 2 IDs from players that were INSERTed and using those in turn in the INSERT into matches is along the lines of what you're looking to do.
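If the goal is simply to bump the counters for two players that already exist, a sketch using UPDATE instead of INSERT ... SELECT (reusing the connect() helper and the schema from the question; the wins/losses updates are my assumption about the intent) might look like:
def reportMatch(winner, loser):
    """Records the outcome of a single match between two players."""
    connection = connect()
    cursor = connection.cursor()
    # Record the match itself.
    cursor.execute(
        "INSERT INTO matches (winner_id, loser_id, did_battle) VALUES (%s, %s, TRUE)",
        (winner, loser))
    # The player rows already exist, so update them rather than inserting new ones.
    cursor.execute(
        "UPDATE players SET match_count = match_count + 1, wins = wins + 1 WHERE id = %s",
        (winner,))
    cursor.execute(
        "UPDATE players SET match_count = match_count + 1, losses = losses + 1 WHERE id = %s",
        (loser,))
    connection.commit()
    connection.close()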

Inserting a value into a MySQL DB using Python

Here is my Mysql table schema
Table: booking
Columns:
id int(11) PK AI
apt_id varchar(200)
checkin_date date
checkout_date date
price decimal(10,0)
deposit decimal(10,0)
adults int(11)
source_id int(11)
confirmationCode varchar(100)
client_id int(11)
booking_date datetime
note mediumtext
Related tables: property (apt_id → apt_id), booking_source (source_id → id)
I am trying to insert the values using Python, so here is what I have done:
sql = "INSERT INTO `nycaptBS`.`booking` (`apt_id`, `checkin_date`, `checkout_date`, `price`,`deposite` `adults`, `source_id`, `confirmationCode`, `client_id`, `booking_date`) VALUES ('%s','%s','%s','%s','%s','%d','%d','%s','%d','%s' )" % (self.apt_id,self.start_at,self.end_at,self.final_price,self.deposit,self.adults,self.source_id,self.notes,self.client_id,self.booking_date,self.notes)
x.execute(sql)
But while executing the above script I am getting this error:
sql = "INSERT INTO `nycaptBS`.`booking` (`apt_id`, `checkin_date`, `checkout_date`, `price`,`deposite` `adults`, `source_id`, `confirmationCode`, `client_id`, `booking_date`) VALUES ('%s','%s','%s','%s','%s','%d','%d','%s','%d','%s' )" % (self.apt_id,self.start_at,self.end_at,self.final_price,self.deposit,self.adults,self.source_id,self.notes,self.client_id,self.booking_date,self.notes)
TypeError: %d format: a number is required, not NoneType
I think my string formatters are not correct. Please help me out.
It looks like one of booking_date, notes, or source_id is None (also, you are inserting the notes value twice?). You could check/validate each value before inserting.
Also please use parameterized queries, NOT string formatting
Usually your SQL operations will need to use values from Python
variables. You shouldn’t assemble your query using Python’s string
operations because doing so is insecure; it makes your program
vulnerable to an SQL injection attack (see http://xkcd.com/327/ for
humorous example of what can go wrong).
Instead, use the DB-API's parameter substitution. Put a placeholder (%s for MySQLdb; ? for sqlite3) wherever you want to use a value, and then provide a tuple of values as the second argument to the cursor's execute() method.
something like:
x.execute("INSERT INTO thing (test_one, test_two) VALUES (%s, %s)", (python_var_one, python_var_two,))
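Applied to the booking table above, a sketch might look like this (assuming a MySQLdb-style cursor x and a self.confirmation_code attribute, which the question does not show; MySQL drivers use %s as the placeholder for every column type):
sql = ("INSERT INTO `nycaptBS`.`booking` "
       "(`apt_id`, `checkin_date`, `checkout_date`, `price`, `deposit`, `adults`, "
       " `source_id`, `confirmationCode`, `client_id`, `booking_date`, `note`) "
       "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)")
x.execute(sql, (self.apt_id, self.start_at, self.end_at, self.final_price,
                self.deposit, self.adults, self.source_id, self.confirmation_code,
                self.client_id, self.booking_date, self.notes))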

python inserting and retrieving binary data into mysql

I'm using the MySQLdb package for interacting with MySQL. I'm having trouble getting the proper type conversions.
I am using a 16-byte binary uuid as a primary key for the table and have a mediumblob holding zlib compressed json information.
I'm using the following schema:
CREATE TABLE repositories (
added_id int auto_increment not null,
id binary(16) not null,
data mediumblob not null,
create_date int not null,
update_date int not null,
PRIMARY KEY (added_id),
UNIQUE(id)
) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ENGINE=InnoDB;
Then I create a new row in the table using the following code:
data = zlib.compress(json.dumps({'hello': 'how are you :D'}))
row_id = uuid.uuid4().hex
added_id = cursor.execute(
    'INSERT INTO repositories (id, data, create_date, update_date) '
    'VALUES (%s, %s, %s, %s)',
    (binascii.a2b_hex(row_id),
     data,
     time.time(),
     time.time())
)
Then to retrieve data I use a similar query:
query = cursor.execute(
    'SELECT added_id, id, data, create_date, update_date '
    'FROM repositories WHERE id = %s',
    (binascii.a2b_hex(row_id),)
)
Then the query returns an empty result.
Any help would be appreciated. Also, as an aside, is it better to store unix epoch dates as integers or TIMESTAMP?
NOTE: I am not having problems inserting the data, just trying to retrieve it from the database. The row exists when I check via mysqlclient.
Thanks a lot!
One tip: you should be able to call uuid.uuid4().bytes to get the raw
bytes. As for timestamps, if you want to perform time/date manipulation
in SQL it's often easier to deal with real TIMESTAMP types.
I created a test table to try to reproduce what you're seeing:
CREATE TABLE xyz (
added_id INT AUTO_INCREMENT NOT NULL,
id BINARY(16) NOT NULL,
PRIMARY KEY (added_id),
UNIQUE (id)
) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ENGINE=InnoDB;
My script is able to insert and query for the rows using the binary field as a
key without problem. Perhaps you are incorrectly fetching / iterating over the
results returned by the cursor?
import binascii
import MySQLdb
import uuid
conn = MySQLdb.connect(host='localhost')
key = uuid.uuid4()
print 'inserting', repr(key.bytes)
r = conn.cursor()
r.execute('INSERT INTO xyz (id) VALUES (%s)', key.bytes)
conn.commit()
print 'selecting', repr(key.bytes)
r.execute('SELECT added_id, id FROM xyz WHERE id = %s', key.bytes)
for row in r.fetchall():
print row[0], binascii.b2a_hex(row[1])
Output:
% python qu.py
inserting '\x96\xc5\xa4\xc3Z+L\xf0\x86\x1e\x05\xebt\xf7\\\xd5'
selecting '\x96\xc5\xa4\xc3Z+L\xf0\x86\x1e\x05\xebt\xf7\\\xd5'
1 96c5a4c35a2b4cf0861e05eb74f75cd5
% python qu.py
inserting '\xac\xc9,jn\xb2O@\xbb\xa27h\xcd<B\xda'
selecting '\xac\xc9,jn\xb2O@\xbb\xa27h\xcd<B\xda'
2 acc92c6a6eb24f40bba23768cd3c42da
To supplement existing answers, there's also an issue with the following warning when dealing with binary strings in queries:
Warning: (1300, "Invalid utf8 character string: 'ABCDEF'")
It is reproduced by the following:
cursor.execute('''
    CREATE TABLE `table`(
        `bin_field` BINARY(16) NOT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
''')
bin_value = uuid.uuid4().bytes
cursor.execute('INSERT INTO `table`(bin_field) VALUES(%s)', (bin_value,))
Whenever MySQL sees that a string literal in a query isn't valid against current character_set_connection it will emit the warning. There are several solutions to it:
Explicitly set _binary charset literal
INSERT INTO `table`(bin_field) VALUES(_binary %s)
Manually construct queries with hexadecimal literals
INSERT INTO `table`(bin_field) VALUES(x'abcdef')
Change connection charset if you're only working with binary strings
For more details see MySQL Bug 79317.
Update
As @charlax pointed out, there is a binary_prefix flag which can be passed to the connection's initialiser to automatically prepend the _binary prefix when interpolating arguments. It's supported by recent versions of both mysqlclient and pymysql.
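A minimal sketch of that option (assuming PyMySQL; host, credentials and the table name are placeholders from the example above):
import uuid
import pymysql

# binary_prefix=True makes the driver emit _binary'...' literals for bytes
# parameters, which avoids the "Invalid utf8 character string" warning.
conn = pymysql.connect(host='localhost', user='root', password='',
                       db='test', binary_prefix=True)
with conn.cursor() as cur:
    cur.execute('INSERT INTO `table` (bin_field) VALUES (%s)', (uuid.uuid4().bytes,))
conn.commit()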

Is there a standard way to store a database schema outside a Python app

I am working on a small database application in Python (currently targeting 2.5 and 2.6) using sqlite3.
It would be helpful to be able to provide a series of functions that could setup the database and validate that it matches the current schema. Before I reinvent the wheel, I thought I'd look around for libraries that would provide something similar. I'd love to have something akin to RoR's migrations. xml2ddl doesn't appear to be meant as a library (although it could be used that way), and more importantly doesn't support sqlite3. I'm also worried about the need to move to Python 3 one day given the lack of recent attention to xml2ddl.
Are there other tools around that people are using to handle this?
You can find the schema of a sqlite3 table this way:
import sqlite3
db = sqlite3.connect(':memory:')
c = db.cursor()
c.execute('create table foo (bar integer, baz timestamp)')
c.execute("select sql from sqlite_master where type = 'table' and name = 'foo'")
r=c.fetchone()
print(r)
# (u'CREATE TABLE foo (bar integer, baz timestamp)',)
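Building on that, one way to validate the current schema against what the app expects is to compare the stored DDL, whitespace-normalised (the expected dict here is just an illustration, reusing the cursor c from above):
expected = {'foo': 'CREATE TABLE foo (bar integer, baz timestamp)'}
c.execute("select name, sql from sqlite_master where type = 'table'")
actual = dict((name, ' '.join(sql.split())) for name, sql in c.fetchall())
for table, ddl in expected.items():
    if actual.get(table) != ' '.join(ddl.split()):
        print('schema mismatch: %s' % table)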
Take a look at SQLAlchemy Migrate. I see no problem using it as a migration tool only, but comparing a configuration to the current database state is still experimental.
I use this to keep schemas in sync.
Keep in mind that it adds a metadata table to keep track of the versions.
South is the closest thing I know to RoR migrations. But just as you need Rails for those migrations, you need Django to use South.
Not sure if it is standard but I just saved all my schema queries in a txt file like so (tables_creation.txt):
CREATE TABLE "Jobs" (
"Salary" TEXT,
"NumEmployees" TEXT,
"Location" TEXT,
"Description" TEXT,
"AppSubmitted" INTEGER,
"JobID" INTEGER NOT NULL UNIQUE,
PRIMARY KEY("JobID")
);
CREATE TABLE "Questions" (
"Question" TEXT NOT NULL,
"QuestionID" INTEGER NOT NULL UNIQUE,
PRIMARY KEY("QuestionID" AUTOINCREMENT)
);
CREATE TABLE "FreeResponseQuestions" (
"Answer" TEXT,
"FreeResponseQuestionID" INTEGER NOT NULL UNIQUE,
PRIMARY KEY("FreeResponseQuestionID"),
FOREIGN KEY("FreeResponseQuestionID") REFERENCES "Questions"("QuestionID")
);
...
Then I used this function, taking advantage of the fact that each query is delimited by two newline characters:
def create_db_schema(self):
    with open("./tables_creation.txt", "r") as db_schema:
        sql_qs = db_schema.read().split('\n\n')
    c = self.conn.cursor()
    for sql_q in sql_qs:
        c.execute(sql_q)
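Alternatively, sqlite3 can run a whole multi-statement script in one call with executescript(), so the blank-line convention isn't needed at all (a sketch, keeping the same file name and the self.conn attribute from above):
import sqlite3

def create_db_schema(self):
    with open("./tables_creation.txt", "r") as schema_file:
        # executescript() handles multiple ';'-separated statements at once.
        self.conn.executescript(schema_file.read())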
