I am trying to insert values into a SQL Server database, pulling the data from a dictionary. I ran into a problem when my program tries to enter 0xqb_QWQDrabGr7FTBREfhCLMZLw4ztx into a column named VersionId. The following is my sample code and the error.
cursor.execute("""insert into [TestDB].[dbo].[S3_Files] ([Key],[IsLatest],[LastModified],[Size(Bytes)],[VersionID]) values (%s,%s,%s,%s,%s)""",(item['Key'],item['IsLatest'],item['LastModified'],item['Size'],item['VersionId']))
conn_db.commit()
pymssql.ProgrammingError: (102, "Incorrect syntax near 'qb_QWQDrabGr7FTBREfhCLMZLw4ztx'.DB-Lib error message 20018, severity 15:\nGeneral SQL Server error: Check messages from the SQL Server\n")
Based on the error, I assume SQL does not like the 0x at the beginning of the VersionId string because of security issues. If my assumption is correct, what are my options? I also cannot change the value of the VersionId.
Edit: This is what I get when I print that cursor command:
insert into [TestDB].[dbo].[S3_Files] ([Key],[IsLatest],[LastModified],[Size(Bytes)],[VersionID]) values (Docs/F1/Trades/Buy/Person1/Seller_Provided_-_Raw_Data/GTF/PDF/GTF's_v2/NID3154229_23351201.pdf,True,2015-07-22 22:05:38+00:00,753854,0xqb_QWQDrabGr7FTBREfhCLMZLw4ztx)
Edit 2: The odd thing is that when I try to enter the insert command manually in SQL Server Management Studio, it doesn't like the (') in the path name in the first parameter, so I escaped that character, added (') around each value except the number, and the command worked. At this point I am pretty stumped on why the insert is not working.
Edit 3: I decided to do a try/except on every insert, and I see that the VersionIds that get caught have the pattern 0x..... Again, does anyone know if my security assumption is correct?
I guess that's what happens when our libraries try to be smarter than us...
No SQL Server around to test, but I assume the 0x values are failing because the way pymssql passes the parameter causes the server to interpret it as a hexadecimal string, and the 'q' following the '0x' does not fit its expectation of 0-9 and A-F chars.
I don't have enough information to know if this is a library bug and/or whether it can be worked around; the pymssql documentation is not very extensive, but I would try the following:
if you can, check in MSSQL Profiler what command is actually coming in
build your own command as a string and see if the error persists (remember Bobby Tables before putting that in production, though: https://xkcd.com/327/); see the sketch after this list
try to work around it by adding quotes, etc.
switch to another library / use SQLAlchemy
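For the string-building experiment, a rough sketch (the quote() helper is hypothetical, item is the dictionary from the question, and manual quoting like this is injection-prone, so treat it as a diagnostic only):

def quote(value):
    # crude manual quoting for a one-off diagnostic run
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"  # double embedded quotes

sql = (
    "insert into [TestDB].[dbo].[S3_Files] "
    "([Key],[IsLatest],[LastModified],[Size(Bytes)],[VersionID]) "
    "values ({},{},{},{},{})"
).format(
    quote(item['Key']), quote(item['IsLatest']), quote(item['LastModified']),
    quote(item['Size']), quote(item['VersionId'])
)
cursor.execute(sql)
conn_db.commit()

If the quoted form inserts cleanly, that points at how pymssql serializes the parameter rather than at the data itself.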
Related
I have a Python script that uses PyMySQL to connect to a MySQL database and insert rows into it. Some of the columns in the database table are of type json.
I know that in order to insert a json, we can run something like:
my_json = {"key" : "value"}
cursor = connection.cursor()
cursor.execute(insert_query)
"""INSERT INTO my_table (my_json_column) VALUES ('%s')""" % (json.dumps(my_json))
connection.commit()
The problem in my case is that the json is a variable over which I do not have much control (it's coming from an API call to a third-party endpoint), so my script keeps throwing new errors for non-valid json variables.
For example, the json could very well contain a stringified json as a value, so my_json would look like:
{"key": "{\"key_str\":\"val_str\"}"}
→ In this case, running the usual insert script would throw a [ERROR] OperationalError: (3140, 'Invalid JSON text: "Missing a comma or \'}\' after an object member." at position 1234 in value for column \'my_table.my_json_column\'.')
Or another example is json variables that contain a single quotation mark in some of the values, something like:
{"key" : "Here goes my value with a ' quotation mark"}
→ In this case, the usual insert script returns an error similar to the one below, unless I manually escape those single quotation marks in the script by replacing them.
[ERROR] ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'key': 'Here goes my value with a ' quotation mark' at line 1")
So my question is the following:
Are there any best practices that I might be missing, and that I can use to avoid my script breaking in the 2 scenarios mentioned above, but also with any other jsons that might break the insert query?
I read some existing posts like this one here or this one, where it's recommended to insert the json into a string or a blob column, but I'm not sure if that's good practice / if other issues (like string length limitations, for example) might arise from using a string column instead of json.
Thanks!
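For reference, the breakage in both scenarios comes from splicing the JSON into the SQL string. A minimal sketch of the parameterized alternative (assuming PyMySQL as in the question; connection details are placeholders):

import json
import pymysql

# The unquoted %s placeholder lets the driver escape the value, so embedded
# single quotes and nested stringified JSON survive intact.
my_json = {"key": "Here goes my value with a ' quotation mark"}

connection = pymysql.connect(host="localhost", user="user",
                             password="secret", database="mydb")
with connection.cursor() as cursor:
    cursor.execute(
        "INSERT INTO my_table (my_json_column) VALUES (%s)",
        (json.dumps(my_json),),
    )
connection.commit()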
I am trying to insert data into a Postgres table using variables. Having looked at other answers on this topic, it seemed pretty straightforward; however, I am getting a syntax error before I even get a chance to insert anything into the database.
The execute statement I am using:
cur.execute("""INSERT INTO "public"."catalogue_product" ("id","structure","upc","title","slug","description","rating","date_created","date_updated","is_discountable","parent_id","product_class_id","collection_id","multiplier","dimension","feat1","feat10","feat2","feat3","feat4","feat5","feat6","feat7","feat8","feat9","image_url","price","short_name","sku") VALUES (nextval'catalogue_product_id_seq'::regclass),'standalone',NULL,%s,'noslug',%s,NULL,current_timestamp,current_timestamp,TRUE,NULL,NULL,NULL,'2.2',%s,'','','','','','','','','','',%s,%s,%s,%s)""", (name, desc, dimension, imageurl, price, shortname, sku))
All parenthesis and quotes match as they should as far as I can see.
What could be causing this?
edit: As per an answer below, I switched to using triple quotes (and edited the code above to reflect that), which does seem to help, but I still get an error:
psycopg2.ProgrammingError: syntax error at or near "'standalone'"
LINE 1: ...ES (nextval'catalogue_product_id_seq'::regclass), 'standalon...
You have an extra ) that closes the VALUES list too early:
... ::regclass),'standalone' ...
THIS^
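A sketch of the corrected statement, assuming the intent was a proper nextval(...) call, so the ) after ::regclass closes the function instead of the VALUES list:

cur.execute("""INSERT INTO "public"."catalogue_product"
    ("id","structure","upc","title","slug","description","rating","date_created",
     "date_updated","is_discountable","parent_id","product_class_id","collection_id",
     "multiplier","dimension","feat1","feat10","feat2","feat3","feat4","feat5",
     "feat6","feat7","feat8","feat9","image_url","price","short_name","sku")
    VALUES (nextval('catalogue_product_id_seq'::regclass),'standalone',NULL,%s,'noslug',
            %s,NULL,current_timestamp,current_timestamp,TRUE,NULL,NULL,NULL,'2.2',%s,
            '','','','','','','','','','',%s,%s,%s,%s)""",
    (name, desc, dimension, imageurl, price, shortname, sku))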
I'm getting data loss when doing a csv import using the Python MySQLdb module. The crazy thing is that I can load the exact same csv using other MySQL clients and it works fine.
It works perfectly fine when running the exact same command with the exact same csv from the Sequel Pro mysql client.
It works perfectly fine when running the exact same command with the exact same csv from the mysql command line.
It doesn't work (some rows are truncated) when loading through a Python script using the MySQLdb module.
It's truncating about 10 rows off of my 7019 row csv.
The command I'm calling:
LOAD DATA LOCAL INFILE '/path/to/load.txt' REPLACE INTO TABLE tble_name FIELDS TERMINATED BY ","
When the above command is run using the native mysql client on Linux or the Sequel Pro mysql client on Mac, it works fine and I get 7019 rows imported.
When the above command is run using Python's MySQLdb module, such as:
dest_cursor.execute( '''LOAD DATA LOCAL INFILE '/path/to/load.txt' REPLACE INTO TABLE tble_name FIELDS TERMINATED BY ","''' )
dest_db.commit()
Almost all rows are imported, but I get a slew of:
Warning: (1265L, "Data truncated for column '<various_column_names' at row <various_rows>")
When the warnings pop up, it states at row <row_num> but I'm not seeing that correlate to the row in the csv (I think it's the row it's trying to create on the target table, not the row in the csv) so I can't use that to help troubleshoot.
And sure enough, when it's done, my target table is missing some rows.
Unfortunately, with over 7,000 rows in the csv, it's hard to tell exactly which line it's choking on for further analysis.
There are many rows that are null and/or empty spaces but they are importing fine.
The fact that I can import the entire csv using other MySQL clients makes me feel that the MySQLdb module is not configured right or something.
This is Python 2.7
Any help is appreciated. Any ideas on how to get better visibility into which line it's choking on would be helpful.
To further help, I would ask you the following.
Error Checking
After your import using any of your three ways, are there any results from running this after each run? SELECT @@GLOBAL.SQL_WARNINGS; (If so, this should show you the errors, as it might be silently failing.)
What is your SQL_MODE? SELECT @@GLOBAL.SQL_MODE;
Check the file and make sure you have an even number of double quotes ("), for one.
Check the data for extra " or , characters, or anything that may get mangled in translation between bash/python/mysql.
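In the script itself, a hedged sketch for surfacing those warnings right after the load (connection details are placeholders; assuming the MySQLdb module from the question):

import MySQLdb

dest_db = MySQLdb.connect(host="localhost", user="user", passwd="secret",
                          db="mydb", local_infile=1)
dest_cursor = dest_db.cursor()
dest_cursor.execute(
    '''LOAD DATA LOCAL INFILE '/path/to/load.txt' REPLACE INTO TABLE tble_name FIELDS TERMINATED BY ","'''
)
# SHOW WARNINGS reports on the previous statement of this connection,
# yielding (Level, Code, Message) rows instead of a silent truncation.
dest_cursor.execute("SHOW WARNINGS")
for level, code, message in dest_cursor.fetchall():
    print("%s %s: %s" % (level, code, message))
dest_db.commit()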
Data Request
Can you provide the data for the 1st row that was missing?
Can you provide the exact script you are using?
Versions
You said you're using Python 2.7.
What version of the mysql client? SELECT @@GLOBAL.VERSION;
What version of MySQLdb?
Internationalization
Are you dealing with internationalization (汉语 Hànyǔ or русский etc. languages)?
What is the database/schema collation?
Query:
SELECT DISTINCT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM INFORMATION_SCHEMA.SCHEMATA
WHERE (
SCHEMA_NAME <> 'sys' AND
SCHEMA_NAME <> 'mysql' AND
SCHEMA_NAME <> 'information_schema' AND
SCHEMA_NAME <> '.mysqlworkbench' AND
SCHEMA_NAME <> 'performance_schema'
);
What is the Table collation?
Query:
SELECT DISTINCT ENGINE, TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES
WHERE (
TABLE_SCHEMA <> 'sys' AND
TABLE_SCHEMA <> 'mysql' AND
TABLE_SCHEMA <> 'information_schema' AND
TABLE_SCHEMA <> '.mysqlworkbench' AND
TABLE_SCHEMA <> 'performance_schema'
);
What is the column collation?
Query:
SELECT DISTINCT CHARACTER_SET_NAME, COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS
WHERE (
TABLE_SCHEMA <> 'sys' AND
TABLE_SCHEMA <> 'mysql' AND
TABLE_SCHEMA <> 'information_schema' AND
TABLE_SCHEMA <> '.mysqlworkbench' AND
TABLE_SCHEMA <> 'performance_schema'
);
Lastly
Check the Database
For connection collation/character_set
SHOW VARIABLES
WHERE VARIABLE_NAME LIKE 'CHARACTER\_SET\_%' OR
VARIABLE_NAME LIKE 'COLLATION%';
If the first two ways work without error then I'm leaning toward:
Other Plausible Concerns
I am not ruling out problems with any of the following:
possible python connection configuration issues around
python to db connection collation
default connection timeout
default character set error
python/bash runtime interpolation of symbols causing a random hidden gem
db collation not set to handle foreign languages
exceeding the max field lengths
hidden or unicode characters
emoji processing
issues with the data, as I mentioned above, with double quotes, commas, and (I forgot to mention) newlines for Windows or Linux (carriage return or newline)
All in all, there is a lot to look at, and more information is required to further assist.
Please update your question when you have more information and I will do the same for my answer to help you resolve your error.
Hope this helps and all goes well!
Update:
Your Error
Warning: (1265L, "Data truncated for column
This leads me to believe it is the double quote around your "field terminations". Check to make sure your data does NOT have commas inside the errored-out fields; commas there will cause your data to shift when running from the command line. The GUI is "smart enough", per se, to deal with this, but the command line is literal!
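If that's the case, a hedged sketch of the same load with quoting enforced (assuming the fields in load.txt are wrapped in double quotes):

dest_cursor.execute(
    '''LOAD DATA LOCAL INFILE '/path/to/load.txt' REPLACE INTO TABLE tble_name '''
    '''FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' '''
)
dest_db.commit()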
This is an embarrassing one but maybe I can help someone in the future making horrible mistakes like I have.
I spent a lot of time analyzing fields, checking for special characters, etc., and it turned out I was simply causing the problem myself.
I had spaces in the csv and was NOT using a forced ENCLOSED BY in the load statement. This means I was adding a space character to some fields, thus causing an overflow. So the data looked like value1, value2, value3 when it should have been value1,value2,value3. Removing those spaces, putting quotes around the fields, and enforcing ENCLOSED BY in my statement fixed this.
I assume that the clients that were working were sanitizing the data behind the scenes or something. I really don't know for sure why it was working elsewhere using the same csv but that got me through the first set of hurdles.
Then after getting through that, the last line in the csv was choking, stating Row doesn't contain data for all columns. It turns out I didn't close() the file after creating it and before attempting to load it, so there was some sort of lock on the file. Once I added the close() statement and fixed the spacing issue, all the data is loading now.
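For anyone hitting the same pair of problems, a minimal sketch of both fixes together (rows stands in for whatever produced the csv):

import csv

# QUOTE_ALL wraps every field in double quotes to match ENCLOSED BY '"',
# and the with-block guarantees the file is flushed and closed before
# LOAD DATA reads it.
with open('/path/to/load.txt', 'wb') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)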
Sorry for anyone that spent any measure of time looking into this issue for me.
I have to delete some dates from MySQL using Python.
I have over 2000 tables, so I need to finish this code... I can't handle this much by clicking my mouse. I really need help.
Well, my guess was like this:
sql ="delete from finance.%s where date='2000-01-10'"
def Del():
for i in range(0,len(data_s)):
curs.execute(sql,(data_s[i]))
conn.commit()
However, it doesn't work.
I just thought: when I just type it like this, it works.
>>> query="delete from a000020 where date ='2000-01-25'"
>>> curs.execute(query)  # curs = conn.cursor()
But if I add %s to the syntax, it doesn't work:
>>> table='a000050'
>>> query="delete from %s where date ='2000-01-25'"
>>> curs.execute(query,table)
ProgrammingError: (1064, u"You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''a000050' where date ='2000-01-25'' at line 1")
This doesn't work either:
>>> curs.execute(query,(table))
ProgrammingError: (1064, u"You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''a000050' where date ='2000-01-25'' at line 1")
A bit different... but the same:
>>> curs.execute(query,(table,))
I have read many questions on here, but just adding () or a comma doesn't fix it...
Because I'm a beginner with Python and MySQL, I really need your help. Thank you for reading.
I had the same issue and I fixed it by building the string by appending, as follows:
def Del():
    for i in range(0, len(data_s)):
        x = "delete from finance." + data_s[i] + " where date='2000-01-10'"
        print x  # to check the sql statement :)
        curs.execute(x)
        conn.commit()
Good question; have a look at the MySQLdb User's Guide:
paramstyle
String constant stating the type of parameter marker formatting
expected by the interface. Set to 'format' = ANSI C printf format
codes, e.g. '...WHERE name=%s'. If a mapping object is used for
conn.execute(), then the interface actually uses 'pyformat' = Python
extended format codes, e.g. '...WHERE name=%(name)s'. However, the API
does not presently allow the specification of more than one style in
paramstyle.
Note that any literal percent signs in the query string passed to execute() must be escaped, i.e. %%.
Parameter placeholders can only be used to insert column values. They
can not be used for other parts of SQL, such as table names,
statements, etc.
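Since placeholders can't carry the table name, one hedged workaround is to validate the name yourself and splice it in, keeping the date as a real parameter (the whitelist regex is an assumption; note the literal percent escaped as %% per the note above):

import re

def delete_date(curs, table, date):
    # identifiers can't be bound, so whitelist the characters instead
    if not re.match(r'^[A-Za-z0-9_]+$', table):
        raise ValueError("suspicious table name: %r" % table)
    # %%s survives the string formatting as %s, the driver's placeholder
    curs.execute("delete from finance.%s where date=%%s" % table, (date,))

for table in data_s:
    delete_date(curs, table, '2000-01-10')
conn.commit()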
Hope this helps.
I am trying to find out what is wrong in the below query but am unable to do so:
connect.execute("""INSERT INTO dummy('disk_list_disk_serial_id','disk_list_disk_size','disk_list_service_vm_id','disk_list_disk_id','disk_list_storage_tier','disk_list_statfs_disk_size','storage_pool_id') VALUES ('%s','%s','%s','%s','%s','%s','%s') """, (disk_serial_id,disk_size,service_vm_id,entity_id,storage_tier,statfs_disk_size,disk_storage_id))
When I am executing I am getting an error
ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''disk_list_disk_serial_id','disk_list_disk_size','disk_list_service_vm_id','disk' at line 1")
I checked after service_vm_id but could not find anything wrong. Please help me figure out why I am not able to run it.
You should not quote the field names.
Additionally, you should not quote the placeholder values either. The DB API does that for you when it inserts the actual values, depending on whether they are strings or not.
connect.execute("""INSERT INTO dummy (disk_list_disk_serial_id,disk_list_disk_size,disk_list_service_vm_id, ...) VALUES (%s,%s,%s,%s,%s,%s,%s) """, (...))
Umm... did you look at your query properly? You have quoted your column names using '; with that, they are no longer column names but string literals.
your query
INSERT INTO dummy('disk_list_disk_serial_id', ...
should be
INSERT INTO dummy(disk_list_disk_serial_id, ....
It is always better to use a variable to build a long/complex query.
It is more readable:
query = """
INSERT INTO dummy(disk_list_disk_serial_id,disk_list_disk_size,disk_list_service_vm_id,disk_list_disk_id,disk_list_storage_tier,disk_list_statfs_disk_size,storage_pool_id) VALUES ('%s','%s','%s','%s','%s','%s','%s')
""" % (disk_serial_id,disk_size,service_vm_id,entity_id,storage_tier,statfs_disk_size,disk_storage_id)
connect.execute(query)
Also, you don't need quotes around %s if the column type is int. I hope this helps.
You don't need quotes around the column names.
Edit: I didn't take a look at the tags and made a remark about syntax that was specific to sqlite; thanks to Daniel Roseman for pointing that out. This is actually the correct way to do parameter substitution in MySQL.