Trouble creating table in Python using MYSQL [duplicate]

I'm trying to execute a simple MySQL query as below:
INSERT INTO user_details (username, location, key)
VALUES ('Tim', 'Florida', 42)
But I'm getting the following error:
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'key) VALUES ('Tim', 'Florida', 42)' at line 1
How can I fix the issue?

The Problem
In MySQL, certain words like SELECT, INSERT, DELETE etc. are reserved words. Since they have a special meaning, MySQL treats it as a syntax error whenever you use them as a table name, column name, or other kind of identifier - unless you surround the identifier with backticks.
As noted in the official docs, in section 10.2 Schema Object Names (emphasis added):
Certain objects within MySQL, including database, table, index, column, alias, view, stored procedure, partition, tablespace, and other object names are known as identifiers.
...
If an identifier contains special characters or is a reserved word, you must quote it whenever you refer to it.
...
The identifier quote character is the backtick ("`"):
A complete list of keywords and reserved words can be found in section 10.3 Keywords and Reserved Words. In that page, words followed by "(R)" are reserved words. Some reserved words are listed below, including many that tend to cause this issue.
ADD
AND
BEFORE
BY
CALL
CASE
CONDITION
DELETE
DESC
DESCRIBE
FROM
GROUP
IN
INDEX
INSERT
INTERVAL
IS
KEY
LIKE
LIMIT
LONG
MATCH
NOT
OPTION
OR
ORDER
PARTITION
RANK
REFERENCES
SELECT
TABLE
TO
UPDATE
WHERE
The Solution
You have two options.
1. Don't use reserved words as identifiers
The simplest solution is simply to avoid using reserved words as identifiers. You can probably find another reasonable name for your column that is not a reserved word.
Doing this has a couple of advantages:
It eliminates the possibility that you or another developer using your database will accidentally write a syntax error due to forgetting - or not knowing - that a particular identifier is a reserved word. There are many reserved words in MySQL and most developers are unlikely to know all of them. By not using these words in the first place, you avoid leaving traps for yourself or future developers.
The means of quoting identifiers differs between SQL dialects. While MySQL uses backticks for quoting identifiers by default, ANSI-compliant SQL (and indeed MySQL in ANSI SQL mode, as noted here) uses double quotes for quoting identifiers. As such, queries that quote identifiers with backticks are less easily portable to other SQL dialects.
Purely for the sake of reducing the risk of future mistakes, this is usually a wiser course of action than backtick-quoting the identifier.
2. Use backticks
If renaming the table or column isn't possible, wrap the offending identifier in backticks (`) as described in the earlier quote from 10.2 Schema Object Names.
An example to demonstrate the usage (taken from 10.3 Keywords and Reserved Words):
mysql> CREATE TABLE interval (begin INT, end INT);
ERROR 1064 (42000): You have an error in your SQL syntax.
near 'interval (begin INT, end INT)'
mysql> CREATE TABLE `interval` (begin INT, end INT);
Query OK, 0 rows affected (0.01 sec)
Similarly, the query from the question can be fixed by wrapping the keyword key in backticks, as shown below:
INSERT INTO user_details (username, location, `key`)
VALUES ('Tim', 'Florida', 42)
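When the column list is assembled in Python, the same backtick quoting can be applied in code. The following is a minimal sketch, not a driver feature: `quote_mysql_identifier` is a hypothetical helper written for this illustration. It wraps a name in backticks and doubles any embedded backtick, which is how MySQL escapes that character inside a quoted identifier. The values still go through the driver's `%s` placeholders rather than string formatting.

```python
def quote_mysql_identifier(name: str) -> str:
    """Wrap a MySQL identifier in backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


# Column names from the question; `key` is a reserved word in MySQL.
columns = ["username", "location", "key"]
column_list = ", ".join(quote_mysql_identifier(c) for c in columns)
placeholders = ", ".join(["%s"] * len(columns))

query = f"INSERT INTO user_details ({column_list}) VALUES ({placeholders})"
params = ("Tim", "Florida", 42)

print(query)
# With an open cursor from a MySQL driver, this would be run as:
# cursor.execute(query, params)
```

Quoting every column, reserved or not, avoids having to know the reserved-word list in advance.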

Related

psycopg2.errors.SyntaxError when instantiating a postgres database [duplicate]

It seems PostgreSQL does not allow to create a database table named 'user'. But MySQL will allow to create such a table.
Is that because it is a key word? But Hibernate cannot identify any issue (even if we set the PostgreSQLDialect).

user is a reserved word, and it's usually not a good idea to use reserved words for identifiers (tables, columns).
If you insist on doing that you have to put the table name in double quotes:
create table "user" (...);
But then you always need to use double quotes when referencing the table. Additionally the table name is then case-sensitive. "user" is a different table name than "User".
If you want to save yourself a lot of trouble use a different name. users, user_account, ...
More details on quoted identifiers can be found in the manual: http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
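To make the case-sensitivity point concrete, here is a small Python sketch. Both functions are hypothetical helpers written for this illustration, not PostgreSQL or psycopg2 APIs. `quote_pg_identifier` applies the standard SQL rule of wrapping the name in double quotes and doubling any embedded double quote. `unquoted_name_as_stored` approximates, for plain ASCII names, how PostgreSQL folds unquoted identifiers to lower case.

```python
def quote_pg_identifier(name: str) -> str:
    """Quote an identifier the standard SQL way: "name", with " doubled."""
    return '"' + name.replace('"', '""') + '"'


def unquoted_name_as_stored(name: str) -> str:
    """Approximate PostgreSQL's lower-case folding of unquoted ASCII names."""
    return name.lower()


# Quoted names keep their exact spelling, so these refer to different tables.
quoted_lower = quote_pg_identifier("user")
quoted_mixed = quote_pg_identifier("User")
print(quoted_lower, quoted_mixed, quoted_lower == quoted_mixed)

# Unquoted names are folded, so APP_USER and app_user refer to the same table.
same_table = unquoted_name_as_stored("APP_USER") == unquoted_name_as_stored("app_user")
print(same_table)

create_sql = f"CREATE TABLE {quote_pg_identifier('user')} (id integer PRIMARY KEY)"
print(create_sql)
```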

With JPA, you can specify the table name using the following syntax:
@Table(name="\"user\"")

We had this same issue some time ago and simply changed the table name from user to app_user. Because we use Hibernate/JPA, we thought it would be easier this way.
Hope this little fix will help someone else.

You can create a table user in a schema other than public.
The example:
CREATE SCHEMA my_schema;
CREATE TABLE my_schema.user(...);

Trailing underscore
The SQL standard explicitly promises to never use a trailing underscore in any keyword or reserved word.
So, to avoid conflicts with any of the over a thousand keywords and reserved words used by various database engines, I name all my database identifiers with a trailing underscore. (Yes, really, over a thousand keywords reserved — I counted them.)
Change this:
CREATE TABLE user ( … ) ;
… to this:
CREATE TABLE user_ ( … ) ;
I do this as a habit for all database names: schemas, tables, columns, indexes, etc.
As an extra benefit, this practice makes quite clear in documentation, email, and such when referring to a programming language variable named user versus the database column user_. Anything with a trailing underscore is obviously from the database side.

sql data formatting and sql injections

I have a database with 2 tables: students, employees and I want to update one of those tables:
import sqlite3
db_file = "school.db"
def update_address(identifier, user_address, user_id):
    with sqlite3.connect(db_file) as conn:
        c = conn.cursor()
        c.execute(f"""
            UPDATE {identifier}
            SET address = ?
            WHERE id = ?;
            """,
            (user_address, user_id))
update_address("students", "204 Sycamore Street", 2)
The above code works. The problem is that, per the sqlite3 docs, using Python string formatting in an SQL operation can lead to vulnerabilities:
Usually your SQL operations will need to use values from Python variables. You shouldn’t assemble your query using Python’s string operations because doing so is insecure; it makes your program vulnerable to an SQL injection attack (see https://xkcd.com/327/ for humorous example of what can go wrong).
Instead, use the DB-API’s parameter substitution. Put ? as a placeholder wherever you want to use a value, and then provide a tuple of values as the second argument to the cursor’s execute() method.
The placeholder '?' works when it comes to inserting values but not for sql identifiers. Output:
sqlite3.OperationalError: near "?": syntax error
So the question is: can SQL injection occur if I use Python string formatting on an SQL identifier, or does it only occur with values?
If it also occurs with identifiers, is there a way to format the string safely?

Yes, if you interpolate any content into an SQL query unsafely, it is an SQL injection vulnerability. It doesn't matter if the content is supposed to be used as a value in the SQL expression, or an identifier, SQL keyword, or anything else.
It's pretty common to format queries from fragments of SQL expressions, if you want to write a query with a variable set of conditions. These are also possible SQL injection risks.
The way to mitigate the SQL injection risk is: don't interpolate untrusted input into your SQL query.
For identifiers, you should make sure the content matches a legitimate name of a table (or column, or other element, if that's what you're trying to make dynamic). That is, create an "allowlist" of tables that are known to exist in your database and that your function is permitted to update. If the input doesn't match one of these, don't run the query.
It's also a good idea to use back-ticks to delimit identifiers, because if one of the table names happens to be a reserved keyword in SQLite, that will allow the table to be used in the SQL query.
if identifier not in ["table1", "table2", "table3"]:
    # The f prefix is needed so the rejected name appears in the message
    raise Exception(f"Unknown table name: '{identifier}'")
c.execute(f"""
    UPDATE `{identifier}`
    SET address = ?
    WHERE id = ?;
    """,
    (user_address, user_id))
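Put together as something runnable, the allowlist approach might look like the sketch below. It assumes an in-memory SQLite database and an invented `students` row so that it runs on its own; `ALLOWED_TABLES` is a name chosen for this illustration. Double quotes, the standard SQL form that SQLite also accepts, are used here in place of back-ticks.

```python
import sqlite3

# Tables this function is allowed to touch (names taken from the question).
ALLOWED_TABLES = {"students", "employees"}


def update_address(conn, table, user_address, user_id):
    """Update one row's address, rejecting table names not on the allowlist."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"Unknown table name: {table!r}")
    # The identifier is interpolated only after passing the allowlist check;
    # the values still go through ? placeholders.
    conn.execute(
        f'UPDATE "{table}" SET address = ? WHERE id = ?',
        (user_address, user_id),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, address TEXT)")
conn.execute("INSERT INTO students (id, address) VALUES (2, 'old address')")

update_address(conn, "students", "204 Sycamore Street", 2)
print(conn.execute("SELECT address FROM students WHERE id = 2").fetchone())

try:
    update_address(conn, "students; DROP TABLE students", "x", 2)
except ValueError as exc:
    print("rejected:", exc)
```

The malicious name never reaches the database, because the check runs before any SQL text is built.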

Non-integer constant in GROUP BY

I have the following line of code that is supposed to build a Pandas DataFrame from a SQL query:
query_epd = pandas.read_sql_query("SELECT 'Department', COUNT('LastName') FROM thestaff.employees GROUP BY 'Department'", engine)
Yet when I run my code this line gives me the error:
SyntaxError: non-integer constant in GROUP BY
LINE 1: ...OUNT('LastName') FROM thestaff.employees GROUP BY 'Departmen...
^
I don't see where or how I am using constants, integer or not, and this is a very standard query for me on MSSQL, but running under PostgreSQL and Pandas this query is not valid. What is wrong with my query?

The single quotes around the identifiers turn them to literal strings, which is probably not what you want. You should write this query as:
SELECT department, COUNT(*) no_emp
FROM thestaff.employees
GROUP BY department
If your identifiers are case-sensitive, then you need to surround them with double quotes (this is the SQL standard, which Postgres complies with).
Note that I changed COUNT(lastname) to COUNT(*): unless you have null values in the lastname column, this is equivalent, and more efficient. I also gave an alias to this column in the resultset.
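The difference between a quoted literal and a column reference can be seen in a small experiment. This sketch uses Python's built-in sqlite3 module with an in-memory database as a stand-in, an assumption made so the example runs without a server. The table and sample rows are invented, and SQLite will not reproduce PostgreSQL's exact error message; the point is only what single quotes do to a name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO employees (department, lastname) VALUES (?, ?)",
    [("Sales", "Ng"), ("Sales", "Ortiz"), ("IT", "Kim")],
)

# Single quotes make a string literal: every row returns the same constant.
literal_rows = conn.execute("SELECT 'department' FROM employees").fetchall()
print(literal_rows)

# An unquoted name is a column reference, so grouping works as intended.
grouped_rows = conn.execute(
    "SELECT department, COUNT(*) AS no_emp "
    "FROM employees GROUP BY department ORDER BY department"
).fetchall()
print(grouped_rows)
```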

This link might be helpful: "Non-integer constants in the ORDER BY clause". It explains what this error is and when it occurs.

Solving 'Unrecognized Token' Error While Using SQLite Insert Command

I keep getting an OperationalError: Unrecognized Token. The error happens when I attempt to insert data into my SQLite database using an SQLite INSERT command. What do I need to do to correct this error, or is there a better way to insert data into my database? The data is water level data measured in meters above chart datum, gathered from water level gauge data loggers throughout the Great Lakes region of Canada and the US. The script uses the Pandas library and is hardcoded to merge data from water level gauging stations that are located in close proximity to each other. I'd like to use the INSERT command so I can deal with overlapping data when adding future data to the database. I won't pretend to know what I'm talking about with databases and programming, so any help in solving this error would be appreciated!
I've tried altering my script in the parameterized query to try and solve the problem without any luck as my research has said this is the likely culprit
# Tecumseh. Merges station in steps due to inability of operation to merge all stations at once. Starts by merging PCWL station to hydromet station followed by remaining PCWL station and 3 minute time series
final11975 = pd.merge(hydrometDF["Station11975"], pcwlDF["station11995"], how='outer', left_index=True,right_index=True)
final11975 = pd.merge(final11975, pcwlDF["station11965"], how='outer', left_index=True,right_index=True)
final11975 = pd.merge(final11975, cts, how='outer', left_index=True,right_index=True)
final11975.to_excel("C:/Users/Andrew/Documents/CHS/SeasonalGaugeAnalysis_v2/SeasonalGaugeAnalysis/Output/11975_Tecumseh.xlsx")
print("-------------------------------")
print("11975 - Tecumseh")
print(final11975.info())
final11975.index = final11975.index.astype(str)
#final11975.to_sql('11975_Tecumseh', conn, if_exists='replace', index=True)
#Insert and Ignore data into database to eliminate overlaps
testvalues = (final11975.index, final11975.iloc[:,0], final11975.iloc[:,1], final11975.iloc[:,2])
c.execute("INSERT OR IGNORE INTO 11975_Tecumseh(index,11975_VegaRadar(m),11995.11965), testvalues")
conn.commit()
I'd like the data to be inserted into the database using the INSERT OR IGNORE command, as the data often overlaps when it's downloaded. I'm new to databases, but I'm under the impression that INSERT OR IGNORE will eliminate overlapping data. The message I receive when running my script is:
Exception has occurred: OperationalError
unrecognized token: "11975_Tecumseh"
File "C:\Users\Documents\CHS\SeasonalGaugeAnalysis_v2\SeasonalGaugeAnalysis\Script\CombineStations.py", line 43, in <module>
c.execute("INSERT OR IGNORE INTO 11975_Tecumseh(index,11975_VegaRadar(m),11995.11965), testvalues")

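Under standard SQL, a table or column name such as Tecumseh_11975 can be used as-is, but a name that begins with a digit, such as 11975_Tecumseh, must be enclosed in double quotes. The same goes for index, which is a keyword, and for names containing parentheses or a period. The values also belong in a VALUES clause with placeholders, with testvalues passed as a separate argument rather than inside the SQL string:
c.execute('INSERT OR IGNORE INTO "11975_Tecumseh" ("index", "11975_VegaRadar(m)", "11995.11965") VALUES (?, ?, ?)', testvalues)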

The error you are getting occurs because the table name 11975_Tecumseh is invalid as written: it is not suitably quoted.
If you want to use a keyword as a name, you need to quote it. There are four ways of quoting keywords in SQLite:
'keyword' A keyword in single quotes is a string literal.
"keyword" A keyword in double-quotes is an identifier.
[keyword] A keyword enclosed in square brackets is an identifier. This is not standard SQL. This quoting mechanism is used by MS Access and SQL Server and is included in SQLite for compatibility.
`keyword` A keyword enclosed in grave accents (ASCII code 96) is an identifier. This is not standard SQL. This quoting mechanism is used by MySQL and is included in SQLite for compatibility.
For resilience when confronted with historical SQL statements, SQLite will sometimes bend the quoting rules above:
If a keyword in single quotes (ex: 'key' or 'glob') is used in a context where an identifier is allowed but where a string literal is not allowed, then the token is understood to be an identifier instead of a string literal.
If a keyword in double quotes (ex: "key" or "glob") is used in a context where it cannot be resolved to an identifier but where a string literal is allowed, then the token is understood to be a string literal instead of an identifier.
Programmers are cautioned not to use the two exceptions described in the previous bullets. We emphasize that they exist only so that old and ill-formed SQL statements will run correctly. Future versions of SQLite might raise errors instead of accepting the malformed statements covered by the exceptions above.
SQL As Understood By SQLite - SQLite Keywords
The above also applies to otherwise invalid names, which include names that start with a digit and names that contain parentheses.
If 11975_Tecumseh is the actual table name, then it must be enclosed, e.g. [11975_Tecumseh].
Likewise, the columns index, 11975_VegaRadar(m) and 11995.11965 also have to be suitably enclosed.
Doing so you'd end up with
"INSERT OR IGNORE INTO [11975_Tecumseh]([index],[11975_VegaRadar(m)],[11995.11965]), testvalues"
The remaining issue is that ", testvalues" is syntactically incorrect. After the list of columns to insert into, i.e. ([index],[11975_VegaRadar(m)],[11995.11965]), the keyword VALUES should follow, together with the three values.
An example of a valid statement is :
"INSERT INTO [11975_Tecumseh] ([index],[11975_VegaRadar(m)],[11995.11965]) VALUES('value1','value2','value3')"
As such
c.execute("INSERT INTO [11975_Tecumseh] ([index],[11975_VegaRadar(m)],[11995.11965]) VALUES('value1','value2','value3')")
would insert a new row (unless a constraint conflict occurred).
However, I suspect that you want to insert values according to variables in which case you could use:
"INSERT INTO [11975_Tecumseh] ([index],[11975_VegaRadar(m)],[11995.11965]) VALUES(?,?,?)"
the question marks being place-holders/bind values
SQL As Understood By SQLite- INSERT
The above would then be invoked using :
c.execute("INSERT INTO [11975_Tecumseh] ([index],[11975_VegaRadar(m)],[11995.11965]) VALUES(?,?,?)",testvalues);
#Working Example :
import sqlite3
drop_sql = "DROP TABLE IF EXISTS [11975_Tecumseh]"
crt_sql = "CREATE TABLE IF NOT EXISTS [11975_Tecumseh] ([index],[11975_VegaRadar(m)],[11995.11965])"
testvalues = ("X","Y","Z")
c = sqlite3.connect("test.db")
c.execute(drop_sql)
c.execute(crt_sql)
insert_sql1 = "INSERT INTO [11975_Tecumseh] " \
"([index],[11975_VegaRadar(m)],[11995.11965]) " \
"VALUES('value1','value2','value3')"
c.execute(insert_sql1)
insert_sql2 = "INSERT OR IGNORE INTO '11975_Tecumseh'" \
"('index','11975_VegaRadar(m)',[11995.11965])" \
" VALUES(?,?,?)"
c.execute(insert_sql2,(testvalues))
cursor = c.cursor()
cursor.execute("SELECT * FROM [11975_Tecumseh]")
for row in cursor:
    print(row[0], "\n" + row[1], "\n" + row[2])
c.commit()
cursor.close()
c.close()
#Result
##Row 1
value1
value2
value3
##Row 2
X
Y
Z
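One point the working example does not show is that INSERT OR IGNORE only skips a row when inserting it would violate a constraint, so the table needs a PRIMARY KEY or UNIQUE column for overlapping data to be dropped. The sketch below assumes the timestamp column "index" is that key and uses invented readings in an in-memory database. Rows from a DataFrame could probably be produced in the same tuple form with `DataFrame.itertuples(index=True, name=None)`, though that step is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The PRIMARY KEY on "index" is what makes a repeated timestamp a conflict.
conn.execute(
    'CREATE TABLE "11975_Tecumseh" ('
    '"index" TEXT PRIMARY KEY, "11975_VegaRadar(m)" REAL, "11995.11965" REAL)'
)

insert_sql = (
    'INSERT OR IGNORE INTO "11975_Tecumseh" '
    '("index", "11975_VegaRadar(m)", "11995.11965") VALUES (?, ?, ?)'
)

# Made-up sample readings; the second batch overlaps the first by one row.
first_batch = [("2019-05-01 00:00", 1.10, 1.12), ("2019-05-01 00:03", 1.11, 1.13)]
second_batch = [("2019-05-01 00:03", 9.99, 9.99), ("2019-05-01 00:06", 1.12, 1.14)]

# executemany runs the statement once per tuple in the batch.
conn.executemany(insert_sql, first_batch)
conn.executemany(insert_sql, second_batch)
conn.commit()

rows = conn.execute('SELECT * FROM "11975_Tecumseh" ORDER BY "index"').fetchall()
for row in rows:
    print(row)
```

The overlapping 00:03 row from the second batch is skipped, so the originally stored reading is kept.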

Substituting column names in Python sqlite3 query [duplicate]

This question already has answers here:
How do you escape strings for SQLite table/column names in Python?
I have a wide table in a sqlite3 database, and I wish to dynamically query certain columns in a Python script. I know that it's bad to inject parameters by string concatenation, so I tried to use parameter substitution instead.
I find that, when I use parameter substitution to supply a column name, I get unexpected results. A minimal example:
import sqlite3 as lite
db = lite.connect("mre.sqlite")
c = db.cursor()
# Insert some dummy rows
c.execute("CREATE TABLE trouble (value real)")
c.execute("INSERT INTO trouble (value) VALUES (2)")
c.execute("INSERT INTO trouble (value) VALUES (4)")
db.commit()
for row in c.execute("SELECT AVG(value) FROM trouble"):
    print(row) # Returns 3
for row in c.execute("SELECT AVG(:name) FROM trouble", {"name" : "value"}):
    print(row) # Returns 0
db.close()
Is there a better way to accomplish this than simply injecting a column name into a string and running it?

As Rob just indicated in his comment, there was a related SO post that contains my answer. These substitution constructions are called "placeholders," which is why I did not find the answer on SO initially. There is no placeholder pattern for column names, because dynamically specifying columns is not a code safety issue:
It comes down to what "safe" means. The conventional wisdom is that
using normal python string manipulation to put values into your
queries is not "safe". This is because there are all sorts of things
that can go wrong if you do that, and such data very often comes from
the user and is not in your control. You need a 100% reliable way of
escaping these values properly so that a user cannot inject SQL in a
data value and have the database execute it. So the library writers do
this job; you never should.
If, however, you're writing generic helper code to operate on things
in databases, then these considerations don't apply as much. You are
implicitly giving anyone who can call such code access to everything
in the database; that's the point of the helper code. So now the
safety concern is making sure that user-generated data can never be
used in such code. This is a general security issue in coding, and is
just the same problem as blindly execing a user-input string. It's a
distinct issue from inserting values into your queries, because there
you want to be able to safely handle user-input data.
So, the solution is that there is no problem in the first place: inject the values using string formatting, be happy, and move on with your life.

Why not use string formatting?
for row in c.execute("SELECT AVG({name}) FROM trouble".format(**{"name" : "value"})):
    print(row) # => (3.0,)
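The behaviour in the question can be reproduced, along with one cautious way to format a column name when it might come from outside your own code. This is a sketch under assumptions: `average_of` is a hypothetical helper, the database is in-memory, and the check against `PRAGMA table_info` is one possible validation step rather than the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trouble (value REAL)")
conn.executemany("INSERT INTO trouble (value) VALUES (?)", [(2,), (4,)])

# A placeholder always binds a value, so this averages the text 'value'.
bound = conn.execute("SELECT AVG(:name) FROM trouble", {"name": "value"}).fetchone()
print(bound)


def average_of(conn, column):
    """Average a column of `trouble`, accepting only names the table has."""
    # PRAGMA table_info returns one row per column; the name is at index 1.
    known = {row[1] for row in conn.execute("PRAGMA table_info(trouble)")}
    if column not in known:
        raise ValueError(f"Unknown column name: {column!r}")
    quoted = '"' + column.replace('"', '""') + '"'
    return conn.execute(f"SELECT AVG({quoted}) FROM trouble").fetchone()[0]


average = average_of(conn, "value")
print(average)
```

Because the name is checked against the table's real columns before formatting, arbitrary text can never become part of the SQL statement.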
