Table name with special character problem - python

I have a .csv file containing data for all stocks. I want to create a table for each stock, named after the stock's symbol, in a MySQL database called daily_stock_recorder.
There are about 1900 items, and obviously it wouldn't be feasible to write them one by one, so I wrote this Python program, which takes the stock symbol from each row and creates a table for it. Here's the code:
import mysql.connector as mc
import pandas as pd
mycon=mc.connect(host='localhost',user='root',passwd='1234',database='daily_stock_recorder')
cursor=mycon.cursor()
stock_df = pd.read_csv(r'C:\Users\Tirth\Desktop\pyprojects\all_stocks.csv')
for i in range(0, len(stock_df)):
    stock=stock_df['SYMBOL'][i]
    for j in range(0,len(stock)):
        if ord(stock[j])==38:
            stock_ticker=stock.replace(stock[j],'')
        else:
            stock_ticker=stock.replace(stock[j],stock[j])
    todo="create table %s (close_price decimal(10,8) not null, traded_value bigint not null, traded_quantity bigint not null, date_recorded date not null unique)"%(stock_ticker)
    cursor.execute(todo)
    mycon.commit()
As you can see, I have used an 'if' statement to replace the character '&' (ASCII value 38) in the symbol with '' (nothing). This is because every time I run the code, it creates tables successfully but then raises an error when it reaches the symbol 'COX&KINGS'. I am assuming that MySQL doesn't accept table names with special characters.
But even with the 'if' statement in place, I get the same name with the '&' still in it, and it hits the same error.
Can you please point out what I am doing wrong?
By the way, this is the error:
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '&KINGS (close_price decimal(10,8) not null)' at line 1
Thanks!

If you take a small segment of your code out and run it, you'll see it doesn't do what you're expecting:
stock='A&B'
for j in range(0,len(stock)):
    if ord(stock[j])==38:
        stock_ticker=stock.replace(stock[j],'')
    else:
        stock_ticker=stock.replace(stock[j],stock[j])
    print("stock_ticker is now " + stock_ticker)
This will output:
stock_ticker is now A&B
stock_ticker is now AB
stock_ticker is now A&B
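Each iteration calls replace on the original stock value rather than on the result of the previous iteration, so stock_ticker is overwritten every time and only the last iteration counts. The code therefore only has the desired effect if the ticker ends with '&'.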
It would be much better to replace this with:
stock='A&B'
stock_ticker=stock.replace('&','')
print(stock_ticker)
That said, I would highly recommend looking into database normalization techniques. Having lots of tables named after stock tickers is bound to cause lots of problems. What if you want to track a stock whose ticker also happens to be a reserved keyword in SQL? What happens if there are tickers A&B and AB for different stocks? Both would map to the same table name once the '&' is stripped. It's much better to store this sort of data in one table.
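As a rough sketch of the single-table layout suggested above: the table name daily_prices and the symbol column are illustrative assumptions, not names from the question, and Python's built-in sqlite3 stands in for MySQL so the example runs on its own. The ticker becomes ordinary data, so '&' needs no special handling.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL server.
conn = sqlite3.connect(":memory:")

# One table for every stock; the ticker is a column value, not a table name.
# "daily_prices" and "symbol" are hypothetical names chosen for this sketch.
conn.execute(
    """
    CREATE TABLE daily_prices (
        symbol          TEXT    NOT NULL,
        date_recorded   TEXT    NOT NULL,
        close_price     NUMERIC NOT NULL,
        traded_value    INTEGER NOT NULL,
        traded_quantity INTEGER NOT NULL,
        PRIMARY KEY (symbol, date_recorded)
    )
    """
)

# Made-up sample rows; tickers are passed as bound parameters.
rows = [
    ("COX&KINGS", "2020-01-02", 1.25, 1000, 800),
    ("A&B", "2020-01-02", 2.50, 2000, 800),
    ("AB", "2020-01-02", 3.75, 3000, 800),
]
conn.executemany("INSERT INTO daily_prices VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

symbols = [r[0] for r in conn.execute("SELECT DISTINCT symbol FROM daily_prices")]
cox_rows = conn.execute(
    "SELECT close_price FROM daily_prices WHERE symbol = ?", ("COX&KINGS",)
).fetchall()
print(symbols, cox_rows)
```

With this layout, A&B and AB stay distinct rows, and the composite primary key plays the role of the per-table unique date.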

The whole inner for j ... loop is unnecessary.
Replace it with
stock_ticker=stock.replace("&", "")
In the inner "j" loop, stock_ticker is overwritten for each character. If the last character isn't a &, it is overwritten with the unmodified content of stock by the line
stock_ticker=stock.replace(stock[j],stock[j])

Related

Python string formatting for an SQL insert statement with unknown variables

Although there are various similar questions around this topic, I can't find one that answers my problem.
I am using psycopg to build and insert data into a PostgreSQL database, where the insert statement sits inside a for loop and is therefore different on each iteration.
insert_string = sql.SQL(
    "INSERT INTO {region}(id, price, house_type) VALUES ({id}, {price}, {house_type})").format(
        region=sql.Literal(region),
        id=sql.Literal(str(id)),
        price=sql.Literal(price),
        house_type=sql.Literal(house_type))
cur.execute(insert_string)
The variables region, id, price, house_type are all defined somewhere else inside said for loop.
The error I'm getting is as follows:
psycopg2.errors.SyntaxError: syntax error at or near "'Gorton'"
LINE 1: INSERT INTO 'Gorton'(id, price, house_typ...
^
where 'Gorton' is the value of the variable 'region' at that particular iteration.
From what I can see psycopg seems to be struggling with the apostrophe around Gorton, calling it a syntax error.
I have read the docs and can't figure out if sql.Literal is the correct choice here, or if I should use sql.Identifier instead?
Thanks in advance

Non-integer constant in GROUP BY

I have the following line of code that is supposed to build a Pandas DataFrame from a SQL query:
query_epd = pandas.read_sql_query("SELECT 'Department', COUNT('LastName') FROM thestaff.employees GROUP BY 'Department'", engine)
Yet when I run my code this line gives me the error:
SyntaxError: non-integer constant in GROUP BY
LINE 1: ...OUNT('LastName') FROM thestaff.employees GROUP BY 'Departmen...
^
I don't see where or how I am using constants, integer or not, and this is a very standard query for me on MSSQL, but running under PostgreSQL and Pandas this query is not valid. What is wrong with my query?
The single quotes around the identifiers turn them into string literals, which is probably not what you want. You should write this query as:
SELECT department, COUNT(*) no_emp
FROM thestaff.employees
GROUP BY department
If your identifiers are case-sensitive, then you need to surround them with double quotes (this is the SQL standard, which Postgres complies with).
Note that I changed COUNT(lastname) to COUNT(*): unless you have null values in the lastname column, this is equivalent, and more efficient. I also gave an alias to this column in the result set.
This link might be helpful: "Non-integer constants in the ORDER BY clause". It explains what this error is and when it occurs.
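To make the literal-versus-identifier distinction concrete, here is a small sketch. It uses Python's built-in sqlite3 rather than PostgreSQL so it can run without a server, and the employees table and its rows are made up for illustration. Selecting a single-quoted name returns the string itself for every row, while the unquoted name returns the column's values.

```python
import sqlite3

# In-memory SQLite database with a hypothetical employees table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Sales", "Smith"), ("Sales", "Jones"), ("IT", "Lee")],
)

# Single quotes: 'department' is a string literal, repeated once per row.
literal_rows = conn.execute("SELECT 'department' FROM employees").fetchall()

# No quotes: department is the column, so grouping works as intended.
grouped = conn.execute(
    "SELECT department, COUNT(*) AS no_emp "
    "FROM employees GROUP BY department ORDER BY department"
).fetchall()

print(literal_rows)
print(grouped)
```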

Select Query: Error Code 1292 - Truncated incorrect DOUBLE value

I'm attempting to retrieve all the data from a column in MySQL, with the user choosing the table and column, through the MySQL Connector library in Python. When I ran the query through Python no data would show up, and when I ran it through phpMyAdmin I got these errors:
Warning: #1292 Truncated incorrect DOUBLE value: 'Matisse'
Warning: #1292 Truncated incorrect DOUBLE value: 'Picasso'
Warning: #1292 Truncated incorrect DOUBLE value: 'van Gogh'
Warning: #1292 Truncated incorrect DOUBLE value: 'Deli'
I found that the query only works for integer columns and does not work for date-time or varchar columns (the L_Name column, for which the query doesn't work, is varchar(25)).
Here is the query:
SELECT * FROM `artist` WHERE L_Name
After the query is run and throws those errors, the query changes to this by itself:
SELECT * FROM `artist` WHERE 1
This new query returns the whole table and all of its columns and rows but of course all I want is for it to simply return the single column.
EDIT: To clarify, the point of running the SELECT * FROM `artist` WHERE L_Name query is to bring up the whole list of values in that column for that table. This is just one case; there are many others, such as the user searching for a specific record in the art_show table and then looking at all the values in the gallery location column.
I don't think it's the error that's the problem, since you used varchar; maybe check your Python code?
Figured out the solution. My Python code had an issue where it would only print the first value because of how I set up the print statement. I fixed it by changing the query on the 3rd line and also changing the print statement from print(rec[0]) to print(rec):
def show_entries(table, column):
    print(f"Here's the records in the table {table} and the column {column}.")
    mycursor.execute(f"SELECT {column} FROM {table}")
    myresult = mycursor.fetchall()
    for rec in myresult:
        print(rec)
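Because table and column names cannot be passed as bound parameters, the f-string above puts user input directly into the SQL text. One possible safeguard is to check the names against a known list before building the query. This sketch uses Python's built-in sqlite3 as a stand-in for MySQL; the ALLOWED_COLUMNS whitelist, the F_Name column and the first names are hypothetical.

```python
import sqlite3

# Hypothetical whitelist: table name -> columns the user may select.
ALLOWED_COLUMNS = {"artist": {"F_Name", "L_Name"}}


def show_entries(conn, table, column):
    """Return every value of `column` in `table`, rejecting unknown names."""
    if column not in ALLOWED_COLUMNS.get(table, set()):
        raise ValueError(f"unknown table/column: {table}.{column}")
    # Safe to interpolate: both names were checked against the whitelist.
    cursor = conn.execute(f"SELECT {column} FROM {table}")
    return [row[0] for row in cursor.fetchall()]


# Stand-in database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artist (F_Name TEXT, L_Name TEXT)")
conn.executemany(
    "INSERT INTO artist VALUES (?, ?)",
    [("Henri", "Matisse"), ("Pablo", "Picasso"), ("Vincent", "van Gogh")],
)

names = show_entries(conn, "artist", "L_Name")
print(names)
```

Returning row[0] gives plain strings; printing the whole row, as in the fix above, shows each value as a one-element tuple instead.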

Solving 'Unrecognized Token' Error While Using SQLite Insert Command

I keep getting an OperationalError: Unrecognized Token. The error happens when I attempt to insert data into my SQLite database using an SQLite INSERT command. What do I need to do to correct this error, or is there a better way to insert data into my database? The data is water level data measured in meters above chart datum, gathered from water level gauge data loggers throughout the Great Lakes region of Canada and the US. The script uses the Pandas library and is hardcoded to merge data from water level gauging stations that are located close to each other. I'd like to use the insert command so I can deal with overlapping data when adding future data to the database. I won't even begin to pretend I know what I'm talking about with databases and programming, so any help in solving this error would be appreciated!
I've tried altering the parameterized query in my script, since my research suggests it is the likely culprit, but without any luck.
# Tecumseh. Merges station in steps due to inability of operation to merge all stations at once. Starts by merging PCWL station to hydromet station followed by remaining PCWL station and 3 minute time series
final11975 = pd.merge(hydrometDF["Station11975"], pcwlDF["station11995"], how='outer', left_index=True,right_index=True)
final11975 = pd.merge(final11975, pcwlDF["station11965"], how='outer', left_index=True,right_index=True)
final11975 = pd.merge(final11975, cts, how='outer', left_index=True,right_index=True)
final11975.to_excel("C:/Users/Andrew/Documents/CHS/SeasonalGaugeAnalysis_v2/SeasonalGaugeAnalysis/Output/11975_Tecumseh.xlsx")
print "-------------------------------"
print "11975 - Tecumseh"
print(final11975.info())
final11975.index = final11975.index.astype(str)
#final11975.to_sql('11975_Tecumseh', conn, if_exists='replace', index=True)
#Insert and Ignore data into database to eliminate overlaps
testvalues = (final11975.index, final11975.iloc[:,0], final11975.iloc[:,1], final11975.iloc[:,2])
c.execute("INSERT OR IGNORE INTO 11975_Tecumseh(index,11975_VegaRadar(m),11995.11965), testvalues")
conn.commit()
I'd like the data to be inserted into the database using the INSERT OR IGNORE command, as data often overlaps when it's downloaded. I'm new to databases, but I'm under the impression that INSERT OR IGNORE will eliminate overlapping data. The message I receive when running my script is:
Exception has occurred: OperationalError
unrecognized token: "11975_Tecumseh"
File "C:\Users\Documents\CHS\SeasonalGaugeAnalysis_v2\SeasonalGaugeAnalysis\Script\CombineStations.py", line 43, in <module>
c.execute("INSERT OR IGNORE INTO 11975_Tecumseh(index,11975_VegaRadar(m),11995.11965), testvalues")
As per the SQL standard, you can name a table or column "11975_Tecumseh" (quoted) or Tecumseh_11975, but a table or column name cannot begin with a digit unless it is enclosed in double quotes.
c.execute('INSERT OR IGNORE INTO "11975_Tecumseh"("index","11975_VegaRadar(m)","11995.11965") VALUES (?,?,?)', testvalues)
The error you are getting is because the table name 11975_Tecumseh is invalid as it stands as it is not suitably enclosed.
If you want to use a keyword as a name, you need to quote it. There are four ways of quoting keywords in SQLite:
'keyword' A keyword in single quotes is a string literal.
"keyword" A keyword in double-quotes is an identifier.
[keyword] A keyword enclosed in square brackets is an identifier. This is not standard SQL. This quoting mechanism is used by MS Access and SQL Server and is included in SQLite for compatibility.
`keyword` A keyword enclosed in grave accents (ASCII code 96) is an identifier. This is not standard SQL. This quoting mechanism is used by MySQL and is included in SQLite for compatibility.
For resilience when confronted with historical SQL statements, SQLite will sometimes bend the quoting rules above:
If a keyword in single quotes (ex: 'key' or 'glob') is used in a context where an identifier is allowed but where a string literal is not allowed, then the token is understood to be an identifier instead of a string literal.
If a keyword in double quotes (ex: "key" or "glob") is used in a context where it cannot be resolved to an identifier but where a string literal is allowed, then the token is understood to be a string literal instead of an identifier.
Programmers are cautioned not to use the two exceptions described in the previous bullets. We emphasize that they exist only so that old and ill-formed SQL statements will run correctly. Future versions of SQLite might raise errors instead of accepting the malformed statements covered by the exceptions above.
SQL As Understood By SQLite - SQLite Keywords
The same quoting rules apply to names that are invalid as bare identifiers, which includes names that start with a digit and names that contain non-alphanumeric characters such as parentheses.
If 11975_Tecumseh is the actual table name then it must be enclosed e.g. [11975_Tecumseh]
Likewise the columns
index
11975_VegaRadar(m)
and 11995.11965
Also have to be suitably enclosed.
Doing so you'd end up with
"INSERT OR IGNORE INTO [11975_Tecumseh]([index],[11975_VegaRadar(m)],[11995.11965]), testvalues"
The remaining issue is that , testvalues is syntactically incorrect. After the list of columns to insert into, i.e. ([index],[11975_VegaRadar(m)],[11995.11965]), the keyword VALUES followed by the three values should be used.
An example of a valid statement is :
"INSERT INTO [11975_Tecumseh] ([index],[11975_VegaRadar(m)],[11995.11965]) VALUES('value1','value2','value3')"
As such
c.execute("INSERT INTO [11975_Tecumseh] ([index],[11975_VegaRadar(m)],[11995.11965]) VALUES('value1','value2','value3')")
would insert a new row (unless a constraint conflict occurred).
However, I suspect that you want to insert values according to variables in which case you could use:
"INSERT INTO [11975_Tecumseh] ([index],[11975_VegaRadar(m)],[11995.11965]) VALUES(?,?,?)"
the question marks being placeholders for bound values.
SQL As Understood By SQLite- INSERT
The above would then be invoked using :
c.execute("INSERT INTO [11975_Tecumseh] ([index],[11975_VegaRadar(m)],[11995.11965]) VALUES(?,?,?)",testvalues)
#Working Example :
import sqlite3
drop_sql = "DROP TABLE IF EXISTS [11975_Tecumseh]"
crt_sql = "CREATE TABLE IF NOT EXISTS [11975_Tecumseh] ([index],[11975_VegaRadar(m)],[11995.11965])"
testvalues = ("X","Y","Z")
c = sqlite3.connect("test.db")
c.execute(drop_sql)
c.execute(crt_sql)
insert_sql1 = "INSERT INTO [11975_Tecumseh] " \
"([index],[11975_VegaRadar(m)],[11995.11965]) " \
"VALUES('value1','value2','value3')"
c.execute(insert_sql1)
insert_sql2 = "INSERT OR IGNORE INTO '11975_Tecumseh'" \
"('index','11975_VegaRadar(m)',[11995.11965])" \
" VALUES(?,?,?)"
c.execute(insert_sql2,(testvalues))
cursor = c.cursor()
cursor.execute("SELECT * FROM [11975_Tecumseh]")
for row in cursor:
    print(row[0], "\n" + row[1], "\n" + row[2])
c.commit()
cursor.close()
c.close()
#Result
##Row 1
value1
value2
value3
##Row 2
X
Y
Z
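One point worth adding on the overlap question: INSERT OR IGNORE only skips a row when inserting it would violate a constraint, so the table needs a PRIMARY KEY or UNIQUE column for overlaps to be detected. The sketch below assumes the timestamp column "index" is that key, and uses executemany to insert one tuple per row. The sample timestamps and readings are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption for this sketch: "index" (the timestamp) is the primary key,
# which is what lets INSERT OR IGNORE detect overlapping rows.
conn.execute(
    'CREATE TABLE "11975_Tecumseh" ('
    '"index" TEXT PRIMARY KEY, '
    '"11975_VegaRadar(m)" REAL, '
    '"11995.11965" REAL)'
)

insert_sql = (
    'INSERT OR IGNORE INTO "11975_Tecumseh" '
    '("index", "11975_VegaRadar(m)", "11995.11965") VALUES (?, ?, ?)'
)

# Two hypothetical downloads that overlap on the 00:03 reading.
first_batch = [
    ("2019-01-01 00:00", 1.10, 1.12),
    ("2019-01-01 00:03", 1.11, 1.13),
]
second_batch = [
    ("2019-01-01 00:03", 9.99, 9.99),  # duplicate key, silently skipped
    ("2019-01-01 00:06", 1.12, 1.14),
]

# executemany binds one tuple per row.
conn.executemany(insert_sql, first_batch)
conn.executemany(insert_sql, second_batch)
conn.commit()

rows = conn.execute(
    'SELECT * FROM "11975_Tecumseh" ORDER BY "index"'
).fetchall()
for row in rows:
    print(row)
```

The row already stored for 00:03 is kept and the later duplicate is dropped. If the newer reading should win instead, INSERT OR REPLACE is the variant to look at.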

Python 3 update sqlite column by an amount specified in a variable

Scenario: A quiz program with questions worth different amounts of points.
Sqlite database with a table Table1 with a field RunningTotal of type Int.
I'm looking to update RunningTotal by the quantity 'updateby' passed to the function. This is a numerical value (but it may arrive as a string, so I'm converting it to an integer to be sure).
tableid is used to identify which row to update.
e.g. (non-working code: the error is that updateby is not a column name)
def UpdateRunningTotal(tableid,updateby):
    updateby = int(updateby)
    conn.execute("UPDATE Table1 SET RunningTotal=RunningTotal+updateby WHERE tableid=?", (tableid,))
I know that the following works to increment the field by 1, but as a function I want more flexibility to increment by different amounts.
conn.execute("UPDATE Table1 SET RunningTotal=RunningTotal+1 WHERE tableid=?", (tableid,))
I'm trying to avoid doing a SELECT statement to read the current value of RunningTotal, do the math on that, and then use that result in the UPDATE statement...that seems inefficient to me (but may not be?)
conn.execute("UPDATE Table1 set RunningTotal=RunningTotal+? WHERE tableid=?", (updateby, tableid,))
Use this statement; I have checked it and it works. It binds updateby as a second parameter, so the value already stored in the database is updated to RunningTotal+updateby.
Hope this resolves your issue.
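Put together as a function, the parameterized update might look like the following sketch. The function name update_running_total, the explicit conn argument and the sample row are assumptions made so the example is self-contained; the table and column names come from the question.

```python
import sqlite3


def update_running_total(conn, tableid, updateby):
    """Add `updateby` to RunningTotal for the row identified by `tableid`."""
    updateby = int(updateby)  # accepts "5" as well as 5
    conn.execute(
        "UPDATE Table1 SET RunningTotal = RunningTotal + ? WHERE tableid = ?",
        (updateby, tableid),
    )
    conn.commit()


# Stand-in database with one made-up row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (tableid INTEGER PRIMARY KEY, RunningTotal INT)")
conn.execute("INSERT INTO Table1 VALUES (1, 10)")

update_running_total(conn, 1, "5")
total = conn.execute(
    "SELECT RunningTotal FROM Table1 WHERE tableid = 1"
).fetchone()[0]
print(total)
```

The arithmetic happens inside the UPDATE itself, so no separate SELECT is needed to read the current value first.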
