I am new to Python and don't really understand SQL that well. I'm currently on the 6th week of Team Treehouse, so please bear with me if these are noob questions.
Goal
1. Import a CSV with stock_tickers and 5 other columns of data
2. Convert the CSV into a pandas dataframe
3. Import the dataframe into the database. If the unique stock_ticker already exists, don't add a new row; instead, check whether the data in the other 5 columns is different, and if it is, update it.
Right now I can do steps #1 and #2 and half of #3. With the help on here I was able to get the looping to work: if there is a new stock_ticker row in the CSV, it gets added to the database, but if the data changes for an existing stock_ticker, no update happens.
for i in range(len(df)):
    try:
        df[df.index == i].to_sql(name='stocks', con=conn, if_exists='append', index=False)
        conn.commit()
    except sqlite3.IntegrityError:
        pass
My current code looks like this:
import pandas as pd
from pandas import DataFrame
from pandas import ExcelWriter
import csv
import sqlite3
### IMPORT CSV ###
stock_csv_file = pd.read_csv(r'C:\Users\home\Desktop\code_projects\FIRE_Dashboard\db\alpha_vantage_active_stocks.csv')
### CHANGING INDEX NAMES FROM CSV TO TABLE NAMES ###
df = pd.DataFrame(stock_csv_file)
df = df.rename(columns = {"symbol":"stock_ticker", "name":"stock_name", "exchange":"stock_exchange", "ipoDate":"stock_ipoDate", "delistingDate":"stock_delistingDate", "status":"stock_status"})
### UPDATING DATABASE WITH SQLITE3 ###
conn = sqlite3.connect('stockmarket.db')
c = conn.cursor()
insert_statement = """
INSERT INTO stocks (stock_ticker,
stock_name,
stock_exchange,
stock_ipoDate,
stock_delistingDate,
stock_status
)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (stock_ticker) DO UPDATE
SET (stock_status)"""
for i in range(len(df)):
    values = tuple(df.iloc[i])
    c.execute(insert_statement, values)
The error I am getting
Traceback (most recent call last):
File "update_stock_tickers.py", line 71, in <module>
c.execute(insert_statement, values)
sqlite3.OperationalError: incomplete input
I found these posts that talk about it, but I'm still getting lost >.<
How to use variables in SQL statement in Python?
python Datetime and SQLite
Loop through individual rows and update those rows SQLite3 Python
Any help is much appreciated.
Code after solution
import pandas as pd
from pandas import DataFrame
from pandas import ExcelWriter
import csv
import sqlite3
### IMPORT CSV ###
stock_csv_file = pd.read_csv(r'C:\Users\home\Desktop\code_projects\FIRE_Dashboard\db\alpha_vantage_active_stocks.csv')
### CHANGING INDEX NAMES FROM CSV TO TABLE NAMES ###
df = pd.DataFrame(stock_csv_file)
df = df.rename(columns = {"symbol":"stock_ticker", "name":"stock_name", "exchange":"stock_exchange", "ipoDate":"stock_ipoDate", "delistingDate":"stock_delistingDate", "status":"stock_status"})
### UPDATING DATABASE WITH SQLITE3 ###
conn = sqlite3.connect('stockmarket.db')
c = conn.cursor()
insert_statement = """
INSERT INTO stocks (stock_ticker,
stock_name,
stock_exchange,
stock_ipoDate,
stock_delistingDate,
stock_status
)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (stock_ticker) DO UPDATE
SET stock_status = EXCLUDED.stock_status"""
for i in range(len(df)):
    values = tuple(df.iloc[i])
    c.execute(insert_statement, values)
conn.commit()
This is the ON CONFLICT clause of your query:
ON CONFLICT (stock_ticker) DO UPDATE
SET (stock_status)
This is not valid SQLite syntax. If you want to update stock_status when another row already exists with the same stock_ticker, you can use the EXCLUDED pseudo-table, like so:
INSERT INTO stocks (stock_ticker,
stock_name,
stock_exchange,
stock_ipoDate,
stock_delistingDate,
stock_status
)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (stock_ticker) DO UPDATE
SET stock_status = EXCLUDED.stock_status
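Note that ON CONFLICT ... DO UPDATE requires SQLite 3.24.0 or newer. As a side note, you can also let sqlite3 batch the loop for you with executemany; a minimal sketch, assuming the dataframe columns are in the same order as the INSERT column list:
# Batch the upsert over every dataframe row in one call.
# itertuples(index=False, name=None) yields plain tuples in column order.
c.executemany(insert_statement, df.itertuples(index=False, name=None))
conn.commit()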
Related
I want the column values as a list, with the output being
['Repair and Maintenance - General', 'Advance salary', 'EXIM Deposit', 'Office Cleaning Expenses']
but I'm only able to obtain this type of output
[('Repair and Maintenance - General',), ('Advance salary',), ('EXIM Deposit',), ('Office Cleaning Expenses',)]
with the following code. Can anyone please help me?
import mysql.connector
import pandas as pd

conn = mysql.connector.connect(host="localhost", user="root", passwd="Abcd11",
                               database="entry", auth_plugin="mysql_native_password")
query = "select ledger from l_db_name"
cur = conn.cursor()
cur.execute(query)
rows = cur.fetchall()
conn.commit()
print(rows)
Extract the first element of each tuple with a list comprehension:
rows = cur.fetchall()
rows = [row[0] for row in rows]
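Alternatively, since pandas is already imported, you could read the column straight into a dataframe and flatten it. A sketch; pandas may warn that it only officially supports SQLAlchemy connections, but a plain DBAPI connection generally works:
# read_sql returns a one-column DataFrame; .tolist() flattens it
rows = pd.read_sql(query, conn)['ledger'].tolist()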
I am scraping the web and want to save my dataframe to my SQL Server database, which updates every month. What should I add to the code so that the data gets replaced rather than appended? I am using pyodbc (I can't use SQLAlchemy). Thank you in advance.
col_names = ["month", "price", "change"]
df = pd.read_csv("minyak_brent.csv", sep=',', quotechar='\'', encoding='utf8',
                 names=col_names, skiprows=1)  # replace minyak_brent.csv with your file name

for index, row in df.iterrows():
    cursor.execute('INSERT INTO dbo.minyak_brent([month],[price],[change]) values (?,?)',
                   row['month'],
                   row['price'],
                   row['change'])
cnxn.commit()
cursor.close()
cnxn.close()
You should use df.to_sql here, which avoids the need to explicitly iterate over your Pandas data frame:
from sqlalchemy.types import String, Float

df.to_sql(con=engine, name='minyak_brent', schema='dbo', if_exists='append',
          dtype={'month': String(255), 'price': Float, 'change': Float},
          index=False)
This assumes that the month, price, and change columns should be typed as String(255), Float, and Float, respectively (imported from sqlalchemy.types), and that the table lives in the dbo schema (passed via the schema argument rather than folded into the table name).
If you must stick with your current approach, then fix your insert statement such that it has the right number of placeholders in the VALUES clause:
cursor.execute('''INSERT INTO dbo.minyak_brent([month], [price], [change])
VALUES (?, ?, ?)''', (row['month'], row['price'], row['change']))
Note that we pass a tuple as the second parameter to cursor.execute.
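If the table grows large, one hedged speed-up for the row-by-row approach is cursor.executemany, optionally with pyodbc's fast_executemany flag (available since pyodbc 4.0.19); a sketch using the same dataframe:
# Send all rows in one batch instead of one execute call per row
cursor.fast_executemany = True  # pyodbc 4.0.19+
cursor.executemany(
    'INSERT INTO dbo.minyak_brent([month], [price], [change]) VALUES (?, ?, ?)',
    list(df.itertuples(index=False, name=None)))
cnxn.commit()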
Experts,
I am struggling to find an efficient way to work with pandas and sqlite.
I am building a tool that lets users
extract part of a SQL database (sub_table) based on some filters
change part of sub_table
upload the changed sub_table back to the overall SQL table, replacing the old values
Users will only see Excel data (so I need to write back and forth to Excel, which is not part of my example as it is out of scope).
Users can
replace existing rows (entries) with new data
delete existing rows
add new rows
Question: how can I most efficiently do this "replace/delete/add" using Pandas / sqlite3?
Here is my example code. If I use df_sub.to_sql("MyTable", con = conn, index = False, if_exists="replace") at the bottom, then obviously the entire table is replaced... so there must be another way I cannot think of.
import pandas as pd
import sqlite3
import numpy as np
#### SETTING EXAMPLE UP
### Create DataFrame
data = {"City": ["London", "Frankfurt", "Berlin", "Paris", "Brondby"],
        "Population": [8, 2, 4, 9, 0.5]}
df = pd.DataFrame(data, index=pd.Index(np.arange(5)))
### Create SQL DataBase
conn = sqlite3.connect("MyDB.db")
### Upload DataFrame as Table into SQL Database
df.to_sql("MyTable", con = conn, index = False, if_exists="replace")
### Read DataFrame from SQL DB
query = "SELECT * from MyTable"
pd.read_sql_query(query, con = conn)
#### CREATE SUB_TABLE AND AMEND
#### EXTRACT sub_table FROM SQL TABLE
query = "SELECT * from MyTable WHERE Population > 2"
df_sub = pd.read_sql_query(query, con = conn)
df_sub
#### Amend Sub DF
df_sub[df_sub["City"] == "London"] = ["Brussel",4]
df_sub
#### Replace new data in SQL DB
df_sub.to_sql("MyTable", con = conn, index = False, if_exists="replace")
query = "SELECT * from MyTable"
pd.read_sql_query(query, con = conn)
Thanks for your help!
Note: I did try to achieve this via pure SQL queries but gave up. As I am not an expert on SQL, I would prefer to go with pandas if a solution exists. If not, a hint on how to achieve this via SQL would be great!
I think there is no way around using SQL queries for this task.
With pandas it is only possible to read a query into a DataFrame and to write a DataFrame to a database (replace or append).
If you want to update specific values/ rows or want to delete rows, you have to use SQL queries.
Commands you should look into are for example:
UPDATE, REPLACE, INSERT, DELETE
# Update the database, change City to 'Brussel' and Population to 4, for the first row
# (Attention! python indices start at 0, SQL indices at 1)
cur = conn.cursor()
cur.execute('UPDATE MyTable SET City=?, Population=? WHERE ROWID=?', ('Brussel', 4, 1))
conn.commit()
conn.close()
# Display the changes
conn = sqlite3.connect("MyDB.db")
query = "SELECT * from MyTable"
pd.read_sql_query(query, con=conn)
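The same cursor pattern covers the delete and add cases; a minimal sketch against the example MyTable (the Madrid row is just an illustrative value):
cur = conn.cursor()
# Remove a row the user deleted from the sub_table
cur.execute('DELETE FROM MyTable WHERE City = ?', ('Paris',))
# Add a row the user created (illustrative values)
cur.execute('INSERT INTO MyTable (City, Population) VALUES (?, ?)', ('Madrid', 3.2))
conn.commit()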
For more examples of using SQL with pandas, you can look at
https://www.dataquest.io/blog/python-pandas-databases/
I am trying to take a dataframe and convert it into SQL. I am creating the table first to set unique indexing, to allow for rolling updates without duplicates if there happen to be two A. Rods over time. But I can't seem to shake this table column error, and I don't know why.
import pandas as pd
import sqlite3 as sq
conn = sq.connect('test.db')
c = conn.cursor()
def set_table():
    c.execute("""CREATE TABLE IF NOT EXISTS players(
                 "#" INTEGER,
                 " " REAL,
                 "Named" TEXT,
                 "B/T" TEXT,
                 "Ht" TEXT,
                 "Wt" TEXT,
                 "DOB" TEXT);""")
    conn.commit()

def set_index_table():
    c.execute("""CREATE UNIQUE INDEX index_unique
                 ON players (Named, DOB)""")
    conn.commit()
set_table()
set_index_table()
roster_active = pd.read_html('http://m.yankees.mlb.com/roster',index_col=0)
df = roster_active[0]
df = df.rename(columns={'Name': 'Named'})
df.to_sql('players', conn, if_exists='append')
conn.commit()
conn.close()
sqlite3.OperationalError: table players has no column named
Thank you for your time.
I am not completely sure why the original didn't work, but I found how to get it working. I believe it had something to do with the dataframe index, so I defined which columns I wanted to select for the dataframe, and that worked.
df = df[['Named','B/T', 'Ht','Wt','DOB']]
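If the culprit was indeed the dataframe index, another thing to try is index=False, which stops to_sql from writing the index as an extra (possibly unnamed) column; a hedged sketch, untested against this exact roster page:
# index=False keeps the dataframe index out of the INSERT column list
df.to_sql('players', conn, if_exists='append', index=False)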
Python Version - 2.7.6
Pandas Version - 0.17.1
MySQLdb Version - 1.2.5
In my database (PRODUCT), I have a table (XML_FEED). The XML_FEED table is huge (millions of records).
I have a pandas.DataFrame() (PROCESSED_DF). The dataframe has thousands of rows.
Now I need to run this
REPLACE INTO PRODUCT.XML_FEED
(COL1, COL2, COL3, COL4, COL5)
VALUES (PROCESSED_DF.values)
Question:
Is there a way to run REPLACE INTO in pandas? I already checked pandas.DataFrame.to_sql(), but that is not what I need. I would prefer not to read the XML_FEED table into pandas because it is very huge.
With the release of pandas 0.24.0, there is now an official way to achieve this by passing a custom insert method to the to_sql function.
I was able to achieve the behavior of REPLACE INTO by passing this callable to to_sql:
def mysql_replace_into(table, conn, keys, data_iter):
    from sqlalchemy.ext.compiler import compiles
    from sqlalchemy.sql.expression import Insert

    @compiles(Insert)
    def replace_string(insert, compiler, **kw):
        s = compiler.visit_insert(insert, **kw)
        s = s.replace("INSERT INTO", "REPLACE INTO")
        return s

    data = [dict(zip(keys, row)) for row in data_iter]
    conn.execute(table.table.insert(replace_string=""), data)
You would pass it like so:
df.to_sql(db, engine, if_exists='append', method=mysql_replace_into)
Alternatively, if you want the behavior of INSERT ... ON DUPLICATE KEY UPDATE ... instead, you can use this:
def mysql_replace_into(table, conn, keys, data_iter):
    from sqlalchemy.dialects.mysql import insert

    data = [dict(zip(keys, row)) for row in data_iter]
    stmt = insert(table.table).values(data)
    update_stmt = stmt.on_duplicate_key_update(**dict(zip(stmt.inserted.keys(),
                                                          stmt.inserted.values())))
    conn.execute(update_stmt)
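This variant is passed to to_sql the same way as above (with db and engine as your table name and SQLAlchemy engine):
df.to_sql(db, engine, if_exists='append', method=mysql_replace_into)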
Credits to https://stackoverflow.com/a/11762400/1919794 for the compile method.
As of this version (0.17.1) I am unable to find any direct way to do this in pandas, so I filed a feature request for it.
I did this in my project by executing some queries using MySQLdb and then using DataFrame.to_sql(if_exists='append').
Suppose
1) product_id is my primary key in table PRODUCT
2) feed_id is my primary key in table XML_FEED.
SIMPLE VERSION
import MySQLdb
import sqlalchemy
import pandas
con = MySQLdb.connect('localhost','root','my_password', 'database_name')
con_str = 'mysql+mysqldb://root:my_password@localhost/database_name'
engine = sqlalchemy.create_engine(con_str) #because I am using mysql
df = pandas.read_sql('SELECT * from PRODUCT', con=engine)
df_product_id = df['product_id']
product_id_str = (str(list(df_product_id.values))).strip('[]')
delete_str = 'DELETE FROM XML_FEED WHERE feed_id IN ({0})'.format(product_id_str)
cur = con.cursor()
cur.execute(delete_str)
con.commit()
df.to_sql('XML_FEED', if_exists='append', con=engine)  # you can use flavor='mysql' if you do not want to create a sqlalchemy engine, but it is deprecated
Please note:
The REPLACE [INTO] syntax allows us to INSERT a row into a table, except that if a UNIQUE KEY (including PRIMARY KEY) violation occurs, the old row is deleted prior to the new INSERT, hence no violation.
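For illustration, a minimal sketch of a direct REPLACE INTO through the same MySQLdb connection (feed_id is the primary key from the question; col2 and the values are hypothetical):
cur = con.cursor()
# REPLACE deletes any existing row with the same primary key, then inserts the new one
cur.execute('REPLACE INTO XML_FEED (feed_id, col2) VALUES (%s, %s)', (1, 'value'))
con.commit()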
I needed a generic solution to this problem, so I built on shiva's answer--maybe it will be helpful to others. This is useful in situations where you grab a table from a MySQL database (whole or filtered), update/add some rows, and want to perform a REPLACE INTO statement with df.to_sql().
It finds the table's primary keys, performs a delete statement on the MySQL table with all keys from the pandas dataframe, and then inserts the dataframe into the MySQL table.
def to_sql_update(df, engine, schema, table):
    df.reset_index(inplace=True)
    sql = ''' SELECT column_name FROM information_schema.columns
              WHERE table_schema = '{schema}' AND table_name = '{table}' AND
                    COLUMN_KEY = 'PRI';
          '''.format(schema=schema, table=table)
    id_cols = [x[0] for x in engine.execute(sql).fetchall()]
    id_vals = [df[col_name].tolist() for col_name in id_cols]
    sql = ''' DELETE FROM {schema}.{table} WHERE 0 '''.format(schema=schema, table=table)
    for row in zip(*id_vals):
        sql_row = ' AND '.join([''' {}='{}' '''.format(n, v) for n, v in zip(id_cols, row)])
        sql += ' OR ({}) '.format(sql_row)
    engine.execute(sql)
    df.to_sql(table, engine, schema=schema, if_exists='append', index=False)
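A usage sketch, assuming a SQLAlchemy 1.x engine (engine.execute was removed in SQLAlchemy 2.0) and the schema/table names from the question; the connection string is hypothetical:
import sqlalchemy

# Hypothetical credentials; substitute your own
engine = sqlalchemy.create_engine('mysql+mysqldb://root:my_password@localhost/PRODUCT')
to_sql_update(PROCESSED_DF, engine, 'PRODUCT', 'XML_FEED')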
If you use to_sql you should be able to define it so that you replace values if they exist, so for a table named 'mydb' and a dataframe named 'df', you'd use:
df.to_sql('mydb', engine, if_exists='replace')
That should replace values if they already exist, but I am not 100% sure if that's what you're looking for.