I am trying to take a dataframe and write it to SQL. I create the table first so that I can set a unique index, which allows a rolling update without duplicates if there happen to be two A. Rods over time. However, I can't seem to shake this table column error and I don't know why.
import pandas as pd
import sqlite3 as sq
conn = sq.connect('test.db')
c = conn.cursor()
def set_table():
    c.execute("""CREATE TABLE IF NOT EXISTS players(
    "#" INTEGER,
    " " REAL,
    "Named" TEXT,
    "B/T" TEXT,
    "Ht" TEXT,
    "Wt" TEXT,
    "DOB" TEXT);""")
    conn.commit()
def set_index_table():
    c.execute(""" CREATE UNIQUE INDEX index_unique
    ON players (Named, DOB)""")
    conn.commit()
set_table()
set_index_table()
roster_active = pd.read_html('http://m.yankees.mlb.com/roster',index_col=0)
df = roster_active[0]
df = df.rename(columns={'Name': 'Named'})
df.to_sql('players', conn, if_exists='append')
conn.commit()
conn.close()
sqlite3.OperationalError: table players has no column named
Thank you for your time.
I am not completely sure why this doesn't work, but I found a way to get it to work. I believe it had something to do with the dataframe index. I explicitly selected the columns I wanted from the dataframe, and that worked.
df = df[['Named','B/T', 'Ht','Wt','DOB']]
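One way to narrow down an error like this is to compare the names about to be written against the columns the table actually has. The sketch below is only an illustration: `missing_columns` is a hypothetical helper, the table is a trimmed copy of `players`, and the `Extra` name stands in for whatever mismatched column the dataframe carried.

```python
import sqlite3

def missing_columns(conn, table, names):
    """Return the names that are not columns of `table` (hypothetical helper)."""
    quoted = '"' + table.replace('"', '""') + '"'
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(%s)" % quoted)}
    return [n for n in names if n not in existing]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE players ("#" INTEGER, "Named" TEXT, "B/T" TEXT, '
             '"Ht" TEXT, "Wt" TEXT, "DOB" TEXT)')

# Names a dataframe might try to write; "Extra" is a made-up mismatch.
candidate = ["#", "Named", "B/T", "Ht", "Wt", "DOB", "Extra"]
print(missing_columns(conn, "players", candidate))
```

With a dataframe, the names to check would be `list(df.columns)` plus the index name, because `to_sql` writes the index as a column unless `index=False` is passed.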
I have been banging my head trying to get my SQLite table to update with the code below. It loops through pandas dataframe cells in specific columns to update specific columns in the SQLite table where the date matches.
Everything works up until the update, but the table won't update: it always prints "fail". Any help would be greatly appreciated!
cur.execute('ALTER TABLE Car_parts'+Part_str+' ADD COLUMN Close_'+Column_ID+' number')
conn.commit()
cur.execute('ALTER TABLE Car_parts'+Part_str+' ADD COLUMN Volume_'+Column_ID+' number')
conn.commit()
y=0
while y<len(Car_Data.index):
    print(y)
    Update_date=Car_Data.iloc[y,0]
    #Update_date=datetime.strptime(UpdateDate,date_format)
    Update_close=str(CAR_Data.iloc[y,1])
    Update_volume=str(Car_Data.iloc[y,2])
    print(type(Update_date),type(Update_close), type(Update_volume))
    try:
        cur.execute('UPDATE Car_parts'+Part_str+' SET Close_'+Column_ID+' = ?, Volume_'+Column_ID+' = ? WHERE Date= ?',(Update_Close, Update_Volume, Update_date,))
        conn.commit()
        print("Success")
    except:
        print("fail")
        pass
    y+=1
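A hedged observation on the loop above: the bare `except:` hides whatever exception is actually raised, and the snippet mixes `Car_Data` with `CAR_Data` and `Update_close`/`Update_volume` with `Update_Close`/`Update_Volume`, so a `NameError` is one plausible cause of the constant "fail". The sketch below is a stand-in, not the asker's code: the table `car_parts`, its columns, and the sample rows are hypothetical, and plain tuples replace the dataframe. It runs the same parameterized UPDATE with consistent names and prints the real error instead of swallowing it.

```python
import sqlite3

# Hypothetical stand-in for the asker's table and data.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE car_parts (Date TEXT, Close_1 REAL, Volume_1 INTEGER)")
cur.executemany("INSERT INTO car_parts (Date) VALUES (?)",
                [("2020-01-02",), ("2020-01-03",)])

# (date, close, volume) rows; in the original these come from Car_Data.iloc.
rows = [("2020-01-02", 10.5, 1000), ("2020-01-03", 11.0, 1200)]

updated = 0
for update_date, update_close, update_volume in rows:
    try:
        cur.execute(
            "UPDATE car_parts SET Close_1 = ?, Volume_1 = ? WHERE Date = ?",
            (update_close, update_volume, str(update_date)),
        )
        updated += cur.rowcount  # 0 means the WHERE clause matched nothing
    except sqlite3.Error as exc:
        # Report the real error rather than a bare "fail".
        print("update failed:", exc)
conn.commit()

result = cur.execute(
    "SELECT Date, Close_1, Volume_1 FROM car_parts ORDER BY Date").fetchall()
print(updated, result)
```

If the dates come from pandas, they may need converting to the same text format the table stores; a `rowcount` of 0 is a hint that the `WHERE Date = ?` comparison never matched.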
I'm trying to accomplish a very simple task:
1) Create a table in SQLite
2) Insert several rows
3) Query a single column in the table and pull back each row
Code to create the table:
import sqlite3
sqlite_file = '/Users/User/Desktop/DB.sqlite'
conn = sqlite3.connect(sqlite_file)
c = conn.cursor()
c.execute('''CREATE TABLE ListIDTable(ID numeric, Day numeric, Month
numeric, MonthTxt text, Year numeric, ListID text, Quantity text)''')
values_to_insert = [
(1,16,7,"Jul",2015,"XXXXXXX1","Q2"),
(2,16,7,"Jul",2015,"XXXXXXX2","Q2"),
(3,14,7,"Jul",2015,"XXXXXXX3","Q1"),
(4,14,7,"Jul",2015,"XXXXXXX4","Q1")] #Entries continue similarly
c.executemany("""INSERT INTO ListIDTable (ID, Day, Month, MonthTxt,
    Year, ListID, Quantity) values (?,?,?,?,?,?,?)""", values_to_insert)
conn.commit()
conn.close()
When I look at this table in SQLite DB Browser, everything looks fine.
Here's my code to try and query the above table:
import sqlite3
sqlite_file = '/Users/User/Desktop/DB.sqlite'
conn = sqlite3.connect(sqlite_file)
conn.row_factory = sqlite3.Row
c = conn.cursor()
for row in c.execute('select * from ListIDTable'):
    r = c.fetchone()
    ID = r['ID']
    print (ID)
I should get a print out of 1, 2, 3, 4.
However, I only get 2 and 4.
My code actually uploads 100 entries to the table, but still, when I query, I only get ID printouts of even numbers (i.e. 2, 4, 6, 8 etc.).
Thanks for any advice on fixing this.
You don't need to fetchone in the loop -- The loop is already fetching the values (one at a time). If you fetchone while you're iterating, you'll only see half the data because the loop fetches one and then you immediately fetch the next one (without ever looking at the one that was fetched by the loop):
for r in c.execute('select * from ListIDTable'):
    ID = r['ID']
    print (ID)
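To see the skipping behaviour side by side with the fix, here is a small self-contained sketch. It uses an in-memory database and a cut-down, two-column version of `ListIDTable`, both of which are assumptions made for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
c = conn.cursor()
c.execute("CREATE TABLE ListIDTable (ID numeric, ListID text)")
c.executemany("INSERT INTO ListIDTable VALUES (?, ?)",
              [(i, "XXXXXXX%d" % i) for i in range(1, 5)])

# Buggy pattern: the for loop consumes one row, fetchone() consumes the next.
skipping = []
for row in c.execute("SELECT * FROM ListIDTable ORDER BY ID"):
    r = c.fetchone()
    if r is not None:
        skipping.append(r["ID"])

# Fixed pattern: let the loop do the fetching.
complete = [row["ID"] for row in c.execute("SELECT * FROM ListIDTable ORDER BY ID")]

print(skipping)   # [2, 4]
print(complete)   # [1, 2, 3, 4]
```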
from bs4 import BeautifulSoup
import requests
import sqlite3
conn = sqlite3.connect('stadiumsDB.db')
c = conn.cursor()
c.executescript('''
    DROP TABLE IF EXISTS Stadium;
    CREATE TABLE Stadium (
        id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
        Stadium TEXT UNIQUE,
        Club Text UNIQUE,
        Location Text UNIQUE
    )
''')
main_site = requests.get('https://en.wikipedia.org/wiki/List_of_Premier_League_stadiums').text
soup = BeautifulSoup(main_site, 'html.parser')  # name the parser explicitly
column_names = [th.getText() for th in soup.findAll('th')[:9]]
print(column_names)
stadium_rows = soup.findAll('tr')[1:]
stadium_data = [[td.getText() for td in stadium_rows[i].findAll('td')]
                for i in range(len(stadium_rows))]
print(stadium_data)
I want to build an SQLite database. The first row of the table should hold the column names, imported from my column_names variable, and the following rows should come from my stadium_data variable. Any guidance, please!
For your table definition, the best way to insert stadium_data is with executemany, like this:
cn=column_names
stadium=cn.index("Stadium")
club=cn.index("Club")
location=cn.index("Location")
# this is for making sure that column order is correct
c.executemany("insert into stadium(stadium, club, location) values (?,?,?)",
              ((row[stadium], row[club],row[location]) for row in stadium_data))
This does not put column_names first in the table, as that doesn't seem like a good idea to me, but you can do so with the following (insert it before the executemany):
c.execute("insert into stadium(id,stadium, club, location) values (?,?,?,?)",
          (0,cn[stadium], cn[club],cn[location]))
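As a runnable illustration of the executemany approach, here is a sketch with made-up data in place of the scraped lists. The header names, the sample rows, and the filter that skips rows too short to index (for example a row with no td cells) are assumptions, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE Stadium (id INTEGER PRIMARY KEY AUTOINCREMENT, "
          "Stadium TEXT UNIQUE, Club TEXT UNIQUE, Location TEXT)")

# Hypothetical scraped data; the real lists come from BeautifulSoup.
column_names = ["Stadium", "Image", "Club", "Location"]
stadium_data = [
    ["Ground A", "", "Club A", "Town A"],
    [],                                   # e.g. a header row with no <td> cells
    ["Ground B", "", "Club B", "Town B"],
]

# Look up positions by header name so column order does not matter.
stadium, club, location = (column_names.index(n)
                           for n in ("Stadium", "Club", "Location"))
needed = max(stadium, club, location) + 1

c.executemany(
    "INSERT INTO Stadium (Stadium, Club, Location) VALUES (?, ?, ?)",
    ((row[stadium], row[club], row[location])
     for row in stadium_data if len(row) >= needed),  # skip short or empty rows
)
conn.commit()

inserted = c.execute(
    "SELECT Stadium, Club, Location FROM Stadium ORDER BY id").fetchall()
print(inserted)
```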
Python Version - 2.7.6
Pandas Version - 0.17.1
MySQLdb Version - 1.2.5
In my database (PRODUCT), I have a table (XML_FEED). The table XML_FEED is huge (millions of records).
I have a pandas.DataFrame() (PROCESSED_DF). The dataframe has thousands of rows.
Now I need to run this
REPLACE INTO TABLE PRODUCT.XML_FEED
(COL1, COL2, COL3, COL4, COL5),
VALUES (PROCESSED_DF.values)
Question:-
Is there a way to run REPLACE INTO TABLE in pandas? I already checked pandas.DataFrame.to_sql(), but that is not what I need. I would prefer not to read the XML_FEED table into pandas because it is very large.
With the release of pandas 0.24.0, there is now an official way to achieve this by passing a custom insert method to the to_sql function.
I was able to achieve the behavior of REPLACE INTO by passing this callable to to_sql:
def mysql_replace_into(table, conn, keys, data_iter):
    from sqlalchemy.dialects.mysql import insert
    from sqlalchemy.ext.compiler import compiles
    from sqlalchemy.sql.expression import Insert
    @compiles(Insert)
    def replace_string(insert, compiler, **kw):
        s = compiler.visit_insert(insert, **kw)
        s = s.replace("INSERT INTO", "REPLACE INTO")
        return s
    data = [dict(zip(keys, row)) for row in data_iter]
    conn.execute(table.table.insert(replace_string=""), data)
You would pass it like so:
df.to_sql(db, con=engine, if_exists='append', method=mysql_replace_into)  # db is the table name; engine is assumed to be your SQLAlchemy engine
Alternatively, if you want the behavior of INSERT ... ON DUPLICATE KEY UPDATE ... instead, you can use this:
def mysql_replace_into(table, conn, keys, data_iter):
    from sqlalchemy.dialects.mysql import insert
    data = [dict(zip(keys, row)) for row in data_iter]
    stmt = insert(table.table).values(data)
    update_stmt = stmt.on_duplicate_key_update(**dict(zip(stmt.inserted.keys(),
                                                          stmt.inserted.values())))
    conn.execute(update_stmt)
Credits to https://stackoverflow.com/a/11762400/1919794 for the compile method.
Up to this version (0.17.1), I have been unable to find any direct way to do this in pandas. I filed a feature request for it.
I did this in my project by executing some queries with MySQLdb and then using DataFrame.to_sql(if_exists='append').
Suppose
1) product_id is my primary key in table PRODUCT
2) feed_id is my primary key in table XML_FEED.
SIMPLE VERSION
import MySQLdb
import sqlalchemy
import pandas
con = MySQLdb.connect('localhost','root','my_password', 'database_name')
con_str = 'mysql+mysqldb://root:my_password@localhost/database_name'
engine = sqlalchemy.create_engine(con_str) #because I am using mysql
df = pandas.read_sql('SELECT * from PRODUCT', con=engine)
df_product_id = df['product_id']
product_id_str = (str(list(df_product_id.values))).strip('[]')
delete_str = 'DELETE FROM XML_FEED WHERE feed_id IN ({0})'.format(product_id_str)
cur = con.cursor()
cur.execute(delete_str)
con.commit()
df.to_sql('XML_FEED', if_exists='append', con=engine)  # you can use flavor='mysql' if you do not want to create a sqlalchemy engine, but it is deprecated
Please note:-
The REPLACE [INTO] syntax allows us to INSERT a row into a table, except that if a UNIQUE KEY (including PRIMARY KEY) violation occurs, the old row is deleted prior to the new INSERT, hence no violation.
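The delete-then-insert behaviour described in the note above can be observed with Python's built-in sqlite3 module, since SQLite also accepts REPLACE INTO (as an alias for INSERT OR REPLACE). This is an illustration on SQLite rather than MySQL, and the table `feed` and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feed (feed_id INTEGER PRIMARY KEY, title TEXT, price REAL)")
conn.execute("INSERT INTO feed VALUES (1, 'old title', 9.99)")

# Same primary key: the existing row is removed and the new one inserted.
# Columns not listed fall back to their default (NULL here), because the
# old row is deleted rather than updated in place.
conn.execute("REPLACE INTO feed (feed_id, title) VALUES (1, 'new title')")

# A new key is simply inserted.
conn.execute("REPLACE INTO feed (feed_id, title, price) VALUES (2, 'other', 5.0)")

rows = conn.execute(
    "SELECT feed_id, title, price FROM feed ORDER BY feed_id").fetchall()
print(rows)   # [(1, 'new title', None), (2, 'other', 5.0)]
```

The lost `price` on row 1 is the practical difference from an in-place update such as INSERT ... ON DUPLICATE KEY UPDATE.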
I needed a generic solution to this problem, so I built on shiva's answer--maybe it will be helpful to others. This is useful in situations where you grab a table from a MySQL database (whole or filtered), update/add some rows, and want to perform a REPLACE INTO statement with df.to_sql().
It finds the table's primary keys, performs a delete statement on the MySQL table with all keys from the pandas dataframe, and then inserts the dataframe into the MySQL table.
def to_sql_update(df, engine, schema, table):
    df.reset_index(inplace=True)
    sql = ''' SELECT column_name from information_schema.columns
    WHERE table_schema = '{schema}' AND table_name = '{table}' AND
    COLUMN_KEY = 'PRI';
    '''.format(schema=schema, table=table)
    id_cols = [x[0] for x in engine.execute(sql).fetchall()]
    id_vals = [df[col_name].tolist() for col_name in id_cols]
    sql = ''' DELETE FROM {schema}.{table} WHERE 0 '''.format(schema=schema, table=table)
    for row in zip(*id_vals):
        sql_row = ' AND '.join([''' {}='{}' '''.format(n, v) for n, v in zip(id_cols, row)])
        sql += ' OR ({}) '.format(sql_row)
    engine.execute(sql)
    df.to_sql(table, engine, schema=schema, if_exists='append', index=False)
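The same delete-then-insert idea can be sketched with bound parameters instead of values formatted into the SQL string, and inside one transaction so a failed insert does not leave the rows deleted. This is a minimal sketch using the standard-library sqlite3 module rather than MySQL; `replace_rows`, the `xml_feed` table, and the sample rows are hypothetical, and the table and column names are assumed to come from trusted code.

```python
import sqlite3

def replace_rows(conn, table, key_cols, columns, rows):
    """Delete rows whose key matches, then insert the new rows, atomically."""
    key_idx = [columns.index(k) for k in key_cols]
    where = " AND ".join('"%s" = ?' % k for k in key_cols)
    col_list = ", ".join('"%s"' % col for col in columns)
    marks = ", ".join("?" for _ in columns)
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany('DELETE FROM "%s" WHERE %s' % (table, where),
                         [tuple(r[i] for i in key_idx) for r in rows])
        conn.executemany('INSERT INTO "%s" (%s) VALUES (%s)'
                         % (table, col_list, marks), rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xml_feed (feed_id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO xml_feed VALUES (?, ?)", [(1, "a"), (2, "b")])

# Row 2 is replaced, row 3 is new, row 1 is left alone.
replace_rows(conn, "xml_feed", ["feed_id"], ["feed_id", "title"],
             [(2, "B"), (3, "c")])

result = conn.execute(
    "SELECT feed_id, title FROM xml_feed ORDER BY feed_id").fetchall()
print(result)
```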
If you use to_sql, you can tell it what to do when the table already exists. For a table named 'mydb', a dataframe named 'df', and a connection or engine named 'con' (a placeholder for your own), you'd use:
df.to_sql('mydb', con, if_exists='replace')
Note that this drops the existing table and recreates it from the dataframe rather than replacing individual rows, so I am not 100% sure it's what you're looking for.
Normally, if I want to insert values into a table, I do something like this (assuming I know which columns the values belong to):
conn = sqlite3.connect('mydatabase.db')
conn.execute("INSERT INTO MYTABLE (ID,COLUMN1,COLUMN2)\
VALUES(?,?,?)",[myid,value1,value2])
But now I have a list of columns (its length may vary) and a list of values, one for each column in the list.
For example, say I have a table with 10 columns (column1, column2, ..., column10) and a list of columns that I want to update, say [column3, column4], along with a list of values for those columns: [value for column3, value for column4].
How do I insert the values in the list into the columns they belong to?
As far as I know the parameter list in conn.execute works only for values, so we have to use string formatting like this:
import sqlite3
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (a integer, b integer, c integer)')
col_names = ['a', 'b', 'c']
values = [0, 1, 2]
conn.execute('INSERT INTO t (%s, %s, %s) values(?,?,?)'%tuple(col_names), values)
Please note that this is a risky approach, since strings interpolated into an SQL statement should always be checked for injection attacks. You could, however, pass the list of column names through a validation function before insertion.
EDITED:
For lists of varying length, you could try something like:
exec_text = 'INSERT INTO t (' + ','.join(col_names) +') values(' + ','.join(['?'] * len(values)) + ')'
conn.execute(exec_text, values)
# as long as len(col_names) == len(values)
Of course string formatting will work, you just need to be a bit cleverer about it.
col_names = ','.join(col_list)
col_spaces = ','.join(['?'] * len(col_list))
sql = 'INSERT INTO t (%s) values(%s)' % (col_names, col_spaces)
conn.execute(sql, values)
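Since column names cannot be bound as `?` parameters, one hedged way to reduce the injection risk mentioned earlier is to quote each identifier before formatting it into the statement. In the sketch below, `quote_identifier` and `insert_row` are hypothetical helpers, not part of sqlite3; doubling embedded double quotes is the standard SQL identifier-quoting rule, which SQLite follows.

```python
import sqlite3

def quote_identifier(name):
    """Quote a table or column name by doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def insert_row(conn, table, col_names, values):
    """Insert one row into the given columns; names are quoted, values are bound."""
    if len(col_names) != len(values):
        raise ValueError("col_names and values must have the same length")
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (
        quote_identifier(table),
        ", ".join(quote_identifier(c) for c in col_names),
        ", ".join(["?"] * len(values)),
    )
    conn.execute(sql, values)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a integer, b integer, c integer)")

# Only columns a and c are supplied; b is left as NULL.
insert_row(conn, "t", ["a", "c"], [0, 2])

rows = conn.execute("SELECT a, b, c FROM t").fetchall()
print(rows)   # [(0, None, 2)]
```

Quoting keeps a malicious name from breaking out of the column list, but checking names against a known list of columns is still a sensible extra step.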
I was looking for a solution to create columns based on a list of unknown / variable length and found this question. However, I managed to find a nicer solution (for me anyway), that's also a bit more modern, so thought I'd include it in case it helps someone:
import sqlite3
def create_sql_db(my_list):
    file = 'my_sql.db'
    table_name = 'table_1'
    init_col = 'id'
    col_type = 'TEXT'
    conn = sqlite3.connect(file)
    c = conn.cursor()
    # CREATE TABLE (IF IT DOESN'T ALREADY EXIST)
    c.execute('CREATE TABLE IF NOT EXISTS {tn} ({nf} {ft})'.format(
        tn=table_name, nf=init_col, ft=col_type))
    # CREATE A COLUMN FOR EACH ITEM IN THE LIST
    for new_column in my_list:
        c.execute('ALTER TABLE {tn} ADD COLUMN "{cn}" {ct}'.format(
            tn=table_name, cn=new_column, ct=col_type))
    conn.close()
my_list = ["Col1", "Col2", "Col3"]
create_sql_db(my_list)
All my data is of the type text, so I just have a single variable "col_type" - but you could for example feed in a list of tuples (or a tuple of tuples, if that's what you're into):
my_other_list = [("ColA", "TEXT"), ("ColB", "INTEGER"), ("ColC", "BLOB")]
and change the CREATE A COLUMN step to:
for tupl in my_other_list:
    new_column = tupl[0] # "ColA", "ColB", "ColC"
    col_type = tupl[1] # "TEXT", "INTEGER", "BLOB"
    c.execute('ALTER TABLE {tn} ADD COLUMN "{cn}" {ct}'.format(
        tn=table_name, cn=new_column, ct=col_type))
As a noob, I can't comment on the very succinct, updated solution @ron_g offered. While testing, though, I had to frequently delete the sample database itself, so for any other noobs using this to test, I would advise adding:
    c.execute('DROP TABLE IF EXISTS {tn}'.format(
        tn=table_name))
prior to the 'CREATE TABLE ...' portion.
It appears there are multiple instances of
.format(
tn=table_name ....)
in both 'CREATE TABLE ...' and 'ALTER TABLE ...', so I am trying to figure out whether it's possible to create a single instance (similar to, or included in, the def section).