I'm doing web scraping and want to save my dataframe to my SQL Server database, which is updated every month. What should I add to the code so that the data gets replaced rather than appended? I'm using pyodbc (I can't use SQLAlchemy). Thank you in advance.
col_names = ["month", "price", "change"]
df = pd.read_csv("minyak_brent.csv", sep=',', quotechar='\'', encoding='utf8', names=col_names, skiprows=1)  # replace with your CSV file name
for index, row in df.iterrows():
    cursor.execute('INSERT INTO dbo.minyak_brent([month],[price],[change]) values (?,?)',
                   row['month'],
                   row['price'],
                   row['change'])
cnxn.commit()
cursor.close()
cnxn.close()
You should use df.to_sql here, which avoids the need to explicitly iterate over your Pandas data frame:
df.to_sql(name='minyak_brent', con=engine, schema='dbo', if_exists='append',
          dtype={'month': String(255),
                 'price': FLOAT,
                 'change': FLOAT},
          index=False)
This assumes that the types of the month, price, and change columns are String(255), FLOAT, and FLOAT, respectively.
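For reference, that snippet assumes a SQLAlchemy engine and type imports along these lines; this is only a hedged sketch for readers who can use SQLAlchemy (the asker cannot), and the connection string is a placeholder, not from the original post:

from sqlalchemy import create_engine
from sqlalchemy.types import String, FLOAT

# Placeholder DSN; substitute your own SQL Server connection details.
engine = create_engine("mssql+pyodbc://user:password@my_dsn")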
If you must stick with your current approach, then fix your insert statement such that it has the right number of placeholders in the VALUES clause:
cursor.execute('''INSERT INTO dbo.minyak_brent([month], [price], [change])
VALUES (?, ?, ?)''', (row['month'], row['price'], row['change']))
Note that we pass a tuple as the second argument to cursor.execute().
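Since the question asks for each monthly run to replace the existing data rather than append to it, and SQLAlchemy is not an option, one pyodbc-only approach is to clear the target table before re-inserting. A minimal sketch (table and column names taken from the question; the connection string is a placeholder):

import pandas as pd
import pyodbc

# Placeholder connection string; substitute your own server, database, and credentials.
cnxn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;")
cursor = cnxn.cursor()

df = pd.read_csv("minyak_brent.csv", names=["month", "price", "change"], skiprows=1)

# Empty the table first so each monthly run replaces last month's rows.
cursor.execute("DELETE FROM dbo.minyak_brent")

# Re-insert the fresh rows; executemany avoids an explicit Python-level loop.
cursor.executemany(
    "INSERT INTO dbo.minyak_brent ([month], [price], [change]) VALUES (?, ?, ?)",
    list(df[["month", "price", "change"]].itertuples(index=False, name=None)),
)
cnxn.commit()
cursor.close()
cnxn.close()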
I am new to Python and I don't really understand the SQL side of things that well. I'm currently on the 6th week of Team Treehouse, so please bear with me if these are noob questions.
Goal
1. Import a CSV with stock_tickers and 5 other columns of data.
2. Convert the CSV into a pandas dataframe.
3. Import the dataframe into the database. If the unique stock_ticker already exists, don't add a new row, but check whether the data in the other 5 columns is different. If it is, then update it.
Right now I can do steps #1 and #2 and half of #3. With help from here I was able to get the looping to work: if there is a new stock_ticker row in the CSV, it will be added to the database. But if the data changes for an existing stock_ticker, it won't do any updates.
for i in range(len(df)):
    try:
        df[df.index == i].to_sql(name='stocks', con=conn, if_exists='append', index=False)
        conn.commit()
    except sqlite3.IntegrityError:
        pass
The current code looks like this:
import pandas as pd
from pandas import DataFrame
from pandas import ExcelWriter
import csv
import sqlite3
### IMPORT CSV ###
stock_csv_file = pd.read_csv (r'C:\Users\home\Desktop\code_projects\FIRE_Dashboard\db\alpha_vantage_active_stocks.csv')
### CHANGING INDEX NAMES FROM CSV TO TABLE NAMES ###
df = pd.DataFrame(stock_csv_file)
df = df.rename(columns = {"symbol":"stock_ticker", "name":"stock_name", "exchange":"stock_exchange", "ipoDate":"stock_ipoDate", "delistingDate":"stock_delistingDate", "status":"stock_status"})
### UPDATING DATABASE WITH SQLITE3 ###
conn = sqlite3.connect('stockmarket.db')
c = conn.cursor()
insert_statement = """
INSERT INTO stocks (stock_ticker,
stock_name,
stock_exchange,
stock_ipoDate,
stock_delistingDate,
stock_status
)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (stock_ticker) DO UPDATE
SET (stock_status)"""
for i in range(len(df)):
    values = tuple(df.iloc[i])
    c.execute(insert_statement, values)
The error I am getting
Traceback (most recent call last):
File "update_stock_tickers.py", line 71, in <module>
c.execute(insert_statement, values)
sqlite3.OperationalError: incomplete input
Found these posts that talk about it, but still getting lost >.<
How to use variables in SQL statement in Python?
python Datetime and SQLite
Loop through individual rows and update those rows SQLite3 Python
Any help is much appreciated.
Code after solution
import pandas as pd
from pandas import DataFrame
from pandas import ExcelWriter
import csv
import sqlite3
### IMPORT CSV ###
stock_csv_file = pd.read_csv (r'C:\Users\home\Desktop\code_projects\FIRE_Dashboard\db\alpha_vantage_active_stocks.csv')
### CHANGING INDEX NAMES FROM CSV TO TABLE NAMES ###
df = pd.DataFrame(stock_csv_file)
df = df.rename(columns = {"symbol":"stock_ticker", "name":"stock_name", "exchange":"stock_exchange", "ipoDate":"stock_ipoDate", "delistingDate":"stock_delistingDate", "status":"stock_status"})
### UPDATING DATABASE WITH SQLITE3 ###
conn = sqlite3.connect('stockmarket.db')
c = conn.cursor()
insert_statement = """
INSERT INTO stocks (stock_ticker,
stock_name,
stock_exchange,
stock_ipoDate,
stock_delistingDate,
stock_status
)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (stock_ticker) DO UPDATE
SET stock_status = EXCLUDED.stock_status"""
for i in range(len(df)):
    values = tuple(df.iloc[i])
    c.execute(insert_statement, values)
conn.commit()
This is the ON CONFLICT clause of your query:
ON CONFLICT (stock_ticker) DO UPDATE
SET (stock_status)
This is not valid SQLite syntax. If you want to update stock_status when another row already exists with the same stock_ticker, you can use the EXCLUDED pseudo-table like so:
INSERT INTO stocks (stock_ticker,
stock_name,
stock_exchange,
stock_ipoDate,
stock_delistingDate,
stock_status
)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (stock_ticker) DO UPDATE
SET stock_status = EXCLUDED.stock_status
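For readers new to SQLite upserts, here is a small self-contained sketch of the pattern (table and values invented for the example; ON CONFLICT ... DO UPDATE requires SQLite 3.24+):

import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE stocks (stock_ticker TEXT PRIMARY KEY, stock_status TEXT)")

upsert = """
    INSERT INTO stocks (stock_ticker, stock_status)
    VALUES (?, ?)
    ON CONFLICT (stock_ticker) DO UPDATE
    SET stock_status = EXCLUDED.stock_status"""

c.execute(upsert, ("AAPL", "Active"))
c.execute(upsert, ("AAPL", "Delisted"))  # same ticker: updates the row instead of failing
conn.commit()

print(c.execute("SELECT * FROM stocks").fetchall())  # [('AAPL', 'Delisted')]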
I am getting the following error when trying to insert via python:
TypeError: not all arguments converted during string formatting
This is the first row I want to insert, as an example:
(Timestamp('2019-01-31 00:00:00'),
Timestamp('2018-10-03 00:00:00'),
'APP-552498',
'Company Name Lawyer',
'Funded',
36500,
1095.0,
1.35,
49275.0,
15509.0,
251.0,
'Daily',
1825.0,
196.31,
78,
0.0,
'Law Offices',
NaT,
'',
'CO',
8.4,
'Company Name',
0.7647,
38003.68,
7154.34,
'West',
33766.0,
'N')
With the aforementioned commands to insert:
df_svc_vals = [tuple(x) for x in df.values]
c.execute("""INSERT INTO schema.table VALUES (%s)""", df_svc_vals[0])
c.executemany("""INSERT INTO schema.table VALUES (%s)""", df_svc_vals)
Furthermore, when I actually copy the data into a separate CSV and load into the DB directly, the data inserts correctly.
I put the first two columns as type date, every column with a number as real, and the strings as character varying. Also, the column with NaT is a date column, it's just a null value (that's the appearance w/in pandas).
How can I circumvent this issue?
You need one %s placeholder for each value, so you can build the statement like this (it will produce "INSERT INTO schema.table VALUES (%s,%s,%s,....,%s)"):
"INSERT INTO schema.table VALUES (" + (",".join(["%s"] * len(df_svc_vals[0]))) + ")"
So overall, you can do something like this:
c.executemany("INSERT INTO schema.table VALUES (" + (",".join(["%s"] * len(df_svc_vals[0]))) + ")", df_svc_vals)
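To make the idea concrete, here is a hedged sketch of how the placeholder string is assembled before being handed to executemany (variable names reuse those from the question; the column count is taken from the first row):

# Build "INSERT INTO schema.table VALUES (%s, %s, ..., %s)" with exactly
# one placeholder per value in each row.
placeholders = ",".join(["%s"] * len(df_svc_vals[0]))
insert_sql = "INSERT INTO schema.table VALUES ({})".format(placeholders)
print(insert_sql)  # sanity check: one %s per value in the row

c.executemany(insert_sql, df_svc_vals)
conn.commit()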
I am currently writing a program that takes data from an Excel spreadsheet and inserts it into a SQL Server table that I create within the program.
I had previously set the datetime column to nvarchar(250) just to get the overall program working, but when I change it to datetime, the data is inserted into the wrong columns. The rest of the code worked with the nvarchar data type as well.
import pyodbc
connection_string = r'connection_string'
data = 'file_path'
conn = pyodbc.connect(connection_string)
cur = conn.cursor()
createtable = """
create table table1(
ID Int NULL,
Date datetime(250) NULL,
City nvarchar(250) NULL,
Country nvarchar(250) NULL,
Image nvarchar(250) NULL,
Length nvarchar(250) NULL,
Date_Of_capture nvarchar(250) NULL,
Comments nvarchar(1000) NULL
)"""
truncatetable = """truncate table table1"""
with open(data) as file:
    file.readline()
    lines = file.readlines()
    if cur.tables(table="table1").fetchone():
        cur.execute(truncatetable)
        for line in lines:
            cols = line.split(',')
            cols = line.replace("'", "")
            sql = "INSERT INTO table1 VALUES({}, '{}', '{}', '{}', '{}', '{}','{}','{}')".format(cols[0], cols[1], cols[2], cols[3], cols[4], cols[5], cols[6], cols[7])
            cur.execute(sql)
    else:
        cur.execute(createtable)
        for line in lines:
            cols = line.split(',')
            sql = "INSERT INTO table1 VALUES({}, '{}', '{}', '{}', '{}', '{}','{}','{}')".format(cols[0], cols[1], cols[2], cols[3], cols[4], cols[5], cols[6], cols[7])
            cur.execute(sql)
conn.commit()
conn.close()
I would expect the date column to show as a datetime data type while remaining in one column; instead, the table ends up with all the columns wrong and each part of the date in a different column.
Any help is greatly appreciated. Thank you.
Consider the following best practices:
Always specify columns in INSERT INTO, even with SELECT clauses; specifically, use INSERT INTO myTable (Col1, Col2, Col3, ...), which helps readability and maintainability.
Use parameterization with a prepared statement to avoid quote escaping and type casting issues, among other things. Additionally, Python lets you pass a tuple (or list) of parameters to cursor.execute() without formatting each individual value.
Use Python's csv library to traverse CSV files as lists or dictionaries for proper column alignment, and avoid the memory-intensive .readlines() call.
Combine the CREATE TABLE and TRUNCATE logic in one SQL call to avoid the if conditional with the cursor fetch call.
See adjusted code.
import csv
...
action_query = """
    IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = N'table1')
    BEGIN
        TRUNCATE TABLE table1
    END
    ELSE
    BEGIN
        CREATE TABLE table1(
            ID Int NULL,
            Date datetime NULL,
            City nvarchar(250) NULL,
            Country nvarchar(250) NULL,
            Image nvarchar(250) NULL,
            Length nvarchar(250) NULL,
            Date_Of_capture nvarchar(250) NULL,
            Comments nvarchar(1000) NULL
        )
    END
"""
cur.execute(action_query)
conn.commit()
# PREPARED STATEMENT
append_query = """INSERT INTO table1 (ID, Date, City, Country, Image,
                                      Length, Date_Of_capture, Comments)
                  VALUES (?, ?, ?, ?, ?, ?, ?, ?)
               """
# ITERATE THROUGH CSV AND INSERT ROWS
with open(mydatafile) as f:
    next(f)  # SKIP HEADERS
    reader = csv.reader(f)
    for r in reader:
        # RUN APPEND AND BIND PARAMS
        cur.execute(append_query, r)

conn.commit()
cur.close()
conn.close()
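As a side note, if the CSV is large you can let pyodbc batch the inserts instead of issuing one round trip per row. A hedged sketch using the same prepared statement (fast_executemany requires pyodbc 4.0.19+ and a suitable ODBC driver, and is an assumption on top of the answer above):

import csv

with open(mydatafile) as f:
    next(f)                     # skip the header row
    rows = list(csv.reader(f))  # materialize all rows for executemany

cur.fast_executemany = True     # let the driver send parameters in batches
cur.executemany(append_query, rows)
conn.commit()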
I am trying to load CSV data into Postgres. The table-creation part is fine, but when I try to load the data from the CSV, I get an error. My code and the error are attached below. Is %s wrong?
import psycopg2
import csv
conn = psycopg2.connect(host="127.0.0.1", port="5432", database="postgres", user="postgres", password="*******")
print "Opened database successfully"
cur = conn.cursor()
cur.execute('''create table calls_aapl("Ask" float,"Bid" float,"Change" float,"ContractSymbol" varchar(50),"ImpliedVolatility" float,"LastPrice" float,
"LastTradeDate" date,"OpenInterest" int,"PercentChange" float,"Strike" float,"Volume" int);''')
print "Table created successfully"
reader = csv.reader(open('D:/python/Anaconda/AAPL_Data/Calls.csv', 'r'))
for i, row in enumerate(reader):
    print(i, row)
    if i == 0:
        continue
    cur.execute('''
        INSERT INTO "calls_aapl"(
            "Ask", "Bid", "Change", "ContractSymbol", "ImpliedVolatility", "LastPrice", "LastTradeDate", "OpenInterest", "PercentChange", "Strike", "Volume"
        ) values (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)''', row
    )
conn.commit()
cur.close()
Error:
(0, ['Ask', 'Bid', 'Change', 'ContractSymbol', 'LastPrice', 'LastTradeDate', 'OpenInterest', 'PercentChange', 'PercentImpliedVolatility', 'Strike', 'Volume'])
(1, ['41.7', '39.75', '1.15', 'AAPL180803C00150000', '41.05', '7/31/2018', '52', '2.88', '154.59', '150', '6'])
DataError: invalid input syntax for type double precision: "7/31/2018"
LINE 4: ...1.7','39.75','1.15','AAPL180803C00150000','41.05','7/31/2018...
^
Using %s is ok because PostgreSQL can cast strings to numbers in an INSERT.
Your problem is a different one. Your INSERT statement specifies a column "ImpliedVolatility" (too late for a warning against mixed case identifiers) which is not in the data.
This causes the fifth column (labeled LastPrice) to be inserted into "ImpliedVolatility" and the next column (labeled LastTradeDate) to be inserted into "LastPrice".
The former is wrong but works, because both "LastPrice" and "ImpliedVolatility" are double precision; the latter fails because it tries to insert a date string into a double precision column.
Omit the column "ImpliedVolatility" from the INSERT statement.
I think it's just a typo;
you should make the columns in your insert query line up with the table columns.
"LastTradeDate" is being inserted into "LastPrice", which is not the right column.
Thank you.
This usually occurs when your column headers and values aren't matched up properly. Check that the number of values matches the number of columns and that the data types line up.
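One way to avoid this class of mismatch altogether is to build the column list of the INSERT directly from the CSV header, so the statement always has one column and one placeholder per value. A minimal sketch, assuming the header names match the table's column names exactly (in the question above they do not, so the header or the table would need to be aligned first):

import csv
import psycopg2

conn = psycopg2.connect(host="127.0.0.1", port="5432", database="postgres", user="postgres", password="*******")
cur = conn.cursor()

with open('D:/python/Anaconda/AAPL_Data/Calls.csv', 'r') as f:
    reader = csv.reader(f)
    header = next(reader)                                # e.g. ['Ask', 'Bid', ...]
    cols = ", ".join('"{}"'.format(h) for h in header)   # quote the mixed-case identifiers
    placeholders = ", ".join(["%s"] * len(header))
    insert_sql = 'INSERT INTO "calls_aapl" ({}) VALUES ({})'.format(cols, placeholders)
    for row in reader:
        cur.execute(insert_sql, row)

conn.commit()
cur.close()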
I am trying to take a dataframe and convert it into SQL. I am creating the table first to set up unique indexing, so I can do a rolling update without duplicates if there happen to be two A. Rods over time. However, I can't seem to shake this table column error, and I don't know why.
import pandas as pd
import sqlite3 as sq
conn = sq.connect('test.db')
c = conn.cursor()
def set_table():
    c.execute("""CREATE TABLE IF NOT EXISTS players(
                 "#" INTEGER,
                 " " REAL,
                 "Named" TEXT,
                 "B/T" TEXT,
                 "Ht" TEXT,
                 "Wt" TEXT,
                 "DOB" TEXT);""")
    conn.commit()

def set_index_table():
    c.execute("""CREATE UNIQUE INDEX index_unique
                 ON players (Named, DOB)""")
    conn.commit()
set_table()
set_index_table()
roster_active = pd.read_html('http://m.yankees.mlb.com/roster',index_col=0)
df = roster_active[0]
df = df.rename(columns={'Name': 'Named'})
df.to_sql('players', conn, if_exists='append')
conn.commit()
conn.close()
sqlite3.OperationalError: table players has no column named
Thank you for your time.
I am not completely sure why the original didn't work, but I found a way to get it working. I believe it had something to do with the dataframe index: once I explicitly selected which columns I wanted from the dataframe, it worked:
df = df[['Named','B/T', 'Ht','Wt','DOB']]
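For completeness, a sketch of how the fixed pieces fit together. Selecting only the columns that exist in the players table is what the poster reports fixed the error; passing index=False so pandas does not try to write the dataframe index as an extra column is an extra assumption on my part:

roster_active = pd.read_html('http://m.yankees.mlb.com/roster', index_col=0)
df = roster_active[0]
df = df.rename(columns={'Name': 'Named'})

# Keep only the columns that exist in the players table, then append without the index.
df = df[['Named', 'B/T', 'Ht', 'Wt', 'DOB']]
df.to_sql('players', conn, if_exists='append', index=False)
conn.commit()
conn.close()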