I have the following Python code:
import pandas as pd
from sqlalchemy import create_engine
import mysql.connector
# Give the location of the file
loc = ("C:\\Users\\27826\\Desktop\\11Sixteen\\Models and Reports\\Historical results files\\EPL 1993-94.csv")
df = pd.read_csv(loc)
# Remove empty columns then rows
df = df.dropna(axis=1, how='all')
df = df.dropna(axis=0, how='all')
# Create DataFrame and then import to db (new game results table)
engine = create_engine("mysql://root:xxx#localhost/11sixteen")
df.to_sql('new_game_results', con=engine, if_exists="replace")
# Move from new games results table to game results table
db = mysql.connector.connect(host="localhost",
user="root",
passwd="xxx",
database="11sixteen")
my_cursor = db.cursor()
my_cursor.execute("INSERT INTO 11sixteen.game_results "
"SELECT * FROM 11sixteen.new_game_results WHERE "
"NOT EXISTS (SELECT date, HomeTeam "
"FROM 11sixteen.game_results WHERE "
"11sixteen.game_results.date = 11sixteen.new_game_results.date AND "
"11sixteen.game_results.HomeTeam = 11sixteen.new_game_results.HomeTeam)")
print("complete")
Basically the objective is that I copy data from several excel files to a SQL table (one at a time) and then transfer it from there to the fuller table where ALL the data will be aggregated (without duplicates hopefully)
Everything works 100% except the SQL query as below:
INSERT INTO 11sixteen.game_results
SELECT * FROM 11sixteen.new_game_results
WHERE NOT EXISTS ( SELECT date, HomeTeam
FROM 11sixteen.game_results WHERE
11sixteen.game_results.date = 11sixteen.new_game_results.date AND
11sixteen.game_results.HomeTeam = 11sixteen.new_game_results.HomeTeam)
If I run the same query on MySQL Workbench it works perfect. Any ideas why I can't get Python to execute the query as expected?
add a commit at the end.
db.commit()
Related
Ok, I have tried several kinds of solutions recommended by others on this site and other sited. However, I can't get it work as I would like it to do.
I get a XML-response which I normalize and then save to a CSV. This first part works fine.
Instead of saving it to CSV I would like to save it into an existing table in an access database. The second part below:
Would like to use an existing table instead of creating a new one
The result is not separated with ";" into different columns. Everything ends up in the same column not separated, see image below
response = requests.get(u,headers=h).json()
dp = pd.json_normalize(response,'Units')
response_list.append(dp)
export = pd.concat(response_list)
export.to_csv(r'C:\Users\username\Documents\Python Scripts\Test\Test2_'+str(now)+'.csv', index=False, sep=';',encoding='utf-8')
access_path = r"C:\Users\username\Documents\Python Scripts\Test\Test_db.accdb"
conn = pyodbc.connect("DRIVER={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={};" \
.format(access_path))
strSQL = "SELECT * INTO projects2 FROM [text;HDR=Yes;FMT=sep(;);" + \
"Database=C:\\Users\\username\\Documents\\Python Scripts\\Test].Testdata.csv;"
cur = conn.cursor()
cur.execute(strSQL)
conn.commit()
conn.close()
If you already have the data in a well-formed pandas DataFrame then you don't really need to dump it to a CSV file; you can use the sqlalchemy-access dialect to push the data directly into an Access table using pandas' to_sql() method:
from pprint import pprint
import urllib
import pandas as pd
import sqlalchemy as sa
connection_string = (
r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
r"DBQ=C:\Users\Public\Database1.accdb;"
r"ExtendedAnsiSQL=1;"
)
connection_uri = f"access+pyodbc:///?odbc_connect={urllib.parse.quote_plus(connection_string)}"
engine = sa.create_engine(connection_uri)
with engine.begin() as conn:
# existing data in table
pprint(
conn.execute(sa.text("SELECT * FROM user_table")).fetchall(), width=30
)
"""
[('gord', 'gord#example.com'),
('jennifer', 'jennifer#example.com')]
"""
# DataFrame to insert
df = pd.DataFrame(
[
("newdev", "newdev#example.com"),
("newerdev", "newerdev#example.com"),
],
columns=["username", "email"],
)
df.to_sql("user_table", engine, index=False, if_exists="append")
with engine.begin() as conn:
# updated table
pprint(
conn.execute(sa.text("SELECT * FROM user_table")).fetchall(), width=30
)
"""
[('gord', 'gord#example.com'),
('jennifer', 'jennifer#example.com'),
('newdev', 'newdev#example.com'),
('newerdev', 'newerdev#example.com')]
"""
(Disclosure: I am currently the maintainer of the sqlalchemy-access dialect.)
Solved with the following code
SE_export_Tuple = list(zip(SE_export.Name,SE_export.URL,SE_export.ImageUrl,......,SE_export.ID))
print(SE_export_Tuple)
access_path = r"C:\Users\username\Documents\Python Scripts\Test\Test_db.accdb"
conn = pyodbc.connect("DRIVER={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={};" \
.format(access_path))
cursor = conn.cursor()
mySql_insert_query="INSERT INTO Temp_table (UnitName,URL,ImageUrl,.......,ID) VALUES (?,?,?,......,?)"
cursor.executemany(mySql_insert_query,SE_export_Tuple)
conn.commit()
conn.close()
However, when I add many fields I get an error at "executemany", saying:
cursor.executemany(mySql_insert_query,SE_export_Tuple)
Error: ('HY004', '[HY004] [Microsoft][ODBC Microsoft Access Driver]Invalid SQL data type (67) (SQLBindParameter)')
I have been trying to insert data from a dataframe in Python to a table already created in SQL Server. The data frame has 90K rows and wanted the best possible way to quickly insert data in the table. I only have read,write and delete permissions for the server and I cannot create any table on the server.
Below is the code which is inserting the data but it is very slow. Please advise.
import pandas as pd
import xlsxwriter
import pyodbc
df = pd.read_excel(r"Url path\abc.xlsx")
conn = pyodbc.connect('Driver={ODBC Driver 11 for SQL Server};'
'SERVER=Server Name;'
'Database=Database Name;'
'UID=User ID;'
'PWD=Password;'
'Trusted_Connection=no;')
cursor= conn.cursor()
#Deleting existing data in SQL Table:-
cursor.execute("DELETE FROM datbase.schema.TableName")
conn.commit()
#Inserting data in SQL Table:-
for index,row in df.iterrows():
cursor.execute("INSERT INTO Table Name([A],[B],[C],) values (?,?,?)", row['A'],row['B'],row['C'])
conn.commit()
cursor.close()
conn.close()
To insert data much faster, try using sqlalchemy and df.to_sql. This requires you to create an engine using sqlalchemy, and to make things faster use the option fast_executemany=True
connect_string = urllib.parse.quote_plus(f'DRIVER={{ODBC Driver 11 for SQL Server}};Server=<Server Name>,<port>;Database=<Database name>')
engine = sqlalchemy.create_engine(f'mssql+pyodbc:///?odbc_connect={connect_string}', fast_executemany=True)
with engine.connect() as connection:
df.to_sql(<table name>, connection, index=False)
Here is the script and hope this works for you.
import pandas as pd
import pyodbc as pc
connection_string = "Driver=SQL Server;Server=localhost;Database={0};Trusted_Connection=Yes;"
cnxn = pc.connect(connection_string.format("DataBaseNameHere"), autocommit=True)
cur=cnxn.cursor()
df= pd.read_csv("your_filepath_and_filename_here.csv").fillna('')
query = 'insert into TableName({0}) values ({1})'
query = query.format(','.join(df.columns), ','.join('?' * len(df1.columns)))
cur.fast_executemany = True
cur.executemany(query, df.values.tolist())
cnxn.close()
This should do what you want...very generic example...
# Insert from dataframe to table in SQL Server
import time
import pandas as pd
import pyodbc
# create timer
start_time = time.time()
from sqlalchemy import create_engine
df = pd.read_csv("C:\\your_path\\CSV1.csv")
conn_str = (
r'DRIVER={SQL Server Native Client 11.0};'
r'SERVER=Excel-PC\SQLEXPRESS;'
r'DATABASE=NORTHWND;'
r'Trusted_Connection=yes;'
)
cnxn = pyodbc.connect(conn_str)
cursor = cnxn.cursor()
for index,row in df.iterrows():
cursor.execute('INSERT INTO dbo.Table_1([Name],[Address],[Age],[Work]) values (?,?,?,?)',
row['Name'],
row['Address'],
row['Age'],
row['Work'])
cnxn.commit()
cursor.close()
cnxn.close()
# see total time to do insert
print("%s seconds ---" % (time.time() - start_time))
Try that and post back if you have additional questions/issues/concerns.
Replace df.iterrows() with df.apply() for one thing. Remove the loop for something much more efficient.
Try to populate a temp table with 1 or none indexes then insert it into your good table all at once.
Might speed things up due to not having to update the indexes after each insert??
I created a table inserting data fetched from an api and store in to a pandas dataframe using sqlalchemy.
I am gonna need to query the api, every 4 hours, to get new data.
Problem being that the api, will give me back not only the new data but as well the old ones, already imported in mysql
how can i import just the new data into the mysql table
i retrieved the data from the api, stored the data in to a pandas object, created the connection to the mysql db and created a fresh new table.
import requests
import json
from pandas.io.json import json_normalize
myToken = 'xxx'
myUrl = 'somewebsite'
head = {'Authorization': 'token {}'.format(myToken)}
response = requests.get(myUrl, headers=head)
data=response.json()
#print(data.dumps(data, indent=4, sort_keys=True))
results=json_normalize(data['results'])
results.rename(columns={'datastream.name': 'datastream_name',
'datastream.url':'datastream_url',
'datastream.datastream_type_id':'datastream_id',
'start':'error_date'}, inplace=True)
results_final=pd.DataFrame([results.datastream_name,
results.datastream_url,
results.error_date,
results.datastream_id,
results.message,
results.type_label]).transpose()
from sqlalchemy import create_engine
from sqlalchemy import exc
engine = create_engine('mysql://usr:psw#ip/schema')
con = engine.connect()
results_final.to_sql(name='error',con=con,if_exists='replace')
con.close()
End goal is to insert into the table, just the not existing data coming from the api
You could pull the results already in the database into a new dataframe and then compare the two dataframes. After that you would only insert the rows not in the table. Not knowing the format of your table or data I'm just using a generic SELECT statement here.
from sqlalchemy import create_engine
from sqlalchemy import exc
engine = create_engine('mysql://usr:psw#ip/schema')
con = engine.connect()
sql = "SELECT * FROM table_name"
old_results = pd.read_sql(sql, con)
df = pd.merge(old_results, results_final, how='outer', indicator=True)
new_results = df[df['_merge']=='right_only'][results_final.columns]
new_results.to_sql(name='error',con=con,if_exists='append')
con.close()
You also need to change if_exists to append because set to replace it drops all values in the table and replaces them with the values in the pandas dataframe.
I developed this function to handle both: news values and when columns from the source table and target table are not equal.
def load_data(df):
engine = create_engine('mysql+pymysql://root:pass#localhost/dw', echo_pool=True, pool_size=10, max_overflow=20)
with engine.connect() as conn, conn.begin():
try:
df_old = pd.read_sql('SELECT * FROM table', conn)
# Check if exists new rows to be inserted
if len(df) > len(df_saved) or df.disconnected_time.max() > df_saved.disconnected_time.max():
print("There are new rows to be inserted. ")
df_merged = pd.merge(df_old, df, how='outer', indicator=True)
df_final = df_merged[df_merged['_merge']=='right_only'][df.columns]
df_final.to_sql(name='table',con=conn,index=False, if_exists='append')
except Exception as err:
print (str(err))
else:
# This handling errors when the lengths of the columns are not equal to the target
if df_bulbr.shape[1] > df_old.shape[1]:
data = pd.read_sql('SELECT * FROM table', conn)
df2 = pd.concat([df,data])
df2.to_sql('table', conn, index=False, if_exists='replace')
outcome = conn.execute("select count(1) from table")
countRow = outcome.first()[0]
return print(f" Total of {countRow} rows load." )
I created a python script in Pycharm that works like a charm... But these days I'm sensing that I could have a problem with size of monthly .csv file and also I would like to analyze data using SQL over Python so I could automatize process and make charts, pies from that queries.
So instead of exporting to csv I would like to append to SQL table.
Here is a part of code that exports data to .csv:
for content in driver.find_elements_by_class_name('companychatroom'):
people = content.find_element_by_xpath('.//div[#class="theid"]').text
if people != "Monthy" and people != "Python" :
pass
mybook = open(r'C:\Users\Director\PycharmProjects\MyProjects\Employees\archive' + datetime.now().strftime("%m_%y") + '.csv', 'a')
mybook.write('%20s,%20s\n'%(people, datetime.now().strftime("%d/%m/%y %H:%M")))
mybook.close()
================== EDIT: ==============
I tried over sqlite3 and these is what I could manage to write by now and it works... But how to append data into SQL table it always overwrite previous with INSERT TO and it shouldn't?
import sqlite3
from datetime import datetime
sqlite_file = (r"C:\Users\Director\PycharmProjects\MyProjects\Employees\MyDatabase.db")
conn = sqlite3.connect(sqlite_file)
cursor = conn.cursor()
table_name = 'Archive' + datetime.now().strftime("%m_%y")
sql = 'CREATE TABLE IF NOT EXISTS ' + table_name + '("first_name" varchar NOT NULL, "Currdat"date NOT NULL)'
cursor.execute(sql)
sql = 'INSERT INTO ' + table_name + '(first_name,Currdat) VALUES ("value1",CURRENT_TIMESTAMP);'
cursor.execute(sql)
sql = 'SELECT * FROM Archive06_16 '
for row in cursor.execute(sql):
print(row)
cursor.close()
Thanks in advance,
I have a database that contains multiple tables, and I am trying to import each table as a pandas dataframe. I can do this for a single table as follows:
import pandas as pd
import pandas.io.sql as psql
import pypyodbc
conn = pypyodbc.connect("DRIVER={SQL Server};\
SERVER=serveraddress;\
UID=uid;\
PWD=pwd;\
DATABASE=db")
df1 = psql.read_frame('SELECT * FROM dbo.table1', conn)
The number of tables in the database will change, and at any time I would like to be able to import each table into its own dataframe. How can I get all of these tables into pandas?
Depending on your SQL server, you can inspect the tables in a database.
For example:
tables_df = pd.read_sql('SELECT table_name FROM database_name', conn)
Now your table names are accessible as a pandas data frame, you just need to parse it out:
table_name_list = tables_df.table_name
select_template = 'SELECT * FROM {table_name}'
frames_dict = {}
for tname in table_name_list:
query = select_template.format(table_name = tname)
frames_dict[tname] = pd.read_sql(query, conn)
Your dictionary frames_dict contains all the dataframes with the table_name as the key