I have a SQL database created by a 3rd-party program and I am importing some data from an Excel table into the SQL DB with Python. Here are the previews of the database and the Excel table;
As you can see, the SQL and Excel column names match exactly. Here is the code I use to import;
import pandas as pd
import sqlite3
#Paths
excel_path="C:/users/user/desktop/ACC_Import.xlsx"
sql_db_path="c:/users/P6_BD_DataBase_001"
#Defs
df=pd.read_excel(excel_path, dtype={"ACCT_SHORT_NAME":object}) # this dtype is important, pandas turns it to int which we don't want to ...
conn=sqlite3.connect(sql_db_path)
cur=conn.cursor()
def insert_excel_to_sql(df):
    for row in df.itertuples():
        data_tuple = (row.ACCT_ID,
                      row.PARENT_ACCT_ID,
                      row.ACCT_SEQ_NUM,
                      row.ACCT_NAME,
                      row.ACCT_SHORT_NAME,
                      row.ACCT_DESCR,
                      row.CREATE_DATE,
                      row.CREATE_USER,
                      row.UPDATE_DATE,
                      row.UPDATE_USER,
                      row.DELETE_SESSION_ID,
                      row.DELETE_DATE)
        sqlite_insert_with_param = '''
            INSERT INTO ACCOUNT (ACCT_ID, PARENT_ACCT_ID, ACCT_SEQ_NUM, ACCT_NAME,
                                 ACCT_SHORT_NAME, ACCT_DESCR, CREATE_DATE, CREATE_USER,
                                 UPDATE_DATE, UPDATE_USER, DELETE_SESSION_ID, DELETE_DATE)
            VALUES (?,?,?,?,?,?,?,?,?,?,?,?);
            '''
        cur.execute(sqlite_insert_with_param, data_tuple)
    conn.commit()
I still type all the column names one by one, even though I am sure they are exactly the same.
Is there any other way to import the Excel table into SQL (the SQL and Excel column names are exactly the same) WITHOUT typing all the column names one by one?
From the sqlite INSERT doc:
INSERT INTO table VALUES(...);
The first form (with the "VALUES" keyword) creates one or more new rows in an existing table. If the column-name list after table-name is omitted then the number of values inserted into each row must be the same as the number of columns in the table. In this case the result of evaluating the left-most expression from each term of the VALUES list is inserted into the left-most column of each new row, and so forth for each subsequent expression.
Therefore it is not necessary to type all the column names in the INSERT statement.
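For example, a minimal sketch of that shortened form, assuming the Excel columns are in the same order as the ACCOUNT table columns (required when the column list is omitted) and the frame is read with dtype=object so sqlite3 receives plain Python values (date columns may still need converting to strings, depending on your data):

import pandas as pd
import sqlite3

df = pd.read_excel("C:/users/user/desktop/ACC_Import.xlsx", dtype=object)
conn = sqlite3.connect("c:/users/P6_BD_DataBase_001")
cur = conn.cursor()

placeholders = ",".join("?" * len(df.columns))              # one "?" per Excel/SQL column
cur.executemany(f"INSERT INTO ACCOUNT VALUES ({placeholders});",
                df.itertuples(index=False, name=None))      # rows as plain tuples, no index
conn.commit()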
First, @DinoCoderSaurus is right: it is not necessary to type the column names in the INSERT statement. For the df column names, I used this approach
df=pd.read_excel("Acc_Import.xlsx",dtype=object)
Reading the Excel table with dtype as object:
for index in range(len(df)):
    row_tuple = tuple(df.iloc[index].to_list())
    cur.execute('''INSERT INTO ACCOUNT VALUES (?,?,?,?,?,?,?,?,?,?,?,?);''', row_tuple)
With reading "df dtype=object" returns row_tuple items' type "int,str,float" so, I was able to import excel datas to SQL database. image as below;
If I don't read df with dtype=object;
row_tuple=tuple(df.iloc[index].to_list())
row_tuple items' dtypes return "numpy.int64, numpy.float64" which causes "data mismatch error" when importing excel table to sql db. Image as below ;
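As an alternative sketch, if you prefer to keep pandas' inferred dtypes, you can convert the numpy scalars to native Python values per row before binding (same table and column set as above; date columns may still need an explicit str() conversion):

import numpy as np

for index in range(len(df)):
    row_tuple = tuple(value.item() if isinstance(value, np.generic) else value   # numpy int64/float64 -> Python int/float
                      for value in df.iloc[index])
    cur.execute("INSERT INTO ACCOUNT VALUES (?,?,?,?,?,?,?,?,?,?,?,?);", row_tuple)
conn.commit()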
I need to create columns dynamically in table1 based on the values retrieved from the form. I am using SQLAlchemy for this, with the code below:
engine.execute('ALTER TABLE %s ADD COLUMN %s %s;' % ('table1', col_name, "VARCHAR(100)"))
In the above statement:
table1: name of the table we are inserting the column dynamically.
col_name: string containing the column name that is to be inserted.
VARCHAR(100): column type
The above code runs without any error and the new column is added. However, every column created this way ends up with the datatype VARCHAR(60) in the table, and I need to increase the column length. Also, I'm not using flask-sqlalchemy.
Any ideas what might be causing the problem?
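One hedged way to check what the database actually recorded for the new column, assuming a backend that exposes information_schema (e.g. PostgreSQL or MySQL) and the same engine and col_name as above:

from sqlalchemy import text

with engine.connect() as conn:
    length = conn.execute(
        text("SELECT character_maximum_length FROM information_schema.columns "
             "WHERE table_name = :t AND column_name = :c"),
        {"t": "table1", "c": col_name},
    ).scalar()
    print(length)   # should print 100 if the ALTER ran as written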
I have a pandas.DataFrame with columns having different data types like object, int64, etc.
I have a PostgreSQL table created with appropriate data types. I want to insert all the dataframe data into the PostgreSQL table. How should I manage to do this?
Note: The data in pandas is coming from another source, so the data types are not specified manually by me.
The easiest way is to use sqlalchemy:
from sqlalchemy import create_engine
engine = create_engine('postgresql://abc:def@localhost:5432/database')
df.to_sql('table_name', engine, if_exists='replace')
If the table exists, you can choose what you want to do with the if_exists option:
if_exists {‘fail’, ‘replace’, ‘append’}, default ‘fail’
If the table does not exist, it will create a new table with the corresponding datatypes.
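If you want to control the column types yourself rather than rely on that inference, to_sql also accepts a dtype mapping of column name to SQLAlchemy type. A short sketch (the column names below are just placeholders):

from sqlalchemy.types import BigInteger, Numeric, Text

df.to_sql(
    'table_name',
    engine,
    if_exists='replace',     # recreate the table using the types below
    index=False,
    dtype={'id': BigInteger(), 'amount': Numeric(12, 2), 'comment': Text()},
)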
Maybe you have the problem I had: you want to create new columns on the existing table, and then the solution of replacing or appending the table does not work. In short, for me it looks like this (I guess there is no general solution for converting the datatypes, so you should adapt it to your needs):
lg.debug('table gets extended with the columns: ' + ",".join(dataframe.columns))
# check whether we have to add a field
df_postgres = {'object': 'text', 'int64': 'bigint', 'float64': 'numeric', 'bool': 'boolean',
               'datetime64[ns]': 'timestamp', 'timedelta64[ns]': 'interval'}  # str(dtype) includes the [ns] unit for datetime/timedelta
for col in dataframe.columns:
    # convert the column's dtype to a postgres type:
    if str(dataframe.dtypes[col]) in df_postgres:
        dbo.table_column_if_not_exists(self.table_name, col, df_postgres[str(dataframe.dtypes[col])], original_endpoint)
    else:
        lg.error('Fieldtype ' + str(dataframe.dtypes[col]) + ' is not configured')
and the function to create the columns:
def table_column_if_not_exists(self, table, name, dtype, original_endpoint=''):
    self.query(query='ALTER TABLE ' + table + ' ADD COLUMN IF NOT EXISTS ' + name + ' ' + dtype)
    # make a comment when we know which source created this column
    if original_endpoint != '':
        self.query(query='comment on column ' + table + '.' + name + " IS '" + original_endpoint + "'")
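The same idea as a self-contained sketch, assuming a plain SQLAlchemy engine and a PostgreSQL target (ADD COLUMN IF NOT EXISTS is PostgreSQL syntax; the table name and connection URL are placeholders):

from sqlalchemy import create_engine, text

pg_types = {'object': 'text', 'int64': 'bigint', 'float64': 'numeric',
            'bool': 'boolean', 'datetime64[ns]': 'timestamp', 'timedelta64[ns]': 'interval'}

engine = create_engine('postgresql://user:password@localhost:5432/database')
with engine.begin() as conn:                      # commits on exit
    for col, dtype in dataframe.dtypes.items():
        pg_type = pg_types.get(str(dtype))
        if pg_type is None:
            raise ValueError(f'Fieldtype {dtype} is not configured')
        conn.execute(text(f'ALTER TABLE my_table ADD COLUMN IF NOT EXISTS "{col}" {pg_type}'))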
I have an MS Access DB I've connected to with (ignore the ... in the driver string, it's working):
driver = 'DRIVER={...'
con = pyodbc.connect(driver)
cursor = con.cursor()
I have a pandas dataframe which is exactly the same as a table in the DB except for an additional column. Basically, I pulled the table with pyodbc, merged it with external Excel data to add this additional column, and now I want to push the data back to the MS Access table with the new column. The column of the merged dataframe containing the new information is merged_df['Item'].
Trying things like the snippet below does not work; I've had a variety of errors.
cursor.execute("insert into ToolingData(Item) values (?)", merged_df['Item'])
con.commit()
How can I push the new column to the original table? Can I just write over the entire table instead, and would that be easier, since merged_df is literally the same thing with the addition of one new column?
If the target MS Access table does not already contain a field to house the data held within the additional column, you'll first need to execute an alter table statement to add the new field.
For example, the following will add a 255-character text field called item to the table ToolingData:
alter table ToolingData add column item text(255)
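A rough sketch of the whole round trip with pyodbc, assuming the Access table has a key column to match rows on (here called ID, which is an assumption) and that merged_df still carries that key:

# add the new field (skip this if it already exists)
cursor.execute("ALTER TABLE ToolingData ADD COLUMN Item TEXT(255)")
con.commit()

# fill it row by row, matching on the assumed ID key
params = list(merged_df[['Item', 'ID']].itertuples(index=False, name=None))
cursor.executemany("UPDATE ToolingData SET Item = ? WHERE ID = ?", params)
con.commit()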
I created a table by importing data from a CSV file into a SQL Server table. The table contains about 6000 rows that are all float. I am trying to insert a new row using INSERT (I am using Python/Spyder and SQL Server Management Studio) and it does insert the row, but not at the bottom of the table, rather towards the middle. I have no idea why it does that. This is the code I am using:
def create(conn):
    print("Create")
    cursor = conn.cursor()
    cursor.execute("insert into PricesTest "
                   "(Price1,Price2,Price3,Price4,Price5,Price6,Price7,Price8,Price9,Price10,Price11,Price12) "
                   "values (?,?,?,?,?,?,?,?,?,?,?,?);",
                   (46,44,44,44,44,44,44,44,44,44,44,44))
    conn.commit()
    read(conn)
Any idea why this is happening? What should I add to my code to "force" that row to be added at the bottom of the table? Many thanks.
I managed to sort it out following the different suggestions posted here. Basically, I was conceptually wrong to think that tables in MS SQL have an order. I am now working with the data in my table using ORDER BY on dates (I added dates as my first column) and it works well. Many thanks all for your help!!
The fact is that new rows are inserted without any particular order by default, because the server has no rule for ordering newly inserted rows (there is no primary key defined). You should have created an identity column before importing your data (you can even do it now):
Id Int IDENTITY(1,1) primary key
This will ensure all rows will be added at the end of the table.
More info on the data types you could use is on W3Schools: https://www.w3schools.com/sql/sql_datatypes.asp
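For instance, a sketch in the same Python setup, assuming the connection conn used in the question and that the column name Id is free in PricesTest:

cursor = conn.cursor()
# add an auto-incrementing primary key so there is something stable to order by;
# rows inserted from now on get ever larger Id values and therefore sort to the end
cursor.execute("ALTER TABLE PricesTest ADD Id INT IDENTITY(1,1) PRIMARY KEY")
conn.commit()

# read the rows back in a deterministic order
cursor.execute("SELECT * FROM PricesTest ORDER BY Id")
rows = cursor.fetchall()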
I have created a lookup table (in Excel) which has, for the various tables, the table name and the column names under each table, along with all the SQL queries to be run on these fields. Below is an example table.
Results from all SQL queries are in the format Total_Count and Fail_Count. I want to output these results, along with all the information in the current version of the lookup table and the date of execution, into a separate table.
Sample result Table:
Below is the code I used to get the results together in the same lookup table, but I am having trouble storing the same results in a separate result_set table with separate columns for the total and fail counts.
df['Results'] = ''
from pandas import DataFrame
for index, row in df.iterrows():
    cur.execute(row["SQL_Query"])
    df.loc[index, 'Results'] = (cur.fetchall())
It might be easier to load the queries into a DataFrame directly using the read_sql method: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html.
The one caveat is that you need to use a sqlAlchemy engine for the connection. I also find itertuples easier to work with.
Once you have those, your code is merely:
df['Results'] = ''
for row in df.itertuples():
    df_result = pd.read_sql(row.SQL_Query, engine)              # pass the SQLAlchemy engine as the connection
    df.loc[row.Index, 'Total_count'] = df_result['total_count'].iloc[0]
    df.loc[row.Index, 'Fail_count'] = df_result['fail_count'].iloc[0]
Your main problem above is that you're passing two columns from the result query to one column in df. You need to pass each column separately.
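To get those results into a separate result_set table with an execution date, one possible sketch (column names such as Table_Name and Column_Name are taken from the lookup-table description and are assumptions about your exact headers, as are the lowercase total_count/fail_count result columns):

import datetime
import pandas as pd

results = []
for row in df.itertuples():
    res = pd.read_sql(row.SQL_Query, engine)                     # one row with total_count and fail_count
    results.append({
        'Table_Name': row.Table_Name,
        'Column_Name': row.Column_Name,
        'Total_Count': int(res['total_count'].iloc[0]),
        'Fail_Count': int(res['fail_count'].iloc[0]),
        'Run_Date': datetime.date.today(),
    })

result_set = pd.DataFrame(results)
result_set.to_sql('result_set', engine, if_exists='append', index=False)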