Uploading an Excel file to a table in PostgreSQL - Python

I have the following code that successfully uploads an Excel file to PostgreSQL:
import os
import pandas as pd
from sqlalchemy import create_engine

# Build the path to the Excel file relative to this script
dir_path = os.path.dirname(os.path.realpath(__file__))
df = pd.read_excel(dir_path + '/' + file_name, "Sheet1")

# Note: the separator between the password and the host must be '@'
engine = create_engine('postgresql://postgres:!Password@localhost/Database')
df.to_sql('identifier', con=engine, if_exists='replace', index=False)
However, this leads to problems when trying to run simple queries, such as updates, in pgAdmin 4.
Are there any other ways to insert an Excel file into a PostgreSQL table using Python?

There is a faster way. Take a look at the sketch below.
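The answer above does not spell out which approach it means, but a commonly used faster path is to bulk-load the data with PostgreSQL's COPY command instead of letting to_sql issue INSERT statements. A minimal sketch, assuming the same placeholder connection string and file_name as the question:

import io
import os
import pandas as pd
from sqlalchemy import create_engine

dir_path = os.path.dirname(os.path.realpath(__file__))
df = pd.read_excel(os.path.join(dir_path, file_name), "Sheet1")

engine = create_engine('postgresql://postgres:!Password@localhost/Database')

# Create an empty table with the DataFrame's columns, then bulk-load it with COPY
df.head(0).to_sql('identifier', con=engine, if_exists='replace', index=False)

buffer = io.StringIO()
df.to_csv(buffer, index=False, header=False)
buffer.seek(0)

raw = engine.raw_connection()
try:
    with raw.cursor() as cur:
        cur.copy_expert('COPY identifier FROM STDIN WITH CSV', buffer)
    raw.commit()
finally:
    raw.close()

Whether this is noticeably faster depends on the size of the file, but beyond a few thousand rows COPY usually beats row inserts.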

Related

Loading data with to_sql comes up with error "(psycopg2.errors.StringDataRightTruncation) value too long for type character varying(256)" - Redshift/Python

I am trying to load a CSV file from S3 to a Redshift table using Python. I have used boto3 to pull the data from S3, used pandas to convert the data types (timestamp, string and integer), and tried to upload the DataFrame to the table using to_sql (SQLAlchemy). It ended up with the error
cursor.executemany(statement, parameters) psycopg2.errors.StringDataRightTruncation: value too long for type character varying(256)
Additional info: the string column contains a large amount of mixed data. I am also able to write the output as CSV on my local machine.
My code is as follows:
import io
import boto3
import pandas as pd
from sqlalchemy import create_engine
from datetime import datetime
client = boto3.client('s3', aws_access_key_id="",
                      aws_secret_access_key="")
response = client.get_object(Bucket='', Key='*.csv')
file = response['Body'].read()
df = pd.read_csv(io.BytesIO(file))
df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)
df['text'] = df['text'].astype(str)
df['count'] = df['count'].fillna(0).astype(int)
con = create_engine('postgresql://*.redshift.amazonaws.com:5439/dev')
select_list = ['date','text','count']
write = df[select_list]
df = pd.DataFrame(write)
df.to_sql('test', con, schema='parent', index=False, if_exists='replace')
I am a beginner; please help me understand what I am doing wrong. Ignore any typos. Thanks.
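No answer is recorded here, but the error usually means that to_sql created the text column as the default VARCHAR(256) on Redshift and some rows are longer than that. A hedged sketch of one common workaround is to pass an explicit column type through to_sql's dtype parameter; the 65535 length below is an assumption (the Redshift VARCHAR maximum), and a smaller value may suit your data better:

import sqlalchemy

# Create the text column wide enough for the longest values
df.to_sql(
    'test',
    con,
    schema='parent',
    index=False,
    if_exists='replace',
    dtype={'text': sqlalchemy.types.VARCHAR(65535)},
)

Alternatively, truncate the offending column with df['text'].str.slice(0, 256) before loading, if losing the tail of long values is acceptable.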

How to put multiple CSV files into one SQLite or MySQL database in Python?

For a single CSV file, I can import it into SQLite as follows:
import sqlite3
import pandas as pd

conn = sqlite3.connect("data.sqlite")
df = pd.read_csv('data.csv')
df.to_sql('data', conn, if_exists='append', index=False)
conn.close()
What if I have multiple CSV files? How can I ingest all of them into one SQLite or MySQL database?
Assuming each file should be written to its own table, and that the table names should be the file names without the extension:
import sqlite3
import pandas as pd

file_names = [...]
conn = sqlite3.connect("data.sqlite")
for file_name in file_names:
    table_name = file_name.split('.')[0]  # table name = file name without extension
    df = pd.read_csv(file_name)           # read the current file, not a hard-coded one
    df.to_sql(table_name, conn, if_exists='append', index=False)
conn.close()
This will create or open data.sqlite and, for each file in file_names, read it into a pandas DataFrame and write it to a new table (or append to an existing one) in the same SQLite DB.
This method will not work with MySQL as written; you will need a SQLAlchemy connection to write to MySQL.
More on to_sql: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html
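For MySQL, a minimal sketch of the SQLAlchemy variant; the pymysql driver, host and credentials below are assumptions, adjust them to your setup:

import pandas as pd
from sqlalchemy import create_engine

# Requires a MySQL driver, e.g. pip install pymysql
engine = create_engine('mysql+pymysql://user:password@localhost/mydatabase')

file_names = [...]
for file_name in file_names:
    table_name = file_name.split('.')[0]
    df = pd.read_csv(file_name)
    df.to_sql(table_name, con=engine, if_exists='append', index=False)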

PowerQuery connection into a Python pandas DataFrame

I am trying to retrieve/download a .csv file from an .odc file via SharePoint (Office 365).
Below is the connection string used by the PowerQuery connection in the .odc file (this is what Excel usually uses in its PowerQuery connection to retrieve the data):
<xml id=msodc>
<odc:OfficeDataConnection
xmlns:odc="urn:schemas-microsoft-com:office:odc"
xmlns="http://www.w3.org/TR/REC-html40">
<odc:PowerQueryConnection odc:Type="OLEDB">
<odc:ConnectionString>Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location="GetViewData?ListViewID=52&csvformat=true";Extended Properties=""
</odc:ConnectionString>
<odc:CommandType>SQL</odc:CommandType>
<odc:CommandText>SELECT * FROM [GetViewData?ListViewID=52&csvformat=true]</odc:CommandText>
</odc:PowerQueryConnection>
</odc:OfficeDataConnection>
</xml>
I want to extract the CSV file that this connection points to into a pandas DataFrame. I tried the code below but was not able to get it to work:
import pyodbc
import pandas as pd
conn = pyodbc.connect('Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location="GetViewData?ListViewID=52&csvformat=true";Extended Properties=""')
df = pd.read_sql("SELECT * FROM [GetViewData?ListViewID=52&csvformat=true]", conn)
conn.close()
Please help, thanks in advance.
import pandas as pd
df = pd.read_excel(r'Path where the Excel file is stored\File name.xlsx', sheet_name='your Excel sheet name')
print(df)
The above code might be useful for your requirement.

Exporting a CSV file to a specific path and creating automated code that runs daily

I am a novice at Python and I am trying to create my first automated code in Jupyter notebooks; it will export my data pull from SQL Server to a specific path and needs to run daily.
My questions:
1- It needs to export the CSV file to a specific folder, and I don't know how to do that.
2- I need the code to run by itself on a daily basis.
I am stuck; any help is appreciated.
I have connected to SQL Server and can successfully pull the report and write a CSV file:
import smtplib
import pyodbc
import pandas as pd
import pandas.io.sql

server = 'example server'
db = 'ExternalUser'
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=example server;'
                      'Database=ExternalUser;'
                      'Trusted_Connection=yes;')
cursor = conn.cursor()
cursor.execute("my SQL query")
col_headers = [i[0] for i in cursor.description]
rows = [list(i) for i in cursor.fetchall()]
df = pd.DataFrame(rows, columns=col_headers)
df.to_csv("Test v2.csv", header=True, index=False)
For exporting the CSV to a certain folder: it depends where and how you run the script. If you run the script in the folder where you want the CSV saved, then your current df.to_csv('filename.csv') works fine, or you can add a relative path such as 'Test_dir/filename.csv'. Otherwise you could use a library like shutil (https://docs.python.org/3/library/shutil.html) to move the .csv file into a given folder afterwards; a short sketch follows below.
For running the code on a daily basis, you could do this locally on your machine (https://medium.com/@thabo_65610/three-ways-to-automate-python-via-jupyter-notebook-d14aaa78de9), or you could look into configuring a cron job.
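A minimal sketch of writing the CSV straight to an explicit folder; the directory below is an assumption, substitute your own path:

import os

output_dir = r'C:\Reports\Daily'            # assumed target folder
os.makedirs(output_dir, exist_ok=True)      # create it if it does not exist
output_path = os.path.join(output_dir, 'Test v2.csv')
df.to_csv(output_path, header=True, index=False)

For the daily schedule, a cron entry such as 0 7 * * * python /path/to/export_report.py (on Linux) or an equivalent Windows Task Scheduler task will run the script every morning without Jupyter involved; the script path here is hypothetical.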

Loading CSV files into a database in Linux

I have been scraping CSV files from the web every minute and storing them in a directory.
The files are being named according to the time of retrieval:
name = 'train'+str(datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S"))+'.csv'
I need to upload each file into a database created on some remote server.
How can I do the above?
You can use pandas and sqlalchemy for loading CSV files into databases. I use MSSQL and my code looks like this:
import os
import pandas as pd
import sqlalchemy as sa

server = 'your server'
database = 'your database'
directory = 'your csv directory'

engine = sa.create_engine('mssql+pyodbc://' + server + '/' + database +
                          '?driver=SQL+Server+Native+Client+11.0')

for filename in os.listdir(directory):                    # iterate over files
    df = pd.read_csv(os.path.join(directory, filename), sep=',')
    tableName = os.path.splitext(filename)[0]             # removes .csv extension
    df.to_sql(tableName, con=engine, dtype=None)          # send data to the server
By setting the dtype parameter you can change the data type conversion (e.g. if you want smallint instead of integer, etc.).
To ensure you don't write the same file/table twice, I would suggest keeping a logfile in the directory where you record which CSV files have been written to the DB, and then excluding those in your for-loop; a sketch of that is below.
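A minimal sketch of the logfile idea; the processed.log name is an assumption, and the connection string is the same placeholder as above:

import os
import pandas as pd
import sqlalchemy as sa

directory = 'your csv directory'
log_path = os.path.join(directory, 'processed.log')

# Names of files already written to the DB
if os.path.exists(log_path):
    with open(log_path) as f:
        processed = set(line.strip() for line in f)
else:
    processed = set()

engine = sa.create_engine('mssql+pyodbc://your server/your database'
                          '?driver=SQL+Server+Native+Client+11.0')

for filename in os.listdir(directory):
    if not filename.endswith('.csv') or filename in processed:
        continue
    df = pd.read_csv(os.path.join(directory, filename))
    df.to_sql(os.path.splitext(filename)[0], con=engine)
    # Record the file so the next run skips it
    with open(log_path, 'a') as f:
        f.write(filename + '\n')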
