I have a Pandas DataFrame with around 200,000 indexes/rows and 30 columns.
I need to export it directly into an .mdb file; converting it to a CSV and importing it manually will not work.
I understand there are tools like pyodbc that help a lot with importing/reading Access data, but there is little documentation on how to export.
I'd love any help anyone can give, and would strongly appreciate any examples.
First, convert the DataFrame into a .csv file using the command below:
name_of_your_dataframe.to_csv("filename.csv", sep='\t', encoding='utf-8')
Then load the .csv into the .mdb using pyodbc.
MS Access can directly query CSV files and run a Make-Table Query (https://support.office.com/en-us/article/Create-a-make-table-query-96424f9e-82fd-411e-aca4-e21ad0a94f1b) to produce a resulting table. However, some cleaning is needed to remove the rubbish rows. The code below opens two files, one for reading and the other for writing. Assuming the rubbish is in the first column of the CSV, the if logic writes only the lines that have some data in the second column (adjust as needed):
import os
import csv
import pyodbc
# TEXT FILE CLEAN
# Raw strings (r'...') keep the backslashes in Windows paths literal
with open(r'C:\Path\To\Raw.csv', 'r') as reader, \
        open(r'C:\Path\To\Clean.csv', 'w') as writer:
    read_csv = csv.reader(reader)
    write_csv = csv.writer(writer, lineterminator='\n')
    for line in read_csv:
        # keep only rows that have some data in the second column
        if len(line) > 1 and len(line[1]) > 0:
            write_csv.writerow(line)

# DATABASE CONNECTION
access_path = r"C:\Path\To\Access\DB.mdb"
con = pyodbc.connect("DRIVER={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={};"
                     .format(access_path))

# RUN QUERY
strSQL = ("SELECT * INTO [TableName] FROM [text;HDR=Yes;FMT=Delimited(,);"
          r"Database=C:\Path\To\Folder].Clean.csv;")
cur = con.cursor()
cur.execute(strSQL)
con.commit()
con.close()  # CLOSE CONNECTION
os.remove(r'C:\Path\To\Clean.csv')  # DELETE CLEAN TEMP
2020 Update
There is now a supported external SQLAlchemy dialect for Microsoft Access ...
https://github.com/gordthompson/sqlalchemy-access
... which enables you to use pandas' to_sql method directly via pyodbc and the Microsoft Access ODBC driver (on Windows).
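As a rough sketch of what that could look like, the following assumes that the sqlalchemy-access package, pyodbc, SQLAlchemy 1.4 or later, and the Microsoft Access ODBC driver are installed on Windows. The helper names `access_odbc_string` and `dataframe_to_access`, the `ExtendedAnsiSQL=1` setting, and the `chunksize` value are illustrative assumptions rather than part of the answer above; check the sqlalchemy-access documentation for the options your version expects.

```python
def access_odbc_string(db_path):
    """Build an ODBC connection string for an Access file (hypothetical helper)."""
    return (
        "DRIVER={{Microsoft Access Driver (*.mdb, *.accdb)}};"
        "DBQ={};"
        "ExtendedAnsiSQL=1;"
    ).format(db_path)


def dataframe_to_access(df, db_path, table_name):
    """Append a pandas DataFrame to a table in an Access database.

    Assumes sqlalchemy-access and pyodbc are installed; the import is kept
    local so the helper above can be used without them.
    """
    import sqlalchemy as sa  # third-party

    url = sa.engine.URL.create(
        "access+pyodbc",
        query={"odbc_connect": access_odbc_string(db_path)},
    )
    engine = sa.create_engine(url)
    # chunksize is an arbitrary choice to avoid sending 200,000 rows at once
    df.to_sql(table_name, engine, index=False, if_exists="append", chunksize=1000)
```

A call would then look like `dataframe_to_access(df, r"C:\Path\To\Access\DB.mdb", "TableName")`, with the path and table name replaced by your own.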
I would recommend exporting the pandas DataFrame to CSV as usual, like this:
dataframe_name.to_csv("df_filename.csv", sep=',', encoding='utf-8')
Then you can convert it to an .mdb file, as this Stack Overflow answer shows.
Related
The goal is to load a CSV file into an Azure SQL database directly from Python, that is, not by calling bcp.exe. The CSV files will have the same number of fields as the destination tables. It'd be nice not to have to create the format file bcp.exe requires (XML for roughly 400 fields for each of 16 separate tables).
Following the Pythonic approach, try to insert the data and let SQL Server throw an exception if there is a type mismatch or any other error.
If you don't want to use the bcp command to import the CSV file, you can use the Python pandas library.
Here's an example in which I import a headerless 'test9.csv' file from my computer into an Azure SQL database.
Python code example:
import pandas as pd
import sqlalchemy
import urllib.parse
import pyodbc
# set up connection to database (with username/pw if needed)
params = urllib.parse.quote_plus("Driver={ODBC Driver 17 for SQL Server};Server=tcp:***.database.windows.net,1433;Database=Mydatabase;Uid=***@***;Pwd=***;Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;")
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
# read csv data to dataframe with pandas
# datatypes will be assumed
# pandas is smart but you can specify datatypes with the `dtype` parameter
df = pd.read_csv(r'C:\Users\leony\Desktop\test9.csv', header=None, names=['id', 'name', 'age'])
# write to sql table... pandas will use default column names and dtypes
df.to_sql('test9', engine, if_exists='append', index=False)
# add the `dtype` parameter to specify datatypes if needed,
# e.g. dtype={'column1': sqlalchemy.VARCHAR(255), 'column2': sqlalchemy.DateTime}
Note:
Get the connection string from the Azure Portal.
The UID format is [username]@[servername].
Run this script and it works.
Please refer to these documents:
HOW TO IMPORT DATA IN PYTHON
pandas.DataFrame.to_sql
Hope this helps.
I have a CSV file which contains 60000 rows. I need to insert this data into a Postgres database table. Is there a way to insert the data from the file into the database faster, without looping? Please help me
Python Version : 2.6
Database : postgres
table: keys_data
File Structure
1,ED2,'FDFDFDFDF','NULL'
2,ED2,'SDFSDFDF','NULL'
Postgres can read CSV directly into a table with the COPY command. This either requires you to be able to place files directly on the Postgres server, or data can be piped over a connection with COPY FROM STDIN.
The \copy command in Postgres' psql command-line client will read a file locally and insert using COPY FROM STDIN so that's probably the easiest (and still fastest) way to do this.
Note: this doesn't require any use of Python. It is native functionality in Postgres, and not all (or even most) other relational databases have the same functionality.
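If the load does have to run from Python, COPY FROM STDIN can be driven through the cursor's `copy_expert` method in psycopg2. The sketch below is only illustrative: `copy_csv_into_table` is a hypothetical helper, `conn` is assumed to be an open psycopg2 connection, the table name is assumed to come from a trusted source (it is interpolated into the SQL unescaped), and the `WITH (FORMAT csv)` option syntax assumes PostgreSQL 9.0 or later.

```python
def copy_csv_into_table(conn, csv_file, table_name):
    """Stream an open CSV file into a table with COPY ... FROM STDIN.

    conn       -- an open psycopg2 connection (assumption)
    csv_file   -- a file-like object positioned at the first data row
    table_name -- trusted table name; it is NOT escaped here
    """
    sql = "COPY %s FROM STDIN WITH (FORMAT csv)" % table_name
    cur = conn.cursor()
    try:
        # The server parses the CSV itself, so there is no per-row loop in Python
        cur.copy_expert(sql, csv_file)
        conn.commit()
    finally:
        cur.close()


# Example use (connection details are placeholders):
#   conn = psycopg2.connect(host="host", dbname="database", user="user", password="password")
#   with open("keys_data.csv") as f:
#       copy_csv_into_table(conn, f, "keys_data")
```

Note that the CSV format uses double quotes as its default quote character, so single-quoted fields like the ones in the sample would be loaded with their quotes unless a different QUOTE option is given.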
I've performed a similar task; the only difference is that my solution is Python 3.x based. I am sure you can find equivalent code for your version. The code is pretty self-explanatory.
import io

from sqlalchemy import create_engine

def insert_in_postgre(table_name, df):
    # create engine object
    engine = create_engine('postgresql+psycopg2://user:password@hostname/database_name')
    # create an empty table with the dataframe's columns (replaces any existing table)
    df.head(0).to_sql(table_name, engine, if_exists='replace', index=False)
    conn = engine.raw_connection()
    cur = conn.cursor()
    # dump the dataframe into an in-memory, tab-separated buffer
    output = io.StringIO()
    df.to_csv(output, sep='\t', header=False, index=False)
    output.seek(0)
    # bulk load the buffer with COPY; empty strings are read as NULL
    cur.copy_from(output, table_name, null="")
    conn.commit()
    cur.close()
Currently I'm using the code below on Python 3.5, Windows to read in a parquet file.
import pandas as pd
parquetfilename = 'File1.parquet'
parquetFile = pd.read_parquet(parquetfilename, columns=['column1', 'column2'])
However, I'd like to do so without using pandas. How to best do this? I'm using both Python 2.7 and 3.6 on Windows.
You can use duckdb for this. It's an embedded RDBMS similar to SQLite but with OLAP in mind. There's a nice Python API and a SQL function to import Parquet files:
import duckdb
conn = duckdb.connect(":memory:") # or a file name to persist the DB
# Keep in mind this doesn't support partitioned datasets,
# so you can only read one partition at a time
conn.execute("CREATE TABLE mydata AS SELECT * FROM parquet_scan('/path/to/mydata.parquet')")
# Export a query as CSV
conn.execute("COPY (SELECT * FROM mydata WHERE col = 'val') TO 'col_val.csv' WITH (HEADER 1, DELIMITER ',')")
Problem Statement:
I have multiple CSV files. I am cleaning them using Python and inserting them into SQL Server using bcp. Now I want to insert them into Greenplum instead of SQL Server. Please suggest a way to bulk insert directly from a Python DataFrame into a Greenplum table.
Solution: (what I can think of)
The approach I can think of is CSV -> DataFrame -> cleaning -> DataFrame -> CSV, then use gpload for the bulk load, and integrate it into a shell script for automation.
Does anyone have a good solution for this?
Issue in loading data directly from a DataFrame to a Greenplum table:
gpload asks for a file path. Can I pass a variable or a DataFrame to it? Is there any way to bulk load into Greenplum? I don't want to create a CSV or TXT file from the DataFrame and then load it into Greenplum.
I would use psycopg2 and the io libraries to do this. io is built-in and you can install psycopg2 using pip (or conda).
Basically, you write your dataframe to a string buffer ("memory file") in the csv format. Then you use psycopg2's copy_from function to bulk load/copy it to your table.
This should get you started:
import io
import pandas
import psycopg2
# Write your dataframe to memory as csv
# (`dataframe` is assumed to be an existing pandas DataFrame)
csv_io = io.StringIO()
dataframe.to_csv(csv_io, sep='\t', header=False, index=False)
csv_io.seek(0)
# Connect to the GreenPlum database.
greenplum = psycopg2.connect(host='host', database='database', user='user', password='password')
gp_cursor = greenplum.cursor()
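# Copy the data from the buffer to the table.
# Note: psycopg2 2.9+ quotes the table name, so a schema-qualified name such as
# 'db.table' has to go through copy_expert() instead of copy_from().
gp_cursor.copy_from(csv_io, 'db.table')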
greenplum.commit()
# Close the GreenPlum cursor and connection.
gp_cursor.close()
greenplum.close()
I have a large SQL file (20 GB) that I would like to convert into CSV. I plan to load the file into Stata for analysis. I have enough RAM to load the entire file (my computer has 32 GB of RAM).
Problem is: the solutions I found online with Python so far (sqlite3) seem to require more RAM than my current system has to:
read the SQL
write the csv
Here is the code
import sqlite3
import pandas as pd
con=sqlite3.connect('mydata.sql')
query='select * from mydata'
data=pd.read_sql(query,con)
data.to_csv('export.csv')
con.close()
The sql file contains about 15 variables that can be timestamps, strings or numerical values. Nothing really fancy.
I think one possible solution could be to read the sql and write the csv file one line at a time. However, I have no idea how to do that (either in R or in Python)
Any help really appreciated!
You can read the SQL database in batches and write them to the file instead of reading the whole database at once. Credit to "How to add pandas data to an existing csv file?" for how to append to an existing CSV file.
import sqlite3
import pandas as pd

# Open the file (newline='' lets pandas control the line endings)
f = open('output.csv', 'w', newline='')
# Create a connection and get a cursor
connection = sqlite3.connect('mydata.sql')
cursor = connection.cursor()
# Execute the query
cursor.execute('select * from mydata')
# Get data in batches
while True:
    # Read the data
    df = pd.DataFrame(cursor.fetchmany(1000))
    # We are done if there are no data
    if len(df) == 0:
        break
    # Let's write to the file
    else:
        # index=False keeps the per-batch row index (0-999) out of the CSV
        df.to_csv(f, header=False, index=False)
# Clean up
f.close()
cursor.close()
connection.close()
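The same batching idea also works without pandas. Here is a minimal sketch using only the standard library; the function name `export_query_to_csv`, the header row, and the batch size are illustrative choices, and the demo at the bottom uses a small in-memory table in place of the real 20 GB file.

```python
import csv
import io
import sqlite3


def export_query_to_csv(connection, query, out_file, batch_size=1000):
    """Write the rows of `query` to `out_file` as CSV, one batch at a time.

    Only `batch_size` rows are held in memory at once. Returns the number
    of data rows written (the header row is not counted).
    """
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        writer = csv.writer(out_file, lineterminator="\n")
        # cursor.description holds one tuple per column; the name comes first
        writer.writerow([column[0] for column in cursor.description])
        total = 0
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            writer.writerows(rows)
            total += len(rows)
        return total
    finally:
        cursor.close()


if __name__ == "__main__":
    # Small in-memory stand-in for the real database file
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mydata (id INTEGER, name TEXT)")
    con.executemany("INSERT INTO mydata VALUES (?, ?)", [(1, "a"), (2, "b")])
    buffer = io.StringIO()
    export_query_to_csv(con, "SELECT * FROM mydata ORDER BY id", buffer)
    print(buffer.getvalue())
    con.close()
```

For the real file, pass `sqlite3.connect('mydata.sql')` and an output file opened with `open('export.csv', 'w', newline='')` instead of the in-memory objects.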
Use the sqlite3 command line program like this from the Windows cmd line or UNIX shell:
sqlite3 -csv "mydata.sql" "select * from mydata;" > mydata.csv
If mydata.sql is not in the current directory use the path and on Windows use forward slashes rather than backslashes.
Alternatively, run sqlite3:
sqlite3
and enter these commands at the sqlite prompt:
.open "mydata.sql"
.output mydata.csv
.mode csv
select * from mydata;
.quit
(Or put them in a file called run, say, and use sqlite3 < run.)
Load the .sql file into a MySQL database and export it as CSV.
Commands to load the MySQL dump file into a MySQL database:
Create a MySQL database
create database <database_name>;
mysql -u root -p <database_name> < dumpfilename.sql
Commands to export a MySQL table as CSV:
mysql -u root -p
use <database_name>
SELECT * INTO OUTFILE 'file.csv'
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
FROM <table_name>;