I have a workflow where I need to take a 500k row CSV and import it into a MySQL table. I have a Python script that seems to be working, but no data is saved into the actual table when I select from it. I'm dropping and re-creating the table, then trying to bulk insert the CSV file, but it doesn't look like the data is going in. No errors are reported in the Python console when running.
The script takes about 2 minutes to run, which makes me think it's doing something, but I don't get anything other than the column headers when I select * from the table itself.
My script looks roughly like:
import pandas as pd
import mysql.connector
dataframe.to_csv('import-data.csv', header=False, index=False)
DB_NAME = 'SCHEMA1'
TABLES = {}
TABLES['TableName'] = (
"CREATE TABLE `TableName` ("
"`x_1` varchar(10) NOT NULL,"
"`x_2` varchar(20) NOT NULL,"
"PRIMARY KEY (`x_1`)"
") ENGINE = InnoDB")
load = """
LOAD DATA LOCAL INFILE 'import-data.csv'
INTO TABLE SCHEMA1.TableName
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
"""
conn = mysql.connector.connect(
    host=writer_host,
    port=port,
    user=username,
    password=password,
    database=username,
    ssl_ca=cert_path
)
cursor = conn.cursor(buffered=True)
cursor.execute("DROP TABLE IF EXISTS SCHEMA1.TableName")
cursor.execute(TABLES['TableName'])
cursor.execute(load)
cursor.close()
conn.close()
You are missing a commit after executing your commands. Note that in mysql.connector the commit is issued on the connection, not the cursor:
conn.commit()
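For reference, a minimal sketch of the corrected tail of the script (same variables as in the question; allow_local_infile=True is an extra assumption, since newer mysql.connector versions reject LOAD DATA LOCAL INFILE unless the client enables it when connecting):
conn = mysql.connector.connect(
    host=writer_host,
    port=port,
    user=username,
    password=password,
    database=username,
    ssl_ca=cert_path,
    allow_local_infile=True  # assumption: needed by newer mysql.connector for LOAD DATA LOCAL INFILE
)
cursor = conn.cursor(buffered=True)
cursor.execute("DROP TABLE IF EXISTS SCHEMA1.TableName")
cursor.execute(TABLES['TableName'])
cursor.execute(load)
conn.commit()  # persist the loaded rows; without this they are rolled back when the connection closes
cursor.close()
conn.close()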
Related
I want to save my dataframe to SQL Server with pyodbc, updating it every month (the SQL table should hold about 300 rows that are refreshed each month). The problem is that every time I run the .py file, the data gets appended instead of replacing all the existing data. Before, I was using sqlalchemy and could do it with if_exists='replace'. Now that I'm using pyodbc, I don't know what to do. This is what I do:
col_names = ["month", "price", "change"]
df = pd.read_csv("sawit.csv",sep=',',quotechar='\'',encoding='utf8', names=col_names,skiprows = 1) # Replace Excel_file_name with your excel sheet name
for index,row in df.iterrows():
cursor.execute("update dbo.sawit set month = ?, price = ?, change =? ;", (row.month, row.price, row.change))
cnxn.commit()
cursor.close()
cnxn.close()
But the result I got is that the data was all replaced with the last record. What should I do? Thank you in advance.
There's a much simpler way to do this kind of thing.
import pandas as pd
import pyodbc
from fast_to_sql import fast_to_sql as fts
# Test Dataframe for insertion
df = pd.DataFrame(your_dataframe_here)
# Create a pyodbc connection
conn = pyodbc.connect(
"""
Driver={ODBC Driver 17 for SQL Server};
Server=localhost;
Database=my_database;
UID=my_user;
PWD=my_pass;
"""
)
# If a table is created, the generated sql is returned
create_statement = fts.fast_to_sql(df, "my_great_table", conn, if_exists="replace")
# Commit upload actions and close connection
conn.commit()
conn.close()
Main Function:
fts.fast_to_sql(df, name, conn, if_exists="append", custom=None, temp=False)
Here is a slightly different way to do essentially the same thing.
import pandas as pd
import pyodbc
from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://server_name/db_name?driver=SQL+Server+Native+Client+11.0&trusted_connection=yes")
# your dataframe is here
df.to_sql("table_name", engine, if_exists='append', index=True, chunksize=100000)
NOTE: to_sql will dynamically create the appropriate, strongly-typed columns in the table for you.
Your SQL query does not say which entry should be replaced: there is no WHERE clause to select the correct row for each entry, nor is there a primary key. So on every loop iteration, all rows are overwritten with the current entry. The last iteration writes the last entry, therefore every row ends up holding the last entry.
You can add a WHERE clause that looks for the correct month to replace.
Something equivalent to this:
cursor.execute("update dbo.sawit set month = ?, price = ? where month = ?;", (row.month, row.price, row.month))
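Put back into the loop from the question, that would look roughly like this (a sketch; it assumes month uniquely identifies each row in dbo.sawit):
for index, row in df.iterrows():
    # update only the row whose month matches the current CSV row
    cursor.execute("update dbo.sawit set month = ?, price = ?, change = ? where month = ?;",
                   (row.month, row.price, row.change, row.month))
cnxn.commit()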
Creating a list in Python and inserting it into an Oracle table, but no records are found in the Oracle table.
Created a list in Python.
Created an Oracle table using Python code.
Inserted the list using executemany.
Ran the count(*) query in Python, obtained the number of rows, and printed it.
Output: the table has been created in Oracle using Python code successfully, but I cannot find the records that were inserted using Python.
import cx_Oracle
con = cx_Oracle.connect('username/password@127.0.0.1/orcl')
cursor = con.cursor()
create_table = """CREATE TABLE python_modules ( module_name VARCHAR2(1000) NOT NULL, file_path VARCHAR2(1000) NOT NULL )"""
cursor.execute(create_table)
M = []
M.append(('Module1', 'c:/1'))
M.append(('Module2', 'c:/2'))
M.append(('Module3', 'c:3'))
cursor.prepare("INSERT INTO python_modules(module_name, file_path) VALUES (:1, :2)")
cursor.executemany(None,M)
con.commit
cursor.execute("SELECT COUNT(*) FROM python_modules")
print(cursor.fetchone() [0])
Executing the query "select * from python_modules" should show the 3 records in the Oracle SQL Developer tool.
Change your commit to con.commit().
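Writing con.commit without parentheses only references the method and never calls it, so the transaction is never committed. A minimal sketch of the corrected tail (same objects as in the question):
cursor.executemany(None, M)
con.commit()  # actually call commit(); "con.commit" alone does nothing
cursor.execute("SELECT COUNT(*) FROM python_modules")
print(cursor.fetchone()[0])  # 3, and the rows are now visible from other sessions such as SQL Developer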
I am trying to work out why the schema of a dropped table returns when I attempt to create a table using a different set of column names.
After dropping the table, I can confirm in an SQLite explorer that the table has disappeared. When I then try to load the new file via odo, it returns the error "Column names of incoming data don't match column names of existing SQL table". Then I can see the same table re-created in the database, using the previously dropped schema! I attempted a VACUUM statement after dropping the table, but the issue remains.
I can create the table fine using a different table name, but I'm totally confused as to why I can't reuse the previously dropped table name.
import sqlite3
import pandas as pd
from odo import odo, discover, resource, dshape
conn = sqlite3.connect(dbfile)
c = conn.cursor()
c.execute("DROP TABLE <table1>")
c.execute("VACUUM")
importfile = pd.read_csv(csvfile)
odo(importfile, 'sqlite:///<db_path>::<table1>')
ValueError: Column names of incoming data don't match column names of existing SQL table Names in SQL table:
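Make sure every DDL statement is committed before anything else touches the database. The following example creates, drops, and re-creates a table, committing after each step so the database file always reflects the latest schema: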
import sqlite3
import pandas as pd
from odo import odo, discover, resource, dshape

conn = sqlite3.connect('test.db')
cursor = conn.cursor()
table = """ CREATE TABLE IF NOT EXISTS TABLE1 (
    id integer PRIMARY KEY,
    name text NOT NULL
); """
cursor.execute(table)
conn.commit()  # Save table into database.
cursor.execute(''' DROP TABLE TABLE1 ''')
conn.commit()  # Save that table has been dropped.
cursor.execute(table)
conn.commit()  # Save that table has been created.
conn.close()
I'm looking to run the following test.sql located in a folder on my C: drive. I've been playing with cx_Oracle and just can't get it to work.
test.sql contains the following.
CREATE TABLE MURRAYLR.test
( customer_id number(10) NOT NULL,
customer_name varchar2(50) NOT NULL,
city varchar2(50)
);
CREATE TABLE MURRAYLR.test2
( customer_id number(10) NOT NULL,
customer_name varchar2(50) NOT NULL,
city varchar2(50)
);
This is my code:
import sys
import cx_Oracle
connection = cx_Oracle.connect('user', 'password', 'test.ora')
cursor = connection.cursor()
f = open(r"C:\Users\desktop\Test_table.sql")
full_sql = f.read()
sql_commands = full_sql.split(';')
for sql_command in sql_commands:
    cursor.execute(sql_command)
cursor.close()
connection.close()
This answer is relevant only if your test.sql file contains newline '\n' characters (like mine, which I got from copy-pasting your SQL code). You will need to remove them in your code if they are present. To check, do
print full_sql
To fix the '\n's,
sql_commands = full_sql.replace('\n', '').split(';')[:-1]
The above should help.
It removes the '\n's and drops the empty string token left at the end when splitting the SQL string.
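As a quick illustration with a made-up two-statement string:
full_sql = "CREATE TABLE a (x int);\nCREATE TABLE b (y int);\n"
sql_commands = full_sql.replace('\n', '').split(';')[:-1]
# ['CREATE TABLE a (x int)', 'CREATE TABLE b (y int)']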
MURRAYLR.test is not an acceptable table name in any DBMS I've used. The connection object that cx_Oracle.connect returns should already have a schema selected. To switch to a different schema, set the current_schema field on the connection object, or add using <Schemaname>; in your SQL file.
Obviously, make sure that the schema exists.
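For the current_schema route, a minimal sketch (it assumes the MURRAYLR schema exists and reuses the connection details from the question):
connection = cx_Oracle.connect('user', 'password', 'test.ora')
connection.current_schema = 'MURRAYLR'  # unqualified table names now resolve to MURRAYLR
cursor = connection.cursor()
cursor.execute("CREATE TABLE test (customer_id number(10) NOT NULL, customer_name varchar2(50) NOT NULL, city varchar2(50))")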
I have a task where I need to read a CSV file line by line and insert the rows into a database.
The CSV file contains about 1.7 million lines.
I use Python with the sqlalchemy ORM (the merge function) to do this.
But it takes over five hours.
Is it caused by Python's slow performance, or by sqlalchemy?
Or what if I use Golang to get obviously better performance? (But I have no experience with Go. Besides, this job needs to be scheduled every month.)
I hope you guys can give me some suggestions, thanks!
Update: the database is MySQL.
For such a mission you don't want to insert data line by line :) Basically, you have 2 ways:
Ensure that sqlalchemy does not run queries one by one; use a batch INSERT query instead (see "How to do a batch insert in MySQL"; a minimal sketch follows below this list).
Massage your data the way you need, then output it into a temporary CSV file and run LOAD DATA [LOCAL] INFILE as suggested above. If you don't need to preprocess your data, just feed the CSV to the database (I assume it's MySQL).
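A minimal sketch of the first option, batching rows with executemany via mysql.connector (the table, columns, and file name are made up for illustration):
import csv
import mysql.connector

conn = mysql.connector.connect(host='localhost', user='user', password='pass', database='mydb')
cursor = conn.cursor()

insert_sql = "INSERT INTO my_table (col_a, col_b, col_c) VALUES (%s, %s, %s)"
batch = []
with open('data.csv', newline='') as f:
    for row in csv.reader(f):
        batch.append(tuple(row))
        if len(batch) >= 10000:  # send 10k rows per round trip instead of one at a time
            cursor.executemany(insert_sql, batch)
            batch.clear()
if batch:  # flush the remainder
    cursor.executemany(insert_sql, batch)

conn.commit()
cursor.close()
conn.close()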
Follow these three steps:
1. Save the CSV file with the name of the table you want to save it to.
2. Execute the Python script below to create the table dynamically (update the CSV filename and the db parameters).
3. Execute "mysqlimport --ignore-lines=1 --fields-terminated-by=, --local -u dbuser -p db_name dbtable_name.csv"
PYTHON CODE:
import numpy as np
import pandas as pd
from mysql.connector import connect

csv_file = 'dbtable_name.csv'
df = pd.read_csv(csv_file)
table_name = csv_file.split('.')

query = "CREATE TABLE " + table_name[0] + "( \n"
for count in np.arange(df.columns.values.size):
    query += df.columns.values[count]
    if df.dtypes[count] == 'int64':
        query += "\t\t int(11) NOT NULL"
    elif df.dtypes[count] == 'object':
        query += "\t\t varchar(64) NOT NULL"
    elif df.dtypes[count] == 'float64':
        query += "\t\t float(10,2) NOT NULL"
    if count == 0:
        query += " PRIMARY KEY"
    if count < df.columns.values.size - 1:
        query += ",\n"
query += " );"
#print(query)

database = connect(host='localhost',   # your host
                   user='username',    # username
                   passwd='password',  # password
                   db='dbname')        # dbname
curs = database.cursor(dictionary=True)
curs.execute(query)
# print(query)
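Once the table exists, the mysqlimport command from step 3 bulk-loads dbtable_name.csv into it.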