Python MySQL memory leak in insertion

I'm inserting millions of rows into MySQL using Python 3, but I found that memory usage keeps growing and eventually reaches 64 GB. I tried to diagnose the problem, and here is a reproduction: say I have 100 CSV files, each containing 50000 rows, that I want to insert into the database. Here is sample code:
import mysql.connector

insert_sql = "INSERT INTO `table` (Value) VALUES (%s)"

for i in range(100):
    cnx = mysql.connector.connect(user='root', password='password', host='127.0.0.1', database='database')
    cursor = cnx.cursor()
    # Insert 50000 rows here
    for j in range(50000):
        cursor.execute(insert_sql, (j,))
    cnx.commit()
    cursor.close()
    cnx.close()
    print('Finished processing one file')

print('All done')
The database contains only 1 table with 2 columns:
CREATE TABLE `table` (
    `Id` int(11) NOT NULL AUTO_INCREMENT,
    `Value` int(11) NOT NULL,
    PRIMARY KEY (`Id`)
)
Environment: macOS Sierra; Python 3.6.x; MySQL 8.0.1; mysql-connector-python 8.0.11
I understand that memory should grow before committing because the changes are buffered, but I expected it to decrease after the commit. However, it doesn't. Since in my real application I have thousands of files of 100 MB each, my memory usage will blow up.
Did I do anything wrong here? (I'm new to databases.) How can I keep the memory usage under control? Any suggestions will be appreciated!
Edit: I also tried the following code according to the comments and answers but it still doesn't work:
import mysql.connector

insert_sql = "INSERT INTO `table` (Value) VALUES (%s)"

for i in range(100):
    cnx = mysql.connector.connect(user='root', password='password', host='127.0.0.1', database='database')
    cursor = cnx.cursor()
    params = [(j,) for j in range(50000)]
    # If I don't execute the following insertion, memory usage is stable.
    cursor.executemany(insert_sql, params)
    cnx.commit()
    cursor.close()
    del cursor
    cnx.close()
    del cnx
    print('Finished processing one file')

print('All done')

Try batch execution; the loop of single-row inserts might be the problem.
You can use executemany:
c.executemany("INSERT INTO `table` (Value) VALUES (%s)",
              [('a',), ('b',)])
or one big INSERT statement with all of the values at once.
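For reference, here is a minimal sketch of what chunked batch inserts could look like with mysql-connector-python, reusing the table, credentials, and row count from the question; the batch size of 5000 is an arbitrary assumption to tune for your own row width and memory budget.

import mysql.connector

insert_sql = "INSERT INTO `table` (Value) VALUES (%s)"
BATCH_SIZE = 5000  # arbitrary assumption; tune for your data

cnx = mysql.connector.connect(user='root', password='password',
                              host='127.0.0.1', database='database')
cursor = cnx.cursor()

rows = [(j,) for j in range(50000)]  # stand-in for one parsed CSV file

# Insert in fixed-size chunks and commit per chunk, so neither the client
# nor the server has to hold one huge statement or transaction in memory.
for start in range(0, len(rows), BATCH_SIZE):
    cursor.executemany(insert_sql, rows[start:start + BATCH_SIZE])
    cnx.commit()

cursor.close()
cnx.close()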

Related

How can I sort a database by date?

I am creating a Python app that will store my homework in a database (using phpMyAdmin). Here comes my problem:
At the moment, I am storing every entry with an ID (1, 2, 3, 4...), a date (23/06/2018...), and a task (read one chapter of a book). Now I would like to sort them by date, because when I check what I have to do, I would prefer to see first whatever is due soonest. For example:
If I have two tasks, one for 25/07/2018 and the other for 11/07/2018, I would like the 11/07/2018 one to show first, no matter whether it was added later than the 25/07/2018 one. I am using Python (3.6), pymysql and phpMyAdmin to manage the database.
I had an idea to get this working: maybe I could run a Python script every 2 hours that sorts all the elements in the database, but I have no clue how to do it.
Here is the code that enters the values into the database and then shows them all.
def dba():
    connection = pymysql.connect(host='localhost',
                                 user='root',
                                 password='Adminhost123..',
                                 db='deuresc',
                                 charset='utf8mb4',
                                 cursorclass=pymysql.cursors.DictCursor)
    try:
        with connection.cursor() as cursor:
            # Create a new record
            sql = "INSERT INTO `deures` (`data`, `tasca`) VALUES (%s, %s)"
            cursor.execute(sql, (data, tasca))
        # connection is not autocommit by default, so you must commit to save your changes.
        connection.commit()
        with connection.cursor() as cursor:
            # Read back the record that was just inserted
            sql = "SELECT * FROM `deures` WHERE `data`=%s"
            cursor.execute(sql, (data,))
            resultat = cursor.fetchone()
            print('Has introduït: ' + str(resultat))  # "You have entered: ..."
    finally:
        connection.close()

def dbb():
    connection = pymysql.connect(host='localhost',
                                 user='root',
                                 password='Adminhost123..',
                                 db='deuresc',
                                 charset='utf8mb4',
                                 cursorclass=pymysql.cursors.DictCursor)
    try:
        with connection.cursor() as cursor:
            # Read all records
            sql = "SELECT * FROM `deures`"
            cursor.execute(sql)
            resultat = cursor.fetchall()
            for i in resultat:
                print(i)
    finally:
        connection.close()
Can someone help?
You don't sort the database. You sort the results of the query when you ask for data. So in your dbb function you should do:
SELECT * FROM `deures` ORDER BY `data`
assuming that data is the field with the date.
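In the context of the question's dbb function, that could look roughly like this (a sketch reusing the connection parameters from the question, and assuming `data` is a DATE/DATETIME column; if the dates are stored as strings like '23/06/2018', the ordering would be lexical instead):

import pymysql

def dbb():
    connection = pymysql.connect(host='localhost',
                                 user='root',
                                 password='Adminhost123..',
                                 db='deuresc',
                                 charset='utf8mb4',
                                 cursorclass=pymysql.cursors.DictCursor)
    try:
        with connection.cursor() as cursor:
            # Let MySQL return the rows already ordered by due date.
            cursor.execute("SELECT * FROM `deures` ORDER BY `data`")
            for row in cursor.fetchall():
                print(row)
    finally:
        connection.close()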

Inserting a file to MS SQL Server through Python

I am quite new to programming. I have written the following code by researching from StackOverflow and other sites. I am trying to upload a csv file to the MS SQL Server. Every time I run this it connects and then a message pops up 'Previous SQL was not a query'. I am not sure how to actually tackle this. Any suggestions and help will be appreciated
import pyodbc
import _csv

source_path = r'C:\Users\user\Documents\QA Canvas\module2\Module 2 Challenge\UFO_Merged.csv'
source_expand = open(source_path, 'r')
details = source_expand.readlines()

print('Connecting...')
try:
    conn = pyodbc.connect(r'DRIVER={ODBC Driver 13 for SQL Server};'
                          r'SERVER=FAHIM\SQLEXPRESS;'
                          r'DATABASE=Ash;'
                          r'Trusted_Connection=yes')
    print('Connected')
    cur = conn.cursor()
    print('Cursor established')
    sqlquery = """
    IF EXISTS
    (
        SELECT TABLE_NAME, TABLE_SCHEMA FROM INFORMATION_SCHEMA.TABLES
        WHERE TABLE_NAME = 'UFO_MERGED' AND TABLE_SCHEMA = 'dbo'
    )
    BEGIN
        DROP TABLE [dbo].[UFO_MERGED]
    END
    CREATE TABLE [dbo].[UFO_MERGED]
    (
        [ID] smallint
        ,[COMMENTS] varchar(max)
        ,[FIRST OCCURANCE] datetime
        ,[CITY] varchar(60)
        ,[COUNTRY] varchar(20)
        ,[SHAPE] varchar(20)
        ,[SPEED] smallint
        ,[SECOND OCCURANCE] datetime
        PRIMARY KEY(id)
    ) ON [PRIMARY]
    """
    result = cur.execute(sqlquery).fetchall()
    for row in result:
        print(row)
    print("{} rows returned".format(len(result)))
    sqlstr = """
    Insert into [dbo].[UFO_Merged] values ('{}','{}','{}','{}','{}','{}','{}','{}')
    """
    for row in details[1:]:
        row_data = row.split(',')
        sqlquery = sqlstr.format(row_data[0], row_data[1], row_data[2], row_data[3],
                                 row_data[4], row_data[5], row_data[6], row_data[7])
        result = cur.execute(sqlquery)
    conn.commit()
    conn.close()
except Exception as inst:
    if inst.args[0] == '08001':
        print("Cannot connect to the server")
    elif inst.args[0] == '28000':
        print("Login failed - check connection string")
    else:
        print(inst)
Well, make sure the SQL works first, before you try to introduce other technologies (Python, R, C#, etc.) on top of it. The SQL looks a little funky, but I'm not a SQL expert, so I can't say for sure, and I don't have time to recreate your setup on my machine. Maybe you can try with something a bit less complex, get that working, and then graduate to something more advanced. Does the following work for you?
import pyodbc
user='sa'
password='PC#1234'
database='climate'
port='1433'
TDS_Version='8.0'
server='192.168.1.103'
driver='FreeTDS'
con_string='UID=%s;PWD=%s;DATABASE=%s;PORT=%s;TDS=%s;SERVER=%s;driver=%s' % (user,password, database,port,TDS_Version,server,driver)
cnxn=pyodbc.connect(con_string)
cursor=cnxn.cursor()
cursor.execute("INSERT INTO mytable(name,address) VALUES (?,?)",('thavasi','mumbai'))
cnxn.commit()
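If that connects, the CSV rows from the original question could then be loaded with parameter placeholders instead of formatting values into the SQL string by hand; a rough sketch, assuming the UFO_MERGED table already exists and its eight columns match the CSV column order:

import csv

import pyodbc

conn = pyodbc.connect(r'DRIVER={ODBC Driver 13 for SQL Server};'
                      r'SERVER=FAHIM\SQLEXPRESS;'
                      r'DATABASE=Ash;'
                      r'Trusted_Connection=yes')
cur = conn.cursor()

with open(r'C:\Users\user\Documents\QA Canvas\module2\Module 2 Challenge\UFO_Merged.csv',
          newline='') as f:
    reader = csv.reader(f)
    next(reader)                           # skip the header row
    rows = [tuple(r[:8]) for r in reader]  # one 8-value tuple per CSV row

# The ? markers let the driver handle quoting and types for each value.
cur.executemany("INSERT INTO [dbo].[UFO_MERGED] VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
conn.commit()
conn.close()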

Pymysql INSERT does not insert but raises Duplicate Error

I try to insert data into my MySQL database via Python, but it doesn't insert the data; I find my database to be empty. Yet if I redo the same INSERT command, it raises a duplicate-key error.
Here is an example of my code:
connection = pymysql.connect(host='localhost',
                             user='user',
                             password='password',
                             db='literatur',
                             charset='utf8mb4',
                             cursorclass=pymysql.cursors.DictCursor)
with connection.cursor() as cursor:
    sql = "INSERT into tf_data (tf_data_id, abstract) values (4, 'TEST')"
    cursor.execute(sql)
    connection.commit
It somehow connects to the database (UPDATE commands actually work) and the auto-increment of tf_data_id keeps increasing.
If I then do SELECT * FROM tf_data; MySQL gives me an empty set (0.00 s). How can I find out what the problem is?

Optimising MySQL insert query via Python

db = MySQLdb.connect(host="xxx.xx.xx.x",
                     user="xxx",
                     passwd="xxx",
                     db="xxxx")

for loop on json data:
    cursor = db.cursor()
    cursor.execute('Insert Query')
    db.commit()
db.close()
Would it be possible for me to improve this? I'm considering doing multiple cursor.execute calls before db.commit(), but I'm unsure how db.commit() works and how important it is.
I'm basically looping over JSON data and inserting it row by row; I cannot avoid having multiple inserts.
Depending on how json_data is structured you should be able to use .executemany():
db = MySQLdb.connect(host="xxx.xx.xx.x",
                     user="xxx",
                     passwd="xxx",
                     db="xxxx")
cursor = db.cursor()
cursor.executemany('Insert Query', json_data)
db.commit()
cursor.close()
db.close()
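To illustrate the shape .executemany() expects, here is a hedged sketch; the table, column names, and sample payload below are made up for illustration. json_data just has to be turned into one parameter tuple per row, in the same order as the placeholders:

import MySQLdb

db = MySQLdb.connect(host="xxx.xx.xx.x", user="xxx", passwd="xxx", db="xxxx")
cursor = db.cursor()

# Hypothetical payload: a list of dicts parsed from JSON.
json_data = [
    {"name": "alice", "score": 10},
    {"name": "bob", "score": 12},
]

# executemany() wants a sequence of parameter tuples matching the %s placeholders.
params = [(item["name"], item["score"]) for item in json_data]
cursor.executemany("INSERT INTO results (name, score) VALUES (%s, %s)", params)

db.commit()  # a single commit wraps the whole batch in one transaction
cursor.close()
db.close()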

How do I determine if the row has been inserted?

When using sqlite3 for python, how do I determine if a row has been successfully inserted into a table? e.g.
conn = sqlite3.connect("test.db")
c = conn.cursor()
c.execute("INSERT INTO TEST VALUES ('sample text')")
c.commit()
c.close()
If no exception was thrown when calling execute() or commit(), it was inserted when you called commit().
Committing a transaction successfully is a guarantee from the database layer that the insert was written to disk.
You can get all the rows and see if it's in there with:
SELECT * FROM TEST
But SQLite will give you an error message if it didn't work.
You can also count() the rows before and after inserting.
You could try something like this to get an error message:
try:
    c.execute("INSERT INTO TEST VALUES ('sample text')")
except sqlite3.OperationalError as msg:
    print(msg)
You should do the commit on the connection object (the connection to the selected db, which is conn), not on the cursor.
conn = sqlite3.connect("test.db")
c = conn.cursor()
c.execute("INSERT INTO TEST VALUES ('sample text')")
#commit the changes to db
conn.commit()
conn.close()
First, you should do the commit on the connection object, not the cursor, i.e.
conn.commit(), not c.commit()
Then, after conn.commit(), you can examine lastrowid on the cursor to determine whether the insert was successful:
c.lastrowid
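Putting those two points together, a minimal sketch of that check with the table from the question:

import sqlite3

conn = sqlite3.connect("test.db")
c = conn.cursor()
c.execute("INSERT INTO TEST VALUES ('sample text')")
conn.commit()  # commit on the connection, not the cursor

# lastrowid holds the rowid of the last inserted row; rowcount is the
# number of rows the last execute() modified (1 for a successful INSERT).
print(c.lastrowid, c.rowcount)
conn.close()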
