What slows down parsing?

I have big XML files to parse (about 200k lines and 10 MB). The structure is as follows:
<el1>
    <el2>
        <el3>
            <el3-1>...</el3-1>
            <el3-2>...</el3-2>
        </el3>
        <el4>
            <el4-1>...</el4-1>
            <el4-2>...</el4-2>
        </el4>
        <el5>
            <el5-1>...</el5-1>
            <el5-2>...</el5-2>
        </el5>
    </el2>
</el1>
Here is my code:
tree = ElementTree.parse(filename)
doc = tree.getroot()
cursor.execute(
    'INSERT INTO first_table() VALUES()',
    ())
cursor.execute('SELECT id FROM first_table ORDER BY id DESC limit 1')
row = cursor.fetchone()
v_id1 = row[0]
for el1 in doc.findall('EL1'):
    cursor.execute(
        'INSERT INTO second_table() VALUES(v_id1)',
        (v_id1))
    cursor.execute(
        'SELECT id FROM second_table ORDER BY id DESC limit 1')
    row = cursor.fetchone()
    v_id2 = row[0]
    for el2 in el1.findall('EL2'):
        cursor.execute(
            'INSERT INTO third_table(v_id2) VALUES()',
            (v_id2))
        cursor.execute(
            'SELECT id FROM third_table ORDER BY id DESC limit 1')
        row = cursor.fetchone()
        v_id3 = row[0]
        for el3 in el2.findall('EL3'):
            cursor.execute(
                'INSERT INTO fourth_table(v_id3) VALUES()',
                (v_id3))
            cursor.execute(
                'SELECT id FROM fourth_table ORDER BY id DESC limit 1')
            row = cursor.fetchone()
            v_id4 = row[0]
            for el4 in el3.findall('EL4'):
                cursor.execute(
                    'INSERT INTO fifth_table(v_id4) VALUES()',
                    (v_id4))
                for el5 in el4.findall('EL5'):
                    cursor.execute(
                        'INSERT INTO sixth_table(v_id4) VALUES()',
                        (v_id4))
                    cursor.execute(
                        'SELECT id FROM sixth_table ORDER BY id DESC limit 1')
                    row = cursor.fetchone()
                    v_id5 = row[0]
                    ...
conn.commit()
Basically, I get values from attributes and send them to the database. When I need to process nested elements, I have to SELECT the last inserted ID from the database and use it as a foreign key in the next INSERT statement.
The whole process takes about 50 s, which seems too long for the amount of data I have. The SELECT statements surely take some time, but I am already selecting only one column from the last row.
I don't know whether it can be made faster, since I'm not good at programming, so I'm asking you.

You have 4 nested for loops. That's why. It is O(n^4).
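
Separately from the loop structure, the question's SELECT ... ORDER BY id DESC LIMIT 1 after every INSERT is an extra round trip per row. Many DB-API drivers expose the generated key as cursor.lastrowid, so the follow-up SELECT may not be needed. Below is a minimal sketch, assuming SQLite through the standard sqlite3 module (the question does not say which database is used) and hypothetical parent and child tables that are not from the original post:

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
# Hypothetical tables, used only for illustration.
cursor.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT)")
cursor.execute(
    "CREATE TABLE child (id INTEGER PRIMARY KEY, "
    "parent_id INTEGER REFERENCES parent(id), name TEXT)"
)


def insert_parent_with_children(cursor, parent_name, child_names):
    """Insert one parent row and its children without re-querying for the id."""
    cursor.execute("INSERT INTO parent (name) VALUES (?)", (parent_name,))
    parent_id = cursor.lastrowid  # key generated by the INSERT above
    # executemany passes all child rows in a single call.
    cursor.executemany(
        "INSERT INTO child (parent_id, name) VALUES (?, ?)",
        [(parent_id, name) for name in child_names],
    )
    return parent_id


first_id = insert_parent_with_children(cursor, "el2-a", ["el3-1", "el3-2"])
second_id = insert_parent_with_children(cursor, "el2-b", ["el4-1"])
conn.commit()  # one commit at the end, as in the question

child_rows = cursor.execute(
    "SELECT parent_id, name FROM child ORDER BY id"
).fetchall()
print(child_rows)
```

With drivers where lastrowid is not meaningful, an INSERT ... RETURNING id clause, where the database supports it, can serve the same purpose. Whether this removes most of the 50 s would need to be measured.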

Related

How to fix 'SQLite3.Programming Error: Incorrect number of bindings' in Python?

I am trying to SELECT an 'ItemID' from a table WHERE 'OrderNo' is equal to the user's input.
I'm doing this to create a report on however many items the customer has purchased.
However, I keep getting the same error:
cursor.execute("SELECT ItemID FROM 'Order_Line' WHERE OrderNo=?", ordernum)
sqlite3.ProgrammingError: Incorrect number of bindings supplied. The current statement uses 1, and there are 2 supplied.
This is what I tried:
customerID = int(input("Enter customer ID: "))
check_cid = "SELECT * FROM Customer WHERE CustomerID=?"
cursor.execute(check_cid, [customerID])
check_val = cursor.fetchone()
if check_val:
    cname_query = "SELECT CustomerName FROM Customer WHERE CustomerID=?"
    cursor.execute(cname_query, [customerID])
    cname = cursor.fetchone()
    connection.commit()
    # All items customer has ordered and total quantity of each item ordered
    ordernum_query = "SELECT OrderNo FROM 'Order' WHERE CustomerID=?"
    cursor.execute(ordernum_query, [customerID])
    ordernum = cursor.fetchall()
    connection.commit()
    cursor.execute("SELECT ItemID FROM 'Order_Line' WHERE OrderNo=?", ordernum)
    itemid = cursor.fetchone()
    connection.commit()
    print(itemid)
    itemname_query = "SELECT ItemName FROM Inventory WHERE ItemID=?"
    cursor.execute(itemname_query, [itemid])
    itemname = cursor.fetchone()
    connection.commit()
    print(itemname)
I was hoping to create a list of all the items they have ordered and the total quantity of each item, but I'm totally clueless about how to achieve this (I was thinking of a dictionary, but I'm not sure). Any help would be appreciated.

The problem is that ordernum is a list of tuples containing all of the customer's order numbers. If you just want one order, you can index it.
Also, itemid is already a tuple, so you shouldn't wrap it in a list for the next query. Even when you query a single column, fetchone() returns the row as a tuple.
if len(ordernum) > 0:
    cursor.execute("SELECT ItemID FROM 'Order_Line' WHERE OrderNo=?", ordernum[0])
    itemid = cursor.fetchone()
    print(itemid)
    itemname_query = "SELECT ItemName FROM Inventory WHERE ItemID=?"
    cursor.execute(itemname_query, itemid)
    itemname = cursor.fetchone()
    connection.commit()
    print(itemname)
else:
    print("No orders for this customer")
But you can do this all in one query that joins all the tables (Order is a reserved word, so it has to be quoted):
SELECT c.CustomerName, o.OrderNo, ol.ItemID, i.ItemName
FROM Customer AS c
JOIN "Order" AS o ON o.CustomerID = c.CustomerID
JOIN Order_Line AS ol ON ol.OrderNo = o.OrderNo
JOIN Inventory AS i ON i.ItemID = ol.ItemID
WHERE c.CustomerID = ?
LIMIT 1
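
For the report the question asks about (each item and its total quantity), a grouped version of the same join might look like the sketch below. It assumes an in-memory SQLite database, made-up sample rows, and a hypothetical Quantity column on Order_Line; none of these appear in the original post. It also shows the list-of-tuples shape that fetchall() returns, which is what caused the binding error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
# Hypothetical schema: the columns named in the question, plus an
# assumed Quantity column on Order_Line. Sample data is invented.
cursor.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE "Order" (OrderNo INTEGER PRIMARY KEY, CustomerID INTEGER);
    CREATE TABLE Order_Line (OrderNo INTEGER, ItemID INTEGER, Quantity INTEGER);
    CREATE TABLE Inventory (ItemID INTEGER PRIMARY KEY, ItemName TEXT);
    INSERT INTO Customer VALUES (1, 'Ada');
    INSERT INTO "Order" VALUES (10, 1), (11, 1);
    INSERT INTO Order_Line VALUES (10, 100, 2), (11, 100, 3), (11, 101, 1);
    INSERT INTO Inventory VALUES (100, 'Widget'), (101, 'Gadget');
""")

customer_id = 1

# fetchall() returns a list with one tuple per row, even for one column.
order_rows = cursor.execute(
    'SELECT OrderNo FROM "Order" WHERE CustomerID = ? ORDER BY OrderNo',
    (customer_id,),
).fetchall()

# One query: join the tables and sum the quantity per item.
totals = cursor.execute(
    """
    SELECT i.ItemName, SUM(ol.Quantity) AS TotalQuantity
    FROM "Order" AS o
    JOIN Order_Line AS ol ON ol.OrderNo = o.OrderNo
    JOIN Inventory AS i ON i.ItemID = ol.ItemID
    WHERE o.CustomerID = ?
    GROUP BY i.ItemID, i.ItemName
    ORDER BY i.ItemName
    """,
    (customer_id,),
).fetchall()

# The dictionary the question mentions: item name -> total quantity.
items = dict(totals)
print(order_rows, items)
```

Passing order_rows itself as the parameters would supply two bindings (one per row) to a statement with a single placeholder, which matches the error message in the question.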

pyodbc - cannot delete from MSSQL tables

Trying to delete some table entries using pyodbc results in nothing happening. I know for sure that the database connection is working as intended, since I can select data. Any suggestions as to what the cause could be?
get_user_id = conn.cursor()
get_user_id.execute('''
SELECT b.UserId
FROM Bindery b
INNER JOIN ActiveUser au
ON au.Id = b.UserId
WHERE au.UserId = ?
''', user_to_kick)
id_list = [id[0] for id in get_user_id.fetchall()]
delete_user = conn.cursor()
#delete from bindery first
delete_user.execute('''
DELETE FROM Bindery
WHERE UserId in (?)
''', id_list)
conn.commit
#delete from active user list
delete_user.execute('''
DELETE FROM ActiveUser
WHERE UserId = ?
''', user_to_kick)
conn.commit
delete_user.close()
conn.close
This is the code block that should, in my opinion, trigger the delete query, but nothing happens. The SELECT query does return the data.
UPDATE:
After calling conn.commit() with parentheses and fixing how the list is passed as a parameter, the delete query now works as intended.
get_user_id = conn.cursor()
get_user_id.execute('''
SELECT b.UserId
FROM Bindery b
INNER JOIN ActiveUser au
ON au.Id = b.UserId
WHERE au.UserId = ?
''', user_to_kick)
id_list = [id[0] for id in get_user_id.fetchall()]
placeholders = ", ".join(["?"] * len(id_list))
sql = 'DELETE FROM Bindery WHERE UserId IN (%s)' % placeholders
delete_user = conn.cursor()
#delete from bindery first
delete_user.execute(sql, id_list)
conn.commit()
#delete from active user list
delete_user.execute('''
DELETE FROM ActiveUser
WHERE UserId = ?
''', user_to_kick)
conn.commit()
get_user_id.close()
delete_user.close()
conn.close()
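
The placeholder-building step in the update can be wrapped in a small helper. The sketch below is an illustration only: it uses the standard sqlite3 module in place of pyodbc (both use ? placeholders) so that it runs without a SQL Server instance, and the function name delete_by_ids is made up. It also guards against an empty id list, since an empty IN () list is a syntax error in SQL Server.

```python
import sqlite3


def delete_by_ids(conn, id_list):
    """Delete Bindery rows whose UserId is in id_list; return the row count."""
    if not id_list:
        return 0  # nothing to delete; avoids generating "IN ()"
    # One "?" per id, e.g. "?, ?, ?" for three ids.
    placeholders = ", ".join("?" * len(id_list))
    sql = "DELETE FROM Bindery WHERE UserId IN (%s)" % placeholders
    cursor = conn.cursor()
    cursor.execute(sql, id_list)
    conn.commit()  # with parentheses; a bare conn.commit only names the method
    return cursor.rowcount


# Assumption: an in-memory SQLite table stands in for the real Bindery table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Bindery (UserId INTEGER)")
conn.executemany("INSERT INTO Bindery VALUES (?)", [(1,), (2,), (3,), (4,)])

deleted = delete_by_ids(conn, [1, 3])
remaining = [
    row[0] for row in conn.execute("SELECT UserId FROM Bindery ORDER BY UserId")
]
print(deleted, remaining)
```

Only the placeholders are interpolated into the SQL string; the id values themselves are still passed as parameters.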

Python MySQL SELECT WHERE with list

I have the following Python MySQL code.
cursor = mydb.cursor()
cursor.execute('SELECT id FROM table1 WHERE col1=%s AND col2=%s', (val1, val2))
ids = cursor.fetchall()
for id in ids:
    cursor.execute('SELECT record_key FROM table2 WHERE id=%s limit 1', (id[0], ))
    record_keys = cursor.fetchall()
    print(record_keys[0][0])
How can I make this more efficient? I am using 5.5.60-MariaDB and Python 2.7.5. I have approximately 350 million entries in table1 and 15 million entries in table2.

Happily, you can do this in a single query using a LEFT JOIN. Both filter columns belong to table1, so the WHERE clause refers to t1 only.
cursor = mydb.cursor()
cursor.execute(
    "SELECT t1.id, t2.record_key FROM table1 t1 "
    "LEFT JOIN table2 t2 ON (t1.id = t2.id) "
    "WHERE t1.col1=%s AND t1.col2=%s",
    (val1, val2),
)
for id, record_key in cursor.fetchall():
    pass  # do something...
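
To see what the single-query form returns, here is a small runnable sketch. It assumes SQLite through the standard sqlite3 module rather than MariaDB, so the placeholders are ? instead of %s, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
# Assumption: tiny stand-in tables with the column names from the question.
cursor.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    CREATE TABLE table2 (id INTEGER, record_key TEXT);
    INSERT INTO table1 VALUES (1, 'a', 'b'), (2, 'a', 'b'), (3, 'x', 'y');
    INSERT INTO table2 VALUES (1, 'k1'), (3, 'k3');
""")

# One round trip instead of one query per id.
rows = cursor.execute(
    "SELECT t1.id, t2.record_key FROM table1 t1 "
    "LEFT JOIN table2 t2 ON t1.id = t2.id "
    "WHERE t1.col1 = ? AND t1.col2 = ? "
    "ORDER BY t1.id",
    ("a", "b"),
).fetchall()

for row_id, record_key in rows:
    # record_key is None when table2 has no row for this id.
    print(row_id, record_key)
```

Unlike the original loop, the join returns every matching table2 row per id rather than at most one, and ids without a match come back with None. On tables of the size mentioned in the question, whether the filtered and joined columns are indexed probably matters more than the Python side.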

Sqlite&Python: Exiting for loop

Can someone please explain why the first loop exits as soon as the second loop is done?
First I get all the table names in the database (4 results in total).
Then I want to get all the data from each table.
But for some reason I only get the data from the first table.
If I remove the loop that gets the data from the table, the first for loop runs all the way to the end.
#Get all tables in database file
for tablename in c.execute("SELECT name FROM sqlite_master WHERE type='table';"):
    print(tablename[0])
    for elementdate in c.execute('SELECT * FROM %s ORDER BY Date DESC' % tablename[0]):
        print(elementdate)
Output:
table_1
(1, '20120210', 360)
(2, '20100210', 204)
Loop Exited
The same code without the inner for loop:
#Get table names
for tablename in c.execute("SELECT name FROM sqlite_master WHERE type='table';"):
    print(tablename[0])
    #for elementdate in c.execute('SELECT * FROM %s ORDER BY Date DESC' % tablename[0]):
    #    print(elementdate)
Output:
table_1
table_2
table_3
table_4
Loop Exited
Have I found an error or am I just dumb?

You shouldn't execute several queries on the same cursor before fetching the results of the first one:
c.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = c.fetchall()
for tablename in tables:
    print(tablename[0])
    c.execute('SELECT * FROM %s ORDER BY Date DESC' % tablename[0])
    for elementdate in c.fetchall():
        print(elementdate)

A single cursor object works only with a single query at a time; execute() overwrites any previous results.
If you want to execute two queries at the same time, use two cursors:
c = db.cursor()
c2 = db.cursor()
for row in c.execute("SELECT name FROM sqlite_master WHERE type='table'"):
    tablename = row[0]
    for row2 in c2.execute("SELECT * FROM %s ORDER BY Date DESC" % tablename):
        ...
Note: it would be a bad idea to modify the table while some other query on it is still running.
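
A self-contained version of the two-cursor approach, as a sketch: the in-memory database, the two sample tables, and the names outer, inner, and visited are assumptions made for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Assumption: two small tables shaped like the output in the question.
db.executescript("""
    CREATE TABLE table_1 (id INTEGER, Date TEXT, value INTEGER);
    CREATE TABLE table_2 (id INTEGER, Date TEXT, value INTEGER);
    INSERT INTO table_1 VALUES (1, '20120210', 360), (2, '20100210', 204);
    INSERT INTO table_2 VALUES (1, '20110101', 7);
""")

outer = db.cursor()  # iterates over the table names
inner = db.cursor()  # runs the per-table query without disturbing outer
visited = {}
for (tablename,) in outer.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
):
    # Table names cannot be bound as parameters, so the name is quoted
    # and interpolated; it comes from sqlite_master, not from user input.
    query = 'SELECT * FROM "%s" ORDER BY Date DESC' % tablename
    visited[tablename] = inner.execute(query).fetchall()

for name, rows in visited.items():
    print(name, rows)
```

Because the inner query runs on its own cursor, the outer cursor keeps its position and every table is visited.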

Python & SQLite3 Selecting from two tables

I have written this code in Python, which opens my SQLite3 database, looks at each row in the table 'contact', takes each 'id' number, and then looks up the matching 'id' in the table 'Users'. My problem is that it only outputs the first one and does not loop through all the rows.
import sqlite3
conn = sqlite3.connect('sqlite3.db')
cursor = conn.cursor()
cursor2 = conn.cursor()
cursor3 = conn.cursor()
text_file = open("Output.txt", "w");
try:
    cursor.execute("SELECT Id, address FROM contact;") # Get address details by ID
    for row in cursor:
        ID = row[0]
        address= row[1]
        cursor2.execute("SELECT name FROM Users WHERE id= " + str(ID) + ";") # Get users's name by ID
        row2 = cursor2.fetchone()
        sendername = row2[0]
        text_file.write(firstname, lastname, address);
finally:
    conn.close()
Any suggestions? I'm very new to Python.

You can ask the database to do a join instead:
cursor.execute("""\
SELECT u.name, c.address
FROM contact c
INNER JOIN Users u ON u.id = c.Id
""")
with open('Output.txt', 'w') as outfh:
    for name, address in cursor:
        outfh.write('{} {}\n'.format(name, address))
The INNER JOIN tells SQLite to only pick rows for which there is an actual match on the id columns. A NATURAL INNER JOIN, which joins on every column name the two tables share, would also let you omit the ON clause, but only if id is the only column name they have in common.
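
The effect of INNER JOIN on unmatched rows can be checked with a small sketch. It assumes an in-memory SQLite database and invented sample rows, including one contact whose Id has no matching user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: minimal stand-ins for the Users and contact tables.
conn.executescript("""
    CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contact (Id INTEGER, address TEXT);
    INSERT INTO Users VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO contact VALUES (1, '1 Main St'), (3, '9 Elm Rd');
""")

# INNER JOIN: the contact with Id 3 has no user, so it is dropped.
inner_rows = conn.execute(
    "SELECT u.name, c.address FROM contact c "
    "INNER JOIN Users u ON u.id = c.Id ORDER BY c.Id"
).fetchall()

# LEFT JOIN: every contact is kept; the missing name comes back as None.
left_rows = conn.execute(
    "SELECT u.name, c.address FROM contact c "
    "LEFT JOIN Users u ON u.id = c.Id ORDER BY c.Id"
).fetchall()

print(inner_rows)
print(left_rows)
```

In the question's loop, a contact without a matching user would make fetchone() return None, and indexing that result would raise an exception; the join avoids that case by construction.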

If I understand you:
cursor.execute("SELECT Users.name, contact.address FROM Users, contact WHERE contact.Id = Users.id;")
for row in cursor:
    name= row[0]
    address= row[1]
    text_file.write(name+" "+address)
