Inserting to sqlite dynamically with Python 3 - python

I want to write to multiple tables with sqlite, but I don't want to manually specify the query ahead of time (there are dozens of possible permutations).
So for example:
def insert_sqlite(tablename, data_list):
    global dbc
    dbc.execute("insert into " + tablename + " values (?)", data_list)

tables_and_data = {
    'numbers_table': [1, 2, 3, 4, 5],
    'text_table': ["pies", "cakes"]
}

for key in tables_and_data:
    insert_sqlite(key, tables_and_data[key])
I want two things to happen:
a) for the tablename to be set dynamically - I've not found a single example where this is done.
b) The data_list values to be correctly used - note that the length of the list varies (as per the example).
But the above doesn't work. How do I dynamically create an sqlite3 execute statement?
Thanks

a) Your code above seems to be setting the table name correctly, so no problem there.
b) You need one ? placeholder per column you want to insert a value into.
When I recreate your code as-is and run it, I get the error message:
"sqlite3.OperationalError: table numbers_table has 5 columns but 1 values were supplied".
A solution would be to edit your function to dynamically create the correct number of placeholders:
def insert_sqlite(tablename, data_list):
    global dbc
    dbc.execute("insert into " + tablename + " values (" + ('?,' * len(data_list))[:-1] + ")", data_list)
After doing this and re-executing the code with some added select statements (just to test it out):
dbc.execute("""
select * from numbers_table
""")
print(dbc.fetchall());
dbc.execute("""
select * from text_table
""")
print(dbc.fetchall());
I get the result:
[(1, 2, 3, 4, 5)]
[(u'pies', u'cakes')]
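As an aside, a slightly tidier way to build the placeholder string is str.join, and because table names cannot be bound as ? parameters, it is worth validating them against a whitelist before string-formatting them into the query. A minimal sketch, assuming the caller supplies the set of allowed table names (the allowed_tables parameter is hypothetical, not part of the original code):

import sqlite3

def insert_sqlite(dbc, tablename, data_list, allowed_tables):
    # Table names cannot be bound as ? parameters, so validate against a
    # known whitelist before formatting them into the SQL string.
    if tablename not in allowed_tables:
        raise ValueError("unknown table: %r" % tablename)
    placeholders = ", ".join("?" * len(data_list))  # '?, ?, ?' etc.
    dbc.execute("INSERT INTO %s VALUES (%s)" % (tablename, placeholders),
                data_list)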

Related

Making comparing 2 tables faster (Postgres/SQLAlchemy)

I wrote code in Python to manipulate a table I have in my database, using SQLAlchemy. Table 1 has 2,500,000 entries; Table 2 has 200,000 entries. What I am trying to do is compare the source IP and dest IP in Table 1 with the source IP and dest IP in Table 2. If there is a match, I replace the source IP and dest IP in Table 1 with the data that matches them in Table 2, and I add the entry to Table 3. My code also checks whether the entry is already in the new table; if so, it skips it and goes on to the next row.
My problem is that it's extremely slow. I launched my script yesterday and in 24 hours it only went through 47,000 entries out of 2,500,000. I am wondering if there is any way I can speed up the process. It's a Postgres DB and I can't tell whether the script taking this much time is reasonable or whether something is wrong. If anyone has had a similar experience with something like this, how much time did it take to complete?
Many thanks.
session = Session()
i = 0
start_id = 1
flows = session.query(Table1).filter(Table1.id >= start_id).all()
result_number = len(flows)
vlan_list = {"['0050']", "['0130']", "['0120']", "['0011']", "['0110']"}
while i < result_number:
    for flow in flows:
        if flow.vlan_destination in vlan_list:
            usage = session.query(Table2).filter(Table2.ip ==
                str(flow.ip_destination)).all()
            if len(usage) > 0:
                usage = usage[0].usage
            else:
                usage = str(flow.ip_destination)
            usage_ip_src = session.query(Table2).filter(Table2.ip ==
                str(flow.ip_source)).all()
            if len(usage_ip_src) > 0:
                usage_ip_src = usage_ip_src[0].usage
            else:
                usage_ip_src = str(flow.ip_source)
            if flow.protocol == "17":
                protocol = func.REPLACE(flow.protocol, "17", 'UDP')
            elif flow.protocol == "1":
                protocol = func.REPLACE(flow.protocol, "1", 'ICMP')
            elif flow.protocol == "6":
                protocol = func.REPLACE(flow.protocol, "6", 'TCP')
            else:
                protocol = flow.protocol
            is_in_db = session.query(Table3).filter(Table3.protocol == protocol)\
                .filter(Table3.application == flow.application)\
                .filter(Table3.destination_port == flow.destination_port)\
                .filter(Table3.vlan_destination == flow.vlan_destination)\
                .filter(Table3.usage_source == usage_ip_src)\
                .filter(Table3.state == flow.state)\
                .filter(Table3.usage_destination == usage).count()
            if is_in_db == 0:
                to_add = Table3(usage_ip_src, usage, protocol, flow.application,
                                flow.destination_port, flow.vlan_destination,
                                flow.state)
                session.add(to_add)
                session.flush()
                session.commit()
                print("added " + str(i))
            else:
                print("usage already in DB")
        i = i + 1
session.close()
EDIT: As requested, here are more details. Table 1 has 11 columns; the two we are interested in are source IP and dest IP (screenshot omitted).
Table 2 has an IP and a Usage (screenshot omitted). What my script does is take the source IP and dest IP from Table 1 and look up whether there is a match in Table 2. If so, it replaces the IP address with the usage, and adds this, along with some of the columns of Table 1, to Table 3 (screenshot omitted).
While doing this, when adding the protocol column to Table 3, it writes the protocol name instead of the number, just to make it more readable.
EDIT 2: I am trying to think about this differently, so I made a diagram of my problem (diagram omitted; the X problem).
What I am trying to figure out is whether my code (the Y solution) is working as intended. I've only been coding in Python for a month and I feel like I am messing something up. My code is supposed to take every row from Table 1, compare it to Table 2, and add data to Table 3. Table 1 has over 2 million entries, and it's understandable that this should take a while, but it's too slow. For example, when I had to load the data from the API into the DB, it went faster than the comparisons I'm trying to do with everything that is already in the DB. I am running my code on a virtual machine that has sufficient memory, so I am sure it's my code that is lacking, and I need direction as to what can be improved. (Screenshots of Tables 1, 2 and 3 omitted.)
EDIT 3: The PostgreSQL query:
SELECT
    coalesce(table2_1.usage, table1.ip_source) AS coalesce_1,
    coalesce(table2_2.usage, table1.ip_destination) AS coalesce_2,
    CASE table1.protocol
        WHEN %(param_1)s THEN %(param_2)s
        WHEN %(param_3)s THEN %(param_4)s
        WHEN %(param_5)s THEN %(param_6)s
        ELSE table1.protocol
    END AS anon_1,
    table1.application AS table1_application,
    table1.destination_port AS table1_destination_port,
    table1.vlan_destination AS table1_vlan_destination,
    table1.state AS table1_state
FROM
    table1
    LEFT OUTER JOIN table2 AS table2_2 ON table2_2.ip = table1.ip_destination
    LEFT OUTER JOIN table2 AS table2_1 ON table2_1.ip = table1.ip_source
WHERE
    table1.vlan_destination IN (
        %(vlan_destination_1)s,
        %(vlan_destination_2)s,
        %(vlan_destination_3)s,
        %(vlan_destination_4)s,
        %(vlan_destination_5)s
    )
    AND NOT (
        EXISTS (
            SELECT 1
            FROM table3
            WHERE table3.usage_source = coalesce(table2_1.usage, table1.ip_source)
              AND table3.usage_destination = coalesce(table2_2.usage, table1.ip_destination)
              AND table3.protocol = CASE table1.protocol
                                        WHEN %(param_1)s THEN %(param_2)s
                                        WHEN %(param_3)s THEN %(param_4)s
                                        WHEN %(param_5)s THEN %(param_6)s
                                        ELSE table1.protocol
                                    END
              AND table3.application = table1.application
              AND table3.destination_port = table1.destination_port
              AND table3.vlan_destination = table1.vlan_destination
              AND table3.state = table1.state
        )
    )
Given the current question, I think this at least comes close to what you might be after. The idea is to perform the entire operation in the database, instead of fetching everything – the whole 2,500,000 rows – and filtering in Python etc.:
from sqlalchemy import func, case, insert
from sqlalchemy.orm import aliased

def newhotness(session, vlan_list):
    # The query needs to join Table2 twice, so it has to be aliased
    dst = aliased(Table2)
    src = aliased(Table2)
    # Prepare the required SQL expressions
    usage = func.coalesce(dst.usage, Table1.ip_destination)
    usage_ip_src = func.coalesce(src.usage, Table1.ip_source)
    protocol = case({"17": "UDP",
                     "1": "ICMP",
                     "6": "TCP"},
                    value=Table1.protocol,
                    else_=Table1.protocol)
    # Form a query producing the data to insert into Table3
    flows = session.query(
        usage_ip_src,
        usage,
        protocol,
        Table1.application,
        Table1.destination_port,
        Table1.vlan_destination,
        Table1.state).\
        outerjoin(dst, dst.ip == Table1.ip_destination).\
        outerjoin(src, src.ip == Table1.ip_source).\
        filter(Table1.vlan_destination.in_(vlan_list),
               ~session.query(Table3).
               filter_by(usage_source=usage_ip_src,
                         usage_destination=usage,
                         protocol=protocol,
                         application=Table1.application,
                         destination_port=Table1.destination_port,
                         vlan_destination=Table1.vlan_destination,
                         state=Table1.state).
               exists())
    stmt = insert(Table3).from_select(
        ["usage_source", "usage_destination", "protocol", "application",
         "destination_port", "vlan_destination", "state"],
        flows)
    return session.execute(stmt)
If the vlan_list is selective, or in other words filters out most rows, this will perform far fewer operations in the database. Depending on the size of Table2 you may benefit from indexing Table2.ip, but do test first. If it is relatively small, I would guess that PostgreSQL will perform a hash or nested loop join there. If some column of the ones used to filter out duplicates in Table3 is unique, you could perform an INSERT ... ON CONFLICT ... DO NOTHING instead of removing duplicates in the SELECT using the NOT EXISTS subquery expression (which PostgreSQL will perform as an antijoin). If there is a possibility that the flows query may produce duplicates, add a call to Query.distinct() to it.
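If a suitable unique constraint exists on Table3, that ON CONFLICT route might look roughly like this using the PostgreSQL dialect's insert construct (a sketch under that assumption, not tested against the asker's schema):

from sqlalchemy.dialects.postgresql import insert as pg_insert

# Same SELECT as in newhotness(); duplicates are skipped by the unique
# constraint instead of the NOT EXISTS antijoin.
stmt = pg_insert(Table3).from_select(
    ["usage_source", "usage_destination", "protocol", "application",
     "destination_port", "vlan_destination", "state"],
    flows)
session.execute(stmt.on_conflict_do_nothing())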

I'm using an INSERT query to insert in mysql table, there's an error in my syntax

x = self.npprocess_PTE.text()
y = 123
if len(self.npprocess_PTE.text()) == 0:
    print("yes")
else:
    model = QtGui.QStandardItemModel()
    self.npprocessview_LV.setModel(model)
    z = model.rowCount()
    z = z + 1
    a = str(z) + '_Process'
    b = "INSERT INTO new_part_info(%s) VALUES(%s) WHERE Part_ID = %s"
    c = (z, x, y)
    mycursor.execute(b, c)
    mydb.commit()
The error I am getting: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '1) VALUES('wb') WHERE Part_ID = 123' at line 1
There are two errors that I can see in your insert query:
As others pointed out, an INSERT statement does not have a WHERE clause. Either you need to get rid of it, or you need to change the INSERT into an UPDATE. However, this is not what causes the error message.
You use the z variable as the name of the column for the insert. z is a number. As the MySQL manual says about object identifiers:
Identifiers may begin with a digit but unless quoted may not consist solely of digits.
So you need to quote the value in z:
w = '`' + str(z) + '`'
and then use w instead of z when executing the insert. Alternatively, you may have meant to use the a variable instead of z, since it is created as:
a = str(z) + '_Process'
It depends on the names of the columns in your table, really.
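Note also that the column name cannot be sent as a query parameter at all: the driver binds parameters as values, not identifiers. One way this could be restructured, assuming the intent is to update an existing row (table and column names are taken from the question; adjust to your schema):

column = str(z) + '_Process'  # e.g. '1_Process'; must be a real column name
# Identifiers cannot be bound with %s, so quote and format them into the
# statement text; bind only the actual values.
sql = "UPDATE new_part_info SET `%s` = %%s WHERE Part_ID = %%s" % column
mycursor.execute(sql, (x, y))
mydb.commit()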

iterating through a user defined list with arcpy

taxNo = arcpy.GetParameterAsText(0)
thisMap = arcpy.mapping.MapDocument("CURRENT")
myDF = arcpy.mapping.ListDataFrames(thisMap)[0]
myLayers = arcpy.mapping.ListLayers(myDF)
for lyr in myLayers:
    if lyr.name == "Address Numbers":
        arcpy.SelectLayerByAttribute_management(lyr, "NEW_SELECTION", "EKEY = " + taxNo[0])
        for tax in taxNo:
            arcpy.SelectLayerByAttribute_management(lyr, "ADD_TO_SELECTION", "EKEY = " + tax)
            arcpy.AddWarning("Additional Selection " + tax)
I'm trying to build a script in ArcGIS that will select a series of user-defined values; in this case I'm trying to select 1784102 and 1784110. When I use arcpy.AddWarning(taxNo) before the loop, I get the output "1784102;1784110", but the loop iterates through it one character at a time, i.e.
Additional Selection 1
Additional Selection 7
Additional Selection 8
Additional Selection 4
etc.
then pops up an error when it hits the semi-colon.
The parameters for taxNo are set up in ArcMap as a String, Multivalue, Valuelist.
I will just assume you are calling your script like this:
python script.py 1784102;1784110
Your variable taxNo = arcpy.GetParameterAsText(0) then is a single string "1784102;1784110". If you use array indexing on a string (for example taxNo[0], taxNo[1], etc.), you get single characters out of that string, i.e. "1", "7", "8", ...
Call .split(';') on the result of arcpy.GetParameterAsText(0) to split the string "1784102;1784110" into a list of two strings: ["1784102", "1784110"]. If you need numeric items, i.e. integers, convert each one with int().
taxNo = arcpy.GetParameterAsText(0).split(';')
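Putting it together, the selection loop from the question would then look something like this (a sketch reusing the question's layer name, field, and variables):

taxNo = arcpy.GetParameterAsText(0).split(';')  # ["1784102", "1784110"]

for lyr in myLayers:
    if lyr.name == "Address Numbers":
        # Start a fresh selection with the first value, then add the rest
        arcpy.SelectLayerByAttribute_management(lyr, "NEW_SELECTION",
                                                "EKEY = " + taxNo[0])
        for tax in taxNo[1:]:
            arcpy.SelectLayerByAttribute_management(lyr, "ADD_TO_SELECTION",
                                                    "EKEY = " + tax)
            arcpy.AddWarning("Additional Selection " + tax)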

Python SQLite3 UPDATE with tuple only uses last value

I'm trying to update all rows of 1 column of my database with a big tuple.
c.execute("SELECT framenum FROM learnAlg")
db_framenum = c.fetchall()
print(db_framenum)
db_framenum_new = []
# How much v6 framenum differentiates from v4
change_fn = 0
for f in db_framenum:
t = f[0]
if t in change_numbers:
change_fn += 1
t = t + change_fn
db_framenum_new.append((t,))
print("")
print(db_framenum_new)
c.executemany("UPDATE learnAlg SET framenum=?", (db_framenum_new))
First I take the existing values of the column 'framenum', which look like:
[(0,), (1,), (2,) , ..., (104,)]
Then I transform each tuple so I can change some values in the for f in db_framenum: loop, which results in a similar list of tuples:
[(0,), (1,), (2,) , ..., (108,)]
Problem
So far so good, but then I try to update the column 'framenum' with these new framenumbers:
c.executemany("UPDATE learnAlg SET framenum=?", (db_framenum_new))
I expect the rows in the column 'framenum' to have the new values, but instead they all have the value 108 (the last value in 'db_framenum_new'). Why are they not being updated in order (from 1 to 108)?
Expect:
framenum: 1, 2, .., 108
Got:
framenum: 108, 108, ..., 108
Note: The list of tuples has not become longer; only certain values have been changed. Everything above 46 gets +1, everything above 54 an additional +1 (+2 total)...
Note 2: The column is created with 'framenum INTEGER'. Another column has the PRIMARY KEY, if that matters, made with 'framekanji TEXT PRIMARY KEY'; it has (for now) all values NULL.
Edit
Solved my problem, but I'm still interested in proper use of c.executemany(). I don't know why this only updates the first rowid:
c.execute("SELECT rowid, framenum FROM learnAlg")
db_framenum = c.fetchall()
print(db_framenum)
db_framenum_new = []
# How much v6 framenum differentiates from v4
change_fn = 0
for e, f in enumerate(db_framenum):
e += 1
t = f[1]
if t in change_numbers:
change_fn += 1
t = t + change_fn
db_framenum_new.append((e,t))
print(db_framenum_new)
c.executemany("UPDATE learnAlg SET framenum=? WHERE rowid=?",
(db_framenum_new[1], db_framenum_new[0]))
Yes, you are telling the database to update all rows with the same framenum. That's because the UPDATE statement does not select any specific row. You need to tell the database to change one row at a time, by including a primary key for each value.
Since you are only altering specific framenumbers, you could ask the database to only provide those specific rows instead of going through all of them. You probably also need to specify an order in which to change the numbers; perhaps you need to do so in incrementing framenumber order?
c.execute("""
SELECT rowid, framenum FROM learnAlg
WHERE framenum in ({})
ORDER BY framenum
""".format(', '.join(['?'] * len(change_numbers))),
change_numbers)
update_cursor = conn.cursor()
for change, (rowid, f) in enumerate(c, 1):
update_cursor.execute("""
UPDATE learnAlg SET framenum=? WHERE rowid=?""",
(f + change, rowid))
I altered the structure somewhat there; the query limits the results to frame numbers in the change_numbers sequence only, through a WHERE IN clause. I loop over the cursor directly (no need to fetch all results at once) and use separate UPDATEs to set the new frame number. Instead of a manual counter I used enumerate() to keep count for me.
If you needed to group the updates by change_numbers, then just tell the database to do those updates:
change = len(change_numbers)
for framenumber in reversed(change_numbers):
    update_cursor.execute("""
        UPDATE learnAlg SET framenum=framenum + ? WHERE framenum=?
        """, (change, framenumber))
    change -= 1
This starts at the highest framenumber to avoid updating framenumbers you already updated before. This does assume your change_numbers are sorted in incremental order.
Your executemany update should just pass in the whole list, not just the first two items; you do need to alter how you append the values:
for e, f in enumerate(db_framenum):
    # ...
    db_framenum_new.append((t, e))  # framenum first, then rowid

c.executemany("UPDATE learnAlg SET framenum=? WHERE rowid=?",
              db_framenum_new)
Note that the executemany() call takes place outside the for loop!
Thanks @Martijn Pieters, using rowid is what I needed. This is the code that made it work for me:
c.execute("SELECT rowid, framenum FROM learnAlg")
db_framenum = c.fetchall()
print(db_framenum)
# How much v6 framenum differentiates from v4
change_fn = 0
for e, f in enumerate(db_framenum):
e += 1
db_framenum_new = f[1]
if db_framenum_new in change_numbers:
change_fn += 1
db_framenum_new = db_framenum_new + change_fn
c.execute("UPDATE learnAlg SET framenum=? WHERE rowid=?",
(db_framenum_new, e))
However I still don't know how to properly use c.executemany(). See edit for updated question.
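To answer the lingering executemany() question: it simply runs the same statement once for every parameter tuple in the sequence you pass it, so you hand it the complete list rather than individual items. A minimal self-contained sketch (table and column names follow the question; the data is made up):

import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE learnAlg (framenum INTEGER)")
c.executemany("INSERT INTO learnAlg (framenum) VALUES (?)",
              [(n,) for n in range(5)])

# Build one (new_framenum, rowid) tuple per row, then let executemany
# run the UPDATE once per tuple -- all outside the loop that builds them.
rows = c.execute("SELECT rowid, framenum FROM learnAlg").fetchall()
new_values = [(framenum + 100, rowid) for rowid, framenum in rows]
c.executemany("UPDATE learnAlg SET framenum=? WHERE rowid=?", new_values)
conn.commit()

print(c.execute("SELECT framenum FROM learnAlg").fetchall())
# [(100,), (101,), (102,), (103,), (104,)]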

Problem Inserting data into MS Access database using ADO via Python

[Edit 2: More information and debugging in answer below...]
I'm writing a python script to export MS Access databases into a series of text files to allow for more meaningful version control (I know - why Access? Why aren't I using existing solutions? Let's just say the restrictions aren't of a technical nature).
I've successfully exported the full contents and structure of the database using ADO and ADOX via the comtypes library, but I'm getting a problem re-importing the data.
I'm exporting the contents of each table into a text file with a list on each line, like so:
[-9, u'No reply']
[1, u'My home is as clean and comfortable as I want']
[2, u'My home could be more clean or comfortable than it is']
[3, u'My home is not at all clean or comfortable']
And the following function to import said file:
import os
import sys
import datetime
import comtypes.client as client
from ADOconsts import *
from access_consts import *

class Db:
    def create_table_contents(self, verbosity = 0):
        conn = client.CreateObject("ADODB.Connection")
        rs = client.CreateObject("ADODB.Recordset")
        conn.ConnectionString = self.new_con_string
        conn.Open()
        for fname in os.listdir(self.file_path):
            if fname.startswith("Table_"):
                tname = fname[6:-4]
                if verbosity > 0:
                    print "Filling table %s." % tname
                conn.Execute("DELETE * FROM [%s];" % tname)
                rs.Open("SELECT * FROM [%s];" % tname, conn,
                        adOpenDynamic, adLockOptimistic)
                f = open(self.file_path + os.path.sep + fname, "r")
                data = f.readline()
                print repr(data)
                while data != '':
                    data = eval(data.strip())
                    print data[0]
                    print rs.Fields.Count
                    rs.AddNew()
                    for i in range(rs.Fields.Count):
                        if verbosity > 1:
                            print "Into field %s (type %s) insert value %s." % (
                                rs.Fields[i].Name, str(rs.Fields[i].Type),
                                data[i])
                        rs.Fields[i].Value = data[i]
                    data = f.readline()
                    print repr(data)
                    rs.Update()
                rs.Close()
        conn.Close()
Everything works fine except that numerical values (double and int) are being inserted as zeros. Any ideas on whether the problem is with my code, eval, comtypes, or ADO?
Edit: I've fixed the problem with inserting numbers - casting them as strings(!) seems to solve the problem for both double and integer fields.
However, I now have a different issue that had previously been obscured by the above: the first field in every row is being set to 0 regardless of data type... Any ideas?
And found an answer.
rs = client.CreateObject("ADODB.Recordset")
Needs to be:
rs = client.CreateObject("ADODB.Recordset", dynamic=True)
Now I just need to look into why. Just hope this question saves someone else a few hours...
Is data[i] being treated as a string? What happens if you specifically cast it as a int/double when you set rs.Fields[i].Value?
Also, what happens when you print out the contents of rs.Fields[i].Value after it is set?
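For illustration, that experiment could look something like the following inside the field loop (a hypothetical sketch; field types other than int/float would need their own handling):

value = data[i]
# Hypothetical explicit casts before assignment, to check whether the
# zeros come from type coercion across the COM bridge:
if isinstance(value, int):
    rs.Fields[i].Value = int(value)
elif isinstance(value, float):
    rs.Fields[i].Value = float(value)
else:
    rs.Fields[i].Value = value
print rs.Fields[i].Value  # inspect what was actually stored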
Not a complete answer yet, but it appears to be a problem during the update. I've added some further debugging code in the insertion process which generates the following (example of a single row being updated):
Inserted into field ID (type 3) insert value 1, field value now 1.
Inserted into field TextField (type 202) insert value u'Blah', field value now Blah.
Inserted into field Numbers (type 5) insert value 55.0, field value now 55.0.
After update: [0, u'Blah', 55.0]
The last value in each "Inserted..." line is the result of calling rs.Fields[i].Value before calling rs.Update(). The "After..." line shows the results of calling rs.Fields[i].Value after calling rs.Update().
What's even more annoying is that it's not failing reliably. Rerunning the exact same code on the same records a few minutes later generated:
Inserted into field ID (type 3) insert value 1, field value now 1.
Inserted into field TextField (type 202) insert value u'Blah', field value now Blah.
Inserted into field Numbers (type 5) insert value 55.0, field value now 55.0.
After update: [1, u'Blah', 2.0]
As you can see, results are reliable until you commit them, then... not.
