I am using SQLAlchemy with declarative base and Python 2.6.7 to insert data in a loop into an SQLite database.
As brief background, I have implemented a dictionary approach to creating a set of variables in a loop. What I am trying to do is scrape some data from a website, and have between 1 and 12 pieces of data in the following element:
overall_star_ratings = doc.findall("//div[@id='maincontent2']/div/table/tr[2]//td/img")
count_stars = len(overall_star_ratings)
In an empty SQLite database I have columns "t1_star" ... "t12_star", and I want to iterate over the list of values in "overall_star_ratings" and assign them to those columns; the number of values varies depending on the page. I'm using SQLAlchemy, so (in highly inefficient code) what I'm looking to do is assign the values and insert them into the DB as follows (I'm looping through 'rows' in the code, such that 'row' inserts the value for *t1_star* into the database column 't1_star', etc.):
if count_stars==2:
    row.t1_star = overall_star_ratings[0].get('alt')
    row.t2_star = overall_star_ratings[1].get('alt')
elif count_stars==1:
    row.t1_star = overall_star_ratings[0].get('alt')
This works but is highly inefficient, so I implemented a "dictionary" approach to creating the variables, as I've seen in some "variable variables" questions on Stack Overflow. So, here is what I've tried:
d = {}
for x in range(1, count_stars+1):
    count = x-1
    d["t{0}_star".format(x)] = overall_star_ratings[count].get('alt')
This works for creating the 't1_star', 't2_star' keys for the dictionary as well as the values. The problem comes when I try to insert the data into the database. I have tried adding the following to the above loop:
key = "t{0}_star".format(x)
value = d["t{0}_star".format(x)]
row.key = value
I've also tried adding the following after the above loop is completed:
for key, value in d.items():
    row.key = value
The problem is that it is not inserting anything. It appears that the problem is in the row.key part of the script, not in the value, but I am not certain of that. From all that I can see, the keys are the same strings as I'm seeing when I do it the "inefficient" way (i.e., t1_star, etc.), so I'm not sure why this isn't working.
Any suggestions would be greatly appreciated!
Thanks,
Greg
Python attribute access doesn't work like that. row.key looks up the attribute with the literal name "key", not the value that's in the variable key.
You probably need to use setattr:
setattr(row, key, value)
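To make the difference concrete, here is a minimal sketch of the setattr approach. The Row class is a hypothetical stand-in for the mapped SQLAlchemy class, and the list of strings stands in for the scraped 'alt' values; neither comes from the original code.

```python
class Row(object):
    """Hypothetical stand-in for the mapped SQLAlchemy class."""
    t1_star = None
    t2_star = None


# Assumed sample values, as they might come out of the scraping step.
alts = ["4 stars", "3 stars"]

row = Row()
for position, alt in enumerate(alts, start=1):
    # Build the attribute name as a string, then set the attribute by name.
    setattr(row, "t{0}_star".format(position), alt)

print(row.t1_star, row.t2_star)
```

Because the name is passed as a string, no attribute literally called "key" is ever created on the row, which is what row.key = value was doing.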
I need to conditionally update an Oracle table from my Python code. It's a simple piece of code, but I encountered cx_Oracle.DatabaseError: ORA-01036: illegal variable name/number with the following attempts:
id_as_list = ['id-1', 'id-2'] # list of row IDs in the DB table
id_as_list_of_tuples = [('id-1'), ('id-2')] # the same as list of tuples
sql_update = "update my_table set processed = 1 where object_id = :1"
# then when I tried any of following commands, result was "illegal variable name/number"
cursor.executemany(sql_update, id_as_list) # -> ends with error
cursor.executemany(sql_update, id_as_list_of_tuples) # -> ends with error
for id in id_as_list:
    cursor.execute(sql_update, id) # -> ends with error
The solution that worked was to use a list of dictionaries and the key name in the SQL statement:
id_as_list_of_dicts = [{'id': 'id-1'}, {'id': 'id-2'}]
sql_update = "update my_table set processed = 1 where object_id = :id"
cursor.executemany(sql_update, id_as_list_of_dicts) # -> works
for id in id_as_list_of_dicts:
    cursor.execute(sql_update, id) # -> also works
I've found some help pages and tutorials, and they all used the ":1, :2,..." syntax (on the other hand, I haven't found any example with update and cx_Oracle). Although my issue has been solved with the help of dictionaries, I wonder whether that is the common way to do an update or whether I am doing something wrong with the ":1, :2,..." syntax.
Oracle 12c, Python 3.7, cx_Oracle 7.2.1
You can indeed bind with dictionaries but the overhead of creating the dictionaries can be undesirable. You need to make sure you create a list of sequences when using executemany(). So in your case, you want something like this instead:
id_as_list = [['id-1'], ['id-2']] # list of row IDs in the DB table
id_as_list_of_tuples = [('id-1',), ('id-2',)] # the same as list of tuples
In the first instance you had a list of strings. Strings are sequences in their own right so in that case cx_Oracle was expecting 4 bind variables (the number of characters in each string).
In the second instance you had the same data as the first instance -- as you were simply including parentheses around the strings, not creating tuples! You need the trailing comma as shown in my example to create tuples as you thought you were creating!
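The point about parentheses versus tuples can be checked in plain Python, without a database connection. This is only an illustrative sketch; the variable names are made up for the example.

```python
not_a_tuple = ('id-1')   # parentheses only group the expression: still a str
one_tuple = ('id-1',)    # the trailing comma is what makes a 1-element tuple

print(type(not_a_tuple).__name__, len(not_a_tuple))
print(type(one_tuple).__name__, len(one_tuple))

# Turning a flat list of IDs into a list of 1-element sequences.
ids = ['id-1', 'id-2']
bind_rows = [(object_id,) for object_id in ids]
print(bind_rows)
```

A list shaped like bind_rows is the "list of sequences" form described above for executemany() with a single positional bind variable.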
I basically have this test2 file that has domain information that I want to use so I strip the additional stuff and get just the domain names as new_list.
What I then want to do is query a database with these domain names, pull the name and severity score, and then (the part I'm really having a hard time with) end up with a stored list (or tuple) of the pulled domains and severity scores that I can use later.
It is a psql database for reference but my problem lies more on the managing after the query.
I'm still really new to Python and mainly did a bit of Java so my code probably looks terrible, but I've tried converting to strings and tried appending to a list at the end but I am quite unsuccessful with most of it.
def get_new():
    data = []
    with open('test2.txt', 'r') as file:
        data = [line.rstrip('\n') for line in file]
    return data
new_list = get_new()
def db_query():
    cur = connect.cursor()
    query ="SELECT name, de.severity_score FROM domains d JOIN ips i ON i.domain_id = d.id JOIN domains_extended de ON de.domain_id = d.id WHERE name = '"
    for x in new_list:
        var = query + x + "'"
        cur.execute(var)
        get = cur.fetchall()
        # STORE THE LOOPED QUERIES INTO A VARIABLE OF SOME KIND (problem area)
    print(results)
    cur.close()
    connect.close()
db_query()
Happy place: Takes domain names from file, uses those domain names a part of the query parameters to get severity score associated, then stores it into a variable of some sort so that I can use those values later (in a loop or some logic).
I've tried everything I could think of and ran into errors with it being a query that I'm trying to store, lists won't combine, etc.
I would make sure that your get_new() function is returning what you expect from that file. Just iterate over your new_list to check.
There is no reference to results in your db_query() function (perhaps it is global, like new_list), but try printing the result of the query, that is, print(get) in your for loop, and see what comes out. If this works then you can create a list to append to.
Well, first off, your code resets the get variable on every pass through the loop. Fix that by initializing get = [] above the loop and using get.extend(cur.fetchall()) inside the loop instead of the current statement. You could then do something like domainNames = [row[0] for row in get]. If get is loading properly, though, getting the values out of it should be no problem.
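As a rough sketch of that accumulate-in-a-list pattern: the example below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the real PostgreSQL connection, and a single simplified domains table with made-up rows instead of the joined schema from the question. It also passes the domain name as a bound parameter rather than concatenating it into the SQL string; the placeholder style depends on the driver (sqlite3 uses ?, while psycopg2, for example, uses %s).

```python
import sqlite3

# Stand-in database: one simplified table with assumed sample rows.
connect = sqlite3.connect(":memory:")
cur = connect.cursor()
cur.execute("CREATE TABLE domains (name TEXT, severity_score INTEGER)")
cur.executemany(
    "INSERT INTO domains VALUES (?, ?)",
    [("example.com", 7), ("example.org", 3)],
)

# Stand-in for the names read from the file.
new_list = ["example.com", "example.org", "missing.example"]

results = []  # created once, before the loop
for name in new_list:
    cur.execute(
        "SELECT name, severity_score FROM domains WHERE name = ?", (name,)
    )
    results.extend(cur.fetchall())  # keep every row from every query

cur.close()
connect.close()

domain_names = [row[0] for row in results]
scores = dict(results)  # maps name -> severity_score
print(domain_names, scores)
```

After the loop, results is an ordinary list of (name, severity_score) tuples, so it can be looped over, indexed, or turned into a dict as shown.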
I am trying to fetch data from MySQL into a Python dictionary.
Here is my code:
def getAllLeadsForThisYear():
    charges={}
    cur.execute("select lead_id,extract(month from transaction_date),pid,extract(Year from transaction_date) from transaction where lead_id is not NULL and transaction_type='CHARGE' and YEAR(transaction_date)='2015'")
    for i in cur.fetchall():
        lead_id=i[0]
        month=i[1]
        pid=i[2]
        year=str(i[3])
        new={lead_id:[month,pid,year]}
        charges.update(new)
    return charges
x=getAllLeadsForThisYear()
When I print len(x.keys()) it gives me some number, say 450.
When I run the same query in MySQL it returns 500 rows. I do have some repeated keys in the result, but I thought they would still be counted, since I never added a check like if i not in charges.keys(). Please correct me if I am wrong.
Thanks
As I said, the problem is that you are overwriting your value at a key every time a duplicate key pops up. This can be fixed two ways:
You can do a check before adding a new value and if the key already exists, append to the already existing list.
For example:
#change these lines
new={lead_id:[month,pid,year]}
charges.update(new)
#to
if lead_id in charges:
    charges[lead_id].extend([month,pid,year])
else:
    charges[lead_id] = [month,pid,year]
Which gives you a structure like this:
charges = {
    '123':[month1,pid1,year1,month2,pid2,year2,..etc]
}
With this approach, you can reach each separate entry by chunking the value at each key by chunks of 3 (this may be useful)
However, I don't really like this approach because it requires you to do that chunking. Which brings me to approach 2.
Use defaultdict from collections which acts in the exact same way as a normal dict would except that it defaults a value when you try to call a key that hasn't already been made.
For example:
#change
charges={}
#to
charges=defaultdict(list)
#and change
new={lead_id:[month,pid,year]}
charges.update(new)
#to
charges[lead_id].append((month,pid,year))
which gives you a structure like this:
charges = {
    '123':[(month1,pid1,year1),(month2,pid2,year2),..etc]
}
With this approach, you can now iterate through each list at each key with:
for key in charges:
    for entities in charges[key]:
        print(entities) # would print `(month,pid,year)` for each separate entry
If you are using this approach, don't forget to add from collections import defaultdict. If you don't want the extra import, you can mimic this with:
if lead_id in charges:
    charges[lead_id].append((month,pid,year))
else:
    charges[lead_id] = [(month,pid,year)]
Which is incredibly similar to the first approach but does the explicit "create a list if the key isn't there" that defaultdict would do implicitly.
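Here is a small self-contained sketch of the defaultdict approach. The rows list is made-up sample data standing in for cur.fetchall(); it deliberately contains a repeated lead_id to show why the number of keys can be smaller than the number of rows.

```python
from collections import defaultdict

# Assumed sample rows: (lead_id, month, pid, year); lead 101 appears twice.
rows = [
    (101, 1, "p1", 2015),
    (102, 2, "p2", 2015),
    (101, 3, "p3", 2015),
]

charges = defaultdict(list)
for lead_id, month, pid, year in rows:
    # A missing key gets an empty list automatically, so append always works.
    charges[lead_id].append((month, pid, str(year)))

total_rows = sum(len(entries) for entries in charges.values())
print(len(charges), total_rows)
```

Here len(charges) counts distinct lead_id values, while total_rows adds up every stored entry, so the second number matches the row count returned by the query.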
I'm trying to figure out if it's possible to replace record values in a Microsoft Access (either .accdb or .mdb) database using pyodbc. I've pored over the documentation and noted where it says that "Row Values Can Be Replaced" but I have not been able to make it work.
More specifically, I'm attempting to replace a row value from a python variable. I've tried:
setting the connection autocommit to "True"
making sure that it's not a data type issue
Here is a snippet of the code where I'm executing a SQL query, using fetchone() to grab just one record (I know with this script the query is only returning one record), then I am grabbing the existing value for a field (the field position integer is stored in the z variable), and then am getting the new value I want to write to the field by accessing it from an existing python dictionary created in the script.
pSQL = "SELECT * FROM %s WHERE %s = '%s'" % (reviewTBL, newID, basinID)
cursor.execute(pSQL)
record = cursor.fetchone()
if record:
    oldVal = record[z]
    val = codeCrosswalk[oldVal]
    record[z] = val
I've tried everything I can think of but cannot get it to work. Am I just misunderstanding the help documentation?
The script runs successfully but the newly assigned value never seems to commit. I even tried putting print str(record[z]) after the record[z] = val line to see if the field in the table has the new value, and the new value printed as if it had worked... but when I check the table after the script has finished, the old values are still in the table field.
Much appreciate any insight into this...I was hoping this would work like how using VBA in MS Access databases you can use an ADO Recordset to loop through records in a table and assign values to a field from a variable.
thanks,
Tom
The "Row values can be replaced" from the pyodbc documentation refers to the fact that you can modify the values on the returned row objects, for example to perform some cleanup or conversion before you start using them. It does not mean that these changes will automatically be persisted in the database. You will have to use SQL UPDATE statements for that.
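A minimal sketch of the select, look up, then UPDATE flow is shown below. To keep it runnable without an Access file, it uses Python's built-in sqlite3 module as a stand-in for the pyodbc connection; the table name review, its columns, and the sample values are all assumptions made for the example. Both sqlite3 and pyodbc use ? as the parameter marker, but the SQL itself may need adjusting for Access.

```python
import sqlite3

# Stand-in connection and table (assumed names and data).
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE review (basin_id TEXT, code TEXT)")
cursor.execute("INSERT INTO review VALUES ('B1', 'old')")

codeCrosswalk = {"old": "new"}  # old value -> replacement value

cursor.execute("SELECT code FROM review WHERE basin_id = ?", ("B1",))
record = cursor.fetchone()
if record:
    val = codeCrosswalk[record[0]]
    # Changing the fetched row would only change the in-memory copy;
    # the database is modified by an explicit UPDATE.
    cursor.execute(
        "UPDATE review SET code = ? WHERE basin_id = ?", (val, "B1")
    )
    conn.commit()

# Read the row back to confirm the stored value changed.
cursor.execute("SELECT code FROM review WHERE basin_id = ?", ("B1",))
stored = cursor.fetchone()[0]
print(stored)
```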
I have a table in a MySQL DB which I want to load into a dictionary in Python.
The table columns are as follows:
id,url,tag,tagCount
tagCount is the number of times that a tag has been repeated for a certain url. So in that case I need a nested dictionary, in other words a dictionary of dictionaries, to load this table, because each url has several tags with different tagCounts. The code that I used is this (the whole table is about 22,000 records):
cursor.execute( ''' SELECT url,tag,tagCount
FROM wtp ''')
urlTagCount = cursor.fetchall()
d = defaultdict(defaultdict)
for url,tag,tagCount in urlTagCount:
    d[url][tag]=tagCount
print d
First of all, I want to know if this is correct, and if it is, why does it take so much time? Are there any faster solutions? I am loading this table into memory for fast access, to get rid of the hassle of slow database operations, but at this speed it has become a bottleneck itself; it is even much slower than DB access. Can anyone help? Thanks.
You need to ensure that the dictionary (and each of the nested dictionaries) exist before you assign a key, value to them. It is helpful to use setdefault for this purpose. You end up with something like this:
d = {}
for url, tag, tagCount in urlTagCount:
    d.setdefault(url, {})[tag] = tagCount
Maybe you could try with normal dicts and tuple keys, like:
d = dict()
for url,tag,tagCount in urlTagCount:
    d[(url, tag)] = tagCount
In any case, did you try:
d = defaultdict(dict)
instead of
d = defaultdict(defaultdict)
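For comparison, here is a small sketch of two of the suggestions side by side: defaultdict(dict) for a nested structure, and a flat dict keyed by (url, tag) tuples. The urlTagCount list is made-up sample data standing in for cursor.fetchall().

```python
from collections import defaultdict

# Assumed sample rows: (url, tag, tagCount).
urlTagCount = [
    ("http://a.example", "python", 3),
    ("http://a.example", "sql", 1),
    ("http://b.example", "python", 2),
]

# Nested: url -> {tag: tagCount}
nested = defaultdict(dict)
for url, tag, tagCount in urlTagCount:
    nested[url][tag] = tagCount

# Flat: (url, tag) -> tagCount
flat = {}
for url, tag, tagCount in urlTagCount:
    flat[(url, tag)] = tagCount

print(nested["http://a.example"]["sql"], flat[("http://b.example", "python")])
```

The nested form is convenient when all tags of one url are needed at once; the flat form is simpler when lookups always supply both url and tag.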
I managed to verify the code, and it works perfectly. For amateurs like me: I suggest never trying to "print" a very large nested dictionary. The "print d" in the last line of the code was what made it slow. If you remove it, or access the dictionary with actual keys, it is very fast.