Generating dictionary of lists in a loop (3.2.3) - python

I need to create a dictionary whose keys are strings and whose values are lists. The trick is that I need to build it in a loop.
My minimized code currently looks like this:
for elem in xmlTree.iter():
    # skipping root element
    if elem.tag == xmlTree.getroot().tag:
        continue
    # this is supposed to be my temporary list
    tmpList = []
    for child in elem:
        tableWColumns[elem.tag] = tmpList.append(child.tag)
print(tableWColumns)
This prints only the list created in the last iteration.
The problem apparently lies in the fact that whenever I change the list, all references to it see the change as well. I Googled that. What I haven't been able to find is how to deal with it when using a loop.
The solution I am supposed to use when I want to keep the list is to copy it to another list, so that I can then change the original without losing data. What I don't know is how to do that when I need to do it dynamically.
Also, I am limited to the standard library only.

The problem is that you create a new list with tmpList = [] in each iteration and assign the result to the key, so Python replaces the old value with the new one every time; that is why you see only the last iteration's result.
Instead, you can use collections.defaultdict:
from collections import defaultdict
d = defaultdict(list)
for elem in xmlTree.iter():
    # skipping root element
    if elem.tag == xmlTree.getroot().tag:
        continue
    # missing keys start out as an empty list, so append works directly
    for child in elem:
        d[elem.tag].append(child.tag)
print(d)
Or you can use the dict.setdefault method:
d = {}
for elem in xmlTree.iter():
    # skipping root element
    if elem.tag == xmlTree.getroot().tag:
        continue
    # setdefault returns the existing list, or stores and returns a new one
    for child in elem:
        d.setdefault(elem.tag, []).append(child.tag)
print(d)
Also note, as @abarnert says, that tmpList.append(child.tag) returns None, so the assignment actually stores None in tableWColumns[elem.tag].

The big problem here is that tmpList.append(child.tag) returns None. In fact, almost all mutating methods in Python return None.
To fix that, you can either do the mutation, then insert the value in a separate statement:
for child in elem:
    tmpList.append(child.tag)
tableWColumns[elem.tag] = tmpList
… or not try to mutate the list in the first place. For example
tableWColumns[elem.tag] = tmpList + [child.tag for child in elem]
That will get rid of your all-values-are-None problem, but then you've got a new problem. If any tag appears more than once, you're only going to get the children from the last copy of that tag, not from all copies. That's because you build a new list each time, and reassign tableWColumns[elem.tag] to that new list, instead of modifying whatever was there.
To solve that problem, you need to fetch the existing value into tmpList instead of creating a new one:
tmpList = tableWColumns.get(elem.tag, [])
tableWColumns[elem.tag] = tmpList + [child.tag for child in elem]
Or, as Kasra's answer says, you can simplify this by using a defaultdict or the setdefault method.
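Putting the answers above together, here is a minimal self-contained sketch. The XML snippet and the names xml_source and table_w_columns are made up for illustration; only xml.etree.ElementTree and dict.setdefault from the standard library are used, and the root is skipped by identity rather than by comparing tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; "table" appears twice on purpose.
xml_source = """
<root>
    <table><id/><name/></table>
    <table><price/></table>
    <other><flag/></other>
</root>
"""

root = ET.fromstring(xml_source)
table_w_columns = {}

for elem in root.iter():
    # Skip the root element itself.
    if elem is root:
        continue
    for child in elem:
        # setdefault returns the list stored under the key (creating it
        # first if needed), so append mutates the stored list in place.
        table_w_columns.setdefault(elem.tag, []).append(child.tag)

print(table_w_columns)
```

Because the existing list is fetched instead of being replaced, the children of both "table" elements end up under the same key. Elements without children never get a key in this sketch.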

Related

How do I index f.items()?

I could run a for loop as such:
for v in f.items():
But it takes too much time. I know I want the second object in f.items(). How do I get the second object directly and save it?
I'm not sure what the syntax is, e.g. is it f.items(2) or f.items()[2]? Neither of these works, so I was wondering what does.
If you want the values (the second element of each pair) from f.items(), use:
for k, v in f.items():
Or, if you want the 2nd item from f.items(), use:
f = {1:'A',2:'B',3:'C'}
for position, item in enumerate(f.items(), 1):
    if position == 2:
        print(item)
Do you still want to extract the 2nd value from the 2nd item?
You can create a list and then index.
item = list(f.items())[1]
Lists are created often in Python, and this operation is relatively inexpensive. If your dictionary is large, you can instead create an iterator and take its second value.
i = iter(f.items())
next(i)
item = next(i)
But the dict would need to be rather large to make this the better option.
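As a variation on the iterator approach above, itertools.islice from the standard library can skip ahead without building a list. This is only a sketch: the variable name second_item is invented here, and "second" is only meaningful because dicts preserve insertion order (guaranteed from Python 3.7 on).

```python
from itertools import islice

f = {1: 'A', 2: 'B', 3: 'C'}

# islice(iterable, start, stop) skips `start` items lazily; next() then
# pulls exactly one item, so no intermediate list is built.
second_item = next(islice(f.items(), 1, 2))
key, value = second_item
print(key, value)
```

If the dict has fewer than two entries, next() raises StopIteration; passing a default as the second argument to next() avoids that.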

Proper way to iterate through a list which may not be a list

I am currently reading information from an XML file.
If an XML tag has more than one child, the children are returned as a list inside that tag; if the tag has only one child, it is returned as a plain string rather than a list.
My question is: is there a better way to iterate over this tag? If it is a list, iterate over every element; if it is a string, iterate only once.
This is my current approach:
# check if tag is a list, if not then make a list with empty slot at end
if not isinstance(accents['ac'], list):
    accents['ac'] = list((accents['ac'], {}))
# loop through guaranteed list
for ac in accents['ac']:  # this line throws error if not list object!
    # if the empty slot added is encountered at end, break out of loop
    if bool(ac) == False:
        break
Any ideas on how to make this cleaner or more professional are appreciated.
Assuming that the problem is caused by accents['ac'] being either a list of strings or a single string, a simple approach could be:
# check if tag is a list, if not then wrap the single value in a list
if not isinstance(accents['ac'], list):
    accents['ac'] = [accents['ac']]
# loop through guaranteed list
for ac in accents['ac']:
    ...
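The same normalization can be pulled into a small helper so the original data is not modified. This is a sketch only: as_list and the two sample dicts are hypothetical names introduced for illustration.

```python
def as_list(value):
    """Return value unchanged if it is a list, else wrap it in a one-item list."""
    return value if isinstance(value, list) else [value]

# Hypothetical parsed data: one tag with several children, one with a single child.
many = {'ac': ['first', 'second']}
single = {'ac': 'only'}

for accents in (many, single):
    for ac in as_list(accents['ac']):
        print(ac)
```

Wrapping the single value matters because iterating over a string directly would loop over its characters rather than over one item.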
For readability's sake, it might be better to do
if isinstance(accents['ac'], str):
    pass  # insert what you want to happen here when it is a string
else:
    for ac in accents['ac']:
        pass  # insert what you want to happen here when it is a list
Supposing that I want to add them to a list, I would first check whether it is a nested tag and then add them.
tag_list = []
if isinstance(accents['ac'], list):
    # several children: add each one
    for tag in accents['ac']:
        tag_list.append(tag)
else:
    # a single child: add the value itself
    tag_list.append(accents['ac'])

In Python how to make sure item in a list be processed successfully

I try this:
for k in keywords_list:
    google_add = random.choice(google_adds_list)
    url = make_up_url(google_add, k, False)
    if scrape_keyword_count(k, useragent_list, url, result_dir):
        keyword_count = scrape_keyword_count(k, useragent_list, url, result_dir)
        all_keyword_count.append(keyword_count)
        print '%s Finish. Removeing it from the list' % k
        keywords_list.remove(k)
    else:
        print "%s may run into problem, removing it from list" % google_add
        google_adds_list.remove(google_add)
        with open(google_adds, 'w') as f:
            f.write('\n'.join(google_adds_list))
I have set up many reverse proxy servers for Google; the server list is google_adds_list.
I want to search for every item in the keyword list using an address I provide and collect the result.
If Google blocks me, scrape_keyword_count() returns None, and I then need to switch to another address to do the search. But the script I wrote skips the keyword whether scrape_keyword_count() succeeds or not.
I know removing an item inside the for loop is dangerous; I will improve that part later.
The problem here is that you're modifying the list while iterating over it.
Use "for i in the_list[:]" instead. This will iterate over a copy of the list, fixing your "skipping" (missed elements) issue.
The for loop will use each item once... Your code looks fine... But I think you may not be returning the correct value in do_something_with
Try this:
for i in the_list:
    value = do_something_with(i)
    print bool(value)
    if value:
        the_list.remove(i)
<etc> <etc>
I think you may always be returning True from do_something_with
Perhaps:
new_list = []
for i in the_list:
    if do_something_with(i):
        continue
    do_something_else(i)
    new_list.append(i)
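If do_something_with(i) succeeds, the loop continues with the next item; otherwise it calls do_something_else(i) and keeps the item in new_list.
You shouldn't mutate a list while iterating over it: the loop tracks its position by index, so each removal shifts the remaining items and the next one is skipped. If the filtered list is needed, produce a new one rather than removing elements from the old one.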
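The skipping behaviour described in these answers is easy to reproduce. The following is a small sketch in Python 3; succeeds is a hypothetical stand-in for the real work, and the numbers are arbitrary.

```python
def succeeds(item):
    # Hypothetical stand-in for the real work: even numbers "succeed".
    return item % 2 == 0

# Removing while iterating: the loop skips the element after each removal.
broken = [2, 4, 6, 7]
for item in broken:
    if succeeds(item):
        broken.remove(item)

# Iterating over a copy (fixed[:]) visits every element.
fixed = [2, 4, 6, 7]
for item in fixed[:]:
    if succeeds(item):
        fixed.remove(item)

# Building a new list avoids mutation entirely.
remaining = [item for item in [2, 4, 6, 7] if not succeeds(item)]

print(broken, fixed, remaining)
```

In the first loop the 4 is never examined, because removing the 2 shifts it into the position the loop has already passed.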

OOP python - removing class instance from a list

I have a list in which I store the objects created by a specific class.
I can't work out how to delete an instance of the class from that list.
The deletion should be based on knowing one attribute of the object.
Iterate through the list, find the object and its position, then delete it:
for i, o in enumerate(obj_list):
    if o.attr == known_value:
        del obj_list[i]
        break
You could use a list comprehension:
thelist = [item for item in thelist if item.attribute != somevalue]
This will remove all items with item.attribute == somevalue.
If you wish to remove just one such item, then use WolframH's solution.
You could have stored them in a dict and removed them by name
di = {"test" : my_instance()}
del di['test']
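To make the difference between the first two approaches concrete, here is a short sketch. The Player class, its name attribute, and the sample values are hypothetical and exist only for this illustration.

```python
class Player:
    # Hypothetical class used only for this illustration.
    def __init__(self, name):
        self.name = name

players = [Player('ann'), Player('bob'), Player('ann')]

# Remove only the first match: find its index, delete it, then stop.
# Breaking immediately is what makes deleting inside the loop safe here.
for i, p in enumerate(players):
    if p.name == 'ann':
        del players[i]
        break
first_removed = [p.name for p in players]

# Remove every match: build a new list without them.
players = [p for p in players if p.name != 'ann']
all_removed = [p.name for p in players]

print(first_removed, all_removed)
```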

Python: problem with list append

Here is my code -
cumulative_nodes_found_list = []
cumulative_nodes_found_total_list = []
no_of_runs = 10
count = 0
while count < no_of_runs:
    #My program code
    print 'cumulative_nodes_found_list - ' + str(cumulative_nodes_found_list)
    cumulative_nodes_found_total_list.insert(count,cumulative_nodes_found_list)
    print 'cumulative_nodes_found_total_list - ' + str(cumulative_nodes_found_total_list)
    count = count + 1
Here is a part of the output -
#count = 0
cumulative_nodes_found_list - [0.0, 0.4693999, 0.6482, 0.6927999999, 0.7208999999, 0.7561999999, 0.783399999, 0.813999999, 0.8300999999, 0.8498, 0.8621999999]
cumulative_nodes_found_total_list - [[0.0, 0.4693999, 0.6482, 0.6927999999, 0.7208999999, 0.7561999999, 0.783399999, 0.813999999, 0.8300999999, 0.8498, 0.8621999999]]
#count = 1
cumulative_nodes_found_list - [0.0, 0.55979999999999996, 0.66220000000000001, 0.69479999999999997, 0.72040000000000004, 0.75380000000000003, 0.77629999999999999, 0.79679999999999995, 0.82979999999999998, 0.84850000000000003, 0.85760000000000003]
cumulative_nodes_found_total_list -[[0.0, 0.55979999999999996, 0.66220000000000001, 0.69479999999999997, 0.72040000000000004, 0.75380000000000003, 0.77629999999999999, 0.79679999999999995, 0.82979999999999998, 0.84850000000000003, 0.85760000000000003],
[0.0, 0.55979999999999996, 0.66220000000000001, 0.69479999999999997, 0.72040000000000004, 0.75380000000000003, 0.77629999999999999, 0.79679999999999995, 0.82979999999999998, 0.84850000000000003, 0.85760000000000003]]
As the new item is appended, the old item is replaced by the new item. This trend continues.
Can anyone tell me why this is happening? I have tried using 'append' in place of 'insert' but got the same output. When I use 'extend' I get the correct values, but I need the inner items to be lists, which I don't get with 'extend'.
You need to rebind cumulative_nodes_found_list at the beginning of the loop, instead of just clearing it.
This is psychic debugging at its best, since you're effectively asking "what is wrong with my code, which I'm not going to show to you".
All I can do is assume.
I'm assuming you're re-using the array objects in memory.
In other words, you do something like this:
list1.insert(0, list2)
list2.clear()
list2.append(10)
list2.append(15)
list1.insert(0, list2)
Since list2 refers to the same list object the whole time, and you're adding a reference to that object rather than a copy of it, later changes make it appear that the stored copy changed.
In other words, the result of the code above is going to be:
[[10, 15], [10, 15]]
regardless of what was in the list before you added it the first time.
Try assigning the changing list a new, empty, object each time you enter the loop body and see if that fixes anything.
You are adding a reference to cumulative_nodes_found_list to the cumulative_nodes_found_total_list, but it's the same reference each time. Move this line into the loop body:
cumulative_nodes_found_list = []
Lists are mutable objects. You're mutating cumulative_nodes_found_list inside your code, so the object added to your total list in the previous run is also mutated, because they are the same object.
Either make a copy to insert in the total:
for count in xrange(no_of_runs):
    # ...
    cumulative_nodes_found_total_list.append(list(cumulative_nodes_found_list))
... or reset the list on each iteration:
for count in xrange(no_of_runs):
    cumulative_nodes_found_list = [] # creates a NEW list for this iteration
    # ...
    cumulative_nodes_found_total_list.append(cumulative_nodes_found_list)
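The difference between reusing one list object and creating a new one per iteration can be seen in a few lines. This is a sketch in Python 3 with invented names (shared, fresh, totals_shared, totals_fresh), not the asker's actual program.

```python
# Reusing one list object: every entry in the totals list is the same object.
shared = []
totals_shared = []
for run in range(3):
    shared.clear()          # the same object is emptied and reused
    shared.append(run)
    totals_shared.append(shared)

# Rebinding to a new list: each entry is an independent object.
totals_fresh = []
for run in range(3):
    fresh = [run]           # a new list object on every iteration
    totals_fresh.append(fresh)

print(totals_shared)
print(totals_fresh)
print(totals_shared[0] is totals_shared[1])
```

The identity check with "is" confirms that the first totals list holds three references to a single list, which is why every row shows the values from the last run.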
I believe the problem is in the rest of your program code.
The items in cumulative_nodes_found_list are being replaced in place each time through the loop.
I assume you're doing something like this:
while count < no_of_runs:
    cumulative_nodes_found_list.clear()
    #fill up the list with values using whatever program logic you have
    cumulative_nodes_found_list.append(1.1)
    cumulative_nodes_found_list.append(2.1)
    print 'cumulative_nodes_found_list - ' + str(cumulative_nodes_found_list)
    cumulative_nodes_found_total_list.insert(count,cumulative_nodes_found_list)
    print 'cumulative_nodes_found_total_list - ' + str(cumulative_nodes_found_total_list)
    count = count + 1
If this is, in fact, what you're doing, then instead of using 'clear()' to clear the list, create a new one:
i.e., replace cumulative_nodes_found_list.clear() with
cumulative_nodes_found_list = []
My guess is that you are not assigning cumulative_nodes_found_list to a new list each time, but updating its contents instead. So each time around the loop you are adding the same list reference to the total list. Since the reference within the totals list is the same object, when you update this list the next time around the loop, it affects what you hoped were the previous loop's values.
If you want to append to a list, use mylist.append(item) instead.
Also, if you iterate a fixed number of times it's better to use a for loop:
for i in range(no_of_runs):
    # do stuff
The idea is that range(no_of_runs) produces the values 0, 1, 2, ..., 9 for no_of_runs = 10, and the loop then iterates over them.
Edit: this doesn't solve the problem. Other answers in this thread do, however. It's just a comment on style.
This method worked for me. Just like you, I was trying to append/insert a list into another list.
cumulative_nodes_found_total_list.insert(count,cumulative_nodes_found_list)
But the old values were being overwritten by the new values. So instead I tried this, which inserts a copy of the list:
cumulative_nodes_found_total_list.insert(count,cumulative_nodes_found_list[:])
"Assignment statements in Python do not copy objects, they create
bindings between a target and an object."
Use deepcopy (or copy)
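The quoted sentence and the copy module's two functions can be illustrated as follows. This is a minimal sketch; the variable names and values are chosen only for the example.

```python
import copy

original = [[1, 2], [3, 4]]

alias = original                  # no copy: same object
shallow = copy.copy(original)     # new outer list, inner lists shared
deep = copy.deepcopy(original)    # new outer list and new inner lists

original.append([5, 6])           # changes the outer list
original[0].append(99)            # changes a shared inner list

print(alias)
print(shallow)
print(deep)
```

For a flat list of numbers like the one in the question, a shallow copy (copy.copy, list(x), or x[:]) is enough; deepcopy only matters when the list contains other mutable objects.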
