Populating Networkx Graph with info iteratively - python

I have been trying to develop a graph structure that will link entities according to co-mentioned features between them, e.g. 2 places are linked if co-mentioned in an article.
I have managed to do so, but I am having trouble iteratively adding new information to an edge while keeping what is already there.
My approach (since I haven't found anything related anywhere) is to append existing information to a list, append the new link in the list and assign that list to the appropriate feature.
temp = []
if G.has_edge(i[z],i[j]):
    temp.append(G[i[z]][i[j]]['article'])
    temp.append(url[index])
    G[i[z]][i[j]]['article'] = temp
else:
    print "Create edge!"
    G.add_edge(i[z],i[j], article=url)
del temp[:]
As you can see above, since there are many links to populate, I defined a dedicated list (temp) and loaded into it the old contents of a link's attribute called article. If the link does not exist, I create it and add as its first value the url that "brought" the 2 places together.
My problem is that, although I empty the list each time so that it is empty when a new pair comes in, when I inspect a link's urls I get something like this:
{'article': [[...], u'http://www.huffingtonpost.co.uk/.../']
It seems like I am keeping only the last link, since each time I delete the temporary list's contents, but I cannot find a better way to do this without declaring a bunch of unnecessary temp lists.
Any ideas?
Thank you for your time.

TL/DR summary: change your entire snippet to
if G.has_edge(i[z],i[j]):
    G[i[z]][i[j]]['article'].append(url[index])
else:
    G.add_edge(i[z],i[j], article=[url[index]])
Here's what's going on:
When you create the edge the first time you use
G.add_edge(i[z],i[j], article=url)
So it's a string. But later when you do
G[i[z]][i[j]]['article'] = temp
you've defined temp to be a list whose first element is G[i[z]][i[j]]['article']. So G[i[z]][i[j]]['article'] is now a list with two elements, the first of which is the old value for G[i[z]][i[j]]['article'] (a string) and the second of which is the new url (also a string).
Your problem comes at the later steps:
From then on, it's exactly the same thing. G[i[z]][i[j]]['article'] is again a list with two elements, the first of which is its old value (a list) and the second is the new url (a string). So you've got a nested list.
Let's trace through with three urls: 'a', 'b', and 'c', and I'll use E to abbreviate G[i[z]][i[j]]['article']. First time through, you get E='a'. Second time through, you get E=['a', 'b']. Third time through, you get E=[['a','b'],'c']. So it always makes E[0] the former value of E, and E[1] the new url.
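The trace above can be reproduced without networkx. This is only a minimal sketch: it assumes a plain dict named `edge` (a stand-in, not part of the original code) plays the role of the edge's attribute dictionary, and that `urls` is a made-up list of three urls. It compares the wrapping approach from the question with storing a list from the start.

```python
# `edge` is a hypothetical stand-in for the attribute dict of one graph edge.
urls = ['a', 'b', 'c']

# Approach from the question: wrap the old value in a new list each time.
edge = {}
for u in urls:
    if 'article' in edge:
        edge['article'] = [edge['article'], u]  # old value becomes element 0
    else:
        edge['article'] = u                     # first value is a bare string
nested = edge['article']

# Suggested fix: store a list from the start and append to it.
edge = {}
for u in urls:
    if 'article' in edge:
        edge['article'].append(u)
    else:
        edge['article'] = [u]
flat = edge['article']

print(nested)  # [['a', 'b'], 'c']
print(flat)    # ['a', 'b', 'c']
```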
Two choices:
1) You can handle the creation of temp differently depending on whether you've got a string or a list. This is the bad choice.
2) Better: make it a list the whole time through, and then don't even deal with temp. Try creating the edge as (..., article=[url[index]]) and then just use G[i[z]][i[j]]['article'].append(url[index]) instead of defining temp.
So your code would be
if G.has_edge(i[z],i[j]):
    G[i[z]][i[j]]['article'].append(url[index])
else:
    G.add_edge(i[z],i[j], article=[url[index]])
A separate thing that could also cause you problems is the call
del temp[:]
This should cause behavior different from what I think you're describing, so I suspect the snippet differs a bit from how it's actually coded. When you set G[i[z]][i[j]]['article'] = temp and then do del temp[:], the two names refer to one and the same list. So when you del temp[:], you're also emptying G[i[z]][i[j]]['article']. Consider the following:
temp = []
temp.append(1)
print temp
> [1]
L = temp
print L
> [1]
del temp[:]
print L
> []
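The same aliasing may also explain the [...] in the question's output, because Python prints [...] where a list contains itself. The following is a sketch under assumptions, not a reconstruction of the actual code: it assumes temp is created once and reused for every pair, and it uses a plain dict named `attrs` (hypothetical) in place of the edge attributes.

```python
temp = []
attrs = {'article': 'a'}        # first url stored as a bare string

# Second url: build the list in temp and store temp itself on the edge.
temp.append(attrs['article'])
temp.append('b')
attrs['article'] = temp         # attrs['article'] and temp are now one object
del temp[:]                     # so this empties the stored list as well

# Third url: the stored list *is* temp, so temp gets appended to itself.
temp.append(attrs['article'])
temp.append('c')
attrs['article'] = temp

same_object = attrs['article'] is temp
contains_itself = attrs['article'][0] is attrs['article']
shown = repr(attrs['article'])
print(shown)  # [[...], 'c']
```

Under these assumptions the earlier urls are gone rather than hidden inside the [...], which is another reason to append to the stored list directly.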

I think all your previous URLs are in your new list. They are in the [...].
You must use extend instead of append when you get the existing list from the edge.
temp = []
temp.append([1, 2, 3])
temp.append(4)
print(temp)
You will get:
[[1, 2, 3], 4]
But if you do:
temp = []
temp.extend([1, 2, 3])
temp.append(4)
print(temp)
You get:
[1, 2, 3, 4]

Related

Get unique entries in list of lists by an item

This seems like a fairly straightforward problem but I can't seem to find an efficient way to do it. I have a list of lists like this:
list = [['abc','def','123'],['abc','xyz','123'],['ghi','jqk','456']]
I want to get a list of unique entries by the third item in each child list (the 'id'), i.e. the end result should be
unique_entries = [['abc','def','123'],['ghi','jqk','456']]
What is the most efficient way to do this? I know I can use set to get the unique ids, and then loop through the whole list again. However, there are more than 2 million entries in my list and this is taking too long. Appreciate any pointers you can offer! Thanks.
How about this: create a set that keeps track of the ids already seen, and only append sublists whose ids have not been seen yet.
l = [['abc','def','123'],['abc','xyz','123'],['ghi','jqk','456']]
seen = set()
new_list = []
for sl in l:
    if sl[2] not in seen:
        new_list.append(sl)
        seen.add(sl[2])
print new_list
Result:
[['abc', 'def', '123'], ['ghi', 'jqk', '456']]
Another approach would be a nested loop. First create a result list containing the first element, then iterate over the outer list starting from index 1. For each element, an inner loop starting from index 0 checks whether its third item matches the third item of any element already in the result list. If no match is found, add the element to the result list; otherwise use the "continue" keyword to move on. Finally, print the result list.

Lookup, concatenate and remove items in python list

I have a list of lists of lists that looks like this:
[[[1], ['apple'], ['AAA']],
 [[2], ['banana'], ['BBB']],
 [[3], ['orange'], ['CCC']],
 [[4], ['pineapple'], ['AAA']],
 [[5], ['tomato'], ['ABC']]]
Probably the wrong terminology, but: I want to find duplicates in the third column, add that row's second column item to the first instance of the duplicates, and then remove the duplicate row.
So using the example: I want to iterate through the list, find the duplicate value 'AAA', add 'pineapple' after 'apple' and remove the (second level) list containing the second instance of 'AAA'.
The list I want to end up with should look like:
[[[1], ['apple', 'pineapple'], ['AAA']],
 [[2], ['banana'], ['BBB']],
 [[3], ['orange'], ['CCC']],
 [[5], ['tomato'], ['ABC']]]
I tried the following but I can't figure out how to do this..
seen = set()
for l in final:
    if l[2] not in seen: # TypeError: unhashable type: 'list'
        # Here I want to add value to first instance
        seen.add(l[2])
    # Remove list
This will do what you're asking for... but I seriously wonder whether you can't change your data structure. It's strange and hard to work with!
newList = []
lookup = {}
for l in final:
    if l[2][0] not in lookup:
        lookup[l[2][0]] = l
        newList.append(l)
    else:
        lookup[l[2][0]][1].append(l[1][0])
print newList
The reason you were getting the TypeError is that you were using l[2] instead of l[2][0]. Remember, l[2] is a list. What you want is to grab the item inside that list (index 0 in this case) and check whether that is in lookup. The lookup dict replaces the seen set from your example because it can also return the entry that a duplicate l[2][0] corresponds to, since your data structure currently isn't set up to do something like final['AAA']. However, this isn't ideal, and I'd strongly recommend changing the data structure if possible.
Something else to think about...
Currently, because your items are all essentially lists within lists, the current algorithm will essentially change the nested objects (lists) you were working with, because of object mutability. This means that while final would contain the same objects it did originally, those objects will have changed (in this case with ['apple', 'pineapple']).
If you want to prevent that from happening, look into using the copy module. Specifically, using the deepcopy method to copy all objects (even through the nesting).
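As a rough illustration of the deepcopy suggestion, the sketch below runs the same lookup logic on a deep copy so that the original rows stay untouched. The sample data is a shortened, made-up version of the list from the question, and the names `working` and `new_list` are introduced here only for illustration.

```python
import copy

final = [
    [[1], ['apple'], ['AAA']],
    [[2], ['banana'], ['BBB']],
    [[4], ['pineapple'], ['AAA']],
]

# Work on a deep copy so the nested lists inside `final` are not mutated.
working = copy.deepcopy(final)

new_list = []
lookup = {}
for row in working:
    key = row[2][0]                        # the code string, e.g. 'AAA'
    if key not in lookup:
        lookup[key] = row
        new_list.append(row)
    else:
        lookup[key][1].append(row[1][0])   # merge the fruit into the first row

print(new_list)   # merged rows
print(final[0])   # [[1], ['apple'], ['AAA']] (unchanged)
```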
Edit:
w0lf's version (Improved readability)
newList = []
lookup = {}
for l in final:
    row_no, fruit, code = l
    unique_id = code[0] # because `code` is a one element list
    if unique_id not in lookup:
        lookup[unique_id] = l
        newList.append(l)
    else:
        lookup[unique_id][1].extend(fruit)
print(newList)
Also note: He remembered to do print(newList) instead of print newList for Py3k users. Since the question is tagged for Python 3, that's the way to go.
A list is an unhashable type, i.e. you cannot add it (as is) to data structures that use hash tables (like a Python dictionary or set). Strings, however, are hashable.
I'd do
seen.add(str(l[2]))
This will solve the TypeError, provided the membership test uses str(l[2]) as well.
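A variation on the same idea, offered only as a sketch: a tuple is hashable too, so `tuple(l[2])` can serve as the key without going through a string representation. The sample rows and the name `unique_rows` are made up for illustration.

```python
final = [
    [[1], ['apple'], ['AAA']],
    [[4], ['pineapple'], ['AAA']],
    [[2], ['banana'], ['BBB']],
]

seen = set()
unique_rows = []
for row in final:
    key = tuple(row[2])        # ('AAA',) is hashable, ['AAA'] is not
    if key not in seen:
        seen.add(key)
        unique_rows.append(row)

print(unique_rows)
```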

Nested for loop index out of range

I'm coming up with a rather trivial problem, but since I'm quite new to python, I'm smashing my head to my desk for a while. (Hurts). Though I believe that's more a logical thing to solve...
First I have to say that I'm using the Python SDK for Cinema 4D so I had to change the following code a bit. But here is what I was trying to do and struggling with:
I'm trying to group some polygon selections, which are dynamically generated (based on some rules, not that important).
Here's how it works the mathematical way:
Those selections are based on islands (means, that there are several polygons connected).
Then, those selections have to be grouped and put into a list that I can work with.
Any polygon has its own index, so this one should be rather simple, but like I said before, I'm quite struggling there.
The main problem is easy to explain: I'm trying to access a non existent index in the first loop, resulting in an index out of range error. I tried evaluating the validity first, but no luck. For those who are familiar with Cinema 4D + Python, I will provide some of the original code if anybody wants that. So far, so bad. Here's the simplified and adapted code.
edit: Forgot to mention that the check which causes the error should actually only check for duplicates, so that the currently selected number is skipped if it has already been processed. This is necessary because the calculations are computationally heavy.
Really hope, anybody can bump me in the right direction and this code makes sense so far. :)
def myFunc():
    sel = [0,1,5,12] # changes with every call of "myFunc", for example to [2,8,4,10,9,1], etc. - list always differs in count of elements, can even be empty, groups are being built from these values
    all = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] # the whole set
    groups = [] # list to store indices-lists into
    indices = [] # list to store selected indices
    count = 0 # number of groups
    tmp = [] # temporary list to copy the indices list into before resetting
    for i in range(len(all)): # loop through values
        if i not in groups[count]: # that's the problematic one; this one actually should check whether "i" is already inside of any list inside the group list, error is simply that I'm trying to check a non existent value
            for index, selected in enumerate(sel): # loop through "sel" and return actual indices. "selected" determines, if "index" is selected. boolean.
                if not selected: continue # pretty much self-explanatory
                indices.append(index) # push selected indices to the list
            tmp = indices[:] # clone list
            groups.append(tmp) # push the previous generated list to another list to store groups into
            indices = [] # empty/reset indices-list
            count += 1 # increment count
    print groups # debug
myFunc()
edit:
After adding a second list that acts as a counter and is filled with extend (not append), everything worked as expected! The list will be a basic list, pretty simple ;)
groups[count]
When you first call this, groups is an empty list and count is 0. You can't access the thing at spot 0 in groups, because there is nothing there!
Try changing
groups = [] to groups = [[]] (i.e. instead of an empty list, a list that contains only an empty list).
I'm not sure why you'd want to add the empty list to groups. Perhaps it is better to change
if i not in groups[count]:
to
if not groups or i not in groups[count]:
You also don't need to copy the list if you're not going to use it for anything else. So you can replace
tmp = indices[:] # clone list
groups.append(tmp) # push the previous generated list to another list to store groups into
indices = [] # empty/reset indices-list
with
groups.append(indices) # push the previous generated list to another list to store groups into
indices = [] # empty/reset indices-list
You may even be able to drop count altogether (you can always use len(groups)). You can also replace the inner loop with a list comprehension:
def myFunc():
    sel = [0,1,5,12] # changes with every call of "myFunc", for example to [2,8,4,10,9,1], etc. - list always differs in count of elements, can even be empty, groups are being built from these values
    all = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] # the whole set
    groups = [] # list to store indices-lists into
    for i in range(len(all)): # loop through values
        if not groups or i not in groups[-1]: # look in the latest group
            indices = [idx for idx, selected in enumerate(sel) if selected]
            groups.append(indices) # push the previous generated list to another list to store groups into
    print groups # debug
correct line 11 from:
if i not in groups[count]
to:
if i not in groups:
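Note that groups is a list of lists, so `i not in groups` compares i against the sublists themselves rather than their contents. The comment in the question says the check should test whether i is already inside any of the lists in groups; one possible way to express that is with any(). This is a sketch only: the helper name `in_any_group` and the sample groups are made up for illustration.

```python
def in_any_group(i, groups):
    """Return True if index i occurs in at least one sublist of groups."""
    return any(i in group for group in groups)

groups = [[0, 1], [5, 12]]          # made-up example groups

print(in_any_group(5, groups))      # True
print(in_any_group(3, groups))      # False
print(in_any_group(0, []))          # False, and no IndexError on an empty list
```

Because any() over an empty list is simply False, this check also works before the first group has been appended.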

Deleting Numbers From A String -Python

I am creating a program for a high school course and our teacher is very specific about what is allowed in our programs. We use Python 2.x and he only allows if statements, while loops, functions, boolean values, and lists. I am working on a project that will print the reversal of a string, then print the same reversal again without the numbers in it, but I cannot figure it out. Help please. What I have so far is this:
def reverse_str(string):
    revstring =('')
    length=len(string)
    i = length - 1
    while i>=0:
        revstring = revstring + string[i]
        i = i - 1
    return revstring
def strip_digits(string):
    l = [0,1,2,3,4,5,6,7,8,9]
    del (l) rev_string
string = raw_input("Enter a string->")
new_str = rev_str(string)
print new_str
I cannot figure out how to use the "del" function properly. How do I delete any of the items in the list from the reversed string? Thanks.
In general, you have two options for a task like this:
1) Iterate through the items in your list, deleting the ones that you do not want to keep.
2) Iterate through the items in your list, copying the ones that you do want to keep to a new list. Return the new list.
Now, although I would normally prefer option (2), that won't help with your specific question about del. To delete an item at index x from a list a, the following syntax will do it:
del a[x]
That will shift all the elements past index x to the left to close the gap left by deleting the element. You will have to take this shift into account if you're iterating through all the items in the list.
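One common way to sidestep the shifting problem is to walk the list from the end toward the front, so a deletion only moves elements that have already been visited. A minimal sketch using only a while loop and an if statement; the sample list is made up.

```python
a = ['a', '1', 'b', '2', '3', 'c']   # made-up sample: characters as list items

i = len(a) - 1
while i >= 0:
    if a[i].isdigit():
        del a[i]       # elements after index i shift left; those before i do not move
    i = i - 1

print(a)  # ['a', 'b', 'c']
```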
Type str in python is immutable (cannot be altered in place) and does not support the del item deletion function.
Map the characters of the string to a list and delete the elements you want and reconstruct the string.
OR
Iterate through the string elements whilst building a new one, omitting numbers.
Correct usage of del is:
>>> a = [1, 2, 3]
>>> del a[1]
>>> a
[1, 3]
You could iterate back over the string, copying it again but not copying the digits... It would be interesting for you to also figure out the pythonic way to do everything you're not allowed to. Both methods are good to know.
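The copy-without-digits approach might look like the sketch below, which sticks to if statements, while loops, functions and lists. One detail worth noting: the characters of a string are themselves strings, so the digits list holds '0' to '9' rather than the integers 0 to 9. The sample input 'ab12c3' is made up.

```python
def reverse_str(string):
    revstring = ''
    i = len(string) - 1
    while i >= 0:
        revstring = revstring + string[i]
        i = i - 1
    return revstring

def strip_digits(string):
    digits = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
    result = ''
    i = 0
    while i < len(string):
        if string[i] not in digits:      # keep only non-digit characters
            result = result + string[i]
        i = i + 1
    return result

reversed_text = reverse_str('ab12c3')
print(reversed_text)                 # 3c21ba
print(strip_digits(reversed_text))   # cba
```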

Python error: IndexError: list assignment index out of range

a=[]
a.append(3)
a.append(7)
for j in range(2,23480):
    a[j]=a[j-2]+(j+2)*(j+3)/2
When I write this code, it gives an error like this:
Traceback (most recent call last):
File "C:/Python26/tcount2.py", line 6, in <module>
a[j]=a[j-2]+(j+2)*(j+3)/2
IndexError: list assignment index out of range
May I know why and how to debug it?
Change this line of code:
a[j]=a[j-2]+(j+2)*(j+3)/2
to this:
a.append(a[j-2] + (j+2)*(j+3)/2)
You're adding new elements, elements that do not exist yet. Hence you need to use append: since the items do not exist yet, you cannot reference them by index. See the overview of operations on mutable sequence types in the Python documentation.
for j in range(2, 23480):
    a.append(a[j - 2] + (j + 2) * (j + 3) / 2)
The reason for the error is that you're trying, as the error message says, to access a portion of the list that is currently out of range.
For instance, assume you're creating a list of 10 people, and you try to specify who the 11th person on that list is going to be. On your paper pad, it might be easy to just make room for another person, but runtime objects, like the list in Python, aren't that forgiving.
Your list starts out empty because of this:
a = []
then you add 2 elements to it, with this code:
a.append(3)
a.append(7)
This makes the list just big enough to hold 2 elements, the two you added, which have indices 0 and 1 (Python lists are 0-based).
In your code, further down, you then specify the contents of element j which starts at 2, and your code blows up immediately because you're trying to say "for a list of 2 elements, please store the following value as the 3rd element".
Again, lists like the one in Python usually aren't that forgiving.
Instead, you're going to have to do one of two things:
1) In some cases, you want to store into an existing element, or add a new element, depending on whether the index you specify is available or not.
2) In other cases, you always want to add a new element.
In your case, you want to do option 2, which means you want to rewrite this line of code:
a[j]=a[j-2]+(j+2)*(j+3)/2
to this:
a.append(a[j-2]+(j+2)*(j+3)/2)
This will append a new element to the end of the list, which is OK, instead of trying to assign a new value to element N+1, where N is the current length of the list, which isn't OK.
At j=2 you're trying to assign to a[2], which doesn't exist yet. You probably want to use append instead.
If you want to debug it, just change your code to print out the current index as you go:
a=[]
a.append(3)
a.append(7)
for j in range(2,23480):
    print j # <-- this line
    a[j]=a[j-2]+(j+2)*(j+3)/2
But you'll probably find that it errors out the second you access a[2] or higher; you've only added two values, but you're trying to access the 3rd and onward.
Try replacing your list ([]) with a dictionary ({}); that way, you can assign to whatever numbers you like. Or, if you really want a list, initialize it with 23480 items ([0] * 23480), since the loop assigns up to index 23479.
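Python lists must be pre-initialized before you can assign to an index. You need to do a = [0]*23480, and then set a[0] and a[1] by assignment rather than with append.
Or you can append if you are adding one at a time. I think it would probably be faster to preallocate the array.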
Python does not dynamically increase the size of an array when you assign to an element. You have to use a.append(element) to add an element onto the end, or a.insert(i, element) to insert the element at the position before i.
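The two fixes suggested in the answers, appending and preallocating, can be compared side by side. This is a sketch, not the original program: the names `N` and `b` are introduced here, and `//` is used so the integer division of Python 2's `/` is kept when the code runs under Python 3.

```python
N = 23480

# Option 1: grow the list with append.
a = [3, 7]
for j in range(2, N):
    a.append(a[j - 2] + (j + 2) * (j + 3) // 2)

# Option 2: preallocate N slots, then assign by index.
b = [0] * N
b[0] = 3
b[1] = 7
for j in range(2, N):
    b[j] = b[j - 2] + (j + 2) * (j + 3) // 2

print(len(a))   # 23480
print(a[:4])    # [3, 7, 13, 22]
print(a == b)   # True
```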
