I'm coming up with a rather trivial problem, but since I'm quite new to python, I'm smashing my head to my desk for a while. (Hurts). Though I believe that's more a logical thing to solve...
First I have to say that I'm using the Python SDK for Cinema 4D so I had to change the following code a bit. But here is what I was trying to do and struggling with:
I'm trying to group some polygon selections, which are dynamically generated (based on some rules, not that important).
Here's how it works the mathematical way:
Those selections are based on islands (means, that there are several polygons connected).
Then, those selections have to be grouped and put into a list that I can work with.
Any polygon has its own index, so this one should be rather simple, but like I said before, I'm quite struggling there.
The main problem is easy to explain: I'm trying to access a non existent index in the first loop, resulting in an index out of range error. I tried evaluating the validity first, but no luck. For those who are familiar with Cinema 4D + Python, I will provide some of the original code if anybody wants that. So far, so bad. Here's the simplified and adapted code.
edit: Forgot to mention that the check which causes the error actually should only check for duplicates, so the current selected number will be skipped since it hal already been processed. This is necessary due to computing-heavy calculations.
Really hope, anybody can bump me in the right direction and this code makes sense so far. :)
def myFunc():
sel = [0,1,5,12] # changes with every call of "myFunc", for example to [2,8,4,10,9,1], etc. - list alway differs in count of elements, can even be empty, groups are beeing built from these values
all = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] # the whole set
groups = [] # list to store indices-lists into
indices = [] # list to store selected indices
count = 0 # number of groups
tmp = [] # temporary list to copy the indices list into before resetting
for i in range(len(all)): # loop through values
if i not in groups[count]: # that's the problematic one; this one actually should check whether "i" is already inside of any list inside the group list, error is simply that I'm trying to check a non existent value
for index, selected in enumerate(sel): # loop through "sel" and return actual indices. "selected" determines, if "index" is selected. boolean.
if not selected: continue # pretty much self-explanatory
indices.append(index) # push selected indices to the list
tmp = indices[:] # clone list
groups.append(tmp) # push the previous generated list to another list to store groups into
indices = [] # empty/reset indices-list
count += 1 # increment count
print groups # debug
myFunc()
edit:
After adding a second list which will be filled by extend, not append that acts as counter, everything worked as expected! The list will be a basic list, pretty simple ;)
groups[count]
When you first call this, groups is an empty list and count is 0. You can't access the thing at spot 0 in groups, because there is nothing there!
Try making
groups = [] to groups = [[]] (i.e. instead of an empty list, a list of lists that only has an empty list).
I'm not sure why you'd want to add the empty list to groups. Perhaps this is better
if i not in groups[count]:
to
if not groups or i not in groups[count]:
You also don't need to copy the list if you're not going to use it for anything else. So you can replace
tmp = indices[:] # clone list
groups.append(tmp) # push the previous generated list to another list to store groups into
indices = [] # empty/reset indices-list
with
groups.append(indices) # push the previous generated list to another list to store groups into
indices = [] # empty/reset indices-list
You may even be able to drop count altogether (you can always use len(groups)). You can also replace the inner loop with a listcomprehension
def myFunc():
sel = [0,1,5,12] # changes with every call of "myFunc", for example to [2,8,4,10,9,1], etc. - list alway differs in count of elements, can even be empty, groups are beeing built from these values
all = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] # the whole set
groups = [] # list to store indices-lists into
for i in range(len(all)): # loop through values
if not groups or i not in groups[-1]: # look in the latest group
indices = [idx for idx, selected in enumerate(sel) if selected]
groups.append(indices) # push the previous generated list to another list to store groups into
print groups # debug
correct line 11 from:
if i not in groups[count]
to:
if i not in groups:
Related
I'm asking more out of curiosity at this point since I found a work-around, but it's still bothering me.
I have a list of dataframes (x) that all have the same column names. I'm trying to use pandas and re to make a list of the subset of column names that have the format
"D(number) S(number)"
so I wrote the following function:
def extract_sensor_columns(x):
sensor_name = list(x[0].columns)
for j in sensor_name:
if bool(re.match('D(\d+)S(\d+)', j))==False:
sensor_name.remove(j)
return sensor_name
The list that I'm generating has 103 items (98 wanted items, 5 items). This function removes three of the five columns that I want to get rid of, but keeps the columns labeled 'Pos' and 'RH.' I generated the sensor_name list outside of the function and tested the truth value of the
bool(re.match('D(\d+)S(\d+)', sensor_name[j]))
for all five of the items that I wanted to get rid of and they all gave the False value. The other thing I tried is changing the conditional to ==True, which even more strangely gave me 54 items (all of the unwanted column names and half of the wanted column names).
If I rewrite the function to add the column names that have a given format (rather than remove column names that don't follow the format), I get the list I want.
def extract_sensor_columns(x):
sensor_name = []
for j in list(x[0].columns):
if bool(re.match('D(\d+)S(\d+)', j))==True:
sensor_name.append(j)
return sensor_name
Why is the first block of code acting so strangely?
In general, do not change arrays while iterating over them. The problem lies in the fact that you remove elements of the iterable in the first (wrong) case. But in the second (correct) case, you add correct elements to an empty list.
Consider this:
arr = list(range(10))
for el in arr:
print(el)
for i, el in enumerate(arr):
print(el)
arr.remove(arr[i+1])
The second only prints even number as every next one is removed.
in one of my work i need to find the mode a list called "dataset" using no modual or function that would find the mode by itself.
i tried to make it so it can output the mode or the list of modes depending on the list of numbers. I used 2 for loops so the first number of the list checks each number of the list including its self to see how many numbers of its self there is, for example if my list was 123415 it would say there is 2 ones, and it does this for all the numbers of the list. the number with the most counts would be the mode. The bottom section of the code where the if elif and else is, there is where it checks if the number has the most counts by comparing with the other numbers of the list checking if it has more numbers or the same as the previous biggest number.
I've tried to change the order of the codes but i'm still confused why it is doing this error
pop_number = []
pop_amount = 0
amount = 0
for i in range(len(dataset)):
for x in dataset:
if dataset[i] == x:
amount += 1
if amount>pop_amount:
pop_amount = amount
pop_number = []
pop_number.append(x)
amount = 0
elif amount==pop_amount:
pop_amount = amount
if x not in pop_number:
pop_number.append(x)
pop_amount = amount
amount = 0
else:
continue
print(pop_number)
i expected the output to be the mode of the list or the list of modes but it came up with the last number from the list
As this is apparently homework, I will present a sketch, not working code.
Observe that a dict in Python can hold key-value mappings.
Let the numbers in the input list be the keys, and the values the number of times they occur. Going over the list, use each item as the key for the dict, and add one to the value (starting at 0 -- defaultdict(int) is good for this). If the result is bigger than any previous maximum, remember this key.
Since you want to allow for more than one mode value, the variable which remembers the maximum key should be a list; but since you have a new maximum, replace the old list with a list containing just this key. If another value also reaches the maximum, add it to the list. (That's the append method.)
(See how this is if bigger than maximum so far and then else if equal to maximum so far and then otherwise there is no need to do anything.)
When you have looped over all items in the input list, the list of remembered keys is your result.
Go back and think about what variables you need already before the loop. The maximum so far should be defined but guaranteed to be smaller than any value you will see -- it makes sense to start this at 0 because as soon as you see one key, it will have a bigger count than zero. And the keys you want to remember can start out as an empty list.
Now think about how you would test this. What happens if the input list is empty? What happens if the input list contains just the same number over and over? What happens if every item on the input list is unique? Can you think of other corner cases?
Without using any module or function that will specifically find the mode itself, you can do that with much less code. Your code will work with a little more effort, I highly suggest you to try to solve the problem on your own logic, but meanwhile let me show you how to take the help of all the built-in data structures in Python List, Tuples, Dictionaries and Sets within 7-8 lines. Also there is unzipping at the end (*). I will suggest you to look these up, when you get time.
lst = [1,1,1,1,2,2,2,3,3,3,3,3,3,4,2,2,2,5,5,6]
# finds the unique elements
unique_elems = set(lst)
# creates a dictionary with the unique elems as keys and initializes the values to 0
count = dict.fromkeys(unique_elems,0)
# gets the frequency of each element in the lst
for elem in unique_elems:
count[elem] = lst.count(elem)
# finds max frequency
max_freq = max(count.values())
# stores list of mode(s)
modes = [i for i in count if count[i] == max_freq]
# prints mode(s), I have used unzipping here so that in case there is one mode,
# you don't have to print ugly [x]
print(*modes)
Or if you want to go for the shortest (I really shouldn't be making such bold claims in StackOverflow), then I guess this will be it (even though, writing short codes for the sake of it is discouraged)
lst = [1,1,1,1,2,2,2,3,3,3,3,3,3,4,2,2,2,5,5,6]
freq_dist = [(i, lst.count(i)) for i in set(lst)]
[print(i,end=' ') for i,j in freq_dist if j==max(freq_dist, key=lambda x:x[1])[1]]
And if you just want to go bonkers and say goodbye to loops (Goes without saying, this is ugly, really ugly):
lst = [1,1,1,1,2,2,2,3,3,3,3,3,3,4,2,2,2,5,5,6]
unique_elems = set(lst)
freq_dist = list(map(lambda x:(x, lst.count(x)), unique_elems))
print(*list(map(lambda x:x[0] if x[1] == max(freq_dist,key = lambda y: y[1])[1] else '', freq_dist)))
I have been trying to develop a graph structure that will link entities according to co-mentioned features between them, e.g. 2 places are linked if co-mentioned in an article.
I have managed to do so but I have been having problems to iteratively populate an edge with new information keeping the already existing one.
My approach (since I haven't found anything related anywhere) is to append existing information to a list, append the new link in the list and assign that list to the appropriate feature.
temp = []
if G.has_edge(i[z],i[j]):
temp.append(G[i[z]][i[j]]['article'])
temp.append(url[index])
G[i[z]][i[j]]['article'] = temp
else:
print "Create edge!"
G.add_edge(i[z],i[j], article=url)
del temp[:]
As you can see above, as there are many links to be populated, I defined a dedicated list (temp), loaded the old contents of a link's variable called article (if the link does not exist I create a link and add as first value the url that "brought" 2 places together.
My problem is that while I empty the list each time in order to be empty when a new pair comes in when I try to see a link's urls I get something like this:
{'article': [[...], u'http://www.huffingtonpost.co.uk/.../']
It seems like I am keeping only the last link as each time I delete the temporary list's contents but I cannot find a better way to do so without declaring an unnecessary bunch of temp lists.
Any ideas?
Thank you for your time.
TL/DR summary: change your entire snippet to
if G.has_edge(i[z],i[j]):
G[i[z]][i[j]]['article'].append(url[index])
else:
G.add_edge(i[z],i[j], article=[url])
Here's what's going on:
When you create the edge the first time you use
G.add_edge(i[z],i[j], article=url)
So it's a string. But later when you do
G[i[z]][i[j]]['article'] = temp
you've defined temp to be a list whose first element is G[i[z]][i[j]]['article']. So G[i[z]][i[j]]['article'] is now a list with two elements, the first of which is the old value for G[i[z]][i[j]]['article'] (a string) and the second of which is the new url (also a string).
Your problem comes at the later steps:
From then on, it's exactly the same thing. G[i[z]][i[j]]['article'] is again a list with two elements, the first of which is its old value (a list) and the second is the new url (a string). So you've got a nested list.
let's trace through with three urls: 'a', 'b', and 'c', and I'll use E to abbreviate G[i[z]][i[j]]. First time through, you get E='a'. Second time through you get E=['a', 'b']. Third time through it gives E=[['a','b'],'c']. So it's always making E[0] to be the former value of E, and E[1] to be the new url.
Two choices:
1) you can handle the creation of temp differently if you've got a string or a list. This is the bad choice.
2)Better: Make it a list the whole time through and then don't even deal with temp. Try creating the edge as (...,article = [url]) and then just use G[i[z]][i[j]]['article'].append(url) instead of defining temp.
So your code would be
if G.has_edge(i[z],i[j]):
G[i[z]][i[j]]['article'].append(url[index])
else:
G.add_edge(i[z],i[j], article=[url])
A separate thing that could also cause you problems is the call
del temp[:]
This should cause behavior different from what I think you're describing. So I think this is a bit different from how it's actually coded. When you set G[i[z]][i[j]] = temp and then do del temp[:], you've made the two lists to be one list with two different names. When you del temp[:] you're also doing it to G[i[z]][i[j]]. Consider the following
temp = []
temp.append(1)
print temp
> [1]
L = temp
print L
> [1]
del temp[:]
print L
> []
I think all your previous URLs are in your new list. They are in the [...].
You must use extend instead of append when you get the existing list from the edge.
temp = []
temp.append([1, 2, 3])
temp.append(1)
print(temp)
You will get:
[[1, 2, 3], 4]
But if you do:
temp = []
temp.extend([1, 2, 3])
temp.append(4)
print(temp)
You get:
[1, 2, 3, 4]
I have a list of URLs in an open CSV which I have ordered alphabetically, and now I would like to iterate through the list and check for duplicate URLs. In a second step, the duplicate should then be removed from the list, but I am currently stuck on the checking part which I have tried to solve with a nested for-loop as follows:
for i in short_urls:
first_url = i
for s in short_urls:
second_url = s
if i == s:
print "duplicate"
else:
print "all good"
The print statements will obviously be replaced once the nested for-loop is working. Currently, the list contains a few duplicates, but my nested loop does not seem to work correctly as it does not recognise any of the duplicates.
My question is: are there better ways to do perform this exercise, and what is the problem with the current nested for-loop?
Many thanks :)
By construction, your method is faulty, even if you indent the if/else block correctly. For instance, imagine if you had [1, 2, 3] as short_urls for the sake of argument. The outer for loop will pick out 1 to compare to the list against. It will think it's finding a duplicate when in the inner for loop it encounters the first element, a 1 as well. Essentially, every element will be tagged as a duplicate and if you plan on removing duplicates, you'll end up with an empty list.
The better solution is to call set(short_urls) to get a set of your urls with the duplicates removed. If you want a list (as opposed to a set) of urls with the duplicates removed, you can convert the set back into a list with list(set(short_urls)).
In other words:
short_urls = ['google.com', 'twitter.com', 'google.com']
duplicates_removed_list = list(set(short_urls))
print duplicates_removed_list # Prints ['google.com', 'twitter.com']
if i == s:
is not inside the second for loop. You missed an indentation
for i in short_urls:
first_url = i
for s in short_urls:
second_url = s
if i == s:
print "duplicate"
else:
print "all good"
EDIT: Also you are comparing every element of an array with every element of the same array. This means compare the element at position 0 with the element at postion 0, which is obviously the same.
What you need to do is starting the second for at the position after that reached in the first for.
I know that it is not allowed to remove elements while iterating a list, but is it allowed to add elements to a python list while iterating. Here is an example:
for a in myarr:
if somecond(a):
myarr.append(newObj())
I have tried this in my code and it seems to work fine, however I don't know if it's because I am just lucky and that it will break at some point in the future?
EDIT: I prefer not to copy the list since "myarr" is huge, and therefore it would be too slow. Also I need to check the appended objects with "somecond()".
EDIT: At some point "somecond(a)" will be false, so there can not be an infinite loop.
EDIT: Someone asked about the "somecond()" function. Each object in myarr has a size, and each time "somecond(a)" is true and a new object is appended to the list, the new object will have a size smaller than a. "somecond()" has an epsilon for how small objects can be and if they are too small it will return "false"
Why don't you just do it the idiomatic C way? This ought to be bullet-proof, but it won't be fast. I'm pretty sure indexing into a list in Python walks the linked list, so this is a "Shlemiel the Painter" algorithm. But I tend not to worry about optimization until it becomes clear that a particular section of code is really a problem. First make it work; then worry about making it fast, if necessary.
If you want to iterate over all the elements:
i = 0
while i < len(some_list):
more_elements = do_something_with(some_list[i])
some_list.extend(more_elements)
i += 1
If you only want to iterate over the elements that were originally in the list:
i = 0
original_len = len(some_list)
while i < original_len:
more_elements = do_something_with(some_list[i])
some_list.extend(more_elements)
i += 1
well, according to http://docs.python.org/tutorial/controlflow.html
It is not safe to modify the sequence
being iterated over in the loop (this
can only happen for mutable sequence
types, such as lists). If you need to
modify the list you are iterating over
(for example, to duplicate selected
items) you must iterate over a copy.
You could use the islice from itertools to create an iterator over a smaller portion of the list. Then you can append entries to the list without impacting the items you're iterating over:
islice(myarr, 0, len(myarr)-1)
Even better, you don't even have to iterate over all the elements. You can increment a step size.
In short: If you'are absolutely sure all new objects fail somecond() check, then your code works fine, it just wastes some time iterating the newly added objects.
Before giving a proper answer, you have to understand why it considers a bad idea to change list/dict while iterating. When using for statement, Python tries to be clever, and returns a dynamically calculated item each time. Take list as example, python remembers a index, and each time it returns l[index] to you. If you are changing l, the result l[index] can be messy.
NOTE: Here is a stackoverflow question to demonstrate this.
The worst case for adding element while iterating is infinite loop, try(or not if you can read a bug) the following in a python REPL:
import random
l = [0]
for item in l:
l.append(random.randint(1, 1000))
print item
It will print numbers non-stop until memory is used up, or killed by system/user.
Understand the internal reason, let's discuss the solutions. Here are a few:
1. make a copy of origin list
Iterating the origin list, and modify the copied one.
result = l[:]
for item in l:
if somecond(item):
result.append(Obj())
2. control when the loop ends
Instead of handling control to python, you decides how to iterate the list:
length = len(l)
for index in range(length):
if somecond(l[index]):
l.append(Obj())
Before iterating, calculate the list length, and only loop length times.
3. store added objects in a new list
Instead of modifying the origin list, store new object in a new list and concatenate them afterward.
added = [Obj() for item in l if somecond(item)]
l.extend(added)
You can do this.
bonus_rows = []
for a in myarr:
if somecond(a):
bonus_rows.append(newObj())
myarr.extend( bonus_rows )
Access your list elements directly by i. Then you can append to your list:
for i in xrange(len(myarr)):
if somecond(a[i]):
myarr.append(newObj())
make copy of your original list, iterate over it,
see the modified code below
for a in myarr[:]:
if somecond(a):
myarr.append(newObj())
I had a similar problem today. I had a list of items that needed checking; if the objects passed the check, they were added to a result list. If they didn't pass, I changed them a bit and if they might still work (size > 0 after the change), I'd add them on to the back of the list for rechecking.
I went for a solution like
items = [...what I want to check...]
result = []
while items:
recheck_items = []
for item in items:
if check(item):
result.append(item)
else:
item = change(item) # Note that this always lowers the integer size(),
# so no danger of an infinite loop
if item.size() > 0:
recheck_items.append(item)
items = recheck_items # Let the loop restart with these, if any
My list is effectively a queue, should probably have used some sort of queue. But my lists are small (like 10 items) and this works too.
You can use an index and a while loop instead of a for loop if you want the loop to also loop over the elements that is added to the list during the loop:
i = 0
while i < len(myarr):
a = myarr[i];
i = i + 1;
if somecond(a):
myarr.append(newObj())
Expanding S.Lott's answer so that new items are processed as well:
todo = myarr
done = []
while todo:
added = []
for a in todo:
if somecond(a):
added.append(newObj())
done.extend(todo)
todo = added
The final list is in done.
Alternate solution :
reduce(lambda x,newObj : x +[newObj] if somecond else x,myarr,myarr)
Assuming you are adding at the last of this list arr, You can try this method I often use,
arr = [...The list I want to work with]
current_length = len(arr)
i = 0
while i < current_length:
current_element = arr[i]
do_something(arr[i])
# Time to insert
insert_count = 1 # How many Items you are adding add the last
arr.append(item_to_be inserted)
# IMPORTANT!!!! increase the current limit and indexer
i += 1
current_length += insert_count
This is just boilerplate and if you run this, your program will freeze because of infinite loop. DO NOT FORGET TO TERMINATE THE LOOP unless you need so.