Difference between two pieces of Python code


I'm doing the following exercise:
# B. front_x
# Given a list of strings, return a list with the strings
# in sorted order, except group all the strings that begin with 'x' first.
# e.g. ['mix', 'xyz', 'apple', 'xanadu', 'aardvark'] yields
# ['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
# Hint: this can be done by making 2 lists and sorting each of them
# before combining them.
Sample solution:
def front_x(words):
    listX = []
    listO = []
    for w in words:
        if w.startswith('x'):
            listX.append(w)
        else:
            listO.append(w)
    listX.sort()
    listO.sort()
    return listX + listO
My solution:
def front_x(words):
    listX = []
    for w in words:
        if w.startswith('x'):
            listX.append(w)
            words.remove(w)
    listX.sort()
    words.sort()
    return listX + words
When I tested my solution, the result was a little weird. Here is the source code with my solution: http://dl.dropbox.com/u/559353/list1.py. You might want to try it out.

The problem is that you loop over the list and remove elements from it (modifying it):
for w in words:
    if w.startswith('x'):
        listX.append(w)
        words.remove(w)
Example:
>>> a = list(range(5))
>>> for i in a:
...     a.remove(i)
...
>>> a
[1, 3]
This code works as follows:
Get the first element (0) and remove it.
Move to the next element. It is not 1 anymore: because 0 was removed, 1 has become the first element, so the next element is 2 and 1 is skipped.
The same happens with 3 and 4: 3 is skipped and 4 is removed.
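The usual fix, if you really do want to remove items while looping, is to iterate over a copy. A minimal sketch (Python 3 syntax) using the same list as above:

```python
a = list(range(5))
for i in a[:]:       # a[:] is a shallow copy, so removals can't shift what the loop visits
    a.remove(i)      # ...while the removal still happens on the original list
print(a)             # []
```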

Two main differences:
Removing an element from a list inside a loop that is iterating over that same list doesn't work in Python. If you were using Java, you would typically get an exception (ConcurrentModificationException) saying that you are modifying a collection while iterating over it; Python doesn't raise an error, it just silently skips elements. Felix Kling explains it quite well in his answer.
Also, you are modifying the input parameter words, so the caller of your function front_x will see words modified after the function runs. Unless that behaviour is explicitly expected, it is best avoided: imagine that your program is doing something else with words. Keeping two lists, as in the sample solution, is a better approach.

Altering the list you're iterating over leads to surprising behaviour (elements get silently skipped). That's why the sample solution creates two new lists instead of deleting from the source list.
for w in words:
    if w.startswith('x'):
        listX.append(w)
        words.remove(w) # Problem here!
See this question for a discussion on this matter. It basically boils down to list iterators iterating through the indexes of the list without going back and checking for modifications (which would be expensive!).
If you want to avoid creating a second list, you will have to perform two iterations. One to iterate over words to create listX and another to iterate over listX deleting from words.
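A minimal sketch of that two-pass idea, using the hypothetical name front_x_in_place. Note that, like the original attempt, it still modifies the caller's words list:

```python
def front_x_in_place(words):
    # Pass 1: collect the x-words without touching `words`.
    list_x = [w for w in words if w.startswith('x')]
    # Pass 2: iterate over list_x (not over words) and delete from words.
    for w in list_x:
        words.remove(w)
    list_x.sort()
    words.sort()
    return list_x + words

words = ['mix', 'xyz', 'apple', 'xanadu', 'aardvark']
print(front_x_in_place(words))   # ['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
print(words)                     # ['aardvark', 'apple', 'mix']: the input was changed
```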

That hint is misleading and unnecessary; you can do this without sorting and combining two lists independently:
>>> items = ['mix', 'xyz', 'apple', 'xanadu', 'aardvark']
>>> sorted(items, key=lambda item: (item[0]!='x', item))
['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
The built-in sorted() function takes an optional key argument that tells it what to sort by. In this case, you want to create a tuple like (False, 'xanadu') or (True, 'apple') for each element of the original list, which you can do with a lambda.
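To see what the key actually produces, here is a sketch with the key pulled out into a named function (x_first_key is a hypothetical name). It uses startswith() instead of item[0] so that an empty string doesn't raise IndexError; that substitution is an adjustment, not part of the answer above.

```python
items = ['mix', 'xyz', 'apple', 'xanadu', 'aardvark']

def x_first_key(word):
    # False sorts before True, so words starting with 'x' come first;
    # the word itself then orders the words within each group.
    return (not word.startswith('x'), word)

keys = [x_first_key(w) for w in items]
print(keys)    # [(True, 'mix'), (False, 'xyz'), (True, 'apple'), (False, 'xanadu'), (True, 'aardvark')]
print(sorted(items, key=x_first_key))   # ['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
```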

Related

How does the list comprehension to flatten a python list work? [duplicate]

I recently looked for a way to flatten a nested python list, like this: [[1,2,3],[4,5,6]], into this: [1,2,3,4,5,6].
Stack Overflow was as helpful as ever, and I found a post with this ingenious list comprehension:
l = [[1,2,3],[4,5,6]]
flattened_l = [item for sublist in l for item in sublist]
I thought I understood how list comprehensions work, but apparently I haven't got the faintest idea. What puzzles me most is that besides the comprehension above, this also runs (although it doesn't give the same result):
exactly_the_same_as_l = [item for item in sublist for sublist in l]
Can someone explain how Python interprets these things? Based on the second comprehension, I would expect that Python interprets it back to front, but apparently that is not always the case. If it were, the first comprehension should throw an error, because 'sublist' does not exist. My mind is completely warped, help!

Let's take a look at your list comprehension then, but first let's start with a list comprehension at its simplest.
l = [1,2,3,4,5]
print [x for x in l] # prints [1, 2, 3, 4, 5]
You can look at this the same as a for loop structured like so:
for x in l:
    print x
Now let's look at another one:
l = [1,2,3,4,5]
a = [x for x in l if x % 2 == 0]
print a # prints [2,4]
That is the exact same as this:
a = []
l = [1,2,3,4,5]
for x in l:
    if x % 2 == 0:
        a.append(x)
print a # prints [2,4]
Now let's take a look at the examples you provided.
l = [[1,2,3],[4,5,6]]
flattened_l = [item for sublist in l for item in sublist]
print flattened_l # prints [1,2,3,4,5,6]
For a list comprehension, start at the leftmost for loop and work your way in. The expression before the first for (item, in this case) is what will be added. It produces this equivalent:
l = [[1,2,3],[4,5,6]]
flattened_l = []
for sublist in l:
    for item in sublist:
        flattened_l.append(item)
Now for the last one:
exactly_the_same_as_l = [item for item in sublist for sublist in l]
Using the same knowledge, we can create a for loop and see how it would behave:
for item in sublist:
    for sublist in l:
        exactly_the_same_as_l.append(item)
The only reason the above one runs at all is that creating flattened_l also created sublist: in Python 2, a list comprehension's loop variables leak into the enclosing scope, so sublist was still bound to the last sublist, [4, 5, 6]. That scoping detail is why it did not throw an error. If you ran it without defining flattened_l first (or in Python 3, where comprehension variables don't leak), you would get a NameError.

The for loops are evaluated from left to right. Any list comprehension can be re-written as a for loop, as follows:
l = [[1,2,3],[4,5,6]]
flattened_l = []
for sublist in l:
    for item in sublist:
        flattened_l.append(item)
The above is the correct code for flattening a list, whether you choose to write it concisely as a list comprehension, or in this extended version.
The second list comprehension you wrote will raise a NameError, as 'sublist' has not yet been defined. You can see this by writing the list comprehension as a for loop:
l = [[1,2,3],[4,5,6]]
flattened_l = []
for item in sublist:
    for sublist in l:
        flattened_l.append(item)
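The only reason you didn't see the error when you ran your code is that you had previously defined sublist when running your first list comprehension: in Python 2, a list comprehension's loop variables leak into the surrounding scope. In Python 3 they don't, so there you would get the NameError.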
For more information, you may want to check out Guido's tutorial on list comprehensions.
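To check the loop order in a current interpreter, here is a small sketch, assuming Python 3 (where comprehension variables do not leak, so the first comprehension cannot rescue the reversed one):

```python
l = [[1, 2, 3], [4, 5, 6]]

# Outer loop first, inner loop second: reads like the nested for loops above.
flattened_l = [item for sublist in l for item in sublist]
print(flattened_l)  # [1, 2, 3, 4, 5, 6]

# Reversed order: the first `for` clause is evaluated first, and `sublist`
# is not bound in this scope (the comprehension above did not leak it).
raised_name_error = False
try:
    [item for item in sublist for sublist in l]
except NameError as exc:
    raised_name_error = True
    print("NameError:", exc)
```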

For the lazy dev that wants a quick answer:
>>> a = [[1,2], [3,4]]
>>> [i for g in a for i in g]
[1, 2, 3, 4]

While this approach definitely works for flattening lists, I wouldn't recommend it unless your sublists are known to be very small (1 or 2 elements each).
I've done a bit of profiling with timeit and found that this takes roughly 2-3 times longer than using a single loop and calling extend…
def flatten(l):
    flattened = []
    for sublist in l:
        flattened.extend(sublist)
    return flattened
While it's not as pretty, the speedup is significant. I suppose this works so well because extend can more efficiently copy the whole sublist at once instead of copying each element, one at a time. I would recommend using extend if you know your sublists are medium-to-large in size. The larger the sublist, the bigger the speedup.
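If you want to reproduce that kind of measurement yourself, a rough timeit sketch might look like this. The test data (1,000 sublists of 100 integers) is an assumption, and itertools.chain.from_iterable is a standard-library alternative added for comparison, not something the answer tested; timings will vary by machine and Python version, so no numbers are claimed here.

```python
import itertools
import timeit

def flatten_comprehension(l):
    return [item for sublist in l for item in sublist]

def flatten_extend(l):
    flattened = []
    for sublist in l:
        flattened.extend(sublist)  # copies each sublist in one call
    return flattened

def flatten_chain(l):
    # Standard-library alternative, included only for comparison.
    return list(itertools.chain.from_iterable(l))

# Hypothetical test data: 1,000 sublists of 100 integers each.
data = [list(range(100)) for _ in range(1000)]

for fn in (flatten_comprehension, flatten_extend, flatten_chain):
    seconds = timeit.timeit(lambda: fn(data), number=10)
    print(f"{fn.__name__}: {seconds:.4f}s for 10 runs")
```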
One final caveat: obviously, this only holds true if you need to eagerly form this flattened list. Perhaps you'll be sorting it later, for example. If you're ultimately going to just loop through the list as-is, this will not be any better than using the nested loops approach outlined by others. But for that use case, you want to return a generator instead of a list for the added benefit of laziness…
def flatten(l):
    return (item for sublist in l for item in sublist) # note the parens
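A short usage sketch of that generator version (the nested list is just example data): items are produced on demand, and a generator can only be consumed once, so call flatten() again if you need a second pass.

```python
def flatten(l):
    return (item for sublist in l for item in sublist)  # note the parens

nested = [[1, 2, 3], [4, 5, 6]]
gen = flatten(nested)
first = next(gen)     # only the first item has been produced so far
rest = list(gen)      # consumes the remaining items
print(first, rest)    # 1 [2, 3, 4, 5, 6]
again = list(gen)     # the generator is exhausted now
print(again)          # []
```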

Note, of course, that this sort of comprehension will only "flatten" a list of lists (or a list of other iterables). Also, if you pass it a list of strings, you'll "flatten" it into a list of characters.
To generalize this in a meaningful way you first want to be able to cleanly distinguish between strings (or bytearrays) and other types of sequences (or other Iterables). So let's start with a simple function:
import collections.abc  # collections.Iterable was removed in Python 3.10
def non_str_seq(p):
    '''p is putatively a sequence and not a string, bytes, nor bytearray'''
    return isinstance(p, collections.abc.Iterable) and not isinstance(p, (str, bytes, bytearray))
Using that, we can then build a recursive function to flatten any sequence of objects:
def flatten(s):
    '''Recursively flatten any sequence of objects
    '''
    results = list()
    if non_str_seq(s):
        for each in s:
            results.extend(flatten(each))
    else:
        results.append(s)
    return results
There are probably more elegant ways to do this, but this works for all the Python built-in types that I know of. Simple objects (numbers, strings, None, True, False) are all returned wrapped in a list. Dictionaries are returned as lists of keys (in the dictionary's iteration order).
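If you prefer something lazy, one possible variant (an assumption, not the answer's code) is a recursive generator using yield from (Python 3.3+). iter_flatten is a hypothetical name; it applies the same string/bytes rule as non_str_seq() above.

```python
from collections.abc import Iterable

def iter_flatten(s):
    # Recurse into any iterable except strings and byte sequences,
    # yielding leaf objects one at a time instead of building lists.
    if isinstance(s, Iterable) and not isinstance(s, (str, bytes, bytearray)):
        for each in s:
            yield from iter_flatten(each)
    else:
        yield s

nested = [1, [2, [3, 'ab']], (4, None), {'k': 5}]
print(list(iter_flatten(nested)))   # [1, 2, 3, 'ab', 4, None, 'k']
```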

Accessing elements of a list

I have a list of strings, and I'm calling a function on each string which returns a string. What I want is to update the string in the list. How can I do that?
for i in list:
    func(i)
The function func() returns a string. I want to update the list with this string. How can it be done?

If you need to update your list in place (not create a new list to replace it), you'll need to get the index that corresponds to each item you get from your loop. The easiest way to do that is to use the built-in enumerate function:
for index, item in enumerate(lst):
    lst[index] = func(item)
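Slice assignment is another way to change the list object in place, so that other names referring to the same list see the update. A minimal sketch, where func() is a hypothetical stand-in that upper-cases a string:

```python
def func(s):
    # Hypothetical stand-in for the asker's func(): returns a new string.
    return s.upper()

lst = ['a', 'b', 'c']
alias = lst                         # a second reference to the same list object
lst[:] = [func(s) for s in lst]     # replace the contents, keep the same object
print(alias)                        # ['A', 'B', 'C']: the change is visible through alias
```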

You can reconstruct the list with a list comprehension like this:
list_of_strings = [func(str_obj) for str_obj in list_of_strings]
Or you can use the built-in map function like this:
list_of_strings = map(func, list_of_strings)
Note: If you are using Python 3.x, map returns a lazy map object, so you need to convert it to a list explicitly, like this:
list_of_strings = list(map(func, list_of_strings))
Note 2: You don't have to worry about the old list and its memory. When you make the variable list_of_strings refer to a new list by assigning to it, the reference count of the old list drops by 1, and when the reference count reaches 0, the old list is freed automatically.

First, don't call your lists list (that's the built-in list constructor).
The most Pythonic way of doing what you want is a list comprehension:
lst = [func(i) for i in lst]
or you can create a new list:
lst2 = []
for i in lst:
    lst2.append(func(i))
and you can even mutate the list in place
for n, i in enumerate(lst):
    lst[n] = func(i)
Note: most programmers will be confused by calling the list item i in the loop above since i is normally used as a loop index counter, I'm just using it here for consistency.
You should get used to the first version though, it's much easier to understand when you come back to the code six months from now.
Later you might also want to use a generator...
g = (func(i) for i in lst)
lst = list(g)

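You can use map() to do that, though it builds a new list rather than updating the old one in place (and in Python 3 it returns an iterator, so wrap it in list()):
lst = list(map(func, lst))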

Python loop through list and shorten by one

I have a list:
mylist = ['apple', 'orange', 'dragon', 'panda']
What I want to do is loop over the list, do something with each element and then remove the element. I tried this:
for l in mylist:
    print l
    mylist.remove(l)
but my output is:
apple
dragon
EDIT
I actually want to be able to do some comparisons in the loop. Basically, I want to take each element, one by one, remove that element from the list and compare it against all the other elements in the list. The comparison is a little complex, so I don't want to use a list comprehension. And I want to reduce the list by one each time until the list is empty and all elements have been compared with each other.
What is the best way to get each element, work with it and remove it without skipping elements in the list?
Any help, much appreciated.
EDIT 2
Just to make it clear: the real point of this is to go through each element, which is a string fragment, and match it with other fragments which have overlapping sequences on either end, thereby building up a complete sequence. The element being processed should be removed from the list prior to looping so that it isn't compared with itself, and the list should shrink by one element on each processing loop.
In that case, a better example list would be:
mylist = ['apples and or', 'oranges have', 'in common', 'e nothing in c']
to give:
'apples and oranges have nothing in common'
Apologies for not being clear from the outset, but it was a specific part of this larger problem that I was stuck on.

You can just reverse the list order (if you want to process the items in the original order), then use pop() to get the items and remove them in turn:
my_list = ['apple', 'orange', 'dragon', 'panda']
my_list.reverse()
while my_list:
    print(my_list.pop())

Based on your requirement that you want to "be able to take each element, one-by-one, . . . for the list and compare it against all the other elements in the list", I believe you're best suited to use itertools. Here, without the inefficiency of removing elements from your list, you gain the fool-proof ability to compare every combination to each other once and only once. Since your spec doesn't seem to provide any use for the deletion (other than achieving the goal of combinations), I feel this works quite nicely.
That said, a list comprehension would be the most Pythonic way to approach this, in my opinion, as it does not compromise any capability to do complex comparisons.
import itertools
l = ['apple', 'orange', 'dragon', 'panda']
def yourfunc(a, b):
    pass  # your comparison goes here
for a, b in itertools.combinations(l, 2):
    yourfunc(a, b)
A list comprehension approach would have this code instead:
[yourfunc(a, b) for a,b in itertools.combinations(l, 2)]
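To see why combinations() is the right tool here, a small sketch with the question's list: combinations() yields each unordered pair once and never pairs an element with itself, while combinations_with_replacement() also yields self-pairs such as ('apple', 'apple').

```python
import itertools

fruits = ['apple', 'orange', 'dragon', 'panda']

pairs = list(itertools.combinations(fruits, 2))
print(len(pairs))    # 6, i.e. 4 * 3 / 2
print(pairs[:3])     # [('apple', 'orange'), ('apple', 'dragon'), ('apple', 'panda')]

with_self = list(itertools.combinations_with_replacement(fruits, 2))
print(len(with_self))   # 10: the 6 pairs above plus 4 self-pairs
```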
EDIT: Based on your additional information, I believe you should reconsider itertools.
import itertools
l = ['apples and or', 'oranges have', 'in common', 'e nothing in c', 'on, dont you know?']
def find_overlap(a,b):
    for i in xrange(len(a)):
        if a[-i:] == b[0:i]:
            return a + b[i:]
    return ''
def reduce_combinations(fragments):
    matches = []
    for c in itertools.combinations(fragments, 2):
        f = reduce(find_overlap, c[1:], c[0])
        if f: matches.append(f)
    return matches
copy = l
while len(copy) > 1:
    copy = reduce_combinations(copy)
print copy
returns
['apples and oranges have nothing in common, dont you know?']
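For readers on Python 3, a port of the code above might look like this. It is a sketch that assumes only the changes Python 3 forces: xrange becomes range, reduce is imported from functools, and print becomes a function. With the example list it should reach the same single-sentence result shown above.

```python
import itertools
from functools import reduce  # reduce() is not a built-in in Python 3

def find_overlap(a, b):
    # Join a and b on the shortest suffix of a that is also a prefix of b;
    # return '' when no such overlap exists.
    for i in range(len(a)):
        if a[-i:] == b[0:i]:
            return a + b[i:]
    return ''

def reduce_combinations(fragments):
    # Try every pair (in list order) and keep each successful join.
    matches = []
    for c in itertools.combinations(fragments, 2):
        f = reduce(find_overlap, c[1:], c[0])
        if f:
            matches.append(f)
    return matches

l = ['apples and or', 'oranges have', 'in common', 'e nothing in c', 'on, dont you know?']
fragments = l
while len(fragments) > 1:
    fragments = reduce_combinations(fragments)
print(fragments)
```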
EDIT (again): This permutation-based approach is a practical solution. It performs more computations than the solution above, but it has the added benefit of finding all possible technical matches. The problem with the above solution is that it expects exactly one answer, as evidenced by the while loop: it is much more efficient, but it can also return nothing if more than one answer exists.
import itertools
l = ['apples and or', 'oranges have', 'in common', 'e nothing in c', 'on, dont you know?']
def find_overlap(a,b):
    for i in xrange(len(a)):
        if a[-i:] == b[0:i]:
            return a + b[i:]
    return ''
matches = []
for c in itertools.combinations(l, 2):
    f = reduce(find_overlap, c[1:], c[0])
    if f: matches.append(f)
for c in itertools.combinations(matches, len(matches)):
    f = reduce(find_overlap, c[1:], c[0])
    if f: print f

Is there any reason you can't simply loop through all of the elements, do something to them and then reset the list to an empty list afterwards? Something like:
for l in my_list:
    print l
my_list = []
# or, if you want to mutate the actual list object, and not just re-assign
# a blank list to my_list
my_list[:] = []
EDIT
Based on your update, what you need to do is use the popping approach that has been mentioned:
while len(my_list):
    item = my_list.pop()
    do_some_complicated_comparisons(item)
If you do care about order, then just pop from the front:
my_list.pop(0)
or reverse the list before looping:
my_list.reverse()

You shouldn't remove elements while iterating over the list. Process the elements first, and then take care of the list. This is a problem in many programming languages, not just Python, because it causes these skipping issues.
As an alternative, you can do mylist = [] afterwards, when you're done with the elements.

By making a copy:
for l in original[:]:
    print l
    original.remove(l)

You could use the stack operations to achieve that:
while len(mylist):
    myitem = mylist.pop(0)
    # Do something with myitem
    # ...

#! C:\python27
import string
list = ['apple', 'orange', 'dragon', 'panda']
print list
myLength = len(list) -1
print myLength
del list[myLength]
print list
[EDIT]
Here's the code to loop through the list, find a word the user entered, and remove it.
#! C:\python27
import string
whattofind = raw_input("What shall we look for and delete?")
myList = ['apple', 'orange', 'dragon', 'panda']
print myList
for item in myList:
    if whattofind in myList:
        myList.remove(whattofind)
print myList

If forward order doesn't matter, I might do something like this:
l = ['apple', 'orange', 'dragon', 'panda']
while l:
    print l.pop()
Given your edit, an excellent alternative is to use a deque instead of a list.
>>> import collections
>>> l = ['apple', 'orange', 'dragon', 'panda']
>>> d = collections.deque(l)
>>> while d:
...     i = d.popleft()
...     for j in d:
...         if i > j: print (i, j)
...
('orange', 'dragon')
Using a deque is nice because popping from either end is O(1). Best not to use a deque for random access though, because that's slower than for a list.
On the other hand, since you're iterating over the whole remaining list every time anyway, the asymptotic performance of your code will be O(n ** 2) regardless. So using a list and popping from the beginning with pop(0) is justifiable from an asymptotic point of view (though it will be slower than using a deque by some constant multiple).
But in fact, since your goal seems to be generating combinations, you should consider hexparrot's answer, which is quite elegant -- though performance-wise, it shouldn't be too different from the above deque-based solution, since removing items from a deque is cheap.

Hmm... Since you can't remove all the items this way, even though you iterate through all of them, try iterating over a copy instead:
#! C:\python27
list1 = ["apple","pear","falcon","bear"] #define list 1
list2 = [] #define list 2
item2 ="" #define temp item
for item in list1[:]:
    item2 = item+"2" #take current item from list 1 and do something. (Add 2 in my case)
    list2.append(item2) #add modified item to list2
    list1.remove(item) #remove the un-needed item from list1
print list1 #becomes empty
print list2 #full and with modified items.
I'm assuming that if you are running a comparison, you can put an if clause inside the for loop to run it. But that seems to be the way to do it.

append/extend list in loop

I would like to extend a list while looping over it:
for idx in xrange(len(a_list)):
    item = a_list[idx]
    a_list.extend(fun(item))
(fun is a function that returns a list.)
Question:
Is this already the best way to do it, or is something nicer and more compact possible?
Remarks:
from matplotlib.cbook import flatten
a_list.extend(flatten(fun(item) for item in a_list))
should work but I do not want my code to depend on matplotlib.
for item in a_list:
    a_list.extend(fun(item))
would be nice enough for my taste but seems to cause an infinite loop.
Context:
I have a large number of nodes (in a dict), and some of them are special because they are on the boundary.
'a_list' contains the keys of these special/boundary nodes. Sometimes nodes are added, and then every new node that is on the boundary needs to be added to 'a_list'. The new boundary nodes can be determined from the old boundary nodes (expressed here by 'fun'), and every boundary node can add several new nodes.

Have you tried list comprehensions? This works by creating a separate list in memory and then adding it to your original list once the comprehension is complete. Basically it's the same as your second example, but instead of importing a flattening function, it flattens through stacked list comprehensions. [edit Matthias: changed + to +=]
a_list += [x for lst in [fun(item) for item in a_list] for x in lst]
EDIT: To explain what's going on:
So the first thing that will happen is this part in the middle of the above code:
[fun(item) for item in a_list]
This will apply fun to every item in a_list and add it to a new list. Problem is, because fun(item) returns a list, now we have a list of lists. So we run a second (stacked) list comprehension to loop through all the lists in our new list that we just created in the original comprehension:
for lst in [fun(item) for item in a_list]
This will allow us to loop through all the lists in order. So then:
[x for lst in [fun(item) for item in a_list] for x in lst]
This means take every x (that is, every item) in every lst (all the lists we created in our original comprehension) and add it to a new list.
Hope this is clearer. If not, I'm always willing to elaborate further.
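A quick way to convince yourself that this version terminates (fun() here is a hypothetical stand-in that returns two new items per input): the right-hand side is fully built from the current a_list before += extends it, so the new items are not fed back into fun().

```python
def fun(item):
    # Hypothetical stand-in: each node "produces" two new ones.
    return [item * 10, item * 10 + 1]

a_list = [1, 2]
# The comprehension only sees the original two items of a_list.
a_list += [x for lst in [fun(item) for item in a_list] for x in lst]
print(a_list)   # [1, 2, 10, 11, 20, 21]
```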

Using itertools, it can be written as:
import itertools
a_list += itertools.chain(*itertools.imap(fun, a_list))  # Python 2; use map instead of itertools.imap in Python 3
or, if you're aiming for code golf:
a_list += sum(map(fun, a_list), [])
Alternatively, just write it out:
new_elements = list(map(fun, a_list))  # build all results first; a lazy map over a_list could loop forever once a_list starts growing
for ne in new_elements:
    a_list.extend(ne)

As you want to extend the list, but loop only over the original list, you can loop over a copy instead of the original:
for item in a_list[:]:
    a_list.extend(fun(item))

Using a generator:
original_list = [1, 2]
original_list.extend((x for x in original_list[:]))
# [1, 2, 1, 2]

Python recursion to print items from a list backwards

Using Python, I'm trying to read a list of strings backwards. When I find the item of interest, I want to print all of the items from that point to the end of the list. I can do this without recursion and it works fine, but I feel like there's a nicer way to do this with recursion. :)
Example without recursion:
items = ['item1', 'item2', 'item3', 'item4', 'item5']
items_of_interest = []
items.reverse()
for item in items:
    items_of_interest.append(item)
    if item == 'item3':
        break
    else:
        continue
items_of_interest.reverse()
print items_of_interest
['item3', 'item4', 'item5']
Update:
To add clarity to the question, the list is actually the output of a grep of a set of strings from a log file. The set of strings may be repeating and I only want the last set.

Recursion wouldn't make this simpler, it would make it more complicated.
for i, item in enumerate(reversed(items), 1):
    if item == 'item3':
        items_of_interest = items[-i:]
        break
else:
    # 'item3' wasn't found; handle that case however suits your code
    items_of_interest = []
seems to be the simplest efficient way to do this to me. You only have to iterate over the list from the end to 'item3', since reversed returns an iterator.
Edit: if you don't mind iterating over the whole list to create a reversed version, you can use:
i = list(reversed(items)).index('item3')
items_of_interest = items[-i-1:]
which is even simpler. It raises an error if 'item3' isn't in the list. I'm using list(reversed()) instead of [:] then reverse() because it's one iteration over the list instead of two.
Edit 2: Based on your comment to the other answer, my first version does what you want -- searches for the item from the end without iterating over the whole list. The version in the question has to iterate the list to reverse it, as does my second version.
A minimally modified, but more efficient, version of your original would be:
items_of_interest = []
for item in reversed(items):
    items_of_interest.append(item)
    if item == 'item3':
        break
items_of_interest.reverse()

A recursive solution is not called for here. To the problem of finding the slice of a list from the last occurrence of a item to the end of the list, one approach is to define an auxiliary function
>>> def rindex(s, x):
...     for i, y in enumerate(reversed(s)):
...         if x == y:
...             return -i-1
...     raise ValueError
...
>>> items[rindex(items, "b"):]
['b', 'f']
The auxiliary function is called rindex by analogy with the rindex method that Python strings have for finding the last occurrence of a substring.
If you must do it recursively (perhaps it is homework), then think about it along the lines of this sketch:
def tail_from(lst, x):
    return tail_from_aux(lst, x, [])
def tail_from_aux(lst, element, accumulated):
    if not lst:
        return []
    elif lst[-1] == element:
        return [element] + accumulated
    else:
        last = lst[-1]
        return tail_from_aux(lst[:-1], element, [last] + accumulated)
But, this is memory-intensive, goes through the whole list (inefficient), and is not Pythonic. It may be appropriate for other languages, but not Python. Do not use.
Since your actual question refers to files, and log files at that, you may not be able to reduce this problem to an array search. Therefore, check out the question "Read a file in reverse order using python"; there are some interesting answers there, as well as some links to follow.
If you are able to, also consider combining tac with awk and grep.
