Using Python, I'm trying to read a list of strings backwards. When I find the item of interest, I want to print all of the items from that point to the end of the list. I can do this without recursion and it works fine, but I feel like there's a nicer way to do this with recursion. :)
Example without recursion:
items = ['item1', 'item2', 'item3', 'item4', 'item5']
items_of_interest = []
items.reverse()
for item in items:
    items_of_interest.append(item)
    if item == 'item3':
        break
    else:
        continue
items_of_interest.reverse()
print items_of_interest
['item3', 'item4', 'item5']
Update:
To add clarity to the question, the list is actually the output of a grep of a set of strings from a log file. The set of strings may be repeating and I only want the last set.
Recursion wouldn't make this simpler, it would make it more complicated.
for i, item in enumerate(reversed(items), 1):
    if item == 'item3':
        items_of_interest = items[-i:]
        break
else:
    # 'item3' wasn't found
    items_of_interest = []  # one possible fallback; handle this case as you need
seems to be the simplest efficient way to do this to me. You only have to iterate over the list from the end to 'item3', since reversed returns an iterator.
Edit: if you don't mind iterating over the whole list to create a reversed version, you can use:
i = list(reversed(items)).index('item3')
items_of_interest = items[-i-1:]
which is even simpler. It raises an error if 'item3' isn't in the list. I'm using list(reversed()) instead of [:] then reverse() because it's one iteration over the list instead of two.
Edit 2: Based on your comment to the other answer, my first version does what you want -- searches for the item from the end without iterating over the whole list. The version in the question has to iterate the list to reverse it, as does my second version.
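As a sketch of the first version wrapped in a reusable function: the name tail_from_last, the sample list, and the choice to return an empty list when nothing matches are assumptions for illustration, not part of the answer above. It is written for Python 3.

```python
def tail_from_last(items, target):
    """Return the slice of items from the last occurrence of target to the end."""
    # reversed() yields elements lazily from the end, so the scan stops
    # at the first match instead of walking the whole list.
    for i, item in enumerate(reversed(items), 1):
        if item == target:
            return items[-i:]
    # Assumption: report "not found" as an empty list.
    return []


# Hypothetical data with a repeated marker, as in grep output from a log.
log_lines = ['item1', 'item3', 'item2', 'item3', 'item4', 'item5']
print(tail_from_last(log_lines, 'item3'))  # ['item3', 'item4', 'item5']
```

Raising an exception instead of returning an empty list may suit callers that must distinguish "not found" from "found at the very end".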
A minimally modified, but more efficient, version of your original would be:
items_of_interest = []
for item in reversed(items):
    items_of_interest.append(item)
    if item == 'item3':
        break
items_of_interest.reverse()
A recursive solution is not called for here. To find the slice of a list from the last occurrence of an item to the end of the list, one approach is to define an auxiliary function
>>> def rindex(s, x):
...     for i, y in enumerate(reversed(s)):
...         if x == y:
...             return -i-1
...     raise ValueError
...
>>> items[rindex(items, "b"):]
['b', 'f']
The auxiliary function can be called rindex because Python has a rindex method to find the last occurrence of a substring in a string.
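For a version that can be run as a script rather than typed at the prompt, here is a minimal sketch in Python 3. The sample list letters is made up for the example, and the error message is an addition.

```python
def rindex(seq, value):
    """Return the negative index of the last occurrence of value in seq."""
    for i, candidate in enumerate(reversed(seq)):
        if candidate == value:
            return -i - 1
    raise ValueError("%r is not in the sequence" % (value,))


letters = ['a', 'b', 'c', 'b', 'f']  # hypothetical sample data
print(rindex(letters, 'b'))            # -2
print(letters[rindex(letters, 'b'):])  # ['b', 'f']
```

Returning a negative index keeps the slice expression short, since letters[-2:] already means "from the second-to-last element to the end".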
If you must do it recursively (perhaps it is homework), then think about it along these lines (a rough sketch):
def tail_from(lst, x):
    return tail_from_aux(lst, x, [])

def tail_from_aux(lst, element, accumulated):
    if not lst:
        # ran out of items without finding element
        return []
    elif lst[-1] == element:
        return [element] + accumulated
    else:
        last = lst[-1]
        return tail_from_aux(lst[:-1], element, [last] + accumulated)
But this is memory-intensive (every call copies the list with a slice), inefficient, and not Pythonic. It may be appropriate for other languages, but not Python. Do not use.
Since your actual question refers to files, and log files at that, you may not be able to reduce this problem to an array search. Therefore, check out "Read a file in reverse order using python"; there are some interesting answers there as well as some links to follow.
Consider mixing tac with awk and grep if you are able to as well.
Is there any way to rewrite the Python code below in one line?
for i in range(len(main_list)):
    if main_list[i] != []:
        for j in range(len(main_list[i])):
            main_list[i][j][6]=main_list[i][j][6].strftime('%Y-%m-%d')
something like below,
[main_list[i][j][6]=main_list[i][j][6].strftime('%Y-%m-%d') for i in range(len(main_list)) if main_list[i] != [] for j in range(len(main_list[i]))]
I got a SyntaxError for this.
Actually, I'm trying to store all the values fetched from a table in one list. Since the table contains a date column, I need to convert those values to strings, because otherwise I run into a malformed string error.
So my approach is to convert that element of each list from datetime.date to str, and I got it working. I just wanted to do it in one line.
Use the explicit for loop. There's no better option.
A list comprehension is used to create a new list, not to modify certain elements of an existing list.
You may be able to update values via a list comprehension, e.g. [L.__setitem__(i, 'some_value') for i in range(len(L))], but this is not recommended as you are using a side-effect and in the process creating a list of None values which you then discard.
You could also write a convoluted list comprehension with a conditional expression that picks out index 6 of each third-level sublist. But this will make your code difficult to maintain.
In short, use the for loop.
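The explicit loop can still be made shorter by iterating over the sublists and rows directly instead of over indices. A minimal sketch in Python 3, assuming (as in the question) that every row holds a datetime.date at index 6; the sample data is made up:

```python
import datetime

# Hypothetical sample data: each row carries a date at index 6.
main_list = [
    [],
    [[0, 1, 2, 3, 4, 5, datetime.date(2016, 8, 18)],
     [0, 1, 2, 3, 4, 5, datetime.date(2016, 8, 19)]],
]

for sublist in main_list:   # empty sublists simply contribute no rows
    for row in sublist:
        row[6] = row[6].strftime('%Y-%m-%d')

print(main_list[1][0][6])  # 2016-08-18
```

The check for empty sublists disappears because looping over an empty list does nothing.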
You're getting a syntax error because an assignment is a statement, and a list comprehension may only contain expressions. Python forbids assignments there because it discourages overly complex list comprehensions in favour of for loops.
Obviously you shouldn't do this on one line, but this is how to do it:
import datetime
# Example from your comment:
type1 = "some type"
main_list = [[], [],
             [[1, 2, 3, datetime.date(2016, 8, 18), type1],
              [3, 4, 5, datetime.date(2016, 8, 18), type1]], [], []]
def fmt_times(lst):
    """Format the fourth value of each element of each non-empty sublist"""
    for i in range(len(lst)):
        if lst[i] != []:
            for j in range(len(lst[i])):
                lst[i][j][3] = lst[i][j][3].strftime('%Y-%m-%d')
    return lst
def fmt_times_one_line(main_list):
    """Format the fourth value of each element of each non-empty sublist"""
    return [[] if main_list[i] == [] else [[main_list[i][j][k] if k != 3 else main_list[i][j][k].strftime('%Y-%m-%d') for k in range(len(main_list[i][j]))] for j in range(len(main_list[i])) ] for i in range(len(main_list))]
import copy
# Deep copy needed because fmt_times modifies the sublists.
assert fmt_times(copy.deepcopy(main_list)) == fmt_times_one_line(main_list)
The list comprehension is a functional construct. If you know how map() works in Python or JavaScript, it's the same idea. In a map() or comprehension we generally don't mutate the data we're mapping over (and Python discourages attempting it), so instead we recreate the entire object, substituting only the values we wanted to modify.
One line?
main_list = convert_list(main_list)
You will have to put a few more lines somewhere else though:
import datetime

def convert_list(main_list):
    for i, ml in enumerate(main_list):
        if isinstance(ml, list) and len(ml) > 0:
            # recurse into nested, non-empty lists
            main_list[i] = convert_list(ml)
        elif isinstance(ml, datetime.date):
            main_list[i] = ml.strftime('%Y-%m-%d')
    return main_list
You might be able to whack this together with a list comprehension but it's a terrible idea (for reasons better explained in the other answer).
I'm trying to write a program that removes duplicates from a list, but my program keeps throwing the error "list index out of range" on line 5, if n/(sequence[k]) == 1:. I can't figure this out. Am I right in thinking that the possible values of "k" are 0, 1, and 2? How is "sequence" with any of those as the index outside of the possible index range?
def remove_duplicates(sequence):
    new_list = sequence
    for n in sequence:
        for k in range(len(sequence)):
            if n/(sequence[k]) == 1:
                new_list.remove(sequence[k])
    print new_list
remove_duplicates([1,2,3])
I strongly suggest Akavall's answer:
list(set(your_list))
As to why you get out of range errors: new_list = sequence does not copy the list. Both names refer to the same list object, so changing new_list also changes sequence.
And finally, you are comparing each item with itself and then removing it. So even if you used a copy of sequence, like:
new_list = list(sequence)
or
new_list = sequence[:]
every item would match itself and be removed, so you would not be left with the de-duplicated list you want.
Your error is concurrent modification of the list:
for k in range(len(sequence)):
    if n/(sequence[k]) == 1:
        new_list.remove(sequence[k])
It may seem that removing from new_list shouldn't affect sequence, but you did new_list = sequence at the beginning of the function. This means new_list literally is sequence. Perhaps what you meant was new_list = list(sequence), to copy the list?
If you accept that they are the same list, the error is obvious. When you remove items, the length, and the indexes change.
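A short check makes the difference between a second name and a copy visible. This is only an illustrative sketch in Python 3; the variable names alias and independent are made up for the example.

```python
sequence = [1, 2, 3]
alias = sequence              # a second name for the same list object
independent = list(sequence)  # a new list with the same items

alias.remove(1)

print(sequence)     # [2, 3]  (changed through the alias)
print(independent)  # [1, 2, 3]
print(alias is sequence, independent is sequence)  # True False
```

Because range(len(sequence)) is computed once, before any removal, the loop in the question keeps using indices that the shortened list no longer has.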
P.S. As mentioned in a comment by @Akavall, all you need is:
sequence=list(set(sequence))
To make sequence contain no dupes. Another option, if you need to preserve ordering, is:
from collections import OrderedDict
sequence=list(OrderedDict.fromkeys(sequence))
If you don't like list(set(your_list)) because it's not guaranteed to preserve order, you could grab the OrderedSet recipe and then do:
from ordered_set import OrderedSet
foo = list("face a dead cabbage")
print foo
print list(set(foo)) # Order might change
print list(OrderedSet(foo)) # Order preserved
# like @Akavall suggested
def remove_duplicates(sequence):
    # returns unique items in arbitrary order
    return list(set(sequence))

# create a list; if an element from the input is not in that list, append it.
def remove_duplicates(sequence):
    lst = []
    for i in sequence:
        if i not in lst:
            lst.append(i)
    # returns unique items in their original order
    return lst
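The append-if-missing version scans the result list for every item, which gets slow for long inputs. A possible variant, sketched here in Python 3 under the assumption that all items are hashable, tracks what has been seen in a set while still preserving order. The name remove_duplicates_ordered is made up for the example.

```python
def remove_duplicates_ordered(sequence):
    """Return the unique items of sequence in first-seen order."""
    seen = set()   # fast membership test; requires hashable items
    result = []
    for item in sequence:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(remove_duplicates_ordered([3, 1, 3, 2, 1]))  # [3, 1, 2]
```

For unhashable items such as nested lists, the plain list-based version above remains the fallback.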
This question already has answers here:
Strange result when removing item from a list while iterating over it
This is the most common problem I face while trying to learn programming in python. The problem is, when I try to iterate a list using "range()" function to check if given item in list meets given condition and if yes then to delete it, it will always give "IndexError". So, is there a particular way to do this without using any other intermediate list or "while" statement? Below is an example:
l = range(20)
for i in range(0,len(l)):
    if l[i] == something:
        l.pop(i)
First of all, you never want to iterate over things like that in Python. Iterate over the actual objects, not the indices:
l = range(20)
for i in l:
    ...
The reason for your error was that you were removing an item, so the later indices cease to exist.
Now, you can't modify a list while you are looping over it, but that isn't a problem. The better solution is to use a list comprehension here, to filter out the extra items.
l = range(20)
new_l = [i for i in l if not i == something]
You can also use the filter() builtin, although that tends to be unclear in most situations (and slower where you need lambda).
Also note that in Python 3.x, range() produces a lazy range object, not a list.
It would also be a good idea to use more descriptive variable names - I'll presume here it's for example, but names like i and l are hard to read and make it easier to introduce bugs.
Edit:
If you wish to update the existing list in place, as pointed out in the comments, you can use the slicing syntax to replace each item of the list in turn (l[:] = new_l). That said, I would argue that that case is pretty bad design. You don't want one segment of code to rely on data being updated from another bit of code in that way.
Edit 2:
If, for any reason, you need the indices as you loop over the items, that's what the enumerate() builtin is for.
You can always do this sort of thing with a list comprehension:
newlist=[i for i in oldlist if not condition ]
As others have said, iterate over the list and create a new list with just the items you want to keep.
Use a slice assignment to update the original list in-place.
l[:] = [item for item in l if item != something]
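To see that slice assignment really updates the existing list object rather than rebinding the name, here is a small sketch in Python 3. The value of something and the name other_reference are assumptions made for the example.

```python
something = 3                # hypothetical value to filter out
numbers = list(range(6))     # [0, 1, 2, 3, 4, 5]
other_reference = numbers    # e.g. a reference held elsewhere in the program

# Replace the contents in place; the list object stays the same.
numbers[:] = [item for item in numbers if item != something]

print(other_reference)             # [0, 1, 2, 4, 5]
print(other_reference is numbers)  # True
```

With a plain numbers = [...] assignment, other_reference would still point at the old, unfiltered list.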
You should look at the problem from the other side: keep an element in the list only when it is not equal to "something". With a list comprehension:
l = [i for i in xrange(20) if i != something]
You should not use for i in range(0,len(l)):. Use for i, item in enumerate(l): if you need the index, and for item in l: if not.
You should not manipulate a structure you are iterating over. When you have to, iterate over a copy instead.
Don't name a variable l (it may be mistaken for 1 or I).
If you want to filter a list, do so explicitly. Use filter() or a list comprehension.
BTW, in your case, you could also do:
while something in list_: list_.remove(something)
That's not very efficient, though. But depending on context, it might be more readable.
The reason you're getting an IndexError is because you're changing the length of the list as you iterate in the for-loop. Basically, here's the logic...
#-- Build the original list: [0, 1, 2, ..., 19]
l = range(20)
#-- Here, the range function builds ANOTHER list, in this case also [0, 1, 2, ..., 19]
#-- the variable "i" will be bound to each element of this list, so i = 0 (loop), then i = 1 (loop), i = 2, etc.
for i in range(0,len(l)):
    if i == something:
        #-- So, when i is equivalent to something, you "pop" the list, l.
        #-- the length of l is now *19* elements, NOT 20 (you just removed one)
        l.pop(i)
        #-- So...when the list has been shortened to 19 elements...
        #-- we're still iterating, i = 17 (loop), i = 18 (loop), i = 19 *CRASH*
        #-- There is no 19th element of l, as l (after you popped out an element) only
        #-- has indices 0, ..., 18, now.
NOTE also, that you're making the "pop" decision based on the index of the list, not what's in the indexed cell of the list. This is unusual -- was that your intention? Or did you mean something more like...
if l[i] == something:
    l.pop(i)
Now, in your specific example, (l[i] == i) but this is not a typical pattern.
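The failure described above can be reproduced in a few lines. This is a sketch in Python 3 (where range() has to be wrapped in list() to get a real list); the value 5 for something and the names values and failed_at are arbitrary choices for the example.

```python
something = 5                # hypothetical value to remove
values = list(range(20))
failed_at = None

try:
    for i in range(len(values)):   # range(20) is fixed before the loop starts
        if values[i] == something:
            values.pop(i)          # the list now has only 19 items
except IndexError:
    failed_at = i

print(failed_at, len(values))  # 19 19
```

The loop still asks for index 19 even though the shortened list only has indices 0 through 18.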
Rather than iterating over the list, try the filter function. It's a built-in (like a lot of other list processing functions, e.g. map, sorted, reversed, zip, etc.)
Try this...
#-- Create a function for testing the elements of the list.
def f(x):
    if (x == SOMETHING):
        return False
    else:
        return True
#-- Create the original list.
l = range(20)
#-- Apply the function f to each element of l.
#-- Where f(l[i]) is True, the element l[i] is kept and will be in the new list, m.
#-- Where f(l[i]) is False, the element l[i] is passed over and will NOT appear in m.
m = filter(f, l)
List processing functions go hand-in-hand with "lambda" functions, which, in Python, are brief, anonymous functions. So we can rewrite the above code as...
#-- Create the original list.
l = range(20)
#-- Apply the function f to each element of l.
#-- Where lambda is True, the element l[i] is kept and will be in the new list, m.
#-- Where lambda is False, the element l[i] is passed over and will NOT appear in m.
m = filter(lambda x: (x != SOMETHING), l)
Give it a go and see how it works!
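One caveat when trying this on Python 3: filter() returns a lazy iterator there rather than a list, so wrap it in list() if you need a list. A small sketch, with SOMETHING set to an arbitrary value for the example:

```python
SOMETHING = 7  # hypothetical value to drop
numbers = list(range(20))

kept = list(filter(lambda x: x != SOMETHING, numbers))

# The equivalent list comprehension gives the same result.
same = [x for x in numbers if x != SOMETHING]

print(kept == same)  # True
print(len(kept))     # 19
```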
I would like to extend a list while looping over it:
for idx in xrange(len(a_list)):
    item = a_list[idx]
    a_list.extend(fun(item))
(fun is a function that returns a list.)
Question:
Is this already the best way to do it, or is something nicer and more compact possible?
Remarks:
from matplotlib.cbook import flatten
a_list.extend(flatten(fun(item) for item in a_list))
should work but I do not want my code to depend on matplotlib.
for item in a_list:
    a_list.extend(fun(item))
would be nice enough for my taste but seems to cause an infinite loop.
Context:
I have a large number of nodes (in a dict) and some of them are special because they are on the boundary.
'a_list' contains the keys of these special/boundary nodes. Sometimes nodes are added and then every new node that is on the boundary needs to be added to 'a_list'. The new boundary nodes can be determined from the old boundary nodes (expressed here by 'fun') and every boundary node can add several new nodes.
Have you tried list comprehensions? This would work by creating a separate list in memory, then adding it to your original list once the comprehension is complete. Basically it's the same as your second example, but instead of importing a flattening function, it flattens the result through stacked list comprehensions. [edit Matthias: changed + to +=]
a_list += [x for lst in [fun(item) for item in a_list] for x in lst]
EDIT: To explain what's going on.
So the first thing that will happen is this part in the middle of the above code:
[fun(item) for item in a_list]
This will apply fun to every item in a_list and add it to a new list. Problem is, because fun(item) returns a list, now we have a list of lists. So we run a second (stacked) list comprehension to loop through all the lists in our new list that we just created in the original comprehension:
for lst in [fun(item) for item in a_list]
This will allow us to loop through all the lists in order. So then:
[x for lst in [fun(item) for item in a_list] for x in lst]
This means take every x (that is, every item) in every lst (all the lists we created in our original comprehension) and add it to a new list.
Hope this is clearer. If not, I'm always willing to elaborate further.
Using itertools, it can be written as:
import itertools
a_list += itertools.chain(* itertools.imap(fun, a_list))
or, if you're aiming for code golf:
a_list += sum(map(fun, a_list), [])
Alternatively, just write it out:
new_elements = list(map(fun, a_list))  # build the full list first; a lazy map (itertools.imap in Python 2.x) would keep seeing the appended items
for ne in new_elements:
    a_list.extend(ne)
As you want to extend the list, but loop only over the original list, you can loop over a copy instead of the original:
for item in a_list[:]:
    a_list.extend(fun(item))
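A runnable sketch of the copy-based loop in Python 3. The function fun below is a hypothetical stand-in that returns one new item per input; the real fun from the question would compute new boundary nodes.

```python
def fun(item):
    # Hypothetical stand-in: returns a list of "new" items for one old item.
    return [item * 10]


a_list = [1, 2, 3]

# a_list[:] is a snapshot, so items appended during the loop are not visited.
for item in a_list[:]:
    a_list.extend(fun(item))

print(a_list)  # [1, 2, 3, 10, 20, 30]
```

Looping over a_list itself would also visit 10, 20, 30 and everything derived from them, which is why the version in the question never terminates.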
Using generator
original_list = [1, 2]
original_list.extend((x for x in original_list[:]))
# [1, 2, 1, 2]
I'm doing the following exercise:
# B. front_x
# Given a list of strings, return a list with the strings
# in sorted order, except group all the strings that begin with 'x' first.
# e.g. ['mix', 'xyz', 'apple', 'xanadu', 'aardvark'] yields
# ['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
# Hint: this can be done by making 2 lists and sorting each of them
# before combining them.
sample solution:
def front_x(words):
    listX = []
    listO = []
    for w in words:
        if w.startswith('x'):
            listX.append(w)
        else:
            listO.append(w)
    listX.sort()
    listO.sort()
    return listX + listO
my solution:
def front_x(words):
    listX = []
    for w in words:
        if w.startswith('x'):
            listX.append(w)
            words.remove(w)
    listX.sort()
    words.sort()
    return listX + words
When I tested my solution, the result was a little weird. Here is the source code with my solution: http://dl.dropbox.com/u/559353/list1.py. You might want to try it out.
The problem is that you loop over the list and remove elements from it (modifying it):
for w in words:
    if w.startswith('x'):
        listX.append(w)
        words.remove(w)
Example:
>>> a = range(5)
>>> for i in a:
...     a.remove(i)
...
>>> a
[1, 3]
This code works as follows:
Get first element, remove it.
Move to the next element. But it is not 1 anymore: because we removed 0, 1 became the new first element. The next element is therefore 2, and 1 is skipped.
Same for 3 and 4.
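The steps above can be traced by recording which elements the loop actually visits. A small sketch in Python 3 (the list name visited is made up for the example):

```python
a = list(range(5))   # [0, 1, 2, 3, 4]
visited = []

for i in a:
    visited.append(i)  # record what the loop actually sees
    a.remove(i)        # shifts the remaining items one position left

print(visited)  # [0, 2, 4]
print(a)        # [1, 3]
```

Every removal moves the next element into the slot the loop has just finished with, so every second element is never visited.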
Two main differences:
Removing an element from a list inside a loop that iterates over that list doesn't quite work in Python. If you were using Java, you would get an exception saying that you are modifying a collection that is being iterated. Python apparently doesn't raise such an error. @Felix_Kling explains it quite well in his answer.
Also, you are modifying the input parameter words, so the caller of your function front_x will see words modified after the function has run. Unless this behaviour is explicitly expected, it is better avoided. Imagine that your program is doing something else with words. Keeping two lists as in the sample solution is a better approach.
Altering the list you're iterating over results in surprising behaviour: elements can be skipped. That's why the sample solution creates two new lists instead of deleting from the source list.
for w in words:
    if w.startswith('x'):
        listX.append(w)
        words.remove(w) # Problem here!
See this question for a discussion on this matter. It basically boils down to list iterators iterating through the indexes of the list without going back and checking for modifications (which would be expensive!).
If you want to avoid creating a second list, you will have to perform two iterations. One to iterate over words to create listX and another to iterate over listX deleting from words.
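The two-iteration approach could look roughly like this. It is a sketch in Python 3, not the exercise's official solution, and note that it still modifies the caller's words list, just as the original attempt did.

```python
def front_x(words):
    # First iteration: collect the 'x' words without touching words.
    list_x = [w for w in words if w.startswith('x')]
    # Second iteration: loop over list_x (not words) while deleting from words.
    for w in list_x:
        words.remove(w)
    list_x.sort()
    words.sort()
    return list_x + words


sample = ['mix', 'xyz', 'apple', 'xanadu', 'aardvark']
print(front_x(sample))  # ['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
```

Since the loop that deletes from words iterates over a different list, no element is skipped.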
That hint is misleading and unnecessary, you can do this without sorting and combining two lists independently:
>>> items = ['mix', 'xyz', 'apple', 'xanadu', 'aardvark']
>>> sorted(items, key=lambda item: (item[0]!='x', item))
['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
The built-in sorted() function takes an optional key argument that tells it what to sort by. In this case, you want to create tuples like (False, 'xanadu') or (True, 'apple') for each element of the original list, which you can do with a lambda.
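To make the key tuples visible, here is a sketch in Python 3 that uses a named function instead of the lambda. The name sort_key is made up, and startswith('x') replaces item[0] != 'x' so that an empty string would not raise an IndexError.

```python
items = ['mix', 'xyz', 'apple', 'xanadu', 'aardvark']


def sort_key(item):
    # False sorts before True, so words starting with 'x' come first;
    # the word itself breaks ties within each group.
    return (not item.startswith('x'), item)


for item in items:
    print(sort_key(item))

print(sorted(items, key=sort_key))
# ['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
```

Unlike the in-place attempts, sorted() returns a new list and leaves items unchanged.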