Python generator function to loop over iterable sequence while eliminating duplicates


I am trying to create a generator function that loops over an iterable sequence while eliminating duplicates and then returns each result in order one at a time (not as a set or list), but I am having difficulty getting it to work. I have found similar questions here, but the responses pretty uniformly result in a list being produced.
I would like the output to be something like:
>>> next(i)
2
>>> next(i)
8
>>> next(i)
4....
I was able to write it as a regular function that produces a list:
def unique(series):
    new_series = []
    for i in series:
        if i not in new_series:
            new_series.append(i)
    return new_series
series = ([2,8,4,5,5,6,6,6,2,1])
print(unique(series))
I then tried rewriting it as a generator function by eliminating the lines that create a blank list and that append to that list, and then using "yield" instead of "return"; but I’m not getting it to work:
def unique(series):
    for i in series:
        if i not in new_series:
            yield new_series
I don't know if I'm leaving something out or putting too much in. Thank you for any assistance.

Well, to put it simply, you need something to "remember" the values you have already found. In your first function you used the new list itself, but in the second one new_series is never defined, so it fails with a NameError as soon as the generator runs. You can use a set() for this purpose.
def unique(series):
    seen = set()
    for i in series:
        if i not in seen:
            seen.add(i)
            yield i
Also, yield should produce a single value at a time (here i), not the entire new list.
To print out the elements, you'll have to iterate on the generator. Simply doing print(unique([1, 2, 3])) will print the resulting generator object.
>>> print(unique([1, 1, 2, 3]))
<generator object unique at 0x1023bda98>
>>> print(*unique([1, 1, 2, 3]))
1 2 3
>>> for x in unique([1, 1, 2, 3]):
...     print(x)
...
1
2
3
Note: * in the second example is the iterable unpack operator.
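Applied to the series from the question, the set-based generator gives exactly the one-at-a-time behaviour asked for. A minimal sketch (it repeats the `unique` definition so it runs on its own); keep in mind that the `seen` set means the items must be hashable, so a list of lists would need a different membership check:

```python
def unique(series):
    """Yield each item of series the first time it appears."""
    seen = set()  # items already yielded; items must be hashable
    for i in series:
        if i not in seen:
            seen.add(i)
            yield i

series = [2, 8, 4, 5, 5, 6, 6, 6, 2, 1]
i = unique(series)  # nothing is computed until next() is called
print(next(i))  # 2
print(next(i))  # 8
print(next(i))  # 4
print(list(i))  # the remaining items: [5, 6, 1]
```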

Try this:
def unique(series):
    new_se = []
    for i in series:
        if i not in new_se:
            new_se.append(i)
    new_se = list(dict.fromkeys(new_se))  # this will remove duplicates
    return new_se
series = [2,8,4,5,5,6,6,6,2,1]
print(unique(series))
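The `dict.fromkeys` line is the interesting part: because dictionaries keep insertion order (guaranteed from Python 3.7), it removes duplicates while preserving order on its own, so the loop before it isn't needed. A sketch assuming Python 3.7+ (`unique_list` is just an illustrative name); unlike the generator in the previous answer, it builds the whole result at once:

```python
def unique_list(series):
    # dict keys are unique and keep first-insertion order (Python 3.7+)
    return list(dict.fromkeys(series))

print(unique_list([2, 8, 4, 5, 5, 6, 6, 6, 2, 1]))  # [2, 8, 4, 5, 6, 1]
```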

Related

Confused about python, lists/generator objects [duplicate]

This question already has an answer here:
Why a generator object is obtained instead of a list
(1 answer)
Closed 3 years ago.
Not sure if this has been asked before but I couldn't find a proper, clear explanation.
I have a question about Python syntax.
While practicing some Python, I intuitively assumed this would print all the elements of the list list1.
But it doesn't seem to do so. Why is that?
I could obviously print it in many other ways, but I fail to understand the Python logic at play here.
list1 = [1,2,3,4]
print(list1[i] for i in range(len(list1)))
I expected the output to be '[1, 2, 3, 4]', but it instead prints a generator object.

You need to surround list1[i] for i in range(len(list1)) with [] to indicate that it's a list. Although list1 is a list, you are using a generator expression to print it, which produces a generator object (a lazy iterable that yields values one at a time, not a list). Unless you convert the generator to a list, print won't show a list. (The square-bracket form is called a list comprehension.)
Even if you did this, it would still print [1, 2, 3, 4] rather than 1 2 3 4. You would need [print(list1[i], end=" ") for i in range(len(list1))] for that to work. There are far better ways of doing this: see donkopotamus's answer.

The expression (list1[i] for i in range(len(list1))) defines a generator object, so that is what gets printed.
If you wish to print a list, then make it a list comprehension rather than a generator, and print that:
print( [list1[i] for i in range(len(list1))] )
Alternatively, you could force evaluation of the generator into a tuple (or list or set) by passing the generator to the appropriate type, e.g.:
print(tuple(list1[i] for i in range(len(list1))))
To get the specific output you intended (space separated), 1 2 3 4, you could use str.join. Since join only accepts strings, convert each element with str first:
>>> list1 = [1, 2, 3, 4]
>>> print(" ".join(str(list1[i]) for i in range(len(list1))))
1 2 3 4
or unpack the list into print (this will not work in python 2, as in python 2 print is not a function)
>>> print(*(list1[i] for i in range(len(list1))))
1 2 3 4

(list1[i] for i in range(len(list1)))
is indeed a generator object, equivalent to simply
(x for x in list1)
You're passing that generator to print as a single argument, so print simply prints it: it does not extract the elements from it.
Alternatively, you can unpack it as you pass it to print:
print(*(list1[i] for i in range(len(list1))))
This will pass each element of the generated sequence to print as a separate argument, so they should each get printed.
If you simply meant to print your list, any of the following would have worked:
print(list1)
print([list1[i] for i in range(len(list1))])
print([x for x in list1])
The use of square brackets makes a list comprehension rather than a generator expression.

Python has list comprehensions and generator expressions. They are awesome tools; you can find more info by googling. Here is a link.
Basically, you can build a list on the fly in Python.
list1 = [1,2,3,4]
squared_list = [i*i for i in list1]
would return a list with all the items squared. However, if we say
squared_gen_list = (i*i for i in list1)
this returns what is known as a generator object. That is what is happening in your case, as you can see from the syntax, so you are just printing the generator object itself. Hope that clears up the confusion.
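One practical difference is worth seeing: a list can be iterated as often as you like, while a generator is consumed as you iterate over it. A small sketch using the names from this answer:

```python
list1 = [1, 2, 3, 4]
squared_list = [i * i for i in list1]      # built immediately
squared_gen_list = (i * i for i in list1)  # values produced on demand

print(squared_list)            # [1, 4, 9, 16]
print(squared_gen_list)        # <generator object <genexpr> at 0x...>
print(list(squared_gen_list))  # [1, 4, 9, 16]  (this consumes the generator)
print(list(squared_gen_list))  # []  (already exhausted)
```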

Python - list comprehension , 2D list

I'm trying to figure out how to delete duplicates from a 2D list. Let's say for example:
x= [[1,2], [3,2]]
I want the result:
[1, 2, 3]
in this order.
I don't understand why my code doesn't do that:
def removeDuplicates(listNumbers):
    finalList=[]
    finalList=[number for numbers in listNumbers for number in numbers if number not in finalList]
    return finalList
Written as nested for loops, I'd expect it to be the same:
def removeDuplicates(listNumbers):
    finalList=[]
    for numbers in listNumbers:
        for number in numbers:
            if number not in finalList:
                finalList.append(number)
    return finalList
The "problem" is that this second version works perfectly. Also, order is important. Thanks

finalList is always an empty list inside your list comprehension, even though you think it is being appended to while the comprehension runs. That is the difference from your second code (the double for loop), which really does append as it goes.
What I would do instead, is use set:
>>> set(i for sub_l in x for i in sub_l)
{1, 2, 3}
EDIT:
Otherwise, if order matters, staying closer to your attempt:
>>> final_list = []
>>> x_flat = [i for sub_l in x for i in sub_l]
>>> list(filter(lambda x: final_list.append(x) if x not in final_list else None, x_flat))
[] # useless list, thrown away, and it consumes memory
>>> final_list
[1, 2, 3]
Or
>>> list(map(lambda x: final_list.append(x) if x not in final_list else None, x_flat))
[None, None, None, None] # useless list, thrown away, and it consumes memory
>>> final_list
[1, 2, 3]
EDIT2:
As timgeb mentioned, map and filter build lists that end up useless and are thrown away; worse, they consume memory. So I would go with the nested for loop, as in your last code example, but if you want to keep the list comprehension for flattening, then:
>>> x_flat = [i for sub_l in x for i in sub_l]
>>> final_list = []
>>> for number in x_flat:
...     if number not in final_list:
...         final_list.append(number)
...

The expression on the right-hand side is evaluated first, before the result of the list comprehension is assigned to finalList.
In your second approach, on the other hand, you write to the list on every iteration. That's the difference.
This may be related to why the documentation warns about unexpected behaviour when you modify the iterable you are looping over inside a for loop.
You could use the built-in set() to remove duplicates (you would have to flatten your list first).

You declare finalList as the empty list first, so
if number not in finalList
will be True every time, because nothing is added to finalList while the comprehension runs.
The right-hand side of your comprehension is evaluated before the assignment takes place.
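You can see the effect by running the comprehension version from the question on the example input: the `not in finalList` test always checks the empty list, so the duplicate 2 is kept. A minimal reproduction:

```python
def removeDuplicates(listNumbers):
    finalList = []
    # finalList is still [] for the whole comprehension; the new list is
    # only bound to the name after the comprehension has finished
    finalList = [number for numbers in listNumbers
                 for number in numbers if number not in finalList]
    return finalList

print(removeDuplicates([[1, 2], [3, 2]]))  # [1, 2, 3, 2]
```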
Iterate over the iterator that chain.from_iterable gives you and remove duplicates in the usual way:
>>> from itertools import chain
>>> x=[[1,2],[3,2]]
>>>
>>> seen = set()
>>> result = []
>>> for item in chain.from_iterable(x):
...     if item not in seen:
...         result.append(item)
...         seen.add(item)
...
>>> result
[1, 2, 3]
Further reading: How do you remove duplicates from a list in Python whilst preserving order?
edit:
You don't need the import to flatten the list, you could just use the generator
(item for sublist in x for item in sublist)
instead of chain.from_iterable(x).

There is no way in Python to refer to the list that a comprehension is currently building. In fact, if you remove the line finalList=[], which does nothing, you would get an error.
You can do it in two steps:
finalList = [number for numbers in listNumbers for number in numbers]
finalList = list(set(finalList))
or if you want a one-liner:
finalList = list(set(number for numbers in listNumbers for number in numbers))
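Note that going through set does not guarantee the original order, which the question says matters. If you want a one-liner that keeps first-seen order, combining `dict.fromkeys` (insertion-ordered since Python 3.7) with `itertools.chain.from_iterable` is one option; a sketch under that version assumption:

```python
from itertools import chain

listNumbers = [[1, 2], [3, 2]]
# flatten lazily, then keep only the first occurrence of each number, in order
finalList = list(dict.fromkeys(chain.from_iterable(listNumbers)))
print(finalList)  # [1, 2, 3]
```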

Python: Get first element of list potentially containing sublists

I'm looking for the first element of a Python list potentially containing either numbers (integer or float), or many levels of nested sublists containing the same. In these examples, let's suppose I am always looking for the number '1'. If the list contains no sublists, we have:
>>> foo = [1,2,3]
>>> foo[0]
1
If the list contains one sublist, and I know this information, I can again obtain 1 with
>>> foo = [[1,2],[3,4]]
>>> foo[0][0]
1
Similarly if the first element of my list is a list containing a list:
>>> foo = [[[1,2],[3,4]],[[5,6],[7,8]]]
>>> foo[0][0][0]
1
Is there a general way to get the first integer or float in foo, without resorting to calling a function recursively until drilling down to a value of foo[0] that is no longer a list?

There shouldn't be any need for recursion. Assuming that you are always working with lists and ints, this should work perfectly well for you.
foo = [[[1,2],[3,4]],[[5,6],[7,8]]]
result = None
while True:
    try:
        result = foo[0]
    except TypeError:
        break
    foo = result  # descend one level and try again
Unlike the other answers, this asks for forgiveness rather than for permission, which is a bit more Pythonic.
If you really want to be Pythonic, you could define a function as follows. However, this would admittedly be overkill given your specification.
def first_scalar(foo):
    result = None
    while True:
        try:
            result = next(iter(foo))
        except TypeError:
            return result
        foo = result  # descend one level and try again
Note that it returns None if the argument is not an iterable. The same applies to the first segment of code.
Note that this doesn't work if the deepest "left-most" child list is empty. To account for this, you'll need to fully flatten the list.
def _flatten(foo):
    try:
        for item in foo:
            yield from _flatten(item)
    except TypeError:
        # foo is not iterable, so it is a scalar
        yield foo
def flatten(foo):
    for item in foo:
        yield from _flatten(item)
def first_scalar(foo):
    return next(flatten(foo))
Note that the above requires Python 3.3 or later (for yield from).
The following code is for earlier versions of Python.
def _flatten(foo):
    try:
        for item in foo:
            for subitem in _flatten(item):
                yield subitem
    except TypeError:
        yield foo
def flatten(foo):
    for item in foo:
        for subitem in _flatten(item):
            yield subitem

The general-case answer for this is "Fix your data structure." Lists are supposed to be homogeneous, i.e. every element of the list should have the same type (be that int or list of ints or list of lists of ints, etc.).
The special case here would be to recurse until you find a number and return it.
def foo(lst):
    first_el = lst[0]
    if isinstance(first_el, (float, int)):
        return first_el
    else:
        return foo(first_el)

Create a simple, recursive function:
>>> def getFirst(l):
...     return l[0] if not isinstance(l[0],list) else getFirst(l[0])
...
>>> getFirst([1,2,3,4])
1
>>> getFirst([[1,2,3],[4,5]])
1
>>> getFirst([[[4,2],12,[1,3]],1])
4
This will return l[0] if l[0] is anything but a list. Otherwise, it will return the first item of l[0], recursively.

You can just "dive in", without any recursion:
lst = [[1, 2], [3, 4]]
first = lst
while isinstance(first, list):
    first = first[0]
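The loop above raises IndexError when it reaches an empty leading sublist (`[][0]`). If you still want to avoid recursion but skip empty sublists, an explicit stack works. A sketch assuming the data contains only lists and non-list values such as numbers (`first_number` is a hypothetical name, not from any of the answers):

```python
def first_number(lst):
    """Return the first non-list element in depth-first order, or None."""
    # items still to visit; reversed so the left-most item is on top
    stack = list(reversed(lst))
    while stack:
        item = stack.pop()
        if isinstance(item, list):
            # descend: push the sublist's items, left-most on top
            stack.extend(reversed(item))
        else:
            return item
    return None  # only (possibly empty) lists all the way down

print(first_number([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]))  # 1
print(first_number([[], [[], 5], 6]))                      # 5
```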

If you really want to avoid any loops or recursion, there is an ugly workaround. Transform the list to a string and then remove the list-specific chars:
','.join(map(str,foo)).replace('[','').replace(']','').replace(' ','').split(',')
Of course, it only works if the list is composed of strings or integers. If the objects in the list are custom, you would have to convert them to strings. But since there is an unknown number of sublists, you would have to use recursion to do that, so using this workaround wouldn't make sense.
Another problem: the elements of the list and sublists may contain the same characters as the list syntax, such as '[' or ',', which would break the split.
In short, this is a bad workaround that only reliably works if the list and sublists are composed of numbers. Otherwise, some kind of recursion is most probably necessary.

what is a quick way to delete all elements from a list that do not satisfy a constraint?

I have a list of strings. I have a function that given a string returns 0 or 1. How can I delete all strings in the list for which the function returns 0?

[x for x in lst if fn(x) != 0]
This is a "list comprehension", one of Python's nicest pieces of syntactic sugar; it often replaces several lines of code and extra variable declarations that other languages would need.
See:
http://docs.python.org/tutorial/datastructures.html#list-comprehensions

I would use a generator expression over a list comprehension to avoid a potentially large, intermediate list.
result = (x for x in l if f(x))
# print it, or something
print list(result)
Like a list comprehension, this will not modify your original list in place.

edit: see the bottom for the best answer.
If you need to mutate an existing list, for example because you have another reference to it somewhere else, you'll need to actually remove the values from the list.
I'm not aware of any such function in Python, but something like this would work (untested code):
def cull_list(lst, pred):
    """Removes all values from ``lst`` for which ``pred(v)`` is false."""
    def remove_all(v):
        """Remove all instances of ``v`` from ``lst``"""
        try:
            while True:
                lst.remove(v)
        except ValueError:
            pass
    values = set(lst)
    for v in values:
        if not pred(v):
            remove_all(v)
A probably more-efficient alternative that may look a bit too much like C code for some people's taste:
def efficient_cull_list(lst, pred):
    end = len(lst)
    i = 0
    while i < end:
        if not pred(lst[i]):
            del lst[i]
            end -= 1
        else:
            i += 1
edit...: as Aaron pointed out in the comments, this can be done much more cleanly with something like
def reversed_cull_list(lst, pred):
    for i in range(len(lst) - 1, -1, -1):
        if not pred(lst[i]):
            del lst[i]
...edit
The trick with these routines is that using a function like enumerate, as suggested by (an) other responder(s), will not take into account the fact that elements of the list have been removed. The only way (that I know of) to do that is to just track the index manually instead of allowing python to do the iteration. There's bound to be a speed compromise there, so it may end up being better just to do something like
lst[:] = (v for v in lst if pred(v))
Actually, now that I think of it, this is by far the most sensible way to do an 'in-place' filter on a list. The generator's values are iterated before filling lst's elements with them, so there are no index conflict issues. If you want to make this more explicit just do
lst[:] = [v for v in lst if pred(v)]
I don't think it will make much difference in this case, in terms of efficiency.
Either of these last two approaches will, if I understand correctly how they actually work, make an extra copy of the list, so one of the bona fide in-place solutions mentioned above would be better if you're dealing with some "huge tracts of land."
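Slice assignment counts as "in place" because it changes the existing list object, so every other reference to that list sees the result, whereas `lst = [...]` only rebinds one name. A small sketch with a hypothetical predicate standing in for the question's 0/1 function:

```python
def pred(s):
    # hypothetical stand-in for the question's function: 1 keeps, 0 drops
    return 1 if s.startswith("a") else 0

lst = ["apple", "banana", "avocado", "cherry"]
alias = lst                             # a second reference to the same list

lst[:] = [v for v in lst if pred(v)]    # modifies the list object itself
print(alias)                            # ['apple', 'avocado']

lst = [v for v in lst if v != "apple"]  # rebinds lst to a brand-new list
print(alias)                            # still ['apple', 'avocado']
```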

>>> s = [1, 2, 3, 4, 5, 6]
>>> def f(x):
... if x<=2: return 0
... else: return 1
>>> for n,x in enumerate(s):
... if f(x) == 0: s[n]=None
>>> s = list(filter(None, s))
>>> s
[3, 4, 5, 6]

With a generator expression:
alist[:] = (item for item in alist if afunction(item))
Functional:
alist[:] = filter(afunction, alist)
or:
import itertools  # Python 2 only: ifilter was removed in Python 3, where filter is already lazy
alist[:] = itertools.ifilter(afunction, alist)
All equivalent.
You can also use a list comprehension:
alist = [item for item in alist if afunction(item)]
An in-place modification:
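import collections
indexes_to_delete = collections.deque(
    idx
    for idx, item in enumerate(alist)
    if not afunction(item))  # indexes of items that fail the test
while indexes_to_delete:
    # pop() takes from the right, so the highest index is deleted first
    # and the remaining indexes stay valid
    del alist[indexes_to_delete.pop()]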

How do you create a list like PHP's in Python?

This is an incredibly simple question (I'm new to Python).
I basically want a data structure like a PHP array -- i.e., I want to initialise it and then just add values into it.
As far as I can tell, this is not possible with Python, so I've got the maximum value I might want to use as an index, but I can't figure out how to create an empty list of a specified length.
Also, is a list the right data structure to use to model what feels like it should just be an array? I tried to use an array, but it seemed unhappy with storing strings.
Edit: Sorry, I didn't explain very clearly what I was looking for. When I add items into the list, I do not want to put them in in sequence, but rather I want to insert them into specified slots in the list.
I.e., I want to be able to do this:
list = []
for row in rows:
    c = list_of_categories.index(row["id"])
    print c
    list[c] = row["name"]

Depending on how you are going to use the list, it may be that you actually want a dictionary. This will work:
d = {}
for row in rows:
    c = list_of_categories.index(row["id"])
    print c
    d[c] = row["name"]
... or more compactly:
d = dict((list_of_categories.index(row['id']), row['name']) for row in rows)
print d
PHP arrays are much more like Python dicts than they are like Python lists. For example, they can have strings for keys.
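A short sketch of the dictionary approach in Python 3, with made-up `rows` and `list_of_categories` standing in for the question's data (both hypothetical). Slots can be filled in any order without pre-sizing anything, and if you later need a plain list you can sort by key:

```python
list_of_categories = [10, 20, 30]     # hypothetical category ids
rows = [{"id": 30, "name": "gamma"},  # hypothetical rows, out of order
        {"id": 10, "name": "alpha"}]

d = {}
for row in rows:
    c = list_of_categories.index(row["id"])  # position of this id
    d[c] = row["name"]                       # any slot, no pre-sizing needed

print(d)  # {2: 'gamma', 0: 'alpha'}
# a list ordered by slot, skipping slots that were never filled
print([d[k] for k in sorted(d)])  # ['alpha', 'gamma']
```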
And confusingly, Python has an array module, which is described as "efficient arrays of numeric values", which is definitely not what you want.

If the number of items you want is known in advance, and you want to access them using integer, 0-based, consecutive indices, you might try this:
n = 3
array = n * [None]
print array
array[2] = 11
array[1] = 47
array[0] = 42
print array
This prints:
[None, None, None]
[42, 47, 11]

Use the list constructor, and append your items, like this:
l = list ()
l.append ("foo")
l.append (3)
print (l)
gives me ['foo', 3], which should be what you want. See the documentation on list and the sequence type documentation.
EDIT Updated
For inserting, use insert, like this:
l = list ()
l.append ("foo")
l.append (3)
l.insert (1, "new")
print (l)
which prints ['foo', 'new', 3]

http://diveintopython3.ep.io/native-datatypes.html#lists
You don't need to create empty lists with a specified length. You just add to them and query their current length if needed.
What you can't do, without being prepared to catch an exception, is use a nonexistent index, which is probably what you are used to doing in PHP.
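For example, assigning to an index that a list doesn't have yet raises IndexError instead of growing the list the way a PHP array would. A minimal Python 3 sketch:

```python
lst = []
try:
    lst[3] = "x"  # slot 3 doesn't exist in an empty list
except IndexError as e:
    print("IndexError:", e)  # IndexError: list assignment index out of range

lst.append("x")       # appending is how a list grows
print(lst, len(lst))  # ['x'] 1
```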

You can use this syntax to create a list with n elements:
lst = [0] * n
But be careful! The list will contain n references to the same object. If that object is mutable and you change it through one element, the change shows up in every element! In this case you should use:
lst = [some_object() for i in xrange(n)]
Then you can access these elements:
for i in xrange(n):
    lst[i] += 1
A Python list is comparable to a vector in other languages. It is a resizable array, not a linked list.
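The pitfall in action, using an empty list as the mutable object (Python 3, so `range` instead of `xrange`):

```python
n = 3
shared = [[]] * n                  # n references to ONE list
shared[0].append(1)
print(shared)                      # [[1], [1], [1]]

separate = [[] for _ in range(n)]  # n distinct lists
separate[0].append(1)
print(separate)                    # [[1], [], []]
```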

Sounds like what you need might be a dictionary rather than an array if you want to insert into specified indices.
dict = {'a': 1, 'b': 2, 'c': 3}
dict['a']
1

I agree with ned that you probably need a dictionary for what you're trying to do. But if you want a list of those category indexes, you can do this:
lst = [list_of_categories.index(row["id"]) for row in rows]

Use a dictionary, because what you're really asking for is a structure you can access by arbitrary keys:
list = {}
for row in rows:
    c = list_of_categories.index(row["id"])
    print c
    list[c] = row["name"]
Then you can iterate through the known contents with:
for x in list.values():
    print x
Or check if something exists in the "list":
if 3 in list:
    print "it's there"

I'm not sure I understood what you want to do, but it seems that you want a list that is dictionary-like, where the index is the key. Even though I think a dictionary would be the better choice, here's my answer. Got a problem? Make an object:
from collections import UserList  # Python 3; in Python 2 this was the UserList module
class MyList(UserList):
    NO_ITEM = 'noitem'
    def insertAt(self, item, index):
        length = len(self)
        if index < length:
            self[index] = item
        elif index == length:
            self.append(item)
        else:
            # pad the gap with placeholder values, then append the item
            for i in range(0, index-length):
                self.append(self.NO_ITEM)
            self.append(item)
There may be some errors in the Python syntax (I didn't check), but in principle it should work.
Of course, the else branch also covers the elif case, but I thought it would be a little harder to read that way.
