I read the documentation on next() and I understand it abstractly. From what I understand, next() is used as a reference to an iterable object and makes python cycle to the next iterable object sequentially. Makes sense! My question is, how is this useful outside the context of the builtin for loop? When would someone ever need to use next() directly? Can someone provide a simplistic example? Thanks mates!
As luck would have it, I wrote one yesterday:
def skip_letters(f, skip=" "):
    """Wrapper function to skip specified characters when encrypting."""
    def func(plain, *args, **kwargs):
        # The generator expression must be parenthesized when other arguments follow it.
        gen = f((p for p in plain if p not in skip), *args, **kwargs)
        for p in plain:
            if p in skip:
                yield p
            else:
                yield next(gen)
    return func
This uses next to get the return values from the generator function f, but interspersed with other values. This allows some values to be passed through the generator, but others to be yielded straight out.
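To make the pass-through behaviour concrete, here is a minimal sketch of the wrapper in use. The shift_cipher generator is a hypothetical stand-in for whatever encryption function f would be in practice; it is not part of the original answer.

```python
def skip_letters(f, skip=" "):
    """Wrapper that passes skipped characters through untouched."""
    def func(plain, *args, **kwargs):
        gen = f((p for p in plain if p not in skip), *args, **kwargs)
        for p in plain:
            if p in skip:
                yield p          # skipped characters bypass the cipher
            else:
                yield next(gen)  # pull one encrypted character
    return func


def shift_cipher(chars, shift=1):
    """Hypothetical cipher: shift each lowercase letter by `shift`."""
    for c in chars:
        yield chr((ord(c) - ord("a") + shift) % 26 + ord("a"))


encrypt = skip_letters(shift_cipher)
result = "".join(encrypt("hello world", shift=1))
print(result)  # ifmmp xpsme
```

The space is yielded straight out, while every other character comes from next(gen), so the two streams stay aligned.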
There are many places where we can use next, for example:
Drop the header while reading a file.
with open(filename) as f:
    next(f) # drop the first line
    # now do something with the rest of the lines
Iterator-based implementation of zip(seq, seq[1:]) (from the pairwise recipe in itertools; izip is Python 2, in Python 3 use the built-in zip):
from itertools import tee, izip
it1, it2 = tee(seq)
next(it2)
izip(it1, it2)
Get the first item that satisfies a condition:
next(x for x in seq if x % 100)
Creating a dictionary using adjacent items as key-value:
>>> it = iter(['a', 1, 'b', 2, 'c', '3'])
>>> {k: next(it) for k in it}
{'a': 1, 'c': '3', 'b': 2}
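For readers on Python 3, the pairwise and key-value examples above might look like the sketch below. It assumes the built-in zip replaces izip, and it passes a default to next() so that an empty input does not raise StopIteration.

```python
from itertools import tee

def pairwise(iterable):
    """Yield overlapping pairs: s -> (s0, s1), (s1, s2), ..."""
    it1, it2 = tee(iterable)
    next(it2, None)  # advance the second copy by one; None is returned if it is empty
    return zip(it1, it2)

pairs = list(pairwise([1, 2, 3, 4]))
print(pairs)  # [(1, 2), (2, 3), (3, 4)]

# Adjacent items as key-value pairs: the loop takes a key, next() takes its value.
it = iter(['a', 1, 'b', 2, 'c', '3'])
mapping = {k: next(it) for k in it}
print(mapping)  # {'a': 1, 'b': 2, 'c': '3'}
```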
next is useful in many different ways, even outside of a for-loop. For example, if you have an iterable of objects and you want the first that meets a condition, you can give it a generator expression like so:
>>> lst = [1, 2, 'a', 'b']
>>> # Get the first item in lst that is a string
>>> next(x for x in lst if isinstance(x, str))
'a'
>>> # Get the first item in lst that != 1
>>> lst = [1, 1, 1, 2, 1, 1, 3]
>>> next(x for x in lst if x != 1)
2
>>>
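One caveat with this pattern: if nothing matches, next() raises StopIteration. A small sketch of the two-argument form, which returns a default instead:

```python
lst = [1, 1, 1]

# Without a default, an exhausted generator raises StopIteration.
try:
    next(x for x in lst if x != 1)
    outcome = "match"
except StopIteration:
    outcome = "no match"

# With a default, next() returns it instead of raising.
first = next((x for x in lst if x != 1), None)
print(outcome, first)  # no match None
```

Note that the generator expression needs its own parentheses once a second argument is passed.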
I want to create a function that takes a list as an argument, for example:
list = ['a','b','a','d','e','f','a','b','g','b']
and returns a specific number of list elements (I choose the number) such that no element occurs twice. For example, if I choose 3:
new_list = ['a','b','d']
I tried the following:
def func(j, list):
    new_list=[]
    for i in list:
        while(len(new_list)<j):
            for k in new_list:
                if i != k:
                    new_list.append(i)
    return new_list
But the function went through infinite loop.
def func(j, mylist):
    # dedup, preserving order (dict is insertion-ordered as a language guarantee as of 3.7):
    deduped = list(dict.fromkeys(mylist))
    # Slice off all but the part you care about:
    return deduped[:j]
If performance for large inputs is a concern, that's suboptimal (it processes the whole input even if j unique elements are found in first j indices out of an input where j is much smaller than the input), so the more complicated solution can be used for maximum efficiency. First, copy the itertools unique_everseen recipe:
from itertools import filterfalse, islice # At top of file, filterfalse for recipe, islice for your function
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
Now wrap it with islice to pull only as many elements as required, exiting as soon as you have them (without processing the rest of the input at all):
def func(j, mylist): # Note: Renamed list argument to mylist to avoid shadowing built-in
    return list(islice(unique_everseen(mylist), j))
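To see that the islice version really stops early, here is a small sketch that feeds it an endless input. The unique generator is a simplified stand-in for unique_everseen (no key support), and first_unique is a hypothetical name used only for this illustration.

```python
from itertools import count, islice

def unique(iterable):
    """Simplified stand-in for unique_everseen (no key support)."""
    seen = set()
    for element in iterable:
        if element not in seen:
            seen.add(element)
            yield element

def first_unique(j, iterable):
    # islice stops pulling from the generator after j items.
    return list(islice(unique(iterable), j))

# count() never ends, so this only terminates because of the early exit.
endless = (n % 5 for n in count())
result = first_unique(3, endless)
print(result)  # [0, 1, 2]
print(first_unique(3, ['a', 'b', 'a', 'd', 'e', 'f', 'a', 'b', 'g', 'b']))  # ['a', 'b', 'd']
```

The dict.fromkeys version would never return on the endless input, because it has to consume everything before slicing.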
Try this.
lst = ['a','b','a','d','e','f','a','b','g','b']
j = 3
def func(j,list_):
    new_lst = []
    for a in list_:
        if a not in new_lst:
            new_lst.append(a)
    return new_lst[:j]
print(func(j,lst)) # ['a', 'b', 'd']
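I don't know why no one has posted a numpy.unique solution.
Here is a memory-efficient way (I think). Note that np.unique returns the unique values sorted, so the original order survives only when the first occurrences are already in sorted order, as they are in this example.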
import numpy as np
lst = ['a','b','a','d','e','f','a','b','g','b']
def func(j,list_):
    return np.unique(list_).tolist()[:j]
print(func(3,lst)) # ['a', 'b', 'd']
list is the name of a built-in type in Python (not a reserved word), so using it as a variable name shadows the built-in.
If order of the elements is not a concern then
def func(j, user_list):
    return list(set(user_list))[:j]
It's bad practice to use "list" as a variable name.
You can solve the problem with the Counter class from the collections module:
from collections import Counter
a=['a','b','a','d','e','f','a','b','g','b']
b = list(Counter(a))
print(b[:3])
So your function would look something like this:
def unique_slice(list_in, elements):
    new_list = list(Counter(list_in))
    print("New list: {}".format(new_list))
    if int(elements) <= len(new_list):
        return new_list[:elements]
    return new_list
Hope it solves your problem.
As others have said, you should not shadow the built-in name list, because that can lead to many issues. This is a simple problem: add each element to a new list after checking that it has not already been added.
Slice notation ([:]) lets you split a list at an index:
>>>l = [1, 2, 3, 4]
>>>l[:1]
[1]
>>>l[1:]
[2, 3, 4]
lst = ['a', 'b', 'a', 'd', 'e', 'f', 'a', 'b', 'g', 'b']
def func(number, _list):
    out = []
    for a in _list:
        if a not in out:
            out.append(a)
    return out[:number]
print(func(4, lst)) # ['a', 'b', 'd', 'e']
I am trying to create a generator function that loops over an iterable sequence while eliminating duplicates and then returns each result in order one at a time (not as a set or list), but I am having difficulty getting it to work. I have found similar questions here, but the responses pretty uniformly result in a list being produced.
I would like the output to be something like:
>>> next(i)
2
>>> next(i)
8
>>> next(i)
4....
I was able to write it as a regular function that produces a list:
def unique(series):
    new_series = []
    for i in series:
        if i not in new_series:
            new_series.append(i)
    return new_series
series = ([2,8,4,5,5,6,6,6,2,1])
print(unique(series))
I then tried rewriting it as a generator function by eliminating the lines that create a blank list and that append to that list, and then using "yield" instead of "return"; but I'm not getting it to work:
def unique(series):
    for i in series:
        if i not in new_series:
            yield new_series
I don't know if I'm leaving something out or putting too much in. Thank you for any assistance.
Well, to put it simply, you need something to "remember" the values you find. In your first function you were using the new list itself, but in the second one you don't have it, so it fails. You can use a set() for this purpose.
def unique(series):
    seen = set()
    for i in series:
        if i not in seen:
            seen.add(i)
            yield i
Also, yield should "yield" a single value at once, not the entire new list.
To print out the elements, you'll have to iterate on the generator. Simply doing print(unique([1, 2, 3])) will print the resulting generator object.
>>> print(unique([1, 1, 2, 3]))
<generator object unique at 0x1023bda98>
>>> print(*unique([1, 1, 2, 3]))
1 2 3
>>> for x in unique([1, 1, 2, 3]):
...     print(x)
...
1
2
3
Note: * in the second example is the iterable unpack operator.
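Since the question asked for results one at a time via next(), here is a short sketch of the same generator driven that way, using the series from the question:

```python
def unique(series):
    seen = set()  # remembers every value already yielded
    for i in series:
        if i not in seen:
            seen.add(i)
            yield i

i = unique([2, 8, 4, 5, 5, 6, 6, 6, 2, 1])
first, second, third = next(i), next(i), next(i)
print(first, second, third)  # 2 8 4

# The generator keeps its position; list() drains whatever is left.
rest = list(i)
print(rest)  # [5, 6, 1]
```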
Try this:
def unique(series):
    new_se = []
    for i in series:
        if i not in new_se:
            new_se.append(i)
    new_se = list(dict.fromkeys(new_se)) # this will remove duplicates
    return new_se
series = [2,8,4,5,5,6,6,6,2,1]
print(unique(series))
Can someone please explain the groupby operation and the lambda function being used on this SO post?
key=lambda k, line=count(): next(line) // chunk
import os
import tempfile
from itertools import groupby, count
temp_dir = tempfile.mkdtemp()
def tempfile_split(filename, temp_dir, chunk=4000000):
    with open(filename, 'r') as datafile:
        # The itertools.groupby() function takes a sequence and a key function,
        # and returns an iterator that generates pairs.
        # Each pair contains the result of key_function(each item) and
        # another iterator containing all the items that shared that key result.
        groups = groupby(datafile, key=lambda k, line=count(): next(line) // chunk)
        for k, group in groups:
            # Debugging with print(k, list(group)) would exhaust `group`
            # before the write loop below, so it is left out here.
            output_name = os.path.normpath(os.path.join(temp_dir + os.sep, "tempfile_%s.tmp" % k))
            for line in group:
                with open(output_name, 'a') as outfile:
                    outfile.write(line)
Edit: It took me a while to wrap my head around the lambda function used with groupby. I don't think I understood either of them very well.
Martijn explained it really well, however I have a follow up question. Why is line=count() passed as an argument to the lambda function every time? I tried assigning the variable line to count() just once, outside the function.
line = count()
groups = groupby(datafile, key=lambda k, line: next(line) // chunk)
and it resulted in TypeError: <lambda>() missing 1 required positional argument: 'line'
Also, calling next on count() directly within the lambda expression, resulted in all the lines in the input file getting bunched together i.e a single key was generated by the groupby function.
groups = groupby(datafile, key=lambda k: next(count()) // chunk)
I'm learning Python on my own, so any help or pointers to reference materials /PyCon talks are much appreciated. Anything really!
itertools.count() is an infinite iterator of increasing integer numbers.
The lambda stores a count() instance as the default value of its line parameter. Default values are evaluated once, when the lambda is created, so every time the lambda is called the local variable line references that same object. next() advances an iterator, retrieving the next value:
>>> from itertools import count
>>> line = count()
>>> next(line)
0
>>> next(line)
1
>>> next(line)
2
>>> next(line)
3
So next(line) retrieves the next count in the sequence, and divides that value by chunk (taking only the integer portion of the division). The k argument is ignored.
Because integer division is used, the result of the lambda is going to be chunk repeats of an increasing integer; if chunk is 3, then you get 0 three times, then 1 three times, then 2 three times, etc:
>>> chunk = 3
>>> l = lambda k, line=count(): next(line) // chunk
>>> [l('ignored') for _ in range(10)]
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
>>> chunk = 4
>>> l = lambda k, line=count(): next(line) // chunk
>>> [l('ignored') for _ in range(10)]
[0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
It is this resulting value that groupby() groups the datafile iterable by, producing groups of chunk lines.
When looping over the groupby() results with for k, group in groups:, k is the number that the lambda produced and that the results are grouped by; the code above uses it only to name the output file. group is an iterable of lines from datafile and contains chunk lines, except possibly the last group, which may be shorter.
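The grouping is easier to see on a small in-memory list than on a file. In this sketch the lines list is made up for illustration and stands in for datafile:

```python
from itertools import count, groupby

lines = ["l0", "l1", "l2", "l3", "l4", "l5", "l6"]  # stand-in for the file's lines
chunk = 3

# Same key function as in the question: line number // chunk.
groups = groupby(lines, key=lambda k, line=count(): next(line) // chunk)
chunks = [(k, list(group)) for k, group in groups]
print(chunks)
# [(0, ['l0', 'l1', 'l2']), (1, ['l3', 'l4', 'l5']), (2, ['l6'])]
```

The last group holds only one line because the input ran out before a full chunk was reached.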
In response to the updated OP...
The itertools.groupby iterator offers ways to group items together, giving more control when a key function is defined. See more on how itertools.groupby() works.
A lambda is a shorthand way of writing a small anonymous function. For example:
>>> keyfunc = lambda k, line=count(): next(line) // chunk
is equivalent to this regular function:
>>> def keyfunc(k, line=count()):
...     return next(line) // chunk
Keywords: iterator, functional programming, anonymous functions
Details
Why is line=count() passed as an argument to the lambda function every time?
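The reason is the same as for normal functions. Written as a bare line, the parameter has no default, so the caller must supply it; groupby calls the key function with one argument only, hence the TypeError. Written as line=count(), the parameter gets a default value that is evaluated once, when the lambda is defined, and reused on every call. See more on positional vs. keyword arguments.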
You can still define line=count() outside the function by assigning the result to a keyword argument:
>>> chunk = 3
>>> line=count()
>>> keyfunc = lambda k, line=line: next(line) // chunk # make `line` a keyword arg
>>> [keyfunc("") for _ in range(10)]
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
>>> [keyfunc("") for _ in range(10)]
[3, 3, 4, 4, 4, 5, 5, 5, 6, 6] # note `count()` continues
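Another option, sketched here as an alternative rather than as what the original post does, is to drop the second parameter entirely and let the lambda look up line in the enclosing scope:

```python
from itertools import count

chunk = 3
line = count()

# `line` is resolved in the enclosing scope each time the lambda runs,
# so the same iterator is advanced on every call.
keyfunc = lambda k: next(line) // chunk
keys = [keyfunc("ignored") for _ in range(7)]
print(keys)  # [0, 0, 0, 1, 1, 1, 2]
```

The trade-off is late binding: if line is later rebound to a different object, the lambda will see the new one, whereas the default-argument form captures the iterator at definition time.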
... calling next on count() directly within the lambda expression, resulted in all the lines in the input file getting bunched together i.e a single key was generated by the groupby function ...
Try the following experiment with count():
>>> numbers = count()
>>> next(numbers)
0
>>> next(numbers)
1
>>> next(numbers)
2
As expected, next() yields the next item from the count() iterator. (A for loop advances an iterator in the same way, by calling next() repeatedly.) The important point is that iterators do not reset: next() simply gives the next item in line, as in the earlier example. By contrast, next(count()) creates a brand-new iterator on every call, so it always returns 0; every line then gets the key 0 // chunk == 0, and groupby puts them all in one group.
@Martijn Pieters pointed out that next(line) // chunk computes a floored integer that groupby uses as each line's key (consecutive lines with the same key are bunched together), which is also expected. See the references for more on how groupby works.
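The difference between a fresh count() per call and one shared count() can be checked directly. A minimal sketch with made-up data:

```python
from itertools import count, groupby

data = ["a", "b", "c", "d"]
chunk = 2

# A new count() on every call: next() always returns 0, so every key is 0.
fresh = [k for k, _ in groupby(data, key=lambda k: next(count()) // chunk)]

# One shared count(), created once as a default value: keys go 0, 0, 1, 1.
shared = [k for k, _ in groupby(data, key=lambda k, line=count(): next(line) // chunk)]

print(fresh)   # [0]
print(shared)  # [0, 1]
```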
References
Docs for itertools.count
Docs for itertools.groupby()
Beazley, D. and Jones, B. "7.7 Capturing Variables in Anonymous Functions," Python Cookbook, 3rd ed. O'Reilly. 2013.
I'm trying to figure out how to delete duplicates from 2D list. Let's say for example:
x= [[1,2], [3,2]]
I want the result:
[1, 2, 3]
in this order.
Actually I don't understand why my code doesn't do that :
def removeDuplicates(listNumbers):
    finalList=[]
    finalList=[number for numbers in listNumbers for number in numbers if number not in finalList]
    return finalList
If I write it in nested for-loop form, it looks the same:
def removeDuplicates(listNumbers):
    finalList=[]
    for numbers in listNumbers:
        for number in numbers:
            if number not in finalList:
                finalList.append(number)
    return finalList
"Problem" is that this code runs perfectly. Second problem is that order is important. Thanks
In your list comprehension, finalList is still the empty list while the comprehension runs, even though it looks as if items are being appended to it along the way. That is what differs from the second version (the nested for loops), which appends to finalList as it goes.
What I would do instead, is use set:
>>> set(i for sub_l in x for i in sub_l)
{1, 2, 3}
EDIT:
Otherwise, if order matters, and staying close to your attempt:
>>> final_list = []
>>> x_flat = [i for sub_l in x for i in sub_l]
>>> list(filter(lambda x: final_list.append(x) if x not in final_list else None, x_flat))
[] # useless list that is thrown away and consumes memory
>>> final_list
[1, 2, 3]
Or, starting again from an empty final_list:
>>> list(map(lambda x: final_list.append(x) if x not in final_list else None, x_flat))
[None, None, None, None] # useless list that is thrown away and consumes memory
>>> final_list
[1, 2, 3]
EDIT2:
As mentioned by timgeb, map and filter build lists that are useless in the end and, worse, consume memory. So I would go with the nested for loop as in your last code example; but if you want to keep the list comprehension for flattening, then:
>>> x_flat = [i for sub_l in x for i in sub_l]
>>> final_list = []
>>> for number in x_flat:
...     if number not in final_list:
...         final_list.append(number)
The expression on the right-hand side is evaluated first, before the result of the list comprehension is assigned to finalList.
Whereas in your second approach you write to this list all the time between the iterations. That's the difference.
That may be similar to the considerations why the manuals warn about unexpected behaviour when writing to the iterated iterable inside a for loop.
You could use the built-in set() to remove duplicates (you have to flatten your list first; there is no built-in flatten(), so use a nested comprehension or itertools.chain).
You declare finalList as the empty list first, so
if number not in finalList
will be True all the time.
The right hand side of your comprehension will be evaluated before the assignment takes place.
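A quick check of this behaviour, using the data from the question (variable names adapted to snake_case for the sketch):

```python
list_numbers = [[1, 2], [3, 2]]

final_list = []
# The membership test always sees the original empty list,
# because the name is only rebound after the comprehension finishes.
final_list = [n for numbers in list_numbers for n in numbers if n not in final_list]
print(final_list)  # [1, 2, 3, 2]
```

The duplicate 2 survives, which matches what the question observed.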
Iterate over the iterator chain.from_iterable gives you and remove duplicates in the usual way:
>>> from itertools import chain
>>> x=[[1,2],[3,2]]
>>>
>>> seen = set()
>>> result = []
>>> for item in chain.from_iterable(x):
... if item not in seen:
... result.append(item)
... seen.add(item)
...
>>> result
[1, 2, 3]
Further reading: How do you remove duplicates from a list in Python whilst preserving order?
edit:
You don't need the import to flatten the list, you could just use the generator
(item for sublist in x for item in sublist)
instead of chain.from_iterable(x).
There is no way in Python to refer to the current comprehension. In fact, if you remove the line finalList=[], which does nothing, you would get an error.
You can do it in two steps:
finalList = [number for numbers in listNumbers for number in numbers]
finalList = list(set(finalList))
or if you want a one-liner:
finalList = list(set(number for numbers in listNumbers for number in numbers))
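Because a set does not guarantee any particular order and the question says order matters, an order-preserving one-liner may be a better fit. This sketch assumes Python 3.7 or later, where dict keeps insertion order:

```python
from itertools import chain

list_numbers = [[1, 2], [3, 2]]

# dict keys are unique and keep the order in which they were first inserted.
final_list = list(dict.fromkeys(chain.from_iterable(list_numbers)))
print(final_list)  # [1, 2, 3]
```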
What is a good way to find the index of an element in a list in Python?
Note that the list may not be sorted.
Is there a way to specify what comparison operator to use?
From Dive Into Python:
>>> li
['a', 'b', 'new', 'mpilgrim', 'z', 'example', 'new', 'two', 'elements']
>>> li.index("example")
5
If you just want to find out if an element is contained in the list or not:
>>> li
['a', 'b', 'new', 'mpilgrim', 'z', 'example', 'new', 'two', 'elements']
>>> 'example' in li
True
>>> 'damn' in li
False
The best way is probably to use the list method .index.
For the objects in the list, you can do something like:
def __eq__(self, other):
    return self.Value == other.Value
with any special processing you need.
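As a sketch of how this fits together: list.index compares elements with ==, so a class that defines __eq__ controls what counts as a match. The Item class is hypothetical; only the Value attribute comes from the snippet above.

```python
class Item:
    """Hypothetical class used only for this illustration."""
    def __init__(self, value):
        self.Value = value

    def __eq__(self, other):
        if not isinstance(other, Item):
            return NotImplemented  # let Python handle comparisons with other types
        return self.Value == other.Value

items = [Item(10), Item(20), Item(30)]
position = items.index(Item(20))  # matches via __eq__, not identity
print(position)  # 1
```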
You can also use a for/in statement with enumerate(arr)
Example of finding the index of an item that has value > 100.
for index, item in enumerate(arr):
    if item > 100:
        return index, item
Here is another way using list comprehension (some people might find it debatable). It is very approachable for simple tests, e.g. comparisons on object attributes (which I need a lot):
el = [x for x in mylist if x.attr == "foo"][0]
Of course this assumes the existence (and, actually, uniqueness) of a suitable element in the list.
Assuming you want to find a value in a NumPy array,
I guess something like this might work:
numpy.where(arr == "value")[0]
There is the index method, i = array.index(value), but I don't think you can specify a custom comparison operator. It wouldn't be hard to write your own function to do so, though:
def custom_index(array, compare_function):
    for i, v in enumerate(array):
        if compare_function(v):
            return i
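A usage sketch for this function, next to an equivalent written with next() and a default. The name custom_index_next is made up for the comparison:

```python
def custom_index(array, compare_function):
    for i, v in enumerate(array):
        if compare_function(v):
            return i
    return None  # made explicit: no element matched

def custom_index_next(array, compare_function):
    # Same idea in one expression; None is the default when nothing matches.
    return next((i for i, v in enumerate(array) if compare_function(v)), None)

values = [5, 40, 150, 300]
print(custom_index(values, lambda v: v > 100))       # 2
print(custom_index_next(values, lambda v: v > 100))  # 2
print(custom_index_next(values, lambda v: v > 999))  # None
```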
I use function for returning index for the matching element (Python 2.6):
def index(l, f):
    return next((i for i in xrange(len(l)) if f(l[i])), None)
Then use it via lambda function for retrieving needed element by any required equation e.g. by using element name.
element = mylist[index(mylist, lambda item: item["name"] == "my name")]
If i need to use it in several places in my code i just define specific find function e.g. for finding element by name:
def find_name(l, name):
    return l[index(l, lambda item: item["name"] == name)]
And then it is quite easy and readable:
element = find_name(mylist,"my name")
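The index method of a list will do this for you. If you want the position in sorted order, sort the list first using sorted(); note that the index returned then refers to the sorted copy, not the original list. sorted() accepts a key parameter (and, in Python 2, cmp) to dictate how the sorting will happen: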
a = [5, 4, 3]
print sorted(a).index(5)
Or:
a = ['one', 'aardvark', 'a']
print sorted(a, key=len).index('a')
how's this one?
def global_index(lst, test):
    return ( pair[0] for pair in zip(range(len(lst)), lst) if test(pair[1]) )
Usage:
>>> global_index([1, 2, 3, 4, 5, 6], lambda x: x>3)
<generator object <genexpr> at ...>
>>> list(_)
[3, 4, 5]
I found this by adapting some tutos. Thanks to google, and to all of you ;)
def findall(L, test):
    i=0
    indices = []
    while(True):
        try:
            # next value in list passing the test
            nextvalue = filter(test, L[i:])[0]
            # add index of this value in the index list,
            # by searching the value in L[i:]
            indices.append(L.index(nextvalue, i))
            # iterate i, that is the next index from where to search
            i=indices[-1]+1
        # when there is no further "good value", filter returns [],
        # hence there is an out of range exception
        except IndexError:
            return indices
A very simple use:
a = [0,0,2,1]
ind = findall(a, lambda x: x>0)
# ind is [2, 3]
P.S. scuse my english
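The findall above relies on Python 2, where filter returns a list; in Python 3, filter returns an iterator and indexing it with [0] raises TypeError. A minimal Python 3 sketch of the same idea, using enumerate instead of repeated searching:

```python
def findall(seq, test):
    """Return the indices of every element for which test(element) is true."""
    return [i for i, value in enumerate(seq) if test(value)]

a = [0, 0, 2, 1]
ind = findall(a, lambda x: x > 0)
print(ind)  # [2, 3]
```

This makes a single pass over the list, rather than rescanning the remaining slice for every match.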