I have an iterator it that I assume is already sorted, but I would like to raise an exception if it isn't.
The data from the iterator is not in memory, so I do not want to use the sorted() builtin because, AFAIK, it puts the whole iterator into a list.
The solution I'm using now is to wrap the iterator in a generator function like this:
def checkSorted(it):
    prev_v = next(it)
    yield prev_v
    for v in it:
        if v >= prev_v:
            yield v
            prev_v = v
        else:
            raise ValueError("Iterator is not sorted")
So that I can use it like this:
myconsumer(checkSorted(it))
Does anyone know of a better solution?
I know that my solution works, but it seems quite strange (at least to me) to write a module of my own to accomplish such a trivial task. I'm looking for a simple one-liner or builtin solution (if one exists).
Basically your solution is almost as elegant as it gets (you could of course put it in a utility module if you find it generally useful). If you wanted, you could use an infinity object to cut the code down a bit, but then you have to include a class definition as well, which grows the code again (unless you inline the class definition):
def checkSorted(it):
    # sentinel that never compares less than anything
    prev = type("", (), {"__lt__": lambda a, b: False})()
    for x in it:
        if prev < x:
            raise ValueError("Not sorted")
        prev = x
        yield x
The first line uses type to first create a class and then instantiate it. Instances of this class never compare less than anything (an infinity-like sentinel), so the check passes on the first element.
The problem with doing a one-liner is that you have to deal with three constructs: updating state (assignment), throwing an exception, and looping. You could easily do all of these with statements, but cramming statements onto one line runs into problems with the loop and if-constructs.
If you want to put the whole thing into an expression you have to resort to dirty tricks: itertools can provide the assignment and the looping, and the throwing can be done with the throw method of a generator (which can be written as an expression too):
from itertools import imap, chain, tee

imap(lambda i, j: (i >= j and j or (_ for _ in ()).throw(ValueError("Not sorted"))),
     *(lambda pre, it: (chain([type("", (), {"__ge__": lambda a, b: True})()], pre), it))(*tee(it)))
The last it is the iterator you want to check, and the expression evaluates to a checked iterator. I agree it's not good looking and it's not obvious what it does, but you asked for it (and I don't think you really wanted it).
As an alternative, I suggest using itertools.izip_longest (zip_longest in Python 3) to create a generator of consecutive pairs.
You can use tee to create two independent iterators from a single iterable.
from itertools import izip_longest, tee

def checkSorted(it):
    pre, it = tee(it)
    next(it)  # advance the second copy so each pair (i, j) is (previous, current)
    for i, j in izip_longest(pre, it):
        if j is not None:  # "if j:" would misfire on falsy values like 0
            if i >= j:
                yield i
            else:
                raise ValueError("Iterator is not sorted")
        else:
            yield i
Demo:
it = iter([5, 4, 3, 2, 1])
print list(checkSorted(it))
[5, 4, 3, 2, 1]
it = iter([5, 4, 3, 2, 3])
print list(checkSorted(it))
Traceback (most recent call last):
File "/home/bluebird/Desktop/ex2.py", line 19, in <module>
print list(checkSorted(it))
File "/home/bluebird/Desktop/ex2.py", line 10, in checkSorted
raise ValueError("Iterator is not sorted")
ValueError: Iterator is not sorted
Note: Actually I think there is no need to yield the values of your iterable when you already have them. So, as a more elegant way, I suggest using a generator expression within the all() function and returning a bool:
from itertools import izip, tee

def checkSorted(it):
    pre, it = tee(it)
    next(it)
    return all(i >= j for i, j in izip(pre, it))
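For completeness: on Python 3.10+ the consecutive-pairs idea is built in as itertools.pairwise, so a minimal sketch of the same boolean check (written here for ascending order; flip the comparison for a descending check like the one above) would be:

from itertools import pairwise  # Python 3.10+

def is_sorted(iterable):
    # True if every consecutive pair is non-decreasing
    return all(prev <= cur for prev, cur in pairwise(iterable))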
Related
I have a generator function like the following:
def myfunct():
    ...
    yield result
The usual way to call this function would be:
for r in myfunct():
    dostuff(r)
My question is: is there a way to grab just one element from the generator whenever I like?
For example, I'd like to do something like:
while True:
    ...
    if something:
        my_element = pick_just_one_element(myfunct())
        dostuff(my_element)
    ...
Create a generator using
g = myfunct()
Every time you would like an item, use
next(g)
(or g.next() in Python 2.5 or below).
If the generator exits, it will raise StopIteration. You can either catch this exception if necessary, or use the default argument to next():
next(g, default_value)
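For example, a quick sketch of the whole lifecycle:

g = (n * n for n in range(3))
print(next(g))        # 0
print(next(g))        # 1
print(next(g))        # 4
print(next(g, None))  # None: the generator is exhausted, so the default is returned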
For picking just one element of a generator use break in a for statement, or list(itertools.islice(gen, 1))
According to your example (literally) you can do something like:
while True:
    ...
    if something:
        for my_element in myfunct():
            dostuff(my_element)
            break
        else:
            do_generator_empty()
If you want to "get just one element from the [once generated] generator whenever I like" (which I suppose is the original intention, and the most common one), then:
gen = myfunct()
while True:
    ...
    if something:
        for my_element in gen:
            dostuff(my_element)
            break
        else:
            do_generator_empty()
This way explicit use of generator.next() can be avoided, and end-of-input handling doesn't require (cryptic) StopIteration exception handling or extra default value comparisons.
The else: clause of the for statement is only needed if you want to do something special at end-of-generator.
Note on next() / .next():
In Python 3 the .next() method was renamed to .__next__() for good reason: it's considered low-level (PEP 3114). Before Python 2.6 the builtin function next() did not exist. It was even discussed whether to move next() to the operator module (which would have been wise), because of its rare need and the questionable inflation of builtin names.
Using next() without a default is still very low-level practice - throwing the cryptic StopIteration like a bolt out of the blue in normal application code. And using next() with a default sentinel - which really should be the only option for a next() directly in builtins - is limited and often gives rise to odd, non-pythonic logic/readability.
Bottom line: Using next() should be very rare - like using functions of the operator module. Using for x in iterator, islice, list(iterator) and other functions accepting an iterator seamlessly is the natural way to consume iterators at the application level - and it is almost always possible. next() is low-level, an extra concept, unobvious - as the question of this thread shows. By contrast, using break in a for loop is conventional.
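For instance, a minimal sketch of grabbing a single element without touching next() directly:

from itertools import islice

gen = (n * n for n in range(10))
first = list(islice(gen, 1))  # [0] if an item exists, [] if the generator is empty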
A generator is a function that produces an iterator. Therefore, once you have an iterator instance, use next() to fetch the next item from it.
As an example, use next() function to fetch the first item, and later use for in to process remaining items:
# create a new iterator instance by calling the generator function
items = generator_function()

# fetch and print the first item
first = next(items)
print('first item:', first)

# process the remaining items:
for item in items:
    print('next item:', item)
You can pick specific items using destructuring, e.g.:
>>> first, *middle, last = range(10)
>>> first
0
>>> middle
[1, 2, 3, 4, 5, 6, 7, 8]
>>> last
9
Note that this is going to consume your generator, so while highly readable, it is less efficient than something like next(), and ruinous on infinite generators:
>>> first, *rest = itertools.count()
🔥🔥🔥
I don't believe there's a convenient way to retrieve an arbitrary value from a generator. The generator will provide a next() method to traverse itself, but the full sequence is not produced immediately to save memory. That's the functional difference between a generator and a list.
generator = myfunct()
while True:
    try:
        my_element = next(generator)
    except StopIteration:  # thrown after the last element is taken
        break
    dostuff(my_element)
For those of you scanning through these answers for a complete working example for Python3... well here ya go:
def numgen():
    x = 1000
    while True:
        x += 1
        yield x

nums = numgen()  # because it must be the _same_ generator

for n in range(3):
    numnext = next(nums)
    print(numnext)
This outputs:
1001
1002
1003
I believe the only way is to get a list from the iterator then get the element you want from that list.
l = list(myfunct())
l[4]
I am trying to recreate the built-in map() function in Python.
Here is my attempt:
def mapper(func, *sequences):
    if len(sequences) > 1:
        while True:
            list.append(func(sequences[0][0], sequences[0][0],))
        return list
    return list
But I'm really stuck, because if the user gives e.g. 100 arguments, how do I deal with those?
You use the asterisk * when you call the function:
def mapper(func, *sequences):
    result = []
    if len(sequences) > 0:
        minl = min(len(subseq) for subseq in sequences)
        for i in range(minl):
            result.append(func(*[subseq[i] for subseq in sequences]))
    return result
This produces:
>>> import operator
>>> mapper(operator.add, [1,2,4], [3,6,9])
[4, 8, 13]
By using the asterisk, we unpack the iterable as separate parameters in the function call.
Note that this is still not fully equivalent, since:
the sequences should be iterables, not necessarily lists, so we cannot always index; and
the result of map in python-3.x is an iterable as well, not a list.
A more python-3.x-like map function would be:
def mapper(func, *sequences):
    if not sequences:
        raise TypeError('Mapper should have at least two parameters')
    iters = [iter(seq) for seq in sequences]
    while True:
        try:
            args = [next(it) for it in iters]
        except StopIteration:
            # PEP 479: a StopIteration escaping a generator becomes a RuntimeError
            # in Python 3.7+, so stop explicitly when any input is exhausted
            return
        yield func(*args)
Note however that most Python interpreters implement map in a lower-level language than Python itself, so it is definitely more efficient to use the builtin map than to write your own.
N.B.: it is better not to use variable names like list, set, dict, etc. since these will override (here locally) the reference to the list type. As a result a call like list(some_iterable) will no longer work.
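A quick illustration of that shadowing problem:

list = [1, 2, 3]  # shadows the builtin list type
list("abc")       # TypeError: 'list' object is not callable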
Separating out the logic that combines the sequence or sequences makes the code much easier to read and understand.
def mapper(func, *args):
    for i in zip(*args):
        yield func(*i)
Here we are using Python's inbuilt zip.
If you want to replace it entirely with your own implementation, swap zip for the zipper function below:
def zipper(*args):
    for i in range(len(args[0])):
        index_elements = []
        for arg in args:
            index_elements.append(arg[i])
        yield index_elements
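A quick sanity check of the generator version (a sketch, assuming the mapper definition above):

import operator

# mapper is a generator, so wrap it in list() to materialize the results
print(list(mapper(operator.add, [1, 2, 4], [3, 6, 9])))  # [4, 8, 13]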
I love Python. However, one thing that bugs me a bit is that I don't know how to format functional operations in a fluent manner like I can in JavaScript.
Example (randomly created on the spot): can you help me convert this to Python in a fluent-looking manner?
var even_set = [1,2,3,4,5]
  .filter(function(x){ return x % 2 === 0; })
  .map(function(x){
    console.log(x); // prints it for fun
    return x;
  })
  .reduce(function(num_set, val) {
    num_set[val] = true;
    return num_set;
  }, {});
I'd like to know if there are fluid options? Maybe a library.
In general, I've been using list comprehensions for most things, but it's a real problem if I want to print.
E.g., how can I print every even number between 1 and 5 in Python 2.x using a list comprehension? (Python 3 has print() as a function, but in Python 2 it isn't one.) It's also a bit annoying that a list is constructed and returned; I'd rather just use a for loop.
Update: Here's yet another library/option, one that I adapted from a gist and that is available on PyPI as infixpy:
from infixpy import *

a = (Seq(range(1, 51))
     .map(lambda x: x * 4)
     .filter(lambda x: x <= 170)
     .filter(lambda x: len(str(x)) == 2)
     .filter(lambda x: x % 20 == 0)
     .enumerate()
     .map(lambda x: 'Result[%d]=%s' % (x[0], x[1]))
     .mkstring(' .. '))
print(a)
pip3 install infixpy
Older
I am looking now at an answer that strikes closer to the heart of the question:
fluentpy https://pypi.org/project/fluentpy/ :
Here is the kind of method chaining for collections that a streams programmer (in Scala, Java, or elsewhere) will appreciate:
import fluentpy as _

(
    _(range(1, 50 + 1))
    .map(_.each * 4)
    .filter(_.each <= 170)
    .filter(lambda each: len(str(each)) == 2)
    .filter(lambda each: each % 20 == 0)
    .enumerate()
    .map(lambda each: 'Result[%d]=%s' % (each[0], each[1]))
    .join(',')
    .print()
)
And it works fine:
Result[0]=20,Result[1]=40,Result[2]=60,Result[3]=80
I am just now trying this out. It will be a very good day if this works as shown above.
Update: Look at this: maybe python can start to be more reasonable as one-line shell scripts:
python3 -m fluentpy "lib.sys.stdin.readlines().map(str.lower).map(print)"
Here is it in action on command line:
$ echo -e "Hello World line1\nLine 2\nLine 3\nGoodbye" \
  | python3 -m fluentpy "lib.sys.stdin.readlines().map(str.lower).map(print)"
hello world line1
line 2
line 3
goodbye
There is an extra newline that should be cleaned up - but the gist of it is useful (to me anyways).
Generators, iterators, and itertools give added powers to chaining and filtering actions. But rather than remember (or look up) rarely used things, I gravitate toward helper functions and comprehensions.
For example in this case, take care of the logging with a helper function:
def echo(x):
    print(x)
    return x
Selecting even values is easy with the if clause of a comprehension. And since the final output is a dictionary, use that kind of comprehension (here s = [1, 2, 3, 4, 5]):
In [118]: d={echo(x):True for x in s if x%2==0}
2
4
In [119]: d
Out[119]: {2: True, 4: True}
or to add these values to an existing dictionary, use update.
new_set.update({echo(x):True for x in s if x%2==0})
another way to write this is with an intermediate generator:
{y:True for y in (echo(x) for x in s if x%2==0)}
Or combine the echo and filter in one generator:
def even(s):
    for x in s:
        if x % 2 == 0:
            print(x)
            yield x
followed by a dict comp using it:
{y:True for y in even(s)}
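Putting it together (the prints happen as a side effect while the dict is built):

s = [1, 2, 3, 4, 5]
d = {y: True for y in even(s)}  # prints 2 and 4
print(d)                        # {2: True, 4: True}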
Comprehensions are the fluent python way of handling filter/map operations.
Your code would be something like:
def evenize(input_list):
return [x for x in input_list if x % 2 == 0]
Comprehensions don't work well with side effects like console logging, so do that in a separate loop. Chaining function calls isn't really that common an idiom in python. Don't expect that to be your bread and butter here. Python libraries tend to follow the "alter state or return a value, but not both" pattern. Some exceptions exist.
Edit: On the plus side, python provides several flavors of comprehensions, which are awesome:
List comprehension: [x for x in range(3)] == [0, 1, 2]
Set comprehension: {x for x in range(3)} == {0, 1, 2}
Dict comprehension: {x: x**2 for x in range(3)} == {0: 0, 1: 1, 2: 4}
Generator comprehension (or generator expression): (x for x in range(3)) == <generator object <genexpr> at 0x10fc7dfa0>
With the generator comprehension, nothing has been evaluated yet, so it is a great way to prevent blowing up memory usage when pipelining operations on large collections.
For instance, if you try to do the following, even with python3 semantics for range:
for number in [x**2 for x in range(10000000000000000)]:
    print(number)
you will get a memory error trying to build the initial list. On the other hand, change the list comprehension into a generator comprehension:
for number in (x**2 for x in range(10**20)):
    print(number)
and there is no memory issue (it just takes forever to run). What happens is that the range object gets built (it only stores the start, stop and step values: 0, 10**20, and 1), and then the for-loop begins iterating over the genexp object. Effectively, the for-loop calls
GENEXP_ITERATOR = iter(genexp)
number = next(GENEXP_ITERATOR)
# run the loop body once
number = next(GENEXP_ITERATOR)
# run the loop body once
# etc.
(Note the GENEXP_ITERATOR object is not visible at the code level)
next(GENEXP_ITERATOR) tries to pull the first value out of genexp, which then starts iterating on the range object, pulls out one value, squares it, and yields out the value as the first number. The next time the for-loop calls next(GENEXP_ITERATOR), the generator expression pulls out the second value from the range object, squares it and yields it out for the second pass on the for-loop. The first set of numbers are no longer held in memory.
This means that no matter how many items in the generator comprehension, the memory usage remains constant. You can pass the generator expression to other generator expressions, and create long pipelines that never consume large amounts of memory.
import os
from pathlib import Path  # standing in for the old third-party path module used originally

def pipeline(filenames):
    basepath = Path('/usr/share/stories')
    fullpaths = (basepath / fn for fn in filenames)
    realfiles = (fn for fn in fullpaths if os.path.exists(fn))
    openfiles = (open(fn) for fn in realfiles)

    def read_and_close(file):
        output = file.read(100)
        file.close()
        return output

    prefixes = (read_and_close(file) for file in openfiles)
    noncliches = (prefix for prefix in prefixes
                  if not prefix.startswith('It was a dark and stormy night'))
    return {prefix[:32]: prefix for prefix in noncliches}
At any time, if you need a data structure for something, you can pass the generator comprehension to another comprehension type (as in the last line of this example), at which point, it will force the generators to evaluate all the data they have left, but unless you do that, the memory consumption will be limited to what happens in a single pass over the generators.
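A tiny sketch of that boundary between lazy and forced evaluation:

squares = (x * x for x in range(5))  # lazy: nothing computed yet
as_list = [s for s in squares]       # forces evaluation: [0, 1, 4, 9, 16]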
The biggest dealbreaker to the code you wrote is that Python doesn't support multiline anonymous functions. The return value of filter or map (in Python 2) is a list, so you can continue to chain them if you so desire. However, you'll either have to define the functions ahead of time, or use a lambda.
Arguments against doing this notwithstanding, here is a translation into Python of your JS code.
from __future__ import print_function
from functools import reduce

def print_and_return(x):
    print(x)
    return x

def is_even(x):
    return x % 2 == 0

def add_to_dict(d, x):
    d[x] = True
    return d

even_set = list(reduce(add_to_dict,
                       map(print_and_return,
                           filter(is_even, [1, 2, 3, 4, 5])), {}))
It should work on both Python 2 and Python 3.
There's a library that already does exactly what you are looking for, i.e. fluent syntax, lazy evaluation, and an order of operations that matches how the code is written, plus many other goodies like multiprocess or multithreaded map/reduce.
It's named pyxtension, and it's production-ready and maintained on PyPI.
Your code would be rewritten in this form:
from pyxtension.streams import stream

def console_log(x):
    print(x)
    return x

even_set = stream([1, 2, 3, 4, 5])\
    .filter(lambda x: x % 2 == 0)\
    .map(console_log)\
    .reduce(lambda num_set, val: num_set.__setitem__(val, True))
Replace map with mpmap for multiprocessed map, or fastmap for multithreaded map.
We can use Pyterator for this (disclaimer: I am the author).
We define the function that prints and returns (which I believe you can omit completely however).
def print_and_return(x):
    print(x)
    return x
then
from pyterator import iterate

even_dict = (
    iterate([1, 2, 3, 4, 5])
    .filter(lambda x: x % 2 == 0)
    .map(print_and_return)
    .map(lambda x: (x, True))
    .to_dict()
)
# {2: True, 4: True}
where I have converted your reduce into a sequence of tuples that can be converted into a dictionary.
In Python you can do list.pop(i), which removes and returns the element at index i, but is there a built-in function like list.remove(e) that removes and returns the first element equal to e?
Thanks
I mean, there is list.remove, yes.
>>> x = [1,2,3]
>>> x.remove(1)
>>> x
[2, 3]
I don't know why you need it to return the removed element, though. You've already passed it to list.remove, so you know what it is... I guess if you've overloaded __eq__ on the objects in the list so that it doesn't actually correspond to some reasonable notion of equality, you could have problems. But don't do that, because that would be terrible.
If you have done that terrible thing, it's not difficult to roll your own function that does this:
def remove_and_return(lst, item):
    return lst.pop(lst.index(item))
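For example:

>>> lst = [10, 20, 30, 20]
>>> remove_and_return(lst, 20)
20
>>> lst
[10, 30, 20]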
Is there a builtin? No. Probably because if you already know the element you want to remove, then why bother returning it?1
The best you can do is get the index, and then pop it. Ultimately, this isn't such a big deal -- Chaining 2 O(n) algorithms is still O(n), so you still scale roughly the same ...
def extract(lst, item):
    idx = lst.index(item)
    return lst.pop(idx)
1. Sure, there are pathological cases where the item returned might not be the item you already know... but they aren't important enough to warrant a new method that takes only 3 lines to write yourself :-)
Strictly speaking, you would need something like:
def remove(lst, e):
    i = lst.index(e)  # raises ValueError if e is not in lst
    a = lst[i]
    lst.pop(i)
    return a
Which would make sense only if e == a is true, but e is a is false, and you really need a instead of e.
In most cases, though, I would say that needing this suggests something suspicious in your code.
A short version would be:
a = lst.pop(lst.index(e))
I have a text file like this:
11
2
3
4

11

111
Using Python 2.7, I want to turn it into a list of lists of lines, where line breaks divide items in the inner list and empty lines divide items in the outer list. Like so:
[["11","2","3","4"],["11"],["111"]]
And for this purpose, I wrote a generator function that would yield the inner lists one at a time once passed an open file object:
def readParag(fileObj):
    currentParag = []
    for line in fileObj:
        stripped = line.rstrip()
        if len(stripped) > 0:
            currentParag.append(stripped)
        elif len(currentParag) > 0:
            yield currentParag
            currentParag = []
That works fine, and I can call it from within a list comprehension, producing the desired result. However, it subsequently occurred to me that I might be able to do the same thing more concisely using itertools.takewhile (with a view to rewriting the generator function as a generator expression, but we'll leave that for now). This is what I tried:
from itertools import takewhile
def readParag(fileObj):
    yield [ln.rstrip() for ln in takewhile(lambda line: line != "\n", fileObj)]
In this case, the resulting generator yields only one result (the expected first one, i.e. ["11","2","3","4"]). I had hoped that calling its next method again would cause it to evaluate takewhile(lambda line: line != "\n", fileObj) again on the remainder of the file, thus leading it to yield another list. But no: I got a StopIteration instead. So I surmised that the takewhile expression was being evaluated only once, at the time the generator object was created, and not each time I called the resulting generator object's next method.
This supposition made me wonder what would happen if I called the generator function again. The result was that it created a new generator object that also yielded a single result (the expected second one, i.e. ["11"]) before throwing a StopIteration back at me. So in fact, writing this as a generator function effectively gives the same result as if I'd written it as an ordinary function and returned the list instead of yielding it.
I guess I could solve this problem by creating my own class to use instead of a generator (as in John Millikin's answer to this question). But the point is that I was hoping to write something more concise than my original generator function (possibly even a generator expression). Can somebody tell me what I'm doing wrong, and how to get it right?
What you're trying to do is a perfect job for groupby:
from itertools import groupby

def read_parag(filename):
    with open(filename) as f:
        for k, g in groupby((line.strip() for line in f), bool):
            if k:
                yield list(g)
which will give:
>>> list(read_parag('myfile.txt'))
[['11', '2', '3', '4'], ['11'], ['111']]
Or in one line:
[list(g) for k,g in groupby((line.strip() for line in open('myfile.txt')), bool) if k]
The other answers do a good job of explaining what is going on here, you need to call takewhile multiple times which your current generator does not do. Here is a fairly concise way to get the behavior you want using the built-in iter() function with a sentinel argument:
from itertools import takewhile

def readParag(fileObj):
    cond = lambda line: line != "\n"
    return iter(lambda: [ln.rstrip() for ln in takewhile(cond, fileObj)], [])
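A quick check of the sentinel version (using io.StringIO as a stand-in for a real file):

>>> from io import StringIO
>>> f = StringIO("11\n2\n3\n4\n\n11\n\n111\n")
>>> list(readParag(f))
[['11', '2', '3', '4'], ['11'], ['111']]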
This is exactly how takewhile() should behave. While the condition is true, it returns elements from the underlying iterable, and as soon as it's false, it permanently switches to the iteration-done stage.
Note that this is how iterators must behave; raising StopIteration means just that: stop iterating over me, I am done.
From the python glossary on "iterator":
An object representing a stream of data. Repeated calls to the iterator’s next() method return successive items in the stream. When no more data are available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its next() method just raise StopIteration again.
You could combine takewhile with tee to see if there are any more results in the next batch:
import itertools

def readParag(filename):
    with open(filename) as f:
        while True:
            paras = itertools.takewhile(lambda l: l.strip(), f)
            test, paras = itertools.tee(paras)
            test.next()  # raises StopIteration when the file is done
            yield (l.strip() for l in paras)
This yields generators, so each item yielded is itself a generator. You do need to consume all elements in these generators for this to continue to work; the same is true for the groupby method listed in another answer.
If the file contents fit into memory, there is a much easier way to get the groups separated by blank lines:
with open("filename") as f:
    groups = [group.split() for group in f.read().split("\n\n")]
This approach can be made more robust by using re.split() instead of str.split() and by filtering out potential empty groups resulting from four or more consecutive line breaks.
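A sketch of that more robust variant, still reading the whole file into memory:

import re

with open("filename") as f:
    # split on two or more consecutive newlines and drop any empty groups
    groups = [group.split() for group in re.split(r"\n{2,}", f.read()) if group.strip()]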
This is the documented behavior of takewhile. It takes while the condition is true. It doesn't start up again if the condition later becomes true again.
The simple fix is to make your function just call takewhile in a loop, stopping when takewhile has nothing more to return (i.e., at the end of the file):
def readParag(fileObj):
    while True:
        nextList = [ln.rstrip() for ln in takewhile(lambda line: line != "\n", fileObj)]
        if not nextList:
            break
        yield nextList
You can call takewhile multiple times:
>>> def readParagGenerator(fileObj):
...     group = [ln.rstrip() for ln in takewhile(lambda line: line != "\n", fileObj)]
...     while len(group) > 0:
...         yield group
...         group = [ln.rstrip() for ln in takewhile(lambda line: line != "\n", fileObj)]
...
>>> list(readParagGenerator(StringIO(F)))
[['11', '2', '3', '4'], ['11'], ['111']]