Python fluent filter, map, etc

I love Python. However, one thing that bugs me a bit is that I don't know how to write functional operations in a fluent manner like I can in JavaScript.
Example (made up on the spot): can you help me convert this to Python in a fluent-looking manner?
var even_set = [1, 2, 3, 4, 5]
  .filter(function(x) { return x % 2 === 0; })
  .map(function(x) {
    console.log(x); // prints it for fun
    return x;
  })
  .reduce(function(num_set, val) {
    num_set[val] = true;
    return num_set; // note: the accumulator must be returned for reduce to work
  }, {});
I'd like to know if there are fluent options, maybe a library.
In general, I've been using list comprehensions for most things, but it's a real problem if I want to print.
For example: how can I print every even number between 1 and 5 in Python 2.x using a list comprehension? (Python 3 has print() as a function, but Python 2 doesn't.) It's also a bit annoying that a list is constructed and returned; I'd rather just use a for loop.

Update: Here's yet another library/option: one that I adapted from a gist and that is available on PyPI as infixpy:
from infixpy import *
a = (Seq(range(1, 51))
     .map(lambda x: x * 4)
     .filter(lambda x: x <= 170)
     .filter(lambda x: len(str(x)) == 2)
     .filter(lambda x: x % 20 == 0)
     .enumerate()
     .map(lambda x: 'Result[%d]=%s' % (x[0], x[1]))
     .mkstring(' .. '))
print(a)
pip3 install infixpy
Older
I am looking now at an answer that strikes closer to the heart of the question:
fluentpy https://pypi.org/project/fluentpy/ :
Here is the kind of method chaining for collections that a streams programmer (in Scala, Java, and others) will appreciate:
import fluentpy as _
(
    _(range(1, 50 + 1))
    .map(_.each * 4)
    .filter(_.each <= 170)
    .filter(lambda each: len(str(each)) == 2)
    .filter(lambda each: each % 20 == 0)
    .enumerate()
    .map(lambda each: 'Result[%d]=%s' % (each[0], each[1]))
    .join(',')
    .print()
)
And it works fine:
Result[0]=20,Result[1]=40,Result[2]=60,Result[3]=80
I am just now trying this out. It will be a very good day today if this keeps working as shown above.
Update: Look at this: maybe Python can start to be more reasonable for one-line shell scripts:
python3 -m fluentpy "lib.sys.stdin.readlines().map(str.lower).map(print)"
Here it is in action on the command line:
$ echo -e "Hello World line1\nLine 2\nLine 3\nGoodbye" \
| python3 -m fluentpy "lib.sys.stdin.readlines().map(str.lower).map(print)"
hello world line1
line 2
line 3
goodbye
There is an extra newline that should be cleaned up - but the gist of it is useful (to me anyways).
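A hedged guess at cleaning up that extra newline (untested here): readlines() keeps each line's trailing newline, and str.rstrip is an ordinary string method, so stripping before lowercasing should work with the same map chaining shown above:
python3 -m fluentpy "lib.sys.stdin.readlines().map(str.rstrip).map(str.lower).map(print)"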

Generators, iterators, and itertools give added powers to chaining and filtering actions. But rather than remember (or look up) rarely used things, I gravitate toward helper functions and comprehensions.
For example in this case, take care of the logging with a helper function:
def echo(x):
    print(x)
    return x
Selecting even values is easy with the if clause of a comprehension. And since the final output is a dictionary, use that kind of comprehension (here s is the input list, [1, 2, 3, 4, 5]):
In [118]: d = {echo(x): True for x in s if x % 2 == 0}
2
4
In [119]: d
Out[119]: {2: True, 4: True}
Or, to add these values to an existing dictionary, use update:
new_set.update({echo(x):True for x in s if x%2==0})
Another way to write this is with an intermediate generator:
{y:True for y in (echo(x) for x in s if x%2==0)}
Or combine the echo and filter in one generator:
def even(s):
    for x in s:
        if x % 2 == 0:
            print(x)
            yield x
followed by a dict comp using it:
{y:True for y in even(s)}

Comprehensions are the fluent Python way of handling filter/map operations.
Your code would be something like:
def evenize(input_list):
    return [x for x in input_list if x % 2 == 0]
Comprehensions don't work well with side effects like console logging, so do that in a separate loop. Chaining function calls isn't really a common idiom in Python, so don't expect that to be your bread and butter here. Python libraries tend to follow the "alter state or return a value, but not both" pattern, though some exceptions exist.
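For instance, a minimal sketch of that "filter in a comprehension, log in a separate loop" split (the names here are just illustrative):
values = [1, 2, 3, 4, 5]
evens = [x for x in values if x % 2 == 0]   # pure transformation, no side effects
for x in evens:                             # side effects live in a plain loop
    print(x)
even_set = {x: True for x in evens}         # {2: True, 4: True}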
Edit: On the plus side, Python provides several flavors of comprehensions, which are awesome:
List comprehension: [x for x in range(3)] == [0, 1, 2]
Set comprehension: {x for x in range(3)} == {0, 1, 2}
Dict comprehension: {x: x**2 for x in range(3)} == {0: 0, 1: 1, 2: 4}
Generator comprehension (or generator expression): (x for x in range(3)) == <generator object <genexpr> at 0x10fc7dfa0>
With the generator comprehension, nothing has been evaluated yet, so it is a great way to prevent blowing up memory usage when pipelining operations on large collections.
For instance, if you try to do the following, even with Python 3 semantics for range:
for number in [x**2 for x in range(10000000000000000)]:
    print(number)
you will get a memory error trying to build the initial list. On the other hand, change the list comprehension into a generator comprehension:
for number in (x**2 for x in range(10**20)):   # range needs an int; 1e20 would raise TypeError
    print(number)
and there is no memory issue (it just takes forever to run). What happens is that the range object gets built (which only stores the start, stop, and step values: 0, 10**20, and 1), and then the for-loop begins iterating over the genexp object. Effectively, the for-loop calls
GENEXP_ITERATOR = iter(genexp)
number = next(GENEXP_ITERATOR)
# run the loop body once
number = next(GENEXP_ITERATOR)
# run the loop body once
# etc.
(Note the GENEXP_ITERATOR object is not visible at the code level)
next(GENEXP_ITERATOR) tries to pull the first value out of genexp, which then starts iterating on the range object, pulls out one value, squares it, and yields out the value as the first number. The next time the for-loop calls next(GENEXP_ITERATOR), the generator expression pulls out the second value from the range object, squares it and yields it out for the second pass on the for-loop. The first set of numbers are no longer held in memory.
This means that no matter how many items the generator comprehension yields, the memory usage remains constant. You can pass the generator expression to other generator expressions, and create long pipelines that never consume large amounts of memory:
import os
from pathlib import Path   # stdlib pathlib supports the / operator used below
                            # (the original used the third-party path module)

def pipeline(filenames):
    basepath = Path('/usr/share/stories')
    fullpaths = (basepath / fn for fn in filenames)
    realfiles = (fn for fn in fullpaths if os.path.exists(fn))
    openfiles = (open(fn) for fn in realfiles)

    def read_and_close(file):
        output = file.read(100)
        file.close()
        return output

    prefixes = (read_and_close(file) for file in openfiles)
    noncliches = (prefix for prefix in prefixes
                  if not prefix.startswith('It was a dark and stormy night'))
    return {prefix[:32]: prefix for prefix in noncliches}
At any time, if you need a data structure for something, you can pass the generator comprehension to another comprehension type (as in the last line of this example), at which point, it will force the generators to evaluate all the data they have left, but unless you do that, the memory consumption will be limited to what happens in a single pass over the generators.
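As a tiny illustration of that last point, a throwaway sketch: the generator stays lazy until some other comprehension (or list()/dict()) consumes it:
lazy = (x * x for x in range(5))    # nothing is computed yet
as_dict = {x: True for x in lazy}   # consuming it here forces evaluation
# as_dict == {0: True, 1: True, 4: True, 9: True, 16: True}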

The biggest dealbreaker in the code you wrote is that Python doesn't support multiline anonymous functions. The return value of filter or map is a list (in Python 2; an iterator in Python 3), so you can continue to nest them if you so desire. However, you'll either have to define the functions ahead of time, or use a lambda.
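A minimal sketch of what that nesting looks like with one-line lambdas (Python 3 shown, where list() materializes the lazy map object):
evens = list(map(lambda x: x,
                 filter(lambda x: x % 2 == 0, [1, 2, 3, 4, 5])))
# evens == [2, 4]
Note that the calls read inside-out rather than left-to-right, which is exactly the fluency complaint.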

Arguments against doing this notwithstanding, here is a translation into Python of your JS code.
from __future__ import print_function
from functools import reduce

def print_and_return(x):
    print(x)
    return x

def iseven(x):   # renamed from isodd: the predicate actually tests for even numbers
    return x % 2 == 0

def add_to_dict(d, x):
    d[x] = True
    return d

even_set = reduce(add_to_dict,
                  map(print_and_return,
                      filter(iseven, [1, 2, 3, 4, 5])), {})
# even_set == {2: True, 4: True}, matching the JS object; wrapping the reduce
# in list() as originally written would have kept only the keys, [2, 4]
It should work on both Python 2 and Python 3.

There's a library that already does exactly what you are looking for, i.e. fluent syntax, lazy evaluation, and order of operations matching how it's written, as well as many other goodies like multiprocessed or multithreaded Map/Reduce.
It's named pyxtension and it's production-ready and maintained on PyPI.
Your code would be rewritten in this form:
from pyxtension.streams import stream
def console_log(x):
    print(x)
    return x

def add_to_dict(num_set, val):   # the reduce step must return the accumulator;
    num_set[val] = True          # the original lambda used === (not Python) and
    return num_set               # __setitem__, which returns None and breaks the fold

even_set = stream([1, 2, 3, 4, 5])\
    .filter(lambda x: x % 2 == 0)\
    .map(console_log)\
    .reduce(add_to_dict, {})     # assuming reduce accepts an initial value, like functools.reduce
Replace map with mpmap for multiprocessed map, or fastmap for multithreaded map.

We can use Pyterator for this (disclaimer: I am the author).
We define the function that prints and returns (though I believe you could omit it entirely):
def print_and_return(x):
    print(x)
    return x
then
from pyterator import iterate

even_dict = (
    iterate([1, 2, 3, 4, 5])
    .filter(lambda x: x % 2 == 0)
    .map(print_and_return)
    .map(lambda x: (x, True))
    .to_dict()
)
# {2: True, 4: True}
where I have converted your reduce into a sequence of (key, value) tuples that can be converted into a dictionary.
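For reference, the same tuples-to-dict trick works in plain Python, since dict() accepts an iterable of (key, value) pairs:
even_dict = dict((x, True) for x in [1, 2, 3, 4, 5] if x % 2 == 0)
# {2: True, 4: True}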

Related

Python for element in list matching condition

I have found myself frequently iterating over a subset of a list according to some condition that is only needed for that loop, and would like to know if there is a more efficient way to write this.
Take for example the list:
foo = [1, 2, 3, 4, 5]
If I wanted to build a for loop that iterates through every element greater than 2, I would typically do something like this:
for x in [y for y in foo if y > 2]:
# Do something
However this seems redundant, and isn't extremely readable in my opinion. I don't think it is particularly inefficient, especially when using a generator instead as @iota pointed out below; however, I would much rather be able to write something like:
for x in foo if x > 2:
# Do something
Ideally avoiding the need for a second for and a whole other temporary variable. Is there a syntax for this? I use Python 3, but I assume any such syntax would likely have a Python 2 variant as well.
Note: This is obviously a very simple example that could be better handled by something like range() or sorting & slicing, so assume foo is any arbitrary list that must be filtered by any arbitrary condition
Not quite the syntax you are after, but you can also use filter with a lambda:
for x in filter(lambda y: y > 2, foo):
    print(x)
Or using a function for readability's sake:
def greaterthantwo(y):
    return y > 2

for x in filter(greaterthantwo, foo):
    print(x)
filter also has the advantage of being lazy (in Python 3 it returns an iterator, not a list), so it doesn't evaluate all the values if you exit the loop early (as opposed to building a new list up front).
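A small sketch of that early-exit behavior (noisy is just an illustrative predicate name): only the elements actually consumed get tested:
def noisy(y):
    print('testing', y)
    return y > 2

for x in filter(noisy, [1, 2, 3, 4, 5]):
    break   # stops at the first match (3); 4 and 5 are never tested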
There's filter as discussed in @salparadise's answer, but you can also use generators:
def filterbyvalue(seq, value):
    for el in seq:
        if el > value:   # adjusted: the original tested el.attribute == value, which doesn't fit a list of ints
            yield el

for x in filterbyvalue(foo, 2):
    print(x)   # Do something
It may look bigger, but it is useful when you have to do something more complex than plain filtering, and it also performs better than first creating a list comprehension and then looping over it.
I would do it like this.
For efficient code:
data = (x for x in foo if x > 2)
print(next(data))
For more readable code (Python 3 only, since it relies on print being a function; note it also builds a throwaway list of None values):
[print(x) for x in foo if x > 2]

how to create own map() function in python

I am trying to recreate the built-in map() function in Python.
Here is my attempt:
def mapper(func, *sequences):
    if len(sequences) > 1:
        while True:
            list.append(func(sequences[0][0], sequences[0][0],))
        return list
    return list
But I'm really stuck, because if the user gives e.g. 100 arguments, how do I deal with those?
You use the asterisk * when you call the function:
def mapper(func, *sequences):
    result = []
    if len(sequences) > 0:
        minl = min(len(subseq) for subseq in sequences)
        for i in range(minl):
            result.append(func(*[subseq[i] for subseq in sequences]))
    return result
This produces:
>>> import operator
>>> mapper(operator.add, [1,2,4], [3,6,9])
[4, 8, 13]
By using the asterisk, we unpack the iterable as separate parameters in the function call.
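For example, f(*[1, 2, 3]) is equivalent to f(1, 2, 3), with f standing in for any function:
def f(a, b, c):
    return a + b + c

args = [1, 2, 3]
f(*args)   # same as f(1, 2, 3), returns 6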
Note that this is still not fully equivalent, since:
the sequences should be iterables, not necessarily lists, so we cannot always index; and
the result of a map in python-3.x is an iterable as well, so not a list.
A more python-3.x-like map function would be:
def mapper(func, *sequences):
    if not sequences:
        raise TypeError('Mapper should have at least two parameters')
    iters = [iter(seq) for seq in sequences]
    while True:
        try:
            yield func(*[next(it) for it in iters])
        except StopIteration:
            return   # PEP 479: letting StopIteration escape a generator raises RuntimeError
Note however that most Python implementations provide map at a level closer to the interpreter (in C, for CPython) than Python code, so it is definitely more efficient to use the builtin map than to write your own.
N.B.: it is better not to use variable names like list, set, dict, etc., since these will shadow (here, locally) the reference to the built-in type. As a result, a call like list(some_iterable) will no longer work.
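A quick illustration of that pitfall:
def broken():
    list = [1, 2]            # shadows the built-in list type locally
    return list(range(3))    # TypeError: 'list' object is not callable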
Separating out the logic that combines the sequence (or sequences) makes this much easier to read and understand.
def mapper(func, *args):
    for i in zip(*args):
        yield func(*i)
Here we are using Python's built-in zip.
If you want to replace it entirely with your own implementation, replace zip with the zipper function below:
def zipper(*args):
    for i in range(len(args[0])):
        index_elements = []
        for arg in args:
            index_elements.append(arg[i])
        yield tuple(index_elements)   # the original yielded positional_elements, an undefined name
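A quick usage check of the zip-based mapper above, with operator.add standing in for an arbitrary two-argument function:
import operator
print(list(mapper(operator.add, [1, 2, 4], [3, 6, 9])))   # [4, 8, 13]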

Python generator conflicting with list comprehension

I've been messing around in Python with generator functions. I want to write a function that takes a generator whose values are tuples, and returns a list of generators, where each generator's values correspond to one index in the original tuples.
Currently, I have a function which accomplishes this for a hardcoded number of elements in the tuple. Here is my code:
import itertools

def tee_pieces(generator):
    copies = itertools.tee(generator)
    dropped_copies = [(x[0] for x in copies[0]), (x[1] for x in copies[1])]
    # dropped_copies = [(x[i] for x in copies[i]) for i in range(2)]
    return dropped_copies
def gen_words():
    for i in "Hello, my name is Fred!".split():
        yield i

def split_words(words):
    for word in words:
        yield (word[:len(word)//2], word[len(word)//2:])

def print_words(words):
    for word in words:
        print(word)
init_words = gen_words()
right_left_words = split_words(init_words)
left_words, right_words = tee_pieces(right_left_words)
print("Left halves:")
print_words(left_words)
print("Right halves:")
print_words(right_words)
This correctly splits the generator, leading to left_words containing the left halves and right_words containing the right halves.
The problem comes when I try to parameterize the number of generators to be created, using the commented-out line above. As far as I know it should be equivalent, but when I use that line instead, both left_words and right_words end up containing the right half of the word, giving an output like this:
Left halves:
lo,
y
me
s
ed!
Right halves:
lo,
y
me
s
ed!
Why is this happening? How can I accomplish the desired result, namely parameterize the number of pieces to split the generator into?
This has to do with Python's late-binding closure semantics (its scoping rules). The classic "surprising" example for demonstrating it:
funcs = [ lambda: i for i in range(3) ]
print(funcs[0]())
=> 2 #??
print(funcs[1]())
=> 2 #??
print(funcs[2]())
=> 2
Your example is another result of the same rules.
To fix, you can "break" the scoping with an additional function:
def make_gen(i):
    return (x[i] for x in copies[i])

dropped_copies = [make_gen(i) for i in range(2)]
This binds the value of i to the specific value passed in a specific call to make_gen, which achieves the desired behavior. Without it, it is bound to "the current value of the variable named i", which ends up as the same value for all generators you create (as there's only one variable named i).
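Another common fix for the same late-binding problem, equivalent in effect to the helper function, is capturing the current value with a default argument (shown here on the classic lambda example):
funcs = [lambda i=i: i for i in range(3)]   # i=i captures the value at definition time
print(funcs[0](), funcs[1](), funcs[2]())   # 0 1 2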
To add to shx2's answer, you could also substitute the additional function with a lambda:
dropped_copies = [(lambda j: (x[j] for x in copies[j]))(i) for i in range(2)]
This too creates a new scope when the lambda gets called, as is abundantly clear from the different variable name. It would, however, also work using the same name, since the parameter inside the lambda shadows the one inside the generator:
dropped_copies = [(lambda i: (x[i] for x in copies[i]))(i) for i in range(2)]
This sort of scoping seems very confusing but becomes more intuitive if you rewrite the generator as a for loop:
dropped_copies = []
for i in range(2):
    dropped_copies.append((x[i] for x in copies[i]))
Note that this is broken in the same way the original list comprehension version is.
This is because dropped_copies is a pair of iterators, and when the iterators are evaluated, i has already been incremented to 1.
Try using a list comprehension instead; you can see the difference:
dropped_copies = [[x[i] for x in copies[i]] for i in range(2)]

Three lines to find the greatest product in a string of numbers in Python

Full disclosure: this is for an assignment. Simply getting working code is enough, but doing this in three lines gets me extra credit.
I'm trying to take a 1000-digit string and find the largest product of 5 consecutive digits. You may recognize this as Project Euler's Problem #8.
I've tried a lot of options, but I seem to be stuck. I'm working on figuring out if I can make a lambda statement that will work, but I have no experience with lambda so it's evading me.
Here's what I have so far:
for i in range(1, 996):
    max = int(number[i+0]) * int(number[i+1]) * int(number[i+2]) * int(number[i+3]) * int(number[i+4]) if max < int(number[i+0]) * int(number[i+1]) * int(number[i+2]) * int(number[i+3]) * int(number[i+4]) else max = max
return max
That doesn't work and triggers SyntaxError: can't assign to conditional expression.
I don't want outright code, or at least not a complete function, but just a little help understanding how I can move forward.
This isn't legal python:
x = y if z else x = w
This is:
x = y if z else w
So is this:
if z: x = y
By the way, there is a one-line solution that is much shorter and clearer than your three.
= appears twice in your (very long) line. Effectively you have this:
max = something if something else max = max
which Python parses as:
max = (something if something else max) = max
And, indeed, you can't assign to a conditional expression, which is that whole thing in the middle.
You probably didn't intend to have the final = max at the end.
In [15]: def myinput(l, n):
    ...:     for x in l:
    ...:         yield l[x:x+n]
    ...:

In [16]: max([reduce(lambda a, b: a*b, x) for x in myinput(range(1000), 5) if len(x) == 5])
Out[16]: 985084775273880L
Like recursive mentioned, there is a simple one-liner solution. It involves using the max function - always bad to name variables after builtins!
In Python 2 it looks something like this:
max(reduce(lambda x, y: x*y, map(int, num[i:i+5])) for i in xrange(996))
In Python 3, reduce was moved into functools, so you have to import it from there:
from functools import reduce
max(reduce(lambda x, y: x*y, map(int, num[i:i+5])) for i in range(996))
Look into:
the built-in max function to find the greatest number in a sequence,
the built-in map function to apply a function to all elements in a list,
the reduce function to boil a list down to a single result by repeatedly applying a two-argument function,
lambda definitions to be able to define function objects that you can pass to map() and reduce(),
and list comprehensions (and generators, which are very similar) to compose the above functions in a one-liner.

Nicest, efficient way to get result tuple of sequence items fulfilling and not fulfilling condition

(This is out of professional best-practice/pattern interest, not a homework request.)
INPUT: any unordered sequence or generator of items, plus a function myfilter(item) that returns True if the filter condition is fulfilled.
OUTPUT: a (filter_true, filter_false) tuple of sequences of the original type which contain the elements partitioned according to the filter, in original sequence order.
How would you express this without doing double filtering, or should I use double filtering? Maybe a filter and a loop/generator/list comprehension with next could be the answer?
Should I drop the requirement of keeping the type, or just change the requirement to return a tuple of tuples/generators? I cannot easily return a generator for generator input, or can I? (The requirements are self-made.)
Here is a test of the best candidate at the moment, offering two streams instead of a tuple:
import itertools as it
from sympy.ntheory import isprime as myfilter

mylist = xrange(1000001, 1010000, 2)
left, right = it.tee((myfilter(x), x) for x in mylist)
filter_true = (x for p, x in left if p)
filter_false = (x for p, x in right if not p)
print 'Hundred primes and non-primes odd numbers'
print '\n'.join(" Prime %i, not prime %i" %
                (next(filter_true), next(filter_false))
                for i in range(100))
Here is a way to do it which only calls myfilter once for each item and will also work if mylist is a generator
import itertools as it
left,right = it.tee((myfilter(x), x) for x in mylist)
filter_true = (x for p,x in left if p)
filter_false = (x for p,x in right if not p)
Let's suppose that your problem is not memory but CPU: myfilter is heavy and you don't want to iterate over and filter the original dataset twice. Here are some single-pass ideas.
The simple and versatile version (memory-hungry):
filter_true = []
filter_false = []
for item in items:
    if myfilter(item):
        filter_true.append(item)
    else:
        filter_false.append(item)
The memory-friendly version (doesn't work with generators unless you first materialize them with list(items); note also that pop() consumes the items in reverse order):
while items:
    item = items.pop()
    if myfilter(item):
        filter_true.append(item)
    else:
        filter_false.append(item)
The generator friendly version :
while True:
    try:
        item = next(items)
        if myfilter(item):
            filter_true.append(item)
        else:
            filter_false.append(item)
    except StopIteration:
        break
The easy way (but less efficient) is to tee the iterable and filter both of them:
import itertools
left, right = itertools.tee(mylist)
filter_true = (x for x in left if myfilter(x))
filter_false = (x for x in right if not myfilter(x))   # note the not; omitting it would yield the same elements on both sides
This is less efficient than the optimal solution, because myfilter will be called twice for each element. That is, if you have tested an element in left, you shouldn't have to re-test it in right because you already know the answer. If you require this optimisation, it shouldn't be hard to implement: have a look at the implementation of tee for clues. You'll need a deque for each returned iterable, which you stock with the elements of the original sequence that should go in it but haven't been asked for yet.
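A minimal sketch of that deque-based approach (partition is a hypothetical helper name, not a library function): each output buffers elements destined for the other side, so myfilter runs exactly once per element:
from collections import deque

def partition(pred, iterable):
    source = iter(iterable)
    pending = {True: deque(), False: deque()}   # one buffer per output stream

    def gen(flag):
        while True:
            if pending[flag]:                    # serve anything already buffered for us
                yield pending[flag].popleft()
            else:
                try:
                    item = next(source)
                except StopIteration:
                    return
                pending[pred(item)].append(item)  # pred is called once per item

    return gen(True), gen(False)

filter_true, filter_false = partition(myfilter, mylist)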
I think your best bet will be constructing two separate generators:
filter_true = (x for x in mylist if myfilter(x))
filter_false = (x for x in mylist if not myfilter(x))
