I am trying to re-create the built-in map() function in Python.
Here is my attempt:
def mapper(func, *sequences):
    if len(sequences) > 1:
        while True:
            list.append(func(sequences[0][0], sequences[0][0],))
        return list
    return list
But I'm really stuck, because if the user gives, e.g., 100 arguments, how do I deal with those?
You use the asterisk * when you call the function:
def mapper(func, *sequences):
    result = []
    if len(sequences) > 0:
        minl = min(len(subseq) for subseq in sequences)
        for i in range(minl):
            result.append(func(*[subseq[i] for subseq in sequences]))
    return result
This produces:
>>> import operator
>>> mapper(operator.add, [1,2,4], [3,6,9])
[4, 8, 13]
By using the asterisk, we unpack the iterable as separate parameters in the function call.
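As a tiny standalone illustration of what the asterisk does at a call site (pow here is just an arbitrary two-argument function, not part of the answer above):

args = [3, 5]
print(pow(*args))   # same as pow(3, 5) -> 243
print(pow(3, 5))    # 243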
Note that this is still not fully equivalent, since:
- the sequences should be iterables, not necessarily lists, so we cannot always index into them; and
- the result of map in Python 3.x is an iterator as well, not a list.
A more python-3.x-like map function would be:
def mapper(func, *sequences):
    if not sequences:
        raise TypeError('Mapper should have at least two parameters')
    iters = [iter(seq) for seq in sequences]
    while True:
        try:
            yield func(*[next(it) for it in iters])
        except StopIteration:
            # on Python 3.7+ a StopIteration escaping a generator becomes a
            # RuntimeError (PEP 479), so stop explicitly when any input runs out
            return
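As a quick sanity check of the generator version above (a hedged example of my own), it works with non-list iterables and stops at the shortest input:

import operator
print(list(mapper(operator.add, range(3), (10, 20, 30, 40))))   # [10, 21, 32]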
Note, however, that most Python interpreters implement map closer to the interpreter (typically in C) than as Python code, so it is definitely more efficient to use the built-in map than to write your own.
N.B.: it is better not to use variable names like list, set, dict, etc., since these will shadow (here locally) the reference to the list type. As a result, a call like list(some_iterable) will no longer work.
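A small demonstration of that pitfall:

list = [1, 2, 3]      # the name 'list' now refers to this object, not the built-in type
# list((1, 2))        # would now raise TypeError: 'list' object is not callable
del list              # removing the binding restores access to the built-in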
Separating out the logic that combines the sequences makes the code much easier to read and understand.
def mapper(func, *args):
    for i in zip(*args):
        yield func(*i)
Here we are using Python's built-in zip.
If you want to replace it entirely with your own implementation, replace zip with the zipper function below:
def zipper(*args):
    for i in range(len(args[0])):
        index_elements = []
        for arg in args:
            index_elements.append(arg[i])
        yield index_elements
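A quick check of the two pieces together (assuming the mapper and zipper definitions above; zipper requires indexable sequences):

print(list(zipper([1, 2, 3], [4, 5, 6])))                 # [[1, 4], [2, 5], [3, 6]]
print(list(mapper(lambda a, b: a + b, [1, 2], [3, 4])))   # [4, 6]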
Related
I love Python. However, one thing that bugs me a bit is that I don't know how to format functional activities in a fluid manner like I can in JavaScript.
Example (randomly created on the spot): can you help me convert this to Python in a fluent-looking manner?
var even_set = [1,2,3,4,5]
  .filter(function(x){ return x%2 === 0; })
  .map(function(x){
    console.log(x); // prints it for fun
    return x;
  })
  .reduce(function(num_set, val) {
    num_set[val] = true;
  }, {});
I'd like to know if there are fluid options. Maybe a library?
In general, I've been using list comprehensions for most things, but it's a real problem if I want to print.
E.g., how can I print every even number between 1 and 5 in Python 2.x using a list comprehension (Python 3 has print() as a function, but Python 2 doesn't)? It's also a bit annoying that a list is constructed and returned. I'd rather just for loop.
Update: Here's yet another library/option: one that I adapted from a gist and is available on PyPI as infixpy:
from infixpy import *

a = (Seq(range(1, 51))
     .map(lambda x: x * 4)
     .filter(lambda x: x <= 170)
     .filter(lambda x: len(str(x)) == 2)
     .filter(lambda x: x % 20 == 0)
     .enumerate()
     .map(lambda x: 'Result[%d]=%s' % (x[0], x[1]))
     .mkstring(' .. '))
print(a)
pip3 install infixpy
Older
I am looking now at an answer that strikes closer to the heart of the question:
fluentpy https://pypi.org/project/fluentpy/ :
Here is the kind of method chaining for collections that a streams programmer (in scala, java, others) will appreciate:
import fluentpy as _

(
    _(range(1, 50 + 1))
    .map(_.each * 4)
    .filter(_.each <= 170)
    .filter(lambda each: len(str(each)) == 2)
    .filter(lambda each: each % 20 == 0)
    .enumerate()
    .map(lambda each: 'Result[%d]=%s' % (each[0], each[1]))
    .join(',')
    .print()
)
And it works fine:
Result[0]=20,Result[1]=40,Result[2]=60,Result[3]=80
I am just now trying this out. It will be a very good day if it works as shown above.
Update: Look at this: maybe Python can start to be more reasonable for one-line shell scripts:
python3 -m fluentpy "lib.sys.stdin.readlines().map(str.lower).map(print)"
Here it is in action on the command line:
$ echo -e "Hello World line1\nLine 2\nLine 3\nGoodbye" \
  | python3 -m fluentpy "lib.sys.stdin.readlines().map(str.lower).map(print)"
hello world line1
line 2
line 3
goodbye
There is an extra newline that should be cleaned up - but the gist of it is useful (to me anyways).
Generators, iterators, and itertools give added powers to chaining and filtering actions. But rather than remember (or look up) rarely used things, I gravitate toward helper functions and comprehensions.
For example in this case, take care of the logging with a helper function:
def echo(x):
    print(x)
    return x
Selecting even values is easy with the if clause of a comprehension. And since the final output is a dictionary, use that kind of comprehension (here s is assumed to be the input list [1, 2, 3, 4, 5]):
In [118]: d={echo(x):True for x in s if x%2==0}
2
4
In [119]: d
Out[119]: {2: True, 4: True}
Or, to add these values to an existing dictionary, use update:
new_set.update({echo(x):True for x in s if x%2==0})
Another way to write this is with an intermediate generator:
{y:True for y in (echo(x) for x in s if x%2==0)}
Or combine the echo and the filter in one generator:
def even(s):
    for x in s:
        if x % 2 == 0:
            print(x)
            yield x
followed by a dict comp using it:
{y:True for y in even(s)}
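Putting those two pieces together into a runnable snippet (s here is just the example input list):

s = [1, 2, 3, 4, 5]
result = {y: True for y in even(s)}   # prints 2 and 4 as a side effect
print(result)                         # {2: True, 4: True}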
Comprehensions are the fluent python way of handling filter/map operations.
Your code would be something like:
def evenize(input_list):
    return [x for x in input_list if x % 2 == 0]
Comprehensions don't work well with side effects like console logging, so do that in a separate loop. Chaining function calls isn't really that common an idiom in python. Don't expect that to be your bread and butter here. Python libraries tend to follow the "alter state or return a value, but not both" pattern. Some exceptions exist.
Edit: On the plus side, python provides several flavors of comprehensions, which are awesome:
List comprehension: [x for x in range(3)] == [0, 1, 2]
Set comprehension: {x for x in range(3)} == {0, 1, 2}
Dict comprehension: {x: x**2 for x in range(3)} == {0: 0, 1: 1, 2: 4}
Generator comprehension (or generator expression): (x for x in range(3)) == <generator object <genexpr> at 0x10fc7dfa0>
With the generator comprehension, nothing has been evaluated yet, so it is a great way to prevent blowing up memory usage when pipelining operations on large collections.
For instance, if you try to do the following, even with python3 semantics for range:
for number in [x**2 for x in range(10000000000000000)]:
    print(number)
you will get a memory error trying to build the initial list. On the other hand, change the list comprehension into a generator comprehension:
for number in (x**2 for x in range(10**20)):
    print(number)
and there is no memory issue (it just takes forever to run). What happens is that the range object gets built (it only stores the start, stop, and step values: 0, 10**20, and 1), and then the for-loop begins iterating over the genexp object. Effectively, the for-loop calls
GENEXP_ITERATOR = iter(genexp)
number = next(GENEXP_ITERATOR)
# run the loop one time
number = next(GENEXP_ITERATOR)
# run the loop one time
# etc.
(Note the GENEXP_ITERATOR object is not visible at the code level)
next(GENEXP_ITERATOR) tries to pull the first value out of genexp, which then starts iterating on the range object, pulls out one value, squares it, and yields out the value as the first number. The next time the for-loop calls next(GENEXP_ITERATOR), the generator expression pulls out the second value from the range object, squares it, and yields it out for the second pass of the for-loop. The earlier numbers are no longer held in memory.
This means that no matter how many items in the generator comprehension, the memory usage remains constant. You can pass the generator expression to other generator expressions, and create long pipelines that never consume large amounts of memory.
import os
from pathlib import Path   # stand-in for the external 'path' module used in the original

def pipeline(filenames):
    basepath = Path('/usr/share/stories')
    fullpaths = (basepath / fn for fn in filenames)
    realfiles = (fn for fn in fullpaths if os.path.exists(fn))
    openfiles = (open(fn) for fn in realfiles)

    def read_and_close(file):
        output = file.read(100)
        file.close()
        return output

    prefixes = (read_and_close(file) for file in openfiles)
    noncliches = (prefix for prefix in prefixes
                  if not prefix.startswith('It was a dark and stormy night'))
    return {prefix[:32]: prefix for prefix in noncliches}
At any time, if you need a data structure for something, you can pass the generator comprehension to another comprehension type (as in the last line of this example), at which point, it will force the generators to evaluate all the data they have left, but unless you do that, the memory consumption will be limited to what happens in a single pass over the generators.
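A small illustration of that last point, with toy data of my own:

squares = (x * x for x in range(5))         # lazy: nothing computed yet
as_dict = {n: n % 2 == 0 for n in squares}  # consuming it here forces the work
print(as_dict)   # {0: True, 1: False, 4: True, 9: False, 16: True}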
The biggest dealbreaker to the code you wrote is that Python doesn't support multiline anonymous functions. In Python 2 the return value of filter or map is a list (in Python 3 it is a lazy iterator), so you can continue to chain them if you so desire. However, you'll either have to define the functions ahead of time, or use a lambda.
Arguments against doing this notwithstanding, here is a translation into Python of your JS code.
from __future__ import print_function
from functools import reduce

def print_and_return(x):
    print(x)
    return x

def iseven(x):
    return x % 2 == 0

def add_to_dict(d, x):
    d[x] = True
    return d

even_set = list(reduce(add_to_dict,
                       map(print_and_return,
                           filter(iseven, [1, 2, 3, 4, 5])), {}))
It should work on both Python 2 and Python 3.
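If you want the dictionary itself (closer to the JS result), drop the outer list(...) call, which otherwise just gives you a list of the dict's keys:

even_dict = reduce(add_to_dict,
                   map(print_and_return,
                       filter(iseven, [1, 2, 3, 4, 5])), {})
print(even_dict)   # {2: True, 4: True}  (2 and 4 are also printed as side effects)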
There's a library that already does exactly what you are looking for, i.e. the fluent syntax, lazy evaluation, and the order of operations matching the order in which it's written, as well as many other goodies like multiprocess or multithreaded Map/Reduce.
It's named pyxtension and it's production-ready and maintained on PyPI.
Your code would be rewritten in this form:
from pyxtension.streams import stream

def console_log(x):
    print(x)
    return x

even_set = stream([1, 2, 3, 4, 5])\
    .filter(lambda x: x % 2 == 0)\
    .map(console_log)\
    .reduce(lambda num_set, val: num_set.__setitem__(val, True))
Replace map with mpmap for multiprocessed map, or fastmap for multithreaded map.
We can use Pyterator for this (disclaimer: I am the author).
We define the function that prints and returns (which I believe you can omit completely however).
def print_and_return(x):
    print(x)
    return x
Then:
from pyterator import iterate

even_dict = (
    iterate([1, 2, 3, 4, 5])
    .filter(lambda x: x % 2 == 0)
    .map(print_and_return)
    .map(lambda x: (x, True))
    .to_dict()
)
# {2: True, 4: True}
where I have converted your reduce into a sequence of tuples that can be converted into a dictionary.
I've been messing around in Python with generator functions. I want to write a function that takes a generator whose values are tuples, and returns a list of generators, where each generator's values correspond to one index in the original tuples.
Currently, I have a function which accomplishes this for a hardcoded number of elements in the tuple. Here is my code:
import itertools

def tee_pieces(generator):
    copies = itertools.tee(generator)
    dropped_copies = [(x[0] for x in copies[0]), (x[1] for x in copies[1])]
    # dropped_copies = [(x[i] for x in copies[i]) for i in range(2)]
    return dropped_copies
def gen_words():
    for i in "Hello, my name is Fred!".split():
        yield i

def split_words(words):
    for word in words:
        yield (word[:len(word)//2], word[len(word)//2:])

def print_words(words):
    for word in words:
        print(word)
init_words = gen_words()
right_left_words = split_words(init_words)
left_words, right_words = tee_pieces(right_left_words)
print("Left halves:")
print_words(left_words)
print("Right halves:")
print_words(right_words)
This correctly splits the generator, leading to left_words containing the left halves and right_words containing the right halves.
The problem comes when I try to parameterize the number of generators to be created, using the commented-out line above. As far as I know it should be equivalent, but when I use that line instead, both left_words and right_words end up containing the right half of each word, giving output like this:
Left halves:
lo,
y
me
s
ed!
Right halves:
lo,
y
me
s
ed!
Why is this happening? How can I accomplish the desired result, namely parameterizing the number of pieces to split the generator into?
This has to do with Python's scoping rules: a closure captures the variable itself, not its value at the time the closure is created. The classic "surprising" example for demonstrating it:
funcs = [ lambda: i for i in range(3) ]
print(funcs[0]())
=> 2 #??
print(funcs[1]())
=> 2 #??
print(funcs[2]())
=> 2
Your example is another result of the same rules.
To fix, you can "break" the scoping with an additional function:
def make_gen(i):
    return (x[i] for x in copies[i])

dropped_copies = [make_gen(i) for i in range(2)]
This binds the value of i to the specific value passed in a specific call to make_gen, which achieves the desired behavior. Without it, it is bound to "the current value of the variable named i", which ends up as the same value for all the generators you create (since there is only one variable named i).
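Another common way to pin the value, shown on the classic lambda example above, is default-argument binding (the default is evaluated once, at definition time):

funcs = [lambda i=i: i for i in range(3)]
print(funcs[0](), funcs[1](), funcs[2]())   # 0 1 2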
To add to shx2's answer, you could also replace the additional function with a lambda:
dropped_copies = [(lambda j: (x[j] for x in copies[j]))(i) for i in range(2)]
This too creates a new scope when the lambda gets called, as is abundantly clear by the different variable name. It would however also work with using the same name, since the parameter inside the lambda shadows the one inside the generator:
dropped_copies = [(lambda i: (x[i] for x in copies[i]))(i) for i in range(2)]
This sort of scoping seems very confusing but becomes more intuitive if you rewrite the generator as a for loop:
dropped_copies = []
for i in range(2):
    dropped_copies.append((x[i] for x in copies[i]))
Note that this is broken in the same way the original list comprehension version is.
This is because dropped_copies is a pair of iterators, and when the iterators are evaluated, i has already been incremented to 1.
Try using a list comprehension instead; you can see the difference:
dropped_copies = [[x[i] for x in copies[i]] for i in range(2)]
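For completeness, here is a sketch of the parameterized tee_pieces the question asked for, using the make_gen fix from above (n is the number of pieces):

import itertools

def tee_pieces(generator, n=2):
    copies = itertools.tee(generator, n)
    def make_gen(i):
        return (x[i] for x in copies[i])
    return [make_gen(i) for i in range(n)]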
I have an iterator it that I'm assuming is already sorted, but I would like to raise an exception if it isn't.
The data from the iterator is not in memory, so I do not want to use the sorted() builtin because, AFAIK, it puts the whole iterator in a list.
The solution I'm using now is to wrap the iterator in a generator function like this:
def checkSorted(it):
    prev_v = next(it)
    yield prev_v
    for v in it:
        if v >= prev_v:
            yield v
            prev_v = v
        else:
            raise ValueError("Iterator is not sorted")
So that I can use it like this:
myconsumer(checkSorted(it))
Does someone know if there are better solutions?
I know that my solution works, but it seems quite strange (at least to me) to write a module of my own to accomplish such a trivial task. I'm looking for a simple one-liner or builtin solution (if it exists).
Basically your solution is almost as elegant as it gets (you could of course put it in a utility module if you find it generally useful). You could, if you wanted, use an infinity object to cut the code down a bit, but then you have to include a class definition as well, which grows the code again (unless you inline the class definition):
def checkSorted(it):
    prev = type("", (), {"__lt__": lambda a, b: False})()
    for x in it:
        if prev < x:
            raise ValueError("Not sorted")
        prev = x
        yield x
The second line uses type to first create a class and then instantiate it. Objects of this class never compare less than anything (an "infinity" object), so the first element of the iterator always passes the check.
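A quick demonstration (note this version treats "sorted" as non-increasing, matching the prev < x check):

print(list(checkSorted(iter([5, 4, 3, 2, 1]))))   # [5, 4, 3, 2, 1]
# list(checkSorted(iter([1, 3, 2]))) raises ValueError("Not sorted") once consumed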
The problem with doing a one-liner is that you have to deal with three constructs: you have to update state (assignment), throw an exception, and do a loop. You could easily perform these using statements, but making them into a one-liner means you have to try to put the statements on the same line, which in turn results in problems with the loop and if-constructs.
If you want to put the whole thing into an expression you have to use dirty tricks: itertools can provide the assignment and the looping, and the throwing can be done via the throw method of a generator (which can be used in an expression too):
imap( lambda i, j: (i >= j and j or (_ for _ in ()).throw(ValueError("Not sorted"))), *(lambda pre, it: (chain([type("", (), {"__ge__": lambda a, b: True})()], pre), it))(*tee(it)))
The last it is the iterator you want to check and the expression evaluates to a checked iterator. I agree it's not good looking and not obvious what it does, but you asked for it (and I don't think you wanted it).
As an alternative, I suggest using itertools.izip_longest (zip_longest in Python 3) to create a generator that yields consecutive pairs.
You can use tee to create two independent iterators from the first iterable.
from itertools import izip_longest, tee

def checkSorted(it):
    pre, it = tee(it)
    next(it)
    for i, j in izip_longest(pre, it):
        if j is not None:
            if i >= j:
                yield i
            else:
                raise ValueError("Iterator is not sorted")
        else:
            yield i
Demo :
it=iter([5,4,3,2,1])
print list(checkSorted(it))
[5, 4, 3, 2, 1]
it=iter([5,4,3,2,3])
print list(checkSorted(it))
Traceback (most recent call last):
File "/home/bluebird/Desktop/ex2.py", line 19, in <module>
print list(checkSorted(it))
File "/home/bluebird/Desktop/ex2.py", line 10, in checkSorted
raise ValueError("Iterator is not sorted")
ValueError: Iterator is not sorted
Note: Actually I think there is no need to yield the values of your iterable when you already have them. So, as a more elegant way, I suggest using a generator expression within the all function and returning a bool value:
from itertools import izip, tee

def checkSorted(it):
    pre, it = tee(it)
    next(it)
    return all(i >= j for i, j in izip(pre, it))
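Usage sketch (it returns a bool rather than yielding values; like the version above, it treats "sorted" as non-increasing):

print(checkSorted(iter([5, 4, 3, 2, 1])))   # True
print(checkSorted(iter([5, 4, 3, 2, 3])))   # False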
There are the map, reduce, and filter functions that do the kind of work list comprehensions do.
What is the difference between passing an xrange argument or a range argument to each of these functions?
For example:
map(someFunc, range(1,11))
OR
map(someFunc,xrange(1,11))
In Python 2, range returns an actual list, while xrange returns an xrange object that generates the values lazily as you iterate over it. Since map only cares that it can iterate over a sequence, both are applicable, though xrange uses less memory. In Python 3, range takes over the role of xrange (and xrange is gone), so your only option is range, which is lazy as well. (You can use list(range(10)) to generate the actual list if so desired.)
The modulus operator returns 0, a falsy value, for the even numbers: 2 mod 2 is 0. As such, if you use x % 2 directly as the filter predicate, the even numbers are filtered out.
Actually, the map, reduce, and filter functions are an alternative to list comprehensions. The term "list comprehension" refers to the specific syntactic construct; anything that doesn't look like a list comprehension is necessarily not a list comprehension.
They actually predate list comprehensions, and are borrowed from other languages. But most of those languages have ways of constructing anonymous functions which are more powerful than Python's lambda, so functions such as these are more natural. List comprehensions are considered a more natural fit to Python.
The difference between range and xrange is that range actually constructs a list containing the numbers that form the range, whereas an xrange is an object that knows its endpoints and can iterate over itself without ever actually constructing the full list of values in memory. xrange(1,1000) takes up no more space than xrange(1,5), whereas range(1,1000) generates a 999-element list.
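A rough way to see this on Python 2 (exact numbers vary by platform, so treat these as illustrative):

import sys
print(sys.getsizeof(range(1, 1000)))    # thousands of bytes: a real list of 999 references
print(sys.getsizeof(xrange(1, 1000)))   # a few dozen bytes: just start/stop/step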
If range() and xrange() were implemented in the Python language, they would look something like this:
def xrange(start, stop=None, step=1):
    if stop is None:
        stop, start = start, 0
    i = start
    while i < stop:
        yield i
        i += step

def range(start, stop=None, step=1):
    if stop is None:
        stop, start = start, 0
    acc, i = [], start
    while i < stop:
        acc.append(i)
        i += step
    return acc
As you can see, range() creates a list and returns it, while xrange() lazily generates the values in a range on demand. This has the advantage that the overhead for creating a list is avoided in xrange(), since it doesn't store the values or create a list object. For most instances, there is no difference in the end result.
One obvious difference is that xrange() doesn't support slicing:
>>> range(10)[2:5]
[2, 3, 4]
>>> xrange(10)[2:5]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sequence index must be integer, not 'slice'
>>>
It does, however, support indexing:
>>> xrange(11)[10]
10
As an exercise, I implemented the map function using recursion in Python as follows:
# map function that applies the function f to every element of list l and returns the new list
def map(l, f):
    if l == []:
        return []
    else:
        return [f(l[0])] + map(l[1:], f)
I am aware of the fact that Python does not support tail recursion optimization, but how would I go about writing the same function in a tail-recursive manner?
Please help.
Thank you.
Tail recursion means you must be directly returning the result of a recursive call, with no further manipulation.
The obvious recursion in map is to compute the function on one element of the list, then use a recursive call to process the rest of the list. However, you need to combine the result of processing one element with the result of processing the rest of the list, which requires an operation after the recursive call.
A very common pattern for avoiding that is to move the combination inside the recursive call; you pass in the processed element as an argument, and make it part of map's responsibility to do the combining as well.
def map(l, f):
    if l == []:
        return []
    else:
        return map(l[1:], f, f(l[0]))
Now it's tail recursive! But it's also obviously wrong. In the tail recursive call, we're passing 3 arguments, but map only takes two arguments. And then there's the question of what do we do with the 3rd value. In the base case (when the list is empty), it's obvious: return a list containing the information passed in. In the recursive case, we're computing a new value, and we have this extra parameter passed in from the top, and we have the recursive call. The new value and the extra parameter need to be rolled up together to be passed into the recursive call, so that the recursive call can be tail recursive. All of which suggests the following:
def map(l, f):
    return map_acc(l, f, [])

def map_acc(l, f, a):
    if l == []:
        return a
    else:
        b = a + [f(l[0])]
        return map_acc(l[1:], f, b)
Which can be expressed more concisely and Pythonically as other answers have shown, without resorting to a separate helper function. But this shows a general way of turning non-tail-recursive functions into tail recursive functions.
In the above, a is called an accumulator. The general idea is to move the operations you normally do after a recursive call into the next recursive call, by wrapping up the work outer calls have done "so far" and passing that on in an accumulator.
If map can be thought of as meaning "call f on every element of l, and return a list of the results", map_acc can be thought of as meaning "call f on every element of l, returning a list of the results combined with a, a list of results already produced".
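A quick sanity check of the accumulator version above:

print(map([1, 2, 3], lambda x: x * 10))   # [10, 20, 30]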
This will be an example of the implementation of the built-in function map in tail recursion:
def map(func, ls, res=None):
    if res is None:
        res = []
    if not ls:
        return res
    res.append(func(ls[0]))
    return map(func, ls[1:], res)
But it will not solve the problem that Python has no support for TRE (tail-recursion elimination), which means that the stack frame of every call is still kept around.
This seems to be tail recursive:
def map(l, f, x=[]):
    if l == []:
        return x
    else:
        return map(l[1:], f, x + [f(l[0])])
Or in more compact form:
def map(l, f, x=[]):
    return l and map(l[1:], f, x + [f(l[0])]) or x
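Both forms behave the same from the caller's point of view. (The x=[] mutable default is harmless here only because x is never mutated in place; x + [...] always builds a new list.)

print(map([1, 2, 3], lambda n: n + 1))   # [2, 3, 4]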
Not really an answer, sorry, but another way to implement map is to write it in terms of a fold. If you try, you'll find that it only comes out "right" with foldr; using foldl gives you a reversed list. Unfortunately, foldr isn't tail recursive, while foldl is. This suggests that there's something more "natural" about rev_map (a map that returns a reversed list). Unfortunately I am not well enough educated to take this any further (I suspect you might be able to generalise this to say that there is no solution that doesn't use an accumulator, but I personally don't see how to make the argument).
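To make that concrete, here is a small sketch of the foldl idea (the names foldl and rev_map are mine, not from the answer above): foldl itself is tail recursive, and building the list with it naturally comes out reversed.

def foldl(f, acc, xs):
    if not xs:
        return acc
    return foldl(f, f(acc, xs[0]), xs[1:])   # the recursive call is in tail position

def rev_map(f, xs):
    return foldl(lambda acc, x: [f(x)] + acc, [], xs)

print(rev_map(lambda n: n * n, [1, 2, 3]))   # [9, 4, 1], reversed as described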