Equivalent for inject() in Python?

In Ruby, I'm used to using Enumerable#inject for going through a list or other structure and coming back with some conclusion about it. For example,
[1,3,5,7].inject(true) {|allOdd, n| allOdd && n % 2 == 1}
to determine if every element in the array is odd. What would be the appropriate way to accomplish the same thing in Python?

To determine if every element is odd, I'd use all()
def is_odd(x):
    return x % 2 == 1

result = all(is_odd(x) for x in [1, 3, 5, 7])
In general, however, Ruby's inject is most like Python's reduce():
result = reduce(lambda x,y: x and y%2==1, [1,3,5,7], True)
all() is preferred in this case because it will be able to escape the loop once it finds a False-like value, whereas the reduce solution would have to process the entire list to return an answer.
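For example, here is a quick sketch of that difference; the noisy helper is just something made up to make the iteration visible:

from functools import reduce

def noisy(values):
    for v in values:
        print("pulling", v)   # announce each element as it is consumed
        yield v

# all() bails out as soon as it sees a falsey value:
all(x % 2 == 1 for x in noisy([1, 2, 3, 5, 7]))   # pulls 1 and 2, then stops

# reduce() has no way to escape early, so it consumes the whole iterable:
reduce(lambda acc, n: acc and n % 2 == 1, noisy([1, 2, 3, 5, 7]), True)
# pulls all five values even though the answer was settled at 2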

Sounds like reduce in Python, or foldr/foldl from Haskell.
reduce(lambda x, y: x and y % 2 == 1, [1, 3, 5], True)

I think you probably want to use all, which is less general than inject. reduce is the Python equivalent of inject, though.
all(n % 2 == 1 for n in [1, 3, 5, 7])

Related

Can we method chain on lists?

I come from Ruby, where you can method chain very easily. Let's look at an example: if I want to select all even nums from a list and add 5 to each, I would do something like this in Ruby.
nums = [...]
nums.select {|x| x % 2 == 0 }.map { |x| x + 5 }
In Python that becomes
nums = [...]
list(map(lambda x: x + 5, filter(lambda x: x % 2 == 0, nums)))
The Python syntax looks horrible. I tried to Google and didn't really find any good answers. All I saw was how you can achieve something like this with custom objects but nothing to process lists this way. Am I missing something?
When in a debugging console, it used to be extremely helpful to get some ActiveRecord objects in an array and just chain methods to process the entities while debugging. With Python, it almost seems like too much work.
In Ruby, every enumerable object includes the Enumerable interface, which is why we get all of those helpful methods like you mention. But in Python, there's no common superclass for iterables. An iterable is literally defined as "a thing which supports __iter__", and while there is an abstract class called Iterable which pretends to be a superclass of all iterables, it doesn't actually provide any methods and it doesn't sit in the inheritance chain of all iterables (it overrides the behavior of isinstance and issubclass using the magic of dunder methods, the same way you can override + by writing __add__).
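As a small illustration of that isinstance behaviour (the Counter class here is made up purely for the example):

from collections.abc import Iterable

class Counter:
    # no special base class; it just defines __iter__
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return iter(range(self.n))

print(isinstance(Counter(3), Iterable))  # True, via the dunder magic
print(issubclass(Counter, Iterable))     # True, despite no inheritance
print(hasattr(Counter(3), "map"))        # False - Iterable brings no methods along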
The Alakazam library implements exactly this feature. (Disclosure: I am the creator and maintainer of this library, but it does exactly what you're asking for, so I'll mention it here)
Alakazam provides the Alakazam class, which wraps any Python iterable and provides, as methods, all of the built-in Python sequence methods, all of the itertools module, and some other useful stream-oriented methods that aren't included in Python by default. Consider your example from above
nums.select {|x| x % 2 == 0 }.map { |x| x + 5 }
In Python, that looks like
list(map(lambda x: x + 5, filter(lambda x: x % 2 == 0, nums)))
With Alakazam, that looks like
zz.of(nums).filter(lambda x: x % 2 == 0).map(lambda x: x + 5).list()
or, using Alakazam's lambda syntax
zz.of(nums).filter(_1 % 2 == 0).map(_1 + 5).list()
Whenever reasonable, Alakazam's methods like filter and map are lazy to match Python's behavior, so we still need to write list() at the end to consume the iterable and produce a single list result.
As noted in comments, this Ruby code:
nums = [...]
nums.select {|x| x % 2 == 0 }.map { |x| x + 5 }
Note: why not use #even??
nums = [...]
nums.select {|x| x.even? }.map { |x| x + 5 }
Or even:
nums = [...]
nums.select(&:even?).map { |x| x + 5 }
But nitpicks aside, this can be expressed in Python using a list comprehension, which is very clean.
nums = [...]
[x + 5 for x in nums if x % 2 == 0]
Now a list comprehension eagerly generates a full list. Imagine an original list like [1, 2, 3, 4, 5, 6, 7, 8]. The list comprehension would give us [2, 4, 6, 8]. The data set is trivial.
But imagine that nums is list(range(100_000_000)). Not a trivial data set. Applying this list comprehension to the whole thing will take a lot of time, even if we only need the first five values.
But a generator expression lets us lazily generate the values we need.
from itertools import islice
nums = range(100_000_000)
evens_plus_five = (x + 5 for x in nums if x % 2 == 0)
list(islice(evens_plus_five, 0, 5, 1))
As suggested in comments, this lazy evaluation advantage on large data sets can be gained in Ruby quite readily using #lazy and ranges.
nums = (1..100_000_000)
nums.lazy.select(&:even?).map { |x| x + 5 }.take(5).to_a
And if you're using Ruby 3, let's make that block even cleaner.
nums = (1..100_000_000)
nums.lazy.select(&:even?).map { _1 + 5 }.take(5).to_a

Python for element in list matching condition

I have found myself frequently iterating over a subset of a list according to some condition that is only needed for that loop, and would like to know if there is a more efficient way to write this.
Take for example the list:
foo = [1, 2, 3, 4, 5]
If I wanted to build a for loop that iterates through every element greater than 2, I would typically do something like this:
for x in [y for y in foo if y > 2]:
    # Do something
However this seems redundant, and isn't extremely readable in my opinion. I don't think it is particularly inefficient, especially when using a generator instead as #iota pointed out below, however I would much rather be able to write something like:
for x in foo if x > 2:
    # Do something
Ideally avoiding the need for a second for and a whole other temporary variable. Is there a syntax for this? I use Python3 but I assume any such syntax would likely have a Python2 variant as well.
Note: This is obviously a very simple example that could be better handled by something like range() or sorting & slicing, so assume foo is any arbitrary list that must be filtered by any arbitrary condition
Not quite the syntax you are after but you can also use filter using a lambda:
for x in filter(lambda y: y > 2, foo):
    print(x)
Or using a function for readability's sake:
def greaterthantwo(y):
    return y > 2

for x in filter(greaterthantwo, foo):
    print(x)
filter also has the advantage of being lazy (it returns an iterator), so it doesn't evaluate all the values if you exit the loop early (as opposed to building a new list up front).
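A quick sketch to make that early exit visible (the print calls are only there for illustration):

def noisy_gt_two(y):
    print("testing", y)
    return y > 2

foo = [1, 2, 3, 4, 5]

for x in filter(noisy_gt_two, foo):
    print("got", x)
    break   # leaving early: 4 and 5 are never tested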
There's filter as discussed in salparadise's answer, but you can also use generators:
def filterbyvalue(seq, value):
    for el in seq:
        if el > value:   # the condition here can be as complex as you need
            yield el

for x in filterbyvalue(foo, 2):
    # Do something
It may look bigger, but it is useful when you have to do something more complex than a simple filter, and it also performs better than first creating a list comprehension and then looping over it.
I would do it like this:
For efficient code:
data = (x for x in foo if x > 2)
print(next(data))
For more readable code:
[print(x) for x in foo if x > 2]

Reverse a list without using built-in functions

I'm using Python 3.5.
As part of a problem, I'm trying to design a function that takes a list as input and reverses it. So if x = [a, b, c] the function would make x = [c, b, a].
The problem is, I'm not allowed to use any built-in functions, and it has got me stuck. My initial thought was the following loop inside a function:
for revert in range(1, len(x) + 1):
    y.append(x[-revert])
And it works. But the problem is I'm using len(x), which I believe is a built-in function, correct?
So I searched around and have made the following very simple code:
y = x[::-1]
Which does exactly what I wanted, but it just seems almost too simple/easy and I'm not sure whether "::" counts as a function.
So I was wondering if anyone had any hints/ideas how to manually design said function? It just seems really hard when you can't use any built-in functions and it has me stuck for quite some time now.
range and len are both built-in functions. Since list methods are accepted, you could do this with insert. It is reeaallyy slow* but it does the job for small lists without using any built-ins:
def rev(l):
    r = []
    for i in l:
        r.insert(0, i)
    return r
By continuously inserting at the zero-th position you end up with a reversed version of the input list:
>>> print(rev([1, 2, 3, 4]))
[4, 3, 2, 1]
Doing:
def rev(l):
    return l[::-1]
could also be considered a solution. ::-1 isn't a function, it's a slice (note that plain :: gives a different result, a copy of the list), and [], again, just invokes a list method. Also, in contrast to insert, it is faster and way more readable; just make sure you're able to understand and explain it. A nice explanation of how it works can be found in this S.O. answer.
*Reeaaalllyyyy slow; see juanpa.arrivillaga's answer for a cool plot and an append/pop approach, and take a look at the in-place reverse done in Yoav Glazner's answer.
:: is not a function, it's Python slice syntax; and [] isn't a function either.
How can you check whether :: and [] are functions or not? Simple:
import dis
a = [1,2]
dis.dis(compile('a[::-1]', '', 'eval'))
1 0 LOAD_NAME 0 (a)
3 LOAD_CONST 0 (None)
6 LOAD_CONST 0 (None)
9 LOAD_CONST 2 (-1)
12 BUILD_SLICE 3
15 BINARY_SUBSCR
16 RETURN_VALUE
If :: or [] were functions, you would find a CALL_FUNCTION opcode among the Python instructions executed by the a[::-1] expression. There isn't one, so they aren't functions.
Look at what the Python instructions look like when you actually call a function, say the list() function:
>>> dis.dis(compile('list()', '', 'eval'))
1 0 LOAD_NAME 0 (list)
3 CALL_FUNCTION 0
6 RETURN_VALUE
So, basically
def rev(f):
    return f[::-1]
works fine. But I think you should do something like Jim suggested in his answer if this question is homework or was set by your teacher; you can still add this quicker way as a side note.
If your teacher complains about the [::-1] notation, show them the example I gave you.
Another way ( just for completeness :) )
def another_reverse(lst):
    new_lst = lst.copy()  # make a copy if you don't want to ruin lst...
    new_lst.reverse()     # notice! this will reverse it in place
    return new_lst
Here's a solution that doesn't use built-in functions but relies on list methods. It reverses in place, as implied by your specification:
>>> x = [1,2,3,4]
>>> def reverse(seq):
... temp = []
... while seq:
... temp.append(seq.pop())
... seq[:] = temp
...
>>> reverse(x)
>>> x
[4, 3, 2, 1]
>>>
ETA
Jim, your answer using insert at position 0 was driving me nuts! That solution is quadratic time! You can use append and pop with a temporary list to achieve linear time using simple list methods. See the timing plot (reverse is in blue, rev is green).
If using seq[:] = temp feels a little bit like "cheating", we could always loop over temp and append every item into seq, as sketched below; the time complexity would still be linear, but probably slower since it isn't using the C-based internals.
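That variation would look roughly like this; it is the same idea, just with an explicit loop instead of the slice assignment:

def reverse(seq):
    temp = []
    while seq:
        temp.append(seq.pop())   # drain seq into temp in reverse order
    for item in temp:
        seq.append(item)         # refill seq with the reversed items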
Your example that works:
y = x[::-1]
uses Python slices notation which is not a function in the sense that I assume you're requesting. Essentially :: acts as a separator. A more verbose version of your code would be:
y = x[len(x):None:-1]
or
y = x[start:end:step]
I probably wouldn't complain that Python makes your life really, really easy.
Edit to be super pedantic. Someone could argue that calling [] at all is using an inbuilt python function because it's really syntactical sugar for the method __getitem__().
x.__getitem__(0) == x[0]
And using :: does make use of the slice() object.
x.__getitem__(slice(len(x), None, -1)) == x[::-1]
But... if you were to argue this, anything you write in python would be using inbuilt python functions.
Another way for completeness, range() takes an optional step parameter that will allow you to step backwards through the list:
def reverse_list(l):
    return [l[i] for i in range(len(l)-1, -1, -1)]
The most Pythonic and efficient way to achieve this is list slicing. And since you mentioned you cannot use any built-in function, it completely satisfies your requirement. For example:
>>> def reverse_list(list_obj):
... return list_obj[::-1]
...
>>> reverse_list([1, 3, 5 , 3, 7])
[7, 3, 5, 3, 1]
Just iterate the list from right to left to get the items:
a = [1, 2, 3, 4]

def reverse_the_list(a):
    reversed_list = []
    for i in range(0, len(a)):
        reversed_list.append(a[len(a) - i - 1])
    return reversed_list

new_list = reverse_the_list(a)
print(new_list)

Python fluent filter, map, etc

I love Python. However, one thing that bugs me a bit is that I don't know how to format functional activities in a fluid manner like I can in JavaScript.
example (randomly created on the spot): Can you help me convert this to python in a fluent looking manner?
var even_set = [1, 2, 3, 4, 5]
  .filter(function(x) { return x % 2 === 0; })
  .map(function(x) {
    console.log(x); // prints it for fun
    return x;
  })
  .reduce(function(num_set, val) {
    num_set[val] = true;
    return num_set;
  }, {});
I'd like to know if there are fluid options? Maybe a library.
In general, I've been using list comprehensions for most things, but it's a real problem if I want to print.
e.g., how can I print every even number between 1 and 5 in Python 2.x using a list comprehension? (Python 3 has print() as a function, but Python 2 doesn't.) It's also a bit annoying that a list is constructed and returned; I'd rather just use a for loop.
Update: Here's yet another library/option: one that I adapted from a gist and is available on PyPI as infixpy:
from infixpy import *
a = (Seq(range(1, 51))
     .map(lambda x: x * 4)
     .filter(lambda x: x <= 170)
     .filter(lambda x: len(str(x)) == 2)
     .filter(lambda x: x % 20 == 0)
     .enumerate()
     .map(lambda x: 'Result[%d]=%s' % (x[0], x[1]))
     .mkstring(' .. '))
print(a)
pip3 install infixpy
Older
I am looking now at an answer that strikes closer to the heart of the question:
fluentpy https://pypi.org/project/fluentpy/ :
Here is the kind of method chaining for collections that a streams programmer (in scala, java, others) will appreciate:
import fluentpy as _

(
    _(range(1, 50 + 1))
    .map(_.each * 4)
    .filter(_.each <= 170)
    .filter(lambda each: len(str(each)) == 2)
    .filter(lambda each: each % 20 == 0)
    .enumerate()
    .map(lambda each: 'Result[%d]=%s' % (each[0], each[1]))
    .join(',')
    .print()
)
And it works fine:
Result[0]=20,Result[1]=40,Result[2]=60,Result[3]=80
I am just now trying this out. It will be a very good day today if this were working as it is shown above.
Update: Look at this: maybe python can start to be more reasonable as one-line shell scripts:
python3 -m fluentpy "lib.sys.stdin.readlines().map(str.lower).map(print)"
Here is it in action on command line:
$ echo -e "Hello World line1\nLine 2\nLine 3\nGoodbye" |
  python3 -m fluentpy "lib.sys.stdin.readlines().map(str.lower).map(print)"
hello world line1
line 2
line 3
goodbye
There is an extra newline that should be cleaned up - but the gist of it is useful (to me anyways).
Generators, iterators, and itertools give added powers to chaining and filtering actions. But rather than remember (or look up) rarely used things, I gravitate toward helper functions and comprehensions.
For example in this case, take care of the logging with a helper function:
def echo(x):
    print(x)
    return x
Selecting even values is easy with the if clause of a comprehension. And since the final output is a dictionary, use that kind of comprehension (here s = [1, 2, 3, 4, 5]):
In [118]: d={echo(x):True for x in s if x%2==0}
2
4
In [119]: d
Out[119]: {2: True, 4: True}
or to add these values to an existing dictionary, use update.
new_set.update({echo(x):True for x in s if x%2==0})
another way to write this is with an intermediate generator:
{y:True for y in (echo(x) for x in s if x%2==0)}
Or combine the echo and filter in one generator
def even(s):
    for x in s:
        if x % 2 == 0:
            print(x)
            yield x
followed by a dict comp using it:
{y:True for y in even(s)}
Comprehensions are the fluent python way of handling filter/map operations.
Your code would be something like:
def evenize(input_list):
    return [x for x in input_list if x % 2 == 0]
Comprehensions don't work well with side effects like console logging, so do that in a separate loop. Chaining function calls isn't really that common an idiom in python. Don't expect that to be your bread and butter here. Python libraries tend to follow the "alter state or return a value, but not both" pattern. Some exceptions exist.
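For instance, a rough sketch of the OP's example with the side effect kept in its own loop (the variable names here are made up):

nums = [1, 2, 3, 4, 5]
evens = [x for x in nums if x % 2 == 0]   # the filter/map step
for x in evens:
    print(x)                              # the console.log step, done separately
even_set = {x: True for x in evens}       # the reduce step as a dict comprehension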
Edit: On the plus side, python provides several flavors of comprehensions, which are awesome:
List comprehension: [x for x in range(3)] == [0, 1, 2]
Set comprehension: {x for x in range(3)} == {0, 1, 2}
Dict comprehension: {x: x**2 for x in range(3)} == {0: 0, 1: 1, 2: 4}
Generator comprehension (or generator expression): (x for x in range(3)) == <generator object <genexpr> at 0x10fc7dfa0>
With the generator comprehension, nothing has been evaluated yet, so it is a great way to prevent blowing up memory usage when pipelining operations on large collections.
For instance, if you try to do the following, even with python3 semantics for range:
for number in [x**2 for x in range(10000000000000000)]:
    print(number)
you will get a memory error trying to build the initial list. On the other hand, change the list comprehension into a generator comprehension:
for number in (x**2 for x in range(10**20)):
    print(number)
and there is no memory issue (it just takes forever to run). What happens is that the range object gets built (it only stores the start, stop and step values: 0, 10**20, and 1), and then the for-loop begins iterating over the genexp object. Effectively, the for-loop calls
GENEXP_ITERATOR = iter(genexp)
number = next(GENEXP_ITERATOR)
# run the loop one time
number = next(GENEXP_ITERATOR)
# run the loop one time
# etc.
(Note the GENEXP_ITERATOR object is not visible at the code level)
next(GENEXP_ITERATOR) tries to pull the first value out of genexp, which then starts iterating on the range object, pulls out one value, squares it, and yields out the value as the first number. The next time the for-loop calls next(GENEXP_ITERATOR), the generator expression pulls out the second value from the range object, squares it and yields it out for the second pass on the for-loop. The first set of numbers are no longer held in memory.
This means that no matter how many items in the generator comprehension, the memory usage remains constant. You can pass the generator expression to other generator expressions, and create long pipelines that never consume large amounts of memory.
import os
from pathlib import Path  # standing in for the older path.py module used in the original

def pipeline(filenames):
    basepath = Path('/usr/share/stories')
    fullpaths = (basepath / fn for fn in filenames)
    realfiles = (fn for fn in fullpaths if os.path.exists(fn))
    openfiles = (open(fn) for fn in realfiles)

    def read_and_close(file):
        output = file.read(100)
        file.close()
        return output

    prefixes = (read_and_close(file) for file in openfiles)
    noncliches = (prefix for prefix in prefixes
                  if not prefix.startswith('It was a dark and stormy night'))
    return {prefix[:32]: prefix for prefix in noncliches}
At any time, if you need a data structure for something, you can pass the generator comprehension to another comprehension type (as in the last line of this example), at which point, it will force the generators to evaluate all the data they have left, but unless you do that, the memory consumption will be limited to what happens in a single pass over the generators.
The biggest dealbreaker to the code you wrote is that Python doesn't support multiline anonymous functions. In Python 2 the return value of filter or map is a list (in Python 3 they return lazy iterators), so you can continue to chain them if you so desire. However, you'll either have to define the functions ahead of time, or use a lambda.
Arguments against doing this notwithstanding, here is a translation into Python of your JS code.
from __future__ import print_function
from functools import reduce

def print_and_return(x):
    print(x)
    return x

def is_even(x):
    return x % 2 == 0

def add_to_dict(d, x):
    d[x] = True
    return d

even_set = list(reduce(add_to_dict,
                       map(print_and_return,
                           filter(is_even, [1, 2, 3, 4, 5])), {}))
It should work on both Python 2 and Python 3.
There's a library that already does exactly what you are looking for, i.e. the fluid syntax and lazy evaluation, with the order of operations the same as how it's written, as well as plenty of other goodies like multiprocessed or multithreaded Map/Reduce.
It's named pyxtension, and it's production-ready and maintained on PyPI.
Your code would be rewritten in this form:
from pyxtension.streams import stream

def console_log(x):
    print(x)
    return x

even_set = stream([1, 2, 3, 4, 5])\
    .filter(lambda x: x % 2 == 0)\
    .map(console_log)\
    .reduce(lambda num_set, val: num_set.__setitem__(val, True))
Replace map with mpmap for multiprocessed map, or fastmap for multithreaded map.
We can use Pyterator for this (disclaimer: I am the author).
We define a function that prints and returns (which I believe you could omit completely, however):
def print_and_return(x):
    print(x)
    return x
then
from pyterator import iterate

even_dict = (
    iterate([1, 2, 3, 4, 5])
    .filter(lambda x: x % 2 == 0)
    .map(print_and_return)
    .map(lambda x: (x, True))
    .to_dict()
)
# {2: True, 4: True}
where I have converted your reduce into a sequence of tuples that can be converted into a dictionary.

sort() in Python using cmp

I am trying to sort a list, moving all zeros to the end of the list.
example: [0,1,0,2,3,0,4]->[1,2,3,4,0,0,0]
and I see someone code it in 1 line
list.sort(cmp=lambda a,b:-1 if b==0 else 0)
But I don't understand what inside the parentheses mean.
Could anyone tell me? Thank you.
Preface:
Sort a list according to the normal comparison:
some_list.sort()
Supply a custom comparator:
some_list.sort(cmp=my_comparator)
A lambda function:
x = lambda a, b: a - b
# is roughly the same as
def x(a, b):
    return a - b
An if-else-expression:
value = truthy_case if condition else otherwise
# is roughly the same as
if condition:
    value = truthy_case
else:
    value = otherwise
The line list.sort(cmp=lambda a,b:-1 if b==0 else 0) itself:
Now, the condition in the comparator is whether b==0, if so indicate that b has a bigger value than a (the sign of the result is negative), otherwise indicate that the values compare the same (the sign is zero).
Whilst Python's list.sort() is stable, this code is not sane, because the comparator needs to test a, too, not only b. A proper implementation would use the key argument:
some_list.sort(key=lambda a: 0 if a == 0 else -1)
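For example, applied to the list from the question, this should give:
>>> some_list = [0, 1, 0, 2, 3, 0, 4]
>>> some_list.sort(key=lambda a: 0 if a == 0 else -1)
>>> some_list
[1, 2, 3, 4, 0, 0, 0]
Because the sort is stable, the non-zero elements keep their original relative order.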
Fixed list.sort(cmp=...) implementation:
If you want to use list.sort(cmp=...) (you don't) or if you are just curious, this is a sane implementation:
some_list.sort(cmp=lambda a, b: 0 if a == b else
                                +1 if a == 0 else
                                -1 if b == 0 else 0)
But notice:
In Py3.0, the cmp parameter was removed entirely (as part of a larger effort to simplify and unify the language, eliminating the conflict between rich comparisons and the __cmp__ methods).
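If you really do want to keep a cmp-style comparator on Python 3, functools.cmp_to_key can wrap it; a minimal sketch using the comparator above:

from functools import cmp_to_key

some_list = [0, 1, 0, 2, 3, 0, 4]
some_list.sort(key=cmp_to_key(lambda a, b: 0 if a == b else
                                           +1 if a == 0 else
                                           -1 if b == 0 else 0))
print(some_list)  # [1, 2, 3, 4, 0, 0, 0]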
An alternative:
Sorting a list is in O(𝘯 log 𝘯). I do not know if for this simple problem the code runs faster, but I wouldn't think so. An O(𝘯) solution is filtering:
new_list = [x for x in some_list if x != 0]
new_list.extend([0] * (len(some_list) - len(new_list)))
The difference will probably only matter for quite long lists, though.
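As a quick check on the example list from the question, the filtering approach gives:

some_list = [0, 1, 0, 2, 3, 0, 4]
new_list = [x for x in some_list if x != 0]
new_list.extend([0] * (len(some_list) - len(new_list)))
print(new_list)   # [1, 2, 3, 4, 0, 0, 0]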
>>> sorted(l, key=lambda x:str(x) if x == 0 else x)
[1, 3, 4, 8, 0, 0, 0]
Guess what's happening here? I am exploiting the fact that Python 2 orders integers before strings, so I converted each 0 into '0'. (Note this trick only works in Python 2; in Python 3, comparing an int with a str raises a TypeError.)
Here's the proof.
>>> ll = [3,2,3, '1', '3', '0']
>>> sorted(ll)
[2, 3, 3, '0', '1', '3']
You should be able to answer this yourself; here is a plan:
The ternary (conditional) expression is described here:
https://docs.python.org/3/reference/expressions.html?highlight=ternary%20operator#conditional-expressions
You can find descriptions of many other expressions in the same document:
https://docs.python.org/3/reference/expressions.html
Q: What does lambda mean?
Please spend just 5 days and read the official Python tutorial, which was originally written by Guido van Rossum:
https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions
