Remove all elements that satisfy a predicate from a set - python

Given a mutable set of objects,
A = set(1,2,3,4,5,6)
I can construct a new set containing only those objects that don't satisfy a predicate ...
B = set(x for x in A if not (x % 2 == 0))
... but how do I modify A in place to contain only those objects? If possible, do this in linear time, without constructing O(n)-sized scratch objects, and without removing anything from A, even temporarily, that doesn't satisfy the predicate.
(Integers are used here only to simplify the example. In the actual code they are Future objects and I'm trying to pull out those that have already completed, which is expected to be a small fraction of the total.)
Note that it is not, in general, safe in Python to mutate an object that you are iterating over. I'm not sure of the precise rules for sets (the documentation doesn't make any guarantee either way).
I only need an answer for 3.4+, but will take more general answers.

(Not actually O(1) due to implementation details, but I'm loathe to delete it as it's quite clean.)
Use symmetric_difference_update.
>>> A = {1,2,3,4,5,6}
>>> A.symmetric_difference_update(x for x in A if not (x % 2))
>>> A
{1, 3, 5}

With an horrible time complexity (quadratic), but in O(1) space:
>>> A = {1,2,3,4,5,6}
>>> while modified:
... modified = False
... for x in A:
... if not x%2:
... A.remove(x)
... modified = True
... break
...
>>> A
{1, 3, 5}

On the very specific use case you showed, there is a way to do this in O(1) space, but it doesn't generalize very well to sets containing anything other than int objects:
A = {1, 2, 3, 4, 5, 6}
for i in range(min(A), max(A) + 1):
if i % 2 != 0:
A.discard(i)
It also wastes time since it will check numbers that aren't even in the set. For anything other than int objects, I can't yet think of a way to do this without creating an intermediate set or container of some sort.
For a more general solution, it would be better to simply initially construct your set using the predicate (if you don't need to use the set for anything else first). Something like this:
def items():
# maybe this is a file or a stream or something,
# where ever your initial values are coming from.
for thing in source:
yield thing
def predicate(item):
return bool(item)
A = set(item for item in items() if predicate(item))

to maintain the use use of memory constant this is the only thing that come to my mind
def filter_Set(predicate,origen:set) -> set:
resul = set()
while origen:
elem = origen.pop()
if predicate( elem ):
resul.add( elem )
return resul
def filter_Set_inplace(predicate,origen:set):
resul = set()
while origen:
elem = origen.pop()
if predicate( elem ):
resul.add( elem )
while resul:
origen.add(resul.pop())
with this functions I move elems from one set to the other keeping only those that satisfied the predicate

Related

Test if all values of a dictionary are equal - when value is unknown

I have 2 dictionaries:
the values in each dictionary should all be equal.
BUT I don't know what that number will be...
dict1 = {'xx':A, 'yy':A, 'zz':A}
dict2 = {'xx':B, 'yy':B, 'zz':B}
N.B. A does not equal B
N.B. Both A and B are actually strings of decimal numbers (e.g. '-2.304998') as they have been extracted from a text file
I want to create another dictionary - that effectively summarises this data - but only if all the values in each dictionary are the same.
i.e.
summary = {}
if dict1['xx'] == dict1['yy'] == dict1['zz']:
summary['s'] = dict1['xx']
if dict2['xx'] == dict2['yy'] == dict2['zz']:
summary['hf'] = dict2['xx']
Is there a neat way of doing this in one line?
I know it is possible to create a dictionary using comprehensions
summary = {k:v for (k,v) in zip(iterable1, iterable2)}
but am struggling with both the underlying for loop and the if statement...
Some advice would be appreciated.
I have seen this question, but the answers all seem to rely on already knowing the value being tested (i.e. are all the entries in the dictionary equal to a known number) - unless I am missing something.
sets are a solid way to go here, but just for code golf purposes here's a version that can handle non-hashable dict values:
expected_value = next(iter(dict1.values())) # check for an empty dictionary first if that's possible
all_equal = all(value == expected_value for value in dict1.values())
all terminates early on a mismatch, but the set constructor is well enough optimized that I wouldn't say that matters without profiling on real test data. Handling non-hashable values is the main advantage to this version.
One way to do this would be to leverage set. You know a set of an iterable has a length of 1 if there is only one value in it:
if len(set(dct.values())) == 1:
summary[k] = next(iter(dct.values()))
This of course, only works if the values of your dictionary are hashable.
While we can use set for this, doing so has a number of inefficiencies when the input is large. It can take memory proportional to the size of the input, and it always scans the whole input, even when two distinct values are found early. Also, the input has to be hashable.
For 3-key dicts, this doesn't matter much, but for bigger ones, instead of using set, we can use itertools.groupby and see if it produces multiple groups:
import itertools
groups = itertools.groupby(dict1.values())
# Consume one group if there is one, then see if there's another.
next(groups, None)
if next(groups, None) is None:
# All values are equal.
do_something()
else:
# Unequal values detected.
do_something_else()
Except for readability, I don't care for all the answers involving set or .values. All of these are always O(N) in time and memory. In practice it can be faster, although it depends on the distribution of values.
Also because set employs hashing operations, you may also have a hefty large constant multiplier to your time cost. And your values have to hashable, when a test for equality is all that's needed.
It is theoretically better to take the first value from the dictionary and search for the first example in the remaining values that is not equal to.
set might be quicker than the solution below because its workings are may reduce to C implementations.
def all_values_equal(d):
if len(d)<=1: return True # Treat len0 len1 as all equal
i = d.itervalues()
firstval = i.next()
try:
# Incrementally generate all values not equal to firstval
# .next raises StopIteration if empty.
(j for j in i if j!=firstval).next()
return False
except StopIteration:
return True
print all_values_equal({1:0, 2:1, 3:0, 4:0, 5:0}) # False
print all_values_equal({1:0, 2:0, 3:0, 4:0, 5:0}) # True
print all_values_equal({1:"A", 2:"B", 3:"A", 4:"A", 5:"A"}) # False
print all_values_equal({1:"A", 2:"A", 3:"A", 4:"A", 5:"A"}) # True
In the above:
(j for j in i if j!=firstval)
is equivalent to:
def gen_neq(i, val):
"""
Give me the values of iterator i that are not equal to val
"""
for j in i:
if j!=val:
yield j
I found this solution, which I find quite a bit I combined another solution found here: enter link description here
user_min = {'test':1,'test2':2}
all(value == list(user_min.values())[0] for value in user_min.values())
>>> user_min = {'test':1,'test2':2}
>>> all(value == list(user_min.values())[0] for value in user_min.values())
False
>>> user_min = {'test':2,'test2':2}
>>> all(value == list(user_min.values())[0] for value in user_min.values())
True
>>> user_min = {'test':'A','test2':'B'}
>>> all(value == list(user_min.values())[0] for value in user_min.values())
False
>>> user_min = {'test':'A','test2':'A'}
>>> all(value == list(user_min.values())[0] for value in user_min.values())
True
Good for a small dictionary, but I'm not sure about a large dictionary, since we get all the values to choose the first one

Python fluent filter, map, etc

I love python. However, one thing that bugs me a bit is that I don't know how to format functional activities in a fluid manner like a can in javascript.
example (randomly created on the spot): Can you help me convert this to python in a fluent looking manner?
var even_set = [1,2,3,4,5]
.filter(function(x){return x%2 === 0;})
.map(function(x){
console.log(x); // prints it for fun
return x;
})
.reduce(function(num_set, val) {
num_set[val] = true;
}, {});
I'd like to know if there are fluid options? Maybe a library.
In general, I've been using list comprehensions for most things but it's a real problem if I want to print
e.g., How can I print every even number between 1 - 5 in python 2.x using list comprehension (Python 3 print() as a function but Python 2 it doesn't). It's also a bit annoying that a list is constructed and returned. I'd rather just for loop.
Update Here's yet another library/option : one that I adapted from a gist and is available on pipy as infixpy:
from infixpy import *
a = (Seq(range(1,51))
.map(lambda x: x * 4)
.filter(lambda x: x <= 170)
.filter(lambda x: len(str(x)) == 2)
.filter( lambda x: x % 20 ==0)
.enumerate() Ï
.map(lambda x: 'Result[%d]=%s' %(x[0],x[1]))
.mkstring(' .. '))
print(a)
pip3 install infixpy
Older
I am looking now at an answer that strikes closer to the heart of the question:
fluentpy https://pypi.org/project/fluentpy/ :
Here is the kind of method chaining for collections that a streams programmer (in scala, java, others) will appreciate:
import fluentpy as _
(
_(range(1,50+1))
.map(_.each * 4)
.filter(_.each <= 170)
.filter(lambda each: len(str(each))==2)
.filter(lambda each: each % 20 == 0)
.enumerate()
.map(lambda each: 'Result[%d]=%s' %(each[0],each[1]))
.join(',')
.print()
)
And it works fine:
Result[0]=20,Result[1]=40,Result[2]=60,Result[3]=80
I am just now trying this out. It will be a very good day today if this were working as it is shown above.
Update: Look at this: maybe python can start to be more reasonable as one-line shell scripts:
python3 -m fluentpy "lib.sys.stdin.readlines().map(str.lower).map(print)"
Here is it in action on command line:
$echo -e "Hello World line1\nLine 2\Line 3\nGoodbye"
| python3 -m fluentpy "lib.sys.stdin.readlines().map(str.lower).map(print)"
hello world line1
line 2
line 3
goodbye
There is an extra newline that should be cleaned up - but the gist of it is useful (to me anyways).
Generators, iterators, and itertools give added powers to chaining and filtering actions. But rather than remember (or look up) rarely used things, I gravitate toward helper functions and comprehensions.
For example in this case, take care of the logging with a helper function:
def echo(x):
print(x)
return x
Selecting even values is easy with the if clause of a comprehension. And since the final output is a dictionary, use that kind of comprehension:
In [118]: d={echo(x):True for x in s if x%2==0}
2
4
In [119]: d
Out[119]: {2: True, 4: True}
or to add these values to an existing dictionary, use update.
new_set.update({echo(x):True for x in s if x%2==0})
another way to write this is with an intermediate generator:
{y:True for y in (echo(x) for x in s if x%2==0)}
Or combine the echo and filter in one generator
def even(s):
for x in s:
if x%2==0:
print(x)
yield(x)
followed by a dict comp using it:
{y:True for y in even(s)}
Comprehensions are the fluent python way of handling filter/map operations.
Your code would be something like:
def evenize(input_list):
return [x for x in input_list if x % 2 == 0]
Comprehensions don't work well with side effects like console logging, so do that in a separate loop. Chaining function calls isn't really that common an idiom in python. Don't expect that to be your bread and butter here. Python libraries tend to follow the "alter state or return a value, but not both" pattern. Some exceptions exist.
Edit: On the plus side, python provides several flavors of comprehensions, which are awesome:
List comprehension: [x for x in range(3)] == [0, 1, 2]
Set comprehension: {x for x in range(3)} == {0, 1, 2}
Dict comprehension: ` {x: x**2 for x in range(3)} == {0: 0, 1: 1, 2: 4}
Generator comprehension (or generator expression): (x for x in range(3)) == <generator object <genexpr> at 0x10fc7dfa0>
With the generator comprehension, nothing has been evaluated yet, so it is a great way to prevent blowing up memory usage when pipelining operations on large collections.
For instance, if you try to do the following, even with python3 semantics for range:
for number in [x**2 for x in range(10000000000000000)]:
print(number)
you will get a memory error trying to build the initial list. On the other hand, change the list comprehension into a generator comprehension:
for number in (x**2 for x in range(1e20)):
print(number)
and there is no memory issue (it just takes forever to run). What happens is the range object gets built (which only stores the start, stop and step values (0, 1e20, and 1)) the object gets built, and then the for-loop begins iterating over the genexp object. Effectively, the for-loop calls
GENEXP_ITERATOR = `iter(genexp)`
number = next(GENEXP_ITERATOR)
# run the loop one time
number = next(GENEXP_ITERATOR)
# run the loop one time
# etc.
(Note the GENEXP_ITERATOR object is not visible at the code level)
next(GENEXP_ITERATOR) tries to pull the first value out of genexp, which then starts iterating on the range object, pulls out one value, squares it, and yields out the value as the first number. The next time the for-loop calls next(GENEXP_ITERATOR), the generator expression pulls out the second value from the range object, squares it and yields it out for the second pass on the for-loop. The first set of numbers are no longer held in memory.
This means that no matter how many items in the generator comprehension, the memory usage remains constant. You can pass the generator expression to other generator expressions, and create long pipelines that never consume large amounts of memory.
def pipeline(filenames):
basepath = path.path('/usr/share/stories')
fullpaths = (basepath / fn for fn in filenames)
realfiles = (fn for fn in fullpaths if os.path.exists(fn))
openfiles = (open(fn) for fn in realfiles)
def read_and_close(file):
output = file.read(100)
file.close()
return output
prefixes = (read_and_close(file) for file in openfiles)
noncliches = (prefix for prefix in prefixes if not prefix.startswith('It was a dark and stormy night')
return {prefix[:32]: prefix for prefix in prefixes}
At any time, if you need a data structure for something, you can pass the generator comprehension to another comprehension type (as in the last line of this example), at which point, it will force the generators to evaluate all the data they have left, but unless you do that, the memory consumption will be limited to what happens in a single pass over the generators.
The biggest dealbreaker to the code you wrote is that Python doesn't support multiline anonymous functions. The return value of filter or map is a list, so you can continue to chain them if you so desire. However, you'll either have to define the functions ahead of time, or use a lambda.
Arguments against doing this notwithstanding, here is a translation into Python of your JS code.
from __future__ import print_function
from functools import reduce
def print_and_return(x):
print(x)
return x
def isodd(x):
return x % 2 == 0
def add_to_dict(d, x):
d[x] = True
return d
even_set = list(reduce(add_to_dict,
map(print_and_return,
filter(isodd, [1, 2, 3, 4, 5])), {}))
It should work on both Python 2 and Python 3.
There's a library that already does exactly what you are looking for, i.e. the fluid syntaxt, lazy evaluation and the order of operations is the same with how it's written, as well as many more other good stuff like multiprocess or multithreading Map/Reduce.
It's named pyxtension and it's prod ready and maintained on PyPi.
Your code would be rewritten in this form:
from pyxtension.strams import stream
def console_log(x):
print(x)
return x
even_set = stream([1,2,3,4,5])\
.filter(lambda x:x%2 === 0)\
.map(console_log)\
.reduce(lambda num_set, val: num_set.__setitem__(val,True))
Replace map with mpmap for multiprocessed map, or fastmap for multithreaded map.
We can use Pyterator for this (disclaimer: I am the author).
We define the function that prints and returns (which I believe you can omit completely however).
def print_and_return(x):
print(x)
return x
then
from pyterator import iterate
even_dict = (
iterate([1,2,3,4,5])
.filter(lambda x: x%2==0)
.map(print_and_return)
.map(lambda x: (x, True))
.to_dict()
)
# {2: True, 4: True}
where I have converted your reduce into a sequence of tuples that can be converted into a dictionary.

Test if all N variables are different

I want to make a condition where all selected variables are not equal.
My solution so far is to compare every pair which doesn't scale well:
if A!=B and A!=C and B!=C:
I want to do the same check for multiple variables, say five or more, and it gets quite confusing with that many. What can I do to make it simpler?
Create a set and check whether the number of elements in the set is the same as the number of variables in the list that you passed into it:
>>> variables = [a, b, c, d, e]
>>> if len(set(variables)) == len(variables):
... print("All variables are different")
A set doesn't have duplicate elements so if you create a set and it has the same number of elements as the number of elements in the original list then you know all elements are different from each other.
If you can hash your variables (and, uh, your variables have a meaningful __hash__), use a set.
def check_all_unique(li):
unique = set()
for i in li:
if i in unique: return False #hey I've seen you before...
unique.add(i)
return True #nope, saw no one twice.
O(n) worst case. (And yes, I'm aware that you can also len(li) == len(set(li)), but this variant returns early if a match is found)
If you can't hash your values (for whatever reason) but can meaningfully compare them:
def check_all_unique(li):
li.sort()
for i in range(1,len(li)):
if li[i-1] == li[i]: return False
return True
O(nlogn), because sorting. Basically, sort everything, and compare pairwise. If two things are equal, they should have sorted next to each other. (If, for some reason, your __cmp__ doesn't sort things that are the same next to each other, 1. wut and 2. please continue to the next method.)
And if ne is the only operator you have....
import operator
import itertools
li = #a list containing all the variables I must check
if all(operator.ne(*i) for i in itertools.combinations(li,2)):
#do something
I'm basically using itertools.combinations to pair off all the variables, and then using operator.ne to check for not-equalness. This has a worst-case time complexity of O(n^2), although it should still short-circuit (because generators, and all is lazy). If you are absolutely sure that ne and eq are opposites, you can use operator.eq and any instead.
Addendum: Vincent wrote a much more readable version of the itertools variant that looks like
import itertools
lst = #a list containing all the variables I must check
if all(a!=b for a,b in itertools.combinations(lst,2)):
#do something
Addendum 2: Uh, for sufficiently large datasets, the sorting variant should possibly use heapq. Still would be O(nlogn) worst case, but O(n) best case. It'd be something like
import heapq
def check_all_unique(li):
heapq.heapify(li) #O(n), compared to sorting's O(nlogn)
prev = heapq.heappop(li)
for _ in range(len(li)): #O(n)
current = heapq.heappop(li) #O(logn)
if current == prev: return False
prev = current
return True
Put the values into a container type. Then just loop trough the container, comparing each value. It would take about O(n^2).
pseudo code:
a[0] = A; a[1] = B ... a[n];
for i = 0 to n do
for j = i + 1 to n do
if a[i] == a[j]
condition failed
You can enumerate a list and check that all values are the first occurrence of that value in the list:
a = [5, 15, 20, 65, 48]
if all(a.index(v) == i for i, v in enumerate(a)):
print "all elements are unique"
This allows for short-circuiting once the first duplicate is detected due to the behaviour of Python's all() function.
Or equivalently, enumerate a list and check if there are any values which are not the first occurrence of that value in the list:
a = [5, 15, 20, 65, 48]
if not any(a.index(v) != i for i, v in enumerate(a)):
print "all elements are unique"

How to remove and return element in python list

In python you can do list.pop(i) which removes and returns the element in index i, but is there a built in function like list.remove(e) where it removes and returns the first element equal to e?
Thanks
I mean, there is list.remove, yes.
>>> x = [1,2,3]
>>> x.remove(1)
>>> x
[2, 3]
I don't know why you need it to return the removed element, though. You've already passed it to list.remove, so you know what it is... I guess if you've overloaded __eq__ on the objects in the list so that it doesn't actually correspond to some reasonable notion of equality, you could have problems. But don't do that, because that would be terrible.
If you have done that terrible thing, it's not difficult to roll your own function that does this:
def remove_and_return(lst, item):
return lst.pop(lst.index(item))
Is there a builtin? No. Probably because if you already know the element you want to remove, then why bother returning it?1
The best you can do is get the index, and then pop it. Ultimately, this isn't such a big deal -- Chaining 2 O(n) algorithms is still O(n), so you still scale roughly the same ...
def extract(lst, item):
idx = lst.index(item)
return lst.pop(idx)
1Sure, there are pathological cases where the item returned might not be the item you already know... but they aren't important enough to warrant a new method which takes only 3 lines to write yourself :-)
Strictly speaking, you would need something like:
def remove(lst, e):
i = lst.index(e)
# error if e not in lst
a = lst[i]
lst.pop(i)
return a
Which would make sense only if e == a is true, but e is a is false, and you really need a instead of e.
In most case, though, I would say that this suggest something suspicious in your code.
A short version would be :
a = lst.pop(lst.index(e))

idiomatic python, manage default arguments in functions

I usually encounter that most of the people manage default arguments values in functions or methods like this:
def foo(L=None):
if L is None:
L = []
However i see other people doing something like:
def foo(L=None):
L = L or []
I don't know if i a missing something but, why most of the people use the first approach instead the second? Are they equally the same thing?, seems that the second is clearer and shorter.
They are not equal.
First approach checks exactly, that given arg L is None.
Second checks, that L is true in python way. In python, if you check in condition the list, rules are the following:
List is empty, then it is False
True otherwise
So what's the difference between mentioned approaches? Compare this code.
First:
def foo(L=None):
if L is None:
L = []
L.append('x')
return L
>>> my_list = []
>>> foo(my_list)
>>> my_list
['x']
Second:
def foo(L=None):
L = L or []
L.append('x')
return L
>>> my_list = []
>>> foo(my_list)
>>> my_list
[]
So first didn't create a new list, it used the given list. But second creates the new one.
The two are not equivalent if the argument is a false-y value. This doesn't matters often, as many false-y values aren't suitable arguments to most functions where you'd do this. Still, there are conceivable situations where it can matter. For example, if a function is supposed to fill a dictionary (creating a new one if none is given), and someone passes an empty ordered dictionary instead, then the latter approach would incorrectly return an ordinary dictionary.
That's not my primary reason for always using the is None version though. I prefer it as it is more explicit and the fact that or returns one of its operands isn't intuitive to me. I like to forget about it as long as I can ;-) The extra line is not a problem, this is relatively rare.
Maybe they don't know of the second one? I tend to use the first.
Actually there is a difference. The second one will let L = [] if you pass anything that evaluates to Boolean false. 0 empty string or others. The first will only do that if no L is passed or it was passed as None.

Categories

Resources