Is there a powerful replace function in Python, something equivalent to replace(x, l, y) in R?
e.g.
x = [0,0,0,0,0,0,0,0,0, 0]
l = [True,False,True,True,False,False,False,False,True, False]
y = [5, 6, 7, 8]
The number of values in y matches the number of True in l. In R,
replace(x, l, y) will have x replaced by y corresponding to the True positions in l.
Is there a similar function in Python I can call? If not, any suggestions on how to make it work?
Thanks much in advance!
Since the number of Trues equals the number of elements in y, we can create an iterator over y and then use a list comprehension to build the result. We take the next element from y if the corresponding element in l is True; otherwise we take the element from x:
iter_y = iter(y)
[next(iter_y) if l_item else x_item for l_item, x_item in zip(l, x)]
This outputs:
[5, 0, 6, 7, 0, 0, 0, 0, 8, 0]
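The same idea can be wrapped in a small helper. This is only a sketch: the name replace_where is hypothetical, and the length check is an added assumption about how mismatched inputs should be treated (the original one-liner would raise StopIteration or silently ignore leftover values instead).

```python
def replace_where(x, mask, values):
    """Return a copy of x with the items at True positions in mask
    replaced, in order, by the items of values (hypothetical helper)."""
    values = list(values)
    if sum(1 for flag in mask if flag) != len(values):
        raise ValueError("values must have one item per True in mask")
    it = iter(values)
    return [next(it) if flag else item for flag, item in zip(mask, x)]


x = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
l = [True, False, True, True, False, False, False, False, True, False]
y = [5, 6, 7, 8]
print(replace_where(x, l, y))  # [5, 0, 6, 7, 0, 0, 0, 0, 8, 0]
```

Unlike R's replace, this returns a new list and leaves x untouched.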
I encountered a bug in some code. The (incorrect) line was similar to:
[x for x in range(3, 6) and range(0, 10)]
print x
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
(the correct way of writing this statement is not part of the question)
Wondering what someList and someOtherList does, I experimented. It seems to only ever set the result to the last parameter passed:
x = range(0,3) # [0, 1, 2]
y = range(3, 10) # [3, 4, 5, 6, 7, 8, 9]
z = range(4, 8) # [4, 5, 6, 7]
print x and y # [3, 4, 5, 6, 7, 8, 9]
print y and x # [0, 1, 2]
print z and y and x # [0, 1, 2]
I would assume that this is an unintentional consequence of being able to write something that is useful, but I'm not really seeing how the semantics of the "and" operator are being applied here.
From experience, python won't apply operators to things that don't support those operators (i.e. it spits out a TypeError), so I'd expect an error for and-ing something that should never be and-ed. The fact I don't get an error is telling me that I'm missing... something.
What am I missing? Why is list and list an allowed operation? And is there anything "useful" that can be done with this behaviour?
The and operator checks the truthiness of the first operand. If the truthiness is False, then the first argument is returned, otherwise the second.
So if both range(..)s contain elements, the last operand is returned. So your expression:
[x for x in range(3, 6) and range(0, 10)]
is equivalent to:
[x for x in range(0, 10)]
You can however use an if to filter:
[x for x in range(3, 6) if x in range(0, 10)]
In Python 2.7, however, range(..) constructs a list, so the membership test x in range(0, 10) is a linear scan and not terribly efficient. Since we know that x is an int, we can do the bounds check like:
[x for x in range(3, 6) if 0 <= x < 10]
Of course in this case the check is useless since every element in range(3,6) is in the range of range(0,10).
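To make the rule for and concrete, here is a minimal sketch. The helper and_like is a hypothetical name used only for illustration; it mimics the result of a and b for operands that have already been evaluated (the real operator also short-circuits, i.e. it does not evaluate the second operand when the first is falsy).

```python
def and_like(a, b):
    """Mimic `a and b` for already-evaluated operands (illustrative only)."""
    return b if a else a


print([0, 1, 2] and [3, 4])   # [3, 4]  (first operand is truthy)
print([] and [3, 4])          # []      (first operand is falsy)
print(and_like([0, 1, 2], [3, 4]) == ([0, 1, 2] and [3, 4]))  # True
```

A non-empty list is truthy, which is why two non-empty lists always yield the second one.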
My function needs to take in a list of integers and a certain integer and return the numbers in the list that are smaller than the specific integer. Any advice?
def smallerThanN(intList,intN):
    y=0
    newlist=[]
    list1=intList
    for x in intList:
        if int(x) < int(intN):
            print(intN)
            y+=1
            newlist.append(x)
    return newlist
Use a list comprehension with an "if" filter to extract those values in the list less than the specified value:
def smaller_than(sequence, value):
    return [item for item in sequence if item < value]
I recommend giving the variables more generic names because this code will work for any sequence regardless of the type of sequence's items (provided of course that comparisons are valid for the type in question).
>>> smaller_than([1,2,3,4,5,6,7,8], 5)
[1, 2, 3, 4]
>>> smaller_than('abcdefg', 'd')
['a', 'b', 'c']
>>> smaller_than(set([1.34, 33.12, 1.0, 11.72, 10]), 10)
[1.0, 1.34]
N.B. There is already a similar answer, however, I'd prefer to declare a function instead of binding a lambda expression.
integers_list = [4, 6, 1, 99, 45, 76, 12]
smallerThan = lambda x,y: [i for i in x if i<y]
print(smallerThan(integers_list, 12))
Output:
[4, 6, 1]
def smallerThanN(intList, intN):
    return [x for x in intList if x < intN]
>>> smallerThanN([1, 4, 10, 2, 7], 5)
[1, 4, 2]
I need to verify if a list is a subset of another - a boolean return is all I seek.
Is testing equality on the smaller list after an intersection the fastest way to do this? Performance is of utmost importance given the number of datasets that need to be compared.
Adding further facts based on discussions:
Will either of the lists be the same for many tests? Yes, one of them is a static lookup table.
Does it need to be a list? It does not - the static lookup table can be anything that performs best. The dynamic one is a dict from which we extract the keys to perform a static lookup on.
What would be the optimal solution given the scenario?
>>> a = [1, 3, 5]
>>> b = [1, 3, 5, 8]
>>> c = [3, 5, 9]
>>> set(a) <= set(b)
True
>>> set(c) <= set(b)
False
>>> a = ['yes', 'no', 'hmm']
>>> b = ['yes', 'no', 'hmm', 'well']
>>> c = ['sorry', 'no', 'hmm']
>>>
>>> set(a) <= set(b)
True
>>> set(c) <= set(b)
False
Use set.issubset
Example:
a = {1,2}
b = {1,2,3}
a.issubset(b) # True
a = {1,2,4}
b = {1,2,3}
a.issubset(b) # False
The performant function Python provides for this is set.issubset. It does have a few restrictions that make it unclear if it's the answer to your question, however.
A list may contain items multiple times and has a specific order. A set does not. Additionally, sets only work on hashable objects.
Are you asking about subset or subsequence (which means you'll want a string search algorithm)? Will either of the lists be the same for many tests? What are the datatypes contained in the list? And for that matter, does it need to be a list?
Your other post intersect a dict and list made the types clearer and did get a recommendation to use dictionary key views for their set-like functionality. In that case it was known to work because dictionary keys behave like a set (so much so that before we had sets in Python we used dictionaries). One wonders how the issue got less specific in three hours.
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]
all(x in two for x in one)
Explanation: Generator creating booleans by looping through list one checking if that item is in list two. all() returns True if every item is truthy, else False.
Another advantage is that all() returns False at the first missing element rather than having to process every item.
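One caveat with the expression above is that x in two scans the list for every item. A possible variation, sketched here under the assumption that the items are hashable, builds a set from two once so that each membership test is a hash lookup. The function name is_subset is made up for this example.

```python
def is_subset(one, two):
    """True if every item of `one` occurs in `two` (duplicates ignored)."""
    lookup = set(two)                      # built once; items must be hashable
    return all(x in lookup for x in one)   # stops at the first missing item


print(is_subset([1, 2, 3], [9, 8, 5, 3, 2, 1]))  # True
print(is_subset([1, 2, 4], [9, 8, 5, 3, 2, 1]))  # False
```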
Assuming the items are hashable
>>> from collections import Counter
>>> not Counter([1, 2]) - Counter([1])
False
>>> not Counter([1, 2]) - Counter([1, 2])
True
>>> not Counter([1, 2, 2]) - Counter([1, 2])
False
If you don't care about duplicate items, e.g. [1, 2, 2] and [1, 2], then just use:
>>> set([1, 2, 2]).issubset([1, 2])
True
Is testing equality on the smaller list after an intersection the fastest way to do this?
.issubset will be the fastest way to do it. Checking the length before testing issubset will not improve speed because you still have O(N + M) items to iterate through and check.
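For the scenario in the question (a static lookup table and a dict whose keys are checked), a sketch could look like the following. It assumes Python 3, where dict.keys() returns a set-like view; the names LOOKUP, record and other and their contents are invented for illustration.

```python
# Hypothetical data: a static lookup table and dicts whose keys are checked.
LOOKUP = frozenset(["yes", "no", "hmm", "well"])   # built once, reused

record = {"yes": 1, "no": 2}
other = {"sorry": 3, "no": 4}

print(record.keys() <= LOOKUP)   # True
print(other.keys() <= LOOKUP)    # False
print(LOOKUP.issuperset(other))  # False (iterating a dict yields its keys)
```

Building the frozenset once avoids converting the static table on every test.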
One more solution would be to use an intersection.
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]
set(one).intersection(set(two)) == set(one)
If one is a subset of two, the intersection of the two sets is equal to set(one).
(OR)
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]
set(one) & (set(two)) == set(one)
Set operations are inappropriate for lists that may contain duplicates, because converting to a set discards the duplicates and can give wrong answers.
For example:
a = [1, 3, 3, 3, 5]
b = [1, 3, 3, 4, 5]
set(b) > set(a)
has no meaning. It does evaluate to True, but that is not a correct containment answer, because the set comparison only sees 1,3,5 versus 1,3,4,5, while a contains three 3s and b only two. You must take all duplicates into account.
Instead, you must count the occurrences of each item and check that each count in the containing list is greater than or equal to the count in the candidate list. This is not very expensive: counting is linear in the lengths of the lists, so it needs neither O(N^2) operations nor a sort.
#!/usr/bin/env python
from collections import Counter

def containedInFirst(a, b):
    # True if every item of b occurs in a at least as many times as in b.
    a_count = Counter(a)
    b_count = Counter(b)
    for key in b_count:
        if key not in a_count:
            return False
        if b_count[key] > a_count[key]:
            return False
    return True

a = [1, 3, 3, 3, 5]
b = [1, 3, 3, 4, 5]
print("b in a:", containedInFirst(a, b))
a = [1, 3, 3, 3, 4, 4, 5]
b = [1, 3, 3, 4, 5]
print("b in a:", containedInFirst(a, b))
Then running this you get:
$ python contained.py
b in a: False
b in a: True
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]
set(x in two for x in one) == set([True])
If every element of one is in two, the generator (x in two for x in one) yields only True values, so set(x in two for x in one) contains a single element, True. (For an empty one the set is empty, so the comparison is False.)
Pardon me if I am late to the party. ;)
To check whether set A is a subset of set B, Python has A.issubset(B) and A <= B. They work only on sets and work great, BUT the referenced documentation does not state the complexity of the internal implementation. Reference: https://docs.python.org/2/library/sets.html#set-objects
I came up with an algorithm to check whether list A is a subset of list B, with the following remarks.
To reduce the complexity of finding a subset, I sort both lists first before comparing elements.
Sorting lets me break out of the loop when the element B[j] of the second list is greater than the element A[i] of the first list.
last_index_j is used to resume the loop over list B where it last left off. This avoids restarting the comparisons from index 0 of list B in subsequent iterations, which would be unnecessary.
The complexity is O(n log n) for sorting each list and O(n) for the subset check: O(n log n) + O(n log n) + O(n) = O(n log n).
The code has lots of print statements to show what is going on at each iteration of the loop. These are meant for understanding only.
Check if one list is subset of another list
is_subset = True

A = [9, 3, 11, 1, 7, 2]
B = [11, 4, 6, 2, 15, 1, 9, 8, 5, 3]

print(A, B)

# skip checking if list A has elements more than list B
if len(A) > len(B):
    is_subset = False
else:
    # complexity of sorting using quicksort or merge sort: O(n ln n)
    # use best sorting algorithm available to minimize complexity
    A.sort()
    B.sort()

    print(A, B)

    # complexity: O(n^2)
    # for a in A:
    #     if a not in B:
    #         is_subset = False
    #         break

    # complexity: O(n)
    is_found = False
    last_index_j = 0

    for i in range(len(A)):
        # reset here too, so an exhausted B cannot reuse the previous result
        is_found = False
        for j in range(last_index_j, len(B)):
            is_found = False

            print("i=" + str(i) + ", j=" + str(j) + ", " + str(A[i]) + "==" + str(B[j]) + "?")

            if B[j] <= A[i]:
                if A[i] == B[j]:
                    is_found = True
                    last_index_j = j
            else:
                is_found = False
                break

            if is_found:
                print("Found: " + str(A[i]))
                last_index_j = last_index_j + 1
                break
            else:
                print("Not found: " + str(A[i]))

        if is_found == False:
            is_subset = False
            break

print("subset" if is_subset else "not subset")
Output
[9, 3, 11, 1, 7, 2] [11, 4, 6, 2, 15, 1, 9, 8, 5, 3]
[1, 2, 3, 7, 9, 11] [1, 2, 3, 4, 5, 6, 8, 9, 11, 15]
i=0, j=0, 1==1?
Found: 1
i=1, j=1, 2==2?
Found: 2
i=2, j=2, 3==3?
Found: 3
i=3, j=3, 7==4?
Not found: 7
i=3, j=4, 7==5?
Not found: 7
i=3, j=5, 7==6?
Not found: 7
i=3, j=6, 7==8?
not subset
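For comparison, the sort-then-scan idea can be condensed into a function without the tracing output. This is a sketch rather than the code above: the name is_sorted_subset is hypothetical, it sorts copies instead of the original lists, and it assumes the items can be ordered with <.

```python
def is_sorted_subset(a, b):
    """True if every item of `a` can be matched to a distinct item of `b`.

    Sorts copies of both lists, then walks them with two indices.
    """
    a, b = sorted(a), sorted(b)
    j = 0
    for item in a:
        # Skip items of b that are too small to match.
        while j < len(b) and b[j] < item:
            j += 1
        if j == len(b) or b[j] != item:
            return False
        j += 1   # consume the matched item of b
    return True


print(is_sorted_subset([9, 3, 11, 1, 7, 2], [11, 4, 6, 2, 15, 1, 9, 8, 5, 3]))  # False (7 is missing)
print(is_sorted_subset([9, 3, 11, 1, 2], [11, 4, 6, 2, 15, 1, 9, 8, 5, 3]))     # True
```

Because each matched item of b is consumed, duplicates in a need matching duplicates in b.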
Below code checks whether a given set is a "proper subset" of another set
def is_proper_subset(subset, superset):
    # the first parameter is named subset so it does not shadow the built-in set
    return all(x in superset for x in subset) and len(subset) < len(superset)
In Python 3.5 and later you can use [*set()][index] to get an element. It is a much slower solution than the other methods.
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]
result = set(x in two for x in one)
[*result][0] == True
or just with len and set
len(set(a+b)) == len(set(a))
Here is how I know if one list is a subset of another one, the sequence matters to me in my case.
def is_subset(list_long,list_short):
    short_length = len(list_short)
    subset_list = []
    for i in range(len(list_long)-short_length+1):
        subset_list.append(list_long[i:i+short_length])
    if list_short in subset_list:
        return True
    else:
        return False
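The sliding-window check above can also be written without building the intermediate list of slices, so that it stops at the first matching window. A minimal sketch, assuming both arguments are lists; contains_sublist is a name chosen for this example.

```python
def contains_sublist(list_long, list_short):
    """True if list_short appears as a contiguous run inside list_long."""
    n = len(list_short)
    return any(list_long[i:i + n] == list_short
               for i in range(len(list_long) - n + 1))


print(contains_sublist([1, 2, 3, 4], [2, 3]))  # True
print(contains_sublist([1, 2, 3, 4], [3, 2]))  # False
print(contains_sublist([1, 2, 3, 4], []))      # True
```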
Most of the solutions consider that the lists do not have duplicates. In case your lists do have duplicates you can try this:
def isSubList(subList,mlist):
    uniqueElements=set(subList)
    for e in uniqueElements:
        if subList.count(e) > mlist.count(e):
            return False
    # It is sublist
    return True
It ensures that the sublist contains no element that is missing from the list, and no more copies of a shared element than the list has.
lst=[1,2,2,3,4]
sl1=[2,2,3]
sl2=[2,2,2]
sl3=[2,5]
print(isSubList(sl1,lst)) # True
print(isSubList(sl2,lst)) # False
print(isSubList(sl3,lst)) # False
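Calling count() for every unique element rescans both lists each time. If the items are hashable, the same check can be sketched with collections.Counter, which tallies each list once. The function name is_sub_multiset is hypothetical.

```python
from collections import Counter


def is_sub_multiset(sub_list, main_list):
    """True if no item occurs more often in sub_list than in main_list."""
    available = Counter(main_list)   # a missing key counts as 0
    return all(available[item] >= count
               for item, count in Counter(sub_list).items())


lst = [1, 2, 2, 3, 4]
print(is_sub_multiset([2, 2, 3], lst))  # True
print(is_sub_multiset([2, 2, 2], lst))  # False
print(is_sub_multiset([2, 5], lst))     # False
```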
Since no one has considered comparing two strings, here's my proposal.
You may of course want to check that the pipe ("|") does not occur in either list, and perhaps choose another character automatically, but you get the idea.
Using an empty string as separator is not a solution since the numbers can have several digits ([12,3] != [1,23])
def issublist(l1,l2):
    # Wrap every item in separators so that [2] is not found inside [12, 3].
    return ''.join('|%s|' % i for i in l1) in ''.join('|%s|' % i for i in l2)
If you are asking whether listA itself is an item of listB (that is, listB is a list of lists that contains listA), then:
>>> listA in listB
If you are asking whether each element of listA occurs at least as many times in listB as it does in listA, try:
all(listA.count(item) <= listB.count(item) for item in listA)
I'm looking for a Python function similar to nubBy in Haskell, which removes duplicate but with a different equality test.
The function would take the equality test and the list as parameters, and would return the list of elements with no duplicates.
Example:
In [1]: remove(lambda x, y: x+y == 12, [2, 3, 6, 9, 10])
Out[1]: [2,3,6]
For example, here (2 and 10) and (9 and 3) are duplicates. I don't care if the output is [10, 9, 6] or [2, 3, 6].
Is there an equivalent built-in function in Python? If not, what is the best way to efficiently implement it?
There is no built-in method (as the use case is rather esoteric), but you can easily write one:
def removeDups(duptest, iterable):
    res = []
    for e in iterable:
        if not any(duptest(e, r) for r in res):
            res.append(e)
    return res
Now, in the console:
>>> removeDups(lambda x,y: x+y == 10, [2,3,5,7,8])
[2, 3, 5]
>>> removeDups(lambda x,y: x+y == 10, [2,3,6,7,8])
[2, 3, 6]
>>> removeDups(lambda x, y: x+y == 12, [2, 3, 6, 9, 10])
[2, 3, 6]
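removeDups compares each element with everything kept so far, so it is quadratic in the worst case. When the test is a genuine equivalence that can be expressed through a hashable key (this does not apply to x+y == 12, which is not an equivalence relation), a linear-time variant is possible. The following is a sketch under that assumption; remove_dups_by_key is a hypothetical name.

```python
def remove_dups_by_key(key, iterable):
    """Keep the first item for each distinct key(item); keys must be hashable."""
    seen = set()
    result = []
    for item in iterable:
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result


print(remove_dups_by_key(str.lower, ["a", "A", "b", "B", "a"]))  # ['a', 'b']
print(remove_dups_by_key(lambda n: n % 3, [1, 2, 3, 4, 5]))     # [1, 2, 3]
```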
This remove function will allow you to specify any pairwise equality function. It will keep the last of each set of duplicates.
values = [2,3,5,7,8]

def addstoten(item, other):
    return item + other == 10

def remove(eq, values):
    values = tuple(values)
    for index, item in enumerate(values):
        if not any(eq(item, other) for other in values[index + 1:]):
            yield item

print(list(remove(addstoten, values)))  # [5, 7, 8]