Python find elements in array A but not in array B

I'm trying to find the difference between two arrays:
arrayA = np.array(['A1', 'A2', 'A3'])
arrayB = np.array(['A1', 'A2', 'A3', 'A4', 'A5', 'A6'])
I'm trying to get the elements of arrayB that are not in arrayA:
difference = ['A4', 'A5', 'A6']
How can I do this? Thank you.

Use NumPy's setdiff1d. It returns the unique values of its first argument that are not in its second, so arrayB goes first:
np.setdiff1d(arrayB, arrayA)
Also, is there any special reason this needs to be a NumPy array? You could simply use sets and the minus operator: set(arrayB) - set(arrayA)

[i for i in arrayB if i not in arrayA]

You can use the python set features for this:
import numpy as np
a = np.array(['A1', 'A2', 'A3'])
b = np.array(['A1', 'A2', 'A3', 'A4', 'A5', 'A6'])
print(set(b)-set(a))
Output:
{'A6', 'A5', 'A4'}
Or just comprehension:
import numpy as np
a = np.array(['A1', 'A2', 'A3'])
b = np.array(['A1', 'A2', 'A3', 'A4', 'A5', 'A6'])
print([i for i in b if i not in a])
Output:
['A4', 'A5', 'A6']

As pointed out by this great answer, you can use the np.setdiff1d() method:
import numpy as np
arrayA = np.array(['A1', 'A2', 'A3'])
arrayB = np.array(['A1', 'A2', 'A3', 'A4', 'A5', 'A6'])
print(np.setdiff1d(arrayB, arrayA))
Output
['A4' 'A5' 'A6']
But the order of the elements will not be kept, as the result will always be sorted in ascending order. Observe:
import numpy as np
arrayA = np.array(['A1', 'A2', 'A3'])
arrayB = np.array(['A1', 'A2', 'A3', 'A4', 'A6', 'A5']) # Swapped 5 and 6
print(np.setdiff1d(arrayB, arrayA))
Output:
['A4' 'A5' 'A6']
If you want to keep the order, you can use the np.in1d() method:
import numpy as np
arrayA = np.array(['A1', 'A2', 'A3'])
arrayB = np.array(['A1', 'A2', 'A3', 'A4', 'A6', 'A5']) # Swapped 5 and 6
print(arrayB[~np.in1d(arrayB, arrayA)])
Output:
['A4' 'A6' 'A5']
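In recent NumPy releases, np.isin() is the recommended replacement for np.in1d(). Here is a minimal sketch of the same order-preserving filter, assuming NumPy 1.13 or later is installed. The names mask and difference are only illustrative:

```python
import numpy as np

arrayA = np.array(['A1', 'A2', 'A3'])
arrayB = np.array(['A1', 'A2', 'A3', 'A4', 'A6', 'A5'])  # 5 and 6 swapped

# True where an element of arrayB also occurs in arrayA
mask = np.isin(arrayB, arrayA)

# Keep only the elements of arrayB that are NOT in arrayA, in original order
difference = arrayB[~mask]
print(difference.tolist())
```

Because the boolean mask is applied to arrayB itself, the result keeps the order of arrayB instead of being sorted.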

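You can use sets. Note that a set is unordered, so the order of the resulting list is not guaranteed:
difference = list(set(arrayB) - set(arrayA))
Output (the order may vary):
['A4', 'A6', 'A5']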
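If the order of the remaining elements matters, one option is to use a set only for the membership test and build the result with a list comprehension. This is a sketch that assumes plain Python lists (a NumPy array can be converted with .tolist() first); ordered_difference is a hypothetical helper name:

```python
def ordered_difference(b, a):
    """Return the items of b that are not in a, keeping b's order."""
    exclude = set(a)  # set membership tests are O(1) on average
    return [item for item in b if item not in exclude]


list_a = ['A1', 'A2', 'A3']
list_b = ['A1', 'A2', 'A3', 'A4', 'A6', 'A5']
print(ordered_difference(list_b, list_a))
```

Unlike a pure set difference, this also keeps duplicates that occur in b.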

Related

Split data in list based on condition

I have following list :
data = ['A1', 'C3', 'B2', 'A2', 'D3', 'C2', 'A3', 'D2', 'C1', 'B1', 'D1', 'B3']
I want to split the list such that
split1 = ['A1', 'C3', 'B2', 'A2', 'C2', 'A3', 'C1', 'B1', 'B3']
split2 = ['D3', 'D2', 'D1']
The constraint is that items with the same prefix (A, B, etc.) must not end up in different lists. The data can be split in any ratio, such as 50-50 or 80-20.

Here you go:
import numpy as np
data = np.array(['A1', 'C3', 'B2', 'A2', 'D3', 'C2', 'A3', 'D2', 'C1', 'B1', 'D1', 'B3'])
# define some condition
condition = ['B', 'D']
boolean_selection = [np.any([ c in d for c in condition]) for d in data]
split1 = data[boolean_selection]
split2 = data[np.logical_not(boolean_selection)]
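The same idea can be sketched without NumPy. This version assumes the prefix is always the first character of each item (the answer above instead tests whether the letter occurs anywhere in the string); split_by_prefix is a hypothetical helper name:

```python
def split_by_prefix(data, prefixes):
    """Split data into (selected, rest) based on each item's first character."""
    chosen = set(prefixes)
    selected = [d for d in data if d[0] in chosen]
    rest = [d for d in data if d[0] not in chosen]
    return selected, rest


data = ['A1', 'C3', 'B2', 'A2', 'D3', 'C2', 'A3', 'D2', 'C1', 'B1', 'D1', 'B3']
split1, split2 = split_by_prefix(data, ['A', 'B', 'C'])
print(split1)
print(split2)
```

Because whole prefixes are assigned to one side, items sharing a prefix can never end up in different lists. The split ratio then depends on which prefixes are chosen.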

Python List of Lists why do I get this Error: IndexError: list index out of range? [duplicate]

I have made a function that is supposed to delete duplicate entries (including reversed pairs) from a list of lists, but I was surprised by the following error: IndexError: list index out of range. Why does this happen?
For example:
input: [['a0', 'a3'], ['a1', 'a2'], ['a0', 'a3'], ['a2', 'a1'], ['a3', 'a1']]
expected output: [['a0', 'a3'], ['a1', 'a2'], ['a3', 'a1']]
def getList(a):
    b=a
    lena = len(a)
    print(len(a))
    for i in range(lena):
        for j in range (i+1,lena):
            print(i,j)
            print(a[i],a[j])
            if(a[i][0],a[i][1])==(a[j][1],a[j][0]) or (a[i][0],a[i][1])==(a[j][0],a[j][1]):
                print(a)
a = [['a0', 'a3'], ['a1', 'a2'], ['a0','a3'], ['a2','a1'], ['a3', 'a1']]
getList(a)
OUTPUT:
[['a0', 'a3'], ['a1', 'a2'], ['a0', 'a3'], ['a2', 'a1'], ['a3', 'a1']]
5
0 1
['a0', 'a3'] ['a1', 'a2']
0 2
['a0', 'a3'] ['a0', 'a3']
[['a0', 'a3'], ['a1', 'a2'], ['a0', 'a3'], ['a2', 'a1'], ['a3', 'a1']]
0 3
['a0', 'a3'] ['a2', 'a1']
0 4
['a0', 'a3'] ['a3', 'a1']
1 2
['a1', 'a2'] ['a0', 'a3']
1 3
['a1', 'a2'] ['a2', 'a1']
[['a0', 'a3'], ['a1', 'a2'], ['a0', 'a3'], ['a2', 'a1'], ['a3', 'a1']]
1 4
['a1', 'a2'] ['a3', 'a1']
2 3
['a0', 'a3'] ['a2', 'a1']
2 4
['a0', 'a3'] ['a3', 'a1']
3 4
['a2', 'a1'] ['a3', 'a1']
When I modify the code by adding b.pop(j), for example:
def getList(a):
    b=a
    lena = len(a)
    print(len(a))
    for i in range(lena):
        for j in range (i+1,lena):
            print(i,j)
            print(a[i],a[j])
            if(a[i][0],a[i][1])==(a[j][1],a[j][0]) or (a[i][0],a[i][1])==(a[j][0],a[j][1]):
                print(a)
                b.pop(j)
a = [['a0', 'a3'], ['a1', 'a2'], ['a0','a3'], ['a2','a1'], ['a3', 'a1']]
getList(a)
RESULT:
5
0 1
['a0', 'a3'] ['a1', 'a2']
0 2
['a0', 'a3'] ['a0', 'a3']
[['a0', 'a3'], ['a1', 'a2'], ['a0', 'a3'], ['a2', 'a1'], ['a3', 'a1']]
0 3
['a0', 'a3'] ['a3', 'a1']
0 4
Traceback (most recent call last):
File "C:/Users/I/Desktop/papers/test.py", line 21, in <module>
getList(a)
File "C:/Users/I/Desktop/papers/test.py", line 13, in getList
print(a[i],a[j])
IndexError: list index out of range
I am wondering what could be the problem?

Manipulating a list while iterating over it is a recipe for disaster and can almost always be avoided by accumulating results in a separate structure, or at least by iterating over a copy of the original list. Note that b = a creates an alias, not a copy; a copy can be made with a.copy() or a[:]. When you pop, the list gets shorter, but range(lena) was built from the original length, so the indices eventually refer to list elements that no longer exist.
Also, it's best not to conflate printing with programmatic output. The result of most algorithms should not be stdout. Instead, results should be written to a data structure and returned for the caller to use or dump as they see fit.
Another problem is efficiency: the nested loops mean O(n^2) running time. Using extra space can give you a linear algorithm.
If you convert each sublist into tuples, then they become hashable and you can stick the data into a set to eliminate duplicates, then convert everything back to lists:
>>> [list(x) for x in set(tuple(sorted(x)) for x in a)]
[['a1', 'a2'], ['a1', 'a3'], ['a0', 'a3']]
The problem is that order is lost. If order should be kept, you might use a set as a lookup table:
>>> lookup = set()
>>> result = []
>>> for pair in a:
...     key = tuple(sorted(pair))
...     if key not in lookup:
...         lookup.add(key)
...         result.append(pair)
...
>>> result
[['a0', 'a3'], ['a1', 'a2'], ['a3', 'a1']]
If you're using CPython 3.6+ (or any Python 3.7+, where dict insertion order is guaranteed by the language), you can take advantage of dictionary ordering to improve on the set approach shown above:
>>> [list(x) for x in dict([tuple(sorted(x)), None] for x in a)]
[['a0', 'a3'], ['a1', 'a2'], ['a1', 'a3']]
Pre-3.6 versions can use collections.OrderedDict to achieve the same result:
>>> from collections import OrderedDict
>>> [list(x) for x in OrderedDict([tuple(sorted(x)), None] for x in a)]
[['a0', 'a3'], ['a1', 'a2'], ['a1', 'a3']]

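Your code no longer raises the IndexError if you use b = a.copy() and return b. Be aware, though, that the index j still refers to a position in a, so after the first pop, b.pop(j) can remove the wrong element from the shortened b.
b = a means that the variable name b points to the same object (list) as a.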
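The difference between an alias and a copy can be checked directly. This is a small illustrative sketch; the names alias and shallow are not from the question:

```python
a = [['a0', 'a3'], ['a1', 'a2']]

alias = a           # second name for the same list object
shallow = a.copy()  # new outer list (shallow copy)

alias.pop()         # also shortens a, because alias is a

print(alias is a, len(a))          # True 1
print(shallow is a, len(shallow))  # False 2
```

Note that a.copy() is shallow: the inner lists are still shared, which is fine here because only the outer list is modified.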

python fixed array of dynamic strings list

I would like to fill iteratively an array of fixed size where each item is a list of strings. For example, let's consider the following strings list:
arr = ['A1', 'C3', 'B2', 'A2', 'C1', 'A3', 'B1', 'C2', 'A4']
I want to obtain the following array of 3 items (no ordering is required):
res = [['A1', 'A2', 'A3', 'A4'],
['B2', 'B1'],
['C3', 'C1', 'C2']]
I have the following piece of code:
arr = ['A1', 'C3', 'B2', 'A2', 'C1', 'A3', 'B1', 'C2', 'A4']
res = [[]] * 3
for i in range(len(arr)):
    # Calculate index corresponding to A, B or C
    j = ord(arr[i][0])-65
    # Extend corresponding string list
    res[j].extend([arr[i]])
for i in range(len(res)):
    print(res[i])
But I get this result:
['A1', 'C3', 'B2', 'A2', 'C1', 'A3', 'B1', 'C2', 'A4']
['A1', 'C3', 'B2', 'A2', 'C1', 'A3', 'B1', 'C2', 'A4']
['A1', 'C3', 'B2', 'A2', 'C1', 'A3', 'B1', 'C2', 'A4']
Where am I wrong please?
Thank you for your help!

You can use itertools.groupby and group the elements in the list (having been sorted) according to the first element. You can use operator.itemgetter to efficiently fetch the first substring in each string:
from itertools import groupby
from operator import itemgetter
[list(v) for k,v in groupby(sorted(arr), key=itemgetter(0))]
# [['A1', 'A2', 'A3', 'A4'], ['B1', 'B2'], ['C1', 'C2', 'C3']]

The problem is due to the following:
res = [[]] * 3 creates an outer list with three entries, but all three entries refer to the same inner list object. So whenever you append to or extend one of them, the change shows up in "all" of them (they are the same object, after all).
You can easily check this by replacing it with:
res = [[],[],[]]
which will then give you the expected answer.
Consider these snippets:
res = [[]]*2
res[0].append(1)
print(res)
Out:
[[1], [1]]
While
res = [[],[]]
res[0].append(1)
print(res)
Out:
[[1], []]
Alternatively you can create the nested list like this: res = [[] for i in range(3)]
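If the number of groups is not known in advance, a dictionary of lists avoids the fixed-size outer list altogether. This is a sketch using collections.defaultdict, assuming the first character of each string is the group key:

```python
from collections import defaultdict

arr = ['A1', 'C3', 'B2', 'A2', 'C1', 'A3', 'B1', 'C2', 'A4']

groups = defaultdict(list)  # each missing key gets its own fresh list
for item in arr:
    groups[item[0]].append(item)

# One list per prefix, ordered by prefix
res = [groups[key] for key in sorted(groups)]
print(res)
```

Each key gets a separate list object, so the aliasing problem caused by [[]] * 3 cannot occur.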

You can use a list comprehension:
[[k for k in arr if k[0]==m] for m in sorted(set([i[0] for i in arr]))]
Output:
[['A1', 'A2', 'A3', 'A4'], ['B2', 'B1'], ['C3', 'C1', 'C2']]

How can I find the smallest list among some lists generated by my program?

I wrote a program that generates some lists, something like
['a0', 'a1', 'a2', 'a3', 'a3', 'a4', 'C', 'b4', 'b3', 'b2', 'b2', 'b3', 'b4', 'b5', 'b5', 'b4', 'D', 'c4']
['a0', 'a1', 'a2', 'a3', 'a3', 'a4', 'C', 'b4', 'b3', 'b2', 'b2', 'b3', 'b4', 'D', 'c4', 'c4', 'D', 'b4', 'b5']
['a0', 'a1', 'a2', 'a3', 'a3', 'a4', 'C', 'b4', 'b5', 'b5', 'b4', 'b3', 'b2', 'b2', 'b3', 'b4', 'D', 'c4']
['a0', 'a1', 'a2', 'a3', 'a3', 'a4', 'C', 'b4', 'b5', 'b5', 'b4', 'D', 'c4', 'c4', 'D', 'b4', 'b3', 'b2']
['a0', 'a1', 'a2', 'a3', 'a3', 'a4', 'C', 'b4', 'D', 'c4', 'c4', 'D', 'b4', 'b3', 'b2', 'b2', 'b3', 'b4', 'b5']
['a0', 'a1', 'a2', 'a3', 'a3', 'a4', 'C', 'b4', 'D', 'c4', 'c4', 'D', 'b4', 'b5', 'b5', 'b4', 'b3', 'b2']
I want to find the shortest list, i.e. the list that has the minimum number of elements.
Thanks.

You can use the min function:
min(data, key = len)
If you want to handle cases where there are multiple elements having the shortest length, you can sort the list in ascending order by length:
sorted(data, key = len)

You can sort by list length and then take the first element, but this returns only one list even when several lists share the shortest length.
smallest_list = sorted(list_of_list, key=len)[0]
Another approach is to get the length of the smallest list and then use it as a filter, which keeps every list of that length:
len_smallest_list = min(len(x) for x in list_of_list)
smallest_lists = [lst for lst in list_of_list if len(lst) == len_smallest_list]
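The two behaviours can be compared side by side. In this sketch the sample data is made up for illustration and shortest_lists is a hypothetical helper name:

```python
def shortest_lists(lists):
    """Return every list whose length equals the minimum length."""
    if not lists:
        return []
    shortest = min(len(lst) for lst in lists)
    return [lst for lst in lists if len(lst) == shortest]


# Illustrative sample data with two lists tied for shortest
data = [['a0', 'a1', 'C'], ['a0', 'D'], ['b4', 'b5'], ['a0', 'a1', 'a2', 'a3']]

print(min(data, key=len))    # only the first of the shortest lists
print(shortest_lists(data))  # all lists tied for shortest
```

min() returns the first minimal item it encounters, so it is enough when any one shortest list will do.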

Creating resources combination

We have some departures which can be assigned to different arrivals, just like this:
Dep1.arrivals = [A1, A2]
Dep2.arrivals = [A2, A3, A4]
Dep3.arrivals = [A3, A5]
The output of this function should be a list containing every possible combination of arrivals:
Output: [[A1, A2, A3], [A1, A2, A5], [A1, A3, A5], [A1, A4, A5], ...]
Notice that [A1, A3, A3] isn't contained in the list because you can not use an arrival twice. Also notice that [A1, A2, A3] is the same element as [A3, A1, A2] or [A3, A2, A1].
EDIT:
Many of the solutions given work in this case but not as a general solution, for instance if the 3 sets of arrivals are equal:
Dep1.arrivals = [A1, A2, A3]
Dep2.arrivals = [A1, A2, A3]
Dep3.arrivals = [A1, A2, A3]
Then it returns:
('A1', 'A2', 'A3')
('A1', 'A3', 'A2')
('A2', 'A1', 'A3')
('A2', 'A3', 'A1')
('A3', 'A1', 'A2')
('A3', 'A2', 'A1')
Which is wrong since ('A1', 'A2', 'A3') and ('A3', 'A2', 'A1') are the same solution.
Thank you anyway!

You can do this using a list comprehension with itertools.product:
>>> import itertools
>>> lol = [["A1", "A2"], ["A2", "A3", "A4"], ["A3", "A5"]]
>>> print([x for x in itertools.product(*lol) if len(set(x)) == len(lol)])
Result
[('A1', 'A2', 'A3'),
('A1', 'A2', 'A5'),
('A1', 'A3', 'A5'),
('A1', 'A4', 'A3'),
('A1', 'A4', 'A5'),
('A2', 'A3', 'A5'),
('A2', 'A4', 'A3'),
('A2', 'A4', 'A5')]
Note that this is notionally equivalent to the code that @Kevin has given.
Edit: As the OP mentions in the edit, this solution doesn't work when the same combination appears in a different order.
To resolve that, the last statement can be altered as follows: first build a list of sorted tuples of arrivals, then convert the list to a set:
>>> lol = [["A1", "A2", "A3"], ["A1", "A2", "A3"], ["A1", "A2", "A3"]]
>>> set([tuple(sorted(x)) for x in itertools.product(*lol) if len(set(x)) == len(lol)])
{('A1', 'A2', 'A3')}
>>> lol = [["A1", "A2"], ["A2", "A3", "A4"], ["A3", "A5"]]
>>> set([tuple(sorted(x)) for x in itertools.product(*lol) if len(set(x)) == len(lol)])
{('A1', 'A2', 'A3'),
('A1', 'A2', 'A5'),
('A1', 'A3', 'A4'),
('A1', 'A3', 'A5'),
('A1', 'A4', 'A5'),
('A2', 'A3', 'A4'),
('A2', 'A3', 'A5'),
('A2', 'A4', 'A5')}
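The approach above can be wrapped in a function for Python 3. This is a sketch under the same assumptions (arrivals are hashable and sortable strings); unique_assignments is a hypothetical name:

```python
import itertools


def unique_assignments(arrivals):
    """Return sorted, order-insensitive combinations with no arrival reused."""
    found = set()
    for combo in itertools.product(*arrivals):
        if len(set(combo)) == len(arrivals):  # no arrival used twice
            found.add(tuple(sorted(combo)))   # ignore ordering
    return sorted(found)


same = [["A1", "A2", "A3"], ["A1", "A2", "A3"], ["A1", "A2", "A3"]]
mixed = [["A1", "A2"], ["A2", "A3", "A4"], ["A3", "A5"]]

print(unique_assignments(same))
print(len(unique_assignments(mixed)))
```

Sorting each tuple before adding it to the set is what makes ('A1', 'A2', 'A3') and ('A3', 'A2', 'A1') count as one solution. The cost is that the result no longer records which departure received which arrival.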

You could use product to generate all possible combinations of the departures, and then filter out combinations containing duplicates after the fact:
import itertools
arrivals = [
["A1", "A2"],
["A2", "A3", "A4"],
["A3", "A5"]
]
for items in itertools.product(*arrivals):
    if len(set(items)) < len(arrivals):
        continue
    print(items)
Result:
('A1', 'A2', 'A3')
('A1', 'A2', 'A5')
('A1', 'A3', 'A5')
('A1', 'A4', 'A3')
('A1', 'A4', 'A5')
('A2', 'A3', 'A5')
('A2', 'A4', 'A3')
('A2', 'A4', 'A5')

The question is tagged with itertools, but I suspect you did not look at itertools.combinations:
import itertools
arrivals = ['A1', 'A2', 'A3', 'A4']
[a for a in itertools.combinations(arrivals, 3)]
# [('A1', 'A2', 'A3'),
#  ('A1', 'A2', 'A4'),
#  ('A1', 'A3', 'A4'),
#  ('A2', 'A3', 'A4')]
