I am trying to find the list of values common to three different lists:
a = [1,2,3,4]
b = [2,3,4,5]
c = [3,4,5,6]
Naturally, I tried the and operator, but that just returns the last list in the expression:
>> a and b and c
out: [3,4,5,6]
Is there a short way to get the list of common values:
[3,4]
Br
Use sets:
>>> a = [1, 2, 3, 4]
>>> b = [2, 3, 4, 5]
>>> c = [3, 4, 5, 6]
>>> set(a) & set(b) & set(c)
{3, 4}
Or as Jon suggested:
>>> set(a).intersection(b, c)
{3, 4}
Using sets has the benefit that you don’t need to repeatedly iterate the original lists. Each list is iterated once to create the sets, and then the sets are intersected.
The naive way to solve this, using a filtered list comprehension as Geotob did, iterates lists b and c for each element of a, so for longer lists it will be a lot less efficient.
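If the order of the first list matters, the set approach can be combined with a single pass over a. The following is a minimal sketch; the helper name common_values is made up for illustration and does not come from any library.

```python
def common_values(first, *others):
    """Return items of `first` that occur in every other list, in `first`'s order."""
    # Build the intersection once, so each membership test below
    # is an average O(1) set lookup.
    shared = set(first).intersection(*others)
    seen = set()
    result = []
    for item in first:
        # Skip duplicates so each common value is reported once.
        if item in shared and item not in seen:
            seen.add(item)
            result.append(item)
    return result


a = [1, 2, 3, 4]
b = [2, 3, 4, 5]
c = [3, 4, 5, 6]
print(common_values(a, b, c))  # [3, 4]
```

Each input list is still iterated only once to build the sets, plus one extra pass over the first list to restore its order.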
out = [x for x in a if x in b and x in c]
is a quick and simple solution. This constructs a list out with entries from a, if those entries are in b and c.
For larger lists, look at the answer provided by @poke.
For those still stumbling upon this question, with numpy one can use:
np.intersect1d(array1, array2)
This works with lists as well as numpy arrays.
It could be extended to more arrays with the help of functools.reduce, or it can simply be repeated for several arrays.
from functools import reduce
reduce(np.intersect1d, (array1, array2, array3))
or
new_array = np.intersect1d(array1, array2)
np.intersect1d(new_array, array3)
Related
Looking for a pythonic way to sum values from multiple lists:
I have got the following list of lists:
a = [0,5,2]
b = [2,1,1]
c = [1,1,1]
d = [5,3,4]
my_list = [a,b,c,d]
I am looking for the output:
[8,10,8]
I've used:
print ([sum(x) for x in zip(*my_list )])
but zip only works when I have 2 elements in my_list.
Any idea?
zip works for an arbitrary number of iterables:
>>> list(map(sum, zip(*my_list)))
[8, 10, 8]
which is, of course, roughly equivalent to your comprehension which also works:
>>> [sum(x) for x in zip(*my_list)]
[8, 10, 8]
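To make the point about an arbitrary number of lists concrete, here is a small sketch. The function name column_sums is hypothetical, and the optional use of itertools.zip_longest for lists of unequal length is an addition that the question did not ask for.

```python
from itertools import zip_longest


def column_sums(rows, pad_short_rows=False):
    """Sum lists position by position."""
    if pad_short_rows:
        # Treat missing trailing entries as 0 instead of truncating.
        columns = zip_longest(*rows, fillvalue=0)
    else:
        # Plain zip stops at the shortest list.
        columns = zip(*rows)
    return [sum(column) for column in columns]


my_list = [[0, 5, 2], [2, 1, 1], [1, 1, 1], [5, 3, 4]]
print(column_sums(my_list))  # [8, 10, 8]
print(column_sums([[1, 2, 3], [10]], pad_short_rows=True))  # [11, 2, 3]
```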
Numpy has a nice way of doing this, and it can also handle very large arrays. First, create my_list as a numpy array:
import numpy as np
a = [0,5,2]
b = [2,1,1]
c = [1,1,1]
d = [5,3,4]
my_list = np.array([a,b,c,d])
To get the sum over the columns, you can do the following
np.sum(my_list, axis=0)
Alternatively, the sum over the rows can be retrieved by
np.sum(my_list, axis=1)
I'd make it a numpy array and then sum along axis 0:
import numpy
my_list = numpy.array([a,b,c,d])
my_list.sum(axis=0)
Output:
[ 8 10 8]
I know this is similar to Efficient way to compare elements in 2 lists, but I have an extension on the question basically.
Say I have two lists:
a = [1,2,4,1,0,3,2]
b = [0,1,2,3,4]
I want to find out the indices of a where the element is equal to each element of b.
For instance, I would want the sample output for b[1] to tell me that a = b[1] at [0,3].
A data frame output would be useful as well, something like:
b  index_a
0  4
1  0
1  3
2  1
2  6
3  5
4  2
What I used before was:
import pandas as pd
b = pd.DataFrame(b)
a = pd.DataFrame(a)
pd.merge(b.reset_index(), a.reset_index(),
         left_on=b.columns.tolist(),
         right_on=a.columns.tolist(),
         suffixes=('_b', '_a'))[['index_b', 'index_a']]
However, I am unsure whether this is necessary, since these are just lists (I used this method previously when working with dataframes).
I am doing this operation thousands of times with much larger lists so I am wondering if there is a more efficient method.
In addition, b is just list(range(X)) where in this case X = 5
If anyone has some input I'd greatly appreciate it!
Thanks
A very simple and efficient solution is to build a mapping from the values in the range 0..N-1 to indices of a. The mapping can be a simple list, so you end up with:
indices = [[] for _ in b]
for i, x in enumerate(a):
    indices[x].append(i)
Example run:
>>> a = [1,2,4,1,0,3,2]
>>> b = [0,1,2,3,4]
>>> indices = [[] for _ in b]
>>> for i,x in enumerate(a):
...     indices[x].append(i)
...
>>> indices[1]
[0, 3]
Note that b[i] == i so keeping the b list is pretty useless.
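Wrapped as a function, the same idea might look like the sketch below. The name indices_by_value and the parameter n are assumptions for illustration; the code assumes every element of a is an integer in range(n), as in the question.

```python
def indices_by_value(a, n):
    """Return a list where entry v holds the positions of v in `a`.

    Assumes every element of `a` is an integer in range(n).
    """
    indices = [[] for _ in range(n)]
    for i, x in enumerate(a):
        indices[x].append(i)
    return indices


a = [1, 2, 4, 1, 0, 3, 2]
positions = indices_by_value(a, 5)
print(positions[1])  # [0, 3]
```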
import collections
dd = collections.defaultdict(list)
for i, x in enumerate(a):
    dd[x].append(i)
>>> sorted(dd.items())
[(0, [4]), (1, [0, 3]), (2, [1, 6]), (3, [5]), (4, [2])]
If b is sorted consecutive integers, as you have shown here, then a bucket sort is most effective.
Otherwise, you can construct a hash table with the values of b as keys and lists of the matching indices of a as values.
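For a b that is not a run of consecutive integers, the hash table idea could be sketched as follows. The function name index_pairs is hypothetical; it returns (value of b, index in a) rows similar to the table requested in the question.

```python
from collections import defaultdict


def index_pairs(a, b):
    """Return (value, index_in_a) pairs for each value of `b`, in `b`'s order."""
    positions = defaultdict(list)
    for i, x in enumerate(a):
        positions[x].append(i)
    # Use .get so that values of b missing from a do not create new keys.
    return [(value, i) for value in b for i in positions.get(value, [])]


a = [1, 2, 4, 1, 0, 3, 2]
b = [0, 1, 2, 3, 4]
print(index_pairs(a, b))
# [(0, 4), (1, 0), (1, 3), (2, 1), (2, 6), (3, 5), (4, 2)]
```

Building the table takes one pass over a, so repeating the lookup for many values of b does not rescan a each time.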
I'm not sure if this is efficient enough for your needs, but this would work:
from collections import defaultdict
indexes = defaultdict(set)
a = [1,2,4,1,0,3,2]
b = [0,1,2,3,4]
for i, x in enumerate(a):
    indexes[x].add(i)
for x in b:
    print(x, indexes.get(x))
I'm new to Python. I have these lists:
a = [[0,1,2,3],[4,5,6,7,8,9], ...]
b = [[0,6,9],[1,5], ...]
a and b can have more components, depending on the data. I want to know whether there is any intersection between these lists. If there is, I want a result like this:
c = [[6,9], ...]
The set type, built into Python, supports intersection natively. However, note that set can only hold one of each element (like a mathematical set). If you want to hold more than one of each element, try collections.Counter.
You can make sets using {} notation (like dictionaries, but without values):
>>> a = {1, 2, 3, 4, 5}
>>> b = {2, 4, 6, 8, 10}
and you can intersect them using the & operator:
>>> print(a & b)
{2, 4}
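As a brief illustration of the collections.Counter suggestion above, the & operator on two Counter objects keeps the minimum count of each element. The sample data below is made up for illustration.

```python
from collections import Counter

left = Counter([1, 1, 2, 3])
right = Counter([1, 1, 1, 3, 4])

# Multiset intersection: each element keeps the smaller of its two counts.
common = left & right
common_items = sorted(common.elements())
print(common_items)  # [1, 1, 3]
```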
Given that intersection is an operation between two sets, and you have given two lists of lists, it's very unclear what you're looking for. Do you want the intersection of a[1] and b[0]? Do you want the intersection of every possible combination?
I'm guessing you want the intersection of every combination of two sets between your two lists, which would be:
from itertools import product
[set(x).intersection(set(y)) for x, y in product(a, b)]
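If only the non-empty intersections are wanted, which is one possible reading of the c = [[6,9], ...] example in the question, the comprehension could be filtered as in this sketch. Sorting each result into a list is an added assumption about the desired output format.

```python
from itertools import product

a = [[0, 1, 2, 3], [4, 5, 6, 7, 8, 9]]
b = [[0, 6, 9], [1, 5]]

# Intersect every sublist of a with every sublist of b.
pairwise = (set(x).intersection(y) for x, y in product(a, b))
# Keep only the non-empty results, each as a sorted list.
c = [sorted(common) for common in pairwise if common]
print(c)  # [[0], [1], [6, 9], [5]]
```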
First of all, in your example code this is not a tuple, it's a list (the original question asked about lists, but references tuples in the example code).
To get the intersection of two tuples or lists, use code like this:
set((1,2,3,4,5)).intersection(set((1,2,3,7,8)))
In one line:
common_set = set([e for r in a for e in r])&set([e for r in b for e in r])
Or easier:
common_set = set(sum(a,[])) & set(sum(b,[]))
common_set will be a set. You can easily convert the set to a list if you need one:
common_list = list(common_set)
Another way to do it, assuming you want the intersection of the flattened lists:
>>> from itertools import chain
>>> a = [[0,1,2,3],[4,5,6,7,8,9]]
>>> b = [[0,6,9],[1,5]]
>>> list(set(chain(*a)).intersection(set(chain(*b))))
[0, 9, 5, 6, 1]
I have 2 arrays, for the sake of simplicity let's say the original one is a random set of numbers:
import numpy as np
a=np.random.rand(N)
Then I sample and shuffle a subset from this array:
b = np.array(...)  # shuffled subset of a, with size < N
The shuffling I do does not store the index values, so b is an unordered subset of a.
Is there an easy way to get the original indexes of the elements of b in a? For example, if element 2 of b has index 4 in a, I want an array that records this assignment.
I could use a for loop and check element by element, but perhaps there is a more pythonic way.
Thanks
I think the most computationally efficient thing to do is to keep track of the indices that associate b with a as b is created.
For example, instead of sampling a, sample the indices of a:
import random
indices = random.sample(range(len(a)), k)  # k < N
b = a[indices]
On the off chance a happens to be sorted you could do:
>>> from numpy import array
>>> a = array([1, 3, 4, 10, 11])
>>> b = array([11, 1, 4])
>>> a.searchsorted(b)
array([4, 0, 2])
If a is not sorted you're probably best off going with something like @unutbu's answer.
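When a is not sorted, one possible approach (not necessarily the one in the answer referred to above) is a dictionary from value to position. This sketch assumes the values in a are unique, which is likely but not guaranteed for random floats; the function name original_indices is hypothetical.

```python
def original_indices(a, b):
    """Return, for each element of `b`, its index in `a`.

    Assumes the values in `a` are unique and every element of `b` occurs in `a`.
    """
    index_of = {value: i for i, value in enumerate(a)}
    return [index_of[value] for value in b]


a = [0.42, 0.07, 0.93, 0.55, 0.18]
b = [0.18, 0.42, 0.93]
print(original_indices(a, b))  # [4, 0, 2]
```

The dictionary is built in one pass over a, so each lookup for an element of b avoids a full scan.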
I have two lists
For example:
A = [1,3,5,7]
B = [1,2,3,4,5,6,7,8]
A is always a subset of B.
I want to generate a third list C containing the elements that are present in B but absent from A, like:
C = [2,4..]
Thanks
List comprehensions are one way to do this:
[x for x in B if x not in A]
If you use Python, I recommend gaining familiarity with list comprehensions. They're a very powerful tool.
(Several people have suggested using set. While this is a very good idea if you only care about whether or not an element is in the set, note that it will not preserve the order of the elements; a list comprehension will.)
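The difference in ordering is easiest to see when B is not already sorted. The shuffled B below is made up for illustration.

```python
A = [1, 3, 5, 7]
B = [8, 1, 6, 3, 4, 5, 2, 7]

# The list comprehension keeps the order in which items appear in B.
in_b_order = [x for x in B if x not in A]
# A set difference has no defined order, so sort it for a stable result.
in_sorted_order = sorted(set(B) - set(A))

print(in_b_order)       # [8, 6, 4, 2]
print(in_sorted_order)  # [2, 4, 6, 8]
```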
>>> set(B) - set(A)
set([8, 2, 4, 6])
or
>>> sorted(set(B) - set(A))
[2, 4, 6, 8]
An easy way to do this is
C = [x for x in B if x not in A]
This will become slow for big lists, so it would be better to use a set for A:
A = set(A)
C = [x for x in B if x not in A]
If you have multiple operations like this, using sets all the time might be the best option. If A and B are sets, you can simply do
C = B - A
C = sorted(set(B) - set(A))
That should do it.