Finding intersection of two lists of strings in Python

I have gone through "Find intersection of two lists?", "Intersection of Two Lists Of Strings", and "Getting intersection of two lists in python". However, I could not solve the problem of finding the intersection of two lists of strings in Python.
I have two variables.
A = [['11#N3'], ['23#N0'], ['62#N0'], ['99#N0'], ['47#N7']]
B = [['23#N0'], ['12#N1']]
How to find that '23#N0' is a part of both A and B?
I tried using intersect(a,b) as mentioned in http://www.saltycrane.com/blog/2008/01/how-to-find-intersection-and-union-of/
But, when I try to convert A into set, it throws an error:
File "<stdin>", line 1, in <module> TypeError: unhashable type: 'list'
To get around this, I used the method from "TypeError: unhashable type: 'list' when using built-in set function", which converts each sublist into a tuple so that the result can be turned into a set:
result = sorted(set(map(tuple, A)), reverse=True)
However, this returns an empty set as the intersection.
Can you help me find the intersection?

You can use the flatten function of the compiler.ast module (Python 2 only; the compiler package was removed in Python 3) to flatten your sublists and then apply set intersection like this:
from compiler.ast import flatten
A=[['11#N3'], ['23#N0'], ['62#N0'], ['99#N0'], ['47#N7']]
B=[['23#N0'], ['12#N1']]
a = flatten(A)
b = flatten(B)
common_elements = list(set(a).intersection(set(b)))
common_elements
['23#N0']
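For Python 3, where compiler.ast is not available, the same idea can be sketched with itertools.chain.from_iterable from the standard library. This is only a minimal sketch and assumes every element of A and B is itself a list of strings:

```python
from itertools import chain

A = [['11#N3'], ['23#N0'], ['62#N0'], ['99#N0'], ['47#N7']]
B = [['23#N0'], ['12#N1']]

# Flatten one level of nesting; this assumes every element is a list of strings.
a = set(chain.from_iterable(A))
b = set(chain.from_iterable(B))

# sorted() turns the resulting set into a list with a predictable order.
common_elements = sorted(a & b)
print(common_elements)
```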

The problem is that your lists contain sublists so they cannot be converted to sets. Try this:
A=[['11#N3'], ['23#N0'], ['62#N0'], ['99#N0'], ['47#N7']]
B=[['23#N0'], ['12#N1']]
C = [item for sublist in A for item in sublist]
D = [item for sublist in B for item in sublist]
print(set(C).intersection(set(D)))

Your data structure is a bit strange, as it is a list of one-element lists of strings; you'd want to reduce it to a list of strings, and then you can apply the previous solutions.
Thus a list like:
B = [['23#N0'], ['12#N1']]
can be converted to an iterator over '23#N0', '12#N1'
with itertools.chain(*B), so we have a simple one-liner:
>>> from itertools import chain
>>> set(chain(*A)).intersection(chain(*B))
{'23#N0'}

In case you have to fit it on a fortune cookie:
set(i[0] for i in A).intersection(set(i[0] for i in B))

You have two lists of lists with one item each. In order to convert that to a set you have to make it a list of strings:
set_a = set([i[0] for i in A])
set_b = set([i[0] for i in B])
Now you can get the intersection:
set_a.intersection(set_b)

A=[['11#N3'], ['23#N0'], ['62#N0'], ['99#N0'], ['47#N7']]
A=[a[0] for a in A]
B=[['23#N0'], ['12#N1']]
B=[b[0] for b in B]
print(set.intersection(set(A), set(B)))
Output: {'23#N0'}
If every sublist in your lists has only one element, you can try this.
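If sublists may hold more than one element and whole sublists should be compared, one option is to convert each sublist to a tuple first, since tuples are hashable while lists are not. The following is a sketch rather than part of the answer above, and the multi-element sample rows are made up for illustration:

```python
# Hypothetical data: some sublists have more than one element.
A = [['11#N3', 'x'], ['23#N0'], ['62#N0']]
B = [['23#N0'], ['11#N3', 'y']]

# Tuples are hashable, so they can be stored in a set; lists cannot.
set_a = set(map(tuple, A))
set_b = set(map(tuple, B))

# Only sublists that match element for element end up in the result.
common = [list(t) for t in set_a & set_b]
print(common)
```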

My preference is to use itertools.chain from the standard library:
from itertools import chain
A = [['11#N3'], ['23#N0'], ['62#N0'], ['99#N0'], ['47#N7']]
B = [['23#N0'], ['12#N1']]
set(chain(*A)) & set(chain(*B))
# {'23#N0'}

Related

Python - Check if any item exists in all list of lists

I have a list of lists in Python (python3). Example:
list_of_lists = [[vendor1, vendor2],
[vendor2, vendor5, vendor10],
[vendor1, vendor2, vendor7]]
What I'm trying to do is find out if there is an item that is in ALL lists in the list of lists. Most of the examples I've come across the user has known what value to search for in their list of lists, hence me asking a separate question on here, as I don't have a starting value to search for. The result from the above list would return vendor2 since it shows up in all lists.
Any help/general "look in this direction" advice is appreciated. Thank you
Assuming all elements of list_of_lists are strings, you can use sets: create a set for each sublist and take the intersection of all of them.
In [3]: list_of_lists = [["vendor1", "vendor2"],
...: ["vendor2", "vendor5", "vendor10"],
...: ["vendor1", "vendor2", "vendor7"]]
In [4]: set.intersection(*[set(x) for x in list_of_lists])
Out[4]: {'vendor2'}
You could also try reduce:
from functools import reduce
list_of_lists = [["vendor1", "vendor2"],
["vendor2", "vendor5", "vendor10"],
["vendor1", "vendor2", "vendor7"]]
result = list(reduce(lambda a, b: set(a) & set(b), list_of_lists))
# ['vendor2']
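Both approaches above can be wrapped in a small helper. The sketch below uses a hypothetical function name, common_items, and assumes that an empty outer list should mean "nothing in common"; set.intersection called with no arguments at all raises TypeError, so that case is handled explicitly:

```python
def common_items(list_of_lists):
    """Return the set of items present in every sublist."""
    if not list_of_lists:
        # set.intersection() with no arguments raises TypeError,
        # so treat an empty outer list as "nothing in common".
        return set()
    first, *rest = list_of_lists
    # The intersection() method accepts any iterables, so the
    # remaining sublists do not need to be converted to sets.
    return set(first).intersection(*rest)


vendors = [["vendor1", "vendor2"],
           ["vendor2", "vendor5", "vendor10"],
           ["vendor1", "vendor2", "vendor7"]]
print(common_items(vendors))
```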

TypeError while flattening list of lists of string type

I have a dataset which goes as follows:
ds = [['North','Raw','Tree'],['Saw','Raw','Apple'],['Saw','Apple'],['Gum','Saw'],......]
The dataset has 211945 values. I have imported this dataset in Jupyter and tried running the below code for making this list of lists as a single list.
Code:
list_1 = []
for sublist in ds:
    for val in sublist:
        list_1.append(val)
The error I got was:
1 list_1 = []
2 for sublist in ds:
3 for val in sublist: <---
4 list_1.append(val)
TypeError: 'float' object is not iterable
The expected output is:
list_1 = ['North','Raw','Tree','Saw','Raw','Apple','Saw','Apple','Gum','Saw',......]
You can use itertools.chain (https://docs.python.org/3/library/itertools.html). This is perfect for your case, I think.
import itertools
ds = [['N','R','T'],['S','R','A'],['S','A']]
print(list(itertools.chain(*ds)))
Using itertools.chain() is a very common idiom for this, but if you prefer list comprehensions, it can be done with a one-liner
flattened = [val for sublist in ds for val in sublist]
which is maybe easier to read.
new_list = []
for x in ds:
    try:
        new_list.extend(x)
    except TypeError:
        # x is not iterable (e.g. a float), so add it as a single element
        new_list.extend([x])
I suspect your list has some elements that aren't lists. For example, ds might look like [['a','b'],['c','d'],'e',['f','g'],...], i.e. a mixture of lists and single elements. The above code should work because whenever extend fails in the try block, the except block treats the single element as a one-item list.
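One caveat with relying on try/except here: a bare string is itself iterable, so extend would split it into single characters instead of raising an error. The following sketch uses an explicit type check instead. The function name flatten_mixed and the sample data are hypothetical, and the guess that the float in the error message is a missing value (NaN) is only an assumption:

```python
def flatten_mixed(rows):
    """Flatten one level, keeping non-list items as single values."""
    flat = []
    for row in rows:
        if isinstance(row, (list, tuple)):
            flat.extend(row)
        else:
            # Floats, bare strings, None, etc. are appended whole.
            flat.append(row)
    return flat


# Hypothetical sample mixing lists, a float and a bare string.
ds = [['North', 'Raw', 'Tree'], float('nan'), ['Saw', 'Apple'], 'Gum']
result = flatten_mixed(ds)
print(result)
```

If the stray values should be dropped rather than kept, the else branch can simply be removed.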

How to compare two lists and return the number of times they match at each index in python?

I have two lists containing 1's and 0's, e.g.
list1 = [1,1,0,1,0,1]
list2 = [0,1,0,1,1,0]
I want to find the number of times they match at each index. So in this case the output would be 3 because they have the same value at indices 1,2 and 3 only.
Currently I'm doing this:
matches_list = []
for i in list1:
    index = list1.index(i)
    if list1[index] == list2[index]:
        matches_list.append(i)
    else:
        pass
return len(matches_list)
However, this is very slow, and I want to do this many times over to compare a large number of these lists.
I was hoping someone could advise me on a quicker way to do this. Is there a way to use the set() function, or something similar, for example to compare two lists but maintain the order of each one?
Zip the lists, compare the elements, and compute the sum.
>>> list1 = [1,1,0,1,0,1]
>>> list2 = [0,1,0,1,1,0]
>>> sum(a == b for a,b in zip(list1, list2))
3
(Consider using itertools.izip in Python 2 for memory efficiency.)
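Since the question mentions comparing many such lists, the zip approach can be wrapped in a function and applied repeatedly. This is a sketch: the name count_matches, the length check, and the extra candidate lists are assumptions added for illustration.

```python
def count_matches(list1, list2):
    """Count the positions at which both lists hold equal values."""
    if len(list1) != len(list2):
        # zip() silently stops at the shorter input, so check up front.
        raise ValueError("lists must have the same length")
    return sum(a == b for a, b in zip(list1, list2))


reference = [1, 1, 0, 1, 0, 1]
# Hypothetical candidates to compare against the reference list.
candidates = [[0, 1, 0, 1, 1, 0], [1, 1, 0, 1, 0, 1], [0, 0, 1, 0, 1, 0]]
counts = [count_matches(reference, c) for c in candidates]
print(counts)
```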
Here's a lightning fast numpy answer:
import numpy as np
list1 = np.array([1,1,0,1,0,1])
list2 = np.array([0,1,0,1,1,0])
len(np.where(list1==list2)[0])
When called with only a condition, np.where returns a tuple of index arrays, one per dimension, holding the positions where the condition is true (in this case list1==list2 at indices [1,2,3]). In the above case, I take the first array with [0] and count how many indices it holds with len().
You can use map with operator.eq and sum:
>>> import operator
>>> sum(map(operator.eq, list1, list2))
This works because True counts as 1 and False as 0 when summed.
You could also use numpy for this:
>>> import numpy as np
>>> np.count_nonzero(np.asarray(list1) == np.asarray(list2))

Python: How to convert a list to a list of tuples?

For example, I have the list below:
['Visa', 'Rogers', 'Visa']
If I want to convert it to a list of tuples, like
[('Visa',), ('Rogers',), ('Visa',)]
How can I convert it?
>>> [(x,) for x in ['Visa', 'Rogers', 'Visa']]
[('Visa',), ('Rogers',), ('Visa',)]
A simple list comprehension will do the trick. Make sure to include the trailing comma to specify single-item tuples (otherwise you will just have the original strings).
Doing some kind of operation for each element can be done with map() or list comprehensions:
a = ['Visa', 'Rogers', 'Visa']
b = [(v,) for v in a]
c = list(map(lambda v: (v,), a))  # list() is needed in Python 3, where map returns an iterator
print(b) # [('Visa',), ('Rogers',), ('Visa',)]
print(c) # [('Visa',), ('Rogers',), ('Visa',)]
Please keep in mind that 1-element tuples are written as (value,) to distinguish them from plain grouping parentheses.
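The effect of the trailing comma can be checked directly. A short sketch (the variable names are only for illustration):

```python
with_comma = ('Visa',)
without_comma = ('Visa')

# The comma, not the parentheses, is what makes a tuple.
print(type(with_comma).__name__)
print(type(without_comma).__name__)

# tuple() on a string splits it into characters, which is
# usually not what is wanted when wrapping a single value.
print(tuple('Visa'))
```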

How do I index a dictionary of multiple lists using a list of indexes?

I'm on Python 2.7.3.
If I have a dictionary of lists, like this:
>>> x1 = [1,2,3,4,5,6,7,8,5]
>>> x2 = range(11,20)
>>> mydict = {'first':x1,'second':x2}
... and the lists are equal size...
>>> len(mydict['second']) == len(mydict['first'])
True
How do I use a list of indexes like this:
>>> ind = [0,1,2,3,4,5,6,7]
To get the values from both lists in my dictionary? I have tried to use the "ind" list to index, but continuously get an error whether ind is a list or tuple like this:
>>> mydict['second'][ind]
TypeError: list indices must be integers, not set
I realize that the list isn't an integer, but each value in the set is an integer. Is there any way to get x1[ind] and x2[ind] without iterating a counter in a loop?
Don't know if it matters, but I have the index list already that I got from finding the unique values like this:
>>> import numpy as np
>>> ux1 = np.unique(x1, return_index = True)
You can use operator.itemgetter:
from operator import itemgetter
indexgetter = itemgetter(*ind)
indexed1 = indexgetter(mydict['first'])
indexed2 = indexgetter(mydict['second'])
Note that in my example, indexed1 and indexed2 will be tuple instances, not list instances. The alternative is to use a list comprehension:
second = mydict['second']
indexed2 = [second[i] for i in ind]
You want to use operator.itemgetter:
getter = itemgetter(*ind)
getter(mydict['second']) # returns a tuple of the elements you're searching for.
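To index every list in the dictionary at once, the list comprehension can be placed inside a dict comprehension. The sketch below reuses the data from the question; the names picked and single are hypothetical. It also shows one caveat of itemgetter: with exactly one index it returns the bare item rather than a 1-element tuple.

```python
from operator import itemgetter

mydict = {'first': [1, 2, 3, 4, 5, 6, 7, 8, 5],
          'second': list(range(11, 20))}
ind = [0, 1, 2, 3, 4, 5, 6, 7]

# Index every list in the dictionary with the same positions.
picked = {key: [values[i] for i in ind] for key, values in mydict.items()}
print(picked['second'])

# Caveat: with exactly one index, itemgetter returns the bare item,
# not a tuple, so the result type depends on len(ind).
single = itemgetter(*[0])(mydict['second'])
print(single)
```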
