Python: Compare Lists

I have two lists a and b:
a = ['146769015', '163081689', '172235774', ...]
b = [['StackOverflow (146769015)'], ['StackOverflow (146769015)'], ['StackOverflow (163081689)'], ...]
What I'm trying to do is to check if the elements of list a are in list b, and if they are, how many times they appear.
In this case the output should be:
'146769015':2
'163081689':1
I've already tried the set() function, but that does not seem to work:
print(set(a)&set(b))
And I get this:
print(set(a)&set(b))
TypeError: unhashable type: 'list'
Is it possible to do what I want?
Thank you all.

When you perform set(a) & set(b), you're asking which elements both lists share. There are a couple of errors in your logic.
First, your first list consists of strings, while your second list consists of lists. Lists are unhashable, which is why building set(b) raises the TypeError.
Second, even after unwrapping those lists, the elements of your second list are never equal to those of your first list: the first contains only digits, while the second wraps the digits in 'StackOverflow (...)'.
Third, even if you extract only the numbers, the intersection of the two sets tells you which numbers appear in both, but not how many times.
A good approach might be to extract the numbers from your second list and then count their occurrences if they are present in list a:
from collections import Counter
import re
a = ['146769015', '163081689', '172235774']
b = [['StackOverflow (146769015)'], ['StackOverflow (146769015)'], ['StackOverflow (163081689)']]
# pull the digits out of each one-element sublist
numbs = [re.search(r'\d+', elem[0]).group(0) for elem in b]
cnt = Counter()
for n in numbs:
    if n in a:
        cnt[n] += 1
print(cnt)
Output:
Counter({'146769015': 2, '163081689': 1})
I'll leave it as homework for you to research what dictionaries and Counters are.

It's tricky because the strings in a only appear as substrings of the strings in b; otherwise I think you could use a Counter from collections and look it up using each element of a as a key.
Alternatively, you can flatten the list and loop through it with nested loops:
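from collections import defaultdict
# b is a list of one-element lists; flatten it into a list of strings
flat_list = [item for sublist in b for item in sublist]
c = defaultdict(int)  # missing keys start at 0
for string in a:
    for string2 in flat_list:
        if string in string2:  # substring test
            c[string] += 1
print(dict(c))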
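Because that test is a plain substring check, it can over-count when one ID happens to be contained in a longer one. A minimal sketch of exact matching follows; it assumes every string in b has the form "name (digits)", and the names id_pattern and wanted are just illustrative:

```python
import re
from collections import Counter

a = ['146769015', '163081689', '172235774']
b = [['StackOverflow (146769015)'], ['StackOverflow (146769015)'], ['StackOverflow (163081689)']]

# A bare substring test can over-count: a shorter ID such as '1467'
# would also "match" the string containing 146769015.
print('1467' in 'StackOverflow (146769015)')  # True

# Exact matching instead: pull out the digits inside the parentheses
# (assumes every entry looks like "name (digits)") and compare whole IDs.
id_pattern = re.compile(r'\((\d+)\)')
wanted = set(a)  # set membership is O(1) on average
counts = Counter()
for sublist in b:
    for text in sublist:
        match = id_pattern.search(text)
        if match and match.group(1) in wanted:
            counts[match.group(1)] += 1

print(counts)  # Counter({'146769015': 2, '163081689': 1})
```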

You can use a dictionary:
a=['146769015', '163081689', '172235774']
b=[['StackOverflow (146769015)'],['StackOverflow (146769015)'],['StackOverflow (163081689)']]
c = {}
for s in a:
    for d in b:
        for i in d:
            if s in i:
                if s not in c:
                    c[s] = 1
                else:
                    c[s] += 1
print(c)
Output:
{'146769015': 2, '163081689': 1}

Related

Split list into sub-lists based on integer in string

I have a list of strings as such:
['text_1.jpg', 'othertext_1.jpg', 'text_2.jpg', 'othertext_2.jpg', ...]
In reality, there are more entries than 2 per number but this is the general format. I would like to split this list into list of lists as such:
[['text_1.jpg', 'othertext_1.jpg'], ['text_2.jpg', 'othertext_2.jpg'], ...]
These sub-lists are based on the integer after the underscore. My current method is to first sort the list by number, as in the first sample above, and then iterate through each index, copying values into a new sub-list as long as the integer matches the previous one.
I am wondering if there is a simpler, more Pythonic way of performing this task.
Try:
import re
lst = ["text_1.jpg", "othertext_1.jpg", "text_2.jpg", "othertext_2.jpg"]
r = re.compile(r"_(\d+)\.jpg")
out = {}
for val in lst:
    num = r.search(val).group(1)
    out.setdefault(num, []).append(val)
print(list(out.values()))
Prints:
[['text_1.jpg', 'othertext_1.jpg'], ['text_2.jpg', 'othertext_2.jpg']]
Similar solution to @Andrej's:
import itertools
import re
def find_number(s):
    # the re module caches compiled patterns, so this is fine;
    # feel free to compile it once up front instead
    return re.search(r'_(\d+)\.jpg', s).group(1)
l = ['text_1.jpg', 'othertext_1.jpg', 'text_2.jpg', 'othertext_2.jpg']
res = [list(v) for k, v in itertools.groupby(l, find_number)]
print(res)
#[['text_1.jpg', 'othertext_1.jpg'], ['text_2.jpg', 'othertext_2.jpg']]
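Note that itertools.groupby only groups consecutive items with equal keys, so the answer above relies on the list already being sorted by number, as it is in the question. A minimal sketch with unsorted input (made-up sample data) shows the difference; converting the number with int() also makes 10 sort after 9 rather than right after 1:

```python
import itertools
import re

def find_number(s):
    # Assumes every name contains "_<digits>.jpg"; re.search would return None otherwise.
    return int(re.search(r'_(\d+)\.jpg', s).group(1))

l = ['text_2.jpg', 'text_1.jpg', 'othertext_2.jpg', 'othertext_1.jpg']  # not pre-sorted

# groupby on unsorted input: each run of equal keys becomes its own group.
unsorted_groups = [list(v) for _, v in itertools.groupby(l, find_number)]
print(unsorted_groups)
# [['text_2.jpg'], ['text_1.jpg'], ['othertext_2.jpg'], ['othertext_1.jpg']]

# Sort by the same key first; sorted() is stable, so order within a group is kept.
res = [list(v) for _, v in itertools.groupby(sorted(l, key=find_number), find_number)]
print(res)
# [['text_1.jpg', 'othertext_1.jpg'], ['text_2.jpg', 'othertext_2.jpg']]
```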

How to count number of times a value appears in nested dictionary in python?

I have a dictionary of dictionaries, for example:
d={'object1':{'time1':['value1','value2'],'time2':['value1','value4']},
'object2':{'time1':['value1','value6'],'time2':['value7','value8']}}
How can I iterate over the dictionary such that I can find value1 appears 3 times in total?
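You can iterate over the inner dictionaries' values and count like this:
n = 0
for inner in d.values():              # each value is itself a dict
    for list_data in inner.values():  # ...whose values are the lists
        n += list_data.count('value1')
print(n)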
Try with list.count(x):
d={'object1':{'time1':['value1','value1','value2'],'time2':['value1','value4']},'object2':{'time1':['value1','value6'],'time2':['value7','value8']}}
cnt =[item for l in [v2 for v1 in d.values() for v2 in v1.values()] for item in l].count('value1')
print(cnt)  # prints 4
You may use the combination of collections.Counter and itertools.chain to achieve this as:
>>> from itertools import chain
>>> from collections import Counter
>>> d={'time1':['value1','value2'],'time2':['value1','value4'],'time3':['value1','value5']}
>>> counter_dict = Counter(chain(*d.values()))
# ^ dict holding the count of each value
To fetch the count of 'value1' from counter_dict, just look up that key:
>>> counter_dict['value1']
3
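The example above uses a flat dictionary. For the nested dictionary in the question, one way (a sketch, not the only option) is to feed every inner list into chain.from_iterable:

```python
from collections import Counter
from itertools import chain

d = {'object1': {'time1': ['value1', 'value2'], 'time2': ['value1', 'value4']},
     'object2': {'time1': ['value1', 'value6'], 'time2': ['value7', 'value8']}}

# One extra level compared with the flat example: walk each inner dict,
# then chain all of its lists into a single stream of values.
counter_dict = Counter(chain.from_iterable(
    inner_list
    for inner in d.values()
    for inner_list in inner.values()
))
print(counter_dict['value1'])  # 3
```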
Well, the tricky way is:
print(str(d).count('value1'))
but it also counts substring matches (for example, 'value10' would be counted too), so you can always just do a nested loop instead.
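A minimal sketch of that nested walk, written recursively so it handles any depth of dicts, lists and tuples; count_value is a hypothetical helper name, and the extra 'value10' entry is made-up data added only to show where str(d).count() goes wrong:

```python
def count_value(obj, target):
    """Count exact occurrences of target among the values of nested dicts/lists/tuples.

    Dictionary keys are not inspected, only values.
    """
    if isinstance(obj, dict):
        return sum(count_value(v, target) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return sum(count_value(x, target) for x in obj)
    return 1 if obj == target else 0

# 'value10' is hypothetical data, added to show the substring pitfall.
d = {'object1': {'time1': ['value1', 'value2'], 'time2': ['value1', 'value4']},
     'object2': {'time1': ['value1', 'value6'], 'time2': ['value7', 'value10']}}

print(count_value(d, 'value1'))  # 3
print(str(d).count('value1'))    # 4, because 'value10' also contains 'value1'
```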
This may not be the most elegant solution, but it works for your nested dictionary problem:
lst = d.values()
sub_val = [temp.values() for temp in lst]
d_val = [item for sublist in sub_val for item in sublist]
d_val = [item for sublist in d_val for item in sublist]
count = d_val.count('value1')
lst is a view of the inner dictionaries. sub_val creates a list holding the values of each inner dictionary, which results in a doubly nested list, hence d_val is flattened twice. Finally, count returns the number of occurrences of value1 in the flattened list d_val.

Matching elements between lists in Python - keeping location

I have two lists, both fairly long. List A contains a list of integers, some of which are repeated in list B. I can find which elements appear in both by using:
idx = set(list_A).intersection(list_B)
This returns a set of all the elements appearing in both list A and list B.
However, I would like to find a way to find the matches between the two lists and also retain information about the elements' positions in both lists. Such a function might look like:
def match_lists(list_A, list_B):
    ...
    return match_A, match_B
where match_A would contain the positions of elements in list_A that had a match somewhere in list_B and vice-versa for match_B.
I can see how to construct such lists using a for loop; however, this feels like it would be prohibitively slow for long lists.
Regarding duplicates: list_B has no duplicates in it. If there is a duplicate in list_A, then return all the matched positions as a list, so match_A would be a list of lists.
That should do the job :)
def match_list(list_A, list_B):
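    intersect = set(list_A).intersection(list_B)
    # for each common value, every position it occupies in list_A
    # (ordered by set iteration, so not aligned with interPosB)
    interPosA = [[i for i, x in enumerate(list_A) if x == dup] for dup in intersect]
    # list_B has no duplicates, so one position per common value, in list_B order
    interPosB = [i for i, x in enumerate(list_B) if x in intersect]
    return interPosA, interPosB
(Thanks to machine yearning for the duplicate edit)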
Use dicts or defaultdicts to store the unique values as keys that map to the indices they appear at, then combine the dicts:
from collections import defaultdict
def make_offset_dict(it):
    ret = defaultdict(list)  # Or set; the values are unique indices either way
    for i, x in enumerate(it):
        ret[x].append(i)
    return ret
dictA = make_offset_dict(A)
dictB = make_offset_dict(B)
for k in dictA.keys() & dictB.keys():  # use .viewkeys() on Py2
    print(k, dictA[k], dictB[k])
This iterates A and B exactly once each, so it works even if they're one-time-use iterators (e.g. lines from a file-like object). It is also efficient: it stores no more data than needed and sticks to cheap hash-based operations instead of repeated iteration.
This isn't the solution to your specific problem, but it preserves all the information needed to solve your problem and then some (e.g. it's cheap to figure out where the matches are located for any given value in either A or B); you can trivially adapt it to your use case or more complicated ones.
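As one possible adaptation to the exact format asked for (match_A as a list of position lists, match_B as single positions because list_B has no duplicates), a sketch reusing make_offset_dict might look like this. The sample lists are made up, and the sorted() call assumes the values are mutually comparable (true for the integers in the question):

```python
from collections import defaultdict

def make_offset_dict(it):
    # value -> list of indices where it occurs
    ret = defaultdict(list)
    for i, x in enumerate(it):
        ret[x].append(i)
    return ret

def match_lists(list_A, list_B):
    dictA = make_offset_dict(list_A)
    dictB = make_offset_dict(list_B)
    # values present in both; sorted only so the output order is deterministic
    common = sorted(dictA.keys() & dictB.keys())
    match_A = [dictA[k] for k in common]     # all positions in list_A (list of lists)
    match_B = [dictB[k][0] for k in common]  # list_B has no duplicates: one index each
    return match_A, match_B

print(match_lists([5, 3, 7, 3, 9], [3, 8, 9]))  # ([[1, 3], [4]], [0, 2])
```

Because match_A and match_B are built from the same ordered list of common values, the i-th entries of both refer to the same value.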
How about this:
def match_lists(list_A, list_B):
    idx = set(list_A).intersection(list_B)
    A_indexes = []
    for i, element in enumerate(list_A):
        if element in idx:
            A_indexes.append(i)
    B_indexes = []
    for i, element in enumerate(list_B):
        if element in idx:
            B_indexes.append(i)
    return A_indexes, B_indexes
This only runs through each list once (requiring only one dict) and also works with duplicates in list_B:
def match_lists(list_A, list_B):
    # map each element of list_A to its index (for duplicates, the last index wins)
    da = dict((e, i) for i, e in enumerate(list_A))
    for bi, e in enumerate(list_B):
        try:
            ai = da[e]
            yield (e, ai, bi)  # element e is in position ai in list_A and bi in list_B
        except KeyError:
            pass
Try this:
def match_lists(list_A, list_B):
    match_A = {}
    match_B = {}
    for elem in list_A:
        if elem in list_B:  # linear scan of list_B
            match_A[elem] = list_A.index(elem)  # first occurrence only
            match_B[elem] = list_B.index(elem)
    return match_A, match_B

Python, Take dictionary, and produce list with (words>1, most common words, longest words)

So I made a function:
def word_count(string):
    my_string = string.lower().split()
    my_dict = {}
    for item in my_string:
        if item in my_dict:
            my_dict[item] += 1
        else:
            my_dict[item] = 1
    print(my_dict)
So, what this does is take a string, split it, and produce a dictionary with the key being the word and the value being how many times it appears.
Okay, so what I'm trying to do now is make a function that takes the output of that function and produces a list in the following format:
((list of words longer than 1 letter),(list of most frequent words), (list of words with the longest length))
Also, for example, let's say two words have each appeared 3 times and both words are 6 letters long; it should include both words in both the (most frequent) and (longest length) lists.
So, this has been my attempt thus far at tackling this problem:
def analyze(x):
    longer_than_one= []
    most_frequent= []
    longest= []
    for key in x.item:
        if len(key) >1:
            key.append(longer_than_one)
    print(longer_than_one)
So what I was trying to do here is make a series of for loops and if statements that append to the lists depending on whether or not the items meet the criteria. However, I have run into the following problems:
1- How do I iterate over a dictionary without getting an error?
2- I can't figure out a way to find the most frequent words (I was thinking of appending the keys with the highest values).
3- I can't figure out a way to append only the words that are the longest in the dictionary (I was thinking of using len(key), but it gave an error).
If it's any help, I'm working in Anaconda's Spyder using Python 3.5.1. Any tips would be appreciated!
You really are trying to reinvent the wheel.
Imagine you have list_of_words which is, well, a list of strings.
To get the most frequent word, use Counter:
from collections import Counter
my_counter = Counter(list_of_words)
To sort the list by the length:
sorted_by_length = sorted(list_of_words, key=len)
To get the list of words longer than one letter you can simply use your sorted list, or create a new list with only these:
longer_than_one_letter = [word for word in list_of_words if len(word) > 1]
To get your output in your required format, simply combine all of the above.
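Putting those pieces together, a minimal sketch of the analyze function from the question could look like this. It assumes the input is a word-to-count mapping (a plain dict or a Counter) and keeps ties as the question requires; the sample sentence is invented. Dict order only follows insertion from Python 3.7, so on older versions the order of words inside each list may differ:

```python
from collections import Counter

def analyze(counts):
    """Return (words longer than one letter, most frequent words, longest words).

    `counts` is assumed to map word -> count. Ties are kept: every word
    sharing the top count, or the top length, is included.
    """
    if not counts:
        return [], [], []
    longer_than_one = [w for w in counts if len(w) > 1]
    max_count = max(counts.values())
    most_frequent = [w for w, c in counts.items() if c == max_count]
    max_len = max(len(w) for w in counts)
    longest = [w for w in counts if len(w) == max_len]
    return longer_than_one, most_frequent, longest

counts = Counter('apple pear a apple pear kiwi'.split())
print(analyze(counts))  # (['apple', 'pear', 'kiwi'], ['apple', 'pear'], ['apple'])
```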
Most of your problems are solved or get easier when you use a Counter.
Writing word_count with a Counter:
>>> from collections import Counter
>>> def word_count(string):
...     return Counter(string.split())
...
Demo:
>>> c = word_count('aa aa aa xxx xxx xxx b b ccccccc')
>>> c
Counter({'aa': 3, 'xxx': 3, 'b': 2, 'ccccccc': 1})
>>> c['aa']
3
The most_common method of a Counter helps with getting the most frequent words:
>>> c.most_common()
[('aa', 3), ('xxx', 3), ('b', 2), ('ccccccc', 1)]
>>> c.most_common(1)
[('aa', 3)]
>>> max_count = c.most_common(1)[0][1]
>>> [word for word, count in c.items() if count == max_count]
['aa', 'xxx']
You can get the words themselves with c.keys() (wrap it in list() on Python 3 to see a plain list):
>>> list(c.keys())
['aa', 'xxx', 'b', 'ccccccc']
and a list of words with the longest length this way:
>>> max_len = len(max(c, key=len))
>>> [word for word in c if len(word) == max_len]
['ccccccc']
1)
To iterate over a dictionary, you can use either:
for key in my_dict:
or, if you want the key and value at the same time (my_dict.iteritems() on Python 2):
for key, value in my_dict.items():
2)
To find the most frequent words, assume that the first word is the most frequent, then look at the next word's count: if it's the same, append it to your list; if it's less, skip it; if it's more, clear your list and treat this one as the most frequent.
3) Pretty much the same as 2. Assume that your first word is the longest, then compare the next one: if its length equals your current max, append it to your list; if it's less, skip it; if it's more, clear your list and treat this one as your new max.
I didn't add any code since it's better if you write it on your own in order to learn something.
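For anyone who wants to check their own attempt afterwards, here is a minimal sketch of the single-pass idea from points 2) and 3). keep_maximal is a hypothetical helper name; it takes any iterable plus a scoring function, so the same loop serves both the most-frequent and the longest-word cases:

```python
def keep_maximal(items, score):
    """Single pass: keep every item whose score ties the best score seen so far."""
    best = []
    best_score = None
    for item in items:
        s = score(item)
        if best_score is None or s > best_score:
            best_score = s   # new maximum: clear the list and restart it
            best = [item]
        elif s == best_score:
            best.append(item)  # tie: keep it as well
        # s < best_score: skip the item
    return best

my_dict = {'apple': 2, 'pear': 2, 'a': 1, 'kiwi': 1}
most_frequent = keep_maximal(my_dict, lambda w: my_dict[w])
longest = keep_maximal(my_dict, len)
print(most_frequent, longest)  # ['apple', 'pear'] ['apple']
```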
There are other nice answers to your question, but I would like to help you with your attempt. I have made a few modifications to your code to make it work:
def analyze(x):
    longer_than_one= []
    most_frequent= []
    longest= []
    for key in x:
        if len(key) >1:
            longer_than_one.append(key)
    print(longer_than_one)
It seems you haven't attempted the 2nd and 3rd use cases yet.
First, take a look at collections.Counter:
import collections
word_counts = collections.Counter(your_text.split())
Given that, you can use its .most_common method for the most common words. It produces a list of (word, its_count) tuples.
To discover the longest words in the dictionary, you can do:
import heapq
longest_words = heapq.nlargest(N, word_counts, key=len)
N is the number of longest words you want. This works because iterating over a dict (or Counter) yields only its keys, so nlargest ranks the words by length (key=len) and returns only the N longest ones. Note that it returns at most N words, so words tied with the N-th longest may be left out.
But you seem to have fallen deep into Python without going over the tutorial. Is it homework?

Trying to add to dictionary values by counting occurrences in a list of lists (Python)

I'm trying to get a count of items in a list of lists and add those counts to a dictionary in Python. I have successfully made the list (it's a list of all possible combos of occurrences for individual ad viewing records) and a dictionary whose keys are all the values that could possibly appear. Now I need to count how many times each one occurs and set each value in the dictionary to the count of its key in the list of lists. Here's what I have:
import itertools
stuff=(1,2,3,4)
n=1
combs=list()
while n<=len(stuff):
    combs.append(list(itertools.combinations(stuff,n)))
    n = n+1
viewers=((1,3,4),(1,2,4),(1,4),(1,2),(1,4))
recs=list()
h=1
while h<=len(viewers):
    j=1
    while j<=len(viewers[h-1]):
        recs.append(list(itertools.combinations(viewers[h-1],j)))
        j=j+1
    h=h+1
showcount={}
for list in combs:
    for item in list:
        showcount[item]=0
for k, v in showcount:
    for item in recs:
        for item in item:
            if item == k:
                v = v+1
I've tried a bunch of different ways to do this, and I usually either get 'too many values to unpack' errors or it simply doesn't populate. There are several similar questions posted but I'm pretty new to Python and none of them really addressed what I needed close enough for me to figure it out. Many thanks.
Use a Counter instead of an ordinary dict to count things:
from collections import Counter
showcount = Counter()
for item in recs:
    showcount.update(item)
or even:
from collections import Counter
from itertools import chain
showcount = Counter(chain.from_iterable(recs))
As you can see that makes your code vastly simpler.
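Those Counter versions only contain combinations that actually occur. If you also want the question's behavior of a key for every possible combination, with 0 for the ones nobody viewed, a sketch is to seed the Counter with zeros first. It assumes each viewer tuple lists its ads in the same order as stuff, because itertools.combinations keeps input order and (1, 4) and (4, 1) would be different keys:

```python
import itertools
from collections import Counter

stuff = (1, 2, 3, 4)
viewers = ((1, 3, 4), (1, 2, 4), (1, 4), (1, 2), (1, 4))

# Every non-empty combination of the ads, starting at zero.
showcount = Counter({combo: 0
                     for n in range(1, len(stuff) + 1)
                     for combo in itertools.combinations(stuff, n)})

# Add one for each combination that appears in each viewer's record.
for record in viewers:
    for n in range(1, len(record) + 1):
        showcount.update(itertools.combinations(record, n))

print(showcount[(1, 4)])  # 4
print(showcount[(2, 3)])  # 0
```

Using range() in the for loops also replaces the manual while counters from the question.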
If all you want to do is flatten your list of lists, you can use itertools.chain():
>>> import itertools
>>> listOfLists = ((1,3,4),(1,2,4),(1,4),(1,2),(1,4))
>>> flatList = itertools.chain.from_iterable(listOfLists)
The Counter object from the collections module will probably do the rest of what you want.
>>> from collections import Counter
>>> Counter(flatList)
Counter({1: 5, 4: 4, 2: 2, 3: 1})
I have some old code that resembles the issue; it might prove useful to people facing a similar problem.
import sys
# read the file named on the command line, closing it afterwards
with open(sys.argv[-1], "r") as f:
    text = f.read()
wordictionary = {}
for word in text.split():
    if word not in wordictionary:
        wordictionary[word] = 1
    else:
        wordictionary[word] += 1
sortable = [(wordictionary[key], key) for key in wordictionary]
sortable.sort()
sortable.reverse()
for member in sortable: print (member)
First, 'flatten' the list using a generator expression: (item for sublist in recs for item in sublist).
Then, iterate over the flattened list. For each item, either add an entry to the dict (if it doesn't already exist) or add one to its value.
d = {}
for key in (item for sublist in recs for item in sublist):
    try:
        d[key] += 1
    except KeyError:  # raised the first time a key is seen
        d[key] = 1
This technique assumes all the elements of the sublists are hashable and can be used as keys.
