Count duplicated values in Python, considering sequence [duplicate]

This question already has an answer here:
What's the most Pythonic way to identify consecutive duplicates in a list?
(1 answer)
Closed 10 months ago.
I have a string value:
s = 'asdabbdasfababbabb'
I split the string into chunks of three characters using the code below and got the following result:
n = 3
split_strings = [s[index : index + n] for index in range(0, len(s), n)]
['asd', 'abb', 'das', 'fab', 'abb', 'abb']
What I need to achieve:
I want to count duplicated values while taking their order into account, for example:
({'asd': 1, 'abb': 1, 'das': 1, 'fab': 1, 'abb' : 2})
However, Counter() counts the duplicated values but does not seem to take the order of the list into account:
Counter({'asd': 1, 'abb': 3, 'das': 1, 'fab': 1})
How can I achieve what I need?

You cannot store duplicate keys in a dict. If you are willing to have a list of tuples, you can use itertools.groupby:
from itertools import groupby
lst = ['asd', 'abb', 'das', 'fab', 'abb', 'abb']
counts = [(k, len([*g])) for k, g in groupby(lst)]
print(counts) # [('asd', 1), ('abb', 1), ('das', 1), ('fab', 1), ('abb', 2)]
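Putting the question's chunking step and groupby together gives an end-to-end sketch. It counts each run with sum(1 for _ in group) instead of building a temporary list; that is only a stylistic choice, and len([*g]) works just as well:

```python
from itertools import groupby

s = 'asdabbdasfababbabb'
n = 3

# Split the string into consecutive chunks of n characters (the last chunk may be shorter)
chunks = [s[i:i + n] for i in range(0, len(s), n)]

# groupby only merges *adjacent* equal items, so each run is counted separately
runs = [(key, sum(1 for _ in group)) for key, group in groupby(chunks)]

print(runs)  # [('asd', 1), ('abb', 1), ('das', 1), ('fab', 1), ('abb', 2)]
```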

The itertools.groupby function is a favorite, but perhaps future readers might appreciate an algorithm for actually finding these groupings:
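def groups(*items):
    # Walk the items once, measuring each run of consecutive equal values
    i = 0
    result = []
    while i < len(items):
        item = items[i]
        j = i + 1
        count = 1
        # Extend the run while the next item equals the current one
        while j < len(items):
            if items[j] == item:
                count += 1
                j += 1
            else:
                break
        i = j
        result.append((item, count))
    return result
# Usage (the items are passed as separate arguments):
# groups(*lst) -> [('asd', 1), ('abb', 1), ('das', 1), ('fab', 1), ('abb', 2)]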
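The index-based loop above requires a sequence and takes its items as separate arguments. A single-pass variant that accepts any iterable (including generators) might look like this; the name run_lengths is hypothetical:

```python
from typing import Any, Iterable, List, Tuple


def run_lengths(iterable: Iterable[Any]) -> List[Tuple[Any, int]]:
    """Return (item, run_length) pairs for consecutive equal items."""
    result: List[Tuple[Any, int]] = []
    for item in iterable:
        if result and result[-1][0] == item:
            # Same as the previous item: extend the current run
            last_item, count = result[-1]
            result[-1] = (last_item, count + 1)
        else:
            # First item, or a different item: start a new run
            result.append((item, 1))
    return result


print(run_lengths(iter(['asd', 'abb', 'abb'])))  # [('asd', 1), ('abb', 2)]
print(run_lengths('aaabcc'))                     # [('a', 3), ('b', 1), ('c', 2)]
```

Because it only ever looks at the previous run, it needs no indexing and never has to hold the whole input in memory at once.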

Related

Convert a nested list from [[...],[...]] to [(...),(...)] [duplicate]

This question already has answers here:
How to sum up a list of tuples having the same first element?
(5 answers)
How to perform a groupby operation in Python on a list of tuples where we need to sum the second element? [duplicate]
(1 answer)
Closed 10 months ago.
Convert a nested list from [[...],[...]] to [(...),(...)]. I want to convert my list below:
x=[['dog', 2], ['bird', 1],['dog',1]]
to
x=[('dog', 3), ('bird', 1)]
Here is my code for reference.
# Convert the last element of each inner list to int
newlist = [[int(element) if str(element).isdigit() else element for element in sub] for sub in x]
# Sum the values whose names match
grouped = dict()
grouped.update((name,grouped.get(name,0)+value) for name,value in newlist)
x = [*map(list,grouped.items())]
Could this be due to my use of a dict()?
I have managed to sum the second elements whenever the first ones match; however, the result is formatted like this:
x=[['dog', 3], ['bird', 1]]
But I would like it to look like this instead. Any advice on how to get this output?
x=[('dog', 3), ('bird', 1)]
I guess you are looking for collections.Counter:
from collections import Counter
x=[['dog', 2], ['bird', 1],['dog',1]]
c = Counter()
for k, v in x:
    c[k] += v
print(c)
# as pointed out by wim in the comments, use the below
# to get a list of tuples:
print([*c.items()])
Here is one way to do so:
x = [['dog', 2], ['bird', 1], ['dog', 1]]
data = {k: 0 for k, _ in x}
for key, num in x:
    data[key] += num
print(list(data.items())) # [('dog', 3), ('bird', 1)]
You can also use setdefault():
data = {}
for key, num in x:
    data.setdefault(key, 0)
    data[key] += num
print(list(data.items()))
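This looks like it works:
# Each element is an inner list such as ['dog', 2]; element[0] is the name,
# so when the counts are already ints this simply copies the inner lists of x
newlist = [int(element) if element[0].isdigit() else element for element in [sub for sub in x]]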
# add the 2 columns that match
grouped = dict()
grouped.update((name, grouped.get(name, 0) + value) for name, value in newlist)
x = [*map(tuple, grouped.items())]
Don't make it a list in the first place. The only real change is replacing list with tuple; I also removed the unpacking ([*...]) and passed the map object directly to list().
change:
x = [*map(list,grouped.items())]
to:
x = list(map(tuple, grouped.items()))
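Since dict.items() already yields (key, value) tuples, the map(tuple, ...) step is optional. A minimal sketch of the whole task with a plain dict, assuming Python 3.7+ so that insertion order is preserved:

```python
x = [['dog', 2], ['bird', 1], ['dog', 1]]

# Sum the second element for each distinct first element
grouped = {}
for name, value in x:
    grouped[name] = grouped.get(name, 0) + value

# dict.items() already yields (key, value) tuples, so no map(tuple, ...) is required
result = list(grouped.items())
print(result)  # [('dog', 3), ('bird', 1)]
```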
x=[['dog', 3], ['bird', 1]]
# You want it to be...
x=[('dog', 3), ('bird', 1)]
So you should first know how to convert ['dog', 3] to ('dog', 3):
>>> x = ['dog', 3]
>>> tuple(x)
('dog', 3)
To make it a tuple, just call the tuple constructor.
Then you have to apply this to the whole x list:
x = [tuple(i) for i in x]

Get all occurrences of substrings containing matching characters from a string

For example, find the substrings containing 'a', 'b' and 'c' in the string 'abca'; the answer should be 'abc', 'abca', 'bca'.
The code below is what I did, but is there a better, more Pythonic way than using two for loops?
Another example: for 'abcabc' the count should be 10.
def test(x):
    counter = 0
    for i in range(0, len(x)):
        for j in range(i, len(x) + 1):
            if len(x[i:j]) > 2:
                print(x[i:j])
                counter += 1
    print(counter)
test('abca')
You can condense it with a list comprehension:
s = 'abcabc'
substrings = [s[b:e] for b in range(len(s)-2) for e in range(b+3, len(s)+1)]
substrings, len(substrings)
# (['abc', 'abca', 'abcab', 'abcabc', 'bca', 'bcab', 'bcabc', 'cab', 'cabc', 'abc'], 10)
You can use combinations from itertools:
from itertools import combinations
string = "abca"
result = [string[x:y] for x, y in combinations(range(len(string) + 1), r = 2)]
result = [item for item in result if 'a' in item and 'b' in item and 'c' in item]
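If only the count is needed, the substrings do not have to be built at all: once the substring from index start to index end contains every required character, every longer substring with the same start contains them too, so all of those can be counted at once. A sketch of that idea (the function name count_substrings_containing is hypothetical, and the worst case is still quadratic):

```python
def count_substrings_containing(s, required):
    """Count the substrings of s that contain every character in required."""
    required = set(required)
    n = len(s)
    total = 0
    for start in range(n):
        seen = set()
        for end in range(start, n):
            if s[end] in required:
                seen.add(s[end])
            if seen == required:
                # s[start:end + 1] is the shortest qualifying substring from start;
                # each of the n - end substrings s[start:end + 1 .. n] qualifies
                total += n - end
                break
    return total


print(count_substrings_containing('abca', 'abc'))    # 3
print(count_substrings_containing('abcabc', 'abc'))  # 10
```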

Filter a list of strings by frequency

I have a list of strings:
a = ['book','book','cards','book','foo','foo','computer']
I want to return every item in this list that appears more than 2 times.
Final output:
a = ['book','book','book']
I'm not quite sure how to approach this, but here are two methods I had in mind:
Approach One:
I've created a dictionary to count the number of times an item appears:
from collections import defaultdict
a = ['book','book','cards','book','foo','foo','computer']
def update_item_counts(item_counts, itemset):
    for a in itemset:
        item_counts[a] += 1
test = defaultdict(int)
update_item_counts(test, a)
print(test)
Out: defaultdict(<class 'int'>, {'book': 3, 'cards': 1, 'foo': 2, 'computer': 1})
I want to filter out the list with this dictionary but I'm not sure how to do that.
Approach two:
I tried to write a list comprehension but it doesn't seem to work:
res = [k for k in a if a.count > 2 in k]
A very bare-bones answer: in your second solution, replace a.count > 2 in k with a.count(k) > 2.
That said, avoid list.count for this, since it traverses the whole list for every item. Instead, count occurrences first with collections.Counter, which traverses the list only once.
from collections import Counter
from itertools import repeat
a = ['book','book','cards','book','foo','foo','computer']
count = Counter(a)
output = [word for item, n in count.items() if n > 2 for word in repeat(item, n)]
print(output) # ['book', 'book', 'book']
Note that the list comprehension is equivalent to the loop below.
output = []
for item, n in count.items():
    if n > 2:
        output.extend(repeat(item, n))
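A variant that keeps the surviving items in their original positions (which matters when several different items pass the threshold) counts once and then filters the original list, so it is still two linear passes:

```python
from collections import Counter

a = ['book', 'book', 'cards', 'book', 'foo', 'foo', 'computer']

counts = Counter(a)  # one pass to count every item
# Second pass: keep each element where it appeared if its total count exceeds 2
output = [word for word in a if counts[word] > 2]
print(output)  # ['book', 'book', 'book']
```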
Try this:
a_list = ['book','book','cards','book','foo','foo','computer']
b_list = []
for a in a_list:
    if a_list.count(a) > 2:
        b_list.append(a)
print(b_list)
# ['book', 'book', 'book']
Edit: You mentioned a list comprehension, and you are on the right track! You can do it with a list comprehension like this:
a_list = ['book','book','cards','book','foo','foo','computer']
c_list = [a for a in a_list if a_list.count(a) > 2]
Good luck!
a = ['book','book','cards','book','foo','foo','computer']
list(filter(lambda s: a.count(s) > 2, a))
Your first attempt builds a dictionary with all of the counts. You need to take this a step further to get the items that you want:
res = [k for k in a if test[k] > 2]
Now that you have built this by hand, you should check out the Counter class in the standard library's collections module, which does all of the work for you.
If you just want to print the result, there are better answers already; if you want to remove the items from the list itself, you can try this:
a = ['book','book','cards','book','foo','foo','computer']
countdict = {}
for word in a:
    if word not in countdict:
        countdict[word] = 1
    else:
        countdict[word] += 1
for x, y in countdict.items():
    if y <= 2:
        for i in range(y):
            a.remove(x)
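Calling a.remove(x) in a loop rescans the list on every call. Assuming the point of removing in place is that other references to the same list should see the change, a sketch of an alternative uses slice assignment to replace the list's contents in one go:

```python
from collections import Counter

a = ['book', 'book', 'cards', 'book', 'foo', 'foo', 'computer']
original = a  # a second name bound to the same list object

counts = Counter(a)
# Slice assignment replaces the contents of the existing list object,
# instead of calling a.remove(x) repeatedly (each remove scans the list)
a[:] = [word for word in a if counts[word] > 2]

print(a)              # ['book', 'book', 'book']
print(original is a)  # True: the same list object was modified in place
```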
You can try this.
def my_filter(my_list, my_freq):
    '''Filter a list of strings by frequency'''
    # use set() to deduplicate my_list, then turn the set back into a list
    unique_list = list(set(my_list))
    # count the frequency of each unique value in my_list
    frequencies = []
    for value in unique_list:
        frequencies.append(my_list.count(value))
    # keep only values whose frequency exceeds my_freq
    return_list = []
    for i, frequency in enumerate(frequencies):
        if frequency > my_freq:
            for _ in range(frequency):
                return_list.append(unique_list[i])
    return return_list
a = ['book','book','cards','book','foo','foo','computer']
my_filter(a, 2)
['book', 'book', 'book']

Count the frequency of elements in a list of lists in Python

Say I have the following sorted list of lists:
List 1 : (12,24,36)
List 2 : (3,5,12,24)
List 3 : (36,41,69)
I want to find the frequency of each element across the entire list of lists. I came up with an ugly function for this in Python, but I was wondering if there is a library function.
Edit: please find the code below.
def find_frequency(transactions, list):
    freq = 0
    for items_transaction in transactions:
        flag = 0
        for candidate in list:
            if candidate not in items_transaction:
                flag = 1
                break
        if flag == 0:
            freq += 1
    return freq
Counter does what I believe you are looking for:
>>> from itertools import chain
>>> from collections import Counter
>>> list1, list2, list3 = [12,24,36], [3,5,12,24], [36,41,69]
>>> Counter(chain(list1, list2, list3))
Counter({12: 2, 24: 2, 36: 2, 3: 1, 5: 1, 41: 1, 69: 1})
For those who searched by the title referring to list of lists, it seems like a simple solution is:
from itertools import chain
from collections import Counter
list_of_lists = [[12,24,36], [3,5,12,24], [36,41,69]]
Counter(chain.from_iterable(list_of_lists))
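Note that the question's find_frequency does something different from counting single elements: it counts how many transactions contain every candidate item (often called the support count of an itemset). Assuming that is the intended meaning, a shorter sketch with set inclusion could look like this; the name support_count is hypothetical, and the items must be hashable:

```python
def support_count(transactions, candidates):
    """Count how many transactions contain every item in candidates."""
    wanted = set(candidates)
    # set.issubset accepts any iterable, so the inner lists need no conversion
    return sum(1 for items in transactions if wanted.issubset(items))


transactions = [[12, 24, 36], [3, 5, 12, 24], [36, 41, 69]]
print(support_count(transactions, [12, 24]))  # 2
print(support_count(transactions, [12, 36]))  # 1
```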
You need to flatten the list, that is, transform it into a single long sequence of all values. This can be done using itertools.chain:
import collections, itertools
l = [[12,24,36], [3,5,12,24], [36,41,69]]
freq = collections.defaultdict(int) # 0 by default
for x in itertools.chain.from_iterable(l):
    freq[x] += 1
print(freq)
You could flatten the list using what some other posters have said, and then do something like this.
def numList(numbers):
    dic = {}
    for num in numbers:
        if num in dic:
            dic[num] += 1
        else:
            dic[num] = 1
    return dic
This creates a dictionary of all the numbers in a list and their frequencies. The dictionary key is the number, and its value is the frequency.
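The sample lists contain no repeated elements, so both interpretations agree, but if an element can repeat inside one inner list, its total count and the number of lists containing it differ. A sketch of both (the extra 12 in the first list is added only for illustration):

```python
from collections import Counter
from itertools import chain

list_of_lists = [[12, 24, 36, 12], [3, 5, 12, 24], [36, 41, 69]]

# Total occurrences: duplicates inside one inner list are counted separately
total = Counter(chain.from_iterable(list_of_lists))

# Number of inner lists containing each element: deduplicate each list first
per_list = Counter(chain.from_iterable(set(inner) for inner in list_of_lists))

print(total[12], per_list[12])  # 3 2
```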

Python - counting duplicate strings

I'm trying to write a function that will count the number of word duplicates in a string and then return that word if the number of duplicates exceeds a certain number (n). Here's what I have so far:
from collections import defaultdict
def repeat_word_count(text, n):
    words = text.split()
    tally = defaultdict(int)
    answer = []
    for i in words:
        if i in tally:
            tally[i] += 1
        else:
            tally[i] = 1
I don't know where to go from here when it comes to comparing the dictionary values to n.
How it should work:
repeat_word_count("one one was a racehorse two two was one too", 3) should return ['one']
Try
for i in words:
    tally[i] = tally.get(i, 0) + 1
instead of
for i in words:
    if i in tally:
        tally[words] += 1  # you are using the list words as the key; you should use the item i
    else:
        tally[words] = 1
If you simply want to count the words, collections.Counter will do fine.
>>> import collections
>>> a = collections.Counter("one one was a racehorse two two was one too".split())
>>> a
Counter({'one': 3, 'was': 2, 'two': 2, 'a': 1, 'racehorse': 1, 'too': 1})
>>> a['one']
3
Here is a way to do it:
from collections import defaultdict
tally = defaultdict(int)
text = "one two two three three three"
for i in text.split():
    tally[i] += 1
print(tally)  # defaultdict(<class 'int'>, {'one': 1, 'two': 2, 'three': 3})
Putting this in a function:
def repeat_word_count(text, n):
    output = []
    tally = defaultdict(int)
    for i in text.split():
        tally[i] += 1
    for k in tally:
        if tally[k] > n:
            output.append(k)
    return output
text = "one two two three three three four four four four"
repeat_word_count(text, 2)
# ['three', 'four']
If what you want is a dictionary counting the words in a string, you can try this:
string = 'hello world hello again now hi there hi world'.split()
d = {}
for word in string:
    d[word] = d.get(word, 0) + 1
print(d)
Output:
{'hello': 2, 'world': 2, 'again': 1, 'now': 1, 'hi': 2, 'there': 1}
Why not use the Counter class for this case?
from collections import Counter
cnt = Counter(text.split())
Elements are stored as dictionary keys and their counts as dictionary values. Then it's easy to keep the words whose count exceeds n with a for loop over the keys:
words = []
for k in cnt:
    if cnt[k] > n:
        words.append(k)
words then holds the list of words whose count exceeds n.
Edit: sorry, that's for when you need several words; BrianO has the right one for your case.
As luoluo says, use collections.Counter.
To get the item with the highest tally, use the Counter.most_common method with argument 1, which returns a list containing a single (word, tally) pair with the maximum tally (if several words tie, only one of them is returned). If the text contains at least one word, that list is nonempty. So the following function returns a most frequent word if it occurs more than n times, and returns None otherwise:
from collections import Counter
def repeat_word_count(text, n):
    if not text:
        return None  # guard against '' and None!
    counter = Counter(text.split())
    if not counter:
        return None  # whitespace-only text contains no words
    max_pair = counter.most_common(1)[0]
    return max_pair[0] if max_pair[1] > n else None
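The question's own example returns ['one'] for n=3 even though 'one' occurs exactly three times, which suggests an "at least n" threshold and a list of all qualifying words. Assuming that reading, a minimal sketch:

```python
from collections import Counter


def repeat_word_count(text, n):
    """Return the words occurring at least n times, in order of first appearance."""
    counts = Counter(text.split())
    return [word for word, count in counts.items() if count >= n]


sentence = "one one was a racehorse two two was one too"
print(repeat_word_count(sentence, 3))  # ['one']
print(repeat_word_count(sentence, 2))  # ['one', 'was', 'two']
```

If the question's "exceeds n" wording is the intended rule instead, change count >= n to count > n.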
