Find count of identical adjacent characters in a string

I have a string: 'AAAAATTT'
I want to write a program that would count each time 2 values are identical.
So in 'AAAAATTT' it would give a count of:
AA: 4
TT: 2

You can use collections.defaultdict for this. This is an O(n) complexity solution which loops through adjacent letters and builds a dictionary based on a condition.
Your output will be a dictionary with keys as repeated letters and values as counts.
The use of itertools.islice is to avoid building a new list for the second argument of zip.
from collections import defaultdict
from itertools import islice
x = 'AAAAATTT'
d = defaultdict(int)
for i, j in zip(x, islice(x, 1, None)):
    if i == j:
        d[i+j] += 1
Result:
print(d)
defaultdict(<class 'int'>, {'AA': 4, 'TT': 2})

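You could use a Counter. Note that this counts total occurrences, so it matches the adjacent-pair count only when each character occurs in one contiguous run, as in the example: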
from collections import Counter
s = 'AAAAATTT'
print([(k*2, v - 1) for k, v in Counter(list(s)).items() if v > 1])
#output: [('AA', 4), ('TT', 2)]
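One caveat worth checking: counting total occurrences and subtracting one agrees with counting adjacent pairs only when each character forms a single contiguous run. The sketch below compares the two on a hypothetical input, 'AATAA'; the function names are made up for illustration and do not come from the answers.

```python
from collections import Counter

def pairs_by_total_count(s):
    # Total occurrences of each character, minus one.
    return {k * 2: v - 1 for k, v in Counter(s).items() if v > 1}

def pairs_by_adjacency(s):
    # Compare each character with its right-hand neighbour.
    return dict(Counter(a + b for a, b in zip(s, s[1:]) if a == b))

print(pairs_by_total_count('AAAAATTT'))  # {'AA': 4, 'TT': 2}
print(pairs_by_adjacency('AAAAATTT'))    # {'AA': 4, 'TT': 2}
print(pairs_by_total_count('AATAA'))     # {'AA': 3}
print(pairs_by_adjacency('AATAA'))       # {'AA': 2}
```

The two agree on the question's string but diverge as soon as a character reappears after a gap.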

You may use collections.Counter with dictionary comprehension and zip as:
>>> from collections import Counter
>>> s = 'AAAAATTT'
>>> {k: v for k, v in Counter(zip(s, s[1:])).items() if k[0]==k[1]}
{('A', 'A'): 4, ('T', 'T'): 2}
Here's another alternative using itertools.groupby. It is less clean than the solution above and will also be slower:
>>> from itertools import groupby
>>> {x[0]:len(x) for i,j in groupby(zip(s, s[1:]), lambda y: y[0]==y[1]) for x in (tuple(j),) if i}
{('A', 'A'): 4, ('T', 'T'): 2}
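Another way to read the problem is in terms of run lengths: a run of n identical characters contains n - 1 adjacent identical pairs. The following is a minimal sketch of that idea, applying groupby to the string itself; adjacent_pair_counts is a hypothetical helper name. It accumulates across separate runs of the same character.

```python
from collections import Counter
from itertools import groupby

def adjacent_pair_counts(s):
    counts = Counter()
    for char, run in groupby(s):
        run_length = sum(1 for _ in run)  # consume the group to measure it
        if run_length > 1:
            counts[char * 2] += run_length - 1
    return dict(counts)

print(adjacent_pair_counts('AAAAATTT'))  # {'AA': 4, 'TT': 2}
print(adjacent_pair_counts('AATAA'))     # {'AA': 2}
```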

One way, using Counter:
from collections import Counter
string = 'AAAAATTT'
result = dict(Counter(s1+s2 for s1, s2 in zip(string, string[1:]) if s1==s2))
print(result)
Result:
{'AA': 4, 'TT': 2}

You can do it with just range, without importing anything:
data='AAAAATTT'
count_dict={}
for i in range(0,len(data),1):
    data_x=data[i:i+2]
    if len(data_x)>1:
        if data_x[0] == data_x[1]:
            if data_x not in count_dict:
                count_dict[data_x] = 1
            else:
                count_dict[data_x] += 1
print(count_dict)
output:
{'TT': 2, 'AA': 4}

Related

Given a list of [string, number] tuples, create a dictionary where keys are the first characters of strings and the values are sums of the numbers

I have a list of tuples, where first object is a string and second one is a number. I need to create a dictionary with using first letter of the string as a key and number (or I need to add some numbers if keys will be the same) as a value.
for example:
input
lst = [('Alex', 5), ('Addy', 7), ('Abdul', 2), ('Bob', 6), ('Carl', 8), ('Cal', 4)]
output
dct = {'A': 14, 'B': 6, 'C': 12}

The simplest, most straightforward and naive way is:
dct = {}
for k, v in lst:
    key = k[0]  # first letter of the name
    if key in dct:
        dct[key] += v
    else:
        dct[key] = v
There are progressively more clever ways. The first is probably to use .get with a default:
dct = {}
for k, v in lst:
    dct[k[0]] = dct.get(k[0], 0) + v
Next, you can use a collections.defaultdict, which takes a "factory" function that is called when a key is missing. Use int as the factory:
from collections import defaultdict
dct = defaultdict(int)
for k, v in lst:
    dct[k[0]] += v
NOTE: it is usually safer to create a regular dict out of this, to avoid the default behavior:
dct = dict(dct)
Or even
dct.default_factory = None
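To illustrate why the conversion can matter, here is a small sketch of the "default behavior" in question: merely reading a missing key from a defaultdict inserts it. The keys 'Z' and 'Q' are made-up examples.

```python
from collections import defaultdict

dct = defaultdict(int)
dct['A'] += 14

# Merely reading a missing key inserts it with the default value.
_ = dct['Z']
print(dict(dct))  # {'A': 14, 'Z': 0}

# Converting to a plain dict restores the usual KeyError.
plain = dict(dct)
try:
    plain['Q']
except KeyError:
    print('KeyError for missing key in plain dict')

# Alternatively, disable the factory on the defaultdict itself.
dct.default_factory = None
try:
    dct['Q']
except KeyError:
    print('KeyError after default_factory = None')
```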
Finally, one of the more flexible ways is to create your own dict subclass and define __missing__. This is useful if you need access to the key when making the default value, so it is not particularly helpful here, but for completeness' sake:
class AggDict(dict):
    def __missing__(self, key):
        return 0
dct = AggDict()
for k, v in lst:
    dct[k[0]] += v
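As a rough illustration of the "access to the key" point, the sketch below defines a hypothetical subclass, UpperDefaultDict (not part of the answer), whose default value is derived from the key itself. A defaultdict factory cannot do this because it is called with no arguments.

```python
class UpperDefaultDict(dict):
    """Hypothetical dict subclass whose default depends on the key."""

    def __missing__(self, key):
        # __missing__ receives the key that was not found.
        value = key.upper()
        self[key] = value  # store it so later lookups find it directly
        return value

d = UpperDefaultDict()
print(d['abc'])  # ABC
print(d)         # {'abc': 'ABC'}
```

Storing the value inside __missing__ is a design choice in this sketch; the AggDict version returns 0 without storing it and relies on += to do the assignment.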

Use a defaultdict:
from collections import defaultdict
dct = defaultdict(int) # default to 0
for name, val in lst:
    dct[name[0]] += val
dct = dict(dct) # get rid of default value

You could use Counter from collections to convert the tuples to countable key/values, then use reduce from functools to add them together:
from collections import Counter
from functools import reduce
lst = [('Alex', 5), ('Addy', 7), ('Abdul', 2), ('Bob', 6), ('Carl', 8), ('Cal', 4)]
dst = reduce(Counter.__add__,(Counter({k[:1]:v}) for k,v in lst))
# Counter({'A': 14, 'C': 12, 'B': 6})
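One thing to be aware of with this approach: Counter addition keeps only positive totals, so it behaves differently from the dict-based answers if the numbers can be zero or negative. The sketch below uses made-up data with a negative value to show the difference, and contrasts it with Counter.update, which keeps every key.

```python
from collections import Counter

# Hypothetical data containing a negative number.
lst = [('Alex', 5), ('Addy', -5), ('Bob', 6)]

# Counter.__add__ discards totals that are zero or negative.
added = Counter()
for name, value in lst:
    added = added + Counter({name[:1]: value})
print(added)  # Counter({'B': 6})

# Counter.update keeps every key, whatever the sign of the total.
updated = Counter()
for name, value in lst:
    updated.update({name[:1]: value})
print(dict(updated))  # {'A': 0, 'B': 6}
```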

Get sequences of same values within list and count elements within sequences

I'd like to find the amount of values within sequences of the same value from a list:
list = ['A','A','A','B','B','C','A','A']
The result should look like:
result_dic = {A: [3,2], B: [2], C: [1]}
I do not just want the counts of different values in a list as you can see in the result for A.

collections.defaultdict and itertools.groupby
from itertools import groupby
from collections import defaultdict
listy = ['A','A','A','B','B','C','A','A']
d = defaultdict(list)
for k, v in groupby(listy):
    d[k].append(len([*v]))
d
defaultdict(list, {'A': [3, 2], 'B': [2], 'C': [1]})
groupby will loop through an iterable and lump contiguous things together.
[(k, [*v]) for k, v in groupby(listy)]
[('A', ['A', 'A', 'A']), ('B', ['B', 'B']), ('C', ['C']), ('A', ['A', 'A'])]
So I loop through those results and append the length of each group to the corresponding list in a defaultdict.

I'd suggest using a defaultdict and looping through the list.
from collections import defaultdict
sample = ['A','A','A','B','B','C','A','A']
result_dic = defaultdict(list)
last_letter = None
num = 0
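for l in sample:
    if last_letter == l or last_letter is None:
        num += 1
    else:
        # the run has ended: record its length and start a new one
        result_dic[last_letter].append(num)
        num = 1
    last_letter = l
if last_letter is not None:
    result_dic[last_letter].append(num)  # record the final run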
Edit
This is my approach, although I'd have a look at @piRSquared's answer because they were keen enough to include groupby as well. Nice work!

I'd suggest looping through the list.
result_dic = {}
old_word = ''
for word in list:  # `list` is the variable name used in the question
    if word not in result_dic:
        result_dic[word] = [1]
    elif word == old_word:
        result_dic[word][-1] += 1
    else:
        result_dic[word].append(1)
    old_word = word

Filter out elements that occur less times than a minimum threshold

I am counting the occurrences of each element in a list using the code below:
from collections import Counter
A = ['a','a','a','b','c','b','c','b','a']
A = Counter(A)
min_threshold = 3
After calling Counter on A above, a counter object like this is formed:
>>> A
Counter({'a': 4, 'b': 3, 'c': 2})
From here, how do I filter only 'a' and 'b' using minimum threshold value of 3?

Build your Counter, then use a dict comprehension as a second, filtering step.
{x: count for x, count in A.items() if count >= min_threshold}
# {'a': 4, 'b': 3}

As covered by Satish BV, you can iterate over your Counter with a dictionary comprehension. Use items (or iteritems on Python 2, which avoids building an intermediate list) to get the (key, value) pairs, then turn the result back into a Counter.
my_dict = {k: v for k, v in A.items() if v >= min_threshold}
filteredA = Counter(my_dict)
Alternatively, you could iterate over a snapshot of the original Counter's items and remove the unwanted entries. The list() call is needed on Python 3 because a dict cannot change size while it is being iterated:
for k, v in list(A.items()):
    if v < min_threshold:
        A.pop(k)

This looks nicer:
{ x: count for x, count in A.items() if count >= min_threshold }

You could remove the keys from the dictionary that are below 3:
for key, cnts in list(A.items()): # list is important here
    if cnts < min_threshold:
        del A[key]
Which gives you:
>>> A
Counter({'a': 4, 'b': 3})
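To see why the list() call matters, here is a minimal sketch, assuming Python 3, that runs the deletion loop both ways. Without the snapshot, deleting from the Counter while iterating over its live items view raises a RuntimeError. The second Counter, B, is introduced only for the comparison.

```python
from collections import Counter

min_threshold = 3

A = Counter(['a', 'a', 'a', 'b', 'c', 'b', 'c', 'b', 'a'])
try:
    for key, count in A.items():  # no snapshot: iterating the live view
        if count < min_threshold:
            del A[key]
except RuntimeError as exc:
    print('RuntimeError:', exc)

B = Counter(['a', 'a', 'a', 'b', 'c', 'b', 'c', 'b', 'a'])
for key, count in list(B.items()):  # snapshot taken before deleting
    if count < min_threshold:
        del B[key]
print(B)  # Counter({'a': 4, 'b': 3})
```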

Find count of characters within the string in Python

I am trying to create a dictionary mapping each character to the number of times it occurs in a string. Suppose the string is:
str1 = "aabbaba"
I want to create a dictionary like this
word_count = {'a':4,'b':3}
I am trying to use dictionary comprehension to do this.
I did
dic = {x:dic[x]+1 if x in dic.keys() else x:1 for x in str}
This ends up giving an error saying
File "<stdin>", line 1
dic = {x:dic[x]+1 if x in dic.keys() else x:1 for x in str}
^
SyntaxError: invalid syntax
Can anybody tell me what's wrong with the syntax? Also, how can I create such a dictionary using a dictionary comprehension?

As others have said, this is best done with a Counter.
You can also do:
>>> {e:str1.count(e) for e in set(str1)}
{'a': 4, 'b': 3}
But that traverses the string 1+m times, where m is the number of unique characters: once to create the set, and once per unique character to count its occurrences. That is O(n*m), which approaches quadratic time when a long string has many unique characters. A Counter only traverses the string once.
If you want a no-import version that is more efficient than using .count, you can use .setdefault to build a counter:
>>> count={}
>>> for c in str1:
...     count[c]=count.setdefault(c, 0)+1
...
>>> count
{'a': 4, 'b': 3}
That only traverses the string once no matter how long or how many unique characters.
You can also use defaultdict if you prefer:
>>> from collections import defaultdict
>>> count=defaultdict(int)
>>> for c in str1:
...     count[c]+=1
...
>>> count
defaultdict(<type 'int'>, {'a': 4, 'b': 3})
>>> dict(count)
{'a': 4, 'b': 3}
But if you are going to import collections -- Use a Counter!

The ideal way to do this is with collections.Counter:
>>> from collections import Counter
>>> str1 = "aabbaba"
>>> Counter(str1)
Counter({'a': 4, 'b': 3})
You cannot achieve this with a simple dict comprehension, because each value would need to refer to the previous count for that element. As mentioned in Dawg's answer, as a workaround you may use str.count(e) inside the dict comprehension to count each element of the set of characters. But the time complexity will be O(n*m), where m is the number of unique elements, since the complete string is traversed once per unique element, whereas with Counter it is O(n).

This is a nice case for collections.Counter:
>>> from collections import Counter
>>> Counter(str1)
Counter({'a': 4, 'b': 3})
It's a dict subclass, so you can work with the object much like a standard dictionary:
>>> c = Counter(str1)
>>> c['a']
4
You can do this without the Counter class as well. Simple and efficient Python code for this would be:
>>> d = {}
>>> for x in str1:
...     d[x] = d.get(x, 0) + 1
...
>>> d
{'a': 4, 'b': 3}

Note that this is not the correct way to count characters: the comprehension reads from the dic that existed before it ran, so repeated characters are not counted more than once, and any other keys in the original dict are lost. It does, however, answer the original question of whether if-else is possible in comprehensions and demonstrates how it can be done.
To answer your question: yes, it's possible, but the approach is like this:
dic = {x: (dic[x] + 1 if x in dic else 1) for x in str1}
The condition is applied to the value only, not to the key:value mapping.
The above can be made clearer using dict.get:
dic = {x: dic.get(x, 0) + 1 for x in str1}
0 is returned if x is not in dic.
Demo:
In [78]: s = "abcde"
In [79]: dic = {}
In [80]: dic = {x: (dic[x] + 1 if x in dic else 1) for x in s}
In [81]: dic
Out[81]: {'a': 1, 'b': 1, 'c': 1, 'd': 1, 'e': 1}
In [82]: s = "abfg"
In [83]: dic = {x: dic.get(x, 0) + 1 for x in s}
In [84]: dic
Out[84]: {'a': 2, 'b': 2, 'f': 1, 'g': 1}
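Applied to the string from the question, the same behaviour can be sketched as follows. The comprehension is fully evaluated against the old dic before the name is rebound, so it never sees its own partial result. The Counter line is included only for comparison.

```python
from collections import Counter

str1 = "aabbaba"

dic = {}
# The right-hand side is evaluated against the old (empty) dic,
# so every lookup falls back to 0 and each count comes out as 1.
dic = {x: dic.get(x, 0) + 1 for x in str1}
print(dic)  # {'a': 1, 'b': 1}

# A Counter produces the intended result.
print(dict(Counter(str1)))  # {'a': 4, 'b': 3}
```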

weighted counting in python

I want to count the instances of X in a list, similar to
How can I count the occurrences of a list item in Python?
but taking into account a weight for each instance.
For example,
L = [(a,4), (a,1), (b,1), (b,1)]
the function weighted_count() should return something like
[(a,5), (b,2)]
Edited to add: my a, b will be integers.

You can still use Counter:
from collections import Counter
c = Counter()
for k,v in L:
    c.update({k:v})
print(c)

The following will give you a dictionary of all the letters in the array and their corresponding counts
counts = {}
for value in L:
    if value[0] in counts:
        counts[value[0]] += value[1]
    else:
        counts[value[0]] = value[1]
Alternatively, if you're looking for one specific value, you can filter the list for that value, then map the list to the weights and sum them.
def countOf(x, L):
    filteredL = list(filter(lambda value: value[0] == x, L))
    return sum(list(map(lambda value: value[1], filteredL)))

>>> import itertools
>>> L = [ ('a',4), ('a',1), ('b',1), ('b',1) ]
>>> [(k, sum(amt for _,amt in v)) for k,v in itertools.groupby(sorted(L), key=lambda tup: tup[0])]
[('a', 5), ('b', 2)]

defaultdict will do:
from collections import defaultdict
L = [('a',4), ('a',1), ('b',1), ('b',1)]
res = defaultdict(int)
for k, v in L:
    res[k] += v
print(list(res.items()))
prints (on Python 3.7+, where dicts preserve insertion order):
[('a', 5), ('b', 2)]

Group the items by the first element of each tuple using groupby from itertools. Note that groupby only merges consecutive items, so L must already be sorted by key (as it is here):
>>> from itertools import groupby
>>> from operator import itemgetter
>>> L = [('a',4), ('a',1), ('b',1), ('b',1)]
>>> L_new = []
>>> for k,v in groupby(L,key=itemgetter(0)):
...     L_new.append((k,sum(map(itemgetter(1), v))))
...
>>> L_new
[('a', 5), ('b', 2)]
>>> L_new = [(k,sum(map(itemgetter(1), v))) for k,v in groupby(L, key=itemgetter(0))]  # for fans of list comprehensions and one-liners
>>> L_new
[('a', 5), ('b', 2)]
Tested in both Python2 & Python3
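Because groupby only merges consecutive items with equal keys, the result depends on the input order. The sketch below runs the same grouping on an interleaved version of the data, then sorts first; weighted_count here is a hypothetical helper named after the function in the question.

```python
from itertools import groupby
from operator import itemgetter

def weighted_count(pairs):
    # Sort by key first so that equal keys are adjacent for groupby.
    ordered = sorted(pairs, key=itemgetter(0))
    return [(k, sum(w for _, w in grp))
            for k, grp in groupby(ordered, key=itemgetter(0))]

L = [('a', 4), ('b', 1), ('a', 1), ('b', 1)]  # same data, interleaved

unsorted_result = [(k, sum(w for _, w in grp))
                   for k, grp in groupby(L, key=itemgetter(0))]
print(unsorted_result)    # [('a', 4), ('b', 1), ('a', 1), ('b', 1)]
print(weighted_count(L))  # [('a', 5), ('b', 2)]
```

Sorting costs O(n log n); the dict-based answers avoid that and work on unsorted input directly.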

Use the dictionary's get method.
>>> d = {}
>>> for item in L:
...     d[item[0]] = d.get(item[0], 0) + item[1]
...
>>> d
{'a': 5, 'b': 2}
