Using python for frequency analysis

I am trying to use python to help me crack Vigenère ciphers. I am fairly new to programming but I've managed to make an algorithm to analyse single letter frequencies. This is what I have so far:
Ciphertext = str(input("What is the cipher text?"))
Letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
def LetterFrequency():
    LetterFrequency = {'A': 0, 'B': 0, 'C': 0, 'D': 0, 'E': 0, 'F': 0, 'G': 0, 'H': 0, 'I': 0, 'J': 0, 'K': 0, 'L': 0, 'M': 0, 'N': 0, 'O': 0, 'P': 0, 'Q': 0, 'R': 0, 'S': 0, 'T': 0, 'U': 0, 'V': 0, 'W': 0, 'X': 0, 'Y': 0, 'Z': 0}
    for letter in Ciphertext.upper():
        if letter in Letters:
            LetterFrequency[letter]+=1
    return LetterFrequency
print (LetterFrequency())
But is there a way for me to print the answers in descending order starting from the most frequent letter? The answers are shown in random order right now no matter what I do.
Also, does anyone know how to extract specific letters from a large block of text to perform frequency analysis? For instance, if I wanted to group every third letter from the text “THISISARATHERBORINGEXAMPLE” for analysis, I would need to lay it out in columns like this:
T H I
S I S
A R A
T H E
R B O
R I N
G E X
A M P
L E
Normally I would have to do this by hand in either notepad or excel which takes ages. Is there a way to get around this in python?
Thanks in advance,
Tony

For the descending order you could use Counter:
>>> x = "this is a rather boring example"
>>> from collections import Counter
>>> Counter(x)
Counter({' ': 5, 'a': 3, 'e': 3, 'i': 3, 'r': 3, 'h': 2, 's': 2, 't': 2, 'b': 1, 'g': 1, 'm': 1, 'l': 1, 'o': 1, 'n': 1, 'p': 1, 'x': 1})
As for the second question, you could step through the text three letters at a time, for example with a slice such as x[0::3].
To exclude spaces you can try what @not_a_robot suggests in the comment, or delete the entry manually:
>>> y = Counter(x)
>>> del y[' ']
>>> y
Counter({'a': 3, 'e': 3, 'i': 3, 'r': 3, 'h': 2, 's': 2, 't': 2, 'b': 1, 'g': 1, 'm': 1, 'l': 1, 'o': 1, 'n': 1, 'p': 1, 'x': 1})
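For the second question, stepping through the text in threes can be sketched with slice steps. This is only a minimal illustration: the helper name split_columns and the assumed key length of 3 are not from the original posts.

```python
from collections import Counter
import string


def split_columns(text, key_length):
    """Return key_length strings; column i holds every key_length-th letter starting at index i."""
    # Keep only A-Z so spaces and punctuation do not shift the columns
    letters = [ch for ch in text.upper() if ch in string.ascii_uppercase]
    return ["".join(letters[i::key_length]) for i in range(key_length)]


columns = split_columns("THISISARATHERBORINGEXAMPLE", 3)
print(columns)  # ['TSATRRGAL', 'HIRHBIEME', 'ISAEONXP']

for column in columns:
    # most_common() lists (letter, count) pairs from most to least frequent
    print(Counter(column).most_common())
```

Each column can then be analysed like a simple substitution cipher, which is the usual reason for splitting a Vigenère ciphertext this way.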

Another approach, although the collections.Counter example from @coder is your best bet.
from operator import itemgetter
Letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
Ciphertext = "this is a rather boring example"
def LetterFrequency():
    LetterFrequency = {letter: 0 for letter in Letters}
    for letter in Ciphertext.upper():
        if letter in Letters:
            LetterFrequency[letter]+=1
    return LetterFrequency
def sort_dict(dct):
    # sort the (letter, count) pairs by count, highest first
    return sorted(dct.items(), key = itemgetter(1), reverse = True)
print(sort_dict(LetterFrequency()))
Which prints this, a list of tuples sorted by frequency in descending order:
[('A', 3), ('I', 3), ('E', 3), ('R', 3), ('T', 2), ('S', 2), ('H', 2), ('L', 1), ('G', 1), ('M', 1), ('P', 1), ('B', 1), ('N', 1), ('O', 1), ('X', 1), ('Y', 0), ('J', 0), ('D', 0), ('U', 0), ('F', 0), ('C', 0), ('Q', 0), ('W', 0), ('Z', 0), ('K', 0), ('V', 0)]

Related

How to treat a dictionary as a "whole thing" when using itertools?

from itertools import product
a = [1, 2, 3]
dict1 = {"x": 1, "y": 2, "z": 3}
dict2 = {"r": 4, "s": 5, "t": 6}
for i in product(a, dict1, dict2):
    print(i)
and I got:
(1, 'x', 'r')
(1, 'x', 's')
(1, 'x', 't')
(1, 'y', 'r')
(1, 'y', 's')
(1, 'y', 't')
(1, 'z', 'r')
(1, 'z', 's')
(1, 'z', 't')
(2, 'x', 'r')
(2, 'x', 's')
(2, 'x', 't')
(2, 'y', 'r')
(2, 'y', 's')
(2, 'y', 't')
(2, 'z', 'r')
(2, 'z', 's')
(2, 'z', 't')
(3, 'x', 'r')
(3, 'x', 's')
(3, 'x', 't')
(3, 'y', 'r')
(3, 'y', 's')
(3, 'y', 't')
(3, 'z', 'r')
(3, 'z', 's')
(3, 'z', 't')
But I want to get these instead:
(1, {"x": 1, "y": 2, "z": 3}, {"r": 4, "s": 5, "t": 6})
(2, {"x": 1, "y": 2, "z": 3}, {"r": 4, "s": 5, "t": 6})
(3, {"x": 1, "y": 2, "z": 3}, {"r": 4, "s": 5, "t": 6})
How can I do this? Maybe I should use another function instead of product in this case? Please kindly help. Thanks!

Your requested output looks like you printed some tuples:
a = [1, 2, 3]
dict1 = {"x": 1, "y": 2, "z": 3}
dict2 = {"r": 4, "s": 5, "t": 6}
for n in a:
    print( (n, dict1, dict2) )

The issue is that itertools.product returns the Cartesian product of all the iterables provided, and iterating over a dict yields its keys. You can get a result similar to the one you expect by wrapping the two dicts in another iterable:
for i in list(product(a, [(dict1, dict2)])):
    print(i)
(1, ({'x': 1, 'y': 2, 'z': 3}, {'r': 4, 's': 5, 't': 6}))
(2, ({'x': 1, 'y': 2, 'z': 3}, {'r': 4, 's': 5, 't': 6}))
(3, ({'x': 1, 'y': 2, 'z': 3}, {'r': 4, 's': 5, 't': 6}))
The output is similar to what you asked for, but the dictionaries are wrapped inside a tuple.

This resembles what you ask for:
a = [1, 2, 3]
dict1 = {"x": 1, "y": 2, "z": 3}
dict2 = {"r": 4, "s": 5, "t": 6}
[[e, dict1, dict2] for e in a]
[[1, {'x': 1, 'y': 2, 'z': 3}, {'r': 4, 's': 5, 't': 6}],
[2, {'x': 1, 'y': 2, 'z': 3}, {'r': 4, 's': 5, 't': 6}],
[3, {'x': 1, 'y': 2, 'z': 3}, {'r': 4, 's': 5, 't': 6}]]
The key thing to note is that you really only want to iterate over a, since dict1 and dict2 are simply repeated in every row (each row holds a reference to the same dict objects, not a copy).
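If you would still like to use product, one possible variation is to wrap each dict in its own one-element list, so that product sees each dict as a single item. This is a sketch rather than something taken from the answers above; the name rows is only illustrative.

```python
from itertools import product

a = [1, 2, 3]
dict1 = {"x": 1, "y": 2, "z": 3}
dict2 = {"r": 4, "s": 5, "t": 6}

# Each dict sits in a one-element list, so product() pairs every
# number with the whole dict instead of iterating over its keys.
rows = list(product(a, [dict1], [dict2]))
for row in rows:
    print(row)
```

This yields flat three-element tuples of the form (number, dict1, dict2), without the extra nesting.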

How do I change tuple values in a dictionary?

I have this dictionary:
d = {'a': (1, 2, 'a'), 'b': (1, 2, 'b'), 'c': (2, 4, 'c'), 'd': (1, 3, 'd'), 'e': (0, 1, 'e'), 'f': (0, 1, 'f'), 'g': (1, 3, 'g'), 'h': (0, 1, 'h'), 'j': (1, 2, 'j'), 'i': (0, 1, 'i'), 'k': (-1, 0, 'k')}
How can I subtract the value by 1 for a specific key/value pair in this dictionary if the key matches the parameter?
For example, I want to subtract the values of key a by 1, so that it now displays:
{'a': (0, 1, 'a')
How can I edit the values of that specific key and decrease only the integers by 1 while creating the same dictionary again?
Code so far:
def matching(key_to_subtract):
    for key, value in d.items():
        if key_to_subtract == key:
matching("a")
Desired Output:
{'a': (0, 1, 'a'), 'b': (1, 2, 'b'), 'c': (2, 4, 'c'), 'd': (1, 3, 'd'), 'e': (0, 1, 'e'), 'f': (0, 1, 'f'), 'g': (1, 3, 'g'), 'h': (0, 1, 'h'), 'j': (1, 2, 'j'), 'i': (0, 1, 'i'), 'k': (-1, 0, 'k')}

Since a tuple is immutable you have to build a new one and bind it to the key. You do not, however, have to iterate a dict to search for a given key. That is the whole point of a dict:
def decrement(d, k):
    # if k in d:  # if you are not certain the key exists
    d[k] = tuple(v-1 if isinstance(v, int) else v for v in d[k])

You cannot change the content of a tuple. So you need to construct a new tuple and assign it to the dict:
d = {'a': (1, 2, 'a'), 'b': (1, 2, 'b'), 'c': (2, 4, 'c')}
d['a'] = tuple( i - 1 if isinstance(i, int) else i for i in d['a'] )
# d becomes {'a': (0, 1, 'a'), 'b': (1, 2, 'b'), 'c': (2, 4, 'c')}

As tuples are immutable, you have to build a new tuple and assign it to the same key you got the old one from. This can be written in one or two lines with a generator expression, though the expanded form may be easier to follow.
Using generator expressions
def matching(key_to_subtract):
    # generator expression, evaluated into a tuple
    d[key_to_subtract] = tuple(item-1 if isinstance(item, int) else item for item in d.get(key_to_subtract))
Expanded form
d = {'a': (1, 2, 'a'), 'b': (1, 2, 'b'), 'c': (2, 4, 'c'), 'd': (1, 3, 'd'), 'e': (0, 1, 'e'), 'f': (0, 1, 'f'), 'g': (1, 3, 'g'), 'h': (0, 1, 'h'), 'j': (1, 2, 'j'), 'i': (0, 1, 'i'), 'k': (-1, 0, 'k')}
def matching(key_to_subtract):
    buffer = []  # to build the new tuple from this list later
    for item in d.get(key_to_subtract):  # get the tuple and iterate over it
        try:
            buffer.append(item - 1)  # subtract 1
        except TypeError:
            buffer.append(item)  # append unchanged if it is not a number
    d[key_to_subtract] = tuple(buffer)  # store the list back as a tuple
matching("a")

I propose a different approach from the one in your code. d.items() returns a view that is handy for inspecting the dictionary, but you do not need to loop over it to update a single entry.
Instead, check whether key_to_subtract is among d.keys(), convert the tuple to a list (alternatively, slice the tuple around the position you want and add the new value, which may be faster for a large tuple), change the values, and store the result back under the same key as a tuple:
def matching(key_to_subtract):
    # check for key_to_subtract
    # if found
    if key_to_subtract in d.keys():
        # change to list
        lst_val = list(d[key_to_subtract])
        # subtract the two values
        lst_val[0]-=1
        lst_val[1]-=1
        # change back to tuple
        tuple_val = tuple(lst_val)
        # replace the old value of the same key with the new value
        d[key_to_subtract] = tuple_val
Based on the functions in: https://docs.python.org/3.7/library/stdtypes.html#dict-views

You can do it this way, since tuples are immutable objects.
d = {'a': (1, 2, 'a'), 'b': (1, 2, 'b'), 'c': (2, 4, 'c'), 'd': (1, 3, 'd'), 'e': (0, 1, 'e'), 'f': (0, 1, 'f'), 'g': (1, 3, 'g'), 'h': (0, 1, 'h'), 'j': (1, 2, 'j'), 'i': (0, 1, 'i'), 'k': (-1, 0, 'k')}
def change_value_by_key_pos(data, key, pos, step):
    for k,v in data.items():
        if k == key:
            list_items = list(data[key])
            list_items[pos] = list_items[pos]+step
            data[key] = tuple(list_items)
    return data
# adds 1 to position 1 of key 'a'; pass step=-1 to subtract instead
data = change_value_by_key_pos(d,'a',1,1)
print(data)
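The answers above all update d in place. If the original dictionary should stay untouched, one possible variation is to return a new dictionary instead. This is a minimal sketch: the function name decremented and the step parameter are hypothetical, and skipping bool values is an assumption about what "only the integers" should mean.

```python
def decremented(d, key, step=1):
    """Return a copy of d in which the integers in d[key] are reduced by step."""
    result = dict(d)  # shallow copy; the tuples themselves are never modified
    result[key] = tuple(
        # bool is a subclass of int, so exclude it explicitly (assumption)
        v - step if isinstance(v, int) and not isinstance(v, bool) else v
        for v in d[key]
    )
    return result


d = {'a': (1, 2, 'a'), 'b': (1, 2, 'b'), 'k': (-1, 0, 'k')}
new_d = decremented(d, 'a')
print(new_d)  # 'a' now maps to (0, 1, 'a')
print(d)      # the original is unchanged
```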

Create function to count occurrence of characters in a string

I want to get the number of occurrences of each character in a string. I have this code:
string = "Foo Fighters"
def conteo(string):
    copia = ''
    for i in string:
        if i not in copia:
            copia = copia + i
    conteo = [0]*len(copia)
    for i in string:
        if i in copia:
            conteo[copia.index(i)] = conteo[copia.index(i)] + 1
    out = ['0']*2*len(copia)
    for i in range(len(copia)):
        out[2*i] = copia[i]
        out[2*i + 1] = conteo[i]
    return (out)
And I want it to return something like: ['f', 2, 'o', 2, '', 1, 'i', 1, 'g', 1, 'h', 1, 't', 1, 'e', 1, 'r', 1, 's', 1]
How can I do it without using a Python library?
Thank you

Use Python's Counter (part of the standard library):
>>> s = 'foo fighters'
>>> from collections import Counter
>>> counter = Counter(s)
>>> counter
Counter({'f': 2, 'o': 2, ' ': 1, 'e': 1, 'g': 1, 'i': 1, 'h': 1, 's': 1, 'r': 1, 't': 1})
>>> counter['f']
2

Depending on why you want this information, one method could be to use a Counter:
from collections import Counter
print(Counter("Foo Fighters"))
To get a flat list in the requested layout, use itertools as well (note that 'F' stays uppercase here):
from collections import Counter
from itertools import chain
c = Counter("Foo Fighters")
output = list(chain.from_iterable(c.items()))
print(output)
# ['F', 2, 'o', 2, ' ', 1, 'i', 1, 'g', 1, 'h', 1, 't', 1, 'e', 1, 'r', 1, 's', 1]

It's not clear whether you want a critique of your current attempt or a pythonic solution. Below is one way where output is a dictionary.
from collections import Counter
mystr = "Foo Fighters"
c = Counter(mystr)
Result
Counter({' ': 1,
'F': 2,
'e': 1,
'g': 1,
'h': 1,
'i': 1,
'o': 2,
'r': 1,
's': 1,
't': 1})
Output as list
I purposely do not combine the tuples in this list, as it's a good idea to maintain structure until absolutely necessary. It's a trivial task to combine these into one list of strings.
list(c.items())
# [('F', 2),
# ('o', 2),
# (' ', 1),
# ('i', 1),
# ('g', 1),
# ('h', 1),
# ('t', 1),
# ('e', 1),
# ('r', 1),
# ('s', 1)]
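Since the question asks for a solution without a library, here is a minimal sketch that uses only a plain dict. Lowercasing the input is an assumption based on the lowercase 'f' in the requested output, and the result order relies on dicts preserving insertion order (Python 3.7+).

```python
def conteo(string):
    """Count characters case-insensitively and return a flat [char, count, ...] list."""
    counts = {}
    for ch in string.lower():
        # get() supplies 0 the first time a character is seen
        counts[ch] = counts.get(ch, 0) + 1
    out = []
    for ch, n in counts.items():
        out.extend([ch, n])
    return out


result = conteo("Foo Fighters")
print(result)
# ['f', 2, 'o', 2, ' ', 1, 'i', 1, 'g', 1, 'h', 1, 't', 1, 'e', 1, 'r', 1, 's', 1]
```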

Sort value in dictionary by value in Python

I have something like this in Python to count the frequency of characters in a text, but I can't sort the values in the dictionary "v".
abcedario='abcdefghijklmnopqrstvuxwyz'
v = {}
count = 0
for c in abcedario:
    count = 0
    for char in text:
        if c == char:
            count = count +1
            v[c] = count
sorted(v.items(), key=lambda x:x[1])
print(v)
I searched here on Stack Overflow but could not solve my problem. The output looks like this:
{'a': 2, 'b': 4, 'e': 4, 'd': 36, 'g': 31, 'f': 37, 'i': 14, 'h': 4, 'k': 51, 'j': 31, 'l': 34, 'n': 18, 'q': 13, 'p': 2, 'r': 9, 'u': 1, 't': 1, 'w': 36, 'v': 15, 'y': 14, 'x': 8, 'z': 10}
I want to sort by value, so this is different from other posts.

If you just want to print them in order, print the output of sorted (sorted returns a new list and does not change v):
text = "helloworld"
abcedario='abcdefghijklmnopqrstvuxwyz'
v = {}
count = 0
for c in abcedario:
    count = 0
    for char in text:
        if c == char:
            count = count +1
            v[c] = count
print(sorted(v.items(), key=lambda x:x[1]))
For text = "helloworld" you get (the order of letters with equal counts can differ between Python versions):
[('e', 1), ('d', 1), ('h', 1), ('r', 1), ('w', 1), ('o', 2), ('l', 3)]

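Before Python 3.7, a dictionary was an unordered collection of items, so it could not be kept in sorted order.
On those versions, try looking into OrderedDict from collections. Since Python 3.7, regular dicts preserve insertion order, so a dict built from the sorted items keeps that order.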

You can use Counter, which displays its counts from most to least common:
from collections import Counter
text = "I have something like this in Python to count the frequency of characters in a text, but i can't sort the values on the dictionary"
print(Counter(text))
output:
Counter({' ': 24, 't': 15, 'e': 11, 'n': 9, 'h': 8, 'i': 8, 'o': 8, 'a': 7, 'c': 6, 's': 5, 'r': 5, 'u': 4, 'y': 3, 'f': 2, 'l': 2, 'v': 2, "'": 1, 'q': 1, 'd': 1, 'I': 1, 'm': 1, 'g': 1, 'b': 1, 'x': 1, ',': 1, 'P': 1, 'k': 1})
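To tie the answers together, here is a small sketch of sorting by value in descending order with a plain dict. The names by_value and ordered are illustrative, and rebuilding an ordered dict assumes Python 3.7 or later.

```python
text = "helloworld"

counts = {}
for char in text:
    if char.isalpha():
        counts[char] = counts.get(char, 0) + 1

# sorted() returns a new list; the dictionary itself is left unchanged
by_value = sorted(counts.items(), key=lambda item: item[1], reverse=True)
print(by_value)

# On Python 3.7+, a dict built from the sorted pairs keeps that order
ordered = dict(by_value)
print(ordered)
```

The same ordering is available from Counter(text).most_common() if importing from collections is acceptable.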

counting letters in a string python

I have to write a function, countLetters(word), that takes in a word as argument and returns a list that counts the number of times each letter appears. The letters must be sorted in alphabetical order.
This is my attempt:
def countLetters(word):
    x = 0
    y = []
    for i in word:
        for j in range(len(y)):
            if i not in y[j]:
                x = (i, word.count(i))
                y.append(x)
    return y
I first tried it without the if i not in y[j]
countLetters("google")
result was
[('g', 2), ('o', 2), ('o', 2), ('g', 2), ('l', 1), ('e', 1)]
when I wanted
[('e', 1), ('g', 2), ('l', 1), ('o', 2)]
When I added the if i not in y[j] filter, it just returns an empty list [].
Could someone please point out my error here?

I recommend the collections module's Counter if you're in Python 2.7+
>>> import collections
>>> s = 'a word and another word'
>>> c = collections.Counter(s)
>>> c
Counter({' ': 4, 'a': 3, 'd': 3, 'o': 3, 'r': 3, 'n': 2, 'w': 2, 'e': 1, 'h': 1, 't': 1})
You can do the same in any version of Python with an extra line or two:
>>> c = {}
>>> for i in s:
...     c[i] = c.get(i, 0) + 1
This would also be useful to check your work.
To sort in alphabetical order (the above is sorted by frequency):
>>> for letter, count in sorted(c.items()):
...     print('{letter}: {count}'.format(letter=letter, count=count))
...
: 4
a: 3
d: 3
e: 1
h: 1
n: 2
o: 3
r: 3
t: 1
w: 2
or to keep in a format that you can reuse as a dict:
>>> import pprint
>>> pprint.pprint(dict(c))
{' ': 4,
'a': 3,
'd': 3,
'e': 1,
'h': 1,
'n': 2,
'o': 3,
'r': 3,
't': 1,
'w': 2}
Finally, to get that as a list:
>>> pprint.pprint(sorted(c.items()))
[(' ', 4),
('a', 3),
('d', 3),
('e', 1),
('h', 1),
('n', 2),
('o', 3),
('r', 3),
('t', 1),
('w', 2)]

I think the problem lies in your outer for loop, which iterates over every letter in the word, including repeats.
If the word contains a letter more than once, for example "bees", the loop reaches 'e' twice and counts it both times, because nothing checks whether that letter has already been handled. Reading up on how iteration over strings works may clarify this. I'm not sure this alone will solve your problem, but it is the first thing I noticed.
You could try something like this:
tally = {}
for s in check_string:
    # "in" works on both Python 2 and 3; dict.has_key() was removed in Python 3
    if s in tally:
        tally[s] += 1
    else:
        tally[s] = 1
and then you can just retrieve the tally for each letter from that dictionary.

Your list y is always empty, so the inner loop for j in range(len(y)) never runs and nothing is ever appended.
P.S. your code is not very pythonic

Works fine with the latest Py3 and Py2:
from collections import Counter
def countItems(iterable):
    # sorted() orders the (item, count) pairs alphabetically by item
    return sorted(Counter(iterable).items())

Using a dictionary and pprint from the answer of @Aaron Hall:
import pprint
def countLetters(word):
    y = {}
    for i in word:
        if i in y:
            y[i] += 1
        else:
            y[i] = 1
    return y
res1 = countLetters("google")
pprint.pprint(res1)
res2 = countLetters("Google")
pprint.pprint(res2)
Output:
{'e': 1, 'g': 2, 'l': 1, 'o': 2}
{'G': 1, 'e': 1, 'g': 1, 'l': 1, 'o': 2}

I am not sure what your expected output is. According to the problem statement, it seems you should sort the word first so that the letter counts come out in sorted order. The code below may be helpful:
def countLetters(word):
    letter = []
    cnt = []
    for c in sorted(word):
        if c not in letter:
            letter.append(c)
            cnt.append(1)
        else:
            # sorted input keeps equal letters adjacent, so the last count is the current letter's
            cnt[-1] += 1
    # list() is needed on Python 3, where zip returns an iterator
    return list(zip(letter, cnt))
print(countLetters('hello'))
This will give you [('e', 1), ('h', 1), ('l', 2), ('o', 1)]

You can create a dict of characters first, and then a list of tuples:
text = 'hello'
my_dict = {x : text.count(x) for x in text}
my_list = [(key, my_dict[key]) for key in my_dict]
print(my_dict)
print(my_list)
{'h': 1, 'e': 1, 'l': 2, 'o': 1}
[('h', 1), ('e', 1), ('l', 2), ('o', 1)]
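The question also asks for the letters in alphabetical order. One compact way to get that, offered as a sketch rather than the only solution, is to iterate over the sorted set of distinct letters so that each letter is counted exactly once.

```python
def countLetters(word):
    """Return (letter, count) pairs sorted alphabetically by letter."""
    # set() removes repeats, sorted() puts the distinct letters in order
    return [(letter, word.count(letter)) for letter in sorted(set(word))]


result = countLetters("google")
print(result)  # [('e', 1), ('g', 2), ('l', 1), ('o', 2)]
```

Calling word.count for every distinct letter rescans the word each time, which is fine for short words; for long texts a single pass with a dict or Counter is likely the better choice.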
