Determining Letter Frequency Of Cipher Text

I am trying to make a tool that finds the frequencies of letters in a cipher text.
Let's suppose it is all lowercase a-z with no numbers. The encoded message is in a txt file.
I am trying to build a script to help in cracking substitution or possibly transposition ciphers.
Code so far:
cipher = open('cipher.txt','U').read()
cipherfilter = cipher.lower()
cipherletters = list(cipherfilter)
alpha = list('abcdefghijklmnopqrstuvwxyz')
occurrences = {}
for letter in alpha:
    occurrences[letter] = cipherfilter.count(letter)
for letter in occurrences:
    print letter, occurrences[letter]
All it does so far is show how many times each letter appears.
How would I print the frequency of all letters found in this file?

import collections
d = collections.defaultdict(int)
for c in 'test':
    d[c] += 1
print d # defaultdict(<type 'int'>, {'s': 1, 'e': 1, 't': 2})
From a file:
myfile = open('test.txt')
for line in myfile:
    line = line.rstrip('\n')
    for c in line:
        d[c] += 1
For the genius that is the defaultdict container, we must give thanks and praise. Otherwise we'd all be doing something silly like this:
s = "andnowforsomethingcompletelydifferent"
d = {}
for letter in s:
    if letter not in d:
        d[letter] = 1
    else:
        d[letter] += 1

The modern way:
from collections import Counter
string = "ihavesometextbutidontmindsharing"
Counter(string)
#>>> Counter({'i': 4, 't': 4, 'e': 3, 'n': 3, 's': 2, 'h': 2, 'm': 2, 'o': 2, 'a': 2, 'd': 2, 'x': 1, 'r': 1, 'u': 1, 'b': 1, 'v': 1, 'g': 1})

If you want to know the relative frequency of a letter c, you have to divide the number of occurrences of c by the length of the input.
For instance, taking Adam's example:
s = "andnowforsomethingcompletelydifferent"
n = len(s) # n = 37
and storing the absolute frequency of each letter in
counts[letter]
we obtain the relative frequencies by:
from string import ascii_lowercase # this is "a...z"
for c in ascii_lowercase:
    print c, counts[c]/float(n)
Putting it all together, we get something like this:
# get input
s = "andnowforsomethingcompletelydifferent"
n = len(s) # n = 37
# get absolute frequencies of letters
# (named counts rather than dict, so the built-in dict is not shadowed)
import collections
counts = collections.defaultdict(int)
for c in s:
    counts[c] += 1
# print relative frequencies
from string import ascii_lowercase # this is "a...z"
for c in ascii_lowercase:
    print c, counts[c]/float(n)
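For Python 3, the same calculation can be sketched with collections.Counter. This is only a sketch: the helper name relative_frequencies is made up for illustration, and it assumes that only the letters a-z should count toward the total, so whitespace and punctuation in the cipher text are ignored.

```python
from collections import Counter
from string import ascii_lowercase


def relative_frequencies(text):
    """Return {letter: share of all a-z letters in text} (hypothetical helper)."""
    # Assumption: only a-z counts, so whitespace and punctuation are dropped.
    letters = [c for c in text.lower() if c in ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    if total == 0:
        return {}
    return {c: counts[c] / total for c in ascii_lowercase}


if __name__ == "__main__":
    freqs = relative_frequencies("andnowforsomethingcompletelydifferent")
    # Most frequent letters first, which is the handy order when comparing
    # against typical letter frequencies of the plaintext language.
    for letter, share in sorted(freqs.items(), key=lambda kv: -kv[1]):
        if share:
            print(letter, round(share, 3))
```

To run it on a file, the text could be read with open('cipher.txt').read() and passed to the same function.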

Related

Removing duplicates using a dictionary

I am writing a function that is supposed to count duplicates and mention how many duplicates are of each individual record. For now my output is giving me the total number of duplications, which I don't want.
i.e. if there are 4 duplicates of one record, it's giving me 4 instead of 1; if there are 6 duplicates of 2 individual records it should give me 2.
Could someone please help find the bug?
Thank you
def duplicate_count(text):
    text = text.lower()
    dict = {}
    word = 0
    if len(text) != "":
        for a in text:
            dict[a] = dict.get(a,0) + 1
        for a in text:
            if dict[a] > 1:
                word = word + 1
        return word
    else:
        return "0"

Fixed it:
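def duplicate_count(text):
    text = text.lower()
    counts = {}
    # len(text) != "" was always True, so the else branch never ran;
    # an empty text simply yields 0 from the sum below
    for a in text:
        counts[a] = counts.get(a, 0) + 1
    # count the distinct characters that occur at least twice
    return sum(1 for n in counts.values() if n >= 2)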

You can do this with set and sum. First, set is used to remove all duplicates, so that we iterate as few times as possible and can count each character in one call rather than one at a time. The set is then used to build a dictionary that stores how many times each character occurs. Those values are then fed to sum as a generator, counting how many of them are greater than 1.
def dup_cnt(t: str) -> int:
    if not t:
        return 0
    t = t.lower()
    d = dict()
    for c in set(t):
        d[c] = t.count(c)
    return sum(v > 1 for v in d.values())
print(dup_cnt('aabccdeefggh')) #4

I don't really understand the question you asked.
But I assume you want the count or details of each letter's duplication in the text. You can do it like this; I hope it helps.
def duplicate_count(text):
    count_dict = {}
    for letter in text.lower():
        count_dict[letter] = count_dict.setdefault(letter, 0) + 1
    return count_dict
ret = duplicate_count('asuhvknasiasifjiasjfija')
# Get all letter details
print(ret)
#{'a': 5, 's': 4, 'u': 1, 'h': 1, 'v': 1, 'k': 1, 'n': 1, 'i': 4, 'f': 2, 'j': 3}
# Get all letter count
print(len(ret))
# 10
# Get only the letters appear more than once in the text
dedup = {k: v for k, v in ret.items() if v > 1}
# Get only duplicated letter details
print(dedup)
# {'a': 5, 's': 4, 'i': 4, 'f': 2, 'j': 3}
# Get only duplicated letter count
print(len(dedup))
# 5

I want to create a function that takes a text string and returns a dictionary containing certain characters with how many times they occur

This is what I have so far as a function
example = "Sample String"
def func(text, let):
    count= {}
    for let in text.lower():
        let = count.keys()
        if let in text:
            count[let] += 1
        else:
            count[let] = 1
    return count
I want to return something like this
print(func(example, "sao"))
{'s': 2, 'a' : 1}
I am not very sure what I could improve on

I would use Counter from the collections module in the standard library:
from collections import Counter
def func(text, let):
    c = Counter(text.lower())
    return {l: c[l] for l in let if l in c.keys()}
Breaking it down:
Counter will return the count of letters in your string:
In [5]: Counter(example.lower())
Out[5]:
Counter({'s': 2,
'a': 1,
'm': 1,
'p': 1,
'l': 1,
'e': 1,
' ': 1,
't': 1,
'r': 1,
'i': 1,
'n': 1,
'g': 1})
So then all you need to do is return a dictionary of the appropriate letters, which can be done in a dictionary comprehension:
# iterate over every letter in `let`, and get the Counter value for that letter,
# if that letter is in the Counter keys
{l: c[l] for l in let if l in c.keys()}
Fixing your code
If you prefer to use your approach, you could make your code work properly with this:
def func(text, let):
    count = {}
    for l in text.lower():
        if l in let:
            if l in count.keys():
                count[l] += 1
            else:
                count[l] = 1
    return count

from functools import reduce
def count(text, letters):
    return reduce(
        lambda d, letr: d.update({letr: d.get(letr, 0) + 1}) or d,
        filter(lambda l: l in letters, text), {}
    )
Read it backwards.
Creates an empty dictionary:
{}
Filters letters from text:
lambda l: l in letters
This lambda function returns True if l is in letters.
filter(lambda l: l in letters, text)
reduce will iterate over the object returned by filter, which only produces the characters of text that are in letters.
lambda d, letr: d.update({letr: d.get(letr, 0) + 1}) or d
Updates the dictionary with the count of the letters it encounters.
Each time reduce takes an item from the filter object, it calls this lambda function. Since dict.update() returns None, which evaluates to false, we write or d to hand the dict back to reduce, which passes it into the lambda the next time it is called, thus building up the counts. We also use dict.get() in the lambda instead of d[letr]; this lets us supply the default of 0 if the letter is not yet in the dictionary.
At the end reduce returns the dict, and we return that from count.
This is similar to how "map reduce" works.
You can read about functional style and lambda expressions in the Python docs.
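To make the data flow concrete, here is a sketch (Python 3) that puts the reduce version next to an explicit loop doing the same steps. The names count_with_reduce and count_with_loop are made up for this comparison, and neither lowercases the text, matching the code above.

```python
from functools import reduce


def count_with_reduce(text, letters):
    # filter keeps only the wanted characters; reduce threads one dict
    # through every remaining character, starting from the empty dict.
    return reduce(
        lambda d, letr: d.update({letr: d.get(letr, 0) + 1}) or d,
        filter(lambda l: l in letters, text),
        {},
    )


def count_with_loop(text, letters):
    # The explicit-loop equivalent of the reduce call.
    d = {}
    for letr in text:
        if letr in letters:
            d[letr] = d.get(letr, 0) + 1
    return d


print(count_with_reduce("sample string", "sao"))  # {'s': 2, 'a': 1}
print(count_with_loop("sample string", "sao"))    # {'s': 2, 'a': 1}
```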

>>> def func(text: str, let: str):
...     text, count = text.lower(), {}
...     for i in let:
...         if text.count(i) != 0:
...             count[i] = text.count(i)
...     return count
...
>>> print(func("Sample String", "sao"))
{'s': 2, 'a': 1}

Python dictionary being returned

I'm trying to create a dictionary with the count of each letter appearing in a list of words. The method count_letters_v2(word_list) prints the letters but does not return the dictionary. Why? And how do I fix this? Thank you.
def count_letters_v2(word_list):
    count = {}
    for word in word_list:
        print word
        for letter in word:
            print letter
            if letter not in count:
                count[letter] = 1
            else:
                count[letter] = count[letter] + 1
    return count
def main():
    count_letters_v2(['a','short','list','of','words'])
if __name__ == '__main__':
    main()

It does return the dictionary. That's what return count does. Do you want to print out the dictionary? Then change main() to
def main():
    print count_letters_v2(['a','short','list','of','words'])
For the record, there's a Counter object (a subclass of dict so can do all the same things) in the standard library that will do this all for you.
from collections import Counter
def count_letters_v3(word_list):
    return Counter(''.join(word_list))
print count_letters_v3(['a','short','list','of','words'])
Output:
Counter({'a': 1,
'd': 1,
'f': 1,
'h': 1,
'i': 1,
'l': 1,
'o': 3,
'r': 2,
's': 3,
't': 2,
'w': 1})

As said, your code works, but you didn't do anything with the return value of the function.
That said, I can think of some improvements. First, the get method
of dict makes the case of a new letter cleaner, setting the default value to 0:
...
count = {}
for word in word_list:
    for letter in word:
        count[letter] = count.get(letter, 0) + 1
...
Otherwise, you can use a Counter object from collections:
from collections import Counter
def count_letters_v2(word_list):
    count = Counter()
    for word in word_list:
        count.update(word)
    return count
If you have a very long list of words, you shouldn't use str.join since it builds a new string. The chain.from_iterable method of the itertools module chains the letters without building that intermediate string:
from collections import Counter
from itertools import chain
def count_letters_v2(word_list):
    return Counter(chain.from_iterable(word_list))
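As a quick check in Python 3, the sketch below compares the chained version with the str.join version on the word list from the question. The function name count_letters is made up for this example.

```python
from collections import Counter
from itertools import chain


def count_letters(word_list):
    # chain.from_iterable yields the characters of each word in turn,
    # so no intermediate joined string is built.
    return Counter(chain.from_iterable(word_list))


words = ['a', 'short', 'list', 'of', 'words']
counts = count_letters(words)
print(counts['o'], counts['s'])           # 3 3
print(counts == Counter(''.join(words)))  # True
```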

Anagram test for two strings in python

This is the question:
Write a function named test_for_anagrams that receives two strings as
parameters, both of which consist of alphabetic characters and returns
True if the two strings are anagrams, False otherwise. Two strings are
anagrams if one string can be constructed by rearranging the
characters in the other string using all the characters in the
original string exactly once. For example, the strings "Orchestra" and
"Carthorse" are anagrams because each one can be constructed by
rearranging the characters in the other one using all the characters
in one of them exactly once. Note that capitalization does not matter
here i.e. a lower case character can be considered the same as an
upper case character.
My code:
def test_for_anagrams (str_1, str_2):
    str_1 = str_1.lower()
    str_2 = str_2.lower()
    print(len(str_1), len(str_2))
    count = 0
    if (len(str_1) != len(str_2)):
        return (False)
    else:
        for i in range(0, len(str_1)):
            for j in range(0, len(str_2)):
                if(str_1[i] == str_2[j]):
                    count += 1
        if (count == len(str_1)):
            return (True)
        else:
            return (False)
#Main Program
str_1 = input("Enter a string 1: ")
str_2 = input("Enter a string 2: ")
result = test_for_anagrams (str_1, str_2)
print (result)
The problem here is when I enter strings as Orchestra and Carthorse, it gives me result as False. Same for the strings The eyes and They see. Any help would be appreciated.

I'm new to Python, so excuse me if I'm wrong.
I believe this can be done with a different approach: sort the given strings and then compare them.
def anagram(a, b):
    # string to list
    str1 = list(a.lower())
    str2 = list(b.lower())
    # sort list
    str1.sort()
    str2.sort()
    # join list back to string
    str1 = ''.join(str1)
    str2 = ''.join(str2)
    return str1 == str2
print(anagram('Orchestra', 'Carthorse'))

The problem is that you only check whether a matching character exists in the other string and increment the counter each time. You do not account for characters you have already matched with another one. That's why the following will also fail:
>>> test_for_anagrams('aa', 'aa')
False
Even though the strings are equal (and as such also anagrams), you are matching each a of the first string with each a of the other string, so you end up with a count of 4, which gives a result of False.
What you should do in general is count every character occurrence and make sure that every character occurs equally often in each string. You can count characters by using a collections.Counter object. You then just need to check whether the counts for each string are the same, which you can easily do by comparing the counter objects (which are just dictionaries):
from collections import Counter
def test_for_anagrams (str_1, str_2):
    c1 = Counter(str_1.lower())
    c2 = Counter(str_2.lower())
    return c1 == c2
>>> test_for_anagrams('Orchestra', 'Carthorse')
True
>>> test_for_anagrams('aa', 'aa')
True
>>> test_for_anagrams('bar', 'baz')
False

For completeness: if just importing Counter and being done with it is not in the spirit of the exercise, you can use plain dictionaries to count the letters.
def test_for_anagrams(str_1, str_2):
    counter1 = {}
    for c in str_1.lower():
        counter1[c] = counter1.get(c, 0) + 1
    counter2 = {}
    for c in str_2.lower():
        counter2[c] = counter2.get(c, 0) + 1
    # print statements so you can see what's going on,
    # comment out/remove at will
    print(counter1)
    print(counter2)
    return counter1 == counter2
Demo:
print(test_for_anagrams('The eyes', 'They see'))
print(test_for_anagrams('orchestra', 'carthorse'))
print(test_for_anagrams('orchestr', 'carthorse'))
Output:
{' ': 1, 'e': 3, 'h': 1, 's': 1, 't': 1, 'y': 1}
{' ': 1, 'e': 3, 'h': 1, 's': 1, 't': 1, 'y': 1}
True
{'a': 1, 'c': 1, 'e': 1, 'h': 1, 'o': 1, 's': 1, 'r': 2, 't': 1}
{'a': 1, 'c': 1, 'e': 1, 'h': 1, 'o': 1, 's': 1, 'r': 2, 't': 1}
True
{'c': 1, 'e': 1, 'h': 1, 'o': 1, 's': 1, 'r': 2, 't': 1}
{'a': 1, 'c': 1, 'e': 1, 'h': 1, 'o': 1, 's': 1, 'r': 2, 't': 1}
False

Traverse through the string test and check whether each character is present in the string test1; if it is, append it to the string value.
Then compare the length of value with the length of test1: if they are equal return True, else False.
def anagram(test,test1):
    value =''
    for data in test:
        if data in test1:
            value += data
    if len(value) == len(test1):
        return True
    else:
        return False
anagram("abcd","adbc")
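One thing to watch with a pure membership test is repeated letters: a character counts as present however many times it has already been matched. The sketch below reproduces the logic under the made-up name anagram_by_membership and compares it with a hypothetical variant, anagram_by_removal, that uses up one matching character per step. Neither name comes from the answer above.

```python
def anagram_by_membership(test, test1):
    # Same logic as the function above: keep every character of test
    # that appears anywhere in test1, then compare lengths.
    value = ''.join(ch for ch in test if ch in test1)
    return len(value) == len(test1)


def anagram_by_removal(test, test1):
    # Hypothetical variant: each character of test must use up one
    # matching character of test1, so repeats are accounted for.
    remaining = list(test1)
    for ch in test:
        if ch not in remaining:
            return False
        remaining.remove(ch)
    return not remaining


print(anagram_by_membership("aab", "abb"))  # True, although not anagrams
print(anagram_by_removal("aab", "abb"))     # False
print(anagram_by_removal("abcd", "adbc"))   # True
```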

I wrote the anagram program in a basic, easy-to-understand way.
def compare(str1, str2):
    if (str1 is None) or (str2 is None):
        print(" You didn't enter a string.")
    elif len(str1) != len(str2):
        print(" The strings entered are not anagrams.")
    else:
        b = []
        c = []
        for i in str1:
            b.append(i)
        b.sort()
        print(b)
        for j in str2:
            c.append(j)
        c.sort()
        print(c)
        if b == c and b != []:
            print(" The strings entered are anagrams.")
        else:
            print(" The strings entered are not anagrams.")
str1=input(" Enter the first String :")
str2=input(" Enter the second String :")
compare(str1,str2)

A more concise and Pythonic way to do it is to use sorted together with lower (or upper).
Make the case consistent with lower/upper first, then sort the characters and compare the results (sorted returns a list, which has no lower method, so the case conversion has to happen on the string):
# Function definition
def test_for_anagrams (str_1, str_2):
    return sorted(str_1.lower()) == sorted(str_2.lower())
#Main Program
str_1 = input("Enter a string 1: ")
str_2 = input("Enter a string 2: ")
result = test_for_anagrams (str_1, str_2)
print (result)

Another solution:
def test_for_anagrams(my_string1, my_string2):
    s1,s2 = my_string1.lower(), my_string2.lower()
    count = 0
    if len(s1) != len(s2) :
        return False
    for char in s1 :
        if s2.count(char,0,len(s2)) == s1.count(char,0,len(s1)):
            count = count + 1
    return count == len(s1)

My solution is:
#anagrams
def is_anagram(a, b):
    if sorted(a) == sorted(b):
        return True
    else:
        return False
print(is_anagram("Alice", "Bob"))

def anagram(test,test1):
    test_value = []
    if len(test) == len(test1):
        for i in test:
            value = test.count(i) == test1.count(i)
            test_value.append(value)
    else:
        test_value.append(False)
    if False in test_value:
        return False
    else:
        return True
Check the lengths of test and test1. If they match, traverse test and compare each character's count in test and test1, storing the result of each comparison in a list; the strings are anagrams only if no comparison is False.

Converting raw input string to numbers (for each letter) then summing the numbers [duplicate]

I want a user to enter a word, e.g. apple, and then convert each character in this string to a corresponding number (a = 1, b = 2, c = 3 etc.)
So far I've defined all the letters
a=1
b=2
c=3
d=4
e=5
f=6
g=7
etc...
and have split the string to print out each letter using
word = str(raw_input("Enter a word: ").lower())
for i in range (len(word)):
    print word[i]
This prints the characters individually, but I can't figure out how to print these as their corresponding numbers, which I could then sum together.

In your case, it is better to use a dictionary that defines all the characters and their values. The string library provides an easier way to do this. Using string.ascii_lowercase within a dict comprehension, you can populate your dictionary mapping like this.
>>> import string
>>> wordmap = {x:y for x,y in zip(string.ascii_lowercase,range(1,27))}
>>> wordmap
{'a': 1, 'c': 3, 'b': 2, 'e': 5, 'd': 4, 'g': 7, 'f': 6, 'i': 9, 'h': 8, 'k': 11, 'j': 10, 'm': 13, 'l': 12, 'o': 15, 'n': 14, 'q': 17, 'p': 16, 's': 19, 'r': 18, 'u': 21, 't': 20, 'w': 23, 'v': 22, 'y': 25, 'x': 24, 'z': 26}
Now you can easily map this to your output. First we take the input
>>> word = str(raw_input("Enter a word: ").lower())
Enter a word: apple
>>> values = []
Now we just loop through the input word and append the values to an empty list. We append the values because we also need to find the sum of the values.
>>> for i in word:
...     print "{}".format(wordmap[i])
...     values.append(wordmap[i])
...
1
16
16
12
5
You can finally use the sum function to output the sum total of the values.
>>> sum(values)
50

>>> string = 'hello'
>>> for char in string:
...     print ord(char) - 96,
...
8 5 12 12 15
You can get the sum as
>>> total = 0
>>> for char in string:
...     total += ord(char) - 96
...
>>> total
52
or lighter, using the built-in sum (which is why the running total above is not named sum), as
>>> sum ( [ ord(i) - 96 for i in string ] )
52

If the values associated with the letters do not necessarily follow the ASCII order, a solution could be to store the letters and their corresponding values in a dict:
values = {
    'a':1,
    'b':2,
    'c':3,
    'd':4,
    'e':5,
    'f':6,
    'g':7,
    # etc.
}
word = str(raw_input("Enter a word: ").lower())
for i in range (len(word)):
    print values[word[i]]

First of all, you can loop over your string directly since it is iterable; then you can get the expected number for each letter with ord(i)%96:
word = str(raw_input("Enter a word: ").lower())
for i in word:
    print ord(i)%96
Note that ord('a')=97.
And for the sum:
sum(ord(i)%96 for i in word)

Use ord (converts a character to its code):
def getnumber(letter):
    return ord(letter) - ord('a') + 1
and then, you can just do:
print sum(map(getnumber, word))

If you want the default ordinal of ANSI chars, you may use:
w = 'apple'
print [ ord(i) for i in w ]
If not, define your own mapping in a function, and call that in the list comprehension instead of the ord().

Store the characters in a dict: dict['a'] = 1, etc.

word = raw_input("Enter a word: ").lower()
s=[list(i) for i in word.split()]
s=[x for _list in s for x in _list]
print s
for i in s:
    print ord(i)-96

To get your current method working you could do this...
word = str(raw_input("Enter a word: ").lower())
for i in range (len(word)):
    # eval to change the string to an object
    print eval(str(word[i]))
This looks up each one-character string as the name of a variable you have already defined (a=1, b=2, ...) and prints its value.
But a better way to solve your problem might be something like this...
nums = map(lambda x: ord(x)-96, list(raw_input("Enter word: ").lower()))
This returns a list of all the letters numbered where a is 1, b is 2 ... y is 25, z is 26. You can then do for i in nums: print i to print all the numbers out line by line.
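For Python 3, where raw_input and the print statement are gone, the same idea can be sketched as below. The function name letter_values is made up for this example, and skipping anything outside a-z is an assumption, not something the question specifies.

```python
def letter_values(word):
    """Map a..z to 1..26; other characters are skipped (an assumption)."""
    return [ord(ch) - ord('a') + 1 for ch in word.lower() if 'a' <= ch <= 'z']


values = letter_values("apple")
print(values)       # [1, 16, 16, 12, 5]
print(sum(values))  # 50
```

With interactive input, the word would come from input("Enter a word: ") instead of the literal.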
