Dictionary won't be displayed as requested - python

I need to solve an exercise (for beginners!) that asks for the frequencies of all characters occurring in a given text. I am stuck with the function I have tried to write: instead of a dictionary I get a list as the result. I am aware that the problem probably lies in my use of "[]", but I haven't found a better way to get any result at all.
Here is what I am struggling with:
def character_frequency(text):
    """
    Returns the frequencies of all occurring characters in the given text
    :param text: A text
    :return: Dict in the form {"<character>": frequency, "<character>": frequency, ...}
    """
    frequency = {} # empty dict
    for line in text:
        for character in line.lower():
            if character in frequency:
                frequency[character] += 1
            else:
                frequency[character] = 1
    print(f"character{str(frequency)}")
    return frequency
print()
print("excerise")
frequency = character_frequency(growing_plants)
for c, n in frequency.items():
    print(f"Character: {c}: {n}")
How should I change my function in order to get the correct dictionary result?

def character_frequency(text):
    """
    Returns the frequencies of all occurring characters in the given text
    :param text: A text
    :return: Dict in the form {"<character>": frequency, "<character>": frequency, ...}
    """
    frequency = {} # empty dict
    for line in text:
        for character in line.lower():
            if character in frequency:
                frequency[character] += 1
            else:
                frequency[character] = 1
    return frequency
growing_plants = "Returns the frequences of all occuring characters in the given text"
print()
print("excerise")
frequency = character_frequency(growing_plants)
print(frequency)
# for c, n in frequency.items():
#     print(f"Character: {c}: {n}")
Output:
{'r': 6, 'e': 9, 't': 6, 'u': 3, 'n': 5, 's': 3, ' ': 10, 'h': 3, 'f': 2, 'q': 1, 'c': 5, 'o': 2, 'a': 3, 'l': 2, 'i': 3, 'g': 2, 'v': 1, 'x': 1}
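For comparison, the same counts can probably be obtained with collections.Counter from the standard library. This is only a sketch: the name character_frequency_counter is made up for illustration, and it assumes that text is a single string rather than a list of lines.

```python
from collections import Counter

def character_frequency_counter(text):
    """Hypothetical helper: count each character of a string, ignoring case."""
    # Counter is a dict subclass; dict() turns the result into a plain dictionary
    return dict(Counter(text.lower()))

growing_plants = "Returns the frequences of all occuring characters in the given text"
frequency = character_frequency_counter(growing_plants)
for c, n in frequency.items():
    print(f"Character: {c}: {n}")
```

The explicit loop in the answer above is what the exercise is presumably after; Counter is just the shortcut once the idea is clear.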

Firstly, I noticed your indentation is wrong.
def character_frequency(text):
    """
    Returns the frequencies of all occurring characters in the given text
    :param text: A text
    :return: Dict in the form {"<character>": frequency, "<character>": frequency, ...}
    """
    # Set frequency as empty dictionary
    frequency_dict = {}
    for character in text:
        if character in frequency_dict:
            frequency_dict[character] += 1
        else:
            frequency_dict[character] = 1
    # Finding the most frequently occurring character
    most_occurring = max(frequency_dict, key=frequency_dict.get)
    # Displaying result
    print("\nMost occurring character is: ", most_occurring)
    print("It is repeated %d times" % (frequency_dict[most_occurring]))
    # Return the dictionary, as the docstring promises
    return frequency_dict

Related

Removing duplicates using a dictionary

I am writing a function that is supposed to count how many individual records have duplicates. For now my output gives me the total number of duplicated occurrences, which I don't want.
i.e. if one record occurs 4 times, it gives me 4 instead of 1; if 2 individual records account for 6 duplicates between them, it should give me 2.
Could someone please help find the bug?
Thank you
def duplicate_count(text):
    text = text.lower()
    dict = {}
    word = 0
    if len(text) != "":
        for a in text:
            dict[a] = dict.get(a,0) + 1
        for a in text:
            if dict[a] > 1:
                word = word + 1
        return word
    else:
        return "0"
Fixed it:
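def duplicate_count(text):
    text = text.lower()
    counts = {}  # renamed from "dict" so the built-in is not shadowed
    if text:  # len(text) != "" was always True; test for a non-empty string instead
        for a in text:
            counts[a] = counts.get(a, 0) + 1
        # count each character that occurs at least twice exactly once
        return sum(1 for n in counts.values() if n >= 2)
    else:
        return 0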
You can do this with set and sum. First, set is used to remove all duplicates. This keeps the number of iterations as small as possible and gives each character's count in one step, as opposed to a "one-at-a-time" count. The set is then used to build a dictionary that stores how many times each character occurs. Those values are then fed as a generator to sum, which counts how many of them are greater than 1.
def dup_cnt(t:str) -> int:
    if not t: return 0
    t = t.lower()
    d = dict()
    for c in set(t):
        d[c] = t.count(c)
    return sum(v>1 for v in d.values())
print(dup_cnt('aabccdeefggh')) #4
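A possible variant of the same idea, sketched with collections.Counter, which builds all the counts in a single pass over the text instead of calling count() once per distinct character. The name dup_cnt_counter is hypothetical.

```python
from collections import Counter

def dup_cnt_counter(t: str) -> int:
    """Hypothetical variant: number of distinct characters that occur more than once."""
    counts = Counter(t.lower())  # one pass over the text
    return sum(1 for n in counts.values() if n > 1)

print(dup_cnt_counter('aabccdeefggh'))  # 4
```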
I don't really understand the question you asked.
But I assume you want the count or the details of each letter's duplication in the text. You can do it like this; I hope it helps.
def duplicate_count(text):
    count_dict = {}
    for letter in text.lower():
        count_dict[letter] = count_dict.setdefault(letter, 0) + 1
    return count_dict
ret = duplicate_count('asuhvknasiasifjiasjfija')
# Get all letter details
print(ret)
#{'a': 5, 's': 4, 'u': 1, 'h': 1, 'v': 1, 'k': 1, 'n': 1, 'i': 4, 'f': 2, 'j': 3}
# Get all letter count
print(len(ret))
# 10
# Get only the letters appear more than once in the text
dedup = {k: v for k, v in ret.items() if v > 1}
# Get only duplicated letter details
print(dedup)
# {'a': 5, 's': 4, 'i': 4, 'f': 2, 'j': 3}
# Get only duplicated letter count
print(len(dedup))
# 5

A code to show the amount of each letter in a text document in python

I am the biggest rookie of all rookies in Python, and I want to learn how to write code that
A) Reads and analyses a text document, and
B) Prints how many of a certain character is in the text document
For example, if the text document said 'Hello my name is Mark' it will return as
A: 2
E: 2
H: 1 etc.
To be fair, I only know how to read text files in python because I googled it no less than 3 minutes ago, so I'm working from scratch here. The only thing I have written is
txt = open("file.txt","r")
print(txt.count("A")) #an experimental line, it didnt work
file.close()
I also tried the code
txt = input("Enter text here: ")
print("A: ", txt.count("A"))
...
print("z: ", txt.count("z"))
This would have worked if the text didn't have speech marks in it, which made the program return only information from the text inside the speech marks; hence text files.
The easiest way is using collections.Counter:
import collections
with open('file.txt') as fh:
    characters = collections.Counter(fh.read())
# Most common 10 characters (probably space and newlines are the first 2)
print(characters.most_common(10))
I'm not sure what you mean by speech marks, though. We can filter out all non-alphabetical characters like this:
import collections
import string
allowed_characters = set(string.ascii_letters)
with open('file.txt') as fh:
    data = fh.read()
data = (c for c in data if c in allowed_characters)
characters = collections.Counter(data)
# Most common 10 characters
print(characters.most_common(10))
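To get output in the "A: 2" style the question asks for, something along these lines might do. It is a sketch only: letter_report is a made-up name, it folds everything to upper case, and it works on a string here so that it runs without a file.

```python
from collections import Counter
import string

def letter_report(text):
    """Hypothetical helper: (letter, count) pairs for the ASCII letters in text, sorted A-Z."""
    counts = Counter(c for c in text.upper() if c in string.ascii_uppercase)
    return sorted(counts.items())

for letter, count in letter_report('Hello my name is Mark'):
    print(f"{letter}: {count}")
```

For a real file, the string would come from fh.read() inside a with open(...) block, as in the snippets above.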
In your first case txt is a file handle: it points to the file rather than containing the file's contents.
You can try this:
txt = open('yourfile') # read mode is the default
content = txt.read()
print (content.count('A'))
txt.close()
Here is an example test.txt and the corresponding Python code to calculate each letter's frequency:
test.txt :
Red boxed memory, sonet.
This is an anonymous text.
Liverpool football club.
Two empty lines.
This is an SO (Stack Overflow) answer.
This is the Python code:
with open('test.txt') as file:
    the_string = file.read()
alphabets = 'abcdefghijklmnopqrstuvwxyz'
Alphabets = dict()
for i in alphabets:
    # count the lowercase and the uppercase form of each letter
    frequency = the_string.count(i) + the_string.count(i.capitalize())
    Alphabets[i] = frequency
print(Alphabets)
Alphabets is therefore a dictionary of:
{'a': 6, 'b': 3, 'c': 2, 'd': 2, 'e': 10, 'f': 2, 'g': 0, 'h': 2, 'i': 6, 'j': 0, 'k': 1, 'l': 7, 'm': 4, 'n': 7, 'o': 13, 'p': 2, 'q': 0, 'r': 5, 's': 10, 't': 9, 'u': 2, 'v': 2, 'w': 3, 'x': 2, 'y': 3, 'z': 0}
You can get the frequency of a letter with, for example, Alphabets['a'], which returns 6, or Alphabets['n'], which returns 7.
The frequency includes the capital letter, because of frequency = the_string.count(i) + the_string.count(i.capitalize()).
Notice that when reading the file, each line ends with a \n marking the line break. This \n is a single newline character; it does not represent a \ character followed by an n character. So Alphabets['n'] will not include the 'n' from \n.
Is this okay?
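The point about \n can be checked with a small made-up string standing in for the file contents (the string below is an assumption for illustration, not the test.txt from above):

```python
# A string as it might come back from file.read(): two lines, each ending in a newline
the_string = "Ten\nnine\n"

print(len(the_string))         # 9: each "\n" is one character
print(the_string.count("\n"))  # 2 newline characters
print(the_string.count("n"))   # 3: the n in "Ten" and the two in "nine"
print(the_string.count("\\"))  # 0: there is no backslash character in the data
```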

python list.count always returns 0

I have a lengthy Python list and would like to count the number of occurrences of a single character. For example, how many total times does 'o' occur? I want N=4.
lexicon = ['yuo', 'want', 'to', 'sioo', 'D6', 'bUk', 'lUk'], etc.
list.count() is the obvious solution. However, it consistently returns 0. It doesn't matter which character I look for. I have double checked my file - the characters I am searching for are definitely there. I happen to be calculating count() in a for loop:
for i in range(100):
    # random sample 500 words
    sample = list(set(random.sample(lexicon, 500)))
    C1 = ['k']
    total = sum(len(i) for i in sample) # total words
    sample_count_C1 = sample.count(C1) / total
But it returns 0 outside of the for loop, over the list 'lexicon' as well. I don't want a list of overall counts so I don't think Counter will work.
Ideas?
If we take your list (the shortened version you supplied):
lexicon = ['yu', 'want', 'to', 'si', 'D6', 'bUk', 'lUk']
then we can get the count using sum() and a generator-expression:
count = sum(s.count(c) for s in lexicon)
so if c were, say, 'k' this would give 2, as there are two occurrences of k.
This will work in a for-loop or not, so you should be able to incorporate this into your wider code by yourself.
With your latest edit, I can confirm that this produces a count of 4 for 'o' in your modified list.
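A runnable version of the above, using the list from the question as edited. The wrapper function count_char is just a hypothetical name to make the snippet reusable; note that str.count is case-sensitive.

```python
lexicon = ['yuo', 'want', 'to', 'sioo', 'D6', 'bUk', 'lUk']

def count_char(words, c):
    """Hypothetical helper: total occurrences of the character c across all words."""
    return sum(word.count(c) for word in words)

print(count_char(lexicon, 'o'))  # 4
print(count_char(lexicon, 'k'))  # 2
```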
If I understand your question correctly, you would like to count the number of occurrences of each character for each word in the list. This is known as a frequency distribution.
Here is a simple implementation using Counter
from collections import Counter
lexicon = ['yu', 'want', 'to', 'si', 'D6', 'bUk', 'lUk']
chars = [char for word in lexicon for char in word]
freq_dist = Counter(chars)
Counter({'t': 2, 'U': 2, 'k': 2, 'a': 1, 'u': 1, 'l': 1, 'i': 1, 'y': 1, 'D': 1, '6': 1, 'b': 1, 's': 1, 'w': 1, 'n': 1, 'o': 1})
Using freq_dist, you can return the number of occurrences for a character.
freq_dist.get('a')
1
# get() method returns None if character is not in dict
freq_dist.get('4')
None
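If a 0 is more convenient than None for characters that never occur, either get() with a default or plain indexing should work, since a Counter returns 0 for missing keys. A short sketch with the same list:

```python
from collections import Counter

lexicon = ['yu', 'want', 'to', 'si', 'D6', 'bUk', 'lUk']
freq_dist = Counter(char for word in lexicon for char in word)

print(freq_dist.get('4'))     # None
print(freq_dist.get('4', 0))  # 0, using get()'s default argument
print(freq_dist['4'])         # 0, Counter returns 0 for missing keys instead of raising KeyError
print(freq_dist['k'])         # 2
```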
It's giving zero because sample.count('k') only matches list elements that are exactly the string 'k' (and sample.count(['k']) only matches elements equal to the list ['k']). It does not look inside 'bUk' or 'lUk'.
If you want to calculate the frequency of a character, do it like this:
for i in range(100):
    # random sample 500 words
    sample = list(set(random.sample(lexicon, 500)))
    C1 = 'k'  # a string, not a list, so that str.count() accepts it
    total = sum(len(i) for i in sample) # total characters in the sample
    sample_count = sum(x.count(C1) for x in sample)
    sample_count_C1 = sample_count / total
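The difference between counting list elements and counting characters inside the strings can be seen on a tiny made-up list (the contents of sample here are an assumption for illustration):

```python
sample = ['k', 'bUk', 'lUk', 'want']

print(sample.count('k'))    # 1: only the element that is exactly 'k'
print(sample.count(['k']))  # 0: no element equals the list ['k']
print(sum(word.count('k') for word in sample))  # 3: looks inside every string
```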

Anagram test for two strings in python

This is the question:
Write a function named test_for_anagrams that receives two strings as
parameters, both of which consist of alphabetic characters and returns
True if the two strings are anagrams, False otherwise. Two strings are
anagrams if one string can be constructed by rearranging the
characters in the other string using all the characters in the
original string exactly once. For example, the strings "Orchestra" and
"Carthorse" are anagrams because each one can be constructed by
rearranging the characters in the other one using all the characters
in one of them exactly once. Note that capitalization does not matter
here i.e. a lower case character can be considered the same as an
upper case character.
My code:
def test_for_anagrams (str_1, str_2):
    str_1 = str_1.lower()
    str_2 = str_2.lower()
    print(len(str_1), len(str_2))
    count = 0
    if (len(str_1) != len(str_2)):
        return (False)
    else:
        for i in range(0, len(str_1)):
            for j in range(0, len(str_2)):
                if(str_1[i] == str_2[j]):
                    count += 1
        if (count == len(str_1)):
            return (True)
        else:
            return (False)
#Main Program
str_1 = input("Enter a string 1: ")
str_2 = input("Enter a string 2: ")
result = test_for_anagrams (str_1, str_2)
print (result)
The problem here is when I enter strings as Orchestra and Carthorse, it gives me result as False. Same for the strings The eyes and They see. Any help would be appreciated.
I'm new to Python, so excuse me if I'm wrong.
I believe this can be done with a different approach: sort the given strings and then compare them.
def anagram(a, b):
    # string to list
    str1 = list(a.lower())
    str2 = list(b.lower())
    #sort list
    str1.sort()
    str2.sort()
    #join list back to string
    str1 = ''.join(str1)
    str2 = ''.join(str2)
    return str1 == str2
print(anagram('Orchestra', 'Carthorse'))
The problem is that you only check whether a matching character exists anywhere in the other string and increment the counter every time one does. You do not account for characters you have already matched with another one. That's why the following will also fail:
>>> test_for_anagrams('aa', 'aa')
False
Even though the strings are equal (and as such also anagrams), you are matching each a of the first string with each a of the other string, so you end up with a count of 4, which results in False.
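To make the over-counting visible, here is a small sketch that isolates the nested loops and returns the counter instead of comparing it to the length. The function name pairwise_matches is hypothetical; the loops are meant to mirror the ones in the question.

```python
def pairwise_matches(str_1, str_2):
    """Hypothetical helper: the value the nested loops in the question accumulate in count."""
    count = 0
    for a in str_1.lower():
        for b in str_2.lower():
            if a == b:
                count += 1
    return count

print(pairwise_matches('aa', 'aa'))                # 4, but len('aa') is 2
print(pairwise_matches('Orchestra', 'Carthorse'))  # 11, but the length is 9
print(pairwise_matches('abc', 'cab'))              # 3, equal to the length only because no letter repeats
```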
What you should do in general is count every character occurrence and make sure that every character occurs as often in each string. You can count characters by using a collections.Counter object. You then just need to check whether the counts for each string are the same, which you can easily do by comparing the counter objects (which are just dictionaries):
from collections import Counter
def test_for_anagrams (str_1, str_2):
    c1 = Counter(str_1.lower())
    c2 = Counter(str_2.lower())
    return c1 == c2
>>> test_for_anagrams('Orchestra', 'Carthorse')
True
>>> test_for_anagrams('aa', 'aa')
True
>>> test_for_anagrams('bar', 'baz')
False
For completeness: if just importing Counter and being done with it is not in the spirit of the exercise, you can use plain dictionaries to count the letters.
def test_for_anagrams(str_1, str_2):
    counter1 = {}
    for c in str_1.lower():
        counter1[c] = counter1.get(c, 0) + 1
    counter2 = {}
    for c in str_2.lower():
        counter2[c] = counter2.get(c, 0) + 1
    # print statements so you can see what's going on,
    # comment out/remove at will
    print(counter1)
    print(counter2)
    return counter1 == counter2
Demo:
print(test_for_anagrams('The eyes', 'They see'))
print(test_for_anagrams('orchestra', 'carthorse'))
print(test_for_anagrams('orchestr', 'carthorse'))
Output:
{' ': 1, 'e': 3, 'h': 1, 's': 1, 't': 1, 'y': 1}
{' ': 1, 'e': 3, 'h': 1, 's': 1, 't': 1, 'y': 1}
True
{'a': 1, 'c': 1, 'e': 1, 'h': 1, 'o': 1, 's': 1, 'r': 2, 't': 1}
{'a': 1, 'c': 1, 'e': 1, 'h': 1, 'o': 1, 's': 1, 'r': 2, 't': 1}
True
{'c': 1, 'e': 1, 'h': 1, 'o': 1, 's': 1, 'r': 2, 't': 1}
{'a': 1, 'c': 1, 'e': 1, 'h': 1, 'o': 1, 's': 1, 'r': 2, 't': 1}
False
Traverse the string test and check whether each character is present in the string test1; if it is, append it to the string value.
Then compare the length of value with the length of test1: if they are equal return True, else False.
def anagram(test,test1):
    value =''
    for data in test:
        if data in test1:
            value += data
    if len(value) == len(test1):
        return True
    else:
        return False
anagram("abcd","adbc")
I have written the anagram program in a basic, easy-to-understand way.
def compare(str1,str2):
    if((str1==None) or (str2==None)):
        print(" You didn't enter a string.")
    elif(len(str1)!=len(str2)):
        print(" The strings entered are not anagrams.")
    elif(len(str1)==len(str2)):
        b=[]
        c=[]
        for i in str1:
            #print(i)
            b.append(i)
        b.sort()
        print(b)
        for j in str2:
            #print(j)
            c.append(j)
        c.sort()
        print(c)
        if (b==c and b!=[] ):
            print(" The strings entered are anagrams.")
        else:
            print(" The strings entered are not anagrams.")
    else:
        print(" The strings entered are not anagrams.")
str1=input(" Enter the first String :")
str2=input(" Enter the second String :")
compare(str1,str2)
A more concise and Pythonic way to do it is to use sorted together with lower (or upper).
Convert both strings to the same case first, then sort them and compare the results:
# Function definition
def test_for_anagrams (str_1, str_2):
    # sorted() returns a list, which has no lower(), so the case is folded on the string first
    return sorted(str_1.lower()) == sorted(str_2.lower())
#Main Program
str_1 = input("Enter a string 1: ")
str_2 = input("Enter a string 2: ")
result = test_for_anagrams (str_1, str_2)
print (result)
Another solution:
def test_for_anagrams(my_string1, my_string2):
    s1,s2 = my_string1.lower(), my_string2.lower()
    count = 0
    if len(s1) != len(s2) :
        return False
    for char in s1 :
        if s2.count(char,0,len(s2)) == s1.count(char,0,len(s1)):
            count = count + 1
    return count == len(s1)
My solution is:
#anagrams
def is_anagram(a, b):
    if sorted(a) == sorted(b):
        return True
    else:
        return False
print(is_anagram("Alice", "Bob"))
def anagram(test,test1):
    test_value = []
    if len(test) == len(test1):
        for i in test:
            value = test.count(i) == test1.count(i)
            test_value.append(value)
    else:
        test_value.append(False)
    if False in test_value:
        return False
    else:
        return True
Check the lengths of test and test1. If they match, traverse the string test and compare each character's count in test and in test1, storing the result of every comparison in a list. The strings are anagrams only if that list contains no False.

Determining Letter Frequency Of Cipher Text

I am trying to make a tool that finds the frequencies of letters in some type of cipher text.
Let's suppose it is all lowercase a-z, no numbers. The encoded message is in a txt file.
I am trying to build a script to help in cracking substitution or possibly transposition ciphers.
Code so far:
cipher = open('cipher.txt').read()
cipherfilter = cipher.lower()
cipherletters = list(cipherfilter)
alpha = list('abcdefghijklmnopqrstuvwxyz')
occurrences = {}
for letter in alpha:
    occurrences[letter] = cipherfilter.count(letter)
for letter in occurrences:
    print(letter, occurrences[letter])
All it does so far is show how many times a letter appears.
How would I print the frequency of all letters found in this file?
import collections
d = collections.defaultdict(int)
for c in 'test':
    d[c] += 1
print(d) # defaultdict(<class 'int'>, {'t': 2, 'e': 1, 's': 1})
From a file:
myfile = open('test.txt')
for line in myfile:
    line = line.rstrip('\n')
    for c in line:
        d[c] += 1
For the genius that is the defaultdict container, we must give thanks and praise. Otherwise we'd all be doing something silly like this:
s = "andnowforsomethingcompletelydifferent"
d = {}
for letter in s:
    if letter not in d:
        d[letter] = 1
    else:
        d[letter] += 1
The modern way:
from collections import Counter
string = "ihavesometextbutidontmindsharing"
Counter(string)
#>>> Counter({'i': 4, 't': 4, 'e': 3, 'n': 3, 's': 2, 'h': 2, 'm': 2, 'o': 2, 'a': 2, 'd': 2, 'x': 1, 'r': 1, 'u': 1, 'b': 1, 'v': 1, 'g': 1})
If you want to know the relative frequency of a letter c, you have to divide the number of occurrences of c by the length of the input.
For instance, taking Adam's example:
s = "andnowforsomethingcompletelydifferent"
n = len(s) # n = 37
and storing the absolute frequency of each letter in
counts[letter]
we obtain the relative frequencies by:
from string import ascii_lowercase # this is "a...z"
for c in ascii_lowercase:
    print(c, counts[c] / n)
Putting it all together, we get something like this:
# get input
s = "andnowforsomethingcompletelydifferent"
n = len(s) # n = 37
# get absolute frequencies of letters
import collections
counts = collections.defaultdict(int)  # named counts so the built-in dict is not shadowed
for c in s:
    counts[c] += 1
# print relative frequencies
from string import ascii_lowercase # this is "a...z"
for c in ascii_lowercase:
    print(c, counts[c] / n)
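The same calculation could also be sketched with collections.Counter. The function name relative_frequencies is made up for illustration, and the choice to ignore everything that is not a-z (spaces, punctuation, newlines) is an assumption based on the question saying the text is all lowercase letters.

```python
from collections import Counter
from string import ascii_lowercase

def relative_frequencies(text):
    """Hypothetical helper: share of each letter a-z among the letters of text."""
    letters = [c for c in text.lower() if c in ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    if total == 0:
        # avoid dividing by zero when the text contains no letters at all
        return {c: 0.0 for c in ascii_lowercase}
    return {c: counts[c] / total for c in ascii_lowercase}

freqs = relative_frequencies("andnowforsomethingcompletelydifferent")
# print the letters from most to least frequent
for letter, share in sorted(freqs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{letter}: {share:.3f}")
```

Sorting by share puts the most frequent cipher letters first, which is usually where a comparison with the expected letter frequencies of the plaintext language starts.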
