I am doing basic Python challenges, and this is one of them. All I needed to do was read through a file and print the frequency of each letter in decreasing order. That works, but I wanted to enhance the program to also print the frequency percentage alongside each letter, in the form letter - count - freq%. Something like this: o - 46 - 10.15%
This is what I did so far:
def exercise11():
    import string
    while True:
        try:
            fname = input('Enter the file name -> ')
            fop = open(fname)
            break
        except:
            print('This file does not exist. Please try again!')
            continue
    counts = {}
    for line in fop:
        line = line.translate(str.maketrans('', '', string.punctuation))
        line = line.translate(str.maketrans('', '', string.whitespace))
        line = line.translate(str.maketrans('', '', string.digits))
        line = line.lower()
        for ltr in line:
            if ltr in counts:
                counts[ltr] += 1
            else:
                counts[ltr] = 1
    lst = []
    countlst = []
    freqlst = []
    for ltrs, c in counts.items():
        lst.append((c, ltrs))
        countlst.append(c)
    totalcount = sum(countlst)
    for ec in countlst:
        efreq = (ec/totalcount) * 100
        freqlst.append(efreq)
    freqlst.sort(reverse=True)
    lst.sort(reverse=True)
    for ltrs, c, in lst:
        print(c, '-', ltrs)
exercise11()
As you can see, I can calculate and sort the freq% values in a separate list, but I can't get them into the tuples in lst alongside each letter and its count. Is there any way to solve this problem?
Also, if you have any other suggestions for my code, please mention them.
Modification
Applying a simple modification suggested by wwii, I got the desired output. All I had to do was add one more argument to the print call while iterating over lst. Previously I had tried to build a separate list for the freq%, sort it, and then insert its values into the letter-count tuples in lst, which didn't work out.
for count, letter in lst:  # each tuple in lst is (count, letter)
    print(letter, '-', count, '-', round(count/totalcount*100, 2), '%')
Your count data is in a dictionary of {letter:count} pairs.
You can use the dictionary to calculate the total count like this:
total_count = sum(counts.values())
Then there is no need to calculate the percentages until you are iterating over the counts:
for letter, count in counts.items():
    print(f'{letter} - {count} - {100*count/total_count}')  # Python 3.6+
    #print('{} - {} - {}'.format(letter, count, 100*count/total_count))  # Python < 3.6
Or if you want to put it all in a list so you can sort it:
data = []
for letter, count in counts.items():
    data.append((letter, count, 100*count/total_count))
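Using operator.itemgetter as the sort key function can help code readability. Each of these sorts is an alternative; add reverse=True to get decreasing order.
import operator
letter = operator.itemgetter(0)
count = operator.itemgetter(1)
frequency = operator.itemgetter(2)
data.sort(key=letter)     # alphabetical
data.sort(key=count)      # by count, ascending
data.sort(key=frequency)  # by percentage, ascending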
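Because Python's sort is stable, the key functions above can also be chained to get "highest count first, ties in alphabetical order": sort by the secondary key first and the primary key last. A small sketch, using made-up sample rows in the same (letter, count, percent) shape:

```python
import operator

# Hypothetical sample rows in the (letter, count, percent) shape used above.
data = [("a", 2, 20.0), ("o", 4, 40.0), ("e", 2, 20.0), ("t", 2, 20.0)]

letter = operator.itemgetter(0)
count = operator.itemgetter(1)

# sort() is stable (reverse=True keeps that stability), so sorting by the
# secondary key first and the primary key last gives
# "count descending, ties broken alphabetically".
data.sort(key=letter)
data.sort(key=count, reverse=True)

for ltr, cnt, pct in data:
    print(f"{ltr} - {cnt} - {pct:.2f}%")
```

This prints o first, then a, e and t in alphabetical order, since those three share the same count.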
Tuples are immutable, which is probably the issue you are running into. The other issue is the simple form of the sort call; sorting with a key function would serve you well. See below:
lst is a list of tuples, but because tuples are immutable whereas lists are mutable, changing lst to a list of lists is a valid approach. Then, since each element of lst is [letter, count, frequency%], sort() with a lambda key can sort by whichever index you like. The following goes after your for line in fop: loop.
lst = []
for ltrs, c in counts.items():
    lst.append([ltrs, c])
totalcount = sum([x[1] for x in lst])  # sum all 'count' values in a list comprehension
for elem in lst:
    elem.append((elem[1]/totalcount)*100)  # each element of lst is now a mutable list, so the calculated frequency can be appended to it
lst.sort(reverse=True, key=lambda item: item[2])  # sort in place, in reverse order, by index 2 (the frequency)
The items in freqlst, countlst, and lst are related to each other only by their position. If any one of them is sorted on its own, that relationship is lost.
Zipping the lists together before sorting maintains the relationship.
I'll pick up from your list initialization lines.
lst = []
countlst = []
freqlst = []
for ltr, c in counts.items():
    # change here: lst now only contains letters
    lst.append(ltr)
    countlst.append(c)
totalcount = sum(countlst)
for ec in countlst:
    efreq = (ec/totalcount) * 100
    freqlst.append(efreq)
# New stuff here (note: this only works in Python 3+): zip pairs the three lists up by position
zipped = zip(lst, countlst, freqlst)
zipped = sorted(zipped, key=lambda x: x[1], reverse=True)  # highest count first
for ltr, c, freq in zipped:
    print("{} - {} - {:.2f}%".format(ltr, c, freq))  # love me the format method :)
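Basically, zip combines the lists element by element into tuples (in Python 3 it returns a lazy iterator of tuples rather than a list). You can then sort those tuples with a lambda as the key function, which is a very common Stack Overflow question.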
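One detail worth knowing about the zip approach in Python 3: zip() returns a one-shot iterator, so once sorted() has consumed it, it is empty. That is why the answer above rebinds zipped to the sorted list. A minimal sketch with made-up counts:

```python
letters = ["a", "b", "c"]
counts = [5, 2, 3]
total = sum(counts)
percents = [100 * c / total for c in counts]

zipped = zip(letters, counts, percents)  # a lazy iterator in Python 3
rows = sorted(zipped, key=lambda row: row[1], reverse=True)

print(rows)          # [('a', 5, 50.0), ('c', 3, 30.0), ('b', 2, 20.0)]
print(list(zipped))  # [] because sorted() already consumed the iterator
```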
I think I was able to achieve what you wanted by using lists instead of tuples. Tuples cannot be modified in place, so a list is the simpler choice when you want to fill in the percentage later.
(I also added an option to quit the program.)
Important: never forget to comment your code.
The code:
def exercise11():
    import string
    while True:
        print('Press 0 to quit the program')  # give the user the option to quit the program easily
        fname = input('Enter the file name -> ')
        if fname == '0':
            return  # quit: there is no file to read, so leave the function here
        try:
            fop = open(fname)
            break
        except OSError:
            print('This file does not exist. Please try again!')
    counts = {}
    for line in fop:
        line = line.translate(str.maketrans('', '', string.punctuation))
        line = line.translate(str.maketrans('', '', string.whitespace))
        line = line.translate(str.maketrans('', '', string.digits))
        line = line.lower()
        for ltr in line:
            if ltr in counts:
                counts[ltr] += 1
            else:
                counts[ltr] = 1
    fop.close()
    lst = []
    for ltrs, c in counts.items():
        # add a zero as a placeholder &
        # use square brackets so you get a list that you can modify
        lst.append([c, ltrs, 0])
    lst.sort(reverse=True)
    # count the total of the letters
    counter = 0
    for ltrs in lst:
        counter += ltrs[0]
    # calculate the percentage for each letter and replace the placeholder
    for letter in lst:
        percentage = (letter[0] / counter) * 100
        letter[2] = round(percentage, 2)
    for i in lst:
        print('The letter {} is repeated {} times, which is {}% '.format(i[1], i[0], i[2]))
exercise11()
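For comparison, here is a more compact sketch of the same letter - count - freq% output using collections.Counter from the standard library. This is a swapped-in technique, not the approach of the answers above. It assumes only ASCII letters should be counted (the original code strips punctuation, whitespace and digits instead, so non-ASCII letters would be handled differently), and the function name letter_frequencies is hypothetical:

```python
import string
from collections import Counter


def letter_frequencies(text):
    """Return (letter, count, percent) tuples, most frequent first."""
    # Assumption: only ASCII letters count; everything else is ignored.
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    counts = Counter(letters)
    total = sum(counts.values())
    # most_common() with no argument returns every (letter, count) pair,
    # sorted by count in decreasing order. If there are no letters the
    # list is empty, so the division below never runs.
    return [(letter, count, 100 * count / total)
            for letter, count in counts.most_common()]


for letter, count, pct in letter_frequencies("Hello, world!"):
    print(f"{letter} - {count} - {pct:.2f}%")
```

To use it on a file, read the whole file inside a with open(fname) as f: block and pass f.read() to the function.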
<?php
$fh = fopen("text.txt", 'r') or die("File does not exist");
$line = fgets($fh);
$words = count_chars($line, 1);
foreach ($words as $key=>$value)
{
echo "The character <b>' ".chr($key)." '</b> was found <b>$value</b> times. <br>";
}
?>
I am trying to write my own code for generating permutations of items represented by numbers. Say 4 items can be represented by 0, 1, 2, 3.
I've seen the code for itertools.product, and it is pretty neat. My approach is to count in binary, ternary, and so on. My code below only works for bases of 10 or less, because part of it splits the string using list(s). The number 120 in base 11 has the digits 10 and 10, which my code writes as '1010'; splitting '1010' yields 1, 0, 1, 0. For it to work correctly, I need it to split into 10, 10. Is there a way around this that still works with the rest of the code?
Alternatively, what would a recursive version look like? Thanks
aSet = 11
subSet = 2
s = ''
l = []
number = aSet**subSet
#finding all permutation, repeats allowed
for num in range(number):
    s = ''
    while num//aSet != 0:
        s = str(num%aSet) + s
        num = num//aSet
    else:
        s = str(num%aSet) + s
    s = s.zfill(subSet)
    l.append(list(s))
Indeed, the problem with using a string is that list(s) will chop it into individual characters. You should not create a string at all; use a list for s from the start:
aSet = 11
subSet = 2
l = []
number = aSet**subSet
#finding all permutation, repeats allowed
for num in range(number):
    s = []
    for _ in range(subSet):
        s.insert(0, num%aSet)
        num = num//aSet
    l.append(s)
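Since the question also asked for a recursive version, here is a minimal sketch. digit_tuples is a hypothetical name; it yields tuples rather than the lists built above, and for the same aSet and subSet it produces the same sequence, which is also what itertools.product(range(aSet), repeat=subSet) gives:

```python
import itertools


def digit_tuples(base, length):
    """Recursively yield every length-`length` tuple of digits 0..base-1,
    in the same order as counting from 0 to base**length - 1."""
    if length == 0:
        yield ()
        return
    for first in range(base):
        # Fix the leading digit, then recurse on the remaining positions.
        for rest in digit_tuples(base, length - 1):
            yield (first,) + rest


result = list(digit_tuples(11, 2))
print(len(result))   # 121
print(result[120])   # (10, 10)
# Same sequence as the standard library's Cartesian product.
print(result == list(itertools.product(range(11), repeat=2)))  # True
```

Strictly speaking these are not permutations but all sequences with repetition (a Cartesian product), which matches the "repeats allowed" comment in the question.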
I am trying to write a program that tallies the values in a file. For example, I am given a file with numbers like this:
2222 (First line)
4444 (Second line)
1111 (Third line)
My program takes the name of an input file (e.g. file.txt) and the column of numbers to tally. So, for example, if file.txt contains the numbers above and I need the sum of column 2, my function should print 7 (2+4+1).
from sys import argv

t1 = open(argv[1], "r")
number = argv[2]
k = 0
while True:
    n = int(number)
    t = t1.readline()
    z = list(t)
    if t == "":
        break
    k += float(z[n])
t1.close()
print(k)
This code works for the first column when I set it to 0, but it doesn't return a consistent result when I set it to 1, even though both should give the same answer.
Any thoughts?
A somewhat uglier implementation that demonstrates the cool-factor of zip:
def sum_col(filename, colnum):
    with open(filename) as inf:
        columns = zip(*[line.strip() for line in inf])
        return sum([int(num) for num in list(columns)[colnum]])
zip(*iterable) flips the data from row-wise to column-wise, so:
iterable = ['aaa','bbb','ccc','ddd']
zip(*iterable) == ['abcd','abcd','abcd'] # kind of: it actually yields the tuple ('a','b','c','d') three times
zip objects aren't subscriptable, so we need to convert to a list before subscripting (doing [colnum]). Alternatively, we could skip ahead with next():
...
        for _ in range(colnum):
            next(columns)  # skip the columns we don't need
        return sum([int(num) for num in next(columns)])
Or just calculate all the sums and pick the one we need:
...
        col_sums = [sum(int(num) for num in column) for column in columns]
        return col_sums[colnum]
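To see the transposition concretely (including the tuples that "kind of" glosses over), here is a small sketch using the rows from the question. One caveat: zip stops at the shortest input, so a line shorter than the others truncates every column, and a blank line (for example a trailing empty line, which strip() turns into '') makes zip produce no columns at all:

```python
rows = ["aaa", "bbb", "ccc", "ddd"]
columns = list(zip(*rows))
print(columns)  # [('a', 'b', 'c', 'd'), ('a', 'b', 'c', 'd'), ('a', 'b', 'c', 'd')]

# The numeric rows from the question; index 1 is the second column.
lines = ["2222", "4444", "1111"]
second = list(zip(*lines))[1]
print(second)                       # ('2', '4', '1')
print(sum(int(d) for d in second))  # 7

# zip stops at the shortest input:
print(list(zip("123", "45")))       # [('1', '4'), ('2', '5')]
```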
So I need to save the results of a loop and I'm having some difficulty. I want to record my results in a new list, but I get "string index out of range" and other errors. The end goal is to record the products of digits 1-5, 2-6, 3-7, etc., and eventually keep the highest product.
def product_of_digits(number):
    d= str(number)
    for integer in d:
        s = 0
        k = []
        while s < (len(d)):
            j = (int(d[s])*int(d[s+1])*int(d[s+2])*int(d[s+3])*int(d[s+4]))
            s += 1
            k.append(j)
        print(k)
product_of_digits(n)
This happens because you let s run up to the last index of d and then read d[s+4] and so on, which goes past the end of the string. Instead, you should change your while loop to:
while s < (len(d)-4):
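With that bound fixed, the remaining step from the question (keeping the highest product) can be sketched as follows. This assumes Python 3.8+ for math.prod, and max_window_product is a hypothetical helper name; it raises ValueError if the number has fewer digits than the window width, because max() receives an empty list:

```python
from math import prod  # Python 3.8+


def max_window_product(number, width=5):
    """Return the largest product of `width` consecutive digits of `number`."""
    digits = [int(ch) for ch in str(number)]
    # There are len(digits) - width + 1 windows, so the last one starts at
    # index len(digits) - width; with width=5 that is the same bound as
    # "while s < len(d) - 4".
    products = [prod(digits[i:i + width])
                for i in range(len(digits) - width + 1)]
    return max(products)


print(max_window_product(123456789))  # 15120 (5*6*7*8*9)
```

Building the full products list first mirrors the k list in the question; if only the maximum is needed, passing a generator expression to max() avoids keeping the list.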
I have a collection of 101 documents. I need to iterate over them 10 documents at a time and store the value of a particular field (from those 10 documents) in a list.
I tried this:
values = db.find({},{"field":1})
urls = []
count = 0
for value in values:
    if(count < 10):
        urls.append(value["field"])
        count = count + 1
        print(count)
    else:
        print(urls)
        urls = []
        urls.append(value["field"])
        count = 1
It doesn't output the last value, because the loop ends before the else branch runs again. Is there an elegant way to rectify this situation?
You reset count to 0 every time the loop restarts. Move the initialization outside the loop:
count = 0
for value in values:
If urls is already filled, this will be your only problem.
As far as I can tell, you have some data that you want to organize into batches of size 10. If so, perhaps this will help:
N = 10
values = list(db.find({},{"field":1}))
url_batches = [
    [v['field'] for v in values[i:i+N]]
    for i in range(0, len(values), N)
]
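If loading every document with list() is undesirable, a generator can batch any iterable lazily, and it still emits the final partial batch that the loop in the question misses. A sketch, assuming a hypothetical batches helper and a plain list of dicts standing in for the database cursor:

```python
from itertools import islice


def batches(iterable, size):
    """Yield lists of up to `size` items; the final batch may be shorter."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))  # take at most `size` items
        if not batch:                   # iterator exhausted
            return
        yield batch


# Hypothetical stand-in for the cursor returned by db.find(...):
docs = [{"field": f"url{i}"} for i in range(101)]
sizes = [len(b) for b in batches((d["field"] for d in docs), 10)]
print(sizes)  # [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 1]
```

On Python 3.12 and later, itertools.batched(iterable, n) does the same job, yielding tuples instead of lists.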