Given a .txt with about 200,000 lines of single words, I need to count how many times each letter appears as the first letter of a word. I have a dictionary with keys 'a' - 'z', with counts assigned to each of their values. I need to print them out in the form
a:10,978 b:7,890 c:12,201 d:9,562 e:6,008
f:7,095 g:5,660 (...)
The dictionary currently prints like this
[('a', 10898), ('b', 9950), ('c', 17045), ('d', 10675), ('e', 7421), ('f', 7138), ('g', 5998), ('h', 6619), ('i', 7128), ('j', 1505), ('k'...
How do I remove the brackets & parentheses and print only 5 counts per line? Also, after I sorted the dictionary by keys, it started printing as key, value instead of key:value
def main():
    file_name = open('dictionary.txt', 'r').readlines()
    alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
    letter = {}
    for i in alphabet:
        letter[i]=0
    for n in letter:
        for p in file_name:
            if p.startswith(n):
                letter[n] = letter[n]+1
    letter = sorted(letter.items())
    print(letter)
main()
You could use the following. It loops through your list in groups of 5 elements and prints each group in the desired format.
Sample data (the first ten entries of your sorted list):
letter = [('a', 10898), ('b', 9950), ('c', 17045), ('d', 10675), ('e', 7421), ('f', 7138), ('g', 5998), ('h', 6619), ('i', 7128), ('j', 1505)]
Replace print(letter) with the following:
for grp in range(0, len(letter), 5):
    print(' '.join(elm[0] + ':' + '{:,}'.format(elm[1]) for elm in letter[grp:grp+5]))
Output:
a:10,898 b:9,950 c:17,045 d:10,675 e:7,421
f:7,138 g:5,998 h:6,619 i:7,128 j:1,505
A collections.Counter will count the first letter of each line; then split the sorted items into chunks of five and join each chunk:
from collections import Counter
with open('dictionary.txt') as f:  # automatically closes your file
    # iterate once over the file object as opposed to storing 200k lines
    # and 26 iterations over the lines
    c = Counter(line[0] for line in f)
srt = sorted(c.items())
# create five element chunks from the sorted items
chunks = (srt[i:i+5] for i in range(0, len(srt), 5))
for chk in chunks:
    # format and join; k is the letter, v is its count
    print(" ".join("{}:{:,}".format(k, v) for k, v in chk))
If some lines may start with something other than a letter (a blank line starts with the newline character, for example), filter with isalpha in the generator expression:
c = Counter(line[0] for line in f if line[0].isalpha())
The thousands-separator format specifier (the ',' in '{:,}') was added in Python 2.7.
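Putting the two answers together, here is a minimal sketch that counts first letters and returns the formatted text instead of printing it. The function name format_first_letter_counts, the per_line parameter, and the sample word list are made up for illustration; an in-memory list stands in for dictionary.txt.

```python
from collections import Counter


def format_first_letter_counts(words, per_line=5):
    """Return 'letter:count' pairs, sorted by letter, per_line pairs per row."""
    # Count only words that are non-empty and start with a letter.
    counts = Counter(word[0] for word in words if word and word[0].isalpha())
    items = sorted(counts.items())
    rows = []
    for start in range(0, len(items), per_line):
        chunk = items[start:start + per_line]
        # '{:,}' inserts the thousands separator into the count.
        rows.append(" ".join("{}:{:,}".format(letter, n) for letter, n in chunk))
    return "\n".join(rows)


# Hypothetical sample data standing in for the 200,000-line file.
sample = ["apple"] * 1200 + ["bee", "cat", "dog", "eel", "fig", "avocado"]
print(format_first_letter_counts(sample))
```

When reading from a real file, each line would first need its trailing newline stripped, or the file object could be passed directly since only the first character of each line is inspected.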
Related
What I am currently trying to do is take a list of letters and find out how many vowels there are. Lower-case and upper-case letters should count toward the same entry in the dictionary, adding 1 to the respective tuple key. I'm unable to use string or list methods, so I figured a dictionary with tuples as keys would work best given these restrictions.
def vowels_count(letters):
    vowels = ['a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U']
    frequent = {('a', 'A') : 0, ('e', 'E') : 0, ('i', 'I') : 0, ('o', 'O') : 0, ('u', 'U') : 0}
    for i in letters:
        if i in vowels:
To answer your specific case:
def vowels_count(letters):
    vowels = ['a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U']
    frequent = {('a', 'A') : 0, ('e', 'E') : 0, ('i', 'I') : 0, ('o', 'O') : 0, ('u', 'U') : 0}
    for i in letters:
        if i in vowels:
            frequent[(i.lower(), i.upper())] += 1
    return frequent
However, to suggest another way of doing it, if you don't need the keys of the resulting dictionary to be tuples:
import re
import collections
def vowels_count(letters: str) -> dict[str, int]:
    # lower-case the input, then drop every character that is not a vowel
    vowels = re.sub('[^aeiou]', '', letters.lower())
    return dict(collections.Counter(vowels))
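Because the question rules out string and list methods, and both snippets above call lower() (and the first also upper()), one possible alternative is a lookup dictionary that maps every vowel, in either case, to its tuple key. This is only a sketch: the names pairs and key_for are hypothetical, and it assumes dictionary lookups and tuple indexing are allowed under the restriction.

```python
def vowels_count(letters):
    # Each tuple is both a key of the result and the pair of spellings it covers.
    pairs = (('a', 'A'), ('e', 'E'), ('i', 'I'), ('o', 'O'), ('u', 'U'))
    frequent = {}
    key_for = {}  # hypothetical helper: single character -> its tuple key
    for pair in pairs:
        frequent[pair] = 0
        key_for[pair[0]] = pair
        key_for[pair[1]] = pair
    for ch in letters:
        # Membership test and lookup on the dict; no str or list methods used.
        if ch in key_for:
            frequent[key_for[ch]] += 1
    return frequent


print(vowels_count(['a', 'A', 'b', 'E', 'u']))
```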
I have a list of possible letters for each position in a word, and I want to find all possible words.
For example, the input [[l,b],[e,d],[s,t]] represents a 3-letter word where the first letter could be l or b, the second letter could be e or d, and the third letter could be s or t. I want the output to be the product of these lists: [les,let,bet,...and so on]. The list could be of any length, not only three.
I tried
res = list(map(prod, zip(test_list)))
but I get
[<itertools.product object at 0x0000024F65AEC600>, <itertools.product object at 0x0000024F65AEC640>, <itertools.product object at 0x0000024F65AEC680>, <itertools.product object at 0x0000024F65AEC6C0>]
I tried
word1=list(product(letter[0],letter[1],letter[2]))
It works perfectly, but I want the code to accept a list of any length.
You don't want to zip the test_list, just pass each element of it as an argument to product using the * operator:
>>> test_list = [['l','b'],['e','d'],['s','t']]
>>> import itertools
>>> list(itertools.product(*test_list))
[('l', 'e', 's'), ('l', 'e', 't'), ('l', 'd', 's'), ('l', 'd', 't'), ('b', 'e', 's'), ('b', 'e', 't'), ('b', 'd', 's'), ('b', 'd', 't')]
If you want the result to be in string form, use join:
>>> [''.join(p) for p in itertools.product(*test_list)]
['les', 'let', 'lds', 'ldt', 'bes', 'bet', 'bds', 'bdt']
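Wrapped up as a function that accepts any number of positions, the same idea might look like the sketch below. The function name all_words is made up for illustration.

```python
from itertools import product


def all_words(options):
    """options is a list of lists; options[i] holds the candidates for letter i."""
    # *options passes each inner list to product() as a separate argument,
    # so the number of positions does not have to be known in advance.
    return [''.join(combo) for combo in product(*options)]


print(all_words([['l', 'b'], ['e', 'd'], ['s', 't']]))
print(len(all_words([['a', 'b'], ['c'], ['d', 'e'], ['f', 'g']])))  # 2 * 1 * 2 * 2 words
```

The number of results is the product of the lengths of the inner lists, so it grows quickly as positions are added.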
I wrote a function to remove the parts that two strings have in common. I first turn each string into a list and iterate through the two lists to check whether the characters at the same position are equal. The problem is that, while iterating,
the code skips index 2 (e.g. for list="index", the iterator jumps to 'd' after visiting 'i').
I've also tried the "replace" method to do this as a string operation, but I did not get the result I want: "replace" removed parts that I want to keep.
def popp(s,t):
    s_lis=list(s)
    t_lis=list(t)
    ind=0
    for i,j in zip(s_lis,t_lis):
        if i==j:
            s_lis.pop(ind)
            t_lis.pop(ind)
        else:ind+=1
    return s_lis,t_lis
# test the code
print(popp('hackerhappy','hackerrank'))
expected result: ['h','p','p','y'] ['r','n','k']
actual result: ['k', 'r', 'h', 'a', 'p', 'p', 'y'], ['k', 'r', 'r', 'a', 'n', 'k']
To begin with, you should use itertools.zip_longest, which keeps pairing items until the longest input is exhausted and fills the missing positions with None. You are using zip, which stops as soon as the shortest input is exhausted, and that is not what you want here.
So in our case, it will be
print(list(zip_longest(s_lis, t_lis)))
#[('h', 'h'), ('a', 'a'), ('c', 'c'), ('k', 'k'), ('e', 'e'),
#('r', 'r'), ('h', 'r'), ('a', 'a'), ('p', 'n'), ('p', 'k'), ('y', None)]
Then you should append the non-common characters to a separate list rather than modifying the same list you are iterating over via s_lis.pop(ind).
So if the characters in the tuple do not match, append each of them that is not None:
from itertools import zip_longest
def popp(s,t):
    s_lis = list(s)
    t_lis = list(t)
    s_res = []
    t_res = []
    #Use zip_longest to zip the two lists
    for i, j in zip_longest(s_lis, t_lis):
        #If the characters do not match, and they are not None, append them
        #to the list
        if i != j:
            if i is not None:
                s_res.append(i)
            if j is not None:
                t_res.append(j)
    return s_res, t_res
The output will look like:
print(popp('hackerhappy','hackerrank'))
#(['h', 'p', 'p', 'y'], ['r', 'n', 'k'])
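As for why the original loop appeared to skip an element: popping from a list while a for loop (or zip) is iterating over it shifts the remaining items one place to the left, so the iterator's next position lands one element further along than expected. A small sketch of that effect, using the "index" example from the question (the variable names are illustrative only):

```python
items = list("index")
visited = []
for ch in items:
    visited.append(ch)
    if ch == 'i':
        # Removing the current item moves 'n' into position 0, which the
        # iterator has already passed, so 'n' is never visited.
        items.pop(0)

print(visited)
print(items)
```

Collecting results in a new list, as in the answer above, avoids the problem because the list being iterated is never modified.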
You could modify your code slightly:
def popp(s, t):
    s_lis = list(s)
    t_lis = list(t)
    s_res = []
    t_res = []
    # match each character. Stops once the
    # shortest list ends
    for i, j in zip(s_lis, t_lis):
        if i != j:
            s_res.append(i)
            t_res.append(j)
    # if s is longer, take rest of the string and
    # add it to residual
    if len(s) > len(t):
        for x in s_lis[len(t):]:
            s_res.append(x)
    if len(t) > len(s):
        for x in t_lis[len(s):]:
            t_res.append(x)
    print(s_res)
    print(t_res)
popp('hackerhappy','hackerrank')
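The same approach can be condensed with slicing, and made to return its result rather than print it. This is a sketch under the assumption that a tuple of two lists is the desired return value, as in the original question; the names common and mismatches are made up here.

```python
def popp(s, t):
    # Length of the part where both strings have a character to compare.
    common = min(len(s), len(t))
    # Keep only the positions where the two characters differ.
    mismatches = [(a, b) for a, b in zip(s, t) if a != b]
    # Whatever lies beyond the shorter string is kept unchanged.
    s_res = [a for a, _ in mismatches] + list(s[common:])
    t_res = [b for _, b in mismatches] + list(t[common:])
    return s_res, t_res


print(popp('hackerhappy', 'hackerrank'))
```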
I'm trying to take something like:
input = "hello world"
And get the following result:
[('h', 'e', 'l', 'l', 'o'), ('w', 'o', 'r', 'l', 'd')]
I was able to split the input into individual words, and then the words into a list of characters, and then the list into a tuple...but they aren't separated by word like they are in the example.
sentence = input("Enter a sentence: ")
word_list = sentence.split()
print(word_list)
chars = []
for x in sentence:
    chars.append(x)
print(chars)
tuple_list = tuple(word_list)
print(type(tuple_list))
The code above prints
['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']
and not
[('h', 'e', 'l', 'l', 'o'), ('w', 'o', 'r', 'l', 'd')]
What am I doing wrong?
Thank you very much!
You can split the string and map the sub-strings to the tuple constructor:
s = "hello world"
list(map(tuple, s.split()))
Let's go by your original attempt.
word_list = sentence.split() # ['hello', 'world']
We have a list of two words, however in your attempt, you iterate over the original input from the user as opposed to the word list that you made. So the code should become:
chars = []
for x in word_list:
    chars.append(tuple(x))
print(chars)
# [('h', 'e', 'l', 'l', 'o'), ('w', 'o', 'r', 'l', 'd')]
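The corrected loop can also be written as a list comprehension, which is equivalent to the map-based one-liner. A short sketch, with a fixed string standing in for the user's input:

```python
sentence = "hello world"  # stands in for input("Enter a sentence: ")

# split() breaks the sentence into words; tuple(word) breaks a word into characters.
chars = [tuple(word) for word in sentence.split()]
print(chars)

# The map-based form builds the same list.
same = list(map(tuple, sentence.split()))
```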
I'm using Python 2.7.
Let's say I have a list like so:
string_list = ['hello', 'apple', 'green', 'paint', 'sting']
Where each string in the list is the same length.
I want to create a generator that would be doing something like the following code:
for i in xrange(len(string_list)):
    my_gen = (ch for a_string[i] in string_list)
So on the first run, my_gen would have 'h', 'a', 'g', 'p', 's'. On the next run it would have 'e', 'p', 'r', 'a', 't'.
Just use the built-in function zip, like this:
for letters in zip('hello', 'apple', 'green', 'paint', 'sting'):
    print letters
zip is a built-in that does just that: combine one element of each iterable in a tuple, for each iteration.
Running the above example, you have:
>>> for letters in zip('hello', 'apple', 'green', 'paint', 'sting'):
...     print letters
...
('h', 'a', 'g', 'p', 's')
('e', 'p', 'r', 'a', 't')
('l', 'p', 'e', 'i', 'i')
('l', 'l', 'e', 'n', 'n')
('o', 'e', 'n', 't', 'g')
izip does exactly what you want:
from itertools import izip
for letters in izip(*string_list):
    print letters
The * operator unpacks your string_list so that izip sees it as five sequences of characters, instead of just a single list of strings.
Output:
('h', 'a', 'g', 'p', 's')
('e', 'p', 'r', 'a', 't')
('l', 'p', 'e', 'i', 'i')
('l', 'l', 'e', 'n', 'n')
('o', 'e', 'n', 't', 'g')
The built-in zip function works too, but it's not lazy (i.e. it immediately returns a list of all the tuples, instead of generating them one at a time).
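The question targets Python 2.7. For readers on Python 3, where itertools.izip no longer exists and the built-in zip is itself lazy, a rough equivalent might look like the sketch below (the variable names columns, first and rest are illustrative):

```python
string_list = ['hello', 'apple', 'green', 'paint', 'sting']

# In Python 3, zip returns an iterator that produces one tuple at a time.
columns = zip(*string_list)

first = next(columns)   # tuple of the first letters
rest = list(columns)    # the remaining four tuples

print(first)
print(rest[0])
```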
The following recipe comes from the itertools documentation:
from itertools import islice, cycle
def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    pending = len(iterables)
    nexts = cycle(iter(it).next for it in iterables)
    while pending:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            pending -= 1
            nexts = cycle(islice(nexts, pending))
Besides being very fast, one advantage of this approach is that it works well if the input iterables are of different lengths.
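The recipe above is written for Python 2, since it relies on the iterator's next method. A Python 3 adaptation, sketched here on the assumption that only that method name needs to change to __next__, could look as follows; the loop variable is renamed to get_next so it does not shadow the built-in. Note that roundrobin emits single characters in column order rather than one tuple per column.

```python
from itertools import cycle, islice


def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    pending = len(iterables)
    # Python 3 iterators expose __next__ instead of next.
    nexts = cycle(iter(it).__next__ for it in iterables)
    while pending:
        try:
            for get_next in nexts:
                yield get_next()
        except StopIteration:
            # One iterable ran dry: drop it and keep cycling over the rest.
            pending -= 1
            nexts = cycle(islice(nexts, pending))


print(list(roundrobin('ABC', 'D', 'EF')))
print(''.join(roundrobin('he', 'ap', 'gr')))
```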
Use the zip function that takes several lists (iterables) and yields tuples of corresponding items:
zip(*string_list)
yields (successively)
[('h', 'a', 'g', 'p', 's'),
('e', 'p', 'r', 'a', 't'),
('l', 'p', 'e', 'i', 'i'),
('l', 'l', 'e', 'n', 'n'),
('o', 'e', 'n', 't', 'g')]
def foo(string_list):
    # iterate over character positions; all strings have the same length
    for i in xrange(len(string_list[0])):
        yield (a_string[i] for a_string in string_list)
string_list = ['hello', 'apple', 'green', 'paint', 'sting']
for nth_string_list in foo(string_list):
    for ch in nth_string_list:
        print ch
val = zip('hello','apple','green','paint','sting')  # or zip(*string_list)
print val[0]
# output: ('h', 'a', 'g', 'p', 's')
print val[1]
# output: ('e', 'p', 'r', 'a', 't')