I am trying to write a program that reads a line from the user, splits it, and feeds it up to a majestic dictionary named counts. All is well until we ask her majesty for some data. I want each word printed first, with the number of times it repeats printed next to it. Below is the code I managed to write.
counts = dict()
print('Enter a line of text:')
line = input('')
words = line.split()
print('Words:', words)
print('Counting:')
for word in words:
    counts[word] = counts.get(word, 0) + 1
for wording in counts:
    print('trying', counts[wording], '')
When it executes, its output is unforgivable:
Words: ['You', 'will', 'always', 'only', 'get', 'an', 'indent', 'error', 'if', 'there', 'is', 'actually', 'an', 'indent', 'error.', 'Double', 'check', 'that', 'your', 'final', 'line', 'is', 'indented', 'the', 'same', 'was', 'as', 'the', 'other', 'lines', '--', 'either', 'with', 'spaces', 'or', 'with', 'tabs.', 'Most', 'likely,', 'some', 'of', 'the', 'lines', 'had', 'spaces', '(or', 'tabs)', 'and', 'the', 'other', 'line', 'had', 'tabs', '(or', 'spaces).']
Counting:
trying 1
trying 1
trying 1
trying 1
trying 1
trying 2
trying 2
trying 1
trying 1
trying 1
trying 2
trying 1
trying 1
trying 1
trying 1
trying 1
trying 1
trying 1
trying 2
It just prints 'trying' and the number of times each word is repeated, without the word itself (I think the word is called the key in a dictionary; correct me if I am wrong).
Thank you.
Please help me, and when replying to this question please keep in mind that I am a newbie, both to Python and to Stack Overflow.
Nowhere in your code do you attempt to print the word. How did you expect it to appear in the output? If you want the word, put it in the list of things to print:
print(wording, counts[wording])
For more education, look up the collections module in the standard library and use its Counter class.
counts = Counter(words)
will do all of your word counts for you.
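With the import included, a minimal sketch of that approach (the sample line here is hypothetical, just to make the snippet self-contained):

```python
from collections import Counter

# Counter is a dict subclass, so lookups and iteration work exactly
# like the hand-rolled counts dictionary above.
words = 'the quick the lazy the'.split()
counts = Counter(words)
for word, count in counts.most_common():  # most frequent first
    print(word, count)
```
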
I'm confused as to why you print 'trying'.
Try this instead.
counts = dict()
print('Enter a line of text:')
line = input('')
words = line.split()
print('Words:', words)
print('Counting:')
for word in words:
    counts[word] = counts.get(word, 0) + 1
for wording in counts:
    print(wording, counts[wording], '')
You should use counts.items() to iterate over the keys and values of counts as follows:
counts = dict()
print('Enter a line of text:')
line = input('')
words = line.split()
print('Words:', words)
print('Counting:')
for word in words:
    counts[word] = counts.get(word, 0) + 1
for word, count in counts.items():  # notice this!
    print(f'trying {word} {count}')
Also notice that you can use an f-string when printing.
The code you have iterates over the dictionary keys and prints only the count in the dictionary. You would want to do something like this:
for word, count in counts.items():
    print('trying', word, count)
You might also want to use
from collections import defaultdict
counts = defaultdict(int)
so that adding to the dictionary is as simple as
counts[word] += 1
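Pieced together, a minimal sketch (defaultdict(int) is the idiomatic way to get a default of 0, since int() returns 0):

```python
from collections import defaultdict

words = 'a b a c a b'.split()  # stand-in for the user's input line
counts = defaultdict(int)      # missing keys start at zero
for word in words:
    counts[word] += 1          # no .get(word, 0) dance needed
print(dict(counts))            # -> {'a': 3, 'b': 2, 'c': 1}
```
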
I am trying to create a function which takes a string as input and returns a list containing the stem of each word in the string. The problem is that, because of the nested for loop, each word in the string is appended to the list multiple times. Is there a way to avoid this?
def stemmer(text):
    stemmed_string = []
    res = text.split()
    suffixes = ('ed', 'ly', 'ing')
    for word in res:
        for i in range(len(suffixes)):
            if word.endswith(suffixes[i]):
                stemmed_string.append(word[:-len(suffixes[i])])
            elif len(word) > 8:
                stemmed_string.append(word[:8])
            else:
                stemmed_string.append(word)
    return stemmed_string
If I call the function on this text ('I have a dog that is barking') this is the output:
['I',
'I',
'I',
'have',
'have',
'have',
'a',
'a',
'a',
'dog',
'dog',
'dog',
'that',
'that',
'that',
'is',
'is',
'is',
'barking',
'barking',
'bark']
You are appending something in each round of the loop over suffixes. To avoid the problem, don't do that.
It's not clear if you want to add the shortest possible string out of a set of candidates, or how to handle stacked suffixes. Here's a version which always strips as much as possible.
def stemmer(text):
    stemmed_string = []
    suffixes = ('ed', 'ly', 'ing')
    for word in text.split():
        for suffix in suffixes:
            if word.endswith(suffix):
                word = word[:-len(suffix)]
        stemmed_string.append(word)  # append once per word, after the suffix loop
    return stemmed_string
Notice the more idiomatic way of looping over the suffixes directly instead of over range(len(suffixes)), too.
This will reduce "sparingly" to "spar", etc.
Like every naïve stemmer, this will also do stupid things with words like "sly" and "thing".
Demo: https://ideone.com/a7FqBp
I'm trying to make a MapReduce WordCount that takes a large article and counts proper nouns.
Here are the requirements:
Starts with a capital letter and has never been found in the text with a small letter
Has length between 2 and 7 letters
Sort in descending order
It looks like a typical WordCount MapReduce, but I couldn't get it working. How do I get rid of all the punctuation marks? What's the right way to construct the mapper and reducer?
import sys
import re

for line in sys.stdin:
    article_id, text = line.strip().split('\t', 1)
    text = re.sub(r'\W', ' ', text).split(' ')
    for word in text:
        if len(word) >= 2 and len(word) < 7:
            key = "".join(sorted(word.lower()))
            print("{}\t{}\t{}".format(key, word.lower(), 1))
If you are only looking for words, then since you already imported re, you can use re.findall (one solution):
re.findall(r'\w+', text)
This way you remove all the punctuation in the string, keeping only letters, digits, and underscores.
If you take the string below :
text = "Looks like a typical WordCount mapreduce, but I couldn't do this. How to get rid of all the punctuation marks"
you quickly obtain :
liste = ['Looks', 'like', 'a', 'typical', 'WordCount', 'mapreduce', 'but', 'I', 'couldn', 't', 'do', 'this', 'How', 'to', 'get', 'rid', 'of', 'all', 'the', 'punctuation', 'marks']
On which you can run your for loop in the same way.
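For the proper-noun requirement, one way to structure the mapper is to key on the lower-cased word and record whether each occurrence was capitalised, so the reducer can later reject any word that ever appears in lower case. A hypothetical sketch (map_line and its record layout are assumptions, not part of the question's framework):

```python
import re

def map_line(line):
    # Emit (lowercased word, was_capitalised, 1) for each candidate word.
    # Keying on the lowercase form lets the reducer see every casing of
    # the same word; the middle field marks capitalised occurrences.
    records = []
    try:
        article_id, text = line.strip().split('\t', 1)
    except ValueError:  # malformed line: no tab separator
        return records
    for word in re.findall(r'[A-Za-z]+', text):  # letters only, punctuation dropped
        if 2 <= len(word) <= 7:                  # length requirement from the question
            records.append((word.lower(), int(word[0].isupper()), 1))
    return records

print(map_line('42\tHow to get rid of Marks, all the marks?'))
```

The reducer would then sum the counts per key and keep only keys where every occurrence was capitalised, before sorting in descending order.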
import re

complex_sen_count = 0
sen = '5th grade. Very easy to read. Easily understood by an average 11-year-old student.'
search_list = [',', 'after', 'although', 'as', 'because',
               'before', 'even though', 'if', 'since', 'though',
               'unless', 'until', 'when', 'whenever', 'whereas',
               'wherever', 'while']
s = sen.split('. ')
for n in s:
    print(n)
    if re.compile('|'.join(search_list), re.IGNORECASE).search(n):
        complex_sen_count += 1
print("value: ", complex_sen_count)
The value should be 0 because none of the "search_list" words appear in the string, but it still increments the variable complex_sen_count.
output is:
5th grade
Very easy to read
Easily understood by an average 11-year-old student.
value: 2
expected output: 0
please help.
There are exactly 2 matches: without word boundaries, the pattern 'as' matches the substring inside 'easy' and inside 'Easily':
'5th grade. Very easy to read. Easily understood by an average 11-year-old student.'
To search for a whole word, anchor it, e.g. with whitespace before and after the word: \sas\s (\s means a whitespace character). A word boundary, r'\bas\b', is more robust because it also matches at the start and end of the string.
I am trying to find the maximum length word in a sentence, like
a = "my name is john and i am working in STACKOVERFLOWLIMITED"
To fetch the largest word in this sentence, I am trying something like
c = a.split()
c = ['my', 'name', 'is', 'john', 'and', 'i', 'am', 'working', 'in', 'STACKOVERFLOWLIMITED']
When I try to print max(c)
output - 'working'
Why doesn't the output show "STACKOVERFLOWLIMITED" as the longest word in the sentence?
That's because max(c) without a key compares strings alphabetically, not by length, and uppercase letters sort before lowercase ones, so 'working' is the alphabetically greatest word. To compare by length instead, try this:
result = max(a.split(), key=len)
print(result)
another way is...
sorted([(x, len(x)) for x in c], key=lambda x: x[1])[-1][0]
I'm trying to count the number of keywords in another .py file.
Here's what I made:
import keyword

infile = open('xx.py', 'r')
contentbyword = infile.read().split()
num_of_keywords = 0
for word in contentbyword:
    if keyword.iskeyword(word) or keyword.iskeyword(word.replace(':', '')):
        num_of_keywords += 1
I know it's buggy: even when a keyword is inside a quoted string or after a # sign, it still counts.
So what is a better way to count the orange-highlighted words (IDLE's default keyword colour) in Python? Many thanks <(_ _)>
The correct way to do this is using the tokenize module, which takes care of all the edge cases.
import token
import keyword
import tokenize

s = open('hi.py').readline  # generate_tokens takes a readline callable
counter = 0
l = []
for i in tokenize.generate_tokens(s):
    if i.type == token.NAME and keyword.iskeyword(i.string):
        counter += 1
        l.append(i.string)
print(counter)
print(l)
You should consider using Counter; here is a snippet that collects all the keywords in a file according to a list of keywords:
import re
from collections import Counter

def get_kws(file_in, keywords_list):
    with open(file_in) as fin:
        # load all content => not suitable for large files
        content = fin.read()
    # split on non-word characters
    words = re.split(r"\W", content)
    counter = Counter(words)
    for word, c in counter.items():
        if word and word in keywords_list:
            yield word, c
EDIT:
for the list of python keywords =>
keywords_list = ['and', 'del', 'from', 'not', 'while', 'as', 'elif', 'global', 'or', 'with', 'assert', 'else', 'if', 'pass', 'yield', 'break', 'except', 'import', 'print', 'class', 'exec', 'in', 'raise', 'continue', 'finally', 'is', 'return', 'def', 'for', 'lambda', 'try']
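That hard-coded list is Python 2's; keyword.kwlist (or keyword.iskeyword) always matches the interpreter that is actually running. A self-contained sketch of the same counting idea using it (the source string is a made-up example):

```python
import keyword
import re
from collections import Counter

# Count keyword tokens in a source string using the live keyword list
# instead of a hard-coded one.
source = 'for x in data:\n    if x:\n        print(x)\n'
words = re.split(r'\W+', source)
counts = Counter(w for w in words if keyword.iskeyword(w))
print(dict(counts))  # {'for': 1, 'in': 1, 'if': 1} on Python 3
```

Note that 'print' is counted on Python 2 (where it is a keyword) but not on Python 3, which is exactly why the live list is preferable.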
The code below should work for Python 2.7:
import keyword
import re

handle = open("asdf.py", "r")
data = handle.read()
data = re.sub(r'".*?"', r"", data)  # strip double-quoted strings (non-greedy)
data = re.sub(r'#.*', r"", data)    # strip comments
mystr = data.split()
mykeys = keyword.kwlist
count = 0
for i in mystr:
    i = re.sub(r':', r'', i)
    if i in mykeys:
        print i
        count = count + 1
print count
Replace the filename as suitable, cheers!