Word count with pattern in Python - python

So this is the question:
Write a program to read in multiple lines of text and count the number
of words in which the rule i before e, except after c is broken, and
number of words which contain either ei or ie and which don't break
the rule.
For this question, we only care about the c if it is the character
immediately before the ie or the ei. So science counts as breaking the
rule, but mischievous doesn't. If a word breaks the rule twice (like
obeisancies), then it should still only be counted once.
Example given:
Line: The science heist succeeded
Line: challenge accepted
Line:
Number of times the rule helped: 0
Number of times the rule was broken: 2
and my code:
rule = []
broken = []
line = None
while line != '':
line = input('Line: ')
line.replace('cie', 'broken')
line.replace('cei', 'rule')
line.replace('ie', 'rule')
line.replace('ei', 'broken')
a = line.count('rule')
b = line.count('broken')
rule.append(a)
broken.append(b)
print(sum(a)); print(sum(b))
How do I fix my code, to work like the question wants it to?

I'm not going to write the code to your exact specification as it sounds like homework but this should help:
import pprint
words = ['science', 'believe', 'die', 'friend', 'ceiling',
'receipt', 'seize', 'weird', 'vein', 'foreign']
rule = {}
rule['ie'] = []
rule['ei'] = []
rule['cei'] = []
rule['cie'] = []
for word in words:
if 'ie' in word:
if 'cie' in word:
rule['cie'].append(word)
else:
rule['ie'].append(word)
if 'ei' in word:
if 'cei' in word:
rule['cei'].append(word)
else:
rule['ei'].append(word)
pprint.pprint(rule)
Save it to a file like i_before_e.py and run python i_before_e.py:
{'cei': ['ceiling', 'receipt'],
'cie': ['science'],
'ei': ['seize', 'weird', 'vein', 'foreign'],
'ie': ['believe', 'die', 'friend']}
You can easily count the occurrences with:
for key in rule.keys():
print "%s occured %d times." % (key, len(rule[key]))
Output:
ei occured 4 times.
ie occured 3 times.
cie occured 1 times.
cei occured 2 times.

Firstly, replace does not chance stuff in place. What you need is the return value:
line = 'hello there' # line = 'hello there'
line.replace('there','bob') # line = 'hello there'
line = line.replace('there','bob') # line = 'hello bob'
Also I would assume you want actual totals so:
print('Number of times the rule helped: {0}'.format(sum(rule)))
print('Number of times the rule was broken: {0}'.format(sum(broken)))
You are printing a and b. These are the numbers of times the rule worked and was broken in the last line processed. You want totals.
As a sidenote: Regular expressions are good for things like this. re.findall would make this a lot more sturdy and pretty:
line = 'foo moo goo loo foobar cheese is great '
foo_matches = len(re.findall('foo', line)) # = 2

Let's split the logic up into functions, that should help us reason about the code and get it right. To loop over the line, we can use the iter function:
def rule_applies(word):
return 'ei' in word or 'ie' in word
def complies_with_rule(word):
if 'cie' in word:
return False
if word.count('ei') > word.count('cei'):
return False
return True
helped_count = 0
broken_count = 0
lines = iter(lambda: input("Line: "), '')
for line in lines:
for word in line.split():
if rule_applies(word):
if complies_with_rule(word):
helped_count += 1
else:
broken_count += 1
print("Number of times the rule helped:", helped_count)
print("Number of times the rule was broken:", broken_count)
We can make the code more concise by shortening the complies_with_rule function and by using generator expressions and Counter:
from collections import Counter
def rule_applies(word):
return 'ei' in word or 'ie' in word
def complies_with_rule(word):
return 'cie' not in word and word.count('ei') == word.count('cei')
lines = iter(lambda: input("Line: "), '')
words = (word for line in lines for word in line.split())
words_considered = (word for word in words if rule_applies(word))
did_rule_help_count = Counter(complies_with_rule(word) for word in words_considered)
print("Number of times the rule helped:", did_rule_help_count[True])
print("Number of times the rule was broken:", did_rule_help_count[False])

If I understand correctly, your main problematic is to get unique result per word. Is that what you try to achieve:
rule_count = 0
break_count = 0
line = None
while line != '':
line = input('Line: ')
rule_found = False
break_found = False
for word in line.split():
if 'cie' in line:
line = line.replace('cie', '')
break_found = True
if 'cei' in line:
line = line.replace('cei', '')
rule_found = True
if 'ie' in line:
rule_found = True
if 'ei' in line:
break_found = True
if rule_found:
rule_count += 1
if break_found:
break_count += 1
print(rule_found); print(break_count)

rule = []
broken = []
tb = 0
tr = 0
line = ' '
while line:
lines = input('Line: ')
line = lines.split()
for word in line:
if 'ie' in word:
if 'cie' in word:
tb += 1
elif word.count('cie') > 1:
tb += 1
elif word.count('ie') > 1:
tr += 1
elif 'ie' in word:
tr += 1
if 'ei' in word:
if 'cei' in word:
tr += 1
elif word.count('cei') > 1:
tr += 1
elif word.count('ei') > 1:
tb += 1
elif 'ei' in word:
tb += 1
print('Number of times the rule helped: {0}'.format(tr))
print('Number of times the rule was broken: {0}'.format(tb))
Done.

Related

split by regex except

import re
digit_count=0
number_count = 0
numbers = []
count=0
with open ("letters_and_numbers.txt") as f:
for line in f.readlines():
sub_strs = line.rstrip().split("-")
file_words = re.split(r"[a-zA-Z\W]",line.rstrip())
for word in file_words:
if word.isdigit():
digit_count += len(word)
number_count += 1
numbers.append(word)
foundNeg=False
for i in range(1, len(sub_strs)):
if str(sub_strs).startswith(word):
foundNeg == True
count-=int(word)
else:
foundNeg==False
count+=int(word)
print "digits:",digit_count
print "amount of numbers:",number_count
print "numbers:",numbers
print "total:",count
I am trying to get the above program to work but it only can if i can split by regex whitespaces except negative signs.How do I fix it???
NEVER MIND I FIGURED IT OUT
Here is my answer.It is the code I used and it worked even when there were no spaces between numbers and letters.
import re
digit_count=0
number_count = 0
numbers = []
count=0
with open ("letters_and_numbers.txt") as f:
for line in f.readlines():
sub_strs = line.rstrip().split("-")
for i in range(0, len(sub_strs)):
file_words = re.split(r"[a-zA-Z\W]",sub_strs[i])
for word in file_words:
if word.isdigit():
digit_count += len(word)
number_count += 1
if i >=1 and sub_strs[i].startswith(word):
count-=int(word)
digit_count+=1
numbers.append("-")
numbers.append(word)
else:
count+=int(word)
numbers.append(word)
print "digits:",digit_count
print "amount of numbers:",number_count
print "numbers:",numbers
print "total:",count

How to check if a string contains a specific character or not in python

I am new to python, but fairly experienced in programming. While learning python I was trying to create a simple function that would read words in from a text file (each line in the text file is a new word) and then check if the each word has the letter 'e' or not. The program should then count the amount of words that don't have the letter 'e' and use that amount to calculate the percentage of words that don't have an 'e' in the text file.
I am running into a problem where I'm very certain that my code is right, but after testing the output it is wrong. Please help!
Here is the code:
def has_n_e(w):
hasE = False
for c in w:
if c == 'e':
hasE = True
return hasE
f = open("crossword.txt","r")
count = 0
for x in f:
word = f.readline()
res = has_n_e(word)
if res == False:
count = count + 1
iAns = (count/113809)*100 //113809 is the amount of words in the text file
print (count)
rAns = round(iAns,2)
sAns = str(rAns)
fAns = sAns + "%"
print(fAns)
Here is the code after doing some changes that may help:
def has_n_e(w):
hasE = False
for c in w:
if c == 'e':
hasE = True
return hasE
f = open("crossword.txt","r").readlines()
count = 0
for x in f:
word = x[:-1]
res = has_n_e(word)# you can use ('e' in word) instead of the function
if res == False:
count = count + 1
iAns = (count/len(f))*100 //len(f) #is the amount of words in the text file
print (count)
rAns = round(iAns,2)
sAns = str(rAns)
fAns = sAns + "%"
print(fAns)
Hope this will help

Check if string is exactly the same as line in file

I've been writing a Countdown program in Python, and in it. I've written this:
#Letters Game
global vowels, consonants
from random import choice, uniform
from time import sleep
from itertools import permutations
startLetter = ""
words = []
def check(word, startLetter):
fileName = startLetter + ".txt"
datafile = open(fileName)
for line in datafile:
print("Checking if", word, "is", line.lower())
if word == line.lower():
return True
return False
def generateLetters():
lettersLeft = 9
output = []
while lettersLeft >= 1:
lType = input("Vowel or consonant? (v/c)")
sleep(uniform(0.5, 1.5))
if lType not in ("v", "c"):
print("Please input v or c")
continue
elif lType == "v":
letter = choice(vowels)
print("Rachel has picked an", letter)
vowels.remove(letter)
output.append(letter)
elif lType == "c":
letter = choice(consonants)
print("Rachel has picked a", letter)
consonants.remove(letter)
output.append(letter)
print("Letters so far:", output)
lettersLeft -= 1
return output
def possibleWords(letters, words):
for i in range(1,9):
print(letters)
print(i)
for item in permutations(letters, i):
item = "".join(list(item))
startLetter = list(item)[0]
if check(item, startLetter):
print("\n\n***Got one***\n", item)
words.append(item)
return words
vowels = ["a"]*15 + ["e"]*21 + ["i"]*13 + ["o"]*13+ ["u"]*5
consonants = ["b"]*2 + ["c"]*3 + ["d"]*6 + ["f"]*2 + ["g"]*3 +["h"]*2 +["j"]*1 +["k"]*1 +["l"]*5 +["m"]*4 +["n"]*8 +["p"]*4 +["q"]*1 +["r"]*9 +["s"]*9 +["t"]*9 + ["v"]*1 +["w"]*1 +["x"]*1 +["y"]*1 +["z"]*1
print("***Let's play a letters game!***")
sleep(3)
letters = generateLetters()
sleep(uniform(1, 1.5))
print("\n\n***Let's play countdown***\n\n\n\n\n")
print(letters)
for count in reversed(range(1, 31)):
print(count)
sleep(1)
print("\n\nStop!")
print("All possible words:")
print(possibleWords(letters, words))
'''
#Code for sorting the dictionary into files
alphabet = "abcdefghijklmnopqrstuvwxyz"
alphabet = list(alphabet)
for letter in alphabet:
allFile = open("Dictionary.txt", "r+")
filename = letter + ".txt"
letterFile = open(filename, "w")
for line in allFile:
if len(list(line.lower())) <= 9:
if list(line.lower())[0] == letter:
print("Writing:", line.lower())
letterFile.write(line.lower())
allFile.close()
letterFile.close()
I have 26 text files called a.txt, b.txt, c.txt... to make the search quicker
(Sorry it's not very neat - I haven't finished it yet)
However, instead of returning what I expect (pan), it returns all words with pan in it (pan, pancake, pans, pandemic...)
Is there any way in Python you can only return the line if it's EXACTLY the same as the string? Do I have to .read() the file first?
Thanks
Your post is strangely written so excuse me if I missmatch
Is there any way in Python you can only return the line if it's EXACTLY the same as the string? Do I have to .read() the file first?
Yes, there is!!!
file = open("file.txt")
content = file.read() # which is a str
lines = content.split('\n') # which is a list (containing every lines)
test_string = " pan "
positive_match = [l for l in lines if test_string in l]
This is a bit hacky since we avoid getting pancake for pan (for instance) but using spaces (and then, what about cases like ".....,pan "?). You should have a look at tokenization function. As pythonists, we hve one of the best library for this: nltk
(because, basically, you are reinventing the wheel)

How to not print a line (rather than printing 0) if it doesn't satisfy if statement - Python

I want to print out the line count and hashtag count only if the line starts with "RT" AND has at least 1 hashtag in it.
At the moment my code will print the line if it starts with "RT" but will give a 0 for the hashtag count is 0. However I don't want it to print the line at all.
i.e. there should be no lines of code that print out as "49 : 0"
PLEASE HELP - feel like I'm nearly there but need to add some more code into the second for loop.
hashtag_count = 0
line_count = 0
for line in open("tweets.txt"):
line_split = line.split()
hashtag_count = 0
if line.startswith("RT "):
for word in line_split:
if "#" in word:
hashtag_count += 1
print(str(line_count) + " : " + str(hashtag_count))
line_count += 1
You have reset value of hashtag_count in loop, remove that statement
hashtag_count = 0
line_count = 0
for line in open("tweets.txt"):
line_strip = line.strip()
line_split = line_strip.split()
hashtag_count = 0 #Remove this line <----------------------------
if line_strip.startswith("RT "):
for word in line_split
if word.startswith("#"):
hashtag_count += 1
print(str(line_count) + " : " + str(hashtag_count))
line_count += 1
After removal it will return number of hashtag_count correctly.
if hashtag_count:
print(str(line_count) + " : " + str(hashtag_count))
P.S.
RT
Russia Today? :-O
Maybe I missed something but why you're iterating over words?
If you just need to print only line number with # count for this lines starting with 'RT', try the following:
with open("tweets.txt") as f:
for line_number, line in enumerate(f, 1):
if line.startswith('RT'):
hash_count = line.count('#')
print("{} : {}".format(line_number, hash_count))

Python - how to print amount of numbers, periods, and commas in file

def showCounts(fileName):
lineCount = 0
wordCount = 0
numCount = 0
comCount = 0
dotCount = 0
with open(fileName, 'r') as f:
for line in f:
words = line.split()
lineCount += 1
wordCount += len(words)
for word in words:
# ###text = word.translate(string.punctuation)
exclude = set(string.punctuation)
text = ""
text = ''.join(ch for ch in text if ch not in exclude)
try:
if int(text) >= 0 or int(text) < 0:
numCount += 1
# elif text == ",":
# comCount += 1
# elif text == ".":
# dotCount += 1
except ValueError:
pass
print("Line count: " + str(lineCount))
print("Word count: " + str(wordCount))
print("Number count: " + str(numCount))
print("Comma count: " + str(comCount))
print("Dot count: " + str(dotCount) + "\n")
Basically it will show the number of lines and the number of words, but I can't get it to show the number of numbers, commas, and dots. I have it read a file that the user enters and then show the amount of lines and words, but for some reason it says 0 for numbers commas and dots. I commented out the part where it gave me trouble. If i remove the comma then i just get an error. thanks guys
This code loops over every character in each line, and adds 1 to its variable:
numCount = 0
dotCount = 0
commaCount = 0
lineCount = 0
wordCount = 0
fileName = 'test.txt'
with open(fileName, 'r') as f:
for line in f:
wordCount+=len(line.split())
lineCount+=1
for char in line:
if char.isdigit() == True:
numCount+=1
elif char == '.':
dotCount+=1
elif char == ',':
commaCount+=1
print("Number count: " + str(numCount))
print("Comma count: " + str(commaCount))
print("Dot count: " + str(dotCount))
print("Line count: " + str(lineCount))
print("Word count: " + str(wordCount))
Testing it out:
test.txt:
Hello, my name is B.o.b. I like biking, swimming, and running.
I am 125 years old, and I was 124 years old 1 year ago.
Regards,
B.o.b
Running:
bash-3.2$ python count.py
Number count: 7
Comma count: 5
Dot count: 7
Line count: 6
Word count: 27
bash-3.2$
Everything makes sense here, except the lineCount the reason why this is 6 is because of newlines. In my editor (nano), it adds a newline to the end of any file by default. So just imagine the text file to be this:
>>> x = open('test.txt').read()
>>> x
'Hello, my name is B.o.b. I like biking, swimming, and running.\n\nI am 125 years old, and I was 124 years old 1 year ago.\n\nRegards,\nB.o.b \n'
>>> x.count('\n')
6
>>>
Hope this helps!
For the punctuations, why not just do:
def showCounts(fileName):
...
...
with open(fileName, 'r') as fl:
f = fl.read()
comCount = f.count(',')
dotCount = f.count('.')
You could use the Counter class to take care of it you:
from collections import Counter
with open(fileName, 'r') as f:
data = f.read().strip()
lines = len(data.split('\n'))
words = len(data.split())
counts = Counter(data)
numbers = sum(v for (k,v) in counts.items() if k.isdigit())
print("Line count: {}".format(lines))
print("Word count: {}".format(words))
print("Number count: {}".format(numbers))
print("Comma count: {}".format(counts[',']))
print("Dot count: {}".format(counts['.']))

Categories

Resources