Finding honorific text and then capital letters in Python

file = open('AllCongress.txt', "r")
lines = file.readlines()
file.close()
a = ""
b = ""
c = ""
d = ""
e = ""
f = ""
g = ""
y = ""
z = ""
for x in lines:
    if ((x == "M") or (a == "M")):
        a == "M"
    if((x == "r") or (b == "r")):
        b == "r"
    if((x == ".") or (c == ".")):
        c == "."
    if((x == " ") or (d == " ")):
        d == " "
    if((x == x.isupper()) or (e == "A"):
        e == "A"
    if((x == x.isupper()) and ((e == "A") and (f == "A"))):
        f == "A"
    if((x == x.isupper()) or (g == g.isupper())):
        g == "A"
I'm trying to divide sections of a .txt file into two separate files based on whether each section has a male or female speaker. Speakers in this file type have their last name in all caps after the honorific. So the male honorific format for speakers in this file is "Mr. XYZ", with XYZ being any 3+ capital letters (3 capital letters in a row is enough to identify anybody in the file as a speaker). The female format is similar, just either "Ms. XYZ" or "Mrs. XYZ".
I want to take all of the text that follows such a speaker name and sort it into separate male and female text files, up to the point where the next speaker starts, where I have to determine gender again.
Unfortunately, I'm new to Python and can't figure out a way to check for either "Mr. " or "Mrs. " followed by at least 3 capital letters in a row. I really just need a way to detect this and I can probably figure out the rest. The code above is a really messy and unfinished attempt at capturing the "Mr. XYZ" part of the text.

Here is some code (ask if you have questions):
with open('yourfile.txt') as file:
    text = file.read()
# split() with no argument splits on any whitespace, including newlines
words = text.split()
# Stop one word early so that words[index + 1] always exists
for index, word in enumerate(words[:-1]):
    if word == 'Mr.' and words[index + 1].isupper():
        prefix = 'Mr. '
        name = words[index + 1]
        print(prefix + name)
    elif word == 'Mrs.' and words[index + 1].isupper():
        prefix = 'Mrs. '
        name = words[index + 1]
        print(prefix + name)
I suggest you use the with statement for opening files. Basically, you loop over each word and check whether it equals what you need.
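For the "honorific followed by at least 3 capital letters" check from the question, a regular expression is another option. This is only a sketch: the names SPEAKER and find_speakers and the sample sentence are made up for illustration, and it assumes exactly one space between the honorific and the surname.

```python
import re

# Honorific, a period, one space, then at least three capital letters in a row.
SPEAKER = re.compile(r"\b(Mr|Mrs|Ms)\. ([A-Z]{3,})")


def find_speakers(text):
    """Return (honorific, surname) pairs in order of appearance."""
    return SPEAKER.findall(text)


# Invented sample text, not taken from AllCongress.txt.
sample = "Mr. SMITH. I yield. Mrs. JONES. Thank you, Mr. Speaker. Ms. LEE of Texas."
print(find_speakers(sample))
```

Note that "Mr. Speaker" is not reported, because "Speaker" does not start with three capitals.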

Use string methods like startswith() or endswith().
And don't close the file before reading the lines.
with open('AllCongress.txt', "r") as file:
    # read() returns one string; readlines() returns a list, which has no split()
    words = file.read().split()
for x in words:
    if x.startswith("Mr."):
        pass  # code here
    elif x.startswith("Ms."):
        pass  # code here
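To go one step further and sort the text by speaker, one possible approach is to find every speaker marker and cut the text from each marker to the next. A rough sketch, assuming the "Mr./Mrs./Ms. plus 3+ capitals" format described in the question; split_by_gender and the sample text are hypothetical, and anything before the first speaker is dropped.

```python
import re

# Speaker marker: honorific, period, space, at least three capitals.
SPEAKER = re.compile(r"\b(Mr|Mrs|Ms)\. [A-Z]{3,}")


def split_by_gender(text):
    """Return (male_text, female_text), each built from whole speaker sections."""
    male, female = [], []
    matches = list(SPEAKER.finditer(text))
    for i, match in enumerate(matches):
        # A section runs from this marker up to the next one (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        section = text[match.start():end]
        if match.group(1) == "Mr":
            male.append(section)
        else:
            female.append(section)
    return "".join(male), "".join(female)


# Invented sample text for illustration.
sample = "Mr. SMITH. I yield. Mrs. JONES. Thank you. Mr. BROWN. Agreed."
male_text, female_text = split_by_gender(sample)
print(male_text)
print(female_text)
```

The two returned strings could then be written out with open(..., "w") to whatever file names you choose.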

Related

Python - Trying to print each line that has a string value that matches the index provided

I'm still pretty new to Python and programming in general, and I know I've gone about this entire program in a roundabout way. But here is what I have and what I want, if anyone can help.
Begins by reading a text file e.g.
ADF101,Lecture,Monday,08:00,10:00,Jenolan,J10112,Joe Blo
ADF101,Tutorial,Thursday,10:00,11:00,Jenolan,J10115,Cat Blue
ALM204,Lecture,Monday,09:00,11:00,Tarana,T05201,Kim Toll
Then I make empty lists and append them with each index...
subjects = []
lecturer = []
for line in x:
    f = line.split(',')
    if len(f) == 8:
        subjects.append(f[0])
        lecturer.append(f[7])
Then I provide an input that runs a function based on the input.
while True:
    filter = input("Choose filter. Or [Q]uit: ")
    if filter.upper() == 'S':
        S(subjects)
        break
    elif filter.upper() == 'L':
        L(lecturer)
        break
Now if they choose L it runs this...
def L(lecturer):
    print ("""
Lecturers
---------
""")
    print (''.join(['[' + str(ind + 1) + '] ' + x for ind, x in enumerate(lecturer)]))
    pick_lecturer = input("\npick a lecturer: ")
Which outputs like:
[1] Joe Blo
[2] Cat Blue
[3] Kim Toll
Here's where I'm stuck. I want to make it so that if the last input is '1', it reads the file for each line containing Joe Blo and prints the entire line, without any external modules or libraries.
Any guidance is appreciated. Thanks.

You can use the csv module to read the file into a list. In this example, I read each row from the file into a namedtuple:
import csv
from collections import namedtuple
Item = namedtuple(
    "Item", "subject type day time_from time_to some_column1 some_column2 name"
)
def L(data):
    all_lecturers = list(set(d.name for d in data))
    for i, name in enumerate(all_lecturers, 1):
        print("[{}] {}".format(i, name))
    while True:
        try:
            pick_lecturer = int(input("\npick a lecturer: "))
            lecturer_name = all_lecturers[pick_lecturer - 1]
            break
        except (ValueError, IndexError):
            # not a number, or out of range: ask again
            continue
    for d in [d for d in data if d.name == lecturer_name]:
        print(d)
# 1. read the file
data = []
with open("your_file.txt", "r") as f_in:
    reader = csv.reader(f_in)
    for row in reader:
        data.append(Item(*row))
# 2. choose a filter
while True:
    filter_ = input("Choose filter. Or [Q]uit: ")
    if filter_.upper() == "Q":
        break
    # if filter_.upper() == "S":
    #     S(data)
    #     break
    elif filter_.upper() == "L":
        L(data)
        break
Prints:
Choose filter. Or [Q]uit: L
[1] Cat Blue
[2] Kim Toll
[3] Joe Blo
pick a lecturer: 3
Item(subject='ADF101', type='Lecture', day='Monday', time_from='08:00', time_to='10:00', some_column1='Jenolan', some_column2='J10112', name='Joe Blo')
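If you would rather not import anything, as the question asks, the same filtering can be done with plain string methods. A minimal sketch, assuming the 8-column layout from the sample file; lines_for_lecturer is a made-up name, and the list stands in for the lines read from the file.

```python
def lines_for_lecturer(lines, lecturer_name):
    """Return every line whose last comma-separated field equals lecturer_name."""
    matches = []
    for line in lines:
        fields = line.strip().split(",")
        # Only accept well-formed rows: 8 fields, lecturer in the last one.
        if len(fields) == 8 and fields[7] == lecturer_name:
            matches.append(line.strip())
    return matches


# The three sample rows from the question.
timetable = [
    "ADF101,Lecture,Monday,08:00,10:00,Jenolan,J10112,Joe Blo",
    "ADF101,Tutorial,Thursday,10:00,11:00,Jenolan,J10115,Cat Blue",
    "ALM204,Lecture,Monday,09:00,11:00,Tarana,T05201,Kim Toll",
]
for line in lines_for_lecturer(timetable, "Joe Blo"):
    print(line)
```

The menu choice maps back to a name with lecturer[int(pick_lecturer) - 1], since the menu is numbered from 1.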

How to check if a string contains a specific character or not in python

I am new to Python, but fairly experienced in programming. While learning Python I was trying to create a simple function that would read words in from a text file (each line in the text file is a new word) and then check whether each word has the letter 'e' or not. The program should then count the number of words that don't have the letter 'e' and use that count to calculate the percentage of words in the text file that don't have an 'e'.
I am running into a problem where I'm very certain that my code is right, but after testing the output it is wrong. Please help!
Here is the code:
def has_n_e(w):
    hasE = False
    for c in w:
        if c == 'e':
            hasE = True
    return hasE
f = open("crossword.txt","r")
count = 0
for x in f:
    word = f.readline()
    res = has_n_e(word)
    if res == False:
        count = count + 1
iAns = (count/113809)*100 # 113809 is the number of words in the text file
print (count)
rAns = round(iAns,2)
sAns = str(rAns)
fAns = sAns + "%"
print(fAns)

Here is the code after some changes that may help. In the original, the loop for x in f already consumes a line on each pass, so calling f.readline() inside it reads the following line and x is never checked.
def has_n_e(w):
    hasE = False
    for c in w:
        if c == 'e':
            hasE = True
    return hasE
with open("crossword.txt", "r") as f:
    words = f.readlines()
count = 0
for x in words:
    word = x.strip()  # drop the trailing newline
    res = has_n_e(word)  # you can use ('e' in word) instead of the function
    if res == False:
        count = count + 1
iAns = (count / len(words)) * 100  # len(words) is the number of words in the text file
print(count)
rAns = round(iAns, 2)
sAns = str(rAns)
fAns = sAns + "%"
print(fAns)
Hope this helps.
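For comparison, the whole calculation can be written more compactly with the in operator. This is just a sketch; percent_without_e and the four sample words are made up, and in practice the list would come from the lines of crossword.txt with the newlines stripped.

```python
def percent_without_e(words):
    """Return the percentage of words that contain no letter 'e'."""
    if not words:
        return 0.0  # avoid dividing by zero on an empty list
    without_e = sum(1 for word in words if "e" not in word)
    return without_e / len(words) * 100


# Invented sample: two of the four words have no 'e'.
sample = ["apple", "sky", "tree", "dog"]
print(f"{percent_without_e(sample):.2f}%")
```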

Simple way to remove duplicate whitespaces and remove all \n efficiently

I have a file called test.txt. It contains HTML with a lot of duplicate spaces. I want to remove all the unnecessary whitespace to reduce the size of the file's contents. How can I remove the duplicate spaces and put the entire string on one line?
test.txt
<center>
<b class="test" >My name
is
fred</ b> <center>
What I want to print
<center><b class="test">My name is fred</b><center>
What gets printed
<center><b class="test" >Mynameisfred</b> <center>
program.py
def is_white_space(before, curr, after):
    # remove duplicate spaces
    if (curr == " " and (before == " " or after == " ")):
        return True
    # Remove all \n
    elif (curr == "\n"):
        return True
    return False
f = open('test.txt', 'r')
contents = f.read()
f.close()
new = "";
i = 0
while (i < len(contents)):
    if (i != 0 and
        i != (len(contents) - 1) and
        not is_white_space(contents[i - 1], contents[i], contents[i + 1])):
        new += contents[i]
    i += 1
print(new)

This will leave a space between digits or letters.
from string import ascii_letters, digits
def main():
    with open('test.txt', 'r') as f:
        parts = f.read().split()
    keep_separated = set(ascii_letters) | set(digits)
    for i in range(len(parts) - 1):
        if parts[i][-1] in keep_separated and parts[i + 1][0] in keep_separated:
            parts[i] = parts[i] + " "
    print(''.join(parts))
if __name__ == '__main__':
    main()
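Another way to get the exact output asked for is a few regular-expression substitutions. This is a sketch under a big assumption: that whitespace next to angle brackets never matters, which is not true of HTML in general (it would also glue "a <b>b</b> c" together). squeeze_html is a made-up name, and the sample string only approximates test.txt.

```python
import re


def squeeze_html(text):
    # Collapse every run of whitespace (spaces, tabs, newlines) to one space.
    text = re.sub(r"\s+", " ", text)
    # Drop a single space directly before or after an angle bracket.
    text = re.sub(r" ?([<>]) ?", r"\1", text)
    # Drop the space inside a closing tag such as "</ b>".
    text = re.sub(r"</ ", "</", text)
    return text.strip()


# Approximation of the file contents from the question.
sample = '<center>\n<b class="test" >My name\n   is\n fred</ b>  <center>'
print(squeeze_html(sample))
```

For real pages, a proper HTML parser or minifier is the safer choice.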

Check if string is exactly the same as line in file

I've been writing a Countdown program in Python, and in it I've written this:
#Letters Game
global vowels, consonants
from random import choice, uniform
from time import sleep
from itertools import permutations
startLetter = ""
words = []
def check(word, startLetter):
    fileName = startLetter + ".txt"
    datafile = open(fileName)
    for line in datafile:
        print("Checking if", word, "is", line.lower())
        if word == line.lower():
            return True
    return False
def generateLetters():
    lettersLeft = 9
    output = []
    while lettersLeft >= 1:
        lType = input("Vowel or consonant? (v/c)")
        sleep(uniform(0.5, 1.5))
        if lType not in ("v", "c"):
            print("Please input v or c")
            continue
        elif lType == "v":
            letter = choice(vowels)
            print("Rachel has picked an", letter)
            vowels.remove(letter)
            output.append(letter)
        elif lType == "c":
            letter = choice(consonants)
            print("Rachel has picked a", letter)
            consonants.remove(letter)
            output.append(letter)
        print("Letters so far:", output)
        lettersLeft -= 1
    return output
def possibleWords(letters, words):
    for i in range(1,9):
        print(letters)
        print(i)
        for item in permutations(letters, i):
            item = "".join(list(item))
            startLetter = list(item)[0]
            if check(item, startLetter):
                print("\n\n***Got one***\n", item)
                words.append(item)
    return words
vowels = ["a"]*15 + ["e"]*21 + ["i"]*13 + ["o"]*13+ ["u"]*5
consonants = ["b"]*2 + ["c"]*3 + ["d"]*6 + ["f"]*2 + ["g"]*3 +["h"]*2 +["j"]*1 +["k"]*1 +["l"]*5 +["m"]*4 +["n"]*8 +["p"]*4 +["q"]*1 +["r"]*9 +["s"]*9 +["t"]*9 + ["v"]*1 +["w"]*1 +["x"]*1 +["y"]*1 +["z"]*1
print("***Let's play a letters game!***")
sleep(3)
letters = generateLetters()
sleep(uniform(1, 1.5))
print("\n\n***Let's play countdown***\n\n\n\n\n")
print(letters)
for count in reversed(range(1, 31)):
    print(count)
    sleep(1)
print("\n\nStop!")
print("All possible words:")
print(possibleWords(letters, words))
'''
#Code for sorting the dictionary into files
alphabet = "abcdefghijklmnopqrstuvwxyz"
alphabet = list(alphabet)
for letter in alphabet:
    allFile = open("Dictionary.txt", "r+")
    filename = letter + ".txt"
    letterFile = open(filename, "w")
    for line in allFile:
        if len(list(line.lower())) <= 9:
            if list(line.lower())[0] == letter:
                print("Writing:", line.lower())
                letterFile.write(line.lower())
    allFile.close()
    letterFile.close()
'''
I have 26 text files called a.txt, b.txt, c.txt... to make the search quicker
(Sorry it's not very neat - I haven't finished it yet)
However, instead of returning what I expect (pan), it returns all words with pan in it (pan, pancake, pans, pandemic...)
Is there any way in Python you can only return the line if it's EXACTLY the same as the string? Do I have to .read() the file first?
Thanks

Your post is oddly written, so excuse me if I misunderstand.
Is there any way in Python you can only return the line if it's EXACTLY the same as the string? Do I have to .read() the file first?
Yes, there is!!!
file = open("file.txt")
content = file.read() # which is a str
lines = content.split('\n') # which is a list (containing every lines)
test_string = " pan "
positive_match = [l for l in lines if test_string in l]
This is a bit hacky: we avoid matching pancake for pan (for instance) by padding with spaces, but then what about cases like ".....,pan "? You should have a look at tokenization functions. As Pythonistas, we have one of the best libraries for this: nltk
(because, basically, you are reinventing the wheel)
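If each line of the file holds exactly one word, as in the per-letter files from the question, a plain equality test after stripping the newline may be all that is needed. A small sketch; word_in_lines is a hypothetical helper, and the list stands in for the lines you get by iterating over the open file.

```python
def word_in_lines(word, lines):
    """True if some line equals word once surrounding whitespace is removed."""
    target = word.strip().lower()
    # strip() removes the trailing "\n" that every line read from a file carries.
    return any(line.strip().lower() == target for line in lines)


# Stand-in for the contents of p.txt.
lines = ["pan\n", "pancake\n", "pans\n", "Pandemic\n"]
print(word_in_lines("pan", lines))
print(word_in_lines("panc", lines))
```

There is no need to .read() the whole file first; iterating over the file object line by line works the same way.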

compare an exact word with the txt file

I am trying to get exact word matches from my file along with their line numbers.
For example, when I search for abc10 it gives me every partial match as well, e.g. abc102, abc103, etc.
How can I limit my code to print only what I asked for?
Here is my code:
lineNo = 0
linesFound = []
inFile= open('rxmop.txt', 'r')
sKeyword = input("enter word ")
done = False
while not done :
    pos = inFile.tell()
    sLine = inFile.readline()
    if sLine == "" :
        done = True
        break
    if (sLine.find( sKeyword ) != -1):
        print ("Found at line: "+str(lineNo))
        tTuple = lineNo, pos
        linesFound.append( tTuple )
    lineNo = lineNo + 1
done = False
while not done :
    command = int( input("Enter the line you want to view: ") )
    if command == -1 :
        done = True
        break
    for tT in linesFound :
        if command == tT[0] :
            inFile.seek( tT[1] )
            lLine = inFile.readline()
            print ("The line at position " + str(tT[1]) + "is: " + lLine)

"like when i search for abc10 it gives me all the possible answers e.g abc102 abc103 etc"
You split each record and compare whole "words" only.
to_find = "RXOTG-10"
list_of_possibles = ["RXOTG-10 QTA5777 HYB SY G12",
                     "RXOTG-100 QTA9278 HYB SY G12"]
for rec in list_of_possibles:
    words_list = rec.strip().split()
    if to_find in words_list:
        print("found", rec)
    else:
        print(" NOT found", rec)
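Since the question also wants the line numbers, the same whole-word test can be wrapped in a small function. A sketch only: find_exact is a made-up name, line numbers are 0-based as in the question's code, and the third record is invented to show a match away from the start of a line.

```python
def find_exact(keyword, lines):
    """Return the 0-based numbers of the lines that contain keyword as a whole word."""
    return [number for number, line in enumerate(lines) if keyword in line.split()]


records = [
    "RXOTG-10 QTA5777 HYB SY G12",
    "RXOTG-100 QTA9278 HYB SY G12",
    "spare RXOTG-10",  # hypothetical extra record
]
print(find_exact("RXOTG-10", records))
```

With a real file, pass the open file object (or its readlines() result) as lines.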
