Python gender-guesser not able to evaluate gender names

I have a text file with a list of names:
Aaron
Abren
Adrian
Albert
When I run the following code:
import gender_guesser.detector as gender
d = gender.Detector()
file1 = open('names.txt','r')
count = 0
while True:
    count += 1
    line = file1.readline()
    guess = d.get_gender(line)
    print(line)
    print(guess)
    if not line:
        break
print(count)
I get the following:
Aaron
unknown
Abren
unknown
Adrian
unknown
Albert
male
unknown
5
It looks like it is only able to evaluate the last name in the file (Albert), and I think it has to do with how it parses through the file. If I add a line break after Albert, it no longer detects Albert as male.
Any thoughts?

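It looks like you have an issue with the line terminators. readline() keeps the trailing newline, so the detector receives 'Aaron\n' instead of 'Aaron' and doesn't recognize it. Only the last line, which has no trailing newline, matches (which is also why Albert stops matching once you add a line break after it).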
Here's a working code snippet:
import gender_guesser.detector as gender
d = gender.Detector()
with open('names.txt') as fin:
    for line in fin:
        name = line.strip()  # remove the trailing newline
        print(d.get_gender(name))
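The main fix is adding line.strip(), which removes the trailing newline (and any surrounding whitespace).
Using with is a best practice you should follow (it closes the file automatically), but it doesn't change the functionality.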
The output is:
male
unknown
male
male
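To see the newline problem without installing gender-guesser, here is a small stdlib-only sketch. It uses io.StringIO in place of names.txt, and the known dictionary is a hypothetical stand-in for the detector's name data (it is not gender_guesser's real lookup). An exact-match lookup fails for every line that still ends in '\n', which reproduces the pattern in the question.

```python
import io

# In-memory stand-in for names.txt; the last line has no trailing newline.
fake_file = io.StringIO("Aaron\nAbren\nAdrian\nAlbert")
raw_lines = fake_file.readlines()

# Hypothetical lookup table standing in for the detector's data.
known = {"Aaron": "male", "Adrian": "male", "Albert": "male"}

# Looking up the raw lines: only 'Albert' (no trailing newline) matches.
unstripped = [known.get(line, "unknown") for line in raw_lines]
# Looking up the stripped lines: every known name matches.
stripped = [known.get(line.strip(), "unknown") for line in raw_lines]

print(raw_lines)   # ['Aaron\n', 'Abren\n', 'Adrian\n', 'Albert']
print(unstripped)  # ['unknown', 'unknown', 'unknown', 'male']
print(stripped)    # ['male', 'unknown', 'male', 'male']
```

The unstripped results match the question's output and the stripped results match the output of the fixed snippet.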

Related

Python Help: How to take input file values and pass their values in the script

I have a requirement to create a file with multiple lines and pass the values from each line as variables into a Python script. The script should also create an output file that displays the results.
For example:
input.text : /opt/app_name/file.txt
Andy City State India
Ram City State India
Sandy City State India
Leo City State India
output.text
Andy : success
Ram : Fail
Sandy : success
Leo : Fail
When the script is executed, it should first ask for the file name:
Enter the file name: /opt/app_name/file.txt

I am unsure what you are asking for, but here is an attempt:
filename = input('Enter the file name: ')
with open(filename) as f:
    lines = f.readlines()  # Then you don't have to split over '\n'.
# split() defaults to whitespace, but you can use ',' instead if you need to.
lines = [line.strip().split() for line in lines if line.strip()]
processed = []
for name, city, state, country in lines:
    success = "don't know how you are determining success"
    processed.append((name, success))  # append() takes a single (name, success) tuple
with open('output_file.txt', 'w') as out:
    out.write('\n'.join(' : '.join(item) for item in processed))
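Since the question never says how success is decided, one way to keep that open is to pass the rule in as a function. The sketch below assumes whitespace-separated lines of exactly four fields (name, city, state, country) and uses a hypothetical is_success callback; the example rule, which simply lists the names that succeed, exists only to reproduce the sample output.

```python
def build_report(lines, is_success):
    """Turn 'Name City State Country' lines into 'Name : success' or 'Name : Fail'.

    is_success is a hypothetical callback: the question never says how
    success is decided, so the caller has to supply that rule.
    """
    report = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name, city, state, country = fields  # assumes exactly four fields
        status = "success" if is_success(name, city, state, country) else "Fail"
        report.append(f"{name} : {status}")
    return report


sample = [
    "Andy City State India",
    "Ram City State India",
    "Sandy City State India",
    "Leo City State India",
]

# Hypothetical rule, chosen only to reproduce the sample output.
winners = {"Andy", "Sandy"}
for row in build_report(sample, lambda name, *rest: name in winners):
    print(row)
```

In the real script you could read the file name with input(), open the file, and pass the file object straight to build_report (iterating over a file yields its lines), then write "\n".join(report) to the output file.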

Edit last element of each line of text file

I've got a text file with lines like these:
0,The Hitchhiker's Guide to Python,Kenneth Reitz,2/4/2012,0
1,Harry Potter,JK Rowling,1/1/2010,8137
2,The Great Gatsby,F. Scott Fitzgerald,1/2/2010,0
3,To Kill a Mockingbird,Harper Lee,1/3/2010,1828
The last element of each line determines which user has taken out the given book. If it is 0, nobody has it.
I want code that replaces the ',0' at the end of any given line with a 4-digit number entered by the user, to show that someone has taken out the book.
I've used .replace() to change, e.g., 1828 into 0; however, I don't know how to change the last element of a specific line from 0 to something else.
I cannot use csv due to work/education restrictions, so I have to leave the file in .txt format.
I can also only use the Python standard library, so no pandas.

You can read the text file into a DataFrame with pandas' read_csv:
import pandas as pd
df = pd.read_csv("text_file.txt", header=None)
df.columns = ["Serial", "Book", "Author", "Date", "Issued"]
print(df)
df.loc[3, "Issued"] = 0
print(df)
df.to_csv('text_file.txt', header=None, index=None, sep=',', mode='w+')
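This sets the Issued value of the row labelled 3 (the fourth book, To Kill a Mockingbird) to 0. The two print(df) calls show the DataFrame before and after the change: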
   Serial                              Book               Author      Date  Issued
0       0  The Hitchhiker's Guide to Python        Kenneth Reitz  2/4/2012       0
1       1                      Harry Potter           JK Rowling  1/1/2010    8137
2       2                  The Great Gatsby  F. Scott Fitzgerald  1/2/2010       0
3       3             To Kill a Mockingbird           Harper Lee  1/3/2010    1828

   Serial                              Book               Author      Date  Issued
0       0  The Hitchhiker's Guide to Python        Kenneth Reitz  2/4/2012       0
1       1                      Harry Potter           JK Rowling  1/1/2010    8137
2       2                  The Great Gatsby  F. Scott Fitzgerald  1/2/2010       0
3       3             To Kill a Mockingbird           Harper Lee  1/3/2010       0
Edit after comment:
If you can only use the Python standard library, you can read the file, edit the line you need, and write everything back:
a = 5  # line to change, with 1 being the first line in the file
b = '8371'  # new value for the last field
to_write = []
with open("text_file.txt", "r") as file:
    for i, line in enumerate(file, start=1):
        if i == a:
            print('line before')
            print(line)
            # keep everything up to the last comma, then append the new value
            line = line[:line.rfind(',')] + ',' + b + '\n'
            print('line after edit')
            print(line)
        to_write.append(line)
print(to_write)
with open("text_file.txt", "w") as f:
    f.writelines(to_write)
With this file content:
0,The Hitchhiker's Guide to Python,Kenneth Reitz,2/4/2012,0
1,Harry Potter,JK Rowling,1/1/2010,8137
2,The Great Gatsby,F. Scott Fitzgerald,1/2/2010,84
3,To Kill a Mockingbird,Harper Lee,1/3/2010,7895
4,XYZ,Harper,1/3/2018,258
5,PQR,Lee,1/3/2019,16
the script gives this output:
line before
4,XYZ,Harper,1/3/2018,258
line after edit
4,XYZ,Harper,1/3/2018,8371
["0,The Hitchhiker's Guide to Python,Kenneth Reitz,2/4/2012,0\n", '1,Harry Potter,JK Rowling,1/1/2010,8137\n', '2,The Great Gatsby,F. Scott Fitzgerald,1/2/2010,84\n', '3,To Kill a Mockingbird,Harper Lee,1/3/2010,7895\n', '4,XYZ,Harper,1/3/2018,8371\n', '5,PQR,Lee,1/3/2019,16\n', '\n']

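You can try this (it replaces the last field of every line whose last field is 0):
to_write = []
with open('read.txt') as f:  # read input file
    for line in f:
        # split off the last comma-separated field instead of replacing every '0' in the line
        head, _, last = line.rstrip('\n').rpartition(',')
        if last == '0':
            line = head + ',1234\n'  # any 4-digit number
        to_write.append(line)
with open('output.txt', 'w') as f:  # name of output file
    f.writelines(to_write)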
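The question asks to change the borrower of one specific book rather than every line that ends in 0. Here is a minimal standard-library sketch, assuming the first comma-separated field is the book's serial number and that a borrower ID must be exactly 4 digits (or 0 when the book is returned); set_borrower is a hypothetical helper name.

```python
def set_borrower(lines, book_id, borrower):
    """Return a copy of lines with the last field of one book replaced.

    book_id is compared with the first comma-separated field (the serial
    number). borrower must be a 4-digit string, or "0" for "not borrowed".
    """
    if borrower != "0" and not (len(borrower) == 4 and borrower.isdigit()):
        raise ValueError("borrower must be a 4-digit number or '0'")
    updated = []
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if fields[0] == str(book_id):
            fields[-1] = borrower  # only the last field changes
            line = ",".join(fields) + "\n"
        updated.append(line)
    return updated


sample = [
    "0,The Hitchhiker's Guide to Python,Kenneth Reitz,2/4/2012,0\n",
    "1,Harry Potter,JK Rowling,1/1/2010,8137\n",
    "2,The Great Gatsby,F. Scott Fitzgerald,1/2/2010,0\n",
]
for line in set_borrower(sample, 2, "4321"):
    print(line, end="")
```

To apply it to the real file, you could read the lines with readlines(), call set_borrower, and write the result back with writelines(), the same way the standard-library answer above rewrites text_file.txt.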

Text files help(Python 3)

Students.txt
64 Mary Ryan
89 Michael Murphy
22 Pepe
78 Jenny Smith
57 Patrick James McMahon
89 John Kelly
22 Pepe
74 John C. Reilly
My code
f = open("students.txt","r")
for line in f:
    words = line.strip().split()
    mark = (words[0])
    name = " ".join(words[1:])
    for i in (mark):
        print(i)
The output I'm getting is:
6
4
8
9
2
2
7
8
etc...
My expected output is
64
89
22
78
etc..
I'm curious how I would print the whole integer, not just a single digit at a time.
Any help would be greatly appreciated.

As I can see, each line of your text file is an integer followed by a name. To print only the full integer, you can use this code:
with open("Students.txt", "r") as f:
    for line in f:
        l = line.split(" ")
        print(l[0])

In Python, when you do this:
for i in (mark):
print(i)
and mark is of type string, you are asking Python to iterate over each character in the string. So if your string contains an integer such as "64" and you iterate over it, you'll get one digit at a time.
I believe in your code the line
mark = (words[0])name = " ".join(words[1:])
is a typo. If you fix that, we can help you with what's missing (it's most likely a statement like mark = something.split(), but I'm not sure what something is based on the code).

You should be using context managers when you open files so that they are automatically closed for you when the with block ends. Also, mark should be a list to which you append the first element of each split line. All together it will look like this:
with open("students.txt","r") as f:
    mark = []
    for line in f:
        mark.append(line.strip().split()[0])
for i in mark:
    print(i)

The line
for i in (mark):
is the same as this, because parentheses alone don't create a tuple and mark is a string:
for i in mark:
I believe you want mark to be the single element of an iterable. You can create a one-item tuple with a trailing comma:
for i in (mark,):
and this should give what you want.
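A quick way to check the difference between (mark) and (mark,), using the first mark from the file as a sample value:

```python
mark = "64"

# Parentheses alone do not make a tuple: (mark) is just the string "64",
# so the loop visits each character.
chars = [i for i in (mark)]

# A trailing comma makes a one-element tuple, so the loop runs once.
whole = [i for i in (mark,)]

print(chars)  # ['6', '4']
print(whole)  # ['64']
```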

In your line:
line.strip().split()
you're not telling the string to split based on a space. Try the following:
str(line).strip().split(" ")

A quick one with list comprehensions:
with open("students.txt","r") as f:
    mark = [line.strip().split()[0] for line in f]
for i in mark:
    print(i)
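If you also want the marks as numbers (for sorting or averaging, say), here is a small sketch that converts the first field with int() and keeps the rest of the line as the name. It assumes every non-blank line starts with a whole-number mark; read_marks is a hypothetical helper name, and the sample list stands in for the open file.

```python
def read_marks(lines):
    """Return (mark, name) pairs from lines such as '64 Mary Ryan'."""
    records = []
    for line in lines:
        words = line.split()
        if not words:
            continue  # ignore blank lines
        mark = int(words[0])        # the whole number, not one digit
        name = " ".join(words[1:])  # everything after the mark
        records.append((mark, name))
    return records


sample = ["64 Mary Ryan\n", "89 Michael Murphy\n", "22 Pepe\n", "78 Jenny Smith\n"]
for mark, name in read_marks(sample):
    print(mark)
```

With the real file, read_marks(f) inside a with open("students.txt") as f: block works the same way, since iterating over a file yields its lines.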

Python File Reading & Writing

I need to write a program that reads a text file and copies its contents to another file. I then have to add a column at the end of each line in the new file and populate that column with a number calculated by the function calc_bill. I can get it to copy the contents of the original file to the new one, but I cannot seem to get my program to read in the integers that calc_bill needs.
Any help would be greatly appreciated.
Here are the first 3 lines of the text file I am reading from:
CustomerID Title FirstName MiddleName LastName Customer Type
1 Mr. Orlando N. Gee Residential 297780 302555
2 Mr. Keith NULL Harris Residential 274964 278126
The program copies the file to the new file exactly as it is supposed to. What is not working is writing bill_amount (in calc_bill) / billVal (in main) to the new file as a new column. Here is the expected output in the new file:
CustomerID Title FirstName MiddleName LastName Customer Type Company Name Start Reading End Reading BillVal
1 Mr. Orlando N. Gee Residential 297780 302555 some number
2 Mr. Keith NULL Harris Residential 274964 278126 some number
And here is my code:
def main():
    file_in = open("water_supplies.txt", "r")
    file_in.readline()
    file_out = input("Please enter a file name for the output:")
    output_file = open(file_out, 'w')
    lines = file_in.readlines()
    for line in lines:
        lines = [line.split('\t')]
        #output_file.write(str(lines)+ "\n")
        billVal = 0
        c_type = line[5]
        start = int(line[7])
        end = int(line[8])
        billVal = calc_bill(c_type, start, end)
        output_file.write(str(lines)+ "\t" + str(billVal) + "\n")

def calc_bill(customer_type, start_reading, end_reading):
    price_per_gallon = 0
    if customer_type == "Residential":
        price_per_gallon = .012
    elif customer_type == "Commercial":
        price_per_gallon = .011
    elif customer_type == "Industrial":
        price_per_gallon = .01
    if start_reading >= end_reading:
        print("Error: please try again")
    else:
        reading = end_reading - start_reading
        bill_amount = reading * price_per_gallon
    return bill_amount

main()

Besides the issues mentioned in the other answers, here is a small change to your main() function that works correctly:
def main():
    file_in = open("water_supplies.txt", "r")
    # skip the headers in the input file, and save them for the output
    headers = file_in.readline()
    # Python 2: raw_input() doesn't require quotes (in Python 3, input() already behaves this way)
    file_out = raw_input("Please enter a file name for the output: ")
    output_file = open(file_out, 'w')
    # write the headers back into the output file
    output_file.write(headers)
    lines = file_in.readlines()
    for line in lines:
        # renamed the variable here to split
        split = line.split('\t')
        bill_val = 0
        c_type = split[5]
        start = int(split[6])
        end = int(split[7])
        bill_val = calc_bill(c_type, start, end)
        # line is already a string, so there's no need to cast it
        # added rstrip() to remove the trailing newline
        output_file.write(line.rstrip() + "\t" + str(bill_val) + "\n")
    file_in.close()
    output_file.close()
Note that the line variable in your loop includes the trailing newline, so you need to strip it off if you're going to write the line to the output file as-is. Your start and end indices were also off by 1, so I changed them to split[6] and split[7].
It is a good idea not to require the user to type quotes around the file name. In Python 2, an easy way is to use raw_input instead of input; in Python 3, input() already returns the text exactly as typed, so no change is needed there.
Sample input file (from OP):
CustomerID Title FirstName MiddleName LastName Customer Type
1 Mr. Orlando N. Gee Residential 297780 302555
2 Mr. Keith NULL Harris Residential 274964 278126
$ python test.py
Please enter a file name for the output:test.out
Output (test.out):
1 Mr. Orlando N. Gee Residential 297780 302555 57.3
2 Mr. Keith NULL Harris Residential 274964 278126 37.944

There are a couple of things. The inconsistent spacing in your column names makes counting the actual columns a bit confusing, but I believe there are 9 column names there. However, each of your data rows has only 8 elements, so it looks like you've got an extra column name (maybe "CompanyName"). So get rid of that, or fix the data.
Then your "start" and "end" variables are pointing to indexes 7 and 8, respectively. However, since there are only 8 elements in the row, I think the indexes should be 6 and 7.
Another problem could be that inside your for-loop through "lines", you set "lines" to the elements in that line. I would suggest renaming the second "lines" variable inside the for-loop to something else, like "elements".
Aside from that, I'd just caution you about naming consistency. Some of your column names are camel-case and others have spaces. Some of your variables are separated by underscores and others are camel-case.
Hopefully that helps. Let me know if you have any other questions.
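To check which index each column lands on, you could enumerate the split fields of one row. This sketch assumes the columns are tab-separated, as the question's split('\t') implies; with 8 fields the valid indexes run from 0 to 7, so the readings sit at 6 and 7.

```python
# One data row from the question, assuming tab-separated columns.
row = "1\tMr.\tOrlando\tN.\tGee\tResidential\t297780\t302555"
fields = row.split("\t")

for index, value in enumerate(fields):
    print(index, value)

c_type = fields[5]       # 'Residential'
start = int(fields[6])   # 297780
end = int(fields[7])     # 302555
# fields[8] would raise IndexError: there is no ninth field.
```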

You have two errors in handling your variables, both in the same line:
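lines = [line.split()]
First, you assign this to your lines variable, which held the entire file contents. The loop keeps iterating over the original list, but the name lines no longer refers to your input data.
Second, you wrap the return value of split() in a new list, creating a list of lists, so the later line[5], line[7] and line[8] still index characters of the unsplit string.
Try this line:
line = line.split()
I got reasonable output with that change, once I made a couple of assumptions about your placement of tabs.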
Also, consider not overwriting a variable with a different data semantic; it confuses the usage. For instance:
for record in lines:
    line = record.split()
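Putting the fixes from these answers together, here is a Python 3 sketch that keeps the split fields in their own variable, uses indexes 5, 6 and 7, and makes calc_bill raise an error instead of printing one and then returning an undefined value. It assumes tab-separated input with one header line; add_bill_column is a hypothetical helper name, and formatting the bill to two decimal places is a choice of this sketch, not part of the original code.

```python
PRICE_PER_GALLON = {
    "Residential": 0.012,
    "Commercial": 0.011,
    "Industrial": 0.01,
}


def calc_bill(customer_type, start_reading, end_reading):
    """Bill for one customer: gallons used times the rate for their type."""
    if start_reading >= end_reading:
        # Raise instead of printing, so the caller never gets an undefined value.
        raise ValueError("end reading must be greater than start reading")
    rate = PRICE_PER_GALLON.get(customer_type, 0)  # unknown types bill at 0, as in the original
    return (end_reading - start_reading) * rate


def add_bill_column(lines):
    """Yield each input line with a tab-separated bill value appended.

    Assumes the first line is a header and that every data row has at
    least 8 tab-separated fields, with the readings at indexes 6 and 7.
    """
    lines = iter(lines)
    header = next(lines, None)
    if header is None:
        return  # empty input
    yield header.rstrip("\n") + "\tBillVal\n"
    for record in lines:
        if not record.strip():
            continue  # skip blank lines
        fields = record.rstrip("\n").split("\t")  # keep the split fields separate from record
        bill = calc_bill(fields[5], int(fields[6]), int(fields[7]))
        yield record.rstrip("\n") + "\t" + f"{bill:.2f}" + "\n"


sample = [
    "CustomerID\tTitle\tFirstName\tMiddleName\tLastName\tCustomer Type\n",
    "1\tMr.\tOrlando\tN.\tGee\tResidential\t297780\t302555\n",
    "2\tMr.\tKeith\tNULL\tHarris\tResidential\t274964\t278126\n",
]
for out_line in add_bill_column(sample):
    print(out_line, end="")
```

With real files, passing the open input file to add_bill_column and handing the result to the output file's writelines() should do the whole copy in one go.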

Return the average mark for all student in that Section

I know this has been asked already, but the answers were super unclear.
The first requirement is to open a file (sadly I have no idea how to do that)
The second requirement is a section of code that does the following:
Each line represents a single student and consists of a student number, a name, a section code and a midterm grade, all separated by whitespace
So I don't think I can target that element, since the fields are only separated by whitespace?
Here is an excerpt of the file, showing the line structure:
987654322 Xu Carolyn L0101 19.5
233432555 Jones Billy Andrew L5101 16.0
555432345 Patel Amrit L0101 13.5
888332441 Fletcher Bobby L0201 18
777998713 Van Ryan Sarah Jane L5101 20
877633234 Zhang Peter L0102 9.5
543444555 Martin Joseph L0101 15
876543222 Abdolhosseini Mohammad Mazen L0102 18.5
I was provided the following hints:
Notice that the number of names per student varies.
Use rstrip() to get rid of extraneous whitespace at the end of the lines.
I don't understand the second hint.
This is what I have so far:
counter = 0
elements = -1
for sets in the_file
    elements = elements + 1
    if elements = 3
I know it has something to do with readlines() and targeting the section code.

marks = [float(line.strip().split()[-1]) for line in open('path/to/input/file')]
average = sum(marks)/len(marks)
Hope this helps

(See the Python documentation on reading and writing files and on the strip method.)
Something like this?
data = {}
filename = 'grades.txt'  # hypothetical path to your input file
with open(filename) as f:  # open the file
    for line in f:  # go through the file line by line
        # split on whitespace; split() with no arguments already drops empty strings
        stData = line.split()
        if not stData:
            continue  # skip blank lines
        # the number of names varies, so take the section and grade from the end
        studentN, sectionCode, midtermGrade = stData[0], stData[-2], stData[-1]
        studentName = ' '.join(stData[1:-2])
        # build a dict: key is the section code, value is a list of student records
        data.setdefault(sectionCode, []).append([studentN, studentName, float(midtermGrade)])
# make calculations
for k, v in data.items():  # items() returns (key, value) pairs
    print('Section: ' + k + ' Average: ' + str(sum(x[2] for x in v) / len(v)))

more or less:
with open('grade_file.txt', 'r') as infile:
    score = 0
    n = 0
    for line in infile:
        score += float(line.rstrip().split()[-1])
        n += 1
avg = score / n
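Since the title asks for the average of one section, here is a sketch that filters on the section code. It relies on the hint that the number of names varies: the section code is always the second-to-last field and the grade is the last. average_for_section is a hypothetical helper name, and the sample list (copied from the excerpt) stands in for the open file.

```python
def average_for_section(lines, section):
    """Average midterm grade of the students in one section.

    Each line holds a student number, one or more names, a section code
    and a grade, separated by whitespace. Because the number of names
    varies, the section code and grade are read from the end of the line.
    """
    grades = []
    for line in lines:
        # rstrip() follows the hint; split() on its own would also ignore
        # the trailing newline.
        fields = line.rstrip().split()
        if not fields:
            continue  # skip blank lines
        if fields[-2] == section:
            grades.append(float(fields[-1]))
    if not grades:
        raise ValueError(f"no students found in section {section}")
    return sum(grades) / len(grades)


sample = [
    "987654322 Xu Carolyn L0101 19.5\n",
    "233432555 Jones Billy Andrew L5101 16.0\n",
    "555432345 Patel Amrit L0101 13.5\n",
    "888332441 Fletcher Bobby L0201 18\n",
    "777998713 Van Ryan Sarah Jane L5101 20\n",
    "877633234 Zhang Peter L0102 9.5\n",
    "543444555 Martin Joseph L0101 15\n",
    "876543222 Abdolhosseini Mohammad Mazen L0102 18.5\n",
]
print(average_for_section(sample, "L0101"))  # 16.0
```

With a real file you could call it as average_for_section(f, "L0101") inside a with open(...) as f: block.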
