Parsing through a file from database and adding info to dictionary

Parsing through a file from database and adding info to dictionary - python

I have text file as follows:
HEADER INFO
Last1, First1 Movie1 (1991) random stuff
Movie2 (1992) random stuff
Movie3 (1995) random stuff
Movie4 (3455) random stuff
Last2, First2 Movie1 (1998) random stuff
Movie2 (4568) random stuff
Movie3 (2466) random stuff
Movie4 (4325) random stuff
Movie5 (4875) random stuff
Movie6 (3525) random stuff
Movie7 (4567) random stuff
FOOTER INFO
It also contains some header/footer info that I can skip. The spaces between the name and movie are not constant. I want to add this data into a dictionary using while loops (no for loops for the whole process). Basically the name will act as the key and the list of following movies will be the values (both are strings). So far I can achieve either obtaining the lines which contain the names OR the lines which contain the movies. I tried an using an if statement to get it to work but to no avail.
Basically I was thinking of using an if statement to say if the line contains the name by some characteristic of the line, then splice out the name and splice out the movie and add to the dictionary. And if the name is not in the line, then associate that movie with the same name(multiple entries). But I think this is where Im lost. This part and maybe how Im iterating with the while loop.
I didn't use any readline(). Instead I used readlines() and I used that to toggle through the lines to pick out the information. I'm just wondering if anyone has any tips/hints they could offer.
If anyone wants the actual data I'm using then please pm me.
Ill rephrase it:
CRC: 0xDE308B96 File: actors.list Date: Fri Aug 12 00:00:00 2011
Copyright 1990-2007 The Internet Movie Database, Inc. All rights reserved.
COPYING POLICY: Internet Movie Database (IMDb)
==============================================
CUTTING COPYRIGHT NOTICE
THE ACTORS LIST
===============
Name Titles
---- ------
ActA, A m1 (2011)
m2 (2011)
ActB, B m1 (2011)
m2 (2011)
m3 (2001)
ActC, C m1 (2011)
ActD, D m3 (2003)
m6 (2006)
ActE, E m6 (2006)
ActF, F m4 (2004)
ActG, G m4 (2004)
ActH, H m5 (2005)
Bacon, Kevin m2 (2011)
m5 (2005)
-----------------------------------------------------------------------------
SUBMITTING UPDATES
==================
CUTTING UPDATES
For further info visit http://www.imdb.com/licensing/contact
And basically I want the output to be a dictionary:
{'E Acte': ['m6 (2006)'],
'A Acta': ['m1 (2011)', 'm2 (2011)'],
'G Actg': ['m4 (2004)'],
'B Actb': ['m1 (2011)', 'm2 (2011)', 'm3 (2001)'],
'D Actd': ['m3 (2003)', 'm6 (2006)'],
'F Actf': ['m4 (2004)'],
'Kevin Bacon': ['m2 (2011)', 'm5 (2005)'],
'H Acth': ['m5 (2005)'],
'C Actc': ['m1 (2011)']}
I'm suggested to use while loops since it'll make the process easier, but not restricted solely to it.

Here is another solution for the case when the list is formatted with tab chars instead of spaces:
output = {}
in_list = False
current_name = None
for line in open('actors.list'):
if in_list:
if line.startswith('-'):
break
if '\t' not in line:
continue
name, title = line.split('\t', 1)
name = name.strip()
title = title.strip()
if name:
if ',' in name:
name = name.split(',', 1)
name[0] = name[0].rstrip()
name[1] = name[1].lstrip()
name.reverse()
name = ' '.join(name)
current_name = name
if title:
output.setdefault(
current_name, []).append(title)
else:
if line.startswith('-'):
in_list = True

Here is a solution with a for loop which is much more natural in Python. It assumes the input file is formatted with spaces, like the code posted in the question above. I have posted an alternative answer now for the case when the list is formatted with tabs instead of spaces.
Of course you could rewrite it as a while loop, but it would not make much sense. You can also simplify it a bit by using a defaultdict(list) for the output in newer Python versions.
output = {}
pos = -1 # char position of title column
current_name = None
for line in open('actors.list'):
if pos < 0:
if line.startswith('-'):
pos = line.find(' ')
if pos > 0:
pos = line.find('-', pos)
else:
if line.startswith('-'):
break
name = line[:pos].strip()
title = line[pos:].strip()
if name:
if ',' in name:
name = name.split(',', 1)
name[0] = name[0].rstrip()
name[1] = name[1].lstrip()
name.reverse()
name = ' '.join(name)
current_name = name
if title:
output.setdefault(
current_name, []).append(title)
print output

Related

Reading lines from a txt file and create a dictionary where values are list of tuples

student.txt:
Akçam Su Tilsim PSYC 3.9
Aksel Eda POLS 2.78
Alpaydin Dilay ECON 1.2
Atil Turgut Uluç IR 2.1
Deveci Yasemin PSYC 2.9
Erserçe Yasemin POLS 3.0
Gülle Halil POLS 2.7
Gündogdu Ata Alp ECON 4.0
Gungor Muhammed Yasin POLS 3.1
Hammoud Rawan IR 1.7
Has Atakan POLS 1.97
Ince Kemal Kahriman IR 2.0
Kaptan Deniz IR 3.5
Kestir Bengisu IR 3.8
Koca Aysu ECON 2.5
Kolayli Sena Göksu IR 2.8
Kumman Gizem PSYC 2.9
Madenoglu Zeynep PSYC 3.1
Naghiyeva Gulustan IR 3.8
Ok Arda Mert IR 3.2
Var Berna ECON 2.9
Yeltekin Sude PSYC 1.2
Hello, I want to write a function, which reads the information about each student in the file into a dictionary where the keys are the departments, and the values are a list of students in the given department (list of tuples). The information about each student is stored in a tuple
containing (surname, GPA). Students in the file may have more than one name but only the surname and gpa will be stored. The function should return the dictionary. (Surnames are the first words at each line.)
This is what I tried:
def read_student(ifile):
D={}
f1=open(ifile,'r')
for line in f1:
tab=line.find('\t')
space=line.rfind(' ')
rtab=line.rfind('\t')
student_surname=line[0:tab]
gpa=line[space+1:]
department=line[rtab+1:space]
if department not in D:
D[department]=[(student_surname,gpa)]
else:
D[department].append((student_surname,gpa))
f1.close()
return D
print(read_student('student.txt'))
I think the main problem is that there is a sort of disorder because sometimes tab comes after words and sometimes a space comes after words, so I dont know how to use find function properly in this case.

Why mess with rfind and find when you can simply split?
def read_student(ifile):
D = {}
f1 = open(ifile,'r')
for line in f1:
cols = line.split() # Splits at one or more whitespace
surname = cols[0].strip()
department = cols[-2].strip() # Because you know the last-but-one is dept
gpa = float(cols[-1].strip()) # Because you know the last one is GPA
fname = ' '.join(cols[1:-2]).strip()
# cols[1:-2] gives you everything starting at col 1 up to but excluding the second-last.
# Then you join these with spaces.
if department not in D:
D[department] = [(surname, gpa)]
else:
D[department].append((surname, gpa))
f1.close()
return D
If you know that your columns are separated by \t always, you can do cols = line.split('\t') instead. Then you have the students' fname in the second column, the department in the third, and the GPA in the fourth.
A couple of suggestions:
You can use defaultdict to avoid checking if department not in D every time
You can use with to manage reading the file so you don't have to worry about f1.close(). This is the preferred way to read files in Python.

see below - you will have to take care of the surname but rest of the details in the question were handled
from collections import defaultdict
data = defaultdict(list)
with open('data.txt', encoding="utf-8") as f:
lines = [l.strip() for l in f.readlines()]
for line in lines:
first_space_idx = line.rfind(' ')
sec_space_idx = line.rfind(' ', 0,first_space_idx - 1)
grade = line[first_space_idx+1:]
dep = line[sec_space_idx:first_space_idx]
student = line[:sec_space_idx].strip()
data[dep].append((student, grade))
for dep, students in data.items():
print(f'{dep} --> {students}')
output
PSYC --> [('Akçam Su Tilsim', '3.9'), ('Deveci Yasemin', '2.9'), ('Kumman Gizem', '2.9'), ('Madenoglu Zeynep', '3.1'), ('Yeltekin Sude', '1.2')]
POLS --> [('Aksel Eda', '2.78'), ('Erserçe Yasemin', '3.0'), ('Gülle Halil', '2.7'), ('Gungor Muhammed Yasin', '3.1'), ('Has Atakan', '1.97')]
ECON --> [('Alpaydin Dilay', '1.2'), ('Gündogdu Ata Alp', '4.0'), ('Koca Aysu', '2.5'), ('Var Berna', '2.9')]
IR --> [('Atil Turgut Uluç', '2.1'), ('Hammoud Rawan', '1.7'), ('Ince Kemal Kahriman', '2.0'), ('Kaptan Deniz', '3.5'), ('Kestir Bengisu', '3.8'), ('Kolayli Sena Göksu', '2.8'), ('Naghiyeva Gulustan', '3.8'), ('Ok Arda Mert', '3.2')]

You can use split(' ', 1) to extract surname. It gives list with two elements. first one is surname. Then again split the second elements to get the using rsplit(' ', 1). It again gives list with two element first one is name and dept and second one is gpa. Again split second element to get department.
def read_student(ifile):
d = {}
with open(ifile) as fp:
for line in fp:
fname, data = line.strip().split(' ', 1)
data, gpa = data.rsplit(' ', 1)
dept = data.split()[-1]
d.setdefault(dept, []).append((fname, gpa))
return d
print(read_student('student.txt'))
Output:
{'ECON': [('Alpaydin', '1.2'),
('Gündogdu', '4.0'),
('Koca', '2.5'),
('Var', '2.9')],
'IR': [('Atil', '2.1'),
('Hammoud', '1.7'),
('Ince', '2.0'),
('Kaptan', '3.5'),
('Kestir', '3.8'),
('Kolayli', '2.8'),
('Naghiyeva', '3.8'),
('Ok', '3.2')],
'POLS': [('Aksel', '2.78'),
('Erserçe', '3.0'),
('Gülle', '2.7'),
('Gungor', '3.1'),
('Has', '1.97')],
'PSYC': [('Akçam', '3.9'),
('Deveci', '2.9'),
('Kumman', '2.9'),
('Madenoglu', '3.1'),
('Yeltekin', '1.2')]}

This solution makes use of itemgetter to simplify the getting of variables: surname, dept. and gpa
from operator import itemgetter
d = dict()
with open('f0.txt', 'r') as f:
for line in f:
name, dept, gpa = itemgetter(0, -2, -1)(line.split())
d.setdefault(dept, []).append((name, gpa))

Remove quotes holding 2 words and remove comma between them

Following up on Python to replace a symbol between between 2 words in a quote
Extended input and expected output:
trying to replace comma between 2 words Durango and PC in the second line by & and then remove the quotes " as well. Same for third line with Orbis and PC and 4th line has 2 word combos in quotes that I would like to process "AAA - Character Tech, SOF - UPIs","Durango, Orbis, PC"
I would like to retain the rest of the lines using Python.
INPUT
2,SIN-Rendering,Core Tech - Rendering,PC,147,Reopened
2,Kenny Chong,Core Tech - Rendering,"Durango, PC",55,Reopened
3,SIN-Audio,AAA - Audio,"Orbis, PC",13,Open
LTY-168499,[PC][PS4][XB1] Missing textures from Fort Capture NPC face,3,CTU-CharacterTechBacklog,"AAA - Character Tech, SOF - UPIs","Durango, Orbis, PC",29,Waiting For
...
...
...
Like these, there can be 100 lines in my sample. So the expected output is:
2,SIN-Rendering,Core Tech - Rendering,PC,147,Reopened
2,Kenny Chong,Core Tech - Rendering, Durango & PC,55,Reopened
3,SIN-Audio,AAA - Audio, Orbis & PC,13,Open
LTY-168499,[PC][PS4][XB1] Missing textures from Fort Capture NPC face,3,CTU-CharacterTechBacklog,AAA - Character Tech & SOF - UPIs,Durango, Orbis & PC,29,Waiting For
...
...
...
So far, I could think of reading line by line and then if the line contains quote replace it with no character but then replacement of symbol inside is something I am stuck with.
Here is what I have right now:
for line in lines:
expr2 = re.findall('"(.*?)"', line)
if len(expr2)!=0:
expr3 = re.split('"',line)
expr4 = expr3[0]+expr3[1].replace(","," &")+expr3[2]
print >>k, expr4
else:
print >>k, line
but it does not consider the case in 4th line? There can be more than 3 combos as well. For eg.
3,SIN-Audio,"AAA - Audio, xxxx, yyyy","Orbis, PC","13, 22",Open
and wish to make this
3,SIN-Audio,AAA - Audio & xxxx & yyyy, Orbis & PC, 13 & 22,Open
How to achieve this, any suggestion? Learning Python.

So, by treating the input file as a .csv we can easily turn the lines into something easy to work with.
For example,
2,Kenny Chong,Core Tech - Rendering, Durango & PC,55,Reopened
is read as:
['2', 'Kenny Chong', 'Core Tech - Rendering', 'Durango, PC', '55', 'Reopened']
Then, by replacing all instances of , with _& (space) we would have the line:
['2', 'Kenny Chong', 'Core Tech - Rendering', 'Durango & PC', '55', 'Reopened']
And it replaces multiple instances of ,s within a line, and when finally writing we no longer have the original double quotes.
Here is the code, given that in.txt is your input file and it will write to out.txt.
import csv
with open('in.txt') as infile:
reader = csv.reader(infile)
with open('out.txt', 'w') as outfile:
for line in reader:
line = list(map(lambda s: s.replace(',', ' &'), line))
outfile.write(','.join(line) + '\n')
The fourth line is outputted as:
LTY-168499,[PC][PS4][XB1] Missing textures from Fort Capture NPC face,3,CTU-CharacterTechBacklog,AAA - Character Tech & SOF - UPIs,Durango & Orbis & PC,29,Waiting For

Please check this once: I could not find a single expression that could do this. So did it in a bit elaborate way. Will update if I can find a better way(Python 3)
import re
st = "3,SIN-Audio,\"AAA - Audio, xxxx, yyyy\",\"Orbis, PC\",\"13, 22\",Open"
found = re.findall(r'\"(.*)\"',st)[0].split("\",\"")
final = ""
for word in found:
final = final + (" &").join(word.split(","))+","
result = re.sub(r'\"(.*)\"',final[:-1],st)
print(result)

Replace Single Character in a line of a Text file with Python

I have a text file with all of them currently having the same end character (N), which is being used to identify progress the system makes. I want to change the end character to "Y" in case the program ends via an error or other interruptions so that upon restarting the program will search until a line has the end character "N" and begin working from there. Below is my code as well as a sample from the text file.
UPDATED CODE:
def GeoCode():
f = open("geocodeLongLat.txt", "a")
with open("CstoGC.txt",'r') as file:
print("Geocoding...")
new_lines = []
for line in file.readlines():
check = line.split('~')
print(check)
if 'N' in check[-1]:
geolocator = Nominatim()
dot_number, entry_name, PHY_STREET,PHY_CITY,PHY_STATE,PHY_ZIP = check[0],check[1],check[2],check[3],check[4],check[5]
address = PHY_STREET + " " + PHY_CITY + " " + PHY_STATE + " " + PHY_ZIP
f.write(dot_number + '\n')
try:
location = geolocator.geocode(address)
f.write(dot_number + "," + entry_name + "," + str(location.longitude) + "," + str(location.latitude) + "\n")
except AttributeError:
try:
address = PHY_CITY + " " + PHY_STATE + " " + PHY_ZIP
location = geolocator.geocode(address)
f.write(dot_number + "," + entry_name + "," + str(location.longitude) + "," + str(location.latitude) + "\n")
except AttributeError:
print("Cannot Geocode")
check[-1] = check[-1].replace('N','Y')
new_lines.append('~'.join(check))
with open('CstoGC.txt','r+') as file: # IMPORTANT to open as 'r+' mode as 'w/w+' will truncate your file!
for line in new_lines:
file.writelines(line)
f.close()
Output:
2967377~DARIN COLE~22112 TWP RD 209~ALVADA~OH~44802~Y
WAY 64 SUITE 100~EADS~TN~38028~N
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~N
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~N
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~N
735308~ALZEY EXPRESS INC~2244 SOUTH GREEN STREET~HENDERSON~KY~42420~N
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~N
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~N
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~N
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~N
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~N
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~N
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~N
143608~LARRY A PETERSON & DONNA M PETERSON~W6359 450TH AVE~ELLSWORTH~WI~54011~N
635528~JAMES E WEBB~3926 GREEN ROAD~SPRINGFIELD~TN~37172~N
805496~WAYNE MLADY~22272 135TH ST~CRESCO~IA~52136~N
704996~SAVINA C MUNIZ~814 W LA QUINTA DR~PHARR~TX~78577~N
893169~BINDEWALD MAINTENANCE INC~213 CAMDEN DR~SLIDELL~LA~70459~N
948130~LOGISTICIZE LTD~861 E PERRY ST~PAULDING~OH~45879~N
438760~SMOOTH OPERATORS INC~W8861 CREEK ROAD~DARIEN~WI~53114~N
518872~A B C RELOCATION SERVICES INC~12 BOCKES ROAD~HUDSON~NH~03051~N
576143~E B D ENTERPRISES INC~29 ROY ROCHE DRIVE~WINNIPEG~MB~R3C 2E6~N
968264~BRIAN REDDEMANN~706 WESTGOR STREET~STORDEN~MN~56174-0220~N
721468~QUALITY LOGISTICS INC~645 LEONARD RD~DUNCAN~SC~29334~N
As you can see I am already keeping track of which line I am at just by using x. Should I use something like file.readlines()?
Sample of text document:
570772~CORPORATE BANK TRANSIT OF KENTUCKY INC~3157 HIGHWAY 64 SUITE 100~EADS~TN~38028~N
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~N
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~N
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~N
735308~ALZEY EXPRESS INC~2244 SOUTH GREEN STREET~HENDERSON~KY~42420~N
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~N
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~N
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~N
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~N
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~N
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~N
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~N
Thank you!
Edit: updated code thanks to #idlehands

There are a few ways to do this.
Option #1
My original thought was to use the tell() and seek() method to go back a few steps but it quickly shows that you cannot do this conveniently when you're not opening the file in bytes and definitely not in a for loop of readlines(). You can see the reference threads here:
Is it possible to modify lines in a file in-place?
How to solve "OSError: telling position disabled by next() call"
The investigation led to this piece of code:
with open('file.txt','rb+') as file:
line = file.readline() # initiate the loop
while line: # continue while line is not None
print(line)
check = line.split(b'~')[-1]
if check.startswith(b'N'): # carriage return is expected for each line, strip it
# ... do stuff ... #
file.seek(-len(check), 1) # place the buffer at the check point
file.write(check.replace(b'N', b'Y')) # replace "N" with "Y"
line = file.readline() # read next line
In the first referenced thread one of the answers mentioned this could lead you to potential problems, and directly modifying the bytes on the buffer while reading it is probably considered a bad idea™. A lot of pros probably will scold me for even suggesting it.
Option #2a
(if file size is not horrendously huge)
with open('file.txt','r') as file:
new_lines = []
for line in file.readlines():
check = line.split('~')
if 'N' in check[-1]:
# ... do stuff ... #
check[-1] = check[-1].replace('N','Y')
new_lines.append('~'.join(check))
with open('file.txt','r+') as file: # IMPORTANT to open as 'r+' mode as 'w/w+' will truncate your file!
for line in new_lines:
file.writelines(line)
This approach loads all the lines into memory first, so you do the modification in memory but leave the buffer alone. Then you reload the file and write the lines that were changed. The caveat is that technically you are rewriting the entire file line by line - not just the string N even though it was the only thing changed.
Option #2b
Technically you could open the file as r+ mode from the onset and then after the iterations have completed do this (still within the with block but outside of the loop):
# ... new_lines.append('~'.join(check)) #
file.seek(0)
for line in new_lines:
file.writelines(line)
I'm not sure what distinguishes this from Option #1 since you're still reading and modifying the file in the same go. If someone more proficient in IO/buffer/memory management wants to chime in please do.
The disadvantage for Option 2a/b is that you always end up storing and rewriting the lines in the file even if you are only left with a few lines that needs to be updated from 'N' to 'Y'.
Results (for all solutions):
570772~CORPORATE BANK TRANSIT OF KENTUCKY INC~3157 HIGHWAY 64 SUITE 100~EADS~TN~38028~Y
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~Y
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~Y
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~Y
735308~ALZEY EXPRESS INC~2244 SOUTH GREEN STREET~HENDERSON~KY~42420~Y
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~Y
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~Y
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~Y
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~Y
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~Y
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~Y
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~Y
And if you were to say, encountered a break at the line starting with 220940, the file would become:
570772~CORPORATE BANK TRANSIT OF KENTUCKY INC~3157 HIGHWAY 64 SUITE 100~EADS~TN~38028~Y
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~Y
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~Y
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~Y
735308~ALZEY EXPRESS INC~2244 SOUTH GREEN STREET~HENDERSON~KY~42420~Y
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~Y
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~Y
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~N
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~N
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~N
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~N
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~N
There are pros and cons to these approaches. Try and see which one fits your use case the best.

I would read the entire input file into a list and .pop() the lines off one at a time. In case of an error, append the popped item to the list and write overwrite the input file. This way it will always be up to date and you won't need any other logic.

Python File Reading & Writing

So I need to write a program that reads a text file, and copies its contents to another file. I then have to add a column at the end of the text file, and populate that column with an int that is calculated using the function calc_bill. I can get it to copy the contents of the original file to the new one, but I cannot seem to get my program to read in the ints necessary for calc_bill to run.
Any help would be greatly appreciated.
Here are the first 3 lines of the text file I am reading from:
CustomerID Title FirstName MiddleName LastName Customer Type
1 Mr. Orlando N. Gee Residential 297780 302555
2 Mr. Keith NULL Harris Residential 274964 278126
It is copying the file exactly as it is supposed to to the new file. What is not working is writing the bill_amount (calc_bill)/ billVal(main) to the new file in a new column. Here is the expected output to the new file:
CustomerID Title FirstName MiddleName LastName Customer Type Company Name Start Reading End Reading BillVal
1 Mr. Orlando N. Gee Residential 297780 302555 some number
2 Mr. Keith NULL Harris Residential 274964 278126 some number
And here is my code:
def main():
file_in = open("water_supplies.txt", "r")
file_in.readline()
file_out = input("Please enter a file name for the output:")
output_file = open(file_out, 'w')
lines = file_in.readlines()
for line in lines:
lines = [line.split('\t')]
#output_file.write(str(lines)+ "\n")
billVal = 0
c_type = line[5]
start = int(line[7])
end = int(line[8])
billVal = calc_bill(c_type, start, end)
output_file.write(str(lines)+ "\t" + str(billVal) + "\n")
def calc_bill(customer_type, start_reading, end_reading):
price_per_gallon = 0
if customer_type == "Residential":
price_per_gallon = .012
elif customer_type == "Commercial":
price_per_gallon = .011
elif customer_type == "Industrial":
price_per_gallon = .01
if start_reading >= end_reading:
print("Error: please try again")
else:
reading = end_reading - start_reading
bill_amount = reading * price_per_gallon
return bill_amount
main()

There are the issues mentioned above, but here is a small change to your main() method that works correctly.
def main():
file_in = open("water_supplies.txt", "r")
# skip the headers in the input file, and save for output
headers = file_in.readline()
# changed to raw_input to not require quotes
file_out = raw_input("Please enter a file name for the output: ")
output_file = open(file_out, 'w')
# write the headers back into output file
output_file.write(headers)
lines = file_in.readlines()
for line in lines:
# renamed variable here to split
split = line.split('\t')
bill_val = 0
c_type = split[5]
start = int(split[6])
end = int(split[7])
bill_val = calc_bill(c_type, start, end)
# line is already a string, don't need to cast it
# added rstrip() to remove trailing newline
output_file.write(line.rstrip() + "\t" + str(bill_val) + "\n")
Note that the line variable in your loop includes the trailing newline, so you will need to strip that off as well if you're going to write it to the output file as-is. Your start and end indices were off by 1 as well, so I changed to split[6] and split[7].
It is a good idea to not require the user to include the quotes for the filename, so keep that in mind as well. An easy way is to just use raw_input instead of input.
Sample input file (from OP):
CustomerID Title FirstName MiddleName LastName Customer Type
1 Mr. Orlando N. Gee Residential 297780 302555
2 Mr. Keith NULL Harris Residential 274964 278126
$ python test.py
Please enter a file name for the output:test.out
Output (test.out):
1 Mr. Orlando N. Gee Residential 297780 302555 57.3
2 Mr. Keith NULL Harris Residential 274964 278126 37.944

There are a couple things. The inconsistent spacing in your column names makes counting the actual columns a bit confusing, but I believe there are 9 column names there. However, each of your rows of data have only 8 elements, so it looks like you've got an extra column name (maybe "CompanyName"). So get rid of that, or fix the data.
Then your "start" and "end" variables are pointing to indexes 7 and 8, respectively. However, since there are only 8 elements in the row, I think the indexes should be 6 and 7.
Another problem could be that inside your for-loop through "lines", you set "lines" to the elements in that line. I would suggest renaming the second "lines" variable inside the for-loop to something else, like "elements".
Aside from that, I'd just caution you about naming consistency. Some of your column names are camel-case and others have spaces. Some of your variables are separated by underscores and others are camel-case.
Hopefully that helps. Let me know if you have any other questions.

You have two errors in handling your variables, both in the same line:
lines = [line.split()]
You put this into your lines variable, which is the entire file contents. You just lost the rest of your input data.
You made a new list-of-list from the return of split.
Try this line:
line = line.split()
I got reasonable output with that change, once I make a couple of assumptions about your placement of tabs.
Also, consider not overwriting a variable with a different data semantic; it confuses the usage. For instance:
for record in lines:
line = record.split()

Return the average mark for all student in that Section

I know it was asked already but the answers the super unclear
The first requirement is to open a file (sadly I have no idea how to do that)
The second requirement is a section of code that does the following:
Each line represents a single student and consists of a student number, a name, a section code and a midterm grade, all separated by whitespace
So I don't think i can target that element due to it being separate by whitespace?
Here is an excerpt of the file, showing line structure
987654322 Xu Carolyn L0101 19.5
233432555 Jones Billy Andrew L5101 16.0
555432345 Patel Amrit L0101 13.5
888332441 Fletcher Bobby L0201 18
777998713 Van Ryan Sarah Jane L5101 20
877633234 Zhang Peter L0102 9.5
543444555 Martin Joseph L0101 15
876543222 Abdolhosseini Mohammad Mazen L0102 18.5
I was provided the following hints:
Notice that the number of names per student varies.
Use rstrip() to get rid of extraneous whitespace at the end of the lines.
I don't understand the second hint.
This is what I have so far:
counter = 0
elements = -1
for sets in the_file
elements = elements + 1
if elements = 3
I know it has something to do with readlines() and the targeting the section code.

marks = [float(line.strip().split()[-1]) for line in open('path/to/input/file')]
average = sum(marks)/len(marks)
Hope this helps

Open and writing to files
strip method
Something like this?
data = {}
with open(filename) as f:#open a file
for line in f.readlines():#proceed through file lines
#next row is to split data using spaces and them skip empty using strip
stData = [x.strip() for x in line.split() if x.strip()]
#assign to variables
studentN, studentName, sectionCode, midtermGrade = stData
if sectionCode not in data:
data[sectionCode] = []
#building dict, key is a section code, value is a tuple with student info
data[sectionCode].append([studentN, studentName, float(midtermGrade)]
#make calculations
for k,v in data.iteritems():#iteritems returns you (key, value) pair on each iteration
print 'Section:' + k + ' Grade:' + str(sum(x[2] for x in v['grade']))

more or less:
infile = open('grade_file.txt', 'r')
score = 0
n = 0
for line in infile.readlines():
score += float(line.rstrip().split()[-1])
n += 1
avg = score / n

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Parsing through a file from database and adding info to dictionary - python

Related

Reading lines from a txt file and create a dictionary where values are list of tuples

Remove quotes holding 2 words and remove comma between them

Replace Single Character in a line of a Text file with Python

Python File Reading & Writing

Return the average mark for all student in that Section

Categories

Resources