I am trying to write a piece of code that takes text from a file, moves it into a dictionary, and then processes it. I keep getting this error:
File "C:\Users\Oghosa\Assignment2.py", line 12, in <module>
builtins.IndexError: string index out of range
Here's my program:
endofprogram = False
dic = {}
try:
    filename = input("Please Enter the Filename:")
    infile = open(filename, 'r')
except IOError:
    print("Error Reading File! Program ends here!")
    endofprogram = True
if endofprogram == False:
    for line in infile:
        line = line.strip("\n")
        if (line != " ") and (line[0] != "#"):
            item = line.split(":")
            print(items)
            dic["Animal id"] = item[0]
            dic["Date"] = item[1]
            dic["Station"] = item[2]
            print(dic)
Can someone aid in pointing out my mistake please?
Here's a sample input text:
#Comments
a01:01-24-2011:s1
a03:01-24-2011:s2
<blank line>
<blank line>
a02:01-24-2011:s2
a03:02-02-2011:s2
a03:03-02-2011:s1
a02:04-19-2011:s2
<blank line>
#comments
a01:05-14-2011:s2
a02:06-11-2011:s2
a03:07-12-2011:s1
a01:08-19-2011:s1
a03:09-19-2011:s1
a03:10-19-2011:s2
a03:11-19-2011:s1
a03:12-19-2011:s2
Well, you should at least print the offending line so you know what the culprit is:
for line in infile:
    items = line.strip("\n")
    try:
        # strip is a method, so it has to be called: line.strip()
        if (line.strip() != "") and (items[0] != "#"):
            items = line.split(":")  # I don't like your reuse of line, so changing to items
            ....
    except IndexError:  # if python 3 use except IndexError as e:
        print(items)  # prints offending line
endofprogram = False
attrs = ["Animal id", "Date", "Station"]
dictionary = []
try:
    # filename = input("Please Enter the Filename:")
    infile = open('rite.txt', 'r')
except IOError:
    print("Error Reading File! Program ends here!")
    endofprogram = True
if endofprogram == False:
    for line in infile:
        line = line.strip("\n")
        if (line != "") and (line[0] != "#"):
            item = line.split(":")
            # pair each attribute name with its field and keep one dict per line
            dictionary.append(dict(zip(attrs, item)))
    print(dictionary)
Your problem is that when there are blank lines in the file, line[0] doesn't exist. To fix this problem try this version:
endofprogram = False
dic = {}
try:
    filename = input("Please Enter the Filename:")
    infile = open(filename, 'r')
except IOError:
    print("Error Reading File! Program ends here!")
    endofprogram = True
if endofprogram == False:
    for line in infile:
        line = line.strip("\n")
        if len(line):  # skip empty lines before indexing
            if line[0] != "#":
                item = line.split(":")
                print(item)
                dic["Animal id"] = item[0]
                dic["Date"] = item[1]
                dic["Station"] = item[2]
                print(dic)
Also worth noting is that you are overwriting dic on each iteration of the loop, so after the loop is done, dic will only contain information from the last line of the file.
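One way to keep every record rather than only the last one is to build a new dictionary per line and append it to a list. This is only a minimal sketch, assuming the same colon-separated format as the sample input; the names parse_records and sample are made up for illustration and are not from the original program:

```python
def parse_records(lines):
    """Return one dict per data line, skipping blank lines and # comments."""
    records = []
    for line in lines:
        line = line.strip()
        # An empty string is falsy, so this check also protects startswith.
        if not line or line.startswith("#"):
            continue
        animal_id, date, station = line.split(":")
        records.append({"Animal id": animal_id, "Date": date, "Station": station})
    return records


# Hypothetical sample data standing in for the lines of the input file.
sample = ["#Comments\n", "a01:01-24-2011:s1\n", "\n", "a03:01-24-2011:s2\n"]
records = parse_records(sample)
```

Since a file object yields its lines when iterated, parse_records(infile) should work the same way on the opened file.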
The problem is you weren't checking for empty lines correctly in the
if (line != " ") and (line[0] != "#"):
statement. This is because they wouldn't even have a space left in them after line = line.strip("\n") executed, so just about any indexing operation will fail.
The code below has that and several other coding errors fixed. Note it's important to post your actual code here to make it easier for people to help you.
endofprogram = False
dic = {}
try:
    filename = input("Please Enter the Filename:")
    infile = open(filename, 'r')
except IOError:
    print("Error Reading File! Program ends here!")
    endofprogram = True
if not endofprogram:
    for line in infile:
        line = line.strip("\n")
        if line and line[0] != "#":
            items = line.split(":")
            print(items)
            dic["Animal id"] = items[0]
            dic["Date"] = items[1]
            dic["Station"] = items[2]
            print(dic)
Do you have a blank line in your file? On line 12 you may want to check that the line has text before indexing it using line[0]. Do you really have a line with an empty string, or should line 12 really read:
if line.strip() and (line[0] != "#"):
Edit.. Adding a full example.
dic = {}
filename = input("Please Enter the Filename:")
try:
    infile = open(filename, 'r')
except IOError:
    print("Error Reading File! Program ends here!")
else:
    for line in infile:
        line = line.strip()
        if line and line[0] != "#":
            item = line.split(":")
            dic["Animal id"] = item[0]
            dic["Date"] = item[1]
            dic["Station"] = item[2]
I have a very large file of about 900k values. It is a repetition of values like
/begin throw
COLOR red
DESCRIPTION
"cashmere sofa throw"
10
10
156876
DIMENSION
140
200
STORE_ADDRESS 59110
/end throw
The values keep changing, but I need it like below:
/begin throw
STORE_ADDRESS 59110
COLOR red
DESCRIPTION "cashmere sofa throw" 10 10 156876
DIMENSION 140 200
/end throw
Currently, my approach is to remove the newlines and put spaces in their place. The store address is constant throughout the file, so I thought of removing it from its position and inserting it before the description:
text_file = open(filename, 'r')
filedata = text_file.readlines();
for num,line in enumerate(filedata,0):
    if '/begin' in line:
        for index in range(num, len(filedata)):
            if "store_address 59110 " in filedata[index]:
                filedata.remove(filedata[index])
                filedata.insert(filedata[index-7])
                break
            if "DESCRIPTION" in filedata[index]:
                try:
                    filedata[index] = filedata[index].replace("\n", " ")
                    filedata[index+1] = filedata[index+1].replace(" ","").replace("\n", " ")
                    filedata[index+2] = filedata[index+2].replace(" ","").replace("\n", " ")
                    filedata[index+3] = filedata[index+3].replace(" ","").replace("\n", " ")
                    filedata[index+4] = filedata[index+4].replace(" ","").replace("\n", " ")
                    filedata[index+5] = filedata[index+5].replace(" ","").replace("\n", " ")
                    filedata[index+6] = filedata[index+6].replace(" ","").replace("\n", " ")
                    filedata[index+7] = filedata[index+7].replace(" ","").replace("\n", " ")
                    filedata[index+8] = filedata[index+8].replace(" ","")
                except IndexError:
                    print("Error Index DESCRIPTION:", index, num)
            if "DIMENSION" in filedata[index]:
                try:
                    filedata[index] = filedata[index].replace("\n", " ")
                    filedata[index+1] = filedata[index+1].replace(" ","").replace("\n", " ")
                    filedata[index+2] = filedata[index+2].replace(" ","").replace("\n", " ")
                    filedata[index+3] = filedata[index+3].replace(" ","")
                except IndexError:
                    print("Error Index DIMENSION:", index, num)
After that, I write filedata into another file.
This approach takes too long to run (almost an hour and a half) because, as mentioned earlier, it is a large file.
I was wondering if there is a faster approach to this issue.
You can read the file structure by structure so that you don't have to store the whole content in memory and manipulate it there. By structure, I mean all the values between and including /begin throw and /end throw. This should be much faster.
def rearrange_structure_and_write_into_file(structure, output_file):
    # TODO: rearrange the elements in structure and write the result into output_file
    pass

current_structure = ""
with open(filename, 'r') as original_file:
    with open(output_filename, 'w') as output_file:
        for line in original_file:
            current_structure += line
            if "/end throw" in line:
                # a complete block has been read: process it, then start over
                rearrange_structure_and_write_into_file(current_structure, output_file)
                current_structure = ""
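The TODO above could be filled in along the following lines. This is only a sketch under stated assumptions: every block looks like the sample in the question, STORE_ADDRESS always sits on a single line, and a value line never starts with one of the keywords. The helper rearrange_structure and the in-memory demo file are illustrative additions, not part of the original answer:

```python
import io

# Keywords taken from the sample block in the question.
KEYWORDS = ("COLOR", "DESCRIPTION", "DIMENSION", "STORE_ADDRESS")


def rearrange_structure(structure):
    """Return one /begin ... /end block as text in the new layout."""
    fields = []           # one joined line per keyword, in input order
    store_address = None
    for raw in structure.splitlines():
        line = raw.strip()
        if not line or line.startswith("/begin") or line.startswith("/end"):
            continue
        if line.startswith("STORE_ADDRESS"):
            store_address = line
        elif line.split()[0] in KEYWORDS:
            fields.append(line)
        elif fields:
            # A value line: fold it onto the most recent keyword line.
            fields[-1] += " " + line
    out = ["/begin throw"]
    if store_address is not None:
        out.append(store_address)
    out.extend(fields)
    out.append("/end throw")
    return "\n".join(out) + "\n"


def rearrange_structure_and_write_into_file(structure, output_file):
    output_file.write(rearrange_structure(structure))


# Demo on the sample block, writing to an in-memory file.
block = (
    "/begin throw\nCOLOR red\nDESCRIPTION\n\"cashmere sofa throw\"\n"
    "10\n10\n156876\nDIMENSION\n140\n200\nSTORE_ADDRESS 59110\n/end throw\n"
)
demo_out = io.StringIO()
rearrange_structure_and_write_into_file(block, demo_out)
result = demo_out.getvalue()
```

Because each block is handled independently, no element is ever removed from or inserted into one long list.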
The insertion and removal of values from a long list is likely to make this code slower than it needs to be, and also makes it vulnerable to any errors and difficult to reason about. If there are any entries without store_address then the code would not work correctly and would search through the remaining entries until it finds a store address.
A better approach would be to break down the code into functions that parse each entry and output it:
KEYWORDS = ["STORE_ADDRESS", "COLOR", "DESCRIPTION", "DIMENSION"]

def parse_lines(lines):
    """ Parse throw data from lines in the old format """
    current_section = None
    r = {}
    for line in lines:
        words = line.strip().split(" ")
        if words[0] in KEYWORDS:
            if words[1:]:
                r[words[0]] = words[1]
            else:
                current_section = r[words[0]] = []
        else:
            current_section.append(line.strip())
    return r

def output_throw(throw):
    """ Output a throw entry as lines of text in the new format """
    yield "/begin throw"
    for keyword in KEYWORDS:
        if keyword in throw:
            value = throw[keyword]
            if type(value) is list:
                value = " ".join(value)
            yield f"{keyword} {value}"
    yield "/end throw"

with open(filename) as in_file, open("output.txt", "w") as out_file:
    entry = []
    for line in in_file:
        line = line.strip()
        if line == "/begin throw":
            entry = []
        elif line == "/end throw":
            throw = parse_lines(entry)
            for line in output_throw(throw):
                out_file.write(line + "\n")
        else:
            entry.append(line)
Or if you really need to maximize performance by removing all unnecessary operations you could read, parse and write in a single long condition, like this:
with open(filename) as in_file, open("output.txt", "w") as out_file:
    entry = []
    in_section = True
    def write(line):
        out_file.write(line + "\n")
    for line in in_file:
        line = line.strip()
        first = line.split()[0]
        if line == "/begin throw":
            in_section = False
            write(line)
            entry = []
        elif line == "/end throw":
            in_section = False
            for line_ in entry:
                write(line_)
            write(line)
        elif first == "STORE_ADDRESS":
            in_section = False
            write(line)
        elif line in KEYWORDS:
            in_section = True
            entry.append(line)
        elif first in KEYWORDS:
            in_section = False
            entry.append(line)
        elif in_section:
            entry[-1] += " " + line
def add():
    while True:
        try:
            a = int(input("How many words do you want to add:"))
            if a >= 0:
                break
            else:
                raise ValueError
        except ValueError:
            print("Not valid ")
    return a
for i in range(add()):
    key_i = input(f"Turkish meaning: {i + 1}: ")
    value_i = input("translated version: ")
    with open('words.txt', 'a+') as f:
        f.write("'"+key_i+':')+ f.write(value_i+"'"+",")
My goal is to create my own dictionary, but I am adding a list to the txt file, so it ends up in the file like this:
words = {'araba:kol',
But when I search the txt file, it gives me the whole list:
def search():
    while 1:
        search = str(input("Search: "))
        if search not in["exit", "Exit"]:
            with open('words.txt', 'r+') as f:
                line = f.readline()
                while line:
                    data = line.find(search)
                    if not data == -1:
                        print(line.rstrip('\n'))
                        line = f.readline()
                    else:
                        line = f.readline()
        else:
            break
    f.close()
What can I do to make it output like this:
car:araba
Use the json module to avoid having to write the dictionary line by line yourself.
import json
with open('words.json', 'a+') as f:
    json.dump({key_i: value_i}, f)
with open('words.json', 'r') as f:
    d2 = json.load(f)
d2 is now the data that you wrote to the file.
Note that you should change the 'a+' to 'w', as you can only have one dictionary per file.
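Because the file holds a single JSON object, adding a word means loading the dictionary, updating it, and writing the whole thing back. The following is a rough sketch of that cycle plus an exact-key search; the helper names load_words, add_word and search_word are hypothetical, and the demo uses a temporary directory so no real file is touched:

```python
import json
import os
import tempfile


def load_words(path):
    """Return the saved dictionary, or an empty one if the file is missing."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def add_word(path, key, value):
    """Load, update, and rewrite the whole dictionary in 'w' mode."""
    words = load_words(path)
    words[key] = value
    with open(path, "w", encoding="utf-8") as f:
        json.dump(words, f, ensure_ascii=False)


def search_word(path, key):
    """Return 'key:value' for an exact key match, or None."""
    words = load_words(path)
    if key in words:
        return f"{key}:{words[key]}"
    return None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "words.json")
    add_word(path, "car", "araba")
    add_word(path, "arm", "kol")
    found = search_word(path, "car")
    missing = search_word(path, "bus")
```

Looking up a key in the loaded dictionary returns only the matching pair, instead of printing the whole line that happens to contain the search text.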
I used this code to delete a word from a text file.
f = open('./test.txt','r')
a = ['word1','word2','word3']
lst = []
for line in f:
    for word in a:
        if word in line:
            line = line.replace(word,'')
    lst.append(line)
f.close()
f = open('./test.txt','w')
for line in lst:
    f.write(line)
f.close()
But for some reason, if the words share characters, all of those characters get deleted. For example, in my code:
def cancel():
    global refID
    f1=open("refID.txt","r")
    line=f1.readline()
    flag = 0
    while flag==0:
        refID=input("Enter the reference ID or type 'q' to quit: ")
        for i in line.split(','):
            if refID == i:
                flag=1
        if flag ==1:
            print("reference ID found")
            cancelsub()
        elif (len(refID))<1:
            print("Reference ID not found, please re-enter your reference ID\n")
            cancel()
        elif refID=="q":
            flag=1
        else:
            print("reference ID not found\n")
    menu()
def cancelsub():
    global refIDarr, index
    refIDarr=[]
    index=0
    f = open('flightbooking.csv')
    csv_f = csv.reader(f)
    for row in csv_f:
        refIDarr.append(row[1])
    for i in range (len(refIDarr)):
        if refID==refIDarr[i]:
            index=i
            print(index)
    while True:
        proceed=input("You are about to cancel your flight booking, are you sure you would like to proceed? y/n?: ")
        while proceed>"y" or proceed<"n" or (proceed>"n" and proceed<"y") :
            proceed=input("Invalid entry. \nPlease enter y or n: ")
        if proceed=="y":
            Continue()
            break
        elif proceed=="n":
            main_menu
            break
            exit
        break
def Continue():
    lines = list()
    with open('flightbooking.csv', 'r') as readFile:
        reader = csv.reader(readFile)
        for row in reader:
            lines.append(row)
            for field in row:
                if field ==refID:
                    lines.remove(row)
                    break
    with open('flightbooking.csv', 'w') as writeFile:
        writer = csv.writer(writeFile)
        writer.writerows(lines)
    f = open('refID.txt','r')
    a=refIDarr[index]
    print(a)
    lst = []
    for line in f:
        for word in a:
            if word in line:
                line = line.replace(word,'')
        lst.append(line)
    print(lst)
    f.close()
    f = open('refID.txt','w')
    for line in lst:
        f.write(line)
    f.close()
    print("Booking successfully cancelled")
    menu()
When the code is run, the refID variable has one word stored in it, and the code should replace just that word with nothing. Instead it takes that word, e.g. 'AB123', finds every other word that contains an 'A', a 'B', or any of the digits, and removes those characters from all of them. How do I make it delete only the word?
Text file before running code:
AD123,AB123
Expected Output in the text file:
AD123,
Output in text file:
D,
Edit: I have added the entire code, and maybe you can help now after seeing that the array is being appended to and then being used to delete from a text file.
Here's my opinion.
refIDarr = ["AB123"]
a = refIDarr[0] => a = "AB123"
Strings in Python are iterable, so when you do for word in a, you're getting 5 loops where each word is actually a letter.
Something like the following is being executed.
if "A" in line:
    line = line.replace("A","")
if "B" in line:
    line = line.replace("B","")
if "1" in line:
    line = line.replace("1","")
if "2" in line:
    line = line.replace("2","")
if "3" in line:
    line = line.replace("3","")
The correct way to do this is to loop over refIDarr:
for word in refIDarr:
    line = line.replace(word,'')
NOTE: You don't need the if statement, since if the word is not in the line, replace returns the line unchanged.
"abc".replace("bananan", "") => "abc"
Here's a working example:
refIDarr = ["hello", "world", "lol"]
with open('mytext.txt', "r") as f:
    data = f.readlines()
for word in refIDarr:
    data = [line.replace(word, "") for line in data]
with open("mytext.txt", "w") as newf:
    newf.writelines(data)
The problem is here:
a=refIDarr[index]
If refIDarr is a list of words, accessing a specific index makes a a single word. Later, when you iterate over a (for word in a:), word becomes a letter and not a word as you expect, which eventually replaces individual characters of the word instead of the word itself in your file.
To avoid that, remove a=refIDarr[index] and change your loop to be:
for line in f:
    for word in refIDarr:
        if word in line:
            line = line.replace(word,'')
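Note that str.replace works on substrings, so even when looping over whole words it can still clip a longer ID that merely contains the one being removed. An alternative, sketched here on the assumption that the file holds IDs separated by commas as in the question, is to split on the commas and drop only exact matches; remove_id is a made-up helper name:

```python
def remove_id(line, ref_id):
    """Drop entries equal to ref_id from a comma-separated line."""
    kept = [item for item in line.strip().split(",") if item != ref_id]
    return ",".join(kept)


after = remove_id("AD123,AB123", "AB123")

# A longer ID that contains the removed one is left alone ...
partial = remove_id("AB1234,AB123", "AB123")

# ... whereas substring replacement damages it.
clipped = "AB1234,AB123".replace("AB123", "")
```

With this approach a trailing comma in the input, as in "AD123,AB123,", is preserved in the result, while an input without one comes back without one.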
Let's say I have a text file with the content below:
fdsjhgjhg
fdshkjhk
Start
Good Morning
Hello World
End
dashjkhjk
dsfjkhk
Now I need to write Python code that reads the text file and copies the contents between Start and End to another file.
I wrote the following code.
inFile = open("data.txt")
outFile = open("result.txt", "w")
buffer = []
keepCurrentSet = True
for line in inFile:
    buffer.append(line)
    if line.startswith("Start"):
        #---- starts a new data set
        if keepCurrentSet:
            outFile.write("".join(buffer))
        #now reset our state
        keepCurrentSet = False
        buffer = []
    elif line.startswith("End"):
        keepCurrentSet = True
inFile.close()
outFile.close()
I'm not getting the desired output; I'm just getting Start.
What I want is all the lines between Start and End, excluding Start and End themselves.
Just in case you have multiple "Start"s and "End"s in your text file, this will import all the data together, excluding all the "Start"s and "End"s.
with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
            continue
        elif line.strip() == "End":
            copy = False
            continue
        elif copy:
            outfile.write(line)
If the text files aren't especially large, you can read the whole content of the file and then use a regular expression:
import re
with open('data.txt') as myfile:
    content = myfile.read()
# group(1) is the text between the markers, excluding Start and End themselves
text = re.search(r'Start\n(.*?)End', content, re.DOTALL).group(1)
with open("result.txt", "w") as myfile2:
    myfile2.write(text)
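If the file may contain several Start/End blocks, the same regular-expression idea can be extended with re.findall. This is a sketch on an in-memory string rather than a real file; the sample content is invented, and the pattern assumes that Start and End each sit alone on their own line:

```python
import re

# Invented sample standing in for the file's content.
content = (
    "junk\nStart\nGood Morning\nHello World\nEnd\n"
    "more junk\nStart\nBye\nEnd\n"
)

# MULTILINE anchors ^ and $ at line boundaries; DOTALL lets "." cross newlines.
pattern = re.compile(r"^Start\n(.*?)^End$", re.DOTALL | re.MULTILINE)

# With one capture group, findall returns just the captured text of each match.
blocks = pattern.findall(content)
joined = "".join(blocks)
```

Anchoring the markers to whole lines keeps a word such as "Ending" inside the copied text from being mistaken for the End marker.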
I'm not a Python expert, but this code should do the job.
inFile = open("data.txt")
outFile = open("result.txt", "w")
keepCurrentSet = False
for line in inFile:
    if line.startswith("End"):
        keepCurrentSet = False
    if keepCurrentSet:
        outFile.write(line)
    if line.startswith("Start"):
        keepCurrentSet = True
inFile.close()
outFile.close()
Using itertools.dropwhile, itertools.takewhile, itertools.islice:
import itertools
with open('data.txt') as f, open('result.txt', 'w') as fout:
    it = itertools.dropwhile(lambda line: line.strip() != 'Start', f)
    it = itertools.islice(it, 1, None)
    it = itertools.takewhile(lambda line: line.strip() != 'End', it)
    fout.writelines(it)
UPDATE: As inspectorG4dget commented, the code above copies only the first block. To copy multiple blocks, use the following:
import itertools
with open('data.txt', 'r') as f, open('result.txt', 'w') as fout:
    while True:
        it = itertools.dropwhile(lambda line: line.strip() != 'Start', f)
        if next(it, None) is None: break
        fout.writelines(itertools.takewhile(lambda line: line.strip() != 'End', it))
Move the outFile.write call into the 2nd if:
inFile = open("data.txt")
outFile = open("result.txt", "w")
buffer = []
for line in inFile:
    if line.startswith("Start"):
        buffer = ['']
    elif line.startswith("End"):
        outFile.write("".join(buffer))
        buffer = []
    elif buffer:
        buffer.append(line)
inFile.close()
outFile.close()
import re
inFile = open("data.txt")
outFile = open("result.txt", "w")
buffer1 = ""
for line in inFile:
    buffer1 = buffer1 + line
# re.DOTALL lets "." match newlines, so each match can span several lines
matches = re.findall(r"(?<=Start\n)(.*?)(?=End)", buffer1, re.DOTALL)
outFile.write("".join(matches))
inFile.close()
outFile.close()
I would handle it like this :
inFile = open("data.txt")
outFile = open("result.txt", "w")
data = inFile.readlines()
outFile.write("".join(data[data.index('Start\n')+1:data.index('End\n')]))
inFile.close()
outFile.close()
This is for when one wants to keep the start and end lines/keywords while extracting the lines between two strings.
Below is the code snippet that I used to extract SQL statements from a shell script:
def process_lines(in_filename, out_filename, start_kw, end_kw):
    try:
        inp = open(in_filename, 'r', encoding='utf-8', errors='ignore')
        out = open(out_filename, 'w+', encoding='utf-8', errors='ignore')
    except FileNotFoundError as err:
        print(f"File {in_filename} not found", err)
        raise
    except OSError as err:
        print(f"OS error occurred trying to open {in_filename}", err)
        raise
    except Exception as err:
        print(f"Unexpected error opening {in_filename} is", repr(err))
        raise
    else:
        with inp, out:
            copy = False
            for line in inp:
                # first IF block to handle if the start and end on same line
                if line.lstrip().lower().startswith(start_kw) and line.rstrip().endswith(end_kw):
                    copy = True
                    if copy:  # keep the starts with keyword
                        out.write(line)
                    copy = False
                    continue
                elif line.lstrip().lower().startswith(start_kw):
                    copy = True
                    if copy:  # keep the starts with keyword
                        out.write(line)
                    continue
                elif line.rstrip().endswith(end_kw):
                    if copy:  # keep the ends with keyword
                        out.write(line)
                    copy = False
                    continue
                elif copy:
                    # write
                    out.write(line)

if __name__ == '__main__':
    infile = "/Users/testuser/Downloads/testdir/BTEQ_TEST.sh"
    outfile = f"{infile}.sql"
    statement_start_list = ['database', 'create', 'insert', 'delete', 'update', 'merge', 'delete']
    statement_end = ";"
    process_lines(infile, outfile, tuple(statement_start_list), statement_end)
Files are iterators in Python, so this means you don't need to hold a "flag" variable to tell you what lines to write. You can simply use another loop when you reach the start line, and break it when you reach the end line:
with open("data.txt") as in_file, open("result.txt", 'w') as out_file:
    for line in in_file:
        if line.strip() == "Start":
            # the inner loop continues from the line after "Start"
            for line in in_file:
                if line.strip() == "End":
                    break
                out_file.write(line)
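The nested loop works because both for statements pull from the same iterator, so the inner loop resumes where the outer one stopped, and the outer loop picks up again after the End line. A small sketch that can be run without any files, using io.StringIO objects as stand-ins for the input and output (copy_blocks and the sample text are illustrative, not from the answer):

```python
import io


def copy_blocks(in_file, out_file):
    """Copy the lines between each Start/End pair, excluding the markers."""
    for line in in_file:
        if line.strip() == "Start":
            # Same iterator: this loop starts on the line after "Start".
            for line in in_file:
                if line.strip() == "End":
                    break
                out_file.write(line)


source = io.StringIO(
    "x\nStart\nGood Morning\nHello World\nEnd\ny\nStart\nAgain\nEnd\n"
)
target = io.StringIO()
copy_blocks(source, target)
result = target.getvalue()
```

This only holds for objects that are their own iterator, such as open files; looping twice over a list would restart from the beginning each time.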