I have a very large file of about 900k values. It is a repetition of values like
/begin throw
COLOR red
DESCRIPTION
"cashmere sofa throw"
10
10
156876
DIMENSION
140
200
STORE_ADDRESS 59110
/end throw
The values keep changing, but I need it like below:
/begin throw
STORE_ADDRESS 59110
COLOR red
DESCRIPTION "cashmere sofa throw" 10 10 156876
DIMENSION 140 200
/end throw
Currently, my approach is to remove the newlines and replace them with spaces.
The store address is constant throughout the file, so I thought of removing it from its index and inserting it before the description:
text_file = open(filename, 'r')
filedata = text_file.readlines();
for num,line in enumerate(filedata,0):
    if '/begin' in line:
        for index in range(num, len(filedata)):
            if "store_address 59110 " in filedata[index]:
                filedata.remove(filedata[index])
                filedata.insert(filedata[index-7])
                break
            if "DESCRIPTION" in filedata[index]:
                try:
                    filedata[index] = filedata[index].replace("\n", " ")
                    filedata[index+1] = filedata[index+1].replace(" ","").replace("\n", " ")
                    filedata[index+2] = filedata[index+2].replace(" ","").replace("\n", " ")
                    filedata[index+3] = filedata[index+3].replace(" ","").replace("\n", " ")
                    filedata[index+4] = filedata[index+4].replace(" ","").replace("\n", " ")
                    filedata[index+5] = filedata[index+5].replace(" ","").replace("\n", " ")
                    filedata[index+6] = filedata[index+6].replace(" ","").replace("\n", " ")
                    filedata[index+7] = filedata[index+7].replace(" ","").replace("\n", " ")
                    filedata[index+8] = filedata[index+8].replace(" ","")
                except IndexError:
                    print("Error Index DESCRIPTION:", index, num)
            if "DIMENSION" in filedata[index]:
                try:
                    filedata[index] = filedata[index].replace("\n", " ")
                    filedata[index+1] = filedata[index+1].replace(" ","").replace("\n", " ")
                    filedata[index+2] = filedata[index+2].replace(" ","").replace("\n", " ")
                    filedata[index+3] = filedata[index+3].replace(" ","")
                except IndexError:
                    print("Error Index DIMENSION:", index, num)
After that I write filedata into another file.
This approach takes too long to run (almost an hour and a half) because, as mentioned earlier, the file is large.
Is there a faster approach to this problem?
You can read the file structure by structure so that you don't have to store the whole content in memory and manipulate it there. By structure, I mean all the values between and including /begin throw and /end throw. This should be much faster.
def rearrange_structure_and_write_into_file(structure, output_file):
    # TODO: rearrange the elements in structure and write the result into output_file
    pass

current_structure = ""
with open(filename, 'r') as original_file:
    with open(output_filename, 'w') as output_file:
        for line in original_file:
            current_structure += line
            if "/end throw" in line:
                rearrange_structure_and_write_into_file(current_structure, output_file)
                current_structure = ""
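One possible way to fill in the TODO above, as a rough sketch only. It assumes the four keywords from the sample in the question, that every bare line belongs to the most recent keyword, and that STORE_ADDRESS should come first in the output. The helper name rearrange_structure is hypothetical, and io.StringIO merely stands in for the real output file.

```python
import io

# Keywords that start a new output line, in the desired output order.
# Taken from the sample in the question; adjust for the real data.
KEYWORDS = ("STORE_ADDRESS", "COLOR", "DESCRIPTION", "DIMENSION")


def rearrange_structure(structure):
    """Return one /begin ... /end block in the new layout (hypothetical helper)."""
    fields = {}        # keyword -> list of text pieces that belong to it
    current = None     # keyword whose values are currently being collected
    begin_line = end_line = None
    for raw in structure.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/begin"):
            begin_line = line
        elif line.startswith("/end"):
            end_line = line
        else:
            first = line.split(None, 1)[0]
            if first in KEYWORDS:
                current = first
                fields[current] = [line]
            elif current is not None:
                # Bare value line: attach it to the most recent keyword.
                fields[current].append(line)
    out = [begin_line]
    out += [" ".join(fields[k]) for k in KEYWORDS if k in fields]
    out.append(end_line)
    return "\n".join(out) + "\n"


def rearrange_structure_and_write_into_file(structure, output_file):
    output_file.write(rearrange_structure(structure))


sample = (
    "/begin throw\n"
    "COLOR red\n"
    "DESCRIPTION\n"
    '"cashmere sofa throw"\n'
    "10\n"
    "10\n"
    "156876\n"
    "DIMENSION\n"
    "140\n"
    "200\n"
    "STORE_ADDRESS 59110\n"
    "/end throw\n"
)
buffer = io.StringIO()  # stands in for the real output file
rearrange_structure_and_write_into_file(sample, buffer)
print(buffer.getvalue(), end="")
```

Each block is handled in a single pass over its own lines, so the total work grows linearly with the file size and no large list is ever modified in place.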
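Inserting into and removing from a long list is likely to make this code slower than it needs to be, and it also makes the code error-prone and difficult to reason about. If any entry lacks a store address, the inner loop does not stop at the end of that entry: it keeps searching through the remaining entries until it finds one. As posted, the test for "store_address 59110 " (lowercase, with a trailing space) does not appear to match the sample lines at all, in which case the inner loop never reaches break and scans to the end of the file for every /begin, which would explain the long run time.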
A better approach would be to break down the code into functions that parse each entry and output it:
KEYWORDS = ["STORE_ADDRESS", "COLOR", "DESCRIPTION", "DIMENSION"]

def parse_lines(lines):
    """ Parse throw data from lines in the old format """
    current_section = None
    r = {}
    for line in lines:
        words = line.strip().split(" ")
        if words[0] in KEYWORDS:
            if words[1:]:
                r[words[0]] = words[1]
            else:
                current_section = r[words[0]] = []
        else:
            current_section.append(line.strip())
    return r

def output_throw(throw):
    """ Output a throw entry as lines of text in the new format """
    yield "/begin throw"
    for keyword in KEYWORDS:
        if keyword in throw:
            value = throw[keyword]
            if isinstance(value, list):
                value = " ".join(value)
            yield f"{keyword} {value}"
    yield "/end throw"

with open(filename) as in_file, open("output.txt", "w") as out_file:
    entry = []
    for line in in_file:
        line = line.strip()
        if line == "/begin throw":
            entry = []
        elif line == "/end throw":
            throw = parse_lines(entry)
            for line in output_throw(throw):
                out_file.write(line + "\n")
        else:
            entry.append(line)
Or, if you really need to maximize performance by removing all unnecessary operations, you could read, parse and write in a single loop with one long conditional, like this:
# KEYWORDS is the list defined in the previous snippet
with open(filename) as in_file, open("output.txt", "w") as out_file:
    entry = []
    in_section = False

    def write(line):
        out_file.write(line + "\n")

    for line in in_file:
        line = line.strip()
        if not line:
            continue  # skip blank lines; line.split()[0] would fail on them
        first = line.split()[0]
        if line == "/begin throw":
            in_section = False
            write(line)
            entry = []
        elif line == "/end throw":
            in_section = False
            for line_ in entry:
                write(line_)
            write(line)
        elif first == "STORE_ADDRESS":
            # written immediately, so it lands right after /begin throw
            in_section = False
            write(line)
        elif line in KEYWORDS:
            # keyword on its own line: the following lines are its values
            in_section = True
            entry.append(line)
        elif first in KEYWORDS:
            # keyword with its value on the same line
            in_section = False
            entry.append(line)
        elif in_section:
            entry[-1] += " " + line
Related
def add():
    while True:
        try:
            a = int(input("How many words do you want to add:"))
            if a >= 0:
                break
            else:
                raise ValueError
        except ValueError:
            print("Not valid ")
    return a
for i in range(add()):
    key_i = input(f"Turkish meaning: {i + 1}: ")
    value_i = input("translated version: ")
    with open('words.txt', 'a+') as f:
        f.write("'"+key_i+':')+ f.write(value_i+"'"+",")
My goal is to create my own dictionary, but I am appending the entries to the txt file as one long list, so the file ends up looking like this:
words = {'araba:kol',
When I search the txt file, it prints the whole list instead of the matching entry:
def search():
    while 1:
        search = str(input("Search: "))
        if search not in["exit", "Exit"]:
            with open('words.txt', 'r+') as f:
                line = f.readline()
                while line:
                    data = line.find(search)
                    if not data == -1:
                        print(line.rstrip('\n'))
                        line = f.readline()
                    else:
                        line = f.readline()
        else:
            break
    f.close()
What can I do to make it print only the matching entry, like this?
car:araba
Use the json module so that you don't have to format the dictionary line by line yourself.
import json
with open('words.json', 'w') as f:
    json.dump({key_i: value_i}, f)
with open('words.json', 'r') as f:
    d2 = json.load(f)
d2 is now the data that you wrote to the file.
Note that the file is opened with 'w' rather than 'a+': a JSON file holds a single top-level object, and appending a second dictionary to it would make json.load fail.
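To keep several words across runs with this approach, one option is to load the existing dictionary, update it in memory and write the whole thing back. The following is only a sketch: save_words and load_words are hypothetical helper names, the word pairs are made up for the demo, and a temporary directory is used so no real file is touched.

```python
import json
import os
import tempfile


def save_words(path, words):
    """Write the whole dictionary to the file as a single JSON object."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(words, f, ensure_ascii=False, indent=2)


def load_words(path):
    """Return the stored dictionary, or an empty one if the file is missing."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Demo in a temporary directory (the file name is an assumption).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "words.json")
    words = load_words(path)      # empty dict on the first run
    words["car"] = "araba"        # add or update entries in memory
    words["arm"] = "kol"
    save_words(path, words)       # rewrite the file in one go
    reloaded = load_words(path)

# Searching becomes a plain dictionary lookup instead of scanning lines.
key = "car"
found = f"{key}:{reloaded[key]}" if key in reloaded else None
print(found)
```

Because the lookup is by key, a search returns only the matching pair rather than the whole line of entries.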
points = "temp"
a = "temp"
f = "temp"
def pointincrementer():
global points
points = 0
for line in f:
for word in a:
if word in line:
scorelen = int(len(user+","))
scoreval = line[0:scorelen]
isolatedscore = line.replace(scoreval,'')
if "," in line:
scorestr = isolatedscore.replace(",","")
score = int(scorestr)
points = score + 1
print(points)
def score2():
f = open('test.txt','r')
a = [user]
lst = []
for line in f:
for word in a:
if word in line:
pointincrementer()
print(points)
point = str(points)
winning = (user+","+point+","+"\n")
line = line.replace(line,winning)
lst.append(line)
f.close()
f = open('test.txt','w')
for line in lst:
f.write(line)
f.close()
print("Points updated")
user = input("Enter username: ") #change so user = winners userid
with open('test.txt') as myfile:
if user in myfile.read():
score2()
else:
f = open('test.txt','r')
f2 = f.read()
f3 = (f2+"\n"+user)
f.close()
f = open('test.txt','w')
f.write(f3)
f.close()
score2()
This is paired with test.txt, which looks like this:
one,1,
two,5,
three,4,
four,94,
When this code is run, it asks the user for their name (as expected), then prints 0 (when it should print the user's score) and then Points updated. Does anybody know how to sort this out?
There are many problems with your code. You should not be using global variables like that. Each function should be passed what it needs, do its computing, and return values for the caller to handle. You should not be reading the file multiple times. And you can't write the file while you still have it open with the with statement.
Here, I read the file at the beginning into a Python dictionary. The code just updates the dictionary, then writes it back out at the end. This makes for a simpler and more maintainable structure.
def readdata(fn):
    data = {}
    with open(fn) as f:
        for row in f:
            info = row.strip().split(',')
            if len(info) < 2:
                continue  # skip blank lines and names without a score
            data[info[0]] = int(info[1])
    return data

def writedata(fn, data):
    with open(fn, 'w') as f:
        for k, v in data.items():
            print(f"{k},{v}", file=f)

def pointincrementer(data, user):
    return data[user] + 1

def score2(data, user):
    points = pointincrementer(data, user)
    print(points)
    data[user] = points
    print("Points updated")

user = input("Enter username: ")
data = readdata('test.txt')
if user not in data:
    data[user] = 0
score2(data, user)
writedata('test.txt', data)
The f in pointincrementer() refers to the "temp" string declared on the third line. The f in score2() refers to the file handle declared immediately below the function header. To get around this, you can pass the file handle into pointincrementer():
def pointincrementer(file_handle):
global points
points = 0
for line in file_handle:
for word in a:
if word in line:
scorelen = int(len(user+","))
scoreval = line[0:scorelen]
isolatedscore = line.replace(scoreval,'')
if "," in line:
scorestr = isolatedscore.replace(",","")
score = int(scorestr)
points = score + 1
print(points)
def score2():
file_handle = open('test.txt','r')
a = [user]
lst = []
for line in f:
print(line)
for word in a:
if word in line:
pointincrementer(file_handle)
print(points)
point = str(points)
winning = (user+","+point+","+"\n")
line = line.replace(line,winning)
lst.append(line)
f.close()
f = open('test.txt','w')
for line in lst:
f.write(line)
f.close()
print("Points updated")
This leads to a parsing error. However, as you haven't described what each function is supposed to do, this is the limit to which I can help. (The code is also extremely difficult to read -- the lack of readability in this code snippet is likely what caused this issue.)
I used this code to delete a word from a text file.
f = open('./test.txt','r')
a = ['word1','word2','word3']
lst = []
for line in f:
    for word in a:
        if word in line:
            line = line.replace(word,'')
    lst.append(line)
f.close()
f = open('./test.txt','w')
for line in lst:
    f.write(line)
f.close()
But for some reason, if other words share characters with the word being deleted, all of those characters get deleted as well.
For example, in my code:
def cancel():
global refID
f1=open("refID.txt","r")
line=f1.readline()
flag = 0
while flag==0:
refID=input("Enter the reference ID or type 'q' to quit: ")
for i in line.split(','):
if refID == i:
flag=1
if flag ==1:
print("reference ID found")
cancelsub()
elif (len(refID))<1:
print("Reference ID not found, please re-enter your reference ID\n")
cancel()
elif refID=="q":
flag=1
else:
print("reference ID not found\n")
menu()
def cancelsub():
global refIDarr, index
refIDarr=[]
index=0
f = open('flightbooking.csv')
csv_f = csv.reader(f)
for row in csv_f:
refIDarr.append(row[1])
for i in range (len(refIDarr)):
if refID==refIDarr[i]:
index=i
print(index)
while True:
proceed=input("You are about to cancel your flight booking, are you sure you would like to proceed? y/n?: ")
while proceed>"y" or proceed<"n" or (proceed>"n" and proceed<"y") :
proceed=input("Invalid entry. \nPlease enter y or n: ")
if proceed=="y":
Continue()
break
elif proceed=="n":
main_menu
break
exit
break
def Continue():
lines = list()
with open('flightbooking.csv', 'r') as readFile:
reader = csv.reader(readFile)
for row in reader:
lines.append(row)
for field in row:
if field ==refID:
lines.remove(row)
break
with open('flightbooking.csv', 'w') as writeFile:
writer = csv.writer(writeFile)
writer.writerows(lines)
f = open('refID.txt','r')
a=refIDarr[index]
print(a)
lst = []
for line in f:
for word in a:
if word in line:
line = line.replace(word,'')
lst.append(line)
print(lst)
f.close()
f = open('refID.txt','w')
for line in lst:
f.write(line)
f.close()
print("Booking successfully cancelled")
menu()
When the code is run, the refID variable holds a single word, and the code should replace just that word with an empty string. Instead it takes that word, e.g. 'AB123', finds every other word that contains an 'A', a 'B' or any of the digits, and removes those characters too. How do I make it delete only the word?
Text file before running code:
AD123,AB123
Expected Output in the text file:
AD123,
Output in text file:
D,
Edit: I have added the entire code, and maybe you can help now after seeing that the array is being appended to and then being used to delete from a text file.
Here's my take.
refIDarr = ["AB123"]
a = refIDarr[0]  # a == "AB123"
Strings in Python are iterable, so when you do for word in a, you get 5 iterations in which each word is actually a single character.
Something like the following is being executed:
if "A" in line:
line = line.replace("A","")
if "B" in line:
line = line.replace("B","")
if "1" in line:
line = line.replace("1","")
if "2" in line:
line = line.replace("2","")
if "3" in line:
line = line.replace("3","")
The correct way to do this is to loop over refIDarr:
for word in refIDarr:
line = line.replace(word,'')
NOTE: You don't need the if statement: if the word is not in the line, replace returns the line unchanged.
"abc".replace("bananan", "")  # => "abc"
Here's a working example:
refIDarr = ["hello", "world", "lol"]
with open('mytext.txt', "r") as f:
    data = f.readlines()
for word in refIDarr:
    data = [line.replace(word, "") for line in data]
with open("mytext.txt", "w") as newf:
    newf.writelines(data)
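One caveat with str.replace is that it removes substring matches as well, so "AB123" would also be cut out of a longer ID such as "AB1234". A minimal sketch of an exact-match alternative follows, assuming the IDs on each line are separated by commas as in the question's refID.txt; remove_ids is a hypothetical helper name.

```python
def remove_ids(line, ids_to_remove):
    """Drop exact matches of the given IDs from one comma-separated line."""
    unwanted = set(ids_to_remove)
    newline = "\n" if line.endswith("\n") else ""
    tokens = line.rstrip("\n").split(",")
    # Compare whole tokens, so IDs that merely share characters are untouched.
    kept = [t for t in tokens if t.strip() not in unwanted]
    return ",".join(kept) + newline


print(remove_ids("AD123,AB123\n", ["AB123"]), end="")  # AB123 removed, AD123 kept
print(remove_ids("AB1234,AB123", ["AB123"]))           # AB1234 is a different ID and stays
```

Unlike the expected output shown in the question, this does not leave a trailing comma behind; whether that matters depends on how the file is read back.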
The problem is here:
a=refIDarr[index]
If refIDarr is a list of words, indexing it makes a a single word (a string). Later, when you iterate over a (for word in a:), word is a single character and not a word as you expect, so the code ends up removing individual characters from the file instead of the word itself.
To avoid that, remove a=refIDarr[index] and change your loop to:
for line in f:
    for word in refIDarr:
        if word in line:
            line = line.replace(word,'')
Note that this removes every ID held in refIDarr. If only the selected booking should be deleted, drop the inner loop and call line.replace(refID, '') instead.
I have a .txt with:
#Date 111111:UhUidsiIds
#Name Sebastian-Forset
#Date 222222:UdfasdUDsa
#Name Sebastian_Forset2
#Date 333333:UDsafduD
#Name Solaris Mage
#Date 444444:Ghdsasra
#Name Marge S
and a file with:
#Name Sebastian Forset
#Date 191020
#Name Sebastian Forset2
#Date 201020
#Date Homer S
#Date 281902
The names are the same apart from a few differing characters (spaces, -, _, etc.).
I would like to copy the numbers from the second file into the first, in order to get a final txt file with:
#Name Sebastian Forset
#Date 191020:UhUidsiIds
#Name Sebastian Forset2
#Date 201020:UdfasdUDsa
#Name Solaris Mage
#Date 281902:UDsafduD
#Name Marge S
#Date 444444:Ghdsasra
This is my code. It should merge the files, but it only copies the lines whose names are exactly the same:
def isInFile(l, f):
with open(f, 'r') as f2:
for line in f2:
if l == line:
return True
return False
def similitudes(file1, file2):
same = 0
data = ''
copy = False
with open(file1, 'r') as f1:
for line in f1:
if copy == True:
data += line
if line == '\n' or line[0:6] != '#Name ':
copy = False
if (line[0:6] == '#Name ') or line[0:6] == '#Date ':
print line
if isInFile(line, file2) == True:
copy = True
data += line
print "true"
else:
print "ok"
same += 1
return data
def main(argv=2):
print (sys.argv[1])
print (sys.argv[2])
if argv == 2:
out = open('final.txt', 'w')
data = (
similitudes(sys.argv[1], sys.argv[2]) + '\n'
)
out.write(data)
out.close()
else:
print ("This program need 2 files")
exit (0)
return 0
if __name__ == '__main__':
status = main()
sys.exit(status)
First, list the characters that may differ. Let's say "-", "_" and " ".
Now split the two strings on these delimiters. You can use the re module in Python:
>>> a='Mr-Sebastian_Forset '
>>> import re
>>> re.split('[-_ ]+', a.strip())
['Mr', 'Sebastian', 'Forset']
If the resulting lists for the two strings are equal, paste the number from the second file into the first one.
You can use the same delimiter approach to split off the number and paste it into the other file.
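The comparison described above can be wrapped in a small helper. This is only a sketch: normalize_name is a hypothetical name, and it assumes that "-", "_" and whitespace are the only characters that differ between the two files.

```python
import re


def normalize_name(name):
    """Collapse '-', '_' and whitespace so that name variants compare equal."""
    parts = re.split(r"[-_\s]+", name.strip())
    return " ".join(p for p in parts if p)


# Names taken from the two sample files in the question.
print(normalize_name("Sebastian-Forset") == normalize_name("Sebastian Forset"))
print(normalize_name("Sebastian_Forset2") == normalize_name("Sebastian Forset2"))
print(normalize_name("Marge S") == normalize_name("Homer S"))
```

The first two comparisons are equal and the last is not, so genuinely different names such as "Marge S" and "Homer S" still have to be matched by hand.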
Adding another answer that points out the bugs in your code.
Consider the following piece of code:
if (line[0:6] == '#Name ') or line[0:6] == '#Date ':
    print line
    if isInFile(line, file2) == True:
        copy = True
        data += line
Here you check whether the line starts with either "#Name " or "#Date ", and then call isInFile() with line and file2 as arguments.
This is the first issue: looking up a single line that starts with "#Name " on its own achieves nothing in your case.
Instead, when the current line starts with "#Date ", pass the previous line (the name) and the file to this method.
The second issue is the definition of isInFile(), which effectively does nothing useful:
if l == line:
    return True
It only checks whether a line from file1 also appears in file2 and, if so, that line is copied to the output.
So your program just outputs the lines that file1 and file2 have in common.
The modified code should look like this:
import sys

def isInFile(l, f):
    # Returns (found, line that follows the match in f)
    line_found = False
    required_line = None
    with open(f, 'r') as f2:
        for line in f2:
            if line_found:
                required_line = line
                break
            elif l == line:
                line_found = True
    return (line_found, required_line)

def similitudes(file1, file2):
    same = 0
    data = ''
    copy = False
    previous_line = None
    with open(file1, 'r') as f1:
        for line in f1:
            if copy == True:
                data += line
            if line == '\n' or line[0:6] != '#Name ':
                copy = False
            if (line[0:6] == '#Name '):
                print(line)
                previous_line = line
            elif line[0:6] == '#Date ':
                print(line)
                file2_line_info = isInFile(previous_line, file2)
                if file2_line_info[0] == True:
                    copy = True
                    data += file2_line_info[1]
                    print("true")
    return data

def main(argv=2):
    print (sys.argv[1])
    print (sys.argv[2])
    if argv == 2:
        out = open('final.txt', 'w')
        data = (
            similitudes(sys.argv[1], sys.argv[2]) + '\n'
        )
        out.write(data)
        out.close()
    else:
        print ("This program need 2 files")
        exit (0)
    return 0

if __name__ == '__main__':
    status = main()
    sys.exit(status)
Note: This is not the Pythonic way of doing things. As mentioned in my answer above (https://stackoverflow.com/a/34696778/3534696), use the re module to solve the problem efficiently.
Read the first file into a dictionary, using maketrans/translate to clean up the name.
Using zip(file, file) to read 2 lines of the file at a time makes it much easier to handle.
And using .split(' ', 1)[1] to get rid of the first column.
And .strip() to get rid of any surrounding whitespace (i.e. \n)
Then you can read the second file updating the dictionary.
In Python3 this looks like:
>>> punc = str.maketrans('_-', ' ') # import string & string.maketrans() in Py2
>>> with open(filename1) as file1, open(filename2) as file2:
... data = {name.split(' ', 1)[1].strip().translate(punc):
... date.split(' ', 1)[1].strip().split(':')
... for name, date in zip(file1, file1)}
... for n, d in zip(file2, file2):
... data[n.split(' ', 1)[1].strip()][0] = d.split(' ', 1)[1].strip()
>>> data
{'Marge S': ['444444', 'Ghdsasra'],
'Sebastian Forset': ['191020', 'UhUidsiIds'],
'Sebastian Forset2': ['201020', 'UdfasdUDsa'],
'Solaris Mage': ['281902', 'UDsafduD']}
After that it is just a matter of writing the dictionary out to a new file.
>>> with open(<output>, 'w+') as output:
... for name, date in data.items():
... output.write('#Name {}\n'.format(name))
... output.write('#Date {}:{}\n'.format(*date))
Note: I had to change 'Homer S' to 'Solaris Mage' in the second file to get the stated output.
So I am trying to write a piece of code that takes text from a file, moves it into a dictionary and then processes it. I keep getting this error:
File "C:\Users\Oghosa\Assignment2.py", line 12, in <module>
builtins.IndexError: string index out of range
Here's my program:
endofprogram = False
dic = {}
try:
    filename = input("Please Enter the Filename:")
    infile = open(filename, 'r')
except IOError:
    print("Error Reading File! Program ends here!")
    endofprogram = True
if endofprogram == False:
    for line in infile:
        line = line.strip("\n")
        if (line != " ") and (line[0] != "#"):
            item = line.split(":")
            print(items)
            dic["Animal id"] = item[0]
            dic["Date"] = item[1]
            dic["Station"] = item[2]
    print(dic)
Can someone aid in pointing out my mistake please?
Here's a sample input text:
#Comments
a01:01-24-2011:s1
a03:01-24-2011:s2
<blank line>
<blank line>
a02:01-24-2011:s2
a03:02-02-2011:s2
a03:03-02-2011:s1
a02:04-19-2011:s2
<blank line>
#comments
a01:05-14-2011:s2
a02:06-11-2011:s2
a03:07-12-2011:s1
a01:08-19-2011:s1
a03:09-19-2011:s1
a03:10-19-2011:s2
a03:11-19-2011:s1
a03:12-19-2011:s2
Well, you should at least print the offending line so you know what the culprit is:
for line in infile:
    items = line.strip("\n")
    try:
        if (line.strip() != "") and (items[0] != "#"):
            items = line.split(":")  # I don't like your reuse of line, so changing to items
            ....
    except IndexError:
        print(items)  # prints the offending line
endofprogram = False
attrs=["Animal id","Date","Station"]
dictionary=[]
try:
    # filename = input("Please Enter the Filename:")
    infile = open('rite.txt', 'r')
except IOError:
    print("Error Reading File! Program ends here!")
    endofprogram = True
if endofprogram == False:
    for line in infile:
        line = line.strip("\n")
        if (line != "") and (line[0] != "#"):
            item = line.split(":")
            dictionary.append(dict(zip(attrs, item)))
    print(dictionary)
Your problem is that when there are blank lines in the file, line[0] doesn't exist. To fix this problem, try this version:
endofprogram = False
dic = {}
try:
    filename = input("Please Enter the Filename:")
    infile = open(filename, 'r')
except IOError:
    print("Error Reading File! Program ends here!")
    endofprogram = True
if endofprogram == False:
    for line in infile:
        line = line.strip("\n")
        if len(line):
            if line[0] != "#":
                item = line.split(":")
                print(item)
                dic["Animal id"] = item[0]
                dic["Date"] = item[1]
                dic["Station"] = item[2]
    print(dic)
Also worth noting is that you are overwriting dic on each iteration of the loop, so after the loop is done dic will only contain the information from the last line of the file.
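If every record should be kept rather than just the last one, a list of dictionaries is one option. The following is a sketch under assumptions: parse_records is a hypothetical helper, each data line is assumed to have exactly three colon-separated fields, and the sample lines are a shortened version of the input from the question.

```python
def parse_records(lines):
    """Return one dict per data line, skipping blank lines and '#' comments."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(":")
        if len(parts) != 3:
            continue  # malformed line; a real script might report it instead
        animal_id, date, station = parts
        records.append({"Animal id": animal_id, "Date": date, "Station": station})
    return records


sample = ["#Comments\n", "a01:01-24-2011:s1\n", "\n", "a03:01-24-2011:s2\n"]
records = parse_records(sample)
print(len(records))
print(records[-1])
```

An open file object can be passed in place of the sample list, since the function only iterates over lines.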
The problem is you weren't checking for empty lines correctly in the
if (line != " ") and (line[0] != "#"):
statement. This is because they wouldn't even have a space left in them after line = line.strip("\n") executed, so just about any indexing operation will fail.
The code below has that and several other coding errors fixed. Note it's important to post your actual code here to make it easier for people to help you.
endofprogram = False
dic = {}
try:
    filename = input("Please Enter the Filename:")
    infile = open(filename, 'r')
except IOError:
    print("Error Reading File! Program ends here!")
    endofprogram = True
if not endofprogram:
    for line in infile:
        line = line.strip("\n")
        if line and line[0] != "#":
            items = line.split(":")
            print(items)
            dic["Animal id"] = items[0]
            dic["Date"] = items[1]
            dic["Station"] = items[2]
    print(dic)
Do you have a blank line in your file? On line 12 you may want to check that the line has text before indexing it using line[0]. Do you really have a line with an empty string, or should line 12 really read:
if line.strip() and (line[0] != "#"):
Edit.. Adding a full example.
dic = {}
filename = input("Please Enter the Filename:")
try:
    infile = open(filename, 'r')
except IOError:
    print("Error Reading File! Program ends here!")
else:
    for line in infile:
        line = line.strip()
        if line and line[0] != "#":
            item = line.split(":")
            dic["Animal id"] = item[0]
            dic["Date"] = item[1]
            dic["Station"] = item[2]