Python compare bombs if files not sorted

I have written some code to compare two files via a search string.
The file = master data file
The checkfile = list of states & regions
When the file contains more than one state and they are not in sorted order, it bombs out.
How can I get this to work without having to sort my "file"?
The error message:
Traceback (most recent call last):
  File "./gangnamstyle.py", line 27, in <module>
    csvLineList_2 = csv2[lineCount].split(",")
IndexError: list index out of range
My code:
#!/usr/bin/python
import csv
file = raw_input("Please enter the file name to search: ") #File name
checkfile = raw_input("Please enter the file with the search data: ") #datafile
save_file = raw_input("Please enter the file name to save: ") #Save Name
search_string = raw_input("Please type string to search for: ") #search string
#row = raw_input("Please enter column text is in: ") #column number - starts at 0
#ID_INDEX = row
#ID_INDEX = int(ID_INDEX)
f = open(file)
f1 = open(save_file, 'a')
csv1 = open(file, "r").readlines()
csv2 = open(checkfile, "r").readlines()
#what looks for the string in the file
copyline=False
for line in f.readlines():
    if search_string in line:
        copyline=True
    if copyline:
        f1.write(line)
for lineCount in range( len( csv1) ):
    csvLineList_1 = csv1[lineCount].split(",")
    csvLineList_2 = csv2[lineCount].split(",")
    if search_string == csvLineList_2[0]:
        f1.write(csvLineList_2[2])
f1.close() #close saved file
f.close() #close source file
#csv1.close()
#csv2.close()

OK, so that error message is an IndexError: list index out of range in the line csvLineList_2 = csv2[lineCount].split(","). There's only one indexing happening there, so apparently lineCount is too big for csv2.
lineCount is one of the values of range(len(csv1)). That makes it automatically in range for csv1. Apparently csv1 and csv2 are not the same length, causing the IndexError.
Now that's quite possible, because they contain lines from different files. Apparently the files don't have an equal number of lines.
To be honest I have no clue why you are reading the lines into csv1 at all. You loop over those lines and split them (into the variable csvLineList_1), but you never use that variable.
I think your loop should just be:
for line in csv2:
    parts = line.strip().split(",")  # line.strip() removes whitespace and the newline at the end of the line
    if search_string == parts[0]:
        f1.write(parts[2] + "\n")  # Add a newline, you probably want it
I hope this helps.
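To see why indexing csv2 with positions taken from csv1 fails while looping over csv2 directly does not, here is a minimal sketch that works on in-memory lists of lines instead of files. The function name regions_for, the sample data and the three-column state,<something>,region layout are assumptions based on the parts[0]/parts[2] indexing above.

```python
def regions_for(search_string, check_lines):
    """Return the third field of every line whose first field equals search_string.

    The loop runs over check_lines itself, so its length never has to match
    the length of any other file.
    """
    found = []
    for line in check_lines:
        parts = line.strip().split(",")
        # Skip blank or short lines instead of raising IndexError on parts[2].
        if len(parts) < 3:
            continue
        if parts[0] == search_string:
            found.append(parts[2])
    return found


master = ["a\n", "b\n", "c\n", "d\n"]          # 4 lines (hypothetical data)
check = ["TX,x,South\n", "OR,y,West\n", "\n"]  # 3 lines (hypothetical data)

# Indexing check with positions taken from master reproduces the original error.
try:
    for i in range(len(master)):
        check[i].split(",")
    mismatch_error = None
except IndexError as exc:
    mismatch_error = exc

print(regions_for("OR", check))  # ['West']
```

The length check also covers the common case of a trailing blank line in the check file, which would otherwise make parts[2] fail.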

The error you're getting is probably due to the file lengths not being equal.
It's not exactly clear from what you've written what you're hoping to do. It looks to me like (maybe) you want to find a search term in the "master file" and, if you find it, write that line to the "save file". It also looks like you want to find the same search term in the very first field of the "check file" and, if you find it, write the contents of the third field to the "save file". If that's wrong, it's because your code has bugs.
Either way, there are a number of issues in the code you've posted, and you're probably going to get at least some mileage out of using the csv module to do what you're trying to do.
Maybe post a fuller problem description.
Edit:
import csv
import sys
def build_state_lookup(fn):
    with open(fn) as infile:
        reader = csv.reader(infile)
        # throw away first line (header)
        next(reader)
        # now build a dictionary mapping state to region
        lookup = {state: region for (state, _, region) in reader}
    return lookup
def process_big_file(in_fn, checkfile, out_fn):
    lookup = build_state_lookup(checkfile)
    with open(in_fn) as infile:
        with open(out_fn, 'w') as outfile:
            reader = csv.reader(infile)
            writer = csv.writer(outfile)
            # output the header row
            writer.writerow(next(reader) + ['Region'])
            for row in reader:
                state = row[0]
                region = lookup.get(state, "No Region Found")
                row.append(region)
                writer.writerow(row)
def main():
    # expects three arguments: master file, check file, output file
    process_big_file(*sys.argv[1:])
if __name__ == '__main__':
    main()
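Because the lookup is a dictionary, the order of rows in the master file no longer matters. The sketch below runs the same idea on in-memory strings (Python 3, via io.StringIO) so it can be tried without files; the function add_region_column, the column names and the sample data are hypothetical, and both inputs are assumed to start with a header row.

```python
import csv
import io


def build_state_lookup(check_text):
    """Map state -> region from CSV text whose rows look like state,<ignored>,region."""
    reader = csv.reader(io.StringIO(check_text))
    next(reader)  # skip the header row
    return {state: region for (state, _, region) in reader}


def add_region_column(master_text, lookup):
    """Append a Region column to every row, keyed on the first column."""
    reader = csv.reader(io.StringIO(master_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(next(reader) + ["Region"])
    for row in reader:
        row.append(lookup.get(row[0], "No Region Found"))
        writer.writerow(row)
    return out.getvalue()


check_text = "State,Code,Region\nTX,48,South\nOR,41,West\n"
master_text = "State,Sales\nOR,10\nTX,5\nOR,7\nZZ,1\n"  # deliberately unsorted
result = add_region_column(master_text, build_state_lookup(check_text))
print(result)
```

The repeated, unsorted OR rows both get "West", and the unknown ZZ row falls back to the default instead of raising an error.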

Related

Find coincidence and add column

I want to achieve a specific task. I have 2 files; the first one contains emails and credentials:
xavier.desprez#william.com:Xavier
xavier.locqueneux#william.com:vocojydu
xaviere.chevry#pepe.com:voluzigy
Xavier.Therin#william.com:Pussycat5
xiomara.rivera#william.com:xrhj1971
xiomara.rivera#william-honduras.william.com:xrhj1971
and the second one, with emails and location:
xavier.desprez#william.com:BOSNIA
xaviere.chevry#pepe.com:ROMANIA
Whenever an email from the first file is found in the second file, I want the row to be replaced by EMAIL:CREDENTIAL:LOCATION, and when it is not found, it should become EMAIL:CREDENTIAL:BLANK.
So the final file must look like this:
xavier.desprez#william.com:Xavier:BOSNIA
xavier.locqueneux#william.com:vocojydu:BLANK
xaviere.chevry#pepe.com:voluzigy:ROMANIA
Xavier.Therin#william.com:Pussycat5:BLANK
xiomara.rivera#william.com:xrhj1971:BLANK
xiomara.rivera#william-honduras.william.com:xrhj1971:BLANK
I have made several attempts in Python, but they are not even worth posting because I am not really close to the solution.
Regards!
EDIT:
This is what I tried:
import os
import sys
with open("test.txt", "r") as a_file:
    for line_a in a_file:
        stripped_email_a = line_a.strip().split(':')[0]
        with open("location.txt", "r") as b_file:
            for line_b in b_file:
                stripped_email_b = line_b.strip().split(':')[0]
                location = line_b.strip().split(':')[1]
                if stripped_email_a == stripped_email_b:
                    a = line_a + ":" + location
                    print(a.replace("\n",""))
                else:
                    b = line_a + ":BLANK"
                    print (b.replace("\n",""))
This is the result I get:
xavier.desprez#william.com:Xavier:BOSNIA
xavier.desprez#william.com:Xavier:BLANK
xaviere.chevry#pepe.com:voluzigy:BLANK
xaviere.chevry#pepe.com:voluzigy:ROMANIA
xavier.locqueneux#william.com:vocojydu:BLANK
xavier.locqueneux#william.com:vocojydu:BLANK
Xavier.Therin#william.com:Pussycat5:BLANK
Xavier.Therin#william.com:Pussycat5:BLANK
xiomara.rivera#william.com:xrhj1971:BLANK
xiomara.rivera#william.com:xrhj1971:BLANK
xiomara.rivera#william-honduras.william.com:xrhj1971:BLANK
xiomara.rivera#william-honduras.william.com:xrhj1971:BLANK
I am very close but I get duplicates ;)
Regards

The duplication comes from reading the two files in a nested way: once a line from test.txt is read, you open location.txt and process all of it. Then you read the second line from test.txt, reopen location.txt and process it again, so every line of test.txt is printed once per line of location.txt.
Instead, load the necessary data from location.txt into, say, a dictionary, and then use it while reading test.txt:
email_loc_dict = {}
with open("location.txt", "r") as b_file:
    for line_b in b_file:
        splits = line_b.strip().split(':')
        email_loc_dict[splits[0]] = splits[1]
with open("test.txt", "r") as a_file:
    for line_a in a_file:
        line_a = line_a.strip()
        stripped_email_a = line_a.split(':')[0]
        if stripped_email_a in email_loc_dict:
            a = line_a + ":" + email_loc_dict[stripped_email_a]
            print(a)
        else:
            b = line_a + ":BLANK"
            print(b)
Output:
xavier.desprez#william.com:Xavier:BOSNIA
xavier.locqueneux#william.com:vocojydu:BLANK
xaviere.chevry#pepe.com:voluzigy:ROMANIA
Xavier.Therin#william.com:Pussycat5:BLANK
xiomara.rivera#william.com:xrhj1971:BLANK
xiomara.rivera#william-honduras.william.com:xrhj1971:BLANK
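The same dictionary approach can be wrapped in a function that takes lists of lines, which makes it easy to check against the expected output from the question. The name merge_locations and the missing parameter are hypothetical; split(":", 1) is a small change so that a location containing a colon is kept whole.

```python
def merge_locations(credential_lines, location_lines, missing="BLANK"):
    """Return EMAIL:CREDENTIAL:LOCATION lines, using `missing` when no location is known."""
    # Read the location data once into a dict: email -> location.
    locations = {}
    for line in location_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        email, location = line.split(":", 1)
        locations[email] = location

    merged = []
    for line in credential_lines:
        line = line.strip()
        if not line:
            continue
        email = line.split(":", 1)[0]
        merged.append(line + ":" + locations.get(email, missing))
    return merged


credentials = [
    "xavier.desprez#william.com:Xavier\n",
    "xavier.locqueneux#william.com:vocojydu\n",
    "xaviere.chevry#pepe.com:voluzigy\n",
    "Xavier.Therin#william.com:Pussycat5\n",
    "xiomara.rivera#william.com:xrhj1971\n",
    "xiomara.rivera#william-honduras.william.com:xrhj1971\n",
]
locations = ["xavier.desprez#william.com:BOSNIA\n", "xaviere.chevry#pepe.com:ROMANIA\n"]

for out_line in merge_locations(credentials, locations):
    print(out_line)
```

Each location line is read exactly once, so the work grows with the size of the two files added together rather than multiplied.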

I am wondering why my search phonebook function is not working correctly

I currently have a working phonebook with 4 options. The only problem is that the search option does not print all of the matches.
If I type a name and there is a match in the phonebook, it writes that line into a text file (phone02.txt, which starts out blank).
Then, after all of the matches have been written to the text file, my program reads the new text file and prints them in my preferred format.
The readFile function isn't working properly right now for the new text file.
It works fine on phone.txt (the original text file), which contains the same information: names and numbers separated by a comma.
Because this works, I cannot figure out why the readFile function does not work for phone02.txt when its lines are also name,number\n
def readFile1(filename):
    phonebook = []
    file = open(filename, "r")
    for aline in file:
        person = aline.split(",")
        if person[1][-1] == '\n' :
            pn = person.pop(1)
            person.append(pn[:-1])
            phonebook.append(person)
        elif person[1][-1] != '\n' :
            phonebook.append(person)
    file.close()
    return phonebook
def printEntries1(phonebook):
    readFile1("phone02.txt")
    print("Name Phone Number")
    print("------------------- --------------")
    for i in range (len(phonebook)):
        person = phonebook[i]
        print(i,"{:<20s} {:>14s}".format(person[0],person[1]))
    print("------------------- --------------")
def searchEntry():
    search = input("Type a name to search for")
    with open("phone.txt", "r") as file:
        lines = file.readlines()
    for line in lines:
        if search in line:
            outfile = open("phone02.txt", "a")
            outfile.write(line)
            phonebook = readFile1("phone02.txt")
            print(readFile1("phone02.txt"))
            printEntries1(phonebook)
    outfile = open("phone02.txt", "r+")
    outfile.truncate()
print(searchEntry())
I am not sure how to make printEntries1 print all of the matches (name and number) from phone02.txt.
Here is an example of the phone.txt file:
Polly,549-5393
Bud Wieser,(213) 477-3928
Jack,277-4829
Mike Dunleavy,335-3453
Robert Darn,219-473-4373
Earl Lee,703-304-8393
Tim Bean,(612) 493-2629
Bud,(701) 487-8522
If I enter "Bud", the 2 lines that contain Bud are written to phone02.txt, but they are not printed correctly.
It seems that (in this example), when the 2 lines containing Bud are put into phone02.txt, only the first line is printed:
Name. Number
------------------- ------------------
0 Bud Wieser. (218) 477-3928
I know this is a lot of information for most likely an easy fix, but I think it should help with the issue.
Thanks for any help.

You never close the file object you use to write to phone02.txt. Writes are buffered, so the new lines are only guaranteed to be in the file after you close the file object or explicitly flush it (outfile.flush()). Please try:
if search in line:
    outfile = open("phone02.txt", "a")
    outfile.write(line)
    outfile.close()
Anyway, you open the file for appending many times: please either open it once before the loop and close it afterwards, or open and close it in every iteration.
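Another option is to skip the intermediate file and keep the matches in a list, so nothing is ever read back from a half-written file. This is a minimal sketch using the phone.txt sample from the question; the names search_phonebook and format_entries are hypothetical, and the table layout only approximates printEntries1.

```python
def search_phonebook(lines, name):
    """Return [name, number] pairs for every 'name,number' line that contains `name`."""
    matches = []
    for line in lines:
        if name in line:
            # rstrip("\n") replaces the manual check for a trailing newline.
            person_name, number = line.rstrip("\n").split(",", 1)
            matches.append([person_name, number])
    return matches


def format_entries(phonebook):
    """Build the printed table as a list of strings, one per output line."""
    rows = ["{:<20s} {:>14s}".format("Name", "Phone Number"), "-" * 20 + " " + "-" * 14]
    for i, (person_name, number) in enumerate(phonebook):
        rows.append("{} {:<20s} {:>14s}".format(i, person_name, number))
    return rows


phone_lines = [
    "Polly,549-5393\n",
    "Bud Wieser,(213) 477-3928\n",
    "Jack,277-4829\n",
    "Bud,(701) 487-8522\n",
]
matches = search_phonebook(phone_lines, "Bud")
table = format_entries(matches)
print("\n".join(table))
```

If phone02.txt is still wanted, it can be written inside a with open("phone02.txt", "w") block after the loop, so the file is closed (and flushed) before anything reads it.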

Why is for loop iterated only once?

I'm rather new to coding, but since I have to write a few letters, I wanted to write a script that changes the name within the letter automatically.
I've got a text file with placeholders for the name and a CSV file where the names are stored in the following format:
Surname;Firstname
Doe;John
Norris;Chuck
...
Now I've conjured up this script:
import csv
import re
letterPATH = "Brief.txt"
tablePATH = "Liste.csv"
with open(letterPATH, "r") as letter, open(tablePATH, "r") as table:
    table = csv.reader(table, delimiter=";")
    rows = list(table)
    rows = rows[1::]
    print(rows)
    for (surname, firstname) in rows:
        #Check if first- and surname have correct output
        #print(firstname)
        #print(surname)
        for lines in letter:
            new_content = ""
            print(lines)
            lines = re.sub(r"\<Nachname\>", surname, lines)
            print(lines)
            lines = re.sub(r"\<Vorname\>", firstname, lines)
            print(lines)
            new_content += lines
        with open(surname + firstname +".txt", "w") as new_letter:
            new_letter.writelines(new_content)
I've got the following problem now:
A text file is created for each entry, as it should be (JohnDoe.txt, ChuckNorris.txt and so on); however, only the first file has the correct content, while the others are empty.
While debugging I've seen that the for loop in line 18 is only iterated once, while the with statement is executed multiple times, as it should be.
I simply do not understand why the for loop isn't iterating.
Cheers and thanks for your help! :)

letter is a file object. A file keeps track of how much you've read and where the next read should start. So if you've read two lines, the next read will start on the third line, and so on.
Since you read through the whole file on the first iteration of the outer loop, later iterations won't read any more lines from it: they have all been consumed already.
One solution is to reset the file position (the pointer to where in the file you have read up to) to the beginning with letter.seek(0). Alternatively, you can store the file content in a list once and iterate over the list. Note also that new_content = "" has to be reset once per letter, before the inner loop, not once per line.
import csv
import re
letterPATH = "Brief.txt"
tablePATH = "Liste.csv"
with open(letterPATH, "r") as letter_file, open(tablePATH, "r") as table:
    table = csv.reader(table, delimiter=";")
    letter = list(letter_file)  # Add all content to a list instead.
    rows = list(table)
    rows = rows[1::]
    print(rows)
    for (surname, firstname) in rows:
        #Check if first- and surname have correct output
        #print(firstname)
        #print(surname)
        new_content = ""  # reset once per letter, not once per line
        for lines in letter:
            print(lines)
            lines = re.sub(r"\<Nachname\>", surname, lines)
            print(lines)
            lines = re.sub(r"\<Vorname\>", firstname, lines)
            print(lines)
            new_content += lines
        with open(surname + firstname +".txt", "w") as new_letter:
            new_letter.write(new_content)
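The exhausted-iterator behaviour and both fixes can be seen without touching the disk. In this sketch io.StringIO stands in for Brief.txt, the template text and the function name fill_letter are hypothetical, and plain str.replace is swapped in for re.sub because the placeholders are fixed strings.

```python
import io


def fill_letter(template_lines, surname, firstname):
    """Return the letter text with the <Nachname>/<Vorname> placeholders filled in."""
    new_content = ""  # reset once per letter, not once per line
    for line in template_lines:
        line = line.replace("<Nachname>", surname)
        line = line.replace("<Vorname>", firstname)
        new_content += line
    return new_content


letter_file = io.StringIO("Dear <Vorname> <Nachname>,\nText.\n")
first_pass = list(letter_file)   # reads every line
second_pass = list(letter_file)  # [] because the position is already at the end
letter_file.seek(0)              # rewind to the beginning
after_seek = list(letter_file)   # the lines are available again

rows = [["Doe", "John"], ["Norris", "Chuck"]]
letters = {(surname, firstname): fill_letter(first_pass, surname, firstname)
           for surname, firstname in rows}
print(letters[("Norris", "Chuck")])
```

Because first_pass is a list, it can be iterated once per name, which is exactly what the list(letter_file) change in the answer relies on.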

Searching for a term in a text file using python

I'm really desperate for some help with this Python code, please. I need to search for a variable (a string) and return it along with the data on the same line.
I've managed to create a variable and search for it in a text file; however, if the variable's value is found in the text file, the contents of the whole text file are printed, not just the line in which the value occurs.
This is my code so far, please help:
number = input("Please enter the number of the item that you want to find:")
f = open("file.txt", "r")
lines = f.read()
if lines.find("number"):
    print (lines)
else:
    f.close
Thank you in advance.

See my changes below:
number = input("Please enter the number of the item that you want to find:")
f = open("file.txt", "r")
lines = f.readlines()  # a list of lines; f.read() would return one big string
for line in lines:  # check each line instead
    if number in line:  # if the number you're looking for is present
        print(line)  # print it
f.close()
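Two details explain why the question's version printed the whole file: iterating over the string returned by f.read() yields single characters rather than lines, and str.find returns -1 (which is truthy) when the text is not found. A small in-memory demonstration, using made-up sample text:

```python
text = "101 widget\n202 gadget\n"

chars = list(text)          # iterating a str yields characters: '1', '0', '1', ...
lines = text.splitlines()   # splitlines() yields whole lines without the newline

missing = text.find("number")  # -1: not found, yet bool(-1) is True
at_start = text.find("101")    # 0: found at the start, yet bool(0) is False

number = "202"
matching = [line for line in lines if number in line]
print(matching)  # ['202 gadget']
```

So if lines.find("number"): was almost always true (it even searched for the literal word "number"), which is why the whole file was printed.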

It goes like this:
lines_containing_number = [line for line in lines if number in line]
Assuming lines is the list of lines from the file (for example from f.readlines()), this gives you all the lines that contain the number in the form of a list, and then you can simply print out the contents of the list.

If you use a with statement, you don't have to close the file; the with block handles that. Otherwise you have to call f.close(). Solution:
number = input("Please enter the number of the item that you want to find:")
with open('file.txt', 'r') as f:
    for line in f:
        if number in line:
            print(line, end="")  # the line already ends with a newline
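The with-based loop can be wrapped in a function that returns the matching lines instead of printing them. In this sketch io.StringIO stands in for open("file.txt"), the comma-separated layout with the item number in the first column is an assumption, and find_lines/find_exact are hypothetical names.

```python
import io


def find_lines(file_obj, number):
    """Return every line of file_obj that contains `number`, without the trailing newline."""
    return [line.rstrip("\n") for line in file_obj if number in line]


def find_exact(file_obj, number):
    """Return lines whose first comma-separated field equals `number` exactly."""
    return [line.rstrip("\n") for line in file_obj if line.split(",", 1)[0] == number]


sample_text = "101,widget,4.99\n202,gadget,12.50\n1010,gizmo,1.00\n"
print(find_lines(io.StringIO(sample_text), "101"))
print(find_exact(io.StringIO(sample_text), "101"))
```

The substring test also matches 1010 when searching for 101; comparing a specific field, as find_exact does, avoids that if exact item numbers are needed.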

Printing to a file via Python

Hopefully this is an easy fix. I'm trying to edit one field of a file we use for imports; however, when I run the following code, it leaves the file blank (0 KB). Could anyone advise what I'm doing wrong?
import re #import regex so we can use the commands
name = raw_input("Enter filename:") #prompt for file name, press enter to just open test.nhi
if len(name) < 1 : name = "test.nhi"
count = 0
fhand = open(name, 'w+')
for line in fhand:
    words = line.split(',') #obtain individual words by using split
    words[34] = re.sub(r'\D', "", words[34]) #remove non-numeric chars from string using regex
    if len(words[34]) < 1 : continue # If the 34th field is blank go to the next line
    elif len(words[34]) == 2 : "{0:0>3}".format([words[34]]) #Add leading zeroes depending on the length of the field
    elif len(words[34]) == 3 : "{0:0>2}".format([words[34]])
    elif len(words[34]) == 4 : "{0:0>1}".format([words[34]])
    fhand.write(words) #write the line
fhand.close() # Close the file after the loop ends

I have taken the text below in 'a.txt' as input and modified your code. Please check whether it works for you.
#Initial content of a.txt
This,program,is,Java,program
This,program,is,12Python,programs
Modified code as follows:
import re
#Reading from file and updating values
fhand = open('a.txt', 'r')
tmp_list=[]
for line in fhand:
    #Split line using ','
    words = line.split(',')
    #Remove non-numeric chars from the field at index 3 using regex
    words[3] = re.sub(r'\D', "", words[3])
    #Update the field at index 3
    # If the field is blank, leave it as it is
    if len(words[3]) < 1 :
        #continue removed: we still need to rebuild the original line and write it to the file
        print("Field empty. Continue...")
    elif len(words[3]) >= 1 and len(words[3]) < 5 :
        #The format() calls in the question discard their result; zfill(5) pads words[3] with leading zeros up to 5 characters.
        words[3]=words[3].zfill(5)
    #After updating the value in words, build a line out of it again.
    tmp_str = ",".join(words)
    tmp_list.append(tmp_str)
fhand.close()
#Writing to same file
whand = open("a.txt",'w')
for val in tmp_list:
    whand.write(val)
whand.close()
File content after running the code:
This,program,is,,program
This,program,is,00012,programs
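The per-line work in this answer can be isolated into one function, which makes the padding rule easy to test. This is a sketch: pad_field, its index/width parameters and the sample lines are hypothetical (the question used index 34 with an unclear target width), and the returned line has no trailing newline.

```python
import re


def pad_field(line, index, width):
    """Strip non-digits from one comma-separated field and left-pad it with zeros."""
    fields = line.rstrip("\n").split(",")
    digits = re.sub(r"\D", "", fields[index])
    # Leave an empty field empty; zfill never shortens a value that is already long enough.
    fields[index] = digits.zfill(width) if digits else ""
    return ",".join(fields)


print(pad_field("This,program,is,12Python,programs\n", 3, 5))
print(pad_field("This,program,is,Java,program\n", 3, 5))
```

The two calls reproduce the file content shown above.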

The file mode 'w+' truncates your file to 0 bytes, so you'll only be able to read lines that you've written.
Look at Confused by python file mode "w+" for more information.
One idea would be to read the whole file first, close it, and then reopen it for writing.
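Both points can be checked on a throwaway file: opening with 'w+' leaves nothing to read, while reading everything first and then reopening with 'w' rewrites the content safely. The function rewrite_file, the transform argument and the sample content are hypothetical.

```python
import os
import tempfile


def rewrite_file(path, transform):
    """Apply transform to every line: read everything, close, then reopen with 'w'."""
    with open(path, "r") as f:
        lines = f.read().splitlines()
    with open(path, "w") as f:
        for line in lines:
            f.write(transform(line) + "\n")


# Demo on a temporary file with made-up content.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

with open(demo_path, "w") as f:
    f.write("a,1\nb,22\n")
with open(demo_path, "w+") as f:  # 'w+' truncates before anything can be read
    read_after_w_plus = f.read()

with open(demo_path, "w") as f:   # restore the content
    f.write("a,1\nb,22\n")
rewrite_file(demo_path, lambda line: line.upper())
with open(demo_path) as f:
    rewritten = f.read()
os.remove(demo_path)
print(repr(read_after_w_plus), repr(rewritten))
```

Holding all lines in memory is fine for typical import files; for very large files, writing to a second file and renaming it afterwards would be the usual alternative.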

I'm not sure which OS you're on, but I think reading and writing the same file like this has undefined behaviour.
I guess internally the file object holds the position (try fhand.tell() to see where it is). You could probably adjust it back and forth as you go using fhand.seek(last_read_position), but really that's asking for trouble.
Also, I'm not sure how the script would ever end, as it would end up reading what it had just written (a sort of infinite loop).
Your best bet is to read the entire file first:
with open(name, 'r') as f:
    lines = f.read().splitlines()
with open(name, 'w') as f:
    for l in lines:
        # .... modify l here as needed
        f.write(l + "\n")

For 'Printing to a file via Python' you can use (the file must be opened for writing, not with "r"):
with open("test.txt", "w") as ofile:
    print("Some text...", file=ofile)
