Reading Two Files and Writing To One File Using Python3 - python

I'm currently using Python 3 on Ubuntu 18.04. I'm not a programmer by any means and I'm not asking for a code review; however, I'm having an issue that I can't seem to resolve.
I have one text file named content.txt that I'm reading lines from.
I have one text file named standard.txt that I'm reading lines from.
I have one text file named outfile.txt that I'm writing to.
content = open("content.txt", "r").readlines()
standard = open("standard.txt", "r").readlines()
outfile = "outfile.txt"
outfile_set = set()
with open(outfile, "w") as f:
    for line in content:
        if line not in standard:
            outfile_set.add(line)
    f.writelines(sorted(outfile_set))
I'm not sure where to put the following line though. My for loop nesting may all be off:
f.write("\nNo New Content")
Any code examples to make this work would be most appreciated. Thank you.

If I understand correctly, you want to write outfile_set to the outfile if it is not empty, and otherwise write the string "\nNo New Content".
Replace the line
f.writelines(sorted(outfile_set))
with
if any(outfile_set):
    f.writelines(sorted(outfile_set))
else:
    f.write("\nNo New Content")

I'm assuming that you want to write "No new content" to the file if every line in content is in standard. So you might do something like:
with open(outfile, "w") as f:
    for line in content:
        if line not in standard:
            outfile_set.add(line)
    if len(outfile_set) > 0:
        f.writelines(sorted(outfile_set))
    else:
        f.write("\nNo New Content")
Your original code was almost there!

You can reduce your runtime a lot by using set/frozenset: "line not in standard" scans the whole standard list for every line of content, whereas a set difference uses hashing and needs no nested scan.
with open("content.txt", "r") as f:
    content = frozenset(f.readlines())  # only get distinct values from file
with open("standard.txt", "r") as f:
    standard = frozenset(f.readlines())  # only get distinct values from file
# only keep what's in content but not in standard
outfile_set = sorted(content - standard)  # set difference, no loops or tests needed
with open("outfile.txt", "w") as outfile:
    if outfile_set:
        outfile.writelines(outfile_set)  # already sorted above
    else:
        outfile.write("\nNo New Content")
You can read more about it here:
set operator list (Python 2 docs, but also valid for Python 3; I couldn't find this overview in the Python 3 docs)
set difference
Demo:
# Create files
with open("content.txt", "w") as f:
    for n in map(str, range(1, 10)):  # use range(1, 10, 2) for no changes
        f.write(n + "\n")
with open("standard.txt", "w") as f:
    for n in map(str, range(1, 10, 2)):
        f.write(n + "\n")
# Process files:
with open("content.txt", "r") as f:
    content = set(f.readlines())
with open("standard.txt", "r") as f:
    standard = set(f.readlines())
# only keep what's in content but not in standard
outfile_set = sorted(content - standard)
with open("outfile.txt", "w") as outfile:
    if outfile_set:
        outfile.writelines(outfile_set)  # already sorted above
    else:
        outfile.write("\nNo New Content")
with open("outfile.txt") as f:
    print(f.read())
with open ("outfile.txt") as f:
print(f.read())
Output:
2
4
6
8
or
No New Content
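The same set-difference approach can be wrapped in a small function so it is easy to reuse and test. This is a minimal sketch, not the answerer's code: the names read_line_set and write_new_lines are hypothetical. One design choice differs from the answers above: it strips trailing newlines before comparing, on the assumption that a last line without "\n" (e.g. "9" versus "9\n") should still count as a match. Like the answers, it drops duplicate lines and sorts the result.

```python
def read_line_set(path):
    """Return the distinct lines of path with trailing newlines removed (hypothetical helper)."""
    with open(path, "r") as f:
        # rstrip("\n") makes a final "9" without a newline equal to "9\n"
        return {line.rstrip("\n") for line in f}


def write_new_lines(content_path, standard_path, out_path):
    """Write lines that are in content_path but not in standard_path (hypothetical helper)."""
    # Set difference: no nested loops or per-line list scans needed
    new_lines = sorted(read_line_set(content_path) - read_line_set(standard_path))
    with open(out_path, "w") as out:
        if new_lines:
            out.write("\n".join(new_lines) + "\n")
        else:
            out.write("\nNo New Content")
    return new_lines
```

Calling write_new_lines("content.txt", "standard.txt", "outfile.txt") on the demo files above should write 2, 4, 6 and 8 on separate lines.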

Related

How to fix Python 3 code to extract specific lines from a text file

I'm trying to extract specific lines from a 4.7 GB text file into another text file.
I'm pretty new to Python 3.7.1 and this was the best code I could come up with.
Here is a sample of what the text file looks like:
C00629618|N|TER|P|201701230300133512|15C|IND|DOE, JOHN A|PLEASANTVILLE|WA|00000|PRINCIPAL|DOUBLE NICKEL ADVISORS|01032017|40|H6CA34245|SA01251735122|1141239|||2012520171368850783
C00501197|N|M2|P|201702039042410893|15|IND|DOE, JANE|THE LODGE|GA|00000|UNUM|SVP, CORPORATE COMMUNICATIONS|01312017|230||PR1890575345050|1147350||P/R DEDUCTION ($115.00 BI-WEEKLY)|4020820171370029335
C00177436|N|M2|P|201702039042410893|15|IND|DOE, JOHN|RED ROOM|ME|00000|UNUM|SVP, DEPUTY GENERAL COUNSEL, BUSINESS|01312017|384||PR2260663445050|1147350||P/R DEDUCTION ($192.00 BI-WEEKLY)|4020820171370029336
C00177436|N|M2|P|201702039042410895|15|IND|PALMER, LAURA|TWIN PEAKS|WA|00000|UNUM|EVP, GLOBAL SERVICES|01312017|384||PR2283905245050|1147350||P/R DEDUCTION ($192.00 BI-WEEKLY)|4020820171370029342
C00501197|N|M2|P|201702039042410894|15|IND|COOPER, DALE|TWIN PEAKS|WA|00000|UNUM|SVP, CORP MKTG & PUBLIC RELAT.|01312017|384||PR2283904845050|1147350||P/R DEDUCTION ($192.00 BI-WEEKLY)|4020820171370029339
And this is the code I've written:
import re
with open("data.txt", 'r') as rf:
    for line in rf:
        field_match = re.match('^(.*):(.*)$',line)
        if field_match :
            (key) = field_match.groups()
            if key == "C00501197" :
                print(rec.split('|'))
    with open('extracted_data.txt','w') as wf:
        wf.write(line)
I need to extract full lines that contain the id C00501197 and then have the program write those extracted lines into another txt file, but as of now it's only extracting one line and that line doesn't begin with the id I want extracted.
Don't use regex if you can avoid it. The csv module is a good choice, or use simple string manipulation.
ans = []
with open('data.txt') as rf:
    for line in rf:
        line = line.strip()
        if line.startswith("C00501197"):
            ans.append(line)
with open('extracted_data.txt', 'w') as wf:
    for line in ans:
        wf.write(line + "\n")  # strip() removed the newline, so add it back
Your output code was a bit broken as well: it always wrote out the last line in the file, not the selected records.
You should use the built-in csv module that comes standard with Python. It can easily parse each line into a list. Try something like this:
import csv
with open('text.txt', 'r') as file:
    my_reader = csv.reader(file, delimiter='|')
    for row in my_reader:
        if row[0] == 'C00501197':
            print(row)
This should output the lines you want. You can then do whatever you want to process them, and save them again.
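To save the matching rows again, a csv.writer with the same "|" delimiter can be paired with the reader. This is a sketch under assumptions: the helper name save_matching_rows is hypothetical, and it assumes fields never contain double quotes (the csv module treats '"' as a quote character by default). lineterminator="\n" is set so the output keeps Unix line endings instead of the writer's default "\r\n".

```python
import csv


def save_matching_rows(src_path, dst_path, wanted_id):
    """Copy rows whose first '|' field equals wanted_id (hypothetical helper)."""
    count = 0
    # newline="" is what the csv module docs recommend for files it reads or writes
    with open(src_path, "r", newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="|")
        writer = csv.writer(dst, delimiter="|", lineterminator="\n")
        for row in reader:
            # exact comparison, so an id such as "C00501197X" is not matched
            if row and row[0] == wanted_id:
                writer.writerow(row)
                count += 1
    return count
```

For example, save_matching_rows('data.txt', 'extracted_data.txt', 'C00501197') returns the number of rows written.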
You don't need regex; just split the line on the separator and check the field you're interested in:
found_lines = []
with open("data.txt", 'r') as rf:
    for line_file in rf:
        line = line_file.split("|")
        if line[0] == "C00501197":
            found_lines.append(line)
with open('extracted_data.txt', 'w') as wf:
    for found_line in found_lines:
        wf.write("|".join(found_line))  # the last field still ends with "\n"
This should work.
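Because the input here is a 4.7 GB file, it may be worth writing each match as soon as it is read instead of collecting all matches in a list first. A minimal sketch, assuming the id is always the first "|"-separated field; the helper name extract_by_id is hypothetical. Comparing the first field exactly (rather than startswith) avoids matching a longer id that merely begins with C00501197.

```python
def extract_by_id(src_path, dst_path, wanted_id):
    """Stream src_path and write lines whose first '|' field is wanted_id (hypothetical helper)."""
    count = 0
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        # iterating over the file reads one line at a time, so memory use stays small
        for line in src:
            # split only once: only the first field is needed
            if line.split("|", 1)[0] == wanted_id:
                dst.write(line)  # line still ends with its original newline
                count += 1
    return count
```

extract_by_id("data.txt", "extracted_data.txt", "C00501197") should produce the same file as the answers above while holding only one line in memory at a time.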

Putting items into array

I'm working on a Python project in Visual Studio. I want to process a longer text file, this is a simplified version:
David Tubb
Eduardo Cordero
Sumeeth Chandrashekar
So for reading this file I use this code:
with open("data.txt", "r") as f:
    f_contents = f.read()
    print(f_contents)
I want to put these items into a new array that looks like this:
['David Tubb','Eduardo Cordero','Sumeeth Chandrashekar']
Is that possible?
Yes, the following code will work for this:
output = []  # the output list
nameFile = open('data.txt', 'r')
for name in nameFile:
    # get rid of the newline character and add the name to your list
    output.append(name.rstrip('\n'))
print(output)
# don't forget to close the file!
nameFile.close()
result = []
with open("data.txt", "r") as f:
    result = f.read().splitlines()
print(result)
Output:
['David Tubb', 'Eduardo Cordero', 'Sumeeth Chandrashekar']
The recommended way to open a file in Python is the with statement (with open(...)); it guarantees the file is closed when the block ends, even if an exception is raised.
See PEP 343 on python.org.
dalist = list()
with open('data.txt', 'r') as infile:
    for line in infile:
        dalist.append(line.rstrip('\n'))  # strip the newline to match the desired output
Additional resource for context handling: https://docs.python.org/3/library/contextlib.html
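To see the guarantee from PEP 343 in action, this sketch reads the names with a list comprehension and then checks the file object's closed attribute after a with block that raises an exception. The helper names read_names and closed_after_error are hypothetical; the ValueError is simulated purely for the demonstration.

```python
def read_names(path):
    """Return the lines of path as a list without trailing newlines (hypothetical helper)."""
    with open(path, "r") as f:
        names = [line.rstrip("\n") for line in f]
    return names


def closed_after_error(path):
    """Show that the file is closed even when the with block raises (hypothetical helper)."""
    try:
        with open(path, "r") as f:
            raise ValueError("simulated failure while reading")
    except ValueError:
        pass
    # the with statement closed the file on the way out, despite the exception
    return f.closed
```

With the sample data.txt, read_names("data.txt") returns ['David Tubb', 'Eduardo Cordero', 'Sumeeth Chandrashekar'] and closed_after_error("data.txt") returns True.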

Save each line as separate .txt file using Notepad++

I am using Notepad++ to restructure some data. Each .txt file has 99 lines. I am trying to run a python script to create 99 single-line files.
Here is the .py script I am currently running, which I found in a previous thread on the topic. I'm not sure why, but it isn't quite doing the job:
yourfile = open('filename.TXT', 'r')
counter = 0
magic = yourfile.readlines()
for i in magic:
    counter += 1
    newfile = open(('filename_' + str(counter) + '.TXT'), "w")
    newfile.write(i)
    newfile.close()
When I run this particular script, it simply creates a copy of the host file, and it still has 99 lines.
You may want to change the structure of your script a bit:
with open('filename.txt', 'r') as f:
    for i, line in enumerate(f):
        with open('filename_{}.txt'.format(i), 'w') as wf:
            wf.write(line)
In this form, the context managers close your file handles for you, and you don't have to read the whole file separately first, so the logic flows better.
You can use the following piece of code to achieve that. It's commented, but feel free to ask.
# reading info from infile with 99 lines
infile = 'filename.txt'
# using context handler to open infile and readlines
with open(infile, 'r') as f:
    lines = f.readlines()
# initializing counter
counter = 0
# for each line, create a new file and write line to it
for line in lines:
    # define outfile name
    outfile = 'filename_' + str(counter) + '.txt'
    # create outfile and write line
    with open(outfile, 'w') as g:
        g.write(line)
    # add 1 to counter
    counter += 1
magic = yourfile.readlines(99)
Try removing the '99', like this:
magic = yourfile.readlines()
I tried it and got 99 files with a single line each.
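A variant of the context-manager answer that keeps the original counter's numbering (starting at 1) and zero-pads it so the 99 files sort correctly in a directory listing. This is a sketch: the helper name split_lines_to_files, the out_dir parameter and the three-digit padding are assumptions, not part of the answers above.

```python
import os


def split_lines_to_files(src_path, out_dir, prefix="filename_"):
    """Write each line of src_path to its own numbered .txt file (hypothetical helper)."""
    paths = []
    with open(src_path, "r") as src:
        # enumerate(..., start=1) matches the original counter, which began at 1
        for number, line in enumerate(src, start=1):
            # zero-padding (001, 002, ...) keeps the files in order when listed
            path = os.path.join(out_dir, "{}{:03d}.txt".format(prefix, number))
            with open(path, "w") as out:
                out.write(line)
            paths.append(path)
    return paths
```

split_lines_to_files('filename.txt', '.') would create filename_001.txt through filename_099.txt for a 99-line input.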

How to save the output of the print statements to a CSV file?

I have written the following to isolate a very specific part of a file:
for line in open('120301.KAP'):
    rec = line.strip()
    if rec.startswith('PLY'):
        print line
The output appears as such
PLY/1,48.107478621032,-69.733975000000
PLY/2,48.163516399836,-70.032838888053
PLY/3,48.270000002883,-70.032838888053
PLY/4,48.270000002883,-69.712824977522
PLY/5,48.192379262383,-69.711801581207
PLY/6,48.191666671083,-69.532840015422
PLY/7,48.033358898628,-69.532840015422
PLY/8,48.033359033880,-69.733975000000
PLY/9,48.107478621032,-69.733975000000
Ideally what I am hoping for is the output to create a CSV file with just the coordinates. The PLY/1, PLY/2, etc. does not need to stay.
Is this doable? If not, at least can the print statements result in a new text file with the same name as the KAP file?
You can use the csv module:
import csv
with open('120301.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    for line in open('120301.KAP'):
        rec = line.strip()
        if rec.startswith('PLY'):
            writer.writerow(rec.split(',')[1:])  # [1:] drops the PLY/n label
In a similar way, the csv.reader can easily read records from your input file.
https://docs.python.org/3/library/csv.html?highlight=csv#module-contents
If you are using Python 2.x, you should open the output file in binary mode:
import csv
with open('120301.csv', 'wb') as file:
    writer = csv.writer(file)
    for line in open('120301.KAP'):
        rec = line.strip()
        if rec.startswith('PLY'):
            writer.writerow(rec.split(',')[1:])  # [1:] drops the PLY/n label
You could open the file at the beginning of your code and then just add a write statement after the print line.
Something like this:
target = open(filename, 'w')  # filename: the output path you want to use
for line in open('120301.KAP'):
    rec = line.strip()
    if rec.startswith('PLY'):
        print line
        target.write(rec)
        target.write("\n")  # writes a new line
target.close()
This is totally doable!
Here are a couple of links to some docs for writing/reading CSV:
https://docs.python.org/2/library/csv.html
You could also just make your own CSV with the regular file reading/writing functions.
file = open('data', 'r')
output = open('output.csv', 'w')
output.write('your infos')  # separate the values with commas and end each row with "\n"
output.close()
The simplest way is to redirect stdout to a file:
for i in range(10):
    print str(i) + "," + str(i*2)
will output:
0,0
1,2
2,4
3,6
4,8
5,10
6,12
7,14
8,16
9,18
If you run it as python myprog.py > myout.txt, the output goes to myout.txt.
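The question also asks for a file named after the KAP file. For Python 3, pathlib's with_suffix can derive that name, so 120301.KAP becomes 120301.csv in the same folder. A minimal sketch: the helper name kap_to_csv is hypothetical, and it assumes every PLY record has the form PLY/n,latitude,longitude as in the sample output.

```python
import csv
from pathlib import Path


def kap_to_csv(kap_path):
    """Write the coordinates of PLY records to a .csv named after kap_path (hypothetical helper)."""
    kap_path = Path(kap_path)
    csv_path = kap_path.with_suffix(".csv")  # e.g. 120301.KAP -> 120301.csv
    rows = 0
    with open(kap_path, "r") as src, open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for line in src:
            rec = line.strip()
            if rec.startswith("PLY"):
                # "PLY/1,48.10,-69.73" -> ["48.10", "-69.73"]: drop the PLY/n label
                writer.writerow(rec.split(",")[1:])
                rows += 1
    return csv_path, rows
```

kap_to_csv('120301.KAP') would return the path of the new 120301.csv and the number of coordinate rows written.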

How to read one particular line from .txt file in python?

I know I can read the file line by line with
dataFile = open('myfile.txt', 'r')
firstLine = dataFile.readline()
secondLine = dataFile.readline()
...
I also know how to read all the lines in one go
dataFile = open('myfile.txt', 'r')
allLines = dataFile.read()
But my question is how to read one particular line from .txt file?
I wish to read that line by its index.
e.g., if I want the 4th line, I'd expect something like
dataFile = open('myfile.txt', 'r')
allLines = dataFile.readLineByIndex(3)
Skip 3 lines:
with open('myfile.txt', 'r') as dataFile:
    for i in range(3):
        next(dataFile)
    the_4th_line = next(dataFile)
Or use linecache.getline (its line numbers start at 1):
import linecache
the_4th_line = linecache.getline('myfile.txt', 4)
From another answer:
Use Python Standard Library's linecache module:
import linecache
line = linecache.getline(thefilename, 33)
should do exactly what you want. You don't even need to open the file -- linecache does it all for you!
You can do exactly as you wanted with this:
DataFile = open('mytext.txt', 'r')
content = DataFile.readlines()
oneline = content[5]
DataFile.close()
You could shorten this to three lines by dropping oneline = content[5] and using content[5] directly (for example, print(content[5])). I kept the extra variable only to make it clear that readlines() returns a list, so content[5] indexes into it to get one line. Note that indexes start at 0, so content[5] is the sixth line; use content[3] for the 4th.
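A third option, closer to the "skip 3 lines" answer, is itertools.islice, which skips to the wanted line lazily without building a list of every line. A sketch: the helper name read_line is hypothetical, it uses a 0-based index like the readlines() answer, and it returns None (an assumed convention) when the file has fewer lines.

```python
from itertools import islice


def read_line(path, index):
    """Return the line at 0-based index without its newline, or None if the file is shorter."""
    with open(path, "r") as f:
        # islice(f, index, None) skips the first `index` lines; next() takes the one after
        line = next(islice(f, index, None), None)
    return None if line is None else line.rstrip("\n")
```

read_line('myfile.txt', 3) returns the 4th line, matching the readLineByIndex(3) call the question imagines.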
