Use process substitution as input file to Python twice - python

Consider the following python script
#test.py
import sys
inputfile=sys.argv[1]
with open(inputfile,'r') as f:
    for line in f.readlines():
        print line
with open(inputfile,'r') as f:
    for line in f.readlines():
        print line
Now I want to run test.py on a substituted process, e.g.,
python test.py <( cat file | head -10)
It seems the second f.readlines returns empty. Why is that and is there a way to do it without having to specify two input files?

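Why is that?
Process substitution hands the script a pipe (a named pipe or a /dev/fd entry, depending on the system) rather than a regular file. A pipe cannot be rewound, so all of the data is consumed by the first open/read loop and nothing is left for the second.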
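A minimal sketch of that behaviour in Python 3, using an anonymous pipe from os.pipe() as a stand-in for the pipe the shell would create. The helper name read_pipe_twice is made up for illustration, and the sketch reads twice from one handle rather than reopening a path, but the effect is the same: once the data has been read, it is gone.

```python
import os

def read_pipe_twice(payload):
    """Write payload into a pipe, then try to read it back twice."""
    read_fd, write_fd = os.pipe()
    # Keep the payload small so it fits in the pipe buffer without a reader.
    with os.fdopen(write_fd, "w") as writer:
        writer.write(payload)
    # Closing the writer signals end-of-file to the reader.
    with os.fdopen(read_fd, "r") as reader:
        seekable = reader.seekable()   # pipes cannot be rewound
        first = reader.readlines()     # consumes everything in the pipe
        second = reader.readlines()    # nothing left to read
    return seekable, first, second

seekable, first, second = read_pipe_twice("a\nb\n")
print(seekable, first, second)
```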
Is there a way to do it without having to specify two input files?
You can buffer the data in memory before using it.
Here is some sample code (Python 2, since it uses the StringIO module and the print statement):
import sys
import StringIO
inputfile=sys.argv[1]
buffer = StringIO.StringIO()
# buffering
with open(inputfile, 'r') as f:
    buffer.write(f.read())
# use it
buffer.seek(0)
for line in buffer:
    print line
# use it again
buffer.seek(0)
for line in buffer:
    print line
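For Python 3, the same buffering idea could look roughly like the sketch below. It assumes io.StringIO as the in-memory buffer; the function names load_into_buffer and iterate_twice are hypothetical and only serve the illustration.

```python
import io

def load_into_buffer(path):
    """Read everything from path once and return a seekable in-memory copy."""
    with open(path, "r") as f:
        return io.StringIO(f.read())

def iterate_twice(buffer):
    """Walk over the buffered lines two times, rewinding in between."""
    first = list(buffer)
    buffer.seek(0)          # rewind; possible because the buffer lives in memory
    second = list(buffer)
    return first, second
```

Called as iterate_twice(load_into_buffer(sys.argv[1])), this opens the process-substitution path only once, so it does not matter that the pipe cannot be reread.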

readlines() reads all available lines from the input at once, so a second read returns nothing: there is nothing left to read. You can assign the result of readlines() to a local variable and use it as many times as you want:
import sys
inputfile=sys.argv[1]
with open(inputfile,'r') as f:
    lines = f.readlines()
for line in lines:
    print line
#use it again
for line in lines:
    print line

Related

reading file only once throughout the other functions

with open('sample.txt', 'r') as f:
    def function1():
        file = f.readlines()
        ...code that will read the file and modify
    def function2():
        file = f.readlines()
        ...code that will read the file and modify
with open('output.txt', 'w') as outputFile:
    for file in file:
        function1()
        function2()
Here is my code. I am trying to read the file only once. I have functions that read different parts of the file and write the results to output.txt.
I tried this, but it gives me the error "ValueError: I/O operation on closed file."
If you're reading all of the file in each function, you're better off doing something like the following:
with open('sample.txt','r') as f:
    file = f.readlines()
function1(file) # so don't readline multiple times
function2(file) # in your function just operate on data
with open('output.txt', 'w') as f:
    f.writelines(file)
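As a rough, self-contained sketch of that pattern: the file is read once, each function receives the list of lines and returns a modified list, and the result is written at the end. The two transformations (strip_trailing_whitespace and number_lines) are invented placeholders, since the question does not say what its functions do.

```python
def strip_trailing_whitespace(lines):
    """Hypothetical first step: drop trailing spaces, keep one newline."""
    return [line.rstrip() + "\n" for line in lines]

def number_lines(lines):
    """Hypothetical second step: prefix each line with its line number."""
    return ["{}: {}".format(i, line) for i, line in enumerate(lines, start=1)]

def process_file(source_path, output_path):
    # Read the input exactly once, while the file is open.
    with open(source_path, "r") as f:
        lines = f.readlines()
    # The functions work on the in-memory list, not on the file object,
    # so it does not matter that the file is already closed here.
    lines = strip_trailing_whitespace(lines)
    lines = number_lines(lines)
    with open(output_path, "w") as out:
        out.writelines(lines)
    return lines
```

Passing the data instead of the file object is what avoids the "I/O operation on closed file" error.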
Firstly, some notes:
The for file in file piece means "For each line in the file I will do the following".
Your 2 functions are not indented (I think) so that could cause an issue also.
f.readlines() takes the whole file and stores it as the variable named file.
The best approach to this would be to read the file 1 time with file = f.readlines(). Now that file has all the lines, loop over those lines while making any changes that you need to make. For each line, save that line to a new file (look up how append works).
Right now you aren't printing anything out which makes debugging very hard when you are new, so start with this:
def my_change_text_function(line):
    # Here you can write code that changes the one line passed in.
    changed_line = line  # placeholder: put your own modification here
    return changed_line
with open("pok.txt") as f:
    file = f.readlines()
with open("newfile.txt", "a") as newfile:
    for line in file:
        print(line)
        changed_line = my_change_text_function(line)
        # Do your changes to the line here, character replacement, etc.
        newfile.write(changed_line)
Now you will have a new file named newfile.txt that contains your changes. This is all of the code required, minus the code you need to modify the line.

How do you read lines from all files in a directory?

I have two files in a directory. I'd like to read the lines from each of the files. Unfortunately when I try to do so with the following code, there is no output.
from pathlib import Path
p = Path('tmp')
for file in p.iterdir():
    print(file.name)
functions.py
test.txt
for file in p.iterdir():
    f = open(file, 'r')
    f.readlines()
You're reading all the lines from the file, but you're not outputting them. If you want to print the lines to standard output, you need to use print() as you did in your first example.
You can also write this somewhat more elegantly using contexts and more iterators:
from pathlib import Path
file = Path('test.txt')
with file.open() as open_file:
    for line in open_file:
        print(line, end="")
test.txt:
Spam
Spam
Spam
Wonderful
Spam!
Spamity
Spam
Result:
Spam
Spam
Spam
Wonderful
Spam!
Spamity
Spam
Using a context for opening the file (with file.open()) means you inherently set up closing the file, and the iterator for the lines (for line in open_file) means you're not loading the whole file at once (an important consideration with larger files).
Setting end="" in print() is optional depending on how your source files are structured, as you might otherwise end up printing extra blank lines in your output.
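Applied to every file in a directory, as the question asks, the same approach might look like the sketch below. It assumes all regular files in the directory are readable as text; the function name read_lines_from_directory is made up for this example.

```python
from pathlib import Path

def read_lines_from_directory(directory):
    """Return {file name: list of lines} for every regular file in directory."""
    result = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories and other non-file entries
        with path.open("r") as handle:
            result[path.name] = handle.readlines()
    return result

# Example use with the directory name from the question:
# for name, lines in read_lines_from_directory("tmp").items():
#     for line in lines:
#         print(name, line, end="")
```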
You could use fileinput:
import os
import fileinput
for line in fileinput.input(os.listdir('.')):
    print(line)
You should print data like this from text.py
count = 0
f = open(file, 'r')
Lines = f.readlines()
for line in Lines:
    count += 1
    print("Line {}: {}".format(count, line.strip()))
Output will look like:
Line 1: ...
Line 2: ...
Line 3: ...

How to open and print the contents of a file line by line using subprocess?

I am trying to write a python script which SSHes into a specific address and dumps a text file. I am currently having some issues. Right now, I am doing this:
temp = "cat file.txt"
need = subprocess.Popen("ssh {host} {cmd}".format(host='155.0.1.1', cmd=temp),shell=True,stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()
print(need)
This is the naive approach where I am basically opening the file, saving its output to a variable and printing it. However, this really messes up the format when I print "need". Is there any way to simply use subprocess and read the file line by line? I have to be SSHed into the address in order to dump the file otherwise the file will not be detected, that is why I am not simply doing
f = open(temp, "r")
file_contents = f.read()
print (file_contents)
f.close()
Any help would be appreciated :)
You don't need to use the subprocess module to print the entire file line by line. You can use pure python.
f = open(temp, "r")
file_contents = f.read()
f.close()
# turn the file contents into a list
file_lines = file_contents.split("\n")
# print all the items in the list
for file_line in file_lines:
    print(file_line)
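If the file has to be read through a command, as in the question, the formatting problem comes from printing the tuple of byte strings that communicate() returns. A possible sketch (Python 3.5+) decodes the output and splits it into lines instead. The helper name command_output_lines is hypothetical, and the SSH call in the trailing comment simply reuses the host and command from the question.

```python
import subprocess

def command_output_lines(argv):
    """Run argv and return its standard output as a list of text lines."""
    completed = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode bytes to str
        check=True,               # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout.splitlines()

# Hypothetical use for the question's case:
# for line in command_output_lines(["ssh", "155.0.1.1", "cat", "file.txt"]):
#     print(line)
```

Passing the command as a list avoids shell=True and the quoting issues that come with it.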

Replacing a line in an already opened file python [duplicate]

I want to loop over the contents of a text file and do a search and replace on some lines and write the result back to the file. I could first load the whole file in memory and then write it back, but that probably is not the best way to do it.
What is the best way to do this, within the following code?
f = open(file)
for line in f:
    if 'foo' in line:
        newline = line.replace('foo', 'bar')
        # how to write this newline back to the file
The shortest way would probably be to use the fileinput module. For example, the following adds line numbers to a file, in-place:
import fileinput
for line in fileinput.input("test.txt", inplace=True):
    print('{} {}'.format(fileinput.filelineno(), line), end='') # for Python 3
    # print "%d: %s" % (fileinput.filelineno(), line), # for Python 2
What happens here is:
The original file is moved to a backup file
The standard output is redirected to the original file within the loop
Thus any print statements write back into the original file
fileinput has more bells and whistles. For example, it can be used to automatically operate on all files in sys.argv[1:], without your having to iterate over them explicitly. Starting with Python 3.2 it also provides a convenient context manager for use in a with statement.
While fileinput is great for throwaway scripts, I would be wary of using it in real code because admittedly it's not very readable or familiar. In real (production) code it's worthwhile to spend just a few more lines of code to make the process explicit and thus make the code readable.
There are two options:
The file is not overly large, and you can just read it wholly to memory. Then close the file, reopen it in writing mode and write the modified contents back.
The file is too large to be stored in memory; you can move it over to a temporary file and open that, reading it line by line, writing back into the original file. Note that this requires twice the storage.
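The first option (read the whole file, close it, reopen it for writing) could be sketched as follows. The function name replace_in_small_file is made up, and the sketch assumes the file comfortably fits in memory.

```python
def replace_in_small_file(file_path, old, new):
    """Replace every occurrence of old with new, rewriting the file in place."""
    # Read the whole file into memory and close it again.
    with open(file_path, "r") as f:
        contents = f.read()
    # Reopen in write mode (which truncates) and write the modified text back.
    with open(file_path, "w") as f:
        f.write(contents.replace(old, new))
```

If the program is interrupted between the two steps the original contents are lost, which is one reason the temporary-file variant is attractive for important data.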
I guess something like this should do it. It basically writes the content to a new file and replaces the old file with the new file:
from tempfile import mkstemp
from shutil import move, copymode
from os import fdopen, remove
def replace(file_path, pattern, subst):
    #Create temp file
    fh, abs_path = mkstemp()
    with fdopen(fh,'w') as new_file:
        with open(file_path) as old_file:
            for line in old_file:
                new_file.write(line.replace(pattern, subst))
    #Copy the file permissions from the old file to the new file
    copymode(file_path, abs_path)
    #Remove original file
    remove(file_path)
    #Move new file
    move(abs_path, file_path)
Here's another example that was tested. Note that it searches for and replaces plain substrings, not regular expressions:
import fileinput
import sys
def replaceAll(file,searchExp,replaceExp):
    for line in fileinput.input(file, inplace=1):
        if searchExp in line:
            line = line.replace(searchExp,replaceExp)
        sys.stdout.write(line)
Example use:
replaceAll("/fooBar.txt","Hello World!","Goodbye World.")
This should work: (inplace editing)
import fileinput
# Does a list of files, and
# redirects STDOUT to the file in question
for line in fileinput.input(files, inplace = 1):
    print line.replace("foo", "bar"),
Based on the answer by Thomas Watnedal.
However, this does not answer the line-to-line part of the original question exactly. The function can still replace on a line-to-line basis.
This implementation replaces the file contents without using temporary files; as a consequence, file permissions remain unchanged.
Also, using re.sub instead of replace allows regex replacement rather than plain-text replacement only.
Reading the file as a single string instead of line by line allows for multiline match and replacement.
import re
def replace(file, pattern, subst):
    # Read contents from file as a single string
    file_handle = open(file, 'r')
    file_string = file_handle.read()
    file_handle.close()
    # Use RE package to allow for replacement (also allowing for (multiline) REGEX)
    file_string = (re.sub(pattern, subst, file_string))
    # Write contents to file.
    # Using mode 'w' truncates the file.
    file_handle = open(file, 'w')
    file_handle.write(file_string)
    file_handle.close()
As lassevk suggests, write out the new file as you go, here is some example code:
fin = open("a.txt")
fout = open("b.txt", "wt")
for line in fin:
    fout.write( line.replace('foo', 'bar') )
fin.close()
fout.close()
If you want a generic function that replaces any text with some other text, this is likely the best way to go, particularly if you're a fan of regexes (note that re.escape makes the search text match literally):
import re
def replace( filePath, text, subs, flags=0 ):
    with open( filePath, "r+" ) as file:
        fileContents = file.read()
        textPattern = re.compile( re.escape( text ), flags )
        fileContents = textPattern.sub( subs, fileContents )
        file.seek( 0 )
        file.truncate()
        file.write( fileContents )
A more pythonic way would be to use context managers like the code below:
from tempfile import mkstemp
from shutil import move
from os import close, remove
def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()
    close(fh)  # mkstemp returns an open descriptor; close it so it is not leaked
    with open(target_file_path, 'w') as target_file:
        with open(source_file_path, 'r') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)
fileinput is quite straightforward as mentioned on previous answers:
import fileinput
def replace_in_file(file_path, search_text, new_text):
    with fileinput.input(file_path, inplace=True) as file:
        for line in file:
            new_line = line.replace(search_text, new_text)
            print(new_line, end='')
Explanation:
fileinput can accept multiple files, but I prefer to close each file as soon as it has been processed, so I pass a single file_path in the with statement.
The print statement does not print anything to the console when inplace=True, because STDOUT is redirected to the original file.
end='' in the print statement prevents extra blank lines between the output lines.
You can use it as follows:
file_path = '/path/to/my/file'
replace_in_file(file_path, 'old-text', 'new-text')
Create a new file, copy lines from the old to the new, and do the replacing before you write the lines to the new file.
Expanding on @Kiran's answer, which I agree is more succinct and Pythonic, this adds codecs to support the reading and writing of UTF-8:
import codecs
from tempfile import mkstemp
from shutil import move
from os import remove
def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()
    with codecs.open(target_file_path, 'w', 'utf-8') as target_file:
        with codecs.open(source_file_path, 'r', 'utf-8') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)
Using hamishmcn's answer as a template, I was able to search for lines in a file that match my regex and replace the matches with an empty string.
import re
fin = open("in.txt", 'r') # in file
fout = open("out.txt", 'w') # out file
for line in fin:
    p = re.compile('[-][0-9]*[.][0-9]*[,]|[-][0-9]*[,]') # pattern
    newline = p.sub('',line) # replace matching strings with empty string
    print newline
    fout.write(newline)
fin.close()
fout.close()
If you remove the indent as shown below, it will search and replace across multiple lines.
See the example below.
from tempfile import mkstemp
from shutil import move
from os import close, remove
def replace(file, pattern, subst):
    #Create temp file
    fh, abs_path = mkstemp()
    print fh, abs_path
    new_file = open(abs_path,'w')
    old_file = open(file)
    for line in old_file:
        new_file.write(line.replace(pattern, subst))
    #close temp file
    new_file.close()
    close(fh)
    old_file.close()
    #Remove original file
    remove(file)
    #Move new file
    move(abs_path, file)

Parsing a line from an ASCII HDR file python

I am having difficulty parsing a line from an hdr file I have. When I print read (data) like in the code below the command window outputs the contents of the hdr file. However, when I try to parse out a line or a column , like the script below, it outputs nothing in the command window.
import numpy as np
import matplotlib.pyplot as plt
f = open('zz_ssmv11034tS__T0001TTNATS2012021505HP001.Hdr', 'r')
data = f.read()
print (data)
for line in f:
    columns = line.split()
    time = float(columns[2])
    print (time)
f.close()
Remove these two lines and execute your code again:
data = f.read()
print (data)
Then change your loop:
for line in f.readlines():
    columns = line.split()
    time = float(columns[2])
    print (time)
Calling read() reads through the entire file and leaves the read cursor at the end of the file (with nothing more to read). If you are looking to read a certain number of lines at a time, you could use readline() or readlines().
Read the post Why can't I call read() twice on an open file?
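If both the full dump and the line-by-line loop are wanted, one possible approach is to rewind a regular file with seek(0) after the first read. The sketch below illustrates this; the function name read_then_iterate is hypothetical, and this only works for seekable files, not for pipes.

```python
def read_then_iterate(path):
    """Read a file fully, show that a second read is empty, then rewind and iterate."""
    with open(path, "r") as f:
        data = f.read()        # cursor is now at the end of the file
        leftover = f.read()    # empty string: nothing left to read
        f.seek(0)              # move the cursor back to the start
        lines = [line for line in f]
    return data, leftover, lines
```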
