Python csv read file, what if I don't close the file?

I use the following code to read a CSV file:
f = csv.reader(open(filename, 'rb'))
Then there's no way I can close the file, right? Is there any harm in not closing it, or is there a better way of reading the file?

There is; use a context manager:
with open(filename, 'rb') as handle:
    f = csv.reader(handle)
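For completeness, a minimal sketch of consuming the reader inside the with block (the row handling is purely illustrative):
import csv

with open(filename, 'rb') as handle:
    reader = csv.reader(handle)
    for row in reader:
        print row  # the file stays open exactly as long as it is needed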
In general, an open, unused file descriptor is a resource leak and should be avoided.
Interestingly, in the case of files the file descriptor is at least released as soon as there is no longer any reference to the file object (this relies on CPython's reference counting; other implementations may close it later). See also this answer:
#!/usr/bin/env python
import gc
import os
import subprocess

# GC thresholds (http://docs.python.org/3/library/gc.html#gc.set_threshold)
print('Garbage collection thresholds: {}'.format(gc.get_threshold()))

if __name__ == '__main__':
    pid = os.getpid()

    print('------- No file descriptor ...')
    subprocess.call(['lsof -p %s' % pid], shell=True)

    # Opening a file adds an entry to this process's descriptor table
    x = open('/tmp/test', 'w')
    print('------- Reference to a file ...')
    subprocess.call(['lsof -p %s' % pid], shell=True)

    # Rebinding x drops the last reference; CPython closes the descriptor
    x = 2
    print('------- FD is freed automatically w/o GC')
    subprocess.call(['lsof -p %s' % pid], shell=True)

Related

How to read and write the same file at a time in python

There are three Python programs: a writer program (writer.py) writes to the file output.txt, and two reader programs (reader_1.py, reader_2.py) read from that same output.txt file at the same time.
What is the best way to achieve synchronization between these three programs?
How can the readers avoid reading while the writer is writing to the output file?
How can the single-writer/multiple-readers problem be handled efficiently in Python?
I tried to implement the fcntl locking mechanism, but that module is not found in my Python installation.
writer.py
#!/usr/bin/python
import subprocess
import time

cycle = 10
cmd = "ls -lrt"

def poll():
    with open("/home/output.txt", 'a') as fobj:
        # Truncate the file before writing the fresh snapshot
        fobj.seek(0)
        fobj.truncate()
        try:
            subprocess.Popen(cmd, shell=True, stdout=fobj)
        except Exception:
            print "Exception Occurred"

# Poll the data
def do_poll():
    count = int(time.time())
    while True:
        looptime = int(time.time())
        if (looptime - count) >= cycle:
            count = int(time.time())
            print('Begin polling cycle')
            poll()
            print('End polling cycle')

def main():
    do_poll()

if __name__ == "__main__":
    main()
reader_1.py
#!/usr/bin/python
with open("/home/output10.txt", 'r') as fobj:
    f = fobj.read()
    print f
reader_2.py
#!/usr/bin/python
with open("/home/output10.txt", 'r') as fobj:
    f = fobj.read()
    print f
Note: reader_1.py and reader_2.py run continuously in a while loop, so the same file is accessed by all three programs at the same time.
Looking for ideas.
Solution #1: I added the fcntl locking mechanism to the writer.py program, but I am not sure it locks the file reliably. (A sketch of the matching reader-side locking follows the code below.)
#!/usr/bin/python
import subprocess
import time
import os
import fcntl

report_cycle = 2
cmd = 'ls -lrt'

def poll(devnull):
    with open("/home/output10.txt", 'a') as fobj:
        try:
            # Take an exclusive, non-blocking lock before truncating
            fcntl.flock(fobj, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except IOError:
            print "flock() failed to hold an exclusive lock."
        fobj.seek(0)
        fobj.truncate()
        try:
            subprocess.call(cmd, shell=True, stdout=fobj, stderr=devnull)
        except Exception:
            print "Exception Occurred"
        # Unlock file
        try:
            fcntl.flock(fobj, fcntl.LOCK_UN)
        except IOError:
            print "flock() failed to unlock file."

Shared file access between Python and Matlab

I have a Matlab application that writes to a .csv file and a Python script that reads from it. These operations happen concurrently, each at its own period (not necessarily the same). All of this runs on Windows 7.
I wish to know:
Would the OS inherently provide some sort of locking mechanism so that only one of the two applications - Matlab or Python - has access to the shared file at a time?
In the Python application, how do I check whether the file is already open in the Matlab application? What loop structure would block the Python application until it can read the file?
I am not sure about Windows' API for locking files.
Here's a possible solution:
While Matlab has the file open, you create an empty file called "data.lock" or something to that effect.
When Python tries to read the file, it will check for the lock file, and if it is there, it will sleep for a given interval.
When Matlab is done with the file, it can delete the "data.lock" file.
It's a programmatic solution, but it is simpler than digging through the Windows API and finding the right calls in Matlab and Python. A sketch of the Python side follows.
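A minimal sketch of the Python side of this lock-file scheme (the data.lock and data.csv names and the 0.5-second interval are illustrative assumptions):
import os
import time

LOCK_PATH = 'data.lock'   # created by Matlab while it writes (assumed name)
DATA_PATH = 'data.csv'    # the shared file (assumed name)

def read_when_unlocked(interval=0.5):
    # Sleep until Matlab has deleted the lock file, then read the data
    while os.path.exists(LOCK_PATH):
        time.sleep(interval)
    with open(DATA_PATH, 'r') as f:
        return f.read()
Note that this convention is purely advisory, and there is an inherent race between the existence check and the read; it only works if both sides follow the protocol.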
If Python is only reading the file, I believe you have to lock it in MATLAB, because a read-only open call from Python may not fail. I am not sure how to accomplish that; you may want to read this question: atomically creating a file lock in MATLAB (file mutex).
However, if you are simply consuming the data with Python, did you consider using a socket instead of a file? A rough sketch of the receiving side follows.
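A rough sketch of what the Python (consumer) side of a socket hand-off could look like; the port number and the read-until-close framing are illustrative assumptions, and the Matlab side would need matching client calls:
import socket

HOST, PORT = 'localhost', 50007   # assumed port; must match the sender

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind((HOST, PORT))
srv.listen(1)
conn, addr = srv.accept()
try:
    chunks = []
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            break             # sender closed the connection
        chunks.append(chunk)
    data = b''.join(chunks)   # the CSV payload sent by Matlab
finally:
    conn.close()
    srv.close()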
In Windows, on the Python side, CreateFile can be called (directly, or indirectly via the CRT) with a specific sharing mode. For example, if the desired sharing mode is FILE_SHARE_READ, the open will fail if the file is already open for writing; if it instead succeeds, a subsequent attempt to open the file for writing (e.g. in Matlab) will fail.
The Windows CRT function _wsopen_s allows setting the sharing mode. You can call it with ctypes in a Python 3 opener:
import sys
import os
import ctypes
import ctypes.util

__all__ = ['shdeny', 'shdeny_write', 'shdeny_read']

_SH_DENYRW = 0x10   # deny read/write mode
_SH_DENYWR = 0x20   # deny write mode
_SH_DENYRD = 0x30   # deny read mode
_S_IWRITE = 0x0080  # for O_CREAT, a new file is not readonly

if sys.version_info[:2] < (3, 5):
    _wsopen_s = ctypes.CDLL(ctypes.util.find_library('c'))._wsopen_s
else:
    # find_library('c') may be deprecated on Windows in 3.5, if the
    # universal CRT removes named exports. The following probably
    # isn't future proof; I don't know how the '-l1-1-0' suffix
    # should be handled.
    _wsopen_s = ctypes.CDLL('api-ms-win-crt-stdio-l1-1-0')._wsopen_s

_wsopen_s.argtypes = (ctypes.POINTER(ctypes.c_int),  # pfh
                      ctypes.c_wchar_p,              # filename
                      ctypes.c_int,                  # oflag
                      ctypes.c_int,                  # shflag
                      ctypes.c_int)                  # pmode

def shdeny(file, flags):
    fh = ctypes.c_int()
    err = _wsopen_s(ctypes.byref(fh),
                    file, flags, _SH_DENYRW, _S_IWRITE)
    if err:
        raise IOError(err, os.strerror(err), file)
    return fh.value

def shdeny_write(file, flags):
    fh = ctypes.c_int()
    err = _wsopen_s(ctypes.byref(fh),
                    file, flags, _SH_DENYWR, _S_IWRITE)
    if err:
        raise IOError(err, os.strerror(err), file)
    return fh.value

def shdeny_read(file, flags):
    fh = ctypes.c_int()
    err = _wsopen_s(ctypes.byref(fh),
                    file, flags, _SH_DENYRD, _S_IWRITE)
    if err:
        raise IOError(err, os.strerror(err), file)
    return fh.value
For example:
if __name__ == '__main__':
    import tempfile
    filename = tempfile.mktemp()

    fw = open(filename, 'w')
    fw.write('spam')
    fw.flush()

    fr = open(filename)
    assert fr.read() == 'spam'

    try:
        f = open(filename, opener=shdeny_write)
    except PermissionError:
        fw.close()
        with open(filename, opener=shdeny_write) as f:
            assert f.read() == 'spam'

    try:
        f = open(filename, opener=shdeny_read)
    except PermissionError:
        fr.close()
        with open(filename, opener=shdeny_read) as f:
            assert f.read() == 'spam'

    with open(filename, opener=shdeny) as f:
        assert f.read() == 'spam'

    os.remove(filename)
In Python 2 you'll have to combine the above openers with os.fdopen, e.g.:
f = os.fdopen(shdeny_write(filename, os.O_RDONLY|os.O_TEXT), 'r')
Or define an sopen wrapper that lets you pass the share mode explicitly and calls os.fdopen to return a Python 2 file; a sketch follows. This requires a bit more work to derive the CRT flags from the passed-in file mode, or vice versa.
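A minimal sketch of such a wrapper, assuming the _wsopen_s binding and shflag constants defined above (the mode-to-flags mapping is deliberately simplified and only covers the common cases):
def sopen(filename, mode='r', shflag=_SH_DENYWR):
    # Simplified mapping from a Python mode string to CRT open flags
    flags = os.O_RDONLY if mode.startswith('r') else (
        os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    flags |= os.O_BINARY if 'b' in mode else os.O_TEXT

    fh = ctypes.c_int()
    err = _wsopen_s(ctypes.byref(fh), filename, flags, shflag, _S_IWRITE)
    if err:
        raise IOError(err, os.strerror(err), filename)
    return os.fdopen(fh.value, mode)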

How to test a directory of files for gzip and uncompress gzipped files in Python using zcat?

I'm in my 2nd week of Python and I'm stuck on a directory of zipped/unzipped logfiles, which I need to parse and process.
Currently I'm doing this:
import os
import sys
import glob
import operator
import zipfile
import zlib
import gzip
import subprocess

if sys.version.startswith("3."):
    import io
    io_method = io.BytesIO
else:
    import cStringIO
    io_method = cStringIO.StringIO

for f in glob.glob('logs/*'):
    file = open(f, 'rb')
    new_file_name = f + "_unzipped"
    last_pos = file.tell()
    # test for the gzip magic bytes
    if file.read(2) == b'\x1f\x8b':
        file.seek(last_pos)
        # unzip to new file
        out = open(new_file_name, "wb")
        process = subprocess.Popen(["zcat", f], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        while True:
            if process.poll() is not None:
                break
        output = io_method(process.communicate()[0])
        exitCode = process.returncode
        if exitCode == 0:
            print "done"
            out.write(output)
            out.close()
        else:
            raise ProcessException(command, exitCode, output)
which I've "stitched" together from these SO answers (here) and blog posts (here).
However, it does not seem to work: my test file is 2.5 GB and the script has been chewing on it for 10+ minutes, plus I'm not really sure whether what I'm doing is correct anyway.
Question:
If I don't want to use the gzip module and need to decompress chunk by chunk (the actual files are >10 GB), how do I decompress and save to a file using zcat and subprocess in Python?
Thanks!
This should read the first line of every file in the logs subdirectory, unzipping as required:
#!/usr/bin/env python
import glob
import gzip
import subprocess

for f in glob.glob('logs/*'):
    if f.endswith('.gz'):
        # Open a compressed file. Here is the easy way:
        # file = gzip.open(f, 'rb')
        # Or, here is the hard way:
        proc = subprocess.Popen(['zcat', f], stdout=subprocess.PIPE)
        file = proc.stdout
    else:
        # Otherwise, it must be a regular file
        file = open(f, 'rb')
    # Process the file, for example:
    print f, file.readline()
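To address the chunk-by-chunk part of the question: once file is a pipe from zcat, you can stream it to an output file in fixed-size blocks without ever holding the whole payload in memory. A minimal sketch, continuing with the f, file, and proc names from the .gz branch of the loop above (the 64 KiB block size is an arbitrary choice):
import shutil

with open(f + '_unzipped', 'wb') as out:
    shutil.copyfileobj(file, out, 64 * 1024)  # copy in 64 KiB blocks
proc.wait()  # reap zcat; a non-zero proc.returncode signals an error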

Python - Passing data to a jar file as input stream

I have a jar file that I can send data to for processing; the data is in JSON format.
data_path is a path to a file that has the data, and the version below works great. However, the data I have is not going to be in a file but in a variable, and the command below does not work with a variable: it treats the data passed in as a literal path to a file. Would a different bash command work, or is there something I can do with the subprocess module? Thanks!
import subprocess as sub
cmd = "java -jar %s < %s" % (jar_path, data_path)
# send data in a var
# cmd = "java -jar %s < %s" % (jar_path, data)
proc = sub.Popen(cmd, stdin=sub.PIPE, stdout=sub.PIPE, shell=True)
(out, err) = proc.communicate()
You can write it to a temporary file and pass that:
import tempfile

with tempfile.NamedTemporaryFile() as f:
    f.write(data)
    f.flush()
    cmd = "java -jar %s < %s" % (jar_path, f.name)
    ...
The temp file will delete itself when the context ends.
@FedorGogolev had deleted answers going for a Popen stdin approach that weren't quite working for your specific needs. But it was a good approach, so I credit him and thought I would add the working version of what he was going for...
import subprocess
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(data)
    f.flush()
    f.seek(0)
    cmd = "java -jar %s" % jar_path
    p = subprocess.Popen(cmd, shell=True, stdin=f, stdout=subprocess.PIPE)
    ...
If you are passing the file object as the stdin argument, you have to make sure to seek it to position 0 first.

How to redirect stderr of a program that is run using os.system by a third-party python library

I use external library, like this:
from some_lib import runThatProgram
infile = '/tmp/test'
outfile = '/tmp/testout'
runThatProgram(infile, outfile)
while runThatProgram is:
def runThatProgram(infile, outfile):
    os.system("%s %s > %s" % ('thatProgram', infile, outfile))
The problem is that 'thatProgram' prints lots of stuff on stderr; I want to redirect that to a file, but I cannot edit runThatProgram's code because it lives in a third-party library!
To illustrate what Rosh Oxymoron said, you can hack the code like this:
from some_lib import runThatProgram
infile = '/tmp/test'
outfile = '/tmp/testout 2>&1'
runThatProgram(infile, outfile)
With this, it will call:
thatProgram /tmp/test > /tmp/testout 2>&1
which redirects stderr (2) to stdout (1), so everything will be logged in your outfile.
To elaborate on using subprocess: you can launch the process, give it a pipe, and then work from there:
import subprocess

program = "runthatprogram.py".split()
process = subprocess.Popen(program, stdout=subprocess.PIPE, stderr=open('stderr', 'w'))  # stderr to a file object
process.communicate()[0]  # display stdout
