How can I open a file on an FTP server in write mode? I know I can create and write a file in one step (when I already have the data), but I want to open it for writing first and only then write to it, as you would locally with a context manager.
The reason is that I want to create an interface with unified methods for working with file transfer servers, specifically SFTP and FTP.
With SFTP it's easy (using paramiko):
def open(sftp, fname, mode='r'):
    return sftp.open(fname, mode=mode)
Now I can do this:
with open(sftp, 'some_file.txt', 'w') as f:
    f.write(data)
And then I can read what was written:
with open(sftp, 'some_file.txt', 'r') as f:
    print(f.read().decode('utf-8'))
How can I implement the same thing for FTP (using ftplib)?
I was able to implement the reading part for FTP, so I can open a file in read mode just like with SFTP. But how can I open it in write mode? The ftplib method storbinary requires the data to be provided immediately, so I would have to pass the data I want to write to the open method itself, which would defeat the purpose of a unified interface.
import io
def open(ftp, filename, mode='r'):
    """Open a file on FTP server."""
    def handle_buffer(buffer_data):
        bio.write(buffer_data)
    # Reading implementation
    if mode == 'r':
        bio = io.BytesIO()
        ftp.retrbinary(
            'RETR %s' % filename, callback=handle_buffer)
        bio.seek(0)
        return bio
    # Writing implementation.
    if mode == 'w':
        # how to open in write mode?
        pass
Update
Let's say we have an immediate-write implementation for FTP:
bio = io.BytesIO()
# Write some data
data = csv.writer(bio)
data.writerows(data_to_export)
bio.seek(0)
# Store. So it looks like storbinary does not open the file in w mode; it does everything in one go?
ftp.storbinary("STOR " + file_name, bio)
So the question is: how can I separate writing data from just opening the file in write mode? Is it even possible with ftplib?
After some struggle, I was able to make this work. The solution was to implement custom context managers for the open method, one for read mode and one for write mode. I had to reimplement read mode because it only worked for plain file reading and failed if, say, I tried to use a csv reader.
For read mode, I chose to use tempfile, because with other approaches I was not able to read the data properly with different readers (plain file reader, csv reader, etc.). When the temporary file is opened in read mode, everything works as expected.
For write mode, I was able to use an in-memory buffer (io.BytesIO), so a temporary file was not necessary.
import io
import tempfile
import six
from six.moves import builtins
class OpenRead(object):
    def _open_tempfile(self):
        self.tfile = tempfile.NamedTemporaryFile()
        # Write data on tempfile.
        self.ftp.retrbinary(
            'RETR %s' % self.filename, self.tfile.write)
        # Get back to start of file, so it would be possible to
        # read it.
        self.tfile.seek(0)
        # Call the built-in open explicitly: the module-level open()
        # defined below shadows it.
        return builtins.open(self.tfile.name, 'r')
    def __init__(self, ftp, filename):
        self.ftp = ftp
        self.filename = filename
        self.tfile = None
    def __enter__(self):
        return self._open_tempfile()
    def __exit__(self, exception_type, exception_value, traceback):
        # Remove temporary file.
        self.tfile.close()
class OpenWrite(object):
    def __init__(self, ftp, filename):
        self.ftp = ftp
        self.filename = filename
        self.data = ''
    def __enter__(self):
        return self
    def __exit__(self, exception_type, exception_value, traceback):
        bio = io.BytesIO()
        if isinstance(self.data, six.string_types):
            self.data = self.data.encode()
        bio.write(self.data)
        bio.seek(0)
        self.ftp.storbinary('STOR %s' % self.filename, bio)
        bio.close()
        # Return nothing: a truthy return value from __exit__ would
        # suppress exceptions raised inside the with block.
    def write(self, data):
        self.data += data
def open(ftp, filename, mode='r'):
    """Open a file on FTP server."""
    if mode == 'r':
        return OpenRead(ftp, filename)
    if mode == 'w':
        return OpenWrite(ftp, filename)
P.S. This might not work properly without a context manager, but for now it is an OK solution for me. If anyone has a better implementation, they are more than welcome to share it.
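A smaller variant of the write-mode idea, offered as a hedged sketch rather than a definitive implementation: a generator-based context manager hands out an in-memory buffer and uploads it with storbinary only when the with block finishes without an exception. The name open_ftp_write is hypothetical, and ftp is assumed to be a connected ftplib.FTP instance.

```python
import contextlib
import io


@contextlib.contextmanager
def open_ftp_write(ftp, filename):
    """Yield an in-memory buffer and upload it when the block exits cleanly.

    Hypothetical helper: `ftp` is assumed to be a connected ftplib.FTP.
    """
    buf = io.BytesIO()
    try:
        yield buf  # the caller writes bytes into the buffer here
        buf.seek(0)  # rewind so storbinary reads from the start
        ftp.storbinary('STOR %s' % filename, buf)
    finally:
        buf.close()
```

Usage would look like `with open_ftp_write(ftp, 'some_file.txt') as f: f.write(b'data')`. Because the upload sits after the yield, an exception raised inside the with block skips the upload instead of storing a partial file.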
Update
I decided to use the ftputil package instead of the standard ftplib. None of this hacking is needed, because ftputil takes care of it, and it uses many of the same method names as paramiko for the same operations, so it is much easier to unify usage of the two protocols.
Related
The goal is to create python2.7 and >=python3.6 compatible code.
This code currently works on python2.7. It creates a GzipFile object and later writes lists to the gzip file. It lastly uploads the gzip file to an s3 bucket.
Example Data: [[1, 2, 3], [4, 5, 6], ["a", 3, "iamastring"]]
import contextlib
@contextlib.contextmanager
def get_gzip_writer(path):
    with s3_reader.open(path) as s3_file:
        with gzip.GzipFile(fileobj=s3_file, mode="w") as gzip_file:
            yield csv.writer(gzip_file)
However, this code does not work on python3 due to csv giving str whereas gzip expects bytes. It's important to keep gzip in bytes due to how it's used/read later on. That means using io.TextIOWrapper does not work in this specific use case.
I have tried to create an adapter class.
class BytesToBytes(object):
    def __init__(self, stream, dialect='excel', encoding='utf-8', **kwargs):
        self.temp = six.StringIO()
        self.writer = csv.writer(self.temp, dialect, **kwargs)
        self.stream = stream
        self.encoding = encoding
    def writerow(self, row):
        self.writer.writerow([s.decode('utf-8') if hasattr(s, 'decode') else s for s in row])
        self.stream.write(six.ensure_binary(self.temp.getvalue(), self.encoding))
        self.temp.seek(0)
        self.temp.truncate(0)
With the updated code looking like:
@contextlib.contextmanager  # needs "import contextlib"
def get_gzip_writer(path):
    with s3_reader.open(path) as s3_file:
        with gzip.GzipFile(fileobj=s3_file, mode="w") as gzip_file:
            yield BytesToBytes(gzip_file)
This works, but it seems excessive to have a full class for the purpose of this singular use case.
This is the code that calls the above:
def write_data(data, url):
    with get_gzip_writer(url) as writer:
        for row in data:
            writer.writerow(row)
    return url
What options are available for working with GzipFile (while maintaining bytes for read/write) without creating an entire adapter class?
I've read and considered your concern about keeping the gzip file in binary mode, and I think you can still use TextIOWrapper. My understanding is that its job is to provide an interface for writing bytes from text (my emphasis):
A buffered text stream providing higher-level access to a BufferedIOBase buffered binary stream.
I interpret that as "text in, bytes out"... which is what your GZip application needs, right? If so, then for Python3 we need to give the CSV writer something that accepts strings but ultimately writes bytes.
Enter TextIOWrapper with a UTF-8 encoding, accepting strings from csv.writer's writerow/s() methods and writing UTF-8-encoded bytes to gzip_file.
I've run this in Python2 and 3, and unzipped the file and it looks good:
import csv, gzip, io, six
def get_gzip_writer(path):
    with open(path, 'wb') as s3_file:
        with gzip.GzipFile(fileobj=s3_file, mode='wb') as gzip_file:
            if six.PY3:
                with io.TextIOWrapper(gzip_file, encoding='utf-8') as wrapper:
                    yield csv.writer(wrapper)
            elif six.PY2:
                yield csv.writer(gzip_file)
            else:
                raise ValueError('Neither Python2 or 3?!')
data = [[1,2,3],['a','b','c']]
url = 'output.gz'
for writer in get_gzip_writer(url):
    for row in data:
        writer.writerow(row)
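To check the "text in, bytes out" behaviour without touching S3 or the file system, here is a minimal Python 3 sketch that writes rows through TextIOWrapper into an in-memory gzip stream and returns the compressed bytes. The function name rows_to_gzip_bytes is hypothetical; newline='' is passed because the csv module documentation recommends it for file objects handed to csv.writer.

```python
import csv
import gzip
import io


def rows_to_gzip_bytes(rows):
    """Write rows as CSV text, UTF-8 encode them, and gzip the result in memory."""
    raw = io.BytesIO()
    with gzip.GzipFile(fileobj=raw, mode='wb') as gz:
        # TextIOWrapper accepts str from csv.writer and writes bytes to gz.
        with io.TextIOWrapper(gz, encoding='utf-8', newline='') as text:
            csv.writer(text).writerows(rows)
    # GzipFile does not close a fileobj it was given, so raw is still readable.
    return raw.getvalue()
```

Decompressing the result with gzip.decompress gives back plain UTF-8 CSV bytes, so anything that later reads the gzip stream in binary mode still sees bytes.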
I have a MATLAB application that writes to a .csv file and a Python script that reads from it. These operations happen concurrently, each at its own period (not necessarily the same). All of this runs on Windows 7.
I would like to know:
Would the OS inherently provide some sort of locking mechanism so that only one of the two applications, MATLAB or Python, has access to the shared file at a time?
In the Python application, how do I check whether the file is already opened by the MATLAB application? What loop structure should I use so that the Python application blocks until it gets access to read the file?
I am not sure about the Windows API for locking files.
Here's a possible solution:
While MATLAB has the file open, it creates an empty file called "data.lock" or something to that effect.
When Python tries to read the file, it checks for the lock file, and if it is there, it sleeps for a given interval.
When MATLAB is done with the file, it deletes the "data.lock" file.
It's a programmatic solution, but it is simpler than digging through the Windows API and finding the right calls in MATLAB and Python.
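The Python side of the lock-file scheme could be sketched roughly like this. The function name and the poll_interval and timeout parameters are hypothetical, and the sketch assumes the writer reliably creates and deletes the lock file; a small window remains between the check and the subsequent read.

```python
import os
import time


def wait_for_lock_release(lock_path, poll_interval=0.1, timeout=10.0):
    """Block until lock_path no longer exists.

    Returns True once the lock file is gone, or False if it is still
    present after roughly `timeout` seconds.
    """
    deadline = time.time() + timeout
    while os.path.exists(lock_path):
        if time.time() >= deadline:
            return False
        time.sleep(poll_interval)  # writer still busy, try again later
    return True
```

A reader would call wait_for_lock_release('data.lock') and open the .csv file only when it returns True.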
If Python is only reading the file, I believe you have to lock it in MATLAB, because a read-only open call from Python may not fail. I am not sure how to accomplish that; you may want to read the question "atomically creating a file lock in MATLAB (file mutex)".
However, if you are simply consuming the data with Python, have you considered using a socket instead of a file?
In Windows on the Python side, CreateFile can be called (directly or indirectly via the CRT) with a specific sharing mode. For example, if the desired sharing mode is FILE_SHARE_READ, then the open will fail if the file is already open for writing. If the latter call instead succeeds, then a future attempt to open the file for writing will fail (e.g. in Matlab).
The Windows CRT function _wsopen_s allows setting the sharing mode. You can call it with ctypes in a Python 3 opener:
import sys
import os
import ctypes
import ctypes.util
__all__ = ['shdeny', 'shdeny_write', 'shdeny_read']
_SH_DENYRW = 0x10 # deny read/write mode
_SH_DENYWR = 0x20 # deny write mode
_SH_DENYRD = 0x30 # deny read
_S_IWRITE = 0x0080 # for O_CREAT, a new file is not readonly
if sys.version_info[:2] < (3,5):
    _wsopen_s = ctypes.CDLL(ctypes.util.find_library('c'))._wsopen_s
else:
    # find_library('c') may be deprecated on Windows in 3.5, if the
    # universal CRT removes named exports. The following probably
    # isn't future proof; I don't know how the '-l1-1-0' suffix
    # should be handled.
    _wsopen_s = ctypes.CDLL('api-ms-win-crt-stdio-l1-1-0')._wsopen_s
_wsopen_s.argtypes = (ctypes.POINTER(ctypes.c_int), # pfh
                      ctypes.c_wchar_p,             # filename
                      ctypes.c_int,                 # oflag
                      ctypes.c_int,                 # shflag
                      ctypes.c_int)                 # pmode
def shdeny(file, flags):
    fh = ctypes.c_int()
    err = _wsopen_s(ctypes.byref(fh),
                    file, flags, _SH_DENYRW, _S_IWRITE)
    if err:
        raise IOError(err, os.strerror(err), file)
    return fh.value
def shdeny_write(file, flags):
    fh = ctypes.c_int()
    err = _wsopen_s(ctypes.byref(fh),
                    file, flags, _SH_DENYWR, _S_IWRITE)
    if err:
        raise IOError(err, os.strerror(err), file)
    return fh.value
def shdeny_read(file, flags):
    fh = ctypes.c_int()
    err = _wsopen_s(ctypes.byref(fh),
                    file, flags, _SH_DENYRD, _S_IWRITE)
    if err:
        raise IOError(err, os.strerror(err), file)
    return fh.value
For example:
if __name__ == '__main__':
    import tempfile
    filename = tempfile.mktemp()
    fw = open(filename, 'w')
    fw.write('spam')
    fw.flush()
    fr = open(filename)
    assert fr.read() == 'spam'
    try:
        f = open(filename, opener=shdeny_write)
    except PermissionError:
        fw.close()
        with open(filename, opener=shdeny_write) as f:
            assert f.read() == 'spam'
    try:
        f = open(filename, opener=shdeny_read)
    except PermissionError:
        fr.close()
        with open(filename, opener=shdeny_read) as f:
            assert f.read() == 'spam'
    with open(filename, opener=shdeny) as f:
        assert f.read() == 'spam'
    os.remove(filename)
In Python 2 you'll have to combine the above openers with os.fdopen, e.g.:
f = os.fdopen(shdeny_write(filename, os.O_RDONLY|os.O_TEXT), 'r')
Or define an sopen wrapper that lets you pass the share mode explicitly and calls os.fdopen to return a Python 2 file. This will require a bit more work to get the file mode from the passed in flags, or vice versa.
I'm trying to retrieve one or more zip files from an FTP site and save them to my local machine using Python (ideally I'd like to specify where they are saved on my C: drive).
The code below connects to the FTP site, and then something happens in the PyScripter window that looks like random characters for about 1000 lines, but nothing actually gets downloaded to my hard drive.
Any tips?
import ftplib
import sys
def gettext(ftp, filename, outfile=None):
    # fetch a text file
    if outfile is None:
        outfile = sys.stdout
    # use a lambda to add newlines to the lines read from the server
    ftp.retrlines("RETR " + filename, lambda s, w=outfile.write: w(s+"\n"))
def getbinary(ftp, filename, outfile=None):
    # fetch a binary file
    if outfile is None:
        outfile = sys.stdout
    ftp.retrbinary("RETR " + filename, outfile.write)
ftp = ftplib.FTP("FTP IP Address")
ftp.login("username", "password")
ftp.cwd("/MCPA")
#gettext(ftp, "subbdy.zip")
getbinary(ftp, "subbdy.zip")
Well, it seems that you simply forgot to open the file you want to write into.
Something like:
getbinary(ftp, "subbdy.zip", open(r'C:\Path\to\subbdy.zip', 'wb'))
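One caveat with the one-liner is that the local file is never explicitly closed. A minimal sketch of the same idea with a with block follows; download_binary is a hypothetical name, and ftp is assumed to be a logged-in ftplib.FTP instance.

```python
def download_binary(ftp, remote_name, local_path):
    """Save a remote file to local_path, closing the local file afterwards."""
    # 'wb' keeps the zip bytes untouched (no newline translation on Windows).
    with open(local_path, 'wb') as outfile:
        # retrbinary calls outfile.write once for every chunk it receives.
        ftp.retrbinary('RETR ' + remote_name, outfile.write)
```

With the connection from the question, the call would be `download_binary(ftp, "subbdy.zip", r'C:\Path\to\subbdy.zip')`.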
The computer is toying with me, I know it!
I am creating a zip folder in Python. The individual files are generated in memory and then the whole thing is zipped and saved to a file. I am allowed to add 9 files to the zip. I am allowed to add 11 files to the zip. But 10, no, not 10 files. The zip file IS saved to my computer, but I'm not allowed to open it; Windows says that the compressed zipped folder is invalid.
I use the code below, which I got from another stackoverflow question. It appends 10 files and saves the zipped folder. When I click on the folder, I cannot extract it. BUT, remove one of the appends() and it's fine. Or, add another append and it works!
What am I missing here? How can I make this work every time?
imz = InMemoryZip()
imz.append("1a.txt", "a").append("2a.txt", "a").append("3a.txt", "a").append("4a.txt", "a").append("5a.txt", "a").append("6a.txt", "a").append("7a.txt", "a").append("8a.txt", "a").append("9a.txt", "a").append("10a.txt", "a")
imz.writetofile("C:/path/test.zip")
import zipfile
import StringIO
class InMemoryZip(object):
    def __init__(self):
        # Create the in-memory file-like object
        self.in_memory_zip = StringIO.StringIO()
    def append(self, filename_in_zip, file_contents):
        '''Appends a file with name filename_in_zip and contents of
        file_contents to the in-memory zip.'''
        # Get a handle to the in-memory zip in append mode
        zf = zipfile.ZipFile(self.in_memory_zip, "a", zipfile.ZIP_DEFLATED, False)
        # Write the file to the in-memory zip
        zf.writestr(filename_in_zip, file_contents)
        # Mark the files as having been created on Windows so that
        # Unix permissions are not inferred as 0000
        for zfile in zf.filelist:
            zfile.create_system = 0
        return self
    def read(self):
        '''Returns a string with the contents of the in-memory zip.'''
        self.in_memory_zip.seek(0)
        return self.in_memory_zip.read()
    def writetofile(self, filename):
        '''Writes the in-memory zip to a file.'''
        f = file(filename, "w")
        f.write(self.read())
        f.close()
You should use the 'wb' mode when creating the file you are saving to the file system. This ensures that the file is written in binary mode.
Otherwise, any time a newline (\n) byte happens to occur in the zip data, Python on Windows will replace it with the Windows line ending (\r\n). The reason 10 files is a problem is that 10 happens to be the byte value of \n.
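To see where the 10 ends up, here is a small Python 3 sketch (the function name entry_count_bytes is hypothetical). The zip format stores the number of entries as a 2-byte little-endian integer in the end-of-central-directory record at the end of the archive, so an archive with 10 members contains the byte 0x0A there, which is exactly \n.

```python
import io
import zipfile


def entry_count_bytes(n_files):
    """Build an in-memory zip with n_files members and return the 2-byte
    'total entries' field of its end-of-central-directory record."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        for i in range(n_files):
            zf.writestr('%da.txt' % (i + 1), 'a')
    data = buf.getvalue()
    eocd = data[-22:]  # the record is 22 bytes when there is no archive comment
    if eocd[:4] != b'PK\x05\x06':
        raise ValueError('end-of-central-directory record not found')
    return eocd[10:12]  # total number of entries, little-endian
```

For 10 members the field is b'\n\x00'; for 9 or 11 members it is b'\t\x00' or b'\x0b\x00', neither of which contains a newline byte, so text-mode translation leaves that field alone.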
So your write function should look like this:
def writetofile(self, filename):
    '''Writes the in-memory zip to a file.'''
    f = file(filename, 'wb')
    f.write(self.read())
    f.close()
This should fix your problem and work for the files in your example. In your case, though, you might find it easier to write the zip file directly to the file system, as in this code, which incorporates some of the comments from above:
import StringIO
import zipfile
class ZipCreator:
    buffer = None
    def __init__(self, fileName=None):
        if fileName:
            self.zipFile = zipfile.ZipFile(fileName, 'w', zipfile.ZIP_DEFLATED, False)
            return
        self.buffer = StringIO.StringIO()
        self.zipFile = zipfile.ZipFile(self.buffer, 'w', zipfile.ZIP_DEFLATED, False)
    def addToZipFromFileSystem(self, filePath, filenameInZip):
        self.zipFile.write(filePath, filenameInZip)
    def addToZipFromMemory(self, filenameInZip, fileContents):
        self.zipFile.writestr(filenameInZip, fileContents)
        for zipFile in self.zipFile.filelist:
            zipFile.create_system = 0
    def write(self, fileName):
        if not self.buffer: # If the buffer was not initialized the file is written by the ZipFile
            self.zipFile.close()
            return
        f = file(fileName, 'wb')
        f.write(self.buffer.getvalue())
        f.close()
# Use File Handle
zipCreator = ZipCreator('C:/path/test.zip')
# Use Memory Buffer
# zipCreator = ZipCreator()
for i in range(1, 10):
    zipCreator.addToZipFromMemory('test/%sa.txt' % i, 'a')
zipCreator.write('C:/path/test.zip')
Ideally, you would probably use separate classes for an in-memory zip and a zip that is tied to the file system from the beginning. I have also seen some issues with the in-memory zip when folders are added, which are difficult to reproduce and which I am still trying to track down.
I need to download a zip archive of text files, dispatch each text file in the archive to other handlers for processing, and finally write the unzipped text file to disk.
I have the following code. It uses multiple open/close on the same file, which does not seem elegant. How do I make it more elegant and efficient?
zipped = urllib.urlopen('www.abc.com/xyz.zip')
buf = cStringIO.StringIO(zipped.read())
zipped.close()
unzipped = zipfile.ZipFile(buf, 'r')
for f_info in unzipped.infolist():
    logfile = unzipped.open(f_info)
    handler1(logfile)
    logfile.close() ## Cannot seek(0). The file like obj does not support seek()
    logfile = unzipped.open(f_info)
    handler2(logfile)
    logfile.close()
    unzipped.extract(f_info)
Your answer is in your example code. Just use StringIO to buffer the logfile:
zipped = urllib.urlopen('www.abc.com/xyz.zip')
buf = cStringIO.StringIO(zipped.read())
zipped.close()
unzipped = zipfile.ZipFile(buf, 'r')
for f_info in unzipped.infolist():
    logfile = unzipped.open(f_info)
    # Here's where we buffer:
    logbuffer = cStringIO.StringIO(logfile.read())
    logfile.close()
    for handler in [handler1, handler2]:
        handler(logbuffer)
        # StringIO objects support seek():
        logbuffer.seek(0)
    unzipped.extract(f_info)
You could say something like:
handler_dispatch(logfile)
and
def handler_dispatch(file):
    for line in file:
        handler1(line)
        handler2(line)
or even make it more dynamic by constructing a Handler class that holds multiple handler objects and applies each of them inside handler_dispatch. Like
class Handler:
    def __init__(self):
        self.handlers = []
    def add_handler(self, handler):
        self.handlers.append(handler)
    def handler_dispatch(self, file):
        for line in file:
            for handler in self.handlers:
                handler.handle(line)
Open the zip file once, loop through all the entries, read each one and process it, then write it to disk.
Like so:
for f_info in unzipped.infolist():
    file = unzipped.open(f_info)
    data = file.read()
    file.close()
    # If you need a file-like object, wrap it in a cStringIO
    fobj = cStringIO.StringIO(data)
    handler1(fobj)
    # Rewind so the second handler sees the data too
    fobj.seek(0)
    handler2(fobj)
    with open(f_info.filename, "wb") as fp:
        fp.write(data)
You get the idea.
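For reference, a Python 3 take on the same approach might look like the sketch below. The function name process_archive and its parameters are hypothetical; it assumes the whole archive fits in memory and that each handler accepts a binary file-like object.

```python
import io
import zipfile


def process_archive(zip_bytes, handlers, out_dir):
    """Read each member once, pass it to every handler, then extract it."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            data = archive.read(info)  # decompress the member a single time
            for handler in handlers:
                # A fresh BytesIO per handler, so no seek(0) is needed.
                handler(io.BytesIO(data))
            archive.extract(info, out_dir)
```

Giving each handler its own BytesIO avoids the rewind step, at the cost of one small wrapper object per handler; the decompressed data itself is shared.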