Python ftplib is creating files with Unix LF line endings instead of Windows CRLF - python

I'm on a Windows PC and I'm trying to download files from an FTP server. The files download fine, but when I open them in Notepad they are shown as having Unix (LF) line endings. I've tried a couple of different fixes to get Windows (CRLF) line endings, but nothing is working. The file is encoded as UTF-16-LE.
Here are two sources I looked at to fix this, without success:
How to correctly download files using ftplib so line breaks are added for windows
https://effbot.org/librarybook/ftplib.htm
My code is currently as follows:
def downloadFiles(self, files, localFolder):
    with FTP(host=self.host, user=self.username, passwd=self.password) as ftp:
        ftp.cwd(self.root)
        for file in files:
            with open(os.path.join(localFolder, file.fileName), 'w', newline=None) as f:
                ftp.retrlines(f'RETR {file.fileName}', lambda line, file=f: file.write(line+'\n'))
I've tried line+'\r\n', but it just adds an extra blank line instead.
Does anyone have any ideas on how to fix this?
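A likely explanation for the extra blank line, sketched under the assumption that the file is opened in text mode on Windows: with newline=None, every '\n' written is translated to the platform line separator, so a manually added '\r\n' reaches the disk as '\r\r\n'. The snippet below reproduces this with an in-memory stream. `written_bytes` is a hypothetical helper, and newline='\r\n' stands in for the Windows default so that the result is the same on any platform.

```python
import io


def written_bytes(text, newline):
    """Return the bytes a text-mode stream produces for `text`.

    `newline` has the same meaning as the `newline` argument of open().
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    stream.write(text)
    stream.flush()
    stream.detach()  # leave `raw` open so its contents can be read
    return raw.getvalue()


# newline="\r\n" mimics what newline=None does on Windows.
doubled = written_bytes("line\r\n", newline="\r\n")  # the "\n" is expanded again
correct = written_bytes("line\n", newline="\r\n")    # exactly one CRLF per line
```

Appending a plain '\n' and letting the text layer do the translation avoids the doubled carriage return.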

If anyone has the issue in the future with a utf-16 file you just need to set the ftp encoding to utf-16. I was looking for an encoding option at the file level, but apparently you need to set it at the connection level.
with FTP(host=self.host, user=self.username, passwd=self.password) as ftp:
    ftp.encoding = 'utf-16'
    ftp.cwd(self.root)
    for file in files:
        with open(os.path.join(localFolder, file.fileName), 'w', encoding='utf-16') as f:
            ftp.retrlines(f'RETR {file.fileName}', lambda line: f.write(line + '\n'))
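The line ending itself can also be pinned at the file level, independent of the platform default: passing newline='\r\n' to open() turns every '\n' written into CRLF. This is a minimal sketch, not the answerer's code. `write_lines_crlf` is a hypothetical helper, and the list of strings stands in for the lines that retrlines would pass to its callback.

```python
import os
import tempfile


def write_lines_crlf(path, lines, encoding="utf-16-le"):
    """Write `lines` (without terminators) to `path`, ending each with CRLF."""
    # newline="\r\n" translates every "\n" written into "\r\n" on any platform.
    with open(path, "w", encoding=encoding, newline="\r\n") as f:
        for line in lines:
            f.write(line + "\n")


# Stand-in for the lines ftp.retrlines() would hand to its callback.
sample_lines = ["first", "second"]
target = os.path.join(tempfile.mkdtemp(), "sample.txt")
write_lines_crlf(target, sample_lines)

with open(target, "rb") as f:
    raw = f.read()
```

With ftplib, the same open() call would wrap the retrlines call, with a callback such as lambda line: f.write(line + "\n").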

There is a simple command-line utility, unix2dos.
You can run unix2dos on the files after the FTP download.
Also, if you are editing the file as text, use Notepad++.
With Notepad++ you can manage the file's newline format and its encoding as well.
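If an external tool is not an option, the same kind of conversion can be sketched in Python. The function below is a hypothetical helper, not part of unix2dos. It assumes the whole file fits in memory and that the encoding is known (UTF-16-LE here, as in the question). The data is decoded first because replacing the single byte b"\n" in UTF-16 data would corrupt it.

```python
import re


def lf_to_crlf(data, encoding="utf-16-le"):
    """Return `data` with bare LF line endings converted to CRLF."""
    text = data.decode(encoding)
    # Only touch "\n" that is not already preceded by "\r".
    converted = re.sub(r"(?<!\r)\n", "\r\n", text)
    return converted.encode(encoding)


unix_bytes = "a\nb\r\nc\n".encode("utf-16-le")
windows_bytes = lf_to_crlf(unix_bytes)
```

Because existing CRLF pairs are left alone, running the function twice gives the same result as running it once.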

Related

Uploading JSON String as file to Azure File Storage - linebreaks disappear

I have a list of dicts in Python and want to upload it as a json-file to Azure File Storage. When I print the list locally the linebreaks exist. After uploading and manually checking the file on Azure File Storage I noticed that the linebreaks were non existent.
list_of_dicts = my_json_dicts
transformed_dict_str = '\n'.join([json.dumps(x) for x in list_of_dicts])
# print(transformed_dict_str) gives me the "dicts"/lines separated by linebreaks.
service.create_file_from_text(share_name, file_path, file_name.json, transformed_dict_str, encoding='utf-8')
Can anyone tell me why the uploaded file (when I open it in Notepad after downloading it manually via the browser interface of Azure) does not contain any linebreaks?
Edit:
When I write the string to a local path with the following code, the linebreaks still exist. So it must happen during the create_file_from_text function?
file = open("myjson.json", "w")
file.write(transformed_dict_str)
file.close()
Please use '\r\n' instead of '\n' in your code.
I can reproduce your issue when using '\n', but it works fine with '\r\n' (Notepad then shows the linebreaks).
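The change amounts to swapping the separator used in the join. A minimal sketch, with `to_json_lines` as a hypothetical helper and the sample records invented for illustration. The upload call itself is left out because nothing about it changes.

```python
import json


def to_json_lines(records, line_ending="\r\n"):
    """Serialize each record as one JSON document per line."""
    return line_ending.join(json.dumps(record) for record in records)


payload = to_json_lines([{"id": 1}, {"id": 2}])
```

The linebreaks were most likely present in the uploaded file all along: older versions of Notepad only render CRLF, so an LF-only file appears as a single line there.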

Zip-file not recognized when downloaded through FTP

I am trying to create a script which will download a ZIP-file and extract it.
I am using Python 2.7 on Windows Server 2016.
I created a download script looking like this:
ftp = FTP()
ftp.connect("***")
ftp.login("***","***")
ftp.cwd(ftppath)
ftp.retrbinary("RETR " + filename ,open(tempfile, 'wb').write)
ftp.quit()
And a zip extraction script:
zip_ref = zipfile.ZipFile(tempfile, 'r')
zip_ref.extractall(localpath)
zip_ref.close()
These work independently. Meaning: if I run the extraction script on my test ZIP file, it extracts the file. Also, if I run the FTP script from my server, it downloads the file.
However, if I run the scripts together, meaning I download the file from my FTP server and then extract it, it returns an error: "file is not a Zip file".
Anyone who knows why this happens?
I have checked the following:
Correct folder
Downloading the zip-file, extracting it and recompressing it (then the script will extract it)
EDIT
I have been reading about IO bytes and the like, however without any luck on implementing it.
Probably because of this bad-practice one-liner:
ftp.retrbinary("RETR " + filename ,open(tempfile, 'wb').write)
open(tempfile, 'wb').write gives no guarantee as to when the file is closed. You don't store the handle returned by open anywhere, so you cannot decide when to close the file (and ensure everything is written to disk).
So the last part of the file may not have been written to disk yet when you try to open it in read mode. Chaining download + unzip can trigger the bug, whereas two separate executions leave time for the file to be flushed and closed.
Better use a context manager like this:
with open(tempfile, 'wb') as f:
    ftp.retrbinary("RETR " + filename ,f.write)
so the file is flushed & closed when exiting the with block (of course, perform the file read operations outside this block).
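Putting both steps together, the ordering described above could look like the following sketch. `download_then_extract` and its `fetch` parameter are hypothetical names. `fetch` is any callable that accepts a write callback, so the demonstration can feed it an in-memory archive instead of a live FTP session.

```python
import io
import os
import tempfile
import zipfile


def download_then_extract(fetch, zip_path, dest_dir):
    """Save a download to `zip_path`, close it, then extract into `dest_dir`.

    With ftplib, `fetch` could be:
        lambda cb: ftp.retrbinary("RETR " + filename, cb)
    """
    with open(zip_path, "wb") as f:
        fetch(f.write)
    # The file is flushed and closed here, before zipfile opens it.
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


# Build a small archive in memory to stand in for the FTP download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as z:
    z.writestr("hello.txt", "hi")

work = tempfile.mkdtemp()
names = download_then_extract(
    lambda cb: cb(buffer.getvalue()),
    os.path.join(work, "download.zip"),
    os.path.join(work, "out"),
)
```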

Problems with cross platform tell / seek in Python

Having a weird bug with Python 2.7.3 file reading. If I do this sort of thing:
end_of_header = f.tell()
print f.readline()
f.seek(end_of_header)
print f.readline()
the results are different. The file was written on Linux / Mac (not sure) and I'm trying to run it on Windows 7. If I run it on Linux it works. I have tried opening the file with both the 'b' and 'U' flags and it's not working. I have tried various encodings by opening with the codecs module.
Is the readline() causing the problem?
Some context: there is a header, after which there is a long trajectory (can be in the GB range). I need to be able to read the header and process it, then read the file one line at a time. I may need to go back to the start of the file (end of the header) at any time though.
Since you mention Windows and Linux/Mac, I think you have a problem of different newlines ( http://www.editpadpro.com/tricklinebreak.html ) between the operating system on which the file was written and the one on which it is read.
The problem arises because you opened the file in non-binary (text) mode.
Try opening the file in binary mode, that is, with 'rb', 'rb+', 'ab' or 'ab+' depending on what you want to do.
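In binary mode, tell() returns a plain byte offset, so seeking back to it and reading again gives the same line. The sketch below is written for Python 3 rather than the 2.7.3 of the question, and the file name and contents are invented for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "trajectory.txt")
with open(path, "wb") as f:
    f.write(b"header\nframe 1\nframe 2\n")

with open(path, "rb") as f:
    f.readline()                 # consume the header
    end_of_header = f.tell()     # a plain byte offset in binary mode
    first_read = f.readline()
    f.seek(end_of_header)        # jump back to the end of the header
    second_read = f.readline()
```

Lines read this way are bytes and keep whatever terminator the file contains, so any '\r' has to be stripped by the caller.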

Python ftplib Corrupting Files?

I'm downloading files in Python using ftplib and up until recently everything seemed to be working fine. I am downloading files as such:
ftpSession = ftplib.FTP(host,username,password)
ftpSession.cwd('rlmfiles')
ftpFileList = filter(lambda x: 'PEDI' in x, ftpSession.nlst())
ftpFileList.sort()
for f in ftpFileList:
    tempFile = open(os.path.join(localDirectory,f),'wb')
    ftpSession.retrbinary('RETR '+f,tempFile.write)
    tempFile.close()
ftpSession.quit()
sys.exit(0)
Up until recently it was downloading the files I needed just fine, as expected. Now, however, the files I'm downloading are corrupted and just contain long strings of garbage ASCII. I know that the problem is not with the files posted on the FTP server I'm pulling them from, because I also have a Perl script that downloads them successfully from the same server.
If it is any additional info, here's what the debugger puts out in the command prompt when downloading a file:
Has anyone encountered any issues with corrupted file contents using retrbinary() in Python's ftplib?
I'm really stuck/frustrated and haven't come across anything related to possible corruption here. Any help is appreciated.
I just ran into this issue yesterday when I was attempting to download text files. Not sure if that is what you were doing, but since you say it has ASCII garbage in it, I assume you opened it in a text editor because it was supposed to be text.
If this is the case, the problem is that the file is a text file and you are trying to download it in binary mode.
What you want to do instead is retrieve the file in ASCII transfer mode.
tempFile = open(os.path.join(localDirectory,f),'w') # Changed 'wb' to 'w'
ftpSession.retrlines('RETR '+f,tempFile.write) # Changed retrbinary to retrlines
Unfortunately, this strips all the new-line characters out of the file. Yuck!
So then you need to add the stripped out new-line characters again:
tempFile = open(os.path.join(localDirectory,f),'w')
textLines = []
ftpSession.retrlines('RETR '+f,textLines.append)
tempFile.write('\n'.join(textLines))
This should work, but it doesn't look as nice as it could. So a little cleanup effort would get us:
temporaryFile = open(os.path.join(localDirectory, currentFile), 'w')
textLines = []
retrieveCommand = 'RETR '
ftpSession.retrlines(retrieveCommand + currentFile, textLines.append)
temporaryFile.write('\n'.join(textLines))
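One detail of the join-based version: '\n'.join leaves the last line without a terminator. A callback that appends the newline itself avoids both that and the intermediate list. This is a minimal sketch with `make_line_writer` as a hypothetical helper, and a short list standing in for the lines delivered by retrlines.

```python
import io


def make_line_writer(stream):
    """Return a callback that writes each received line plus a newline."""
    def write_line(line):
        stream.write(line + "\n")
    return write_line


received = ["alpha", "beta"]      # stand-in for lines from retrlines()

streamed = io.StringIO()
callback = make_line_writer(streamed)
for line in received:
    callback(line)

joined = "\n".join(received)      # what the join-based version writes
```

With a real session, the callback would be passed straight to the call, for example ftpSession.retrlines('RETR ' + f, make_line_writer(tempFile)).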

Downloading text files with Python and ftplib.FTP from z/os

I'm trying to automate downloading of some text files from a z/os PDS, using Python and ftplib.
Since the host files are EBCDIC, I can't simply use FTP.retrbinary().
FTP.retrlines(), when used with open(file,w).writelines as its callback, doesn't, of course, provide EOLs.
So, for starters, I've come up with this piece of code which "looks OK to me", but as I'm a relative Python noob, can anyone suggest a better approach? Obviously, to keep this question simple, this isn't the final, bells-and-whistles thing.
Many thanks.
#!python.exe
from ftplib import FTP
class xfile (file):
    def writelineswitheol(self, sequence):
        for s in sequence:
            self.write(s+"\r\n")
sess = FTP("zos.server.to.be", "myid", "mypassword")
sess.sendcmd("site sbd=(IBM-1047,ISO8859-1)")
sess.cwd("'FOO.BAR.PDS'")
a = sess.nlst("RTB*")
for i in a:
    sess.retrlines("RETR "+i, xfile(i, 'w').writelineswitheol)
sess.quit()
Update: Python 3.0, platform is MingW under Windows XP.
z/os PDSs have a fixed record structure, rather than relying on line endings as record separators. However, the z/os FTP server, when transmitting in text mode, provides the record endings, which retrlines() strips off.
Closing update:
Here's my revised solution, which will be the basis for ongoing development (removing built-in passwords, for example):
import ftplib
import os
from sys import exc_info
sess = ftplib.FTP("undisclosed.server.com", "userid", "password")
sess.sendcmd("site sbd=(IBM-1047,ISO8859-1)")
for dir in ["ASM", "ASML", "ASMM", "C", "CPP", "DLLA", "DLLC", "DLMC", "GEN", "HDR", "MAC"]:
    sess.cwd("'ZLTALM.PREP.%s'" % dir)
    try:
        filelist = sess.nlst()
    except ftplib.error_perm as x:
        if (x.args[0][:3] != '550'):
            raise
    else:
        try:
            os.mkdir(dir)
        except:
            continue
        for hostfile in filelist:
            lines = []
            sess.retrlines("RETR "+hostfile, lines.append)
            pcfile = open("%s/%s"% (dir,hostfile), 'w')
            for line in lines:
                pcfile.write(line+"\n")
            pcfile.close()
        print ("Done: " + dir)
sess.quit()
My thanks to both John and Vinay
Just came across this question as I was trying to figure out how to recursively download datasets from z/OS. I've been using a simple python script for years now to download ebcdic files from the mainframe. It effectively just does this:
def writeline(line):
    file.write(line + "\n")
file = open(filename, "w")
ftp.retrlines("retr " + filename, writeline)
You should be able to download the file as a binary (using retrbinary) and use the codecs module to convert from EBCDIC to whatever output encoding you want. You should know the specific EBCDIC code page being used on the z/OS system (e.g. cp500). If the files are small, you could even do something like (for a conversion to UTF-8):
file = open(ebcdic_filename, "rb")
data = file.read()
converted = data.decode("cp500").encode("utf8")
file = open(utf8_filename, "wb")
file.write(converted)
file.close()
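For files too large to read in one go, the same conversion can be done chunk by chunk. The following is a sketch only: `convert_encoding` is a hypothetical helper, cp500 is simply the code page named above, and the sample text is invented. It converts characters but does nothing about record boundaries, which matters because, as noted in the question, z/OS datasets have a fixed record structure and the line endings are only supplied in text-mode transfers.

```python
import codecs
import io


def convert_encoding(src, dst, source_encoding="cp500",
                     target_encoding="utf-8", chunk_size=64 * 1024):
    """Copy binary stream `src` to `dst`, re-encoding chunk by chunk."""
    decoder = codecs.getincrementaldecoder(source_encoding)()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode(target_encoding))
    # Flush anything the decoder is still holding.
    dst.write(decoder.decode(b"", final=True).encode(target_encoding))


ebcdic = io.BytesIO("HELLO WORLD".encode("cp500"))
utf8 = io.BytesIO()
convert_encoding(ebcdic, utf8, chunk_size=4)  # tiny chunks to exercise the loop
```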
Update: If you need to use retrlines to get the lines and your lines are coming back in the correct encoding, your approach will not work, because the callback is called once for each line. So in the callback, sequence will be the line, and your for loop will write the individual characters of the line to the output, each on its own line. So you probably want to do self.write(sequence + "\r\n") rather than the for loop. It still doesn't feel especially right to subclass file just to add this utility method, though; it probably needs to be in a different class in your bells-and-whistles version.
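The effect is easy to reproduce without an FTP server. In this sketch the two functions are hypothetical stand-ins for the callback, and an in-memory stream replaces the output file.

```python
import io


def buggy_callback(stream, sequence):
    # Mirrors the loop in writelineswitheol: iterating a str yields characters.
    for s in sequence:
        stream.write(s + "\n")


def fixed_callback(stream, line):
    # Treat the argument as one complete line.
    stream.write(line + "\n")


buggy, fixed = io.StringIO(), io.StringIO()
buggy_callback(buggy, "AB")
fixed_callback(fixed, "AB")
```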
Your writelineswitheol method appends '\r\n' instead of '\n' and then writes the result to a file opened in text mode. The effect, no matter what platform you are running on, will be an unwanted '\r'. Just append '\n' and you will get the appropriate line ending.
Proper error handling should not be relegated to a "bells and whistles" version. You should set up your callback so that your file open() is in a try/except and retains a reference to the output file handle, your write call is in a try/except, and you have a callback_obj.close() method which you call when retrlines() returns to explicitly file_handle.close() (in a try/except). That way you get explicit error handling, e.g. messages "can't (open|write to|close) file X because Y", AND you save having to think about when your files are going to be implicitly closed and whether you risk running out of file handles.
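One possible shape for such a callback object, as a sketch rather than a definitive design: `LineSink` is a hypothetical class name, the choice of RuntimeError and UTF-8 is an assumption, and a short list stands in for the lines retrlines would deliver.

```python
import os
import tempfile


class LineSink:
    """Callback object for retrlines() that owns its output file."""

    def __init__(self, path, encoding="utf-8"):
        self.path = path
        try:
            self._file = open(path, "w", encoding=encoding)
        except OSError as exc:
            raise RuntimeError(f"can't open file {path} because {exc}") from exc

    def __call__(self, line):
        try:
            self._file.write(line + "\n")
        except OSError as exc:
            raise RuntimeError(f"can't write to file {self.path} because {exc}") from exc

    def close(self):
        try:
            self._file.close()
        except OSError as exc:
            raise RuntimeError(f"can't close file {self.path} because {exc}") from exc


sink_path = os.path.join(tempfile.mkdtemp(), "member.txt")
sink = LineSink(sink_path)
try:
    # Stand-in for: sess.retrlines("RETR " + hostfile, sink)
    for line in ["record 1", "record 2"]:
        sink(line)
finally:
    sink.close()

with open(sink_path, encoding="utf-8") as f:
    content = f.read()
```

Closing in a finally block means the handle is released even when the transfer raises part-way through.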
Python 3.x ftplib.FTP.retrlines() should give you str objects which are in effect Unicode strings, and you will need to encode them before you write them -- unless the default encoding is latin1 which would be rather unusual for a Windows box. You should have test files with (1) all possible 256 bytes (2) all bytes that are valid in the expected EBCDIC codepage.
[a few "sanitation" remarks]
You should consider upgrading your Python from 3.0 (a "proof of concept" release) to 3.1.
To facilitate better understanding of your code, use "i" as an identifier only as a sequence index and only if you irredeemably acquired the habit from FORTRAN 3 or more decades ago :-)
Two of the problems discovered so far (appending line terminator to each character, wrong line terminator) would have shown up the first time you tested it.
When you use retrlines of ftplib to download a file from z/OS, each line comes back without '\n'.
This differs from the Windows ftp command 'get xxx'.
We can copy the function 'retrlines' to a new function 'retrlines_zos' in ftplib.py.
Just copy the whole code of retrlines, and change the 'callback' line to:
...
callback(line + "\n")
...
I tested and it worked.
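The same effect can be had without editing ftplib.py, by wrapping the callback instead. This is a sketch under assumptions: `retrlines_with_eol` is a hypothetical helper, `FakeFTP` exists only so the example runs without a server, and the member name in the command is invented.

```python
def retrlines_with_eol(ftp, cmd, callback, eol="\n"):
    """Call ftp.retrlines() but pass each line to `callback` with `eol` appended."""
    return ftp.retrlines(cmd, lambda line: callback(line + eol))


class FakeFTP:
    """Minimal stand-in for ftplib.FTP, used only to exercise the wrapper."""

    def retrlines(self, cmd, callback=None):
        for line in ("REC1", "REC2"):
            callback(line)
        return "226 Transfer complete"


collected = []
status = retrlines_with_eol(FakeFTP(), "RETR 'FOO.BAR.PDS(MEMBER)'", collected.append)
```

Keeping the change outside the standard library means it survives a Python upgrade.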
You want a lambda function and a callback. Like so:
def writeLineCallback(line, file):
    file.write(line + "\n")
ftpcommand = "RETR {}{}{}".format("'",zOsFile,"'")
filename = "newfilename"
with open( filename, 'w' ) as file :
    callback_lambda = lambda x: writeLineCallback(x,file)
    ftp.retrlines(ftpcommand, callback_lambda)
This will download file 'zOsFile' and write it to 'newfilename'
