I am trying to create a script which will download a ZIP-file and extract it.
I am using Python 2.7 on Windows Server 2016.
I created a download script looking like this:
ftp = FTP()
ftp.connect("***")
ftp.login("***","***")
ftp.cwd(ftppath)
ftp.retrbinary("RETR " + filename ,open(tempfile, 'wb').write)
ftp.quit()
And a zip extraction script:
zip_ref = zipfile.ZipFile(tempfile, 'r')
zip_ref.extractall(localpath)
zip_ref.close()
These work independently. Meaning: If i run the extraction script on my test ZIP-file it will extract the file. Also if i run the FTP script from my server, it will download the file.
However! If i run the scripts together, meaning i download the file from my FTP server and then extract it, it will return an error: "file is not a Zip file".
Anyone who knows why this happens?
I have checked the following:
Correct folder
Downloading the zip-file, extracting it and recompressing it (then the script will extract it)
EDIT
I have been reading about IO bytes and the like, however without any luck on implementing it.
probably because of this bad practice one-liner:
ftp.retrbinary("RETR " + filename ,open(tempfile, 'wb').write)
open(tempfile, 'wb').write doesn't give any guarantee as to when the file is closed. You don't store the handle returned by open anywhere so you cannot decide when to close the file (and ensure full disk write).
So the last part of the file could just be not written to disk yet when trying to open it in read mode. And chaining download + unzip can trigger the bug (when 2 separate executions leave the time to flush & close the file)
Better use a context manager like this:
with open(tempfile, 'wb') as f:
ftp.retrbinary("RETR " + filename ,f.write)
so the file is flushed & closed when exiting the with block (of course, perform the file read operations outside this block).
Related
I've encountered this strange problem with opening/closing files in python. I am trying to do the same thing in python that i was doing successfully in matlab, and i am getting a problem communicating with some software through text files. I've come up with a strange workaround to solve the problem, but i dont understand why it works.
I have software that communicates with a some lab equipment. To communicate with this software, i write a file ('wavefile.txt') to a specific folder, containing parameters to send to the device. I then write another file named 'request.txt' containing the location of this first file ('wavefile.txt') which contains the parameters to send to the device. The software is constantly checking this folder to find the file named 'request.txt' and once it finds it, it will read the parameters in the file which is specified by the text in 'request.txt' and then delete 'request.txt'. The software/equipment developer instructs to give a 50 ms second delay before closing the 'request.txt' file.
original matlab code that works:
home = cd;
cd \\CREOL-FAST-01\data
fileID = fopen('request.txt', 'wt');
proj = 'C:\\dazzler\\data\\wavefile.txt';
fprintf(fileID, proj);
pause(0.05);
fclose('all');
cd(home);
original python code that does not work:
home = os.getcwd()
os.chdir(r'\\CREOL-FAST-01\data')
with open('request.txt', 'w') as file:
proj = r'C:\dazzler\data\wavefile.txt'
file.write(proj)
time.sleep(0.05)
os.chdir(home)
Every time the device program reads the 'request.txt' when its working with matlab, it deletes it immediately after matlab closes it. When i run that code with python it works SOMETIMES, maybe 1 in every 5 tries will be successful and the parameters are sent. The 'request.txt' file is always deleted with the python code above,but the parameters i've input are clearly not sent to my lab device. My guess is that when i write the file in python, the device program is able to read it before python writes the text to it, so its just opening the blank file, not applying any parameters, and then deleting it.
My workaround in python:
home = os.getcwd()
os.chdir(r'\\CREOL-FAST-01\data')
fileh = open('request.txt', 'w+')
proj = r'C:\dazzler\data\wavefile.txt'
fileh.write(proj)
time.sleep(0.05)
print(fileh.read())
time.sleep(0.05)
fileh.close()
This method in python seems to work 100% of the time. I open the file in w+ mode, and using fileh.read() is absolutely necessary. if i delete that line and still include the extra sleeptime, it will again work about 1 in 5 tries. This seems really strange to me. Any explanation, or better solutions?
My guess (which could be wrong) is that the file is being read before it is completely flushed. I would try using the flush() method after the write to make sure that the complete data is written to the file. You might also need the os.fsync() method to make sure the data is flushed properly. Try something like this:
home = os.getcwd()
os.chdir(r'\\CREOL-FAST-01\data')
with open('request.txt', 'w') as file:
proj = r'C:\dazzler\data\wavefile.txt'
file.write(proj)
file.flush()
os.fsync()
time.sleep(0.05)
os.chdir(home)
Not knowing any details about the particular equipment and other software you are using it's hard to say. One guess is the difference in buffering on write calls.
From this blog post on Matlab's fwrite: "The default behavior for fprintf and fwrite is to flush the file buffer after each call to either of these functions"
Whereas for Python's open:
When no buffering argument is given, the default buffering policy
works as follows:
Binary files are buffered in fixed-size chunks; the size of the buffer
is chosen using a heuristic trying to determine the underlying
device’s “block size” and falling back on io.DEFAULT_BUFFER_SIZE. On
many systems, the buffer will typically be 4096 or 8192 bytes long.
“Interactive” text files (files for which isatty() returns True) use
line buffering. Other text files use the policy described above for
binary files.
To test this guess change:
with open('request.txt', 'w') as file:
proj = r'C:\dazzler\data\wavefile.txt'
to:
with open('request.txt', 'w', buffer=1) as file:
proj = 'C:\\dazzler\\data\\wavefile.txt\n'
The problem is probably that you are doing the delay while the file is still open and thus not written to disk. Try something like this:
home = os.getcwd()
os.chdir(r'\\CREOL-FAST-01\data')
with open('request.txt', 'w') as file:
proj = r'C:\dazzler\data\wavefile.txt'
file.write(proj)
time.sleep(0.05)
os.chdir(home)
The only difference here is that the sleep is done after the file is closed (the file is closed when the with block ends), and thus the delay doesn't happen until after the text is written to disk.
To put it in words, what you are doing is:
Open (and create) file
Write text to a buffer (in memory, not on disk)
Wait 50 ms
Close (and write) the file
What you want to do is:
Open (and create) file
Write text to a buffer (in memory, not on disk)
Close (and write) the file
Wait 50 ms
So what you end up with is a period of at least 50 ms where the text file has been created, but where there is nothing in it because the text is sitting in your computer memory not on disk.
To be honest, I would put as little in the with block as possible, to avoid issues like this. So I would write it like so:
home = os.getcwd()
os.chdir(r'\\CREOL-FAST-01\data')
proj = r'C:\dazzler\data\wavefile.txt'
with open('request.txt', 'w') as file:
file.write(proj)
time.sleep(0.05)
os.chdir(home)
Also keep in mind that you also can't do the opposite: assume that no text is written until you close. For small files like this, that will probably happen. But when the file is written to disk depends on a lot of factors. When a file is closed, it is written, but it may be written before that too.
I am trying to automate the download of multiple files from an ftp source. These will span multiple years, dates, and from multiple sites that collected the data. Right now, I'm trying to make the basic download work. I can download a single file, but multiple files fail. I know when doing it manually, we would get to the directory, then
$>prompt
$>mget *.*
I have the following code as a first run at this...
import ftplib, subprocess
session = ftplib.FTP(host,user,password)
session.cwd(path)
subprocess.call("prompt")
files = session.nlst()
for f in files:
print f
session.retrbinary(("RETR" + f), open(f, 'wb').write)
session.quit()
Without the subprocess.call, the code pulls the first file, then errors out saying "command not understood." My assumption is that this is the box promptingg, since it does that if being downloaded manually. That's why I'm assuming I need the subprocess.call("prompt") command in there, as I would if handling this manually. However, when I have the subprocess added, it gives me an error that "The system cannot find the file specified" so that doesn't work, either. This error comes out of the subprocess.py module.
I guess I should post this here. Thank you to Greg Hewgill in the comments for the answer. I just needed a space after "Retr" in the line
session.retrbinary(("RETR " + f), open(f, 'wb').write)
I'm downloading files in Python using ftplib and up until recently everything seemed to be working fine. I am downloading files as such:
ftpSession = ftplib.FTP(host,username,password)
ftpSession.cwd('rlmfiles')
ftpFileList = filter(lambda x: 'PEDI' in x, ftpSession.nlst())
ftpFileList.sort()
for f in ftpFileList:
tempFile = open(os.path.join(localDirectory,f),'wb')
ftpSession.retrbinary('RETR '+f,tempFile.write)
tempFile.close()
ftpSession.quit()
sys.exit(0)
Up until recently it was downloading the files I needed just fine, as expected. Now, however, My files I'm downloading are corrupted and just contain long strings of garbage ASCII. I know that it is not the files posted onto the FTP I'm pulling them from because I also have a Perl script that does this successfully from the same FTP.
If it is any additional info, here's what the debugger puts out in the command prompt when downloading a file:
Has anyone encountered any issues with corrupted file contents using retrbinary() in Python's ftplib?
I'm really stuck/frustrated and haven't come across anything related to possible corruption here. Any help is appreciated.
I just ran into this issue yesterday when I was attempting to download text files. Not sure if that is what you were doing, but since you say it has ASCII garbage in it, I assume you opened it in a text editor because it was supposed to be text.
If this is the case, the problem is that the file is a text file and you are trying to download it in binary mode.
What you want to do instead is retrieve the file in ASCII transfer mode.
tempFile = open(os.path.join(localDirectory,f),'w') # Changed 'wb' to 'w'
ftpSession.retrlines('RETR '+f,tempFile.write) # Changed retrbinary to retrlines
Unfortunately, this strips all the new-line characters out of the file. Yuck!
So then you need to add the stripped out new-line characters again:
tempFile = open(os.path.join(localDirectory,f),'w')
textLines = []
ftpSession.retrlines('RETR '+f,textLines.append)
tempFile.write('\n'.join(textLines))
This should work, but it doesn't look as nice as it could. So a little cleanup effort would get us:
temporaryFile = open(os.path.join(localDirectory, currentFile), 'w')
textLines = []
retrieveCommand = 'RETR '
ftpSession.retrlines(retrieveCommand + currentFile, textLines.append)
temporaryFile.write('\n'.join(textLines))
I'm building a software to run on a Bluehost server, and off the server it writes and reads files fine, but on the server it just builds empty files and won't read either, this is a sample of how my code works:
File = open(os.getcwd() + '/file.dat', 'wb')
File.write('data')
File.close()
It works of the server fine, but not on the server. It creates the file but won't write data to it. What could be the problem?
Are you checking the right file?
Try this and see if the print out is what you expect.
file_path = os.path.join(os.path.dirname(os.getcwd()), '/file.dat')
print file_path
try:
directoryListing = os.listdir(inputDirectory)
#other code goes here, it iterates through the list of files in the directory
except WindowsError as winErr:
print("Directory error: " + str((winErr)))
This works fine, and I have tested that it doesnt choke and die when the directory doesn't exist, but I was reading in a Python book that I should be using "with" when opening files. Is there a preferred way to do what I am doing?
You are perfectly fine. The os.listdir function does not open files, so ultimately you are alright. You would use the with statement when reading a text file or similar.
an example of a with statement:
with open('yourtextfile.txt') as file: #this is like file=open('yourtextfile.txt')
lines=file.readlines() #read all the lines in the file
#when the code executed in the with statement is done, the file is automatically closed, which is why most people use this (no need for .close()).
What you are doing is fine. With is indeed the preferred way for opening files, but listdir is perfectly acceptable for just reading the directory.