I am trying to automate the download of multiple files from an FTP source. The files span multiple years and dates and come from multiple sites that collected the data. Right now, I'm trying to make the basic download work. I can download a single file, but multiple files fail. I know that when doing it manually, we would get to the directory, then
$>prompt
$>mget *.*
I have the following code as a first run at this...
import ftplib, subprocess
session = ftplib.FTP(host,user,password)
session.cwd(path)
subprocess.call("prompt")
files = session.nlst()
for f in files:
    print f
    session.retrbinary(("RETR" + f), open(f, 'wb').write)
session.quit()
Without the subprocess.call, the code pulls the first file, then errors out saying "command not understood." My assumption is that this is the box prompting, since it does that when downloading manually. That's why I assumed I need the subprocess.call("prompt") command in there, as I would when handling this manually. However, with the subprocess call added, I get the error "The system cannot find the file specified", so that doesn't work either. This error comes out of the subprocess.py module.
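I guess I should post this here. Thank you to Greg Hewgill in the comments for the answer. I just needed a space after "RETR" in the line below. Without it, the command and the file name run together and the server does not recognize the command.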
session.retrbinary(("RETR " + f), open(f, 'wb').write)
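For reference, here is a minimal sketch of the whole loop with the fix applied. The helper name download_all and the dest_dir parameter are made up for illustration, and session is assumed to be an already logged-in ftplib.FTP object sitting in a directory that contains only plain files. It is written for Python 3.

```python
import os


def download_all(session, dest_dir="."):
    """Download every entry returned by NLST into dest_dir.

    Hypothetical helper: `session` is assumed to be a logged-in ftplib.FTP
    object whose current directory holds only plain files.
    """
    downloaded = []
    for name in session.nlst():
        local_path = os.path.join(dest_dir, name)
        # Note the space after RETR: command and file name are separate tokens.
        with open(local_path, "wb") as handle:
            session.retrbinary("RETR " + name, handle.write)
        downloaded.append(local_path)
    return downloaded
```

No "prompt" step is needed here: that is a toggle of the interactive ftp client, not something ftplib has to send.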
I am trying to create a script which will download a ZIP-file and extract it.
I am using Python 2.7 on Windows Server 2016.
I created a download script looking like this:
ftp = FTP()
ftp.connect("***")
ftp.login("***","***")
ftp.cwd(ftppath)
ftp.retrbinary("RETR " + filename ,open(tempfile, 'wb').write)
ftp.quit()
And a zip extraction script:
zip_ref = zipfile.ZipFile(tempfile, 'r')
zip_ref.extractall(localpath)
zip_ref.close()
These work independently. Meaning: if I run the extraction script on my test ZIP file, it will extract the file. Also, if I run the FTP script from my server, it will download the file.
However! If I run the scripts together, meaning I download the file from my FTP server and then extract it, it returns an error: "file is not a Zip file".
Anyone who knows why this happens?
I have checked the following:
Correct folder
Downloading the zip-file, extracting it and recompressing it (then the script will extract it)
EDIT
I have been reading about IO bytes and the like, however without any luck on implementing it.
Probably because of this bad-practice one-liner:
ftp.retrbinary("RETR " + filename ,open(tempfile, 'wb').write)
open(tempfile, 'wb').write doesn't give any guarantee as to when the file is closed. You don't store the handle returned by open anywhere, so you cannot decide when to close the file (and ensure a full write to disk).
So the last part of the file may not have been written to disk yet when you try to open it in read mode. Chaining download + unzip in one run can trigger the bug, whereas two separate executions leave time for the file to be flushed and closed.
Better use a context manager like this:
with open(tempfile, 'wb') as f:
    ftp.retrbinary("RETR " + filename ,f.write)
so the file is flushed & closed when exiting the with block (of course, perform the file read operations outside this block).
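Putting the two steps together, a rough sketch could look like this. The function name and its parameters are placeholders I made up, and ftp is assumed to be a connected, logged-in ftplib.FTP object already in the right directory.

```python
import zipfile


def download_and_extract(ftp, remote_name, local_zip, extract_dir):
    """Fetch remote_name over FTP, then unpack it (hypothetical helper)."""
    # The with block guarantees the archive is flushed and closed...
    with open(local_zip, "wb") as handle:
        ftp.retrbinary("RETR " + remote_name, handle.write)
    # ...so it is only reopened for reading once the download is complete.
    with zipfile.ZipFile(local_zip, "r") as archive:
        archive.extractall(extract_dir)
        return archive.namelist()
```

The important part is the ordering: the ZipFile is opened only after the first with block has exited.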
I have to run my Python script on Windows too, and that is where the problems began.
The script scrapes locally saved HTML files and saves .csv versions with the data I want. On Ubuntu it runs over 100k files with no problems, but on Windows it fails with:
IOError: [Errno 13] Permission denied
It is not a permissions problem: I have rechecked, and running it as Administrator makes no difference.
It breaks exactly on the line where I open the file:
with open(of, 'w') as output:
    ...
I tried creating the first of the 100k files from the Python console, and from a new blank script in the same directory as my code, and it works...
So it seems doable.
Then I tried output = open(of, 'w') instead of the code above, with no luck.
The weird thing is that it creates a directory with the same name as the file, and then breaks with the IOError.
I started thinking that it could be a csv thing... nope. Among other attempts that didn't help, the most interesting result came from the following code:
with open(of + '.txt', 'w') as output:
    ...
Astonishingly, it creates a directory ending in .csv AND a file ending in .csv.txt with the right data!
Aargh!
Changing the file open mode to 'w+' or 'wb' made no difference either.
Any ideas?
You can get permission denied if the file is opened up in another application.
Follow this link to see if any other process is using it: http://www.techsupportalert.com/content/how-find-out-which-windows-process-using-file.htm
Otherwise, I would say to try to open the file for read instead of write to see if it allows you to access it at all.
-Brian
Damn it, it's working now! It was like saying I can't find my glasses while wearing them.
Thanks Brian, but that wasn't the error. The problem was that my code used the Ubuntu path separator, even though the full path to the csv output file looked completely correct. I replaced it with os.sep, and it started working like a charm :)
Thanks again!
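For anyone landing here with the same symptom, this is roughly what the portable version looks like. It is only a sketch: the function name and the way the output name is derived from the HTML name are my assumptions, not the original code. It uses os.path.join, which inserts the separator for you, instead of concatenating os.sep by hand.

```python
import os


def csv_output_path(output_dir, html_name):
    """Map an input HTML file name to a .csv path inside output_dir.

    Hypothetical helper; the naming scheme is an assumption for illustration.
    """
    base, _ext = os.path.splitext(os.path.basename(html_name))
    # os.path.join inserts the separator that is correct for the current OS.
    return os.path.join(output_dir, base + ".csv")
```

Building every path this way means the same script runs unchanged on Ubuntu and Windows.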
I'm trying to unzip a file from an FTP site. I've tried it using 7z in a subprocess as well as using 7z in the older os.system format. I get closest however when I'm using the zipfile module in python so I've decided to stick with that. No matter how I edit this I seem to get one of two errors so here are both of them so y'all can see where I'm banging my head against the wall:
z = zipfile.ZipFile(r"\\svr-dc\ftp site\%s\daily\data1.zip" % item)
z.extractall()
NotImplementedError: compression type 6 (implode)
(I think this one is totally wrong, but figured I'd include.)
I seem to get the closest with the following:
z = zipfile.ZipFile(r"\\svr-dc\ftp site\%s\daily\data1.zip" % item)
z.extractall(r"\\svr-dc\ftp site\%s\daily\data1.zip" % item)
IOError: [Errno 2] No such file or directory: '\\\\svr-dc...'
The catch with this is that it is actually giving me the first file name in the zip. I can see the file AJ07242013.PRN at the end of the error so I feel closer because it's at least getting to the point of reading the contents of the zip file.
Pretty much any iteration of this that I try gets me one of those two errors, or a syntax error but that's easily addressed and not my primary concern.
Sorry for being so long winded. I'd love to get this working, so let me know what you think I need to do.
EDIT:
So 7z has finally been added to the path and runs without any errors with both subprocess and os.system. However, I still can't seem to get anything to unpack. From all I've read in the Python documentation, it looks to me like I should be using the communicate() method of the subprocess to extract this file, but it just won't unpack. When I use os.system, it keeps telling me that it cannot find the archive.
import subprocess
cmd = ['7z', 'e']
sp = subprocess.Popen(cmd, stderr=subprocess.STDOUT, stdout=subprocess.PIPE)
sp.communicate('r"\C:\Users\boster\Desktop\Data1.zip"')
I don't think that sp.communicate is right but if I add anything else to it I have too many arguments.
Python's zipfile doesn't support compression type 6 (imploded), so it's simply not going to work. In the first case, that's obvious from the error. In the second case, things are worse. The parameter for extractall is an alternate extraction directory. Since you gave it the name of your zip file, a directory of that name can't be found, and zipfile gives up before getting to the not-supported problem.
Make sure you can do this with 7z on the command line, try implementing subprocess again and ask for help on that technique if you need it.
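If you want to find out up front whether zipfile can handle an archive, you can inspect the compression method recorded for each member before extracting. This is a sketch only: unsupported_members and SUPPORTED_METHODS are names I made up, and which methods actually decode also depends on your Python version and build.

```python
import zipfile

# Compression methods the standard library knows how to decode. The bzip2 and
# LZMA constants only exist on newer Python versions, hence the hasattr check.
SUPPORTED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}
for _name in ("ZIP_BZIP2", "ZIP_LZMA"):
    if hasattr(zipfile, _name):
        SUPPORTED_METHODS.add(getattr(zipfile, _name))


def unsupported_members(zip_path):
    """Return (name, compress_type) pairs for members zipfile cannot extract."""
    with zipfile.ZipFile(zip_path) as archive:
        return [(info.filename, info.compress_type)
                for info in archive.infolist()
                if info.compress_type not in SUPPORTED_METHODS]
```

An imploded member would show up here with compress_type 6, which tells you to hand the archive to an external tool such as 7z instead.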
Here's a script that will look for 7z in the usual places:
import os
import sys
import subprocess
from glob import glob
print 'python version:', sys.version
subprocess.call('ver', shell=True)
print
if os.path.exists(r'C:\Program Files\7-Zip'):
    print 'have standard 7z install'
    if '7-zip' in os.environ['PATH'].lower():
        print '...and its in the path'
    else:
        print '...but its not in the path'
print
print 'find in path...'
found = 0
for p in os.environ['PATH'].split(os.path.pathsep):
    candidate = os.path.join(p, '7z.*')
    for fn in glob(candidate):
        print ' found', fn
        found += 1
print
if found:
    print '7z located, attempt run'
    subprocess.call(['7z'])
else:
    print '7z not found'
According to the ZipFile documentation, you might be better off copying the zip to your working directory first. (http://docs.python.org/2/library/zipfile#zipfile.ZipFile.extract)
If you have problems copying, you might want to store the zip in a path with no spaces or protect your code against spaces by using os.path.
I made a small test in which I used os.path.abspath to make sure I had the proper path to my zip and it worked properly.
Also make sure that for extractall the path that you specify is the path where the zip content will be extracted. (If a folder that is specified is not created, it will be created automatically) Your files will be extracted in your current working directory (CWD) if no parameter is passed to extractall.
Cheers!
Managed to get this to work without using the PIPE functionality as subprocess.communicate wouldn't unpack the files. Here was the solution using subprocess.call. Hope this can help someone in the future.
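import os
import subprocess

def extract_data_one():
    # sites is the list of site folder names, defined elsewhere in the script
    for item in sites:
        os.chdir(r"\\svr-dc\ftp site\%s\Daily" % item)
        subprocess.call(['7z', 'e', 'data1.zip', '*.*'])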
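A slightly more defensive variant, as a sketch: it passes an explicit output directory instead of relying on os.chdir, and checks the exit code. The function names are made up, it assumes 7z is on the PATH, and it relies on the 7z switches -o (output directory, no space after the switch) and -y (assume yes on prompts).

```python
import subprocess


def build_7z_command(archive, out_dir):
    """Build the argument list for extracting archive into out_dir."""
    # 'e' extracts without recreating folders; '-o' takes the directory with
    # no space after the switch; '-y' answers yes to overwrite prompts.
    return ["7z", "e", archive, "-o" + out_dir, "-y"]


def extract_with_7z(archive, out_dir):
    """Run 7z and report success; assumes 7z is on the PATH."""
    return_code = subprocess.call(build_7z_command(archive, out_dir))
    return return_code == 0  # 7z exits with 0 when no error occurred
```

Keeping the command as a list avoids quoting problems with paths that contain spaces, such as the "ftp site" share above.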
I'm downloading files in Python using ftplib and up until recently everything seemed to be working fine. I am downloading files as such:
ftpSession = ftplib.FTP(host,username,password)
ftpSession.cwd('rlmfiles')
ftpFileList = filter(lambda x: 'PEDI' in x, ftpSession.nlst())
ftpFileList.sort()
for f in ftpFileList:
    tempFile = open(os.path.join(localDirectory,f),'wb')
    ftpSession.retrbinary('RETR '+f,tempFile.write)
    tempFile.close()
ftpSession.quit()
sys.exit(0)
Up until recently it was downloading the files I needed just fine, as expected. Now, however, the files I'm downloading are corrupted and just contain long strings of garbage ASCII. I know the problem is not with the files posted on the FTP server I'm pulling them from, because I also have a Perl script that downloads them successfully from the same FTP server.
If it is any additional info, here's what the debugger puts out in the command prompt when downloading a file:
Has anyone encountered any issues with corrupted file contents using retrbinary() in Python's ftplib?
I'm really stuck/frustrated and haven't come across anything related to possible corruption here. Any help is appreciated.
I just ran into this issue yesterday when I was attempting to download text files. Not sure if that is what you were doing, but since you say it has ASCII garbage in it, I assume you opened it in a text editor because it was supposed to be text.
If this is the case, the problem is that the file is a text file and you are trying to download it in binary mode.
What you want to do instead is retrieve the file in ASCII transfer mode.
tempFile = open(os.path.join(localDirectory,f),'w') # Changed 'wb' to 'w'
ftpSession.retrlines('RETR '+f,tempFile.write) # Changed retrbinary to retrlines
Unfortunately, this strips all the new-line characters out of the file. Yuck!
So then you need to add the stripped out new-line characters again:
tempFile = open(os.path.join(localDirectory,f),'w')
textLines = []
ftpSession.retrlines('RETR '+f,textLines.append)
tempFile.write('\n'.join(textLines))
This should work, but it doesn't look as nice as it could. So a little cleanup effort would get us:
temporaryFile = open(os.path.join(localDirectory, currentFile), 'w')
textLines = []
retrieveCommand = 'RETR '
ftpSession.retrlines(retrieveCommand + currentFile, textLines.append)
temporaryFile.write('\n'.join(textLines))
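An alternative sketch that avoids holding the whole file in memory and also closes the file when done. The function name download_text is made up, and session is assumed to be a logged-in ftplib.FTP object. Unlike the join version above, this one also writes a newline after the last line.

```python
def download_text(session, remote_name, local_path):
    """Fetch remote_name in ASCII mode, restoring the stripped line endings."""
    with open(local_path, "w") as handle:
        def write_line(line):
            # retrlines passes each line without its trailing CRLF.
            handle.write(line + "\n")
        session.retrlines("RETR " + remote_name, write_line)
```

Because the callback writes each line as it arrives, memory use stays flat even for large text files.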
try:
    directoryListing = os.listdir(inputDirectory)
    #other code goes here, it iterates through the list of files in the directory
except WindowsError as winErr:
    print("Directory error: " + str(winErr))
This works fine, and I have tested that it doesn't choke and die when the directory doesn't exist, but I was reading in a Python book that I should be using "with" when opening files. Is there a preferred way to do what I am doing?
You are perfectly fine. The os.listdir function does not open files, so ultimately you are alright. You would use the with statement when reading a text file or similar.
An example of a with statement:
with open('yourtextfile.txt') as file: #this is like file=open('yourtextfile.txt')
    lines=file.readlines() #read all the lines in the file
#when the code executed in the with statement is done, the file is automatically closed, which is why most people use this (no need for .close()).
What you are doing is fine. With is indeed the preferred way for opening files, but listdir is perfectly acceptable for just reading the directory.
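To show how the two fit together, here is a small sketch that lists a directory inside try/except and then uses with for each file it actually opens. The function name is made up, and it catches OSError rather than WindowsError so that it also runs outside Windows (WindowsError is a subclass or alias of OSError, depending on the Python version).

```python
import os


def read_all_files(input_directory):
    """Return {file name: contents} for regular files in input_directory."""
    try:
        names = os.listdir(input_directory)
    except OSError as err:  # also covers WindowsError
        print("Directory error: " + str(err))
        return {}
    contents = {}
    for name in names:
        path = os.path.join(input_directory, name)
        if os.path.isfile(path):
            # The with statement closes each file as soon as it has been read.
            with open(path) as handle:
                contents[name] = handle.read()
    return contents
```

The listing itself needs no with, since nothing is left open afterwards; only the open calls do.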