Incomplete extraction of Zip files using Python

I have a script where I download a bunch of Zip files (150+) from a website and unzip them. I just noticed that the Zip files weren't being fully extracted - i.e., there should be 68 files in each directory, but there are only 62. The script ran fine with no errors.
Any thoughts? I tried running one Zip file through by itself and it extracted fine. Could the operation be timing out or something? Please forgive my code, I'm new.
I'm running Python 2.7.
import csv, urllib, urllib2, zipfile
from datetime import date

dlList = []
dloadUrlBase = r"https://websoilsurvey.sc.egov.usda.gov/DSD/Download/Cache/SSA/"
dloadLocBase = r"Z:/Shared/Corporate/Library/GIS_DATA/Soils/"
stateDirList = []
countyDirList = []
fileNameList = []
unzipList = []
extractLocList = []
logfile = 'log_{}.txt'.format(date.today())

with open(r'N:\Shared\Service Areas\Geographic Information Systems\Tools and Scripts\Soil_Downloads\FinalListforDownloads.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        stateDirList.append(row['StateDir'])
        countyDirList.append(row['CountyDir'])
        fileNameList.append(row['File_Name'])

for state, county, fileName in zip(stateDirList, countyDirList, fileNameList):
    dloadDir = dloadLocBase + state + r"/" + county + "/" + fileName
    requestURL = dloadUrlBase + fileName
    extractLocList.append(dloadLocBase + state + r"/" + county + "/")
    try:
        urllib.urlretrieve(requestURL, dloadDir)
        print requestURL + " found"
        urllib.urlcleanup()
        unzipList.append(dloadDir)
        f = open(logfile, 'a+')
        f.write(dloadDir + " has been downloaded")
        f.close()
    except:
        pass

for zFile, uzDir in zip(unzipList, extractLocList):
    zip_ref = zipfile.ZipFile(zFile, "r")
    zip_ref.extractall(uzDir)
    zip_ref.close()

Instead of just passing when an error is raised, log or print what the error is. That should indicate what the issue (or set of issues) is.
    except Exception as e:
        print e  # or print e.message
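For example, the download loop could record failures in the same log file it already writes successes to. This is only a sketch built on the question's variable names (requestURL, dloadDir, logfile), not the poster's actual fix:

import traceback

for state, county, fileName in zip(stateDirList, countyDirList, fileNameList):
    dloadDir = dloadLocBase + state + r"/" + county + "/" + fileName
    requestURL = dloadUrlBase + fileName
    try:
        urllib.urlretrieve(requestURL, dloadDir)
        unzipList.append(dloadDir)
    except Exception:
        # Record which URL failed and why, instead of silently skipping it.
        with open(logfile, 'a+') as f:
            f.write("FAILED {}: {}\n".format(requestURL, traceback.format_exc()))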

It turns out that this was a syncing issue with my network. We use a cloud-based network that syncs with our offices, so somehow not all of the files were being synced and some were getting left in a queue.

Related

No such file or directory when creating file in Python

I am implementing a logging process which appends to a log file.
I want to check: if the log file exists, append more lines to it; if not, create the new file and then append. But I keep getting an error saying: No such file or directory
try:
    f = open(os.path.join(
        BASE_DIR, '/app/logs/log-' + current_date + '.csv'), "a+")
    f.write(message + "\n")
except IOError:
    f = open(os.path.join(
        BASE_DIR, '/app/logs/log-' + current_date + '.csv'), "w+")
    f.write(message + "\n")
finally:
    f.close()
What mistake am I making here?
============ Update
This code is working:
try:
    f = open('log-' + current_date + '.csv', "a+")
    f.write(message + "\n")
except IOError:
    f = open('log-' + current_date + '.csv', "w+")
    f.write(message + "\n")
finally:
    f.close()
If I open the file like this, it works. But as soon as I add the path, it just keeps saying no such file or directory.
=============== Update
Never mind, it has been working. I forgot to rebuild my Docker image to see the results. :DD
So the problem was the incorrect path.
The output of os.path.join here will be /app/logs/log-<current_date>.csv, with BASE_DIR dropped entirely. This is not what you want. Remove the leading / from the second argument and it will work as you expect. This happens because you passed an absolute path as the second input; see the os.path.join documentation for an explanation.
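A quick interpreter session shows the behaviour on a POSIX system (as inside the Docker container here); the base directory is only illustrative:

>>> import os
>>> os.path.join('/base_dir', '/app/logs/log.csv')   # absolute second argument discards the first
'/app/logs/log.csv'
>>> os.path.join('/base_dir', 'app/logs/log.csv')    # relative second argument is joined normally
'/base_dir/app/logs/log.csv'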
Why not do something like this:
import os

check = os.path.isfile("file.txt")
with open("file.txt", "a+") as f:
    if not check:
        f.write("oi")
    else:
        f.write("oi again")

Q: Python stops working at end of script in Batch

I'm running a series of Python scripts from the command window via a batch file.
Previously, this worked without issue. Now, however, without any change to the code, every time it gets to the end of a script I get a "Python.exe has stopped working" error. The scripts have actually completed processing, but I need to close the error window for the batch to proceed.
I've tried adding sys.exit to the ends of the scripts, but that makes no difference. The first script has no issue, but every script after it has this issue.
How do I stop this error from happening?
Batch File
C:\Path\to\Python\ArcGIS64bitversion C:\Path\to\Script1
C:\Path\to\Python\ArcGIS64bitversion C:\Path\to\Script2
C:\Path\to\Python\ArcGIS64bitversion C:\Path\to\Script3
C:\Path\to\Python\ArcGIS64bitversion C:\Path\to\Script4a
C:\Path\to\Python\ArcGIS64bitversion C:\Path\to\Script4b
C:\Path\to\Python\ArcGIS64bitversion C:\Path\to\Script4c
C:\Path\to\Python\ArcGIS64bitversion C:\Path\to\Script4d
C:\Path\to\Python\ArcGIS64bitversion C:\Path\to\Script5
C:\Path\to\Python\ArcGIS64bitversion C:\Path\to\Script6
The Python scripts do all actually complete. Scripts 2-5 all use multiprocessing; however, script 6 does not use multiprocessing and still experiences the error.
General Script Structure
import statements
global variables
get data statements

Def Code:
    try:
        code
        sys.exit
    except:
        print error in text file

Def multiprocessing:
    pool = multiprocessing.pool(32)
    pool.map(Code, listofData)

if main statement:
    try:
        code
        multiprocessing()
        sys.exit
    except:
        print error to text file
Script 2 (the first script to error)
import arcpy, fnmatch, os, shutil, sys, traceback
import multiprocessing
from time import strftime
#===========================================================================================
ras_dir = r'C:\Path\to\Input'
working_dir = r'C:\Path\to\Output'
output_dir = os.path.join(working_dir, 'Results')
if not os.path.isdir(output_dir):
    os.mkdir(output_dir)
#===========================================================================================
global input_files1
global raslist
global ras
raslist = []
input_files1 = []
#===========================================================================================
for r, d, f in os.walk(working_dir):
    for inFile in fnmatch.filter(f, '*.shp'):
        input_files1.append(os.path.join(r, inFile))
for r, d, f in os.walk(ras_dir):
    for rasf in fnmatch.filter(f, '*.tif'):
        raslist.append(os.path.join(r, rasf))
ras = raslist[0]
del rasf, raslist

def rasextract(file):
    arcpy.CheckOutExtension("Spatial")
    arcpy.env.overwriteOutput = True
    proj = file.split('.')
    proj = proj[0] + '.' + proj[1] + '.prj'
    arcpy.env.outputCoordinateSystem = arcpy.SpatialReference(proj)
    try:
        filename = str(file)
        filename = filename.split('\\')
        filename = filename[-1]
        filename = filename.split('.')
        filename = filename[0]
        tif_dir = output_dir + '\\' + filename
        os.mkdir(tif_dir)
        arcpy.env.workspace = tif_dir
        arcpy.env.scratchWorkspace = tif_dir
        dname = tif_dir + '\\' + filename + '_ras.tif'
        fname = working_dir + '\\' + filename + '_ras.tif'
        bufname = tif_dir + '\\' + filename + '_rasbuf.shp'
        arcpy.Buffer_analysis(file, bufname, "550 METERS", "FULL", "ROUND", "ALL")
        newras = arcpy.sa.ExtractByMask(ras, bufname)
        newras.save(rasname)
        print "Saved " + filename + " ras"
        sys.exit
    except:
        var = traceback.format_exc()
        x = str(var)
        timecode = strftime("%a, %d %b %Y %H:%M:%S + 0000")
        logfile = open(r'C:\ErrorLogs\Log_Script2_rasEx.txt', "a+")
        ent = "\n"
        logfile.write(timecode + " " + x + ent)
        logfile.close()

def MCprocess():
    pool = multiprocessing.Pool(32)
    pool.map(rasextract, input_files1)

if __name__ == '__main__':
    try:
        arcpy.CheckOutExtension("Spatial")
        ras_dir = r'C:\Path\to\Input'
        working_dir = r'C:\Path\to\Output'
        output_dir = os.path.join(working_dir, 'Results')
        if not os.path.isdir(output_dir):
            os.mkdir(output_dir)
        #=============================================================
        raslist = []
        input_files1 = []
        #=============================================================
        for r, d, f in os.walk(working_dir):
            for inFile in fnmatch.filter(f, '*.shp'):
                input_files1.append(os.path.join(r, inFile))
        for r, d, f in os.walk(ras_dir):
            for demf in fnmatch.filter(f, '*.tif'):
                demlist.append(os.path.join(r, rasf))
        ras = raslist[0]
        del rasf, raslist
        MCprocess()
        sys.exit
    except:
        var = traceback.format_exc()
        x = str(var)
        timecode = strftime("%a, %d %b %Y %H:%M:%S + 0000")
        logfile = open(r'C:\ErrorLogs\Log_Script2_rasEx.txt', "a+")
        ent = "\n"
        logfile.write(timecode + " " + x + ent)
        logfile.close()
NEW error message
This error was encountered after disabling error reporting.
Windows is catching the error.
Try disabling 'Windows Error Reporting' in the Registry. After that, a traceback/error should be shown. Instructions on how to disable WER for Windows 10 can be found online.
Posting this as it is the only web search result I could find matching my error (which was that a script ran flawlessly in IDLE, but threw the "Python has stopped working" error when called from a batch (.bat) file). Full disclosure: I was using shelve, not arcpy.
I think the issue is that you are somehow leaving files open, and then when the script ends Python is forced to clean up the open files in an 'unplanned' fashion. Inside the IDE this is caught and handled, but once in a batch file the issue bubbles up to give the 'stopped working' message.
Contrast:
    f = open("example.txt", "r")
with:
    f = open("example.txt", "r")
    f.close()
The first will error out from a bat file; the second will not.
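To illustrate that point, the idiomatic way to guarantee a file is closed is a with block, which closes the handle even if an exception occurs, so nothing is left for the interpreter to clean up at shutdown. The filename below is only an example; the log path is the one from Script 2:

# Closed automatically when the block exits, even if an exception is raised.
with open("example.txt", "r") as f:
    data = f.read()

# The same pattern applied to the error-log writes in the question's scripts
# (timecode and x are the variables already built in the except blocks there).
with open(r'C:\ErrorLogs\Log_Script2_rasEx.txt', "a+") as logfile:
    logfile.write(timecode + " " + x + "\n")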

Throttling FTP download with Python ftplib

How do I throttle an FTP download with Python ftplib? For example, cap the speed at 20 Mb/s?
I'm using the following code to download files with Python ftplib:
from ftplib import FTP
import os

download_list = 'testlist.txt'  # initial list of directories to be downloaded
path_list = []  # initialize a list of all the paths from download_list
local_folder = 'testStorage'  # where files are going to be downloaded to
downloaded_list = 'completedownload.txt'  # list of completed downloads
error_list = 'incomplete_downloads.txt'  # list of paths that are incomplete

ftp = FTP("ftp.address.com")
ftp.login("user_name", "password")  # login to FTP account
print "Successfully logged in"

# make a list of files to download from a file
with open(download_list, 'r') as f:
    content = f.readlines()
path_list = [x.strip() for x in content]

for path in path_list:
    path = path.replace("*", "")  # strips the * found in the source file
    print '\nChanging directory to ' + path + ':\n'
    #ftp.cwd('/AAA/BBB/CCC/logic-1/')  # the format to change into path, note the * is omitted
    #if ftp.cwd(path) == True:
    try:  # tries the path in the file
        ftp.cwd(path)
        #ftp.retrlines('LIST')
        filenames = ftp.nlst()
        for filename in filenames:
            local_directory = local_folder + path  # create the local path ie: testStorage/AAA/BBB/CCC/logic-1/
            local_filename = os.path.join(local_directory, filename)
            if os.path.exists(local_filename) == False:  # checks if file already exists
                if not os.path.exists(local_directory):  # mimic the remote path locally
                    os.makedirs(local_directory)
                file = open(local_filename, 'wb')
                ftp.retrbinary('RETR ' + filename, file.write)
                print filename
                file.close()
            elif os.path.exists(local_filename) == True:  # skip the file if it exists
                print 'File ' + filename + ' already exists, skipping this file'
    except:  # if path in text file does not exist, write to error_list.txt
        print 'Path ' + path + ' does not exist writing path to error_list.txt'
        with open(error_list, 'a') as f2:
            f2.write(path + '\n')
        continue

print "all done closing connection"
ftp.close()  # CLOSE THE FTP CONNECTION
To throttle the download, just implement a function that does file.write and time.sleep as needed. Pass that function to ftp.retrbinary as callback (instead of file.write directly).
This pseudo code (I do not do Python) should give you some idea:
total_length = 0
start_time = time.time()

def write_and_sleep(buf):
    global file
    global total_length
    global start_time
    file.write(buf)
    total_length += sys.getsizeof(buf)
    while (total_length / (time.time() - start_time)) > 100000000:
        time.sleep(0.1)

ftp.retrbinary('RETR ' + filename, write_and_sleep)
Reducing blocksize (the third argument of ftp.retrbinary) may help achieve a smoother download curve.
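A more concrete sketch of the same idea, written as a small callback class so no globals are needed. The host, credentials, and filename are placeholders, and the question's 20 Mb/s cap is assumed to mean 20 megabits per second:

import time
from ftplib import FTP

class ThrottledWriter(object):
    """Write callback that sleeps whenever the average rate exceeds the cap."""
    def __init__(self, fileobj, max_bytes_per_sec):
        self.f = fileobj
        self.cap = float(max_bytes_per_sec)
        self.total = 0
        self.start = time.time()

    def __call__(self, buf):
        self.f.write(buf)
        self.total += len(buf)
        # Sleep until the average throughput drops back under the cap.
        while self.total / (time.time() - self.start + 1e-6) > self.cap:
            time.sleep(0.05)

ftp = FTP("ftp.example.com")            # placeholder host
ftp.login("user_name", "password")      # placeholder credentials
with open("bigfile.bin", "wb") as out:
    writer = ThrottledWriter(out, 20 * 1000 * 1000 / 8)  # ~20 Mbit/s expressed in bytes/s
    # A smaller blocksize makes the throttling smoother (the default is 8192).
    ftp.retrbinary("RETR bigfile.bin", writer, blocksize=4096)
ftp.quit()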

Python FTP download 550 error

I've written an FTP crawler to download specific files. It works up until it finds the specific file it wants to download, and then throws this error:
ftplib.error_perm: 550
The file exists in my download folder, but the size of the file is 0 kB.
Do I need to convert something in order to get it to download?
I can access the FTP manually and download the file without any problems, so I don't think it's the login part (unless there are different ways of logging in??).
Here's my code:
import ftplib
import re
import os

class Reader:
    def __init__(self):
        self.data = ""
    def __call__(self, s):
        self.data += s + "\n"

ftp = ftplib.FTP("my_ftp_server")
ftp.login()

r = Reader()
ftp.dir(r)

def get_file_list(folder):
    r = Reader()
    ftp.dir(folder, r)
    print ("Reading folder", folder)
    global tpe
    global name
    for l in r.data.split("\n"):
        if len(l) > 0:
            vars = re.split("[ ]*", l)
            tpe = vars[2]
            name = vars[3]
            if tpe == "<DIR>":
                get_file_list(folder + "/" + name)
            else:
                print (folder + name)
                for name in folder:
                    if vars[3].endswith(('501.zip', '551.zip')):
                        if os.path.exists('C:\\download\\' + vars[3]) == False:
                            fhandle = open(os.path.join('C:\\download\\', vars[3]), 'wb')
                            print ('Getting ' + vars[3])
                            ftp.retrbinary('RETR ' + vars[3], fhandle.write)
                            fhandle.close()
                        elif os.path.exists(('C:\\download\\' + vars[3])) == True:
                            print ('File ', vars[3], ' Already Exists, Skipping Download')

print("-" * 30)
print ("Fetching folders...")
get_file_list("")
Your code is probably OK.
FTP error 550 is caused by a permission issue on the server side.
The error means 'Requested action not taken. File unavailable (e.g., file not found, no access).', as listed in the Wikipedia article on FTP server return codes.
If you expect to have access to the file, you should contact the sysadmin to rectify the file permission.
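If you want the crawler to keep going and to avoid leaving zero-byte files behind when the server answers 550, you can catch ftplib.error_perm around the retrieval. This is only a sketch using the question's variable names (ftp, vars); it does not fix the permission problem itself:

# ftplib and os are already imported at the top of the question's script.
local_path = os.path.join('C:\\download\\', vars[3])
try:
    with open(local_path, 'wb') as fhandle:
        ftp.retrbinary('RETR ' + vars[3], fhandle.write)
except ftplib.error_perm as e:
    # 550 means the server refused the file (missing or no read permission).
    print ('Skipping ' + vars[3] + ': ' + str(e))
    # Remove the empty file that open(..., 'wb') created.
    if os.path.exists(local_path) and os.path.getsize(local_path) == 0:
        os.remove(local_path)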

Using urllib.urlretrieve to download files over HTTP not working

I'm still working on my mp3 downloader, but now I'm having trouble with the files being downloaded. I have two versions of the part that's tripping me up. The first gives me a proper file but causes an error. The second gives me a file that is way too small, but no error. I've tried opening the file in binary mode, but that didn't help. I'm pretty new to doing any work with HTML, so any help would be appreciated.
import urllib
import urllib2

def milk():
    SongList = []
    SongStrings = []
    SongNames = []
    earmilk = urllib.urlopen("http://www.earmilk.com/category/pop")
    reader = earmilk.read()
    # gets the position of the playlist
    PlaylistPos = reader.find("var newPlaylistTracks = ")
    # finds the number of songs in the playlist
    NumberSongs = reader[reader.find("var newPlaylistIds = "): PlaylistPos].count(",") + 1
    initPos = PlaylistPos
    # goes through the playlist and records the html address and name of the song
    for song in range(0, NumberSongs):
        songPos = reader[initPos:].find("http:") + initPos
        namePos = reader[songPos:].find("name") + songPos
        namePos += reader[namePos:].find(">")
        nameEndPos = reader[namePos:].find("<") + namePos
        SongStrings.append(reader[songPos: reader[songPos:].find('"') + songPos])
        SongNames.append(reader[namePos + 1: nameEndPos])
        initPos = nameEndPos
    for correction in range(0, NumberSongs):
        SongStrings[correction] = SongStrings[correction].replace('\\/', "/")
    # downloading songs
    fileName = ''.join([a.isalnum() and a or '_' for a in SongNames[0]])
    fileName = fileName.replace("_", " ") + ".mp3"
    # This version writes a file that can be played but gives an error saying:
    # "TypeError: expected a character buffer object"
##    songDL = open(fileName, "wb")
##    songDL.write(urllib.urlretrieve(SongStrings[0], fileName))
    # This version creates the file but it cannot be played (file size is much smaller than it should be)
##    url = urllib.urlretrieve(SongStrings[0], fileName)
##    url = str(url)
##    songDL = open(fileName, "wb")
##    songDL.write(url)
    songDL.close()
    earmilk.close()
Re-read the documentation for urllib.urlretrieve:
Return a tuple (filename, headers) where filename is the local file name under which the object can be found, and headers is whatever the info() method of the object returned by urlopen() returned (for a remote object, possibly cached).
You appear to be expecting it to return the bytes of the file itself. The point of urlretrieve is that it handles writing to a file for you, and returns the filename it was written to (which will generally be the same thing as your second argument to the function if you provided one).
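To make that concrete, here are two hedged variants that should each produce a playable file, assuming SongStrings[0] holds a valid MP3 URL as in the question:

# Option 1: let urlretrieve handle the writing; don't open the file yourself.
urllib.urlretrieve(SongStrings[0], fileName)

# Option 2: fetch the bytes yourself and write them out in binary mode.
response = urllib.urlopen(SongStrings[0])
with open(fileName, "wb") as songDL:
    songDL.write(response.read())
response.close()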
