How can I extract files from a directory that exists inside a Zip archive? I upload the Zip archive from an HTML form, but if the archive contains folders I can't extract the files inside those folders. This is a snippet from my code:
form = cgi.FieldStorage()
file_upload = form['file']
zfile = zipfile.ZipFile(file_upload.file, "r")
files_zip = zfile.namelist()
for name in files_zip:
    print name
    if name.endswith('/'):
        print "yes"
        l = list()
        l = os.listdir(name)
        print l
EDIT:
I tried to use StringIO() as:
s = StringIO(file_upload)
f = s.getvalue()
with zipfile.ZipFile(f, 'r') as z:
    for d in z.namelist():
        print "%s: %s" % (d, z.read(d))
but the problem with this second snippet is the following error:
No such file or directory: "FieldStorage('file', 'test.zip')
I want to extract these files in order to add them to the GAE Blobstore.
Thanks in advance.
There's a working example of how to do this in appengine-mapreduce.
Look at input_readers.py for BlobstoreZipInputReader (which starts at line 898 at the moment).
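For reference, the core idea in that reader is that blobstore.BlobReader is a seekable, file-like object, so zipfile.ZipFile can read straight from a blob without touching the file system. A rough sketch of that pattern (the blob key below is a hypothetical placeholder you would get from your upload handler):
import zipfile
from google.appengine.ext import blobstore

blob_key = 'YOUR_BLOB_KEY'  # hypothetical placeholder: key of the uploaded archive

# BlobReader is seekable and file-like, so ZipFile can use it directly
reader = blobstore.BlobReader(blob_key)
zf = zipfile.ZipFile(reader)
for name in zf.namelist():
    if name.endswith('/'):
        continue  # skip directory entries
    data = zf.read(name)
    # process data here, e.g. store it under its own blob/record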
I don't understand why you are using os.listdir to list files inside zip data; you should just go through the names and extract the data. Here is an example where I create an in-memory zip file and then extract the files, even those inside a folder, e.g.
from zipfile import ZipFile
from StringIO import StringIO

# first let's create a zip file with folders to simulate data coming from the user
f = StringIO()
with ZipFile(f, 'w') as z:
    z.writestr('1.txt', "data of file 1")
    z.writestr('folder1/2.txt', "data of file 2")
zipdata = f.getvalue()

# try to read zipped data containing folders
f = StringIO(zipdata)
with ZipFile(f, 'r') as z:
    for name in z.namelist():
        print "%s: %s" % (name, z.read(name))
output:
1.txt: data of file 1
folder1/2.txt: data of file 2
As App Engine doesn't allow writing to the file system, you will need to read the file data (as explained above) and dump it into blobs; you can just keep a simple structure of name and data. On your local OS, however, you can try z.extractall() and it will create the whole folder structure and files.
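For example, a minimal sketch of that name-and-data structure, reusing the zipdata variable from the snippet above:
# build a simple {filename: bytes} mapping instead of writing to disk
extracted = {}
with ZipFile(StringIO(zipdata), 'r') as z:
    for name in z.namelist():
        if name.endswith('/'):
            continue  # folder entries carry no data
        extracted[name] = z.read(name)

print sorted(extracted.keys())  # ['1.txt', 'folder1/2.txt']
Each value in extracted can then be written to the Blobstore instead of the local file system.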
I have been using the following code to extract the files:
import os, zipfile

extension = ".zip"

for item in os.listdir(dir_name):  # loop through items in dir
    if item.endswith(extension):  # check for ".zip" extension
        file_name = os.path.abspath(item)  # get full path of files
        zip_ref = zipfile.ZipFile(file_name)  # create zipfile object
        zip_ref.extractall(dir_name)  # extract file to dir
        zip_ref.close()  # close file
        os.remove(file_name)  # delete
The problem is that the files inside different zips have the same names. For example:
Zip 1 contains "File 1" and "File 2",
whereas Zip 2 also contains "File 1" and "File 2".
After extracting, all my files are getting overwritten by the next file.
Is there any solution to this?
I tried extracting the files and expected them all to be extracted, but the files kept getting overwritten.
Use os.mkdir() to create a directory with the same name as the zip file, and then extract into this new directory with zip_ref.extractall().
For example, for a zip file my_zip.zip:
filename = 'my_zip.zip'
# os.path.splitext drops the ".zip" suffix (str.strip would remove characters, not a suffix)
dir_name = os.path.splitext(filename)[0]
os.mkdir(dir_name)
# rest of the code
zip_ref.extractall(dir_name)
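Putting that together with the loop from the question, a rough sketch (dir_name is the folder holding the zip files, as in the original code):
import os, zipfile

extension = ".zip"

for item in os.listdir(dir_name):
    if item.endswith(extension):
        file_name = os.path.join(dir_name, item)
        # one output directory per archive, named after the zip file itself
        out_dir = os.path.splitext(file_name)[0]
        if not os.path.isdir(out_dir):
            os.mkdir(out_dir)
        with zipfile.ZipFile(file_name) as zip_ref:
            zip_ref.extractall(out_dir)  # extract into the per-archive directory
        os.remove(file_name)  # delete the zip after extraction
Because each archive now extracts into its own folder, identically named members from different zips no longer overwrite each other.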
I have multiple pickle files with the same format in one folder called pickle_files:
1_my_work.pkl
2_my_work.pkl
...
125_my_work.pkl
How would I go about loading those files into the workspace, without having to do it one file at a time?
Thank you!
Loop over the files and save the data in a structure, for example a dictionary:
# Imports
import pickle
import os

# Folder containing your files
directory = 'C://folder'

# Create empty dictionary to save data
data = {}

# Loop over files and read pickles
for file in os.listdir(directory):
    if file.endswith('.pkl'):
        # join the directory with the file name so this works from any working directory
        with open(os.path.join(directory, file), 'rb') as f:
            data[file.split('.')[0]] = pickle.load(f)

# Now you can print 1_my_work
print(data['1_my_work'])
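Note that os.listdir returns the names in arbitrary or lexicographic order, so '10_my_work.pkl' would come before '2_my_work.pkl'. If you want the pickles loaded in numeric order, you can sort on the numeric prefix first, for example:
import os
import pickle

directory = 'C://folder'

# keep only the pickle files and sort them on the integer before the first underscore
files = [f for f in os.listdir(directory) if f.endswith('.pkl')]
files.sort(key=lambda name: int(name.split('_')[0]))

data = {}
for file in files:
    with open(os.path.join(directory, file), 'rb') as f:
        data[file.split('.')[0]] = pickle.load(f)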
I am learning Python at the moment, and in order to do something useful whilst learning, I have created a small plan:
Read a specific disc drive partition. Outcome: list of directories
Iterate each file within the directory and its subdirectories. Outcome: list of files within directories
Read file information: extension. Outcome: file extension
Read file information: size. Outcome: size
Read file information: date created. Outcome: date created
Read file information: date modified. Outcome: date modified
Read file information: owner. Outcome: ownership
At step 1 I have tried several approaches, e.g. os.scandir:
import os

x = [f.name for f in os.scandir('my_path') if f.is_file()]
with open('write_to_file_path', 'w') as f:
    for row in x:
        print(row)
        f.write("%s\n" % str(row))
f.close()  # not needed: the with block already closes the file
and this:
import os

rootDir = '/Users/Ivan/Desktop/revit dynamo/'
for dirName, subdirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)
However, I am having a hard time writing the results into a txt file.
May I ask what would be an ideal approach to audit the specific directories, with all the relevant information extracted and stored as a table in a txt file for now?
P.S.: my first question here, so please do not judge too strictly :)
Since you are learning Python 3, as an alternative to low-level path manipulation with os.path I would suggest trying pathlib (part of the standard library as of Python 3.4):
from pathlib import Path

p = Path(mydir)

# list mydir content
for child in p.iterdir():
    print(child)

# recursive iteration
for child in p.glob("**/*"):
    if child.is_dir():
        pass  # do dir stuff
    else:
        print(child.suffix)   # extension
        print(child.owner())  # file owner
        child_info = child.stat()
        # file size, mod time
        print(child_info.st_size, child_info.st_mtime)
File creation time is platform-dependent, but this post presents some solutions.
The string of a Path can be accessed as str(p).
To write to a file using pathlib:
textfile = Path(myfilepath)

# create file if it doesn't exist
textfile.touch()

# open file, write string, then close file
textfile.write_text(mystringtext)

# open file with context manager
with textfile.open('r') as f:
    f.read()
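Combining the two parts above, a rough sketch of the audit you describe: walk the directory with pathlib, collect one row per file, and write a tab-separated table to a txt file (audit.txt is a hypothetical output name; creation time is left out because, as noted, it is platform-dependent):
from pathlib import Path

root = Path('/Users/Ivan/Desktop/revit dynamo/')
out = Path('audit.txt')  # hypothetical output file

# header row followed by one row per file
rows = ['\t'.join(['path', 'extension', 'size', 'modified', 'owner'])]
for child in root.glob('**/*'):
    if child.is_dir():
        continue
    info = child.stat()
    rows.append('\t'.join([
        str(child),
        child.suffix,
        str(info.st_size),
        str(info.st_mtime),
        child.owner(),  # not available on Windows
    ]))

out.write_text('\n'.join(rows))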
I have the following zip file structure:
some_file.zip/folder/folder/files.xml
So I have a lot of xml files within a subfolder of the zip file.
So far I have managed to unpack the zip file using the following code:
import os.path
import zipfile

with zipfile.ZipFile('some_file.zip') as zf:
    for member in zf.infolist():
        # Path traversal defense copied from
        # http://hg.python.org/cpython/file/tip/Lib/http/server.py#l789
        words = member.filename.split('/')
        path = "output"
        for word in words[:-1]:
            drive, word = os.path.splitdrive(word)
            head, word = os.path.split(word)
            if word in (os.curdir, os.pardir, ''):
                continue
            path = os.path.join(path, word)
        zf.extract(member, path)
But I do not need to extract the files; I want to read them directly from the zip file, either by reading each file within a for loop and processing it, or by saving each file in some kind of data structure in Python. Is that possible?
As Robin Davis has written, zf.open() will do the trick. Here is a small example:
import zipfile

zf = zipfile.ZipFile('some_file.zip', 'r')
for name in zf.namelist():
    if name.endswith('/'):
        continue
    if 'folder2/' in name:
        f = zf.open(name)
        # here you do your magic with [f]: parsing, etc.
        # this will print out the file contents
        print(f.read())
As the OP wished in a comment, only files from "folder2" will be processed...
zf.open() will return a file like object without extracting it.
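Since the members are XML files, you can also hand the file object straight to a parser instead of printing the raw bytes, e.g. with xml.etree.ElementTree (a sketch, assuming the same some_file.zip layout as above):
import zipfile
import xml.etree.ElementTree as ET

with zipfile.ZipFile('some_file.zip', 'r') as zf:
    for name in zf.namelist():
        if not name.endswith('.xml'):
            continue
        with zf.open(name) as f:
            tree = ET.parse(f)  # parse directly from the zip member, no extraction
            print(name, tree.getroot().tag)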
I have a zip file containing thousands of mixed .xml and .csv files. I used the following to extract the zip file:
import zipfile
zip = zipfile.ZipFile(r'c:\my.zip')
zip.extractall(r'c:\output')
Now I need to extract the thousands of individual zip files contained in the 'c:\output' folder. I am planning on concatenating just the .csv files into one file. Thank you for the help!
Try this code:
import zipfile, os

zip = zipfile.ZipFile(r'c:/my.zip')
zip.extractall(r'c:/output')

filelist = []
for name in zip.namelist():
    filelist.append(name)
zip.close()

for i in filelist:
    newzip = zipfile.ZipFile(r'c:/output/' + str(i))
    for file in newzip.namelist():
        if '.csv' in file:
            newzip.extract(file, r'c:/output/')
    newzip.close()
    os.remove(r'c:/output/' + str(i))
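Once everything is extracted into c:/output, a rough sketch of the concatenation step mentioned in the question (combined.csv is just a placeholder name; note this simply appends the raw contents, so repeated header rows are not removed):
import os

# append the contents of every extracted .csv to one combined file
with open(r'c:/output/combined.csv', 'w') as out:
    for dirpath, dirnames, filenames in os.walk(r'c:/output'):
        for fname in filenames:
            if fname.endswith('.csv') and fname != 'combined.csv':
                with open(os.path.join(dirpath, fname)) as src:
                    out.write(src.read())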