Extracting the extracted with python - python

I have a zip file containing thousands of mixed .xml and .csv files. I used the following to extract the zip file:
import zipfile
zip = zipfile.ZipFile(r'c:\my.zip')
zip.extractall(r'c:\output')
Now I need to extract the thousands of individual zip files contained in the 'c:\output' folder. I am planning on concatenating just the .csv files into one file. Thank you for the help!

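Try this code:
import os
import zipfile
# Extract the outer archive and remember the names of its members
with zipfile.ZipFile(r'c:/my.zip') as outer:
    outer.extractall(r'c:/output')
    filelist = outer.namelist()
for name in filelist:
    inner_path = os.path.join(r'c:/output', name)
    # Skip members that are not themselves zip archives
    if not zipfile.is_zipfile(inner_path):
        continue
    with zipfile.ZipFile(inner_path) as newzip:
        for member in newzip.namelist():
            if member.lower().endswith('.csv'):
                newzip.extract(member, r'c:/output/')
    # Remove the inner archive once its CSV files have been extracted
    os.remove(inner_path)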
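For the concatenation step the question mentions, a minimal sketch could look like the following. It assumes that every CSV file shares the same header row and that the files sit directly in one folder. The function name `concatenate_csv_files` and its parameters are illustrative and do not come from the original posts.

```python
import glob
import os


def _with_newline(line):
    # Make sure a line ends with a newline so files do not run together
    return line if line.endswith("\n") else line + "\n"


def concatenate_csv_files(folder, output_path):
    """Append every .csv in `folder` to `output_path`, keeping one header."""
    output_abs = os.path.abspath(output_path)
    header_written = False
    with open(output_path, "w", newline="") as out:
        for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
            # Do not read the output file if it lives in the same folder
            if os.path.abspath(path) == output_abs:
                continue
            with open(path, newline="") as src:
                header = src.readline()
                if not header:
                    continue  # empty file, nothing to copy
                # Assumption: all files share the same header row
                if not header_written:
                    out.write(_with_newline(header))
                    header_written = True
                for line in src:
                    out.write(_with_newline(line))
    return output_path

# Hypothetical usage: concatenate_csv_files(r'c:/output', r'c:/output/combined.csv')
```

Copying line by line keeps memory use flat, which matters with thousands of files. If the headers differ between files, a library such as pandas would be a better fit.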

Related

How to read pairwise csv and json files having same names inside a folder using python?

My folder contains files like this:
abc.csv
abc.json
bcd.csv
bcd.json
efg.csv
efg.json
and so on, i.e. pairs of CSV and JSON files with the same base name. For each pair I have to read both files, perform some operation, and then proceed to the next pair. How do I go about this?
Basically, what I have in mind as pseudocode is:
for files in folder_name:
    df_csv = pd.read_csv('abc.csv')
    df_json = pd.read_json('abc.json')
    # some script to execute
    # now read the next pair and repeat for all files
Did you think of something like this?
import os
# collect the file names in the current folder
filelist = os.listdir()
# iterate through the file names
for file in filelist:
    # pair each .csv file with the .json file of the same base name
    if file.endswith(".csv"):
        pre, ext = os.path.splitext(file)
        secondfile = pre + ".json"
        with open(file) as csv_file, open(secondfile) as json_file:
            # do something with both files
            pass
You can use the glob module to find the file names matching a pattern:
import glob
import os.path
for csvfile in glob.iglob('*.csv'):
    jsonfile = csvfile[:-3] + 'json'
    # optionally check that the file exists
    if not os.path.exists(jsonfile):
        # show error message
        ...
        continue
    # do something with csvfile
    # do something else with jsonfile
    # and proceed to the next pair
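If the directory structure is consistent you could do the following:
import os
import pandas as pd
folder = './path/to/dir'
# the set comprehension yields each base name once, even though it appears twice
for f_name in {os.path.splitext(x)[0] for x in os.listdir(folder)}:
    df_csv = pd.read_csv(os.path.join(folder, f"{f_name}.csv"))
    df_json = pd.read_json(os.path.join(folder, f"{f_name}.json"))
    # execute the rest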
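The answers above all derive the JSON name from the CSV name. A small sketch of the same idea with pathlib, which also skips CSV files that have no partner, might look like this. The function name `paired_files` is hypothetical.

```python
from pathlib import Path


def paired_files(folder):
    """Return (csv_path, json_path) pairs that share a base name."""
    pairs = []
    for csv_path in sorted(Path(folder).glob("*.csv")):
        json_path = csv_path.with_suffix(".json")
        # Skip CSV files that have no matching JSON file
        if json_path.exists():
            pairs.append((csv_path, json_path))
    return pairs
```

Each pair can then be handed to pd.read_csv and pd.read_json in turn.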

How to read json files in subfolders?

I have a file path like this '/mnt/extract'. Now inside this extract folder, I have below 3 more subfolders -
subfolder1
subfolder2
subfolder3 (it has one .json file inside it)
The json in subfolder3 looks like this -
{
    "x": "/mnt/extract/p",
    "y": "/mnt/extract/r"
}
I want to extract the above json file from subfolder3 and concatenate the value - /mnt/extract/p for the key 'x' with one more string 'data' so that the final path will become '/mnt/extract/p/data' where I want to finally export some data. I tried the below approach but it's not working.
import os
for root, dirs, files in list(os.walk(path)):
    for name in files:
        print(os.path.join(root, name))
Using Python's built-in glob module, you can find files in folders and subfolders.
Try this:
import glob
files = glob.glob('/mnt/extract/**/*.json', recursive=True)
The files list will contain the paths of all JSON files under the extract directory.
Try this:
import glob
import json
final_paths = []
extract_path = '/mnt/extract'
files = glob.glob(extract_path + '/**/*.json', recursive=True)
for file in files:
    with open(file, 'r') as f:
        json_file = json.load(f)
    output_path = json_file['x'] + '/' + 'data'
    final_paths.append(output_path)
The final_paths list will contain one output path for each JSON file found in the folder structure.
import glob
import json
extract_path = '/mnt/extract'
files = glob.glob(extract_path + '/**/*.json', recursive=True)
if len(files) != 0:
    with open(files[0], 'r') as f:
        data = json.load(f)
    final_output_path = data['x'] + '/' + 'data'
In the above code, glob.glob returns a list whose only element is the path of the JSON file. To pass a single path to open rather than the whole list, I took files[0], which picks that element from the list, and the file then parsed without trouble. If anyone has a cleaner way to handle the list that glob returns, feel free to answer.
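One possibly cleaner option is to ask for the first match directly instead of building the whole list. The sketch below uses glob.iglob with next() and a default of None. The function name `output_path_from_config` and its `key` and `suffix` parameters are illustrative, and it assumes the first JSON file found is the one you want.

```python
import glob
import json
import os


def output_path_from_config(extract_path, key="x", suffix="data"):
    """Read the first JSON file under extract_path and build a path from it."""
    pattern = os.path.join(extract_path, "**", "*.json")
    # next() takes the first match, or None when nothing was found
    first_json = next(glob.iglob(pattern, recursive=True), None)
    if first_json is None:
        return None
    with open(first_json) as f:
        config = json.load(f)
    # os.path.join adds the separator between the stored path and the suffix
    return os.path.join(config[key], suffix)
```

Returning None for an empty directory leaves the caller to decide whether a missing file is an error.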

Rename zipfile with info contained in archive in python

I have 5000 zip archives, each containing a JSON file with a piece of information (versionName).
I want to rename those zip files. The best way I found to do that in Python is
to read each JSON file, get the information I need, and then rename each zip like this: "Archive_name.zip => Archive_name_versionName.zip"
Here is my python code :
import zipfile
archive = zipfile.ZipFile('/home/AndroidBags/aasuited.net.word.zip', 'r')
print(archive)
jsonre = archive.read('meta_google_play/apk_aasuited.net.word.json')
print(jsonre)
Here is the result of that script:
{"appdata":
[{"versionName": "1.24.1", "size": 19480447}]}
How can I access the versionName value and rename the zip file in Python? Thank you
import os
import json
version_name = json.loads(jsonre)['appdata'][0]['versionName']
os.rename(
    '/home/AndroidBags/aasuited.net.word.zip',
    './aasuited.net.word.' + version_name + '.zip'
)
This also moves the zip file to the current working directory.
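To keep each archive in its own folder and use the "Archive_name_versionName.zip" form from the question, a sketch along these lines may help. It assumes every archive stores its metadata at "meta_google_play/apk_<archive name>.json", a pattern inferred from the single example in the question. The function name `rename_with_version` is hypothetical.

```python
import json
import os
import zipfile


def rename_with_version(zip_path):
    """Rename Archive_name.zip to Archive_name_<versionName>.zip in place."""
    folder, filename = os.path.split(zip_path)
    stem = os.path.splitext(filename)[0]
    # Assumption: the metadata member follows the pattern seen in the question
    member = "meta_google_play/apk_" + stem + ".json"
    # Close the archive before renaming; an open file blocks renames on Windows
    with zipfile.ZipFile(zip_path) as archive:
        meta = json.loads(archive.read(member))
    version_name = meta["appdata"][0]["versionName"]
    new_path = os.path.join(folder, stem + "_" + version_name + ".zip")
    os.rename(zip_path, new_path)
    return new_path
```

Calling it in a loop over glob.glob('/home/AndroidBags/*.zip') would cover all the archives, provided the naming assumption holds for each one.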

How to extract only mp3 files from a ZIP archive

I have this code:
from zipfile import ZipFile
import os
import glob
inp = raw_input("Specify a ZIP archive to extract:")
with ZipFile(inp) as zf:
    zf.extractall()
It works fine because it extracts all the files, but how do I extract only the .mp3 files from the archive that the user specifies?
To extract just the MP3 files from a ZIP archive, you could do the following:
from zipfile import ZipFile
import os
zip_file = r"c:\folder\myzip.zip"
target_folder = r"C:\Users\Fred\Desktop"
with ZipFile(zip_file, 'r') as my_zip:
    mp3_files = [name for name in my_zip.namelist() if os.path.splitext(name)[1].lower() == '.mp3']
    my_zip.extractall(target_folder, mp3_files)
The list of files inside the ZIP file can be obtained using the namelist function. With this you can filter just those files ending with an mp3 extension. The extractall function lets you pass a list of all of the files you want to extract (it defaults to all files).
You could get a list of the names of the members in the archive and extract only those ending with the suffix .mp3.
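A minimal sketch of that suggestion, using str.endswith for the filter, could look like this. The function name `extract_by_suffix` and its parameters are illustrative.

```python
from zipfile import ZipFile


def extract_by_suffix(zip_path, target_folder, suffix=".mp3"):
    """Extract only the members whose names end with the given suffix."""
    with ZipFile(zip_path) as zf:
        # Lower-case the name so that .MP3 matches as well
        wanted = [n for n in zf.namelist() if n.lower().endswith(suffix)]
        zf.extractall(target_folder, members=wanted)
    return wanted
```

Members inside folders keep their relative paths under the target folder.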

Extract files that exist in folders from Zip Archive

How can I extract files from a directory inside a zip archive? I upload the zip archive from an HTML form, and if the archive contains folders I can't extract the files inside them. This is a snippet from my code:
form = cgi.FieldStorage()
file_upload = form['file']
zfile = zipfile.ZipFile(file_upload.file, "r")
files_zip = zfile.namelist()
for name in files_zip:
    print name
    if name.endswith('/'):
        print "yes"
        l = list()
        l = os.listdir(name)
        print l
EDIT:
I tried to use StringIO():
s = StringIO(file_upload)
f = s.getvalue()
with zipfile.ZipFile(f, 'r') as z:
    for d in z.namelist():
        print "%s: %s" % (d, z.read(d))
but the second snippet fails with:
No such file or directory: "FieldStorage('file', 'test.zip')
I want to extract these files so that I can add them to the GAE Blobstore.
Thanks in advance.
There's a working example of how to do this in appengine-mapreduce.
Look at input_readers.py for BlobstoreZipInputReader (which starts at line 898 at the moment).
I don't understand why you are using os.listdir to list files inside zip data. You should just go through the names and read the data. Here is an example where I create an in-memory zip file and read the files back, including one inside a folder:
from zipfile import ZipFile
from StringIO import StringIO
# first let's create a zip file with folders to simulate data coming from the user
f = StringIO()
with ZipFile(f, 'w') as z:
    z.writestr('1.txt', "data of file 1")
    z.writestr('folder1/2.txt', "data of file 2")
zipdata = f.getvalue()
# try to read zipped data containing folders
f = StringIO(zipdata)
with ZipFile(f, 'r') as z:
    for name in z.namelist():
        print "%s: %s" % (name, z.read(name))
output:
1.txt: data of file 1
folder1/2.txt: data of file 2
As App Engine doesn't allow writing to the file system, you will need to read the file data (as explained above) and dump it to blobs; a simple structure of name and data is enough. On your local OS you can try z.extractall(), which will create the whole folder structure and files.
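The example above is Python 2 code. On Python 3 the StringIO module no longer exists and zip data is bytes, so io.BytesIO takes its place. The sketch below assumes the upload has already been read into a bytes object (for instance with file_upload.file.read()), and the function name `read_zip_members` is hypothetical.

```python
import io
from zipfile import ZipFile


def read_zip_members(zip_bytes):
    """Return {member name: content bytes} for every file in the archive."""
    contents = {}
    with ZipFile(io.BytesIO(zip_bytes)) as z:
        for info in z.infolist():
            # Directory entries carry no data of their own
            if info.is_dir():
                continue
            contents[info.filename] = z.read(info)
    return contents
```

The resulting name-to-data mapping is the simple structure mentioned above and can be written to whatever blob storage is in use.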
