Unzipping a .gz extension file in Jupyter - Python

import os

# Unzip the dataset (if we haven't already)
if not os.path.exists('./cola_public/'):
    !unzip cola_public_1.1.zip
The above code will unzip a file in a Jupyter notebook.
How would I do this in a similar fashion if the file were a .gz file?

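The zipfile package only handles zip archives; it cannot read gzip files (ZipFile raises BadZipFile on them). If your file is really a zip archive, you can open it like this:
import zipfile as zf
file = zf.ZipFile("/path/to/file/YOUR_FILE.zip")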

Assuming your file is a tar.gz that contains several files, you can use the following. (The TEST folder has to exist, so the code creates it first.)
import os
import tarfile

os.makedirs('TEST', exist_ok=True)
with tarfile.open('TEST.tar.gz', 'r:gz') as _tar:
    for member in _tar:
        if member.isdir():  # here write your own code to make folders
            continue
        # basename also works for members stored without a directory part
        fname = os.path.basename(member.name)
        _tar.makefile(member, os.path.join('TEST', fname))
Or, if your gz is not a tar file and contains a single file, you can use gzip.
Reference: https://docs.python.org/2/library/gzip.html#examples-of-usage
import gzip
import shutil

def gunzip(file_path, output_path):
    # The with block closes both files, so no explicit close() is needed
    with gzip.open(file_path, "rb") as f_in, open(output_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

f = 'TEST.txt.gz'
gunzip(f, f.replace(".gz", ""))
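To mirror the "only if we haven't already" check from the question, here is a minimal sketch. The function name gunzip_if_needed and the rule for deriving the output name are my own assumptions, not part of the gzip module:

```python
import gzip
import os
import shutil

def gunzip_if_needed(gz_path):
    """Decompress gz_path next to itself unless the output already exists."""
    # Assumed naming rule: drop a trailing ".gz", otherwise append ".out"
    out_path = gz_path[:-3] if gz_path.endswith(".gz") else gz_path + ".out"
    if not os.path.exists(out_path):
        with gzip.open(gz_path, "rb") as f_in, open(out_path, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
    return out_path

# In a notebook cell: gunzip_if_needed("TEST.txt.gz")
```

Running the cell a second time skips the decompression, because the output file is already there.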

Related

How can I add a custom path for saving the zip_file?

I am creating a zip file using the zipfile module. It works like a charm, but the file is saved in the directory the script is executed from.
My script path is:
c:/User/Administrator/script.py
and the zip file is saved in:
c:/User/Administrator/backup.zip
but I want to create the zip file in another path, like this:
d:/backups/backup.zip
My code looks like this:
import zipfile
zip_file = zipfile.ZipFile("backup.zip", 'w')
with zip_file:
    for file in filePaths:
        zip_file.write(file)
My question is: how can I set a custom path for saving the zip file? I do not have enough space on C:.
Thanks a lot.
Pass the path you want to the ZipFile function.
When you give only the name of the file, it is saved in the current working directory of the running program.
Do this instead:
import zipfile
# For example you want to save it in drive 'D'
path = "D:\\PathToYourDir\\backup.zip"
zip_file = zipfile.ZipFile(path, 'w')
with zip_file:
    for file in filePaths:
        zip_file.write(file)
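ZipFile will not create missing folders for you, so if the destination directory may not exist yet, something like this sketch could help. The helper name zip_to is hypothetical, and the paths in the usage comment are only placeholders:

```python
import zipfile
from pathlib import Path

def zip_to(dest_zip, file_paths):
    """Write file_paths into a zip at dest_zip, creating parent folders first."""
    dest = Path(dest_zip)
    # Create the destination directory if it is missing
    dest.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(dest, "w", zipfile.ZIP_DEFLATED) as zip_file:
        for file in file_paths:
            zip_file.write(file)
    return dest

# Example (placeholder paths): zip_to("D:/backups/backup.zip", filePaths)
```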

How to unzip many files with python

I have more than 1000 zip files in the same folder, all with names that start with output_
Example name: output_0aa3199eca63522b520ecfe11a4336eb_20210122_181742
How can I unzip them using Python?
Try this and let me know if it worked.
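import os
import zipfile
path = 'path/to/your/zip/files'
os.chdir(path)
for file in os.listdir('.'):
    # Skip anything that is not a zip archive, otherwise ZipFile raises BadZipFile
    if not zipfile.is_zipfile(file):
        continue
    with zipfile.ZipFile(file, 'r') as zip_ref:
        zip_ref.extractall('.')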
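If the archives contain files with the same names, extracting everything into one folder will overwrite them. A possible variant, as a sketch only, extracts each archive into its own subfolder. The function name unzip_all and the "_extracted" suffix are my own choices:

```python
import zipfile
from pathlib import Path

def unzip_all(folder):
    """Extract every zip archive in folder into its own '<name>_extracted' subfolder."""
    folder = Path(folder)
    extracted = []
    # sorted() takes a snapshot, so folders created below are not revisited
    for item in sorted(folder.iterdir()):
        if not item.is_file() or not zipfile.is_zipfile(item):
            continue  # ignore subfolders and non-zip files
        target = folder / (item.name + "_extracted")
        with zipfile.ZipFile(item) as zip_ref:
            zip_ref.extractall(target)
        extracted.append(target)
    return extracted

# Example: unzip_all('path/to/your/zip/files')
```

The suffix is there because the example names have no .zip extension, so a folder cannot simply reuse the archive's name.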

Zip single file

I am trying to zip a single file in python. For whatever reason, I'm having a hard time getting down the syntax. What I am trying to do is keep the original file and create a new zipped file of the original (like what a Mac or Windows would do if you archive a file).
Here is what I have so far:
import zipfile
myfilepath = '/tmp/%s' % self.file_name
myzippath = myfilepath.replace('.xml', '.zip')
zipfile.ZipFile(myzippath, 'w').write(open(myfilepath).read()) # does not zip the file properly
The correct way to zip a file is:
import zipfile
# assumes your script is in the same directory as hello.csv
with zipfile.ZipFile('hello.zip', mode='w') as zf:
    zf.write("hello.csv")
The official Python documentation says:
ZipFile.write(filename, arcname=None, compress_type=None)
Write the file named filename to the archive, giving it the archive name arcname
You pass open(filename).read() into write(). open(filename).read() is a single string that contains the whole content of the file, so write() typically throws FileNotFoundError, because it tries to find a file whose name is that content.
If the file to be zipped (filename) is in a different directory called pathname, you should use the arcname parameter. Otherwise, the archive will recreate the full folder hierarchy leading to the file.
from zipfile import ZipFile
import os
with ZipFile(zip_file, 'w') as zipf:
    zipf.write(os.path.join(pathname, filename), arcname=filename)
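To see what arcname does, here is a small sketch that zips one file and lists the names stored in the archive. The helper name zip_single_file is made up for illustration:

```python
import os
import zipfile

def zip_single_file(src_path, zip_path):
    """Zip one file under its bare name and return the names stored in the archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zipf:
        # arcname drops the directory part, so the archive holds only the file name
        zipf.write(src_path, arcname=os.path.basename(src_path))
    with zipfile.ZipFile(zip_path) as zipf:
        return zipf.namelist()

# Example: zip_single_file('/tmp/hello.csv', '/tmp/hello.zip')
```

The original file is left in place; only a new zip is created next to it.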
Try calling close() on the ZipFile afterwards?
from zipfile import ZipFile, ZIP_DEFLATED
zipf = ZipFile("main.zip", "w", ZIP_DEFLATED)
zipf.write("main.json")
zipf.close()
Since you also want to specify the directory, try using os.chdir:
#!/usr/bin/python
from zipfile import ZipFile
import os
os.chdir('/path/of/target/and/destination')
# The with block closes the archive so it is fully written to disk
with ZipFile('archive.zip', 'w') as zipf:
    zipf.write('original_file.txt')
See also, in the Python documentation: zipfile (Work with ZIP archives) and os (Miscellaneous operating system interfaces).

Compress all files in a folder with python?

This code takes a bunch of files in a folder (selected by file name), compresses them with bz2 and adds them to a tar file. Is there a way I can modify it to only compress the files with bz2 (or gzip)? I do not want to deal with having them packaged into a tar. I just want to go through each file in a directory and compress it.
import os
from glob import glob
import tarfile
os.chdir(r'C:\Documents\FTP\\')
compression = "w:bz2"
extension = '.tar.bz2'
filename = 'survey_'
filetype = 'survey_report_*.csv'
tarname = saveloc+filename+extension
files = glob(filetype)
tar = tarfile.open(tarname, compression)
for file in files:
    if file not in tarname:
        print('Packaging file:', file)
        tar.add(file)
tar.close()
EDIT:
This code seems to work for some files, but for other ones it makes them 1kb and when I open it there are just some random characters. Any suggestions?
import bz2
import os
location = r'C:\Users\Documents\FTP\\'
os.chdir(location)
filelist = os.listdir(location)
for file in filelist:
    data = open(file).read()
    try:
        output = bz2.BZ2File(file + '.bz2', 'wb')
        output.write(data)
    finally:
        output.close()
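One possible reason for the broken outputs is that open(file) reads in text mode, which can alter or cut short the data on Windows; that is a guess, not something confirmed here. A sketch that reads in binary mode and compresses each file on its own, with no tar involved, could look like this (the function name bz2_each_file is my own):

```python
import bz2
import os
import shutil

def bz2_each_file(folder):
    """Compress every regular file in folder to '<name>.bz2', leaving originals in place."""
    created = []
    for name in sorted(os.listdir(folder)):
        src = os.path.join(folder, name)
        # Skip subfolders and files that were already compressed on an earlier run
        if not os.path.isfile(src) or name.endswith(".bz2"):
            continue
        # Binary mode on both sides keeps the bytes unchanged
        with open(src, "rb") as f_in, bz2.open(src + ".bz2", "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
        created.append(src + ".bz2")
    return created

# Example: bz2_each_file(location)
```

For gzip instead of bz2, gzip.open can be used in the same place with a '.gz' suffix.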

How to compress a tar file in a tar.gz without directory?

I'm looking for a way to compress a tar file into a tar.gz without the directory structure.
Today my code generates a TAR file without directories, using the "tarfile" library and the arcname argument, but when I compress this TAR file into a TAR.GZ I don't understand how to get rid of the directory.
I have run many tests over the last 3 days.
My code:
import tarfile
TarName = "example.tar"
ImageDirectory = r"C:\..."
TarDirectory = r"C:\.."
tar = tarfile.open(TarName, "w")
tar.add(ImageDirectory, arcname=TarName)
tar.close()
targz = tarfile.open("example.tar.gz", "w:gz")
targz.add(TarDirectory, arcname=TarName)
targz.close()
For individual file(s):
tar.add(file, arcname=os.path.basename(file))
for each file that you want to add. basename will strip the directory information.
Or, for a recursive directory:
import os
import tarfile

def flatten(tarinfo):
    # Keep only the last path component of each member
    tarinfo.name = os.path.basename(tarinfo.name)
    return tarinfo

tar = tarfile.open("example.tar.gz", "w:gz")
tar.add("directory", filter=flatten)
tar.close()
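If you would rather have no directory entries in the archive at all, a different approach is to walk the tree yourself and add each file under its base name. This is only a sketch; make_flat_targz is a made-up name, and it assumes the output path lies outside the source directory:

```python
import os
import tarfile

def make_flat_targz(src_dir, out_path):
    """Write every file under src_dir into a tar.gz at out_path with no directory part."""
    with tarfile.open(out_path, "w:gz") as tar:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                # arcname=name stores the file without its folder;
                # files with equal names in different folders would collide
                tar.add(os.path.join(root, name), arcname=name)

# Example: make_flat_targz("directory", "example.tar.gz")
```

Writing straight to "w:gz" produces the tar.gz in one step, so there is no need to build a plain tar first and compress it afterwards.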
Try using the gzip module.
Here is an example of how to use it:
import gzip
f_in = open('file.txt', 'rb')
f_out = gzip.open('file.txt.gz', 'wb')
f_out.writelines(f_in)
f_out.close()
f_in.close()
