Create a SFX archive using python


I am looking for help with a Python script that creates a self-extracting archive (SFX), i.e. the kind of exe file that WinRAR can create.
I want to archive a folder with password protection and split it into volumes of 3900 MB so that it can easily be burned to disc.
I know WinRAR has command-line parameters for creating an archive, but I am not sure how to call it from Python. Any help would be appreciated.
Here are the main things I want:
Archive Format - RAR
Compression Method - Normal
Split Volume Size - 3900 MB
Password protection
I have looked everywhere but can't find anything that covers this functionality.

You could have a look at rarfile
Alternatively use something like:
from subprocess import call
cmdlineargs = "command -switch1 -switchN archive files.. path_to_extract"
call(["WinRAR"] + cmdlineargs.split())
Note that the arguments in the second line are placeholders, given only as an example; you will need to replace them with the correct WinRAR command and switches.
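To make the subprocess approach a bit more concrete, here is a minimal sketch for the requirements in the question. It assumes the command-line rar executable is reachable as "rar" (on Windows this is typically Rar.exe from the WinRAR install folder), and the switches are the ones I know from the RAR command-line manual, so check them against your installed version. The function name build_sfx_command and the folder, archive and password values are made up for illustration.

```python
import subprocess


def build_sfx_command(source_dir, archive_name, password, rar_exe="rar"):
    """Build the argument list for a password-protected, split SFX archive.

    rar_exe is an assumption: pass the full path to Rar.exe if it is not on PATH.
    """
    return [
        rar_exe,
        "a",               # command: add files to an archive
        "-sfx",            # create a self-extracting archive
        "-m3",             # compression method 3 = normal
        "-v3900m",         # split into volumes of 3900 megabytes
        f"-p{password}",   # protect the archive with a password
        "-r",              # recurse into subfolders
        archive_name,
        source_dir,
    ]


def create_sfx(source_dir, archive_name, password, rar_exe="rar"):
    """Run rar and raise CalledProcessError if it exits with an error."""
    cmd = build_sfx_command(source_dir, archive_name, password, rar_exe)
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids the str.split() step, which breaks as soon as a path or password contains a space. Keep in mind that a password given on the command line may be visible to other users in the process list.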

Related

Read xlsx file as dataframe inside .rar pack in python directly

I need help reading an xlsx file that is inside a rar archive. I am using the code below, but I get an error. Is there a better way to read or extract the file?
rar = glob.glob(INPATH + "*xyz*.rar*")
rf = rarfile.RarFile(rar[0])
for f in rf.infolist():
    print(f.filename, f.file_size)
    df = pd.read_excel(rf.read(f))
rarfile.RarCannotExec: Cannot find working tool

According to the brief PyPI docs, you need unrar installed and on your PATH in order for the module to work. It does not implement the RAR unpacking algorithm itself.
(Presumably you need rar as well, for creating archives.)
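A quick way to check whether the external tools are visible to Python before calling rarfile is sketched below. It only uses the standard library; the function name find_tools and the list of tool names are my own choice, based on the tools mentioned above.

```python
import shutil


def find_tools(names=("unrar", "rar")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for tool, path in find_tools().items():
    print(tool, "->", path if path else "not found on PATH")
```

If unrar shows up as not found here, the "Cannot find working tool" error is expected, and installing the tool or adding its folder to PATH should be the first thing to try.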

Archive files directly from memory in Python

I'm writing a program where I get a number of files and zip them with encryption using pyzipper. I'm also using io.BytesIO() to write these files to, so I keep them in memory. Now, after some other additions, I want to take all of these in-memory files and zip them together into a single encrypted zip file using the same pyzipper.
The code looks something like this:
# Create the in-memory file object
in_memory = BytesIO()
# Create the zip file and open in write mode
with pyzipper.AESZipFile(in_memory, "w", compression=pyzipper.ZIP_LZMA, encryption=pyzipper.WZ_AES) as zip_file:
    # Set password
    zip_file.setpassword(b"password")
    # Save "data" with file_name
    zip_file.writestr(file_name, data)
# Go to the beginning
in_memory.seek(0)
# Read the zip file data
data = in_memory.read()
# Add the data to a list
files.append(data)
So, as you may guess the "files" list is an attribute from a class and the whole thing above is a function that does this a number of times and then you get the full files list. For simplicity's sake, I removed most of the irrelevant parts.
I get no errors for now, but when I try to write all files to a new zip file I get an error. Here's the code:
with pyzipper.AESZipFile(test_name, "w", compression=pyzipper.ZIP_LZMA, encryption=pyzipper.WZ_AES) as zfile:
    zfile.setpassword(b"pass")
    for file in files:
        zfile.write(file)
I get a ValueError because of os.stat:
File "C:\Users\vulka\AppData\Local\Programs\Python\Python310\lib\site-packages\pyzipper\zipfile.py", line 820, in from_file
st = os.stat(filename)
ValueError: stat: embedded null character in path
[WHAT I TRIED]
So, I tried using mmap for this purpose, but I don't think it can help me, and if it can, I have no idea how to make it work.
I also tried using fs.memoryfs.MemoryFS to temporarily create a virtual filesystem in memory to store all the files, then get them back to zip everything together and save it to disk. Again, it failed. I got tons of different errors in my tests and, TBH, there's very little information out there on this fs method; even if what I'm trying to do is possible, I couldn't figure it out.
P.S.: I don't know whether pyzipper (almost 1:1 zipfile with the addition of encryption) supports nested zip files at all. This could be the problem I'm facing, but if it doesn't I'm open to any suggestions for a new approach. Also, I don't want to rely on third-party software, even if it is open source! (I'm talking about using 7zip to do all the archiving and encryption, even though it shouldn't even be possible to use it without saving the files to disk in the first place, which is the main thing I'm trying to avoid.)
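The ValueError above comes from write(), which expects a path on disk and therefore passes the raw zip bytes to os.stat. For data that is already in memory, writestr() takes a name plus the bytes. Below is a minimal sketch of nesting in-memory zips with the standard zipfile module. It has no encryption; since pyzipper is described above as nearly 1:1 with zipfile, swapping in AESZipFile plus setpassword should presumably carry over, but I have not verified that here. The helper names zip_bytes and bundle are made up for illustration.

```python
import io
import zipfile


def zip_bytes(name, data):
    """Zip one in-memory payload and return the archive as bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)
    # The archive is only complete after the with block has closed it.
    return buf.getvalue()


def bundle(named_archives):
    """Pack (name, zip_bytes) pairs into one outer archive, all in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as outer:
        for name, blob in named_archives:
            # writestr takes the bytes directly; write() would expect a path.
            outer.writestr(name, blob)
    return buf.getvalue()


inner = zip_bytes("a.txt", b"hello")
outer = bundle([("a.zip", inner)])
print(len(outer), "bytes in the outer archive")
```

Note that each inner archive needs a name inside the outer one, so the files list would have to hold (name, bytes) pairs rather than bytes alone.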

Get zip file from url with python3 request : make it more verbose

I am trying to download a zip file from a URL like this:
import requests
resp = requests.get('https://nlp.stanford.edu/data/glove.6B.zip')
I know the file is colossal, and while it is downloading I can't tell whether everything is going well or not.
(1) Is there a way to make the download more verbose?
(2) How do I know where the data is saved, and is there a relative path for it that I can use in the rest of my script?
(3) How can I unzip it nicely?
(4) How can I either set a file name or get the file name of the downloaded file?

Is there a way to make the download more verbose?
If you want to download the file to disk and see how many bytes have been downloaded so far, you can use urllib.request.urlretrieve from the built-in urllib.request module. It accepts an optional reporthook. This should be a function that accepts 3 arguments; it is called once when the connection is established and once after each block is read, with:
the number of blocks transferred so far
the size of a block in bytes
the total size of the file, or -1 if unknown
A simple example that prints the progress to stdout as a fraction:
from urllib.request import urlretrieve
def report(num, size, total):
    print(num*size, '/', total)
urlretrieve("http://www.example.com","index.html",reporthook=report)
This downloads www.example.com to the current working directory as index.html and prints the progress as it goes. Note that the fraction may exceed 1, because the last block is usually only partly filled, so it should be treated as an estimate.
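If the overshoot and the unknown-size case bother you, the hook can clamp the count and print a percentage. This is only a sketch; format_progress is a hypothetical helper name, and the hook has the same three-argument signature as the example above.

```python
def format_progress(block_num, block_size, total_size):
    """Turn the three reporthook arguments into a readable progress string."""
    done = block_num * block_size
    if total_size <= 0:
        # urlretrieve passes -1 when the server did not report a size.
        return f"{done} bytes downloaded (total size unknown)"
    done = min(done, total_size)  # the last block is usually only partly filled
    return f"{done}/{total_size} bytes ({100 * done / total_size:.1f}%)"


def report(block_num, block_size, total_size):
    """Hook with the signature urlretrieve expects; prints one line per call."""
    print(format_progress(block_num, block_size, total_size))


print(format_progress(1, 4096, 8192))
```

It is passed in the same way as before: urlretrieve(url, "glove.6B.zip", reporthook=report). The second argument is also where you choose the file name and location of the download.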
EDIT: Once the zip file has finished downloading, if you just want to unpack the whole archive you can use shutil.unpack_archive from the built-in shutil module. If you want finer-grained control, you can use the built-in zipfile module; the PyMOTW3 entry for zipfile has examples such as listing the files inside a ZIP archive, reading a selected file from it, and reading the metadata of a file inside it.
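A small self-contained sketch of both options follows. It builds a tiny sample archive in a temporary folder so it can run without a download; the file names and the function name unpack_demo are made up for illustration.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def unpack_demo():
    """Create a small zip, list its contents, then unpack it completely."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        archive = tmp / "sample.zip"
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("docs/readme.txt", "hello")

        # Fine-grained access: list the members without extracting anything.
        with zipfile.ZipFile(archive) as zf:
            names = zf.namelist()

        # Whole-archive extraction: the format is guessed from the extension.
        out_dir = tmp / "out"
        shutil.unpack_archive(str(archive), str(out_dir))
        text = (out_dir / "docs" / "readme.txt").read_text()
    return names, text


print(unpack_demo())
```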

Can't extract gz file using the patool package

I am trying to use the patool package to perform a simple operation: decompressing a gz archive that consists of one file. The file in the archive is an xml file that has exactly the same name as the archive, just without the .gz ending.
The code I use for this is:
import patoolib
filePath = 'D:\\inpath\\file.xml.gz'
outPath= 'D:\\outpath'
patoolib.extract_archive(filePath,outdir=outPath, interactive=False, verbosity=-1)
But what happens is that the file is extracted in a corrupt state: it appears in the outPath folder, but is 0 KB in size and cannot be opened. The error I get is:
PatoolError: Command `['c:\Rtools\bin\gzip.EXE', '-c', '-d', '--', 'D:\inpath\file.xml.gz', '>', 'D:\outPath\file.xml']' returned non-zero exit status 1
Now, I am certain that the archive is not corrupt, since when I perform the extraction manually using Windows Explorer, it does work properly.
This code did work for some other files, but I can't understand why this is occurring for this file. Also, I am wondering whether there is perhaps a simpler way of doing this that is known to work more smoothly.
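For a single-file .gz archive, one simpler route is the standard library's gzip module, which does not call an external gzip.exe at all. This is a minimal sketch and not a diagnosis of the patool error; the function name gunzip is made up, and the output name is derived by dropping the .gz suffix, as in the question.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, out_dir):
    """Decompress src (for example file.xml.gz) into out_dir; return the new path."""
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / src.stem  # file.xml.gz -> file.xml
    with gzip.open(src, "rb") as fin, open(target, "wb") as fout:
        # Copy in chunks so large files are not loaded into memory at once.
        shutil.copyfileobj(fin, fout)
    return target
```

With the paths from the question it would be called as gunzip('D:\\inpath\\file.xml.gz', 'D:\\outpath').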

Transparently mount a tar.gz archive with Python

How can I mount a tar.gz archive transparently with Python?
I have a tar.gz archive whose contents have to be read by an external program. The contents will only be needed temporarily. I could just unpack it to a temporary folder and point my external program there to read it. Afterwards, I could just delete the temp folder again. However, the archives may be large (>1 GB when extracted) so that unpacking them will take up a lot of space on the disk. My server is rather weak regarding HD performance and I cannot waste space ad lib but it does have a lot of RAM and CPU power.
That's why I want to try to mount the archive transparently without unpacking it entirely. I came across archivemount which seems to do exactly what I want. Is there a way to do what archivemount does in pure Python? No subprocess.call "solutions", please. It should run on 64-bit Linux.
I believe there should be a smart way to use tarfile to access archive's contents and then fusepy to create a user-space file system which exposes the contents of the archive. Has anyone already put these pieces together? Any ideas?
If you think that this is not a good idea, please post relevant comments. If you know what is better, please comment.

As of version 0.3.1 of my ratarmount module, you can use it or take a look at its source to mount a .tar.gz in Python. The gzip seeking support is from the dependency indexed_gzip. Ratarmount itself is based on tarindexer, which implements the idea to use tarfile to get offsets and then seek to it. But, ratarmount adds a FUSE layer among other usability and performance features.
You can install ratarmount from PyPI:
pip3 install --user ratarmount
and then call its command line interface directly from python like so:
import ratarmount
ratarmount.cli( [ '--help' ] )
ratarmount.cli( [ pathToTar, pathToMountPoint ] )
The heart of the module is, as you already surmised, tarfile, which is used to iterate over all TarInfo objects and create a list of (filepath, offset, size) entries. These can then be used to seek directly to the offset in the raw tar file and then simply read the next size bytes. This works because TAR is such a simple format.
Here is the unoptimized and very bare core idea:
import sys
import tarfile
from indexed_gzip import IndexedGzipFile
targzfile = sys.argv[1]
filetoprint = sys.argv[2]
index = {} # path : ( offset, size )
file = IndexedGzipFile( targzfile )
for tarinfo in tarfile.open( fileobj = file, mode = 'r|' ):
    index[tarinfo.name] = ( tarinfo.offset_data, tarinfo.size )
# at this point you could save or load the index for faster consecutive file seeks
file.seek( index[filetoprint][0] )
sys.stdout.buffer.write( file.read( index[filetoprint][1] ) )
The above example was tested to work with:
wget -O- 'https://ftp.mozilla.org/pub/firefox/releases/70.0/linux-x86_64/en-US/firefox-70.0.tar.bz2' | bzip2 -d -c | gzip > firefox.tgz
python3 minimal-example.py firefox.tgz firefox/updater.ini
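If you want to try the offset-index idea without installing indexed_gzip, the sketch below does the same thing on an uncompressed tar held in memory, using only the standard library. It is an illustration of the principle, not ratarmount's actual code, and the sample member name and helper names are made up.

```python
import io
import tarfile


def make_sample_tar():
    """Build a tiny uncompressed tar in memory; return (tar_bytes, payload)."""
    payload = b"hello from inside the tar\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo("demo/hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue(), payload


def build_index(fileobj):
    """Map each regular file in the tar to (offset of its data, size)."""
    index = {}
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        for info in tar:
            if info.isfile():
                index[info.name] = (info.offset_data, info.size)
    return index


def read_member(fileobj, index, name):
    """Seek straight to the member's data and read exactly size bytes."""
    offset, size = index[name]
    fileobj.seek(offset)
    return fileobj.read(size)


tar_bytes, payload = make_sample_tar()
index = build_index(io.BytesIO(tar_bytes))
print(read_member(io.BytesIO(tar_bytes), index, "demo/hello.txt"))
```

For a .tar.gz the raw file object has to support seeking in the decompressed stream, which is the part indexed_gzip provides in the example above.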
