Can't extract gz file using the patool package

I am trying to use the patool package to perform a simple operation: decompressing a gz archive that consists of one file. This one file in the archive is an XML file that has exactly the same name as the archive, just without the .gz ending.
The code I use for this is:
import patoolib
filePath = 'D:\\inpath\\file.xml.gz'
outPath = 'D:\\outpath'
patoolib.extract_archive(filePath, outdir=outPath, interactive=False, verbosity=-1)
But the file is extracted in a corrupt state: it appears in the outPath folder, but is 0 KB in size and cannot be opened. The error I get is:
PatoolError: Command `['c:\Rtools\bin\gzip.EXE', '-c', '-d', '--', 'D:\inpath\file.xml.gz', '>', 'D:\outPath\file.xml']' returned non-zero exit status 1
Now, I am certain that the archive is not corrupt, since when I perform the extraction manually using Windows Explorer, it does work properly.
This code did work for some other files, so I can't understand why this is occurring for this file. I am also wondering whether there is a simpler way of doing this that is known to work more smoothly.
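One simpler route, offered as a sketch rather than a diagnosis of the patool failure: Python's standard gzip module can decompress a single-file .gz archive directly, without calling an external gzip.exe. The helper name gunzip_to_dir is hypothetical, and deriving the output name by dropping the last extension assumes the archive is named like file.xml.gz.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_to_dir(archive_path, out_dir):
    """Decompress a single-file .gz archive into out_dir.

    Hypothetical helper: the output name is the archive name
    without its final extension (file.xml.gz -> file.xml).
    """
    archive_path = Path(archive_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    dest = out_dir / archive_path.stem
    # Stream in chunks so large files are not loaded into memory at once.
    with gzip.open(archive_path, "rb") as f_in, open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dest
```

With the paths from the question, the call would be gunzip_to_dir('D:\\inpath\\file.xml.gz', 'D:\\outpath'). If the archive really is damaged, gzip raises an exception instead of leaving a silent 0 KB file.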

Related

How to get the path of a ".lnk" file using tkinter.filedialog.askopenfilenames() in python 3.10? Or any other ways?

My work requires me to collect some file names and the times the files were generated.
I am using fileName = tkinter.filedialog.askopenfilenames() for this: the program pops up a window for selecting files, I get the files' paths, and then I use fileGeneratedTime = datetime.datetime.fromtimestamp(os.path.getmtime(fileName)) to get each file's timestamp.
Here is the problem. When I select a .lnk file, the dialog returns the path of the file that the .lnk file points to. The program runs fine on the original computer that has the .lnk files, but when I copy the .lnk files to other computers, the program raises FileNotFoundError.
So, is there any parameter that makes fileName = tkinter.filedialog.askopenfilenames() return the path of the .lnk file itself (not the path of the file it points to)? Or is there any other way to achieve the same thing?
Thanks for your answers!
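Separately from the .lnk behaviour, note that askopenfilenames() returns a tuple of paths, so os.path.getmtime has to be called once per path. A minimal sketch of that timestamp step follows; the helper name modification_times is hypothetical, and recording None for a path that does not exist is an assumption about the desired behaviour. It does not change what the dialog returns for shortcuts.

```python
import datetime
import os


def modification_times(paths):
    """Map each path to its last-modified time (hypothetical helper).

    A path whose file is missing maps to None instead of raising,
    which is an assumption about how missing targets should be handled.
    """
    times = {}
    for path in paths:
        try:
            stamp = os.path.getmtime(path)
            times[path] = datetime.datetime.fromtimestamp(stamp)
        except FileNotFoundError:
            times[path] = None
    return times
```

The tuple returned by the dialog can be passed in directly, for example modification_times(tkinter.filedialog.askopenfilenames()).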

Get zip file from url with python3 request : make it more verbose

I tried this to download a zip file from a URL:
import requests
resp = requests.get('https://nlp.stanford.edu/data/glove.6B.zip')
I know the file is colossal, and while it downloads I can't tell whether everything is going well or not.
(1) Is there a way to make the download more verbose?
(2) How do I know where the data is saved, and is there a relative path for it that I can use in the rest of my script?
(3) How do I unzip it cleanly?
(4) How do I either set a file name or get the file name of the downloaded file?

Is there a way to make the loading more verbose?
If you want to download a file to disk and track how many bytes have been downloaded so far, you can use urllib.request.urlretrieve from the built-in urllib.request module. It accepts an optional reporthook: a function taking three arguments, which is called once when the network connection is established and once after each block is read, with:
the number of blocks transferred so far
the block size in bytes
the total size of the file, or -1 if unknown
A simple example that prints progress to stdout as a fraction:
from urllib.request import urlretrieve
def report(num, size, total):
    print(num*size, '/', total)
urlretrieve("http://www.example.com", "index.html", reporthook=report)
This downloads www.example.com to the current working directory as index.html, reporting progress by printing. Note that the fraction might be > 1 (the last block is usually smaller than the block size) and should be treated as an estimate.
EDIT: Once the zip file has finished downloading, if you just want to unpack the whole archive you can use shutil.unpack_archive from the built-in shutil module. If finer-grained control is desired, you can use the built-in zipfile module; the PyMOTW3 entry for zipfile has examples such as listing the files inside a ZIP archive, reading a selected file from it, and reading the metadata of a file inside it.
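Putting the pieces together, here is a hedged sketch that covers the four points of the question: progress reporting, a file name and location you choose yourself, and unpacking. The function name download_and_unpack and its parameters are illustrative assumptions; only urlretrieve, zipfile.ZipFile and shutil.unpack_archive come from the standard library.

```python
import shutil
import zipfile
from urllib.request import urlretrieve


def report(block_num, block_size, total_size):
    """Progress hook for urlretrieve: prints a capped percentage."""
    done = block_num * block_size
    if total_size > 0:
        percent = min(100.0, 100.0 * done / total_size)
        print(f"{percent:5.1f}% of {total_size} bytes")
    else:
        # The server did not report a size.
        print(f"{done} bytes downloaded (total size unknown)")


def download_and_unpack(url, archive_path, extract_dir):
    """Download url to archive_path, unpack it, return the member names.

    Hypothetical helper: archive_path is where the zip is saved
    (relative paths are resolved against the working directory).
    """
    local_path, _headers = urlretrieve(url, str(archive_path), reporthook=report)
    with zipfile.ZipFile(local_path) as zf:
        names = zf.namelist()
    shutil.unpack_archive(local_path, str(extract_dir))
    return names
```

For the file in the question this would be called as download_and_unpack('https://nlp.stanford.edu/data/glove.6B.zip', 'glove.6B.zip', 'glove'), which saves the archive next to the script's working directory and extracts it into a glove folder.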

Why Does a Strange File Show Up in Directory When Using os.walk()?

The project is written in Pycharm on Windows 10.
I wrote a program that grabs .docx files from a directory and searches for information. At the end of the list of file names I get this file: "~$640188.docx"
I get this error when it hits this file:
raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file
This error happens when I try to pass the file '~$640188.docx' to docx2txt.process:
text = docx2txt.process(r'C:\path\to\folder\~$640188.docx')
From what I can see, this file does not exist in the directory I'm searching nor anywhere on my computer. The other strange part is that yesterday I wasn't getting this error.
I know there are sometimes "hidden" files in directories and I ran into those before on my mac (specifically '.DS_Store') but this is a .docx file.
I currently have an ugly solution, which says "don't run the code if you run into '~$640188.docx'". My concern is that this will become more of a problem when I dump 11000 files into the directory.
Where does this file come from?
Below is the code for reference
import docx2txt
import os
check_files = []
for dir, subdir, files in os.walk(r'C:\path\to\folder'):
    for file in files:
        check_files.append(file)
for file in check_files:
    print("file: {0}".format(file))
    text = docx2txt.process(r'C:\path\to\folder\{0}'.format(file))

Hidden .docx files starting with ~$ are simply temporary files created by Word while a file is actively open and being edited. The first two characters of the respective parent file's name are replaced with ~$. They are usually deleted once you save and close a document, but sometimes they manage to stick around after you quit anyway. Since they are designed to be temporary complements to a proper .docx file, they do not necessarily have the correct zip package structure at all times.
You will do well to skip those. Checking whether the file name starts with '~' should be good enough. Just add the following filtering:
check_files2 = [fl for fl in check_files if not fl.startswith('~')]
for file in check_files2:
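As a sketch of the same idea applied while walking the tree, the generator below skips Word's temporary files and yields full paths, so files in subfolders are also found correctly. The name iter_docx_files is hypothetical, and testing for the '~$' prefix rather than '~' alone is a slightly stricter choice than the filter above.

```python
import os


def iter_docx_files(top):
    """Yield full paths of .docx files under top, skipping Word temp files.

    Hypothetical helper: names starting with '~$' are treated as
    Word's temporary owner files and ignored.
    """
    for root, _dirs, files in os.walk(top):
        for name in files:
            if name.startswith("~$"):
                continue
            if name.lower().endswith(".docx"):
                # Join with root so files in subfolders get the right path.
                yield os.path.join(root, name)
```

Each yielded path can then be passed to docx2txt.process as in the question.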

Create a SFX archive using python

I am looking for some help with a Python script to create a self-extracting archive (SFX), basically an exe file of the kind WinRAR can create.
I want to archive a folder with password protection and also split it into volumes of 3900 MB so that it can be easily burned to a disk.
I know WinRAR has command-line parameters to create an archive, but I am not sure how to call it from Python. Any help on this would be appreciated.
Here are the main things I want:
Archive Format - RAR
Compression Method - Normal
Split Volume Size - 3900 MB
Password protection
I have looked everywhere but can't seem to find anything about this functionality.

You could have a look at rarfile
Alternatively use something like:
from subprocess import call
cmdlineargs = "command -switch1 -switchN archive files.. path_to_extract"
call(["WinRAR"] + cmdlineargs.split())
Note that in the second line you will need to use the correct command-line arguments; the ones above are just an example.
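For the requirements listed in the question, the argument list might look like the sketch below. The switches (a, -sfx, -m3, -v, -hp, -r) are taken from the RAR command-line manual as I recall it and should be checked against your installed version. The executable name "WinRAR" on PATH, the helper names and the default volume size are assumptions.

```python
import subprocess


def build_sfx_command(source_dir, archive_name, password,
                      volume_mb=3900, exe="WinRAR"):
    """Build (but do not run) a RAR command line for a split SFX archive.

    Hypothetical helper; verify every switch against your RAR manual.
    """
    return [
        exe,
        "a",                # add files to an archive
        "-sfx",             # make the archive self-extracting
        "-m3",              # normal compression level
        f"-v{volume_mb}m",  # split into volumes of this many megabytes
        f"-hp{password}",   # encrypt both file data and headers
        "-r",               # recurse into subfolders
        archive_name,
        source_dir,
    ]


def create_sfx(source_dir, archive_name, password):
    """Run the command; raises CalledProcessError on a non-zero exit."""
    cmd = build_sfx_command(source_dir, archive_name, password)
    return subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids the quoting problems that splitting a single string can cause when a path or password contains spaces.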

Apple Automator process csv files and create new files

Is it possible to loop through a set of selected files, process each, and save the output as new files using Apple Automator?
I have a collection of .xls files, and I've gotten Automator to
- Ask for Finder Items
- Open Finder Items
- Convert Format of Excel Files #save each .xls file to a .csv
I've written a python script that accepts a filename as an argument, processes it, and saves it as p_filename in the directory the script's being run from. I'm trying to use Run Shell Script with the /usr/bin/python shell and my python script pasted in.
Some things don't translate too well, though, especially since I'm not sure how it deals with python's open('filename','w') command. It probably doesn't have permissions to create new files, or I'm entering the command incorrectly. I had the idea to instead output the processed file as text, capture it with Automator, and then save it to a new file.
To do so, I tried to use New Text File, but I can't get it to create a new text file for each file selected back in the beginning. Is it possible to loop through all the selected Finder Items?

Why do you want this done in the folder of the script? Or do you mean the folder of the files you are getting from the Finder items? In that case just get the path for each file passed into Python.
When you run open('filename','w') you should thus pass in a full pathname. Probably what's happening is you are actually writing to the root directory rather than where you think you are.
Assuming you are passing your files to the shell command in Automator as arguments then you might have the following:
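import sys, os
args = sys.argv[1:]
for a in args:
    p = os.path.dirname(a)
    # Output name is an assumption based on the question's "p_filename"
    name = "p_" + os.path.basename(a)
    mypath = os.path.join(p, name)
    f = open(mypath, "w")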
