Read xlsx file as dataframe inside .rar pack in python directly

Read xlsx file as dataframe inside .rar pack in python directly - python

I need help to read xlsx file present inside rar pack. I am using below code, however get an error. Is there any better way to read/extract file?
rar = glob.glob(INPATH + "*xyz*.rar*")
rf = rarfile.RarFile(rar[0])
for f in rf.infolist():
print(f.filename, f.file_size)
df = pd.read_excel(rf.read(f))
rarfile.RarCannotExec: Cannot find working tool

According to the brief PyPI docs, you need unrar installed and on your PATH in order for the module to work. It does not implement the RAR unpacking algorithm itself.
(Presumably you need rar as well, for creating archives.)

Related

Is it possible to change save path of a file saved by an external library?

I use a library in python called pyansys in which I use a method called save_as_vtk.
There it is: documentation
This method generates a file for me and saves it to my working directory. I would like that file to be saved elsewhere... I don't want it moved because sometimes it is 20+ Gb and it would take too long.
Anybody has an idea?
Thank you!

I'm the maintainer of the pyansys package.
This was answered in https://github.com/akaszynski/pyansys/issues/219
Repeated here:
It appears that ResultFile.save_as_vtk already has a filename parameter:
def save_as_vtk(self, filename, rsets=None, result_types=['ENS']):
"""Writes results to a vtk readable file.
The file extension will select the type of writer to use.
``'.vtk'`` will use the legacy writer, while ``'.vtu'`` will
select the VTK XML writer.
Parameters
----------
filename : str
Filename of grid to be written. The file extension will
select the type of writer to use. ``'.vtk'`` will use the
legacy writer, while ``'.vtu'`` will select the VTK XML
writer.

Why some .xlsx files are successfully open with openpyxl, but others cannot be open?

I am using Python 2.7 & openpyxl==2.5.11.
I want to read .xlsx files using openpyxl. Initially I was testing with files downloaded from Google Drive and everything worked nice.
Now, I tried to load some files generated with Microsoft Excel, but I see this error:
raise IOError("File contains no valid workbook part")
I tried to print some variables and figure out on my own, but I lack deeper knowledge about Excel files and there are some levels of abstraction I couldn't quickly understand.
Here is the relevant piece of code where the error is raised (excel.py):
def _find_workbook_part(package):
workbook_types = [XLTM, XLTX, XLSM, XLSX]
for ct in workbook_types:
part = package.find(ct)
if part:
return part
# some applications reassign the default for application/xml
defaults = set((p.ContentType for p in package.Default))
workbook_type = defaults & set(workbook_types)
if workbook_type:
return Override("/" + ARC_WORKBOOK, workbook_type.pop())
raise IOError("File contains no valid workbook part")
I have the problem on both OSX & Ubuntu, if that's relevant.
EDIT:
I am not capable to reproduce the issue with files I have generated on my own. I think the problem can only be reproduced with older files. The people who had the issue were using Excel 2008 or older version to create the files, so maybe that's the problem?
Thanks in advance

this worked for me
df = pd.read_excel("Set.xlsb", sheet_name='Dataset', engine='pyxlsb')
Credits to the answer here: Using Pandas with XLSB File

probably because you didn't save those .xlsx correctly, such as simply change the files suffix to xlsx.
Here's a solution that works for me:
re-save the fail-to-open files to .xlsx, even if those files already have .xlsx suffix.
if you don't know which of the files need to be resaved, start with the files with unusual large size.

Zip list of files python

I have created some csv files in my code and I would like to zip them as one folder to be sent by e-mail. I already have the e-mail function but the problem is to zip.
I tried to use this: here I am not extracting or find the files in a directory. I am creating the program the csv files and making a list of it.
My list of files is like this:
lista_files = [12.csv,13.csv,14.csv]
It seems to be easy for developers but as a beginning it is hard. I would really appreciate if someone can help me.

I believe you're looking for the zipfile library. And given that you're looking at a list of filenames, I'd just iterate using a for loop. If you have directories listed as well, you could use os.walk.
import zipfile
lista_files = ["12.csv","13.csv","14.csv"]
with zipfile.ZipFile('out.zip', 'w') as zipMe:
for file in lista_files:
zipMe.write(file, compress_type=zipfile.ZIP_DEFLATED)

Create a SFX archive using python

I am looking for some help with python script to create a Self Extracting Archive (SFX) an exe file which can be created by WinRar basically.
I would want to archive a folder with password protection and also split volume by 3900 MB so that it can be easily burned to a disk.
I know WinRar has command line parameters to create a archive, but i am not sure how to call it via python anyhelp on this would be of great help.
Here are main things I want:
Archive Format - RAR
Compression Method Normal
Split Volume size, 3900 MB
Password protection
I looked up everywhere but don't seem to find anything around this functionality.

You could have a look at rarfile
Alternatively use something like:
from subprocess import call
cmdlineargs = "command -switch1 -switchN archive files.. path_to_extract"
call(["WinRAR"] + cmdlineargs.split())
Note in the second line you will need to use the correct command line arguments, the ones above are just as an example.

Unzipping a Zip File in Django

I'm trying to unzip a zip file in Django using the zipfile library.
This is my code:
if formtoaddmodel.is_valid():
content = request.FILES['content']
unzipped = zipfile.ZipFile(content)
print unzipped.namelist()
for libitem in unzipped.namelist():
filecontent = file(libitem,'wb').write(unzipped.read(libitem))
This is the output of print unzipped.namelist()
['FileName1.jpg', 'FileName2.png', '__MACOSX/', '__MACOSX/._FileName2.png']
Im wondering what the last two items are -- it looks like the path. I don't care about there -- so how is there a way to filter them out?

https://superuser.com/questions/104500/what-is-macosx-folder
if libitem.startswith('__MACOSX/'):
continue

Those files are tags added by the zip utility on MACS. You can assume the name starts with '__MACOSX/'
link

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Read xlsx file as dataframe inside .rar pack in python directly - python

According to the brief PyPI docs, you need unrar installed and on your PATH in order for the module to work. It does not implement the RAR unpacking algorithm itself. (Presumably you need rar as well, for creating archives.)

Related

Is it possible to change save path of a file saved by an external library?

Why some .xlsx files are successfully open with openpyxl, but others cannot be open?

Zip list of files python

Create a SFX archive using python

Unzipping a Zip File in Django

Categories

Resources