I need to extract a single file (~10 kB) from many very large RAR files (>1 GB). The code below shows a basic implementation of how I'm doing this.
from rarfile import RarFile
rar_file='D:\\File.rar'
file_of_interest='Folder 1/Subfolder 2/File.dat'
output_folder='D:/Output'
rardata = RarFile(rar_file)
rardata.extract(file_of_interest, output_folder)
rardata.close()
However, the extract instruction is returning the following error: rarfile.BadRarFile: Failed the read enough data: req=16384 got=52
When I open the file using WinRAR, I can extract the file successfully, so I'm sure the file isn't corrupted.
I've found some similar questions, but not a definite answer that worked for me.
Can someone help me to solve this error?
Additional info:
Windows 10 build 1909
Spyder 5.0.0
Python 3.8.1
Complete traceback of the error:
Traceback (most recent call last):
File "D:\Test\teste_rar_2.py", line 27, in <module>
rardata.extract(file_of_interest, output_folder)
File "C:\Users\bernard.kusel\AppData\Local\Continuum\anaconda3\lib\site-packages\rarfile.py", line 826, in extract
return self._extract_one(inf, path, pwd, True)
File "C:\Users\bernard.kusel\AppData\Local\Continuum\anaconda3\lib\site-packages\rarfile.py", line 912, in _extract_one
return self._make_file(info, dstfn, pwd, set_attrs)
File "C:\Users\bernard.kusel\AppData\Local\Continuum\anaconda3\lib\site-packages\rarfile.py", line 927, in _make_file
shutil.copyfileobj(src, dst)
File "C:\Users\bernard.kusel\AppData\Local\Continuum\anaconda3\lib\shutil.py", line 79, in copyfileobj
buf = fsrc.read(length)
File "C:\Users\bernard.kusel\AppData\Local\Continuum\anaconda3\lib\site-packages\rarfile.py", line 2197, in read
raise BadRarFile("Failed the read enough data: req=%d got=%d" % (orig, len(data)))
BadRarFile: Failed the read enough data: req=16384 got=52
I have read through the Python documentation about zip files and watched a couple of videos, but nothing worked. I'm using Kali Linux, so the password has to be encoded as bytes.
Here is my code, with which I have tried:
import zipfile
import string
import traceback
def try_function(zip, pwd):
    try:
        zip.extractall(pwd=pwd.encode())
        print("Yes")
    except TypeError:
        print("No")
z = zipfile.ZipFile("test.txt.zip")
pwd_local = "abc"
if __name__ == '__main__':
    try_function(z, pwd_local)
But I always get the same error:
Traceback (most recent call last):
File "ZipWorker.py", line 22, in <module>
try_function(z, pwd_list)
File "ZipWorker.py", line 11, in crack
zip.extractall(pwd.encode())
File "/usr/lib/python3.9/zipfile.py", line 1633, in extractall
self._extract_member(zipinfo, path, pwd)
File "/usr/lib/python3.9/zipfile.py", line 1686, in _extract_member
with self.open(member, pwd=pwd) as source, \
File "/usr/lib/python3.9/zipfile.py", line 1559, in open
return ZipExtFile(zef_file, mode, zinfo, pwd, True)
File "/usr/lib/python3.9/zipfile.py", line 797, in __init__
self._decompressor = _get_decompressor(self._compress_type)
File "/usr/lib/python3.9/zipfile.py", line 698, in _get_decompressor
_check_compression(compress_type)
File "/usr/lib/python3.9/zipfile.py", line 678, in _check_compression
raise NotImplementedError("That compression method is not supported")
NotImplementedError: That compression method is not supported
Does anyone know how to do this? I'm using python3.9.
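I finally found out why the code above doesn't work.
When you create a password-protected zip file with, for example, 7-Zip, its contents are encrypted with one of two methods: AES-256 or ZipCrypto.
The problem isn't the bytes encoding of the password. Python's zipfile module can only decrypt ZipCrypto, so an archive encrypted with AES-256 fails with the NotImplementedError above.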
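One way to confirm this before trying any password is to look at the compression method recorded for each entry. The sketch below assumes the archive uses WinZip-style AES encryption, which stores method 99 in the entry headers; the helper names is_supported and unsupported_members are made up for this example and are not part of zipfile.

```python
import zipfile

# Compression methods the standard zipfile module can read
# (bzip2 and LZMA additionally need the bz2 / lzma modules).
SUPPORTED_METHODS = {
    zipfile.ZIP_STORED,
    zipfile.ZIP_DEFLATED,
    zipfile.ZIP_BZIP2,
    zipfile.ZIP_LZMA,
}

# Method 99 marks a WinZip-style AES-encrypted entry.
AES_METHOD = 99


def is_supported(compress_type):
    """Return True if zipfile knows how to decompress this method."""
    return compress_type in SUPPORTED_METHODS


def unsupported_members(archive):
    """List (name, method) pairs for entries zipfile cannot extract."""
    return [
        (info.filename, info.compress_type)
        for info in archive.infolist()
        if not is_supported(info.compress_type)
    ]
```

Calling unsupported_members(zipfile.ZipFile("test.txt.zip")) only reads the archive's directory, so it needs no password. If it reports method 99, re-creating the archive with ZipCrypto selected in 7-Zip should let the original extractall call run.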
I am trying to run this tensorflow example: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/skflow/text_classification_character_cnn.py
However it keeps failing at the stage to open the tar file. This is the error message I am getting:
Successfully downloaded dbpedia_csv.tar.gz 1613 bytes.
Traceback (most recent call last):
File "text_classification_character_cnn.py", line 110, in <module>
tf.app.run()
File "/Users/alechewitt/Envs/solar_detection/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "text_classification_character_cnn.py", line 87, in main
'dbpedia', test_with_fake_data=FLAGS.test_with_fake_data, size='large')
File "/Users/alechewitt/Envs/solar_detection/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/__init__.py", line 64, in load_dataset
return DATASETS[name](size, test_with_fake_data)
File "/Users/alechewitt/Envs/solar_detection/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/text_datasets.py", line 48, in load_dbpedia
maybe_download_dbpedia(data_dir)
File "/Users/alechewitt/Envs/solar_detection/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/text_datasets.py", line 40, in maybe_download_dbpedia
tfile = tarfile.open(archive_path, 'r:*')
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 1672, in open
raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully
Any help would be much appreciated
When you get that error, open the downloaded dbpedia_csv.tar.gz in a text editor; you might find that it is actually a 404 web page. The file you want also seems to be available here:
https://drive.google.com/drive/folders/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M
Download that file (at your own risk) and replace it manually. Then you can run your script again.
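If you would rather check the download from Python than in a text editor, something like the following sketch can tell a real archive from an error page. The function names are invented for this example; tarfile.is_tarfile is the standard-library call doing the real work, and the gzip check only looks at the two magic bytes at the start of the file.

```python
import tarfile


def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == b"\x1f\x8b"


def describe_download(path):
    """Classify a downloaded file as 'tar', 'gzip' or 'not an archive'."""
    if tarfile.is_tarfile(path):
        # Covers plain tar as well as tar.gz / tar.bz2 / tar.xz.
        return "tar"
    if looks_like_gzip(path):
        # Compressed, but the payload is not a tar archive.
        return "gzip"
    return "not an archive"
```

A saved 404 page would come back as "not an archive", which is the case described above.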
Open the tar file using the full path. BTW the link you gave is 404 not found.
I've just installed pychecker on Windows 7 Pro using "python setup.py install". When I run it on my script using the command:
c:\Python26\Scripts\pychecker -#100 finaltest17.py
I get the following error/traceback:
C:\Users\....\ToBeReleased>C:\Python26\python.exe C:\Python26\Lib\site-packages\pychecker\checker.py -#100 finaltest17.py
Processing module finaltest17 (finaltest17.py)...
Caught exception importing module finaltest17:
File "C:\Python26\Lib\site-packages\pychecker\pcmodules.py", line 533, in setupMainCode()
self.moduleName, self.moduleDir)
File "C:\Python26\Lib\site-packages\pychecker\pychecker\utils.py", line 184, in findModule()
handle, filename, smt = _q_find_module(p, path)
File "C:\Python26\Lib\site-packages\pychecker\pychecker\utils.py", line 162, in _q_find_module()
if not cfg().quixote:
File "C:\Python26\Lib\site-packages\pychecker\pychecker\utils.py", line 39, in cfg()
return _cfg[-1]
IndexError: list index out of range
Traceback (most recent call last):
File "C:\Python26\Lib\site-packages\pychecker\checker.py", line 364, in <module>
sys.exit(main(sys.argv))
File "C:\Python26\Lib\site-packages\pychecker\checker.py", line 337, in main
importWarnings = processFiles(files, _cfg, _print_processing)
File "C:\Python26\Lib\site-packages\pychecker\checker.py", line 270, in processFiles
loaded = pcmodule.load()
File "C:\Python26\Lib\site-packages\pychecker\pcmodules.py", line 477, in load
return utils.cfg().ignoreImportErrors
File "C:\Python26\Lib\site-packages\pychecker\pychecker\utils.py", line 39, in cfg
return _cfg[-1]
IndexError: list index out of range
If anyone could point me in the right direction that would be great.
Thanks
Stewart
Problem resolved.
I found the following support request on SourceForge which refers to a need to use short format (8.3) path and filenames in pychecker.bat and not long format as is allowed in newer versions of Windows.
https://sourceforge.net/p/pychecker/support-requests/7/#96cb
I have a simple task but cannot make my code work. I want to loop over the URLs listed in my text file and download each one using the wget module in Python. Each URL is on a separate line in the text file.
Basically, this is the structure of the list in my textfile:
http://e4ftl01.cr.usgs.gov//MODIS_Composites/MOLT/MOD11C3.005/2000.03.01/MOD11C3.A2000061.005.2007177231646.hdf
http://e4ftl01.cr.usgs.gov//MODIS_Composites/MOLT/MOD11C3.005/2014.12.01/MOD11C3.A2014335.005.2015005235231.hdf
There are about 178 URLs in total. Each file should be saved in the current working directory.
Below is the initial code that I am working:
import os, fileinput, urllib2 as url, wget
os.chdir("E:/Test/dwnld")
for line in fileinput.FileInput("E:/Test/dwnld/data.txt"):
    print line
    openurl = wget.download(line)
The error message is:
Traceback (most recent call last):
File "E:\Python_scripts\General_purpose\download_url_from_textfile.py", line 5, in <module>
openurl = wget.download(line)
File "C:\Python278\lib\site-packages\wget.py", line 297, in download
(fd, tmpfile) = tempfile.mkstemp(".tmp", prefix=prefix, dir=".")
File "C:\Python278\lib\tempfile.py", line 308, in mkstemp
return _mkstemp_inner(dir, prefix, suffix, flags)
File "C:\Python278\lib\tempfile.py", line 239, in _mkstemp_inner
fd = _os.open(file, flags, 0600)
OSError: [Errno 22] Invalid argument: ".\\MOD11C3.A2000061.005.2007177231646.hdf'\n.frbfrp.tmp"
Try to use urllib.urlretrieve. Check the documentation here: https://docs.python.org/2/library/urllib.html#urllib.urlretrieve
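Note that the file name in the OSError still contains the trailing newline (and a stray quote) from the line that was read, so each line needs cleaning before it is used as a URL. Here is a rough sketch of the whole loop in Python 3, where urlretrieve lives in urllib.request; the question itself uses Python 2. The names read_urls and download_all, and the fetch parameter, are invented for this example.

```python
import os
from urllib.parse import urlsplit
from urllib.request import urlretrieve


def read_urls(list_path):
    """Return the URLs in the file, one per non-empty line."""
    urls = []
    with open(list_path) as fh:
        for line in fh:
            # Drop the newline, then any quotes wrapped around the URL
            # (assumption: the quote seen in the error is not part of it).
            url = line.strip().strip("'\"")
            if url:
                urls.append(url)
    return urls


def download_all(urls, dest_dir=".", fetch=urlretrieve):
    """Download each URL into dest_dir and return the saved paths."""
    saved = []
    for url in urls:
        # Use the last path component of the URL as the local file name.
        name = os.path.basename(urlsplit(url).path)
        target = os.path.join(dest_dir, name)
        fetch(url, target)
        saved.append(target)
    return saved
```

With the paths from the question this would be called as download_all(read_urls("E:/Test/dwnld/data.txt"), "E:/Test/dwnld"). The fetch argument is only there so the loop can be tried without network access.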
I created a file by using:
store = pd.HDFStore('/home/.../data.h5')
and stored some tables using:
store['firstSet'] = df1
store.close()
I closed down python and reopened in a fresh environment.
How do I reopen this file?
When I go:
store = pd.HDFStore('/home/.../data.h5')
I get the following error.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/misc/apps/linux/python-2.6.1/lib/python2.6/site-packages/pandas-0.10.0-py2.6-linux-x86_64.egg/pandas/io/pytables.py", line 207, in __init__
self.open(mode=mode, warn=False)
File "/misc/apps/linux/python-2.6.1/lib/python2.6/site-packages/pandas-0.10.0-py2.6-linux-x86_64.egg/pandas/io/pytables.py", line 302, in open
self.handle = _tables().openFile(self.path, self.mode)
File "/apps/linux/python-2.6.1/lib/python2.6/site-packages/tables/file.py", line 230, in openFile
return File(filename, mode, title, rootUEP, filters, **kwargs)
File "/apps/linux/python-2.6.1/lib/python2.6/site-packages/tables/file.py", line 495, in __init__
self._g_new(filename, mode, **params)
File "hdf5Extension.pyx", line 317, in tables.hdf5Extension.File._g_new (tables/hdf5Extension.c:3039)
tables.exceptions.HDF5ExtError: HDF5 error back trace
File "H5F.c", line 1582, in H5Fopen
unable to open file
File "H5F.c", line 1373, in H5F_open
unable to read superblock
File "H5Fsuper.c", line 334, in H5F_super_read
unable to find file signature
File "H5Fsuper.c", line 155, in H5F_locate_signature
unable to find a valid file signature
End of HDF5 error back trace
Unable to open/create file '/home/.../data.h5'
What am I doing wrong here? Thank you.
In my hands, the following approach works best:
df = pd.DataFrame(...)
# write
with pd.HDFStore('test.h5', mode='w') as store:
    store.append('df', df, data_columns=df.columns, format='table')
# read
with pd.HDFStore('test.h5', mode='r') as newstore:
    df_restored = newstore.select('df')
You could try doing instead:
store = pd.io.pytables.HDFStore('/home/.../data.h5')
df1 = store['firstSet']
or use the read method directly:
df1 = pd.read_hdf('/home/.../data.h5', 'firstSet')
Either way, you should have pandas 0.12.0 or higher...
I had the same problem and finally fixed it by installing the pytables module (next to the pandas modules which I was using):
conda install pytables
which got me numexpr-2.4.3 and pytables-3.2.0
After that it worked. I am using pandas 0.16.2 under python 2.7.9
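Since the traceback in the question ends with "unable to find a valid file signature", it may also be worth checking whether the file on disk is an HDF5 file at all. The sketch below does that by reading the first 8 bytes. The function name is made up for this example, and it only checks offset 0; the HDF5 format also allows the signature to sit after a user block, which this check would miss.

```python
# The 8-byte signature that starts an HDF5 file (without a user block).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(path):
    """Return True if the file begins with the HDF5 signature."""
    with open(path, "rb") as fh:
        return fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

If this returns False for data.h5, the file was probably not written completely or was overwritten, and no pandas or PyTables version will be able to open it.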