Extract csv files from 7z files in Python

I have lots of csv files contained in different 7z files. I want to find specific csv files in those 7z files and save them decompressed in a different directory.
I have tried
import os
import py7zlib

tree = r'Where_the_7zfiles_are_stored'
dst = r'Where_I_want_to_store_the_csvfiles'

for dirpath, dirname, filename in os.walk(tree):
    for myfile in filename:
        if myfile.endswith('2008-01-01_2008-04-30_1.7z'):
            myZip = py7zlib.Archive7z(open(os.path.join(dirpath, myfile), 'rb'))
            csvInZipFile = zip(myZip.filenames, myZip.files)
            for myCsvFileName, myCsvFile in csvInZipFile:
                if '2008-01' in myCsvFileName:
                    with open(os.path.join(dst, myCsvFileName), 'wb') as outfile:
                        outfile.write(myCsvFile.read())
but I get the following error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\'\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
execfile(filename, namespace)
File "C:\Users\'\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
File "C:/Users//'/Documents/Example/unzipfiles.py", line 23, in <module>
outfile.write(myCsvFile.read())
File "C:\Users\'\Anaconda3\lib\site-packages\py7zlib.py", line 576, in read
data = getattr(self, decoder)(coder, data)
File "C:\Users\'\Anaconda3\lib\site-packages\py7zlib.py", line 634, in _read_lzma
return self._read_from_decompressor(coder, dec, input, checkremaining=True, with_cache=True)
File "C:\Users\'\Anaconda3\lib\site-packages\py7zlib.py", line 611, in _read_from_decompressor
tmp = decompressor.decompress(data)
ValueError: data error during decompression
The odd thing is that the method seems to work fine for the first two csv files, and I have no idea how to get to the root of the problem. The data in the csv files do not seem to be different, and manually unpacking the same csv files with IZArc works without a problem. (The problem occurred in both Python 2.7 and 3.4.)
I have also tried to use the lzma module, but there I could not figure out how to retrieve the individual csv files contained in the 7z file.
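py7zlib is old and no longer maintained, so decompression errors like this are hard to work around in place. As a minimal sketch of the same selective-extraction pattern, here is the logic using the stdlib zipfile module (so it runs anywhere); for real .7z archives you would swap in a maintained third-party library such as py7zr, whose SevenZipFile has a similar member-listing API. The paths and the '2008-01' filter are placeholders taken from the question:

```python
import os
import zipfile

def extract_matching(archive_path, dst, substring):
    """Extract only archive members whose name contains `substring` into `dst`."""
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            if substring in name:
                data = zf.read(name)  # decompress one member at a time
                out_path = os.path.join(dst, os.path.basename(name))
                with open(out_path, 'wb') as outfile:
                    outfile.write(data)
```

Extracting member by member like this also narrows down which file triggers a decompression error, since each read happens separately.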

Related

trouble looping xarray dataframe through subdirectories

I am trying to make a big data frame by looping through sub-directories. I want to:
i) read data from all the files (with .nc extension) in the subdirectories,
ii) select a particular chunk of it, and
iii) save it in an output.nc file.
import os
import xarray as xr
import numpy as np

rootdir = '/Users/sm/Desktop/along_track_J2'
data_new = []
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        file_name = os.path.join(subdir, file)
        df = xr.open_dataset(file_name)
        df['longitude'] = ((df.longitude + 180) % 360 - 180).sortby(df.longitude)
        ds = df.where((df.longitude >= -65) & (df.longitude <= -45) & (df.latitude > 55), drop=True)
        data_new.append(ds)
Somehow xarray cannot read the file and I see the following error:
File "", line 1, in
runfile('/Users/sm/Desktop/jason2_processing.py', wdir='/Users/sm/Desktop')
File "/Users/sm/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 668, in runfile
execfile(filename, namespace)
File "/Users/sm/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 108, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/Users/sm/Desktop/jason2_processing.py", line 18, in
df=xr.open_dataset(file_name)
File "/Users/sm/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py", line 320, in open_dataset
**backend_kwargs)
File "/Users/sm/anaconda3/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 331, in open
ds = opener()
File "/Users/sm/anaconda3/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 230, in _open_netcdf4_group
ds = nc4.Dataset(filename, mode=mode, **kwargs)
File "netCDF4/_netCDF4.pyx", line 2123, in netCDF4._netCDF4.Dataset.init
File "netCDF4/_netCDF4.pyx", line 1743, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -51] NetCDF: Unknown file format: b'/Users/sm/Desktop/along_track_J2/.DS_Store'
Can anyone please help me with this. Thank you in advance.
OSError: [Errno -51] NetCDF: Unknown file format: b'/Users/sm/Desktop/along_track_J2/.DS_Store'
You are currently looping through all files, NetCDF and other (system) files alike. .DS_Store is a file created by macOS and isn't a NetCDF file. If you only want to process NetCDF files, something like this should work:
...
for file in files:
    if file.split('.')[-1] == 'nc':
        file_name = os.path.join(subdir, file)
        df = xr.open_dataset(file_name)
        ...
if file.split('.')[-1] == 'nc': (the only line I added) checks whether the file extension is .nc, and ignores other files.
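Equivalently, you can skip the os.walk bookkeeping and collect only the NetCDF files up front with pathlib, so system files like .DS_Store never enter the loop at all. This is just a sketch of the filtering step; the xarray processing stays as in the question:

```python
from pathlib import Path

def find_nc_files(rootdir):
    """Recursively collect .nc files under rootdir, sorted for a stable order."""
    return sorted(Path(rootdir).rglob('*.nc'))
```

Then the main loop becomes `for file_name in find_nc_files(rootdir): df = xr.open_dataset(file_name)` and so on.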

Tablib xlsx file badZip file issue

I am getting an error when opening an .xlsx file on Windows 8 using the tablib library.
python version - 2.7.14
The error is as follows:
python suit_simple_sheet_product.py
Traceback (most recent call last):
File "suit_simple_sheet_product.py", line 19, in <module>
data = tablib.Dataset().load(open(BASE_PATH).read())
File "C:\Python27\lib\site-packages\tablib\core.py", line 446, in load
format = detect_format(in_stream)
File "C:\Python27\lib\site-packages\tablib\core.py", line 1157, in detect_format
if fmt.detect(stream):
File "C:\Python27\lib\site-packages\tablib\formats\_xls.py", line 25, in detect
xlrd.open_workbook(file_contents=stream)
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 120, in open_workbook
zf = zipfile.ZipFile(timemachine.BYTES_IO(file_contents))
File "C:\Python27\lib\zipfile.py", line 770, in __init__
self._RealGetContents()
File "C:\Python27\lib\zipfile.py", line 811, in _RealGetContents
raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file
The path is as follows:
BASE_PATH = 'C:\Users\anju\Downloads\automate\catalog-5090 fabric detail and price list.xlsx'
Excel .xlsx files are actually zip files. In order for the unzip to work correctly, the file must be opened in binary mode, so you need to open the file using:
import tablib
BASE_PATH = r'c:\my folder\my_test.xlsx'
data = tablib.Dataset().load(open(BASE_PATH, 'rb').read())
print data
Add r before your string to stop Python from trying to interpret the backslash characters in your path.
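Since an .xlsx file is just a zip container, a quick way to rule out actual file corruption (as opposed to the text-vs-binary mode problem above) is the stdlib zipfile.is_zipfile check. A small sketch, with a hypothetical helper name:

```python
import zipfile

def looks_like_xlsx(path):
    """Return True if the file is a zip container, as every valid .xlsx is."""
    return zipfile.is_zipfile(path)
```

If this returns False for your file, the problem is the file itself (e.g. a renamed .xls or a truncated download), not how you open it.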

How to turn a comma-separated value TXT into a CSV for machine learning

How do I turn this format of TXT file into a CSV file?
Date,Open,high,low,close
1/1/2017,1,2,1,2
1/2/2017,2,3,2,3
1/3/2017,3,4,3,4
As you can see, it already has comma-separated values.
I tried using numpy.
>>> import numpy as np
>>> table = np.genfromtxt("171028 A.txt", comments="%")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\npyio.py", line 1551, in genfromtxt
fhd = iter(np.lib._datasource.open(fname, 'rb'))
File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\_datasource.py", line 151, in open
return ds.open(path, mode)
File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\_datasource.py", line 501, in open
raise IOError("%s not found." % path)
OSError: 171028 A.txt not found.
I have (S&P) 500 txt files to do this with.
You can use the csv module; see the csv module documentation for more information.
import csv

txt_file = 'mytext.txt'
csv_file = 'mycsv.csv'

with open(txt_file, 'r') as fin, open(csv_file, 'w') as fout:
    in_txt = csv.reader(fin, delimiter=',')
    out_csv = csv.writer(fout)
    out_csv.writerows(in_txt)
Per @dclarke's comment, check the directory from which you run the code. As you coded the call, the file must be in that directory. When I have it there, the code runs without error (although the resulting table is a single line with four nan values). When I move the file elsewhere, I reproduce your error quite nicely.
Either move the file to be local, add a local link to the file, or change the file name in your program to use the proper path to the file (either relative or absolute).
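Since the asker has roughly 500 such files, the single-file recipe generalizes to a directory sweep. A sketch, where src_dir and dst_dir are hypothetical placeholder directories (the newline='' arguments follow the Python 3 csv module's recommendation):

```python
import csv
from pathlib import Path

def convert_all(src_dir, dst_dir):
    """Rewrite every comma-separated .txt in src_dir as a .csv in dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for txt in sorted(Path(src_dir).glob('*.txt')):
        with open(txt, newline='') as fin, \
             open(dst / (txt.stem + '.csv'), 'w', newline='') as fout:
            csv.writer(fout).writerows(csv.reader(fin))
```

Because the input is already comma-separated, this is effectively a copy with a new extension, but round-tripping through csv.reader/csv.writer also normalizes quoting and line endings.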

Error using uic to convert .ui file to .py file in Python

I am trying to write a program in python that will convert a .ui file in the same folder (created in Qt Designer) into a .py file. This is the code for this extremely basic program:
# -*- coding: utf-8 -*-
from PyQt4 import uic

with open('exampleinterface.py', 'w') as fout:
    uic.compileUi('exampleinterface.ui', fout)
It gives the following error (with long path names shortened):
Traceback (most recent call last):
File "", line 1, in
File "...\Python32_3.5\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
execfile(filename, namespace)
File "...\Python32_3.5\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 88, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
File ".../Documents/Python/UiToPy/minimalconvert.py", line 11, in
uic.compileUi('exampleinterface.ui', fout)
File "...\Python32_3.5\lib\site-packages\PyQt4\uic__init__.py", line 173, in compileUi
winfo = compiler.UICompiler().compileUi(uifile, pyfile, from_imports, resource_suffix)
File "...\Python32_3.5\lib\site-packages\PyQt4\uic\Compiler\compiler.py", line 140, in compileUi
w = self.parse(input_stream, resource_suffix)
File "...\Python32_3.5\lib\site-packages\PyQt4\uic\uiparser.py", line 974, in parse
document = parse(filename)
File "...\Python32_3.5\lib\xml\etree\ElementTree.py", line 1182, in parse
tree.parse(source, parser)
File "...\Python32_3.5\lib\xml\etree\ElementTree.py", line 594, in parse
self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 1
Can anyone tell me why this isn't working, and if there is a solution?
Note: I know that there are other ways to convert a .ui file into a .py file, but I am looking for one that I can easily integrate into a python program, without calling an outside file.
This error popped up because my .ui file had not been saved with the recent changes I had made: the file name was showing an asterisk (*) mark. Once I saved the file with the changes, it could be converted into a .py file.
Thanks to ekhumoro and mwormser. The problem was indeed the .ui file.
I retried it with a new .ui file and everything worked fine.
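Since compileUi fails inside ElementTree whenever the .ui file is not valid XML, one way to diagnose this class of error up front is to parse the file yourself and report a cleaner message. A minimal sketch with a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

def check_ui_file(path):
    """Return None if the .ui file parses as XML, else the parse error text."""
    try:
        ET.parse(path)
        return None
    except ET.ParseError as exc:
        return str(exc)
```

Running this on the offending .ui file before calling uic.compileUi tells you immediately whether the problem is the file's contents rather than your conversion code.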

python memory error when loading MNIST.pkl.gz

I am new to Python and I have downloaded the code DBN.py, but there is a problem: when I try to load the dataset mnist.pkl.gz, there is always a memory error.
my code is very simple:
import cPickle, gzip, numpy
# Load the dataset
f = gzip.open('C:\Users\MAC\Desktop\mnist.pkl.gz', 'rb')
train_set, valid_set, test_set = cPickle.load(f)
f.close()
and the error is as follows:
Traceback (most recent call last):
File "<ipython-input-17-528eea6bbfdd>", line 1, in <module>
runfile('C:/Users/MAC/Documents/Python Scripts/untitled0.py', wdir='C:/Users/MAC/Documents/Python Scripts')
File "C:\Users\MAC\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
execfile(filename, namespace)
File "C:\Users\MAC\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Users/MAC/Documents/Python Scripts/untitled0.py", line 19, in <module>
train_set, valid_set, test_set = cPickle.load(f)
File "C:\Users\MAC\Anaconda\lib\gzip.py", line 268, in read
self._read(readsize)
File "C:\Users\MAC\Anaconda\lib\gzip.py", line 320, in _read
self._add_read_data( uncompress )
File "C:\Users\MAC\Anaconda\lib\gzip.py", line 338, in _add_read_data
self.extrabuf = self.extrabuf[offset:] + data
MemoryError
I really have no idea; is it because the memory of my computer is too small? It is on Windows 7, 32 bits.
I suspect the problem to be Spyder in this case.
As to why, I have no idea, but either the process isn't allowed to allocate enough memory outside of its own script or it simply gets stuck in a loop somehow.
Try running your code without Spyder: paste it into myscript.py for instance, open a terminal, navigate to the folder where you saved the script, and run python myscript.py to see whether that works or gives the same output.
This is based on a conversation in the comments above.
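For reference, the Python 3 equivalent of the loading code uses pickle with a context manager (cPickle is Python 2 only). This is a sketch; the path is a placeholder, and the encoding='latin1' argument is commonly needed because the MNIST pickle was written by Python 2:

```python
import gzip
import pickle

def load_gzipped_pickle(path):
    """Load a gzip-compressed pickle, streaming through the file object."""
    with gzip.open(path, 'rb') as f:
        # encoding='latin1' lets Python 3 read pickles written by Python 2
        return pickle.load(f, encoding='latin1')
```

On a 32-bit Python the process address space is capped at 2-4 GB regardless, so if the unpickled arrays do not fit, only a 64-bit interpreter (or loading the splits separately) will help.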
