numpy loading file error - python

I tried to load a .npy file created by NumPy:
import numpy as np
F = np.load('file.npy')
NumPy raises this error:
C:\Miniconda3\lib\site-packages\numpy\lib\npyio.py in load(file, mmap_mode)
    379         N = len(format.MAGIC_PREFIX)
    380         magic = fid.read(N)
--> 381         fid.seek(-N, 1)  # back-up
    382         if magic.startswith(_ZIP_PREFIX):
    383             # zip-file (assume .npz)

OSError: [Errno 22] Invalid argument
Could anyone explain what this means? How can I recover my file?

The file parameter of numpy.load must support the seek method. My guess is that you are operating on a file that is still held open elsewhere, for example by a file object that was opened earlier and never closed:
>>> f = open('test.npy', 'wb')  # file remains open after this line
>>> np.load('test.npy')         # numpy now wants to use the same file,
                                # but cannot apply `seek` to the file opened elsewhere
Traceback (most recent call last):
  File "<pyshell#114>", line 1, in <module>
    np.load('test.npy')
  File "C:\Python27\lib\site-packages\numpy\lib\npyio.py", line 370, in load
    fid.seek(-N, 1) # back-up
IOError: [Errno 22] Invalid argument
Note that I receive the same error you did. If you have an open file object, close it before calling np.load; likewise, make sure a file you wrote with np.save is closed before you try to load it back.
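As a minimal sketch of the fix (file name hypothetical): write the array, close the file, and only then call np.load:

import numpy as np

arr = np.arange(10)

# write and close the file before reading it back
with open('test.npy', 'wb') as f:   # the with-block closes f on exit
    np.save(f, arr)

loaded = np.load('test.npy')        # load() now gets a fresh, seekable file
print(loaded)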

Related

Persisting a Large scipy.sparse.csr_matrix

I have a very large sparse scipy matrix. Attempting to use save_npz resulted in the following error:
>>> sp.save_npz('/projects/BIGmatrix.npz', W)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/numpy/lib/npyio.py", line 716, in _savez
    pickle_kwargs=pickle_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/numpy/lib/format.py", line 597, in write_array
    array.tofile(fp)
OSError: 6257005295 requested and 3283815408 written

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/scipy/sparse/_matrix_io.py", line 78, in save_npz
    np.savez_compressed(file, **arrays_dict)
  File "/usr/local/lib/python3.5/dist-packages/numpy/lib/npyio.py", line 659, in savez_compressed
    _savez(file, args, kwds, True)
  File "/usr/local/lib/python3.5/dist-packages/numpy/lib/npyio.py", line 721, in _savez
    raise IOError("Failed to write to %s: %s" % (tmpfile, exc))
OSError: Failed to write to /projects/BIGmatrix.npzg6ub_z3y-numpy.npy: 6257005295 requested and 3283815408 written
As a fallback I wanted to try persisting it to postgres via psycopg2, but I haven't found a way to iterate over all nonzeros so that I can store them as rows in a table.
What is the best way to handle this task?
Save the arrays that make up the matrix (data, indices, indptr and shape), and recreate the csr_matrix when loading:
from scipy import sparse
import numpy as np

# build a random sparse test matrix
a = np.zeros((1000, 2000))
a[np.random.randint(0, 1000, 100), np.random.randint(0, 2000, 100)] = np.random.randn(100)
b = sparse.csr_matrix(a)

# save the CSR components, then rebuild the matrix from them
np.savez("tmp", data=b.data, indices=b.indices, indptr=b.indptr, shape=np.array(b.shape))
f = np.load("tmp.npz")
b2 = sparse.csr_matrix((f["data"], f["indices"], f["indptr"]), shape=f["shape"])

(b != b2).sum()  # 0 differences means the round-trip succeeded
What seems to happen is this:
When you invoke scipy.sparse.save_npz(), it saves a compressed file by default; to do so, however, it first creates a temporary uncompressed version of the target file, which it then compresses down to the final result. This means that whatever drive you save to needs to be large enough to hold the uncompressed temp file, which in my case was 47 GB.
I re-tried the save in a larger drive and the process completed without incident.
Note: The compression can take quite a long time.
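If you still want the psycopg2 route from the question, the nonzeros can be iterated by converting the matrix to COO format, whose row, col and data arrays are aligned. A minimal sketch (the small random matrix stands in for the real one, and the database insert is only indicated):

from scipy import sparse

# small stand-in for the big matrix
W = sparse.random(1000, 2000, density=0.001, format='csr')

# COO format exposes one aligned (row, col, value) triple per stored nonzero
coo = W.tocoo()
for i, j, v in zip(coo.row, coo.col, coo.data):
    # each (i, j, v) triple could be inserted as one table row,
    # e.g. in batches with psycopg2's cursor.executemany()
    pass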

How to turn a comma separated value TXT into a CSV for machine learning

How do I turn this format of TXT file into a CSV file?
Date,Open,high,low,close
1/1/2017,1,2,1,2
1/2/2017,2,3,2,3
1/3/2017,3,4,3,4
As you can see, it already contains comma-separated values.
I tried using numpy.
>>> import numpy as np
>>> table = np.genfromtxt("171028 A.txt", comments="%")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\npyio.py", line 1551, in genfromtxt
    fhd = iter(np.lib._datasource.open(fname, 'rb'))
  File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\_datasource.py", line 151, in open
    return ds.open(path, mode)
  File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\_datasource.py", line 501, in open
    raise IOError("%s not found." % path)
OSError: 171028 A.txt not found.
I have (S&P) 500 txt files to do this with.
You can use the csv module; see its documentation for more information.
import csv

txt_file = 'mytext.txt'
csv_file = 'mycsv.csv'

# read the comma-separated text and write it back out as a CSV file
with open(txt_file, 'r') as fin, open(csv_file, 'w', newline='') as fout:
    out_csv = csv.writer(fout)
    out_csv.writerows(csv.reader(fin, delimiter=','))
Per @dclarke's comment, check the directory from which you run the code. As you coded the call, the file must be in that directory. When I have it there, the code runs without error (although the resulting table is a single line with four nan values). When I move the file elsewhere, I reproduce your error quite nicely.
Either move the file to be local, add a local link to the file, or change the file name in your program to use the proper path to the file (either relative or absolute).
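Since the question mentions 500 such files, here is a hedged sketch of a batch conversion (directory names are hypothetical):

import csv
import glob
import os

src_dir = 'txt_data'   # hypothetical folder holding the 500 .txt files
dst_dir = 'csv_data'   # hypothetical output folder
os.makedirs(dst_dir, exist_ok=True)

for txt_path in glob.glob(os.path.join(src_dir, '*.txt')):
    name = os.path.splitext(os.path.basename(txt_path))[0] + '.csv'
    with open(txt_path, 'r') as fin, open(os.path.join(dst_dir, name), 'w', newline='') as fout:
        csv.writer(fout).writerows(csv.reader(fin))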

IOError using dataframe.to_csv and dataframe.save to write data to files

I was trying to save a dataframe for later use in pandas.
However, I got the error below.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/source/Linux/pkg/python-2.7.3/lib/python2.7/site-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/series.py", line 2881, in to_csv
    encoding=encoding)
  File "/source/Linux/pkg/python-2.7.3/lib/python2.7/site-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/frame.py", line 1393, in to_csv
    formatter.save()
  File "/source/Linux/pkg/python-2.7.3/lib/python2.7/site-packages/pandas-0.11.0-py2.7-linux-x86_64.egg/pandas/core/format.py", line 963, in save
    f.close()
IOError: [Errno 5] Input/output error
dataframe.save fails even for a simple object a = DataFrame({'a':[1,3,4],'b':[3,4,5]}).
The save method is deprecated; you should use to_pickle instead. It looks like you're using pandas 0.11, which is quite old. The latest version is 0.16.
You could also consider saving it to csv or HDF5.
http://pandas.pydata.org/pandas-docs/stable/io.html
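A minimal sketch of the suggested routes (file names hypothetical; to_pickle/read_pickle require a reasonably recent pandas, and HDF5 needs PyTables installed):

import pandas as pd

a = pd.DataFrame({'a': [1, 3, 4], 'b': [3, 4, 5]})

# pickle round-trip with the non-deprecated API
a.to_pickle('frame.pkl')
b = pd.read_pickle('frame.pkl')

# the alternatives mentioned above
a.to_csv('frame.csv', index=False)
a.to_hdf('frame.h5', key='frame', mode='w')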

Pickle: Reading a dictionary, EOFError

I recently found out about pickle, which is amazing, but it fails when used in my actual script; testing it with a one-item dictionary worked fine. My real script is thousands of lines of code that stores various Maya objects into the dictionary. I don't know whether the size has anything to do with it; I have read a lot of threads here, but none are specific to my error.
I have tried writing with every pickle protocol. No luck.
This is my output code:
output = open('locatorsDump.pkl', 'wb')
pickle.dump(l.locators, output, -1)
output.close()
This is my read code:
jntdump = open('locatorsDump.pkl', 'rb')
test = pickle.load(jntdump)
jntdump.close()
This is the error:
# Error: Error in maya.utils._guiExceptHook:
# File "C:\Program Files\Autodesk\Maya2011\Python\lib\site-packages\pymel-1.0.0-py2.6.egg\maya\utils.py", line 277, in formatGuiException
# exceptionMsg = excLines[-1].split(':',1)[1].strip()
# IndexError: list index out of range
#
# Original exception was:
# Traceback (most recent call last):
# File "<maya console>", line 3, in <module>
# File "C:\Program Files\Autodesk\Maya2011\bin\python26.zip\pickle.py", line 1370, in load
# return Unpickler(file).load()
# File "C:\Program Files\Autodesk\Maya2011\bin\python26.zip\pickle.py", line 858, in load
# dispatch[key](self)
# File "C:\Program Files\Autodesk\Maya2011\bin\python26.zip\pickle.py", line 880, in load_eof
# raise EOFError
# EOFError #
Try using pickle.dumps() and pickle.loads() as a test.
If you don't receive the same error, you know it is related to the file write.
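A minimal sketch of that in-memory test (the small dictionary stands in for l.locators from the question):

import pickle

data = {'locator1': (0.0, 1.0, 2.0)}   # stand-in for l.locators

# round-trip through memory instead of a file
blob = pickle.dumps(data, -1)
restored = pickle.loads(blob)
assert restored == data  # if this passes but the file version fails,
                         # the problem lies in how the file is written or read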

NumPy and memmap: [Errno 24] Too many open files

I am working with large matrices, so I am using NumPy's memmap. However, I am getting an error as apparently the file descriptors used by memmap are not being closed.
import numpy
import tempfile
import os

counter = 0
while True:
    temp_fd, temporary_filename = tempfile.mkstemp(suffix='.memmap')
    map = numpy.memmap(temporary_filename, dtype=float, mode="w+", shape=1000)
    counter += 1
    print counter
    map.close()
    os.remove(temporary_filename)
From what I understand, the memmap file is closed when the method close() is called. However, the code above cannot loop forever, as it eventually throws the "[Errno 24] Too many open files" error:
1016
1017
1018
1019
Traceback (most recent call last):
  File "./memmap_loop.py", line 11, in <module>
  File "/usr/lib/python2.5/site-packages/numpy/core/memmap.py", line 226, in __new__
EnvironmentError: [Errno 24] Too many open files
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/apport_python_hook.py", line 38, in apport_excepthook
ImportError: No module named packaging_impl

Original exception was:
Traceback (most recent call last):
  File "./memmap_loop.py", line 11, in <module>
  File "/usr/lib/python2.5/site-packages/numpy/core/memmap.py", line 226, in __new__
EnvironmentError: [Errno 24] Too many open files
Does anybody know what I am overlooking?
Since memmap is given the file name rather than the open file descriptor, I suspect you are leaking the temp_fd descriptor returned by tempfile.mkstemp. Does os.close(temp_fd) help?
Great that it works.
Since you can pass numpy.memmap a file-like object, you could create one from the file descriptor you already have, temp_fd.
fobj = os.fdopen(temp_fd, "w+")
numpy.memmap(fobj, ...
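Putting that together, a hedged sketch of the fixed loop (Python 2 style to match the question; map.close() is the old-numpy call carried over from the original code):

import os
import tempfile
import numpy

counter = 0
while True:
    temp_fd, temporary_filename = tempfile.mkstemp(suffix='.memmap')
    map = numpy.memmap(temporary_filename, dtype=float, mode="w+", shape=1000)
    os.close(temp_fd)   # close the descriptor returned by mkstemp,
                        # which was previously leaked on every iteration
    counter += 1
    print counter
    map.close()
    os.remove(temporary_filename)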
