save large numpy array as .mat file - python

I'm struggling with this problem:
I've 2 large 2D numpy arrays (about 5 GB) and I want to save them in a .mat file loadable from Matlab
I tried scipy.io and wrote
from scipy.io import savemat
data = {'A': a, 'B': b}
savemat('myfile.mat', data, appendmat=True, format='5',
long_field_names=False, do_compression=False, oned_as='row')
but I get the error: OverflowError: Python int too large to convert to C long
EDIT:
Python 3.8, Matlab 2017b
Here the traceback
a.shape (600,1048261) of type <class 'numpy.float64'>
b.shape (1048261) of type <class 'numpy.float64'>
data = {'A': a, 'B': b}
savemat('myfile.mat', data, appendmat=True, format='5',
long_field_names=False, do_compression=False, oned_as='row')
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
<ipython-input-19-4d1d08a54148> in <module>
1 data = {'A': a, 'B': b}
----> 2 savemat('myfile.mat', data, appendmat=True, format='5',
3 long_field_names=False, do_compression=False, oned_as='row')
~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio.py in savemat(file_name, mdict, appendmat, format, long_field_names, do_compression, oned_as)
277 else:
278 raise ValueError("Format should be '4' or '5'")
--> 279 MW.put_variables(mdict)
280
281
~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio5.py in put_variables(self, mdict, write_header)
847 self.file_stream.write(out_str)
848 else: # not compressing
--> 849 self._matrix_writer.write_top(var, asbytes(name), is_global)
~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio5.py in write_top(self, arr, name, is_global)
588 self._var_name = name
589 # write the header and data
--> 590 self.write(arr)
591
592 def write(self, arr):
~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio5.py in write(self, arr)
627 self.write_char(narr, codec)
628 else:
--> 629 self.write_numeric(narr)
630 self.update_matrix_tag(mat_tag_pos)
631
~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio5.py in write_numeric(self, arr)
653 self.write_element(arr.imag)
654 else:
--> 655 self.write_element(arr)
656
657 def write_char(self, arr, codec='ascii'):
~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio5.py in write_element(self, arr, mdtype)
494 self.write_smalldata_element(arr, mdtype, byte_count)
495 else:
--> 496 self.write_regular_element(arr, mdtype, byte_count)
497
498 def write_smalldata_element(self, arr, mdtype, byte_count):
~\miniconda3\envs\work\lib\site-packages\scipy\io\matlab\mio5.py in write_regular_element(self, arr, mdtype, byte_count)
508 tag = np.zeros((), NDT_TAG_FULL)
509 tag['mdtype'] = mdtype
--> 510 tag['byte_count'] = byte_count
511 self.write_bytes(tag)
512 self.write_bytes(arr)
OverflowError: Python int too large to convert to C long
I tried also with hdf5storage
hdf5storage.write(data, 'myfile.mat', matlab_compatible=True)
but it fails too.
EDIT:
gives this warning
\miniconda3\envs\work\lib\site-packages\hdf5storage\__init__.py:1306:
H5pyDeprecationWarning: The default file mode will change to 'r' (read-only)
in h5py 3.0. To suppress this warning, pass the mode you need to
h5py.File(), or set the global default h5.get_config().default_file_mode, or
set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are:
'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
f = h5py.File(filename)
Anyway, it creates a 5GB file but when I load it in Matlab I get a variable named with the file path and apparently without data.
Lastly I tried with h5py:
import h5py
hf = h5py.File('C:/Users/flavio/Desktop/STRA-pattern.mat', 'w')
hf.create_dataset('A', data=a)
hf.create_dataset('B', data=b)
hf.close()
but the output file is not recognized/readable in Matlab.
Is splitting the only solution? Hope there is a better way to fix this issue.

Anyone still looking for an answer, this works with hdf5storage
hdf5storage.savemat(
save_path,
data_dict,
format=7.3,
matlab_compatible=True,
compress=False
)

Related

list out of range when using embeddings

I have the following list:
list1=[['brute-force',
'password-guessing',
'password-guessing',
'default-credentials',
'shell'],
['malware',
'ddos',
'phishing',
'spam',
'botnet',
'cryptojacking',
'xss',
'sqli',
'vulnerability'],
['sensitive-information']]
I am trying the example from the linked tutorial.
However, when I fit my list to get the embeddings:
embeddings1 = sbert_model.encode(list1, convert_to_tensor=True)
I get the following error:
IndexError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_16484/3954167634.py in <module>
----> 1 embeddings2 = sbert_model.encode(list3, convert_to_tensor=True)
~\anaconda3\envs\tensorflow_env\lib\site-packages\sentence_transformers\SentenceTransformer.py in encode(self, sentences, batch_size, show_progress_bar, output_value, convert_to_numpy, convert_to_tensor, device, normalize_embeddings)
159 for start_index in trange(0, len(sentences), batch_size, desc="Batches", disable=not show_progress_bar):
160 sentences_batch = sentences_sorted[start_index:start_index+batch_size]
--> 161 features = self.tokenize(sentences_batch)
162 features = batch_to_device(features, device)
163
~\anaconda3\envs\tensorflow_env\lib\site-packages\sentence_transformers\SentenceTransformer.py in tokenize(self, texts)
317 Tokenizes the texts
318 """
--> 319 return self._first_module().tokenize(texts)
320
321 def get_sentence_features(self, *features):
~\anaconda3\envs\tensorflow_env\lib\site-packages\sentence_transformers\models\Transformer.py in tokenize(self, texts)
101 for text_tuple in texts:
102 batch1.append(text_tuple[0])
--> 103 batch2.append(text_tuple[1])
104 to_tokenize = [batch1, batch2]
105
IndexError: list index out of range
I understand how lists work and I have read many answers to the same problem on here, but I cannot figure out why it is going out of range.
Any ideas?
You need to flatten your input nested list first.
from nltk import flatten
flattened_list1 = flatten(list1)
embeddings1 = sbert_model.encode(flattened_list1, convert_to_tensor=True)

How can I get a single line out?

yeni=soleloge.split(" ")
res = [int(sub.split(':')[1]) for sub in yeni]
there is only one line up here
solmatris=numpy.array(res)
if solmatris.size>0:
print(solmatris)
f.write(solmatrisStr)
here the output is as follows:
[ 835 732 474 519 831 834 847 852 841 834 801
-9344
-3660 13808 1648 -463 86]
I want the output to be:
[ 835 732 474 519 831 834 847 852 841 834 801 -9344 -3660 13808 1648 -463 86 ]
After converting to matrix, I print out to notebook. I don't use console.
https://prnt.sc/10c0m85 I don't want it to be like this.
I apologize for not expressing my problem clearly at first.
Why does it give such a printout, and how can I fix it?
Remove the numpy.array type cast and instead try this print(', '.join(str(r) for r in res)).
numpy's representation of arrays has an automatic wrap-around that is not based on the window size. You could do print(*solmatris) or print(list(solmatris)) instead to get it on a single line.

Trouble minimizing a value in python

I'm trying to minimize a value dependent on a function (and therefore optimize the arguments of the function) so the latter matches some experimental data.
Problem is that I don't actually know if I'm coding what I want correctly, or even if I'm using the correct function, because my program gives me an error.
import scipy.optimize as op
prac3 = pd.read_excel('Buena.xlsx', sheetname='nl1')
print(prac3.columns)
tmed = 176
te = np.array(prac3['tempo'])
t = te[0:249]
K = np.array(prac3['cond'])
Kexp = K[0:249]
Kinf = 47.8
K0 = 3.02
DK = Kinf - K0
def f(Kinf,DK,k,t):
return (Kinf-DK*np.exp(-k*t))
def err(Kexp,Kcal):
return ((Kcal-Kexp)**2)
Kcal = np.array(f(Kinf,DK,k,t))
print(Kcal)
dif = np.array(err(Kexp,Kcal))
sumd = sum(dif)
print(sumd)
op.minimize(f, (Kinf,DK,k,t))
The error the program gives me reads as it follows:
ValueError Traceback (most recent call last)
<ipython-input-91-fd51b4735eed> in <module>()
48 print(sumd)
49
---> 50 op.minimize(f, (Kinf,DK,k,t))
51
52
~/anaconda3_501/lib/python3.6/site-packages/scipy/optimize/_minimize.py in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
352
353 """
--> 354 x0 = np.asarray(x0)
355 if x0.dtype.kind in np.typecodes["AllInteger"]:
356 x0 = np.asarray(x0, dtype=float)
~/anaconda3_501/lib/python3.6/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
529
530 """
--> 531 return array(a, dtype, copy=False, order=order)
532
533
ValueError: setting an array element with a sequence.
The exception says that you're passing an array to something that expects a callable. Without seeing your traceback or knowing more of what you're trying to do, I can only guess where this is happening, but my guess is here:
op.minimize(f(Kinf,DK,k,t),sumd)
From the docs, the first parameter is a callable (function). But you're passing whatever f(Kinf,DK,k,t) returns as the first argument. And, looking at your f function, it looks like it's returning an array, not a function.
My first guess is that you want to minimize f over the args (Kinf, DK, k, t)? if so, you pass f as the function, and the tuple (Kinf, DK, k, t) as the args, like this:
op.minimize(f, sumd, (Kinf,DK,k,t))

Unit test fails just changing from Python 2.6.5 to Python 2.7.3; Decimal-related

All my unit tests succeed running in Python 2.6.5; one fails when I run through Python 2.7.3. The code being tested is complex and involves lots of working in floats and converting to Decimal along the way, by converting to str first as was needed in Python 2.6.
Before I start digging, I was wondering if I could be a bit lazy and see if someone has seen this before and has suggestions on what to search for. Here's the result of the test run:
======================================================================
FAIL: test_hor_tpost_winsize_inside_mm (__main__.Test_ShutterCalculator)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_ShutterCalculator.py", line 506, in test_hor_tpost_winsize_inside_mm
self.assertEqual(o.net_width_closing_tolerance, Decimal("6.4"))
AssertionError: Decimal('6.3') != Decimal('6.4')
----------------------------------------------------------------------
Here's the unit test code for test_hor_tpost_winsize_inside_mm():
490 def test_hor_tpost_winsize_inside_mm(self):
491 """
492 Same as above but test mm
493 """
494 o = self.opening
495 o.unit_of_measure = "millimeters"
496 o.formula_mode = "winsize"
497 o.mount = "inside"
498 o.given_width = Decimal("1117.6")
499 o.given_height = Decimal("2365.4")
500 o.louver_spacing = Decimal("101.6")
501 self.make4SidedFrame("9613", '9613: 2-1/2" Face Deco Z', Decimal("63.5"), Decimal("19.1"))
502 so1 = o.subopenings[(0,0)]
503 so1.fold_left = 1
504 so1.fold_right = 1
505 self.calc()
506 self.assertEqual(o.net_width_closing_tolerance, Decimal("6.4"))
507 self.assertEqual(o.net_height_closing_tolerance, Decimal("6.4"))
508 self.assertEqual(o.horizontal_shim, Decimal(".125")) # in inches
509 self.assertEqual(o.vertical_shim, Decimal(".125")) # in inches
510 self.assertEqual(o.width, Decimal("1069.8")) ## 1070 converted directly from inches
511 self.assertEqual(o.height, Decimal("2317.6")) ## 2317.8 converted directy from inches
512 tpost = o.add_hor_tpost()
513 so2 = o.subopenings[(0,1)]
514 so2.fold_left = 1
515 so2.fold_right = 1
516 self.calc()
517 #self.cs()
518 self.assertEqual(o.net_width_closing_tolerance, Decimal("6.4"))
519 self.assertEqual(o.net_height_closing_tolerance, Decimal("12.7"))
520 self.assertEqual(o.horizontal_shim, Decimal(".125")) # in inches
521 self.assertEqual(o.vertical_shim, Decimal(".125")) # in inches
522 self.assertEqual(o.width, Decimal("1069.8")) ## Rick had 42 but agreed that mine is right
523 self.assertEqual(o.height, Decimal("2311.3"))
524 self.assertEqual(so1.width, Decimal("1069.8"))
525 self.assertEqual(so2.width, Decimal("1069.8"))
526 self.assertEqual(so1.height, Decimal("1139.7")) ## Rick had 44.8125 but agreed mine is right
527 self.assertEqual(so2.height, Decimal("1139.7"))
528 self.assertEqual(tpost.center_pos, Decimal("1182.7"))
529 top_panel_section = so1.panels[0].sections[(0,0)]
530 bottom_panel_section = so2.panels[0].sections[(0,0)]
531 self.assertEqual(top_panel_section.louver_count, 9)
532 self.assertEqual(bottom_panel_section.louver_count, 9)
533 self.assertEqual(top_panel_section.top_rail.width, Decimal("112.6")) ## Rick had 4.40625, but given the changes to net
534 self.assertEqual(bottom_panel_section.bottom_rail.width, Decimal("112.7"))
535 self.assertEqual(top_panel_section.bottom_rail.width, Decimal("112.7"))
536 self.assertEqual(bottom_panel_section.top_rail.width, Decimal("112.6"))
Any hint on what to search for in my code to find the source of the discrepancy?
Python 2.7 introduced changes to the Decimal class and float type to help improve accuracy when converting from strings. This could be the source of the change.
Conversions between floating-point numbers and strings are now correctly rounded on most platforms. These conversions occur in many different places: str() on floats and complex numbers; the float and complex constructors; numeric formatting; serializing and deserializing floats and complex numbers using the marshal, pickle and json modules; parsing of float and imaginary literals in Python code; and Decimal-to-float conversion.
You can see the change details here, under "Other language changes"

Unable to load a previously dumped pickle file of large size in Python

I used cPickle and protocol version 2 to dump some computation results. The code looks like this:
> f = open('foo.pck', 'w')
> cPickle.dump(var, f, protocol=2)
> f.close()
The variable var is a tuple of length two. The type of var[0] is a list and var[1] is a numpy.ndarray.
The above code segment successfully generated a file with large size (~1.7G).
However, when I tried to load the variable from foo.pck, I got the following error.
ValueError Traceback (most recent call last)
/home/user_account/tmp/<ipython-input-3-fd3ecce18dcd> in <module>()
----> 1 v = cPickle.load(f)
ValueError: buffer size does not match array size
The loading codes looks like the following.
> f= open('foo.pck', 'r')
> v = cPickle.load(f)
I also tried to use pickle (instead of cPickle) to load the variable, but got a similar error msg as follows.
ValueError Traceback (most recent call last)
/home/user_account/tmp/<ipython-input-3-aa6586c8e4bf> in <module>()
----> 1 v = pickle.load(f)
/usr/lib64/python2.6/pickle.pyc in load(file)
1368
1369 def load(file):
-> 1370 return Unpickler(file).load()
1371
1372 def loads(str):
/usr/lib64/python2.6/pickle.pyc in load(self)
856 while 1:
857 key = read(1)
--> 858 dispatch[key](self)
859 except _Stop, stopinst:
860 return stopinst.value
/usr/lib64/python2.6/pickle.pyc in load_build(self)
1215 setstate = getattr(inst, "__setstate__", None)
1216 if setstate:
-> 1217 setstate(state)
1218 return
1219 slotstate = None
ValueError: buffer size does not match array size
I tried the same code segments with much smaller data and they worked fine. So my best guess is that I reached the loading size limitation of pickle (or cPickle). However, it is strange that dumping succeeded (with a large variable) but loading failed.
If this is indeed a loading size limitation problem, how should I bypass it? If not, what can be the possible cause of the problem?
Any suggestion is appreciated. Thanks!
How about save & load the numpy array by numpy.save() & np.load()?
You can save the pickled list and the numpy array to the same file:
import numpy as np
import cPickle
data = np.random.rand(50000000)
f = open('foo.pck', 'wb')
cPickle.dump([1,2,3], f, protocol=2)
np.save(f, data)
f.close()
to read the data:
import cPickle
import numpy as np
f= open('foo.pck', 'rb')
v = cPickle.load(f)
data = np.load(f)
print data.shape, data

Categories

Resources