Hi, I am trying to vectorise the QR decomposition in numpy as the documentation suggests here, but I keep getting dimension errors. I am confused about what I am doing wrong, as I believe the following follows the documentation. Does anyone know what is wrong with this:
import numpy as np
X = np.random.randn(100,50,50)
vecQR = np.vectorize(np.linalg.qr)
vecQR(X)
From the doc: "By default, pyfunc is assumed to take scalars as input and output."
So you need to give it a signature:
vecQR = np.vectorize(np.linalg.qr, signature='(m,n)->(m,p),(p,n)')
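For completeness, a minimal sketch of the full call with the signature in place (the shapes shown assume the 100 stacked 50x50 matrices from the question):

import numpy as np

X = np.random.randn(100, 50, 50)
# the signature marks the trailing two axes as core dimensions, so
# vectorize loops np.linalg.qr over the leading axis of X
vecQR = np.vectorize(np.linalg.qr, signature='(m,n)->(m,p),(p,n)')
Q, R = vecQR(X)
print(Q.shape, R.shape)  # (100, 50, 50) (100, 50, 50)

Note that newer NumPy releases also accept stacked matrices in np.linalg.qr directly, which avoids the vectorize wrapper altogether.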
How about just mapping np.linalg.qr over the 1st axis of the array?:
In [35]: np.array(list(map(np.linalg.qr, X)))
Out[35]:
array([[[[-3.30595447e-01, -2.06613421e-02,  2.50135751e-01, ...,
           2.45828025e-02,  9.29150994e-02, -5.02663489e-02],
         [-1.04193390e-01, -1.95327811e-02,  1.54158438e-02, ...,
           2.62127499e-01, -2.21480958e-02,  1.94813279e-01],
         [ 1.62712767e-01, -1.28304663e-01, -1.50172509e-01, ...,
           1.73740906e-01,  1.31272690e-01, -2.47868876e-01],
         ...]]])
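Since np.linalg.qr returns a (Q, R) pair per matrix, the stacked result above has shape (100, 2, 50, 50). A minimal variation, if you would rather keep the Q and R factors as two separate stacked arrays:

import numpy as np

X = np.random.randn(100, 50, 50)
# each call returns (Q, R); zip(*...) regroups them into all-Qs and all-Rs
qs, rs = map(np.stack, zip(*map(np.linalg.qr, X)))
print(qs.shape, rs.shape)  # (100, 50, 50) (100, 50, 50)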
I have a problem where I would prefer not to use loops, because I am working with big data.
This is what I am trying to do (I know the code below sums to [6 6 6], but I want to interleave the arrays by index instead):
import numpy as np
np_1 = np.asarray([1,1,1])
np_2 = np.asarray([2,2,2])
np_3 = np.asarray([3,3,3])
np_4 = np_1 + np_2 + np_3
# np_4 should be [1,2,3,1,2,3,1,2,3]
Are there ways to do this, or should I look for options outside of numpy?
Try this:
np.array([np_1, np_2, np_3]).transpose().flatten()
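With the arrays from the question, this produces exactly the interleaving you asked for:

>>> np.array([np_1, np_2, np_3]).transpose().flatten()
array([1, 2, 3, 1, 2, 3, 1, 2, 3])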
You can try the following method:
np.ravel(np.stack([np_1, np_2, np_3]), order='F')
One way to do it is to stack the sequences depth-wise and flatten it:
np.dstack([np_1, np_2, np_3]).flatten()
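For the record, this works because np.dstack stacks the length-3 vectors along a new third axis, giving shape (1, 3, 3) with the last axis running over the inputs, so flattening in C order interleaves them:

>>> np.dstack([np_1, np_2, np_3]).shape
(1, 3, 3)
>>> np.dstack([np_1, np_2, np_3]).flatten()
array([1, 2, 3, 1, 2, 3, 1, 2, 3])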
I have a list of n matrices where n = 5:
[matrix([[ 3.62425112,  0.00953506],
         [ 0.00953506,  1.05054417]]),
 matrix([[ 4.15808905e+00,  9.27845937e-04],
         [ 9.27845937e-04,  9.88509628e-01]]),
 matrix([[ 3.90560856,  0.0504297 ],
         [ 0.0504297 ,  0.92587046]]),
 matrix([[ 3.87347073, -0.12430547],
         [-0.12430547,  1.09071475]]),
 matrix([[ 3.87697392, -0.00475038],
         [-0.00475038,  1.01439917]])]
I want to do element-wise addition of these matrices:
I am trying this:
np.add(S_list[0], S_list[1], S_list[2], S_list[3], S_list[4])
It works, but I don't want to hard-code n = 5.
Can anyone please help? Thank you.
According to the documentation, np.add adds only two matrices.
However np.add.reduce(S_list) or just sum(S_list) will give you what you want.
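A quick sketch with toy stand-ins for S_list (the matrices here are hypothetical; any list of equal-shape arrays behaves the same way):

import numpy as np

S_list = [np.eye(2), 2 * np.eye(2), 3 * np.eye(2)]  # hypothetical stand-ins
print(np.add.reduce(S_list))  # folds np.add pairwise along the list
print(sum(S_list))            # Python's sum starts at 0 and adds each array in turn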
You could just use Python's built-in function sum
sum(S_list)
Output:
[[19.43839338 -0.06816324]
[-0.06816324 5.07003818]]
Are you sure that np.add(S_list[0], S_list[1], S_list[2], S_list[3], S_list[4]) works? np.add() takes two arrays as input arguments. Anyway, the following code does the job if you want to use np.add():
# use "total" rather than "sum" to avoid shadowing the builtin
total = np.add(S_list[0], S_list[1])
for i in range(len(S_list) - 2):
    total = np.add(total, S_list[i + 2])
print(total)
I'm trying to implement an algorithm in Python. For the sake of documentation and a clear understanding of the flow details, I use sympy. As it turned out, it fails on the computation of the inverse of a float matrix.
So I'm getting
TypeError Traceback (most recent call last)
<ipython-input-20-c2193b2ae217> in <module>()
10 np.linalg.inv(xx)
11 symInv = lambdify(X0,X0Inv)
---> 12 symInv(xx)
/opt/anaconda3/lib/python3.6/site-packages/numpy/__init__.py in <lambda>(X0)
TypeError: ufunc 'bitwise_xor' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
If the matrix is integer it works fine:
import numpy as np
from sympy import *
init_printing()
X0 = MatrixSymbol('X0',2,2)
xx = np.random.rand(4,4)
#xx = np.random.randint(10,size=(4,4)) # this line makes it workable
X0Inv = X0**-1
np.linalg.inv(xx)
symInv = lambdify(X0,X0Inv)
symInv(xx)
Link to a live version of the code
If anybody knows any workaround it would be great if you could share. Thanks in advance.
UPDATE. As pointed out by @hpaulj and @tel, the issue is how lambdify translates ** into numpy code for matrix symbols: for some reason it tries to XOR the elements. I will try to find an easy way to alter this behavior. Any help/hints are appreciated.
As hpaulj points out, the error seems to stem from a conversion of ** to ^ that happens in lambdify, for some reason.
You can fix the error that you're getting by using np.power instead of **:
import numpy as np
from sympy import MatrixSymbol, lambdify
X0 = MatrixSymbol('X0',2,2)
xx = np.random.rand(4,4)
X0Inv = np.power(X0, -1)
symInv = lambdify(X0,X0Inv)
print('matrix xx')
print(xx, end='\n\n')
print('result of symInv(xx)')
print(symInv(xx), end='\n\n')
Output:
matrix xx
[[0.4514882 0.84588859 0.02431252 0.25468078]
[0.46767727 0.85748153 0.51207567 0.59636962]
[0.84557537 0.38459205 0.76814414 0.96624407]
[0.0933803 0.43467119 0.77823338 0.58770188]]
result of symInv(xx)
[[2.214897321138516, 1.1821887747951494], [2.1382266426713077, 1.1662058776397513]]
However, as you have it set up, symInv doesn't produce the matrix inverse; instead it only does element-wise exponentiation of each value in xx. In other words, symInv(xx)[i,j] == xx[i,j]**-1. The following code shows the difference between element-wise exponentiation and the true inverse:
print('result of xx**-1')
print(xx**-1, end='\n\n')
print('result of np.linalg.inv(xx)')
print(np.linalg.inv(xx))
Output:
result of xx**-1
[[ 2.21489732 1.18218877 41.13107402 3.92648394]
[ 2.13822664 1.16620588 1.95283638 1.67681243]
[ 1.18262669 2.60015778 1.301839 1.0349352 ]
[10.7088969 2.30058954 1.28496159 1.70154295]]
result of np.linalg.inv(xx)
[[-118.7558445 171.37619558 -20.37188041 -88.94733652]
[ -0.56274492 2.49107626 -1.00812489 -0.62648633]
[-160.35674704 230.3266324 -28.87548299 -116.75862026]
[ 231.62940572 -334.07044947 42.21936405 170.90926978]]
Edit: workaround
I'm 95% sure that what you've run into is a bug in the Sympy code. It seems that X0^-1 was valid syntax for Sympy Matrix objects at some point, but no longer. However, it seems that someone forgot to tell whoever maintains the lambdify code, since it still translates every matrix exponentiation into the caret ^ syntax.
So what you should do is submit an issue on the Sympy github. Just post your code and the error it produces, and ask if that's the intended behavior. In the meantime, here's a filthy hack to work around the problem:
import numpy as np
from sympy import MatrixSymbol, lambdify
class XormulArray(np.ndarray):
    def __new__(cls, input_array):
        return np.asarray(input_array).view(cls)

    def __xor__(self, other):
        return np.linalg.matrix_power(self, other)
X0 = MatrixSymbol('X0',2,2)
xx = np.random.rand(4,4)
X0Inv = X0.inv()
symInv = lambdify(X0,X0Inv,'numpy')
print('result of symInv(XormulArray(xx))')
print(symInv(XormulArray(xx)), end='\n\n')
print('result of np.linalg.inv(xx)')
print(np.linalg.inv(xx))
Output:
result of symInv(XormulArray(xx))
[[ 3.50382881 -3.84573344 3.29173896 -2.01224981]
[-1.88719742 1.86688465 0.3277883 0.0319487 ]
[-3.77627792 4.30823019 -5.53247103 5.53412775]
[ 3.89620805 -3.30073088 4.27921307 -4.68944191]]
result of np.linalg.inv(xx)
[[ 3.50382881 -3.84573344 3.29173896 -2.01224981]
[-1.88719742 1.86688465 0.3277883 0.0319487 ]
[-3.77627792 4.30823019 -5.53247103 5.53412775]
[ 3.89620805 -3.30073088 4.27921307 -4.68944191]]
Basically, you'll have to cast all of your arrays to the thin wrapper type XormulArray right before you pass them into symInv. This hack is not best practice for a bunch of reasons (including the fact that it apparently breaks the (2,2) shape restriction you placed on X0), but it'll probably be the best you can do until the Sympy codebase is fixed.
I need to organize a data file with chunks of named data. The data are NumPy arrays. But I don't want to use the numpy.save or numpy.savez functions, because in some cases the data have to be sent to a server over a pipe or another interface. So I want to dump a numpy array into memory, zip it, and then send it to the server.
I've tried simple pickle, like this:
try:
    import cPickle as pkl  # Python 2
except ImportError:
    import pickle as pkl   # Python 3

import zlib
import numpy as np

def send_to_db(data, compress=5):
    # send() stands in for whatever transport is used
    send(zlib.compress(pkl.dumps(data), compress))
...but this is an extremely slow process. Even with compress level 0 (no compression), the process is very slow, and that is purely because of the pickling.
Is there any way to dump a numpy array into a string without pickle? I know that numpy provides numpy.getbuffer to get the raw buffer, but it isn't obvious to me how to use this dumped buffer to obtain the array back.
You should definitely use numpy.save; you can still do it in-memory:
>>> import io
>>> import numpy as np
>>> import zlib
>>> f = io.BytesIO()
>>> arr = np.random.rand(100, 100)
>>> np.save(f, arr)
>>> compressed = zlib.compress(f.getbuffer())
And to decompress, reverse the process:
>>> np.load(io.BytesIO(zlib.decompress(compressed)))
array([[ 0.80881898, 0.50553303, 0.03859795, ..., 0.05850996,
0.9174782 , 0.48671767],
[ 0.79715979, 0.81465744, 0.93529834, ..., 0.53577085,
0.59098735, 0.22716425],
[ 0.49570713, 0.09599001, 0.74023709, ..., 0.85172897,
0.05066641, 0.10364143],
...,
[ 0.89720137, 0.60616688, 0.62966729, ..., 0.6206728 ,
0.96160519, 0.69746633],
[ 0.59276237, 0.71586014, 0.35959289, ..., 0.46977027,
0.46586237, 0.10949621],
[ 0.8075795 , 0.70107856, 0.81389246, ..., 0.92068768,
0.38013495, 0.21489793]])
>>>
Which, as you can see, matches what we saved earlier:
>>> arr
array([[ 0.80881898, 0.50553303, 0.03859795, ..., 0.05850996,
0.9174782 , 0.48671767],
[ 0.79715979, 0.81465744, 0.93529834, ..., 0.53577085,
0.59098735, 0.22716425],
[ 0.49570713, 0.09599001, 0.74023709, ..., 0.85172897,
0.05066641, 0.10364143],
...,
[ 0.89720137, 0.60616688, 0.62966729, ..., 0.6206728 ,
0.96160519, 0.69746633],
[ 0.59276237, 0.71586014, 0.35959289, ..., 0.46977027,
0.46586237, 0.10949621],
[ 0.8075795 , 0.70107856, 0.81389246, ..., 0.92068768,
0.38013495, 0.21489793]])
>>>
The default pickle protocol produces pure-ASCII output. To get (much) better performance, use the latest protocol available. Protocols 2 and above are binary and, if memory serves me right, allow numpy arrays to dump their buffer directly into the stream without additional operations.
To select the protocol, add the optional argument while pickling (no need to specify it while unpickling), for instance pkl.dumps(data, 2).
To pick the latest possible protocol, use pkl.dumps(data, -1).
Note that if you use different Python versions, you need to specify the lowest commonly supported protocol.
See the pickle documentation for details on the different protocols.
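A minimal sketch of the difference (exact sizes and timings will vary with your data):

import pickle as pkl
import numpy as np

data = np.random.rand(1000, 1000)
ascii_dump = pkl.dumps(data, 0)    # protocol 0: ASCII, slow and large
binary_dump = pkl.dumps(data, -1)  # latest protocol: binary, near raw-buffer size
print(len(ascii_dump), len(binary_dump))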
There is a method tobytes which, according to my benchmarks, is faster than the alternatives.
Take this with a grain of salt, as some of my experiments may be misguided or plainly wrong, but it is a way of dumping a numpy array into a string.
Keep in mind that you will need to send some additional data out of band, mainly the data type of the array and also its shape. That may be a deal breaker or it may not be relevant. It's easy to recover the original array by calling np.frombuffer(..., dtype=...).reshape(...).
Edit: a possibly incomplete example
##############
# Generation #
##############
import numpy as np
arr = np.random.randint(1, 7, (4,6))
arr_dtype = arr.dtype.str
arr_shape = arr.shape
arr_data = arr.tobytes()
# Now send / store arr_dtype, arr_shape, arr_data, where:
# arr_dtype is string
# arr_shape is tuple of integers
# arr_data is bytes
############
# Recovery #
############
arr = np.frombuffer(arr_data, dtype=arr_dtype).reshape(arr_shape)
I am not considering the column/row ordering, because I know that numpy supports options for that, though I have never used them. If you want or need the memory arranged in a specific row/column fashion for multidimensional arrays, you may have to take that into account at some point.
Also: frombuffer doesn't copy the buffer data; it creates the numpy structure as a view (maybe not exactly that, but you know what I mean). If that's undesired behaviour, you can use fromstring (which is deprecated but seems to work on 1.19) or use frombuffer followed by np.copy.
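A minimal sketch of the copying variant, continuing the recovery example above:

# frombuffer returns a read-only view tied to arr_data; copy() detaches it
arr_view = np.frombuffer(arr_data, dtype=arr_dtype).reshape(arr_shape)
arr_owned = arr_view.copy()  # an independent, writable array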
I have two dask arrays, a and b. I get the dot product of a and b as below:
>>>z2 = da.from_array(a.dot(b),chunks=1)
>>> z2
dask.array<from-ar..., shape=(3, 3), dtype=int32, chunksize=(1, 1)>
But when I do
sigmoid(z2)
the shell stops working. I can't even kill it.
Sigmoid is given as below:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
When working with Dask Arrays, it is normally best to use the functions provided in dask.array. The problem with using NumPy functions directly is that they pull all of the data from the Dask Array into memory, which could be the cause of the shell freezing that you experienced. The functions provided in dask.array are designed to avoid this by lazily chaining computations until you wish to evaluate them. In this case, it would be better to use da.exp instead of np.exp, as in the example below.
I have provided a modified version of your code to demonstrate how this would be done. In the example I have called .compute(), which also pulls the full result into memory. It is possible that this could cause issues for you if your data is very large. Hence I have demonstrated taking a small slice of the data before calling compute, to keep the result small and memory friendly. If your data is large and you wish to keep the full result, I would recommend storing it to disk instead.
Hope this helps.
In [1]: import dask.array as da
In [2]: def sigmoid(z):
...: return 1 / (1 + da.exp(-z))
...:
In [3]: d = da.random.uniform(-6, 6, (100, 110), chunks=(10, 11))
In [4]: ds = sigmoid(d)
In [5]: ds[:5, :6].compute()
Out[5]:
array([[ 0.0067856 , 0.31701817, 0.43301395, 0.23188129, 0.01530903,
0.34420555],
[ 0.24473798, 0.99594466, 0.9942868 , 0.9947099 , 0.98266004,
0.99717379],
[ 0.92617922, 0.17548207, 0.98363658, 0.01764361, 0.74843615,
0.04628735],
[ 0.99155315, 0.99447542, 0.99483032, 0.00380505, 0.0435369 ,
0.01208241],
[ 0.99640952, 0.99703901, 0.69332886, 0.97541982, 0.05356214,
0.1869447 ]])
Got it... I tried and it worked!
ans = z2.map_blocks(sigmoid)
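For anyone landing here later: map_blocks works because it applies the NumPy-based sigmoid to each chunk independently and lazily, so nothing is evaluated until you ask for it:

ans = z2.map_blocks(sigmoid)  # lazy: only records the per-chunk operation
result = ans.compute()        # evaluates chunk by chunk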