I have a problem where I would prefer not to use loops, because I am working with large data.
This is what I am trying to do (I know the code below sums to [6, 6, 6], but I want to "join" the arrays by index instead):
import numpy as np
np_1 = np.asarray([1,1,1])
np_2 = np.asarray([2,2,2])
np_3 = np.asarray([3,3,3])
np_4 = np_1 + np_2 + np_3
# np_4 should be [1,2,3,1,2,3,1,2,3]
Are there ways to do this, or should I look for options outside of numpy?
Try this:
np.array([np_1, np_2, np_3]).transpose().flatten()
You can try the following method:
np.ravel(np.stack([np_1, np_2, np_3]), order='F')
One way to do it is to stack the sequences depth-wise and flatten it:
np.dstack([np_1, np_2, np_3]).flatten()
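All three answers exploit the same idea: arrange the arrays as rows of a 3×3 matrix and read it out column-first (Fortran order). A minimal check:

```python
import numpy as np

np_1 = np.asarray([1, 1, 1])
np_2 = np.asarray([2, 2, 2])
np_3 = np.asarray([3, 3, 3])

# rows [1,1,1], [2,2,2], [3,3,3] read column-by-column give 1,2,3,1,2,3,...
np_4 = np.ravel([np_1, np_2, np_3], order='F')
print(np_4)  # [1 2 3 1 2 3 1 2 3]
```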
Related
I am using numpy arrays instead of pandas for speed. However, I am unable to improve my code using broadcasting, indexing, etc. Instead, I am using nested loops as below. It works, but it seems ugly and inefficient to me.
Basically, I am trying to imitate pandas' groupby at the step mydata[mydata[:,1]==i]. You may think of it as a firm id number. Then, with respect to the lookup data, I check whether it is inside the selected firm at the step all(np.isin(lookup[u], d[:,3])). But as I said at the beginning, I feel uncomfortable about this.
out = []
for i in np.unique(mydata[:, 1]):
    d = mydata[mydata[:, 1] == i]
    for u in range(len(lookup)):
        control = all(np.isin(lookup[u], d[:, 3]))
        if control:
            out.append(d[np.isin(d[:, 3], lookup[u])])
It takes about 0.27 seconds, but there must be cleverer alternatives.
I also tried Numba's jit(), but it does not work.
Could anyone help me about that?
Thanks in advance!
Fake Data:
a = np.repeat(np.arange(100)+5000, np.random.randint(50, 100, 100))
b = np.random.randint(100,200,len(a))
c = np.random.randint(10,70,len(a))
index = np.arange(len(a))
mydata = np.vstack((index,a, b,c)).T
lookup = []
for i in range(60):
    lookup.append(np.random.randint(10, 70, np.random.randint(3, 6, 1)))
I had some trouble understanding the goal of your program, but I got a decent performance improvement by refactoring your second for loop. I was able to compress your code to three or four lines.
f = (
    lambda lu: out.append(d[np.isin(d[:, 3], lu)])
    if all(np.isin(lu, d[:, 3]))
    else None
)
out = []
for i in np.unique(mydata[:, 1]):
    d = mydata[mydata[:, 1] == i]
    list(map(f, lookup))
This resolves to the same output list you received previously, and the code runs almost twice as fast (at least on my machine).
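A further micro-optimisation, sketched below on hypothetical toy data (the values are illustrative, not from the question): build a Python set of each group's codes once, so the membership test is O(1) per element instead of calling np.isin for every lookup row.

```python
import numpy as np

# toy stand-in for mydata: columns are index, firm id, filler, code
mydata = np.array([
    [0, 1, 0, 10],
    [1, 1, 0, 20],
    [2, 2, 0, 10],
    [3, 2, 0, 30],
])
lookup = [np.array([10, 20]), np.array([10, 30])]

out = []
for i in np.unique(mydata[:, 1]):
    d = mydata[mydata[:, 1] == i]
    codes = set(d[:, 3])                        # built once per group
    for lu in lookup:
        if all(v in codes for v in lu):         # O(1) membership tests
            out.append(d[np.isin(d[:, 3], lu)])
```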
A simple piece of code:
suka = pd.Series(range(10))
padla = np.argwhere(suka % 4 == 0)
gives the error "Length of passed values is 1, index implies 10". Why can't it return the requested indices? Thank you.
The fundamental issues are that the semantics of an array and a DataFrame are significantly different (hence the return of np.argwhere shouldn't be boxed), and that numpy only passes context for ufuncs (hence we don't know that np.argwhere is the function calling __array_wrap__).
This is an issue that occurs in pandas 1.0.1 and later versions.
Try np.flatnonzero() instead of np.argwhere().
The code is:
suka = pd.Series(range(10))
padla = np.flatnonzero(suka % 4 == 0)
For more details see https://github.com/numpy/numpy/issues/15555 and https://github.com/pandas-dev/pandas/pull/35334
If you're using pandas, the way to identify positions is the index, which is separate from the actual order, so the pandas approach would be to define padla as:
padla = (suka % 4 == 0)
padla = padla.loc[padla].index
Roughly speaking, the numpy equivalent would be:
padla = np.argwhere((suka % 4 == 0).values)
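Both variants can be checked side by side; here the positions and the index labels happen to coincide because the Series uses the default RangeIndex:

```python
import numpy as np
import pandas as pd

suka = pd.Series(range(10))

# positions of the matches in the underlying values
pos = np.flatnonzero(suka % 4 == 0)

# pandas-idiomatic: index labels of the matches
mask = suka % 4 == 0
labels = mask.loc[mask].index

print(pos.tolist(), list(labels))  # [0, 4, 8] [0, 4, 8]
```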
I have a question. Under a specific variable that for simplicity we call a, I have the following arrays, written in this way:
[-6.396736847188359, -6.154559100742114, -6.211476547612676]
[-8.006589632001111, -7.826171257487284, -7.71335303949824]
[-6.456557174187878, -6.262447971939394, -6.38657184063457]
[-7.487923068341583, -7.189375715312779, -7.252991999097159]
[-7.532980499994895, -7.44329050097094, -7.529773039725542]
[-7.429923219897081, -6.960840780894108, -7.173489030350187]
[-7.194082458487091, -6.909676564074833, -6.944666159195248]
[-7.734357883680035, -7.512036612219159, -7.607808831503251]
[-7.734008421702387, -7.164880777772352, -7.709697714174302]
[-8.3156235828106, -8.486948182913475, -8.612390113851397]
How can I apply the scipy function logsumexp to each column? I tried logsumexp(a[0]), but it doesn't work; I also tried to iterate over a[0], but I got an error about float64.
Thanks to all
Use the axis parameter: logsumexp(a, axis=?), where ? is 0 or 1.
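For example, stacking the first two rows from the question into a 2-D array (only a fragment of the data, for illustration), axis=0 reduces down each column:

```python
import numpy as np
from scipy.special import logsumexp

a = np.array([
    [-6.396736847188359, -6.154559100742114, -6.211476547612676],
    [-8.006589632001111, -7.826171257487284, -7.71335303949824],
])

col_lse = logsumexp(a, axis=0)  # one value per column -> shape (3,)
row_lse = logsumexp(a, axis=1)  # one value per row    -> shape (2,)
```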
I would like to invert a bunch of tensors in a list using Cholesky decomposition in TensorFlow 2, but the resulting code is quite ugly. Is there an elegant / more pythonic way to do something like this:
iMps = []
for Mp in Mps:
    cholMp = tf.linalg.cholesky(Mp)
    icholMp = tf.linalg.inv(cholMp)
    iMp = tf.tensordot(tf.transpose(icholMp), icholMp, axes=1)
    iMps.append(iMp)
Is it possible to replace the for loop with something else? Mps is a list of tensors with different sizes (can I represent it as something else?). Is there any way to make it more elegant?
You can achieve this using Python's map function.
I have modified your code into a function that can be mapped, like below (note that tf.tensordot requires an axes argument; axes=1 gives the ordinary matrix product):
def inverse_tensors(Mp):
    cholMp = tf.linalg.cholesky(Mp)
    icholMp = tf.linalg.inv(cholMp)
    iMp = tf.tensordot(tf.transpose(icholMp), icholMp, axes=1)
    return iMp

iMps = list(map(inverse_tensors, Mps))
Hope this answers your question, Happy Learning!
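For intuition, the identity the loop relies on — M⁻¹ = (L⁻¹)ᵀ L⁻¹ when M = L Lᵀ — can be verified with plain NumPy (a sketch on a random SPD matrix, not TensorFlow-specific):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
M = A @ A.T + 4 * np.eye(4)   # symmetric positive definite

L = np.linalg.cholesky(M)     # M = L @ L.T
iL = np.linalg.inv(L)
iM = iL.T @ iL                # equals inv(M)
```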
Hi, I am trying to vectorise the QR decomposition in numpy, as the documentation suggests here, but I keep getting dimension issues. I am confused as to what I am doing wrong, as I believe the following follows the documentation. Does anyone know what is wrong with this:
import numpy as np
X = np.random.randn(100,50,50)
vecQR = np.vectorize(np.linalg.qr)
vecQR(X)
From the doc: "By default, pyfunc is assumed to take scalars as input and output.".
So you need to give it a signature:
vecQR = np.vectorize(np.linalg.qr, signature='(m,n)->(m,p),(p,n)')
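With the signature supplied, the vectorized call returns the stacked Q and R factors (a smaller stack than the question's, for speed). As an aside, recent NumPy versions (1.22+, if I recall correctly) also let np.linalg.qr broadcast over leading dimensions directly, so np.linalg.qr(X) on a (100, 50, 50) array may work without np.vectorize at all.

```python
import numpy as np

X = np.random.randn(10, 5, 5)  # smaller stack than the question's (100, 50, 50)
vecQR = np.vectorize(np.linalg.qr, signature='(m,n)->(m,p),(p,n)')
Q, R = vecQR(X)

# Q @ R reconstructs each input matrix
```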
How about just mapping np.linalg.qr over the 1st axis of the array?:
In [35]: np.array(list(map(np.linalg.qr, X)))
Out[35]:
array([[[[-3.30595447e-01, -2.06613421e-02,  2.50135751e-01, ...,
           2.45828025e-02,  9.29150994e-02, -5.02663489e-02],
         ...]]])  # output truncated; the result has shape (100, 2, 50, 50) — one (Q, R) pair per input matrix