I have two 2D NumPy arrays. I need to mask array 2 onto array 1 so that wherever a non-zero value is present in array2, it replaces the corresponding element of array1, while all other elements remain the same.
I need a function selective_mask that takes these two 2D arrays as input and returns the result.
e.g.:

```python
array1 = np.array([[1, 2, 3], [0, 0, 0], [3, 3, 3]])
array2 = np.array([[0, 0, 0], [0, 0, 0], [7, 8, 9]])
result_array = selective_mask(array1, array2)
```

result_array should be [[1, 2, 3], [0, 0, 0], [7, 8, 9]]: only the elements 7, 8, 9 are swapped in, since they are the only non-zero elements in array2.
I came up with a solution using np.where:

```python
np.where(array2 != 0, array2, array1)
```
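A minimal sketch wrapping this in the selective_mask function the question asks for:

```python
import numpy as np

def selective_mask(array1, array2):
    # Wherever array2 is non-zero, take its value; otherwise keep array1's.
    return np.where(array2 != 0, array2, array1)

array1 = np.array([[1, 2, 3], [0, 0, 0], [3, 3, 3]])
array2 = np.array([[0, 0, 0], [0, 0, 0], [7, 8, 9]])
print(selective_mask(array1, array2))
# [[1 2 3]
#  [0 0 0]
#  [7 8 9]]
```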
I have a NumPy array with a shape of (893, 3). It is storing color data that I'm trying to convert into a file format that only supports 255 unique colors. Is there a way to average over the range of colors to produce 255 unique colors that are closest in value to the original 893?
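One standard approach to this kind of palette reduction (an assumption here, since the question doesn't name a method) is to cluster the colors and replace each one with its cluster mean, e.g. with scikit-learn's KMeans:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the (893, 3) color array from the question.
colors = np.random.randint(0, 256, size=(893, 3)).astype(np.float64)

# Group the 893 colors into 255 clusters; each cluster center is the mean
# of the original colors assigned to it, i.e. an "averaged" palette entry.
kmeans = KMeans(n_clusters=255, n_init=10).fit(colors)
palette = kmeans.cluster_centers_.round().astype(np.uint8)  # (255, 3)

# Map every original color to its nearest palette entry.
quantized = palette[kmeans.labels_]                         # (893, 3)
```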
What is the fastest way to write a function that counts consecutive values in a time series: a for loop, or a vectorized approach?
My data is a DataFrame of hourly consumption readings per account (shown only as an image in the original post).
You can use a rolling window to compute the sum of 4 consecutive hours:

```python
df['consumption4hr'] = (
    df.Consumption.groupby(level='Accounts')
      .rolling(window=4).sum()
      .droplevel(0)  # drop the duplicated group-key level so it aligns with df
)
```

With that you can find the accounts that have a 0 in that column, for example:

```python
df[df.consumption4hr == 0].index.get_level_values('Accounts').unique()
```
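A self-contained sketch of the same idea with made-up data (the original frame exists only as a screenshot), assuming a MultiIndex of account and hour:

```python
import pandas as pd

# Hypothetical hourly consumption readings, indexed by account and hour.
df = pd.DataFrame({
    'Accounts': ['A'] * 6 + ['B'] * 6,
    'hour': list(range(6)) * 2,
    'Consumption': [5, 0, 0, 0, 0, 3, 1, 2, 3, 4, 5, 6],
}).set_index(['Accounts', 'hour'])

# Rolling 4-hour sum within each account; groupby().rolling() prepends the
# group key as an extra index level, so drop it to realign with df.
df['consumption4hr'] = (
    df.Consumption.groupby(level='Accounts')
      .rolling(window=4).sum()
      .droplevel(0)
)

# Accounts with at least one 4-hour window of zero consumption.
print(df[df.consumption4hr == 0].index.get_level_values('Accounts').unique())
# Index(['A'], dtype='object', name='Accounts')
```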
This is most likely a dumb question, but being a beginner in Python/NumPy I will ask it anyway. I have come across a lot of posts on how to normalize an array/matrix in numpy, but I am not sure about the WHY. Why/when does an array/matrix need to be normalized in numpy? When is it used?
Normalization can have multiple meanings in different contexts. My question belongs to the field of Data Analytics/Data Science. What does normalization mean in this context? Or, more specifically, in what situations should I normalize an array?
The second part to this question is - What are the different methods of Normalization and can they be used interchangeably in all situations?
The third and final part - can Normalization be used for Arrays of any dimensions?
Links to any reference material (for beginners) will be appreciated.
Consider trying to cluster objects with two numerical attributes A and B. Both are equally important. Attribute A can range from 0 to 1000 and attribute B can range from 0 to 5.
If you did not normalize A and B, you would end up with attribute A completely overpowering attribute B when applying any standard distance metric.
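A quick numeric sketch of that effect, min-max scaling each attribute by its known range:

```python
import numpy as np

# Two objects with attributes (A, B): A ranges over [0, 1000], B over [0, 5].
p = np.array([100.0, 0.0])
q = np.array([900.0, 5.0])

# Raw Euclidean distance: attribute A dominates completely.
print(np.linalg.norm(p - q))                    # ~800.0; B barely registers

# Scale each attribute to [0, 1] by its range, then measure again.
ranges = np.array([1000.0, 5.0])
print(np.linalg.norm(p / ranges - q / ranges))  # ~1.28; both attributes count
```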
I have a matrix of shape [1000,500], and I would like to normalize the matrix along the second dimension. Is the following implementation right?
```python
def norm(x):
    return (x - np.mean(x)) / (np.std(x) + 1e-7)

for row_id in range(datamatrix.shape[0]):
    datamatrix[row_id, :] = norm(datamatrix[row_id, :])
```
Your implementation does normalize each row independently, i.e. along axis 1, which is what you call the second dimension (numpy numbers its axes from 0). You don't need to include the colon: datamatrix[row_id] is equivalent, since all the columns of that row are selected implicitly.
Do remember to use a float dtype such as float32 for your datamatrix, as opposed to an integer dtype, because the in-place assignment doesn't do automatic typecasting.
A more efficient and cleaner implementation would be vectorized (see the sketch below). sklearn.preprocessing also offers helpers, but note that sklearn.preprocessing.normalize scales each row to unit norm, which is not the same as the z-scoring your code does.
But be aware that you're using standard-score (z-score) normalization, which assumes your dataset is approximately normally distributed.
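A minimal vectorized sketch of the same per-row z-score, assuming datamatrix is a float array of shape (1000, 500):

```python
import numpy as np

def norm_rows(x):
    # Per-row mean and std; keepdims=True makes broadcasting line up.
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    return (x - mean) / (std + 1e-7)

datamatrix = np.random.rand(1000, 500).astype(np.float32)  # stand-in data
datamatrix = norm_rows(datamatrix)
```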
I have an array of values which I generated starting from an observed value y1, assuming that this value follows a Poisson distribution:

```python
array = np.random.poisson(np.real(y1), 10000)
```
What if I want to extract a random value from array, which is Poisson distributed and hence has a most probable value peaking at y1? How can I do that? Does simple random extraction work, or does something else need to be specified?
EDIT: trying to be more specific. I have an array whose elements are Poisson distributed. If I want to randomly extract an element from that array, do I need to tell the method about the array's distribution, or is that unnecessary?
I hope this clarifies a bit.
Just

```python
import random

randval = random.choice(array)
```

should work fine for you. Once array is generated, it no longer matters by what process or according to what distribution it was populated: picking one of its items uniformly at random reproduces the array's empirical (here, Poisson-shaped) distribution.
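An equivalent sketch with NumPy, showing that uniform draws from the array keep the Poisson shape (y1 here is a made-up value):

```python
import numpy as np

y1 = 4.2                                  # hypothetical observed value
array = np.random.poisson(y1, 10000)

randval = np.random.choice(array)         # one draw, same idea as random.choice
draws = np.random.choice(array, size=1000)
print(draws.mean())                       # close to y1, the Poisson mean
```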