How to change a certain count of numpy matrix elements? - python

I have a square numpy 2D matrix.
2 2 2 2
2 2 2 2
2 2 2 2
2 2 2 2
And I need to set a certain count of random matrix values to 0. Let's say it is 5 elements. That means any 5 from 16 matrix values must be set to 0. For example new matrix could be
2 2 0 0
0 2 2 2
2 2 2 2
0 2 0 2
or
2 0 2 2
2 2 0 2
2 2 0 2
0 2 2 0
or some else.
How could I do this efficient way?

This will do it:
import random
arr1d = arr.ravel()
randidx = random.sample(range(len(arr1d)), 5)
arr1d[randidx] = 0
This modifies arr because ravel() returns a view, not a copy.
For more on how the random numbers can be generated, see: Non-repetitive random number in numpy

lets say your matrix is "matrix"
import random
for i in range(5):
random1=random.randint(0,size_x_ofmatrix)
random2=random.randint(0,size_y_ofmatrix)
matrix[random1,random2]=0

Related

list index out of range in calculation of nodal distance

I am working on a small task in which I have to find the distance between two nodes. Each node has X and Y coordinates which can be seen below.
node_number X_coordinate Y_coordinate
0 0 1 0
1 1 1 1
2 2 1 2
3 3 1 3
4 4 0 3
5 5 0 4
6 6 1 4
7 7 2 4
8 8 3 4
9 9 4 4
10 10 4 3
11 11 3 3
12 12 2 3
13 13 2 2
14 14 2 1
15 15 2 0
For the purpose I mentioned above, I wrote below code,
X1_coordinate = df['X_coordinate'].tolist()
Y1_coordinate = df['Y_coordinate'].tolist()
node_number1 = df['node_number'].tolist()
nodal_dist = []
i = 0
for i in range(len(node_number1)):
dist = math.sqrt((X1_coordinate[i+1] - X1_coordinate[i])**2 + (Y1_coordinate[i+1] - Y1_coordinate[i])**2)
nodal_dist.append(dist)
I got the error
list index out of range
Kindly let me know what I am doing wrong and what should I change to get the answer.
Indexing starts at zero, so the last element in the list has an index that is one less than the number of elements in that list. But the len() function gives you the number of elements in the list (in other words, it starts counting at 1), so you want the range of your loop to be len(node_number1) - 1 to avoid an -off-by-one error.
The problems should been in this line
dist = math.sqrt((X1_coordinate[i+1] - X1_coordinate[i])**2 + (Y1_coordinate[i+1] - Y1_coordinate[i])**2)
the X1_coordinate[i+1] and the ] Y1_coordinate[i+1]] go out of range on the last number call.

Rolling sum on a dynamic window

I am new to python and the last time I coded was in the mid-80's so I appreciate your patient help.
It seems .rolling(window) requires the window to be a fixed integer. I need a rolling window where the window or lookback period is dynamic and given by another column.
In the table below, I seek the Lookbacksum which is the rolling sum of Data as specified by the Lookback column.
d={'Data':[1,1,1,2,3,2,3,2,1,2],
'Lookback':[0,1,2,2,1,3,3,2,3,1],
'LookbackSum':[1,2,3,4,5,8,10,7,8,3]}
df=pd.DataFrame(data=d)
eg:
Data Lookback LookbackSum
0 1 0 1
1 1 1 2
2 1 2 3
3 2 2 4
4 3 1 5
5 2 3 8
6 3 3 10
7 2 2 7
8 1 3 8
9 2 1 3
You can create a custom function for use with df.apply, eg:
def lookback_window(row, values, lookback, method='sum', *args, **kwargs):
loc = values.index.get_loc(row.name)
lb = lookback.loc[row.name]
return getattr(values.iloc[loc - lb: loc + 1], method)(*args, **kwargs)
Then use it as:
df['new_col'] = df.apply(lookback_window, values=df['Data'], lookback=df['Lookback'], axis=1)
There may be some corner cases but as long as your indices align and are unique - it should fulfil what you're trying to do.
here is one with a list comprehension which stores the index and value of the column df['Lookback'] and the gets the slice by reversing the values and slicing according to the column value:
df['LookbackSum'] = [sum(df.loc[:e,'Data'][::-1].to_numpy()[:i+1])
for e,i in enumerate(df['Lookback'])]
print(df)
Data Lookback LookbackSum
0 1 0 1
1 1 1 2
2 1 2 3
3 2 2 4
4 3 1 5
5 2 3 8
6 3 3 10
7 2 2 7
8 1 3 8
9 2 1 3
An exercise in pain, if you want to try an almost fully vectorized approach. Sidenote: I don't think it's worth it here. At all.
Inspired by Divakar's answer here
Given:
import numpy as np
import pandas as pd
d={'Data':[1,1,1,2,3,2,3,2,1,2],
'Lookback':[0,1,2,2,1,3,3,2,3,1],
'LookbackSum':[1,2,3,4,5,8,10,7,8,3]}
df=pd.DataFrame(data=d)
Using the function from Divakar's answer, but slightly modified
from skimage.util.shape import view_as_windows as viewW
def strided_indexing_roll(a, r, fill_value=np.nan):
# Concatenate with sliced to cover all rolls
p = np.full((a.shape[0],a.shape[1]-1),fill_value)
a_ext = np.concatenate((p,a,p),axis=1)
# Get sliding windows; use advanced-indexing to select appropriate ones
n = a.shape[1]
return viewW(a_ext,(1,n))[np.arange(len(r)), -r + (n-1),0]
Now, we just need to prepare a 2d array for the data and independently shift the rows according to our desired lookback values.
arr = df['Data'].to_numpy().reshape(1, -1).repeat(len(df), axis=0)
shifter = np.arange(len(df) - 1, -1, -1) #+ d['Lookback'] - 1
temp = strided_indexing_roll(arr, shifter, fill_value=0)
out = strided_indexing_roll(temp, (len(df) - 1 - df['Lookback'])*-1, 0).sum(-1)
Output:
array([ 1, 2, 3, 4, 5, 8, 10, 7, 8, 3], dtype=int64)
We can then just assign it back to the dataframe as needed and check.
df['out'] = out
#output:
Data Lookback LookbackSum out
0 1 0 1 1
1 1 1 2 2
2 1 2 3 3
3 2 2 4 4
4 3 1 5 5
5 2 3 8 8
6 3 3 10 10
7 2 2 7 7
8 1 3 8 8
9 2 1 3 3

Cluster Rows in Data Subgroups

I have a dataset df of object components in 3-d space - each ID represents an object which has various components:
ID Comp x y z
A 1 2 2 1
A 2 2 1 -1
A 3 -10 1 -10
A 4 -1 3 -5
B 1 3 0 0
B 2 3 0 -5
...
I would like to loop through each ID, using a clustering technique in sklearn to create clusters of components (Comp) based on each component's (x,y,z) co-ordinates - to achieve something like this:
ID Comp x y z cluster
A 1 2 2 1 1
A 2 2 1 -1 1
A 3 -10 1 -10 2
A 4 -1 3 -5 3
B 1 3 0 0 1
B 2 3 0 -5 1
...
As an example - ID:A,Comp:1 is incluster1, whereasID:A, Comp:4 is in cluster 3. (I plan to then concatenate ID and cluster later).
I'm having no luck with the following groupby + apply:
from sklearn.cluster import AffinityPropagation
ap = AffinityPropagation()
df['cluster']=df.groupby(['ID','Comp']).apply(lambda x: ap.fit_predict(np.array([x.x,x.y,x.z]).T))
I could brute-force it by using a for loop over the ID but my dataset is large (~ 150k ID) and I'm worried about resource and time constraints. Any help would be great!
IIUC, I think you could try something like this:
def ap_fit_pred(x):
ap = AffinityPropagation()
return pd.Series(ap.fit_predict(x.loc[:,['x','y','z']]))
df['cluster'] = df.groupby('ID').apply(ap_fit_pred).reset_index(drop=True)

How to calculate amounts that row values greater than a specific value in pandas?

How to calculate amounts that row values greater than a specific value in pandas?
For example, I have a Pandas DataFrame dff. I want to count row values greater than 0.
dff = pd.DataFrame(np.random.randn(9,3),columns=['a','b','c'])
dff
a b c
0 -0.047753 -1.172751 0.428752
1 -0.763297 -0.539290 1.004502
2 -0.845018 1.780180 1.354705
3 -0.044451 0.271344 0.166762
4 -0.230092 -0.684156 -0.448916
5 -0.137938 1.403581 0.570804
6 -0.259851 0.589898 0.099670
7 0.642413 -0.762344 -0.167562
8 1.940560 -1.276856 0.361775
I am using an inefficient way. How to be more efficient?
dff['count'] = 0
for m in range(len(dff)):
og = 0
for i in dff.columns:
if dff[i][m] > 0:
og += 1
dff['count'][m] = og
dff
a b c count
0 -0.047753 -1.172751 0.428752 1
1 -0.763297 -0.539290 1.004502 1
2 -0.845018 1.780180 1.354705 2
3 -0.044451 0.271344 0.166762 2
4 -0.230092 -0.684156 -0.448916 0
5 -0.137938 1.403581 0.570804 2
6 -0.259851 0.589898 0.099670 2
7 0.642413 -0.762344 -0.167562 1
8 1.940560 -1.276856 0.361775 2
You can create a boolean mask of your DataFrame, that is True wherever a value is greater than your threshold (in this case 0), and then use sum along the first axis.
dff.gt(0).sum(1)
0 1
1 1
2 2
3 2
4 0
5 2
6 2
7 1
8 2
dtype: int64

Matlab to python logic difficulty for arrays

I have created an m-by-n matrix in MATLAB and can easily select a range of values within a certain column and row. For instance, if I have matrix A:
A =
0 0 0 0
1 2 3 4
5 6 7 8
9 10 11 12
I can isolate the values: 1,5 and 9 from the first column by typing: A(2:4,1). The results will yield [1;5;9]. As it relates to python, I am not sure how to index an array such that I have the desired values as above.
This can be done using numpy
a = numpy.matrix('0 0 0 0; 1 2 3 4; 5 6 7 8; 9 10 11 12')
Required result is a[1:,0] or a[1:4,0]
Only difference is that the array indexing start from 0 instead of 1.

Categories

Resources