I'm using an HC-SR04 sensor on a Raspberry Pi and I want to compare the readings. When I try to store the readings in an array, it only keeps one of them and gets overwritten constantly. How can I store all of them, or compare one reading to another?
distance = (pulse_duration * 34320)*0.5
distance = round(distance,2)
array = []
array.append(distance)
The output of this code is:
distance: 10.7cm
array: [10.7]
distance: 10.63cm
array: [10.63]
Your first issue, recreating the array on every iteration, is fixed by initializing the array outside of the loop that this code sits in. For data analysis beyond that, I would suggest placing these values into a NumPy array or a pandas Series/DataFrame. This will let you perform fast (and low-time-complexity) analysis on frequent sensor data. For example, instead of:
total = 0
for i in array:
    total += i
mean = total / len(array)
You can use the C-optimized NumPy function:
np.mean(sensor_matrix)
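A minimal sketch of the fixed reading loop, assuming the pulse measurement is wrapped in a (hypothetical) `read_pulse_duration()` helper, stubbed here with a fixed value:

```python
import numpy as np

def read_pulse_duration():
    # Hypothetical stand-in for the actual HC-SR04 echo timing
    return 0.000623  # seconds

readings = []  # initialize ONCE, outside the loop
for _ in range(5):
    pulse_duration = read_pulse_duration()
    distance = round((pulse_duration * 34320) * 0.5, 2)
    readings.append(distance)  # the list keeps every reading

print(readings)           # all five distances, not just the last one
print(np.mean(readings))  # fast mean over the whole series
```

With the list intact, comparisons like `readings[-1] - readings[-2]` between consecutive readings become straightforward.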
I have a 300,000-row pd.DataFrame with multiple columns, one of which holds a 50-dimensional NumPy array of shape (1, 50), like so:
ID Array1
1 [2.4252 ... 5.6363]
2 [3.1242 ... 9.0091]
3 [6.6775 ... 12.958]
...
300000 [0.1260 ... 5.3323]
I then generate a new numpy array (let's call it array2) with the same shape and calculate the cosine similarity between each row of the dataframe and the generated array. For this, I am currently using sklearn.metrics.pairwise.cosine_similarity and save the results in a new column:
from sklearn.metrics.pairwise import cosine_similarity
df['Cosine'] = cosine_similarity(df['Array1'].tolist(), array2)
This works as intended and takes, on average, 2.5 seconds to execute. I am trying to get this under 1 second, simply to reduce waiting time in the system I am building.
I am beginning to learn about Vaex and Dask as alternatives to pandas but am failing to convert the code I provided to a working equivalent that is also faster.
Preferably with one of the technologies I mentioned, how can I go about making pairwise cosine calculations even faster for large datasets?
You could use Faiss here and run a k-nearest-neighbour search. To do this, you would put the rows of the DataFrame into a Faiss index and then search it with your query array, using k equal to the total number of rows of your DataFrame (300,000 here), so every row gets a score.
import faiss
import numpy as np

dimension = 50                     # length of each stored vector
index = faiss.IndexFlatIP(dimension)

# Add the rows of the DataFrame to the Faiss index in one batch
vectors = np.vstack(df['Array1'].values).astype('float32')
index.add(vectors)

k = len(df)                        # return a result for every row
D, I = index.search(array2.astype('float32').reshape(1, -1), k)
Note that you'll need to L2-normalise the vectors beforehand to get cosine similarity, since the solution above is based on the raw inner product.
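Since cosine similarity is just the inner product of L2-normalised vectors, the normalisation trick can be sketched in plain NumPy first (the shapes here are made-up small stand-ins for the 300,000-row data):

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.random((300, 50)).astype('float32')  # stand-in for the stacked Array1 column
query = rng.random((1, 50)).astype('float32')      # stand-in for array2

# L2-normalise rows so that a plain inner product equals cosine similarity
vn = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
qn = query / np.linalg.norm(query)

cosine = vn @ qn.ravel()  # shape (300,): one similarity per row
```

The same normalisation, applied before `index.add`, makes the inner-product index return cosine similarities.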
I want to create a loop for the following line in Python (I use PyCharm):
mean_diff = np.mean(np.array([diff_list[0].values, diff_list[1].values, diff_list[2].values, diff_list[3].values, ..., diff_list[100].values]), axis=0)
With this I get the mean of each individual cell across the different arrays (raster change over time).
I tried the following:
for x in range(100):
    mean_diff = np.mean(np.array([diff_list[x].values]), axis=0)
But what happens here is that the loop overwrites mean_diff on every iteration, effectively mixing the mean of the last iteration with the new array, instead of stacking everything up first and computing the mean of the whole stack afterwards. One idea was to build a "sum array" first with all the diff_list values in it, but I failed at that too. diff_list itself is a list containing data frames (each row holds an array, so it is effectively a 3-D structure; see the picture showing the structure of the list).
You need to populate the array within the loop, not do the computation there. Python list comprehensions are perfect for this. Your first program is equivalent to:
mean_diff = np.mean(np.array([a.values for a in diff_list[:101]]), axis=0)
Or if you prefer:
x = []
for a in diff_list[:101]:
    x.append(a.values)
mean_diff = np.mean(np.array(x), axis=0)
If you are using the whole list rather than just its first 101 elements, you can drop the [:101].
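A small self-contained check of the collect-then-average approach, with plain arrays standing in for the `diff_list[i].values` rasters:

```python
import numpy as np

# Stand-ins for diff_list[i].values: three 2x2 "rasters"
diff_values = [np.array([[1., 2.], [3., 4.]]),
               np.array([[3., 4.], [5., 6.]]),
               np.array([[5., 6.], [7., 8.]])]

# Stack everything first, then take the cell-wise mean once
mean_diff = np.mean(np.array([a for a in diff_values]), axis=0)
print(mean_diff)  # [[3. 4.] [5. 6.]]
```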
I'd like to adjust certain values once they exceed a threshold of 180. Here is a sample of the code:
mxn = Lon.shape
lon = []
for i in range(mxn[0]):
    for j in range(mxn[1]):
        if Lon[i,j] > 180:
            lon.append(Lon[i,j] - 360)
        elif Lon[i,j] <= 180:
            lon.append(Lon[i,j])
Essentially, I'd like to convert the longitude range from 0-360 to -180 to 180. When I run this loop, however, it returns a single flat array rather than a matrix matching the size of Lon, the original matrix. I know there is a way to do it, but I'm having a hard time finding a good resource showing how. Thanks in advance.
You are just appending all the values to a single flat list. Try building a sub-list in the nested for loop and then appending that to lon:
for i in range(mxn[0]):
    sub = []
    for j in range(mxn[1]):
        if Lon[i,j] > 180:
            sub.append(Lon[i,j] - 360)
        else:
            sub.append(Lon[i,j])
    lon.append(sub)
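Since Lon is presumably a NumPy array, the same adjustment can also be done without explicit loops; a minimal sketch using np.where, with made-up sample longitudes:

```python
import numpy as np

Lon = np.array([[10., 190.], [350., 180.]])  # made-up sample longitudes

# Shift anything above 180 down by 360, keep the rest unchanged
lon = np.where(Lon > 180, Lon - 360, Lon)  # rows: [10., -170.] and [-10., 180.]
```

np.where preserves the shape of Lon, so no sub-list bookkeeping is needed.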
My problem involves several moving objects, and we calculate the distance between these objects at different time frames.
I have an ndarray A with shape (a, b) which stores distances, where a is the number of frames and b is the number of coordinates on which the distance is calculated.
I have a list L which has the names of these objects. It has a length of b.
I want to find where the distance value is 1, and then locate the name at that index in list L (which shares the same indexing). I wrote the following:
A=[[nd array]]
L=[list of names]
list_to_array=np.array(L)
array_of_names_meeting_criteria=list_to_array[np.where(A==1)[1]]
The problem is that I am not getting names per frame. I want this array split frame-wise so I get shape (a, x), where a is the number of frames and x is the number of matching names in each frame.
Sample case:
A = np.array([[1,2,2,6],[3,4,5,1],[3,1,17,4],[2,3,1,5]])
L=[('cat','dog'),('lion','elephant'),('man','women'),('fish','shark')]
list_to_array = np.array(L)
array_of_names_meeting_criteria=list_to_array[np.where(A==1)[1]]
I should get the following:
[['cat','dog'],['fish','shark'],['lion','elephant'],['man','women']]
I just made some minor edits to your code and here's the result:
A = np.array([[1,2,2,6],[3,4,5,1],[3,1,17,4],[2,3,1,5]])
L = [('cat','dog'),('lion','elephant'),('man','women'),('fish','shark')]
list_to_array = np.array(L)
array_of_names_meeting_criteria = list_to_array[np.where(A==1)[1]]
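Running the sample case end to end confirms the expected frame order (here exactly one distance per frame equals 1, so the result lines up frame by frame):

```python
import numpy as np

A = np.array([[1, 2, 2, 6], [3, 4, 5, 1], [3, 1, 17, 4], [2, 3, 1, 5]])
L = [('cat', 'dog'), ('lion', 'elephant'), ('man', 'women'), ('fish', 'shark')]
list_to_array = np.array(L)

# np.where scans row-major, so the column indices come out in frame order: [0, 3, 1, 2]
cols = np.where(A == 1)[1]
names = list_to_array[cols]
print(names.tolist())
# [['cat', 'dog'], ['fish', 'shark'], ['lion', 'elephant'], ['man', 'women']]
```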
I am trying to do the following in NumPy without using a loop:
I have a matrix X of dimensions N*d and a vector y of dimension N.
y contains integers ranging from 1 to K.
I am trying to get a matrix M of size K*d, where M[i,:]=np.mean(X[y==i,:],0)
Can I achieve this without using a loop?
With a loop, it would go something like this.
import numpy as np
N=3
d=3
K=2
X=np.eye(N)
y=np.random.randint(1,K+1,N)
M=np.zeros((K,d))
for i in np.arange(0, K):
    line = X[y==i+1, :]
    if line.size == 0:
        M[i,:] = np.zeros(d)
    else:
        M[i,:] = np.mean(line, 0)
Thank you in advance.
The code is basically collecting specific rows of X and adding them up, for which NumPy has a builtin in np.add.reduceat. With that in focus, the steps to solve it in a vectorized way are as listed next:
# Get sort indices of y
sidx = y.argsort()
# Collect rows of X ordered by label so that equal IDs are consecutive
Xr = X[sidx]
# Get unique row IDs, start positions of each unique ID
# and their counts to be used for average calculations
unq,startidx,counts = np.unique((y-1)[sidx],return_index=True,return_counts=True)
# Add rows off Xr based on the slices signified by the start positions
vals = np.true_divide(np.add.reduceat(Xr,startidx,axis=0),counts[:,None])
# Setup output array and set row summed values into it at unique IDs row positions
out = np.zeros((K,d))
out[unq] = vals
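Putting the steps above together on a tiny hand-checkable example:

```python
import numpy as np

K, d = 2, 3
X = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [3., 1., 0.]])
y = np.array([2, 1, 2])  # labels in 1..K, as in the question
N = len(y)

sidx = y.argsort()
Xr = X[sidx]
unq, startidx, counts = np.unique((y - 1)[sidx],
                                  return_index=True, return_counts=True)
vals = np.true_divide(np.add.reduceat(Xr, startidx, axis=0), counts[:, None])
out = np.zeros((K, d))
out[unq] = vals
print(out)
# row 0 = mean of rows with y == 1 -> [0.  1.  0. ]
# row 1 = mean of rows with y == 2 -> [2.  0.5 0. ]
```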
This alternative solves the question, but it creates an intermediate K×N boolean matrix and does not use the built-in mean function, which may hurt performance or numerical stability in some cases. Here I let the class labels range from 0 to K-1 rather than 1 to K.
import numpy as np

# Define constants
K, N, d = 10, 1000, 3

# Sample data
Y = np.random.randint(0, K-1, N)  # K-1 to omit one class, testing the no-examples case
X = np.random.randn(N, d)

# Calculate means for each class, vectorized:
# map samples to labels by taking a logical "outer product"
mark = Y[None,:] == np.arange(0, K)[:,None]

# Count the number of examples in each class
count = mark.sum(1)

# Avoid divide-by-zero if a class has no examples
count += count == 0

# Sum within each class and normalize
M = (np.dot(mark, X).T / count).T
print(M, M.shape, mark.shape)
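A quick sanity check of the outer-product approach against a direct per-class loop (random data, labels 0..K-1 as in this answer; class K-1 is deliberately left empty):

```python
import numpy as np

K, N, d = 4, 50, 3
rng = np.random.default_rng(1)
Y = rng.integers(0, K - 1, N)       # classes 0..K-2, so class K-1 has no samples
X = rng.standard_normal((N, d))

mark = Y[None, :] == np.arange(K)[:, None]  # (K, N) boolean membership matrix
count = mark.sum(1)
count = count + (count == 0)                # guard empty classes against 0-division
M = np.dot(mark, X) / count[:, None]        # per-class sums, normalised to means
```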