I have code that determines a value by two different methods. My data is in a DataFrame; I choose two time intervals and, by varying the times within those two intervals, I look for the smallest angle (in degrees) between the two methods' results.
My data has the following form (it's just an example):
df:
epoch r1 r2 r3
2020-07-07T08:17 -6.366163 11.8 -1.2
2020-07-07T08:18 -5.163 10.38 -2.5
2020-07-07T08:19 -4.3 9.4 5.2
...
2020-07-07T14:00 10.25 22.2 1.5
Here is my data https://www.dropbox.com/s/39yre6y85luu3tj/example.csv?dl=0
I divide the data set into two selected regions, e.g. 2020-07-07 09:10 to 10:00 as the first selected region, and
2020-07-07 11:10 to 13:00 as the second.
For example:
import pandas as pd
from datetime import datetime

df['epoch'] = pd.to_datetime(df.epoch, format='%Y-%m-%d %H:%M:%S.%f')
df['time'] = df['epoch'].dt.strftime('%H:%M:%S.%f')

first_datetime = '2020-07-07 09:10'
second_datetime = '2020-07-07 10:00'
third_datetime = '2020-07-07 11:10'
fourth_datetime = '2020-07-07 13:00'

# strip the date part, keeping only 'HH:MM' strings for comparison with df['time']
first = datetime.strptime(first_datetime, "%Y-%m-%d %H:%M").strftime('%H:%M')
second = datetime.strptime(second_datetime, "%Y-%m-%d %H:%M").strftime('%H:%M')
third = datetime.strptime(third_datetime, "%Y-%m-%d %H:%M").strftime('%H:%M')
fourth = datetime.strptime(fourth_datetime, "%Y-%m-%d %H:%M").strftime('%H:%M')
I then run two different procedures on the divided regions, where df_1 is the first selected region and df_2 the second.
The first method is essentially matrix work (the eigensystem of a covariance matrix):
import numpy as np

df_1 = df.loc[(df["time"] >= first) & (df["time"] <= second)]
df_2 = df.loc[(df["time"] >= third) & (df["time"] <= fourth)]
df_add = pd.concat([df_1, df_2], ignore_index=True)  # DataFrame.append is deprecated

rx, ry, rz = (df_add['r1'], df_add['r2'], df_add['r3'])
data = np.array([rx, ry, rz])
covMatrix = np.cov(data, bias=True)
eigval_mva, eigvec_mva = np.linalg.eig(covMatrix)
eigval_mva = eigval_mva.real

# eigenvector belonging to the smallest eigenvalue (direction of minimum variance)
mincol = eigval_mva.argmin()
eigenvect = eigvec_mva[:, mincol]
print(eigenvect)
The second is also algorithmic, built from the means of each region:
rx_1, ry_1, rz_1 = df_1['r1'].mean(), df_1['r2'].mean(), df_1['r3'].mean()
rx_2, ry_2, rz_2 = df_2['r1'].mean(), df_2['r2'].mean(), df_2['r3'].mean()

r_1_vec = np.array([rx_1, ry_1, rz_1])
r_2_vec = np.array([rx_2, ry_2, rz_2])

def norm(v):
    return v / np.linalg.norm(v)

# direction in the plane of r_1 and r_2, orthogonal to their difference
v = np.cross(r_1_vec - r_2_vec, np.cross(r_1_vec, r_2_vec))
nor = norm(v)
print(nor)
Finally, I compare the two methods by the angle between their results:
def angle(v1, v2):
    unit_vector_1 = v1 / np.linalg.norm(v1)
    unit_vector_2 = v2 / np.linalg.norm(v2)
    # clip to guard against round-off pushing the dot product outside [-1, 1]
    dot_product = np.clip(np.dot(unit_vector_1, unit_vector_2), -1.0, 1.0)
    return np.arccos(dot_product)

a = angle(eigenvect, nor)
deg = np.rad2deg(a)
print(deg)
So, my question is: how can I run this process many times, varying the times within the selected regions, and then pick the smallest angle it gives?
For example, I restrict the first region to lie between 09:00 and 11:00 and the second between 11:00 and 12:00, vary the sub-intervals within these restricted windows, find the combination giving the smallest possible angle, and print the pair of time intervals that produces it.
In other words, I restrict df_1 and df_2 to some windows and vary them within those windows to find the smallest degree value.
How can I implement this?
Edit: as requested, my explanation.
As stated above, the purpose of the whole procedure is to search for the two intervals to use as df_1 and df_2. By adjusting the intervals, I want to reach the pair on which the two methods agree most closely, i.e. with the smallest angular difference.
A good pair of intervals is one where the two methods differ by less than 1 degree; anything above 10 degrees is bad.
I can find intervals on which the methods agree well by hand, but I may be missing intervals that are far more accurate. So, in order to find the true value (or values closer to it), I need to check every possible combination within the constraints that I set for the two windows.
Well, here are some pointers towards a mechanical answer to the question. It feels unsatisfying because I still don't fully understand the higher-level objective function we want to optimize.
My understanding so far is that a df is given, containing a time series of points in 3 dimensions. The OP wants to find two contiguous and distinct subsets of df (defined by two non-overlapping intervals of time), such that some measure of dissimilarity between the two subsets is minimized. These two subsets are two clouds of points. Two metrics are used:
- the direction of minimum variance of the combined clouds, obtained by PCA (the eigensystem of the covariance matrix of all points);
- the direction orthogonal to the vector difference between the clouds' barycenters and coplanar with (origin, mean_1, mean_2); since the problem is 3-dimensional, the OP uses cross products to compute it.
The measure to minimize is the angle between these two metrics. No intuitive explanation as to why that is desirable has been provided at the time of this writing. In addition to the lack of intuition, there are caveats to both metrics (see the Discussion section below).
In any case, the following is a brute-force answer. It will take forever on any reasonably sized dataset. It examines every pair of intervals satisfying the OP's constraints and selects the pair for which the measure is minimal. The constraints are that the two intervals are each drawn from bigger, non-overlapping ones that are given.
Some code cleanup
import pandas as pd
import numpy as np

def select(df, t0, t1, t2, t3):
    df1 = df.truncate(before=t0, after=t1)
    df2 = df.truncate(before=t2, after=t3)
    return df1, df2

def eigen(df1, df2):
    data = pd.concat([df1, df2], axis=0).values.T
    covMatrix = np.cov(data, bias=True)
    w, v = np.linalg.eig(covMatrix)
    return w.real, v

def norm(v):
    return v / np.linalg.norm(v)

def crossdir(df1, df2):
    r1 = df1.mean().values
    r2 = df2.mean().values
    v = np.cross(r1 - r2, np.cross(r1, r2))
    return norm(v)

def angle(v1, v2):
    dp = np.dot(norm(v1), norm(v2))
    return np.arccos(dp)
def measure(df1, df2, eig_mm_min_ratio=1/3):
    w, v = eigen(df1, df2)
    # guard against ill-conditioned eigensystems via the min/median eigenvalue ratio
    wwr = w.min() / np.median(w)  # eigenvalues w, not the eigenvector matrix v
    if wwr < eig_mm_min_ratio:
        return float('inf')
    vmin = v[:, w.argmin()]  # eigenvector with minimal eigenvalue
    nor = crossdir(df1, df2)
    return angle(vmin, nor)
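For instance, a quick usage sketch (assuming df is the question's frame, re-indexed by epoch and restricted to the r1, r2, r3 columns):

df = df.set_index('epoch')[['r1', 'r2', 'r3']]
df1, df2 = select(df, '2020-07-07 09:10', '2020-07-07 10:00',
                      '2020-07-07 11:10', '2020-07-07 13:00')
print(np.rad2deg(measure(df1, df2)))  # angle between the two methods, in degrees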
A brute force approach
Edit: from discussion in comments, it appears that a constraint can be exploited: the two intervals have to be drawn from two larger intervals that are given.
from itertools import combinations, product
from math import comb
from tqdm.notebook import tqdm

def gen_intervals(df, t0, t1, t2, t3):
    """
    Given DataFrame df with DateTimeIndex, generate all
    pairs of non-overlapping time intervals with endpoints
    picked from df.index, s.t. the first is within
    [t0,t1] and the second within [t2,t3].
    """
    ix1 = df.truncate(before=t0, after=t1).index
    ix2 = df.truncate(before=t2, after=t3).index
    for (t0, t1), (t2, t3) in product(
            combinations(ix1, 2), combinations(ix2, 2)):
        yield t0, t1, t2, t3

def n_intervals(df, t0, t1, t2, t3):
    """
    Give the number of interval pairs that gen_intervals
    would generate given the same arguments.
    """
    n1 = df.truncate(before=t0, after=t1).shape[0]
    n2 = df.truncate(before=t2, after=t3).shape[0]
    return comb(n1, 2) * comb(n2, 2)

def brute(df, t0, t1, t2, t3, eig_mm_min_ratio=1/3):
    best = float('inf')
    best_ix = None
    nc = n_intervals(df, t0, t1, t2, t3)
    pb = tqdm(gen_intervals(df, t0, t1, t2, t3), total=nc)
    for ix in pb:
        df1, df2 = select(df, *ix)
        y = measure(df1, df2, eig_mm_min_ratio=eig_mm_min_ratio)
        if y < best:
            best = y
            best_ix = ix
            ixstr = [f'{t:%H:%M}' for t in ix]
            pb.set_description(f'y={y:4g} for ix={ixstr}')
    return best_ix, best
Small reproducible example
n = 4 # knots
np.random.seed(0)
df = pd.DataFrame(
np.random.uniform(-10, 10, (n, 3)),
index=pd.date_range('2020-07-07', freq='10min', periods=n),
columns=['r1', 'r2', 'r3']
)
df = df.resample('min').interpolate('cubic')
Then:
# "big intervals" that should contain the small ones
interval0 = '2020-07-07 00:00:00', '2020-07-07 00:10:00'
interval1 = '2020-07-07 00:20:00', '2020-07-07 00:30:00'
# brute force exploration
ix, y = brute(df, *interval0, *interval1)
# takes ~5 sec for a df of 31 points
>>> ix
(Timestamp('2020-07-07 00:01:00', freq='T'),
Timestamp('2020-07-07 00:03:00', freq='T'),
Timestamp('2020-07-07 00:23:00', freq='T'),
Timestamp('2020-07-07 00:25:00', freq='T'))
>>> y
0.00264...
Plot df, the two "big" intervals to choose from, and the two intervals found:
ax = df.plot()
ax.axvspan(*interval0, color='grey', alpha=0.1)
ax.axvspan(*interval1, color='grey', alpha=0.1)
ax.axvspan(ix[0], ix[1], color='grey', alpha=0.3)
ax.axvspan(ix[2], ix[3], color='grey', alpha=0.3)
Discussion
We need more information about the meaning of the measure being minimized. Not only is it very possible that drastically more efficient optimization techniques could be applied, but the actual validity of the measure is called into question.
The meaning of the calculated measure is important and deserves scrutiny. We should also note conditions under which the measure can be unstable and yield unexpected results. For example, the axis of minimum variance (found by PCA) can be arbitrarily unstable: a near-spherical cloud of points (more precisely, one with near-spherical inertia) will yield a random direction that can be completely altered by as little as a single point added or removed.
Likewise, the direction built from the two barycenters can be arbitrarily unstable if they are too close to each other.
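To illustrate the first caveat, here is a small sketch (synthetic data, not the OP's): the minimum-variance direction of a near-spherical cloud can move substantially when a single point is added.

import numpy as np

np.random.seed(1)
cloud = np.random.normal(0, 1, (100, 3))  # near-spherical cloud

def min_var_dir(pts):
    w, v = np.linalg.eig(np.cov(pts.T, bias=True))
    return v[:, w.real.argmin()]

d0 = min_var_dir(cloud)
d1 = min_var_dir(np.vstack([cloud, [3.0, 3.0, 3.0]]))  # add one outlying point
# compare via |cos| since the eigenvector sign is arbitrary;
# values well below 1 indicate the direction has swung
print(abs(np.dot(d0, d1)))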
According to the original paper by Huang (https://arxiv.org/pdf/1401.4211.pdf), the marginal Hilbert spectrum is given by:

$$h(\omega) = \int_0^{\infty} P(\omega, A)\, A^2\, dA$$

where A = A(ω, t) (i.e., a function of time and frequency) and P(ω, A) is the joint probability density function of the frequency [ωᵢ] and amplitude [Aᵢ].
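In discrete form, with $P_{jk}$ the estimated density in frequency bin $j$ and amplitude bin $k$, this becomes a per-frequency sum over the amplitude bins:

$$h_j \approx \sum_k P_{jk}\, A_k^2\, \Delta A$$

which is what the code below approximates.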
I am trying to estimate: 1) the joint probability density, using plt.hist2d; and 2) the integral above, using a sum.
The code I am using is the following:
IA_flat1 = np.ravel(IA) ### Turn matrix to 1 D array
IF_flat1 = np.ravel(IF) ### Here IA corresponds to A
IF_flat = IF_flat1[(IF_flat1>min_f) & (IF_flat1<fs)] ### Keep only desired frequencies
IA_flat = IA_flat1[(IF_flat1>min_f) & (IF_flat1<fs)] ### Keep IA that correspond to desired frequencies
### return the Joint probability density
Pjoint,f_edges, A_edges,_ = plt.hist2d(IF_flat,IA_flat,bins=[bins_F,bins_A], density=True)
plt.close()
### np.digitize returns 1-based bin indices; subtract 1 so that (n1, n2) index Pjoint directly
n1 = np.digitize(IA_flat, A_edges).astype(int) - 1 ### return the indices of the bins to which
n2 = np.digitize(IF_flat, f_edges).astype(int) - 1 ### each value in the input array belongs
### define integration function
from numba import jit, prange  ### numba is used for speed

@jit(nopython=True, parallel=True)
def get_int(A_edges, Pjoint, IA_flat, n1, n2):
    dA = np.diff(A_edges)[0]               ### find dA for the integration
    sum_h = np.zeros(np.shape(Pjoint)[0])  ### initialize array
    for j in prange(np.shape(Pjoint)[0]):
        h = np.zeros(np.shape(Pjoint)[1])  ### initialize array
        for k in prange(np.shape(Pjoint)[1]):
            needed = IA_flat[(n1 == k) & (n2 == j)]  ### keep only the elements of the
                                                     ### array that belong to Pjoint[j,k]
            h[k] = Pjoint[j, k] * np.nanmean(needed**2) * dA  ### Pjoint*A^2*dA
        sum_h[j] = np.nansum(h)            ### sum over k of Pjoint*A^2*dA
    return sum_h

### now run the previously defined function
sum_h = get_int(A_edges, Pjoint, IA_flat, n1, n2)
1) I am not sure that everything is correct, though. Any suggestions or comments on what I might be doing wrong?
2) Is there a way to do the same using a scipy integration scheme?
You can extract the probability from the 2D histogram and use it for the integration:
# Added some numbers to have something to run
import numpy as np
import matplotlib.pyplot as plt
IA = np.random.rand(100,100)
IF = np.random.rand(100,100)
bins_F = np.linspace(0,1,20)
bins_A = np.linspace(0,1,100)
min_f = 0
fs = 1.0
IA_flat1 = np.ravel(IA) ### Turn matrix to 1 D array
IF_flat1 = np.ravel(IF) ### Here IA corresponds to A
IF_flat = IF_flat1[(IF_flat1>min_f) & (IF_flat1<fs)] ### Keep only desired frequencies
IA_flat = IA_flat1[(IF_flat1>min_f) & (IF_flat1<fs)] ### Keep IA that correspond to desired frequencies
### return the Joint probability density
Pjoint,f_edges, A_edges,_ = plt.hist2d(IF_flat,IA_flat,bins=[bins_F,bins_A], density=True)
f_values = (f_edges[1:]+f_edges[:-1])/2
A_values = (A_edges[1:]+A_edges[:-1])/2
dA = A_values[1]-A_values[0] # for the integral
#Pjoint.shape (19,99)
h = np.zeros(f_values.shape)
for i in range(len(f_values)):
    f = f_values[i]
    # row of the histogram at frequency f: P(f, A) over the amplitude bins
    p = Pjoint[i]
    # discrete sum equivalent to the integral over A
    integral_result = np.sum(p * A_values**2 * dA)
    h[i] = integral_result
plt.figure()
plt.plot(f_values,h)
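Regarding question 2): the plain sum above can be swapped for a scipy quadrature rule, e.g. Simpson's rule over the amplitude axis. A sketch reusing Pjoint, f_values and A_values from above (scipy.integrate.simpson requires scipy >= 1.6; older versions call it simps):

from scipy.integrate import simpson

h_simpson = np.zeros(f_values.shape)
for i in range(len(f_values)):
    # integrate P(f_i, A) * A^2 over A with Simpson's rule
    h_simpson[i] = simpson(Pjoint[i] * A_values**2, x=A_values)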
I'm trying to calculate the Fourier transform of three muon polarization signals, which are simply cosine functions multiplied by an exponential decay.
So, doing the Fourier transform, we are going to see broadened peaks centered at the corresponding frequency.
The problem is that I have already tried to do the Fourier transform, but I do not know whether it is correct. Furthermore, I am trying to calculate the FWHM using the scipy.stats.moment function with the 2nd moment: is that correct?
Can you tell me if the code is correct?
I put the three signals in .npy files here, together with the code used for the Fourier analysis.
The signals are signal[0], signal[1] and signal[2], arrays of length 10.
Each signal[k] contains 10 polarization functions (one for each applied magnetic field), each a signal of 400 points.
The corresponding files are signal_100, signal_110, signal_111, provided here:
https://github.com/JonathanFrassineti/UNDI-examples.
Ah, and the frequencies range from 0 Hz to 40 MHz.
Thank you!
from scipy.stats import moment

N = 400        # number of signal points
T = 1. / 800.  # sampling spacing
xf = np.fft.rfftfreq(N, T)  # the frequency axis must use the signal's own N

# allocate one array per quantity; a chained assignment like
# yf1 = FWHM1 = ... = np.zeros(...) would make them all the same object
yf1, FWHM1, sigma1, delta1, bhar1 = (np.zeros(fields, dtype=object) for _ in range(5))
yf2, FWHM2, sigma2, delta2, bhar2 = (np.zeros(fields, dtype=object) for _ in range(5))
yf3, FWHM3, sigma3, delta3, bhar3 = (np.zeros(fields, dtype=object) for _ in range(5))

for j in range(fields):
    # Fourier transform
    yf1[j] = np.fft.rfft(signal[0][j])
    yf2[j] = np.fft.rfft(signal[1][j])
    yf3[j] = np.fft.rfft(signal[2][j])

    FWHM1[j] = moment(yf1[j], moment=2)
    FWHM2[j] = moment(yf2[j], moment=2)
    FWHM3[j] = moment(yf3[j], moment=2)

    sigma1[j] = np.sqrt(np.abs(FWHM1[j]))/2.355  # was FWHM3: copy-paste slip
    sigma2[j] = np.sqrt(np.abs(FWHM2[j]))/2.355
    sigma3[j] = np.sqrt(np.abs(FWHM3[j]))/2.355

    delta1[j] = sigma1[j]/gamma_Cu
    delta2[j] = sigma2[j]/gamma_Cu
    delta3[j] = sigma3[j]/gamma_Cu

    bhar1[j] = (((a*angtom)**3)/(1e-7*gamma_Cu*hbar))*delta1[j]
    bhar2[j] = (((a*angtom)**3)/(1e-7*gamma_Cu*hbar))*delta2[j]
    bhar3[j] = (((a*angtom)**3)/(1e-7*gamma_Cu*hbar))*delta3[j]
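One caveat worth flagging here: scipy.stats.moment applied to the complex rfft output is not a spectral width. A sketch of reading the FWHM directly off the power spectrum instead (assuming xf is built with the signal's own N, as above, so it matches the length of each yf):

def fwhm_from_spectrum(freqs, spectrum):
    """Approximate width of the main peak at half its maximum power."""
    power = np.abs(spectrum)**2
    half = power.max() / 2
    above = np.where(power >= half)[0]  # indices where power exceeds half-maximum
    return freqs[above[-1]] - freqs[above[0]]

# e.g. fwhm_from_spectrum(xf, yf1[j]) for each field j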
I am currently working on a Python project with the same goal. I have a set of magnetic field data B(x, y, z); I think the ideal approach is to organize your data as periodic events and deduce fe (the sampling rate). A cleaned-up sketch of the idea (fe is assumed to be known):

import numpy as np
from numpy.fft import fft, fftfreq

fe = 1.0  # assumed sampling rate
f = lambda A, t: A * (np.cos(2*np.pi*fe*t) - np.sin(2*np.pi*fe*t))

B = [50, 50, 10, 3]  # where each entry is |B| per second
res = [f(a, t) for t, a in enumerate(B)]
fourier_transform = fft(res)
frequency = fftfreq(len(B))  # you can also use the fftfreq provided by scipy

Please consider starring the RFSignalToolkit project on GitHub if you would like to contribute.
I simply want to see how long it takes this code to execute. There is a similar question here:
timeit module in python does not recognize numpy module
and I understand what they are saying, but I don't see where those lines of code should be placed. Here is what I have. I know it's a little long to scroll through, but you can see where I have placed the timeit commands at the beginning and end. This is not working, and I am guessing that is because I have placed the timeit lines incorrectly. The code works if I delete the timeit stuff.
Thanks
import timeit
import numpy as np

u = timeit.Timer("np.arange(1000)", setup='import numpy as np')
#set up variables
m = 4.54
g = 9.81
GR = 8
r_pulley = .1
th1=np.pi/4 #based on motor 1 encoder counts. Number of degrees rotated from + x-axis of base frame 0
th2=np.pi/4 #based on motor 2 encoder counts. Number of degrees rotated from + x-axis of m1 frame 1
th3_motor = np.pi/4*12
th3_pulley = th3_motor/GR
#required forces in x,y,z at end effector
fx = 1
fy = 1
fz = m*g #need to figure this out
l1=6
l2=5
l3=th3_pulley*r_pulley
#Build Homogeneous Transform Matrices
H1_0 = np.array(([np.cos(th1),-np.sin(th1),0,0],[np.sin(th1),np.cos(th1),0,0],[0,0,1,l3],[0,0,0,1]))
H2_1 = np.array(([np.cos(th2),-np.sin(th2),0,l1],[np.sin(th2),np.cos(th2),0,0],[0,0,1,0],[0,0,0,1]))
H3_2 = np.array(([1,0,0,l2],[0,1,0,0],[0,0,1,0],[0,0,0,1]))
H2_0 = np.dot(H1_0,H2_1)
H3_0 = np.dot(H2_0,H3_2)
print(np.matrix(H3_0))
#These HTMs are using the way I derived them, not the "correct" way.
#The answers are the same, but I think the processing time will be the same.
#This is because either way the two matrices with all the sines and cosines...
#will be the same. Only difference is in one method the ones and zeroes...
#matrix is the first HTM, in the other method it is the last HTM. So its the...
#same number of matrices with the same information, just being dot-producted...
#in a different order.
#Build Jacobian
#np.cross(x, y)
d10 = H1_0[0:3, 3]
d20 = H2_0[0:3, 3]
d30 = H3_0[0:3, 3]
print(d30)
subt1 = d30-d10
subt2 = d30-d20
#tsubt1 = subt1.transpose()
#tsubt2 = subt2.transpose()
#print(tsubt1)
zeroes = np.array(([0,0,1]))
print(subt1)
print(subt2)
cross1 = np.cross(zeroes, subt1)
cross2 = np.cross(zeroes, subt2)
cross1
cross2
#These cross products are correct but need to be transposed into columns; right now they are a single row.
#tcross1=cross1.reshape(-1,1)
#tcross2=cross2.reshape(-1,1)
#don't actually need these transposes but I didn't want to forget the command.
# build jacobian (J)
#J = np.zeros((6,2))
#J[0:3,0] = cross1
#J[0:3,1] = cross2
#J[3:6,0] = zeroes
#J[3:6,1] = zeroes
#J
#find torques
J_force = np.zeros((2,3))
J_force[0,:]=cross1
J_force[1,:]=cross2
J_force
#build force matrix
forces = np.array(([fx],[fy],[fz]))
forces
torques = np.dot(J_force,forces)
torques #top number is theta 1 (M1) and bottom number is theta 2 (M2)
#need to add z axis?
print(u.timeit())
# u is a Timer that evaluates np.arange(1000)
u = timeit.Timer("np.arange(1000)", setup='import numpy as np')
# prints how many seconds it took to run np.arange(1000) 1,000,000 times
# (1000000 is the default number; you can set it by passing an int)
print(u.timeit())
So the following is what you want.
import timeit
import numpy as np

def main():
    # set up variables
    m = 4.54
    g = 9.81
    GR = 8
    r_pulley = .1
    th1 = np.pi/4  # based on motor 1 encoder counts; rotation from +x-axis of base frame 0
    th2 = np.pi/4  # based on motor 2 encoder counts; rotation from +x-axis of m1 frame 1
    th3_motor = np.pi/4*12
    th3_pulley = th3_motor/GR

    # required forces in x, y, z at the end effector
    fx = 1
    fy = 1
    fz = m*g  # need to figure this out
    l1 = 6
    l2 = 5
    l3 = th3_pulley*r_pulley

    # build homogeneous transform matrices
    H1_0 = np.array(([np.cos(th1),-np.sin(th1),0,0],[np.sin(th1),np.cos(th1),0,0],[0,0,1,l3],[0,0,0,1]))
    H2_1 = np.array(([np.cos(th2),-np.sin(th2),0,l1],[np.sin(th2),np.cos(th2),0,0],[0,0,1,0],[0,0,0,1]))
    H3_2 = np.array(([1,0,0,l2],[0,1,0,0],[0,0,1,0],[0,0,0,1]))
    H2_0 = np.dot(H1_0,H2_1)
    H3_0 = np.dot(H2_0,H3_2)
    print(np.matrix(H3_0))
    # These HTMs are derived my way, not the "correct" way. The answers are the same,
    # and I think the processing time will be the same: either way the two matrices
    # with all the sines and cosines are identical; the only difference is whether
    # the ones-and-zeroes matrix is the first HTM or the last, i.e. the same matrices
    # dot-producted in a different order.

    # build Jacobian
    d10 = H1_0[0:3, 3]
    d20 = H2_0[0:3, 3]
    d30 = H3_0[0:3, 3]
    print(d30)
    subt1 = d30 - d10
    subt2 = d30 - d20
    zeroes = np.array(([0,0,1]))
    print(subt1)
    print(subt2)
    cross1 = np.cross(zeroes, subt1)
    cross2 = np.cross(zeroes, subt2)
    # These cross products are correct but would need to be transposed into columns
    # (e.g. cross1.reshape(-1,1)) to build the full 6x2 Jacobian; not needed here.

    # find torques
    J_force = np.zeros((2,3))
    J_force[0,:] = cross1
    J_force[1,:] = cross2

    # build force matrix
    forces = np.array(([fx],[fy],[fz]))
    torques = np.dot(J_force, forces)
    return torques  # top number is theta 1 (M1), bottom number is theta 2 (M2)
    # need to add z axis?

u = timeit.Timer(main)  # Timer also accepts a callable directly
print(u.timeit(5))      # run main() 5 times and print the total time in seconds
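For steadier numbers, Timer.repeat runs the measurement several times so you can keep the minimum, which is the run least disturbed by other processes (a small variation on the above):

# run main() 5 times per measurement, repeat the measurement 3 times
print(min(timeit.Timer(main).repeat(repeat=3, number=5)))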
I want to efficiently calculate the average of a variable (say temperature) over multiple areas of the plane.
I essentially want to do the following:
import numpy as np

num = 10000
XYT = np.random.uniform(0, 1, (num, 3))
X, Y, T = XYT.T  # unpack columns

size = 10
bins = np.empty((size, size))
for i in range(size):
    for j in range(size):
        # intent of the pseudocode: average T over the points whose
        # rescaled (X, Y) fall in cell (i, j)
        in_cell = ((X * size).astype(int) == i) & ((Y * size).astype(int) == j)
        bins[i][j] = T[in_cell].mean()
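For what it's worth, scipy ships a ready-made routine that does exactly this binning-and-averaging (a sketch using the X, Y, T arrays above):

from scipy.stats import binned_statistic_2d

means, x_edges, y_edges, _ = binned_statistic_2d(X, Y, T, statistic='mean', bins=size)
# means[i, j] is the mean temperature of the points whose (X, Y) fall in cell (i, j)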
I would use pandas (although I'm sure you can achieve basically the same with vanilla numpy):

import pandas as pd

df = pd.DataFrame({'x': npX, 'y': npY, 'temp': npZ})
# solve quadrants: encode the sign pattern of (x, y) as an integer label
df['quadrant'] = (df['x'] >= 0)*2 + (df['y'] >= 0)*1
# group by and aggregate
mean_per_quadrant = df.groupby(['quadrant'])['temp'].aggregate(['mean'])
You may need to create multiple cutoffs to get unique groupings.
For example, (df['x']>=50)*4 + (df['x']>=0)*2 + (df['y']>=0)*1 would add an extra two regions to the grouping (one for y>=0 and one for y<0 within the new x band); just make sure you use powers of 2 so each sign pattern maps to a distinct label.
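Extending the same idea from quadrants to the question's size-by-size grid, pd.cut can build the bin labels (a sketch, assuming the X, Y, T arrays from the question):

import pandas as pd

df = pd.DataFrame({'x': X, 'y': Y, 'temp': T})
df['xbin'] = pd.cut(df['x'], bins=size, labels=False)
df['ybin'] = pd.cut(df['y'], bins=size, labels=False)
mean_per_cell = df.groupby(['xbin', 'ybin'])['temp'].mean()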