I am trying to rotate a series of points around the origin by a given angle, but the result is always a skewed version of the original graph.
Here is the method I am using; df is a pandas DataFrame containing x and y columns:
df['x'] = df['x']*math.cos(math.radians(45))-df['y']*math.sin(math.radians(45))
df['y'] = df['x']*math.sin(math.radians(45))+df['y']*math.cos(math.radians(45))
I don't understand why it is creating a skewed graph.
df['x'] = df['x']*math.cos(math.radians(45))-df['y']*math.sin(math.radians(45))
Here df['x'] is overwritten with the new value.
So the second line uses the new df['x'], which is wrong; it should use the original value instead:
df['y'] = df['x']*math.sin(math.radians(45))+df['y']*math.cos(math.radians(45))
Using a temporary variable fixes this:
dfx = df['x']*math.cos(math.radians(45))-df['y']*math.sin(math.radians(45))
df['y'] = df['x']*math.sin(math.radians(45))+df['y']*math.cos(math.radians(45))
df['x'] = dfx
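For completeness, here is a minimal sketch of the same fix without a temporary, assuming the same df as above. Both keyword arguments passed to assign are evaluated from the original columns before either column is replaced, so there is no ordering problem:
import math

theta = math.radians(45)
cos_t, sin_t = math.cos(theta), math.sin(theta)

# both right-hand sides are computed from the original df before assign runs
df = df.assign(
    x=df['x'] * cos_t - df['y'] * sin_t,
    y=df['x'] * sin_t + df['y'] * cos_t,
)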
I am trying to find a good way to run a (nonlinear, injective, multivariable) transformation on columns in a pandas DataFrame. The transform is a black box with multiple variables in and multiple variables out.
As an easy illustration, let's just consider converting r, theta coordinates to x, y coordinates. Run this for setup/context:
# set up example (all this is given in my case)
import numpy as np
import pandas as pd

def blackbox_transform(rtheta):
    x = rtheta[0] * np.cos(rtheta[1])
    y = rtheta[0] * np.sin(rtheta[1])
    return (x, y)

n = 50
r = np.ones(n)
theta = np.linspace(0, np.pi / 2, n)
r_theta = np.concatenate((r[:, None], theta[:, None]), axis=1)
df = pd.DataFrame(data=r_theta, columns=['r', 'theta'])
For the solution, this is the best I can come up with, but the apply and unpacking seem clunky (I'm hoping a pandas wizard has a better approach):
# solution
xy = df[['r', 'theta']].apply(blackbox_transform, axis=1)
df = pd.concat((df, pd.DataFrame(data=[*xy], columns=['x', 'y'], index=xy.index)), axis=1)
I get that using pandas may look a little silly here, but there's a lot of other information I have in the dataframe and I need to transform some numerics columns while keeping all the indices and other info straight.
Here is a slightly more readable approach:
out = df[['r', 'theta']].apply(blackbox_transform, axis=1).apply(pd.Series)
df = df.assign(x=out[0], y=out[1])
By the way, a lambda is unnecessary when you are just forwarding the same argument; you can pass the function directly.
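If you want to skip the extra apply(pd.Series) step, here is a small sketch using result_type='expand' (assuming the same blackbox_transform and df as in the setup), which unpacks the returned tuple into columns in one go:
out = df[['r', 'theta']].apply(blackbox_transform, axis=1, result_type='expand')
df = df.assign(x=out[0], y=out[1])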
Using fuglede's answer, it's easy to find the local extrema of a DataFrame column:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Generate a noisy AR(1) sample
np.random.seed(0)
rs = np.random.randn(200)
xs = [0]
for r in rs:
    xs.append(xs[-1] * 0.9 + r)
df = pd.DataFrame(xs, columns=['data'])
# Find local peaks
df['min'] = df.data[(df.data.shift(1) > df.data) & (df.data.shift(-1) > df.data)]
df['max'] = df.data[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)]
# Plot results
plt.scatter(df.index, df['min'], c='r')
plt.scatter(df.index, df['max'], c='g')
df.data.plot()
Which gives the following graph:
I would now like to group those extrema into pairs (a minimum and the neighboring maximum, in that order) and remove the pairs where maximum < minimum + threshold. By removing I mean replacing the corresponding values in df['min'] and df['max'] with NaNs.
This basically filters out the irrelevant small extrema.
I've tried find_peaks with various options, but none gave the intended results.
Is there an elegant and fast way to do this?
I think you may have missed the excellent answer from Foad reported here: Pandas finding local max and min.
Instead of calculating max and min with a shift of 1, you can set a window (number of neighbors) and find the local min and max of your values. Although there is no single window parameter that will fit perfectly, it reduces the noise substantially.
from scipy.signal import argrelextrema
# Find peaks in the window
n = 10 #window size
df['min'] = df.iloc[argrelextrema(df.data.values, np.less_equal, order=n)[0]]['data']
df['max'] = df.iloc[argrelextrema(df.data.values, np.greater_equal, order=n)[0]]['data']
I agree with the previous answer, but I think this might be closer to what you are asking for:
threshold = 0.8
points = df.dropna(subset=['min', 'max'], how='all').copy()
ddf = pd.merge(points['min'].dropna().reset_index(),
               points['max'].dropna().reset_index(),
               left_index=True,
               right_index=True)
ddf = ddf[ddf['max'] < (ddf['min'] + threshold)]
# Plot results
plt.scatter(ddf['index_x'], ddf['min'], c='r')
plt.scatter(ddf['index_y'], ddf['max'], c='g')
df.data.plot()
Although I suspect what you want is actually this:
threshold = 3
points = df.dropna(subset=['min', 'max'], how='all').copy()
ddf = pd.merge(points['min'].dropna().reset_index(),
               points['max'].dropna().reset_index(),
               left_index=True,
               right_index=True)
ddf = ddf[ddf['max'] > (ddf['min'] + threshold)]
# Plot results
plt.scatter(ddf['index_x'], ddf['min'], c='r')
plt.scatter(ddf['index_y'], ddf['max'], c='g')
df.data.plot()
To merge this back onto the original dataframe:
df['min'] = df.index.map(ddf.set_index('index_x')['min'])
df['max'] = df.index.map(ddf.set_index('index_y')['max'])
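Since the question mentions trying find_peaks: its prominence argument expresses a similar "ignore the small wiggles" idea directly. A sketch, assuming the df from the example above (this is an alternative, not the pairing logic used in this answer; the columns are rebuilt the same way as in the windowed approach):
from scipy.signal import find_peaks

threshold = 3
max_idx, _ = find_peaks(df.data.values, prominence=threshold)    # prominent maxima
min_idx, _ = find_peaks(-df.data.values, prominence=threshold)   # prominent minima
df['max'] = df.data.iloc[max_idx]
df['min'] = df.data.iloc[min_idx]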
I have a set of data points over time, but there is some missing data and the data is not at regular intervals. In order to get a full data set over time at regular intervals I did the following:
import pandas as pd
import numpy as np
from scipy import interpolate
x = data['time']
y = data['shares']
f = interpolate.interp1d(x, y, fill_value='extrapolate')
time = np.arange(0, 3780060, 600)
new_data = []
for interval in time:
    new_data.append(f(interval))
test = pd.DataFrame({'time': time, 'shares': new_data})
test = test.astype(float)
When both the original and the extrapolated data sets are plotted, they seem to line up almost perfectly, but I still wonder if there is a more efficient and/or accurate way to accomplish the above.
You should apply the interpolation function only once, like this:
new_data = f(time)
If your new time points fall within the original range, fill_value='extrapolate' is redundant, because that is plain interpolation. You only need 'extrapolate' if your new interval extends beyond the original one, and extrapolating that way is generally bad practice.
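Putting that together, a minimal sketch of the vectorized version (assuming data has the 'time' and 'shares' columns from the question):
import numpy as np
import pandas as pd
from scipy import interpolate

f = interpolate.interp1d(data['time'], data['shares'])
# add fill_value='extrapolate' only if `time` reaches outside data['time']
time = np.arange(0, 3780060, 600)
test = pd.DataFrame({'time': time, 'shares': f(time)}).astype(float)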
I have created a graph in Python, and I now need to take a section of it and expand it using a small range of the original data. However, I don't know how to find the row numbers of the results that form that range, or how to create a graph using just those rows from the file. This is the code I have for the graph:
import numpy as np
import matplotlib.pyplot as plt
#variable for data to plot
spec_to_plot = "SN2012fr_20121129.42_wifes_BR.dat"
#tells python where to look for the file
spec_directory = '/home/fh1u16/Documents/spectra/'
data = np.loadtxt(spec_directory + spec_to_plot, dtype=float)
x = data[:,0]
y = data[:,1]
plt.plot(x, y)
plt.xlabel("Wavelength")
plt.ylabel("Flux")
plt.title(spec_to_plot)
plt.show()
Edit: the first column runs from 3.5e+3 to 9.9e+3. I need to use just the rows where the first column is between 5.5e+3 and 6e+3 to plot another graph. Hope this makes a bit more sense?
Python version 2.7
If I understand you correctly, you could do it this way:
my_slice = slice(np.argwhere(x > 5.5e3)[0, 0], np.argwhere(x > 6e3)[0, 0])
x = data[my_slice,0]
y = data[my_slice,1]
np.argwhere(x > 5.5e3)[0, 0] is the index of the first occurrence of x > 5.5e3, and likewise for the end of the slice (assuming your data is sorted).
A more general way, which works even if your data is not sorted:
mask = (x>5.5e3) & (x<6e3)
x = data[mask, 0]
y = data[mask, 1]
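A short usage sketch combining the mask with the plotting code from the question (data, plt and spec_to_plot are assumed to be defined as above):
mask = (data[:, 0] > 5.5e3) & (data[:, 0] < 6e3)
plt.plot(data[mask, 0], data[mask, 1])
plt.xlabel("Wavelength")
plt.ylabel("Flux")
plt.title(spec_to_plot)
plt.show()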
Solved by using
plt.axis([5500, 6000, 0, 8e-15])
Thanks for the help.
I'm quite new to Python programming, and I was wondering if you can create a regular grid of 0.5 by 0.5 m resolution using LiDAR points.
My data are in LAS format (read with from liblas import file as lasfile) and have the following format: X, Y, Z, where X and Y are coordinates.
The points are randomly positioned, some pixels are empty (NaN value), and some pixels contain more than one point. Where there is more than one point, I wish to obtain the mean value. In the end I need to save the data in TIFF or ASCII format.
I am studying the osgeo module and GDAL, but honestly I don't know if the osgeo module is the best solution.
I would be really glad for help with some code that I can study and implement.
Thanks in advance for the help; I really need it.
I don't know the best way to get a grid with these parameters.
It's a bit late but maybe this answer will be useful for others, if not for you...
I have done this with Numpy and Pandas, and it's pretty fast. I was using TLS data and could do this with several million data points without any trouble on a decent 2009-vintage laptop. The key is 'binning' by rounding the data, and then using Pandas' GroupBy methods to do the aggregating and calculate the means.
If you need to round to a power of 10 you can use np.round, otherwise you can round to an arbitrary value by making a function to do so, which I have done by modifying this SO answer.
import numpy as np
import pandas as pd
# make rounding function:
def round_to_val(a, round_val):
    return np.round(np.array(a, dtype=float) / round_val) * round_val
# load data (an array with shape (n_data, 3): columns x, y, z)
data = np.load('your_data.npy')
n_d = data.shape[0]
# round the data
d_round = np.empty( [n_d, 5] )
d_round[:,0] = data[:,0]
d_round[:,1] = data[:,1]
d_round[:,2] = data[:,2]
del data # free up some RAM
d_round[:,3] = round_to_val( d_round[:,0], 0.5)
d_round[:,4] = round_to_val( d_round[:,1], 0.5)
# sorting data
ind = np.lexsort( (d_round[:,4], d_round[:,3]) )
d_sort = d_round[ind]
# making dataframes and grouping stuff
df_cols = ['x', 'y', 'z', 'x_round', 'y_round']
df = pd.DataFrame( d_sort)
df.columns = df_cols
df_round = df[['x_round', 'y_round', 'z']]
group_xy = df_round.groupby(['x_round', 'y_round'])
# calculating the mean, write to csv, which saves the file with:
# [x_round, y_round, z_mean] columns. You can exit Python and then start up
# later to clear memory if that's an issue.
group_mean = group_xy.mean()
group_mean.to_csv('your_binned_data.csv')
# Restarting...
import numpy as np
from scipy.interpolate import griddata
binned_data = np.loadtxt('your_binned_data.csv', skiprows=1, delimiter=',')
x_bins = binned_data[:,0]
y_bins = binned_data[:,1]
z_vals = binned_data[:,2]
pts = np.array( [x_bins, y_bins])
pts = pts.T
# make grid (with borders rounded to 0.5...)
xmax, xmin = 640000.5, 637000
ymax, ymin = 6070000.5, 6067000
grid_x, grid_y = np.mgrid[xmin:xmax:0.5, ymin:ymax:0.5]
# interpolate onto grid
data_grid = griddata(pts, z_vals, (grid_x, grid_y), method='cubic')
# save to ascii
np.savetxt('data_grid.txt', data_grid)
When I've done this, I saved the output as a .npy file and converted it to a TIFF with the Image library, and then georeferenced it in ArcMap. There is probably a way to do that with osgeo, but I haven't used it.
Hope this helps someone at least...
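For what it's worth, here is a rough, untested sketch of writing the grid to GeoTIFF with osgeo/GDAL. The file names, the geotransform origin and the 0.5 m cell size are assumptions based on the example grid above; depending on how your grid was built you may need to transpose or flip the array so that rows run north-to-south:
from osgeo import gdal
import numpy as np

data_grid = np.loadtxt('data_grid.txt')
rows, cols = data_grid.shape
ds = gdal.GetDriverByName('GTiff').Create('data_grid.tif', cols, rows, 1, gdal.GDT_Float32)
# geotransform = (top-left x, pixel width, 0, top-left y, 0, -pixel height)
ds.SetGeoTransform((637000.0, 0.5, 0.0, 6070000.5, 0.0, -0.5))
ds.GetRasterBand(1).WriteArray(data_grid)
ds.FlushCache()
ds = None  # close the dataset and flush to disk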
You can use the histogram function in Numpy to do binning, for instance:
import numpy as np

points = np.random.random(1000)   # coordinates to bin
values = np.random.random(1000)   # value carried by each point
# bin edges from 0 to 1
bins = np.linspace(0, 1, 10)
means = (np.histogram(points, bins, weights=values)[0] /
         np.histogram(points, bins)[0])
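For the 2-D grid asked about in the question, np.histogram2d works the same way: one call with weights=z gives the per-cell sums, a second call gives the counts, and dividing the two gives the mean per 0.5 m cell. A minimal sketch, where x, y, z are assumed to be the coordinate and elevation arrays read from the LAS file:
import numpy as np

x_edges = np.arange(x.min(), x.max() + 0.5, 0.5)
y_edges = np.arange(y.min(), y.max() + 0.5, 0.5)
sums, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges], weights=z)
counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
with np.errstate(invalid='ignore'):
    z_mean = sums / counts   # NaN wherever a cell contains no points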
Try LAStools, particularly lasgrid or las2dem.