I have a dataset where each sample consists of x- and y-position, timestamp and a pressure value of touch input on a smartphone. I have uploaded the dataset here (OneDrive): data.csv
It can be read with:
import pandas as pd
df = pd.read_csv('data.csv')
Now, I would like to create a heat map visualizing the pressure distribution in the x-y space.
I envision a heat map which looks like the left or right image:
For a heat map of spatial positions, an approach similar to the one given here could be used. The difficulty with a heat map of pressure values is that there are three dimensions: the x- and y-position and the pressure.
I'd appreciate any input on creating the heat map.
There are several ways to bin data. One is simply by the number of events per bin. Functions like numpy.histogram2d or matplotlib's hist2d also let you specify a weight for each data point, so that each event contributes its weight instead of 1 to its bin.
But there is a more general histogram function that might be useful in your case: scipy.stats.binned_statistic_2d
By using the keyword argument statistic you can pick how the value of each bin is calculated from the values that lie within:
mean
std
median
count
sum
min
max
or a user defined function
I guess in your case mean or median might be a good solution.
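As a minimal sketch of that suggestion: the snippet below bins synthetic touch samples and takes the mean pressure per bin. The arrays x, y and pressure, the screen size and the bin counts are all assumptions for illustration, since the column names of data.csv are not shown in the question.

```python
import numpy as np
from scipy.stats import binned_statistic_2d

# Synthetic stand-in for the touch data (assumed ranges, not the real dataset).
rng = np.random.default_rng(0)
x = rng.uniform(0, 1080, size=5000)
y = rng.uniform(0, 1920, size=5000)
pressure = rng.uniform(0.0, 1.0, size=5000)

# Mean pressure of the samples falling into each (x, y) bin.
# Bins that contain no samples are returned as NaN.
stat, x_edges, y_edges, _ = binned_statistic_2d(
    x, y, values=pressure, statistic="mean", bins=[20, 30]
)
```

The returned array is indexed as [x_bin, y_bin], so it would typically be transposed before drawing, for example with matplotlib's pcolormesh(x_edges, y_edges, stat.T).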
Related
I used the interpolate function on two data sets. Now I want to transform the resulting functions into discrete x-y datasets with 500 points each. It feels like something I should be able to do, but I haven't been able to figure out how. Here is my code. Ch_resting_spores is a species of diatom within my dataframe.
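from scipy.interpolate import interp1d
import matplotlib.pyplot as plt

# Linear interpolant for the "small" counts
intersmall = interp1d(counting.age_med_Ka, counting.small)
# One cubic interpolant per column of the diatoms dataframe
elems = {}
for i in diatoms.columns:
    elems[i] = interp1d(diatoms.Age_ka, diatoms[i], kind='cubic')
fig, ax = plt.subplots()
ax.plot(diatoms["Age_ka"], elems['Ch_resting_spores'](diatoms["Age_ka"]), 'r')
ax2 = ax.twinx()
ax2.plot(counting['age_med_Ka'], intersmall(counting['age_med_Ka']))
ax.set_xlim(1030, 1102)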
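One possible way to get the discrete datasets is to evaluate each interpolant on an evenly spaced grid built with numpy.linspace. This is only a sketch with synthetic data: the arrays age and counts stand in for the dataframe columns above, which are not available here.

```python
import numpy as np
from scipy.interpolate import interp1d

# Synthetic stand-in for one age/count series (assumed values).
age = np.linspace(1030.0, 1102.0, 25)
counts = np.sin(age / 10.0)

f = interp1d(age, counts, kind="cubic")

# 500 evenly spaced ages inside the interpolation range, and the
# interpolated value at each of them.
age_new = np.linspace(age.min(), age.max(), 500)
counts_new = f(age_new)
```

The pair (age_new, counts_new) is then an ordinary 500-point x-y dataset. The grid has to stay within the original age range, because interp1d raises an error outside it by default.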
I have calculated the Moist Brunt-Vaisala frequency.
Let's say that the variable is moistb and has a dimension of [height, lat, lon].
I would like to plot the horizontal distribution of the total depth of the moistb.
How do I calculate the total depth? The idea is to sum all the depth of moistb in each grid point. Is there a way to do this with metpy?
For reference, here's an example as shown by Schumacher and Johnson (2008)
where they plot the horizontal distribution of total depth (m).
It sounds like in this case that you're working with data stored in an Xarray DataArray. If so, the way to do what you're looking for is:
moistb.sum(dim='height')
You can also do this with regular numpy arrays (or a DataArray) by using the axis argument, which is the zero-based position of the dimension. For the order listed above, height is axis 0, so this would be:
moistb.sum(axis=0)
For more information see the Xarray docs or the Numpy docs.
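A tiny numpy sketch of the axis variant, using a made-up array in place of moistb (the shape and values are assumptions for illustration only):

```python
import numpy as np

# Hypothetical stand-in for moistb with dimensions [height, lat, lon].
moistb = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)

# Summing over axis 0 collapses the height dimension and leaves
# one value per (lat, lon) grid point.
column_sum = moistb.sum(axis=0)
print(column_sum.shape)  # (3, 4)
```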
I have data from a number of high-frequency data capture devices connected to generators on an electricity grid. These meters collect data in ~1 second "bursts" at a ~1.25 ms sampling interval, i.e. fast enough to actually see the waveform. The graphs below show voltage and current for the three phases in different colours.
This time series has a changing fundamental frequency, i.e. the frequency of the electricity grid changes over the length of the time series. I want to roll this (messy) waveform data up into summary statistics of frequency and phase angle for each phase, calculated/estimated every 20 ms (approximately once per cycle).
The simplest way that I can think of would be to measure the gap between the zero crossings (y=0) of each wave and use the offset to calculate the phase angle. Is there a neat way to achieve this (i.e. a table of interpolated x values for which y=0)?
However the above may be quite noisy, and I was wondering if there is a more mathematically elegant way of estimating a changing frequency and phase angle with pandas/scipy etc. I know there are some sophisticated techniques available for periodic functions but I'm not familiar enough with them. Any suggestions would be appreciated :)
Here's a "toy" data set of the first few waves as a pandas Series:
import pandas as pd, datetime as dt
ds_waveform = pd.Series(
index = pd.date_range('2020-08-23 12:35:37.017625', '2020-08-23 12:35:37.142212890', periods=100),
data = [ -9982., -110097., -113600., -91812., -48691., -17532.,
24452., 75533., 103644., 110967., 114652., 92864.,
49697., 18402., -23309., -74481., -103047., -110461.,
-113964., -92130., -49373., -18351., 24042., 75033.,
103644., 111286., 115061., 81628., 61614., 19039.,
-34408., -62428., -103002., -110734., -114237., -92858.,
-49919., -19124., 23542., 74987., 103644., 111877.,
115379., 82720., 62251., 19949., -33953., -62382.,
-102820., -111053., -114555., -81941., -62564., -19579.,
34459., 62706., 103325., 111877., 115698., 83084.,
62888., 20949., -33362., -61791., -102547., -111053.,
-114919., -82805., -62882., -20261., 33777., 62479.,
103189., 112195., 116380., 83630., 63843., 21586.,
-32543., -61427., -102410., -111553., -115374., -83442.,
-63565., -21217., 33276., 62024., 103007., 112468.,
116471., 84631., 64707., 22405., -31952., -61108.,
-101955., -111780., -115647., -84261.])
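The zero-crossing idea described above can be sketched as follows. This is an illustration on a clean synthetic sine, not on the toy data: the 49.8 Hz frequency, the phase offset and the variable names are assumptions. It covers only the frequency estimate, not the phase angle.

```python
import numpy as np

# Synthetic waveform sampled every 1.25 ms (assumed frequency and phase).
t = np.arange(0.0, 0.2, 1.25e-3)
f_true = 49.8
y = np.sin(2 * np.pi * f_true * t + 0.3)

# Sample indices just before each rising zero crossing.
idx = np.where((y[:-1] < 0) & (y[1:] >= 0))[0]

# Linear interpolation between the two bracketing samples gives the
# time at which y = 0.
t_cross = t[idx] - y[idx] * (t[idx + 1] - t[idx]) / (y[idx + 1] - y[idx])

# One frequency estimate per full cycle.
freq_est = 1.0 / np.diff(t_cross)
```

On noisy data the sign test can fire several times around a single crossing, so some smoothing or filtering beforehand would probably be needed.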
I created a heatmap that visualizes the correlations between entities. As the matrix is symmetric, I added significance values below the diagonal for higher information density. Because those values are usually far smaller than the correlation coefficients, I want to use a second colormap to differentiate between the upper and lower triangles of the matrix. The code is the following:
import plotly.express as px

fig = px.imshow(data,
                labels=dict(x="Correlation of Returns", y="", color="PCC"),
                x=domain,
                y=domain,
                color_continuous_scale=px.colors.diverging.balance,
                zmin=-1, zmax=1
                )
The data object is simply my n×n matrix as a list of lists, and domain holds my label values. The following graph already contains one colormap: Sample HeatMap. Is there a way to add a second one and apply it to the values below the diagonal? I haven't found a solution online yet. Thanks in advance!
Note: I am using Dash, so I may need to stick to plotly figures and won't be able to use e.g. matplotlib
I have a high-frequency, ordered 1D data set of observations of a property with respect to depth: a continuous float value observed against monotonically increasing depth.
I'd like to find a way to coarsen this data set into a user-defined number of contiguous bins (or zones), each of which is described by a single mean value and a lower depth limit (the top depth limit being defined by the end of the zone above it). The criteria for splitting the zones should be k-means-like, in that (within the bounds of the number of zones specified) there is minimum property variance within each zone and maximum variation between adjacent zones.
As an example, if I had a small high-frequency data set as follows:
depth = [2920.530612, 2920.653061, 2920.734694, 2920.857143, 2920.938776, 2921.102041, 2921.22449, 2921.346939, 2921.469388, 2921.510204, 2921.55, 2921.632653, 2921.795918, 2922, 2922.081633, 2922.122449, 2922.244898, 2922.326531, 2922.489796, 2922.612245, 2922.857143, 2922.979592, 2923.020408, 2923.142857, 2923.265306]
value = [0.0098299, 0.009827939, 0.009826632, 1004.042327, 3696.000306, 3943.831644, 3038.254723, 3693.543377, 3692.806616, 50.04989348, 15.0127, 2665.2111, 3690.842641, 3238.749497, 429.4979635, 18.81228993, 1800.889643, 2662.199897, 3454.082382, 3934.140146, 3030.184014, 0.556587319, 8.593768956, 11.90163067, 26.01012696]
and I were to request a split into 7 zones, it would return something like the following:
depth_7zone =[2920.530612, 2920.857143, 2920.857143, 2921.510204, 2921.510204, 2921.632653, 2921.632653, 2922.081633, 2922.081633, 2922.244898, 2922.244898, 2922.979592, 2922.979592, 2923.265306]
value_7zone = [0.009828157, 0.009828157, 3178.079832, 3178.079832, 32.53129674, 32.53129674, 3198.267746, 3198.267746, 224.1551267, 224.1551267, 2976.299216, 2976.299216, 11.76552848, 11.76552848]
which can be visualized as follows (blue = original data, red = data split into 7 zones):
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
plt.plot(value, depth, '-o')
plt.plot(value_7zone, depth_7zone, '-', color='red')
plt.gca().invert_yaxis()
plt.xlabel('Values')
plt.ylabel('Depth')
plt.show()
I've tried standard k-means clustering, and it doesn't appear suited to this ordered 1D problem. I also looked at methods used in digital signal processing, but all I could find discretize into constant bin sizes. Image compression methods may be overkill and likely expect 2D data.
Can anyone suggest an avenue to explore further? (I'm fairly new to Python so apologies in advance)
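One avenue that matches the criterion described above is a dynamic-programming split that minimises the total within-zone sum of squared deviations while keeping the zones contiguous. The sketch below is an illustration under that assumption. The function name segment_1d and the small test sequence are hypothetical, and it has not been checked against the 7-zone result shown in the question.

```python
import numpy as np

def segment_1d(values, n_zones):
    """Split an ordered sequence into n_zones contiguous zones that
    minimise the total within-zone sum of squared deviations."""
    v = np.asarray(values, dtype=float)
    n = len(v)
    # Prefix sums make the cost of any candidate zone an O(1) lookup.
    s1 = np.concatenate(([0.0], np.cumsum(v)))
    s2 = np.concatenate(([0.0], np.cumsum(v * v)))

    def cost(i, j):
        # Sum of squared deviations of v[i:j] about its own mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # best[k, j]: minimal cost of splitting v[:j] into k zones.
    best = np.full((n_zones + 1, n + 1), np.inf)
    split = np.zeros((n_zones + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, n_zones + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1, i] + cost(i, j)
                if c < best[k, j]:
                    best[k, j] = c
                    split[k, j] = i

    # Walk back through the stored split points to recover the boundaries.
    bounds = [n]
    for k in range(n_zones, 0, -1):
        bounds.append(split[k, bounds[-1]])
    bounds = [int(b) for b in bounds[::-1]]
    means = [float(v[a:b].mean()) for a, b in zip(bounds[:-1], bounds[1:])]
    return bounds, means

# Small made-up sequence with three obvious plateaus.
bounds, means = segment_1d([0, 0, 0, 10, 10, 10, 5, 5], 3)
```

Here bounds holds the start index of each zone plus the final length, so zone k covers values[bounds[k]:bounds[k + 1]], and the lower depth limit of each zone can be read from the depth list at those indices. The triple loop costs on the order of n_zones × n² steps, which may be slow for very long logs.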