Calculating total depth of a variable - python

I have calculated the Moist Brunt-Vaisala frequency.
Let's say that the variable is moistb with dimensions [height, lat, lon].
I would like to plot the horizontal distribution of the total depth of moistb.
How do I calculate the total depth? The idea is to sum all the depths of moistb at each grid point. Is there a way to do this with metpy?
For reference, here's an example as shown by Schumacher and Johnson (2008)
where they plot the horizontal distribution of total depth (m).

It sounds like you're working with data stored in an Xarray DataArray in this case. If so, the way to do what you're looking for is:
moistb.sum(dim='height')
You can also do this with regular numpy arrays (or a DataArray) by using the axis argument, which corresponds to the position of the dimension in the order listed. For the order given above this would be:
moistb.sum(axis=0)
For more information see the Xarray docs or the Numpy docs.
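As a minimal, self-contained sketch (the array here is random stand-in data with hypothetical dimension sizes, not the real moistb), both calls collapse the height dimension and leave a [lat, lon] field that can be plotted directly:
import numpy as np
import xarray as xr

# stand-in for the real moistb field, with hypothetical dimension sizes
moistb = xr.DataArray(np.random.rand(10, 73, 144), dims=('height', 'lat', 'lon'))

total_depth = moistb.sum(dim='height')        # xarray: DataArray with dims (lat, lon)
total_depth_np = moistb.values.sum(axis=0)    # numpy: same numbers as a plain array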

Related

Is it possible to generate data with peak and x y location?

I am trying to create a 3D surface plot like the one shown here:
https://plotly.com/python/3d-surface-plots/
The problem is that I only have limited data: I know the peak locations and peak heights, but the rest of the data is missing. In the linked example, the z-data needs 25 x 25 = 625 values to generate a valid surface plot.
My data looks something like this:
So my question is: is it possible to use some polynomial function, with the peak location values as a constraint, to generate z-data based on the information I have?
Open to any discussion. Any form of suggestion is appreciated.
Though I don't like this form of interpolation, which is pretty artificial, you can use the following trick:
F(P) = (Σ Fk / d(P, Pk)) / (Σ 1 / d(P, Pk))
P is the point where you interpolate and Pk are the known peak positions. d is the Euclidean distance. (This gives sharp peaks; the squared distance gives smooth ones.)
Unfortunately, far from the peaks this formula tends to the average of the Fk, giving a horizontal surface that sits above some of the Fk, so those peaks point downward. You can work around this by adding fake peaks of negative height around your data set, to lower the average.
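Here is a minimal sketch of that weighting formula; the peak positions, heights, and grid extent below are hypothetical placeholders for the question's data:
import numpy as np

peaks_xy = np.array([[5.0, 5.0], [15.0, 20.0], [20.0, 8.0]])   # hypothetical peak positions Pk
peaks_f = np.array([3.0, 7.0, 5.0])                            # hypothetical peak heights Fk

def interp(px, py, power=1):
    # inverse-distance weighting: power=1 gives sharp peaks, power=2 smoother ones
    d = np.hypot(peaks_xy[:, 0] - px, peaks_xy[:, 1] - py)
    if d.min() == 0:
        return peaks_f[d.argmin()]      # exactly on a known peak
    w = 1.0 / d ** power
    return (w * peaks_f).sum() / w.sum()

# evaluate on a 25 x 25 grid, ready for a surface plot
xs, ys = np.meshgrid(np.linspace(0, 25, 25), np.linspace(0, 25, 25))
zs = np.vectorize(interp)(xs, ys)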

How do I estimate the 80% cumulative distribution point in scipy, numpy and/or Python?

Seaborn has a kdeplot function where if you pass in cumulative=True, then a cumulative distribution of the data is drawn. I need to annotate or figure out the value on the x-axis at which the cumulative distribution is 80% and then draw a vertical line from that value.
Is there a method in numpy, scipy or elsewhere in Python that may compute that value?
If you already have the cdf, then you can do the following. I'm not sure how your data is formatted, but assuming you have two arrays, one of x-values and one of y-values, you can search for the index of the y-value just above 0.8. The corresponding x-value would be what you're looking for. A quick way to do this, since your y-values should already be sorted, is:
import bisect
index = bisect.bisect_left(y_vals, 0.8)
This is a nearest neighbor approach. If you want a slightly more accurate x-value, you can linearly interpolate between index-1 and index, as sketched below.
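A short sketch of that interpolation step, assuming x_vals and y_vals are the sorted arrays holding the curve (hypothetical names; adapt to however you pull the data out of the plot):
import bisect
import matplotlib.pyplot as plt

index = bisect.bisect_left(y_vals, 0.8)               # first point at or above 0.8
x0, x1 = x_vals[index - 1], x_vals[index]
y0, y1 = y_vals[index - 1], y_vals[index]
x_at_80 = x0 + (0.8 - y0) * (x1 - x0) / (y1 - y0)     # x where the curve crosses 0.8
plt.axvline(x_at_80, color='red', linestyle='--')     # vertical line at that value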

Heat map visualizing touch input on smartphone (weighted 2d binning, histogram)

I have a dataset where each sample consists of x- and y-position, timestamp and a pressure value of touch input on a smartphone. I have uploaded the dataset here (OneDrive): data.csv
It can be read by:
import pandas as pd
df = pd.read_csv('data.csv')
Now, I would like to create a heat map visualizing the pressure distribution in the x-y space.
I envision a heat map which looks like the left or right image:
For a heat map of spatial positions a similar approach as given here could be used. For the heat map of pressure values the problem is that there are 3 dimensions, namely the x- and y-position and the pressure.
I'm happy about every input regarding the creation of the heat map.
There are several ways data can be binned. One is just by the number of events. Functions like numpy.histogram2d or hist2d let you assign a weight to each data point, so that events can contribute unequally.
But there is a more general histogram function that might be useful in your case: scipy.stats.binned_statistic_2d
By using the keyword argument statistic you can pick how the value of each bin is calculated from the values that fall within it:
mean
std
median
count
sum
min
max
or a user defined function
I guess in your case mean or median might be a good solution.
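A minimal sketch, assuming data.csv has columns named 'x', 'y' and 'pressure' (the real column names may differ) and 50 bins along each axis:
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import binned_statistic_2d

df = pd.read_csv('data.csv')    # assumed columns: 'x', 'y', 'pressure'

stat, x_edges, y_edges, _ = binned_statistic_2d(
    df['x'], df['y'], values=df['pressure'], statistic='mean', bins=50)

# the statistic comes back with x along the first axis, so transpose for plotting
plt.pcolormesh(x_edges, y_edges, stat.T, cmap='viridis')
plt.colorbar(label='mean pressure')
plt.show()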

How to coarsen ordered 1D data into irregular bins with Python

I have a high-frequency, ordered 1D data set of observations of a property with respect to depth: a continuous float value versus monotonically increasing depth.
I'd like to find a way to coarsen this data set into a user-defined number of contiguous bins (or zones), each described by a single mean value and a lower depth limit (the top depth limit being defined by the end of the zone above it). The criterion for splitting the zones should be k-means-like: within the bounds of the number of zones specified, there should be minimum property variance within each zone and maximum variation between adjacent zones.
As an example, if I had a small high frequency dataset as follows;
depth = [2920.530612, 2920.653061, 2920.734694, 2920.857143, 2920.938776, 2921.102041, 2921.22449, 2921.346939, 2921.469388, 2921.510204, 2921.55, 2921.632653, 2921.795918, 2922, 2922.081633, 2922.122449, 2922.244898, 2922.326531, 2922.489796, 2922.612245, 2922.857143, 2922.979592, 2923.020408, 2923.142857, 2923.265306]
value = [0.0098299, 0.009827939, 0.009826632, 1004.042327, 3696.000306, 3943.831644, 3038.254723, 3693.543377, 3692.806616, 50.04989348, 15.0127, 2665.2111, 3690.842641, 3238.749497, 429.4979635, 18.81228993, 1800.889643, 2662.199897, 3454.082382, 3934.140146, 3030.184014, 0.556587319, 8.593768956, 11.90163067, 26.01012696]
And if I were to request a split into 7 zones, it would return something like the following:
depth_7zone =[2920.530612, 2920.857143, 2920.857143, 2921.510204, 2921.510204, 2921.632653, 2921.632653, 2922.081633, 2922.081633, 2922.244898, 2922.244898, 2922.979592, 2922.979592, 2923.265306]
value_7zone = [0.009828157, 0.009828157, 3178.079832, 3178.079832, 32.53129674, 32.53129674, 3198.267746, 3198.267746, 224.1551267, 224.1551267, 2976.299216, 2976.299216, 11.76552848, 11.76552848]
which can be visualized as follows (blue = original data, red = data split into 7 zones):
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
plt.plot(value, depth, '-o')
plt.plot(value_7zone, depth_7zone, '-', color='red')
plt.gca().invert_yaxis()
plt.xlabel('Values')
plt.ylabel('Depth')
plt.show()
I've tried standard k-means clustering, and it doesn't appear suited to this ordered 1D problem. I also thought of methods used for digital signal processing, but everything I could find discretizes into constant bin sizes; image-compression techniques might also apply, but they seem like overkill and generally expect 2D data.
Can anyone suggest an avenue to explore further? (I'm fairly new to Python so apologies in advance)
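One avenue worth exploring is to treat the zoning as 1D change-point detection; here is a rough sketch assuming the ruptures package (its Dynp search minimizes within-segment squared error for a fixed number of breaks; the parameters are illustrative):
import numpy as np
import ruptures as rpt

vals = np.asarray(value)                                     # the observation series above
algo = rpt.Dynp(model="l2", min_size=2, jump=1).fit(vals)
breaks = algo.predict(n_bkps=6)                              # 6 breaks -> 7 zones; last entry is len(vals)

start = 0
for end in breaks:
    zone_mean = vals[start:end].mean()
    print(f"zone ending at depth {depth[end - 1]:.3f}: mean value {zone_mean:.3f}")
    start = end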

Is there a Python package that can trace a curve with a Gaussian lineshape over several x and y values?

My apologies for my ignorance in advance; I've only been learning Python for about two months. Every example question that I've seen on Stack Overflow seems to discuss a single distribution over a series of data, but not one distribution per data point with band broadening.
I have some (essentially) infinitely-thin bars at value x with height y that I need to run a line over so that it looks like the following photo:
The bars are obtained from the table of data on the far right. The curve is what I'm trying to make.
I am doing some TD-DFT work to calculate a theoretical UV/visible spectrum. It will output absorbance strengths (y-values, i.e., heights) for specific wavelengths of light (x-values). Theoretically, these are typically plotted as infinitely-thin bars, though we experimentally obtain a curve instead. The theoretical data can be made to appear like an experimental spectrum by running a curve over it that hugs y=0 and has a Gaussian lineshape around every absorbance bar.
I'm not sure if there's a feature that will do this for me, or if I need to do something like make a loop summing Gaussian curves for every individual absorbance, and then plot the resulting formula.
Thanks for reading!
It turns out my answer was to use Seaborn to do a kernel density estimation. Because a KDE isn't weighted and only considers the density of x-values, I had to write a small loop that builds a new list in which each x-entry is repeated in proportion to its intensity:
# list1 holds the x-values (wavelengths) and list2 the corresponding intensities
list3 = [i * 1000000 for i in list2]    # scale intensities up so they can be cast to integers
list4 = []
list5 = []
for j in range(len(list1)):
    list5.append([list1[j]] * int(list3[j]))   # repeat each x-value in proportion to its intensity
# now drop the brackets from within the list:
for k in range(len(list5)):
    for l in list5[k]:
        list4.append(l)    # list4 is now a flat list rather than a list of lists
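The loop-of-Gaussians idea mentioned in the question also works directly; a small sketch with made-up bar positions, heights, and broadening width:
import numpy as np
import matplotlib.pyplot as plt

wavelengths = [250.0, 310.0, 420.0]    # hypothetical bar positions (x-values, nm)
intensities = [0.8, 1.0, 0.3]          # hypothetical bar heights (y-values)
sigma = 10.0                           # broadening width, chosen by eye

x = np.linspace(200, 500, 1000)
y = np.zeros_like(x)
for x0, h in zip(wavelengths, intensities):
    y += h * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))   # one Gaussian per bar

plt.vlines(wavelengths, 0, intensities)    # the infinitely-thin bars
plt.plot(x, y)                             # the broadened spectrum
plt.show()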
