I am using Python 2 and working with netCDF data.
The array is a variable called cloud water mixing ratio, an output from the WRF climate model, with 4 dimensions:
QC(time (25), vertical level (69), latitude (119), longitude (199))
I'm trying to get the minimum value in this array. From an initial analysis using the NCVIEW visualisation tool, I found that the minimum value is approximately 1e-5 and the maximum is 1e-3.
I've used
var = fh.variables['QC']
var[:].max()
var[:].min()
The max works fine, but the min gives me 0.0.
Then I tried a solution from here, which is
var[var>0].min()
but I still get zero. Then I realised that the above code is meant for arrays containing negatives, while mine has none.
I've tried looking for solutions here and there but found nothing that works for my situation. Please, if anyone could point me to the right directions, I'd appreciate it a lot.
Thanks.
var[var>0].min is a method; you need to call it with ().
var[var>0].min() should work much better.
Sorry for not being able to post the data, as I don't have the privilege to share it. I tried creating a random 4D array similar to the data and used all the solutions you provided, especially the one by @Joao Abrantes, and they all seemed to work fine. So I thought maybe there was some problem with the data.
Fortunately, there is nothing wrong with the data. I have discussed this with my friend and we have finally found the solution.
The solution is
qc[:][qc[:]>0].min()
I have to specify the [:] after the variable instead of just doing
qc[qc>0].min()
There is also another way, which is to convert the variable into a NumPy array, because qc = fh.variables['QC']
returns a netCDF4.Variable. By adding the second line qc2 = qc[:], it becomes a numpy.ndarray.
qc = fh.variables['QC']  # netCDF4.Variable
qc2 = qc[:]              # read the values into a numpy array
qc2[qc2 > 0].min()       # minimum of the strictly positive values
I'm sorry if my question was not clear when I posted it yesterday, as I only learned about this today.
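For completeness, here is a minimal end-to-end sketch of the working approach (the file name is hypothetical):

from netCDF4 import Dataset

fh = Dataset('wrfout_d01.nc')  # hypothetical file name
qc = fh.variables['QC']        # netCDF4.Variable, not an array yet
qc2 = qc[:]                    # slicing reads the values into a numpy array
print(qc2.max())
print(qc2[qc2 > 0].min())      # smallest strictly positive value
fh.close()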
Situation
I have two datasets:
Raster data loaded using rioxarray, an xarray.DataArray
A geopandas.GeoDataFrame with geometries indicating areas in the first dataset
The geo data in both datasets are in the same CRS (EPSG:4326).
Problem
For each entry in the second dataset I want to aggregate all values from the first dataset that overlap with that entry's geometry. Kind of like a .groupby() using the geometries plus .sum().
Current WIP approach
The package xagg already does this, but it is unfortunately slow on a subset of my dataset and scales even worse when I try to use it on my full dataset.
Question
Is there a simple and efficient way to do this in Python?
(The solution doesn't need to replicate the results from xagg exactly.)
Regarding your comment, here is some pseudocode I've used to do what you're after. The function being executed outputs files in this case. If it's not obvious, this strategy isn't going to help if you just have one big raster and one big polygon file. This method assumes tiled data and uses an indexing system to match the right rasters with the overlying polygons. The full example is kind of beyond the scope of a single answer, but if you run into issues and ask about specifics, I can try to assist. In addition to dask's good documentation, there are lots of other posts on here with dask delayed examples.
import dask

results_list = []
for f in raster_file_list:
    # build a deferred task for each raster tile instead of running it now
    temp = dask.delayed(your_custom_function)(f, your_custom_function_arg_1, your_custom_function_arg_2)
    results_list.append(temp)
# execute all deferred tasks in parallel with the multiprocessing scheduler
results = dask.compute(*results_list, scheduler='processes')
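For illustration, a hypothetical your_custom_function might look like the sketch below. All names here are assumptions: it treats each item of raster_file_list as a GeoTIFF tile, takes a polygon file in the same CRS as the second argument, clips the tile to each polygon, sums the overlapping cells, and writes the result to a file (as described above, the function outputs files):

import geopandas as gpd
import rioxarray  # registers the .rio accessor on xarray objects

def your_custom_function(raster_path, polys_path, out_path):
    raster = rioxarray.open_rasterio(raster_path).squeeze()
    polys = gpd.read_file(polys_path)  # assumed to overlap this tile
    # sum the raster cells falling inside each polygon
    polys["raster_sum"] = [
        float(raster.rio.clip([geom], crs=polys.crs, drop=True).sum())
        for geom in polys.geometry
    ]
    polys.to_file(out_path)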
For my course project, I am trying to use Python to replicate a paper whose code was written in Stata. I have difficulty replicating the results of a collapse command in their do-file. The corresponding line in the do-file is
collapse lexptot, by(clwpop right)
while I have
df.groupby(['cwpop', 'right'])['lexptot'].agg(['mean'])
The lexptot variable is the logarithm of a variable 'exptot', which I calculated previously using np.log(dfs['exptot']).
Does anyone have an idea what is going wrong here? The means I calculate are typically around 1.5 higher than the means calculated in Stata.
Once you update the question with more relevant details, maybe I can answer more fully. But this is what I think might help you:
df.groupby(['cwpop', 'right']).mean()['lexptot']
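For what it's worth, Stata's collapse with no statistic specified computes group means, so the two commands should in principle agree. A toy check (the values are made up) that the pandas groupby reproduces per-group means of a log-transformed column:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'cwpop': [0, 0, 1, 1],
    'right': [0, 1, 0, 1],
    'exptot': [100.0, 150.0, 200.0, 250.0],
})
df['lexptot'] = np.log(df['exptot'])  # log first, then average per group
print(df.groupby(['cwpop', 'right'])['lexptot'].mean())

Two things worth double-checking: the do-file groups by clwpop while the pandas code groups by cwpop, and the mean of a log is not the log of the mean, so lexptot must be generated the same way on both sides.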
This is my first post on StackOverflow. I've never asked here before because every question I had found its answer by browsing, except this one time.
I am trying to plot data to get a schematic representation of it, but I am not sure what the 'best way' to do it is, or how to achieve what I have in mind.
I figured this is the best way to represent the dataset I have:
So each data point has
an x-axis start
an x-axis end
a y-axis value
a z-axis value
My data are stored in a CSV file like
start | stop | y-value | z-value
I thought about using a heatmap to do so, but I am not sure whether:
this is the best way to do it (the overlap can be problematic to handle)
there is an easy way to do it (should I manually add all the required points between start & stop?)
If I want to highlight some data, can I change (just for some of it) the z-color scale?
I thought a little help from here might clarify things :)
Cheers,
Little update: the way I was heading is working. I am not sure whether it is the best approach, but at least it is working.
I still need to make the formatting better, but this is more or less what one can get:
So, the way I implemented it is to build a matrix and, for each data entry, fill each discrete point of the matrix with the highest z-value.
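In case it helps someone, here is a minimal sketch of that matrix-filling approach; the CSV file name, column names, and grid step are hypothetical:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("intervals.csv")  # hypothetical columns: start, stop, y, z

x_step = 1.0  # grid resolution along x; adjust to your data
x_grid = np.arange(df["start"].min(), df["stop"].max() + x_step, x_step)
y_values = sorted(df["y"].unique())
matrix = np.full((len(y_values), len(x_grid)), np.nan)

for _, row in df.iterrows():
    yi = y_values.index(row["y"])
    mask = (x_grid >= row["start"]) & (x_grid <= row["stop"])
    # where intervals overlap, keep the highest z-value (np.fmax ignores NaN)
    matrix[yi, mask] = np.fmax(matrix[yi, mask], row["z"])

plt.pcolormesh(x_grid, y_values, matrix, shading="nearest")
plt.colorbar(label="z-value")
plt.show()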
I'm currently trying to fit a set of (positive) data with the powerlaw.Fit() function from the powerlaw package. However, every single time I do this I obtain the following message:
<powerlaw.Fit at 0x25eac6d3e80>
which I've been trying for ages to figure out what it means, but obviously without success. Another issue I've been facing is that whenever I plot my CCDF using
powerlaw.plot_ccdf()
and my PDF using
powerlaw.plot_pdf()
with my data, I only obtain a plot for the CCDF but nothing for the PDF. Why are all of these things happening? My data is within a NumPy array and looks as follows:
array([ 9.90857053e-06, 3.45336391e-05, 4.06757403e-05, ...,
6.91411789e-02, 6.92511375e-02, 7.45046008e-02])
I doubt there is any kind of issue with my data, since, as I said, I get the plot for the CCDF more than fine. Any kind of help would be highly appreciated. Thanks in advance. (Edit: the data is composed of 1908 non-integer values)
It probably helps to read the documentation. http://pythonhosted.org/powerlaw/
powerlaw.Fit is a class, so when you call powerlaw.Fit(...), you will get an object with associated methods. Save the object in a variable, then pull the results you want from it. For example:
results = powerlaw.Fit(data)
print(results.xmin)             # estimated lower bound of the power-law regime
print(results.power_law.alpha)  # fitted scaling exponent
The 'message' you are getting is not an error; it is just the default repr of the Fit object that was created.
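As a sketch of the plotting side (assuming data is the NumPy array from the question), the Fit object also has plot methods, so the empirical CCDF and the fitted power law can be drawn on the same axes:

import matplotlib.pyplot as plt
import powerlaw

results = powerlaw.Fit(data)
ax = results.plot_ccdf(label='empirical CCDF')  # returns a matplotlib axis
results.power_law.plot_ccdf(ax=ax, linestyle='--', label='power-law fit')
ax.legend()
plt.show()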
The Blaze readme (here https://github.com/ContinuumIO/blaze) describes a number of improvements over NumPy including "Synthetic Dimensions". I have searched around but have been unable to find out what they are.
Could someone enlighten me?
Thanks.
A couple of caveats first: one, I work on the project, and two, these features are still in the planning stage, so they are not implemented at the moment.
That being said, a "synthetic dimension" in Blaze is much like a derived column in a database: a value calculated formulaically from an expression in terms of other rows or columns, but one that does not exist in physical memory or on disk.
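This is not Blaze code (the feature was still unimplemented at the time of writing); just a toy Python illustration of the derived-column idea, where total is computed from the other fields on access and never stored:

class Row(object):
    def __init__(self, price, qty):
        self.price = price
        self.qty = qty

    @property
    def total(self):
        # derived "column": a formula over other fields, no extra storage
        return self.price * self.qty

print(Row(10.0, 3).total)  # 30.0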
In Blaze we are trying to make these kinds of columns (especially for columnar data stores) consumable by next-generation ufunc objects, which can calculate values from synthetic dimensions as well as physical ones.
Hope that answers the question!