I would like to take a temperature variable from a netcdf file in python and average over all of the satellite's scans.
The temperature variable is given as:
tdisk = file.variables['tdisk'][:,:,:] # Disk Temp(nscans,nlons,nlats)
The shape of the tdisk array is (68, 52, 46). The satellite makes 68 scans per day. The longitude and latitude variables are given as:
lats = file.variables['latitude'][:,:] # Latitude(nlons,nlats)
lons = file.variables['longitude'][:,:] # Longitude(nlons,nlats)
These each have shape (52, 46). I would like to average the temperature over all of the nscans together to get a daily mean, so the temperature array becomes (52, 46). I've seen ways to stack and concatenate the arrays, but I would like a mean value. Eventually I am looking to make a contour plot with x=longitude, y=latitude, and z=temperature.
Is this possible to do? Thanks for any help.
If you are using Xarray, you can do this using DataArray.mean:
import xarray as xr
# open netcdf file
ds = xr.open_dataset('file.nc')
# take the mean of the tdisk variable
da = ds['tdisk'].mean(dim='nscans')
# make a contour plot
da.plot.contour(x='longitude', y='latitude')
Based on the question, you seem to want to calculate a temporal mean rather than a daily mean, since you appear to have only one day of data. So the following will probably work:
ds.mean(dim="time")
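If you would rather stay with netCDF4 and NumPy (which the question uses), here is a minimal sketch along the same lines, assuming the variable names from the question and a hypothetical file named file.nc:
import matplotlib.pyplot as plt
from netCDF4 import Dataset

file = Dataset('file.nc')  # hypothetical file name
tdisk = file.variables['tdisk'][:, :, :]  # (nscans, nlons, nlats)
lats = file.variables['latitude'][:, :]   # (nlons, nlats)
lons = file.variables['longitude'][:, :]  # (nlons, nlats)

# average over the scan dimension (axis 0): (68, 52, 46) -> (52, 46)
tdisk_mean = tdisk.mean(axis=0)

# contour plot on the 2-D lon/lat grid
plt.contourf(lons, lats, tdisk_mean)
plt.colorbar(label='disk temperature')
plt.show()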
I have a large dataset, with hourly data for an entire year. What I want to do is create a new dataset with specific variables at specific distances from a point source, and use all the data to create a box-and-whiskers plot.
The dataset has time, lon and lat and a concentration variable with multiple dimensions:
Concentration[hours,lat, lon]
I want to create a dataset that loops through all of the times for different lat and lon values and produces a concentration output for all times at each of these locations, then use it to create a box-and-whisker plot showing the decrease of atmospheric concentration with distance from the point source. I know the specific grid cells I am interested in but need help setting up the script.
EDIT:
I cropped the global dataset and this is what the output currently looks like:
time: (8761), latitude: (30), longitude: (30)
I tried using a for loop, but it would not allow me to loop over lat/lon...
for i in range(8761):
    print(Conc[i, :, :])
This lets me loop over all times and see the concentration at all grid cells, but instead of printing I want to create a new dataset, and to loop through only certain grid cells.
I want a list that gives me 8761 concentration values for each grid cell that I specify, keeping all the data in one dataset so I can make a box plot from it.
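A minimal sketch of one way to do this with xarray's isel (the file name, the variable name Concentration, and the grid indices below are assumptions based on the description; replace them with your own):
import xarray as xr
import matplotlib.pyplot as plt

ds = xr.open_dataset('cropped_file.nc')  # hypothetical file name

# grid cells of interest as (lat index, lon index) pairs -- hypothetical values
points = [(5, 5), (10, 10), (15, 15), (20, 20)]

# one full time series (8761 values) per chosen grid cell
series = [ds['Concentration'].isel(latitude=i, longitude=j).values
          for i, j in points]

plt.boxplot(series)
plt.xticks(range(1, len(points) + 1), [f'({i}, {j})' for i, j in points])
plt.xlabel('grid cell (lat index, lon index)')
plt.ylabel('concentration')
plt.show()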
So I am pretty new to programming and am currently doing some research on netCDF (.nc) files with Python. I have the following code, which I am fairly sure does not work. The objective is to plot a simple line graph of the 10m u-component of wind over time.
The problem, I think, is that the 10m u-component of wind has the 4D shape (time=840, expver=2, latitude=19, longitude=27), while time is only (time=840).
Any replies or ideas will be much appreciated! The code is as shown:
from netCDF4 import Dataset
import matplotlib.pyplot as plt
import numpy as np
nc = Dataset(r'C:\WAIG\Python\ERA5_py01\Downloaded_nc_file/test_era5.nc','r')
for i in nc.variables:
    print(i)
lat = nc.variables['latitude'][:]
lon = nc.variables['longitude'][:]
time = nc.variables['time'][:]
u = nc.variables['u10'][:]
plt.plot(np.squeeze(u), np.squeeze(time))
plt.show()
Right, you have winds that represent the 10m wind at every location of the model grid (lat, lon), as well as along the expver dimension (not sure what that one is). You need to select a subset of u. For example, let's pick expver index 1 and lat/lon indexes of lat=8, lon=12 (not sure where those are geographically):
expver_index = 1
lat_index = 8
lon_index = 12
# ':' below means "full slice" -- take everything along that dimension
plt.plot(time, u[:, expver_index, lat_index, lon_index])
plt.title(f'U at latitude {lat[lat_index]}, longitude {lon[lon_index]}')
plt.show()
Have you tried using xarray?
I think that it will be easier for you to read the netCDF4 file and plot it using matplotlib.
This is straightforward:
import xarray as xr
ds = xr.open_dataset(r'C:\WAIG\Python\ERA5_py01\Downloaded_nc_file/test_era5.nc')
This line will plot the time series of the horizontal-mean u10:
ds['u10'].mean(['longitude','latitude']).plot()
You can also select by the value or index in a specific dimension using sel and isel methods:
This line selects the 10th latitude and 5th longitude and plots it. In this case I am interested in specific indexes for latitude and longitude, not in the real units.
ds['u10'].isel(latitude=10,longitude=5).plot()
This line selects the values of latitude and longitude nearest to the given values and plots it. In this case, I am interested in the latitude and longitude in real units.
ds['u10'].sel(latitude=-15,longitude=40,method='nearest').plot()
See their documentation to learn more about xarray.
I hope this solution works better for your case and also introduces you to this great tool. Please let me know if you still need help with this.
I have one dataset of satellite based solar induced fluorescence (SIF) and one of modeled precipitation. I want to compare precipitation to SIF on a per pixel basis in my study area. My two datasets are of the same area but at slightly different spatial resolutions. I can successfully plot these values across time and compare against each other when I take the mean for the whole area, but I'm struggling to create a scatter plot of this on a per pixel basis.
Honestly, I'm not sure this is the best way to compare these two values when looking for the impact of precipitation on SIF, so I'm open to different approaches. As for merging the data, I'm currently using xr.combine_by_coords, but it is giving me an error, which I describe below. I could also convert the netCDFs into GeoTIFFs and then use rasterio to warp them, but that seems like an inefficient way to do this comparison. Here is what I have thus far:
import netCDF4
import numpy as np
import dask
import xarray as xr
rainy_bbox = np.array([
    [-69.29519955115512, -13.861261028444734],
    [-69.29519955115512, -12.384786628185896],
    [-71.19583431678012, -12.384786628185896],
    [-71.19583431678012, -13.861261028444734]])
max_lon_lat = np.max(rainy_bbox, axis=0)
min_lon_lat = np.min(rainy_bbox, axis=0)
# this dataset is available here: ftp://fluo.gps.caltech.edu/data/tropomi/gridded/
sif = xr.open_dataset('../data/TROPO_SIF_03-2018.nc')
# the dataset is global so subset to my study area in the Amazon
rainy_sif_xds = sif.sel(lon=slice(min_lon_lat[0], max_lon_lat[0]), lat=slice(min_lon_lat[1], max_lon_lat[1]))
# this data can all be downloaded from NASA Goddard here either manually or with wget but you'll need an account on https://disc.gsfc.nasa.gov/: https://pastebin.com/viZckVdn
imerg_xds = xr.open_mfdataset('../data/3B-DAY.MS.MRG.3IMERG.201803*.nc4')
# spatial subset
rainy_imerg_xds = imerg_xds.sel(lon=slice(min_lon_lat[0], max_lon_lat[0]), lat=slice(min_lon_lat[1], max_lon_lat[1]))
# I'm not sure the best way to combine these datasets but am trying this
combo_xds = xr.combine_by_coords([rainy_imerg_xds, rainy_sif_xds])
Currently I'm getting a seemingly unhelpful RecursionError: maximum recursion depth exceeded in comparison on that final line. When I add the argument join='left', the data from rainy_imerg_xds is in combo_xds; when I do join='right', the rainy_sif_xds data is present; and when I do join='inner', no data is present. I assumed this function did some internal interpolation, but it appears not.
This documentation from xarray outlines the solution to this problem quite simply: xarray allows you to interpolate in multiple dimensions and to specify another Dataset's x and y dimensions as the output grid. So in this case it is done with
# interpolation based on http://xarray.pydata.org/en/stable/interpolation.html
# interpolation can't be done across the chunked dimension so we have to load it all into memory
rainy_sif_xds.load()
#interpolate into the higher resolution grid from IMERG
interp_rainy_sif_xds = rainy_sif_xds.interp(lat=rainy_imerg_xds["lat"], lon=rainy_imerg_xds["lon"])
# visualize the output
rainy_sif_xds.dcSIF.mean(dim='time').hvplot.quadmesh('lon', 'lat', cmap='jet', geo=True, rasterize=True, dynamic=False, width=450).relabel('Initial') +\
interp_rainy_sif_xds.dcSIF.mean(dim='time').hvplot.quadmesh('lon', 'lat', cmap='jet', geo=True, rasterize=True, dynamic=False, width=450).relabel('Interpolated')
# now that our coordinates match, we need to convert the IMERG dataset's default CFTimeIndex to a datetime index in order to merge it with the SIF data, because the IMERG rainfall dataset uses CFTime while SIF uses datetime
rainy_imerg_xds['time'] = rainy_imerg_xds.indexes['time'].to_datetimeindex()
# now the merge can easily be done with
merged_xds = xr.combine_by_coords([rainy_imerg_xds, interp_rainy_sif_xds], coords=['lat', 'lon', 'time'], join="inner")
# now visualize the two datasets together // multiply SIF by 30 because its values are so low
merged_xds.HQprecipitation.rolling(time=7, center=True).sum().mean(dim=('lat', 'lon')).hvplot().relabel('Precip') * \
(merged_xds.dcSIF.mean(dim=('lat', 'lon'))*30).hvplot().relabel('SIF')
I have a dataset where each sample consists of x- and y-position, timestamp and a pressure value of touch input on a smartphone. I have uploaded the dataset here (OneDrive): data.csv
It can be read by:
import pandas as pd
df = pd.read_csv('data.csv')
Now, I would like to create a heat map visualizing the pressure distribution in the x-y space.
I envision a heat map which looks like the left or right example image (images not shown here).
For a heat map of spatial positions a similar approach as given here could be used. For the heat map of pressure values the problem is that there are 3 dimensions, namely the x- and y-position and the pressure.
I'm happy about every input regarding the creation of the heat map.
There are several ways data can be binned. One is just by the number of events. Functions like numpy.histogram2d or matplotlib's hist2d allow you to specify weights for each data point, to manipulate the weight of each event.
But there is a more general histogram function that might be useful in your case: scipy.stats.binned_statistic_2d
By using the keyword argument statistic you can pick how the value of each bin is calculated from the values that lie within:
mean
std
median
count
sum
min
max
or a user defined function
I guess in your case mean or median might be a good solution.
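For example, a minimal sketch using binned_statistic_2d with the mean statistic (assuming the CSV has columns named x, y, and pressure; adjust to the actual column names):
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import binned_statistic_2d

df = pd.read_csv('data.csv')

# bin the touch points into a 50x50 grid and compute the mean pressure per bin
stat, x_edges, y_edges, _ = binned_statistic_2d(
    df['x'], df['y'], df['pressure'], statistic='mean', bins=50)

# the statistic comes back with x along the first axis, so transpose for imshow
plt.imshow(stat.T, origin='lower',
           extent=[x_edges[0], x_edges[-1], y_edges[0], y_edges[-1]],
           aspect='auto')
plt.colorbar(label='mean pressure')
plt.xlabel('x position')
plt.ylabel('y position')
plt.show()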
Forgive me if this is simple, but I am new to Python. I have daily wind speed data with one data point for every latitude (180) and longitude (360) and time (6624), which is a 3D array with numpy shape (time, lat, lon). I am trying to extract every wind speed value and put it into a new array or list so that I can plot a histogram or a probability density function. Is there a way in Python to extract each of these values?
So if you do wind_speedjja.shape you get (6624, 180, 360)?
This is not an efficient answer; it is written to be illustrative, using a nested loop.
import numpy as np

all_wsp = np.array([])
mtx = wind_speed.shape  # (time, lat, lon)
for idx_lat in range(mtx[1]):
    for idx_long in range(mtx[2]):
        lat_long_wsp = wind_speed[:, idx_lat, idx_long]
        # do a plot on lat_long_wsp, or your histogram
        all_wsp = np.concatenate((all_wsp, lat_long_wsp))
# all_wsp will be all single values in a flattened array
If you are just after the flattened array, do flat_wsp = wind_speed.flatten().
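For the histogram or PDF itself, a short sketch using the flattened array (assuming wind_speed is the 3D array from the question):
import matplotlib.pyplot as plt

# density=True normalizes the histogram to a probability density
plt.hist(wind_speed.flatten(), bins=100, density=True)
plt.xlabel('wind speed')
plt.ylabel('probability density')
plt.show()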
Your data are huge, so you should first take a global approach.
As a toy example:
import numpy as np
import matplotlib.pyplot as plt

wind = np.random.rand(662, 18, 36)  # toy data: (time, lat, lon)
means = wind.mean(axis=0)           # temporal mean at each grid point

plt.subplot(121)
plt.hist(means.ravel(), 100)        # distribution of the time-mean values
plt.subplot(122)
plt.imshow(means)                   # map of the time-mean field
plt.colorbar()
plt.show()
Then you can decide which area you will refine.