Unmasking and handling satellite data in Python

I am working with satellite data. Reading a variable returns a masked array in which the fill values are masked, and now I can't extract any values from that masked array.
How can I unmask the array?
My code is below:
import glob
import os
import netCDF4 as nc
import numpy as np
# list all .nc files in this folder
myfiles = glob.glob('*.nc')
# myfiles is now a list holding every .nc file name
# lat and lon are read only once because every file has the same band
aods = []  # the AOD data will be collected in this list
# loop over the files and read the AOD data
for i in range(len(myfiles)):
Now I want to extract the values at a specific latitude and longitude from the list of masked arrays (aods), but I am unable to do so.
Any solutions?
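One possible approach, as a minimal sketch rather than a definitive answer: a NumPy masked array can be "unmasked" with .filled() (masked cells replaced by a value of your choice) or .compressed() (only the valid values, flattened), and a single grid cell can be looked up by finding the nearest latitude and longitude index. The arrays lat, lon and aod below are made-up stand-ins for what the files would provide, the fill value -9999 is an assumption, and value_at is a hypothetical helper.

```python
import numpy as np

# Hypothetical stand-ins for data read from a file: 1-D lat/lon axes and a
# 2-D AOD grid whose fill value (-9999, an assumption) is masked.
lat = np.array([10.0, 20.0, 30.0])
lon = np.array([70.0, 80.0, 90.0, 100.0])
raw = np.array([[0.1, 0.2, -9999.0, 0.4],
                [0.5, -9999.0, 0.7, 0.8],
                [0.9, 1.0, 1.1, 1.2]])
aod = np.ma.masked_equal(raw, -9999.0)


def value_at(field, lat_axis, lon_axis, target_lat, target_lon):
    """Return the grid value nearest to the target point, or NaN if masked."""
    i = int(np.abs(lat_axis - target_lat).argmin())
    j = int(np.abs(lon_axis - target_lon).argmin())
    value = field[i, j]
    return float("nan") if np.ma.is_masked(value) else float(value)


filled = aod.filled(np.nan)      # plain ndarray, masked cells become NaN
valid_only = aod.compressed()    # 1-D array holding only the unmasked values

picked = value_at(aod, lat, lon, 21.0, 88.0)   # nearest cell is lat 20, lon 90
missing = value_at(aod, lat, lon, 19.0, 79.0)  # nearest cell is masked
```

For a list such as aods, the same lookup could be applied to each element, for example [value_at(a, lat, lon, 21.0, 88.0) for a in aods], assuming every array shares the same grid.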


How to export 3D array into a single row in excel using python

I am attempting to export a large array of 3D points into excel.
import numpy as np
import pandas as pd
d = np.asarray(data)
df = pd.DataFrame(d)
This exports the data into rows as below:
3.361490011 -27.39559937 -2.934410095
4.573401244 -26.45699201 -3.845634521
Each line represents the x, y, z coordinates of one point. For my analysis, however, I would like the 2nd row to be moved into the columns beside the 1st row, and so on, so that all the coordinates for one shape are on a single row in Excel. I tried turning the data into a string, but that produced the same layout as above.
The reason is so I can add some population characteristics to the row for each 3d shape. Thanks for any help that anyone can give.
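You can flatten your data with x = df.to_numpy().flatten() and then save it to CSV using np.savetxt. Note that np.savetxt writes a 1-D array as a single column, so reshape it to one row first, e.g. x.reshape(1, -1).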
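The flattening idea can be sketched as follows, assuming the data is an array of shape (n_shapes, n_points, 3); that layout and the variable names are assumptions, since the question does not show the actual data. Reshaping to (n_shapes, -1) puts all coordinates of one shape on one row, and np.savetxt then writes one CSV line per shape.

```python
import io
import numpy as np

# Hypothetical data: 2 shapes, each with 2 points, each point with x, y, z.
points = np.arange(12, dtype=float).reshape(2, 2, 3)

# One row per shape: x1, y1, z1, x2, y2, z2, ...
rows = points.reshape(points.shape[0], -1)

# Write to an in-memory buffer here; pass a file name instead to create a CSV
# file that Excel can open.
buffer = io.StringIO()
np.savetxt(buffer, rows, delimiter=",", fmt="%.9f")
csv_text = buffer.getvalue()
```

Population characteristics could then be appended as extra columns on each row, since every row now corresponds to exactly one shape.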

Sentinel3 OLCI (chl) Average of netcdf files on Python

I'm having trouble computing a monthly average from Sentinel-3 images in, well, everything: Python, Matlab. Two of us are stuck on this problem.
The main issue is that an image's information is not stored in a single netCDF file with coordinates and products neatly together. Instead, each one-day folder contains several .nc files, each holding different information about one single satellite image. As I understand it, SNAP uses an xml file to work with all of these separate .nc files.
Now, I thought it would be a good idea to merge the .nc files into a new daily .nc file that includes the chlorophyll, the coordinates and, while I'm at it, the time. Later on, I would merge these daily files so that I could compute a monthly mean with xarray. At least that was my idea, but I can't get the first part to work. There may be an obvious solution; here is what I tried, using the xarray module:
import os
import numpy as np
import xarray as xr
import netCDF4
from netCDF4 import Dataset
nc_folder = df_try.iloc[0] #folder where the image files are
#open dataset in xarray
nc_chl = xr.open_dataset(str(nc_folder['path']) + '/' + 'chl_nn.nc') #path to chlorophyll file
n_coord =xr.open_dataset(str(nc_folder['path'])+ '/'+ 'geo_coordinates.nc') #path to coordinates file
n_time = xr.open_dataset(str(nc_folder['path'])+ '/' + 'time_coordinates.nc') #path to time file
ds_grid = [[nc_chl], [n_coord], [n_time]]
combined = xr.combine_nested(ds_grid, concat_dim=[None, None])
combined #dataset with all but not recognizing coordinates
ds = combined.rename({'latitude': 'lat', 'longitude': 'lon', 'time_stamp' : 'time'}).set_coords(['lon', 'lat', 'time']) #dataset recognizing coordinates as coordinates
which gives a dataset with
Dimensions: columns 4865 rows: 4091
3 coordinates (lat, lon and time) and the chl variable.
Now, it doesn't save to netCDF4 (I tried, but there was an error), and I was also wondering whether anyone knows another way to compute an average. I have images from three years (from the beginning of 2017 to the end of 2019) that I need to average in different ways (monthly, seasonally...). My main problem at the moment is that the chlorophyll values are stored separately from the geographical coordinates, so using only the chlorophyll files directly would not work and would just make a mess.
Any suggestions?
Two options here:
Using xarray
In xarray you can add them as coordinates. It is a bit tricky as the coordinates in the geo_coordinates.nc file are multidimensional as well.
A possible solution is the following:
import netCDF4
import xarray as xr
import matplotlib.pyplot as plt
# paths
root = r'C:<your_path>\S3B_OL_2_WFR____20201015.SEN3\chl_nn.nc' #set path to chl file
coor = r'C:<your_path>\S3B_OL_2_WFR____20201015.SEN3\geo_coordinates.nc' #set path to the coordinates file
# loading xarray datasets
ds = xr.open_dataset(root)
olci_geo_coords = xr.open_dataset(coor)
# extracting coordinates
lat = olci_geo_coords.latitude.data
lon = olci_geo_coords.longitude.data
# assign coordinates to the chl dataset (needs to refer to both the dimensions of our dataset)
ds = ds.assign_coords({"lon":(["rows","columns"], lon), "lat":(["rows","columns"], lat)})
# clip the image (add your own coordinates)
area_of_interest = ds.where((10 < ds.lon) & (ds.lon < 12) & (58 < ds.lat) & (ds.lat < 59), drop=True)
# simple plot with coordinates as axis
Even simpler is to add them as variables in a new dataset:
# path to the folder
root = r'C:<your_path>\S3B_OL_2_WFR____20201015.SEN3\*.nc' # glob pattern matching every .nc file in the folder
# create a dataset by combining nc files (coordinates will become variables)
ds = xr.open_mfdataset(root,combine = 'by_coords')
But in this case when you plot the image or clip it you cannot use the coordinates directly.
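To see the coordinate-based approach without downloading a product, here is a minimal sketch on synthetic data. The grid size, the values and the bounding box are made up; only the dimension names (rows, columns) and the variable name CHL_NN follow the snippets above. It attaches 2-D lat/lon arrays as coordinates and clips with where(..., drop=True).

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for chl_nn.nc: a small (rows, columns) grid.
n_rows, n_cols = 4, 5
values = np.arange(n_rows * n_cols, dtype=float).reshape(n_rows, n_cols)
ds = xr.Dataset({"CHL_NN": (("rows", "columns"), values)})

# Synthetic stand-in for geo_coordinates.nc: 2-D latitude/longitude arrays.
lat2d, lon2d = np.meshgrid(np.linspace(59.0, 58.0, n_rows),
                           np.linspace(10.0, 12.0, n_cols),
                           indexing="ij")

# Attach the 2-D arrays as coordinates that share the data's dimensions.
ds = ds.assign_coords(lat=(("rows", "columns"), lat2d),
                      lon=(("rows", "columns"), lon2d))

# Clip to a bounding box; rows and columns with no pixel inside are dropped.
inside = (ds.lon > 10.4) & (ds.lon < 11.6) & (ds.lat > 58.3) & (ds.lat < 58.7)
subset = ds.where(inside, drop=True)
```

Because lat and lon are 2-D here, they stay non-index coordinates, so selection goes through boolean masks like the one above rather than through .sel().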
Using snappy
In Python, the snappy package is available; it is based on the SNAP toolbox (which is implemented in Java). See: https://senbox.atlassian.net/wiki/spaces/SNAP/pages/19300362/How+to+use+the+SNAP+API+from+Python
Once it is installed (unfortunately snappy supports only Python 2.7, 3.3 or 3.4), you can call the available SNAP functions directly from Python to aggregate your satellite images and create weekly/monthly averages. You then do not need to merge the lon/lat netCDF file yourself, because you work on the xfdumanifest.xml and SNAP takes care of that.
This is an example. It performs aggregation as well (mean calculated on two chl nc files):
import snappy  # needed because the code below calls snappy.jpy and snappy.GPF
from snappy import ProductIO, WKTReader
from snappy import jpy
from snappy import GPF
from snappy import HashMap
# setting the aggregator method
aggregator_average_config = snappy.jpy.get_type('org.esa.snap.binning.aggregators.AggregatorAverage$Config')
agg_avg_chl = aggregator_average_config('CHL_NN')
# creating the hashmap to store the parameters
HashMap = snappy.jpy.get_type('java.util.HashMap')
parameters = HashMap()
#creating the aggregator array
aggregators = snappy.jpy.array('org.esa.snap.binning.aggregators.AggregatorAverage$Config', 1)
#adding my aggregators in the list
aggregators[0] = agg_avg_chl
# set parameters
# output directory
dir_out = 'level-3_py_dynamic.dim'
parameters.put('outputFile', dir_out)
# number of rows (directly linked with resolution)
parameters.put('numRows', 66792) # to have about 300 meters spatial resolution
# aggregators list
parameters.put('aggregators', aggregators)
# Region to clip the aggregation on
wkt="POLYGON ((8.923302175377243 59.55648108694149, 13.488748662344074 59.11388968719029,12.480488185001589 56.690625338725155, 8.212366327767503 57.12425256476263,8.923302175377243 59.55648108694149))"
geom = WKTReader().read(wkt)
parameters.put('region', geom)
# Source product path
path_15 = r"C:<your_path>\S3B_OL_2_WFR____20201015.SEN3\xfdumanifest.xml"
path_16 = r"C:\<your_path>\S3B_OL_2_WFR____20201016.SEN3\xfdumanifest.xml"
path = path_15 + "," + path_16
parameters.put('sourceProductPaths', path)
#result = snappy.GPF.createProduct('Binning', parameters, (source_p1, source_p2))
# create results
result = snappy.GPF.createProduct('Binning', parameters) #to be used with product paths specified in the parameters hashmap
print("results stored in: {0}".format(dir_out) )
I am quite new and interested in the topic and would be happy to hear your/other solutions!

ndarray with 3 dimension into pandas dataframe

I know this topic has been asked before, but as I'm new to Python I couldn't fully understand how to do it, and I would like an explanation.
I have an ndarray cube (a stack of images of the same location, with the same size and shape, which differ in the wavelength at which they were taken).
I want to convert this image into a pandas DataFrame so that I can iterate through specific rows.
I'm confused by the large number of columns: I have 1024 columns in each image, and that confuses me when I need to index those images.
My end goal is to have the images in a DataFrame structure, so maybe that means having a kind of image collection in which I can iterate over the rows of each image.
This is the code I have written so far:
import spectral.io.envi as envi
import matplotlib.pyplot as plt
import os
from spectral import *
import numpy as np
#Create the image path
#the path
img_path = r'N:\this\is\a\path\capture'
#the specific file
img_file = 'emptyname_2019-08-13_11-05-46.hdr'
img_dark= 'DARKREF_emptyname_2019-08-13_11-05-46.hdr'
cali_hdr= 'Radiometric_1x1.hdr'
cali_img = 'Radiometric_1x1.cal'
img= envi.open(os.path.join(img_path,img_file)).load()
img_dark= envi.open(os.path.join(img_path,img_dark)).load()
img_cali= envi.open(os.path.join(cali_path,cali_hdr), image = os.path.join(cali_path,cali_img)).load()
print('shape image:',img_shape,'shape dark:',dark_shape,'calibration shape:',cali_shape)
wavelength=[float(i) for i in img.metadata['wavelength']]
#get the exposure time
#goal: subtract the dark reference from the DN image.
#step 1: for each column in the dark reference, calculate the mean, then subtract this mean line from the DN image.
#the mean is taken along axis=0, so each column is averaged over all its rows and we get one row.
from numpy import asarray
import pandas as pd
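A minimal sketch of the two steps described in the comments above, using random numbers in place of the real ENVI data. It assumes the cube is laid out as (rows, columns, bands); the sizes and column names are made up. The dark reference is averaged along axis 0 and subtracted by broadcasting, and the cube is then reshaped so that each DataFrame row is one pixel, indexed by (row, col), with one column per band.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows, n_cols, n_bands = 3, 4, 2   # small made-up sizes instead of 1024 columns

# Stand-ins for the loaded image and dark reference, assumed (rows, cols, bands).
cube = rng.integers(100, 200, size=(n_rows, n_cols, n_bands)).astype(float)
dark = rng.integers(0, 10, size=(5, n_cols, n_bands)).astype(float)

# Mean over axis 0 gives one "line" of shape (cols, bands) ...
dark_mean = dark.mean(axis=0)
# ... which broadcasts across every row of the image when subtracted.
corrected = cube - dark_mean

# One DataFrame row per pixel, one column per band.
index = pd.MultiIndex.from_product([range(n_rows), range(n_cols)],
                                   names=["row", "col"])
df = pd.DataFrame(corrected.reshape(n_rows * n_cols, n_bands),
                  index=index,
                  columns=[f"band_{b}" for b in range(n_bands)])

# All pixels of image row 0, across every band.
first_row = df.xs(0, level="row")
```

With this layout, df.xs(r, level="row") returns every pixel of image row r, which may be easier than juggling 1024 separate columns per band.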

numpy.mean of mixing layer height from multiple netcdf files

I know how to calculate the mean of a variable in one netCDF file, but I have 40 netCDF files. Each file holds 4000 data values for mixing layer height. I want to create a list of the mean mixing layer height for each of these files.
In the end, the length of my list should be 40.
Can someone help me with Python code to create this list?
Thank you so much.
Here is the code I used to calculate the mean mixing layer height for one layer in a single netCDF file:
import numpy as np
import netCDF4
f = netCDF4.Dataset('niv.nc')
#the shape of my data set is (5760,3)
#5760 is the number of lists of time
#In each list I have 3 mixing layer heights for 3 layers.
#I'm going to call all the mixing layer height data for the first layer
a= (f.variables['pbl'][:,0])
print (np.mean(a))
You have to get the list of filenames somehow. Here I'll assume all your files are in one folder and that the folder contains no other netCDF files. To do this with netCDF4, computing a separate mean for each file:
import numpy as np
import netCDF4
from glob import glob
# you want to modify this to use your actual data directory
filename_list = glob('/home/user/data_dir/*.nc')
mean_list = []
for filename in filename_list: # make filename_list with something like os.listdir
    with netCDF4.Dataset(filename) as ds:
        mean_list.append(np.mean(ds.variables['pbl'][:, 0]))
To do the same thing with xarray:
import numpy as np
import xarray as xr
from glob import glob
# you want to modify this to use your actual data directory
filename_list = glob('/home/user/data_dir/*.nc')
mean_list = []
for filename in filename_list: # make filename_list with something like os.listdir
    with xr.open_dataset(filename) as ds:
        mean_list.append(np.mean(ds['pbl'][:, 0].values))
If, instead of the average for each file, you want the average across all the files (say the first dimension is time), you could use xarray's open_mfdataset like so:
import numpy as np
import xarray as xr
from glob import glob
# you want to modify this to use your actual data directory
filename_list = glob('/home/user/data_dir/*.nc')
# combine='nested' is needed for concat_dim to take effect in recent xarray versions
ds = xr.open_mfdataset(filename_list, combine='nested', concat_dim='time')
mean = np.mean(ds['pbl'][:, 0].values)
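The difference between the per-file means and the mean across all files can be illustrated without any files. In this sketch the three masked arrays are made-up stand-ins for ds.variables['pbl'][:, 0] read from three files of different lengths; it only shows that averaging the per-file means is not the same as averaging all values pooled together when the files hold different numbers of valid values.

```python
import numpy as np

# Hypothetical first-layer values from three files; NaN marks a missing value.
file_values = [
    np.ma.masked_invalid([100.0, 200.0, np.nan]),
    np.ma.masked_invalid([300.0]),
    np.ma.masked_invalid([400.0, 600.0]),
]

# One mean per file (masked values are ignored), as in the per-file loops above.
mean_list = [float(v.mean()) for v in file_values]

# Averaging the per-file means weights every file equally ...
mean_of_means = float(np.mean(mean_list))

# ... whereas pooling all values weights every valid measurement equally.
pooled_mean = float(np.ma.concatenate(file_values).mean())
```

Which of the two is appropriate depends on whether each file or each measurement should carry equal weight.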

Reading a .VTK polydata file and converting it into Numpy array

I want to convert a .VTK ASCII polydata file into a numpy array containing just the coordinates of the points. I first tried this: https://stackoverflow.com/a/11894302, but it stores a (3,3) numpy array in which each entry holds the coordinates of the THREE points that make up that particular cell (in this case a triangle). However, I don't want the cells; I want the coordinates of each point (without repetition). Next I tried this: https://stackoverflow.com/a/23359921/6619666 with some modifications. Here is my final code. Instead of a numpy array, the values are stored as a tuple, and I am not sure whether that tuple represents each point.
import sys
import numpy
import vtk
from vtk.util.numpy_support import vtk_to_numpy
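reader = vtk.vtkPolyDataReader()
reader.SetFileName('your_file.vtk')  # placeholder name (assumption): point this at your own file
reader.Update()  # actually read the file; without this the output is empty
nodes_vtk_array = reader.GetOutput().GetPoints().GetData()
print(nodes_vtk_array)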
Please give suggestions.
You can use dataset_adapter from vtk.numpy_interface:
from vtk.numpy_interface import dataset_adapter as dsa
polydata = reader.GetOutput()
numpy_array_of_points = dsa.WrapDataObject(polydata).Points
From the Kitware blog:
It is possible to access PointData, CellData, FieldData, Points (subclasses of vtkPointSet only), Polygons (vtkPolyData only) this way.
You can get the point coordinates from a polydata object like so:
polydata = reader.GetOutput()
points = polydata.GetPoints()
array = points.GetData()
numpy_nodes = vtk_to_numpy(array)

