xarray cannot directly convert an xarray.Dataset into a numpy array

I'm experiencing an error that I cannot seem to resolve when attempting to convert from a Dataset to array when using xarray. I'm encountering this because I'm attempting to add a time dimension to a netcdf file (open netcdf, add a timestamp that is the same across all data, save out netcdf).
import os
import xarray as xr
import pandas as pd
scriptpath = os.path.dirname(os.path.abspath(__file__))
outputfile = scriptpath + '\\20210629_deadgrass.aus.nc'
times= pd.to_datetime(str(yesterday.strftime('%Y%m%d')))
time_da = xr.Dataset({"time": times})
arr = xr.open_dataset(outputfile)
ds = arr.to_array()
dst = ds.expand_dims(time=time_da) #errors here
The error I'm receiving is
Exception has occurred: TypeError
cannot directly convert an xarray.Dataset into a numpy array. Instead, create an xarray.DataArray first, either with indexing on the Dataset or by invoking the `to_array()` method.
File "Z:\UpdateAussieGRASS.py", line 101, in <module>
dst = ds.expand_dims(time=time_da)
I can't seem to work out what I'm doing wrong with to_array() in the second last line. Examples of to_array() are here. Autogenerated documentation is here.

ds is already an xarray.DataArray, so to_array() is not the problem. The error occurs on this line:
dst = ds.expand_dims(time=time_da) #errors here
While ds is a DataArray, time_da is still a Dataset, which is what the error message is complaining about. This should work:
dst = ds.expand_dims(time=time_da.to_array())
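
As an alternative sketch (not part of the original answer), expand_dims also accepts a plain sequence of coordinate values, so the intermediate Dataset can be skipped. The variable name `temperature`, the array shape, and the date below are made up for illustration, since the asker's file contents and `yesterday` value are not shown in the question.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the contents of the asker's netCDF file.
dataset = xr.Dataset({"temperature": (("lat", "lon"), np.zeros((2, 3)))})

# Dataset -> DataArray; this adds a "variable" dimension of length 1.
data_array = dataset.to_array()

# Hypothetical timestamp; the question derives it from `yesterday`.
timestamp = pd.Timestamp("2021-06-29")

# A one-element list creates a new length-1 "time" dimension and is
# used as the coordinate values for that dimension.
with_time = data_array.expand_dims(time=[timestamp])

print(with_time.dims)
print(with_time["time"].values)
```

If the result needs to go back to netCDF with the original variable names, `with_time.to_dataset(dim="variable")` should turn the DataArray back into a Dataset.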

Related

RuntimeError: "upsample_bilinear2d_channels_last" not implemented for 'Long' after converting np array to list

I am trying to send image data across modules.
The following works fine:
def process_image(pdf_path, page_dimensions):
    pdf_path = get_pdf(pdf_path, False)
    pdf_name = os.path.basename(pdf_path)
    with tempfile.TemporaryDirectory() as path:
        conversion_time = time.time()
        chart_images = convert_from_path(
            pdf_path=pdf_path,
            dpi=300,
            fmt="jpg",
            output_file=os.path.basename(str(pdf_path)).split(".")[0],
            output_folder=path,
            use_pdftocairo=False,
            paths_only=True,
            thread_count=8,
        )
        pg_dim, pg_image_path = page_dimensions[0], chart_images[0]
        pg_image = cv2.imread(pg_image_path, cv2.IMREAD_UNCHANGED)
        pg_image = cv2.resize(pg_image, (pg_dim[1], pg_dim[2]))
    return pg_image
The result of this function (pg_image) works fine when taken as input to my Detectron2 model.
However, when I send pg_image.tolist() and convert back to np array on receiving (np.array(pg_image)) and send it to my Detectron2 model, I keep getting the following error:
RuntimeError: "upsample_bilinear2d_channels_last" not implemented for 'Long'
So, is converting the np array to a list and converting it back changing the data?
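
A minimal sketch of what the round trip does, assuming the image is a uint8 array as cv2.imread usually returns for JPEG input (the small array below is a made-up stand-in, not the asker's data). The pixel values survive, but the dtype does not: tolist() yields plain Python ints, and np.array() then falls back to NumPy's default integer type, typically int64, which PyTorch calls Long. That would be consistent with the error message, though the question does not confirm it.

```python
import numpy as np

# Hypothetical stand-in for an image read by cv2.imread (uint8 pixels).
image = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)

# tolist() produces nested Python ints, so the original dtype is lost.
as_list = image.tolist()

# Without an explicit dtype NumPy picks its default integer type.
round_tripped = np.array(as_list)

# Passing the original dtype restores an array identical to the input.
restored = np.array(as_list, dtype=np.uint8)

print(image.dtype, round_tripped.dtype, restored.dtype)
```

On the receiving side, passing `dtype=np.uint8` explicitly (or sending the dtype alongside the list) should be enough to get back an array the model accepts.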

Unable to read variable from netCDF file

I'm trying to read a specific variable from a netCDF file but have been unable to do so. The variable has data type "int16" and five dimensions (e.g. latitude, longitude, time, elevation, etc.). Here is what I've tried so far and the error I've received.
import netCDF4 as nc
import numpy as np
x = nc.Dataset('file.nc')
lat = x.variables('lat')
print(lat)
shape = x.variables('shape')
The error reads:
"Attribute Error: netCDF4\_netCDF4.pyx in
netCDF4._netCDF4.Dataset._getattr_()
netCDF4._netCDF4.Dataset._getncattr_()
netCDF4._netCDF4.Dataset._get_att_()
netCDF4._netCDF4.Dataset._ensure_nc_success_()
Attribute Error: NetCDF: Attribute not found.
Any help would be greatly appreciated! Thanks!

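There are some typos above: variables is a dictionary, so index it with square brackets instead of calling it with parentheses. Try this: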
import netCDF4 as nc
import numpy as np
x = nc.Dataset('file.nc')
lat = x.variables['lat']
print(lat)
shape = x.variables['lat'].shape
# or
shape = lat.shape
print(shape)

Converting VTK UnstructuredGrid to StructuredGrid

The background to my problem is that I have a 3D structure saved in a .vtk file that I need to manipulate (dilate, erode, etc.). The following code snippets are designed to be run sequentially, i.e. if you run them one after the other, there should be no problems (apart from those I mention!).
I'm very new to VTK, so apologies for any very basic mistakes!
Problem
My problem stems from a problem with SimpleITK, wherein it is unable to read UnstructuredGrid or PolyData:
In [1]: import SimpleITK as sitk
In [2]: img_vtk = sitk.ReadImage(file_vtk)
Traceback (most recent call last):
File "<ipython-input-52-435ce999db50>", line 1, in <module>
img_vtk = sitk.ReadImage(file_vtk)
File "/usr/local/lib/python3.5/dist-packages/SimpleITK/SimpleITK.py", line 8614, in ReadImage
return _SimpleITK.ReadImage(*args)
RuntimeError: Exception thrown in SimpleITK ReadImage: /tmp/SimpleITK/Code/IO/src/sitkImageReaderBase.cxx:97:
sitk::ERROR: Unable to determine ImageIO reader for "/data/ROMPA_MRIandSeg/09S/Analysis/1_model/clip_dilate.vtk"
SimpleITK can, however, read StructuredGrid, so I tried to solve this by reading using VTK and converting.
import vtk
reader = vtk.vtkGenericDataObjectReader() # Using generic to allow it to match either Unstructured or PolyData
reader.SetFileName(file_vtk)
reader.Update()
output = reader.GetOutput()
However, from that point on, every method I've tried seems to have failed.
Proposed Solutions
Conversion to numpy, then conversion to sitk image
I attempted to convert it to a numpy array, then interpolate a regular grid, with a dummy variable of 1 to specify the values on the structure.
import math
from vtk.util import numpy_support
import scipy.interpolate
import numpy as np
nparray = numpy_support.vtk_to_numpy(output.GetPointData().GetArray(0))
output_bounds = output.GetBounds()
x_grid = range(math.floor(output_bounds[0]),math.ceil(output_bounds[1]),1)
y_grid = range(math.floor(output_bounds[2]),math.ceil(output_bounds[3]),1)
z_grid = range(math.floor(output_bounds[4]),math.ceil(output_bounds[5]),1)
grid = list()
for x in x_grid:
    for y in y_grid:
        for z in z_grid:
            grid.append((x,y,z))
dummy = np.array([1 for i in range(nparray.shape[0])])
npgrid = scipy.interpolate.griddata(nparray,dummy,grid,fill_value=0)
npgrid.reshape(len(x_grid),len(y_grid),len(z_grid))
img = sitk.GetImageFromArray(npgrid)
sitk.WriteImage(img,file_out)
However, when I load this in ParaView, a bounding box is displayed for the output, but a contour of the output is empty.
Using ShepardMethod
I attempted to interpolate using the built-in ShepardMethod, after converting the UnstructuredGrid to PolyData (as I'd mostly seen ShepardMethod being applied to PolyData):
bounds = output.GetBounds()
spacings = [1.0,1.0,1.0] # arbitrary spacing
dimensions = [0,0,0]
for i,spacing in enumerate(spacings):
    dimensions[i] = int(math.ceil((bounds[i*2 + 1]-bounds[i*2])/spacing))
vtkPoints = vtk.vtkPoints()
for i in range(0,nparray.shape[0]):
    x=nparray[i,0]
    y=nparray[i,1]
    z=nparray[i,2]
    p=[x,y,z]
    vtkPoints.InsertNextPoint(p)
poly = vtk.vtkPolyData()
poly.SetPoints(vtkPoints)
shepard = vtk.vtkShepardMethod()
shepard.SetInputData(poly)
shepard.SetSampleDimensions(dimensions)
shepard.SetModelBounds(output.GetBounds())
shepard.Update()
shepard_data = shepard.GetOutput().GetPointData().GetArray(0)
shepard_numpy = numpy_support.vtk_to_numpy(shepard_data)
shepard_numpy = shepard_numpy.reshape(dimensions[0],dimensions[1],dimensions[2])
shepard_img = sitk.GetImageFromArray(shepard_numpy)
sitk.WriteImage(shepard_img,file_out)
As with the numpy effort above, this provided a bounding box in ParaView. Applying a contour provided a structure of two triangles, i.e. next to nothing seems to have been successfully written. Alternatively, I attempted to write the output directly using VTK.
shepard_data = shepard.GetOutput()
shepard_grid = vtk.vtkImageToStructuredGrid()
shepard_grid.SetInputData(shepard_data)
shepard_grid.Update()
writer = vtk.vtkStructuredGridWriter()
writer.SetFileName(file_out)
writer.SetInputData(shepard_grid.GetOutput())
writer.Write()
This produced the same output as before.
Using ProbeFilter
I tried the above using ProbeFilter instead (with both conversion to numpy and writing directly). Unfortunately, the output was the same as above.
mesh = vtk.vtkStructuredGrid()
mesh.SetDimensions(dimensions)
probe = vtk.vtkProbeFilter()
probe.SetInputData(mesh)
probe.SetSourceData(output)
probe.Update()
probe_out = probe.GetOutput()
writer = vtk.vtkStructuredGridWriter()
writer.SetFileName(file_out)
writer.SetInputData(probe.GetOutput())
writer.Write()
probe_data = probe.GetOutput().GetPointData().GetArray(0)
probe_numpy = numpy_support.vtk_to_numpy(probe_data)
probe_numpy = probe_numpy.reshape(dimensions[0],dimensions[1],dimensions[2])
probe_img = sitk.GetImageFromArray(probe_numpy)
sitk.WriteImage(probe_img,file_out)
However, this seemed to produce no viable output (vtkStructuredGridWriter produced an empty file, and probe_numpy was empty).
Changing ParaView output
My original data comes from a structuredGrid .vtk file, that I open using ParaView, and then clip to remove structures that aren't required in the mesh. Saving the output saves an unstructuredGrid, and I have been unable to figure out whether I can change that, and avoid this mess in the first place!

Just use the "Resample With Dataset" filter in ParaView:
Open ParaView
Open a StructuredGrid file with the geometry you want the result to have
Open your UnstructuredGrid file
Add a "Resample With Dataset" filter
Select the structured data as the source input
Apply

How to use numpy in the Programmable Filter in ParaView

Assume I have a ProgrammableFilter in ParaView which gets two inputs: mesh1 with data and mesh2 without.
Furthermore, I know the permutation of the points from mesh1 to mesh2.
Inside the filter, I can access the point values through
data0=inputs[0].GetPointData().GetArray('data')
and obtain a part of the array using
subData=data0[0:6]
for example. But how could I add this subData to the output without a python loop?
To experiment with the code, I created a (not so small) working example:
#!/usr/bin/python
from paraview.simple import *
import numpy as np
import vtk
from vtk.util.numpy_support import numpy_to_vtk
#generate an arbitrary source with data
mesh2=Sphere()
mesh2.Center=[0.0, 0.0, 0.0]
mesh2.EndPhi=360
mesh2.EndTheta=360
mesh2.PhiResolution=100
mesh2.Radius=1.0
mesh2.StartPhi=0.0
mesh2.StartTheta=0.0
mesh2.ThetaResolution=100
mesh2.UpdatePipeline()
#add the data
mesh2Vtk=servermanager.Fetch(mesh2)
nPointsSphere=mesh2Vtk.GetNumberOfPoints()
mesh2Data=paraview.vtk.vtkFloatArray()
mesh2Data.SetNumberOfValues(nPointsSphere)
mesh2Data.SetName("mesh2Data")
#TODO: use numpy here?? do this with a ProgrammableFilter ?
data=np.random.rand(nPointsSphere,1)
for k in range(nPointsSphere):
    mesh2Data.SetValue(k, data[k])
mesh2Vtk.GetPointData().AddArray(mesh2Data)
#send back to paraview server
#from https://public.kitware.com/pipermail/paraview/2011-February/020120.html
t=TrivialProducer()
filter= t.GetClientSideObject()
filter.SetOutput(mesh2Vtk)
t.UpdatePipeline()
w=CreateWriter('Sphere_withData.vtp')
w.UpdatePipeline()
Delete(w)
#create mesh1 without data
mesh1=Line()
mesh1.Point1=[0,0,0]
mesh1.Point2=[0,0,1]
mesh1.Resolution=5
mesh1.UpdatePipeline()
progFilter=ProgrammableFilter(mesh1)
progFilter.Input=[mesh1, t]
progFilter.Script="curT=inputs[1].GetPointData().GetArray('mesh2Data')"\
"\nglobIndices=range(0,6)"\
"\nsubT=curT[globIndices]"\
"\nswap=vtk.vtkFloatArray()"\
"\nswap.SetNumberOfValues(len(globIndices))"\
"\nswap.SetName('T')"\
"\n#TODO: how can i avoid this loop, i.e. write output.GetPointData().AddArray(converToVTK(subT))"\
"\nfor k in range(len(globIndices)):"\
"\n swap.SetValue(k,subT[k])"\
"\noutput.PointData.AddArray(swap)"
progFilter.UpdatePipeline()
w=CreateWriter('Line_withData.vtp')
w.UpdatePipeline()
Delete(w)
I accepted the answer, because it looks right. The following two scripts even show the problem:
base script 'run.py':
src1='file1.vtu'
r1=XMLUnstructuredGridReader(FileName=src1)
progFilter=ProgrammableFilter(r1)
progFilter.Input=[r1]
with open('script.py','r') as myFile:
    progFilter.Script=myFile.read()
progFilter.UpdatePipeline()
progData=progFilter.GetPointDataInformation()
print progData.GetArray('T2').GetRange()
and the script for the programmable filter:
import vtk
import vtk.numpy_interface.dataset_adapter as dsa
import numpy as np
globIndices=inputs[0].GetPointData().GetArray('T')
subT=np.ones((globIndices.shape[0],1))
subTVtk=dsa.VTKArray(subT)
output.PointData.append(subTVtk, 'T2')
With this combination, I get the error messages:
File "/usr/lib/python2.7/dist-packages/vtk/numpy_interface/dataset_adapter.py", line 652, in append
self.VTKObject.AddArray(arr)
TypeError: AddArray argument 1: method requires a VTK object
File "run.py", line 15, in
print progData.GetArray('T2').GetRange()
AttributeError: 'NoneType' object has no attribute 'GetRange'
The first error seems to be the cause of the second one.

Here's a minimal example that creates a VTK data array from a NumPy array inside a Programmable Filter script, where output is already defined. You should be able to adapt it for your purposes.
import numpy as np
import vtk
from vtk.numpy_interface import dataset_adapter as da
np_arr = np.ones(6)
vtk_arr = da.VTKArray(np_arr)
output.PointData.append(vtk_arr, "my data")

Pandas Dataframe Data Type Conversion or Isomap Transformation

I load images with scipy's misc.imread, which in my case yields a 2304x3 ndarray per image. I then append each array to a list and convert the list to a DataFrame, so that I can later apply an Isomap transform to it. My DataFrame has 84 rows/samples (the images in the folder) and 2304 features, where each feature is an array/list of 3 elements. When I try to apply the Isomap transform I get this error:
ValueError: setting an array element with a sequence.
I think the error occurs because the elements of my DataFrame are of object type. First I tried using to_numeric on each column, but got an error; then I wrote a loop to convert each element to numeric. The results I get are still of object type. Here is my code:
import pandas as pd
from scipy import misc
from mpl_toolkits.mplot3d import Axes3D
import matplotlib
import matplotlib.pyplot as plt
import glob
from sklearn import manifold
samples = []
path = 'Datasets/ALOI/32/*.png'
files = glob.glob(path)
for name in files:
    img = misc.imread(name)
    img = img[::2, ::2]
    x = (img/255.0).reshape(-1,3)
    samples.append(x)
df = pd.DataFrame.from_records(samples, coerce_float = True)
for i in range(0,2304):
    for j in range(0,84):
        df[i][j] = pd.to_numeric(df[i][j], errors = 'coerce')
    df[i] = pd.to_numeric(df[i], errors = 'coerce')
print df[2303][83]
print df[2303].dtype
print df[2303][83].dtype
#iso = manifold.Isomap(n_neighbors=6, n_components=3)
#iso.fit(df)
#manifold = iso.transform(df)
#print manifold.shape
The last four lines are commented out because they give an error. The output I get is:
[ 0.05098039 0.05098039 0.05098039]
object
float64
As you can see, each element of the DataFrame is of type float64, but the whole column is of type object.
Does anyone know how to convert whole data frame to numeric?
Is there another way of applying Isomap?

Do you want to reshape your image to a new shape instead of the original one?
If not, change the following line in your code
x = (img/255.0).reshape(-1,3)
to
x = (img/255.0).reshape(-1)
so that each sample becomes a flat row of numbers instead of a 2304x3 array. I hope this resolves your issue.
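
To illustrate the difference, here is a small sketch using random arrays in place of the ALOI images. The 48x48 RGB size is an assumption inferred from the 2304 pixels mentioned in the question, and the names `images`, `per_pixel`, `flattened` and `features` are made up for this example.

```python
import numpy as np

# Hypothetical stand-ins for the 84 subsampled 48x48 RGB images.
rng = np.random.default_rng(0)
images = [
    rng.integers(0, 256, size=(48, 48, 3), dtype=np.uint8) for _ in range(84)
]

# reshape(-1, 3) keeps each sample two-dimensional: 2304 pixels x 3 channels.
per_pixel = [(img / 255.0).reshape(-1, 3) for img in images]

# reshape(-1) flattens each sample into one row of 6912 numeric features.
flattened = [(img / 255.0).reshape(-1) for img in images]

# Stacking the flat rows gives a plain 2-D float array (samples x features).
features = np.stack(flattened)

print(per_pixel[0].shape, features.shape, features.dtype)
```

A 2-D float array like `features` is the samples-by-features layout that scikit-learn estimators such as Isomap expect, and it can be passed to fit() directly or wrapped in pd.DataFrame(features) first.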
