Substitute dataset coordinates in xarray (Python)

I have a dataset stored in NetCDF4 format that consists of Intensity values with 3 dimensions: Loop, Delay and Wavelength. I named my coordinates the same as the dimensions (I don't know if it's good or bad...)
I'm using xarray (formerly xray) in Python to load the dataset:
import xarray as xr
ds = xr.open_dataset('test_data.netcdf4')
Now I want to manipulate the data while keeping track of the original data. For instance, I would:
1. Apply an offset to the Delay coordinates while keeping the original Delay DataArray untouched. This seems to be done with:
ds_ = ds.assign_coords(Delay_corr=ds.Delay.copy(deep=True) + 25)
2. Substitute the coordinate Delay with Delay_corr for all relevant DataArrays in the dataset. However, I have no clue how to do this and I didn't find anything in the documentation.
Would anybody know how to perform item #2?
To download the NetCDF4 file with test data:
http://1drv.ms/1QHQTRy

The method you're looking for is Dataset.swap_dims():
ds.coords['Delay_corr'] = ds.Delay + 25 # could also use assign_coords
ds2 = ds.swap_dims({'Delay': 'Delay_corr'})
See this section of the xarray docs for a full example.
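For a quick illustration, here is a minimal self-contained sketch with made-up values (not the questioner's file), showing that Intensity ends up indexed by Delay_corr while the original Delay values remain available as a non-dimension coordinate:
import numpy as np
import xarray as xr

# Build a small stand-in dataset with the same layout as in the question.
ds = xr.Dataset(
    {"Intensity": (("Loop", "Delay", "Wavelength"), np.random.rand(2, 3, 4))},
    coords={"Loop": [0, 1], "Delay": [10.0, 20.0, 30.0], "Wavelength": [400, 500, 600, 700]},
)

ds.coords["Delay_corr"] = ds.Delay + 25       # add the corrected coordinate
ds2 = ds.swap_dims({"Delay": "Delay_corr"})   # Intensity is now indexed by Delay_corr

print(ds2.Intensity.dims)  # ('Loop', 'Delay_corr', 'Wavelength')
print(ds2.Delay.values)    # original values still available as a non-dimension coordinate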

I think it's much simpler than that.
If you don't want to change the existing data, create a copy first. Note that modifying ds in memory won't change the NetCDF4 file on disk, but assuming you still don't want to touch ds:
ds_ = ds.copy(deep=True)
Then just set the Delay coordinate to a modified version of the old one:
ds_.coords['Delay'] = ds_['Delay'] + 25
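A quick check, using the names above, that the original dataset keeps its unshifted coordinate:
print(ds.Delay.values[:3])    # original Delay values, unchanged
print(ds_.Delay.values[:3])   # shifted by 25 in the copy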

Related

Is there any way to use arithmetic ops on FITS files in Python?

I'm fairly new to Python, and I have been trying to recreate a working IDL program in Python, but I'm stuck and keep getting errors. I haven't been able to find a solution yet.
The program requires 4 FITS files in total (img and correctional images dark, flat1, flat2). The operations are as follows:
flat12 = (flat1 + flat2)/2
img1 = (img - dark)/flat12
These files have dimensions (1024, 1024, 1). I have resized them to (1024, 1024) just to be able to use the imshow() function.
I have also tried using cv2.add(), but I get this:
TypeError: Expected Ptr<cv::UMat> for argument 'src1'
Is there any workaround for this? Thanks in advance.
To read your FITS files use astropy.io.fits: http://docs.astropy.org/en/latest/io/fits/index.html
This will give you NumPy arrays (and FITS headers if needed; there are different ways to do this, as explained in the documentation), so you could do something like:
>>> from astropy.io import fits
>>> img = fits.getdata('image.fits', ext=0) # extension number depends on your FITS files
>>> dark = fits.getdata('dark.fits') # by default it reads the first "data" extension
>>> darksub = img - dark
>>> fits.writeto('out.fits', darksub) # save output
If your data has an extra dimension, as shown with the (1024,1024,1) shape, and if you want to remove that axis, you can use the normal Numpy array slicing syntax: darksub = img[0] - dark[0].
Otherwise in the example above it will produce and save a (1024,1024,1) image.
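Putting the original operations together, here is a hedged sketch (file names are placeholders) that averages the two flats, dark-subtracts, and flat-fields the image, all as plain NumPy arithmetic:
from astropy.io import fits

# Read the data arrays (the extension number may differ for your files).
img = fits.getdata('image.fits')
dark = fits.getdata('dark.fits')
flat1 = fits.getdata('flat1.fits')
flat2 = fits.getdata('flat2.fits')

flat12 = (flat1 + flat2) / 2.0   # average the two flats
img1 = (img - dark) / flat12     # dark-subtract and flat-field the image

fits.writeto('img1.fits', img1, overwrite=True)  # save the result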

How to save extracted pitch values in a csv file?

I should mention that this is the very first time I'm trying audio signal processing in Python. I have an audio dataset and I am extracting pitch features using the Aubio library, and MFCC features using the python_speech_features library. The thing is, for a single audio file I get a roughly 84-valued vector for the pitch and a 12-valued feature vector for the MFCC.
Image of extracted pitch feature vector
So how do I save this many values in a single CSV file? I have around 700 audio files separated into different directories by emotion. Should I take the mean of all of these values and save them per audio file in a CSV? Like this:
Also, how would I then use these values for classification?
Any help would be much appreciated, Thanks.
There is not a simple answer to your question.
I understand that for each data sample you extract a set of features, the same set for every sample, right?
I suppose you work within a for loop, something like this:
import numpy as np

all_features = []
for path in path_list:
    x = open_file(path)              # a hypothetical function to open your files
    features = extract_features(x)   # a hypothetical function to extract features
    all_features.append(features)
If your code looks like my simple example, you have created a list all_features whose element all_features[i] contains the extracted features of sample i. In addition, I assume that each extracted feature set is a NumPy vector. If it is not, you should convert it into one (something like features = np.array(features)).
Ok, now you are ready to create a dataset:
data = np.vstack(all_features)
The vertical stack np.vstack generates a matrix of shape (n_samples, n_features). Warning: all feature vectors must have the same shape!
Now you want to save the dataset. There is an ocean of possibilities; these are my three favorite options:
1) Using pandas to create a CSV file:
import pandas as pd
df = pd.DataFrame(data)
df.to_csv(filename + '.csv', index=False, header=header)  # header is a list of strings naming the CSV columns
# see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
2) Dump the data into a pickle file:
import six.moves.cPickle as pickle

with open(filename + '.pkl', 'wb') as f:
    pickle.dump(data, f)
3) Save as a NumPy file:
np.save(filename+'.npy', data)
Concerning the classification problem: if you want to use a supervised method (MLP, RF, SVM, KNN, ...) you need class labels (the ground truth), i.e. a vector whose length equals the number of samples and which maps each sample to an integer (for example 0, 1 in a binary classification, or 0, 1, 2, 3 for a 4-class classification). This strongly depends on what you want, i.e. the goal of your training.
Once you have the data matrix and the label vector, any machine-learning method will be able to classify, provided you have enough samples. To that end, I suggest you use some data-augmentation criteria; to get an idea, have a look at this paper, it could give you some ideas.
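As a hedged illustration only (scikit-learn and the placeholder arrays are my assumptions, not something from the question), the data matrix and label vector would be used roughly like this:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = np.random.rand(8, 96)                  # placeholder for the (n_samples, n_features) matrix built above
labels = np.array([0, 1, 0, 2, 1, 2, 0, 1])   # placeholder ground-truth label per sample

X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.25)
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))              # accuracy on the held-out split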
Hoping I have helped you, good work!
Python has a built-in csv module.
This section of the docs gives a simple example of how to use a writer to write rows to your CSV.
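For instance, a minimal sketch using only the standard library (the file name, column names, and rows are placeholders):
import csv

rows = [["file1.wav", 0.12, 0.34], ["file2.wav", 0.56, 0.78]]  # one row per audio file

with open('features.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['filename', 'mean_pitch', 'mean_mfcc'])   # header row
    writer.writerows(rows)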

Creating a new Image file in astropy with specified dimensions of another image file

I'm having a problem with FITS file manipulation in the astropy package, and I'm in need of some help.
I essentially want to take an image I have in FITS file format and create a new file into which I can start putting correction factors, plus a new image which can then be combined with the correction factors and the original image to produce a corrected image. Each of these will have the same dimensions.
Starting with this:
from astropy.io import fits
# Compute the size of the images (you can also do this manually rather than calling these keywords from the header):
#URL: /Users/UCL_Astronomy/Documents/UCL/PHASG199/M33_UVOT_sum/UVOTIMSUM/M33_sum_epoch1_um2_norm.img
nxpix_um2_ext1 = fits.open('...')[1].header['NAXIS1']
nypix_um2_ext1 = fits.open('...')[1].header['NAXIS2']
#nxpix_um2_ext1 = 4071 #hima_sk_um2[1].header['NAXIS1'] # IDL: nxpix_uw1_ext1 = sxpar(hima_sk_uw1_ext1,'NAXIS1')
#nypix_um2_ext1 = 4321 #hima_sk_um2[1].header['NAXIS2'] # IDL: nypix_uw1_ext1 = sxpar(hima_sk_uw1_ext1,'NAXIS2')
# Make a new image file with the same dimensions (and headers, etc) to save the correction factors:
coicorr_um2_ext1 = ??[nxpix_um2_ext1,nypix_um2_ext1]
# Make a new image file with the same dimensions (and headers, etc) to save the corrected image:
ima_sk_coicorr_um2_ext1 = ??[nxpix_um2_ext1,nypix_um2_ext1]
Can anyone give me the obvious knowledge I am missing to do this? The last two lines are just there to outline what is missing. I have included ?? to signal that I need something else there, perhaps fits.writeto() or something similar...
The astropy documentation takes you through this task step by step: create an array of size (NAXIS2, NAXIS1), put the data in the primary HDU, make an HDUList and write it to disk:
import numpy as np
from astropy.io import fits
data = np.zeros((NAXIS2,NAXIS1))
hdu = fits.PrimaryHDU(data)
hdulist = fits.HDUList([hdu])
hdulist.writeto('new.fits')
I think @VinceP's answer is correct, but I'll add some more information because I think you are not using the capabilities of astropy well here.
First of all, Python is zero-based, so the primary extension has the number 0. Maybe you got that wrong, maybe you didn't, but it's uncommon to access the second HDU so I thought I'd better mention it.
hdu_num = 0 # Or use = 1 if you really want the second hdu.
Secondly, you do not need to open the same file twice; you can open it once and close it after extracting the relevant values:
with fits.open('...') as hdus:
    nxpix_um2_ext1 = hdus[hdu_num].header['NAXIS1']
    nypix_um2_ext1 = hdus[hdu_num].header['NAXIS2']
# Continue without indentation and the file will be closed again.
or if you want to keep the whole header (for saving it later) and the data you can use:
with fits.open('...') as hdus:
    hdr = hdus[hdu_num].header
    data = hdus[hdu_num].data  # I'll also take the data for comparison.
I'll continue with the second approach because I think it's a lot cleaner and you'll have all the data and header values ready.
new_data = np.zeros((hdr['NAXIS2'], hdr['NAXIS1']))
Please note that Python interprets the axes differently than IRAF (and I think IDL, but I'm not sure), so you need axis 2 as the first element and axis 1 as the second.
So do a quick check that the shapes are the same:
print(new_data.shape)
print(data.shape)
If they are not equal, I got confused about the axes in Python (again), but I don't think so. Instead of creating a new array based on the header values, you can also create one by just using the old shape:
new_data_2 = np.zeros(data.shape)
That will ensure the dimensions and shape are identical. Now you have an empty image. If you would rather have a copy, you can (but do not need to) explicitly copy the data (except if you opened the file in write/append/update mode, in which case you should always copy it, but that's not the default).
new_data = data # or = data.copy() for explicitly copying.
Do your operations on it, and if you want to save it again you can use what @VinceP suggested:
hdu = fits.PrimaryHDU(new_data, header=hdr) # This ensures the same header is written to the new file
hdulist = fits.HDUList([hdu])
hdulist.writeto('new.fits')
Please note that you don't have to alter the shape-related header keywords even if you changed the data's shape, because during writeto astropy will update these (by default).

Importing variables from Netcdf into Python

I am very new to Python, and I have managed to read some variables from NetCDF into Python and plot them, but the size of the variables isn't correct.
My dataset is 144 x 90 (lon x lat), but when I read in the variables, a large section of the data seems to be missing.
Do I need to specify the size of the dataset I'm reading in? Is that what I'm doing wrong here?
Here is the code I am using:
import netCDF4
from netCDF4 import Dataset
from pylab import *
ncfile = Dataset('DEC3499.aijE03Ccek11p5A.nc','r')
temp = ncfile.variables['tsurf']
prec = ncfile.variables['prec']
subplot(2,1,1)
pcolor(temp)
subplot(2,1,2)
pcolor(prec)
savefig('DEC3499.png',optimize=True,quality=85)
quit()
Just to clarify, here is an image showing the output. There should be data right to the far right hand side of the box.
(http://img163.imageshack.us/img163/6900/screenshot20130520at112.png)
I figured it out.
For those interested, I just needed to amend the following lines to pull in the variables properly:
temp = ncfile.variables['tsurf'][:,:]
prec = ncfile.variables['prec'][:,:]
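For reference, a fuller hedged sketch (same file and variable names assumed) that checks the array shapes to confirm the whole grid is read:
from netCDF4 import Dataset

ncfile = Dataset('DEC3499.aijE03Ccek11p5A.nc', 'r')
temp = ncfile.variables['tsurf'][:, :]  # slicing returns a plain (masked) NumPy array
prec = ncfile.variables['prec'][:, :]

print(temp.shape, prec.shape)  # likely (lat, lon) order, so expect something like (90, 144)
ncfile.close()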
Thanks!

Check if a geopoint with latitude and longitude is within a shapefile

How can I check if a geopoint is within the area of a given shapefile?
I managed to load a shapefile in python, but can't get any further.
Another option is to use Shapely (a Python library based on GEOS, the engine for PostGIS) and Fiona (which is basically for reading/writing files):
import fiona
import shapely.geometry

with fiona.open("path/to/shapefile.shp") as fiona_collection:
    # In this case, we'll assume the shapefile only has one record/layer (e.g., the shapefile
    # is just for the borders of a single country, etc.).
    shapefile_record = next(iter(fiona_collection))
    # Use Shapely to create the polygon
    shape = shapely.geometry.shape(shapefile_record['geometry'])
    point = shapely.geometry.Point(32.398516, -39.754028)  # longitude, latitude
    # Alternative: if point.within(shape)
    if shape.contains(point):
        print("Found shape for point.")
Note that doing point-in-polygon tests can be expensive if the polygon is large/complicated (e.g., shapefiles for some countries with extremely irregular coastlines). In some cases it can help to use bounding boxes to quickly rule things out before doing the more intensive test:
minx, miny, maxx, maxy = shape.bounds
bounding_box = shapely.geometry.box(minx, miny, maxx, maxy)
if bounding_box.contains(point):
    ...
Lastly, keep in mind that it takes some time to load and parse large/irregular shapefiles (unfortunately, those types of polygons are often expensive to keep in memory, too).
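Combining the two ideas, a minimal hedged sketch of the cheap-rejection-first pattern (reusing the shape and point objects from the snippet above):
import shapely.geometry

def point_in_shape(point, shape):
    # Cheap bounding-box rejection first; the precise (potentially expensive) test runs only if the box matches.
    minx, miny, maxx, maxy = shape.bounds
    bounding_box = shapely.geometry.box(minx, miny, maxx, maxy)
    return bounding_box.contains(point) and shape.contains(point)

# Usage with the objects from the earlier snippet:
# point_in_shape(point, shape)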
This is an adaptation of yosukesabai's answer.
I wanted to ensure that the point I was searching for was in the same projection system as the shapefile, so I've added code for that.
I couldn't understand why he was doing a contains test on ply = feat_in.GetGeometryRef() (in my testing things seemed to work just as well without it), so I removed that.
I've also improved the commenting to better explain what's going on (as I understand it).
#!/usr/bin/python
import ogr
from IPython import embed
import sys
drv = ogr.GetDriverByName('ESRI Shapefile') #We will load a shape file
ds_in = drv.Open("MN.shp") #Get the contents of the shape file
lyr_in = ds_in.GetLayer(0) #Get the shape file's first layer
#Put the title of the field you are interested in here
idx_reg = lyr_in.GetLayerDefn().GetFieldIndex("P_Loc_Nm")
#If the latitude/longitude we're going to use is not in the projection
#of the shapefile, then we will get erroneous results.
#The following assumes that the latitude longitude is in WGS84
#This is identified by the number "4326", as in "EPSG:4326"
#We will create a transformation between this and the shapefile's
#project, whatever it may be
geo_ref = lyr_in.GetSpatialRef()
point_ref=ogr.osr.SpatialReference()
point_ref.ImportFromEPSG(4326)
ctran=ogr.osr.CoordinateTransformation(point_ref,geo_ref)
def check(lon, lat):
    # Transform incoming longitude/latitude to the shapefile's projection
    [lon, lat, z] = ctran.TransformPoint(lon, lat)
    # Create a point
    pt = ogr.Geometry(ogr.wkbPoint)
    pt.SetPoint_2D(0, lon, lat)
    # Set up a spatial filter such that the only features we see when we
    # loop through "lyr_in" are those which overlap the point defined above
    lyr_in.SetSpatialFilter(pt)
    # Loop through the overlapped features and display the field of interest
    for feat_in in lyr_in:
        print(lon, lat, feat_in.GetFieldAsString(idx_reg))

# Take command-line input and do all this
check(float(sys.argv[1]), float(sys.argv[2]))
# check(-95, 47)
This site, this site, and this site were helpful regarding the projection check (EPSG:4326).
Here is a simple solution based on pyshp and shapely.
Let's assume that your shapefile only contains one polygon (but you can easily adapt for multiple polygons):
import shapefile
from shapely.geometry import shape, Point
# read your shapefile
r = shapefile.Reader("your_shapefile.shp")
# get the shapes
shapes = r.shapes()
# build a shapely polygon from your shape
polygon = shape(shapes[0])
def check(lon, lat):
    # build a shapely point from your geopoint
    point = Point(lon, lat)
    # the contains function does exactly what you want
    return polygon.contains(point)
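For the multiple-polygon case mentioned above, a hedged sketch on the same pyshp/shapely stack (it assumes the first attribute field of each record holds a usable name) that returns which shape, if any, contains the point:
import shapefile
from shapely.geometry import shape, Point

r = shapefile.Reader("your_shapefile.shp")

def find_containing_shape(lon, lat):
    point = Point(lon, lat)
    # pair each geometry with its attribute record so we can report which one matched
    for shp, rec in zip(r.shapes(), r.records()):
        if shape(shp).contains(point):
            return rec[0]  # assumed: first attribute field identifies the polygon
    return None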
I did almost exactly what you are doing yesterday, using GDAL's ogr with Python bindings. It looked like this:
import ogr
# load the shape file as a layer
drv = ogr.GetDriverByName('ESRI Shapefile')
ds_in = drv.Open("./shp_reg/satreg_etx12_wgs84.shp")
lyr_in = ds_in.GetLayer(0)
# field index for which i want the data extracted
# ("satreg2" was what i was looking for)
idx_reg = lyr_in.GetLayerDefn().GetFieldIndex("satreg2")
def check(lon, lat):
    # create point geometry
    pt = ogr.Geometry(ogr.wkbPoint)
    pt.SetPoint_2D(0, lon, lat)
    lyr_in.SetSpatialFilter(pt)
    # go over all the polygons in the layer and see if one includes the point
    for feat_in in lyr_in:
        # roughly subsets features, instead of going over everything
        ply = feat_in.GetGeometryRef()
        # test
        if ply.Contains(pt):
            # TODO do what you need to do here
            print(lon, lat, feat_in.GetFieldAsString(idx_reg))
Check out http://geospatialpython.com/2011/01/point-in-polygon.html and http://geospatialpython.com/2011/08/point-in-polygon-2-on-line.html
One way to do this is to read the ESRI Shapefile using the OGR library and then use the GEOS geometry library (http://trac.osgeo.org/geos/) to do the point-in-polygon test. This requires some C/C++ programming.
There is also a python interface to GEOS at http://sgillies.net/blog/14/python-geos-module/ (which I have never used). Maybe that is what you want?
Another solution is to use the http://geotools.org/ library. That is in Java.
I also have my own Java software to do this (which you can download from http://www.mapyrus.org plus jts.jar from http://www.vividsolutions.com/products.asp). You need only a text command file inside.mapyrus containing the following lines to check if a point lies inside the first polygon in the ESRI Shapefile:
dataset "shapefile", "us_states.shp"
fetch
print contains(GEOMETRY, -120, 46)
And run with:
java -cp mapyrus.jar:jts-1.8.jar org.mapyrus.Mapyrus inside.mapyrus
It will print a 1 if the point is inside, 0 otherwise.
You might also get some good answers if you post this question on https://gis.stackexchange.com/
If you want to find out which polygon (from a shapefile full of them) contains a given point (and you have a bunch of points as well), the fastest way is to use PostGIS. I actually implemented a fiona-based version, using the answers here, but it was painfully slow (I was using multiprocessing and checking the bounding box first): 400 minutes of processing for 50k points. Using PostGIS, that took less than 10 seconds. Spatial (GiST) indexes are efficient!
shp2pgsql -s 4326 shapes.shp > shapes.sql
That will generate an SQL file with the information from the shapefile. Create a database with PostGIS support, run that SQL, and create a GiST index on the geom column. Then, to find the name of the polygon containing a point:
sql="SELECT name FROM shapes WHERE ST_Contains(geom,ST_SetSRID(ST_MakePoint(%s,%s),4326));"
cur.execute(sql,(x,y))
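For completeness, a hedged end-to-end sketch (database name, table name, credentials, and coordinates are placeholders; psycopg2 is assumed):
import psycopg2

conn = psycopg2.connect(dbname="gisdb", user="postgres", password="secret")
cur = conn.cursor()

# One-time setup after loading the shapes with shp2pgsql:
cur.execute("CREATE INDEX IF NOT EXISTS shapes_geom_gist ON shapes USING GIST (geom);")
conn.commit()

x, y = -93.0, 45.0  # example longitude, latitude
sql = "SELECT name FROM shapes WHERE ST_Contains(geom, ST_SetSRID(ST_MakePoint(%s, %s), 4326));"
cur.execute(sql, (x, y))
row = cur.fetchone()
print(row[0] if row else "No containing polygon found")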
