How to get band names in geotiff stack?

From a GeoTIFF stack (an NDVI time series) in a file like 'NDVI_TS.tif', I want to get the individual band names, for example 'Band 086: 20190803T004719'. I can see them when I load the stack into QGIS. I need to recover the date from the name; in the example above it would be 2019-08-03. However, I can't find a way to access the names from Python. I can access the bands by index, but that doesn't tell me which date each band is from.
from osgeo import gdal
NDVI = gdal.Open('NDVI_TS.tif', gdal.GA_ReadOnly)
# For example, I can get the data from band 86 as an array with:
band86 = NDVI.GetRasterBand(86).ReadAsArray()
I feel there should be an easy solution for this, but I failed to find one.

I don't know if this is efficient, but this is how I solved it (only tested with small images):
bands = {NDVI.GetRasterBand(i).GetDescription(): i for i in range(1, NDVI.RasterCount + 1)}
You get a dictionary that maps each band name (the one you see in QGIS) to its index.
{ ...,
'Band 086: 20190803T004719': 86,
...
}
You can create a small function to parse the band name and get the dates as you want (didn't test it properly):
import datetime
import re
def string2date(string):
    # Capture the date part that precedes the 'T', e.g. '20190803'.
    date_string = re.match('.* (.*)T', string).group(1)
    return datetime.datetime.strptime(date_string, '%Y%m%d').strftime('%Y-%m-%d')
And then you can apply it when building the previous dict:
bands = {string2date(NDVI.GetRasterBand(i).GetDescription()): i for i in range(1, NDVI.RasterCount + 1)}
Hope it helps
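A minimal sketch of a slightly more defensive variant of the parsing step, assuming the band descriptions embed a timestamp of the form YYYYMMDDTHHMMSS as in the example above. The names description_to_datetime and index_bands_by_date are made up for illustration, and the sample list only stands in for the strings that GetDescription() would return. Each date maps to a list of band indices, so two bands acquired on the same day do not overwrite each other:

```python
import re
from datetime import datetime

# Compact timestamp such as 20190803T004719, anywhere in the description.
TIMESTAMP = re.compile(r"(\d{8})T(\d{6})")


def description_to_datetime(description):
    """Return the datetime embedded in a band description, or None if absent."""
    match = TIMESTAMP.search(description)
    if match is None:
        return None
    return datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")


def index_bands_by_date(descriptions):
    """Map ISO date strings to lists of 1-based band indices."""
    by_date = {}
    for index, description in enumerate(descriptions, start=1):
        stamp = description_to_datetime(description)
        if stamp is not None:
            by_date.setdefault(stamp.date().isoformat(), []).append(index)
    return by_date


# Hypothetical descriptions standing in for GetDescription() results.
sample = ["Band 001: 20190101T004711", "Band 002: 20190803T004719", ""]
print(index_bands_by_date(sample))
```

With a real dataset, the list could be built with [NDVI.GetRasterBand(i).GetDescription() for i in range(1, NDVI.RasterCount + 1)].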

Related

Trying to translate my Matlab routine for loading and preparing data to Python. Stuck in pandas

This is my first post, and I have been struggling with this problem for a few days now. The following Matlab code is what I usually use to load my data (.csv files) and prepare it for further calculations.
% I use this later on to preallocate my arrays, because they need to have the same length for the calculations
maxind = 400;
% test_numbers is a vector (like test_numbers = [1 2 3]) that I get from the user, so I can iterate over the test specimen numbers
for i = test_numbers
% Setup the Import Options and import the data
opts = delimitedTextImportOptions("NumVariables", 4);
% Specify range and delimiter
opts.DataLines = [2, Inf];
opts.Delimiter = ";";
% Specify column names and types
opts.VariableNames = ["time", "force", "displ_1", "displ_2"];
opts.VariableTypes = ["double", "double", "double", "double"];
% Specify file level properties
opts.ExtraColumnsRule = "ignore";
opts.EmptyLineRule = "read";
% Import the data
%Here I build the name and read the files/ the csv files are in the same folder as the main program.
data_col = readtable(['specimen_name',num2str(i),'.csv'], opts);
Data.force(:,i)=nan(maxind,1);
Data.force(1:length(data_col.time),i)=data_col.force;
Data.displ(:,i)=nan(maxind,1);
Data.displ(1:length(data_col.time),i)=nanmean([data_col.displ_1,data_col.displ_2]')';
Data.time(:,i)=nan(maxind,1);
Data.time(1:length(data_col.time),i)=data_col.time;
Data.name(i)={['specimen_name',num2str(i)]};
% Clear temporary variables
clear opts
end
Now I have to use Python instead of Matlab and I started with pandas to read my csv as DataFrame.
Now my question: is there a way to access my data like in this part of my Matlab code, or should I not use DataFrames for something like this in the first place? (I know I can access my data by column name, but I got stuck trying to address my data like Data.force(1:length(data_col.time),i) in a new DataFrame.)
Data.force(:,i)=nan(maxind,1);
Data.force(1:length(data_col.time),i)=data_col.force;
Data.displ(:,i)=nan(maxind,1);
Data.displ(1:length(data_col.time),i)=nanmean([data_col.displ_1,data_col.displ_2]')';
Data.time(:,i)=nan(maxind,1);
Data.time(1:length(data_col.time),i)=data_col.time;
Data.name(i)={['specimen_name',num2str(i)]};
Many thanks in advance for your help.
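One possible translation, offered only as a sketch and not as a tested port of the routine above: keep the padded columns in plain NumPy arrays stored in a dict, rather than in a DataFrame. It assumes ';'-separated files with one header row and at most maxind data rows. The function name stack_specimens is hypothetical, and the in-memory strings only stand in for files such as specimen_name1.csv. Columns are filled by position in the list, not by specimen number.

```python
import io

import numpy as np

MAXIND = 400  # same fixed column length as maxind in the Matlab code


def stack_specimens(sources, maxind=MAXIND):
    """Stack the files column-wise, padding short series with NaN."""
    n = len(sources)
    data = {key: np.full((maxind, n), np.nan) for key in ("time", "force", "displ")}
    for col, source in enumerate(sources):
        # genfromtxt accepts file names or file-like objects; empty fields become NaN.
        table = np.atleast_2d(np.genfromtxt(source, delimiter=";", skip_header=1))
        rows = table.shape[0]  # assumed to be <= maxind
        data["time"][:rows, col] = table[:, 0]
        data["force"][:rows, col] = table[:, 1]
        # Row-wise mean of the two displacement columns, ignoring NaN.
        data["displ"][:rows, col] = np.nanmean(table[:, 2:4], axis=1)
    return data


# Two small in-memory "files" of different lengths.
csv_a = io.StringIO("time;force;displ_1;displ_2\n0.0;1.0;0.1;0.3\n1.0;2.0;0.2;0.4\n")
csv_b = io.StringIO("time;force;displ_1;displ_2\n0.0;5.0;1.0;3.0\n")
data = stack_specimens([csv_a, csv_b], maxind=3)
print(data["force"])
```

Here data["force"][:, i] plays the role of Data.force(:,i), with zero-based i.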

Time series on folium map

I have a dataframe that records events that occurred in particular locations.
I am aware that folium does not allow dynamic display of the appearance of the events, so I was thinking about iterating through the dates and saving a PNG of each folium map created.
Unfortunately, I am stuck on a two-part problem:
1) how to loop through ranges of dates (for example, one map for each month)
2) an appropriate way to save the generated image in each loop iteration.
This is a dataframe sample for this example:
since = ['2019-07-05', '2019-07-17', '2014-06-12', '2016-03-11']
lats = [38.72572, 38.71362, 38.79263, 38.71931]
longs = [-9.13412, -9.14407, -9.40824, -9.13143]
since_map = {'since' : pd.Series(since), 'lats' : pd.Series(lats), 'longs' : pd.Series(longs)}
since_df = pd.DataFrame(since_map)
I was able to create the base map:
lat_l = 38.736946
long_l = -9.142685
base_l = folium.Map(location=[lat_l,long_l], zoom_start=12)
neigh = folium.map.FeatureGroup()
And add some of the markers to the folium map:
for lati, longi in zip(since_df.lats, since_df.longs):
    # folium expects [latitude, longitude]
    neigh.add_child(folium.CircleMarker([lati, longi], radius = 2, color = 'blue', fill = True))
base_l.add_child(neigh)
I am struggling to visualize how to loop through ranges of the dates and save each file. From what I saw here:
https://github.com/python-visualization/folium/issues/35 I actually have to open the saved html and then save it as png for each image.
If you could point me to an example or documentation that could demonstrate how this can be accomplished I would be very appreciative.
If you think I am overcomplicating it or you have a better alternative to what I am thinking I have an open ear to suggestions.
Thank you for your help.
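A rough sketch of the looping part, assuming one map per calendar month that contains events. The helper names events_by_month and save_monthly_maps and the output file names are hypothetical. Only the save to HTML is shown; turning each HTML file into a PNG would still need a browser-based screenshot step, as discussed in the linked issue.

```python
import pandas as pd


def events_by_month(df, date_column="since"):
    """Yield (month label, sub-frame) pairs, one per month that has events."""
    dates = pd.to_datetime(df[date_column])
    for period, group in df.groupby(dates.dt.to_period("M")):
        yield str(period), group


def save_monthly_maps(df, center, zoom_start=12):
    """Write one HTML map per month; the file names are an illustrative choice."""
    import folium  # imported here so the grouping helper works without folium

    for label, group in events_by_month(df):
        month_map = folium.Map(location=center, zoom_start=zoom_start)
        for lat, lon in zip(group.lats, group.longs):
            folium.CircleMarker([lat, lon], radius=2, color="blue", fill=True).add_to(month_map)
        month_map.save(f"events_{label}.html")


since_df = pd.DataFrame({
    "since": ["2019-07-05", "2019-07-17", "2014-06-12", "2016-03-11"],
    "lats": [38.72572, 38.71362, 38.79263, 38.71931],
    "longs": [-9.13412, -9.14407, -9.40824, -9.13143],
})
for label, group in events_by_month(since_df):
    print(label, len(group))
```

Calling save_monthly_maps(since_df, [38.736946, -9.142685]) would then write one file per month.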

Error "numpy.float64 object is not iterable" for CSV file creation in Python

I have some very noisy (astronomy) data in csv format. Its shape is (815900,2), with 815k points giving the mass of a disk at a certain time. The fluctuations are pretty noticeable when you look at it close up. For example, here is a snippet of the data, where the first column is time in seconds and the second is mass in kg:
40023700,2.40896E+028
40145700,2.44487E+028
40267700,2.44487E+028
40389700,2.44478E+028
40511600,1.535E+028
40633500,2.19067E+028
40755400,2.44496E+028
40877200,2.44489E+028
40999000,2.44489E+028
41120800,2.34767E+028
41242600,2.40936E+028
So it looks like there is a 1.53E+028 data point of noise, and also probably the 2.19E+028 and 2.35E+028 points.
To fix this, I am trying to set up a Python script that will read in the csv data, then apply a restriction so that if the mass is, e.g., < 2.35E+028, the whole row is removed, and then create a new csv file with only the "good" data points:
40023700,2.40896E+028
40145700,2.44487E+028
40267700,2.44487E+028
40389700,2.44478E+028
40755400,2.44496E+028
40877200,2.44489E+028
40999000,2.44489E+028
41242600,2.40936E+028
Following the top answer by n8henrie to this old question, I so far have:
import pandas as pd
import csv
# Here are the locations of my csv file of my original data and an EMPTY csv file that will contain my good, noiseless set of data
originaldata = '/Users/myname/anaconda2/originaldata.csv'
gooddata = '/Users/myname/anaconda2/gooddata.csv'
# I use pandas to read in the original data because then I can separate the columns of time as 'T' and mass as 'M'
originaldata = pd.read_csv('originaldata.csv',delimiter=',',header=None,names=['t','m'])
# Numerical values of the mass values
M = originaldata['m'].values
# Now to put a restriction in
for row in M:
    new_row = []
    for column in row:
        if column > 2.35E+028:
            new_row.append(column)
    csv.writer(open(newfile,'a')).writerow(new_row)
print('\n\n')
print('After:')
print(open(newfile).read())
However, when I run this, I get this error:
TypeError: 'numpy.float64' object is not iterable
I know the first column (time) is dtype int64 and the second column (mass) is dtype float64... but as a beginner, I'm still not quite sure what this error means or where I'm going wrong. Any help at all would be appreciated. Thank you very much in advance.

You can select rows by a boolean operation. Example:
import pandas as pd
from io import StringIO
data = StringIO('''\
40023700,2.40896E+028
40145700,2.44487E+028
40267700,2.44487E+028
40389700,2.44478E+028
40511600,1.535E+028
40633500,2.19067E+028
40755400,2.44496E+028
40877200,2.44489E+028
40999000,2.44489E+028
41120800,2.34767E+028
41242600,2.40936E+028
''')
df = pd.read_csv(data,names=['t','m'])
good = df[df.m > 2.35e+28]
out = StringIO()
good.to_csv(out,index=False,header=False)
print(out.getvalue())
Output:
40023700,2.40896e+28
40145700,2.44487e+28
40267700,2.44487e+28
40389700,2.44478e+28
40755400,2.44496e+28
40877200,2.44489e+28
40999000,2.44489e+28
41242600,2.40936e+28

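This returns a single column as a one-dimensional array: M = originaldata['m'].values
So when you do for row in M:, each row is a single numpy.float64 value, not a sequence, and the inner loop for column in row: cannot iterate over it. That is what the TypeError is reporting.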
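For comparison, a minimal sketch of the row-by-row approach done with the csv module alone, so that each row really is a sequence of fields. The function name filter_rows is hypothetical, and in-memory buffers stand in for originaldata.csv and gooddata.csv:

```python
import csv
import io


def filter_rows(reader, writer, min_mass=2.35e28):
    """Copy rows whose second field (mass) exceeds min_mass; return the count kept."""
    kept = 0
    for row in reader:  # each row is a list of strings: [time, mass]
        if not row:  # skip blank lines
            continue
        if float(row[1]) > min_mass:
            writer.writerow(row)
            kept += 1
    return kept


source = io.StringIO(
    "40389700,2.44478E+028\n"
    "40511600,1.535E+028\n"
    "40633500,2.19067E+028\n"
    "40755400,2.44496E+028\n"
)
target = io.StringIO()
kept = filter_rows(csv.reader(source), csv.writer(target, lineterminator="\n"))
print(kept)
print(target.getvalue())
```

With real files, the two buffers would be replaced by open(originaldata, newline='') and open(gooddata, 'w', newline=''). For 815k rows, the boolean selection in pandas shown above is likely the simpler choice.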

Formatting time and plotting it

I have the following Excel file, with time stamps in the format
20180821_2330
for a lot of days.
1) How would I convert them to a standard time format so that I can plot them against the other sensor values?
2) I would like to have one big plot with, for example, the sensor 1 readings across all the days. Is that possible?
https://www.mediafire.com/file/m36ha4777d6epvd/median_data.xlsx/file

Is this what you are looking for? I improvised and created a dictionary 'n' with a 'timestamp' column to stand in for your data frame. Basically, I think you should apply a function (called 'apply_fun' below) to the column that stores the timestamps; it passes each element to 'format_dates', which converts it to a datetime object with strptime().
import datetime
import pandas as pd
n = {'timestamp':['20180822_2330', '20180821_2334', '20180821_2334', '20180821_2330']}
data_series = pd.DataFrame(n)
def format_dates(n):
    x = n.find('_')
    y = datetime.datetime.strptime(n[:x]+n[x+1:], '%Y%m%d%H%M')
    return y
def apply_fun(dataset):
    dataset['timestamp2'] = dataset['timestamp'].apply(format_dates)
    return dataset
print(apply_fun(data_series))
As for the 2nd point, I can't reach the site because the McAfee agent at work blocks it. Once you have the 1st point working, you can ask about the 2nd separately.
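As a side note, the underscore does not have to be removed first, because strptime() can match it literally in the format string. A small sketch, assuming every stamp follows the YYYYMMDD_HHMM pattern from the question (parse_stamp is a made-up name):

```python
from datetime import datetime


def parse_stamp(stamp):
    """Parse a stamp such as '20180821_2330' into a datetime."""
    return datetime.strptime(stamp, "%Y%m%d_%H%M")


stamps = ["20180822_2330", "20180821_2334", "20180821_2330"]
parsed = sorted(parse_stamp(s) for s in stamps)
print(parsed[0].isoformat())  # 2018-08-21T23:30:00
```

The same format string can be passed to pd.to_datetime(column, format='%Y%m%d_%H%M') to convert a whole column at once. The resulting datetime values can then serve as the x axis when plotting a sensor column.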

Get results in an Earth Engine python script

I'm trying to get the mean NDVI for every polygon in a feature collection with the Earth Engine Python API.
I think that I succeeded in getting the result (a feature collection of feature collections), but I don't know how to get the data out of it.
The data I want are the feature IDs and the mean NDVI of each feature.
import datetime
import ee
ee.Initialize()
#Feature collection
fc = ee.FeatureCollection("ft:1s57dkY_Sg_E_COTe3sy1tIR_U-5Gw-BQNwHh4Xel");
fc_filtered = fc.filter(ee.Filter.equals('NUM_DECS', 1))
#Image collection
Sentinel_collection1 = (ee.ImageCollection('COPERNICUS/S2')).filterBounds(fc_filtered)
Sentinel_collection2 = Sentinel_collection1.filterDate(datetime.datetime(2017, 1, 1),datetime.datetime(2017, 8, 1))
# NDVI function to use with ee map
def NDVIcalc (image):
    red = image.select('B4')
    nir = image.select('B8')
    ndvi = nir.subtract(red).divide(nir.add(red)).rename('NDVI')
    #NDVI mean calculation with reduceRegions
    MeansFeatures = ndvi.reduceRegions(reducer= ee.Reducer.mean(),collection= fc_filtered,scale= 10)
    return (MeansFeatures)
#Result from which I don't know how to get the information: feature IDs and NDVI means
result = Sentinel_collection2.map(NDVIcalc)

If the result is small, you can pull it into Python using result.getInfo(). That will give you a Python dictionary containing a list of FeatureCollections (which are more dictionaries). However, if the results are large or the polygons cover large regions, you'll have to Export the collection instead.
That said, there are probably some other things you'll want to do first:
1) You might want to flatten() the collection, so it's not nested collections. It'll be easier to handle that way.
2) You might want to add a date to each result so you know what time the result came from. You can do that with a map on the result, inside your NDVIcalc function:
return MeansFeatures.map(lambda f : f.set('date', image.date().format()))
3) If what you really want is a time-series of NDVI over time for each polygon (most common), then restructuring your code to map over polygons first will be easier:
Sentinel_collection = (ee.ImageCollection('COPERNICUS/S2')
.filterBounds(fc_filtered)
.filterDate(ee.Date('2017-01-01'),ee.Date('2017-08-01')))
def GetSeries(feature):
    def NDVIcalc(img):
        red = img.select('B4')
        nir = img.select('B8')
        ndvi = nir.subtract(red).divide(nir.add(red)).rename(['NDVI'])
        return (feature
                .set(ndvi.reduceRegion(ee.Reducer.mean(), feature.geometry(), 10))
                .set('date', img.date().format("YYYYMMdd")))
    series = Sentinel_collection.map(NDVIcalc)
    # Get the time-series of values as a list of [date, NDVI] pairs.
    pairs = series.reduceColumns(ee.Reducer.toList(2), ['date', 'NDVI']).get('list')
    return feature.set(ee.Dictionary(ee.List(pairs).flatten()))
result = fc_filtered.map(GetSeries)
print(result.getInfo())
4) And finally, if you're going to try to Export the result, you're likely to run into an issue where the columns of the exported table are selected from whatever columns the first feature has, so it's good to provide a "header" feature that has all columns (times), that you can merge() with the result as the first feature:
# Get all possible dates.
dates = ee.List(Sentinel_collection.map(
    lambda img: ee.Feature(None, {'date': img.date().format("YYYYMMdd")})
).aggregate_array('date'))
# Make a default value for every date.
header = ee.Feature(None, ee.Dictionary.fromLists(dates, ee.List.repeat(-1, dates.size())))
# A Feature has no merge(), so wrap the header in a collection first.
output = ee.FeatureCollection([header]).merge(result)
ee.batch.Export.table.toDrive(...)
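For the small-result case mentioned at the top, here is a sketch of how the dictionary returned by getInfo() on a flattened collection might be unpacked in plain Python. The sample_info dictionary is invented for illustration and only mimics the usual 'features' / 'id' / 'properties' layout. The property name 'mean' is an assumption about how the reducer output is named, and feature_rows is a hypothetical helper:

```python
def feature_rows(info, keys):
    """Return one dict per feature with its id and the requested properties."""
    rows = []
    for feature in info.get("features", []):
        props = feature.get("properties", {})
        row = {"id": feature.get("id")}
        for key in keys:
            row[key] = props.get(key)  # None when the property is missing
        rows.append(row)
    return rows


# Invented stand-in for result.flatten().getInfo(); not real Earth Engine output.
sample_info = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "2", "geometry": None,
         "properties": {"NUM_DECS": 1, "mean": 0.61, "date": "2017-03-04T11:02:11"}},
        {"type": "Feature", "id": "7", "geometry": None,
         "properties": {"NUM_DECS": 1, "date": "2017-03-04T11:02:11"}},
    ],
}
for row in feature_rows(sample_info, ["date", "mean"]):
    print(row)
```

A polygon with no valid pixels may come back without the reducer property at all, which is why missing keys are mapped to None here instead of raising.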
