I try to build a heat map by using bokeh. However I keep getting the same error. I'll include both my code and error below, please help me out!
I assumed that the error is mainly about Nan's in my data, so I've added necessary if statements to the code to make sure that this issue is addressed. Even tried to fill any possible Na's with zero in the following lists: 'user','module','ratio','color', and 'alpha'. However none of these changes helped.
colors = ['#ff0000','#ff1919','#ff4c4c','#ff7f7f','#99cc99','#7fbf7f','#4ca64c','#329932','#008000'] sorted_userlist = list(total_checks_sorted.index) user = [] module = [] ratio = [] color = [] alpha = []
for m_id in ol_module_ids:
pset = m_id.split('/')[-1]
col_name1 = m_id + '_ratio'
col_name2 = m_id + '_total'
min_checks = min(check_matrix[col_name2].values)
max_checks = max(check_matrix[col_name2].values)
for i, u in enumerate(sorted_userlist):
module.append(pset)
user.append(str(i+1))
ratio_value = check_matrix[col_name1][u]
ratio.append(ratio_value)
al= math.sqrt((check_matrix[col_name2][u]-min_checks+0.0001)/float(max_checks))
if ratio_value>0.16:
al = min(al*100,1)
alpha.append(al)
if np.isnan(ratio_value):
color.append(colors[0])
else:
color.append(colors[int(ratio_value*8)])
#fill NAs in source lists with zeroes pd.Series(ratio).fillna(0).tolist()
col_source = ColumnDataSource(data = dict(module = module, user = user, color=color, alpha=alpha, ratio = ratio))
#source = source.fillna('')
#TOOLS = "resize,hover,save,pan,box_zoom,wheel_zoom" TOOLS = "reset,hover,save,pan,box_zoom,wheel_zoom"
p=figure(title="Ratio of Correct Checks Each Student Each Online Homework Problem",
x_range=pset,
#y_range = list(reversed(sorted_userlist)),
y_range=list(reversed(list(map(str, range(1,475))))),
x_axis_location="above", plot_width=900, plot_height=4000,
toolbar_location="left", tools=TOOLS)
#axis_line_color = None)
#outline_line_color = None)#
p.rect("module", "user", 1, 1, source=col_source,
color="color", alpha = 'alpha', line_color=None)
show(p)
NaN values are not JSON serializable (this is a glaring deficiency in the JSON standard). You mentioned there are NaN values in the ratio list, which you are putting in the ColumnDataSource here:
col_source = ColumnDataSource(data=dict(..., ratio=ratio))
Since it is in the CDS, Bokeh will try to serialize it, resulting in the error. You have two options:
If you don't actually need the numeric ratio values in the plot for some reason (e.g. to drive a hover tool or custom mapper or something), then just leave it out of the data source.
If you do need to send the ratio values, then you must put the data into a NumPy array. Bokeh serializes NumPy arrays using a different, non-JSON approach, so it is then possible to send NaNs successfully.
Related
I am trying to extract point values from a Google Earth Engine Image Collection by specifying lat/long information.
This seems to work perfectly fine when I am working with multiple images and use ee.Image.cat() to join them before I query the image. In the code example below composite = ee.Image.cat().
However, when I change composite (line 3 from the bottom) to one of the image collections (eg. chirps), it does not seem to work.
Please could someone assist me with this.
def getPropertyValue(settings):
collection = settings['collection'];
fieldName = settings['fieldName'];
dateRange = settings['dateRange'];
geoLocation = settings['geoLocation'];
scale = settings['scale'];
image = ee.ImageCollection(collection).select(fieldName).filterDate(dateRange[0], dateRange[1]).mean();
point = ee.Geometry.Point(geoLocation);
mean = image.reduceRegions(point, 'mean', scale);
valueRef = mean.select([fieldName], ['precipitation'], retainGeometry=True).getInfo();
value = valueRef[fieldName][0]['properties'][fieldName];
return value;
fieldName = 'LST_AVE';
chirps = ee.ImageCollection("JAXA/GCOM-C/L3/LAND/LST/V2").select(fieldName).filterDate('2020-01-01', '2020-02-01').mean()
point = ee.Geometry.Point([26.8206, 30.8025])
dist_stats = composite.reduceRegions(point, 'mean', 5000)
dist_stats = dist_stats.select([fieldName], [fieldName], retainGeometry=True).getInfo();
print(dist_stats['features'][0]['properties'][fieldName])
Result when using composite
14248.55
Error when replacing composite with a Google Earth Engine Image
EEException: Error in map(ID=0):
Feature.select: Selected a different number of properties (0) than names (1).
reduceRegions names the output column after the reducer, not the field that is being reduced. (Though it's more complicated when you have multiple bands and reducers).
So this:
dist_stats = dist_stats.select([fieldName], [fieldName], retainGeometry=True).getInfo();
should be changed to this
dist_stats = dist_stats.select(['mean'], [fieldName], retainGeometry=True).getInfo();
Hi I am working on a dataset where there is a host_id and two other columns : reviews_per_month and number_of_reviews. For every host_id, majority of the values are present for these two columns whereas some of them are zeros. For each column, I want to replace those 0 values by the mean of all the values related with that host_id. Here is the code I have tried :
def process_rpm_nor(data):
data['reviews_per_month'] = data['reviews_per_month'].fillna(0)
data['number_of_reviews'] = data['number_of_reviews'].fillna(0)
data_list = []
for host_id in set(data['host_id']):
data_temp = data[data['host_id'] == host_id]
nor_non_zero = np.mean(data_temp[data_temp['number_of_reviews'] > 0]['number_of_reviews'])
rpm_non_zero = np.mean(data_temp[data_temp['reviews_per_month'] > 0]['reviews_per_month'])
data_temp['number_of_reviews'] = data_temp['number_of_reviews'].replace(0,nor_non_zero)
data_temp['reviews_per_month'] = data_temp['reviews_per_month'].replace(0,rpm_non_zero)
data_list.append(data_temp)
return pd.concat(data_list, axis = 1)
Though the code works, yet it takes a lot of time to process and I was wondering if anyone could help by offering an alternate solution to this problem or help me optimize my code. I'd really appreciate the help.
The Problem
I wanted to create an interactive hbar plot, where you can switch between 3 different data sources, using a select widget, a python callback and a local bokeh serve. The plot with the default source renders fine, but when I switch to a different source, the y labels stay the same and the plot turns blank. Changing back to the original value on the select widget does not show the plot I started out with and stays blank.
When I hard-code the inital source to a different one in the code, it renders just fine until I switch it by using the widget again, so the data itself seems to work fine individually.
Am I missing something? I read through many threads, docs and tutorials but can't find anything wrong with my code.
Here is what I have done so far:
I read a .csv and create 3 seperate dataframes and then convert then to columndatasources. Every source has 10 data entries with the columns "species", "ci_lower" and "ci_upper".
Here is an example of one source (all three are built exactly the same way, with different taxon classes):
df = pd.read_csv(os.path.join(os.path.dirname(__file__), "AZA_MLE_Jul2018_utf8.csv",), encoding='utf-8')
m_df = df[df["taxon_class"]=="Mammalia"]
m_df = m_df.sort_values(by="mle", ascending=False)
m_df = m_df.reset_index(drop=True)
m_df = m_df.head(10)
m_df = m_df.sort_values(by="species", ascending=False)
m_df = m_df.reset_index(drop=True)
m_source = bp.ColumnDataSource(m_df)
I saved all 3 sources in a dict:
sources_dict={
"Mammalia": m_source,
"Aves": a_source,
"Reptilia": r_source
}
... and then created my variable called "source" that should change interactively with the "Mammalia" source as default:
source = sources_dict["Mammalia"]
Next I created a figure and added a hbar plot with the source variable as follows:
plot = bp.figure(x_range=(0, np.amax(source.data["ci_upper"])+5), y_range=source.data["species"])
plot.hbar(y="species", right="ci_lower", left="ci_upper", height=0.5, fill_color="#b3de69", source=source)
Then I added the select widget with a python callback:
def select_handler(attr, old, new):
source.data["species"]=sources_dict[new].data["species"]
source.data["ci_lower"]=sources_dict[new].data["ci_lower"]
source.data["ci_upper"]=sources_dict[new].data["ci_upper"]
select = Select(title="Taxonomic Class:", value="Mammalia", options=list(sources_dict.keys()))
select.on_change("value", select_handler)
curdoc().add_root(bk.layouts.row(plot, select))
I tried this:
My suspicion was that the error lies within the callback function, so I tried many different variants, all with the same bad result. I will list some of them here:
I tried using a python native dictionary:
new_data= {
'species': sources_dict[new].data["species"],
'ci_lower': sources_dict[new].data["ci_lower"],
'ci_upper': sources_dict[new].data["ci_upper"]
}
source.data=new_data
I tried assigning the whole data source, not just swapping the data
source=sources_dict[new]
I also tried using dict()
source.data = dict(species=sources_dict[new].data["species"], ci_lower=sources_dict[new].data["ci_lower"], ci_upper=sources_dict[new].data["ci_upper"])
Screenshots
Here is a screenshot of the initial plot, when I run the py file with bokeh serve --show file.py
And here one after changing the selected value:
Would greatly appreaciate any hints that could help me figure this out
Answering your question in the comment, changing data does not change the ranges because y_range=some_thing is just a convenience over creating a proper range class that's done behind the curtain.
Here's how you can do it manually. Notice that I don't touch x_range at all - by default it's DataRange1d that computes its start/end values automatically.
from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import Select, ColumnDataSource
from bokeh.plotting import figure
d1 = dict(x=[0, 1], y=['a', 'b'])
d2 = dict(x=[8, 9], y=['x', 'y'])
ds = ColumnDataSource(d1)
def get_factors(data):
return sorted(set(data['y']))
p = figure(y_range=get_factors(d1))
p.circle(x='x', y='y', source=ds)
s = Select(options=['1', '2'], value='1')
def update(attr, old, new):
if new == '1':
ds.data = d1
else:
ds.data = d2
p.y_range.factors = get_factors(ds.data)
s.on_change('value', update)
curdoc().add_root(column(p, s))
I have a dataframe that denotes events that occured in particular locations.
I am aware that folium does not allow dynamic display of the appearance of the events so I was thinking about basically iterate through the dates and save a png of each folium map created.
Unfortunately I am mentally stuck in a 2 part problem:
1) how to loop through a ranges of dates (for example one map for each month)
2) an appropriate way to save the generated images for each loop.
This is a dataframe sample for this example:
since = ['2019-07-05', '2019-07-17', '2014-06-12', '2016-03-11']
lats = [38.72572, 38.71362, 38.79263, 38.71931]
longs = [-9.13412, -9.14407, -9.40824, -9.13143]
since_map = {'since' : pd.Series(since), 'lats' : pd.Series(lats), 'longs' : pd.Series(longs)}
since_df = pd.DataFrame(since_map)
I was able to create the base map:
lat_l = 38.736946
long_l = -9.142685
base_l = folium.Map(location=[lat_l,long_l], zoom_start=12)
neigh = folium.map.FeatureGroup()
And add some of the markers to the folium map:
for lati, longi in zip(since_df.longs, since_df.lats):
neigh.add_child(folium.CircleMarker([longi, lati], radius = 2, color = 'blue', fill = True))
base_l.add_child(neigh)
I am struggling to visualize how to loop through ranges of the dates and save each file. From what I saw here:
https://github.com/python-visualization/folium/issues/35 I actually have to open the saved html and then save it as png for each image.
If you could point me to an example or documentation that could demonstrate how this can be accomplished I would be very appreciative.
If you think I am overcomplicating it or you have a better alternative to what I am thinking I have an open ear to suggestions.
Thank you for your help.
I'm trying to get NDVI mean in every polygon in a feature collection with earth engine python API.
I think that I succeeded getting the result (a feature collection in a feature collection), but then I don't know how to get data from it.
The data I want is IDs from features and ndvi mean in each feature.
import datetime
import ee
ee.Initialize()
#Feature collection
fc = ee.FeatureCollection("ft:1s57dkY_Sg_E_COTe3sy1tIR_U-5Gw-BQNwHh4Xel");
fc_filtered = fc.filter(ee.Filter.equals('NUM_DECS', 1))
#Image collection
Sentinel_collection1 = (ee.ImageCollection('COPERNICUS/S2')).filterBounds(fc_filtered)
Sentinel_collection2 = Sentinel_collection1.filterDate(datetime.datetime(2017, 1, 1),datetime.datetime(2017, 8, 1))
# NDVI function to use with ee map
def NDVIcalc (image):
red = image.select('B4')
nir = image.select('B8')
ndvi = nir.subtract(red).divide(nir.add(red)).rename('NDVI')
#NDVI mean calculation with reduceRegions
MeansFeatures = ndvi.reduceRegions(reducer= ee.Reducer.mean(),collection= fc_filtered,scale= 10)
return (MeansFeatures)
#Result that I don't know to get the information: Features ID and NDVI mean
result = Sentinel_collection2.map(NDVIcalc)
If the result is small, you pull them into python using result.getInfo(). That will give you a python dictionary containing a list of FeatureCollection (which are more dictionaries). However, if the results are large or the polygons cover large regions, you'll have to Export the collection instead.
That said, there are probably some other things you'll want to do first:
1) You might want to flatten() the collection, so it's not nested collections. It'll be easier to handle that way.
2) You might want to add a date to each result so you know what time the result came from. You can do that with a map on the result, inside your NDVIcalc function
return MeansFeatures.map(lambda f : f.set('date', image.date().format())
3) If what you really want is a time-series of NDVI over time for each polygon (most common), then restructuring your code to map over polygons first will be easier:
Sentinel_collection = (ee.ImageCollection('COPERNICUS/S2')
.filterBounds(fc_filtered)
.filterDate(ee.Date('2017-01-01'),ee.Date('2017-08-01')))
def GetSeries(feature):
def NDVIcalc(img):
red = img.select('B4')
nir = img.select('B8')
ndvi = nir.subtract(red).divide(nir.add(red)).rename(['NDVI'])
return (feature
.set(ndvi.reduceRegion(ee.Reducer.mean(), feature.geometry(), 10))
.set('date', img.date().format("YYYYMMdd")))
series = Sentinel_collection.map(NDVIcalc)
// Get the time-series of values as two lists.
list = series.reduceColumns(ee.Reducer.toList(2), ['date', 'NDVI']).get('list')
return feature.set(ee.Dictionary(ee.List(list).flatten()))
result = fc_filtered.map(GetSeries)
print(result.getInfo())
4) And finally, if you're going to try to Export the result, you're likely to run into an issue where the columns of the exported table are selected from whatever columns the first feature has, so it's good to provide a "header" feature that has all columns (times), that you can merge() with the result as the first feature:
# Get all possible dates.
dates = ee.List(Sentinel_collection.map(function(img) {
return ee.Feature(null, {'date': img.date().format("YYYYMMdd") })
}).aggregate_array('date'))
# Make a default value for every date.
header = ee.Feature(null, ee.Dictionary(dates, ee.List.repeat(-1, dates.size())))
output = header.merge(result)
ee.batch.Export.table.toDrive(...)