Plotting Pandas Dataframe data using Folium GeoJson - python

I am trying to plot the number of times a satellite passes over a given location, using Python and a heatmap. I can easily generate the satellite data, but I am having trouble displaying it in a nice manner. I am trying to follow this example, since I can use its style function to lower the opacity. I am having some issues replicating it, though, as the GeoJson version they were using no longer accepts the same inputs. This is the dataframe I am using:
print(df.head())
latitude longitude countSp geometry
0 -57.9 151.1 1.0 POLYGON ((151.05 -57.95, 151.15 -57.95, 151.15...
1 -57.9 151.2 2.0 POLYGON ((151.15 -57.95, 151.25 -57.95, 151.25...
2 -57.8 151.2 1.0 POLYGON ((151.15 -57.84999999999999, 151.25 -5...
3 -57.8 151.3 3.0 POLYGON ((151.25 -57.84999999999999, 151.35 -5...
4 -57.8 151.4 2.0 POLYGON ((151.35 -57.84999999999999, 151.45 -5...
I then call folium through:
hmap = folium.Map(location=[42.5, -80], zoom_start=7)
colormap_dept = branca.colormap.StepColormap(
    colors=['#00ae53', '#86dc76', '#daf8aa',
            '#ffe6a4', '#ff9a61', '#ee0028'],
    vmin=0,
    vmax=max_amt,
    index=[0, 2, 4, 6, 8, 10, 12])
style_func = lambda x: {
    'fillColor': colormap_dept(x['countSp']),
    'color': '',
    'weight': 0.0001,
    'fillOpacity': 0.1
}
folium.GeoJson(
    df,
    style_function=style_func,
).add_to(hmap)
This is the error I get when I run my code:
ValueError: Cannot render objects with any missing geometries: latitude longitude countSp geometry
I know that I can use the HeatMap plugin from folium to get most of this done, but I have found a couple of issues with that approach. First, I cannot easily generate a legend (though I have been able to work around this). Second, it is far too opaque, and I have not found any way of reducing that. I have tried playing around with the radius and blur parameters for HeatMap without much change. I think the fillOpacity of the style_func above is a much better way of making my data translucent.
By the way, I generate the polygons in my df with the following command, so all folium needs to know about in my dataframe is the geometry and countSp (the number of times a satellite passes over a given ~10 km x 10 km square).
df['geometry'] = df.apply(lambda row: Polygon([(row.longitude-0.05, row.latitude-0.05),
                                               (row.longitude+0.05, row.latitude-0.05),
                                               (row.longitude+0.05, row.latitude+0.05),
                                               (row.longitude-0.05, row.latitude+0.05)]), axis=1)
Is there a good way of going about this issue?
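One likely way past the ValueError, as a minimal untested sketch: folium.GeoJson renders a GeoDataFrame directly (a plain DataFrame with a geometry column is not enough), and the style function then receives each GeoJSON feature, so countSp has to be read from the feature's properties. Assuming geopandas is installed and the geometry column holds the shapely polygons built above:
import geopandas as gpd

# Wrap the plain DataFrame in a GeoDataFrame so folium can find the geometries
gdf = gpd.GeoDataFrame(df, geometry='geometry', crs='EPSG:4326')

# style_function receives a GeoJSON feature dict, so the column values
# live under feature['properties'], not at the top level
style_func = lambda feature: {
    'fillColor': colormap_dept(feature['properties']['countSp']),
    'color': '',
    'weight': 0.0001,
    'fillOpacity': 0.1,
}

folium.GeoJson(gdf, style_function=style_func).add_to(hmap)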

Again, the goal was to express this as a heat map, so I used Plotly's dataset on airline arrivals and departures to visualize it.
Only flights to and from the U.S. mainland were used; the IATA codes ['LIH','HNL','STT','STX','SJU','OGG','KOA'] were excluded.
Draw a straight line on the map from the departure airport's latitude and longitude to the arrival airport's.
Draw a heat map from the number of arrivals and departures per airport.
Since HeatMap cannot use a discrete colormap, create a linear colormap and add it to the map.
Embed the heatmap as a layer named Traffic.
import pandas as pd

# Airport traffic counts (heatmap weights) and AA flight paths (polylines)
df_airports = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2011_february_us_airport_traffic.csv')
df_airports = df_airports.sort_values('cnt', ascending=False)
df_air = df_airports[['lat', 'long', 'cnt']]

df_flight_paths = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2011_february_aa_flight_paths.csv')
df_flight_paths = df_flight_paths[~df_flight_paths['airport1'].isin(['HNL','STT','SJU','OGG','KOA'])]
df_flight_paths = df_flight_paths[~df_flight_paths['airport2'].isin(['LIH','HNL','STT','STX','SJU'])]
df_flight_paths = df_flight_paths[['start_lat', 'start_lon', 'end_lat', 'end_lon', 'cnt']]
import folium
from folium.plugins import HeatMap
import branca.colormap as cm
from collections import defaultdict

# Build a {fraction: hex color} gradient dict for HeatMap from a branca colormap
steps = 10
colormap = cm.linear.YlGnBu_09.scale(0, 1).to_step(steps)
gradient_map = defaultdict(dict)
for i in range(steps):
    gradient_map[1/steps*i] = colormap.rgb_hex_str(1/steps*i)

m = folium.Map(location=[32.500, -97.500], zoom_start=4, tiles="cartodbpositron")

# One straight polyline per flight path
for idx, row in df_flight_paths.iterrows():
    folium.PolyLine(
        [[row.start_lat, row.start_lon], [row.end_lat, row.end_lon]],
        weight=2, color="red", opacity=0.4
    ).add_to(m)

# Heatmap weighted by each airport's traffic count
HeatMap(
    df_air.values,
    gradient=gradient_map,
    name='Traffic',
    min_opacity=0.1,
    radius=15,
    blur=5
).add_to(m)

folium.LayerControl().add_to(m)
colormap.add_to(m)
m

Related

Draw polygons around a set of points and create clusters in python

I have a Pandas DataFrame containing Lat, Long coordinates. How do I draw non-overlapping polygons around clusters of points and aggregate the geometries in a geopandas DataFrame? Below is sample code to work with:
import pandas as pd
import numpy as np
import geopandas as gpd

df = pd.DataFrame({
    'yr': [2018, 2017, 2018, 2016],
    'id': [0, 1, 2, 3],
    'v': [10, 12, 8, 10],
    'lat': [32.7418248, 32.8340583, 32.8340583, 32.7471895],
    'lon': [-97.524066, -97.0805484, -97.0805484, -96.9400779]
})
df = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['lon'], df['lat']))
# set crs for buffer calculations
df.set_crs("ESRI:102003", inplace=True)
The polygons can be of any shape, but each must include a minimum of 5 points. I tried creating a buffer around the points, but a circle is not the ideal solution; I am looking for a way to draw a more flexible polygon. This polygon representation will be added as a new column to the pandas dataframe containing the points.
https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.buffer.html
Your question and sample data make no sense! You say you want clusters of 5 points or more and only provide 4 points, leaving whoever answers this question having to find some data. Better practice is to provide a MWE of what you've tried, which can then evolve into the solution you want. I have used UK hospitals to get some data with lat/lon.
From your other scatter-gun questions, it's clear you have tried using geohash as a solution. Let's explore this:
get the geohash for each point with geolib.geohash.encode()
aggregate points in the same geohash using dissolve(). This gives a MULTIPOINT geometry; convert it to a POLYGON using convex_hull
we now have polygons that do not overlap and contain the clusters of points. This doesn't ensure that a cluster has a minimum of 5 points
import requests, io
import pandas as pd
import numpy as np
import geopandas as gpd
import geolib.geohash
import folium

# get some data matching the sample, with enough points
df = (
    pd.read_csv(
        io.StringIO(requests.get("https://assets.nhs.uk/data/foi/Hospital.csv").text),
        sep="¬",
        engine="python",
    )
    .rename(columns={"Latitude": "lat", "Longitude": "lon"})
    .loc[:, ["lat", "lon"]]
).dropna()
df["id"] = df.index
df["yr"] = np.random.choice(range(2016, 2019), len(df))
df["v"] = np.random.randint(0, 11, len(df))

# get geohash so points in the same area can be clustered
df["geohash"] = df.apply(lambda r: geolib.geohash.encode(r["lon"], r["lat"], 3), axis=1)

# construct geodataframe
gdf = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df["lon"], df["lat"]), crs="epsg:4326"
)

# cluster points into polygons
gdf2 = gdf.dissolve(by="geohash", aggfunc={"v": "sum", "id": "count", "yr": "mean"})
gdf2["geometry"] = gdf2["geometry"].convex_hull

# let's visualise everything
m = gdf2.explore(color="green", name="cluster", height=300, width=600)
m = gdf.explore(column="geohash", m=m, name="points")
folium.LayerControl().add_to(m)
m
Use Geopandas convex hull.
The convex hull of a geometry is the smallest convex Polygon containing all the points in each geometry.
https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.convex_hull.html

open3d compute distance between mesh and point cloud

For a study project, I am trying to get into point cloud comparison.
To keep it short, I have a CAD file (.stl) and several point clouds created by a laser scanner.
Now I want to calculate the difference between the CAD file and each point cloud.
First I started with CloudCompare, which helps a lot to get a basic understanding (reduction of points, removing duplicates, creating a mesh and comparing distances).
In Python, I was able to import the files and do some basic calculations. However, I am not able to calculate the distance.
here is my code:
import numpy as np
import open3d as o3d
#read point cloud
dataname_pcd= "pcd.xyz"
point_cloud = np.loadtxt(input_path+dataname_pcd,skiprows=1)
#read mesh
dataname_mesh = "cad.stl"
mesh = o3d.io.read_triangle_mesh(input_path+dataname_mesh)
print (mesh)
#calculate the distance
mD = o3d.geometry.PointCloud.compute_point_cloud_distance([point_cloud],[mesh])
#calculate the distance gives me this error:
"TypeError: compute_point_cloud_distance(): incompatible function arguments. The following argument types are supported:
1. (self: open3d.cpu.pybind.geometry.PointCloud, target: open3d.cpu.pybind.geometry.PointCloud) -> open3d.cpu.pybind.utility.DoubleVector"
Questions:
What pre-transformations of the mesh and point clouds are needed to calculate their distances?
Is there a recommended way to display the differences?
So far I have just used the visualization call below:
o3d.visualization.draw_geometries([pcd],
                                  zoom=0.3412,
                                  front=[0.4257, -0.2125, -0.8795],
                                  lookat=[2.6172, 2.0475, 1.532],
                                  up=[-0.0694, -0.9768, 0.2024])
You need 2 point clouds for compute_point_cloud_distance(), but one of your geometries is a mesh, which is made of polygons and vertices. Just convert it to a point cloud:
pcd = o3d.geometry.PointCloud()  # create an empty point cloud
pcd.points = mesh.vertices       # take the vertices of your mesh
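Using only the mesh vertices can under-sample large flat faces of a CAD model. As a hedged alternative sketch (assuming a reasonably recent Open3D), you can sample points uniformly over the mesh surface instead and measure distances against those samples:
# Sample the mesh surface by triangle area instead of using only its vertices
mesh_pcd = mesh.sample_points_uniformly(number_of_points=100000)

# Wrap the scanner array in a PointCloud (assumes the first 3 columns are x, y, z)
scan_pcd = o3d.geometry.PointCloud()
scan_pcd.points = o3d.utility.Vector3dVector(point_cloud[:, :3])

# Distance from each scanned point to the nearest sampled mesh point
distances = np.asarray(scan_pcd.compute_point_cloud_distance(mesh_pcd))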
I'll illustrate how you can visualize the distances between 2 clouds, both captured on a moving robot (a Velodyne LIDAR), separated by 1 meter on average. Considering the 2 clouds before and after registration, the distances between them should decrease, right? Here is some code:
import copy
import pandas as pd
import numpy as np
import open3d as o3d
from matplotlib import pyplot as plt
# Import 2 clouds, paint and show both
pc_1 = o3d.io.read_point_cloud("scan_0.pcd") # 18,421 points
pc_2 = o3d.io.read_point_cloud("scan_1.pcd") # 19,051 points
pc_1.paint_uniform_color([0,0,1])
pc_2.paint_uniform_color([0.5,0.5,0])
o3d.visualization.draw_geometries([pc_1,pc_2])
# Calculate distances of pc_1 to pc_2.
dist_pc1_pc2 = pc_1.compute_point_cloud_distance(pc_2)
# dist_pc1_pc2 is an Open3D object; we need to convert it to a numpy array
# to access the data
dist_pc1_pc2 = np.asarray(dist_pc1_pc2)
# We have 18,421 distances in dist_pc1_pc2, because cloud pc_1 has 18,421 pts.
# Let's make a boxplot, histogram and serie to visualize it.
# We'll use matplotlib + pandas.
df = pd.DataFrame({"distances": dist_pc1_pc2}) # transform to a dataframe
# Some graphs
ax1 = df.boxplot(return_type="axes") # BOXPLOT
ax2 = df.plot(kind="hist", alpha=0.5, bins = 1000) # HISTOGRAM
ax3 = df.plot(kind="line") # SERIE
plt.show()
# Load a previous transformation to register pc_2 on pc_1
# I found it with the Fast Global Registration algorithm in Open3D
T = np.array([[ 0.997, -0.062,  0.038, 1.161],
              [ 0.062,  0.998,  0.002, 0.031],
              [-0.038,  0.001,  0.999, 0.077],
              [ 0.0,    0.0,    0.0,   1.0  ]])
# Make a copy of pc_2 to preserve the original cloud
pc_2_copy = copy.deepcopy(pc_2)
# Apply the transformation T to pc_2_copy
pc_2_copy.transform(T)
o3d.visualization.draw_geometries([pc_1,pc_2_copy]) # show again
# Calculate distances
dist_pc1_pc2_transformed = pc_1.compute_point_cloud_distance(pc_2_copy)
dist_pc1_pc2_transformed = np.asarray(dist_pc1_pc2_transformed)
# Do as before to show the differences
df_2 = pd.DataFrame({"distances": dist_pc1_pc2_transformed})
# Some graphs (after registration)
ax1 = df_2.boxplot(return_type="axes") # BOXPLOT
ax2 = df_2.plot(kind="hist", alpha=0.5, bins = 1000) # HISTOGRAM
ax3 = df_2.plot(kind="line") # SERIE
plt.show()

Spatial merge/combine of multiple netcdf with xarray or satpy

I have two spatial datasets in netCDF format. They have the same time, dimensions, coordinates, and data variable, but they cover different spatial extents. Below I show my two datasets together with a polygon of my study area:
import glob
import xarray as xr
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
file1 = '20190109T071048.nc'
file2 = '20190109T085117.nc'
ds1 = xr.open_dataset(file1, group='PRODUCT')
ds2 = xr.open_dataset(file2, group='PRODUCT')
PATH_TO_GPK = 'Study_Area.gpkg'
SA = gpd.read_file(PATH_TO_GPK, layer='Study_Area')
First dataset plot:
plt.figure(figsize=(12,8))
ax = plt.axes()
ds1.qa_value.isel(time = 0).plot(ax = ax, x='longitude', y='latitude')
SA.plot(ax = ax, alpha = 0.8, facecolor = 'none')
Second dataset plot:
plt.figure(figsize=(12,8))
ax = plt.axes()
ds2.qa_value.isel(time = 0).plot(ax = ax, x='longitude', y='latitude')
SA.plot(ax = ax, alpha = 0.8, facecolor = 'none')
I want to merge these two netcdf files with xarray.
combined = xr.merge([ds1, ds2], compat='no_conflicts')
Error:
MergeError: conflicting values for variable 'latitude' on objects to be combined. You can skip this check by specifying compat='override'.
I tried:
combined = xr.merge([ds1, ds2], compat='override')
but the plot of combined was the same as the first plot above.
Then I tried:
combined = xr.combine_by_coords([ds1, ds2], compat='no_conflicts')
Error:
Could not find any dimension coordinates to use to order the datasets for concatenation
Then I tried:
combined = xr.combine_nested([ds1, ds2], concat_dim=["time"])
and the plot of combined was again the same as the first plot.
Based on ThomasNicolas's suggestion, I used the code below:
ds = xr.open_mfdataset([file1, file2], combine='nested')
But it returns this error:
AttributeError: 'Dataset' object has no attribute 'qa_value'
There is no data in the result. A print of the first dataset (for example) shows:
print (ds1)
<xarray.Dataset>
Dimensions:       (corner: 4, ground_pixel: 450, scanline: 3245, time: 1)
Coordinates:
  * scanline      (scanline) float64 0.0 1.0 ... 3.244e+03
  * ground_pixel  (ground_pixel) float64 0.0 1.0 ... 449.0
  * time          (time) datetime64[ns] 2019-01-03
  * corner        (corner) float64 0.0 1.0 2.0 3.0
    latitude      (time, scanline, ground_pixel) float32 ...
    longitude     (time, scanline, ground_pixel) float32 ...
Data variables:
    delta_time    (time, scanline) timedelta64[ns] 08:07:0...
    time_utc      (time, scanline) object '2019-01-03T08:0...
    qa_value      (time, scanline, ground_pixel) float32 ...
Are there any suggestions for merging or combining these files?
UPDATED
Based on #dl.meteo's advice, I used the satpy library to solve my problem. It seems that it can merge two netCDF files, but not completely: you can see an incorrect part (red boundary) in the joined image.
Can satpy do it correctly?
# Read NetCDF files
from satpy import Scene
import glob
filenames = glob.glob('myfiles*.nc')
scn = Scene(filenames=filenames, reader='tropomi_l2')
scn.load(['qq'])
mask = SA_mask_poly.mask(d, lat_name='latitude', lon_name='longitude')
out_sel = d.compute().where(mask == 0, drop=True)
plt.figure(figsize=(12,8))
ax = plt.axes()
out_sel.plot(ax = ax, x='longitude', y='latitude')
SA.plot(ax = ax, alpha = 0.8, facecolor = 'none', lw = 1)
I've come across this problem just now. xarray can't combine values with different coordinates. As your two passes have their own unique coordinates, you can't directly combine them.
One solution is to use the pyresample module to resample both granules from their own coordinates onto a common grid. Open each file as a separate Scene and then apply the scn.resample() method. This will put both onto the same grid, and from there you can combine them.
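A rough sketch of that route (hedged: the grid extent and resolution below are made-up placeholders, and qa_value assumes the variable name from the question):
from satpy import Scene
from pyresample import create_area_def

# Hypothetical common lat/lon grid covering both passes
area = create_area_def('common_grid', 'EPSG:4326',
                       area_extent=[-20, 30, 40, 70], resolution=0.05)

scn1 = Scene(filenames=[file1], reader='tropomi_l2')
scn2 = Scene(filenames=[file2], reader='tropomi_l2')
scn1.load(['qa_value'])
scn2.load(['qa_value'])

# Resample each granule onto the shared grid, then combine the two
# DataArrays; where the passes overlap, combine_first keeps scn1's values
r1 = scn1.resample(area)
r2 = scn2.resample(area)
combined = r1['qa_value'].combine_first(r2['qa_value'])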
Another solution might be the experimental MultiScene object, which is designed for this use case. As per the documentation:
Scene objects in Satpy are meant to represent a single geographic region at a specific single instant in time or range of time. This means they are not suited for handling multiple orbits of polar-orbiting satellite data, multiple time steps of geostationary satellite data, or other special data cases. To handle these cases Satpy provides the MultiScene class.
The reason your current solution has artefacts is that your Scene object has two separate orbits stuck together as one array. I think the discontinuity in their coordinates will cause stretch/tear artefacts in your quadmesh plot, and further processing, such as convolution filtering, is likely to return unexpected results, since it expects neighbouring values in an array to be physical neighbours in the final image, not in another orbit.
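And a minimal sketch of the MultiScene route (hedged: this API is experimental and may differ between satpy versions):
from satpy import MultiScene

# Build one Scene per granule, load the variable, resample onto a common
# grid ('area' as in the sketch above) and blend into a single Scene
mscn = MultiScene.from_files([file1, file2], reader='tropomi_l2')
mscn.load(['qa_value'])
blended = mscn.resample(area).blend()
blended['qa_value'].plot()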

Using adjustText to avoid label overlap with Python prince correspondence analysis

I have read about how effective the adjustText package is at avoiding label overlap, and I would like to apply it to the following diagram created by prince:
Here is the code that created the image:
import pandas as pd
import prince
from adjustText import adjust_text
pd.set_option('display.float_format', lambda x: '{:.6f}'.format(x))
X = pd.DataFrame(data=[ ... my data ... ],
                 columns=pd.Series([ ... my data ... ]),
                 index=pd.Series([ ... my data ... ]),
                 )
ca = prince.CA(n_components=2,n_iter=3,copy=True,check_input=True,engine='auto',random_state=42)
ca = ca.fit(X)
ca.row_coordinates(X)
ca.column_coordinates(X)
ax = ca.plot_coordinates(X=X,ax=None,figsize=(6, 6),x_component=0,y_component=1,show_row_labels=True,show_col_labels=True)
ax.get_figure().savefig('figure.png')
In all examples of adjustText I could find, there was always a direct access to the coordinates of labels. How do I access the coordinates of labels in this case? How can I apply adjust_text to this figure?
First, deactivate label display in plot_coordinates():
ax = ca.plot_coordinates(X=X,ax=None,figsize=(6, 6),x_component=0,y_component=1,show_row_labels=False,show_col_labels=False)
Then, extract coordinates of columns and rows:
COLS=ca.column_coordinates(X).to_dict()
XCOLS=COLS[0]
YCOLS=COLS[1]
ROWS=ca.row_coordinates(X).to_dict()
XROWS=ROWS[0]
YROWS=ROWS[1]
The structures XCOLS, YCOLS, XROWS, YROWS are dictionaries whose values are floats (the coordinates). Let us merge the two x-axis dictionaries into a single dictionary I will call XGLOBAL, and likewise merge the y-axis dictionaries into YGLOBAL:
XGLOBAL={ k : XCOLS.get(k,0)+XROWS.get(k,0) for k in set(XCOLS) | set(XROWS) }
YGLOBAL={ k : YCOLS.get(k,0)+YROWS.get(k,0) for k in set(YCOLS) | set(YROWS) }
Now I just apply adjust_text() as described in the documentation:
import matplotlib.pyplot as plt

fig = ax.get_figure()
texts = [plt.text(XGLOBAL[x], YGLOBAL[x], x, fontsize=7) for x in XGLOBAL.keys()]
adjust_text(texts, arrowprops=dict(arrowstyle='-', color='red'))
fig.savefig('newfigure.png')
And the result is:
Notice that while the image generation was instantaneous without adjust_text, it took around 40 seconds with adjust_text.
You can also give each label a small rotation in the texts iteration; I saw on my side that it helps the adjust_text routine. A sketch, using an arbitrary alternating angle of plus or minus 5 degrees:
texts = [plt.text(XGLOBAL[x], YGLOBAL[x], x, fontsize=7,
                  rotation=5 if i % 2 else -5)
         for i, x in enumerate(XGLOBAL.keys())]

Plotting netCDF data with Python: How to change grid?

I'm new to Python and to plotting data with Matplotlib. I really need help, and thank you in advance for the answers.
So, I have a netCDF file with v-component of wind data. Grid coordinates: points=9600 (240x40)
lon : 0 to 358.5 by 1.5 degrees_east circular
lat : 88.5 to 30 by -1.5 degrees_north
My code is:
import numpy as np
import matplotlib
matplotlib.use('Agg')
from netCDF4 import Dataset
from matplotlib.mlab import griddata
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap
#read data from NETcdf file ".nc"
my_file = '/home/Era-Interim/NH-EraInt-1979.nc'
fh = Dataset(my_file, mode='r')
lons = fh.variables['lon'][:]
lats = fh.variables['lat'][:]
V = fh.variables['V'][:]
V_units = fh.variables['V'].units
fh.close()
# create figure
fig = plt.figure(figsize=(20,20))
# create a map
m = Basemap(projection='nplaea',boundinglat=30,lon_0=10,resolution='l',round=True)
#draw parallels, meridians, coastlines, countries, mapboundary
m.drawcoastlines(linewidth=0.5)
m.drawcountries(linewidth=0.5)
#m.drawmapboundary(linewidth=2)
m.drawparallels(np.arange(30,90,20), labels=[1,1,0,0])  # parallels every 20 degrees, labels on the left and right
m.drawmeridians(np.arange(0,360,30), labels=[1,1,1,1])  # meridians every 30 degrees
#Plot the data on top of the map
lon,lat = np.meshgrid(lons,lats)
x,y = m(lon,lat)
cs = m.pcolor(x,y,np.squeeze(V),cmap=plt.cm.RdBu_r)
plt.title("", fontsize=25, verticalalignment='baseline')
plt.savefig("/home/Era-Interim/1.png")
As a result, I received a map (you can find it in my dropbox folder): https://www.dropbox.com/sh/nvy8wcodk9jtat0/AAC-omkPP8_7uINSSXbzImeja?dl=0
On the map, there are white pixels between 358.5 and 0 (360) lon, because I have no data between 358.5 and 0 (360) lon.
The question is: how can I change the size of the grid, regrid it, interpolate data, or something else in order to not have this white sector?
I have found a solution. At the beginning of the script, you must add:
from mpl_toolkits.basemap import Basemap, addcyclic
and further down:
datain, lonsin = addcyclic(np.squeeze(Q), lons)
lons, Q = m.shiftdata(lonsin, datain=np.squeeze(Q), lon_0=180.)
print(lons)
lon, lat = np.meshgrid(lons, lats)
x, y = m(lon, lat)
cs = m.pcolor(x, y, datain, cmap=plt.cm.RdBu_r)
The difference can be seen in the figures (I still can not post images).
https://www.dropbox.com/sh/nvy8wcodk9jtat0/AAC-omkPP8_7uINSSXbzImeja?dl=0
I think in this case some kind of interpolation technique can be applied.
Check this out; there was a similar problem.
Hope it is useful.
The simple answer is that 360 degrees is 0 degrees, so you can copy the 0-degree data to 360 degrees and it should look right. I may be interpreting this wrong, though, as I believe that the data represents the pressure levels at each of the points, not between the two points (i.e. at zero degrees, not between zero degrees and 1.5 degrees).
My interpretation means that, yes, you don't have data between 358.5 and 0, but you also don't have data between 357 and 358.5. This seems more likely than just skipping an area. This would mean that the data from 358.5 should be touching the data from 0 as it is just as far away as 0 is from 1.5 which is touching.
Copying the last bit would grant you the ability to change your m.pcolor call to an imshow call (as in Roman Dryndik's link) and use interpolation to smooth out the graph.
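A minimal sketch of that copy, assuming V has shape (lat, lon) after np.squeeze and using the names from the question (this is essentially what addcyclic does for you):
import numpy as np

V2 = np.squeeze(V)  # shape (nlat, nlon) = (40, 240)
# Append the 0-degree column again at 360 degrees so pcolor closes the gap
V_cyc = np.concatenate([V2, V2[:, :1]], axis=1)
lons_cyc = np.append(lons, lons[0] + 360.0)  # 0.0 reappears as 360.0
lon, lat = np.meshgrid(lons_cyc, lats)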
