GeoPandas - grid scattered data and reproject - python

I need to grid scattered data in a GeoPandas dataframe onto a regular grid (e.g. 1 degree), take the mean value within each grid box, and then plot the result in various projections.
I managed the first part using gpd_lite_toolbox.
I can plot the result on a simple lat/lon map, but converting it to any other projection fails.
Here is a small example with some artificial data showing my issue:
import gpd_lite_toolbox as glt
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
from shapely import wkt
# creating the artificial df
df = pd.DataFrame(
{'data': [20, 15, 17.5, 11.25, 16],
'Coordinates': ['POINT(-58.66 -34.58)', 'POINT(-47.91 -15.78)',
'POINT(-70.66 -33.45)', 'POINT(-74.08 4.60)',
'POINT(-66.86 10.48)']})
# converting the df to a gdf with projection
df['Coordinates'] = df['Coordinates'].apply(wkt.loads)
crs = {'init': 'epsg:4326'}
gdf = gpd.GeoDataFrame(df, crs=crs, geometry='Coordinates')
# gridding the data using the gridify_data function from the toolbox and setting grids without data to nan
g1 = glt.gridify_data(gdf, 1, 'data', cut=False)
g1 = g1.where(g1['data'] > 1)
# simple plot of the gridded data
fig, ax = plt.subplots(ncols=1, figsize=(20, 10))
g1.plot(ax=ax, column='data', cmap='jet')
# trying to convert to (any) other projection
g2 = g1.to_crs({'init': 'epsg:3395'})
# I get the following error
---------------------------------------------------------------------------
AttributeError: 'float' object has no attribute 'is_empty'
I would also be happy to use a different gridding function if that solves the problem.

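Your g1 contains too many NaN values. DataFrame.where keeps every row and replaces the rows that fail the condition with NaN, including their geometry, so to_crs ends up calling is_empty on a float NaN instead of a geometry.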
g1 = g1.where(g1['data'] > 1)
print(g1)
geometry data
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
5 POLYGON ((-74.08 5.48, -73.08 5.48, -73.08 4.4... 11.25
...
You should use g1[g1['data'] > 1] instead of g1.where(g1['data'] > 1).
g1 = g1[g1['data'] > 1]
print(g1)
geometry data
5 POLYGON ((-74.08 5.48, -73.08 5.48, -73.08 4.4... 11.25
181 POLYGON ((-71.08 -32.52, -70.08 -32.52, -70.08... 17.50
322 POLYGON ((-67.08 10.48, -66.08 10.48, -66.08 9... 16.00
735 POLYGON ((-59.08 -34.52, -58.08 -34.52, -58.08... 20.00
1222 POLYGON ((-48.08 -15.52, -47.08 -15.52, -47.08... 15.00
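The difference between the two filters can be seen on a toy frame. This is only a sketch in plain pandas: the string values in the geometry column are placeholders standing in for real polygons, and the numbers are made up.

```python
import pandas as pd

# Placeholder frame: strings stand in for real shapely polygons.
df = pd.DataFrame({
    "geometry": ["cell_a", "cell_b", "cell_c"],
    "data": [0.0, 11.25, 17.5],
})

# where() keeps the shape of the frame and blanks out failing rows with NaN,
# so the geometry of the first row becomes a float NaN.
masked = df.where(df["data"] > 1)

# Boolean indexing drops the failing rows entirely.
filtered = df[df["data"] > 1]

print(masked)
print(filtered)
```

The masked frame still has three rows, one of them all NaN, while the filtered frame has only the two rows that hold data.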
g2 = g1.to_crs({'init': 'epsg:3395'})
print(g2)
geometry data
5 POLYGON ((-8246547.877965705 606885.3761893312... 11.25
181 POLYGON ((-7912589.405585884 -3808795.10464339... 17.50
322 POLYGON ((-7467311.442412791 1165421.424891677... 16.00
735 POLYGON ((-6576755.516066602 -4074627.00861716... 20.00
1222 POLYGON ((-5352241.117340593 -1737775.44359649... 15.00
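Since the question also asks about other ways of gridding, here is a minimal sketch that bins points into 1-degree cells with plain pandas and averages each cell. It assumes the coordinates are available as lon/lat columns and that cells are aligned to whole degrees, which differs from the grid produced by gridify_data above. The sixth point is made up so that one cell holds two values.

```python
import numpy as np
import pandas as pd

# Points from the question plus one made-up point sharing a cell with the first.
points = pd.DataFrame({
    "lon": [-58.66, -47.91, -70.66, -74.08, -66.86, -58.20],
    "lat": [-34.58, -15.78, -33.45, 4.60, 10.48, -34.90],
    "data": [20, 15, 17.5, 11.25, 16, 10],
})

cell = 1.0  # grid spacing in degrees

# Lower-left corner of the cell each point falls into.
points["lon_bin"] = np.floor(points["lon"] / cell) * cell
points["lat_bin"] = np.floor(points["lat"] / cell) * cell

# One row per occupied cell, holding the mean of its points.
grid = points.groupby(["lon_bin", "lat_bin"], as_index=False)["data"].mean()
print(grid)
```

Only occupied cells appear in the result, so there are no NaN rows to filter out. Each row could then be turned into a polygon (for example with shapely.geometry.box(lon_bin, lat_bin, lon_bin + cell, lat_bin + cell)) and passed to a GeoDataFrame before calling to_crs.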

Related

Geopandas: dataframe to geodataframe with different EPSG code

I have a dataframe (df2) in which x and y are given in RD New (EPSG:28992) coordinates.
x y z batch_nr batch_description
0 117298.377 560406.392 0.612 5800 PRF Grasland (l)
1 117297.803 560411.756 1.015
2 117296.327 560419.840 1.580
3 117295.470 560425.716 2.490
4 117296.875 560429.976 4.529
more CRS info:
# def CRS, used in geopandas
from pyproj import CRS
crs_rd = CRS.from_user_input(28992)
crs_rd
<Derived Projected CRS: EPSG:28992>
Name: Amersfoort / RD New
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: Netherlands - onshore, including Waddenzee, Dutch Wadden Islands and 12-mile offshore coastal zone.
- bounds: (3.2, 50.75, 7.22, 53.7)
Coordinate Operation:
- name: RD New
- method: Oblique Stereographic
Datum: Amersfoort
- Ellipsoid: Bessel 1841
- Prime Meridian: Greenwich
How can I convert df2 to a GeoDataFrame whose geometry has the CRS EPSG:28992?
It's a simple case of using the GeoDataFrame constructor with the crs parameter and points_from_xy():
import geopandas as gpd
import pandas as pd
import io
# columns are separated by two or more spaces so that "PRF Grasland (l)" stays one field
df2 = pd.read_csv(io.StringIO("""   x           y           z      batch_nr  batch_description
0  117298.377  560406.392  0.612  5800  PRF Grasland (l)
1  117297.803  560411.756  1.015
2  117296.327  560419.840  1.580
3  117295.470  560425.716  2.490
4  117296.875  560429.976  4.529"""), sep=r"\s\s+", engine="python")
gdf = gpd.GeoDataFrame(df2, geometry=gpd.points_from_xy(df2["x"], df2["y"], df2["z"]), crs="epsg:28992")
gdf
Output:
        x       y      z  batch_nr  batch_description  geometry
0  117298  560406  0.612      5800  PRF Grasland (l)   POINT Z (117298.377 560406.392 0.612)
1  117298  560412  1.015       nan                     POINT Z (117297.803 560411.756 1.015)
2  117296  560420  1.58        nan                     POINT Z (117296.327 560419.84 1.58)
3  117295  560426  2.49        nan                     POINT Z (117295.47 560425.716 2.49)
4  117297  560430  4.529       nan                     POINT Z (117296.875 560429.976 4.529)

Erasing outliers from a dataframe in python

For an assignment I have to remove the outliers from a CSV file using different methods.
I loaded the CSV into a pandas dataframe and tried working with the 'height' variable using the KNN method in Python, but it either raises errors or does not touch the outliers at all.
The code that I wrote is the following:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.datasets import make_blobs
df = pd.read_csv("data.csv")
print(df.describe())
print(df.columns)
df['height'].plot(kind='hist')
print(df['height'].value_counts())
data= pd.DataFrame(df['height'],df['active'])
k=1
knn = NearestNeighbors(n_neighbors=k)
knn.fit([df['height']])
neighbors_and_distances = knn.kneighbors([df['height']])
knn_distances = neighbors_and_distances[0]
tnn_distance = np.mean(knn_distances, axis=1)
print(knn_distances)
PCM = df.plot(kind='scatter', x='x', y='y', c=tnn_distance, colormap='viridis')
plt.show()
And the data is something like this:
id,age,gender,height,weight,ap_hi,ap_lo,cholesterol,gluc,smoke,alco,active,cardio
0,18393,2,168,62.0,110,80,1,1,0,0,1,0
1,20228,1,156,85.0,140,90,3,1,0,0,1,1
2,18857,1,50,64.0,130,70,3,1,0,0,0,1
3,17623,2,250,82.0,150,100,1,1,0,0,1,1
I don't know what I'm missing or doing wrong.
df = pd.read_csv("data.csv")
X = df[['height', 'weight']].copy()  # copy, so a column can be added later without a SettingWithCopyWarning
X.plot(kind='scatter', x='weight', y='height', colormap='viridis')
plt.show()
knn = NearestNeighbors(n_neighbors=2).fit(X)
distances, indices = knn.kneighbors(X)
X['distances'] = distances[:,1]
X.distances
0 1.000000
1 1.000000
2 1.000000
3 3.000000
4 1.000000
5 1.000000
6 133.958949
7 100.344407
...
X.plot(kind='scatter', x='weight', y='height', c='distances', colormap='viridis')
plt.show()
MAX_DIST = 10
X.loc[X.distances < MAX_DIST, ['height', 'weight']]
height weight
0 162 78.0
1 162 78.0
2 151 76.0
3 151 76.0
4 171 84.0
...
And finally to filter out all the outliers:
MAX_DIST = 10
X = X[X.distances < MAX_DIST]
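To make the idea behind distances[:, 1] concrete, here is a small brute-force sketch of the same nearest-neighbour filter written with NumPy instead of scikit-learn's NearestNeighbors. It uses the four sample rows from the question. The threshold of 50 is an assumption chosen for this tiny sample, not a recommended value.

```python
import numpy as np
import pandas as pd

# The four sample rows from the question (height, weight).
X = pd.DataFrame({
    "height": [168, 156, 50, 250],
    "weight": [62.0, 85.0, 64.0, 82.0],
})

values = X.to_numpy(dtype=float)

# Pairwise Euclidean distances between all rows, shape (n, n).
diff = values[:, None, :] - values[None, :, :]
pairwise = np.sqrt((diff ** 2).sum(axis=-1))

# Every point is at distance 0 from itself; exclude that trivial match.
# This is why the answer asks for 2 neighbours and takes column 1.
np.fill_diagonal(pairwise, np.inf)
nearest = pairwise.min(axis=1)

MAX_DIST = 50  # assumed threshold for this sample; tune it for real data
inliers = X[nearest < MAX_DIST]
print(inliers)
```

The rows with heights 50 and 250 are far from every other point and get dropped. The pairwise matrix needs memory proportional to n squared, so for a large CSV the NearestNeighbors approach shown above scales better.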

How to make a smooth heatmap?

I have a pandas dataframe called 'result' containing Longitude, Latitude and Production values. The dataframe looks like the following. For each pair of latitude and longitude there is only one production value, so there are many NaN values.
> Latitude 0.00000 32.00057 32.00078 ... 32.92114 32.98220 33.11217
Longitude ...
-104.5213 NaN NaN NaN ... NaN NaN NaN
-104.4745 NaN NaN NaN ... NaN NaN NaN
-104.4679 NaN NaN NaN ... NaN NaN NaN
-104.4678 NaN NaN NaN ... NaN NaN NaN
-104.4660 NaN NaN NaN ... NaN NaN NaN
This is my code:
plt.rcParams['figure.figsize'] = (12.0, 10.0)
plt.rcParams['font.family'] = "serif"
plt.figure(figsize=(14,7))
plt.title('Heatmap based on ANN results')
sns.heatmap(result)
The heatmap plot looks like this
but I want it to look more like this
How to adjust my code so it looks like the one on the second image?
I made a quick and dirty example of how you can smooth data in a NumPy array. It should be directly applicable to pandas dataframes as well.
First I present the code, then go through it:
# Some needed packages
import numpy as np
import matplotlib.pyplot as plt
from scipy import sparse
from scipy.ndimage import gaussian_filter
np.random.seed(42)
# init an array with a lot of nans to imitate OP data
non_zero_entries = sparse.random(50, 60)
sparse_matrix = np.zeros(non_zero_entries.shape) + non_zero_entries
sparse_matrix[sparse_matrix == 0] = None
# set nans to 0
sparse_matrix[np.isnan(sparse_matrix)] = 0
# smooth the matrix
smoothed_matrix = gaussian_filter(sparse_matrix, sigma=5)
# Set 0s to None as they will be ignored when plotting
# smoothed_matrix[smoothed_matrix == 0] = None
sparse_matrix[sparse_matrix == 0] = None
# Plot the data
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2,
sharex=False, sharey=True,
figsize=(9, 4))
ax1.matshow(sparse_matrix)
ax1.set_title("Original matrix")
ax2.matshow(smoothed_matrix)
ax2.set_title("Smoothed matrix")
plt.tight_layout()
plt.show()
The code is fairly simple. A Gaussian filter cannot handle NaN values (each NaN would spread to its neighbouring cells), so we have to get rid of them first. I set them to zero, but depending on your field you might want to interpolate them instead.
Using the gaussian_filter we smooth the image, where sigma controls the width of the kernel.
The plot code yields the following images
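One drawback of replacing NaN with zero is that the smoothed values get pulled toward zero wherever data is sparse. A common alternative is normalized convolution: smooth the zero-filled data and a mask of known cells separately, then divide. The sketch below is NumPy-only and swaps scipy's gaussian_filter for a simple separable kernel with zero padding at the edges. The helper names gaussian_kernel, smooth2d and nan_aware_smooth are made up for this example.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3 sigma and normalized to sum to 1."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth2d(a, kernel):
    """Separable smoothing: convolve rows, then columns (zero-padded edges).
    Assumes both array dimensions are at least as long as the kernel."""
    a = np.apply_along_axis(np.convolve, 1, a, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, a, kernel, mode="same")

def nan_aware_smooth(grid, sigma):
    """Weighted average of the known cells around each position."""
    kernel = gaussian_kernel(sigma)
    known = ~np.isnan(grid)
    num = smooth2d(np.where(known, grid, 0.0), kernel)  # smoothed values
    den = smooth2d(known.astype(float), kernel)         # smoothed weights
    out = np.full(grid.shape, np.nan)
    np.divide(num, den, out=out, where=den > 1e-12)     # NaN where no data nearby
    return out

# Mostly-NaN grid with a constant value of 3.0 on every 4th cell.
grid = np.full((20, 30), np.nan)
grid[::4, ::4] = 3.0

smoothed = nan_aware_smooth(grid, sigma=2.0)
naive = smooth2d(np.where(np.isnan(grid), 0.0, grid), gaussian_kernel(2.0))
```

On this constant test field the normalized version recovers 3.0 everywhere, while the zero-filled version stays well below it. Either result can be passed to matshow or sns.heatmap as before.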

Coloring edges in OSMnx graph based on edge attribute

I want to create a map of the roads within a country, and color the edges based on their "highway" attribute, so that motorways are yellow, trunk green, etc...
However, when following the OSMnx example files and attempting to replicate them, I receive the following error message:
Input:
ec = ox.plot.get_edge_colors_by_attr(graph, attr='highway', cmap='plasma_r')
Output:
TypeError: '<=' not supported between instances of 'str' and 'list'
I'm assuming this is because "highway" is not a numeric variable?
This is the code I currently have for the graph
graph = ox.io.load_graphml("graph.graphml")
nodes, streets = ox.graph_to_gdfs(graph)
streets.head()
Output:
osmid oneway lanes ref highway junction length geometry name maxspeed bridge tunnel access width service u v key
0 659557392 True 1 410 secondary roundabout 48.672 LINESTRING (-21.93067 64.05665, -21.93067 64.0... NaN NaN NaN NaN NaN NaN NaN 6175252481 6175252453 0
1 659557393 False 2 410 secondary NaN 132.007 LINESTRING (-21.93067 64.05665, -21.93057 64.0... Kaldárselsvegur NaN NaN NaN NaN NaN NaN 6175252481 6275284224 0
2 48547677 True NaN 430 secondary NaN 237.337 LINESTRING (-21.72904 64.13621, -21.72959 64.1... Skyggnisbraut 50 NaN NaN NaN NaN NaN 5070446594 616709938 0
3 160506796 False NaN 430 secondary NaN 2892.051 LINESTRING (-21.72904 64.13621, -21.72848 64.1... Úlfarsfellsvegur 70 NaN NaN NaN NaN NaN 5070446594 56620274 0
4 157591872 True 2 41 trunk roundabout 47.075 LINESTRING (-21.93736 64.06693, -21.93730 64.0... Hlíðartorg 60 NaN NaN NaN NaN NaN 12886026 12885866 0
I'm assuming this is because "highway" is not a numeric variable?
Yes. As you can see in the OSMnx docs, the ox.plot.get_edge_colors_by_attr function expects the attr argument to be the "name of a numerical edge attribute." In your example, it's not numeric. Instead, you can use the ox.plot.get_colors function to get one color for each highway type in the graph, then build a list of edge colors based on each edge's highway type:
import osmnx as ox
import pandas as pd
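# ox.config() was deprecated and later removed; set the options on ox.settings instead
ox.settings.use_cache = True
ox.settings.log_console = True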
G = ox.graph_from_place('Piedmont, CA, USA', network_type='drive')
# get one color for each highway type in the graph
edges = ox.graph_to_gdfs(G, nodes=False)
edge_types = edges['highway'].value_counts()
color_list = ox.plot.get_colors(n=len(edge_types), cmap='plasma_r')
color_mapper = pd.Series(color_list, index=edge_types.index).to_dict()
# get the color for each edge based on its highway type
ec = [color_mapper[d['highway']] for u, v, k, d in G.edges(keys=True, data=True)]
fig, ax = ox.plot_graph(G, edge_color=ec)
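The error message in the question ('str' and 'list') suggests that some edges carry a list of highway values, which can happen when OSMnx merges several ways into one edge. Lists are not hashable, so they would also break the value_counts and dictionary lookup above. A possible workaround, sketched here in plain pandas on made-up data, is to reduce each list to its first entry before mapping colors. The helper first_tag and the color names are assumptions for illustration only.

```python
import pandas as pd

# Made-up 'highway' column: one edge carries a list of two types.
highway = pd.Series(["secondary", ["trunk", "primary"], "trunk", "secondary"])

def first_tag(value):
    """Reduce a list of highway types to its first entry; pass strings through."""
    return value[0] if isinstance(value, list) else value

highway_flat = highway.map(first_tag)

# One color per distinct highway type (placeholder color names).
edge_types = sorted(highway_flat.unique())
palette = ["gold", "green"]
color_mapper = dict(zip(edge_types, palette))

# One color per edge, in the same order as the edges.
edge_colors = highway_flat.map(color_mapper).tolist()
print(edge_colors)
```

With a real graph, the same first_tag step could be applied to edges['highway'] and to d['highway'] inside the list comprehension, with the colors coming from ox.plot.get_colors as in the answer.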

Scatter plot with custom ticks

I want to do a scatter plot of a wavelength (float) in y-axis and spectral class (list of character/string) in x-axis, labels = ['B','A','F','G','K','M']. Data are saved in pandas dataframe, df.
df['Spec Type Index']
0 NaN
1 A
2 G
. .
. .
167 K
168 Nan
169 G
Then,
df['Disk Major Axis "']
0 4.30
1 4.50
2 22.00
. .
. .
167 1.32
168 0.28
169 25.00
Thus, I thought this should be done simply with
plt.scatter(df['Spec Type Index'], df['Disk Major Axis "'])
But I get this annoying error
could not convert string to float: 'G'
After fixing this, I want to make custom xticks as follows. However, how can I
labels = ['B','A','F','G','K','M']
ticks = np.arange(len(labels))
plt.xticks(ticks, labels)
First, I think you have to map those strings to integers so that matplotlib can decide where to place the points.
labels = ['B','A','F','G','K','M']
mapping = {'B': 0,'A': 1,'F': 2,'G': 3,'K': 4,'M': 5}
df = df.replace({'Spec Type Index': mapping})
Then plot the scatter,
fig, ax = plt.subplots()
ax.scatter(df['Spec Type Index'], df['Disk Major Axis "'])
Finally,
ax.set_xticks(range(len(labels)))  # pin the ticks to positions 0..5 first
ax.set_xticklabels(labels)
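A small variation on the mapping step, as a sketch on made-up data: the positions are derived from the labels list itself so the two cannot drift apart, and rows without a spectral type are masked out before plotting. The series below is a stand-in for df['Spec Type Index'].

```python
import pandas as pd

labels = ['B', 'A', 'F', 'G', 'K', 'M']

# Stand-in for df['Spec Type Index'], including missing entries.
spec = pd.Series([None, 'A', 'G', 'K', None, 'G'])

# Position of each label on the x-axis: B -> 0, A -> 1, ..., M -> 5.
mapping = {label: i for i, label in enumerate(labels)}
positions = spec.map(mapping)

# Rows without a spectral type have no x position; skip them when plotting.
valid = positions.notna()
print(positions[valid])

# Plotting would then look like this (y being the matching values column):
# ax.scatter(positions[valid], y[valid])
# ax.set_xticks(range(len(labels)))
# ax.set_xticklabels(labels)
```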
