Strange Plotly behaviour with Choropleth Mapbox - python

I want to create a choropleth map out of a GeoJSON file that looks like this:
{"type": "FeatureCollection", "features": [
{'type': 'Feature', 'geometry': {'type': 'MultiPolygon', 'coordinates': [[[[... , ...] ... [..., ...]]]]}, 'properties': {'id': 'A'},
{'type': 'Feature', 'geometry': {'type': 'MultiPolygon', 'coordinates': [[[[... , ...] ... [..., ...]]]]}, 'properties': {'id': 'B'},
...
]}
with each id property being different for each feature.
I mapped each feature (by the id property) to its particular region as follows:
regions = {
    'A': 'MOUNTAINS',
    'B': 'BEACH',
    ...
}
and then created a DataFrame to store each id and each region:
ids = []
for feature in geojson['features']:
    ids.append(feature['properties']['id'])

df = pd.DataFrame(ids, columns=['id'])
df['region'] = df['id'].map(regions)
That returns a DataFrame like this:
  id     region
0  A  MOUNTAINS
1  B      BEACH
2  C      PLAIN
3  D     FOREST
...
I then tried to create a choropleth map with that info:
fig = px.choropleth_mapbox(df, geojson=geojson, color="region",
                           locations="id", featureidkey="properties.id",
                           center={"lat": -9.893, "lon": -50.423},
                           mapbox_style="white-bg", zoom=9)
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
However, this runs for an excessively long time and then crashes after a minute or so, with no error message.
I wanted to check whether there was something wrong with the GeoJSON file and/or with the mapping, so I assigned random numeric data to each id in df:
df['random_number'] = np.random.randint(0, 100, size=len(df))
and re-tried the map with the following code:
fig = px.choropleth_mapbox(df, geojson=geojson, color="random_number",
                           locations="id", featureidkey="properties.id",
                           center={"lat": -9.893, "lon": -50.423},
                           mapbox_style="white-bg", zoom=9)
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
and it worked, so I am guessing there is some kind of trouble with the non-numeric values in the region column of df, which are not being passed to the choropleth map properly.
Any advice, help or solution will be much appreciated!

Related

mapping some key-value pairs from nested json to new columns in Pandas dataframe

I spent a few hours searching for hints on how to do this, and tried a bunch of things (see below). I'm not getting anywhere, so I finally decided to post a new question.
I have a nested JSON with a dictionary data structure, like this:
for k, v in d.items():
    print(f'{k} = {v}')
First two keys:
obj1 = {'color': 'red', 'size': 'large', 'description': 'a large red ball'}
obj2 = {'color': 'blue', 'size': 'small', 'description': 'a small blue ball'}
Side question: is this actually nested JSON? Each key (obj1, obj2) has its own set of keys, so I think so, but I'm not sure.
I then have a dataframe like this:
df
key   id_num  source
obj1  143     loc1
obj2  139     loc1
I want to map only 'size' and 'description' from my JSON dictionary to this dataframe, by key. And I want to do that efficiently and readably. I also want it to be robust to missing keys, so that if a key doesn't exist in the JSON dict, it just fills in "NA" or something.
Things I've tried that got me closest (I tried to map one column at a time, and both at the same time):
df['description'] = df['key'].map(d['description'])
df['description'] = df['key'].map(lambda x: d[x]['description'])
df2 = df.join(pd.DataFrame.from_dict(d, orient='index', columns=['size','description']), on='key')
The first one - it's obvious why this doesn't work: it raises KeyError: 'description', as expected. The second one I think would work, but there is a key in my dataframe that doesn't exist in my JSON dict, so it raises KeyError: 'obj42' (an object in my df but not in d). The third one works, but requires creating a new dataframe, which I'd like to avoid.
How can I make Solution #2 robust to missing keys? Also, is there a way to assign both columns at the same time without creating a new df? I found a way to assign all values in the dict here, but that's not what I want. I only want a couple.
There's always a possibility that my search keywords were not quite right, so if a post exists that answers my question please do let me know and I can delete this one.
One way to go, based on your second attempt, would be as follows:
import pandas as pd
import numpy as np

d = {'obj1': {'color': 'red', 'size': 'large', 'description': 'a large red ball'},
     'obj2': {'color': 'blue', 'size': 'small', 'description': 'a small blue ball'}
     }

# `obj3` added here to reproduce the missing-key (`KeyError`) case
data = {'key': {0: 'obj1', 1: 'obj2', 2: 'obj3'},
        'id_num': {0: 143, 1: 139, 2: 140},
        'source': {0: 'loc1', 1: 'loc1', 2: 'loc1'}}
df = pd.DataFrame(data)

df[['size', 'description']] = df['key'].map(
    lambda x: [d[x]['size'], d[x]['description']] if x in d else [np.nan] * 2
).tolist()
print(df)
    key  id_num source   size        description
0  obj1     143   loc1  large   a large red ball
1  obj2     139   loc1  small  a small blue ball
2  obj3     140   loc1    NaN                NaN
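A variant of the same idea, not in the original attempts but plain standard Python, uses dict.get with defaults so a missing key never raises:
# equivalent fallback via dict.get, using the same d and df as above
df[['size', 'description']] = df['key'].map(
    lambda x: [d.get(x, {}).get('size', np.nan),
               d.get(x, {}).get('description', np.nan)]
).tolist()
This fills NaN for keys such as obj3 that are absent from d, just like the if x in d check.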
You can create a dataframe from the dictionary and then do .merge:
df = df.merge(
    pd.DataFrame(d.values(), index=d.keys())[["size", "description"]],
    left_on="key",
    right_index=True,
    how="left",
)
print(df)
Prints:
    key  id_num source   size        description
0  obj1     143   loc1  large   a large red ball
1  obj2     139   loc1  small  a small blue ball
2  obj3     140   loc1    NaN                NaN
Data used:
d = {
    "obj1": {
        "color": "red",
        "size": "large",
        "description": "a large red ball",
    },
    "obj2": {
        "color": "blue",
        "size": "small",
        "description": "a small blue ball",
    },
}

data = {
    "key": {0: "obj1", 1: "obj2", 2: "obj3"},
    "id_num": {0: 143, 1: 139, 2: 140},
    "source": {0: "loc1", 1: "loc1", 2: "loc1"},
}
df = pd.DataFrame(data)

The fastest way to search for all the lines that don't have a matching ID string in pandas

I have a large dataset, and I am trying to draw the rows as lines using GeoJSON. Any line needs a minimum of 2 points to be drawn correctly. However, I realise that in my dataset there are some points that have no matching ID (i.e. they cannot form a line, as I am grouping them by their ID, which is the last value in each row - wayID). The error I get says LineStrings must have at least 2 coordinate tuples.
This is the dataset sample:
data = '''lat=1.3240787,long=103.93576,102677,130828
lat=1.3195231,long=103.9343126,106192,190592
lat=1.3194455,long=103.9343254,106191,713620084
lat=1.3202566,long=103.9330146,106190,190591
lat=1.3202224,long=103.9327891,106189,885346352
lat=1.3236842,long=103.9368979,102702,130898
lat=1.3192259,long=103.9338829,106188,464289019
lat=1.3201896,long=103.9326392,106177,473393241
lat=1.3217119,long=103.932483,106176,885346352
lat=1.3217504,long=103.9323308,106173,641080502
lat=1.3226904,long=103.9322832,106172,885346352
lat=1.3226729,long=103.9321595,106171,655522077
lat=1.3231835,long=103.9322084,106170,885346352
lat=1.3219643,long=103.9371845,102882,131521
lat=1.3231554,long=103.9320845,106169,473376614
lat=1.3222227,long=103.9371391,102883,131521
lat=1.3222314,long=103.9349844,106168,190584
lat=1.321424,long=103.9349895,106153,190572
lat=1.3214117,long=103.9351812,106152,190576
lat=1.3215218,long=103.9352676,106151,190576
lat=1.3216347,long=103.9352875,106150,190574
lat=1.3218405,long=103.9351328,106147,190576
lat=1.3218434,long=103.9350341,106146,190573
lat=1.3213905,long=103.9351205,106141,190573'''
This is the code I am using:
import pandas as pd
import geopandas as gpd
from shapely.geometry import LineString
import io
col = ['lat', 'long', 'pointID', 'WAYID']
# load csv as dataframe (replace io.StringIO(data) with the csv filename);
# use converters to clean up the lat and long columns upon loading
df = pd.read_csv(io.StringIO(data), names=col, sep=',', engine='python',
                 converters={'lat': lambda x: float(x.split('=')[1]),
                             'long': lambda x: float(x.split('=')[1])})
# to read the data from the text file instead:
# df = pd.read_csv("latlongWayID.txt", names=col, sep=',', engine='python',
#                  converters={'lat': lambda x: float(x.split('=')[1]),
#                              'long': lambda x: float(x.split('=')[1])})
# load dataframe as geodataframe
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.long, df.lat))
# groupby on WAYID, converting each group's geometries to a LineString
# (alternative, grouping on description with a Point fallback:)
# gdf = gdf.groupby(['description'])['geometry'].apply(lambda p: LineString(zip(p.x, p.y)) if len(p) > 1 else Point(p.x, p.y))
gdf = gdf.groupby(['WAYID'])['geometry'].apply(lambda x: LineString(x.tolist())).reset_index()
jsonLoad = gdf.to_json()
Then save to a file using:
import json
from geojson import dump

# save the data to the file
parsed = json.loads(jsonLoad)
print(json.dumps(parsed, indent=4, sort_keys=True))
with open('savedMyfile.geojson', 'w') as f:
    dump(parsed, f, indent=1)
Is there a way to check through the large file and quickly exclude all the rows that don't have a matching ID? I wouldn't mind converting the unmatched coordinates to a 'Point' type and keeping those with pairs as LineStrings, using the code above.
Could someone advise on how I should go about doing this?
Thanks in advance!
This is a simple case of filtering in pandas before generating the geopandas GeoDataFrame.
(df.groupby("WAYID").size() >= 2).loc[lambda s: s].index gives the list of WAYID values that have at least 2 associated rows.
Then it's a simple case of building up a filter for df:
import pandas as pd
import geopandas as gpd
from shapely.geometry import LineString
import io
col = ["lat", "long", "pointID", "WAYID"]
df = pd.read_csv(
    io.StringIO(data),
    names=col,
    sep=",",
    engine="python",
    converters={
        "lat": lambda x: float(x.split("=")[1]),
        "long": lambda x: float(x.split("=")[1]),
    },
)
# filter dataframe so that the remaining WAYID values have at least 2 coordinates
df = df.loc[df["WAYID"].isin((df.groupby("WAYID").size() >= 2).loc[lambda s: s].index)]
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.long, df.lat))
gdf = gdf.groupby(["WAYID"], as_index=False)["geometry"].apply(
    lambda x: LineString(x.tolist())
)
# check the generated geojson...
gdf.__geo_interface__
output
{'type': 'FeatureCollection',
 'features': [{'id': '0',
               'type': 'Feature',
               'properties': {'WAYID': 131521},
               'geometry': {'type': 'LineString',
                            'coordinates': ((103.9371845, 1.3219643), (103.9371391, 1.3222227))},
               'bbox': (103.9371391, 1.3219643, 103.9371845, 1.3222227)},
              {'id': '1',
               'type': 'Feature',
               'properties': {'WAYID': 190573},
               'geometry': {'type': 'LineString',
                            'coordinates': ((103.9350341, 1.3218434), (103.9351205, 1.3213905))},
               'bbox': (103.9350341, 1.3213905, 103.9351205, 1.3218434)},
              {'id': '2',
               'type': 'Feature',
               'properties': {'WAYID': 190576},
               'geometry': {'type': 'LineString',
                            'coordinates': ((103.9351812, 1.3214117),
                                            (103.9352676, 1.3215218),
                                            (103.9351328, 1.3218405))},
               'bbox': (103.9351328, 1.3214117, 103.9352676, 1.3218405)},
              {'id': '3',
               'type': 'Feature',
               'properties': {'WAYID': 885346352},
               'geometry': {'type': 'LineString',
                            'coordinates': ((103.9327891, 1.3202224),
                                            (103.932483, 1.3217119),
                                            (103.9322832, 1.3226904),
                                            (103.9322084, 1.3231835))},
               'bbox': (103.9322084, 1.3202224, 103.9327891, 1.3231835)}],
 'bbox': (103.9322084, 1.3202224, 103.9371845, 1.3231835)}
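If, as the question mentions, the unmatched coordinates should become Points rather than be dropped, a sketch along the lines of the commented-out fallback in the question's own code (run on the unfiltered df, i.e. skipping the isin() filter above) could be:
# keep single-row WAYIDs as shapely Points instead of filtering them out
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.long, df.lat))
gdf = gdf.groupby(["WAYID"], as_index=False)["geometry"].apply(
    lambda g: LineString(g.tolist()) if len(g) > 1 else g.iloc[0]
)
The resulting GeoJSON then mixes Point and LineString features, which is valid within a FeatureCollection.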

Add multiple markers on each coordinates in flask-googlemaps

I am building a simple car rental web project using Flask, but I have hit an issue adding multiple markers to the coordinates in flask-googlemaps. I tried to follow the tutorial at https://github.com/rochacbruno/Flask-GoogleMaps.
Below is my code for adding multiple coordinates to the Google map:
catdatas = CarsDataset.query.all()
locations = [d.serializer() for d in catdatas]
carmap = Map(
    identifier="carmap",
    style="height:500px;width:500px;margin:0;",
    lat=locations[0]['lat'],
    lng=locations[0]['lng'],
    markers=[(loc['lat'], loc['lng']) for loc in locations]
)
The coordinates are all added successfully, but I don't know how to add multiple markers for them. Thanks in advance!
According to the Flask-GoogleMaps docs, you can pass the markers key a list of dictionaries (objects). Example:
[
    {
        'icon': 'http://maps.google.com/mapfiles/ms/icons/green-dot.png',
        'lat': 37.4419,
        'lng': -122.1419,
        'infobox': "<b>Hello World</b>"
    },
    {
        'icon': 'http://maps.google.com/mapfiles/ms/icons/blue-dot.png',
        'lat': 37.4300,
        'lng': -122.1400,
        'infobox': "<b>Hello World from other place</b>"
    }
]
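Applied to the code from the question, a minimal sketch (assuming each serialized location exposes 'lat' and 'lng' keys, as in the original snippet; the icon URL and infobox text are just placeholders) could build one dict per location instead of tuples:
# sketch: one marker dict per location; icon/infobox values are illustrative
markers = [
    {
        'icon': 'http://maps.google.com/mapfiles/ms/icons/green-dot.png',
        'lat': loc['lat'],
        'lng': loc['lng'],
        'infobox': "<b>Car available here</b>",
    }
    for loc in locations
]
carmap = Map(
    identifier="carmap",
    style="height:500px;width:500px;margin:0;",
    lat=locations[0]['lat'],
    lng=locations[0]['lng'],
    markers=markers,
)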

xlsxwriter chart create dynamic rows

I'm trying to create charts with the xlsxwriter Python module.
It works fine, but I would like to not have to hard-code the row count.
This example charts rows 2 through 30:
chart.add_series({
    'name': 'SNR of old AP',
    'values': '=Depart!$D$2:$D$30',
    'marker': {'type': 'circle'},
    'data_labels': {'value': True, 'num_format': '#,##0'},
})
For 'values', I would like the row count to be dynamic. How do I do this?
Thanks.
It works fine, but I would like to not have to hard-code the row count
XlsxWriter supports a list syntax in add_series() for this exact case. So your example could be written as:
chart.add_series({
    'name': 'SNR of old AP',
    'values': ['Depart', 1, 3, 29, 3],
    'marker': {'type': 'circle'},
    'data_labels': {'value': True, 'num_format': '#,##0'},
})
And then you can set any of the first_row, first_col, last_row, last_col parameters programmatically.
See the docs for add_series().
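For instance, a minimal sketch (the worksheet name 'Depart' comes from the question; the data list and the line chart type are made-up stand-ins) could compute last_row from the length of the data that was written:
import xlsxwriter

workbook = xlsxwriter.Workbook('chart.xlsx')
worksheet = workbook.add_worksheet('Depart')

data = [10, 40, 50, 20, 10, 50]     # hypothetical SNR values
worksheet.write_column(1, 3, data)  # column D, starting at row 2 (0-indexed row 1)

chart = workbook.add_chart({'type': 'line'})
last_row = len(data)                # 0-indexed last row of the written range
chart.add_series({
    'name': 'SNR of old AP',
    # [sheetname, first_row, first_col, last_row, last_col], all 0-indexed
    'values': ['Depart', 1, 3, last_row, 3],
    'marker': {'type': 'circle'},
    'data_labels': {'value': True, 'num_format': '#,##0'},
})
worksheet.insert_chart('F2', chart)
workbook.close()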

Pandas DataFrame.apply: create new column with data from two columns

I have a DataFrame (df) like this:
PointID Time geojson
---- ---- ----
36F 2016-04-01T03:52:30 {'type': 'Point', 'coordinates': [3.961389, 43.123]}
36G 2016-04-01T03:52:50 {'type': 'Point', 'coordinates': [3.543234, 43.789]}
The geojson column contains data in GeoJSON format (essentially, a Python dict).
I want to create a new column in geoJSON format, which includes the time coordinate. In other words, I want to inject the time information into the geoJSON info.
For a single value, I can successfully do:
oldjson = df.iloc[0]['geojson']
newjson = [oldjson['coordinates'][0], oldjson['coordinates'][1], df.iloc[0]['Time']]
For a single parameter, I successfully used DataFrame.apply in combination with lambda (thanks to SO: related question).
But now, I have two parameters, and I want to use it on the whole DataFrame. As I am not confident with the .apply syntax and lambda, I do not know if this is even possible. I would like to do something like this:
def inject_time(geojson, time):
    """
    Injects the time dimension into geoJSON coordinates. Expects a dict in geoJSON Point format.
    """
    geojson['coordinates'] = [geojson['coordinates'][0], geojson['coordinates'][1], time]
    return geojson

df["newcolumn"] = df["geojson"].apply(lambda x: inject_time(x, df['Time']))
...but that does not work, because the function would inject the whole series.
EDIT:
I figured that the format of the timestamped geoJSON should be something like this:
TimestampedGeoJson({
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                "coordinates": [[-70, -25], [-70, 35], [70, 35]],
            },
            "properties": {
                "times": [1435708800000, 1435795200000, 1435881600000]
            }
        }
    ]
})
So the time element is in the properties element, but this does not change the problem much.
You need DataFrame.apply with axis=1 for processing by rows:
df['new'] = df.apply(lambda x: inject_time(x['geojson'], x['Time']), axis=1)

# temporarily display long strings in the column
with pd.option_context('display.max_colwidth', 100):
    print(df['new'])

0    {'type': 'Point', 'coordinates': [3.961389, 43.123, '2016-04-01T03:52:30']}
1    {'type': 'Point', 'coordinates': [3.543234, 43.789, '2016-04-01T03:52:50']}
Name: new, dtype: object
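If the target is the properties-based layout from the EDIT rather than a third coordinate, the same axis=1 pattern can wrap each row as a full Feature. This is only a sketch of that variant; the raw Time string is used as-is, and converting it to the epoch milliseconds shown in the EDIT is left as an assumption about what the consumer requires:
def to_feature(geojson, time):
    # wrap one row as a Feature with time under properties,
    # mirroring the TimestampedGeoJson layout quoted in the EDIT
    return {
        "type": "Feature",
        "geometry": geojson,
        "properties": {"times": [time]},  # ISO string; convert to epoch ms if needed
    }

df['feature'] = df.apply(lambda x: to_feature(x['geojson'], x['Time']), axis=1)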
