I spent a few hours searching for hints on how to do this, and tried a bunch of things (see below). I'm not getting anywhere, so I finally decided to post a new question.
I have nested JSON loaded into a Python dictionary. Printing it like this:
for k, v in d.items():
    print(f'{k} = {v}')
shows the first two keys:
obj1 = {'color': 'red', 'size': 'large', 'description': 'a large red ball'}
obj2 = {'color': 'blue', 'size': 'small', 'description': 'a small blue ball'}
Side question: is this actually nested JSON? Each key (obj1, obj2) maps to its own set of keys, so I think so, but I'm not sure.
I then have a dataframe like this:
df
key id_num source
obj1 143 loc1
obj2 139 loc1
I want to map only 'size' and 'description' from my JSON dictionary to this dataframe, by key, and I want to do that efficiently and readably. I also want it to be robust to missing keys, so that if a key doesn't exist in the JSON dict the row just gets "NA" or something similar.
Things I've tried that got me closest (I tried to map one column at a time, and both at same time):
df['description'] = df['key'].map(d['description'])
df['description'] = df['key'].map(lambda x: d[x]['description'])
df2 = df.join(pd.DataFrame.from_dict(d, orient='index', columns=['size','description']), on='key')
The first one - it's obvious why this doesn't work: it raises KeyError: 'description', as expected. The second one I think would work, but my dataframe contains a key that doesn't exist in my JSON dict, so it raises KeyError: 'obj42' (an object in my df but not in d). The third one works, but requires creating a new dataframe, which I'd like to avoid.
How can I make Solution #2 robust to missing keys? Also, is there a way to assign both columns at the same time without creating a new df? I found a way to assign all values in the dict here, but that's not what I want. I only want a couple.
There's always a possibility that my search keywords were not quite right, so if a post exists that answers my question please do let me know and I can delete this one.
One way to go, based on your second attempt, would be as follows:
import pandas as pd
import numpy as np
d = {'obj1': {'color': 'red', 'size': 'large', 'description': 'a large red ball'},
'obj2': {'color': 'blue', 'size': 'small', 'description': 'a small blue ball'}
}
# adding `obj3` here to provoke the `KeyError` from the question
data = {'key': {0: 'obj1', 1: 'obj2', 2: 'obj3'},
'id_num': {0: 143, 1: 139, 2: 140},
'source': {0: 'loc1', 1: 'loc1', 2: 'loc1'}}
df = pd.DataFrame(data)
df[['size', 'description']] = df['key'].map(
    lambda x: [d[x]['size'], d[x]['description']] if x in d else [np.nan] * 2
).tolist()
print(df)
key id_num source size description
0 obj1 143 loc1 large a large red ball
1 obj2 139 loc1 small a small blue ball
2 obj3 140 loc1 NaN NaN
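A variant of the same idea that avoids the explicit membership test is dict.get with an empty-dict fallback; missing keys then come through as None (displayed as a missing value) automatically. A sketch using the same data as above:

```python
import pandas as pd

d = {'obj1': {'color': 'red', 'size': 'large', 'description': 'a large red ball'},
     'obj2': {'color': 'blue', 'size': 'small', 'description': 'a small blue ball'}}
df = pd.DataFrame({'key': ['obj1', 'obj2', 'obj3'],
                   'id_num': [143, 139, 140],
                   'source': ['loc1', 'loc1', 'loc1']})

for col in ['size', 'description']:
    # d.get(x, {}) returns an empty dict for a missing key, so the inner
    # .get yields None instead of raising a KeyError
    df[col] = df['key'].map(lambda x: d.get(x, {}).get(col))

print(df)
```

If the literal text "NA" is wanted rather than a missing value, a final df.fillna('NA') takes care of it.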
You can create a dataframe from the dictionary and then do .merge:
df = df.merge(
    pd.DataFrame(d.values(), index=d.keys())[["size", "description"]],
    left_on="key",
    right_index=True,
    how="left",
)
print(df)
Prints:
key id_num source size description
0 obj1 143 loc1 large a large red ball
1 obj2 139 loc1 small a small blue ball
2 obj3 140 loc1 NaN NaN
Data used:
d = {
"obj1": {
"color": "red",
"size": "large",
"description": "a large red ball",
},
"obj2": {
"color": "blue",
"size": "small",
"description": "a small blue ball",
},
}
data = {
"key": {0: "obj1", 1: "obj2", 2: "obj3"},
"id_num": {0: 143, 1: 139, 2: 140},
"source": {0: "loc1", 1: "loc1", 2: "loc1"},
}
df = pd.DataFrame(data)
I have a large dataset, and I am trying to draw them as lines using GeoJSON. For any line, there needs to be a minimum of 2 points so that it can be drawn correctly. However, I realise that in my dataset, there are some points, that have no matching ID (i.e they cannot form a line as I am grouping them by their IDs which is the last value in each row - wayID). The error I get says LineStrings must have at least 2 coordinate tuples
This is the dataset sample
data = '''lat=1.3240787,long=103.93576,102677,130828
lat=1.3195231,long=103.9343126,106192,190592
lat=1.3194455,long=103.9343254,106191,713620084
lat=1.3202566,long=103.9330146,106190,190591
lat=1.3202224,long=103.9327891,106189,885346352
lat=1.3236842,long=103.9368979,102702,130898
lat=1.3192259,long=103.9338829,106188,464289019
lat=1.3201896,long=103.9326392,106177,473393241
lat=1.3217119,long=103.932483,106176,885346352
lat=1.3217504,long=103.9323308,106173,641080502
lat=1.3226904,long=103.9322832,106172,885346352
lat=1.3226729,long=103.9321595,106171,655522077
lat=1.3231835,long=103.9322084,106170,885346352
lat=1.3219643,long=103.9371845,102882,131521
lat=1.3231554,long=103.9320845,106169,473376614
lat=1.3222227,long=103.9371391,102883,131521
lat=1.3222314,long=103.9349844,106168,190584
lat=1.321424,long=103.9349895,106153,190572
lat=1.3214117,long=103.9351812,106152,190576
lat=1.3215218,long=103.9352676,106151,190576
lat=1.3216347,long=103.9352875,106150,190574
lat=1.3218405,long=103.9351328,106147,190576
lat=1.3218434,long=103.9350341,106146,190573
lat=1.3213905,long=103.9351205,106141,190573'''
This is the code I am using:
import pandas as pd
import geopandas as gpd
from shapely.geometry import LineString
import io
col = ['lat','long','pointID','WAYID']
#load csv as dataframe (replace io.StringIO(data) with the csv filename);
#use converters to clean up the lat and long columns upon loading
df = pd.read_csv(
    io.StringIO(data), names=col, sep=',', engine='python',
    converters={'lat': lambda x: float(x.split('=')[1]),
                'long': lambda x: float(x.split('=')[1])})
#input the data from the text file
#df = pd.read_csv("latlongWayID.txt", names=col, sep=',', engine='python', converters={'lat': lambda x: float(x.split('=')[1]), 'long': lambda x: float(x.split('=')[1])})
#load dataframe as geodataframe
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.long, df.lat))
#groupby on name and description, while converting the grouped geometries to a LineString
#gdf = gdf.groupby(['description'])['geometry'].apply(lambda p: LineString(zip(p.x, p.y)) if len(p) > 1 else Point(p.x, p.y))
gdf = gdf.groupby(['WAYID'])['geometry'].apply(lambda x: LineString(x.tolist())).reset_index()
jsonLoad = gdf.to_json()
Then save to a file using
import json
from geojson import Point, Feature, dump
#save the data to the file
parsed = json.loads(jsonLoad)
print(json.dumps(parsed, indent=4, sort_keys=True))
#parsed = gdf.to_json()
with open('savedMyfile.geojson', 'w') as f:
    dump(parsed, f, indent=1)
Is there a way to check through the large file and quickly exclude all the points that don't have a matching ID? I wouldn't mind converting those unmatched coords into a 'Point' type and keeping those with pairs as LineString using the code above.
Could someone advise on how I should go about doing it?
Thanks in advance!
This is a simple case of filtering in pandas before generating the geopandas GeoDataFrame.
(df.groupby("WAYID").size() >= 2).loc[lambda s: s].index gives the list of WAYID values that have at least 2 associated rows.
Then it's a simple case of building up a filter for df:
import pandas as pd
import geopandas as gpd
from shapely.geometry import LineString
import io
col = ["lat", "long", "pointID", "WAYID"]
df = pd.read_csv(
io.StringIO(data),
names=col,
sep=",",
engine="python",
converters={
"lat": lambda x: float(x.split("=")[1]),
"long": lambda x: float(x.split("=")[1]),
},
)
# filter dataframe so that remaining WAYID have at least 2 co-ordinates
df = df.loc[df["WAYID"].isin((df.groupby("WAYID").size() >= 2).loc[lambda s: s].index)]
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.long, df.lat))
gdf = gdf.groupby(["WAYID"], as_index=False)["geometry"].apply(
lambda x: LineString(x.tolist())
)
# check generated geojson...
gdf.__geo_interface__
output
{'type': 'FeatureCollection',
'features': [{'id': '0',
'type': 'Feature',
'properties': {'WAYID': 131521},
'geometry': {'type': 'LineString',
'coordinates': ((103.9371845, 1.3219643), (103.9371391, 1.3222227))},
'bbox': (103.9371391, 1.3219643, 103.9371845, 1.3222227)},
{'id': '1',
'type': 'Feature',
'properties': {'WAYID': 190573},
'geometry': {'type': 'LineString',
'coordinates': ((103.9350341, 1.3218434), (103.9351205, 1.3213905))},
'bbox': (103.9350341, 1.3213905, 103.9351205, 1.3218434)},
{'id': '2',
'type': 'Feature',
'properties': {'WAYID': 190576},
'geometry': {'type': 'LineString',
'coordinates': ((103.9351812, 1.3214117),
(103.9352676, 1.3215218),
(103.9351328, 1.3218405))},
'bbox': (103.9351328, 1.3214117, 103.9352676, 1.3218405)},
{'id': '3',
'type': 'Feature',
'properties': {'WAYID': 885346352},
'geometry': {'type': 'LineString',
'coordinates': ((103.9327891, 1.3202224),
(103.932483, 1.3217119),
(103.9322832, 1.3226904),
(103.9322084, 1.3231835))},
'bbox': (103.9322084, 1.3202224, 103.9327891, 1.3231835)}],
'bbox': (103.9322084, 1.3202224, 103.9371845, 1.3231835)}
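As for keeping the unpaired coordinates rather than dropping them, as floated in the question, one option is to fall back to a Point for singleton groups. A minimal sketch with plain pandas and shapely (a couple of rows from the sample data), which you can wrap back into a GeoDataFrame afterwards:

```python
import io
import pandas as pd
from shapely.geometry import LineString, Point

# two rows share WAYID 131521; WAYID 130828 appears only once
data = '''lat=1.3219643,long=103.9371845,102882,131521
lat=1.3222227,long=103.9371391,102883,131521
lat=1.3240787,long=103.93576,102677,130828'''

col = ["lat", "long", "pointID", "WAYID"]
df = pd.read_csv(io.StringIO(data), names=col,
                 converters={"lat": lambda x: float(x.split("=")[1]),
                             "long": lambda x: float(x.split("=")[1])})

# 2+ rows per WAYID -> LineString; a lone row -> Point
geoms = df.groupby("WAYID").apply(
    lambda g: LineString(list(zip(g["long"], g["lat"])))
    if len(g) > 1
    else Point(g["long"].iloc[0], g["lat"].iloc[0])
)
print(geoms)
```

The resulting Series can be passed as the geometry column of a new GeoDataFrame before exporting to GeoJSON.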
I tried to build a simple car rental web project using Flask, but ran into an issue adding multiple markers at coordinates with flask-googlemaps. I followed the tutorial at https://github.com/rochacbruno/Flask-GoogleMaps.
Below is my code for adding multiple coordinates to the Google map:
catdatas = CarsDataset.query.all()
locations = [d.serializer() for d in catdatas]
carmap = Map(
identifier="carmap",
style="height:500px;width:500px;margin:0;",
lat=locations[0]['lat'],
lng=locations[0]['lng'],
markers=[(loc['lat'], loc['lng']) for loc in locations]
)
Each coordinate is successfully added, but I don't know how to add multiple markers for them. Thanks in advance!
According to the Flask-GoogleMaps docs, the "markers" key also accepts a list of dictionaries (objects). Example:
[
{
'icon': 'http://maps.google.com/mapfiles/ms/icons/green-dot.png',
'lat': 37.4419,
'lng': -122.1419,
'infobox': "<b>Hello World</b>"
},
{
'icon': 'http://maps.google.com/mapfiles/ms/icons/blue-dot.png',
'lat': 37.4300,
'lng': -122.1400,
'infobox': "<b>Hello World from other place</b>"
}
]
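Applied to the question's setup, the marker dictionaries can be built from the serialized locations with a list comprehension. The 'lat'/'lng' keys below match the question's serializer output; the icon URL and infobox text are just placeholders:

```python
# stand-in for [d.serializer() for d in catdatas] from the question
locations = [
    {'lat': 37.4419, 'lng': -122.1419},
    {'lat': 37.4300, 'lng': -122.1400},
]

# one marker dict per location, with an icon and a popup infobox
markers = [
    {
        'icon': 'http://maps.google.com/mapfiles/ms/icons/green-dot.png',
        'lat': loc['lat'],
        'lng': loc['lng'],
        'infobox': f"Car at ({loc['lat']}, {loc['lng']})",
    }
    for loc in locations
]
print(markers)
```

Then pass markers=markers to Map(...) in place of the list of (lat, lng) tuples.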
I'm trying to create charts with xlsxwriter python module.
It works fine, but I would like to not have to hard-code the row count.
This example charts rows 2 through 30 of column D:
chart.add_series({
'name': 'SNR of old AP',
'values': '=Depart!$D$2:$D$30',
'marker': {'type': 'circle'},
'data_labels': {'value': True,'num_format':'#,##0'},
})
For 'values', I would like the row count to be dynamic. How do I do this?
Thanks.
It works fine, but I would like to not have to hard-code the row count
XlsxWriter supports a list syntax in add_series() for this exact case. So your example could be written as:
chart.add_series({
'name': 'SNR of old AP',
'values': ['Depart', 1, 3, 29, 3],
'marker': {'type': 'circle'},
'data_labels': {'value': True, 'num_format':'#,##0'},
})
And then you can set any of the first_row, first_col, last_row, last_col parameters programmatically.
See the docs for add_series().
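For example, the row bounds can be computed from the length of the data before the series is added. The worksheet name matches the question; the sample values are made up for illustration:

```python
import xlsxwriter

workbook = xlsxwriter.Workbook('chart.xlsx')
worksheet = workbook.add_worksheet('Depart')

# hypothetical SNR readings; in practice this comes from your data source
snr_values = [42, 40, 38, 35, 33]

# write the data into column D starting at row 2 (zero-indexed row 1, col 3)
worksheet.write_column(1, 3, snr_values)

chart = workbook.add_chart({'type': 'line'})
first_row = 1
last_row = first_row + len(snr_values) - 1  # grows with the data

chart.add_series({
    'name': 'SNR of old AP',
    'values': ['Depart', first_row, 3, last_row, 3],
    'marker': {'type': 'circle'},
    'data_labels': {'value': True, 'num_format': '#,##0'},
})
worksheet.insert_chart('F2', chart)
workbook.close()
```

With 29 values instead of 5, last_row would come out as 29, matching the original '=Depart!$D$2:$D$30' range.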
I have a DataFrame (df) like this:
PointID Time geojson
---- ---- ----
36F 2016-04-01T03:52:30 {'type': 'Point', 'coordinates': [3.961389, 43.123]}
36G 2016-04-01T03:52:50 {'type': 'Point', 'coordinates': [3.543234, 43.789]}
The geojson column contains data in GeoJSON format (essentially, a Python dict).
I want to create a new column in geoJSON format, which includes the time coordinate. In other words, I want to inject the time information into the geoJSON info.
For a single value, I can successfully do:
oldjson = df.iloc[0]['geojson']
newjson = [oldjson['coordinates'][0], oldjson['coordinates'][1], df.iloc[0]['Time']]
For a single parameter, I successfully used DataFrame.apply in combination with lambda (thanks to this related SO question).
But now, I have two parameters, and I want to use it on the whole DataFrame. As I am not confident with the .apply syntax and lambda, I do not know if this is even possible. I would like to do something like this:
def inject_time(geojson, time):
    """
    Injects the time dimension into geoJSON coordinates. Expects a dict in geoJSON Point format.
    """
    geojson['coordinates'] = [geojson['coordinates'][0], geojson['coordinates'][1], time]
    return geojson
df["newcolumn"] = df["geojson"].apply(lambda x: inject_time(x, df['Time']))
...but that does not work, because the function would inject the whole series.
EDIT:
I figured that the format of the timestamped geoJSON should be something like this:
TimestampedGeoJson({
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "LineString",
"coordinates": [[-70,-25],[-70,35],[70,35]],
},
"properties": {
"times": [1435708800000, 1435795200000, 1435881600000]
}
}
]
})
So the time element is in the properties element, but this does not change the problem much.
You need DataFrame.apply with axis=1 for processing by rows:
df['new'] = df.apply(lambda x: inject_time(x['geojson'], x['Time']), axis=1)
#temporarily display long strings in the column
with pd.option_context('display.max_colwidth', 100):
    print(df['new'])
0 {'type': 'Point', 'coordinates': [3.961389, 43.123, '2016-04-01T03:52:30']}
1 {'type': 'Point', 'coordinates': [3.543234, 43.789, '2016-04-01T03:52:50']}
Name: new, dtype: object
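An equivalent without apply is a list comprehension over zip, which also sidesteps a subtlety of inject_time: it mutates the original dicts in place, so the geojson column ends up changed too. Building fresh dicts leaves it untouched (a sketch with the question's sample data):

```python
import pandas as pd

df = pd.DataFrame({
    'PointID': ['36F', '36G'],
    'Time': ['2016-04-01T03:52:30', '2016-04-01T03:52:50'],
    'geojson': [
        {'type': 'Point', 'coordinates': [3.961389, 43.123]},
        {'type': 'Point', 'coordinates': [3.543234, 43.789]},
    ],
})

# build new dicts instead of mutating the originals in place
df['new'] = [
    {'type': g['type'], 'coordinates': g['coordinates'][:2] + [t]}
    for g, t in zip(df['geojson'], df['Time'])
]
print(df['new'].iloc[0])
```

Whether this or apply with axis=1 reads better is mostly a matter of taste; the comprehension avoids the per-row function-call overhead of apply on large frames.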