How do I convert a pandas DataFrame to a GeoJSON overlay in folium (Python)?

I have this code here, but when I do folium.GeoJson(data, name="geojson") it just returns ValueError: Cannot render objects with any missing geometries: 0 {'type': 'MultiPolygon', 'coordinates': [[[[-7....
import requests
import json
import pandas as pd
import folium
map = folium.Map(location=[40, -73], zoom_start=6, tiles="OpenStreetMap")
fg = folium.FeatureGroup(name="Parks", show=False)
map.add_child(fg)
park_data = requests.get("https://data.cityofnewyork.us/resource/enfh-gkve.json")
park_data = park_data.json()
park_frame = pd.DataFrame(data=park_data)
park_geo_data = park_frame["multipolygon"]
fg.add_child(folium.GeoJson(park_geo_data, name = "Parks"))
map.add_child(folium.LayerControl(position="topright"))
map.save("about.html")
See the screenshots for what the data looks like raw from the REST API, and for the full error:
[1]: https://i.stack.imgur.com/dGSqL.png
[2]: https://i.stack.imgur.com/XlKdS.png

You can iterate over the rows of your dataframe and add each row's 'multipolygon' value as a GeoJSON feature to the FeatureGroup:
park_data = pd.read_json("https://data.cityofnewyork.us/resource/enfh-gkve.json")
for i, row in park_data.iterrows():
    geo_data = row['multipolygon']
    fg.add_child(folium.GeoJson(geo_data))
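For reference, a minimal end-to-end sketch of that approach (assuming, as in the screenshots, that each row's 'multipolygon' column already holds a GeoJSON-style dict) could look like this:
import folium
import pandas as pd

m = folium.Map(location=[40, -73], zoom_start=6, tiles="OpenStreetMap")
fg = folium.FeatureGroup(name="Parks", show=False)
m.add_child(fg)

park_data = pd.read_json("https://data.cityofnewyork.us/resource/enfh-gkve.json")

# Add one GeoJson layer per park, skipping rows with no geometry dict so
# folium does not raise the "missing geometries" ValueError.
for _, row in park_data.iterrows():
    geo_data = row["multipolygon"]
    if isinstance(geo_data, dict):
        fg.add_child(folium.GeoJson(geo_data))

m.add_child(folium.LayerControl(position="topright"))
m.save("about.html")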

Related

Streamlit -Aggrid date filter - unable to filter

I am not sure why, but I am unable to filter the date. When I filter, it's either blank or it doesn't filter at all (screenshots: before the filter, after filtering with greater-than, after filtering with equals).
Below is my code:
st.markdown(hide_menu, unsafe_allow_html=True)
file_upload = st.file_uploader('Upload file', '.xlsx')
if file_upload is not None:
    df = pd.read_excel(file_upload)
    #st.checkbox("Use container width", value=False, key="use_container_width")
    # Installation Duration
    df['Installation_Actual_End'] = pd.to_datetime(df["Installation_Actual_End"])
    df['Installation_Actual_Start'] = pd.to_datetime(df["Installation_Actual_Start"])
    df['Installation_Duration'] = df["Installation_Actual_End"] - df["Installation_Actual_Start"]
    df["Installation_Duration"] = df["Installation_Duration"]
    df1 = df['Installation_Actual_Start', 'Installation_Actual_End', 'Integration']
    AgGrid(df1)
Add a format parameter to the datetime conversion; see the code below.
Code
import streamlit as st
import pandas as pd
from st_aggrid import AgGrid

file_upload = st.file_uploader('Upload file', '.xlsx')
if file_upload is not None:
    df = pd.read_excel(file_upload)
    st.write('### Initial data')
    st.dataframe(df)

    # Add a format parameter based on the date format in the Excel data.
    df['Installation_Actual_Start'] = pd.to_datetime(df['Installation_Actual_Start'], format="%d/%m/%Y")
    df['Installation_Actual_End'] = pd.to_datetime(df["Installation_Actual_End"], format="%d/%m/%Y")
    df['Installation_Duration_Hr'] = (df["Installation_Actual_End"] - df["Installation_Actual_Start"]).astype('timedelta64[h]')
    st.write('### Date conversion')
    st.dataframe(df)

    # Select the columns to show.
    df1 = df[['Installation_Actual_Start', 'Installation_Actual_End', 'Integration']]
    st.write('### AgGrid')
    AgGrid(df1)
Output: screenshots of the Excel source data and the resulting grid (test_image1, test_image2).
import streamlit as st
import pandas as pd
from st_aggrid import AgGrid

file_upload = st.file_uploader('Upload file', '.xlsx')
if file_upload is not None:
    df = pd.read_excel(file_upload)
    st.write('### Initial data')
    st.dataframe(df)

    # Add a format parameter based on the date format in the Excel data.
    df['Installation_Actual_Start'] = pd.to_datetime(df['Installation_Actual_Start'], format="%d/%m/%Y")
    df['Installation_Actual_End'] = pd.to_datetime(df["Installation_Actual_End"], format="%d/%m/%Y")
    df['Installation_Duration_Hr'] = (df["Installation_Actual_End"] - df["Installation_Actual_Start"]).astype('timedelta64[h]')
    st.write('### Date conversion')
    st.dataframe(df)

    # Select the columns to show.
    df1 = df[['Installation_Actual_Start', 'Installation_Actual_End']]
    st.write('### AgGrid')
    AgGrid(df1)
Hi @ferdy,
Thank you for taking your time to answer my doubts. I tested the code, but it isn't working on my end; in the AgGrid columns the data does not appear in full as in your screenshot. I've attached the output that I'm getting.
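One possible next step, which is my own suggestion rather than something from this thread, is to attach ag-Grid's built-in date filter to the datetime columns through st_aggrid's GridOptionsBuilder (column names assumed to match the ones above):
import streamlit as st
import pandas as pd
from st_aggrid import AgGrid, GridOptionsBuilder

file_upload = st.file_uploader('Upload file', '.xlsx')
if file_upload is not None:
    df = pd.read_excel(file_upload)
    df['Installation_Actual_Start'] = pd.to_datetime(df['Installation_Actual_Start'], format="%d/%m/%Y")
    df['Installation_Actual_End'] = pd.to_datetime(df['Installation_Actual_End'], format="%d/%m/%Y")

    # Build grid options and use ag-Grid's date filter on the datetime columns
    # so the filter popup offers date comparisons instead of plain text matching.
    gb = GridOptionsBuilder.from_dataframe(df)
    gb.configure_column('Installation_Actual_Start', filter='agDateColumnFilter')
    gb.configure_column('Installation_Actual_End', filter='agDateColumnFilter')

    AgGrid(df, gridOptions=gb.build())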

Adding tooltip to folium.features.GeoJson from a geopandas dataframe

I am having issues adding tooltips to my folium.features.GeoJson. I can't get columns to display from the dataframe when I select them.
feature = folium.features.GeoJson(df.geometry,
                                  name='Location',
                                  style_function=style_function,
                                  tooltip=folium.GeoJsonTooltip(fields=[df.acquired], aliases=["Time"], labels=True))
ax.add_child(feature)
For some reason when I run the code above it responds with
Name: acquired, Length: 100, dtype: object is not available in the data. Choose from: ().
I can't seem to link the data to my tooltip.
I have made your code a MWE by including some data. There are two key issues with your code:
- You need to pass properties, not just geometry, to folium.features.GeoJson(). Hence I passed df instead of df.geometry.
- folium.GeoJsonTooltip() takes a list of properties (columns), not an array of values. Hence I passed ["acquired"] instead of an array of values from a dataframe column.
There is also an implied issue with your code: all dataframe columns need to contain values that can be serialised to JSON. Hence the conversion of acquired to a string and the drop().
import geopandas as gpd
import pandas as pd
import shapely.wkt
import io
import folium
df = pd.read_csv(io.StringIO("""ref;lanes;highway;maxspeed;length;name;geometry
A3015;2;primary;40 mph;40.68;Rydon Lane;MULTILINESTRING ((-3.4851169 50.70864409999999, -3.4849879 50.7090007), (-3.4857269 50.70693379999999, -3.4853034 50.7081574), (-3.488620899999999 50.70365289999999, -3.4857269 50.70693379999999), (-3.4853034 50.7081574, -3.4851434 50.70856839999999), (-3.4851434 50.70856839999999, -3.4851169 50.70864409999999))
A379;3;primary;50 mph;177.963;Rydon Lane;MULTILINESTRING ((-3.4763853 50.70886769999999, -3.4786112 50.70811229999999), (-3.4746017 50.70944449999999, -3.4763853 50.70886769999999), (-3.470350900000001 50.71041779999999, -3.471219399999999 50.71028909999998), (-3.465049699999999 50.712158, -3.470350900000001 50.71041779999999), (-3.481215600000001 50.70762499999999, -3.4813909 50.70760109999999), (-3.4934747 50.70059599999998, -3.4930204 50.7007898), (-3.4930204 50.7007898, -3.4930048 50.7008015), (-3.4930048 50.7008015, -3.4919513 50.70168349999999), (-3.4919513 50.70168349999999, -3.49137 50.70213669999998), (-3.49137 50.70213669999998, -3.4911565 50.7023015), (-3.4911565 50.7023015, -3.4909108 50.70246919999999), (-3.4909108 50.70246919999999, -3.4902349 50.70291189999999), (-3.4902349 50.70291189999999, -3.4897693 50.70314579999999), (-3.4805021 50.7077218, -3.4806265 50.70770150000001), (-3.488620899999999 50.70365289999999, -3.4888806 50.70353719999999), (-3.4897693 50.70314579999999, -3.489176800000001 50.70340539999999), (-3.489176800000001 50.70340539999999, -3.4888806 50.70353719999999), (-3.4865751 50.70487679999999, -3.4882604 50.70375799999999), (-3.479841700000001 50.70784459999999, -3.4805021 50.7077218), (-3.4882604 50.70375799999999, -3.488620899999999 50.70365289999999), (-3.4806265 50.70770150000001, -3.481215600000001 50.70762499999999), (-3.4717096 50.71021009999998, -3.4746017 50.70944449999999), (-3.4786112 50.70811229999999, -3.479841700000001 50.70784459999999), (-3.471219399999999 50.71028909999998, -3.4717096 50.71021009999998))"""),
sep=";")
df = gpd.GeoDataFrame(df, geometry=df["geometry"].apply(shapely.wkt.loads), crs="epsg:4326")
df["acquired"] = pd.date_range("8-feb-2022", freq="1H", periods=len(df))
def style_function(x):
    return {"color": "blue", "weight": 3}

ax = folium.Map(
    location=[sum(df.total_bounds[[1, 3]]) / 2, sum(df.total_bounds[[0, 2]]) / 2],
    zoom_start=12,
)

# datetime is not JSON serializable, so expose it as a formatted string column
df["tt"] = df["acquired"].dt.strftime("%Y-%b-%d %H:%M")

feature = folium.features.GeoJson(
    df.drop(columns="acquired"),
    name='Location',
    style_function=style_function,
    tooltip=folium.GeoJsonTooltip(fields=["tt"], aliases=["Time"], labels=True),
)
ax.add_child(feature)

ValueError: '_index' is a reserved name for dataframe columns

I am trying to save a file in h5ad format and it is giving this error: ValueError: '_index' is a reserved name for dataframe columns.
import pandas as pd
import scanpy as sc
import numpy as np
data = sc.read_h5ad('f.h5ad')
annotation = pd.read_csv('n.tsv', sep='\t')
annotation_dict = {item['barcodes']:item['celltype'] for item in annotation.to_dict('records')}
data.obs['barcodes'] = data.obs.index
data.obs['celltype'] = data.obs['barcodes'].map(annotation_dict)
sc.pp.filter_genes(data,min_cells=686)
sc.pp.filter_cells(data,min_genes=10)
sc.pp.normalize_per_cell(data,20000)
sc.pp.log1p(data)
sc.pp.highly_variable_genes(data,n_top_genes=1000)
data.X = np.exp(data.X.toarray())-1
data=data[:,data.var['highly_variable']]
sc.pp.normalize_per_cell(data,3800)
clustered = sc.read_h5ad('f.h5ad')
sc.pp.filter_cells(data,min_genes=10)
sc.pp.recipe_zheng17(clustered)
sc.tl.pca(clustered, n_comps=50)
sc.pp.neighbors(clustered, n_pcs=50)
sc.tl.louvain(clustered, resolution=0.15)
clustered.obs.groupby('louvain').count()
data.obs['louvain'] = list(clustered.obs['louvain'])
split = pd.DataFrame(data.obs['barcodes'])
test = split.sample(frac=0.2)
d_split = {item:'test' for item in test['barcodes']}
data.obs['split'] = data.obs['barcodes'].map(d_split).fillna('train')
data.write_h5ad('e.h5ad')
This is probably related to a known issue with the AnnData .raw object.
Two workarounds (From here):
#1
data.__dict__['_raw'].__dict__['_var'] = data.__dict__['_raw'].__dict__['_var'].rename(columns={'_index': 'features'})
#2, deleting the backed up raw information
del data.raw
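For example, a minimal usage sketch (assuming the variable names from the question) applies workaround #1 just before writing the file:
import scanpy as sc

data = sc.read_h5ad('f.h5ad')
# ... processing steps from the question ...

# Workaround #1: rename the reserved '_index' column inside the backed-up
# .raw annotation so that write_h5ad no longer raises the ValueError.
if data.raw is not None:
    data.__dict__['_raw'].__dict__['_var'] = (
        data.__dict__['_raw'].__dict__['_var'].rename(columns={'_index': 'features'})
    )

data.write_h5ad('e.h5ad')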

why does this return an empty data frame?

import pandas as pd, networkx as nx, numpy as np, pylab
from google.colab import drive
drive.mount('/content/drive')
connect = pd.DataFrame(data = pd.read_csv('/content/drive/My Drive/Network resources/693137576_T_ONTIME_REPORTING.csv'), columns = ['FL_DATE','ORIGIN','ORIGIN_CITY_NAME','ORIGIN_STATE_ABR','DEST','DEST_CITY_NAME','DEST_STATE_ABR','DISTANCE'])
G = nx.Graph()
connectdata = pd.DataFrame(columns = ['ORIGIN','ORIGIN_CITY_NAME','ORIGIN_ST','DEST','DEST_CITY_NAME','DEST_ST','DISTANCE'])
for i in range(0,607346):
    if G.has_edge(connect.iloc[i,1], connect.iloc[i,5]) == False:
        G.add_edge(connect.iloc[i,1], connect.iloc[i,5])
        connectdata.append({'ORIGIN': connect.iloc[i,1],'ORIGIN_CITY_NAME': connect.iloc[i,2],'ORIGIN_ST': connect.iloc[i,3],'DEST': connect.iloc[i,4],'DEST_CITY_NAME': connect.iloc[i,5],'DEST_ST': connect.iloc[i,6],'DISTANCE': connect.iloc[i,7],'NO_OF_FLIGHTS': 1 }, ignore_index = True)
print(G.number_of_nodes())
pd.set_option('display.max_columns',None)
print(connectdata)
Here I am simply importing a dataframe, creating a graph, and then building a second dataframe. The graph is formed fine, but the dataframe is empty. Any reason why this is happening? Any help is appreciated, thanks.
Usually, when you get an empty DataFrame in your output with only the column/header names, it's because the dimensions of your data are irregular or the number of rows in each column does not match.
I just had to assign the result of append back to the dataframe:
connectdata = connectdata.append({'ORIGIN': connect.iloc[i,1],'ORIGIN_CITY_NAME': connect.iloc[i,2],'ORIGIN_ST': connect.iloc[i,3],'DEST': connect.iloc[i,4],'DEST_CITY_NAME': connect.iloc[i,5],'DEST_ST': connect.iloc[i,6],'DISTANCE': connect.iloc[i,7],'NO_OF_FLIGHTS': 1 }, ignore_index = True)
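As a side note (my own suggestion, not part of the original answer): DataFrame.append returns a new frame rather than modifying connectdata in place, and it has been removed in recent pandas releases, so a sketch that collects plain dicts in a list and builds the frame once, reusing connect and G from the question, avoids both problems:
rows = []
for i in range(len(connect)):
    if not G.has_edge(connect.iloc[i, 1], connect.iloc[i, 5]):
        G.add_edge(connect.iloc[i, 1], connect.iloc[i, 5])
        # Collect plain dicts; building the DataFrame once at the end is much
        # cheaper than growing a DataFrame row by row inside the loop.
        rows.append({'ORIGIN': connect.iloc[i, 1],
                     'ORIGIN_CITY_NAME': connect.iloc[i, 2],
                     'ORIGIN_ST': connect.iloc[i, 3],
                     'DEST': connect.iloc[i, 4],
                     'DEST_CITY_NAME': connect.iloc[i, 5],
                     'DEST_ST': connect.iloc[i, 6],
                     'DISTANCE': connect.iloc[i, 7],
                     'NO_OF_FLIGHTS': 1})
connectdata = pd.DataFrame(rows)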

Pandas Google Distance Matrix API - Pass coordinates into URL

I am working with the Google Distance Matrix API, where I want to feed coordinates from a dataframe into the API and return the duration and distance between the two points.
Here is my dataframe:
import pandas as pd
import simplejson
import urllib
import numpy as np
Record    orig_lat      orig_lng      dest_lat      dest_lng
1         40.7484405    -74.0073127   40.7115242    -74.0145492
2         40.7421218    -73.9878531   40.7727216    -73.9863531
First, I need to combine orig_lat & orig_lng and dest_lat & dest_lng into strings, which I then pass into the URL. So I've tried creating the variables orig_coord & dest_coord, then passing them into the URL and returning the values:
orig_coord = df[['orig_lat','orig_lng']].apply(lambda x: '{},{}'.format(x[0],x[1]), axis=1)
dest_coord = df[['dest_lat','dest_lng']].apply(lambda x: '{},{}'.format(x[0],x[1]), axis=1)
for row in df.itertuples():
    url = "http://maps.googleapis.com/maps/api/distancematrix/json?origins={0}&destinations={1}&units=imperial&MYGOOGLEAPIKEY".format(orig_coord,end_coord)
    result = simplejson.load(urllib.urlopen(url))
    df['driving_time_text'] = result['rows'][0]['elements'][0]['duration']['text']
But I get the following error: "TypeError: <lambda>() got an unexpected keyword argument 'axis'"
So my question is: how do I concatenate values from two columns into a string, then pass that string into a URL and output the result?
Thank you in advance!
Hmm, I am not sure how you constructed your data frame. Maybe post those details? But if you can live with referencing tuple elements positionally, this worked for me:
import pandas as pd
data = [{'orig_lat': 40.748441, 'orig_lng': -74.007313, 'dest_lat': 40.711524, 'dest_lng': -74.014549},
{'orig_lat': 40.742122, 'orig_lng': -73.987853, 'dest_lat': 40.772722, 'dest_lng': -73.986353}]
df = pd.DataFrame(data)
for row in df.itertuples():
    orig_coord = '{},{}'.format(row[1], row[2])
    dest_coord = '{},{}'.format(row[3], row[4])
    url = "http://maps.googleapis.com/maps/api/distancematrix/json?origins={0}&destinations={1}&units=imperial&MYGOOGLEAPIKEY".format(orig_coord, dest_coord)
    print(url)
produces
http://maps.googleapis.com/maps/api/distancematrix/json?origins=40.748441,-74.007313&destinations=40.711524,-74.014549&units=imperial&MYGOOGLEAPIKEY
http://maps.googleapis.com/maps/api/distancematrix/json?origins=40.742122,-73.987853&destinations=40.772722,-73.986353&units=imperial&MYGOOGLEAPIKEY
To update the data frame with the result, since row is a tuple and not writeable, you might want to keep track of the current index as you iterate. Maybe something like this:
data = [{'orig_lat': 40.748441, 'orig_lng': -74.007313, 'dest_lat': 40.711524, 'dest_lng': -74.014549, 'result': -1},
{'orig_lat': 40.742122, 'orig_lng': -73.987853, 'dest_lat': 40.772722, 'dest_lng': -73.986353, 'result': -1}]
df = pd.DataFrame(data)
i_row = 0
for row in df.itertuples():
    orig_coord = '{},{}'.format(row[1], row[2])
    dest_coord = '{},{}'.format(row[3], row[4])
    url = "http://maps.googleapis.com/maps/api/distancematrix/json?origins={0}&destinations={1}&units=imperial&MYGOOGLEAPIKEY".format(orig_coord, dest_coord)
    # Do stuff to get your result
    df['result'][i_row] = result
    i_row += 1
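A variant of that loop, as a sketch of my own (not from the answer above): itertuples already carries the row's index as row.Index, so the result can be written back with df.at and no manual counter, and the requests library (an assumption; the question used urllib and simplejson) can build the query string safely:
import requests
import pandas as pd

API_KEY = "MY_GOOGLE_API_KEY"  # placeholder, not a real credential
data = [{'orig_lat': 40.748441, 'orig_lng': -74.007313, 'dest_lat': 40.711524, 'dest_lng': -74.014549},
        {'orig_lat': 40.742122, 'orig_lng': -73.987853, 'dest_lat': 40.772722, 'dest_lng': -73.986353}]
df = pd.DataFrame(data)
df['driving_time_text'] = None

for row in df.itertuples():
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/distancematrix/json",
        params={
            "origins": '{},{}'.format(row.orig_lat, row.orig_lng),
            "destinations": '{},{}'.format(row.dest_lat, row.dest_lng),
            "units": "imperial",
            "key": API_KEY,
        },
    )
    result = resp.json()
    # row.Index is the dataframe index of this row, so .at writes the value
    # back in place without chained indexing or a separate counter.
    df.at[row.Index, 'driving_time_text'] = result['rows'][0]['elements'][0]['duration']['text']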
