I am having issues adding tooltips to my folium.features.GeoJson. I can't get columns from the dataframe to display when I select them.
feature = folium.features.GeoJson(
    df.geometry,
    name='Location',
    style_function=style_function,
    tooltip=folium.GeoJsonTooltip(fields=[df.acquired], aliases=["Time"], labels=True),
)
ax.add_child(feature)
For some reason, when I run the code above it responds with:
Name: acquired, Length: 100, dtype: object is not available in the data. Choose from: ().
I can't seem to link the data to my tooltip.
I have made your code into a MWE by including some data.
There are two key issues with your code:
1) You need to pass properties, not just geometry, to folium.features.GeoJson(). Hence df is passed instead of df.geometry.
2) folium.GeoJsonTooltip() takes a list of property (column) names, not an array of values. Hence ["acquired"] is passed instead of an array of values from a dataframe column.
There is also an implied issue: all dataframe columns need to contain values that can be serialised to JSON. Hence the conversion of acquired to string and the drop().
import geopandas as gpd
import pandas as pd
import shapely.wkt
import io
import folium
df = pd.read_csv(io.StringIO("""ref;lanes;highway;maxspeed;length;name;geometry
A3015;2;primary;40 mph;40.68;Rydon Lane;MULTILINESTRING ((-3.4851169 50.70864409999999, -3.4849879 50.7090007), (-3.4857269 50.70693379999999, -3.4853034 50.7081574), (-3.488620899999999 50.70365289999999, -3.4857269 50.70693379999999), (-3.4853034 50.7081574, -3.4851434 50.70856839999999), (-3.4851434 50.70856839999999, -3.4851169 50.70864409999999))
A379;3;primary;50 mph;177.963;Rydon Lane;MULTILINESTRING ((-3.4763853 50.70886769999999, -3.4786112 50.70811229999999), (-3.4746017 50.70944449999999, -3.4763853 50.70886769999999), (-3.470350900000001 50.71041779999999, -3.471219399999999 50.71028909999998), (-3.465049699999999 50.712158, -3.470350900000001 50.71041779999999), (-3.481215600000001 50.70762499999999, -3.4813909 50.70760109999999), (-3.4934747 50.70059599999998, -3.4930204 50.7007898), (-3.4930204 50.7007898, -3.4930048 50.7008015), (-3.4930048 50.7008015, -3.4919513 50.70168349999999), (-3.4919513 50.70168349999999, -3.49137 50.70213669999998), (-3.49137 50.70213669999998, -3.4911565 50.7023015), (-3.4911565 50.7023015, -3.4909108 50.70246919999999), (-3.4909108 50.70246919999999, -3.4902349 50.70291189999999), (-3.4902349 50.70291189999999, -3.4897693 50.70314579999999), (-3.4805021 50.7077218, -3.4806265 50.70770150000001), (-3.488620899999999 50.70365289999999, -3.4888806 50.70353719999999), (-3.4897693 50.70314579999999, -3.489176800000001 50.70340539999999), (-3.489176800000001 50.70340539999999, -3.4888806 50.70353719999999), (-3.4865751 50.70487679999999, -3.4882604 50.70375799999999), (-3.479841700000001 50.70784459999999, -3.4805021 50.7077218), (-3.4882604 50.70375799999999, -3.488620899999999 50.70365289999999), (-3.4806265 50.70770150000001, -3.481215600000001 50.70762499999999), (-3.4717096 50.71021009999998, -3.4746017 50.70944449999999), (-3.4786112 50.70811229999999, -3.479841700000001 50.70784459999999), (-3.471219399999999 50.71028909999998, -3.4717096 50.71021009999998))"""),
sep=";")
df = gpd.GeoDataFrame(df, geometry=df["geometry"].apply(shapely.wkt.loads), crs="epsg:4326")
df["acquired"] = pd.date_range("8-feb-2022", freq="1H", periods=len(df))
def style_function(x):
    return {"color": "blue", "weight": 3}

ax = folium.Map(
    location=[sum(df.total_bounds[[1, 3]]) / 2, sum(df.total_bounds[[0, 2]]) / 2],
    zoom_start=12,
)

# datetime is not JSON serializable...
df["tt"] = df["acquired"].dt.strftime("%Y-%b-%d %H:%M")

feature = folium.features.GeoJson(
    df.drop(columns="acquired"),
    name='Location',
    style_function=style_function,
    tooltip=folium.GeoJsonTooltip(fields=["tt"], aliases=["Time"], labels=True),
)
ax.add_child(feature)
I have a pandas.core.frame.DataFrame with many attributes. I would like to convert the DF to a GDF and export it as GeoJSON. I have the columns 'geometry.type' and 'geometry.coordinates'; both are pandas.core.series.Series. An example excerpt is below; note that geometry.coordinates contains a list.
geometry.type: MultiLineString
geometry.coordinates: [[[-74.07224, 40.64417], [-74.07012, 40.64506], [-74.06953, 40.64547], [-74.03249, 40.68565], [-74.01335, 40.69824], [-74.0128, 40.69866], [-74.01265, 40.69907], [-74.01296, 40.70048]], [[-74.01296, 40.70048], [-74.01265, 40.69907], [-74.0128, 40.69866], [-74.01335, 40.69824], [-74.03249, 40.68565], [-74.06953, 40.64547], [-74.07012, 40.64506], [-74.07224, 40.64417]]]
I would like to concatenate the two into a proper geometry column in order to export the data as GeoJSON.
Taking your previous question into account as well:
pandas json_normalize() can be used to create a dataframe from the JSON source; this also expands out the nested dicts.
It's then a simple case of selecting out the columns you want as properties (I have renamed them as well) and building the geometry from geometry.coordinates.
import urllib.request, json
import pandas as pd
import geopandas as gpd
import shapely.geometry
with urllib.request.urlopen(
    "https://transit.land/api/v2/rest/routes.geojson?operator_onestop_id=o-9q8y-sfmta&api_key=LsyqCJs5aYI6uyxvUz1d0VQQLYoDYdh4&l&"
) as url:
    data = json.loads(url.read())

df = pd.json_normalize(data["features"])

# use just the attributes that were properties in the input (which is almost GeoJSON)
gdf = gpd.GeoDataFrame(
    data=df.loc[:, [c for c in df.columns if c.startswith("properties.")]].pipe(
        lambda d: d.rename(columns={c: ".".join(c.split(".")[1:]) for c in d.columns})
    ),
    # build geometry from the coordinates
    geometry=df["geometry.coordinates"].apply(shapely.geometry.MultiLineString),
    crs="epsg:4326",
)
gdf
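To complete the export the question asks about, GeoDataFrame.to_json() returns a GeoJSON string and GeoDataFrame.to_file(..., driver="GeoJSON") writes one to disk. A minimal sketch with hypothetical stand-in data (the "route" column and coordinates below are illustrative, not taken from the transit.land response):

```python
import geopandas as gpd
import shapely.geometry

# Hypothetical two-row frame standing in for the gdf built above
gdf = gpd.GeoDataFrame(
    {"route": ["r1", "r2"]},
    geometry=[
        shapely.geometry.MultiLineString([[(-74.07, 40.64), (-74.01, 40.70)]]),
        shapely.geometry.MultiLineString([[(-74.03, 40.68), (-74.00, 40.69)]]),
    ],
    crs="epsg:4326",
)

# to_json() gives a GeoJSON FeatureCollection as a string;
# to_file(..., driver="GeoJSON") would write the same thing to disk
geojson_str = gdf.to_json()
# gdf.to_file("routes.geojson", driver="GeoJSON")
```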
I'm new to the world of plotting in Python. I started learning today, doing a mini project on my own: I tried to scrape some data and plot it. Here's my code:
import requests
import pandas as pd
from pandas import DataFrame
import numpy as np
import bs4
from bs4 import BeautifulSoup
import matplotlib.pyplot as plot
# Getting the HTML page
URL = "https://www.worldometers.info/coronavirus/#countries"
pag_html = requests.get(URL).text
# Extracting data with BeautifulSoup.
soup = BeautifulSoup(pag_html, 'html.parser')
tabla = soup.find("table", id="main_table_countries_today")
datos_tabla = tabla.tbody.find_all("tr")
Lista = []
for x in range(len(datos_tabla)):
    values = [j.string for j in datos_tabla[x].find_all('td')]
    Lista.append(values)
df = pd.DataFrame(Lista).iloc[7: , 1:9]
nombre_columna = ["Pais", "Casos totales", "Nuevos Casos", "Muertes totales", "Nuevas Muertes", "Total Recuperados", "Nuevos Recuperados", "Activos"]
df.columns = nombre_columna
df.plot(x="Pais", y="Casos totales", kind ="barh")
plot.show()
The error it gives me is: "TypeError: no numeric data to plot". I understand that this error occurs because the column "Casos totales" is a string, not a float.
I tried to convert the columns of my DataFrame into floats, but there was no way; I got errors from everywhere.
Does anyone have any idea how I can plot my DataFrame?
Thanks.
After running the script, as you say, the column "Casos totales" is being interpreted as a string due to the commas in the values. You can change this using .str.replace(',','') and then .astype(float), right after renaming the columns in your dataframe:
df['Casos totales'] = df['Casos totales'].str.replace(',','').astype(float)
df.plot(x="Pais", y="Casos totales", kind ="barh")
plot.show()
And this plots the graph (although the visualization is quite poor, that's another story).
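One way to tidy that visualization, sketched on a hypothetical miniature of the scraped table (the country names and values below are made up): after converting the comma-formatted strings as above, keep only the largest few countries and sort them so the longest bar ends up on top.

```python
import pandas as pd

# Hypothetical miniature of the scraped table (values are illustrative)
df = pd.DataFrame({
    "Pais": ["A", "B", "C", "D"],
    "Casos totales": ["1,234", "56,789", "321", "9,876"],
})

# Same conversion as in the answer: strip thousands separators, cast to float
df["Casos totales"] = df["Casos totales"].str.replace(",", "").astype(float)

# Keep the top 3 and sort ascending so the biggest bar is drawn last (on top)
top = df.nlargest(3, "Casos totales").sort_values("Casos totales")

# top.plot(x="Pais", y="Casos totales", kind="barh")  # requires matplotlib
# plot.show()
```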
I pull the data from the census API using the census wrapper; I would like to filter that data with a list of ZIP codes I compiled.
So I am trying to filter the data from a census pull request. I have a CSV file of the ZIP codes I want to use, and I have already put them into a list. I have tried a few things, such as putting the census data in a dataframe and trying to filter the zipcode column by my list, but I don't think my syntax is correct.
This is just the test data I pulled:
census_data = c.acs5.get(('NAME', 'B25034_010E'),
                         {'for': 'zip code tabulation area:*'})
census_pd = census_pd.rename(columns={"NAME": "Name", "zip code tabulation area": "Zipcode"})
censusfilter = census_pd['Zipcode'==ziplst]
So I tried this way, and I also tried a for loop over census_pd['Zipcode'] with an inner loop iterating over the list, using an if statement like zip1 == zip2 to append matches to a list.
My dependencies:
# Dependencies
import pandas as pd
import requests
import json
import pprint
import numpy as np
import matplotlib.pyplot as plt
import requests
from census import Census
import gmaps
from us import states
# Census & gmaps API Keys
from config import (api_key, gkey)
c = Census(api_key, year=2013)
# Configure gmaps
gmaps.configure(api_key=gkey)
As mentioned, I want to filter whatever data I pull from the census down to the specific zipcodes I use.
It's not clear what your data looks like. I am guessing that you have a scalar column and you want to filter that column using a list. If that is the question, then you can use the built-in isin method to filter the dataframe.
import pandas as pd
data = {'col': [2, 3, 4], 'col2': [1, 2, 3], 'col3': ["asd", "ads", "asdf"]}
df = pd.DataFrame.from_dict(data)
random_list = ["asd", "ads"]
df_filtered = df[df["col3"].isin(random_list)]
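For completeness, the same mask can be inverted with ~ to get the rows that are not in the list; a sketch using the same data as above:

```python
import pandas as pd

data = {'col': [2, 3, 4], 'col2': [1, 2, 3], 'col3': ["asd", "ads", "asdf"]}
df = pd.DataFrame.from_dict(data)
random_list = ["asd", "ads"]

# Rows whose col3 value is in the list...
kept = df[df["col3"].isin(random_list)]
# ...and, with ~, the rows whose value is not
dropped = df[~df["col3"].isin(random_list)]
```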
The sample data isn't very clear, so below is how to filter a dataframe on a column using a list of values to filter by
import pandas as pd
from io import StringIO
# Example data
df = pd.read_csv(StringIO(
'''zip,some_column
"01234",A1
"01234",A2
"01235",A3
"01236",B1
'''), dtype = {"zip": str})
zips_list = ["01234", "01235"]
# using a join
zips_df = pd.DataFrame({"zip": zips_list})
df1 = df.merge(zips_df, how='inner', on='zip')
print(df1)
# using query
df2 = df.query('zip in @zips_list')
print(df2)
# using an index
df.set_index("zip", inplace=True)
df3=df.loc[zips_list]
print(df3)
Output in all cases:
zip some_column
0 01234 A1
1 01234 A2
2 01235 A3
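One caveat on the index approach: since pandas 1.0, .loc raises a KeyError if any label in the list is missing from the index, so for ZIP lists that may contain unmatched values a boolean mask over the index is safer. A sketch (data mirrors the example above):

```python
import pandas as pd

df = pd.DataFrame(
    {"some_column": ["A1", "A2", "A3", "B1"]},
    index=pd.Index(["01234", "01234", "01235", "01236"], name="zip"),
)

# "09999" matches nothing: df.loc[["01234", "09999"]] would raise KeyError
# on modern pandas, while the boolean mask silently drops the unmatched label
subset = df[df.index.isin(["01234", "09999"])]
```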
I tried to run a function over multiple dataframes, but I have a problem with it. My main questions are:
1) I tried to run a defined function with zip(df1, df2, df3, ...) so that the outputs are new DF1, DF2, DF3, ...; however, I failed. Is it possible to run a function through multiple dataframes with "zip" so that the outputs are also dataframes?
2) If zip() is not an option, how do I make my function run in a loop? Currently I have just three dataframes and they are easy to handle separately, but I would like to know how to handle 50, 100, or even more dataframes.
Here are my codes:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
#import scipy.stats as ss
# *********** 3 City Temperature files from NOAA ***********
# City 1
df1 = pd.read_csv('https://docs.google.com/spreadsheets/d/1Uj5N363dEVJZ9WVy2a_kkbJKJnyyE5qnEqOfzO0UCQE/gviz/tq?tqx=out:csv')
# City 2
df2 = pd.read_csv('https://docs.google.com/spreadsheets/d/13CgTdDCDzB_3WIYIRVMeLu6E36xzHSzRR5T_Ku0vThA/gviz/tq?tqx=out:csv')
# City 3
df3 = pd.read_csv('https://docs.google.com/spreadsheets/d/17pNZFIaV_NpQfSed-msIGu9jzzqF6JBvCZrBRiU2ZkQ/gviz/tq?tqx=out:csv')
def CleanDATA(data):
    data = data.drop(columns=['Annual'])
    data = data.drop(data.index[29:-1])
    data = data.drop(data.index[-1])
    monthname = []
    Temp = []
    for row in range(0, len(data)):
        for col in range(1, 13):
            # monthname.append(str(col)+"-"+str(data['Year'][row]))
            monthname.append(str(data['Year'][row]) + str(col))
            Temp.append(data.iloc[row, col])
    df0 = pd.DataFrame()
    df0['Month'] = monthname
    df0['Temperature'] = Temp
    df0['Month'] = pd.to_datetime(df0['Month'], format='%Y.0%m')  # change the date form
    df0['Month'] = pd.to_datetime(df0['Month']).dt.date  # remove time, only keep date
    data = df0[df0.applymap(np.isreal).all(1)]  # remove non-numerical
    return data
data1 = CleanDATA(df1)
data2 = CleanDATA(df2)
data3 = CleanDATA(df3)
Also, I found an issue with pandas while reading the following Excel file:
https://drive.google.com/file/d/1V9fKpACbLrSi0NfB0FHSgc96PQerKkUF/view?usp=sharing (this is city 1 temperature data from 1990-2019)
2019 is ongoing; hence, NOAA stations only provide information up to this May. The Excel data labels all missing data with "M". I noticed that once a column contains an "M", I cannot use boxplot directly, even after I drop the 2019 row. The Spyder console says items [Jun to Dec] are missing (and the weird thing is that I can use the same data to plot an XY line plot). To plot the boxplot, I have to manually remove the 2019 information (1 row) in Excel and then read the new file.
I would do it using dictionaries (or lists or other iterable).
cities = {'city1': 'https://...', 'city2': 'https://...', 'city3': 'https://...'}
df = {}
data = {}
for city, url in cities.items():
    df[city] = pd.read_csv(url)
    data[city] = CleanDATA(df[city])
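Once the cleaned frames are in a dict, they also combine into a single long frame, which scales to 50 or 100 cities. A sketch with hypothetical stand-ins for the cleaned data (the months and temperatures are illustrative):

```python
import pandas as pd

# Hypothetical cleaned frames standing in for data1, data2, ...
data = {
    "city1": pd.DataFrame({"Month": ["1990-01", "1990-02"], "Temperature": [1.2, 2.3]}),
    "city2": pd.DataFrame({"Month": ["1990-01", "1990-02"], "Temperature": [5.6, 6.1]}),
}

# Concatenating a dict of frames adds the keys as an outer index level;
# reset_index turns that level into a regular "city" column
combined = pd.concat(data).reset_index(level=0).rename(columns={"level_0": "city"})
```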
I'd like to use pandas for all my analysis along with numpy, but use rpy2 for plotting my data. I want to do all analyses using pandas dataframes and then plot them via the full plotting capabilities of R through rpy2, and I am using IPython to plot. What's the correct way to do this?
Nearly all commands I try fail. For example:
I'm trying to plot a scatter between two columns of a pandas DataFrame df. I'd like the labels of df to be used in x/y axis just like would be used if it were an R dataframe. Is there a way to do this? When I try to do it with r.plot, I get this gibberish plot:
In: r.plot(df.a, df.b) # df is pandas DataFrame
yields:
Out: rpy2.rinterface.NULL
resulting in the plot:
As you can see, the axes labels are messed up and it's not reading the axes labels from the DataFrame like it should (the X axis is column a of df and the Y axis is column b).
If I try to make a histogram with r.hist, it doesn't work at all, yielding the error:
In: r.hist(df.a)
Out:
...
vectors.pyc in <genexpr>((x,))
293 if l < 7:
294 s = '[' + \
--> 295 ', '.join((p_str(x, max_width = math.floor(52 / l)) for x in self[ : 8])) +\
296 ']'
297 else:
vectors.pyc in p_str(x, max_width)
287 res = x
288 else:
--> 289 res = "%s..." % (str(x[ : (max_width - 3)]))
290 return res
291
TypeError: slice indices must be integers or None or have an __index__ method
And resulting in this plot:
Any idea what the error means? And again here, the axes are all messed up and littered with gibberish data.
EDIT: This error occurs only when using ipython. When I run the command from a script, it still produces the problematic plot, but at least runs with no errors. It must be something wrong with calling these commands from ipython.
I also tried to convert the pandas DataFrame df to an R DataFrame as recommended by the poster below, but that fails too with this error:
com.convert_to_r_dataframe(mydf) # mydf is a pandas DataFrame
----> 1 com.convert_to_r_dataframe(mydf)
in convert_to_r_dataframe(df, strings_as_factors)
275 # FIXME: This doesn't handle MultiIndex
276
--> 277 for column in df:
278 value = df[column]
279 value_type = value.dtype.type
TypeError: iteration over non-sequence
How can I get these basic plotting features to work on Pandas DataFrame (with labels of plots read from the labels of the Pandas DataFrame), and also get the conversion between a Pandas DF to an R DF to work?
EDIT2: Here is a complete example of a csv file "test.txt" (http://pastebin.ca/2311928) and my code to answer #dale's comment:
import rpy2
from rpy2.robjects import r
import rpy2.robjects.numpy2ri
import pandas.rpy.common as com
from rpy2.robjects.packages import importr
from rpy2.robjects.lib import grid
from rpy2.robjects.lib import ggplot2
rpy2.robjects.numpy2ri.activate()
from numpy import *
import scipy
# load up pandas df
import pandas
data = pandas.read_table("./test.txt")
# plotting a column fails
print "data.c2: ", data.c2
r.plot(data.c2)
# Conversion and then plotting also fails
r_df = com.convert_to_r_dataframe(data)
r.plot(r_df)
The call to plot the column of "data.c2" fails, even though data.c2 is a column of a pandas df and therefore for all intents and purposes should be a numpy array. I use the activate() call so I thought it would handle this column as a numpy array and plot it.
The second call to plot the dataframe data after conversion to an R dataframe also fails. Why is that? If I load up test.txt from R as a dataframe, I'm able to plot() it and since my dataframe was converted from pandas to R, it seems like it should work here too.
When I do try rmagic in ipython, it does not fire up a plot window for some reason, though it does not error. I.e. if I do:
In [12]: X = np.array([0,1,2,3,4])
In [13]: Y = np.array([3,5,4,6,7])
In [14]: import rpy2
In [15]: from rpy2.robjects import r
In [16]: import rpy2.robjects.numpy2ri
In [17]: import pandas.rpy.common as com
In [18]: from rpy2.robjects.packages import importr
In [19]: from rpy2.robjects.lib import grid
In [20]: from rpy2.robjects.lib import ggplot2
In [21]: rpy2.robjects.numpy2ri.activate()
In [22]: from numpy import *
In [23]: import scipy
In [24]: r.assign("x", X)
Out[24]:
<Array - Python:0x592ad88 / R:0x6110850>
[ 0, 1, 2, 3, 4]
In [25]: r.assign("y", Y)
<Array - Python:0x592f5f0 / R:0x61109b8>
[ 3, 5, 4, 6, 7]
In [27]: %R plot(x,y)
There's no error, but no plot window either. In any case, I'd like to stick to rpy2 and not rely on rmagic if possible.
Thanks.
[note: Your code in "edit 2" is working here (Python 2.7, rpy2-2.3.2, R-2.15.2).]
As #dale mentions, whenever R objects are anonymous (that is, no R symbol exists for the object), R's deparse(substitute()) ends up returning the structure() of the R object. A possible fix is to specify the "xlab" and "ylab" parameters; for some plots you'll also have to specify "main" (the title).
Another way to work around this is to use R's formulas and feed them the data frame (more below, after we work out the conversion part).
Forget about what is in pandas.rpy. It is both broken and seems to ignore features available in rpy2.
An earlier quick fix to conversion with ipython can be turned into a proper conversion rather easily. I am considering adding one to the rpy2 codebase (with more bells and whistles), but in the meantime just add the following snippet after all your imports in your code examples. It will transparently convert pandas' DataFrame objects into rpy2's DataFrame whenever an R call is made.
from collections import OrderedDict

import pandas
import rpy2.robjects
import rpy2.robjects as ro
from rpy2.robjects.vectors import ListVector

py2ri_orig = rpy2.robjects.conversion.py2ri

def conversion_pydataframe(obj):
    if isinstance(obj, pandas.core.frame.DataFrame):
        od = OrderedDict()
        for name, values in obj.iteritems():
            if values.dtype.kind == 'O':
                od[name] = rpy2.robjects.vectors.StrVector(values)
            else:
                od[name] = rpy2.robjects.conversion.py2ri(values)
        return rpy2.robjects.vectors.DataFrame(od)
    elif isinstance(obj, pandas.core.series.Series):
        # converted as a numpy array
        res = py2ri_orig(obj)
        # "index" is equivalent to "names" in R
        if obj.ndim == 1:
            res.names = ListVector({'x': ro.conversion.py2ri(obj.index)})
        else:
            res.dimnames = ListVector(ro.conversion.py2ri(obj.index))
        return res
    else:
        return py2ri_orig(obj)

rpy2.robjects.conversion.py2ri = conversion_pydataframe
Now the following code will "just work":
r.plot(rpy2.robjects.Formula('c3 ~ c2'), data)
# `data` is converted to an rpy2 data.frame on the fly,
# giving a scatter plot of c3 vs c2, with "c2" and "c3"
# as the labels on the x and y axes respectively.
I also note that you are importing ggplot2 without using it. Currently the conversion will have to be explicitly requested. For example:
p = ggplot2.ggplot(rpy2.robjects.conversion.py2ri(data)) +\
ggplot2.geom_histogram(ggplot2.aes_string(x = 'c3'))
p.plot()
You need to pass in the labels explicitly when calling the r.plot function.
r.plot([1,2,3],[1,2,3], xlab="X", ylab="Y")
When you plot in R, it grabs the labels via deparse(substitute(x)) which essentially grabs the variable name from the plot(testX, testY). When you're passing in python objects via rpy2, it's an anonymous R object and akin to the following in R:
> deparse(substitute(c(1,2,3)))
[1] "c(1, 2, 3)"
which is why you're getting the crazy labels.
A lot of times it's saner to use rpy2 to only push data back and forth.
r.assign('testX', df.A)
r.assign('testY', df.B)
%R plot(testX, testY)
rdf = com.convert_to_r_dataframe(df)
r.assign('bob', rdf)
%R plot(bob$A, bob$B)
http://nbviewer.ipython.org/4734581/
Use rpy. The conversion is part of pandas, so you don't need to do it yourself:
http://pandas.pydata.org/pandas-docs/dev/r_interface.html
In [1217]: from pandas import DataFrame
In [1218]: df = DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C':[7,8,9]},
......: index=["one", "two", "three"])
......:
In [1219]: r_dataframe = com.convert_to_r_dataframe(df)
In [1220]: print type(r_dataframe)
<class 'rpy2.robjects.vectors.DataFrame'>