I have very little experience with folium maps.
I need to make a map showing the number of establishments in each department. The problem is that the capital has far more establishments than the interior, so when I create the color layer the capital comes out dark blue and all the other departments get the same lighter shade, which makes the map not very useful.
How could I solve that? I thought of dividing the value by the population, but I would rather keep the original value.
I did not find a way to parameterize the colors in the documentation.
import pandas as pd
import geopandas
import folium

df1 = pd.DataFrame({'code': ['75', '77', '78', '91', '92', '93', '94', '95'],
                    'value': ['13000', '2000', '2500', '2300', '2150', '2600', '1630', '1300']})
dep_geo = geopandas.read_file('./dep.json', driver="JSON")  # geodata taken from https://github.com/gregoiredavid/france-geojson/blob/master/departements.geojson
departments = {'75', '77', '78', '91', '92', '93', '94', '95'}
dep_geo = dep_geo[dep_geo['code'].isin(departments)]
df_map = dep_geo.merge(df1, how="left", left_on=['code'], right_on=['code'])

my_map = folium.Map(location=[48.856614, 2.3522219],
                    zoom_start=9, tiles='cartodbpositron')
folium.Choropleth(
    geo_data=df_map,
    data=df_map,
    columns=['code', 'value'],
    key_on="feature.properties.code",
    fill_color='YlGnBu',
    fill_opacity=0.5,
    line_opacity=0.2,
    legend_name="value",
    smooth_factor=0,
    highlight=True,
    line_color="black",
    name="value",
    show=False,
    overlay=True,
    nan_fill_color="White"
).add_to(my_map)
Result: [map screenshot]
Thank you for your help!
It's as simple as using the vmax argument; I've set it to the 85th percentile.
I also used GeoPandas explore() to generate the folium map.
import geopandas as gpd
import pandas as pd
import folium

df1 = pd.DataFrame(
    {
        "code": ["75", "77", "78", "91", "92", "93", "94", "95"],
        "value": ["13000", "2000", "2500", "2300", "2150", "2600", "1630", "1300"],
    }
)
# dep_geo = geopandas.read_file('./dep.json', driver="JSON")
dep_geo = gpd.read_file(
    "https://github.com/gregoiredavid/france-geojson/raw/master/departements.geojson"
)  # geodata taken from https://github.com/gregoiredavid/france-geojson/blob/master/departements.geojson
departments = {"75", "77", "78", "91", "92", "93", "94", "95"}
dep_geo = dep_geo[dep_geo["code"].isin(departments)]
df_map = dep_geo.merge(df1, how="left", left_on=["code"], right_on=["code"])
df_map["value"] = pd.to_numeric(df_map["value"])
df_map.explore(
    column="value",
    cmap="YlGnBu",
    vmax=df_map["value"].quantile(0.85),
    style_kwds=dict(
        color="rgba(0,0,0,.2)",
    ),
    location=[48.856614, 2.3522219],
    zoom_start=9,
    tiles="cartodbpositron",
)
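If you would rather keep the folium.Choropleth call from the question instead of switching to explore(), a similar capping effect can be had with its bins parameter, which accepts explicit class edges. A minimal sketch, assuming df_map and my_map from the code above with a numeric "value" column; the quantile edges are my choice, not part of the original answer:

# Quantile-based class edges stop the capital from stretching the whole scale:
# most of the colour range is spent below the 85th percentile.
bins = list(df_map["value"].quantile([0, 0.25, 0.5, 0.75, 0.85, 1.0]))

folium.Choropleth(
    geo_data=df_map,
    data=df_map,
    columns=["code", "value"],
    key_on="feature.properties.code",
    fill_color="YlGnBu",
    bins=bins,  # explicit class edges instead of a linear min-max split
    fill_opacity=0.5,
    line_opacity=0.2,
).add_to(my_map)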
I am working on a presidential elections project that involves filtering a choropleth map. My data is at the county level, and I have a dropdown box that lets the user select a state. The counties are colored on a blue-to-red continuous color scale representing the lean from Democrat to Republican. The variable I use for the color scale is the margin between the two parties' vote shares.
If the margin is positive, the county should be colored a shade of blue. If the margin is negative, the county should be colored a shade of red.
However, when I filter to a particular state in which every county voted for one party, the scale takes the lowest margin value in the filtered data and assigns it a color on the blue end of the spectrum, even if that county voted more for the Republican.
Is there a way to fix the color scale when filtering so the counties are colored correctly?
Here is some example code:
import pandas as pd
import dash
import os
from urllib.request import urlopen
import json
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
import plotly.express as px
with urlopen(
    "https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json"
) as response:
    counties = json.load(response)

data = [
    ["Delaware", "Kent County", 10001, 0.467, 0.517, -75.513210, 39.156876],
    ["Delaware", "New Castle County", 10003, 0.322, 0.663, -75.513210, 39.156876],
    ["Delaware", "Sussex County", 10005, 0.559, 0.428, -75.513210, 39.156876],
    ["District of Columbia", "District of Columbia", 11001, 0.0712, 0.913, -77.014468, 38.910270],
    ["Rhode Island", "Bristol County", 44001, 0.2429, 0.7352, -71.41572, 41.65665],
    ["Rhode Island", "Kent County", 44003, 0.45117, 0.5275, -71.41572, 41.65665],
    ["Rhode Island", "Newport County", 44005, 0.3406, 0.6389, -71.41572, 41.65665],
    ["Rhode Island", "Providence County", 44007, 0.3761, 0.605177, -71.41572, 41.65665],
    ["Rhode Island", "Washington County", 44009, 0.392032, 0.5857, -71.41572, 41.65665],
]
data = pd.DataFrame(
    data,
    columns=[
        "State",
        "County",
        "fips_code",
        "perc_gop",
        "perc_dem",
        "lon",
        "lat",
    ],
)
# The geojson feature ids are strings, so cast the fips codes to match:
data["fips_code"] = data["fips_code"].astype(str)
state_choices = data["State"].sort_values().unique()
data['margin_perc'] = data['perc_dem'] - data['perc_gop']
app = dash.Dash(__name__, assets_folder=os.path.join(os.curdir, "assets"))
server = app.server
app.layout = html.Div([
    html.Div([
        dcc.Dropdown(
            id="dropdown1",
            options=[{"label": i, "value": i} for i in state_choices],
            value=state_choices[0],
        )
    ], style={"width": "100%", "display": "inline-block", "text-align": "center"},
    ),
    # State Map with County Choropleth
    html.Div([
        dcc.Graph(id="state_map")],
        style={"width": "100%", "display": "inline-block", "text-align": "center"},
    ),
])
@app.callback(Output("state_map", "figure"), Input("dropdown1", "value"))
def update_figure3(state_select):
    new_df = data[data["State"] == state_select]
    avg_lat = new_df["lat"].mean()
    avg_lon = new_df["lon"].mean()
    fig = px.choropleth_mapbox(
        new_df,
        geojson=counties,
        locations="fips_code",
        color="margin_perc",
        color_continuous_scale="balance",
        mapbox_style="carto-positron",
        zoom=6,
        center={"lat": avg_lat, "lon": avg_lon},
        opacity=0.5,
        labels={
            "State": "State",
            "County": "County",
            "perc_gop": "% Republican",
            "perc_dem": "% Democratic",
            "margin_perc": "% Margin",
        },
        hover_data={
            "fips_code": False,
            "State": True,
            "County": True,
            "perc_gop": True,
            "perc_dem": True,
        },
    )
    fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
    return fig

app.run_server(host="0.0.0.0", port="8051")
Figured it out --> needed to read the documentation more carefully :/
The color_continuous_midpoint argument came in handy. I just calculated the midpoint of the color variable over the entire dataset and used that as a fixed midpoint for the scale, so the filtered view no longer re-centers the colors around its own range.
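For reference, this is roughly what the fix looks like in the px.choropleth_mapbox call inside update_figure3. Computing the midpoint as the center of the full dataset's range is my reading of the answer; zero would be another natural anchor for a Democrat-vs-Republican margin:

# Computed once from the full, unfiltered dataset, outside the callback
# (margin_perc is positive for Dem-leaning counties, negative for GOP).
fixed_midpoint = (data["margin_perc"].min() + data["margin_perc"].max()) / 2

fig = px.choropleth_mapbox(
    new_df,
    geojson=counties,
    locations="fips_code",
    color="margin_perc",
    color_continuous_scale="balance",
    color_continuous_midpoint=fixed_midpoint,  # pin the scale's center across filters
    mapbox_style="carto-positron",
    zoom=6,
    center={"lat": avg_lat, "lon": avg_lon},
    opacity=0.5,
)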
import pandas as pd
from pandas_datareader import data as wb
tickers = ["MMM", "ABT", "ABBV", "ABMD", "ACN", "ATVI", "ADBE", "AMD", "AAP", "AES", "AFL", "A", "APD", "AKAM", "ALK", "ALB", "ARE", "ALXN", "ALGN", "ALLE", "LNT", "ALL", "GOOGL", "GOOG", "MO", "AMZN", "AMCR", "AEE", "AAL", "AEP", "AXP", "AIG", "AMT", "AWK", "AMP", "ABC", "AME", "AMGN", "APH", "ADI", "ANSS", "ANTM", "AON", "AOS", "APA", "AAPL", "AMAT", "APTV", "ADM", "ANET", "AJG", "AIZ", "T", "ATO", "ADSK", "ADP", "AZO", "AVB", "AVY", "BKR", "BLL", "BAC", "BK", "BAX", "BDX", "BBY", "BIO", "BIIB", "BLK", "BA", "BKNG", "BWA", "BXP", "BSX", "BMY", "AVGO", "BR", "CHRW", "COG", "CDNS", "CZR", "CPB", "COF", "CAH", "KMX", "CCL", "CARR", "CTLT", "CAT", "CBOE", "CBRE", "CDW", "CE", "CNC", "CNP", "CERN", "CF", "SCHW", "CHTR", "CVX", "CMG", "CB", "CHD", "CI", "CINF", "CTAS", "CSCO", "C", "CFG", "CTXS", "CLX", "CME", "CMS", "KO", "CTSH", "CL", "CMCSA", "CMA", "CAG", "COP", "ED", "STZ", "COO", "CPRT", "GLW", "CTVA", "COST", "CCI", "CSX", "CMI", "CVS", "DHI", "DHR", "DRI", "DVA", "DE", "DAL", "XRAY", "DVN", "DXCM", "FANG", "DLR", "DFS", "DISCA", "DISCK", "DISH", "DG", "DLTR", "D", "DPZ", "DOV", "DOW", "DTE", "DUK", "DRE", "DD", "DXC", "EMN", "ETN", "EBAY", "ECL", "EIX", "EW", "EA", "EMR", "ENPH", "ETR", "EOG", "EFX", "EQIX", "EQR", "ESS", "EL", "ETSY", "EVRG", "ES", "RE", "EXC", "EXPE", "EXPD", "EXR", "XOM", "FFIV", "FB", "FAST", "FRT", "FDX", "FIS", "FITB", "FE", "FRC", "FISV", "FLT", "FLIR", "FMC", "F", "FTNT", "FTV", "FBHS", "FOXA", "FOX", "BEN", "FCX", "GPS", "GRMN", "IT", "GNRC", "GD", "GE", "GIS", "GM", "GPC", "GILD", "GL", "GPN", "GS", "GWW", "HAL", "HBI", "HIG", "HAS", "HCA", "PEAK", "HSIC", "HSY", "HES", "HPE", "HLT", "HFC", "HOLX", "HD", "HON", "HRL", "HST", "HWM", "HPQ", "HUM", "HBAN", "HII", "IEX", "IDXX", "INFO", "ITW", "ILMN", "INCY", "IR", "INTC", "ICE", "IBM", "IP", "IPG", "IFF", "INTU", "ISRG", "IVZ", "IPGP", "IQV", "IRM", "JKHY", "J", "JBHT", "SJM", "JNJ", "JCI", "JPM", "JNPR", "KSU", "K", "KEY", "KEYS", "KMB", "KIM", "KMI", "KLAC", "KHC", "KR", "LB", "LHX", "LH", "LRCX", "LW", "LVS", "LEG", "LDOS", "LEN", "LLY", "LNC", "LIN", "LYV", "LKQ", "LMT", "L", "LOW", "LUMN", "LYB", "MTB", "MRO", "MPC", "MKTX", "MAR", "MMC", "MLM", "MAS", "MA", "MKC", "MXIM", "MCD", "MCK", "MDT", "MRK", "MET", "MTD", "MGM", "MCHP", "MU", "MSFT", "MAA", "MHK", "TAP", "MDLZ", "MPWR", "MNST", "MCO", "MS", "MOS", "MSI", "MSCI", "NDAQ", "NTAP", "NFLX", "NWL", "NEM", "NWSA", "NWS", "NEE", "NLSN", "NKE", "NI", "NSC", "NTRS", "NOC", "NLOK", "NCLH", "NOV", "NRG", "NUE", "NVDA", "NVR", "NXPI", "ORLY", "OXY", "ODFL", "OMC", "OKE", "ORCL", "OTIS", "PCAR", "PKG", "PH", "PAYX", "PAYC", "PYPL", "PENN", "PNR", "PBCT", "PEP", "PKI", "PRGO", "PFE", "PM", "PSX", "PNW", "PXD", "PNC", "POOL", "PPG", "PPL", "PFG", "PG", "PGR", "PLD", "PRU", "PEG", "PSA", "PHM", "PVH", "QRVO", "PWR", "QCOM", "DGX", "RL", "RJF", "RTX", "O", "REG", "REGN", "RF", "RSG", "RMD", "RHI", "ROK", "ROL", "ROP", "ROST", "RCL", "SPGI", "CRM", "SBAC", "SLB", "STX", "SEE", "SRE", "NOW", "SHW", "SPG", "SWKS", "SNA", "SO", "LUV", "SWK", "SBUX", "STT", "STE", "SYK", "SIVB", "SYF", "SNPS", "SYY", "TMUS", "TROW", "TTWO", "TPR", "TGT", "TEL", "TDY", "TFX", "TER", "TSLA", "TXN", "TXT", "TMO", "TJX", "TSCO", "TT", "TDG", "TRV", "TRMB", "TFC", "TWTR", "TYL", "TSN", "UDR", "ULTA", "USB", "UAA", "UA", "UNP", "UAL", "UNH", "UPS", "URI", "UHS", "UNM", "VLO", "VAR", "VTR", "VRSN", "VRSK", "VZ", "VRTX", "VFC", "VIAC", "VTRS", "V", "VNO", "VMC", "WRB", "WAB", "WMT", "WBA", "DIS", "WM", "WAT", "WEC", "WFC", "WELL", "WST", "WDC", "WU", "WRK", "WY", "WHR", "WMB", "WLTW", "WYNN", 
"XEL", "XLNX", "XYL", "YUM", "ZBRA", "ZBH", "ZION", "ZTS"]
financial_data = pd.DataFrame()
for t in tickers:
    financial_data[t] = wb.DataReader(t, data_source='yahoo', start='1995-1-1')["Adj Close"]
financial_data.to_excel("Financial Data.xlsx")
I am using DataReader to gather some stock info. I am grabbing a lot of data (from 1995 to 2021) and then exporting it to Excel. I was wondering if there is a way, say tomorrow, to speed up the update: instead of running the whole script from top to bottom, my goal tomorrow would just be to add a single new row to the Excel file. If I simply re-execute the script, it overwrites the Excel file and re-downloads everything plus the one new row. That seems pretty inefficient, and I was wondering if there is a way to tell the script I am only looking for tomorrow's data, instead of telling it to grab everything again starting from 1995.
Thanks.
I don't know exactly how pandas works internally, but I would say the loading itself is lazy and fast and not very computationally expensive; the costly part is downloading and operating on all the loaded data. So in your case, if the data is ordered by date in increasing order, it should be enough to have a variable called timestamp_toStart, initialized to '1995-1-1' the first time, and updated after each run to the last date read. You could save this value in a file and reload it every time you rerun the script.
financial_data = pd.DataFrame()
# Load timestamp_toStart from the file here ('1995-1-1' on the very first run)
for t in tickers:
    financial_data[t] = wb.DataReader(t, data_source='yahoo', start=timestamp_toStart)["Adj Close"]
timestamp_toStart = financial_data.index.max()  # last date actually downloaded
# Save timestamp_toStart to the file here
financial_data.to_excel("Financial Data.xlsx")  # holds only the new rows; append them to the existing file
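To make the "save it in a file and reread it" part concrete, here is a minimal sketch; the last_run.txt file name and the read-concat-rewrite handling of the Excel file are my assumptions, not part of the original answer, and tickers is the list from the question:

import os
import pandas as pd
from pandas_datareader import data as wb

STATE_FILE = "last_run.txt"          # hypothetical file holding the last downloaded date
EXCEL_FILE = "Financial Data.xlsx"

# Resume from the saved date, or from 1995 on the first run.
if os.path.exists(STATE_FILE):
    with open(STATE_FILE) as f:
        timestamp_toStart = f.read().strip()
else:
    timestamp_toStart = "1995-1-1"

new_data = pd.DataFrame()
for t in tickers:
    new_data[t] = wb.DataReader(t, data_source="yahoo", start=timestamp_toStart)["Adj Close"]

# Append the new rows to what is already on disk instead of overwriting history.
if os.path.exists(EXCEL_FILE):
    old_data = pd.read_excel(EXCEL_FILE, index_col=0)
    combined = pd.concat([old_data, new_data])
    combined = combined[~combined.index.duplicated(keep="last")]
else:
    combined = new_data
combined.to_excel(EXCEL_FILE)

# Remember where to resume next time.
with open(STATE_FILE, "w") as f:
    f.write(str(combined.index.max().date()))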