Folium || Highlight specific countries based on some data? - python

Problem: I would like to highlight specific countries based on some data I have. As an example, I have a list of shows and the countries where they are licensed, and I would like to highlight those countries when a show is selected or searched. (Selecting and searching come later in the program; right now I just want to be able to highlight specific countries.)
I have been following the Folium Quickstart page at https://python-visualization.github.io/folium/quickstart.html, specifically the GeoJSON and TopoJSON section. This is the code I have right now, and it highlights every country on the map.
import folium
import pandas as pd

# Load the show data into a pandas DataFrame
show_data = pd.read_csv('input files/Show Licensing.csv')
show_data['Contract Expiration'] = pd.to_datetime(show_data['Contract Expiration'])
# Load the country polygons and names
with open("input files/countries.geojson", "r", encoding="utf-8-sig") as f:
    country_geo = f.read()
folium_map = folium.Map(location=[40.738, -73.98],
                        tiles="CartoDB positron",
                        zoom_start=5)
folium.GeoJson(country_geo).add_to(folium_map)
folium_map.save("my_map.html")
Expected Results: For right now I would like to highlight all countries found in my csv file. End goal is to be able to search a show and highlight countries where the show is licensed.

This is the code I wrote which answered my question:
for country in countriesAndContinents_json['features']:
    if country['properties']['Name'].lower() == h_country.lower():
        if highlightFlag == 'License A':
            return folium.GeoJson(
                country,
                name=(showTitle + ' License A ' + h_country),
                style_function=styleLicenseA_function,
                highlight_function=highlightLicenseA_function
            )
Here 'country', which is passed to folium.GeoJson as its geo_data, is the GeoJSON feature for one specific country. When the searched country is found in the countries.geojson data, the function returns a GeoJson layer built from that country's feature, including the geometry needed to highlight it.
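For the "highlight every country in my csv" case, the same idea can be applied to a whole list of names at once. The following is only a minimal sketch in plain Python: select_features is a hypothetical helper, the sample collection is made up, and the "Name" property key is taken from the answer's code and may differ in other GeoJSON files.

```python
def select_features(feature_collection, wanted_names, name_key="Name"):
    """Return a new FeatureCollection holding only the wanted countries."""
    # Compare case-insensitively, as the answer's code does with .lower()
    wanted = {name.strip().lower() for name in wanted_names}
    kept = [
        feature
        for feature in feature_collection["features"]
        if str(feature["properties"].get(name_key, "")).strip().lower() in wanted
    ]
    return {"type": "FeatureCollection", "features": kept}


# Tiny made-up collection standing in for the parsed countries.geojson
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"Name": "Canada"}, "geometry": None},
        {"type": "Feature", "properties": {"Name": "France"}, "geometry": None},
        {"type": "Feature", "properties": {"Name": "Japan"}, "geometry": None},
    ],
}

licensed = select_features(sample, ["canada", "FRANCE"])
```

The result is an ordinary GeoJSON dict, so it could presumably be handed to folium.GeoJson(licensed).add_to(folium_map) in place of the full file, with the list of names coming from a column of the show DataFrame.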

Related

Sorting keywords from a dataframe

I need to write a geo-classifier that assigns a region to each row. That is, if the search query contains the name of a city in a region, the name of that region is written to the 'region' column. If the search query does not contain any city name, the value should be 'undefined'.
I have the following code, which doesn't work:
import pandas as pd
data_location = pd.read_csv(r'\Users\super\Desktop\keywords.csv', sep = ',')
def sorting(row):
    keyword_set = row['keywords'].lower()
    for region, city_list in geo_data.items():
        for town in keyword_set:
            if town in city_list:
                return region
    return 'undefined'
The rules for assigning the regions Center, North-West and Far East:
geo_location = {
'Центр': ['москва', 'тула', 'ярославль'],
'Северо-Запад': ['петербург', 'псков', 'мурманск'],
'Дальний Восток': ['владивосток', 'сахалин', 'хабаровск']
}
Link to the csv file that is used in the program https://dropmefiles.com/IurAn
I tried to do the sorting with this function, but it doesn't work. I also had the idea of creating a template of all existing cities and running each line of the file through it.
I apologize in advance for such a long question. I'm still new to this field and just learning, so I will be glad to receive any tips and help.
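Two things stand out in the function above: it reads geo_data although the dictionary is named geo_location, and looping over a string yields single characters, which can never equal a city name. A minimal sketch of one possible fix is a substring check against the lowercased query. The name classify is hypothetical and the sample queries are made up.

```python
geo_location = {
    'Центр': ['москва', 'тула', 'ярославль'],
    'Северо-Запад': ['петербург', 'псков', 'мурманск'],
    'Дальний Восток': ['владивосток', 'сахалин', 'хабаровск'],
}


def classify(query, rules=geo_location):
    """Return the first region whose city name occurs in the query, else 'undefined'."""
    text = query.lower()
    for region, cities in rules.items():
        # Substring test, so inflected forms such as 'петербурге' still match
        if any(city in text for city in cities):
            return region
    return 'undefined'


# Made-up queries for illustration
examples = ['купить билет Москва', 'отели в санкт-петербурге', 'погода на завтра']
regions = [classify(q) for q in examples]
```

With the DataFrame from the question, this could be applied as data_location['region'] = data_location['keywords'].apply(classify), assuming the column really is called 'keywords'.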

How to clean up messy "Country" attribute from biopython pubmed extracts?

I have extracted ~60,000 PubMed abstracts into a data frame using Biopython. The attributes include "Authors", "Title", "Year", "Journal", "Country", and "Abstract".
The attribute "Country" is very messy, with a mixture of countries, cities, names, addresses, free-text items (e.g., "freelance journalist with interest in Norwegian science"), faculties, etc.
I want to clean up the column only to contain the country - and "NA" for those records that are missing the entry, or have a free-text item that does not make sense.
Currently, my clean-up process of this column is very cumbersome:
pub = df['Country']
chicago = pub.str.contains('Chicago')
df['Country'] = np.where(chicago, 'USA', pub.str.replace('-', ' '))
au = pub.str.contains('#edu.au')
df['Country'] = np.where(au, 'Australia', pub.str.replace('-', ' '))
... and so on
Are you aware of some python libraries, or have some ideas for a more automated way of cleaning up this column?
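Short of a dedicated library, one way to make the process less cumbersome is to keep all the rules in a single table and apply them in one pass. This is only a sketch: the patterns in COUNTRY_HINTS are hypothetical examples, not a complete rule set, and guess_country is an invented name.

```python
import re

# Hypothetical hint table: the first matching pattern decides the country
COUNTRY_HINTS = [
    (re.compile(r"\bchicago\b", re.IGNORECASE), "USA"),
    (re.compile(r"\.edu\.au\b", re.IGNORECASE), "Australia"),
    (re.compile(r"\bnorway\b", re.IGNORECASE), "Norway"),
]


def guess_country(raw):
    """Map one messy 'Country' value to a country name, or 'NA' if nothing matches."""
    if not isinstance(raw, str):
        return "NA"  # missing entries (None, NaN)
    text = raw.replace("-", " ")
    for pattern, country in COUNTRY_HINTS:
        if pattern.search(text):
            return country
    return "NA"


cleaned = [
    guess_country("Dept of Medicine, Chicago, IL"),
    guess_country("j.smith@sydney.edu.au"),
    guess_country("freelance journalist with interest in Norwegian science"),
    guess_country(None),
]
```

On the DataFrame this would be df['Country'] = df['Country'].map(guess_country). Unlike the repeated np.where calls, each of which starts again from the original column, a single pass keeps every earlier match.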

Folium legend threshold throws error when changing values, grey color

I am using the Folium package to build a choropleth map with Python. The data displayed is pulled from an API that tracks the most recent Covid-19 infection figures per country. The column shared between the countries.geojson file (a GeoJSON file of a world map) and the data I pulled is the country name. Most countries are shaded with color successfully, but where the names are not identical the country is shaded grey. For example, "US" in the pandas DataFrame and "United States of America" in the .geojson file don't match, so that country's data isn't displayed on the map.
covid_data = requests.get('https://covid2019-api.herokuapp.com/v2/current')
covid_data = covid_data.json()
covid_data = pd.DataFrame.from_dict(covid_data['data'])
location  confirmed  deaths  recovered  active
US           636350   28326      52096  555928
Spain        177644   18708      70853   88083
I store the API data in a pandas DataFrame because that works best with Folium. My hacky way of fixing the country names that aren't identical is this code:
covid_data.location[covid_data.location=='US'] = 'United States of America'
After this, the country name is the same in both the .geojson file and the DataFrame:
location                  confirmed  deaths  recovered  active
United States of America     636350   28326      52096  555928
Spain                        177644   18708      70853   88083
is now the same as
{ "type": "Feature", "properties": { "ADMIN": "United States of America", "ISO_A3": "USA"}, "geometry": {}} (countries.geojson)
Before I edited the DataFrame the map rendered, but once US is changed to United States of America it throws an error:
return color_range[color_idx], fill_opacity
IndexError: list index out of range
So that means that I'm setting the top of the Choropleth threshold_scale to 636,350 (the highest number in the 'confirmed' column), but there's no data to match that number to. If I then lower the top of threshold_scale to the next highest number, 177,644 (which is Spain), I get the error
ValueError: All values are expected to fall into one of the provided bins (or to be Nan). Please check the bins parameter and/or your data.
Here's the rest of the code, to help resolve this issue:
# this variable holds the highest value of the rates, used as the max threshold for coloring
covid_data_max = covid_data['confirmed'].max()
covid_data_max = covid_data_max.item()
world_geo = r'countries.geojson'
world_map = folium.Map(location=[4.68, 8.33],
                       tiles='Mapbox Bright', zoom_start=3)
world_map = folium.Choropleth(
    geo_data=world_geo,
    name='choropleth',
    data=covid_data,
    columns=['location', 'confirmed'],
    key_on='properties.ADMIN',
    threshold_scale=[0, int(covid_data_max / 15), int(covid_data_max / 10), int(covid_data_max / 4), covid_data_max],
    fill_color='BuPu',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Number of deaths per country',
    highlight=True,
    line_color='black'
).add_to(world_map)
folium.LayerControl().add_to(world_map)
world_map.save(r'./templates/map.html')
You can see the image of the map (for some reason the threshold includes the USA #'s):
Let me know if there's anything else I can provide!
As suggested by the error:
IndexError: list index out of range
the problem is in:
threshold_scale = [0,int((covid_data_max/15)),int((covid_data_max/10)),int((covid_data_max/4)),covid_data_max],
You can simply add 1 to covid_data_max:
threshold_scale = [0,int((covid_data_max/15)),int((covid_data_max/10)),int((covid_data_max/4)),covid_data_max+1],
and you get your map:
Please note that the threshold_scale parameter is now deprecated in favor of the bins parameter.
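One plausible reading of why adding 1 helps is that the largest value otherwise sits exactly on the top edge of the scale and has no color bucket of its own. The sketch below only illustrates that arithmetic in plain Python. make_bins is a hypothetical helper, and the two values are the rows shown in the question.

```python
def make_bins(max_value):
    """Bin edges in the question's proportions, with the top edge nudged past the maximum."""
    return [
        0,
        int(max_value / 15),
        int(max_value / 10),
        int(max_value / 4),
        max_value + 1,  # strictly above every data value
    ]


confirmed = [636350, 177644]  # the two rows shown in the question
bins = make_bins(max(confirmed))

# Every value now lies in the half-open range [lowest edge, top edge)
all_inside = all(bins[0] <= value < bins[-1] for value in confirmed)
```

The resulting list would be passed as threshold_scale, or as bins in newer Folium versions.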
So I was able to display the color by editing the JSON response before it was converted to a pandas DataFrame. I would still love an explanation of why editing the DataFrame directly doesn't work. Maybe it has something to do with the warning
A value is trying to be set on a copy of a slice from a DataFrame
Therefore I wrote a for loop that finds the specific country and changes its name to "United States of America" to match the .geojson entry:
for data, values in covid_data.items():
    if data == 'data':
        for country in values:
            if country['location'] == 'US':
                country['location'] = 'United States of America'
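If more names turn out to differ, the loop above generalizes to a lookup table. A minimal sketch, assuming the payload keeps the {'data': [...]} shape used in the question; ALIASES and rename_locations are hypothetical names, and the sample payload contains only the two rows quoted earlier.

```python
# Hypothetical alias table: API name -> name used in countries.geojson
ALIASES = {"US": "United States of America"}


def rename_locations(payload, aliases=ALIASES):
    """Rewrite 'location' values in the API payload in place and return the payload."""
    for record in payload.get("data", []):
        # Names without an alias are left unchanged
        record["location"] = aliases.get(record["location"], record["location"])
    return payload


payload = {
    "data": [
        {"location": "US", "confirmed": 636350},
        {"location": "Spain", "confirmed": 177644},
    ]
}
rename_locations(payload)
```

On the DataFrame side, the quoted message is the warning pandas emits for chained assignment such as covid_data.location[mask] = value. The pattern pandas documents for this is a single .loc call, for example covid_data.loc[covid_data.location == 'US', 'location'] = 'United States of America'.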

Passing an array of countries to a function

I am fairly new to Python. I am leveraging Python's holidays package which has public holidays by country. I am looking to write a function that loops over any number of countries and returns a dataframe with 3 columns:
Date, Holiday, Country
Based on my limited knowledge, I came up with this sort of implementation:
import holidays
def getholidayDF(*args):
    holidayDF = pd.DataFrame(columns=['Date','Holiday','Country'])
    for country in args:
        holidayDF.append(sorted(holidays.CountryHoliday(country,years=np.arange(2014,2030,1)).items()))
        holidayDF['Country'] = country
        return holidayDF
holidays = getholidayDF('FRA', 'Norway', 'Finland', 'US', 'Germany', 'UnitedKingdom', 'Sweden')
This returns a blank dataframe. I am not sure how to proceed!
If you change your for-loop as shown below, it should work. The most relevant comments were made by user roganjosh. O'Reilly, Wrox, Prentice Hall, Pearson and Packt, to name a few publishers, have some good books for you. Skip the cookbooks for now.
.. code snippet ...
    for country in args:
        holidayDF = holidayDF.append(sorted(holidays.CountryHoliday(country,years=np.arange(2014,2030,1)).items()))
        # holidayDF['Country'] = country # remove this from the for-loop.
    return holidayDF # move out of the for-loop
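Since DataFrame.append was removed in pandas 2.0, an alternative is to collect plain rows first and build the DataFrame once at the end, which also keeps the Country column correct per row. The sketch below shows only the row-building step in plain Python. holiday_rows is a hypothetical helper, and the sample dictionaries are made-up stand-ins for the dict-like objects that holidays.CountryHoliday returns.

```python
from datetime import date


def holiday_rows(calendars):
    """Flatten {country: {date: holiday_name}} into (Date, Holiday, Country) rows."""
    rows = []
    for country, calendar in calendars.items():
        # Sort each country's holidays by date, as the original code does
        for day, name in sorted(calendar.items()):
            rows.append((day, name, country))
    return rows


# Made-up sample standing in for holidays.CountryHoliday(...) objects
sample = {
    "Norway": {
        date(2014, 5, 17): "Constitution Day",
        date(2014, 1, 1): "New Year's Day",
    },
    "Finland": {date(2014, 12, 6): "Independence Day"},
}
rows = holiday_rows(sample)
```

The rows can then be turned into the three-column frame with pd.DataFrame(rows, columns=['Date', 'Holiday', 'Country']).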

I Need Assistance With Data Sorting In Python Code

In my Python code, I would also like the Dakota with Hurricane display appearances to show in the data table when it is run in Jupyter Notebook.
I made the following modification to the code, aiming to achieve this:
(df['Spitfire'].str.contains('S', na=True))
Now the Dakota with Hurricane display booking (in this case Worthing - Display) is shown, as are the Dakota, Spitfire and Hurricane bookings and the Dakota with Spitfire display bookings. But the solo Dakota display bookings are shown too, which I don't want. What do I type so that a row is not displayed when Dakota = 'D', Spitfire = NaN and Hurricane = NaN?
I have almost managed to sort out what I need in my Python code for the 2007 URL; I just need the Dakota with Hurricane bookings issue sorting out. Here is my code, containing the relevant URL:
import pandas as pd
import requests
from bs4 import BeautifulSoup
res = requests.get("http://web.archive.org/web/20070701133815/http://www.bbmf.co.uk/june07.html")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0]
df = pd.read_html(str(table))
df = df[1]
df = df.rename(columns=df.iloc[0])
df = df.iloc[2:]
df.head(15)
display = df[(df['Location'].str.contains('- Display')) & (df['Dakota'].str.contains('D')) & (df['Spitfire'].str.contains('S', na=True)) & (df['Lancaster'] != 'L')]
display
Any help would be much appreciated.
Regards
Eddie
You could query your display variable to refine the data:
display = display[~((display['Dakota'] == 'D') & (display["Spitfire"].isnull() & (display['Hurricane'].isnull())))]
where ~ negates the condition, so that rows matching it are excluded from the DataFrame.
You can also include this in your original query on df.
