Colour scatter plot by column Plotly

Colour scatter plot by column Plotly - python

I would like to create a scatter plot with 3 variables: Age, Value and City. How can I colour the plot by City?
Current output is a simple scatter plot of Value against Age:
Current Code:
import datetime
import plotly.offline as py
import plotly
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Scatter(x= data1['Age'], y = data1['Value'], mode='markers', name='lines+markers'))
fig.show()
Update:
Tried:
import plotly.express as px
fig = px.scatter(data1, x=data1['Age'], y=data1['Value'], color=data1['City'])
fig.show()
and caught error:
KeyError: (nan, '', '', '', '')
Update:
Age and Value have been cleaned. Here are some unique values for City(sorry to change the column). There are some messy figures.
['NT', 'WAIKATO', 'VICTORIA', 'South Australia', 'OTHER', 'ON',
'Nsw', 'IL', 'MD - MARYLAND', 'ABU DHABI', 'VIENNA', 'TX',
'VILKAVISKIS', 'NY', 'BALEARES', 'UK', 'GLOUCESTERSHIRE',
'LA MANCHE', 'TEXAS', 'DUBAI', 'ENGLAND', 'ITALY', nan,
'GREATER LONDON', 'BEDFORDSHIRE', 'HEREFORDSHIRE',
'BADEN-WÃ?RTTEMBERG', 'Australian Capital Territory',
'ABERDEENSHIRE', 'OXFORDSHIRE', 'LONDON', 'BC', 'SK',
'NOORD-HOLLAND', 'UNITED KINGDOM', 'New South Wales', 'Brookdale',
'Western Australia', 'GALWAY', 'Queensland', 'TOKYO',
'HAUTE-GARONNE', 'WORCESTERSHIRE', 'CALIFORNIA', 'JAPAN',
'NORTHUMBERLAND', 'NJ - NEW JERSEY', 'GLOS', 'DORSET', 'TENNESSEE',
'BANGKOK', 'CANTERBURY', 'WEXFORD', 'MIDDLESEX', 'SURREY', 'MI',
'NEVADA', 'KENTUCKY', 'NEW YORK', 'ZUID-HOLLAND', 'HONG KONG',
'ESSEX', 'FL', 'LILLEHAMMER', 'DEVON', 'NEW TERRITORIES', 'KENT',
'THAILAND', 'Pyrmont', 'SINGAPORE', 'FRIBOURG', 'CAIRO',
'QUEENSLAND', 'HAMPSHIRE', 'NEW JERSEY', 'WEST MIDLANDS',
'MICHIGAN', 'NONE', 'WI', 'BARNET', 'STAFFS', 'WARWICKSHIRE'...]

Inside the go.Scatter definition you should specify the color parameter as color=data1['Continent']. See the Plotly documentation for more information.

Related

How to reaggregate a MultiIndex pandas.DataFrame?

I have a MultiIndex pandas.DataFrame dcba with a large level of spatial desagregation:
region FR \
sector Agriculture Crude coal Crude oil Natural gas
region stressor
FR CO2 4.796711 1.382087e-02 3.149139e-05 2.894532
CH4 15.816831 3.744709e-05 3.567591e-04 0.275431
N2O 9.715682 9.290865e-05 5.603963e-07 0.007834
SF6 0.028011 2.818101e-06 2.607044e-08 0.000477
HFC 1.487352 1.473641e-04 1.475096e-06 0.024675
... ... ... ... ...
RoW Middle East CH4 0.455748 3.566337e-05 7.060048e-04 0.035420
N2O 0.193417 1.176733e-06 7.366779e-07 0.002564
SF6 0.001478 7.465562e-08 2.960808e-08 0.000107
HFC 0.006629 3.190865e-07 1.281020e-07 0.000472
PFC 0.001390 1.491053e-07 3.205249e-08 0.000603
region \
sector Extractive industry Biomass_industry Clothing
region stressor
FR CO2 5.817926e-03 9.866832 0.394570
CH4 3.622520e-04 9.923741 0.075845
N2O 1.267742e-04 6.010542 0.027877
SF6 4.797571e-04 0.036355 0.000561
HFC 2.502868e-02 1.894707 0.028297
... ... ... ...
RoW Middle East CH4 1.844972e-04 0.419346 0.193006
N2O 7.236885e-06 0.062690 0.018240
SF6 9.461463e-07 0.001052 0.000477
HFC 4.114220e-06 0.004652 0.002087
PFC 1.314401e-06 0.002939 0.001726
region ... \
sector Heavy_industry Construction Automobile ...
region stressor ...
FR CO2 13.261457 14.029825 2.479608 ...
CH4 0.632317 2.537475 0.319671 ...
N2O 0.196020 0.968326 0.082451 ...
SF6 0.024654 0.054173 0.003670 ...
HFC 1.641677 2.874809 0.197846 ...
... ... ... ... ...
RoW Middle East CH4 0.677210 0.926126 0.325147 ...
N2O 0.049768 0.034912 0.020158 ...
SF6 0.002112 0.001568 0.000955 ...
HFC 0.009280 0.006824 0.004142 ...
PFC 0.011609 0.006201 0.003916 ...
region RoW Middle East \
sector Heavy_industry Construction Automobile
region stressor
FR CO2 0.580714 0.382980 0.162650
CH4 0.046371 0.114092 0.021962
N2O 0.019406 0.059560 0.007892
SF6 0.001126 0.000872 0.000270
HFC 0.073273 0.049812 0.015326
... ... ... ...
RoW Middle East CH4 2.238149 19.760153 1.079266
N2O 0.222995 2.752258 0.069067
SF6 0.009341 0.162138 0.004313
HFC 0.041137 0.702098 0.018245
PFC 0.057405 0.285898 0.007766
region \
sector Oth transport equipment Machinery Electronics
region stressor
FR CO2 0.116935 0.394273 0.080354
CH4 0.016530 0.048727 0.010756
N2O 0.004032 0.018393 0.004233
SF6 0.000166 0.000665 0.000115
HFC 0.008293 0.036774 0.006075
... ... ... ...
RoW Middle East CH4 0.139413 3.370381 0.650511
N2O 0.009559 0.247341 0.058345
SF6 0.000506 0.013730 0.003265
HFC 0.002176 0.056321 0.013685
PFC 0.001429 0.030418 0.006383
region \
sector Fossil fuels Electricity and heat Transport services
region stressor
FR CO2 0.107540 0.015568 0.058673
CH4 0.018198 0.003783 0.007705
N2O 0.006238 0.001653 0.003543
SF6 0.000204 0.000029 0.000061
HFC 0.010712 0.001534 0.003187
... ... ... ...
RoW Middle East CH4 16.407198 5.020937 2.359744
N2O 0.134513 0.432547 0.510101
SF6 0.009963 0.007036 0.012166
HFC 0.044495 0.031509 0.051611
PFC 0.008458 0.004833 0.006725
region
sector Composite
region stressor
FR CO2 0.801035
CH4 0.311628
N2O 0.150162
SF6 0.001836
HFC 0.094331
... ...
RoW Middle East CH4 119.001176
N2O 8.039872
SF6 0.941479
HFC 3.943134
PFC 0.422255
[294 rows x 833 columns]
the desagregation is defined by the list of the regions.
reg_list = ['FR', 'Austria', 'Belgium', 'Bulgaria', 'Cyprus', 'Czech Republic', 'Germany', 'Denmark', 'Estonia', 'Spain', 'Finland', 'Greece', 'Croatia', 'Hungary', 'Ireland', 'Italy', 'Lithuania', 'Luxembourg', 'Latvia', 'Malta', 'Netherlands', 'Poland', 'Portugal', 'Romania', 'Sweden', 'Slovenia', 'Slovakia', 'United Kingdom', 'United States', 'Japan', 'China', 'Canada', 'South Korea', 'Brazil', 'India', 'Mexico', 'Russia', 'Australia', 'Switzerland', 'Turkey', 'Taiwan', 'Norway', 'Indonesia', 'South Africa', 'RoW Asia and Pacific', 'RoW America', 'RoW Europe', 'RoW Africa', 'RoW Middle East']
sectors_list = ['Agriculture', 'Crude coal', 'Crude oil', 'Natural gas', 'Extractive industry', 'Biomass_industry', 'Clothing', 'Heavy_industry', 'Construction', 'Automobile', 'Oth transport equipment', 'Machinery', 'Electronics', 'Fossil fuels', 'Electricity and heat', 'Transport services', 'Composite']
The Dataframe dcba has the following index and columns :
dcba.index =
MultiIndex([( 'FR', 'CO2'),
( 'FR', 'CH4'),
( 'FR', 'N2O'),
( 'FR', 'SF6'),
( 'FR', 'HFC'),
( 'FR', 'PFC'),
( 'Austria', 'CO2'),
( 'Austria', 'CH4'),
( 'Austria', 'N2O'),
( 'Austria', 'SF6'),
...
( 'RoW Africa', 'N2O'),
( 'RoW Africa', 'SF6'),
( 'RoW Africa', 'HFC'),
( 'RoW Africa', 'PFC'),
('RoW Middle East', 'CO2'),
('RoW Middle East', 'CH4'),
('RoW Middle East', 'N2O'),
('RoW Middle East', 'SF6'),
('RoW Middle East', 'HFC'),
('RoW Middle East', 'PFC')],
names=['region', 'stressor'], length=294)
dcba.columns =
MultiIndex([( 'FR', 'Agriculture'),
( 'FR', 'Crude coal'),
( 'FR', 'Crude oil'),
( 'FR', 'Natural gas'),
( 'FR', 'Extractive industry'),
( 'FR', 'Biomass_industry'),
( 'FR', 'Clothing'),
( 'FR', 'Heavy_industry'),
( 'FR', 'Construction'),
( 'FR', 'Automobile'),
...
('RoW Middle East', 'Heavy_industry'),
('RoW Middle East', 'Construction'),
('RoW Middle East', 'Automobile'),
('RoW Middle East', 'Oth transport equipment'),
('RoW Middle East', 'Machinery'),
('RoW Middle East', 'Electronics'),
('RoW Middle East', 'Fossil fuels'),
('RoW Middle East', 'Electricity and heat'),
('RoW Middle East', 'Transport services'),
('RoW Middle East', 'Composite')],
names=['region', 'sector'], length=833)
And I would like to reaggreagte this DataFrame at a different level by grouping the regions diferently, defined here :
dict_reag =
{'United Kingdom': ['United Kingdom'],
'United States': ['United States'],
'Asia and Row Europe': ['Japan',
'India',
'Russia',
'Indonesia',
'RoW Europe'],
'Chinafrica': ['China', 'RoW Africa'],
'Turkey and RoW America': ['Canada', 'Turkey', 'RoW America'],
'Pacific and RoW Middle East': ['South Korea',
'Australia',
'Taiwan',
'RoW Middle East'],
'Brazil, Mexico and South Africa': ['Brazil', 'Mexico', 'South Africa'],
'Switzerland and Norway': ['Switzerland', 'Norway'],
'RoW Asia and Pacific': ['RoW Asia and Pacific'],
'EU': ['Austria',
'Belgium',
'Bulgaria',
'Cyprus',
'Czech Republic',
'Germany',
'Denmark',
'Estonia',
'Spain',
'Finland',
'Greece',
'Croatia',
'Hungary',
'Ireland',
'Italy',
'Lithuania',
'Luxembourg',
'Latvia',
'Malta',
'Netherlands',
'Poland',
'Portugal',
'Romania',
'Sweden',
'Slovenia',
'Slovakia'],
'FR': ['FR']}
The reaggregation process would transform this 294x833 DataFrame into a 66x187 DataFrame. Note that the new reaggreagation DataFrame corresponds to a sum of the first set of subregions.
I created an empty DataFrame with the correct new level of aggregation :
ghg_list = ['CO2', 'CH4', 'N2O', 'SF6', 'HFC', 'PFC']
multi_reg = []
multi_sec = []
for reg in list(reag_matrix.columns[2:]) :
for sec in sectors_list :
multi_reg.append(reg)
multi_sec.append(sec)
arrays = [multi_reg, multi_sec]
new_col = pd.MultiIndex.from_arrays(arrays, names=('region', 'sector'))
multi_reg2 = []
multi_ghg = []
for reg in list(reag_matrix.columns[2:]) :
for ghg in ghg_list :
multi_reg2.append(reg)
multi_ghg.append(ghg)
arrays2 = [multi_reg2, multi_ghg]
new_index = pd.MultiIndex.from_arrays(arrays2, names=('region', 'stressor'))
new_dcba = pd.DataFrame(np.zeros((len(ghg_list)*len(list(reag_matrix.columns[2:])),len(sectors_list)*len(list(reag_matrix.columns[2:])))),
index =new_index,columns = new_col)
where reag_matrix.columns[2:] corresponds to the new list of regions, as defined in dict_reag :
list(reag_matrix.columns[2:]) = ['FR', 'United Kingdom', 'United States', 'Asia and Row Europe', 'Chinafrica', 'Turkey and RoW America', 'Pacific and RoW Middle East', 'Brazil, Mexico and South Africa', 'Switzerland and Norway', 'RoW Asia and Pacific', 'EU']
I guess I could use the groupby function but I could not make it work without losing the sector desaggregation.
Otherwise, I intended to do it iteratively, but I have errors I do not understand. I first tried to copy the French block which will stay the same :
s1 = dcba.loc['FR','FR'].copy()
new_dcba.loc['FR','FR'] = s1
But this last line raises the error "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()" although it does not seems to contain any boolean. What is my problem here ?
Also, to avoid this, I tried to use :
new_dcba.loc['FR','FR'].update(s1, overwrite=True)
But it does not change the values in new_dcba.
Finally, I tried to use .values but then a new error is raised :
new_dcba.loc['FR','FR'] = s1.values
"Must have equal len keys and value when setting with an ndarray"
So, I have two question :
Can you guess of a way to use groupby (and .sum()) for this ?
What is the issue raising the first error ?
Note that I have gone through https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy and could not find any explanation for my specific problem.
I wrote an MCVE here, and it happens to work at a reducted size (what is commented), but not at the actual size of the problem :
list_reg_mcve = reg_list #['FR', 'United States', 'United Kingdom']
list_reg_mcve_new = list_reg_reag_new #['FR','Other']
sectors_list_mcve = sectors_list #['Agriculture','Composite']
dict_mcve = dict_reag #{'FR':['FR'],'Other' : ['United States', 'United Kingdom']}
ghg_list_mcve = ['CO2', 'CH4', 'N2O', 'SF6', 'HFC', 'PFC'] #['CO2','CH4']
multi_reg = []
multi_sec = []
for reg in list_reg_mcve :
for sec in sectors_list_mcve :
multi_reg.append(reg)
multi_sec.append(sec)
arrays = [multi_reg, multi_sec]
new_col = pd.MultiIndex.from_arrays(arrays, names=('region', 'sector'))
multi_reg2 = []
multi_ghg = []
for reg in list_reg_mcve :
for ghg in ghg_list_mcve :
multi_reg2.append(reg)
multi_ghg.append(ghg)
arrays2 = [multi_reg2, multi_ghg]
new_index = pd.MultiIndex.from_arrays(arrays2, names=('region', 'stressor'))
dcba_mcve = pd.DataFrame(np.zeros((len(ghg_list_mcve)*len(list_reg_mcve),
len(sectors_list_mcve)*len(list_reg_mcve))),
index =new_index,columns = new_col)
multi_reg = []
multi_sec = []
for reg in list_reg_mcve_new :
for sec in sectors_list_mcve :
multi_reg.append(reg)
multi_sec.append(sec)
arrays = [multi_reg, multi_sec]
new_col = pd.MultiIndex.from_arrays(arrays, names=('region', 'sector'))
multi_reg2 = []
multi_ghg = []
for reg in list_reg_mcve_new :
for ghg in ghg_list_mcve :
multi_reg2.append(reg)
multi_ghg.append(ghg)
arrays2 = [multi_reg2, multi_ghg]
new_index = pd.MultiIndex.from_arrays(arrays2, names=('region', 'stressor'))
dcba_mcve_new = pd.DataFrame(np.zeros((len(ghg_list_mcve)*len(list_reg_mcve_new),
len(sectors_list_mcve)*len(list_reg_mcve_new))),
index =new_index,columns = new_col)
from random import randint
for col in dcba_mcve.columns:
dcba_mcve[col]=dcba_mcve.apply(lambda x: randint(0,5), axis=1)
print(dcba_mcve)
for reg_export in dict_mcve :
list_reg_agg_1 = dict_mcve[reg_export]
for reg_import in dict_mcve :
list_reg_agg_2 = dict_mcve[reg_import]
s1=pd.DataFrame(np.zeros_like(dcba_mcve_new.loc['FR','FR']),index=dcba_mcve_new.loc['FR','FR'].index, columns = dcba_mcve_new.loc['FR','FR'].columns)
for reg1 in list_reg_agg_1 :
for reg2 in list_reg_agg_2 :
#print(reg1,reg2)
s1 += dcba_mcve.loc[reg1,reg2].copy()
#print(s1)
dcba_mcve_new.loc[reg_export,reg_import].update(s1)
dcba_mcve_new
Thank you in advance

How can I extract values in parentheses from a Python list using a regex?

This is the data as a list:
states = ['Alabama (AL)', 'Alaska (AK)', 'Arizona (AZ)', 'Arkansas (AR)', 'California (CA)', 'Colorado (CO)', 'Connecticut (CT)', 'Delaware (DE)', 'District of Columbia (DC)', 'Florida (FL)', 'Georgia (GA)', 'Hawaii (HI)', 'Idaho (ID)', 'Illinois (IL)', 'Indiana (IN)', 'Iowa (IA)', 'Kansas (KS)', 'Kentucky (KY)', 'Louisiana (LA)', 'Maine (ME)', 'Maryland (MD)', 'Massachusetts (MA)', 'Michigan (MI)', 'Minnesota (MN)', 'Mississippi (MS)', 'Missouri (MO)', 'Montana (MT)', 'Nebraska (NE)', 'Nevada (NV)', 'New Hampshire (NH)', 'New Jersey (NJ)', 'New Mexico (NM)', 'New York (NY)', 'North Carolina (NC)', 'North Dakota (ND)', 'Ohio (OH)', 'Oklahoma (OK)', 'Oregon (OR)', 'Pennsylvania (PA)', 'Rhode Island (RI)', 'South Carolina (SC)', 'South Dakota (SD)', 'Tennessee (TN)', 'Texas (TX)', 'Utah (UT)', 'Vermont (VT)', 'Virginia (VA)', 'Washington (WA)', 'West Virginia (WV)', 'Wisconsin (WI)', 'Wyoming (WY)']
I want to extract all the codes in parentheses.
This code returned None:
re.search('[(A-Z)]')
How can I do this?

Since you're using a list, you probably don't need a regex. If you're guaranteed that's the format, something like this should do it:
abbreviations = [state[-3:-1] for state in states]
That code uses a List Comprehension to make a new list from your old list. For each item in the states list, we're using negative indexes (which start at the back of the string) and the slice operator to pull out the abbreviations since they're always the 2nd to last and 3rd to last characters in the strings.
Sample usage:
>>> states = ['Alabama (AL)', 'Alaska (AK)', 'Arizona (AZ)', 'Arkansas (AR)', 'California (CA)']
>>> [state[-3:-1] for state in states]
['AL', 'AK', 'AZ', 'AR', 'CA']

import re
regex = r"(?<=\()[A-Z]+(?=\))"
print(re.findall(regex, "".join(states)))
Output:
['AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY']

Capital city that starts with "a", and ends with "a". Doesn't matter if letter "a" is uppercase or lowercase

Start with a, and ends with a. I have been trying to output capital cities that start and end with the letter "a". Doesn't matter if they start with capital "A"
capitals = ('Kabul', 'Tirana (Tirane)', 'Algiers', 'Andorra la Vella', 'Luanda', "Saint John's", 'Buenos Aires', 'Yerevan', 'Canberra', 'Vienna', 'Baku', 'Nassau', 'Manama', 'Dhaka', 'Bridgetown', 'Minsk', 'Brussels', 'Belmopan', 'Porto Novo', 'Thimphu', 'Sucre', 'Sarajevo', 'Gaborone', 'Brasilia', 'Bandar Seri Begawan', 'Sofia', 'Ouagadougou', 'Gitega', 'Phnom Penh', 'Yaounde', 'Ottawa', 'Praia', 'Bangui', "N'Djamena", 'Santiago', 'Beijing', 'Bogota', 'Moroni', 'Kinshasa', 'Brazzaville', 'San Jose', 'Yamoussoukro', 'Zagreb', 'Havana', 'Nicosia', 'Prague', 'Copenhagen', 'Djibouti', 'Roseau', 'Santo Domingo', 'Dili', 'Quito', 'Cairo', 'San Salvador', 'London', 'Malabo', 'Asmara', 'Tallinn', 'Mbabana', 'Addis Ababa', 'Palikir', 'Suva', 'Helsinki', 'Paris', 'Libreville', 'Banjul', 'Tbilisi', 'Berlin', 'Accra', 'Athens', "Saint George's", 'Guatemala City', 'Conakry', 'Bissau', 'Georgetown', 'Port au Prince', 'Tegucigalpa', 'Budapest', 'Reykjavik', 'New Delhi', 'Jakarta', 'Tehran', 'Baghdad', 'Dublin', 'Jerusalem', 'Rome', 'Kingston', 'Tokyo', 'Amman', 'Nur-Sultan', 'Nairobi', 'Tarawa Atoll', 'Pristina', 'Kuwait City', 'Bishkek', 'Vientiane', 'Riga', 'Beirut', 'Maseru', 'Monrovia', 'Tripoli', 'Vaduz', 'Vilnius', 'Luxembourg', 'Antananarivo', 'Lilongwe', 'Kuala Lumpur', 'Male', 'Bamako', 'Valletta', 'Majuro', 'Nouakchott', 'Port Louis', 'Mexico City', 'Chisinau', 'Monaco', 'Ulaanbaatar', 'Podgorica', 'Rabat', 'Maputo', 'Nay Pyi Taw', 'Windhoek', 'No official capital', 'Kathmandu', 'Amsterdam', 'Wellington', 'Managua', 'Niamey', 'Abuja', 'Pyongyang', 'Skopje', 'Belfast', 'Oslo', 'Muscat', 'Islamabad', 'Melekeok', 'Panama City', 'Port Moresby', 'Asuncion', 'Lima', 'Manila', 'Warsaw', 'Lisbon', 'Doha', 'Bucharest', 'Moscow', 'Kigali', 'Basseterre', 'Castries', 'Kingstown', 'Apia', 'San Marino', 'Sao Tome', 'Riyadh', 'Edinburgh', 'Dakar', 'Belgrade', 'Victoria', 'Freetown', 'Singapore', 'Bratislava', 'Ljubljana', 'Honiara', 'Mogadishu', 'Pretoria, Bloemfontein, Cape Town', 'Seoul', 'Juba', 'Madrid', 'Colombo', 'Khartoum', 'Paramaribo', 'Stockholm', 'Bern', 'Damascus', 'Taipei', 'Dushanbe', 'Dodoma', 'Bangkok', 'Lome', "Nuku'alofa", 'Port of Spain', 'Tunis', 'Ankara', 'Ashgabat', 'Funafuti', 'Kampala', 'Kiev', 'Abu Dhabi', 'London', 'Washington D.C.', 'Montevideo', 'Tashkent', 'Port Vila', 'Vatican City', 'Caracas', 'Hanoi', 'Cardiff', "Sana'a", 'Lusaka', 'Harare')
This is my code:
for elem in capitals:
elem = elem.lower()
["".join(j for j in i if j not in string.punctuation) for i in capitals]
if (len(elem) >=4 and elem.endswith(elem[0])):
print(elem)
My output is:
andorra la vella
saint john's
asmara
addis ababa
accra
saint george's
nur-sultan
abuja
oslo
warsaw
apia
ankara
tashkent
My expected output is:
andorra la vella
asmara
addis ababa
accra
abuja
apia
ankara

You didn't check if the capital starts with 'a'. I also assumed you want to filter out punctuation based on your code, so this is what I ended up with:
import string
for elem in capitals:
elem = elem.lower()
for punct in string.punctuation:
elem = elem.replace(punct, '')
if elem.startswith('a') and elem.endswith('a'):
print(elem)

for elem in capitals:
elem = elem.lower()
if (elem.startswith('a') and elem.endswith('a')):
print(elem)

Dynamically naming saved dataframes in loop

I'm attempting to use the GTab package to query Google Search trends data for every state in the US, but am having some trouble getting my loop to work.
For one state it's easy enough to do this, and new_query produces a dataframe.
t = gtab.GTAB()
t.set_options(pytrends_config={"geo": "US-NY", "timeframe": "2020-09-01 2020-10-01"})
query = t.new_query("weather")
To loop through I'm trying to use a dict to assign geo dynamically. However, I can't figure out how to do the same for the df name (query).
state_abbrevs = {
'Alabama': 'AL',
'Alaska': 'AK',
'Arizona': 'AZ',
'Arkansas': 'AR',
'California': 'CA',
'Colorado': 'CO',
'Connecticut': 'CT',
'Delaware': 'DE',
'District of Columbia': 'DC',
'Florida': 'FL',
'Georgia': 'GA',
'Guam': 'GU',
'Hawaii': 'HI',
'Idaho': 'ID',
'Illinois': 'IL',
'Indiana': 'IN',
'Iowa': 'IA',
'Kansas': 'KS',
'Kentucky': 'KY',
'Louisiana': 'LA',
'Maine': 'ME',
'Maryland': 'MD',
'Massachusetts': 'MA',
'Michigan': 'MI',
'Minnesota': 'MN',
'Mississippi': 'MS',
'Missouri': 'MO',
'Montana': 'MT',
'Nebraska': 'NE',
'Nevada': 'NV',
'New Hampshire': 'NH',
'New Jersey': 'NJ',
'New Mexico': 'NM',
'New York': 'NY',
'North Carolina': 'NC',
'North Dakota': 'ND',
'Northern Mariana Islands':'MP',
'Ohio': 'OH',
'Oklahoma': 'OK',
'Oregon': 'OR',
'Pennsylvania': 'PA',
'Puerto Rico': 'PR',
'Rhode Island': 'RI',
'South Carolina': 'SC',
'South Dakota': 'SD',
'Tennessee': 'TN',
'Texas': 'TX',
'Utah': 'UT',
'Vermont': 'VT',
'Virgin Islands': 'VI',
'Virginia': 'VA',
'Washington': 'WA',
'Washington DC' : 'DC',
'West Virginia': 'WV',
'Wisconsin': 'WI',
'Wyoming': 'WY'
}
for v in state_abbrevs.values():
t = gtab.GTAB()
t.set_options(pytrends_config={"geo": f"US-{v}", "timeframe": "2020-09-01 2020-10-01"})
query = t.new_query("weather")
I've tried using an f string but that produces SyntaxError: can't assign to literal.

I used two answers from here. I think your best option is just storing the DataFrames in a dictionary but this should work to create your query_* variables.
query_dict = {}
for n, v in enumerate(state_abbrevs.values()):
t = gtab.GTAB()
t.set_options(pytrends_config={"geo": f"US-{v}", "timeframe": "2020-09-01 2020-10-01"})
query = t.new_query("weather")
key = "query_" + str(n)
query_dict[key] = query
for k in query_dict.keys():
exec("%s = query_dict['%s']" % (k,k))

Position of bar plot xtick labels have irregular spaces [duplicate]

This question already has answers here:
Aligning rotated xticklabels with their respective xticks
(5 answers)
Closed 5 months ago.
I'm trying to do a bar plot for many countries and i want the names displayed a bit rotated underneath the bars.
The problem is that the spaces between the labels are irregular.
Here's the relevant code:
plt.bar(i, bar_height, align='center', label=country ,color=cm.jet(1.*counter/float( len(play_list))))
xticks_pos = scipy.arange( len( country_list)) +1
plt.xticks(xticks_pos ,country_list, rotation=45 )
Does anyone know a solution?

I think the problem is that the xtick label is aligned to the center of the text, but when it is rotated you care about the end of it. As a side note, you can use the position of the bars to select the xtick positions which better handles gaps/uneven spacing.
Here's an example that uses a web resource for list of countries (use your own if you don't trust the arbitrary resource google found for me)
import requests
import numpy as np
import matplotlib.pyplot as plt
# create sample data
# get a list of countries
website = "http://vbcity.com/cfs-filesystemfile.ashx/__key/CommunityServer.Components.PostAttachments/00.00.61.18.99/Country-List.txt"
r = requests.get(website)
many_countries = r.text.split()
# pick out a subset of them
n = 25
ind = np.random.randint(0, len(many_countries), 25)
country_list = [many_countries[i] for i in ind]
# some random heights for each of the bars.
heights = np.random.randint(3, 12, len(country_list))
# create plot
plt.figure(1)
h = plt.bar(range(len(country_list)), heights, label=country_list)
plt.subplots_adjust(bottom=0.3)
xticks_pos = [0.65*patch.get_width() + patch.get_xy()[0] for patch in h]
_ = plt.xticks(xticks_pos, country_list, ha='right', rotation=45)
and results in a bar chart whose labels are evenly spaced and rotated:
(your example doesn't give a hint as to what the colors mean so that's omitted here, but seems immaterial to the question anyway).
many_countries
Provided in case the website no longer works
many_countries = ['Abkhazia', 'Afghanistan', 'Akrotiri', 'and', 'Dhekelia', 'Aland', 'Albania', 'Algeria', 'American', 'Samoa', 'Andorra', 'Angola', 'Anguilla', 'Antigua', 'and', 'Barbuda', 'Argentina', 'Armenia', 'Aruba', 'Ascension', 'Island', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas,', 'The', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia', 'Bosnia', 'and', 'Herzegovina', 'Botswana', 'Brazil', 'Brunei', 'Bulgaria', 'Burkina', 'Faso', 'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape', 'Verde', 'Cayman', 'Islands', 'Central', 'Africa', 'Republic', 'Chad', 'Chile', 'China', 'Christmas', 'Island', 'Cocos', '(Keeling)', 'Islands', 'Colombia', 'Comoros', 'Congo', 'Cook', 'Islands', 'Costa', 'Rica', 'Cote', "d'lvoire", 'Croatia', 'Cuba', 'Cyprus', 'Czech', 'Republic', 'Denmark', 'Djibouti', 'Dominica', 'Dominican', 'Republic', 'East', 'Timor', 'Ecuador', 'Egypt', 'El', 'Salvador', 'Equatorial', 'Guinea', 'Eritrea', 'Estonia', 'Ethiopia', 'Falkland', 'Islands', 'Faroe', 'Islands', 'Fiji', 'Finland', 'France', 'French', 'Polynesia', 'Gabon', 'Cambia,', 'The', 'Georgia', 'Germany', 'Ghana', 'Gibraltar', 'Greece', 'Greenland', 'Grenada', 'Guam', 'Guatemala', 'Guemsey', 'Guinea', 'Guinea-Bissau', 'Guyana', 'Haiti', 'Honduras', 'Hong', 'Kong', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran', 'Iraq', 'Ireland', 'Isle', 'of', 'Man', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jersey', 'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati', 'Korea,', 'N', 'Korea,', 'S', 'Kosovo', 'Kuwait', 'Kyrgyzstan', 'Laos', 'Latvia', 'Lebanon', 'Lesotho', 'Liberia', 'Libya', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Macao', 'Macedonia', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta', 'Marshall', 'Islands', 'Mauritania', 'Mauritius', 'Mayotte', 'Mexico', 'Micronesia', 'Moldova', 'Monaco', 'Mongolia', 'Montenegro', 'Montserrat', 'Morocco', 'Mozambique', 'Myanmar', 'Nagorno-Karabakh', 'Namibia', 'Nauru', 'Nepal', 'Netherlands', 'Netherlands', 'Antilles', 'New', 'Caledonia', 'New', 'Zealand', 'Nicaragua', 'Niger', 'Nigeria', 'Niue', 'Norfolk', 'Island', 'Northern', 'Cyprus', 'Northern', 'Mariana', 'Islands', 'Norway', 'Oman', 'Pakistan', 'Palau', 'Palestine', 'Panama', 'Papua', 'New', 'Guinea', 'Paraguay', 'Peru', 'Philippines', 'Pitcaim', 'Islands', 'Poland', 'Portugal', 'Puerto', 'Rico', 'Qatar', 'Romania', 'Russia', 'Rwanda', 'Sahrawi', 'Arab', 'Democratic', 'Republic', 'Saint-Barthelemy', 'Saint', 'Helena', 'Saint', 'Kitts', 'and', 'Nevis', 'Saint', 'Lucia', 'Saint', 'Martin', 'Saint', 'Pierre', 'and', 'Miquelon', 'Saint', 'Vincent', 'and', 'Grenadines', 'Samos', 'San', 'Marino', 'Sao', 'Tome', 'and', 'Principe', 'Saudi', 'Arabia', 'Senegal', 'Serbia', 'Seychelles', 'Sierra', 'Leone', 'Singapore', 'Slovakia', 'Slovenia', 'Solomon', 'Islands', 'Somalia', 'Somaliland', 'South', 'Africa', 'South', 'Ossetia', 'Spain', 'Sri', 'Lanka', 'Sudan', 'Suriname', 'Svalbard', 'Swaziland', 'Sweden', 'Switzerland', 'Syria', 'Tajikistan', 'Tanzania', 'Thailand', 'Togo', 'Tokelau', 'Tonga', 'Transnistria', 'Trinidad', 'and', 'Tobago', 'Tristan', 'da', 'Cunha', 'Tunisia', 'Turkey', 'Turkmenistan', 'Turks', 'and', 'Caicos', 'Islands', 'Tuvalu', 'Uganda', 'Ukraine', 'United', 'Arab', 'Emirates', 'United', 'Kingdom', 'United', 'States', 'Uruguay', 'Uzbekistan', 'Vanuatu', 'Vatican', 'City', 'Venezuela', 'Vietnam', 'Virgin', 'Islands,', 'British', 'Virgin', 'Islands,', 'U.S.', 'Wallis', 'and', 'Futuna', 'Yemen', 'Zambia', 'Zimbabwe']

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Colour scatter plot by column Plotly - python

Inside the go.Scatter definition you should specify the color parameter as color=data1['Continent']. See the Plotly documentation for more information.

Related

How to reaggregate a MultiIndex pandas.DataFrame?

How can I extract values in parentheses from a Python list using a regex?

Capital city that starts with "a", and ends with "a". Doesn't matter if letter "a" is uppercase or lowercase

Dynamically naming saved dataframes in loop

Position of bar plot xtick labels have irregular spaces [duplicate]

Categories

Resources