How to replace multiple values with one in Python 3

I am currently trying to get countries from the rows of a data frame. Here is the code I currently have:
import pandas as pd

l = [
    ['[Aydemir, Deniz\', \' Gunduz, Gokhan\', \' Asik, Nejla] Bartin '
     'Univ, Fac Forestry, Dept Forest Ind Engn, TR-74100 Bartin, '
     'Turkey\', \' [Wang, Alice] Lulea Univ Technol, Wood Technol, '
     'Skelleftea, Sweden', 1990],
    ['[Fang, Qun\', \' Cui, Hui-Wang] Zhejiang A&F Univ, Sch Engn, Linan '
     '311300, Peoples R China\', \' [Du, Guan-Ben] Southwest Forestry '
     'Univ, Kunming 650224, Yunnan, Peoples R China', 2005],
    ['[Blumentritt, Melanie\', \' Gardner, Douglas J.\', \' Shaler '
     'Stephen M.] Univ Maine, Sch Resources, Orono, ME USA\', \' [Cole, '
     'Barbara J. W.] Univ Maine, Dept Chem, Orono, ME 04469 USA', 2012],
    ['[Kyvelou, Pinelopi; Gardner, Leroy; Nethercot, David A.] Univ '
     'London Imperial Coll Sci Technol & Med, London SW7 2AZ, '
     'England', 1998]]
dataf = pd.DataFrame(l, columns=['Authors', 'Year'])
This is the data frame. And here is the code:
import geotext

df = (dataf['Authors']
      .replace(r"\bUSA\b", "United States", regex=True)
      .apply(lambda x: geotext.GeoText(x).countries))
The problem was that GeoText didn't recognize "USA", but now I also saw that I need to change "England", "Scotland", "Wales" and "Northern Ireland" to "United Kingdom".
How can I extend .replace to achieve this?

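Series.str.translate will not do this: like Python's built-in str.translate, it maps single characters (Unicode code points), so a dictionary whose keys are whole words such as 'USA' leaves the text unchanged. To replace whole words, pass a dictionary of regex patterns to Series.replace with regex=True:
dataf['Authors'].replace({
    r"\bUSA\b": "United States",
    r"\bEngland\b": "United Kingdom",
    r"\bScotland\b": "United Kingdom",
    r"\bWales\b": "United Kingdom",
    r"\bNorthern Ireland\b": "United Kingdom",
}, regex=True)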

This worked for me. Here is the code:
replace_list = ['England', 'Scotland', 'Wales', 'Northern Ireland']
for check in replace_list:
    dataf['Authors'] = dataf['Authors'].str.replace(check, 'United Kingdom', regex=True)
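Another option, sketched under the assumption that the question's USA fix and the UK nations should be handled together, is one alternation pattern with word boundaries passed to Series.str.replace. The helper name normalize_countries and the sample addresses are illustrative only. Word boundaries keep "USA" from matching inside a longer token such as "USAF", but they would still turn an Australian "New South Wales" address into "New South United Kingdom", so check your data for that case.

```python
import re

import pandas as pd

# Constituent nations to map onto "United Kingdom" (list taken from the question).
UK_NATIONS = ["England", "Scotland", "Wales", "Northern Ireland"]


def normalize_countries(authors: pd.Series) -> pd.Series:
    """Rewrite USA and the UK nations to names GeoText recognizes (hypothetical helper)."""
    # \b anchors each name at word boundaries; re.escape guards against any
    # regex metacharacters in the names.
    uk_pattern = r"\b(?:" + "|".join(re.escape(n) for n in UK_NATIONS) + r")\b"
    return (
        authors.str.replace(r"\bUSA\b", "United States", regex=True)
        .str.replace(uk_pattern, "United Kingdom", regex=True)
    )


sample = pd.Series([
    "Orono, ME 04469 USA",
    "London SW7 2AZ, England",
    "Cardiff, Wales",  # made-up address
])
print(normalize_countries(sample).tolist())
```

The result can then go straight into the question's .apply(lambda x: geotext.GeoText(x).countries) step.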

Related

Why isn't this function returning pd.DataFrames? (Python)

When I run this code, load_data() does not return any of the data frames as pd.DataFrames. What am I doing wrong? Thank you in advance for all your help!
import pandas as pd

def load_data():
    # Competency: reading files in Pandas, df manipulation, regex
    scim_en = pd.read_excel('assets/scimagojr-3.xlsx')
    energy = pd.read_excel('assets/Energy Indicators.xls', na_values=["..."], header=None,
                           skiprows=18, skipfooter=38, usecols=[2, 3, 4, 5],
                           names=['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable'])
    energy['Energy Supply'] = energy['Energy Supply'] * 1000000
    energy = energy.replace("Republic of Korea", "South Korea")
    energy = energy.replace("United States of America20", "United States")
    energy = energy.replace("United Kingdom of Great Britain and Northern Ireland19", "United Kingdom")
    energy = energy.replace("China, Hong Kong Special Administrative Region3", "Hong Kong")
    energy = energy.replace("China, Macao Special Administrative Region4", "Macao")
    energy["Country"] = energy["Country"].str.extract(r'(^[a-zA-Z\s]+)', expand=False).str.strip()
    gdp = pd.read_csv('assets/world_bank.csv', skiprows=4)
    gdp = gdp.rename(columns={'Country Name': 'Country'})
    gdp = gdp.replace("Korea, Rep.", "South Korea")
    gdp = gdp.replace("Iran, Islamic Rep.", "Iran")
    gdp = gdp.replace("Hong Kong SAR, China", "Hong Kong")
    scim_en.set_index('Country')
    energy.set_index('Country')
    gdp.set_index('Country')
    return energy, gdp, scim_en
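No answer is captured here, but one likely cause (an assumption, since the exact symptom is not shown) is that DataFrame.set_index returns a new DataFrame instead of modifying the one it is called on, so the three bare set_index calls are thrown away and the function returns frames still indexed 0..n-1. A minimal sketch with made-up data; load_data_sketch is a hypothetical stand-in for load_data():

```python
import pandas as pd


def load_data_sketch():
    """Toy stand-in for load_data(); the data here is made up."""
    energy = pd.DataFrame({"Country": ["South Korea", "Iran"],
                           "Energy Supply": [1.0, 2.0]})

    energy.set_index("Country")           # returns a new frame that is discarded
    energy = energy.set_index("Country")  # reassigning keeps the new index
    return energy


energy = load_data_sketch()
print(energy.index.tolist())
```

Passing inplace=True to set_index is the other way to keep the change, but reassigning the result is the more common style.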

'DataFrame' object is not callable (Python)

I have code that should write information to Excel using Selenium. I have one list with some information that I need to write to Excel, and I have a solution, but when I tried to use it I got "'DataFrame' object is not callable". How can I solve it?
All of this code is inside a loop:
for schools in List:  # in the List I have data from an Excel file with the names of schools
    data = pd.DataFrame()
    data({
        "School Name":School_list_result[0::17],
        "Principal":School_list_result[1::17],
        "Principal's E-mail":School_list_result[2::17],
        "Type":School_list_result[8::17],
        "Grade Span": School_list_result[3::17],
        "Address":School_list_result[4::17],
        "Phone":School_list_result[14::17],
        "Website":School_list_result[13::17],
        "Associations/Communities":School_list_result[5::17],
        "GreatSchools Summary Rating":School_list_result[6::17],
        "U.S.News Rankings":School_list_result[12::17],
        "Total # Students":School_list_result[15::17],
        "Full-Time Teachers":School_list_result[16::17],
        "Student/Teacher Ratio":School_list_result[17::17],
        "Charter":School_list_result[9::17],
        "Enrollment by Race/Ethnicity": School_list_result[7::17],
        "Enrollment by Gender":School_list_result[10::17],
        "Enrollment by Grade":School_list_result[11::17],
    })
    data.to_excel("D:\Schools.xlsx")
In School_list_result I have this data:
'Cape Elizabeth High School',
'Mr. Jeffrey Shedd',
'No data.',
'9-12',
'345 Ocean House Road, Cape Elizabeth, ME 04107',
'Cape Elizabeth Public Schools',
'8/10',
'White\n91%\nAsian\n3%\nTwo or more races\n3%\nHispanic\n3%\nBlack\n1%',
'Regular school',
'No',
' Male Female\n Students 281 252',
' 9 10 11 12\n Students 139 135 117 142',
'#5,667 in National Rankings',
'https://cehs.cape.k12.me.us/',
'Tel: (207)799-3309',
'516 students',
'47 teachers',
'11:1',
Follow the documented syntax for creating a DataFrame:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
So your code should be modified to:
for schools in List:  # in the List I have data from an Excel file with the names of schools
    # Each school contributes 18 values (indices 0-17), so every slice steps by 18.
    data = pd.DataFrame(data={
        "School Name": School_list_result[0::18],
        "Principal": School_list_result[1::18],
        "Principal's E-mail": School_list_result[2::18],
        "Type": School_list_result[8::18],
        "Grade Span": School_list_result[3::18],
        "Address": School_list_result[4::18],
        "Phone": School_list_result[14::18],
        "Website": School_list_result[13::18],
        "Associations/Communities": School_list_result[5::18],
        "GreatSchools Summary Rating": School_list_result[6::18],
        "U.S.News Rankings": School_list_result[12::18],
        "Total # Students": School_list_result[15::18],
        "Full-Time Teachers": School_list_result[16::18],
        "Student/Teacher Ratio": School_list_result[17::18],
        "Charter": School_list_result[9::18],
        "Enrollment by Race/Ethnicity": School_list_result[7::18],
        "Enrollment by Gender": School_list_result[10::18],
        "Enrollment by Grade": School_list_result[11::18],
    })
Do you want to add to an existing xlsx file?
First, create the dictionary and then pass it to the DataFrame constructor, like this:
r = {"column1":["data"], "column2":["data"]}
data = pd.DataFrame(r)
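Separately from the "not callable" error, the question's slices step by 17 while each school in the sample contributes 18 values (indices 0 to 17). With one school, School_list_result[0::17] yields two items and [1::17] yields one, and the DataFrame constructor raises a ValueError about mismatched lengths. A minimal sketch that derives the step from the column order instead of hard-coding it; FIELDS and flat_to_frame are hypothetical names, only four columns are shown, and the second record is made up:

```python
import pandas as pd

# Column names in the order the scraper emits them (hypothetical, shortened to
# the first four fields; the real list would have all 18).
FIELDS = ["School Name", "Principal", "Principal's E-mail", "Grade Span"]


def flat_to_frame(flat, fields=FIELDS):
    """Split a flat list of len(fields) values per record into DataFrame columns."""
    step = len(fields)
    if len(flat) % step:
        raise ValueError(f"expected a multiple of {step} values, got {len(flat)}")
    # Column i takes every step-th value starting at offset i.
    return pd.DataFrame({name: flat[i::step] for i, name in enumerate(fields)})


flat = [
    "Cape Elizabeth High School", "Mr. Jeffrey Shedd", "No data.", "9-12",
    "Another School", "Ms. Example", "No data.", "K-5",  # made-up second record
]
df = flat_to_frame(flat)
print(df.shape)
```

Because the step comes from len(fields), adding or removing a column cannot leave the slices out of sync.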

Pandas Search Returns Missing Data when data exists

I have a DataFrame "energy" that I imported from an Excel file and cleaned like this:
import numpy as np
import pandas as pd

header = 17
footer = 38
energy = (pd.read_excel(path0, skiprows=header, skipfooter=footer)
          .drop(columns=['Unnamed: 0', 'Unnamed: 1'])
          .rename(columns={"Unnamed: 2": "Country", "Petajoules": "Energy Supply",
                           "Gigajoules": "Energy Supply per Capita", "%": "% Renewable"})
          .replace({"Republic of Korea": "South Korea", "United States of America": "United States", "United Kingdom of Great Britain and Northern Ireland": "United Kingdom", "China, Hong Kong Special Administrative Region": "Hong Kong"}))
energy.loc[energy['Energy Supply'] == "...", 'Energy Supply'] = np.nan
energy.loc[energy['Energy Supply per Capita'] == "...", 'Energy Supply per Capita'] = np.nan
energy.loc[energy['% Renewable'] == "...", '% Renewable'] = np.nan
energy["Energy Supply"] = 1000000 * energy["Energy Supply"]
energy['Country'] = energy['Country'].str.replace(r'\d+', '', regex=True)
energy['Country'] = energy['Country'].str.replace(r"\(.*\)", "", regex=True)
After that I tried searching for specific countries like this:
energy[energy['Country']=='United Kingdom']
This returns the matching rows, except when I search for these three countries: "Iran", "United States", and "United Kingdom". Yet I can see them in the DataFrame when I list a range of rows:
energy.iloc[80:100]
It is not a spelling error; I tried copying and pasting the country names directly.
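No answer is captured for this question. One plausible cause, assuming the raw sheet carries footnote digits and parenthetical suffixes like the "United States of America20" and "United Kingdom of Great Britain and Northern Ireland19" strings handled in the load_data() question above: DataFrame.replace with a dictionary matches whole cell values, so running it before the digits are stripped means it never fires, and removing "(Islamic Republic of)" leaves a trailing space ("Iran "). A minimal sketch that strips first, trims whitespace, then renames; the sample values are illustrative:

```python
import pandas as pd

raw = pd.Series([
    "United States of America20",
    "United Kingdom of Great Britain and Northern Ireland19",
    "Iran (Islamic Republic of)",
    "France",
])

cleaned = (
    raw.str.replace(r"\d+", "", regex=True)        # drop footnote digits
    .str.replace(r"\s*\(.*\)", "", regex=True)     # drop parenthetical suffixes
    .str.strip()                                   # remove leftover whitespace
    .replace({                                     # whole-value renames, applied last
        "United States of America": "United States",
        "United Kingdom of Great Britain and Northern Ireland": "United Kingdom",
    })
)
print(cleaned.tolist())
```

Checking energy['Country'].str.endswith(' ').any() on the real data is a quick way to confirm whether stray whitespace is the issue.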

Parsing specific region of a txt, comparing to list of strings, then generating new list composed of matches

I am trying to do the following:
1. Read through a specific portion of a text file (there is a known starting point and ending point).
2. While reading through these lines, check whether a word matches a word that I have included in a list.
3. If a match is detected, add that specific word to a new list.
I have been able to read through the text and grab other data from it that I need, but so far I've been unable to do the above.
I have tried to implement the following example: Python - Search Text File For Any String In a List
But I have failed to make it read correctly.
I have also tried to adapt the following: https://www.geeksforgeeks.org/python-finding-strings-with-given-substring-in-list/
But I was equally unsuccessful.
Here is some of my code:
import re
from itertools import islice
import os
# list of all countries
oneCountries = "Afghanistan, Albania, Algeria, Andorra, Angola, Antigua & Deps, Argentina, Armenia, Australia, Austria, Azerbaijan, Bahamas, Bahrain, Bangladesh, Barbados, Belarus, Belgium, Belize, Benin, Bhutan, Bolivia, Bosnia Herzegovina, Botswana, Brazil, Brunei, Bulgaria, Burkina, Burma, Burundi, Cambodia, Cameroon, Canada, Cape Verde, Central African Rep, Chad, Chile, China, Republic of China, Colombia, Comoros, Democratic Republic of the Congo, Republic of the Congo, Costa Rica,, Croatia, Cuba, Cyprus, Czech Republic, Danzig, Denmark, Djibouti, Dominica, Dominican Republic, East Timor, Ecuador, Egypt, El Salvador, Equatorial Guinea, Eritrea, Estonia, Ethiopia, Fiji, Finland, France, Gabon, Gaza Strip, The Gambia, Georgia, Germany, Ghana, Greece, Grenada, Guatemala, Guinea, Guinea-Bissau, Guyana, Haiti, Holy Roman Empire, Honduras, Hungary, Iceland, India, Indonesia, Iran, Iraq, Republic of Ireland, Israel, Italy, Ivory Coast, Jamaica, Japan, Jonathanland, Jordan, Kazakhstan, Kenya, Kiribati, North Korea, South Korea, Kosovo, Kuwait, Kyrgyzstan, Laos, Latvia, Lebanon, Lesotho, Liberia, Libya, Liechtenstein, Lithuania, Luxembourg, Macedonia, Madagascar, Malawi, Malaysia, Maldives, Mali, Malta, Marshall Islands, Mauritania, Mauritius, Mexico, Micronesia, Moldova, Monaco, Mongolia, Montenegro, Morocco, Mount Athos, Mozambique, Namibia, Nauru, Nepal, Newfoundland, Netherlands, New Zealand, Nicaragua, Niger, Nigeria, Norway, Oman, Ottoman Empire, Pakistan, Palau, Panama, Papua New Guinea, Paraguay, Peru, Philippines, Poland, Portugal, Prussia, Qatar, Romania, Rome, Russian Federation, Rwanda, St Kitts & Nevis, St Lucia, Saint Vincent & the Grenadines, Samoa, San Marino, Sao Tome & Principe, Saudi Arabia, Senegal, Serbia, Seychelles, Sierra Leone, Singapore, Slovakia, Slovenia, Solomon Islands, Somalia, South Africa, Spain, Sri Lanka, Sudan, Suriname, Swaziland, Sweden, Switzerland, Syria, Tajikistan, Tanzania, Thailand, Togo, Tonga, Trinidad & Tobago, Tunisia, Turkey, Turkmenistan, Tuvalu, Uganda, Ukraine, United Arab Emirates, United Kingdom, United States, Uruguay, Uzbekistan, Vanuatu, Vatican City, Venezuela, Vietnam, Yemen, Zambia, Zimbabwe"
countries = oneCountries.split(",")
path = "C:/Users/me/Desktop/read.txt"
thefile = open(path, errors='ignore')
countryParsing = False
for line in thefile:
    line = line.strip()
    # if line.startswith("Submitting Author:"):
    # if re.match(r"Submitting Author:", line):
    # print("blahblah1")
    # countryParsing = True
    # if countryParsing == True:
    # print("blahblah2")
    #
    # res = [x for x in line if re.search(countries, x)]
    # print("blah blah3: " + str(res))
    # elif re.match(r"Running Head:", line):
    # countryParsing = False
    # if countryParsing == True:
    # res = [x for x in line if re.search(countries, x)]
    # print("blah blah4: " + str(res))
    # for x in countries:
    # if x in thefile:
    # print("a country is: " + x)
    # if any(s in line for s in countries):
    # listOfAuthorCountries = listOfAuthorCountries + s + ", "
    # if re.match(f"Submitting Author:, line"):
The commented-out lines are versions of the code that I've tried and failed to make work properly.
As requested, this is an example of the text file that I'm trying to grab the data from. I've modified it to remove sensitive information, but in this particular case, the "new list" should be appended with a certain number of "France" entries:
txt above....
Submitting Author:
asdf, asdf (proxy)
France
asdfasdf
blah blah
asdfasdf
asdf, Provence-Alpes-Côte d'Azu 13354
France
blah blah
France
asdf
Running Head:
...more text below
Based on the three points you stated on what you want to accomplish and what I understand from your code (which may not be what you intended), I propose:
# list of all countries
countries = "Afghanistan, Albania, Algeria, Andorra, Angola, Antigua & Deps, Argentina, Armenia, Australia, Austria, Azerbaijan, Bahamas, Bahrain, Bangladesh, Barbados, Belarus, Belgium, Belize, Benin, Bhutan, Bolivia, Bosnia Herzegovina, Botswana, Brazil, Brunei, Bulgaria, Burkina, Burma, Burundi, Cambodia, Cameroon, Canada, Cape Verde, Central African Rep, Chad, Chile, China, Republic of China, Colombia, Comoros, Democratic Republic of the Congo, Republic of the Congo, Costa Rica, Croatia, Cuba, Cyprus, Czech Republic, Danzig, Denmark, Djibouti, Dominica, Dominican Republic, East Timor, Ecuador, Egypt, El Salvador, Equatorial Guinea, Eritrea, Estonia, Ethiopia, Fiji, Finland, France, Gabon, Gaza Strip, The Gambia, Georgia, Germany, Ghana, Greece, Grenada, Guatemala, Guinea, Guinea-Bissau, Guyana, Haiti, Holy Roman Empire, Honduras, Hungary, Iceland, India, Indonesia, Iran, Iraq, Republic of Ireland, Israel, Italy, Ivory Coast, Jamaica, Japan, Jonathanland, Jordan, Kazakhstan, Kenya, Kiribati, North Korea, South Korea, Kosovo, Kuwait, Kyrgyzstan, Laos, Latvia, Lebanon, Lesotho, Liberia, Libya, Liechtenstein, Lithuania, Luxembourg, Macedonia, Madagascar, Malawi, Malaysia, Maldives, Mali, Malta, Marshall Islands, Mauritania, Mauritius, Mexico, Micronesia, Moldova, Monaco, Mongolia, Montenegro, Morocco, Mount Athos, Mozambique, Namibia, Nauru, Nepal, Newfoundland, Netherlands, New Zealand, Nicaragua, Niger, Nigeria, Norway, Oman, Ottoman Empire, Pakistan, Palau, Panama, Papua New Guinea, Paraguay, Peru, Philippines, Poland, Portugal, Prussia, Qatar, Romania, Rome, Russian Federation, Rwanda, St Kitts & Nevis, St Lucia, Saint Vincent & the Grenadines, Samoa, San Marino, Sao Tome & Principe, Saudi Arabia, Senegal, Serbia, Seychelles, Sierra Leone, Singapore, Slovakia, Slovenia, Solomon Islands, Somalia, South Africa, Spain, Sri Lanka, Sudan, Suriname, Swaziland, Sweden, Switzerland, Syria, Tajikistan, Tanzania, Thailand, Togo, Tonga, Trinidad & Tobago, Tunisia, Turkey, Turkmenistan, Tuvalu, Uganda, Ukraine, United Arab Emirates, United Kingdom, United States, Uruguay, Uzbekistan, Vanuatu, Vatican City, Venezuela, Vietnam, Yemen, Zambia, Zimbabwe"
countries = countries.split(",")
countries = [c.strip() for c in countries]
filename = "read.txt"
my_other_list = []
toParse = False
with open(filename, errors='ignore') as filehandle:
    for line in filehandle:
        line = line.strip()
        if line.startswith("Submitting Author:"):
            toParse = True
            continue
        elif line.startswith("Running Head:"):
            toParse = False
            continue
        elif toParse:
            for c in countries:
                if c in line:
                    my_other_list.append(c)
EDIT SUMMARY
Adapted code to work on the text sample provided.
Fixed the list of countries (originally there were two commas after Costa Rica).
I think your main problem is that, in oneCountries, the country-names are separated by comma+space, but you're only splitting on comma, so for instance the second entry in countries is " Albania", with a space in front. You need to change:
oneCountries.split(",")
to:
oneCountries.split(", ")
After that, it looks like there's enough useful stuff in your commented-out code to achieve what you want.
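The answer's loop tests c in line, which is a substring check, so "Niger" also matches inside "Nigeria" and "China" inside "Republic of China". A minimal sketch, assuming you want whole-name matches only: compile one regex alternation with the longer names first and word boundaries around it. The countries list is shortened for illustration, find_countries is a hypothetical helper, and the sample lines are made up in the shape of the question's text file.

```python
import re

countries = ["Niger", "Nigeria", "China", "Republic of China", "France"]

# Longest names first so "Republic of China" wins over "China" at the same position;
# \b stops "Niger" from matching the start of "Nigeria".
pattern = re.compile(
    r"\b(?:"
    + "|".join(re.escape(c) for c in sorted(countries, key=len, reverse=True))
    + r")\b"
)


def find_countries(lines, start="Submitting Author:", end="Running Head:"):
    """Collect country names found between the start and end marker lines."""
    found = []
    in_section = False
    for line in lines:
        line = line.strip()
        if line.startswith(start):
            in_section = True
        elif line.startswith(end):
            in_section = False
        elif in_section:
            found.extend(pattern.findall(line))
    return found


sample = ["txt above....", "Submitting Author:", "France", "Lagos, Nigeria",
          "Republic of China", "Running Head:", "France"]
print(find_countries(sample))
```

Passing the open file handle instead of the sample list works the same way, since iterating a file yields its lines.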

Country name from ISO short code in dictionary, how to deal with non-ascii chars

I'm making a web app that takes a country short code (Google App Engine provides it in the request header), and I want to get the full country name, not just the two-letter code.
I tried making a Python dictionary, but it breaks because the names have non-ASCII characters (accent marks, etc.). I used the Python library "pycountry", but I'm not sure how to include it in my Google App Engine project. Unfortunately, pycountry's output also has accent marks, so I can't just copy its text values and make a dictionary...
Besides, I just want the country-code-to-name lookup table, no other details...
Here's a copy of the dictionary I've been trying to make, but it has these annoying accent marks...
Thanks for the help in advance
short2long = {"AF":"Afghanistan",
"AX":"Aland Islands",
"AL":"Albania",
"DZ":"Algeria",
"AS":"American Samoa",
"AD":"Andorra",
"AO":"Angola",
"AI":"Anguilla",
"AQ":"Antarctica",
"AG":"Antigua and Barbuda",
"AR":"Argentina",
"AM":"Armenia",
"AW":"Aruba",
"AU":"Australia",
"AT":"Austria",
"AZ":"Azerbaijan",
"BS":"Bahamas",
"BH":"Bahrain",
"BD":"Bangladesh",
"BB":"Barbados",
"BY":"Belarus",
"BE":"Belgium",
"BZ":"Belize",
"BJ":"Benin",
"BM":"Bermuda",
"BT":"Bhutan",
"BO":"Bolivia, Plurinational State of",
"BQ":"Bonaire, Sint Eustatius and Saba",
"BA":"Bosnia and Herzegovina",
"BW":"Botswana",
"BV":"Bouvet Island",
"BR":"Brazil",
"IO":"British Indian Ocean Territory",
"BN":"Brunei Darussalam",
"BG":"Bulgaria",
"BF":"Burkina Faso",
"BI":"Burundi",
"KH":"Cambodia",
"CM":"Cameroon",
"CA":"Canada",
"CV":"Cape Verde",
"KY":"Cayman Islands",
"CF":"Central African Republic",
"TD":"Chad",
"CL":"Chile",
"CN":"China",
"CX":"Christmas Island",
"CC":"Cocos (Keeling) Islands",
"CO":"Colombia",
"KM":"Comoros",
"CG":"Congo",
"CD":"Congo, The Democratic Republic of the",
"CK":"Cook Islands",
"CR":"Costa Rica",
"CI":"Côte d'Ivoire",
"HR":"Croatia",
"CU":"Cuba",
"CW":"Curaçao",
"CY":"Cyprus",
"CZ":"Czech Republic",
"DK":"Denmark",
"DJ":"Djibouti",
"DM":"Dominica",
"DO":"Dominican Republic",
"EC":"Ecuador",
"EG":"Egypt",
"SV":"El Salvador",
"GQ":"Equatorial Guinea",
"ER":"Eritrea",
"EE":"Estonia",
"ET":"Ethiopia",
"FK":"Falkland Islands (Malvinas)",
"FO":"Faroe Islands",
"FJ":"Fiji",
"FI":"Finland",
"FR":"France",
"GF":"French Guiana",
"PF":"French Polynesia",
"TF":"French Southern Territories",
"GA":"Gabon",
"GM":"Gambia",
"GE":"Georgia",
"DE":"Germany",
"GH":"Ghana",
"GI":"Gibraltar",
"GR":"Greece",
"GL":"Greenland",
"GD":"Grenada",
"GP":"Guadeloupe",
"GU":"Guam",
"GT":"Guatemala",
"GG":"Guernsey",
"GN":"Guinea",
"GW":"Guinea-Bissau",
"GY":"Guyana",
"HT":"Haiti",
"HM":"Heard Island and McDonald Islands",
"VA":"Holy See (Vatican City State)",
"HN":"Honduras",
"HK":"Hong Kong",
"HU":"Hungary",
"IS":"Iceland",
"IN":"India",
"ID":"Indonesia",
"IR":"Iran, Islamic Republic of",
"IQ":"Iraq",
"IE":"Ireland",
"IM":"Isle of Man",
"IL":"Israel",
"IT":"Italy",
"JM":"Jamaica",
"JP":"Japan",
"JE":"Jersey",
"JO":"Jordan",
"KZ":"Kazakhstan",
"KE":"Kenya",
"KI":"Kiribati",
"KP":"Korea, Democratic People's Republic of",
"KR":"Korea, Republic of",
"KW":"Kuwait",
"KG":"Kyrgyzstan",
"LA":"Lao People's Democratic Republic",
"LV":"Latvia",
"LB":"Lebanon",
"LS":"Lesotho",
"LR":"Liberia",
"LY":"Libya",
"LI":"Liechtenstein",
"LT":"Lithuania",
"LU":"Luxembourg",
"MO":"Macao",
"MK":"Macedonia, Republic of",
"MG":"Madagascar",
"MW":"Malawi",
"MY":"Malaysia",
"MV":"Maldives",
"ML":"Mali",
"MT":"Malta",
"MH":"Marshall Islands",
"MQ":"Martinique",
"MR":"Mauritania",
"MU":"Mauritius",
"YT":"Mayotte",
"MX":"Mexico",
"FM":"Micronesia, Federated States of",
"MD":"Moldova, Republic of",
"MC":"Monaco",
"MN":"Mongolia",
"ME":"Montenegro",
"MS":"Montserrat",
"MA":"Morocco",
"MZ":"Mozambique",
"MM":"Myanmar",
"NA":"Namibia",
"NR":"Nauru",
"NP":"Nepal",
"NL":"Netherlands",
"NC":"New Caledonia",
"NZ":"New Zealand",
"NI":"Nicaragua",
"NE":"Niger",
"NG":"Nigeria",
"NU":"Niue",
"NF":"Norfolk Island",
"MP":"Northern Mariana Islands",
"NO":"Norway",
"OM":"Oman",
"PK":"Pakistan",
"PW":"Palau",
"PS":"Palestinian Territory, Occupied",
"PA":"Panama",
"PG":"Papua New Guinea",
"PY":"Paraguay",
"PE":"Peru",
"PH":"Philippines",
"PN":"Pitcairn",
"PL":"Poland",
"PT":"Portugal",
"PR":"Puerto Rico",
"QA":"Qatar",
"RE":"Réunion",
"RO":"Romania",
"RU":"Russian Federation",
"RW":"Rwanda",
"BL":"Saint Barthélemy",
"SH":"Saint Helena, Ascension and Tristan da Cunha",
"KN":"Saint Kitts and Nevis",
"LC":"Saint Lucia",
"MF":"Saint Martin (French part)",
"PM":"Saint Pierre and Miquelon",
"VC":"Saint Vincent and the Grenadines",
"WS":"Samoa",
"SM":"San Marino",
"ST":"Sao Tome and Principe",
"SA":"Saudi Arabia",
"SN":"Senegal",
"RS":"Serbia",
"SC":"Seychelles",
"SL":"Sierra Leone",
"SG":"Singapore",
"SX":"Sint Maarten (Dutch part)",
"SK":"Slovakia",
"SI":"Slovenia",
"SB":"Solomon Islands",
"SO":"Somalia",
"ZA":"South Africa",
"GS":"South Georgia and the South Sandwich Islands",
"ES":"Spain",
"LK":"Sri Lanka",
"SD":"Sudan",
"SR":"Suriname",
"SS":"South Sudan",
"SJ":"Svalbard and Jan Mayen",
"SZ":"Swaziland",
"SE":"Sweden",
"CH":"Switzerland",
"SY":"Syrian Arab Republic",
"TW":"Taiwan, Province of China",
"TJ":"Tajikistan",
"TZ":"Tanzania, United Republic of",
"TH":"Thailand",
"TL":"Timor-Leste",
"TG":"Togo",
"TK":"Tokelau",
"TO":"Tonga",
"TT":"Trinidad and Tobago",
"TN":"Tunisia",
"TR":"Turkey",
"TM":"Turkmenistan",
"TC":"Turks and Caicos Islands",
"TV":"Tuvalu",
"UG":"Uganda",
"UA":"Ukraine",
"AE":"United Arab Emirates",
"GB":"United Kingdom",
"US":"United States",
"UM":"United States Minor Outlying Islands",
"UY":"Uruguay",
"UZ":"Uzbekistan",
"VU":"Vanuatu",
"VE":"Venezuela, Bolivarian Republic of",
"VN":"Viet Nam",
"VG":"Virgin Islands, British",
"VI":"Virgin Islands, U.S.",
"WF":"Wallis and Futuna",
"EH":"Western Sahara",
"YE":"Yemen",
"ZM":"Zambia",
"ZW":"Zimbabwe"}
I tried to use this code to build the dictionary
import pycountry
t = list(pycountry.countries)
for country in t:
    print '"' + country.alpha2 + '":"' + country.name + '",'
# -*- coding: utf-8 -*-
import pycountry
cc={}
t = list(pycountry.countries)
for country in t:
    cc[country.alpha2] = country.name
print cc
cc will be the dictionary you're looking for.
In Python 2, source files default to the ASCII character encoding. If you want to include characters outside this range in your source code, you need to declare the file's character encoding as described in PEP 263. For example, adding the following line to the top of the file might do what you want (assuming the file is encoded in UTF-8):
# -*- coding: utf-8 -*-
This should cause the string objects to contain UTF-8 encoded versions of the country names. If you were using unicode string literals instead, then the non-ASCII characters would be decoded correctly too.
If your input charset is limited to ASCII, you can still get the accents you need by using escape sequences. Try this:
import pycountry
import pprint
pprint.pprint ({country.alpha2 : country.name for country in pycountry.countries})
This produces lines like this one:
u'CI': u"C\xf4te d'Ivoire",
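If the goal is to drop the accent marks rather than carry them through, a standard-library sketch (assuming a plain-ASCII approximation is acceptable for display) is to decompose each name with unicodedata.normalize("NFKD", ...) and discard the combining marks. On Python 3, source files are UTF-8 by default (PEP 3120), so no coding declaration is needed. The three entries below are an excerpt from the question's dictionary, and to_ascii is a hypothetical helper name.

```python
import unicodedata

# Excerpt from the question's short2long dictionary.
short2long = {"CI": "Côte d'Ivoire", "CW": "Curaçao", "RE": "Réunion"}


def to_ascii(name):
    """Approximate a name in ASCII by removing combining accent marks."""
    # NFKD splits "ô" into "o" plus a combining circumflex, and so on.
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


ascii_names = {code: to_ascii(name) for code, name in short2long.items()}
print(ascii_names)
```

Characters that have no decomposition (for example "ø") pass through unchanged, so check the full table if strict ASCII is required.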
