Scraping country names from a Wikipedia page - Python

from bs4 import BeautifulSoup
from urllib.request import urlopen
webpage = urlopen('https://en.wikipedia.org/wiki/List_of_largest_banks')
bs = BeautifulSoup(webpage,'html.parser')
print(bs)
spanList= bs.find_all('span',{'class':'flagicon'})
for span in spanList:
    print(span.a['title'])
It prints the list of countries in the first table, but after printing them it gives this error:
Traceback (most recent call last):
File "C:/Users/Jegathesan/Desktop/python programmes/scrape5.py", line 10, in <module>
print(span.a['title'])
TypeError: 'NoneType' object is not subscriptable

import pandas as pd
df = pd.read_html("https://en.wikipedia.org/wiki/List_of_largest_banks")
for item in range(4):
    goal = df[item].iloc[:, 1].values.tolist()
    print(goal)
    print("*" * 100)
Output:
['Industrial and Commercial Bank of China', 'China Construction Bank', 'Agricultural Bank of China', 'Bank of China', 'Mitsubishi UFJ Financial Group', 'JPMorgan Chase', 'HSBC Holdings PLC', 'Bank of America', 'BNP Paribas', 'Crédit Agricole', 'Citigroup Inc.', 'Japan Post Bank', 'Wells Fargo', 'Sumitomo Mitsui Financial Group', 'Mizuho Financial Group', 'Banco Santander', 'Deutsche Bank', 'Société Générale', 'Groupe BPCE', 'Barclays', 'Bank of Communications', 'Postal Savings Bank of China', 'Royal Bank of Canada', 'Lloyds Banking Group', 'ING Group', 'Toronto-Dominion Bank', 'China Merchants Bank', 'Crédit Mutuel', 'Norinchukin Bank', 'UBS', 'Industrial Bank (China)', 'UniCredit', 'Goldman Sachs', 'Shanghai Pudong Development Bank', 'Intesa Sanpaolo', 'Royal Bank of Scotland Group', 'China CITIC Bank', 'China Minsheng Bank', 'Morgan Stanley', 'Scotiabank', 'Credit Suisse', 'Banco Bilbao Vizcaya Argentaria', 'Commonwealth Bank', 'Standard Chartered', 'Australia and New Zealand Banking Group', 'Rabobank', 'Nordea', 'Westpac', 'China Everbright Bank', 'Bank of Montreal', 'DZ Bank', 'National Australia Bank', 'Danske Bank', 'State Bank of India', 'Resona Holdings', 'Commerzbank', 'Sumitomo Mitsui Trust Holdings', 'Ping An Bank', 'Canadian Imperial Bank of Commerce', 'U.S. Bancorp', 'CaixaBank', 'Truist Financial', 'ABN AMRO Group', 'KB Financial Group Inc', 'Shinhan Bank', 'Sberbank of Russia', 'Nomura Holdings', 'DBS
Bank', 'Itaú Unibanco', 'PNC Financial Services', 'Huaxia Bank', 'Nonghyup Bank', 'Capital One', 'Bank of Beijing', 'The Bank of New York Mellon', 'Banco do Brasil', 'Hana Financial Group', 'OCBC Bank', 'Banco Bradesco', 'Handelsbanken', 'Caixa Econômica Federal', 'KBC Bank', 'China Guangfa Bank', 'Nationwide Building
Society', 'Woori Bank', 'DNB ASA', 'SEB Group', 'Bank of Shanghai', 'United Overseas Bank', 'Bank of Jiangsu', 'La Banque postale', 'Landesbank Baden-Württemberg', 'Erste Group', 'Industrial Bank of Korea', 'Qatar National Bank', 'Banco Sabadell', 'Swedbank', 'BayernLB', 'State Street Corporation', 'China Zheshang Bank', 'Bankia']
****************************************************************************************************
['China', 'United States', 'Japan', 'France', 'South Korea', 'United Kingdom', 'Canada', 'Germany', 'Spain', 'Australia', 'Brazil', 'Netherlands', 'Singapore',
'Sweden', 'Italy', 'Switzerland', 'Austria', 'Belgium', 'Denmark', 'Finland', 'India', 'Luxembourg', 'Norway', 'Russia']
****************************************************************************************************
['JPMorgan Chase', 'Industrial and Commercial Bank of China', 'Bank of America', 'Wells Fargo', 'China Construction Bank', 'HSBC Holdings PLC', 'Agricultural Bank of China', 'Citigroup Inc.', 'Bank of China', 'China Merchants Bank', 'Royal
Bank of Canada', 'Banco Santander', 'Commonwealth Bank', 'Mitsubishi UFJ Financial Group', 'Toronto-Dominion Bank', 'BNP Paribas', 'Goldman Sachs', 'Sberbank of Russia', 'Morgan Stanley', 'U.S. Bancorp', 'HDFC Bank', 'Itaú Unibanco', 'Westpac', 'Scotiabank', 'ING Group', 'UBS', 'Charles Schwab', 'PNC Financial Services', 'Lloyds Banking Group', 'Sumitomo Mitsui Financial Group', 'Bank of Communications', 'Australia and New Zealand Banking Group', 'Banco Bradesco', 'National Australia Bank', 'Intesa Sanpaolo', 'Banco Bilbao Vizcaya Argentaria', 'Japan Post Bank', 'The Bank of New York Mellon', 'Shanghai Pudong Development Bank', 'Industrial Bank (China)', 'Bank of China (Hong Kong)', 'Bank of Montreal', 'Crédit
Agricole', 'DBS Bank', 'Nordea', 'Capital One', 'Royal Bank of Scotland Group',
'Mizuho Financial Group', 'Credit Suisse', 'Postal Savings Bank of China', 'China Minsheng Bank', 'UniCredit', 'China CITIC Bank', 'Hang Seng Bank', 'Société Générale', 'Barclays', 'Canadian Imperial Bank of Commerce', 'Bank Central Asia',
'Truist Financial', 'Oversea-Chinese Banking Corp', 'State Bank of India', 'State Street', 'Deutsche Bank', 'KBC Bank', 'Danske Bank', 'Ping An Bank', 'Standard Chartered', 'United Overseas Bank', 'QNB Group', 'Bank Rakyat']
****************************************************************************************************
['United States', 'China', 'United Kingdom', 'Canada', 'Australia', 'Japan', 'France', 'Spain', 'Brazil', 'India', 'Singapore', 'Switzerland', 'Italy', 'Hong Kong', 'Indonesia', 'Russia', 'Netherlands']
****************************************************************************************************

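Your initial code searched for span tags across the whole parsed HTML. Some spans with class flagicon contain no a tag, so span.a is None and indexing it raises the TypeError.
The modified code collects all the table tags found in the HTML and stores them in a list.
It then uses your statement to get the span tags for one specific table (i.e. the first table).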
from bs4 import BeautifulSoup
from urllib.request import urlopen
webpage = urlopen('https://en.wikipedia.org/wiki/List_of_largest_banks')
bs = BeautifulSoup(webpage,'html.parser')
# tableList collects all "table" elements of the page in a list
tableList = bs.find_all('table')
# spanList holds the flagicon spans found inside tableList[index];
# to access another table, change the list index
spanList = tableList[0].find_all('span', {'class': 'flagicon'})
for span in spanList:
    try:
        print(span.a['title'])
    except (TypeError, KeyError):
        # span.a is None when the span has no <a> tag; KeyError covers a missing title
        print("title tag is not found.")
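An explicit check can replace the try/except. The following is a minimal sketch that runs on a small inline HTML snippet; SAMPLE_HTML and the helper name flag_titles are made up for illustration and are not the actual Wikipedia markup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the second span has no <a> child,
# which is the situation that makes span.a evaluate to None.
SAMPLE_HTML = """
<table>
  <tr><td><span class="flagicon"><a href="/wiki/China" title="China">flag</a></span></td></tr>
  <tr><td><span class="flagicon"><img alt="no link"></span></td></tr>
  <tr><td><span class="flagicon"><a href="/wiki/Japan" title="Japan">flag</a></span></td></tr>
</table>
"""


def flag_titles(html):
    """Collect the title of the first <a> in every flagicon span, skipping spans without one."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for span in soup.find_all("span", class_="flagicon"):
        link = span.find("a")  # None when the span holds no anchor
        if link is not None and link.has_attr("title"):
            titles.append(link["title"])
    return titles


print(flag_titles(SAMPLE_HTML))
```

Skipping the spans without a link keeps the result list clean, whereas the try/except version prints a placeholder message for each of them.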

Related

How to check if multiple values contain a CSV dataset with Python?

I'm trying to check whether the CSV file contains all the U.S. states. As you can see in my code, I've imported the CSV file as a list in Python. I'm trying to solve this problem without using pandas or another module.
I've created a list of the states, but I'm wondering what the most efficient way is to check how many states the CSV dataset contains.
import csv
with open('president_county_candidate.csv', newline='', encoding='utf_8') as file:
    reader = csv.reader(file)
    data = list(reader)
print(data)
[['state', 'county', 'candidate', 'party', 'votes'], ['Delaware', 'Kent County', 'Joe Biden', 'DEM', '44518'], ['Delaware', 'Kent County', 'Donald Trump', 'REP', '40976'], ['Delaware', 'Kent County', 'Jo Jorgensen', 'LIB', '1044'], ['Delaware', 'Kent County', 'Howie Hawkins', 'GRN', '420'], ['Delaware', 'Kent County', ' Write-ins', 'WRI', '0'], ['Delaware', 'New Castle County', 'Joe Biden', 'DEM', '194245'], ['Delaware', 'New Castle County', 'Donald Trump', 'REP', '87687'], ['Delaware', 'New Castle County', 'Jo Jorgensen', 'LIB', '2932'], ['Delaware', 'New Castle County', 'Howie Hawkins', 'GRN', '1277'], ['Delaware', 'New Castle County', ' Write-ins', 'WRI', '0'], ['Delaware', 'Sussex County', 'Donald Trump', 'REP', '71196'], ['Delaware', 'Sussex County', 'Joe Biden', 'DEM', '56657'], ['Delaware', 'Sussex County', 'Jo Jorgensen', 'LIB', '1003'], ['Delaware', 'Sussex County', 'Howie Hawkins', 'GRN', '437'], ['District of Columbia', 'District of Columbia', 'Joe Biden', 'DEM', '31723'], ['District of Columbia', 'District of Columbia', 'Donald Trump', 'REP', '1239'], ['District of Columbia', 'District of Columbia', ' Write-ins', 'WRI', '206'], ['District of Columbia', 'District of Columbia', 'Howie Hawkins', 'GRN', '192'], ['District of Columbia', 'District of Columbia', 'Jo Jorgensen', 'LIB', '147'], ['District of Columbia', 'District of Columbia', 'Gloria La Riva', 'PSL', '77'], ['District of Columbia', 'District of Columbia', 'Brock Pierce', 'IND', '28'], ['District of Columbia', 'Ward 2', 'Joe Biden', 'DEM', '25228'], ['District of Columbia', 'Ward 2', 'Donald Trump', 'REP', '2466'], ['District of Columbia', 'Ward 2', ' Write-ins', 'WRI', '298'], ['District of Columbia', 'Ward 2', 'Jo Jorgensen', 'LIB', '229'], ['District of Columbia', 'Ward 2', 'Howie Hawkins', 'GRN', '96'], ['District of Columbia', 'Ward 2', 'Gloria La Riva', 'PSL', '37'], ['District of Columbia', 'Ward 2', 'Brock Pierce', 'IND', '32']]
states = ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado',
'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Idaho', 'Hawaii',
'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana',
'Maine', 'Maryland', 'Massachusetts','Michigan','Minnesota','Mississippi',
'Missouri','Montana','Nebraska','Nevada','New Hampshire','New Jersey','New Mexico',
'New York', 'North Carolina','North Dakota','Ohio','Oklahoma','Oregon',
'Pennsylvania','Rhode Island','South Carolina','South Dakota','Tennessee','Texas',
'Utah','Vermont','Virginia','Washington','West Virginia', 'Wisconsin','Wyoming']
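If your objective is only to "check if the CSV file contains all the states from the U.S", then you can build the set of unique states in your file and check that it has exactly 50 members.
Keep in mind that the sample data also contains District of Columbia, which is not in your states list, so it is counted as well.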
number = len(set(record[0].lower() for record in data[1:]))
# Expected: number should be 50
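A bare count cannot tell which states are absent, and extra values such as District of Columbia inflate it. As a sketch, a set difference against the expected names reports exactly what is missing. The helper name missing_states and the shortened sample data below are illustrative assumptions.

```python
def missing_states(rows, expected_states):
    """Return the expected states that never appear in column 0 of rows."""
    # rows[0] is assumed to be the header row, so it is skipped
    found = {row[0].strip() for row in rows[1:] if row}
    return sorted(set(expected_states) - found)


sample = [
    ["state", "county", "candidate", "party", "votes"],
    ["Delaware", "Kent County", "Joe Biden", "DEM", "44518"],
    ["District of Columbia", "Ward 2", "Joe Biden", "DEM", "25228"],
]
expected = ["Alabama", "Delaware", "Wyoming"]

print(missing_states(sample, expected))  # ['Alabama', 'Wyoming']
```

An empty result means every expected state occurs at least once, regardless of any extra values in the file.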
This example will count every state found in your data list:
counter = {}
for state, *_ in data:
    if state in states:
        counter.setdefault(state, 0)
        counter[state] += 1
for state in states:
    print("{:<20} {}".format(state, counter.get(state, 0)))
print()
print("Total states found:", len(counter))
Prints:
Alabama 0
Alaska 0
Arizona 0
Arkansas 0
California 0
Colorado 0
Connecticut 0
Delaware 14
Florida 0
Georgia 0
Idaho 0
Hawaii 0
Illinois 0
Indiana 0
Iowa 0
Kansas 0
Kentucky 0
Louisiana 0
Maine 0
Maryland 0
Massachusetts 0
Michigan 0
Minnesota 0
Mississippi 0
Missouri 0
Montana 0
Nebraska 0
Nevada 0
New Hampshire 0
New Jersey 0
New Mexico 0
New York 0
North Carolina 0
North Dakota 0
Ohio 0
Oklahoma 0
Oregon 0
Pennsylvania 0
Rhode Island 0
South Carolina 0
South Dakota 0
Tennessee 0
Texas 0
Utah 0
Vermont 0
Virginia 0
Washington 0
West Virginia 0
Wisconsin 0
Wyoming 0
Total states found: 1
P.S.: To speed up, you can convert states from list to set beforehand.
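The set conversion mentioned in the P.S. combines naturally with collections.Counter from the standard library. This is only a sketch; the function name count_state_rows and the tiny sample are assumptions for illustration.

```python
from collections import Counter


def count_state_rows(rows, states):
    """Count how many rows belong to each known state."""
    state_set = set(states)  # set membership tests are O(1) on average
    return Counter(row[0] for row in rows if row and row[0] in state_set)


sample_rows = [
    ["state", "county", "candidate", "party", "votes"],
    ["Delaware", "Kent County", "Joe Biden", "DEM", "44518"],
    ["Delaware", "Kent County", "Donald Trump", "REP", "40976"],
    ["District of Columbia", "Ward 2", "Joe Biden", "DEM", "25228"],
]

counts = count_state_rows(sample_rows, ["Alabama", "Delaware"])
print(counts["Delaware"], counts["Alabama"], len(counts))  # 2 0 1
```

A Counter returns 0 for keys it has never seen, so no .get(state, 0) fallback is needed when printing the table.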
First, a tip: it is probably easier to use csv.DictReader in this case, as it gives you labelled rows and automatically consumes the first row as the header. It isn't necessary, but it makes the code easier to read.
import csv
with open('test.csv') as f:
    data = list(csv.DictReader(f))
print(data)
# prints: [
# {'state': 'Delaware', ' county': ' Kent County', ' candidate': ' Joe Biden', ' party': ' DEM', ' votes': ' 44518'},
# {'state': 'Delaware', ' county': ' Kent County', ' candidate': ' Donald Trump', ' party': ' REP', ' votes': ' 40976'}
# ...
# ]
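The printed keys above carry a leading space (' county', ' votes'), which suggests the test file has a space after each comma. Assuming that is the case, passing skipinitialspace=True to the reader drops the space from both keys and values. The sketch reads from an in-memory string (RAW is made-up sample text) instead of test.csv.

```python
import csv
import io

# Hypothetical file content with a space after every comma
RAW = (
    "state, county, candidate, party, votes\n"
    "Delaware, Kent County, Joe Biden, DEM, 44518\n"
)


def read_rows(text):
    """Parse CSV text into dicts, ignoring whitespace that follows a delimiter."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return list(reader)


rows = read_rows(RAW)
print(rows[0])
```

With clean keys, lookups such as row['county'] work without having to remember the stray space.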
Then, you can use this expression to get all the states that are mentioned in the csv file:
states_in_csv = set(line['state'] for line in data)
print(states_in_csv)
# {'Delaware', 'District of Columbia', ... }
line['state'] for line in data is a generator expression that extracts just the 'state' field of each line. set() builds the set of those states, i.e. it removes all duplicates.
Then, you can easily test how many of your states are represented in your table. For example:
num_states = 0
# states is the list of state names from the question
for state in states:
    if state in states_in_csv:
        num_states += 1
print("number of states:", num_states)
This is efficient because checking whether a value is in a set takes constant time on average, so you don't have to search your whole table for each state.
It looks like your states list contains every state. If you just want to know how many states were in the table, you can simply use len(states_in_csv)
You can use a hash map, which in Python is a dictionary; it is well suited to this job. This snippet can help you:
counts = {}
for i in data:
    # verify that the state exists
    if i[0] in states:
        if i[0] in counts:
            counts[i[0]] += 1
        else:
            counts[i[0]] = 1
for k in counts:
    if counts[k] > 1:
        print(f"The state {k} appears {counts[k]} times")
import csv
with open('president_county_candidate.csv', newline='', encoding='utf_8') as file:
    reader = csv.reader(file)
    data = list(reader)
states = ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado',
'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Idaho', 'Hawaii',
'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana',
'Maine', 'Maryland', 'Massachusetts','Michigan','Minnesota','Mississippi',
'Missouri','Montana','Nebraska','Nevada','New Hampshire','New Jersey','New Mexico',
'New York', 'North Carolina','North Dakota','Ohio','Oklahoma','Oregon',
'Pennsylvania','Rhode Island','South Carolina','South Dakota','Tennessee','Texas',
'Utah','Vermont','Virginia','Washington','West Virginia', 'Wisconsin','Wyoming']
for row in data:
    for state in states:
        # each row is a list, so compare its first field (the state name)
        if row[0] == state:
            print(state)

json to dict, creating multiple rows based on a nested column?

I'm having a weird problem that I'm not sure how to approach. I basically am calling a json endpoint and I get this:
{'Address': 'One Apple Park Way, Cupertino, CA, United States, 95014',
'AddressData': {'City': 'Cupertino',
'Country': 'United States',
'State': 'CA',
'Street': 'One Apple Park Way',
'ZIP': '95014'},
'CIK': '0000320193',
'CUSIP': '037833100',
'Code': 'AAPL',
'CountryISO': 'US',
'CountryName': 'USA',
'CurrencyCode': 'USD',
'CurrencyName': 'US Dollar',
'CurrencySymbol': '$',
'Description': 'Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide. It also sells various related services. The company offers iPhone, a line of smartphones; Mac, a line of personal computers; iPad, a line of multi-purpose tablets; and wearables, home, and accessories comprising AirPods, Apple TV, Apple Watch, Beats products, HomePod, iPod touch, and other Apple-branded and third-party accessories. It also provides AppleCare support services; cloud services store services; and operates various platforms, including the App Store, that allow customers to discover and download applications and digital content, such as books, music, video, games, and podcasts. In addition, the company offers various services, such as Apple Arcade, a game subscription service; Apple Music, which offers users a curated listening experience with on-demand radio stations; Apple News+, a subscription news and magazine service; Apple TV+, which offers exclusive original content; Apple Card, a co-branded credit card; and Apple Pay, a cashless payment service, as well as licenses its intellectual property. The company serves consumers, and small and mid-sized businesses; and the education, enterprise, and government markets. It sells and delivers third-party applications for its products through the App Store. The company also sells its products through its retail and online stores, and direct sales force; and third-party cellular network carriers, wholesalers, retailers, and resellers. Apple Inc. was founded in 1977 and is headquartered in Cupertino, California.',
'EmployerIdNumber': '94-2404110',
'Exchange': 'NASDAQ',
'FiscalYearEnd': 'September',
'FullTimeEmployees': 147000,
'GicGroup': 'Technology Hardware & Equipment',
'GicIndustry': 'Technology Hardware, Storage & Peripherals',
'GicSector': 'Information Technology',
'GicSubIndustry': 'Technology Hardware, Storage & Peripherals',
'HomeCategory': 'Domestic',
'IPODate': '1980-12-12',
'ISIN': 'US0378331005',
'Industry': 'Consumer Electronics',
'InternationalDomestic': 'International/Domestic',
'IsDelisted': False,
'Listings': {'0': {'Code': '0R2V', 'Exchange': 'LSE', 'Name': '0R2V'},
'1': {'Code': 'AAPL', 'Exchange': 'BA', 'Name': 'Apple Inc. CEDEAR'},
'2': {'Code': 'AAPL34', 'Exchange': 'SA', 'Name': 'Apple Inc'}},
'LogoURL': '/img/logos/US/aapl.png',
'Name': 'Apple Inc',
'Officers': {'0': {'Name': 'Mr. Timothy D. Cook',
'Title': 'CEO & Director',
'YearBorn': '1961'},
'1': {'Name': 'Mr. Luca Maestri',
'Title': 'CFO & Sr. VP',
'YearBorn': '1964'},
'2': {'Name': 'Mr. Jeffrey E. Williams',
'Title': 'Chief Operating Officer',
'YearBorn': '1964'},
'3': {'Name': 'Ms. Katherine L. Adams',
'Title': 'Sr. VP, Gen. Counsel & Sec.',
'YearBorn': '1964'},
'4': {'Name': "Ms. Deirdre O'Brien",
'Title': 'Sr. VP of People & Retail',
'YearBorn': '1967'},
'5': {'Name': 'Mr. Chris Kondo',
'Title': 'Sr. Director of Corp. Accounting',
'YearBorn': 'NA'},
'6': {'Name': 'Mr. James Wilson',
'Title': 'Chief Technology Officer',
'YearBorn': 'NA'},
'7': {'Name': 'Ms. Mary Demby',
'Title': 'Chief Information Officer',
'YearBorn': 'NA'},
'8': {'Name': 'Ms. Nancy Paxton',
'Title': 'Sr. Director of Investor Relations & Treasury',
'YearBorn': 'NA'},
'9': {'Name': 'Mr. Greg Joswiak',
'Title': 'Sr. VP of Worldwide Marketing',
'YearBorn': 'NA'}},
'Phone': '408-996-1010',
'Sector': 'Technology',
'Type': 'Common Stock',
'UpdatedAt': '2021-02-25',
'WebURL': 'http://www.apple.com'}
You can see it is mostly flat, except that a few keys have nested values (e.g. AddressData and Officers). When I convert it to a dataframe I get:
Code Type Name Exchange CurrencyCode CurrencyName CurrencySymbol CountryName CountryISO ISIN CUSIP CIK EmployerIdNumber FiscalYearEnd IPODate InternationalDomestic Sector Industry GicSector GicGroup GicIndustry GicSubIndustry HomeCategory IsDelisted Description Address AddressData Listings Officers Phone WebURL LogoURL FullTimeEmployees UpdatedAt
Street AAPL Common Stock Apple Inc NASDAQ USD US Dollar $ USA US US0378331005 037833100 0000320193 94-2404110 September 1980-12-12 International/Domestic Technology Consumer Electronics Information Technology Technology Hardware & Equipment Technology Hardware, Storage & Peripherals Technology Hardware, Storage & Peripherals Domestic False Apple Inc. designs, manufactures, and markets ... One Apple Park Way, Cupertino, CA, United Stat... One Apple Park Way NaN NaN 408-996-1010 http://www.apple.com /img/logos/US/aapl.png 147000 2021-02-25
City AAPL Common Stock Apple Inc NASDAQ USD US Dollar $ USA US US0378331005 037833100 0000320193 94-2404110 September 1980-12-12 International/Domestic Technology Consumer Electronics Information Technology Technology Hardware & Equipment Technology Hardware, Storage & Peripherals Technology Hardware, Storage & Peripherals Domestic False Apple Inc. designs, manufactures, and markets ... One Apple Park Way, Cupertino, CA, United Stat... Cupertino NaN NaN 408-996-1010 http://www.apple.com /img/logos/US/aapl.png 147000 2021-02-25
State AAPL Common Stock Apple Inc NASDAQ USD US Dollar $ USA US US0378331005 037833100 0000320193 94-2404110 September 1980-12-12 International/Domestic Technology Consumer Electronics Information Technology Technology Hardware & Equipment Technology Hardware, Storage & Peripherals Technology Hardware, Storage & Peripherals Domestic False Apple Inc. designs, manufactures, and markets ... One Apple Park Way, Cupertino, CA, United Stat... CA NaN NaN 408-996-1010 http://www.apple.com /img/logos/US/aapl.png 147000 2021-02-25
Country AAPL Common Stock Apple Inc NASDAQ USD US Dollar $ USA US US0378331005 037833100 0000320193 94-2404110 September 1980-12-12 International/Domestic Technology Consumer Electronics Information Technology Technology Hardware & Equipment Technology Hardware, Storage & Peripherals Technology Hardware, Storage & Peripherals Domestic False Apple Inc. designs, manufactures, and markets ... One Apple Park Way, Cupertino, CA, United Stat... United States NaN NaN 408-996-1010 http://www.apple.com /img/logos/US/aapl.png 147000 2021-02-25
ZIP AAPL Common Stock Apple Inc NASDAQ USD US Dollar $ USA US US0378331005 037833100 0000320193 94-2404110 September 1980-12-12 International/Domestic Technology Consumer Electronics Information Technology Technology Hardware & Equipment Technology Hardware, Storage & Peripherals Technology Hardware, Storage & Peripherals Domestic False Apple Inc. designs, manufactures, and markets ... One Apple Park Way, Cupertino, CA, United Stat... 95014 NaN NaN 408-996-1010 http://www.apple.com /img/logos/US/aapl.png 147000 2021-02-25
0 AAPL Common Stock Apple Inc NASDAQ USD US Dollar $ USA US US0378331005 037833100 0000320193 94-2404110 September 1980-12-12 International/Domestic Technology Consumer Electronics Information Technology Technology Hardware & Equipment Technology Hardware, Storage & Peripherals Technology Hardware, Storage & Peripherals Domestic False Apple Inc. designs, manufactures, and markets ... One Apple Park Way, Cupertino, CA, United Stat... NaN {'Code': '0R2V', 'Exchange': 'LSE', 'Name': '0... {'Name': 'Mr. Timothy D. Cook', 'Title': 'CEO ... 408-996-1010 http://www.apple.com /img/logos/US/aapl.png 147000 2021-02-25
1 AAPL Common Stock Apple Inc NASDAQ USD US Dollar $ USA US US0378331005 037833100 0000320193 94-2404110 September 1980-12-12 International/Domestic Technology Consumer Electronics Information Technology Technology Hardware & Equipment Technology Hardware, Storage & Peripherals Technology Hardware, Storage & Peripherals Domestic False Apple Inc. designs, manufactures, and markets ... One Apple Park Way, Cupertino, CA, United Stat... NaN {'Code': 'AAPL', 'Exchange': 'BA', 'Name': 'Ap... {'Name': 'Mr. Luca Maestri', 'Title': 'CFO & ... 408-996-1010 http://www.apple.com /img/logos/US/aapl.png 147000 2021-02-25
2 AAPL Common Stock Apple Inc NASDAQ USD US Dollar $ USA US US0378331005 037833100 0000320193 94-2404110 September 1980-12-12 International/Domestic Technology Consumer Electronics Information Technology Technology Hardware & Equipment Technology Hardware, Storage & Peripherals Technology Hardware, Storage & Peripherals Domestic False Apple Inc. designs, manufactures, and markets ... One Apple Park Way, Cupertino, CA, United Stat... NaN {'Code': 'AAPL34', 'Exchange': 'SA', 'Name': '... {'Name': 'Mr. Jeffrey E. Williams', 'Title': '... 408-996-1010 http://www.apple.com /img/logos/US/aapl.png 147000 2021-02-25
3 AAPL Common Stock Apple Inc NASDAQ USD US Dollar $ USA US US0378331005 037833100 0000320193 94-2404110 September 1980-12-12 International/Domestic Technology Consumer Electronics Information Technology Technology Hardware & Equipment Technology Hardware, Storage & Peripherals Technology Hardware, Storage & Peripherals Domestic False Apple Inc. designs, manufactures, and markets ... One Apple Park Way, Cupertino, CA, United Stat... NaN NaN {'Name': 'Ms. Katherine L. Adams', 'Title': 'S... 408-996-1010 http://www.apple.com /img/logos/US/aapl.png 147000 2021-02-25
4 AAPL Common Stock Apple Inc NASDAQ USD US Dollar $ USA US US0378331005 037833100 0000320193 94-2404110 September 1980-12-12 International/Domestic Technology Consumer Electronics Information Technology Technology Hardware & Equipment Technology Hardware, Storage & Peripherals Technology Hardware, Storage & Peripherals Domestic False Apple Inc. designs, manufactures, and markets ... One Apple Park Way, Cupertino, CA, United Stat... NaN NaN {'Name': 'Ms. Deirdre O'Brien', 'Title': 'Sr.... 408-996-1010 http://www.apple.com /img/logos/US/aapl.png 147000 2021-02-25
5 AAPL Common Stock Apple Inc NASDAQ USD US Dollar $ USA US US0378331005 037833100 0000320193 94-2404110 September 1980-12-12 International/Domestic Technology Consumer Electronics Information Technology Technology Hardware & Equipment Technology Hardware, Storage & Peripherals Technology Hardware, Storage & Peripherals Domestic False Apple Inc. designs, manufactures, and markets ... One Apple Park Way, Cupertino, CA, United Stat... NaN NaN {'Name': 'Mr. Chris Kondo', 'Title': 'Sr. Dir... 408-996-1010 http://www.apple.com /img/logos/US/aapl.png 147000 2021-02-25
6 AAPL Common Stock Apple Inc NASDAQ USD US Dollar $ USA US US0378331005 037833100 0000320193 94-2404110 September 1980-12-12 International/Domestic Technology Consumer Electronics Information Technology Technology Hardware & Equipment Technology Hardware, Storage & Peripherals Technology Hardware, Storage & Peripherals Domestic False Apple Inc. designs, manufactures, and markets ... One Apple Park Way, Cupertino, CA, United Stat... NaN NaN {'Name': 'Mr. James Wilson', 'Title': 'Chief ... 408-996-1010 http://www.apple.com /img/logos/US/aapl.png 147000 2021-02-25
7 AAPL Common Stock Apple Inc NASDAQ USD US Dollar $ USA US US0378331005 037833100 0000320193 94-2404110 September 1980-12-12 International/Domestic Technology Consumer Electronics Information Technology Technology Hardware & Equipment Technology Hardware, Storage & Peripherals Technology Hardware, Storage & Peripherals Domestic False Apple Inc. designs, manufactures, and markets ... One Apple Park Way, Cupertino, CA, United Stat... NaN NaN {'Name': 'Ms. Mary Demby', 'Title': 'Chief In... 408-996-1010 http://www.apple.com /img/logos/US/aapl.png 147000 2021-02-25
8 AAPL Common Stock Apple Inc NASDAQ USD US Dollar $ USA US US0378331005 037833100 0000320193 94-2404110 September 1980-12-12 International/Domestic Technology Consumer Electronics Information Technology Technology Hardware & Equipment Technology Hardware, Storage & Peripherals Technology Hardware, Storage & Peripherals Domestic False Apple Inc. designs, manufactures, and markets ... One Apple Park Way, Cupertino, CA, United Stat... NaN NaN {'Name': 'Ms. Nancy Paxton', 'Title': 'Sr. Di... 408-996-1010 http://www.apple.com /img/logos/US/aapl.png 147000 2021-02-25
9 AAPL Common Stock Apple Inc NASDAQ USD US Dollar $ USA US US0378331005 037833100 0000320193 94-2404110 September 1980-12-12 International/Domestic Technology Consumer Electronics Information Technology Technology Hardware & Equipment Technology Hardware, Storage & Peripherals Technology Hardware, Storage & Peripherals Domestic False Apple Inc. designs, manufactures, and markets ... One Apple Park Way, Cupertino, CA, United Stat... NaN NaN {'Name': 'Mr. Greg Joswiak', 'Title': 'Sr. VP... 408-996-1010 http://www.apple.com /img/logos/US/aapl.png 147000 2021-02-25
Basically it looks like each key in the nested keys is creating a new row in the dataframe. Here's my code:
import json
import requests
import pandas as pd
companyData = requests.get(url="https://eodhistoricaldata.com/api/fundamentals/AAPL.US?api_token=OeAFFmMliFG5orCUuwAKQ8l4WWFQ67YX").json()
General = pd.DataFrame.from_dict(companyData['General'])
General
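In this case, my goal is for everything to be in just one row, with any nesting shown as JSON in the relevant column, rather than creating duplicate rows for every item in the nested JSON.
Try pd.json_normalize() with the max_level argument. Setting max_level=0 stops the normalization from descending into nested dicts, so each nested value stays in a single cell: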
df = pd.json_normalize(companyData['General'],max_level=0)
print(df)
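To see the effect without calling the API, here is a small sketch on a hand-made record shaped like the response above. The record is trimmed sample data and to_single_row is a hypothetical helper name. The extra rows in the question appear because the DataFrame constructor uses the keys of the nested dicts as row labels and repeats the scalar values on every row.

```python
import pandas as pd

# Trimmed, hand-written stand-in for companyData['General']
record = {
    "Code": "AAPL",
    "Name": "Apple Inc",
    "AddressData": {"City": "Cupertino", "State": "CA"},
    "Officers": {
        "0": {"Name": "Mr. Timothy D. Cook", "Title": "CEO & Director"},
        "1": {"Name": "Mr. Luca Maestri", "Title": "CFO & Sr. VP"},
    },
}


def to_single_row(rec):
    """Build a one-row frame, leaving nested dicts untouched in their cells."""
    return pd.json_normalize(rec, max_level=0)


one_row = to_single_row(record)
print(one_row.shape)  # (1, 4)
print(one_row.loc[0, "Officers"]["0"]["Name"])  # Mr. Timothy D. Cook
```

Each nested value is still a Python dict inside its cell, so it can be serialized with json.dumps later if a JSON string is wanted.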

Creating a dict of list from pandas row?

I have a weird problem. I have an index and a bunch of columns in a dataframe. I want the index to be the key and all the other columns to be in a list. Here's an example:
df:
0 1 2 3
Barker Minerals Ltd
Blackout Media Corp
Booking Holdings Inc Booking Holdings Inc Booking Holdings Inc 4.10 04/13/2025 BOOKING HOLDINGS INC
Baker Hughes Company Baker Hughes Company BAKER HUGHES A GE COMPANY LLC-3.34%-12-15-2027 BAKER HUGHES A GE COMPANY LLC-3.14%-11-7-2029
Bank of Queensland Limited Bank of Queensland Limited Bank of Queensland Limited FRN 10-MAY-2026 3.50% 05/10/26 Bank of Queensland Limited FRN 26-OCT-2020 1.27% 10/26/20 Bank of Queensland Limited FRN 16-NOV-2021 1.12% 11/16/21
If I do this command it turns everything into a list when I want it to be a dict of a list:
df.to_numpy().tolist()
I want a dict with each key a list of values in the other columns(kind of like this):
{
Barker Minerals Ltd:
Blackout Media Corp:
Booking Holdings Inc: [Booking Holdings Inc ,Booking Holdings Inc 4.10 04/13/2025,BOOKING HOLDINGS INC]
Baker Hughes Company: [Baker Hughes Company ,BAKER HUGHES A GE COMPANY LLC-3.34%-12-15-2027,BAKER HUGHES A GE COMPANY LLC-3.14%-11-7-2029]
Bank of Queensland Limited: [Bank of Queensland Limited ,Bank of Queensland Limited FRN 10-MAY-2026 3.50% 05/10/26,Bank of Queensland Limited FRN 26-OCT-2020 1.27% 10/26/20, Bank of Queensland Limited FRN 16-NOV-2021 1.12% 11/16/21]
}
Is this possible to do?
The easiest answer as pointed out in the comments by Michael Szczesny:
df.T.to_dict(orient="list")
The output:
{'Barker Minerals Ltd': [nan, nan, nan, nan],
'Blackout Media Corp': [nan, nan, nan, nan],
'Booking Holdings Inc': ['Booking Holdings Inc',
'Booking Holdings Inc 4.10 04/13/2025',
'BOOKING HOLDINGS INC',
nan],
'Baker Hughes Company': ['Baker Hughes Company',
'BAKER HUGHES A GE COMPANY LLC-3.34%-12-15-2027',
'BAKER HUGHES A GE COMPANY LLC-3.14%-11-7-2029',
nan],
'Bank of Queensland Limited': ['Bank of Queensland Limited',
'Bank of Queensland Limited FRN 10-MAY-2026 3.50% 05/10/26',
'Bank of Queensland Limited FRN 26-OCT-2020 1.27% 10/26/20',
' Bank of Queensland Limited FRN 16-NOV-2021 1.12% 11/16/21']}
Also, in case you would like to lose all the nans, then the code is as follows:
df = pd.read_csv("df_to_dict.csv", index_col=0)
val = df.T.to_dict(orient="list")
cleaned_val = {}
for i in val:
    cleaned_val[i] = [j for j in val[i] if str(j)!="nan"]
cleaned_val
The output is as follows:
{'Barker Minerals Ltd': [],
'Blackout Media Corp': [],
'Booking Holdings Inc': ['Booking Holdings Inc',
'Booking Holdings Inc 4.10 04/13/2025',
'BOOKING HOLDINGS INC'],
'Baker Hughes Company': ['Baker Hughes Company',
'BAKER HUGHES A GE COMPANY LLC-3.34%-12-15-2027',
'BAKER HUGHES A GE COMPANY LLC-3.14%-11-7-2029'],
'Bank of Queensland Limited': ['Bank of Queensland Limited',
'Bank of Queensland Limited FRN 10-MAY-2026 3.50% 05/10/26',
'Bank of Queensland Limited FRN 26-OCT-2020 1.27% 10/26/20',
' Bank of Queensland Limited FRN 16-NOV-2021 1.12% 11/16/21']}
See the pandas documentation for DataFrame.to_dict() for details.
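The str(j) != "nan" test would also drop a cell that contains the literal text "nan". A possible alternative, sketched here with a hypothetical helper rows_to_lists and a two-row sample frame, filters with pd.notna instead:

```python
import pandas as pd


def rows_to_lists(df):
    """Map each index label to the list of non-missing values in its row."""
    return {
        key: [value for value in row if pd.notna(value)]
        for key, row in zip(df.index, df.to_numpy().tolist())
    }


nan = float("nan")
sample_df = pd.DataFrame(
    {
        0: [nan, "Booking Holdings Inc"],
        1: [nan, "Booking Holdings Inc 4.10 04/13/2025"],
        2: [nan, nan],
    },
    index=["Barker Minerals Ltd", "Booking Holdings Inc"],
)

print(rows_to_lists(sample_df))
```

This assumes the index labels are unique; duplicate labels would overwrite each other in the resulting dict.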

I am trying to get the output of a replaced string value in a DataFrame, but it's not changing

I am trying to replace a string value in a dataframe, but it's not changing. Below is my code:
import pandas as pd
import numpy as np
def answer_one():
    energy = pd.read_excel('Energy Indicators.xls', skip_footer=38, skiprows=17).drop(
        ['Unnamed: 0', 'Unnamed: 1'], axis='columns')
    energy = energy.rename(columns={'Unnamed: 2': 'Country',
                                    'Petajoules': 'Energy Supply',
                                    'Gigajoules': 'Energy Supply per Capita',
                                    '%': '% Renewable'})
    energy['Energy Supply'] *= 1000000
    energy['Country'] = energy['Country'].replace({
        'China, Hong Kong Special Administrative Region': 'Hong Kong',
        'United Kingdom of Great Britain and Northern Ireland': 'United Kingdom',
        'Republic Of Korea': 'South Korea',
        'United States of America': 'United States',
        'Iran (Islamic Republic of)': 'Iran'})
    return energy
answer_one()
I am getting all the output except that last one, where I am trying to replace the country names in 'Country' column.
I don't see any issue with the .replace() call itself. There may be something in the dataset, which we cannot see, that prevents the names from matching. I created a sample df with the values from your code and replaced them the same way you did.
The df below holds the original values for the Country column.
df = pd.DataFrame(['China, Hong Kong Special Administrative Region',
'United Kingdom of Great Britain and Northern Ireland',
'Republic Of Korea',
'United States of America',
'Iran (Islamic Republic of)'],
columns=['Country'])
print(df)
Before replace:
Country
0 China, Hong Kong Special Administrative Region
1 United Kingdom of Great Britain and Northern I...
2 Republic Of Korea
3 United States of America
4 Iran (Islamic Republic of)
I used the same .replace() call with the same pairs, written one pair per line:
df['Country'] = df['Country'].replace({'China, Hong Kong Special Administrative Region': 'Hong Kong',
'United Kingdom of Great Britain and Northern Ireland': 'United Kingdom',
'Republic Of Korea': 'South Korea',
'United States of America': 'United States',
'Iran (Islamic Republic of)': 'Iran'})
print(df)
After replace:
          Country
0       Hong Kong
1  United Kingdom
2     South Korea
3   United States
4            Iran
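If the same call leaves the real data unchanged, one possible cause is that the values in the file do not match the dictionary keys exactly: a dict passed to Series.replace only swaps cells whose whole value equals a key. The sketch below uses made-up values (the trailing footnote digit and the stray whitespace are assumptions about what the hidden spreadsheet might contain, not something confirmed in the question) to show one way to inspect and normalize the column before replacing.

```python
import pandas as pd

# Hypothetical raw values: the trailing digit and extra spaces are assumed
# debris for illustration, not taken from the real spreadsheet.
raw = pd.DataFrame({'Country': ['United States of America20',
                                ' Iran (Islamic Republic of) ',
                                'Republic Of Korea']})

mapping = {'United States of America': 'United States',
           'Iran (Islamic Republic of)': 'Iran',
           'Republic Of Korea': 'South Korea'}

# repr() makes hidden whitespace and stray characters visible.
print([repr(name) for name in raw['Country'].unique()])

# The dict keys must equal the whole cell value, so the first two rows
# are left untouched by this call.
unchanged = raw['Country'].replace(mapping)

# Normalize first: trim whitespace, then drop any trailing digits.
cleaned = (raw['Country']
           .str.strip()
           .str.replace(r'\d+$', '', regex=True))
fixed = cleaned.replace(mapping)
print(fixed.tolist())
```

Printing the unique values with repr() on the real column is the quickest check; the cleaning step only helps if the mismatch really is of this kind.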

How to scrape a website with multiple pages and the links inside into a pandas dataframe?

I need to get the data for every firm listed at the following link, including the contents behind each firm's own link. Each company's data should be in one row. I am not sure which approach to take or where to begin.
Here is the website: https://www.adgm.com/public-registers/fsra
I have tried to at least get the information into my code and print it from the IDE, but I have failed and I don't understand why.
import requests
import pandas as pd
from bs4 import BeautifulSoup
res = requests.get("https://www.adgm.com/public-registers/fsra")
soup = BeautifulSoup(res.content,'html.parser')
table = soup.find_all('.every-accord')
for element in table:
    print(element.text)
That is the code I have been trying. Each table row has the class "every-accord", which is what I am trying to get. It does not give me any errors, but I am not getting any results either.
Thanks for any help in advance.
You can iterate over the container:
import requests
from bs4 import BeautifulSoup as soup
d = soup(requests.get('https://www.adgm.com/public-registers/fsra').text, 'html.parser')
# One list per firm: alternating label/value texts, then the detail link and the firm name
results = [[c.text for c in i.find_all('div', {'class':'col-sm-6'})]
           + [i.a['href'], i.find('div', {'class':'col-lg-5'}).text]
           for i in d.find_all('div', {'class':'every-accord'})]
# Drop the label entries so that only the values remain
no_headers = [[i for i in c[1:] if i not in {'Company Status', 'Address'}] for c in results]
Output:
[['160024', 'Active', 'Level 7, Al Sila Tower, Abu Dhabi Global Market Square, Al Maryah Island, Abu Dhabi, United Arab Emirates', '/public-registers/fsra/fsf/aarna-capital-limited', 'Aarna Capital Limited'],
 ['160007', 'Active', 'Unit 8, 6th floor Al Khatem Tower, Abu Dhabi Global Markets Square, Al Maryah Island Abu Dhabi, United Arab Emirates P.O. Box 764605', '/public-registers/fsra/fsf/aberdeen-asset-middle-east-limited', 'Aberdeen Asset Middle East Limited'],
 ['180041', 'Active', 'Floor 22, Al Sila Tower, Abu Dhabi Global Market Square, Al Maryah Island', '/public-registers/fsra/fsf/abu-dhabi-catalyst-partners-limited', 'Abu Dhabi Catalyst Partners Limited'],
 ['180021', 'Active', 'Unit 5, 6th Floor, Al Khatem Tower, Abu Dhabi Global Market Square, Al Maryah Island, Abu Dhabi, United Arab Emirates', '/public-registers/fsra/fsf/ad-global-investors-limited', 'AD Global Investors Limited'],
 ['180039', 'Active', '3419, 34th Floor, Al Maqam Tower, Abu Dhabi Global Market Square, Al Maryah Island, Abu Dhabi, United Arab Emirates', '/public-registers/fsra/fsf/ad-investment-management-limited', 'AD Investment Management Limited'],
 ['170036', 'Active', '10th Floor, Al Sila Tower, ADGM Square, Al Maryah Island', '/public-registers/fsra/fsf/adcb-asset-management-ltd', 'ADCB Asset Management Ltd.'],
 ['160006', 'Active', 'Level 34, Al Khatem Tower, Abu Dhabi Global Market Square, Al Maryah Island, Abu Dhabi, United Arab Emirates', '/public-registers/fsra/fsf/adcm-altus-investment-management-limited', 'ADCM Altus Investment Management Limited'],
 ['160005', 'Active', '33rd floor, Al Khatem Tower, Abu Dhabi Global Market Square, Al Maryah Island, Abu Dhabi, United Arab Emirates', '/public-registers/fsra/fsf/adcorp-ltd', 'ADCORP Ltd'],
 ['180024', 'Active', 'Unit 10, Level 6, Al Khatem Tower, Abu Dhabi Global Market Square, Al Maryah Island, Abu Dhabi, United Arab Emirates', '/public-registers/fsra/fsf/adnoc-reinsurance-limited', 'ADNOC Reinsurance Limited'],
 ['170025', 'Active', 'Office 712, Al Khatem Tower, Abu Dhabi Global Market Square, Al Maryah Island, Abu Dhabi, United Arab Emirates', '/public-registers/fsra/fsf/ads-investment-solutions-limited', 'ADS Investment Solutions Limited']]
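As for why the original attempt printed nothing: find_all treats its first positional argument as a tag name, so '.every-accord' looks for a tag literally called that and finds none. CSS selector syntax goes through select() instead. A minimal sketch on an inline HTML snippet (the markup is a simplified stand-in written for illustration, not the real page):

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the page markup; the real page has more nesting.
html = """
<div class="every-accord"><a href="/firm-a">Firm A</a></div>
<div class="every-accord"><a href="/firm-b">Firm B</a></div>
"""
page = BeautifulSoup(html, 'html.parser')

# Interpreted as a tag name, so nothing matches.
by_tag_name = page.find_all('.every-accord')

# CSS selector syntax belongs to select().
by_selector = page.select('.every-accord')

# Equivalent lookup with find_all, filtering on the class attribute.
by_class = page.find_all('div', class_='every-accord')

print(len(by_tag_name), len(by_selector), len(by_class))
print([row.a['href'] for row in by_selector])
```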
Edit: formatting columns from results:
new_results = [{**{j[i]:j[i+1] for i in range(0, len(j), 2)}, **{'link':a, 'name':b}} for *j, a, b in results]
Output:
[{'Financial Services Permission Number': '160024', 'Company Status': 'Active', 'Address': 'Level 7, Al Sila Tower, Abu Dhabi Global Market Square, Al Maryah Island, Abu Dhabi, United Arab Emirates', 'link': '/public-registers/fsra/fsf/aarna-capital-limited', 'name': 'Aarna Capital Limited'},
 {'Financial Services Permission Number': '160007', 'Company Status': 'Active', 'Address': 'Unit 8, 6th floor Al Khatem Tower, Abu Dhabi Global Markets Square, Al Maryah Island Abu Dhabi, United Arab Emirates P.O. Box 764605', 'link': '/public-registers/fsra/fsf/aberdeen-asset-middle-east-limited', 'name': 'Aberdeen Asset Middle East Limited'},
 {'Financial Services Permission Number': '180041', 'Company Status': 'Active', 'Address': 'Floor 22, Al Sila Tower, Abu Dhabi Global Market Square, Al Maryah Island', 'link': '/public-registers/fsra/fsf/abu-dhabi-catalyst-partners-limited', 'name': 'Abu Dhabi Catalyst Partners Limited'},
 {'Financial Services Permission Number': '180021', 'Company Status': 'Active', 'Address': 'Unit 5, 6th Floor, Al Khatem Tower, Abu Dhabi Global Market Square, Al Maryah Island, Abu Dhabi, United Arab Emirates', 'link': '/public-registers/fsra/fsf/ad-global-investors-limited', 'name': 'AD Global Investors Limited'},
 {'Financial Services Permission Number': '180039', 'Company Status': 'Active', 'Address': '3419, 34th Floor, Al Maqam Tower, Abu Dhabi Global Market Square, Al Maryah Island, Abu Dhabi, United Arab Emirates', 'link': '/public-registers/fsra/fsf/ad-investment-management-limited', 'name': 'AD Investment Management Limited'},
 {'Financial Services Permission Number': '170036', 'Company Status': 'Active', 'Address': '10th Floor, Al Sila Tower, ADGM Square, Al Maryah Island', 'link': '/public-registers/fsra/fsf/adcb-asset-management-ltd', 'name': 'ADCB Asset Management Ltd.'},
 {'Financial Services Permission Number': '160006', 'Company Status': 'Active', 'Address': 'Level 34, Al Khatem Tower, Abu Dhabi Global Market Square, Al Maryah Island, Abu Dhabi, United Arab Emirates', 'link': '/public-registers/fsra/fsf/adcm-altus-investment-management-limited', 'name': 'ADCM Altus Investment Management Limited'},
 {'Financial Services Permission Number': '160005', 'Company Status': 'Active', 'Address': '33rd floor, Al Khatem Tower, Abu Dhabi Global Market Square, Al Maryah Island, Abu Dhabi, United Arab Emirates', 'link': '/public-registers/fsra/fsf/adcorp-ltd', 'name': 'ADCORP Ltd'},
 {'Financial Services Permission Number': '180024', 'Company Status': 'Active', 'Address': 'Unit 10, Level 6, Al Khatem Tower, Abu Dhabi Global Market Square, Al Maryah Island, Abu Dhabi, United Arab Emirates', 'link': '/public-registers/fsra/fsf/adnoc-reinsurance-limited', 'name': 'ADNOC Reinsurance Limited'},
 {'Financial Services Permission Number': '170025', 'Company Status': 'Active', 'Address': 'Office 712, Al Khatem Tower, Abu Dhabi Global Market Square, Al Maryah Island, Abu Dhabi, United Arab Emirates', 'link': '/public-registers/fsra/fsf/ads-investment-solutions-limited', 'name': 'ADS Investment Solutions Limited'}]
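Since the goal is one row per company in a pandas dataframe, a list of dicts like new_results can be passed straight to pd.DataFrame. The sketch below uses two records copied from the output above in place of a live request. It assumes the relative links resolve against https://www.adgm.com, and the url column name is just an illustrative choice. Each absolute URL could then be requested in turn to collect the per-firm details.

```python
import pandas as pd
from urllib.parse import urljoin

# Two records copied from the output above, standing in for new_results.
new_results = [
    {'Financial Services Permission Number': '160024',
     'Company Status': 'Active',
     'Address': 'Level 7, Al Sila Tower, Abu Dhabi Global Market Square, '
                'Al Maryah Island, Abu Dhabi, United Arab Emirates',
     'link': '/public-registers/fsra/fsf/aarna-capital-limited',
     'name': 'Aarna Capital Limited'},
    {'Financial Services Permission Number': '160005',
     'Company Status': 'Active',
     'Address': '33rd floor, Al Khatem Tower, Abu Dhabi Global Market Square, '
                'Al Maryah Island, Abu Dhabi, United Arab Emirates',
     'link': '/public-registers/fsra/fsf/adcorp-ltd',
     'name': 'ADCORP Ltd'},
]

# Each dict becomes one row; the keys become the column names.
firms = pd.DataFrame(new_results)

# Assumption: the relative links are rooted at the site's domain.
base_url = 'https://www.adgm.com'
firms['url'] = firms['link'].apply(lambda path: urljoin(base_url, path))

print(firms[['name', 'Company Status', 'url']])
```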
