BeautifulSoup webpage scraping

I am trying to scrape a webpage.
from bs4 import BeautifulSoup
import requests
page = requests.get('https://www.mql5.com/en/economic-calendar/united-states')
soup = BeautifulSoup(page.content, 'html.parser')
calender = soup.find(id="economicCalendarTable")
items = calender.find_all(class_="ec-table__title")
print(items)
However, it prints an empty list, even though the webpage has many entries with class_="ec-table__title". What I found is that the tags inside the id="economicCalendarTable" element are all on one (very long) line, so calender.find_all seems to skip everything.
I am trying to get all tags inside id="economicCalendarTable".
Is there a way to do this?

You can use Selenium:
from selenium import webdriver
import re
from bs4 import BeautifulSoup as soup
d = webdriver.Chrome()
d.get('https://www.mql5.com/en/economic-calendar/united-states')
s = soup(d.page_source, 'lxml')
d.quit()  # the rendered HTML is already in s, so the browser can be closed
time = s.find('span', {'id':'economicCalendarTableColumnTime'}).text
title = s.find('div', {'class':'ec-table__title'}).text
classes = ['ec-table__col_time', 'ec-table__curency-name', 'ec-table__col_event', 'ec-table__col_forecast', 'prevValue']
# previous-value cells are matched with a regex; the raw string avoids an invalid "\d" escape warning
full_data = [[i.find('div', {'class':c if c != 'prevValue' else re.compile(r'prevValue\d+')}) for c in classes] for i in s.find_all('div', {'class':'ec-table__item'})]
new_results = [dict(zip(['time', 'name', 'event', 'forcast', 'prevous_value'], [getattr(i, 'text', '') for i in b])) for b in full_data]
print(new_results)
Output:
[{'event': u'Chicago Fed National Activity Index', 'forcast': u'0.14', 'name': u'USD', 'prevous_value': '', 'time': u' 08:30'},
 {'event': u'Markit Manufacturing PMI', 'forcast': u'56.4', 'name': u'USD', 'prevous_value': '', 'time': u' 09:45'},
 {'event': u'Markit Services PMI', 'forcast': u'55', 'name': u'USD', 'prevous_value': '', 'time': u' 09:45'},
 {'event': u'Markit Composite PMI', 'forcast': u'55', 'name': u'USD', 'prevous_value': '', 'time': u' 09:45'},
 {'event': u'New Home Sales m/m', 'forcast': u'-1.2%', 'name': u'USD', 'prevous_value': '', 'time': u' 10:00'},
 {'event': u'New Home Sales', 'forcast': u'0.639 M', 'name': u'USD', 'prevous_value': '', 'time': u' 10:00'},
 {'event': u'EIA Crude Oil Stocks Change', 'forcast': u'-1.791 M', 'name': u'USD', 'prevous_value': '', 'time': u' 10:30'},
 {'event': u'EIA Cushing Crude Oil Stocks Change', 'forcast': u'0.259 M', 'name': u'USD', 'prevous_value': '', 'time': u' 10:30'},
 {'event': u'EIA Crude Oil Imports Change', 'forcast': u'-0.32 M', 'name': u'USD', 'prevous_value': '', 'time': u' 10:30'},
 {'event': u'EIA Distillate Fuel Production Change', 'forcast': u'-0.011 M', 'name': u'USD', 'prevous_value': '', 'time': u' 10:30'},
 {'event': u'EIA Distillates Stocks Change', 'forcast': u'-0.182 M', 'name': u'USD', 'prevous_value': '', 'time': u' 10:30'},
 {'event': u'EIA Gasoline Production Change', 'forcast': u'0.289 M', 'name': u'USD', 'prevous_value': '', 'time': u' 10:30'},
 {'event': u'EIA Heating Oil Stocks Change', 'forcast': u'-0.026 M', 'name': u'USD', 'prevous_value': '', 'time': u' 10:30'},
 {'event': u'EIA Gasoline Stocks Change', 'forcast': u'-3.206 M', 'name': u'USD', 'prevous_value': '', 'time': u' 10:30'},
 {'event': u'FOMC Minutes', 'forcast': u'', 'name': u'USD', 'prevous_value': '', 'time': u' 14:00'},
 {'event': u'Continuing Jobless Claims', 'forcast': u'1.769 M', 'name': u'USD', 'prevous_value': '', 'time': u' 08:30'},
 {'event': u'Initial Jobless Claims', 'forcast': u'216 K', 'name': u'USD', 'prevous_value': '', 'time': u' 08:30'},
 {'event': u'Initial Jobless Claims 4-Week Average', 'forcast': u'213.814 K', 'name': u'USD', 'prevous_value': '', 'time': u' 08:30'},
 {'event': u'HPI m/m', 'forcast': u'0.5%', 'name': u'USD', 'prevous_value': '', 'time': u' 09:00'},
 {'event': u'Existing Home Sales', 'forcast': u'5.45 M', 'name': u'USD', 'prevous_value': '', 'time': u' 10:00'},
 {'event': u'Existing Home Sales m/m', 'forcast': u'0.3%', 'name': u'USD', 'prevous_value': '', 'time': u' 10:00'},
 {'event': u'EIA Natural Gas Storage Change', 'forcast': u'92 B', 'name': u'USD', 'prevous_value': '', 'time': u' 10:30'},
 {'event': u'Durable Goods Orders m/m', 'forcast': u'-0.3%', 'name': u'USD', 'prevous_value': '', 'time': u' 08:30'},
 {'event': u'Core Durable Goods Orders m/m', 'forcast': u'0.0%', 'name': u'USD', 'prevous_value': '', 'time': u' 08:30'},
 {'event': u'Durable Goods Orders excl. Defense m/m', 'forcast': u'-6.2%', 'name': u'USD', 'prevous_value': '', 'time': u' 08:30'},
 {'event': u'Nondefense Capital Goods Orders excl. Aircraft m/m', 'forcast': u'0.3%', 'name': u'USD', 'prevous_value': '', 'time': u' 08:30'},
 {'event': u'Fed Chair Powell Speech', 'forcast': u'', 'name': u'USD', 'prevous_value': '', 'time': u' 09:20'},
 {'event': u'Michigan Consumer Sentiment', 'forcast': u'98.5', 'name': u'USD', 'prevous_value': '', 'time': u' 10:00'},
 {'event': u'Michigan Consumer Expectations', 'forcast': u'88.9', 'name': u'USD', 'prevous_value': '', 'time': u' 10:00'},
 {'event': u'Michigan Current Conditions', 'forcast': u'112.9', 'name': u'USD', 'prevous_value': '', 'time': u' 10:00'},
 {'event': u'Michigan Inflation Expectations', 'forcast': u'2.7%', 'name': u'USD', 'prevous_value': '', 'time': u' 10:00'},
 {'event': u'Michigan 5-Year Inflation Expectations', 'forcast': u'2.5%', 'name': u'USD', 'prevous_value': '', 'time': u' 10:00'},
 {'event': u'Baker Hughes US Oil Rig Count', 'forcast': u'', 'name': u'USD', 'prevous_value': '', 'time': u' 13:00'},
 {'event': u'CFTC Copper Non-Commercial Net Positions', 'forcast': u'', 'name': u'USD', 'prevous_value': '', 'time': u' 15:30'},
 {'event': u'CFTC Crude Oil Non-Commercial Net Positions', 'forcast': u'', 'name': u'USD', 'prevous_value': '', 'time': u' 15:30'},
 {'event': u'CFTC S&P 500 Non-Commercial Net Positions', 'forcast': u'', 'name': u'USD', 'prevous_value': '', 'time': u' 15:30'},
 {'event': u'CFTC Gold Non-Commercial Net Positions', 'forcast': u'', 'name': u'USD', 'prevous_value': '', 'time': u' 15:30'},
 {'event': u'CFTC Silver Non-Commercial Net Positions', 'forcast': u'', 'name': u'USD', 'prevous_value': '', 'time': u' 15:30'}]
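The last line of that answer relies on two small behaviours. `find()` returns `None` when an item has no matching cell, and `getattr(i, 'text', '')` turns that `None` into an empty string, so every item still zips into a complete dict. Here is a pure-Python sketch of just that step. `FakeTag` is a hypothetical stand-in for a BeautifulSoup tag (anything with a `.text` attribute), and `row_to_dict` is a hypothetical helper name:

```python
class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup Tag: it only carries .text."""

    def __init__(self, text):
        self.text = text


def row_to_dict(cells, columns):
    """Pair column names with cell texts; a missing cell (None) becomes ''."""
    return dict(zip(columns, [getattr(cell, 'text', '') for cell in cells]))


# Column names spelled exactly as in the answer above
columns = ['time', 'name', 'event', 'forcast', 'prevous_value']
cells = [FakeTag(' 08:30'), FakeTag('USD'),
         FakeTag('Chicago Fed National Activity Index'), FakeTag('0.14'),
         None]  # e.g. no div matched the prevValue regex
print(row_to_dict(cells, columns))
# prints {'time': ' 08:30', 'name': 'USD', 'event': 'Chicago Fed National Activity Index', 'forcast': '0.14', 'prevous_value': ''}
```

A `prevous_value` column that is empty everywhere, as in the output above, is consistent with that regex never matching. It may be worth checking the actual class names on the rendered page.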

Here's a simple example I've put together using Selenium and BeautifulSoup:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Selenium part
browser = webdriver.Chrome()
browser.get('https://www.mql5.com/en/economic-calendar/united-states')
# wait (up to 10 s) until JavaScript has inserted at least one ec-table__title element
WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "ec-table__title")))
# BeautifulSoup part
source = browser.page_source
browser.quit()
soup = BeautifulSoup(source, 'html.parser')
calender = soup.find(id="economicCalendarTable")
items = calender.find_all(class_="ec-table__title")
print(items)
This code lets the browser load the page and run its JavaScript, then passes the complete rendered HTML source to BeautifulSoup.
Be sure to install Selenium and ChromeDriver correctly before running this script.
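`WebDriverWait(driver, timeout).until(condition)` keeps re-checking the condition until it returns something truthy. It returns that value, or raises `TimeoutException` once the timeout expires; Selenium's default poll interval is 0.5 s. Below is a simplified pure-Python model of that loop, not Selenium's actual code. `wait_until` and `table_rendered` are hypothetical names:

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated page that "finishes rendering" on the third check
checks = {"n": 0}


def table_rendered():
    checks["n"] += 1
    return checks["n"] >= 3


print(wait_until(table_rendered, timeout=5, poll=0.01))  # True
print(checks["n"])  # 3
```

Reading `page_source` only after such a wait is what makes the JavaScript-inserted rows visible to BeautifulSoup.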

There is no element with class ec-table__title in the base HTML of that page.
However, it does appear when you use a DOM inspector in the browser. I am afraid this is a sure sign that it has been inserted into the DOM by JavaScript, and indeed the page does run some JavaScript.
May I suggest that you investigate using the Selenium module in conjunction with BeautifulSoup?
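A quick way to confirm this diagnosis is to count the class in whatever HTML you actually receive, for example `requests.get(url).text`. The sketch below uses the standard library's `html.parser` rather than BeautifulSoup so it runs without extra packages. Both HTML strings are made-up stand-ins, not the real mql5 markup, and `ClassCounter`/`count_class` are hypothetical names. The sketch also shows that having everything on one line makes no difference to an HTML parser. Only content added later by JavaScript is missing:

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute includes a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value and self.class_name in value.split():
                self.count += 1


def count_class(html, class_name):
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Made-up markup: everything on one line, as described in the question
one_line = ('<div id="economicCalendarTable"><div class="ec-table__item">'
            '<div class="ec-table__title">A</div></div><div class="ec-table__item">'
            '<div class="ec-table__title">B</div></div></div>')
# Made-up markup: an empty container that a script fills in later, in the browser
script_filled = ('<div id="economicCalendarTable"></div>'
                 '<script>renderCalendar("economicCalendarTable");</script>')

print(count_class(one_line, "ec-table__title"))       # 2
print(count_class(script_filled, "ec-table__title"))  # 0
```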


Scraping Table Data from Multiple Pages

I think this is going to be complex, so I'm hoping someone is up for a challenge.
Basically, I'm trying to follow every link (href) on a specific page, then write each linked page's "profile-box" data into a Google Sheet.
Below is a working example that uses a different link. This code goes to each of the URLs, visits every player link, and returns the associated profile data:
import requests
from bs4 import BeautifulSoup
import gspread
gc = gspread.service_account(filename='creds.json')
sh = gc.open_by_key('1DpasSS8yC1UX6WqAbkQ515BwEEjdDL-x74T0eTW8hLM')
worksheet = sh.get_worksheet(3)
# AddValue = ["Test", 25, "Test2"]
# worksheet.insert_row(AddValue, 3)
def get_links(url):
    data = []
    req_url = requests.get(url)
    soup = BeautifulSoup(req_url.content, "html.parser")
    # each player name is a link inside a <td data-th="Player"> cell
    for td in soup.find_all('td', {'data-th': 'Player'}):
        a_tag = td.a
        name = a_tag.text
        player_url = a_tag['href']
        print(f"Getting {name}")
        req_player_url = requests.get(
            f"https://basketball.realgm.com{player_url}")
        soup_player = BeautifulSoup(req_player_url.content, "html.parser")
        div_profile_box = soup_player.find("div", class_="profile-box")
        row = {"Name": name, "URL": player_url}
        for p in div_profile_box.find_all("p"):
            try:
                key, value = p.get_text(strip=True).split(':', 1)
                row[key.strip()] = value.strip()
            except ValueError:  # not all entries have values
                pass
        data.append(row)
    return data
urls = [
    'https://basketball.realgm.com/dleague/players/2022',
    'https://basketball.realgm.com/dleague/players/2021',
    'https://basketball.realgm.com/dleague/players/2020',
    'https://basketball.realgm.com/dleague/players/2019',
    'https://basketball.realgm.com/dleague/players/2018',
]
res = []
for url in urls:
    print(f"Getting: {url}")
    data = get_links(url)
    res = [*res, *data]
if res != []:
    header = list(res[0].keys())
    values = [
        header, *[[e[k] if e.get(k) else "" for k in header] for e in res]]
    worksheet.append_rows(values, value_input_option="USER_ENTERED")
This code produces the correct results.
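Two details of that script are worth seeing in isolation. First, `split(':', 1)` splits only on the first colon. That is why values such as 'Height': '6-7 (201cm)Weight:220 (100kg)' in the answer's output further down keep the embedded "Weight:" text. Second, the header comes from `res[0].keys()` only, so a field that the first player lacks (for example Full Name or Agent) is never written to the sheet. The sketch below parses plain strings and builds the header from every key seen. `parse_profile_lines` and `rows_to_values` are hypothetical helper names, and the sample rows are made up:

```python
def parse_profile_lines(lines):
    """Turn 'Key: value' strings into a dict, splitting on the first colon only."""
    row = {}
    for text in lines:
        if ':' not in text:
            continue  # entries without a value are skipped, as the try/except does
        key, value = text.split(':', 1)
        row[key.strip()] = value.strip()
    return row


def rows_to_values(rows):
    """Header row built from every key seen (first-seen order), then one list per row."""
    header = []
    for row in rows:
        for key in row:
            if key not in header:
                header.append(key)
    return [header, *[[row.get(key, "") for key in header] for row in rows]]


rows = [
    {"Name": "Player A", "URL": "/a",
     **parse_profile_lines(["Born: Jul 16, 1992", "Height: 6-7 (201cm)Weight:220 (100kg)"])},
    {"Name": "Player B", "URL": "/b",
     **parse_profile_lines(["Full Name: Player B Jr", "Drafted: Undrafted", "Notes"])},
]
for line in rows_to_values(rows):
    print(line)
```

With the question's variables, the last step would become `worksheet.append_rows(rows_to_values(res), value_input_option="USER_ENTERED")`.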
Separately, I have working code that takes a different URL, loops through 66 pages, and returns the table data:
import pandas as pd
url = 'https://basketball.realgm.com/international/stats/2023/Averages/Qualified/All/player/All/desc'
res = []
for count in range(1, 67):  # pages 1-66; range() stops before its second argument
    # pd.read_html accepts a URL too so no need to make a separate request
    df_list = pd.read_html(f"{url}/{count}")
    res.append(df_list[-1])
pd.concat(res).to_csv('my data.csv')
This returns the table data from the URL and works perfectly.
So... this brings me to my current issue:
I'm trying to take this same link (https://basketball.realgm.com/international/stats/2023/Averages/Qualified/All/player/All/desc) and repeat the same action as the first script.
That is, I want to visit each player profile (on all 66, or however many, pages) and output the profile data just like the first script does.
I hoped I could just replace the original D League URLs with this URL and it would work, but it doesn't. I'm a little confused about why, because the table seems to be set up the same way.
I started trying to rework this, but I'm struggling. I have very basic code, but I think I'm taking steps backwards:
import requests
from bs4 import BeautifulSoup
url = "https://basketball.realgm.com/international/stats/2023/Averages/Qualified/All/player/All/desc"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
for link in soup.find_all("a"):
    profile_url = link.get("href")
    profile_response = requests.get(profile_url)
    profile_soup = BeautifulSoup(profile_response.text, "html.parser")
    profile_box = profile_soup.find("div", class_="profileBox")
    if profile_box:
        print(profile_box)
Any thoughts on this? Like I said, I'm ultimately trying to recreate the same action as the first script, just for the second URL.
Thanks in advance.

You can largely reuse the code from your first example, with a slight modification to the first find_all loop. Instead of find_all, use a CSS selector to select all table cells that have the nowrap class, then test whether each cell has a descendant link. From there, the rest of your function should work the same as before.
Here is an example:
import requests
from bs4 import BeautifulSoup
def get_links2(url):
    data = []
    req_url = requests.get(url)
    soup = BeautifulSoup(req_url.content, "html.parser")
    # select every <td class="nowrap"> cell; only cells that contain a link are processed
    for td in soup.select('td.nowrap'):
        a_tag = td.a
        if a_tag:
            name = a_tag.text
            player_url = a_tag['href']
            print(f"Getting {name}")
            req_player_url = requests.get(
                f"https://basketball.realgm.com{player_url}")
            soup_player = BeautifulSoup(req_player_url.content, "html.parser")
            div_profile_box = soup_player.find("div", class_="profile-box")
            row = {"Name": name, "URL": player_url}
            for p in div_profile_box.find_all("p"):
                try:
                    key, value = p.get_text(strip=True).split(':', 1)
                    row[key.strip()] = value.strip()
                except ValueError:  # not all entries have values
                    pass
            data.append(row)
    return data
urls2 = ["https://basketball.realgm.com/international/stats/2023/Averages/Qualified/All/player/All/desc"]
res2 = []
for url in urls2:
    data = get_links2(url)
    res2 = [*res2, *data]
print(res2)
OUTPUT:
[{'Name': 'Jaroslaw Zyskowski', 'URL': '/player/Jaroslaw-Zyskowski/Summary/32427', 'Current Team': 'Trefl Sopot', 'Born': 'Jul 16, 1992(30 years old)', 'Birthplace/Hometown': 'Wroclaw, Poland', 'Nationality': 'Poland', 'Height': '6-7 (201cm)Weight:220 (100kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Agent': 'Manuel Capicchioni', 'Draft Entry': '2014 NBA Draft', 'Drafted': 'Undrafted', 'Pre-Draft Team': 'Kotwica Kolobrzeg (Poland)'},
 {'Name': 'Ferdinand Zylka', 'URL': '/player/Ferdinand-Zylka/Summary/76159', 'Full Name': 'Ferdinand Leontin Zylka', 'Current Team': 'Basic-Fit Brussels Basketball', 'Born': 'Apr 11, 1998(24 years old)', 'Birthplace/Hometown': 'Berlin, Germany', 'Nationality': 'Germany', 'Height': '6-3 (191cm)Weight:170 (77kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2020 NBA Draft', 'Drafted': 'Undrafted', 'Pre-Draft Team': 'Mitteldeutscher BC (Germany)'},
 {'Name': 'Dainius Zvinklys', 'URL': '/player/Dainius-Zvinklys/Summary/151962', 'Current Team': 'BBG Herford', 'Born': 'Nov 27, 1990(32 years old)', 'Birthplace/Hometown': 'Kretniga, Lithuania', 'Nationality': 'Lithuania', 'Height': '6-8 (203cm)Weight:187 (85kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2012 NBA Draft', 'Drafted': 'Undrafted'},
 {'Name': 'Markuss Zvinis', 'URL': '/player/Markuss-Zvinis/Summary/183480', 'Current Team': 'BK Valmiera', 'Born': 'Apr 26, 2005(17 years old)', 'Nationality': 'Latvia', 'Height': '6-4 (193cm)Weight:N/A', 'Current NBA Status': 'Draft Eligible in 2027', 'Draft Entry': '2027 NBA Draft'},
 {'Name': 'Ivars Zvigrus', 'URL': '/player/Ivars-Zvigrus/Summary/204634', 'Current Team': 'Flyyingen BBK', 'Born': 'Oct 17, 1995(27 years old)', 'Birthplace/Hometown': 'Riga, Latvia', 'Nationality': 'Latvia', 'Height': '6-7 (201cm)Weight:204 (93kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2017 NBA Draft', 'Drafted': 'Undrafted'},
 {'Name': 'Nikita Zverev', 'URL': '/player/Nikita-Zverev/Summary/32791', 'Current Team': 'Samara', 'Born': 'Apr 6, 1994(28 years old)', 'Nationality': 'Russia', 'Height': '6-10 (208cm)Weight:225 (102kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2016 NBA Draft', 'Drafted': 'Undrafted', 'Pre-Draft Team': 'Khimki BC U18 (Russia)'},
 {'Name': 'Fernando Zurbriggen', 'URL': '/player/Fernando-Zurbriggen/Summary/76271', 'Full Name': 'Fernando Zurbriggen', 'Current Team': 'Monbus Obradoiro', 'Born': 'Oct 20, 1997(25 years old)', 'Birthplace/Hometown': 'Santa Fe, Argentina', 'Nationality': 'Argentina', 'Height': '6-1 (185cm)Weight:190 (86kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Agent': 'Franisco Javier Martin', 'Draft Entry': '2019 NBA Draft', 'Drafted': 'Undrafted', 'Pre-Draft Team': 'Obras Sanitarias (Argentina)'},
 {'Name': 'Alejandro Zurbriggen', 'URL': '/player/Alejandro-Zurbriggen/Summary/42671', 'Current Team': 'Sant Antoni Ibiza Feeling', 'Born': 'Mar 18, 1995(27 years old)', 'Birthplace/Hometown': 'Santa Fe, Argentina', 'Nationality': 'Argentina', 'Height': '6-5 (196cm)Weight:N/A', 'Current NBA Status': 'Unrestricted Free Agent', 'Agent': 'Franisco Javier Martin', 'Draft Entry': '2017 NBA Draft', 'Drafted': 'Undrafted', 'Pre-Draft Team': 'Regatas Corrientes (Argentina)'},
 {'Name': 'Nejc Zupan', 'URL': '/player/Nejc-Zupan/Summary/41700', 'Current Team': 'KK Tajfun Sentjur', 'Born': 'Apr 12, 1996(26 years old)', 'Birthplace/Hometown': 'Koper, Slovenia', 'Nationality': 'Slovenia', 'Height': '6-8 (203cm)Weight:N/A', 'Current NBA Status': 'Unrestricted Free Agent', 'Agent': 'Sead Galijasevic', 'Draft Entry': '2018 NBA Draft', 'Drafted': 'Undrafted', 'Pre-Draft Team': 'Sixt Primorska (Slovenia)'},
 {'Name': 'Zhennian Zuo', 'URL': '/player/Zhennian-Zuo/Summary/92765', 'Current Team': 'Sichuan Blue Whales', 'Born': 'Jan 26, 1996(27 years old)', 'Nationality': 'China', 'Height': '6-8 (203cm)Weight:215 (98kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2018 NBA Draft', 'Drafted': 'Undrafted'},
 {'Name': 'Matija Zunic', 'URL': '/player/Matija-Zunic/Summary/156440', 'Current Team': 'HKK Zrinjski', 'Born': 'Jun 7, 1996(26 years old)', 'Nationality': 'Serbia', 'Height': '6-4 (193cm)Weight:N/A', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2018 NBA Draft', 'Drafted': 'Undrafted'},
 {'Name': 'Kyle Zunic', 'URL': '/player/Kyle-Zunic/Summary/107186', 'Current Team': 'Perth', 'Born': 'Mar 4, 1999(23 years old)', 'Birthplace/Hometown': 'Wollongong, Australia', 'Nationality': 'Australia', 'Height': '6-2 (188cm)Weight:195 (88kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2022 NBA Draft', 'Drafted': 'Undrafted', 'High School': 'Lake Ginniderra High School[Burnie, Tasmania (Australia)]'},
 {'Name': 'Karlis Zunda', 'URL': '/player/Karlis-Zunda/Summary/123596', 'Current Team': 'Betsafe/Liepaja', 'Born': 'Aug 28, 1997(25 years old)', 'Nationality': 'Latvia', 'Height': '6-6 (198cm)Weight:187 (85kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2019 NBA Draft', 'Drafted': 'Undrafted'},
 {'Name': 'Zhang Zuming', 'URL': '/player/Zhang-Zuming/Summary/83723', 'Current Team': 'Qingdao', 'Born': 'Jan 27, 1995(28 years old)', 'Nationality': 'China', 'Height': '6-9 (206cm)Weight:198 (90kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2017 NBA Draft', 'Drafted': 'Undrafted', 'Pre-Draft Team': 'Ningbo Rockets (China)'},
 {'Name': 'Otoniel Zulueta', 'URL': '/player/Otoniel-Zulueta/Summary/184006', 'Current Team': 'N/A', 'Nationality': 'Mexico', 'Height': '6-6 (198cm)Weight:205 (93kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2019 NBA Draft', 'Drafted': 'Undrafted'},
 {'Name': 'Nathan Zulemie', 'URL': '/player/Nathan-Zulemie/Summary/175816', 'Current Team': 'Espoirs Nanterre', 'Born': 'Sep 7, 2004(18 years old)', 'Nationality': 'France', 'Height': '5-8 (173cm)Weight:N/A', 'Current NBA Status': 'Draft Eligible in 2026', 'Draft Entry': '2026 NBA Draft'},
 {'Name': 'Mantvydas Zukauskas', 'URL': '/player/Mantvydas-Zukauskas/Summary/75749', 'Current Team': 'Vilkaviskio Perlas', 'Born': 'Oct 19, 1998(24 years old)', 'Birthplace/Hometown': 'Kaunas, Lithuania', 'Nationality': 'Lithuania', 'Height': '6-3 (191cm)Weight:185 (84kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2020 NBA Draft', 'Drafted': 'Undrafted', 'Pre-Draft Team': 'Delikatesas Joniskis (Lithuania)'},
 {'Name': 'Eigirdas Zukauskas', 'URL': '/player/Eigirdas-Zukauskas/Summary/43242', 'Current Team': 'BC Wolves', 'Born': 'Jun 3, 1992(30 years old)', 'Birthplace/Hometown': 'Radviliskis, Lithuania', 'Nationality': 'Lithuania', 'Height': '6-6 (198cm)Weight:190 (86kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2014 NBA Draft', 'Drafted': 'Undrafted', 'Pre-Draft Team': 'Siauliai (Lithuania)'},
 {'Name': 'Ivo Zukanovic', 'URL': '/player/Ivo-Zukanovic/Summary/171804', 'Current Team': 'KK Alkar', 'Born': 'Sep 1, 2002(20 years old)', 'Nationality': 'Croatia', 'Height': '6-3 (191cm)Weight:N/A', 'Current NBA Status': 'Draft Eligible in 2024', 'Draft Entry': '2024 NBA Draft'},
 {'Name': 'Kjeld Zuidema', 'URL': '/player/Kjeld-Zuidema/Summary/168658', 'Current Team': 'Donar Groningen', 'Born': 'Jun 21, 2001(21 years old)', 'Birthplace/Hometown': 'Eexterzandvoort, Netherlands', 'Nationality': 'Netherlands', 'Height': '6-5 (196cm)Weight:198 (90kg)', 'Current NBA Status': 'Draft Eligible in 2023', 'Draft Entry': '2023 NBA Draft'},
 {'Name': 'Ruben Zugno', 'URL': '/player/Ruben-Zugno/Summary/78457', 'Current Team': 'Zeus Energy Group Rieti', 'Born': 'Mar 20, 1996(26 years old)', 'Birthplace/Hometown': 'Cantu, Italy', 'Nationality': 'Italy', 'Height': '6-1 (185cm)Weight:182 (83kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2018 NBA Draft', 'Drafted': 'Undrafted', 'Pre-Draft Team': 'Acqua San Bernardo Cantu (Italy)'},
 {'Name': 'Luka Zugic', 'URL': '/player/Luka-Zugic/Summary/172582', 'Current Team': 'KK Milenijum Podgorica', 'Born': 'Nov 22, 2000(22 years old)', 'Birthplace/Hometown': 'Podgorica, Montenegro', 'Nationality': 'Montenegro', 'Height': '6-5 (196cm)Weight:210 (95kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2022 NBA Draft', 'Drafted': 'Undrafted'},
 {'Name': 'Fedor Zugic', 'URL': '/player/Fedor-Zugic/Summary/128532', 'Current Team': 'Ratiopharm Ulm', 'Born': 'Sep 18, 2003(19 years old)', 'Birthplace/Hometown': 'Kotor, Montenegro', 'Nationality': 'Montenegro', 'Height': '6-6 (198cm)Weight:188 (85kg)', 'Current NBA Status': 'Draft Eligible in 2025', 'Agent': 'Rade Filipovich,David Mondress', 'Draft Entry': '2025 NBA Draft', 'Early Entry Info': '2022 Early Entrant(Withdrew)', 'Pre-Draft Team': 'Ratiopharm Ulm (Germany)'},
 {'Name': 'Andrey Zubkov', 'URL': '/player/Andrey-Zubkov/Summary/25944', 'Current Team': 'Zenit Saint Petersburg', 'Born': 'Jun 29, 1991(31 years old)', 'Birthplace/Hometown': 'Chelyabinsk, Russia', 'Nationality': 'Russia', 'Height': '6-9 (206cm)Weight:195 (88kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Agent': 'Obrad Fimic', 'Draft Entry': '2013 NBA Draft', 'Drafted': 'Undrafted', 'Pre-Draft Team': 'Lokomotiv Kuban (Russia)'},
 {'Name': 'Aleksandr Zubkov', 'URL': '/player/Aleksandr-Zubkov/Summary/183206', 'Current Team': 'Runa-2', 'Born': 'Apr 7, 2002(20 years old)', 'Nationality': 'Russia', 'Height': '5-11 (180cm)Weight:N/A', 'Current NBA Status': 'Draft Eligible in 2024', 'Draft Entry': '2024 NBA Draft'},
 {'Name': 'Aitor Zubizarreta', 'URL': '/player/Aitor-Zubizarreta/Summary/39787', 'Current Team': 'Acunsa GBC', 'Born': 'Mar 6, 1995(27 years old)', 'Birthplace/Hometown': 'Azpeitia, Spain', 'Nationality': 'Spain', 'Height': '6-4 (193cm)Weight:195 (88kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2017 NBA Draft', 'Drafted': 'Undrafted', 'Pre-Draft Team': 'College of Idaho (Sr)'},
 {'Name': 'Tomislav Zubcic', 'URL': '/player/Tomislav-Zubcic/Summary/2427', 'Current Team': 'London Lions', 'Born': 'Jan 17, 1990(33 years old)', 'Birthplace/Hometown': 'Zadar, Croatia', 'Nationality': 'Croatia', 'Height': '6-10 (208cm)Weight:230 (104kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Agent': 'Bill Duffy', 'Draft Entry': '2012 NBA Draft', 'Early Entry Info': '2011 Early Entrant(Withdrew)', 'Drafted': 'Round 2, Pick 26, Toronto Raptors', 'Draft Rights Trade': 'TOR to OKC, Jun 30, 2015', 'Pre-Draft Team': 'KK Cibona (Croatia)'},
 {'Name': 'Jure Zubac', 'URL': '/player/Jure-Zubac/Summary/38326', 'Current Team': 'Belfius Mons-Hainaut', 'Born': 'Mar 15, 1995(27 years old)', 'Birthplace/Hometown': 'Mostar, Bosnia and Herzegovina', 'Nationality': 'Bosnia and Herzegovina', 'Height': '6-8 (203cm)Weight:N/A', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2017 NBA Draft', 'Drafted': 'Undrafted', 'Pre-Draft Team': 'BC Siroki (Bosnia and Herzegovina)'},
 {'Name': 'Peter Zsiros', 'URL': '/player/Peter-Zsiros/Summary/98310', 'Current Team': 'Zalakeramia-ZTE KK', 'Born': 'Jun 22, 1994(28 years old)', 'Nationality': 'Hungary', 'Height': '6-7 (201cm)Weight:198 (90kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2016 NBA Draft', 'Drafted': 'Undrafted'},
 {'Name': 'Harun Zrno', 'URL': '/player/Harun-Zrno/Summary/188930', 'Current Team': 'OKK Spars Sarajevo', 'Born': 'Mar 1, 2004(18 years old)', 'Nationality': 'Bosnia and Herzegovina', 'Height': '6-6 (198cm)Weight:N/A', 'Current NBA Status': 'Draft Eligible in 2026', 'Draft Entry': '2026 NBA Draft'},
 {'Name': 'Evangelos Zougris', 'URL': '/player/Evangelos-Zougris/Summary/183106', 'Current Team': 'Peristeri BC', 'Born': 'Oct 14, 2004(18 years old)', 'Nationality': 'Greece', 'Height': '6-8 (203cm)Weight:N/A', 'Current NBA Status': 'Draft Eligible in 2026', 'Draft Entry': '2026 NBA Draft'},
 {'Name': 'Vitaliy Zotov', 'URL': '/player/Vitaliy-Zotov/Summary/54539', 'Current Team': 'BC Budivelnik', 'Born': 'Mar 3, 1997(25 years old)', 'Birthplace/Hometown': 'Lozovaya, Ukraine', 'Nationality': 'Ukraine', 'Height': '6-2 (188cm)Weight:185 (84kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Agent': 'Misko Raznatovic', 'Draft Entry': '2019 NBA Draft', 'Drafted': 'Undrafted'},
 {'Name': 'Jan Zorvan', 'URL': '/player/Jan-Zorvan/Summary/108564', 'Current Team': 'MBK Lucenec', 'Born': 'Dec 22, 1995(27 years old)', 'Nationality': 'Slovakia', 'Height': '6-7 (201cm)Weight:208 (94kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2017 NBA Draft', 'Drafted': 'Undrafted'},
 {'Name': 'Kristers Zoriks', 'URL': '/player/Kristers-Zoriks/Summary/54343', 'Current Team': 'BC VEF Riga', 'Born': 'May 25, 1998(24 years old)', 'Birthplace/Hometown': 'Dobele, Latvia', 'Nationality': 'Latvia', 'Height': '6-4 (193cm)Weight:190 (86kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2022 NBA Draft', 'Drafted': 'Undrafted', 'Pre-Draft Team': 'BC VEF Riga (Latvia)', 'High School': 'New Hampton School[New Hampton, New Hampshire (United States)]'},
 {'Name': 'Yovel Zoosman', 'URL': '/player/Yovel-Zoosman/Summary/75937', 'Current Team': 'ALBA Berlin', 'Born': 'May 12, 1998(24 years old)', 'Birthplace/Hometown': 'Kfar Saba, Israel', 'Nationality': 'Israel', 'Height': '6-7 (201cm)Weight:198 (90kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Agent': 'Andrew Vye,Guillermo Bermejo,Brian Jungreis,Nadav Mor', 'Draft Entry': '2019 NBA Draft', 'Early Entry Info': '2019 Early Entrant', 'Drafted': 'Undrafted', 'Pre-Draft Team': 'Maccabi FOX Tel Aviv (Israel)'},
 {'Name': 'Marcell Zoltan Volgyi', 'URL': '/player/Marcell-Zoltan-Volgyi/Summary/93730', 'Current Team': 'Budapesti Honved Se', 'Born': 'Apr 22, 1998(24 years old)', 'Birthplace/Hometown': 'Nagykanizsa, Hungary', 'Nationality': 'Hungary', 'Height': '6-6 (198cm)Weight:200 (91kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2020 NBA Draft', 'Drafted': 'Undrafted', 'Pre-Draft Team': 'Zalakeramia-ZTE KK (Hungary)'},
 {'Name': 'Przemyslaw Zolnierewicz', 'URL': '/player/Przemyslaw-Zolnierewicz/Summary/53122', 'Current Team': 'Enea Zastal BC Zielona', 'Born': 'Jul 3, 1995(27 years old)', 'Birthplace/Hometown': 'Paslek, Poland', 'Nationality': 'Poland', 'Height': '6-4 (193cm)Weight:200 (91kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Agent': 'Rade Filipovich', 'Draft Entry': '2017 NBA Draft', 'Drafted': 'Undrafted', 'Pre-Draft Team': 'Asseco Arka Gdynia (Poland)'},
 {'Name': 'Laurent Raphael Zoccoletti', 'URL': '/player/Laurent-Raphael-Zoccoletti/Summary/95274', 'Current Team': 'SAM Basket Massagno', 'Born': 'Nov 17, 1999(23 years old)', 'Birthplace/Hometown': 'Wettingen, Switzerland', 'Nationality': 'Switzerland', 'Height': '6-7 (201cm)Weight:N/A', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2021 NBA Draft', 'Drafted': 'Undrafted', 'Pre-Draft Team': 'BBC Nyon (Switzerland)'}, ....
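`get_links2` above scrapes a single listing page. To cover every page, one option is to reuse the `{url}/{count}` page pattern from the pandas script in the question. This assumes the listing pages for the profile scrape are paginated the same way. `page_urls` and `scrape_all_pages` are hypothetical helper names. The page count of 66 comes from the question, so check the last page number on the site:

```python
import time


def page_urls(base_url, last_page):
    """URLs for pages 1..last_page inclusive, in the '{url}/{page}' form."""
    # range() excludes its stop value, hence last_page + 1
    return [f"{base_url}/{page}" for page in range(1, last_page + 1)]


def scrape_all_pages(base_url, last_page, scrape_page, delay=0.0):
    """Call scrape_page(url) for each page and concatenate the lists it returns."""
    results = []
    for url in page_urls(base_url, last_page):
        results.extend(scrape_page(url))
        if delay:
            time.sleep(delay)  # optional pause between pages to go easy on the server
    return results


# With the answer's function (needs network access), something like:
# res2 = scrape_all_pages(
#     "https://basketball.realgm.com/international/stats/2023/Averages/Qualified/All/player/All/desc",
#     66, get_links2, delay=1.0)
print(page_urls("https://example.com/desc", 3))
# prints ['https://example.com/desc/1', 'https://example.com/desc/2', 'https://example.com/desc/3']
```

Passing the per-page scraper in as a function keeps the pagination logic separate from the parsing, so the same loop works for `get_links` or `get_links2`.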

Replacing Empty fields on latest record with good data based on other columns in pandas

I have a dataframe, and I am trying to fill in the Null values with values from other rows.
Building the dataframe:
import pandas as pd
data = [{'PersonalID': 84062174,
'Community': None,
'Gender': 'male',
'Date of Birth': '0000-00-00',
'Title': None,
'First Name': 'Geoff',
'Last Name': 'Hawes',
'Job Title': None,
'Account': None,
'Last Updated Date': '2021-06-22 0:00',
'Exclude From Traffic': 'No',
'Area Sales Manager': None,
'CreatedDate': '2021-04-14 10:00',
'Notes': None,
'Home Phone': None,
'Phone Type 1': 'Home',
'Mobile Phone': 7805554444,
'Extension': 'x9999',
'Email': 'ghawes#gmail.com'},
{'PersonalID': 83471000,
'Community': None,
'Gender': None,
'Date of Birth': '0000-00-00',
'Title': None,
'First Name': 'Geoff',
'Last Name': 'Hawes',
'Job Title': 'title',
'Account': None,
'Last Updated Date': None,
'Exclude From Traffic': 'No',
'Area Sales Manager': None,
'CreatedDate': '2021-04-16 10:00',
'Notes': 'Project: MY project',
'Home Phone': 7778881234.0,
'Phone Type 1': 'Home',
'Mobile Phone': 7805554444,
'Extension': None,
'Email': 'ghawes#gmail.com'},
{'PersonalID': 83458399,
'Community': None,
'Gender': None,
'Date of Birth': '0000-00-00',
'Title': None,
'First Name': 'Geoff',
'Last Name': 'Hawes',
'Job Title': None,
'Account': 'third record',
'Last Updated Date': None,
'Exclude From Traffic': 'No',
'Area Sales Manager': 'you',
'CreatedDate': '2021-03-20 17:05',
'Notes': 'Project: My Project2',
'Home Phone': None,
'Phone Type 1': 'Home',
'Mobile Phone': 7805554444,
'Extension': None,
'Email': 'ghawes#gmail.com'},
{'PersonalID': 82290675,
'Community': None,
'Gender': 'male',
'Date of Birth': '0000-00-00',
'Title': None,
'First Name': 'trevor',
'Last Name': 'Hawes',
'Job Title': 'title',
'Account': None,
'Last Updated Date': '2021-06-22 0:00',
'Exclude From Traffic': 'No',
'Area Sales Manager': None,
'CreatedDate': '2021-02-10 21:47',
'Notes': None,
'Home Phone': None,
'Phone Type 1': 'Home',
'Mobile Phone': 7806665555,
'Extension': None,
'Email': 'thawes#hotmail.com'},
{'PersonalID': 82269976,
'Community': None,
'Gender': None,
'Date of Birth': '0000-00-00',
'Title': None,
'First Name': 'trevor',
'Last Name': 'Hawes',
'Job Title': None,
'Account': 'my Account',
'Last Updated Date': None,
'Exclude From Traffic': 'No',
'Area Sales Manager': None,
'CreatedDate': '2021-02-09 21:47',
'Notes': 'Project: More about projects',
'Home Phone': 8887774321.0,
'Phone Type 1': 'Home',
'Mobile Phone': 7806665555,
'Extension': 'X5555',
'Email': 'thawes#hotmail.com'},
{'PersonalID': 76166887,
'Community': None,
'Gender': 'female',
'Date of Birth': '0000-00-00',
'Title': None,
'First Name': 'Cathryn',
'Last Name': 'Anderson',
'Job Title': None,
'Account': None,
'Last Updated Date': '2021-02-12 0:00',
'Exclude From Traffic': 'No',
'Area Sales Manager': 'Beth',
'CreatedDate': '2020-06-09 10:59',
'Notes': None,
'Home Phone': 9997774445.0,
'Phone Type 1': 'Cell',
'Mobile Phone': 7807770000,
'Extension': None,
'Email': 'canderson#gmail.com'}]
df = pd.DataFrame.from_dict(data)
For rows where First Name, Last Name, and Email are the same (acting as a key), I want the record with the latest CreatedDate to have its null values filled in from the other rows in that group that have data.
The result I am looking for would look like this:
To build the expected DataFrame:
data = [{'PersonalID': 84062174,
'Community': None,
'Gender': 'male',
'Date of Birth': '0000-00-00',
'Title': None,
'First Name': 'Geoff',
'Last Name': 'Hawes',
'Job Title': None,
'Account': None,
'Last Updated Date': '2021-06-22 0:00',
'Exclude From Traffic': 'No',
'Area Sales Manager': None,
'CreatedDate': '2021-04-14 10:00',
'Notes': None,
'Home Phone': None,
'Phone Type 1': 'Home',
'Mobile Phone': 7805554444,
'Extension': 'x9999',
'Email': 'ghawes#gmail.com'},
{'PersonalID': 83471000,
'Community': None,
'Gender': 'male',
'Date of Birth': '0000-00-00',
'Title': None,
'First Name': 'Geoff',
'Last Name': 'Hawes',
'Job Title': 'title',
'Account': 'third record',
'Last Updated Date': '2021-06-22 0:00',
'Exclude From Traffic': 'No',
'Area Sales Manager': 'you',
'CreatedDate': '2021-04-16 10:00',
'Notes': 'Project: MY project',
'Home Phone': 7778881234.0,
'Phone Type 1': 'Home',
'Mobile Phone': 7805554444,
'Extension': 'x9999',
'Email': 'ghawes#gmail.com'},
{'PersonalID': 83458399,
'Community': None,
'Gender': None,
'Date of Birth': '0000-00-00',
'Title': None,
'First Name': 'Geoff',
'Last Name': 'Hawes',
'Job Title': None,
'Account': 'third record',
'Last Updated Date': None,
'Exclude From Traffic': 'No',
'Area Sales Manager': 'you',
'CreatedDate': '2021-03-20 17:05',
'Notes': 'Project: My Project2',
'Home Phone': None,
'Phone Type 1': 'Home',
'Mobile Phone': 7805554444,
'Extension': None,
'Email': 'ghawes#gmail.com'},
{'PersonalID': 82290675,
'Community': None,
'Gender': 'male',
'Date of Birth': '0000-00-00',
'Title': None,
'First Name': 'trevor',
'Last Name': 'Hawes',
'Job Title': 'title',
'Account': 'my Account',
'Last Updated Date': '2021-06-22 0:00',
'Exclude From Traffic': 'No',
'Area Sales Manager': None,
'CreatedDate': '2021-02-10 21:47',
'Notes': 'Project: More about projects',
'Home Phone': 8887774321.0,
'Phone Type 1': 'Home',
'Mobile Phone': 7806665555,
'Extension': 'X5555',
'Email': 'thawes#hotmail.com'},
{'PersonalID': 82269976,
'Community': None,
'Gender': 'male',
'Date of Birth': '0000-00-00',
'Title': None,
'First Name': 'trevor',
'Last Name': 'Hawes',
'Job Title': None,
'Account': 'my Account',
'Last Updated Date': None,
'Exclude From Traffic': 'No',
'Area Sales Manager': None,
'CreatedDate': '2021-02-09 21:47',
'Notes': 'Project: More about projects',
'Home Phone': 8887774321.0,
'Phone Type 1': 'Home',
'Mobile Phone': 7806665555,
'Extension': 'X5555',
'Email': 'thawes#hotmail.com'},
{'PersonalID': 76166887,
'Community': None,
'Gender': 'female',
'Date of Birth': '0000-00-00',
'Title': None,
'First Name': 'Cathryn',
'Last Name': 'Anderson',
'Job Title': None,
'Account': None,
'Last Updated Date': '2021-02-12 0:00',
'Exclude From Traffic': 'No',
'Area Sales Manager': 'Beth',
'CreatedDate': '2020-06-09 10:59',
'Notes': None,
'Home Phone': 9997774445.0,
'Phone Type 1': 'Cell',
'Mobile Phone': 7807770000,
'Extension': None,
'Email': 'canderson#gmail.com'}]
df = pd.DataFrame.from_dict(data)
I have tried doing a groupby on First Name, Last Name, and Email, taking those results, and then updating the original data frame based on PersonalID. That did not give the expected results.
I then tried bfill and ffill and got closer, but this updates every row, not just the row with the latest CreatedDate.
df = df.groupby(['First Name','Last Name','Email','Mobile Phone']).bfill().ffill()
I'm a bit stumped on where to go next, or whether I need to revisit one of the two ideas above. Any recommendations?

This is how I would do it. First, create a composite key for grouping:
# ensure CreatedDate is a datetime object
df['CreatedDate'] = pd.to_datetime(df['CreatedDate'])
# key = index label of the newest row in each (First Name, Last Name, Email) group
df['key'] = df.groupby(['First Name',
    'Last Name', 'Email'])['CreatedDate'].transform('idxmax')
Sort your DataFrame by CreatedDate in descending order:
df = df.sort_values('CreatedDate',ascending=False)
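Finally, apply the forward fill and the backward fill within each key group. Both fills must run on the groupby: chaining .bfill() directly onto the result of .ffill() would fill across groups, e.g. pulling one person's Area Sales Manager into another person's empty cell.
g = df.groupby('key')
df1 = g.ffill().fillna(g.bfill())  # ffill, then bfill, both within each key group
print(df1)  # try an example with fewer columns; it is hard to validate with this many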
PersonalID Community Gender Date of Birth Title ... Home Phone Phone Type 1 Mobile Phone Extension Email
1 83471000 NaN male 0000-00-00 NaN ... 7.778881e+09 Home 7805554444 x9999 ghawes#gmail.com
0 84062174 NaN male 0000-00-00 NaN ... 7.778881e+09 Home 7805554444 x9999 ghawes#gmail.com
2 83458399 NaN male 0000-00-00 NaN ... 7.778881e+09 Home 7805554444 x9999 ghawes#gmail.com
3 82290675 NaN male 0000-00-00 NaN ... 8.887774e+09 Home 7806665555 X5555 thawes#hotmail.com
4 82269976 NaN male 0000-00-00 NaN ... 8.887774e+09 Home 7806665555 X5555 thawes#hotmail.com
5 76166887 NaN female 0000-00-00 NaN ... 9.997774e+09 Cell 7807770000 NaN canderson#gmail.com
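The question asked for only the newest record per person to be filled, whereas the approach above fills every row. A minimal sketch of that narrower variant, on a hypothetical sample with just a few of the question's columns (fewer columns make the result easy to check): sort newest first, back-fill within each (First Name, Last Name, Email) group so older values flow up into the newest row, then keep one row per group. The sample rows and the use of drop_duplicates are illustrative assumptions, not the answerer's code.

```python
import pandas as pd

# Hypothetical sample with only a few of the question's columns.
data = [
    {"First Name": "Geoff", "Last Name": "Hawes", "Email": "ghawes#gmail.com",
     "CreatedDate": "2021-04-14 10:00", "Job Title": None, "Home Phone": None},
    {"First Name": "Geoff", "Last Name": "Hawes", "Email": "ghawes#gmail.com",
     "CreatedDate": "2021-04-16 10:00", "Job Title": None, "Home Phone": 7778881234.0},
    {"First Name": "Geoff", "Last Name": "Hawes", "Email": "ghawes#gmail.com",
     "CreatedDate": "2021-03-20 17:05", "Job Title": "title", "Home Phone": None},
    {"First Name": "Cathryn", "Last Name": "Anderson", "Email": "canderson#gmail.com",
     "CreatedDate": "2020-06-09 10:59", "Job Title": None, "Home Phone": 9997774445.0},
]
df = pd.DataFrame(data)
df["CreatedDate"] = pd.to_datetime(df["CreatedDate"])
key_cols = ["First Name", "Last Name", "Email"]

# Newest record first within every group.
df = df.sort_values("CreatedDate", ascending=False)

# Back-fill inside each group: a gap in a row is filled from the next
# (i.e. next-older) row of the same person that has a value.
filled = df.groupby(key_cols).bfill()
# The grouping columns may be dropped by bfill; restore them by index.
for col in key_cols:
    filled[col] = df[col]

# Keep only the newest, now filled, record per person.
latest = filled.drop_duplicates(subset=key_cols, keep="first")
print(latest)
```

Because the rows are sorted newest first, a single grouped bfill is enough: the newest row takes the most recent non-null value from an older row, and the older rows are discarded afterwards.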

Need help translating a nested dictionary into a pandas dataframe

I'm looking to translate the following nested dictionary, an API pull from Yelp, into a pandas DataFrame so I can run visualizations on it:
Top 50 Pizzerias in Chicago
{'businesses': [{'alias': 'pequods-pizzeria-chicago',
'categories': [{'alias': 'pizza', 'title': 'Pizza'}],
'coordinates': {'latitude': 41.92187, 'longitude': -87.664486},
'display_phone': '(773) 327-1512',
'distance': 2158.7084581522413,
'id': 'DXwSYgiXqIVNdO9dazel6w',
'image_url': 'https://s3-media1.fl.yelpcdn.com/bphoto/8QJUNblfCI0EDhOjuIWJ4A/o.jpg',
'is_closed': False,
'location': {'address1': '2207 N Clybourn Ave',
'address2': '',
'address3': '',
'city': 'Chicago',
'country': 'US',
'display_address': ['2207 N Clybourn Ave',
'Chicago, IL 60614'],
'state': 'IL',
'zip_code': '60614'},
'name': "Pequod's Pizzeria",
'phone': '+17733271512',
'price': '$$',
'rating': 4.0,
'review_count': 6586,
'transactions': ['restaurant_reservation', 'delivery'],
'url': 'https://www.yelp.com/biz/pequods-pizzeria-chicago?adjust_creative=wt2WY5Ii_urZB8YeHggW2g&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=wt2WY5Ii_urZB8YeHggW2g'},
{'alias': 'lou-malnatis-pizzeria-chicago',
'categories': [{'alias': 'pizza', 'title': 'Pizza'},
{'alias': 'italian', 'title': 'Italian'},
{'alias': 'sandwiches', 'title': 'Sandwiches'}],
'coordinates': {'latitude': 41.890357,
'longitude': -87.633704},
'display_phone': '(312) 828-9800',
'distance': 4000.9990531720227,
'id': '8vFJH_paXsMocmEO_KAa3w',
'image_url': 'https://s3-media3.fl.yelpcdn.com/bphoto/9FiL-9Pbytyg6usOE02lYg/o.jpg',
'is_closed': False,
'location': {'address1': '439 N Wells St',
'address2': '',
'address3': '',
'city': 'Chicago',
'country': 'US',
'display_address': ['439 N Wells St',
'Chicago, IL 60654'],
'state': 'IL',
'zip_code': '60654'},
'name': "Lou Malnati's Pizzeria",
'phone': '+13128289800',
'price': '$$',
'rating': 4.0,
'review_count': 6368,
'transactions': ['pickup', 'delivery'],
'url': 'https://www.yelp.com/biz/lou-malnatis-pizzeria-chicago?adjust_creative=wt2WY5Ii_urZB8YeHggW2g&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=wt2WY5Ii_urZB8YeHggW2g'},
....]
I've tried the following, and variations of it, without any luck:
df = pd.DataFrame.from_dict(topresponse)
I'm really new to coding, so any advice would be helpful.

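response["businesses"] (topresponse["businesses"] in your code) is a list of records, one dict per business, so build the DataFrame from that list rather than from the outer dict: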
df = pd.DataFrame.from_records(response["businesses"])
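from_records keeps nested fields such as coordinates and location as dict-valued columns. If you want those flattened for plotting, pandas.json_normalize (pandas 1.0 or later) turns nested keys into dotted column names. A minimal sketch on a trimmed-down, hypothetical stand-in for the Yelp response, keeping only a few of its fields:

```python
import pandas as pd

# Trimmed-down stand-in for the Yelp response shown in the question.
response = {
    "businesses": [
        {"name": "Pequod's Pizzeria", "rating": 4.0, "review_count": 6586,
         "coordinates": {"latitude": 41.92187, "longitude": -87.664486},
         "location": {"city": "Chicago", "zip_code": "60614"}},
        {"name": "Lou Malnati's Pizzeria", "rating": 4.0, "review_count": 6368,
         "coordinates": {"latitude": 41.890357, "longitude": -87.633704},
         "location": {"city": "Chicago", "zip_code": "60654"}},
    ]
}

# One row per business; nested dicts become dotted columns,
# e.g. "coordinates.latitude" and "location.zip_code".
df = pd.json_normalize(response["businesses"])
print(df[["name", "rating", "coordinates.latitude", "location.zip_code"]])
```

List-valued fields such as categories or transactions are not expanded this way; they stay as Python lists in a single column.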

How to Arrange a List of Dictionaries in Python [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
data=[{'address': 'High Tech Campus 60', 'beta': 1.406659, 'ceo': 'Mr. Richard Clemmer', 'changes': -3.9400024, 'cik': '0001413447', 'city': 'Eindhoven', 'companyName': 'NXP Semiconductors N.V.', 'country': 'NL', 'currency': 'USD', ...}]
I have a dictionary.
I need to build a comma-separated list of dictionaries: [{},{},..]
How do I add them in a loop?
I tried to use append:
data_list.append(data.copy())
But it returns something different: [[{...}]]
How do I get a list in this format:
[{'address': 'High Tech Campus 60', 'beta': 1.406659, 'ceo': 'Mr. Richard Clemmer', 'changes': -3.9400024, 'cik': '0001413447', 'city': 'Eindhoven', 'companyName': 'NXP Semiconductors N.V.', 'country': 'NL', 'currency': 'USD', ...}, {'address': '41st, 1155 Rene-Leve...W Flr 4000', 'beta': 2.219123, 'ceo': 'Mr. Klaus Paulini', 'changes': -0.00999999, 'cik': '0001113423', 'city': 'MONTREAL', 'companyName': 'Aeterna Zentaris Inc.', 'country': 'CA', 'currency': 'USD', ...}, {'address': '125 Summer Street', 'beta': 0.0, 'ceo': 'Dr. Jean-Pierre Som...ossi Ph.D.', 'changes': 2.5800018, 'cik': '0001593899', 'city': 'Boston', 'companyName': 'Atea Pharmaceuticals, Inc.', 'country': 'US', 'currency': 'USD', ...}, {'address': '401 Charmany Dr', 'beta': 1.073689, 'ceo': 'Mr. Corey Chambas', 'changes': 0.0, 'cik': '0001521951', 'city': 'Madison', 'companyName': 'First Business Finan...ices, Inc.', 'country': 'US', 'currency': 'USD', ...}, {'address': '490 Arsenal Way', 'beta': 0.0, 'ceo': 'Mr. Marc A. Cohen', 'changes': -0.9699974, 'cik': '0001662579', 'city': 'Watertown', 'companyName': 'C4 Therapeutics, Inc.', 'country': 'US', 'currency': 'USD', ...}, {'address': 'General-Guisan-Strasse 6', 'beta': 1.629418, 'ceo': 'Mr. Carlos Creus Moreira', 'changes': -0.09000015, 'cik': '0001738699', 'city': 'Zug', 'companyName': 'WISeKey Internationa...Holding AG', 'country': 'CH', 'currency': 'USD', ...}, {'address': '508 W Wall St Ste 800', 'beta': 1.7762, 'ceo': 'Mr. Stephen Jumper', 'changes': -0.04999995, 'cik': '0000799165', 'city': 'Midland', 'companyName': 'Dawson Geophysical Company', 'country': 'US', 'currency': 'USD', ...}, {'address': '955 Perimeter Road', 'beta': 0.0, 'ceo': 'Mr. Ravi Vig', 'changes': -1.2900009, 'cik': '0000866291', 'city': 'Manchester', 'companyName': 'Allegro MicroSystems, Inc.', 'country': 'US', 'currency': 'USD', ...}, {'address': '490 Lapp Rd', 'beta': 1.138646, 'ceo': 'Ms. Geraldine Henwood', 'changes': -0.04999995, 'cik': '0001588972', 'city': 'Malvern', 'companyName': 'Recro Pharma, Inc.', 'country': 'US', 'currency': 'USD', ...}, {'address': '5 Haplada Street, PO Box 5011', 'beta': 1.396288, 'ceo': 'Mr. Guy Bernstein', 'changes': -0.9300003, 'cik': '0000876779', 'city': 'OR YEHUDA', 'companyName': 'Magic Software Enter...rises Ltd.', 'country': 'IL', 'currency': 'USD', ...}, {'address': '111 West 33rd Street', 'beta': 0.0, 'ceo': 'Mr. Richard Gumer', 'changes': -0.20249999, 'cik': '0001823323', 'city': 'New York', 'companyName': 'KL Acquisition Corp', 'country': 'US', 'currency': 'USD', ...}, {'address': '2 Canal Park Ste 4', 'beta': 1.907176, 'ceo': 'Mr. Langley Steinert', 'changes': -1.4399986, 'cik': '0001494259', 'city': 'Cambridge', 'companyName': 'CarGurus, Inc.', 'country': 'US', 'currency': 'USD', ...}, {'address': '119 Standard St', 'beta': 1.592636, 'ceo': 'Mr. Ethan Brown', 'changes': -3.859993, 'cik': '0001655210', 'city': 'El Segundo', 'companyName': 'Beyond Meat, Inc.', 'country': 'US', 'currency': 'USD', ...}, {'address': '3854 American Way Ste A', 'beta': 0.502729, 'ceo': 'Mr. Paul Kusserow', 'changes': -1.5899963, 'cik': '0000896262', 'city': 'Baton Rouge', 'companyName': 'Amedisys, Inc.', 'country': 'US', 'currency': 'USD', ...}, ...]
OK, it looks like what I initially have is not a dictionary but a one-element list of dictionaries. So how do I add another dictionary to the list, after the comma?
Update: I managed to get a list of dictionaries, but it turned out not to be fully correct, because some rows include additional fields. The list looks like this: 'currency': 'USD', ...}, 'code', 'status', {'address': '5 ...
How can I validate a list of dictionaries and make sure every dictionary matches a predefined list of columns?

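Since data is a one-element list, append its first element (the dict itself) rather than the whole list:
data_list.append(data[0].copy())
You could also concatenate the lists: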
data_list = data_list + data
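For the follow-up about stray items such as 'code' and 'status', one option is to collect everything with extend() and then keep only dicts whose keys match a fixed set. A minimal sketch; the responses list and the REQUIRED_KEYS set are hypothetical stand-ins for the question's API results and column list:

```python
# Hypothetical inputs: each call returns a one-element list of dicts,
# and one response contains stray strings instead of a dict.
responses = [
    [{"address": "High Tech Campus 60", "city": "Eindhoven", "currency": "USD"}],
    [{"address": "125 Summer Street", "city": "Boston", "currency": "USD"}],
    ["code", "status"],
    [{"address": "2 Canal Park Ste 4", "city": "Cambridge"}],  # missing a key
]
REQUIRED_KEYS = {"address", "city", "currency"}  # assumed predefined columns

data_list = []
for data in responses:
    # extend() adds the elements of data, not data itself, so no [[...]] nesting
    data_list.extend(data)


def is_valid(item):
    """True if item is a dict whose keys are exactly REQUIRED_KEYS."""
    return isinstance(item, dict) and set(item) == REQUIRED_KEYS


valid = [d for d in data_list if is_valid(d)]
rejected = [d for d in data_list if not is_valid(d)]
print(len(valid), "valid,", len(rejected), "rejected")
```

If extra keys should be tolerated as long as the required ones are present, replace the equality test with REQUIRED_KEYS <= set(item).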

How to find the number of people from a particular country in the data below?

[
{'Year': 1901,
'Category': 'Chemistry',
'Prize': 'The Nobel Prize in Chemistry 1901',
'Motivation': '"in recognition of the extraordinary services he has rendered by the discovery of the laws of chemical dynamics and osmotic pressure in solutions"',
'Prize Share': '1/1',
'Laureate ID': 160,
'Laureate Type': 'Individual',
'Full Name': "Jacobus Henricus van 't Hoff",
'Birth Date': '1852-08-30',
'Birth City': 'Rotterdam',
'Birth Country': 'Netherlands',
'Sex': 'Male',
'Organization Name': 'Berlin University',
'Organization City': 'Berlin',
'Organization Country': 'Germany',
'Death Date': '1911-03-01',
'Death City': 'Berlin',
'Death Country': 'Germany'},
{'Year': 1901,
'Category': 'Literature',
'Prize': 'The Nobel Prize in Literature 1901',
'Motivation': '"in special recognition of his poetic composition, which gives evidence of lofty idealism, artistic perfection and a rare combination of the qualities of both heart and intellect"',
'Prize Share': '1/1',
'Laureate ID': 569,
'Laureate Type': 'Individual',
'Full Name': 'Sully Prudhomme',
'Birth Date': '1839-03-16',
'Birth City': 'Paris',
'Birth Country': 'France',
'Sex': 'Male',
'Organization Name': '',
'Organization City': '',
'Organization Country': '',
'Death Date': '1907-09-07',
'Death City': 'Châtenay',
'Death Country': 'France'}
]

If you want to find how many people in the given list of dicts were born in each country (skipping entries with an empty Birth Country), you can use the following code:
from collections import Counter
# val is the list of laureate dicts from the question
li = [each['Birth Country'] for each in val if each['Birth Country']]
print(dict(Counter(li)))
OUTPUT
{'Netherlands': 1, 'France': 1}
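If you only need the count for one particular country, a generator expression with sum() avoids building the whole tally. A minimal sketch using the two records from the question, reduced to the relevant keys; the helper name count_from_country is hypothetical:

```python
from collections import Counter

# The two laureate records from the question, reduced to the keys used here.
laureates = [
    {"Full Name": "Jacobus Henricus van 't Hoff", "Birth Country": "Netherlands"},
    {"Full Name": "Sully Prudhomme", "Birth Country": "France"},
]


def count_from_country(records, country):
    """Number of records whose 'Birth Country' equals country."""
    return sum(1 for r in records if r.get("Birth Country") == country)


# Tally of every non-empty birth country, as in the answer above.
by_country = Counter(r["Birth Country"] for r in laureates if r.get("Birth Country"))

print(count_from_country(laureates, "France"))
print(by_country.most_common())
```

Using r.get() instead of r["Birth Country"] keeps the count from raising KeyError if some record lacks that key.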
