Scraping Table Data from Multiple Pages - python

So I think this is going to be complex...hoping someone is up for a challenge.
Basically, I'm trying to visit every href link on a specific URL and then print each link's "profile-box" class contents into a Google Sheet.
I have a working example with a different link below. This code goes to each of the URLs, visits the Player Link, and then returns their associated data:
import requests
from bs4 import BeautifulSoup
import gspread
gc = gspread.service_account(filename='creds.json')
sh = gc.open_by_key('1DpasSS8yC1UX6WqAbkQ515BwEEjdDL-x74T0eTW8hLM')
worksheet = sh.get_worksheet(3)
# AddValue = ["Test", 25, "Test2"]
# worksheet.insert_row(AddValue, 3)
def get_links(url):
    data = []
    req_url = requests.get(url)
    soup = BeautifulSoup(req_url.content, "html.parser")
    for td in soup.find_all('td', {'data-th': 'Player'}):
        a_tag = td.a
        name = a_tag.text
        player_url = a_tag['href']
        print(f"Getting {name}")
        req_player_url = requests.get(
            f"https://basketball.realgm.com{player_url}")
        soup_player = BeautifulSoup(req_player_url.content, "html.parser")
        div_profile_box = soup_player.find("div", class_="profile-box")
        row = {"Name": name, "URL": player_url}
        for p in div_profile_box.find_all("p"):
            try:
                key, value = p.get_text(strip=True).split(':', 1)
                row[key.strip()] = value.strip()
            except:  # not all entries have values
                pass
        data.append(row)
    return data

urls = [
    'https://basketball.realgm.com/dleague/players/2022',
    'https://basketball.realgm.com/dleague/players/2021',
    'https://basketball.realgm.com/dleague/players/2020',
    'https://basketball.realgm.com/dleague/players/2019',
    'https://basketball.realgm.com/dleague/players/2018',
]

res = []
for url in urls:
    print(f"Getting: {url}")
    data = get_links(url)
    res = [*res, *data]

if res != []:
    header = list(res[0].keys())
    values = [
        header, *[[e[k] if e.get(k) else "" for k in header] for e in res]]
    worksheet.append_rows(values, value_input_option="USER_ENTERED")
The results of this code are correct (each player's profile data is returned and appended to the sheet).
Secondly, I have working code that takes a separate URL, loops through 66 pages, and returns the table data:
import requests
import pandas as pd

url = 'https://basketball.realgm.com/international/stats/2023/Averages/Qualified/All/player/All/desc'

res = []
for count in range(1, 66):
    # pd.read_html accepts a URL too so no need to make a separate request
    df_list = pd.read_html(f"{url}/{count}")
    res.append(df_list[-1])

pd.concat(res).to_csv('my data.csv')
This returns the table data from the URL and works perfectly.
So... this brings me to my current issue:
I'm trying to take this same link (https://basketball.realgm.com/international/stats/2023/Averages/Qualified/All/player/All/desc)
and repeat the same action as the first code.
Meaning, I want to visit each profile (on all 66 or x number of pages), and print the profile data just like in the first code.
I thought/hoped I'd be able to just replace the original D-League URLs with this URL and it would work, but it doesn't. I'm a little confused why, because the table data seems to have the same setup.
I started trying to rework this, but I'm struggling. I have very basic code, but I think I'm taking steps backwards:
import requests
from bs4 import BeautifulSoup

url = "https://basketball.realgm.com/international/stats/2023/Averages/Qualified/All/player/All/desc"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

for link in soup.find_all("a"):
    profile_url = link.get("href")
    profile_response = requests.get(profile_url)
    profile_soup = BeautifulSoup(profile_response.text, "html.parser")
    profile_box = profile_soup.find("div", class_="profileBox")
    if profile_box:
        print(profile_box)
Any thoughts on this? Like I said, ultimately trying to recreate the same action as the first script, just for the 2nd URL.
Thanks in advance.

You can actually largely reuse the code from your first example, with a slight modification to the first find_all loop. Instead of find_all, use a CSS selector to select all of the table cells that have the nowrap class, then test whether each cell has a descendant link. From there, the rest of your function works the same as before.
Here is an example:
import requests
from bs4 import BeautifulSoup

def get_links2(url):
    data = []
    req_url = requests.get(url)
    soup = BeautifulSoup(req_url.content, "html.parser")
    for td in soup.select('td.nowrap'):
        a_tag = td.a
        if a_tag:
            name = a_tag.text
            player_url = a_tag['href']
            print(f"Getting {name}")
            req_player_url = requests.get(
                f"https://basketball.realgm.com{player_url}")
            soup_player = BeautifulSoup(req_player_url.content, "html.parser")
            div_profile_box = soup_player.find("div", class_="profile-box")
            row = {"Name": name, "URL": player_url}
            for p in div_profile_box.find_all("p"):
                try:
                    key, value = p.get_text(strip=True).split(':', 1)
                    row[key.strip()] = value.strip()
                except:  # not all entries have values
                    pass
            data.append(row)
    return data

urls2 = ["https://basketball.realgm.com/international/stats/2023/Averages/Qualified/All/player/All/desc"]

res2 = []
for url in urls2:
    data = get_links2(url)
    res2 = [*res2, *data]

print(res2)
OUTPUT:
[{'Name': 'Jaroslaw Zyskowski', 'URL': '/player/Jaroslaw-Zyskowski/Summary/32427', 'Current Team': 'Trefl Sopot', 'Born': 'Jul 16, 1992(30 years old)', 'Birthplace/Hometown': 'Wroclaw, Poland', 'Nationality': 'Poland', 'Height': '6-7 (201cm)Weight:220 (100kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Agent': 'Manuel Capicchioni', 'Draft Entry': '2014 NBA Draft', 'Drafted': 'Undrafted', 'Pre-Draft Team': 'Kotwica Kolobrzeg (Poland)'},
 {'Name': 'Ferdinand Zylka', 'URL': '/player/Ferdinand-Zylka/Summary/76159', 'Full Name': 'Ferdinand Leontin Zylka', 'Current Team': 'Basic-Fit Brussels Basketball', 'Born': 'Apr 11, 1998(24 years old)', 'Birthplace/Hometown': 'Berlin, Germany', 'Nationality': 'Germany', 'Height': '6-3 (191cm)Weight:170 (77kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2020 NBA Draft', 'Drafted': 'Undrafted', 'Pre-Draft Team': 'Mitteldeutscher BC (Germany)'},
 {'Name': 'Dainius Zvinklys', 'URL': '/player/Dainius-Zvinklys/Summary/151962', 'Current Team': 'BBG Herford', 'Born': 'Nov 27, 1990(32 years old)', 'Birthplace/Hometown': 'Kretniga, Lithuania', 'Nationality': 'Lithuania', 'Height': '6-8 (203cm)Weight:187 (85kg)', 'Current NBA Status': 'Unrestricted Free Agent', 'Draft Entry': '2012 NBA Draft', 'Drafted': 'Undrafted'},
 ...]
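Note that get_links2 above only fetches one stats page. To cover every page, you can drive the same function with the paginated URL pattern from your pandas example; this is a minimal sketch, with the base URL and page range carried over from the question (adjust the range if the page count changes):

base_url = "https://basketball.realgm.com/international/stats/2023/Averages/Qualified/All/player/All/desc"

all_rows = []
for page in range(1, 66):  # same page range the pandas loop used
    print(f"Getting page {page}")
    all_rows.extend(get_links2(f"{base_url}/{page}"))

print(len(all_rows))

From there, all_rows can be written to the sheet exactly like res in the first script (build the header from the keys and call worksheet.append_rows).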

Related

How to sort lists of dictionaries according to values of ONE key type? [duplicate]

This question already has answers here:
How do I sort a list of dictionaries by a value of the dictionary?
I have the following list of dictionaries:
[{'title': 'Shrek the Musical', 'year': 2013, 'genres': ['Comedy', 'Family', 'Fantasy'], 'duration': 130, 'directors': ['Michael John Warren'], 'actors': ["Brian d'Arcy James", 'Sutton Foster', 'Christopher Sieber'], 'rating': 7.0},
{'title': 'Shrek Retold', 'year': 2018, 'genres': ['Animation', 'Adventure', 'Comedy'], 'duration': 90, 'directors': ['Grant Duffrin'], 'actors': ['Harry Antonucci', 'Russell Bailey'], 'rating': 7.5},
{'title': 'Shrek 2', 'year': 2004, 'genres': ['Animation', 'Adventure', 'Comedy'], 'duration': 93, 'directors': ['Andrew Adamson', 'Kelly Asbury'], 'actors': ['Mike Myers', 'Eddie Murphy'], 'rating': 7.2},
{'title': 'Shrek the Third', 'year': 2007, 'genres': ['Animation', 'Adventure', 'Comedy'], 'duration': 93, 'directors': ['Chris Miller', 'Raman Hui'], 'actors': ['Mike Myers', 'Eddie Murphy', 'Cameron Diaz', 'Antonio Banderas'], 'rating': 6.1},
{'title': 'Shrek Forever After', 'year': 2010, 'genres': ['Animation', 'Adventure', 'Comedy'], 'duration': 93, 'directors': ['Mike Mitchell'], 'actors': ['Mike Myers', 'Eddie Murphy', 'Cameron Diaz', 'Antonio Banderas'], 'rating': 6.3},
{'title': 'Shrek', 'year': 2001, 'genres': ['Animation', 'Adventure', 'Comedy'], 'duration': 90, 'directors': ['Andrew Adamson', 'Vicky Jenson'], 'actors': ['Mike Myers', 'Eddie Murphy'], 'rating': 7.8}]
I want to produce a list of movies sorted in increasing order of their year of release. That is, the sorting should just depend on one value, ignoring all other keys.
I tried using the sorted function, but I don't know how to specify that the dictionaries should be sorted based on a single value.
You can pass a key parameter to sorted() to tell it which key to sort on:
result = sorted(data, key=lambda x: x['year'])
print(result)
This outputs:
[{'title': 'Shrek', 'year': 2001, 'genres': ['Animation', 'Adventure', 'Comedy'], 'duration': 90, 'directors': ['Andrew Adamson', 'Vicky Jenson'], 'actors': ['Mike Myers', 'Eddie Murphy'], 'rating': 7.8},
{'title': 'Shrek 2', 'year': 2004, 'genres': ['Animation', 'Adventure', 'Comedy'], 'duration': 93, 'directors': ['Andrew Adamson', 'Kelly Asbury'], 'actors': ['Mike Myers', 'Eddie Murphy'], 'rating': 7.2},
{'title': 'Shrek the Third', 'year': 2007, 'genres': ['Animation', 'Adventure', 'Comedy'], 'duration': 93, 'directors': ['Chris Miller', 'Raman Hui'], 'actors': ['Mike Myers', 'Eddie Murphy', 'Cameron Diaz', 'Antonio Banderas'], 'rating': 6.1},
{'title': 'Shrek Forever After', 'year': 2010, 'genres': ['Animation', 'Adventure', 'Comedy'], 'duration': 93, 'directors': ['Mike Mitchell'], 'actors': ['Mike Myers', 'Eddie Murphy', 'Cameron Diaz', 'Antonio Banderas'], 'rating': 6.3},
{'title': 'Shrek the Musical', 'year': 2013, 'genres': ['Comedy', 'Family', 'Fantasy'], 'duration': 130, 'directors': ['Michael John Warren'], 'actors': ["Brian d'Arcy James", 'Sutton Foster', 'Christopher Sieber'], 'rating': 7.0},
{'title': 'Shrek Retold', 'year': 2018, 'genres': ['Animation', 'Adventure', 'Comedy'], 'duration': 90, 'directors': ['Grant Duffrin'], 'actors': ['Harry Antonucci', 'Russell Bailey'], 'rating': 7.5}]
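As an aside, the same sort can be written without a lambda by using operator.itemgetter as the key, and sorted() also takes reverse=True if you ever want the newest movies first; a small sketch:

from operator import itemgetter

result = sorted(data, key=itemgetter('year'))                      # same as key=lambda x: x['year']
newest_first = sorted(data, key=itemgetter('year'), reverse=True)  # descending by year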

Convert JSON data to pandas df - python

I know there are a few questions on SO about converting a JSON file to a pandas df, but nothing is working. Specifically, the JSON request returns the current day's information. I'm trying to return the tabular structure that corresponds to Data, but I'm only getting the first dict object.
I'll list the current attempts and the resulting outputs below.
import requests
import pandas as pd
import json
get_session_url = "https://qships.tmr.qld.gov.au/webx/"
get_data_url = "https://qships.tmr.qld.gov.au/webx/services/wxdata.svc/GetDataX"
get_data_query = {
    "token": None,
    "reportCode": "MSQ-WEB-0001",
    "dataSource": None,
    "filterName": "Today",
    "parameters": [{
        "__type": "ParameterValueDTO:#WebX.Core.DTO",
        "sName": "DOMAIN_ID",
        "iValueType": 0,
        "aoValues": [{"Value": -1}],
    }],
    "metaVersion": 0,
}
sess = requests.session()
sess.get(get_session_url).raise_for_status()
my_dict = sess.post(get_data_url, json = get_data_query).json()
print(my_dict)
Output:
{'d': {'__type': 'DataSetDTO:#WebX.Core.DTO', 'BuildVersion': '7.0.0.12590', 'ReportCode': 'MSQ-WEB-0001',
  'Tables': [{'__type': 'DataTableDTO:#WebX.Core.DTO', 'BuildVersion': '7.0.0.12590', 'AsOfDate': '14:36 on Jan 19',
    'Data': [[132378, 334489, 'EXT', 'NANA Z', 'BULK CARRIER', 229.2, 'LBH Australia Pty Ltd (Mackay)', '/Date(1642600800000+1000)/', '/Date(1642600800000+1000)/', 'SEA for HPS', 'Anch for HPS & DBCT', 'PLAN', 'Keelung (Chilung)', 'Kwangyang', None, 633086, 705],
             [132112, 333984, 'DEP', 'KRITI WARRIOR', 'BULK CARRIER', 234.98, 'Wilhelmsen Ships Service (Gladstone)', '/Date(1642600800000+1000)/', '/Date(1642608900000+1000)/', 'Fishermans Landing 1', 'SEA', 'CONF', 'Amrun', 'Amrun', '2201', 632395, 725],
             ...],
    'IsCustomMetaData': False,
    'MetaData': {'__type': 'DataTableMetaDTO:#WebX.Core.DTO',
      'Columns': [{'__type': 'ColumnMetaDTO:#WebX.Core.DTO', 'Format': '', 'HAlignment': 'haright', 'Name': 'VOYAGE_ID', 'SortIndex': -1, 'SortOrder': '', 'Sortable': True, 'Template': '', 'Title': 'Voyage Id', 'Visible': False, 'Width': '50px'},
                  {..., 'Name': 'ID', ...}, {..., 'Name': 'JOB_TYPE_CODE', ...}, {..., 'Name': 'VESSEL_NAME', ...}, {..., 'Name': 'MSQ_SHIP_TYPE', ...}, {..., 'Name': 'LOA', ...}, {..., 'Name': 'AGENCY_NAME', ...}, {..., 'Name': 'START_TIME', ...}, {..., 'Name': 'END_TIME', ...}, {..., 'Name': 'FROM_LOCATION_NAME', ...}, {..., 'Name': 'TO_LOCATION_NAME', ...}, {..., 'Name': 'STATUS_TYPE_CODE', ...}, {..., 'Name': 'LASTPORT_NAME', ...}, {..., 'Name': 'NEXTPORT_NAME', ...}, {..., 'Name': 'VOYAGE_NUMBER', ...}, {..., 'Name': 'VESSEL_ID', ...}, {..., 'Name': 'STATUS_TYPE', ...}],
      'Script': '...', 'TemplateRow': '', 'TemplateTable': '', 'Version': 0},
    'Name': 'DATA'}]}}
I've tried using pd.json_normalize with and without record_path. Specifying record_path raises an error saying the column name can't be found.
print(pd.json_normalize(my_dict))
Output:
d.__type d.BuildVersion d.ReportCode d.Tables
0 DataSetDTO:#WebX.Core.DTO 7.0.0.12590 MSQ-WEB-0001 [{'__type': 'DataTableDTO:#WebX.Core.DTO', 'Bu...
print(pd.json_normalize(my_dict, record_path=['Data']))
Error:
File "/Users/kevin_o'connell/opt/anaconda3/lib/python3.8/site-packages/pandas/io/json/_normalize.py", line 243, in _pull_field
result = result[spec]
KeyError: 'Data'
I've also tried the following but as the print out shows, I'm not returning the tabular information associated with Data.
print(pd.concat({k: pd.DataFrame(v).T for k, v in my_dict.items()}, axis=0))
0
d __type DataSetDTO:#WebX.Core.DTO
BuildVersion 7.0.0.12590
ReportCode MSQ-WEB-0001
Tables {'__type': 'DataTableDTO:#WebX.Core.DTO', 'Bui...
Returning the desired info as an object, not a pandas df:
df = pd.json_normalize(my_dict['d'], 'Tables')
df = pd.DataFrame(df['Data'].T)
Out:
Data
0 [[132393, 334520, EXT, CESI BEIHAI, LIQUEFIED ...
list meta as a parameter:
df = pd.json_normalize(my_dict['d'], record_path = 'Tables', meta = ['Data'], errors = 'ignore')
raise ValueError(
ValueError: Conflicting metadata name Data, need distinguishing prefix
record_path is the path to the record, so you should specify the full path:
df = pd.json_normalize(my_dict, record_path=['d', 'Tables', 'Data'])
If you want to do it without record_path: the value of Data is a list of lists, so you can use pd.DataFrame directly.
df = pd.DataFrame(my_dict['d']['Tables'][0]['Data'])
print(df)
0 1 2 3 4 5 ... 11 12 13 14 15 16
0 132378 334489 EXT NANA Z BULK CARRIER 229.20 ... PLAN Keelung (Chilung) Kwangyang None 633086 705
1 132112 333984 DEP KRITI WARRIOR BULK CARRIER 234.98 ... CONF Amrun Amrun 2201 632395 725
2 132232 334208 EXT BLUE GRASS MARINER TANKER 183.06 ... PLAN Gladstone Singapore None 633566 705
3 132654 335076 EXT SERIFOS WARRIOR BULK CARRIER 234.98 ... PLAN Amrun Amrun 2201 632055 705
4 132030 333847 ARR MH GREEN CONTAINER SHIP 199.98 ... SCHD Yantian Botany Bay 11S/11N 633005 710
.. ... ... ... ... ... ... ... ... ... ... ... ... ...
71 132456 334647 EXT MISSY ENTERPRISE GENERAL CARGO 181.16 ... PLAN Singapore Japan 2 631532 705
72 132389 335619 EXT GLOVIS CHORUS VEHICLES CARRIER 199.99 ... PLAN Port Kembla Pyeongtaek 77A 630944 705
73 132505 334744 EXT NSU CHALLENGER BULK CARRIER 299.95 ... PLAN Nagoya Oita None 633706 705
74 132727 335219 EXT RTM DIAS BULK CARRIER 234.87 ... PLAN Gladstone China None 633623 705
75 132859 335500 ARR FOURCROY LANDING CRAFT 49.80 ... CONF Saibai Island Weipa None 633180 725
[76 rows x 17 columns]
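If you also want column names, they appear to be available in the MetaData block of the same response (the Columns list with Name/Title entries visible in the dump above); a sketch, assuming that structure:

table = my_dict['d']['Tables'][0]

# Column names taken from the table's metadata (assumes the structure shown in the question)
cols = [c['Name'] for c in table['MetaData']['Columns']]
df = pd.DataFrame(table['Data'], columns=cols)
print(df[['VESSEL_NAME', 'MSQ_SHIP_TYPE', 'FROM_LOCATION_NAME', 'TO_LOCATION_NAME']].head())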

How to Arrange a List of Dictionaries in Python [closed]

data=[{'address': 'High Tech Campus 60', 'beta': 1.406659, 'ceo': 'Mr. Richard Clemmer', 'changes': -3.9400024, 'cik': '0001413447', 'city': 'Eindhoven', 'companyName': 'NXP Semiconductors N.V.', 'country': 'NL', 'currency': 'USD', ...}]
I have a dictionary.
I need to produce a list of dictionaries, comma separated: [{}, {}, ...]
How do I add them to the list in a loop?
I tried to use append:
data_list.append(data.copy())
But it returns something different: [[{...}]]
How do I get a list in the following format:
[{'address': 'High Tech Campus 60', 'beta': 1.406659, 'ceo': 'Mr. Richard Clemmer', 'changes': -3.9400024, 'cik': '0001413447', 'city': 'Eindhoven', 'companyName': 'NXP Semiconductors N.V.', 'country': 'NL', 'currency': 'USD', ...}, {'address': '41st, 1155 Rene-Leve...W Flr 4000', 'beta': 2.219123, 'ceo': 'Mr. Klaus Paulini', 'changes': -0.00999999, 'cik': '0001113423', 'city': 'MONTREAL', 'companyName': 'Aeterna Zentaris Inc.', 'country': 'CA', 'currency': 'USD', ...}, {'address': '125 Summer Street', 'beta': 0.0, 'ceo': 'Dr. Jean-Pierre Som...ossi Ph.D.', 'changes': 2.5800018, 'cik': '0001593899', 'city': 'Boston', 'companyName': 'Atea Pharmaceuticals, Inc.', 'country': 'US', 'currency': 'USD', ...}, {'address': '401 Charmany Dr', 'beta': 1.073689, 'ceo': 'Mr. Corey Chambas', 'changes': 0.0, 'cik': '0001521951', 'city': 'Madison', 'companyName': 'First Business Finan...ices, Inc.', 'country': 'US', 'currency': 'USD', ...}, {'address': '490 Arsenal Way', 'beta': 0.0, 'ceo': 'Mr. Marc A. Cohen', 'changes': -0.9699974, 'cik': '0001662579', 'city': 'Watertown', 'companyName': 'C4 Therapeutics, Inc.', 'country': 'US', 'currency': 'USD', ...}, {'address': 'General-Guisan-Strasse 6', 'beta': 1.629418, 'ceo': 'Mr. Carlos Creus Moreira', 'changes': -0.09000015, 'cik': '0001738699', 'city': 'Zug', 'companyName': 'WISeKey Internationa...Holding AG', 'country': 'CH', 'currency': 'USD', ...}, {'address': '508 W Wall St Ste 800', 'beta': 1.7762, 'ceo': 'Mr. Stephen Jumper', 'changes': -0.04999995, 'cik': '0000799165', 'city': 'Midland', 'companyName': 'Dawson Geophysical Company', 'country': 'US', 'currency': 'USD', ...}, {'address': '955 Perimeter Road', 'beta': 0.0, 'ceo': 'Mr. Ravi Vig', 'changes': -1.2900009, 'cik': '0000866291', 'city': 'Manchester', 'companyName': 'Allegro MicroSystems, Inc.', 'country': 'US', 'currency': 'USD', ...}, {'address': '490 Lapp Rd', 'beta': 1.138646, 'ceo': 'Ms. Geraldine Henwood', 'changes': -0.04999995, 'cik': '0001588972', 'city': 'Malvern', 'companyName': 'Recro Pharma, Inc.', 'country': 'US', 'currency': 'USD', ...}, {'address': '5 Haplada Street, PO Box 5011', 'beta': 1.396288, 'ceo': 'Mr. Guy Bernstein', 'changes': -0.9300003, 'cik': '0000876779', 'city': 'OR YEHUDA', 'companyName': 'Magic Software Enter...rises Ltd.', 'country': 'IL', 'currency': 'USD', ...}, {'address': '111 West 33rd Street', 'beta': 0.0, 'ceo': 'Mr. Richard Gumer', 'changes': -0.20249999, 'cik': '0001823323', 'city': 'New York', 'companyName': 'KL Acquisition Corp', 'country': 'US', 'currency': 'USD', ...}, {'address': '2 Canal Park Ste 4', 'beta': 1.907176, 'ceo': 'Mr. Langley Steinert', 'changes': -1.4399986, 'cik': '0001494259', 'city': 'Cambridge', 'companyName': 'CarGurus, Inc.', 'country': 'US', 'currency': 'USD', ...}, {'address': '119 Standard St', 'beta': 1.592636, 'ceo': 'Mr. Ethan Brown', 'changes': -3.859993, 'cik': '0001655210', 'city': 'El Segundo', 'companyName': 'Beyond Meat, Inc.', 'country': 'US', 'currency': 'USD', ...}, {'address': '3854 American Way Ste A', 'beta': 0.502729, 'ceo': 'Mr. Paul Kusserow', 'changes': -1.5899963, 'cik': '0000896262', 'city': 'Baton Rouge', 'companyName': 'Amedisys, Inc.', 'country': 'US', 'currency': 'USD', ...}, ...]
OK, it looks like what I start with is not a dictionary but a one-element list of dictionaries. So how do I append another dictionary to that list?
Update: I managed to get a list of dictionaries. It turned out not to be fully correct, as some rows include additional fields. The list looks like this: 'currency': 'USD', ...}, 'code', 'status', {'address': '5 ...
How can I validate a list of dictionaries and make sure every dictionary matches a predefined list of columns?
data_list.append(data[0].copy())
You could also do
data_list = data_list + data
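For the follow-up about making sure every dictionary matches a predefined list of columns, one simple approach is to check each dict's keys against the expected set; a sketch, where expected_keys is a hypothetical placeholder for whatever columns you require:

# Hypothetical required columns - replace with your own list
expected_keys = {'address', 'beta', 'ceo', 'changes', 'cik', 'city',
                 'companyName', 'country', 'currency'}

valid = [d for d in data_list if isinstance(d, dict) and expected_keys.issubset(d)]
invalid = [d for d in data_list if not (isinstance(d, dict) and expected_keys.issubset(d))]
print(f"{len(valid)} valid, {len(invalid)} rejected")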

How to find the number of people from a particular country in the data below?

[
{'Year': 1901,
'Category': 'Chemistry',
'Prize': 'The Nobel Prize in Chemistry 1901',
'Motivation': '"in recognition of the extraordinary services he has rendered by the discovery of the laws of chemical dynamics and osmotic pressure in solutions"',
'Prize Share': '1/1',
'Laureate ID': 160,
'Laureate Type': 'Individual',
'Full Name': "Jacobus Henricus van 't Hoff",
'Birth Date': '1852-08-30',
'Birth City': 'Rotterdam',
'Birth Country': 'Netherlands',
'Sex': 'Male',
'Organization Name': 'Berlin University',
'Organization City': 'Berlin',
'Organization Country': 'Germany',
'Death Date': '1911-03-01',
'Death City': 'Berlin',
'Death Country': 'Germany'},
{'Year': 1901,
'Category': 'Literature',
'Prize': 'The Nobel Prize in Literature 1901',
'Motivation': '"in special recognition of his poetic composition, which gives evidence of lofty idealism, artistic perfection and a rare combination of the qualities of both heart and intellect"',
'Prize Share': '1/1',
'Laureate ID': 569,
'Laureate Type': 'Individual',
'Full Name': 'Sully Prudhomme',
'Birth Date': '1839-03-16',
'Birth City': 'Paris',
'Birth Country': 'France',
'Sex': 'Male',
'Organization Name': '',
'Organization City': '',
'Organization Country': '',
'Death Date': '1907-09-07',
'Death City': 'Châtenay',
'Death Country': 'France'}
]
If you want to find how many people were born in each country, given only the list of dicts above (here named val), you can use the following code:
from collections import Counter

li = [each['Birth Country'] for each in val if each['Birth Country']]
print(dict(Counter(li)))
OUTPUT
{'Netherlands': 1, 'France': 1}
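To look up one particular country from that result, you can index the Counter directly; a missing country simply returns 0:

from collections import Counter

counts = Counter(each['Birth Country'] for each in val if each['Birth Country'])
print(counts['France'])   # 1
print(counts['Germany'])  # 0 - Counter returns 0 for keys it has not seen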

BeautifulSoup webpage scraping

I am trying to scrape a webpage.
from bs4 import BeautifulSoup
import requests
page = requests.get('https://www.mql5.com/en/economic-calendar/united-states')
soup = BeautifulSoup(page.content, 'html.parser')
calender = soup.find(id="economicCalendarTable")
items = calender.find_all(class_="ec-table__title")
print(items)
However, it prints an empty list, although in the browser the page shows many entries with class="ec-table__title". What I found is that the markup inside the id="economicCalendarTable" element is all on one very long line, so calender.find_all skips everything.
I am trying to get all the tags inside the id="economicCalendarTable" element.
Is there a way to do this?
You can use selenium:
from selenium import webdriver
import re
from bs4 import BeautifulSoup as soup

d = webdriver.Chrome()
d.get('https://www.mql5.com/en/economic-calendar/united-states')
s = soup(d.page_source, 'lxml')
time = s.find('span', {'id': 'economicCalendarTableColumnTime'}).text
title = s.find('div', {'class': 'ec-table__title'}).text
# Column classes to pull from each row; the "previous value" class has a numeric suffix
classes = ['ec-table__col_time', 'ec-table__curency-name', 'ec-table__col_event',
           'ec-table__col_forecast', 'prevValue']
full_data = [[i.find('div', {'class': c if c != 'prevValue' else re.compile(r'prevValue\d+')})
              for c in classes]
             for i in s.find_all('div', {'class': 'ec-table__item'})]
new_results = [dict(zip(['time', 'name', 'event', 'forcast', 'prevous_value'],
                        [getattr(i, 'text', '') for i in b]))
               for b in full_data]
Output:
[{'event': u'Chicago Fed National Activity Index', 'forcast': u'0.14', 'name': u'USD', 'prevous_value': '', 'time': u' 08:30'}, {'event': u'Markit Manufacturing PMI', 'forcast': u'56.4', 'name': u'USD', 'prevous_value': '', 'time': u' 09:45'}, {'event': u'Markit Services PMI', 'forcast': u'55', 'name': u'USD', 'prevous_value': '', 'time': u' 09:45'}, {'event': u'Markit Composite PMI', 'forcast': u'55', 'name': u'USD', 'prevous_value': '', 'time': u' 09:45'}, {'event': u'New Home Sales m/m', 'forcast': u'-1.2%', 'name': u'USD', 'prevous_value': '', 'time': u' 10:00'}, {'event': u'New Home Sales', 'forcast': u'0.639 M', 'name': u'USD', 'prevous_value': '', 'time': u' 10:00'}, {'event': u'EIA Crude Oil Stocks Change', 'forcast': u'-1.791 M', 'name': u'USD', 'prevous_value': '', 'time': u' 10:30'}, {'event': u'EIA Cushing Crude Oil Stocks Change', 'forcast': u'0.259 M', 'name': u'USD', 'prevous_value': '', 'time': u' 10:30'}, {'event': u'EIA Crude Oil Imports Change', 'forcast': u'-0.32 M', 'name': u'USD', 'prevous_value': '', 'time': u' 10:30'}, {'event': u'EIA Distillate Fuel Production Change', 'forcast': u'-0.011 M', 'name': u'USD', 'prevous_value': '', 'time': u' 10:30'}, {'event': u'EIA Distillates Stocks Change', 'forcast': u'-0.182 M', 'name': u'USD', 'prevous_value': '', 'time': u' 10:30'}, {'event': u'EIA Gasoline Production Change', 'forcast': u'0.289 M', 'name': u'USD', 'prevous_value': '', 'time': u' 10:30'}, {'event': u'EIA Heating Oil Stocks Change', 'forcast': u'-0.026 M', 'name': u'USD', 'prevous_value': '', 'time': u' 10:30'}, {'event': u'EIA Gasoline Stocks Change', 'forcast': u'-3.206 M', 'name': u'USD', 'prevous_value': '', 'time': u' 10:30'}, {'event': u'FOMC Minutes', 'forcast': u'', 'name': u'USD', 'prevous_value': '', 'time': u' 14:00'}, {'event': u'Continuing Jobless Claims', 'forcast': u'1.769 M', 'name': u'USD', 'prevous_value': '', 'time': u' 08:30'}, {'event': u'Initial Jobless Claims', 'forcast': u'216 K', 'name': u'USD', 'prevous_value': '', 'time': u' 08:30'}, {'event': u'Initial Jobless Claims 4-Week Average', 'forcast': u'213.814 K', 'name': u'USD', 'prevous_value': '', 'time': u' 08:30'}, {'event': u'HPI m/m', 'forcast': u'0.5%', 'name': u'USD', 'prevous_value': '', 'time': u' 09:00'}, {'event': u'Existing Home Sales', 'forcast': u'5.45 M', 'name': u'USD', 'prevous_value': '', 'time': u' 10:00'}, {'event': u'Existing Home Sales m/m', 'forcast': u'0.3%', 'name': u'USD', 'prevous_value': '', 'time': u' 10:00'}, {'event': u'EIA Natural Gas Storage Change', 'forcast': u'92 B', 'name': u'USD', 'prevous_value': '', 'time': u' 10:30'}, {'event': u'Durable Goods Orders m/m', 'forcast': u'-0.3%', 'name': u'USD', 'prevous_value': '', 'time': u' 08:30'}, {'event': u'Core Durable Goods Orders m/m', 'forcast': u'0.0%', 'name': u'USD', 'prevous_value': '', 'time': u' 08:30'}, {'event': u'Durable Goods Orders excl. Defense m/m', 'forcast': u'-6.2%', 'name': u'USD', 'prevous_value': '', 'time': u' 08:30'}, {'event': u'Nondefense Capital Goods Orders excl. 
Aircraft m/m', 'forcast': u'0.3%', 'name': u'USD', 'prevous_value': '', 'time': u' 08:30'}, {'event': u'Fed Chair Powell Speech', 'forcast': u'', 'name': u'USD', 'prevous_value': '', 'time': u' 09:20'}, {'event': u'Michigan Consumer Sentiment', 'forcast': u'98.5', 'name': u'USD', 'prevous_value': '', 'time': u' 10:00'}, {'event': u'Michigan Consumer Expectations', 'forcast': u'88.9', 'name': u'USD', 'prevous_value': '', 'time': u' 10:00'}, {'event': u'Michigan Current Conditions', 'forcast': u'112.9', 'name': u'USD', 'prevous_value': '', 'time': u' 10:00'}, {'event': u'Michigan Inflation Expectations', 'forcast': u'2.7%', 'name': u'USD', 'prevous_value': '', 'time': u' 10:00'}, {'event': u'Michigan 5-Year Inflation Expectations', 'forcast': u'2.5%', 'name': u'USD', 'prevous_value': '', 'time': u' 10:00'}, {'event': u'Baker Hughes US Oil Rig Count', 'forcast': u'', 'name': u'USD', 'prevous_value': '', 'time': u' 13:00'}, {'event': u'CFTC Copper Non-Commercial Net Positions', 'forcast': u'', 'name': u'USD', 'prevous_value': '', 'time': u' 15:30'}, {'event': u'CFTC Crude Oil Non-Commercial Net Positions', 'forcast': u'', 'name': u'USD', 'prevous_value': '', 'time': u' 15:30'}, {'event': u'CFTC S&P 500 Non-Commercial Net Positions', 'forcast': u'', 'name': u'USD', 'prevous_value': '', 'time': u' 15:30'}, {'event': u'CFTC Gold Non-Commercial Net Positions', 'forcast': u'', 'name': u'USD', 'prevous_value': '', 'time': u' 15:30'}, {'event': u'CFTC Silver Non-Commercial Net Positions', 'forcast': u'', 'name': u'USD', 'prevous_value': '', 'time': u' 15:30'}]
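As a follow-up, if you want to keep these results rather than just printing them, a minimal sketch using pandas (assuming pandas is installed and new_results is the list built above; the CSV filename is just an example):
import pandas as pd

# Each dict becomes a row; any missing keys become NaN automatically
df = pd.DataFrame(new_results)
df.to_csv('economic_calendar.csv', index=False)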
Here's a simple example I've put together using Selenium and BeautifulSoup:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Selenium part: load the page and wait for the JavaScript-rendered rows
browser = webdriver.Chrome()
browser.get('https://www.mql5.com/en/economic-calendar/united-states')
WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, 'ec-table__title')))

# BeautifulSoup part: parse the fully rendered source
source = browser.page_source
soup = BeautifulSoup(source, 'html.parser')
calender = soup.find(id="economicCalendarTable")
items = calender.find_all(class_="ec-table__title")
print(items)
browser.quit()
This code downloads the page in full and then passes the complete HTML source to BeautifulSoup.
Be sure to install Selenium and the ChromeDriver correctly before running this script.
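If you need to run this on a machine without a display, a minimal headless variant, assuming a reasonably recent Selenium and Chrome (older Chrome builds use --headless instead of --headless=new):
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless=new')  # use '--headless' on older Chrome builds
browser = webdriver.Chrome(options=options)
browser.get('https://www.mql5.com/en/economic-calendar/united-states')
print(len(browser.page_source))  # quick sanity check that the page was fetched
browser.quit()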
There is no item with class ec-table__title in the base html of that page.
However, it does appear when using a DOM inspector in the browser. I am afraid this is a sure sign that it has been inserted into the DOM by JavaScript, and indeed there is some JavaScript invoked by that webpage.
May I suggest that you investigate using the selenium module in conjunction with BeautifulSoup?
