getting class_ on web scraping for two elements

I am scraping the ICC top 10 team rankings, and the points and matches cells both have the same class:
"td", class_='table-body__cell u-center-text'
How do I get them separately?
page = requests.get(url1)
page
soup1 = BeautifulSoup(page.content, "html.parser")
print(soup1.prettify())
matches = []
for i in soup1.find_all("td", class_='rankings-block__banner-matches'):
    matches.append(i.text)
matches

Simple way: use pandas
You can use pandas to read the table into a dataframe and pick the values you want:
import pandas as pd
pd.read_html('https://www.icc-cricket.com/rankings/mens/team-rankings/odi/')[0]
Alternative with bs4
matches = [x.get_text() for x in soup.select('table.table tr td:nth-of-type(3)')]
points = [x.get_text() for x in soup.select('table.table tr td:nth-of-type(4)')]
print(matches, points)
or
matches = []
points = []
for x in soup.select('table.table tr')[1:]:
    matches.append(x.select_one('td:nth-of-type(3)').get_text())
    points.append(x.select_one('td:nth-of-type(4)').get_text())
print(matches, points)

A complete solution: run this code and you will get a dictionary with all the data from the table, organized nicely:
# get the entire table
table = soup1.find('table', {'class': 'table'})
# create dictionary to hold results
rankings = {}
# separate first row since it uses different markup than the rest
position = table.find('td', {'class': 'rankings-block__banner--pos'}).text.strip()
country_name = table.find('span', {'class': 'u-hide-phablet'}).text.strip()
matches = table.find('td', {'class': 'rankings-block__banner--matches'}).text.strip()
points = table.find('td', {'class': 'rankings-block__banner--points'}).text.strip()
rating = table.find('td', {'class': 'rankings-block__banner--rating u-text-right'}).text.strip()
rankings[country_name] = {'position': position,
                          'matches': matches,
                          'points': points,
                          'rating': rating}
# for the next rows, use a loop
for row in table.find_all('tr', {'class': 'table-body'}):
    position = row.find('td', {'class': 'table-body__cell table-body__cell--position u-text-right'}).text.strip()
    country_name = row.find('span', {'class': 'u-hide-phablet'}).text.strip()
    # matches and points share the same class, so pick them by index
    matches = row.find_all('td', {'class': 'table-body__cell u-center-text'})[0].text.strip()
    points = row.find_all('td', {'class': 'table-body__cell u-center-text'})[1].text.strip()
    rating = row.find('td', {'class': 'table-body__cell u-text-right rating'}).text.strip()
    rankings[country_name] = {'position': position,
                              'matches': matches,
                              'points': points,
                              'rating': rating}
rankings
Which outputs:
{'New Zealand': {'position': '1',
'matches': '17',
'points': '2,054',
'rating': '121'},
'England': {'position': '2',
'matches': '32',
'points': '3,793',
'rating': '119'},
'Australia': {'position': '3',
'matches': '28',
'points': '3,244',
'rating': '116'},
'India': {'position': '4',
'matches': '32',
'points': '3,624',
'rating': '113'},
'South Africa': {'position': '5',
'matches': '25',
'points': '2,459',
'rating': '98'},
'Pakistan': {'position': '6',
'matches': '27',
'points': '2,524',
'rating': '93'},
'Bangladesh': {'position': '7',
'matches': '30',
'points': '2,740',
'rating': '91'},
'West Indies': {'position': '8',
'matches': '30',
'points': '2,523',
'rating': '84'},
'Sri Lanka': {'position': '9',
'matches': '32',
'points': '2,657',
'rating': '83'},
'Afghanistan': {'position': '10',
'matches': '17',
'points': '1,054',
'rating': '62'},
'Netherlands': {'position': '11',
'matches': '7',
'points': '336',
'rating': '48'},
'Ireland': {'position': '12',
'matches': '25',
'points': '1,145',
'rating': '46'},
'Oman': {'position': '13', 'matches': '11', 'points': '435', 'rating': '40'},
'Scotland': {'position': '14',
'matches': '8',
'points': '308',
'rating': '39'},
'Zimbabwe': {'position': '15',
'matches': '20',
'points': '764',
'rating': '38'},
'Nepal': {'position': '16', 'matches': '11', 'points': '330', 'rating': '30'},
'UAE': {'position': '17', 'matches': '9', 'points': '190', 'rating': '21'},
'United States': {'position': '18',
'matches': '14',
'points': '232',
'rating': '17'},
'Namibia': {'position': '19', 'matches': '6', 'points': '97', 'rating': '16'},
'Papua New Guinea': {'position': '20',
'matches': '10',
'points': '0',
'rating': '0'}}
In addition, you can also add it to a pandas dataframe for better analysis:
pd.DataFrame(rankings)
Which outputs:
New Zealand England Australia India South Africa Pakistan Bangladesh West Indies Sri Lanka Afghanistan Netherlands Ireland Oman Scotland Zimbabwe Nepal UAE United States Namibia Papua New Guinea
position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
matches 17 32 28 32 25 27 30 30 32 17 7 25 11 8 20 11 9 14 6 10
points 2,054 3,793 3,244 3,624 2,459 2,524 2,740 2,523 2,657 1,054 336 1,145 435 308 764 330 190 232 97 0
rating 121 119 116 113 98 93 91 84 83 62 48 46 40 39 38 30 21 17 16 0

Related

How can I pull one item, from a list of dictionary items, using a single key:value pair?

Let's say I have a list of dictionary items:
main_dict = [{'Player': '1', 'position': 'main', 'points': 50},
{'Player': '2', 'position': 'main', 'points': 60},
{'Player': '3', 'position': 'main', 'points': 70},
{'Player': '4', 'position': 'main', 'points': 80},
{'Player': '5', 'position': 'main', 'points': 90}]
I ran some code and got this result:
90
I now want to pull the full dictionary item, from index in the list, using only the value of the points key.
if points == 90:
    new_item = (#find item in main_dict[4])
output: {'Player': '5', 'position': 'main', 'points': 90}
How can I pull the full item out of list, using only the unique value of 90?

The built-in filter should do the trick. If you want to match all items:
new_item = list(filter(lambda x: x['points'] == 90, main_dict))
If you want only the first item that matches:
new_item = next(filter(lambda x: x['points'] == 90, main_dict))
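One caveat with the next(filter(...)) form: when nothing matches, next raises StopIteration. A minimal sketch, assuming you would rather get None back in that case; find_by_points is a hypothetical helper name and the list is a shortened copy of the question's data:

```python
main_dict = [{'Player': '1', 'position': 'main', 'points': 50},
             {'Player': '5', 'position': 'main', 'points': 90}]


def find_by_points(items, points):
    """Return the first dict whose 'points' equals points, or None."""
    # The second argument to next() is the default that is returned
    # when the generator is exhausted without finding a match.
    return next((item for item in items if item['points'] == points), None)


match = find_by_points(main_dict, 90)
missing = find_by_points(main_dict, 55)
print(match)
print(missing)
```

The generator expression stops at the first hit, so the rest of the list is not scanned.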

Try this:
main_dict = [{'Player': '1', 'position': 'main', 'points': 50},
{'Player': '2', 'position': 'main', 'points': 60},
{'Player': '3', 'position': 'main', 'points': 70},
{'Player': '4', 'position': 'main', 'points': 80},
{'Player': '5', 'position': 'main', 'points': 90}]
def getDict(i):
    for retDict in main_dict:
        if i == retDict.get('points'):
            return retDict
print(getDict(90))

You can filter the list down to the dicts with 'points': 90 using a list comprehension:
[inner_dict for inner_dict in main_dict if inner_dict['points'] == 90]

Yeah I'm gonna assume this is an XY problem and you should just get the item directly, without first finding the points value. For example if you want the player with the most points:
>>> max(main_dict, key=lambda d: d['points'])
{'Player': '5', 'position': 'main', 'points': 90}

Extract value in Python

My Code:
import requests
import json
web_page = requests.get("http://api.bart.gov/api/etd.aspx?cmd=etd&orig=mont&key=MW9S-E7SL-26DU-VV8V&json=y")
response = web_page.text
parsed_json = json.loads(response)
#print(parsed_json)
print(parsed_json['root']['date'])
print(parsed_json['root']['time'])
print(parsed_json['root']['station']['name'])
How can I extract the destination and minutes values from the data below in Python?
[{'name': 'Montgomery St.', 'abbr': 'MONT', 'etd': [{'destination': 'Daly City', 'abbreviation': 'DALY', 'limited': '0', 'estimate': [{'minutes': '39', 'platform': '1', 'direction': 'South', 'length': '10', 'color': 'WHITE', 'hexcolor': '#ffffff', 'bikeflag': '1', 'delay': '220'}]}, {'destination': 'SF Airport', 'abbreviation': 'SFIA', 'limited': '0', 'estimate': [{'minutes': '16', 'platform': '1', 'direction': 'South', 'length': '10', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '132'}, {'minutes': '26', 'platform': '1', 'direction': 'South', 'length': '10', 'color': 'BLUE', 'hexcolor': '#0099cc', 'bikeflag': '1', 'delay': '69'}]}]}]

Try this:
json_obj = {'name': 'Montgomery St.', 'abbr': 'MONT', 'etd': [{'destination': 'Antioch', 'abbreviation': 'ANTC', 'limited': '0', 'estimate': [{'minutes': '1', 'platform': '2', 'direction': 'North', 'length': '10', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '254'}]},
{'destination': 'Daly City', 'abbreviation': 'DALY', 'limited': '0', 'estimate': [{'minutes': '39', 'platform': '1', 'direction': 'South', 'length': '0', 'color': 'BLUE', 'hexcolor': '#0099cc', 'bikeflag': '1', 'delay': '0'}]},
{'destination': 'SF Airport', 'abbreviation': 'SFIA', 'limited': '0', 'estimate': [{'minutes': '38', 'platform': '1', 'direction': 'South', 'length': '10', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '0'}]}]}
for item in json_obj['etd']:
    dest = item['destination']
    minute = item['estimate'][0]['minutes']
    print(dest, minute)
Output:
Antioch 1
Daly City 39
SF Airport 38

The problem is in parsed_json['root']['station']['name']. parsed_json['root']['station'] is a list, not a dict, so it cannot be indexed with the 'name' key. You need to use index 0 or iterate over it:
for station in parsed_json['root']['station']:
    for etd in station['etd']:
        for estimate in etd['estimate']:
            print(etd['destination'], estimate['minutes'])
Output
Daly City 35
SF Airport 16
SF Airport 26
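The nested loops above can also be sketched as a single comprehension that collects the pairs instead of printing them. The payload below is a trimmed, hand-written sample that only mimics the shape shown in the question (a root object holding a station list); it is not a real API response:

```python
import json

# Trimmed sample with the same nesting as the question's data:
# root -> station (list) -> etd (list) -> estimate (list).
sample = """
{"root": {"station": [
  {"name": "Montgomery St.", "abbr": "MONT", "etd": [
    {"destination": "Daly City", "estimate": [{"minutes": "39"}]},
    {"destination": "SF Airport", "estimate": [{"minutes": "16"}, {"minutes": "26"}]}
  ]}
]}}
"""

parsed_json = json.loads(sample)

# One (destination, minutes) tuple per estimate, in document order.
pairs = [
    (etd["destination"], estimate["minutes"])
    for station in parsed_json["root"]["station"]
    for etd in station["etd"]
    for estimate in etd["estimate"]
]

for destination, minutes in pairs:
    print(destination, minutes)
```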

Try this to get the JSON data:
import json
# json_data is already a Python dict here:
json_data= {'destination': 'Daly City', 'abbreviation': 'DALY', 'limited': '0', 'estimate': [{'minutes': '39', 'platform': '1', 'direction': 'South', 'length': '0', 'color': 'BLUE', 'hexcolor': '#0099cc', 'bikeflag': '1', 'delay': '0'}]}
# round-trip through a JSON string; only json.loads is needed if you start from JSON text:
data = json.dumps(json_data)
extract_json = json.loads(data)
print("Destination: "+extract_json["destination"])
print("Minutes: "+extract_json["estimate"][0]["minutes"])
Output:
Destination: Daly City
Minutes: 39

Assuming the data is in d_MONT:
d_MONT = {'name': 'Montgomery St.', 'abbr': 'MONT', 'etd': [{'destination': 'Antioch', 'abbreviation': 'ANTC', 'limited': '0', 'estimate': [{'minutes': '1', 'platform': '2', 'direction': 'North', 'length': '10', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '254'}]},
{'destination': 'Daly City', 'abbreviation': 'DALY', 'limited': '0', 'estimate': [{'minutes': '39', 'platform': '1', 'direction': 'South', 'length': '0', 'color': 'BLUE', 'hexcolor': '#0099cc', 'bikeflag': '1', 'delay': '0'}]},
{'destination': 'SF Airport', 'abbreviation': 'SFIA', 'limited': '0', 'estimate': [{'minutes': '38', 'platform': '1', 'direction': 'South', 'length': '10', 'color': 'YELLOW', 'hexcolor': '#ffff33', 'bikeflag': '1', 'delay': '0'}]}]}
This will find the next train to destinationRequired:
destinationList = d_MONT['etd']
destinationRequired = 'Daly City'
for destinationDict in destinationList:
    if destinationDict['destination'] == destinationRequired:
        earliest = None
        for estimate in destinationDict['estimate']:
            # minutes is a string, so convert before comparing
            # (this assumes every value is a numeric string)
            minutes = int(estimate['minutes'])
            if earliest is None or minutes < earliest:
                earliest = minutes
        print("Next train to {0}: {1} minutes".format(destinationRequired, earliest))
        break
else:
    # the for loop finished without hitting break
    print("No trains to {0}".format(destinationRequired))
Note there are more Pythonic ways to do this, and the code example above does not follow PEP8, but I think it is important you understand the basic logic of how to do what you want rather than a complex Python one-liner.
You do not document the JSON object format, so I don't think it is safe to assume the list of trains to destination will be in order, therefore the safest is to step through each one and find the earliest. It isn't even clear if more than one train will ever be returned in the list, in which case a simple [0] would be sufficient rather than stepping through each one.
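As a rough illustration of the more compact style mentioned above, here is a sketch using min. The helper name next_train is hypothetical, the station dict is a made-up sample in the same shape as d_MONT, and skipping non-numeric minutes values is an assumption about how you might want to treat them, not documented behavior of the feed:

```python
def next_train(station, destination):
    """Return the smallest numeric 'minutes' for destination, or None."""
    for etd in station['etd']:
        if etd['destination'] == destination:
            # Keep only values that are plain digit strings, then compare
            # them as integers rather than as text.
            minutes = [int(e['minutes']) for e in etd['estimate']
                       if e['minutes'].isdigit()]
            return min(minutes, default=None)
    return None


# Made-up sample; estimates are deliberately out of order.
station = {'etd': [{'destination': 'SF Airport',
                    'estimate': [{'minutes': '39'}, {'minutes': '9'}]}]}

print(next_train(station, 'SF Airport'))
print(next_train(station, 'Daly City'))
```

Comparing the raw strings would rank '39' before '9', which is why the values are converted to int first.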

Creating a dataframe from a dictionary within tuple

I have dictionaries within a tuple, and I want to know how to access them and create a dataframe that merges the dictionary values into a single row.
Example:
({'Id': '4', 'BU': 'usa', 'V_ID': '44', 'INV': 'inv1331', 'DT': '08/1/19', 'AMT': '1500'}, {'Id': '9', 'BU': 'usa', 'V_ID': '44', 'INV': 'inv4321', 'DT': '02/6/19', 'AMT': '1000'})
Expected Result:
Id_1 BU_1 V_ID_1 INV_1 DT_1 AMT_1 Id_2 BU_2 V_ID_2 INV_2 DT_2 AMT_2
---------------------------------------------------------------------------------------------
4 usa 44 inv1331 08/1/19 1500 9 usa 44 inv4321 02/6/19 1000

import pandas as pd
x = ({'Id': '4', 'BU': 'usa', 'V_ID': '44', 'INV': 'inv1331', 'DT': '08/1/19', 'AMT': '1500'}, {'Id': '9', 'BU': 'usa', 'V_ID': '44', 'INV': 'inv4321', 'DT': '02/6/19', 'AMT': '1000'})
# suffix each key with the 1-based position of its dict in the tuple
data = {f"{k}_{i+1}": v for i, d in enumerate(x) for k, v in d.items()}
df = pd.DataFrame(data, index = [0])
Output:
>>> df
Id_1 BU_1 V_ID_1 INV_1 DT_1 ... BU_2 V_ID_2 INV_2 DT_2 AMT_2
0 4 usa 44 inv1331 08/1/19 ... usa 44 inv4321 02/6/19 1000
[1 rows x 12 columns]

Python with Json, If Statement

I have the JSON below and a list. I want a for loop or if statement that does this:
if label in selected_size:
    fsize = id
The contents of selected_size:
[7, 7.5, 4, 4.5]
The JSON:
removed
My attempt:
print(json_data)
for size in json_data:
    if ['label'] in select_size:
        fsize = ['id']
print(fsize)
I have no idea how to do it.

You need to iterate over the list first and then index into each dict, for example:
json_data = [{'id': '91', 'label': '10.5', 'price': '0', 'oldPrice': '0', 'products': ['81278']}, {'id': '150', 'label': '9.5', 'price': '0', 'oldPrice': '0', 'products': ['81276']}, {'id': '28', 'label': '4', 'price': '0', 'oldPrice': '0', 'products': ['81270']}, {'id': '29', 'label': '5', 'price': '0', 'oldPrice': '0', 'products': ['81271']}, {'id': '22', 'label': '8', 'price': '0', 'oldPrice': '0', 'products': ['81274']}, {'id': '23', 'label': '9', 'price': '0', 'oldPrice': '0', 'products': ['81275']}, {'id': '24', 'label': '10', 'price': '0', 'oldPrice': '0', 'products': ['81277']}, {'id': '25', 'label': '11', 'price': '0', 'oldPrice': '0', 'products': ['81279']}, {'id': '26', 'label': '12', 'price': '0', 'oldPrice': '0', 'products': ['81280']}]
fsize = []
select_size = [7, 7.5, 4, 4.5]
select_size = [float(i) for i in select_size]  # convert all select_size values to float
for size in json_data:
    if float(size['label']) in select_size:  # labels are strings, so convert to float before comparing
        fsize.append(size['id'])  # add the matching id to the list
print(fsize)  # prints the whole list; with this data only '28' matches
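The same matching can be sketched as a list comprehension over a set of floats. The data below is a shortened, illustrative subset of the json_data list above, not the full list:

```python
# Shortened subset of the data from the answer above.
json_data = [{'id': '91', 'label': '10.5'},
             {'id': '28', 'label': '4'},
             {'id': '29', 'label': '5'}]

# A set gives fast membership tests; ints and floats compare equal,
# so 4 and 4.0 are treated as the same size.
select_size = {7, 7.5, 4, 4.5}

# Labels are strings, so convert each one to float before the lookup.
fsize = [size['id'] for size in json_data
         if float(size['label']) in select_size]

print(fsize)
```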

Convert Pandas Dataframe to_dict() with unique column values as keys

How can I convert a pandas dataframe to a dict using unique column values as the keys for the dictionary? In this case I want to use unique username's as the key.
Here is my progress so far based on information found on here and online.
My test dataframe:
import pandas
import pprint
df = pandas.DataFrame({
'username': ['Kevin', 'John', 'Kevin', 'John', 'Leslie', 'John'],
'sport': ['Soccer', 'Football', 'Racing', 'Tennis', 'Baseball', 'Bowling'],
'age': ['51','32','20','19','34','27'],
'team': ['Cowboyws', 'Packers', 'Sonics', 'Raiders', 'Wolves', 'Lakers']
})
I can create a dictionary by doing this:
dct = df.to_dict(orient='records')
pprint.pprint(dct, indent=4)
[{'age': '51', 'sport': 'Soccer', 'team': 'Cowboyws', 'username': 'Kevin'},
{'age': '32', 'sport': 'Football', 'team': 'Packers', 'username': 'John'},
{'age': '20', 'sport': 'Racing', 'team': 'Sonics', 'username': 'Kevin'},
{'age': '19', 'sport': 'Tennis', 'team': 'Raiders', 'username': 'John'},
{'age': '34', 'sport': 'Baseball', 'team': 'Wolves', 'username': 'Leslie'},
{'age': '27', 'sport': 'Bowling', 'team': 'Lakers', 'username': 'John'}]
I tried using groupby and apply, which got me closer, but it converts all the values to lists. I want them to remain dictionaries so I can retain each value's key:
result = df.groupby('username').apply(lambda x: x.values.tolist()).to_dict()
pprint.pprint(result, indent=4)
{ 'John': [ ['32', 'Football', 'Packers', 'John'],
['19', 'Tennis', 'Raiders', 'John'],
['27', 'Bowling', 'Lakers', 'John']],
'Kevin': [ ['51', 'Soccer', 'Cowboyws', 'Kevin'],
['20', 'Racing', 'Sonics', 'Kevin']],
'Leslie': [['34', 'Baseball', 'Wolves', 'Leslie']]}
This is the desired result I want:
{
'John': [{'age': '32', 'sport': 'Football', 'team': 'Packers', 'username': 'John'},
{'age': '19', 'sport': 'Tennis', 'team': 'Raiders', 'username': 'John'},
{'age': '27', 'sport': 'Bowling', 'team': 'Lakers', 'username': 'John'}],
'Kevin': [{'age': '51', 'sport': 'Soccer', 'team': 'Cowboyws', 'username': 'Kevin'},
{'age': '20', 'sport': 'Racing', 'team': 'Sonics', 'username': 'Kevin'}],
'Leslie': [{'age': '34', 'sport': 'Baseball', 'team': 'Wolves', 'username': 'Leslie'}]
}

Use groupby and apply. Inside the apply, call to_dict with the "records" orient (similar to what you've figured out already).
df.groupby('username').apply(lambda x: x.to_dict(orient='records')).to_dict()

I prefer using a for loop here. You may also want to drop the username column, since it is redundant:
d = {x: y.drop(columns='username').to_dict('records') for x, y in df.groupby('username')}
d
Out[212]:
{'John': [{'age': '32', 'sport': 'Football', 'team': 'Packers'},
{'age': '19', 'sport': 'Tennis', 'team': 'Raiders'},
{'age': '27', 'sport': 'Bowling', 'team': 'Lakers'}],
'Kevin': [{'age': '51', 'sport': 'Soccer', 'team': 'Cowboyws'},
{'age': '20', 'sport': 'Racing', 'team': 'Sonics'}],
'Leslie': [{'age': '34', 'sport': 'Baseball', 'team': 'Wolves'}]}
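If the rows are already a list of dicts (for example the output of df.to_dict(orient='records') from the question), the same grouping can be sketched without pandas. This swaps in collections.defaultdict from the standard library; it is an alternative technique, not what the answers above use, and the records list is a shortened copy of the question's data:

```python
from collections import defaultdict

# Shortened copy of the records shown in the question.
records = [
    {'age': '51', 'sport': 'Soccer', 'team': 'Cowboyws', 'username': 'Kevin'},
    {'age': '32', 'sport': 'Football', 'team': 'Packers', 'username': 'John'},
    {'age': '20', 'sport': 'Racing', 'team': 'Sonics', 'username': 'Kevin'},
    {'age': '34', 'sport': 'Baseball', 'team': 'Wolves', 'username': 'Leslie'},
]

# Each new username starts with an empty list; rows are appended in order.
grouped = defaultdict(list)
for record in records:
    grouped[record['username']].append(record)

result = dict(grouped)  # convert to a plain dict for printing
print(result)
```

Unlike groupby, this keeps the usernames in first-seen order instead of sorting them.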
