I am learning Python (teaching myself through articles and tutorials) and have come across something that I need assistance with. I have JSON data that represents TV listings, and I want to read it, edit it (remove outdated listings), and rewrite it back out. The sticking points for me are the nested data and how to iterate through/reference it to skip over the objects I don't want when writing out. Thanks. Here is some sample data:
{
"1": {
"channel_id": "1",
"img": "https://guide.tv/assets/images/channels/1.png",
"items": [
{
"category": "Comedy",
"channel": "1",
"description": "Latest episode of show A",
"end_time": "2017-09-11 20:30:00",
"language": "us",
"name": "Show A",
"quality": "720p",
"runtime": "30",
"time": "2017-09-11 20:00:00",
"version": "None"
},
{
"category": "Comedy",
"channel": "1",
"description": "Latest episode of show B",
"end_time": "2017-09-12 21:00:00",
"language": "us",
"name": "Show B",
"quality": "720p",
"runtime": "30",
"time": "2017-09-12 20:30:00",
"version": "None"
}
],
"name": "01 - NBC"
},
"2": {
"channel_id": "2",
"img": "https://guide.tv/assets/images/channels/2.png",
"items": [
{
"category": "Drama",
"channel": "2",
"description": "Latest episode of show C",
"end_time": "2017-09-10 23:00:00",
"language": "us",
"name": "Show C",
"quality": "720p",
"runtime": "180",
"time": "2017-09-10 20:00:00",
"version": "None"
},
{
"category": "Drama",
"channel": "2",
"description": "Latest episode of show D",
"end_time": "2017-09-11 23:00:00",
"language": "us",
"name": "Show D",
"quality": "720p",
"runtime": "60",
"time": "2017-09-11 22:00:00",
"version": "None"
},
{
"category": "Action",
"channel": "2",
"description": "Latest episode of Show E",
"end_time": "2017-09-11 22:00:00",
"language": "us",
"name": "Show E",
"quality": "720p",
"runtime": "180",
"time": "2017-09-11 19:00:00",
"version": "None"
},
{
"category": "Fiction",
"channel": "2",
"description": "Latest episode of show F",
"end_time": "2017-09-10 19:00:00",
"language": "us",
"name": "Show F",
"quality": "720p",
"runtime": "180",
"time": "2017-09-10 16:00:00",
"version": "None"
}
],
"name": "02 - CBS"
},
"3": {
"channel_id": "3",
"img": "https://guide.tv/assets/images/channels/3.png",
"items": [
{
"category": "Comedy",
"channel": "3",
"description": "Latest episode of show G",
"end_time": "2017-09-18 12:00:00",
"language": "us",
"name": "Show G",
"quality": "hqlq",
"runtime": "120",
"time": "2017-09-18 10:00:00",
"version": "None"
},
{
"category": "Action",
"channel": "3",
"description": "Latest episode of show H",
"end_time": "2017-09-19 12:00:00",
"language": "us",
"name": "Show H",
"quality": "hqlq",
"runtime": "120",
"time": "2017-09-19 10:00:00",
"version": "None"
}
],
"name": "03 - ABC"
}
}
This is the code I have tried:
import json
with open('file.json') as data_file:
    data = json.load(data_file)
for element in data.values():
    if 'items' in element:
        for e2 in element['items']:
            if '2017-09-10' in e2['time']:
                del e2
print json.dumps(data, indent=4, sort_keys=True)
Given how del works, that's not too surprising. You want to remove an item from the list of items, not "delete a declared variable from the current execution scope". So instead of del e2, remove e2 from element['items'], using remove and friends, e.g.:
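elements = data.values()
for element in elements:
    if 'items' in element:
        listing = element['items']
        # Iterate over a copy: removing from a list while iterating
        # over it directly can skip the entry after each removal.
        for entry in listing[:]:
            if 'time' in entry and '2017-09-10' in entry['time']:
                listing.remove(entry)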
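An alternative to removing entries one by one is to rebuild each items list with a list comprehension, which avoids modifying a list while looping over it. The following is a minimal sketch in Python 3, assuming that "outdated" means the listing's end_time is earlier than some cutoff. The names drop_outdated, rewrite_file and TIME_FORMAT are made up for illustration and are not part of the question's code.

```python
import json
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # format of the "time"/"end_time" strings above


def drop_outdated(data, cutoff):
    """Drop every listing that ended before `cutoff` (a datetime), in place."""
    for channel in data.values():
        if "items" not in channel:
            continue  # leave channels without listings untouched
        channel["items"] = [
            item for item in channel["items"]
            if datetime.strptime(item["end_time"], TIME_FORMAT) >= cutoff
        ]
    return data


def rewrite_file(src, dst, cutoff):
    """Read the listings from `src`, prune them, and write the result to `dst`."""
    with open(src) as f:
        data = json.load(f)
    with open(dst, "w") as f:
        json.dump(drop_outdated(data, cutoff), f, indent=4, sort_keys=True)
```

For example, rewrite_file('file.json', 'file_pruned.json', datetime(2017, 9, 11)) would keep only listings that end on or after 11 September 2017 (the output file name here is just a placeholder).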
{
"generated_at": "2022-09-19T15:30:42+00:00",
"sport_schedule_sport_event_markets": [{
"sport_event": {
"id": "sr:sport_event:33623209",
"start_time": "2022-09-18T17:00:00+00:00",
"start_time_confirmed": true,
"competitors": [{
"id": "sr:competitor:4413",
"name": "Baltimore Ravens",
"country": "USA",
"country_code": "USA",
"abbreviation": "BAL",
"qualifier": "home",
"rotation_number": 264
}, {
"id": "sr:competitor:4287",
"name": "Miami Dolphins",
"country": "USA",
"country_code": "USA",
"abbreviation": "MIA",
"qualifier": "away",
"rotation_number": 263
}]
},
"markets": [{
"id": "sr:market:1",
"name": "1x2",
"books": [{
"id": "sr:book:18149",
"name": "DraftKings",
"removed": true,
"external_sport_event_id": "180327504",
"external_market_id": "120498143",
"outcomes": [{
"id": "sr:outcome:1",
"type": "home",
"odds_decimal": "1.13",
"odds_american": "-800",
"odds_fraction": "1\/8",
"open_odds_decimal": "1.37",
"open_odds_american": "-270",
"open_odds_fraction": "10\/27",
"external_outcome_id": "0QA120498143#1341135376_13L88808Q1468Q20",
"removed": true
}, {
"id": "sr:outcome:2",
"type": "draw",
"odds_decimal": "31.00",
"odds_american": "+3000",
"odds_fraction": "30\/1",
"open_odds_decimal": "36.00",
"open_odds_american": "+3500",
"open_odds_fraction": "35\/1",
"external_outcome_id": "0QA120498143#1341135377_13L88808Q10Q22",
"removed": true
}, {
"id": "sr:outcome:3",
"type": "away",
"odds_decimal": "6.00",
"odds_american": "+500",
"odds_fraction": "5\/1",
"open_odds_decimal": "2.95",
"open_odds_american": "+195",
"open_odds_fraction": "39\/20",
"external_outcome_id": "0QA120498143#1341135378_13L88808Q1329Q21",
"removed": true
}]
I'm trying to get outcomes as the main table, with the sport event ID as the metadata column. The call below is not working:
#df =pd.json_normalize(data,record_path=['sport_schedule_sport_event_markets','markets','books','outcomes'],meta=[['sport_schedule_sport_event_markets','sport_event']],meta_prefix='game_',errors='ignore')
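One way to get that shape without relying on json_normalize is to walk the nesting with plain loops and build one row per outcome, carrying the sport event ID along. This is only a sketch, assuming the truncated sample continues with the same structure. The function name flatten_outcomes and the game_id column name are hypothetical.

```python
def flatten_outcomes(payload):
    """Return one dict per outcome, tagged with the ID of its sport event."""
    rows = []
    for entry in payload.get("sport_schedule_sport_event_markets", []):
        event_id = entry.get("sport_event", {}).get("id")
        for market in entry.get("markets", []):
            for book in market.get("books", []):
                for outcome in book.get("outcomes", []):
                    row = dict(outcome)        # copy so the input is not modified
                    row["game_id"] = event_id  # hypothetical column name
                    rows.append(row)
    return rows
```

The resulting list of dicts can be passed straight to pd.DataFrame(rows) to get the outcomes table.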
Edit: bug report here
I am trying to display images on maps. For example, let's say I want to make a map with regions and their coats of arms. There is a simple example below.
The text appears (Vega-Lite editor) but the images only appear with the x and y channels, not with latitude and longitude.
Is this my mistake, a known limitation, or a bug?
Linked question: is it possible to use SVG files rather than PNG?
{
"config": {"view": {"continuousWidth": 300, "continuousHeight": 400}},
"layer": [
{
"mark": {"type": "text", "font": "Ubuntu", "fontSize": 18},
"encoding": {
"latitude": {"field": "latitude", "type": "quantitative"},
"longitude": {"field": "longitude", "type": "quantitative"},
"text": {"type": "nominal", "field": "name"}
}
},
{
"mark": {"type": "image", "height": 50, "width": 50},
"encoding": {
"latitude": {"field": "latitude", "type": "quantitative"},
"longitude": {"field": "longitude", "type": "quantitative"},
"url": {"type": "nominal", "field": "url"}
}
}
],
"data": {"name": "data"},
"$schema": "https://vega.github.io/schema/vega-lite/v5.0.0.json",
"datasets": {
"data": [
{
"name": "Unterfranken",
"wikidata": "Q10547",
"latitude": 50.011166583816824,
"longitude": 9.94760069351192,
"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/b/b6/Unterfranken_Wappen.svg/512px-Unterfranken_Wappen.svg.png"
},
{
"name": "Oberfranken",
"wikidata": "Q10554",
"latitude": 50.05097220003501,
"longitude": 11.376017810598547,
"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/0/06/Wappen_Bezirk_Oberfranken2.svg/500px-Wappen_Bezirk_Oberfranken2.svg.png"
},
{
"name": "Niederbayern",
"wikidata": "Q10559",
"latitude": 48.70663162458262,
"longitude": 12.78636846158091,
"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/7/78/Wappen_Bezirk_Niederbayern.svg/352px-Wappen_Bezirk_Niederbayern.svg.png"
},
{
"name": "Oberpfalz",
"wikidata": "Q10555",
"latitude": 49.399491050485366,
"longitude": 12.117024154659765,
"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/1/18/Wappen_Oberpfalz.svg/177px-Wappen_Oberpfalz.svg.png"
},
{
"name": "Mittelfranken",
"wikidata": "Q10551",
"latitude": 49.348122683559716,
"longitude": 10.786147287006969,
"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/8/8d/Mittelfranken_Wappen.svg/393px-Mittelfranken_Wappen.svg.png"
},
{
"name": "Schwaben",
"wikidata": "Q10557",
"latitude": 48.1625551901062,
"longitude": 10.519367235613796,
"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/5/55/Wappen_Schwaben_Bayern.svg/450px-Wappen_Schwaben_Bayern.svg.png"
},
{
"name": "Oberbayern",
"wikidata": "Q10562",
"latitude": 48.10116954364846,
"longitude": 11.753750582859597,
"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/9/9e/Wappen_Oberbayern.svg/200px-Wappen_Oberbayern.svg.png"
}
]
}
}
This feature does not yet exist but will be added in future versions of Vega-Lite/Altair as per this PR https://github.com/vega/vega-lite/pull/7296.
I have a Python script that makes an HTTP GET request, and I would like to filter the response data for only specific values. An example JSON response is below:
{
"id": "38",
"name": "Report1",
"description": "",
"reportDefinitionID": "-1",
"jobID": "105600",
"type": "csv",
"status": "Completed",
"creator": {
"id": "1",
"username": "btest",
"firstname": "bob",
"lastname": "test"
},
{
"id": "39",
"name": "Report2",
"description": "",
"reportDefinitionID": "-1",
"jobID": "113218",
"type": "csv",
"status": "Completed"
"creator": {
"id": "1",
"username": "btest1",
"firstname": "Bob",
"lastname": "test1"
},
"id": "49",
"name": "Report1",
"description": "",
"reportDefinitionID": "-1",
"jobID": "113219",
"type": "csv",
"status": "Completed"
"creator": {
"id": "1",
"username": "btest1",
"firstname": "Bob",
"lastname": "test1"
}
I would like to filter the above JSON to show only reports with a given name, for example "Report1". If I filter on the name "Report1", I would expect the following to be returned:
{
"id": "38",
"name": "Report1",
"description": "",
"reportDefinitionID": "-1",
"jobID": "105600",
"type": "csv",
"status": "Completed",
"creator": {
"id": "1",
"username": "btest",
"firstname": "bob",
"lastname": "test"
},
"id": "49",
"name": "Report1",
"description": "",
"reportDefinitionID": "-1",
"jobID": "113219",
"type": "csv",
"status": "Completed"
"creator": {
"id": "1",
"username": "btest1",
"firstname": "Bob",
"lastname": "test1"
}
For the final part of the script, I would like to compare the 'id' fields and find the largest value (for example, id 38 vs. id 49), then output the JSON for that report, in this case id 49. I would like this output:
},
"id": "49",
"name": "Report1",
"description": "",
"reportDefinitionID": "-1",
"jobID": "113219",
"type": "csv",
"status": "Completed"
"creator": {
"id": "1",
"username": "btest1",
"firstname": "Bob",
"lastname": "test1"
}
For the last part, I would just like to save the id value '49' to a variable in Python.
So far what I have below is:
response_data = response.json()
input_dict = json.dumps(response_data)
input_transform = json.loads(input_dict)
# Filter python objects with list comprehensions
sort1 = sorted([r.get("id") for r in input_transform if r.get("name") == "Report1"], reverse=True)[0]
# Print sorted JSON
print(sort1)
I updated my code and now I'm getting the error below:
'str' object has no attribute 'get'
I researched it and cannot figure out what I'm doing wrong or how to get past it.
You need to pull the ID out in the list comprehension, as below:
sorted([r.get("id") for r in sample if r.get("name") == "Report1"], reverse=True)[0]
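The "'str' object has no attribute 'get'" error means the loop variable is a string. That is what happens when you iterate over a dict, since looping over a dict yields its keys. Note also that the ids are strings, so sorting them compares text rather than numbers ("100" sorts before "49"). Below is a minimal sketch that handles both points, assuming the response is either a list of report objects or a single report object. The function name latest_report_id is hypothetical.

```python
def latest_report_id(reports, name):
    """Return the largest numeric id among reports called `name`, or None."""
    # A single report object (a dict) is wrapped so the loop sees dicts, not keys.
    if isinstance(reports, dict):
        reports = [reports]
    ids = [
        int(r["id"])  # compare ids as numbers, not as strings
        for r in reports
        if isinstance(r, dict) and r.get("name") == name
    ]
    return max(ids) if ids else None
```

With the sample data from the question, report_id = latest_report_id(response_data, "Report1") would store 49 in report_id, provided response_data really has one of the two assumed shapes.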
I'm trying to web scrape this website: "https://no.unibet.com/betting/sports/filter/chess".
When I check the page_soup variable (see the code below), I see that the element I'm looking for isn't there.
Why?
from bs4 import BeautifulSoup
from selenium import webdriver
url = 'https://no.unibet.com/betting/sports/filter/chess/'
chrome_path = r"C:\Users\lakha\OneDrive\Skrivebord\chromedriver_win32 (1)\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get(url)
html = driver.page_source
page_soup = BeautifulSoup(html, features="lxml")
containers = page_soup.findAll("ul", {"class" : "KambiBC-list-view__column KambiBC-list-view__event-list"})
print(len(containers))#returns 0
I'm not sure I understand the point of using Selenium webdriver as a giant request library only to dump the static HTML into BeautifulSoup. That pretty much defeats the purpose of webdriver, which is to navigate the page dynamically and wait for JS to work.
Here's an example of using a CSS selector in webdriver to extract the elements you want:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
driver.get("https://no.unibet.com/betting/sports/filter/chess/")
selector = "ul[class='KambiBC-list-view__column KambiBC-list-view__event-list']"
# find_elements_by_css_selector was removed in Selenium 4; use find_elements with By
for elem in driver.find_elements(By.CSS_SELECTOR, selector):
    print(elem.text)
Output:
Tue
04:00 AM
Giri, A
Nepomniachtchi, Ian
+3
Giri, A
4.50
Uavgjort
1.40
Nepomniachtchi, Ian
7.50
Tue
04:00 AM
Grischuk, Alexander
Alekseenko, Kirill
+3
Grischuk, Alexander
2.75
Uavgjort
1.55
Alekseenko, Kirill
16.00
Tue
04:00 AM
Liren Ding
Hao Wang
+3
Liren Ding
4.10
Uavgjort
1.33
Hao Wang
13.00
Tue
04:00 AM
Vachier-Lagrave, M
Caruana, Fabiano
+3
Vachier-Lagrave, M
8.00
Uavgjort
1.25
Caruana, Fabiano
7.00
You can also consider hitting their JSON endpoint directly to request the data instead of going through the DOM as this post nicely shows.
The website is loaded dynamically via JavaScript once the page loads. I noticed an XHR request made to a JSON endpoint to fetch the information you are looking for. You can find it in your browser's developer tools, under the Network section.
import requests
import json
r = requests.get("https://eu-offering.kambicdn.org/offering/v2018/ub/listView/chess.json?lang=no_NO&market=NO&client_id=2&channel_id=1&ncid=1584287770903&useCombined=true").json()
print(r.keys())
print(json.dumps(r, indent=4))
Output:
dict_keys(['events', 'terms', 'activeTermIds', 'soonMode', 'categoryGroups', 'activeCategories', 'activeEventTypes', 'eventTypes', 'defaultEventType'])
{
"events": [
{
"event": {
"id": 1006198062,
"name": "Vachier-Lagrave, M - Caruana, Fabiano",
"nameDelimiter": "-",
"englishName": "Vachier-Lagrave, Maxime - Caruana, Fabiano",
"homeName": "Vachier-Lagrave, M",
"awayName": "Caruana, Fabiano",
"start": "2020-03-17T11:00:00Z",
"group": "Kandidater",
"groupId": 2000055248,
"path": [
{
"id": 1000190837,
"name": "Sjakk",
"englishName": "Chess",
"termKey": "chess"
},
{
"id": 1000190838,
"name": "VM",
"englishName": "World Championship",
"termKey": "world_championship"
},
{
"id": 2000055248,
"name": "Kandidater",
"englishName": "Candidates",
"termKey": "candidates"
}
],
"nonLiveBoCount": 3,
"sport": "CHESS",
"tags": [
"MATCH"
],
"state": "NOT_STARTED"
},
"betOffers": [
{
"id": 2208576284,
"closed": "2020-03-17T11:00:00Z",
"criterion": {
"id": 1001836486,
"label": "Kampodds",
"englishLabel": "Match Odds",
"order": [
0
]
},
"betOfferType": {
"id": 2,
"name": "Kamp",
"englishName": "Match"
},
"eventId": 1006198062,
"outcomes": [
{
"id": 2733454562,
"label": "1",
"englishLabel": "1",
"odds": 8000,
"type": "OT_ONE",
"betOfferId": 2208576284,
"changedDate": "2020-03-07T12:54:29Z",
"oddsFractional": "7/1",
"oddsAmerican": "700",
"status": "OPEN",
"cashOutStatus": "ENABLED"
},
{
"id": 2733454563,
"label": "X",
"englishLabel": "X",
"odds": 1250,
"type": "OT_CROSS",
"betOfferId": 2208576284,
"changedDate": "2020-03-07T12:54:29Z",
"oddsFractional": "1/4",
"oddsAmerican": "-400",
"status": "OPEN",
"cashOutStatus": "ENABLED"
},
{
"id": 2733454564,
"label": "2",
"englishLabel": "2",
"odds": 7000,
"type": "OT_TWO",
"betOfferId": 2208576284,
"changedDate": "2020-03-07T12:54:29Z",
"oddsFractional": "6/1",
"oddsAmerican": "600",
"status": "OPEN",
"cashOutStatus": "ENABLED"
}
],
"tags": [
"OFFERED_PREMATCH",
"MAIN"
],
"sortOrder": 1,
"cashOutStatus": "DISABLED"
}
]
},
{
"event": {
"id": 1006147747,
"name": "Liren Ding - Hao Wang",
"nameDelimiter": "-",
"englishName": "Liren Ding - Hao Wang",
"homeName": "Liren Ding",
"awayName": "Hao Wang",
"start": "2020-03-17T11:00:00Z",
"group": "Kandidater",
"groupId": 2000055248,
"path": [
{
"id": 1000190837,
"name": "Sjakk",
"englishName": "Chess",
"termKey": "chess"
},
{
"id": 1000190838,
"name": "VM",
"englishName": "World Championship",
"termKey": "world_championship"
},
{
"id": 2000055248,
"name": "Kandidater",
"englishName": "Candidates",
"termKey": "candidates"
}
],
"nonLiveBoCount": 3,
"sport": "CHESS",
"tags": [
"MATCH"
],
"state": "NOT_STARTED"
},
"betOffers": [
{
"id": 2205691273,
"closed": "2020-03-17T11:00:00Z",
"criterion": {
"id": 1001836486,
"label": "Kampodds",
"englishLabel": "Match Odds",
"order": [
0
]
},
"betOfferType": {
"id": 2,
"name": "Kamp",
"englishName": "Match"
},
"eventId": 1006147747,
"outcomes": [
{
"id": 2723380316,
"label": "1",
"englishLabel": "1",
"odds": 4100,
"type": "OT_ONE",
"betOfferId": 2205691273,
"changedDate": "2020-03-12T15:54:12Z",
"oddsFractional": "3/1",
"oddsAmerican": "310",
"status": "OPEN",
"cashOutStatus": "ENABLED"
},
{
"id": 2723380317,
"label": "X",
"englishLabel": "X",
"odds": 1330,
"type": "OT_CROSS",
"betOfferId": 2205691273,
"changedDate": "2020-03-12T15:54:12Z",
"oddsFractional": "33/100",
"oddsAmerican": "-305",
"status": "OPEN",
"cashOutStatus": "ENABLED"
},
{
"id": 2723380318,
"label": "2",
"englishLabel": "2",
"odds": 13000,
"type": "OT_TWO",
"betOfferId": 2205691273,
"changedDate": "2020-03-12T15:54:12Z",
"oddsFractional": "12/1",
"oddsAmerican": "1200",
"status": "OPEN",
"cashOutStatus": "ENABLED"
}
],
"tags": [
"OFFERED_PREMATCH",
"MAIN"
],
"sortOrder": 1,
"cashOutStatus": "DISABLED"
}
]
},
{
"event": {
"id": 1006147748,
"name": "Giri, A - Nepomniachtchi, Ian",
"nameDelimiter": "-",
"englishName": "Giri, Anish - Nepomniachtchi, Ian",
"homeName": "Giri, A",
"awayName": "Nepomniachtchi, Ian",
"start": "2020-03-17T11:00:00Z",
"group": "Kandidater",
"groupId": 2000055248,
"path": [
{
"id": 1000190837,
"name": "Sjakk",
"englishName": "Chess",
"termKey": "chess"
},
{
"id": 1000190838,
"name": "VM",
"englishName": "World Championship",
"termKey": "world_championship"
},
{
"id": 2000055248,
"name": "Kandidater",
"englishName": "Candidates",
"termKey": "candidates"
}
],
"nonLiveBoCount": 3,
"sport": "CHESS",
"tags": [
"MATCH"
],
"state": "NOT_STARTED"
},
"betOffers": [
{
"id": 2205691270,
"closed": "2020-03-17T11:00:00Z",
"criterion": {
"id": 1001836486,
"label": "Kampodds",
"englishLabel": "Match Odds",
"order": [
0
]
},
"betOfferType": {
"id": 2,
"name": "Kamp",
"englishName": "Match"
},
"eventId": 1006147748,
"outcomes": [
{
"id": 2723380307,
"label": "1",
"englishLabel": "1",
"odds": 4500,
"type": "OT_ONE",
"betOfferId": 2205691270,
"changedDate": "2020-03-01T14:00:24Z",
"oddsFractional": "7/2",
"oddsAmerican": "350",
"status": "OPEN",
"cashOutStatus": "ENABLED"
},
{
"id": 2723380308,
"label": "X",
"englishLabel": "X",
"odds": 1400,
"type": "OT_CROSS",
"betOfferId": 2205691270,
"changedDate": "2020-03-01T14:00:24Z",
"oddsFractional": "2/5",
"oddsAmerican": "-250",
"status": "OPEN",
"cashOutStatus": "ENABLED"
},
{
"id": 2723380309,
"label": "2",
"englishLabel": "2",
"odds": 7500,
"type": "OT_TWO",
"betOfferId": 2205691270,
"changedDate": "2020-03-01T14:00:24Z",
"oddsFractional": "13/2",
"oddsAmerican": "650",
"status": "OPEN",
"cashOutStatus": "ENABLED"
}
],
"tags": [
"OFFERED_PREMATCH",
"MAIN"
],
"sortOrder": 1,
"cashOutStatus": "DISABLED"
}
]
},
{
"event": {
"id": 1006147749,
"name": "Grischuk, Alexander - Alekseenko, Kirill",
"nameDelimiter": "-",
"englishName": "Grischuk, Alexander - Alekseenko, Kirill",
"homeName": "Grischuk, Alexander",
"awayName": "Alekseenko, Kirill",
"start": "2020-03-17T11:00:00Z",
"group": "Kandidater",
"groupId": 2000055248,
"path": [
{
"id": 1000190837,
"name": "Sjakk",
"englishName": "Chess",
"termKey": "chess"
},
{
"id": 1000190838,
"name": "VM",
"englishName": "World Championship",
"termKey": "world_championship"
},
{
"id": 2000055248,
"name": "Kandidater",
"englishName": "Candidates",
"termKey": "candidates"
}
],
"nonLiveBoCount": 3,
"sport": "CHESS",
"tags": [
"MATCH"
],
"state": "NOT_STARTED"
},
"betOffers": [
{
"id": 2205691271,
"closed": "2020-03-17T11:00:00Z",
"criterion": {
"id": 1001836486,
"label": "Kampodds",
"englishLabel": "Match Odds",
"order": [
0
]
},
"betOfferType": {
"id": 2,
"name": "Kamp",
"englishName": "Match"
},
"eventId": 1006147749,
"outcomes": [
{
"id": 2723380310,
"label": "1",
"englishLabel": "1",
"odds": 2750,
"type": "OT_ONE",
"betOfferId": 2205691271,
"changedDate": "2020-03-07T13:14:51Z",
"oddsFractional": "7/4",
"oddsAmerican": "175",
"status": "OPEN",
"cashOutStatus": "ENABLED"
},
{
"id": 2723380311,
"label": "X",
"englishLabel": "X",
"odds": 1550,
"type": "OT_CROSS",
"betOfferId": 2205691271,
"changedDate": "2020-03-07T13:14:51Z",
"oddsFractional": "11/20",
"oddsAmerican": "-182",
"status": "OPEN",
"cashOutStatus": "ENABLED"
},
{
"id": 2723380312,
"label": "2",
"englishLabel": "2",
"odds": 16000,
"type": "OT_TWO",
"betOfferId": 2205691271,
"changedDate": "2020-03-07T13:14:51Z",
"oddsFractional": "15/1",
"oddsAmerican": "1500",
"status": "OPEN",
"cashOutStatus": "ENABLED"
}
],
"tags": [
"OFFERED_PREMATCH",
"MAIN"
],
"sortOrder": 1,
"cashOutStatus": "DISABLED"
}
]
}
],
"terms": [
{
"type": "SPORT",
"id": "/chess",
"termKey": "chess",
"localizedName": "Sjakk",
"parentId": "/",
"englishName": "Chess"
}
],
"activeTermIds": [
"/chess"
],
"soonMode": "DAILY",
"categoryGroups": [
{
"categoryGroupName": "list_view",
"categories": [
{
"id": 16299,
"englishName": "Most Popular",
"localizedName": "Mest popul\u00e6re"
}
]
}
],
"activeCategories": [
"16299"
],
"activeEventTypes": [
"matches"
],
"eventTypes": [
"competitions",
"matches"
],
"defaultEventType": "matches"
}
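Once the JSON is parsed, the match names and odds can be pulled out with a few nested loops. This is a small sketch that works on the structure shown in the output above. Dividing odds by 1000 is an assumption based on comparing the JSON (8000, 1250, 7000) with the odds on the page (8.00, 1.25, 7.00), and the function name summarize_events is made up.

```python
def summarize_events(payload):
    """Return (event name, outcome label, decimal odds) tuples."""
    rows = []
    for item in payload.get("events", []):
        name = item["event"]["name"]
        for offer in item.get("betOffers", []):
            for outcome in offer.get("outcomes", []):
                # Assumption: the API stores decimal odds multiplied by 1000.
                rows.append((name, outcome["label"], outcome["odds"] / 1000))
    return rows
```

Calling summarize_events(r) on the response from the request above would then give one tuple per outcome, which is easy to print or load into a table.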
If you want to keep the browser non-headless and parse the required items, you can try the approach below. The script waits for the spinner to go away so that it can interact with the desired items.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = "https://no.unibet.com/betting/sports/filter/chess"
with webdriver.Chrome() as driver:
    wait = WebDriverWait(driver, 30)
    driver.get(url)
    wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR,".KambiBC-spinner-inner")))
    for elem in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,".KambiBC-event-participants"))):
        print(elem.text)
I am trying to tag content using OpenCalais. The following is the code I'm using to communicate with the API:
import httplib2
import json
import ast
# Some local values needed for the call
LOCAL_API_KEY = '***********************' # Acquire this by registering at the Calais site
CALAIS_TAG_API = 'https://api.thomsonreuters.com/permid/calais'
# Some sample text from a news story to pass to Calais for analysis
test_body = 'Samsung is closing its Milk Music streaming service'
# Header information needed by Calais.
# For more info see http://www.opencalais.com/documentation/calais-web-service-api/api-invocation/rest
headers = {
    'X-AG-Access-Token': LOCAL_API_KEY,
    'Content-Type': 'text/raw',
    'outputFormat': 'application/json',
}
# Create your http object
http = httplib2.Http()
# Make the http post request, passing the body and headers as needed.
response, content = http.request(CALAIS_TAG_API, 'POST', headers=headers, body=test_body)
jcontent = json.loads(content) # Parse the json return into a python dict
output = json.dumps(jcontent, indent=4) # Pretty print the resulting dictionary returned.
print output
Anyway, this works nicely, as I am able to get the following output (print output).
{
"http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1/cat/1": {
"score": 1,
"forenduserdisplay": "false",
"name": "Business_Finance",
"_typeGroup": "topics"
},
"http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1/cat/2": {
"score": 1,
"forenduserdisplay": "false",
"name": "Entertainment_Culture",
"_typeGroup": "topics"
},
"http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1/lid/DefaultLangId": {
"forenduserdisplay": "false",
"language": "http://d.opencalais.com/lid/DefaultLangId/InputTextTooShort",
"_typeGroup": "language"
},
"http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1/Industry/1": {
"name": "Phones & Handheld Devices - NEC",
"permid": "4294951233",
"forenduserdisplay": "false",
"_typeGroup": "industry",
"relevance": 0.8,
"rcscode": "B:1768"
},
"http://d.opencalais.com/comphash-1/c021d644-16e9-3060-96fe-b3be0cd4ae1e": {
"_typeReference": "http://s.opencalais.com/1/type/em/e/Company",
"_type": "Company",
"name": "Samsung",
"confidence": {
"aggregate": "0.905",
"resolution": "1.0",
"statisticalfeature": "0.876",
"dblookup": "0.0"
},
"_typeGroup": "entities",
"instances": [
{
"detection": "[]Samsung[ is closing its Milk Music streaming]",
"length": 7,
"exact": "Samsung",
"suffix": " is closing its Milk Music streaming",
"offset": 0
}
],
"confidencelevel": "0.905",
"relevance": 0.8,
"nationality": "N/A",
"resolutions": [
{
"name": "SAMSUNG ELECTRONICS CO,.LTD",
"permid": "4295882451",
"commonname": "Samsung Elec",
"primaryric": "005930.KS",
"score": 1,
"ticker": "005930",
"id": "https://permid.org/1-4295882451"
}
],
"forenduserdisplay": "false"
},
"doc": {
"info": {
"document": "Samsung is closing its Milk Music streaming service",
"docId": "http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1",
"docDate": "2016-08-22 13:02:01.814",
"docTitle": "",
"ontology": "http://mdaas-virtual-onecalais.int.thomsonreuters.com/owlschema/9.8/onecalais.owl.allmetadata.xml",
"calaisRequestID": "eef490a6-2e3e-7cac-156b-257ebcf3beba",
"id": "http://id.opencalais.com/b*RzenPxfvWZmjCvQqpzNA"
},
"meta": {
"stagsVer": "OneCalais_9.8-RELEASE-b6-2016-07-18_14:00:15",
"contentType": "text/raw",
"language": "InputTextTooShort",
"serverVersion": "OneCalais_9.8-RELEASE:109",
"submissionDate": "2016-08-22 13:02:01.679",
"processingVer": "AllMetadata",
"submitterCode": "0ca6a864-5659-789d-5f32-f365f695e757",
"signature": "digestalg-1|BovyytInhxJhSerNjEFvOZNAHJQ=|Q5g9GCOSi7+FnERjgY9y4B9oJukYPjYeTl6v+Zu81BJLwOBcIZZ/eA=="
}
},
"http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1/ComponentVersions": {
"version": [
"Deals Index:201608221149:201608221149",
"index-refineries:201608202306:201608202306",
"config-physicalAssets-powerStations:480:480",
"OA Index:201608212349:201608212349",
"NextTags:OneCalais_9.8-RELEASE:109",
"config-sca-DataPackage:38:38",
"com.clearforest.infoext.dial4j.plugins-basistechconfig:OneCalais_9.8-RELEASE:109",
"People Index:201608221124:201608221124",
"config-negativeSignature:480:480",
"Dial4J:OneCalais_8.6-RELEASE:209",
"OA Override:507:507",
"People Override:480:480",
"index-vessels:201608201644:201608201644",
"config-refineries:480:480",
"config-cse:507:507",
"config-vessels:480:480",
"OneCalais:OneCalais_9.8-RELEASE:109",
"config-physicalAssets-mines:480:480",
"SocialTags Index:201608212334:201608212334",
"BlackList:504:504",
"index-ports:201608202256:201608202256",
"config-physicalAssets-ports:480:480",
"config-drugs:480:480"
],
"_typeGroup": "versions"
},
"http://d.opencalais.com/comphash-1/e89d0187-8b46-3f8d-9f6b-4995a709c85e": {
"_typeReference": "http://s.opencalais.com/1/type/em/e/Company",
"_type": "Company",
"name": "Milk Music",
"confidence": {
"aggregate": "0.499",
"resolution": "0.0",
"statisticalfeature": "0.775",
"dblookup": "0.0"
},
"_typeGroup": "entities",
"instances": [
{
"suffix": " streaming service",
"prefix": "Samsung is closing its ",
"detection": "[Samsung is closing its ]Milk Music[ streaming service]",
"length": 10,
"offset": 23,
"exact": "Milk Music"
}
],
"confidencelevel": "0.499",
"relevance": 0.8,
"nationality": "N/A",
"forenduserdisplay": "false"
},
"http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1/cat/3": {
"score": 1,
"forenduserdisplay": "false",
"name": "Technology_Internet",
"_typeGroup": "topics"
},
"http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1/SocialTag/1": {
"name": "Streaming music services",
"importance": "1",
"_typeGroup": "socialTag",
"originalValue": "Streaming music services",
"socialTag": "http://d.opencalais.com/genericHasher-1/d1447a37-4c52-3b2f-a9d2-40984014685b",
"forenduserdisplay": "true",
"id": "http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1/SocialTag/1"
},
"http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1/SocialTag/3": {
"name": "Milk Music",
"importance": "1",
"_typeGroup": "socialTag",
"originalValue": "Milk Music (streaming service)",
"socialTag": "http://d.opencalais.com/genericHasher-1/471ee9b8-9f72-3a81-aa13-2a7d44658521",
"forenduserdisplay": "true",
"id": "http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1/SocialTag/3"
},
"http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1/SocialTag/2": {
"name": "Digital audio",
"importance": "1",
"_typeGroup": "socialTag",
"originalValue": "Digital audio",
"socialTag": "http://d.opencalais.com/genericHasher-1/64447afb-045b-34db-9a52-dae5bed0254e",
"forenduserdisplay": "true",
"id": "http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1/SocialTag/2"
},
"http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1/SocialTag/5": {
"name": "Smartphones",
"importance": "2",
"_typeGroup": "socialTag",
"originalValue": "Smartphones",
"socialTag": "http://d.opencalais.com/genericHasher-1/e42d9d7b-150b-3c30-974f-87a1fba000ef",
"forenduserdisplay": "true",
"id": "http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1/SocialTag/5"
},
"http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1/SocialTag/4": {
"name": "Samsung",
"importance": "2",
"_typeGroup": "socialTag",
"originalValue": "Samsung",
"socialTag": "http://d.opencalais.com/genericHasher-1/97370f53-c2f8-31b8-bbcf-aa685e504714",
"forenduserdisplay": "true",
"id": "http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1/SocialTag/4"
},
"http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1/SocialTag/7": {
"name": "Samsung Galaxy",
"importance": "2",
"_typeGroup": "socialTag",
"originalValue": "Samsung Galaxy",
"socialTag": "http://d.opencalais.com/genericHasher-1/64b8e664-bbdc-3731-b712-eb30990eab6f",
"forenduserdisplay": "true",
"id": "http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1/SocialTag/7"
},
"http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1/SocialTag/6": {
"name": "Samsung Music Hub",
"importance": "2",
"_typeGroup": "socialTag",
"originalValue": "Samsung Music Hub",
"socialTag": "http://d.opencalais.com/genericHasher-1/a3310a01-ef6f-314e-90b2-a303822b965c",
"forenduserdisplay": "true",
"id": "http://d.opencalais.com/dochash-1/8a7adab6-d07e-38e6-b0c9-5db5220336c1/SocialTag/6"
}
}
Something I noticed is that the keys are all hyperlinks. Anyway, I want to print all the socialTags from the output. To do that, I wrote this following code:
# Print all the social tags
for key, value in ast.literal_eval(output).items():
    if value["_typeGroup"] == 'socialTag':
        print value["name"]
However, I get this error:
Traceback (most recent call last):
File "opencal.py", line 30, in <module>
if value["_typeGroup"] == 'socialTag':
KeyError: '_typeGroup'
What is this error? Or to be more precise, what is the correct way to get the socialTags? Thanks.
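For what it's worth, the KeyError is raised because not every top-level entry has a "_typeGroup" key. The "doc" entry in the output above, for instance, only contains "info" and "meta". Below is a minimal sketch (Python 3 syntax) that uses dict.get so that such entries are skipped instead of raising. It works on the already parsed dict (jcontent in the code above), so the round trip through json.dumps and ast.literal_eval is not needed. The function name social_tag_names is hypothetical.

```python
def social_tag_names(parsed):
    """Return the names of all entries whose _typeGroup is 'socialTag'."""
    return [
        value["name"]
        for value in parsed.values()
        # .get returns None for entries such as "doc" that lack the key
        if isinstance(value, dict) and value.get("_typeGroup") == "socialTag"
    ]
```

Printing the tags is then a matter of looping over social_tag_names(jcontent).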