In the first half of my code I successfully extracted the match data I needed, but I can't get the second part working. I'm reading JSON data in much the same way, yet I'm getting strings rather than dictionaries. I'm sure it's a logic problem or something; please help. The working part is on my GitHub: https://github.com/LEvinson2504/Football-Prediction-and-analysis
#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib.request
import json

# Match odds
def SportDemo():
    # Set url parameter
    url = "http://api.isportsapi.com/sport/free/football/odds/main?api_key=" + api_key

    # Call iSport API to get data in JSON format
    f = urllib.request.urlopen(url)
    content = f.read()
    # data = json.loads(content.decode('utf-8'))
    data = content.decode('utf-8')

    '''store match ids
    matches = []
    # English teams match id
    for team in data['data']:
        if team == 'English Premier League':
            # store match ids
            matches.append(team['matchId'])
    '''

    # Here is the problem; tried several ways to access the data
    for i in data[data]:
        print(i['asia'])
    '''
    for match in data[data]['asia']:
        for coun in match:
            print(coun)
    '''

SportDemo()
Expected output: I want to read inside the dictionaries to get the data inside the keys "europe" and "asia". The JSON format is documented here: https://www.isportsapi.com/docs?isportsDocIndex=1-4-24 (sorry, I couldn't format it here). But I get nothing.
Firstly, when asking a question please take the time to tidy it up so that it represents what you actually ran and remove any commented-out code.
In your case, the problem can be reduced to:
f = urllib.request.urlopen(url)
content = f.read()
data = json.loads(content.decode('utf-8'))

# here is the problem, tried several ways to access data
for i in data[data]:
    print(i['asia'])
and we can actually see what the issue is. data is a dict; within that dict is a key 'data', which is itself a dict. (data[data] fails outright because you are using the dict itself as a key; you meant data['data'].) Iterating through a dict gives you the keys. If you just want to access the 'asia' data, then do so directly, no need to loop at all:
print(data['data']['asia'])
If you did want to iterate through every item, then use items():
for region, matches in data['data'].items():
    print(region)
    print(matches)
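To see the difference concretely, here is a minimal sketch against a made-up payload shaped like the API response (the exact fields are assumptions based on the docs linked above):

```python
import json

# Hypothetical payload shaped like the odds API response
raw = '{"code": 0, "data": {"asia": [{"matchId": "1"}], "europe": [{"matchId": "2"}]}}'
data = json.loads(raw)  # parse the JSON string into a dict

# Direct access to one region
print(data['data']['asia'])

# Or iterate every region with items()
for region, matches in data['data'].items():
    print(region, len(matches))
```

Without the json.loads step, data stays a plain string and subscripting it with a key is meaningless, which is exactly the symptom in the question.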
The downloaded data is too big (6.2 MB), so change the Jupyter notebook configuration file (jupyter_notebook_config.py):
Edit ~/.jupyter/jupyter_notebook_config.py
If you cannot find the file, generate it with:
$ jupyter notebook --generate-config
Open the file and edit:
c.NotebookApp.iopub_data_rate_limit = 10000000
and restart with $ jupyter notebook.
url = "http://api.isportsapi.com/sport/free/football/odds/main?api_key=" + api_key

# Call iSport API to get data in JSON format
f = urllib.request.urlopen(url)
content = f.read()
# print(content.decode('utf-8'))
data = json.loads(content.decode('utf-8'))
print(data['data']['asian'])  # note: there is no 'asia' field in that content, the key is 'asian'
Output is
[{'matchId': '4196461', 'companyId': '1', 'initialHandicap': '-0.25', 'initialHome': '0.78', 'initialAway': '1.02', 'instantHandicap': '-0.25', 'instantHome': '0.78', 'instantAway': '1.02', 'modifyTime': 1567434821, 'close': False, 'inPlay': False}, {'matchId': '4196461', 'companyId': '3', 'initialHandicap': '-0.25', 'initialHome': '0.91', 'initialAway': '0.91', 'instantHandicap': '-0.25', 'instantHome': '0.81', 'instantAway': '1.09', 'modifyTime': 1567709243, 'close': False, 'inPlay': True}, {'matchId': '4196461', 'companyId': '8', 'initialHandicap': '-0.25', 'initialHome': '0.85', 'initialAway': '1.00', 'instantHandicap': '-0.25', 'instantHome': '0.80',
...
I tried web scraping like this. I want to get the price and name of a product on the website, and I don't know how to extract the specific inline JSON script, <script type="application/ld+json">, that holds the product details. So I extracted all the inline script data using BeautifulSoup and assigned it to scripts. I tried many ways to extract the one specific script, but none worked, so I sliced it like a list: I used indexing and chose index [6] to isolate the script I wanted, assigning it to the name product_script. After that I used some techniques to split out the price and product name. But I would like another way to extract data from a JSON inline script.
This is my code:
def function_glomark_name(url_glomark):
    global product_name_glomark
    req2 = requests.get(url_glomark)
    product_request(req2)
    head_part = soup.find('head')
    scripts = head_part.find_all('script')
    product_script = scripts[6]
    # Remove tags
    pd_list = product_script.contents
    for item in pd_list:
        product_des = item
    # Make dictionary
    product_glomark = json.loads(product_des)
    # Assign product_name_glomark
    product_name_glomark = product_glomark['name']
    print(product_name_glomark)
    return product_name_glomark

glomark_coconut = 'https://glomark.lk/coconut/p/11624'
# After calling the function
function_glomark_name(glomark_coconut)
function_laughs_name(laughs_coconut)
Output: Coconut
To parse contents of the specific <script> you can use this example:
import json
import requests
from bs4 import BeautifulSoup
url = "https://glomark.lk/coconut/p/11624"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
s = soup.select_one('script[type="application/ld+json"]')
data = json.loads(s.text)
for key, value in data.items():
print(f"{key=} {value=}")
print("-" * 80)
print(f'Name is {data["name"]}')
Prints:
key='@context' value='https://schema.org'
key='@type' value='Product'
key='productID' value='11624'
key='name' value='Coconut'
key='description' value='Coconut'
key='url' value='/coconut/p/11624'
key='image' value='https://objectstorage.ap-mumbai-1.oraclecloud.com/n/softlogicbicloud/b/cdn/o/products/310310--01--1555692325.jpeg'
key='brand' value='GLOMARK'
key='offers' value=[{'@type': 'Offer', 'price': '92', 'priceCurrency': 'LKR', 'itemCondition': 'https://schema.org/NewCondition', 'availability': 'https://schema.org/InStock'}]
--------------------------------------------------------------------------------
Name is Coconut
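The select_one approach above also works offline against inline HTML, which is handy for testing without hitting the network (the snippet below is made up to mirror the real script's shape):

```python
import json
from bs4 import BeautifulSoup

# Inline HTML stand-in for the real page; the script content is invented
html = """
<head>
<script type="application/ld+json">
{"@type": "Product", "name": "Coconut",
 "offers": [{"price": "92", "priceCurrency": "LKR"}]}
</script>
</head>
"""
soup = BeautifulSoup(html, "html.parser")
# Select by type attribute instead of guessing an index like scripts[6]
s = soup.select_one('script[type="application/ld+json"]')
data = json.loads(s.text)
print(data["name"])
print(data["offers"][0]["price"])
```

Selecting by the type attribute is more robust than a hard-coded index, which silently breaks whenever the site adds or removes a script tag.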
I am currently trying to read out the locations of a company. The information about the locations is inside a script tag (JSON), so I read out the content of the corresponding script tag.
This is my code:
sauce = requests.get('https://www.ep.de/store-finder', verify=False, headers={'User-Agent': 'Mozilla/5.0'})
soup1 = BeautifulSoup(sauce.text, features="html.parser")
all_scripts = soup1.find_all('script')[6]
all_scripts.contents
The output is:
['\n\t\twindow.storeFinderComponent = {"center":{"lat":51.165691,"long":10.451526},"bounds":[[55.655085,5.160441],[46.439648,15.666775]],"stores":[{"code":"1238240","lat":51.411572,"long":10.425264,"name":"EP:Schulze","url":"/schulze-breitenworbis","showAsClosed":false,"isBusinessCard":false,"logoUrl":"https://cdn.prod.team-ec.com/logo/retailer/retailerlogo_epde_1238240.png","address":{"street":"Weststraße 6","zip":"37339","town":"Breitenworbis","phone":"+49 (36074) 31193"},"email":"info@ep-schulze-breitenworbis.de","openingHours":[{"day":"Mo.","openingTime":"09:00","closingTime":"18:00","startPauseTime":"13:00","endPauseTime":"14:30"},{"day":"Di.","openingTime":"09:00","closingTime":"18:00","startPauseTime":"13:00","endPauseTime":"14:30"},{"day":"Mi.","openingTime":"09:00","closingTime":"18:00","startPauseTime":"13:00","endPauseTime":"14:30"},...]
I have problems converting the content to a dictionary and reading all the lat and long data.
When I try:
data = json.loads(all_scripts.get_text())
all_scripts.get_text() returns an empty string, so this fails.
So I tried:
data = json.loads(all_scripts.contents)
But then I get a TypeError: the JSON object must be str, bytes or bytearray, not list.
I don't know how to convert the .contents result to JSON:
data = json.loads(str(all_scripts.contents))
JSONDecodeError: Expecting value: line 1 column 2 (char 1)
Can anyone help me?
You could use a regex to pull the JSON out of the string and parse that:
import requests
import re
import json

html = requests.get('https://www.ep.de/store-finder', verify=False, headers={'User-Agent': 'Mozilla/5.0'}).text
pattern = re.compile(r'window\.storeFinderComponent = ({.*})')
result = pattern.search(html).group(1)
jsonData = json.loads(result)
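The same regex approach can be checked offline against a stand-in string (the payload below is made up to match the page's shape):

```python
import re
import json

# Invented stand-in for the page source; the assignment mirrors the real one
html = 'foo\n\t\twindow.storeFinderComponent = {"center": {"lat": 51.1, "long": 10.4}, "stores": []};\nbar'

# Greedy {.*} captures from the first '{' to the last '}' on that line,
# which is exactly the JSON object (the trailing ';' is excluded)
pattern = re.compile(r'window\.storeFinderComponent = ({.*})')
result = pattern.search(html).group(1)
data = json.loads(result)
print(data["center"]["lat"])
```

Because `.` does not match newlines by default, the capture stays on the assignment's line, which is what keeps the greedy match from swallowing the rest of the document.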
You can remove the first part of the data, then the trailing characters, and then load the data as JSON (note the slice must be taken from removed_data, not data):
import json

data = all_scripts.contents[0]
removed_data = data.replace("\n\t\twindow.storeFinderComponent = ", "")
clean_data = removed_data[:-3]
json_data = json.loads(clean_data)
Output:
{'center': {'lat': 51.165691, 'long': 10.451526},
'bounds': [[55.655085, 5.160441], [46.439648, 15.666775]],
'stores': [{'code': '1238240',
'lat': 51.411572,
....
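A slightly more robust sketch of the same strip-the-wrapper idea: locate the first '{' and the last '}' instead of hard-coding how many characters to slice off each end (the sample contents string below is made up to match the page's shape):

```python
import json

# Invented script contents in the same shape as the page's inline script
contents = '\n\t\twindow.storeFinderComponent = {"center": {"lat": 51.165691, "long": 10.451526}};\n\t'

# Cut everything before the first '{' and after the last '}' rather than
# slicing a fixed number of characters, which breaks if the whitespace changes
start = contents.index('{')
end = contents.rindex('}') + 1
data = json.loads(contents[start:end])
print(data['center']['long'])
```

This survives changes in the leading tabs or trailing newline, whereas a fixed `[:-3]` slice silently corrupts the JSON if the page's formatting shifts.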
I was using the requests module to get some data in JSON form, and I want to assign some of the output to variables in the app. For example, the result was:
{'text': 'example',
 'type': 'text'}
I wanted to create variables that automatically store text as example and type as text.
I tried to create a function and put the first code in it, but it didn't work. The code for it was:
import requests
import json
import pprint

def new_func():
    url = '***'
    r = requests.get(url)
    data = r.json()
    pprint.pprint(data)
    print(data)

text = new_func.text()
print(text)
However, it gives me an error, as text is not a member of new_func. text was part of the output, as I mentioned before.
You basically have what is called a dictionary in Python.
A dictionary looks like this: dictionary = {key: value}
You can get the value of a key using dictionary.get(key).
For example, consider the code below:
def getValue(key):
    data = {'text': 'some text here',
            'type': 'some text here 2'}
    return data.get(key)

your_value = getValue('type')
This function will return 'some text here 2' when we get 'type' from data.
You don't even necessarily need a function for this. You can just have this:
data = {'text': 'some text here',
        'type': 'some text here 2'}
your_value = data.get('type')
You should be able to apply this to your case.
Hope that helps.
You should take a look at the JSON module in Python. Below are some links that should help:
https://docs.python.org/3/library/json.html
https://www.w3schools.com/python/python_json.asp
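As a minimal illustration of the json module with the example payload from the question:

```python
import json

# Round-trip a response-like payload through the json module
payload = {'text': 'example', 'type': 'text'}
as_string = json.dumps(payload)   # dict -> JSON string
back = json.loads(as_string)      # JSON string -> dict
text = back['text']               # plain key access gives you the value
print(text)
```

Note that requests' `.json()` already performs the `json.loads` step for you, so in the question's code `data` is already a dict and `data['text']` is all that's needed.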
You can try something like this; you can wrap any part into a function.
import requests

# get response
response = requests.get('https://api.github.com')

# parse response
response_code = response.status_code
response_json = response.json()

# pack response
packed_response = {
    'text': response_json,
    'type': 'text',
    'code': response_code,
}
More on the requests library here: https://realpython.com/python-requests/
I've scraped the timetable from this website.
The output I get is:
"ROUTE": "NAPOLI PORTA DI MASSA \u00bb ISCHIA"
but I would like:
"DEPARTURE PORT": "NAPOLI PORTA DI MASSA"
"ARRIVAL PORT": "ISCHIA"
How do I divide the string?
Here is the code:
medmar_live_departures_table = list(soup.select('li.tratta'))
departure_time = []
for li in medmar_live_departures_table:
    next_li = li.find_next_sibling("li")
    while next_li and next_li.get("data-toggle"):
        if next_li.get("class") == ["corsa-yes"]:
            # departure_time.append(next_li.strong.text)
            medmar_live_departures_data.append({
                'ROUTE': li.text
            })
Two things:
1. Since "»" is a non-ASCII character, Python shows it as "\u00bb", so splitting the text on that escape works:
parse = li.get_text().split('\u00bb')
Alternatively, you can use the re library to split on any non-ASCII character (add import re if you choose this path):
import re

non_ascii = li.get_text()
parse = re.split('[^\x00-\x7f]', non_ascii)
# [^\x00-\x7f] matches non-ASCII characters, as pointed out by Moinuddin Quadri in https://stackoverflow.com/questions/40872126/python-replace-non-ascii-character-in-string
Either way the split produces a list of parts, but not every text in an "li" tag carries the "»" character (e.g. the "POZZUOLI-PROCIDA" entry at the end of the table on the website), so we must account for that or we'll run into issues.
2. A dictionary may be a poor choice of data structure, since the data you are parsing repeats keys. For example, with POZZUOLI » CASAMICCIOLA and POZZUOLI » PROCIDA, both arrivals would share the key POZZUOLI, and Python would simply overwrite the value: POZZUOLI: CASAMICCIOLA would become POZZUOLI: PROCIDA instead of both routes being kept as separate entries.
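The key-collision problem described above can be demonstrated in a few lines:

```python
# A plain dict loses routes: repeated departure ports overwrite each other
routes = {}
routes['POZZUOLI'] = 'CASAMICCIOLA'
routes['POZZUOLI'] = 'PROCIDA'   # silently replaces the previous value
print(routes)                    # only one POZZUOLI route survives

# A list of (departure, arrival) tuples keeps both routes
ports = [('POZZUOLI', 'CASAMICCIOLA'), ('POZZUOLI', 'PROCIDA')]
print(len(ports))
```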
I suggest adding each part of the parse into lists as tuples, like so:
single_port = []
ports = []
medmar_live_departures_table = list(bs.select('li.tratta'))
departure_time = []
for li in medmar_live_departures_table:
    next_li = li.find_next_sibling("li")
    while next_li and next_li.get("data-toggle"):
        if next_li.get("class") == ["corsa-yes"]:
            # departure_time.append(next_li.strong.text)
            non_ascii = li.get_text()
            parse = re.split('[^\x00-\x7f]', non_ascii)
            # The if statement handles table strings without the non-ASCII character "»"
            if len(parse) > 1:
                ports.append((parse[0], parse[1]))
            else:
                single_port.append(parse[0])
        next_li = next_li.find_next_sibling("li")  # advance, otherwise the while loop never ends

# This will print out your data in your desired manner
for i in ports:
    print("DEPARTURE: " + i[0])
    print("ARRIVAL: " + i[1])
for i in single_port:
    print(i)
I also used the split method in a test code that I ran:
import requests
from bs4 import BeautifulSoup
import re

url = "https://www.medmargroup.it/"
response = requests.get(url)
bs = BeautifulSoup(response.text, 'html.parser')
timeTable = bs.find('section', class_="primarystyle-timetable")
medmar_live_departures_table = timeTable.find('ul')

single_port = []
ports = []
for li in medmar_live_departures_table.find_all('li', class_="tratta"):
    parse = li.get_text().split('\u00bb')
    if len(parse) > 1:
        ports.append((parse[0], parse[1]))
    else:
        single_port.append(parse[0])

for i in ports:
    print("DEPARTURE: " + i[0])
    print("ARRIVAL: " + i[1])
for i in single_port:
    print(i)
I hope this helps!
Try this (note: the escape is "\u00bb" with no space, and splitting once avoids doing the work twice):
medmar_live_departures_table = list(soup.select('li.tratta'))
departure_time = []
for li in medmar_live_departures_table:
    next_li = li.find_next_sibling("li")
    while next_li and next_li.get("data-toggle"):
        if next_li.get("class") == ["corsa-yes"]:
            # departure_time.append(next_li.strong.text)
            parts = li.text.split("\u00bb")
            medmar_live_departures_data.append({
                'DEPARTURE PORT': parts[0].strip(),
                'ARRIVAL PORT': parts[1].strip()
            })
        next_li = next_li.find_next_sibling("li")  # advance to avoid an infinite loop
I'm pretty lost, not going to lie. I'm trying to figure out how to parse JSON data from the College Scorecard API into an HTML file. I used Python to store the JSON data in a dictionary, but beyond that I'm stuck. How would you write an example sending this data to an HTML file?
def main():
    url = 'https://api.data.gov/ed/collegescorecard/v1/schools.json'
    payload = {
        'api_key': "api_key_string",
        '_fields': ','.join([
            'school.name',
            'school.school_url',
            'school.city',
            'school.state',
            'school.zip',
            '2015.student.size',
        ]),
        'school.operating': '1',
        '2015.academics.program_available.assoc_or_bachelors': 'true',
        '2015.student.size__range': '1..',
        'school.degrees_awarded.predominant__range': '1..3',
        'school.degrees_awarded.highest__range': '2..4',
        'id': '240444',
    }
    data = requests.get(url, params=payload).json()
    for result in data['results']:
        print result

main()
Output:
{u'school.city': u'Madison', u'school.school_url': u'www.wisc.edu', u'school.zip': u'53706-1380', u'2015.student.size': 29579, u'school.state': u'WI', u'school.name': u'University of Wisconsin-Madison'}
Edit: For clarification, I need to insert the return data to an HTML file that formats and removes data styling and places it onto a table.
Edit II: Json2html edit
data = requests.get(url, params=payload).json()
for result in data['results']:
    print result

data_processed = json.loads(data)
formatted_table = json2html.convert(json = data_processed)
index = open("index.html", "w")
index.write(formatted_table)
index.close()
Edit: json2html output: (screenshot of the rendered table omitted)
Try using the json2html module! It will convert the returned JSON into a 'human readable HTML table representation'.
This code will take your JSON output and create the HTML. Note that requests' .json() already gives you a dict, so pass it to convert() directly; calling json.loads() on a dict (as in your Edit II) raises a TypeError:
formatted_table = json2html.convert(json = data)
Then to save it as HTML you can do this:
your_file = open("filename", "w")
your_file.write(formatted_table)
your_file.close()
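If installing json2html is not an option, a stdlib-only sketch can produce a similar key/value table. The dict_to_html_table helper and the sample result row below are illustrative, not part of the original answer:

```python
import html

# Render one flat result dict (like a College Scorecard row) as an HTML table,
# escaping values so stray '<' or '&' characters can't break the markup
def dict_to_html_table(d):
    rows = "".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(html.escape(str(k)), html.escape(str(v)))
        for k, v in d.items()
    )
    return "<table>" + rows + "</table>"

result = {'school.city': 'Madison', '2015.student.size': 29579}
table = dict_to_html_table(result)
with open("index.html", "w") as f:   # same write pattern as the answer above
    f.write(table)
```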