In the first half of my code I successfully extracted the match data I needed, but I can't get the second part working. I'm reading JSON data in much the same way, yet I'm getting strings rather than dictionaries. I'm sure it's a logic problem or something; please help. The working part is on my GitHub: https://github.com/LEvinson2504/Football-Prediction-and-analysis
#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib.request
import json

# Match odds
def SportDemo():
    # Set url parameter
    url = "http://api.isportsapi.com/sport/free/football/odds/main?api_key=" + api_key

    # Call iSport API to get data in JSON format
    f = urllib.request.urlopen(url)
    content = f.read()
    # data = json.loads(content.decode('utf-8'))
    data = content.decode('utf-8')

    '''store match ids
    matches = []
    # English teams match id
    for team in data['data']:
        if team == 'English Premier League':
            # store match ids
            matches.append(team['matchId'])
    '''

    # Here is the problem; tried several ways to access the data
    for i in data[data]:
        print(i['asia'])
    '''
    for match in data[data]['asia']:
        for coun in match:
            print(coun)
    '''

SportDemo()
Expected output: I want to read inside the dictionaries to get the data inside the keys "europe" and "asia". The JSON format is documented here: https://www.isportsapi.com/docs?isportsDocIndex=1-4-24 (sorry, I couldn't format it here). But I get nothing.
Firstly, when asking a question please take the time to tidy it up so that it represents what you actually ran and remove any commented-out code.
In your case, the problem can be reduced to:
f = urllib.request.urlopen(url)
content = f.read()
data = json.loads(content.decode('utf-8'))

# here is the problem, tried several ways to access data
for i in data[data]:
    print(i['asia'])
and we can actually see what the issue is. data is a dict; within that dict is a key 'data', which is itself a dict. (data[data] fails outright because you are using the dict itself as a key; you meant data['data'].) Iterating through a dict gives you the keys. If you just want to access the 'asia' data, then do so directly, no need to loop at all:
print(data['data']['asia'])
If you did want to iterate through every item, then use items():
for region, matches in data['data'].items():
    print(region)
    print(matches)
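To see the difference concretely, here is a minimal sketch against a made-up payload shaped like the API response (the exact fields are assumptions based on the docs linked above):

```python
import json

# Hypothetical payload shaped like the odds API response
raw = '{"code": 0, "data": {"asia": [{"matchId": "1"}], "europe": [{"matchId": "2"}]}}'
data = json.loads(raw)  # parse the JSON string into a dict

# Direct access to one region
print(data['data']['asia'])

# Or iterate every region with items()
for region, matches in data['data'].items():
    print(region, len(matches))
```

Without the json.loads step, data stays a plain string and subscripting it with a key is meaningless, which is exactly the symptom in the question.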
The downloaded data is too big (6.2 MB), so change the Jupyter notebook configuration file (jupyter_notebook_config.py):
Edit ~/.jupyter/jupyter_notebook_config.py
If you cannot find the file, generate it with:
$ jupyter notebook --generate-config
Open the file and edit:
c.NotebookApp.iopub_data_rate_limit = 10000000
and restart with $ jupyter notebook.
url = "http://api.isportsapi.com/sport/free/football/odds/main?api_key=" + api_key

# Call iSport API to get data in JSON format
f = urllib.request.urlopen(url)
content = f.read()
# print(content.decode('utf-8'))
data = json.loads(content.decode('utf-8'))
print(data['data']['asian'])  # note: there is no 'asia' field in that content, the key is 'asian'
Output is
[{'matchId': '4196461', 'companyId': '1', 'initialHandicap': '-0.25', 'initialHome': '0.78', 'initialAway': '1.02', 'instantHandicap': '-0.25', 'instantHome': '0.78', 'instantAway': '1.02', 'modifyTime': 1567434821, 'close': False, 'inPlay': False}, {'matchId': '4196461', 'companyId': '3', 'initialHandicap': '-0.25', 'initialHome': '0.91', 'initialAway': '0.91', 'instantHandicap': '-0.25', 'instantHome': '0.81', 'instantAway': '1.09', 'modifyTime': 1567709243, 'close': False, 'inPlay': True}, {'matchId': '4196461', 'companyId': '8', 'initialHandicap': '-0.25', 'initialHome': '0.85', 'initialAway': '1.00', 'instantHandicap': '-0.25', 'instantHome': '0.80',
...
I tried web scraping like this. I want to get the price and name of a product on the website, and I don't know how to extract the specific inline JSON script, <script type="application/ld+json">, that holds the product details. So I extracted all the inline script data using BeautifulSoup and assigned it to scripts. I tried many ways to extract the one specific script, but none worked, so I sliced it like a list: I used indexing and chose index [6] to isolate the script I wanted, assigning it to the name product_script. After that I used some techniques to split out the price and product name. But I would like another way to extract data from a JSON inline script.
This is my code:
def function_glomark_name(url_glomark):
    global product_name_glomark
    req2 = requests.get(url_glomark)
    product_request(req2)
    head_part = soup.find('head')
    scripts = head_part.find_all('script')
    product_script = scripts[6]
    # Remove tags
    pd_list = product_script.contents
    for item in pd_list:
        product_des = item
    # Make dictionary
    product_glomark = json.loads(product_des)
    # Assign product_name_glomark
    product_name_glomark = product_glomark['name']
    print(product_name_glomark)
    return product_name_glomark

glomark_coconut = 'https://glomark.lk/coconut/p/11624'
# After calling the function
function_glomark_name(glomark_coconut)
function_laughs_name(laughs_coconut)
Output: Coconut
To parse contents of the specific <script> you can use this example:
import json
import requests
from bs4 import BeautifulSoup
url = "https://glomark.lk/coconut/p/11624"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
s = soup.select_one('script[type="application/ld+json"]')
data = json.loads(s.text)
for key, value in data.items():
print(f"{key=} {value=}")
print("-" * 80)
print(f'Name is {data["name"]}')
Prints:
key='@context' value='https://schema.org'
key='@type' value='Product'
key='productID' value='11624'
key='name' value='Coconut'
key='description' value='Coconut'
key='url' value='/coconut/p/11624'
key='image' value='https://objectstorage.ap-mumbai-1.oraclecloud.com/n/softlogicbicloud/b/cdn/o/products/310310--01--1555692325.jpeg'
key='brand' value='GLOMARK'
key='offers' value=[{'@type': 'Offer', 'price': '92', 'priceCurrency': 'LKR', 'itemCondition': 'https://schema.org/NewCondition', 'availability': 'https://schema.org/InStock'}]
--------------------------------------------------------------------------------
Name is Coconut
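The select_one approach above also works offline against inline HTML, which is handy for testing without hitting the network (the snippet below is made up to mirror the real script's shape):

```python
import json
from bs4 import BeautifulSoup

# Inline HTML stand-in for the real page; the script content is invented
html = """
<head>
<script type="application/ld+json">
{"@type": "Product", "name": "Coconut",
 "offers": [{"price": "92", "priceCurrency": "LKR"}]}
</script>
</head>
"""
soup = BeautifulSoup(html, "html.parser")
# Select by type attribute instead of guessing an index like scripts[6]
s = soup.select_one('script[type="application/ld+json"]')
data = json.loads(s.text)
print(data["name"])
print(data["offers"][0]["price"])
```

Selecting by the type attribute is more robust than a hard-coded index, which silently breaks whenever the site adds or removes a script tag.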
I am currently trying to read out the locations of a company. The information about the locations is inside a script tag (JSON), so I read out the content of the corresponding script tag.
This is my code:
sauce = requests.get('https://www.ep.de/store-finder', verify=False, headers={'User-Agent': 'Mozilla/5.0'})
soup1 = BeautifulSoup(sauce.text, features="html.parser")
all_scripts = soup1.find_all('script')[6]
all_scripts.contents
The output is:
['\n\t\twindow.storeFinderComponent = {"center":{"lat":51.165691,"long":10.451526},"bounds":[[55.655085,5.160441],[46.439648,15.666775]],"stores":[{"code":"1238240","lat":51.411572,"long":10.425264,"name":"EP:Schulze","url":"/schulze-breitenworbis","showAsClosed":false,"isBusinessCard":false,"logoUrl":"https://cdn.prod.team-ec.com/logo/retailer/retailerlogo_epde_1238240.png","address":{"street":"Weststraße 6","zip":"37339","town":"Breitenworbis","phone":"+49 (36074) 31193"},"email":"info@ep-schulze-breitenworbis.de","openingHours":[{"day":"Mo.","openingTime":"09:00","closingTime":"18:00","startPauseTime":"13:00","endPauseTime":"14:30"},{"day":"Di.","openingTime":"09:00","closingTime":"18:00","startPauseTime":"13:00","endPauseTime":"14:30"},{"day":"Mi.","openingTime":"09:00","closingTime":"18:00","startPauseTime":"13:00","endPauseTime":"14:30"},...]
I have problems converting the content to a dictionary and reading all the lat and long data.
When I try:
data = json.loads(all_scripts.get_text())
all_scripts.get_text() returns an empty string, so this fails.
So I tried:
data = json.loads(all_scripts.contents)
But then I get a TypeError: the JSON object must be str, bytes or bytearray, not list.
I don't know how to convert the .contents result to JSON:
data = json.loads(str(all_scripts.contents))
JSONDecodeError: Expecting value: line 1 column 2 (char 1)
Can anyone help me?
You could use a regex to pull the JSON out of the string and parse that:
import requests
import re
import json

html = requests.get('https://www.ep.de/store-finder', verify=False, headers={'User-Agent': 'Mozilla/5.0'}).text
pattern = re.compile(r'window\.storeFinderComponent = ({.*})')
result = pattern.search(html).group(1)
jsonData = json.loads(result)
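The same regex approach can be checked offline against a stand-in string (the payload below is made up to match the page's shape):

```python
import re
import json

# Invented stand-in for the page source; the assignment mirrors the real one
html = 'foo\n\t\twindow.storeFinderComponent = {"center": {"lat": 51.1, "long": 10.4}, "stores": []};\nbar'

# Greedy {.*} captures from the first '{' to the last '}' on that line,
# which is exactly the JSON object (the trailing ';' is excluded)
pattern = re.compile(r'window\.storeFinderComponent = ({.*})')
result = pattern.search(html).group(1)
data = json.loads(result)
print(data["center"]["lat"])
```

Because `.` does not match newlines by default, the capture stays on the assignment's line, which is what keeps the greedy match from swallowing the rest of the document.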
You can remove the first part of the data, then the trailing characters, and then load the data as JSON (note the slice must be taken from removed_data, not data):
import json

data = all_scripts.contents[0]
removed_data = data.replace("\n\t\twindow.storeFinderComponent = ", "")
clean_data = removed_data[:-3]
json_data = json.loads(clean_data)
Output:
{'center': {'lat': 51.165691, 'long': 10.451526},
'bounds': [[55.655085, 5.160441], [46.439648, 15.666775]],
'stores': [{'code': '1238240',
'lat': 51.411572,
....
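A slightly more robust sketch of the same strip-the-wrapper idea: locate the first '{' and the last '}' instead of hard-coding how many characters to slice off each end (the sample contents string below is made up to match the page's shape):

```python
import json

# Invented script contents in the same shape as the page's inline script
contents = '\n\t\twindow.storeFinderComponent = {"center": {"lat": 51.165691, "long": 10.451526}};\n\t'

# Cut everything before the first '{' and after the last '}' rather than
# slicing a fixed number of characters, which breaks if the whitespace changes
start = contents.index('{')
end = contents.rindex('}') + 1
data = json.loads(contents[start:end])
print(data['center']['long'])
```

This survives changes in the leading tabs or trailing newline, whereas a fixed `[:-3]` slice silently corrupts the JSON if the page's formatting shifts.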
I was using the requests module to get some data in JSON form, and I want to assign some of the output to variables in the app. For example, the result was:
{'text': 'example',
 'type': 'text'}
I wanted to create variables that automatically store text as example and type as text.
I tried to create a function and put the first code in it, but it didn't work. The code for it was:
import requests
import json
import pprint

def new_func():
    url = '***'
    r = requests.get(url)
    data = r.json()
    pprint.pprint(data)
    print(data)

text = new_func.text()
print(text)
However, it gives me an error, as text is not a member of new_func. text was part of the output, as I mentioned before.
You basically have what is called a dictionary in Python.
A dictionary looks like this: dictionary = {key: value}
You can get the value of a key using dictionary.get(key).
For example, consider the code below:
def getValue(key):
    data = {'text': 'some text here',
            'type': 'some text here 2'}
    return data.get(key)

your_value = getValue('type')
This function will return 'some text here 2' when we get 'type' from data.
You don't even necessarily need a function for this. You can just have this:
data = {'text': 'some text here',
        'type': 'some text here 2'}
your_value = data.get('type')
You should be able to apply this to your case.
Hope that helps.
You should take a look at the JSON module in Python. Below are some links that should help:
https://docs.python.org/3/library/json.html
https://www.w3schools.com/python/python_json.asp
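As a minimal illustration of the json module with the example payload from the question:

```python
import json

# Round-trip a response-like payload through the json module
payload = {'text': 'example', 'type': 'text'}
as_string = json.dumps(payload)   # dict -> JSON string
back = json.loads(as_string)      # JSON string -> dict
text = back['text']               # plain key access gives you the value
print(text)
```

Note that requests' `.json()` already performs the `json.loads` step for you, so in the question's code `data` is already a dict and `data['text']` is all that's needed.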
You can try something like this; you can wrap any part into a function.
import requests

# get response
response = requests.get('https://api.github.com')

# parse response
response_code = response.status_code
response_json = response.json()

# pack response
packed_response = {
    'text': response_json,
    'type': 'text',
    'code': response_code,
}
More on the requests library here: https://realpython.com/python-requests/
I've scraped the timetable from this website.
The output I get is:
"ROUTE": "NAPOLI PORTA DI MASSA \u00bb ISCHIA"
but I would like:
"DEPARTURE PORT": "NAPOLI PORTA DI MASSA"
"ARRIVAL PORT": "ISCHIA"
How do I divide the string?
Here is the code:
medmar_live_departures_table = list(soup.select('li.tratta'))
departure_time = []
for li in medmar_live_departures_table:
    next_li = li.find_next_sibling("li")
    while next_li and next_li.get("data-toggle"):
        if next_li.get("class") == ["corsa-yes"]:
            # departure_time.append(next_li.strong.text)
            medmar_live_departures_data.append({
                'ROUTE': li.text
            })
Two things:
1. Since "»" is a non-ASCII character, Python shows it as "\u00bb", so splitting the text on that escape works:
parse = li.get_text().split('\u00bb')
Alternatively, you can use the re library to split on any non-ASCII character (add import re if you choose this path):
import re

non_ascii = li.get_text()
parse = re.split('[^\x00-\x7f]', non_ascii)
# [^\x00-\x7f] matches non-ASCII characters, as pointed out by Moinuddin Quadri in https://stackoverflow.com/questions/40872126/python-replace-non-ascii-character-in-string
Either way the split produces a list of parts, but not every text in an "li" tag carries the "»" character (e.g. the "POZZUOLI-PROCIDA" entry at the end of the table on the website), so we must account for that or we'll run into issues.
2. A dictionary may be a poor choice of data structure, since the data you are parsing repeats keys. For example, with POZZUOLI » CASAMICCIOLA and POZZUOLI » PROCIDA, both arrivals would share the key POZZUOLI, and Python would simply overwrite the value: POZZUOLI: CASAMICCIOLA would become POZZUOLI: PROCIDA instead of both routes being kept as separate entries.
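The key-collision problem described above can be demonstrated in a few lines:

```python
# A plain dict loses routes: repeated departure ports overwrite each other
routes = {}
routes['POZZUOLI'] = 'CASAMICCIOLA'
routes['POZZUOLI'] = 'PROCIDA'   # silently replaces the previous value
print(routes)                    # only one POZZUOLI route survives

# A list of (departure, arrival) tuples keeps both routes
ports = [('POZZUOLI', 'CASAMICCIOLA'), ('POZZUOLI', 'PROCIDA')]
print(len(ports))
```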
I suggest adding each part of the parse into lists as tuples, like so:
single_port = []
ports = []
medmar_live_departures_table = list(bs.select('li.tratta'))
departure_time = []
for li in medmar_live_departures_table:
    next_li = li.find_next_sibling("li")
    while next_li and next_li.get("data-toggle"):
        if next_li.get("class") == ["corsa-yes"]:
            # departure_time.append(next_li.strong.text)
            non_ascii = li.get_text()
            parse = re.split('[^\x00-\x7f]', non_ascii)
            # The if statement handles table strings without the non-ASCII character "»"
            if len(parse) > 1:
                ports.append((parse[0], parse[1]))
            else:
                single_port.append(parse[0])
        next_li = next_li.find_next_sibling("li")  # advance, otherwise the while loop never ends

# This will print out your data in your desired manner
for i in ports:
    print("DEPARTURE: " + i[0])
    print("ARRIVAL: " + i[1])
for i in single_port:
    print(i)
I also used the split method in a test code that I ran:
import requests
from bs4 import BeautifulSoup
import re

url = "https://www.medmargroup.it/"
response = requests.get(url)
bs = BeautifulSoup(response.text, 'html.parser')
timeTable = bs.find('section', class_="primarystyle-timetable")
medmar_live_departures_table = timeTable.find('ul')

single_port = []
ports = []
for li in medmar_live_departures_table.find_all('li', class_="tratta"):
    parse = li.get_text().split('\u00bb')
    if len(parse) > 1:
        ports.append((parse[0], parse[1]))
    else:
        single_port.append(parse[0])

for i in ports:
    print("DEPARTURE: " + i[0])
    print("ARRIVAL: " + i[1])
for i in single_port:
    print(i)
I hope this helps!
Try this (note: the escape is "\u00bb" with no space, and splitting once avoids doing the work twice):
medmar_live_departures_table = list(soup.select('li.tratta'))
departure_time = []
for li in medmar_live_departures_table:
    next_li = li.find_next_sibling("li")
    while next_li and next_li.get("data-toggle"):
        if next_li.get("class") == ["corsa-yes"]:
            # departure_time.append(next_li.strong.text)
            parts = li.text.split("\u00bb")
            medmar_live_departures_data.append({
                'DEPARTURE PORT': parts[0].strip(),
                'ARRIVAL PORT': parts[1].strip()
            })
        next_li = next_li.find_next_sibling("li")  # advance to avoid an infinite loop
I'm pretty lost, not going to lie. I'm trying to figure out how to parse JSON data from the College Scorecard API into an HTML file. I used Python to store the JSON data in a dictionary, but beyond that I'm stuck. How would you write an example sending this data to an HTML file?
def main():
    url = 'https://api.data.gov/ed/collegescorecard/v1/schools.json'
    payload = {
        'api_key': "api_key_string",
        '_fields': ','.join([
            'school.name',
            'school.school_url',
            'school.city',
            'school.state',
            'school.zip',
            '2015.student.size',
        ]),
        'school.operating': '1',
        '2015.academics.program_available.assoc_or_bachelors': 'true',
        '2015.student.size__range': '1..',
        'school.degrees_awarded.predominant__range': '1..3',
        'school.degrees_awarded.highest__range': '2..4',
        'id': '240444',
    }
    data = requests.get(url, params=payload).json()
    for result in data['results']:
        print result

main()
Output:
{u'school.city': u'Madison', u'school.school_url': u'www.wisc.edu', u'school.zip': u'53706-1380', u'2015.student.size': 29579, u'school.state': u'WI', u'school.name': u'University of Wisconsin-Madison'}
Edit: For clarification, I need to insert the return data to an HTML file that formats and removes data styling and places it onto a table.
Edit II: Json2html edit
data = requests.get(url, params=payload).json()
for result in data['results']:
    print result

data_processed = json.loads(data)
formatted_table = json2html.convert(json = data_processed)
index = open("index.html", "w")
index.write(formatted_table)
index.close()
Edit: json2html output: (screenshot of the rendered table omitted)
Try using the json2html module! It will convert the returned JSON into a 'human readable HTML table representation'.
This code will take your JSON output and create the HTML. Note that requests' .json() already gives you a dict, so pass it to convert() directly; calling json.loads() on a dict (as in your Edit II) raises a TypeError:
formatted_table = json2html.convert(json = data)
Then to save it as HTML you can do this:
your_file = open("filename", "w")
your_file.write(formatted_table)
your_file.close()
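If installing json2html is not an option, a stdlib-only sketch can produce a similar key/value table. The dict_to_html_table helper and the sample result row below are illustrative, not part of the original answer:

```python
import html

# Render one flat result dict (like a College Scorecard row) as an HTML table,
# escaping values so stray '<' or '&' characters can't break the markup
def dict_to_html_table(d):
    rows = "".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(html.escape(str(k)), html.escape(str(v)))
        for k, v in d.items()
    )
    return "<table>" + rows + "</table>"

result = {'school.city': 'Madison', '2015.student.size': 29579}
table = dict_to_html_table(result)
with open("index.html", "w") as f:   # same write pattern as the answer above
    f.write(table)
```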