Python JSON data into HTML table - python

I'm pretty lost. Not going to lie. I'm trying to figure out how to parse JSON data from the college scorecard API into an HTML file. I used Python to store the JSON data in a dictionary, but other than that, I'm pretty dang lost. How would you write an example sending this data to an HTML file?
def main():
url = 'https://api.data.gov/ed/collegescorecard/v1/schools.json'
payload = {
'api_key': "api_key_string",
'_fields': ','.join([
'school.name',
'school.school_url',
'school.city',
'school.state',
'school.zip',
'2015.student.size',
]),
'school.operating': '1',
'2015.academics.program_available.assoc_or_bachelors': 'true',
'2015.student.size__range': '1..',
'school.degrees_awarded.predominant__range': '1..3',
'school.degrees_awarded.highest__range': '2..4',
'id': '240444',
}
data = requests.get(url, params=payload).json()
for result in data['results']:
print result
main()
Output:
{u'school.city': u'Madison', u'school.school_url': u'www.wisc.edu', u
'school.zip': u'53706-1380', u'2015.student.size': 29579, u'school.st
ate': u'WI', u'school.name': u'University of Wisconsin-Madison'}
Edit: For clarification, I need to insert the return data to an HTML file that formats and removes data styling and places it onto a table.
Edit II: Json2html edit
data = requests.get(url, params=payload).json()
for result in data['results']:
print result
data_processed = json.loads(data)
formatted_table = json2html.convert(json = data_processed)
index= open("index.html","w")
index.write(formatted_table)
index.close()
Edit: Json2html output:
Output image here

Try using the json2html module! This will convert the JSON that was returned into a 'human readable HTML Table representation'.
This code will take your JSON output and create the HTML:
data_processed = json.loads(data)
formatted_table = json2html.convert(json = data_processed)
Then to save it as HTML you can do this:
your_file= open("filename","w")
your_file.write(formatted_table)
your_file.close()

Related

Python loop through table of URL's and get data from those websites

i'm trying to loop through a table that has all the websites that i want to get the JSON data from.
def getResponse(url):
operUrl = urllib.request.urlopen(url)
if(operUrl.getcode()==200):
data = operUrl.read()
jsonData = json.loads(data)
else:
print("Error receiving data", operUrl.getcode())
return jsonData
def main():
urlData = ("site1.com")
#Needs to loop all the URL's inside
#urlData = ["site1.com", "site2.com"] and so on
jsonData = getResponse(urlData)
for i in jsonData["descriptions"]:
description = f'{i["groups"][0]["variables"][0]["content"]}'
data = data = {'mushrooms':[{'description': description,}]}
with open('data.json', 'w') as f:
json.dump(data, f, ensure_ascii=False)
print(json.dumps(data, indent=4, ensure_ascii=False), )
After running it saves it into a data.json file and here is what it looks like
{
"mushrooms": [
{
"description": "example how it looks",
}
]
}
It does get the data from the one site but i want it to loop through multiple URL's that are in a table like
EDIT:
i got it working by looping like this
for url in urlData:
and i have all my website links in a table urlData and after that appending the data found from those sites into a another table.
I got it working by looping like this
for url in urlData:
and i have all my website links in a table urlData and after that appending the data found from those sites into a another table. and after it's done it dumps the data into a json.

Handle content from a <script> tag in python

I am currently trying to read out the locations of a company. The information about the locations is inside a script tag (json). So I read out the contet inside the corresponding script tag.
This is my code:
sauce = requests.get('https://www.ep.de/store-finder', verify=False, headers = {'User-Agent':'Mozilla/5.0'})
soup1 = BeautifulSoup(sauce.text, features="html.parser")
all_scripts = soup1.find_all('script')[6]
all_scripts.contents
The output is:
['\n\t\twindow.storeFinderComponent = {"center":{"lat":51.165691,"long":10.451526},"bounds":[[55.655085,5.160441],[46.439648,15.666775]],"stores":[{"code":"1238240","lat":51.411572,"long":10.425264,"name":"EP:Schulze","url":"/schulze-breitenworbis","showAsClosed":false,"isBusinessCard":false,"logoUrl":"https://cdn.prod.team-ec.com/logo/retailer/retailerlogo_epde_1238240.png","address":{"street":"Weststraße 6","zip":"37339","town":"Breitenworbis","phone":"+49 (36074) 31193"},"email":"info#ep-schulze-breitenworbis.de","openingHours":[{"day":"Mo.","openingTime":"09:00","closingTime":"18:00","startPauseTime":"13:00","endPauseTime":"14:30"},{"day":"Di.","openingTime":"09:00","closingTime":"18:00","startPauseTime":"13:00","endPauseTime":"14:30"},{"day":"Mi.","openingTime":"09:00","closingTime":"18:00","startPauseTime":"13:00","endPauseTime":"14:30"},...]
I have problems converting the content to a dictionary and reading all lat and long data.
When I try:
data = json.loads(all_scripts.get_text())
all_scripts.get_text() returns an empty list
So i tryed:
data = json.loads(all_scripts.contents)
But then i get an TypeError: the JSON object must be str, bytes or bytearray, not list
I dont know ho to convert the .content method to json:
data = json.loads(str(all_scripts.contents))
JSONDecodeError: Expecting value: line 1 column 2 (char 1)
Can anyone help me?
You could use regex to pull out the json and read that in.
import requests
import re
import json
html = requests.get('https://www.ep.de/store-finder', verify=False, headers = {'User-Agent':'Mozilla/5.0'}).text
pattern = re.compile('window\.storeFinderComponent = ({.*})')
result = pattern.search(html).groups(1)[0]
jsonData = json.loads(result)
You can removed first part of data and then last character of data and then load data to json
import json
data=all_scripts.contents[0]
removed_data=data.replace("\n\t\twindow.storeFinderComponent = ","")
clean_data=data[:-3]
json_data=json.loads(clean_data)
Output:
{'center': {'lat': 51.165691, 'long': 10.451526},
'bounds': [[55.655085, 5.160441], [46.439648, 15.666775]],
'stores': [{'code': '1238240',
'lat': 51.411572,
....

Assign output of 'requests' in python

I was using the requests module to get some data in JSON form and I want to assign some of the output results into variables in the app; for example the results were like:
{'text': 'example',
'type': 'text'}
I wanted to create variables that automatically store text as example and type as text.
I tried to create a function and put the first code in it but it didn't work.
The code for it was:
import requests
import json
import pprint
def new_func():
url = '***'
r = requests.get(url)
data = r.json()
pprint.pprint(data)
print(data)
text = new_func.text()
print(text)
However, it gives me an error as text is not a member of new_func.
text was part of the output as I mentioned before.
You basically have what is called a dictionary in python.
A dictionary looks like this: dictionary = {key: value}
You can get the value of a key using dictionary.get(key)
For example, consider the code below:
def getValue(key):
data = {'text': 'some text here',
'type': 'some text here 2'}
return data.get(key)
your_value = getValue('type')
This function will return some text here 2 when we get the type from data
You don't even necessarily need a function for this. You can just have this:
data = {'text': 'some text here',
'type': 'some text here 2'}
your_value = data.get('type')
You should be able to apply this to your case.
Hope that helps.
You should take a look at the JSON module in Python. Below are some links that should help:
https://docs.python.org/3/library/json.html
https://www.w3schools.com/python/python_json.asp
You can try something like this, you can wrap any part into a function.
import requests
# get response
response = requests.get('https://api.github.com')
# parse response:
response_code = response.status_code
response_json = response.json()
# pack response:
packed_response = {
'text' : response_json,
'type' : 'text',
'code' : response_code,
}
More on the requests library here: https://realpython.com/python-requests/

How can I access these json object in python

I'm making some data visualization from movies database api and I already access the data in the normal way but when i load the json data and for loop to print it, the data that out is just the column but I need to access the object inside.
url = "https://api.themoviedb.org/3/discover/movie?api_key="+ api_key
+"&language=en- US&sort_by=popularity.desc&include_adult=
false&include_video=false&page=1" # api url
response = urllib.request.urlopen(url)
raw_json = response.read().decode("utf-8")
data = json.loads(raw_json)
for j in data:
print(j)
i expect the output would be
[{'popularity': 15,
'id': 611,
'video': False,
'vote_count': 1403,
'vote_average': 8.9,
'title': 'lalalalo'},{....}]
but the actual output is
page
total_results
total_pages
results
The results are one level down. You are looping through the metadata.
Try changing your code to
import json
import urllib.request
api_key = "your api code"
url = "https://api.themoviedb.org/3/discover/movie?api_key=" + api_key +"&language=en- US&sort_by=popularity.desc&include_adult=false&include_video=false&page=1" # api url
response = urllib.request.urlopen(url)
raw_json = response.read().decode("utf-8")
data = json.loads(raw_json)
for j in data['results']:
print(j)
You need to change
data
to
data['results']
you can simply use requests module...
import requests
import json
your_link = " "
r = requests.get(your_link)
data = json.loads(r.content)
You shall have the json loaded up, then use your key "results" ["results"] and loop through the data you got.

How to save scraped json data with key value pair into a json file format using python

I scraped a site for data and I was able to print the desired output with json format containing only value but what i actually needed is to get the data with both key and value pair and save it into output.json format so I can insert into my django database. Here is what I have done so far
import requests
import json
URL ='http://tfda.go.tz/portal/en/trader_module/trader_module/getRegisteredDrugs_products'payload = "draw=1&columns%5B0%5D%5Bdata%5D=no&columns%5B0%5D%5Bname%5D=&columns%5B0%5D%5Bsearchable%5D=True&columns%5B0%5D%5Borderable%5D=True&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B0%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B1%5D%5Bdata%5D=certificate_no&columns%5B1%5D%5Bname%5D=&columns%5B1%5D%5Bsearchable%5D=True&columns%5B1%5D%5Borderable%5D=True&columns%5B1%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B1%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B2%5D%5Bdata%5D=brand_name&columns%5B2%5D%5Bname%5D=&columns%5B2%5D%5Bsearchable%5D=True&columns%5B2%5D%5Borderable%5D=True&columns%5B2%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B2%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B3%5D%5Bdata%5D=classification_name&columns%5B3%5D%5Bname%5D=&columns%5B3%5D%5Bsearchable%5D=True&columns%5B3%5D%5Borderable%5D=True&columns%5B3%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B3%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B4%5D%5Bdata%5D=common_name&columns%5B4%5D%5Bname%5D=&columns%5B4%5D%5Bsearchable%5D=True&columns%5B4%5D%5Borderable%5D=True&columns%5B4%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B4%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B5%5D%5Bdata%5D=dosage_form&columns%5B5%5D%5Bname%5D=&columns%5B5%5D%5Bsearchable%5D=True&columns%5B5%5D%5Borderable%5D=True&columns%5B5%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B5%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B6%5D%5Bdata%5D=product_strength&columns%5B6%5D%5Bname%5D=&columns%5B6%5D%5Bsearchable%5D=True&columns%5B6%5D%5Borderable%5D=True&columns%5B6%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B6%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B7%5D%5Bdata%5D=registrant&columns%5B7%5D%5Bname%5D=&columns%5B7%5D%5Bsearchable%5D=True&columns%5B7%5D%5Borderable%5D=True&columns%5B7%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B7%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B8%5D%5Bdata%5D=registrant_country&columns%5B8%5D%5Bname%5D=&columns%5B8%5D%5Bsearchable%5D=True&columns%5B8%5D%5Borderable%5D=True&columns%5B8%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B8%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B9%5D%5Bdata%5D=manufacturer&columns%5B9%5D%5Bname%5D=&columns%5B9%5D%5Bsearchable%5D=True&columns%5B9%5D%5Borderable%5D=True&columns%5B9%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B9%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B10%5D%5Bdata%5D=manufacturer_country&columns%5B10%5D%5Bname%5D=&columns%5B10%5D%5Bsearchable%5D=True&columns%5B10%5D%5Borderable%5D=True&columns%5B10%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B10%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B11%5D%5Bdata%5D=expiry_date&columns%5B11%5D%5Bname%5D=&columns%5B11%5D%5Bsearchable%5D=True&columns%5B11%5D%5Borderable%5D=True&columns%5B11%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B11%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B12%5D%5Bdata%5D=id&columns%5B12%5D%5Bname%5D=&columns%5B12%5D%5Bsearchable%5D=True&columns%5B12%5D%5Borderable%5D=True&columns%5B12%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B12%5D%5Bsearch%5D%5Bregex%5D=False&order%5B0%5D%5Bcolumn%5D=0&order%5B0%5D%5Bdir%5D=asc&start=0&length=3911&search%5Bvalue%5D=&search%5Bregex%5D=False"
with requests.Session() as s:
s.headers={"User-Agent":"Mozilla/5.0"}
s.headers.update({'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'})
res = s.post(URL, data = payload)
for data in res.json()['data']:
serial = data['no']
certno = data['certificate_no']
brndname = data['brand_name']
clssification = data['classification_name']
common_name = data['common_name']
dosage_form = data['dosage_form']
expiry_date = data['expiry_date']
manufacturer = data['manufacturer']
manufacturer_country = data['manufacturer_country']
product_strength = data['product_strength']
registrant = data['registrant']
registrant_country = data['registrant_country']
output = (serial,certno,brndname,clssification,common_name,dosage_form,expiry_date,manufacturer, manufacturer_country,product_strength,registrant, registrant_country )
my_list = output
json_str = json.dumps(my_list)
print (json_str)
And here is my attached output screenshot
So how do I approach this?
Use json.dump
with open(path, 'w') as file:
[...]
json.dump(myPythonList, file)
file.write('\n')

Categories

Resources