How do you pull text from a website into a dict?

I'm attempting to get the information from http://xkcd.com/info.0.json. Basically it looks like a simple python dictionary and that's what I'd like to convert it to. My current code is:
import urllib.request
with urllib.request.urlopen('http://xkcd.com/info.0.json') as response:
    html = [response.read()]
print(html)
and that outputs
[b'{"month": "2", "num": 1647, "link": "", "year": "2016", "news": "", "safe_title": "Diacritics", "transcript": "", "alt": "Using diacritics correctly is not my fort\\u00c3\\u00a9.", "img": "http:\\/\\/imgs.xkcd.com\\/comics\\/diacritics.png", "title": "Diacritics", "day": "24"}']

You are receiving a JSON encoded response. You can parse that with the json.loads() function:
import json
import urllib.request
with urllib.request.urlopen('http://xkcd.com/info.0.json') as response:
    data = json.loads(response.read().decode('utf8'))
>>> data
{'link': '', 'transcript': '', 'month': '2', 'year': '2016', 'alt': 'Using diacritics correctly is not my fortÃ©.', 'num': 1647, 'img': 'http://imgs.xkcd.com/comics/diacritics.png', 'day': '24', 'safe_title': 'Diacritics', 'news': '', 'title': 'Diacritics'}
This is easier with the requests module:
import requests
response = requests.get('http://xkcd.com/info.0.json')
data = response.json()
>>> data
{'link': '', 'transcript': '', 'month': '2', 'year': '2016', 'alt': 'Using diacritics correctly is not my fortÃ©.', 'num': 1647, 'img': 'http://imgs.xkcd.com/comics/diacritics.png', 'day': '24', 'safe_title': 'Diacritics', 'news': '', 'title': 'Diacritics'}
requests saves you the hassle of decoding the incoming data and decoding the JSON.
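As a minimal sketch of the decoding step on its own, assuming the payload is UTF-8 and using a shortened copy of the bytes from the question in place of a live response (the helper name parse_payload is made up for illustration):

```python
import json

# Shortened copy of the bytes printed in the question; stands in for response.read().
raw = b'{"month": "2", "num": 1647, "safe_title": "Diacritics", "day": "24"}'


def parse_payload(payload: bytes) -> dict:
    """Decode a UTF-8 JSON payload and parse it into a Python dict."""
    return json.loads(payload.decode("utf-8"))


data = parse_payload(raw)
print(data["num"], data["safe_title"])
```

On Python 3.6 and later, json.loads also accepts bytes directly, so the explicit decode step is optional there.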

In Python 2.7, import urllib2 to fetch the page and json to load the data into a variable as a Python dictionary:
import urllib2
import json
response = urllib2.urlopen('http://xkcd.com/info.0.json')
html = response.read().decode('utf8')
data = json.loads(html)
type(data) is dict # True

Related

Unable to load dicts strings into JSON

import requests
import re
import json
def parser(code):
    params = {
        'template': 'professional',
        'level': 'search',
        'search': code
    }
    r = requests.get("https://maps.locations.husqvarna.com/api/getAsyncLocations",
                     params=params).json()
    goal = re.search(r'({.+})', r['maplist'], re.M | re.DOTALL).group(1)
    print(goal)
parser("35801")
The code returns a string of dicts that is not wrapped in a list. I tried json.dumps/json.loads and wrapping it in [ ], but for some weird reason it is still a string.

You need to convert goal into a list manually, to receive Python objects:
import requests
import re
import json
def parser(code):
    params = {
        'template': 'professional',
        'level': 'search',
        'search': code
    }
    r = requests.get("https://maps.locations.husqvarna.com/api/getAsyncLocations",
                     params=params).json()
    goal = re.search(r'({.+})', r['maplist'], re.M | re.DOTALL).group(1)
    jsonList = '[%s]' % goal  # Wrap the comma-separated objects in [] to form a JSON list
    items = json.loads(jsonList)
    for item in items:
        print(item)
parser("35801")
Out:
{'fid': 'USF221344-2115METROCIRCLE', 'lid': '56063', 'lat': '34.7004049', 'lng': '-86.5924508', 'url': 'https://locations.husqvarna.com/al/huntsville/product-manufacturer-usf221344-2115metrocircle.html', 'country': 'US', 'url_slug': 'product-manufacturer-usf221344-2115metrocircle.html', 'location_name': 'HEDDEN LAWN & GARDEN', 'address_1': '2115 METRO CIRCLE', 'address_2': '', 'city': 'HUNTSVILLE', 'city_clean': 'huntsville', 'region': 'AL', 'region_lc': 'al', 'post_code': '35801', 'local_phone': '(256) 885-1750', 'local_phone_pn_dashes': '256-885-1750', 'local_fax': '', 'local_fax_pn_dashes': '', 'from_email': '', 'hours_timezone': '', 'hours_dst': '', 'distance': '2.2', 'hours_sets:primary': '{"label":"Primary Hours","name":"primary","type":"0","timezone":"-6","dst":"1"}', 'Store Type_CS': 'Buy,Service', 'Location Type_CS': 'Authorized Dealers,Servicing Locations'}
...
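The wrapping step can be tried without calling the API. This is only a sketch: the maplist string below is a made-up stand-in for r['maplist'], assuming it holds comma-separated JSON objects embedded in some markup.

```python
import json
import re

# Hypothetical stand-in for r['maplist']; the real value comes from the API response.
maplist = '<div>{"fid": "A1", "city": "HUNTSVILLE"},{"fid": "B2", "city": "MADISON"},</div>'

# Greedy match from the first "{" to the last "}" captures all objects at once.
goal = re.search(r'({.+})', maplist, re.DOTALL).group(1)

# On its own, goal is several objects separated by commas, which is not valid JSON.
# Wrapping it in [] turns it into a JSON array that json.loads can parse.
items = json.loads('[%s]' % goal)

fids = [item["fid"] for item in items]
print(fids)
```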

How to access list of json-like objects from web api?

Just a side project I'm doing right now playing around with covid19 api. And I was hoping for something that would let me access the data with something like data2.countries.
import requests as r
import urllib
import json
url = 'https://api.covid19api.com/total/dayone/country/south-africa'
foo = urllib.request.urlopen(url)
data = json.loads(foo.read().decode())
data2 = json.parse(data)
print(data2)
The data looks like this - it's all in one list:
[{'Country': 'South Africa', 'CountryCode': '', 'Province': '', 'City': '', 'CityCode': '', 'Lat': '0', 'Lon': '0', 'Confirmed': 607045, 'Deaths': 12987, 'Recovered': 504127, 'Active': 89931, 'Date': '2020-08-22T00:00:00Z'},
{'Country': 'South Africa', 'CountryCode': '', 'Province': '', 'City': '', 'CityCode': '', 'Lat': '0', 'Lon': '0', 'Confirmed': 609773, 'Deaths': 13059, 'Recovered': 506470, 'Active': 90244, 'Date': '2020-08-23T00:00:00Z'}]
So far I'm getting:
File "~/20200813file/main.py", line 19, in <module>
data2 = json.parse(data)
AttributeError: module 'json' has no attribute 'parse'

Why not convert it into a pandas DataFrame?
import urllib.request
import json
import pandas as pd
url = 'https://api.covid19api.com/total/dayone/country/south-africa'
foo = urllib.request.urlopen(url)
data = json.loads(foo.read().decode())
df = pd.DataFrame(data)
print(df.Country)

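json.parse does not exist in Python's json module; json.loads has already turned the response into a list of dicts. If you want the data back as a JSON string, use json.dumps:
import requests as r
import urllib.request
import json
url = 'https://api.covid19api.com/total/dayone/country/south-africa'
foo = urllib.request.urlopen(url)
data = json.loads(foo.read().decode())
data2 = json.dumps(data)
print(data2)
The json.dumps() function converts a Python object into a JSON string.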
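For the attribute-style access the question asks about, one option is the object_hook parameter of json.loads combined with types.SimpleNamespace. The sketch below uses a trimmed copy of the sample data rather than a live request, and it assumes every key is a valid Python identifier.

```python
import json
from types import SimpleNamespace

# Trimmed copy of the sample records from the question; stands in for foo.read().decode().
raw = (
    '[{"Country": "South Africa", "Confirmed": 607045, "Deaths": 12987, '
    '"Date": "2020-08-22T00:00:00Z"}, '
    '{"Country": "South Africa", "Confirmed": 609773, "Deaths": 13059, '
    '"Date": "2020-08-23T00:00:00Z"}]'
)

# object_hook is called for every decoded JSON object; here each one becomes a namespace.
records = json.loads(raw, object_hook=lambda d: SimpleNamespace(**d))

confirmed = [rec.Confirmed for rec in records]
print(records[0].Country, confirmed)
```

The result is still a list, so you index or loop over it first and then use dotted access on each record.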

How to compare json file with expected result in Python 3?

I need to prepare a test that compares the content of a .json file with an expected result (we want to check whether the values in the .json are correctly generated by our dev tool).
For the test I will use Robot Framework or unittest, but I don't yet know how to parse the JSON file correctly.
Json example:
{
    "Customer": [{
        "Information": [{
            "Country": "",
            "Form": ""
        }],
        "Id": "110",
        "Res": "",
        "Role": "Test",
        "Limit": ["100"]
    }]
}
So after I execute this:
with open('test_json.json') as f:
    hd = json.load(f)
I get dict 'hd' where key is:
dict_keys(['Customer'])
and values:
dict_values([[{'Information': [{'Form': '', 'Country': ''}], 'Role': 'Test', 'Id': '110', 'Res': '', 'Limit': ['100']}]])
My problem is that I don't know how to get just one value from the dict (e.g. Role: Test), because I can only extract the whole value. I could prepare a long string to compare against, but that is not the best solution for tests.
Any ideas how I can get just one entry from the .json file?

Your JSON has a single key, 'Customer', and its value is a list. So when you look up that key, you get the list:
>>> hd['Customer']
[{'Id': '110', 'Role': 'Test', 'Res': '', 'Information': [{'Form': '', 'Country': ''}], 'Limit': ['100']}]
First element in list:
>>> hd['Customer'][0]
{'Id': '110', 'Role': 'Test', 'Res': '', 'Information': [{'Form': '', 'Country': ''}], 'Limit': ['100']}
Now access inside dict structure using:
>>> hd['Customer'][0]['Role']
'Test'

You can compare the dict that you loaded (say hd) to the expected results dict (say expected_dict) by running
hd.items() == expected_dict.items()
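A small sketch of both checks, with the JSON from the question inlined instead of read from test_json.json. The expected_dict here is hypothetical and lists its keys in a different order on purpose.

```python
import json

# Contents of test_json.json from the question, inlined so the sketch runs on its own.
raw = """
{
    "Customer": [{
        "Information": [{"Country": "", "Form": ""}],
        "Id": "110",
        "Res": "",
        "Role": "Test",
        "Limit": ["100"]
    }]
}
"""
hd = json.loads(raw)

# Hypothetical expected result for the test.
expected_dict = {
    "Customer": [{
        "Role": "Test",
        "Id": "110",
        "Res": "",
        "Limit": ["100"],
        "Information": [{"Form": "", "Country": ""}],
    }]
}

whole_match = hd == expected_dict      # dict equality ignores key order
role = hd["Customer"][0]["Role"]       # a single value for a targeted check
print(whole_match, role)
```

Comparing the dicts directly with == gives the same answer as comparing their items(). Inside a unittest test case, assertEqual on the two dicts also reports which part differs when the check fails.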

How to get json elements from http response string in python

I am new to parsing JSON from an HTTP API in Python. Currently I have parsed the HTTP content, which contains a JSON object array, as a string in Python. My code is given below:
import json
from urllib.request import urlopen
apilink=urlopen("api link")
data=json.loads(apilink.read().decode())
print(data)
and my current output is
{'Message': 'Success', 'Data': '[{"Did":"c055c3d2f3314725b69965e6c55adb5b","InsertedDate":"2017-08-02 7:27:11 AM","UpdatedDate":"2017-08-02 9:33:16 AM","CreatedBy":"1","UpdatedBy":"1","Name":"Hello World","ModuleName":"Rpt_Hello_World","ApplicationName":"Asset Inventory","Published":"true","UserId":"1","PostProcessor":""}]', 'Status': 'Success'}
but I want to extract only the attribute 'Data', which is a JSON array:
'Data': '[{"Did":"c055c3d2f3314725b69965e6c55adb5b","InsertedDate":"2017-08-02 7:27:11 AM","UpdatedDate":"2017-08-02 9:33:16 AM","CreatedBy":"1","UpdatedBy":"1","Name":"Hello World","ModuleName":"Rpt_Hello_World","ApplicationName":"Asset Inventory","Published":"true","UserId":"1","PostProcessor":""}]'
The desired part is:
[{"Did":"c055c3d2f3314725b69965e6c55adb5b","InsertedDate":"2017-08-02 7:27:11 AM","UpdatedDate":"2017-08-02 9:33:16 AM","CreatedBy":"1","UpdatedBy":"1","Name":"Hello World","ModuleName":"Rpt_Hello_World","ApplicationName":"Asset Inventory","Published":"true","UserId":"1","PostProcessor":""}]
Kindly help me to solve this.
Thank you.

data is a dictionary. Use dict indexing. You need the value associated with Data:
In [876]: data['Data']
Out[876]: '[{"Did":"c055c3d2f3314725b69965e6c55adb5b","InsertedDate":"2017-08-02 7:27:11 AM","UpdatedDate":"2017-08-02 9:33:16 AM","CreatedBy":"1","UpdatedBy":"1","Name":"Hello World","ModuleName":"Rpt_Hello_World","ApplicationName":"Asset Inventory","Published":"true","UserId":"1","PostProcessor":""}]'
This is a string. You can use json.loads one more time.
In [877]: json.loads(data['Data'])
Out[877]:
[{'ApplicationName': 'Asset Inventory',
'CreatedBy': '1',
'Did': 'c055c3d2f3314725b69965e6c55adb5b',
'InsertedDate': '2017-08-02 7:27:11 AM',
'ModuleName': 'Rpt_Hello_World',
'Name': 'Hello World',
'PostProcessor': '',
'Published': 'true',
'UpdatedBy': '1',
'UpdatedDate': '2017-08-02 9:33:16 AM',
'UserId': '1'}]
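The two-step decode can be reproduced offline. In this sketch the response body is built locally with json.dumps so that 'Data' is a JSON string nested inside the outer JSON, as in the question; the field values are shortened from the output above.

```python
import json

# Build a body shaped like the API response: 'Data' holds JSON text, not a list.
inner = [{"Did": "c055c3d2f3314725b69965e6c55adb5b", "Name": "Hello World"}]
body = json.dumps({"Message": "Success", "Data": json.dumps(inner), "Status": "Success"})

data = json.loads(body)            # first pass: outer object
rows = json.loads(data["Data"])    # second pass: the embedded array

print(type(data["Data"]).__name__, rows[0]["Name"])
```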

That's simple. The format is jsonName[attributeName]:
data['Data']

Converting to list of dictionary

I have a text file filled with place data provided by twitter api. Here is the sample data of 2 lines
{'country': 'United Kingdom', 'full_name': 'Dorridge, England', 'id': '31fe56e2e7d5792a', 'country_code': 'GB', 'name': 'Dorridge', 'attributes': {}, 'contained_within': [], 'place_type': 'city', 'bounding_box': {'coordinates': [[[-1.7718518, 52.3635912], [-1.7266702, 52.3635912], [-1.7266702, 52.4091167], [-1.7718518, 52.4091167]]], 'type': 'Polygon'}, 'url': 'https://api.twitter.com/1.1/geo/id/31fe56e2e7d5792a.json'}
{'country': 'India', 'full_name': 'New Delhi, India', 'id': '317fcc4b21a604d5', 'country_code': 'IN', 'name': 'New Delhi', 'attributes': {}, 'contained_within': [], 'place_type': 'city', 'bounding_box': {'coordinates': [[[76.84252, 28.397657], [77.347652, 28.397657], [77.347652, 28.879322], [76.84252, 28.879322]]], 'type': 'Polygon'}, 'url': 'https://api.twitter.com/1.1/geo/id/317fcc4b21a604d5.json'}
I want the 'country', 'name' and 'coordinates' fields of each line. To do this we need to iterate over the entire file line by line, so I append each line to a list:
data = []
with open('place.txt','r') as f:
    for line in f:
        data.append(line)
When I checked the data type, it shows as 'str' instead of 'dict':
type(data[0])
str
data[0].keys()
AttributeError: 'str' object has no attribute 'keys'
How can I fix this so that it is saved as a list of dictionaries?
Originally the tweets were encoded and decoded by the following code:
f.write(jsonpickle.encode(tweet._json, unpicklable=False) + '\n') #encoded and saved to a .txt file
tweets.append(jsonpickle.decode(line)) # decoding
And the place data file is saved by the following code:
fName = "place.txt"
newLine = "\n"
with open(fName, 'a', encoding='utf-8') as f:
    for i in range(len(tweets)):
        f.write('{}'.format(tweets[i]['place']) +'\n')

In your case you should use json to do the data parsing. But if you have a problem with json (which is almost impossible since we are talking about an API), then in general, to convert from a string to a dictionary, you can do:
>>> import ast
>>> x = "{'country': 'United Kingdom', 'full_name': 'Dorridge, England', 'id': '31fe56e2e7d5792a', 'country_code': 'GB', 'name': 'Dorridge', 'attributes': {}, 'contained_within': [], 'place_type': 'city', 'bounding_box': {'coordinates': [[[-1.7718518, 52.3635912], [-1.7266702, 52.3635912], [-1.7266702, 52.4091167], [-1.7718518, 52.4091167]]], 'type': 'Polygon'}, 'url': 'https://api.twitter.com/1.1/geo/id/31fe56e2e7d5792a.json'}"
>>> d = ast.literal_eval(x)
>>> d
d is now a dictionary instead of a string.
But again, if your data is in JSON format, Python has a built-in library to handle it, and it is better and safer to use json than ast.
For example, if you get a response, say resp, you could simply do:
response = json.loads(resp)
and now you can work with response as a dictionary.
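Applied to the whole file, the ast.literal_eval approach might look like the sketch below. It uses io.StringIO with two shortened lines in place of place.txt, and keeps only the three fields the question asks for.

```python
import ast
import io

# Stand-in for open('place.txt'); each line is the repr of a dict, shortened here.
sample = io.StringIO(
    "{'country': 'United Kingdom', 'name': 'Dorridge', 'bounding_box': "
    "{'coordinates': [[[-1.7718518, 52.3635912], [-1.7266702, 52.3635912]]], 'type': 'Polygon'}}\n"
    "{'country': 'India', 'name': 'New Delhi', 'bounding_box': "
    "{'coordinates': [[[76.84252, 28.397657], [77.347652, 28.397657]]], 'type': 'Polygon'}}\n"
)

places = []
for line in sample:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    place = ast.literal_eval(line)  # parses Python literals only, never runs code
    places.append({
        "country": place["country"],
        "name": place["name"],
        "coordinates": place["bounding_box"]["coordinates"],
    })

print([p["name"] for p in places])
```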

Note: Single quotes are not valid JSON.
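I have never tried the Twitter API, but it looks like your data is not valid JSON. Here is a simple preprocessing step that replaces each ' (single quote) with a " (double quote). Note that it breaks if any value itself contains a quote character, such as an apostrophe in a place name.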
data = "{'country': 'United Kingdom', ... }"
json_data = data.replace('\'', '\"')
dict_data = json.loads(json_data)
dict_data.keys()
# [u'full_name', u'url', u'country', ... ]
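The quote replacement is fragile, and a small sketch shows where it fails. The place name below is made up for illustration; any value containing an apostrophe would behave the same way.

```python
import ast
import json

# str() of a dict mimics how the lines in place.txt were written.
line = str({"name": "St John's", "country": "Canada"})

try:
    json.loads(line.replace("'", '"'))
    replaced_ok = True
except json.JSONDecodeError:
    # The apostrophe became a stray double quote inside the value.
    replaced_ok = False

# ast.literal_eval understands Python's own quoting rules, so it still works.
place = ast.literal_eval(line)
print(replaced_ok, place["name"])
```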

You should use Python's json library to parse the data and get the values.
In Python it's quite easy:
import json
x = '{"country": "United Kingdom", "full_name": "Dorridge, England", "id": "31fe56e2e7d5792a", "country_code": "GB", "name": "Dorridge", "attributes": {}, "contained_within": [], "place_type": "city", "bounding_box": {"coordinates": [[[-1.7718518, 52.3635912], [-1.7266702, 52.3635912], [-1.7266702, 52.4091167], [-1.7718518, 52.4091167]]], "type": "Polygon"}, "url": "https://api.twitter.com/1.1/geo/id/31fe56e2e7d5792a.json"}'
y = json.loads(x)
print(y["country"], y["name"], y["bounding_box"]["coordinates"])
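The file in the question holds single-quoted text because it was written with '{}'.format(...), which stores the dict's repr rather than JSON. A possible fix at the source, sketched here with io.StringIO in place of place.txt and a made-up tweets list, is to write each place with json.dumps so that every line can be read back with json.loads.

```python
import io
import json

# Hypothetical stand-in for the decoded tweets from the question.
tweets = [
    {"place": {"country": "India", "name": "New Delhi"}},
    {"place": {"country": "United Kingdom", "name": "Dorridge"}},
]

buffer = io.StringIO()  # stands in for open('place.txt', 'a', encoding='utf-8')
for tweet in tweets:
    # json.dumps emits valid JSON (double quotes), one object per line.
    buffer.write(json.dumps(tweet["place"]) + "\n")

buffer.seek(0)
data = [json.loads(line) for line in buffer if line.strip()]
print(data[0]["country"], data[1]["name"])
```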

You can build a list like this:
mlist = list()
for i in ndata.keys():
    mlist.append(i)
