How to extract numbers from JSON API - python

I want to extract the numbers from a JSON API response and calculate their sum. The format is
{
comments: [
{
name: "Matthias"
count: 97
},
{
name: "Geomer"
count: 97
}
...
]
}
And my code is
import json
import urllib
url = 'http://python-data.dr-chuck.net/comments_204529.json'
print 'Retrieving', url
uh = urllib.urlopen(url)
data = uh.read()
print 'Retrieved',len(data),'characters'
result = json.loads(url)
print result
I can get the number of characters in the data, but I cannot get any further because the code fails with an error saying the JSON object cannot be decoded.
Does anyone know how to finish this code? Much appreciated!

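First of all, I suggest you study the built-in Python data structures to get a better understanding of what you are dealing with.
The error occurs because json.loads(url) tries to parse the URL string itself rather than the downloaded text; pass data to json.loads(), or pass the response object uh to json.load(). result is then a dictionary and result["comments"] is a list of dictionaries, so you can use a list comprehension to get all the comment counts: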
>>> import json
>>> import urllib
>>>
>>> url = 'http://python-data.dr-chuck.net/comments_204529.json'
>>> uh = urllib.urlopen(url)
>>> result = json.load(uh)
>>>
>>> [comment["count"] for comment in result["comments"]]
[100, 96, 95, 93, 85, 85, 77, 73, 73, 70, 65, 65, 65, 62, 62, 62, 61, 57, 50, 49, 46, 46, 43, 42, 39, 38, 37, 36, 34, 33, 31, 28, 28, 26, 26, 25, 22, 20, 20, 18, 17, 15, 14, 12, 10, 9, 8, 6, 5, 3]
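The question also asks for the sum, which the list above does not yet give. A minimal sketch, assuming the response has the shape shown in the question; the data string here is a hand-written stand-in for what uh.read() returns, so no network access is needed:

```python
import json

# Stand-in for the text returned by uh.read(); shape follows the question's sample.
data = '{"comments": [{"name": "Matthias", "count": 97}, {"name": "Geomer", "count": 97}]}'

result = json.loads(data)  # parse the response text, not the URL
counts = [comment["count"] for comment in result["comments"]]
total = sum(counts)        # add up every "count" value
print(total)
```

With the real response, replace data with the text read from the URL and the rest stays the same.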

Related

How do I convert these multiple lists into a big dictionary using python

subjects = ['Chem', 'Phy', 'Math']
students = ['Joy', 'Agatha', 'Mary', 'Frank', 'Godwin', 'Chizulum', 'Enoc', 'Chinedu', 'Kenneth', 'Lukas']
math = [76,56,78,98,88,75,59,80,45,30]
phy = [72,86,70,98,89,79,69,50,85,80]
chem = [75,66,77,45,83,75,59,40,65,90]
How do I transform the lists above into the nested dictionary below using Python?
{
'math':{'joy':76, 'Agatha':56, 'Mary':78.....},
'phy':{'joy':72, 'Agatha':86, 'Mary':70....},
'chem':{'joy':75, 'Agatha':66, 'Mary':77....}
}
This is certainly not the most elegant way to do this, but it works:
dictionary = {}
dict_math = {}
dict_phy = {}
dict_chem = {}
for i in range(len(students)):
    dict_math[students[i]] = math[i]
    dict_phy[students[i]] = phy[i]
    dict_chem[students[i]] = chem[i]
dictionary['math'] = dict_math
dictionary['phy'] = dict_phy
dictionary['chem'] = dict_chem
print(dictionary)
With the given lists, you could build the result dictionary this way:
result_dict = {
    subject.lower(): {
        name: grade
        for name, grade in zip(students, globals()[subject.lower()])
    }
    for subject in subjects
}
This solution uses a nested dictionary comprehension and isn't meant for beginners. On top of that, using the built-in globals() is not recommended; it only works here because the list names match the lowercased subject names.
You can do something like this:
math_grades = list(zip(students, math))
phy_grades = list(zip(students, phy))
chem_grades = list(zip(students, chem))
your_dict = {
"math": {c: d for c, d in math_grades},
"phy": {c: d for c, d in phy_grades},
"chem": {c: d for c, d in chem_grades},
}
You can do it like this:
subjects = ['Chem', 'Phy', 'Math']
students = ['Joy', 'Agatha', 'Mary', 'Frank', 'Godwin', 'Chizulum', 'Enoc', 'Chinedu', 'Kenneth', 'Lukas']
math = [76, 56, 78, 98, 88, 75, 59, 80, 45, 30]
phy = [72, 86, 70, 98, 89, 79, 69, 50, 85, 80]
chem = [75, 66, 77, 45, 83, 75, 59, 40, 65, 90]
grades = {
"math": dict(zip(students, math)),
"phy": dict(zip(students, phy)),
"chem": dict(zip(students, chem)),
}
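If the score lists can be kept in a dictionary instead of in separate variables, the same zip idea extends to any number of subjects without globals(). This is only a sketch: scores_by_subject is a hypothetical name, and the lists are shortened to three students.

```python
students = ['Joy', 'Agatha', 'Mary']

# Hypothetical container: one list of scores per subject, in student order.
scores_by_subject = {
    "math": [76, 56, 78],
    "phy": [72, 86, 70],
    "chem": [75, 66, 77],
}

# Pair each student with the score at the same position, per subject.
grades = {
    subject: dict(zip(students, scores))
    for subject, scores in scores_by_subject.items()
}
print(grades["math"]["Agatha"])
```

Note that zip stops at the shorter input, so a score list that is too short silently drops students.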

Split word in list and iterate dictionary

I've got some code where I receive a string of languages as text.
My goal is to turn this input into a list, look up each item as a key in a dictionary, and collect the corresponding values in a list to use later.
The output I am expecting is [57, 20, 22, 52, 60... etc] but currently, I am receiving
[57, None, None, None, None, None, None....etc]
My first output is correct, but after that it doesn't seem to find the correct value in the dict.
Code below.
l_languages = []
language_dict = { 'Afrikaans' : 57, 'Arabic' : 20, 'Assistive communication' : 21, 'AUSLAN' : 22, 'Bosnian' : 52,'Burmese' : 60, 'Cantonese' : 23, 'Croation' : 54, 'Dutch' : 50,'French' : 24, 'German' : 25, 'Greek' : 26,'Hindi' : 27, 'Indigenous Australian' : 310, 'Indonesian' : 56, 'Italian' : 28, 'Japanese' : 62, 'Korean' : 48, 'Mandarin' : 29, 'Nepali' : 55, 'Polish' : 30}
data = "Afrikaans, Arabic, Assistive communication, AUSLAN, Bosnian, Burmese, Cantonese, Croation, Dutch"
language_list = data.split(',')
for language in language_list:
    id = language_dict.get(language)
    l_languages.append(id)
print(l_languages)
current output = [57, None, None, None, None, None, None....etc]
You are neglecting the whitespace in your language list. You should remove the leading and trailing whitespace and then access your dict.
If you just split the string at ',', every language after the first keeps a leading space. The first one has none, which explains your current output.
Look at your language_list: it has leading whitespace. You need to call strip() on each element to get your expected result.
l_languages = []
language_dict = { 'Afrikaans' : 57, 'Arabic' : 20, 'Assistive communication' : 21, 'AUSLAN' : 22, 'Bosnian' : 52,'Burmese' : 60, 'Cantonese' : 23, 'Croation' : 54, 'Dutch' : 50,'French' : 24, 'German' : 25, 'Greek' : 26,'Hindi' : 27, 'Indigenous Australian' : 310, 'Indonesian' : 56, 'Italian' : 28, 'Japanese' : 62, 'Korean' : 48, 'Mandarin' : 29, 'Nepali' : 55, 'Polish' : 30}
data = "Afrikaans, Arabic, Assistive communication, AUSLAN, Bosnian, Burmese, Cantonese, Croation, Dutch"
language_list = data.split(',')
print(language_list)
for language in language_list:
    val = language_dict.get(language.strip())
    l_languages.append(val)
print(l_languages)
['Afrikaans', ' Arabic', ' Assistive communication', ' AUSLAN', ' Bosnian', ' Burmese', ' Cantonese', ' Croation', ' Dutch'] # list with leading spaces
[57, 20, 21, 22, 52, 60, 23, 54, 50] # right result
l_languages = []
language_dict = { 'Afrikaans' : 57, 'Arabic' : 20, 'Assistive communication' : 21, 'AUSLAN' : 22, 'Bosnian' : 52,'Burmese' : 60, 'Cantonese' : 23, 'Croation' : 54, 'Dutch' : 50,'French' : 24, 'German' : 25, 'Greek' : 26,'Hindi' : 27, 'Indigenous Australian' : 310, 'Indonesian' : 56, 'Italian' : 28, 'Japanese' : 62, 'Korean' : 48, 'Mandarin' : 29, 'Nepali' : 55, 'Polish' : 30}
data = "Afrikaans, Arabic, Assistive communication, AUSLAN, Bosnian, Burmese, Cantonese, Croation, Dutch"
language_list=[x.strip() for x in data.split(',')]
for language in language_list:
    id = language_dict.get(language)
    l_languages.append(id)
#output
[57, 20, 21, 22, 52, 60, 23, 54, 50]
The simplest way to do it:
#Devil
language_dict = { 'Afrikaans' : 57, 'Arabic' : 20, 'Assistive communication' : 21,
'AUSLAN' : 22, 'Bosnian' : 52,'Burmese' : 60, 'Cantonese' : 23,
'Croation' : 54, 'Dutch' : 50,'French' : 24, 'German' : 25, 'Greek' : 26,
'Hindi' : 27, 'Indigenous Australian' : 310, 'Indonesian' : 56, 'Italian' : 28,
'Japanese' : 62, 'Korean' : 48, 'Mandarin' : 29, 'Nepali' : 55, 'Polish' : 30}
data = "Afrikaans, Arabic, Assistive communication, AUSLAN, Bosnian, Burmese, Cantonese, Croation, Dutch"
data_list = data.split(",") #split the data
data_list = [d.strip() for d in data_list] #remove white space
l_languages = [language_dict[z] for z in data_list] #find the value using key
print(data_list)
print(l_languages)
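One difference between the answers: language_dict[z] raises KeyError for a name that is not in the dictionary, while language_dict.get(...) quietly returns None. A small sketch of handling unknown names explicitly; the dictionary is shortened and 'Klingon' is a made-up entry added only to show the missing case:

```python
language_dict = {'Afrikaans': 57, 'Arabic': 20, 'AUSLAN': 22}
data = "Afrikaans, Arabic, Klingon, AUSLAN"  # 'Klingon' is a made-up, unknown entry

# Split on commas and strip the whitespace left around each name.
names = [name.strip() for name in data.split(",")]

# Keep the ids of known names and collect the unknown ones separately.
found = [language_dict[name] for name in names if name in language_dict]
missing = [name for name in names if name not in language_dict]
print(found)
print(missing)
```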

I keep getting the error: TypeError: tuple indices must be integers or slices, not str

So I've looked all over the place and can't seem to find an answer that I understand. I am trying to implement a piece of code where Python looks at a text file, gets a line, and looks for a dictionary with a corresponding name. Here is my code so far:
f = open("data.txt", "r")
content = f.readlines()
icecream = {
"fat": 80,
"carbohydrates": 50,
"protein": 650,
"calories": 45,
"cholesterol": 50,
"sodium": 50,
"name": "Icecream"
}
bigmac = {
"fat": 29,
"carbohydrates": 45,
"protein": 25,
"sodium": 1040,
"cholesterol": 75,
"calories": 540,
"name": "Big Mac"
}
whopper = {
"fat": 47,
"carbohydrates": 53,
"protein": 33,
"sodium": 1410,
"cholesterol": 100,
"calories": 760,
"name": "Whopper"
}
menu = [
bigmac,
whopper,
icecream
]
sea = content[0]
for line in enumerate(menu):
    if sea.lower() in line['name'].lower():
        print (line['name'])
I keep getting the error TypeError: tuple indices must be integers or slices, not str and I don't understand why. Could someone help me fix my code and possibly get my 2 brain-cells to understand why this error comes up?
enumerate() yields tuples of (index, element). E.g.:
>>> for item in enumerate(["a", "b", "c"]):
...     print(item)
...
(0, 'a')
(1, 'b')
(2, 'c')
So when you iterate over enumerate(menu), each item is not a dict but a tuple of index and dict. If you don't need the index of the element, use:
for line in menu:
    if sea.lower() in line['name'].lower():
        print (line['name'])
If you need the index, use:
for i, line in enumerate(menu):
    if sea.lower() in line['name'].lower():
        print (i, line['name'])
Update your code to:
for line in menu:
    if sea.lower() in line['name'].lower():
        print (line['name'])
enumerate() is unnecessary here: menu is already a list you can iterate over directly, and the index is never used.
Your error arises when calling line['name'], as line is a tuple produced by the enumerate call:
(0, {'fat': 29, 'carbohydrates': 45, 'protein': 25, 'sodium': 1040, 'cholesterol': 75, 'calories': 540, 'name': 'Big Mac'})
(1, {'fat': 47, 'carbohydrates': 53, 'protein': 33, 'sodium': 1410, 'cholesterol': 100, 'calories': 760, 'name': 'Whopper'})
(2, {'fat': 80, 'carbohydrates': 50, 'protein': 650, 'calories': 45, 'cholesterol': 50, 'sodium': 50, 'name': 'Icecream'})
A tuple can only be indexed with an integer, so you would have to write line[1]['name'] to reach the dict inside it.
enumerate(menu) yields tuples, and indexing a tuple with a string key as if it were a dictionary causes this error. Also, each line returned by readlines() keeps its trailing newline, so strip it from the search string before comparing.
So, change the code as below, without enumerate.
sea = content[0].strip()
for line in menu:
    if sea.lower() in line['name'].lower():
        print (line['name'])
Whether this works depends on what the input file contains. If it does not, share what the input file looks like.
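To see both points together (unpacking the tuple from enumerate and stripping the newline), here is a self-contained sketch. The file contents are an assumption, since the question never shows data.txt, so io.StringIO stands in for the opened file and the menu entries are cut down to two keys.

```python
import io

menu = [
    {"name": "Big Mac", "calories": 540},
    {"name": "Whopper", "calories": 760},
    {"name": "Icecream", "calories": 45},
]

# io.StringIO stands in for open("data.txt", "r"); the contents are assumed.
f = io.StringIO("whopper\nicecream\n")
content = f.readlines()   # each entry still ends with "\n"

sea = content[0].strip()  # drop the trailing newline before comparing

# Unpack the (index, dict) tuple that enumerate yields on each iteration.
matches = [
    (i, item["name"])
    for i, item in enumerate(menu)
    if sea.lower() in item["name"].lower()
]
print(matches)
```

Without the strip() call the search string would be "whopper\n", which is not a substring of any name, so nothing would match.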

How to change the format of json to spacy/custom json format in python?

I have a JSON file generated by the doccano annotation tool, and I want to convert it into another format. Both formats are shown below.
Doccano JSON format:
{"id": 2, "data": "My name is Nithin Reddy and i'm working as a Data Scientist.", "label": [[3, 8, "Misc"], [11, 23, "Person"], [32, 39, "Activity"], [45, 59, "Designation"]]}
{"id": 3, "data": "I live in Hyderabad.", "label": [[2, 6, "Misc"], [10, 19, "Location"]]}
{"id": 4, "data": "I'm pusring my master's from Bits Pilani.", "label": [[15, 24, "Education"], [29, 40, "Organization"]]}
Required json format :
("My name is Nithin Reddy and i'm working as a Data Scientist.", {"entities": [(3, 8, "Misc"), (11, 23, "Person"), (32, 39, "Activity"), (45, 59, "Designation")]}),
("I live in Hyderabad.", {"entities": [(2, 6, "Misc"), (10, 19, "Location")]}),
("I'm pusring my master's from Bits Pilani.", {"entities": [(15, 24, "Education"), (29, 40, "Organization")]})
I tried the below code, but it's not working
import json
with open('data.json') as f:
    data = json.load(f)
new_data = []
for i in data:
    new_data.append((i['data'], {"entities": i['label']}))
with open('data_new.json', 'w') as f:
    json.dump(new_data, f)
Can anyone help me with the python code which will change the json to required format?
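One likely cause, assuming the export really holds one JSON object per line as the sample suggests: json.load(f) expects a single JSON document, so it fails on the second line. A sketch that parses line by line instead; the raw string below stands in for the file contents (two records copied from the question), and train_data is a hypothetical name.

```python
import json

# Stand-in for the file contents: one JSON object per line, copied from the question.
raw = '''{"id": 3, "data": "I live in Hyderabad.", "label": [[2, 6, "Misc"], [10, 19, "Location"]]}
{"id": 4, "data": "I'm pusring my master's from Bits Pilani.", "label": [[15, 24, "Education"], [29, 40, "Organization"]]}'''

train_data = []
for line in raw.splitlines():
    if not line.strip():
        continue                      # skip blank lines
    record = json.loads(line)         # parse one object at a time
    entities = [tuple(label) for label in record["label"]]  # [start, end, tag] -> tuple
    train_data.append((record["data"], {"entities": entities}))

print(train_data[0])
```

With a real file, iterate over the open file object instead of raw.splitlines(). Keep in mind that the required format is a Python structure rather than JSON: JSON has no tuple type, so json.dump would write the tuples back out as lists.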

webscraping and extracting dates

Using python BeautifulSoup, I'm trying to extract the date of each newspaper article from a google search page:
https://www.google.com/search?q=citi+group&tbm=nws&ei=u9_1WsetC67l5gKRt7qYBA&start=0&sa=N&biw=1600&bih=794&dpr=1
Here is my code:
from bs4 import BeautifulSoup
import requests
article_link = "https://www.google.com/search?q=citi+group&tbm=nws&ei=u9_1WsetC67l5gKRt7qYBA&start=0&sa=N&biw=1600&bih=794&dpr=1"
page = requests.get(article_link)
soup = BeautifulSoup(page.content, 'html.parser')
for links in soup.find_all('div', {'class':'slp'}):
    date = links.get_text()
    print(date)
The source code is something like:
The output is "PE Hub (blog) - 1 day ago"
Can I just extract the date part (2018. 5. 11)?
I'm not sure exactly why BeautifulSoup is pulling it that way, but you can use regex and datetime to clean up what you're pulling. For relative dates such as "1 day ago", strip the text down to the number and subtract a timedelta from today; otherwise use strptime to parse the absolute date.
from bs4 import BeautifulSoup
import requests
hold = []
article_link = "https://www.google.com/search?q=citi+group&tbm=nws&ei=u9_1WsetC67l5gKRt7qYBA&start=0&sa=N&biw=1600&bih=794&dpr=1"
page = requests.get(article_link)
soup = BeautifulSoup(page.content, 'html.parser')
for links in soup.find_all('div', {'class':'slp'}):
    date = links.get_text()
    hold.append(date) #added list append
---------
#converting to datetime values
import re
from datetime import datetime as dt, timedelta  # timedelta is needed below
hold2 = []
for item in hold:
    item = re.sub('^.+ - ','', item)  # drop the "Source - " prefix
    if 'ago' in item:
        item = re.sub(' days? ago$','',item)
        hold2.append(dt.today() - timedelta(int(item)))
    else:
        item = dt.strptime(item, '%b %d, %Y')
        hold2.append(item)
hold2
[datetime.datetime(2018, 5, 12, 14, 37, 39, 653618),
datetime.datetime(2018, 5, 8, 14, 37, 39, 653636),
datetime.datetime(2018, 5, 11, 14, 37, 39, 653643),
datetime.datetime(2018, 5, 12, 14, 37, 39, 653649),
datetime.datetime(2018, 5, 8, 14, 37, 39, 653655),
datetime.datetime(2018, 5, 12, 14, 37, 39, 653661),
datetime.datetime(2018, 5, 12, 14, 37, 39, 653667),
datetime.datetime(2018, 4, 24, 0, 0),
datetime.datetime(2018, 5, 8, 14, 37, 39, 653716),
datetime.datetime(2018, 4, 25, 0, 0)]
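The loop above only handles "N days ago"; a string such as "3 hours ago" would make int() fail. A sketch of a slightly more tolerant helper, assuming the scraped text always looks like "Source - N unit(s) ago" or "Source - Mon DD, YYYY". parse_article_date is a hypothetical name, "Reuters" is a made-up source, and now is passed in so the result is reproducible.

```python
import re
from datetime import datetime, timedelta

def parse_article_date(text, now):
    """Turn 'Source - 1 day ago' or 'Source - Apr 24, 2018' into a date."""
    stamp = re.sub(r'^.+ - ', '', text)  # drop the "Source - " prefix
    match = re.fullmatch(r'(\d+) (minute|hour|day)s? ago', stamp)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        # Build timedelta(minutes=...), timedelta(hours=...) or timedelta(days=...).
        return (now - timedelta(**{unit + 's': amount})).date()
    # Absolute dates; %b assumes English month abbreviations.
    return datetime.strptime(stamp, '%b %d, %Y').date()

now = datetime(2018, 5, 12, 14, 0)  # fixed reference time for the example
print(parse_article_date("PE Hub (blog) - 1 day ago", now))
print(parse_article_date("Reuters - Apr 24, 2018", now))
```

The returned date objects can then be formatted with strftime in whatever layout is needed.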
