Python & Pandas: Parsing JSONs in a loop - python

With Python I'm pulling a nested JSON, and I want to parse it in a loop and write the data to a CSV. The structure of the JSON is below. The values I'm after are in the "view" list, labeled "user_id" and "message".
{
"view": [
{
"id": 109205,
"user_id": 6354,
"parent_id": null,
"created_at": "2020-11-03T23:32:49Z",
"updated_at": "2020-11-03T23:32:49Z",
"rating_count": null,
"rating_sum": null,
**"message": "message text",**
"replies": [
# json continues
],
}
After some study and assistance from this helpful tutorial I was able to structure requests like this:
import requests
import json
import pandas as pd
url = "URL"
headers = {'Authorization' : 'Bearer KEY'}
r = requests.get(url, headers=headers)
data = r.json()
print(data['view'][0]['user_id'])
print(data['view'][0]['message'])
This successfully prints 6354 and "message text".
Now, how would I capture all the user IDs and messages from the JSON and write them to a CSV with Pandas?
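One possible approach, sketched under assumptions: the sample `data` dictionary below is made up to match the structure shown above (the second entry and the output filename `messages.csv` are illustrative only). Since `data['view']` is a list of dictionaries, it can be passed to a DataFrame directly, so no explicit loop is needed.

```python
import pandas as pd

# Sample payload shaped like the JSON above (values are illustrative).
data = {
    "view": [
        {"id": 109205, "user_id": 6354, "parent_id": None,
         "message": "message text", "replies": []},
        {"id": 109206, "user_id": 6355, "parent_id": None,
         "message": "another message", "replies": []},
    ]
}

# One row per entry in "view"; keep only the two columns of interest.
df = pd.DataFrame(data["view"])[["user_id", "message"]]

# index=False leaves the DataFrame's row index out of the file.
df.to_csv("messages.csv", index=False)
```

With a real response, `data` would come from `r.json()` as in the code above.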

Related

How to convert python-request JSON results to csv?

I am trying to get my list of contacts from my WIX website using their API endpoint url and the requests module in python. I am totally stuck.
Here's my code so far:
import requests
auth_key = "my auth key"
r = requests.get("https://www.wixapis.com/crm/v1/contacts", headers={"Authorization": auth_key})
print(r.status_code)
dict = r.json()
contacts_list = dict["contacts"]
for i in contacts_list:
    for key in i:
        print(key, ':', i[key])
Here is what I get:
200
id : long id string 1
emails : [{'tag': 'UNTAGGED', 'email': 'sampleemail1#yahoo.com'}]
phones : []
addresses : [{'tag': 'UNTAGGED', 'countryCode': 'US'}]
metadata : {'createdAt': '2020-07-08T22:41:07.135Z', 'updatedAt': '2020-07-08T22:42:19.327Z'}
source : {'sourceType': 'SITE_MEMBERS'}
id : long id string 2
emails : [{'tag': 'UNTAGGED', 'email': 'sampleemail2#yahoo.com'}]
phones : []
addresses : []
metadata : {'createdAt': '2020-07-03T00:51:21.127Z', 'updatedAt': '2020-07-04T03:26:16.370Z'}
source : {'sourceType': 'SITE_MEMBERS'}
Process finished with exit code 0
Each line is a string. I need each row of the CSV to be a new contact (there are two sample contacts). The columns should be the keys. I plan to use the csv module to writerow(Fields), where Fields is a list of strings (keys) such as Fields = [id, emails, phones, addresses, metadata, source]
All I really need is the emails in a single column of a csv though. Is there a way to maybe just get the email for each contact?
A CSV file with one column is basically just a text file with one item per line, but you can use the csv module to do it if you really want, as shown below.
I commented-out the 'python-requests' stuff and used some sample input for testing.
test_data = {
"contacts": [
{
"id": "long id string 1",
"emails": [
{
"tag": "UNTAGGED",
"email": "sampleemail1#yahoo.com"
}
],
"phones": [],
"addresses": [
{
"tag": "UNTAGGED",
"countryCode": "US"
}
],
"metadata": {
"createdAt": "2020-07-08T22:41:07.135Z",
"updatedAt": "2020-07-08T22:42:19.327Z"
},
"source": {
"sourceType": "SITE_MEMBERS"
}
},
{
"id": "long id string 2",
"emails": [
{
"tag": "UNTAGGED",
"email": "sampleemail2#yahoo.com"
}
],
"phones": [],
"addresses": [],
"metadata": {
"createdAt": "2020-07-03T00:51:21.127Z",
"updatedAt": "2020-07-04T03:26:16.370Z"
},
"source": {
"sourceType": "SITE_MEMBERS"
}
}
]
}
import csv
import json
import requests
auth_key = "my auth key"
output_filename = 'whatever.csv'
#r = requests.get("https://www.wixapis.com/crm/v1/contacts", headers={"Authorization": auth_key})
#print(r.status_code)
#json_obj = r.json()
json_obj = test_data # FOR TESTING PURPOSES
contacts_list = json_obj["contacts"]
with open(output_filename, 'w', newline='') as outp:
    writer = csv.writer(outp)
    writer.writerow(['email'])  # Write csv header.
    for contact in contacts_list:
        email = contact['emails'][0]['email']  # Get the first one.
        writer.writerow([email])
print('email csv file written')
Contents of whatever.csv file afterwards:
email
sampleemail1#yahoo.com
sampleemail2#yahoo.com
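The code above takes only the first address of each contact and would raise an error for a contact whose "emails" list is empty. A minimal variation, assuming some contacts may have several addresses or none, is sketched below. The contacts and addresses are made up, and an in-memory buffer stands in for the output file.

```python
import csv
import io

# Illustrative contacts: one with two addresses, one with none,
# and one without an "emails" key at all.
contacts_list = [
    {"id": "1", "emails": [{"tag": "UNTAGGED", "email": "a@example.com"},
                           {"tag": "WORK", "email": "b@example.com"}]},
    {"id": "2", "emails": []},
    {"id": "3"},
]

outp = io.StringIO()  # stands in for open('whatever.csv', 'w', newline='')
writer = csv.writer(outp)
writer.writerow(["email"])  # header row
for contact in contacts_list:
    # .get() with a default avoids a KeyError when "emails" is absent.
    for entry in contact.get("emails", []):
        writer.writerow([entry["email"]])

rows = outp.getvalue().splitlines()
print(rows)
```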
Update:
As pointed out by @martineau, some of the values are arrays, so you need to handle them. You can convert them to strings, for example with ', '.join(...), in the for loop.
You can write it to CSV like this using the csv package.
import csv, json, sys
import requests
auth_key = "my auth key"
r = requests.get("https://www.wixapis.com/crm/v1/contacts", headers={"Authorization": auth_key})
print(r.status_code)
data = r.json()  # avoid naming this "dict", which shadows the built-in
contacts_list = data["contacts"]
output = csv.writer(sys.stdout)
# insert header (keys)
output.writerow(contacts_list[0].keys())
for i in contacts_list:
    output.writerow(i.values())
At the end you can print and verify output
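Because several values are lists or dictionaries, one way to keep them readable in a CSV cell is to serialize them as JSON text. This is only a sketch of that idea, not the answerer's method: the `flatten` helper is hypothetical, the sample contacts are made up, and it assumes the first contact carries every key you want as a column.

```python
import csv
import io
import json

contacts_list = [
    {"id": "long id string 1",
     "emails": [{"tag": "UNTAGGED", "email": "first@example.com"}],
     "phones": [],
     "source": {"sourceType": "SITE_MEMBERS"}},
    {"id": "long id string 2",
     "emails": [{"tag": "UNTAGGED", "email": "second@example.com"}],
     "phones": [],
     "source": {"sourceType": "SITE_MEMBERS"}},
]

def flatten(value):
    """Return lists and dicts as JSON text; leave other values unchanged."""
    if isinstance(value, (list, dict)):
        return json.dumps(value)
    return value

fieldnames = list(contacts_list[0].keys())
outp = io.StringIO()  # swap for an open file to write to disk
writer = csv.DictWriter(outp, fieldnames=fieldnames)
writer.writeheader()
for contact in contacts_list:
    # Missing keys become empty cells instead of raising an error.
    writer.writerow({key: flatten(contact.get(key, "")) for key in fieldnames})

csv_text = outp.getvalue()
print(csv_text)
```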

Print Specific Value from an API Request in Python

I am trying to print values from an API request. The JSON returned is large (4,000 lines), so I am just trying to get specific values from the key-value pairs and automate a message.
Here is what I have so far:
import requests
import json
import urllib
url = "https://api.github.com/repos/<companyName>/<repoName>/issues" #url
payload = {}
headers = {
'Authorization': 'Bearer <masterToken>' #authorization works fine
}
name = (user.login) #pretty sure nothing is being looked out
url = (url)
print(hello %name, you have a pull request to view. See here %url for more information) # i want to print those keys here
The JSON returned by the API GET request is as follows:
[
{
**"url": "https://github.com/<ompanyName>/<repo>/issues/1000",**
"repository_url": "https://github.com/<ompanyName>/<repo>",
"labels_url": "https://github.com/<ompanyName>/<repo>/issues/1000labels{/name}",
"comments_url": "https://github.com/<ompanyName>/<repo>/issues/1000",
"events_url": "https://github.com/<ompanyName>/<repo>/issues/1000",
"html_url": "https://github.com/<ompanyName>/<repo>/issues/1000",
"id": <id>,
"node_id": "<nodeID>",
"number": 702,
"title": "<titleName>",
"user": {
**"login": "<userName>",**
"id": <idNumber>,
"node_id": "nodeID",
"avatar_url": "https://avatars3.githubusercontent.com/u/urlName?v=4",
"gravatar_id": "",
"url": "https://api.github.com/users/<userName>",
"html_url": "https://github.com/<userName>",
"followers_url": "https://api.github.com/users/<userName>/followers",
"following_url": "https://api.github.com/users/<userName>/following{/other_user}",
"gists_url": "https://api.github.com/users/<userName>/gists{/gist_id}",
"starred_url": "https://api.github.com/users/<userName>/starred{/owner}{/repo}",
"subscriptions_url": "https://api.github.com/users/<userName>/subscriptions",
"organizations_url": "https://api.github.com/users/<userName>/orgs",
"repos_url": "https://api.github.com/users/<userName>/repos",
"events_url": "https://api.github.com/users/<userName>/events{/privacy}",
"received_events_url": "https://api.github.com/users/<userName>/received_events",
"type": "User",
"site_admin": false
},
]
(note this JSON file repeats a few hundred times)
From the API request, I am trying to get the nested "login" and the url.
What am I missing?
Thanks
Edit:
Solved:
import requests
import json
import urllib
url = "https://api.github.com/repos/<companyName>/<repoName>/issues"
payload = {}
headers = {
'Authorization': 'Bearer <masterToken>'
}
response = requests.get(url, headers=headers).json()  # pass the auth header
for obj in response:
    name = obj['user']['login']
    url = obj['url']
    print('Hello {0}, you have an outstanding ticket to review. For more information see here:{1}.'.format(name,url))
Since it's a JSON array you have to loop over it. And JSON objects are converted to dictionaries, so you use ['key'] to access the elements.
for obj in response:
    name = obj['user']['login']
    url = obj['url']
    print(f'hello {name}, you have a pull request to view. See here {url} for more information')
You can parse it into Python lists/dictionaries and then access it like any other Python object.
response = requests.get(...).json()
login = response[0]['user']['login']
You can convert JSON-formatted data to Python objects like this:
https://www.w3schools.com/python/python_json.asp
json_data = ... # response text from the API
dict_data = json.loads(json_data)
login = dict_data[0]['user']['login']
url = dict_data[0]['url']
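The answers above can be tried without calling the API. The sketch below assumes a trimmed, made-up stand-in for the response text (the logins and URLs are placeholders) and shows the JSON array becoming a list and each nested object becoming a dictionary.

```python
import json

# Trimmed, made-up stand-in for the text of the issues response.
json_text = """
[
  {"url": "https://example.invalid/issues/1", "user": {"login": "alice"}},
  {"url": "https://example.invalid/issues/2", "user": {"login": "bob"}}
]
"""

issues = json.loads(json_text)  # JSON array -> Python list of dicts

messages = []
for issue in issues:
    name = issue["user"]["login"]  # nested object -> nested dict
    link = issue["url"]
    messages.append(f"Hello {name}, you have a ticket to review: {link}")

for line in messages:
    print(line)
```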

Python: bitly request

I'm trying to do a basic Bitly shortening URL call. However, I cannot seem to either push the json correctly, or deal with the json response correctly... I omitted some obvious variables for brevity and obfuscated some real values for security purposes.
import requests
import json
bitly_header = {'Authorization':'Bearer some_long_secret_character_string_here', 'Content-Type':'application/json'}
bitly_data = {
"long_url": ""+long_url+"",
"group_guid": ""+bitly_guid+""
}
short_link_resp =requests.post(bitly_endpoint,data=bitly_data,headers=bitly_header)
short_link_json = short_link_resp.json()
short_link = short_link_json["link"]
The error is "KeyError: 'link'".
The json I get from Postman is:
{
"created_at": "1970-01-01T00:00:00+0000",
"id": "bit.ly/2MjdrrG",
"link": "bit.ly/2MjdrrG",
"custom_bitlinks": [],
"long_url": "google.com/",
"archived": false,
"tags": [],
"deeplinks": [],
"references": {
"group": "https://api-ssl.bitly.com/v4/groups/Bi7i8IbM1x9"
}
}
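Try replacing data with json:
short_link_resp = requests.post(bitly_endpoint, json=bitly_data, headers=bitly_header)
With data=, requests form-encodes the dictionary, while json= serializes it as a JSON body, which matches the Content-Type header above.
See the doc ref.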
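To see why the switch matters, the sketch below uses only the standard library to approximate the two request bodies. It is an illustration, not what requests runs internally line for line, and the values in `bitly_data` are placeholders because the real ones are omitted in the question.

```python
import json
from urllib.parse import urlencode

# Placeholder values; the real ones are omitted in the question.
bitly_data = {
    "long_url": "https://example.com/",
    "group_guid": "hypothetical-guid",
}

# Roughly the body produced by data=bitly_data (form encoding).
form_body = urlencode(bitly_data)

# Roughly the body produced by json=bitly_data (JSON encoding).
json_body = json.dumps(bitly_data)

print(form_body)
print(json_body)
```

An API that expects JSON will likely reject the form-encoded body and answer with an error object that has no "link" key, which is consistent with the KeyError in the question.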

Difficulties using Python request (POST) + API

I'm trying to use a simple API with Python. I get the data with the code below, but I can't seem to parse it. When I print the type of variable "c" it says "unicode". I want a JSON object or dictionary so I can use the information.
I have tried various ways to solve this, but I'm not sure whether the output from the API (below) is actually JSON or why it doesn't work properly.
import requests
import json
import urllib
test1 ={
"query": [
{
"code": "SNI2007",
"selection": {
"filter": "item",
"values": [
"47.4+47.54"
]
}
},
{
"code": "ContentsCode",
"selection": {
"filter": "item",
"values": [
"HA0101A9",
"HA0101B4"
]
}
},
{
"code": "Tid",
"selection": {
"filter": "item",
"values": [
"2010M01",
"2010M02",
"2010M03",
"2010M04",
"2010M05",
"2010M06",
"2010M07",
"2010M08",
"2010M09",
"2010M10",
"2010M11",
"2010M12",
"2011M01",
"2011M02",
"2011M03",
"2011M04",
"2011M05",
"2011M06",
"2011M07",
"2011M08",
"2011M09",
"2011M10",
"2011M11",
"2011M12",
"2012M01",
"2012M02",
"2012M03",
"2012M04",
"2012M05",
"2012M06",
"2012M07",
"2012M08",
"2012M09",
"2012M10",
"2012M11",
"2012M12",
"2013M01",
"2013M02",
"2013M03",
"2013M04",
"2013M05",
"2013M06",
"2013M07",
"2013M08",
"2013M09",
"2013M10",
"2013M11",
"2013M12",
"2014M01",
"2014M02",
"2014M03",
"2014M04",
"2014M05",
"2014M06",
"2014M07",
"2014M08",
"2014M09",
"2014M10",
"2014M11",
"2014M12",
"2015M01",
"2015M02",
"2015M03",
"2015M04",
"2015M05",
"2015M06",
"2015M07",
"2015M08",
"2015M09",
"2015M10",
"2015M11",
"2015M12",
"2016M01",
"2016M02",
"2016M03",
"2016M04",
"2016M05",
"2016M06",
"2016M07",
"2016M08",
"2016M09",
"2016M10",
"2016M11",
"2016M12",
"2017M01",
"2017M02",
"2017M03",
"2017M04",
"2017M05",
"2017M06",
"2017M07",
"2017M08",
"2017M09",
"2017M10",
"2017M11",
"2017M12",
"2018M01",
"2018M02",
"2018M03",
"2018M04"
]
}
}
],
"response": {
"format": "json"
}
}
response = requests.post("http://api.scb.se/OV0104/v1/doris/sv/ssd/START/HA/HA0101/HA0101B/Detoms07", json = test1)
dat = response.content
b = json.dumps(dat)
c = json.loads(b)
print type(b)
This is what I get if I print the "response.content" variable.
{"columns":[{"code":"SNI2007","text":"näringsgren SNI 2007","type":"d"},{"code":"Tid","text":"månad","type":"t"},{"code":"HA0101A9","text":"Löpande priser","type":"c"},{"code":"HA0101B4","text":"Fasta priser","type":"c"}],"comments":[],"data":[{"key":["47.4+47.54","2010M01"],"values":["90.3","45.0"]},{"key":["47.4+47.54","2010M02"],"values":["80.9","40.3"]},{"key":["47.4+47.54","2010M03"],"values":["91.3","45.7"]},{"key":["47.4+47.54","2010M04"],"values":["83.9","43.5"]},{"key":["47.4+47.54","2010M05"],"values":["87.4","45.7"]},{"key":["47.4+47.54","2010M06"],"values":["97.6","52.6"]},{"key":["47.4+47.54","2010M07"],"values":["99.5","54.2"]},{"key":["47.4+47.54","2010M08"],"values":["105.2","57.3"]},{"key":["47.4+47.54","2010M09"],"values":["108.9","60.4"]},{"key":["47.4+47.54","2010M10"],"values":["107.9","60.7"]},{"key":["47.4+47.54","2010M11"],"values":["107.9","61.3"]},{"key":["47.4+47.54","2010M12"],"values":["181.9","106.1"]},{"key":["47.4+47.54","2011M01"],"values":["95.3","55.9"]},{"key":["47.4+47.54","2011M02"],"values":["80.1","47.3"]},{"key":["47.4+47.54","2011M03"],"values":["88.8","53.5"]},{"key":["47.4+47.54","2011M04"],"values":["79.4","48.5"]},{"key":["47.4+47.54","2011M05"],"values":["85.9","53.0"]},{"key":["47.4+47.54","2011M06"],"values":["90.2","57.3"]},{"key":["47.4+47.54","2011M07"],"values":["95.5","61.1"]},{"key":["47.4+47.54","2011M08"],"values":["97.1","62.3"]},{"key":["47.4+47.54","2011M09"],"values":["96.3","62.4"]},{"key":["47.4+47.54","2011M10"],"values":["97.0","63.6"]},{"key":["47.4+47.54","2011M11"],"values":["104.5","69.2"]},{"key":["47.4+47.54","2011M12"],"values":["171.4","113.9"]},{"key":["47.4+47.54","2012M01"],"values":["93.7","62.8"]},{"key":["47.4+47.54","2012M02"],"values":["78.3","53.1"]},{"key":["47.4+47.54","2012M03"],"values":["87.2","60.1"]},{"key":["47.4+47.54","2012M04"],"values":["82.7","57.4"]},{"key":["47.4+47.54","2012M05"],"values":["81.1","56.8"]},{"key":["47.4+47.54","2012M06"],"values":["92.8","66.3"]},{"key":
["47.4+47.54","2012M07"],"values":["88.4","64.0"]},{"key":["47.4+47.54","2012M08"],"values":["92.7","68.0"]},{"key":["47.4+47.54","2012M09"],"values":["96.1","71.5"]},{"key":["47.4+47.54","2012M10"],"values":["92.4","69.7"]},{"key":["47.4+47.54","2012M11"],"values":["99.2","75.9"]},{"key":["47.4+47.54","2012M12"],"values":["147.5","115.5"]},{"key":["47.4+47.54","2013M01"],"values":["89.6","70.6"]},{"key":["47.4+47.54","2013M02"],"values":["75.5","59.9"]},{"key":["47.4+47.54","2013M03"],"values":["79.5","63.7"]},{"key":["47.4+47.54","2013M04"],"values":["76.2","62.0"]},{"key":["47.4+47.54","2013M05"],"values":["79.0","65.0"]},{"key":["47.4+47.54","2013M06"],"values":["84.6","70.5"]},{"key":["47.4+47.54","2013M07"],"values":["85.7","73.0"]},{"key":["47.4+47.54","2013M08"],"values":["91.6","77.8"]},{"key":["47.4+47.54","2013M09"],"values":["90.6","77.4"]},{"key":["47.4+47.54","2013M10"],"values":["93.0","79.8"]},{"key":["47.4+47.54","2013M11"],"values":["97.4","84.3"]},{"key":["47.4+47.54","2013M12"],"values":["151.0","133.0"]},{"key":["47.4+47.54","2014M01"],"values":["92.3","81.6"]},{"key":["47.4+47.54","2014M02"],"values":["75.7","67.6"]},{"key":["47.4+47.54","2014M03"],"values":["82.3","74.5"]},{"key":["47.4+47.54","2014M04"],"values":["79.6","72.7"]},{"key":["47.4+47.54","2014M05"],"values":["80.3","73.9"]},{"key":["47.4+47.54","2014M06"],"values":["92.7","85.9"]},{"key":["47.4+47.54","2014M07"],"values":["88.0","82.7"]},{"key":["47.4+47.54","2014M08"],"values":["94.4","88.6"]},{"key":["47.4+47.54","2014M09"],"values":["100.2","95.3"]},{"key":["47.4+47.54","2014M10"],"values":["103.0","98.9"]},{"key":["47.4+47.54","2014M11"],"values":["104.4","100.0"]},{"key":["47.4+47.54","2014M12"],"values":["159.9","154.1"]},{"key":["47.4+47.54","2015M01"],"values":["95.9","93.3"]},{"key":["47.4+47.54","2015M02"],"values":["80.5","78.3"]},{"key":["47.4+47.54","2015M03"],"values":["90.4","88.5"]},{"key":["47.4+47.54","2015M04"],"values":["82.6","81.2"]},{"key":["47.4+47.54","201
5M05"],"values":["85.9","84.4"]},{"key":["47.4+47.54","2015M06"],"values":["97.5","96.8"]},{"key":["47.4+47.54","2015M07"],"values":["95.1","95.0"]},{"key":["47.4+47.54","2015M08"],"values":["93.7","93.8"]},{"key":["47.4+47.54","2015M09"],"values":["98.4","99.4"]},{"key":["47.4+47.54","2015M10"],"values":["105.5","107.5"]},{"key":["47.4+47.54","2015M11"],"values":["114.6","116.9"]},{"key":["47.4+47.54","2015M12"],"values":["159.9","164.9"]},{"key":["47.4+47.54","2016M01"],"values":["91.4","95.8"]},{"key":["47.4+47.54","2016M02"],"values":["84.7","90.1"]},{"key":["47.4+47.54","2016M03"],"values":["89.6","96.2"]},{"key":["47.4+47.54","2016M04"],"values":["87.9","94.8"]},{"key":["47.4+47.54","2016M05"],"values":["84.6","92.1"]},{"key":["47.4+47.54","2016M06"],"values":["95.0","105.6"]},{"key":["47.4+47.54","2016M07"],"values":["93.0","104.3"]},{"key":["47.4+47.54","2016M08"],"values":["96.1","106.9"]},{"key":["47.4+47.54","2016M09"],"values":["98.2","110.5"]},{"key":["47.4+47.54","2016M10"],"values":["103.2","116.4"]},{"key":["47.4+47.54","2016M11"],"values":["116.6","132.3"]},{"key":["47.4+47.54","2016M12"],"values":["155.6","177.2"]},{"key":["47.4+47.54","2017M01"],"values":["94.7","108.3"]},{"key":["47.4+47.54","2017M02"],"values":["79.2","91.4"]},{"key":["47.4+47.54","2017M03"],"values":["88.8","102.8"]},{"key":["47.4+47.54","2017M04"],"values":["80.3","93.9"]},{"key":["47.4+47.54","2017M05"],"values":["82.9","97.4"]},{"key":["47.4+47.54","2017M06"],"values":["94.2","111.0"]},{"key":["47.4+47.54","2017M07"],"values":["88.3","103.4"]},{"key":["47.4+47.54","2017M08"],"values":["91.0","105.8"]},{"key":["47.4+47.54","2017M09"],"values":["92.6","107.9"]},{"key":["47.4+47.54","2017M10"],"values":["97.9","115.2"]},{"key":["47.4+47.54","2017M11"],"values":["121.2","142.7"]},{"key":["47.4+47.54","2017M12"],"values":["149.7","177.7"]},{"key":["47.4+47.54","2018M01"],"values":["98.1","116.3"]},{"key":["47.4+47.54","2018M02"],"values":["79.0","94.6"]},{"key":["47.4+47.54","201
8M03"],"values":["93.0","112.9"]},{"key":["47.4+47.54","2018M04"],"values":["85.6","104.3"]}]}
There are two strategies you can use. response.json() gives you back a dict with all the JSON in key-value format, using requests' internal JSON parser. Alternatively, if you want to use the json library itself, call json_data = json.loads(response.text) and let the json library parse it. The code in the question gets a string back because json.dumps(dat) wraps the raw response text in a JSON string, and json.loads then simply unwraps it again.
In general the requests JSON parser is probably enough for what you need.
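The difference can be reproduced without the API. This sketch is written for Python 3 (the question uses Python 2 syntax) and assumes a shortened copy of the response text shown in the question; the `rows` list is just one illustrative way to use the parsed data.

```python
import json

# Shortened copy of the response text shown in the question.
text = ('{"columns":[{"code":"SNI2007"},{"code":"Tid"},'
        '{"code":"HA0101A9"},{"code":"HA0101B4"}],'
        '"data":[{"key":["47.4+47.54","2010M01"],"values":["90.3","45.0"]},'
        '{"key":["47.4+47.54","2010M02"],"values":["80.9","40.3"]}]}')

# dumps() followed by loads() only wraps and unwraps the string.
round_trip = json.loads(json.dumps(text))
print(type(round_trip).__name__)

# loads() on the text itself yields a dictionary.
parsed = json.loads(text)
print(type(parsed).__name__)

# Each data entry pairs its key fields with its values.
rows = [entry["key"] + entry["values"] for entry in parsed["data"]]
print(rows[0])
```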

Filter data to facebook graph with json-python

I'm querying the Facebook Graph API and the data already contains what I need, but I could not filter out the 'message' and 'id' fields from the JSON. I'd appreciate any help; here is my code:
import facebook
import json
import urllib.request
from urllib.request import urlopen
page_id = "MYPAGE"
access_token = 'MY-ACCESS-TOKEN'
api_endpoint = "https://graph.facebook.com/v2.5/"
fb_graph_url = page_id+"?fields=id,name,feed.since(2015-12-22).until(2015-12-25){comments.filter(stream)}&access_token="+access_token
html = api_endpoint + fb_graph_url
print(html,"\n")
data = urllib.request.urlopen(html)
read_page= data.read()
print(read_page)
print(data.read(),"\n")
data2=json.loads(read_page.decode())
#message=data2["feed"]["data"]
message=data2
for item in message['feed']['data'][1]['comments']['data']:
    print(item['message'])
    print(item['from']['name'])
print(message,"\n")
It shows me something like:
{
"id": "2825921296",
"name": "MY-PAGE",
"feed": {
"data": [
{
"id": "2825921296_5155340"
},
{
"id": "2825921296_5155340",
"comments": {
"data": [
{
"from": {
"name": "Carl Jhon",
"id": "282564921296"
},
"message": "Comment one",
"created_time": "2015-12-10T03:42:05+0000",
"id": "5153352885_5153353484206"
},
{
My question is: how can I display only the 'message' and 'name' from everything it shows?
Thanks, I appreciate your response.
It looks like the variable "message" in your code (not in the JSON data) is a dictionary. You can therefore access the first message and the page name by adding:
print(message['feed']['data'][1]['comments']['data'][0]['message'])
print(message['name'])
You can access the nth message with:
print(message['feed']['data'][1]['comments']['data'][n]['message'])
To print all of the messages, including the name of the author, you could use the for loop like this:
for item in message['feed']['data'][1]['comments']['data']:
    print(item['message'])
    print(item['from']['name'])
Or you can output a specific number of messages and names (100 in this case):
if len(message['feed']['data'][1]['comments']['data'])>=100:
    for i in range(100):
        print(message['feed']['data'][1]['comments']['data'][i]['message'])
        print(message['feed']['data'][1]['comments']['data'][i]['from']['name'])
In case the message contains emojis, you can either add # -*- coding: utf-8 -*- to the top of your script or take a look at this post
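The snippets above read only the post at index 1. A possible generalization, assuming some posts have no "comments" key (as the first post in the sample output suggests), is to loop over every post and skip the ones without comments. The data below is made up in the shape shown in the question.

```python
# Made-up data in the shape shown in the question.
message = {
    "id": "2825921296",
    "name": "MY-PAGE",
    "feed": {
        "data": [
            {"id": "2825921296_1"},  # a post without comments
            {
                "id": "2825921296_2",
                "comments": {
                    "data": [
                        {"from": {"name": "Carl Jhon", "id": "1"},
                         "message": "Comment one"},
                        {"from": {"name": "Another User", "id": "2"},
                         "message": "Comment two"},
                    ]
                },
            },
        ]
    },
}

pairs = []
for post in message["feed"]["data"]:
    # Posts with no "comments" key contribute an empty list.
    for comment in post.get("comments", {}).get("data", []):
        pairs.append((comment["from"]["name"], comment["message"]))

for name, text in pairs:
    print(f"{name}: {text}")
```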
