Python & Pandas: Parsing JSONs in a loop - python
With Python I'm pulling a nested JSON, and I'm looking to parse it in a loop and write the data to a CSV. The structure of the JSON is below. The values I'm after are in the "view" list, labeled "user_id" and "message".
{
"view": [
{
"id": 109205,
"user_id": 6354,
"parent_id": null,
"created_at": "2020-11-03T23:32:49Z",
"updated_at": "2020-11-03T23:32:49Z",
"rating_count": null,
"rating_sum": null,
**"message": "message text",**
"replies": [
# json continues
],
}
After some study and assistance from this helpful tutorial I was able to structure requests like this:
import requests
import json
import pandas as pd
url = "URL"
headers = {'Authorization' : 'Bearer KEY'}
r = requests.get(url, headers=headers)
data = r.json()
print(data['view'][0]['user_id'])
print(data['view'][0]['message'])
This successfully prints 6354 and "message text".
Now, how would I approach capturing all the user IDs and messages from the JSON and writing them to a CSV with Pandas?
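One possible approach, sketched under the assumption that every entry in "view" carries both a "user_id" and a "message" key: build a DataFrame from the list and keep only those two columns. The sample data and the output filename messages.csv are made up for illustration; in the real script, data would come from r.json().

```python
import pandas as pd

# Stand-in for r.json(); the real payload comes from the API (sample values are made up).
data = {
    "view": [
        {"id": 109205, "user_id": 6354, "parent_id": None, "message": "message text"},
        {"id": 109206, "user_id": 7001, "parent_id": 109205, "message": "another message"},
    ]
}

# Each dict in data["view"] becomes one row; keep only the two columns of interest.
df = pd.DataFrame(data["view"])[["user_id", "message"]]

# index=False omits the pandas row index from the file.
df.to_csv("messages.csv", index=False)
print(df)
```

No explicit loop is needed here, because the DataFrame constructor iterates over the list itself. If nested fields (for example the entries under "replies") also need flattening, pd.json_normalize may be worth a look.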
Related
How to convert python-request JSON results to csv?
I am trying to get my list of contacts from my WIX website using their API endpoint url and the requests module in python. I am totally stuck.
Here's my code so far:
import requests
auth_key = "my auth key"
r = requests.get("https://www.wixapis.com/crm/v1/contacts", headers={"Authorization": auth_key})
print(r.status_code)
dict = r.json()
contacts_list = dict["contacts"]
for i in contacts_list:
    for key in i:
        print(key, ':', i[key])
Here is what I get:
200
id : long id string 1
emails : [{'tag': 'UNTAGGED', 'email': 'sampleemail1#yahoo.com'}]
phones : []
addresses : [{'tag': 'UNTAGGED', 'countryCode': 'US'}]
metadata : {'createdAt': '2020-07-08T22:41:07.135Z', 'updatedAt': '2020-07-08T22:42:19.327Z'}
source : {'sourceType': 'SITE_MEMBERS'}
id : long id string 2
emails : [{'tag': 'UNTAGGED', 'email': 'sampleemail2#yahoo.com'}]
phones : []
addresses : []
metadata : {'createdAt': '2020-07-03T00:51:21.127Z', 'updatedAt': '2020-07-04T03:26:16.370Z'}
source : {'sourceType': 'SITE_MEMBERS'}
Process finished with exit code 0
Each line is a string. I need each row of the csv to be a new contact (there are two sample contacts). The columns should be the keys. I plan to use the csv module to writerow(Fields), where Fields is a list of strings (keys) such as
Fields = [id, emails, phones, addresses, metadata, source]
All I really need is the emails in a single column of a csv though. Is there a way to maybe just get the email for each contact?
A CSV file with one column is basically just a text file with one item per line, but you can use the csv module to do it if you really want, as shown below. I commented out the 'python-requests' stuff and used some sample input for testing.
test_data = {
    "contacts": [
        {
            "id": "long id string 1",
            "emails": [{"tag": "UNTAGGED", "email": "sampleemail1#yahoo.com"}],
            "phones": [],
            "addresses": [{"tag": "UNTAGGED", "countryCode": "US"}],
            "metadata": {"createdAt": "2020-07-08T22:41:07.135Z", "updatedAt": "2020-07-08T22:42:19.327Z"},
            "source": {"sourceType": "SITE_MEMBERS"}
        },
        {
            "id": "long id string 2",
            "emails": [{"tag": "UNTAGGED", "email": "sampleemail2#yahoo.com"}],
            "phones": [],
            "addresses": [],
            "metadata": {"createdAt": "2020-07-03T00:51:21.127Z", "updatedAt": "2020-07-04T03:26:16.370Z"},
            "source": {"sourceType": "SITE_MEMBERS"}
        }
    ]
}
import csv
import json
import requests
auth_key = "my auth key"
output_filename = 'whatever.csv'
#r = requests.get("https://www.wixapis.com/crm/v1/contacts", headers={"Authorization": auth_key})
#print(r.status_code)
#json_obj = r.json()
json_obj = test_data  # FOR TESTING PURPOSES
contacts_list = json_obj["contacts"]
with open(output_filename, 'w', newline='') as outp:
    writer = csv.writer(outp)
    writer.writerow(['email'])  # Write csv header.
    for contact in contacts_list:
        email = contact['emails'][0]['email']  # Get the first one.
        writer.writerow([email])
print('email csv file written')
Contents of whatever.csv file afterwards:
email
sampleemail1#yahoo.com
sampleemail2#yahoo.com
Update: As pointed out by @martineau, some of the values are lists, so you need to handle them. You could turn them into strings (for example with str.join()) inside the for loop. You can write the data to csv like this using the csv package:
import csv, json, sys
import requests
auth_key = "my auth key"
r = requests.get("https://www.wixapis.com/crm/v1/contacts", headers={"Authorization": auth_key})
print(r.status_code)
payload = r.json()
contacts_list = payload["contacts"]
output = csv.writer(sys.stdout)
# Insert header (keys), taken from the first contact.
output.writerow(contacts_list[0].keys())
for i in contacts_list:
    output.writerow(i.values())
At the end you can print and verify the output.
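To make the list handling above concrete, here is a minimal sketch that joins each contact's email addresses into one cell and stores any other nested value as JSON text. The sample contacts only imitate the shape of the question's output (the addresses are made up), and the helper name contacts_to_rows and the "; " separator are illustrative choices, not part of the Wix API.

```python
import csv
import io
import json

# Sample data shaped like the question's output; the addresses are made up.
contacts_list = [
    {"id": "long id string 1",
     "emails": [{"tag": "UNTAGGED", "email": "first@example.com"}],
     "phones": []},
    {"id": "long id string 2",
     "emails": [{"tag": "UNTAGGED", "email": "second@example.com"},
                {"tag": "WORK", "email": "third@example.com"}],
     "phones": []},
]


def contacts_to_rows(contacts):
    """Yield one flat dict per contact (hypothetical helper)."""
    for contact in contacts:
        row = {}
        for key, value in contact.items():
            if key == "emails":
                # Join all addresses of a contact into a single cell.
                row[key] = "; ".join(entry["email"] for entry in value)
            elif isinstance(value, (list, dict)):
                # Keep other nested values as JSON text.
                row[key] = json.dumps(value)
            else:
                row[key] = value
        yield row


rows = list(contacts_to_rows(contacts_list))

# Written to an in-memory buffer here; use open("contacts.csv", "w", newline="") for a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

csv.DictWriter is used instead of writerow(i.values()) so that the column order does not depend on the key order of each individual contact.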
Print Specific Value from an API Request in Python
I am trying to print the values from an API Request. The JSON file returned is large(4,000 lines) so I am just trying to get specific values from the key value pair and automate a message. Here is what I have so far: import requests import json import urllib url = "https://api.github.com/repos/<companyName>/<repoName>/issues" #url payload = {} headers = { 'Authorization': 'Bearer <masterToken>' #authorization works fine } name = (user.login) #pretty sure nothing is being looked out url = (url) print(hello %name, you have a pull request to view. See here %url for more information) # i want to print those keys here The JSON file (exported from the API get request is as followed: [ { **"url": "https://github.com/<ompanyName>/<repo>/issues/1000",** "repository_url": "https://github.com/<ompanyName>/<repo>", "labels_url": "https://github.com/<ompanyName>/<repo>/issues/1000labels{/name}", "comments_url": "https://github.com/<ompanyName>/<repo>/issues/1000", "events_url": "https://github.com/<ompanyName>/<repo>/issues/1000", "html_url": "https://github.com/<ompanyName>/<repo>/issues/1000", "id": <id>, "node_id": "<nodeID>", "number": 702, "title": "<titleName>", "user": { **"login": "<userName>",** "id": <idNumber>, "node_id": "nodeID", "avatar_url": "https://avatars3.githubusercontent.com/u/urlName?v=4", "gravatar_id": "", "url": "https://api.github.com/users/<userName>", "html_url": "https://github.com/<userName>", "followers_url": "https://api.github.com/users/<userName>/followers", "following_url": "https://api.github.com/users/<userName>/following{/other_user}", "gists_url": "https://api.github.com/users/<userName>/gists{/gist_id}", "starred_url": "https://api.github.com/users/<userName>/starred{/owner}{/repo}", "subscriptions_url": "https://api.github.com/users/<userName>/subscriptions", "organizations_url": "https://api.github.com/users/<userName>/orgs", "repos_url": "https://api.github.com/users/<userName>/repos", "events_url": 
"https://api.github.com/users/<userName>/events{/privacy}", "received_events_url": "https://api.github.com/users/<userName>/received_events", "type": "User", "site_admin": false }, ] (note this JSON file repeats a few hundred times) From the API request, I am trying to get the nested "login" and the url. What am I missing? Thanks Edit: Solved: import requests import json import urllib url = "https://api.github.com/repos/<companyName>/<repoName>/issues" payload = {} headers = { 'Authorization': 'Bearer <masterToken>' } response = requests.get(url).json() for obj in response: name = obj['user']['login'] url = obj['url'] print('Hello {0}, you have an outstanding ticket to review. For more information see here:{1}.'.format(name,url))
Since it's a JSON array, you have to loop over it. JSON objects are converted to dictionaries, so you use ['key'] to access the elements.
for obj in response:
    name = obj['user']['login']
    url = obj['url']
    print(f'hello {name}, you have a pull request to view. See here {url} for more information')
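A minimal offline sketch of that loop follows. The sample list stands in for requests.get(...).json() and its values are invented; .get is used for the nested lookup on the assumption that some items might lack a "user" object.

```python
# Stand-in for the parsed API response: a list of dicts (values are invented).
response = [
    {"url": "https://example.com/issues/1", "user": {"login": "alice"}},
    {"url": "https://example.com/issues/2", "user": {"login": "bob"}},
    {"url": "https://example.com/issues/3"},  # no "user" key, to show the fallback
]

messages = []
for obj in response:
    # obj.get("user", {}) returns an empty dict when the key is absent,
    # so the second .get falls back to "unknown" instead of raising KeyError.
    name = obj.get("user", {}).get("login", "unknown")
    url = obj["url"]
    messages.append(
        f"hello {name}, you have a pull request to view. See here {url} for more information"
    )

for line in messages:
    print(line)
```

Plain indexing (obj['user']['login']) is fine when every item is known to have the key; the .get form only trades a KeyError for a placeholder value.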
You can parse it into Python lists/dictionaries and then access it like any other Python object.
response = requests.get(...).json()
login = response[0]['user']['login']
You can convert JSON-formatted data to Python objects (here, a list of dictionaries) like this: https://www.w3schools.com/python/python_json.asp
import json
json_data = ...  # response text from the API
parsed_data = json.loads(json_data)
login = parsed_data[0]['user']['login']
url = parsed_data[0]['url']
Python: bitly request
I'm trying to do a basic Bitly URL-shortening call. However, I cannot seem to either push the JSON correctly or deal with the JSON response correctly. I omitted some obvious variables for brevity and obfuscated some real values for security purposes.
import requests
import json
bitly_header = {'Authorization':'Bearer some_long_secret_character_string_here', 'Content-Type':'application/json'}
bitly_data = {
    "long_url": ""+long_url+"",
    "group_guid": ""+bitly_guid+""
}
short_link_resp =requests.post(bitly_endpoint,data=bitly_data,headers=bitly_header)
short_link_json = short_link_resp.json()
short_link = short_link_json["link"]
The error is "KeyError: 'link'".
The JSON I get from Postman is:
{
    "created_at": "1970-01-01T00:00:00+0000",
    "id": "bit.ly/2MjdrrG",
    "link": "bit.ly/2MjdrrG",
    "custom_bitlinks": [],
    "long_url": "google.com/",
    "archived": false,
    "tags": [],
    "deeplinks": [],
    "references": {
        "group": "https://api-ssl.bitly.com/v4/groups/Bi7i8IbM1x9"
    }
}
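Try replacing data with json:
short_link_resp = requests.post(bitly_endpoint, json=bitly_data, headers=bitly_header)
With data=, requests form-encodes the dictionary, whereas json= serializes it as JSON, which is what the Content-Type: application/json header promises. See the doc ref.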
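To illustrate the difference, the sketch below uses only the standard library to reproduce roughly what ends up in the request body in each case: a dict passed as data= is form-encoded, while a dict passed as json= is serialized to JSON. The payload values are placeholders (the GUID is made up), and this only approximates the body that requests builds; it makes no network call.

```python
import json
from urllib.parse import urlencode

# Placeholder payload; substitute the real long URL and group GUID.
bitly_data = {"long_url": "https://www.google.com/", "group_guid": "EXAMPLE_GUID"}

# data=bitly_data -> form-encoded body (application/x-www-form-urlencoded)
form_body = urlencode(bitly_data)

# json=bitly_data -> JSON body (application/json)
json_body = json.dumps(bitly_data)

print(form_body)
print(json_body)
```

A server that expects JSON cannot parse the first form, so it would likely answer with an error object that has no "link" key, which would explain the KeyError in the question.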
Difficulties using Python request (POST) + API
Im trying to use a simple API with python. I get the data with the code below but, but I can't seem to parse it. When i print the type of variable "c" it says "unicode". I want a Json object or dictionary so I can use the information. I have I tried various ways to solve this but I'm not sure if the output from the API (below) is actually Json or why it doesn't work properly. import requests import json import urllib test1 ={ "query": [ { "code": "SNI2007", "selection": { "filter": "item", "values": [ "47.4+47.54" ] } }, { "code": "ContentsCode", "selection": { "filter": "item", "values": [ "HA0101A9", "HA0101B4" ] } }, { "code": "Tid", "selection": { "filter": "item", "values": [ "2010M01", "2010M02", "2010M03", "2010M04", "2010M05", "2010M06", "2010M07", "2010M08", "2010M09", "2010M10", "2010M11", "2010M12", "2011M01", "2011M02", "2011M03", "2011M04", "2011M05", "2011M06", "2011M07", "2011M08", "2011M09", "2011M10", "2011M11", "2011M12", "2012M01", "2012M02", "2012M03", "2012M04", "2012M05", "2012M06", "2012M07", "2012M08", "2012M09", "2012M10", "2012M11", "2012M12", "2013M01", "2013M02", "2013M03", "2013M04", "2013M05", "2013M06", "2013M07", "2013M08", "2013M09", "2013M10", "2013M11", "2013M12", "2014M01", "2014M02", "2014M03", "2014M04", "2014M05", "2014M06", "2014M07", "2014M08", "2014M09", "2014M10", "2014M11", "2014M12", "2015M01", "2015M02", "2015M03", "2015M04", "2015M05", "2015M06", "2015M07", "2015M08", "2015M09", "2015M10", "2015M11", "2015M12", "2016M01", "2016M02", "2016M03", "2016M04", "2016M05", "2016M06", "2016M07", "2016M08", "2016M09", "2016M10", "2016M11", "2016M12", "2017M01", "2017M02", "2017M03", "2017M04", "2017M05", "2017M06", "2017M07", "2017M08", "2017M09", "2017M10", "2017M11", "2017M12", "2018M01", "2018M02", "2018M03", "2018M04" ] } } ], "response": { "format": "json" } } response = requests.post("http://api.scb.se/OV0104/v1/doris/sv/ssd/START/HA/HA0101/HA0101B/Detoms07", json = test1) dat = response.content b = json.dumps(dat) c = 
json.loads(b) print type(b) This is what I get if I print the "response.content" variable. {"columns":[{"code":"SNI2007","text":"näringsgren SNI 2007","type":"d"},{"code":"Tid","text":"månad","type":"t"},{"code":"HA0101A9","text":"Löpande priser","type":"c"},{"code":"HA0101B4","text":"Fasta priser","type":"c"}],"comments":[],"data":[{"key":["47.4+47.54","2010M01"],"values":["90.3","45.0"]},{"key":["47.4+47.54","2010M02"],"values":["80.9","40.3"]},{"key":["47.4+47.54","2010M03"],"values":["91.3","45.7"]},{"key":["47.4+47.54","2010M04"],"values":["83.9","43.5"]},{"key":["47.4+47.54","2010M05"],"values":["87.4","45.7"]},{"key":["47.4+47.54","2010M06"],"values":["97.6","52.6"]},{"key":["47.4+47.54","2010M07"],"values":["99.5","54.2"]},{"key":["47.4+47.54","2010M08"],"values":["105.2","57.3"]},{"key":["47.4+47.54","2010M09"],"values":["108.9","60.4"]},{"key":["47.4+47.54","2010M10"],"values":["107.9","60.7"]},{"key":["47.4+47.54","2010M11"],"values":["107.9","61.3"]},{"key":["47.4+47.54","2010M12"],"values":["181.9","106.1"]},{"key":["47.4+47.54","2011M01"],"values":["95.3","55.9"]},{"key":["47.4+47.54","2011M02"],"values":["80.1","47.3"]},{"key":["47.4+47.54","2011M03"],"values":["88.8","53.5"]},{"key":["47.4+47.54","2011M04"],"values":["79.4","48.5"]},{"key":["47.4+47.54","2011M05"],"values":["85.9","53.0"]},{"key":["47.4+47.54","2011M06"],"values":["90.2","57.3"]},{"key":["47.4+47.54","2011M07"],"values":["95.5","61.1"]},{"key":["47.4+47.54","2011M08"],"values":["97.1","62.3"]},{"key":["47.4+47.54","2011M09"],"values":["96.3","62.4"]},{"key":["47.4+47.54","2011M10"],"values":["97.0","63.6"]},{"key":["47.4+47.54","2011M11"],"values":["104.5","69.2"]},{"key":["47.4+47.54","2011M12"],"values":["171.4","113.9"]},{"key":["47.4+47.54","2012M01"],"values":["93.7","62.8"]},{"key":["47.4+47.54","2012M02"],"values":["78.3","53.1"]},{"key":["47.4+47.54","2012M03"],"values":["87.2","60.1"]},{"key":["47.4+47.54","2012M04"],"values":["82.7","57.4"]},{"key":["47.4+47.54","2012M05"],
"values":["81.1","56.8"]},{"key":["47.4+47.54","2012M06"],"values":["92.8","66.3"]},{"key":["47.4+47.54","2012M07"],"values":["88.4","64.0"]},{"key":["47.4+47.54","2012M08"],"values":["92.7","68.0"]},{"key":["47.4+47.54","2012M09"],"values":["96.1","71.5"]},{"key":["47.4+47.54","2012M10"],"values":["92.4","69.7"]},{"key":["47.4+47.54","2012M11"],"values":["99.2","75.9"]},{"key":["47.4+47.54","2012M12"],"values":["147.5","115.5"]},{"key":["47.4+47.54","2013M01"],"values":["89.6","70.6"]},{"key":["47.4+47.54","2013M02"],"values":["75.5","59.9"]},{"key":["47.4+47.54","2013M03"],"values":["79.5","63.7"]},{"key":["47.4+47.54","2013M04"],"values":["76.2","62.0"]},{"key":["47.4+47.54","2013M05"],"values":["79.0","65.0"]},{"key":["47.4+47.54","2013M06"],"values":["84.6","70.5"]},{"key":["47.4+47.54","2013M07"],"values":["85.7","73.0"]},{"key":["47.4+47.54","2013M08"],"values":["91.6","77.8"]},{"key":["47.4+47.54","2013M09"],"values":["90.6","77.4"]},{"key":["47.4+47.54","2013M10"],"values":["93.0","79.8"]},{"key":["47.4+47.54","2013M11"],"values":["97.4","84.3"]},{"key":["47.4+47.54","2013M12"],"values":["151.0","133.0"]},{"key":["47.4+47.54","2014M01"],"values":["92.3","81.6"]},{"key":["47.4+47.54","2014M02"],"values":["75.7","67.6"]},{"key":["47.4+47.54","2014M03"],"values":["82.3","74.5"]},{"key":["47.4+47.54","2014M04"],"values":["79.6","72.7"]},{"key":["47.4+47.54","2014M05"],"values":["80.3","73.9"]},{"key":["47.4+47.54","2014M06"],"values":["92.7","85.9"]},{"key":["47.4+47.54","2014M07"],"values":["88.0","82.7"]},{"key":["47.4+47.54","2014M08"],"values":["94.4","88.6"]},{"key":["47.4+47.54","2014M09"],"values":["100.2","95.3"]},{"key":["47.4+47.54","2014M10"],"values":["103.0","98.9"]},{"key":["47.4+47.54","2014M11"],"values":["104.4","100.0"]},{"key":["47.4+47.54","2014M12"],"values":["159.9","154.1"]},{"key":["47.4+47.54","2015M01"],"values":["95.9","93.3"]},{"key":["47.4+47.54","2015M02"],"values":["80.5","78.3"]},{"key":["47.4+47.54","2015M03"],"values":["90.4","
88.5"]},{"key":["47.4+47.54","2015M04"],"values":["82.6","81.2"]},{"key":["47.4+47.54","2015M05"],"values":["85.9","84.4"]},{"key":["47.4+47.54","2015M06"],"values":["97.5","96.8"]},{"key":["47.4+47.54","2015M07"],"values":["95.1","95.0"]},{"key":["47.4+47.54","2015M08"],"values":["93.7","93.8"]},{"key":["47.4+47.54","2015M09"],"values":["98.4","99.4"]},{"key":["47.4+47.54","2015M10"],"values":["105.5","107.5"]},{"key":["47.4+47.54","2015M11"],"values":["114.6","116.9"]},{"key":["47.4+47.54","2015M12"],"values":["159.9","164.9"]},{"key":["47.4+47.54","2016M01"],"values":["91.4","95.8"]},{"key":["47.4+47.54","2016M02"],"values":["84.7","90.1"]},{"key":["47.4+47.54","2016M03"],"values":["89.6","96.2"]},{"key":["47.4+47.54","2016M04"],"values":["87.9","94.8"]},{"key":["47.4+47.54","2016M05"],"values":["84.6","92.1"]},{"key":["47.4+47.54","2016M06"],"values":["95.0","105.6"]},{"key":["47.4+47.54","2016M07"],"values":["93.0","104.3"]},{"key":["47.4+47.54","2016M08"],"values":["96.1","106.9"]},{"key":["47.4+47.54","2016M09"],"values":["98.2","110.5"]},{"key":["47.4+47.54","2016M10"],"values":["103.2","116.4"]},{"key":["47.4+47.54","2016M11"],"values":["116.6","132.3"]},{"key":["47.4+47.54","2016M12"],"values":["155.6","177.2"]},{"key":["47.4+47.54","2017M01"],"values":["94.7","108.3"]},{"key":["47.4+47.54","2017M02"],"values":["79.2","91.4"]},{"key":["47.4+47.54","2017M03"],"values":["88.8","102.8"]},{"key":["47.4+47.54","2017M04"],"values":["80.3","93.9"]},{"key":["47.4+47.54","2017M05"],"values":["82.9","97.4"]},{"key":["47.4+47.54","2017M06"],"values":["94.2","111.0"]},{"key":["47.4+47.54","2017M07"],"values":["88.3","103.4"]},{"key":["47.4+47.54","2017M08"],"values":["91.0","105.8"]},{"key":["47.4+47.54","2017M09"],"values":["92.6","107.9"]},{"key":["47.4+47.54","2017M10"],"values":["97.9","115.2"]},{"key":["47.4+47.54","2017M11"],"values":["121.2","142.7"]},{"key":["47.4+47.54","2017M12"],"values":["149.7","177.7"]},{"key":["47.4+47.54","2018M01"],"values":["98.1","1
16.3"]},{"key":["47.4+47.54","2018M02"],"values":["79.0","94.6"]},{"key":["47.4+47.54","2018M03"],"values":["93.0","112.9"]},{"key":["47.4+47.54","2018M04"],"values":["85.6","104.3"]}]}
There are two strategies you can use. response.json() gives you back a dict with all the JSON in key-value form, using the JSON handling built into requests. Alternatively, if you want to use the json library directly, do json_data = json.loads(response.text) and let the json library parse it instead. Calling json.dumps on the raw content first, as in the question, only wraps the text in another layer of JSON string, so json.loads hands back a string again. In general, response.json() is probably enough for what you need.
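The effect of the dumps/loads round trip in the question can be reproduced without any network access. The sketch below uses a shortened stand-in for response.text (the real body is much longer) and is written for Python 3, where the resulting type is str rather than unicode.

```python
import json

# A shortened stand-in for response.text.
raw = '{"columns": [{"code": "Tid", "type": "t"}], "data": [{"key": ["47.4+47.54", "2010M01"], "values": ["90.3", "45.0"]}]}'

# dumps() on a string produces a JSON string literal; loads() then just undoes it.
round_tripped = json.loads(json.dumps(raw))
print(type(round_tripped).__name__)

# Parsing the text directly yields a dict.
parsed = json.loads(raw)
print(type(parsed).__name__)
print(parsed["data"][0]["values"])
```

Once parsed is a dict, the monthly figures can be read with ordinary indexing, for example by looping over parsed["data"] and picking entry["key"] and entry["values"].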
Filter data to facebook graph with json-python
I'm querying the Facebook Graph API and the data already shows what I need, but I could not filter out just the 'message' and 'id' from the JSON. I'd appreciate any help. Here is my code:
import facebook
import json
import urllib.request
from urllib.request import urlopen
page_id = "MYPAGE"
access_token = 'MY-ACCESS-TOKEN'
api_endpoint = "https://graph.facebook.com/v2.5/"
fb_graph_url = page_id+"?fields=id,name,feed.since(2015-12-22).until(2015-12-25){comments.filter(stream)}&access_token="+access_token
html = api_endpoint + fb_graph_url
print(html,"\n")
data = urllib.request.urlopen(html)
read_page= data.read()
print(read_page)
print(data.read(),"\n")
data2=json.loads(read_page.decode())
#message=data2["feed"]["data"]
message=data2
for item in message['feed']['data'][1]['comments']['data']:
    print(item['message'])
    print(item['from']['name'])
print(message,"\n")
It shows me something like:
{
    "id": "2825921296",
    "name": "MY-PAGE",
    "feed": {
        "data": [
            {
                "id": "2825921296_5155340"
            },
            {
                "id": "2825921296_5155340",
                "comments": {
                    "data": [
                        {
                            "from": {
                                "name": "Carl Jhon",
                                "id": "282564921296"
                            },
                            "message": "Comment one",
                            "created_time": "2015-12-10T03:42:05+0000",
                            "id": "5153352885_5153353484206"
                        },
                        {
And my question is: how do I display only the 'message' and 'name' out of everything it shows? Thanks, and I appreciate your response.
It looks like the variable "message" in your code (not in the JSON data) is a dictionary. So you can access the first message and the page name by adding:
print(message['feed']['data'][1]['comments']['data'][0]['message'])
print(message['name'])
You can access the nth message with:
print(message['feed']['data'][1]['comments']['data'][n]['message'])
To print all of the messages, including the name of the author, you could use a for loop like this:
for item in message['feed']['data'][1]['comments']['data']:
    print(item['message'])
    print(item['from']['name'])
Or you can output a specific number of messages and names (100 in this case):
if len(message['feed']['data'][1]['comments']['data'])>=100:
    for i in range(100):
        print(message['feed']['data'][1]['comments']['data'][i]['message'])
        print(message['feed']['data'][1]['comments']['data'][i]['from']['name'])
In case the message contains emojis, you can either add # -*- coding: utf-8 -*- to the top of your script or take a look at this post
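The loops above read comments from a single post (index 1). A sketch of walking every post in the feed follows, assuming the response shape shown in the question. Posts without a "comments" key, like the first entry in the sample, are skipped via .get. The dictionary is a trimmed stand-in for the real response, and its second comment is made up.

```python
# Trimmed stand-in for the decoded Graph API response (second comment is made up).
message = {
    "id": "2825921296",
    "name": "MY-PAGE",
    "feed": {"data": [
        {"id": "2825921296_5155340"},  # a post without comments
        {"id": "2825921296_5155340",
         "comments": {"data": [
             {"from": {"name": "Carl Jhon", "id": "282564921296"},
              "message": "Comment one"},
             {"from": {"name": "Jane Roe", "id": "1"},
              "message": "Comment two"},
         ]}},
    ]},
}

pairs = []
for post in message.get("feed", {}).get("data", []):
    # .get with empty defaults lets posts without comments fall through quietly.
    for comment in post.get("comments", {}).get("data", []):
        pairs.append((comment["from"]["name"], comment["message"]))

for name, text in pairs:
    print(f"{name}: {text}")
```

Collecting (name, message) tuples first keeps the traversal separate from the output, so the same list could be printed, filtered, or written to a file.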