Parse JSON from URL and skip first line with Python

I have a URL which contains some JSON data. I would like to parse this data and convert to a dictionary using Python. The first line of the data on the webpage is not in JSON format, so I would like to skip the first line before parsing. The data on the webpage looks like the following:
expected 1 issue, got 1
{
"Issues": [
{
"issue": {
"assignedTo": {
"iD": "2",
"name": "industry"
},
"count": "1117",
"logger": "errors",
"metadata": {
"function": "_execute",
"type": "IntegrityError",
"value": "duplicate key value violates unique constraint \nDETAIL: Key (id, date, reference)=(17, 2020-08-03, ZER) already exists.\n"
},
"stats": {},
"status": "unresolved",
"type": "error"
},
"Events": [
{
"message": "Unable to record contract details",
"tags": {
"environment": "worker",
"handled": "yes",
"level": "error",
"logger": "errors",
"mechanism": "logging",
},
"Messages": null,
"Stacktraces": null,
"Exceptions": null,
"Requests": null,
"Templates": null,
"Users": null,
"Breadcrumbs": null,
"Context": null
},
],
"fetch_time": "2020-07-20"
}
]
}
And I have tried running this script:
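import json
import urllib.request

with urllib.request.urlopen("[my_url_here]") as url:
    if url.getcode() == 200:
        # skip the first (non-JSON) line
        for _ in range(1):
            next(url)
        data = url.read()
        parsed = json.loads(data)  # renamed so the json module is not shadowed
    else:
        print("Error receiving data", url.getcode())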
But am met with the error:
Traceback (most recent call last):
File "<stdin>", line 6, in <module>
File
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I get the same error when I run it without using
for _ in range(2):
next(url)
... but with the last line as 'Expecting value: line 2 column 1 (char 1)'.
Any advice? Thanks

You can remove the first line with the following code.
Code:
data = ''.join(data.split('\n')[1:])
print(data)
Output:
{ "Issues": [ { "issue": { "assignedTo": { "iD": "2", "name": "industry" }, "count": "1117", "logger": "errors", "metadata": { "function": "_execute", "type": "IntegrityError", "value": "duplicate key value violates unique constraint DETAIL: Key (id, date, reference)=(17, 2020-08-03, ZER) already exists." }, "stats": {}, "status": "unresolved", "type": "error" }, "Events": [ { "message": "Unable to record contract details", "tags": { "environment": "worker", "handled": "yes", "level": "error", "logger": "errors", "mechanism": "logging", }, "Messages": null, "Stacktraces": null, "Exceptions": null, "Requests": null, "Templates": null, "Users": null, "Breadcrumbs": null, "Context": null }, ], "fetch_time": "2020-07-20" } ]}
As you can see, the first line has been removed. However, the JSON itself is not properly formatted, so it still cannot be parsed.
There are two trailing commas: one after "mechanism": "logging" and one after the closing brace of the single element in "Events". A trailing comma tells the parser that another item follows, but there is no further item in that scope. Please check the code that produces this JSON. If anything is unclear, please ask here. You can validate your JSON at https://jsonlint.com/
I hope this helps. :)

You can try to load the JSON like this:
json.loads(data.split("\n",1)[1])
This splits the string at the first newline and uses the second part of it.
However, I discourage this, as you can't be sure your server will always reply like this. Try to fix the endpoint, or find one that returns a valid JSON reply, if you can.
You will still get a json.decoder.JSONDecodeError: Invalid control character at: line 14 column 68 (char 336) because of that \n in the data.
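
Putting the two answers together, a minimal sketch might look like the following. The helper name parse_after_first_line and the sample payload are made up for illustration; in practice the bytes would come from url.read(). It will still raise JSONDecodeError on the payload shown in the question until the trailing commas are fixed at the source.

```python
import json


def parse_after_first_line(raw):
    """Drop the first line of a response body and parse the rest as JSON.

    `raw` may be bytes (as returned by HTTPResponse.read()) or str.
    """
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    # partition returns (first_line, separator, rest); rest is "" if there is no newline
    _first_line, _sep, rest = text.partition("\n")
    return json.loads(rest)


# Hypothetical payload shaped like the one in the question, minus the trailing commas
sample = b'expected 1 issue, got 1\n{"Issues": [{"fetch_time": "2020-07-20"}]}'
result = parse_after_first_line(sample)
print(result["Issues"][0]["fetch_time"])
```

Decoding the bytes before splitting avoids mixing bytes and str, which is what happens if data.split("\n") is called directly on the result of url.read().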

Related

extract dict from string

'''[ERROR] 2020-10-01T04:46:37.sdfsdqs889dgsdg9dgdf {
"correlation_id": "asdfsdf-dsfasdfa-adfadsf-asdf",
"invocation_timestamp": null,
"invoked_component": "lambda",
"invoker_agent": null,
"message": {
"errorMessage": "Unauthorized",
"statusCode": 401
},
"message_type": "ERROR",
"original_source_app": "",
"response_timestamp": "2020-10-01 04:46:37.121436",
"status": 401,
"target_idp_application": "",
"timezone": "UTC"
}'''
How would I convert this string to only contain the dict object inside of it?
such as:
{
"correlation_id": "asdfsdf-dsfasdfa-adfadsf-asdf",
"invocation_timestamp": null,
"invoked_component": "lambda",
"invoker_agent": null,
"message": {
"errorMessage": "Unauthorized",
"statusCode": 401
},
"message_type": "ERROR",
"original_source_app": "",
"response_timestamp": "2020-10-01 04:46:37.121436",
"status": 401,
"target_idp_application": "",
"timezone": "UTC"
}

You could do something like this to get the string form
test = '''[ERROR] 2020-10-01T04:46:37.sdfsdqs889dgsdg9dgdf {
"correlation_id": "asdfsdf-dsfasdfa-adfadsf-asdf",
"invocation_timestamp": null,
"invoked_component": "lambda",
"invoker_agent": null,
"message": {
"errorMessage": "Unauthorized",
"statusCode": 401
},
"message_type": "ERROR",
"original_source_app": "",
"response_timestamp": "2020-10-01 04:46:37.121436",
"status": 401,
"target_idp_application": "",
"timezone": "UTC"
}'''
print(test[test.find('{'):]) # find the first '{' and discard all characters before that index in the string
and you could do this if you want it as a dict object
import json
dict_form = json.loads(test[test.find('{'):]) # same as before now sending it to json.loads which converts a string to a dict object (as most requests are sent as a string)
print(dict_form)
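
If the log line might also carry text after the closing brace, slicing from the first '{' is not enough. A possible alternative, sketched here with a shortened, made-up log line, is json.JSONDecoder().raw_decode, which parses one JSON value starting at a given index and reports where it ended. The helper name extract_first_object is hypothetical.

```python
import json


def extract_first_object(text):
    """Return (parsed_object, end_index) for the first JSON object in text."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no '{' found in text")
    # raw_decode parses a single JSON value beginning at `start`
    # and ignores whatever follows it
    return json.JSONDecoder().raw_decode(text, start)


# Hypothetical, shortened log line with trailing text after the object
log_line = '[ERROR] 2020-10-01T04:46:37 {"status": 401, "message": {"errorMessage": "Unauthorized"}} trailing text'
obj, end = extract_first_object(log_line)
print(obj["message"]["errorMessage"])
```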

Try
res = json.loads(string_var[string_var.find('{'):])  # strip the log prefix first; json.loads fails on it otherwise
print(res)
Now you can use the res dict to access the values.

Issue building a REST API query using $filter for Power BI Admin API

I am trying to call a Power BI Admin API and filter for dataset refreshes whose status is Failed, but the API filter query doesn't work.
Here is the documentation: https://learn.microsoft.com/en-us/rest/api/power-bi/admin/getrefreshables
Below is part of my Python code for the GET call, which is failing:
refreshables_url = "https://api.powerbi.com/v1.0/myorg/admin/capacities/refreshables?$filter=lastRefresh/status eq 'Failed'"
header = {'Content-Type':'application/json','Authorization': f'Bearer {access_token}'}
r = requests.get(url=refreshables_url, headers=header)
The error below is thrown when I try to filter on status:
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
However, simple queries like the ones below, without inner/nested elements, work fine.
refreshables_url = "https://api.powerbi.com/v1.0/myorg/admin/capacities/refreshables?$filter=averageDuration gt 1200"
refreshables_url = "https://api.powerbi.com/v1.0/myorg/admin/capacities/refreshables?$filter=refreshesPerDay eq 15"
However, when I try to filter on a nested element like status, it fails. I must be calling it incorrectly, but I am not sure how.
What am I missing here?
Here is what the response looks like:
{
"value": [
{
"id": "cfafbeb1-8037-4d0c-896e-a46fb27ff229",
"name": "SalesMarketing",
"kind": "Dataset",
"startTime": "2017-06-13T09:25:43.153Z",
"endTime": "2017-06-19T11:22:32.445Z",
"refreshCount": 22,
"refreshFailures": 0,
"averageDuration": 289.3814,
"medianDuration": 268.6245,
"refreshesPerDay": 11,
"lastRefresh": {
"refreshType": "ViaApi",
"startTime": "2017-06-13T09:25:43.153Z",
"endTime": "2017-06-13T09:31:43.153Z",
"status": "Completed",
"requestId": "9399bb89-25d1-44f8-8576-136d7e9014b1"
}
}
]
}
Here is what I am expecting (only the entries whose status is "Failed", instead of the "Completed" entries above):
{
"value": [
{
"id": "ewrffbeb1-6337-460c-326e-a46fb27hh234",
"name": "SalesMarketing",
"kind": "Dataset",
"startTime": "2017-06-13T09:25:43.153Z",
"endTime": "2017-06-19T11:22:32.445Z",
"refreshCount": 2,
"refreshFailures": 0,
"averageDuration": 189.3814,
"medianDuration": 168.6245,
"refreshesPerDay": 1,
"lastRefresh": {
"refreshType": "ViaApi",
"startTime": "2017-04-13T09:25:43.153Z",
"endTime": "2017-10-13T09:31:43.153Z",
"status": "Failed",
"requestId": "43643bb89-25d1-77f8-8543-dsgfewre9034r3223"
}
}
]
}
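
The JSONDecodeError by itself only says that the response body was not JSON (often an empty body or an error page), not why the request was rejected. As a debugging aid, a sketch like the one below surfaces the status code and raw body before decoding. The helper name decode_or_explain is made up; with requests it would be called with r.status_code and r.text. Whether the endpoint accepts a filter on lastRefresh/status is not something this sketch can answer.

```python
import json


def decode_or_explain(status_code, body_text):
    """Parse a response body as JSON, or raise with the status and raw body."""
    if status_code != 200 or not body_text.strip():
        # Show a short prefix of the body so the server's error message is visible
        raise RuntimeError(f"HTTP {status_code}, body: {body_text[:200]!r}")
    return json.loads(body_text)


# With requests this would be: decode_or_explain(r.status_code, r.text)
ok = decode_or_explain(200, '{"value": []}')

try:
    decode_or_explain(400, "")
except RuntimeError as exc:
    message = str(exc)
    print(message)
```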

Parsing list of dictionaries in a dictionary to retrieve a specific key's value from each dictionary

I got a JSON response and converted it to a python dictionary using json.loads(). So the dictionary looks like this:
{u'body': u'[{"id":"1","entity":"zone","status":"PROCESSING","url":null,"createdOn":"2019-10-11T05:49:11Z"},{"id":"2","entity":"floor","status":"FAILED","url":null,"createdOn":"2019-10-11T05:49:15Z"},{"id":"3","entityType":"apartment","status":"SUCCESS","url":null,"createdOn":"2019-10-11T05:49:18Z"}]',u'isBase64Encoded': False, u'statusCode': 200}
I named this as testStatusList. I want to retrieve the value of "status" key of every dictionary inside "body". I was able to retrieve the "body" by giving body = testStatusList['body']. Now, the dictionary looks like:
[
{
"id": "1",
"entityType": "zone",
"status": "PROCESSING",
"url": null,
"createdOn": "2019-03-07T12:47:10Z"
},
{
"id": "2",
"entityType": "floor",
"status": "FAILED",
"url": null,
"createdOn": "2019-08-19T16:46:13Z"
},
{
"id": "3",
"entityType": "apartment",
"status": "SUCCESS",
"url": null,
"createdOn": "2019-08-19T16:46:13Z"
}
]
I tried out this solution (Parsing a dictionary to retrieve a key in Python 3.6):
testStatusList= json.loads(status_response['Payload'].read())
body = testStatusList['body']
status = []
for b in body:
    for k, v in b.items():
        if k == 'status':
            status.append(v)
but I keep getting AttributeError: 'unicode' object has no attribute 'items'. Is there a different method to get items for unicode objects?
So I basically want to retrieve all the statuses i.e., PROCESSING, FAILED AND SUCCESS so that I can put an 'if' condition to display appropriate messages when something failed for that particular "id". I am very unsure about my approach as I am totally new to Python. Any help would be much appreciated thanks!

body is still a (unicode) string in your top blob. Use json.loads again on that string:
body = """[
{
"id": "1",
"entityType": "zone",
"status": "PROCESSING",
"url": null,
"createdOn": "2019-03-07T12:47:10Z"
},
{
"id": "2",
"entityType": "floor",
"status": "FAILED",
"url": null,
"createdOn": "2019-08-19T16:46:13Z"
},
{
"id": "3",
"entityType": "apartment",
"status": "SUCCESS",
"url": null,
"createdOn": "2019-08-19T16:46:13Z"
}
]"""
import json
body = json.loads(body)
status = []
for b in body:
    for k, v in b.items():
        if k == 'status':
            status.append(v)
print(status)
Result:
['PROCESSING', 'FAILED', 'SUCCESS']
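
Since the goal in the question is to react to failures per "id", the inner loop can be replaced with direct key access. The following is a sketch with a shortened copy of the payload; the variable names outer, records, status_by_id and failed_ids are illustrative only.

```python
import json

# Shortened copy of the outer dictionary from the question:
# "body" is itself a JSON-encoded string and needs a second json.loads
outer = {
    "body": '[{"id": "1", "status": "PROCESSING"},'
            ' {"id": "2", "status": "FAILED"},'
            ' {"id": "3", "status": "SUCCESS"}]',
    "isBase64Encoded": False,
    "statusCode": 200,
}

records = json.loads(outer["body"])

# Map each id to its status, then pick out the failed ones
status_by_id = {record["id"]: record["status"] for record in records}
failed_ids = [rid for rid, status in status_by_id.items() if status == "FAILED"]

for rid in failed_ids:
    print(f"Request {rid} failed")
```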

JSON or Python dict / list decoding problem

I have been using the Python script below to try to retrieve and extract some data from Flightradar24. It appears to return the data in JSON format, and json.dumps prints it out in full, but when I attempt to select the data I want (the status text in this case) using get, it gives the following error:
'list' object has no attribute 'get'
Is the Data in JSON or a List ? I'm totally confused now.
I'm fairly new to working with data in JSON format, any help would be appreciated!
Script:
import flightradar24
import json
flight_id = 'BA458'
fr = flightradar24.Api()
flight = fr.get_flight(flight_id)
y = flight.get("data")
print (json.dumps(flight, indent=4))
X= (flight.get('result').get('response').get('data').get('status').get('text'))
print (X)
Sample of output data:
{
"result": {
"request": {
"callback": null,
"device": null,
"fetchBy": "flight",
"filterBy": null,
"format": "json",
"limit": 25,
"page": 1,
"pk": null,
"query": "BA458",
"timestamp": null,
"token": null
},
"response": {
"item": {
"current": 16,
"total": null,
"limit": 25
},
"page": {
"current": 1,
"total": null
},
"timestamp": 1546241512,
"data": [
{
"identification": {
"id": null,
"row": 4852575431,
"number": {
"default": "BA458",
"alternative": null
},
"callsign": null,
"codeshare": null
},
"status": {
"live": false,
"text": "Scheduled",
"icon": null,
"estimated": null,
"ambiguous": false,
"generic": {
"status": {
"text": "scheduled",
"type": "departure",
"color": "gray",
"diverted": null
},

You can use print(type(variable_name)) to see what type it is. The .get(key[, default]) method is not supported on lists; it is supported on dicts.
X = (flight.get('result').get('response').get('data').get('status').get('text'))
# ^^^^^^^^ does not work, data is a list of dicts
as data is a list of dicts:
"data": [ # <<<<<< this is a list
{
"identification": {
"id": null,
"row": 4852575431,
"number": {
"default": "BA458",
"alternative": null
},
"callsign": null,
"codeshare": null
},
"status": {
This should work:
X = flight.get('result').get('response').get('data')[0].get('status').get('text')

The issue, as pointed out by @PatrickArtner, is that your data is actually a list rather than a dictionary. As an aside, you may find your code more readable if you use a helper function to apply dict.get repeatedly on a nested dictionary:
from functools import reduce

def ng(dataDict, mapList):
    """Nested Getter: Iterate nested dictionary"""
    return reduce(dict.get, mapList, dataDict)

X = ng(ng(flight, ['result', 'response', 'data'])[0], ['status', 'text'])
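
A variation on the same idea, sketched below, uses operator.getitem instead of dict.get so that one path can mix dictionary keys and list indices. The helper name get_path and the trimmed flight structure are assumptions for illustration. Unlike dict.get, this raises KeyError or IndexError when a step is missing.

```python
from functools import reduce
from operator import getitem

# Trimmed-down stand-in for the structure returned in the question
flight = {
    "result": {
        "response": {
            "data": [
                {"status": {"live": False, "text": "Scheduled"}},
            ]
        }
    }
}


def get_path(obj, path):
    """Follow a sequence of dict keys and list indices through nested data."""
    # getitem(container, key) is container[key], which works for dicts and lists
    return reduce(getitem, path, obj)


text = get_path(flight, ["result", "response", "data", 0, "status", "text"])
print(text)
```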

Getting Deeper Level JSON Values in Python

I have a Python script that makes an API call to retrieve data from Zendesk. (Using Python 3.x) The JSON object has a structure like this:
{
"id": 35436,
"url": "https://company.zendesk.com/api/v2/tickets/35436.json",
"external_id": "ahg35h3jh",
"created_at": "2009-07-20T22:55:29Z",
"updated_at": "2011-05-05T10:38:52Z",
"type": "incident",
"subject": "Help, my printer is on fire!",
"raw_subject": "{{dc.printer_on_fire}}",
"description": "The fire is very colorful.",
"priority": "high",
"status": "open",
"recipient": "support@company.com",
"requester_id": 20978392,
"submitter_id": 76872,
"assignee_id": 235323,
"organization_id": 509974,
"group_id": 98738,
"collaborator_ids": [35334, 234],
"forum_topic_id": 72648221,
"problem_id": 9873764,
"has_incidents": false,
"due_at": null,
"tags": ["enterprise", "other_tag"],
"via": {
"channel": "web"
},
"custom_fields": [
{
"id": 27642,
"value": "745"
},
{
"id": 27648,
"value": "yes"
}
],
"satisfaction_rating": {
"id": 1234,
"score": "good",
"comment": "Great support!"
},
"sharing_agreement_ids": [84432]
}
Where I am running into issues is in the "custom_fields" section specifically. I have a particular custom field inside of each ticket I need the value for, and I only want that particular value.
To spare you too many specifics of the Python code, I am reading through each value below for each ticket and adding it to an output variable before writing that output variable to a .csv. Here is the particular place the breakage is occurring:
output += str(ticket['custom_fields'][id:23825198]).replace(',', '')+','
All the replace nonsense is to make sure that since it is going into a comma delimited file, any commas inside of the values are removed. Anyway, here is the error I am getting:
output += str(ticket['custom_fields'][id:int(23825198)]).replace(',', '')+','
TypeError: slice indices must be integers or None or have an __index__ method
As you can see I have tried a couple different variations of this to try and resolve the issue, and have yet to find a fix. I could use some help!
Thanks...

Are you using json.loads()? If so you can then get the keys, and do an if statement against the keys. An example on how to get the keys and their respective values is shown below.
import json
some_json = """{
"id": 35436,
"url": "https://company.zendesk.com/api/v2/tickets/35436.json",
"external_id": "ahg35h3jh",
"created_at": "2009-07-20T22:55:29Z",
"updated_at": "2011-05-05T10:38:52Z",
"type": "incident",
"subject": "Help, my printer is on fire!",
"raw_subject": "{{dc.printer_on_fire}}",
"description": "The fire is very colorful.",
"priority": "high",
"status": "open",
"recipient": "support@company.com",
"requester_id": 20978392,
"submitter_id": 76872,
"assignee_id": 235323,
"organization_id": 509974,
"group_id": 98738,
"collaborator_ids": [35334, 234],
"forum_topic_id": 72648221,
"problem_id": 9873764,
"has_incidents": false,
"due_at": null,
"tags": ["enterprise", "other_tag"],
"via": {
"channel": "web"
},
"custom_fields": [
{
"id": 27642,
"value": "745"
},
{
"id": 27648,
"value": "yes"
}
],
"satisfaction_rating": {
"id": 1234,
"score": "good",
"comment": "Great support!"
},
"sharing_agreement_ids": [84432]
}"""
# load the json object
zenJSONObj = json.loads(some_json)
# Shows a list of all custom fields
print("All the custom field data")
print(zenJSONObj['custom_fields'])
print("----")
# Tells you all the keys in the custom_fields
print("Show keys and the values")
for custom_field in zenJSONObj['custom_fields']:
    print("----")
    for key in custom_field.keys():
        print("key:", key, " value: ", custom_field[key])
You can then modify the JSON object by doing something like
print(zenJSONObj['custom_fields'][0])
zenJSONObj['custom_fields'][0]['value'] = 'something new'
print(zenJSONObj['custom_fields'][0])
Then re-encode it using the following:
newJSONObject = json.dumps(zenJSONObj, sort_keys=True, indent=4)
I hope this is of some help.
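
The TypeError in the question comes from [id:23825198], which Python reads as a slice whose start is the built-in function id. Since custom_fields is a list of dictionaries, the field has to be found by searching for its "id". Below is a minimal sketch, assuming a trimmed ticket; the helper name custom_field_value is made up. It also uses the csv module, which quotes embedded commas itself, so the replace(',', '') workaround should not be needed.

```python
import csv
import io

# Trimmed ticket with the same custom_fields shape as in the question
ticket = {
    "id": 35436,
    "custom_fields": [
        {"id": 27642, "value": "745"},
        {"id": 27648, "value": "yes"},
    ],
}


def custom_field_value(ticket, field_id, default=None):
    """Return the value of the custom field with the given id, or default."""
    for field in ticket["custom_fields"]:
        if field["id"] == field_id:
            return field["value"]
    return default


# Write to an in-memory buffer here; a real script would open a .csv file
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([ticket["id"], custom_field_value(ticket, 27648)])
print(buffer.getvalue())
```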
