Want to remove square brackets from a JSON file with a Python script

I have a JSON file, and I want to remove everything inside the square brackets with a Python script.
My JSON file is this:
{
"Employees": [
{
"userId": "krish",
"jobTitle": "Developer",
"firstName": "Krish",
"lastName": "Lee",
"employeeCode": "E1",
"region": "CA",
"phoneNumber": "123456",
"emailAddress": "krish.lee#learningcontainer.com"
},
{
"userId": "devid",
"jobTitle": "Developer",
"firstName": "Devid",
"lastName": "Rome",
"employeeCode": "E2",
"region": "CA",
"phoneNumber": "1111111",
"emailAddress": "devid.rome#learningcontainer.com"
},
{
"userId": "tin",
"jobTitle": "Program Directory",
"firstName": "tin",
"lastName": "jonson",
"employeeCode": "E3",
"region": "CA",
"phoneNumber": "2222222",
"emailAddress": "tin.jonson#learningcontainer.com"
}
]
}
My script is this:
import json
import re
with open('data.json') as f:
    data = json.load(f)
for item in data:
    re.sub(" *\[.*\] *", " ", item)
with open('new_data.json', 'w') as f:
    json.dump(item, f)
I expect this:
{
"Employees":
}
but I receive this:
"Employees"
Why does it drop the braces, and how can I solve this problem?

JSON is a serialization of a data structure and has no canonical format. This means that doing any kind of text matching or regular expressions on it is a very bad idea and just asking for trouble.
The proper way to do it is to use a JSON parser to convert the text into objects, then synthesize whatever output you want from the object data.
In your case, the parsed object will be a dictionary with a single key which you can obtain for example like this:
print("{{\n \"{0}\":\n}}".format(list(json.load(f).keys())[0]))
Result:
{
"Employees":
}
For what it's worth, this is not valid JSON, so I'm not sure why you need it.
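As for why the original script writes only "Employees": iterating over a dict yields its keys, re.sub returns a new string that the loop throws away, and json.dump(item, f) then writes the last key it saw. If valid JSON is acceptable as output, a minimal sketch could look like the following. It assumes the goal is to keep each top-level key and drop the list contents; the helper name empty_lists is made up for illustration, and the inline string stands in for data.json.

```python
import json

def empty_lists(obj):
    """Return a copy of a parsed JSON object with every top-level list emptied."""
    return {key: [] if isinstance(value, list) else value
            for key, value in obj.items()}

# Inline sample standing in for data.json (shortened from the question).
raw = '{"Employees": [{"userId": "krish"}, {"userId": "devid"}]}'
data = json.loads(raw)

# Serialize the modified object instead of editing the JSON text.
result = json.dumps(empty_lists(data), indent=2)
print(result)
```

This prints an object with "Employees" mapped to an empty list, which any JSON parser can read back.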

Related

How do I output specific data from a json response?

I am fairly new to using APIs in python and I am trying to create a system that outputs data from previous motorsport races. I have sent requests to an API, but I am struggling to get it to just output one specific piece of data (eg. time, location). I get this when I just print the raw JSON data sent.
{
"MRData": {
"RaceTable": {
"Races": [
{
"Circuit": {
"Location": {
"country": "Spain",
"lat": "41.57",
"locality": "Montmeló",
"long": "2.26111"
},
"circuitId": "catalunya",
"circuitName": "Circuit de Barcelona-Catalunya",
"url": "http://en.wikipedia.org/wiki/Circuit_de_Barcelona-Catalunya"
},
"date": "2020-08-16",
"raceName": "Spanish Grand Prix",
"round": "6",
"season": "2020",
"time": "13:10:00Z",
"url": "https://en.wikipedia.org/wiki/2020_Spanish_Grand_Prix"
}
],
"round": "6",
"season": "2020"
},
"limit": "30",
"offset": "0",
"series": "f1",
"total": "1",
"url": "http://ergast.com/api/f1/2020/6.json",
"xmlns": "http://ergast.com/mrd/1.4"
}
}
Just to get to grips with APIs I am simply trying to output a simple piece of data of a specific race, and once I can do that, I'll be able to scale it up and output all sorts of data. I'd assumed it would just be as simple as typing print(data['time']) (as seen below) but I get an error message saying this:
KeyError: 'time'
My source code:
import requests
response = requests.get("http://ergast.com/api/f1/2020/6.json")
data = response.json()
print (data["time"])
Any help is appreciated!
Like this...
import json
data = """{
"MRData":{
"xmlns":"http://ergast.com/mrd/1.4",
"series":"f1",
"url":"http://ergast.com/api/f1/2020/6.json",
"limit":"30",
"offset":"0",
"total":"1",
"RaceTable":{
"season":"2020",
"round":"6",
"Races":[
{
"season":"2020",
"round":"6",
"url":"https://en.wikipedia.org/wiki/2020_Spanish_Grand_Prix",
"raceName":"Spanish Grand Prix",
"Circuit":{
"circuitId":"catalunya",
"url":"http://en.wikipedia.org/wiki/Circuit_de_Barcelona-Catalunya",
"circuitName":"Circuit de Barcelona-Catalunya",
"Location":{
"lat":"41.57",
"long":"2.26111",
"locality":"Montmeló",
"country":"Spain"
}
},
"date":"2020-08-16",
"time":"13:10:00Z"
}
]
}
}
}"""
jsonData = json.loads(data)
Races is an array; in this case there is only one race, so you would designate it as ["Races"][0]:
print(jsonData["MRData"]["RaceTable"]["Races"][0]["time"])
data['time'] would work if you had a flat dictionary, but you have a nested dict/list structure, so:
data["MRData"]["RaceTable"]["Races"][0]["time"]
data["MRData"] returns another dict, which has a key "RaceTable". The value of this key is again a dictionary which has a key "Races". The value of this is a list of races, of which you only have one. The races are again dicts which have the key time.
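To make the path explicit, here is a small sketch that walks the same structure one level at a time. It uses a trimmed inline copy of the response rather than calling the API, and the helper name dig is made up for this example.

```python
import json

# Trimmed copy of the API response from the question.
raw = """{"MRData": {"RaceTable": {"season": "2020", "Races": [
  {"raceName": "Spanish Grand Prix", "date": "2020-08-16", "time": "13:10:00Z",
   "Circuit": {"Location": {"country": "Spain"}}}
]}}}"""
data = json.loads(raw)

def dig(obj, *path):
    """Follow a sequence of dict keys / list indexes, one level per step."""
    for step in path:
        obj = obj[step]
    return obj

race = dig(data, "MRData", "RaceTable", "Races", 0)
print(race["time"])                                  # 13:10:00Z
print(dig(race, "Circuit", "Location", "country"))   # Spain

# Races is a list, so looping handles responses with more than one race.
for r in data["MRData"]["RaceTable"]["Races"]:
    print(r["raceName"], r["date"], r["time"])
```

A missing key still raises KeyError here, which is usually what you want while exploring an unfamiliar response.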

How to convert a complicated (nested) json to a pandas dataframe?

I have a very weird json file with a lot of nesting in it. I need to convert it into a Pandas dataframe.
The JSON looks something like this:
{
"data": {
"page1": {
"last_name": "suraj",
"first_name": "singh",
"dob": "2020-06-02",
"gender": "Male",
"address1": "asdf",
"city": "asdf",
"state": "ID",
"Zip": "34324",
"phone": "2343243242",
"emailaddress": "suraj.singh#fugetroncorp.com",
"ethnicity": "adsf",
"url": "iVBORw0KGgoAAAANSUhEUgAAAVIAAABkCAYAAADUgbjrAAAN... (base64-encoded PNG data, truncated)",
"meds": [
[
"asdf"
]
],
"guardian": false,
"guardianName": "N/A",
"optout": false,
"currentDate": "06-30-2020",
"values": [
{
"value": "asdf"
}
]
}
How can I create a properly structured DataFrame from this so that I can export it to a CSV file for easier inspection?
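No answer is recorded for this question. One common approach is to flatten the nested object into a single row whose column names are the dotted paths to each value; pandas provides pandas.json_normalize for the nested-dict part of that job. The sketch below shows the same idea with only the standard library, on a shortened stand-in for the data above (the base64 url field is left out). The helper name flatten and the choice to join keys with dots are assumptions made for illustration.

```python
import csv
import io
import json

def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into one dict keyed by dotted paths."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}
    row = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        row.update(flatten(value, path))
    return row

raw = """{"data": {"page1": {"last_name": "suraj", "first_name": "singh",
  "meds": [["asdf"]], "guardian": false, "values": [{"value": "asdf"}]}}}"""
row = flatten(json.loads(raw))

# Write the single flattened row as CSV (to a string here; a file works the same).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

With pandas installed, pandas.DataFrame([row]) should accept the same flattened dict, and its to_csv method writes the file.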

Retrieve data from json file using python

I'm new to Python. I'm running Python on Azure Databricks. I have a .json file; the important fields are shown here:
{
"school": [
{
"schoolid": "mr1",
"board": "cbse",
"principal": "akseal",
"schoolName": "dps",
"schoolCategory": "UNKNOWN",
"schoolType": "UNKNOWN",
"city": "mumbai",
"sixhour": true,
"weighting": 3,
"paymentMethods": [
"cash",
"cheque"
],
"contactDetails": [
{
"name": "picsa",
"type": "studentactivities",
"information": [
{
"type": "PHONE",
"detail": "+917597980"
}
]
}
],
"addressLocations": [
{
"locationType": "School",
"address": {
"countryCode": "IN",
"city": "Mumbai",
"zipCode": "400061",
"street": "Madh",
"buildingNumber": "80"
},
"Location": {
"latitude": 49.313885,
"longitude": 72.877426
},
I need to create a data frame with schoolName as one column and latitude and longitude as the other two columns. Can you please suggest how to do that?
You can use json.load(). Here's an example:
import json
with open('path_to_file/file.json') as f:
    data = json.load(f)
print(data)
Use this:
import json  # built-in
with open("filename.json", 'r') as jsonFile:
    Data = json.load(jsonFile)
Data is now a dictionary of the file's contents. For example:
for i in Data:
    # loops through the keys
    print(Data[i])  # prints the value
For more on JSON:
https://docs.python.org/3/library/json.html
and python dictionaries:
https://www.programiz.com/python-programming/dictionary#:~:text=Python%20dictionary%20is%20an%20unordered,when%20the%20key%20is%20known.
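Once the file is loaded, the three columns can be pulled out by walking school, then addressLocations, then Location. This is only a sketch: the sample in the question is truncated, so the inline data below is a shortened, closed-off version of it, and producing one row per address location is an assumption.

```python
import json

# Shortened stand-in for the file in the question.
raw = """{"school": [{"schoolName": "dps", "city": "mumbai",
  "addressLocations": [{"locationType": "School",
    "Location": {"latitude": 49.313885, "longitude": 72.877426}}]}]}"""
data = json.loads(raw)

rows = []
for school in data["school"]:
    # .get(..., []) tolerates schools that have no addressLocations entry.
    for place in school.get("addressLocations", []):
        location = place.get("Location", {})
        rows.append({
            "schoolName": school["schoolName"],
            "latitude": location.get("latitude"),
            "longitude": location.get("longitude"),
        })

print(rows)
```

rows is a plain list of dicts, a shape that pandas.DataFrame(rows) accepts directly if pandas is available in the notebook.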

Passing a String Variable into Pymongo Query

I'm trying to scrape a database of information, but was having trouble querying. Here's the basic database setup in MongoDB:
{
"ID": 346,
"data": [
{
"number": "23",
"name": "Winnie"
},
{
"number": "12",
"name": "Finn"
},
{
"number": "99",
"name": "Todd"
}
]
}
{
"ID": 346,
"data": [
{
"number": "12",
"name": "Ram"
},
{
"number": "34",
"name": "Greg"
},
{
"number": "155",
"name": "Arnie"
}
]
}
The relevant Python code is below:
import pymongo
import json
import io
import sys
from bson.json_util import dumps
from pymongo import MongoClient
stringArr = ['"23"', '"12"', '"155"']
for x in range(0, len(stringArr)):
    print(collection.find({"data.number" : stringArr[x]}).count())
When I enter collection.find({"data.number" : "23"}).count() I get the correct number of entries that have "23" as the number in data, so I presume my syntax for find in Python is messed up, likely because the variable is a string, but I'm fairly inexperienced with MongoDB, let alone PyMongo. Any suggestions would be greatly appreciated!
The $elemMatch operator matches documents in which at least one element of an array field satisfies the given conditions.
Based on the description in the question, try running the following raw query in the MongoDB shell:
db.collection.find({
    data: {
        $elemMatch: {
            number: {
                $in: ["23", "12", "155"]
            }
        }
    }
})
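On the Python side, the likely culprit is that each element of stringArr carries its own double quotes: '"23"' is a four-character string that includes the quote marks, so it never equals the stored value "23". A sketch of building the same filter from a plain list follows. The helper name build_number_filter is made up, and collection in the commented lines stands for the asker's PyMongo collection.

```python
def build_number_filter(numbers):
    """Build a MongoDB filter matching documents whose data array contains
    an element whose number equals any of the given strings."""
    # Strip stray quote characters, e.g. '"23"' becomes '23'.
    cleaned = [n.strip('"') for n in numbers]
    return {"data": {"$elemMatch": {"number": {"$in": cleaned}}}}

string_arr = ['"23"', '"12"', '"155"']
query = build_number_filter(string_arr)
print(query)

# With a live connection (not run here):
# for number in ["23", "12", "155"]:
#     print(collection.count_documents({"data.number": number}))
# print(collection.count_documents(query))
```

count_documents is used in the commented lines because Cursor.count() is deprecated in recent PyMongo 3.x releases and removed in PyMongo 4.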

Getting Deeper Level JSON Values in Python

I have a Python script that makes an API call to retrieve data from Zendesk (using Python 3.x). The JSON object has a structure like this:
{
"id": 35436,
"url": "https://company.zendesk.com/api/v2/tickets/35436.json",
"external_id": "ahg35h3jh",
"created_at": "2009-07-20T22:55:29Z",
"updated_at": "2011-05-05T10:38:52Z",
"type": "incident",
"subject": "Help, my printer is on fire!",
"raw_subject": "{{dc.printer_on_fire}}",
"description": "The fire is very colorful.",
"priority": "high",
"status": "open",
"recipient": "support#company.com",
"requester_id": 20978392,
"submitter_id": 76872,
"assignee_id": 235323,
"organization_id": 509974,
"group_id": 98738,
"collaborator_ids": [35334, 234],
"forum_topic_id": 72648221,
"problem_id": 9873764,
"has_incidents": false,
"due_at": null,
"tags": ["enterprise", "other_tag"],
"via": {
"channel": "web"
},
"custom_fields": [
{
"id": 27642,
"value": "745"
},
{
"id": 27648,
"value": "yes"
}
],
"satisfaction_rating": {
"id": 1234,
"score": "good",
"comment": "Great support!"
},
"sharing_agreement_ids": [84432]
}
Where I am running into issues is in the "custom_fields" section specifically. I have a particular custom field inside of each ticket I need the value for, and I only want that particular value.
To spare you too many specifics of the Python code, I am reading through each value below for each ticket and adding it to an output variable before writing that output variable to a .csv. Here is the particular place the breakage is occurring:
output += str(ticket['custom_fields'][id:23825198]).replace(',', '')+','
All the replace nonsense is to make sure that since it is going into a comma delimited file, any commas inside of the values are removed. Anyway, here is the error I am getting:
output += str(ticket['custom_fields'][id:int(23825198)]).replace(',', '')+','
TypeError: slice indices must be integers or None or have an __index__ method
As you can see I have tried a couple different variations of this to try and resolve the issue, and have yet to find a fix. I could use some help!
Thanks...
Are you using json.loads()? If so you can then get the keys, and do an if statement against the keys. An example on how to get the keys and their respective values is shown below.
import json
some_json = """{
"id": 35436,
"url": "https://company.zendesk.com/api/v2/tickets/35436.json",
"external_id": "ahg35h3jh",
"created_at": "2009-07-20T22:55:29Z",
"updated_at": "2011-05-05T10:38:52Z",
"type": "incident",
"subject": "Help, my printer is on fire!",
"raw_subject": "{{dc.printer_on_fire}}",
"description": "The fire is very colorful.",
"priority": "high",
"status": "open",
"recipient": "support#company.com",
"requester_id": 20978392,
"submitter_id": 76872,
"assignee_id": 235323,
"organization_id": 509974,
"group_id": 98738,
"collaborator_ids": [35334, 234],
"forum_topic_id": 72648221,
"problem_id": 9873764,
"has_incidents": false,
"due_at": null,
"tags": ["enterprise", "other_tag"],
"via": {
"channel": "web"
},
"custom_fields": [
{
"id": 27642,
"value": "745"
},
{
"id": 27648,
"value": "yes"
}
],
"satisfaction_rating": {
"id": 1234,
"score": "good",
"comment": "Great support!"
},
"sharing_agreement_ids": [84432]
}"""
# load the json object
zenJSONObj = json.loads(some_json)
# Shows a list of all custom fields
print("All the custom field data")
print(zenJSONObj['custom_fields'])
print("----")
# Tells you all the keys in the custom_fields
print("How keys and the values")
for custom_field in zenJSONObj['custom_fields']:
    print("----")
    for key in custom_field.keys():
        print("key:", key, " value: ", custom_field[key])
You can then modify the JSON object by doing something like
print(zenJSONObj['custom_fields'][0])
zenJSONObj['custom_fields'][0]['value'] = 'something new'
print(zenJSONObj['custom_fields'][0])
Then re-encode it using the following:
newJSONObject = json.dumps(zenJSONObj, sort_keys=True, indent=4)
I hope this is of some help.
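The TypeError in the question comes from ticket['custom_fields'][id:23825198]: square brackets with a colon ask for a list slice, and id there is Python's built-in function rather than a field name. Since custom_fields is a list of dicts with id and value keys, the entry has to be found by searching. A minimal sketch, with a made-up helper name custom_field_value and a shortened ticket:

```python
def custom_field_value(ticket, field_id, default=""):
    """Return the value of the custom field with the given id, or default."""
    for field in ticket.get("custom_fields", []):
        if field.get("id") == field_id:
            return field.get("value", default)
    return default

# Shortened ticket based on the sample in the question.
ticket = {
    "id": 35436,
    "custom_fields": [
        {"id": 27642, "value": "745"},
        {"id": 27648, "value": "yes"},
    ],
}

print(custom_field_value(ticket, 27648))      # yes
print(custom_field_value(ticket, 23825198))   # empty string: this id is not in the sample
```

For the CSV step, the standard csv module quotes fields that contain commas, so writing rows with csv.writer would avoid the need to strip commas by hand.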
