I have a txt file in the format shown below, and the key strings are not in quotes. How can I convert it into JSON using Python?
name {
first_name: "random"
}
addresses {
location {
locality: "India"
street_address: "xyz"
postal_code: "300092"
full_address: "street 1 , abc,India"
}
}
projects {
url: "www.githib.com"
}
There's no simple way in the standard library to convert that data format to JSON, so we need to write a parser. However, since the format is fairly simple, that's not hard to do. We can use the standard csv module to read the data: csv.reader handles the details of splitting on spaces and parsing quoted strings correctly. A quoted string is treated as a single token; tokens consisting of a single word may be quoted, but they don't need to be.
The csv.reader normally gets its data from an open file, but it's quite versatile, and will also read its data from a list of strings. This is convenient while testing since we can embed our input data into the script.
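To see what csv.reader produces for this format, you can feed it a short list of lines. This only illustrates the tokenizing step, using the same reader settings as the full script:

```python
import csv

# Two sample lines in the input format: a data line and a subobject opener
lines = ['full_address: "street 1 , abc,India"', 'location {']

# delimiter=' ' splits on spaces; skipinitialspace=True ignores extra spaces
# after a delimiter, and the double-quoted value is kept as one token
for row in csv.reader(lines, delimiter=' ', skipinitialspace=True):
    print(row)

# Output:
# ['full_address:', 'street 1 , abc,India']
# ['location', '{']
```

Note that the key keeps its trailing colon, so the parser has to strip it if you don't want it in the JSON keys.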
We parse the data into a nested dictionary. A simple way to keep track of the nesting is to use a stack, and we can use a plain list as our stack.
The code below assumes that input lines can be one of three forms:
Plain data. The line consists of a key-value pair, separated by at least one space.
A new subobject. The line starts with a key and ends in an open brace {.
The end of the current subobject. The line contains a single close brace }
import csv
import json

raw = '''\
name {
first_name: "random"
}
addresses {
location {
locality: "India"
street_address: "xyz"
postal_code: "300092"
full_address: "street 1 , abc,India"
}
}
projects {
url: "www.githib.com"
}
'''.splitlines()

# A stack to hold the parsed objects; the bottom item is the top-level dict
stack = [{}]

# Strip each line first so that indented input is handled too
lines = (line.strip() for line in raw)
reader = csv.reader(lines, delimiter=' ', skipinitialspace=True)
for row in reader:
    if not row:
        # Skip blank lines
        continue
    # Drop the trailing colon from keys such as "first_name:"
    key = row[0].rstrip(':')
    if key == '}':
        # The end of the current object
        stack.pop()
        continue
    val = row[-1]
    if val == '{':
        # A new subobject
        stack[-1][key] = d = {}
        stack.append(d)
    else:
        # A line of plain data
        stack[-1][key] = val

# Convert to JSON
out = json.dumps(stack[0], indent=4)
print(out)

Output:
{
    "name": {
        "first_name": "random"
    },
    "addresses": {
        "location": {
            "locality": "India",
            "street_address": "xyz",
            "postal_code": "300092",
            "full_address": "street 1 , abc,India"
        }
    },
    "projects": {
        "url": "www.githib.com"
    }
}
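In a real program the data would come from a file rather than an embedded string. One way to package the same stack-based parser is as a function that accepts any iterable of lines, such as an open file object. The name parse_braced is just an illustrative choice; the sketch assumes the same three line forms described above, strips the trailing colon from keys, and tolerates indentation and blank lines:

```python
import csv
import io
import json


def parse_braced(lines):
    """Parse the brace-delimited key/value format into nested dicts.

    lines can be any iterable of strings, e.g. an open text file.
    """
    stack = [{}]
    # Strip indentation and newlines before tokenizing
    stripped = (line.strip() for line in lines)
    for row in csv.reader(stripped, delimiter=' ', skipinitialspace=True):
        if not row:
            continue  # blank line
        key = row[0].rstrip(':')
        if key == '}':
            stack.pop()  # end of the current subobject
        elif row[-1] == '{':
            stack[-1][key] = child = {}
            stack.append(child)  # start of a new subobject
        else:
            stack[-1][key] = row[-1]  # plain key/value data
    return stack[0]


# Usage with an in-memory file; with a real file you would write
# with open('data.txt') as f: result = parse_braced(f)
sample = io.StringIO('name {\n  first_name: "random"\n}\n\nprojects {\n  url: "www.githib.com"\n}\n')
result = parse_braced(sample)
print(json.dumps(result))
# {"name": {"first_name": "random"}, "projects": {"url": "www.githib.com"}}
```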
Assuming your data has already been parsed into a dict like this:
{
'addresses': {
'location': {
'full_address': 'street 1 , abc,India',
'locality': 'India',
'postal_code': '300092',
'street_address': 'xyz'
}
},
'name': {
'first_name': 'random'
},
'projects': {
'url': 'www.githib.com'
}
}
Use json.dumps to convert the dict to a JSON string:
In [16]: import json
In [17]: data
Out[17]:
{'addresses': {'location': {'full_address': 'street 1 , abc,India',
'locality': 'India',
'postal_code': '300092',
'street_address': 'xyz'}},
'name': {'first_name': 'random'},
'projects': {'url': 'www.githib.com'}}
In [18]: json.dumps(data)
Out[18]: '{"name": {"first_name": "random"}, "projects": {"url": "www.githib.com"}, "addresses": {"location": {"postal_code": "300092", "full_address": "street 1 , abc,India", "street_address": "xyz", "locality": "India"}}}'
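json.dumps returns a string; if the goal is a .json file, json.dump writes straight to an open file. A small sketch, assuming the dict is named data as above; the filename output.json is only an example:

```python
import json

data = {
    'name': {'first_name': 'random'},
    'addresses': {'location': {'locality': 'India', 'street_address': 'xyz',
                               'postal_code': '300092',
                               'full_address': 'street 1 , abc,India'}},
    'projects': {'url': 'www.githib.com'},
}

# json.dump writes to a file object; indent makes the file human-readable
with open('output.json', 'w') as f:
    json.dump(data, f, indent=4)

# Reading it back gives an equal dict
with open('output.json') as f:
    loaded = json.load(f)
print(loaded == data)  # True
```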
Related
I got this chunk from the response text after calling an API. How do I remove only the object with "id": 23732, along with its other key/value pairs, from the string?
{
"jobs": [
{
"id": 23732,
"status": "done",
"name": "TESTRBZ7664"
},
{
"id": 23730,
"status": "done",
"name": "RBY5434"
}
]
}
Thanks.
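Parse the string into a Python object using json.loads() (or response.json() if you have the requests response object), then filter the list with a list comprehension.
See the following code: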
In [4]: d
Out[4]:
{'jobs': [{'id': 23732, 'status': 'done', 'name': 'TESTRBZ7664'},
{'id': 23730, 'status': 'done', 'name': 'RBY5434'}]}
In [5]: [i for i in d["jobs"] if i["id"] != 23732]
Out[5]: [{'id': 23730, 'status': 'done', 'name': 'RBY5434'}]
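Putting the pieces together: parse the response text, filter out the unwanted job, and (if you need a string again) serialize it back with json.dumps. A minimal sketch, assuming the response body is in a variable called text; with requests you could call response.json() instead of json.loads:

```python
import json

# Assumed: the raw response body as a string
text = '''
{
  "jobs": [
    {"id": 23732, "status": "done", "name": "TESTRBZ7664"},
    {"id": 23730, "status": "done", "name": "RBY5434"}
  ]
}
'''

d = json.loads(text)  # str -> dict

# Keep every job except the one with id 23732
d["jobs"] = [job for job in d["jobs"] if job["id"] != 23732]

new_text = json.dumps(d)  # dict -> str, if a string is needed again
print(new_text)
# {"jobs": [{"id": 23730, "status": "done", "name": "RBY5434"}]}
```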
Assuming the dict you posted is called original_dict, you could build a new dict using a list comprehension:
new_data = {
    # Iterate over the list stored under "jobs", not over the dict itself
    "jobs": [x for x in original_dict["jobs"] if x["id"] != 23732]
}
This doesn't actually remove the entry from your original dict; instead, it creates a new dict that doesn't contain the unwanted entry.
Read more about list comprehensions here: https://www.w3schools.com/python/python_lists_comprehension.asp
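If you do want to modify the original dict rather than build a new one, slice assignment replaces the contents of the existing list in place, so any other references to that list see the change too. A sketch using the same original_dict name:

```python
original_dict = {
    "jobs": [
        {"id": 23732, "status": "done", "name": "TESTRBZ7664"},
        {"id": 23730, "status": "done", "name": "RBY5434"},
    ]
}

jobs = original_dict["jobs"]  # another reference to the same list

# jobs[:] = ... mutates the existing list instead of rebinding a name
jobs[:] = [job for job in jobs if job["id"] != 23732]

print(original_dict["jobs"])
# [{'id': 23730, 'status': 'done', 'name': 'RBY5434'}]
print(original_dict["jobs"] is jobs)  # True
```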
I need some help returning an ID after finding a certain string. My JSON looks something like this:
{
"id": "id1"
"field1": {
"subfield1": {
"subrield2": {
"subfield3": {
"subfield4": [
"string1",
"string2",
"string3"
]
}
}
}
}
"id": "id2"
"field1": {
"subfield1": {
"subrield2": {
"subfield3": {
"subfield4": [
"string4",
"string5",
"string6"
]
}
}
}
}
}
Now, I need to get the ID from a certain string, for example:
For "string5" I need to return "id2"
For "string2" I need to return "id1"
To find these strings I have used the objectpath Python module like this: json_Tree.execute('$..subfield4')
After analyzing a huge number of strings, I need to return the ones that meet my criteria. I have the strings that I need (for example "string3"), but now I have to return their IDs.
Thank you!!
Note: I don't have a lot of experience with coding. I started working on a project in Python a few months ago and have been stuck on this for a while.
Making some assumptions about the actual structure of the data, namely that it looks like this:
[
{
"id": "id1",
"subfield1": {
"subfield2": {
"subfield3": {
"subfield4": [
"string1",
"string2",
"string3"
]
}
}
}
}
// And so on
]
And assuming that each string (string1, string2, etc.) appears under only one id, you can build the mapping like this:
from typing import List

data: List[dict]  # The JSON parsed as a list of dicts (e.g. with json.load)
string_to_id_mapping = {}
for record in data:
    for string in record["subfield1"]["subfield2"]["subfield3"]["subfield4"]:
        string_to_id_mapping[string] = record["id"]

assert string_to_id_mapping["string3"] == "id1"
If a string can appear under multiple ids, the following will catch all of them:
from collections import defaultdict
from typing import List

data: List[dict]  # The JSON parsed as a list of dicts (e.g. with json.load)
string_to_id_mapping = defaultdict(set)
for record in data:
    for string in record["subfield1"]["subfield2"]["subfield3"]["subfield4"]:
        string_to_id_mapping[string].add(record["id"])

assert string_to_id_mapping["string3"] == {"id1"}
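If you'd rather not hard-code the path (the sample in the question has an extra field1 level, so the real path may differ), a recursive walk can collect every value stored under a given key at any depth, similar in spirit to the objectpath query $..subfield4. This is a sketch: the function names are made up for illustration, and it assumes each record is a dict with an "id" key and that the values under the search key are lists of strings:

```python
from collections import defaultdict


def values_under_key(obj, key):
    """Yield every value stored under `key` at any depth of nested dicts/lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            # Keep searching inside the value as well
            yield from values_under_key(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from values_under_key(item, key)


def map_strings_to_ids(records, key="subfield4"):
    """Map each string found under `key` to the set of record ids containing it."""
    mapping = defaultdict(set)
    for record in records:
        for strings in values_under_key(record, key):
            for s in strings:  # assumes the value is a list of strings
                mapping[s].add(record["id"])
    return mapping


data = [
    {"id": "id1", "field1": {"subfield1": {"subfield2": {"subfield3": {
        "subfield4": ["string1", "string2", "string3"]}}}}},
    {"id": "id2", "field1": {"subfield1": {"subfield2": {"subfield3": {
        "subfield4": ["string4", "string5", "string6"]}}}}},
]

mapping = map_strings_to_ids(data)
print(mapping["string5"])  # {'id2'}
print(mapping["string2"])  # {'id1'}
```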
I am trying to get the list of contacts from my WIX website using their API endpoint URL and the requests module in Python. I am totally stuck.
Here's my code so far:
import requests

auth_key = "my auth key"
r = requests.get("https://www.wixapis.com/crm/v1/contacts", headers={"Authorization": auth_key})
print(r.status_code)
dict = r.json()
contacts_list = dict["contacts"]
for i in contacts_list:
    for key in i:
        print(key, ':', i[key])
Here is what I get:
200
id : long id string 1
emails : [{'tag': 'UNTAGGED', 'email': 'sampleemail1#yahoo.com'}]
phones : []
addresses : [{'tag': 'UNTAGGED', 'countryCode': 'US'}]
metadata : {'createdAt': '2020-07-08T22:41:07.135Z', 'updatedAt': '2020-07-08T22:42:19.327Z'}
source : {'sourceType': 'SITE_MEMBERS'}
id : long id string 2
emails : [{'tag': 'UNTAGGED', 'email': 'sampleemail2#yahoo.com'}]
phones : []
addresses : []
metadata : {'createdAt': '2020-07-03T00:51:21.127Z', 'updatedAt': '2020-07-04T03:26:16.370Z'}
source : {'sourceType': 'SITE_MEMBERS'}
Process finished with exit code 0
Each line is a string. I need each row of the CSV to be a new contact (there are two sample contacts), with the keys as the columns. I plan to use the csv module's writerow(Fields), where Fields is a list of strings (the keys), such as Fields = ['id', 'emails', 'phones', 'addresses', 'metadata', 'source'].
All I really need is the emails in a single column of a csv though. Is there a way to maybe just get the email for each contact?
A CSV file with one column is basically just a text file with one item per line, but you can use the csv module to do it if you really want, as shown below.
I commented out the requests calls and used some sample input for testing.
test_data = {
"contacts": [
{
"id": "long id string 1",
"emails": [
{
"tag": "UNTAGGED",
"email": "sampleemail1#yahoo.com"
}
],
"phones": [],
"addresses": [
{
"tag": "UNTAGGED",
"countryCode": "US"
}
],
"metadata": {
"createdAt": "2020-07-08T22:41:07.135Z",
"updatedAt": "2020-07-08T22:42:19.327Z"
},
"source": {
"sourceType": "SITE_MEMBERS"
}
},
{
"id": "long id string 2",
"emails": [
{
"tag": "UNTAGGED",
"email": "sampleemail2#yahoo.com"
}
],
"phones": [],
"addresses": [],
"metadata": {
"createdAt": "2020-07-03T00:51:21.127Z",
"updatedAt": "2020-07-04T03:26:16.370Z"
},
"source": {
"sourceType": "SITE_MEMBERS"
}
}
]
}
import csv
import requests

auth_key = "my auth key"
output_filename = 'whatever.csv'

#r = requests.get("https://www.wixapis.com/crm/v1/contacts", headers={"Authorization": auth_key})
#print(r.status_code)
#json_obj = r.json()
json_obj = test_data  # FOR TESTING PURPOSES

contacts_list = json_obj["contacts"]

with open(output_filename, 'w', newline='') as outp:
    writer = csv.writer(outp)
    writer.writerow(['email'])  # Write csv header.
    for contact in contacts_list:
        if contact['emails']:  # Skip contacts without any email address.
            email = contact['emails'][0]['email']  # Get the first one.
            writer.writerow([email])

print('email csv file written')
Contents of whatever.csv file afterwards:
email
sampleemail1#yahoo.com
sampleemail2#yahoo.com
Update:
As pointed out by @martineau, some of the values are lists (or dicts), so you need to handle those. You could turn them into strings in the for loop, for example with str.join() for lists of strings.
You can write it to CSV with the csv module like this:
import csv
import sys

import requests

auth_key = "my auth key"
r = requests.get("https://www.wixapis.com/crm/v1/contacts", headers={"Authorization": auth_key})
print(r.status_code)
data = r.json()
contacts_list = data["contacts"]

output = csv.writer(sys.stdout)
# Insert the header (the keys of the first contact);
# this assumes every contact has the same keys in the same order
output.writerow(contacts_list[0].keys())
for i in contacts_list:
    output.writerow(i.values())
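Since the writer writes to sys.stdout, you can check the printed output directly; pass an open file instead of sys.stdout to write a .csv file.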
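One way to handle the list and dict values mentioned in the update is csv.DictWriter, serializing any non-scalar value with json.dumps so each contact still fits on one row. The sketch below uses a trimmed version of the sample contacts instead of a live API call; the column list and the flatten helper are assumptions for illustration:

```python
import csv
import io
import json

contacts_list = [
    {"id": "long id string 1",
     "emails": [{"tag": "UNTAGGED", "email": "sampleemail1#yahoo.com"}],
     "phones": [],
     "source": {"sourceType": "SITE_MEMBERS"}},
    {"id": "long id string 2",
     "emails": [{"tag": "UNTAGGED", "email": "sampleemail2#yahoo.com"}],
     "phones": [],
     "source": {"sourceType": "SITE_MEMBERS"}},
]

fields = ["id", "emails", "phones", "source"]  # columns, in order


def flatten(value):
    """Turn lists/dicts into a JSON string; leave other values unchanged."""
    if isinstance(value, (list, dict)):
        return json.dumps(value)
    return value


out = io.StringIO()  # use open('contacts.csv', 'w', newline='') for a real file
# restval fills columns a contact lacks; extrasaction='ignore' drops unlisted keys
writer = csv.DictWriter(out, fieldnames=fields, restval="", extrasaction="ignore")
writer.writeheader()
for contact in contacts_list:
    writer.writerow({k: flatten(v) for k, v in contact.items()})

print(out.getvalue())
```

The csv module quotes the JSON strings and doubles their inner quotes, so the cells can be read back with csv.reader and decoded with json.loads.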
I am trying to read some json with the following format. A simple pd.read_json() returns ValueError: Trailing data. Adding lines=True returns ValueError: Expected object or value. I've tried various combinations of readlines() and load()/loads() so far without success.
Any ideas how I could get this into a dataframe?
{
"content": "kdjfsfkjlffsdkj",
"source": {
"name": "jfkldsjf"
},
"title": "dsldkjfslj",
"url": "vkljfklgjkdlgj"
}
{
"content": "djlskgfdklgjkfgj",
"source": {
"name": "ldfjkdfjs"
},
"title": "lfsjdfklfldsjf",
"url": "lkjlfggdflkjgdlf"
}
The sample you have above isn't valid JSON. To be a single valid JSON document, these objects need to be inside an array ([]) and separated by commas, as follows:
[{
"content": "kdjfsfkjlffsdkj",
"source": {
"name": "jfkldsjf"
},
"title": "dsldkjfslj",
"url": "vkljfklgjkdlgj"
},
{
"content": "djlskgfdklgjkfgj",
"source": {
"name": "ldfjkdfjs"
},
"title": "lfsjdfklfldsjf",
"url": "lkjlfggdflkjgdlf"
}]
I just tried it on my machine; when formatted correctly, it works:
>>> pd.read_json('data.json')
content source title url
0 kdjfsfkjlffsdkj {'name': 'jfkldsjf'} dsldkjfslj vkljfklgjkdlgj
1 djlskgfdklgjkfgj {'name': 'ldfjkdfjs'} lfsjdfklfldsjf lkjlfggdflkjgdlf
Another solution, if you don't want to reformat your files:
Assuming your JSON is in a string called my_json and the objects are separated by blank lines, you could do:
import json
import pandas as pd
splitted = my_json.split('\n\n')
my_list = [json.loads(e) for e in splitted]
df = pd.DataFrame(my_list)
Thanks for the ideas, internet. None of them quite solved the problem the way I needed (I had lots of newline characters in the strings themselves, which meant I couldn't split on them), but they helped point the way. In case anyone has a similar problem, this is what worked for me:
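import json
import pandas as pd

with open('path/to/original.json', 'r') as f:
    data = f.read()

# Split on the closing brace at the end of each top-level object,
# then put the brace back on each chunk
data = data.split("}\n")
data = [d.strip() + "}" for d in data]
# Drop the leftover empty chunk, which is now just "}"
data = [d for d in data if d != "}"]
data = [json.loads(d) for d in data]

with open('path/to/reformatted.json', 'w') as f:
    json.dump(data, f)

df = pd.read_json('path/to/reformatted.json')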
If you can use jq, the solution is simpler:
jq -s '.' path/to/original.json > path/to/reformatted.json
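Another option that doesn't depend on how the objects are separated is json.JSONDecoder.raw_decode, which parses one JSON value from the start of a string and returns the value together with the index where it stopped. Calling it in a loop reads a file of back-to-back objects regardless of the whitespace between them. A sketch; the helper name iter_concatenated_json is made up:

```python
import json


def iter_concatenated_json(text):
    """Yield each JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    text = text.lstrip()
    while text:
        # raw_decode parses one value and returns (value, end_index)
        obj, end = decoder.raw_decode(text)
        yield obj
        # raw_decode does not skip leading whitespace, so strip it ourselves
        text = text[end:].lstrip()


raw = '''{
"content": "line one\\nline two",
"source": {"name": "jfkldsjf"},
"title": "dsldkjfslj",
"url": "vkljfklgjkdlgj"
}
{
"content": "djlskgfdklgjkfgj",
"source": {"name": "ldfjkdfjs"},
"title": "lfsjdfklfldsjf",
"url": "lkjlfggdflkjgdlf"
}
'''

records = list(iter_concatenated_json(raw))
print(len(records))  # 2
```

With a real file you would pass f.read() to iter_concatenated_json, and pd.DataFrame(records) then gives the dataframe.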
I am using REST from a Python script to extract Name and StartTime from a response.
I can get the information, but I can't combine it so that each name and start time are on the same line in the CSV. When I export them to CSV, each value goes on its own line.
There is probably a much better way to extract data from a JSON List.
for item in driverDetails['Query']['Results']:
    for data_item in item['XValues']:
        body.append(data_item)
        for key, value in data_item.items():
            #driver = {}
            #test = {}
            #startTime = {}
            if key == "Name":
                drivers.append(value)
            if key == "StartTime":
                drivers.append(value)
print(drivers)
Code to write to CSV:
with open(logFileName, 'a') as outcsv:
    # configure writer to write standard csv file
    writer = csv.writer(outcsv, delimiter=',', quotechar="'",
                        quoting=csv.QUOTE_MINIMAL, lineterminator='\n', skipinitialspace=True)
    for driver in drivers:
        writer.writerow(driver)
Here is a sample of the response:
"Query": {
"Results": [
{
"XValues": [
{
"ReportScopeStartTime": "2018-06-18T23:00:00Z"
},
{
"ReportScopeEndTime": "2018-06-25T22:59:59Z"
},
{
"ID": "1400"
},
{
"Name": " John Doe"
},
{
"StartTime": "2018-06-19T07:16:10Z"
},
],
},
"XValues": [
{
"ReportScopeStartTime": "2018-06-18T23:00:00Z"
},
{
"ReportScopeEndTime": "2018-06-25T22:59:59Z"
},
{
"ID": "1401"
},
{
"Name": " Jane Smith"
},
{
"StartTime": "2018-06-19T07:16:10Z"
},
],
},
My output in the CSV:
John Doe
2018-06-19T07:16:10Z
Jane Smith
2018-06-19T07:16:10Z
Desired Outcome:
John Doe, 2018-06-19T07:16:10Z
Jane Smith, 2018-06-19T07:16:10Z
Just use normal dictionary access to get the values:
for item in driverDetails['Query']['Results']:
    for data_item in item['XValues']:
        body.append(data_item)
        if "Name" in data_item:
            drivers.append(data_item["Name"])
        if "StartTime" in data_item:
            drivers.append(data_item["StartTime"])
print(drivers)
If you know the items will already have the required fields then you won't even need the in tests.
writer.writerow() expects a sequence of fields. You are calling it with a single string, so it will treat each character as a separate field. You probably want to keep the name and start time together, so extract them as a tuple:
for item in driverDetails['Query']['Results']:
    name, start_time = "", ""
    for data_item in item['XValues']:
        body.append(data_item)
        if "Name" in data_item:
            name = data_item["Name"]
        if "StartTime" in data_item:
            start_time = data_item["StartTime"]
    drivers.append((name, start_time))
print(drivers)
Now, instead of a list of strings, drivers is a list of (name, start_time) tuples, one per result item. If an item has a name but no start time (or vice versa), the missing field is left as an empty string. Your code to write the CSV file should now do the expected thing.
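Here is the tuple approach end to end on a cut-down, valid version of the sample response. The driverDetails dict is typed in by hand here, io.StringIO stands in for the log file, and the names are stripped of their leading space:

```python
import csv
import io

driverDetails = {
    "Query": {
        "Results": [
            {"XValues": [{"ID": "1400"}, {"Name": " John Doe"},
                         {"StartTime": "2018-06-19T07:16:10Z"}]},
            {"XValues": [{"ID": "1401"}, {"Name": " Jane Smith"},
                         {"StartTime": "2018-06-19T07:16:10Z"}]},
        ]
    }
}

drivers = []
for item in driverDetails['Query']['Results']:
    name, start_time = "", ""
    for data_item in item['XValues']:
        if "Name" in data_item:
            name = data_item["Name"].strip()  # drop the leading space
        if "StartTime" in data_item:
            start_time = data_item["StartTime"]
    drivers.append((name, start_time))

outcsv = io.StringIO()  # stands in for open(logFileName, 'a', newline='')
writer = csv.writer(outcsv, lineterminator='\n')
for driver in drivers:
    writer.writerow(driver)  # one tuple -> one row with two columns

print(outcsv.getvalue())
# John Doe,2018-06-19T07:16:10Z
# Jane Smith,2018-06-19T07:16:10Z
```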
If you want all or most of the values, try gathering them into a single dictionary; then you can pull out the fields you want:
for item in driverDetails['Query']['Results']:
    fields = {}
    for data_item in item['XValues']:
        body.append(data_item)
        fields.update(data_item)
    drivers.append((fields["ID"], fields["Name"], fields["StartTime"]))
print(drivers)
Once you have the fields in a single dictionary you could even build the tuple with a loop:
drivers.append(tuple(fields[f] for f in ("ID", "Name", "StartTime", "ReportScopeStartTime", "ReportScopeEndTime")))
I think you should list the fields you want explicitly just to ensure that new fields don't surprise you.
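Along the same lines, csv.DictWriter lets you list the columns explicitly and write each merged fields dict directly. With extrasaction='ignore' it skips keys that aren't in fieldnames, so an unexpected new field is dropped instead of raising ValueError. A sketch with hand-typed sample data; SomeNewField is a hypothetical extra key:

```python
import csv
import io

results = [
    {"XValues": [{"ReportScopeStartTime": "2018-06-18T23:00:00Z"},
                 {"ID": "1400"}, {"Name": " John Doe"},
                 {"StartTime": "2018-06-19T07:16:10Z"}]},
    {"XValues": [{"ReportScopeStartTime": "2018-06-18T23:00:00Z"},
                 {"ID": "1401"}, {"Name": " Jane Smith"},
                 {"StartTime": "2018-06-19T07:16:10Z"},
                 {"SomeNewField": "x"}]},  # hypothetical extra field
]

wanted = ["ID", "Name", "StartTime"]  # the columns, listed explicitly

out = io.StringIO()  # stands in for the real CSV file
writer = csv.DictWriter(out, fieldnames=wanted, extrasaction="ignore",
                        lineterminator="\n")
writer.writeheader()
for item in results:
    fields = {}
    for data_item in item["XValues"]:
        fields.update(data_item)  # merge the one-key dicts into one dict
    writer.writerow(fields)

print(out.getvalue())
# ID,Name,StartTime
# 1400, John Doe,2018-06-19T07:16:10Z
# 1401, Jane Smith,2018-06-19T07:16:10Z
```

A missing field is written as an empty cell (DictWriter's default restval is "").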