Parsing json into a list of values - python

I have a program that takes some file and transforms it into a json format.
Im trying to get all the values of certain keys into a list but, because the format of the json file has a bunch of keys that are present multiple times, I cant find a way to do it properly.
My json file looks like this
{
"data": {
"__schema": {
"queryType": {
"fields": [
{
"description": "",
"name": "project"
},
{
"description": "",
"name": "projectEventFeed"
},
{
"description": "",
"name": "projectEventFeedFetchMore"
},
{
"description": "",
"name": "projectRecentEventFeed"
},
{
"description": "",
"name": "unseenProjectActivityCount"
},
{
"description": "",
"name": "projectFiles"
},
{
"description": "",
"name": "projectFilesIdSet"
},
{
"description": "",
"name": "projectFileMessages"
},
{
"description": "",
"name": "projectUserStatus"
},
{
"description": "",
"name": "projectFileScribble"
},
{
"description": "",
"name": "user"
},
{
"description": "",
"name": "viewer"
},
{
"description": "",
"name": "profile"
},
{
"description": "",
"name": "site"
},
{
"description": "",
"name": "designers"
},
{
"description": "",
"name": "predictImageCategory"
},
{
"description": "",
"name": "getPortfolioDesign"
}
]
}
}
}
}
My goal is to get all the name values into a list.
Before turning the file into json, I tried getting that with regex but failed.
With json format I tried the following
map(lambda parsed_json: parsed_json['data']['__schema']['queryType']['fields']['name'], List)
Im getting List from typing
But when i want to turn the map into a list, I get
TypeError: Parameters to generic types must be types. Got 0.
From the conversion.

You could just use list comprehension on the nested 'fields' key in the dict you have converted from your json.
d = {"data": {"__schema": {"queryType": {"fields": [{"description": "", "name": "project"}, {"description": "", "name": "projectEventFeed"}, {"description": "", "name": "projectEventFeedFetchMore"}, {"description": "", "name": "projectRecentEventFeed"}, {"description": "", "name": "unseenProjectActivityCount"}, {"description": "", "name": "projectFiles"}, {"description": "", "name": "projectFilesIdSet"}, {"description": "", "name": "projectFileMessages"}, {"description": "", "name": "projectUserStatus"}, {"description": "", "name": "projectFileScribble"}, {"description": "", "name": "user"}, {"description": "", "name": "viewer"}, {"description": "", "name": "profile"}, {"description": "", "name": "site"}, {"description": "", "name": "designers"}, {"description": "", "name": "predictImageCategory"}, {"description": "", "name": "getPortfolioDesign"}]}}}}
fields = [f['name'] for f in d['data']['__schema']['queryType']['fields']]
print(fields)
# ['project', 'projectEventFeed', 'projectEventFeedFetchMore', 'projectRecentEventFeed', 'unseenProjectActivityCount', 'projectFiles', 'projectFilesIdSet', 'projectFileMessages', 'projectUserStatus', 'projectFileScribble', 'user', 'viewer', 'profile', 'site', 'designers', 'predictImageCategory', 'getPortfolioDesign']

Related

How to compare 2 JSON strings and store what changed in a variable Python

I have a JSON string that is a response to a GET request:
[
{
"symbol": "cosmic2",
"name": "Cosmic Condos",
"description": "Some cosmic condos",
"image": "https://bafybeif3tacmhinsivylzrrxskwshwufysst3s6np3y2ar3qagpmliw374.ipfs.dweb.link/72.png?ext=png",
"twitter": "",
"discord": "",
"website": "",
"categories": []
}
]
And after a while it becomes :
[
{
"symbol": "cosmic2",
"name": "Cosmic Condos",
"description": "Some cosmic condos",
"image": "",
"twitter": "",
"discord": "",
"website": "",
"categories": []
},
{
"symbol": "test_lp_1",
"name": "Test Launchpad 1",
"description": "3 Stages",
"image": "",
"categories": [
"launchpad"
]
}
]
How do I store the second part that was added in a variable and later work with it?

Unable to write avro data to kafka using python

I'm using kafka kafka_2.11-0.11.0.2 and confluent version 3.3.0 for schema registry.
I have defined an avro schema as follows:
{
"namespace": "com.myntra.search",
"type": "record",
"name": "SearchDataIngestionObject",
"fields": [
{"name": "timestamp","type":"long"},
{"name": "brandList", "type":{ "type" : "array", "items" : "string" }},
{"name": "articleTypeList", "type":{ "type" : "array", "items" : "string" }},
{"name": "gender", "type":{ "type" : "array", "items" : "string" }},
{"name": "masterCategoryList", "type":{ "type" : "array", "items" : "string" }},
{"name": "subCategoryList", "type":{ "type" : "array", "items" : "string" }},
{"name": "quAlgo","type":{ "type" : "array", "items" : "string" }},
{"name": "colours", "type":{ "type" : "array", "items" : "string" }},
{"name": "isLandingPage", "type": "boolean"},
{"name": "isUserQuery", "type": "boolean"},
{"name": "isAutoSuggest", "type": "boolean"},
{"name": "userQuery", "type": "string"},
{"name": "correctedQuery", "type": "string"},
{"name": "completeSolrQuery", "type": "string"},
{"name": "atsaList", "type":{"type": "map", "values":{ "type" : "array", "items" : "string" }}},
{"name": "quMeta", "type": {"type": "map", "values": "string"}},
{"name": "requestId", "type": "string"}
]
}
And I'm trying to write some data to kafka as follows:
value = {
"timestamp": 1597399323000,
"brandList": ["brand_value"],
"articleTypeList": ["articleType_value"],
"gender": ["gender_value"],
"masterCategoryList": ["masterCategory_value"],
"subCategoryList": ["subCategory_value"],
"quAlgo": ["quAlgo_value"],
"colours": ["colours_value"],
"isLandingPage": False,
"isUserQuery": False,
"isAutoSuggest": False,
"userQuery": "userQuery_value",
"correctedQuery": "correctedQuery_value",
"completeSolrQuery": "completeSolrQuery_value",
"atsaList": {
"atsa_key1": ["atsa_value1"],
"atsa_key2": ["atsa_value2"],
"atsa_key3": ["atsa_value3"]
},
"quMeta": {
"quMeta_key1": "quMeta_value1",
"quMeta_key2": "quMeta_value2",
"quMeta_key3": "quMeta_value3"
},
"requestId": "requestId_value"
}
topic = "search"
key = str(uuid.uuid4())
producer.produce(topic=topic, key=key, value=value)
producer.flush()
But I'm getting the following error:
Traceback (most recent call last):
File "producer.py", line 61, in <module>
producer.produce(topic=topic, key=key, value=value)
File "/Library/Python/2.7/site-packages/confluent_kafka/avro/__init__.py", line 99, in produce
value = self._serializer.encode_record_with_schema(topic, value_schema, value)
File "/Library/Python/2.7/site-packages/confluent_kafka/avro/serializer/message_serializer.py", line 118, in encode_record_with_schema
return self.encode_record_with_schema_id(schema_id, record, is_key=is_key)
File "/Library/Python/2.7/site-packages/confluent_kafka/avro/serializer/message_serializer.py", line 152, in encode_record_with_schema_id
writer(record, outf)
File "/Library/Python/2.7/site-packages/confluent_kafka/avro/serializer/message_serializer.py", line 86, in <lambda>
return lambda record, fp: writer.write(record, avro.io.BinaryEncoder(fp))
File "/Library/Python/2.7/site-packages/avro/io.py", line 979, in write
raise AvroTypeException(self.writers_schema, datum)
avro.io.AvroTypeException: The datum {'quAlgo': ['quAlgo_value'], 'userQuery': 'userQuery_value', 'isAutoSuggest': False, 'isLandingPage': False, 'timestamp': 1597399323000, 'articleTypeList': ['articleType_value'], 'colours': ['colours_value'], 'correctedQuery': 'correctedQuery_value', 'quMeta': {'quMeta_key1': 'quMeta_value1', 'quMeta_key2': 'quMeta_value2', 'quMeta_key3': 'quMeta_value3'}, 'requestId': 'requestId_value', 'gender': ['gender_value'], 'isUserQuery': False, 'brandList': ['brand_value'], 'masterCategoryList': ['masterCategory_value'], 'subCategoryList': ['subCategory_value'], 'completeSolrQuery': 'completeSolrQuery_value', 'atsaList': {'atsa_key1': ['atsa_value1'], 'atsa_key2': ['atsa_value2'], 'atsa_key3': ['atsa_value3']}} is not an example of the schema {
"namespace": "com.myntra.search",
"type": "record",
"name": "SearchDataIngestionObject",
"fields": [
{
"type": "long",
"name": "timestamp"
},
{
"type": {
"items": "string",
"type": "array"
},
"name": "brandList"
},
{
"type": {
"items": "string",
"type": "array"
},
"name": "articleTypeList"
},
{
"type": {
"items": "string",
"type": "array"
},
"name": "gender"
},
{
"type": {
"items": "string",
"type": "array"
},
"name": "masterCategoryList"
},
{
"type": {
"items": "string",
"type": "array"
},
"name": "subCategoryList"
},
{
"type": {
"items": "string",
"type": "array"
},
"name": "quAlgo"
},
{
"type": {
"items": "string",
"type": "array"
},
"name": "colours"
},
{
"type": "boolean",
"name": "isLandingPage"
},
{
"type": "boolean",
"name": "isUserQuery"
},
{
"type": "boolean",
"name": "isAutoSuggest"
},
{
"type": "string",
"name": "userQuery"
},
{
"type": "string",
"name": "correctedQuery"
},
{
"type": "string",
"name": "completeSolrQuery"
},
{
"type": {
"values": {
"items": "string",
"type": "array"
},
"type": "map"
},
"name": "atsaList"
},
{
"type": {
"values": "string",
"type": "map"
},
"name": "quMeta"
},
{
"type": "string",
"name": "requestId"
}
]
}
I even trying the same example as given here but it doesn't work and throws the same error.
In your exception, there error is saying that the data you are providing it is the following:
{'userQuery': 'userQuery_value',
'isAutoSuggest': False,
'isLandingPage': False,
'correctedQuery': 'correctedQuery_value',
'isUserQuery': False,
'timestamp': 1597399323000,
'completeSolrQuery': 'completeSolrQuery_value',
'requestId': 'requestId_value'}
This is much less than what you claim you are providing it in your example.
Can you go back to your original code and on line 60 before you do producer.produce(topic=topic, key=key, value=value) just do a simple print(value) to make sure you are sending it the right value and that the value hasn't gotten overwritten by some other line of code.

Filter json response from http request to only specific value then select the largest 'id' value from all results

I have a Python script where I would like to filter Python with a http get and I would like to filter the response data for only specific values. The json response example is below:
{
"id": "38",
"name": "Report1",
"description": "",
"reportDefinitionID": "-1",
"jobID": "105600",
"type": "csv",
"status": "Completed",
"creator": {
"id": "1",
"username": "btest",
"firstname": "bob",
"lastname": "test"
},
{
"id": "39",
"name": "Report2",
"description": "",
"reportDefinitionID": "-1",
"jobID": "113218",
"type": "csv",
"status": "Completed"
"creator": {
"id": "1",
"username": "btest1",
"firstname": "Bob",
"lastname": "test1"
},
"id": "49",
"name": "Report1",
"description": "",
"reportDefinitionID": "-1",
"jobID": "113219",
"type": "csv",
"status": "Completed"
"creator": {
"id": "1",
"username": "btest1",
"firstname": "Bob",
"lastname": "test1"
}
I would like to filter the above json to only show a report by name. For example if there is a Python filter that would only allow me to filter for a report by the name of "Report1". If I filtered on name of "Report1". I would expect to following to be to be returned below:
{
"id": "38",
"name": "Report1",
"description": "",
"reportDefinitionID": "-1",
"jobID": "105600",
"type": "csv",
"status": "Completed",
"creator": {
"id": "1",
"username": "btest",
"firstname": "bob",
"lastname": "test"
},
"id": "49",
"name": "Report1",
"description": "",
"reportDefinitionID": "-1",
"jobID": "113219",
"type": "csv",
"status": "Completed"
"creator": {
"id": "1",
"username": "btest1",
"firstname": "Bob",
"lastname": "test1"
}
For the final part of the script I would like to compare the 'id' field to show the largest value for example id 38 vs id 49 and then output the json for the largest in this case id 49. I would like it output
},
"id": "49",
"name": "Report1",
"description": "",
"reportDefinitionID": "-1",
"jobID": "113219",
"type": "csv",
"status": "Completed"
"creator": {
"id": "1",
"username": "btest1",
"firstname": "Bob",
"lastname": "test1"
}
For the last part i would just like to save the id value '49' to a variable in Python.
So far what I have below is:
response_data = response.json()
input_dict = json.dumps(response_data)
input_transform = json.loads(input_dict)
# Filter python objects with list comprehensions
sort1 = sorted([r.get("id") for r in input_transform if r.get("name") == "Report1"], reverse=True)[0]
# Print sorted JSON
print(sort1)
I updated my code and now I'm getting the error below:
'str' object has no attribute 'get'
I researched it and can not figure out what I'm doing now and how to get past it.
You need to get the ID in the listcomp as bellow:
sorted([r.get("id") for r in sample if r.get("name") == "Report1"], reverse=True)[0]

Fast way of adding fields to a nested dict

I need a help with improving my code.
I've got a nested dict with many levels:
{
"11": {
"FacLC": {
"immty": [
"in_mm",
"in_mm"
],
"moood": [
"in_oo",
"in_oo"
]
}
},
"22": {
"FacLC": {
"immty": [
"in_mm",
"in_mm",
"in_mm"
]
}
}
}
And I want to add additional fields on every level, so my output looks like this:
[
{
"id": "",
"name": "11",
"general": [
{
"id": "",
"name": "FacLC",
"specifics": [
{
"id": "",
"name": "immty",
"characteristics": [
{
"id": "",
"name": "in_mm"
},
{
"id": "",
"name": "in_mm"
}
]
},
{
"id": "",
"name": "moood",
"characteristics": [
{
"id": "",
"name": "in_oo"
},
{
"id": "",
"name": "in_oo"
}
]
}
]
}
]
},
{
"id": "",
"name": "22",
"general": [
{
"id": "",
"name": "FacLC",
"specifics": [
{
"id": "",
"name": "immty",
"characteristics": [
{
"id": "",
"name": "in_mm"
},
{
"id": "",
"name": "in_mm"
},
{
"id": "",
"name": "in_mm"
}
]
}
]
}
]
}
]
I managed to write a 4-times nested for loop, what I find inefficient and inelegant:
for main_name, general in my_dict.items():
generals = []
for general_name, specific in general.items():
specifics = []
for specific_name, characteristics in specific.items():
characteristics_dicts = []
for characteristic in characteristics:
characteristics_dicts.append({
"id": "",
"name": characteristic,
})
specifics.append({
"id": "",
"name": specific_name,
"characteristics": characteristics_dicts,
})
generals.append({
"id": "",
"name": general_name,
"specifics": specifics,
})
my_new_dict.append({
"id": "",
"name": main_name,
"general": generals,
})
I am wondering if there is more compact and efficient solution.
In the past I created a function to do it. Basically you call this function everytime that you need to add new fields to a nested dict, independently on how many levels this nested dict have. You only have to inform the 'full path' , that I called the 'key_map'.
Like ['node1','node1a','node1apart3']
def insert_value_using_map(_nodes_list_to_be_appended, _keys_map, _value_to_be_inserted):
for _key in _keys_map[:-1]:
_nodes_list_to_be_appended = _nodes_list_to_be_appended.setdefault(_key, {})
_nodes_list_to_be_appended[_keys_map[-1]] = _value_to_be_inserted

can't print the value of json object in django

I have this json object in ajax_data variable
{
"columns[0][data]": "0",
"columns[1][name]": "",
"columns[5][searchable]": "true",
"columns[5][name]": "",
"columns[4][search][regex]": "false",
"order[0][dir]": "asc",
"length": "10",
}
I have converted it using json.loads() function like.
ajax_data = json.loads(ajax_data)
I want to get the value if "order[0][dir]" and "columns[0][data]" but if i print it using
ajax_data['order'][0]['dir]
its giving error :
KeyError at /admin/help
'order'
But same code works if i access it for length key then it works.
The keys you have used are actually not a good way of implementation.
{
"columns[0][data]": "0",
"columns[1][name]": "",
"columns[5][searchable]": "true",
"columns[5][name]": "",
"columns[4][search][regex]": "false",
"order[0][dir]": "asc",
"length": "10",
}
Instead of this you should hav gone for
{
"columns": [
{"data": "0", "name": "", "searchable": "true", "name": "", "search": {
"regex": "false"}
},
{"data": "0", "name": "", "searchable": "true", "name": ""," search": {
"regex": "false"}},
{"data": "0", "name": "", "searchable": "true", "name": "", "search": {
"regex": "false"}},
{"data": "0", "name": "", "searchable": "true", "name": "", "search": {
"regex": "false"}},
{"data": "0", "name": "", "searchable": "true", "name": "", "search": {
"regex": "false"}},
{"data": "0", "name": "", "searchable": "true", "name": "", "search": {
"regex": "false"}},
],
"order": [
{"dir": "asc"}
],
"length": "10"
}
In this case ajax_data['order'][0]['dir] will result in value "asc"
For your current implementation the key is "order[0][dir]"
That is go for
ajax_data["order[0][dir]"]
Hope you understood the issue.
Structuring of json is very important when dealing with APIs. Try to restructure your json which will help for future too.
That's because length is a key in that json object, and order is not. The key names are the entire strings inside the quotes: columns[0][data], order[0][dir], etc.
Those are unusual key names, but perfectly valid.

Categories

Resources