Python: remove element from JSON if value exists

I have a rather large geojson file which is converted from some National Weather Service data. I've trimmed it down to this sample here:
{
"properties": {
"name": "day1otlk"
},
"type": "FeatureCollection",
"features": [
{
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-122.71424459627099,
40.229695635383166
],
[
-122.62484780364827,
40.53410620541074
],
[
-122.71424459627099,
40.229695635383166
]
]
]
},
"properties": {
"Name": "General Thunder",
"stroke-opacity": 1,
"stroke-width": 4,
"name": "General Thunder",
"fill": "#c0e8c0",
"fill-opacity": 0.75,
"stroke": "#ffffff",
"timeSpan": {
"end": "2017-03-30T12:00:00Z",
"begin": "2017-03-29T20:00:00Z"
}
},
"type": "Feature"
},
{
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-108.65861565996833,
32.91391108773154
],
[
-108.63932601964923,
32.95521185698698
],
[
-108.65861565996833,
32.91391108773154
]
]
]
},
"properties": {
"Name": "General Thunder",
"stroke-opacity": 1,
"stroke-width": 4,
"name": "General Thunder",
"fill": "#c0e8c0",
"fill-opacity": 0.75,
"stroke": "#ffffff",
"timeSpan": {
"end": "2017-03-30T12:00:00Z",
"begin": "2017-03-29T20:00:00Z"
}
},
"type": "Feature"
},
{
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-92.67280213157608,
38.47870651780003
],
[
-92.62448390998837,
38.45534960370862
],
[
-92.59475154780039,
38.493327413824595
],
[
-92.64308574626148,
38.51669676139087
],
[
-92.67280213157608,
38.47870651780003
]
]
]
},
"properties": {
"Name": "10 %",
"stroke-opacity": 1,
"stroke-width": 4,
"name": "10 %",
"fill": "#8b4726",
"fill-opacity": 0.89,
"stroke": "#ffffff",
"timeSpan": {
"end": "2017-03-30T12:00:00Z",
"begin": "2017-03-29T20:00:00Z"
}
},
"type": "Feature"
},
{
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-92.67280213157608,
38.47870651780003
],
[
-92.62448390998837,
38.45534960370862
],
[
-92.59475154780039,
38.493327413824595
],
[
-92.64308574626148,
38.51669676139087
],
[
-92.67280213157608,
38.47870651780003
]
]
]
},
"properties": {
"Name": "10 %",
"stroke-opacity": 1,
"stroke-width": 4,
"name": "20 %",
"fill": "#8b4726",
"fill-opacity": 0.89,
"stroke": "#ffffff",
"timeSpan": {
"end": "2017-03-30T12:00:00Z",
"begin": "2017-03-29T20:00:00Z"
}
},
"type": "Feature"
},
{
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-97.09845994557838,
38.43843745045377
],
[
-97.07114801649661,
38.47751978088534
],
[
-97.09845994557838,
38.43843745045377
]
]
]
},
"properties": {
"Name": "5 %",
"stroke-opacity": 1,
"stroke-width": 4,
"name": "5 %",
"fill": "#b47f00",
"fill-opacity": 0.89,
"stroke": "#ffffff",
"timeSpan": {
"end": "2017-03-30T12:00:00Z",
"begin": "2017-03-29T20:00:00Z"
}
},
"type": "Feature"
}
]
}
I'm looking to remove the elements where name has a % in it. I don't want those coordinates or anything included.
Here's my code:
import json
with open('test.geojson') as data_file:
    data = json.load(data_file)
for element in data["features"]:
    if '%' in element["properties"]["name"]:
        del element["type"]
        del element["properties"]  # Deletes the properties
        del element["geometry"]  # Deletes the coords
with open('test_output.geojson', 'w') as data_file:
    json.dump(data, data_file)
This works well enough to remove the element's sub keys, but I'm left with output that looks like:
{}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}
I've also tried to use
for element in data["features"]:
    if '%' in element["properties"]["name"]:
        data["features"].remove(element)
but that seems to delete only the last element in the sample group, which is the 5 % group. It should be removing the 10 %, 20 % and the 5 % groups.
Is there a way to remove the element from data["features"] altogether if name has a % in it, so I'm left with clean JSON output? In this sample data, the only data["features"] I should have left are the General Thunder ones, with no empty brackets.

Use a simple filter:
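no_percent = lambda feature: '%' not in feature['properties']['name']
# list() is needed on Python 3, where filter() returns an iterator that json.dump cannot serialize
data['features'] = list(filter(no_percent, data['features']))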
Or as a list comprehension:
data['features'] = [feature for feature in data['features']
                    if '%' not in feature['properties']['name']]

The issue with using del element["type"], del element["properties"] and del element["geometry"] is that it only removes those keys from the element's dict. The element itself, now an empty dict, stays in data["features"], which is why you get {} in the output.
For your 2nd attempt: when you iterate over a list with for element in data["features"]:, you shouldn't modify that list inside the loop (which is what data["features"].remove(element) does). Each removal shifts the remaining items one position to the left, so the loop skips the item that follows every removed one. Also, list.remove() removes the first item that compares equal to its argument, which is not necessarily the exact object you are looking at.
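To see the skipping in action, here is a minimal sketch that uses plain strings in place of the feature dicts. The names list is made up for illustration and mirrors the name values in the sample:

```python
# Stand-in for the "name" values in data["features"] (illustrative only).
names = ["General Thunder", "General Thunder", "10 %", "20 %", "5 %"]

# Buggy: removing from the list while iterating over it.
buggy = list(names)
for name in buggy:
    if "%" in name:
        buggy.remove(name)  # later items shift left, so the next one is skipped

# Safe: build a new list instead of mutating the one being iterated.
filtered = [name for name in names if "%" not in name]

print(buggy)     # "20 %" is still there because the loop skipped it
print(filtered)  # only the "General Thunder" entries remain
```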
It's better to create a new list and then assign that. What you could do is:
new_features = []
for element in data["features"]:
    if '%' not in element["properties"]["name"]:  # note the 'not'
        new_features.append(element)  # new_features has the ones you want
# and then re-assign features to the list with the elements you want
data["features"] = new_features
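Putting it together with the file handling from the question, a minimal sketch might look like the following. The function name drop_percent_features is made up for illustration, and it assumes every feature has a properties.name string:

```python
import json


def drop_percent_features(in_path, out_path):
    """Copy a GeoJSON file, leaving out features whose name contains '%'.

    Hypothetical helper: assumes each feature has properties["name"].
    """
    with open(in_path) as src:
        data = json.load(src)

    # Build a new list rather than deleting from the one being iterated.
    data["features"] = [
        feature for feature in data["features"]
        if "%" not in feature["properties"]["name"]
    ]

    with open(out_path, "w") as dst:
        json.dump(data, dst)
    return data
```

It would be called as drop_percent_features('test.geojson', 'test_output.geojson'), using the file names from the question.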

Related

Copy some k,v from existing JSON to constitute another JSON in Python

I have this JSON :
{
"duration": 1942,
"frame_id": 0,
"detect1": [
{
"type": {
"s_type1": [
{
"confidence": 98.70016,
"klass": -1,
"name": "c*****"
},
{
"confidence": 1.042385,
"klass": -1,
"name": "c*****"
},
{
"confidence": 0.1587732,
"klass": -1,
"name": "s*****"
}
],
"s_type2": [
{
"confidence": 92.82484,
"klass": -1,
"name": "b*****"
},
{
"confidence": 7.098834,
"klass": -1,
"name": "b*****"
},
{
"confidence": 0.02423214,
"klass": -1,
"name": "p*****"
}
],
"Box": [
80.80994,
170.0965,
1091.778
]
},
"confidences": [
90.08681,
99.91595,
90.12489
]
}
]
}
And I would like to save some of the keys and values from this JSON to another JSON. The new JSON will keep:
duration (k,v),
frame_id (k,v),
detect1:
type: for s_type1 and s_type2, only the first dict will be kept and the klass (k,v) will be removed,
Box (k,v)
confidences (k,v)
The final result :
{
"duration": 1942,
"frame_id": 0,
"detect1": [
{
"type": {
"s_type1": [
{
"confidence": 98.70016,
"name": "c*****"
}
],
"s_type2": [
{
"confidence": 92.82484,
"name": "b*****"
}
],
"Box": [
80.80994,
170.0965,
1091.778
]
},
"confidences": [
90.08681,
99.91595,
90.12489
]
}
]
}
I was trying to do it with the JMESPath library but I can't get a good result.
Does anyone have an idea how to do this?
Thanks
Using jmespath to get your desired output:
import jmespath
expression = """{duration: duration,
                 frame_id: frame_id,
                 detect1: [{type:{s_type1: [detect1[].type.s_type1[].merge({confidence: confidence, name: name})|[0]],
                                  s_type2: [detect1[].type.s_type2[].merge({confidence: confidence, name: name})|[0]],
                                  Box: detect1[].type.Box[]},
                            confidences: detect1[].confidences[]
                           }
                          ]}
"""
expression = jmespath.compile(expression)
expression.search(data)  # data is the parsed JSON document (a dict), not the json module
{'duration': 1942,
'frame_id': 0,
'detect1': [{'type': {'s_type1': [{'confidence': 98.70016, 'name': 'c*****'}],
's_type2': [{'confidence': 92.82484, 'name': 'b*****'}],
'Box': [80.80994, 170.0965, 1091.778]},
'confidences': [90.08681, 99.91595, 90.12489]}]}
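If you'd rather not depend on JMESPath, the same trimming can be sketched in plain Python. This is a different technique from the JMESPath expression above; the function name trim_detections is made up, and it assumes the lists to shorten are exactly s_type1 and s_type2 as in the sample:

```python
def trim_detections(doc, list_keys=("s_type1", "s_type2"), drop_key="klass"):
    """Return a trimmed copy of doc (hypothetical helper, assumes the sample's layout)."""
    detections = []
    for detection in doc["detect1"]:
        new_type = {}
        for key, value in detection["type"].items():
            if key in list_keys:
                # Keep only the first dict (if any) and drop the unwanted key.
                new_type[key] = [
                    {k: v for k, v in entry.items() if k != drop_key}
                    for entry in value[:1]
                ]
            else:
                new_type[key] = value  # e.g. "Box" is kept as-is
        detections.append(
            {"type": new_type, "confidences": detection["confidences"]}
        )
    return {
        "duration": doc["duration"],
        "frame_id": doc["frame_id"],
        "detect1": detections,
    }
```

The original document is left untouched, since the inner dicts are rebuilt rather than modified in place.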

Flattening dict/list object - Python vs Sql?

I have a set of nested dict/list values. Typically I would store them as JSON and flatten them via SQL, but I'm wondering if anyone has a perspective on doing this in Python. In general, what is the most efficient way to get this data into rows, where the values of dimensions and metricHeaderEntries become the columns and the respective row values fall in line under them?
BTW, this is an example Google Analytics API export, so it may help others. The goal is to get all the dimension and metric values of a record on one row.
{
"reports": [
{
"columnHeader": {
"dimensions": [
"DimA",
"DimAB",
"DimAC",
"DimAD",
"DimAE"
],
"metricHeader": {
"metricHeaderEntries": [
{
"name": "METRICAB",
"type": "INTEGER"
},
{
"name": "METRICABAC",
"type": "INTEGER"
},
{
"name": "METRICABACD",
"type": "INTEGER"
},
{
"name": "METRICABADCE",
"type": "VARCHAR"
},
{
"name": "METRICABADEEW",
"type": "FLOAT"
},
{
"name": "METRICABADEEFG",
"type": "FLOAT"
}
]
}
},
"data": {
"rows": [
{
"dimensions": [
"(test)",
"2019022",
"(not set)",
"d",
"/search"
],
"metrics": [
{
"values": [
"0",
"1",
"1",
"100000.0",
"61.0",
"16.0"
]
}
]
},
{
"dimensions": [
"(test)",
"20190222",
"(not set)",
"asdf",
"/adf"
],
"metrics": [
{
"values": [
"30",
"133",
"31",
"330.0",
"31.0",
"32.0"
]
}
]
}
]
},
"nextPageToken": "331000"
}
]
}
Desired Output:
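In plain Python, one way to get there is to pair each row's values with the header names. A minimal sketch, assuming each row has a single entry under metrics and that rows sits under data inside each report, as the sample suggests; the function name flatten_report is made up for illustration:

```python
def flatten_report(report):
    """Turn one report into a list of {column name: value} dicts, one per row.

    Hypothetical helper: assumes one entry under "metrics" per row. With more
    entries, the surplus values would be dropped by zip().
    """
    header = report["columnHeader"]
    columns = list(header["dimensions"]) + [
        entry["name"] for entry in header["metricHeader"]["metricHeaderEntries"]
    ]

    flat_rows = []
    for row in report["data"]["rows"]:
        values = list(row["dimensions"])
        for metric in row["metrics"]:
            values.extend(metric["values"])
        flat_rows.append(dict(zip(columns, values)))
    return flat_rows
```

Calling it for every item in response["reports"] gives a list of dicts, which can then be written out with csv.DictWriter or passed to pandas.DataFrame if pandas is available. Note that the values stay strings, as they are in the export.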

Identify location of item in json file

Suppose I have the following JSON file. With data1["tenants"][1]['name'] I can select uniquename2. Is there a way to find that index (the 1) by looping over the document?
{
"tenants": [{
"key": "identifier",
"name": "uniquename",
"image": "url",
"match": [
"identifier"
],
"tags": [
"tag1",
"tag2"
]
},
{
"key": "identifier",
"name": "uniquename2",
"image": "url",
"match": [
"identifier1",
"identifier2"
],
"tags": ["tag"]
}
]
}
In short: data1["tenants"][1]['name'] = uniquename2 and data1["tenants"][0]['name'] = uniquename.
How can I find out which index has which name? So if I have uniquename2, what number/index corresponds to it?
You can iterate over the tenants to map each index to its name:
data = {
"tenants": [{
"key": "identifier",
"name": "uniquename",
"image": "url",
"match": [
"identifier"
],
"tags": [
"tag1",
"tag2"
]
},
{
"key": "identifier",
"name": "uniquename2",
"image": "url",
"match": [
"identifier1",
"identifier2"
],
"tags": ["tag"]
}
]
}
for index, tenant in enumerate(data['tenants']):
    print(index, tenant['name'])
OUTPUT
0 uniquename
1 uniquename2
Assuming you have already turned your JSON into a dictionary, this is how you can get the index of the first occurrence of a name in your list (this relies on the names actually being unique):
data = {
"tenants": [{
"key": "identifier",
"name": "uniquename",
"image": "url",
"match": [
"identifier"
],
"tags": [
"tag1",
"tag2"
]
},
{
"key": "identifier",
"name": "uniquename2",
"image": "url",
"match": [
"identifier1",
"identifier2"
],
"tags": ["tag"]
}
]
}
def index_of(tenants, tenant_name):
    try:
        return tenants.index(
            next(
                tenant for tenant in tenants
                if tenant["name"] == tenant_name
            )
        )
    except StopIteration:
        raise ValueError(
            f"tenants list does not have tenant by name {tenant_name}."
        )
index_of(data["tenants"], "uniquename")  # 0
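If you need to look up many names, it may be cheaper to build the name-to-index mapping once and reuse it. A minimal sketch; the function name index_by_name is made up, and when a name occurs more than once it keeps the first position:

```python
def index_by_name(tenants):
    """Map each tenant name to the position of its first occurrence."""
    positions = {}
    for position, tenant in enumerate(tenants):
        # setdefault leaves an existing entry alone, so the first occurrence wins.
        positions.setdefault(tenant["name"], position)
    return positions


# Tiny stand-in for data["tenants"] (illustrative only).
tenants = [{"name": "uniquename"}, {"name": "uniquename2"}]
lookup = index_by_name(tenants)
print(lookup["uniquename2"])
```

A missing name then raises KeyError on lookup, or you can use lookup.get(name) to get None instead.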

Adding a list into a list in Python

I'm receiving JSON back from an API call and want to log when a keyword has been detected. Sometimes there may be one, none, or several returned from the API. I'm able to log the data that comes back, no problem.
I want to run 1000s of requests and then have each result logged as a list of results within a list (so I know which list corresponds to which API call).
for item in response['output']['keywords']:
    TempEntityList = []
    TempEntityList.append(item['keywords'])
    EntityList.extend(TempEntityList)
    TempEntityList = []
This does append everything to a list, but I can't seem to find the right setup.
When I run it twice I get the below:
['Chat', 'Case', 'Telephone','Chat', 'Case', 'Telephone']
When really I want
[['Chat', 'Case', 'Telephone'],['Chat', 'Case', 'Telephone']]
I'm creating TempEntityList, appending any matches found to it, extending EntityList by what has been found, and then clearing TempEntityList down for the next API call.
What's the best way to log each set of results to a nested list? So far I've only managed to get one large list, or every item ends up as its own nested item.
As requested, the payload that is returned looks like the below:
{
"output": {
"intents": [],
"entities": [
{
"entity": "Chat",
"location": [
0,
4
],
"value": "Chat",
"confidence": 1
},
{
"entity": "Case",
"location": [
5,
9
],
"value": "Case",
"confidence": 1
},
{
"entity": "Telephone",
"location": [
10,
19
],
"value": "Telephony",
"confidence": 1
}
],
"generic": []
},
"context": {
"global": {
"system": {
"turn_count": 1
},
"session_id": "xxx-xxx-xxx"
},
"skills": {
"main skill": {
"user_defined": {
"Case": "Case",
"Chat": "Chat",
"Telephone": "Telephony"
},
"system": {
"state": "x"
}
}
}
}
}
{
"output": {
"intents": [],
"entities": [
{
"entity": "Chat",
"location": [
0,
4
],
"value": "Chat",
"confidence": 1
},
{
"entity": "Case",
"location": [
5,
9
],
"value": "Case",
"confidence": 1
},
{
"entity": "Telephone",
"location": [
10,
19
],
"value": "Telephony",
"confidence": 1
}
],
"generic": []
},
"context": {
"global": {
"system": {
"turn_count": 1
},
"session_id": "xxx-xxx-xxx"
},
"skills": {
"main skill": {
"user_defined": {
"Case": "Case",
"Chat": "Chat",
"Telephone": "Telephony"
},
"system": {
"state": "xxx-xxx-xxx"
}
}
}
}
}
{
"output": {
"intents": [],
"entities": [
{
"entity": "Chat",
"location": [
0,
4
],
"value": "Chat",
"confidence": 1
},
{
"entity": "Case",
"location": [
5,
9
],
"value": "Case",
"confidence": 1
},
{
"entity": "Telephone",
"location": [
10,
19
],
"value": "Telephony",
"confidence": 1
}
],
"generic": []
},
Firstly, TempEntityList should be created once per API call, before the for loop, rather than reset for every item, and the second TempEntityList = [] at the bottom isn't needed. To answer the question, use list.append() instead of list.extend(), so the whole list is added as a single nested item once the loop has finished:
TempEntityList = []
for item in response['output']['keywords']:
    TempEntityList.append(item['keywords'])
EntityList.append(TempEntityList)
I've managed to get what I want, thanks everyone for the suggestions.
The solution was:
global EntityList
EntityList = []
for item in response['output']['entities']:
    EntityList.append(item['entity'])
FinalList.append(EntityList)
Which, after running the function three times on the same input, produced:
[['Chat', 'Case', 'Telephone'], ['Chat', 'Case', 'Telephone'], ['Chat', 'Case', 'Telephone']]
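The same idea can be written without a global by returning a fresh list from each call. A minimal sketch, assuming the payload layout shown in the question; the function names extract_entities and collect_results are made up for illustration:

```python
def extract_entities(response):
    """Return the entity names found in one API response."""
    return [item["entity"] for item in response["output"]["entities"]]


def collect_results(responses):
    """Build one inner list per response, in the order the calls were made."""
    return [extract_entities(response) for response in responses]
```

A response with no entities contributes an empty inner list, so the position of each inner list still matches the API call it came from.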

Insert and delete geojson object based on conditions in Python

I have a Geojson as follows:
data = {
"type": "FeatureCollection",
"name": "entities",
"features": [
{
"type": "Feature",
"properties": {
"Layer": "0",
"SubClasses": "AcDbEntity:AcDbBlockReference",
"EntityHandle": "3C8",
"area": "141.81",
"type": "p",
"Text": "area:141.81;type:p"
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
2721.1572741014097,
1454.3223948456648
],
[
2720.121847266826,
1454.3223948456648
],
[
2720.121847266826,
1452.6092152478227
],
[
2710.5679254269344,
1452.6092152478227
],
[
2721.1572741014097,
1430.1478385206133
],
[
2721.1572741014097,
1454.3223948456648
]
]
]
}
},
{
"type": "Feature",
"properties": {
"Layer": "0",
"SubClasses": "AcDbEntity:AcDbBlockReference",
"EntityHandle": "3CE",
"area": "44.79",
"type": "h",
"Text": "area:44.79;type:h"
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
2710.723323781393,
1450.3320226620049
],
[
2720.0654518264787,
1450.3320226620049
],
[
2720.0654518264787,
1445.537183875705
],
[
2710.723323781393,
1445.537183875705
],
[
2710.723323781393,
1450.3320226620049
]
]
]
}
},
{
"type": "Feature",
"properties": {
"Layer": "0",
"SubClasses": "AcDbEntity:AcDbBlockReference",
"EntityHandle": "610",
"name": "706",
"area": "92.28",
"type": "o",
"Text": "name:706;area:92.28;type:o"
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
2714.603212251531,
1462.7249212430308
],
[
2711.7289360797,
1462.7249212430308
],
[
2711.7289360797,
1464.852506681824
],
[
2705.7302059101926,
1460.6840827804538
],
[
2710.567925426934,
1454.3223948456637
],
[
2710.567925426934,
1453.838838298367
],
[
2714.603212251531,
1453.838838298367
],
[
2714.603212251531,
1462.7249212430308
]
]
]
}
}
]
}
I want to insert "name": "" if name does not exist in properties, and delete the "Text" entry since it duplicates the other properties. How can I do that in Python?
Thanks a lot in advance!
Expected result:
data = {
"type": "FeatureCollection",
"name": "entities",
"features": [
{
"type": "Feature",
"properties": {
"Layer": "0",
"SubClasses": "AcDbEntity:AcDbBlockReference",
"EntityHandle": "3C8",
"name": "",
"area": "141.81",
"type": "p"
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
2721.1572741014097,
1454.3223948456648
],
[
2720.121847266826,
1454.3223948456648
],
[
2720.121847266826,
1452.6092152478227
],
[
2710.5679254269344,
1452.6092152478227
],
[
2721.1572741014097,
1430.1478385206133
],
[
2721.1572741014097,
1454.3223948456648
]
]
]
}
},
{
"type": "Feature",
"properties": {
"Layer": "0",
"SubClasses": "AcDbEntity:AcDbBlockReference",
"EntityHandle": "3CE",
"name": "",
"area": "44.79",
"type": "h"
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
2710.723323781393,
1450.3320226620049
],
[
2720.0654518264787,
1450.3320226620049
],
[
2720.0654518264787,
1445.537183875705
],
[
2710.723323781393,
1445.537183875705
],
[
2710.723323781393,
1450.3320226620049
]
]
]
}
},
{
"type": "Feature",
"properties": {
"Layer": "0",
"SubClasses": "AcDbEntity:AcDbBlockReference",
"EntityHandle": "610",
"name": "706",
"area": "92.28",
"type": "o"
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
2714.603212251531,
1462.7249212430308
],
[
2711.7289360797,
1462.7249212430308
],
[
2711.7289360797,
1464.852506681824
],
[
2705.7302059101926,
1460.6840827804538
],
[
2710.567925426934,
1454.3223948456637
],
[
2710.567925426934,
1453.838838298367
],
[
2714.603212251531,
1453.838838298367
],
[
2714.603212251531,
1462.7249212430308
]
]
]
}
}
]
}
UPDATE:
My solution so far, which seems to work:
import json
features = data["features"]
for i in features:
    d = i["properties"]
    if "name" not in d:
        d["name"] = ""
    if i["properties"]["Text"] is not None:
        del i["properties"]["Text"]
I defined it as a function, but in some cases I get the following error. Does someone know how to fix it? Thanks.
Traceback (most recent call last):
File "<ipython-input-1-8e3095f67c57>", line 138, in <module>
modify_geojson(output_file)
File "<ipython-input-1-8e3095f67c57>", line 102, in modify_geojson
if i["properties"]["Text"] is not None:
KeyError: 'Text'
In each property 'Text' is only present once. Please explain where it's duplicated?
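The KeyError means at least one feature has no "Text" key at all, so i["properties"]["Text"] fails before the is not None check can run. A minimal sketch of a version that tolerates missing keys; the function name tidy_properties is made up for illustration:

```python
def tidy_properties(data):
    """Add an empty "name" where missing and drop "Text" where present."""
    for feature in data["features"]:
        props = feature["properties"]
        props.setdefault("name", "")  # only inserted when "name" is absent
        props.pop("Text", None)       # no KeyError when "Text" is absent
    return data
```

The dicts are modified in place, so the return value is only a convenience. Note that a newly added "name" ends up as the last key in properties, which does not matter for JSON consumers.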
