Extract specific data from Google Cloud Vision JSON response - python

I am quite new to Raspberry Pi and Python coding, but I successfully configured Google Cloud Vision. However, the JSON dump looks like this:
{
"responses": [
{
"faceAnnotations": [
{
"angerLikelihood": "UNLIKELY",
"blurredLikelihood": "VERY_UNLIKELY",
"boundingPoly": {
"vertices": [
{
"x": 129
},
{
"x": 370
},
{
"x": 370,
"y": 240
},
{
"x": 129,
"y": 240
}
]
},
"detectionConfidence": 0.99543685,
"fdBoundingPoly": {
"vertices": [
{
"x": 162,
"y": 24
},
{
"x": 337,
"y": 24
},
{
"x": 337,
"y": 199
},
{
"x": 162,
"y": 199
}
]
},
"headwearLikelihood": "VERY_UNLIKELY",
"joyLikelihood": "VERY_UNLIKELY",
"landmarkingConfidence": 0.77542377,
"landmarks": [
{
"position": {
"x": 210.93373,
"y": 92.71409,
"z": -0.00025338508
},
"type": "LEFT_EYE"
},
{
"position": {
"x": 280.00177,
"y": 82.57283,
"z": 0.49017733
},
"type": "RIGHT_EYE"
},
{
"position": {
"x": 182.08047,
"y": 77.89372,
"z": 6.825161
},
"type": "LEFT_OF_LEFT_EYEBROW"
},
{
"position": {
"x": 225.82335,
"y": 72.88091,
"z": -13.963233
},
"type": "RIGHT_OF_LEFT_EYEBROW"
},
{
"position": {
"x": 260.4491,
"y": 66.19005,
"z": -13.798634
},
"type": "LEFT_OF_RIGHT_EYEBROW"
},
{
"position": {
"x": 303.87503,
"y": 59.69522,
"z": 7.8336163
},
"type": "RIGHT_OF_RIGHT_EYEBROW"
},
{
"position": {
"x": 244.57729,
"y": 83.701904,
"z": -15.022567
},
"type": "MIDPOINT_BETWEEN_EYES"
},
{
"position": {
"x": 251.58353,
"y": 124.68004,
"z": -36.52176
},
"type": "NOSE_TIP"
},
{
"position": {
"x": 255.39096,
"y": 151.87607,
"z": -19.560472
},
"type": "UPPER_LIP"
},
{
"position": {
"x": 259.96045,
"y": 178.62886,
"z": -14.095398
},
"type": "LOWER_LIP"
},
{
"position": {
"x": 232.35422,
"y": 167.2542,
"z": -1.0750997
},
"type": "MOUTH_LEFT"
},
{
"position": {
"x": 284.49316,
"y": 159.06075,
"z": -0.078973025
},
"type": "MOUTH_RIGHT"
},
{
"position": {
"x": 256.94714,
"y": 163.11235,
"z": -14.0897665
},
"type": "MOUTH_CENTER"
},
{
"position": {
"x": 274.47885,
"y": 125.8553,
"z": -7.8479633
},
"type": "NOSE_BOTTOM_RIGHT"
},
{
"position": {
"x": 231.2164,
"y": 132.60686,
"z": -8.418254
},
"type": "NOSE_BOTTOM_LEFT"
},
{
"position": {
"x": 252.96692,
"y": 135.81783,
"z": -19.805998
},
"type": "NOSE_BOTTOM_CENTER"
},
{
"position": {
"x": 208.6943,
"y": 86.72571,
"z": -4.8503814
},
"type": "LEFT_EYE_TOP_BOUNDARY"
},
{
"position": {
"x": 223.4354,
"y": 90.71454,
"z": 0.42966545
},
"type": "LEFT_EYE_RIGHT_CORNER"
},
{
"position": {
"x": 210.67189,
"y": 96.09362,
"z": -0.62435865
},
"type": "LEFT_EYE_BOTTOM_BOUNDARY"
},
{
"position": {
"x": 195.00711,
"y": 93.783226,
"z": 6.6310787
},
"type": "LEFT_EYE_LEFT_CORNER"
},
{
"position": {
"x": 208.30045,
"y": 91.73073,
"z": -1.7749802
},
"type": "LEFT_EYE_PUPIL"
},
{
"position": {
"x": 280.8329,
"y": 75.722244,
"z": -4.3266015
},
"type": "RIGHT_EYE_TOP_BOUNDARY"
},
{
"position": {
"x": 295.9134,
"y": 78.8241,
"z": 7.3644505
},
"type": "RIGHT_EYE_RIGHT_CORNER"
},
{
"position": {
"x": 281.82813,
"y": 85.56999,
"z": -0.09711724
},
"type": "RIGHT_EYE_BOTTOM_BOUNDARY"
},
{
"position": {
"x": 266.6147,
"y": 83.689865,
"z": 0.6850431
},
"type": "RIGHT_EYE_LEFT_CORNER"
},
{
"position": {
"x": 282.31485,
"y": 80.471725,
"z": -1.3341979
},
"type": "RIGHT_EYE_PUPIL"
},
{
"position": {
"x": 202.4563,
"y": 66.06882,
"z": -8.493092
},
"type": "LEFT_EYEBROW_UPPER_MIDPOINT"
},
{
"position": {
"x": 280.76108,
"y": 54.08935,
"z": -7.895889
},
"type": "RIGHT_EYEBROW_UPPER_MIDPOINT"
},
{
"position": {
"x": 168.31839,
"y": 134.46411,
"z": 89.73161
},
"type": "LEFT_EAR_TRAGION"
},
{
"position": {
"x": 332.23724,
"y": 109.35637,
"z": 90.81501
},
"type": "RIGHT_EAR_TRAGION"
},
{
"position": {
"x": 242.81676,
"y": 67.845825,
"z": -16.629877
},
"type": "FOREHEAD_GLABELLA"
},
{
"position": {
"x": 264.32065,
"y": 208.95119,
"z": -4.0186276
},
"type": "CHIN_GNATHION"
},
{
"position": {
"x": 183.4723,
"y": 179.30655,
"z": 59.87147
},
"type": "CHIN_LEFT_GONION"
},
{
"position": {
"x": 331.6927,
"y": 156.69931,
"z": 60.93835
},
"type": "CHIN_RIGHT_GONION"
}
],
"panAngle": 0.41165036,
"rollAngle": -8.687789,
"sorrowLikelihood": "VERY_UNLIKELY",
"surpriseLikelihood": "VERY_UNLIKELY",
"tiltAngle": 0.2050134,
"underExposedLikelihood": "POSSIBLE"
}
]
}
]
}
Yes, it's an eyesore to look at. I only want to extract the likelihoods, preferably in this format:
Anger likelihood is UNLIKELY
Joy likelihood is VERY_UNLIKELY
Sorrow likelihood is VERY_UNLIKELY
Surprise likelihood is VERY_UNLIKELY
Python code can be found here:
https://github.com/DexterInd/GoogleVisionTutorials/blob/master/camera-vision-face.py

Answered my own question, in perhaps the noobiest way:
print("Anger likelihood is:", response['responses'][0]['faceAnnotations'][0]['angerLikelihood'])
print("Joy likelihood is:", response['responses'][0]['faceAnnotations'][0]['joyLikelihood'])
print("Sorrow likelihood is:", response['responses'][0]['faceAnnotations'][0]['sorrowLikelihood'])
print("Surprise likelihood is:", response['responses'][0]['faceAnnotations'][0]['surpriseLikelihood'])
Came out looking like:
Anger likelihood is: VERY_UNLIKELY
Joy likelihood is: VERY_LIKELY
Sorrow likelihood is: VERY_UNLIKELY
Surprise likelihood is: VERY_UNLIKELY
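
The same idea without the repetition, a small sketch assuming the parsed response dict named response as above:
# Same output, looping over the four emotion names instead of repeating the lookup.
annotations = response['responses'][0]['faceAnnotations'][0]
for emotion in ('anger', 'joy', 'sorrow', 'surprise'):
    print('{} likelihood is {}'.format(emotion.capitalize(),
                                       annotations[emotion + 'Likelihood']))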

You can go with dictionary comprehensions. Given that you have your response in variable result, the following code will output exactly what you want.
import json

likelihood = {
    attr[:len(attr) - 10].capitalize(): value
    for attr, value
    in json.loads(result)['responses'][0]['faceAnnotations'][0].items()
    if attr.find('Likelihood') != -1
}

print(*[
    '{} likelihood is {}'.format(e, p) for e, p in likelihood.items()
], sep='\n')
Keep in mind that this code works correctly only if there is exactly one item in each of the responses and faceAnnotations arrays. If there are more, the code will handle only the first items. It's also kinda ugly.
In len(attr) - 10, 10 is the length of the word "Likelihood".
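A variant of the same comprehension that avoids the magic number, assuming Python 3.9+ for str.removesuffix:
# Same result, without hard-coding the suffix length.
face = json.loads(result)['responses'][0]['faceAnnotations'][0]
likelihood = {
    attr.removesuffix('Likelihood').capitalize(): value
    for attr, value in face.items()
    if attr.endswith('Likelihood')
}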

Related

Extract events data from GA4 via bigquery in ADF synapse delta table

We need to extract the events table from GA4 through BigQuery (not by connecting via the Google API directly, as it limits both the number of rows and the number of dimensions/metrics). However, as there are several nested columns, ADF reads the data in the following format:
{
"v": [{
"v": {
"f": [{
"v": "firebase_conversion"
}, {
"v": {
"f": [{
"v": null
}, {
"v": "0"
}, {
"v": null
}, {
"v": null
}]
}
}]
}
}, {
"v": {
"f": [{
"v": "ga_session_id"
}, {
"v": {
"f": [{
"v": null
}, {
"v": "123"
}, {
"v": null
}, {
"v": null
}]
}
}]
}
}, {
"v": {
"f": [{
"v": "engaged_session_event"
}, {
"v": {
"f": [{
"v": null
}, {
"v": "1"
}, {
"v": null
}, {
"v": null
}]
}
}]
}
}, {
"v": {
"f": [{
"v": "ga_session_number"
}, {
"v": {
"f": [{
"v": null
}, {
"v": "9"
}, {
"v": null
}, {
"v": null
}]
}
}]
}
}, {
"v": {
"f": [{
"v": "page_referrer"
}, {
"v": {
"f": [{
"v": "ABC"
}, {
"v": null
}, {
"v": null
}, {
"v": null
}]
}
}]
}
}, {
"v": {
"f": [{
"v": "page_title"
}, {
"v": {
"f": [{
"v": "ABC"
}, {
"v": null
}, {
"v": null
}, {
"v": null
}]
}
}]
}
}, {
"v": {
"f": [{
"v": "page_location"
}, {
"v": {
"f": [{
"v": "xyz"
}, {
"v": null
}, {
"v": null
}, {
"v": null
}]
}
}]
}
}, {
"v": {
"f": [{
"v": "session_engaged"
}, {
"v": {
"f": [{
"v": null
}, {
"v": "1"
}, {
"v": null
}, {
"v": null
}]
}
}]
}
}]
}
Unnesting is a problem because several columns have this data structure, and UNNEST increases the number of rows (3.5mn records become 40mn). The plan is to extract the data as is and use Azure Functions with Python to flatten it as JSON, but again the null values are creating trouble there.
Can someone suggest the best way to get the data into the data lake on a daily basis, in the desired format, without this row explosion?
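Not a full pipeline answer, but for the flattening step, here is a minimal Python sketch of what an Azure Function could do with this structure. It assumes the shape shown above (every value wrapped in {"v": ...}, every struct in {"f": [...]}, and each event param holding a key plus a four-slot value struct where at most one slot is non-null); unwrap, params_to_dict, and the file name are hypothetical:
import json

def unwrap(node):
    # Peel off BigQuery's JSON export wrappers: {"v": ...} around every
    # value and {"f": [...]} around every struct.
    if isinstance(node, dict):
        if 'v' in node:
            return unwrap(node['v'])
        if 'f' in node:
            return [unwrap(field) for field in node['f']]
    if isinstance(node, list):
        return [unwrap(item) for item in node]
    return node

def params_to_dict(raw):
    # Each event param unwraps to [key, [string, int, float, double]];
    # keeping the first non-null slot also disposes of the nulls.
    return {key: next((v for v in values if v is not None), None)
            for key, values in unwrap(raw)}

with open('event_params.json') as f:   # hypothetical file holding the sample above
    print(params_to_dict(json.load(f)))
# {'firebase_conversion': '0', 'ga_session_id': '123', 'engaged_session_event': '1', ...}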

does `transform_lookup` save space?

I am trying to link several Altair charts that share aspects of the same data. I can do this by merging all the data into one data frame, but because of the nature of the data, the merged data frame is much larger than the two separate data frames for the two charts would be. This is because the columns unique to each chart have many repeated rows for each entry in the shared column.
Would using transform_lookup save space over just using the merged data frame, or does transform_lookup end up doing the whole merge internally?
No, the entire dataset is still included in the Vega spec when you use transform_lookup. You can see this by printing the JSON spec of the charts you create. With the example from the docs:
import altair as alt
import pandas as pd
from vega_datasets import data
people = data.lookup_people().head(3)
people
     name  age  height
0    Alan   25     180
1  George   32     174
2    Fred   39     182
groups = data.lookup_groups().head(3)
groups
   group  person
0      1    Alan
1      1  George
2      1    Fred
With pandas merge:
merged = pd.merge(groups, people, how='left',
                  left_on='person', right_on='name')
print(alt.Chart(merged).mark_bar().encode(
    x='mean(age):Q',
    y='group:O'
).to_json())
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
"config": {
"view": {
"continuousHeight": 300,
"continuousWidth": 400
}
},
"data": {
"name": "data-b41b97ffc89b39c92e168871d447e720"
},
"datasets": {
"data-b41b97ffc89b39c92e168871d447e720": [
{
"age": 25,
"group": 1,
"height": 180,
"name": "Alan",
"person": "Alan"
},
{
"age": 32,
"group": 1,
"height": 174,
"name": "George",
"person": "George"
},
{
"age": 39,
"group": 1,
"height": 182,
"name": "Fred",
"person": "Fred"
}
]
},
"encoding": {
"x": {
"aggregate": "mean",
"field": "age",
"type": "quantitative"
},
"y": {
"field": "group",
"type": "ordinal"
}
},
"mark": "bar"
}
With transform_lookup all the data is still there, just as two separate datasets (so technically it takes a little more space, with the additional braces and the transform):
print(alt.Chart(groups).mark_bar().encode(
    x='mean(age):Q',
    y='group:O'
).transform_lookup(
    lookup='person',
    from_=alt.LookupData(data=people, key='name',
                         fields=['age'])
).to_json())
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
"config": {
"view": {
"continuousHeight": 300,
"continuousWidth": 400
}
},
"data": {
"name": "data-5fe242a79352d1fe243b588af570c9c6"
},
"datasets": {
"data-2b374d1509415e1d327c3a7521f8117c": [
{
"age": 25,
"height": 180,
"name": "Alan"
},
{
"age": 32,
"height": 174,
"name": "George"
},
{
"age": 39,
"height": 182,
"name": "Fred"
}
],
"data-5fe242a79352d1fe243b588af570c9c6": [
{
"group": 1,
"person": "Alan"
},
{
"group": 1,
"person": "George"
},
{
"group": 1,
"person": "Fred"
}
]
},
"encoding": {
"x": {
"aggregate": "mean",
"field": "age",
"type": "quantitative"
},
"y": {
"field": "group",
"type": "ordinal"
}
},
"mark": "bar",
"transform": [
{
"from": {
"data": {
"name": "data-2b374d1509415e1d327c3a7521f8117c"
},
"fields": [
"age",
"height"
],
"key": "name"
},
"lookup": "person"
}
]
}
Where transform_lookup can save space is if you use it with the URLs of two datasets:
people = data.lookup_people.url
groups = data.lookup_groups.url
print(alt.Chart(groups).mark_bar().encode(
    x='mean(age):Q',
    y='group:O'
).transform_lookup(
    lookup='person',
    from_=alt.LookupData(data=people, key='name',
                         fields=['age'])
).to_json())
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
"config": {
"view": {
"continuousHeight": 300,
"continuousWidth": 400
}
},
"data": {
"url": "https://vega.github.io/vega-datasets/data/lookup_groups.csv"
},
"encoding": {
"x": {
"aggregate": "mean",
"field": "age",
"type": "quantitative"
},
"y": {
"field": "group",
"type": "ordinal"
}
},
"mark": "bar",
"transform": [
{
"from": {
"data": {
"url": "https://vega.github.io/vega-datasets/data/lookup_people.csv"
},
"fields": [
"age",
"height"
],
"key": "name"
},
"lookup": "person"
}
]
}
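If hosting the data at a URL isn't an option, Altair 4 (the version this v4.8.1 spec corresponds to) also ships a 'json' data transformer that writes each dataset to a local file and references it by URL in the spec instead of inlining the rows; a minimal sketch:
import altair as alt

# Each chart's data is written to disk as a .json file and referenced by
# URL, so the spec itself stays small even for large data frames.
alt.data_transformers.enable('json')
Note that the resulting charts only render where those generated file URLs resolve, e.g. in a notebook served from the same working directory.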

Find dictionaries with same values and add new keys to them

I have a question. I have a list of dictionaries containing information about words that were recognized from an image using the Google API. My list looks like:
test_list = [
{
"value": "68004,",
"location": {
"TL": {
"x": 351,
"y": 0
},
"TR": {
"x": 402,
"y": 0
},
"BR": {
"x": 402,
"y": 12
},
"BL": {
"x": 351,
"y": 12
}
},
"type": 1
},
{
"value": "Чорномор",
"location": {
"TL": {
"x": 415,
"y": 0
},
"TR": {
"x": 493,
"y": 0
},
"BR": {
"x": 493,
"y": 12
},
"BL": {
"x": 415,
"y": 12
}
},
"type": 1
},
{
"value": "вулиця,",
"location": {
"TL": {
"x": 495,
"y": 14
},
"TR": {
"x": 550,
"y": 10
},
"BR": {
"x": 551,
"y": 22
},
"BL": {
"x": 496,
"y": 26
}
},
"type": 1
},
{
"value": "140,",
"location": {
"TL": {
"x": 557,
"y": 8
},
"TR": {
"x": 576,
"y": 7
},
"BR": {
"x": 577,
"y": 20
},
"BL": {
"x": 558,
"y": 21
}
},
"type": 1
},
{
"value": "кв.",
"location": {
"TL": {
"x": 581,
"y": 6
},
"TR": {
"x": 605,
"y": 4
},
"BR": {
"x": 606,
"y": 21
},
"BL": {
"x": 582,
"y": 23
}
},
"type": 1
},
{
"value": "77",
"location": {
"TL": {
"x": 607,
"y": 5
},
"TR": {
"x": 628,
"y": 4
},
"BR": {
"x": 629,
"y": 19
},
"BL": {
"x": 608,
"y": 21
}
},
"type": 1
},
]
I want to find out if some dictionaries have the same location parameters and, if so, add a new key "string_number" with the same index to those dictionaries. For example, in the code above the first two dictionaries have the same ["location"]["TL"]["y"] and ["location"]["TR"]["y"] == 0px, and also the same ["location"]["BR"]["y"] and ["location"]["BL"]["y"] == 12px.
That means these words are placed on one line in the real document, so I want to add the new key "string_number" with index 0 to them. That will look like:
test_list = [
{
"value": "68004,",
"location": {
"TL": {
"x": 351,
"y": 0
},
"TR": {
"x": 402,
"y": 0
},
"BR": {
"x": 402,
"y": 12
},
"BL": {
"x": 351,
"y": 12
}
},
"type": 1
"string_number": 0
},
{
"value": "Чорномор",
"location": {
"TL": {
"x": 415,
"y": 0
},
"TR": {
"x": 493,
"y": 0
},
"BR": {
"x": 493,
"y": 12
},
"BL": {
"x": 415,
"y": 12
}
},
"type": 1
"string_number": 0
},
Then, going through the rest of the list, I want to find every such duplication and set the same string index on them.
However, sometimes the pixels can differ by 1-2 points (for example, "y": 12 or 10 or 14 probably still means the word is on the same line in the document). Is it possible to make an extra check for this difference?
EDIT: With the help of Aleksa Svitlica I created a class that does all the work of searching for words on the same line.
It looks like:
import io
import json
# (calc_paths and PathType are defined elsewhere in my project)

class WordParser():
    def __init__(self):
        self.list_wbw = self.load_json()
        self.next_new_string_number = 0

    def load_json(self):
        with io.open(calc_paths(status="now", path_type=PathType.OCR_JSON_WBW), 'r', encoding='utf-8') as content:
            self.list_wbw = json.load(content)
        return self.list_wbw

    def mark_images_on_same_line(self):
        number_of_images = len(self.list_wbw)
        for i in range(number_of_images):
            for j in range(i + 1, number_of_images):
                image1 = self.list_wbw[i]
                image2 = self.list_wbw[j]
                on_same_line = self._check_if_images_on_same_line(image1, image2)
                if on_same_line:
                    self._add_string_number_to_images(image1, image2)

    def print_images(self):
        print(json.dumps(self.list_wbw, indent=3, sort_keys=False, ensure_ascii=False))

    def _check_if_images_on_same_line(self, image1, image2):
        image1_top_left = image1["location"]["TL"]["y"]
        image1_top_right = image1["location"]["TR"]["y"]
        image1_bot_left = image1["location"]["BL"]["y"]
        image1_bot_right = image1["location"]["BR"]["y"]
        image2_top_left = image2["location"]["TL"]["y"]
        image2_top_right = image2["location"]["TR"]["y"]
        image2_bot_left = image2["location"]["BL"]["y"]
        image2_bot_right = image2["location"]["BR"]["y"]
        same_top_left_position = self._pixel_heights_match_within_threshold(image1_top_left, image2_top_left)
        same_top_right_position = self._pixel_heights_match_within_threshold(image1_top_right, image2_top_right)
        same_bot_left_position = self._pixel_heights_match_within_threshold(image1_bot_left, image2_bot_left)
        same_bot_right_position = self._pixel_heights_match_within_threshold(image1_bot_right, image2_bot_right)
        # Return the result so the `if on_same_line:` check in
        # mark_images_on_same_line actually fires
        return (same_top_left_position and same_top_right_position
                and same_bot_left_position and same_bot_right_position)

    def _add_string_number_to_images(self, image1, image2):
        string_number = self._determine_string_number(image1, image2)
        image1["string_number"] = string_number
        image2["string_number"] = string_number

    def _determine_string_number(self, image1, image2):
        string_number = self.next_new_string_number
        image1_number = image1.get("string_number")
        image2_number = image2.get("string_number")
        if image1_number is not None:
            string_number = image1_number
        elif image2_number is not None:
            string_number = image2_number
        else:
            self.next_new_string_number += 1
        return string_number

    def _pixel_heights_match_within_threshold(self, height1, height2, threshold=4):
        return abs(height1 - height2) <= threshold
And in my another module, where I call these methods:
word_parser = WordParser()
word_parser.mark_images_on_same_line()
word_parser.print_images()
Adding the following code after your test_list, I got the output you can see below. My code currently just checks that the heights of TL and BL are within a threshold (defaulting to 2 pixels), but you could modify it depending on your requirements. In _check_if_images_on_same_line you can change the rules as you like.
import json
#-------------------------------------------------------------------
#---Classes---------------------------------------------------------
#-------------------------------------------------------------------
class ImageParser():
    def __init__(self, list_of_images):
        self.list_of_images = list_of_images
        self.next_new_string_number = 0

    # ----------------------------------------------------------------------------
    # ---Public-------------------------------------------------------------------
    # ----------------------------------------------------------------------------
    def mark_images_on_same_line(self):
        number_of_images = len(self.list_of_images)
        for i in range(number_of_images):
            for j in range(i+1, number_of_images):
                image1 = self.list_of_images[i]
                image2 = self.list_of_images[j]
                on_same_line = self._check_if_images_on_same_line(image1, image2)
                if on_same_line:
                    self._add_string_number_to_images(image1, image2)

    def print_images(self):
        print(json.dumps(self.list_of_images, indent=True, sort_keys=False, ensure_ascii=False))

    # ----------------------------------------------------------------------------
    # ---Private------------------------------------------------------------------
    # ----------------------------------------------------------------------------
    def _check_if_images_on_same_line(self, image1, image2):
        image1_top = image1["location"]["TL"]["y"]
        image1_bot = image1["location"]["BL"]["y"]
        image2_top = image2["location"]["TL"]["y"]
        image2_bot = image2["location"]["BL"]["y"]
        same_top_position = self._pixel_heights_match_within_threshold(image1_top, image2_top)
        same_bot_position = self._pixel_heights_match_within_threshold(image1_bot, image2_bot)
        # Return the result (using `and`, not bitwise `&`) so the caller's
        # `if on_same_line:` check actually fires
        return same_bot_position and same_top_position

    def _add_string_number_to_images(self, image1, image2):
        string_number = self._determine_string_number(image1, image2)
        image1["string_number"] = string_number
        image2["string_number"] = string_number

    def _determine_string_number(self, image1, image2):
        string_number = self.next_new_string_number
        image1_number = image1.get("string_number")
        image2_number = image2.get("string_number")
        if image1_number is not None:
            string_number = image1_number
        elif image2_number is not None:
            string_number = image2_number
        else:
            self.next_new_string_number += 1
        return string_number

    def _pixel_heights_match_within_threshold(self, height1, height2, threshold=2):
        return abs(height1 - height2) <= threshold

#-------------------------------------------------------------------
#---Main------------------------------------------------------------
#-------------------------------------------------------------------
if __name__ == "__main__":
    image_parser = ImageParser(test_list)
    image_parser.mark_images_on_same_line()
    image_parser.print_images()
Gives the following results:
[
{
"value": "68004,",
"location": {
"TL": {
"x": 351,
"y": 0
},
"TR": {
"x": 402,
"y": 0
},
"BR": {
"x": 402,
"y": 12
},
"BL": {
"x": 351,
"y": 12
}
},
"type": 1,
"string_number": 0
},
{
"value": "Чорномор",
"location": {
"TL": {
"x": 415,
"y": 0
},
"TR": {
"x": 493,
"y": 0
},
"BR": {
"x": 493,
"y": 12
},
"BL": {
"x": 415,
"y": 12
}
},
"type": 1,
"string_number": 0
},
{
"value": "вулиця,",
"location": {
"TL": {
"x": 495,
"y": 14
},
"TR": {
"x": 550,
"y": 10
},
"BR": {
"x": 551,
"y": 22
},
"BL": {
"x": 496,
"y": 26
}
},
"type": 1
},
{
"value": "140,",
"location": {
"TL": {
"x": 557,
"y": 8
},
"TR": {
"x": 576,
"y": 7
},
"BR": {
"x": 577,
"y": 20
},
"BL": {
"x": 558,
"y": 21
}
},
"type": 1,
"string_number": 1
},
{
"value": "кв.",
"location": {
"TL": {
"x": 581,
"y": 6
},
"TR": {
"x": 605,
"y": 4
},
"BR": {
"x": 606,
"y": 21
},
"BL": {
"x": 582,
"y": 23
}
},
"type": 1,
"string_number": 1
},
{
"value": "77",
"location": {
"TL": {
"x": 607,
"y": 5
},
"TR": {
"x": 628,
"y": 4
},
"BR": {
"x": 629,
"y": 19
},
"BL": {
"x": 608,
"y": 21
}
},
"type": 1,
"string_number": 1
}
]

Adding a key/value pair once I have recursively searched a dict

I have searched a nested dict for certain keys and succeeded in locating the keys I am looking for, but I am not sure how I can now add a key/value pair at the location of the key I found. Is there a way to tell Python to append the data entry at the location it is currently looking at?
Code:
import os
import json
import shutil
import re
import fileinput
from collections import OrderedDict
#Finds and lists the folders that have been provided
d='.'
folders = list(filter (lambda x: os.path.isdir(os.path.join(d, x)), os.listdir(d)))
print("Folders found: ")
print(folders)
print("\n")
def processModelFolder(inFolder):
    #Creating the file names
    fileName = os.path.join(d, inFolder, inFolder + ".mdl")
    fileNameTwo = os.path.join(d, inFolder, inFolder + ".vg2.json")
    fileNameThree = os.path.join(d, inFolder, inFolder + "APPENDED.vg2.json")
    #copying the json file so the new copy can be appended
    shutil.copyfile(fileNameTwo, fileNameThree)
    #assigning IDs and properties to search for in the mdl file
    IDs = ["7f034e5c-24df-4145-bab8-601f49b43b50"]
    Properties = ["IDSU_FX[0]"]
    #Basic check to see if IDs and Properties are valid
    for i in IDs:
        if len(i) != 36:
            print("ID may not have been valid and might not return the results you expect, check to ensure the characters are correct: ")
            print(i)
            print("\n")
    if len(IDs) == 0:
        print("No IDs were given!")
    elif len(Properties) == 0:
        print("No Properties were given!")
    #Reads code until an ID is found
    else:
        with open(fileName, "r") as in_file:
            IDCO = None
            for n, line in enumerate(in_file, 1):
                if line.startswith('IDCO_IDENTIFICATION'):
                    #Checks if the second part of each line is an ID tag in IDs
                    if line.split('"')[1] in IDs:
                        #If the ID is found it is stored as IDCO
                        IDCO = line.split('"')[1]
                    else:
                        if IDCO:
                            pass
                        IDCO = None
                #Checks if the first part of each line is a Prop in Properties
                elif IDCO and line.split(' ')[0] in Properties:
                    print('Found! ID:{} Prop:{} Value: {}'.format(IDCO, line.split('=')[0][:-1], line.split('=')[1][:-1]))
                    print("\n")
                    #Stores the property name and value
                    name = str(line.split(' ')[0])
                    value = str(line.split(' ')[2])
                    #creates the entry to be appended to the dict
                    #json file editing
                    with open(fileNameThree, "r+") as json_data:
                        python_obj = json.load(json_data)
                        #calling recursive search
                        get_recursively(python_obj, IDCO, name, value)
                    with open(fileNameThree, "w") as json_data:
                        json.dump(python_obj, json_data, indent=1)
            print('Processed {} lines in file: {}'.format(n, fileName))

def get_recursively(search_dict, IDCO, name, value):
    """
    Takes a dict with nested lists and dicts,
    and searches all dicts for a key of the field
    provided; when key "id" is found it checks to
    see if its value is the current IDCO tag, and if so it appends the new data.
    """
    fields_found = []
    for key, value in search_dict.iteritems():
        if key == "id":
            if value == IDCO:
                print("FOUND IDCO IN JSON: " + value + "\n")
        elif isinstance(value, dict):
            results = get_recursively(value, IDCO, name, value)
            for result in results:
                x = 1
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    more_results = get_recursively(item, IDCO, name, value)
                    for another_result in more_results:
                        x = 1
    return fields_found

for modelFolder in folders:
    processModelFolder(modelFolder)
In short, once it finds a key/id value pair that I want, can I tell it to append name/value to that location directly and then continue?
nested dict:
{
"id": "79cb20b0-02be-42c7-9b45-96407c888dc2",
"tenantId": "00000000-0000-0000-0000-000000000000",
"name": "2-stufiges Stirnradgetriebe",
"description": null,
"visibility": "None",
"method": "IDM_CALCULATE_GEAR_COUPLED",
"created": "2018-10-16T10:25:20.874Z",
"createdBy": "00000000-0000-0000-0000-000000000000",
"lastModified": "2018-10-16T10:25:28.226Z",
"lastModifiedBy": "00000000-0000-0000-0000-000000000000",
"client": "STRING_BEARINX_ONLINE",
"project": {
"id": "10c37dcc-0e4e-4c4d-a6d6-12cf65cceaf9",
"name": "proj 2",
"isBookmarked": false
},
"rootObject": {
"id": "6ff0010c-00fe-485b-b695-4ddd6aca4dcd",
"type": "IDO_GEAR",
"children": [
{
"id": "1dd94d1a-e52d-40b3-a82b-6db02a8fbbab",
"type": "IDO_SYSTEM_LOADCASE",
"children": [],
"childList": "SYSTEMLOADCASE",
"properties": [
{
"name": "IDCO_IDENTIFICATION",
"value": "1dd94d1a-e52d-40b3-a82b-6db02a8fbbab"
},
{
"name": "IDCO_DESIGNATION",
"value": "Lastfall 1"
},
{
"name": "IDSLC_TIME_PORTION",
"value": 100
},
{
"name": "IDSLC_DISTANCE_PORTION",
"value": 100
},
{
"name": "IDSLC_OPERATING_TIME_IN_HOURS",
"value": 1
},
{
"name": "IDSLC_OPERATING_TIME_IN_SECONDS",
"value": 3600
},
{
"name": "IDSLC_OPERATING_REVOLUTIONS",
"value": 1
},
{
"name": "IDSLC_OPERATING_DISTANCE",
"value": 1
},
{
"name": "IDSLC_ACCELERATION",
"value": 9.81
},
{
"name": "IDSLC_EPSILON_X",
"value": 0
},
{
"name": "IDSLC_EPSILON_Y",
"value": 0
},
{
"name": "IDSLC_EPSILON_Z",
"value": 0
},
{
"name": "IDSLC_CALCULATION_WITH_OWN_WEIGHT",
"value": "CO_CALCULATION_WITHOUT_OWN_WEIGHT"
},
{
"name": "IDSLC_CALCULATION_WITH_TEMPERATURE",
"value": "CO_CALCULATION_WITH_TEMPERATURE"
},
{
"name": "IDSLC_FLAG_FOR_LOADCASE_CALCULATION",
"value": "LB_CALCULATE_LOADCASE"
},
{
"name": "IDSLC_STATUS_OF_LOADCASE_CALCULATION",
"value": false
}
],
"position": 1,
"order": 1,
"support_vector": {
"x": 0,
"y": 0,
"z": 0
},
"u_axis_vector": {
"x": 1,
"y": 0,
"z": 0
},
"w_axis_vector": {
"x": 0,
"y": 0,
"z": 1
},
"role": "_none_"
},
{
"id": "ab7fbf37-17bb-4e60-a543-634571a0fd73",
"type": "IDO_SHAFT_SYSTEM",
"children": [
{
"id": "7f034e5c-24df-4145-bab8-601f49b43b50",
"type": "IDO_RADIAL_ROLLER_BEARING",
"children": [
{
"id": "0b3e695b-6028-43af-874d-4826ab60dd3f",
"type": "IDO_RADIAL_BEARING_INNER_RING",
"children": [
{
"id": "330aa09d-60fb-40d7-a190-64264b3d44b7",
"type": "IDO_LOADCONTAINER",
"children": [
{
"id": "03036040-fc1a-4e52-8a69-d658e18a8d4a",
"type": "IDO_DISPLACEMENT",
"children": [],
"childList": "DISPLACEMENT",
"properties": [
{
"name": "IDCO_IDENTIFICATION",
"value": "03036040-fc1a-4e52-8a69-d658e18a8d4a"
},
{
"name": "IDCO_DESIGNATION",
"value": "Displacement 1"
}
],
"position": 1,
"order": 1,
"support_vector": {
"x": -201.3,
"y": 0,
"z": -229.8
},
"u_axis_vector": {
"x": 1,
"y": 0,
"z": 0
},
"w_axis_vector": {
"x": 0,
"y": 0,
"z": 1
},
"shaftSystemId": "ab7fbf37-17bb-4e60-a543-634571a0fd73",
"role": "_none_"
},
{
"id": "485f5bf4-fb97-415b-8b42-b46e9be080da",
"type": "IDO_CUMULATED_LOAD",
"children": [],
"childList": "CUMULATEDLOAD",
"properties": [
{
"name": "IDCO_IDENTIFICATION",
"value": "485f5bf4-fb97-415b-8b42-b46e9be080da"
},
{
"name": "IDCO_DESIGNATION",
"value": "Cumulated load 1"
},
{
"name": "IDCO_X",
"value": 0
},
{
"name": "IDCO_Y",
"value": 0
},
{
"name": "IDCO_Z",
"value": 0
}
],
"position": 2,
"order": 1,
"support_vector": {
"x": -201.3,
"y": 0,
"z": -229.8
},
"u_axis_vector": {
"x": 1,
"y": 0,
"z": 0
},
"w_axis_vector": {
"x": 0,
"y": 0,
"z": 1
},
"shaftSystemId": "ab7fbf37-17bb-4e60-a543-634571a0fd73",
"role": "_none_"
}
],
"childList": "LOADCONTAINER",
"properties": [
{
"name": "IDCO_IDENTIFICATION",
"value": "330aa09d-60fb-40d7-a190-64264b3d44b7"
},
{
"name": "IDCO_DESIGNATION",
"value": "Load container 1"
},
{
"name": "IDLC_LOAD_DISPLACEMENT_COMBINATION",
"value": "LOAD_MOMENT"
},
{
"name": "IDLC_TYPE_OF_MOVEMENT",
"value": "LB_ROTATING"
},
{
"name": "IDLC_NUMBER_OF_ARRAY_ELEMENTS",
"value": 20
}
],
"position": 1,
"order": 1,
"support_vector": {
"x": -201.3,
"y": 0,
"z": -229.8
},
"u_axis_vector": {
"x": 1,
"y": 0,
"z": 0
},
"w_axis_vector": {
"x": 0,
"y": 0,
"z": 1
},
"shaftSystemId": "ab7fbf37-17bb-4e60-a543-634571a0fd73",
"role": "_none_"
},
{
"id": "3258d217-e6e4-4a5c-8677-ae1fca26f21e",
"type": "IDO_RACEWAY",
"children": [],
"childList": "RACEWAY",
"properties": [
{
"name": "IDCO_IDENTIFICATION",
"value": "3258d217-e6e4-4a5c-8677-ae1fca26f21e"
},
{
"name": "IDCO_DESIGNATION",
"value": "Raceway 1"
},
{
"name": "IDRCW_UPPER_DEVIATION_RACEWAY_DIAMETER",
"value": 0
},
{
"name": "IDRCW_LOWER_DEVIATION_RACEWAY_DIAMETER",
"value": 0
},
{
"name": "IDRCW_PROFILE_OFFSET",
"value": 0
},
{
"name": "IDRCW_PROFILE_ANGLE",
"value": 0
},
{
"name": "IDRCW_PROFILE_CURVATURE_RADIUS",
"value": 0
},
{
"name": "IDRCW_PROFILE_CENTER_POINT_OFFSET",
"value": 0
},
{
"name": "IDRCW_PROFILE_NUMBER_OF_WAVES",
"value": 0
},
{
"name": "IDRCW_PROFILE_AMPLITUDE",
"value": 0
},
{
"name": "IDRCW_PROFILE_POSITION_OF_FIRST_WAVE",
"value": 0
},
Bug
First of all, rename the value variable, because you have a value variable as the method argument and another value variable with the same name when iterating over the dictionary:
for key, value in search_dict.iteritems(): # <-- RENAME value TO SOMETHING ELSE, LIKE val
Otherwise you will have bugs, because the value from the dictionary shadows the new value which you want to insert. If you iterate like for key, val in, then you can actually use the outer value variable.
Adding The Value Pair
It seems id is a key inside your search_dict, but reading your JSON file your search_dict may have several nested lists like properties and/or children, so it depends on where you want to add the new pair.
If you want to add it to the same dictionary where your id is:
if key == "id":
if value == IDCO:
print("FOUND IDCO IN JSON: " + value +"\n")
search_dict[name] = value
Result:
{
"id": "3258d217-e6e4-4a5c-8677-ae1fca26f21e",
"type": "IDO_RACEWAY",
"children": [],
"childList": "RACEWAY",
"<new name>": "<new value>",
"properties": [
{
"name": "IDCO_IDENTIFICATION",
"value": "3258d217-e6e4-4a5c-8677-ae1fca26f21e"
},
If you want to add it to the children or properties list inside the dictionary where id is:
if key == "id":
if value == IDCO:
print("FOUND IDCO IN JSON: " + value +"\n")
if search_dict.has_key("properties"): # you can swap "properties" to "children", depends on your use case
search_dict["properties"].append({"name": name, "value": value}) # a new dictionary with 'name' and 'value' keys
Result:
{
"id": "3258d217-e6e4-4a5c-8677-ae1fca26f21e",
"type": "IDO_RACEWAY",
"children": [],
"childList": "RACEWAY",
"properties": [
{
"name": "IDCO_IDENTIFICATION",
"value": "3258d217-e6e4-4a5c-8677-ae1fca26f21e"
},
{
"name": "<new name>",
"value": "<new value>"
},
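Putting both fixes together, a minimal consolidated sketch of the corrected function, assuming Python 3 (items() instead of iteritems()) and the first option of adding the pair to the dictionary that holds the matching id:
def get_recursively(search_dict, IDCO, name, val):
    # `val` instead of `value`, so the argument isn't shadowed by the
    # loop variable below; list(...) so inserting a key mid-iteration
    # doesn't raise "dictionary changed size during iteration".
    for key, value in list(search_dict.items()):
        if key == "id" and value == IDCO:
            print("FOUND IDCO IN JSON: " + value + "\n")
            search_dict[name] = val  # add the pair where the matching id lives
        elif isinstance(value, dict):
            get_recursively(value, IDCO, name, val)
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    get_recursively(item, IDCO, name, val)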

Parsing rekognition get_face_search results

I am trying to parse out face matches from the results of the get_face_search() AWS Rekognition API. It outputs an array of Persons; within that array is another array of FaceMatches for a given person and timestamp. I want to take information from the FaceMatches array and loop through it.
I have done something similar before for single arrays and looped successfully, but perhaps I am missing something trivial here.
Here is output from API:
Response:
{
"JobStatus": "SUCCEEDED",
"NextToken": "U5EdbZ+86xseDBfDlQ2u8QhSVzbdodDOmX/gSbwIgeO90l2BKWvJEscjUDmA6GFDCSSfpKA4",
"VideoMetadata": {
"Codec": "h264",
"DurationMillis": 6761,
"Format": "QuickTime / MOV",
"FrameRate": 30.022184371948242,
"FrameHeight": 568,
"FrameWidth": 320
},
"Persons": [
{
"Timestamp": 0,
"Person": {
"Index": 0,
"BoundingBox": {
"Width": 0.987500011920929,
"Height": 0.7764084339141846,
"Left": 0.0031250000465661287,
"Top": 0.2042253464460373
},
"Face": {
"BoundingBox": {
"Width": 0.6778846383094788,
"Height": 0.3819068372249603,
"Left": 0.10096154361963272,
"Top": 0.2654387652873993
},
"Landmarks": [
{
"Type": "eyeLeft",
"X": 0.33232420682907104,
"Y": 0.4194057583808899
},
{
"Type": "eyeRight",
"X": 0.5422032475471497,
"Y": 0.41616082191467285
},
{
"Type": "nose",
"X": 0.45633792877197266,
"Y": 0.4843473732471466
},
{
"Type": "mouthLeft",
"X": 0.37002310156822205,
"Y": 0.567118763923645
},
{
"Type": "mouthRight",
"X": 0.5330674052238464,
"Y": 0.5631639361381531
}
],
"Pose": {
"Roll": -2.2475271224975586,
"Yaw": 4.371307373046875,
"Pitch": 6.83940315246582
},
"Quality": {
"Brightness": 40.40004348754883,
"Sharpness": 99.95819854736328
},
"Confidence": 99.87971496582031
}
},
"FaceMatches": [
{
"Similarity": 99.81229400634766,
"Face": {
"FaceId": "4699a1eb-9f6e-415d-8716-eef141d23433a",
"BoundingBox": {
"Width": 0.6262923432480737,
"Height": 0.46972032423490747,
"Left": 0.130435005324523403604,
"Top": 0.13354002343240603
},
"ImageId": "1ac790eb-615a-111f-44aa-4017c3c315ad",
"Confidence": 99.19400024414062
}
}
]
},
{
"Timestamp": 66,
"Person": {
"Index": 0,
"BoundingBox": {
"Width": 0.981249988079071,
"Height": 0.7764084339141846,
"Left": 0.0062500000931322575,
"Top": 0.2042253464460373
}
}
},
{
"Timestamp": 133,
"Person": {
"Index": 0,
"BoundingBox": {
"Width": 0.9781249761581421,
"Height": 0.783450722694397,
"Left": 0.0062500000931322575,
"Top": 0.19894365966320038
}
}
},
{
"Timestamp": 199,
"Person": {
"Index": 0,
"BoundingBox": {
"Width": 0.981249988079071,
"Height": 0.783450722694397,
"Left": 0.0031250000465661287,
"Top": 0.19894365966320038
},
"Face": {
"BoundingBox": {
"Width": 0.6706730723381042,
"Height": 0.3778440058231354,
"Left": 0.10817307233810425,
"Top": 0.26679307222366333
},
"Landmarks": [
{
"Type": "eyeLeft",
"X": 0.33244985342025757,
"Y": 0.41591548919677734
},
{
"Type": "eyeRight",
"X": 0.5446155667304993,
"Y": 0.41204410791397095
},
{
"Type": "nose",
"X": 0.4586191177368164,
"Y": 0.479543000459671
},
{
"Type": "mouthLeft",
"X": 0.37614554166793823,
"Y": 0.5639738440513611
},
{
"Type": "mouthRight",
"X": 0.5334802865982056,
"Y": 0.5592300891876221
}
],
"Pose": {
"Roll": -2.4899401664733887,
"Yaw": 3.7596628665924072,
"Pitch": 6.3544135093688965
},
"Quality": {
"Brightness": 40.46360778808594,
"Sharpness": 99.95819854736328
},
"Confidence": 99.89802551269531
}
},
"FaceMatches": [
{
"Similarity": 99.80543518066406,
"Face": {
"FaceId": "4699a1eb-9f6e-415d-8716-eef141d9223a",
"BoundingBox": {
"Width": 0.626294234234737,
"Height": 0.469234234890747,
"Left": 0.130435002334234604,
"Top": 0.13354023423449180603
},
"ImageId": "1ac790eb-615a-111f-44aa-4017c3c315ad",
"Confidence": 99.19400024414062
}
}
]
},
{
"Timestamp": 266,
"Person": {
"Index": 0,
"BoundingBox": {
"Width": 0.984375,
"Height": 0.7852112650871277,
"Left": 0,
"Top": 0.19718310236930847
}
}
}
],
I have isolated the timestamps (just testing my approach) using the following:
timestamps = [m['Timestamp'] for m in response['Persons']]
Output is this, as expected - [0, 66, 133, 199, 266]
However, when I try the same thing with FaceMatches, I get an error.
[0, 66, 133, 199, 266]
list indices must be integers or slices, not str: TypeError
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 40, in lambda_handler
matches = [m['FaceMatches']['Face']['FaceId'] for m in response['Persons']]
File "/var/task/lambda_function.py", line 40, in <listcomp>
matches = [m['FaceMatches']['Face']['FaceId'] for m in response['Persons']]
TypeError: list indices must be integers or slices, not str
What I need to end up with is for each face that is matched:
Timestamp
FaceID
Similarity
Can anybody shed some light on this for me?
According to your needs, you have two FaceMatch objects in your response, and you can extract the required info this way:
import json

with open('newtest.json') as f:
    data = json.load(f)

length = len(data['Persons'])
for i in range(0, length):
    try:
        print(data['Persons'][i]['FaceMatches'][0]['Similarity'])
        print(data['Persons'][i]['FaceMatches'][0]['Face']['FaceId'])
        print(data['Persons'][i]['Timestamp'])
    except:
        continue
I have loaded your JSON object into the data variable, and I have ignored timestamps where there is no corresponding face match; if you wish, you can extract those the same way.
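A variant of the same idea that avoids the bare except, assuming the same data dict loaded above: person.get('FaceMatches', []) lets persons without matches fall through naturally, and each match comes out as a (Timestamp, FaceId, Similarity) triple:
# Same data, no bare except: persons without FaceMatches just yield nothing.
matches = []
for person in data['Persons']:
    for match in person.get('FaceMatches', []):
        matches.append((person['Timestamp'],
                        match['Face']['FaceId'],
                        match['Similarity']))

for timestamp, face_id, similarity in matches:
    print('{}ms: {} ({:.2f}%)'.format(timestamp, face_id, similarity))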
