Adding a list into a list in Python

I'm receiving JSON back from an API call and want to log when a keyword has been detected; the API may return none, one, or several. I'm able to log the data that comes back no problem.
I want to run 1000s of requests and then have each result logged as a list of results within a list, so I know which inner list corresponds to which API call.
for item in response['output']['keywords']:
    TempEntityList = []
    TempEntityList.append(item['keywords'])
    EntityList.extend(TempEntityList)
    TempEntityList = []
This does append everything to a list, but I can't seem to find the right setup.
When I run it twice I get:
['Chat', 'Case', 'Telephone', 'Chat', 'Case', 'Telephone']
when really I want:
[['Chat', 'Case', 'Telephone'], ['Chat', 'Case', 'Telephone']]
I'm creating TempEntityList, appending any matches found to it, extending EntityList with what has been found, and then clearing TempEntityList down for the next API call.
What's the best way to log each set of results as a nested list? So far I've only managed to get either one large flat list, or a list where every item is its own nested item.
As requested, the payload that is returned looks like this:
{
"output": {
"intents": [],
"entities": [
{
"entity": "Chat",
"location": [
0,
4
],
"value": "Chat",
"confidence": 1
},
{
"entity": "Case",
"location": [
5,
9
],
"value": "Case",
"confidence": 1
},
{
"entity": "Telephone",
"location": [
10,
19
],
"value": "Telephony",
"confidence": 1
}
],
"generic": []
},
"context": {
"global": {
"system": {
"turn_count": 1
},
"session_id": "xxx-xxx-xxx"
},
"skills": {
"main skill": {
"user_defined": {
"Case": "Case",
"Chat": "Chat",
"Telephone": "Telephony"
},
"system": {
"state": "x"
}
}
}
}
}

Firstly, since you have TempEntityList = [] at the beginning of the for loop, you don't need the second TempEntityList = [] at the bottom. To answer the question, use list.append() instead of list.extend():
for item in response['output']['keywords']:
    TempEntityList = []
    TempEntityList.append(item['keywords'])
    EntityList.append(TempEntityList)
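The difference between the two calls, in a minimal standalone example (only the keyword names are taken from the question):

```python
# extend() splices the items of its argument into the outer list,
# while append() adds its argument as a single (nested) element.
flat = []
flat.extend(['Chat', 'Case'])      # flat is now ['Chat', 'Case']

nested = []
nested.append(['Chat', 'Case'])    # nested is now [['Chat', 'Case']]
```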

I've managed to get what I want, thanks everyone for the suggestions.
The solution was:
global EntityList
EntityList = []
for item in response['output']['entities']:
    EntityList.append(item['entity'])
FinalList.append(EntityList)
After running the function three times on the same input, this produced:
[['Chat', 'Case', 'Telephone'], ['Chat', 'Case', 'Telephone'], ['Chat', 'Case', 'Telephone']]
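For what it's worth, the accepted approach can also be written without the global. This is a sketch of my own, assuming responses is a list of parsed payloads shaped like the one above:

```python
def collect_entities(responses):
    # One inner list of entity names per API response.
    return [[item["entity"] for item in r["output"]["entities"]]
            for r in responses]
```

Calling it with identical payloads yields the same nested list as above, one inner list per call.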

Related

Modify the value of a field of a specific nested object (its index) depending on a condition

I would like to modify the value of a field at a specific index of a nested type, depending on another value of the same nested object or on a field outside the nested object.
As example, I have the current mapping of my index feed:
{
"feed": {
"mappings": {
"properties": {
"attacks_ids": {
"type": "keyword"
},
"created_by": {
"type": "keyword"
},
"date": {
"type": "date"
},
"groups_related": {
"type": "keyword"
},
"indicators": {
"type": "nested",
"properties": {
"date": {
"type": "date"
},
"description": {
"type": "text"
},
"role": {
"type": "keyword"
},
"type": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
},
"malware_families": {
"type": "keyword"
},
"published": {
"type": "boolean"
},
"references": {
"type": "keyword"
},
"tags": {
"type": "keyword"
},
"targeted_countries": {
"type": "keyword"
},
"title": {
"type": "text"
},
"tlp": {
"type": "keyword"
}
}
}
}
}
Take the following document as an example:
{
"took": 194,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "feed",
"_type": "_doc",
"_id": "W3CS7IABovFpcGfZjfyu",
"_score": 1,
"_source": {
"title": "Test",
"date": "2022-05-22T16:21:09.159711",
"created_by": "finch",
"tlp": "white",
"published": true,
"references": [
"test",
"test"
],
"tags": [
"tag1",
"tag2"
],
"targeted_countries": [
"Italy",
"Germany"
],
"malware_families": [
"family1",
"family2"
],
"groups_related": [
"group1",
"griup2"
],
"attacks_ids": [
""
],
"indicators": [
{
"value": "testest",
"description": "This is a test",
"type": "sha256",
"role": "file",
"date": "2022-05-22T16:21:09.159560"
},
{
"value": "testest2",
"description": "This is a test 2",
"type": "ipv4",
"role": "c2",
"date": "2022-05-22T16:21:09.159699"
}
]
}
}
]
}
}
I would like to make this update: indicators[0].value = 'changed'
if _id == 'W3CS7IABovFpcGfZjfyu'
or if title == 'some_title'
or if indicators[0].role == 'c2'
I already tried with a script, but I can't manage to get it to work. I hope the explanation is clear; if not, please ask. Thank you.
Edit 1:
I managed to make it work, but it needs the _id; I'm still looking for a way to do it without one.
My partial solution:
# Modify depending on the value of a field inside the same nested object
update = Pulse.get(id="XHCz7IABovFpcGfZWfz9")  # Pulse is my document
update.update(script="for (indicator in ctx._source.indicators) {if (indicator.value=='changed2') {indicator.value='changed3'}}")
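An untested sketch of a query-scoped alternative that avoids needing the _id: _update_by_query can match on the other conditions. The index name feed and the field values are taken from the question; note that the Painless script updates every matching indicator rather than only index 0, since nested elements carry no positional id:

```
POST /feed/_update_by_query
{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        { "match": { "title": "some_title" } },
        { "nested": {
            "path": "indicators",
            "query": { "term": { "indicators.role": "c2" } }
          }
        }
      ]
    }
  },
  "script": {
    "lang": "painless",
    "source": "for (ind in ctx._source.indicators) { if (ind.role == 'c2') { ind.value = 'changed' } }"
  }
}
```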

Fast way of adding fields to a nested dict

I need help improving my code.
I've got a nested dict with many levels:
{
"11": {
"FacLC": {
"immty": [
"in_mm",
"in_mm"
],
"moood": [
"in_oo",
"in_oo"
]
}
},
"22": {
"FacLC": {
"immty": [
"in_mm",
"in_mm",
"in_mm"
]
}
}
}
And I want to add additional fields on every level, so my output looks like this:
[
{
"id": "",
"name": "11",
"general": [
{
"id": "",
"name": "FacLC",
"specifics": [
{
"id": "",
"name": "immty",
"characteristics": [
{
"id": "",
"name": "in_mm"
},
{
"id": "",
"name": "in_mm"
}
]
},
{
"id": "",
"name": "moood",
"characteristics": [
{
"id": "",
"name": "in_oo"
},
{
"id": "",
"name": "in_oo"
}
]
}
]
}
]
},
{
"id": "",
"name": "22",
"general": [
{
"id": "",
"name": "FacLC",
"specifics": [
{
"id": "",
"name": "immty",
"characteristics": [
{
"id": "",
"name": "in_mm"
},
{
"id": "",
"name": "in_mm"
},
{
"id": "",
"name": "in_mm"
}
]
}
]
}
]
}
]
I managed to write a four-times-nested for loop, which I find inefficient and inelegant:
my_new_dict = []
for main_name, general in my_dict.items():
    generals = []
    for general_name, specific in general.items():
        specifics = []
        for specific_name, characteristics in specific.items():
            characteristics_dicts = []
            for characteristic in characteristics:
                characteristics_dicts.append({
                    "id": "",
                    "name": characteristic,
                })
            specifics.append({
                "id": "",
                "name": specific_name,
                "characteristics": characteristics_dicts,
            })
        generals.append({
            "id": "",
            "name": general_name,
            "specifics": specifics,
        })
    my_new_dict.append({
        "id": "",
        "name": main_name,
        "general": generals,
    })
I am wondering if there is more compact and efficient solution.
In the past I created a function to do this. You call it every time you need to add a new field to a nested dict, regardless of how many levels the dict has; you only have to pass the 'full path', which I call the key_map, e.g. ['node1', 'node1a', 'node1apart3']:
def insert_value_using_map(_nodes_list_to_be_appended, _keys_map, _value_to_be_inserted):
    for _key in _keys_map[:-1]:
        _nodes_list_to_be_appended = _nodes_list_to_be_appended.setdefault(_key, {})
    _nodes_list_to_be_appended[_keys_map[-1]] = _value_to_be_inserted
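Alternatively, the four nested loops in the question collapse into one recursive helper. This is my own sketch; the level names ("general", "specifics", "characteristics") are taken from the desired output:

```python
LEVEL_KEYS = ["general", "specifics", "characteristics"]

def wrap(node, depth=0):
    # Dicts become {"id", "name", <level key>} wrappers; the innermost
    # lists of strings become plain {"id", "name"} leaves.
    if isinstance(node, dict):
        return [{"id": "", "name": name, LEVEL_KEYS[depth]: wrap(child, depth + 1)}
                for name, child in node.items()]
    return [{"id": "", "name": leaf} for leaf in node]
```

wrap(my_dict) then produces the same structure as my_new_dict above.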

How to flatten JSON response from Surveymonkey API

I'm setting up a Python function that uses the Surveymonkey API to get survey responses.
The API returns responses in a deeply nested JSON format.
I'm having issues trying to flatten this JSON so that it can go into Google Cloud Storage.
I have tried to flatten the response using the following code, which works, but it does not transform it into the format that I am looking for.
{
"per_page": 2,
"total": 1,
"data": [
{
"total_time": 0,
"collection_mode": "default",
"href": "https://api.surveymonkey.com/v3/responses/5007154325",
"custom_variables": {
"custvar_1": "one",
"custvar_2": "two"
},
"custom_value": "custom identifier for the response",
"edit_url": "https://www.surveymonkey.com/r/",
"analyze_url": "https://www.surveymonkey.com/analyze/browse/",
"ip_address": "",
"pages": [
{
"id": "73527947",
"questions": [
{
"id": "273237811",
"answers": [
{
"choice_id": "1842351148"
},
{
"text": "I might be text or null",
"other_id": "1842351149"
}
]
},
{
"id": "273240822",
"answers": [
{
"choice_id": "1863145815",
"row_id": "1863145806"
},
{
"text": "I might be text or null",
"other_id": "1863145817"
}
]
},
{
"id": "273239576",
"answers": [
{
"choice_id": "1863156702",
"row_id": "1863156701"
},
{
"text": "I might be text or null",
"other_id": "1863156707"
}
]
},
{
"id": "296944423",
"answers": [
{
"text": "I might be text or null"
}
]
}
]
}
],
"date_modified": "1970-01-17T19:07:34+00:00",
"response_status": "completed",
"id": "5007154325",
"collector_id": "50253586",
"recipient_id": "0",
"date_created": "1970-01-17T19:07:34+00:00",
"survey_id": "105723396"
}
],
"page": 1,
"links": {
"self": "https://api.surveymonkey.com/v3/surveys/123456/responses/bulk?page=1&per_page=2"
}
}
from pandas import json_normalize

answers_df = json_normalize(data=response_json['data'],
                            record_path=['pages', 'questions', 'answers'],
                            meta=['id', ['pages', 'questions', 'id'], ['pages', 'id']])
Instead of returning a row for each question id, choice_id, and text field, I need a column for each one, so that there is only one row per survey_id (i.e. per element of data).
The columns I would like to see are total_time, collection_mode, href, custom_variables.custvar_1, custom_variables.custvar_2, custom_value, edit_url, analyze_url, ip_address, pages.id, pages.questions.0.id, pages.questions.0.answers.0.choice_id, pages.questions.0.answers.0.text, pages.questions.0.answers.0.other_id.
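One way to get there is to skip record_path entirely and flatten each element of data into a single path-keyed dict, so every answer becomes its own column. This is my own sketch, not part of pandas; note the column names come out with list indices at every level, e.g. pages.0.questions.0.answers.0.choice_id:

```python
def flatten(obj, prefix=""):
    # Recursively walk dicts and lists, emitting "a.b.0.c"-style keys
    # for every scalar leaf.
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{i}."))
    else:
        flat[prefix[:-1]] = obj
    return flat
```

pandas.DataFrame(flatten(r) for r in response_json['data']) then yields one row per response.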

How to parse JSON in Python and Bash?

I need to parse the JSON below via Bash and Python, but I am getting different errors.
From the JSON I want to get the name and objectID fields and put them into an array, but I don't know how to do this.
Example of JSON :
{
"aliases": [],
"localizations": {},
"name": "Super DX-Ball",
"popularity": 0,
"objectID": "7781",
"_highlightResult": {
"name": {
"value": "Super DX-<em>Ba</em>ll",
"matchLevel": "full",
"fullyHighlighted": false,
"matchedWords": [
"ba"
]
}
}
},
{
"aliases": [],
"localizations": {},
"name": "Katekyo Hitman Reborn! DS Flame Rumble X - Mirai Chou-Bakuhatsu!!",
"popularity": 0,
"objectID": "77522",
"_highlightResult": {
"name": {
"value": "Katekyo Hitman Reborn! DS Flame Rumble X - Mirai Chou-<em>Ba</em>kuhatsu!!",
"matchLevel": "full",
"fullyHighlighted": false,
"matchedWords": [
"ba"
]
}
}
},
{
"aliases": [],
"localizations": {},
"name": "Bagitman",
"popularity": 0,
"objectID": "7663",
"_highlightResult": {
"name": {
"value": "<em>Ba</em>gitman",
"matchLevel": "full",
"fullyHighlighted": false,
"matchedWords": [
"ba"
]
}
}
},
{
"aliases": [],
"localizations": {},
"name": "Virtual Bart",
"popularity": 0,
"objectID": "7616",
"_highlightResult": {
"name": {
"value": "Virtual <em>Ba</em>rt",
"matchLevel": "full",
"fullyHighlighted": false,
"matchedWords": [
"ba"
]
}
}
}
I'm getting the error because the input is several independent JSON objects. Here is an example:
cat /tmp/out | jq ".name"
"Fortnite"
parse error: Expected value before ',' at line 35, column 4
The input JSON looks like an array but lacks brackets. Try to add them:
$ (echo '['; cat /tmp/out; echo ']') | jq 'map({ name, objectID })'
[
{
"name": "Super DX-Ball",
"objectID": "7781"
},
{
"name": "Katekyo Hitman Reborn! DS Flame Rumble X - Mirai Chou-Bakuhatsu!!",
"objectID": "77522"
},
{
"name": "Bagitman",
"objectID": "7663"
},
{
"name": "Virtual Bart",
"objectID": "7616"
}
]
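On the Python side, the same comma-separated stream of bare objects can be handled with json.JSONDecoder.raw_decode, which reads one value at a time. A sketch of mine; the separator-skipping is specific to input shaped like the example above. (jq users can also reach for the -s/--slurp flag, which wraps the whole input stream in an array before filtering.)

```python
import json

def parse_json_stream(text):
    # Decode a stream of JSON values separated by commas/whitespace.
    decoder = json.JSONDecoder()
    values, idx = [], 0
    while idx < len(text):
        # Skip the separators between the bare objects.
        while idx < len(text) and text[idx] in " \t\r\n,":
            idx += 1
        if idx == len(text):
            break
        value, idx = decoder.raw_decode(text, idx)
        values.append(value)
    return values
```

[(v['name'], v['objectID']) for v in parse_json_stream(raw)] then gives the wanted pairs.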

Convert deeply nested json from facebook to dataframe in python

I am trying to get the details of users who have liked or commented on Facebook posts. I am using the python facebook-sdk package. The code is as follows.
import facebook as fi
import json
graph = fi.GraphAPI('Access Token')
data = json.dumps(graph.get_object('DSIfootcandy/posts'))
From the above, I am getting highly nested JSON. Here I will include the JSON string for only one post.
{
"paging": {
"next": "https://graph.facebook.com/v2.0/425073257683630/posts?access_token=&limit=25&until=1449201121&__paging_token=enc_AdD0DL6sN3aDZCwfYY25rJLW9IZBZCLM1QfX0venal6rpjUNvAWZBOoxTjbOYZAaFiBImzMqiv149HPH5FBJFo0nSVOPqUy78S0YvwZDZD",
"previous": "https://graph.facebook.com/v2.0/425073257683630/posts?since=1450843741&access_token=&limit=25&__paging_token=enc_AdCYobFJpcNavx6STzfPFyFe6eQQxRhkObwl2EdulwL7mjbnIETve7sJZCPMwVm7lu7yZA5FoY5Q4sprlQezF4AlGfZCWALClAZDZD&__previous=1"
},
"data": [
{
"picture": "https://fbcdn-photos-e-a.akamaihd.net/hphotos-ak-xfa1/v/t1.0-0/p130x130/1285_5066979392443_n.png?oh=b37a42ee58654f08af5abbd4f52b1ace&oe=570898E7&__gda__=1461440649_aa94b9ec60f22004675c4a527e8893f",
"is_hidden": false,
"likes": {
"paging": {
"cursors": {
"after": "MTU3NzQxODMzNTg0NDcwNQ==",
"before": "MTU5Mzc1MjA3NDE4ODgwMA=="
}
},
"data": [
{
"id": "1593752074188800",
"name": "Maduri Priyadarshani"
},
{
"id": "427605680763414",
"name": "Darshi Mashika"
},
{
"id": "599793563453832",
"name": "Shakeer Nimeshani Shashikala"
},
{
"id": "1577418335844705",
"name": "Däzlling Jalali Muishu"
}
]
},
"from": {
"category": "Retail and Consumer Merchandise",
"name": "Footcandy",
"category_list": [
{
"id": "2239",
"name": "Retail and Consumer Merchandise"
}
],
"id": "425073257683630"
},
"name": "Timeline Photos",
"privacy": {
"allow": "",
"deny": "",
"friends": "",
"description": "",
"value": ""
},
"is_expired": false,
"comments": {
"paging": {
"cursors": {
"after": "WTI5dGJXVnVkRjlqZFhKemIzSUVXdNVFExTURRd09qRTBOVEE0TkRRNE5EVT0=",
"before": "WTI5dGJXVnVkRjlqZFhKemIzNE16Y3dNVFExTVRFNE9qRTBOVEE0TkRRME5UVT0="
}
},
"data": [
{
"from": {
"name": "NiFû Shafrà",
"id": "1025030640553"
},
"like_count": 0,
"can_remove": false,
"created_time": "2015-12-23T04:20:55+0000",
"message": "wow lovely one",
"id": "50018692683829_500458145118",
"user_likes": false
},
{
"from": {
"name": "Shamnaz Lukmanjee",
"id": "160625809961884"
},
"like_count": 0,
"can_remove": false,
"created_time": "2015-12-23T04:27:25+0000",
"message": "Nice",
"id": "500186926838929_500450145040",
"user_likes": false
}
]
},
"actions": [
{
"link": "https://www.facebook.com/425073257683630/posts/5001866838929",
"name": "Comment"
},
{
"link": "https://www.facebook.com/42507683630/posts/500186926838929",
"name": "Like"
}
],
"updated_time": "2015-12-23T04:27:25+0000",
"link": "https://www.facebook.com/DSIFootcandy/photos/a.438926536298302.1073741827.4250732576630/50086926838929/?type=3",
"object_id": "50018692838929",
"shares": {
"count": 3
},
"created_time": "2015-12-23T04:09:01+0000",
"message": "Reach new heights in the cute and extremely comfortable \"Silviar\" www.focandy.lk",
"type": "photo",
"id": "425077683630_50018926838929",
"status_type": "added_photos",
"icon": "https://www.facebook.com/images/icons/photo1.gif"
}
]
}
Now I need to get this data into a dataframe as follows (no need to get all fields).
item | Like_id |Like_username | comments_userid |comments_username|comment(msg)|
-----+---------+--------------+-----------------+-----------------+------------+
Bag | 45546 | noel | 641 | James | nice work |
-----+---------+--------------+-----------------+-----------------+------------+
Any Help will be Highly Appreciated.
Not exactly your intended format, but here is the makings of a solution:
import pandas

DictionaryObject_as_List = str(mydict).replace("{", "").replace("}", "").replace("[", "").replace("]", "").split(",")

newlist = []
for row in DictionaryObject_as_List:
    row = row.replace('https://', ' ').split(":")
    exec('newlist.append([' + " , ".join(row) + '])')

DataFrame_Object = pandas.DataFrame(newlist)
print(DataFrame_Object)
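A more direct route (my own sketch, not a fix of the answer above): walk the structure and build one row per like and per comment, where post is assumed to be one element of data['data'] in the payload shown in the question:

```python
def post_to_rows(post):
    # One row per like and per comment, matching the table sketched above.
    rows = []
    item = post.get("name")
    for like in post.get("likes", {}).get("data", []):
        rows.append({"item": item,
                     "like_id": like["id"],
                     "like_username": like["name"]})
    for comment in post.get("comments", {}).get("data", []):
        rows.append({"item": item,
                     "comments_userid": comment["from"]["id"],
                     "comments_username": comment["from"]["name"],
                     "comment": comment["message"]})
    return rows
```

pandas.DataFrame(post_to_rows(post)) then gives the dataframe.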
