update array in mongodb with new item and value - python

I am trying to update the array: ['media_details'] with a local image path after the image has been downloaded. However using $push just added the local_url on top.
This is what ['media_details'] looks like:
"image_details": [
{
"processed": true,
"position": 0,
"seconds": "46",
"src_url": "https://xxxxx/1.jpg",
"image_fname": "1.jpg",
},
{
"processed": true,
"position": 1,
"seconds": "55",
"src_url": "https://xxxxx/2.jpg",
"image_fname": "2.jpg",
},
my code then downloads the image from the src_url and I want to add the local image url to the ['media_details'].
job = mongo.db.JobProcess
job.update({'_id': db_id},
{'$push': {
'image_details': {
'local_url': img_local_file,
}
}})
This adds the local_url to the top of the ['media_details'] - like so:
{'local_url': '/bin/static/5432ec0f-ea53-4fe1-83e4-f78166d1b9a6/1.jpg'},
{'local_url': '/bin/static/5432ec0f-ea53-4fe1-83e4-f78166d1b9a6/2.jpg'},
{'processed': True, 'position': 0, 'seconds': '46', 'src_url': 'https://xxxxx1.jpg', 'image_fname': '1.jpg'}
what I want it to do is:
"image_details": [
{
"processed": true,
"position": 0,
"seconds": "46",
"src_url": "https://xxxxx/1.jpg",
"image_fname": "1.jpg",
"local_url": "/bin/static/5432ec0f-ea53-4fe1-83e4-f78166d1b9a6/1.jpg"
},
but which command ($set, $push, $addToSet) is best suited for updating this? and how do I implement it?

You need to update the image_details array item using the positional operator $. You will need a query that can uniquely identify the array item, perhaps src_url:
job.update({$and:[
{"_id": db_id},
{"image_details.src_url": img_src_url }
]},
{$set :{"image_details.$.local_url": img_local_file },
{multi:false})

You need to use positional update operator
job.updateOne({
'_id': db_id,
'image_details.src_url': yourUrl,
}, {
$set: {
'image_details.$.local_url': img_local_file
});

Related

Mongodb 4.2 update with aggregation pipeline faster method to update only one element in array?

I want to optimize this code since it is not so fast by design.
I have position with array scores for judges objects each have judge_id and evaluation. I want to update exactly one evaluation for judge_id == 1 and keep the rest unchanged in the fastest way in MongoDB. Extra requirement is to use update with aggregation pipeline (update=[...] not update={...} MongoDB 4.2).
First I scan the whole array to search matching element and update it (exactly one) than I search the whole array to search not matching elements. Can I for example find a matching element index and change only this element?
Maybe it is not possible in MongoDB to optimize it?
I want to use only update with aggregate pipeline - not classic syntax to define more complex conditional updated than is possible by classic syntax?
.update(filter={...}, update=[...]) I mean as update with aggregate pipeline not .update(filter={...}, update={...]).
Can you help with this problem?
import pymongo
client = pymongo.MongoClient()
client.drop_database('delete_it')
db = client.delete_it
db.position.create_index('position', unique=True)
position: pymongo.collection.Collection = db.position
def show_position():
r = position.find_one(
filter={'position': 1},
projection={'_id': False}
)
print(r)
def create_position():
position.delete_one(
filter={'position': 1}
)
position.insert_one(
document={'position': 1,
'scores': [{'judge_id': 1, 'evaluation': 3},
{'judge_id': 2, 'evaluation': 4},
{'judge_id': 3, 'evaluation': 5}]}
)
print('Original data:')
show_position()
def show_updated_position():
print('Updated data:')
show_position()
create_position()
position.update_one(
filter={'position': 1,
'scores.judge_id': 1},
update=[
{
'$set': {
'scores': {
'$concatArrays': [
{
# change evaluation of one element
'$map': {
# array with one element only matching
'input': {
'$filter': {
'input': '$scores',
'cond': {'$eq': ['$$this.judge_id', 1]}
}
},
'in': {
'$mergeObjects' : [
'$$this', {'evaluation': 10}
]
}
}
},
# array of rest elements not matching
{
'$filter': {
'input': '$scores',
'cond': {'$ne': ['$$this.judge_id', 1]}
}
}
]
}
}
}
],
)
show_updated_position()
I meanwhile I found that slower code is faster by timeit() see here - why map is faster then filter I do not know but it is faster:
I have such results now:
update_pipeline_concat_arrays 1.455e+00 1.000 3.843
update_find_and_replace 7.461e-01 0.513 1.971
update_pipeline_merge_objects 4.845e-01 0.333 1.280
update_find_one_and_update 3.798e-01 0.261 1.003
update_one 3.786e-01 0.260 1.000
Here is new the best code (you can replace previous function in full code) - strange I think that map is slower - still slower than update() or find() and replace():
position.update_one(
filter={'position': 1,
'scores.judge_id': 1},
update=[
{
'$set': {
'scores': {
'$map': {
'input': '$scores',
'in': {
'$mergeObjects': [
'$$this', {
'evaluation': {
'$cond': {
'if': {'$eq': ['$$this.judge_id', 1]},
'then': NEW_EVALUATION,
'else': '$$this.evaluation'
}
}
}
]
}
}
}
}
}
],
)

Pull key from json file when values is known (groovy or python)

Is there any way to pull the key from JSON if the only thing I know is the value? (In groovy or python)
An example:
I know the "_number" value and I need a key.
So let's say, known _number is 2 and as an output, I should get dsf34f43f34f34f
{
"id": "8e37ecadf4908f79d58080e6ddbc",
"project": "some_project",
"branch": "master",
"current_revision": "3rtgfgdfg2fdsf",
"revisions": {
"43g5g534534rf34f43f": {
"_number": 3,
"created": "2019-04-16 09:03:07.459000000",
"uploader": {
"_account_id": 4
},
"description": "Rebase"
},
"dsf34f43f34f34f": {
"_number": 2,
"created": "2019-04-02 10:54:14.682000000",
"uploader": {
"_account_id": 2
},
"description": "Rebase"
}
}
}
With Groovy:
def json = new groovy.json.JsonSlurper().parse("x.json" as File)
println(json.revisions.findResult{ it.value._number==2 ? it.key : null })
// => dsf34f43f34f34f
Python 3: (assuming that data is saved in data.json):
import json
with open('data.json') as f:
json_data = json.load(f)
for rev, revdata in json_data['revisions'].items():
if revdata['_number'] == 2:
print(rev)
Prints all revs where _number equals 2.
using dict-comprehension:
print({k for k,v in d['revisions'].items() if v.get('_number') == 2})
OUTPUT:
{'dsf34f43f34f34f'}

Using .values() with list of dictionaries?

I'm comparing json files between two different API endpoints to see which json records need an update, which need a create and what needs a delete. So, by comparing the two json files, I want to end up with three json files, one for each operation.
The json at both endpoints is structured like this (but they use different keys for same sets of values; different problem):
{
"records": [{
"id": "id-value-here",
"c": {
"d": "eee"
},
"f": {
"l": "last",
"f": "first"
},
"g": ["100", "89", "9831", "09112", "800"]
}, {
…
}]
}
So the json is represented as a list of dictionaries (with further nested lists and dictionaries).
If a given json endpoint (j1) id value ("id":) exists in the other endpoint json (j2), then that record should be added to j_update.
So far I have something like this, but I can see that .values() doesn't work because it's trying to operate on the list instead of on all the listed dictionaries(?):
j_update = {r for r in j1['records'] if r['id'] in
j2.values()}
This doesn't return an error, but it creates an empty set using test json files.
Seems like this should be simple, but tripping over the nesting I think of dictionaries in a list representing the json. Do I need to flatten j2, or is there a simpler dictionary method python has to achieve this?
====edit j1 and j2====
have same structure, use different keys; toy data
j1
{
"records": [{
"field_5": 2329309841,
"field_12": {
"email": "cmix#etest.com"
},
"field_20": {
"last": "Mixalona",
"first": "Clara"
},
"field_28": ["9002329309999", "9002329309112"],
"field_44": ["1002329309832"]
}, {
"field_5": 2329309831,
"field_12": {
"email": "mherbitz345#test.com"
},
"field_20": {
"last": "Herbitz",
"first": "Michael"
},
"field_28": ["9002329309831", "9002329309112", "8002329309999"],
"field_44": ["1002329309832"]
}, {
"field_5": 2329309855,
"field_12": {
"email": "nkatamaran#test.com"
},
"field_20": {
"first": "Noriss",
"last": "Katamaran"
},
"field_28": ["9002329309111", "8002329309112"],
"field_44": ["1002329309877"]
}]
}
j2
{
"records": [{
"id": 2329309831,
"email": {
"email": "mherbitz345#test.com"
},
"name_primary": {
"last": "Herbitz",
"first": "Michael"
},
"assign": ["8003329309831", "8007329309789"],
"hr_id": ["1002329309877"]
}, {
"id": 2329309884,
"email": {
"email": "yinleeshu#test.com"
},
"name_primary": {
"last": "Lee Shu",
"first": "Yin"
},
"assign": ["8002329309111", "9003329309831", "9002329309111", "8002329309999", "8002329309112"],
"hr_id": ["1002329309832"]
}, {
"id": 23293098338,
"email": {
"email": "amlouis#test.com"
},
"name_primary": {
"last": "Maxwell Louis",
"first": "Albert"
},
"assign": ["8002329309111", "8007329309789", "9003329309831", "8002329309999", "8002329309112"],
"hr_id": ["1002329309877"]
}]
}
If you read the json it will output a dict. You are looking for a particular key in the list of the values.
if 'records' in j2:
r = j2['records'][0].get('id', []) # defaults if id does not exist
It it prettier to do a recursive search but i dunno how you data is organized to quickly come up with a solution.
To give an idea for recursive search consider this example
def resursiveSearch(dictionary, target):
if target in dictionary:
return dictionary[target]
for key, value in dictionary.items():
if isinstance(value, dict):
target = resursiveSearch(value, target)
if target:
return target
a = {'test' : 'b', 'test1' : dict(x = dict(z = 3), y = 2)}
print(resursiveSearch(a, 'z'))
You tried:
j_update = {r for r in j1['records'] if r['id'] in j2.values()}
Aside from the r['id'/'field_5] problem, you have:
>>> list(j2.values())
[[{'id': 2329309831, ...}, ...]]
The id are buried inside a list and a dict, thus the test r['id'] in j2.values() always return False.
The basic solution is fairly simple.
First, create a set of j2 ids:
>>> present_in_j2 = set(record["id"] for record in j2["records"])
Then, rebuild the json structure of j1 but without the j1 field_5 that are not present in j2:
>>> {"records":[record for record in j1["records"] if record["field_5"] in present_in_j2]}
{'records': [{'field_5': 2329309831, 'field_12': {'email': 'mherbitz345#test.com'}, 'field_20': {'last': 'Herbitz', 'first': 'Michael'}, 'field_28': ['9002329309831', '9002329309112', '8002329309999'], 'field_44': ['1002329309832']}]}
It works, but it's not totally satisfying because of the weird keys of j1. Let's try to convert j1 to a more friendly format:
def map_keys(json_value, conversion_table):
"""Map the keys of a json value
This is a recursive DFS"""
def map_keys_aux(json_value):
"""Capture the conversion table"""
if isinstance(json_value, list):
return [map_keys_aux(v) for v in json_value]
elif isinstance(json_value, dict):
return {conversion_table.get(k, k):map_keys_aux(v) for k,v in json_value.items()}
else:
return json_value
return map_keys_aux(json_value)
The function focuses on dictionary keys: conversion_table.get(k, k) is conversion_table[k] if the key is present in the conversion table, or the key itself otherwise.
>>> j1toj2 = {"field_5":"id", "field_12":"email", "field_20":"name_primary", "field_28":"assign", "field_44":"hr_id"}
>>> mapped_j1 = map_keys(j1, j1toj2)
Now, the code is cleaner and the output may be more useful for a PUT:
>>> d1 = {record["id"]:record for record in mapped_j1["records"]}
>>> present_in_j2 = set(record["id"] for record in j2["records"])
>>> {"records":[record for record in mapped_j1["records"] if record["id"] in present_in_j2]}
{'records': [{'id': 2329309831, 'email': {'email': 'mherbitz345#test.com'}, 'name_primary': {'last': 'Herbitz', 'first': 'Michael'}, 'assign': ['9002329309831', '9002329309112', '8002329309999'], 'hr_id': ['1002329309832']}]}

Parsing json with python3

Newbie python programmer here, I have the following json response:
[
{
"type": "Incursion",
"state": "mobilizing",
"influence": 1,
"has_boss": true,
"faction_id": 500019,
"constellation_id": 20000739,
"staging_solar_system_id": 30005054,
"infested_solar_systems": [
30005050,
30005051,
30005052,
30005053,
30005054,
30005055
]
},
{
"type": "Incursion",
"state": "established",
"influence": 0,
"has_boss": false,
"faction_id": 500019,
"constellation_id": 20000035,
"staging_solar_system_id": 30000248,
"infested_solar_systems": [
30000244,
30000245,
30000246,
30000247,
30000248,
30000249,
30000250,
30000251,
30000252,
30000253
]
},
{
"type": "Incursion",
"state": "mobilizing",
"influence": 0,
"has_boss": false,
"faction_id": 500019,
"constellation_id": 20000161,
"staging_solar_system_id": 30001101,
"infested_solar_systems": [
30001097,
30001098,
30001099,
30001100,
30001101,
30001102
]
},
{
"type": "Incursion",
"state": "established",
"influence": 0,
"has_boss": false,
"faction_id": 500019,
"constellation_id": 20000647,
"staging_solar_system_id": 30004434,
"infested_solar_systems": [
30004425,
30004426,
30004427,
30004428,
30004429,
30004430,
30004431,
30004432,
30004433,
30004434,
30004435
]
},
{
"type": "Incursion",
"state": "established",
"influence": 0.061500001698732376,
"has_boss": false,
"faction_id": 500019,
"constellation_id": 20000570,
"staging_solar_system_id": 30003910,
"infested_solar_systems": [
30003904,
30003906,
30003908,
30003909,
30003910,
30003903
]
}
]
The original code was written to parse an XML reponse.
This is the code in question:
incursion_constellations = []
if (online):
inc = urllib2.urlopen('https://esi.tech.ccp.is/latest/incursions/')
else:
inc = file(r'incursions.json', 'r')
jinc = json.load(inc)
for j in jinc['items']:
incursion_constellations.append(str(j['constellation']['id_str']))
for s in all_stations:
cur.execute("SELECT constellationID FROM mapSolarSystems WHERE solarSystemID = " + str(s['ssid']))
res = cur.fetchone()
cid = str(res[0])
s['incursion'] = cid in incursion_constellations
The area I have having a hard time understanding is this: for j in jinc['items']:
I am getting this error:
Traceback (most recent call last):
File "./stations.py", line 201, in <module>
for j in jinc['items']:
TypeError: list indices must be integers, not str
Can anyone help me understand how to convert this into being able to parse the json response and retrieve the constellation_id and append it to a list?
Thanks in advance.
Change your original loop to:
for j in jinc:
incursion_constellations.append(str(j['constellation_id']))
But you need to be sure that constellation_id in json is the same id that was under ['constellation']['id_str'] previously
Seeing the [ and ] at the beginning and the end of the response, it seems like this json response is list, not a dict, just as your error is suggesting.
If it is a list, you should be using integer as index, instead of str, like you'd do in dict. Hence, your code should be something like
jinc[0]['constellation_id']
(I don't see where the ['constellation']['id_str'] part comes from)
whatever goes inside the [ and ] is in a list, and should be using an integer index. the ones in { and } are in dict, and should use str index.
to loop through it, just use range and len.
a similar question has been answered here.

Appending a list inside of a JSON - Python 2.7

My JSON dict looks like this:
{
"end": 1,
"results": [
{
"expired": false,
"tag": "search"
},
{
"span": "text goes here"
}
],
"totalResults": 1
}
which is the product of this line:
tmp_response['results'].append({'span':"text goes here"})
My goal is to get the "span" key into the "results" list. This is necessary for when totalResults > 1.
{
"end": 1,
"results": [
{
"expired": false,
"tag": "search",
"span": "text goes here"
},
],
"totalResults": 1
}
I've tried several methods, for example with use 'dictname.update', but this overwrites the existing data in 'results'.
tmp_response['results'][0]['span'] = "text goes here"
or, if you really wanted to use update:
tmp_response['results'][0].update({'span':"text goes here"})
but note that is an unnecessary creation of a dict.
Here is one more solution if you want you can use below code.
>>> tmp_response = {"end": 1,"results": [{"expired": False,"tag": "search"},{"span": "text goes here"}],"totalResults": 1}
>>> tmp_response['results'][0] = dict(tmp_response['results'][0].items() + {'New_entry': "Ney Value"}.items())
>>> tmp_response
{'totalResults': 1, 'end': 1, 'results': [{'tag': 'search', 'expired': False, 'New_entry': 'Ney Value'}, {'span': 'text goes here'}]}
>>>

Categories

Resources