Searching ID or property for match in Mongo - python

Goal:
I want to allow the user to search for a document by ID, or allow other text-based queries.
Code:
l_search_results = list(
    cll_sips.find(
        {
            '$or': [
                {'_id': ObjectId(s_term)},
                {'s_text': re.compile(s_term, re.IGNORECASE)},
                {'choices': re.compile(s_term, re.IGNORECASE)}
            ]
        }
    ).limit(20)
)
Error:
<Whatever you searched for> is not a valid ObjectId

s_term needs to be a valid object ID (or at least in the right format) when you pass it to the ObjectId constructor. Since it's sometimes not an ID, that explains why you get the exception.
Try something like this instead:
import re
from bson import ObjectId
from bson.errors import InvalidId  # InvalidId is defined in bson.errors
or_filter = [
    {'s_text': re.compile(s_term, re.IGNORECASE)},
    {'choices': re.compile(s_term, re.IGNORECASE)}
]
try:
    # Only search by _id when the term parses as an ObjectId.
    or_filter.append({'_id': ObjectId(s_term)})
except InvalidId:
    pass
l_search_results = list(
    cll_sips.find({'$or': or_filter}).limit(20)
)
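One more thing worth guarding against: the search term is compiled as a regular expression, so user input containing regex metacharacters (an unbalanced parenthesis, for instance) would raise re.error or match something unintended. A minimal sketch, assuming you want literal, case-insensitive matching; build_text_filters is a hypothetical helper name, and the field names are the ones from the question:

```python
import re

def build_text_filters(s_term, fields=("s_text", "choices")):
    """Return one case-insensitive, literal-match regex filter per field."""
    # re.escape neutralises characters such as "(" or "*" in user input.
    pattern = re.compile(re.escape(s_term), re.IGNORECASE)
    return [{field: pattern} for field in fields]

# "price (usd" would not compile as a raw pattern; escaped, it matches literally.
or_filter = build_text_filters("price (usd")
```

The resulting list can take the place of the two hand-written regex entries before the optional _id clause is appended.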


Extracting a value in python with specific JSON array

I'm new to Python. How would I get the value of the appid key-value pair out of the JSON below?
{
"Datadog":[
"host:i-068fee2324438213477be9a4"
],
"Amazon Web Services":[
"availability-zone:us-east-1a",
"aws:cloudformation:logical-id:ec2instance01",
"aws:cloudformation:stack-id:arn:aws:cloudformation:us-east-1:353245",
"appid:42928482474dh28424a",
"name:devinstance",
"region:us-east-1",
"security-group:sg-022442414d8a",
"security-group:sg-0691af18875ad9d0b",
"security-group:sg-022442414d8a",
"security-group:sg-022442414d8a"
]
}
What you're using is a dictionary. You can access its values like this:
nameOfYourDict["nameOfYourKey"]
For example, if your dict is named data and you want to access Datadog:
data["Datadog"]
Start by getting the AWS pairs into their own variable:
aws_pairs = data["Amazon Web Services"]
Then loop over the pairs until you find one with the correct anchor:
appid_pair = None
for pair in aws_pairs:
    if pair.startswith("appid:"):
        appid_pair = pair
        break
appid_value = None
if appid_pair:
    appid_value = appid_pair.split(":", 1)[1]
print(appid_value)
This can be condensed into a single next() call, with None as the default when no pair matches:
aws_pairs = data["Amazon Web Services"]
appid_value = next(
    (
        pair.split(":", 1)[1]
        for pair in aws_pairs
        if pair.startswith("appid:")
    ),
    None
)
print(appid_value)
It's not really a JSON thing: you have a dictionary of lists, so extract the relevant list and then search it for the item you're looking for:
x = {
"Datadog":[
"host:i-068fee2324438213477be9a4"
],
"Amazon Web Services":[
"availability-zone:us-east-1a",
"aws:cloudformation:logical-id:ec2instance01",
"aws:cloudformation:stack-id:arn:aws:cloudformation:us-east-1:353245",
"appid:42928482474dh28424a",
"name:devinstance",
"region:us-east-1",
"security-group:sg-022442414d8a",
"security-group:sg-0691af18875ad9d0b",
"security-group:sg-022442414d8a",
"security-group:sg-022442414d8a"
]
}
aws = x["Amazon Web Services"]
for string in aws:
    name, value = string.split(":", 1)
    if name == "appid":
        print(value)
Gives:
42928482474dh28424a
The most efficient approach I can think of is to check whether each tag (under the "Amazon Web Services" key) starts with a specified prefix, which is the tag name in this case.
Note that you can also use str.startswith; here I just compare a slice of the string, which has the same effect.
data = {
"Datadog": [
"host:i-068fee2324438213477be9a4"
],
"Amazon Web Services": [
"availability-zone:us-east-1a",
"aws:cloudformation:logical-id:ec2instance01",
"aws:cloudformation:stack-id:arn:aws:cloudformation:us-east-1:353245",
"appid:42928482474dh28424a",
"name:devinstance",
"region:us-east-1",
"security-group:sg-022442414d8a",
"security-group:sg-0691af18875ad9d0b",
"security-group:sg-022442414d8a",
"security-group:sg-022442414d8a"
]
}['Amazon Web Services']
target_tag = 'appid:'
len_tag_name = len(target_tag)
for tag in data:
    if tag[:len_tag_name] == target_tag:
        app_id = tag[len_tag_name:]
        break
else:  # no `break` statement encountered, hence app_id not found
    app_id = None
assert app_id == '42928482474dh28424a'  # True
And finally, here is a one-liner version of the above, using next to take the first match from a generator expression. This only works if you know for sure that an appid tag exists; without a default value, next raises StopIteration when nothing matches.
app_id = next(tag[len_tag_name:] for tag in data if tag[:len_tag_name] == target_tag)
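If more than one tag is needed, it may be simpler to parse the whole list once. A minimal sketch, assuming every tag has the name:value shape shown in the question; tags_to_dict is a hypothetical helper, and values are collected into lists because names such as security-group repeat:

```python
from collections import defaultdict

def tags_to_dict(tags):
    """Group "name:value" strings by name, splitting on the first colon only."""
    parsed = defaultdict(list)
    for tag in tags:
        name, _, value = tag.partition(":")
        parsed[name].append(value)
    return dict(parsed)

# A shortened copy of the list from the question.
aws_tags = [
    "aws:cloudformation:logical-id:ec2instance01",
    "appid:42928482474dh28424a",
    "security-group:sg-022442414d8a",
    "security-group:sg-0691af18875ad9d0b",
]
tags = tags_to_dict(aws_tags)
print(tags["appid"][0])
```

Splitting on the first colon only keeps values that themselves contain colons, such as the cloudformation entries, intact.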

how to make a parent child relationship using python Elasticsearch client?

I am using Python's elasticsearch client to make searchable PDFs. One group of PDFs is called surveys. I would like to make a parent-child relationship where the parent consists of the group of PDFs and the child index holds the filenames within the group. However, I keep getting errors. My code is below:
in settings.py:
import elasticsearch
from elasticsearch import Elasticsearch, RequestsHttpConnection
ES_CLIENT = Elasticsearch(
['http://127.0.0.1:9200/'], #could be 9201,9300,9301
connection_class=RequestsHttpConnection
)
in my command.py:
from elasticsearch import Elasticsearch
from django.conf import settings
self.indices_client = settings.ES_CLIENT
print "create parent"
self.indices_client.index(
# op_type='create',
id='surveys',
doc_type='parent',
body={ "properties": { 'title': {'type': 'string', 'index': 'not_analyzed'}}},
index="surveys"
)
# create child index file_name with parent index surveys
# self.indices_client.create(index=child_index)
print 'create child'
self.indices_client.index(
doc_type='child',
body= upload_models.Survey._meta.es_mapping,
index=child_index,
parent='surveys'
)
print 'post child'
I keep getting this error:
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, u'illegal_argument_exception', u"Can't specify parent if no parent field has been configured")
The error is raised during the child mapping, in this call:
self.indices_client.index(
    doc_type='child',
    body=upload_models.Survey._meta.es_mapping,
    index=child_index,
    parent='surveys'
)
The parent parameter here is the ID of the parent document, so you can't use it for this purpose. Instead, try:
self.indices_client.index(
    doc_type='child',
    body={
        'child': {
            '_parent': {"type": "surveys"},
            'properties': upload_models.Survey._meta.es_mapping
        }
    },
    index=child_index
)
Or try another function, put_mapping(*args, **kwargs):
self.indices_client.indices.put_mapping(
    doc_type='child',
    index=child_index,
    body={
        'child': {
            '_parent': {"type": "surveys"},
            'properties': upload_models.Survey._meta.es_mapping
        }
    }
)
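A minimal sketch of the mapping body used above, built as a plain dictionary so it can be inspected before it is sent. build_child_mapping is a hypothetical helper, the field definition is made up for illustration, the parent type name is the one used in the answer, and the _parent mapping field is assumed to be supported by your Elasticsearch version (as far as I know it only applies to older, pre-6.x clusters):

```python
def build_child_mapping(child_type, parent_type, properties):
    """Return a mapping body declaring parent_type as the parent of child_type."""
    return {
        child_type: {
            "_parent": {"type": parent_type},
            "properties": properties,
        }
    }

# Hypothetical field definition, for illustration only.
body = build_child_mapping(
    child_type="child",
    parent_type="surveys",
    properties={"file_name": {"type": "string", "index": "not_analyzed"}},
)
print(body)
```

The resulting dictionary is what would be passed as body to put_mapping.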

pymongo $set on array of subdocuments

I have a pymongo collection in the form of:
{
"_id" : "R_123456789",
"supplier_ids" : [
{
"id" : "S_987654321",
"file_version" : ISODate("2016-03-15T00:00:00Z"),
"latest" : false
},
{
"id" : "S_101010101",
"file_version" : ISODate("2016-03-29T00:00:00Z"),
"latest" : true
}
]
}
When I get new supplier data, if the supplier ID has changed, I want to capture that by setting latest on the previous 'latest' record to False and then $push the new record.
$set is not working as I am trying to employ it (commented code after 'else'):
import pymongo
from dateutil.parser import parse
new_id = 'S_323232323'
new_date = parse('20160331')
with pymongo.MongoClient() as client:
    db = client.transactions
    collection_ids = db.ids
    try:
        collection_ids.insert_one({"_id": "R_123456789",
                                   "supplier_ids": ({"id": "S_987654321",
                                                     "file_version": parse('20160315'),
                                                     "latest": False},
                                                    {"id": "S_101010101",
                                                     "file_version": parse('20160329'),
                                                     "latest": True})})
    except pymongo.errors.DuplicateKeyError:
        print('record already exists')
    record = collection_ids.find_one({'_id':'R_123456789'})
    for supplier_id in record['supplier_ids']:
        print(supplier_id)
        if supplier_id['latest']:
            print(supplier_id['id'], 'is the latest')
            if supplier_id['id'] == new_id:
                print(new_id, ' is already the latest version')
            else:
                # print('setting', supplier_id['id'], 'latest flag to False')
                # <<< THIS FAILS >>>
                # collection_ids.update_one({'_id':record['_id']},
                #                           {'$set':{'supplier_ids.latest':False}})
                print('appending', new_id)
                data_to_append = {"id" : new_id,
                                  "file_version": new_date,
                                  "latest": True}
                collection_ids.update_one({'_id':record['_id']},
                                          {'$push':{'supplier_ids':data_to_append}})
any and all help is much appreciated.
This whole process seems unnaturally verbose - should I be using a more streamlined approach?
Thanks!
You can try the positional $ operator:
collection_ids.update_one(
    {'_id': record['_id'], 'supplier_ids.latest': True},
    {'$set': {'supplier_ids.$.latest': False}}
)
This update sets latest to False on the first array element matched by the query, i.e. the one whose latest is currently True.
The catch is that the array field has to be part of the query condition too.
For more information, see the MongoDB Update documentation.
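To address the verbosity concern, the whole change can be expressed as two filter/update pairs without looping over the array in Python. A minimal sketch that only builds the documents; build_supplier_swap is a hypothetical helper, and the collection name in the comment is the one from the question:

```python
from datetime import datetime

def build_supplier_swap(record_id, new_id, new_date):
    """Return (filter, update) pairs: clear the old flag, then push the new entry."""
    clear_old = (
        {"_id": record_id, "supplier_ids.latest": True},
        {"$set": {"supplier_ids.$.latest": False}},
    )
    push_new = (
        {"_id": record_id},
        {"$push": {"supplier_ids": {
            "id": new_id, "file_version": new_date, "latest": True}}},
    )
    return [clear_old, push_new]

operations = build_supplier_swap("R_123456789", "S_323232323", datetime(2016, 3, 31))
# With a live collection (assumed to be `collection_ids`, as in the question):
# for query, update in operations:
#     collection_ids.update_one(query, update)
```

The check that new_id is not already the latest entry still has to happen before these updates are issued.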

Example of update_item in dynamodb boto3

Following the documentation, I'm trying to create an update statement that will update or add if not exists only one attribute in a dynamodb table.
I'm trying this
response = table.update_item(
Key={'ReleaseNumber': '1.0.179'},
UpdateExpression='SET',
ConditionExpression='Attr(\'ReleaseNumber\').eq(\'1.0.179\')',
ExpressionAttributeNames={'attr1': 'val1'},
ExpressionAttributeValues={'val1': 'false'}
)
The error I'm getting is:
botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the UpdateItem operation: ExpressionAttributeNames contains invalid key: Syntax error; key: "attr1"
If anyone has done anything similar to what I'm trying to achieve please share example.
I found a working example. It is very important to list all of the table's key attributes in Key (here both ReleaseNumber and Timestamp). This requires an additional query before the update, but it works.
response = table.update_item(
    Key={
        'ReleaseNumber': releaseNumber,
        'Timestamp': result[0]['Timestamp']
    },
    UpdateExpression="set Sanity = :r",
    ExpressionAttributeValues={
        ':r': 'false',
    },
    ReturnValues="UPDATED_NEW"
)
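Since the full key has to come from a previously read item, a small helper can pick the key attributes out of it. A minimal sketch, assuming the key schema has the list-of-dicts shape that boto3 reports as table.key_schema; key_from_item is a hypothetical helper and the item values are made up for illustration:

```python
def key_from_item(item, key_schema):
    """Pick the primary-key attributes out of an item that was read earlier."""
    return {entry["AttributeName"]: item[entry["AttributeName"]] for entry in key_schema}

# Shape assumed to mirror what boto3 reports as `table.key_schema`.
key_schema = [
    {"AttributeName": "ReleaseNumber", "KeyType": "HASH"},
    {"AttributeName": "Timestamp", "KeyType": "RANGE"},
]
# Hypothetical item, as a previous query might have returned it.
item = {"ReleaseNumber": "1.0.179", "Timestamp": "2016-03-31T00:00:00", "Sanity": "true"}
key = key_from_item(item, key_schema)
print(key)
```

The resulting dictionary can be passed directly as the Key argument of update_item.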
Details on dynamodb updates using boto3 seem incredibly sparse online, so I'm hoping these alternative solutions are useful.
get / put
import boto3
table = boto3.resource('dynamodb').Table('my_table')
# get item
response = table.get_item(Key={'pkey': 'asdf12345'})
item = response['Item']
# update
item['status'] = 'complete'
# put (idempotent)
table.put_item(Item=item)
actual update
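import boto3
table = boto3.resource('dynamodb').Table('my_table')
table.update_item(
    Key={'pkey': 'asdf12345'},
    # AttributeUpdates is a legacy parameter; each entry needs a Value and an Action.
    AttributeUpdates={
        'status': {'Value': 'complete', 'Action': 'PUT'},
    },
)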
If you don't want to build the update parameter by parameter, I wrote a function that returns the arguments needed to call update_item with boto3.
def get_update_params(body):
    """Given a dictionary we generate an update expression and a dict of values
    to update a dynamodb table.
    Params:
        body (dict): Parameters to use for formatting.
    Returns:
        update expression, dict of values.
    """
    update_expression = ["set "]
    update_values = dict()
    for key, val in body.items():
        update_expression.append(f" {key} = :{key},")
        update_values[f":{key}"] = val
    return "".join(update_expression)[:-1], update_values
Here is a quick example:
def update(body):
    a, v = get_update_params(body)
    response = table.update_item(
        Key={'uuid':str(uuid)},
        UpdateExpression=a,
        ExpressionAttributeValues=dict(v)
    )
    return response
The original code example:
response = table.update_item(
Key={'ReleaseNumber': '1.0.179'},
UpdateExpression='SET',
ConditionExpression='Attr(\'ReleaseNumber\').eq(\'1.0.179\')',
ExpressionAttributeNames={'attr1': 'val1'},
ExpressionAttributeValues={'val1': 'false'}
)
Fixed:
# Attr requires: from boto3.dynamodb.conditions import Attr
response = table.update_item(
    Key={'ReleaseNumber': '1.0.179'},
    UpdateExpression='SET #attr1 = :val1',
    ConditionExpression=Attr('ReleaseNumber').eq('1.0.179'),
    ExpressionAttributeNames={'#attr1': 'val1'},
    ExpressionAttributeValues={':val1': 'false'}
)
The accepted answer also revealed that the table has a range key, so that must be included in Key as well. update_item has to address the exact record to be updated: there are no batch updates, and you can't update a range of values filtered by a condition down to a single record. The ConditionExpression is useful for making updates idempotent, i.e. don't update the value if it already has that value. It is not like a SQL WHERE clause.
Regarding the specific error seen:
ExpressionAttributeNames is a mapping of placeholders to attribute names for use in the UpdateExpression, useful if the attribute name is a reserved word.
From the docs, "An expression attribute name must begin with a #, and be followed by one or more alphanumeric characters". The error occurs because the code uses an ExpressionAttributeNames key that doesn't start with a #, and doesn't use it in the UpdateExpression either.
ExpressionAttributeValues are placeholders for the values you want to update to, and they must start with a colon (:).
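To illustrate the idempotency point, here is a minimal sketch that builds update_item keyword arguments with a string ConditionExpression, so the write is skipped when the attribute already holds the target value. build_idempotent_set is a hypothetical helper, and the attribute name Sanity is borrowed from the accepted answer:

```python
def build_idempotent_set(key, attribute, value):
    """Build update_item keyword arguments that only write when the value would change."""
    return {
        "Key": key,
        "UpdateExpression": "SET #attr = :val",
        # Skip the write when the attribute already holds this value.
        "ConditionExpression": "attribute_not_exists(#attr) OR #attr <> :val",
        "ExpressionAttributeNames": {"#attr": attribute},
        "ExpressionAttributeValues": {":val": value},
    }

kwargs = build_idempotent_set({"ReleaseNumber": "1.0.179"}, "Sanity", "false")
# With a boto3 Table resource (assumed to be `table`):
# table.update_item(**kwargs)
```

When the condition is not met, boto3 raises a ClientError whose error code is ConditionalCheckFailedException, so the caller should be prepared to catch it.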
Based on the official example, here's a simple and complete solution which could be used to manually update (not something I would recommend) a table used by a terraform S3 backend.
Let's say this is the table data as shown by the AWS CLI:
$ aws dynamodb scan --table-name terraform_lock --region us-east-1
{
"Items": [
{
"Digest": {
"S": "2f58b12ae16dfb5b037560a217ebd752"
},
"LockID": {
"S": "tf-aws.tfstate-md5"
}
}
],
"Count": 1,
"ScannedCount": 1,
"ConsumedCapacity": null
}
You could update it to a new digest (say you rolled back the state) as follows:
import boto3
dynamodb = boto3.resource('dynamodb', 'us-east-1')
try:
    table = dynamodb.Table('terraform_lock')
    response = table.update_item(
        Key={
            "LockID": "tf-aws.tfstate-md5"
        },
        UpdateExpression="set Digest=:newDigest",
        ExpressionAttributeValues={
            ":newDigest": "50a488ee9bac09a50340c02b33beb24b"
        },
        ReturnValues="UPDATED_NEW"
    )
except Exception as msg:
    print(f"Oops, could not update: {msg}")
Note the : at the start of ":newDigest": "50a488ee9bac09a50340c02b33beb24b"; it is easy to miss or forget.
A small update to Jam M. Hernandez Quiceno's answer, which includes ExpressionAttributeNames to prevent encountering errors such as:
"errorMessage": "An error occurred (ValidationException) when calling the UpdateItem operation:
Invalid UpdateExpression: Attribute name is a reserved keyword; reserved keyword: timestamp",
def get_update_params(body):
    """
    Given a dictionary of key-value pairs to update an item with in DynamoDB,
    generate three objects to be passed to UpdateExpression, ExpressionAttributeValues,
    and ExpressionAttributeNames respectively.
    """
    update_expression = []
    attribute_values = dict()
    attribute_names = dict()
    for key, val in body.items():
        update_expression.append(f" #{key.lower()} = :{key.lower()}")
        attribute_values[f":{key.lower()}"] = val
        attribute_names[f"#{key.lower()}"] = key
    return "set " + ", ".join(update_expression), attribute_values, attribute_names
Example use:
update_expression, attribute_values, attribute_names = get_update_params(
{"Status": "declined", "DeclinedBy": "username"}
)
response = table.update_item(
Key={"uuid": "12345"},
UpdateExpression=update_expression,
ExpressionAttributeValues=attribute_values,
ExpressionAttributeNames=attribute_names,
ReturnValues="UPDATED_NEW"
)
print(response)
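Because placeholder names must be alphanumeric, deriving them from the attribute name breaks for attributes containing characters such as a hyphen. A minimal variant, assuming numbered placeholders are acceptable; build_update_params is a hypothetical name and the attribute names in the example are made up:

```python
def build_update_params(attributes):
    """Build a SET expression with numbered placeholders (#n0, :v0, ...)."""
    if not attributes:
        raise ValueError("at least one attribute is required")
    names, values, assignments = {}, {}, []
    for i, (attr, value) in enumerate(attributes.items()):
        names[f"#n{i}"] = attr      # placeholder -> real attribute name
        values[f":v{i}"] = value    # placeholder -> new value
        assignments.append(f"#n{i} = :v{i}")
    return "SET " + ", ".join(assignments), names, values

expression, names, values = build_update_params(
    {"timestamp": 1459382400, "declined-by": "username"}
)
print(expression)
```

The three return values map onto UpdateExpression, ExpressionAttributeNames, and ExpressionAttributeValues in that order.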
An example that updates any number of attributes given as a dict and keeps track of the number of updates. It works with reserved words (e.g. name).
The following attribute names shouldn't be used, as their values would be overwritten: _inc, _start.
from typing import Dict
from boto3 import Session
def getDynamoDBSession(region: str = "eu-west-1"):
    """Connect to DynamoDB resource from boto3."""
    return Session().resource("dynamodb", region_name=region)
DYNAMODB = getDynamoDBSession()
def updateItemAndCounter(db_table: str, item_key: Dict, attributes: Dict) -> Dict:
    """
    Update item or create new. If the item already exists, return the previous value and
    increase the counter: update_counter.
    """
    table = DYNAMODB.Table(db_table)
    # Init update-expression
    update_expression = "SET"
    # Build expression-attribute-names, expression-attribute-values, and the update-expression
    expression_attribute_names = {}
    expression_attribute_values = {}
    for key, value in attributes.items():
        update_expression += f' #{key} = :{key},'  # Notice the "#" to solve issue with reserved keywords
        expression_attribute_names[f'#{key}'] = key
        expression_attribute_values[f':{key}'] = value
    # Add counter start and increment attributes
    expression_attribute_values[':_start'] = 0
    expression_attribute_values[':_inc'] = 1
    # Finish update-expression with our counter
    update_expression += " update_counter = if_not_exists(update_counter, :_start) + :_inc"
    return table.update_item(
        Key=item_key,
        UpdateExpression=update_expression,
        ExpressionAttributeNames=expression_attribute_names,
        ExpressionAttributeValues=expression_attribute_values,
        ReturnValues="ALL_OLD"
    )
Hope it might be useful to someone!
A simple way to update an item's value with a new one:
response = table.update_item(
    Key={"my_id_name": "my_id_value"},  # identifies the record
    UpdateExpression="set item_key_name = :value",  # operation (set); :value must match the placeholder below
    ExpressionAttributeValues={":value": "new_value"},  # the new value
    ReturnValues="UPDATED_NEW"  # optional: return the updated attributes
)
Simple example with multiple fields:
import boto3
dynamodb_client = boto3.client('dynamodb')
dynamodb_client.update_item(
    TableName=table_name,
    Key={
        'PK1': {'S': 'PRIMARY_KEY_VALUE'},
        'SK1': {'S': 'SECONDARY_KEY_VALUE'}
    },
    UpdateExpression='SET #field1 = :field1, #field2 = :field2',
    ExpressionAttributeNames={
        '#field1': 'FIELD_1_NAME',
        '#field2': 'FIELD_2_NAME',
    },
    ExpressionAttributeValues={
        ':field1': {'S': 'FIELD_1_VALUE'},
        ':field2': {'S': 'FIELD_2_VALUE'},
    }
)
The previous answer from eltbus worked for me, except for a minor bug: you have to delete the extra trailing comma using update_expression[:-1].

MongoDB - Upsert with increment

I am trying to run the following query:
data = {
'user_id':1,
'text':'Lorem ipsum',
'$inc':{'count':1},
'$set':{'updated':datetime.now()},
}
self.db.collection('collection').update({'user_id':1}, data, upsert=True)
but the two '$' queries cause it to fail. Is it possible to do this within one statement?
First of all, when you ask a question like this, it's very helpful to add information on why it's failing (e.g. copy the error).
Your query fails because you're mixing $ operators with plain replacement fields. You should use the $set operator for the user_id and text fields as well (although the user_id part of your update is irrelevant in this example).
So convert this shell query to a pymongo query:
db.test.update({user_id:1},
{$set:{text:"Lorem ipsum", updated:new Date()}, $inc:{count:1}},
true,
false)
I've removed the user_id in the update because that isn't necessary. If the document exists this value will already be 1. If it doesn't exist the upsert will copy the query part of your update into the new document.
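A rough pymongo translation of the shell command above, as a sketch that only builds the two documents. build_counter_upsert is a hypothetical helper, and the collection object mentioned in the comment is assumed to exist elsewhere:

```python
from datetime import datetime, timezone

def build_counter_upsert(user_id, text):
    """Return the query and update documents for an upsert that also increments count."""
    query = {"user_id": user_id}
    update = {
        # Plain fields go under $set so they are not mixed with operators.
        "$set": {"text": text, "updated": datetime.now(timezone.utc)},
        "$inc": {"count": 1},
    }
    return query, update

query, update = build_counter_upsert(1, "Lorem ipsum")
# With a pymongo collection (assumed to be `collection`):
# collection.update_one(query, update, upsert=True)
```

On insert, the user_id from the query is copied into the new document, so it does not need to appear under $set.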
If you're trying to do the following:
If the doc doesn't exist, insert a new doc.
If it exists, then only increment one field.
Then you can use a combo of $setOnInsert and $inc. If the song exists then $setOnInsert won't do anything and $inc will increase the value of "listened". If the song doesn't exist, then it will create a new doc with the fields "songId" and "songName". Then $inc will create the field and set the value to be 1.
let songsSchema = new mongoose.Schema({
  songId: String,
  songName: String,
  listened: Number
})
let Song = mongoose.model('Song', songsSchema);
let saveSong = (song) => {
  return Song.updateOne(
    {songId: song.songId},
    {
      $inc: {listened: 1},
      $setOnInsert: {
        songId: song.songId,
        songName: song.songName,
      }
    },
    {upsert: true}
  )
    .then((savedSong) => {
      return savedSong;
    })
    .catch((err) => {
      console.log('ERROR SAVING SONG IN DB', err);
    })
};
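For readers using pymongo rather than mongoose, the same $setOnInsert plus $inc combination can be sketched as follows. build_listen_upsert is a hypothetical helper, the song values are made up, and the collection in the comment is assumed to exist elsewhere:

```python
def build_listen_upsert(song_id, song_name):
    """Return query and update documents: insert the song once, always bump listened."""
    query = {"songId": song_id}
    update = {
        "$inc": {"listened": 1},
        # Applied only when the upsert inserts a new document.
        "$setOnInsert": {"songName": song_name},
    }
    return query, update

query, update = build_listen_upsert("abc123", "Example Song")
# With a pymongo collection (assumed to be `songs`):
# songs.update_one(query, update, upsert=True)
```

songId is left out of $setOnInsert here because the upsert copies the equality fields of the query into the inserted document.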
