PyMongo query - check if value exists or not - python

I'm looking to perform a query on a database and extract some data to be processed. Here is my query so far:
pipeline = [
    {'$match': {"Timestamp": {'$gte': m(), '$lt': current()},
                'Frequency Survey Reference': {'$regex': 'Ch2'}}},
    {'$group': {
        '_id': '$Timestamp',
        'Trace': {'$push': '$TR Trace'}
    }},
    {'$sort': {'_id': -1}},
    {'$limit': 1}
]
get_tr = collection.aggregate(pipeline, allowDiskUse=True)
However, some of the records don't have any value for TR Trace (an empty array), and I want to perform a check where it ignores those entries and doesn't include them in the pipeline. How would I perform such a check?

Filter them out in the $match stage with the $exists query operator, combined with $ne to skip empty values:
pipeline = [{'$match':{"Timestamp":{'$gte':m(), '$lt':current()},
'Frequency Survey Reference':{'$regex':'Ch2'},
'TR Trace': {'$exists': True, '$ne': ''}}},
...
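Since the question describes the missing values as empty arrays, comparing against `[]` rather than `''` may be closer to what is needed. Here is a minimal sketch of the full pipeline under that assumption; `build_pipeline`, `start`, `end` and `channel` are illustrative names, with `start` and `end` standing in for the `m()` and `current()` calls above:

```python
def build_pipeline(start, end, channel="Ch2"):
    """Build the aggregation pipeline, skipping documents without trace data.

    `start` and `end` stand in for the m() and current() values from the
    question; `channel` is the regex applied to the survey reference.
    """
    match = {
        "Timestamp": {"$gte": start, "$lt": end},
        "Frequency Survey Reference": {"$regex": channel},
        # Assumption: "no value" means the field is missing or is an empty list.
        "TR Trace": {"$exists": True, "$ne": []},
    }
    return [
        {"$match": match},
        {"$group": {"_id": "$Timestamp", "Trace": {"$push": "$TR Trace"}}},
        {"$sort": {"_id": -1}},
        {"$limit": 1},
    ]


pipeline = build_pipeline(start=0, end=100)
print(pipeline[0]["$match"]["TR Trace"])
```

The returned list can be passed to collection.aggregate(pipeline, allowDiskUse=True) exactly as in the question.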

Related

How do I get the document ID in Marqo?

I added a document to Marqo with add_documents() but didn't pass an ID, and now I'm trying to get the document but I don't know what its document_id is.
Here is what my code looks like:
mq = marqo.Client(url='http://localhost:8882')
mq.index("my-first-index").add_documents([
{
"Title": title,
"Description": document_body
}]
)
I tried to check whether the document was added:
no_of_docs = mq.index("my-first-index").get_stats()
print(no_of_docs)
I got:
{'numberOfDocuments': 1}
which means it was added.
If you don't include "_id" as one of the key/value pairs, Marqo generates a random ID for you by default. To find it, you can search for the document by its Title:
doc = mq.index("my-first-index").search(title_of_your_document, searchable_attributes=['Title'])
You should get a dictionary as the result, something like this:
{'hits': [{'Description': your_description,
'Title': title_of_your_document,
'_highlights': relevant part of the doc,
'_id': 'ac14f87e-50b8-43e7-91de-ee72e1469bd3',
'_score': 1.0}],
'limit': 10,
'processingTimeMs': 122,
'query': 'The Premier League'}
The _id field of each hit is the ID of your document.
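To pull the ID out programmatically, you can read it from each entry in 'hits'. A small sketch that works on a plain dictionary shaped like the response above; `extract_ids` is a hypothetical helper name and the sample values are placeholders:

```python
def extract_ids(search_result):
    """Return the _id of every hit in a Marqo-style search response."""
    return [hit["_id"] for hit in search_result.get("hits", [])]


# Sample response shaped like the output above (values are placeholders).
sample = {
    "hits": [
        {
            "Title": "The Premier League",
            "Description": "placeholder description",
            "_id": "ac14f87e-50b8-43e7-91de-ee72e1469bd3",
            "_score": 1.0,
        }
    ],
    "limit": 10,
    "query": "The Premier League",
}

doc_ids = extract_ids(sample)
print(doc_ids)
```

If the search returns several hits, the list holds one ID per hit, in the order the hits were returned.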

What to write in a MongoDB pipeline to find one element of 'keys'?

I would like to create a query to find the number of trees whose species name ends with 'um', grouped by arrondissement.
My code is here:
from pymongo import MongoClient
from utils import get_my_password, get_my_username
from pprint import pprint
client = MongoClient(
host='127.0.0.1',
port=27017,
username=get_my_username(),
password=get_my_password(),
authSource='admin'
)
db = client['paris']
col = db['trees']
pprint(col.find_one())
{'_id': ObjectId('5f3276d8c22f704983b3f681'),
'adresse': 'JARDIN DU CHAMP DE MARS / C04',
'arrondissement': 'PARIS 7E ARRDT',
'circonferenceencm': 115.0,
'domanialite': 'Jardin',
'espece': 'hippocastanum',
'genre': 'Aesculus',
'geo_point_2d': [48.8561906007, 2.29586827747],
'hauteurenm': 11.0,
'idbase': 107224.0,
'idemplacement': 'P0040937',
'libellefrancais': 'Marronnier',
'remarquable': '0',
'stadedeveloppement': 'A',
'typeemplacement': 'Arbre'}
I tried to do it with the following lines:
import re
regex = re.compile('um')
pipeline = [
{'$group': {'_id': '$arrondissement',
'CountNumberTrees': {'$count': '${'espece': regex}'}
}
}
]
results = col.aggregate(pipeline)
pprint(list(results))
But it returns:
File "<ipython-input-114-fba3a8bf5bfd>", line 8
'CountNumberTrees': {'$count': '${'espece': regex}'}
^
SyntaxError: invalid syntax
When I check with count_documents like this, it returns a result: 25245
results = col.count_documents(filter={'espece': regex})
print(results)
Could you please help me understand what I should put in the pipeline?
Best regards
Try this syntax for your aggregate query:
The $match stage filters on espece ending in um.
The $group stage counts the matching records, grouped by arrondissement.
The $project stage is optional, but it provides a tidier list of fields.
cursor = col.aggregate([
{'$match': {'espece': {'$regex': 'um$'}}},
{'$group': {'_id': '$arrondissement', 'CountNumberTrees': {'$sum': 1}}},
{'$project': {'_id': 0, 'arrondissement': '$_id', 'CountNumberTrees': '$CountNumberTrees'}}
])
print(list(cursor))
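To see what the three stages compute, here is the same logic sketched in plain Python on a few made-up documents (the sample data is hypothetical and only carries the two fields the pipeline uses). Note that the pattern 'um$' is anchored to the end of the string, whereas re.compile('um') from the question matches 'um' anywhere in the name:

```python
import re
from collections import Counter

# Hypothetical sample documents with only the two fields the pipeline uses.
trees = [
    {"arrondissement": "PARIS 7E ARRDT", "espece": "hippocastanum"},
    {"arrondissement": "PARIS 7E ARRDT", "espece": "platanoides"},
    {"arrondissement": "PARIS 12E ARRDT", "espece": "hippocastanum"},
    {"arrondissement": "PARIS 12E ARRDT", "espece": "pseudoplatanus"},
    {"arrondissement": "PARIS 12E ARRDT", "espece": "monspessulanum"},
]

ends_with_um = re.compile(r"um$")

# $match: keep species ending in "um"; $group: count per arrondissement.
counts = Counter(
    tree["arrondissement"]
    for tree in trees
    if ends_with_um.search(tree["espece"])
)

# $project: reshape each group into the output document.
result = [
    {"arrondissement": name, "CountNumberTrees": n}
    for name, n in counts.items()
]
print(result)
```

This is only an in-memory illustration; on a real collection the aggregation runs on the server and avoids loading every document into Python.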

MongoDB update not working with PyMongo

I want to add new keys to an existing object in a MongoDB document. I am trying to update the specific object with an update query, but I don't see the new keys in the database.
I have an object like this:
{'_id': 'patent_1023',
'raw': {'id': 'CN-109897889-A',
'title': 'A kind of LAMP(ring mediated isothermal amplification) product visible detection method',
'assignee': '北京天恩泽基因科技有限公司',
'inventor/author': '徐堤',
'priority_date': '2019-04-17',
'filing/creation_date': '2019-04-17',
'publication_date': '2019-06-18',
'grant_date': None,
'result_link': 'https://patents.google.com/patent/CN109897889A/en', 'representative_figure_link': None
},
'source': 'Google Patent'}
I want to add two new keys, 'abstract' and 'description', to 'raw' and update only 'raw'.
Here is what I have done:
d = client.find_one({'_id': {'$in': ids}})
d['raw'].update(missing_data) # missing_data contain new keys to be added in raw.
here = client.find_one_and_update({'_id': d['_id']}, {'$set': {"raw": d['raw']}})
Both update_one and update_many will work with this:
missing_data = {'abstract': 'a book', 'description': 'a fun book'}
ids = ['patent_1023', 'X']
rc = db.foo.update_one(
    {'_id': {'$in': ids}},
    # Use the pipeline form of update to exploit richer aggregation
    # framework functions like $mergeObjects. Below we are saying "take
    # the incoming raw object, overlay the missing_data object on top of
    # it, and then set that back into raw and save":
    [{'$set': {
        'raw': {'$mergeObjects': ['$$ROOT.raw', missing_data]}
    }}]
)
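An alternative that does not need the pipeline form is a plain $set with dot notation, which adds the new keys without rewriting the rest of raw. A small sketch, where `build_nested_set` is a hypothetical helper name and the new keys are assumed not to contain dots themselves:

```python
def build_nested_set(parent, new_fields):
    """Build a $set document that adds keys under `parent` using dot notation."""
    # Assumption: none of the new keys contains a "." character.
    return {"$set": {f"{parent}.{key}": value for key, value in new_fields.items()}}


missing_data = {"abstract": "a book", "description": "a fun book"}
update = build_nested_set("raw", missing_data)
print(update)
```

The resulting dictionary can be passed as the update argument to update_one, update_many or find_one_and_update.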

Python - How to find duplicated names/documents in MongoDB?

I want to find the duplicated documents in my MongoDB collection based on name. I have the following code:
def Check_BFA_DB(options):
    issue_list=[]
    client = MongoClient(options.host, int(options.port))
    db = client[options.db]
    collection = db[options.collection]
    names = [{'$project': {'name':'$name'}}]
    name_cursor = collection.aggregate(names, cursor={})
    for name in name_cursor:
        issue_list.append(name)
        print(name)
It prints all names. How can I print only the duplicated ones?
Any help is appreciated!
The following query will show only duplicates:
db['collection_name'].aggregate([{'$group': {'_id':'$name', 'count': {'$sum': 1}}}, {'$match': {'count': {'$gt': 1}}}])
How it works:
Step 1:
Go over the whole collection, group the documents by the name property, and count how many times each name is used in the collection.
Step 2:
Filter (using the $match stage) to keep only the groups whose count is greater than 1 (the $gt operator).
An example (written for the mongo shell, but easily adapted for Python):
db.a.insert({name: "name1"})
db.a.insert({name: "name1"})
db.a.insert({name: "name2"})
db.a.aggregate([{"$group": {_id:"$name", count: {"$sum": 1}}}, {$match: {count: {"$gt": 1}}}])
Result is { "_id" : "name1", "count" : 2 }
So your code should look something like this:
from pymongo import MongoClient

def Check_BFA_DB(options):
    issue_list = []
    client = MongoClient(options.host, int(options.port))
    db = client[options.db]
    # Group by name, count each group, and keep only names seen more than once.
    name_cursor = db[options.collection].aggregate([
        {'$group': {'_id': '$name', 'count': {'$sum': 1}}},
        {'$match': {'count': {'$gt': 1}}}
    ])
    for document in name_cursor:
        name = document['_id']
        issue_list.append(name)
        print(name)
BTW (not related to the question), the Python naming convention for function names is lowercase with underscores, so you might want to call it check_bfa_db().
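If you want to sanity-check the two steps without a database, the same group-then-filter logic can be sketched in plain Python. This reuses the three sample documents from the shell example above and is only an in-memory illustration, not a replacement for the aggregation:

```python
from collections import Counter

# Same sample data as the shell example above.
documents = [{"name": "name1"}, {"name": "name1"}, {"name": "name2"}]

# Step 1 ($group): count documents per name.
name_counts = Counter(doc["name"] for doc in documents)

# Step 2 ($match): keep only names that occur more than once.
duplicates = [
    {"_id": name, "count": count}
    for name, count in name_counts.items()
    if count > 1
]
print(duplicates)
```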

MongoDB - Upsert with increment

I am trying to run the following query:
data = {
'user_id':1,
'text':'Lorem ipsum',
'$inc':{'count':1},
'$set':{'updated':datetime.now()},
}
self.db.collection('collection').update({'user_id':1}, data, upsert=True)
but the two '$' operators cause it to fail. Is it possible to do this within one statement?
First of all, when you ask a question like this it's very helpful to add information on why it's failing (e.g. copy the error).
Your query fails because you're mixing $ operators with plain document fields in the same update. You should use the $set operator for the user_id and text fields as well (although the user_id part of your update is irrelevant in this example).
So convert this mongo shell command to a PyMongo query:
db.test.update({user_id:1},
{$set:{text:"Lorem ipsum", updated:new Date()}, $inc:{count:1}},
true,
false)
I've removed the user_id in the update because that isn't necessary. If the document exists this value will already be 1. If it doesn't exist the upsert will copy the query part of your update into the new document.
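For reference, here is one way the shell command above might look on the Python side. The sketch only builds the filter and update documents so it can run without a server; `build_upsert` is a hypothetical helper name, and the commented-out call assumes `collection` is a PyMongo collection object:

```python
from datetime import datetime, timezone


def build_upsert(user_id, text, now=None):
    """Return (filter, update) for an upsert that sets fields and bumps a counter."""
    now = now or datetime.now(timezone.utc)
    query = {"user_id": user_id}
    update = {
        # Plain fields go under $set so they are not mixed with operators.
        "$set": {"text": text, "updated": now},
        "$inc": {"count": 1},
    }
    return query, update


query, update = build_upsert(1, "Lorem ipsum")
# With a PyMongo collection object this would be applied as:
# collection.update_one(query, update, upsert=True)
print(query, sorted(update))
```

Every top-level key of the update document starts with '$', which is what the original data dictionary in the question was missing.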
If you're trying to do the following:
If the doc doesn't exist, insert a new doc.
If it exists, then only increment one field.
Then you can use a combination of $setOnInsert and $inc. In the Mongoose example below, if the song exists then $setOnInsert won't do anything and $inc will increase the value of "listened". If the song doesn't exist, the upsert creates a new doc with the fields "songId" and "songName", and $inc then creates the "listened" field and sets its value to 1.
const mongoose = require('mongoose');

let songsSchema = new mongoose.Schema({
    songId: String,
    songName: String,
    listened: Number
});
let Song = mongoose.model('Song', songsSchema);
let saveSong = (song) => {
    return Song.updateOne(
        {songId: song.songId},
        {
            // Increment on every call; set the other fields only on insert.
            $inc: {listened: 1},
            $setOnInsert: {
                songId: song.songId,
                songName: song.songName,
            }
        },
        {upsert: true}
    )
    .then((savedSong) => {
        return savedSong;
    })
    .catch((err) => {
        console.log('ERROR SAVING SONG IN DB', err);
    });
};
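To make the behaviour of $setOnInsert together with $inc concrete, here is a rough in-memory emulation in Python. It is not a database call: `upsert_song` is a hypothetical function that mimics the upsert on a plain dictionary keyed by songId:

```python
def upsert_song(store, song_id, song_name):
    """Emulate an upsert with $setOnInsert and $inc on a dict keyed by songId."""
    doc = store.get(song_id)
    if doc is None:
        # $setOnInsert: these fields are written only when the document is created.
        doc = {"songId": song_id, "songName": song_name}
        store[song_id] = doc
    # $inc: a missing field is treated as 0, so the first call yields 1.
    doc["listened"] = doc.get("listened", 0) + 1
    return doc


songs = {}
upsert_song(songs, "s1", "First title")
upsert_song(songs, "s1", "Renamed title")  # name is ignored: the doc exists
print(songs["s1"])
```

After two calls the counter has been incremented twice, while the name from the first call is kept because the insert-only fields are not touched on later updates.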
