Querying large Mongodb collection using pymongo - python

I want to query my MongoDB collection, which has more than 5k records. Each record has key-value pairs like
{
"A" : "unique-value1",
"B" : "service1",
"C" : 1.2321,
...
},
...
Here A always has a unique value, B takes values like service1, service2, ..., service8, and C is a float.
What I want is to get records like these, with just these key-value pairs:
{
"A" : "unique-value1",
"B" : "service1",
"C" : 1.2321
}
{
"A" : "unique-value2",
"B" : "service2",
"C" : 0.2321
}
{
"A" : "unique-value3",
"B" : "service1",
"C" : 3.2321
}
I am not sure how to do this. Earlier I used MapReduce, but at that time I only needed to generate records with the A and C key-value pairs. Now that I also need B, I do not know what to do.
This is what I was doing:
map_reduce = Code("""
function () {
    emit(this.A, parseFloat(this.C));
}
""")
result = my_collection.map_reduce(map_reduce, reduce, out='temp_collection')
for doc in result.find({}):
    out = dict()
    out[doc['_id']] = doc['_id']
    out['cost'] = doc['value']
    out_handle.update_one(
        {'A': doc['_id']},
        {'$set': out},
        upsert=True
    )

Unless I've misunderstood what you need, it looks like you are making this harder than it needs to be. Just project the keys you want using the second parameter of the find method.
for record in db.testcollection.find({}, { 'A': 1, 'B': 1, 'C': 1}):
    db.existingornewcollection.replace_one({'_id': record['_id']}, record, upsert=True)
Full example:
from pymongo import MongoClient
from bson.json_util import dumps
db = MongoClient()['testdatabase']
db.testcollection.insert_one({
"A": "unique-value1",
"B": "service1",
"C": 1.2321,
"D": "D",
"E": "E",
"F": "F",
})
for record in db.testcollection.find({}, { 'A': 1, 'B': 1, 'C': 1}):
    db.existingornewcollection.replace_one({'_id': record['_id']}, record, upsert=True)
print(dumps(db.existingornewcollection.find_one({}, {'_id': 0}), indent=4))
gives:
{
"A": "unique-value1",
"B": "service1",
"C": 1.2321
}
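If the copy should happen on the server rather than record by record in Python, an aggregation pipeline is one possible alternative. The following is only a sketch, assuming MongoDB 4.2 or newer (where the $merge stage is available); the helper name build_copy_pipeline is hypothetical and not part of pymongo.

```python
def build_copy_pipeline(fields, target_collection):
    """Build a pipeline that keeps only `fields` (plus _id) and
    upserts the projected documents into `target_collection`."""
    return [
        # Keep only the requested keys; _id is included by default.
        {"$project": {name: 1 for name in fields}},
        # Write the results to the target collection, replacing
        # documents with a matching _id and inserting the rest.
        {"$merge": {
            "into": target_collection,
            "on": "_id",
            "whenMatched": "replace",
            "whenNotMatched": "insert",
        }},
    ]


pipeline = build_copy_pipeline(["A", "B", "C"], "existingornewcollection")
print(pipeline)

# With a running server and the `db` handle from the example above:
# db.testcollection.aggregate(pipeline)
```

The loop with replace_one is easier to debug, while the pipeline avoids pulling every record into the client.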

Related

Why doesn't pymongo MongoDB return an exact value in find_one()?

I want to retrieve the single value "count" from a MongoDB database using pymongo, but it is not working. The image below shows how the data entry is set up.
Here is the call to my Database class to use the db.find_one().
CODE HERE:
filters = {"email": session.get('email')}
returns = {f'words.{today_s}.{self.length - 3}.count': 1}
count_value = Database.find_one_return_one("users", filters, returns)
print({f'words.{today_s}.{self.length - 3}.count':1})
print(count_value)
@staticmethod
def find_one_return_one(collection: str, query: Dict, data: Dict) -> Dict:
    return Database.DATABASE[collection].find_one(query, data)
This returns a list of empty dictionaries from the correct data. I want the count value returned.
This is the projection query: {words.20220302.0.count : 1}
This is what is returned:
{'_id': ObjectId('621ee5065d08c44070140df0'), 'words': {'20220302': [{}, {}, {}, {}, {}, {}, {}]}}
What is wrong, or is there a better, quicker way to retrieve the count value?
The following query projection can be used to get the desired result. Note this worked with MongoDB v5.
A sample document; similar to the one in the question post:
{ _id: 1, words: { fld: [ { a: 1, b: 2 }, { a: 9, b: 100 } ] } }
The expected result is: { "_id" : 1, "words" : { "fld" : { "a" : 9 } } }
The query:
INDEX = 1 # this is the index of the array element
query = { }
projection = {
    'words.fld': {
        '$arrayElemAt': [
            { '$map': { 'input': '$words.fld', 'in': { 'a': '$$this.a' } } },
            INDEX
        ]
    }
}
result = collection.find_one(query, projection)
print(result)
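The likely reason the original projection returned empty objects is that a numeric component in a find() projection path (the 0 in words.20220302.0.count) is treated as a field name inside the array elements, not as an array index. To apply the answer's projection to a dynamic path like the one in the question, the dictionary can be built by a small helper. This is a sketch only; element_field_projection is a hypothetical name, and the date key in the usage comment is taken from the question.

```python
def element_field_projection(array_path, field, index):
    """Build a find() projection that returns one field of the
    array element at `index` under `array_path`."""
    return {
        array_path: {
            "$arrayElemAt": [
                # Reduce every array element to just the wanted field...
                {"$map": {
                    "input": f"${array_path}",
                    "in": {field: f"$$this.{field}"},
                }},
                # ...then pick the element at the requested position.
                index,
            ]
        }
    }


projection = element_field_projection("words.fld", "a", 1)
print(projection)

# For the question's data this might look like:
# projection = element_field_projection("words.20220302", "count", 0)
# Database.DATABASE["users"].find_one(filters, projection)
```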

Unique combinations of values in a dictionary

For the following example dictionary, is there a builtin method to get all unique combinations?
a = {
"a": ["a_1", "a_2"],
"b": ["b_1", "b_2"]
}
output:
[
["a_1", "b_1"],
["a_1", "b_2"],
["a_2", "b_1"],
["a_2", "b_2"]
]
I did this with itertools.product()
import itertools
a = {
"a": ["a_1", "a_2"],
"b": ["b_1", "b_2"]
}
print(list(itertools.product(*a.values())))
Output:
[('a_1', 'b_1'), ('a_1', 'b_2'), ('a_2', 'b_1'), ('a_2', 'b_2')]
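If the keys should stay attached to each combination, the tuples from itertools.product() can be zipped back with the dictionary's keys. This is a small variation on the answer above, not something the question asked for, and it assumes Python 3.7+ so that dictionary order is preserved.

```python
import itertools

a = {
    "a": ["a_1", "a_2"],
    "b": ["b_1", "b_2"],
}

# Pair each tuple of values with the keys, in the same order.
combos = [dict(zip(a.keys(), values)) for values in itertools.product(*a.values())]
print(combos)
```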

Update mongodb database from python

I have a MongoDB database that I want to update. My current code for creating and updating the database is the following:
from pymongo import MongoClient
client = MongoClient()
client = MongoClient('localhost', 27017)
db = client['my_db_values']
res = collection = db['db_values']
res = collection.find({"User": "2145"})
if res.count() == 0:
    json_file = {"User": "2145", "Item": {"123456": {"process1": [],"process2": []}}}
    temp_json1 = {"timestamp": "2123532158", "process1_value": 0.4, "state": {"B": 0.1, "F": 0.2, "E": 0.3}}
    temp_json2 = {"timestamp": "2323532158", "process2_value": 0.2, "P": 0.8}
    json_file ["Item"][str(123464)]["process1"].append(temp_json1)
    json_file ["Item"][str(123464)]["process2"].append(temp_json2)
    temp = db.values
    temp_id = temp.insert_one(json_file).inserted_id
else:
    for line in res:
        counter = 0
        for key in line["Item"].keys():
            if line["Item"].keys()[counter] == "123464":
                collection.update_one({"User": "2145", "Item": {"123464": {"process1":[]}}}, {"$set": {"Item.123464.process2": [
                    {"timestamp": "21354879546213", "process1_value": 0,
                     "state": {"B": 0.1, "F": 0.2,
                               "E": 0.3}}], "Item.123464.process2": [
                    {"timestamp": "11354879546213", "process2_value": 0, "P": 0.8}]}})
            else:
                collection.update_one({"User": "2145"},{"$set": {"Item.123464.process1": [{"timestamp": "21354879546213", "process1_value": 0.4, "state": {"B": 0.1, "F": 0.2, "E": 0.3}}], "Item.123464.process2": [{"timestamp": "11354879546213", "process2_value": 0.2, "P": 0.8}]}})
            counter = counter + 1
In the first if statement, if the count is equal to zero, I create the JSON document for that specific user. If the user is already there, then I need to do the same for Sla, and then I need to update the db with the new temp_json1 and temp_json2. How can I update a subdocument in my initial document? I want to check whether the db has a user with the specific id (otherwise I want to add them); then, if the current item_id does not exist, add the item to the user document (as I did in my code already). Finally, if the item does exist, I just want to add temp_json1 and temp_json2 to the already created subdocument. How can I do so?
What you desire is subdocument querying (querying documents by their nested contents).
You can control this query by using the $elemMatch operator to specify what matches your query by the contents of the process1 array inside the 123456 subdocument of your Item subdocument.
The Mongo Shell format of the query is the following (for the Python driver, just use the query part):
db.your_collection.find({
    "User": "2145",
    "Item.123456.process1": {$elemMatch: {$eq: "12345"} }
});
So if your collection is populated with the following 2 documents:
{ "_id" : ObjectId("aaa"), "User" : "2145", "Item" : { "123456" : { "process1" : [ ], "process2" : [ ] } } }
{ "_id" : ObjectId("bbb"), "User" : "2145", "Item" : { "123456" : { "process1" : [ "12345" ], "process2" : [ ] } } }
This query will return only the 2nd document and omit the first, because its process1 array does not contain "12345".
Hope this helps!
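For the update itself, one possible approach (not taken from the answer above) is a single update_one call that combines $push with upsert=True. $push appends to an array and creates the array, including the nested Item.<item_id> path, if it does not exist yet; upsert=True inserts a new document when no user matches. The sketch below only builds the filter and update documents so it runs without a server; build_append_update is a hypothetical helper name.

```python
def build_append_update(user_id, item_id, process1_entry, process2_entry):
    """Return the (filter, update) documents for collection.update_one."""
    query = {"User": user_id}
    update = {
        "$push": {
            # Dotted paths address the arrays inside the nested subdocument.
            f"Item.{item_id}.process1": process1_entry,
            f"Item.{item_id}.process2": process2_entry,
        }
    }
    return query, update


temp_json1 = {"timestamp": "2123532158", "process1_value": 0.4,
              "state": {"B": 0.1, "F": 0.2, "E": 0.3}}
temp_json2 = {"timestamp": "2323532158", "process2_value": 0.2, "P": 0.8}

query, update = build_append_update("2145", "123464", temp_json1, temp_json2)
print(query, update)

# With a running MongoDB server and the `collection` from the question:
# collection.update_one(query, update, upsert=True)
```

This covers all three cases from the question (new user, new item, existing item) without first counting documents in Python.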

How to interpret a string to define a dictionary call?

I am attempting to pass a string into a function; the string will be interpreted to determine the dictionary call needed to update a dictionary.
Here is an example of what I have so far, hard-coded:
import json
from collections import defaultdict
def default_dict():
    return defaultdict(default_dict)
def build_dict():
    d["a"]["b"]["c"]["d"]["e"]["f"].update({})
    d["a"]["b"]["c1"]["d1"].update({})
    return json.dumps(d)
d = default_dict()
print(build_dict())
But to be useful to me, I want to pass strings to the build_dict() function. Let's call the string 's':
for s in ["a/b/c/d/e/f", "a/b/c1/d1"]:
    print(build_dict(s))
Which should print the following (exactly as it does in the example I hard-coded):
{
"a": {
"b": {
"c": {
"d": {
"e": {
"f": {}
}
}
},
"c1": {
"d1": {}
}
}
}
}
I have to make sure that multiple branches are supported in the way they are (as far as I have tested) in my hard-coded example.
What I am currently attempting:
Midway through constructing this question I found out about dpath, "A python library for accessing and searching dictionaries via /slashed/paths ala xpath". It looks like exactly what I need, so if I successfully work it out, I will post an answer to this question.
I worked out a solution to my own question.
import json
import dpath.util
def build_dict(viewsDict, viewsList):
    for views in viewsList:
        viewsDict = new_keys(viewsDict, views)
    return viewsDict
def new_keys(viewsDict, views):
    dpath.util.new(viewsDict, views, {})
    return viewsDict
viewsDict = {}
viewsList = [
    "a/b/c/d/e/f",
    "a/b/c1/d1"
]
print(json.dumps(build_dict(viewsDict, viewsList), indent=4, sort_keys=True))
This builds a dict from a sequence of paths and passes your test case.
It builds the dictionary from the top down, adding new keys if they are missing and descending into the existing dictionary when they are present.
def build_dict(string_seq):
    d = {}
    for s in string_seq:
        active_d = d
        parts = s.split("/")
        for p in parts:
            if p not in active_d:
                active_d[p] = {}
            active_d = active_d[p]
    return d
expected = {
"a": {
"b": {
"c": {
"d": {
"e": {
"f": {}
}
}
},
"c1": {
"d1": {}
}
}
}
}
string_seq = ["a/b/c/d/e/f", "a/b/c1/d1"]
result = build_dict(string_seq)
assert result == expected
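The recursive defaultdict from the question can also take path strings directly, because indexing a missing key creates the next level automatically. The following is a minimal sketch along those lines; add_path is a hypothetical helper name, and the "/" separator is taken from the question's examples.

```python
import json
from collections import defaultdict


def default_dict():
    # A defaultdict whose missing values are themselves default_dicts.
    return defaultdict(default_dict)


def add_path(d, path, sep="/"):
    """Walk `path` through `d`, creating missing levels on access."""
    node = d
    for key in path.split(sep):
        node = node[key]
    return d


d = default_dict()
for s in ["a/b/c/d/e/f", "a/b/c1/d1"]:
    add_path(d, s)

as_json = json.dumps(d, indent=4, sort_keys=True)
print(as_json)
```

One trade-off is that merely reading a missing key from such a dictionary also creates it, which the plain-dict version above avoids.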

python append to array in json object

I have the following json object in python:
jsonobj = {
"a": {
"b": {
"c": var1,
"d": var2,
"e": [],
},
},
}
And I would like to append key-value elements into "e", but can't figure out the syntax for it. I tried appending with the following, but it doesn't come out right with the brackets and quotes:
jsonobj["a"]["b"]["e"].append("'f':" + var3)
Instead, I want "e" to be the following:
"e":[
{"f":var3, "g":var4, "h":var5},
{"f":var6, "g":var7, "h":var8},
]
Does anyone know the right way to append to this json array? Much appreciation.
jsonobj["a"]["b"]["e"].append({"f":var3, "g":var4, "h":var5})
jsonobj["a"]["b"]["e"].append({"f":var6, "g":var7, "h":var8})
Just add the dictionary as a dictionary object, not a string:
jsonobj["a"]["b"]["e"].append(dict(f=var3))
Full source:
var1 = 11
var2 = 32
jsonobj = {"a":{"b":{"c": var1,
                     "d": var2,
                     "e": [],
                    },
               },
          }
var3 = 444
jsonobj["a"]["b"]["e"].append(dict(f=var3))
jsonobj will contain:
{'a': {'b': {'c': 11, 'd': 32, 'e': [{'f': 444}]}}}
jsonobj["a"]["b"]["e"] += [{'f': var3, 'g' : var4, 'h': var5},
                           {'f': var6, 'g' : var7, 'h': var8}]
