Pymongo - aggregating from two databases

Pymongo - aggregating from two databases - python

Noob here. I need to get a report that is an aggregate of two collections from two databases. Tried to wrap my head around the problem but failed. The examples I have found are for aggregating two collections from the same database.
Collection 1
SweetsResults = client.ItemsDB.sweets
Collection sweets : _id, type, color
Collection2
SearchesResults = client.LogsDB.searches
Collection searches : _id, timestamp, type, color
The report I need will list all the sweets from the type “candy” with all the listed colors in the sweets collection, and for each line, the number (count) of searches for any available combination of “candy”+color.
Any help will be appreciated.
Thanks.

You can use the below script in mongo shell.
Get the distinct color for each type followed by count for each type and color combination.
var itemsdb = db.getSiblingDB('ItemsDB');
var logsdb = db.getSiblingDB('LogsDB');
var docs = [];
itemsdb.getCollection("sweets").aggregate([
{$match:{"type":"candy"}},
{$group: {_id:{type:"$type", color:"$color"}},
{$project: {_id:0, type:"$_id.type", color:"$_id.color"}}
]).forEach(function(doc){
doc.count = logsdb.getCollection("searches").count({ "type":"candy","color":doc.color});
docs.push(doc)
});

Exactly the same as in #Veeram answer but with python:
uri = 'mongodb://localhost'
client = MongoClient(uri)
items_db = client.get_database('ItemsDB')
logs_db = client.get_database('LogsDB')
docs = []
aggr = items_db.get_collection('sweets').aggregate([
{'$match': {"type": "candy"}},
{'$group': {'_id': {'type': "$type", 'color': "$color"}}},
{'$project': {'_id': 0, 'type': "$_id.type", 'color': "$_id.color"}},
])
for doc in aggr:
doc['count'] = logs_db.get_collection("searches").count({"type": "candy", "color": doc['color']})
docs.append(doc)

Related

How can I refactor my code to return a collection of dictionaries?

def read_data(service_client):
data = list_data(domain, realm) # This returns a data frame
building_data = []
building_names = {}
all_buildings = {}
for elem in data.iterrows():
building = elem[1]['building_name']
region_id = elem[1]['region_id']
bandwith = elem[1]['bandwith']
building_id = elem[1]['building_id']
return {
'Building': building,
'Region Id': region_id,
'Bandwith': bandwith,
'Building Id': building_id,
}
Basically I am able to return a single dictionary value upon a iteration here in this example. I have tried printing it as well and others.
I am trying to find a way to store multiple dictionary values on each iteration and return it, instead of just returning one.. Does anyone know any ways to achieve this?

You may replace your for-loop with the following to get all dictionaries in a list.
naming = {
'building_name': 'Building',
'region_id': 'Region Id',
'bandwith': 'Bandwith',
'building_id': 'Building Id',
}
return [
row[list(naming.values())].to_dict()
for idx, row in data.rename(naming, axis=1).iterrows()
]

I want to firstly create a database and then update it as per the values in mongodb

I want to update the value of Entry1 using upsert. I have a sensor that returns the value of Entry1. If sensor is blocked, the value is true. If sensor is not blocked then the value is False.
machineOne = None
oneIn = 1
while True:
global machineOneId
global userId
try:
if Entry1.get_value() and oneIn < 2:
machineOne = Entry1.get_value()
print('entered looopp ONeeeE', machineOne)
machine1 = {
'Entry1': Entry1.get_value(),
'Exit1': Exit1.get_value(),
'id': 'test'
}
result = Machine1.insert_one(machine1)
myquery = {"Entry1": 'true'}
newvalues = {"$set": {"id": result.inserted_id}}
#result = Machine1.insert_one(machine1)
Machine1.update_one(myquery, newvalues)
userId = result.inserted_id
oneIn += 1
print('added', result.inserted_id, oneIn)
elif machineOne:
print('entered looopp', userId)
myquery = {"id": userId}
newvalues = {"$set": {"id": Entry1.get_value()}}
upsert = True
#result = Machine1.insert_one(machine1)
Machine1.update_one(myquery, newvalues)
if Exit1.get_value():
print('added',)
finally:
print('nothings happened', machineOne)
what is expected: i should be able to update the Entry1 from true to false in the same log, displayed in robo3t

Good Afternoon #digs10 ,
I read your post and I think the error is how you locate the document that you want to update.
I remember that MongoDB document primary key is "_id" instead of "id". You could take a look here MongoDB Documents
For what I see in the code (I don't know Python but it is readable), you are referring to the document "Entry1" using the field "id" instead of "_id.
Try modifing the line myquery = {"id": userId} for myquery = {"_id": userId}.
I hope this answer can help you.
Best Regards,
JB
P.S: I saw this question in my email and I took a quick read of it, If I misunderstood it, please let me know.

Python-How to find duplicated name/document in mongo db?

I want to find the duplicated document in my mongodb based on name, I have the following code:
def Check_BFA_DB(options):
issue_list=[]
client = MongoClient(options.host, int(options.port))
db = client[options.db]
collection = db[options.collection]
names = [{'$project': {'name':'$name'}}]
name_cursor = collection.aggregate(names, cursor={})
for name in name_cursor:
issue_list.append(name)
print(name)
It will print all names, how can I print only the duplicated ones?
Appritiated for any help!

The following query will show only duplicates:
db['collection_name'].aggregate([{'$group': {'_id':'$name', 'count': {'$sum': 1}}}, {'$match': {'count': {'$gt': 1}}}])
How it works:
Step 1:
Go over the whole collection, and group the documents by the property called name, and for each name count how many times it is used in the collection.
Step 2:
filter (using the keyword match) only documents in which the count is greater than 1 (the gt operator).
An example (written for mongo shell, but can be easily adapted for python):
db.a.insert({name: "name1"})
db.a.insert({name: "name1"})
db.a.insert({name: "name2"})
db.a.aggregate([{"$group": {_id:"$name", count: {"$sum": 1}}}, {$match: {count: {"$gt": 1}}}])
Result is { "_id" : "name1", "count" : 2 }
So your code should look something like this:
def Check_BFA_DB(options):
issue_list=[]
client = MongoClient(options.host, int(options.port))
db = client[options.db]
name_cursor = db[options.collection].aggregate([
{'$group': {'_id': '$name', 'count': {'$sum': 1}}},
{'$match': {'count': {'$gt': 1}}}
])
for document in name_cursor:
name = document['_id']
issue_list.append(name)
print(name)
BTW (not related to the question), python naming convention for function names is lowercase letters, so you might want to call it check_bfa_db()

What is the data format returned by the AdWords API TargetingIdeaPage service?

When I query the AdWords API to get search volume data and trends through their TargetingIdeaSelector using the Python client library the returned data looks like this:
(TargetingIdeaPage){
totalNumEntries = 1
entries[] =
(TargetingIdea){
data[] =
(Type_AttributeMapEntry){
key = "KEYWORD_TEXT"
value =
(StringAttribute){
Attribute.Type = "StringAttribute"
value = "keyword phrase"
}
},
(Type_AttributeMapEntry){
key = "TARGETED_MONTHLY_SEARCHES"
value =
(MonthlySearchVolumeAttribute){
Attribute.Type = "MonthlySearchVolumeAttribute"
value[] =
(MonthlySearchVolume){
year = 2016
month = 2
count = 2900
},
...
(MonthlySearchVolume){
year = 2015
month = 3
count = 2900
},
}
},
},
}
This isn't JSON and appears to just be a messy Python list. What's the easiest way to flatten the monthly data into a Pandas dataframe with a structure like this?
Keyword | Year | Month | Count
keyword phrase 2016 2 10

The output is a sudsobject. I found that this code does the trick:
import suds.sudsobject as sudsobject
import pandas as pd
a = [sudsobject.asdict(x) for x in output]
df = pd.DataFrame(a)
Addendum: This was once correct but new versions of the API (I tested
201802) now return a zeep.objects. However, zeep.helpers.serialize_object should do the same trick.
link

Here's the complete code that I used to query the TargetingIdeaSelector, with requestType STATS, and the method I used to parse the data to a useable dataframe; note the section starting "Parse results to pandas dataframe" as this takes the output given in the question above and converts it to a dataframe. Probably not the fastest or best, but it works! Tested with Python 2.7.
"""This code pulls trends for a set of keywords, and parses into a dataframe.
The LoadFromStorage method is pulling credentials and properties from a
"googleads.yaml" file. By default, it looks for this file in your home
directory. For more information, see the "Caching authentication information"
section of our README.
"""
from googleads import adwords
import pandas as pd
adwords_client = adwords.AdWordsClient.LoadFromStorage()
PAGE_SIZE = 10
# Initialize appropriate service.
targeting_idea_service = adwords_client.GetService(
'TargetingIdeaService', version='v201601')
# Construct selector object and retrieve related keywords.
offset = 0
stats_selector = {
'searchParameters': [
{
'xsi_type': 'RelatedToQuerySearchParameter',
'queries': ['donald trump', 'bernie sanders']
},
{
# Language setting (optional).
# The ID can be found in the documentation:
# https://developers.google.com/adwords/api/docs/appendix/languagecodes
'xsi_type': 'LanguageSearchParameter',
'languages': [{'id': '1000'}],
},
{
# Location setting
'xsi_type': 'LocationSearchParameter',
'locations': [{'id': '1027363'}] # Burlington,Vermont
}
],
'ideaType': 'KEYWORD',
'requestType': 'STATS',
'requestedAttributeTypes': ['KEYWORD_TEXT', 'TARGETED_MONTHLY_SEARCHES'],
'paging': {
'startIndex': str(offset),
'numberResults': str(PAGE_SIZE)
}
}
stats_page = targeting_idea_service.get(stats_selector)
##########################################################################
# Parse results to pandas dataframe
stats_pd = pd.DataFrame()
if 'entries' in stats_page:
for stats_result in stats_page['entries']:
stats_attributes = {}
for stats_attribute in stats_result['data']:
#print (stats_attribute)
if stats_attribute['key'] == 'KEYWORD_TEXT':
kt = stats_attribute['value']['value']
else:
for i, val in enumerate(stats_attribute['value'][1]):
data = {'keyword': kt,
'year': val['year'],
'month': val['month'],
'count': val['count']}
data = pd.DataFrame(data, index = [i])
stats_pd = stats_pd.append(data, ignore_index=True)
print(stats_pd)

How to turn a dataframe of categorical data into a dictionary

I have a dataframe that I need to transform into JSON. I think it would be easier to first turn it into a dictionary, but I can't figure out how. I need to transform it into JSON so that I can visualize it with js.d3
Here is what the data looks like currently:
NAME, CATEGORY, TAG
Ex1, Education, Books
Ex2, Transportation, Bus
Ex3, Education, Schools
Ex4, Education, Books
Ex5, Markets, Stores
Here is what I want the data to look like:
Data = {
Education {
Books {
key: Ex1,
key: Ex2
}
Schools {
key: Ex3
}
}
Transportation {
Bus {
key: Ex2
}
}
Markets {
Stores {
key: Ex5
}
}
(I think my JSON isn't perfect here, but I just wanted to convey the general idea).

This code is thanks to Brent Washburne's very helpful answer above. I just needed to remove the tags column because for now it was too messy (many of the rows had more than one tag separated by commas). I also added a column (of integers) which I wanted connected to the names. Here it is:
import json, string
import pprint
def to_json(file):
data = {}
for line in open(file):
fields = map(string.strip, line.split(','))
categories = data.get(fields[1], [])
to_append = {}
to_append[fields[0]] = fields[3]
categories.append(to_append)
data[fields[1]] = categories
return json.dumps(data)
print to_json('data.csv')

You can't use 'key' as a key more than once, so the innermost group is a list:
import json, string
def to_json(file):
data = {}
for line in open(file):
fields = map(string.strip, line.split(','))
categories = data.get(fields[1], {})
tags = categories.get(fields[2], [])
tags.append(fields[0])
categories[fields[2]] = tags
data[fields[1]] = categories
return json.dumps(data)
print to_json('data.csv')
Result:
{"Markets": {"Stores": ["Ex5"]}, "Education": {"Schools": ["Ex3"], "Books": ["Ex1", "Ex4"]}, "Transportation": {"Bus": ["Ex2"]}}

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Pymongo - aggregating from two databases - python

Related

How can I refactor my code to return a collection of dictionaries?

I want to firstly create a database and then update it as per the values in mongodb

Python-How to find duplicated name/document in mongo db?

What is the data format returned by the AdWords API TargetingIdeaPage service?

How to turn a dataframe of categorical data into a dictionary

Categories

Resources