Bucketing for histogram in MongoDB - python

I get a syntax error on line 25. Can someone please help me spot the mistake? I'm not sure whether the problem is in the code before this line.
File SyntaxError: invalid syntax, line 25
} #25
^
The line with the syntax error is marked with #25. Thank you in advance :)
import pandas as pd
def length_vs_references(articles):
    res = {"1-5" : 0, "6-10" : 0, "11-15" : 0, "16-20" : 0, "21-25" : 0, "25-30" : 0, ">30" :0}
    n = {"1-5" : 0, "6-10" : 0, "11-15" : 0, "16-20" : 0, "21-25" : 0, "25-30" : 0, ">30" :0}
    cursor = articles.aggregate([
        {'$match': {'$and' : [{'references': {'$exists': False}
            }, {'$ne':['$page_end', '']}, {'$ne':['$page_start', '']} ]}},
        {'$project': {'len_refernces': {"$size": '$references'},
                      'pages': {'$subtract': [{"$toInt": 'page_end'},
                                              {"$toInt" : 'page_start'}]}}},
        {'$bucket' : {
            '$groupBy': '$pages',
            'boundaries': [ 0, 6, 11, 16, 21, 26, 31, 1000000],
            'default': 'Other',
            {
                'output' : {"average": {"$avg" : '$len_references'}},
            }
        } #25
        }
    ])
    return cursor
print(length_vs_references(articles))

'default': 'Other',
{
'output' : {"average": {"$avg" : '$len_references'}},
}
That is the problem area. You have a sub-dictionary without a key name.
To illustrate the problem more simply, here is an equivalent dictionary:
mydict = {
'some_key': 5,
'other_key': 10,
'yet_another_key': 100,
3,
'final_key': 1000
}
The 3 is an error because it's just a value without a key name. Your code has a sub-dictionary instead of an integer, but it's the same kind of error.
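Applied to the question's code, the fix is to make output a direct key of the $bucket dictionary instead of wrapping it in an unnamed sub-dictionary. The following is a minimal sketch of that one stage only; the function name build_bucket_stage is made up for illustration. It also writes groupBy without a leading dollar sign, which is how the $bucket stage names that option.

```python
def build_bucket_stage():
    """Return a $bucket stage in which every entry is a key/value pair."""
    return {
        "$bucket": {
            "groupBy": "$pages",  # option name has no leading '$'
            "boundaries": [0, 6, 11, 16, 21, 26, 31, 1000000],
            "default": "Other",
            # 'output' sits next to the other options, with its own key name
            "output": {"average": {"$avg": "$len_references"}},
        }
    }


stage = build_bucket_stage()
print(sorted(stage["$bucket"]))  # ['boundaries', 'default', 'groupBy', 'output']
```

This only removes the syntax error; whether the earlier $match and $project stages select the intended documents is a separate question.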

Related

find time difference between previous and new data in collection using pymongo

I have list of dictionary data which is inserted in mongodb using pymongo
data =[{
"cam_name" : "cam1",
"stats" : [
{
"total" : 10,
"red" : 5,
"yellow" : 0,
"green" : 5,
"time_stamp" : datetime(2020,6,20,17,52,4,992000),
"image_path" : "image/19-06-2020/cam1/19-06-2020_17-52-16.jpg"
},
{
"total" : 10,
"red" : 5,
"yellow" : 0,
"green" : 5,
"time_stamp" : datetime(2020,6,20,17,52,27,992000),
"image_path" : "image/19-06-2020/cam1/19-06-2020_17-52-25.jpg"
},
{
"total" : 10,
"red" : 5,
"yellow" : 0,
"green" : 5,
"time_stamp" : datetime(2020,6,20,17,52,1,992000),
"image_path" : "image/19-06-2020/cam1/19-06-2020_17-52-25.jpg"
}]
},
{
"cam_name": "cam2",
"stats": [
{
"total": 10,
"red": 5,
"yellow": 0,
"green": 5,
"time_stamp": datetime(2020, 6, 20, 17, 52, 6, 992000),
"image_path": "image/19-06-2020/cam1/19-06-2020_17-52-16.jpg"
},
{
"total": 10,
"red": 5,
"yellow": 0,
"green": 5,
"time_stamp": datetime(2020, 6, 20, 17, 52, 59, 992000),
"image_path": "image/19-06-2020/cam1/19-06-2020_17-52-25.jpg"
},
{
"total": 10,
"red": 5,
"yellow": 0,
"green": 5,
"time_stamp": datetime(2020,6, 20, 17, 52, 4, 992000),
"image_path": "image/19-06-2020/cam1/19-06-2020_17-52-25.jpg"
}]
}
]
After inserting the data, I am trying to find the time difference between consecutive elements in the list of dictionaries stored under the key ['stats']['time_stamp'].
However, my approach of subtracting each element from the previous one does not give the result I expect:
from pymongo import MongoClient
from datetime import datetime
import json
import pandas as pd
from datetime import timedelta
myclient = MongoClient('localhost', 27017)
master_data = myclient['data_set']
cam_db = master_data['cam_table']
prev = None
for i,c in enumerate(data):
    for k in c['stats']:
        #print(k['time_stamp'])
        if prev==None:
            prev = k['time_stamp']
            #print(prev)
        else:
            prev = k['time_stamp'] - prev
            print(prev)
output:
0:00:23
2020-06-20 17:52:08.992000
-1 day, 23:59:56
2020-06-20 17:52:31.992000
0:00:00
After that, I have not been able to find the right approach for computing the time differences.
Note: I want to check whether the time difference between each pair of consecutive entries is less than 20 seconds.
Suggestions will be really helpful.
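One possible reading of the problem, sketched below: in the loop above, prev is overwritten with the difference (a timedelta) rather than with the current timestamp, so the next subtraction mixes types. The sketch assumes the timestamps of one camera should be compared in chronological order; the helper name gaps_within is hypothetical, and the sample values are the cam1 timestamps from the question.

```python
from datetime import datetime, timedelta


def gaps_within(timestamps, limit=timedelta(seconds=20)):
    """Return (gap, gap < limit) for each pair of consecutive timestamps.

    The timestamps are sorted first, so every gap is non-negative.
    """
    ordered = sorted(timestamps)
    return [(later - earlier, (later - earlier) < limit)
            for earlier, later in zip(ordered, ordered[1:])]


# The three cam1 timestamps from the question
stamps = [
    datetime(2020, 6, 20, 17, 52, 4, 992000),
    datetime(2020, 6, 20, 17, 52, 27, 992000),
    datetime(2020, 6, 20, 17, 52, 1, 992000),
]

for gap, ok in gaps_within(stamps):
    print(gap, ok)
# 0:00:03 True
# 0:00:23 False
```

To run it per camera, the list could be built as [k['time_stamp'] for k in c['stats']] inside the outer loop.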

How to find a document by using specific datetime?

This code runs perfectly:
dt = datetime.datetime(2018, 7, 15, 13, 24, 1, 962)
for doc in db.wing_model.find({'sampling_time': {'$gte': dt}}):
    print(doc)
But when I try to find a document with a specific datetime, as below, it returns nothing:
dt = datetime.datetime(2018, 7, 15, 13, 24, 1, 962)
for doc in db.wing_model.find({'sampling_time': dt}):
    print(doc)
Here is what my document looks like:
{
"_id" : ObjectId("5b4ae88100285b236134e08e"),
"value" : [
{
"MQ3_ALCOHOL" : 0,
"MQ5_H2" : 0.39,
"MQ138_PROPANE" : 4.4,
"MQ135_ALCOHOL" : 1.25
}
],
"enose_id" : "node1",
"sampling_time" : ISODate("2018-07-15T13:24:01.962Z")
}
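One detail worth checking here: the seventh positional argument of datetime.datetime is microseconds, while the .962 in the stored ISODate is 962 milliseconds. The sketch below compares the two values in plain Python, without touching the database; the variable names are chosen for illustration only.

```python
from datetime import datetime

# 13:24:01.962, matching ISODate("2018-07-15T13:24:01.962Z")
stored = datetime(2018, 7, 15, 13, 24, 1, 962000)

# 13:24:01.000962, which is what the value used in the query represents
queried = datetime(2018, 7, 15, 13, 24, 1, 962)

print(stored == queried)  # False: an exact match cannot succeed
print(queried <= stored)  # True: consistent with the $gte query finding the document
```

Under that assumption, passing 962000 as the last argument would be the value to try in the equality query.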

Google Sheets alternating colors via API

Using the sheets API with Python I'm attempting to format a sheet using alternating colors. In the UI this is found at Format > Alternating colors...
From what I've been able to find, this is done via the API using banding. Unfortunately, I haven't been able to find a working example of how this is done. Below is the values dictionary I've constructed. The color values aren't important at the moment; I'd just like it to colorize the sheet.
requests = {
'bandedRange': {
'bandedRangeId': 1,
'range': {
'sheetId': 0,
'startRowIndex': 0,
'endRowIndex': len(values),
'startColumnIndex': 0,
'endColumnIndex': 4,
},
'rowProperties': {
'headerColor': {
'red': 1,
'green': 0,
'blue': 1,
'alpha': 1,
},
'firstBandColor': {
'red': 1,
'green': 0,
'blue': 0,
'alpha': 0,
},
'secondBandColor': {
'red': 0,
'green': 1,
'blue': 0,
'alpha': 0,
}
},
},
'fields': '*',
}
body = {'requests': requests}
response = service.spreadsheets().batchUpdate(spreadsheetId=spreadsheet_id, body=body).execute()
This fails with the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/oauth2client/_helpers.py", line 133, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/http.py", line 840, in execute
raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://sheets.googleapis.com/v4/spreadsheets/$spreadsheet_id:batchUpdate?alt=json returned "Invalid JSON payload received. Unknown name "banded_range" at 'requests': Cannot find field.">
I'm fairly certain my issue is the fields value, but I can't find a valid example of what to use here. I get the same error if I omit the fields key entirely.
Per the reference docs for batchUpdate, requests takes an array of Request objects. Each Request must have exactly one field set, the available fields for banding being:
"updateBanding": {
object(UpdateBandingRequest)
},
"addBanding": {
object(AddBandingRequest)
},
"deleteBanding": {
object(DeleteBandingRequest)
},
There is no field bandedRange, which is what you're trying to set. That's what the error message (Unknown name "banded_range" at 'requests': Cannot find field.) is saying... though I have no idea why it translated bandedRange to snake_case.
Depending on if you want to add or update the banded range, you'd set either updateBanding with an UpdateBandingRequest object, or addBanding with an AddBandingRequest object.
You can do this by wrapping your bandedRange in addBanding. As explained above, you end up with the JSON below. Also, the fields key is optional.
{'addBanding': {
    'bandedRange': {
        'bandedRangeId': 1,
        'range': {
            'sheetId': 0,
            'startRowIndex': 0,
            'endRowIndex': len(values),
            'startColumnIndex': 0,
            'endColumnIndex': 4,
        },
        'rowProperties': {
            'headerColor': {
                'red': 1,
                'green': 0,
                'blue': 1,
                'alpha': 1,
            },
            'firstBandColor': {
                'red': 1,
                'green': 0,
                'blue': 0,
                'alpha': 0,
            },
            'secondBandColor': {
                'red': 0,
                'green': 1,
                'blue': 0,
                'alpha': 0,
            }
        },
    },
},
},
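To tie the two answers together, here is a rough sketch of the full request body: requests is a list, and its single element sets only addBanding. The helper name build_add_banding_body and its parameters are hypothetical, and row_count stands in for len(values) from the question. The sketch only builds the dictionary and makes no API call.

```python
def build_add_banding_body(sheet_id, row_count, column_count):
    """Build a batchUpdate body that holds a single addBanding request."""
    banded_range = {
        "bandedRangeId": 1,
        "range": {
            "sheetId": sheet_id,
            "startRowIndex": 0,
            "endRowIndex": row_count,
            "startColumnIndex": 0,
            "endColumnIndex": column_count,
        },
        "rowProperties": {
            # Colors copied from the question; adjust as needed
            "headerColor": {"red": 1, "green": 0, "blue": 1, "alpha": 1},
            "firstBandColor": {"red": 1, "green": 0, "blue": 0, "alpha": 0},
            "secondBandColor": {"red": 0, "green": 1, "blue": 0, "alpha": 0},
        },
    }
    # Each element of 'requests' sets exactly one request type
    return {"requests": [{"addBanding": {"bandedRange": banded_range}}]}


body = build_add_banding_body(sheet_id=0, row_count=10, column_count=4)
print(list(body["requests"][0]))  # ['addBanding']
```

The resulting body can then be passed to service.spreadsheets().batchUpdate(spreadsheetId=spreadsheet_id, body=body).execute(), as in the question.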

python api call using JSON data

I'm pretty new to Python API calls. I have the JSON data sample below.
I can get data by requesting, for example, (swell.*) or (swell.minBreakingHeight), and it returns all the swell data with no problems, so I do have a working request.
I can't seem to narrow it down any further, for example with swell.primary.height.
That format is evidently incorrect, since it keeps returning [].
How do I reach that extra level?
[{
timestamp: 1366902000,
localTimestamp: 1366902000,
issueTimestamp: 1366848000,
fadedRating: 0,
solidRating: 0,
swell: {
minBreakingHeight: 1,
absMinBreakingHeight: 1.06,
maxBreakingHeight: 2,
absMaxBreakingHeight: 1.66,
unit: "ft",
components: {
combined: {
height: 1.1,
period: 14,
direction: 93.25,
compassDirection: "W"
},
primary: {
height: 1,
period: 7,
direction: 83.37,
compassDirection: "W"
},
Working with your data snippet:
data = [{
'timestamp': 1366902000,
'localTimestamp': 1366902000,
'issueTimestamp': 1366848000,
'fadedRating': 0,
'solidRating': 0,
'swell': {
'minBreakingHeight': 1,
'absMinBreakingHeight': 1.06,
'maxBreakingHeight': 2,
'absMaxBreakingHeight': 1.66,
'unit': "ft",
'components': {
'combined': {
'height': 1.1,
'period': 14,
'direction': 93.25,
'compassDirection': "W"
},
'primary': {
'height': 1,
'period': 7,
'direction': 83.37,
'compassDirection': "W"
}
}
}
}
]
In [54]: data[0]['timestamp']
Out[54]: 1366902000
In [55]: data[0]['swell']['components']['primary']['height']
Out[55]: 1
So using your dot notation, you should be calling:
swell.components.primary.height
For further insight on parsing JSON files, refer to this other Stack Overflow question.
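If the response has already been parsed into Python objects, the same dotted path can be followed with a small helper. This is only an illustrative sketch: get_by_path is a made-up name, not part of any API, and the sample is trimmed to the fields needed.

```python
from functools import reduce


def get_by_path(obj, dotted_path):
    """Follow a dotted path such as 'a.b.c' through nested dictionaries."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), obj)


# Trimmed version of the sample data from the question
sample = [{
    "timestamp": 1366902000,
    "swell": {
        "minBreakingHeight": 1,
        "components": {
            "primary": {"height": 1, "period": 7},
        },
    },
}]

print(get_by_path(sample[0], "swell.components.primary.height"))  # 1
```

A missing level raises KeyError, which makes a wrong path such as swell.primary.height easy to spot.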

Get child dict values use Mongo Map/Reduce

I have a Mongo collection, and I want to get the total value of 'number_of_ad_clicks' for a given site name, timestamp and variant id. Because we have a large amount of data, it would be better to use map/reduce. Could anyone give me a suggestion?
Here is the JSON format of my collection:
{ "_id" : ObjectId( "4e3c280ecacbd1333b00f5ff" ),
"timestamp" : "20110805",
"variants" : { "94" : { "number_of_ad_clicks" : 41,
"number_of_search_keywords" : 9,
"total_duration" : 0,
"os" : { "os_2" : 2,
"os_1" : 1,
"os_0" : 0 },
"countries" : { "ge" : 6,
"ca" : 1,
"fr" : 8,
"uk" : 4,
"us" : 6 },
"screen_resolutions" : { "(320, 240)" : 1,
"(640, 480)" : 5,
"(1024, 960)" : 5,
"(1280, 768)" : 5 },
"widgets" : { "widget_1" : 1,
"widget_0" : 0 },
"languages" : { "ua_uk" : 8,
"ca_en" : 2,
"ca_fr" : 2,
"us_en" : 5 },
"search_keywords" : { "search_keyword_8" : 8,
"search_keyword_5" : 5,
"search_keyword_4" : 4,
"search_keyword_7" : 7,
"search_keyword_6" : 6,
"search_keyword_1" : 1,
"search_keyword_3" : 3,
"search_keyword_2" : 2 },
"number_of_pageviews" : 18,
"browsers" : { "browser_4" : 4,
"browser_0" : 0,
"browser_1" : 1,
"browser_2" : 2,
"browser_3" : 3 },
"keywords" : { "keyword_5" : 5,
"keyword_4" : 4,
"keyword_1" : 1,
"keyword_0" : 0,
"keyword_3" : 3,
"keyword_2" : 2 },
"number_of_keyword_clicks" : 83,
"number_of_visits" : 96 } },
"site_name" : "fonter.com",
"number_of_variants" : 1 }
Here is my attempt, which failed:
m = function() {
emit(this.query, {variants: this.variants});
}
r = function(key , vals) {
var clicks = 0 ;
for(var i = 0; i < vals.length(); i++){
clicks = vals[i]['number_of_ad_clicks'];
}
return clicks;
}
res = db.variant_daily_collection.mapReduce(m, r, {out : "myoutput", "query":{"site_name": 'fonter.com', 'timestamp': '20110805'}})
db.myoutput.find()
Could somebody offer a suggestion?
Thank you very much. I tried your solution, but nothing is returned.
I invoke the mapReduce as follows; is there anything wrong?
res = db.variant_daily_collection.mapReduce(map, reduce, {out : "myoutput", "query":{"site_name": 'facee.com', 'timestamp': '20110809', 'variant_id': '305'}})
db.myoutput.find()
The emit function emits both a key and a value.
If you are used to SQL, think of the key as your GROUP BY and the value as your SUM(), AVG(), etc.
In your case you want to "group by": site_name, timestamp and variant id. It looks like you may have more than one variant, so you will need to loop through the variants, like this:
map = function() {
    // variants belongs to the current document, so it is reached via this
    for(var i in this.variants){
        var key = {};
        key.timestamp = this.timestamp;
        key.site_name = this.site_name;
        key.variant_id = i; // that's the "94" string.
        var value = {};
        value.clicks = this.variants[i].number_of_ad_clicks;
        emit(key, value);
    }
}
The reduce function will get an array of values each one like this { clicks: 41 }. The function needs to return one object that looks the same.
So if you get values = [ {clicks:21}, {clicks:10}, {clicks:5} ] you must output {clicks:36}.
So you do something like this:
reduce = function(key , vals) {
    var returnValue = { clicks: 0 }; // initializing to zero
    // length is a property of the array, not a method
    for(var i = 0; i < vals.length; i++){
        returnValue.clicks += vals[i].clicks;
    }
    return returnValue;
}
Note that the value from map has the same shape as the return from reduce.
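To make that contract concrete, here is a plain-Python imitation of the same map and reduce steps. It does not call MongoDB at all. The first document is trimmed from the question; the second one, with 9 clicks, is invented so that the reduce step has something to add up.

```python
from collections import defaultdict


def map_doc(doc):
    """Yield one (key, value) pair per variant, like emit() in the map function."""
    for variant_id, stats in doc["variants"].items():
        key = (doc["timestamp"], doc["site_name"], variant_id)
        yield key, {"clicks": stats["number_of_ad_clicks"]}


def reduce_values(key, values):
    """Return one value with the same shape as the values it receives."""
    return {"clicks": sum(value["clicks"] for value in values)}


docs = [
    {"timestamp": "20110805", "site_name": "fonter.com",
     "variants": {"94": {"number_of_ad_clicks": 41}}},
    # Hypothetical second document, added only for illustration
    {"timestamp": "20110805", "site_name": "fonter.com",
     "variants": {"94": {"number_of_ad_clicks": 9}}},
]

# Group the emitted values by key, then reduce each group
grouped = defaultdict(list)
for doc in docs:
    for key, value in map_doc(doc):
        grouped[key].append(value)

totals = {key: reduce_values(key, values) for key, values in grouped.items()}
print(totals)  # {('20110805', 'fonter.com', '94'): {'clicks': 50}}
```

Because the reduced value has the same shape as the mapped values, it can itself be fed into another reduce call, which is why the shapes have to match.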
