I am recording users' daily usage of my platform.
The documents in MongoDB are structured like this:
_id: X
day1:{
loginCount = 4
someDict { x:y, z:m }
}
day2:{
loginCount = 5
someDict { a:b, c:d }
}
Then I need to get the last two days of stats for user X.
How can I get the values for days on or after two days ago (for example, with an operator like '$gte')?
OK, if you insist on this scheme, try this:
{
  _id: Usemongokeyhere,
  userid: X,
  days: [
    { day: ISODate("2013-08-12T00:00:00.000Z"),
      loginCount: 10,
      // more stuff
    },
    { day: ISODate("2013-08-13T00:00:00.000Z"),
      loginCount: 11,
      // more stuff
    },
  ]
},
// more users
Then you can query like:
db.items.find(
  { "days.day": { $gte: ISODate("2013-08-30T00:00:00.000Z"),
                  $lt: ISODate("2013-08-31T00:00:00.000Z")
                }
  }
)
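Note that find() returns each matching document whole, including its older days. If only the recent entries should come back, one option is an aggregation with $filter. The following is a minimal sketch that only builds the pipeline as plain Python data; the function name recent_days_pipeline and the choice to count today as one of the days are assumptions, while userid, days.day and db.items follow the schema above.

```python
from datetime import datetime, timedelta, timezone


def recent_days_pipeline(user_id, n_days, now):
    """Build a pipeline that keeps only the last n_days entries of `days`."""
    # Midnight UTC of the oldest day to include (today counts as one day).
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    cutoff = today - timedelta(days=n_days - 1)
    return [
        # Select the user's document, and only if it has a recent day at all.
        {"$match": {"userid": user_id, "days.day": {"$gte": cutoff}}},
        # Trim the array so that older entries are not returned.
        {"$project": {
            "userid": 1,
            "days": {"$filter": {
                "input": "$days",
                "as": "d",
                "cond": {"$gte": ["$$d.day", cutoff]},
            }},
        }},
    ]


pipeline = recent_days_pipeline(
    "X", 2, datetime(2013, 8, 13, 15, 30, tzinfo=timezone.utc)
)
# With pymongo this would be run as: db.items.aggregate(pipeline)
```

Building the pipeline separately from running it keeps the date arithmetic easy to check without a database connection.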
Unless the question changes, I am answering based on this schema:
_id: X
day1:{
loginCount:4
someDict:{ x:y, z:m }
}
day2:{
loginCount:5
someDict:{ a:b, c:d }
}
Answer:
The last two days of stats for user X.
With this structure you cannot filter on the MongoDB side with operators like $gte, because a query for user X returns the whole document with every day in it. The document holds the data for all days, and using dynamic values as keys is, in my opinion, bad practice. You can retrieve only some fields by naming them in a projection, for example db.collection.find({_id:X},{day1:1,day2:1}).
However, you have to know what the keys are, and I am not sure how you store day1 and day2 as keys: ISO date or timestamp? Depending on the format, you can build the projection by writing yesterday and the day before yesterday as a date string or timestamp, and get the information you need.
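As a rough illustration of building that projection, here is a sketch that assumes the day keys are ISO date strings such as "2013-08-12". The question does not say how the keys are stored, so that format and the helper name projection_for_last_days are assumptions.

```python
from datetime import date, timedelta


def projection_for_last_days(n_days, today):
    """Return a projection naming the keys for the n_days before `today`."""
    # Assumption: each day is stored under a key like "2013-08-12".
    keys = [(today - timedelta(days=i)).isoformat() for i in range(1, n_days + 1)]
    return {key: 1 for key in keys}


projection = projection_for_last_days(2, date(2013, 8, 14))
# projection == {"2013-08-13": 1, "2013-08-12": 1}
# With pymongo it would be passed as: collection.find({"_id": X}, projection)
```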
I have data of the form:
{
'_id': asdf123b51234
'field2': 0
'array': [
0: {
'unique_array_elem_id': id
'nested_field': {
'new_field_i_want_to_add': value
}
}
...
]
}
I have been trying to update like this:
for doc in update_dict:
    collection.find_one_and_update(
        {'_id': doc['_id']},
        {'$set': {
            'array.$[elem].nested_field.new_field_i_want_to_add': doc['new_field_value']
        }},
        array_filters=[{'elem.unique_array_elem_id': doc['unique_array_elem_id']}]
    )
But it is painfully slow. Updating all of my data will take several days running continuously. Is there a way to update this nested field for all array elements for a given document at once?
Thanks a lot
I just have to check the JSON data against the comma-separated e_codes in the table.
How do I filter to only the data whose e_code appears in a user's e_codes?
In the database:
id  email      age  e_codes
1.  abc#gmail  19   123456,234567,345678
2.  xyz#gmail  31   234567,345678,456789
This is my JSON data
[
  { "ct": 1, "e_code": 123456 },
  { "ct": 2, "e_code": 234567 },
  { "ct": 3, "e_code": 345678 },
  { "ct": 4, "e_code": 456789 },
  { "ct": 5, "e_code": 456710 }
]
If efficiency is not an issue, you could loop through the table, split the values into a list with case['e_codes'].split(','), and then, for each code, loop through the JSON to see whether it is present.
This might be a little inefficient if your table, your JSON, or the number of codes per row is large.
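A minimal sketch of that straightforward nested-loop version, using the sample rows and JSON from the question. The names rows, my_json and codes_present_naive are placeholders, and the codes are compared as strings because split(',') yields strings while the JSON holds numbers.

```python
rows = [
    {"id": 1, "e_codes": "123456,234567,345678"},
    {"id": 2, "e_codes": "234567,345678,456789"},
]
my_json = [
    {"ct": 1, "e_code": 123456},
    {"ct": 2, "e_code": 234567},
    {"ct": 3, "e_code": 345678},
    {"ct": 4, "e_code": 456789},
    {"ct": 5, "e_code": 456710},
]


def codes_present_naive(row, records):
    """Return the codes of one table row that also appear in the JSON."""
    present = []
    for code in row["e_codes"].split(","):
        # Scan the whole JSON list for every single code.
        for record in records:
            if str(record["e_code"]) == code:
                present.append(code)
                break
    return present


naive_result = {row["id"]: codes_present_naive(row, my_json) for row in rows}
```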
It might be better to first create a lookup dictionary in which the codes are the keys:
lookup = {}
for e in my_json:
    # e_code is a number in the JSON but a string after split(','), so key by string
    lookup[str(e['e_code'])] = 1
You can then check how many of the codes in your table are actually in the JSON:
## Let's assume that the "e_codes" cell of the
## current line is data['e_codes'][i], where i is the line number
for i in lines:
    match = [0, 0]  # [codes found in the JSON, codes checked]
    for code in data['e_codes'][i].split(','):
        match[1] += 1
        match[0] += lookup.get(code.strip(), 0)
    if match[1] > 0:
        share_present = match[0] / match[1]
For each case, you get a share_present, which is 1.0 if all codes appear in the JSON, 0.0 if none of them do, and some value in between to indicate the share of codes that were present. Depending on your threshold for keeping a case, you can set a filter to True or False based on this value.
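The threshold step could look roughly like the sketch below. The set of known codes is deliberately a reduced, hypothetical one so that the two sample rows get different shares; the helper name share_present_for and the 0.5 threshold are assumptions as well.

```python
def share_present_for(e_codes, known_codes):
    """Share of the comma-separated codes that appear in known_codes."""
    codes = [c.strip() for c in e_codes.split(",") if c.strip()]
    if not codes:
        return 0.0
    return sum(c in known_codes for c in codes) / len(codes)


# Hypothetical subset of codes, stored as strings to match split(',').
known = {"123456", "234567"}
table = [(1, "123456,234567,345678"), (2, "234567,345678,456789")]

threshold = 0.5
kept = [row_id for row_id, e_codes in table
        if share_present_for(e_codes, known) >= threshold]
# Row 1 has 2 of 3 codes present and is kept; row 2 has 1 of 3 and is dropped.
```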
Trying to output just the employee data (empFirst, empLast, empSalary, empRoles) to a Bottle project. I just want the values, not the keys. How would I go about this? It feels like I've tried everything but can't get at the data I need!
My query
```
emp_curs = connection.coll.find({},{"_id": False,"employee.empFirst":True})
dept_list = list(emp_curs)
```
(just playing with the first name for now until it's working)
My loop
```
% for d in emp_list:
  % for i in d:
  <tr>
    <td>{{d[i]}}</td>
    <td>{{d[i]}}</td>
    <td>{{d[i]}}</td>
    <td>{{d[i]}}</td>
  </tr>
  %end
%end
```
That's the closest I've gotten :\
Looking to take all the data and place in a table.
Sorry, here's some sample data:
[
{
"deptCode": "ACCT",
"deptName": "Accounting",
"deptBudget": 200000,
"employee": [
{
"empFirst": "Marsha",
"empLast": "Bonavoochi",
"empSalary": 59000
},
{
"empFirst": "Roberto",
"empLast": "Acostaletti",
"empSalary": 85000,
"empRoles": [
"Manager"
]
},
{
"empFirst": "Dini",
"empLast": "Cappelletti",
"empSalary": 50500
}
]
}
]
It looks like you are stopping just one layer early within your nested list of dictionaries. This should get you all the applicable values for the employee data:
for department in department_list:
    for employee in department["employee"]:
        for value in employee.values():
            print(value)  # or whatever operation you want, adding to the table in your case
Looks like you have adding to the table working as you want, so that should work for you. Based on the structure of your sample data, I'm assuming there will be multiple departments to pull this data from (hence me starting with department_list).
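Since not every employee in the sample has empRoles, iterating over values() gives rows of different lengths. One way to keep the table columns aligned is to build fixed-width rows first; this is only a sketch, the names COLUMNS and employee_rows are made up for illustration, and it assumes the query returns all four employee fields rather than just empFirst.

```python
COLUMNS = ("empFirst", "empLast", "empSalary", "empRoles")

department_list = [
    {
        "deptCode": "ACCT",
        "employee": [
            {"empFirst": "Marsha", "empLast": "Bonavoochi", "empSalary": 59000},
            {"empFirst": "Roberto", "empLast": "Acostaletti", "empSalary": 85000,
             "empRoles": ["Manager"]},
            {"empFirst": "Dini", "empLast": "Cappelletti", "empSalary": 50500},
        ],
    }
]


def employee_rows(departments):
    """Flatten departments into one row of four values per employee."""
    rows = []
    for department in departments:
        for employee in department.get("employee", []):
            row = [employee.get(name, "") for name in COLUMNS[:3]]
            # empRoles is a list and may be missing; join it into one cell.
            row.append(", ".join(employee.get("empRoles", [])))
            rows.append(row)
    return rows


rows = employee_rows(department_list)
```

Each row can then be passed to the template and written out one cell per value.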
I'm trying to construct a dictionary from my database, that will separate my data into values with common time stamps.
data_point:
time: <timestamp>
value: integer
I have 66k data points, of which around 7k share timestamps (meaning the measurements were taken at the same time).
I need to make a dict that would look like:
{
"data_array": [
{
"time": "2018-05-11T10:34:43.826Z",
"values": [
13560465,
87856595,
78629348
]
},
{
"time": "2018-05-11T10:34:43.882Z",
"values": [
13560689,
78237945,
92378456
]
}
]
}
There are other keys in the dictionary, but I'm just having a bit of a struggle with this particular key.
The idea is to look at my data queryset and group the objects that share a timestamp, then add a key "time" to my dict, with the timestamp as its value, and an array "values" holding the list of those data.value objects.
I'm not experienced enough to build this without looping a lot and probably being very inefficient. Something like "while the timestamp doesn't change: append value to list", though I'm not sure how to go about that either.
Ideally, if I can do this with queries (should be faster, right?), I would prefer that.
Why not use collections.defaultdict?
from collections import defaultdict
data = defaultdict(list)
# qs is your queryset
for time, value in qs.values_list('time', 'value'):
    data[time].append(value)
In that case data looks like:
{
'time_1': [
value_1_1,
value_1_2,
...
],
'time_2': [
value_2_1,
value_2_2,
...
],
....
}
At this point you can build any output format you want.
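For example, the "data_array" layout from the question could be produced roughly like this. The sketch uses a plain list of (time, value) tuples in place of the queryset so that it runs on its own; the names pairs, grouped and result are placeholders.

```python
from collections import defaultdict

# Stand-in for qs.values_list('time', 'value'), using values from the question.
pairs = [
    ("2018-05-11T10:34:43.826Z", 13560465),
    ("2018-05-11T10:34:43.882Z", 13560689),
    ("2018-05-11T10:34:43.826Z", 87856595),
    ("2018-05-11T10:34:43.882Z", 78237945),
    ("2018-05-11T10:34:43.826Z", 78629348),
    ("2018-05-11T10:34:43.882Z", 92378456),
]

grouped = defaultdict(list)
for time, value in pairs:
    grouped[time].append(value)

# Timestamps in this format sort chronologically as strings.
result = {
    "data_array": [
        {"time": time, "values": values}
        for time, values in sorted(grouped.items())
    ]
}
```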
I have a function which computes something like sum of data (it's not a simple sum, there is an increasing number that multiplies it every time) in database through year. It is calculated in views, I need to pass them to template. I store it in Dictionary portfolio_dict[year] += amount
{'2013': Decimal('92.96892879384746351465539182'), '2012': Decimal('71.48765907571338816005401399')}
But I need some extra data to send as well. Let's say:
date:date
amount:Decimal
year:string
I know it sounds kind of stupid to have both a year and a date. I use the year as an index. How do I pass this data to the template, or add the date to my current dictionary?
Until now, I always had a Model and passed a list of that model's instances. But now I don't need to store this data in the database, so I don't want to create a model.
Where do I create new class in django if I don't want it to be in database?
Or should I use collections or data structures?
Only django.db.models.Model instances are stored in the database (and only if you explicitly ask for it). Anything else is just plain old Python, and you can create and use your own classes as you see fit.
But anyway: if all you need is a year-indexed collection of (date, amount) items, then a dict of dicts is enough:
{
'2013': {
'amount': Decimal('92.96892879384746351465539182'),
'date': datetime.date(2013, 10, 25)
},
# etc
}
Or if you need more than one (amount, date) per year, a dict with lists or dicts:
{
'2013': [
{
'amount': Decimal('92.96892879384746351465539182'),
'date': datetime.date(2013, 10, 25)
},
{
'amount': Decimal('29.9689287'),
'date': datetime.date(2013, 10, 21)
},
],
# etc
}
In fact the proper structure depends on how you're going to use the data.
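If a small class reads better than nested dicts, it can live in any ordinary module (views.py, for instance) and never touches the database. A minimal sketch, with the class name PortfolioEntry and the by_year grouping chosen only for illustration:

```python
import datetime
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class PortfolioEntry:
    """Plain Python class; it is not a Django model and is never saved."""
    year: str
    date: datetime.date
    amount: Decimal


entries = [
    PortfolioEntry("2013", datetime.date(2013, 10, 25),
                   Decimal("92.96892879384746351465539182")),
    PortfolioEntry("2013", datetime.date(2013, 10, 21),
                   Decimal("29.9689287")),
]

# Group the entries by year, keeping insertion order within each year.
by_year = {}
for entry in entries:
    by_year.setdefault(entry.year, []).append(entry)

# In a view this could be passed in the template context,
# for example as {"by_year": by_year}.
```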