Django queryset: Build a dictionary from a queryset, with common elements - python

I'm trying to construct a dictionary from my database that separates my data into values with common timestamps. Each data point looks like:
data_point:
    time: <timestamp>
    value: integer
I have 66k data points, of which around 7k share timestamps (meaning the measurement was taken at the same time).
I need to make a dict that would look like:
{
    "data_array": [
        {
            "time": "2018-05-11T10:34:43.826Z",
            "values": [
                13560465,
                87856595,
                78629348
            ]
        },
        {
            "time": "2018-05-11T10:34:43.882Z",
            "values": [
                13560689,
                78237945,
                92378456
            ]
        }
    ]
}
There are other keys in the dictionary, but I'm just having a bit of a struggle with this particular key.
The idea is: look at my data queryset, group up objects that share a timestamp, then add a key "time" to my dict whose value is the timestamp, and an array "values" whose value is a list of those data.value objects.
I'm not experienced enough to build this without looping a lot and probably being very inefficient. Some kind of "while timestamp doesn't change: append value to list", though I'm not sure how to go about that either.
Ideally, if I can do this with queries (that should be faster, right?), I would prefer that.

Why not use collections.defaultdict?
from collections import defaultdict

data = defaultdict(list)

# qs is your queryset
for time, value in qs.values_list('time', 'value'):
    data[time].append(value)
In that case data looks like:
{
    'time_1': [
        value_1_1,
        value_1_2,
        ...
    ],
    'time_2': [
        value_2_1,
        value_2_2,
        ...
    ],
    ...
}
At this point you can build any output format you want.
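For example, a minimal sketch of turning that defaultdict into the "data_array" structure from the question (assuming qs is the queryset and that the timestamps should be serialized with isoformat()):

from collections import defaultdict

data = defaultdict(list)
# One database round trip; rows arrive as (time, value) tuples
for time, value in qs.values_list('time', 'value'):
    data[time].append(value)

# One entry per distinct timestamp, sorted chronologically
result = {
    "data_array": [
        {"time": time.isoformat(), "values": values}
        for time, values in sorted(data.items())
    ]
}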

Related

Convert Embedded JSON Dict To Pandas DataFrame Where Column Headers Are Separate From Values

I'm trying to create a Python pandas DataFrame out of a JSON dictionary. The nested structure is tripping me up.
The column headers are in a different section of the JSON file from the values.
The JSON looks similar to the example below. There is one section of column headers and multiple sections of data.
I need each column filled with the data that relates to it, so value_one in each case will fill the column under header_one, and so on.
I have come close, but can't seem to get it to spit out the DataFrame as described.
{
    "my_data": {
        "column_headers": [
            "header_one",
            "header_two",
            "header_three"
        ],
        "values": [
            {
                "data": [
                    "value_one",
                    "value_two",
                    "value_three"
                ]
            },
            {
                "data": [
                    "value_one",
                    "value_two",
                    "value_three"
                ]
            }
        ]
    }
}
Assuming your dictionary is my_dict, try:
>>> pd.DataFrame(data=[d["data"] for d in my_dict["my_data"]["values"]],
...              columns=my_dict["my_data"]["column_headers"])
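Put together as a self-contained sketch (my_dict here is the sample JSON above, already parsed, e.g. with json.loads):

import pandas as pd

my_dict = {
    "my_data": {
        "column_headers": ["header_one", "header_two", "header_three"],
        "values": [
            {"data": ["value_one", "value_two", "value_three"]},
            {"data": ["value_one", "value_two", "value_three"]},
        ],
    }
}

# Each inner "data" list becomes one row; the headers become the columns
df = pd.DataFrame(
    data=[d["data"] for d in my_dict["my_data"]["values"]],
    columns=my_dict["my_data"]["column_headers"],
)
print(df)
#   header_one header_two header_three
# 0  value_one  value_two  value_three
# 1  value_one  value_two  value_three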

How to check whether a comma-separated value in the database is present in JSON data using Python

I just have to check the JSON data on the basis of the comma-separated e_codes in the table.
How do I filter only the data where the user's e_codes are available?
In the database:
id  email      age  e_codes
1.  abc#gmail  19   123456,234567,345678
2.  xyz#gmail  31   234567,345678,456789
This is my JSON data:
[
    {
        "ct": 1,
        "e_code": 123456
    },
    {
        "ct": 2,
        "e_code": 234567
    },
    {
        "ct": 3,
        "e_code": 345678
    },
    {
        "ct": 4,
        "e_code": 456789
    },
    {
        "ct": 5,
        "e_code": 456710
    }
]
If efficiency is not an issue, you could loop through the table, split the values into a list with case['e_codes'].split(','), and then, for each code, loop through the JSON to see whether it is present.
This gets inefficient if your table, your JSON, or the number of values per row is large.
It is better to first create a lookup dictionary in which the codes are the keys:
lookup = {}
for e in my_json:
    # Store keys as strings so they match the split() output below
    lookup[str(e['e_code'])] = 1
You can then check how many of the codes in your table are actually in the JSON:
## Let's assume that the "e_codes" cell of the
## current line is data['e_codes'][i], where i is the line number
for i in lines:
    match = [0, 0]
    for code in data['e_codes'][i].split(','):
        try:
            match[0] += lookup[code]
            match[1] += 1
        except KeyError:
            match[1] += 1
    if match[1] > 0:
        share_present = match[0] / match[1]
For each case you get a share_present, which is 1.0 if all codes appear in the JSON, 0.0 if none of them do, and some value in between indicating the share of codes that were present. Depending on your threshold for keeping a case, you can set a filter to True or False based on this value.
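An equivalent self-contained sketch using a set instead of a dictionary (the rows list is a hypothetical stand-in for the table shown above):

my_json = [
    {"ct": 1, "e_code": 123456},
    {"ct": 2, "e_code": 234567},
    {"ct": 3, "e_code": 345678},
    {"ct": 4, "e_code": 456789},
    {"ct": 5, "e_code": 456710},
]
rows = [
    {"id": 1, "e_codes": "123456,234567,345678"},
    {"id": 2, "e_codes": "234567,345678,456789"},
]

# One pass over the JSON builds the lookup set
known_codes = {e["e_code"] for e in my_json}

for row in rows:
    codes = [int(c) for c in row["e_codes"].split(",")]
    share_present = sum(c in known_codes for c in codes) / len(codes)
    print(row["id"], share_present)  # 1.0 when every code is in the JSON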

How to pick apart array data

Trying to output just the employee data (empFirst, empLast, empSalary, empRoles) to a bottle project. I just want the values, not the keys. How would I go about this? It feels like I've tried everything but can't get at the data I need!
My query
emp_curs = connection.coll.find({}, {"_id": False, "employee.empFirst": True})
dept_list = list(emp_curs)
(just playing with the first name for now until it's working)
My loop
% for d in emp_list:
    % for i in d:
    <tr>
        <td>{{d[i]}}</td>
        <td>{{d[i]}}</td>
        <td>{{d[i]}}</td>
        <td>{{d[i]}}</td>
    </tr>
    % end
% end
That's the closest I've gotten. Looking to take all the data and place it in a table.
Sorry, here's some sample data:
[
    {
        "deptCode": "ACCT",
        "deptName": "Accounting",
        "deptBudget": 200000,
        "employee": [
            {
                "empFirst": "Marsha",
                "empLast": "Bonavoochi",
                "empSalary": 59000
            },
            {
                "empFirst": "Roberto",
                "empLast": "Acostaletti",
                "empSalary": 85000,
                "empRoles": [
                    "Manager"
                ]
            },
            {
                "empFirst": "Dini",
                "empLast": "Cappelletti",
                "empSalary": 50500
            }
        ]
    }
]
It looks like you are stopping just one layer early within your nested list of dictionaries. This should get you all the applicable values for the employee data:
for department in department_list:
    for employee in department["employee"]:
        for value in employee.values():
            print(value)  # or whatever operation you want; adding to the table in your case
Looks like you have adding to the table working as you want, so that should work for you. Based on the structure of your sample data, I'm assuming there will be multiple departments to pull this data from (hence me starting with department_list).
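Translated back into the bottle template from the question, a sketch might look like the below (assuming the projection is dropped from the find() call so all employee fields come back; .get() guards against employees without empRoles):

% for department in dept_list:
    % for employee in department["employee"]:
    <tr>
        <td>{{employee.get("empFirst", "")}}</td>
        <td>{{employee.get("empLast", "")}}</td>
        <td>{{employee.get("empSalary", "")}}</td>
        <td>{{", ".join(employee.get("empRoles", []))}}</td>
    </tr>
    % end
% end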

Writing 3 Python dictionaries to a CSV

I have 3 dictionaries (2 of them are setdefault dicts with multiple values):
Score_dict:
{'Id_1': [('100001124156327', 0.0),
          ('100003643614411', 0.0)],
 'Id_2': [('100000435456546', 5.7),
          ('100000234354556', 3.5)]}
post_dict:
{'Id_1': [('+', 100004536)],
 'Id_2': [('-', 100035430)]}
comment_dict:
{'Id_1': [('+', 1023434234)],
 'Id_2': [('-', 10343534534),
          ('*', 1097963644)]}
My current approach is to write them into 3 different CSV files and then merge them; I want to merge them according to a common first column (the ID column).
But I am unable to figure out how to merge 3 CSV files into a single CSV file. Also, is there any way I can write all 3 dictionaries into a single CSV without writing them individually?
Output required-
Ids Score_Ids Post_Ids Comment_Ids
Id_1 100001124156327',0.0 +,100004536 +,1023434234
100003643614411',0.0
Id_2 100000435456546',5.7 -,100035430 -,10343534534
100000234354556',3.5 *,1097963644
What is the correct and most efficient way to do this?
You can merge them all first, then write them to a csv file:
import pprint

scores = {
    'Id_1': [
        ('100001124156327', 0.0),
        ('100003643614411', 0.0)
    ],
    'Id_2': [
        ('100000435456546', 5.7),
        ('100000234354556', 3.5)
    ]
}
post_dict = {
    'Id_1': [
        ('+', 100004536)
    ],
    'Id_2': [
        ('-', 100035430)
    ]
}
comment_dict = {
    'Id_1': [
        ('+', 1023434234)
    ],
    'Id_2': [
        ('-', 10343534534),
        ('*', 1097963644)
    ]
}

# Merge on the shared keys (Python 3: use items(), not iteritems())
merged = {
    key: {
        "Score_Ids": value,
        "Post_Ids": post_dict[key],
        "Comment_Ids": comment_dict[key]
    }
    for key, value in scores.items()
}

pp = pprint.PrettyPrinter(depth=6)
pp.pprint(merged)
For reference: https://repl.it/repls/SqueakySlateblueDictionaries
I suggest transforming your three dicts into one list of dicts before writing them to a csv file.
Example:
rows = [
    {"Score_Id": "...", "Post_Id": "...", "Comment_Id": "..."},
    {"Score_Id": "...", "Post_Id": "...", "Comment_Id": "..."},
    {"Score_Id": "...", "Post_Id": "...", "Comment_Id": "..."},
    ...
]
And then use the csv.DictWriter class to write all the rows.
Since you have commas in your values (are you sure that's desirable behaviour? Splitting them into two different columns might be a better approach), be careful to use tabs or something else as the separator.
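A minimal sketch of that approach (hypothetical pre-flattened rows; tab as the delimiter so the commas inside the values survive):

import csv

rows = [
    {"Ids": "Id_1", "Score_Ids": "100001124156327,0.0",
     "Post_Ids": "+,100004536", "Comment_Ids": "+,1023434234"},
    {"Ids": "Id_2", "Score_Ids": "100000435456546,5.7",
     "Post_Ids": "-,100035430", "Comment_Ids": "-,10343534534"},
]

with open("merged.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f,
        fieldnames=["Ids", "Score_Ids", "Post_Ids", "Comment_Ids"],
        delimiter="\t",  # tab-separated, since the values contain commas
    )
    writer.writeheader()
    writer.writerows(rows)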
I suggest writing all three to the same file.
You could get the common keys by doing something like:
# In Python 3, dict.keys() views cannot be concatenated with +; use set union
common_keys = set(score_dict) | set(post_dict) | set(comment_dict)
for key_ in common_keys:
    val_score = score_dict.get(key_, some_default_value)
    post_score = post_dict.get(key_, some_default_value)
    comment_score = comment_dict.get(key_, some_default_value)
    # print key and vals to csv as before

How to get documents where KEY is greater than X

I am recording users' daily usage of my platform.
The structure of the documents in MongoDB is like this:
_id: X
day1: {
    loginCount: 4,
    someDict: { x: y, z: m }
}
day2: {
    loginCount: 5,
    someDict: { a: b, c: d }
}
Then I need to get the last 2 days' stats belonging to user X.
How can I get the values whose days are greater than two days ago (e.g. using the '$gte' operator)?
OK, if you insist on this scheme, try this:
{
    _id: Usemongokeyhere,
    userid: X,
    days: [
        {
            day: ISODate("2013-08-12T00:00:00Z"),
            loginCount: 10,
            // more stuff
        },
        {
            day: ISODate("2013-08-13T00:00:00Z"),
            loginCount: 11,
            // more stuff
        }
    ]
},
// more users
Then you can query like:
db.items.find(
    {"days.day": {
        $gte: ISODate("2013-08-30T00:00:00.000Z"),
        $lt: ISODate("2013-08-31T00:00:00.000Z")
    }}
)
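Since the question is about Python, the same range query through pymongo might look like this sketch (client, database, and collection names are assumptions):

from datetime import datetime, timedelta
from pymongo import MongoClient

client = MongoClient()      # assumes a local MongoDB instance
items = client.mydb.items  # hypothetical database/collection names

two_days_ago = datetime.utcnow() - timedelta(days=2)

# Matches users with at least one "days" entry from the last two days
for doc in items.find({"days.day": {"$gte": two_days_ago}}):
    print(doc["userid"], doc["days"])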
Unless there is any change in the question, I am answering based on this schema:
_id: X
day1: {
    loginCount: 4,
    someDict: { x: y, z: m }
}
day2: {
    loginCount: 5,
    someDict: { a: b, c: d }
}
Answer:
last 2 day's user stats which belongs to user X.
You cannot get this from the Mongo side with operators like $gte using this structure, because you get all the days when you query for user X. The document contains information about all days, and keeping dynamic values as keys is, in my opinion, bad practice. You can retrieve documents by defining fields, like db.collection.find({_id: X}, {day1: 1, day2: 1}).
However, you have to know what the keys are, and I am not sure how you keep day1 and day2 as keys: ISO date? Timestamp? Depending on how you hold them, you can write the fields in the query by writing yesterday and the day before yesterday as a date string or timestamp, and get the required information.
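For illustration, a sketch of that field-projection idea, assuming the day keys are date strings like "2013-08-12" (the key format is an assumption; adjust to however the keys are actually stored):

from datetime import date, timedelta
from pymongo import MongoClient

client = MongoClient()
coll = client.mydb.usage  # hypothetical database/collection names

today = date.today()
# Keys for yesterday and the day before, in the assumed "YYYY-MM-DD" format
last_two = [(today - timedelta(days=n)).isoformat() for n in (1, 2)]

doc = coll.find_one({"_id": "X"}, {key: 1 for key in last_two})
print(doc)  # only the last two days' sub-documents (plus _id)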
