Storing user permissions on rows of data - python

I have a table with rows of data for different experiments.
experiment_id data_1 data_2
------------- ------ -------
1
2
3
4
..
I have a user database on django, and I would like to store permissions indicating which users can access which rows and then return only the rows the user is authorized for.
What format should I use to store the permissions? Simply a table with a row for each user and a Boolean column for each experiment? And in that case, would I have to add a column to this table each time an experiment is added?
user experiment_1 experiment_2 experiment_3 ..
---- ------------ ------------ ------------ --
user_1 True False False ..
user_2 False True False ..
..
Any reference literature on the topic would also be great, preferably related to sqlite3 functionality since that is my current db backend.

I'm not 100% sure what will work best for you, but in the past I have found a solution like the following to be the easiest to query against and to maintain in the future.
Table: Experiment
Experiment_Id | data_1 | data_2
-----------------------------------
1 | ... | ...
2 | ... | ...
Table: User
User_Name | Password | ...
----------------------------
User1 | ...
User2 | ...
Table: User_Experiment_Permissions
User_Name | Experiment | Can_Read | Can_Edit
--------------------------------------------
User1 | 1 | true | false
User2 | 1 | false | false
User1 | 2 | true | true
User2 | 2 | true | false
As you can see, in the new table we reference both the user and the experiment. This gives us fine-grained control over the relationship between the user and the experiment. Also, if a new permission arises for this relationship, such as can_delete, you can simply add it to the new cross-reference table with a default, and the change will be retrofitted into your system :-)
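A minimal sketch of that cross-reference layout, using Python's sqlite3 module since SQLite is the backend mentioned in the question. The lowercase table and column names, the sample data values, and the helper readable_experiments are assumptions made for illustration, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE experiment (
        experiment_id INTEGER PRIMARY KEY,
        data_1 TEXT,
        data_2 TEXT
    );
    -- One row per (user, experiment) pair; booleans are stored as 0/1.
    CREATE TABLE user_experiment_permissions (
        user_name     TEXT    NOT NULL,
        experiment_id INTEGER NOT NULL REFERENCES experiment(experiment_id),
        can_read      INTEGER NOT NULL DEFAULT 0,
        can_edit      INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (user_name, experiment_id)
    );
    INSERT INTO experiment VALUES (1, 'a', 'b'), (2, 'c', 'd');
    INSERT INTO user_experiment_permissions VALUES
        ('User1', 1, 1, 0), ('User2', 1, 0, 0),
        ('User1', 2, 1, 1), ('User2', 2, 1, 0);
""")


def readable_experiments(conn, user_name):
    """Return only the experiment rows that user_name may read."""
    rows = conn.execute(
        """
        SELECT e.experiment_id, e.data_1, e.data_2
        FROM experiment AS e
        JOIN user_experiment_permissions AS p
          ON p.experiment_id = e.experiment_id
        WHERE p.user_name = ? AND p.can_read = 1
        ORDER BY e.experiment_id
        """,
        (user_name,),
    )
    return rows.fetchall()


# A later permission is added with a default; existing rows pick it up as 0.
conn.execute(
    "ALTER TABLE user_experiment_permissions "
    "ADD COLUMN can_delete INTEGER NOT NULL DEFAULT 0"
)

print(readable_experiments(conn, "User1"))
print(readable_experiments(conn, "User2"))
```

Adding an experiment then only means inserting permission rows, never changing the schema.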

It depends on how you will use the permissions.
- If you will use these values inside a query, for example to get the users with specific permissions, you have two options:
1. Create a bitmask number field in which every bit represents one permission, and use AND/OR to get whatever combination of permissions you need.
Advantage: small size, very efficient
Disadvantage: complex to implement
2. Create a field for each permission (your solution).
Advantage: easy to add
Disadvantage: you have to edit the schema for each new permission
- If you will not use it in any query and will process it in code, you can just dump JSON into one column that includes all the permissions the user has, like:
{"experiment_1": 1, "experiment_2": 0, "experiment_3": 1}
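To make the bitmask option more concrete, here is a rough sketch in Python with sqlite3. The flag values, the user_permissions table, and the helper names (grant, revoke, has_all, users_with) are all hypothetical and chosen only for this example.

```python
import sqlite3

# One bit per permission (assumed layout for this sketch).
CAN_READ, CAN_EDIT, CAN_DELETE = 0b001, 0b010, 0b100


def grant(mask, perms):
    """Set the given permission bits."""
    return mask | perms


def revoke(mask, perms):
    """Clear the given permission bits."""
    return mask & ~perms


def has_all(mask, perms):
    """True if every bit in perms is set in mask."""
    return (mask & perms) == perms


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_permissions ("
    "user_name TEXT, experiment_id INTEGER, mask INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO user_permissions VALUES (?, ?, ?)",
    [
        ("user_1", 1, CAN_READ | CAN_EDIT),
        ("user_2", 1, CAN_READ),
        ("user_3", 1, 0),
    ],
)


def users_with(conn, experiment_id, perms):
    """Names of users holding all of perms on one experiment."""
    rows = conn.execute(
        "SELECT user_name FROM user_permissions "
        "WHERE experiment_id = ? AND (mask & ?) = ? "
        "ORDER BY user_name",
        (experiment_id, perms, perms),
    )
    return [name for (name,) in rows]


print(users_with(conn, 1, CAN_READ))
print(users_with(conn, 1, CAN_READ | CAN_EDIT))
```

The filter uses SQLite's bitwise & operator, so the combination check happens inside the query rather than in Python.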

Related

How to compare two tables and identify a specific type of row to be returned?

I have two tables with relatively different data.
the photos table is a table with all the relevant meta data for photos such as user_id, photo_id, datetime, name, etc.
I have another table ratings that holds liked/disliked data for each respective photo. The columns in this table would have rater_id(for the person rating the picture), photo_id, and the rating (like/dislike).
The user would be presented a picture (at random) and then pick whether they liked it or not. Every time the image is loaded/presented it would have to be something that they have not yet rated.
What I'm trying to do is return a photo_id where the user has not yet rated it.
I've thought of using join or union, but I'm having difficulty understanding how to best use those (or any other solution) for this application. Where my confusion lies is how I can compare the ratings table against the photos table, to only return the photos that have not been rated by rater_id.
Sample data
photos table
id | photo_id
-------------------------
1 | photo_123
2 | photo_456
3 | photo_432
4 | photo_642
-------------------------
ratings table
id | photo_id | rater_id | rating
---------------------------------
1 | photo_123 | user2 | 1
2 | photo_456 | user2 | 1
3 | photo_123 | user1 | 1
4 | photo_642 | user2 | 1
--------------------------------
Sample Result: return photo_432 for user2 because it has not yet had a rating in ratings table
The canonical way would be not exists:
select p.*
from photos p
where not exists (select 1
                  from ratings r
                  where r.photo_id = p.photo_id and
                        r.rater_id = @rater
                 )
order by rand()
limit 1;
There are more efficient ways to get a random row back if the table is big.
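As a runnable illustration of the NOT EXISTS pattern, here is a sketch against an in-memory SQLite database loaded with the sample data from the question. It assumes SQLite (where the random function is random() rather than rand()), matches ratings to photos on the photo_id column as in the sample tables, and uses a hypothetical helper name, random_unrated_photo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photos (id INTEGER PRIMARY KEY, photo_id TEXT UNIQUE);
    CREATE TABLE ratings (
        id INTEGER PRIMARY KEY,
        photo_id TEXT,
        rater_id TEXT,
        rating INTEGER
    );
    INSERT INTO photos (photo_id) VALUES
        ('photo_123'), ('photo_456'), ('photo_432'), ('photo_642');
    INSERT INTO ratings (photo_id, rater_id, rating) VALUES
        ('photo_123', 'user2', 1), ('photo_456', 'user2', 1),
        ('photo_123', 'user1', 1), ('photo_642', 'user2', 1);
""")


def random_unrated_photo(conn, rater_id):
    """Return a random photo_id the rater has not rated, or None."""
    row = conn.execute(
        """
        SELECT p.photo_id
        FROM photos AS p
        WHERE NOT EXISTS (
            SELECT 1
            FROM ratings AS r
            WHERE r.photo_id = p.photo_id AND r.rater_id = :rater
        )
        ORDER BY random()
        LIMIT 1
        """,
        {"rater": rater_id},
    ).fetchone()
    return row[0] if row else None


print(random_unrated_photo(conn, "user2"))
```

For user2 only photo_432 is unrated, so that is the only possible result; for user1 any of the three remaining photos may come back.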

SQLite: Transposing results of a GROUP BY and filling in IDs with names

My question is rather specific, if you have a better title please suggest one. Also, formatting is bad - didn't know how to combine lists and codeblocks.
I have an SQLite3 database with the following (relevant parts of the) .schema:
CREATE TABLE users (id INTEGER PRIMARY KEY NOT NULL, user TEXT UNIQUE);
CREATE TABLE locations (id INTEGER PRIMARY KEY NOT NULL, name TEXT UNIQUE);
CREATE TABLE purchases (location_id INTEGER, user_id INTEGER);
CREATE TABLE sales (location_id integer, user_id INTEGER);
purchases has about 4.5mil entries, users about 300k, sales about 100k, and locations about 250 - just to gauge the data volume.
My desired use would be to generate a JSON object to be handed off to another application, very much condensed in volume by doing the following:
-GROUPing both purchases and sales into one common table BY location_id,user_id - IOW, getting the number of "actions" per user per location. That I can do, result is something like
loc | usid | loccount
-----------------------
1 | 1246 | 123
1 | 2345 | 1
13 | 1246 | 46
13 | 8732 | 4
27 | 2345 | 41
(At least it looks good, always hard to tell with such volumes; query:
select location_id, user_id, count(location_id) from
  (select location_id, user_id from purchases
   union all
   select location_id, user_id from sales)
group by location_id, user_id order by user_id
)
-Then, transposing that giant table such that I would get:
usid | loc1 | loc13 | loc27
---------------------------
1246 | 123 | 46 | 0
2345 | 1 | 0 | 41
8732 | 0 | 4 | 0
That I cannot do, and it's my absolutely crucial point for this question. I tried some things I found online, especially here, but I just started SQLite a little while ago and don't understand many queries.
-Lastly, translate the table into plain text in order to write it to JSON:
user | AAAA | BBBBB | CCCCC
---------------------------
zeta | 123 | 46 | 0
beta | 1 | 0 | 41
iota | 0 | 4 | 0
That I probably could do with quite a bit of experimentation and inner join, although I'm always very unsure what way is the best approach to handle such data volumes, hence I wouldn't mind a pointer.
The whole thing is written in Python's sqlite3 interface, if it matters. In the end, I'd love to have something I could just do a "for" loop per user over in order to generate the JSON, which would then of course be very simple. It doesn't matter if the query takes a long time (<10min would be nice), it's only run twice per day as a sort of backup. I've only got a tiny VPS available, but being limited to a single core the performance is as good as on my reasonably powerful desktop. (i5-3570k running Debian.)
The table headers are just examples because I wasn't quite sure if I could use integers for them (didn't discover the syntax if so), as long as I'm somehow able to look up the numeric part in the locations table I'm fine. Same for translating the user IDs into names. The number of columns is known beforehand - they're after all just INTEGER PRIMARY KEYs and I have a list() of them from some other operation. The number of rows can be determined reasonably quickly, ~3s, if need be.
Consider using subqueries to achieve your desired transposed output:
SELECT DISTINCT m.usid,
IFNULL((SELECT t1.loccount FROM tablename t1
WHERE t1.usid = m.usid AND t1.loc=1),0) AS Loc1,
IFNULL((SELECT t2.loccount FROM tablename t2
WHERE t2.usid = m.usid AND t2.loc=13),0) AS Loc13,
IFNULL((SELECT t3.loccount FROM tablename t3
WHERE t3.usid = m.usid AND t3.loc=27),0) AS Loc27
FROM tablename As m
Alternatively, you can use conditional expressions (SQLite uses CASE/WHEN where other databases have nested IF statements) in a derived table:
SELECT temp.usid, Max(temp.loc1) As Loc1,
       Max(temp.loc13) As Loc13, Max(temp.loc27) As Loc27
FROM
  (SELECT tablename.usid,
          CASE WHEN loc=1 THEN loccount ELSE 0 END As Loc1,
          CASE WHEN loc=13 THEN loccount ELSE 0 END As Loc13,
          CASE WHEN loc=27 THEN loccount ELSE 0 END As Loc27
   FROM tablename) AS temp
GROUP BY temp.usid
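Since the question mentions about 250 locations and Python's sqlite3 interface, the CASE/WHEN columns could be generated instead of typed by hand. The following is only a sketch under those assumptions: it uses the schema from the question, folds the grouping and the pivot into one query with SUM, and the function name pivot_counts and the tiny data set are made up for the example.

```python
import json
import sqlite3


def pivot_counts(conn, location_ids):
    """Return one row per user: (user name, count per location in order)."""
    # int() keeps the string-built column list safe; ids come from locations.
    columns = ", ".join(
        f"SUM(CASE WHEN a.location_id = {int(loc)} THEN 1 ELSE 0 END)"
        for loc in location_ids
    )
    sql = f"""
        SELECT u.user, {columns}
        FROM (SELECT location_id, user_id FROM purchases
              UNION ALL
              SELECT location_id, user_id FROM sales) AS a
        JOIN users AS u ON u.id = a.user_id
        GROUP BY u.id, u.user
        ORDER BY u.id
    """
    return conn.execute(sql).fetchall()


# Tiny stand-in for the real database, using the schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY NOT NULL, user TEXT UNIQUE);
    CREATE TABLE locations (id INTEGER PRIMARY KEY NOT NULL, name TEXT UNIQUE);
    CREATE TABLE purchases (location_id INTEGER, user_id INTEGER);
    CREATE TABLE sales (location_id INTEGER, user_id INTEGER);
    INSERT INTO users VALUES (1246, 'zeta'), (2345, 'beta'), (8732, 'iota');
    INSERT INTO locations VALUES (1, 'AAAA'), (13, 'BBBBB'), (27, 'CCCCC');
    INSERT INTO purchases VALUES (1, 1246), (1, 1246), (13, 1246), (1, 2345);
    INSERT INTO sales VALUES (27, 2345), (13, 8732);
""")

locations = conn.execute("SELECT id, name FROM locations ORDER BY id").fetchall()
rows = pivot_counts(conn, [loc_id for loc_id, _ in locations])
names = [name for _, name in locations]

# Translate ids to names and build the structure to hand off as JSON.
report = {user: dict(zip(names, counts)) for user, *counts in rows}
print(json.dumps(report, indent=2))
```

Users with no purchases or sales do not appear in this result, because the query starts from the combined action rows.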

Django: Foreign key optional but unique?

I have a model (A) that has 2 foreign keys: b and c.
b and c should be unique together, INCLUDING NULL.
Therefore b and c should only be NULL together ONCE.
However, I am unable to accomplish this in Django. Here is the code I have so far. Any help is appreciated!
-_-
class A(models.Model):
    b = models.ForeignKey('B', blank=True, null=True)
    c = models.ForeignKey('C', blank=True, null=True)

    class Meta:
        unique_together = (
            ('b', 'c'),
        )
This code will produce this unwanted result in the database:
+----+------+------+
| id | b_id | c_id |
+----+------+------+
| 1 | 2 | 3 |
| 2 | 2 | 77 |
| 3 | 2 | NULL |
| 4 | 2 | NULL |
| 5 | NULL | NULL |
| 6 | NULL | NULL |
+----+------+------+
The first 2 rows can only be inserted once by Django, which is great :)
However, the remaining rows are duplicate entries for me, and I'd like to restrict this.
UPDATE
I've found something that gets the job done, but it seems really hacky..
Any thoughts?
from django.core.exceptions import ValidationError

class A(models.Model):
    def clean(self):
        # Emulate a unique constraint that also treats NULL as a value.
        if not any([self.b, self.c]):
            if A.objects.filter(b__isnull=True, c__isnull=True).exists():
                raise ValidationError("Already exists")
        elif self.b and not self.c:
            if A.objects.filter(c__isnull=True, b=self.b).exists():
                raise ValidationError("Already exists")
        elif self.c and not self.b:
            if A.objects.filter(c=self.c, b__isnull=True).exists():
                raise ValidationError("Already exists")
It's not a problem with Django but with the SQL spec itself - NULL is not a value, so it must NOT be taken into account for uniqueness constraints checks.
You can either have a "pseudo-null" B record and a "pseudo-null" C record in your db and make them the defaults (and not allow NULL of course), or have a denormalized field like OBu suggests.
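The NULL behavior and the "pseudo-null" idea can be seen directly in SQLite. This is a small sketch, not Django code: the table names a_plain and a_sentinel are invented, and the value 0 stands in for the id of a hypothetical pseudo-null B or C record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Nullable columns: rows containing NULL never collide with each other.
    CREATE TABLE a_plain (b_id INTEGER, c_id INTEGER, UNIQUE (b_id, c_id));
    -- Pseudo-null variant: 0 means "no B" / "no C" (assumed sentinel id).
    CREATE TABLE a_sentinel (
        b_id INTEGER NOT NULL DEFAULT 0,
        c_id INTEGER NOT NULL DEFAULT 0,
        UNIQUE (b_id, c_id)
    );
""")


def try_insert(table, b_id, c_id):
    """Insert one row; return False if the unique constraint rejects it."""
    try:
        conn.execute(
            f"INSERT INTO {table} (b_id, c_id) VALUES (?, ?)", (b_id, c_id)
        )
        return True
    except sqlite3.IntegrityError:
        return False


# With real NULLs, the "duplicate" rows from the question are all accepted.
print(try_insert("a_plain", 2, None), try_insert("a_plain", 2, None))
print(try_insert("a_plain", None, None), try_insert("a_plain", None, None))

# With the sentinel, the second insert of each pair is rejected.
print(try_insert("a_sentinel", 2, 0), try_insert("a_sentinel", 2, 0))
print(try_insert("a_sentinel", 0, 0), try_insert("a_sentinel", 0, 0))
```

In a Django model the sentinel would correspond to real B and C rows used as defaults, as the answer describes.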
Maybe there is a better solution out there, but you could do the following:
create a new attribute d and find a generic way to combine b_id and c_id (e.g. str(b_id) + "*" + str(c_id)), and do this automatically on model creation (the signals mechanism might come in handy here)
use d as primary_key
This is more a workaround than a solution, but it should do the trick.
One more thought: would it be an option to check whether there is already an existing instance with "Null"/"Null" on creation/update of your instance? This would not solve your problem at the database level, but the logic would work as expected.
You can use a unique constraint on the b_id column. It won't allow duplicate entries. For the a_id column, a primary key can be used as well. A primary key is the combination of a unique constraint and a NOT NULL constraint.

Django group by id then select max timestamp

It might be a redundant question, but I have tried previous answers from other related topics and still can't figure it out.
I have a table Board_status that looks like this (multiple statuses and timestamps for each board):
time | board_id | status
-------------------------------
2012-4-5 | 1 | good
2013-6-6 | 1 | not good
2013-6-7 | 1 | alright
2012-6-8 | 2 | good
2012-6-4 | 3 | good
2012-6-10 | 2 | good
Now I want to select all records from Board_status table, group all of them by board_id for distinct board_id, then select the latest status on each board. Basically end up with table like this (only latest status and timestamp for each board):
time | board_id | status
------------------------------
2013-6-7 | 1 | alright
2012-6-4 | 3 | good
2012-6-10 | 2 | good
I have tried:
b = Board_status.objects.values('board_id').annotate(max=Max('time')).values_list('board_id','max','status')
but it doesn't seem to be working. It still gives me more than 1 record per board_id.
Which command should I use in Django to do this?
An update, this is the solution I use. Not the best, but it works for now:
b = []
a = Board_status.objects.values('board_id').distinct()
for i in range(a.count()):
    b.append(Board_status.objects.filter(board_id=a[i]['board_id']).latest('time'))
So I get all distinct board_id values and store them in a. Then, for each board_id, I run another query to get the record with the latest time. Any better answer is still welcome.
How will it work? You neither have filter nor distinct to filter out the duplicates. I am not sure if this can be easily done in a single django query. You should read more on:
https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.distinct
https://docs.djangoproject.com/en/1.4/topics/db/aggregation/
If you can't do it in 1 raw SQL query, you can't do it with an ORM either, as it's built on top of SQL (MySQL in your case). Can you tell me how you would do this via raw SQL?
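One common raw SQL shape for "latest row per group" joins the table to a subquery that computes the maximum time per board. The sketch below runs it on SQLite through Python's sqlite3 module. It assumes the timestamps are stored as zero-padded ISO dates so that MAX() orders them correctly, and the lowercase table name is chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE board_status (time TEXT, board_id INTEGER, status TEXT);
    INSERT INTO board_status VALUES
        ('2012-04-05', 1, 'good'),
        ('2013-06-06', 1, 'not good'),
        ('2013-06-07', 1, 'alright'),
        ('2012-06-08', 2, 'good'),
        ('2012-06-04', 3, 'good'),
        ('2012-06-10', 2, 'good');
""")

# Inner query: newest time per board. Outer join: fetch the full matching row.
LATEST_PER_BOARD = """
    SELECT s.time, s.board_id, s.status
    FROM board_status AS s
    JOIN (SELECT board_id, MAX(time) AS max_time
          FROM board_status
          GROUP BY board_id) AS latest
      ON latest.board_id = s.board_id AND latest.max_time = s.time
    ORDER BY s.board_id
"""

latest_rows = conn.execute(LATEST_PER_BOARD).fetchall()
for row in latest_rows:
    print(row)
```

If two rows of one board share the same latest time, both are returned, so a tie-breaker column would be needed in that case.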

Pivoting data and complex annotations in Django ORM

The ORM in Django lets us easily annotate (add fields to) querysets based on related data; however, I can't find a way to get multiple annotations for different filtered subsets of related data.
This is being asked in relation to django-helpdesk, an open-source Django-powered trouble-ticket tracker. I need to have data pivoted like this for charting and reporting purposes.
Consider these models:
CHOICE_LIST = (
    ('open', 'Open'),
    ('closed', 'Closed'),
)

class Queue(models.Model):
    name = models.CharField(max_length=40)

class Issue(models.Model):
    subject = models.CharField(max_length=40)
    queue = models.ForeignKey(Queue)
    status = models.CharField(max_length=10, choices=CHOICE_LIST)
And this dataset:
Queues:
ID | Name
---+------------------------------
1 | Product Information Requests
2 | Service Requests
Issues:
ID | Queue | Status
---+-------+---------
1 | 1 | open
2 | 1 | open
3 | 1 | closed
4 | 2 | open
5 | 2 | closed
6 | 2 | closed
7 | 2 | closed
I would like to see an annotation/aggregate look something like this:
Queue ID | Name | open | closed
---------+-------------------------------+------+--------
1 | Product Information Requests | 2 | 1
2 | Service Requests | 1 | 3
This is basically a crosstab or pivot table, in Excel parlance. I am currently building this output using some custom SQL queries, however if I can move to using the Django ORM I can more easily filter the data dynamically without doing dodgy insertion of WHERE clauses in my SQL.
For "bonus points": How would one do this where the pivot field (status in the example above) was a date, and we wanted the columns to be months / weeks / quarters / days?
You have Python, use it.
from collections import defaultdict

# Count issues per (queue, status) pair.
summary = defaultdict(int)
for issue in Issue.objects.all():
    summary[issue.queue, issue.status] += 1
Now your summary object has queue, status as a two-tuple key. You can display it directly, using various template techniques.
Or, you can regroup it into a table-like structure, if that's simpler.
table = []
# Each queue appears once per status in the keys, so deduplicate first.
queues = set(q for q, _ in summary.keys())
for q in sorted(queues, key=lambda q: q.id):
    table.append((q.id, q.name, summary[q, 'open'], summary[q, 'closed']))
You have lots and lots of Python techniques for doing pivot tables.
If you measure, you may find that a mostly-Python solution like this is actually faster than a pure SQL solution. Why? Mappings can be faster than SQL algorithms which require a sort as part of a GROUP-BY.
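The counting-and-regrouping idea can be tried without a database. In this sketch, namedtuples stand in for the Queue and Issue model instances (an assumption made so the example is self-contained), collections.Counter replaces the defaultdict, and the function name pivot is hypothetical.

```python
from collections import Counter, namedtuple

# Stand-ins for the Django models in the question.
Queue = namedtuple("Queue", "id name")
Issue = namedtuple("Issue", "id queue status")

q1 = Queue(1, "Product Information Requests")
q2 = Queue(2, "Service Requests")
issues = [
    Issue(1, q1, "open"), Issue(2, q1, "open"), Issue(3, q1, "closed"),
    Issue(4, q2, "open"), Issue(5, q2, "closed"), Issue(6, q2, "closed"),
    Issue(7, q2, "closed"),
]


def pivot(issues, statuses=("open", "closed")):
    """Return rows of (queue id, queue name, count per status)."""
    # Count issues per (queue, status) pair; missing pairs count as 0.
    summary = Counter((issue.queue, issue.status) for issue in issues)
    queues = sorted({queue for queue, _ in summary}, key=lambda q: q.id)
    return [
        (q.id, q.name) + tuple(summary[q, status] for status in statuses)
        for q in queues
    ]


for row in pivot(issues):
    print(row)
```

For the date-based variant in the question, the status part of the key could be replaced by a value derived from the date, such as a (year, month) tuple.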
Django has added a lot of functionality to the ORM since this question was originally asked. The answer to how to pivot data since Django 1.8 is to use the Case/When conditional expressions. And there is a third party app that will do that for you, PyPI and documentation
