Getting a database object in Django

The following is my database model:
id | dept | bp | s | o | d | created_by | created_date
dept and bp together form a unique index on the table. This means there will always be 30 different records in the database under the same dept and bp. I was trying to write an update function for the records by getting the object first. This is how I tried to get the object:
try:
    Sod_object = Sod.objects.get(dept=dept_name, bp=bp_name)
except Sod.DoesNotExist:
    print("Object doesn't exist")
    msg = "Sod doesn't exist!"
else:
    for s in Sod_object:
        # Do something
But it's always giving me 30 records (obviously). How can I make this a single object? Any suggestions?

In Django, every model automatically gets a column called id unless you declare your own primary key. It is the default primary key column and is unique for every object in a table.
So you can do
Sod_object = Sod.objects.get(id=1)
to fetch the first object in your table.
To update all the 30 objects for your dept-bp combination, you can filter() by those values
sods = Sod.objects.filter(dept=dept_name, bp=bp_name)
and then update each one and save. Django will internally remember the individual records by their id.
for sod in sods:
    sod.o = 1
    sod.save()

Related

SQLAlchemy: Delete all duplicate rows [duplicate]

I'm using SQLAlchemy to manage a database and I'm trying to delete all rows that contain duplicates. The table has an id (primary key) and domain name.
Example:
ID | Domain
---+--------------
 1 | example-1.com
 2 | example-2.com
 3 | example-1.com
In this case I want to delete one instance of example-1.com. Sometimes I will need to delete more than one, but in general the database should not contain a domain more than once; if it does, only the first row should be kept and the others deleted.
Assuming your model looks something like this:
import sqlalchemy as sa

class Domain(Base):
    __tablename__ = 'domain_names'
    id = sa.Column(sa.Integer, primary_key=True)
    domain = sa.Column(sa.String)
Then you can delete the duplicates like this:
# Create a subquery that selects, for each domain, the row with the lowest id
inner_q = session.query(sa.func.min(Domain.id)).group_by(Domain.domain)
aliased = sa.alias(inner_q)

# Select the rows that do not match the subquery
q = session.query(Domain).filter(~Domain.id.in_(aliased))

# Delete the unmatched rows one by one; the deletes are flushed on commit
for domain in q:
    session.delete(domain)
session.commit()

# Show remaining rows
for domain in session.query(Domain):
    print(domain)
print()
If you are not using the ORM, the core equivalent is:
meta = sa.MetaData()
domains = sa.Table('domain_names', meta, autoload=True, autoload_with=engine)

inner_q = sa.select([sa.func.min(domains.c.id)]).group_by(domains.c.domain)
aliased = sa.alias(inner_q)

with engine.connect() as conn:
    conn.execute(domains.delete().where(~domains.c.id.in_(aliased)))
This answer is based on the SQL provided in this answer. There are other ways of deleting duplicates, which you can see in the other answers on the link, or by googling "sql delete duplicates" or similar.
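The keep-the-lowest-id idea can also be checked end to end with a single SQL DELETE; here is a minimal, runnable sketch using the standard-library sqlite3 module (the table name and sample rows mirror the question; everything else is made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domain_names (id INTEGER PRIMARY KEY, domain TEXT)")
conn.executemany(
    "INSERT INTO domain_names (id, domain) VALUES (?, ?)",
    [(1, "example-1.com"), (2, "example-2.com"), (3, "example-1.com")],
)

# Delete every row whose id is not the minimum id for its domain
conn.execute(
    """
    DELETE FROM domain_names
    WHERE id NOT IN (
        SELECT MIN(id) FROM domain_names GROUP BY domain
    )
    """
)

remaining = conn.execute(
    "SELECT id, domain FROM domain_names ORDER BY id"
).fetchall()
print(remaining)  # [(1, 'example-1.com'), (2, 'example-2.com')]
```

Only the first occurrence of each domain survives; the duplicate row with id 3 is removed.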

Complex query using Django QuerySets

I am working on a personal project and I am trying to write a complex query that:
Gets every device that belongs to a certain user
Gets every sensor belonging to every one of the user's devices
Gets the last recorded value and timestamp for each sensor on each of the user's devices.
I am using SQLite, and I managed to write the query as plain SQL, but for the life of me I cannot figure out a way to do it in Django. I looked at other questions and tried going through the documentation, but to no avail.
My models:
class User(AbstractBaseUser):
    email = models.EmailField()

class Device(models.Model):
    user = models.ForeignKey(User)
    name = models.CharField()

class Unit(models.Model):
    name = models.CharField()

class SensorType(models.Model):
    name = models.CharField()
    unit = models.ForeignKey(Unit)

class Sensor(models.Model):
    gpio_port = models.IntegerField()
    device = models.ForeignKey(Device)
    sensor_type = models.ForeignKey(SensorType)

class SensorData(models.Model):
    sensor = models.ForeignKey(Sensor)
    value = models.FloatField()
    timestamp = models.DateTimeField()
And here is the SQL query:
SELECT acc.email,
dev.name as device_name,
stype.name as sensor_type,
sen.gpio_port as sensor_port,
sdata.value as sensor_latest_value,
unit.name as sensor_units,
sdata.latest as value_received_on
FROM devices_device as dev
INNER JOIN accounts_user as acc on dev.user_id = acc.id
INNER JOIN devices_sensor as sen on sen.device_id = dev.id
INNER JOIN devices_sensortype as stype on stype.id = sen.sensor_type_id
INNER JOIN devices_unit as unit on unit.id = stype.unit_id
LEFT JOIN (
SELECT MAX(sd.timestamp) latest, sd.value, sensor_id
FROM devices_sensordata as sd
INNER JOIN devices_sensor as s ON s.id = sd.sensor_id
GROUP BY sd.sensor_id) as sdata on sdata.sensor_id= sen.id
WHERE acc.id = 1
ORDER BY dev.id
I have been playing with the django shell in order to find a way to implement this query with the QuerySet API, but I cannot figure it out...
The closest I managed to get is with this:
>>> sub = SensorData.objects.values('sensor_id', 'value').filter(sensor_id=OuterRef('pk')).order_by('-timestamp')[:1]
>>> Sensor.objects.annotate(data_id=Subquery(sub.values('sensor_id'))).filter(id=F('data_id')).values(...)
However it has two problems:
It does not include the sensors that do not yet have any values in SensorData
If I include the SensorData value field in the .values() call, I start to get previously recorded values of the sensors as well
If someone could please show me how to do it, or at least tell me what I am doing wrong I will be very grateful!
Thanks!
P.S. Please excuse my grammar and spelling errors, I am writing this in the middle of the night and I am tired.
EDIT:
Based on the answers I should clarify:
I only want the latest sensor value for each sensor. For example, given this in sensordata:
id | sensor_id | value | timestamp
---+-----------+-------+------------
 1 |         1 |     2 | <today>
 2 |         1 |     5 | <yesterday>
 3 |         2 |     3 | <yesterday>
only the latest row should be returned for each sensor_id:
id | sensor_id | value | timestamp
---+-----------+-------+------------
 1 |         1 |     2 | <today>
 3 |         2 |     3 | <yesterday>
Or, if the sensor does not yet have any data in this table, I want the query to return a record for it with NULL for value and timestamp (basically the LEFT JOIN in my SQL query).
EDIT2:
Based on @ivissani's answer, I managed to produce this:
>>> latest_sensor_data = Sensor.objects.annotate(is_latest=~Exists(SensorData.objects.filter(sensor=OuterRef('id'),timestamp__gt=OuterRef('sensordata__timestamp')))).filter(is_latest=True)
>>> user_devices = latest_sensor_data.filter(device__user=1)
>>> for x in user_devices.values_list('device__name','sensor_type__name', 'gpio_port','sensordata__value', 'sensor_type__unit__name', 'sensordata__timestamp').order_by('device__name'):
... print(x)
Which seems to do the job.
This is the SQL it produces:
SELECT
"devices_device"."name",
"devices_sensortype"."name",
"devices_sensor"."gpio_port",
"devices_sensordata"."value",
"devices_unit"."name",
"devices_sensordata"."timestamp"
FROM
"devices_sensor"
LEFT OUTER JOIN "devices_sensordata" ON (
"devices_sensor"."id" = "devices_sensordata"."sensor_id"
)
INNER JOIN "devices_device" ON (
"devices_sensor"."device_id" = "devices_device"."id"
)
INNER JOIN "devices_sensortype" ON (
"devices_sensor"."sensor_type_id" = "devices_sensortype"."id"
)
INNER JOIN "devices_unit" ON (
"devices_sensortype"."unit_id" = "devices_unit"."id"
)
WHERE
(
NOT EXISTS(
SELECT
U0."id",
U0."sensor_id",
U0."value",
U0."timestamp"
FROM
"devices_sensordata" U0
WHERE
(
U0."sensor_id" = ("devices_sensor"."id")
AND U0."timestamp" > ("devices_sensordata"."timestamp")
)
) = True
AND "devices_device"."user_id" = 1
)
ORDER BY
"devices_device"."name" ASC
Actually your query is rather simple; the only complex part is establishing which SensorData is the latest for each Sensor. I would use annotations and an Exists subquery in the following way:
from django.db.models import Exists, OuterRef

latest_data = SensorData.objects.annotate(
    is_latest=~Exists(
        SensorData.objects.filter(sensor=OuterRef('sensor'),
                                  timestamp__gt=OuterRef('timestamp'))
    )
).filter(is_latest=True)
Then it's just a matter of filtering this queryset by user in the following way:
certain_user_latest_data = latest_data.filter(sensor__device__user=certain_user)
Now as you want to retrieve the sensors even if they don't have any data this query will not suffice as only SensorData instances are retrieved and the Sensor and Device must be accessed through fields. Unfortunately Django does not allow for explicit joins through its ORM. Therefore I suggest the following (and let me say, this is far from ideal from a performance perspective).
The idea is to annotate the Sensors queryset with the specific values of the latest SensorData (value and timestamp) if any exists in the following way:
from django.db.models import Exists, OuterRef, Subquery

latest_data = SensorData.objects.annotate(
    is_latest=~Exists(
        SensorData.objects.filter(sensor=OuterRef('sensor'),
                                  timestamp__gt=OuterRef('timestamp'))
    )
).filter(is_latest=True, sensor=OuterRef('pk'))

sensors_with_value = Sensor.objects.annotate(
    latest_value=Subquery(latest_data.values('value')),
    latest_value_timestamp=Subquery(latest_data.values('timestamp'))
)  # This will generate two subqueries...

certain_user_sensors = sensors_with_value.filter(device__user=certain_user).select_related('device__user')
If there aren't any instances of SensorData for a certain Sensor then the annotated fields latest_value and latest_value_timestamp will simply be set to None.
For these kinds of queries I strongly recommend using Q objects; see the docs: https://docs.djangoproject.com/en/2.2/topics/db/queries/#complex-lookups-with-q-objects
It's perfectly fine to execute raw queries with Django, especially when they are this complex.
If you want to map the results to models, use this:
https://docs.djangoproject.com/en/2.2/topics/db/sql/#performing-raw-queries
Otherwise, see this: https://docs.djangoproject.com/en/2.2/topics/db/sql/#executing-custom-sql-directly
Note that in both cases Django does no checking on the query.
This means that the security of the query is entirely your responsibility, so sanitize the parameters.
Something like this?
Multiple devices for one user:
device_ids = Device.objects.filter(user=user).values_list("id", flat=True)
SensorData.objects.filter(
    sensor__device__id__in=device_ids
).values(
    "sensor__device__name", "sensor__sensor_type__name", "value", "timestamp"
).order_by("-timestamp")
One device, one user:
SensorData.objects.filter(
    sensor__device__user=user
).values(
    "sensor__device__name", "sensor__sensor_type__name", "value", "timestamp"
).order_by("-timestamp")
That queryset will:
1. Get every device that belongs to a certain user.
2. Get every sensor belonging to each of the user's devices (there is no name field on Sensor, so it returns sensor__sensor_type__name instead).
3. Get every recorded value and timestamp (ordered by latest timestamp) for each of the user's device sensors.
UPDATE
try this:
list_data = []
for _id in device_ids:
    sensor_data = SensorData.objects.filter(sensor__device__id=_id)
    if sensor_data.exists():
        data = sensor_data.values(
            "sensor__id", "value", "timestamp", "sensor__device__user__id"
        ).latest("timestamp")
        list_data.append(data)
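The "latest row per sensor" step at the heart of this question can also be done in plain Python once the rows are fetched. A rough sketch over made-up (sensor_id, value, timestamp) tuples, not an actual ORM call:

```python
from datetime import datetime

# Hypothetical rows, e.g. from values_list('sensor_id', 'value', 'timestamp')
rows = [
    (1, 2.0, datetime(2019, 6, 2)),   # sensor 1, today
    (1, 5.0, datetime(2019, 6, 1)),   # sensor 1, yesterday
    (2, 3.0, datetime(2019, 6, 1)),   # sensor 2, yesterday
]

# Keep only the row with the greatest timestamp for each sensor_id
latest = {}
for sensor_id, value, ts in rows:
    if sensor_id not in latest or ts > latest[sensor_id][2]:
        latest[sensor_id] = (sensor_id, value, ts)

for row in sorted(latest.values()):
    print(row)
```

This trades one query per device for a single fetch plus an O(n) pass, which may or may not be a win depending on the table size.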

What kind of table is created when executing a query that returns nothing?

I'm using petl and trying to create a simple table with a value from a query. I have written the following:
@staticmethod
def get_base_price(date):
    # open connection to db
    # run SQL query to check if price exists for that date
    # set base price to that value if it exists
    # set it to 100 if it doesn't
    sql = '''SELECT [TimeSeriesValue]
             FROM [RAP].[dbo].[TimeSeriesPosition]
             WHERE TimeSeriesTypeID = 12
               AND SecurityMasterID = 45889
               AND FundID = 7
               AND EffectiveDate = %s''' % date
    with self.job.rap.connect() as conn:
        data = etl.fromdb(conn, sql).cache()
    return data
I'm connecting to the database, and if there's a value for that date, then I'll be able to create a table that would look like this:
+-----------------+
| TimeSeriesValue |
+=================+
| 100 |
+-----------------+
However, if the query returns nothing, what would the table look like?
I want to set the TimeSeriesValue to 100 if the query returns nothing. Not sure how to do that.
You should pass parameters when you execute the statement rather than interpolating them into the string, but that is not central to your question.
Possibly the simplest solution is to do all the work in SQL. If you are expecting at most one row from the query, wrap the value in an aggregate so the query always produces exactly one row:
SELECT COALESCE(MAX(TimeSeriesValue), 100) AS TimeSeriesValue
FROM [RAP].[dbo].[TimeSeriesPosition]
WHERE TimeSeriesTypeID = 12 AND
      SecurityMasterID = 45889 AND
      FundID = 7 AND
      EffectiveDate = %s
This will always return one row: when nothing matches, MAX() yields NULL and COALESCE substitutes the 100 value.
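A quick way to sanity-check the fallback idea is with the standard-library sqlite3 module standing in for SQL Server; the table and columns here are trimmed to the minimum for the demo. Wrapping the value in MAX() guarantees exactly one row, and parameter binding replaces the % string interpolation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TimeSeriesPosition (EffectiveDate TEXT, TimeSeriesValue REAL)"
)
conn.execute("INSERT INTO TimeSeriesPosition VALUES ('2019-06-01', 101.5)")

def get_base_price(date):
    # Bind the date as a parameter instead of interpolating it into the SQL
    row = conn.execute(
        """
        SELECT COALESCE(MAX(TimeSeriesValue), 100) AS TimeSeriesValue
        FROM TimeSeriesPosition
        WHERE EffectiveDate = ?
        """,
        (date,),
    ).fetchone()
    return row[0]

print(get_base_price("2019-06-01"))  # 101.5
print(get_base_price("2019-06-02"))  # 100 (no row for that date)
```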
A query that returns nothing would just display the column name(s) with nothing under them. I'd try something like this (pseudocode):
IF (TimeSeriesValue IS NOT NULL)
    <your query>
ELSE
    SET TimeSeriesValue = 100

Django ORM fails to recognise concrete inheritance in nested ON statement

Defining a custom Django user combined with django-taggit, I have run into an ORM issue; I also hit this issue in the Django admin filters.
NOTE: I am using this snippet: https://djangosnippets.org/snippets/1034/
# User
id | first_name
---+-----------
 1 | John
 2 | Jane

# MyUser
usr_ptr_id | subscription
-----------+-------------
         1 | 'A'
         2 | 'B'
Now when I use the django ORM to filter on certain tags for MyUser, e.g.
MyUser.objects.filter(tags__in=tags)
I get the following error:
(1054, "Unknown column 'myapp_user.id' in 'on clause'")
The printed raw query:
SELECT `myproject_user`.`id`, `myproject_user`.`first_name`, `myapp_user`.`user_ptr_id`, `myapp_user`.`subscription`
FROM `myapp_user` INNER JOIN `myproject_user`
ON ( `myapp_user`.`user_ptr_id` = `myproject_user`.`id` )
INNER JOIN `taggit_taggedtag`
ON ( `myapp_user`.`id` = `taggit_taggedtag`.`object_id`
AND (`taggit_taggedtag`.`content_type_id` = 31))
WHERE (`taggit_taggedtag`.`tag_id`)
IN (SELECT `taggit_tag`.`id` FROM `taggit_tag` WHERE `taggit_tag`.`id` IN (1, 3)))
Changing 'id' to 'user_ptr_id' in the second ON clause makes the query work. Is there any way to force this with the Django ORM?
The issue is that you can't look for an ID in a list of Tags; you need to look for the ID in a list of IDs. To fix this, construct a values_list of all of the IDs you want to filter by, and then pass that list off to your original query instead.
id_list = Tag.objects.all().values_list("id")
MyUser.objects.filter(tags__in=id_list)
If you have a many-to-many relationship between MyUser and Tag, you can also just use the many-to-many manager in place of the whole thing:
MyUser.tags.all()

Grouping by week, and padding out 'missing' weeks

In my Django model, I've got a very simple model which represents a single occurrence of an event (such as a server alert occurring):
class EventOccurrence(models.Model):
    event = models.ForeignKey(Event)
    time = models.DateTimeField()
My end goal is to produce a table or graph that shows how many times an event occurred over the past n weeks.
So my question has two parts:
How can I group_by the week of the time field?
How can I "pad out" the result of this group_by to add a zero-value for any missing weeks?
For example, for the second part, I'd like to transform a result like this:

| week | count |                   | week | count |
|  2   |  3    |                   |  2   |  3    |
|  3   |  5    |  -- becomes -->   |  3   |  5    |
|  5   |  1    |                   |  4   |  0    |
                                   |  5   |  1    |
What's the best way to do this in Django? General Python solutions are also OK.
Django's DateField, like Python's datetime, doesn't have a week attribute. To fetch everything in one query you need to do:
from django.db import connection

cursor = connection.cursor()
cursor.execute(
    "SELECT WEEK(`time`) AS 'week', COUNT(*) AS 'count' "
    "FROM %s GROUP BY WEEK(`time`) ORDER BY WEEK(`time`)"
    % EventOccurrence._meta.db_table
)

# Pad the gaps between consecutive weeks with zero counts
data = []
results = cursor.fetchall()
for i, row in enumerate(results[:-1]):
    data.append(row)
    week = row[0] + 1
    next_week = results[i + 1][0]
    while week < next_week:
        data.append((week, 0))
        week += 1
data.append(results[-1])
print(data)
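The padding loop is plain Python, so it can be verified without a database; here is the same logic run over made-up (week, count) rows:

```python
# Sorted (week, count) rows, as the GROUP BY query might return them
results = [(2, 3), (3, 5), (5, 1)]

# Insert a zero-count entry for every week missing between consecutive rows
data = []
for i, row in enumerate(results[:-1]):
    data.append(row)
    week = row[0] + 1
    next_week = results[i + 1][0]
    while week < next_week:
        data.append((week, 0))
        week += 1
data.append(results[-1])

print(data)  # [(2, 3), (3, 5), (4, 0), (5, 1)]
```

The missing week 4 is filled in with a zero count, exactly as the question's example table asks.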
After digging through the Django query API docs, I have not found a way to make this query through the Django ORM. A cursor is a workaround if your database is MySQL:
from django.db import connection

cursor = connection.cursor()
cursor.execute("""
    SELECT week(time) AS `week`,
           count(*) AS `count`
    FROM EventOccurrence
    GROUP BY week(time)
    ORDER BY 1;""")
# dictfetchall is the helper shown in the Django docs on executing custom SQL
myData = dictfetchall(cursor)
This is, in my opinion, the best-performing solution, but notice that it doesn't pad the missing weeks.
EDITED: Database-independent solution via Python (lower performance)
If you are looking for database-brand-independent code, then you should take the dates day by day and aggregate them in Python. In that case the code may look like:
# get all weeks:
import datetime

weeks = set()
d7 = datetime.timedelta(days=7)
iterDay = datetime.date(2012, 1, 1)
while iterDay <= datetime.date.today():
    weeks.add(iterDay.isocalendar()[1])
    iterDay += d7

# get all events
allEvents = EventOccurrence.objects.values_list('time', flat=True)

# aggregate events by week
result = dict()
for w in weeks:
    result.setdefault(w, 0)
for e in allEvents:
    result[e.isocalendar()[1]] += 1
(Disclaimer: not tested)
Since I have to query multiple tables by joining them, I'm using a database view to meet these requirements.
CREATE VIEW my_view AS
SELECT
    *,  -- other fields go here
    YEAR(time_field) AS year,
    WEEK(time_field) AS week
FROM my_table;
and the model as:
from django.db import models
class MyView(models.Model):
# other fields goes here
year = models.IntegerField()
week = models.IntegerField()
class Meta:
managed = False
db_table = 'my_view'
def query():
rows = MyView.objects.filter(week__range=[2, 5])
# to handle the rows
After getting rows from this database view, use the approach from @danihp's answer to pad 0 for the "hole" weeks/months.
NOTE: this is only tested with the MySQL backend; I'm not sure whether it works on MS SQL Server or others.
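For what it's worth, a similar view can be sketched on SQLite with strftime() standing in for MySQL's YEAR()/WEEK(); the table name and sample dates below are hypothetical, and note that strftime('%W') week numbering differs slightly from MySQL's WEEK():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, time_field TEXT)")
conn.executemany(
    "INSERT INTO my_table (time_field) VALUES (?)",
    [("2019-01-10",), ("2019-01-28",), ("2019-02-04",)],
)

# SQLite has no YEAR()/WEEK(); strftime() format codes cover both
conn.execute(
    """
    CREATE VIEW my_view AS
    SELECT *,
           CAST(strftime('%Y', time_field) AS INTEGER) AS year,
           CAST(strftime('%W', time_field) AS INTEGER) AS week
    FROM my_table
    """
)

rows = conn.execute(
    "SELECT week, COUNT(*) FROM my_view GROUP BY week ORDER BY week"
).fetchall()
print(rows)
```

An unmanaged Django model pointed at such a view would then expose year and week as ordinary filterable fields, as in the answer above.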

Categories

Resources