Context:
I have a model with two dates and I want to use factory.Faker for both of them, but the second date should always be greater than the first one.
I tried this:
Model excerpt:
class Event(models.Model):
    execution_start_date = models.DateTimeField()
    execution_end_date = models.DateTimeField()
Factory:
class EventFactory(factory.DjangoModelFactory):
    class Meta:
        model = Event
        strategy = factory.BUILD_STRATEGY

    execution_start_date = factory.Faker('date_time_this_year', tzinfo=pytz.utc)

    @factory.lazy_attribute
    def execution_end_date(self):
        return factory.Faker('date_time_between_dates',
                             datetime_start=self.execution_start_date,
                             datetime_end=now(),
                             tzinfo=pytz.utc)
But when I try to use the factory from the Python shell I get this:
In [3]: e = EventFactory()
In [4]: e.execution_end_date
Out[4]: <factory.faker.Faker at 0x1103f51d0>
The only way I managed to make it work was like this:
@factory.lazy_attribute
def execution_end_date(self):
    faker = factory.Faker._get_faker()
    return faker.date_time_between_dates(datetime_start=self.execution_start_date,
                                         datetime_end=now(),
                                         tzinfo=pytz.utc)
But I honestly think there is a better way to do it.
My dependencies are:
Django (1.8.18)
factory-boy (2.8.1)
Faker (0.7.17)
When lazy_attribute comes into play, you already have a generated object on hand. So you can work with, for example, random and timedelta, like this:
# requires: import random; from datetime import timedelta
@factory.lazy_attribute
def execution_end_date(self):
    max_days = (now() - self.execution_start_date).days
    return self.execution_start_date + timedelta(days=random.randint(1, max_days))
or some other way to generate a random date. There is no point in sticking to factory.Faker.
EDIT
After my first answer I managed to find a way to do what you want, and it's really simple. You just need to call the generate() method with an empty dict on the Faker declaration:
@factory.lazy_attribute
def execution_end_date(self):
    return factory.Faker('date_time_between_dates',
                         datetime_start=self.execution_start_date,
                         datetime_end=now(),
                         tzinfo=pytz.utc).generate({})
At first I was trying to do the same thing, but according to Factory Boy's documentation for the Faker wrapper, the parameters can be any valid declaration. That means you're allowed to specify each of the faker's parameters as a SelfAttribute, LazyAttribute, etc. I don't know when this feature was first introduced, but the version 3.1.0 (Oct 2020) docs mention it for the first time.
I believe that the question's example can be rewritten as:
class EventFactory(factory.DjangoModelFactory):
    class Meta:
        model = Event
        strategy = factory.BUILD_STRATEGY

    execution_start_date = factory.Faker('date_time_this_year', tzinfo=pytz.utc)
    execution_end_date = factory.Faker('date_time_between_dates',
                                       datetime_start=factory.SelfAttribute('..execution_start_date'),
                                       datetime_end='now',
                                       tzinfo=pytz.utc)
So basically it's turned around: the faker's parameters are evaluated instead of the LazyAttribute's return value. In my example, datetime_start now refers to whatever execution_start_date resolves to, and datetime_end takes the literal string 'now', which the Faker library replaces with the current datetime.
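For illustration, a quick shell check along these lines (the assertion is my own, not from the original post) should confirm the ordering:

e = EventFactory()
assert e.execution_start_date <= e.execution_end_date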
In my Odoo instance I have several computed fields on the analytic account object. These fields are computed on the fly to ensure the viewer always has the most up-to-date overview.
Some of these fields depend on other fields that are themselves computed fields. The computations themselves are fairly simple (field A = field B + field C). Most of the fields also depend on the underlying child ids. For example, field A on the top object is a summary of all field A values of the child ids, while field A on the children is computed from their own fields B and C combined, as described above.
The situation I currently find myself in is that for some reason the fields seem to be calculated in a random order. I noticed this because when I refresh in rapid succession I get different values for the same record.
Example:
Field B and C are both 10. I expect A to be 20 (B+C), but most of the time it's actually 0, because the computation of A happens before B and C. Sometimes it's 10, since either B or C snuck in before A could finish. On very rare occasions it's actually 20.
Note:
- I cannot make the fields stored, because they depend on account move lines, which are created at an incredible rate; the database would go absolutely nuts recalculating all records every minute or so.
- I already added @api.depends, but this is only useful if you use stored fields to determine which fields should trigger it, which is not applicable in my situation.
Does anyone know of a solution to this? Or have suggestions on alternative ways of calculating?
[EDIT] Added code
Example code:
@api.multi
@api.depends('child_ids', 'costs_allowed', 'total_cost')
def _compute_production_result(self):
    for rec in self:
        rec_prod_cost = 0.0
        if rec.usage_type in ['contract', 'project']:
            for child in rec.child_ids:
                rec_prod_cost += child.production_result
        elif rec.usage_type in ['cost_control', 'planning']:
            rec_prod_cost = rec.costs_allowed - rec.total_cost
        rec.production_result = rec_prod_cost
As you can see, if we are on a contract or project we need to look at the children (cost_control accounts) for their results and ADD them together. If we are actually on a cost_control account, then we can get the actual values by taking field B and C and (in this case) subtracting them.
The problem occurs when EITHER the contract records are handled before the cost_control ones, OR the costs_allowed and total_cost fields are still 0.0 when the cost_control accounts are evaluated.
Mind you: costs_allowed and total_cost are both computed fields in their own right!
You can do as they did in Invoice: many computed fields depend on many other fields, and a single compute method sets a value for each computed field.
@api.one
@api.depends('X', 'Y', ...)
def _compute_amounts(self):
    self.A = ...
    self.B = ...
    self.C = self.A + self.B
You may find Python's @property decorator helpful. Rather than just using plain fields, this allows you to define something that looks like a field but is lazily evaluated, i.e. calculated on demand when you 'get' it. This way we can guarantee it's up to date. An example:
import datetime

class Person(object):
    def __init__(self):
        self._born = datetime.datetime.now()

    @property
    def age(self):
        return datetime.datetime.now() - self._born

p = Person()
# do some stuff...

# We can 'get' age just like a field, but it is lazily evaluated,
# i.e. calculated on demand. This way we can guarantee it's up to date.
print(p.age)
So I managed to find a colleague and we figured it out together.
As it turns out, when you define a method that computes a field both for its own record and based on that same field on child records, you need to mention the child dependency explicitly.
For example:
@api.multi
@api.depends('a', 'b', 'c')
def _compute_a(self):
    for rec in self:
        if condition:
            rec.a = sum(rec.child_ids.mapped('a'))
        else:
            rec.a = rec.b + rec.c
In this example, the self object contains records (1,2,3,4).
If you include the child dependency but otherwise leave the code the same, like so:
@api.multi
@api.depends('a', 'b', 'c', 'child_ids.a')
def _compute_a(self):
    for rec in self:
        if condition:
            rec.a = sum(rec.child_ids.mapped('a'))
        else:
            rec.a = rec.b + rec.c
then Odoo will run this method 4 times, starting with the lowest/deepest candidate. So self will first be (4), then (3), etc.
Too bad this logic seems to be implied and not really described anywhere (as far as I could see).
I have a function I would like to use in two ways: a complex version and a simple version.
complex: u.upload("name2", "cat2", "mod2")
simple: u.upload("name2")
I would like to keep the default parameters as globals inside the .py file.
uploader.py

category_s = ""
model_s = ""

def upload(name, category=category_s, model=model_s):
    print(name, category, model)
script.py
import uploader as u
u.category_s = "cat1"
u.model_s = "mod1"
u.upload("name1")
u.upload("name2", "cat2", "mod2")
Output
name1
name2, cat2, mod2
Desired Output
name1, cat1, mod1
name2, cat2, mod2
It's as if def upload(category=category_s) doesn't see category_s. I have tried declaring category_s as a global, and it still doesn't work.
Is there a pythonic way of achieving this? In another language, I would overload the upload function and make a class to hold the category and model variables. I thought I would be able to use the parameter assignment and a global variable at the top to achieve the same thing.
Edit: Figured out what the problem is. category = category_s inside the function definition is evaluated when uploader.py is imported, so whatever value it captures cannot be changed by later category_s = ... assignments. Is there any way to get around this behavior? Or would I even want to? Is there a better way to achieve what I am going for? I want a simple function that I am going to use 99% of the time, with the option to pass the complex parameters. The catch is I also want to periodically change the default parameters of the simple function.
In Python, default function parameters are evaluated ONCE, at the time the function is defined - further changes to category_s and model_s do not affect the default values. Yes, this leads to all sorts of surprises...
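For instance, a quick sketch of my own showing the capture at definition time:

default_val = 1

def f(x=default_val):
    return x

default_val = 2
print(f())  # still prints 1 -- the default was captured when f was defined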
To achieve a changeable default, do something like this:
def upload(name, category=None, model=None):
    if category is None:
        category = category_s
    if model is None:
        model = model_s
    print(name, category, model)
You may need to choose a different default value if None is a potential actual value for the parameter.
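In that case, a common pattern is a private module-level sentinel; a minimal sketch (the name _UNSET is my own):

_UNSET = object()  # unique marker that no caller can accidentally pass

def upload(name, category=_UNSET, model=_UNSET):
    if category is _UNSET:
        category = category_s
    if model is _UNSET:
        model = model_s
    print(name, category, model)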
You could move it into an object, use the parameters first, and then fall back to attributes.
class Uploader:
    def __init__(self, category="", model=""):
        self.category_s = category
        self.model_s = model

    def setCategory(self, category):
        self.category_s = category

    def setModel(self, model):
        self.model_s = model

    def upload(self, name, category=None, model=None):
        print(name, category or self.category_s, model or self.model_s)
And the calling code:
from uploader import Uploader
u = Uploader()
u.category_s = "cat1"
u.model_s = "mod1"
u.upload("name1")
u.upload("name2", "cat2", "mod2")
Output
('name1', 'cat1', 'mod1')
('name2', 'cat2', 'mod2')
While going through the Django documentation to gain more detailed knowledge, I came across the terms 'table-level operation' and 'record-level operation'. What is the difference between them? Could anyone please explain these two terms with an example? Do they have other names too?
P.S. I am not asking for the difference just because I feel they are alike; I feel the concepts are easier to comprehend this way.
In the context of Django, record-level operations are those that act on a single record. An example is when you define custom methods on a model:
class Person(models.Model):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    birth_date = models.DateField()

    def baby_boomer_status(self):
        "Returns the person's baby-boomer status."
        import datetime
        if self.birth_date < datetime.date(1945, 8, 1):
            return "Pre-boomer"
        elif self.birth_date < datetime.date(1965, 1, 1):
            return "Baby boomer"
        else:
            return "Post-boomer"
Table-level operations are those that act on a set of records. An example is when you define a custom Manager for a model:
# First, define the Manager subclass.
class DahlBookManager(models.Manager):
    def get_queryset(self):
        return super(DahlBookManager, self).get_queryset().filter(author='Roald Dahl')

# Then hook it into the Book model explicitly.
class Book(models.Model):
    title = models.CharField(max_length=100)
    author = models.CharField(max_length=50)

    objects = models.Manager()  # The default manager.
    dahl_objects = DahlBookManager()  # The Dahl-specific manager.
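To make the distinction concrete, here is a hypothetical usage of both (the person's values are made up):

import datetime

p = Person(first_name='Ada', last_name='Lovelace', birth_date=datetime.date(1950, 1, 1))
print(p.baby_boomer_status())   # record-level: acts on this single record -> 'Baby boomer'
print(Book.dahl_objects.all())  # table-level: queries the whole book table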
PS: I took these examples from the Django documentation.
I do not know specifically how the Django people use the terms, but 'record-level operation' should mean an operation on one or more individual records, while 'table-level operation' should mean an operation on the table as a whole. I am not quite sure which one an operation on all rows would be -- perhaps both, perhaps it depends on the result.
In Python, the usual term for 'record-level' would be 'element-wise'. Python's built-ins treat a collection as a whole: bool([0, 1, 0, 3]) is True. NumPy arrays, by contrast, work element-wise for most operations: converting np.array([0, 1, 0, 2]) to bool element-wise gives array([False, True, False, True]). Also compare [1, 2, 3] * 2, which repeats the list, with np.array([1, 2, 3]) * 2, which multiplies element-wise.
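A quick interpreter session illustrating the contrast (my own sketch, assuming NumPy is installed):

>>> import numpy as np
>>> bool([0, 1, 0, 3])                     # built-in: truthiness of the whole list
True
>>> np.array([0, 1, 0, 2]).astype(bool)    # NumPy: converted element-wise
array([False,  True, False,  True])
>>> [1, 2, 3] * 2                          # built-in: repetition
[1, 2, 3, 1, 2, 3]
>>> np.array([1, 2, 3]) * 2                # NumPy: element-wise multiplication
array([2, 4, 6])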
I hope this helps. See if it makes sense in context.
I have a ParentModel model in Django:
class ParentModel(models.Model):
    field_a = models.IntegerField()
    field_b = models.IntegerField()

    class Meta:
        db_table = 'table_parent'
And I defined a subclass ChildModel after that:
class ChildModel(ParentModel):
    field_c = models.IntegerField()

    class Meta:
        db_table = 'table_child'
The above will create two tables in my database, called table_parent and table_child.
So, now I create two instances:
first_obj = ParentModel.objects.create(...) # id=1
second_obj = ChildModel.objects.create(...) # id=2
And it will create two objects, inserting two rows into table_parent and one row into table_child in total.
Now I fetch the instances, both via ParentModel:
first_obj = ParentModel.objects.get(id=1)   # id=1
second_obj = ParentModel.objects.get(id=2)  # id=2
So, in fact, second_obj is a ChildModel instance. I want a neat way to detect this, like:
first_obj.is_exact_base() # I want it to be True
second_obj.is_exact_base() # I want it to be False
Moreover, I may have more than one subclass of ParentModel; I want it to work well in that case too.
My effort:
class ParentModel(models.Model):
    ...

    def is_exact_base(self):
        try:
            child = self.childmodel
            return False
        except ChildModel.DoesNotExist:
            return True
This method can work, but with too much redundancy. Is there a better implementation for my problem?
Could you provide a complete minimal working example? I've never used Django and I don't even see what you are trying to do.
From a Python programmer's perspective (sorry, I'm convinced that I did not understand everything): if you want to know whether objectA is an instance of BaseClass, what you want is:
isinstance(objectA, BaseClass)
If this cannot be applied to your case, then Python has no way to tell the difference. Just like if you do:
def f(a, b):
    a.append(b)

def g(a, b):
    a.append(b)

a = []
f(a, 0)
g(a, 1)

>>> print(a)
[0, 1]
You have absolutely no way of telling which function appended what.
So if you are in this case, you either need to add a new column to your table which would contain that information, or the object you are writing should carry that information itself.
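For what it's worth, a minimal sketch of the 'extra column' idea (the field name real_type and the save() override are my own illustration, not from the original post):

from django.db import models

class ParentModel(models.Model):
    field_a = models.IntegerField()
    field_b = models.IntegerField()
    # Hypothetical column recording the concrete class name at save time.
    real_type = models.CharField(max_length=50, editable=False)

    def save(self, *args, **kwargs):
        if not self.real_type:
            self.real_type = type(self).__name__
        super(ParentModel, self).save(*args, **kwargs)

    def is_exact_base(self):
        # True only if the row was created as a plain ParentModel;
        # works no matter how many subclasses exist.
        return self.real_type == 'ParentModel'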
I have the following query:
users = Analytics.objects.values('date', 'users').order_by('date')
[(datetime.date(2012, 8, 20), 156L), (datetime.date(2012, 8, 21), 153L),...]
How would I get the unix timestamp of the datetime here? The equivalent of doing:
select concat(UNIX_TIMESTAMP(date), '000') as date, users from analytics_analytics
1345446000000 156
1345532400000 153
1345618800000 153
Note that I do not want to do formatting in python as there are a lot of the above calls that need to be done.
If you really want the DB to do the conversion (and this is applicable in general when you want to return information that's not readily available in a column), the best way is to use the QuerySet.extra() function (https://docs.djangoproject.com/en/dev/ref/models/querysets/#extra), like this:
Analytics.objects.extra(select={'timestamp': "CONCAT(UNIX_TIMESTAMP(date), '000')"}).order_by('date').values('timestamp', 'users')
The downside of this is that it's no longer DB-agnostic, so if you change RDBMS you have to change the code. The upside is that you can use values() with it, unlike a property on the model.
Python datetime.date objects don't have a time component, so the timestamp you want needs a little qualification. If midnight suffices, you can use the .timetuple() method together with the time.mktime() function to create a timestamp:
>>> import datetime, time
>>> adate = datetime.date(2012, 8, 20)
>>> print time.mktime(adate.timetuple())
1345413600.0
If you need a specific time in the day, use the datetime.datetime.combine() class method to construct a datetime, then use the same trick to make it a timestamp:
>>> adatetime = datetime.datetime.combine(adate, datetime.time(12, 0))
>>> print time.mktime(adatetime.timetuple())
1345456800.0
Use either as a property on your Django model:
class Analytics(models.Model):
    @property
    def timestamp(self):
        return time.mktime(self.date.timetuple())
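Usage would then look something like this (a sketch, matching the Python 2 style of the examples above; note it does the conversion in Python, once per row):

for a in Analytics.objects.order_by('date'):
    print a.timestamp, a.users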
I am not familiar with UNIX_TIMESTAMP, but with regard to Django: why not create a calculated field as a property on your Django model?
class Analytics(...):
    ...

    @property
    def unix_timestamp(self):
        return time.mktime(self.date.timetuple())