I'm writing a web scraper to get information about customers and appointment times to visit them. I have a class called Job that stores all the details about a specific job. (Some of its attributes are custom classes too, e.g. Client.)
class Job:
    def __init__(self, id_=None, client=Client(None), appointment=Appointment(address=Address(None)), folder=None,
                 notes=None, specific_reqs=None, system_notes=None):
        self.id = id_
        self.client = client
        self.appointment = appointment
        self.notes = notes
        self.folder = folder
        self.specific_reqs = specific_reqs
        self.system_notes = system_notes

    def set_appointment_date(self, time, time_format):
        pass

    def set_appointment_address(self, address, postcode):
        pass

    def __str__(self):
        pass
My scraper works great as a standalone app, producing one instance of Job for each page of data scraped.
I now want to save these instances to a Django database.
I know I need to create a model to map the Job class onto but that's where I get lost.
The Django docs (https://docs.djangoproject.com/en/2.1/howto/custom-model-fields/) say that in order to use my Job class in a Django model I don't have to change it at all. That's great, just what I want, but I can't follow how to create a model that maps to my Job class.
Should it be something like
from django.db import models
import Job, Client

class JobField(models.Field):
    description = "Job details"

    def __init__(self, *args, **kwargs):
        kwargs['id_'] = Job.id_
        kwargs['client'] = Client(name=name)
        ...
        super().__init__(*args, **kwargs)

class Job(models.Model):
    job = JobField()
And then I'd create a job using something like
Job.objects.create(id_=10101, name="Joe bloggs")
What I really want to know is am I on the right lines? Or (more likely) how wrong is this approach?
I know there must be a big chunk of something missing here but I can't work out what.
By "mapping" I assume you want to automatically generate a Django model that can be migrated to the database. In theory that is possible if you know the field types, but the code you posted doesn't really carry that information.
What you need to do is define a Django model as shown in https://docs.djangoproject.com/en/2.1/topics/db/models/.
Basically, you have to create the following class in the models.py of one of your project's apps:
from django.db import models

class Job(models.Model):
    # SomeClientModel is a placeholder for your own client model;
    # on_delete is required for ForeignKey from Django 2.0 onwards
    client = models.ForeignKey(to=SomeClientModel, on_delete=models.CASCADE)
    appointment = models.DateTimeField()
    notes = models.CharField(max_length=250)
    folder = models.CharField(max_length=250)
    specific_reqs = models.CharField(max_length=250)
    system_notes = models.CharField(max_length=250)
I don't know what data types you actually have there, you'll have to figure that out yourself and cross-reference it to https://docs.djangoproject.com/en/2.1/ref/models/fields/#model-field-types. This was just an example for you to understand how to define it.
After you have these figured out you can do the Job.objects.create(...yourdata).
You don't need to add an id field, because Django creates one by default for all models.
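To get from a scraped object to `Job.objects.create(...)`, one option is to flatten each object into keyword arguments whose names match the model's fields. The sketch below is plain Python and does not import Django. `ScrapedJob` and `job_to_model_kwargs` are hypothetical names (a stand-in for the scraper's class and a helper), and the field names are assumed to match the example model above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class ScrapedJob:
    """Hypothetical stand-in for the scraper's own Job class."""
    id_: Optional[int] = None
    appointment: Optional[datetime] = None
    notes: Optional[str] = None
    folder: Optional[str] = None
    specific_reqs: Optional[str] = None
    system_notes: Optional[str] = None


def job_to_model_kwargs(job: ScrapedJob) -> Dict[str, Any]:
    """Map scraper attributes onto the (assumed) model field names."""
    return {
        "appointment": job.appointment,
        # Text columns are stored as "" here instead of None.
        "notes": job.notes or "",
        "folder": job.folder or "",
        "specific_reqs": job.specific_reqs or "",
        "system_notes": job.system_notes or "",
    }


scraped = ScrapedJob(id_=10101, appointment=datetime(2019, 1, 15, 9, 30), notes="Ring bell")
kwargs = job_to_model_kwargs(scraped)
# In a Django project this would become: Job.objects.create(client=..., **kwargs)
```

The scraper's own `id_` is left out of the dictionary because the model gets an automatic id. If the scraped identifier matters, it would need its own field on the model.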
I have a program that lets users upload data files and lookup tables (both of which are ID'd to a specific company) and map them together. One page lets users choose which company they want to map data for by looking at which companies have both data files and lookup tables, which I use a queryset/model manager for. The problem is that if I load a new data file and hierarchy, the queryset doesn't pick them up until the server restarts. The queryset returns all the companies that have a data file and hierarchies at the time the server starts, but not anything that's added afterwards. I think this must be because the queryset is defined at startup, but I'm not sure. Is there a way to work around this?
forms.py

class CompanySelectionForm(forms.Form):
    companies = RawData.objects.get_companyNames(source="inRDandH")
    companiesTuple = makeTuple(companies)
    print(companiesTuple)
    company = forms.ChoiceField(widget=forms.Select(attrs={'class': 'form-select'}), choices=companiesTuple)

managers.py

class RawDataManager(models.Manager):
    def get_queryset(self):
        return RawDataQuerySet(self.model, using=self._db)

    def get_companyNames(self, source):
        return self.get_queryset().get_companyNames(source)

class RawDataQuerySet(models.QuerySet):
    def get_companyNames(self, source):
        if (source == 'inRDandH'):
            distinct_companiesRD = self.filter(fileType=0).values_list('companyName', flat=True).distinct()
            distinct_companiesH = self.filter(fileType=1).values_list('companyName', flat=True).distinct()
            distinct_companies = set(distinct_companiesRD).intersection(set(distinct_companiesH))
        else:
            distinct_companies = self.values_list('companyName', flat=True).distinct()
        return distinct_companies
The problem is that this code runs only once, when the code is initialised on server start, because it is part of your form class definition:
companies = RawData.objects.get_companyNames(source="inRDandH")
The solution is to make choices a callable, which is run every time the form is instantiated:
def get_companies_tuple():
    companies = RawData.objects.get_companyNames(source="inRDandH")
    return makeTuple(companies)

class CompanySelectionForm(forms.Form):
    company = forms.ChoiceField(
        widget=forms.Select(attrs={'class': 'form-select'}),
        choices=get_companies_tuple
    )
This will now fetch the data from the database every time the form is initialised, rather than only once during startup.
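The difference between the two versions can be reproduced without Django. In this minimal sketch a plain list stands in for the database, and `EagerForm` and `LazyForm` are hypothetical classes that only mimic when the choices are computed. They are not Django forms.

```python
companies = ["Acme"]  # stand-in for rows in the database


def get_companies_tuple():
    """Build the (value, label) pairs from the current data."""
    return [(name, name) for name in companies]


class EagerForm:
    # Evaluated once, when the class body is executed at import time.
    choices = get_companies_tuple()


class LazyForm:
    def __init__(self):
        # Evaluated on every instantiation, like a callable passed to `choices`.
        self.choices = get_companies_tuple()


companies.append("Globex")  # a new company is added after "startup"

eager_choices = EagerForm().choices
lazy_choices = LazyForm().choices
```

Here `eager_choices` still holds only the company that existed when the class was defined, while `lazy_choices` includes the one added later.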
Question
How can I build a Model that stores one field in the database, and then retrieves other fields from an API behind the scenes when necessary?
Details:
I'm trying to build a Model called Interviewer that stores an ID in the database, and then retrieves name from an external API. I want to avoid storing a copy of name in my app's database. I also want the fields to be retrieved in bulk rather than per model instance because these will be displayed in a paginated list.
My first attempt was to create a custom Model Manager called InterviewerManager that overrides get_queryset() in order to set name on the results like so:
class InterviewerManager(models.Manager):
    def get_queryset(self):
        query_set = super().get_queryset()
        for result in query_set:
            result.name = 'Mary'
        return query_set

class Interviewer(models.Model):
    # ID provided by API, stored in database
    id = models.IntegerField(primary_key=True, null=False)
    # Fields provided by API, not in database
    name = 'UNSET'
    # Custom model manager
    interviewers = InterviewerManager()
However, it seems like the hardcoded value of Mary is only present if the QuerySet is not chained with subsequent calls. I'm not sure why. For example, in the django shell:
>>> list(Interviewer.interviewers.all())[0].name
'Mary' # Good :)
>>> Interviewer.interviewers.all().filter(id=1).first().name
'UNSET' # Bad :(
My current workaround is to build a cache layer inside of InterviewerManager that the model accesses like so:
class InterviewerManager(models.Manager):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.api_cache = {}

    def get_queryset(self):
        query_set = super().get_queryset()
        for result in query_set:
            # Mock querying a remote API
            self.api_cache[result.id] = {
                'name': 'Mary',
            }
        return query_set

class Interviewer(models.Model):
    # ID provided by API, stored in database
    id = models.IntegerField(primary_key=True, null=False)
    # Custom model manager
    interviewers = InterviewerManager()

    # Fields provided by API, not in database
    @property
    def name(self):
        return Interviewer.interviewers.api_cache[self.id]['name']
However this doesn't feel like idiomatic Django. Is there a better solution for this situation?
Thanks
Why not just make the API call in the name property?
@property
def name(self):
    name = get_name_from_api(self.id)
    return name
If the API doesn't let you build a single GET request that takes a list of IDs and returns all the names at once, the easy way is to fetch them in a loop.
I would recommend building a so-called proxy: load the records into a dataframe or dict, save that data (for example with pickle) and use it when necessary. It reduces load times and is reasonably efficient.
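Since the question asks for names to be retrieved in bulk for a paginated list, one possible shape is to fetch the names for a whole page in one call and attach them to the objects after the page has been loaded. This is a plain-Python sketch under assumptions: `fetch_names_bulk` is a hypothetical stand-in for the external API (mocked here), and `Interviewer` is an ordinary class, not the Django model.

```python
from typing import Dict, Iterable, List


class Interviewer:
    """Plain stand-in for the model; only the ID is stored locally."""

    def __init__(self, id: int) -> None:
        self.id = id
        self.name = "UNSET"


def fetch_names_bulk(ids: Iterable[int]) -> Dict[int, str]:
    """Hypothetical bulk API call: one request for many IDs (mocked here)."""
    return {i: f"Interviewer {i}" for i in ids}


def attach_names(page: List[Interviewer]) -> List[Interviewer]:
    """Fetch all names for one page in a single call and set them in place."""
    names = fetch_names_bulk(obj.id for obj in page)
    for obj in page:
        obj.name = names.get(obj.id, "UNSET")
    return page


page = attach_names([Interviewer(1), Interviewer(2)])
```

In a Django view, `page` would be the already evaluated list of objects for the current page. Attaching the names at that point sidesteps the behaviour seen in the question: chaining another call such as `filter()` produces a new queryset, which is evaluated again and yields fresh instances without the attribute.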
I am new to Django and I'm trying to do something pretty simple.
My models.py is as below:
from django.db import models

class DiskDrive(models.Model):
    deviceId = models.CharField(max_length=64, primary_key=True)
    freeSpace = models.BigIntegerField()

    def __unicode__(self):
        return self.deviceId

class StoragePool(models.Model):
    poolId = models.CharField(max_length=256, primary_key=True)
    size = models.BigIntegerField()
    drive = models.ForeignKey(DiskDrive, related_name='pools')

    def __unicode__(self):
        return self.poolId
I haven't added anything to views.py and urls.py yet.
I'm able to create objects of both the classes.
Whenever I create an object of StoragePool class, I want to reduce the value of 'freeSpace' attribute of the related DiskDrive object by the 'size' of 'StoragePool' object. How should I do this? Please help...
Sounds like the perfect job for a post_save signal:
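from django.db.models import F
from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=StoragePool)
def update_drive_space(sender, instance, created, **kwargs):
    if created:
        # F() makes the database subtract from the currently stored value
        instance.drive.freeSpace = F('freeSpace') - instance.size
        instance.drive.save(update_fields=['freeSpace'])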
See Django signals documentation for more info about how signals work. In a nutshell this method will be called each time you create or update a StoragePool object.
A few notes:
I am using an F expression to reference the current value in the database instead of blindly saving whatever we have on the Django side. This keeps the value correct when multiple clients create new StoragePool objects at the same time.
You will probably also want to adjust freeSpace when size is updated, not just when a StoragePool is created. To do that, remove the if created check so the handler runs on every StoragePool.save(). In that case subtract only the change in size, otherwise the full size is deducted again on each save.
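The first note can be illustrated outside Django. The sketch below uses the standard-library sqlite3 module with a hypothetical `drive` table to compare a read-modify-write update with an update where the database does the arithmetic, which is the kind of statement an F expression is turned into. The table and column names are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drive (id TEXT PRIMARY KEY, free_space INTEGER)")
conn.execute("INSERT INTO drive VALUES ('read_modify_write', 100), ('atomic', 100)")


def free_space(drive_id):
    """Return the currently stored free space for one drive."""
    row = conn.execute("SELECT free_space FROM drive WHERE id = ?", (drive_id,)).fetchone()
    return row[0]


# Read-modify-write: client A reads 100, then client B subtracts 30,
# then client A writes back its stale value minus 20. B's update is lost.
stale = free_space("read_modify_write")
conn.execute("UPDATE drive SET free_space = free_space - 30 WHERE id = 'read_modify_write'")
conn.execute("UPDATE drive SET free_space = ? WHERE id = 'read_modify_write'", (stale - 20,))

# Atomic update: both clients let the database do the subtraction.
conn.execute("UPDATE drive SET free_space = free_space - 30 WHERE id = 'atomic'")
conn.execute("UPDATE drive SET free_space = free_space - 20 WHERE id = 'atomic'")

lost_update_result = free_space("read_modify_write")
atomic_result = free_space("atomic")
```

With the stale write, one of the two deductions disappears. With the in-database subtraction, both are applied.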
I'm using Piston to create an API for an application in Django.
I'll try to explain my question in a simple way.
Let's say I've got two models:
class Device(models.Model):
    id = models.TextField(...)
class Person(models.Model):
    name = models.TextField(...)
    device = models.ForeignKey(Device)
Now, if I receive a URL like this:
(r'^api/(?P<person_name>\w+)/(?P<device_id>\w+)$',handler),
I want to add a person to the DB and, to do that, I need to add a new Device to the DB, but, since handlers in Piston are linked to a Model, how can I add a device to the DB in the same handler?
I tried something like this:
class PersonHandler(BaseHandler):
    allowed_methods = ('PUT',)
    model = Person

    def create(self, request, person_name, device_id):
        Device.objects.create(id=device_id)
        d = Device.objects.get(id=device_id)
        Person.objects.create(name=person_name, device=d)
        return rc.CREATED
But I guess it won't work.
How can I do what I want to do?
I'm using Django to build an internal webapp where devices and analysis reports on those devices are managed.
Currently an abstract Analysis is defined like this:
class Analysis(models.Model):
    project = models.ForeignKey(Project)
    dut = models.ForeignKey(Dut)  # Device Under Test
    date = models.DateTimeField()
    raw_data = models.FileField(upload_to="analysis")
    public = models.BooleanField()

    @property
    def analysis_type(self):
        s = str(self.__class__)
        class_name = s.split('.')[-1][:-2]  # Get last name in dotted class name and remove '> at end
        return AnalysisType.objects.get(name=class_name)

    class Meta:
        abstract = True
There are then a number of different analysis types that can be done on a device, with different resulting data.
class ColorAnalysis(Analysis):
    value1 = models.FloatField()
    value2 = models.FloatField()
    ...
class DurabilityAnalysis(Analysis):
    value1 = models.FloatField()
    value2 = models.FloatField()
    ...
...
Each such analysis is created from an Excel sheet posted by the operator. There exists an Excel template the operator fills in for each analysis type.
(The issue here is not if data input should be done in a web form, there are lots of reasons to choose the Excel path)
On a page on the website all analysis types should be listed along with a link to the corresponding Excel sheet template used to report that analysis.
Currently I have defined something like
class AnalysisType(models.Model):
    name = models.CharField(max_length=256)
    description = models.CharField(max_length=1024, blank=True)
    template = models.FileField(upload_to="analysis_templates")
but when I thought about how I would link this data to the different analysis result model classes, I thought that what I want to do is to add this data as class attributes to each analysis type.
The problem is that the class attributes are already used by the django magic to define the data of each instance.
How do I add "class attributes" to django models? Any other ideas on how to solve this?
EDIT:
Now added the analysis_type property by looking up the class name. This requires no manual adding of a variable to each sub-class. Works fine, but still requires manual adding of an entry of AnalysisType corresponding to each sub-class. It would be nice if this could be handled by the class system as well. Any ideas?
How about a property or method that returns an AnalysisType dependent on an attribute in the particular Analysis subclass?
class Analysis(models.Model):
    ...
    @property
    def analysis_type(self):
        return AnalysisType.objects.get(name=self.analysis_type_name)

class ColorAnalysis(Analysis):
    analysis_type_name = 'color'

class DurabilityAnalysis(Analysis):
    analysis_type_name = 'durability'
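The same pattern can be tried with ordinary Python classes, which is all it relies on: attributes that are not model fields stay plain class attributes on the subclass. In this sketch the classes are stand-ins without Django, and the `analysis_type` property returns the name directly where the real code would query AnalysisType. `default_type_name` is a hypothetical helper showing a simpler way to get the class name than parsing `str(self.__class__)`.

```python
class Analysis:
    """Plain stand-in for the abstract base model."""
    analysis_type_name = None  # each subclass overrides this

    @property
    def analysis_type(self):
        # The real code would look up AnalysisType by this name.
        return self.analysis_type_name

    @classmethod
    def default_type_name(cls):
        # Alternative: derive the name from the class itself.
        return cls.__name__


class ColorAnalysis(Analysis):
    analysis_type_name = "color"


class DurabilityAnalysis(Analysis):
    analysis_type_name = "durability"


# Listing the names declared by all direct subclasses.
type_names = sorted(cls.analysis_type_name for cls in Analysis.__subclasses__())
```

Collecting the names from `__subclasses__()` is one possible starting point for creating the matching AnalysisType rows automatically, though that step is not shown here.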