Accessing uploaded files in ModelForm clean() - python

My system has products, with images associated with them, like so:
class Product(models.Model):
name = models.CharField(max_length=100)
...
class Image(models.Model):
product = models.ForeignKey(Product)
image = models.ImageField(upload_to='products')
So far so good. Naturally, the client wants to upload their products in bulk in a csv and upload a zip file containing the images. I format the csv as so:
product_name,image_1.jpg,image_2.jpg,...
product_2,image.jpg,...
So far I've made a model just as a helper:
class BulkUpload(models.Model):
csv = models.FileField(upload_to='tmp')
img_zip = models.FileField(upload_to='tmp')
The workflow goes something like this:
User uploads files via the django admin
Get the zip file contents and store for later
Extract the zip into tmp directory
Start a transaction. If anything unexpected happens from here we rollback
For each row in the csv
Create and save product with the name specified in the first column.
Grab image filenames from the other csv fields
Check the images are in the zip, otherwise rollback
Check the images don't already exist in the destination directory, otherwise rollback
Move the images to the destination directory and set the fk to the saved product object, rollback on any errors.
Commit the transaction
Delete the zip and csv, and delete the bulk upload object (or just don't save it)
If we roll back at any point we should somehow inform the user what went wrong.
My initial idea was to override save or use a post_save signal, but not having access to the request means I can neither use messages nor raise a validation error. Overriding model_save() in the admin has it's own problems, not being able to do any validation.
So now my thought is to change the ModelForm and give this to the django admin. I can override the clean() method, raise ValidationErrors and (presumably) run all my stuff in a transaction. But I'm struggling to figure out how I can access the files in such a way that I can use Python's ZipFile and csv libraries on them. It also feels a little dirty to do actual work in a form validation method, but I'm not sure where else I can do it.
I might have gone into too much detail, but I wanted to explain the solution so that alternative solutions can be suggested.

I don't think you should use a BulkUpload or any model representing this operation, at least if you plan on doing the process synchronously as you're currently suggesting. I would add an additional view to the admin area, either by hand or using a third party library, and there I would process the form and perform the workflow.
But anyways, given you already have your BulkUpload model, it's certainly easier to do it using a admin.ModelAdmin object. Your main concern seems to be where you should place the code of the transaction. As you've mentioned there are several alternatives. In my opinion, the best option would be to divide the process in two parts:
First, in your model's clean method you should check all the potential errors that may be produced by the user: images that already exist, missing images, duplicated products, etcetera. Here you should check that the uploaded files are OK, for instance using something like:
def clean(self):
if not zipfile.is_zipfile(self.img_zip.file):
raise ValidationError('Not a zip file')
After that, you know that any error that may arise from this point on will be produced by a system error: the bd failing, the HD not having enough space, etc. because all other possible errors should have been checked in the previous step. In your ModelAdmin.save_model method you should perform the rest of your workflow. You can inform the user of any errors using ModelAdmin.message_user.
As for the actual processing of the uploaded files, well, you named it: just use the zipfile and csv modules in the standard library. You should create a ZipFile object and extract it somewhere. Now, you should go over the data of your csv file using csv.reader. Something like this (not tested):
def save_model(self, request, obj, form, change):
# ...
with open('tmp/' + obj.img_zip.name, 'r') as csvfile:
productreader = csv.reader(csvfile)
for product_details in productreader:
p = Product(name=product_details[0])
p.save()
for image in product_details[1:]:
i = ImageField()
i.product = p
i.image = File(open('tmp/' + image)) # not tested
i.save()
After all this there would be no point in having a BulkUpload instance, so you should delete it. That's why I said in the beginning that that model is a little bit useless.
Obviously you'd need to add the code for the transactions and some other stuff, but I hope you get the general idea.

Related

How to create element in another model with django

I have a small question. That is actually making me scratch my head.
So in my Database, I have the following models:
Activity
Bill
Clients
I think you are all seeing the relationship I am trying to create :
A Bill has one client and one or more activities. Here is the trick to make this whole thing user-friendly I am trying to create Bills (with the url: Bill/new) that can be edited manually. So the user is sent to an HTML page with the basic Bill template and he has a table that can add some rows with the activity the time spent and its cost.
There are three things I am trying to achieve.
Generate automatically the ID of the Bill (it should be pk of Bill) but it seems it's not generated until I have pressed on save.
When I save a Bill I want to save also the activities I have entered manually.
When I save the Bill I would like to save it as a Word or PDF document in the database.
Are these possible?
Thanks all for reading and helping I am banging my head to figure out how to do all of this and I am quite a newbie so any help is welcome.
Thanks in advance.
One way would be for the Bill objects to have a boolean field in_preparation. There would be a sequence of forms involved. The first would create a minimal Bill object with in_preparation=True. Then the related objects could be created and linked to this Bill. The final stage would be to display the entire bill and related objects for checking, with options to go back and edit, or "Confirm and issue to customer". This latter would set in_preparation=False and generate the pdf and word files.
If all the necessary information is already available and you are just asking how to use a ModelForm in this circumstance, the answer is obj = form.save( commit=False). It's then up to you to call obj.save() once you have finished updating it or its related objects More information here
If you are saving a bunch of objects at once and want to be sure it's all-or-nothing (the latter if something throws an exception), you need a Django transaction.
Generate automatically the ID of the Bill (it should be pk of Bill) but it seems it's not generated until I have pressed on save.
For bills and any other "real world"-like documents you'd probably better define some custom number generator to detach number from the auto-id and to be able to generate numbers using more complex patterns (eg AB-12345/321).
But in any case whether you choose to use auto-id or a custom generator the most simple way to garanty existence and uniqueness of that id/number is to save bill instance first. Also this approach has some additional pros.
When I save a Bill I want to save also the activities I have entered manually.
You can use django formsets for this (assuming that your bill and activity models have fk or m2m relation)
https://docs.djangoproject.com/en/3.2/topics/forms/formsets/
When I save the Bill I would like to save it as a Word or PDF document in the database.
I'm not really sure if it is a good idea to store pdf or word files in db somehow.
In my opinion there are better ways
Generate pdf (word, excel etc) docs from data on request
Store as files in filesystem.
In this case files shouldn't be stored in publicly accessible
dirs (like media and static) and should be served instead with
FileResponce with proper access checks in the view

Getting ImageField file path post_delete

I'm making a little piece of software with Django and JS that will handle image uploads. So far, so good. I'm getting nice little images via AJAX from dropzone.js. They are saved on the file system and have an ImageField in my Photo model to keep track of what is stored and where.
I even stabbed dropzone.js to nicely ask my dev server to delete the database entries and the files themselves. I find that the latter part is lacking a bit. So I started writing a function that catches a post_delete signal from my Photo model and has the task of handling the deletion from the file system. The problem is, I can't seem to find a way to get my hands on the file path that's stored in database.
If I've understood correctly, the following should work:
from django.db import models
from django.db.models.signals import post_delete
from django.dispatch import receiver
class Photo(models.Model):
imageFile = models.ImageField(upload_to=generateImageFileNameAndPath)
#receiver(post_delete, sender=Photo)
def cleanupImageFiles(sender, **kwargs):
print("Cleanup called")
p = kwargs['instance']
path = p.imageFile.name
print(path)
But when I try to output path to the console, there's nothing.
Sorry about the upperCasing instead of using under_scores as seems to be Python convention. I personally find the underscore convention a bit annoying and am having a wrestling match inside my head over whether to follow the convention or just go my own way. For now, I've done the latter.
edit: I can't seem to make it work with p.imageFile.url either as suggested here.
editedit: I also tried with pre_delete signal thinking that maybe post_delete the data has already been blown to smithereens, which would be dumb, but who knows :)
edit3: calling imageFile.path, doesn't cut it either. It just produces
[27/Nov/2016 22:29:08] "POST /correcturl/upload/ HTTP/1.1" 200
Cleanup called
[27/Nov/2016 22:29:15] "DELETE /correcturl/upload/ HTTP/1.1" 500 37
on the console window. The HTTP error 500 just comes from the view not being able to handle the delete call because of this code not working properly. That's what I use as status message to the frontend at this point.
It might be worth noting, that if I do
print(p)
the output on the console is
Photo object
If you need the path of the image, try:
path = p.imageField.path
P.S.: Yes, you should follow the convention. Otherwise it will be hard for others to read your code if you share it with somebody, or contribute to an open source project, or hire programmers in your company, etc.
I knew I had to have done some stupid and finally had time to get back to debugging.
In my view I'd done
deletable = Photo(id=id)
instead of
deletable = Photo.objects.get(id=id)
thus ending up with a new photo object with just the id field filled in. Because Photo.save() was never called this didn't end up in my DB and no errors were thrown. Because of this, the bug flew stealthily under my radar.
Thus, when finally calling
deletable.delete()
it only removed the uncomplete instance I had just created. Although, it also deleted the proper entry from the DB. This is what threw me off and made me mostly look elsewhere for the problem thinking I had the correct database object in my hands.
Where this behavior came from remains unclear to me. Does delete() actually check the database for the id (which it in this case would've found) instead of just handling the instance in question? I guess taking a look at django.db.models.Model.delete() could shed some light on this.

Where to Start with Custom Reporting Python/Django

I'm trying to find a way to build a robust report library that I can use with a Django project.
Essentially, what I'm looking for is a way to access a list of functions(reports) that I can allow an end-user to attach to a Django model.
For example, let's say I have an Employee object. Well, I have a module called reports.py that has a growing list of possible reports that take an employee object and output a report, usually in JSON form. There might be number of timecards submitted, number of supervisions created, etc.
I want to be able to link those changing report lists to the Employee object via a FK called (job description), so admins can create custom reports per job description.
What I've tried:
Direct model methods: good for some things, but it requires a programmer to call them in a template or via API to generate some output. Since the available reports are changing, I don't want to hard-code anything and would rather allow the end-user to choose from a list of available reports and attach them to a related model (say a JobDescription).
dir(reports): I could offer up a form where the select values are the results from dir(reports), but then I'd get the names of variables/libraries called in the file, not just a list of available reports
Am I missing something? Is there a way to create a custom class from which I can call all methods available? Where would I even start with that type of architecture?
I really appreciate any sort of input re: the path to take. Really just a 'look in this direction' response would be really appreciated.
What I would do is expand on your dir(reports) idea and create a dynamically loaded module system. Have a folder with .py files containing module classes. Here's an example of how you can dynamically load classes in Python.
Each class would have a static function called getReportName() so you could show something readable to the user, and a member function createReport(self, myModel) which gets the model and does it's magic on it.
And then just show all the possible reports to the user, user selects one and you run the createReport on the selected class.
In the future you might think about having different report folders for different models, and this too should be possible by reflection using model's __name__ attribute.

Django can StdImageField replace default variation?

I have existing Django-based project with some files uploaded.
I need to add a feature to automatically resize new uploaded files to some resolution (200x200). I found a nice library django-stdimage that does what I need.
But on upload it stores original file with its original resolution. And existing code works with the original file instead of resized one.
class Product(models.Model):
name = models.CharField(verbose_name=_('Name'), max_length=64)
image = StdImageField(upload_to='product_images/', verbose_name=_('Image'), blank=True, null=True,
variations={'default': (200, 200)})
I would like to save processed files by the same name as original file. I do not need original file by the way.
I do not want to change all the code where it works with image field - there are complex DRF serializers, some views, forms, templates, etc.
So I would like to get new resized image as before by using myproduct.image - in templates for example.
Is it possible to do without subclassing StdImageField ?
I opened a ticket on the StdImage's page and received an answer..
No, I'm sorry, but nether would this work, nor would I advice you to
do such a thing. It would add implicit behavior. That's something you
might want to avoid long term.
You can always overwrite the field tho and implement your own
behavior. The StdImage code base should be good guidance to implement
your own behavior.

How to externally populate a Django model?

What is the best idea to fill up data into a Django model from an external source?
E.g. I have a model Run, and runs data in an XML file, which changes weekly.
Should I create a view and call that view URL from a curl cronjob (with the advantage that that data can be read anytime, not only when the cronjob runs), or create a python script and install that script as a cron (with DJANGO _SETTINGS _MODULE variable setup before executing the script)?
There is excellent way to do some maintenance-like jobs in project environment- write a custom manage.py command. It takes all environment configuration and other stuff allows you to concentrate on concrete task.
And of course call it directly by cron.
You don't need to create a view, you should just trigger a python script with the appropriate Django environment settings configured. Then call your models directly the way you would if you were using a view, process your data, add it to your model, then .save() the model to the database.
I've used cron to update my DB using both a script and a view. From cron's point of view it doesn't really matter which one you choose. As you've noted, though, it's hard to beat the simplicity of firing up a browser and hitting a URL if you ever want to update at a non-scheduled interval.
If you go the view route, it might be worth considering a view that accepts the XML file itself via an HTTP POST. If that makes sense for your data (you don't give much information about that XML file), it would still work from cron, but could also accept an upload from a browser -- potentially letting the person who produces the XML file update the DB by themselves. That's a big win if you're not the one making the XML file, which is usually the case in my experience.
"create a python script and install that script as a cron (with DJANGO _SETTINGS _MODULE variable setup before executing the script)?"
First, be sure to declare your Forms in a separate module (e.g. forms.py)
Then, you can write batch loaders that look like this. (We have a LOT of these.)
from myapp.forms import MyObjectLoadForm
from myapp.models import MyObject
import xml.etree.ElementTree as ET
def xmlToDict( element ):
return dict(
field1= element.findtext('tag1'),
field2= element.findtext('tag2'),
)
def loadRow( aDict ):
f= MyObjectLoadForm( aDict )
if f.is_valid():
f.save()
def parseAndLoad( someFile ):
doc= ET.parse( someFile ).getroot()
for tag in doc.getiterator( "someTag" )
loadRow( xmlToDict(tag) )
Note that there is very little unique processing here -- it just uses the same Form and Model as your view functions.
We put these batch scripts in with our Django application, since it depends on the application's models.py and forms.py.
The only "interesting" part is transforming your XML row into a dictionary so that it works seamlessly with Django's forms. Other than that, this command-line program uses all the same Django components as your view.
You'll probably want to add options parsing and logging to make a complete command-line app out of this. You'll also notice that much of the logic is generic -- only the xmlToDict function is truly unique. We call these "Builders" and have a class hierarchy so that our Builders are all polymorphic mappings from our source documents to Python dictionaries.

Categories

Resources