I have scraped data from a website using their API on a Django application. The data is JSON (a Python dictionary when I retrieve it on my end). The data has many, many fields. I want to store them in a database, so that I can create endpoints that will allow for lookup and modifications (updates). I need to use their fields to create the structure of my database. Any help on this issue or on how to tackle it would be greatly appreciated. I apologize if my question is not concise enough, please let me know if there is anything I need to specify.
I have seen many people say to just populate it, as in the question "How to populate a Django sqlite3 database". The issue is that there are so many fields that I cannot realistically write the Django model fields by hand. From what I have read, it seems I may be able to use serializers.ModelSerializer, although that appears to only populate an existing database that already has a defined model.
Tricky to answer without details, but I would consider doing this in two steps - first, convert your json data to a database schema, for example using a tool like sqlify: https://sqlify.io/convert/json/to/sqlite
Then, create a database from the generated schema file, and use inspectdb to generate your django models: https://docs.djangoproject.com/en/2.2/ref/django-admin/#inspectdb
You'll probably need to tweak the generated schema and/or models, but this should go a long way towards automating the process.
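If you would rather skip the intermediate database, a similar idea can be sketched directly in Python: take one sample record and print a models.py skeleton from it. This is only a rough sketch, not what sqlify or inspectdb do. The type-to-field mapping and the model name `Article` are my own assumptions, it assumes the JSON keys are valid Python identifiers, and `models.JSONField` assumes Django 3.1 or later.

```python
import json

# Assumed mapping from JSON value types to Django field declarations.
FIELD_FOR_TYPE = {
    bool: "models.BooleanField(null=True)",
    int: "models.BigIntegerField(null=True)",
    float: "models.FloatField(null=True)",
    str: "models.TextField(blank=True)",
}


def model_source(name, record):
    """Return models.py source text for one sample record (a dict)."""
    lines = [
        "from django.db import models",
        "",
        "",
        f"class {name}(models.Model):",
    ]
    for key, value in record.items():
        # Lists, nested objects and nulls fall back to a JSON column.
        field = FIELD_FOR_TYPE.get(type(value), "models.JSONField(null=True)")
        lines.append(f"    {key} = {field}")
    return "\n".join(lines) + "\n"


# Hypothetical sample record standing in for one item from the API.
sample = json.loads(
    '{"title": "Hello", "views": 12, "score": 4.5,'
    ' "published": true, "tags": ["a", "b"]}'
)
print(model_source("Article", sample))
```

The generated text still needs a manual pass (primary keys, max lengths, relations), the same as the inspectdb output would.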
I would go for a document database, like Elasticsearch or MongoDB.
Those are made for this kind of situation, so they are worth looking into.
First of all, I know that my question duplicates this question, but I suppose it's not quite the same.
I need to save a user's "search filter". As I understand it, the Django ORM generates SQL that is specific to each database backend. So if I save the SQL query, I can't migrate to another database with a different SQL syntax.
Am I wrong? If not, how can I save the Django side of the query without accessing the DB?
The short answer is that you're correct -- mostly. If the SQL dialect that Django compiled the query for isn't compatible with a different backend, it wouldn't work or might work unpredictably.
To save the Django side of the query, why not just save the actual filter() statement that you're using or a representation of it that you can convert back on the fly?
Edit: Okay, in that case I think you're on the right track, based on the comments and the answer above. If you're already parsing a query string, save it in the database as a CharField and then use it to build a Django QuerySet when you retrieve it, if I'm understanding correctly.
If you can suggest a better solution, I'm open to discussion.
So... pickling the .filter() call is not the best idea, and neither is saving an SQL string for a specific DB. I think the best solution for this problem is to save the search parameters. In my case that's the GET query string. I get it with:
request.META["QUERY_STRING"]
and save it to the DB.
When I need it back, I just parse the stored string:
from django.http import QueryDict
QueryDict(saved_query_string)  # saved_query_string: the value loaded back from the DB
Additionally, I validate these values with a separate form (optional), SearchTrustedForm(), so that I can keep backwards compatibility if the data structure changes.
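A minimal sketch of the round trip, from a stored query string back to filter arguments. To keep it runnable without Django settings, it uses urllib.parse.parse_qs as a stand-in for QueryDict; the parameter names and the `ALLOWED_LOOKUPS` whitelist are hypothetical and would come from your own search form.

```python
from urllib.parse import parse_qs

# Hypothetical whitelist: GET parameter name -> ORM lookup.
# Anything not listed here (e.g. "page") is ignored.
ALLOWED_LOOKUPS = {
    "q": "title__icontains",
    "min_price": "price__gte",
    "category": "category__in",
}


def filter_kwargs(saved_query_string):
    """Turn a stored GET string into keyword arguments for .filter()."""
    params = parse_qs(saved_query_string)  # every value is a list of strings
    kwargs = {}
    for name, lookup in ALLOWED_LOOKUPS.items():
        if name not in params:
            continue
        values = params[name]
        # "__in" lookups take the whole list; others take the last value.
        kwargs[lookup] = values if lookup.endswith("__in") else values[-1]
    return kwargs


saved = "q=red+shoes&min_price=10&category=a&category=b&page=3"
print(filter_kwargs(saved))
```

In a Django view, the result would be passed on as SomeModel.objects.filter(**kwargs). Because only the parameters are stored, nothing in the database depends on a particular SQL dialect.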
I have some data in mongodb and would like to create an email which pulls data from the db and inserts it into some sort of (mustache/django'esque) email template and then sends it.
The data has been scraped from a website with articles and when new articles are retrieved I'd like to create a summary email of all the new articles from this particular site.
So far I've discovered premailer which looks like it will be useful for the necessary inline css. python-emails looks somewhat promising as well...
Surely it is a common task to create well formatted emails from data in a db using python? But surprisingly I haven't been able to find any specific information about how to do this.
EDIT: I just discovered this way of using django to generate emails from their templates. I'll investigate this further and update my question here once I find a functioning solution.
I don't think you need Django. You can use a templating engine like Jinja2 to create the email templates and inject data into them, then use smtplib to send the email.
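A rough sketch of the template-then-send flow. To keep it runnable with the standard library only, it uses string.Template as a stand-in for Jinja2 (a Jinja2 template would take its place in practice), and the article fields `title` and `url`, the addresses and the SMTP host are all assumptions. The send step is left commented out because it needs a reachable mail server.

```python
from email.message import EmailMessage
from string import Template

# Stand-ins for real templates; with Jinja2 these would be template files.
ITEM = Template("- $title\n  $url")
BODY = Template("New articles from $site:\n\n$items\n")


def build_summary(site, articles, sender, recipient):
    """Render the article list into a plain-text summary email."""
    # Each article is a dict; extra keys (e.g. a MongoDB _id) are ignored.
    items = "\n".join(ITEM.substitute(article) for article in articles)
    msg = EmailMessage()
    msg["Subject"] = f"{len(articles)} new articles from {site}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(BODY.substitute(site=site, items=items))
    return msg


# Hypothetical documents as they might come out of the database.
articles = [
    {"title": "First post", "url": "https://example.com/1"},
    {"title": "Second post", "url": "https://example.com/2"},
]
msg = build_summary("example.com", articles, "bot@example.com", "me@example.com")
print(msg)

# Sending, assuming an SMTP server is listening on localhost:
# import smtplib
# with smtplib.SMTP("localhost") as smtp:
#     smtp.send_message(msg)
```

For HTML mail, the rendered HTML could be attached with msg.add_alternative(html, subtype="html") after running it through an inliner such as premailer.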
In my project (based on PostgreSQL), I need to export a few models as an SQLite dump. It must be done on demand, e.g. on user request.
I can prepare an appropriate database manually, but I would like to avoid duplicating the schema information. I dream of a solution like 'dumpdata app-name', but producing SQLite instead of JSON/XML/YAML.
Is there such a solution?
P.S. For the overly strict: this is not a broad question. There are only two possibilities: either such a snippet, helper, etc. exists, or it does not and the export has to be done by hand. I can't find one on my own, so I'm asking for help.
To sum up the details (some people could not figure them out and might put my question 'on hold'):
There is a Django project with a main PostgreSQL database.
I'm already processing requests from users (through an API).
One of the requests is "make a dump of some models (tables) in SQLite format".
I can prepare a temporary SQLite database and fill it with data manually.
I'm looking for a powerful, general-purpose tool (solution) that will do such an export automatically (from some of my Django models to SQLite).
Note: Scroll down to the Background section for useful details. Assume the project uses Python-Django and South, in the following illustration.
What's the best way to import the following CSV
"john","doe","savings","personal"
"john","doe","savings","business"
"john","doe","checking","personal"
"john","doe","checking","business"
"jemma","donut","checking","personal"
Into a PostgreSQL database with the related tables Person, Account, and AccountType considering:
Admin users can change the database model and CSV import-representation in real-time via a custom UI
The saved CSV-to-Database table/field mappings are used when regular users import CSV files
So far two approaches have been considered
ETL-API Approach: Providing an ETL API with a spreadsheet, my CSV-to-Database table/field mappings, and connection info for the target database. The API would then load the spreadsheet and populate the target database tables. Looking at pygrametl, I don't think what I'm aiming for is possible. In fact, I'm not sure any ETL APIs do this.
Row-level Insert Approach: Parsing the CSV-to-Database table/field mappings, parsing the spreadsheet, and generating SQL inserts in "join-order".
I implemented the second approach but am struggling with algorithm defects and code complexity. Is there a Python ETL API out there that does what I want? Or an approach that doesn't involve reinventing the wheel?
Background
The company I work at is looking to move hundreds of project-specific design spreadsheets hosted in SharePoint into databases. We're close to completing a web application that meets the need by allowing an administrator to define/model a database for each project, store spreadsheets in it, and define the browse experience. At this stage of completion, transitioning to a commercial tool isn't an option. Think of the web application as a django-admin alternative (though it isn't one), with a DB modeling UI, CSV import/export functionality, customizable browse, and modularized code to address project-specific customizations.
The implemented CSV import interface is cumbersome and buggy, so I'm trying to get feedback and find alternative approaches.
How about separating the problem into two separate problems?
Create a Person class which represents a person in the database. This could use Django's ORM, or extend it, or you could do it yourself.
Now you have two issues:
Create a Person instance from a row in the CSV.
Save a Person instance to the database.
Now, instead of just CSV-to-Database, you have CSV-to-Person and Person-to-Database. I think this is conceptually cleaner. When the admins change the schema, that changes the Person-to-Database side. When the admins change the CSV format, they're changing the CSV-to-Person side. Now you can deal with each separately.
Does that help any?
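To make the split concrete, here is a minimal sketch using only the standard library. `PersonRecord`, `COLUMNS`, `read_records` and `save_records` are hypothetical names, and a plain dict stands in for the database side, which in the real application would be ORM calls.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonRecord:
    """Intermediate object: one CSV row, independent of any table layout."""
    first_name: str
    last_name: str
    account: str
    account_type: str


# CSV-to-Person side: only this mapping changes when the CSV layout changes.
COLUMNS = ("first_name", "last_name", "account", "account_type")


def read_records(stream, columns=COLUMNS):
    for row in csv.reader(stream):
        yield PersonRecord(**dict(zip(columns, row)))


# Person-to-Database side: only this changes when the schema changes.
def save_records(records, store):
    for rec in records:
        person = (rec.first_name, rec.last_name)
        store.setdefault(person, set()).add((rec.account, rec.account_type))


data = io.StringIO(
    '"john","doe","savings","personal"\n'
    '"john","doe","checking","business"\n'
    '"jemma","donut","checking","personal"\n'
)
store = {}
save_records(read_records(data), store)
print(store)
```

The admin-defined mappings would then replace `COLUMNS` on one side and the body of `save_records` on the other, without either side knowing about the other.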
I write import sub-systems almost every month at work, and since I do that kind of task so often, some time ago I wrote django-data-importer. This importer works like a Django form and has readers for CSV, XLS and XLSX files that give you lists of dicts.
With the data_importer readers you can read a file into lists of dicts, iterate over them with a for loop, and save the lines to the DB.
With the importer you can do the same, with the bonus of validating each field of a line, logging errors and actions, and saving everything at the end.
Please take a look at https://github.com/chronossc/django-data-importer. I'm pretty sure that it will solve your problem and will help you process any kind of CSV file from now on :)
To solve your problem, I suggest using data-importer with Celery tasks. You upload the file and fire the import task via a simple interface. The Celery task sends the file to the importer, and you can validate lines, save them, and log their errors. With some effort you can even show the task's progress to the users who uploaded the sheet.
I ended up taking a few steps back to address this problem per Occam's razor using updatable SQL views. It meant a few sacrifices:
Removing: South.DB-dependent real-time schema administration API, dynamic model loading, and dynamic ORM syncing
Defining models.py and an initial south migration by hand.
This allows for a simple approach to importing flat datasets (CSV/Excel) into a normalized database:
Define unmanaged models in models.py for each spreadsheet
Map those to updatable SQL views (INSERT/UPDATE INSTEAD SQL RULEs) in the initial South migration that adhere to the spreadsheet field layout
Iterate through the CSV/Excel spreadsheet rows and perform an INSERT INTO <VIEW> (<COLUMNS>) VALUES (<CSV-ROW-FIELDS>);
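A self-contained sketch of the idea, with two loudly stated substitutions: it runs on in-memory SQLite, so an INSTEAD OF INSERT trigger stands in for the PostgreSQL INSERT-INSTEAD rule, and it uses two tables instead of the three in the question to stay short. Table, view and column names are made up for the illustration.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    UNIQUE (first_name, last_name)
);
CREATE TABLE account (
    id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    name TEXT NOT NULL,
    kind TEXT NOT NULL
);
-- Flat view that mirrors the spreadsheet column layout.
CREATE VIEW account_sheet AS
    SELECT p.first_name, p.last_name, a.name AS account, a.kind AS account_type
    FROM account a JOIN person p ON p.id = a.person_id;
-- Route inserts on the view to the normalized tables.
CREATE TRIGGER account_sheet_insert INSTEAD OF INSERT ON account_sheet
BEGIN
    INSERT OR IGNORE INTO person (first_name, last_name)
        VALUES (NEW.first_name, NEW.last_name);
    INSERT INTO account (person_id, name, kind)
        SELECT id, NEW.account, NEW.account_type FROM person
        WHERE first_name = NEW.first_name AND last_name = NEW.last_name;
END;
""")

sheet = io.StringIO(
    '"john","doe","savings","personal"\n'
    '"john","doe","savings","business"\n'
    '"john","doe","checking","personal"\n'
    '"john","doe","checking","business"\n'
    '"jemma","donut","checking","personal"\n'
)
# The import itself is one flat INSERT per spreadsheet row.
conn.executemany(
    "INSERT INTO account_sheet (first_name, last_name, account, account_type)"
    " VALUES (?, ?, ?, ?)",
    csv.reader(sheet),
)
people = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
accounts = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
print(people, accounts)
```

The import code never needs to know the join order; that knowledge lives in the view definition, which is what made the approach simpler than generating inserts per table.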
Here is another approach that I found on GitHub. Basically, it detects the schema and allows overrides. Its whole goal is just to generate raw SQL to be executed by psql or whatever driver you use.
https://github.com/nmccready/csv2psql
% python setup.py install
% csv2psql --schema=public --key=student_id,class_id example/enrolled.csv > enrolled.sql
% psql -f enrolled.sql
There are also a bunch of options for doing alters (creating primary keys from many existing cols) and merging / dumps.