What is the best way to migrate MySQL tables to Google Datastore and create python models for them?
I have a PHP+MySQL project that I want to migrate to Python+GAE project. So far the big obstacle is migrating the tables and creating corresponding models. Each table is about 110 columns wide. Creating a model for the table manually is a bit tedious, let alone creating a loader and importing a generated csv table representation.
Is there a more efficient way for me to do the migration?
In general, generating your models automatically shouldn't be too difficult. Suppose you have a csv file for each table, with lines consisting of (field name, data type), then something like this would do the job:
# Maps MySQL types to Datastore property classes
type_map = {
'char': 'StringProperty',
'text': 'TextProperty',
'int': 'IntegerProperty',
# ...
}
def generate_model_class(classname, definition_file):
ret = []
ret.append("class %s(db.Model):" % (classname,))
for fieldname, type in csv.reader(open(definition_file)):
ret.append(" %s = db.%s()" % (fieldname, type_map[type]))
return "\n".join(ret)
Once you've defined your schema, you can bulk load directly from the DB - no need for intermediate CSV files. See my blog post on the subject.
approcket can mysql⇌gae or gae builtin remote api from google
In your shoes, I'd write a one-shot Python script to read the existing MySQL schema (with MySQLdb), generating a models.py to match (then do some manual checks and edits on the generated code, just in case). That's assuming that a data model with "about 110" properties per entity is something you're happy with and want to preserve, of course; it might be worth to take the opportunity to break things up a bit (indeed you may have to if your current approach also relies on joins or other SQL features GAE doesn't give you), but that of course requires more manual work.
Once the data model is in place, bulk loading can happen, typically via intermediate CSV files (there are several ways you can generate those).
you don't need to
http://code.google.com/apis/sql/
:)
You could migrate them to django models first
In particular use
python manage.py inspectdb > models.py
And edit models.py until satisfied. You might have to put ForeignKeys in, adjusts the length of CharFields etc.
I've converted several legacy databases to django like this with good success.
Django models however are different to GAE models (which I'm not very familiar with) so that may not be terribly helpful I don't know!
Related
I need to dynamically create database tables depending on user requirements. so apart from a few predefined databases, all other databases should be created at runtime after taking table characteristics(like no of cols, primary key etc.) from user.
I read a bit of docs, and know about django.db.connection but all examples there are only for adding data to a database, not creating tables. (ref: https://docs.djangoproject.com/en/4.0/topics/db/sql/#executing-custom-sql-directly)
So is there anyway to create tables without models in django, this condition is a must, so if not possible with django, which other framework should I look at?
note: I am not good at writing questions, ask if any other info is needed.
Thanks!
You can use inspectdb to automatically generate the models from the legacy database. You can check about it in here.
Or you can use SQL directly. Although, you will have to process the tables in python. Check it here.
I am a newbie in programming, but now I connected my project with PostgreSQL. I learned the way to enter by SQL code and also found out that we can actually enter /adming (by creating the superuser and add data there). So which one is widely used in webdev?
It will depend completely on your application.
You can add rows to a table using SQL if that's the easiest way for you. Or you can add rows by creating new object instances in Python code and .save()ing them. Or you can create instances through a CreateView or through the Django admin.
Adding data with SQL has the drawback that you will lise the benefit of any validators declared on the model's fields. YOu may end up with data stored in your SQL tables which your app regards as "impossible", which may cause you minor or even major difficulties.
I have several times written management commands which all have the same general format. For each "row" in a data source (often a spreadsheet) construct one or more Django objects and save them. You can process each data "row" within a transaction (with transaction.atomic()) so if anything goes wrong, the data row is not committed. Or you can treat the entire process as a single transaction (not recommended for vast numbers of "rows", though)·
I have scraped data from a website using their API on a Django application. The data is JSON (a Python dictionary when I retrieve it on my end). The data has many, many fields. I want to store them in a database, so that I can create endpoints that will allow for lookup and modifications (updates). I need to use their fields to create the structure of my database. Any help on this issue or on how to tackle it would be greatly appreciated. I apologize if my question is not concise enough, please let me know if there is anything I need to specify.
I have seen many, many people saying to just populate it, such as this example How to populate a Django sqlite3 database. The issue is, there are so many fields that I cannot go and actually create the django model fields myself. From what I have read, it seems like I may be able to use serializers.ModelSerializer, although that seems to just populate a pre-existing db with already defined model.
Tricky to answer without details, but I would consider doing this in two steps - first, convert your json data to a database schema, for example using a tool like sqlify: https://sqlify.io/convert/json/to/sqlite
Then, create a database from the generated schema file, and use inspectdb to generate your django models: https://docs.djangoproject.com/en/2.2/ref/django-admin/#inspectdb
You'll probably need to tweak the generated schema and/or models, but this should go a long way towards automating the process.
I would go for a document database, like Elasticsearch or MongoDB.
Those are made for this kind of situation, look it up.
This might sound like a bit of an odd question - but is it possible to load data from a (in this case MySQL) table to be used in Django without the need for a model to be present?
I realise this isn't really the Django way, but given my current scenario, I don't really know how better to solve the problem.
I'm working on a site, which for one aspect makes use of a table of data which has been bought from a third party. The columns of interest are liklely to remain stable, however the structure of the table could change with subsequent updates to the data set. The table is also massive (in terms of columns) - so I'm not keen on typing out each field in the model one-by-one. I'd also like to leave the table intact - so coming up with a model which represents the set of columns I am interested in is not really an ideal solution.
Ideally, I want to have this table in a database somewhere (possibly separate to the main site database) and access its contents directly using SQL.
You can always execute raw SQL directly against the database: see the docs.
There is one feature called inspectdb in Django. for legacy databases like MySQL , it creates models automatically by inspecting your db tables. it stored in our app files as models.py. so we don't need to type all column manually.But read the documentation carefully before creating the models because it may affect the DB data ...i hope this will be useful for you.
I guess you can use any SQL library available for Python. For example : http://www.sqlalchemy.org/
You have just then to connect to your database, perform your request and use the datas at your will. I think you can't use Django without their model system, but nothing prevents you from using another library for this in parallel.
Note: Scroll down to the Background section for useful details. Assume the project uses Python-Django and South, in the following illustration.
What's the best way to import the following CSV
"john","doe","savings","personal"
"john","doe","savings","business"
"john","doe","checking","personal"
"john","doe","checking","business"
"jemma","donut","checking","personal"
Into a PostgreSQL database with the related tables Person, Account, and AccountType considering:
Admin users can change the database model and CSV import-representation in real-time via a custom UI
The saved CSV-to-Database table/field mappings are used when regular users import CSV files
So far two approaches have been considered
ETL-API Approach: Providing an ETL API a spreadsheet, my CSV-to-Database table/field mappings, and connection info to the target database. The API would then load the spreadsheet and populate the target database tables. Looking at pygrametl I don't think what i'm aiming for is possible. In fact, i'm not sure any ETL APIs do this.
Row-level Insert Approach: Parsing the CSV-to-Database table/field mappings, parsing the spreadsheet, and generating SQL inserts in "join-order".
I implemented the second approach but am struggling with algorithm defects and code complexity. Is there a python ETL API out there that does what I want? Or an approach that doesn't involve reinventing the wheel?
Background
The company I work at is looking to move hundreds of project-specific design spreadsheets hosted in sharepoint into databases. We're near completing a web application that meets the need by allowing an administrator to define/model a database for each project, store spreadsheets in it, and define the browse experience. At this stage of completion transitioning to a commercial tool isn't an option. Think of the web application as a django-admin alternative, though it isn't, with a DB modeling UI, CSV import/export functionality, customizable browse, and modularized code to address project-specific customizations.
The implemented CSV import interface is cumbersome and buggy so i'm trying to get feedback and find alternate approaches.
How about separating the problem into two separate problems?
Create a Person class which represents a person in the database. This could use Django's ORM, or extend it, or you could do it yourself.
Now you have two issues:
Create a Person instance from a row in the CSV.
Save a Person instance to the database.
Now, instead of just CSV-to-Database, you have CSV-to-Person and Person-to-Database. I think this is conceptually cleaner. When the admins change the schema, that changes the Person-to-Database side. When the admins change the CSV format, they're changing the CSV-to-Database side. Now you can deal with each separately.
Does that help any?
I write import sub-systems almost every month at work, and as I do that kind of tasks to much I wrote sometime ago django-data-importer. This importer works like a django form and has readers for CSV, XLS and XLSX files that give you lists of dicts.
With data_importer readers you can read file to lists of dicts, iter on it with a for and save lines do DB.
With importer you can do same, but with bonus of validate each field of line, log errors and actions, and save it at end.
Please, take a look at https://github.com/chronossc/django-data-importer. I'm pretty sure that it will solve your problem and will help you with process of any kind of csv file from now :)
To solve your problem I suggest use data-importer with celery tasks. You upload the file and fire import task via a simple interface. Celery task will send file to importer and you can validate lines, save it, log errors for it. With some effort you can even present progress of task for users that uploaded the sheet.
I ended up taking a few steps back to address this problem per Occam's razor using updatable SQL views. It meant a few sacrifices:
Removing: South.DB-dependent real-time schema administration API, dynamic model loading, and dynamic ORM syncing
Defining models.py and an initial south migration by hand.
This allows for a simple approach to importing flat datasets (CSV/Excel) into a normalized database:
Define unmanaged models in models.py for each spreadsheet
Map those to updatable SQL Views (INSERT/UPDATE-INSTEAD SQL RULEs) in the initial south migration that adhere to the spreadsheet field layout
Iterating through the CSV/Excel spreadsheet rows and performing an INSERT INTO <VIEW> (<COLUMNS>) VALUES (<CSV-ROW-FIELDS>);
Here is another approach that I found on github. Basically it detects the schema and allows overrides. Its whole goal is to just generate raw sql to be executed by psql and or whatever driver.
https://github.com/nmccready/csv2psql
% python setup.py install
% csv2psql --schema=public --key=student_id,class_id example/enrolled.csv > enrolled.sql
% psql -f enrolled.sql
There are also a bunch of options for doing alters (creating primary keys from many existing cols) and merging / dumps.