Diffing and synchronizing two MySQL tables in Python

I have two tables, one with new data and another with old data.
I need to find the diff between the two tables and push only the changes into the table with the old data, as it will be in production.
Both tables are identical in terms of columns; only the data varies.
EDIT:
I am looking for one-way sync only.
EDIT 2
The tables may have foreign keys.
Here are the constraints:
I can't use shell utilities like mk-table-sync.
I can't use GUI tools, because they cannot be automated, as suggested here.
This needs to be done programmatically, or in the db.
I am working in Python on Google App Engine.
Currently I am doing things like
OUTER JOINs and WHERE [NOT] EXISTS in SQL queries to compare each record and pushing the results.
My questions are:
Is there a better way to do this?
Is it better to do this in Python rather than in the db?

According to your comment to my question, you could simply do:
DELETE FROM OldTable;
INSERT INTO OldTable (field1, field2, ...) SELECT * FROM NewTable;
As I pointed out above, there might be reasons not to do this, e.g., data size.
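If a full wipe-and-reload is too heavy, the one-way sync can also be expressed as two statements: an upsert for new and changed rows, and a delete for rows that no longer exist in the new table. Below is a minimal sketch, assuming a primary key column named id, two data columns field1 and field2, and a PyMySQL connection; on App Engine the connection setup will differ, but the SQL is the same.
import pymysql  # assumed driver; any DB-API connection works the same way

conn = pymysql.connect(host="localhost", user="user", password="pass", db="mydb")
with conn.cursor() as cur:
    # Upsert: copy rows that are new or changed into the production table.
    cur.execute("""
        INSERT INTO OldTable (id, field1, field2)
        SELECT id, field1, field2 FROM NewTable
        ON DUPLICATE KEY UPDATE
            field1 = VALUES(field1),
            field2 = VALUES(field2)
    """)
    # One-way sync: remove rows that no longer exist in the new table.
    cur.execute("""
        DELETE FROM OldTable
        WHERE NOT EXISTS (SELECT 1 FROM NewTable WHERE NewTable.id = OldTable.id)
    """)
conn.commit()
If foreign keys point at OldTable, the delete may need ON DELETE CASCADE (or deleting the child rows first) to go through.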

Related

SQLAlchemy - Iterate through all mapped tables

I am currently creating a web app in Flask and use SQLAlchemy (not the Flask version) to deal with reading and writing to my MySQL database.
I have about 15 different tables, each mapped to a different declarative class; however, the application is still in its beta stages, so this number will probably increase.
I would like a way to iterate through every single table and run the same command on every one. This is part of an update function where an admin can change the name of a book; this name change should be reflected in all the other tables where that book is referred to.
Is there a way to iterate through all your SQLAlchemy tables?
Thanks!
Not exactly sure what you want to achieve here, but if you use declarative base, you can try something like this:
tables = Base.__subclasses__()
for t in tables:
    rows = Session.query(t).all()
    for r in rows:
        ... do something ...
This gets all tables by listing subclasses of Base. Then it queries everything from each table in turn and loops through selected rows.
However, I do not quite understand why you would want to do this. From how you describe the problem, it sounds like you should have a single Book table, and all other tables should reference it when they need to refer to a book. That is the relational model, as opposed to copying book information into each and every table and trying to manage it manually like this.
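For illustration, a minimal sketch of that relational layout with declarative SQLAlchemy (the Book and Review models and their columns are hypothetical):
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Book(Base):
    __tablename__ = "book"
    id = Column(Integer, primary_key=True)
    name = Column(String(200))

class Review(Base):
    __tablename__ = "review"
    id = Column(Integer, primary_key=True)
    # Reference the book by id instead of copying its name into this table.
    book_id = Column(Integer, ForeignKey("book.id"))
    book = relationship("Book")

# Renaming a book is then a single update:
# session.query(Book).filter_by(id=1).update({"name": "New Title"})
With this layout, every referencing table picks up the new name through the foreign key, so there is nothing to iterate over.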

Maintain duplicate record across databases

I have some data that I would like to keep consistent across two separate databases. This may seem ridiculous, but it's something we'd like to do for our project.
My initial thoughts were to use something like:
@event.listens_for(Table, "after_insert")
and then create a new session within this event to insert into the new database (and likewise for updates).
I don't believe the new session can use the ORM Table object, or it will just spin, as the event triggers repeatedly (so I can just use raw SQL). Is there a clean way of doing this with SQLAlchemy? I experimented with binds, but it seems like they are more for splitting data across databases (instead of duplicating).
Update
As @Tim mentioned, there probably isn't an easy way to do this in SQLAlchemy. The best solution is probably to pull it up into a layer above SQLAlchemy. Basically, write functions like createMyModel(model, session1, session2).
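A minimal sketch of what that layer could look like, following the createMyModel idea above (the User model, field names, and sessions are hypothetical):
def createMyModel(model_cls, session1, session2, **fields):
    # Build two separate instances with the same data, one per database,
    # so that neither session tries to own the other's object.
    obj1 = model_cls(**fields)
    obj2 = model_cls(**fields)
    session1.add(obj1)
    session2.add(obj2)
    session1.commit()
    session2.commit()
    return obj1, obj2

# e.g. createMyModel(User, primary_session, mirror_session, name="alice")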

Python: Dumping Database Data with Peewee

Background
I am looking for a way to dump the results of MySQL queries made with Python & Peewee to an excel file, including database column headers. I'd like the exported content to be laid out in a near-identical order to the columns in the database. Furthermore, I'd like a way for this to work across multiple similar databases that may have slightly differing fields. To clarify, one database may have a user table containing "User, PasswordHash, DOB, [...]", while another has "User, PasswordHash, Name, DOB, [...]".
The Problem
My primary problem is getting the column headers out in an ordered fashion. All attempts thus far have resulted in unordered results, all of which are less than elegant.
Second, my methodology thus far has resulted in code which I'd (personally) hate to maintain, which I know is a bad sign.
Work so far
At present, I have used Peewee's pwiz.py script to generate the models for each of the preexisting database tables in the target databases, then went and entered all primary and foreign keys. The relations are setup, and some brief tests showed they're associating properly.
Code: I've managed to get the column headers out using something similar to:
for i, column in enumerate(User._meta.get_field_names()):
    ws.cell(row=0, column=i).value = column
As mentioned, this is unordered. Also, doing it this way forces me to do something along the lines of
getattr(some_object, title)
to dynamically populate the fields accordingly.
Thoughts and Possible Solutions
Manually write out the order that I want things in an array, and use that to loop through and populate the data. The pro of this is very strict/granular control. The con is that I'd need to specify this for every database.
Create (whether manually or via a method) a hash of fields with an associated weighted value for all possibly encountered fields, then write a method for sorting _meta.get_field_names() according to weight. The con of this is that the columns may not end up 100% in the right order, such as Name coming before DOB in one DB but after it in another.
Feel free to tell me I'm doing it all wrong or to suggest completely different ways of doing this; I'm all ears. I'm very much new to Python and Peewee (and ORMs in general, actually). I could switch back to Perl and do the database querying via DBI with little to no hassle, but its libraries for Excel would cause me just as many problems, and I'd like to take this as an opportunity to expand my knowledge.
There is a method on the model meta you can use:
for field in User._meta.get_sorted_fields():
    print(field.name)
This will print the field names in the order they are declared on the model.
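Building on that, a minimal sketch of the dump itself, assuming openpyxl on the Excel side (which the ws.cell calls above suggest) and the pwiz-generated User model; the filename and row handling are illustrative, and in newer Peewee releases the same ordered list is exposed as the _meta.sorted_fields attribute.
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Headers in declaration order, which mirrors the database column order from pwiz.
fields = User._meta.get_sorted_fields()
for col, field in enumerate(fields, start=1):  # openpyxl cells are 1-indexed
    ws.cell(row=1, column=col).value = field.name

# One spreadsheet row per database row, populated dynamically with getattr.
for row, user in enumerate(User.select(), start=2):
    for col, field in enumerate(fields, start=1):
        ws.cell(row=row, column=col).value = getattr(user, field.name)

wb.save("users.xlsx")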

How to perform this insert into PostgreSQL using MySQL data?

I'm in the process of moving a MySQL database over to a Postgres database. I have read all of the articles presented here, as well as some of the solutions presented on Stack Overflow, and the tools recommended don't seem to work for me. Both databases were generated by Django's syncdb, although the Postgres db is more or less empty at the moment. I tried to migrate the tables over using Django's built-in dumpdata / loaddata functions and its serializers, but it doesn't seem to like a lot of my tables, leading me to believe that writing a manual solution might be best in this case.
I have code to verify that the column headers are the same for each table in the database and that the matching tables exist; that works fine. I was thinking it would be best to just grab the MySQL data row by row and then insert it into the respective Postgres table row by row (I'm not concerned with speed at the moment). The one thing is, I don't know the proper way to construct the insert statement. I have something like:
table_name = retrieve_table()
column_headers = get_headers(table_name)  # can return a tuple or a list
postgres_cursor = postgres_con.cursor()
rows = mysql_cursor.fetchall()
for row in rows:  # row is a tuple
    postgres_cursor.execute(????)
Where ???? would be the insert statement; I just don't know the proper way to construct it. I have the table name that I would like to insert into as a string, the column headers that I can treat as a list, tuple, or string, and the respective values that I'd like to insert. What would be the recommended way to construct the statement? I have read psycopg's documentation and didn't quite see a way that satisfies my needs. I'm not sure this is the entirely correct way to migrate, so if someone could steer me in the right direction or offer any advice I'd really appreciate it.
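For what it's worth, one common way to build that statement with psycopg2 is sketched below: values go through %s placeholders, while the table and column names (which here come from your own schema rather than user input) are interpolated into the SQL string. The variable names follow the snippet above.
# Build "INSERT INTO table (c1, c2, ...) VALUES (%s, %s, ...)" once per table.
placeholders = ", ".join(["%s"] * len(column_headers))
columns = ", ".join(column_headers)
insert_stmt = "INSERT INTO {0} ({1}) VALUES ({2})".format(table_name, columns, placeholders)

for row in rows:  # row is a tuple whose length matches the placeholder count
    postgres_cursor.execute(insert_stmt, row)

postgres_con.commit()
psycopg2 also provides a psycopg2.sql module for composing identifiers safely, which is worth using if the table or column names could ever come from untrusted input.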

Work with Postgres/PostGIS View in SQLAlchemy

Two questions:
I want to generate a view in my PostGIS DB. How do I add this view to my geometry_columns table?
What do I have to do to use a view with SQLAlchemy? Is there a difference between a table and a view to SQLAlchemy, or can I use a view the same way I use a table?
Sorry for my poor English.
If there are any questions about my question, please feel free to ask so I can try to explain it another way :)
Nico
Table objects in SQLAlchemy have two roles. They can be used to issue DDL commands to create the table in the database. But their main purpose is to describe the columns and types of tabular data that can be selected from and inserted to.
If you only want to select, then a view looks to SQLAlchemy exactly like a regular table. It's enough to describe the view as a Table with the columns that interest you (you don't even need to describe all of the columns). If you want to use the ORM you'll need to declare for SQLAlchemy that some combination of the columns can be used as the primary key (anything that's unique will do). Declaring some columns as foreign keys will also make it easier to set up any relations. If you don't issue create for that Table object, then it is just metadata for SQLAlchemy to know how to query the database.
If you also want to insert to the view, then you'll need to create PostgreSQL rules or triggers on the view that redirect the writes to the correct location. I'm not aware of a good usage recipe to redirect writes on the Python side.
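As an illustration of the select-only case, a minimal sketch of describing an existing view as a Table (the view name, columns, and connection string are made up; the Geometry type comes from GeoAlchemy2 and is assumed here):
from sqlalchemy import Table, Column, Integer, String, MetaData, create_engine, select
from geoalchemy2 import Geometry  # assumed for the PostGIS geometry column

metadata = MetaData()

# Describe the existing view; no create() is ever issued for it, so this is
# purely metadata telling SQLAlchemy how to query the database.
parcel_view = Table(
    "parcel_view", metadata,
    Column("id", Integer, primary_key=True),  # any unique column can act as the "primary key"
    Column("name", String),
    Column("geom", Geometry("POLYGON")),
)

engine = create_engine("postgresql://user:pass@localhost/mydb")
with engine.connect() as conn:
    for row in conn.execute(select([parcel_view.c.id, parcel_view.c.name])):
        print(row)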
