How to perform this insert into PostgreSQL using MySQL data? - python

I'm in the process of moving a MySQL database over to a PostgreSQL database. I have read all of the articles presented here, as well as some of the solutions on Stack Overflow, and the recommended tools don't seem to work for me. Both databases were generated by Django's syncdb, although the Postgres DB is more or less empty at the moment. I tried to migrate the tables over using Django's built-in dumpdata/loaddata functions and its serializers, but it doesn't seem to like a lot of my tables, leading me to believe that writing a manual solution might be best in this case.
I have code to verify that the column headers are the same for each table in the database and that the matching tables exist, and that works fine. I was thinking it would be best to just grab the MySQL data row by row and then insert it into the respective Postgres table row by row (I'm not concerned with speed at the moment). The one thing is, I don't know the proper way to construct the insert statement. I have something like:
table_name = retrieve_table()
column_headers = get_headers(table_name) #can return a tuple or a list
postgres_cursor = postgres_con.cursor()
rows = mysql_cursor.fetchall()
for row in rows: #row is a tuple
    postgres_cursor.execute(????)
Where ???? would be the insert statement. I just don't know the proper way to construct it. I have the table name I would like to insert into as a string, I have the column headers that I can treat as a list, tuple, or string, and I have the respective values that I'd like to insert. What would be the recommended way to construct the statement? I have read psycopg's documentation and didn't quite see anything that satisfies my needs. I don't know (or think) that this is the entirely correct way to migrate, so if someone could steer me in the right direction or offer any advice, I'd really appreciate it.
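For what it's worth, here is a minimal sketch of one common way to build such a statement with psycopg2: keep the row values as query parameters and only format in the (trusted, schema-derived) table and column names. retrieve_table, get_headers, and mysql_cursor are the pieces from the snippet above; the connection details are assumptions.
import psycopg2

# Assumed connection details -- adjust for your environment.
postgres_con = psycopg2.connect(dbname="target_db", user="me")
postgres_cursor = postgres_con.cursor()

table_name = retrieve_table()             # e.g. "app_mytable"
column_headers = get_headers(table_name)  # e.g. ["id", "name", "created"]

# Identifiers (table/column names) cannot be passed as query parameters,
# so they are formatted into the SQL string; here they come from your own
# schema introspection, not from user input.
columns = ", ".join(column_headers)
placeholders = ", ".join(["%s"] * len(column_headers))
insert_sql = "INSERT INTO %s (%s) VALUES (%s)" % (table_name, columns, placeholders)

for row in mysql_cursor.fetchall():  # row is a tuple of values
    postgres_cursor.execute(insert_sql, row)

postgres_con.commit()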

Related

Is there a way to speed up database transactions?

Sorry for the vague question, let me explain...
I have a list of words and counts in a database that has, no doubt, reached a gigantic size: an ~80 MB database where each entry is two columns (word, integer).
Now, when I am trying to add a word, I check to see if it is already in the database like this (Python sqlite3, inside a class method):
self.c.execute('SELECT * FROM {tn} WHERE {cn} = """{wn}"""'.format(tn=self.table1, cn=self.column1, wn=word_name))
exist = self.c.fetchall()
if exist:
    # do something
So you're checking for the existence of a word within a very large table of words? I think the short and simple answer to your question is to create an index for your word column.
The next step would be to set up a real database (e.g. Postgres) instead of SQLite. SQLite doesn't have the optimization tweaks of a production database, and you'd likely see a performance gain after switching.
Even for a table with millions of rows, this shouldn't be a super time-intensive query if your table is properly indexed. If you already have an index and are still facing performance issues, there's something wrong with your database setup/environment, or perhaps a bottleneck in your Python code or DB adapter. Hard to say without more information.
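(For reference, a minimal sketch of creating such an index with the sqlite3 module; the database file, table, and column names are made up to stand in for self.table1/self.column1.)
import sqlite3

conn = sqlite3.connect("words.db")  # assumed database file
c = conn.cursor()

# One-time operation; after this, lookups filtering on the word column
# can use the index instead of scanning the whole table.
c.execute("CREATE INDEX IF NOT EXISTS idx_words_word ON words (word)")
conn.commit()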
I would imagine that using COUNT within SQL would be faster:
self.c.execute('SELECT COUNT(*) FROM {tn} WHERE {cn} = """{wn}"""'.format(tn=self.table1, cn=self.column1, wn=word_name))
num = self.c.fetchone()[0]
if num:
    # do something
though I haven't tested it.
See How to check the existence of a row in SQLite with Python? for a similar question.

Diffing and Synchronizing 2 tables MySQL

I have 2 tables, One with new data, and another with old data.
I need to find the diff between the two tables and push only the changes into the table with the old data as it will be in production.
Both the tables are identical in terms of columns, only the data varies.
EDIT:
I am looking for a one-way sync only.
EDIT 2
The table may have foreign keys.
Here are the constraints
I can't use shell utilities like mk-table-sync
I can't use GUI tools, because they cannot be automated, as suggested here.
This needs to be done programmatically, or in the db.
I am working in python on Google App-engine.
Currently I am doing things like
OUTER JOINs and WHERE [NOT] EXISTS in SQL queries to compare each record and push the results.
My questions are:
Is there a better way to do this?
Is it better to do this in Python rather than in the DB?
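(For context, the WHERE NOT EXISTS / JOIN comparison described above looks roughly like the following sketch; the table, column, and key names are illustrative only, and cursor/connection are assumed to come from whatever MySQL adapter is in use.)
# Rough sketch of a one-way sync done in SQL.
insert_missing = """
    INSERT INTO old_table (id, field1, field2)
    SELECT n.id, n.field1, n.field2
    FROM new_table AS n
    WHERE NOT EXISTS (SELECT 1 FROM old_table AS o WHERE o.id = n.id)
"""

update_changed = """
    UPDATE old_table AS o
    JOIN new_table AS n ON n.id = o.id
    SET o.field1 = n.field1, o.field2 = n.field2
    WHERE o.field1 <> n.field1 OR o.field2 <> n.field2
"""

cursor.execute(insert_missing)   # add rows that only exist in new_table
cursor.execute(update_changed)   # update rows whose values differ
connection.commit()
# Note: <> is not NULL-safe in MySQL; use NOT (o.field1 <=> n.field1)
# if the columns can contain NULLs.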
According to your comment to my question, you could simply do:
DELETE FROM OldTable;
INSERT INTO OldTable (field1, field2, ...) SELECT * FROM NewTable;
As I pointed out above, there might be reasons not to do this, e.g., data size.
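(If that is acceptable, a minimal way to do it from Python is to wrap both statements in one transaction; the connection object and table names here are assumptions.)
cursor = connection.cursor()
try:
    # Full replace: wipe the old table, then reload it from the new one.
    # Only reasonable when nothing references OldTable via foreign keys
    # and the data volume is manageable.
    cursor.execute("DELETE FROM OldTable")
    cursor.execute("INSERT INTO OldTable SELECT * FROM NewTable")
    connection.commit()
except Exception:
    connection.rollback()
    raise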

Python: Dumping Database Data with Peewee

Background
I am looking for a way to dump the results of MySQL queries made with Python & Peewee to an excel file, including database column headers. I'd like the exported content to be laid out in a near-identical order to the columns in the database. Furthermore, I'd like a way for this to work across multiple similar databases that may have slightly differing fields. To clarify, one database may have a user table containing "User, PasswordHash, DOB, [...]", while another has "User, PasswordHash, Name, DOB, [...]".
The Problem
My primary problem is getting the column headers out in an ordered fashion. All attempts thus far have produced unordered results, all of which are less than elegant.
Second, my methodology thus far has resulted in code which I'd (personally) hate to maintain, which I know is a bad sign.
Work so far
At present, I have used Peewee's pwiz.py script to generate the models for each of the preexisting tables in the target databases, then went and entered all primary and foreign keys. The relations are set up, and some brief tests showed they're associating properly.
Code: I've managed to get the column headers out using something similar to:
for i, column in enumerate(User._meta.get_field_names()):
    ws.cell(row=0, column=i).value = column
As mentioned, this is unordered. Also, doing it this way forces me to do something along the lines of
getattr(some_object, title)
to dynamically populate the fields accordingly.
Thoughts and Possible Solutions
Manually write out the order I want things in as an array, and use that for looping through and populating the data. The pro of this is very strict/granular control. The con is that I'd need to specify this for every database.
Create (whether manually or via a method) a hash of fields with an associated weighted value for every field that might be encountered, then write a method for sorting _meta.get_field_names() according to weight. The con of this is that the columns may not end up 100% in the right order, e.g. Name coming before DOB in one DB but after it in another.
Feel free to tell me I'm doing it all wrong or to suggest completely different ways of doing this, I'm all ears. I'm very much new to Python and Peewee (ORMs in general, actually). I could switch back to Perl and do the database querying via DBI with little to no hassle, but its Excel libraries would cause me just as many problems, and I'd like to take this as a chance to expand my knowledge.
There is a method on the model meta you can use:
for field in User._meta.get_sorted_fields():
    print field.name
This will print the field names in the order they are declared on the model.
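(For completeness, a rough sketch of combining that with the worksheet writing from the question, using openpyxl; the User model is assumed, and note that current openpyxl versions use 1-based row/column indexes rather than the row=0 shown above.)
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Field names in declaration order, per the answer above.
field_names = [field.name for field in User._meta.get_sorted_fields()]

# Header row.
for col, name in enumerate(field_names, start=1):
    ws.cell(row=1, column=col).value = name

# Data rows, populated dynamically with getattr as in the question.
for row_idx, user in enumerate(User.select(), start=2):
    for col, name in enumerate(field_names, start=1):
        ws.cell(row=row_idx, column=col).value = getattr(user, name)

wb.save("users.xlsx")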

Preventing updates to specific columns in SQLite from Python

I am developing a Twisted app which interacts with a SQLite backend. In the SQLite DB there is a users table, in which certain columns should not be updated if they already contain a value.
One way of doing this would be to check the users table before each insert for the existence of values in the columns of interest and proceed accordingly, but this would be a performance killer, and people familiar with Twisted will know how cumbersome this can be. Can someone suggest a better way of doing this?
TIA
Try the INSERT OR IGNORE variation of the INSERT statement.
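(A minimal illustration with the sqlite3 module; the table, columns, and the UNIQUE constraint are made-up assumptions.)
import sqlite3

conn = sqlite3.connect("app.db")
c = conn.cursor()

# Assumes a UNIQUE (or PRIMARY KEY) constraint on username: when a
# conflicting row already exists, the insert is silently skipped, so the
# existing values are never touched.
c.execute("CREATE TABLE IF NOT EXISTS users (username TEXT PRIMARY KEY, email TEXT)")
c.execute("INSERT OR IGNORE INTO users (username, email) VALUES (?, ?)",
          ("alice", "alice@example.com"))
conn.commit()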

How to clean the database, dropping all records using sqlalchemy?

I am using SQLAlchemy. I want to delete all the records efficiently present in database but I don't want to drop the table/database.
I tried with the following code:
con = engine.connect()
trans = con.begin()
con.execute(table.delete())
trans.commit()
This does not seem very efficient, since I am iterating over all the tables present in the database.
Can someone suggest a better and more efficient way of doing this?
If your models rely on the existing DB schema (you usually use autoload=True), you cannot avoid deleting the data in each table. MetaData.sorted_tables comes in handy:
for tbl in reversed(meta.sorted_tables):
    engine.execute(tbl.delete())
If your models do define the complete schema, there is nothing simpler than drop_all/create_all (as already pointed out by @jadkik94).
Further, TRUNCATE would not work anyway on tables which are referenced by foreign keys, which limits its usefulness significantly.
For me, tbl.drop(engine) worked, but engine.execute(tbl.delete()) did not (SQLAlchemy 0.8.0b2, Python 2.7.3).
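(For reference, a slightly fuller sketch of the reflected-metadata approach from the answer above, deleting everything in one transaction; the connection URL is an assumption.)
from sqlalchemy import create_engine, MetaData

engine = create_engine("postgresql://user:password@localhost/mydb")  # assumed URL
meta = MetaData()
meta.reflect(bind=engine)  # load the existing schema from the database

with engine.begin() as conn:  # one transaction, committed on success
    # Delete children before parents so foreign key constraints are not violated.
    for tbl in reversed(meta.sorted_tables):
        conn.execute(tbl.delete())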
