Display data as insert statements - python

I need to move data from one database to another.
I can use python my counterpart can't.
How can select all data from a table and save it as insert statements.
Using SQLalchemy.
Is there a way to create a back up like this?

As others have suggested in comments, using the database backup program (mysqldump, pg_dump, etc) is your best bet; that will make sure that the data is transferred correctly for the underlying database.
Outputting INSERT statements will be risky; even the built-in SQLAlchemy facility for doing this comes with a big red warning, complete with a picture of a dragon, indicating that it can be dangerous.
If you nevertheless need to do this, and the data is generally trusted and doesn't contain much in the way of odd types, you can use:
Create (but do not execute) an insert expression as though you were inserting the rows back into the database.
Use the .compile() method with the relevant dialect parameter and literal_binds set to True.
Manually double-check that the output is, in fact, valid for the database; as per the warning in the SQLAlchemy FAQ, this method is not very dependable and may expose you to attacks if it's part of any production system.
I wouldn't recommend formatting up INSERT statements by hand; you're unlikely to do a better job than SQLAlchemy...

Related

How can I safely parameterize table/column names in BigQuery SQL?

I am using python's BigQuery client to create and keep up-to-date some tables in BigQuery that contain daily counts of certain firebase events joined with data from other sources (sometimes grouped by country etc.). Keeping them up-to-date requires the deletion and replacement of data for past days because the day tables for firebase events can be changed after they are created (see here and here). I keep them up-to-date in this way to avoid querying the entire dataset which is very financially/computationally expensive.
This deletion and replacement process needs to be repeated for many tables and so consequently I need to reuse some queries stored in text files. For example, one deletes everything in the table from a particular date onward (delete from x where event_date >= y). But because BigQuery disallows the parameterization of table names (see here) I have to duplicate these query text files for each table I need to do this for. If I want to run tests I would also have to duplicate the aforementioned queries for test tables too.
I basically need something like psycopg2.sql for bigquery so that I can safely parameterize table and column names whilst avoiding SQLi. I actually tried to repurpose this module by calling the as_string() method and using the result to query BigQuery. But the resulting syntax doesn't match and I need to start a postgres connection to do it (as_string() expects a cursor/connection object). I also tried something similar with sqlalchemy.text to no avail. So I concluded I'd have to basically implement some way of parameterizing the table name myself, or implement some workaround using the python client library. Any ideas of how I should go about doing this in a safe way that won't lead to SQLi? Cannot go into detail but unfortunately I cannot store the tables in postgres or any other db.
As discussed in the comments, the best option for avoiding SQLi in your case is ensuring your server's security.
If anyway you need/want to parse your input parameter before building your query, I recommend you to use REGEX in order to check the input strings.
In Python you could use the re library.
As I don't know how your code works, how your datasets/tables are organized and I don't know exactly how you are planing to check if the string is a valid source, I created the basic example below that shows how you could check a string using this library
import re
tests = ["your-dataset.your-table","(SELECT * FROM <another table>)", "dataset-09123.my-table-21112019"]
#Supposing that the input pattern is <dataset>.<table>
regex = re.compile("[a-zA-Z0-9-]+\.[a-zA-Z0-9-]+")
for t in tests:
if(regex.fullmatch(t)):
print("This source is ok")
else:
print("This source is not ok")
In this example, only strings that matches the configuration dataset.table (where both the dataset and the table must contain only alphanumeric characters and dashes) will be considered as valid.
When running the code, the first and the third elements of the list will be considered valid while the second (that could potentially change your whole query) will be considered invalid.

Loading data from a (MySQL) database into Django without models

This might sound like a bit of an odd question - but is it possible to load data from a (in this case MySQL) table to be used in Django without the need for a model to be present?
I realise this isn't really the Django way, but given my current scenario, I don't really know how better to solve the problem.
I'm working on a site, which for one aspect makes use of a table of data which has been bought from a third party. The columns of interest are liklely to remain stable, however the structure of the table could change with subsequent updates to the data set. The table is also massive (in terms of columns) - so I'm not keen on typing out each field in the model one-by-one. I'd also like to leave the table intact - so coming up with a model which represents the set of columns I am interested in is not really an ideal solution.
Ideally, I want to have this table in a database somewhere (possibly separate to the main site database) and access its contents directly using SQL.
You can always execute raw SQL directly against the database: see the docs.
There is one feature called inspectdb in Django. for legacy databases like MySQL , it creates models automatically by inspecting your db tables. it stored in our app files as models.py. so we don't need to type all column manually.But read the documentation carefully before creating the models because it may affect the DB data ...i hope this will be useful for you.
I guess you can use any SQL library available for Python. For example : http://www.sqlalchemy.org/
You have just then to connect to your database, perform your request and use the datas at your will. I think you can't use Django without their model system, but nothing prevents you from using another library for this in parallel.

Is it a good practice to use pickled data instead of additional tables?

Many times while creating database structure, I get stuck at the question, what would be more effective, storing data in pickled format in a column in the same table or create additional table and then use JOIN.
Which path should be followed, any advice ?
For example:
There is a table of Customers, containing fields like Name, Address
Now for managing Orders (each customer can have many), you can either create an Order table or store the orders in a serialized format in a separate column in the Customers table only.
It's usually better to create seperate tables. If you go with pickling and later find you want to query the data in a different way, it could be difficult.
See Database normalization.
Usually it's best to keep your data normalized (i.e. create more tables). Storing data 'pickled' as you say, is acceptable, when you don't need to perform relational operations on them.
Mixing SQL databases and pickling seems to ask for trouble. I'd go with either sticking all data in the SQL databases or using only pickling, in the form of the ZODB, which is a Python only OO database that is pretty damn awesome.
Mixing makes case sometimes, but is usually just more trouble than it's worth.
I agree with Mchi, there is no problem storing "pickled" data if you don't need to search or do relational type operations.
Denormalisation is also an important tool that can scale up database performance when applied correctly.
It's probably a better idea to use JSON instead of pickles. It only uses a little more space, and makes it possible to use the database from languages other than Python
I agree with #Lennart Regebro. You should probably see whether you need a Relational DB or an OODB. If RDBMS is your choice, I would suggest you stick with more tables. IMHO, pickling may have issues with scalability. If thats what you want, you should look at ZODB. It is pretty good and supports caching etc for better performance

Is this a good approach to avoid using SQLAlchemy/SQLObject?

Rather than use an ORM, I am considering the following approach in Python and MySQL with no ORM (SQLObject/SQLAlchemy). I would like to get some feedback on whether this seems likely to have any negative long-term consequences since in the short-term view it seems fine from what I can tell.
Rather than translate a row from the database into an object:
each table is represented by a class
a row is retrieved as a dict
an object representing a cursor provides access to a table like so:
cursor.mytable.get_by_ids(low, high)
removing means setting the time_of_removal to the current time
So essentially this does away with the need for an ORM since each table has a class to represent it and within that class, a separate dict represents each row.
Type mapping is trivial because each dict (row) being a first class object in python/blub allows you to know the class of the object and, besides, the low-level database library in Python handles the conversion of types at the field level into their appropriate application-level types.
If you see any potential problems with going down this road, please let me know. Thanks.
That doesn't do away with the need for an ORM. That is an ORM. In which case, why reinvent the wheel?
Is there a compelling reason you're trying to avoid using an established ORM?
You will still be using SQLAlchemy. ResultProxy is actually a dictionary once you go for .fetchmany() or similar.
Use SQLAlchemy as a tool that makes managing connections easier, as well as executing statements. Documentation is pretty much separated in sections, so you will be reading just the part that you need.
web.py has in a decent db abstraction too (not an ORM).
Queries are written in SQL (not specific to any rdbms), but your code remains compatible with any of the supported dbs (sqlite, mysql, postresql, and others).
from http://webpy.org/cookbook/select:
myvar = dict(name="Bob")
results = db.select('mytable', myvar, where="name = $name")

Database change underneath SQLObject

I'm starting a web project that likely should be fine with SQLite. I have SQLObject on top of it, but thinking long term here -- if this project should require a more robust (e.g. able to handle high traffic), I will need to have a transition plan ready. My questions:
How easy is it to transition from one DB (SQLite) to another (MySQL or Firebird or PostGre) under SQLObject?
Does SQLObject provide any tools to make such a transition easier? Is it simply take the objects I've defined and call createTable?
What about having multiple SQLite databases instead? E.g. one per visitor group? Does SQLObject provide a mechanism for handling this scenario and if so, what is the mechanism to use?
Thanks,
Sean
3) Is quite an interesting question. In general, SQLite is pretty useless for web-based stuff. It scales fairly well for size, but scales terribly for concurrency, and so if you are planning to hit it with a few requests at the same time, you will be in trouble.
Now your idea in part 3) of the question is to use multiple SQLite databases (eg one per user group, or even one per user). Unfortunately, SQLite will give you no help in this department. But it is possible. The one project I know that has done this before is Divmod's Axiom. So I would certainly check that out.
Of course, it would probably be much easier to just use a good concurrent DB like the ones you mention (Firebird, PG, etc).
For completeness:
1 and 2) It should be straightforward without you actually writing much code. I find SQLObject a bit restrictive in this department, and would strongly recommend SQLAlchemy instead. This is far more flexible, and if I was starting a new project today, I would certainly use it over SQLObject. It won't be moving "Objects" anywhere. There is no magic involved here, it will be transferring rows in tables in a database. Which as mentioned you could do by hand, but this might save you some time.
Your success with createTable() will depend on your existing underlying table schema / data types. In other words, how well SQLite maps to the database you choose and how SQLObject decides to use your data types.
The safest option may be to create the new database by hand. Then you'll have to deal with data migration, which may be as easy as instantiating two SQLObject database connections over the same table definitions.
Why not just start with the more full-featured database?
I'm not sure I understand the question.
The SQLObject documentation lists six kinds of connections available. Further, the database connection (or scheme) is specified in a connection string. Changing database connections from SQLite to MySQL is trivial. Just change the connection string.
The documentation lists the different kinds of schemes that are supported.

Categories

Resources