I have a large sql dump file ... with multiple CREATE TABLE and INSERT INTO statements. Is there any way to load these all into a SQLAlchemy sqlite database at once? I plan to use the introspected ORM from sqlsoup after I've created the tables. However, when I use the engine.execute() method it complains: sqlite3.Warning: You can only execute one statement at a time.
Is there a way to work around this issue? Perhaps by splitting the file with a regexp or some kind of parser, but I don't know enough SQL to cover all of the cases for the regexp.
Any help would be greatly appreciated.
Will
EDIT:
Since this seems important ... The dump file was created with a MySQL database and so it has quite a few commands/syntax that sqlite3 does not understand correctly.
"or some kind of parser"
I've found MySQL to be a great parser for MySQL dump files :)
You said it yourself: "so it has quite a few commands/syntax that sqlite3 does not understand correctly." Clearly then, SQLite is not the tool for this task.
As for your particular error: without context (i.e. a traceback) there's nothing I can say about it. Martelli or Skeet could probably reach across time and space and read your interpreter's mind, but me, not so much.
The SQL recognized by MySQL and the SQL in SQLite are quite different. I suggest dumping the data of each table individually, then loading the data into equivalent tables in SQLite.
Create the tables in SQLite manually, using a subset of the "CREATE TABLE" commands given in your raw-dump file.
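On the specific "one statement at a time" error: a minimal sketch, assuming the dump has already been reduced to syntax SQLite accepts. The standard-library sqlite3 module has executescript(), which takes several semicolon-separated statements in one call. The table and rows below are made up for illustration; a real dump would be read from the file instead.

```python
import sqlite3

# Hypothetical, already SQLite-compatible dump text (not from the real dump).
dump_sql = """
CREATE TABLE recipe (name TEXT PRIMARY KEY, culture TEXT);
INSERT INTO recipe VALUES ('fudge', 'america');
INSERT INTO recipe VALUES ('scone', 'britain');
"""

conn = sqlite3.connect(":memory:")

# execute() accepts exactly one statement; executescript() runs them all.
conn.executescript(dump_sql)

rows = conn.execute("SELECT name, culture FROM recipe ORDER BY name").fetchall()
print(rows)
```

This does nothing about the MySQL-only syntax in the dump, so the conversion step is still needed first.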
I have been using cx_Oracle to perform SQL queries on an Oracle database in Python. So far I have been pasting these queries into strings and then running them using the cursor.execute() function that comes with cx_Oracle:
#simple example
query = """SELECT *
FROM my_table"""
cursor.execute(query)
However, my select queries have gotten quite complex, and the code is starting to look a bit messy. I was wondering if there were any way to simply save the SQL code into a .sql file and for Python or cx_Oracle to call that file? I thought something like that might be simple to find using Google, but my searches are oddly coming up dry.
Well, you can certainly save SQL code to a file and load it:
query = open('foo.sql', 'r').read()
cursor.execute(query)
I can't find any reference to saved queries in cx_Oracle, so that may be your best bet.
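A slightly fuller sketch of the same idea, with the file closed properly after reading. It relies only on DB-API cursor calls, so sqlite3 stands in for cx_Oracle here purely so the example runs without an Oracle server; run_sql_file, the table contents and the temporary path are hypothetical names for illustration.

```python
import os
import sqlite3
import tempfile


def run_sql_file(cursor, path):
    """Read one SQL statement from `path`, execute it, and return all rows."""
    with open(path, "r") as handle:
        query = handle.read()
    cursor.execute(query)
    return cursor.fetchall()


# Write a sample query file (a stand-in for a hand-maintained foo.sql).
workdir = tempfile.mkdtemp()
sql_path = os.path.join(workdir, "foo.sql")
with open(sql_path, "w") as handle:
    handle.write("SELECT *\nFROM my_table\nORDER BY id")

# sqlite3 is only a stand-in for the real cx_Oracle connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, label TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", [(1, "a"), (2, "b")])

result = run_sql_file(conn.cursor(), sql_path)
print(result)
```

As far as I know, cx_Oracle rejects a statement that ends in a semicolon, so the .sql file should hold the bare query.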
I am creating a simple python program that needs to search a somewhat large database ( ~40 tables, 6 Million or so rows all together ).
Currently, I use MySQLdb to query my local MySQL database, then I have some other python functions that work with the data and return some statistics and other stuff. I would like to share this with others who do not want to construct their own database. At this point the database is used for queries only.
How best can I share the database and python program as a "package"? Do I have to give up on the SQL method and switch to some sort of text file database or is there an easier way... sqlite maybe?
If the answer is sqlite, how do I go about exporting my current SQL database to the sqlite database? Are there any gotchas I should know about?
Currently I use simple SELECT queries with a few WHERE clauses to locate the data I need. I am afraid that if I switched to a text-based database I would end up having to write a large amount of code to make these queries.
Thank you in advance for any suggestions.
EDIT
So I wrote my little python program with an sqlite3 database and it works perfectly.
I ended up using a shell script called mysql2sqlite.sh found here to convert my MySQL database to sqlite. It worked flawlessly.
I only had to change 2 lines of python code. Awesome.
My little program runs in osx, windows and linux (ubuntu and redhat) without any changes or hassle. Thanks for the advice!
Converting your database could be as easy as an sql-dump and then an import, depending on the complexity of your db. See this post for strategies and alternatives.
I have a CSV file and want to generate dumps of the data for sqlite, mysql, postgres, oracle, and mssql.
Is there a common API (ideally Python based) to do this?
I could use an ORM to insert the data into each database and then export dumps, however that would require installing each database. It also seems a waste of resources - these CSV files are BIG.
I am wary of trying to craft the SQL myself because of the variations with each database. Ideally someone has already done this hard work, but I haven't found it yet.
SQLAlchemy is a database library that (as well as ORM functionality) supports SQL generation in the dialects of all the different databases you mention (and more).
In normal use, you could create a SQL expression / instruction (using a schema.Table object), create a database engine, and then bind the instruction to the engine, to generate the SQL.
However, the engine is not strictly necessary; the dialects each have a compiler that can generate the SQL without a connection; the only caveat being that you need to stop it from generating bind parameters as it does by default:
from sqlalchemy.sql import expression, compiler
from sqlalchemy import schema, types
import csv
# example for mssql
from sqlalchemy.dialects.mssql import base
dialect = base.dialect()
compiler_cls = dialect.statement_compiler
class NonBindingSQLCompiler(compiler_cls):
    def _create_crud_bind_param(self, col, value, required=False):
        # Don't do what we're called; return a literal value rather than binding
        return self.render_literal_value(value, col.type)
recipe_table = schema.Table("recipe", schema.MetaData(), schema.Column("name", types.String(50), primary_key=True), schema.Column("culture", types.String(50)))
for row in [{"name": "fudge", "culture": "america"}]: # csv.DictReader(open("x.csv", "r")):
    insert = expression.insert(recipe_table, row, inline=True)
    c = NonBindingSQLCompiler(dialect, insert)
    c.compile()
    sql = str(c)
    print(sql)
The above example actually works; it assumes you know the target database table schema; it should be easily adaptable to import from a CSV and generate for multiple target database dialects.
I am no database wizard, but AFAIK in Python there's not a common API that would do out-of-the-box what you ask for. There is PEP 249, which defines an API that should be used by modules accessing databases and that AFAIK is used at least by the MySQL and PostgreSQL python modules (here and here), and that perhaps could be a starting point.
The road I would attempt to follow myself - however - would be another one:
Import the CSV into MySQL (this is just because MySQL is the one I know best and there are tons of material on the net, as for example this very easy recipe, but you could do the same procedure starting from another database).
Generate the MySQL dump.
Process the MySQL dump file in order to modify it to meet SQLite (and others) syntax.
The scripts for processing the dump file could be very compact, although they might be somewhat tricky if you use regex for parsing the lines. Here's an example script MySQL → SQLite that I simply pasted from this page:
#!/bin/sh
mysqldump --compact --compatible=ansi --default-character-set=binary mydbname |
grep -v ' KEY "' |
grep -v ' UNIQUE KEY "' |
perl -e 'local $/;$_=<>;s/,\n\)/\n\)/gs;print "begin;\n";print;print "commit;\n"' |
perl -pe '
if (/^(INSERT.+?)\(/) {
$a=$1;
s/\\'\''/'\'\''/g;
s/\\n/\n/g;
s/\),\(/\);\n$a\(/g;
}
' |
sqlite3 output.db
You could write your script in python (in which case you should have a look at re.compile for performance).
The rationale behind my choice would be:
I get the heavy-lifting [importing and therefore data consistency checks + generating starting SQL file] done for me by mysql
I only have to have one database installed.
I have full control on what is happening and the possibility to fine-tune the process.
I can structure my script in such a way that it will be very easy to extend it for other databases (basically I would structure it like a parser that recognises individual fields + a set of grammars - one for each database - that I can select via command-line option)
There is much more documentation on the differences between SQL flavours than on single DB import/export libraries.
EDIT: A template-based approach
If for any reason you don't feel confident enough to write the SQL yourself, you could use a sort of template-based script. Here's how I would do it:
Import and generate a dump of the table in all the 4 DB you are planning to use.
For each DB save the initial part of the dump (with the schema declaration and all the rest) and a single insert instruction.
Write a python script that - for each DB export - will output the "header" of the dump plus the same "saved line" into which you will programmatically replace the values for each line in your CSV file.
The obvious drawback of this approach is that your "template" will only work for one table. The strongest point of it is that writing such script would be extremely easy and quick.
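A rough sketch of what such a template script could look like. The TEMPLATES dictionary, the table layout and the sample CSV are all invented for illustration; in practice the "header" and "insert" strings would be copied from each database's real dump. Every value is quoted as a string by doubling single quotes, which is the standard SQL form but does not cover dialect quirks (MySQL, for one, also treats backslash as an escape by default).

```python
import csv
import io

# Hypothetical templates: the saved dump header plus one saved INSERT line.
TEMPLATES = {
    "sqlite": {
        "header": 'CREATE TABLE "recipe" ("name" TEXT, "culture" TEXT);',
        "insert": 'INSERT INTO "recipe" VALUES ({values});',
    },
    "mysql": {
        "header": "CREATE TABLE `recipe` (`name` varchar(50), `culture` varchar(50));",
        "insert": "INSERT INTO `recipe` VALUES ({values});",
    },
}


def quote(value):
    """Quote a value as a SQL string literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def render_dump(db_name, csv_file):
    """Emit the header, then one INSERT per CSV row, for the chosen template."""
    template = TEMPLATES[db_name]
    lines = [template["header"]]
    for row in csv.reader(csv_file):
        values = ", ".join(quote(field) for field in row)
        lines.append(template["insert"].format(values=values))
    return "\n".join(lines)


sample_csv = "fudge,america\ncreme brulee,france\n"
sqlite_dump = render_dump("sqlite", io.StringIO(sample_csv))
print(sqlite_dump)
```

Because the rows are streamed one at a time, a big CSV never has to fit in memory; writing each line straight to the output file instead of joining them would keep it that way.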
HTH at least a bit!
You could do this - Create SQL tables from CSV files
or Generate Insert Statements from CSV file
or try this Generate .sql from .csv python
Of course you might need to tweak the scripts mentioned to suit your needs.
I am facing an atypical conversion problem. About a decade ago I coded up a large site in ASP. Over the years this turned into ASP.NET but kept the same database.
I've just re-done the site in Django and I've copied all the core data but before I cancel my account with the host, I need to make sure I've got a long-term backup of the data so if it turns out I'm missing something, I can copy it from a local copy.
To complicate matters, I no longer have Windows. I moved to Ubuntu on all my machines some time back. I could ask the host to send me a backup but having no access to a machine with MSSQL, I wouldn't be able to use that if I needed to.
So I'm looking for something that does:
db = {}
for table in database:
    db[table.name] = [row for row in table]
And then I could serialize db off somewhere for later consumption... But how do I do the table iteration? Is there an easier way to do all of this? Can MSSQL do a cross-platform SQLDump (inc data)?
For previous MSSQL I've used pymssql but I don't know how to iterate the tables and copy rows (ideally with column headers so I can tell what the data is). I'm not looking for much code but I need a poke in the right direction.
Have a look at the sysobjects and syscolumns tables. Also try:
SELECT * FROM sysobjects WHERE name LIKE 'sys%'
to find any other metatables of interest. See here for more info on these tables and the newer SQL2005 counterparts.
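For the table iteration itself, here is a minimal sketch that uses only DB-API (PEP 249) cursor calls, with column names taken from cursor.description. sqlite3 stands in for MSSQL so the example can run anywhere; dump_database and the sample table are hypothetical, and the table-listing query is a parameter because it differs per database. On SQL Server, a query against sysobjects or INFORMATION_SCHEMA.TABLES would take the place of the sqlite_master query; I have not tested that part.

```python
import sqlite3


def dump_database(conn, list_tables_sql):
    """Return {table_name: [row_dict, ...]} using only DB-API calls."""
    cursor = conn.cursor()
    cursor.execute(list_tables_sql)
    table_names = [row[0] for row in cursor.fetchall()]

    db = {}
    for name in table_names:
        # Table names cannot be bound as parameters; these come from the
        # database's own catalogue, not from user input.
        cursor.execute('SELECT * FROM "%s"' % name)
        columns = [col[0] for col in cursor.description]
        db[name] = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return db


# Stand-in database with one made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'will')")

db = dump_database(conn, "SELECT name FROM sqlite_master WHERE type = 'table'")
print(db)
```

The resulting dictionary keeps the column headers next to every value, so it can be serialized (with json or pickle, say) and still be readable later. For very large tables, fetching in batches with fetchmany() would be kinder to memory.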
I've liked the ADOdb python module when I've needed to connect to sql server from python. Here is a link to a simple tutorial/example: http://phplens.com/lens/adodb/adodb-py-docs.htm#tutorial
I know you said JSON, but it's very simple to generate a SQL script to do an entire dump in XML:
SELECT REPLACE(REPLACE('SELECT * FROM {TABLE_SCHEMA}.{TABLE_NAME} FOR XML RAW', '{TABLE_SCHEMA}',
QUOTENAME(TABLE_SCHEMA)), '{TABLE_NAME}', QUOTENAME(TABLE_NAME))
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
ORDER BY TABLE_SCHEMA
,TABLE_NAME
As an aside to your coding approach - I'd say :
set up a virtual machine with an eval on windows
put sql server eval on it
restore your data
check it manually or automatically using the excellent db scripting tools from red-gate to script the data and the schema
if fine then you have (a) a good backup and (b) a scripted output.
I have created a Python module that creates and populates several SQLite tables. Now, I want to use it in a program but I don't really know how to call it properly. All the tutorials I've found are essentially "inline", i.e. they walk through using SQLite in a linear fashion rather than how to actually use it in production.
What I'm trying to do is have a method check to see if the database is already created. If so, then I can use it. If not, an exception is raised and the program will create the database. (Or use if/else statements, whichever is better).
I created a test script to see if my logic is correct but it's not working. When I create the try statement, it just creates a new database rather than checking if one already exists. The next time I run the script, I get an error that the table already exists, even if I tried catching the exception. (I haven't used try/except before but figured this is a good time to learn).
Are there any good tutorials for using SQLite operationally or any suggestions on how to code this? I've looked through the pysqlite tutorial and others I found but they don't address this.
Don't make this more complex than it needs to be. The big, independent databases have complex setup and configuration requirements. SQLite is just a file you access with SQL; it's much simpler.
Do the following.
Add a table to your database for "Components" or "Versions" or "Configuration" or "Release" or something administrative like that.
CREATE TABLE REVISION(
RELEASE_NUMBER CHAR(20)
);
In your application, connect to your database normally.
Execute a simple query against the revision table. Here's what can happen.
The query fails to execute: your database doesn't exist, so execute a series of CREATE statements to build it.
The query succeeds but returns no rows or the release number is lower than expected: your database exists, but is out of date. You need to migrate from that release to the current release. Hopefully, you have a sequence of DROP, CREATE and ALTER statements to do this.
The query succeeds, and the release number is the expected value. Do nothing more, your database is configured correctly.
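The three outcomes above can be sketched roughly like this. EXPECTED_RELEASE, check_database and create_database are hypothetical names, and the comparison is a plain inequality rather than a true "lower than" test, since ordering release strings needs a versioning scheme of your own.

```python
import sqlite3

EXPECTED_RELEASE = "2"  # hypothetical current release of the schema


def check_database(conn):
    """Classify the database as 'missing', 'outdated' or 'current'."""
    try:
        row = conn.execute("SELECT RELEASE_NUMBER FROM REVISION").fetchone()
    except sqlite3.OperationalError:
        # No REVISION table: nothing has been created yet.
        return "missing"
    if row is None or row[0] != EXPECTED_RELEASE:
        return "outdated"
    return "current"


def create_database(conn):
    """Build the schema from scratch and record the release number."""
    conn.execute("CREATE TABLE REVISION (RELEASE_NUMBER CHAR(20))")
    conn.execute("INSERT INTO REVISION VALUES (?)", (EXPECTED_RELEASE,))
    conn.commit()


conn = sqlite3.connect(":memory:")
state_before = check_database(conn)
if state_before == "missing":
    create_database(conn)
state_after = check_database(conn)
print(state_before, state_after)
```

The "outdated" branch is where the migration statements would run; it is left out here because it depends entirely on your schema history.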
AFAIK an SQLITE database is just a file.
To check if the database exists, check for file existence.
When you open a SQLITE database it will automatically create one if the file that backs it does not exist.
If you try and open a file as a sqlite3 database that is NOT a database, you will get this:
"sqlite3.DatabaseError: file is encrypted or is not a database"
So check that the file exists, and also catch the exception in case the file is not a sqlite3 database.
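A small sketch of that check, assuming only the standard library. open_existing_database and the temporary file names are made up for the example. Note that sqlite3.connect() does not read the file by itself, so the sketch runs a trivial query to force the error to surface.

```python
import os
import sqlite3
import tempfile


def open_existing_database(path):
    """Return a connection, or None if `path` is absent or not a SQLite file."""
    if not os.path.exists(path):
        return None
    conn = sqlite3.connect(path)
    try:
        # connect() is lazy; the file is only read once a statement runs.
        conn.execute("SELECT name FROM sqlite_master LIMIT 1")
    except sqlite3.DatabaseError:
        conn.close()
        return None
    return conn


workdir = tempfile.mkdtemp()
missing_path = os.path.join(workdir, "missing.db")
bogus_path = os.path.join(workdir, "bogus.db")

# A text file that is definitely not a SQLite database.
with open(bogus_path, "w") as handle:
    handle.write("this is not a database\n" * 100)

missing_result = open_existing_database(missing_path)
bogus_result = open_existing_database(bogus_path)
print(missing_result, bogus_result)
```

The exact error text varies between SQLite versions, so it is safer to catch the exception class than to match on the message.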
SQLite automatically creates the database file the first time you try to use it. The SQL statements for creating tables can use IF NOT EXISTS to make the commands only take effect if the table has not been created. This way you don't need to check for the database's existence beforehand: SQLite can take care of that for you.
The main thing I would still be worried about is that executing CREATE TABLE IF NOT EXISTS for every web transaction (say) would be inefficient; you can avoid that by having the program keep an (in-memory) variable saying whether it has already run the creation script, so it runs the CREATE TABLE script once per run. This would still allow you to delete the database and start over during debugging.
As @diciu pointed out, the database file will be created by sqlite3.connect.
If you want to take a special action when the file is not there, you'll have to explicitly check for existence:
import os
import sqlite3
if not os.path.exists(mydb_path):
    #create new DB, create table stocks
    con = sqlite3.connect(mydb_path)
    con.execute('''create table stocks
    (date text, trans text, symbol text, qty real, price real)''')
else:
    #use existing DB
    con = sqlite3.connect(mydb_path)
...
Sqlite doesn't throw an exception if you create a new database with the same name, it will just connect to it. Since sqlite is a file based database, I suggest you just check for the existence of the file.
About your second problem, to check if a table has already been created, just catch the exception. An exception "sqlite3.OperationalError: table TEST already exists" is thrown if the table already exists.
import sqlite3
import os
database_name = "newdb.db"
# The file is checked before connecting, because connect() creates it.
if os.path.isfile(database_name):
    print("the database already exists")
db_connection = sqlite3.connect(database_name)
db_cursor = db_connection.cursor()
try:
    db_cursor.execute('CREATE TABLE TEST (a INTEGER);')
except sqlite3.OperationalError as msg:
    # Raised when the table already exists
    print(msg)
Writing raw SQL has been painful in every language I've picked up. SQLAlchemy has proved the easiest option to use, because querying and committing with it are clean and trouble-free.
Here are some basic steps for actually using SQLAlchemy in your app; better details can be found in the documentation.
provide table definitions and create ORM-mappings
load database
ask it to create tables from the definitions (won't do so if they exist)
create session maker (optional)
create session
After creating a session, you can commit and query from the database.
See this solution at SourceForge which covers your question in a tutorial manner, with instructive source code :
y_serial.py module :: warehouse Python objects with SQLite
"Serialization + persistance :: in a few lines of code, compress and annotate Python objects into SQLite; then later retrieve them chronologically by keywords without any SQL. Most useful "standard" module for a database to store schema-less data."
http://yserial.sourceforge.net
Yes, I was nuking out the problem. All I needed to do was check for the file and catch the IOError if it didn't exist.
Thanks for all the other answers. They may come in handy in the future.