SQLite database backup - python

To back up an SQLite database, I went through https://www.sqlite.org/backup.html and learned that there is a Python wrapper over these SQLite online backup APIs, so I also looked at https://github.com/husio/python-sqlite3-backup
Here are some doubts I have about sqlitebck (the Python package):
The code in its tests shows a db being copied from :memory: to a file and vice versa using sqlitebck.copy(:memory:, dbfile). I am confused about ":memory:" and its use.
Instead of copying from memory, can I copy one database file to another database file, e.g. sqlitebck.copy(dbfile1, dbfile2), so that dbfile2 becomes a backup of dbfile1?

':memory:' as a "filename" is how you tell SQLite to keep a small DB in memory rather than on disk. But yes, sqlitebck is fine for copying from file to file -- though the arguments it takes are sqlite3 connections, so you'd need to sqlite3.connect to each file first. (Usually you might as well just copy the file directly without involving SQLite at all -- as the SQLite page you link to implies, suggesting Unix cp or Windows copy; Python has its own standard library module for file copies, https://docs.python.org/2/library/shutil.html .)
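A minimal sketch of both approaches (the file names are hypothetical, and I'm assuming sqlitebck.copy takes a source and a destination connection, as its tests suggest):

import sqlite3
import shutil
import sqlitebck

# Option 1: copy through the online-backup wrapper
src = sqlite3.connect('dbfile1')
dst = sqlite3.connect('dbfile2')
sqlitebck.copy(src, dst)   # dbfile2 now holds a copy of dbfile1
src.close()
dst.close()

# Option 2: plain file copy (only safe while nothing is writing to dbfile1)
shutil.copyfile('dbfile1', 'dbfile2')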

Related

in-memory sqlite in production with python

I am creating a Python system that needs to handle many files. Each of the files has more than 10 thousand lines of text data.
Because a DB server (like MySQL) cannot be used in that environment, when a file is uploaded by a user I plan to save all of its data in an in-memory SQLite database so that I can use SQL to fetch specific data from it.
Then, when all operations by the program are finished, the processed data is saved to a file. This is the file users will receive from the system.
But some websites say SQLite shouldn't be used in production. In my case, though, I just hold the data temporarily in memory so I can use SQL on it. Is there any problem with using SQLite in production even in this scenario?
Edit:
The data in the in-memory DB doesn't need to be shared between processes. The program just creates tables, processes the data, then discards all data and tables after saving the processed data to a file. I just think that keeping everything in a list makes searching difficult and slow. So is using SQLite still a problem?
"SQLite shouldn't be used in production" is not a one-size-fits-all rule, it's more of a rule of thumb. There are certainly applications where using SQLite even in a production environment is reasonable.
However, your case doesn't seem to be one of them. While SQLite supports multi-threaded and multi-process environments, it locks all tables when it opens a write transaction. You need to ask yourself whether this is a problem for your particular case, but if you're uncertain, assume that it is.
You'd probably be fine with in-memory structures alone, unless there are details you haven't uncovered.
I'm not familiar with the specific context of your system, but if what you're looking for is a SQL database that is:
- light,
- accessed from a single process and a single thread, and
- easy to recover if the system crashes in the middle (either by restoring the last stable backup of the database or just recreating it from scratch),
then using SQLite in production is fine. OS X, for example, uses SQLite for a few purposes (e.g. /var/db/auth.db).
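As an aside, a minimal sketch of the throwaway in-memory workflow described in the question might look like this (the table name, the filter condition, and the output format are all hypothetical):

import sqlite3

def process_upload(lines, out_path):
    # Throwaway in-memory database, gone once the connection is closed
    con = sqlite3.connect(':memory:')
    con.execute('CREATE TABLE records (line_no INTEGER, body TEXT)')
    con.executemany('INSERT INTO records VALUES (?, ?)', enumerate(lines))

    # Use SQL to pull out just the rows of interest
    rows = con.execute(
        "SELECT line_no, body FROM records WHERE body LIKE '%keyword%' ORDER BY line_no"
    ).fetchall()

    # Write the processed data to the file the user will receive
    with open(out_path, 'w') as f:
        for line_no, body in rows:
            f.write('%d\t%s\n' % (line_no, body))

    con.close()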

Opening sqlite3 database with no write access

I'm trying to perform queries on an SQLite database from Python. Problem is, I do not have write access to the database file, and also not to the directory containing the database file.
When I connect to the database and perform a SELECT query, I get an "Unable to open database file" error. I tried following the advice in this answer, but it didn't work. I guess SQLite fails when trying to create the lock files.
When I have write access to the directory, but not to the sqlite file, I get another error - a locking error. This is because sqlite creates the shm and wal files with the same permissions as the db file, meaning I get shm and wal files I can't write to, resulting in a locking error.
Other than copying all files to a directory I do have full access to, is there another way around this?
The documentation says:
It is not possible to open read-only WAL databases. The opening process must have write privileges for "-shm" wal-index shared memory file associated with the database, if that file exists, or else write access on the directory containing the database file if the "-shm" file does not exist.
To allow read-only access to that database, some user with write permissions needs to change it to some other journal mode.
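For instance, a user who does have write access could switch the database out of WAL mode once (a minimal sketch; the path is hypothetical):

import sqlite3

# Run once as a user with write access to the file and its directory
con = sqlite3.connect('path/to/database.sqlite')
con.execute('PRAGMA journal_mode=DELETE')  # leave WAL so read-only opens can work
con.close()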
You can open a sqlite3 connection in read-only mode with the following syntax:
import sqlite3

con = sqlite3.connect('file:path/to/database.sqlite?mode=ro', uri=True)

Sqlite3 or QtSql

I need to make a small database for a desktop app built using PySide. I don't know whether the two (sqlite3 and QtSql) are similar or not, but I'm leaning towards sqlite3 because, well, it's Pythonic! I want to know whether I'll be missing out on anything, such as performance, features, etc. (Or is there a convention for choosing one depending on the project at hand?)
I know this question will get closed because it may not seem constructive enough, and I'm sorry for that.
QtSql isn't a database engine like SQLite is; rather, it is software for accessing databases through the Qt environment.
The Qt SQLite plugin makes it possible to access SQLite databases.
SQLite is an in-process database, which means that it is not necessary to have a database server. SQLite operates on a single file, which must be set as the database name when opening a connection. If the file does not exist, SQLite will try to create it. SQLite also supports in-memory databases, simply pass ":memory:" as the database name. - Source
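To make the difference concrete, here is a minimal sketch of opening the same SQLite file with each API (this assumes PySide2; the module path differs slightly for the original PySide, and the file and table names are hypothetical):

import sqlite3
from PySide2.QtSql import QSqlDatabase, QSqlQuery

# Standard-library route: sqlite3 talks to the engine directly
con = sqlite3.connect('app.db')
con.execute('CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)')
con.commit()
con.close()

# Qt route: QtSql reaches the same engine through the QSQLITE driver plugin
db = QSqlDatabase.addDatabase('QSQLITE')
db.setDatabaseName('app.db')
if db.open():
    query = QSqlQuery('SELECT id, body FROM notes')
    while query.next():
        print(query.value(0), query.value(1))
    db.close()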

Copy neo4j database from python

I need to copy an existing neo4j database in Python. I don't even need it for backup, just to play around with while keeping the original database untouched. However, there is nothing about copy/backup operations in the neo4j.py documentation (I am using the Python embedded binding).
Can I just copy the whole folder with the original neo4j database to a folder with a new name?
Or is there any special method available in neo4j.py?
Yes, you can copy the whole DB directory, as long as you have cleanly shut down the DB before taking the backup.
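A minimal sketch of such a copy (the directory paths are hypothetical, and the database must be shut down first):

import shutil

# Copy the whole database directory after the embedded DB has been cleanly shut down
shutil.copytree('data/graph.db', 'data/graph_copy.db')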

How to load existing db file to memory in Python sqlite3?

I have an existing sqlite3 db file on which I need to do some extensive calculations. Doing the calculations from the file is painfully slow, and since the file is not large (~10 MB), there should be no problem loading it into memory.
Is there a Pythonic way to load the existing file into memory in order to speed up the calculations?
Here is the snippet that I wrote for my Flask application:

import sqlite3
from io import StringIO

def init_sqlite_db(app):
    # Read database to tempfile
    con = sqlite3.connect(app.config['SQLITE_DATABASE'])
    tempfile = StringIO()
    for line in con.iterdump():
        tempfile.write('%s\n' % line)
    con.close()
    tempfile.seek(0)

    # Create a database in memory and import from tempfile
    app.sqlite = sqlite3.connect(":memory:")
    app.sqlite.cursor().executescript(tempfile.read())
    app.sqlite.commit()
    app.sqlite.row_factory = sqlite3.Row
What about sqlite3.Connection.backup(...)? "This method makes a backup of a SQLite database even while it's being accessed by other clients, or concurrently by the same connection." Availability: SQLite 3.6.11 or higher. New in Python version 3.7.
import sqlite3
source = sqlite3.connect('existing_db.db')
dest = sqlite3.connect(':memory:')
source.backup(dest)
sqlite3.Connection.iterdump "[r]eturns an iterator to dump the database in an SQL text format. Useful when saving an in-memory database for later restoration. This function provides the same capabilities as the .dump command in the sqlite3 shell."
Get such an iterator and dump the disk-based database into a memory-based one, and you're ready to compute. When the computation is done, just dump the other way around back to disk.
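A minimal sketch of the return trip, writing the in-memory database back out to a file with iterdump (the file name is hypothetical):

import sqlite3

mem = sqlite3.connect(':memory:')
# ... build tables and run the calculations in `mem` here ...

disk = sqlite3.connect('results.db')
disk.executescript('\n'.join(mem.iterdump()))  # replay the SQL dump into the file DB
disk.close()
mem.close()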
First you should try and find out what is causing the slowness you are observing. Are you writing to tables? Are your writes within large enough transactions so that you don't save needless temporary results to disk? Can you change writes to go to temporary tables (with pragma temp_store=memory)? Can you live with pragma synchronous=off?
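A minimal sketch of those pragmas plus a batched write, purely illustrative (the table and file names are hypothetical):

import sqlite3

con = sqlite3.connect('existing_db.db')
con.execute('PRAGMA temp_store=MEMORY')   # keep temporary tables and indices in RAM
con.execute('PRAGMA synchronous=OFF')     # skip fsync; faster, but risky on power loss
con.execute('CREATE TABLE IF NOT EXISTS results (k INTEGER, v INTEGER)')  # hypothetical table
with con:  # one transaction for the whole batch of writes
    con.executemany('INSERT INTO results VALUES (?, ?)',
                    ((i, i * i) for i in range(100000)))
con.close()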
I don't think this functionality is exposed in the Python module, but sqlite has a backup API that sounds like exactly what you are asking for: a way to copy from one database to another (either one of which may be an in-memory database) that works pretty much automatically without any user-visible enumeration of tables. (Maybe APSW exposes this?)
Another option is to create a ram disk (if you have sufficient control of the environment) and copy the file there.
If you must use a Python wrapper, then there is no better solution than the two-pass read-and-write approach.
But beginning with version 3.7.17, SQLite has the option of accessing disk content directly using memory-mapped I/O (sqlite mmap).
If you want to use mmap, you have to use the C interface, since no wrapper provides it.
There is also a hardware solution, the memory disk: then you have the convenience of file I/O and the speed of memory.
This has already been answered before, including code examples at In python, how can I load a sqlite db completely to memory before connecting to it?
You do not mention operating system, but one gotcha of Windows XP is that it defaults to a 10MB file cache, no matter how much memory you have. (This made sense in the days when systems came with 64MB etc). This message has several links:
http://marc.info/?l=sqlite-users&m=116743785223905&w=2
Here is a relatively simple way to read a SQLite db into memory. Depending upon your preference for manipulating data, you either use a Pandas dataframe or write the table to an in-memory sqlite3 database. Similarly, after manipulating your data, you use the same df.to_sql approach to store your results back into a db table.
import sqlite3 as lite
from pandas.io.sql import read_sql
from sqlalchemy import create_engine

# In-memory SQLite database reached through SQLAlchemy
engine = create_engine('sqlite://')
c = engine.connect()
conmem = c.connection

# Connection to <ait.sqlite> residing on disk
con = lite.connect('ait.sqlite', isolation_level=None)

# Read the SQLite table into a pandas dataframe
sqlx = 'SELECT * FROM Table'
df = read_sql(sqlx, con, coerce_float=True, params=None)

# Write the dataframe into the in-memory database
# (the flavor argument was removed in newer pandas versions)
df.to_sql(name='Table', con=conmem, if_exists='replace')
With the solution of Cenk Alti, I always got a MemoryError with Python 3.7 when the process reached 500 MB. Only with the backup functionality of sqlite3 (mentioned by thinwybk) was I able to load and save bigger SQLite databases. You can also do the same with just 3 lines of code, both ways.
The answers of @thinwybk and Crooner are both excellent.
When you have multiple connections to :memory: sqlite databases, for instance when using SQLAlchemy together with the source.backup(dest) function, you may end up not placing the backup into the "right" in-memory DB.
This can be fixed using a proper connection string: https://stackoverflow.com/a/65429612/1617295. It does not involve any hack nor the use of undocumented features.
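For reference, the usual way to make several connections point at the same in-memory database is a named shared-cache URI, along these lines (the database name "memdb1" is arbitrary):

import sqlite3

# Both connections refer to the same named in-memory database
a = sqlite3.connect('file:memdb1?mode=memory&cache=shared', uri=True)
b = sqlite3.connect('file:memdb1?mode=memory&cache=shared', uri=True)

a.execute('CREATE TABLE t (x INTEGER)')
a.execute('INSERT INTO t VALUES (42)')
a.commit()
print(b.execute('SELECT x FROM t').fetchall())  # [(42,)]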
sqlite supports in-memory databases.
In python, you would use a :memory: database name for that.
Perhaps you could open two databases (one from the file, an empty one in-memory), migrate everything from the file database into memory, then use the in-memory database further to do calculations.
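A minimal sketch of that round trip using the backup API available since Python 3.7 (the file names are hypothetical):

import sqlite3

disk = sqlite3.connect('existing_db.db')
mem = sqlite3.connect(':memory:')
disk.backup(mem)   # pull the whole file database into memory
disk.close()

# ... run the heavy calculations against `mem` here ...

out = sqlite3.connect('results.db')
mem.backup(out)    # persist the (possibly modified) data back to disk
out.close()
mem.close()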
