Store a set in SQLite - python

I am using Python and I would like to have a list of IDs stored on disk while preserving some of the functionality of a set (that is, efficiently checking whether an ID is contained). To this end, I think using the SQLite library is a wise decision (at least that is my impression after googling and searching Stack Overflow a bit). However, I am a beginner in the SQL world and could not find any post explaining what I am looking for.
How can I store IDs (strings) in SQLite and later check if a specific ID appears or not in the database?
import sqlite3
id1 = 'abc'
id2 = 'def'
# Initialization of the database
define_database()
# Update the database by inserting a new ID
insert_in_database(id1)
insert_in_database(id2)
# Check if the specified ID is contained in the database (returns a Boolean)
check_if_exists_in_database(id1)
PS: I am aware of the sqlite3 library.
Thanks!

Just use a table with a single column. This column must be indexed (explicitly, or by making it the primary key) for lookups over large data to be efficient:
db = sqlite3.connect('...filename...')

def define_database():
    db.execute('CREATE TABLE IF NOT EXISTS MyStuff(id PRIMARY KEY)')
(Use a WITHOUT ROWID table if your Python version is recent enough to have a modern version of the SQLite library.)
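For instance, a minimal sketch of that variant (the TEXT column type is an assumption based on the string IDs in the question; WITHOUT ROWID requires SQLite 3.8.2 or later):
def define_database():
    # The table becomes a clustered index on id, so lookups go straight to the key.
    db.execute('CREATE TABLE IF NOT EXISTS MyStuff(id TEXT PRIMARY KEY) WITHOUT ROWID')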
Inserting is done with standard SQL:
def insert_in_database(value):
    db.execute('INSERT INTO MyStuff(id) VALUES(?)', [value])
To check whether a value exists, just try to read its row:
def check_if_exists_in_database(value):
    for row in db.execute('SELECT 1 FROM MyStuff WHERE id = ?', [value]):
        return True   # a row came back, so the ID exists
    return False      # no rows matched
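Putting it together, a quick usage sketch matching the question's outline; note that inserting the same ID twice raises sqlite3.IntegrityError because id is the primary key, so use INSERT OR IGNORE if you want re-adding an existing ID to be a silent no-op:
define_database()
insert_in_database(id1)
insert_in_database(id2)
print(check_if_exists_in_database(id1))    # True
print(check_if_exists_in_database('xyz'))  # False
db.commit()  # persist the inserts to disk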

Related

200GB Sqlite database file search and retrieve

I have a 250GB sqlite database file on an SSD drive and need to search through this file for a specific value in a table.
I wrote a script to perform the lookup in Python; here is an SQL statement similar to the one that I wrote:
SELECT table FROM database WHERE table like X'003485FAd480'.
I am looking to compare hex values stored in a table against a given hex value. I am using the Anaconda command prompt and am not sure if this is the best route.
My question: are there any recommendations or tools that could help speed up the lookup?
Thanks!
LIKE converts both operands into strings, so it might not work correctly if a value contains zero bytes or bytes that are not valid in the UTF-8 encoding.
To compare for equality, use =:
SELECT ... FROM MyTable WHERE MyColumn = x'003485FAD480';
This search can be sped up with an index on the lookup column; if you do not already have a primary key or unique constraint on this column, you can create an index manually:
CREATE INDEX MyLittleIndex ON MyTable(MyColumn);
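A minimal sqlite3 sketch of that indexed equality lookup (the filename and the MyTable/MyColumn names are placeholders standing in for the real schema):
import sqlite3

con = sqlite3.connect('huge.db')  # hypothetical filename
con.execute('CREATE INDEX IF NOT EXISTS MyLittleIndex ON MyTable(MyColumn)')

needle = bytes.fromhex('003485FAD480')  # the hex literal as a BLOB
row = con.execute('SELECT 1 FROM MyTable WHERE MyColumn = ?', (needle,)).fetchone()
print('found' if row else 'not found')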
I don't know if this is what you're looking for; you mentioned using Python. If you're searching for different values that are in Python, have you thought about writing two functions, one to search the database and one to compare those results and do something with them?
import pyodbc

def queryFunction():
    cnxn = pyodbc.connect('DRIVER={SQLite3 ODBC Driver};SERVER=localhost;DATABASE=test.db;Trusted_connection=yes')  # for production use only
    cursor = cnxn.cursor()
    cursor.execute("SELECT table FROM database")
    for row in cursor.fetchall():
        yield str(row.table)

def compareFunction(row):
    search = '003485FAd480'
    if row == search:
        print('Yes')
    else:
        print('No')

Effective batch "update-or-insert" in SqlAlchemy

There exists a table Users, and in my code I have a big list of User objects. To insert them I can use:
session.add_all(user_list)
session.commit()
The problem is that there can be several duplicates which I want to update, but the database won't allow duplicate entries to be inserted. Of course, I can iterate over user_list, try to insert each user into the database, and if it fails, update it:
for u in users:
    q = session.query(T).filter(T.fullname==u.fullname).first()
    if q:
        session.query(T).filter_by(index=q.index).update(
            {column: getattr(u, column) for column in Users.__table__.columns.keys() if column != 'id'})
        session.commit()
    else:
        session.add(u)
        session.commit()
but I find this solution quite inefficient: first, I am making a separate query to retrieve each object q, and instead of batch-inserting the new items I insert them one by one. I wonder if there exists a better solution for this task.
UPD: a better version:
for u in users:
    q = session.query(T).filter(Users.fullname==u.fullname).first()
    if q:
        for column in Users.__table__.columns.keys():
            if not column == 'index':
                setattr(q, column, getattr(u, column))
        session.add(q)
    else:
        session.add(u)
session.commit()
A better solution would be to use the bulk MySQL construct
INSERT ... ON DUPLICATE KEY UPDATE ...
(I assume you're using MySQL because your post is tagged with 'mysql'). This way you're both inserting new entries and updating existing ones in one statement/transaction; see http://dev.mysql.com/doc/refman/5.6/en/insert-on-duplicate.html
It's not ideal if you have multiple unique indexes and, depending on your schema, you may have to fill in all NOT NULL values (hence issuing one bulk SELECT before calling it), but it's definitely the most efficient option and we use it a lot. The bulk version will look something like this (let's assume name is a unique key):
INSERT INTO User (name, phone, ...) VALUES
    ('ksmith', '111-11-11', ...),
    ('jford', '222-22-22', ...),
    ...
ON DUPLICATE KEY UPDATE
    phone = VALUES(phone),
    ... ;
Unfortunately, INSERT ... ON DUPLICATE KEY UPDATE ... is not supported natively by SQLAlchemy, so you'll have to implement a little helper function that builds the query for you.
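A minimal sketch of such a helper, with hypothetical names; it assumes every row supplies the same columns in the same order and executes through the session's raw DB-API cursor:
def upsert_all(session, table, columns, rows):
    # Build one INSERT ... ON DUPLICATE KEY UPDATE statement covering every row.
    row_tpl = '(' + ', '.join(['%s'] * len(columns)) + ')'
    values = ', '.join([row_tpl] * len(rows))
    updates = ', '.join('{0} = VALUES({0})'.format(c) for c in columns)
    sql = 'INSERT INTO {0} ({1}) VALUES {2} ON DUPLICATE KEY UPDATE {3}'.format(
        table, ', '.join(columns), values, updates)
    params = [v for row in rows for v in row]
    cursor = session.connection().connection.cursor()  # raw DB-API cursor
    cursor.execute(sql, params)

# e.g. upsert_all(session, 'User', ['name', 'phone'],
#                 [('ksmith', '111-11-11'), ('jford', '222-22-22')])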

SQLite - Check table format on CREATE and drop if needed

Say I need a table that has to have two columns (A TEXT, B TEXT).
Every time before I run a program, I want to check if the table exists, and create it if it doesn't. Now say that a table with that name already exists, but has only one column (A TEXT), or maybe (A INT, B INT).
So in general, different columns.
How do I check that when running the CREATE query? If there's a conflict, back the table up somewhere, drop it, and create a new, correct table; if there's no conflict, don't do anything.
I am working in Python, using sqlite3 by the way. The database is stored locally for now and the program is distributed to multiple people; that's why I need to check the database.
Currently I have
import sqlite3

con = sqlite3.connect(path)
with con:
    cur = con.cursor()
    # "table" is a reserved word in SQLite, so the name must be quoted
    cur.execute('CREATE TABLE IF NOT EXISTS "table" (A TEXT, B TEXT);')
You can use the pragma table_info in order to get information about the table, and use the result to check your columns:
def validate(connection):
    cursor = connection.cursor()
    cursor.execute('PRAGMA table_info("table")')
    columns = cursor.fetchall()
    cursor.close()
    return (len(columns) == 2
            and columns[0][1:3] == ('A', 'TEXT')
            and columns[1][1:3] == ('B', 'TEXT'))
So if validate returns False, you can rename the table and create a new one.
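A hedged sketch of that flow, using a hypothetical table name MyTable (the question's literal name table would need quoting, since it is a reserved word) and assuming validate above is pointed at the same table:
def ensure_schema(connection):
    cursor = connection.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table' AND name='MyTable'")
    exists = cursor.fetchone() is not None
    if exists and not validate(connection):
        # Back up the incompatible table under another name, then recreate it.
        connection.execute('ALTER TABLE MyTable RENAME TO MyTable_backup')
    connection.execute('CREATE TABLE IF NOT EXISTS MyTable (A TEXT, B TEXT)')
    connection.commit()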

creating blank field and receiving the INTEGER PRIMARY KEY with sqlite, python

I am using sqlite with python. When I insert into table A I need to feed it an ID from table B. So what I wanted to do is insert default data into B, grab the ID (which is auto-increment) and use it in table A. What's the best way to receive the key from the table I just inserted into?
As Christian said, sqlite3_last_insert_rowid() is what you want... but that's the C level API, and you're using the Python DB-API bindings for SQLite.
It looks like the cursor method lastrowid will do what you want (search for 'lastrowid' in the documentation for more information). Insert your row with cursor.execute( ... ), then do something like lastid = cursor.lastrowid to check the last ID inserted.
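A minimal self-contained sketch (the tables and columns are made up for illustration):
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE B (id INTEGER PRIMARY KEY, data TEXT)')
con.execute('CREATE TABLE A (id INTEGER PRIMARY KEY, b_id INTEGER REFERENCES B(id))')

cur = con.cursor()
cur.execute("INSERT INTO B (data) VALUES (?)", ('default',))
b_id = cur.lastrowid  # the auto-incremented key of the row just inserted into B
cur.execute("INSERT INTO A (b_id) VALUES (?)", (b_id,))
con.commit()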
That you say you need "an" ID worries me, though... doesn't it matter which ID you have? Unless you are using the data just inserted into B for something (in which case you need that specific row's ID), your database structure is seriously screwed up if any old row ID for table B will do.
Check out sqlite3_last_insert_rowid() -- it's probably what you're looking for:
Each entry in an SQLite table has a unique 64-bit signed integer key called the "rowid". The rowid is always available as an undeclared column named ROWID, OID, or _ROWID_ as long as those names are not also used by explicitly declared columns. If the table has a column of type INTEGER PRIMARY KEY then that column is another alias for the rowid.
This routine returns the rowid of the most recent successful INSERT into the database from the database connection in the first argument. If no successful INSERTs have ever occurred on that database connection, zero is returned.
Hope it helps! (More info on ROWID is available here and here.)
Simply use:
SELECT last_insert_rowid();
However, if you have multiple connections writing to the database, you might not get back the key that you expect.
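In Python this amounts to running the SELECT on the same connection (and cursor) that performed the INSERT, e.g. continuing the hypothetical tables from the sketch above:
cur.execute("INSERT INTO B (data) VALUES (?)", ('default',))
new_id = cur.execute('SELECT last_insert_rowid()').fetchone()[0]
# Reliable only because the SELECT ran over the same connection as the INSERT.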

How do you safely and efficiently get the row id after an insert with mysql using MySQLdb in python?

I have a simple table in mysql with the following fields:
id -- Primary key, int, autoincrement
name -- varchar(50)
description -- varchar(256)
Using MySQLdb, a python module, I want to insert a name and description into the table, and get back the id.
In pseudocode:
db = MySQLdb.connection(...)
queryString = "INSERT INTO tablename (name, description) VALUES ('%s', '%s')" % (a_name, a_desc)
db.execute(queryString)
newID = ???
I think it might be
newID = db.insert_id()
Edit by Original Poster
Turns out, in the version of MySQLdb that I am using (1.2.2), you would do the following:
conn = MySQLdb.connect(host=...)
c = conn.cursor()
c.execute("INSERT INTO ...")
newID = c.lastrowid
I am leaving this as the correct answer, since it got me pointed in the right direction.
I don't know if there's a MySQLdb specific API for this, but in general you can obtain the last inserted id by SELECTing LAST_INSERT_ID()
It is on a per-connection basis, so you don't risk race conditions if some other client performs an insert as well.
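A hedged MySQLdb sketch of that approach (connection parameters and the table name are placeholders):
import MySQLdb

conn = MySQLdb.connect(host='localhost', user='me', passwd='secret', db='test')
cur = conn.cursor()
cur.execute("INSERT INTO tablename (name, description) VALUES (%s, %s)",
            ('a name', 'a description'))
cur.execute("SELECT LAST_INSERT_ID()")
new_id = cur.fetchone()[0]  # per-connection, so other clients' inserts don't interfere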
You could also do a
conn.insert_id()
The easiest way of all is to wrap your insert with a select count query into a single stored procedure and call that in your code. You would pass in the parameters needed to the stored procedure and it would then select your row count.
