Say I need a table with two columns (A TEXT, B TEXT).
Every time before I run a program, I want to check whether the table exists, and create it if it doesn't. Now say that a table with that name already exists, but with only one column (A TEXT), or maybe with (A INT, B INT).
So, in general, with different columns.
How do I check for that when I run the CREATE query? If there is a conflict, I want to back the table up somewhere and drop it, then create a new, correct table; if there is no conflict, do nothing.
I am working in Python, using sqlite3, by the way. The database is stored locally for now, and the program is distributed to multiple people, which is why I need to check the database.
Currently I have
con = sqlite3.connect(path)
with con:
    cur = con.cursor()
    # note: 'table' itself is a reserved word in SQLite, so a concrete name is used here
    cur.execute('CREATE TABLE IF NOT EXISTS mytable (A TEXT, B TEXT);')
You can use PRAGMA table_info to get information about the table, and use the result to check your columns:
def validate(connection):
    cursor = connection.cursor()
    cursor.execute('PRAGMA table_info(mytable)')
    columns = cursor.fetchall()
    cursor.close()
    # each row is (cid, name, type, notnull, dflt_value, pk); check name and type
    return (len(columns) == 2
            and columns[0][1:3] == ('A', 'TEXT')
            and columns[1][1:3] == ('B', 'TEXT'))
So if validate returns False you can rename the table and create the new one.
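A minimal sketch of that check-rename-recreate flow, assuming no mytable_backup table exists yet:
def rebuild_if_needed(connection):
    cur = connection.cursor()
    cur.execute('PRAGMA table_info(mytable)')
    existing = cur.fetchall()
    if existing and not validate(connection):
        # schema mismatch: keep the old data around under a new name
        cur.execute('ALTER TABLE mytable RENAME TO mytable_backup')
    # creates the correct table when missing; a no-op when the schema already matches
    cur.execute('CREATE TABLE IF NOT EXISTS mytable (A TEXT, B TEXT)')
    connection.commit()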
Below is the last part of my selenium web scraper that loops through the different tabs of this website page, selects the "export data" button, downloads the data, adds a "yearid" column, then loads the data into a MySQL table.
df = pd.read_csv(desired_filepath)
df["yearid"] = datetime.today().year
df[df.columns[df.columns.str.contains('%')]] = \
    (df.filter(regex='%')
       .apply(lambda x: pd.to_numeric(x.str.replace(r'[\s%]', ''),
                                      errors='coerce')))
df.to_csv(desired_filepath)
engine = create_engine("mysql+pymysql://{user}:{pw}@localhost/{db}"
                       .format(user="walker",
                               pw="password",
                               db="data"))
df.to_sql(con=engine, name='fg_test_hitting_{}'.format(button_text), if_exists='replace')
time.sleep(10)
driver.quit()
Everything works great, but I would like to import the data into the MySQL table and replace only if the yearid=2018. Does anyone know if it is possible to load data and replace given a specific condition? Thanks in advance!
I think rather than deleting from your table it may be better to just let MySQL handle the replacing. You can do this by creating a temporary table with the new data, running REPLACE INTO the permanent table, then deleting the temp table. The big caveat here is that you will need to set the keys in your table (ideally only once). I don't know what your key fields are, so it's tough to help in this regard.
Replace the commented line with this:
# df.to_sql(con=engine, name='fg_test_hitting_{}'.format(button_text), if_exists='replace')
conn = engine.connect()
# should fail if temporary table already exists (we want it to fail in this case)
df.to_sql('fg_test_hitting_{}_tmp'.format(button_text), conn)
# Will create the permanent table if it does not already exist (will only matter in the first run)
# note that you may have to create keys here so that mysql knows what constitutes a replacement
conn.execute('CREATE TABLE IF NOT EXISTS fg_test_hitting_{} LIKE fg_test_hitting_{}_tmp;'.format(button_text, button_text))
# updating the permanent table and dropping the temporary table
conn.execute('REPLACE INTO fg_test_hitting_{} (SELECT * FROM fg_test_hitting_{}_tmp);'.format(button_text, button_text))
conn.execute('DROP TABLE IF EXISTS fg_test_hitting_{}_tmp;'.format(button_text))
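For REPLACE INTO to actually replace rows rather than just append them, the permanent table needs a primary or unique key. A hedged one-time sketch; player_id is a hypothetical column name, so substitute your real key fields:
# run once; REPLACE INTO matches rows on this key (player_id is illustrative)
conn.execute('ALTER TABLE fg_test_hitting_{} ADD PRIMARY KEY (player_id, yearid);'
             .format(button_text))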
As described by @Leo in the comments, first delete the part of the data (from the MySQL table) that you are about to update, and then save the new data to the MySQL table:
conn = engine.raw_connection()  # a SQLAlchemy Connection has no .cursor(); use the raw DBAPI connection
cur = conn.cursor()
...
# pymysql uses %s placeholders, not ?
cur.execute('DELETE FROM fg_test_hitting_{} WHERE yearid = %s'.format(button_text),
            (datetime.today().year,))
conn.commit()
# append instead of replace, so rows for other years survive
df.to_sql(con=engine, name='fg_test_hitting_{}'.format(button_text), if_exists='append')
I am using Python and I would like to have a list of IDs stored on disk while preserving some of the functionality of a set (that is, efficiently checking whether an ID is contained). To this end, I think using the SQLite library is a wise decision (at least that is my impression after googling and searching Stack Overflow a bit). However, I am a beginner in the SQL world and could not find any post explaining what I am looking for.
How can I store IDs (strings) in SQLite and later check if a specific ID appears or not in the database?
import sqlite3
id1 = 'abc'
id2 = 'def'
# Initialization of the database
define_database()
# Update the database by inserting a new ID
insert_in_database(id1)
insert_in_database(id2)
# Check if the specified ID is contained in the database (returns a Boolean)
check_if_exists_in_database(id1)
PS: I am aware of the sqlite3 library.
Thanks!
Just use a table with a single column. This column must be indexed (explicitly, or by making it the primary key) for lookups over large data to be efficient:
db = sqlite3.connect('...filename...')

def define_database():
    db.execute('CREATE TABLE IF NOT EXISTS MyStuff(id PRIMARY KEY)')
(Use a WITHOUT ROWID table if your Python version is recent enough to have a modern version of the SQLite library.)
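For illustration, the same table declared WITHOUT ROWID (available since SQLite 3.8.2); rows are then stored clustered on the primary key, which avoids a separate rowid lookup:
# alternative schema for define_database(); requires SQLite >= 3.8.2
db.execute('CREATE TABLE IF NOT EXISTS MyStuff(id TEXT PRIMARY KEY) WITHOUT ROWID')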
Inserting is done with standard SQL:
def insert_in_database(value):
    db.execute('INSERT INTO MyStuff(id) VALUES(?)', [value])
To check whether a value exists, just try to read its row:
def check_if_exists_in_database(value):
    for row in db.execute('SELECT 1 FROM MyStuff WHERE id = ?', [value]):
        return True
    return False
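A quick usage sketch matching the pseudocode in the question; note that sqlite3 buffers writes until you commit:
define_database()
insert_in_database('abc')
db.commit()  # persist the insert
print(check_if_exists_in_database('abc'))  # True
print(check_if_exists_in_database('xyz'))  # False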
I created a table with 100 records and now I want to write a Python function that takes a number and a table name and pulls records from the referenced table, i.e. the function should select n records from the specified table.
I have already started querying the database using Python scripts. I can do a regular SELECT, but every time I want to select I have to edit the query. Is there a way for me to write a function in Python that would take two parameters, e.g. (n, table), that will allow me to select n records from any table in my database?
Is this possible, and if it is, where should I start?
You can use the function below. Note that TOP is SQL Server syntax; LIMIT is what MySQL, SQLite, and PostgreSQL understand:
def query_db(no_of_rows, table_name):
    cur = db.cursor()
    # table names cannot be bound as query parameters, so the (trusted) name is
    # interpolated; the row count is forced to int to keep this safe
    query = 'SELECT * FROM {} LIMIT {:d}'.format(table_name, int(no_of_rows))
    cur.execute(query)
    for row in cur.fetchall():
        print(row[0])
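Called, for example, like this (the table name is illustrative):
query_db(5, 'my_table')  # prints the first column of five rows from my_table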
Is this what you want, or am I missing something?
I want to use sqlite3 in Python. I have a table with four columns (id INT, other_no INT, position TEXT, classification TEXT, PRIMARY KEY is id). In this table, the classification column is left empty and will be filled in with information from table 2 (see my code below). I then have a second table with three columns (id INT, class TEXT, type TEXT, PRIMARY KEY is id). Basically, the two tables have two columns in common: in both tables, the primary key is the id column, and the classification and class columns would eventually have to be merged. So the code needs to go through table 2 and, whenever it finds a matching id in table 1, update the classification column of table 1 with the class column of table 2. The information to build the two tables comes from two separate files.
# function to create Table1...
# function to create Table2...
(the tables are created as expected). The problem occurs when I try to update table1 with information from table2.
def update_table1():
    con = sqlite3.connect('table1.db', 'table2.db') # I know this is wrong, but how do I connect table2 so that I don't get the error that the Table2 global name is not defined?
    cur = con.cursor()
    if id in Table2 == id in Table1:
        new_classification = Table2.class # so now instead of Null it should have the class information from table2
        cur.execute("UPDATE Table1 SET class = ? WHERE id =? ", (new_classification, id))
        con.commit()
But I get an error for line 2: TypeError: a float is required (the second positional argument of sqlite3.connect is a numeric timeout, hence the complaint about a float). I know it's because I put two parameters in the connect method, but if I only connect to table1 I get the error that Table2 is not defined.
I read the post Updating a column in one table through a column in another table. I understand the logic around it, but I can't translate the SQL code into Python. I have been working on this for some time and can't seem to get it. Would you please help? Thanks.
After the comments of a user I got this code, but it still doesn't work:
# connect to the database containing the two tables
cur.execute("SELECT id FROM Table1")
for row in cur.fetchall():
    row_table1 = row[0]
    cur.execute("SELECT (id, class) FROM Table2")
    for row1 in cur.fetchall():
        row_table2 = row[0] # catches the id
        row_table2_class = row[1] # catches the name
        if row_table1 == row_table2:
            print("yes") # as a test for me to see the loop worked
            new_class = row_table_class
            cur.execute("UPDATE Table1 SET classification=? WHERE id=?", (new_class, row_table1))
            con.commit()
From this, however, I get an operational error. I know it's my syntax, but like I said, I am new to this, so any guidance is greatly appreciated.
You need a lot more code than what you have there. Your code logic should go something like this:
1. Connect to the SQLite db.
2. Execute a SELECT query on Table2 and fetch rows. Call this rows2.
3. Execute a SELECT query on Table1 and fetch rows. Call this rows1.
4. For every id in rows1, if that id exists in rows2, execute an UPDATE on that particular id in Table1.
You are missing SELECT queries in your code:
cur = con.cursor()
if id in Table2 == id in Table1:
    new_classification = Table2.class
You can't just directly test like this. You need to first fetch the rows in both tables using SELECT queries before you can test them out the way you want.
Find below a modified version of the code you posted above. I have typed this code in here directly, so I have not had the chance to test it, but you can look at it to get an idea; it could probably even run.
Also, this is by no means the most efficient way to do this. It is actually very clunky, especially because for every id in Table1 you fetch all the rows of Table2 again to match against. Instead, you would want to fetch all the rows of Table1 once, then all the rows of Table2 once, and then match them up (or push the whole thing into SQL, as sketched after the code below). I will leave that optimization up to you.
import sqlite3

# connect to the database containing the two tables
conn = sqlite3.connect("<PUT DB FILENAME HERE>")

cur = conn.execute("SELECT id FROM Table1")
for row in cur.fetchall():
    row_table1_id = row[0]

    cur2 = conn.execute("SELECT id, class FROM Table2")
    for row1 in cur2.fetchall():
        row_table2_id = row1[0]      # catches the id
        row_table2_class = row1[1]   # catches the class name

        if row_table1_id == row_table2_id:
            print("yes")  # as a test for me to see the loop worked
            new_class = row_table2_class
            conn.execute("UPDATE Table1 SET classification=? WHERE id=?",
                         (new_class, row_table1_id))
            conn.commit()
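For completeness, a hedged sketch of that set-based alternative, letting SQLite do the matching in a single statement (assumes both tables live in the same database file):
# one correlated UPDATE instead of nested Python loops
conn.execute("""
    UPDATE Table1
    SET classification = (SELECT class FROM Table2 WHERE Table2.id = Table1.id)
    WHERE EXISTS (SELECT 1 FROM Table2 WHERE Table2.id = Table1.id)
""")
conn.commit()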
Until now our application has been using one SQLite database with SQLObject as the ORM. Obviously, at some point we knew we had to face the SQLite concurrency problem, and so we did.
We ended up splitting the current database into multiple databases. That is, each table schema remained the same, but we distributed different tables across multiple databases, keeping tightly coupled tables together.
Now this works very well in a clean install of the new version of our application, but upgrading from previous versions of our application to this new version needs a special data migration before our application can start working. In this case the database migration simply means moving the tables from the single database into the appropriate different databases.
To exemplify, consider this is the older structure:
single_db.db --- A single db
- A -- Table A
- B -- Table B
- C -- Table C
- D -- Table D
- E -- Table E
- F -- Table F
The new structure:
db1.db --- Database 1
- A -- Table A
- B -- Table B
- C -- Table C
- D -- Table D
db2.db --- Database 2
- E -- Table E
db3.db --- Database 3
- F -- Table F
When the upgrade happens, our application will create the new structure with the above 3 databases, with empty tables in them. The older database single_db.db, with all the tables and actual data, will also still be there. Now, before our application can begin working, it should move the tables, or rather copy the data, from each table in the older database to the corresponding table in the corresponding new database.
I will need to write the code for this database migration. I know I can query a table using the older database connection and insert the returned rows into the corresponding table using the newer database connection. One caveat I should mention is that some of these tables can contain a large number of rows; 2 or 3 of the tables can hold up to 2 - 2.5 million rows each.
So I want to ask whether I can use any other SQLObject tricks, since I am using SQLObject on top of SQLite, and whether anyone has done this before?
Thanks for your help.
I realise you probably solved this by now, but for anyone googling: I had to do almost exactly the same as the OP. This was the core part of the code that I used (it's modified from something I found, but I can't find it again to credit the original author; apologies!):
def _iterdump(connection, table_name):
    """
    Returns an iterator to dump a database table in SQL text format.
    """
    cu = connection.cursor()
    yield('BEGIN TRANSACTION;')

    # sqlite_master contains the SQL CREATE statements for the database.
    q = """
        SELECT name, type, sql
        FROM sqlite_master
        WHERE sql NOT NULL AND
              type == 'table' AND
              name == :table_name
        """
    schema_res = cu.execute(q, {'table_name': table_name})
    for table_name, type, sql in schema_res.fetchall():
        if table_name == 'sqlite_sequence':
            yield('DELETE FROM sqlite_sequence;')
        elif table_name == 'sqlite_stat1':
            yield('ANALYZE sqlite_master;')
        elif table_name.startswith('sqlite_'):
            continue
        else:
            yield('%s;' % sql)

    # Build an INSERT statement for each row of the current table, letting
    # SQLite's quote() function handle the escaping of the values.
    res = cu.execute("PRAGMA table_info('%s')" % table_name)
    column_names = [str(table_info[1]) for table_info in res.fetchall()]
    q = "SELECT 'INSERT INTO \"%(tbl_name)s\" VALUES("
    q += ",".join(["'||quote(" + col + ")||'" for col in column_names])
    q += ")' FROM '%(tbl_name)s'"
    query_res = cu.execute(q % {'tbl_name': table_name})
    for row in query_res:
        yield("%s;" % row[0])

    # close the transaction opened above so the dump is self-contained
    yield('COMMIT;')
If you pass the sqlite connection for the original db and the name of a table in the original db, this generator will give back commands that you can pass to execute() on the sqlite connection for the new db.
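A minimal usage sketch under those assumptions (file and table names are illustrative; isolation_level=None puts the connection in autocommit mode so the dump's own BEGIN/COMMIT statements pass straight through):
import sqlite3

old_conn = sqlite3.connect('single_db.db')
new_conn = sqlite3.connect('db1.db', isolation_level=None)

# stream table A from the old database into db1.db
for statement in _iterdump(old_conn, 'A'):
    new_conn.execute(statement)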
When I did this, I also did a count of rows first on all the tables and incremented a counter as I executed INSERT lines, so I could show progress on the migration.