I'm writing a Python script that resets the database to an initial state (some hardcoded entries in every table). The database consists of multiple tables with primary and foreign keys.
Every time the script runs, it should remove all the old entries from all of the tables, reset the primary key counters and insert the sample entries.
Currently I am trying to achieve this as follows:
# Delete all the entries from the tables
cursor.execute("DELETE FROM table1")
cursor.execute("DELETE FROM table2")
cursor.execute("DELETE FROM table3")
# Reset the primary key counter and insert sample entries
cursor.execute("ALTER TABLE table1 AUTO_INCREMENT = 1")
cursor.execute("INSERT INTO table1(username, password) VALUES('user01', '123')")
cursor.execute("ALTER TABLE table2 AUTO_INCREMENT = 1")
cursor.execute("INSERT INTO table2(column1, column2) VALUES('column1_data', 'column2_data')")
This isn't working because of the foreign key constraints on some of the tables (the DELETE statements are rejected).
I generate the tables using a models.py script (I also use Django), so I thought I could solve this the following way:
remove the database programmatically and create a new one with the same name
call the models.py script to generate empty tables in the db
insert sample data using the script I wrote
Is this a good solution or am I overlooking something?
I use scripts monthly to purge a transaction table, after archiving the contents.
Try using the TRUNCATE command, i.e.
truncate table [tablename];
It automatically resets the auto-increment counter for the primary key.
Then use your INSERT statements to populate the base info.
Also, this preserves all of the table's base settings (keys, indexes, etc.).
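A minimal sketch of that reset flow, assuming MySQL with the mysql-connector-python driver plus the table and column names from the question. Note that MySQL refuses to TRUNCATE a table that is referenced by a foreign key, so the sketch also toggles foreign key checks off and back on; that part is my assumption, not something the original answer spells out.
import mysql.connector

# Connection parameters are placeholders -- adjust to your setup
conn = mysql.connector.connect(host="localhost", user="root", password="secret", database="mydb")
cursor = conn.cursor()

# TRUNCATE fails on tables referenced by foreign keys, so disable the checks
# for the duration of the reset (assumption: InnoDB tables, trusted script)
cursor.execute("SET FOREIGN_KEY_CHECKS = 0")
for table in ("table1", "table2", "table3"):
    cursor.execute(f"TRUNCATE TABLE {table}")  # also resets AUTO_INCREMENT
cursor.execute("SET FOREIGN_KEY_CHECKS = 1")

# Re-insert the sample entries
cursor.execute("INSERT INTO table1(username, password) VALUES('user01', '123')")
cursor.execute("INSERT INTO table2(column1, column2) VALUES('column1_data', 'column2_data')")
conn.commit()
cursor.close()
conn.close()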
Here is a recreation of the problem statement.
The database schemas for two tables (say Tab1 and Tab2) have been given in a file called (say) schemas.ddl. Multiple integrity constraints (including foreign keys) exist between these two tables. The data of these two tables has been given in data1.csv and data2.csv respectively. I am required to get the data from these csv files into the tables.
This is how I do it:
import psycopg2
conn = psycopg2.connect(database="postgres", user="postgres", host="localhost", port=5432)
cur = conn.cursor()
# Execute the schema, this works fine
cur.execute(open("schemas.ddl", "r").read())
files = ["data1.csv", "data2.csv"]
# Iterate through the files to add data
for i in range(1, 3):
    copy_sql = "copy Tab" + str(i) + " from stdin with csv header delimiter as ',' null as 'null'"
    with open(files[i - 1], 'r') as f:
        cur.copy_expert(sql=copy_sql, file=f)
That is, I use copy_expert() as shown to copy the first file's data and then the second file's data. This however causes foreign key constraints (and other custom constraints) to be violated as the data is entered.
Is there a way to check the integrity of the entered data at commit, and not after every command? I've tried cur.execute("SET CONSTRAINTS ALL DEFERRED;") but that doesn't change anything.
Solution
by Anand Sowmithiran
Just make the foreign keys and the other constraints deferrable. Example:
FOREIGN KEY (A) REFERENCES B DEFERRABLE;
Or, if you trust the data to be correct, just disable all triggers and re-enable them after copying.
Those FK constraints must be created as DEFERRABLE in the first place; only then can they be deferred with the SET CONSTRAINTS command. Alternatively, you can turn off all triggers on those two tables, do the bulk copy operation, and then enable the triggers again. PG uses triggers to enforce FK constraints.
ALTER TABLE tab2 DISABLE TRIGGER ALL;
-- run your program
-- and then re-enable the triggers
ALTER TABLE tab2 ENABLE TRIGGER ALL;
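A minimal sketch of the disable-triggers route, reusing the connection, schema file and copy loop from the question above (note that ALTER TABLE ... DISABLE TRIGGER ALL generally requires superuser or table-owner privileges):
import psycopg2

conn = psycopg2.connect(database="postgres", user="postgres", host="localhost", port=5432)
cur = conn.cursor()
cur.execute(open("schemas.ddl", "r").read())

# Turn the FK-enforcement triggers off for the bulk load
for table in ("Tab1", "Tab2"):
    cur.execute(f"ALTER TABLE {table} DISABLE TRIGGER ALL")

for i, filename in enumerate(["data1.csv", "data2.csv"], start=1):
    copy_sql = f"copy Tab{i} from stdin with csv header delimiter as ',' null as 'null'"
    with open(filename, "r") as f:
        cur.copy_expert(sql=copy_sql, file=f)

# Turn the triggers back on once the data is in
for table in ("Tab1", "Tab2"):
    cur.execute(f"ALTER TABLE {table} ENABLE TRIGGER ALL")

conn.commit()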
I am creating a database from different CSV files. After doing this, I tried to define the primary key table by table, but I got an error.
c.execute("ALTER TABLE patient_data ADD PRIMARY KEY (ID);").fetchall()
OperationalError: near "PRIMARY": syntax error
Maybe the best way to avoid this error is to define the primary key when the table is created, but I don't know how to do that. I have been working with Python for a few years, but today is my first approach to SQL.
This is the code I use to import a CSV to a table
c.execute('''DROP TABLE IF EXISTS patient_data''')
c.execute('''CREATE TABLE patient_data (ID, NHS_Number, Full_Name, Gender, Birthdate, Ethnicity, Postcode)''')
patients_admitted.to_sql('patient_data', conn, if_exists='append', index=False)
c.execute('''SELECT * FROM patient_data''').fetchall()
This is too long for a comment.
If your table does not have data, just re-create it with the primary key definition.
If your table does have data, you cannot add a primary key in one statement: the column's default value is either NULL or a constant, and neither is allowed as a primary key.
And finally, SQLite does not allow you to add a primary key to an existing table. The solution is to copy the data to another table, recreate the table with the structure you want, and then copy the data back in.
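A minimal sketch of both routes, reusing the patient_data table from the question; the column types are my assumptions (SQLite does not require them), and the second route uses the common rename-and-copy-back variant:
# Route 1: table is being created fresh -- declare the primary key at creation time
c.execute('''DROP TABLE IF EXISTS patient_data''')
c.execute('''CREATE TABLE patient_data (
    ID INTEGER PRIMARY KEY,
    NHS_Number TEXT, Full_Name TEXT, Gender TEXT,
    Birthdate TEXT, Ethnicity TEXT, Postcode TEXT)''')
patients_admitted.to_sql('patient_data', conn, if_exists='append', index=False)

# Route 2: table already has data -- move it aside, recreate with the key, copy back
c.execute('''ALTER TABLE patient_data RENAME TO patient_data_old''')
c.execute('''CREATE TABLE patient_data (
    ID INTEGER PRIMARY KEY,
    NHS_Number TEXT, Full_Name TEXT, Gender TEXT,
    Birthdate TEXT, Ethnicity TEXT, Postcode TEXT)''')
c.execute('''INSERT INTO patient_data SELECT * FROM patient_data_old''')
c.execute('''DROP TABLE patient_data_old''')
conn.commit()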
I'm querying a JSON endpoint on a website for data, then saving that data into variables so I can put them into a sqlite table. I'm 2 out of 3 for what I'm trying to do, but the sqlite side is just mystifying. I'm able to request the data, and from there I can verify that the variables have data when I test them with a print, but all of my sqlite stuff is failing. It's not even creating a table, much less updating the table (but it is printing all the results to the buffer for some reason). Any idea what I'm doing wrong here? Disclaimer: bit of a Python noob. I've successfully created test tables just by copying the examples from the Python sqlite3 docs.
# this is requesting the data and seems to work
for ticket in zenpy.search("bananas"):
    id = ticket.id
    subj = ticket.subject
    created = ticket.created_at
    for comment in zenpy.tickets.comments(ticket.id):
        body = comment.body

# connecting to the sqlite db that exists. things seem to go awry here
conn = sqlite3.connect('example.db')
c = conn.cursor()

# Creating the table (for some reason the table is not being created at all)
c.execute('''CREATE TABLE tickets_test
             (ticket id, ticket subject, creation date, body text)''')

# Inserting the variables into the sqlite table
c.execute("INSERT INTO ticketstest VALUES (id, subj, created, body)")

# committing the changes and closing
c.commit()
c.close()
I'm on Windows 64bit and using pycharm to do this.
Your table likely isn't created because you haven't committed yet, and your SQL fails before it commits. It should work once you fix your second SQL statement.
You're not inserting the variables you've created into the table. You need to use parameters. There are two ways of parameterizing your SQL statement; I'll show the named-placeholder one:
c.execute("INSERT INTO ticketstest VALUES (:id, :subj, :created, :body)",
{'id':id, 'subj':subj, 'created':created, 'body':body}
)
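For completeness, the other style sqlite3 supports is question-mark placeholders with a plain tuple; a quick sketch using the same variables:
c.execute("INSERT INTO ticketstest VALUES (?, ?, ?, ?)", (id, subj, created, body))
conn.commit()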
I am writing a basic GUI for a program which uses Peewee. In the GUI, I would like to show all the tables that exist in my database.
Is there any way to get the names of all existing tables, let's say in a list?
Peewee has the ability to introspect Postgres, MySQL and SQLite for the following types of schema information:
Table names
Columns (name, data type, null?, primary key?, table)
Primary keys (column(s))
Foreign keys (column, dest table, dest column, table)
Indexes (name, sql*, columns, unique?, table)
You can get this metadata using the following methods on the Database class:
Database.get_tables()
Database.get_columns()
Database.get_indexes()
Database.get_primary_keys()
Database.get_foreign_keys()
So, instead of using a cursor and writing some SQL yourself, just do:
db = PostgresqlDatabase('my_db')
tables = db.get_tables()
For even more craziness, check out the reflection module, which can actually generate Peewee model classes from an existing database schema.
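A quick sketch of how that metadata might feed the GUI's table list, assuming a PostgreSQL database named my_db (adjust the database class and name to your setup):
from peewee import PostgresqlDatabase

db = PostgresqlDatabase('my_db')
db.connect()

# One name per table, e.g. for populating a list widget
table_names = db.get_tables()

# Optionally fetch column metadata per table for a detail view
columns_by_table = {name: db.get_columns(name) for name in table_names}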
To get a list of the tables in your schema, make sure that you have established your connection and cursor and try the following:
cursor.execute("SELECT table_name FROM information_schema.tables WHERE table_schema='public'")
mytables = cursor.fetchall()
mytables = [x[0] for x in mytables]
I hope this helps.
I have written a Python script that creates a table using a CREATE TABLE IF NOT EXISTS statement and then inserts rows from a dataframe into a Vertica database. The first time I run this script, I want it to create the table and insert the data - that works fine.
But from the next run onwards, I want it to create the table only if it does not exist (works fine) and insert a row only if that row is not already in the database.
I use both INSERT statements and a COPY statement to insert data. How do I do this in Python? I am accessing the Vertica database from Python using pyodbc.
Editing the post to include some code:
There is a dataframe called tableframe_df, from which I need to populate content into a table created as below:
I am creating a table in Vertica with CREATE TABLE IF NOT EXISTS, which creates the table only if there isn't one already.
cursor.execute("create table if not exists <tablename> (fields in the table)")
A COPY statement to write to this table from a CSV that was created:
cursor.execute("COPY tablename1 FROM LOCAL 'tablename.csv' DELIMITER ',' exceptions 'exceptions' rejected data 'rejected'")
for i, row in tablename_df.iterrows():
    cursor.execute("insert into tablename2 values(?,?,?,?,?,?,?,?,?,?,?,?)",
                   row.values[0], row.values[1], row.values[2], row.values[3],
                   row.values[4], row.values[5], row.values[6], row.values[7],
                   row.values[8], row.values[9], row.values[10], row.values[11])
In the above code, I am creating the table and then inserting into tablename1 and tablename2 using COPY and INSERT. This works fine the first time it is executed (as there is no data in the tables). But if I run the same script twice by mistake, the data will be inserted twice into these tables. What check should I perform to ensure that data does not get inserted if it is already present?
First I'll mention that INSERT VALUES is pretty slow if you are doing a lot of rows. If you are using batch SQL and the standard Vertica drivers, it should convert it to a COPY, but if it doesn't then your inserts might take forever. I don't think this will happen with pyodbc since they don't implement executemany() optimally. You might be able to with ceODBC, though, but I haven't tried it. Alternatively, you can use vertica_python, which has a .copy('COPY FROM STDIN...', data) command that is efficient.
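A quick sketch of that vertica_python route, assuming the tablename1 table and tablename.csv file from the question (the connection details are placeholders):
import vertica_python

conn_info = {'host': 'localhost', 'port': 5433, 'user': 'dbadmin',
             'password': 'secret', 'database': 'mydb'}
with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    with open('tablename.csv', 'rb') as f:
        # Stream the file through COPY ... FROM STDIN instead of row-by-row inserts
        cur.copy("COPY tablename1 FROM STDIN DELIMITER ','", f)
    conn.commit()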
Anyhow, for your question...
You can do it one of two ways. Also, for the inserts, I would really try to change this to a COPY or at least an executemany. Again, pyodbc does not do this properly, at least for the releases that I have used.
The first option is to use a control table that uniquely describes the set of data being loaded: check it before loading, and insert into it once the load is done, so a rerun can tell that the data set has already been loaded. A runnable sketch follows the steps below.
-- Step 1. Check control table for data set load.
SELECT *
FROM mycontroltable
WHERE dataset = ?
-- Step 2. If row not found, insert rows
for row in data:
    cursor.execute('INSERT INTO mytargettable....VALUES(...)')
-- Step 3. Insert row into control table
INSERT INTO mycontroltable( dataset ) VALUES ( ? )
-- Step 4. Commit data
COMMIT
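Here is a runnable pyodbc sketch of those steps, reusing the COPY and INSERT statements from the question; the control table mycontroltable, its dataset column and the label value are assumptions for illustration:
dataset_label = 'tablename.csv 2016-01'  # hypothetical label identifying this load

# Step 1. Has this data set been loaded already?
cursor.execute("SELECT 1 FROM mycontroltable WHERE dataset = ?", dataset_label)
if cursor.fetchone() is None:
    # Step 2. Not loaded yet: run the COPY and the row inserts
    cursor.execute("COPY tablename1 FROM LOCAL 'tablename.csv' DELIMITER ','")
    for i, row in tablename_df.iterrows():
        cursor.execute("insert into tablename2 values(?,?,?,?,?,?,?,?,?,?,?,?)",
                       *row.values[:12])
    # Step 3. Record the load in the control table
    cursor.execute("INSERT INTO mycontroltable (dataset) VALUES (?)", dataset_label)
    # Step 4. Commit everything together
    cursor.commit()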
Alternatively, you can insert or merge data based on a key. You can create a temp or other staging table to do it. If you don't want updates and the data does not change once inserted, then INSERT will be better, as it will not incur a delete vector. I'll do INSERT based on the way you phrased your question; a runnable sketch follows the steps below.
-- Step 1. Create local temp for intermediate target
CREATE LOCAL TEMP TABLE mytemp (fields) ON COMMIT DELETE ROWS;
-- Step 2. Insert data.
for row in data:
    cursor.execute('INSERT INTO mytemp....VALUES(...)')
-- Step 3. Insert/select only data that doesn't exist by key value
INSERT INTO mytargettable (fields)
SELECT fields
FROM mytemp
WHERE NOT EXISTS (
    SELECT 1
    FROM mytargettable t
    WHERE t.key = mytemp.key
)
-- Step 4. Commit
COMMIT;
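And a runnable pyodbc sketch of those steps against the tablename2 target from the question; to keep it short it assumes just three hypothetical columns (id, col2, col3) with id as the key, so adjust the column lists to your real table:
# Step 1. Local temp table as the intermediate target (columns are placeholders)
cursor.execute("""CREATE LOCAL TEMP TABLE mytemp
                  (id INT, col2 VARCHAR(100), col3 VARCHAR(100))
                  ON COMMIT DELETE ROWS""")

# Step 2. Stage every row from the dataframe
for i, row in tablename_df.iterrows():
    cursor.execute("INSERT INTO mytemp VALUES (?, ?, ?)",
                   row['id'], row['col2'], row['col3'])

# Step 3. Move over only the rows whose key is not already in the target
cursor.execute("""INSERT INTO tablename2 (id, col2, col3)
                  SELECT id, col2, col3 FROM mytemp
                  WHERE NOT EXISTS (
                      SELECT 1 FROM tablename2 t WHERE t.id = mytemp.id)""")

# Step 4. Commit (this also empties the ON COMMIT DELETE ROWS temp table)
cursor.commit()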