How do I add timestamp (using GETDATE()) to my insert statement? - python

I'm trying to figure out how to add a timestamp to my database table. df2 doesn't include any column for time, so I'm trying to create the value either in values_ or when I execute the SQL. I want to use the Redshift GETDATE() function.
values_ = ', '.join([f"('{str(i.columnA)}','{str(i.columnB)}','{str(i.columnC)}','{str(i.columnD)}', 'GETDATE()')" for i in df2.itertuples()])
sqlexecute(f'''insert into table.table2 (columnA, columnB, columnC, columnD, time_)
values
({values_})
;
''')
This is one of several errors I get depending on where I put GETDATE()
FeatureNotSupported: ROW expression, implicit or explicit, is not supported in target list

The "INSERT ... VALUES (...)" construct is for inserting literals into a table and getdate() is not a literal. However, there are a number of ways to get this to work. A couple of easy ways are:
You can make the default value of the column 'time_' be getdate() and then just use the key work default in the insert values statement. This will tell Redshift to use the default for the column (getdate())
insert into values ('A', 'B', 3, default)
You could switch to a "INSERT ... SELECT ..." construct which will allow you to have a mix of literals and function calls.
insert into table (select 'A', 'B', 3, getdate())
NOTE: inserting row by row into a table in Redshift can slow and make a mess of the table if the number of rows being inserted is large. This can be compounded if auto-commit is on as each insert will be committed which will need to work its way through the commit queue. If you are inserting a large amount of data you should do this through writing an S3 object and COPYing it to Redshift. Or at least bundling up 100+ rows of data into a single insert statement (with auto-commit off and explicitly commit the changes at the end).
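For completeness, here is a rough sketch of both approaches applied to the values_ loop from the question (assuming the sqlexecute() helper and df2 from the question; the DEFAULT variant also assumes time_ was created with DEFAULT getdate()). The values are still interpolated as strings, exactly as in the question, so the usual caveats about escaping apply:
# Option 1: let the column default (getdate()) fill time_ by passing DEFAULT.
values_ = ', '.join(
    f"('{i.columnA}', '{i.columnB}', '{i.columnC}', '{i.columnD}', default)"
    for i in df2.itertuples()
)
sqlexecute(f'''insert into table.table2 (columnA, columnB, columnC, columnD, time_)
values
{values_}
;
''')

# Option 2: INSERT ... SELECT, which accepts getdate() alongside literals.
selects = ' union all '.join(
    f"select '{i.columnA}', '{i.columnB}', '{i.columnC}', '{i.columnD}', getdate()"
    for i in df2.itertuples()
)
sqlexecute(f'''insert into table.table2 (columnA, columnB, columnC, columnD, time_)
{selects}
;
''')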

When I created the table I added a time_log column of type timestamp.
drop table if exists table1;
create table table1(
column1 varchar (255),
column2 varchar(255),
time_log timestamp
);
The issue was that I had an extra set of parentheses around {values_} in my insert statement. Each tuple in values_ is already parenthesized, so remove the outer pair and it will work:
sqlexecute(f'''insert into table.table2 (columnA, columnB, time_log)
values
{values_}
;
''')
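For reference, a hypothetical sketch of how values_ might be built so that each row is one parenthesized tuple, getdate() is left unquoted, and the statement itself adds no extra parentheses around {values_}:
values_ = ', '.join(
    f"('{i.columnA}', '{i.columnB}', getdate())"  # getdate() unquoted, not a string literal
    for i in df2.itertuples()
)
sqlexecute(f'''insert into table.table2 (columnA, columnB, time_log)
values
{values_}
;
''')
# If Redshift rejects getdate() inside VALUES, fall back to the DEFAULT or
# INSERT ... SELECT approaches from the answer above.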

Related

How to avoid explicit casting NULL during INSERT in Postgresql

I am writing Python scripts to synchronize tables from an MSSQL database to a Postgresql DB. The original author tends to use super wide tables with long, consecutive runs of NULLs in them.
For insertion speed, I serialize the records in bulk to a string of the following form before execute():
INSERT INTO A( {col_list} )
SELECT * FROM ( VALUES (row_1), (row_2),...) B( {col_list} )
During row serialization, it's not possible to determine the data type of NULL or None in Python. This makes the job complicated: all NULL values in timestamp columns, integer columns etc. need an explicit cast to the proper type, or Pg complains about it.
Currently I am checking the DB-API connection.description property and comparing each column's type_code, adding type casts like ::timestamp as needed.
But this feels cumbersome and duplicates work: the driver already converted the data from text to the proper Python data type, and now I have to redo it for columns with all those Nones.
Is there a more elegant and simple way to work around this?
If you don't need the SELECT, go with @Nick's answer.
If you need it (like with a CTE to use the input rows multiple times), there are workarounds depending on the details of your use case.
Example, when working with complete rows:
INSERT INTO A -- complete rows
SELECT * FROM (
VALUES ((NULL::A).*), (row_1), (row_2), ...
) B
OFFSET 1;
{col_list} is optional noise in this particular case, since we need to provide complete rows anyway.
Detailed explanation:
Casting NULL type when updating multiple rows
Instead of inserting from a SELECT, you can attach a VALUES clause directly to the INSERT, i.e.:
INSERT INTO A ({col_list})
VALUES (row_1), (row_2), ...
When you insert from a query, Postgres examines the query in isolation when trying to infer the column types, and then tries to coerce them to match the target table (only to find out that it can't).
When you insert directly from a VALUES list, it knows about the target table when performing the type inference, and can then assume that any untyped NULL matches the corresponding column.
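As a sketch of what this looks like from Python, psycopg2's execute_values() helper sends the rows as a VALUES list attached directly to the INSERT, so None values come through as untyped NULLs that are resolved against the target table (the connection conn and the column names here are placeholders, not from the question):
from psycopg2.extras import execute_values

rows = [
    (1, None, 'abc'),        # None -> NULL, no explicit ::timestamp cast needed
    (2, '2020-01-01', None),
]
with conn.cursor() as cur:
    execute_values(
        cur,
        "INSERT INTO A (id, ts_col, txt_col) VALUES %s",
        rows,
    )
conn.commit()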
You could try creating json from the data and then a rowset from the json using json_populate_record(..).
postgres=# create table js_test (id int4, dat timestamp, val text);
CREATE TABLE
postgres=# insert into js_test
postgres-# select (json_populate_record(null::js_test,
postgres(# json_object(array['id', 'dat', 'val'], array['5', null, 'test']))).*;
INSERT 0 1
postgres=# select * from js_test;
id | dat | val
----+-----+------
5 | | test
You can use json_populate_recordset(..) to do the same with multiple rows in one go. You just pass a single json value that is an array of objects. Make sure it isn't a Postgres array of json values.
So this is OK: '[{"id":1,"dat":null,"val":6},{"id":3,"val":"tst"}]'::json
This is not: array['{"id":1,"dat":null,"val":6}'::json,'{"id":3,"val":"tst"}'::json]
select *
from json_populate_recordset(null::js_test,
'[{"id":1,"dat":null,"val":6},{"id":3,"val":"tst"}]')

Pyodbc - loading data inside a table causing error

I am trying to load data into the table trial and it says "Invalid column name - Name".
I am passing the values for Name and Area dynamically.
cursor.execute("insert into trial (NameofTheProperty, AreaofTheProperty)
values (Name, Area)")
cnxn.commit()
You need quotes around the values so that they are not interpreted as column names; in SQL, string literals take single quotes:
insert into
trial (NameofTheProperty, AreaofTheProperty)
values
('Name', 'Area')
Now, since you mentioned that you insert these values dynamically, it is better to let your database driver handle the quoting and type conversion:
property_name = "Name"
property_area = "Area"
cursor.execute("""
insert into
trial (NameofTheProperty, AreaofTheProperty)
values
(?, ?)""", (property_name, property_area))
cnxn.commit()
This is called query parameterization and is considered the safest and most robust way to pass values into SQL queries. The ? markers are called "placeholders".
Note that the database driver will handle quoting of string values automatically - no need to do it manually.
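If you are inserting several rows, the same placeholders work with executemany() - a small sketch with made-up values:
properties = [
    ("House A", "120"),
    ("House B", "95"),
]
cursor.executemany("""
insert into
trial (NameofTheProperty, AreaofTheProperty)
values
(?, ?)""", properties)
cnxn.commit()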

Get All of Single Column from Every Table in Schema

In our system, we have 1000+ tables, each of which has a 'date' column containing a DateTime object. I want to get a list of every date that exists across all of the tables. I'm sure there should be an easy way to do this, but I have very limited knowledge of either postgresql or sqlalchemy.
In postgresql, I can do a full join on two tables, but there doesn't seem to be a way to do a join on every table in a schema, for a single common field.
I then tried to solve this programmatically in Python with SQLAlchemy. For each table, I created a SELECT DISTINCT on the 'date' column, set that list of selects as the selects property of a CompoundSelect object, and executed it. As one might expect from an ugly brute-force query, it has been running for an hour or so, and I am unsure whether it has broken silently somewhere and will never return.
Is there a clean and better way to do this?
You definitely want to do this on the server, not at the application level, due to the many round trips between application and server and likely duplication of data in intermediate results.
Since you need to process 1,000+ tables, you should use the system catalogs and dynamically query the tables. You need a function to do that efficiently:
CREATE FUNCTION get_all_dates() RETURNS SETOF date AS $$
DECLARE
tbl name;
BEGIN
FOR tbl IN SELECT 'public.' || tablename FROM pg_tables WHERE schemaname = 'public' LOOP
RETURN QUERY EXECUTE 'SELECT DISTINCT date::date FROM ' || tbl;
END LOOP;
END; $$ LANGUAGE plpgsql;
This will process all the tables in the public schema; change as required. If the tables are spread over multiple schemas, you need to add logic for where the tables are stored, or you can make the schema name a parameter of the function, call the function multiple times, and UNION the results.
Note that you may get duplicate dates from multiple tables. You can weed those duplicates out in the statement calling the function:
SELECT DISTINCT * FROM get_all_dates() ORDER BY 1;
The function builds its result set in memory, but if the number of distinct dates across the rows of the 1,000+ tables is very large, the results will spill to disk. If you expect this to happen, you are probably better off creating a temporary table at the beginning of the function and inserting the dates into that temp table.
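From Python the whole thing then collapses to a single round trip. A sketch, assuming a psycopg2 connection named conn and that the function above has already been created:
with conn.cursor() as cur:
    cur.execute("SELECT DISTINCT * FROM get_all_dates() ORDER BY 1")
    all_dates = [row[0] for row in cur.fetchall()]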
Ended up reverting to a previous solution of using SQLAlchemy to run the queries. This allowed me to parallelize things and run a little faster, since it really was a very large query.
I knew a few things about the dataset that helped with this query: I only wanted distinct dates from each table, and the dates were the PK in my set. I ended up using the approach from this wiki page. The code being sent in the query looked like the following:
WITH RECURSIVE t AS (
    (SELECT date FROM schema.tablename ORDER BY date LIMIT 1)
    UNION ALL
    SELECT (SELECT date FROM schema.tablename WHERE date > t.date ORDER BY date LIMIT 1)
    FROM t WHERE t.date IS NOT NULL
)
SELECT date FROM t WHERE date IS NOT NULL;
I pulled the results of each query into a list of all my dates, adding each date only if it wasn't already in the list, then saved that for later use. It may take just as long as running it all in the psql console, but it was easier for me to save the results locally than to query a temp table in the database.
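Roughly, the per-table loop looked like the sketch below (engine URL and table names are placeholders; the real code also parallelized the loop):
from sqlalchemy import create_engine, text

CTE_TEMPLATE = """
WITH RECURSIVE t AS (
    (SELECT date FROM {table} ORDER BY date LIMIT 1)
    UNION ALL
    SELECT (SELECT date FROM {table} WHERE date > t.date ORDER BY date LIMIT 1)
    FROM t WHERE t.date IS NOT NULL
)
SELECT date FROM t WHERE date IS NOT NULL
"""

engine = create_engine("postgresql://user:pass@host/dbname")  # placeholder URL
tables = ["schema.table1", "schema.table2"]                   # really 1000+ table names
all_dates = set()                                             # de-duplicates across tables
with engine.connect() as conn:
    for table in tables:
        result = conn.execute(text(CTE_TEMPLATE.format(table=table)))
        all_dates.update(row[0] for row in result)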

Wildcards in column name for MySQL

I am trying to select multiple columns, but not all of the columns, from the database. All of the columns I want to select are going to start with "word".
So in pseudocode I'd like to do this:
SELECT "word%" from searchterms where onstate = 1;
More or less. I am not finding any documentation on how to do this - is it possible in MySQL? Basically, I am trying to store a list of words in a single row, with an identifier, and I want to associate all of the words with that identifier when I pull the records. All of the words are going to be joined as a string and passed to another function in an array/dictionary with their identifier.
I am trying to make as FEW database calls as possible to keep the code speedy.
Ok, here's another question for you guys:
There are going to be a variable number of columns with the name "word" in them. Would it be faster to do a separate database call for each row, with a generated Python query per row, or would it be faster to simply SELECT *, and only use the columns I needed? Is it possible to say SELECT * NOT XYZ?
No, SQL doesn't provide you with any syntax to do such a select.
What you can do is ask MySQL for a list of column names first, then generate the SQL query from that information.
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'your_table'
AND column_name LIKE 'word%'
lets you select the column names. Then, in Python, you can build the query from those names:
"SELECT {0} FROM your_table WHERE onstate = 1".format(', '.join(columns))
Instead of doing the string concatenation yourself, I would recommend using SQLAlchemy to do the SQL generation for you.
However, if all you are doing is limiting the number of columns, there is no need for a dynamic query like this at all. The hard work for the database is selecting the rows; it makes little difference whether it sends you 5 columns out of 10, or all 10.
In that case just use a "SELECT * FROM ..." and use Python to pick out the columns from the result set.
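A rough end-to-end sketch of the two-query approach (pymysql is just one possible driver; the connection details and the 'word%' pattern are placeholders for your own):
import pymysql

conn = pymysql.connect(host="localhost", user="user", password="pass", database="db")
with conn.cursor() as cur:
    # 1) discover the matching column names
    cur.execute(
        """SELECT column_name
           FROM information_schema.columns
           WHERE table_schema = DATABASE()
             AND table_name = %s
             AND column_name LIKE %s""",
        ("searchterms", "word%"),
    )
    columns = [row[0] for row in cur.fetchall()]

    # 2) build and run the real query with those column names
    cur.execute("SELECT {0} FROM searchterms WHERE onstate = 1".format(", ".join(columns)))
    rows = cur.fetchall()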
No, you cannot dynamically produce the list of columns to be selected. It will have to be hardcoded in your final query.
Your current query would produce a result set with one column and the value of that column would be the string "word%" in all rows that satisfy the condition.
You can generate the list of column names first by using
SHOW COLUMNS IN tblname LIKE "word%"
Then loop through the cursor and generate a SQL statement that uses all the columns returned by the query above:
"SELECT {0} FROM searchterms WHERE onstate = 1".format(', '.join(columns))
This could be helpful: MySQL wildcard in select
In conclusion, it is not possible in MySQL directly.
What you could do as a dirty workaround is get all the column names from the table with an initial query (http://dev.mysql.com/doc/refman/5.0/en/show-columns.html) and then check in Python which names match your pattern. Afterwards you can run the MySQL select statement with the found column names, like this:
SELECT word1, word2, word3 from searchterms where onstate = 1;

Load data into table in MySQL

I created 4 columns for a table
cur.execute("""CREATE TABLE videoinfo (
id INT UNSIGNED PRIMARY KEY AUTO INCREMENT,
date DATETIME NOT NULL,
src_ip CHAR(32),
hash CHAR(150));
""")
I have a .txt file which has three columns of data inside. I want to use the LOAD DATA LOCAL INFILE command to insert the data, but the problem is that the table I created now has four columns, the first one being the id. So, can MySQL automatically insert the data starting from the second column, or is an extra command needed?
Many thanks!
AUTO INCREMENT isn't valid syntax. If you check MySQL's documentation for the CREATE TABLE statement, you'll see the proper keyword is AUTO_INCREMENT.
Additionally, date is a keyword, so you'll need to quote it with backticks, as mentioned on the MySQL identifier documentation page. The documentation also lists all keywords, which must be quoted to use them as identifiers. To be safe, you could simply quote all identifiers.
To insert data only into some columns, you can explicitly specify columns. For LOAD DATA INFILE:
LOAD DATA INFILE 'file_name'
INTO TABLE videoinfo
(`date`, src_ip, hash)
For the INSERT statement:
INSERT INTO videoinfo (`date`, src_ip, hash)
VALUES (...);
This, too, is revealed in the MySQL manual. Notice a pattern?
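Putting both fixes together from Python might look like the sketch below (pymysql is just one possible driver, the file name is a placeholder, and LOAD DATA LOCAL requires local_infile to be enabled on both client and server):
import pymysql

cnxn = pymysql.connect(host="localhost", user="user", password="pass",
                       database="db", local_infile=True)
cur = cnxn.cursor()

cur.execute("""CREATE TABLE videoinfo (
                   id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
                   `date` DATETIME NOT NULL,
                   src_ip CHAR(32),
                   hash CHAR(150));
            """)

# Default LOAD DATA settings expect tab-separated columns; adjust FIELDS/LINES as needed.
cur.execute("""LOAD DATA LOCAL INFILE 'videoinfo.txt'
               INTO TABLE videoinfo
               (`date`, src_ip, hash)""")
cnxn.commit()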
