Snowflake & SQLAlchemy "unexpected 'UNIQUE'" - python

I'm running a Python db migration script (Flask-Migrate) and have added the alembic.ddl.impl DefaultImpl import to get around the first set of errors, but now I'm getting the following. I'm trying to use this script to set up my tables and database in Snowflake. What am I missing? Everything else seems to be working, and I can't find any help on this particular error in the Snowflake documentation. I would have assumed that the Snowflake SQLAlchemy connector would handle the creation of a unique index.
The script so far does create several of the tables, but when it gets to this part it throws the error.
> sqlalchemy.exc.ProgrammingError:
> (snowflake.connector.errors.ProgrammingError) 001003 (42000): SQL
> compilation error: syntax error line 1 at position 7 unexpected
> 'UNIQUE'. [SQL: CREATE UNIQUE INDEX ix_flicket_users_token ON
> flicket_users (token)] (Background on this error at:
> http://sqlalche.me/e/f405)

Snowflake does not have INDEX objects, so any CREATE ... INDEX statement will fail.
With Snowflake, you have to trust the database to organize your data in micro-partitions and build a good access plan for your queries.
You will feel uneasy at first, but you will eventually stop worrying.
Bleeding-edge workloads will still require monitoring and tuning performance using the query history, but there is nothing new in that.
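In practice that means taking the index DDL out of the migration before running it against Snowflake. A minimal sketch, assuming a Flask-Migrate/Alembic autogenerated revision (the file name and column list are illustrative, not taken from the original project):

# migrations/versions/xxxx_create_flicket_users.py -- illustrative revision
from alembic import op
import sqlalchemy as sa

def upgrade():
    op.create_table(
        'flicket_users',
        sa.Column('id', sa.Integer(), nullable=False),
        sa.Column('token', sa.String(length=100), nullable=True),
        sa.PrimaryKeyConstraint('id'),
    )
    # Snowflake has no CREATE INDEX, so delete or comment out the
    # autogenerated index instead of letting it run:
    # op.create_index('ix_flicket_users_token', 'flicket_users', ['token'], unique=True)

A UNIQUE constraint declared on the table itself will still be accepted by Snowflake, but it is informational only and not enforced.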

Related

Great expectations framework - AWS Redshift connection

I'm trying to set up a connection to AWS Redshift from the Great Expectations Framework (GE) according to the tutorial using Python and facing two issues:
1. When I'm using postgresql+psycopg2 as the driver in the connection string in step 5, adding the datasource (context.add_datasource(**datasource_config)) takes extremely long (up to 20 minutes!). Validating expectations afterwards works as expected and even runs quite fast. I'm assuming the huge amount of time needed is due to the size of the Redshift cluster I'm connecting to (more than 1000 schemas) and the postgresql driver not being optimized for Redshift.
2. In search of alternatives to the postgresql driver I came across the sqlalchemy-redshift driver. Changing it in the connection string (redshift+psycopg2) adds the datasource instantly; however, validating some expectations (e.g. expect_column_values_to_not_be_null) fails! After some digging through the code I realized it might be due to GE creating a temporary table in the SQL query. So when I specify the query:
select * from my_redshift_schema.my_table;
GE actually seems to run something like:
CREATE TEMPORARY TABLE "ge_temp_bf3cbfa2" AS select * from my_redshift_schema.my_table;
For certain expectations sqlalchemy-redshift tries to look up information about the table's columns; however, it searches for the name of the temporary table rather than the actual one I specified in the SQL query. It consequently fails, as it obviously can't find a table with that name in the Redshift cluster. More specifically, it results in a KeyError in the dialect.py file within sqlalchemy-redshift:
.venv/lib/python3.8/site-packages/sqlalchemy_redshift/dialect.py", line 819, in _get_redshift_columns
return all_schema_columns[key]
KeyError: RelationKey(name='ge_temp_bf3cbfa2', schema='public')
Has anyone succeeded in running GE on Redshift? How could I mitigate the issues I'm facing (make option 1 faster, or fix the error in option 2)?
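For reference, the two connection strings looked roughly like this (host, port, database, and credentials are placeholders, not the real cluster):

# Option 1: add_datasource takes up to 20 minutes, validation works
pg_url = "postgresql+psycopg2://user:pass@my-cluster.abc123.eu-west-1.redshift.amazonaws.com:5439/dev"
# Option 2: add_datasource is instant, but some expectations fail with the KeyError above
rs_url = "redshift+psycopg2://user:pass@my-cluster.abc123.eu-west-1.redshift.amazonaws.com:5439/dev"

Everything else in datasource_config followed the GE tutorial; only the connection string changed between the two attempts.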

SQLAlchemy insert statement failing to insert, but no error

I am attempting to execute a raw SQL insert statement in SQLAlchemy. SQLAlchemy throws no errors when the constructed insert statement is executed, but the rows do not appear in the database.
As far as I can tell, it isn't a syntax error (see point 2 below), it isn't an engine error as the ORM can execute an equivalent write properly (see point 1), and it is finding the table it's supposed to write to (see point 3). I think it's a problem with a transaction not being committed, and I have attempted to address this (see point 4), but that hasn't solved the issue. Is it possible to create a nested transaction, and what would start the 'first' one, so to speak?
Thank you for any answers.
Some background:
1. I know that the ORM facilitates this and have used this feature and it works, but it is too slow for our application. We decided to try using raw SQL for this particular write function due to how often it's called, and the ORM for everything else. An equivalent method using the ORM works perfectly, and the same engine is used for both, so it can't be an engine problem, right?
2. I've issued an example of the SQL that the raw-SQL method constructs directly to the database, and it ran fine, so I don't think it's a syntax error.
3. It's communicating with the database properly and can find the table, as any syntax errors with table or column names throw a programmatic error, so it's not just throwing stuff into the 'void', so to speak.
4. My first thought after reading around was that it was a transaction error, that a transaction was being created and not closed, so I constructed the execute statement as follows to ensure a transaction was properly created and committed:
with self.Engine.connect() as connection:
connection.execute(Insert_Statement)
connection.commit
The so-called 'Insert Statement' has been converted to text using the sqlalchemy 'text' function. I don't quite understand why it won't execute if I pass the constructed string directly to the execute call, but I mention it in case it's relevant.
Other things that may be relevant:
Python 3 is running on one EC2 instance and the Postgres database on another. The table in question is a TimescaleDB hypertable taking real-time data, hence the need for very fast writes, but that is probably not relevant.
I'm currently using pg8000 as the dialect, for no particular reason other than that psycopg2 was throwing errors when trying to execute an equivalent method using the ORM.
Just so this question is answered in case anyone else ends up here:
The issue was a failure to call commit as a method (connection.commit() rather than connection.commit), as snakecharmerb pointed out. Gord Thompson also provided an alternative using engine.begin(), which commits automatically on success, rather than connect(), which gives a 'commit as you go' style of transaction.
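A minimal sketch of both working variants, assuming SQLAlchemy 2.0-style connections (or 1.4 with future=True); the table name and connection URL are placeholders:

from sqlalchemy import create_engine, text

engine = create_engine("postgresql+pg8000://user:pass@host/dbname")  # placeholder URL

# 'commit as you go': commit must be called as a method
with engine.connect() as connection:
    connection.execute(text("INSERT INTO readings (ts, value) VALUES (now(), 1.0)"))
    connection.commit()  # note the parentheses

# engine.begin() opens a transaction that commits automatically on success
with engine.begin() as connection:
    connection.execute(text("INSERT INTO readings (ts, value) VALUES (now(), 1.0)"))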

Insert data into DB2 using sqlalchemy, pyodbc and iaccess

I'm trying to insert some data into a DB2 table on an IBM iSeries (AS400) server, using sqlalchemy, pyodbc and the iaccess packages.
The server allows me to run SELECT and CREATE queries, but when I try to insert rows I get the following error:
sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('HY000', "[REDACTED]
SQL7008 - TABLE in DATABASE not valid for operations. (-7008)
(SQLExecDirectW)")
I'm executing the following query
INSERT INTO database.table VALUES ('A', 'B', 'C')
I know the query works because I am able to run it using the same credentials from Aqua data studio, a db management IDE.
I'm using the following python code to connect to the db:
from sqlalchemy import create_engine
import pandas as pd
engine_statement = f"iaccess+pyodbc://{user}:{pwd}#{server}/{schema_name}?DRIVER={driver}"
connection = create_engine(engine_statement)
I tried using ibmi instead of iaccess+pyodbc but nothing changes.
The closest question I found asks the same thing, but using Java.
I tried implementing the answer there in python, by setting the isolation_level option to all possible values, but still nothing changes.
I'm not 100% sure how journaling works, and therefore how to use it, so I was not able to implement point 2 of that answer.
If it helps, I am able to create new tables but not write to them, which seems surprising, but I'm no SQL expert so I guess I'm missing something.
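For reference, the isolation_level attempts looked roughly like this (a sketch only; driver, server, and credentials are placeholders, variables as defined above):

from sqlalchemy import create_engine

engine_statement = f"iaccess+pyodbc://{user}:{pwd}@{server}/{schema_name}?DRIVER={driver}"
# Tried each supported value here (AUTOCOMMIT, SERIALIZABLE, READ COMMITTED, ...),
# with no change in the SQL7008 error.
engine = create_engine(engine_statement, isolation_level="AUTOCOMMIT")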

MySQL ONLY_FULL_GROUP_BY error despite not having ONLY_FULL_GROUP_BY enabled

I've recently migrated a MySQL database from local to Google Cloud Platform in preparation for deployment. When I first did this, I encountered:
MySQLdb._exceptions.OperationalError: (1055, "Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'testing.Event.EventID' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by")
Annoying, but no problem: a quick search revealed it could be turned off in the flags section of the GCP console, and that seemed to be OK as I wasn't too worried about the risks of turning it off. This worked, or so I thought: the same issue continues to appear days after setting the "Traditional" flag on my GCP SQL instance.
Even when I run the query:
SELECT @@sql_mode;
The result I get is:
STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
This does not contain the only_full_group_by setting, and yet I still receive the error: this is incompatible with sql_mode=only_full_group_by
Is there some reason that this error would continue to appear despite not being in the setting that the error code says causes the error?
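One thing worth double-checking (a sketch only, assuming the same MySQLdb connection that raises the error; host and credentials are placeholders) is whether the session-level sql_mode seen by the application differs from the global one shown above, since a client or connection option can set it per session:

import MySQLdb

# Placeholders only -- not the real Cloud SQL host or credentials.
conn = MySQLdb.connect(host="<cloud-sql-host>", user="user", passwd="pass", db="testing")
cur = conn.cursor()
# Compare the instance-wide mode with what this particular session actually uses.
cur.execute("SELECT @@GLOBAL.sql_mode, @@SESSION.sql_mode")
print(cur.fetchone())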

python: Invalid data type (0) (SQLBindParameter)

I am using Python 2.7 for a specific job. I am connecting to MSSQL Server (2008) using FreeTDS. I can make some simple select queries, but when I try to run a parametrised query I get an error:
('HY004', '[HY004] [FreeTDS][SQL Server]Invalid data type (0) (SQLBindParameter)')
Here is my query:
query = u"UPDATE table SET column1=? WHERE column2=?"
cursor.execute(query,[param1, param2])
However, the same code works fine on the live server.
I have skimmed so many threads in various forums, but they all seem misleading and I am really confused.
What is my actual problem and what do you suggest?
Edit: I've added query.
I know this is a super old thread, but I came across this same problem and the solution for me was to type cast the variables. For instance:
query = u"UPDATE table SET column1=? WHERE column2=?"
cursor.execute(query,[str(param1), str(param2)])
In this case it doesn't really matter what type the parameters are, as they will be converted to strings.
