Test PostgreSQL query on a table with JSON columns using Python

Example query:
SELECT error->>'message' as message
FROM error_cases
In reality, my query is much more complicated, and I would like to make sure that future code changes won't change the data this query outputs. I would like to compare the result of this query with a known-good output I already have.
I am using the testing.postgresql library to create a temporary database, run the query, save the output and destroy the database.
My query uses the PostgreSQL ->> notation. I get the error:
psycopg2.errors.UndefinedFunction: operator does not exist: text ->> unknown
To reproduce, first I create the table:
cur.execute('CREATE TABLE error_cases (error TEXT NOT NULL)')
Then I insert data:
cur.execute("""INSERT INTO error_cases
VALUES ('{"message": "someMessage"}')""")
And select:
select (error->>'message') as message from error_cases
I've looked at sqlalchemy to query the data, but the problem is that I want to test this particular query I have. In sqlalchemy for retrieving JSON I can't use Postgresql ->> notation, which is in my query.
Is there any other way to run a query containing the ->> operator on a database created using testing.postgresql?

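I've just located the issue, and it is very basic: the ->> operator is not defined for TEXT columns, so the column has to be a JSON type. The table should be created with:
CREATE TABLE error_cases (error JSONB NOT NULL)
instead of:
CREATE TABLE error_cases (error TEXT NOT NULL)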
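For completeness, a regression test around the query might look roughly like the sketch below. It assumes psycopg2 and testing.postgresql are installed and that a PostgreSQL server binary is available on the machine; the function name run_query_on_temp_db is made up for illustration and is not part of either library.

```python
import json

# JSONB (or JSON) is what makes the ->> operator available on the column.
CREATE_SQL = "CREATE TABLE error_cases (error JSONB NOT NULL)"
INSERT_SQL = "INSERT INTO error_cases (error) VALUES (%s)"
QUERY_SQL = "SELECT error->>'message' AS message FROM error_cases"


def run_query_on_temp_db(rows):
    """Load `rows` (a list of dicts) into a throwaway database and run the query."""
    # Imported here so the module can be imported without these packages.
    import psycopg2
    import testing.postgresql

    # The temporary server is started on entry and cleaned up on exit.
    with testing.postgresql.Postgresql() as pg:
        conn = psycopg2.connect(**pg.dsn())
        try:
            with conn.cursor() as cur:
                cur.execute(CREATE_SQL)
                for row in rows:
                    # json.dumps produces the text that PostgreSQL casts to JSONB.
                    cur.execute(INSERT_SQL, (json.dumps(row),))
                cur.execute(QUERY_SQL)
                return [r[0] for r in cur.fetchall()]
        finally:
            conn.close()
```

In a test, the return value of run_query_on_temp_db([{"message": "someMessage"}]) would then be compared with the saved output, here the list ["someMessage"].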

Related

Describe Snowflake table from Azure Databricks

I want to issue a DESC TABLE SQL command for a Snowflake table from Azure Databricks, but I can't quite figure it out! I'm not getting any errors, but I'm not getting any results either. Here's the Python code I'm using:
options_vcp = {
    "sfUrl": snowflake_url,
    "sfUser": user,
    "sfPassword": password,
    "sfDatabase": db,
    "sfWarehouse": wh,
    "sfSchema": sch
}
sfUtils = sc._jvm.net.snowflake.spark.snowflake.Utils
sfUtils.runQuery(options_vcp, "DESC TABLE myTable")
I can download the Snowflake table using the "sfDatabase", "sfWarehouse", etc. values, so they seem to be correct. I can run the DESC TABLE command in Snowflake and get correct results. But the only output I'm getting from Databricks is this:
Out[1]: JavaObject id=o315
Does anyone know how to display this JavaObject or know of a different method to run DESC TABLE from Databricks?
From doc: Executing DDL/DML SQL Statements:
The runQuery method returns only TRUE or FALSE. It is intended for statements that do not return a result set, for example DDL statements like CREATE TABLE and DML statements like INSERT, UPDATE, and DELETE. It is not useful for statements that return a result set, such as SELECT or SHOW.
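An alternative approach is to query the INFORMATION_SCHEMA.COLUMNS view:
# sfOptions is the connection-options dict (options_vcp in the question)
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
df = (
    spark.read.format(SNOWFLAKE_SOURCE_NAME)
    .options(**sfOptions)
    .option("query", "SELECT * FROM information_schema.columns WHERE table_name ILIKE 'myTable'")
    .load()
)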
Related: Moving Data from Snowflake to Spark:
When using DataFrames, the Snowflake connector supports SELECT queries only.
Usage Notes
Currently, the connector does not support other types of queries (e.g. SHOW or DESC, or DML statements) when using DataFrames.
I suggest using get_ddl() in your select statement to get the object definition:
https://docs.snowflake.com/en/sql-reference/functions/get_ddl.html
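A rough sketch of that get_ddl() suggestion, going through the DataFrame reader since runQuery does not return a result set. The helper names get_ddl_query and read_table_ddl are made up for illustration, and the sketch assumes the connector accepts this SELECT in its "query" option; `spark` is the session Databricks provides and `options` is a connection dict like the one in the question.

```python
import re

# Data source name used in the Snowflake connector documentation.
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"


def get_ddl_query(table_name):
    """Build a SELECT that returns the DDL of `table_name` as a one-row result."""
    # Only plain (optionally dotted) identifiers are accepted, because the
    # name is interpolated into the SQL text.
    pattern = r"[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*){0,2}"
    if not re.fullmatch(pattern, table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    return f"SELECT GET_DDL('TABLE', '{table_name}') AS ddl"


def read_table_ddl(spark, options, table_name):
    """Run the GET_DDL query through the connector and return the DDL text."""
    df = (
        spark.read.format(SNOWFLAKE_SOURCE_NAME)
        .options(**options)
        .option("query", get_ddl_query(table_name))
        .load()
    )
    # One row, one column: the CREATE TABLE statement as a string.
    return df.collect()[0][0]
```

With the question's settings this would be called as read_table_ddl(spark, options_vcp, "myTable").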

Store json file in MS SQL

I have a JSON file. How can I store it in MS SQL, and read the file data back after storing it in the database?
I'm using a Python script to interact with SQL Server.
Note: I don't want to store the key-value pairs as individual records in the DB; I want to store the whole file in the DB using Python.
There is no specific data type for JSON in SQL Server, unlike say XML which has the xml data type.
If you are, however, storing JSON data in SQL Server then you will want to use an nvarchar(MAX). If you are on SQL Server 2016+ I also recommend adding a CHECK CONSTRAINT to the column to ensure that the JSON is valid, as otherwise parsing it (in SQL) will be impossible. You can check if a value is valid JSON using ISJSON. For example, if you were adding the column to an existing table:
ALTER TABLE dbo.YourTable ADD YourJSON nvarchar(MAX) NULL;
GO
ALTER TABLE dbo.YourTable ADD CONSTRAINT chk_YourTable_ValidJSON CHECK (ISJSON(YourJSON) = 1 OR YourJSON IS NULL);
SQL Server has a JSON data type for this. (Correction: this is wrong.)
If your version doesn't have one, you can just store the document as a string in a VARCHAR or TEXT column.
This article reckons NVARCHAR(MAX) is the answer for documents larger than 8 KB; for smaller documents you can use NVARCHAR(4000), which apparently performs better.
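On the Python side, storing the whole file as one string value could be sketched as follows. This assumes a DB-API connection that uses ? placeholders (as pyodbc does) and a hypothetical table JsonDocs with a name column and an nvarchar(MAX) column called body; neither name comes from the question.

```python
import json


def store_json(conn, name, json_text):
    """Insert a whole JSON document as a single string value."""
    json.loads(json_text)  # raises ValueError if the text is not valid JSON
    cur = conn.cursor()
    cur.execute("INSERT INTO JsonDocs (name, body) VALUES (?, ?)", (name, json_text))
    conn.commit()


def load_json(conn, name):
    """Read the stored document back and parse it into Python objects."""
    cur = conn.cursor()
    cur.execute("SELECT body FROM JsonDocs WHERE name = ?", (name,))
    row = cur.fetchone()
    return None if row is None else json.loads(row[0])
```

To store a file, read it as text first, for example with open(path, encoding="utf-8"), and pass the contents of f.read() to store_json.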

Adding a column from an existing BQ table to another BQ table using Python

I am trying to experiment with creating new tables from existing BQ tables, all within python. So far I've successfully created the table using some similar code, but now I want to add another column to it from another table - which I have not been successful with. I think the problem comes somewhere within my SQL code.
Basically what I want here is to add another column named "ip_address" and put all the info from another table into that column.
I've tried splitting up the two SQL statements and running them separately, and I've tried many different combinations of the commands (taking out CHAR, adding (32) after it, combining everything into one statement, etc.), but I still run into problems.
from google.cloud import bigquery
def alter(client, sql_alter, job_config, table_id):
    query_job = client.query(sql_alter, job_config=job_config)
    query_job.result()
    print(f'Query results appended to table {table_id}')
def main():
    client = bigquery.Client.from_service_account_json('my_json')
    table_id = 'ref.datasetid.tableid'
    job_config = bigquery.QueryJobConfig()
    sql_alter = """
        ALTER TABLE `ref.datasetid.tableid`
        ADD COLUMN ip_address CHAR;
        INSERT INTO `ref.datasetid.tableid` ip_address
        SELECT ip
        FROM `ref.datasetid.table2id`;
        """
    alter(client, sql_alter, job_config, table_id)
if __name__ == '__main__':
    main()
With this code, the current error is "400 Syntax error: Unexpected extra token INSERT at [4:9]" Also, do I have to continuously reference my table with ref.datasetid.tableid or can I write just tableid? I've run into errors before it gets there so I'm still not sure. Still a beginner so help is greatly appreciated!
At the time of this answer, BigQuery did not support ALTER TABLE or other DDL statements (current versions do support ALTER TABLE ... ADD COLUMN). Take a look at Modifying table schemas; there you can find an example of how to add a new column when you append data to a table during a load job.
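The schema can also be changed through the client library instead of SQL. The following is only a sketch, assuming the google-cloud-bigquery package and a client with permission to update the table; the function name add_string_column is made up, and the column is declared as STRING, which is BigQuery's string type.

```python
def add_string_column(client, table_id, column_name):
    """Append a NULLABLE STRING column to an existing BigQuery table."""
    # Imported here so the sketch can be read without the package installed.
    from google.cloud import bigquery

    table = client.get_table(table_id)  # fetches the current schema
    new_schema = list(table.schema)
    new_schema.append(bigquery.SchemaField(column_name, "STRING", mode="NULLABLE"))
    table.schema = new_schema
    # Only the schema field is sent in the update request.
    return client.update_table(table, ["schema"])
```

For the question's table this would be add_string_column(client, 'ref.datasetid.tableid', 'ip_address'). The new column is NULL for all existing rows; note that an INSERT ... SELECT appends new rows rather than filling the column on the rows already there.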

Can I somehow query all the existing tables in peewee / postgres?

I am writing a basic GUI for a program which uses Peewee. In the GUI, I would like to show all the tables which exist in my database.
Is there any way to get the names of all existing tables, let's say in a list?
Peewee has the ability to introspect Postgres, MySQL and SQLite for the following types of schema information:
Table names
Columns (name, data type, null?, primary key?, table)
Primary keys (column(s))
Foreign keys (column, dest table, dest column, table)
Indexes (name, sql*, columns, unique?, table)
You can get this metadata using the following methods on the Database class:
Database.get_tables()
Database.get_columns()
Database.get_indexes()
Database.get_primary_keys()
Database.get_foreign_keys()
So, instead of using a cursor and writing some SQL yourself, just do:
db = PostgresqlDatabase('my_db')
tables = db.get_tables()
For even more craziness, check out the reflection module, which can actually generate Peewee model classes from an existing database schema.
To get a list of the tables in your schema, make sure that you have established your connection and cursor and try the following:
cursor.execute("SELECT table_name FROM information_schema.tables WHERE table_schema='public'")
mytables = cursor.fetchall()
mytables = [x[0] for x in mytables]
I hope this helps.

Getting the table name to select on from the user

My problem is I have a SELECT query but the table to select the data from needs to be specified by the user, from an HTML file. Can anyone suggest a way to do this?
I am querying a postgres database and the SQL queries are in a python file.
Create a variable table_name and verify it contains only characters allowed in a table name. Then put it into the SQL query:
sql = "SELECT ... FROM {} WHERE ...".format(table_name) # replace ... with real sql
If you don't verify it and the user sends something nasty, the query is open to SQL injection, because the table name becomes part of the SQL text itself.
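The verification step could be sketched like this. The whitelist ALLOWED_TABLES and the function name build_select are assumptions made for illustration; in a real application the set would list the tables users are actually allowed to query.

```python
import re

# Hypothetical whitelist of tables that users may select from.
ALLOWED_TABLES = {"customers", "orders"}

# Letters, digits and underscores only, not starting with a digit.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_select(table_name):
    """Return a SELECT statement for a verified table name."""
    if not _IDENTIFIER_RE.fullmatch(table_name):
        raise ValueError(f"invalid table name: {table_name!r}")
    if table_name not in ALLOWED_TABLES:
        raise ValueError(f"table not allowed: {table_name!r}")
    return "SELECT * FROM {}".format(table_name)
```

Checking against a fixed set is stricter than checking characters alone, since it also stops users from reading tables they should not see. With psycopg2, another option is to quote the name with psycopg2.sql.Identifier instead of formatting the string by hand.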
