I am using the new psycopg3 driver to implement a COPY ... FROM STDIN load, as follows:
# Connect to an existing database
with psycopg.connect(f'postgresql://{config.USER_PG}:{config.PASS_PG}@{config.HOST_PG}:{config.PORT_PG}/{config.DATABASE_PG}') as conn:
    # Open a cursor to perform database operations
    with conn.cursor() as cur:
        cur.execute("TRUNCATE TABLE eim RESTART IDENTITY")
        # Copy the buffer into PG
        with cur.copy("COPY eim(FIMNO,SPV_COMPONENT_ID,EIM_TECHNOLOGY_NAME,EIM_TECHNOLOGY_VERSION,ACTUAL_REMEDIATION_DATE,INTENDED_REMEDIATION_DATE,ISSUESTATUS,CURRENT_MANU_PHASE,CURRENT_MANU_PHASE_START,CURRENT_MANU_PHASE_END,REMEDIATION_TYPE,ISSUE_COMPONENT_ID,MIDDLEWARE_INSTANCE_NAME) FROM STDIN WITH CSV") as copy:
            print(next(stream))  # stream is the CSV line buffer, defined elsewhere
            for data in stream:
                copy.write(data)
The code above works, but the strange thing is that if I omit the column list (FIMNO, SPV_COMPONENT_ID, ... MIDDLEWARE_INSTANCE_NAME, i.e. the same list as in the COPY statement above) it throws an error.
My data contain null values and the CSV looks like this (empty string for null):
,13,lot_of_other_data,
Apparently the command does not interpret the fields properly unless the column names are listed explicitly. I would like to make it work without the column names, perhaps by translating empty strings to NULL.
This is the ordinal position of my columns. Note that the last one is id, which has a serial default. I think that when I don't specify the columns, the positional mapping does not work properly.
"fimno" 1
"spv_component_id" 2
"eim_technology_name" 3
"eim_technology_version" 4
"obsolete_start_date" 5
"actual_remediation_date" 6
"intended_remediation_date" 7
"issuestatus" 8
"current_manu_phase" 9
"current_manu_phase_start" 10
"current_manu_phase_end" 11
"remediation_type" 12
"issue_component_id" 13
"middleware_instance_name" 14
"id" 15
Note the extra serial id at position 15; it is not present in the string buffer I am writing to the table.
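For what it's worth, a minimal sketch of the working form (assuming conn and stream are the same objects as in the snippet above): without a column list COPY expects a value for every table column, including the serial id, which is why the positional load fails; and in CSV format an unquoted empty field is already read as NULL.
# Sketch -- `conn` and `stream` as above; the column list matches the buffer.
columns = ("fimno, spv_component_id, eim_technology_name, eim_technology_version, "
           "actual_remediation_date, intended_remediation_date, issuestatus, "
           "current_manu_phase, current_manu_phase_start, current_manu_phase_end, "
           "remediation_type, issue_component_id, middleware_instance_name")
with conn.cursor() as cur:
    with cur.copy(f"COPY eim ({columns}) FROM STDIN WITH CSV") as copy:
        for data in stream:
            copy.write(data)   # unquoted empty CSV fields arrive as NULL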
Related
I am trying to create a table in mariadb using python. I have all the column names stored in a list as shown below.
collist = ['RR', 'ABPm', 'ABPs', 'ABPd', 'HR', 'SPO']
This is just a sample list; the actual list has 200 items. I am trying to create a table using the above collist elements as columns, with VARCHAR as the datatype for every column.
This is the code I am using to create a table
for p in collist:
    cur.execute('CREATE TABLE IF NOT EXISTS table1 ({} VARCHAR(45))'.format(p))
The above code executes, but only the first element of the list is added as a column in the table and I cannot see the remaining elements. I'd really appreciate some help with this.
You can build the string in 3 parts and then .join() those together. The middle portion is the column definitions, joining each item in the original list. This doesn't seem particularly healthy, both in the number of columns and in the fact that everything is VARCHAR(45), but that's your decision:
collist = ['RR', 'ABPm', 'ABPs', 'ABPd', 'HR', 'SPO']
query = ''.join(["CREATE TABLE IF NOT EXISTS table1 (",
                 ' VARCHAR(45), '.join(collist),
                 ' VARCHAR(45))'])
Because join only puts the separator between items, the last column's type has to be specified separately (the third item in the list passed to ''.join) to correctly close the query.
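For example, with the sample collist the built statement comes out as below (cur being the cursor from the question):
print(query)
# CREATE TABLE IF NOT EXISTS table1 (RR VARCHAR(45), ABPm VARCHAR(45), ABPs VARCHAR(45),
#                                    ABPd VARCHAR(45), HR VARCHAR(45), SPO VARCHAR(45))
cur.execute(query)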
NOTE: If the input data comes from user input then this would be susceptible to SQL injection since you are just formatting unknown strings in, to be executed. I am assuming the list of column names is internal to your program.
I have a column-family/table in cassandra-3.0.6 which has a column named "value", defined as a blob data type.
The CQLSH query select * from table limit 2; returns:
id     | name  | value
id_001 | john  | 0x010000000000000000
id_002 | terry | 0x044097a80000000000
If I read this value using cqlengine(Datastax Python Driver), I get the output something like:
{'id':'id_001', 'name':'john', 'value': '\x01\x00\x00\x00\x00\x00\x00\x00\x00'}
{'id':'id_002', 'name':'terry', 'value': '\x04@\x97\xa8\x00\x00\x00\x00\x00'}
Ideally the values in the "value" field should be 0 and 1514 for row 1 and row 2 respectively.
However, I am not sure how to convert the "value" bytes extracted using cqlengine to 0 and 1514. I tried a few methods like ord(), decode(), etc., but nothing worked. :(
Questions:
What is this format?
'\x01\x00\x00\x00\x00\x00\x00\x00\x00' or
'\x04@\x97\xa8\x00\x00\x00\x00\x00'?
How can I convert these values to 0 and 1514?
NOTE: I am using python 2.7.9 on Linux
Any help or pointers would be useful.
Thanks,
A blob is returned as a byte array in Python if you read it directly. What you are seeing is that byte array; cqlsh displays the same bytes as hex.
One way is to explicitly do the conversion in your query.
select id, name, blobasint(value) from table limit 3
There should be a conversion method with the Python driver as well.
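If you would rather decode on the Python side: judging purely from the two sample values (this is a guess, not something from the driver docs), the first byte looks like a marker and the remaining eight bytes look like a big-endian IEEE-754 double, which yields exactly 0 and 1514. A sketch for Python 2.7:
import struct

raw = '\x04@\x97\xa8\x00\x00\x00\x00\x00'   # the bytes returned for id_002

# Assumption: byte 0 is a marker, bytes 1..8 are a big-endian double.
marker = ord(raw[0])
(value,) = struct.unpack('>d', raw[1:])
print(int(value))                           # prints 1514

# The same unpacking of '\x01\x00\x00\x00\x00\x00\x00\x00\x00' gives 0.0.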
I've got a Python application that is using pandas to grok some Excel spreadsheets and insert values into an Oracle database.
For date cells that have a value, this works fine. For empty date cells I am inserting a NaT, which I would have thought would be fine, but in Oracle that becomes some weird invalid time that displays as "0001-255-255 00:00:00" (something like MAXINT or 0 being converted to a timestamp, I'm guessing?).
In[72]: x.iloc[0][9]
Out[72]: NaT
Above is the bit of data in the DataFrame, you can see it's a NaT.
But this is what I see in Oracle..
SQL> select TDATE from TABLE where id=5067 AND version=5;
TDATE
---------
01-NOVEMB
SQL> select dump("TDATE") from TABLE where id=5067 AND version=5;
DUMP("TDATE")
--------------------------------------------------------------------------------
Typ=12 Len=7: 100,101,255,255,1,1,1
I tried doing df.replace and/or df.where to convert NaT to None but I get assorted errors with either of these that seem to imply the substitution is not valid in that way.
Any way to ensure consistency of a null date across these datastores?!
This issue has been fixed in Pandas 0.15.0.
If you can, update to Pandas >= 0.15.0. Starting with that version, NaN and NaT are properly stored as NULL in the database.
After having performed some experiments, it appears that Pandas passes NaT to SQLAlchemy and down to cx_Oracle, which in turn blindly sends an invalid date to Oracle (which in turn does not complain).
Anyway, one workaround I was able to come up with is to add a BEFORE INSERT trigger to fix incoming timestamps. For that to work, you will have to create the table manually first.
-- Create the table
CREATE TABLE W ("ID" NUMBER(5), "TDATE" TIMESTAMP);
And then the trigger:
-- Create a trigger on the table
CREATE OR REPLACE TRIGGER fix_null_ts
BEFORE INSERT ON W
FOR EACH ROW WHEN (extract(month from new.tdate) = 255)
BEGIN
:new.tdate := NULL;
END;
/
After that, from Python, using pandas.DataFrame.to_sql(..., if_exists='append'):
>>> d = [{"id":1,"tdate":datetime.now()},{"id":2}]
>>> f = pd.DataFrame(d)
>>> f.to_sql("W",engine, if_exists='append', index=False)
# ^^^^^^^^^^^^^^^^^^
# don't drop the table! append data to an existing table
And check:
>>> result = engine.execute("select * from w")
>>> for row in result:
... print(row)
...
(1, datetime.datetime(2014, 10, 31, 1, 10, 2))
(2, None)
Beware that, if you ever need to write another DataFrame to the same table, you will first need to delete its content -- but not drop the table, otherwise you would lose the trigger at the same time. For example:
# Some new data
>>> d = [{"id":3}]
>>> f = pd.DataFrame(d)
# Truncate the table and write the new data
>>> engine.execute("truncate table w")
>>> f.to_sql("W",engine, if_exists='append', index=False)
>>> result = engine.execute("select * from w")
# Check the result
>>> for row in result:
... print(row)
...
(3, None)
I assume the data type of the date column in the Oracle database is DATE.
In that case, remember that a DATE holds a date part and a time part together. While loading into the database, make sure you use TO_DATE with a proper datetime format mask on the date literal.
That's for loading. For display, use TO_CHAR with a proper datetime format to see the value the way you want to read it.
Regarding the NULL values: unless you have a NOT NULL constraint, I don't see any issue with loading; the NULL values would be loaded as NULL anyway. If you want to manipulate the NULL values, use the NVL function with the value you want to substitute for NULL.
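A small sketch of that advice from Python with cx_Oracle (the connection string, the table name my_table and the columns id/tdate are placeholders, not from the question):
import cx_Oracle

conn = cx_Oracle.connect("user/password@host:1521/service")   # placeholder credentials
cur = conn.cursor()

# Load: TO_DATE with an explicit format mask on the date literal.
cur.execute(
    "INSERT INTO my_table (id, tdate) "
    "VALUES (:1, TO_DATE(:2, 'YYYY-MM-DD HH24:MI:SS'))",
    (1, '2014-10-31 01:10:02'))

# Display: TO_CHAR to format, NVL to substitute a value for NULL dates.
cur.execute(
    "SELECT id, TO_CHAR(NVL(tdate, DATE '1900-01-01'), 'YYYY-MM-DD HH24:MI:SS') "
    "FROM my_table")
print(cur.fetchall())

conn.commit()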
I have a database table with about 15 columns, and I'm using sqlalchemy to access the database. If I want to create a new row to add to the database table, and I want the values of every column (except the id) to be null, is there an easier and more elegant way to do this rather than doing:
new_object = table(col1=None, col2=None, col3=None, ...)  # all the way to column 15
The error I get when I only give the value of the id and no other parameters is as follows:
"TypeError: __init__() missing 15 required positional arguments:"...
and then it lists the 15 parameters I didn't assign values to.
The INSERT statement fills all columns of a table that are not mentioned explicitly with their respective column default. If none has been defined, NULL is the default default (sic!).
Plus, you can instruct Postgres to insert the column default with the keyword DEFAULT:
INSERT INTO tbl (id) VALUES (DEFAULT) RETURNING id;
Should do what you are after and return the newly created id.
Not sure how to translate this to your brand of ORM.
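A rough translation using SQLAlchemy Core (the table name tbl and the engine URL are placeholders); it just executes the same statement and lets the database fill in the defaults:
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:password@localhost/mydb")   # placeholder URL

with engine.begin() as conn:                      # commits on successful exit
    new_id = conn.execute(
        text("INSERT INTO tbl (id) VALUES (DEFAULT) RETURNING id")
    ).scalar()

print(new_id)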
I'm new to Python (3) and having a hard time finding relevant examples of how to handle the following scenario. I know this is on the verge of being a "what's best" question, but hopefully there is a clearly appropriate methodology for this.
I have csv data files that contain timestamps and then at least one column of data with a name defined by a master list (i.e. all possible column headers are known). For example:
File1.csv
date-time, data a, data b
2014-01-01, 23, 22
2014-01-01, 23, 22d
File2.csv
date-time, data d, data a
2014-01-01, 99, 20
2014-01-01, 100, 22
I've been going in circles trying to understand when to use tuples, lists, and dictionaries for this type of scenario for import into PostgreSQL. Since the column order can change and the set of columns differs between files (although always drawn from the master set), I'm not sure how best to generate a data set that includes the timestamp and columns and then perform an insert into a PostgreSQL table where unspecified columns are given a value.
Given the dynamic nature of the columns' presence and the need to maintain the relationship with the timestamp for the Postgresql import via psycopg, what is recommended? Lists, lists of lists, dictionaries, or tuples?
I'm not begging for specific code, just some guidance. Thanks.
You can use the csv module to parse the input file, and from its first row you can build (prepare) the psycopg INSERT statement with the column names and %s placeholders instead of values. For the rest of the rows, simply execute this statement with the row as the values:
import csv
import psycopg2

connect_string = 'dbname=test host=localhost port=5493 user=postgres password=postgres'
connection = psycopg2.connect(connect_string)
cursor = connection.cursor()

f = open(fn, 'rt')
try:
    reader = csv.reader(f)
    cols = []
    for row in reader:
        if not cols:
            # First row: the column headers -- build the INSERT statement once
            cols = row
            psycopg_marks = ','.join(['%s' for s in cols])
            insert_statement = "INSERT INTO xyz (%s) VALUES (%s)" % (','.join(cols), psycopg_marks)
            print(insert_statement)
        else:
            print(row)
            cursor.execute(insert_statement, row)
finally:
    f.close()
...
For your example you will have to correct the column names (e.g. strip the spaces so they are valid identifiers).
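Since you also asked about dictionaries: a hedged variant of the same idea using csv.DictReader, assuming the target table xyz has at least the columns present in the file (columns you don't mention in the INSERT simply get their column default, typically NULL):
import csv
import psycopg2

connection = psycopg2.connect('dbname=test host=localhost port=5493 user=postgres password=postgres')
cursor = connection.cursor()

with open(fn, 'rt') as f:                              # fn: path to one of the csv files
    reader = csv.DictReader(f, skipinitialspace=True)
    cols = reader.fieldnames                           # whatever columns this file happens to have
    col_list = ', '.join('"%s"' % c for c in cols)     # quote names like "date-time" / "data a"
    marks = ', '.join(['%s'] * len(cols))
    insert_statement = 'INSERT INTO xyz (%s) VALUES (%s)' % (col_list, marks)
    for row in reader:                                 # each row is a dict {column name: value}
        cursor.execute(insert_statement, [row[c] for c in cols])

connection.commit()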