PostgreSQL: how to copy multiple columns from one table to another? - python

I am trying to copy some columns from a table called temporarytable to another one called scalingData, using psycopg2 in Python.
scalingData is a pandas DataFrame. The DataFrame contains data about cities: nameOfCities, population, etc.
scalingData = pd.read_csv('myFile.csv') ## 'myFile.csv' is the datasource
Each column of the DataFrame has a different dtype, such as 'int64', 'float64' or 'O'.
Here is a screenshot from Jupyter (its dtypes output resembles the sketch below):
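For reference, a minimal sketch of what that dtypes output could look like (the column names here are hypothetical stand-ins for the real CSV headers):

import pandas as pd

scalingData = pd.read_csv('myFile.csv')
print(scalingData.dtypes)
# Hypothetical output:
# nameOfCities     object    # text columns are reported as dtype 'O' / object
# population        int64
# density         float64
# dtype: object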
import psycopg2 as ps

## Populate table scalingData
tmp = scalingData.dtypes
header = list(scalingData.columns)  # assumed: defined earlier as the CSV headers
con = None
con = ps.connect(dbname='mydb', user='postgres', host='localhost', password='mypd')
con.autocommit = True
cur = con.cursor()
for i in range(0, 5):
    j = header[i]
    stat = """ ALTER TABLE "scalingData" ADD COLUMN "%s" """ % j
    if tmp[i] == 'int64':
        stat = stat + 'bigint'
    if tmp[i] == 'float64':
        stat = stat + 'double precision'
    if tmp[i] == 'O':
        stat = stat + 'text'
    ### Add Column
    cur.execute(stat)
    stat1 = """INSERT INTO "scalingData" ("%s") SELECT "%s" FROM temporarytable""" % (j, j)
    ### Copy Column
    cur.execute(stat1)
cur.close()
con.close()
My problem is that if I look at scalingData, only the first column is copied while the others stay empty.
Here is a screenshot of the table from pgAdmin after the query:
Also, if I instead copy the second column first, that one works, but then it fails for the others just the same.

This happens because you add one field to your new table, then insert rows with only that field set, and you do this five times. So you actually end up with five batches of rows, each with only one field populated.
You need to first set up the whole structure of your scalingData table, then insert all the records with all fields at once.
Here is the code (not a Python developer):
import psycopg2 as ps

## Populate table scalingData
tmp = scalingData.dtypes
con = None
con = ps.connect(dbname='mydb', user='postgres', host='localhost', password='mypd')
con.autocommit = True
cur = con.cursor()
for i in range(0, 5):
    j = header[i]
    stat = """ ALTER TABLE "scalingData" ADD COLUMN "%s" """ % j
    if tmp[i] == 'int64':
        stat = stat + 'bigint'
    if tmp[i] == 'float64':
        stat = stat + 'double precision'
    if tmp[i] == 'O':
        stat = stat + 'text'
    ### Add Column
    cur.execute(stat)
### After all columns exist, insert everything in one statement
fieldsStr = '"' + '", "'.join(header[:5]) + '"'  ### returns "header1", "header2", ..., "header5"
stat1 = """INSERT INTO "scalingData" (%s) SELECT %s FROM temporarytable""" % (fieldsStr, fieldsStr)
### Copy Table
cur.execute(stat1)
cur.close()
con.close()

I'm not familiar with Python, but just a guess as to where the issue might be coming from:
"""INSERT INTO "scalingData" ("%s") SELECT "%s" FROM temporarytable"""
... will transform the "%s" bit into "foo, bar, baz" rather than "foo", "bar", "baz".
Put another way, you should remove the unneeded double quotes from your statement and quote the individual column names instead.
Double quotes are used in PG to quote identifiers. You can literally have a table or column called "foo, bar, baz" and PG will work just fine when you do, provided it's always between double quotes when you use it in a statement.
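A safer way to build such statements is psycopg2's sql module, which quotes each identifier for you. A minimal sketch, assuming header holds the column names as in the question:

from psycopg2 import sql

cols = sql.SQL(', ').join(sql.Identifier(name) for name in header[:5])
stat1 = sql.SQL('INSERT INTO {} ({}) SELECT {} FROM {}').format(
    sql.Identifier('scalingData'), cols, cols, sql.Identifier('temporarytable'))
cur.execute(stat1)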

Related

How to update rows of an attribute in SQLite3 when dealing with date-time string

I am trying to patch a weather report database. I need to manually change each row of a date column, converting the old date format into a new format.
Bear in mind that this table and its attributes were created without any constraints, so int/str/bool/anything may well be in any of these rows and it should still work.
The new format is produced by the script below, a for loop that simply extracts the old database values and returns variables containing the formatted string.
import datetime
import re
import sqlite3

connection = sqlite3.connect('\\.db\\')
cursor = connection.cursor()
ROWID = cursor.execute('SELECT ROWID FROM update_test').fetchall()
Date = cursor.execute('SELECT Date FROM update_test').fetchall()
for row, dates in zip(ROWID, Date):  # tuples
    for i, x in zip(row, dates):  # strings
        try:
            weekdays = r'^...'
            regex = r'...-[\d\d]{1,}-Jan'
            new_year = re.findall(regex, x)
            for match in new_year:
                updated_dates = f'{"2022"}{re.sub(weekdays, "", match)}'
                date_object = datetime.datetime.strptime(updated_dates, '%Y-%d-%b').strftime('%Y-%m-%d')
                print(i, date_object)
                # update('test.db', 'update_test', 'date', date_object, i) # I want this bit to work
        except TypeError:
            pass
Now, I would normally just pass these variables into an update function such as this:
def update(url, table, setVar, setVal, setID):
    try:
        connection = sqlite3.connect(url)
        cursor = connection.cursor()
        try:
            cursor.execute(f'UPDATE {table} SET {setVar} = {setVal} WHERE ROWID = {setID}')
            connection.commit()
        except sqlite3.Error as error:
            print(f'Error: \n {error}')
        ...
        cursor.execute("SELECT name "
                       "FROM sqlite_master "
                       "WHERE type='table'")
        ...  # logging
        connection.close()
        ...  # logging
    except pandas.io.sql.DatabaseError:
        ...  # logging
But a really weird thing happens: only the year part of the formatted string gets updated, like so:
Additionally, when used in a for loop, this year often decrements by 1 per row: 2019, 2018, 2017 ... for each row passed to the update function.
My ideal output would be that the dates change into the new format I set up in that for loop (first script above), and only for the rows specified (which already works anyway).
update('test.db', 'update_test', 'date', date_object, i) # I want this bit to work
The problem is that you are doing your own substitutions into the SQL. You will end up with:
UPDATE table SET setVar = 2022-03-01 WHERE ROWID = xxx
SQLite sees that as an arithmetic expression: 2022 minus 3 minus 1 is 2018.
The short-term fix is to quote the value:
cursor.execute(f'UPDATE {table} SET {setVar} = "{setVal}" WHERE ROWID = {setID}')
A better fix is to let the connector do the substitution:
cursor.execute(f'UPDATE {table} SET {setVar} = ? WHERE ROWID = ?', (setVal, setID))
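To see the arithmetic interpretation in isolation, here is a minimal sketch against an in-memory SQLite database (table and column names made up):

import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE t (d TEXT)')
cur.execute('INSERT INTO t VALUES (NULL)')

# Unquoted, SQLite evaluates 2022-03-01 as integer subtraction -> 2018
cur.execute('UPDATE t SET d = 2022-03-01')
print(cur.execute('SELECT d FROM t').fetchone())   # (2018,)

# With parameter binding the value stays a string
cur.execute('UPDATE t SET d = ?', ('2022-03-01',))
print(cur.execute('SELECT d FROM t').fetchone())   # ('2022-03-01',)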
FOLLOWUP
As a side note, your regular expressions are totally unnecessary here.
import datetime
import sqlite3

connection = sqlite3.connect('\\.db\\')
cursor = connection.cursor()
rowset = cursor.execute('SELECT ROWID, Date FROM update_test')
for rowid, date in rowset:
    parts = date.split('-')
    if parts[2] == 'Jan':
        parts[0] = '2022'
        updated_dates = '-'.join(parts)
        date_object = datetime.datetime.strptime(updated_dates, '%Y-%d-%b').strftime('%Y-%m-%d')
        print(rowid, date_object)

Using variable in Snowflake SQL in Python script

I am trying to create a view that contains a variable in Snowflake SQL. The whole thing is done in a Python script. Initially I tried the bind-variable approach, but binding does not work in view-creation SQL. Is there any other way I can proceed? I have given the code below.
Code:
import snowflake.connector as sf
import pandas

ctx = sf.connect(
    user='floatinginthecloud89',
    password='',
    account='nq13914.southeast-asia.azure',
    warehouse='compute_wh',
    database='util_db',
    schema='public'
)
print("Got the context object")
cs = ctx.cursor()
print("Got the cursor object")
column1 = 'attr_name'
try:
    row = cs.execute("select listagg(('''' || attr_name || ''''), ',') from util_db.public.TBL_DIM;")
    rows = cs.fetchall()
    for row in rows:
        print(row)
    print(rows)
    row1 = cs.execute("""CREATE OR REPLACE table util_db.public.HIERARCHY_VIEW_2 AS
        SELECT * FROM (
            SELECT MSTR.PROD_CODE AS PROD_CODE, DIM.ATTR_NAME AS ATTR_NAME, MSTR.ATTR_VALUE AS ATTR_VALUE
            FROM TBL_DIM DIM
            INNER JOIN TBL_MSTR MSTR ON DIM.ATTR_KEY = MSTR.ATTR_KEY
        ) Q
        PIVOT (MAX (Q.ATTR_VALUE) FOR Q.ATTR_NAME IN (*row))
        AS P
        ORDER BY P.PROD_CODE;""")
    rows1 = cs.fetchall()
    for row1 in rows1:
        print(row1)
finally:
    cs.close()
    ctx.close()
Error:
File "C:\Users\Anand Singh\anaconda3\lib\site-packages\snowflake\connector\errors.py", line 179, in default_errorhandler
raise error_class(
ProgrammingError: 001003 (42000): SQL compilation error:
syntax error line 2 at position 65 unexpected 'row'.
Looking at the Python binding examples and your code, it appears you need:
row1 = cs.execute("""CREATE OR REPLACE table util_db.public.HIERARCHY_VIEW_2 AS
SELECT * FROM (
SELECT MSTR.PROD_CODE AS PROD_CODE,DIM.ATTR_NAME AS ATTR_NAME,MSTR.ATTR_VALUE AS ATTR_VALUE
FROM TBL_DIM DIM
INNER JOIN TBL_MSTR MSTR
ON DIM.ATTR_KEY=MSTR.ATTR_KEY
) Q
PIVOT (MAX (Q.ATTR_VALUE) FOR Q.ATTR_NAME IN (%s))
AS P
ORDER BY P.PROD_CODE;""", row)
but *row would pass many separate arguments, so I have changed it to build the comma-separated names as a single string.
A more Pythonic way to implement this is with an f-string:
row1 = cs.execute(f"""CREATE OR REPLACE table util_db.public.HIERARCHY_VIEW_2 AS
SELECT * FROM (
SELECT MSTR.PROD_CODE AS PROD_CODE,DIM.ATTR_NAME AS ATTR_NAME,MSTR.ATTR_VALUE AS ATTR_VALUE
FROM TBL_DIM DIM
INNER JOIN TBL_MSTR MSTR
ON DIM.ATTR_KEY=MSTR.ATTR_KEY
) Q
PIVOT (MAX (Q.ATTR_VALUE) FOR Q.ATTR_NAME IN ({row}))
AS P
ORDER BY P.PROD_CODE;""")
It is also more readable, especially if you have multiple parameters in the f-string.
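For illustration, here is what the interpolation amounts to, using a made-up listagg result:

# Hypothetical result of the listagg query: one tuple holding the
# already-quoted, comma-separated attribute names
row = ("'attr_name','attr_color','attr_size'",)
in_list = ','.join(row)
print(f"PIVOT (MAX (Q.ATTR_VALUE) FOR Q.ATTR_NAME IN ({in_list}))")
# PIVOT (MAX (Q.ATTR_VALUE) FOR Q.ATTR_NAME IN ('attr_name','attr_color','attr_size'))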
Issue resolved! Thanks a lot, Simeon for your help.
import snowflake.connector as sf
import pandas

ctx = sf.connect(
    user='floatinginthecloud89',
    password='',  # redacted
    account='nq13914.southeast-asia.azure',
    warehouse='compute_wh',
    database='util_db',
    schema='public'
)
print("Got the context object")
cs = ctx.cursor()
print("Got the cursor object")
column1 = 'attr_name'
try:
    row = cs.execute("select listagg(('''' || attr_name || ''''), ',') from util_db.public.TBL_DIM;")
    rows = cs.fetchall()
    for row in rows:
        print(row)
    print(rows)
    row1 = cs.execute("""CREATE OR REPLACE table util_db.public.HIERARCHY_VIEW_2 AS
        SELECT * FROM (
            SELECT MSTR.PROD_CODE AS PROD_CODE, DIM.ATTR_NAME AS ATTR_NAME, MSTR.ATTR_VALUE AS ATTR_VALUE
            FROM TBL_DIM DIM
            INNER JOIN TBL_MSTR MSTR ON DIM.ATTR_KEY = MSTR.ATTR_KEY
        ) Q
        PIVOT (MAX (Q.ATTR_VALUE) FOR Q.ATTR_NAME IN (%s))
        AS P
        ORDER BY P.PROD_CODE;""", ','.join(row))
    rows1 = cs.fetchall()
    for row1 in rows1:
        print(row1)

cx_oracle python iteration

I have a Python script which reads an Excel column row by row and returns all row values as strings.
I want to write another script that puts these values into a SQL database. I've already written the connect method:
import cx_Oracle

def db_connect():
    adr = 'some_addr'
    uid = 'some_uid'
    pwd = 'pwd'
    port = port  # placeholder
    dsn_tns = cx_Oracle.makedsn(adr, port, SID)
    db = cx_Oracle.connect('username', 'pass', dsn_tns)
    cur = db.cursor()
    cur.execute('update TABLE set ROW = 666 where ANOTHER_ROW is null')
    db.commit()
This method does an update, but it sets 666 for ALL rows. How can I do this with some kind of iteration in SQL? For example, first row of output == 1, second == 23, third == 888.
If I understand correctly what you are trying to do, it should be done in two phases: first select all rows to update (based on the chosen condition), then update each of those rows iteratively.
It cannot be done in a single query (or with a single condition that does not change across queries), because SQL works on sets. That's why each time your query is executed you update the whole table, and in the end only the result of the last query survives.
You can use the "rownum" expression, as in:
cur.execute("update TABLE set ROW = rownum where ANOTHER_ROW is null")
This will start with the value 1 and increment up by one for each row updated.
If you want more control over the value to set, you can also do the following in PL/SQL (untested):
cur.execute("""
    declare
        t_NewValue number;
        cursor c_Data is
            select ROW, ANOTHER_ROW
            from TABLE
            where ANOTHER_ROW is null
            for update;
    begin
        t_NewValue := 1;
        for row in c_Data loop
            update TABLE set ROW = t_NewValue
                where current of c_Data;
            t_NewValue := t_NewValue + 1;
        end loop;
    end;""")
This gives you the most control. You can use whatever logic you require to control what the new value should be.
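If the per-row values come from Python instead (as in the Excel case below), another option is to fetch the target ROWIDs and update them with bind variables via executemany. A sketch under the question's placeholder names (TABLE, ROW, ANOTHER_ROW):

cur.execute("SELECT rowid FROM TABLE WHERE ANOTHER_ROW IS NULL")
rowids = [r[0] for r in cur.fetchall()]
new_values = [1, 23, 888]  # one value per row, computed however you like

# Each (value, rowid) pair drives one UPDATE
cur.executemany("UPDATE TABLE SET ROW = :1 WHERE rowid = :2",
                list(zip(new_values, rowids)))
db.commit()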
Please take a look at another method, which reads the values from Excel:
adr = 'some_addr'
uid = 'some_uid'
pwd = 'pwd'
port = port  # placeholder
dsn_tns = cx_Oracle.makedsn(adr, port, SID)
db = cx_Oracle.connect('username', 'pass', dsn_tns)
cur = db.cursor()
cells = excel.read_from_cell()
indices_and_statuses = []
stat = execute_script(some_js)
for req_id in cells:
    indices_and_statuses.append((cells.index(req_id), stat))
    cur.execute("""update TABLE set ROW = """ + "'" + req_id + "'" + """ where ANOTHER_ROW is null""")
db.commit()
db.close()
And in this code, if you put print(req_id) inside this for loop, you will see that req_id changes on each iteration. But in the DB only the last req_id is saved.

Loop not working for sql update statement (mysqldb)

I have a folder called 'testfolder' that includes two files -- 'Sigurdlogfile' and '2004ADlogfile'. Each file has a list of strings called entries. I need to run my code on both of them and am using glob to do this. My code creates a dictionary for each file and stores data extracted using regex, where the dictionary keys are listed in commonterms below. It then inserts each dictionary into a mysql table. It does all of this successfully, but my second sql statement is not inserting how it should (per file).
import glob
import re

files = glob.glob('/home/user/testfolder/*logfile*')
commonterms = (["freq", "\s?(\d+e?\d*)\s?"],
               ["tx", "#txpattern"],
               ["rx", "#rxpattern"], ...)
terms = [commonterms[i][0] for i in range(len(commonterms))]
patterns = [commonterms[i][1] for i in range(len(commonterms))]

def getTerms(entry):
    for i in range(len(terms)):
        term = re.search(patterns[i], entry)
        if term:
            term = term.groups()[0] if term.groups()[0] is not None else term.groups()[1]
        else:
            term = 'NULL'
        d[terms[i]] += [term]
    return d

for filename in files:
    #code to create 'entries'
    objkey = re.match(r'/home/user/testfolder/(.+?)logfile', filename).group(1)
    d = {t: [] for t in terms}
    for entry in entries:
        d = getTerms(entry)

    import MySQLdb
    db = MySQLdb.connect(host='', user='', passwd='', db='')
    cursor = db.cursor()
    cols = d.keys()
    vals = d.values()
    for i in range(len(entries)):
        lst = [item[i] for item in vals]
        csv = "'{}'".format("','".join(lst))
        sql1 = "INSERT INTO table (%s) VALUES (%s);" % (','.join(cols), csv.replace("'NULL'", "NULL"))
        cursor.execute(sql1)

    #now in my 2nd sql statement I need to update the table with data from an old table, which is where I have the problem...
    sql2 = """UPDATE table, oldtable SET table.key1 = oldtable.key1,
        table.key2 = oldtable.key2 WHERE oldtable.obj = %s;""" % repr(objkey)
    cursor.execute(sql2)
    db.commit()
    db.close()
The problem is that in the second sql statement, it ends up inserting that data into all columns of the table from only one of the objkeys, but I need it to insert different data depending on which file the code is currently running on. I can't figure out why this is, since I've defined objkey inside my for filename in files loop. How can I fix this?
Instead of doing separate INSERT and UPDATE, do them together to incorporate the fields from the old table.
for i in range(len(entries)):
    lst = [item[i] for item in vals]
    csv = "'{}'".format("','".join(lst))
    sql1 = """INSERT INTO table (key1, key2, %s)
        SELECT o.key1, o.key2, a.*
        FROM (SELECT %s) AS a
        LEFT JOIN oldtable AS o ON o.obj = %s""" % (','.join(cols), csv.replace("'NULL'", "NULL"), repr(objkey))
    cursor.execute(sql1)
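For illustration, with hypothetical values (cols = ['freq', 'tx'], one row of values '433' and '#tx1', objkey = 'Sigurd'), the generated statement would read:

INSERT INTO table (key1, key2, freq,tx)
    SELECT o.key1, o.key2, a.*
    FROM (SELECT '433','#tx1') AS a
    LEFT JOIN oldtable AS o ON o.obj = 'Sigurd'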

Insert data instead of drop table into mysql

I'm attempting to get a Python script to insert data into a database without having it drop the table first. I'm sure this isn't hard to do, but I can't seem to get the code right.
Here is the full Python script:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import requests
import hashlib
import time
import MySQLdb

#Dont forget to fill in PASSWORD and URL TO saveTemp (twice) in this file
sensorids = ["28-000004944b63", "28-000004c01b2c"]
avgtemperatures = []
for sensor in range(len(sensorids)):
    temperatures = []
    for polltime in range(0, 3):
        text = ''
        while text.split("\n")[0].find("YES") == -1:
            # Open the file that we viewed earlier so that python can see what is in it. Replace the serial number as before.
            tfile = open("/sys/bus/w1/devices/" + sensorids[sensor] + "/w1_slave")
            # Read all of the text in the file.
            text = tfile.read()
            # Close the file now that the text has been read.
            tfile.close()
            time.sleep(1)
        # Split the text with new lines (\n) and select the second line.
        secondline = text.split("\n")[1]
        # Split the line into words, referring to the spaces, and select the 10th word (counting from 0).
        temperaturedata = secondline.split(" ")[9]
        # The first two characters are "t=", so get rid of those and convert the temperature from a string to a number.
        temperature = float(temperaturedata[2:])
        # Put the decimal point in the right place and store it.
        temperatures.append(temperature / 1000 * 9.0 / 5.0 + 32.0)
    avgtemperatures.append(sum(temperatures) / float(len(temperatures)))

print avgtemperatures[0]
print avgtemperatures[1]

#connect to db
db = MySQLdb.connect("localhost", "user", "password", "temps")
#setup cursor
cursor = db.cursor()
#create temps table
cursor.execute("DROP TABLE IF EXISTS temps")
sql = """CREATE TABLE temps (
    temp1 FLOAT,
    temp2 FLOAT )"""
cursor.execute(sql)
#insert to table
try:
    cursor.execute("""INSERT INTO temps VALUES (%s,%s)""", (avgtemperatures[0], avgtemperatures[1]))
    db.commit()
except:
    db.rollback()
#show table
cursor.execute("""SELECT * FROM temps;""")
print cursor.fetchall()
# sample output: ((188L, 90L),)
db.close()
This is the part I need assistance with:
If I have it drop the table it works fine, but I don't want it to drop the table, just insert the new data into the same table.
#connect to db
db = MySQLdb.connect("localhost", "user", "pasword1", "temps")
#setup cursor
cursor = db.cursor()
#create temps table
cursor.execute("DROP TABLE IF EXISTS temps")
sql = """CREATE TABLE temps (
    temp1 FLOAT,
    temp2 FLOAT )"""
cursor.execute(sql)
#insert to table
try:
    cursor.execute("""INSERT INTO temps VALUES (%s,%s)""", (avgtemperatures[0], avgtemperatures[1]))
    db.commit()
except:
    db.rollback()
#show table
cursor.execute("""SELECT * FROM temps;""")
print cursor.fetchall()
# sample output: ((188L, 90L),)
db.close()
You shouldn't have to drop the table each time you want to enter data. In fact, doing so defeats the whole purpose of a database, since you remove all the previous data on every run of your script.
You should ask to create the table only if it does not exist. Use the following:
sql = """CREATE TABLE IF NOT EXISTS temps (
    temp1 FLOAT,
    temp2 FLOAT )"""
cursor.execute(sql)
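With IF NOT EXISTS the statement is a no-op once the table exists, so repeated runs keep appending rows instead of wiping them. A quick sanity check, assuming the cursor from the script above:

cursor.execute("SELECT COUNT(*) FROM temps")
print cursor.fetchone()  # the count should grow by one row per script run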
I've had this problem with updating. Try adding COMMIT to the end of your SQL. I use psycopg2 to connect to a PostgreSQL database. Here is an example:
def simple_insert():
    sql = '''INSERT INTO films VALUES ('UA502', 'Bananas', 105, '1971-07-13', 'Comedy', '82 minutes'); COMMIT;'''
    try:
        conn = psycopg2.connect(database)
        cur = conn.cursor()
        cur.execute(sql)
    except:
        raise
I think your problem is that you're not saving the transaction, and the COMMIT command should fix it.
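For what it's worth, the more idiomatic psycopg2 pattern commits on the connection object instead of embedding COMMIT in the SQL; a minimal sketch, assuming database is a valid DSN string as above:

import psycopg2

conn = psycopg2.connect(database)
cur = conn.cursor()
cur.execute("INSERT INTO films VALUES ('UA502', 'Bananas', 105, "
            "'1971-07-13', 'Comedy', '82 minutes')")
conn.commit()  # explicit commit from Python, same effect as the trailing COMMIT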
