I'm having a problem with SQLAlchemy when I try to execute a query. My script has been working fine, and every connection.execute worked well until now. Here is the code:
for i in listaUnificacion:
    usu = "'AUTO'"
    incabuniper = "'S'"
    sCodPersonaPr, sPers = i[0], i[1]
    engine = sqla.create_engine(URL_ORACLE)
    connection = engine.connect()
    seq_query = sqla.Sequence('SEQ_PERUNI')
    pnCodSecPerUni = connection.execute(seq_query)
    query = "INSERT INTO TABLE1(SEC, CD, CDUNIF, DATE, USU, INCABUNIPER) VALUES({0}, {1}, {2}, SYSDATE, {3}, {4})".format(pnCodSecPerUni, sCodPersonaPr, sPers, usu, incabuniper)
    query = sqla.text(query)
    print(query)
    connection.execute(query)
    query = "UPDATE TABLE2 SET type = 'M' WHERE cd = {}".format(sPers)
    connection.execute(query)
    query_uni = "DECLARE \
                     res varchar2(100); \
                     errorm varchar2(1000); \
                 BEGIN \
                     res := USER.FNC({0},{1},{2},'AUTO',errorm); \
                 END;".format(pnCodSecPerUni, sCodPersonaPr, sPers)
    query_uni = sqla.text(query_uni)
    connection.execute(query_uni)
    connection.close()
When I try to execute query_uni, it doesn't work, but it doesn't show any error either. Here is the execution with some prints:
PARES
(11005202, 11002071)
INSERT INTO TABLE1(SEC, CD, CDUNIF, DATE, USU, INCABUNIPER)
VALUES(1628226, 11005202, 11002071, SYSDATE, 'AUTO', 'S')  --> works fine
UPDATE TABLE2 SET type = 'M' WHERE cd = 11002071  --> works fine
DECLARE res varchar2(100); errorm varchar2(1000);
BEGIN res := USER.FNC(1628226,11005202,11002071,'AUTO',errorm); END;  --> DOESN'T WORK!
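As an aside, sqla.text also supports bound parameters, which avoids building the PL/SQL block with str.format and any quoting mistakes that come with it. A minimal sketch, not the asker's code; the :p_* bind names are made up for illustration:

# Hypothetical rewrite of the anonymous block with bind parameters;
# the driver substitutes the values, so no manual quoting is needed.
plsql = sqla.text("""
    DECLARE
        res varchar2(100);
        errorm varchar2(1000);
    BEGIN
        res := USER.FNC(:p_sec, :p_cod, :p_pers, 'AUTO', errorm);
    END;
""")
connection.execute(plsql, p_sec=pnCodSecPerUni, p_cod=sCodPersonaPr, p_pers=sPers)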
My connection setup is:
engine = create_engine("postgres://")
conn = engine.connect()
conn.autocommit = True
In a Flask route I am using this query:
result = conn.execute("""UPDATE business_portal SET business_name ="""+str(business_name)+""", name_tag ="""+str(business_tag)+""",name_atr = """+str(business_attr)+""", address =""" +str(address)+""",address_tag =""" +str(address_tag)+""", address_atr = """+str(address_attr)+""", city = """+str(city)+""", city_tag ="""+str(city_tag)+""", city_atr =""" +str(city_attr)+""", state = """+str(state)+""", state_tag = """+str(state_tag)+""",state_atr = """+str(state_attr)+""",zip_code = """+str(zipcode)+""",zip_tag ="""+str(zip_tag)+""",zip_atr ="""+str(zip_attr)+""",contact_number ="""+str(contact_number)+""",num_tag = """+str(contact_tag)+""", num_atr ="""+str(contact_attr)+""",domain ="""+str(domain)+""", search_url = """+str(search_url)+""",category =""" +str(category)+""", logo_path =""" +str(logo_path)+""" WHERE id=%s """,(id))
The above query accepts data without spaces (e.g. abcd), but when the data contains spaces (e.g. abcd efgh ijkl) it raises a syntax error.
Can anyone help me?
The values in the SET clause need to be quoted in the same way as the values in the WHERE clause.
>>> cur = conn.cursor()
>>> stmt = "UPDATE tbl SET col = %s WHERE id = %s"
>>>
>>> # Observe that the SET value is three separate characters
>>> cur.mogrify(stmt % ('a b c', 42))
b'UPDATE tbl SET col = a b c WHERE id = 42'
>>>
>>> # Observe that the SET value is a single, quoted value
>>> cur.mogrify(stmt, ('a b c', 42))
b"UPDATE tbl SET col = 'a b c' WHERE id = 42"
NB cursor.mogrify is a psycopg2 method that returns the query that would be sent to the server by cursor.execute: it doesn't actually execute the query.
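Applied to the question's Flask route, that means passing every SET value as a query parameter instead of concatenating strings; the driver then quotes values containing spaces correctly. A trimmed sketch covering only the first few columns (the remaining ones follow the same pattern):

stmt = """UPDATE business_portal
          SET business_name = %s, name_tag = %s, name_atr = %s,
              address = %s, address_tag = %s, address_atr = %s
          WHERE id = %s"""
result = conn.execute(stmt, (business_name, business_tag, business_attr,
                             address, address_tag, address_attr, id))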
We're working on a Python program where we have trouble sending data to our MySQL database. So far we select data from the database, do something with it in our Python program, and then want to send it back to the database.
Unfortunately, we're having some challenges, which we hope you can help us with.
We're receiving this error:
[SQL: INSERT INTO `Raw_Validated` (time_start, time_end, first_temp_lpn, first_temp_lpn_validated, second_temp_lpn, second_temp_lpn_validated, third_temp_lpn, third_temp_lpn_validated) VALUES (%s, %s, %s, %s, %s, %s, %s, %s)]
[parameters: ('2019-08-29 16:20:00', '2019-08-29 17:20:00', array([25.69]), 1, array([25.21]), 1, array([25.09]), 1)]
And we can conclude that instead of inserting a value, an array is inserted. We have no idea why this is happening or how we can prevent this, but instead of the parameters above, we want it to become like this:
[parameters: ('2019-08-29 16:20:00', '2019-08-29 17:20:00', 25.69, 1, 25.21, 1, 25.09, 1)]
We're running a for loop which iterates three times, which means we receive three 'a_temp' values that are saved into our list 'list_lpn_temp' (the for loop is not shown in the code snippet):
list_lpn_temp = []
new_list_lpn_temp = []
engine = create_engine("mysql://xxx:xxx@localhost/xxx")
conn = engine.connect()
a_temp = pd.read_sql('SELECT temperature FROM Raw_Data WHERE topic = "lpn1" AND timestamp > "%s" AND timestamp < "%s" ORDER BY timestamp DESC LIMIT 1' % (x, x+datetime.timedelta(minutes=20)), conn).astype(float).values
list_lpn_temp.extend(a_temp)
We then have another for loop (keep in mind that list_station_temp has not been initialized here, but it has been in our program):
for i in range(len(list_lpn_temp)):
    if -1.5 < list_station_temp[i] - list_lpn_temp[i] < 1.5:
        validated_lpn = 1
        list_validated.append(validated_lpn)
        new_list_lpn_temp.extend(list_lpn_temp[i])
        print(f'New LPN List = {new_list_lpn_temp}')
    else:
        validated_lpn = 0
        list_validated.append(validated_lpn)
We then prepare the data so we can send it to the database (there are a lot of uninitialized variables here, which we have initialized in our program but not in this snippet, as they simply don't matter). Only list_lpn_temp[] matters here:
df2 = pd.DataFrame(columns=['time_start', 'time_end', 'first_temp_lpn', 'first_temp_lpn_validated', 'second_temp_lpn', 'second_temp_lpn_validated', 'third_temp_lpn', 'third_temp_lpn_validated'])
df2 = df2.append({'time_start' : time_start, 'time_end' : time_end, 'first_temp_lpn' : list_lpn_temp[0], 'first_temp_lpn_validated' : list_validated[0], 'second_temp_lpn' : list_lpn_temp[1], 'second_temp_lpn_validated' : list_validated[1$
with engine.connect() as conn, conn.begin():
    df2.to_sql('Raw_Validated', conn, if_exists='append', index=False)
Just add one more level of indexing to all list_lpn_temp accesses, so list_lpn_temp[0] becomes list_lpn_temp[0][0], list_lpn_temp[1] becomes list_lpn_temp[1][0], and so on.
df2 = pd.DataFrame(columns=['time_start', 'time_end', 'first_temp_lpn', 'first_temp_lpn_validated', 'second_temp_lpn', 'second_temp_lpn_validated', 'third_temp_lpn', 'third_temp_lpn_validated'])
df2 = df2.append({'time_start' : time_start, 'time_end' : time_end, 'first_temp_lpn' : list_lpn_temp[0][0], 'first_temp_lpn_validated' : list_validated[0], 'second_temp_lpn' : list_lpn_temp[1][0], 'second_temp_lpn_validated' : list_validated[1$ # Your question cut this line off here also.
with engine.connect() as conn, conn.begin():
    df2.to_sql('Raw_Validated', conn, if_exists='append', index=False)
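An alternative, assuming each query really returns exactly one value, is to unwrap the scalar when reading, so the list never holds arrays in the first place. A sketch, with query standing in for the SELECT shown earlier:

# .values on a one-cell result gives array([[25.69]]); .item() unwraps it
# to a plain float (and raises if the result is not exactly one element).
a_temp = pd.read_sql(query, conn).astype(float).values.item()
list_lpn_temp.append(a_temp)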
Using psycopg2, I'm able to select data from a table in one PostgreSQL database connection and INSERT it into a table in a second PostgreSQL database connection.
However, I'm only able to do it by setting the exact feature I want to extract, and writing out separate variables for each column I'm trying to insert.
Does anyone know of a good practice for either:
moving an entire table between databases, or
iterating through features while not having to declare variables for every column you want to move
or...?
Here's the script I'm currently using where you can see the selection of a specific feature, and the creation of variables (it works, but this is not a practical method):
import psycopg2
connDev = psycopg2.connect("host=host1 dbname=dbname1 user=postgres password=*** ")
connQa = psycopg2.connect("host=host2 dbname=dbname2 user=postgres password=*** ")
curDev = connDev.cursor()
curQa = connQa.cursor()
sql = ('INSERT INTO "tempHoods" (nbhd_name, geom) values (%s, %s);')
curDev.execute('select cast(geom as varchar) from "CCD_Neighborhoods" where nbhd_id = 11;')
tempGeom = curDev.fetchone()
curDev.execute('select nbhd_name from "CCD_Neighborhoods" where nbhd_id = 11;')
tempName = curDev.fetchone()
data = (tempName, tempGeom)
curQa.execute(sql, data)
#commit transactions
connDev.commit()
connQa.commit()
#close connections
curDev.close()
curQa.close()
connDev.close()
connQa.close()
One other note is that Python lets you work explicitly with SQL functions and data-type casting, which for us is important because we work with the GEOMETRY data type. Above you can see I'm casting it to TEXT and then dumping it into an existing geometry column in the destination table. This also works with MS SQL Server, which is a huge feature in the geospatial community...
In your solution (your solution and your question order the statements differently), change the lines which start with 'sql =' and the loop before the '#commit transactions' comment to:
sql_insert = 'INSERT INTO "tempHoods" (nbhd_id, nbhd_name, typology, notes, geom) values '
sql_values = ['(%s, %s, %s, %s, %s)']
data_values = []
# you can make this larger if you want
# ...try experimenting to see what works best
batch_size = 100
sql_stmt = sql_insert + ','.join(sql_values * batch_size) + ';'
for i, row in enumerate(rows, 1):
    data_values += row[:5]
    if i % batch_size == 0:
        curQa.execute(sql_stmt, data_values)
        data_values = []
if i % batch_size != 0:
    sql_stmt = sql_insert + ','.join(sql_values * (i % batch_size)) + ';'
    curQa.execute(sql_stmt, data_values)
BTW, note that psycopg2 starts a transaction implicitly on the first execute, so the commits on connQa are needed for the inserts to persist; you only get to skip committing on connDev, where nothing but SELECTs ran.
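As an aside, psycopg2 also ships a helper that does this batching for you; a sketch using psycopg2.extras.execute_values on the same table and columns (page_size plays the role of batch_size):

from psycopg2.extras import execute_values

# execute_values expands the single VALUES %s placeholder into pages of
# row tuples, so no manual assembly of placeholder strings is needed.
execute_values(
    curQa,
    'INSERT INTO "tempHoods" (nbhd_id, nbhd_name, typology, notes, geom) VALUES %s',
    [row[:5] for row in rows],
    page_size=100,
)
connQa.commit()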
Here's my updated code based on Dmitry's brilliant solution:
import psycopg2
connDev = psycopg2.connect("host=host1 dbname=dpspgisdev user=postgres password=****")
connQa = psycopg2.connect("host=host2 dbname=dpspgisqa user=postgres password=****")
curDev = connDev.cursor()
curQa = connQa.cursor()
print "Truncating Source"
curQa.execute('delete from "tempHoods"')
connQa.commit()
#Get Data
curDev.execute('select nbhd_id, nbhd_name, typology, notes, cast(geom as varchar) from "CCD_Neighborhoods";') #cast geom to varchar and insert into geometry column!
rows = curDev.fetchall()
sql_insert = 'INSERT INTO "tempHoods" (nbhd_id, nbhd_name, typology, notes, geom) values '
sql_values = ['(%s, %s, %s, %s, %s)'] #number of columns selecting / inserting
data_values = []
batch_size = 1000 #customize for size of tables...
sql_stmt = sql_insert + ','.join(sql_values*batch_size) + ';'
for i, row in enumerate(rows, 1):
    data_values += row[:5]  # relates to number of columns (%s)
    if i % batch_size == 0:
        curQa.execute(sql_stmt, data_values)
        connQa.commit()
        print("Inserting...")
        data_values = []
if i % batch_size != 0:
    sql_stmt = sql_insert + ','.join(sql_values * (i % batch_size)) + ';'
    curQa.execute(sql_stmt, data_values)
    print("Last Values...")
    connQa.commit()
# close connections
curDev.close()
curQa.close()
connDev.close()
connQa.close()
I want to speed up one of my tasks and I wrote a little program:
import psycopg2
import random
from concurrent.futures import ThreadPoolExecutor, as_completed
def write_sim_to_db(all_ids2):
    if all_ids1[i] != all_ids2:
        c.execute("""SELECT count(*) FROM similarity WHERE prod_id1 = %s AND prod_id2 = %s""", (all_ids1[i], all_ids2,))
        count = c.fetchone()
        if count[0] == 0:
            sim_sum = random.random()
            c.execute("""INSERT INTO similarity(prod_id1, prod_id2, sim_sum)
                         VALUES(%s, %s, %s)""", (all_ids1[i], all_ids2, sim_sum,))
            conn.commit()
conn = psycopg2.connect("dbname='db' user='user' host='localhost' password='pass'")
c = conn.cursor()
all_ids1 = list(n for n in range(1000))
all_ids2_list = list(n for n in range(1000))
for i in range(len(all_ids1)):
    with ThreadPoolExecutor(max_workers=5) as pool:
        results = [pool.submit(write_sim_to_db, i) for i in all_ids2_list]
For a while, the program is working correctly. But then I get an error:
Segmentation fault (core dumped)
Or
*** Error in `python3': double free or corruption (out): 0x00007fe574002270 ***
Aborted (core dumped)
If I run this program in one thread, it works great.
with ThreadPoolExecutor(max_workers=1) as pool:
It seems PostgreSQL has no time to process the transactions, but I'm not sure; there are no errors in the log file.
I don't know how to track this error down.
Help.
I had to use a connection pool.
import psycopg2
import random
from concurrent.futures import ThreadPoolExecutor, as_completed
from psycopg2.pool import ThreadedConnectionPool
def write_sim_to_db(all_ids2):
    if all_ids1[i] != all_ids2:
        conn = tcp.getconn()
        c = conn.cursor()
        c.execute("""SELECT count(*) FROM similarity WHERE prod_id1 = %s AND prod_id2 = %s""", (all_ids1[i], all_ids2,))
        count = c.fetchone()
        if count[0] == 0:
            sim_sum = random.random()
            c.execute("""INSERT INTO similarity(prod_id1, prod_id2, sim_sum)
                         VALUES(%s, %s, %s)""", (all_ids1[i], all_ids2, sim_sum,))
            conn.commit()
        tcp.putconn(conn)

DSN = "postgresql://user:pass@localhost/db"
tcp = ThreadedConnectionPool(1, 10, DSN)
all_ids1 = list(n for n in range(1000))
all_ids2_list = list(n for n in range(1000))
for i in range(len(all_ids1)):
    with ThreadPoolExecutor(max_workers=2) as pool:
        results = [pool.submit(write_sim_to_db, i) for i in all_ids2_list]
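For context: psycopg2 documents connections as thread-safe but cursors as not shareable between threads, and the first version shared one cursor across all workers. The pool gives each task its own connection and cursor, which is what stops the crashes.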
This is the sane approach to speed it up. It will be much faster and simpler than your code.
tuple_list = []
for p1 in range(3):
    for p2 in range(3):
        if p1 == p2:
            continue
        tuple_list.append((p1, p2, random.random()))
insert = """
insert into similarity (prod_id1, prod_id2, sim_sum)
select prod_id1, prod_id2, i.sim_sum
from
(values
{}
) i (prod_id1, prod_id2, sim_sum)
left join
similarity s using (prod_id1, prod_id2)
where s is null
""".format(',\n '.join(['%s'] * len(tuple_list)))
print(cur.mogrify(insert, tuple_list))
cur.execute(insert, tuple_list)
Output:
insert into similarity (prod_id1, prod_id2, sim_sum)
select prod_id1, prod_id2, i.sim_sum
from
(values
(0, 1, 0.7316830646236253),
(0, 2, 0.36642199082207805),
(1, 0, 0.9830936499726003),
(1, 2, 0.1401200246162232),
(2, 0, 0.9921581283868096),
(2, 1, 0.47250175432277497)
) i (prod_id1, prod_id2, sim_sum)
left join
similarity s using (prod_id1, prod_id2)
where s is null
BTW there is no need for Python at all. It can all be done in a plain SQL query.
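For the record, a plain-SQL version might look like this sketch, assuming the product ids really are just 0-999 as in the example (generate_series stands in for a real products table):

insert into similarity (prod_id1, prod_id2, sim_sum)
select p1.id, p2.id, random()
from generate_series(0, 999) as p1(id)
cross join generate_series(0, 999) as p2(id)
left join similarity s on s.prod_id1 = p1.id and s.prod_id2 = p2.id
where p1.id <> p2.id
  and s is null;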
I have to construct a dynamic update query for postgresql.
It's dynamic because I have to determine beforehand which columns to update.
Given a sample table:
create table foo (id int, a int, b int, c int)
Then I will programmatically construct the "set" clause:
_set = {}
_set['a'] = 10
_set['c'] = None
After that I have to build the update query. And here I'm stuck.
I have to construct this sql Update command:
update foo set a = 10, c = NULL where id = 1
How can I do this with a psycopg2 parametrized command (i.e. loop through the dict if it is not empty and build the SET clause)?
UPDATE:
While I was sleeping, I found the solution myself. It is dynamic, exactly how I wanted it to be :-)
create table foo (id integer, a integer, b integer, c varchar)
updates = {}
updates['a'] = 10
updates['b'] = None
updates['c'] = 'blah blah blah'
sql = "upgrade foo set %s where id = %s" % (', '.join("%s = %%s" % u for u in updates.keys()), 10)
params = updates.values()
print cur.mogrify(sql, params)
cur.execute(sql, params)
And the result is exactly what I needed (especially the nullable and quotable columns):
"update foo set a = 10, c = 'blah blah blah', b = NULL where id = 10"
There is actually a slightly cleaner way to do it, using the alternative column-list syntax:
sql_template = "UPDATE foo SET ({}) = %s WHERE id = {}"
sql = sql_template.format(', '.join(updates.keys()), 10)
params = (tuple(updates.values()),)
print(cur.mogrify(sql, params))
cur.execute(sql, params)
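With the updates dict above, this mogrifies to roughly UPDATE foo SET (a, b, c) = (10, NULL, 'blah blah blah') WHERE id = 10; PostgreSQL treats the parenthesized right-hand side as a row value matching the column list.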
Using psycopg2.sql – SQL string composition module
The module contains objects and functions useful to generate SQL dynamically, in a convenient and safe way.
from psycopg2 import connect, sql
conn = connect("dbname=test user=postgres")
upd = {'name': 'Peter', 'age': 35, 'city': 'London'}
ref_id = 12
sql_query = sql.SQL("UPDATE people SET {data} WHERE id = {id}").format(
    data=sql.SQL(', ').join(
        sql.Composed([sql.Identifier(k), sql.SQL(" = "), sql.Placeholder(k)]) for k in upd.keys()
    ),
    id=sql.Placeholder('id')
)
upd.update(id=ref_id)
with conn:
    with conn.cursor() as cur:
        cur.execute(sql_query, upd)
conn.close()
Running print(sql_query.as_string(conn)) before closing connection will reveal this output:
UPDATE people SET "name" = %(name)s, "age" = %(age)s, "city" = %(city)s WHERE id = %(id)s
No need for dynamic SQL. Supposing a is not nullable and b is nullable.
If you want to update both a and b:
_set = dict(
    id = 1,
    a = 10,
    b = 20, b_update = 1
)
update = """
update foo
set
a = coalesce(%(a)s, a), -- a is not nullable
b = (array[b, %(b)s])[%(b_update)s + 1] -- b is nullable
where id = %(id)s
"""
print(cur.mogrify(update, _set))
cur.execute(update, _set)
Output:
update foo
set
a = coalesce(10, a), -- a is not nullable
b = (array[b, 20])[1 + 1] -- b is nullable
where id = 1
The expression (array[b, %(b)s])[%(b_update)s + 1] indexes a two-element array of the old and the new value, so b_update = 0 keeps the current b and b_update = 1 takes the new one. If you want to update neither column:
_set = dict(
    id = 1,
    a = None,
    b = 20, b_update = 0
)
Output:
update foo
set
a = coalesce(NULL, a), -- a is not nullable
b = (array[b, 20])[0 + 1] -- b is nullable
where id = 1
An option without Python format, using psycopg2's AsIs function for the column names (although that doesn't prevent SQL injection via the column names). The dict is named data:
from psycopg2.extensions import AsIs

update_statement = 'UPDATE foo SET (%s) = %s WHERE id_column = %s'
columns = data.keys()
values = [data[column] for column in columns]
query = cur.mogrify(update_statement, (AsIs(','.join(columns)), tuple(values), id_value))
cur.execute(query)
Here's my solution that I have within a generic DatabaseHandler class that provides a lot of flexibility when using pd.DataFrame as your source.
def update_data(
    self,
    table: str,
    df: pd.DataFrame,
    indexes: Optional[list] = None,
    column_map: Optional[dict] = None,
    commit: Optional[bool] = False,
) -> int:
    """Update data in the media database

    Args:
        table (str): the "tablename" or "namespace.tablename"
        df (pandas.DataFrame): dataframe containing the data to update
        indexes (list): the list of columns in the table that will be in the
            WHERE clause of the update statement. If not provided, will use
            df indexes.
        column_map (dict): dictionary mapping the columns in df to the columns
            in the table. Columns in the column_map that are also in indexes
            will not be updated.
            Key = df column.
            Value = table column.
        commit (bool): if True, the transaction will be committed (default=False)

    Notes:
        If using a column_map, only the columns in the column_map will be
        updated or used as indexes. Order does not matter. If not using a
        column_map, all columns in df must exist in table.

    Returns:
        int : rows updated
    """
    try:
        if not indexes:
            # Use the dataframe index instead
            indexes = []
            for c in df.index.names:
                if not c:
                    raise Exception(
                        "Dataframe contains indexes without names. Unable to determine update where clause."
                    )
                indexes.append(c)
        update_strings = []
        tdf = df.reset_index()
        if column_map:
            target_columns = [c for c in column_map.keys() if c not in indexes]
        else:
            column_map = {c: c for c in tdf.columns}
            target_columns = [c for c in df.columns if c not in indexes]
        for i, r in tdf.iterrows():
            upd_params = ", ".join(
                [f"{column_map[c]} = %s" for c in target_columns]
            )
            upd_list = [r[c] if pd.notna(r[c]) else None for c in target_columns]
            upd_str = self._cur.mogrify(upd_params, upd_list).decode("utf-8")
            idx_params = " AND ".join([f"{column_map[c]} = %s" for c in indexes])
            idx_list = [r[c] if pd.notna(r[c]) else None for c in indexes]
            idx_str = self._cur.mogrify(idx_params, idx_list).decode("utf-8")
            update_strings.append(f"UPDATE {table} SET {upd_str} WHERE {idx_str};")
        full_update_string = "\n".join(update_strings)
        print(full_update_string)  # Debugging
        self._cur.execute(full_update_string)
        rowcount = self._cur.rowcount
        if commit:
            self.commit()
        return rowcount
    except Exception as e:
        self.rollback()
        raise e
Example usages:
>>> df = pd.DataFrame([
{'a':1,'b':'asdf','c':datetime.datetime.now()},
{'a':2,'b':'jklm','c':datetime.datetime.now()}
])
>>> cls.update_data('my_table', df, indexes = ['a'])
UPDATE my_table SET b = 'asdf', c = '2023-01-17T22:13:37.095245'::timestamp WHERE a = 1;
UPDATE my_table SET b = 'jklm', c = '2023-01-17T22:13:37.095250'::timestamp WHERE a = 2;
>>> cls.update_data('my_table', df, indexes = ['a','b'])
UPDATE my_table SET c = '2023-01-17T22:13:37.095245'::timestamp WHERE a = 1 AND b = 'asdf';
UPDATE my_table SET c = '2023-01-17T22:13:37.095250'::timestamp WHERE a = 2 AND b = 'jklm';
>>> cls.update_data('my_table', df.set_index('a'), column_map={'a':'db_a','b':'db_b','c':'db_c'} )
UPDATE my_table SET db_b = 'asdf', db_c = '2023-01-17T22:13:37.095245'::timestamp WHERE db_a = 1;
UPDATE my_table SET db_b = 'jklm', db_c = '2023-01-17T22:13:37.095250'::timestamp WHERE db_a = 2;
Note however that this is not safe from SQL injection due to the way it generates the where clause.
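If that matters for your inputs, one option is to compose the per-row statement with the psycopg2.sql module shown in an earlier answer, so table and column names are quoted by the driver instead of being interpolated through f-strings. A rough sketch under assumed names ("my_table", SET columns b and c, index column a, and an existing cursor cur):

from psycopg2 import sql

# Hypothetical per-row statement: identifiers are composed safely, and the
# %s placeholders are still filled in by execute with the row's values.
stmt = sql.SQL("UPDATE {table} SET {sets} WHERE {conds}").format(
    table=sql.Identifier("my_table"),
    sets=sql.SQL(", ").join(
        sql.SQL("{} = %s").format(sql.Identifier(c)) for c in ["b", "c"]
    ),
    conds=sql.SQL(" AND ").join(
        sql.SQL("{} = %s").format(sql.Identifier(c)) for c in ["a"]
    ),
)
cur.execute(stmt, ["asdf", "2023-01-17 22:13:37", 1])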