Import data from Python (problem with WHERE condition) - python

I work in Python.
I have code that imports a dataset, and it works fine. However, my dataset contains 3 different patients, and I would like to import only the patient that interests me (which should be possible by adding a WHERE clause to the SQL query).
So the following code works:
def importecdata():
    query2 = "SELECT TECDATA.[Vol_Recalage_US_VD], TECDATA.[Vol_Recalage_Us_VG], TECDATA.[SUBJID] FROM TECDATA INNER JOIN MEDDATA ON TECDATA.DateTime = MEDDATA.DateTime WHERE TECDATA.[SUBJID]='patient14';"
    dftec1 = pd.read_sql(query2, sql_conn, chunksize=100000)
    dftec = pd.concat(dftec1)
    return dftec
It returns the data for patient 14.
But now I want to pass the patient's name to my function as a variable, so I wrote the following code:
def importecdata(patient):
    query2 = "SELECT TECDATA.[Vol_Recalage_US_VD], TECDATA.[Vol_Recalage_Us_VG], TECDATA.[SUBJID] FROM TECDATA INNER JOIN MEDDATA ON TECDATA.DateTime = MEDDATA.DateTime WHERE TECDATA.[SUBJID]=patient;"
    dftec1 = pd.read_sql(query2, sql_conn, chunksize=100000)
    dftec = pd.concat(dftec1)
    return dftec
I checked, and the patient variable holds the value patient14, but it doesn't work. I also tried changing the value of the variable patient to 'patient14', and that doesn't work either; I get the same error:
invalid column name: 'patient'. So the code itself works; the problem comes from the WHERE condition with the patient variable.
(Sorry for my English, I'm French.)

You have to insert your patient value into the query string; check the code below:
def importecdata(patient):
    query2 = "SELECT TECDATA.[Vol_Recalage_US_VD], TECDATA.[Vol_Recalage_Us_VG], TECDATA.[SUBJID] FROM TECDATA INNER JOIN MEDDATA ON TECDATA.DateTime = MEDDATA.DateTime WHERE TECDATA.[SUBJID]='{0}';"
    query2 = query2.format(patient)
    dftec1 = pd.read_sql(query2, sql_conn, chunksize=100000)
    dftec = pd.concat(dftec1)
    return dftec
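As a side note (not part of the original answer): interpolating values into the SQL string works, but it is vulnerable to quoting mistakes and SQL injection. A safer sketch, assuming sql_conn is a pyodbc-style DB-API connection whose placeholder is ?, passes the value through the params argument of pd.read_sql:

def importecdata(patient):
    # The driver binds the value itself, so no manual quoting is needed.
    # The '?' placeholder assumes a pyodbc-style connection; other drivers
    # (e.g. psycopg2) use '%s' instead.
    query2 = ("SELECT TECDATA.[Vol_Recalage_US_VD], TECDATA.[Vol_Recalage_Us_VG], "
              "TECDATA.[SUBJID] FROM TECDATA "
              "INNER JOIN MEDDATA ON TECDATA.DateTime = MEDDATA.DateTime "
              "WHERE TECDATA.[SUBJID]=?;")
    dftec1 = pd.read_sql(query2, sql_conn, params=[patient], chunksize=100000)
    return pd.concat(dftec1)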

Related

Postgres: invalid input syntax for type date

I have created a database and I am trying to fetch data from it. I have a class Query, and inside the class I have a function that queries a table called forecasts. The function is as follows:
def forecast(self, provider: str, zone: str = 'Mainland'):
    self.date_start = date_start
    self.date_end = date_end
    self.df_forecasts = pd.DataFrame()
    fquery = """
    SELECT dp.name AS provider_name, lf.datetime_from AS date, fr.name AS run_name, lf.value AS value
    FROM load_forecasts lf
    INNER JOIN bidding_zones bz ON lf.zone_id = bz.zone_id
    INNER JOIN data_providers dp ON lf.provider_id = dp.provider_id
    INNER JOIN forecast_runs fr ON lf.run_id = fr.run_id
    WHERE bz.name = '{zone}'
    AND dp.name = '{provider}'
    AND date(lf.datetime_from) BETWEEN '{self.date_start}' AND '{self.date_end}'
    """
    df_forecasts = pd.read_sql_query(fquery, self.connection)
    return df_forecasts
In the script that I run, I call the Query class with my inputs:
query = Query(date_start, date_end)
and then call the function:
forecast_df = query.forecast(provider='Meteologica')
I run my script from the command line in the classic way:
python myscript.py '2022-11-10' '2022-11-18'
My script shows the error
sqlalchemy.exc.DataError: (psycopg2.errors.InvalidDatetimeFormat) invalid input syntax for type date: "{self.date_start}"
LINE 9: AND date(lf.datetime_from) BETWEEN '{self.date_start...
when I use this syntax, but when I manually input the strings for date_start and date_end, it works.
(The root cause is that fquery is a plain string rather than an f-string, so the {self.date_start} placeholders are sent to Postgres literally.) I could not find a way to solve the problem with SQLAlchemy, so I opened a cursor with psycopg2 and used bound parameters instead.
# Returns the datetime, value, provider name and issue date of the forecasts in the load_forecasts table
# The date range is specified by the user when the class is called
def forecast(self, provider: str, zone: str = 'Mainland'):
    # Open a cursor to get the data
    cursor = self.connection.cursor()
    # Query to run
    query = """
    SELECT dp.name, lf.datetime_from, fr.name, lf.value, lf.issue_date
    FROM load_forecasts lf
    INNER JOIN bidding_zones bz ON lf.zone_id = bz.zone_id
    INNER JOIN data_providers dp ON lf.provider_id = dp.provider_id
    INNER JOIN forecast_runs fr ON lf.run_id = fr.run_id
    WHERE bz.name = %s
    AND dp.name = %s
    AND date(lf.datetime_from) BETWEEN %s AND %s
    """
    # Execute the query, fetch the data and close the cursor
    cursor.execute(query, (zone, provider, self.date_start, self.date_end))
    self.df_forecasts = cursor.fetchall()
    cursor.close()
    return self.df_forecasts
If anyone finds the answer with sqlalchemy, I would love to see it!
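For reference, here is a sketch of what the SQLAlchemy version could look like, using text() with named bind parameters instead of string interpolation (an illustration, not tested against this schema):

from sqlalchemy import text

def forecast(self, provider: str, zone: str = 'Mainland'):
    fquery = text("""
        SELECT dp.name AS provider_name, lf.datetime_from AS date,
               fr.name AS run_name, lf.value AS value
        FROM load_forecasts lf
        INNER JOIN bidding_zones bz ON lf.zone_id = bz.zone_id
        INNER JOIN data_providers dp ON lf.provider_id = dp.provider_id
        INNER JOIN forecast_runs fr ON lf.run_id = fr.run_id
        WHERE bz.name = :zone
          AND dp.name = :provider
          AND date(lf.datetime_from) BETWEEN :date_start AND :date_end
    """)
    # pandas forwards params to the driver, which binds the values safely
    return pd.read_sql_query(fquery, self.connection,
                             params={'zone': zone, 'provider': provider,
                                     'date_start': self.date_start,
                                     'date_end': self.date_end})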

Use query string template from config in bigquery result

I am trying to use a dynamic query-string template from a config file and format it in the result section, but the query string prints as a static value. I expect the DB result to be populated dynamically.
Code:
def bq_exec_sql(sql):
    client = bigquery.Client(project='development')
    return client.query(sql)

def generate_select(sql, templete_select):
    job = bq_exec_sql(sql)
    result = ''
    print(templete_select)
    try:
        for row in job:
            result += templete_select
        print(result)
    except:
        print('error')

if __name__ == '__main__':
    for source in dashboard_activity_list:
        sql = config.get(source).source_select  # from config file
        templete_select = config.get(source).select_template  # from config file
        generate_select(sql, templete_select)
Actual Output:
select '"+row['table_id']+"' as table_id, '"+row['frequency']+"' as frequency from `trend-dev.test.table1`
select '"+row['table_id']+"' as table_id, '"+row['info']+"' as info from `trend-dev.test.table2`
Expected Output:
select table_name1 as table_id, daily from `trend-dev.test.table1`
select table_name2 as table_id, chart_data from `trend-dev.test.table2`
Config file:
dashboard_activity_list: [source_partner_trend, source_partner_normal]
source_partner_trend:
    source_select: select * from `trend-dev.test.trend_partner`
    source_key: trend_source_partner
    distination_table: test.trend_partner_dashboard_feed
    select_template: select '"+row['table_id']+"' as table_id, '"+row['frequency']+"' as frequency from `trend-dev.test.table1`
source_partner_normal:
    source_select: select * from `trend-dev.test.normal_partner`
    source_key: normal_source_partner
    distination_table: test.normal_partner_dashboard_feed
    select_template: select '"+row['table_id']+"' as table_id, '"+row['info']+"' as info from `trend-dev.test.table2`
I replicated your code, and it appears that the whole query is passed in the templete_select variable as a text string. I modified it so that the row value is split out of the text string in the select statement, which produced the expected output.
Working Snippet:
from google.cloud import bigquery

def bq_exec_sql(sql):
    client = bigquery.Client(project='development')
    return client.query(sql)

def generate_select(sql, templete_select):
    job = bq_exec_sql(sql)
    result = ''
    try:
        for row in job:
            result += templete_select.format(row['table_id'])
        print(result)
    except:
        print('error')

if __name__ == '__main__':
    sql = 'select * from `<table_name>` limit 2'
    templete_select = " select {} as table_id"
    generate_select(sql, templete_select)
I found that one of the best options for this question is to use the eval() method to render the dynamically formatted string for each row of the result loop:
result += eval(f'f{templete_select!r}')
Ref : https://python-forum.io/thread-24481.html
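To make the eval() trick concrete, here is a minimal sketch; it assumes the config template uses f-string placeholders such as {row['table_id']} (the row dict below is a stand-in for a BigQuery row):

row = {'table_id': 'table_name1', 'frequency': 'daily'}  # stand-in for a BigQuery row
templete_select = "select {row['table_id']} as table_id, {row['frequency']} as frequency"
# f'f{templete_select!r}' builds the source text of an f-string literal,
# and eval() then evaluates it with `row` in scope
result = eval(f'f{templete_select!r}')
print(result)  # select table_name1 as table_id, daily as frequency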

Troubleshooting uncooperative For Loop of SQLalchemy results

Looking for a second set of eyes here. I cannot figure out why the following loop will not continue past the first iteration.
The servicestocheck SQLAlchemy query returns 45 rows in my test, but I cannot iterate through the results as I'm expecting, and no errors are returned. All of the functionality works on the first iteration.
Anyone have any ideas?
def serviceAssociation(current_contact_id, perm_contact_id):
    servicestocheck = oracleDB.query(PORTAL_CONTACT).filter(
        PORTAL_CONTACT.contact_id == current_contact_id
    ).order_by(PORTAL_CONTACT.serviceID).count()
    print(servicestocheck)  # returns 45 items
    servicestocheck = oracleDB.query(PORTAL_CONTACT).filter(
        PORTAL_CONTACT.contact_id == current_contact_id
    ).order_by(PORTAL_CONTACT.serviceID).all()
    for svc in servicestocheck:
        # Check to see if the association already exists
        check_existing_association = mysqlDB.query(CONTACTTOSERVICE).filter(
            CONTACTTOSERVICE.contact_id == perm_contact_id,
            CONTACTTOSERVICE.serviceID == svc.serviceID
        ).first()
        # If no existing association
        if check_existing_association is None:
            print("Prepare Association")
            assoc_contact_id = perm_contact_id
            assoc_serviceID = svc.serviceID
            assoc_role_billing = False
            assoc_role_technical = False
            assoc_role_commercial = False
            if svc.contact_type == 'Billing':
                assoc_role_billing = True
            if svc.contact_type == 'Technical':
                assoc_role_technical = True
            if svc.contact_type == 'Commercial':
                assoc_role_commercial = True
            try:
                newAssociation = CONTACTTOSERVICE(
                    assoc_contact_id, assoc_serviceID,
                    assoc_role_billing, assoc_role_technical,
                    assoc_role_commercial)
                mysqlDB.add(newAssociation)
                mysqlDB.commit()
                mysqlDB.flush()
            except Exception as e:
                print(e)
This function is called from a script, and that call happens inside another loop. I can't find any issues with the nested loops.
Ended up being an issue with SQLAlchemy ORM (see SqlAlchemy not returning all rows when querying table object, but returns all rows when I query table object column)
I think the issue is that one of the tables above does not have a primary key in real life, and adding a fake one did not help. (I don't have access to the DB to add a key.)
Rather than fight it further... I went ahead and wrote raw SQL to move my project along.
This did the trick:
query = ('SELECT * FROM PORTAL_CONTACT WHERE contact_id = '
         + str(current_contact_id) + ' ORDER BY contact_id ASC')
servicestocheck = oracleDB.execute(query)
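As a side note (not part of the original answer), a bound parameter avoids both the missing-space class of bug that string concatenation invites and SQL injection. A sketch, assuming oracleDB is a SQLAlchemy session:

from sqlalchemy import text

# :cid is bound by the driver, so no str() conversion or quoting is needed
query = text('SELECT * FROM PORTAL_CONTACT '
             'WHERE contact_id = :cid '
             'ORDER BY contact_id ASC')
servicestocheck = oracleDB.execute(query, {'cid': current_contact_id})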

python cassandra get big result of select * in generator (without storage result in ram)

I want to get all of the data in the Cassandra table "user". I have 840000 users and I don't want to load all of them into one Python list; I want to get the users in batches of 100.
The Cassandra docs https://datastax.github.io/python-driver/query_paging.html say I can use fetch_size, but in my Python code I have a Database object that contains all the CQL instructions:
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

class Database:
    def __init__(self):
        self.cluster = Cluster(['192.168.1.1', '192.168.1.2'])
        self.session = self.cluster.connect()

    def get_users(self):
        users_list = []
        query = "SELECT * FROM users"
        statement = SimpleStatement(query, fetch_size=10)
        for user_row in self.session.execute(statement):
            users_list.append(user_row.name)
        return users_list
Currently get_users returns a very big list of user names, but I want to turn get_users into a "generator": instead of getting all the user names in one list from a single call, I want to call get_users repeatedly and have each call return a list of at most 100 users.
For example:
list1 = database.get_users()
list2 = database.get_users()
...
listn = database.get_users()
where list1 contains the first 100 users from the query, list2 the second 100 users, and listn the latest elements of the query (<=100).
Is this possible?
Thanks in advance for your answer.
According to Paging Large Queries:
"Whenever there are no more rows in the current page, the next page will be fetched transparently."
So if you execute your code as-is, you will still get the whole result set, but it is paged in a transparent manner.
To achieve what you need, use callbacks instead. You can also find a code sample at the link above.
I added the full code below for reference.
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
from threading import Event

class PagedResultHandler(object):
    def __init__(self, future):
        self.error = None
        self.finished_event = Event()
        self.future = future
        self.future.add_callbacks(
            callback=self.handle_page,
            errback=self.handle_error)

    def handle_page(self, rows):
        for row in rows:
            process_row(row)
        if self.future.has_more_pages:
            self.future.start_fetching_next_page()
        else:
            self.finished_event.set()

    def handle_error(self, exc):
        self.error = exc
        self.finished_event.set()

def process_row(user_row):
    print(user_row.name, user_row.age, user_row.email)

cluster = Cluster()
session = cluster.connect()
query = "SELECT * FROM myschema.users"
statement = SimpleStatement(query, fetch_size=5)
future = session.execute_async(statement)
handler = PagedResultHandler(future)
handler.finished_event.wait()
if handler.error:
    raise handler.error
cluster.shutdown()
Moving to the next page happens in handle_page when start_fetching_next_page is called.
If you replace that if statement with self.finished_event.set(), you will see that the iteration stops after the first 5 rows, as defined by fetch_size.
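Alternatively, closer to what the question asked for: since the synchronous driver pages transparently, get_users can be turned into a generator that yields fixed-size batches on top of a single execute() call. A sketch, written as a method of the Database class above:

def get_users(self, batch_size=100):
    # The driver fetches pages of batch_size rows on demand; this generator
    # regroups them into lists of at most batch_size names without ever
    # holding all 840000 rows in memory.
    statement = SimpleStatement("SELECT * FROM users", fetch_size=batch_size)
    batch = []
    for user_row in self.session.execute(statement):
        batch.append(user_row.name)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

Usage: for users in database.get_users(): ... , where each users is a list of at most 100 names.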

PyODBC SQL type error when reading query in Pandas

Looking for some help with a specific error when I write out the results of a query from a pyodbc connection. How do I fix this error:
ODBC SQL type -360 is not yet supported. column-index=1 type=-360', 'HY106' error from PYODBC
Here is my code:
import pyodbc
import pandas as pd
import sqlparse
import textwrap

## Function created to read a SQL query from a file
def create_query_string(sql_full_path):
    with open(sql_full_path, 'r') as f_in:
        lines = f_in.read()
    # remove any common leading whitespace from every line
    query_string = textwrap.dedent("""{}""".format(lines))
    ## remove comments from the SQL code
    query_string = sqlparse.format(query_string, strip_comments=True)
    return query_string

query_string = create_query_string("Bad Code from R.sql")
## initializes the connection string
curs = conn.cursor()
df = pd.read_sql(query_string, conn)
df.to_csv("TestSql.csv", index=None)
We are using the following SQL code in the query string:
SELECT loss_yr_qtr_cd,
CASE
WHEN loss_qtr_cd <= 2 THEN loss_yr_num
ELSE loss_yr_num + 1
END AS LOSS_YR_ENDING,
snap_yr_qtr_cd,
CASE
WHEN snap_qtr_cd <= 2 THEN snap_yr_num
ELSE snap_yr_num + 1
END AS CAL_YR_ENDING,
cur_ctstrph_loss_ind,
clm_symb_grp_cd,
adbfdb_pol_form_nm,
risk_st_nm,
wrt_co_nm,
wrt_co_part_cd,
src_of_bus_cd,
rt_zip_dlv_ofc_cd,
cur_rst_rt_terr_cd,
Sum(xtra_cntrc_py_amt) AS XTRA_CNTRC_PY_AMT
FROM (SELECT DT.loss_yr_qtr_cd,
DT.loss_qtr_cd,
DT.loss_yr_num,
SNAP.snap_yr_qtr_cd,
SNAP.snap_qtr_cd,
SNAP.snap_yr_num,
CLM.cur_ctstrph_loss_ind,
CLM.clm_symb_grp_cd,
POL_SLCT.adbfdb_pol_form_nm,
POL_SLCT.adbfdb_pol_form_cd,
CVR.bsic_cvr_ind,
POL_SLCT.priv_pass_ind,
POL_SLCT.risk_st_nm,
POL_SLCT.wrt_co_nm,
POL_SLCT.wrt_co_part_cd,
POL_SLCT.src_of_bus_cd,
TERR.rt_zip_dlv_ofc_cd,
TERR.cur_rst_rt_terr_cd,
LOSS.xtra_cntrc_py_amt
FROM ahshdm1d.vmaloss_day_dt_dim DT,
ahshdm1d.vmasnap_yr_mo_dim SNAP,
ahshdm1d.tmaaclm_dim CLM,
ahshdm1d.tmaapol_slct_dim POL_SLCT,
ahshdm1d.tmaacvr_dim CVR,
ahshdm1d.tmaart_terr_dim TERR,
ahshdm1d.tmaaloss_fct LOSS,
ahshdm1d.tmaaprod_bus_dim BUS
WHERE SNAP.snap_yr_qtr_cd BETWEEN '20083' AND '20182'
AND TRIM(POL_SLCT.adbfdb_lob_cd) = 'A'
AND CVR.bsic_cvr_ind = 'Y'
AND POL_SLCT.priv_pass_ind = 'Y'
AND POL_SLCT.adbfdb_pol_form_cd = 'V'
AND POL_SLCT.src_of_bus_cd NOT IN ( 'ITC', 'INV' )
AND LOSS.xtra_cntrc_py_amt > 0
AND LOSS.loss_day_dt_id = DT.loss_day_dt_dim_id
AND LOSS.cvr_dim_id = CVR.cvr_dim_id
AND LOSS.pol_slct_dim_id = POL_SLCT.pol_slct_dim_id
AND LOSS.rt_terr_dim_id = TERR.rt_terr_dim_id
AND LOSS.prod_bus_dim_id = BUS.prod_bus_dim_id
AND LOSS.clm_dim_id = CLM.clm_dim_id
AND LOSS.snap_yr_mo_dt_id = SNAP.snap_yr_mo_dt_id) AS TABLE1
GROUP BY loss_yr_qtr_cd,
loss_qtr_cd,
loss_yr_num,
snap_yr_qtr_cd,
snap_qtr_cd,
snap_yr_num,
cur_ctstrph_loss_ind,
clm_symb_grp_cd,
adbfdb_pol_form_nm,
risk_st_nm,
wrt_co_nm,
wrt_co_part_cd,
src_of_bus_cd,
rt_zip_dlv_ofc_cd,
cur_rst_rt_terr_cd
FOR FETCH only
Just looking for how to properly write out the query results.
Thanks,
Justin
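One possible direction (an untested sketch, not from this thread): ODBC type -360 is DB2's DECFLOAT, which pyodbc cannot map to a Python type on its own. You can either CAST the offending column to DECIMAL or DOUBLE in the SQL, or register an output converter on the connection before reading:

# Assumption: type -360 is DB2 DECFLOAT; the raw bytes the driver returns
# and their encoding are driver-dependent, so verify before relying on this.
def convert_decfloat(raw_bytes):
    if raw_bytes is None:
        return None
    return float(raw_bytes.decode('utf-16-le'))

conn.add_output_converter(-360, convert_decfloat)
df = pd.read_sql(query_string, conn)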
