I have the following code, which requires a -p argument when called. However, how do I use the -p argument inside the SQL query? I would also like to use the -p argument's text in the output file name.
#!/usr/bin/python
import argparse
import psycopg2
import csv
parser = argparse.ArgumentParser(description='insert the project ID as an argument')
parser.add_argument('-p', '--project_id', help='project_id to pull files from ERAPRO', required=True)
args = parser.parse_args()
conn = psycopg2.connect(database="XXX", user="XXX", password="XXX",
host="XXX", port="5432")
cur = conn.cursor()
cur.execute("""SELECT project_analysis.project_accession,
analysis.analysis_accession, file.filename, file.file_md5, file.file_location
FROM project_analysis
LEFT JOIN analysis on project_analysis.analysis_accession = analysis.analysis_accession
LEFT JOIN analysis_file on analysis.analysis_accession = analysis_file.analysis_accession
LEFT JOIN file on analysis_file.file_id = file.file_id
WHERE project_accession = <INSERT -p ARGUMENT HERE> and analysis.hidden_in_eva = '0';""")
records = cur.fetchall()
with open('/nfs/production3/eva/user/gary/evapro_ftp/<INSERT -p ARGUMENT HERE>.csv', 'w') as f:
    writer = csv.writer(f, delimiter=',')
    for row in records:
        writer.writerow(row)
conn.close()
All help appreciated.
Thanks
First, give your argument an attribute name using the dest argument of add_argument(); let's say we store the input under project_id. (With --project_id, argparse derives this name automatically, so dest is optional here.) This way we can reference it in the code as args.project_id.
parser.add_argument('-p', '--project_id',
                    help='project_id to pull files from ERAPRO',
                    required=True,
                    dest='project_id')  # notice the dest argument
cur.execute("""SELECT project_analysis.project_accession,
analysis.analysis_accession, file.filename, file.file_md5, file.file_location
FROM project_analysis
LEFT JOIN analysis on project_analysis.analysis_accession = analysis.analysis_accession
LEFT JOIN analysis_file on analysis.analysis_accession = analysis_file.analysis_accession
LEFT JOIN file on analysis_file.file_id = file.file_id
WHERE project_accession = %s and analysis.hidden_in_eva = '0';""", (args.project_id,))
Notice the use of execute('... %s ...', (args.project_id,)): psycopg2 safely substitutes the value referenced by args.project_id into the query. The trailing comma makes the second argument a one-element tuple rather than a bare string.
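The question also asks about the output file name; the same parsed value can be formatted into the path from the question, for example:
output_path = '/nfs/production3/eva/user/gary/evapro_ftp/{}.csv'.format(args.project_id)
with open(output_path, 'w') as f:
    writer = csv.writer(f, delimiter=',')
    for row in records:
        writer.writerow(row)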
After calling args = parser.parse_args() you can obtain the value of the arguments like this:
pid = args.project_id
Then you can use that value of pid in your code by using normal string substitution. However, it's better to use psycopg2's inbuilt method for passing parameters to SQL queries to prevent SQL injection.
Normal string substitution:
'hello world {}'.format(var_name)
psycopg2:
cur.execute('SELECT * FROM project_analysis WHERE project_accession = %s', (var_name,))
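Two details worth noting: the %s placeholder only works for values, not for table or column names, and psycopg2 expects the parameters as a sequence, so a single value needs a trailing comma to become a one-element tuple:
params = (var_name)    # just a string: the parentheses are dropped
params = (var_name,)   # a one-element tuple, which is what execute() expects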
Related
I have created a database and I am trying to fetch data from it. I have a class Query, and inside the class I have a function that queries a table called forecasts. The function is as follows:
def forecast(self, provider: str, zone: str = 'Mainland'):
    self.date_start = date_start
    self.date_end = date_end
    self.df_forecasts = pd.DataFrame()
    fquery = """
    SELECT dp.name AS provider_name, lf.datetime_from AS date, fr.name AS run_name, lf.value AS value
    FROM load_forecasts lf
    INNER JOIN bidding_zones bz ON lf.zone_id = bz.zone_id
    INNER JOIN data_providers dp ON lf.provider_id = dp.provider_id
    INNER JOIN forecast_runs fr ON lf.run_id = fr.run_id
    WHERE bz.name = '{zone}'
      AND dp.name = '{provider}'
      AND date(lf.datetime_from) BETWEEN '{self.date_start}' AND '{self.date_end}'
    """
    df_forecasts = pd.read_sql_query(fquery, self.connection)
    return df_forecasts
In the script that I run, I call the Query class, giving it my inputs:
query = Query(date_start, date_end)
And the function
forecast_df = query.forecast(provider='Meteologica')
I run my script from the command line in the usual way:
python myscript.py '2022-11-10' '2022-11-18'
My script shows the error
sqlalchemy.exc.DataError: (psycopg2.errors.InvalidDatetimeFormat) invalid input syntax for type date: "{self.date_start}"
LINE 9: AND date(lf.datetime_from) BETWEEN '{self.date_start...
when I use this syntax, but when I manually input the string for date_start and date_end it works.
I cannot find a way to solve the problem with sqlalchemy, so I opened a cursor with psycopg2.
# Returns the datetime, value and provider name and issue date of the forecasts in the load_forecasts table
# The dates range is specified by the user when the class is called
def forecast(self, provider: str, zone: str = 'Mainland'):
    # Opens a cursor to get the data
    cursor = self.connection.cursor()
    # Query to run
    query = """
        SELECT dp.name, lf.datetime_from, fr.name, lf.value, lf.issue_date
        FROM load_forecasts lf
        INNER JOIN bidding_zones bz ON lf.zone_id = bz.zone_id
        INNER JOIN data_providers dp ON lf.provider_id = dp.provider_id
        INNER JOIN forecast_runs fr ON lf.run_id = fr.run_id
        WHERE bz.name = %s
          AND dp.name = %s
          AND date(lf.datetime_from) BETWEEN %s AND %s
    """
    # Execute the query, bring the data and close the cursor
    cursor.execute(query, (zone, provider, self.date_start, self.date_end))
    self.df_forecasts = cursor.fetchall()
    cursor.close()
    return self.df_forecasts
If anyone finds the answer with sqlalchemy, I would love to see it!
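For the SQLAlchemy route, a hedged sketch using sqlalchemy.text() with named bind parameters should work (this assumes self.connection is a SQLAlchemy engine or connection; it has not been run against this schema):
from sqlalchemy import text
import pandas as pd

def forecast(self, provider: str, zone: str = 'Mainland'):
    # Named bind parameters (:zone, :provider, ...) are filled in by the driver,
    # so no manual string formatting is needed and quoting is handled safely.
    fquery = text("""
        SELECT dp.name AS provider_name, lf.datetime_from AS date, fr.name AS run_name, lf.value AS value
        FROM load_forecasts lf
        INNER JOIN bidding_zones bz ON lf.zone_id = bz.zone_id
        INNER JOIN data_providers dp ON lf.provider_id = dp.provider_id
        INNER JOIN forecast_runs fr ON lf.run_id = fr.run_id
        WHERE bz.name = :zone
          AND dp.name = :provider
          AND date(lf.datetime_from) BETWEEN :date_start AND :date_end
    """)
    params = {"zone": zone, "provider": provider,
              "date_start": self.date_start, "date_end": self.date_end}
    return pd.read_sql_query(fquery, self.connection, params=params)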
I have to pass multiple arguments while executing the Python script, to be used as conditions.
Below is my code, but I have to perform the same steps with several conditions.
There are 4 different files for clients 1 and 2, with data and metadata errors.
So, if I pass python.py client1,data,date
then my function should pick the first file name, client1_data_error_file_1, create a dataframe, and insert it into the database.
import pandas as pd
from operator import itemgetter
import glob
import bz2
import csv
import argparse
import pyodbc

client1_data_error_file_1 = "10_client1_AAAAAA_data_error_date.bz2"
client1_metadata_error_file_1 = "10_client1_AAAAAA_metadata_error_date.bz2"
client2_data_error_file_1 = "20_client2_AAAAAA_data_error_date.bz2"
client2_metadata_error_file_1 = "10_client1_AAAAAA_metadata_error_date.bz2"
def load_errors_database(argument, client, error):
    header = ["filedate", "errorcode", "errorROEID", "ROEID", "type", "rawrecord", "filename"]
    data = []
    req_cols = itemgetter(0, 1, 2, 3, 4, 9, 10)
    for error_filename in glob.glob("*.bz2"):
        with bz2.open(error_filename, "rt", encoding="utf-8") as f_error_file:
            csv_input = csv.reader(f_error_file, skipinitialspace=True)
            for orig_row in csv_input:
                row = req_cols(orig_row)
                data.append([row[0], row[1], row[2], row[3], row[4], ",".join(orig_row), error_filename])
    df = pd.DataFrame(data, columns=header)
    cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER='+server+';DATABASE='+database)
    cursor = cnxn.cursor()
    # Insert Dataframe into SQL Server:
    for index, row in df.iterrows():
        cursor.execute("INSERT INTO dbo.error_table (filedate, errorcode, errorROEID, ROEID, type, rawrecord, filename) values(?,?,?,?,?,?,?)", row.filedate, row.errorcode, row.errorROEID, row.ROEID, row.type, row.rawrecord, row.filename)
    cnxn.commit()
    cursor.close()
How do I pass these arguments as a condition? It doesn't necessarily have to be a function.
When I execute my Python code from the terminal, I would like to pass
python_error_file.py client1, data,date
Now it should pick the first file and do the necessary steps. If I pass
python_error_file.py client2, metadata,date
It should pick the 4th file and do the required steps.
The steps are the same for all four files; I just have to pass these as parameters while executing the code.
Can anyone please help me with this?
argparse is your friend for creating handy command-line tools with simple syntax. Here is a snippet to help you:
import argparse
parser = argparse.ArgumentParser(description='client file parser')
parser.add_argument(
    '-c', '--client',
    help='client name',
    type=str
)
parser.add_argument(
    '-m', '--metadata',
    help='meta data',
    type=str,
    default=''
)
parser.add_argument(
    '-d', '--data',
    help='data',
    type=str,
    default=''
)
args = parser.parse_args()
load_errors_database(args.client, args.metadata, args.data)
Usage:
python file.py -c client1 -m metadata -d data
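argparse gives you the values; selecting the matching file can then be done by building a glob pattern instead of hard-coding four variables. A hedged sketch (the *_<client>_*_<type>_error_<date>.bz2 shape is assumed from the file names shown in the question):
import glob

def pick_error_files(client, error_type, file_date):
    # Hypothetical helper: for client='client1', error_type='data', file_date='date'
    # this matches e.g. 10_client1_AAAAAA_data_error_date.bz2
    pattern = "*_{}_*_{}_error_{}.bz2".format(client, error_type, file_date)
    return glob.glob(pattern)

# e.g. files = pick_error_files('client1', 'data', 'date'), then loop over files
# inside load_errors_database() instead of glob.glob("*.bz2")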
I'm a newbie to Python. I want to pass a command-line argument to my Presto query, which is inside a function, and then write the result as a CSV file. But when I try to run it from the terminal it says:
Traceback (most recent call last):
  File "function2.py", line 3, in <module>
    from pyhive import presto
ModuleNotFoundError: No module named 'pyhive'
The pyhive requirement is already satisfied. Please find my code below:
from sys import argv
import argparse
from pyhive import presto
import prestodb
import csv
import sys
import pandas as pd
connection = presto.connect(host='xyz',port=8889,username='test')
cur = connection.cursor()
print('Connection Established')
def func1(object, start, end):
    object = argv[1]
    start = argv[2]
    end = argv[3]
    result = cur.execute("""
    with map_date as
    (
    SELECT
        object,
        epoch,
        timestamp,
        date,
        map_agg(name, value) as map_values
    from hive.schema.test1
    where object = '${object}'
    and (epoch >= '${start}' and epoch <= '${end}')
    and name in ('x','y')
    GROUP BY object,epoch,timestamp,date
    order by timestamp asc
    )
    SELECT
        epoch
        , timestamp
        , CASE WHEN element_at(map_values, 'x') IS NOT NULL THEN map_values['x'] ELSE NULL END AS x
        , CASE WHEN element_at(map_values, 'y') IS NOT NULL THEN map_values['y'] ELSE NULL END AS y
        , object
        , date AS date
    from map_date
    """)
    rows = cur.fetchall()
    print('Query Finished')  # Returns the list with one entry for each record
    fp = open('/Users/xyz/Desktop/Python/function.csv', 'w')
    print('File Created')
    myFile = csv.writer(fp)
    colnames = [desc[0] for desc in cur.description]  # store the headers in variable called 'colnames'
    myFile.writerow(colnames)  # write the header to the file
    myFile.writerows(rows)
    fp.close()
func1(object,start,end)
cur.close()
connection.close()
How can I pass the command line argument to my Presto query which is written inside a function?
Any help is much appreciated. Thank you In advance!
I will only describe how to pass command-line arguments to the function and the query.
If you define the function
def func1(object, start, end):
# code
then you have to pass the values in as variables, and you have to use sys.argv outside the function:
connection = presto.connect(host='xyz', port=8889, username='test') # PEP8: spaces after commas
cur = connection.cursor()
print('Connection Established')
object_ = sys.argv[1] # PEP8: there is class `object` so I add `_` to create different name
start = sys.argv[2]
end = sys.argv[3]
func1(object_, start, end)
cur.close()
connection.close()
You don't have to use the same names outside the function:
args1 = sys.argv[1]
args2 = sys.argv[2]
args3 = sys.argv[3]
func1(args1, args2, args3)
and you can even do
func1(sys.argv[1], sys.argv[2], sys.argv[3])
because when you run this line, Python takes the definition def func1(object, start, end): and creates local variables named object, start, and end inside func1, assigning the external values to these local variables:
object=object_, start=start, end=end
or
object=args1, start=args2, end=args3
or
object=sys.argv[1], start=sys.argv[2], end=sys.argv[3]
It would also be good to pass cur to the function explicitly:
def func1(cur, object_, start, end):
# code
and
func1(cur, sys.argv[1], sys.argv[2], sys.argv[3])
I don't know what you are trying to do in the SQL query, but Python uses {start} (without $) to put a value in a string (Bash uses ${start}), and the string needs the prefix f to become an f-string: f"""... {start} ...""". Without f you have to use normal string formatting: """... {start} ...""".format(start=start)
import sys
import csv
from pyhive import presto
# --- functions ----
def func1(cur, object_, start, end):  # PEP8: spaces after commas
    # Python uses `{start} {end}`, Bash uses `${start} ${end}`.
    # A string needs the prefix `f` to use `{start} {end}` as an f-string,
    # or you have to use `"{start} {end}".format(start=value1, end=value2)`.
    result = cur.execute(f"""
        WITH map_date AS
        (
            SELECT
                object,
                epoch,
                timestamp,
                date,
                map_agg(name, value) AS map_values
            FROM hive.schema.test1
            WHERE object = '{object_}'
              AND (epoch >= '{start}' AND epoch <= '{end}')
              AND name IN ('x','y')
            GROUP BY object, epoch, timestamp, date
            ORDER BY timestamp ASC
        )
        SELECT
            epoch,
            timestamp,
            CASE WHEN element_at(map_values, 'x') IS NOT NULL THEN map_values['x'] ELSE NULL END AS x,
            CASE WHEN element_at(map_values, 'y') IS NOT NULL THEN map_values['y'] ELSE NULL END AS y,
            object,
            date AS date
        FROM map_date
    """)
    rows = cur.fetchall()
    colnames = [desc[0] for desc in cur.description]  # store the headers in variable called 'colnames'
    print('Query Finished')  # returns the list with one entry for each record

    fp = open('/Users/xyz/Desktop/Python/function.csv', 'w')
    my_file = csv.writer(fp)  # PEP8: lower_case_names for variables
    my_file.writerow(colnames)  # write the header to the file
    my_file.writerows(rows)
    fp.close()
    print('File Created')
# --- main ---
connection = presto.connect(host='xyz', port=8889, username='test') # PEP8: spaces after commas
cur = connection.cursor()
print('Connection Established')
#object_ = sys.argv[1] # PEP8: there is class `object` so I add `_` to create different name
#start = sys.argv[2]
#end = sys.argv[3]
#func1(cur, object_, start, end)
func1(cur, sys.argv[1], sys.argv[2], sys.argv[3])
cur.close()
connection.close()
If you plan to use argparse
parser = argparse.ArgumentParser()
parser.add_argument('-o', '--object', help='object to search')
parser.add_argument('-s', '--start', help='epoch start')
parser.add_argument('-e', '--end', help='epoch end')
args = parser.parse_args()
and then
func1(cur, args.object, args.start, args.end)
import argparse
# ... imports and functions ...
# --- main ---
parser = argparse.ArgumentParser()
parser.add_argument('-o', '--object', help='object to search')
parser.add_argument('-s', '--start', help='epoch start')
parser.add_argument('-e', '--end', help='epoch end')
#parser.add_argument('-D', '--debug', action='store_true', help='debug (display extra info)')
args = parser.parse_args()
#if args.debug:
# print(args)
connection = presto.connect(host='xyz', port=8889, username='test') # PEP8: spaces after commas
cur = connection.cursor()
print('Connection Established')
func1(cur, args.object, args.start, args.end)
cur.close()
connection.close()
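A note on the query itself: instead of interpolating the values with an f-string, PyHive's DB-API cursor also accepts a separate parameters argument with pyformat placeholders (%(name)s), which lets the driver do the quoting. A hedged, simplified sketch (verify that your installed PyHive version supports this):
def run_query(cur, object_, start, end):
    # Simplified query; the placeholders are filled from the dict by the driver,
    # not by Python string formatting, so the values are escaped for you.
    cur.execute(
        """
        SELECT epoch, timestamp, object, date
        FROM hive.schema.test1
        WHERE object = %(object)s
          AND epoch BETWEEN %(start)s AND %(end)s
        """,
        {"object": object_, "start": start, "end": end},
    )
    return cur.fetchall()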
I'm trying to pull data from an Excel spreadsheet to MySQL. My script can't find the path to the Excel file, and my IDE (Spyder) is giving an error on this line:
def read_excel(r'C:\\Users\\ParaSystems Limited\\Desktop\\main.xlsx'):
invalid syntax
import openpyxl
import pymysql as mdb

def read_excel(r'C:\\Users\\ParaSystems Limited\\Desktop\\main.xlsx'):
    masterdict = {}
    wb = openpyxl.load_workbook('main.xlsx')
    for sheet in wb:
        for arow in range(2, sheet.max_row+1):
            if sheet['A'+str(arow)].value:
                masterdict[sheet['A'+str(arow)].value] = {
                    'Equipment Number': sheet['B'+str(arow)].value,
                    'Number': sheet['C'+str(arow)].value,
                    'Description': sheet['D'+str(arow)].value,
                    'Manufacturer': sheet['E'+str(arow)].value,
                    'Serial Number': sheet['F'+str(arow)].value,
                    'Country of Manufacturer': sheet['G'+str(arow)].value,
                    'Functional Location Description': sheet['H'+str(arow)].value,
                    'Functional Location Number (Short)': sheet['I'+str(arow)].value,
                    'Functional Location Number': sheet['J'+str(arow)].value,
                    'COST OF SERVICING AND MAINTENANCE': sheet['K'+str(arow)].value,
                    'Office Location': sheet['L'+str(arow)].value
                }
    return masterdict

def inputIntoMySQL(masterdict):
    con = mdb.connect(host='127.0.0.1', user='root', password=None, db='scraping')
    cur = con.cursor()
    with con:
        cur.execute("DROP TABLE IF EXISTS main")
        cur.execute("CREATE TABLE main (rid INT PRIMARY KEY, EquipmentNumber VARCHAR(75), Description VARCHAR(75),\
            Manufacturer VARCHAR(50), SerialNumber INT, CountryOfManufacturer VARCHAR(25), \
            FunctionalLocationDescription VARCHAR(50), FunctionalLocationNumberShort VARCHAR(75), FunctionalLocationNumber VARCHAR(25),\
            CostOfServicingAndMaintenance DECIMAL(15,2), OfficeLocation VARCHAR(35))")
        for i in masterdict:
            cur.execute('INSERT INTO DISTRIBUTORS_NESTLE(rid, EquipmentNumber, Description, Manufacturer, SerialNumber,\
                CountryOfManufacturer, FunctionalLocationDescription, FunctionalLocationNumberShort, FunctionalLocationNumber,\
                CostOfServicingAndMaintenance, OfficeLocation) VALUES("%s", "%s", "%s", "%s", "%s", "%s", "%s", "%s", "%s", "%s", "%s")'
                % (i, masterdict[i]['Equipment Number'], masterdict[i]['Description'],
                   masterdict[i]['Manufacturer'], masterdict[i]['Serial Number'], masterdict[i]['Country of Manufacturer'],
                   masterdict[i]['Functional Location Description'], masterdict[i]['Functional Location Number (Short)'], masterdict[i]['Functional Location Number'],
                   masterdict[i]['COST OF SERVICING AND MAINTENANCE'], masterdict[i]['Office Location']))
        con.commit()
    con.close()
The syntax error is because you're defining a function (read_excel) and putting the Excel file path directly in the function definition. With this syntax the file path isn't assigned to a parameter, so you wouldn't be able to use it within the function anyway.
def read_excel(r'C:\Users\ParaSystems Limited\Desktop\main.xlsx')#Syntax error
To fix this you could create a parameter and make that particular filepath the default value:
def read_excel(excel_file_path=r'C:\Users\ParaSystems Limited\Desktop\main.xlsx'):
Then when you call the function, you can call it without any arguments and excel_file_path will fall back to that default, e.g.
read_excel()#Calls with excel_file_path as your default value
or
read_excel(excel_file_path = r'path\to\another\excel.xlsx') #Calls with excel_file_path as the passed parameter value
If there really isn't any need to call this function on any other Excel file, just declare the path inside read_excel and leave the parameter list empty, e.g.
def read_excel():
    excel_file_path = r'C:\Users\ParaSystems Limited\Desktop\main.xlsx'
This is not a valid function definition:
def read_excel(r'C:\\Users\\ParaSystems Limited\\Desktop\\main.xlsx'):
    masterdict = {}
    wb = openpyxl.load_workbook('main.xlsx')
    ...
You don't have any named parameters inside the parentheses, just a raw string.
It looks like you actually meant something like:
def read_excel(fname=r'C:\Users\ParaSystems Limited\Desktop\main.xlsx'):
    masterdict = {}                      # [unchanged]
    wb = openpyxl.load_workbook(fname)   # _Uses_ the parameter.
    ...
Also, since you are using a raw string (r'...'), you shouldn't need to double the backslashes.
Single backslashes should work.
(You'll have to verify this yourself.
I don't have access to a Windows system, so I can't test this.)
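A quick way to check that point without a Windows machine is to look at the string contents directly; in a raw string every backslash you type is kept literally:
print(len(r'C:\\Users'))  # 9 characters -> contains two backslashes
print(len(r'C:\Users'))   # 8 characters -> contains one backslash
print(r'C:\\Users')       # C:\\Users
print(r'C:\Users')        # C:\Users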
I am attempting to figure out how to use the IN keyword in a django model query.
I was attempting to replace:
db = database.connect()
c = db.cursor()
c.execute("SELECT MAX(Date) FROM `Requests` WHERE UserId = %%s AND VIN = %%s AND Success = 1 AND RptType in %s" % str(cls.SuperReportTypes), (userID, vin))
With this:
myrequests = Request.objects.filter(user=userID, vin = vin, report_type in cls.SuperReportTypes)
myrequests.aggregate(Max('Date'))
I get a:
SyntaxError: non-keyword arg after keyword arg (<console>, line 1)
When I remove the ending "report_type in cls.SuperReportTypes" the query functions properly.
I recognize that there is a way to do this after the query, by filtering the result set, but I was hoping to handle it in such a way that MySQL does the work.
Use the field__in=seq lookup.
You are using Python's in keyword where Django expects an __in field lookup:
https://docs.djangoproject.com/en/dev/ref/models/querysets/#in
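For reference, the corrected queryset using that lookup would look roughly like this (field names as in the question):
from django.db.models import Max

myrequests = Request.objects.filter(
    user=userID,
    vin=vin,
    report_type__in=cls.SuperReportTypes,  # __in produces the SQL "IN (...)" clause
)
latest = myrequests.aggregate(Max('Date'))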