I am writing a function that takes some arguments and builds a SQL query string. I currently have each of the keyword arguments assigned to the string by name, and I am thinking that there has to be an easier way to automatically format the keyword arguments expected in the string. I'm wondering if it is possible to automatically format (or assign) a keyword argument to the query string if the argument name matches.
Here is my current function:
def create_query(
    who_id,
    subject,
    description,
    due_date,
    what_id=None,
    owner_id=None,
    is_closed=False,
    is_priority=False,
    is_reminder=True,
):
    query_str = """
    'WhoId':'{who_id}',
    'Subject':'{subject}',
    'Description':'{description}',
    'ActivityDate':'{due_date}',
    'IsClosed':'{is_closed}',
    'IsPriority':'{is_priority}',
    'IsReminderSet':'{is_reminder}',
    """.format(
        who_id=who_id,
        subject=subject,           # Mapping each of these to the
        description=description,  # identically named variable
        due_date=due_date,         # seems dumb
        is_closed=is_closed,
        is_priority=is_priority,
        is_reminder=is_reminder,
    )
    return query_str
And I would like something that is more like this (pseudocode):

def create_query(**kwargs):
    query_string = """
    'WhoId':'{who_id}',
    'Subject':'{subject}',
    'Description':'{description}',
    ...
    """.format(
        for key in kwargs:
            if key == << one of the {keyword} variables in query_string >>:
                key = << the matching variable in query_string >>
    )
That already works with .format(**kwargs), because format simply ignores keyword arguments it doesn't use:

def create_query(**kwargs):
    query_string = """
    'WhoId':'{who_id}',
    'Subject':'{subject}',
    'Description':'{description}',
    ...
    """.format(**kwargs)
    print(query_string)
Can be used like this then:

create_query(who_id=123, subject="Boss", description="he's bossy",
             will_be_ignored="really?")

Prints:

'WhoId':'123',
'Subject':'Boss',
'Description':'he's bossy',
...
Note the additional parameter will_be_ignored that I entered. It will simply be ignored. You don't have to filter that yourself.
But better: don't build the query string yourself. Let the database connector handle that. For example, if my example said "he's bossy", that would break the query, because the query uses ' as a delimiter. Your database connector should allow you to give it queries with placeholders and values, and then properly replace the placeholders with the values.
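For illustration, here is a minimal sketch of the placeholder style using Python's built-in sqlite3 connector (the table and column names are made up; your own connector may use %s instead of ? as its placeholder):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (who_id INTEGER, subject TEXT, description TEXT)")

# The connector escapes the values itself, so "he's bossy" cannot break the query.
conn.execute(
    "INSERT INTO tasks (who_id, subject, description) VALUES (?, ?, ?)",
    (123, "Boss", "he's bossy"),
)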
Alternative:

def create_query(**kwargs):
    parameters = (('WhoId', 'who_id'),
                  ('Subject', 'subject'),
                  ('Description', 'description'))
    query_string = ','.join("'{}':'{}'".format(name, kwargs[id])
                            for name, id in parameters
                            if id in kwargs)
    print(query_string)

create_query(who_id=123, description="he's bossy",
             will_be_ignored="really?")

Prints:

'WhoId':'123','Description':'he's bossy'
I'm trying to get a column, then use its values to create file names.
I've tried the following, which should create a csv named after the first value in the specified column. However, it says the list is empty when I try to use it:
bq_data = []

get_data = BigQueryGetDataOperator(
    task_id='get_data_from_bq',
    dataset_id='SK22',
    table_id='current_times',
    max_results='100',
    selected_fields='current_timestamps',
)

def process_data_from_bq(**kwargs):
    ti = kwargs['ti']
    global bq_data
    bq_data = ti.xcom_pull(task_ids='get_data_from_bq')

process_data = PythonOperator(
    task_id='process_data_from_bq',
    python_callable=process_data_from_bq,
    provide_context=True)

run_export = BigQueryToCloudStorageOperator(
    task_id=f"save_data_on_storage{str(bq_data[0])}",
    source_project_dataset_table="a-data-set",
    destination_cloud_storage_uris=[f"gs://europe-west1-airflow-bucket/data/test{bq_data[0]}.csv"],
    export_format="CSV",
    field_delimiter=",",
    print_header=False,
    dag=dag,
)

get_data >> process_data >> run_export
I think there is no need to use a PythonOperator between BigQueryGetDataOperator and BigQueryToCloudStorageOperator; you can use an XCom pull directly in BigQueryToCloudStorageOperator:
get_data = BigQueryGetDataOperator(
    task_id='get_data_from_bq',
    dataset_id='SK22',
    table_id='current_times',
    max_results='100',
    selected_fields='current_timestamps',
)

run_export = BigQueryToCloudStorageOperator(
    task_id="save_data_on_storage",
    source_project_dataset_table="a-data-set",
    destination_cloud_storage_uris=["gs://europe-west1-airflow-bucket/data/test" + "{{ ti.xcom_pull(task_ids='get_data_from_bq')[0] }}" + ".csv"],
    export_format="CSV",
    field_delimiter=",",
    print_header=False,
    dag=dag,
)

get_data >> run_export
destination_cloud_storage_uris is a templated param, so you can pass Jinja template syntax inside it.
I haven't tested the syntax, but it should work.
I also don't recommend using a global variable like bq_data to pass data between operators, because it doesn't work; you need to find a way to use XCom directly in the operator (a Jinja template, or access to the current Context of the operator).
I also noticed that you are not using the latest Airflow operators:
BigQueryToCloudStorageOperator -> BigQueryToGCSOperator
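If you want to double-check which params accept Jinja, you can inspect the operator's template_fields attribute (a standard attribute on Airflow operators); a quick sketch:

from airflow.providers.google.cloud.transfers.bigquery_to_gcs import BigQueryToGCSOperator

# 'destination_cloud_storage_uris' should appear among the templated fields
print(BigQueryToGCSOperator.template_fields)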
If you want to use the whole list provided by the BigQueryGetDataOperator operator and calculate a list of destination URIs from it, I propose another solution:
from __future__ import annotations

from typing import List, Dict, Sequence

from airflow.providers.google.cloud.transfers.bigquery_to_gcs import BigQueryToGCSOperator
from google.cloud.bigquery import DEFAULT_RETRY
from urllib3 import Retry


class CustomBigQueryToGCSOperator(BigQueryToGCSOperator):

    def __init__(self,
                 source_project_dataset_table: str,
                 project_id: str | None = None,
                 compression: str = "NONE",
                 export_format: str = "CSV",
                 field_delimiter: str = ",",
                 print_header: bool = True,
                 gcp_conn_id: str = "google_cloud_default",
                 delegate_to: str | None = None,
                 labels: dict | None = None,
                 location: str | None = None,
                 impersonation_chain: str | Sequence[str] | None = None,
                 result_retry: Retry = DEFAULT_RETRY,
                 result_timeout: float | None = None,
                 job_id: str | None = None,
                 force_rerun: bool = False,
                 reattach_states: set[str] | None = None,
                 deferrable: bool = False,
                 **kwargs) -> None:
        super(CustomBigQueryToGCSOperator, self).__init__(
            source_project_dataset_table=source_project_dataset_table,
            destination_cloud_storage_uris=[],
            project_id=project_id,
            compression=compression,
            export_format=export_format,
            field_delimiter=field_delimiter,
            print_header=print_header,
            gcp_conn_id=gcp_conn_id,
            delegate_to=delegate_to,
            labels=labels,
            location=location,
            impersonation_chain=impersonation_chain,
            result_retry=result_retry,
            result_timeout=result_timeout,
            job_id=job_id,
            force_rerun=force_rerun,
            reattach_states=reattach_states,
            deferrable=deferrable,
            **kwargs
        )

    def execute(self, context):
        task_instance = context['task_instance']
        data_from_bq: List[Dict] = task_instance.xcom_pull('get_data_from_bq')
        destination_cloud_storage_uris: List[str] = list(map(self.to_destination_cloud_storage_uris, data_from_bq))
        self.destination_cloud_storage_uris = destination_cloud_storage_uris
        super(CustomBigQueryToGCSOperator, self).execute(context)

    def to_destination_cloud_storage_uris(self, data_from_bq: Dict) -> str:
        return f"gs://europe-west1-airflow-bucket/data/test{data_from_bq['your_field']}.csv"
Example of instantiation of this operator (without the destination_cloud_storage_uris field, because it's calculated inside the operator):

CustomBigQueryToGCSOperator(
    task_id="save_data_on_storage",
    source_project_dataset_table="airflow-proj.source_table.attribute_table",
    export_format="CSV",
    field_delimiter=","
)
Some explanations:
I created a custom operator that extends BigQueryToGCSOperator.
In the execute method, I have access to the current context of the operator.
From the context, I can retrieve the list from BQ provided by the BigQueryGetDataOperator. I assume it's a list of Dicts, but you have to confirm this.
I calculate a list of destination GCS URIs from this list of Dicts.
I assign the calculated destination GCS URIs to the corresponding field in the operator.
The pro of this solution is that you have more flexibility to apply logic based on the XCom value.
The con is that it's a little verbose.
Just as an update: I realised what I was really trying to do was have dynamic stages in a DAG (hence why I was looking at using this as a list). I solved the issue by creating a function that uses the standard BQ client to get a column and parse it as a list, which I then call in the relevant stages of the DAG with the built-in expand function. Originally I was looking at just getting the first row of a column, but that was only the first stage of what I was trying to do, so the other very detailed and well-written answer more than answers this question.
I'm writing test automation for an API in BDD behave. I need a switcher between environments. Is there any way to change one value in one place, without adding this value to every function? Example:
I've tried to do it by adding the value to every function, but that makes the whole project very complicated:
headers = {
    'Content-Type': 'application/json',
    'country': 'fi'
}
What I want is to switch only the country value in headers, e.g. from 'fi' to 'es', and then have all functions switch themselves to the es environment, e.g.:
def sending_post_request(endpoint, user):
    url = fi_api_endpoints.api_endpoints_list.get(endpoint)
    personalId = {'personalId': user}
    json_post = requests.post(url,
                              headers=headers,
                              data=json.dumps(personalId)
                              )
    endpoint_message = json_post.text
    server_status = json_post.status_code

def phone_number(phone_number_status):
    if phone_number_status == 'wrong':
        cursor = functions_concerning_SQL_conection.choosen_db('fi_sql_identity')
        cursor.execute("SELECT TOP 1 PersonalId from Registrations where PhoneNumber is NULL")
        result = cursor.fetchone()
        user_with_no_phone_number = result[0]
        return user_with_no_phone_number
    else:
        cursor = functions_concerning_SQL_conection.choosen_db('fi_sql_identity')
        cursor.execute("SELECT TOP 1 PersonalId from Registrations where PhoneNumber is not NULL")
        result = cursor.fetchone()
        user_with_phone_number = result[0]
        return user_with_phone_number
And when I change from 'fi' to 'es' in headers, I want:

fi_sql_identity to change to es_sql_identity

url = fi_api_endpoints.api_endpoints_list.get(endpoint) to change to
url = es_api_endpoints.api_endpoints_list.get(endpoint)

Thanks, and please help.
With respect to your original question, a solution for this case is a closure:

def f(x):
    def long_calculation(y):
        return x * y
    return long_calculation

# create different functions without dispatching multiple times
g = f(val_1)
h = f(val_2)

g(val_3)
h(val_3)
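Applied to your code, the closure could capture the region-specific pieces once; a sketch (fi_api_endpoints, es_api_endpoints, and headers come from your example, while the factory name make_post_sender is made up):

def make_post_sender(api_endpoints, headers):
    # Returns a sender bound to one region's endpoints and headers.
    def sending_post_request(endpoint, user):
        url = api_endpoints.api_endpoints_list.get(endpoint)
        personalId = {'personalId': user}
        json_post = requests.post(url,
                                  headers=headers,
                                  data=json.dumps(personalId))
        return json_post.text, json_post.status_code
    return sending_post_request

# Build one sender per region; switching regions is now a single line.
send_fi = make_post_sender(fi_api_endpoints, dict(headers, country='fi'))
send_es = make_post_sender(es_api_endpoints, dict(headers, country='es'))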
Well, the problem is: why do you hardcode everything? With the update, you can simplify your function to:
def phone_number(phone_number_status, db_name='fi_sql_identity'):
    cursor = functions_concerning_SQL_conection.choosen_db(db_name)
    if phone_number_status == 'wrong':
        sql = "SELECT TOP 1 PersonalId from Registrations where PhoneNumber is NULL"
    else:
        sql = "SELECT TOP 1 PersonalId from Registrations where PhoneNumber is not NULL"
    cursor.execute(sql)
    result = cursor.fetchone()
    return result[0]
Also, please don't write code like:

# WRONG
fi_db_conn.send_data()

But use a parameter:

region = 'fi'  # or "es"
db_conn = initialize_conn(region)
db_conn.send_data()
And use a config file to store your endpoints with respect to your region, e.g. consider YAML:

# config.yml
es:
  db_name: es_sql_identity
fi:
  db_name: fi_sql_identity

Then use them in Python:

import yaml

with open('config.yml') as f:
    config = yaml.safe_load(f)

region = 'fi'
db_name = config[region]['db_name']  # "fi_sql_identity"

# status = ...
result = phone_number(status, db_name)
First, provide an encapsulation of how to access the resources of a region, by giving this encapsulation a region parameter. It may also be a good idea to provide this functionality as a behave fixture.
CASE 1: region parameter needs to vary between features / scenarios

For example, this means that SCENARIO_1 needs region="fi" and SCENARIO_2 needs region="es".
Use a fixture and fixture-tag with the region parameter.
In this case you need to write your own scenarios for each region (BAD TEST REUSE),
or use a ScenarioOutline as a template to let behave generate the tests for you (by using a fixture-tag with a region parameter value, for example).
CASE 2: region parameter is constant for all features / scenarios (during the test run)

You can support multiple test runs with different region parameters by using a userdata parameter.
Look at the behave userdata concept.
This allows you to run behave -D region=fi ... and behave -D region=es ...
This case provides better reuse of the test suite, meaning a large part of it is a common test suite that is applied to all regions; a sketch of reading the parameter follows below.
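A minimal sketch of reading that userdata parameter once, in environment.py (the attribute name region and the "fi" default are my assumptions):

# environment.py
def before_all(context):
    # behave exposes -D key=value pairs via context.config.userdata
    context.region = context.config.userdata.get("region", "fi")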
HINT: Your code examples are too specific ("fi"-based), which is a BAD SMELL.
I'm trying to pull data from an Excel spreadsheet to MySQL. My script can't find the path to the Excel file, and my IDE (Spyder) is giving an error on this line:
def read_excel(r'C:\\Users\\ParaSystems Limited\\Desktop\\main.xlsx'):
invalid syntax
import openpyxl
import pymysql as mdb

def read_excel(r'C:\\Users\\ParaSystems Limited\\Desktop\\main.xlsx'):
    masterdict = {}
    wb = openpyxl.load_workbook('main.xlsx')
    for sheet in wb:
        for arow in range(2, sheet.max_row+1):
            if sheet['A'+str(arow)].value:
                masterdict[sheet['A'+str(arow)].value] = {
                    'Equipment Number':sheet['B'+str(arow)].value,
                    'Number':sheet['C'+str(arow)].value,
                    'Description':sheet['D'+str(arow)].value,
                    'Manufacturer':sheet['E'+str(arow)].value,
                    'Serial Number':sheet['F'+str(arow)].value,
                    'Country of Manufacturer':sheet['G'+str(arow)].value,
                    'Functional Location Description':sheet['H'+str(arow)].value,
                    'Functional Location Number (Short)':sheet['I'+str(arow)].value,
                    'Functional Location Number':sheet['J'+str(arow)].value,
                    'COST OF SERVICING AND MAINTENANCE':sheet['K'+str(arow)].value,
                    'Office Location':sheet['L'+str(arow)].value
                }
    return masterdict

def inputIntoMySQL(masterdict):
    con = mdb.connect(host='127.0.0.1', user='root', password=None, db='scraping')
    cur = con.cursor()
    with con:
        cur.execute("DROP TABLE IF EXISTS main")
        cur.execute("CREATE TABLE main (rid INT PRIMARY KEY, EquipmentNumber VARCHAR(75), Description VARCHAR(75),\
            Manufacturer VARCHAR(50), SerialNumber INT, CountryOfManufacturer VARCHAR(25), \
            FunctionalLocationDescription VARCHAR(50), FunctionalLocationNumberShort VARCHAR(75), FunctionalLocationNumber VARCHAR(25),\
            CostOfServicingAndMaintenance DECIMAL(15,2), OfficeLocation VARCHAR(35))")
        for i in masterdict:
            cur.execute('INSERT INTO DISTRIBUTORS_NESTLE(rid, EquipmentNumber, Description, Manufacturer, SerialNumber,\
                CountryOfManufacturer, FunctionalLocationDescription, FunctionalLocationNumberShort, FunctionalLocationNumber\
                CostOfServicingAndMaintenance, OfficeLocation) VALUES("%s", "%s", "%s","%s","%s","%s","%s","%s","%s","%s","%s")'
                % (i, masterdict[i]['Equipment Number'], masterdict[i]['Description'],
                   masterdict[i]['Manufacturer'], masterdict[i]['Serial Number'], masterdict[i]['Country of Manufacturer'],
                   masterdict[i]['Functional Location Description'], masterdict[i]['Functional Location Number (Short)'], masterdict[i]['Functional Location Number'],
                   masterdict[i]['COST OF SERVICING AND MAINTENANCE'], masterdict[i]['Office Location']))
        con.commit()
        con.close()
The syntax error is because you're defining a function (read_excel) and putting the excel filepath directly in the function definition. With this syntax, the excel filepath isn't assigned to a variable, so you wouldn't be able to use it within the function:
def read_excel(r'C:\Users\ParaSystems Limited\Desktop\main.xlsx')  # Syntax error

To fix this you could create a parameter and make that particular filepath the default value:

def read_excel(excel_file_path=r'C:\Users\ParaSystems Limited\Desktop\main.xlsx'):

Then when you call the function, you can call it without any arguments and excel_file_path will default to that value, e.g.

read_excel()  # Calls with excel_file_path as your default value

or

read_excel(excel_file_path=r'path\to\another\excel.xlsx')  # Calls with excel_file_path as the passed value
If there really isn't any need to call this function on any other excel file, just declare the path inside read_excel and leave the parameter list empty, e.g.

def read_excel():
    excel_file_path = r'C:\Users\ParaSystems Limited\Desktop\main.xlsx'
This is not a valid function definition:

def read_excel(r'C:\\Users\\ParaSystems Limited\\Desktop\\main.xlsx'):
    masterdict = {}
    wb = openpyxl.load_workbook('main.xlsx')
    ...

You don't have any named parameters inside the parentheses, just a raw string.
It looks like you actually meant something like:

def read_excel(fname=r'C:\Users\ParaSystems Limited\Desktop\main.xlsx'):
    masterdict = {}  # [unchanged]
    wb = openpyxl.load_workbook(fname)  # _Uses_ the parameter.
    ...
Also, since you are using a raw string (r'...'), you shouldn't need to double the backslashes.
Single backslashes should work.
(You'll have to verify this yourself.
I don't have access to a Windows system, so I can't test this.)
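As a quick illustration of the raw-string behaviour (this part runs on any platform, so it's easy to verify):

# In a raw string, backslashes are kept literally, so doubling them
# produces doubled backslashes in the resulting path.
print(r'C:\Users\main.xlsx')    # C:\Users\main.xlsx
print(r'C:\\Users\\main.xlsx')  # C:\\Users\\main.xlsx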
I am attempting to figure out how to use the IN keyword in a django model query.
I was attempting to replace:
db = database.connect()
c = db.cursor()
c.execute("SELECT MAX(Date) FROM `Requests` WHERE UserId = %%s AND VIN = %%s AND Success = 1 AND RptType in %s" % str(cls.SuperReportTypes), (userID, vin))
With this:
myrequests = Request.objects.filter(user=userID, vin = vin, report_type in cls.SuperReportTypes)
myrequests.aggregate(Max('Date'))
I get a:
SyntaxError: non-keyword arg after keyword arg (<console>, line 1)
When I remove the trailing "report_type in cls.SuperReportTypes", the query functions properly.
I recognize that there is a way to do this after the query, by managing the result set, but I was hoping to deal with this in such a way that MySQL would do the execution.
You are using the wrong in syntax; the Django ORM uses the field__in=seq lookup:
https://docs.djangoproject.com/en/dev/ref/models/querysets/#in
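Applied to your query, that would look like this sketch (names taken from your example):

from django.db.models import Max

myrequests = Request.objects.filter(
    user=userID,
    vin=vin,
    report_type__in=cls.SuperReportTypes,
)

# MySQL executes the IN clause; the aggregation happens in the database too.
myrequests.aggregate(Max('Date'))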
With the Python construct library, the data I'm parsing has a field which only has meaning if a flag is set.
However, the data field is always present.
Therefore, I would like to consume the data in any case, but only set the field value based on the flag's value.
For example, if the structure is (incorrectly) defined as:
struct = Struct("struct",
    Flag("flag"),
    UBInt8("optional_data"),
    UBInt8("mandatory")
)
For the data:
>>> struct.parse("010203".decode("hex"))
The result should be:
Container({'flag': True, 'mandatory': 3, 'optional_data': 2})
And for data:
>>> struct.parse("000203".decode("hex"))
The desired result is:
Container({'flag': False, 'mandatory': 3, 'optional_data': None})
I have tried the following:

struct = Struct("struct",
    Flag("flag"),
    IfThenElse("optional_data", lambda ctx: ctx.flag,
        UBInt8("dummy"),
        Padding(1)
    ),
    UBInt8("mandatory")
)

However, Padding() puts the raw data in the field, like so:

>>> struct.parse("000203".decode("hex"))
Container({'flag': False, 'mandatory': 3, 'optional_data': '\x02'})
Thank you
I am not sure if I understand your problem correctly. If your problem is only that the padding is not parsed as an int, then you don't need the IfThenElse. You can check the parsed container in your code for flag and choose to ignore the optional_data field:
struct = Struct("struct",
    Flag("flag"),
    UBInt8("optional_data"),
    UBInt8("mandatory")
)
If your problem is that you want the name optional_data to be used only when the flag is set, but the name dummy to be used when the flag is not set, then you need to define two Ifs:
struct = Struct("struct",
    Flag("flag"),
    If("optional_data", lambda ctx: ctx.flag,
        UBInt8("useful_byte"),
    ),
    If("dummy", lambda ctx: not ctx.flag,
        UBInt8("ignore_byte"),
    ),
    UBInt8("mandatory")
)
Perhaps you could use an Adapter, similar to the LengthValueAdapter for a Sequence:

class LengthValueAdapter(Adapter):
    """
    Adapter for length-value pairs. It extracts only the value from the
    pair, and calculates the length based on the value.
    See PrefixedArray and PascalString.

    Parameters:
    * subcon - the subcon returning a length-value pair
    """
    __slots__ = []

    def _encode(self, obj, context):
        return (len(obj), obj)

    def _decode(self, obj, context):
        return obj[1]
class OptionalDataAdapter(Adapter):
    __slots__ = []

    def _decode(self, obj, context):
        # Keep the parsed byte only if the flag was set.
        if context.flag:
            return obj
        else:
            return None
So:

struct = Struct("struct",
    Flag("flag"),
    OptionalDataAdapter(UBInt8("optional_data")),
    UBInt8("mandatory")
)
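A quick usage sketch against the sample data from the question (legacy construct 2.5 API, as used above; untested on my side):

>>> struct.parse("010203".decode("hex"))
Container({'flag': True, 'mandatory': 3, 'optional_data': 2})
>>> struct.parse("000203".decode("hex"))
Container({'flag': False, 'mandatory': 3, 'optional_data': None})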