Unable to run a standard SQL query to BigQuery in Python

I am trying to query a table in BigQuery via a Python script, and I have written the query in standard SQL. For this I need to start my query with '#standardsql'. However, when I do this it then comments out the rest of my query. I have tried writing the query across multiple lines, but Python does not allow me to do that either. Has anybody dealt with a problem like this and found a solution? Below is my first attempt, where the query becomes commented out.
client = bigquery.Client('dataworks-356fa')
query = ("#standardsql SELECT count(distinct serial) FROM `dataworks-356fa.FirebaseArchive.test2` Where (PeripheralType = 1 or PeripheralType = 2 or PeripheralType = 12) AND EXTRACT(WEEK FROM createdAt) = EXTRACT(WEEK FROM CURRENT_TIMESTAMP()) - 1 AND serial != 'null'")
dataset = client.dataset('FirebaseArchive')
table = dataset.table('test2')
tbl = dataset.table('Count_BB_Serial_weekly')
job = client.run_async_query(str(uuid.uuid4()), query)
job.destination = tbl
job.write_disposition = 'WRITE_TRUNCATE'
job.begin()
When I try to write the query on two lines like this, Python does not read anything past the first line as part of the query:
query = ("#standardsql
SELECT count(distinct serial) FROM `dataworks-356fa.FirebaseArchive.test2` Where (PeripheralType = 1 or PeripheralType = 2 or PeripheralType = 12) AND EXTRACT(WEEK FROM createdAt) = EXTRACT(WEEK FROM CURRENT_TIMESTAMP()) - 1 AND serial != 'null'")
The query Im running selects values that have been produced within the last week. If there is a variation of this that would not be required to use standardsql I would be willing to switch my other queries as well but I have not been able to figure out how to do that. I would prefer for this to be the last resort though. Thank you for the help!

If you want to flag that you'll be using Standard SQL inside the query itself, you can build it like:
query = """#standardSQL
SELECT count(distinct serial) FROM `dataworks-356fa.FirebaseArchive.test2` Where (PeripheralType = 1 or PeripheralType = 2 or PeripheralType = 12) AND EXTRACT(WEEK FROM createdAt) = EXTRACT(WEEK FROM CURRENT_TIMESTAMP()) - 1 AND serial != 'null'
"""
Another option is to set the use_legacy_sql property of the created job to False, something like:
job = client.run_async_query(job_name, query)
job.use_legacy_sql = False  # this also makes the API use Standard SQL
job.begin()
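In newer releases of the google-cloud-bigquery library, run_async_query no longer exists; a rough equivalent under the newer API (a sketch only, reusing the project, dataset, and table names from the question) would be:
from google.cloud import bigquery

client = bigquery.Client(project='dataworks-356fa')
table_ref = bigquery.TableReference.from_string(
    'dataworks-356fa.FirebaseArchive.Count_BB_Serial_weekly')
job_config = bigquery.QueryJobConfig()
job_config.use_legacy_sql = False  # Standard SQL, no #standardSQL header needed
job_config.destination = table_ref
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
job = client.query(query, job_config=job_config)
job.result()  # wait for the job to finish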

Related

Python MySQL search entire database for value

I have a GUI interacting with my database, and the MySQL database has around 50 tables. I need to search each table for a value and, if it is found, return the field and key of the item in each table. I would like to search for partial matches, e.g. for the search value "test", both "Protest" and "Test123" would be matches. Here is my attempt.
def searchdatabase(self, event):
    print('Searching...')
    self.connect_mysql()  # function to connect to database
    d_tables = []
    results_list = []  # I will store results here
    s_string = "test"  # value I am searching for
    self.cursor.execute("USE db")  # select the database
    self.cursor.execute("SHOW TABLES")
    for (table_name,) in self.cursor:
        d_tables.append(table_name)
    # loop through tables list, get column names, and check if value is in the column
    for table in d_tables:
        # get the columns
        self.cursor.execute(f"SELECT * FROM `{table}` WHERE 1=0")
        field_names = [i[0] for i in self.cursor.description]
        # find value
        for f_name in field_names:
            print("RESULTS:", self.cursor.execute(f"SELECT * FROM `{table}` WHERE {f_name} LIKE {s_string}"))
            print(table)
I get an error on print("RESULTS:", self.cursor.execute(f"SELECT * FROM `{table}` WHERE {f_name} LIKE {s_string}"))
Exception: (1054, "Unknown column 'test' in 'where clause'")
I use a similar insert query that works fine so I am not understanding what the issue is.
ex. insert_query = (f"INSERT INTO `{source_tbl}` ({query_columns}) VALUES ({query_placeholders})")
It may be because of the single quotes you have missed around the value while checking the columns. Try:
print("RESULTS:", self.cursor.execute(f"SELECT * FROM `{table}` WHERE `{f_name}` LIKE '{s_string}'"))
Don’t insert user-provided data into SQL queries like this. It is begging for SQL injection attacks. Your database library will have a way of sending parameters to queries. Use that.
The whole design is fishy. Normally, there should be no need to look for a string across several columns of 50 different tables. Admittedly, sometimes you end up in these situations because of reasons outside your control.
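A minimal sketch of the parameterized version, assuming the MySQL Connector/Python-style cursor from the question (column and table names cannot go through placeholders, but here they come from the cursor metadata rather than from user input; the search value is bound as a parameter):
pattern = f"%{s_string}%"  # partial match, so "Protest" and "Test123" match "test"
self.cursor.execute(
    f"SELECT * FROM `{table}` WHERE `{f_name}` LIKE %s",
    (pattern,),
)
print("RESULTS:", self.cursor.fetchall())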

How to use hex(x'') inside of a SQL statement with Python execute?

I am working in Python/Django with a MySQL DB. This SQL query works fine in my Workbench:
SELECT * FROM frontend_appliance_sensor_reading
WHERE id = (SELECT max(id) FROM frontend_appliance_sensor_reading WHERE sensor_id = hex(x'28E86479972003D2') AND
appliance_id = 185)
I am returning the latest record for a sensor. The hex string is the sensor ID that I need to pass in as a variable in python.
Here is my Python function that would return this object:
def get_last_sensor_reading(appliance_id, sensor_id):
    dw_conn = connection
    dw_cur = dw_conn.cursor()
    appliance_id_lookup = appliance_id
    dw_cur.execute('''
        SELECT * FROM frontend_appliance_sensor_reading as sr
        WHERE id = (SELECT max(id) FROM frontend_appliance_sensor_reading WHERE sensor_id = hex(x{sensor_id_lookup}) AND
        appliance_id = {appliance_id_lookup})
    '''.format(appliance_id_lookup=appliance_id_lookup, sensor_id_lookup=str(sensor_id)))
    values = dw_cur.fetchall()
    dw_cur.close()
    dw_conn.close()
    print values
However, it seems to concatenate the x with the variable, like this:
(1054, "Unknown column 'x9720032F0100DE86' in 'where clause'")
I have tried various string combinations to get it to execute correctly, with no luck. What am I missing? Does the x actually get interpreted as a str? Or should I be converting the value to something else beforehand?
Also, I cannot use Django's ORM for this, as the sensor id is stored in a BinaryField as BLOB data, and you cannot filter by BLOB data in Django. This is the reason I am using a SQL command instead of just doing SensorReading.objects.filter(sensor_id=sensor).latest('id').
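The error message hints that the quotes around the hex digits are being dropped: the generated SQL contains x9720... where MySQL's literal syntax requires x'9720...'. One hedged fix, assuming sensor_id arrives as a plain hex string such as '28E86479972003D2' and that HEX(UNHEX(...)) is an acceptable stand-in for hex(x'...') (both yield the same uppercase hex string):
# Bind the hex string as a parameter and let the driver quote it.
dw_cur.execute('''
    SELECT * FROM frontend_appliance_sensor_reading
    WHERE id = (SELECT max(id) FROM frontend_appliance_sensor_reading
                WHERE sensor_id = HEX(UNHEX(%s))
                  AND appliance_id = %s)
''', [str(sensor_id), appliance_id])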

Django SQL Window Function with Group By

I have this MariaDB query:
SELECT
DAYOFWEEK(date) AS `week_day`,
SUM(revenue)/SUM(SUM(revenue)) OVER () AS `rev_share` FROM orders
GROUP BY DAYOFWEEK(completed)
It results in a table which shows the revenue share per weekday.
My goal is to achieve the same with the help of Django's ORM layer.
I tried the following by using RawSQL:
share = Orders.objects.values(week_day=ExtractIsoWeekDay('date')) \
.annotate(revenue_share=RawSQL('SUM(revenue)/SUM(SUM(revenue)) over ()'))
This results in a single value without a GROUP BY. The query which is executed:
SELECT
WEEKDAY(`orders`.`date`) + 1 AS `week_day`,
(SUM(revenue)/SUM(SUM(revenue)) over ()) AS `revenue_share`
FROM `orders`
And I also tried this, using the Window function:
share = Orders.objects.values(week_day=ExtractIsoWeekDay('date')) \
.annotate(revenue_share=Sum('revenue')/Window(Sum('revenue')))
Which results in the following query:
SELECT
WEEKDAY(`order`.`date`) + 1 AS `week_day`,
(SUM(`order`.`revenue`) / SUM(`order`.`revenue`) OVER ()) AS `rev_share`
FROM `order` GROUP BY WEEKDAY(`order`.`date`) + 1 ORDER BY NULL
But the data is completely wrong. It looks like the Window is not using the whole table.
Thanks for your help in advance.
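One hedged workaround, assuming Django 3.0+ (where ExtractIsoWeekDay is available, as in the question) and that two queries are acceptable: aggregate revenue per weekday first, then derive the shares in Python, sidestepping the window-function-over-GROUP-BY interaction entirely:
from django.db.models import Sum
from django.db.models.functions import ExtractIsoWeekDay

# One row per ISO weekday with its summed revenue.
per_day = (Orders.objects
           .values(week_day=ExtractIsoWeekDay('date'))
           .annotate(revenue=Sum('revenue')))
# Grand total computed client-side, then each day's share.
total = sum(row['revenue'] for row in per_day)
share = [{'week_day': row['week_day'], 'rev_share': row['revenue'] / total}
         for row in per_day]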

Python and Pandas to query APIs and update DB

I've been querying a few APIs with Python to individually create CSVs for a table.
I would like to try, instead of recreating the table each time, updating the existing table with any new API data.
At the moment, the way the query is working, I have a table that looks like this (screenshot not included).
From this I am taking the suburbs of each state and copying them into a CSV for each different state.
Then, using this script, I am cleaning them into a list (the API needs the %20 for any spaces):
#suburbs = ["want this", "want this (meh)", "this as well (nope)"]
suburb_cleaned = []
#dont_want = frozenset( ["(meh)", "(nope)"] )
for urb in suburbs:
cleaned_name = []
name_parts = urb.split()
for part in name_parts:
if part in dont_want:
continue
cleaned_name.append(part)
suburb_cleaned.append('%20'.join(cleaned_name))
Then I take the suburbs for each state and put them into this API to return a CSV:
timestr = time.strftime("%Y%m%d-%H%M%S")
Name = "price_data_NT" + timestr + ".csv"
url_price = "http://mwap.com/api"
string = 'gxg&state='
api_results = {}
n = 0
y = 2
for urbs in suburb_cleaned:
    url = url_price + urbs + string + "NT"
    print(url)
    print(urbs)
    request = requests.get(url)
    api_results[urbs] = pd.DataFrame(request.json())
    n = n + 1
    if n == y:
        dfs = pd.concat(api_results).reset_index(level=1, drop=True).rename_axis(
            'key').reset_index().set_index(['key'])
        dfs.to_csv(Name, sep='\t', encoding='utf-8')
        y = y + 2
        continue
    print("made it through" + urbs)
    # print(request.json())
    # print(api_results)
dfs = pd.concat(api_results).reset_index(level=1, drop=True).rename_axis(
    'key').reset_index().set_index(['key'])
dfs.to_csv(Name, sep='\t', encoding='utf-8')
Then I add the states manually in Excel, and combine and clean the suburb names:
# use pd.concat
df = pd.concat([act, vic,nsw,SA,QLD,WA]).reset_index().set_index(['key']).rename_axis('suburb').reset_index().set_index(['state'])
# apply lambda to clean the %20
f = lambda s: s.replace('%20', ' ')
df['suburb'] = df['suburb'].apply(f)
and then finally insert it into a DB:
engine = create_engine('mysql://username:password@localhost/dbname')
with engine.connect() as conn, conn.begin():
    df.to_sql('Price_historic', conn, if_exists='replace', index=False)
Leading to this sort of output (screenshot not included).
Now, this is a heck of a process. I would love to simplify it and have the database update only the values that need it from the API, without this much complexity in getting the data.
I would love some helpful tips on achieving this goal. I'm thinking I could do an update on the MySQL database instead of an insert, or something like that? And with the querying of the API, I feel like I'm overcomplicating it.
Thanks!
I don't see any reason why you would be creating CSV files in this process. It sounds like you can just query the data and then load it into a MySQL table directly. You say that you are adding the states manually in Excel? Is that data not available through your prior API calls? If not, could you find that information and save it to a CSV, so you can automate that step by loading it into a table and having Python look up the values for you?
Generally, you wouldn't want to overwrite the MySQL table every time. When you have a table, you can identify the column or columns that uniquely identify a specific record, then create a UNIQUE INDEX for them. For example, if your street and price values designate a unique entry, then in MySQL you could run:
ALTER TABLE `Price_historic` ADD UNIQUE INDEX(street, price);
After this, your table will not allow duplicate records based on those values. Then, instead of creating a new table every time, you can insert your data into the existing table, with instructions to either update or ignore when you encounter a duplicate. For example:
final_str = "INSERT INTO Price_historic (state, suburb, property_price_id, type, street, price, date) " \
"VALUES (%s, %s, %s, %s, %s, %s, %s, %s) " \
"ON DUPLICATE KEY UPDATE " \
"state = VALUES(state), date = VALUES(date)"
con = pdb.connect(db_host, db_user, db_pass, db_name)  # pdb: the MySQL driver, e.g. import pymysql as pdb
with con:
    cur = con.cursor()
    cur.executemany(final_str, insert_list)
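As a hedged aside, insert_list here is assumed to be a list of row tuples matching the column order of the INSERT; built from the DataFrame above, it might look like:
# One tuple per row, in the same column order as the INSERT statement.
insert_list = list(df[['state', 'suburb', 'property_price_id', 'type',
                       'street', 'price', 'date']]
                   .itertuples(index=False, name=None))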
If the setup you are trying to build is something for the longer term, I would suggest running two different processes in parallel:
Process 1:
Query API 1, obtain the required data and insert it into the DB table, with a binary/bit flag that records that only API 1 has been called.
Process 2:
Run a query on the DB to obtain all records needing the API 2 call, based on the binary/bit flag set in process 1; for the corresponding data, run call 2 and update the results back into the DB table based on the primary key.
Database: I would suggest adding a primary key as well as a [bit flag][1] that records the status of the different API calls. A bit flag also helps you
- double-check whether a specific API call has been made for a specific record or not, and
- expand your project to additional API calls while still tracking the status of each call at the record level.
A sketch of this bookkeeping follows the link below.
[1]: https://docs.oracle.com/cd/B28359_01/server.111/b28286/functions014.htm#SQLRF00612
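A minimal sketch of the bit-flag bookkeeping, with hypothetical column names (api_status, id) that are not from the question:
API1_DONE, API2_DONE = 1, 2  # one bit per API call

# Process 1: mark a record once API 1 has been handled.
cur.execute("UPDATE Price_historic SET api_status = api_status | %s WHERE id = %s",
            (API1_DONE, record_id))

# Process 2: fetch records where API 1 ran but API 2 has not.
cur.execute("SELECT id FROM Price_historic "
            "WHERE (api_status & %s) AND NOT (api_status & %s)",
            (API1_DONE, API2_DONE))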

What kind of table is created when executing a query that returns nothing?

I'm using petl and trying to create a simple table with a value from a query. I have written the following:
@staticmethod
def get_base_price(date):
    # open connection to db
    # run SQL query to check if price exists for that date
    # set base price to that value if it exists
    # set it to 100 if it doesn't
    sql = '''SELECT [TimeSeriesValue]
             FROM [RAP].[dbo].[TimeSeriesPosition]
             WHERE TimeSeriesTypeID = 12
             AND SecurityMasterID = 45889
             AND FundID = 7
             AND EffectiveDate = %s''' % date
    with self.job.rap.connect() as conn:
        data = etl.fromdb(conn, sql).cache()
    return data
I'm connecting to the database, and if there's a value for that date, then I'll be able to create a table that would look like this:
+-----------------+
| TimeSeriesValue |
+=================+
| 100 |
+-----------------+
However, if the query returns nothing, what would the table look like?
I want to set the TimeSeriesValue to 100 if the query returns nothing. Not sure how to do that.
You should be passing in parameters when you execute the statement rather than munging the string, but that is not central to your question.
Possibly the simplest solution is to do all the work in SQL. If you are expecting at most one row from the query, then:
SELECT COALESCE(TimeSeriesValue, 100) as TimeSeriesValue
FROM [RAP].[dbo].[TimeSeriesPosition]
WHERE TimeSeriesTypeID = 12 AND
SecurityMasterID = 45889 AND
FundID = 7 AND
EffectiveDate = %s
This will always return one row and if nothing is found, it will put in the 100 value.
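If you would rather keep the default on the Python side instead, a minimal petl sketch (assuming the data table returned by the question's get_base_price):
# etl.values() iterates one column; fall back to 100 when no rows came back.
base_price = next(iter(etl.values(data, 'TimeSeriesValue')), 100)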
A query that returns nothing would just display the column name(s) with nothing under them. I'd try something like this (with <your query> standing in for the SELECT above):
IF EXISTS (<your query>)
    <your query>
ELSE
    SELECT 100 AS TimeSeriesValue
