I have connected SAS to Python and am trying to extract data. The date is attached to the table name, and I am not allowed to change the table format (total.gross_data_20211201). The table name should be dynamic based on the salary date. I tried the method below, but it's not working. I expect 'user_date' to be applied to make the table name dynamic. Please suggest.
import saspy
import datetime as dt
import pandas as pd

salary_date = pd.to_datetime('01-Dec-2021').strftime('%d-%b-%Y').upper()
user_date = dt.datetime.strptime(salary_date, '%d-%b-%Y').strftime('%Y%m%d')

sas = saspy.SASsession()
sas.symput('user_date', user_date)
xyz = sas.submit("""proc sql;
create table data_extract as
select ID
FROM total.gross_data_20211201
;quit; """)
You can change this in either SAS or Python, but given you're submitting it from Python, it makes sense to do it there.
In that case, you're just passing a string to sas.submit, so modify it using any of the normal means of building strings in Python. A formatted string would be the most typical way of doing that.
But you could also use &user_date in the SAS code, since you already symput it:
FROM total.gross_data_&user_date.
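For example, here is a minimal sketch of both approaches, reusing user_date and the sas session from the question:

# Option 1: build the query string in Python with an f-string
query = f"""proc sql;
create table data_extract as
select ID
FROM total.gross_data_{user_date}
;quit;"""
xyz = sas.submit(query)

# Option 2: let SAS resolve the macro variable set via sas.symput
sas.symput('user_date', user_date)
xyz = sas.submit("""proc sql;
create table data_extract as
select ID
FROM total.gross_data_&user_date.
;quit;""")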
I am trying to implement a transformation in SQL on Databricks. I have tried several ways without success; could someone validate this for me, please:
%sql
SELECT aa.AccountID__c as AccountID__c_2,
aa.LastModifiedDate,
to_timestamp(aa.LastModifiedDate, "yyyy-MM-dd HH:mm:ss.SSS") as test
FROM EVENTS aa
The conversion is not correct: the query executes on the engine, but the test column comes back null.
I have also tried taking a substring of the LastModifiedDate field from position 1 to 19, but without success.
The date format you provided does not match the actual format of that column, which is why you got null. That said, for standard date formats like this one there is no need to provide a format at all; to_timestamp on its own will give the correct results.
%sql
SELECT aa.AccountID__c as AccountID__c_2,
aa.LastModifiedDate,
to_timestamp(aa.LastModifiedDate) as test
FROM EVENTS aa
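For illustration, here is a minimal PySpark sketch of the same behaviour (the sample value is made up, and exact parsing behaviour depends on your Spark version and your actual column values):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("2021-12-01T10:30:00.123",)], ["LastModifiedDate"])

df.select(
    # pattern expects a space separator, the value uses 'T' -> null
    F.to_timestamp("LastModifiedDate", "yyyy-MM-dd HH:mm:ss.SSS").alias("with_format"),
    # no pattern: standard ISO-like formats are parsed automatically
    F.to_timestamp("LastModifiedDate").alias("no_format"),
).show(truncate=False)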
So I have several tables, one per product per year, with names like:
2020product5, 2019product5, 2018product6 and so on. I have added two custom parameters in Google Data Studio, named year and product_id, but I could not use them in the table names themselves. I have used parameterized queries before, but only in conditions like where product_id = #product_id, and that setup only works if all of the data is in the same table, which is not the case here. In Python I would use a string formatter like f"{year}product{product_id}", but that obviously does not work in this case...
Using BigQuery's built-in CONCAT and FORMAT functions in the table name does not help either; both throw the following validation error: Table-valued function not found: CONCAT at [1:15]
So how do I query BigQuery tables in Google Data Studio with Python-like string formatting in the table names, based on custom parameters?
After much research I (kind of) sorted it out. It turns out that querying schema-level entities such as table names dynamically is a database-level feature. BigQuery does not support formatting within a table name, so tables named as in the question (e.g. 2020product5, 2019product5, 2018product6) cannot be queried directly. However, it does have a _TABLE_SUFFIX pseudo-column, which lets you address tables dynamically as long as the varying part is at the end of the table name. (This feature also enables date-wise partitioning, and many tools that use BigQuery as a data sink rely on it, so if you are using BigQuery as a sink there is a good chance your original data source is already doing this.) Thus, table names like product52020, product52019, product62018 can be accessed dynamically, including from Data Studio, using the following:
SELECT * FROM `project_salsa_101.dashboards.product*` WHERE _TABLE_SUFFIX = CONCAT(#product_id, #year)
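For instance, the same wildcard query with a hard-coded suffix (outside Data Studio, using the project and dataset names from above and product 5 / year 2020 from the question) would be:

SELECT * FROM `project_salsa_101.dashboards.product*` WHERE _TABLE_SUFFIX = '52020'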
P.S.: I used Python to write a quick-and-dirty script that looped through the products and years, copying each table to a new name; adding the script here (with formatted strings) in case it is useful for anyone in a similar situation with nominal effort:
import itertools

from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    'project_salsa_101-bq-admin.json')
project_id = 'project_salsa_101'
schema = 'dashboards'
client = bigquery.Client(credentials=credentials, project=project_id)

product_ids = [5, 6]        # products from the question
years = [2018, 2019, 2020]  # years from the question

for product_id, year in itertools.product(product_ids, years):
    # read the old-style table into a dataframe
    df = client.query(f"""
        SELECT * FROM `{project_id}.{schema}.{year}product{product_id}`
    """).result().to_dataframe()
    # write it back under the suffix-friendly name
    df.to_gbq(project_id=project_id,
              destination_table=f'{schema}.product{product_id}{year}',
              credentials=service_account.Credentials.from_service_account_file(
                  'credentials.json'),
              if_exists='replace')
    # drop the old table
    client.query(f"""
        DROP TABLE `{project_id}.{schema}.{year}product{product_id}`""").result()
I am trying to assign a table id to the rendered HTML generated from a pandas DataFrame using Styler.render().
After defining the styles I want for my table in my Python file, I styled my table as below:
styler = df.style.set_table_styles(styles)
Then rendering the styles gives me a table with a dynamically generated id:
styler.render()
How do I get the table id or assign a table id so that I can use it elsewhere in my code?
I read the following:
How to inject a table id into pandas.DataFrame.to_html() output?
But this is not working for me, maybe because I am not exporting from pandas but rather styling and rendering? Kindly advise.
I'm using
styler.render(uuid="my_id")
which gives a table with id:
<table id="T_my_id" >
This at least gives a predictable id, although you probably didn't want the T_ prefix.
Looking at the template used for html rendering, it looks like the table id is set as
id="T_{{uuid}}"
and any attributes you pass are appended after that. This is why df.style.set_table_attributes('id="ABC"').render() gave you two ids, as mentioned in the comments on your question.
I think if you want to avoid the T_ prefix you'd have to provide your own template for rendering, although I didn't look into how to achieve that.
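If a predictable id without the T_ prefix is all you need, one hedged workaround is to post-process the rendered HTML (this also rewrites the matching #T_my_id selectors in the emitted style block, keeping the CSS consistent):

html = styler.render(uuid="my_id").replace("T_my_id", "my_id")

Alternatively, pandas exposes Styler.from_custom_template for supplying your own template; its signature has changed between pandas versions, so check your version's docs (the search path and template name below are made up):

# from pandas.io.formats.style import Styler
# MyStyler = Styler.from_custom_template("templates", "myhtml.tpl")
# html = MyStyler(df).render()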
I'm using the Python SDK to create a TDE file. I want to add multiple tables to the TDE file. So I tried doing that but I got a duplicate name error:
dataextract.Exceptions.TableauException: TableauException (303):
duplicate table name
No problemo, I changed the name so that it counts up with each table I create:
tde_table = tde_file.addTable('Extract'+str(i), table_definition)
but then I get a new and exciting error:
dataextract.Exceptions.TableauException: TableauException (303): table
name must be "Extract"
Perhaps extracts created through the SDK cannot have more than one table? If every table in an extract needs to be named the same thing, but they can't have duplicate names... I'm confused. Can someone help clarify this for me?
Here's all the relevant code I think, but I don't know if it'll be much help:
...
for i, df in enumerate(dataframes):
    table_return_list = _form_table_definition(df, data_types, read_out)
    table_definition = table_return_list[0]
    header_type_map = table_return_list[1]

    # use the table definition to create the table and row
    tde_table = tde_file.addTable('Extract' + str(i), table_definition)
    tde_row = tde.Row(table_definition)
...
It seems that, at present, it's impossible to add more than one table to a data extract through the Python SDK; I don't know otherwise.
http://onlinehelp.tableau.com/current/api/sdk/en-us/SDK/Python/html/classtableausdk_1_1_extract_1_1_extract.html#a70b49a6eca6f1724bd89a928c73ecc8c
From their SDK documentation:
def tableausdk.Extract.Extract.addTable(self, name, tableDefinition)
    Adds a table to the extract.

Parameters
    self  The object pointer.
    name  The name of the table to add.
          Currently, this method can only add a table named "Extract".
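Given that restriction, a hedged workaround sketch is to write each DataFrame to its own .tde file, each containing a single table named "Extract" (the file names are made up; _form_table_definition is the helper from the question):

import dataextract as tde

for i, df in enumerate(dataframes):
    table_definition, header_type_map = _form_table_definition(df, data_types, read_out)
    tde_file = tde.Extract(f'extract_{i}.tde')  # one file per table
    tde_table = tde_file.addTable('Extract', table_definition)
    # ... populate rows with tde.Row(table_definition) as before ...
    tde_file.close()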
I want to select all data, or select with a condition, from the collection random, but I can't find any guide for doing this with MongoDB in Python.
I also can't display the data that was selected.
Here is my code:
from pymongo import MongoClient

def mongoSelectStatement(result_queue):
    client = MongoClient('mongodb://localhost:27017')
    db = client.random
    # find() with a filter document; gia_tri is stored as a string here
    cursor = db.random.find({"gia_tri": "0.5748676522161966"})
    # cursor = db.random.find()  # uncomment to select everything
    inserted_documents_count = cursor.count()  # deprecated in newer PyMongo; use count_documents()
    for document in cursor:
        result_queue.put(document)
There is quite comprehensive documentation for MongoDB. For Python (PyMongo), here is the URL: https://api.mongodb.org/python/current/
Note: consider the version you are running, since the latest version has new features and functions.
To check which PyMongo version you are using, execute the following:
import pymongo
pymongo.version
Now, regarding the select query you asked about: as far as I can tell, the code you presented is fine. Here is the select structure in MongoDB.
First off, it is called find().
In PyMongo, if you want to select specific rows (not really rows; in MongoDB they are called documents, and tables are called collections, but I'll say rows and tables to keep the comparison with SQL easy), use the following structure. I will use random as the collection name and assume the documents have the attributes age: 10, type: ninja, class: black, level: 1903.
db.random.find({ "age":"10" }) This will return all documents that have age 10 in them.
you could add more conditions simply by separating with commas
db.random.find({ "age":"10", "type":"ninja" }) This will select all data with age 10 and type ninja.
If you want to get all data, just leave the filter empty:
db.random.find({})
Now, the previous examples return every attribute (age, type, class, level, and _id). If you want to display specific attributes, say only the age, you have to pass a second argument to find called a projection (1 means show, 0 means do not show):
{'age': 1}
Note that this returns age as well as _id; _id is always returned by default. You have to explicitly tell it not to return it:
db.random.find({ "age": "10", "type": "ninja" }, {"age": 1, "_id": 0})
I hope that gets you started. Take a look at the documentation; it is very thorough.