Is there a way to refresh a Tableau embedded data source using Python? I am currently using the Tableau Server Client library to refresh published data sources, which works fine. Can someone help me figure out a way?
Reaching them is somewhat awkward, in my view.
You need to use the populate_connections() function to load the embedded data sources. It is easier if you know the name of the workbook.
import tableauserverclient as TSC

# sign in using a personal access token
server = TSC.Server(server_address='server_name', use_server_version=True)
server.auth.sign_in_with_personal_access_token(
    auth_req=TSC.PersonalAccessTokenAuth(token_name='tokenName',
                                         personal_access_token='tokenValue',
                                         site_id='site_name'))

# use RequestOptions() with a filter to pull a specific workbook
def get_workbook(name):
    req_opt = TSC.RequestOptions()
    req_opt.filter.add(TSC.Filter(req_opt.Field.Name, req_opt.Operator.Equals, name))
    # workbooks.get() returns a list of items you can iterate plus a pagination item;
    # here we assume the filter matches exactly one workbook
    return server.workbooks.get(req_opt)[0][0]

workbook = get_workbook(name='workbook_name')  # gets the workbook
server.workbooks.populate_connections(workbook)  # loads all the embedded data sources of the workbook

for datasource in workbook.connections:  # iterate over the connection list
    # Note: each element of this list is a connection, not a TSC.DatasourceItem,
    # so you need to load a valid one using the element's "datasource_id" attribute.
    # If you try server.datasources.refresh(datasource) directly, it will fail.
    ds = server.datasources.get_by_id(datasource.datasource_id)  # loads a valid TSC.DatasourceItem
    server.datasources.refresh(ds)  # finally, you can refresh it
...
Best practice is not to embed data sources in workbooks but to publish them independently.
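As a rough illustration of that approach, here is a minimal sketch of publishing a data source on its own with TSC; the extract file name 'my_data.hyper' and the project name 'Default' are assumptions for the example, and the authenticated server object from the snippet above is reused.

# Minimal sketch: publish a data source independently instead of embedding it in a workbook.
# 'Default' and 'my_data.hyper' are placeholders.
all_projects, _ = server.projects.get()
project = next(p for p in all_projects if p.name == 'Default')

new_datasource = TSC.DatasourceItem(project_id=project.id)
new_datasource = server.datasources.publish(new_datasource, 'my_data.hyper', mode=TSC.Server.PublishMode.Overwrite)
server.datasources.refresh(new_datasource)  # a published data source can be refreshed directly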
Update:
There is an easier way to achieve this. There are two types of extract refresh tasks, workbook and data source, so for embedded data sources you need to run a workbook refresh.
workbook = get_workbook(name='workbook_name')
server.workbooks.refresh(workbook.id)
You can use "tableauserverclient" Python package. You can pip install it from PyPy.
After installing it, you can consult the docs.
I will attach an example I used some time ago:
import tableauserverclient as TSC
tableau_auth = TSC.TableauAuth('user', 'pass', 'homepage')
server = TSC.Server('server')
with server.auth.sign_in(tableau_auth):
    all_datasources, pagination_item = server.datasources.get()
    print("\nThere are {} datasources on site:".format(pagination_item.total_available))
    print([datasource.name for datasource in all_datasources])
I am using Python to send a query to Athena and get the table DDL. I am using the start_query_execution and get_query_execution functions from the awswrangler package.
import boto3
import awswrangler as wr
import time
import pandas as pd
boto3.setup_default_session(region_name="us-east-1")
sql="show create table 'table-name'"
query_exec_id = wr.athena.start_query_execution(sql=sql, database='database-name')
time.sleep(20)
res=wr.athena.get_query_execution(query_execution_id=query_exec_id)
The code above returns a dict object that stores the query result in an S3 link.
The link can be accessed via
res['ResultConfiguration']['OutputLocation']. It's a text link: s3://.....txt
Can someone help me figure out how to access the output behind the link? I tried using readlines() but it seems to error out.
Here is what I did
import urllib3
target_url = res['ResultConfiguration']['OutputLocation']
f = urllib3.urlopen(target_url)
for l in f.readlines():
    print(l)
Or if someone can suggest an easier way to get table DDL in python.
Keep in mind that the returned link will time out after a short while... and make sure your credentials allow you to get the data from the URL specified. If you drop the error message here, we can help you better.
Oh... "It's a text link: s3://.....txt" is not a standard URL. You can't read that with urllib3. You can use awswrangler to read the bucket.
I think the form is
wr.s3.read_fwf(...)
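Building on that comment, here is a minimal sketch of downloading the SHOW CREATE TABLE output from the result location with plain boto3 instead; parsing the s3:// URL by hand like this is an assumption for the example, not awswrangler's own API.

# Minimal sketch: fetch the Athena query output (a plain-text DDL file) from S3.
# Assumes the usual "s3://bucket/key" format of the output location.
import boto3

output_location = res['ResultConfiguration']['OutputLocation']
bucket, key = output_location.replace("s3://", "").split("/", 1)

s3 = boto3.client("s3")
ddl = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
print(ddl)  # the CREATE TABLE statement as plain text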
I have a script that runs overnight to load some tables from my database. The script runs automatically (it doesn't need any user interaction).
One of the modules gets data from an Excel file that is password-protected.
To get the data from my file I am using the following code:
import xlwings as xw
PATH = 'filename.xlsx'
app = xw.App(visible=False)
wb = xw.Book(PATH, password='ASD')
sheet = wb.sheets['sheet']
My question is: is there any other way to hide the password from the script? Something like the following is what I am trying to get:
wb = xw.Book(PATH, password='******')
Any suggestions?
You would typically use an environment variable. For how to set one on Windows, see e.g. here.
import os
wb = xw.Book(PATH, password=os.environ['EXCEL_FILE_PASSWORD'])
Note that this keeps the password out of the source code, so it doesn't end up in your Git repository, for example. But anybody who has access to the computer with the environment variable can also read its contents, which is your password.
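As a small variation on the same idea, here is a sketch that fails fast with a clear message when the variable is missing; the variable name EXCEL_FILE_PASSWORD is just the one used above.

# Minimal sketch: read the password from the environment and stop early if it is not set.
import os
import xlwings as xw

PATH = 'filename.xlsx'
password = os.environ.get('EXCEL_FILE_PASSWORD')
if password is None:
    raise RuntimeError("EXCEL_FILE_PASSWORD is not set; configure it before running the job.")

app = xw.App(visible=False)
wb = xw.Book(PATH, password=password)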
I am trying to do a quick proof of concept for building a data processing pipeline in Python. To do this, I want to build a Google Cloud Function that is triggered when certain .csv files are dropped into Cloud Storage.
I followed along with this Google Functions Python tutorial, and while the sample code does trigger the function to create some simple logs when a file is dropped, I am really stuck on what call I have to make to actually read the contents of the data. I tried searching for an SDK/API guidance document but have not been able to find it.
In case this is relevant: once I process the .csv, I want to be able to add some data that I extract from it to GCP's Pub/Sub.
The function does not actually receive the contents of the file, just some metadata about it.
You'll want to use the google-cloud-storage client. See the "Downloading Objects" guide for more details.
Putting that together with the tutorial you're using, you get a function like:
from google.cloud import storage
storage_client = storage.Client()
def hello_gcs_generic(data, context):
    bucket = storage_client.get_bucket(data['bucket'])
    blob = bucket.blob(data['name'])
    contents = blob.download_as_string()
    # Process the file contents, etc...
This is an alternative solution using pandas:
Cloud Function Code:
import pandas as pd
def GCSDataRead(event, context):
    bucketName = event['bucket']
    blobName = event['name']
    fileName = "gs://" + bucketName + "/" + blobName
    dataFrame = pd.read_csv(fileName, sep=",")
    print(dataFrame)
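Since the question also mentions pushing extracted data to Pub/Sub, here is a minimal sketch of how that next step might look with the google-cloud-pubsub client; the project ID "my-project", the topic name "csv-rows", and publishing one message per row are assumptions for the example.

# Minimal sketch: publish extracted rows to a Pub/Sub topic from the same function.
# "my-project" and "csv-rows" are placeholder names.
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "csv-rows")

def publish_rows(dataFrame):
    for _, row in dataFrame.iterrows():
        data = row.to_json().encode("utf-8")  # messages must be bytes; each row is sent as JSON here
        future = publisher.publish(topic_path, data=data)
        future.result()  # block until the message is accepted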
I'm part of a project team that created PPTX presentations to present to clients. After creating all of the files, we need to add additional slides to each presentation. All of the new slides will be the same across each presentation.
What is the best way to accomplish this programmatically?
I don't want to use VBA because (as far as I understand) I would have to open each presentation to run the script.
I've tried using the python-pptx library. But the documentation states:
"Copying a slide from one presentation to another turns out to be pretty hard to get right in the general case, so that probably won’t come until more of the backlog is burned down."
I was hoping something like the following would work -
from pptx import Presentation
main = Presentation('Universal.pptx')
abc = Presentation('Test1.pptx')
main_slides = main.slides.get(1)
abc_slides = abc.slides.get(1)
full = main.slides.add_slide(abc_slides[1])
full.save('Full.pptx')
Has anyone had success doing anything like that?
I was able to achieve this using Python and win32com.client. However, it doesn't work quietly: it launches Microsoft PowerPoint, opens the input files one by one, then copies all slides from each input file and pastes them into the output file in a loop.
import win32com.client
from os import walk
def mergePresentations(inputFileNames, outputFileName):
    Application = win32com.client.Dispatch("PowerPoint.Application")
    outputPresentation = Application.Presentations.Add()
    outputPresentation.SaveAs(outputFileName)
    for file in inputFileNames:
        currentPresentation = Application.Presentations.Open(file)
        currentPresentation.Slides.Range(range(1, currentPresentation.Slides.Count + 1)).Copy()
        Application.Presentations(outputFileName).Windows(1).Activate()
        outputPresentation.Application.CommandBars.ExecuteMso("PasteSourceFormatting")
        currentPresentation.Close()
    outputPresentation.Save()
    outputPresentation.Close()
    Application.Quit()

# Example: let's say you have a folder of presentations that need to be merged
# into a new file named "allSlidesMerged.pptx" in the same folder
path, _, files = next(walk('C:\\Users\\..\\..\\myFolder'))
outputFileName = path + '\\' + 'allSlidesMerged.pptx'
inputFiles = []
for file in files:
    inputFiles.append(path + '\\' + file)

mergePresentations(inputFiles, outputFileName)
The GroupDocs.Merger Cloud REST API is another option to merge multiple PowerPoint presentations into a single document. It is a paid API but provides 150 free API calls per month.
Currently, it supports the cloud storage providers Amazon S3, Dropbox, Google Drive, Google Cloud Storage, Windows Azure Storage, and FTP storage, along with GroupDocs' internal cloud storage. In the near future it plans to support merging files from the request body (stream).
P.S.: I'm a developer evangelist at GroupDocs.
# For complete examples and data files, please go to https://github.com/groupdocs-merger-cloud/groupdocs-merger-cloud-python-samples
# Get Client ID and Client Secret from https://dashboard.groupdocs.cloud
client_id = "XXXX-XXXX-XXXX-XXXX"
client_secret = "XXXXXXXXXXXXXXXX"
documentApi = groupdocs_merger_cloud.DocumentApi.from_keys(client_id, client_secret)
item1 = groupdocs_merger_cloud.JoinItem()
item1.file_info = groupdocs_merger_cloud.FileInfo("four-slides.pptx")
item2 = groupdocs_merger_cloud.JoinItem()
item2.file_info = groupdocs_merger_cloud.FileInfo("one-slide.docx")
options = groupdocs_merger_cloud.JoinOptions()
options.join_items = [item1, item2]
options.output_path = "Output/joined.pptx"
result = documentApi.join(groupdocs_merger_cloud.JoinRequest(options))
A free tool called "powerpoint join" can help you.
I have used the gspread library to read a CSV file from Google Docs, but it first requires me to log in.
gc = gspread.login('email','password')
sheetData = gc.open("NSEport").sheet1
I want to directly open a spreadsheet using the key generated when we shared the spreadsheet, without logging in to a Google account.
full_doc = gc.open("NSEport")
list_of_worksheets = full_doc.worksheets()
one_I_want = full_doc.get_worksheet(0) # gets first worksheet
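If the goal is to avoid logging in entirely, one option (assuming the spreadsheet is shared publicly, "anyone with the link can view") is to read the sheet's CSV export URL directly; the key below is a placeholder.

# Minimal sketch: read a publicly shared Google Sheet without authenticating,
# via its CSV export URL. "your_spreadsheet_key" is a placeholder.
import pandas as pd

key = "your_spreadsheet_key"  # the long ID from the sharing URL
url = "https://docs.google.com/spreadsheets/d/" + key + "/export?format=csv&gid=0"  # gid=0 is the first worksheet
sheetData = pd.read_csv(url)
print(sheetData.head())

If you do authenticate (for example with a service account), gspread also offers gc.open_by_key(key) to open a spreadsheet by its key instead of its title.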