I used Python scripting to run a series of complex queries against three different RDS instances and then exported the data into a CSV file. I am now trying to find a way to automate publishing a dashboard that uses this data to Tableau Server on a weekly basis, such that when I run my Python code it generates new data and the dashboard on Tableau Server is updated as well.
I have already tried several options, including using the full UNC path to the CSV file as the live connection, but Tableau Server had trouble reading this path. Now I'm thinking about just creating a PowerShell script that can be run weekly, which calls the Python script to create the dataset, then refreshes Tableau Desktop, and finally re-publishes/overwrites the dashboard on Tableau Server.
Any ideas on how to proceed with this?
Getting data from a CSV (or Excel) file to Tableau Server:
1. Set up the UNC path so it is accessible from your server. If you do this, you can then set up an extract refresh that reads from the UNC path at the desired frequency.
2. Create an extract with the Tableau SDK: use the SDK to read in the CSV file and generate an extract file.
In our experience, #2 is not very fast. The Tableau SDK seems very slow when generating the extract, and then the extract has to be pushed to the server. I would recommend transferring the file to a location accessible to the server. Even a daily file copy to a shared drive on the server could be used if you're struggling with UNC paths. (Tableau does support UNC paths; you just have to be sure to use them rather than a mapped drive in your setup.)
The extract can be transferred as a file and then pushed (which may be fastest), or it can be pushed remotely.
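If you do go the extract route today, Tableau's newer Hyper API (the successor to the Tableau SDK's extract API) is typically much faster at building extracts from a CSV. Below is a minimal sketch under assumed names: weekly_data.csv, weekly_data.hyper, and the two columns are placeholders for your own schema.

```python
# pip install tableauhyperapi
from tableauhyperapi import (Connection, CreateMode, HyperProcess, SchemaName,
                             SqlType, TableDefinition, TableName, Telemetry,
                             escape_string_literal)

# Placeholder schema -- replace the columns with the ones in your CSV.
table = TableDefinition(
    table_name=TableName("Extract", "Extract"),
    columns=[
        TableDefinition.Column("region", SqlType.text()),
        TableDefinition.Column("sales", SqlType.double()),
    ])

with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    with Connection(endpoint=hyper.endpoint,
                    database="weekly_data.hyper",
                    create_mode=CreateMode.CREATE_AND_REPLACE) as connection:
        connection.catalog.create_schema(SchemaName("Extract"))
        connection.catalog.create_table(table)
        # Bulk-load the CSV straight into the .hyper extract.
        count = connection.execute_command(
            f"COPY {table.table_name} FROM {escape_string_literal('weekly_data.csv')} "
            "WITH (format csv, header)")
        print(f"Loaded {count} rows into weekly_data.hyper")
```

The resulting .hyper file can then be published to the server or used to refresh an existing data source (a scripted refresh is sketched after the next paragraph).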
As far as scheduling the two steps (the Python script and the data extract refresh), I use a poor man's solution myself: I update the CSV file at one point in time (Task Scheduler or cron are some of the tools that could be used) and then set the extract schedule for a slightly later point in time. While this does not link running the Python script to triggering the extract refresh (surely there is a tabcmd command for this), it works just fine for my purposes to put 30 minutes between them, as my processes are reliable and the app is not mission critical.
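There is a tabcmd for this (tabcmd refreshextracts), and the same linkage can be scripted in Python with the tableauserverclient package so the refresh fires right after the CSV is regenerated. A rough sketch; the server URL, credentials, and data source name below are placeholders:

```python
# pip install tableauserverclient
import tableauserverclient as TSC

# Placeholder credentials, URL, and data source name -- substitute your own.
tableau_auth = TSC.TableauAuth("my_user", "my_password", site_id="my_site")
server = TSC.Server("https://tableau.example.com", use_server_version=True)

with server.auth.sign_in(tableau_auth):
    # Find the published data source that points at the weekly extract.
    datasources, _ = server.datasources.get()
    target = next(ds for ds in datasources if ds.name == "Weekly CSV Extract")
    # Kick off the extract refresh right after the Python/CSV step finishes.
    job = server.datasources.refresh(target)
    print("Refresh job queued:", job.id)
```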
Related
I am updating new data every day using a DB connection.
However, due to the large amount of data and an unstable DB, I want to embed the data and distribute it to customers.
The problem is that new data needs to be updated and embedded every day, but there are many dxp files and they cannot all be opened manually every day.
Can this be automated with a Python package or C#?
Note: I have succeeded in converting the sbdf file using only Python code (pip install spotfire).
Is there any way to embed data with the Python spotfire API?
Thank you.
I haven't done it with code before, but I use Spotfire's Automation Services to do this all the time. If you have a Spotfire Server, it should work for you.
Under the Tools menu in Spotfire Analyst there is the Automation Services Job Builder. I create a folder in the Spotfire library for "published" copies and then set up a job that opens the original project and saves it into the "published" folder. The trick is that when you save, there is a checkbox to "embed data in analysis". In my case the job only has two steps, but you could do a number of opens and saves in a row. I then save the job in the library and schedule it on the server to run nightly (via the Spotfire web admin tool, though it can be done on a command line too).
I then inform users that the published copy is updated nightly and opens in a few seconds. If you need the latest and greatest, you can still open the original project and wait the 5-10 minutes it takes to load.
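If you later want a script to trigger the publish step itself rather than relying on the server schedule, Automation Services jobs can also be submitted from the command line with the ClientJobSender utility that ships with Automation Services, which a scheduled Python script can wrap. This is only a sketch: the executable path, server URL, and job file path are assumptions to adapt to your installation.

```python
# Sketch: submit a saved Automation Services job via ClientJobSender.
# The paths and URL below are assumptions -- adjust them to your environment.
import subprocess

CLIENT_JOB_SENDER = r"C:\AutomationServices\Spotfire.Dxp.Automation.ClientJobSender.exe"
SPOTFIRE_SERVER = "http://spotfire.example.com:80"
JOB_FILE = r"C:\AutomationServices\jobs\publish_embedded_copies.xml"

result = subprocess.run([CLIENT_JOB_SENDER, SPOTFIRE_SERVER, JOB_FILE],
                        capture_output=True, text=True)
print(result.stdout)
result.check_returncode()  # raise if the job submission failed
```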
It seems like a good candidate for an Automation Services job. You could even create a custom task for your use case if needed.
But please be aware of the limits on the size of library items in the database (in case the analysis files get too large):
2 GB for SQL Server
4 GB for Oracle
See the KB article
https://support.tibco.com/s/article/Tibco-KnowledgeArticle-Article-48568
I have been trying to use Google Drive's REST API to receive file changes as push notifications, but I have no idea where to start. As I am new to programming altogether, I have been unable to find any solutions.
I am using Python to develop my code, and the script I am writing is meant to monitor any changes in a given spreadsheet and then run some operations on the modified spreadsheet data.
Considering I was able to set up the Sheets and Drive (read-only) APIs properly, I am confident that, given some direction, I would be able to set up this notification receiver/listener as well.
Here is the Google Drive API feature page.
Just follow the guide in Detect Changes:
For Google Drive apps that need to keep track of changes to files, the Changes collection provides an efficient way to detect changes to all files, including those that have been shared with a user. The collection works by providing the current state of each file, if and only if the file has changed since a given point in time. Retrieving changes requires a pageToken to indicate a point in time to fetch changes from.
There's a GitHub code demo that you can test and base your project on.
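For reference, here is a minimal polling sketch against the v3 Changes collection, assuming you already have authorized credentials from your earlier Sheets/Drive setup (the creds argument below is whatever your existing OAuth flow returns). True push notifications additionally require a webhook endpoint registered via changes().watch(); polling as shown is the simpler place to start.

```python
# pip install google-api-python-client
from googleapiclient.discovery import build


def list_drive_changes(creds, saved_page_token=None):
    """Poll the Drive v3 Changes collection; return the token to save for next time.

    'creds' is assumed to be the authorized credentials object from your
    existing Sheets/Drive OAuth setup.
    """
    service = build("drive", "v3", credentials=creds)

    # First run: ask Drive for a starting point and remember it between runs.
    if saved_page_token is None:
        saved_page_token = (service.changes().getStartPageToken()
                            .execute()["startPageToken"])

    page_token = saved_page_token
    while page_token is not None:
        response = service.changes().list(pageToken=page_token,
                                          spaces="drive").execute()
        for change in response.get("changes", []):
            # React to the change, e.g. re-read the spreadsheet if it matches.
            print("Changed file:", change.get("fileId"))
        if "newStartPageToken" in response:
            saved_page_token = response["newStartPageToken"]
        page_token = response.get("nextPageToken")
    return saved_page_token
```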
How do I transfer files from one cloud storage service to another? The files are CSV.
Where is the best place to start with this problem?
For the time being, the files just need to be transferred every week via manual execution. Eventually the files will be transferred on a scheduled basis.
You can start by searching for these sites' APIs. For example, Dropbox has a very well documented API for Python.
If you want to automate your script every X days/hours/etc., you can make use of cron if you are running a Unix-based system.
Hope that helps.
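As a concrete sketch of the upload half, using the official Dropbox SDK for Python; the access token and both paths are placeholders, and the download half would use whichever SDK your source storage provides:

```python
# pip install dropbox
import dropbox

ACCESS_TOKEN = "YOUR_DROPBOX_ACCESS_TOKEN"   # placeholder
LOCAL_CSV = "export/weekly_report.csv"       # placeholder local path
REMOTE_CSV = "/reports/weekly_report.csv"    # placeholder Dropbox path

dbx = dropbox.Dropbox(ACCESS_TOKEN)

with open(LOCAL_CSV, "rb") as f:
    # Overwrite last week's file instead of creating conflicted copies.
    dbx.files_upload(f.read(), REMOTE_CSV,
                     mode=dropbox.files.WriteMode.overwrite)
print("Uploaded", REMOTE_CSV)
```

Once you move past manual runs, a weekly cron entry (or a Task Scheduler job on Windows) can invoke the same script.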
I'm in the process of porting a MySQL database over to a Heroku-hosted, dedicated PostgreSQL instance. I understand how to get the initial data over to Heroku. However, there is a daily "feed" of data from an external company that will need to be imported each day. It is pushed up to an FTP server as a zip file containing several different CSV files. Normally, I could/would just scp it over to the Postgres box and then have a cron job that does a "COPY tablename FROM path/to/file.csv" to import the data. However, using Heroku has me a bit baffled as to the best way to do this. Note: I've seen and reviewed the Heroku dev article on importing data, but that deals more with a dump file; I'm just dealing with a daily import from a CSV file.
Does anyone do something similar to this on Heroku? If so, can you give any advice on the best way to do it?
Just a bit more info: my application is Python/Django 1.3.3 on the Cedar stack, and my files can be a bit large. Some of them have over 50K records, so looping through them with the Django ORM is probably going to be a bit slow (but it still might be the best/only solution).
Two options:
1. Boot up a non-Heroku EC2 instance, fetch the zip from FTP, unzip it, and initiate the COPY from there. By making use of the COPY ... FROM STDIN option (http://www.postgresql.org/docs/9.1/static/sql-copy.html) you can tell Postgres that the data is coming from the client connection, as opposed to a file on the server's filesystem, which you don't have access to.
2. How large is the file? It might fit in a dyno's ephemeral filesystem, so a process or one-off job can download the file from the FTP server and do the whole thing from within a dyno. Once the process exits, away goes the filesystem data.
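For either option, the client-side COPY can be driven from Python with psycopg2, which needs no filesystem access on the Postgres box. A sketch under assumed names: the FTP host and credentials, the zip/CSV layout, the CSV-filename-to-table mapping, and the DATABASE_URL environment variable are all placeholders.

```python
# Sketch: fetch the daily zip from FTP, unzip it, and COPY each CSV into Postgres.
# Host, credentials, table names, and column layout are placeholders.
import io
import os
import zipfile
from ftplib import FTP

import psycopg2

# 1. Download the daily zip from the FTP server into memory.
ftp = FTP("ftp.example.com")
ftp.login("feed_user", "feed_password")
buf = io.BytesIO()
ftp.retrbinary("RETR daily_feed.zip", buf.write)
ftp.quit()
buf.seek(0)

# 2. COPY each CSV in the zip into its table via the client connection (STDIN).
conn = psycopg2.connect(os.environ["DATABASE_URL"])
with conn, conn.cursor() as cur, zipfile.ZipFile(buf) as zf:
    for name in zf.namelist():
        if not name.endswith(".csv"):
            continue
        table = os.path.splitext(os.path.basename(name))[0]  # assumed mapping
        with zf.open(name) as csv_file:
            cur.copy_expert(
                f"COPY {table} FROM STDIN WITH (FORMAT csv, HEADER)",
                csv_file)
```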
Background:
While coding against GAE's local development web server, the user needs to upload megabyte-scale data and store it in the Datastore using the deferred library (not a straightforward store; it requires many format checks and translations).
There are usually about 50,000 entities and the CSV file size is about 5 MB; I tried to insert 200 entities at a time using the deferred library.
And I used Python.
Problem:
The development server is so slow that I need to wait an hour or more for this upload process to finish.
I used the --use_sqlite option to speed up the development web server.
Question:
Is there any other method or tuning that can make it faster?
appengine-mapreduce is definitely an option for loading CSV files. Use the Blobstore to upload the CSV file and then set up the BlobstoreLineInputReader mapper type to load the data into the Datastore.
Some more links: the Python guide to mapreduce reader types is here; the one of interest is BlobstoreLineInputReader. The only input it requires is the key of the Blobstore record containing the uploaded CSV file.
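A rough sketch of the mapper side, assuming the classic Python appengine-mapreduce library; CsvRow and its fields are hypothetical stand-ins for your real model, and the mapper would be wired to mapreduce.input_readers.BlobstoreLineInputReader (with the uploaded file's blob key as its input) in mapreduce.yaml.

```python
# Sketch of a BlobstoreLineInputReader mapper (appengine-mapreduce, Python).
# CsvRow and its properties are hypothetical -- map them to your real model.
from google.appengine.ext import db
from mapreduce import operation as op


class CsvRow(db.Model):
    name = db.StringProperty()
    value = db.FloatProperty()


def process_csv_line(entry):
    """Mapper: BlobstoreLineInputReader yields (byte_offset, line) tuples."""
    _byte_offset, line = entry
    fields = line.rstrip("\n").split(",")
    if len(fields) < 2:
        return  # skip malformed lines; real code would do your format checks here
    yield op.db.Put(CsvRow(name=fields[0], value=float(fields[1])))
```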