How do I copy a file from DBFS to REPOS on Databricks? - python

I'm currently working on moving a Python .whl file that I have generated in DBFS into my repo located at /Workspace/Repos/My_Repo/My_DBFS_File, so that I can commit the file to Azure DevOps.
As Databricks Repos is a read-only location, it does not permit me to programmatically copy the file into the repo.
However, the UI provides various options to create or import files from other locations, but not from DBFS.
Is there a workaround to actually move dbfs files to repos and then commit them to Azure DevOps?

The documentation says:
Databricks Runtime 11.2 or above.
In a Databricks Repo, you can programmatically create directories and create and append to files. This is useful for creating or modifying an environment specification file, writing output from notebooks, or writing output from execution of libraries, such as Tensorboard.
Using a Databricks cluster with Runtime 11.2 solved my issue.
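On such a cluster, the copy itself can be done with the standard Python file APIs. A minimal sketch (the wheel name and DBFS path below are placeholders, not the asker's actual files):
import shutil
# Read the wheel through the /dbfs FUSE mount and write it into the repo folder.
# Both paths are placeholders; adjust them to your actual wheel and repo locations.
src = "/dbfs/FileStore/wheels/my_package-0.1.0-py3-none-any.whl"
dst = "/Workspace/Repos/My_Repo/My_DBFS_File/my_package-0.1.0-py3-none-any.whl"
shutil.copy(src, dst)
The copied file then shows up in the repo and can be committed and pushed to Azure DevOps from the Repos Git dialog.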

Related

How to download entire directory from azure file share

Not able to download entire directory from azure file share in python
I have tried the basic approaches I could find on Google:
from azure.storage.fileshare import ShareClient
# Connect to the share and get a client for the path inside it
share = ShareClient.from_connection_string(connection_string, "filshare")
my_file = share.get_file_client("dir1/sub_idr1")
# print(dir(my_file))
stream_1 = my_file.download_file()
I tried this in my environment and got the results below.
Initially I tried with Python.
Unfortunately, the ShareServiceClient class, which interacts with the File Share service at the account level, does not support a download operation in the Azure Python SDK.
The ShareClient class, which interacts with a specific file share in the account, does not offer an option to download a directory or the whole share either.
However, the ShareFileClient class does support downloading individual files inside a directory (though not an entire directory at once), so you can use it to download the files from a directory one by one with the Python SDK.
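For example, a minimal sketch along those lines (connection_string, the share name and the directory path are the placeholders from the question; this downloads only the files directly inside the directory, not nested subdirectories):
import os
from azure.storage.fileshare import ShareClient
share = ShareClient.from_connection_string(connection_string, "filshare")
dir_client = share.get_directory_client("dir1/sub_idr1")
os.makedirs("downloads", exist_ok=True)
for item in dir_client.list_directories_and_files():
    if not item["is_directory"]:
        # Download each regular file in the directory to the local "downloads" folder
        file_client = dir_client.get_file_client(item["name"])
        with open(os.path.join("downloads", item["name"]), "wb") as f:
            file_client.download_file().readinto(f)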
You can also check in the Azure Portal (select your storage account > File shares > directory): there is no option in the portal to download a whole directory either, only individual files.
As a workaround, if you need to download a directory from a file share, you can use the AzCopy tool to download the directory to your local machine.
I tried downloading the directory with the AzCopy command and was able to download it successfully!
Command:
azcopy copy 'https://mystorageaccount.file.core.windows.net/myfileshare/myFileShareDirectory?sv=2018-03-28&ss=bjqt&srs=sco&sp=rjklhjup&se=2019-05-10T04:37:48Z&st=2019-05-09T20:37:48Z&spr=https&sig=/SOVEFfsKDqRry4bk3xxxxxxxx' 'C:\myDirectory' --recursive --preserve-smb-permissions=true --preserve-smb-info=true

Reading BitBucket files via Databricks notebook using python

I am new to Databricks and want to clone Bitbucket repo to Databricks and access the respective files.
How can I achieve this?

Unable to save file from Databricks to Desktop

I installed the Databricks CLI (which uses the REST API) and now I want to save a test file to my local desktop. This is the command I have, but it throws a syntax error:
dbfs cp dbfs:/myname/test.pptx /Users/myname/Desktop.
SyntaxError: invalid syntax
dbfs cp dbfs:/myname/test.pptx /Users/myname/Desktop
Note, I am on a Mac, so hopefully the path is correct. What am I doing wrong?
To download files from DBFS to your local machine, you can check out this similar SO thread, which addresses the same issue:
Not able to copy file from DBFS to local desktop in Databricks
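As an aside, dbfs cp is a shell command; the SyntaxError: invalid syntax suggests it is being typed at a Python prompt rather than in a terminal. Running it from a regular terminal should work, or it can be invoked from Python via subprocess. A minimal sketch, assuming the Databricks CLI is installed and configured (paths are the ones from the question, with an explicit destination file name added):
import subprocess
# Run the same dbfs cp command the question uses, but from Python instead of the shell.
subprocess.run(
    ["dbfs", "cp", "dbfs:/myname/test.pptx", "/Users/myname/Desktop/test.pptx"],
    check=True,
)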
OR
Alternatively, you can use a GUI tool called DBFS Explorer to download the files to your local machine.
DBFS Explorer was created as a quick way to upload and download files to the Databricks filesystem (DBFS). This will work with both AWS and Azure instances of Databricks. You will need to create a bearer token in the web interface in order to connect.
Reference: DBFS Explorer for Databricks
Hope this helps.

How to save and download locally csv in DBFS? [duplicate]

Normally I use the URL below to download a file from the Databricks DBFS FileStore to my local computer.
*https://<MY_DATABRICKS_INSTANCE_NAME>/fileStore/?o=<NUMBER_FROM_ORIGINAL_URL>*
However, this time the file is not downloaded and the URL leads me to the Databricks homepage instead.
Does anyone have a suggestion on how I can download the file from DBFS to my local machine, or how I should fix the URL to make it work?
Any suggestions would be greatly appreciated!
PJ
Method 1: Using the Databricks portal GUI, you can download the full results (max 1 million rows).
Method 2: Using the Databricks CLI
To download the full results, first save the file to DBFS and then copy it to your local machine using the Databricks CLI, as follows (a sketch of the save step is shown after the command):
dbfs cp "dbfs:/FileStore/tables/my_my.csv" "A:\AzureAnalytics"
You can access DBFS objects using the DBFS CLI, DBFS API, Databricks file system utilities (dbutils.fs), Spark APIs, and local file APIs.
In a Spark cluster you access DBFS objects using Databricks file system utilities, Spark APIs, or local file APIs.
On a local computer you access DBFS objects using the Databricks CLI or DBFS API.
Reference: Azure Databricks – Access DBFS
The DBFS command-line interface (CLI) uses the DBFS API to expose an easy to use command-line interface to DBFS. Using this client, you can interact with DBFS using commands similar to those you use on a Unix command line. For example:
# List files in DBFS
dbfs ls
# Put local file ./apple.txt to dbfs:/apple.txt
dbfs cp ./apple.txt dbfs:/apple.txt
# Get dbfs:/apple.txt and save to local file ./apple.txt
dbfs cp dbfs:/apple.txt ./apple.txt
# Recursively put local dir ./banana to dbfs:/banana
dbfs cp -r ./banana dbfs:/banana
Reference: Installing and configuring Azure Databricks CLI
Method 3: Using a third-party tool named DBFS Explorer
DBFS Explorer was created as a quick way to upload and download files to the Databricks filesystem (DBFS). This will work with both AWS and Azure instances of Databricks. You will need to create a bearer token in the web interface in order to connect.

Simple Google Cloud deployment: Copy Python files from Google Cloud repository to app engine

I'm implementing continuous integration and continuous delivery for a large enterprise data warehouse project.
All the code resides in a Google Cloud repository, and I'm able to set up a Google Cloud Build trigger so that every time code of a specific file type (Python scripts) is pushed to the master branch, a Google Cloud build starts.
The Python scripts don't make up an app. They contain an ODBC connection string and code to extract data from a source and store it as a CSV file. The Python scripts are to be executed on a Google Compute Engine VM instance with Airflow installed.
So the deployment of the Python scripts is as simple as can be: the .py files only need to be copied from the Google Cloud repository folder to a specific folder on the Google VM instance. There is not really a traditional build to run, as all the Python files are separate from each other and not part of an application.
I thought this would be really easy, but I have now spent several days trying to figure it out with no luck.
Google Cloud Platform provides several Cloud Builders, but as far as I can see none of them can do this simple task. Using gcloud also does not work: it can copy files, but only from a local PC to a VM, not from the source repository to a VM.
What I'm looking for is a YAML or JSON build config file that copies those Python files from the source repository to the Google Compute Engine VM instance.
Hoping for some help here.
The files/folders in the Google Cloud repository aren't directly accessible (it's like a bare git repository); you need to first clone the repo and then copy the desired files/folders from the cloned repo to their destinations.
It might be possible to use a standard Fetching dependencies build step to clone the repo, but I'm not 100% certain of that in your case, since you're not actually doing a build:
steps:
- name: gcr.io/cloud-builders/git
  args: ['clone', 'https://github.com/GoogleCloudPlatform/cloud-builders']
If not you may need one (or more) custom build steps. From Creating Custom Build Steps:
A custom build step is a container image that the Cloud Build worker VM pulls and runs with your source volume-mounted to /workspace. Your custom build step can execute any script or binary inside the container; as such, it can do anything a container can do.
Custom build steps are useful for:
Downloading source code or packages from external locations
...
