I installed the Databricks CLI (which uses the REST API) and now I want to save a test file to my local desktop. This is the command I have, but it throws a syntax error:
dbfs cp dbfs:/myname/test.pptx /Users/myname/Desktop.
SyntaxError: invalid syntax
dbfs cp dbfs:/myname/test.pptx /Users/myname/Desktop
Note, I am on a Mac, so hopefully the path is correct. What am I doing wrong?
To download files from DBFS to your local machine, see this similar Stack Overflow thread, which addresses the same issue:
Not able to copy file from DBFS to local desktop in Databricks
OR
Alternatively, you can use a GUI tool called DBFS Explorer to download the files to your local machine.
DBFS Explorer was created as a quick way to upload and download files to the Databricks filesystem (DBFS). This will work with both AWS and Azure instances of Databricks. You will need to create a bearer token in the web interface in order to connect.
Reference: DBFS Explorer for Databricks
Hope this helps.
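One observation on the error itself: `SyntaxError: invalid syntax` is a message printed by the Python interpreter, which suggests the command was typed at a Python prompt or in a notebook cell rather than in a terminal. If it has to be launched from Python, a minimal sketch is to hand it to the operating system through `subprocess`. This assumes the legacy `dbfs` CLI is installed and configured; the helper names `build_dbfs_cp_command` and `copy_from_dbfs` are made up for illustration.

```python
import subprocess


def build_dbfs_cp_command(src, dst, recursive=False, overwrite=False):
    """Return the argument list for a `dbfs cp` call (hypothetical helper)."""
    cmd = ["dbfs", "cp"]
    if recursive:
        cmd.append("-r")
    if overwrite:
        cmd.append("--overwrite")
    cmd += [src, dst]
    return cmd


def copy_from_dbfs(src, dst, **options):
    """Run `dbfs cp` as a child process; raises CalledProcessError on failure."""
    subprocess.run(build_dbfs_cp_command(src, dst, **options), check=True)


# Example call, using the paths from the question:
# copy_from_dbfs("dbfs:/myname/test.pptx", "/Users/myname/Desktop/test.pptx")
```

Run directly in Terminal on a Mac, the same `dbfs cp` line needs no Python wrapper at all.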
Not able to download entire directory from azure file share in python
I have tried all the basic approaches I could find on Google:
from azure.storage.fileshare import ShareClient
share = ShareClient.from_connection_string(connection_string, "filshare")
my_file = share.get_file_client("dir1/sub_idr1")
# print(dir(my_file))
stream_1 = my_file.download_file()
I tried this in my environment and got the results below.
Initially I tried with Python.
Unfortunately, the ShareServiceClient class, which is a client for interacting with the File Share service at the account level, does not yet support a download operation in the Azure Python SDK.
The ShareClient class, which interacts only with a specific file share in the account, does not support downloading a directory or a whole file share in the Python SDK either.
However, the ShareFileClient class supports downloading individual files inside a directory, though not an entire directory. You can use this class to download the files in a directory one at a time with the Python SDK.
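The file-by-file approach can be sketched as a small recursive walk. This is an illustration rather than a built-in SDK feature: `download_directory` and `download_share_directory` are hypothetical helper names, and the sketch assumes the `azure-storage-file-share` package, whose directory client lists entries that expose `name` and `is_directory`.

```python
import os


def download_directory(dir_client, local_dir):
    """Recursively copy everything under a ShareDirectoryClient into local_dir."""
    os.makedirs(local_dir, exist_ok=True)
    for item in dir_client.list_directories_and_files():
        name = item["name"]
        target = os.path.join(local_dir, name)
        if item["is_directory"]:
            # Descend into the subdirectory with its own client.
            download_directory(dir_client.get_subdirectory_client(name), target)
        else:
            # ShareFileClient.download_file() returns a stream downloader.
            downloader = dir_client.get_file_client(name).download_file()
            with open(target, "wb") as handle:
                handle.write(downloader.readall())


def download_share_directory(connection_string, share_name, remote_dir, local_dir):
    """Convenience wrapper; needs the azure-storage-file-share package."""
    from azure.storage.fileshare import ShareClient  # imported lazily

    share = ShareClient.from_connection_string(connection_string, share_name)
    download_directory(share.get_directory_client(remote_dir), local_dir)
```

With the names from the question, the call would look like `download_share_directory(connection_string, "filshare", "dir1/sub_idr1", "./sub_idr1")`. Note that `readall()` holds each file in memory, so very large files may call for a streaming write instead.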
You can also check Azure Portal > your storage account > File shares > your directory: the Portal has no option to download a directory either, only an option to download a specific file.
As a workaround, if you need to download a directory from a file share, you can use the AzCopy tool to download the directory to your local machine.
I tried downloading the directory with the azcopy command and it worked.
Command:
azcopy copy 'https://mystorageaccount.file.core.windows.net/myfileshare/myFileShareDirectory?sv=2018-03-28&ss=bjqt&srs=sco&sp=rjklhjup&se=2019-05-10T04:37:48Z&st=2019-05-09T20:37:48Z&spr=https&sig=/SOVEFfsKDqRry4bk3xxxxxxxx' 'C:\myDirectory' --recursive --preserve-smb-permissions=true --preserve-smb-info=true
I'm currently working on moving a Python .whl file that I generated in DBFS to my repo located in /Workspace/Repos/My_Repo/My_DBFS_File, so that I can commit the file to Azure DevOps.
As Databricks Repos is a read-only location, it does not permit me to programmatically copy the file to the repo location.
However, the UI provides various options to create or import files from various locations, except DBFS.
Is there a workaround to actually move DBFS files to Repos and then commit them to Azure DevOps?
The documentation says:
Databricks Runtime 11.2 or above.
In a Databricks Repo, you can programmatically create directories and create and append to files. This is useful for creating or modifying an environment specification file, writing output from notebooks, or writing output from execution of libraries, such as Tensorboard.
Using a Databricks cluster with Databricks Runtime 11.2 solved my issue.
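For illustration, once the cluster runs Runtime 11.2 or above, the copy itself can be a plain file copy from a notebook. This is a minimal sketch, assuming DBFS is reachable through the `/dbfs` local mount on the cluster; `copy_into_repo` is a hypothetical helper and the wheel path in the example call is a placeholder.

```python
import shutil
from pathlib import Path


def copy_into_repo(source_file, repo_dir):
    """Copy one file into a repo folder and return the destination path."""
    destination_dir = Path(repo_dir)
    destination_dir.mkdir(parents=True, exist_ok=True)
    destination = destination_dir / Path(source_file).name
    # copyfile copies contents only, without trying to copy permission bits.
    shutil.copyfile(source_file, destination)
    return destination


# Example call on a cluster (the source path is a placeholder):
# copy_into_repo("/dbfs/FileStore/wheels/my_pkg-0.1-py3-none-any.whl",
#                "/Workspace/Repos/My_Repo/My_DBFS_File")
```

After the copy, the file should show up in the repo's Git dialog as a new change that can be committed and pushed to Azure DevOps.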
I have a file on my computer that I want to upload to Google Colab. I know there are numerous ways to do this, including a
from google.colab import files
uploaded = files.upload()
or just uploading manually from the file system. But I want to upload that specific file without needing to choose that file myself.
Something like:
from google.colab import files
file_path = 'path/to/the/file'
files.upload(file_path)
Is there any way to do this?
Providing a file path directly rather than clicking through the GUI for an upload requires access to your local machine's file system. However, when you run IPython magic commands such as %pwd in a Google Colab cell, you'll notice that the current working directory shown is that of the notebook environment, not that of your machine. The ways around the issue are as follows.
1. Local Runtime
Only local runtimes via Jupyter seem to enable such access to the local file system. This requires installing jupyterlab and a Jupyter server extension for using a WebSocket, and then launching a local server. See this tutorial.
2. Google Drive
In case Google Drive is convenient, you can upload files into Google Drive from your local machine without clicking through a GUI.
3. Embracing the GUI
If these options seem like overkill, you unfortunately have to stick with
from google.colab import files
uploaded = files.upload()
as you alluded to.
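For the Google Drive route (option 2), the Colab side can be sketched as follows. This assumes the file has already been synced to Drive and that a mounted Drive exposes "My Drive" under a `MyDrive` folder; `drive_path` and `open_from_drive` are hypothetical helper names. Mounting asks for authorization once per session, so it is not entirely click-free.

```python
from pathlib import Path


def drive_path(relative_path, mount_point="/content/drive"):
    """Return where a file from 'My Drive' appears once Drive is mounted."""
    return Path(mount_point) / "MyDrive" / relative_path


def open_from_drive(relative_path, mount_point="/content/drive"):
    """Mount Drive (Colab only) and open the file for reading in binary mode."""
    from google.colab import drive  # only importable inside Colab

    drive.mount(mount_point)
    return open(drive_path(relative_path, mount_point), "rb")
```

In a notebook, `open_from_drive("path/to/the/file")` would then stand in for the hoped-for `files.upload(file_path)` call.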
Currently, I know of two ways to download a file from a bucket to your computer:
1) Manually go to the bucket and click Download
2) Use gsutil
Is there a way to do this in a program in Google Cloud Functions? You can't execute gsutil in a Python script. I also tried this:
with open("C:\", "wb") as file_obj:
    blob.download_to_file(file_obj)
but I believe that looks for a directory on Google Cloud. Please help!
The code you are using is for downloading a file to the local machine; this is covered in this document. However, as you mention, it looks for the local path on the machine executing the Cloud Function, not on your computer.
I would suggest creating a cron job that downloads the files to your local machine with gsutil; doing this through a Cloud Function is not going to be possible.
Hope you find this useful.
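If a Python script is preferred over gsutil for that scheduled job, the same download can be sketched with the Cloud Storage client library, run on your own computer rather than in the Cloud Function. This assumes the `google-cloud-storage` package and working Application Default Credentials; `download_blob` and `download_from_gcs` are hypothetical helper names.

```python
import os


def download_blob(bucket, blob_name, destination_path):
    """Save one object from a bucket to a path on the machine running this code."""
    os.makedirs(os.path.dirname(os.path.abspath(destination_path)), exist_ok=True)
    bucket.blob(blob_name).download_to_filename(destination_path)
    return destination_path


def download_from_gcs(bucket_name, blob_name, destination_path):
    """Wrapper that builds the client; needs the google-cloud-storage package."""
    from google.cloud import storage  # imported lazily

    client = storage.Client()  # uses Application Default Credentials
    return download_blob(client.bucket(bucket_name), blob_name, destination_path)
```

Because the destination path is resolved on whichever machine executes the code, this only writes to your disk when the script itself runs locally.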
You can't directly achieve what you want: the Cloud Function would have to be able to reach your local computer in order to copy the file onto it.
A common way of sending a file to a computer is FTP. You would install an FTP server on your computer and set up your function to read from your bucket and then send the file to your FTP server (you have to get your public IP, make sure that the firewall rules and routers are configured for this, and so on). It's not the easiest way.
gsutil with the rsync command works perfectly. Use a scheduled task on Windows if you want to run it on a schedule, as you would with cron.
Normally I use the URL below to download a file from the Databricks DBFS FileStore to my local computer.
https://<MY_DATABRICKS_INSTANCE_NAME>/fileStore/?o=<NUMBER_FROM_ORIGINAL_URL>
However, this time the file is not downloaded and the URL leads me to the Databricks homepage instead.
Does anyone have a suggestion on how I can download a file from DBFS to my local machine, or how I should fix the URL to make it work?
Any suggestions would be greatly appreciated!
PJ
Method 1: Using the Databricks portal GUI, you can download full results (max 1 million rows).
Method 2: Using the Databricks CLI
To download full results, first save the file to DBFS and then copy the file to your local machine using the Databricks CLI as follows.
dbfs cp "dbfs:/FileStore/tables/my_my.csv" "A:\AzureAnalytics"
You can access DBFS objects using the DBFS CLI, DBFS API, Databricks file system utilities (dbutils.fs), Spark APIs, and local file APIs.
In a Spark cluster you access DBFS objects using Databricks file system utilities, Spark APIs, or local file APIs.
On a local computer you access DBFS objects using the Databricks CLI or DBFS API.
Reference: Azure Databricks – Access DBFS
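For the DBFS API route from a local computer, a minimal sketch using only the Python standard library is shown here. It assumes the `GET /api/2.0/dbfs/read` endpoint, which returns base64-encoded data in chunks of at most 1 MB, and a personal access token used as a bearer token. The helper names `read_chunk` and `download_dbfs_file` are made up for illustration, and the path is passed without the `dbfs:` prefix (for example `/FileStore/tables/my_my.csv`).

```python
import base64
import json
import urllib.parse
import urllib.request

CHUNK_SIZE = 1024 * 1024  # the read endpoint returns at most 1 MB per call


def read_chunk(host, token, dbfs_path, offset, length):
    """Call GET /api/2.0/dbfs/read and return the decoded JSON response."""
    query = urllib.parse.urlencode(
        {"path": dbfs_path, "offset": offset, "length": length}
    )
    request = urllib.request.Request(
        f"{host}/api/2.0/dbfs/read?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def download_dbfs_file(host, token, dbfs_path, local_path,
                       chunk_size=CHUNK_SIZE, fetch=read_chunk):
    """Write a DBFS file to local_path chunk by chunk; returns bytes written."""
    offset = 0
    with open(local_path, "wb") as out:
        while True:
            reply = fetch(host, token, dbfs_path, offset, chunk_size)
            out.write(base64.b64decode(reply.get("data", "")))
            offset += reply["bytes_read"]
            if reply["bytes_read"] < chunk_size:
                break  # a short (or empty) chunk means end of file
    return offset
```

Here `host` would be the workspace URL, such as `https://<MY_DATABRICKS_INSTANCE_NAME>`. The `fetch` parameter exists only so the HTTP call can be swapped out, for example in tests.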
The DBFS command-line interface (CLI) uses the DBFS API to expose an easy to use command-line interface to DBFS. Using this client, you can interact with DBFS using commands similar to those you use on a Unix command line. For example:
# List files in DBFS
dbfs ls
# Put local file ./apple.txt to dbfs:/apple.txt
dbfs cp ./apple.txt dbfs:/apple.txt
# Get dbfs:/apple.txt and save to local file ./apple.txt
dbfs cp dbfs:/apple.txt ./apple.txt
# Recursively put local dir ./banana to dbfs:/banana
dbfs cp -r ./banana dbfs:/banana
Reference: Installing and configuring Azure Databricks CLI
Method 3: Using a third-party tool named DBFS Explorer
DBFS Explorer was created as a quick way to upload and download files to the Databricks filesystem (DBFS). This will work with both AWS and Azure instances of Databricks. You will need to create a bearer token in the web interface in order to connect.