Upload file to Google bucket directly from SFTP server using Python

I am trying to upload a file from an SFTP server to a GCS bucket using a Cloud Function. The SFTP connection works, but the upload to the GCS bucket does not, and the requirement is to use a Cloud Function with Python.
Any help will be appreciated. Here is the sample code I am trying. It works except for the line sftp.get("test_report.csv", bucket_destination). Please help.
destination_bucket = "gs://test-bucket/reports"
with pysftp.Connection(host, username, password=sftp_password) as sftp:
    print("Connection successfully established ...")
    # Switch to a remote directory
    sftp.cwd('/test/outgoing/')
    bucket_destination = "destination_bucket"
    sftp.cwd('/test/outgoing/')
    if sftp.exists("test_report.csv"):
        sftp.get("test_report.csv", bucket_destination)
    else:
        print("doesnt exist")

pysftp cannot write to GCS directly.
In my opinion, you cannot actually upload a file directly from SFTP to GCS anyhow, at least not from code running on yet another machine. But you can transfer the file without storing it on the intermediate machine, using pysftp Connection.open (or, better, Paramiko SFTPClient.open) and the GCS API Blob.upload_from_file. That's what many actually mean by "directly".
from google.cloud import storage

client = storage.Client(credentials=credentials, project='myproject')
bucket = client.get_bucket('mybucket')
blob = bucket.blob('test_report.csv')
with sftp.open('test_report.csv', bufsize=32768) as f:
    blob.upload_from_file(f)
For the rest of the GCP code, see How to upload a file to Google Cloud Storage on Python 3?
For the purpose of bufsize, see Reading file opened with Python Paramiko SFTPClient.open method is slow.
Consider not using pysftp; it's a dead project. Use Paramiko directly (the code will be mostly the same). See pysftp vs. Paramiko.
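For example, here is a minimal sketch using Paramiko in place of pysftp, assuming the host, credentials, bucket and file names from the question (all placeholders, adjust to your environment):

import paramiko
from google.cloud import storage

# Placeholder connection details; replace with real values.
host = "sftp.example.com"
username = "user"
sftp_password = "password"

# Open the SFTP session with Paramiko.
transport = paramiko.Transport((host, 22))
transport.connect(username=username, password=sftp_password)
sftp = paramiko.SFTPClient.from_transport(transport)

# Stream the remote file straight into the bucket, no local copy needed.
client = storage.Client(project="myproject")
bucket = client.get_bucket("test-bucket")
blob = bucket.blob("reports/test_report.csv")
with sftp.open("/test/outgoing/test_report.csv", bufsize=32768) as f:
    blob.upload_from_file(f)

sftp.close()
transport.close()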

I got the solution based on your reply, thanks. Here is the code:
bucket = client.get_bucket(destination_bucket)
blob = bucket.blob(destination_folder + filename)
with sftp.open(sftp_filename, bufsize=32768) as f:
    blob.upload_from_file(f)

This is exactly what we built SFTP Gateway for. We have a lot of customers that still want to use SFTP, but we needed to write files directly to Google Cloud Storage. Files don't get saved temporarily on another machine; the data is streamed directly from the SFTP client (Python, FileZilla, or any other client) straight to GCS.
https://console.cloud.google.com/marketplace/product/thorn-technologies-public/sftp-gateway?project=thorn-technologies-public
Full disclosure: this is our product, and we use it for all our consulting clients. We are happy to help you get it set up if you want to try it.

Related

Authenticating and transferring CSV files from GCS to a remote network drive in Python

I want to use a containerised Python app to connect and authenticate to an AD-managed network drive on a physical file server, and then transfer some CSV files to it from a Google Cloud bucket. What are the best options for doing this?
So far I have established that I can see the server using:
import smbclient

try:
    smbclient.register_session("xx.xx.xx.xx", username="user", password="pass")
except Exception as exc:
    print(exc)
But this leads me to the problem of the best way to authenticate against it. I'm unfortunately not a very experienced programmer; I was wondering if there is some sort of token exchange that I could implement?
I used:
mount_string = f'mount -t cifs -o username={secret_dict["user"]},password={secret_dict["password"]} //<serverip>/<share> {mount}'
try:
    os.system(mount_string)
except Exception as exc:
    print(exc)
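A hedged sketch of one way to tie the two pieces together: stream each CSV straight from the bucket to the share using the smbclient (smbprotocol) and google-cloud-storage packages, with placeholder server, share, credential, bucket and prefix names:

import smbclient
from google.cloud import storage

# Authenticate the SMB session once per process with the AD account (placeholders).
smbclient.register_session("xx.xx.xx.xx", username="user", password="pass")

# Stream each CSV from the bucket into the share without a local temp file.
client = storage.Client(project="my-project")
for blob in client.list_blobs("my-bucket", prefix="reports/"):
    target = r"\\xx.xx.xx.xx\share" + "\\" + blob.name.split("/")[-1]
    with smbclient.open_file(target, mode="wb") as remote_file:
        blob.download_to_file(remote_file)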

gcs client library stopped working with dev_appserver

Google cloud storage client library is returning 500 error when I attempt to upload via development server.
ServerError: Expect status [200] from Google Storage. But got status 500.
I haven't changed anything with the project and the code still works correctly in production.
I've attempted gcloud components update to get the latest dev_server and I've updated to the latest google cloud storage client library.
I've run gcloud init again to make sure credentials are loaded and I've made sure I'm using the correct bucket.
The project is running on windows 10.
Python version 2.7
Any idea why this is happening?
Thanks
Turns out this has been a problem for a while.
It has to do with how blobstore filenames are generated.
https://issuetracker.google.com/issues/35900575
The fix is to monkeypatch this file:
google-cloud-sdk\platform\google_appengine\google\appengine\api\blobstore\file_blob_storage.py
def _FileForBlob(self, blob_key):
    """Calculate full filename to store blob contents in.

    This method does not check to see if the file actually exists.

    Args:
      blob_key: Blob key of blob to calculate file for.

    Returns:
      Complete path for file used for storing blob.
    """
    blob_key = self._BlobKey(blob_key)
    # Remove bad characters.
    import re
    blob_fname = re.sub(r"[^\w\./\\]", "_", str(blob_key))
    # Make sure it's a relative directory.
    if blob_fname and blob_fname[0] in "/\\":
        blob_fname = blob_fname[1:]
    return os.path.join(self._DirectoryForBlob(blob_key), blob_fname)

Azure web app service with python script

I am completely new to Azure. I have a Python script that does a few operations and gives me an output.
I have an Azure account, and I would like to connect to Blob Storage from the Python script to upload and read files.
1) I created an App Service where I changed a few settings, such as using Python 3.4.
2) I created a Blob Storage account with a container.
3) I connected the Blob Storage to my App Service using "data connection" from the mobile option.
I now want to write a Python script that will upload a file and read it from the blob for processing. I came across here, here.
I am wondering where I can write my Python script to connect to the blob and read it. All I am seeing is connecting to GitHub, OneDrive, or Dropbox. Is there a way to write a Python script inside Azure? I tried reading the Azure documentation; all it says is to connect to GitHub or use the Azure SDK for Python, which is not clear to me.
I saw the Azure console, where I learned to pip install packages. Where can I open a Python environment, write code, run it and test it?
I'd recommend checking out this documentation: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python
It shows how to upload, download, and list blobs using Python (in your case you would list and then download the blobs for processing):
Listing blobs:
# List the blobs in the container
print("\nList blobs in the container")
generator = block_blob_service.list_blobs(container_name)
for blob in generator:
    print("\t Blob name: " + blob.name)
Downloading:
# Download the blob(s).
# Add '_DOWNLOADED' as a suffix to '.txt' so you can see both files in Documents.
full_path_to_file2 = os.path.join(local_path, local_file_name.replace('.txt', '_DOWNLOADED.txt'))
print("\nDownloading blob to " + full_path_to_file2)
block_blob_service.get_blob_to_path(container_name, local_file_name, full_path_to_file2)
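The same quickstart also covers the upload step the question asks about. Here is a minimal sketch with the same legacy SDK, assuming placeholder account credentials, container name and file path:

import os
from azure.storage.blob import BlockBlobService  # legacy azure-storage SDK, as in the quickstart

# Placeholder account and container details.
block_blob_service = BlockBlobService(account_name="myaccount", account_key="mykey")
container_name = "quickstartblobs"

# Upload a local file to the container; it can then be listed and downloaded as above.
local_path = os.path.expanduser("~/Documents")
local_file_name = "QuickStart.txt"
full_path_to_file = os.path.join(local_path, local_file_name)
print("\nUploading to Blob storage as blob " + local_file_name)
block_blob_service.create_blob_from_path(container_name, local_file_name, full_path_to_file)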
I can also suggest using Logic Apps with the Blob Storage connectors; more details can be found here: https://learn.microsoft.com/en-us/azure/connectors/connectors-create-api-azureblobstorage
You can use triggers and actions to perform specific tasks with blobs.

How can I download my data from Google Cloud Platform using Python?

I have my data on Google Cloud Platform and I want to be able to download it locally. This is my first time trying that, and eventually I'll use the downloaded data with my Python code.
I have checked the docs, like https://cloud.google.com/genomics/downloading-credentials-for-api-access and https://cloud.google.com/storage/docs/cloud-console. I have successfully got the JSON file for the first link; the second one is where I'm struggling. I'm using Python 3.5 and, assuming my JSON file's name is data.json, I have added the following code:
os.environ["file"] = "data.json"
urllib.request.urlopen('https://storage.googleapis.com/[bucket_name]/[filename]')
First of all, I don't even know what I should call the value I set in os.environ, so I just called it "file"; I'm not sure how I'm supposed to fill it in. And I got "access denied" on the second line; obviously that's not how to download my file, as there is no local destination directory or anything in that command. Any guidance will be appreciated.
Edit:
from google.cloud.storage import Blob
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "credentials/client_secret.json"
storage_client = storage.Client.from_service_account_json('service_account.json')
client = storage.Client(project='my-project')
bucket = client.get_bucket('my-bucket')
blob = Blob('path/to/my-object', bucket)
download_to_filename('local/path/to/my-file')
I'm getting "unresolved reference" for storage and download_to_filename. Should I replace service_account.json with credentials/client_secret.json? Also, I tried to print the content of os.environ["GOOGLE_APPLICATION_CREDENTIALS"]['installed'] like I would with any JSON, but it just said I should give numbers, meaning it read the input path as regular text only.
You should use the idiomatic Google Cloud library to run operations in GCS.
With the example there, and knowing that the client library will pick up the application default credentials, we first have to set the application default credentials with
gcloud auth application-default login
===EDIT===
That was the old way. Now you should use the instructions in this link.
This means downloading a service account key file from the console, and setting the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the downloaded JSON.
Also, make sure that this service account has the proper permissions on the project of the bucket.
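Since the question stumbled on what to put in os.environ, note that the variable name is GOOGLE_APPLICATION_CREDENTIALS. A minimal sketch, assuming the downloaded key is saved as data.json next to the script:

import os

# Point the client library at the downloaded service account key
# before any storage.Client is created.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "data.json"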
Or you can create the client with explicit credentials. You'll need to download the key file all the same, but when creating the client, use:
storage_client = storage.Client.from_service_account_json('service_account.json')
==========
And then, following the example code:
from google.cloud import storage
client = storage.Client(project='project-id')
bucket = client.get_bucket('bucket-id')
blob = storage.Blob('bucket/file/path', bucket)
blob.download_to_filename('/path/to/local/save')
Or, if this is a one-off download, just install the SDK and use gsutil to download:
gsutil cp gs://bucket/file .

How to copy a file via the browser to Amazon S3 using Python (and boto)?

Creating a file (key) in Amazon S3 using Python (and boto) is not a problem.
With this code, I can connect to a bucket and create a key with a specific content:
bucket_instance = connection.get_bucket('bucketname')
key = bucket_instance.new_key('testfile.txt')
key.set_contents_from_string('Content for File')
I want to upload a file via the browser (file dialogue) into Amazon S3.
How can I realize this with boto?
Thanks in advance
You can't do this with boto, because what you're asking for is purely client-side - there's no direct involvement from the server except to generate the form to post.
What you need to use is Amazon's browser-based upload with POST support. There's a demo of it here.
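The original answer predates boto3. If boto3 is an option, it can generate the URL and signed form fields for that browser-based POST on the server side; a minimal sketch, assuming placeholder bucket and key names:

import boto3

s3 = boto3.client("s3")

# Generate the URL and hidden form fields the browser needs for a direct POST upload.
post = s3.generate_presigned_post(Bucket="bucketname", Key="testfile.txt", ExpiresIn=3600)

# post["url"] becomes the form's action; post["fields"] become hidden inputs,
# followed by the <input type="file" name="file"> field the user picks a file with.
print(post["url"])
print(post["fields"])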
Do you mean this one? Upload files in Google App Engine
