List Google Cloud Storage buckets using Python

I'm following this tutorial: https://developers.google.com/storage/docs/gspythonlibrary and getting a couple of errors while trying to list my buckets.
I've downloaded gsutil and added it to my PYTHONPATH which looks like this:
/home/nicolasalvo/tools/gsutil/third_party/boto:/home/nicolasalvo/tools/gsutil
I've also executed:
pip install -U oauth2client
The code I'm trying to run is:
import StringIO
import os
import shutil
import tempfile
import time
from gslib.third_party.oauth2_plugin import oauth2_plugin
import boto
# URI scheme for Google Cloud Storage.
GOOGLE_STORAGE = 'gs'
# URI scheme for accessing local files.
LOCAL_FILE = 'file'
uri = boto.storage_uri('', GOOGLE_STORAGE)
for bucket in uri.get_all_buckets():
    print bucket.name
The first error I've got is:
File "<stdin>", line 1, in <module>
File "/home/nicolasalvo/tools/gsutil/gslib/third_party/oauth2_plugin/oauth2_plugin.py", line 3, in <module>
import oauth2_client
File "/home/nicolasalvo/tools/gsutil/gslib/third_party/oauth2_plugin/oauth2_client.py", line 33, in <module>
import socks
I fixed this manually by changing:
import socks
to be
import httplib2.socks
Now the error I'm facing is:
File "/home/nicolasalvo/tools/gsutil/third_party/boto/boto/connection.py", line 876, in _mexe
request.authorize(connection=self)
File "/home/nicolasalvo/tools/gsutil/third_party/boto/boto/connection.py", line 377, in authorize
connection._auth_handler.add_auth(self, **kwargs)
File "/home/nicolasalvo/tools/gsutil/gslib/third_party/oauth2_plugin/oauth2_plugin.py", line 22, in add_auth
self.oauth2_client.GetAuthorizationHeader()
File "/home/nicolasalvo/tools/gsutil/gslib/third_party/oauth2_plugin/oauth2_client.py", line 325, in GetAuthorizationHeader
return 'Bearer %s' % self.GetAccessToken().token
File "/home/nicolasalvo/tools/gsutil/gslib/third_party/oauth2_plugin/oauth2_client.py", line 288, in GetAccessToken
token_exchange_lock.acquire()
NameError: global name 'token_exchange_lock' is not defined
I've tried declaring the global object before using it, but that hasn't helped.
I would also expect to be able to use the Google-provided libraries for Cloud Storage without requiring manual fixes.
I'm running Python 2.7.3
Any help is greatly appreciated

This worked for me:
import threading
from gslib.third_party.oauth2_plugin import oauth2_client
oauth2_client.token_exchange_lock = threading.Lock()

import StringIO
import os
import shutil
import tempfile
import time
import threading
import boto
from gslib.third_party.oauth2_plugin import oauth2_plugin
from gslib.third_party.oauth2_plugin import oauth2_client
oauth2_client.token_exchange_lock = threading.Lock()
# URI scheme for Google Cloud Storage.
GOOGLE_STORAGE = 'gs'
# URI scheme for accessing local files.
LOCAL_FILE = 'file'
project_id = 'abc.com:abc'
uri = boto.storage_uri('', GOOGLE_STORAGE)
header_values = {"x-goog-project-id": project_id}
for bucket in uri.get_all_buckets(headers=header_values):
    print bucket.name
Yes it works!!
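Why the lock assignment fixes the NameError: oauth2_client refers to token_exchange_lock as a module-level global but never defines it, so assigning the attribute on the imported module injects the missing name from outside. A minimal sketch of the same mechanism on a toy module (toy_client is invented for the demo; it is not part of gsutil):

```python
import threading
import types

# Build a throwaway module whose function references a global it never
# defines, mirroring oauth2_client's use of token_exchange_lock.
toy_client = types.ModuleType("toy_client")
exec(
    "def get_token():\n"
    "    token_exchange_lock.acquire()\n"
    "    try:\n"
    "        return 'token'\n"
    "    finally:\n"
    "        token_exchange_lock.release()\n",
    toy_client.__dict__,
)

# Assigning the attribute from the caller defines the module's global.
toy_client.token_exchange_lock = threading.Lock()
print(toy_client.get_token())  # token
```

The function's globals are the module's __dict__, so setting the attribute from outside is exactly equivalent to the module defining it itself.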

Related

Python code in AWS Lambda to load Shapefile to PostGIS(RDS)

I am new to GIS. I am trying to develop AWS Lambda code in Python to load a shapefile dynamically.
I developed the code after doing some research, and it works perfectly on my local machine.
But the same code gives trouble when I try to run it in AWS Lambda.
I have added libraries (Lambda layers) for OSGEO/GDAL in AWS Lambda and tested them by calling import, which works fine.
Following is the code :
import os
import subprocess
import boto3
import urllib.parse
from osgeo import gdal
from osgeo import ogr
s3 = boto3.client('s3')
def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    s3key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    # input shapefile
    input_shp = 's3://' + bucket + '/' + s3key
    # database options
    db_schema = "SCHEMA=public"
    overwrite_option = "OVERWRITE=YES"
    geom_type = "MULTILINESTRING"
    output_format = "PostgreSQL"
    # database connection string
    db_connection = """PG:host=<RDS host name> port=5432 user=<RDS User Name> dbname=postgres password=<RDS Password>"""
    # call ogr2ogr from python
    subprocess.call(["ogr2ogr", "-lco", db_schema, "-lco", overwrite_option, "-nlt", geom_type, "-f", output_format, db_connection, input_shp])
Error Message is :
[ERROR] FileNotFoundError: [Errno 2] No such file or directory: 'ogr2ogr': 'ogr2ogr'
The same code works fine on my local machine; the only difference is that instead of S3 I provide a hard-coded path to where the shapefile is stored locally.
Any suggestions?
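No answer was posted here, but the error itself says the ogr2ogr executable is not on the Lambda runtime's PATH. A GDAL layer is unpacked under /opt, so the binary, if the layer ships one at all, would live somewhere like /opt/bin; that location is an assumption about how the layer was packaged. A hedged sketch of a guard that locates the binary before shelling out:

```python
import shutil
import subprocess

def run_ogr2ogr(args, extra_search_path="/opt/bin"):
    """Locate ogr2ogr before calling it, so a missing binary gives a clear error."""
    # shutil.which searches PATH; also probe the assumed layer path.
    exe = shutil.which("ogr2ogr") or shutil.which("ogr2ogr", path=extra_search_path)
    if exe is None:
        raise RuntimeError(
            "ogr2ogr not found: add its directory to PATH, "
            "or use the osgeo.gdal Python API instead of a subprocess"
        )
    return subprocess.call([exe] + list(args))
```

Note also that ogr2ogr cannot open an s3:// URL directly; GDAL reads S3 objects through its /vsis3/ virtual file system (e.g. /vsis3/bucket/key), so the input path would need that form as well.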

Read an object from Alibaba OSS and modify it using pandas python

So, my data is in the format of CSV files in the OSS bucket of Alibaba Cloud.
I am currently executing a Python script, wherein:
I download the file into my local machine.
Do the changes using Python script in my local machine.
Store it in AWS Cloud.
I have to modify this method and schedule a cron job in Alibaba Cloud to automate the running of this script.
The Python script will be uploaded into Task Management of Alibaba Cloud.
So the new steps will be:
Read a file from the OSS bucket into Pandas.
Modify it - Merging it with other data, some column changes. - Will be done in pandas.
Store the modified file into AWS RDS.
I am stuck at the first step itself.
Error log:
"No module named ..." errors for oss2 and pandas.
What is the correct way of doing this?
This is a rough draft of my script (how I was able to execute it on my local machine):
import os,re
import oss2            # ** throws an error: No module found **
import datetime as dt
import pandas as pd    # ** throws an error: No module found **
import tarfile
import mysql.connector
from datetime import datetime
from itertools import islice

dates = (dt.datetime.now()+dt.timedelta(days=-1)).strftime("%Y%m%d")

def download_file(access_key_id,access_key_secret,endpoint,bucket):
    # Authentication
    auth = oss2.Auth(access_key_id, access_key_secret)
    # Bucket name
    bucket = oss2.Bucket(auth, endpoint, bucket)
    # Download the file
    try:
        # List all objects in the fun folder and its subfolders.
        for obj in oss2.ObjectIterator(bucket, prefix=dates+'order'):
            order_file = obj.key
            objectName = order_file.split('/')[1]
            df = pd.read_csv(bucket.get_object(order_file))  # to read into pandas
            # FUNCTION to modify and upload
        print("File downloaded")
    except:
        print("Pls check!!! File not read")
    return objectName
import os,re
import oss2
import datetime as dt
import pandas as pd
import tarfile
import mysql.connector
from datetime import datetime
from itertools import islice
import io  ## include this new library

dates = (dt.datetime.now()+dt.timedelta(days=-1)).strftime("%Y%m%d")

def download_file(access_key_id,access_key_secret,endpoint,bucket):
    # Authentication
    auth = oss2.Auth(access_key_id, access_key_secret)
    # Bucket name
    bucket = oss2.Bucket(auth, endpoint, bucket)
    # Download the file
    try:
        # List all objects in the fun folder and its subfolders.
        for obj in oss2.ObjectIterator(bucket, prefix=dates+'order'):
            order_file = obj.key
            objectName = order_file.split('/')[1]
            bucket_object = bucket.get_object(order_file).read()  ## read the file from OSS
            img_buf = io.BytesIO(bucket_object)
            df = pd.read_csv(img_buf)  # to read into pandas
            # FUNCTION to modify and upload
        print("File downloaded")
    except:
        print("Pls check!!! File not read")
    return objectName
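The key change in the answer above is wrapping the raw bytes from bucket.get_object(...).read() in io.BytesIO, so that pd.read_csv receives a file-like object. The pattern in isolation, with inline bytes standing in for the OSS response (the sample CSV content is made up):

```python
import io
import pandas as pd

# Stand-in for bucket.get_object(order_file).read(): raw CSV bytes.
raw = b"order_id,amount\n1,10\n2,20\n"

# BytesIO makes the in-memory bytes look like an open binary file.
df = pd.read_csv(io.BytesIO(raw))
print(df.shape)  # (2, 2)
```

As for the "No module found" errors: the scheduled environment simply needs the packages installed for the interpreter the cron job uses, e.g. pip install oss2 pandas.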

global name 'json' is not defined

Here is the beginning of createPeliMelo.py
def creation(path,session):
    myPathFile=path+session+'.txt'
    print myPathFile
    pelimeloFile = open(path+session+'.txt', 'r')
    with pelimeloFile as inf:
        data = json.loads(inf.read())
Here is my Python script inside Maya:
import maya.cmds as cmds
import json
import os
from itertools import islice
import createPeliMelo as PeliMelo
PeliMelo.creation('C:/Users/francesco/Desktop/pelimelo video printemps/','session5723')
Here is the error I got:
Error: line 1: NameError: file C:/Users/francesco/Documents/maya/2016/scripts\createPeliMelo.py line 17: global name 'json' is not defined #
Line 17 is: data = json.loads(inf.read())
Where am I wrong?
When you import something, that import only applies to the file it appears in. This means that if you want to use json in createPeliMelo.py, you need to do import json in THAT file, not in your second script. Imports from one file do not propagate to another.
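A small, self-contained reproduction of that rule (the module broken_mod and its temp path are invented for the demo): a module that references json without importing it raises NameError when called, no matter what the calling script imports.

```python
import importlib.util
import json  # imported HERE; this does NOT help the other module
import os
import tempfile
import textwrap

# Write a throwaway module that forgets its own "import json".
src = textwrap.dedent("""
    def load(s):
        return json.loads(s)  # NameError: 'json' is not defined in THIS module
""")
path = os.path.join(tempfile.mkdtemp(), "broken_mod.py")
with open(path, "w") as f:
    f.write(src)

spec = importlib.util.spec_from_file_location("broken_mod", path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

try:
    mod.load('{"a": 1}')
except NameError:
    print("NameError, exactly as in the question")

# The real fix is "import json" inside broken_mod.py; injecting the
# name into that module's namespace also works:
mod.json = json
print(mod.load('{"a": 1}'))  # {'a': 1}
```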

Django not recognizing or seeing JSON file

I've been working on integrating Google Sheets with Django using gspread. I can see the data when I run python filename.py, but when I run python manage.py runserver, I keep getting this error:
IOError: [Errno 2] No such file or directory: 'key.json'
It's not recognizing or seeing my JSON file for some reason; I've also tried using 'key' without the .json, with no luck. I've been googling around; any ideas? Here's my code below.
*************************** code below *******************************
import gspread
import json
from oauth2client.service_account import ServiceAccountCredentials
import os
scope = ['https://spreadsheets.google.com/feeds']
credentials = ServiceAccountCredentials.from_json_keyfile_name('key.json', scope)
gc = gspread.authorize(credentials)
wks = gc.open("RAMP - Master").sheet1
print wks
cell_list = wks.range('A1:B7')
print cell_list
If key.json is in the same directory as the file you're running, then the correct syntax is:
import os

DIRNAME = os.path.dirname(__file__)

credentials = ServiceAccountCredentials.from_json_keyfile_name(
    os.path.join(DIRNAME, 'key.json'),
    scope
)
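The underlying issue: a bare 'key.json' is resolved against the process's current working directory (for manage.py runserver, wherever you launched it), not against the module's directory. A quick stdlib illustration of the difference (no Django assumed):

```python
import os

# A bare filename resolves against the CWD of the process...
relative = os.path.abspath("key.json")

# ...while anchoring on the module file is independent of where you launch from.
anchored = os.path.join(os.path.dirname(os.path.abspath(__file__)), "key.json")

# The two agree only when the CWD happens to be this file's directory.
print(relative, anchored)
```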

Twilio Python Module Errors After Compiling

I have written a simple program that opens a CSV file and texts all the numbers in it. I am using Twilio (twilio-python) as the service provider. My code works fine as a Python script; however, when I compile the script (using py2exe), the exe errors out. This is the error I receive from the log file:
Traceback (most recent call last):
File "sms.py", line 39, in <module>
File "twilio\rest\resources\messages.pyc", line 112, in create
File "twilio\rest\resources\base.pyc", line 352, in create_instance
File "twilio\rest\resources\base.pyc", line 204, in request
File "twilio\rest\resources\base.pyc", line 129, in make_twilio_request
File "twilio\rest\resources\base.pyc", line 101, in make_request
File "httplib2\__init__.pyc", line 1570, in request
File "httplib2\__init__.pyc", line 1317, in _request
File "httplib2\__init__.pyc", line 1252, in _conn_request
File "httplib2\__init__.pyc", line 1021, in connect
File "httplib2\__init__.pyc", line 80, in _ssl_wrap_socket
File "ssl.pyc", line 387, in wrap_socket
File "ssl.pyc", line 141, in __init__
ssl.SSLError: [Errno 185090050] _ssl.c:340: error:0B084002:x509 certificate
routines:X509_load_cert_crl_file:system lib
I don't receive this error when I run the uncompiled code (below):
import sys  # 2 params --- /path/to/contact/file --- up to 160 char msg
import csv
import time
from twilio.rest import TwilioRestClient

ACCOUNT_SID = "**************************"
AUTH_TOKEN = "**************************"
client = TwilioRestClient(ACCOUNT_SID, AUTH_TOKEN)

sys.argv.pop(0)
contactFile = sys.argv[0]
sys.argv.pop(0)
msg = (' ').join(sys.argv)

print contactFile
print " "
print msg

info = []
with open(contactFile,'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in reader:
        info.append(row)

contactCount = len(info)-1
if contactCount > 0:
    # remove first item from list because its not a value that is needed....
    info.pop(0)
    for i in info:
        print " "
        contactName = i[0]
        phoneNumber = i[1]
        print "Texting " + contactName + "... \n"
        client.messages.create(
            to=phoneNumber,
            from_="+14782856136",
            body=msg
        )
        time.sleep(1.5)
else:
    print("SMSify Error \n The contact file doesn't have any contacts in it.")
Any thoughts on what is going on??
EDIT:
Here is my setup.py file
from distutils.core import setup
import py2exe, sys, os

sys.argv.append('py2exe')

Mydata_files = [('cacert.pem', ['C:\\Python27\\Lib\\site-packages\\twilio\\conf\\cacert.pem'])]

setup(
    console=['sms.py'],
    data_files=Mydata_files,
    options={
        "py2exe": {
            "bundle_files": 1,
            "compressed": True
        }
    }
)
This happens because the CA certificate file is missing from the bundle.
The problem is the same for the requests and httplib2 modules.
For example, if you have a file named req_example.py that uses the requests module:
import requests
url = 'https://google.com/'
requests.get(url)
It works when you run it as python req_example.py, but not after it is bundled.
Or if you have a file named http2_example.py that uses the httplib2 module:
import httplib2
url = 'https://google.com/'
http = httplib2.Http()
http.request(url)
It works when you run it as python http2_example.py, but not after it is bundled.
To fix that, you have two options: one bad and one good.
Disable verifying SSL certificates:
To do that for requests module:
import requests
url = 'https://google.com/'
requests.get(url, verify=False)
And for the httplib2 module:
import httplib2

url = 'https://google.com/'
http = httplib2.Http(disable_ssl_certificate_validation=True)
http.request(url)
Add the CA certificate file to the bundle:
For the requests module, the file cacert.pem is located in:
.../PythonXX/lib/site-packages/requests/cacert.pem
And for the httplib2 module it is in:
.../PythonXX/lib/site-packages/httplib2/cacerts.txt
For each of them, you can copy the file into your project (or just reference its path), and configure setup.py to include it:
setup(console=['temp.py'],
      # for the `requests` module
      data_files=['cacert.pem'])  # or data_files=['cacerts.txt'] for `httplib2`
And change your code to use it; for the requests module:
import os
import requests

url = 'https://google.com/'
cert = 'cacert.pem'
os.environ['REQUESTS_CA_BUNDLE'] = os.path.join(os.getcwd(), cert)
# or simply: os.environ['REQUESTS_CA_BUNDLE'] = cert
requests.get(url)
And for the httplib2 module:
import httplib2

url = 'https://google.com/'
cert = 'cacerts.txt'
http = httplib2.Http(ca_certs=cert)
http.request(url)
Or, if your httplib2 version is 0.8, you can create a file named ca_certs_locater.py that defines a get function returning the path of the ca_certs file:
def get():
    return 'cacerts.txt'
Now, for your error: the twilio module uses httplib2, and its cacert.pem is located in:
.../twilio/conf/cacert.pem
So you need to add this file to setup.py as described above.
But twilio itself has a function named get_cert_file that passes the CA cert file to httplib2.
Using the ca_certs_locater.py described above may work for that too; if not, you still have an ugly option: monkey-patch twilio's get_cert_file. Note that rebinding an imported name in your own module would not affect twilio, so patch it on the module itself:
import twilio.rest.resources.base as base
base.get_cert_file = lambda: 'cacert.pem'
Note that this may be worth reporting as an issue against twilio, or even py2exe or PyInstaller.
I had the same problem with twilio and pyinstaller, and was able to fix it by modifying the base.py module in twilio\rest\resources:
def get_cert_file():
    """ Get the cert file location or bail """
    # XXX - this currently fails test coverage because we don't actually go
    # over the network anywhere. Might be good to have a test that stands up a
    # local server and authenticates against it.
    try:
        # Apparently __file__ is not available in all places so wrapping this
        # in a try/catch
        current_path = os.path.realpath(__file__)
        # old path:
        # ca_cert_path = os.path.join(current_path, "..", "..", "..",
        #                             "conf", "cacert.pem")
        # my new path:
        ca_cert_path = os.getcwd() + '\Config\cacert.pem'
        return os.path.abspath(ca_cert_path)
(I am storing my cacert.pem file in a Config folder off my main script directory)
There should probably be a way for py2exe to bundle non-Python files, like templates or, in this case, the SSL certificate stored in cacert.pem. Normally this is done with a MANIFEST.in, but I am not sure how that project handles it. Check the py2exe documentation for more information.
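A generic pattern for locating bundled data files, offered as a hedged sketch (sys.frozen is set by both py2exe and PyInstaller; sys._MEIPASS is PyInstaller-specific and may not exist elsewhere):

```python
import os
import sys

def resource_path(name):
    """Resolve a data file both in a source checkout and in a frozen build."""
    if getattr(sys, "frozen", False):
        # Frozen: data files sit next to the exe (py2exe) or in the
        # PyInstaller extraction dir (sys._MEIPASS) when present.
        base = getattr(sys, "_MEIPASS", os.path.dirname(sys.executable))
    else:
        # Plain interpreter: resolve next to this source file.
        base = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(base, name)

print(resource_path("cacert.pem"))
```

With this, the hard-coded Config folder above becomes resource_path("cacert.pem"), and the same script works unbundled and bundled.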
