I tried to run an mrjob script on Amazon EMR. It worked well with a c1.medium instance; however, it failed when I changed the instance type to t2.micro. The full error message is shown below.
C:\Users\Administrator\MyIpython>python word_count.py -r emr 111.txt
using configs in C:\Users\Administrator\.mrjob.conf
creating new scratch bucket mrjob-875a948553aab9e8
using s3://mrjob-875a948553aab9e8/tmp/ as our scratch dir on S3
creating tmp directory c:\users\admini~1\appdata\local\temp\word_count.Administrator.20150731.013007.592000
writing master bootstrap script to c:\users\admini~1\appdata\local\temp\word_count.Administrator.20150731.013007.592000\b.py
PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols
creating S3 bucket 'mrjob-875a948553aab9e8' to use as scratch space
Copying non-input files into s3://mrjob-875a948553aab9e8/tmp/word_count.Administrator.20150731.013007.592000/files/
Waiting 5.0s for S3 eventual consistency
Creating Elastic MapReduce job flow
Traceback (most recent call last):
  File "word_count.py", line 16, in <module>
    MRWordFrequencyCount.run()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\job.py", line 461, in run
    mr_job.execute()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\job.py", line 479, in execute
    super(MRJob, self).execute()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\launch.py", line 153, in execute
    self.run_job()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\launch.py", line 216, in run_job
    runner.run()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\runner.py", line 470, in run
    self._run()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\emr.py", line 881, in _run
    self._launch()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\emr.py", line 886, in _launch
    self._launch_emr_job()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\emr.py", line 1593, in _launch_emr_job
    persistent=False)
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\emr.py", line 1327, in _create_job_flow
    self._job_name, self._opts['s3_log_uri'], **args)
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\retry.py", line 149, in call_and_maybe_retry
    return f(*args, **kwargs)
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\retry.py", line 71, in call_and_maybe_retry
    result = getattr(alternative, name)(*args, **kwargs)
  File "F:\Program Files\Anaconda\lib\site-packages\boto\emr\connection.py", line 581, in run_jobflow
    'RunJobFlow', params, RunJobFlowResponse, verb='POST')
  File "F:\Program Files\Anaconda\lib\site-packages\boto\connection.py", line 1208, in get_object
    raise self.ResponseError(response.status, response.reason, body)
boto.exception.EmrResponseError: EmrResponseError: 400 Bad Request
Sender
ValidationError
Instance type 't2.micro' is not supported
c3ee1107-3723-11e5-8d8e-f1011298229d
Here are the details of my config file:
runners:
  emr:
    aws_access_key_id: xxxxxxxxxxx
    aws_secret_access_key: xxxxxxxxxxxxx
    aws_region: us-east-1
    ec2_key_pair: EMR
    ec2_key_pair_file: C:\Users\Administrator\EMR.pem
    ssh_tunnel_to_job_tracker: false
    ec2_instance_type: t2.micro
    num_ec2_instances: 2
EMR doesn't support the t2 instance family. If you're worried about money, spot instances are a very cost-effective option: right now m1.xlarge is less than $0.05 per hour, and m1.medium is $0.01 per hour (cheaper than t2.micro anyway). The supported types are the ones listed in the EMR web console (the original answer included a screenshot of that list); a config sketch follows below.
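To apply the fix, point mrjob at a supported instance type in the config. This is a minimal sketch assuming the questioner's mrjob 0.4.x setup; the spot-bid option name is taken from the mrjob docs of that era (double-check it against your mrjob version), and the bid value is only an illustration:

runners:
  emr:
    # credentials, key pair, and region stay the same as before
    ec2_instance_type: m1.medium    # a type EMR accepts, unlike t2.micro
    num_ec2_instances: 2
    # optional: request spot instances for the core nodes by naming a bid
    # price in USD/hour; omit this line to stay on on-demand pricing
    ec2_core_instance_bid_price: '0.011'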
I am deploying containers to GKE that contain Python apps and encountering an error when I try to use OpenCensus to send trace messages:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/opencensus/metrics/transport.py", line 59, in func
return self.func(*aa, **kw)
File "/usr/local/lib/python3.7/site-packages/opencensus/metrics/transport.py", line 113, in export_all
export(itertools.chain(*all_gets))
File "/usr/local/lib/python3.7/site-packages/opencensus/ext/stackdriver/stats_exporter/__init__.py", line 162, in export_metrics
self.client.project_path(self.options.project_id), ts_batch)
File "/usr/local/lib/python3.7/site-packages/google/cloud/monitoring_v3/gapic/metric_service_client.py", line 1024, in create_time_series
request, retry=retry, timeout=timeout, metadata=metadata
File "/usr/local/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py", line 143, in __call__
return wrapped_func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 273, in retry_wrapped_func
on_error=on_error,
File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 182, in retry_target
return target()
File "/usr/local/lib/python3.7/site-packages/google/api_core/timeout.py", line 214, in func_with_timeout
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.InvalidArgument: 400 One or more TimeSeries could not be written: The set of resource labels is incomplete. Missing labels: (container_name namespace_name).: timeSeries[0-199]
The interesting part seems to be this sentence: Missing labels: (container_name namespace_name).
When I run the exact same code locally, I do not receive any errors and I do see my tracing appearing in Stackdriver Metrics Explorer, so the problem appears to be related specifically to running inside a container in GKE.
Is there something specific that is required to get OpenCensus working in a GKE container?
The answer is that you need to manually set two environment variables in your container: CONTAINER_NAME and NAMESPACE. I believe GKE should be setting these and isn't, and so OpenCensus can't find the expected values. A sample fix would involve including those two variables in the podspec:
spec:
  containers:
  - env:
    - name: NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: CONTAINER_NAME
      value: {{ APP }}-collectors-{{ NAME }}
More details: https://github.com/census-instrumentation/opencensus-python/issues/796#issuecomment-539109321
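As a quick sanity check (a hypothetical guard, not part of the linked issue), the app can fail fast at startup when the variables are missing, instead of surfacing the problem later as 400 errors from the exporter:

import os

# Hypothetical startup guard: the Stackdriver exporter derives the
# container_name and namespace_name resource labels from these variables,
# so refuse to start if the pod spec forgot to set them.
missing = [v for v in ("NAMESPACE", "CONTAINER_NAME") if not os.environ.get(v)]
if missing:
    raise RuntimeError("Missing env vars needed by the OpenCensus "
                       "Stackdriver exporter: " + ", ".join(missing))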
I am a newbie to Toil and AWS, trying to run the HelloWorld.py example from the Toil documentation. I have successfully installed Toil and the related Python packages on my local Mac laptop and have set up my account at AWS. I have created a small leader/worker cluster:
$ cgcloud create-cluster toil -s 2 -t m3.large
and started it:
$ cgcloud ssh toil-leader
This changed my screen prompt to:
mesosbox#ip-172-31-25-135:~$
Then, from another window on my Mac, I started the Toil HelloWorld example with this command:
$ python2.7 HelloWorld.py --batchSystem=mesos --mesosMaster=mesos-master:5050 aws:us-west-2:my-aws-jobstore
And I got the following output:
Apples-Air 2017-06-02 19:30:53,524 MainThread INFO toil.lib.bioio: Root logger is at level 'INFO', 'toil' logger at level 'INFO'.
Apples-Air 2017-06-02 19:30:53,524 MainThread INFO toil.lib.bioio: Root logger is at level 'INFO', 'toil' logger at level 'INFO'.
Apples-Air 2017-06-02 19:30:54,852 MainThread WARNING toil.jobStores.aws.jobStore: Exception during panic
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/toil/jobStores/aws/jobStore.py", line 209, in initialize
self.destroy()
File "/usr/local/lib/python2.7/site-packages/toil/jobStores/aws/jobStore.py", line 1334, in destroy
self._bind(create=False, block=False)
File "/usr/local/lib/python2.7/site-packages/toil/jobStores/aws/jobStore.py", line 241, in _bind
versioning=True)
File "/usr/local/lib/python2.7/site-packages/toil/jobStores/aws/jobStore.py", line 721, in _bindBucket
bucket = self.s3.get_bucket(bucket_name, validate=True)
File "/usr/local/lib/python2.7/site-packages/boto/s3/connection.py", line 502, in get_bucket
return self.head_bucket(bucket_name, headers=headers)
File "/usr/local/lib/python2.7/site-packages/boto/s3/connection.py", line 535, in head_bucket
raise err
S3ResponseError: S3ResponseError: 403 Forbidden
Traceback (most recent call last):
File "helloWorld.py", line 22, in <module>
print(Job.Runner.startToil(j, options)) #Prints Hello, world!, ….
File "/usr/local/lib/python2.7/site-packages/toil/job.py", line 740, in startToil
with Toil(options) as toil:
File "/usr/local/lib/python2.7/site-packages/toil/common.py", line 614, in __enter__
jobStore.initialize(config)
File "/usr/local/lib/python2.7/site-packages/toil/jobStores/aws/jobStore.py", line 209, in initialize
self.destroy()
File "/usr/local/lib/python2.7/site-packages/toil/jobStores/aws/jobStore.py", line 206, in initialize
self._bind(create=True)
File "/usr/local/lib/python2.7/site-packages/toil/jobStores/aws/jobStore.py", line 241, in _bind
versioning=True)
File "/usr/local/lib/python2.7/site-packages/toil/jobStores/aws/jobStore.py", line 721, in _bindBucket
bucket = self.s3.get_bucket(bucket_name, validate=True)
File "/usr/local/lib/python2.7/site-packages/boto/s3/connection.py", line 502, in get_bucket
return self.head_bucket(bucket_name, headers=headers)
File "/usr/local/lib/python2.7/site-packages/boto/s3/connection.py", line 535, in head_bucket
raise err
boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
Please help.
Thanks.
---John
I realize that this answer is a little late. One problem I notice is with the mesosMaster argument.
Instead, your command should have looked like this:
python2.7 HelloWorld.py --batchSystem=mesos --mesosMaster=172.31.25.135:5050 aws:us-west-2:my-aws-jobstore
Notice that I replaced mesos-master with the actual IP address from the prompt
mesosbox#ip-172-31-25-135:~$
Hopefully one will not need to pass this argument at all in the future; however, this is not yet implemented as of 26 July 2017.
Also for further problems with Toil you will probably have better luck posting a new issue to the Toil Github page.
I'm having trouble getting the Python appengine-gcs-client demo working with dev_appserver.py from the 1.9.40 SDK (the latest at present).
I followed the Setting Up Google Cloud Storage and the App Engine and Google Cloud Storage Sample instructions.
I created the default bucket for a paid app, with billing enabled and a non-zero daily spending limit set. I successfully uploaded a file to that bucket using the developer console.
I cloned the GoogleCloudPlatform/appengine-gcs-client repo from GitHub. I copied the python/src/cloudstorage dir into the python/demo dir, which now looks like this:
dancorn-laptop.acasa:/home/dancorn/src/appengine-gcs-client/python> find demo/ | sort
demo/
demo/app.yaml
demo/blobstore.py
demo/cloudstorage
demo/cloudstorage/api_utils.py
demo/cloudstorage/api_utils.pyc
demo/cloudstorage/cloudstorage_api.py
demo/cloudstorage/cloudstorage_api.pyc
demo/cloudstorage/common.py
demo/cloudstorage/common.pyc
demo/cloudstorage/errors.py
demo/cloudstorage/errors.pyc
demo/cloudstorage/__init__.py
demo/cloudstorage/__init__.pyc
demo/cloudstorage/rest_api.py
demo/cloudstorage/rest_api.pyc
demo/cloudstorage/storage_api.py
demo/cloudstorage/storage_api.pyc
demo/cloudstorage/test_utils.py
demo/__init__.py
demo/main.py
demo/main.pyc
demo/README
This is how I executed the dev server, and these are the errors reported when I tried to access http://localhost:8080 as instructed:
dancorn-laptop.acasa:/home/dancorn/src/appengine-gcs-client/python> /home/usr_local/google_appengine_1.9.40/dev_appserver.py demo
INFO 2016-08-04 01:07:51,786 sdk_update_checker.py:229] Checking for updates to the SDK.
INFO 2016-08-04 01:07:51,982 sdk_update_checker.py:257] The SDK is up to date.
INFO 2016-08-04 01:07:52,121 api_server.py:205] Starting API server at: http://localhost:50355
INFO 2016-08-04 01:07:52,123 dispatcher.py:197] Starting module "default" running at: http://localhost:8080
INFO 2016-08-04 01:07:52,124 admin_server.py:116] Starting admin server at: http://localhost:8000
INFO 2016-08-04 01:08:03,461 client.py:804] Refreshing access_token
INFO 2016-08-04 01:08:05,234 client.py:827] Failed to retrieve access token: {
"error" : "internal_failure"
}
ERROR 2016-08-04 01:08:05,236 api_server.py:272] Exception while handling service_name: "app_identity_service"
method: "GetAccessToken"
request: "\n7https://www.googleapis.com/auth/devstorage.full_control"
request_id: "ccqdTObLrl"
Traceback (most recent call last):
File "/home/usr_local/google_appengine_1.9.40/google/appengine/tools/devappserver2/api_server.py", line 247, in _handle_POST
api_response = _execute_request(request).Encode()
File "/home/usr_local/google_appengine_1.9.40/google/appengine/tools/devappserver2/api_server.py", line 186, in _execute_request
make_request()
File "/home/usr_local/google_appengine_1.9.40/google/appengine/tools/devappserver2/api_server.py", line 181, in make_request
request_id)
File "/home/usr_local/google_appengine_1.9.40/google/appengine/api/apiproxy_stub.py", line 131, in MakeSyncCall
method(request, response)
File "/home/usr_local/google_appengine_1.9.40/google/appengine/api/app_identity/app_identity_defaultcredentialsbased_stub.py", line 192, in _Dynamic_GetAccessToken
token = credentials.get_access_token()
File "/home/usr_local/google_appengine_1.9.40/lib/oauth2client/oauth2client/client.py", line 689, in get_access_token
self.refresh(http)
File "/home/usr_local/google_appengine_1.9.40/lib/oauth2client/oauth2client/client.py", line 604, in refresh
self._refresh(http.request)
File "/home/usr_local/google_appengine_1.9.40/lib/oauth2client/oauth2client/client.py", line 775, in _refresh
self._do_refresh_request(http_request)
File "/home/usr_local/google_appengine_1.9.40/lib/oauth2client/oauth2client/client.py", line 840, in _do_refresh_request
raise AccessTokenRefreshError(error_msg)
AccessTokenRefreshError: internal_failure
WARNING 2016-08-04 01:08:05,239 tasklets.py:468] suspended generator _make_token_async(rest_api.py:55) raised RuntimeError(AccessTokenRefreshError(u'internal_failure',))
WARNING 2016-08-04 01:08:05,240 tasklets.py:468] suspended generator get_token_async(rest_api.py:224) raised RuntimeError(AccessTokenRefreshError(u'internal_failure',))
WARNING 2016-08-04 01:08:05,240 tasklets.py:468] suspended generator urlfetch_async(rest_api.py:259) raised RuntimeError(AccessTokenRefreshError(u'internal_failure',))
WARNING 2016-08-04 01:08:05,240 tasklets.py:468] suspended generator run(api_utils.py:164) raised RuntimeError(AccessTokenRefreshError(u'internal_failure',))
WARNING 2016-08-04 01:08:05,240 tasklets.py:468] suspended generator do_request_async(rest_api.py:198) raised RuntimeError(AccessTokenRefreshError(u'internal_failure',))
WARNING 2016-08-04 01:08:05,241 tasklets.py:468] suspended generator do_request_async(storage_api.py:128) raised RuntimeError(AccessTokenRefreshError(u'internal_failure',))
ERROR 2016-08-04 01:08:05,241 main.py:62] AccessTokenRefreshError(u'internal_failure',)
Traceback (most recent call last):
File "/home/dancorn/src/appengine-gcs-client/python/demo/main.py", line 43, in get
self.create_file(filename)
File "/home/dancorn/src/appengine-gcs-client/python/demo/main.py", line 89, in create_file
retry_params=write_retry_params)
File "/home/dancorn/src/appengine-gcs-client/python/demo/cloudstorage/cloudstorage_api.py", line 97, in open
return storage_api.StreamingBuffer(api, filename, content_type, options)
File "/home/dancorn/src/appengine-gcs-client/python/demo/cloudstorage/storage_api.py", line 697, in __init__
status, resp_headers, content = self._api.post_object(path, headers=headers)
File "/home/dancorn/src/appengine-gcs-client/python/demo/cloudstorage/rest_api.py", line 82, in sync_wrapper
return future.get_result()
File "/home/usr_local/google_appengine_1.9.40/google/appengine/ext/ndb/tasklets.py", line 383, in get_result
self.check_success()
File "/home/usr_local/google_appengine_1.9.40/google/appengine/ext/ndb/tasklets.py", line 427, in _help_tasklet_along
value = gen.throw(exc.__class__, exc, tb)
File "/home/dancorn/src/appengine-gcs-client/python/demo/cloudstorage/storage_api.py", line 128, in do_request_async
deadline=deadline, callback=callback)
File "/home/usr_local/google_appengine_1.9.40/google/appengine/ext/ndb/tasklets.py", line 427, in _help_tasklet_along
value = gen.throw(exc.__class__, exc, tb)
File "/home/dancorn/src/appengine-gcs-client/python/demo/cloudstorage/rest_api.py", line 198, in do_request_async
follow_redirects=False)
File "/home/usr_local/google_appengine_1.9.40/google/appengine/ext/ndb/tasklets.py", line 427, in _help_tasklet_along
value = gen.throw(exc.__class__, exc, tb)
File "/home/dancorn/src/appengine-gcs-client/python/demo/cloudstorage/api_utils.py", line 164, in run
result = yield tasklet(**kwds)
File "/home/usr_local/google_appengine_1.9.40/google/appengine/ext/ndb/tasklets.py", line 427, in _help_tasklet_along
value = gen.throw(exc.__class__, exc, tb)
File "/home/dancorn/src/appengine-gcs-client/python/demo/cloudstorage/rest_api.py", line 259, in urlfetch_async
self.token = yield self.get_token_async()
File "/home/usr_local/google_appengine_1.9.40/google/appengine/ext/ndb/tasklets.py", line 427, in _help_tasklet_along
value = gen.throw(exc.__class__, exc, tb)
File "/home/dancorn/src/appengine-gcs-client/python/demo/cloudstorage/rest_api.py", line 224, in get_token_async
self.scopes, self.service_account_id)
File "/home/usr_local/google_appengine_1.9.40/google/appengine/ext/ndb/tasklets.py", line 427, in _help_tasklet_along
value = gen.throw(exc.__class__, exc, tb)
File "/home/dancorn/src/appengine-gcs-client/python/demo/cloudstorage/rest_api.py", line 55, in _make_token_async
token, expires_at = yield rpc
File "/home/usr_local/google_appengine_1.9.40/google/appengine/ext/ndb/tasklets.py", line 513, in _on_rpc_completion
result = rpc.get_result()
File "/home/usr_local/google_appengine_1.9.40/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
return self.__get_result_hook(self)
File "/home/usr_local/google_appengine_1.9.40/google/appengine/api/app_identity/app_identity.py", line 519, in get_access_token_result
rpc.check_success()
File "/home/usr_local/google_appengine_1.9.40/google/appengine/api/apiproxy_stub_map.py", line 579, in check_success
self.__rpc.CheckSuccess()
File "/home/usr_local/google_appengine_1.9.40/google/appengine/api/apiproxy_rpc.py", line 157, in _WaitImpl
self.request, self.response)
File "/home/usr_local/google_appengine_1.9.40/google/appengine/ext/remote_api/remote_api_stub.py", line 201, in MakeSyncCall
self._MakeRealSyncCall(service, call, request, response)
File "/home/usr_local/google_appengine_1.9.40/google/appengine/ext/remote_api/remote_api_stub.py", line 235, in _MakeRealSyncCall
raise pickle.loads(response_pb.exception())
RuntimeError: AccessTokenRefreshError(u'internal_failure',)
INFO 2016-08-04 01:08:05,255 module.py:788] default: "GET / HTTP/1.1" 200 249
I was surprised when I saw the attempt to contact a Google server; I was expecting a faked, local filesystem-based emulation, based on these notes from the App Engine and Google Cloud Storage Sample instructions:
Using the client library with the development app server:
You can use the client library with the development server.
**Note**: Files saved locally are subject to the file size and naming conventions imposed by the local filesystem.
app.yaml walkthrough:
You specify the project ID in the line application: your-app-id,
replacing the value your-app-id. This value isn't used when running
locally, but you must supply a valid project ID before deploying: the
deployment utility reads this entry to determine where to deploy your
app.
Deploying the Sample, step 5:
In your browser, visit https://.appspot.com; the
application will execute on page load, just as it did when running
locally. Only this time, the app will actually be writing to and
reading from a real bucket.
I even placed my real app's ID into the app.yaml file, but that didn't make any difference.
I've checked the known GAE issues and only found this potentially related one, but on a much older SDK version:
Issue 11690 GloudStorage bug in GoogleAppEngineLanucher development server
I checked a few older SDK versions I have around (1.9.30, 1.9.35), just in case - no difference either.
My questions:
How can I make the GCS client operate locally (w/ faked GCS based on the local filesystem) when it's used with dev_appserver.py?
Since it's mentioned that it should work with the real GCS even when used with dev_appserver.py, what do I need to do to achieve that? (Less important; more of a curiosity.)
It turns out the cause was what is, IMHO, a rather silly bug: an inability to read the credentials from a local file written by an earlier version of the SDK (or a related package?), combined with a failure to fall back to something more sensible, which produces a rather misleading traceback that throws the investigation off track.
Credit goes to this answer: https://stackoverflow.com/a/35890078/4495081 (though the bug mentioned in that post was about something else, it ultimately triggered a similar end result).
After removing the ~/.config/gcloud/application_default_credentials.json file, the demo completed successfully using the local filesystem, and my real app worked fine as well.
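If you'd rather keep a backup than delete the file outright, renaming it works just as well (a plain shell step, nothing SDK-specific):

$ mv ~/.config/gcloud/application_default_credentials.json \
     ~/.config/gcloud/application_default_credentials.json.bak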
My 2nd question stands, but I'm not too worried about it; personally, I don't see great value in using the real GCS storage with the local development server, since I have to test on a real staging GAE app anyway for other reasons.
For testing purposes I want to start two instances of a GAE app locally. However, the second instance fails to start because the first instance already holds a lock on the local database.
INFO 2014-09-28 05:14:22,751 admin_server.py:117] Starting admin server at: http://localhost:8081
OperationalError('database is locked',)
Traceback (most recent call last):
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/cherrypy/cherrypy/wsgiserver/wsgiserver2.py", line 1302, in communicate
req.respond()
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/cherrypy/cherrypy/wsgiserver/wsgiserver2.py", line 831, in respond
self.server.gateway(self).respond()
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/cherrypy/cherrypy/wsgiserver/wsgiserver2.py", line 2115, in respond
response = self.req.server.wsgi_app(self.env, self.start_response)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/devappserver2/wsgi_server.py", line 266, in __call__
return app(environ, start_response)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/devappserver2/module.py", line 1431, in __call__
return self._handle_request(environ, start_response)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/devappserver2/module.py", line 641, in _handle_request
module=self._module_configuration.module_name)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/api/apiproxy_stub.py", line 165, in WrappedMethod
return method(self, *args, **kwargs)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/api/logservice/logservice_stub.py", line 172, in start_request
host, start_time, method, resource, http_version, module))
OperationalError: database is locked
Is there any way I can specify an alternative data store location in the second instance of my app?
It depends on how you start your application.
If you're using Java, you might want to look at this answer.
But keep in mind that your two apps won't be talking to the same datastore, so if you need data to persist between instances, this won't work. A dev_appserver sketch for the Python case follows below.
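For the Python dev server specifically, a second copy can be given its own ports and its own storage directory; dev_appserver.py of that era exposes --port, --admin_port, and --storage_path flags for this (a sketch; the app directory and paths are placeholders):

$ dev_appserver.py --port=8080 --admin_port=8000 --storage_path=/tmp/app-instance1 myapp/
$ dev_appserver.py --port=9080 --admin_port=9000 --storage_path=/tmp/app-instance2 myapp/

With separate --storage_path values, the two instances no longer contend for the same SQLite files, which is what raises the "database is locked" error.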
I am wondering if anyone knows a way to authenticate with a service account when accessing data in Cloud Storage while:
1. Using the boto library (and gcs_oauth2_boto_plugin)
2. Running in Google App Engine (GAE)
Following https://developers.google.com/storage/docs/gspythonlibrary, I am using boto and gcs_oauth2_boto_plugin to authenticate and perform actions against Cloud Storage (uploading/downloading files). I am using a service account so that we don't have to re-authenticate with a Google account periodically (the thought being that if we run this in GCE, it'll run with the GCE service account -- I haven't actually done that yet). Locally, I've set up my boto config file to use the service account and point to a p12 key file, and this runs fine.
I would like to use the same code to interact with Cloud Storage from within Google App Engine (GAE). We are running a lightweight ETL process that transforms the data and loads it into BigQuery. We want to run this code in an App Engine task queue (the task is triggered by an Object Change Notification from Cloud Storage).
Since we're currently relying on the boto config (~/.boto), I adapted http://thurloat.com/2010/06/07/google-storage-and-app-engine to put the relevant config items for a service account.
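For reference, the service-account entries in ~/.boto look roughly like this (a sketch based on the gsutil/gcs_oauth2_boto_plugin config format; the account email, key path, and password are placeholders -- "notasecret" was the conventional default for p12 keys):

[Credentials]
gs_service_client_id = my-account@my-project.iam.gserviceaccount.com
gs_service_key_file = /path/to/key.p12
gs_service_key_file_password = notasecret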
When I finally run the code from App Engine (dev_appserver.py), I get the below stack trace:
Traceback (most recent call last):
File "/home/some-user/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.1/webapp2.py", line 1536, in __call__
rv = self.handle_exception(request, response, e)
File "/home/some-user/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.1/webapp2.py", line 1530, in __call__
rv = self.router.dispatch(request, response)
File "/home/some-user/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.1/webapp2.py", line 1278, in default_dispatcher
return route.handler_adapter(request, response)
File "/home/some-user/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.1/webapp2.py", line 1102, in __call__
return handler.dispatch()
File "/home/some-user/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.1/webapp2.py", line 572, in dispatch
return self.handle_exception(e, self.app.debug)
File "/home/some-user/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.1/webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "/home/some-user/dev/myApp/main.py", line 247, in post
gs.download(fname, fp)
File "/home/some-user/dev/myApp/cloudstorage.py", line 107, in download
bytes = src_uri.get_key().get_contents_to_file(fp)
File "/home/some-user/dev/myApp/boto/storage_uri.py", line 336, in get_key
bucket = self.get_bucket(validate, headers)
File "/home/some-user/dev/myApp/boto/storage_uri.py", line 181, in get_bucket
conn = self.connect()
File "/home/some-user/dev/myApp/boto/storage_uri.py", line 140, in connect
**connection_args)
File "/home/some-user/dev/myApp/boto/gs/connection.py", line 47, in __init__
suppress_consec_slashes=suppress_consec_slashes)
File "/home/some-user/dev/myApp/boto/s3/connection.py", line 190, in __init__
validate_certs=validate_certs, profile_name=profile_name)
File "/home/some-user/dev/myApp/boto/connection.py", line 568, in __init__
host, config, self.provider, self._required_auth_capability())
File "/home/some-user/dev/myApp/boto/auth.py", line 929, in get_auth_handler
ready_handlers.append(handler(host, config, provider))
File "/home/some-user/dev/myApp/gcs_oauth2_boto_plugin/oauth2_plugin.py", line 56, in __init__
cred_type=oauth2_client.CredTypes.OAUTH2_SERVICE_ACCOUNT)
File "/home/some-user/dev/myApp/gcs_oauth2_boto_plugin/oauth2_helper.py", line 48, in OAuth2ClientFromBotoConfig
token_cache = oauth2_client.FileSystemTokenCache()
File "/home/some-user/dev/myApp/gcs_oauth2_boto_plugin/oauth2_client.py", line 175, in __init__
tempfile.gettempdir(), 'oauth2_client-tokencache.%(uid)s.%(key)s')
File "/home/some-user/google-cloud-sdk/platform/google_appengine/google/appengine/dist/tempfile.py", line 61, in PlaceHolder
raise NotImplementedError("Only tempfile.TemporaryFile is available for use")
NotImplementedError: Only tempfile.TemporaryFile is available for use
Looks like the problem is just with gcs_oauth2_boto_plugin trying to use a temporary directory when caching the oauth credentials (App Engine only supports tempfile.TemporaryFile).
Rather than trying to patch gcs_oauth2_boto_plugin, is there potentially another solution? Can we use a service account with gcs_oauth2_boto_plugin/boto on App Engine to access Cloud Storage resources?
Or, am I using the wrong authentication method here?
This doesn't quite answer the question directly, but instead of using boto and gcs_oauth2_boto_plugin, I am using the "Google Cloud Storage Python Client Library", GoogleAppEngineCloudStorageClient, from pip.
https://developers.google.com/appengine/docs/python/googlecloudstorageclient/
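For completeness, here is a minimal sketch of reading and writing with that library (the bucket and object names are placeholders). It authenticates as the app itself, so no boto config or key files are involved:

# Requires: pip install GoogleAppEngineCloudStorageClient
import cloudstorage as gcs

def write_object(path, data):
    # path has the form '/bucket-name/object-name'
    with gcs.open(path, 'w', content_type='text/plain') as f:
        f.write(data)

def read_object(path):
    with gcs.open(path, 'r') as f:
        return f.read()

write_object('/my-bucket/hello.txt', 'hello, GCS')
print(read_object('/my-bucket/hello.txt'))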