Is it possible to extract data (to Google Cloud Storage) from a shared dataset (where I only have view permissions) using the client APIs (Python)?
I can do this manually using the web browser, but cannot get it to work using the APIs.
I have created a project (MyProject) and a service account for MyProject to use as credentials when creating the service using the API. This account has view permissions on a shared dataset (MySharedDataset) and write permissions on my Google Cloud Storage bucket. If I attempt to run a job in my own project to extract data from the shared project:
job_data = {
    'jobReference': {
        'projectId': myProjectId,
        'jobId': str(uuid.uuid4())
    },
    'configuration': {
        'extract': {
            'sourceTable': {
                'projectId': sharedProjectId,
                'datasetId': sharedDatasetId,
                'tableId': sharedTableId,
            },
            'destinationUris': [cloud_storage_path],
            'destinationFormat': 'AVRO'
        }
    }
}
I get the error:
googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/sharedProjectId/jobs?alt=json
returned "Value 'myProjectId' in content does not agree with value
'sharedProjectId'. This can happen when a value set through a parameter
is inconsistent with a value set in the request.">
Using the sharedProjectId in both the jobReference and sourceTable I get:
googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/sharedProjectId/jobs?alt=json
returned "Access Denied: Job myJobId: The user myServiceAccountEmail
does not have permission to run a job in project sharedProjectId">
Using myProjectId for both, the job immediately comes back with a status of 'DONE' and no errors, but nothing has been exported and my GCS bucket is empty.
If this is indeed not possible using the API, is there another method/tool that can be used to automate the extraction of data from a shared dataset?
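Worth noting in the 'DONE but empty bucket' case: a job state of 'DONE' only means the job has finished, not that it succeeded; a failed extract still reports 'DONE' and records the failure in status.errorResult. A minimal polling sketch, assuming the same service object as above:

import time

def wait_for_extract(service, project_id, job_id):
    # Poll until the job reaches the DONE state.
    while True:
        job = service.jobs().get(projectId=project_id, jobId=job_id).execute()
        if job['status']['state'] == 'DONE':
            # DONE does not imply success; a failed job carries errorResult.
            if 'errorResult' in job['status']:
                raise RuntimeError(job['status']['errorResult'])
            return job
        time.sleep(1)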
* UPDATE *
This works fine using the API explorer running under my GA login. In my code I use the following method:
service.jobs().insert(projectId=myProjectId, body=job_data).execute()
and removed the jobReference object containing the projectId:
job_data = {
    'configuration': {
        'extract': {
            'sourceTable': {
                'projectId': sharedProjectId,
                'datasetId': sharedDatasetId,
                'tableId': sharedTableId,
            },
            'destinationUris': [cloud_storage_path],
            'destinationFormat': 'AVRO'
        }
    }
}
but this returns the error:
Access Denied: Table sharedProjectId:sharedDatasetId.sharedTableId: The user 'serviceAccountEmail' does not have permission to export a table in
dataset sharedProjectId:sharedDatasetId
My service account is now an owner on the shared dataset and has edit permissions on MyProject. Where else do permissions need to be set, or is it possible to use the Python API with my GA login credentials rather than the service account?
* UPDATE *
Finally got it to work. How? Make sure the service account has permissions to view the dataset (and if you don't have access to check this yourself and someone tells you that it does, ask them to double check/send you a screenshot!)
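For anyone following along, here is a minimal sketch of the working setup end to end (assuming the google-api-python-client and google-auth libraries; the key-file name is a placeholder, and job_data is the first payload shown above, with myProjectId in the jobReference):

from google.oauth2 import service_account
from googleapiclient.discovery import build

# Hypothetical key file for the service account that has view access on
# the shared dataset and write access on the GCS bucket.
credentials = service_account.Credentials.from_service_account_file(
    'my-service-account-key.json',
    scopes=['https://www.googleapis.com/auth/bigquery'])
service = build('bigquery', 'v2', credentials=credentials)

# Insert the extract job into *your* project; the sourceTable inside
# job_data still points at the shared project/dataset/table.
job = service.jobs().insert(projectId=myProjectId, body=job_data).execute()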
While trying to reproduce the issue, I ran into the same parse errors.
I did, however, play around with the API on the Developer Console [2] and it worked.
What I did notice is that the request body below has a different format than the documentation on the website, as it uses single quotes instead of double quotes.
Here is the code that I ran to get it to work:
{
    'configuration': {
        'extract': {
            'sourceTable': {
                'projectId': "sharedProjectID",
                'datasetId': "sharedDataSetID",
                'tableId': "sharedTableID"
            },
            'destinationUri': "gs://myBucket/myFile.csv"
        }
    }
}
HTTP Request
POST https://www.googleapis.com/bigquery/v2/projects/myProjectId/jobs
If you are still running into problems, you can try the jobs.insert API on the website [2] or the bq command-line tool [3].
The following command can do the same thing:
bq extract sharedProjectId:sharedDataSetId.sharedTableId gs://myBucket/myFile.csv
Hope this helps.
[2] https://cloud.google.com/bigquery/docs/reference/v2/jobs/insert
[3] https://cloud.google.com/bigquery/bq-command-line-tool
Related
I am trying to write a program that transfers users' Drive and Docs files from one user to another. It looks like I can do it using the Admin SDK Data Transfer API documentation.
I created the data transfer object, which looks like this:
datatransfer = {
    'kind': 'admin#datatransfer#DataTransfer',
    'oldOwnerUserId': 'somenumberhere',
    'newOwnderUserId': 'adifferentnumberhere',
    'applicationDataTransfers': [
        {
            'applicationId': '55656082996',  # the app id for drive and docs
            'applicationTransferParams': [
                {
                    'key': 'PRIVACY_LEVEL',
                    'value': [
                        {
                            'PRIVATE',
                            'SHARED'
                        }
                    ]
                }
            ]
        }
    ]
}
I have some code here for handling OAuth, and then I bind the service with:
service = build('admin', 'datatransfer_v1', credentials=creds)
Then I attempt to call insert() with:
results = service.transfers().insert(body=datatransfer).execute()
and I get back an error saying 'missing required field: resource'.
I tried nesting all of this inside a field called resource, and I get the same message.
I tried passing in JUST a JSON structure that looked like this: {'resource': 'test'}, and I get the same message.
So I tried using the "Try this method" live tool on the documentation website.
If I pass in no arguments at all, or just the old and new user, I get the same message: 'missing required nested field: resource'.
If I put in 'id': '55656082996' with ANY other arguments, it just returns error code 500, backend error.
I tried manually adding a field named "resource" in the live tool and it says "property 'resource' does not exist in object specification".
I finally got this to work. If anyone else is struggling with this and stumbles upon it: "applicationId" is a number, not a string. Also, the error message is misleading; there is no nested field called "resource". This is what worked for me:
datatransfer = {
    "newOwnerUserId": "SomeNumber",
    "oldOwnerUserId": "SomeOtherNumber",
    "kind": "admin#datatransfer#DataTransfer",
    "applicationDataTransfers": [
        {
            "applicationId": 55656082996,
            "applicationTransferParams": [
                {
                    "key": "PRIVACY_LEVEL"
                },
                {
                    "value": [
                        "{PRIVATE, SHARED}"
                    ]
                }
            ]
        }
    ]
}
service = build('admin', 'datatransfer_v1', credentials=creds)
results = service.transfers().insert(body=datatransfer).execute()
print(results)
To get the users' IDs, I first use the Directory API to query all users who are suspended and take their IDs from the result, then pass each ID into this transfer to move their files to another user before deleting them.
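A sketch of that suspended-user lookup (assuming the Admin SDK Directory API, admin directory_v1, reusing the same creds; 'my_customer' is the alias for the account's own domain):

from googleapiclient.discovery import build

directory = build('admin', 'directory_v1', credentials=creds)

# Query all suspended users in the domain and collect their IDs.
# (Pagination via pageToken is omitted for brevity.)
response = directory.users().list(
    customer='my_customer',
    query='isSuspended=true').execute()
suspended_ids = [user['id'] for user in response.get('users', [])]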
I am currently trying to retrieve login information from my Realtime DB in Firebase.
As an example, I want to detect whether the provided username already exists in the db.
Here is a code snippet I'm attempting to work with:
ref.child("Users").order_by_child("Username").equal_to("Hello").get()
However, it does not work, and instead displays this message:
firebase_admin.exceptions.InvalidArgumentError: Index not defined, add ".indexOn": "Username", for path "/Users", to the rules
Is there something wrong with my code snippet or should I use a different alternative?
As the error suggests, you need to add ".indexOn": "Username" in your security rules as shown below:
{
  "rules": {
    "Users": {
      ".indexOn": "Username",
      // ... rules
      "$uid": {
        // ... rules
      }
    }
  }
}
Check out the documentation for more information.
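Once the index is deployed, the query from the question should work. A minimal existence check might look like this (a sketch using firebase_admin's db module, with the same "Users"/"Username" structure as the question):

from firebase_admin import db

# Assumes firebase_admin.initialize_app(...) has already been called
# with a databaseURL pointing at your Realtime Database.
ref = db.reference('/')

# get() returns a dict of matching children, or an empty result if none match.
matches = ref.child('Users').order_by_child('Username').equal_to('Hello').get()
if matches:
    print('Username already exists:', list(matches.keys()))
else:
    print('Username is available')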
I have a chat app using Firebase that keeps hitting a
setValue at x failed: DatabaseError: permission denied
error every time I type a message.
I set my Database to be public already:
service cloud.firestore {
  match /databases/{database}/documents {
    match /{allPaths=**} {
      allow read, write: if request.auth.uid != null;
    }
  }
}
Is it something from within my chat reference?
private void displayChat() {
    ListView listOfMessage = findViewById(R.id.list_of_message);
    Query query = FirebaseDatabase.getInstance().getReference();
    FirebaseListOptions<Chat> options = new FirebaseListOptions.Builder<Chat>()
            .setLayout(R.layout.list_item)
            .setQuery(query, Chat.class)
            .build();
    adapter = new FirebaseListAdapter<Chat>(options) {
        @Override
        protected void populateView(View v, Chat model, int position) {
            // Get references to the views of list_item.xml
            TextView messageText, messageUser, messageTime;
            messageText = v.findViewById(R.id.message_text);
            messageUser = v.findViewById(R.id.message_user);
            messageTime = v.findViewById(R.id.message_time);
            messageText.setText(model.getMessageText());
            messageUser.setText(model.getMessageUser());
            messageTime.setText(DateFormat.format("dd-MM-yyyy (HH:mm:ss)", model.getMessageTime()));
        }
    };
    listOfMessage.setAdapter(adapter);
}
Your code is using the Firebase Realtime Database, but you're changing the security rules for Cloud Firestore. While both databases are part of Firebase, they are completely different, and the server-side security rules for one don't apply to the other.
When you go to the database panel in the Firebase console, you most likely end up in the Cloud Firestore rules.
If you are on the Cloud Firestore rules in the Firebase console, you can change to the Realtime Database rules by clicking Cloud Firestore BETA at the top, and then selecting Realtime Database from the list.
You can also go directly to the security rules for the Realtime Database by clicking this link.
The security rules for the realtime database that match what you have are:
{
  "rules": {
    ".read": "auth.uid !== null",
    ".write": "auth.uid !== null"
  }
}
This will grant any authenticated user full read and write access to the entire database. Read my answer to this question for more on the security/risk trade-off of such rules: Firebase email saying my realtime database has insecure rules.
Change this:
request.auth.uid != null
to this:
request.auth.uid == null
or define a proper auth mechanism before starting the conversation, where the user is identified by a userID.
I have AWS Rekognition code written in Python that is run by a Node API. It works fine on a Windows system, but when I deploy it on Linux I'm facing this issue:
botocore.errorfactory.InvalidS3ObjectException: An error occurred (InvalidS3ObjectException) when calling the DetectText operation: Unable to get object metadata from S3. Check object key, region and/or access permissions.
I have given both the AmazonRekognitionFullAccess and AmazonS3ReadOnlyAccess roles to my IAM user. Still, I don't know how to get things going.
Python code:
import boto3

bucket = 'image-test'

def image_to_dict(fileName, bucket):
    client = boto3.client('rekognition', 'us-east-2')
    response = client.detect_text(
        Image={'S3Object': {'Bucket': bucket, 'Name': fileName}})
    return response
Node code used to run the Python script:
var options = {
    mode: 'text',
    pythonPath: "/usr/bin/python2.7",
    pythonOptions: ['-u'],
    scriptPath: "/home/ubuntu/test",
    args: [imageURl]
};
PythonShell.run('script.py', options, function (err, results) {
    if (err)
        throw err;
    console.log("Data is: " + results);
});
I have Python version 2.7 installed on my Ubuntu, pip version 10.0.1.
Thanks for the help.
The reason behind the issue was that when I passed the image name as an argument from the Node API, the name was getting manipulated by some substring logic. So when the Python script went looking for that manipulated name in the S3 bucket, it threw the above error, because no object with that name existed in the bucket.
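A quick way to catch this class of problem is to verify the exact key before calling Rekognition. A hedged sketch: head_object fails fast if the key, region, or permissions are wrong, which are exactly the things the error message tells you to check.

import boto3
from botocore.exceptions import ClientError

def key_exists(bucket, key, region='us-east-2'):
    # head_object fetches only metadata, so it is a cheap existence check.
    s3 = boto3.client('s3', region_name=region)
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return True
    except ClientError:
        return False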
I've been struggling for a couple of days trying to get this working, so now I thought I'd ask for some help.
I'm not able to sign headers and I don't know how to proceed. I have followed this guide in every detail: https://blog.fineuploader.com/fine-uploader-s3-upload-directly-to-amazon-s3-from-your-browser-3d9dcdcc0f33#sign-policy
This is my setup:
JavaScript to handle the upload:
var uploader = new qq.s3.FineUploader({
    element: document.getElementById("uploader"),
    debug: true,
    request: {
        endpoint: 'my-bucket.s3-accelerate.amazonaws.com',
        accessKey: 'my-access-key'
    },
    signature: {
        endpoint: 'my-external-signature-server'
    },
    uploadSuccess: {
        endpoint: 'my-external-signature-server/s3/success'
    },
    iframeSupport: {
        localBlankPagePath: '/success.html'
    },
    validation: {
        allowedExtensions: ["jpeg", "jpg", "png"],
        acceptFiles: "image/jpeg, image/png",
        sizeLimit: 10000000,
        itemLimit: 1
    },
    retry: {
        enableAuto: true // defaults to false
    },
    paste: {
        targetElement: document,
        promptForName: true
    }
});
Server setup:
A Python Flask environment set up according to the GitHub example: https://github.com/FineUploader/server-examples/tree/master/python/flask-fine-uploader-s3. Do I need to do more, such as create my own policy documents, or is everything included in the app.py file?
The app is running and publicly available at its URL. This is the URL I link to in the signature endpoint above.
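For reference, the policy-signing core of that Flask example boils down to something like the following (a minimal sketch assuming AWS V2 policy signing, as in the linked guide; the route path and variable names here are placeholders, and the real example also validates the policy before signing it):

import base64
import hashlib
import hmac
import json

from flask import Flask, jsonify, request

app = Flask(__name__)
AWS_SECRET_KEY = b'my-secret-key'  # placeholder

@app.route('/s3/sign', methods=['POST'])
def sign_policy():
    # Fine Uploader POSTs the policy document as JSON; the server
    # base64-encodes it and signs it with the AWS secret key.
    policy = base64.b64encode(json.dumps(request.get_json()).encode())
    signature = base64.b64encode(
        hmac.new(AWS_SECRET_KEY, policy, hashlib.sha1).digest())
    return jsonify(policy=policy.decode(), signature=signature.decode())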
I get the following error when I try to upload an image:
GET https://cdn.shopify.com/s/files/1/2116/3741/t/1/assets/edit.gif 404 ()
OPTIONS https://my-external-signature-server/ net::ERR_CONNECTION_REFUSED
s3.fine-uploader.js?11308611504760701622:165 [Fine Uploader 5.15.5] POST request for 0 has failed - response code 0
[Fine Uploader 5.15.5] Received an empty or invalid response from the server!
[Fine Uploader 5.15.5] Policy signing failed. Received an empty or invalid response from the server!
my-external-signature-server is of course a stand in term.
I hope someone has an idea of what could be wrong.
I'll gladly provide more information if necessary.