Search a folder using ID via Google Drive API in Python

I am using the Google Drive API to create and search files and folders using Python 3. Searching for a folder is done using the files.list method of the Drive API. Each search requires a query q as an input. Some examples of queries are given here. I can successfully search for a folder named hello using its name as given below:
service.files().list(q="name = 'hello'", pageSize=10, fields=files(id, name)).execute()
Now, let us say the ID of the folder is 123; how can I search for this folder using the ID instead of the name? I have tried replacing q with id = '123' as well as id = 123, but neither seems to work. The closest query I was able to find here was:
q = "'123' in parents"
But this will return all the files whose parent folder is the folder we are looking for.
Is there any way in which we can directly search folders or files by their ID?
Thank you.

I believe your goal is as follows: you want to retrieve the folder metadata by the folder ID using the Drive API with googleapis for Python.
In this case, how about the following sample script?
folderId = "###" # Please set the folder ID.
res = service.files().get(fileId=folderId, fields="id, name").execute()
In this case, the "Files: get" method is used.
Additional information:
If you want to retrieve the list of folders directly under a specific folder, you can use the following modified search query.
q = "'123' in parents and mimeType='application/vnd.google-apps.folder'"
In this case, 123 is the top folder ID.
This modification adds a mimeType condition to the query, so only folders are returned.
If you want to retrieve the list of files excluding folders, you can use the following search query.
q = "'123' in parents and not mimeType='application/vnd.google-apps.folder'"
If you want to retrieve a folder ID from the folder name, you can use the following sample script. Note that if several folders are named hello, all of them are returned.
res = service.files().list(q="name = 'hello' and mimeType='application/vnd.google-apps.folder'", pageSize=10, fields="files(id, name)").execute()
Note:
In the script you posted, please modify fields=files(id, name) in service.files().list(q="name = 'hello'", pageSize=10, fields=files(id, name)).execute() to fields="files(id, name)".
References:
Files: get
Search for files and folders

Related

Iterate over files in databricks Repos

I would like to iterate over some files in a folder located in Databricks Repos.
How would one do this? I don't seem to be able to access the files in Repos.
I have added a picture that shows which folders I would like to access (the dbrks & sql folders).
Thanks :)
Image of the repo folder hierarchy
You can read files from repo folders. The path is /mnt/repos/; this is the top folder you see when opening the repo window. You can then iterate over these files yourself.
Whenever you find the file you want, you can read it with (for example) Spark. For example, to read a CSV file:
spark.read.format("csv").load(
    path, header=True, inferSchema=True, delimiter=";"
)
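For the iteration itself, here is a rough sketch (untested) that walks the top folder mentioned above and reads every CSV it finds; the base path comes from this answer and the delimiter from the example, so adjust both to your setup:
import os

base_path = "/mnt/repos/"  # top-level repo path mentioned above; adjust to your workspace
csv_paths = []
for root, dirs, files in os.walk(base_path):
    for name in files:
        if name.endswith(".csv"):
            csv_paths.append(os.path.join(root, name))

for path in csv_paths:
    # `spark` is the SparkSession that Databricks notebooks provide by default
    df = spark.read.format("csv").load(path, header=True, inferSchema=True, delimiter=";")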
If you just want to list files in the repositories, then you can use the list command of the Workspace REST API. Using it you can implement recursive listing of files. The actual implementation would differ based on your requirements, e.g. whether you need a list of full paths or a list with subdirectories, etc. It could be something like this (not tested):
import requests

my_pat = "generated personal access token"
workspace_url = "https://name-of-workspace"

def list_files(base_path: str):
    # The workspace/list endpoint expects a Bearer token in the Authorization header.
    lst = requests.request(method='get',
                           url=f"{workspace_url}/api/2.0/workspace/list",
                           headers={"Authorization": f"Bearer {my_pat}"},
                           json={"path": base_path}).json()["objects"]
    results = []
    for i in lst:
        if i["object_type"] == "DIRECTORY" or i["object_type"] == "REPO":
            results.extend(list_files(i["path"]))
        else:
            results.append(i["path"])
    return results

all_files = list_files("/Repos/<my-initial-folder>")
But if you want to read the content of the files in the repository, then you need to use the so-called Arbitrary Files support that is available since DBR 8.4.
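With that support enabled, files in the repo can be opened with ordinary Python I/O relative to the notebook's directory. A small illustrative sketch (the sql folder and file handling are assumptions taken from the question, not a fixed convention):
import os

# With Files in Repos (DBR 8.4+), repo files are readable with standard Python I/O.
# The folder name below is an assumption for illustration.
sql_dir = "./sql"
for name in os.listdir(sql_dir):
    with open(os.path.join(sql_dir, name)) as f:
        print(name, len(f.read()), "characters")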

How to find the sub folder id in Google Drive using pydrive in Python?

The directory structure on Google Drive is as follows:
Inside mydrive/BTP/BTP-4
I need to get the folder ID for BTP-4 so that I can transfer a specific file from the folder. How do I do it?
fileList = GoogleDrive(self.driveConn).ListFile({'q': "'root' in parents and trashed=false"}).GetList()
for file in fileList:
    if file['title'] == "BTP-4":
        fileID = file['id']
        print(remoteFile, fileID)
        return fileID
Will I be able to give a path like /MyDrive/BTP/BTP-4 and a filename such as "test.csv" and then directly download the file?
Answer:
Unfortunately, this is not possible.
More Information:
Google Drive supports creating multiple files or folders with the same name in the same location.
As a result, in some cases providing a file path isn't enough to identify a file or folder uniquely - for example, mydrive/Parent folder/Child folder/Child doc could point to two different files, and mydrive/Parent folder/Child folder/Child folder could point to five different folders.
You either have to search for the folder directly by its ID, or, to obtain a folder or file's ID, search recursively through the children of each folder along the path, as you are already doing.
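As a rough sketch of that recursive search with pydrive (untested; the drive object and the path string are placeholders), you could resolve a path such as BTP/BTP-4 one segment at a time, starting from the root:
def get_folder_id_by_path(drive, path):
    """Resolve a folder path like 'BTP/BTP-4' to a folder ID, one level at a time."""
    parent_id = 'root'
    for name in path.strip('/').split('/'):
        query = (f"'{parent_id}' in parents and title = '{name}' and "
                 "mimeType = 'application/vnd.google-apps.folder' and trashed = false")
        matches = drive.ListFile({'q': query}).GetList()
        if not matches:
            raise FileNotFoundError(f"No folder named '{name}' under {parent_id}")
        # If several folders share the same name, this simply takes the first match.
        parent_id = matches[0]['id']
    return parent_id

# folder_id = get_folder_id_by_path(GoogleDrive(self.driveConn), "BTP/BTP-4")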

How to find folder ID by path in Google Drive API?

I'm able to find a Google Drive Folder's ID by searching its name, but how can I find the Folder ID by searching its path?
I'm searching within a Drive that has multiple folders with the same name, but different paths, so I want to ensure I'm finding the correct folder ID.
results = service.files().list(q="name='folder_name'",
                               pageSize=10, fields="nextPageToken, files(id, name)",
                               supportsAllDrives=True, driveId='driveId',
                               includeItemsFromAllDrives=True, corpora='drive').execute()
The above returns multiple folders. I don't see a query parameter for the path.
Sample folder structure:
Food
  \Vegetables
    \Folder 1
      Item 1
      Item 2
      Item 3
    \Folder 2
      Item 1
  \Fruit
    \Folder 1
  \Protein
How about this answer?
Issue:
Unfortunately, at the current stage, there is no method for directly retrieving the files and folders in a specific folder using the folder name or path. So in your case, a workaround is required.
Workaround:
I would like to propose the workaround as follows.
From your sample folder structure, I understand that you want to retrieve the file list under the folders Folder 1 and Folder 2 in Vegetables. In this case, how about using the following search query?
q="name='Folder 1' and '### folderId of Vegetables ###' in parents"
In this case, please first retrieve the folder ID of Vegetables. With the above modified search query, the folder ID of Folder 1 inside the Vegetables folder can be retrieved.
Of course, you can also retrieve the folder ID of Vegetables with a script, for example by walking the path one level at a time as sketched below.
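Here is a minimal sketch (untested) assuming the same authorized service object and the shared-drive parameters from the question; a shared drive's ID also acts as the ID of its root folder, so the walk can start from driveId itself. Folders with the same name but different paths are then disambiguated by their parent at each step:
def folder_id_for_path(service, drive_id, path):
    parent_id = drive_id  # the shared drive's ID doubles as its root folder ID
    for name in path.split('/'):
        res = service.files().list(
            q=(f"name='{name}' and '{parent_id}' in parents and "
               "mimeType='application/vnd.google-apps.folder' and trashed=false"),
            fields="files(id, name)",
            corpora='drive', driveId=drive_id,
            supportsAllDrives=True, includeItemsFromAllDrives=True).execute()
        folders = res.get('files', [])
        if not folders:
            raise FileNotFoundError(f"'{name}' not found under {parent_id}")
        parent_id = folders[0]['id']
    return parent_id

# folder_id = folder_id_for_path(service, 'driveId', 'Food/Vegetables/Folder 1')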
Note:
This answer supposes that you are already able to get file metadata using the Drive API.
References:
Files: list
Search for files and folders

Certain files missing in Google Drive API v3 Python files().list()

I'm new to using the Google Drive API for Python (v3) and I've been trying to access and update the sub-folders in a particular parent folder for which I have the fileId. Here is my build for the API driver:
store = file.Storage('token.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('credentials.json',
                                          scope='https://www.googleapis.com/auth/drive')
    creds = tools.run_flow(flow, store)
service = build('drive', 'v3', http=creds.authorize(Http()))
I am able to successfully access most of the sub-folders by using files().list() but at least one was missing from the list of results returned:
results = service.files().list(
    q="parents in '1QXPl6z04GsYAO0GKHBk2oBjEweaAbczw'",
    fields="files(id, name), incompleteSearch, nextPageToken").execute()
items = results['files']
I double-checked and there was no nextPageToken key in the results, and the value of incompleteSearch was False, which I assume means the full list of results was returned. In addition, when I accessed the list of parents for the missing file using the files().get() method, the only parent listed is the one in the query above:
service.files().get(
    fileId='1WHP02DtXfJHfkdr47xSeeRIj0sCrihPA',
    fields='parents, name').execute()
and returns this:
{'name': 'Sara Gaul -Baltimore Corps docs and schedules',
 'parents': ['1QXPl6z04GsYAO0GKHBk2oBjEweaAbczw']}
Other details that may be relevant:
This particular folder that is not appearing in the list was renamed by a collaborator.
I'm running this code in a Jupyter notebook instead of from a Python file.
I'm a named collaborator with write access on all of the sub-folders, including the one that's not showing up.
UPDATES
The files().list() query used to return 40 records of the 41 in the folder. Now it is only returning 39.
Both of the folders that are no longer being returned were renamed by someone who accessed the folder using the link that extends write level permissions.
When their folder details are queried directly using files().get(), both of the non-returned folders still have the parent folder as their only parent, and their permissions have not changed.
Main questions:
Why isn't this file, which clearly has the parent ID listed, showing up in the results of my files().list() query? And is there any way to adjust the query or the file to ensure that it does?
Is there an easier way to list all of the files contained within a folder in the Google Drive API v3? I know that v2 had a children() method for folders, but to my knowledge it has been deprecated in v3.
I figured out the error with my code:
My previous query parameter in the files().list() method was:
results = service.files().list(
    q="parents in '1QXPl6z04GsYAO0GKHBk2oBjEweaAbczw'",
    fields="files(id, name), incompleteSearch, nextPageToken").execute()
items = results['files']
After looking at another bug someone had posted in Google's issue tracker for the API, I saw the preferred syntax for that query was:
results = service.files().list(
    q="'1QXPl6z04GsYAO0GKHBk2oBjEweaAbczw' in parents",
    fields="files(id, name), incompleteSearch, nextPageToken").execute()
items = results['files']
In other words, switching the order of parents in fileId to fileId in parents. With the corrected syntax, all 41 files were returned.
I have two follow-up questions that hopefully someone can clarify:
Why would the first syntax return any records at all if it is incorrect? And why would changing the name of a file prevent it from being returned using the first syntax?
If you wanted to return a list of files that were stored in one of a few folders, is there any way to pass multiple parent ids to the query as the parents in ... syntax would suggest? Or do they have to be evaluated as separate conditions i.e. fileId1 in parents or fileId2 in parents?
If someone could comment on this answer with those explanations or post a more complete answer, I would gladly select it as the best response.
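Regarding the second follow-up question: the Drive API query language does accept multiple parent conditions combined with or, so a sketch (with placeholder folder IDs, assuming the same service object as above) would look like this:
# Placeholder folder IDs for illustration; parent conditions are combined with `or`.
results = service.files().list(
    q="'folderId1' in parents or 'folderId2' in parents",
    fields="nextPageToken, files(id, name, parents)").execute()
items = results.get('files', [])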

How to delete multiple files at once using Google Drive API

I'm developing a Python script that will upload files to a specific folder in my Drive. As I've come to notice, the Drive API provides an excellent implementation for that, but I did encounter one problem: how do I delete multiple files at once?
I tried grabbing the files I want from the Drive and organizing their IDs, but no luck there... (a snippet below)
dir_id = "my folder Id"
file_id = "avoid deleting this file"
dFiles = []
query = ""
#will return a list of all the files in the folder
children = service.files().list(q="'"+dir_id+"' in parents").execute()
for i in children["items"]:
print "appending "+i["title"]
if i["id"] != file_id:
#two format options I tried..
dFiles.append(i["id"]) # will show as array of id's ["id1","id2"...]
query +=i["id"]+", " #will show in this format "id1, id2,..."
query = query[:-2] #to remove the finished ',' in the string
#tried both the query and str(dFiles) as arg but no luck...
service.files().delete(fileId=query).execute()
Is it possible to delete selected files (I don't see why it wouldn't be possible, after all, it's a basic operation)?
Thanks in advance!
You can batch multiple Drive API requests together. Something like this should work using the Python API Client Library:
def delete_file(request_id, response, exception):
    if exception is not None:
        # Do something with the exception
        pass
    else:
        # Do something with the response
        pass

batch = service.new_batch_http_request(callback=delete_file)
for file in children["items"]:
    batch.add(service.files().delete(fileId=file["id"]))
batch.execute(http=http)
If you delete or trash a folder, it will recursively delete/trash all of the files contained in that folder. Therefore, your code can be vastly simplified:
dir_id = "my folder Id"
file_id = "avoid deleting this file"
service.files().update(fileId=file_id, addParents="root", removeParents=dir_id).execute()
service.files().delete(fileId=dir_id).execute()
This will first move the file you want to keep out of the folder (and into "My Drive") and then delete the folder.
Beware: if you call delete() instead of trash(), the folder and all the files within it will be permanently deleted and there is no way to recover them! So be very careful when using this method with a folder...
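If you would rather keep the folder recoverable, a sketch of the trash variant of that last call would look like this (Drive API v2 has a dedicated trash() method; in v3 you set the trashed flag via update()):
# Drive API v2: move the folder to the trash instead of deleting it permanently.
service.files().trash(fileId=dir_id).execute()

# Drive API v3 equivalent: there is no trash() method, so set the trashed flag instead.
service.files().update(fileId=dir_id, body={"trashed": True}).execute()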
