How to find folder ID by path in Google Drive API? - python

I'm able to find a Google Drive Folder's ID by searching its name, but how can I find the Folder ID by searching its path?
I'm searching within a Drive that has multiple folders with the same name, but different paths, so I want to ensure I'm finding the correct folder ID.
results = service.files().list(
    q="name='folder_name'",
    pageSize=10,
    fields="nextPageToken, files(id, name)",
    supportsAllDrives=True,
    includeItemsFromAllDrives=True,
    corpora='drive',
    driveId='driveId',
).execute()
The above returns multiple folders. I don't see a query parameter for the path.
Sample folder structure:
Food
    Vegetables
        Folder 1
            Item 1
            Item 2
            Item 3
        Folder 2
            Item 1
    Fruit
        Folder 1
    Protein

How about this answer?
Issue:
Unfortunately, at this stage there is no method for directly retrieving a file or folder by its path. So in your case, a workaround is required.
Workaround:
I would like to propose the workaround as follows.
From your sample folder structure, I understand that you want to retrieve the file list under the folders Folder 1 and Folder 2 in Vegetables. In this case, how about using the following search query?
q="name='Folder 1' and '### folderId of Vegetables ###' in parents"
For this, please first retrieve the folder ID of Vegetables. With the modified search query above, the folder ID of Folder 1 inside the Vegetables folder can then be retrieved.
Of course, the folder ID of Vegetables can itself be retrieved with a script.
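For example, here is a minimal, untested sketch of that two-step lookup, walking the path one segment at a time; it assumes service is the same authorized Drive client and the same placeholder drive ID as in your snippet:
DRIVE_ID = '### driveId ###'  # placeholder, as in the question

def find_child_folder(parent_id, name):
    # Return the ID of the folder `name` directly under `parent_id`, or None.
    res = service.files().list(
        q=(f"name='{name}' and '{parent_id}' in parents "
           "and mimeType='application/vnd.google-apps.folder' and trashed=false"),
        fields="files(id, name)",
        corpora='drive', driveId=DRIVE_ID,
        supportsAllDrives=True, includeItemsFromAllDrives=True,
    ).execute()
    folders = res.get('files', [])
    return folders[0]['id'] if folders else None

# Walk Food/Vegetables/Folder 1 one segment at a time.
# For a shared drive, the drive ID also serves as the ID of its root folder.
folder_id = DRIVE_ID
for segment in ['Food', 'Vegetables', 'Folder 1']:
    folder_id = find_child_folder(folder_id, segment)
print(folder_id)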
Note:
This answer assumes that you are already able to retrieve file metadata using the Drive API.
References:
Files: list
Search for files and folders

Search a folder using ID via Google Drive API in python

I am using the Google Drive's API to create and search files and folders using python 3. Searching a folder is done using the files.list method of the drive API. Each search requires a query q as an input. Some examples of queries are given here. I can successfully search for a folder named hello using its name as given below:
service.files().list(q="name = 'hello'", pageSize=10, fields=files(id, name)).execute()
Now, let us say the ID of the folder is 123; how can I search for this folder using the ID instead of the name? I have tried replacing q with id = '123' as well as id = 123, but neither seems to work. The closest query I was able to find here was:
q = "'123' in parents"
But this will return all the files whose parent folder is the folder we are looking for.
Is there any way in which we can directly search folders or files by their ID?
Thank you.
I believe your goal is as follows:
You want to retrieve the folder metadata by folder ID using the Drive API with googleapis for Python.
In this case, how about the following sample script?
folderId = "###" # Please set the folder ID.
res = service.files().get(fileId=folderId, fields="id, name").execute()
In this case, the "Files: get" method is used.
Additional information:
If you want to retrieve the folder list just under the specific folder, you can use the following modified search query.
q = "'123' in parents and mimeType='application/vnd.google-apps.folder'"
In this case, 123 is the top folder ID.
In this modification, the query is restricted to the folder mimeType, so only folders are returned.
If you want to retrieve the list of files excluding folders, you can use the following search query instead.
q = "'123' in parents and not mimeType='application/vnd.google-apps.folder'"
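As a small, hedged sketch, either of these queries can be run through files().list while paging with nextPageToken (service is assumed to be the same authorized Drive client as above):
query = "'123' in parents and mimeType='application/vnd.google-apps.folder'"
folders = []
page_token = None
while True:
    res = service.files().list(q=query,
                               fields="nextPageToken, files(id, name)",
                               pageToken=page_token).execute()
    folders.extend(res.get('files', []))
    page_token = res.get('nextPageToken')
    if not page_token:
        break  # no more pages
for f in folders:
    print(f['name'], f['id'])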
If you want to retrieve the folder ID from the folder name, you can use the following sample script. In this case, when there are multiple folders named hello, all of them are returned.
res = service.files().list(q="name = 'hello' and mimeType='application/vnd.google-apps.folder'", pageSize=10, fields="files(id, name)").execute()
Note:
In the script you posted, please change fields=files(id, name) in service.files().list(q="name = 'hello'", pageSize=10, fields=files(id, name)).execute() to fields="files(id, name)".
References:
Files: get
Search for files and folders

Is there a way to upload a file to a parent folder in google drive? [duplicate]

I'm trying to copy files from a local machine to a specific folder in GDrive using PyDrive. If the target folder does not yet exist, I want to create it. Here is the relevant section of my code:
gfile = drive.CreateFile({'title': 'dummy.csv',
                          'mimeType': 'text/csv',
                          'parent': tgt_folder_id})
gfile.SetContentFile('dummy.csv')
gfile.Upload()  # Upload it
I am definitely creating/finding the target folder correctly, and the tgt_folder_id is correct, but PyDrive always writes the file to the root folder of my Google Drive, not the target folder I've specified via the 'parent' parameter.
What am I doing wrong here?
OK, looks like this is how you do it:
gfile = drive.CreateFile({'title': 'dummy.csv', 'mimeType': 'text/csv',
                          'parents': [{'kind': 'drive#fileLink', 'id': tgt_folder_id}]})
The "parents" map is used in the Google Drive SDK, which PyDrive is supposed to wrap. But the very few examples I've seen with PyDrive use "parent" and don't seem to work.
Anyway, hope this helps anybody else who hits the same problem.
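Since the question also mentions creating the target folder when it does not yet exist, here is a minimal, untested sketch of a find-or-create step with PyDrive; the folder name dummy_folder is just an example, and drive is assumed to be an authenticated GoogleDrive instance:
query = ("title='dummy_folder' and 'root' in parents "
         "and mimeType='application/vnd.google-apps.folder' and trashed=false")
matches = drive.ListFile({'q': query}).GetList()
if matches:
    tgt_folder_id = matches[0]['id']          # reuse the existing folder
else:
    folder = drive.CreateFile({'title': 'dummy_folder',
                               'mimeType': 'application/vnd.google-apps.folder'})
    folder.Upload()
    tgt_folder_id = folder['id']              # newly created folder

gfile = drive.CreateFile({'title': 'dummy.csv', 'mimeType': 'text/csv',
                          'parents': [{'kind': 'drive#fileLink', 'id': tgt_folder_id}]})
gfile.SetContentFile('dummy.csv')
gfile.Upload()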
Hi @i-am-nik, to list subfolders you may use a slightly altered line:
file_list = drive.ListFile({'q': 'trashed=false', 'maxResults': 10}).GetList()
for file1 in file_list:
    print('title: %s, id: %s' % (file1['title'], file1['id']))
This way it will list both folders and subfolders (of course, if you have many files, you may need to change the maxResults value or add a narrowing query).

Iterate over files in databricks Repos

I would like to iterate over some files in a folder that has its path in databricks Repos.
How would one do this? I don't seem to be able to access the files in Repos
I have added a picture that shows which folders I would like to access (the dbrks & sql folders).
Thanks :)
[Image: repo folder hierarchy]
You can read files from repo folders. The path is /mnt/repos/; this is the top folder when opening the repo window. You can then iterate over these files yourself.
Whenever you find the file you want, you can read it with (for example) Spark. For example, if you want to read a CSV file:
spark.read.format("csv").load(
    path, header=True, inferSchema=True, delimiter=";"
)
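As a rough, untested sketch of the iteration itself, assuming the repo folder is exposed at a local path like the one above (the exact path depends on your workspace, so treat it as a placeholder):
import os

repo_path = "/mnt/repos/<repo-name>/dbrks"    # placeholder path to the dbrks folder

for name in sorted(os.listdir(repo_path)):
    full_path = os.path.join(repo_path, name)
    print(full_path)                          # or hand the path to spark.read as above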
If you just want to list files in the repositories, then you can use the list command of the Workspace REST API. Using it you can implement recursive listing of files. The actual implementation would differ based on your requirements, for example whether you need a list of full paths or a list grouped by subdirectory. It could be something like this (not tested):
import requests

my_pat = "generated personal access token"
workspace_url = "https://name-of-workspace"

def list_files(base_path: str):
    # Recursively list all files under base_path via the Workspace list API.
    lst = requests.request(method='get',
                           url=f"{workspace_url}/api/2.0/workspace/list",
                           headers={"Authorization": f"Bearer {my_pat}"},
                           json={"path": base_path}).json()["objects"]
    results = []
    for i in lst:
        if i["object_type"] == "DIRECTORY" or i["object_type"] == "REPO":
            results.extend(list_files(i["path"]))
        else:
            results.append(i["path"])
    return results

all_files = list_files("/Repos/<my-initial-folder>")
But if you want to read the content of the files in the repository, then you need to use the so-called Arbitrary Files support that is available since DBR 8.4.

How to find the sub folder id in Google Drive using pydrive in Python?

The directory structure on Google Drive is as follows:
Inside mydrive/BTP/BTP-4
I need to get the folder ID for BTP-4 so that I can transfer a specific file from the folder. How do I do it?
fileList = GoogleDrive(self.driveConn).ListFile({'q': "'root' in parents and trashed=false"}).GetList()
for file in fileList:
    if file['title'] == "BTP-4":
        fileID = file['id']
        print(remoteFile, fileID)
        return fileID
Will I be able to give a path like /MyDrive/BTP/BTP-4 and a filename like "test.csv" and then directly download the file?
Answer:
Unfortunately, this is not possible.
More Information:
Google Drive supports creating multiple files or folders with the same name in the same location.
As a result, in some cases providing a file path isn't enough to identify a file or folder uniquely - for example, a path like mydrive/Parent folder/Child folder/Child doc could point to two different files, and mydrive/Parent folder/Child folder/Child folder could point to five different folders.
You have to either search for the folder directly by its ID, or, to get a folder/file's ID, search for children recursively through the folders like you are already doing.
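For completeness, here is a minimal, untested sketch of that recursive lookup with PyDrive; drive is assumed to be an authenticated GoogleDrive instance, and when names are duplicated the first match wins:
def folder_id_from_path(drive, path):
    # Walk the path one segment at a time, starting from the Drive root.
    parent_id = 'root'
    for segment in path.strip('/').split('/'):
        query = (f"title='{segment}' and '{parent_id}' in parents "
                 "and mimeType='application/vnd.google-apps.folder' and trashed=false")
        matches = drive.ListFile({'q': query}).GetList()
        if not matches:
            return None                  # some segment of the path was not found
        parent_id = matches[0]['id']     # first match wins if names are duplicated
    return parent_id

btp4_id = folder_id_from_path(drive, 'BTP/BTP-4')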

Python - List files and folders in Bucket

I am playing around with the boto library to access an Amazon S3 bucket. I am trying to list all the files and folders in a given folder in the bucket. I use this to get all the files and folders:
for key in bucket.list():
    print(key.name)
This gives me all the files and folders within the root, along with the sub-folders and the files within them, like this:
root/
    file1
    file2
    folder1/file3
    folder1/file4
    folder1/folder2/file5
    folder1/folder2/file6
How can I list only the contents of, say, folder1, where it will list something like:
files:
    file3
    file4
folders:
    folder2
I can navigate to a folder using
for key in bucket.list(prefix='path/to/folder/')
but in that case it lists the files in folder2 as files of folder1, because I am using string manipulations on the bucket path. I have tried every scenario and it still breaks with longer paths and when folders have multiple files and folders (and these folders have more files). Is there a recursive way to deal with this issue?
All of the information in the other answers is correct, but because so many people store objects with path-like keys in S3, the API does provide some tools to help you deal with them.
For example, in your case, if you wanted to list only the "subdirectories" of root without listing all of the objects below them, you would do this:
for key in bucket.list(prefix='root/', delimiter='/'):
    print(key.name)
which should produce the output:
file1
file2
folder1/
You could then do:
for key in bucket.list(prefix='root/folder1/', delimiter='/'):
    print(key.name)
and get:
file3
file4
folder2/
And so forth. You can probably accomplish what you want with this approach.
What I found most difficult to fully grasp about S3 is that it is simply a key/value store and not a disk or other type of file-based store that most people are familiar with. The fact that people refer to keys as folders and values as files adds to the initial confusion of working with it.
Being a key/value store, the keys are simply just identifiers and not actual paths into a directory structure. This means that you don't need to actually create folders before referencing them, so you can simply put an object in a bucket at a location like /path/to/my/object without first having to create the "directory" /path/to/my.
Because S3 is a key/value store, the API for interacting with it is more object & hash based than file based. This means that, whether using Amazon's native API or using boto, functions like s3.bucket.Bucket.list will list all the objects in a bucket and optionally filter on a prefix. If you specify a prefix /foo/bar then everything with that prefix will be listed, including /foo/bar/file, /foo/bar/blargh/file, /foo/bar/1/2/3/file, etc.
So the short answer is that you will need to filter out the results that you don't want from your call to s3.bucket.Bucket.list because functions like s3.bucket.Bucket.list, s3.bucket.Bucket.get_all_keys, etc. are all designed to return all keys under the prefix that you specify as a filter.
S3 has no concept of "folders" as you may think of them. It's a flat, single-level store where files are stored by key.
If you need to do a single-level listing inside a folder, you'll have to constrain the listing in your code, with something like if key.count('/') == 1.
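A small, untested sketch of that idea, using the same boto-style listing as the answer above: keep only the keys that sit directly under the prefix.
prefix = 'root/folder1/'
for key in bucket.list(prefix=prefix):
    rest = key.name[len(prefix):]        # path relative to the "folder"
    if '/' not in rest:                  # direct child, e.g. file3 or file4
        print(rest)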
