Sorry for my English. I use PyDrive to work with the Google Drive API. I want to get a list of files, and I do it like this:
return self.g_drive.ListFile({'q': 'trashed=false'}).GetList()
This returns a list of files, but the list contains deleted files. I thought 'q': 'trashed=false' would return only existing files, not files in the bucket.
How can I get only existing files plus files shared with me?
Remove the trashed=false; the query to get shared files is:
sharedWithMe
Also, there is no concept of a bucket in Google Drive.
Query to use:
{'q': 'sharedWithMe'}
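A minimal sketch of that query with PyDrive, assuming drive is an already-authenticated GoogleDrive instance (the variable name is mine, not from the question); the two conditions can also be combined with and:
# files shared with me (add "and trashed=false" to exclude trashed ones)
file_list = drive.ListFile({'q': 'sharedWithMe'}).GetList()
for f in file_list:
    print('title: %s, id: %s' % (f['title'], f['id']))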
EDIT
I still believe trashed=false should work.
Workaround:
There must be a better way, but one trick is to do the following:
list_of_trash_files = drive.ListFile({'q': 'trashed=true'}).GetList()
list_of_all_files = drive.ListFile().GetList()
trash_ids = set(f['id'] for f in list_of_trash_files)
final_required_list = [f for f in list_of_all_files if f['id'] not in trash_ids]
I'm trying to copy files from a local machine to a specific folder in GDrive using PyDrive. If the target folder does not yet exist, I want to create it. Here is the relevant section of my code:
gfile = drive.CreateFile({'title':'dummy.csv',
'mimeType':'text/csv',
'parent': tgt_folder_id})
gfile.SetContentFile('dummy.csv')
gfile.Upload() # Upload it
I am definitely creating/finding the target folder correctly, and the tgt_folder_id is correct, but PyDrive always writes the file to the root folder of my Google Drive, not the target folder I've specified via the 'parent' parameter.
What am I doing wrong here?
OK, looks like this is how you do it:
gfile = drive.CreateFile({'title':'dummy.csv', 'mimeType':'text/csv',
"parents": [{"kind": "drive#fileLink","id": tgt_folder_id}]})
The "parents" map is used in the Google Drive SDK, which PyDrive is supposed to wrap. But the very few examples I've seen with PyDrive use "parent" and don't seem to work.
Anyway, hope this helps anybody else who hits the same problem.
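For the "create the target folder if it does not yet exist" part of the question, here is a rough sketch with PyDrive; the folder title is a placeholder of mine, and the metadata keys follow the Drive v2 style that PyDrive wraps:
tgt_folder_name = 'my-target-folder'  # placeholder name

# look for an existing folder with that title (not in the trash)
folders = drive.ListFile({
    'q': "title='%s' and mimeType='application/vnd.google-apps.folder' and trashed=false" % tgt_folder_name
}).GetList()

if folders:
    tgt_folder_id = folders[0]['id']
else:
    # create the folder if it does not exist yet
    folder = drive.CreateFile({'title': tgt_folder_name,
                               'mimeType': 'application/vnd.google-apps.folder'})
    folder.Upload()
    tgt_folder_id = folder['id']

# upload the file into the folder using the "parents" list
gfile = drive.CreateFile({'title': 'dummy.csv', 'mimeType': 'text/csv',
                          'parents': [{'kind': 'drive#fileLink', 'id': tgt_folder_id}]})
gfile.SetContentFile('dummy.csv')
gfile.Upload()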
Hi @i-am-nik, to list subfolders you may use a slightly altered line:
file_list = drive.ListFile({'q': 'trashed=false', 'maxResults': 10}).GetList()
for file1 in file_list:
    print('title: %s, id: %s' % (file1['title'], file1['id']))
This way it will list both folders and subfolders (of course, if you have many files, you may need to change the maxResults value or add a narrowing query).
I would like to iterate over some files in a folder that lives in Databricks Repos.
How would one do this? I don't seem to be able to access the files in Repos.
I have added a picture that shows which folders I would like to access (the dbrks & sql folders).
Thanks :)
Image of the repo folder hierarchy
You can read files from repo folders. The path is /mnt/repos/, the top folder you see when opening the Repos window. You can then iterate over these files yourself.
Whenever you find the file you want, you can read it with (for example) Spark. For instance, to read a CSV file:
spark.read.format("csv").load(
path, header=True, inferSchema=True, delimiter=";"
)
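For the iteration itself, a minimal sketch under the assumptions above (the repo path and the CSV handling are placeholders of mine, not from the question):
import os

repo_path = "/mnt/repos/my-repo"  # hypothetical path, adjust to your repo

# walk the folder tree (e.g. the dbrks and sql folders) and read every CSV with Spark
for root, dirs, files in os.walk(repo_path):
    for name in files:
        if name.endswith(".csv"):
            df = spark.read.format("csv").load(
                os.path.join(root, name), header=True, inferSchema=True, delimiter=";"
            )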
If you just want to list files in the repositories, then you can use the list command of the Workspace REST API. Using it, you can implement recursive listing of files. The actual implementation will differ based on your requirements, e.g. whether you need to generate a list of full paths or a list with subdirectories, etc. It could look something like this (not tested):
import requests

my_pat = "generated personal access token"
workspace_url = "https://name-of-workspace"

def list_files(base_path: str):
    lst = requests.request(method='get',
                           url=f"{workspace_url}/api/2.0/workspace/list",
                           headers={"Authorization": f"Bearer {my_pat}"},
                           json={"path": base_path}).json()["objects"]
    results = []
    for i in lst:
        if i["object_type"] == "DIRECTORY" or i["object_type"] == "REPO":
            results.extend(list_files(i["path"]))
        else:
            results.append(i["path"])
    return results

all_files = list_files("/Repos/<my-initial-folder>")
But if you want to read the content of the files in the repository, then you need to use the so-called Arbitrary Files support that is available since DBR 8.4.
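With that support enabled, non-notebook files in the repo can be opened with regular Python I/O; a minimal sketch, assuming the notebook itself lives in the repo and a file data/config.json exists next to it (both are my assumptions):
import json

# with arbitrary-files support, repo files can be read relative to the notebook's location
with open("data/config.json") as f:
    config = json.load(f)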
I am trying to traverse all objects inside a specific folder in my S3 bucket. The code I already have is as follows:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket-name')
for obj in bucket.objects.filter(Prefix='folder/'):
    do_stuff(obj)
I need to use boto3.resource and not client. This code is not getting any objects at all, although I have a bunch of text files in the folder. Can someone advise?
Try adding the Delimiter attribute: Delimiter='/', since you are filtering objects. The rest of the code looks fine.
I had to make sure to skip the first key. For some reason the folder name itself comes back as the first object, and that may not be what you want.
for video_item in source_bucket.objects.filter(Prefix="my-folder-name/", Delimiter='/'):
    if video_item.key == 'my-folder-name/':
        continue
    do_something(video_item.key)
I'm developing a Python script that will upload files to a specific folder in my Drive. As I've come to notice, the Drive API provides an excellent implementation for that, but I encountered one problem: how do I delete multiple files at once?
I tried grabbing the files I want from the Drive and organizing their IDs, but no luck there... (a snippet below)
dir_id = "my folder Id"
file_id = "avoid deleting this file"
dFiles = []
query = ""

# will return a list of all the files in the folder
children = service.files().list(q="'" + dir_id + "' in parents").execute()
for i in children["items"]:
    print "appending " + i["title"]
    if i["id"] != file_id:
        # two format options I tried..
        dFiles.append(i["id"])  # will show as an array of ids ["id1", "id2", ...]
        query += i["id"] + ", "  # will show in this format "id1, id2, ..."
query = query[:-2]  # remove the trailing ', ' from the string

# tried both the query and str(dFiles) as the arg but no luck...
service.files().delete(fileId=query).execute()
Is it possible to delete selected files (I don't see why it wouldn't be possible, after all, it's a basic operation)?
Thanks in advance!
You can batch multiple Drive API requests together. Something like this should work using the Python API Client Library:
def delete_file(request_id, response, exception):
    if exception is not None:
        # Do something with the exception
        pass
    else:
        # Do something with the response
        pass

batch = service.new_batch_http_request(callback=delete_file)

for file in children["items"]:
    if file["id"] != file_id:  # skip the file you want to keep
        batch.add(service.files().delete(fileId=file["id"]))

batch.execute(http=http)
If you delete or trash a folder, it will recursively delete/trash all of the files contained in that folder. Therefore, your code can be vastly simplified:
dir_id = "my folder Id"
file_id = "avoid deleting this file"
service.files().update(fileId=file_id, addParents="root", removeParents=dir_id).execute()
service.files().delete(fileId=dir_id).execute()
This will first move the file you want to keep out of the folder (and into "My Drive") and then delete the folder.
Beware: if you call delete() instead of trash(), the folder and all the files within it will be permanently deleted and there is no way to recover them! So be very careful when using this method with a folder...
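If you would rather keep things recoverable, here is a sketch of the trash-based variant of the same idea (same placeholder dir_id and file_id as above; trashed files can still be restored):
# move the file you want to keep into "My Drive", then trash the folder instead of deleting it
service.files().update(fileId=file_id, addParents="root", removeParents=dir_id).execute()
service.files().trash(fileId=dir_id).execute()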