List of files in a Google Drive folder with Python

I've got the exact same question as the one asked in this post: List files and folders in a google drive folder
I can't figure out from the Google Drive REST API documentation how to get a list of the files in a Google Drive folder.

You can look here for an example of how to list files in Drive: https://developers.google.com/drive/api/v3/search-files. You need to construct a query that lists the files in a folder; use
q = "'1234' in parents"
where 1234 is the ID of the folder you want to list. You can modify the query to list only files of a particular type (such as all JPEG files in the folder), etc.
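As a minimal sketch with google-api-python-client, assuming an authenticated Drive v3 service object named service and a placeholder folder ID, it might look like this; note the pagination loop, since a single files().list call returns at most one page:
FOLDER_ID = '1234'  # placeholder: the ID of the folder to list

files = []
page_token = None
while True:
    response = service.files().list(
        q=f"'{FOLDER_ID}' in parents and trashed = false",
        fields="nextPageToken, files(id, name)",
        pageToken=page_token,
    ).execute()
    files.extend(response.get('files', []))
    page_token = response.get('nextPageToken')
    if page_token is None:
        break

for f in files:
    print(f['name'], f['id'])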

Here's a hacky-yet-successful solution. This actually gets all the files from a particular Google Drive folder (in this case, a folder called "thumbnails"). I needed to get (not just list) all the files from a particular folder and perform image adjustments on them, so I used this code:
import io
import cv2
import numpy as np
from googleapiclient.http import MediaIoBaseDownload

# First, get the folder ID by querying by mimeType and name
folderId = drive.files().list(q="mimeType = 'application/vnd.google-apps.folder' and name = 'thumbnails'", pageSize=10, fields="nextPageToken, files(id, name)").execute()
# This gives us a list of all folders with that name
folderIdResult = folderId.get('files', [])
# However, we know there is only 1 folder with that name, so we just get the id of the 1st item in the list
folder_id = folderIdResult[0].get('id')

# Now, using the folder ID obtained above, we get all the files from
# that particular folder
results = drive.files().list(q="'" + folder_id + "' in parents", pageSize=10, fields="nextPageToken, files(id, name)").execute()
items = results.get('files', [])

# Now we can loop through each file in that folder and do whatever we
# want (in this case, download them and open them as images in OpenCV)
for item in items:
    fileRequest = drive.files().get_media(fileId=item.get('id'))
    fh = io.BytesIO()
    downloader = MediaIoBaseDownload(fh, fileRequest)
    done = False
    while not done:
        status, done = downloader.next_chunk()
    fh.seek(0)
    fhContents = fh.read()
    # np.frombuffer replaces the deprecated np.fromstring
    baseImage = cv2.imdecode(np.frombuffer(fhContents, dtype=np.uint8), cv2.IMREAD_COLOR)

See the API reference for the available functions.
You can search for files with the Drive API files.list method. Calling files.list without any parameters returns all the files on the user's Drive. By default, files.list returns only a subset of the properties for each resource; if you want more properties returned, use the fields parameter. To make your search more specific, combine one or more operators and query terms in the query string q.
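As a rough sketch of combining q operators with the fields parameter (again assuming an authenticated Drive v3 service object; the query terms are just examples):
# Search for JPEG files whose name contains 'report', and ask for a
# few extra properties per file via the fields parameter.
response = service.files().list(
    q="mimeType='image/jpeg' and name contains 'report'",
    fields="nextPageToken, files(id, name, mimeType, modifiedTime)",
).execute()
for f in response.get('files', []):
    print(f['name'], f['modifiedTime'])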

# Import PyDrive and associated libraries.
# This only needs to be done once per notebook.
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# List files whose title contains 'CV'.
#
# Search query reference:
# https://developers.google.com/drive/v2/web/search-parameters
listed = drive.ListFile({'q': "title contains 'CV'"}).GetList()
for file in listed:
    print('title {}, id {}'.format(file['title'], file['id']))

The easiest solution if you are working with Google Colab.
Connect to your Drive in the Colab notebook:
from google.colab import drive
drive.mount('/content/drive')
Use the shell escape '!' with the ls command to list the files in the Drive folder path you specify.
!ls PATH OF YOUR DRIVE FOLDER
Example: !ls drive/MyDrive/Folder1/Folder2/
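Once the drive is mounted, you can also list the same folder from Python itself; a minimal sketch (the path below is just an example):
import os

# Path under the mount point created by drive.mount; adjust to your
# own folder. This path is just an example.
folder = '/content/drive/MyDrive/Folder1/Folder2/'
for name in os.listdir(folder):
    print(name)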

Related

get list of all folders in firebase storage using python

I'm using a Django app with firebase-admin, and I have files and sub-folders (nested folders) in storage.
Each folder has its own files and folders.
I want to get a list of all the folders and files inside each root folder.
My code is:
import firebase_admin
from firebase_admin import storage

service_account_key = 'mysak.json'
cred = firebase_admin.credentials.Certificate(service_account_key)
default_app = firebase_admin.initialize_app(cred, {
    'storageBucket': 'myBucketUrl'
})
bucket = storage.bucket()
blobs = list(bucket.list_blobs())  # this returns all objects in storage, not just the folder I want
For example, I want all the files in first_stage/math so I can get a URL for each file.
I have also read the Firebase Storage docs, and there is no such method.
Based on the documentation (List objects | Cloud Storage | Google Cloud), you can do something like:
bucket.list_blobs(prefix="first_stage/math")
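Since the goal is a URL per file, a small sketch building on that (assuming bucket is the storage.bucket() object from the question; the prefix is just the example path) could be:
from datetime import timedelta

# List only the objects under first_stage/math and generate a
# short-lived signed URL for each one.
for blob in bucket.list_blobs(prefix="first_stage/math"):
    url = blob.generate_signed_url(expiration=timedelta(hours=1))
    print(blob.name, url)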

Remove CSVs, add new CSVs with Python in Google API [duplicate]

I have this script written in Python which looks through the folder 'CSVtoGD', lists every CSV there, and sends those CSVs as independent sheets to my Google Drive. I am trying to write a line which will delete the old files when I run the program again. What am I missing here? I am trying to achieve that by using:
sh = gc.del_spreadsheet(filename.split(".")[0]+" TTF")
Unfortunately the script is doing the same thing after adding this line. It is uploading new files but not deleting old ones.
The whole script looks like this:
import gspread
import os

gc = gspread.oauth(credentials_filename='/users/user/credentials.json')
os.chdir('/users/user/CSVtoGD')
files = os.listdir()
for filename in files:
    if filename.split(".")[1] == "csv":
        folder_id = '19vrbvaeDqWcxFGwPV82APWYTmB'
        sh = gc.del_spreadsheet(filename.split(".")[0]+" TTF")
        sh = gc.create(filename.split(".")[0]+" TTF", folder_id)
        content = open(filename, 'r').read().encode('utf-8')
        gc.import_csv(sh.id, content)
Everything else works fine: the CSVs from the folder are uploaded to Google Drive. My problem is with deleting the old CSVs (those with the same name as the new ones).
Looking at the gspread documentation, it seems that the argument of the del_spreadsheet method is the file ID (ref). In your script, you are using the filename as the argument; I think this might be the reason for your issue. When this is reflected in your script, it becomes as follows.
From:
sh = gc.del_spreadsheet(filename.split(".")[0]+" TTF")
To:
sh = gc.del_spreadsheet(gc.open(filename.split(".")[0] + " TTF").id)
Note:
If no spreadsheet named filename.split(".")[0] + " TTF" exists, an error occurs. Please be careful about this.
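To guard against that error on a first run, a small sketch using gspread's own SpreadsheetNotFound exception could wrap the delete step:
# Only delete when the old spreadsheet actually exists, instead of
# letting gc.open raise gspread.SpreadsheetNotFound.
try:
    old = gc.open(filename.split(".")[0] + " TTF")
    gc.del_spreadsheet(old.id)
except gspread.SpreadsheetNotFound:
    pass  # nothing to delete on the first run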
Reference:
del_spreadsheet(file_id)
Added:
From your reply ("When I try to delete another file using this method from My Drive it is working well."), it was found that my proposed modification works for "My Drive" but, it seems, not for shared drives.
Looking at the gspread source again, I noticed that its current request cannot search for files in a shared drive by filename, so in current gspread the spreadsheet ID cannot be retrieved for files on shared drives. For that reason, I would like to propose the following modified script.
Modified script:
import gspread
import os
from googleapiclient.discovery import build

gc = gspread.oauth(credentials_filename='/users/user/credentials.json')
service = build("drive", "v3", credentials=gc.auth)

def getSpreadsheetId(filename):
    q = f"name='{filename}' and mimeType='application/vnd.google-apps.spreadsheet' and trashed=false"  # or q = "name='" + filename + "' and mimeType='application/vnd.google-apps.spreadsheet' and trashed=false"
    res = service.files().list(q=q, fields="files(id)", corpora="allDrives", includeItemsFromAllDrives=True, supportsAllDrives=True).execute()
    items = res.get("files", [])
    if not items:
        print("No files found.")
        exit()
    return items[0]["id"]

os.chdir('/users/user/CSVtoGD')
files = os.listdir()
for filename in files:
    fname = filename.split(".")
    if fname[1] == "csv":
        folder_id = '19vrbvaeDqWcxFGwPV82APWYTmB'
        oldSpreadsheetId = getSpreadsheetId(fname[0] + " TTF")
        sh = gc.del_spreadsheet(oldSpreadsheetId)
        sh = gc.create(fname[0] + " TTF", folder_id)
        content = open(filename, "r").read().encode("utf-8")
        gc.import_csv(sh.id, content)
In this modification, googleapis for Python is used to retrieve the spreadsheet ID from the filename on the shared drive (ref).
But in this case, it assumes that you have permission to write to the shared drive. Please be careful about this.

How do I iterate through multiple text(.txt) files in a folder on Google Drive to upload on Google Colab?

I have a folder on Google Drive that consists of multiple text files. I want to load them in Google Colab by iterating through each file in the folder. It would be great if someone could help me out with this.
In order to read .txt files from your Google Drive (not from a .zip or .rar archive):
First you have to mount your Drive (as in most Colab code that works with Google Drive):
from google.colab import drive
drive.mount('/content/drive')
Then the following code will read every text file (any file ending with .txt) in the given folder path and save the contents to new_list.
import os

new_list = []
for root, dirs, files in os.walk("/content/.../folder_of_txt_files"):
    for file in files:
        if file.endswith('.txt'):
            with open(os.path.join(root, file), 'r') as f:
                text = f.read()
                new_list.append(text)
obviously, you can save into a dictionary or dataframe or any data structure you prefer.
Note: sometimes you need to change 'r' to 'rb' (binary mode), e.g. when a file's encoding isn't what open() expects.
You need a listOfFileNames.txt file located in the same folder; for example, I have a listOfDates.txt file that stores the file names, titled by date.
import numpy as np
import pandas as pd

#listOfFileNames = ['8_26_2021', '8_27_2021', '8_29_2021', '8_30_2021']
savedListOfFileNames = pd.read_csv('listOfFileNames.txt', header=None).copy()
emptyVectorToStoreAllOfTheData = []
listOfFileNames = []
for iteratingThroughFileNames in range(len(savedListOfFileNames)):
    listOfFileNames.append(savedListOfFileNames[0][iteratingThroughFileNames])
for iteratingThroughFileNames in range(len(listOfFileNames)):
    currentFile = pd.read_csv(listOfFileNames[iteratingThroughFileNames] + '.txt', header=None).copy()
    for iteratingThroughCurrentFile in range(len(currentFile)):
        emptyVectorToStoreAllOfTheData.append(currentFile[0][iteratingThroughCurrentFile])
If you don't know how to access your folders and files, then you need to (1) mount your drive and (2) define a createWorkingDirectoryFunction:
import os
from google.colab import drive

myGoogleDrive = drive.mount('/content/drive', force_remount=True)

def createWorkingDirectoryFunction(projectFolder, rootDirectory):
    if os.path.isdir(rootDirectory + projectFolder) == False:
        os.mkdir(rootDirectory + projectFolder)
    os.chdir(rootDirectory + projectFolder)

projectFolder = '/folderContainingMyFiles/'  # Folder you want to access and/or create
rootDirectory = '/content/drive/My Drive/Colab Notebooks'
createWorkingDirectoryFunction(projectFolder, rootDirectory)

Make a deep copy of Google Drive file

Is it possible to perform a "deep" copy of Google Drive files, so that the copied file doesn't point to the same file object as the original? I'd like to be able to copy a file and have the copy be completely independent of the original, such that any modifications that are made to the copy don't also show up in the original. Using the following code I'm able to:
Create a folder in Google Drive
Copy a file into the new folder
But the problem is that any changes that are made to the copy also show up in the original. I'd like for the copied file to be a completely independent file. Is this possible?
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
gauth = GoogleAuth()
#load previously generated credentials file
gauth.LoadCredentialsFile("mycreds3.txt")
drive = GoogleDrive(gauth)
#define ID of file to be copied
template_file_id = "1RQWYeeth-Ph ..."
#create a new folder to store the copied file
folder = drive.CreateFile({"title":"test_folder", 'mimeType': 'application/vnd.google-apps.folder'})
folder.Upload()
folder_id = folder['id']
#copy file into newly created folder
drive.auth.service.files().copy(fileId=template_file_id,body={'parents':[{"kind":'drive#file',"id":folder_id}], 'title':'new_file_title'}).execute()
EDIT:
I was able to perform a deep copy by copying a shared file. When a file is copied from a shared file (which doesn't have a shortcut in Drive that links to the original), a deep copy is created such that modifications to the copied file don't show up in the original. Copying shared folders this way threw an error, but individual files worked just fine.
destination_folder_id = 'YTRCA18EE ...'
shared_files = drive.ListFile({'q': 'sharedWithMe'}).GetList()
for file in shared_files:
    drive.auth.service.files().copy(fileId=file['id'], body={'parents': [{"kind": 'drive#file', "id": destination_folder_id}], 'title': file['title']}).execute()
Let's take this step by step.
The way this library works is that all calls must go through a service. In this case, a Drive service will give your application access to all the methods available in the Google Drive API.
drive_service = GoogleDrive(gauth)
(You named your variable drive when creating it; I call it drive_service here for consistency.)
Creating a new file and uploading it to Google Drive is a two-part process. The first part is the file_metadata, that being the name and description of the file. The second is the media, the actual file data itself.
from googleapiclient.http import MediaFileUpload

file_metadata = {'name': 'photo.jpg'}
media = MediaFileUpload('files/photo.jpg', mimetype='image/jpeg')
file = drive_service.files().create(body=file_metadata,
                                    media_body=media,
                                    fields='id').execute()
print('File ID: %s' % file.get('id'))
Note: all the fields parameter does here is limit the response returned by the API to only the file ID.
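As a hedged sketch of asking for more than the ID back from the same create call (the extra field names are standard Drive v3 file properties):
# Request a few extra properties on the created file instead of
# only its id.
file = drive_service.files().create(
    body=file_metadata,
    media_body=media,
    fields='id, name, mimeType, webViewLink',
).execute()
print(file.get('name'), file.get('webViewLink'))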

Manage files from public Google Drive URL using PyDrive

I'm using the PyDrive QuickStart script to list my Google Drive files.
Code:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)
file_list = drive.ListFile({'q': "'root' in parents and trashed=false"}).GetList()
print(file_list)
I'm able to list my files normally, but I need to list and manage files from another, public Drive URL (not my personal authenticated Drive) from my already-authenticated GoogleDrive account, as if I were using the requests lib.
Any ideas how to do it?
1. You need to get the folder ID. You can find the ID in the URL of the folder. An example would be:
https://drive.google.com/open?id=0B-schRXnDFZeX0t0RnhQVXXXXXX (the part of the URL after id=).
2. List the contents of a folder based on its ID. Given your code, you replace file_list = ... with:
file_id = '<Your folder id here.>'
file_list = drive.ListFile({'q': "'%s' in parents and trashed=false" % file_id}).GetList()
If this does not work, you may have to add the remote folder to your Google Drive using the "Add to Drive" button in the top right corner of the shared folder when opened in a browser.
2.1 Creating a file in a folder can be done like so:
file_object = drive.CreateFile({
    "parents": [{"kind": "drive#fileLink",
                 "id": parent_id}],
    'title': file_name,
    # (Only!) If the new 'file' object is going to be a folder:
    'mimeType': "application/vnd.google-apps.folder"
})
file_object.Upload()
If this fails check whether you have write permissions to the folder.
2.2 Deleting/Trashing a file can be done with the updated version available from GitHub: pip install instructions, Delete/Trash/UnTrash documentation
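As a small sketch (assuming file_object is a GoogleDriveFile, e.g. drive.CreateFile({'id': some_file_id})), the updated library exposes these calls:
file_object.Trash()    # move the file to the trash
file_object.UnTrash()  # restore the file from the trash
file_object.Delete()   # permanently delete the file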
Finally, there is a feature request covering uploading to folders as described in 2.1 and listing the files of a folder as described in 2.; if you find the above not to work, you can add an issue / feature request to the repository.
