I want to upload a DataFrame as a CSV file from Colab to Google Drive. I have tried a lot, but with no luck: I can upload a simple text file, but I fail to upload a CSV.
I tried the following code:
import pandas as pd
from google.colab import auth
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from oauth2client.client import GoogleCredentials

df = pd.DataFrame({1: [1, 2, 3]})
df.to_csv('abc', sep='\t')  # write the DataFrame to a local file first

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

uploaded = drive.CreateFile({'title': 'sample.csv', 'mimeType': 'csv'})
uploaded.SetContentFile('abc')
uploaded.Upload()
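One likely culprit is the MIME type: Drive expects a full MIME type such as 'text/csv', not the bare string 'csv'. A minimal tweak to the snippet above (a hedged sketch, reusing the drive object already created there):
# Hedged fix: pass a full MIME type; 'csv' on its own is not a valid MIME type.
uploaded = drive.CreateFile({'title': 'sample.csv', 'mimeType': 'text/csv'})
uploaded.SetContentFile('abc')
uploaded.Upload()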
It may be easier to mount your Drive instead of using PyDrive:
from google.colab import drive
drive.mount('drive')
After authenticating, you can copy your CSV file into the mounted Drive:
df.to_csv('data.csv')
!cp data.csv "drive/My Drive/"
Without using the !cp command:
from google.colab import drive

# Mount Google Drive in the Colab notebook.
drive.mount('/drive')

# Make sure the folder already exists in your Google Drive before writing to it.
df.to_csv('/drive/My Drive/folder_name/name_csv_file.csv')
# Import Drive API and authenticate.
from google.colab import drive
# Mount your Drive to the Colab VM.
drive.mount('/gdrive')
# Write the DataFrame to CSV file.
with open('/gdrive/My Drive/foo.csv', 'w') as f:
  df.to_csv(f)
The other answers almost worked but I needed a small tweak:
from google.colab import drive
drive.mount('drive')
df.to_csv('/content/drive/My Drive/filename.csv', encoding='utf-8', index=False)
The /content/ prefix proved necessary.
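If you're unsure where the mount landed, a quick check (this relies on Colab's working directory being /content, so a relative mount point of 'drive' resolves to /content/drive):
import os
# A relative mount point like 'drive' resolves against /content in Colab.
print(os.path.exists('/content/drive/My Drive'))  # True once the mount succeeded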
If you want to save the file locally instead, you can use this:
df.to_csv('sample.csv')
from google.colab import files
files.download("sample.csv")
If you are working with a Spark DataFrame rather than Pandas, do this (it assumes Drive is already mounted at gdrive/):
df_mini.coalesce(1) \
    .write \
    .format('com.databricks.spark.csv') \
    .options(header='true', delimiter='\t') \
    .save('gdrive/My Drive/base_mini.csv')
Related
I am trying to read a Google Sheets file in Google Drive using Google Colab. I tried using drive.mount to get the file, but I don't know how to get a DataFrame out of it with pandas. Here is what I tried:
from google.colab import auth
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials
import os
import pandas as pd
from google.colab import drive
# setup
gc = gspread.authorize(GoogleCredentials.get_application_default())
drive.mount('/content/drive',force_remount=True)
# read data and put it in a dataframe
gsheets = gc.open_by_url('/content/drive/MyDrive/test/myGoogleSheet.gsheet')
As you can tell, I am quite lost with these libraries. I want to use drive to access my Drive, gspread to get the content, and pandas to read it. Can anyone help me find a solution, please?
I found a solution to my problem by looking further into the gspread library. I was able to load the Google Sheets file by ID or by URL, which I did not know was possible. Then I managed to get the content of a sheet and read it as a pandas DataFrame. Here is the code:
from google.colab import auth
auth.authenticate_user()
import gspread
import pandas as pd
from oauth2client.client import GoogleCredentials
# setup
gc = gspread.authorize(GoogleCredentials.get_application_default())
# read data and put it in a dataframe
# spreadsheet = gc.open_by_url('https://docs.google.com/spreadsheets/d/google_sheet_id/edit#gid=0')
spreadsheet = gc.open_by_key('google_sheet_id')
wks = spreadsheet.worksheet('sheet_name')
data = wks.get_all_values()
headers = data.pop(0)
df = pd.DataFrame(data, columns=headers)
print(df)
Could anyone please tell me how to automatically get a shareable link for a file in my Google Drive using a Colab notebook?
Thank you.
You can use xattr to get the file ID:
from subprocess import getoutput
from IPython.display import HTML
from google.colab import drive
drive.mount('/content/drive') # access drive
# need to install xattr
!apt-get install xattr > /dev/null
# get the id
fid = getoutput("xattr -p 'user.drive.id' '/content/drive/My Drive/Colab Notebooks/R.ipynb' ")
# make a link and display it
HTML(f"<a href=https://colab.research.google.com/drive/{fid} target=_blank>notebook</a>")
Here I access my notebook file at /Colab Notebooks/R.ipynb and make a link to open it in Colab.
In my case, the suggested solution didn't work, so I replaced the Colab URL with "https://drive.google.com/file/d/". Below is what I used:
def get_shareable_link(file_path):
    fid = getoutput("xattr -p 'user.drive.id' " + "'" + file_path + "'")
    print(fid)
    # make a link and display it
    return HTML(f"<a href=https://drive.google.com/file/d/{fid} target=_blank>file URL</a>")

get_shareable_link("/content/drive/MyDrive/../img_01.jpg")
If you look into the documentation, you can see a section that explains how to list files from Drive. Using that, and reading the documentation of the library used, I've created this script:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

files = drive.ListFile().GetList()
for file in files:
    keys = file.keys()
    if 'webContentLink' in keys:
        link = file['webContentLink']
    elif 'webViewLink' in keys:
        link = file['webViewLink']
    else:
        link = 'No Link Available. Check your sharing settings.'
    # PyDrive wraps Drive API v2, where the filename field is 'title'.
    if 'title' in keys:
        name = file['title']
    else:
        name = file['id']
    print('name: {} link: {}'.format(name, link))
This lists all the files and provides a link to each one.
You can then edit the script to find a specific file instead, as in the sketch below.
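For example, a sketch using PyDrive's query parameter (Drive API v2 query syntax; the file title 'data.csv' is just a hypothetical example, and drive is the client created above):
# List only non-trashed files with a specific title.
query = "title = 'data.csv' and trashed = false"
for file in drive.ListFile({'q': query}).GetList():
    print(file['title'], file.get('webContentLink', 'No link available'))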
Hope this helps!
getoutput wasn't defined for me in the other answers, but this worked instead, after running the drive.mount command (test.zip needs to point to the right folder and file in your Drive!):
!apt-get install -qq xattr
filename = "/content/drive/My\ Drive/test.zip"
# Retrieve the file ID for a file in "/content/drive/My Drive/":
id = !xattr -p 'user.drive.id' {filename}
print(id)  # `!` returns a list of output lines; use id[0] for the bare ID string
When I upload data using following code, the data vanishes once I get disconnected.
from google.colab import files
uploaded = files.upload()
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
Please suggest ways to upload my data so that it remains intact even after days of disconnection.
I keep my data stored permanently in a .zip file on Google Drive, and load it onto the Colab VM using the following code.
Paste it into a cell and change the file_id. You can find the file_id in the file's URL on Google Drive (right-click the file -> Get shareable link -> take the part of the URL after open?id=).
#@title uploader
file_id = "1BuM11fJJ1qdZH3VbQ-GwPlK5lAvXiNDv" #@param {type:"string"}
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# PyDrive reference:
# https://googledrive.github.io/PyDrive/docs/build/html/index.html
from google.colab import auth
auth.authenticate_user()
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')
# Replace the assignment below with your file ID
# to download a different file.
#
# A file ID looks like: 1gLBqEWEBQDYbKCDigHnUXNTkzl-OslSO
import io
from googleapiclient.http import MediaIoBaseDownload
request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
  # _ is a placeholder for a progress object that we ignore.
  # (Our file is small, so we skip reporting progress.)
  _, done = downloader.next_chunk()
fileId = drive.CreateFile({'id': file_id})  # file_id is the Drive file ID, e.g. 1iytA1n2z4go3uVCwE_vIKouTKyIDjEq
print(fileId['title'])
fileId.GetContentFile(fileId['title'])  # save the Drive file as a local file
!unzip {fileId['title']}
Keeping data in Google Drive is good (as @skaem suggested).
If your data is code, I can suggest you simply git clone your source repository from GitHub (or any other code-versioning service) at the beginning of your Colab notebook, as sketched below.
This way, you can develop offline and run your experiments in the cloud whenever you need, with up-to-date code.
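A minimal sketch of that pattern (the repository URL is a placeholder for your own):
# Run at the top of the notebook to pull up-to-date code onto the Colab VM.
!git clone https://github.com/your-username/your-repo.git
%cd your-repo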
I have some data files uploaded on my google drive.
I want to import those files into google colab.
The REST API and PyDrive methods show how to create a new file and upload it to Drive from Colab. Using those, I cannot figure out how to read data files that are already present in my Drive from my Python code.
I am a total newbie at this. Can someone help me out?
(Update, April 15, 2018: gspread is updated frequently, so to ensure a stable workflow I pin the package versions.)
For spreadsheet files, the basic idea is to use the gspread and pandas packages to read spreadsheets in Drive and convert them to pandas DataFrame format.
In the Colab notebook:
#install packages
!pip install gspread==2.1.1
!pip install gspread-dataframe==2.1.0
!pip install pandas==0.22.0
#import packages and authorize connection to Google account:
import pandas as pd
import gspread
from gspread_dataframe import get_as_dataframe, set_with_dataframe
from google.colab import auth
auth.authenticate_user() # verify your account to read files which you have access to. Make sure you have permission to read the file!
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())
Then, there are three ways I know of to read Google spreadsheets.
By file name:
spreadsheet = gc.open("goal.csv") # Open file using its name. Use this if the file is already anywhere in your drive
sheet = spreadsheet.get_worksheet(0) # 0 means the first sheet in the file
df2 = pd.DataFrame(sheet.get_all_records())
df2.head()
By URL:
spreadsheet = gc.open_by_url('https://docs.google.com/spreadsheets/d/1LCCzsUTqBEq5pemRNA9EGy62aaeIgye4XxwReYg1Pe4/edit#gid=509368585') # use this when you have the complete url (the edit#gid means permission)
sheet = spreadsheet.get_worksheet(0) # 0 means the first sheet in the file
df2 = pd.DataFrame(sheet.get_all_records())
df2.head()
By file key/ID:
spreadsheet = gc.open_by_key('1vpukIbGZfK1IhCLFalBI3JT3aobySanJysv0k5A4oMg') # use this when you have the key (the string in the url following spreadsheet/d/)
sheet = spreadsheet.get_worksheet(0) # 0 means the first sheet in the file
df2 = pd.DataFrame(sheet.get_all_records())
df2.head()
I shared the code above in a Colab notebook:
https://drive.google.com/file/d/1cvur-jpIpoEN3vAO8Fd_yVAT5Qgbr4GV/view?usp=sharing
Source: https://github.com/burnash/gspread
1) Set your data to be publicly available; then, for public spreadsheets:
from io import StringIO  # in Python 2 this was `from StringIO import StringIO`
import requests
import pandas as pd

r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.text  # use .text (str) with StringIO; .content is bytes
df = pd.read_csv(StringIO(data), index_col=0, parse_dates=['Quradate'])
df.head()
More here: Getting Google Spreadsheet CSV into A Pandas Dataframe
If the data is private it is much the same, but you will have to do some auth gymnastics...
From the Google Colab snippets:
from google.colab import auth
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())
worksheet = gc.open('Your spreadsheet name').sheet1
# get_all_values gives a list of rows.
rows = worksheet.get_all_values()
print(rows)
# Convert to a DataFrame and render.
import pandas as pd
pd.DataFrame.from_records(rows)
I am using Python 2.7 and I am trying to upload a file (*.txt) into a folder that is shared with me.
So far I have been able to upload it to my Drive, but not to set which folder it goes to; I am given the URL of the folder where I must place this file.
Thank you
This is my code so far:
import gdata.data
import gdata.docs.client
import gdata.docs.data

def Upload(file_name, file_path, upload_url):
    client = gdata.docs.client.DocsClient(source=upload_url)
    client.api_version = "3"
    client.ssl = True
    client.ClientLogin(username, passwd, client.source)
    newResource = gdata.docs.data.Resource(file_path, file_name)
    media = gdata.data.MediaSource()
    media.SetFileHandle(file_path, 'mime/type')
    newDocument = client.CreateResource(
        newResource,
        create_uri=gdata.docs.client.RESOURCE_UPLOAD_URI,
        media=media
    )
The API you are using (the old gdata library) is deprecated. Use google-api-python-client instead.
Follow the official Python quickstart guide to upload a file; to place it in a specific folder, additionally send the parents parameter in the request body, like this: body['parents'] = [{'id': parent_id}]
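For illustration, a hedged sketch with the v3 client (note that in v3 the parents field is a plain list of folder IDs rather than dicts; folder_id and myfile.txt are placeholders):
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

# Assumes credentials are already configured as in the quickstart guide.
drive_service = build('drive', 'v3')

folder_id = 'your-folder-id'  # hypothetical: the ID of the target folder
file_metadata = {'name': 'myfile.txt', 'parents': [folder_id]}
media = MediaFileUpload('myfile.txt', mimetype='text/plain')
created = drive_service.files().create(body=file_metadata,
                                       media_body=media,
                                       fields='id').execute()
print('File ID:', created.get('id'))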
Or, you can use PyDrive, a Python wrapper library that simplifies a lot of the work of dealing with the Google Drive API. The whole code is as simple as this:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
gauth.LocalWebserverAuth()  # authenticate via a local browser window
drive = GoogleDrive(gauth)

# PyDrive (Drive API v2) expects parents as a list of {'id': ...} dicts.
f = drive.CreateFile({'parents': [{'id': parent_id}]})
f.SetContentFile('cat.png')  # read the local file
f.Upload()  # upload it