How can I import an MS Excel (.xlsx) file from Google Drive into Colaboratory?
excel_file = drive.CreateFile({'id':'some id'})
works (drive is a pydrive.drive.GoogleDrive object). But
print excel_file.FetchContent()
returns None. And
excel_file.content()
throws:
TypeError                                 Traceback (most recent call last)
in ()
----> 1 excel_file.content()
TypeError: '_io.BytesIO' object is not callable
My intent is (given some valid file 'id') to import it as an io object, which could be read by pandas read_excel(), and finally get a pandas dataframe out of it.
You'll want to use excel_file.GetContentFile to save the file locally. Then, you can use the Pandas read_excel method after you !pip install -q xlrd.
Here's a full example:
https://colab.research.google.com/notebook#fileId=1SU176zTQvhflodEzuiacNrzxFQ6fWeWC
What I did in more detail:
I created a new spreadsheet in Google Sheets to be exported as an .xlsx file.
Next, I exported it as an .xlsx file and uploaded it to Drive again. The URL is:
https://drive.google.com/open?id=1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM
Note the file ID. In my case it's 1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM.
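The file ID can also be pulled out of the share URL programmatically. A minimal sketch, assuming the URL carries the ID in an `id` query parameter as above; the helper name `file_id_from_open_url` is hypothetical:

```python
from urllib.parse import urlsplit, parse_qs

def file_id_from_open_url(url):
    """Return the value of the `id` query parameter, or None if absent."""
    query = parse_qs(urlsplit(url).query)
    values = query.get("id")
    return values[0] if values else None

file_id = file_id_from_open_url(
    "https://drive.google.com/open?id=1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM"
)
```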
Then, in Colab, I tweaked the Drive download snippet to download the file. The key bits are:
file_id = '1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM'
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('exported.xlsx')
Finally, to create a Pandas DataFrame:
!pip install -q xlrd
import pandas as pd
df = pd.read_excel('exported.xlsx')
df
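The !pip install... line installs the xlrd library, which was needed to read Excel files at the time. Note that xlrd 2.0 and later only read the legacy .xls format; recent pandas versions read .xlsx files with openpyxl instead.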
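If you would rather not write a local file, the traceback in the question suggests that `content` is a BytesIO attribute rather than a method, and that FetchContent() fills it in place (which would explain why it returns None). A minimal sketch of that distinction, using a hypothetical stand-in class instead of a real PyDrive file so that it runs anywhere:

```python
import io

class FakeDriveFile:
    """Hypothetical stand-in for a PyDrive file object (not the real class)."""
    def __init__(self, payload):
        self._payload = payload
        self.content = None

    def FetchContent(self):
        # Populates self.content in place and returns None.
        self.content = io.BytesIO(self._payload)

excel_file = FakeDriveFile(b"raw file bytes")
result = excel_file.FetchContent()   # None: the data lands in .content
buffer = excel_file.content          # attribute access, no parentheses
data = buffer.read()
```

With a real file, the same buffer could presumably be passed to pd.read_excel(excel_file.content); that last step is an assumption and is not exercised here.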
Perhaps a simpler method:
# To read/write data from Google Drive:
# Reference: https://colab.research.google.com/notebooks/io.ipynb#scrollTo=u22w3BFiOveA
import pandas as pd
from google.colab import drive
drive.mount('/content/drive')
df = pd.read_excel('/content/drive/My Drive/folder_name/file_name.xlsx')
# When done:
# drive.flush_and_unmount()
# print('All changes made in this colab session should now be visible in Drive.')
First, I import io, pandas and files from google.colab
import io
import pandas as pd
from google.colab import files
Then I upload the file using an upload widget
uploaded = files.upload()
You will see an upload widget (click on Choose Files and upload the xlsx file).
Let's suppose that the name of the file is my_spreadsheet.xlsx, so you need to use it in the following line:
df = pd.read_excel(io.BytesIO(uploaded.get('my_spreadsheet.xlsx')))
And that's all, now you have the first sheet in the df dataframe. However, if you have multiple sheets you can change the code into this:
First, move the io call to another variable
xlsx_file = io.BytesIO(uploaded.get('my_spreadsheet.xlsx'))
And then, use the new variable to specify the sheet name, like this:
df_first_sheet = pd.read_excel(xlsx_file, 'My First Sheet')
df_second_sheet = pd.read_excel(xlsx_file, 'My Second Sheet')
import pandas as pd
xlsx_link = 'https://docs.google.com/spreadsheets/d/1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM/export'
df = pd.read_excel(xlsx_link)
If the .xlsx file is hosted on Google Drive and has been shared, anyone with the link can access it, with or without a Google account. The google.colab.drive and google.colab.files dependencies are not necessary.
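One caveat with link-based downloads: if the file is not actually shared, the response may be an HTML page rather than a workbook. Since .xlsx files are ZIP archives, a quick sanity check on the downloaded bytes is possible. A minimal sketch; the helper name `looks_like_xlsx` is hypothetical, and it only checks the container format, not the workbook contents:

```python
import io
import zipfile

def looks_like_xlsx(data):
    """Return True if the bytes form a ZIP archive, as every .xlsx file does."""
    return zipfile.is_zipfile(io.BytesIO(data))

# Build a tiny ZIP in memory to stand in for a real download.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("placeholder.txt", "not a real workbook")

ok = looks_like_xlsx(buf.getvalue())
bad = looks_like_xlsx(b"<html>Sign in</html>")
```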
Easiest way I found so far.
Pretty similar to what we do on desktop.
Considering you uploaded the file to your Google Drive folder:
On the left bar click on Files ( below the {x} )
Select Mount Drive > drive > folder > file (left click and Copy Path)
After that, just go to the code and paste the path
pd.read_excel('/content/drive/MyDrive/Colab Notebooks/token_rating.xlsx')
I'm trying to fetch the name of the file I upload. I wrote a program which does a statistical test based on the data in the file. The program is currently set up in two steps:
1 - upload the file using the following methods:
from google.colab import files
import io
uploaded = files.upload()
This triggers a small "uploader" as a widget
I then upload the CSV file and my next set of code only needs to read the file name, here's the code
2 - read the data by specifying uploaded file name (let's say "filename" for ex.)
data = pd.read_csv(io.BytesIO(uploaded["filename.csv"]))
Every time I run this code, I need to manually update the name of the file, I'm trying to automate the part of fetching the filename so it can be read automatically.
Thanks
To upload the file:
from google.colab import files
import numpy as np
import pandas as pd
import io
uploaded = files.upload()
To read the file: (currently name of the file needs to be updated manually each time)
data = pd.read_csv(io.BytesIO(uploaded["filename.csv"]))
The following expression gives the name of your uploaded CSV file:
list(uploaded.keys())[0]
so your line should look like
data = pd.read_csv(io.BytesIO(uploaded[list(uploaded.keys())[0]]))
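The same idea can be wrapped in a small helper. A minimal sketch, assuming `uploaded` is a dict mapping file names to bytes, as the indexing above implies; the dict below is a hand-made stand-in for the result of files.upload(), and `first_upload` is a hypothetical name:

```python
import io

def first_upload(uploaded):
    """Return (name, in-memory file) for the first entry of the upload dict."""
    name = next(iter(uploaded))          # first key, without building a list
    return name, io.BytesIO(uploaded[name])

# Stand-in for the dict returned by files.upload() in Colab.
uploaded = {"scores.csv": b"a,b,c\n5,6,7\n"}
name, buffer = first_upload(uploaded)
first_line = buffer.readline()
```

In Colab, the returned buffer could then be handed to pd.read_csv(buffer), and `name` kept for later reporting.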
I am a beginner with Python. I have already enabled the Google APIs and would like to read a CSV file stored in My Drive as a pandas DataFrame using Python. Is it possible to do it?
Thank you!
You can try this way:
import pandas as pd
import requests
from io import StringIO
orig_url='https://drive.google.com/file/d/0B6GhBwm5vaB2ekdlZW5WZnppb28/view?usp=sharing'
file_id = orig_url.split('/')[-2]
dwn_url='https://drive.google.com/uc?export=download&id=' + file_id
url = requests.get(dwn_url).text
csv_raw = StringIO(url)
dfs = pd.read_csv(csv_raw)
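Splitting on '/' and taking the second-to-last piece works for this URL shape but breaks if the link has no trailing /view segment. A slightly more defensive sketch, assuming share links of the form .../file/d/<id>/...; the helper names are hypothetical:

```python
from urllib.parse import urlsplit, urlencode

def file_id_from_share_url(url):
    """Return the path segment that follows /d/, or None if there is none."""
    segments = urlsplit(url).path.split("/")
    if "d" in segments:
        position = segments.index("d")
        if position + 1 < len(segments) and segments[position + 1]:
            return segments[position + 1]
    return None

def download_url(file_id):
    """Build the direct-download URL in the same shape as dwn_url above."""
    return "https://drive.google.com/uc?" + urlencode(
        {"export": "download", "id": file_id}
    )

orig_url = "https://drive.google.com/file/d/0B6GhBwm5vaB2ekdlZW5WZnppb28/view?usp=sharing"
dwn_url = download_url(file_id_from_share_url(orig_url))
```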
If you have your folder synced to your machine, it's simple enough: just specify the file path, similar to this:
import pandas as pd
test = pd.read_excel('C:/Users/person/OneDrive - company/Documents/Projects/cortex/Group_status.xlsx')
print(test)
If you also want to learn how to use the Drive API, you could follow this Python quickstart guide.
My first question here.
I have been working with Python in a Jupyter notebook for a personal project. My code lets users dynamically select a CSV file on which to test it. However, I am not sure how to extract the name of this file once it has been uploaded. The code is as follows:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import io
from google.colab import files
from scipy import stats
uploaded = files.upload()
df = pd.read_csv(io.BytesIO(uploaded['TestData.csv']))
df.head()
.
.
.
As you can see, after the upload when I try to read the file, I have to type its name manually in the code. Is there a way to automatically capture the name of the file in a variable and then I can use the same while calling the pandas read function?
I have a google cloud storage bucket called google-storage-bucket-1.
I'm connected to my Compute Engine instance and I have a pandas DataFrame, created in Python as a temporary variable called df1.
I want to save the dataframe as a csv file into the bucket. I use the following command.
import pandas as pd
df1.to_csv('gs://google-storage-bucket-1/test/dataframe1.csv')
But I get the following error,
OSError: Forbidden: https://www.googleapis.com/upload/storage/v1/b/xxx/o
Insufficient Permission
What's the proper command to save the file to the bucket without saving it to disk first?
Like this:
from google.cloud import storage
import os
from io import StringIO
f = StringIO() ## this is to avoid creating local file
df1.to_csv(f)
f.seek(0)
gcs = storage.Client()
gcs.get_bucket('google-storage-bucket-1').blob('dataframe1.csv').upload_from_file(f, content_type='text/csv')
So, I used Jupyter Notebook and there using the 'sep' command was pretty simple. But now I'm slowly migrating to Google Colab, and while I can find the file and build the DataFrame with 'pd.read_csv()', I can't seem to separate the columns with the 'sep = ' command!
I mounted the Drive and located the file:
import pandas as pd
from google.colab import drive
drive.mount('/content/gdrive')
with open('/content/gdrive/My Drive/wordpress/cousins.csv','r') as f:
    f.read()
Then I built the Dataframe:
df = pd.read_csv('/content/gdrive/My Drive/wordpress/cousins.csv',sep=";")
The dataframe is built, but it is not separated by columns! Below is a screenshot:
(Screenshot: built DataFrame)
Last edit: Turns out the problem was with the data I was trying to use, because it also didn't work on Jupyter. There is no problem with the 'sep' command the way it was being used!
PS: I also tried sep='.' and sep=',' to see if they would work, and nothing.
I downloaded the data as a 'csv' table from Football-Reference, pasted it into Excel, and saved it as a CSV (UTF-8). An example of the file can be found here:
Pastebin Example File
This works for me:
My data:
a,b,c
5,6,7
8,9,10
You don't need sep for a comma-separated file.
from google.colab import drive
drive.mount('/content/drive')
import pandas as pd
# suppose I have data in my Google Drive in the file path
# GoogleColaboratory/data/so/a.csv
# The folder GoogleColaboratory is in my Google Drive.
df = pd.read_csv('drive/My Drive/GoogleColaboratory/data/so/a.csv')
df.head()
Instead of
df = pd.read_csv('/content/gdrive/My Drive/wordpress/cousins.csv', sep=";")
Use
df = pd.read_csv('/content/gdrive/My Drive/wordpress/cousins.csv', delimiter=";")
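Note that pandas documents `delimiter` as an alias for `sep`, so the two calls should behave the same; as the question's last edit says, the data itself was the problem. One way to check which separator a file really uses is the standard library's csv.Sniffer. A minimal sketch on inline sample text; the helper name `detect_delimiter` and the candidate list are assumptions:

```python
import csv

def detect_delimiter(sample, candidates=";,\t"):
    """Guess the delimiter of a CSV text sample from a set of candidates."""
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter

semicolon_sample = "a;b;c\n5;6;7\n8;9;10\n"
comma_sample = "a,b,c\n5,6,7\n8,9,10\n"

found_semicolon = detect_delimiter(semicolon_sample)
found_comma = detect_delimiter(comma_sample)
```

The detected character can then be passed as sep=... to pd.read_csv. For a real file, reading the first few kilobytes as the sample is usually enough.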