Load xlsx file from drive in colaboratory


How can I import an MS Excel (.xlsx) file from Google Drive into Colaboratory?
excel_file = drive.CreateFile({'id':'some id'})
does work (drive is a pydrive.drive.GoogleDrive object). But,
print(excel_file.FetchContent())
returns None. And
excel_file.content()
throws:
TypeError Traceback (most recent call last)
in ()
----> 1 excel_file.content()
TypeError: '_io.BytesIO' object is not callable
My intent is, given some valid file 'id', to import it as an io object that can be read by pandas read_excel(), and finally get a pandas DataFrame out of it.

You'll want to use excel_file.GetContentFile to save the file locally. Then, you can use the Pandas read_excel method after you !pip install -q xlrd.
Here's a full example:
https://colab.research.google.com/notebook#fileId=1SU176zTQvhflodEzuiacNrzxFQ6fWeWC
What I did in more detail:
I created a new spreadsheet in sheets to be exported as an .xlsx file.
Next, I exported it as an .xlsx file and uploaded again to Drive. The URL is:
https://drive.google.com/open?id=1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM
Note the file ID. In my case it's 1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM.
Then, in Colab, I tweaked the Drive download snippet to download the file. The key bits are:
file_id = '1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM'
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('exported.xlsx')
Finally, to create a Pandas DataFrame:
!pip install -q xlrd
import pandas as pd
df = pd.read_excel('exported.xlsx')
df
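The !pip install... line installs the xlrd library, which pandas used at the time to read Excel files. Note that xlrd 2.0 and later reads only .xls files; recent pandas versions read .xlsx files through openpyxl instead.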
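If you would rather keep the file in memory, as the question asked, the traceback in the question hints at the fix: content is an attribute holding a BytesIO, not a method, and FetchContent() fills it in and returns None. The sketch below uses a hypothetical stand-in class (FakeDriveFile, not part of pydrive) that mimics only that behaviour, so it runs anywhere. fetch_as_buffer is likewise a hypothetical helper name.

```python
import io


class FakeDriveFile:
    """Hypothetical stand-in for a pydrive file object.

    It mimics only the behaviour described in the question:
    FetchContent() returns None and stores a BytesIO on `content`.
    """

    def __init__(self, payload: bytes):
        self._payload = payload
        self.content = None  # filled in by FetchContent()

    def FetchContent(self):
        self.content = io.BytesIO(self._payload)


def fetch_as_buffer(drive_file) -> io.BytesIO:
    """Fetch the file and return its bytes as a rewound in-memory buffer."""
    drive_file.FetchContent()
    buffer = drive_file.content  # attribute access, no parentheses
    buffer.seek(0)
    return buffer


fake = FakeDriveFile(b"PK\x03\x04 pretend xlsx bytes")
buffer = fetch_as_buffer(fake)
header = buffer.read(2)
buffer.seek(0)  # rewind again before handing the buffer to a reader
print(header)  # b'PK'
```

With a real file object in Colab, the last step would presumably be df = pd.read_excel(fetch_as_buffer(excel_file)), with no local file involved.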

Perhaps a simpler method:
# To read/write data from Google Drive:
# Reference: https://colab.research.google.com/notebooks/io.ipynb#scrollTo=u22w3BFiOveA
import pandas as pd
from google.colab import drive
drive.mount('/content/drive')
df = pd.read_excel('/content/drive/My Drive/folder_name/file_name.xlsx')
# When done:
# drive.flush_and_unmount()
# print('All changes made in this colab session should now be visible in Drive.')

First, I import io, pandas and files from google.colab
import io
import pandas as pd
from google.colab import files
Then I upload the file using an upload widget
uploaded = files.upload()
You will see something similar to this (click on Choose Files and upload the xlsx file):
Let's suppose that the name of the file is my_spreadsheet.xlsx, so you need to use it in the following line:
df = pd.read_excel(io.BytesIO(uploaded.get('my_spreadsheet.xlsx')))
And that's all; now you have the first sheet in the df dataframe. However, if you have multiple sheets, you can change the code as follows.
First, move the io call to another variable:
xlsx_file = io.BytesIO(uploaded.get('my_spreadsheet.xlsx'))
And then, use the new variable to specify the sheet name, like this:
df_first_sheet = pd.read_excel(xlsx_file, sheet_name='My First Sheet')
df_second_sheet = pd.read_excel(xlsx_file, sheet_name='My Second Sheet')

import pandas as pd
xlsx_link = 'https://docs.google.com/spreadsheets/d/1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM/export'
df = pd.read_excel(xlsx_link)
If the xlsx file is hosted on Google Drive and has been shared, anyone with the link can access it, with or without a Google account. The google.colab.drive and google.colab.files dependencies are not necessary.

Easiest way I found so far.
Pretty similar to what we do on desktop.
Considering you uploaded the file to your Google Drive folder:
On the left bar, click on Files (below the {x})
Select Mount Drive > drive > folder > file (left click and Copy Path)
After that, just go to the code and paste the path
pd.read_excel('/content/drive/MyDrive/Colab Notebooks/token_rating.xlsx')
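The answers on this page spell the mounted folder two ways, 'My Drive' and 'MyDrive'. A small helper can try both and return whichever exists. This is only a sketch: resolve_drive_path is a hypothetical name, the mount point and folder spellings are taken from the snippets on this page, and the demo uses a temporary directory in place of a real mount.

```python
import tempfile
from pathlib import Path


def resolve_drive_path(relative, mount_point="/content/drive",
                       roots=("MyDrive", "My Drive")):
    """Return the first existing path for `relative` under the given roots."""
    for root in roots:
        candidate = Path(mount_point) / root / relative
        if candidate.exists():
            return candidate
    raise FileNotFoundError(
        f"{relative!r} not found under {mount_point} (tried {', '.join(roots)})"
    )


# Demo: a temporary directory stands in for the Colab mount point.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "My Drive" / "Colab Notebooks" / "token_rating.xlsx"
    target.parent.mkdir(parents=True)
    target.touch()
    found = resolve_drive_path("Colab Notebooks/token_rating.xlsx", mount_point=tmp)
    found_root = found.parent.parent.name

print(found_root)  # My Drive
```

In a notebook with Drive mounted, the result can be passed straight to pandas, for example pd.read_excel(resolve_drive_path('Colab Notebooks/token_rating.xlsx')).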

Related

Find name of the uploaded CSV in Python / Pandas

I'm trying to fetch the name of the file I upload. I wrote a program that runs a statistical test on the data in the file. The program is currently set up in two steps:
1 - upload the file using the following methods:
from google.colab import files
import io
uploaded = files.upload()
This triggers a small "uploader" as a widget
I then upload the CSV file. My next block of code only needs the file name; here's the code:
2 - read the data by specifying the uploaded file name (let's say "filename", for example)
data = pd.read_csv(io.BytesIO(uploaded["filename.csv"]))
Every time I run this code, I need to update the name of the file manually. I'm trying to automate fetching the file name so the file can be read automatically.
Thanks
To upload the file:
from google.colab import files
import numpy as np
import pandas as pd
import io
uploaded = files.upload()
To read the file: (currently name of the file needs to be updated manually each time)
data = pd.read_csv(io.BytesIO(uploaded["filename.csv"]))

The following contains the name of your csv
list(uploaded.keys())[0]
so your line should look like
data = pd.read_csv(io.BytesIO(uploaded[list(uploaded.keys())[0]]))
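A slightly tidier variant of the same idea, as a sketch. The indexing in the question relies on files.upload() returning a dict that maps each file name to its bytes, so next(iter(...)) yields the first entry without building a list. first_upload is a hypothetical helper, and the dict in the demo is a stand-in so the snippet runs outside Colab.

```python
import io


def first_upload(uploaded):
    """Return (file name, BytesIO of its content) for the first uploaded file."""
    if not uploaded:
        raise ValueError("no file was uploaded")
    name, data = next(iter(uploaded.items()))
    return name, io.BytesIO(data)


# Stand-in for the dict returned by files.upload() in Colab.
uploaded = {"results_2024.csv": b"a,b\n1,2\n"}
name, buffer = first_upload(uploaded)
print(name)  # results_2024.csv
```

In Colab this becomes name, buffer = first_upload(files.upload()), followed by data = pd.read_csv(buffer).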

Read a csv file stored in Google Drive

I am a beginner with Python. I have already enabled the Google APIs and would like to read a csv file stored in My Drive as a pandas DataFrame using Python. Is it possible to do it?
Thank you!

You can try this way:
import pandas as pd
import requests
from io import StringIO
# Share link of the form .../file/d/<file id>/view
orig_url = 'https://drive.google.com/file/d/0B6GhBwm5vaB2ekdlZW5WZnppb28/view?usp=sharing'
file_id = orig_url.split('/')[-2]
# Direct-download URL for the same file
dwn_url = 'https://drive.google.com/uc?export=download&id=' + file_id
csv_text = requests.get(dwn_url).text
csv_raw = StringIO(csv_text)
dfs = pd.read_csv(csv_raw)
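The split('/')[-2] step only works for links shaped like .../file/d/<id>/view. Share links of the form .../open?id=<id>, which also appear on this page, carry the ID in the query string instead. The sketch below handles both shapes. drive_file_id and drive_download_url are hypothetical helper names, and only these two link shapes are assumed.

```python
from urllib.parse import parse_qs, urlparse


def drive_file_id(url: str) -> str:
    """Extract the file ID from a Drive share link (two known shapes only)."""
    parsed = urlparse(url)
    # Shape 1: https://drive.google.com/open?id=<id>
    query_id = parse_qs(parsed.query).get("id")
    if query_id:
        return query_id[0]
    # Shape 2: https://drive.google.com/file/d/<id>/view
    parts = [part for part in parsed.path.split("/") if part]
    if "d" in parts and parts.index("d") + 1 < len(parts):
        return parts[parts.index("d") + 1]
    raise ValueError(f"no file ID found in {url!r}")


def drive_download_url(url: str) -> str:
    """Build the direct-download URL used in the answer."""
    return "https://drive.google.com/uc?export=download&id=" + drive_file_id(url)


view_link = "https://drive.google.com/file/d/0B6GhBwm5vaB2ekdlZW5WZnppb28/view?usp=sharing"
open_link = "https://drive.google.com/open?id=1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM"
print(drive_file_id(view_link))  # 0B6GhBwm5vaB2ekdlZW5WZnppb28
print(drive_file_id(open_link))  # 1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM
```

The result of drive_download_url(...) can then be fetched with requests.get exactly as in the answer.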

If your folder is synced to your machine, it is simple enough to specify the file path, similar to this:
import pandas as pd
test = pd.read_excel('C:/Users/person/OneDrive - company/Documents/Projects/cortex/Group_status.xlsx')
print(test)

If you want to learn also how to use the Drive API, you could follow this Python quickstart guide.

How to extract the name of the file uploaded on a jupyter file using python?

My first question here.
I have been working with Python in a Jupyter notebook for a personal project. I am using code that lets users dynamically select a CSV file on which to test my code. However, I am not sure how to extract the name of the file once it has been uploaded. The code is as follows:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import io
from google.colab import files
from scipy import stats
uploaded = files.upload()
df = pd.read_csv(io.BytesIO(uploaded['TestData.csv']))
df.head()
.
.
.
As you can see, after the upload when I try to read the file, I have to type its name manually in the code. Is there a way to automatically capture the name of the file in a variable and then I can use the same while calling the pandas read function?
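One possible approach, sketched under the assumption that files.upload() returns a dict mapping file names to bytes (the indexing in the question relies on exactly that): loop over the dict instead of hard-coding a key. csv_uploads is a hypothetical helper, and the sample dict stands in for a real upload.

```python
import io


def csv_uploads(uploaded):
    """Return {file name: BytesIO} for every uploaded file ending in .csv."""
    return {
        name: io.BytesIO(data)
        for name, data in uploaded.items()
        if name.lower().endswith(".csv")
    }


# Stand-in for the dict returned by files.upload() in Colab.
uploaded = {"TestData.csv": b"x,y\n1,2\n", "notes.txt": b"ignore me"}
buffers = csv_uploads(uploaded)
file_names = list(buffers)
print(file_names)  # ['TestData.csv']
```

In the notebook, each pair can be consumed with a loop such as for name, buffer in csv_uploads(uploaded).items(): df = pd.read_csv(buffer), which keeps the file name available in name.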

Saving pandas dataframe variable to csv from google compute engine to google storage bucket without saving to disk first

I have a google cloud storage bucket called google-storage-bucket-1.
I'm connected to my Compute Engine instance, and I have a pandas DataFrame created in Python as a temporary variable called df1.
I want to save the dataframe as a csv file into the bucket. I use the following command:
import pandas as pd
df1.to_csv('gs://google-storage-bucket-1/test/dataframe1.csv')
But I get the following error,
OSError: Forbidden: https://www.googleapis.com/upload/storage/v1/b/xxx/o
Insufficient Permission
What's the proper command to save the file to the bucket without saving it to disk first?

Like this:
from google.cloud import storage
from io import StringIO
f = StringIO()  # in-memory buffer, avoids creating a local file
df1.to_csv(f)
f.seek(0)  # rewind so the upload starts at the beginning
gcs = storage.Client()
gcs.get_bucket('google-storage-bucket-1').blob('dataframe1.csv').upload_from_file(f, content_type='text/csv')
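A variant of the same idea that skips the StringIO step: DataFrame.to_csv() returns the CSV text when called without a path, and a Cloud Storage blob accepts text through upload_from_string. In the sketch below, upload_dataframe_csv is a hypothetical helper that takes any bucket object with a blob() method, so it can be exercised with small stand-ins (FakeFrame, FakeBucket and FakeBlob are illustrations, not library classes). It does not by itself address the "Insufficient Permission" error, which typically comes from the credentials or access scopes in use.

```python
def upload_dataframe_csv(df, bucket, blob_name):
    """Serialise `df` to CSV text in memory and upload it to `bucket`."""
    csv_text = df.to_csv(index=False)  # no path given, so the text is returned
    blob = bucket.blob(blob_name)
    blob.upload_from_string(csv_text, content_type="text/csv")
    return blob


# --- Stand-ins so the sketch runs without pandas or Cloud credentials ---
class FakeFrame:
    def to_csv(self, index=True):
        return "a,b\n1,2\n"


class FakeBlob:
    def __init__(self, name):
        self.name = name
        self.uploaded = None

    def upload_from_string(self, data, content_type="text/plain"):
        self.uploaded = (data, content_type)


class FakeBucket:
    def blob(self, name):
        return FakeBlob(name)


blob = upload_dataframe_csv(FakeFrame(), FakeBucket(), "test/dataframe1.csv")
print(blob.name, blob.uploaded[1])  # test/dataframe1.csv text/csv
```

With the real client, the bucket would come from storage.Client().get_bucket(...) and df would be df1. Note that index=False drops the index column; remove it to match the output of the StringIO version.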

How to use the Pandas 'sep' command in Google Colab?

So, I used Jupyter Notebook and there using the 'sep' command was pretty simple. But now I'm slowly migrating to Google Colab, and while I can find the file and build the DataFrame with 'pd.read_csv()', I can't seem to separate the columns with the 'sep = ' command!
I mounted the Drive and located the file:
import pandas as pd
from google.colab import drive
drive.mount('/content/gdrive')
with open('/content/gdrive/My Drive/wordpress/cousins.csv','r') as f:
    f.read()
Then I built the Dataframe:
df = pd.read_csv('/content/gdrive/My Drive/wordpress/cousins.csv',sep=";")
The dataframe is built, but it is not separated into columns! Below is a screenshot:
[Screenshot: Built DataFrame]
Last edit: Turns out the problem was with the data I was trying to use, because it also didn't work on Jupyter. There is no problem with the 'sep' command the way it was being used!
PS: I also tried 'sep='.'' and 'sep = ','' to see if it works, and nothing.
I downloaded the data as a 'csv' table from Football-Reference, pasted it into Excel, and saved it as a CSV (UTF-8). An example of the file can be found here:
Pastebin Example File

This works for me:
My data:
a,b,c
5,6,7
8,9,10
You don't need sep for a comma-separated file.
from google.colab import drive
drive.mount('/content/drive')
import pandas as pd
# suppose I have data in my Google Drive in the file path
# GoogleColaboratory/data/so/a.csv
# The folder GoogleColaboratory is in my Google Drive.
df = pd.read_csv('drive/My Drive/GoogleColaboratory/data/so/a.csv')
df.head()

Instead of
df = pd.read_csv('/content/gdrive/My Drive/wordpress/cousins.csv', sep=";")
Use
df = pd.read_csv('/content/gdrive/My Drive/wordpress/cousins.csv', delimiter=";")
Note that delimiter is an alias for sep in pd.read_csv, so the two calls behave the same.
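When it is unclear which separator a file actually uses, the standard library can guess it from a sample of the text. The sketch below uses csv.Sniffer. detect_delimiter is a hypothetical helper, and the candidate list is an assumption about which separators are plausible.

```python
import csv


def detect_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the column separator of `sample`, restricted to `candidates`."""
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


semicolon_sample = "a;b;c\n5;6;7\n8;9;10\n"
comma_sample = "a,b,c\n5,6,7\n8,9,10\n"
print(detect_delimiter(semicolon_sample))  # ;
print(detect_delimiter(comma_sample))      # ,
```

The detected value can then be passed along, for example pd.read_csv(path, sep=detect_delimiter(sample)), where sample is the first few lines of the file. pandas can also attempt the detection itself with sep=None and engine='python'.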
