Read a csv file stored in Google Drive - python

I am a beginner with Python. I have already enabled the Google APIs and would like to read a CSV file stored in My Drive into a pandas DataFrame using Python. Is it possible to do this?
Thank you!

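You can try it this way (the file has to be shared so that anyone with the link can view it):
import pandas as pd
import requests
from io import StringIO
orig_url = 'https://drive.google.com/file/d/0B6GhBwm5vaB2ekdlZW5WZnppb28/view?usp=sharing'
file_id = orig_url.split('/')[-2]  # the ID sits between '/d/' and '/view'
dwn_url = 'https://drive.google.com/uc?export=download&id=' + file_id
response = requests.get(dwn_url)
response.raise_for_status()  # stop early on HTTP errors such as 404
csv_raw = StringIO(response.text)
dfs = pd.read_csv(csv_raw)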
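The split('/')[-2] trick only works for links of the form .../file/d/<id>/view. A slightly more defensive sketch, assuming share links come either in that form or as .../open?id=<id>, parses the ID explicitly. Because pandas also accepts URLs, the resulting download URL can be passed straight to pd.read_csv. The helper name drive_download_url is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def drive_download_url(share_url):
    """Build a direct-download URL from a Google Drive share link (hypothetical helper)."""
    parsed = urlparse(share_url)
    parts = parsed.path.split("/")
    if "d" in parts:
        # Links like https://drive.google.com/file/d/<id>/view?usp=sharing
        file_id = parts[parts.index("d") + 1]
    else:
        # Links like https://drive.google.com/open?id=<id>
        ids = parse_qs(parsed.query).get("id")
        if not ids:
            raise ValueError(f"No file ID found in {share_url!r}")
        file_id = ids[0]
    return "https://drive.google.com/uc?export=download&id=" + file_id


# Usage (needs network access and a file shared with "Anyone with the link"):
# import pandas as pd
# df = pd.read_csv(drive_download_url("https://drive.google.com/file/d/<id>/view?usp=sharing"))
```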

If the folder is synced to your machine, it is enough to specify the local file path, similar to this:
import pandas as pd
# use pd.read_csv instead for a .csv file
test = pd.read_excel('C:/Users/person/OneDrive - company/Documents/Projects/cortex/Group_status.xlsx')
print(test)

If you also want to learn how to use the Drive API, you can follow the Drive API Python quickstart guide.

Related

How to read CSV file using Pandas (Jupyter notebooks)

(Very new coder, first time here, apologies if there are errors in writing)
I have a csv file I made from Excel called SouthKoreaRoads.csv and I'm supposed to read that csv file using Pandas. Below is what I used:
import pandas as pd
import os
SouthKoreaRoads = pd.read_csv("SouthKoreaRoads.csv")
I get a FileNotFoundError, and I'm really new and unsure how to approach this. Could anyone help, give advice, or anything? Many thanks in advance
Some explanation first: before you can use pd.read_csv to import your data, you need to know where the file is located in your filesystem.
Assuming you use a Jupyter notebook or Python file and the CSV file is in the same directory you are currently working in, you can simply use:
import pandas as pd
SouthKoreaRoads_df = pd.read_csv('SouthKoreaRoads.csv')
If the file is located in another directory, you need to specify that directory. For example, if the CSV is in a subdirectory (relative to the Python file or Jupyter notebook you are working in), you need to add the directory's name: if it is in a folder called "data", put data in front of the file name, separated by a "/":
import pandas as pd
SouthKoreaRoads_df = pd.read_csv('data/SouthKoreaRoads.csv')
Pandas accepts any valid string path, including URLs, so you can also give a full path:
import pandas as pd
# raw string (r'...') so the backslashes are not treated as escape sequences
SouthKoreaRoads_df = pd.read_csv(r'C:\Users\Ron\Desktop\Clients.csv')
So far, no os module is needed. pd.read_csv also accepts path-like objects, but os is only needed if you want to build a path in a variable before reading it or do more complex path handling, for example because the code has to run in another environment, such as a web app, where the path is relative and could change depending on how it is deployed.
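A minimal sketch of that kind of path handling, using pathlib instead of os (a swap; both work with pandas). The hypothetical helper read_csv_checked raises a FileNotFoundError that includes the current working directory, since a relative path like 'SouthKoreaRoads.csv' is resolved against it, which is the usual cause of the error in the question.

```python
from pathlib import Path

import pandas as pd


def read_csv_checked(path):
    """Read a CSV with pandas, failing with a clearer message if it is missing (hypothetical helper)."""
    csv_path = Path(path)
    if not csv_path.exists():
        # Relative paths are resolved against the current working directory
        raise FileNotFoundError(
            f"{csv_path} not found; current working directory is {Path.cwd()}"
        )
    # pd.read_csv accepts path-like objects such as pathlib.Path directly
    return pd.read_csv(csv_path)


# Hypothetical layout: the CSV sits in a "data" subfolder of the working directory
# SouthKoreaRoads_df = read_csv_checked(Path("data") / "SouthKoreaRoads.csv")
```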
Please see also:
https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
https://docs.python.org/3/library/os.path.html
SouthKoreaRoads = pd.read_csv("./SouthKoreaRoads.csv")
Try this and see whether it helps!
Try using the full path, like "C:/users/....".

How to extract the name of the file uploaded on a jupyter file using python?

My first question here.
I have been working with Python in a Jupyter notebook for a personal project. I am using code that lets users dynamically select a CSV file to test my code on. However, I am not sure how to extract the name of the file once it has been uploaded. The code is as follows:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import io
from google.colab import files
from scipy import stats
uploaded = files.upload()
df = pd.read_csv(io.BytesIO(uploaded['TestData.csv']))
df.head()
# ...
As you can see, after the upload, I have to type the file's name manually in the code to read it. Is there a way to capture the name of the file in a variable automatically, so I can use that variable when calling the pandas read function?
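A minimal sketch, relying on the fact that files.upload() returns a dict mapping each uploaded file name to its raw bytes (the question's own uploaded['TestData.csv'] lookup uses this). Taking the first key gives the name, so nothing has to be typed by hand. read_uploaded_csv is a hypothetical helper name, and the Colab-only lines are commented out so the function can be tried anywhere with a plain dict.

```python
import io

import pandas as pd


def read_uploaded_csv(uploaded):
    """Return (file_name, DataFrame) for the first file in a files.upload() result."""
    if not uploaded:
        raise ValueError("No file was uploaded")
    # The upload result maps each file name to its raw bytes; take the first name
    file_name = next(iter(uploaded))
    df = pd.read_csv(io.BytesIO(uploaded[file_name]))
    return file_name, df


# In Colab only:
# from google.colab import files
# uploaded = files.upload()
# file_name, df = read_uploaded_csv(uploaded)
# print(file_name)
# df.head()
```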

Get access to zipped excel sheet online without saving it using python

I want to access a zipped Excel sheet online using Python without downloading it to my PC. The link is as follows (it points to a zipped Excel file):
https://www.richmondfed.org/-/media/richmondfedorg/research/regional_economy/surveys_of_business_conditions/manufacturing/zipfile/mfg_historicaldata.zip
Does anyone know how to use Python to deal with it? For example, I want to print the first row of the Excel sheet without unzipping it and saving the file to my PC.
I have found a similar question, "Downloading and unzipping a .zip file without writing to disk"; however, I cannot use its code to read the Excel file.
You can use pandas to read the Excel file directly from the in-memory archive:
from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen
import pandas as pd
resp = urlopen("https://www.richmondfed.org/-/media/richmondfedorg/research/regional_economy/surveys_of_business_conditions/manufacturing/zipfile/mfg_historicaldata.zip")
# Keep the whole archive in memory; nothing is written to disk
archive = ZipFile(BytesIO(resp.read()))
# Open the first file in the archive (assumed to be the Excel sheet)
extracted_file = archive.open(archive.namelist()[0])
print(pd.read_excel(extracted_file))
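The question also asks for just the first row. A minimal offline sketch of the same in-memory pattern follows; to stay runnable without network access or an Excel engine, it builds a small zip in memory and stores a CSV member in it as a stand-in for the Excel file, so for the real archive pd.read_excel would replace pd.read_csv. first_row_from_zip and the sample data are hypothetical.

```python
import io
import zipfile

import pandas as pd


def first_row_from_zip(zip_bytes, member=None):
    """Return the first data row of a CSV stored inside zip bytes, without touching disk."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Default to the first member, like namelist()[0] in the answer
        name = member or archive.namelist()[0]
        df = pd.read_csv(io.BytesIO(archive.read(name)))
    return df.iloc[0]


# Build a small zip in memory purely for demonstration
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("demo.csv", "year,value\n2020,5\n2021,7\n")

print(first_row_from_zip(buf.getvalue()))
```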

How to use the Pandas 'sep' command in Google Colab?

So, I used Jupyter Notebook, where using the 'sep' argument was pretty simple. But now I'm slowly migrating to Google Colab, and while I can find the file and build the DataFrame with 'pd.read_csv()', I can't seem to separate the columns with the 'sep=' argument!
I mounted the Drive and located the file:
import pandas as pd
from google.colab import drive
drive.mount('/content/gdrive')
with open('/content/gdrive/My Drive/wordpress/cousins.csv','r') as f:
    f.read()
Then I built the Dataframe:
df = pd.read_csv('/content/gdrive/My Drive/wordpress/cousins.csv',sep=";")
The DataFrame is built, but it is not separated into columns! Below is a screenshot:
[Screenshot: Built DataFrame]
Last edit: it turns out the problem was with the data I was trying to use, since it also didn't work in Jupyter. There is no problem with the way the 'sep' argument was being used!
PS: I also tried sep='.' and sep=',' to see if either works, and neither did.
I downloaded the data as a CSV table from Football-Reference, pasted it into Excel, and saved it as CSV (UTF-8). An example of the file can be found here:
Pastebin Example File
This works for me:
My data:
a,b,c
5,6,7
8,9,10
You don't need sep for a comma-separated file.
from google.colab import drive
drive.mount('/content/drive')
import pandas as pd
# suppose I have data in my Google Drive in the file path
# GoogleColaboratory/data/so/a.csv
# The folder GoogleColaboratory is in my Google Drive.
df = pd.read_csv('drive/My Drive/GoogleColaboratory/data/so/a.csv')
df.head()
Instead of
df = pd.read_csv('/content/gdrive/My Drive/wordpress/cousins.csv', sep=";")
Use
df = pd.read_csv('/content/gdrive/My Drive/wordpress/cousins.csv', delimiter=";")
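When you are unsure which delimiter a file actually uses, pandas can try to detect it: passing sep=None together with engine="python" makes read_csv sniff the delimiter with Python's csv.Sniffer. A minimal sketch with made-up semicolon-separated sample text; in Colab you would pass the Drive path instead of the StringIO object.

```python
import io

import pandas as pd

# Hypothetical sample text standing in for the real file
sample = "team;goals;assists\nA;5;3\nB;8;1\n"

# sep=None asks pandas to sniff the delimiter; this requires the Python engine
df = pd.read_csv(io.StringIO(sample), sep=None, engine="python")
print(df.columns.tolist())
```

If the sniffed result still puts everything in one column, the file itself (as turned out to be the case here) is the thing to inspect.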

Load xlsx file from drive in colaboratory

How can I import an MS Excel (.xlsx) file from Google Drive into Colaboratory?
excel_file = drive.CreateFile({'id':'some id'})
works (drive is a pydrive.drive.GoogleDrive object). But
print(excel_file.FetchContent())
returns None. And
excel_file.content()
throws:
TypeError    Traceback (most recent call last)
in ()
----> 1 excel_file.content()
TypeError: '_io.BytesIO' object is not callable
My intent is (given some valid file 'id') to import it as an io object, which could be read by pandas read_excel(), and finally get a pandas dataframe out of it.
You'll want to use excel_file.GetContentFile to save the file locally. Then, you can use the Pandas read_excel method after you !pip install -q xlrd.
Here's a full example:
https://colab.research.google.com/notebook#fileId=1SU176zTQvhflodEzuiacNrzxFQ6fWeWC
What I did in more detail:
I created a new spreadsheet in Sheets to be exported as an .xlsx file.
Next, I exported it as an .xlsx file and uploaded it again to Drive. The URL is:
https://drive.google.com/open?id=1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM
Note the file ID. In my case it's 1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM.
Then, in Colab, I tweaked the Drive download snippet to download the file. The key bits are:
file_id = '1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM'
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('exported.xlsx')
Finally, to create a Pandas DataFrame:
!pip install -q xlrd
import pandas as pd
df = pd.read_excel('exported.xlsx')
df
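The !pip install... line installs the xlrd library, which pandas used to read Excel files when this answer was written. Note that xlrd 2.0 and later only read the legacy .xls format; current pandas versions read .xlsx files with openpyxl instead.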
Perhaps a simpler method:
#To read/write data from Google Drive:
#Reference: https://colab.research.google.com/notebooks/io.ipynb#scrollTo=u22w3BFiOveA
import pandas as pd
from google.colab import drive
drive.mount('/content/drive')
df = pd.read_excel('/content/drive/My Drive/folder_name/file_name.xlsx')
# #When done,
# drive.flush_and_unmount()
# print('All changes made in this colab session should now be visible in Drive.')
First, I import io, pandas and files from google.colab
import io
import pandas as pd
from google.colab import files
Then I upload the file using an upload widget
uploaded = files.upload()
You will see an upload widget; click Choose Files and upload the .xlsx file.
Let's suppose that the name of the file is my_spreadsheet.xlsx; you then need to use that name in the following line:
df = pd.read_excel(io.BytesIO(uploaded.get('my_spreadsheet.xlsx')))
That's all: now you have the first sheet in the df DataFrame. However, if you have multiple sheets, you can change the code as follows.
First, store the BytesIO object in a separate variable:
xlsx_file = io.BytesIO(uploaded.get('my_spreadsheet.xlsx'))
And then, use the new variable to specify the sheet name, like this:
df_first_sheet = pd.read_excel(xlsx_file, 'My First Sheet')
df_second_sheet = pd.read_excel(xlsx_file, 'My Second Sheet')
import pandas as pd
xlsx_link = 'https://docs.google.com/spreadsheets/d/1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM/export'
df = pd.read_excel(xlsx_link)
If the .xlsx is hosted on Google Drive and shared, anyone can use the link to access it, with or without a Google account. The google.colab.drive and google.colab.files dependencies are not necessary.
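The export link can be built for any spreadsheet ID. A minimal sketch, assuming the ID belongs to a native Google Sheet shared via link; the format and gid query parameters are the ones commonly used with this endpoint, but treat them as an assumption, and sheets_export_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def sheets_export_url(sheet_id, fmt="xlsx", gid=None):
    """Build an export link for a link-shared Google Sheet (hypothetical helper)."""
    params = {"format": fmt}
    if gid is not None:
        # Assumed: gid selects a single tab, mainly useful with fmt="csv"
        params["gid"] = gid
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?{urlencode(params)}"


# Usage (needs network access; "<sheet_id>" is a placeholder):
# import pandas as pd
# df = pd.read_excel(sheets_export_url("<sheet_id>"))
# df_csv = pd.read_csv(sheets_export_url("<sheet_id>", fmt="csv", gid=0))
```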
Easiest way I've found so far.
It's pretty similar to what we do on the desktop.
Assuming you uploaded the file to your Google Drive folder:
On the left bar, click Files (below the {x} icon).
Select Mount Drive > drive > folder > file (left click and Copy Path).
After that, just go to the code and paste the path:
pd.read_excel('/content/drive/MyDrive/Colab Notebooks/token_rating.xlsx')
