So, I used Jupyter Notebook before, and there using the 'sep' argument was pretty simple. Now I'm slowly migrating to Google Colab, and while I can find the file and build the DataFrame with 'pd.read_csv()', I can't seem to separate the columns with the 'sep=' argument!
I mounted the Drive and located the file:
import pandas as pd
from google.colab import drive
drive.mount('/content/gdrive')
with open('/content/gdrive/My Drive/wordpress/cousins.csv', 'r') as f:
    f.read()
Then I built the DataFrame:
df = pd.read_csv('/content/gdrive/My Drive/wordpress/cousins.csv',sep=";")
The DataFrame is built, but it is not separated into columns! Below is a screenshot:
[Screenshot: built DataFrame]
Last edit: Turns out the problem was with the data I was trying to use, because it also didn't work in Jupyter. There is no problem with the 'sep' argument as it was being used!
PS: I also tried sep='.' and sep=',' to see if it would work, but neither did.
I downloaded the data as a CSV table from Football-Reference, pasted it into Excel, and saved it as a CSV (UTF-8). An example of the file can be found here:
Pastebin Example File
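A quick way to check which delimiter the export actually uses (a sketch reusing the path from above) is to print the first lines of the file, or let pandas sniff the separator itself:
import pandas as pd

path = '/content/gdrive/My Drive/wordpress/cousins.csv'

# Peek at the first two lines to see which character actually separates the fields
with open(path, 'r', encoding='utf-8') as f:
    for _ in range(2):
        print(f.readline().rstrip())

# Or let pandas guess the delimiter (sep=None requires the python engine)
df = pd.read_csv(path, sep=None, engine='python')
print(df.head())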
This works for me:
My data:
a,b,c
5,6,7
8,9,10
You don't need sep for a comma-separated file.
from google.colab import drive
drive.mount('/content/drive')
import pandas as pd
# suppose I have data in my Google Drive in the file path
# GoogleColaboratory/data/so/a.csv
# The folder GoogleColaboratory is in my Google Drive.
df = pd.read_csv('drive/My Drive/GoogleColaboratory/data/so/a.csv')
df.head()
Instead of
df = pd.read_csv('/content/gdrive/My Drive/wordpress/cousins.csv', sep=";")
Use
df = pd.read_csv('/content/gdrive/My Drive/wordpress/cousins.csv', delimiter=";")
I have loaded an Excel file into Python (Google Colab), but I was wondering if there is a way of extracting the name of the Excel (.xlsm) file. Please check the attached image.
import pandas as pd
import io
from google.colab import files
uploaded = files.upload()
df = pd.read_excel(io.BytesIO(uploaded['202009 Testing - September - Diamond Plod Day & Night MKY021.xlsm']),sheet_name='1 D',header=8,usecols='BE,BH',nrows=4)
df1 = pd.read_excel(io.BytesIO(uploaded['202009 Testing - September - Diamond Plod Day & Night MKY021.xlsm']),sheet_name='1 D',header=3)
df = df.assign(PlodDate='D5')
df['PlodDate'] = df1.iloc[0, 3]
df = df.assign(PlodShift='D6')
df['PlodShift'] = df1.iloc[1, 3]
df = df.rename({'Qty.2': 'Loads', 'Total (L)': 'Litres'}, axis=1)
df = df.reindex(columns=['PlodDate', 'PlodShift', 'Loads', 'Litres', 'DataSource'])
df = df.assign(DataSource='Name of the Source File')
df
Instead of DataSource='Name of the Source File', I want the name of the active Excel file.
Output should be:
Datasource='202009 Testing - September - Diamond Plod Day & Night MKY021'
As I have a file for every month, I just want code that takes the name of the active Excel file when I run it.
I tried this code, but it was not working in Google Colab:
import os
os.listdir('.')
I have not used Google Colab, but I used to have a similar problem of how to extract the sheet names of an Excel file. The solution turned out to be very simple:
import pandas as pd
excel_file = pd.ExcelFile("excel_file_name.xlsx")
sheet_names = excel_file.sheet_names
So, basically, the idea is that you want to open the whole Excel file instead of a specific sheet of it. This can be done with pd.ExcelFile(...). Once you have your Excel file "open", you can get the names via some_excel_file.sheet_names. This is especially useful when you want to loop over all sheets in an Excel file. For example, the code can be something like this:
excel_file = pd.ExcelFile("excel_file_name.xlsx")
sheet_names = excel_file.sheet_names
for sheet_name in sheet_names:
    df = excel_file.parse(sheet_name)
    # do some operations here for this sheet
This is not a complete answer, as I am not sure about Google Colab, but I hope it gives you an idea of what you can do with the sheet names.
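In Colab specifically, the dictionary returned by files.upload() is keyed by the uploaded file's name, so one way to fill the DataSource column with the workbook name (a sketch, assuming a single uploaded .xlsm file) is:
import os
import io
import pandas as pd
from google.colab import files

uploaded = files.upload()                      # dict mapping file name -> file contents (bytes)
file_name = next(iter(uploaded))               # name of the first (and only) uploaded file
source_name = os.path.splitext(file_name)[0]   # drop the .xlsm extension

df = pd.read_excel(io.BytesIO(uploaded[file_name]), sheet_name='1 D',
                   header=8, usecols='BE,BH', nrows=4)
df = df.assign(DataSource=source_name)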
I am doing something very simple: converting an Excel spreadsheet to a pandas DataFrame, but for some reason I keep getting this error: No such file or directory....
I have the file downloaded and saved to my computer, and I have restarted my program, so I don't know what could be wrong. Any clue what's up?
Here is my code...
import pandas as pd
file_name ="file.xlsx"
dataframe = pd.read_excel(file_name)
print(dataframe)
You should have your "file.xlsx" in the same directory from which you run python, or specify the full path to it (e.g. 'C:\file.xlsx' or '/home/user/file.xlsx').
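A quick way to debug this (a sketch; the Windows path below is just a hypothetical example) is to check which directory Python is actually running in, and otherwise pass the full path, using a raw string so the backslashes are not treated as escapes:
import os
import pandas as pd

print(os.getcwd())       # the directory Python was started from
print(os.listdir('.'))   # confirm that file.xlsx actually appears here

# Otherwise, give read_excel the full path (raw string avoids backslash escapes)
dataframe = pd.read_excel(r'C:\Users\me\Downloads\file.xlsx')
print(dataframe)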
I am a beginner with Python. I have already enabled the Google APIs and would like to read a CSV file stored in My Drive into a pandas DataFrame using Python. Is it possible to do this?
Thank you!
You can try this way:
import pandas as pd
import requests
from io import StringIO
orig_url = 'https://drive.google.com/file/d/0B6GhBwm5vaB2ekdlZW5WZnppb28/view?usp=sharing'
file_id = orig_url.split('/')[-2]                                       # extract the file id from the sharing link
dwn_url = 'https://drive.google.com/uc?export=download&id=' + file_id   # build a direct-download URL
csv_text = requests.get(dwn_url).text                                   # download the CSV contents as text
csv_raw = StringIO(csv_text)                                            # wrap the text in a file-like object
dfs = pd.read_csv(csv_raw)
If you have your folder synced to your machine, it's simple enough to just specify the file path, similar to this:
import pandas as pd
test = pd.read_excel('C:/Users/person/OneDrive - company/Documents/Projects/cortex/Group_status.xlsx')
print(test)
If you also want to learn how to use the Drive API, you could follow this Python quickstart guide.
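For completeness, a rough sketch of downloading the CSV through the Drive API v3 itself (this assumes you already have authorized credentials creds from the quickstart flow, and FILE_ID is a placeholder for your file's Drive ID):
import io
import pandas as pd
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

FILE_ID = 'your-file-id-here'                      # placeholder: the file's Drive ID
service = build('drive', 'v3', credentials=creds)  # creds: authorized credentials from the quickstart

request = service.files().get_media(fileId=FILE_ID)
buffer = io.BytesIO()
downloader = MediaIoBaseDownload(buffer, request)
done = False
while not done:
    _, done = downloader.next_chunk()  # download the file in chunks into the buffer

buffer.seek(0)
df = pd.read_csv(buffer)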
I want to read in a .dta file as a pandas DataFrame.
I've tried using code from https://www.fragilefamilieschallenge.org/using-dta-files-in-python/ but it gives me an error.
Thanks for any help!
import pandas as pd
df_path = "https://zenodo.org/record/3635384/files/B-PROACT1V%20Year%204%20%26%206%20child%20BP%2C%20BMI%20and%20PA%20dataset.dta?download=1"
df = None
with open(df_path, "r") as f:
    df = pd.read_stata(f)
print(df.head())
open can be used when you have a file saved locally on your machine. With pd.read_stata this is not necessary, however, as you can pass the file path directly as a parameter.
In this case you want to read in a .dta file from a URL, so this does not apply. The solution is simple though, as pd.read_stata can read in files from URLs directly.
import pandas as pd
url = 'https://zenodo.org/record/3635384/files/B-PROACT1V%20Year%204%20%26%206%20child%20BP%2C%20BMI%20and%20PA%20dataset.dta?download=1'
df = pd.read_stata(url)
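If the file is large, pd.read_stata can also read it in chunks instead of all at once (a sketch reusing the same URL; the chunk size of 10,000 rows is arbitrary):
import pandas as pd

url = 'https://zenodo.org/record/3635384/files/B-PROACT1V%20Year%204%20%26%206%20child%20BP%2C%20BMI%20and%20PA%20dataset.dta?download=1'

# chunksize returns an iterator of DataFrames rather than one big frame
chunks = pd.read_stata(url, chunksize=10_000)
df = pd.concat(chunks, ignore_index=True)
print(df.head())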
How can I import an MS Excel (.xlsx) file from Google Drive into Colaboratory?
excel_file = drive.CreateFile({'id':'some id'})
does work (drive is a pydrive.drive.GoogleDrive object). But
print(excel_file.FetchContent())
returns None. And
excel_file.content()
throws:
TypeError                                 Traceback (most recent call last)
in ()
----> 1 excel_file.content()
TypeError: '_io.BytesIO' object is not callable
My intent is (given some valid file 'id') to import it as an io object that can be read by pandas read_excel(), and finally get a pandas DataFrame out of it.
You'll want to use excel_file.GetContentFile to save the file locally. Then, you can use the Pandas read_excel method after you !pip install -q xlrd.
Here's a full example:
https://colab.research.google.com/notebook#fileId=1SU176zTQvhflodEzuiacNrzxFQ6fWeWC
What I did in more detail:
I created a new spreadsheet in sheets to be exported as an .xlsx file.
Next, I exported it as an .xlsx file and uploaded again to Drive. The URL is:
https://drive.google.com/open?id=1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM
Note the file ID. In my case it's 1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM.
Then, in Colab, I tweaked the Drive download snippet to download the file. The key bits are:
file_id = '1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM'
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('exported.xlsx')
Finally, to create a Pandas DataFrame:
!pip install -q xlrd
import pandas as pd
df = pd.read_excel('exported.xlsx')
df
The !pip install... line installs the xlrd library, which is needed to read Excel files.
Perhaps a simpler method:
#To read/write data from Google Drive:
#Reference: https://colab.research.google.com/notebooks/io.ipynb#scrollTo=u22w3BFiOveA
import pandas as pd
from google.colab import drive

drive.mount('/content/drive')
df = pd.read_excel('/content/drive/My Drive/folder_name/file_name.xlsx')
# #When done,
# drive.flush_and_unmount()
# print('All changes made in this colab session should now be visible in Drive.')
First, I import io, pandas and files from google.colab
import io
import pandas as pd
from google.colab import files
Then I upload the file using an upload widget
uploaded = files.upload()
You will see something similar to this (click on Choose Files and upload the xlsx file):
Let's suppose that the name of the file is my_spreadsheet.xlsx, so you need to use it in the following line:
df = pd.read_excel(io.BytesIO(uploaded.get('my_spreadsheet.xlsx')))
And that's all; now you have the first sheet in the df dataframe. However, if you have multiple sheets you can change the code as follows:
First, move the io call to another variable
xlsx_file = io.BytesIO(uploaded.get('my_spreadsheet.xlsx'))
And then, use the new variable to specify the sheet name, like this:
df_first_sheet = pd.read_excel(xlsx_file, 'My First Sheet')
df_second_sheet = pd.read_excel(xlsx_file, 'My Second Sheet')
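Alternatively (a small sketch, reusing the hypothetical my_spreadsheet.xlsx upload from above), passing sheet_name=None loads every sheet in one call and returns a dict keyed by sheet name:
import io
import pandas as pd

# 'uploaded' is the dict returned by files.upload() above
xlsx_file = io.BytesIO(uploaded.get('my_spreadsheet.xlsx'))

all_sheets = pd.read_excel(xlsx_file, sheet_name=None)  # {sheet name: DataFrame}
print(all_sheets.keys())
df_first_sheet = all_sheets['My First Sheet']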
import pandas as pd
xlsx_link = 'https://docs.google.com/spreadsheets/d/1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM/export'
df = pd.read_excel(xlsx_link)
If the xlsx is hosted on Google Drive, then once it is shared, anyone can use the link to access it, with or without a Google account. The google.colab.drive and google.colab.files dependencies are not necessary.
Easiest way I found so far.
Pretty similar to what we do on desktop.
Considering you uploaded the file to your Google Drive folder:
On the left bar, click on Files (below the {x})
Select Mount Drive > drive > folder > file (left click and Copy path)
After that, just go back to the code and paste the path:
pd.read_excel('/content/drive/MyDrive/Colab Notebooks/token_rating.xlsx')