Find name of the uploaded CSV in Python / Pandas - python

I'm trying to fetch the name of the file I upload. I wrote a program which does a statistical test based on the data in the file; the program is currently set up in two steps:
1 - Upload the file using the following code:
from google.colab import files
import io
uploaded = files.upload()
This displays a small "uploader" widget.
I then upload the CSV file, and my next block of code only needs the file name. Here's the code:
2 - Read the data by specifying the uploaded file name (let's say "filename", for example):
data = pd.read_csv(io.BytesIO(uploaded["filename.csv"]))
Every time I run this code, I need to update the file name manually. I'm trying to automate fetching the file name so the file can be read automatically.
Thanks
To upload the file:
from google.colab import files
import numpy as np
import pandas as pd
import io
uploaded = files.upload()
To read the file (currently the file name needs to be updated manually each time):
data = pd.read_csv(io.BytesIO(uploaded["filename.csv"]))

The following expression gives the name of your CSV file (the first key of the uploaded dictionary):
list(uploaded.keys())[0]
so your line should look like this:
data = pd.read_csv(io.BytesIO(uploaded[list(uploaded.keys())[0]]))
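files.upload() returns a dictionary that maps each uploaded file name to its contents as bytes, so the name can be taken from the dictionary itself. A minimal sketch, assuming that dictionary shape: next(iter(uploaded)) picks the first name without building a list, and the hypothetical helper read_uploaded_csvs loops over every uploaded CSV, which also covers selecting several files in one upload. A hand-built dict stands in for the Colab upload so the snippet runs outside Colab.

```python
import io

import pandas as pd


def read_uploaded_csvs(uploaded):
    """Parse every CSV in a {file name: bytes} dict into {file name: DataFrame}."""
    frames = {}
    for name, content in uploaded.items():
        if name.lower().endswith(".csv"):
            frames[name] = pd.read_csv(io.BytesIO(content))
    return frames


# In Colab this dict would come from `uploaded = files.upload()`;
# a small in-memory stand-in is used here instead.
uploaded = {"filename.csv": b"a,b\n1,2\n3,4\n"}

first_name = next(iter(uploaded))  # same value as list(uploaded.keys())[0]
data = pd.read_csv(io.BytesIO(uploaded[first_name]))
print(first_name, data.shape)

# All uploaded CSVs at once, keyed by their original file names
frames = read_uploaded_csvs(uploaded)
```

Because Python dictionaries keep insertion order, "first" here means the first file in the upload, which is also what list(uploaded.keys())[0] returns.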

Related

How to import multiple csv files at once

I have 30 csv files of wind speed data on my computer; each file represents data at a different location. I have written code to calculate the statistics I need to run for each site; however, I am currently pulling in each csv file individually to do so (see code below):
from google.colab import files
import io
import pandas as pd
data_to_load = files.upload()
df = pd.read_csv(io.BytesIO(data_to_load['Downtown.csv']))
Is there a way to pull in all 30 csv files at once so each file is run through my statistical analysis code block and spits out an array with the file name and the statistic calculated?
Use a loop, as in this example:
https://intellipaat.com/community/17913/import-multiple-csv-files-into-pandas-and-concatenate-into-one-dataframe
import glob
import os
import pandas as pd
# get the data file names
local_path = r'/my_files'
filenames = glob.glob(os.path.join(local_path, "*.csv"))
dfs = [pd.read_csv(filename) for filename in filenames]
# if needed, concatenate all data into one DataFrame
big_frame = pd.concat(dfs, ignore_index=True)
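The question also asks for the file name next to each computed statistic. A sketch that extends the glob approach, assuming each CSV has a column holding the wind speed; per_file_statistic, the speed column name, and the choice of mean are placeholders for your own analysis:

```python
import glob
import os

import pandas as pd


def per_file_statistic(folder, column, stat="mean"):
    """Return one row per CSV in `folder`: the file name and the statistic.

    `column` and `stat` are placeholders; `stat` is any aggregation name
    pandas understands for a Series ("mean", "max", "std", ...).
    """
    results = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        df = pd.read_csv(path)
        value = df[column].agg(stat)  # scalar statistic for this site
        results.append((os.path.basename(path), value))
    return pd.DataFrame(results, columns=["file", column + "_" + stat])


# Usage (hypothetical folder and column name):
# summary = per_file_statistic("/my_files", "speed")
```

With Colab uploads the same idea works by looping over uploaded.items() and reading each value with io.BytesIO instead of globbing a folder.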
You can also put the data online (GitHub or Google Drive) and read it from there:
https://towardsdatascience.com/3-ways-to-load-csv-files-into-colab-7c14fcbdcb92

How to extract the name of the file uploaded on a jupyter file using python?

My first question here.
I have been working with Python in a Jupyter notebook for a personal project. I am using code that lets users dynamically select a CSV file to test my code on. However, I am not sure how to extract the name of this file once it has been uploaded. The code goes as follows:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import io
from google.colab import files
from scipy import stats
uploaded = files.upload()
df = pd.read_csv(io.BytesIO(uploaded['TestData.csv']))
df.head()
...
As you can see, after the upload I have to type the file name manually in the code when reading it. Is there a way to automatically capture the file name in a variable so that I can use it when calling the pandas read function?

Get access to zipped excel sheet online without saving it using python

I want to access a zipped Excel sheet online using Python without downloading it to my PC. The link is https://www.richmondfed.org/-/media/richmondfedorg/research/regional_economy/surveys_of_business_conditions/manufacturing/zipfile/mfg_historicaldata.zip,
which points to a zipped Excel file. Does anyone know how to handle it in Python? For example, I want to print the first row of the Excel sheet without unzipping it and saving the file to my PC.
I found a similar question, "Downloading and unzipping a .zip file without writing to disk", but I cannot use its code to read the Excel file.
You can use pandas to read the Excel file straight from the in-memory archive.
from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen
import pandas as pd
resp = urlopen("https://www.richmondfed.org/-/media/richmondfedorg/research/regional_economy/surveys_of_business_conditions/manufacturing/zipfile/mfg_historicaldata.zip")
# keep the whole archive in memory instead of saving it to disk
archive = ZipFile(BytesIO(resp.read()))
# open the first entry of the archive (assumed to be the Excel file)
extracted_file = archive.open(archive.namelist()[0])
# print only the first row, as asked in the question
print(pd.read_excel(extracted_file).head(1))
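archive.namelist()[0] assumes the Excel file is the first entry in the archive, which is not guaranteed (archives created on macOS, for instance, often contain a __MACOSX/ folder). A slightly more defensive sketch that picks the first entry with an Excel extension, still without touching the disk; open_first_excel_member is a hypothetical helper name:

```python
import io
import zipfile

EXCEL_EXTENSIONS = (".xlsx", ".xlsm", ".xls")


def open_first_excel_member(zip_bytes):
    """Return (entry name, in-memory buffer) for the first Excel file in a zip.

    Both the archive and the extracted entry live in memory only.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Skip macOS resource-fork copies such as __MACOSX/._file.xlsx
            if name.startswith("__MACOSX/"):
                continue
            if name.lower().endswith(EXCEL_EXTENSIONS):
                return name, io.BytesIO(archive.read(name))
    raise ValueError("no Excel file found in the archive")


# Usage with the download from the answer above:
# name, buffer = open_first_excel_member(urlopen(url).read())
# print(pd.read_excel(buffer).head(1))
```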

Load xlsx file from drive in colaboratory

How can I import MS-excel(.xlsx) file from google drive into colaboratory?
excel_file = drive.CreateFile({'id':'some id'})
does work (drive is a pydrive.drive.GoogleDrive object). But
print(excel_file.FetchContent())
returns None, and
excel_file.content()
throws:
TypeError    Traceback (most recent call last)
in ()
----> 1 excel_file.content()
TypeError: '_io.BytesIO' object is not callable
My intent is (given some valid file 'id') to import it as an io object, which could be read by pandas read_excel(), and finally get a pandas dataframe out of it.
You'll want to use excel_file.GetContentFile to save the file locally. Then, you can use the pandas read_excel method (for .xlsx files it needs the openpyxl package; xlrd 2.0 and later only reads legacy .xls files).
Here's a full example:
https://colab.research.google.com/notebook#fileId=1SU176zTQvhflodEzuiacNrzxFQ6fWeWC
What I did in more detail:
I created a new spreadsheet in Google Sheets to be exported as an .xlsx file.
Next, I exported it as an .xlsx file and uploaded it back to Drive. The URL is:
https://drive.google.com/open?id=1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM
Note the file ID. In my case it's 1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM.
Then, in Colab, I tweaked the Drive download snippet to download the file. The key bits are:
# assumes `drive` is an authenticated pydrive GoogleDrive object from the Drive download snippet
file_id = '1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM'
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('exported.xlsx')
Finally, to create a pandas DataFrame:
!pip install -q openpyxl
import pandas as pd
df = pd.read_excel('exported.xlsx')
df
The !pip install... line installs openpyxl, which pandas uses to read .xlsx files; it may already be installed in Colab.
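The TypeError in the question comes from calling content as a method: in PyDrive, FetchContent() returns None and stores the downloaded bytes on the file object's content attribute as an io.BytesIO. Assuming that behaviour, a sketch that skips the temporary file and hands the buffer straight to pandas; drive_file_to_dataframe and its reader parameter are hypothetical, the latter so the same helper can use pd.read_csv for CSV files:

```python
import pandas as pd


def drive_file_to_dataframe(drive_file, reader=pd.read_excel):
    """Download a PyDrive GoogleDriveFile into memory and parse it with pandas.

    Assumes PyDrive's behaviour: FetchContent() returns None and stores the
    downloaded bytes on `drive_file.content` as an io.BytesIO.
    """
    drive_file.FetchContent()
    buffer = drive_file.content  # an attribute, so no parentheses
    buffer.seek(0)  # make sure pandas reads from the start of the buffer
    return reader(buffer)


# Usage in Colab (assumes `drive` is an authenticated GoogleDrive object):
# excel_file = drive.CreateFile({'id': 'some id'})
# df = drive_file_to_dataframe(excel_file)
```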
Perhaps a simpler method:
#To read/write data from Google Drive:
#Reference: https://colab.research.google.com/notebooks/io.ipynb#scrollTo=u22w3BFiOveA
import pandas as pd
from google.colab import drive
drive.mount('/content/drive')
df = pd.read_excel('/content/drive/My Drive/folder_name/file_name.xlsx')
# #When done,
# drive.flush_and_unmount()
# print('All changes made in this colab session should now be visible in Drive.')
First, I import io, pandas, and files from google.colab:
import io
import pandas as pd
from google.colab import files
Then I upload the file using the upload widget:
uploaded = files.upload()
You will see an upload widget; click Choose Files and select the .xlsx file.
Let's suppose the file is named my_spreadsheet.xlsx; then you use that name in the following line:
df = pd.read_excel(io.BytesIO(uploaded.get('my_spreadsheet.xlsx')))
And that's all: you now have the first sheet in the df DataFrame. If the workbook has multiple sheets, you can change the code as follows.
First, store the BytesIO object in its own variable:
xlsx_file = io.BytesIO(uploaded.get('my_spreadsheet.xlsx'))
And then, use the new variable to specify the sheet name, like this:
df_first_sheet = pd.read_excel(xlsx_file, sheet_name='My First Sheet')
df_second_sheet = pd.read_excel(xlsx_file, sheet_name='My Second Sheet')
import pandas as pd
xlsx_link = 'https://docs.google.com/spreadsheets/d/1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM/export'
df = pd.read_excel(xlsx_link)
If the spreadsheet is hosted on Google Drive and shared via link, anyone with the link can read it this way, with or without a Google account; the google.colab.drive and google.colab.files modules are not needed.
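For a native Google Sheet, the same export endpoint accepts a format query parameter, and a gid parameter selects a single tab when exporting CSV. A small sketch that builds such a URL; sheet_export_url is a hypothetical helper and SHEET_ID a placeholder, and the sheet must be shared so that anyone with the link can view it:

```python
from urllib.parse import urlencode


def sheet_export_url(sheet_id, fmt="xlsx", gid=None):
    """Build a Google Sheets export URL for a link-shared spreadsheet."""
    params = {"format": fmt}
    if gid is not None:
        # gid is the tab id shown as #gid=... in the browser address bar
        params["gid"] = gid
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?{urlencode(params)}"


# Usage (placeholder id):
# df = pd.read_excel(sheet_export_url("SHEET_ID"))
# df = pd.read_csv(sheet_export_url("SHEET_ID", fmt="csv", gid=0))
```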
Easiest way I found so far.
Pretty similar to what we do on desktop.
Considering you uploaded the file to your Google Drive folder:
On the left bar, click Files (below the {x} icon).
Select Mount Drive, then go to drive > folder > file (left-click the file and choose Copy path).
After that, just go to the code and paste the path:
pd.read_excel('/content/drive/MyDrive/Colab Notebooks/token_rating.xlsx')

How to create a hierarchical csv file?

I have N invoices' worth of data in Excel, and I want to create a CSV from that file so that it can be imported whenever needed. How can I achieve this?
Here is a screenshot:
Assuming you have a folder "excel" full of Excel files within your project directory, and another folder "csv" where you intend to put the generated CSV files, you can easily batch-convert all the Excel files in the "excel" directory to CSV using pandas.
It is assumed that pandas is already installed on your system; otherwise, install it with pip install pandas (reading .xlsx files also needs openpyxl, and legacy .xls files need xlrd). The commented snippet below illustrates the process:
import os

import pandas as pd

# Folder names taken from the description above; adjust to your project layout
excelDir = "excel"
csvDir = "csv"
os.makedirs(csvDir, exist_ok=True)

# Loop through excelDir; for each Excel file, convert it to CSV and save it
for fileName in os.listdir(excelDir):
    # Skip anything that is not an Excel file
    if fileName.endswith((".xls", ".xlsx")):
        targetXLFile = os.path.join(excelDir, fileName)
        # "invoices.xlsx" -> "invoices.csv" (instead of "invoices.xlsx.csv")
        baseName = os.path.splitext(fileName)[0]
        targetCSVFile = os.path.join(csvDir, baseName + ".csv")
        # Read the first sheet of the workbook
        dFrame = pd.read_excel(targetXLFile)
        # Write it out; index=False keeps pandas' row index out of the CSV
        dFrame.to_csv(targetCSVFile, index=False)
Hope this does the trick for you.
Cheers and good luck.
Instead of putting the whole output into one CSV, you could take the following steps:
1. Convert your Excel content to CSV files or CSV objects.
2. Tag each object with its invoice id and save it into a dictionary. Your dictionary could look like {'invoice-id': csv-object, 'invoice-id2': csv-object2, ...}
3. Write a custom function that reads a CSV object and gives you the name, product id, qty, etc.
Hope this helps.
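The dictionary-of-CSVs idea can be sketched with pandas, assuming the invoice data is already in one DataFrame with an invoice_id column (a hypothetical name; use whatever identifies an invoice in your sheet). split_by_invoice and read_invoice are illustrative helpers, not an existing API:

```python
import io

import pandas as pd


def split_by_invoice(df, key="invoice_id"):
    """Return {invoice id: CSV text}, one CSV string per invoice."""
    return {
        invoice_id: group.to_csv(index=False)
        # sort=False keeps invoices in the order they first appear
        for invoice_id, group in df.groupby(key, sort=False)
    }


def read_invoice(csv_by_invoice, invoice_id):
    """Turn one stored CSV back into a DataFrame (name, product id, qty, ...)."""
    return pd.read_csv(io.StringIO(csv_by_invoice[invoice_id]))
```

Storing CSV text keeps each invoice independent and easy to write out later; storing the grouped DataFrames directly would avoid re-parsing if the data never leaves Python.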
