Reading through different excel sheets to plot - python

I am trying to read through multiple sheets within the same Excel file. I want to plot specific columns from every sheet on the same figure, but I get an error saying that 'ExcelFile' object has no attribute 'iloc'. Can someone tell me what is wrong here? Thank you.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.ExcelFile('Current parametric sweep_reference.xlsx')
Sheet = df.sheet_names
print(Sheet)
for sheet_names in Sheet:
    plt.plot(df.iloc[:, 1], df.iloc[:, 9])

Your loop iterates over the sheet names, but it never reads any sheet into a DataFrame; df is still the ExcelFile. Read each sheet inside the loop instead:
import pandas as pd
import matplotlib.pyplot as plt

xls = pd.ExcelFile('Current parametric sweep_reference.xlsx')
for sheet in xls.sheet_names:  # loop over all sheets
    df = pd.read_excel(xls, sheet_name=sheet)
    plt.plot(df.iloc[:, 1], df.iloc[:, 9])
plt.show()

Your object df is not a pandas DataFrame but an ExcelFile object, which does not support iloc. To use iloc you should first represent the individual sheets as DataFrames, like so:
...
for sheet_name in Sheet:
    sheet_df = df.parse(sheet_name)

You should use `pd.read_excel` to load your Excel file. Passing `sheet_name=None` to `pd.read_excel` loads all sheets into a dictionary that maps each sheet name to its DataFrame. You can then iterate over the sheets as follows:
import pandas as pd
import matplotlib.pyplot as plt

sheets = pd.read_excel('Current parametric sweep_reference.xlsx', sheet_name=None)
for sheetname, df in sheets.items():
    plt.plot(df.iloc[:, 1], df.iloc[:, 9])
plt.show()
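A slightly fuller sketch of the same idea draws every sheet on one shared axes and labels each line with its sheet name, so the legend shows which sheet each curve came from. To keep it self-contained, the dictionary `sheets` below is a hypothetical stand-in for what `pd.read_excel(..., sheet_name=None)` returns (sheet name mapped to DataFrame); the sheet names and numbers are made up, and only the column positions 1 and 9 come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for pd.read_excel(path, sheet_name=None):
# a dict mapping sheet name -> DataFrame with at least 10 columns
sheets = {
    f"sweep_{k}": pd.DataFrame([[k * r + c for c in range(10)] for r in range(5)])
    for k in (1, 2, 3)
}

fig, ax = plt.subplots()
for sheetname, df in sheets.items():
    # Column 1 on x, column 9 on y; every sheet is added to the same axes
    ax.plot(df.iloc[:, 1], df.iloc[:, 9], label=sheetname)
ax.legend()
```

With a real workbook, replace the stand-in dictionary with the `pd.read_excel(..., sheet_name=None)` call and finish with `plt.show()` or `fig.savefig(...)`.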

Related

Choose A Specific Sheet In Excel Containing a String Pandas

I'm currently creating a DataFrame from an Excel spreadsheet in pandas. Most of the files contain only 1 sheet. However, in some of the files the sheet I need is not the first sheet. All of these sheets, in all of the files, have the same format: their names look like 'ITD_XXX_XXXX'. Is there a way to tell pandas to select the sheet whose name has this form, something like:
df = pd.read_excel(path, sheet_name = contains('ITD_'))
so that pandas would only read data from the sheet whose name starts with 'ITD_'?
Cheers.
I think the answer here would probably give you what you need.
Load the file as an ExcelFile before reading it into a DataFrame. Get the sheet names, and then pick the sheet name that starts with 'ITD_'.
import pandas as pd

excel = pd.ExcelFile("your_excel.xlsx")
excel.sheet_names
# ["Sheet1", "Sheet2"]
for n in excel.sheet_names:
    if n.startswith('ITD_'):
        sheetname = n
        break
df = excel.parse(sheetname)
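If no sheet in a file matches, the loop leaves `sheetname` unset and `excel.parse(sheetname)` fails with a `NameError`. A minimal sketch that fails with a clearer message instead uses `next()` over a generator; the helper name `first_sheet_with_prefix` is hypothetical, and the `'ITD_'` prefix comes from the question.

```python
def first_sheet_with_prefix(sheet_names, prefix="ITD_"):
    """Return the first sheet name that starts with `prefix`.

    Hypothetical helper: raises ValueError when no sheet matches,
    so a file without an 'ITD_' sheet produces a clear error.
    """
    match = next((n for n in sheet_names if n.startswith(prefix)), None)
    if match is None:
        raise ValueError(f"no sheet name starts with {prefix!r}")
    return match
```

Usage with the `excel` object from the answer: `df = excel.parse(first_sheet_with_prefix(excel.sheet_names))`.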

Convert Excel sheets to Pandas df's

I have an Excel file with one sheet named "info" as follows:
Name Number
S1 50
S2 100
S3 400
This sheet gives info about the other sheets, which I need to convert into pandas DataFrames.
However, when I read this sheet and loop over it to create the other DataFrames, my code also looks for a sheet named "Name" and breaks. Is there any way to avoid this?
Use a header row or skip the first row as mentioned in the comments.
import pandas as pd

df_info = pd.read_excel('file.xlsx', sheet_name='info', header=0)
sheets = {}
for sheet_name in df_info['Name']:
    sheets[sheet_name] = pd.read_excel('file.xlsx', sheet_name=sheet_name, header=None)
Pandas Read Excel Documentation
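Why the header row matters: with `header=None`, pandas treats the first row (`Name`, `Number`) as data, so a loop over the `Name` column also sees a sheet called "Name". The sketch below shows the difference using `pd.read_csv` on an in-memory copy of the info table; `read_csv` is swapped in only so the example runs without an Excel file, on the assumption (true for the `header` argument) that it follows the same convention as `read_excel`.

```python
import io
import pandas as pd

# In-memory stand-in for the "info" sheet from the question
info_text = "Name,Number\nS1,50\nS2,100\nS3,400\n"

# header=0: the first row becomes the column labels, so "Name" is not a value
with_header = pd.read_csv(io.StringIO(info_text), header=0)
sheet_names = list(with_header["Name"])

# header=None: the first row is read as data, so "Name" shows up as a value
no_header = pd.read_csv(io.StringIO(info_text), header=None)
raw_first_column = list(no_header[0])
```

Here `sheet_names` holds only the real sheet names, while `raw_first_column` starts with `"Name"`, which is exactly the value that made the original loop break.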

Import several sheets from the same excel into one dataframe in pandas

I have one Excel file with several identically structured sheets (same headers and number of columns), with sheet names 01, 02, ..., 12.
How can I get this into one DataFrame?
Right now I would load each sheet separately with:
df1 = pd.read_excel('path.xls', sheet_name='01')
df2 = pd.read_excel('path.xls', sheet_name='02')
...
and would then concatenate them.
What is the most pythonic way to get one DataFrame with all the sheets directly? Also, assume I do not know every sheet name in advance.
Read the file as:
import pandas as pd

collection = pd.read_excel('path.xls', sheet_name=None)
combined = pd.concat([value.assign(sheet_source=key)
                      for key, value in collection.items()],
                     ignore_index=True)
sheet_name = None ensures all the sheets are read in.
collection is a dictionary, with the sheet_name as key, and the actual data as the values. combined uses the pandas concat method to get you one dataframe. I added the extra column sheet_source, in case you need to track where the data for each row comes from.
You can read more about it on the pandas doco
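A minimal runnable sketch of the same `pd.concat` step, assuming a hypothetical two-sheet workbook; the dictionary `collection` below stands in for what `pd.read_excel('path.xls', sheet_name=None)` would return, and the column names `a` and `b` are made up.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel('path.xls', sheet_name=None)
collection = {
    "01": pd.DataFrame({"a": [1, 2], "b": [3, 4]}),
    "02": pd.DataFrame({"a": [5], "b": [6]}),
}

# Tag each sheet's rows with the sheet name, then stack them into one frame
# with a fresh 0..n-1 index
combined = pd.concat(
    [value.assign(sheet_source=key) for key, value in collection.items()],
    ignore_index=True,
)
```

As an alternative design, `pd.concat(collection)` also accepts the dictionary directly and uses its keys as the outer level of a MultiIndex instead of adding a column.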
Alternatively, if you know the sheets are named 01 to 12, you can use:
df_final = pd.concat([pd.read_excel('path.xls', sheet_name="{:02d}".format(sheet)) for sheet in range(1, 13)], axis=0)
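The zero-padded names come from the `"{:02d}"` format, and the range has to start at 1 to match sheets 01 to 12. This small sketch generates the names on their own so the bounds are easy to check.

```python
# Zero-padded sheet names "01".."12"; range(1, 13) stops before 13,
# whereas range(12) would give "00".."11" and miss sheet "12"
names = ["{:02d}".format(n) for n in range(1, 13)]
```

The list can then be passed in one call, `pd.read_excel('path.xls', sheet_name=names)`, which returns a dictionary of DataFrames ready for `pd.concat`.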

Pandas read_excel() with multiple sheets and specific columns

I'm trying to use pandas.read_excel() to import multiple worksheets from a spreadsheet. If I do not specify the columns with the parse_cols keyword I'm able to get all the data from the sheets, but I can't seem to figure out how to specify specific columns for each sheet.
import pandas as pd
workSheets = ['sheet1', 'sheet2', 'sheet3','sheet4']
cols = ['A,E','A,E','A,C','A,E']
df = pd.read_excel(excelFile, sheetname=workSheets, parse_cols='A:E') #This works fine
df = pd.read_excel(excelFile, sheetname=workSheets, parse_cols=cols) #This returns empty dataFrames
Does anyone know if there is a way, using read_excel(), to import multiple worksheets from excel, but also specify specific columns based on which worksheet?
Thanks.
When you pass a list of sheet names to read_excel, it returns a dictionary. You can achieve the same thing with a loop:
workSheets = ['sheet1', 'sheet2', 'sheet3', 'sheet4']
cols = ['A,E', 'A,E', 'A,C', 'A,E']
df = {}
for ws, c in zip(workSheets, cols):
    df[ws] = pd.read_excel(excelFile, sheetname=ws, parse_cols=c)
Update for Python 3.6.5 and pandas 0.23.4, where `sheetname` and `parse_cols` have been replaced by `sheet_name` and `usecols`:
df[ws] = pd.read_excel(excelFile, sheet_name=ws, usecols=c)

How to read Excel Workbook (pandas)

First I want to say that I am not an expert by any means. I know my way around, but I am short on time and am learning Python later than I should have!
Question:
I have a workbook that will occasionally have more than one worksheet. When reading in the workbook, I will not know the number of sheets or their names. The data layout will be the same on every sheet, with some columns named 'Unnamed'. Everything I try or find online uses pandas.ExcelFile to gather all sheets, which is fine, but I need to skip 4 rows, read only the 42 rows after that, and parse specific columns. Although the sheets have the same structure, the column names may or may not match, and I would like the sheets merged either way.
So here is what I have:
import pandas as pd
from openpyxl import load_workbook
# Load in the file location and name
cause_effect_file = r'C:\Users\Owner\Desktop\C&E Template.xlsx'
# Set up the ability to write dataframe to the same workbook
book = load_workbook(cause_effect_file)
writer = pd.ExcelWriter(cause_effect_file)
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
# Get the file skip rows and parse columns needed
xl_file = pd.read_excel(cause_effect_file, skiprows=4, parse_cols = 'B:AJ', na_values=['NA'], convert_float=False)
# Loop through the sheets loading data in the dataframe
dfi = {sheet_name: xl_file.parse(sheet_name)
       for sheet_name in xl_file.sheet_names}
# Remove columns labeled as un-named
for col in dfi:
    if r'Unnamed' in col:
        del dfi[col]
# Write dataframe to sheet so we can see what the data looks like
dfi.to_excel(writer, "PyDF", index=False)
# Save it back to the book
writer.save()
The link to the file I am working with is below:
Excel File
Try to modify the following based on your specific need:
import pandas as pd

xls = pd.ExcelFile(path)  # path: location of your workbook
frames = []
Then iterate over all the available data sheets:
for x, name in enumerate(xls.sheet_names):
    a = xls.parse(x, header=4, usecols='B:AJ')
    a["Sheet Name"] = name
    frames.append(a)
df = pd.concat(frames, ignore_index=True)
You can adjust the header row and the columns to read for each sheet. I added a column that will indicate the name of the data sheet the row came from.
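The question also asks to read only 42 rows and to drop the 'Unnamed' columns. `ExcelFile.parse` forwards keywords such as `header`, `nrows` and `usecols` to `read_excel`, and blank header cells come back as `Unnamed: <position>`. The sketch below shows those steps with `pd.read_csv` on in-memory text as a stand-in (it uses the same `header`/`nrows` semantics and the same `Unnamed` naming); the sheet layout, with 4 title rows and a blank second header cell, is hypothetical.

```python
import io
import pandas as pd

# Hypothetical sheet layout: 4 title rows, a header row whose second
# cell is blank, then 50 data rows
rows = ["Report title,,"] * 4 + ["id,,value"]
rows += [f"{i},note,{i * 10}" for i in range(50)]
text = "\n".join(rows) + "\n"

# header=4: rows 0-3 are discarded and row 4 supplies the column labels;
# nrows=42: read only the 42 data rows after the header
sheet = pd.read_csv(io.StringIO(text), header=4, nrows=42)

# A blank header cell is labelled "Unnamed: <position>"; drop those columns.
# drop() returns a new frame, which avoids deleting while iterating.
unnamed = [c for c in sheet.columns if str(c).startswith("Unnamed")]
sheet = sheet.drop(columns=unnamed)
```

Against a real workbook, the same keywords would go to `xls.parse(x, header=4, nrows=42, usecols='B:AJ')`, and the per-sheet frames can then be combined with `pd.concat`.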
You probably want to look at using read_only mode in openpyxl. This will allow you to load only the sheets you're interested in and look at only the cells you're interested in.
If you want to work with Pandas dataframes then you'll have to create these yourself but that shouldn't be too hard.
