I'm trying to use pandas.read_excel() to import multiple worksheets from a spreadsheet. If I do not specify the columns with the parse_cols keyword I'm able to get all the data from the sheets, but I can't seem to figure out how to specify specific columns for each sheet.
import pandas as pd
workSheets = ['sheet1', 'sheet2', 'sheet3','sheet4']
cols = ['A,E','A,E','A,C','A,E']
df = pd.read_excel(excelFile, sheetname=workSheets, parse_cols='A:E') #This works fine
df = pd.read_excel(excelFile, sheetname=workSheets, parse_cols=cols) #This returns empty dataFrames
Does anyone know if there is a way, using read_excel(), to import multiple worksheets from excel, but also specify specific columns based on which worksheet?
Thanks.
When you pass a list of sheet names to read_excel, it returns a dictionary. You can achieve the same thing with a loop:
workSheets = ['sheet1', 'sheet2', 'sheet3', 'sheet4']
cols = ['A,E', 'A,E', 'A,C', 'A,E']
df = {}
for ws, c in zip(workSheets, cols):
    df[ws] = pd.read_excel(excelFile, sheetname=ws, parse_cols=c)
Update for Python 3.6.5 and pandas 0.23.4, where sheetname is now sheet_name and parse_cols is now usecols:
pd.read_excel(excelFile, sheet_name=ws, usecols=c)
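Putting the loop and the newer keyword names together, a minimal sketch might look like the following. The function name read_selected_columns and the dictionary argument are illustrative, not part of pandas; reading .xlsx files also assumes an Excel engine such as openpyxl is installed.

```python
import pandas as pd


def read_selected_columns(path, columns_by_sheet):
    """Read each sheet with its own column selection.

    columns_by_sheet (a hypothetical name) maps a sheet name to an
    Excel-style column spec such as "A,E" or "A:C".
    """
    return {
        sheet: pd.read_excel(path, sheet_name=sheet, usecols=spec)
        for sheet, spec in columns_by_sheet.items()
    }
```

For the sheets in the question, this would be called as read_selected_columns(excelFile, {'sheet1': 'A,E', 'sheet3': 'A,C'}), and the result is a dictionary of DataFrames keyed by sheet name, like the one read_excel returns for a list of sheets.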
I want to combine multiple DataFrames with different sheet names and different columns, then export them to Excel.
column = [["Banana","apple"],
["Banana","Grape"],
["Apple","Pizza"]]
for i in range(3):
    random_data = np.random.randint(10,25,size=(5,2))
    df = pd.DataFrame(random_data, columns=column[i])
I would like to end up with three sheets, each with its own column names.
I've tried something like pd.concat([sheet_df, df]). In that case, every column shows up in the resulting DataFrame, even for rows whose original df doesn't have that column, which is not what I want.
I appreciate your help!
Use an ExcelWriter:
from pandas import ExcelWriter
...
sheets = ['Sheet1', 'Sheet2', 'Sheet3']
path = r'yourpath.xlsx'
with ExcelWriter(path, engine='openpyxl') as writer:
    for cols, sheet in zip(column, sheets):
        random_data = np.random.randint(10,25,size=(5,2))
        df = pd.DataFrame(random_data, columns=cols)
        df.to_excel(writer, sheet_name=sheet)
I have two xlsx files that have multiple tabs. I need to compare the values in each tab based on the tab name (e.g. sheet1 in file1 needs to be compared with sheet1 in file2, and so on), but in some tabs the number of rows and columns differs between the files. When I use the following code, it only compares and writes the tabs that have the same number of rows and the same number of columns. Please help me figure out why not all tabs get compared.
import pandas as pd
import numpy as np
df1 = pd.read_excel('test_1.xlsx', sheet_name=None)
df2 = pd.read_excel('test_2.xlsx', sheet_name=None)
with pd.ExcelWriter('./Excel_diff.xlsx') as writer:
    for sheet, df1 in df1.items():
        # check if sheet is in the other Excel file
        if sheet in df2:
            df2sheet = df2[sheet]
            comparison_values = df1.values == df2sheet.values
            print(comparison_values)
            rows, cols = np.where(comparison_values == False)
            for item in zip(rows, cols):
                df1.iloc[item[0], item[1]] = '{} → {}'.format(df1.iloc[item[0], item[1]], df2sheet.iloc[item[0], item[1]])
            df1.to_excel(writer, sheet_name=sheet, index=False, header=True)
This endeavor is non-trivial irrespective of tool or library.
However, once you get it going, or when you need to validate your progress, Microsoft has already built a useful and intuitive tool for comparing workbooks. Consider using it alongside your development efforts: Excel Compare
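On the pandas side, one likely reason the question's code skips some tabs is that df1.values == df2sheet.values only compares element by element when both arrays have the same shape. A rough sketch of one workaround is to reindex both sheets to the union of their row labels and column names before comparing. The helper name diff_frames is made up for illustration, and matching rows by index label is an assumption: if rows were inserted in the middle of a sheet, every later row will be reported as changed.

```python
import numpy as np
import pandas as pd


def diff_frames(left, right):
    """Mark cells that differ between two sheets of possibly different shape.

    Hypothetical helper: rows are matched by index label and columns by name.
    """
    rows = left.index.union(right.index, sort=False)
    cols = left.columns.union(right.columns, sort=False)
    a = left.reindex(index=rows, columns=cols)
    b = right.reindex(index=rows, columns=cols)
    # A cell counts as equal if the values match or both sides are missing.
    same = (a == b) | (a.isna() & b.isna())
    out = a.astype(object)
    # Rewrite each differing cell as "old → new", as in the question.
    for r, c in zip(*np.where(~same.to_numpy())):
        out.iat[r, c] = f"{a.iat[r, c]} → {b.iat[r, c]}"
    return out
```

Cells that exist in only one file show up with nan on the other side of the arrow, so extra rows or columns are reported rather than silently dropped.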
I am trying to read multiple sheets from the same Excel file. I want to plot specific columns from every sheet on the same figure, but I get an error saying that 'ExcelFile' has no attribute 'iloc'. Can someone tell me what is wrong here? Thank you.
df = pd.ExcelFile ('Current parametric sweep_reference.xlsx')
Sheet=df.sheet_names
print(Sheet)
for sheet_names in Sheet:
    plt.plot(df.iloc[:,1],iloc[:,9])
You are looping over the sheet names without ever loading a DataFrame. You can do the following:
dfs = pd.ExcelFile('Current parametric sweep_reference.xlsx')
for sheet in dfs.sheet_names:  # loop over all sheets
    df = pd.read_excel("Current parametric sweep_reference.xlsx", sheet_name=sheet)
    plt.plot(df.iloc[:,1], df.iloc[:,9])
Your object df is not a pandas DataFrame but an ExcelFile object, which does not support iloc. To use iloc you should first represent the individual sheets as DataFrames, like so:
...
for sheet_name in Sheet:
    sheet_df = df.parse(sheet_name)
You should use pd.read_excel to load your Excel file. Passing sheet_name=None to pd.read_excel loads all sheets into a dictionary of DataFrames, one per sheet. You can then iterate over the sheets as follows:
import pandas as pd
import matplotlib.pyplot as plt
sheets = pd.read_excel("Current parametric sweep_reference.xlsx", sheet_name=None)
for sheetname, df in sheets.items():
    plt.plot(df.iloc[:,1], df.iloc[:,9])
I need to merge different Excel sheets into one and also add a new column containing the corresponding sheet name.
The code below merges all the sheets, but how do I add the sheet name as a column?
import pandas as pd
df = pd.concat(pd.read_excel(r"C:\\Users\\xx\\FC_List.xlsx", sheet_name=None), ignore_index=True)
print(df)
df.to_csv(r"C:\\Users\\Users\\FC_List.csv", index=False)
The code below fetches the sheet names:
import pandas as pd
df = pd.read_excel(r"C:\\Users\\cc\\FC_List.xlsx", sheet_name=None)
df.keys()
Can you advise how to combine the two so that the sheet name ends up as a new column?
Split it into steps.
import pandas as pd
dfs = pd.read_excel(r"C:\\Users\\xx\\FC_List.xlsx", sheet_name=None)
df = pd.concat(dfs,keys=dfs.keys())
This puts the sheet name in the outer level of the index; you can then reset the index and rename the resulting column.
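The reset-and-rename step could be sketched as below. The small dictionary stands in for the result of pd.read_excel(..., sheet_name=None), and the sheet names, data, and the column name sheet_name are invented for illustration.

```python
import pandas as pd

# Stand-in for pd.read_excel(path, sheet_name=None): sheet name -> DataFrame.
dfs = {
    "North": pd.DataFrame({"item": ["a", "b"], "qty": [1, 2]}),
    "South": pd.DataFrame({"item": ["c"], "qty": [3]}),
}

df = (
    pd.concat(dfs, names=["sheet_name", "row"])  # name the two index levels
    .reset_index(level="sheet_name")             # sheet name becomes a column
    .reset_index(drop=True)                      # discard per-sheet row numbers
)
```

Naming the index levels in the concat call avoids a separate rename afterwards.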
You could also do something like:
df = pd.concat([sheet.assign(src_sheet=sheet_name) for sheet_name,sheet in dfs.items()])
I have one Excel file with several identically structured sheets in it (same headers and number of columns; sheet names: 01, 02, ..., 12).
How can I get this into one dataframe?
Right now I would load each sheet separately with:
df1 = pd.read_excel('path.xls', sheet_name='01')
df2 = pd.read_excel('path.xls', sheet_name='02')
...
and would then concatenate them.
What is the most Pythonic way to do this and get one DataFrame with all the sheets directly? Also assume I do not know every sheet name in advance.
Read the file as:
collection = pd.read_excel('path.xls', sheet_name=None)
combined = pd.concat([value.assign(sheet_source=key)
                      for key, value in collection.items()],
                     ignore_index=True)
sheet_name = None ensures all the sheets are read in.
collection is a dictionary, with the sheet_name as key, and the actual data as the values. combined uses the pandas concat method to get you one dataframe. I added the extra column sheet_source, in case you need to track where the data for each row comes from.
You can read more about it in the pandas documentation.
If the sheets are named 01 through 12, you can use:
df_final = pd.concat([pd.read_excel('path.xls', sheet_name="{:02d}".format(sheet)) for sheet in range(1, 13)], axis=0)