PYTHON EXCEL COMBINE WORKSHEETS - python

I have an Excel file with multiple worksheets. Each worksheet has one column named "Close" that contains numbers. Using Python, I want to combine all the worksheets into one worksheet, with the "Close" columns side by side and each worksheet's title as the header of its column. How can I do that?
The concat method in pandas just puts everything in one column without using the worksheet names as titles. My code is below:
df_combined = pd.DataFrame()
ex_file = pd.ExcelFile('123.xlsx')
result = pd.concat([df_combined], axis=1)
df_combined.to_excel(writer2, index=False, header=False)
(Screenshots of the workbook and of the result I want are not included here.)

For pd.concat to work, you need to concatenate all the DataFrames into one, either by building a list of all the DataFrames and calling concat once, or by concatenating iteratively in a loop. I advise reading one worksheet at a time.
Second solution (concatenating in a loop):
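import pandas as pd

file = "123.xlsx"  # path to your workbook
worksheet_names = pd.ExcelFile(file).sheet_names  # list of all the worksheets in the file
df_combined = pd.DataFrame()
for worksheet_name in worksheet_names:
    ws_dataframe = pd.read_excel(file, sheet_name=worksheet_name)
    # Add this sheet's columns to the right of the ones collected so far
    df_combined = pd.concat([df_combined, ws_dataframe], axis=1)
df_combined.to_excel("combined.xlsx", index=False)  # output file name is a placeholder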
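Before exporting df_combined, you can rename the columns to include the name of each worksheet, for instance as MultiIndex columns. Note that pandas cannot write MultiIndex columns with index=False, so keep the default index=True when exporting in that case:
df_combined.columns = pd.MultiIndex.from_product([worksheet_names, ['Close']], names=['Worksheetname', 'col'])
If you only want the worksheet name as the header of each column, a flat list is enough and also works with index=False:
df_combined.columns = worksheet_names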
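A minimal sketch of the first approach (collect every sheet, then call concat once), assuming each worksheet has a "Close" column. `combine_close_columns` and `combine_workbook` are hypothetical helper names. Passing a dict to pd.concat with axis=1 uses the dict keys, here the sheet names, as the column headers, so no renaming step is needed:

```python
import pandas as pd


def combine_close_columns(sheets, column="Close"):
    """Put each sheet's `column` side by side, headed by the sheet name.

    `sheets` is a dict of {sheet name: DataFrame}, e.g. the result of
    pd.read_excel(path, sheet_name=None). Sheets with fewer rows are
    padded with NaN because the columns are aligned on the row index.
    """
    return pd.concat({name: df[column] for name, df in sheets.items()}, axis=1)


def combine_workbook(src_path, dst_path, column="Close"):
    """Read every worksheet in src_path and write the combined sheet to dst_path."""
    sheets = pd.read_excel(src_path, sheet_name=None)  # dict of all worksheets
    combine_close_columns(sheets, column).to_excel(dst_path, index=False)
```

For example, `combine_workbook("123.xlsx", "combined.xlsx")`, where the output name is only a placeholder.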

Related

Python - How to set or autofit the column width in an Excel sheet?

I'm trying to write a pandas DataFrame to multiple Excel sheets, and the sheet names are determined by the "Service Type" column.
In my function, I'm trying to write some code that loops through each column in the Excel worksheets and auto-adjusts its width so that all of the text in each row is visible.
I think what I have written so far could work, but I'm not sure how to properly identify the sheet_name since I'm looking at a str(index).
This is what I've written so far:
# Create a final DataFrame called "final_df" where rows have an Error value of 1
final_df = stacked_df[stacked_df.Error == 1]
# Creates a Pandas Excel writer using XlsxWriter as the engine
writer = pd.ExcelWriter(LOGNAME, engine='xlsxwriter')
# Group the output by "Service type" and save each DataFrame to a separate sheet
for index, group_df in final_df.groupby("Service type"):
group_df.to_excel(writer, sheet_name=str(index), index=False)
# Auto-adjust column's width
for column in stacked_df:
column_width = max(stacked_df[column].astype(str).map(len).max(), len(column))
col_idx = stacked_df.columns.get_loc(column)
writer.sheets[sheet_name=str(index)].set_column(col_idx, col_idx, column_width)
# Close the Pandas Excel writer and output the Excel file
writer.save()
(Screenshots of the current sheet and of the layout I want are not included here.)
How do I make this work? Thank you.
The type of writer.sheets is dict, where the keys are the names of the sheets and the values are Worksheet objects, so the way you're trying to reference the sheets is not correct.
writer.sheets[sheet_name=str(index)] INCORRECT
writer.sheets[sheet_name_as_string] CORRECT
Beyond that, there is a problem with the logic: after the first loop finishes, index only holds the key of the last group, and the second loop measures the columns of stacked_df rather than those of the current group. If you want to size each group's own sheet, you should nest the loops.
For example:
import pandas as pd

# The with-block closes (and saves) the writer; ExcelWriter.save() was removed in pandas 2.0
with pd.ExcelWriter(LOGNAME, engine="xlsxwriter") as writer:
    for sheet_idx, group_df in final_df.groupby("Service type"):
        # Create a worksheet from the current groupby group
        group_df.to_excel(writer, sheet_name=str(sheet_idx), index=False)
        # Loop through the columns of the current worksheet
        # and set the correct width for each one
        for column in group_df:
            column_width = max(group_df[column].astype(str).map(len).max(), len(column))
            col_idx = group_df.columns.get_loc(column)
            writer.sheets[str(sheet_idx)].set_column(col_idx, col_idx, column_width)
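The width calculation can also be pulled into a small helper so it can be checked without writing a file. This is a sketch, assuming the xlsxwriter engine is installed; `column_widths` and `write_groups_autofit` are hypothetical names, and the width rule is the same "longest text or header" rule as above, with no extra padding:

```python
import pandas as pd


def column_widths(df):
    """Width per column: the longest cell text or the header, whichever is longer."""
    return [int(max(df[col].astype(str).map(len).max(), len(str(col)))) for col in df.columns]


def write_groups_autofit(df, path, group_col):
    """Write one sheet per value of group_col and size each column to fit its text."""
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:  # needs xlsxwriter installed
        for key, group_df in df.groupby(group_col):
            sheet = str(key)
            group_df.to_excel(writer, sheet_name=sheet, index=False)
            worksheet = writer.sheets[sheet]  # xlsxwriter Worksheet object
            for col_idx, width in enumerate(column_widths(group_df)):
                worksheet.set_column(col_idx, col_idx, width)
```

Usage with the question's names would be `write_groups_autofit(final_df, LOGNAME, "Service type")`. Recent XlsxWriter versions also offer `worksheet.autofit()`, which approximates this automatically.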

Import several sheets from the same excel into one dataframe in pandas

I have one Excel file with several identically structured sheets (same headers and number of columns), named 01, 02, ..., 12.
How can I get this into one dataframe?
Right now I would load each sheet separately with:
df1 = pd.read_excel('path.xls', sheet_name='01')
df2 = pd.read_excel('path.xls', sheet_name='02')
...
and would then concatenate them.
What is the most pythonic way to get all the sheets directly into one dataframe, assuming I do not know every sheet name in advance?
Read the file as:
collection = pd.read_excel('path.xls', sheet_name=None)
combined = pd.concat([value.assign(sheet_source=key)
for key,value in collection.items()],
ignore_index=True)
sheet_name = None ensures all the sheets are read in.
collection is a dictionary, with the sheet_name as key, and the actual data as the values. combined uses the pandas concat method to get you one dataframe. I added the extra column sheet_source, in case you need to track where the data for each row comes from.
You can read more about it in the pandas documentation.
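The same idea can be checked without a file by passing a dict like the one `read_excel(..., sheet_name=None)` returns. This variant is an alternative to `assign`: it lets `pd.concat` turn the dict keys into an index level and then moves that level into a `sheet_source` column, so the sheet name ends up as the first column. `stack_sheets` is a hypothetical helper name:

```python
import pandas as pd


def stack_sheets(collection):
    """Stack a {sheet name: DataFrame} dict vertically, recording each row's sheet."""
    # The dict keys become the outer index level, named "sheet_source"
    stacked = pd.concat(collection, names=["sheet_source", "row"])
    # Move the sheet level into a column, then replace the leftover row labels
    # with a fresh 0..n-1 index (the same effect as ignore_index=True)
    return stacked.reset_index(level="sheet_source").reset_index(drop=True)
```

With a real file this would be `stack_sheets(pd.read_excel('path.xls', sheet_name=None))`.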
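If you already know the sheet names (01 to 12), you can also use:
df_final = pd.concat([pd.read_excel('path.xls', sheet_name="{:02d}".format(sheet)) for sheet in range(1, 13)], axis=0)  # range(1, 13) gives "01" to "12"; range(12) would give "00" to "11"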

Excel Writer Python Separate Sheet For Each Row/Index In DataFrame

I have a dataframe with 14 columns and about 300 rows. What I want to do is create an xlsx with multiple sheets, each sheet holding a single row of the main dataframe. I'm setting it up like this because I want to append to these individual sheets every day for a new instance of the same row to see how the column values for the unique rows change over time. Here is some code.
tracks_df = pd.read_csv('final_outputUSA.csv')
writer2 = pd.ExcelWriter('please.xlsx', engine='xlsxwriter')
for track in tracks_df:
tracks_df.to_excel(writer2, sheet_name="Tracks", index=False, header=True)
writer2.save()
writer2.close()
Right now this just outputs exactly the same format as the csv that I'm reading in. I know that I'm going to need to set the sheet_name dynamically based on an indexed value; I would like each sheet_name to be df['Col1'] for that row. How do I output an xlsx with a separate sheet for each row in my dataframe?
Try this:
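with pd.ExcelWriter('please.xlsx', engine='xlsxwriter') as writer2:
    # Write each row as a one-row DataFrame on a sheet named after its Col1 value
    # (x['Col1'] is a single value, so convert it with str() rather than .astype)
    tracks_df.apply(lambda x: x.to_frame().T.to_excel(writer2, sheet_name=str(x['Col1']), index=True, header=True), axis=1)
# Leaving the with-block saves and closes the file, so no save()/close() calls are needed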
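Because the sheet names come from data, they may break Excel's rules (at most 31 characters, none of : \ / ? * [ ]). The sketch below sanitizes them and uses `df.iloc[[i]]`, which keeps each column's dtype, whereas `to_frame().T` turns every value into object dtype. `sheet_name_for`, `rows_as_sheets` and `write_rows_as_sheets` are hypothetical names, and the code assumes the Col1 values are unique, since rows mapping to the same sheet name would collide:

```python
import re

import pandas as pd

# Characters Excel does not allow in sheet names
_INVALID_SHEET_CHARS = re.compile(r"[:\\/?*\[\]]")


def sheet_name_for(value):
    """Turn a cell value into a legal Excel sheet name (at most 31 characters)."""
    name = _INVALID_SHEET_CHARS.sub("_", str(value))[:31]
    return name or "blank"  # Excel rejects empty sheet names


def rows_as_sheets(df, name_col):
    """Yield (sheet name, one-row DataFrame) for every row of df."""
    for i in range(len(df)):
        row_df = df.iloc[[i]]  # double brackets keep a DataFrame with the original dtypes
        yield sheet_name_for(row_df[name_col].iloc[0]), row_df


def write_rows_as_sheets(df, path, name_col="Col1"):
    """Write each row of df to its own sheet; assumes the name_col values are unique."""
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        for name, row_df in rows_as_sheets(df, name_col):
            row_df.to_excel(writer, sheet_name=name, index=False)
```

With the question's data this would be `write_rows_as_sheets(tracks_df, 'please.xlsx')`.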

Add header with merged cells from one excel and insert to another excel Pandas

I have been searching over on how to append/insert/concat a row from one excel to another but with merged cells. I was not able to find what I am looking for.
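What I need is to take the header with merged cells from one workbook and insert it as the very first row of another. (Screenshots of both sheets are not included here.)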
I tried using pandas append() but it destroyed the arrangement of columns.
df = pd.DataFrame()
for f in ['merge1.xlsx', 'test1.xlsx']:
data = pd.read_excel(f, 'Sheet1')
df = df.append(data)
df.to_excel('test3.xlsx')
Is there way pandas could do it? I just need to literally insert the header to the top row.
Although I am still trying to find a way, it would actually be fine to me if this question had a duplicate as long as I can find answers or advice.
You can use pd.read_excel to read in the workbook with the data you want; in your case that is 'test1.xlsx'. You can then use openpyxl.load_workbook() to open the existing workbook with the header, in your case 'merge1.xlsx'. Finally, you can save the result under a new name ('test3.xlsx') without changing either of the existing workbooks.
Below I've provided a fully reproducible example of how you can do this. To make this example fully reproducible, I create 'merge1.xlsx' and 'test1.xlsx'.
Please note that if your 'merge1.xlsx' contains only the header you want and nothing else, you can use the two lines I've left commented out below. They simply append the data from 'test1.xlsx' below the header in 'merge1.xlsx', and you can then drop the two for loops at the end. Otherwise, as in my example, it is a bit more complicated.
In creating 'test3.xlsx', we loop through each row and we determine how many columns there are using len(df3.columns). In my example this is equal to two but this code would also work for a greater number of columns.
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# Build 'merge1.xlsx': a merged header in A1:C1 plus an extra cell (xlsxwriter engine)
df1 = pd.DataFrame()
with pd.ExcelWriter('merge1.xlsx', engine='xlsxwriter') as writer:
    df1.to_excel(writer, sheet_name='Sheet1')
    ws = writer.sheets['Sheet1']
    ws.merge_range('A1:C1', 'This is a merged cell')
    ws.write('A3', 'some string I might not want in other workbooks')

# Build 'test1.xlsx': the data that goes under the header
df2 = pd.DataFrame({'col_1': [1, 2, 3, 4, 5, 6], 'col_2': ['A', 'B', 'C', 'D', 'E', 'F']})
with pd.ExcelWriter('test1.xlsx') as writer:
    df2.to_excel(writer, sheet_name='Sheet1')

# index_col=0 reads the written index back as the index instead of an 'Unnamed: 0' column
df3 = pd.read_excel('test1.xlsx', index_col=0)
wb = load_workbook('merge1.xlsx')
ws = wb['Sheet1']
#for row in dataframe_to_rows(df3):
#    ws.append(row)
# Write the column names in row 2, starting at column B
column = 2
for item in list(df3.columns.values):
    ws.cell(2, column=column).value = str(item)
    column = column + 1
# Write the data from row 3 onward; column A holds the index
for row_index, row in df3.iterrows():
    ws.cell(row=row_index+3, column=1).value = row_index  # comment out to remove index
    for i in range(0, len(df3.columns)):
        ws.cell(row=row_index+3, column=i+2).value = row.iloc[i]  # positional access
wb.save("test3.xlsx")
Expected output of the three workbooks (screenshot not included here).
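If the goal is literally to insert the header rows on top of an existing sheet, a different technique is to let openpyxl shift the data down and then copy the header's values and merged ranges into the freed rows. This is a sketch, not the answer's method; `prepend_header` and `combine_files` are hypothetical names, and it assumes the data sheet has no merged cells of its own (insert_rows does not move existing merged ranges). Only values and merges are copied, not fonts or fills:

```python
from openpyxl import load_workbook


def prepend_header(header_ws, target_ws, n_rows=1):
    """Copy the first n_rows of header_ws (values and merged ranges) above target_ws's data."""
    # Shift the existing data down to make room for the header rows
    target_ws.insert_rows(1, amount=n_rows)
    # Copy the header values cell by cell (non-anchor cells of a merge hold None)
    for row in header_ws.iter_rows(min_row=1, max_row=n_rows):
        for cell in row:
            target_ws.cell(row=cell.row, column=cell.column, value=cell.value)
    # Recreate every merged range that lies entirely inside the copied rows
    for rng in header_ws.merged_cells.ranges:
        if rng.max_row <= n_rows:
            target_ws.merge_cells(rng.coord)


def combine_files(header_path, data_path, out_path, sheet="Sheet1"):
    """Save data_path's sheet with header_path's first row on top as out_path."""
    header_wb = load_workbook(header_path)
    data_wb = load_workbook(data_path)
    prepend_header(header_wb[sheet], data_wb[sheet])
    data_wb.save(out_path)  # the two input files are left unchanged
```

With the file names from the question this would be `combine_files('merge1.xlsx', 'test1.xlsx', 'test3.xlsx')`.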

How to change the order of columns while converting to excel in Pandas?

I have a dictionary, 'values', in Python. It contains lists of integers, except for 'RowHeaders'.
I would like 'RowHeaders' to be the first column in the Excel file. In the following code, I cannot add a condition to the 'from_items' method to put it first, and when I run the code, the 'RowHeaders' data does not end up in the first column.
values['RowHeaders'] = list_of_headers
for feat in features:
values.setdefault(feat, list())
for p in data:
values[feat].append(int(data[p][feat]))
writer = pd.ExcelWriter('output.xlsx')
df = pd.DataFrame.from_items([(f,values[f]) for f in values])
df.to_excel(writer, 'Sheet1', index=False)
writer.save()
Thanks.
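`DataFrame.from_items` was deprecated and then removed in pandas 1.0, so a sketch for current pandas builds the DataFrame directly from the dict and then selects the columns in the desired order. `rowheaders_first` is a hypothetical helper name, and it assumes all the lists in `values` have the same length:

```python
import pandas as pd


def rowheaders_first(values, first="RowHeaders"):
    """Build a DataFrame from a dict of equal-length lists, with `first` as the leftmost column."""
    df = pd.DataFrame(values)
    # Put `first` at the front and keep the remaining columns in their current order
    ordered = [first] + [col for col in df.columns if col != first]
    return df[ordered]
```

Exporting then works as before, e.g. `rowheaders_first(values).to_excel('output.xlsx', sheet_name='Sheet1', index=False)`.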
