I have a dataframe with 14 columns and about 300 rows. What I want to do is create an xlsx with multiple sheets, each sheet holding a single row of the main dataframe. I'm setting it up like this because I want to append to these individual sheets every day for a new instance of the same row to see how the column values for the unique rows change over time. Here is some code.
tracks_df = pd.read_csv('final_outputUSA.csv')
writer2 = pd.ExcelWriter('please.xlsx', engine='xlsxwriter')
for track in tracks_df:
    tracks_df.to_excel(writer2, sheet_name="Tracks", index=False, header=True)
writer2.save()
writer2.close()
Right now this just outputs the exact same format as the csv that I'm reading in. I know that I'm going to need to dynamically change the sheet_name based on an indexed value; I would like each sheet to be named after its row's df['Col1'] value. How do I output an xlsx with a separate sheet for each row in my dataframe?
Try this:
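writer2 = pd.ExcelWriter('please.xlsx', engine='xlsxwriter')
# each row x arrives as a Series: turn it back into a one-row frame and name the sheet after its Col1 value
df.apply(lambda x: x.to_frame().T.to_excel(writer2, sheet_name=str(x['Col1']), index=True, header=True), axis=1)
writer2.close()  # close() also saves the file; writer.save() was removed in pandas 2.0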
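One caveat when sheets are named after Col1 values: Excel caps sheet names at 31 characters, rejects the characters [ ] : * ? / \ and does not allow two sheets with the same name (ignoring case). Below is a minimal sketch of a loop-based version that guards against those cases. The helper names safe_sheet_name and write_rows_to_sheets are made up for illustration, and the cleanup only covers the common problems, not every Excel naming rule.

```python
import re

import pandas as pd


def safe_sheet_name(value, used):
    """Return an Excel-safe sheet name for `value` that is not yet in `used`."""
    # Replace the characters Excel rejects and cut the name to 31 characters.
    name = re.sub(r"[\[\]:*?/\\]", "_", str(value))[:31] or "Sheet"
    candidate, n = name, 1
    # Sheet names must be unique regardless of case, so compare lowercased.
    while candidate.lower() in used:
        suffix = f"_{n}"
        candidate = name[: 31 - len(suffix)] + suffix
        n += 1
    used.add(candidate.lower())
    return candidate


def write_rows_to_sheets(df, path, key="Col1"):
    """Write each row of `df` to its own sheet, named after the row's `key` value."""
    used = set()
    with pd.ExcelWriter(path) as writer:
        for i in range(len(df)):
            row = df.iloc[[i]]  # double brackets keep a one-row DataFrame
            sheet = safe_sheet_name(row[key].iloc[0], used)
            row.to_excel(writer, sheet_name=sheet, index=False)
```

With the question's data this would be called as write_rows_to_sheets(tracks_df, 'please.xlsx'). Slicing with iloc[[i]] keeps each column's dtype, whereas x.to_frame().T turns every value into a generic object.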
Related
I have an Excel file consisting of multiple worksheets. Each worksheet has one column named "Close", with multiple numbers under it. Using Python, I want to combine all the worksheets into one worksheet, with the Close columns side by side and each worksheet's title as the header of its column. How can I do that?
The concat method in pandas just puts everything in one column without using the name of the Excel worksheet as the title. My code is below:
df_combined = pd.DataFrame()
ex_file = pd.ExcelFile('123.xlsx')
result = pd.concat([df_combined], axis=1)
df_combined.to_excel(writer2, index=False, header=False)
AND
RESULT I WANT
For pd.concat to work, you need to pass it all the dataframes you want to combine, either by collecting them in a list and calling concat once, or by concatenating iteratively in a loop. I advise reading one worksheet at a time.
Second solution:
df_combined = pd.DataFrame()
file = ""
for worksheet_name in worksheet_names:  # assuming worksheet_names is a list of all your worksheets
    # read one worksheet and attach its columns to the right of what we have so far
    ws_dataframe = pd.read_excel(file, sheet_name=worksheet_name)
    df_combined = pd.concat([df_combined, ws_dataframe], axis=1)
df_combined.to_excel(writer2, index=False, header=False)
Before exporting your dataframe df_combined, you can change the columns to include the name of each worksheet, as MultiIndex columns. For instance:
df_combined.columns = pd.MultiIndex.from_product([worksheet_names, ['Close']], names=['Worksheetname', 'col'])
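If a single header row is enough, a flatter alternative is to let concat label the columns itself. This is a minimal sketch, assuming every sheet has a "Close" column as in the question. The function name combine_close_columns and the sheet names "AAPL" and "MSFT" are made up for illustration, and the in-memory frames stand in for real worksheets.

```python
import pandas as pd


def combine_close_columns(sheets, column="Close"):
    """Place each sheet's `column` side by side, headed by the sheet name.

    `sheets` maps sheet name -> DataFrame, which is the shape returned by
    pd.read_excel(path, sheet_name=None).
    """
    # Passing a dict to concat uses its keys as the new column labels.
    return pd.concat(
        {name: frame[column].reset_index(drop=True) for name, frame in sheets.items()},
        axis=1,
    )


# Small in-memory stand-in for two worksheets of different lengths.
sheets = {
    "AAPL": pd.DataFrame({"Close": [1.0, 2.0, 3.0]}),
    "MSFT": pd.DataFrame({"Close": [10.0, 20.0]}),
}
combined = combine_close_columns(sheets)
print(combined)
```

With the real workbook, the dict would come from pd.read_excel('123.xlsx', sheet_name=None), and combined.to_excel(...) with index=False then writes one column per worksheet. Shorter sheets are padded with empty cells at the bottom.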
I'm trying to append 3 dataframes to 3 existing sheets in an Excel file (one dataframe per sheet).
This is my code:
with pd.ExcelWriter(output_path, mode="a", if_sheet_exists="overlay") as writer:
    df_a.to_excel(writer, sheet_name="A", index=False)
    df_b.to_excel(writer, sheet_name="B", index=False)
    df_c.to_excel(writer, sheet_name="C", index=False)
However, the new data overwrites the old data rather than being appended at the end of the corresponding sheet. Note that I set mode="a" and if_sheet_exists="overlay", yet it overwrites and doesn't append.
How should I fix it?
You have to find the last used row and write the new dataframe below it.
Assuming the sheet already holds its headers and some data, you can test it like this (mode="a" needs the openpyxl engine, and if_sheet_exists="overlay" needs pandas 1.4 or later):
with pd.ExcelWriter(output_path, mode="a", if_sheet_exists="overlay") as writer:
    # number of data rows in sheet "A", plus 1 for the header row, gives the first free row
    lrow = pd.read_excel(output_path, sheet_name="A").shape[0] + 1
    # startrow (0-based) tells to_excel where to start placing the new data
    df_a.to_excel(writer, sheet_name="A", index=False, header=False, startrow=lrow)
.shape returns the number of rows and columns as a tuple, for example (1235, 66); .shape[0] takes only the row count.
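The row arithmetic is easy to get wrong by one, so here is a minimal sketch that isolates it in a small helper and reuses it for any sheet. The names start_row_after and append_below are made up for illustration. The sketch assumes each sheet has exactly one header row and that openpyxl is installed.

```python
import pandas as pd


def start_row_after(existing, header_rows=1):
    """Return the 0-based `startrow` of the first empty row below `existing`.

    `existing` is the sheet as read by pd.read_excel, so its header row is not
    counted in len(existing) and has to be added back.
    """
    return len(existing) + header_rows


def append_below(path, sheet_name, new_rows):
    """Append `new_rows` under the data already in `sheet_name` of `path`."""
    existing = pd.read_excel(path, sheet_name=sheet_name)
    with pd.ExcelWriter(
        path, engine="openpyxl", mode="a", if_sheet_exists="overlay"
    ) as writer:
        new_rows.to_excel(
            writer,
            sheet_name=sheet_name,
            index=False,
            header=False,  # the sheet already has its header row
            startrow=start_row_after(existing),
        )
```

For the three sheets in the question this would be called once per sheet, for example append_below(output_path, "A", df_a). Reading the sheet before opening the writer keeps the read and the write from touching the file at the same time.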
I'm trying to write a pandas DataFrame to multiple Excel sheets, and the sheet names are determined by the "Service Type" column.
In my function, I'm trying to write some code that looks through each column in the Excel worksheets auto-adjusts the width so all of the text in each row is visible.
I think what I have written so far could work, but I'm not sure how to properly identify the sheet_name since I'm looking at a str(index).
This is what I've written so far:
# Create a final DataFrame called "final_df" where rows have an Error value of 1
final_df = stacked_df[stacked_df.Error == 1]
# Creates a Pandas Excel writer using XlsxWriter as the engine
writer = pd.ExcelWriter(LOGNAME, engine='xlsxwriter')
# Group the output by "Service type" and save each DataFrame to a separate sheet
for index, group_df in final_df.groupby("Service type"):
    group_df.to_excel(writer, sheet_name=str(index), index=False)
# Auto-adjust column's width
for column in stacked_df:
    column_width = max(stacked_df[column].astype(str).map(len).max(), len(column))
    col_idx = stacked_df.columns.get_loc(column)
    writer.sheets[sheet_name=str(index)].set_column(col_idx, col_idx, column_width)
# Close the Pandas Excel writer and output the Excel file
writer.save()
This is what the Excel sheet looks like:
This is what I want it to look like:
How do I make this work? Thank you.
The type of writer.sheets is dict, where the keys are the names of the sheets and the values are Worksheet objects, so the way you're trying to reference the sheets is not correct.
writer.sheets[sheet_name=str(index)] INCORRECT
writer.sheets[sheet_name_as_string] CORRECT
Beyond that, there is a problem with the logic: once the first loop has finished, the index variable holds only the last group's key, so the second loop can never reach the other sheets. If you want to use the index from the first for-loop, you should nest the loops.
For example:
writer = pd.ExcelWriter(LOGNAME, engine="xlsxwriter")
for sheet_idx, group_df in data.groupby("Service type"):
    # Create a worksheet from current GROUPBY object
    group_df.to_excel(writer, sheet_name=str(sheet_idx), index=False)
    # Loop through columns of current worksheet,
    # and set correct width for each one
    for column in group_df:
        column_width = max(group_df[column].astype(str).map(len).max(), len(column))
        col_idx = group_df.columns.get_loc(column)
        writer.sheets[str(sheet_idx)].set_column(col_idx, col_idx, column_width)
writer.close()  # close() also saves the file; writer.save() was removed in pandas 2.0
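The width calculation can also be pulled out into a helper, which makes it easy to check on its own before any file is written. This is a minimal sketch under a few assumptions: the names column_widths and write_groups_autofit are made up, the padding of 2 characters is an arbitrary choice, and the demo frame is invented data.

```python
import pandas as pd


def column_widths(df, padding=2):
    """Return one width per column: longest cell or header text plus padding."""
    widths = []
    for column in df.columns:
        longest_cell = df[column].astype(str).str.len().max()
        # max() of an empty column is NaN, so fall back to 0 in that case.
        longest_cell = 0 if pd.isna(longest_cell) else int(longest_cell)
        widths.append(max(longest_cell, len(str(column))) + padding)
    return widths


def write_groups_autofit(df, path, by):
    """Write one sheet per group of `by`, sizing every column to its content."""
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        for key, group in df.groupby(by):
            sheet = str(key)
            group.to_excel(writer, sheet_name=sheet, index=False)
            for col_idx, width in enumerate(column_widths(group)):
                writer.sheets[sheet].set_column(col_idx, col_idx, width)


demo = pd.DataFrame({"Service type": ["Email", "VPN"], "Error": [1, 1]})
print(column_widths(demo))
```

Wrapping the column name in str() keeps the helper working when column labels are numbers, and the with block closes the writer even if a group fails to write.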
I'm using the following code to write a dataframe to an Excel file:
writer = pd.ExcelWriter('dataframe.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='dataframe')
writer.save()
But my df is about 200 columns wide (20 columns of 10 categories) and only 5 rows deep.
Is there any way to tell pandas where to print the various columns in the Excel file?
Eg. Print columns 1-10 on row 1 in the excel sheet. Print columns 11-20 on row 6 in the excel sheet. etc.
Really I'm just trying to do the formatting of the excel file in pandas as opposed to having to play with the excel sheet after.
One solution might be to transpose the dataset using .T:
writer = pd.ExcelWriter('dataframe.xlsx', engine='xlsxwriter')
df.T.to_excel(writer, sheet_name='dataframe')
writer.close()  # close() also saves the file; writer.save() was removed in pandas 2.0
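If transposing does not give the layout you want, the placement asked for in the question can be approximated with the startrow argument of to_excel, writing one block of columns at a time. This is a minimal sketch: the helper names column_blocks and write_in_blocks are made up, the blank row between blocks is an arbitrary choice, and the demo frame is invented data.

```python
import pandas as pd


def column_blocks(df, block_size, gap=1):
    """Yield (startrow, sub-frame) pairs that stack column blocks vertically.

    Each block takes len(df) data rows plus one header row, followed by `gap`
    blank rows before the next block starts.
    """
    rows_per_block = len(df) + 1 + gap
    for n, first in enumerate(range(0, df.shape[1], block_size)):
        yield n * rows_per_block, df.iloc[:, first:first + block_size]


def write_in_blocks(df, path, block_size, sheet_name="dataframe"):
    """Write `df` to one sheet, `block_size` columns at a time, top to bottom."""
    with pd.ExcelWriter(path) as writer:
        for startrow, block in column_blocks(df, block_size):
            block.to_excel(writer, sheet_name=sheet_name, startrow=startrow, index=False)


# 5 rows x 6 columns, split into blocks of 2 columns.
demo = pd.DataFrame(
    [[r * 10 + c for c in range(6)] for r in range(5)],
    columns=[f"c{c}" for c in range(6)],
)
layout = [(row, list(block.columns)) for row, block in column_blocks(demo, 2)]
print(layout)
```

For the 200-column frame in the question, write_in_blocks(df, 'dataframe.xlsx', block_size=10) would put each group of 10 columns under the previous one. Note that startrow is 0-based, so a value of 0 is row 1 in Excel.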
I have a Pandas DataFrame with a bunch of rows and labeled columns.
I also have an excel file which I prepared with one sheet which contains no data but only
labeled columns in row 1 and each column is formatted as it should be: for example if I
expect percentages in one column then that column will automatically convert a raw number to percentage.
What I want to do is fill the raw data from my DataFrame into that Excel sheet in such a way
that row 1 remains intact so the column names remain. The data from the DataFrame should fill
the excel rows starting from row 2 and the pre-formatted columns should take care of converting
the raw numbers to their appropriate type, hence filling the data should not override the column format.
I tried using openpyxl but it ended up creating a new sheet and overriding everything.
Any help?
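If you're certain the columns are in the same order as in the sheet, you can try this with a writer that opens the existing workbook through openpyxl (mode="a", if_sheet_exists="overlay"). Pass the name of the existing sheet, otherwise pandas writes to a new sheet. startrow is 0-based, so startrow=1 starts at Excel row 2 and leaves the header row intact:
df.to_excel(writer, sheet_name='sheetname', startrow=1, index=False, header=False)
If the number and order of columns are the same and you don't need to keep the existing formatting, you can instead write a fresh file with xlsxwriter and name the sheet. Note that XlsxWriter cannot modify an existing workbook, so this replaces the file:
df.to_excel('filename.xlsx', engine='xlsxwriter', sheet_name='sheetname', index=False)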
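Another option is to skip to_excel and write the values cell by cell with openpyxl, which leaves row 1 untouched. This is a minimal sketch, assuming the DataFrame columns are in the same order as the template's header row. The names frame_to_cells and fill_template are made up, and the demo frame is invented data. Whether the preset column formats apply to the newly written cells depends on how the template stores them, so check the output; if they do not carry over, the cell's number_format attribute can be set explicitly.

```python
import pandas as pd


def frame_to_cells(df, first_row=2):
    """Yield (row, column, value) triples, 1-based, starting at `first_row`."""
    for r, record in enumerate(df.itertuples(index=False), start=first_row):
        for c, value in enumerate(record, start=1):
            yield r, c, value


def fill_template(df, template_path, output_path, sheet_name=None):
    """Copy `df` into an existing workbook below its header row."""
    from openpyxl import load_workbook  # imported here: only this step needs it

    workbook = load_workbook(template_path)
    sheet = workbook[sheet_name] if sheet_name else workbook.active
    for row, column, value in frame_to_cells(df):
        sheet.cell(row=row, column=column, value=value)
    workbook.save(output_path)


demo = pd.DataFrame({"name": ["a", "b"], "share": [0.25, 0.75]})
cells = list(frame_to_cells(demo))
print(cells)
```

Saving to a separate output_path keeps the empty template reusable for the next run.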