I'm trying to append 3 dataframes to 3 existing sheets in an Excel file (one dataframe per sheet).
This is my code:
with pd.ExcelWriter(output_path, mode="a", if_sheet_exists="overlay") as writer:
    df_a.to_excel(writer, sheet_name="A", index=False)
    df_b.to_excel(writer, sheet_name="B", index=False)
    df_c.to_excel(writer, sheet_name="C", index=False)
However, the new data overwrites the existing data instead of being appended below it, even though I set mode="a" and if_sheet_exists="overlay".
How should I fix it?
You have to find the last used row and start the new dataframe right after it. Assuming you already have some data and headers in place, you can test it like below:
with pd.ExcelWriter(output_path, mode="a", if_sheet_exists="overlay") as writer:
    # get the number of existing data rows in sheet "A"; +1 accounts for the header row
    lrow = pd.read_excel(output_path, sheet_name="A").shape[0] + 1
    # startrow tells to_excel where to start placing the new data
    dff.to_excel(writer, sheet_name="A", index=False, header=False, startrow=lrow)
The .shape attribute gives the number of rows and columns as a tuple, e.g. (1235, 66); using .shape[0], only the row count is taken.
I have an Excel file consisting of multiple worksheets, and each worksheet has one column named "Close" containing numbers. Using Python, I want to combine all the worksheets into one worksheet, with the "Close" columns side by side and each worksheet's title as the header for its column. How can I do that?
The concat method in pandas just puts everything in one column without using the worksheet name as the title. My code is below:
df_combined = pd.DataFrame()
ex_file = pd.ExcelFile('123.xlsx')
result = pd.concat([df_combined], axis=1)
df_combined.to_excel(writer2, index=False, header=False)
To make pd.concat work fully, you need to concatenate all the dataframes into one: either build a list of all the dataframes and call concat once, or concatenate iteratively in a loop. I advise reading one worksheet at a time.
The second approach:
df_combined = pd.DataFrame()
file = ""  # path to your workbook
for worksheet_name in worksheet_names:  # assuming worksheet_names is a list of all your worksheets
    ws_dataframe = pd.read_excel(file, sheet_name=worksheet_name)
    df_combined = pd.concat([df_combined, ws_dataframe], axis=1)
df_combined.to_excel(writer2, index=False, header=False)
Before exporting your dataframe df_combined, you can change the columns to include the name of each worksheet, as a kind of MultiIndex column. For instance:
df_combined.columns = pd.MultiIndex.from_product([worksheet_names, ['Close']], names=['Worksheetname', 'col'])
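A compact end-to-end sketch of the same idea, using the fact that pd.concat labels the columns with the dict keys (sheet titles) when given a dict (the function name is mine; '123.xlsx' is the file from the question):

```python
import pandas as pd

def combine_close_columns(path):
    """Place the "Close" column of every worksheet side by side,
    each headed by its worksheet's title."""
    xls = pd.ExcelFile(path)
    columns = {}
    for name in xls.sheet_names:
        # keep only the "Close" column of this sheet
        columns[name] = xls.parse(name)["Close"]
    # concat on a dict uses the keys (sheet titles) as column headers
    return pd.concat(columns, axis=1)

# e.g. combine_close_columns("123.xlsx").to_excel("combined.xlsx", index=False)
```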
My first time using pandas. I am sure the answer is something along the lines of storing the worksheet names in a list, then looping through the list based on the name I am looking for. I'm just not experienced enough to know how to do that.
The goal is to use pandas to extract and concatenate data from multiple worksheets in a user-selected workbook, the final output being a single-worksheet Excel file containing all the data extracted from the various worksheets.
The Excel workbook consists of approximately 100 worksheets. The quantity of visible sheets will always vary, and the quantity of sheets occurring before 'Main Frames BUP1' is variable as well.
I currently have the portion of code checking for sheet visibility working. I cannot seem to figure out how to start at a specific worksheet when that worksheet's position in the workbook can vary (i.e. it is not always the 3rd worksheet counting from 0; it could be the 5th in a user's workbook). It will, however, always be the sheet that data should start being pulled from. Everything I find are examples of specifying specific sheets to read.
Any help/direction would be appreciated.
# user selected file from GUI
xl = values["-xl_file-"]
loc = os.path.dirname(xl)
xls = pd.ExcelFile(xl)
sheets = xls.book.worksheets
for x in sheets:
    print(x.title, x.sheet_state)
    if x.sheet_state == 'visible':
        df = pd.concat(pd.read_excel(xls, sheet_name=None, header=None,
                                     skiprows=5, nrows=32, usecols='M:AD'),
                       ignore_index=True)
writer = pd.ExcelWriter(f'{loc}/test.xlsx')
df.to_excel(writer, 'bananas')
writer.save()
Additional clarification on the final goal: exclude all sheets occurring before 'Main Frames BUP 1', only consider visible sheets, pull data from 'M6:AD37', skip (or at least remove) rows that are entirely blank, and stop pulling data at the sheet just before the first worksheet whose name partially matches 'panel'.
If I create a dictionary of visible sheets, how do I build a new dictionary from it consisting only of 'Main Frames BUP 1' through whatever sheet occurs just before a partial match of 'panel'? Then I can use that dictionary for my data pull.
I created a minimal sample myself and worked it out for you.
xls = pd.ExcelFile('data/Test.xlsx')
sheets = xls.book.worksheets
sList = [x.title for x in sheets if x.sheet_state == 'visible']
dfs = [pd.read_excel('data/Test.xlsx', sheet_name=s, skiprows=5, nrows=32, usecols='M:AD') for s in sList]
dfconcat = pd.concat(dfs)
Now you need to adjust the columns, headers and so on as you did in your question. I hope that it works out for you. On my side it worked like a charm.
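The question's follow-up asks how to keep only the sheets from 'Main Frames BUP 1' up to just before the first sheet whose name partially matches 'panel'. One way is to slice the visible-sheet list before reading; a sketch (the helper name is mine, and it assumes the start sheet is present in the list, as in sList above):

```python
def select_range(sheet_names, start, stop_substring):
    """Return the slice of sheet_names from `start` up to (but not
    including) the first name containing `stop_substring`."""
    begin = sheet_names.index(start)
    for i, name in enumerate(sheet_names[begin:], begin):
        # case-insensitive partial match marks the end of the range
        if stop_substring.lower() in name.lower():
            return sheet_names[begin:i]
    return sheet_names[begin:]

# e.g. wanted = select_range(sList, 'Main Frames BUP 1', 'panel')
```

You can then feed the returned names into the read_excel list comprehension instead of the full sList.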
It is a bit hard without actually seeing what is going on with your data.
I believe what you are missing is that you need to create one dataframe first and then concat the others onto it. Also, you need to pass a specific sheet (x) for pandas to read it; with sheet_name=None, read_excel returns a dictionary of dataframes instead. In case it does not work, get the first sheet, create a df, then concat.
# user selected file from GUI
xl = values["-xl_file-"]
loc = os.path.dirname(xl)
xls = pd.ExcelFile(xl)
sheets = xls.book.worksheets
df = pd.DataFrame()
for x in sheets:
    print(x.title, x.sheet_state)
    if x.sheet_state == 'visible':
        # pass the sheet's name (not the openpyxl Worksheet object) to read_excel
        sheet_df = pd.read_excel(xls, sheet_name=x.title, header=None,
                                 skiprows=5, nrows=32, usecols='M:AD')
        df = pd.concat([df, sheet_df], ignore_index=True)
with pd.ExcelWriter(f'{loc}/test.xlsx') as writer:
    df.to_excel(writer, sheet_name='bananas')
You can also put all the dfs in a dictionary; again, it is difficult without knowing what you are working with.
xl = pd.ExcelFile('yourFile.xlsx')
# build a dictionary of dataframes from all sheets by passing None to sheet_name
diDF = pd.read_excel('yourFile.xlsx', sheet_name=None)
# sheet visibility lives on the openpyxl workbook, not on the sheet-name strings
visible = [ws.title for ws in xl.book.worksheets if ws.sheet_state == 'visible']
dfs = {name: diDF[name] for name in visible}
I'm trying to write a pandas DataFrame to multiple Excel sheets, and the sheet names are determined by the "Service Type" column.
In my function, I'm trying to write some code that looks through each column in the Excel worksheets and auto-adjusts the width so all of the text in each row is visible.
I think what I have written so far could work, but I'm not sure how to properly identify the sheet_name since I'm looking at a str(index).
This is what I've written so far:
# Create a final DataFrame called "final_df" where rows have an Error value of 1
final_df = stacked_df[stacked_df.Error == 1]
# Create a Pandas Excel writer using XlsxWriter as the engine
writer = pd.ExcelWriter(LOGNAME, engine='xlsxwriter')
# Group the output by "Service type" and save each DataFrame to a separate sheet
for index, group_df in final_df.groupby("Service type"):
    group_df.to_excel(writer, sheet_name=str(index), index=False)
# Auto-adjust each column's width
for column in stacked_df:
    column_width = max(stacked_df[column].astype(str).map(len).max(), len(column))
    col_idx = stacked_df.columns.get_loc(column)
    writer.sheets[sheet_name=str(index)].set_column(col_idx, col_idx, column_width)
# Close the Pandas Excel writer and output the Excel file
writer.save()
How do I make this work? Thank you.
The type of writer.sheets is dict, where the keys are the names of the sheets and the values are Worksheet objects, so the way you're trying to reference the sheets is not correct.
writer.sheets[sheet_name=str(index)] INCORRECT
writer.sheets[sheet_name_as_string] CORRECT
Beyond that, there seems to be a problem with the logic: the index variable you're trying to use in the second loop is not defined. If you're trying to use the index from the first for-loop, then you should nest the loops.
For example:
writer = pd.ExcelWriter(LOGNAME, engine="xlsxwriter")
for sheet_idx, group_df in data.groupby("Service type"):
    # Create a worksheet from the current groupby group
    group_df.to_excel(writer, sheet_name=str(sheet_idx), index=False)
    # Loop through the columns of the current worksheet
    # and set the correct width for each one
    for column in group_df:
        column_width = max(group_df[column].astype(str).map(len).max(), len(column))
        col_idx = group_df.columns.get_loc(column)
        writer.sheets[str(sheet_idx)].set_column(col_idx, col_idx, column_width)
writer.close()
I have a dataframe with 14 columns and about 300 rows. What I want to do is create an xlsx with multiple sheets, each sheet holding a single row of the main dataframe. I'm setting it up like this because I want to append to these individual sheets every day for a new instance of the same row to see how the column values for the unique rows change over time. Here is some code.
tracks_df = pd.read_csv('final_outputUSA.csv')
writer2 = pd.ExcelWriter('please.xlsx', engine='xlsxwriter')
for track in tracks_df:
    tracks_df.to_excel(writer2, sheet_name="Tracks", index=False, header=True)
writer2.save()
writer2.close()
Right now this just outputs the exact same format as the csv that I'm reading in. I know that I'm going to need to dynamically change the sheet_name based on an indexed value; I would like each sheet_name to be that row's value in df['Col1']. How do I output an xlsx with a separate sheet for each row of my dataframe?
Try this:
writer2 = pd.ExcelWriter('please.xlsx', engine='xlsxwriter')
# x['Col1'] is a scalar here, so use str() rather than .astype('str')
tracks_df.apply(lambda x: x.to_frame().T.to_excel(writer2, sheet_name=str(x['Col1']), index=True, header=True), axis=1)
writer2.close()
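An equivalent, arguably more readable loop over the rows. This is a sketch with a helper name of my own; note that Excel caps sheet names at 31 characters and requires them to be unique, so the 'Col1' values are truncated here on the assumption that they remain distinct:

```python
import pandas as pd

def one_sheet_per_row(df, path, name_col):
    """Write each row of df to its own sheet, named after that
    row's value in name_col (truncated to Excel's 31-char limit)."""
    with pd.ExcelWriter(path) as writer:
        for _, row in df.iterrows():
            sheet = str(row[name_col])[:31]
            # a Series transposed back into a one-row frame
            row.to_frame().T.to_excel(writer, sheet_name=sheet, index=False)

# e.g. one_sheet_per_row(tracks_df, 'please.xlsx', 'Col1')
```

For the daily-append part of the question, the startrow technique from the first answer on this page can then be applied per sheet.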
I have a Pandas DataFrame with a bunch of rows and labeled columns.
I also have an excel file which I prepared with one sheet which contains no data but only
labeled columns in row 1 and each column is formatted as it should be: for example if I
expect percentages in one column then that column will automatically convert a raw number to percentage.
What I want to do is fill the raw data from my DataFrame into that Excel sheet in such a way
that row 1 remains intact so the column names remain. The data from the DataFrame should fill
the excel rows starting from row 2 and the pre-formatted columns should take care of converting
the raw numbers to their appropriate type, hence filling the data should not override the column format.
I tried using openpyxl but it ended up creating a new sheet and overriding everything.
Any help?
If you're certain the column order is the same, you can try this after opening the workbook with openpyxl:
df.to_excel(writer, startrow=1, index=False, header=False)  # startrow is 0-indexed, so 1 is Excel row 2
Note that the xlsxwriter engine cannot modify an existing workbook: writing with df.to_excel('filename.xlsx', engine='xlsxwriter', sheet_name='sheetname', index=False) creates a brand-new file, so the pre-formatted columns would be lost. To preserve the existing sheet and its formatting, stay with the openpyxl engine in append mode.
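Putting it together, a minimal sketch using openpyxl's overlay mode (the function name is mine, and the file/sheet names are placeholders for the pre-formatted template from the question):

```python
import pandas as pd

def fill_template(df, path, sheet):
    """Write df's values into an existing, pre-formatted sheet,
    starting at Excel row 2 so the header row stays intact."""
    with pd.ExcelWriter(path, mode="a", engine="openpyxl",
                        if_sheet_exists="overlay") as writer:
        # startrow=1 is the second Excel row; header=False keeps row 1 as-is
        df.to_excel(writer, sheet_name=sheet, index=False,
                    header=False, startrow=1)

# e.g. fill_template(my_df, "template.xlsx", "Sheet1")
```

Overlay mode writes cell values into the existing sheet rather than replacing it, which is what lets the pre-set column formats survive.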