Fill an existing Excel file with data from a Pandas DataFrame - python

I have a Pandas DataFrame with a bunch of rows and labeled columns.
I also have an Excel file that I prepared with a single sheet. It contains no data, only
column labels in row 1, and each column is pre-formatted: for example, if I
expect percentages in a column, that column automatically displays a raw number as a percentage.
I want to write the raw data from my DataFrame into that sheet in such a way
that row 1 (the column names) stays intact. The data from the DataFrame should fill
the Excel rows starting from row 2, and the pre-formatted columns should take care of displaying
the raw numbers in the right format, so writing the data must not overwrite the column formats.
I tried using openpyxl, but it ended up creating a new sheet and overwriting everything.
Any help?

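If you're certain the column order is the same, you can try this after opening the workbook with an openpyxl-backed writer in append mode, e.g. pd.ExcelWriter('filename.xlsx', engine='openpyxl', mode='a', if_sheet_exists='overlay') (the 'overlay' option needs pandas 1.4 or later). Note that startrow is zero-based, so startrow=1 starts writing at Excel row 2:
df.to_excel(writer, sheet_name='sheetname', startrow=1, index=False, header=False)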

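If the number and order of your columns are the same, you may also try xlsxwriter and name the sheet you want to refresh. Be aware that xlsxwriter can only create new files, so this rewrites filename.xlsx from scratch and does not keep the formatting of an existing workbook:
df.to_excel('filename.xlsx', engine='xlsxwriter', sheet_name='sheetname', index=False)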
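
One way to keep row 1 and the existing cell formats is to open the prepared workbook with openpyxl directly and set only the cell values. The following is a minimal sketch, assuming the DataFrame columns are in the same order as the sheet's columns; the helper name fill_template and its parameters are made up for illustration. Since openpyxl applies styles per cell, a format that exists only at the column level may not carry over to cells that openpyxl has to create.

```python
import pandas as pd
from openpyxl import load_workbook


def fill_template(df, path, sheet_name, first_row=2):
    """Write df values into an existing sheet, starting at first_row.

    Hypothetical helper: only cell values are set, so the header in
    row 1 and the number formats of existing cells are left untouched.
    """
    wb = load_workbook(path)
    ws = wb[sheet_name]
    for r, row in enumerate(df.itertuples(index=False), start=first_row):
        for c, value in enumerate(row, start=1):
            ws.cell(row=r, column=c).value = value
    wb.save(path)
```

Usage would look like fill_template(df, 'filename.xlsx', 'sheetname').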

Related

Make pandas to_excel stop styling

I'm using Pandas to edit an Excel file which other people are using. But when I save it using df.to_excel, Pandas adds an ugly-looking black border to the cells in the header and in the index. I want it to be written in a plain format, the way a CSV file would look if I opened it in Excel. It would be even better if it were written back using the same styles it was read in with.
Is there any way to make df.to_excel write without styling or with the original styles?
Thanks.
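Try this trick:
import io
import pandas as pd
# Round-trip through CSV so the header and index become ordinary data cells
(pd.read_csv(io.StringIO(df.to_csv()), header=None)
    .to_excel("output.xlsx", header=False, index=False))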
If you still want index and header values - but without styling, you can use this (requires openpyxl):
import pandas as pd

def insert_dataframe(df, sheet, start_row=1, start_col=1):
    """Inserts a dataframe into an openpyxl sheet at the given (row, col) position.

    Parameters
    ----------
    df : pandas.DataFrame
        Any dataframe
    sheet : openpyxl.worksheet.worksheet
        The sheet where the dataframe should be inserted
    start_row : int
        The row where the dataframe should be inserted (default is 1)
    start_col : int
        The column where the dataframe should be inserted (default is 1)
    """
    # iterate dataframe index names and insert
    for name_idx, name in enumerate(df.index.names):
        label_col = start_col + name_idx
        sheet.cell(row=start_row, column=label_col, value=name)
        # for each name iterate values as rows in the current index name column
        value_row = start_row + 1
        for i_value in list(df.index.values):
            if isinstance(df.index, pd.MultiIndex):
                val = i_value[name_idx]
            else:
                val = i_value
            sheet.cell(row=value_row, column=label_col, value=val)
            # go to next row
            value_row += 1
    row_idx = 0
    col_idx = label_col + 1
    # insert values
    for label, content in df.items():
        sheet.cell(row=start_row, column=col_idx, value=label)
        for row_idx, value_ in enumerate(content):
            sheet.cell(row=start_row + row_idx + 1, column=col_idx, value=value_)
        col_idx += 1
Gist: https://gist.github.com/Aer0naut/094ff1b6838b2177a4222591ace8f6bf
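
A shorter route to the same unstyled result is openpyxl's own dataframe_to_rows utility, which yields the header and the data rows as plain lists. This is only a sketch under the assumption that a flat, unstyled dump is all that is needed; the function name write_plain is made up for illustration.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows


def write_plain(df, path, index=False):
    """Write df to a new workbook without any header or index styling.

    Hypothetical helper: rows are appended as plain values, so no
    borders or bold fonts are added to the header cells.
    """
    wb = Workbook()
    ws = wb.active
    for row in dataframe_to_rows(df, index=index, header=True):
        ws.append(row)
    wb.save(path)
```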

OpenPyXL set number_format for the whole column

I'm exporting some data to Excel, and I've managed to format each populated cell in a column in the exported file with this:
import openpyxl
from openpyxl.utils import get_column_letter
wb = openpyxl.Workbook()
ws = wb.active
# Add rows to worksheet
for row in data:
    ws.append(row)
# Disable formatting numbers in columns from column `D` onwards
# (note: column index 3 is `C`; use 4 to start at `D`)
# Need to do this manually for every cell
for col in range(3, ws.max_column+1):
    for cell in ws[get_column_letter(col)]:
        cell.number_format = '#'
# Export data to Excel file...
But this only formats the populated cells in each column. The other cells in the column still have General formatting.
How can I set all the empty cells in these columns to # as well, so that anyone who edits cells in these columns of the exported Excel file will not have problems entering, let's say, phone numbers as actual numbers?
For openpyxl you must always set the styles for every cell individually. If you set them for the column, then Excel will apply them when it creates new cells, but styles are always still applied to individual cells.
Because your loop only iterates over the cells that already exist in each column (up to the last populated row), those are the only cells that get reformatted. You can't reformat a whole column this way, but you can set the format on every cell up to a chosen row:
last_cell = 100
for col in range(3, ws.max_column+1):
    for row in range(1, last_cell + 1):
        # ws.cell() creates the cell if it does not exist yet
        ws.cell(column=col, row=row).number_format = '#'  # digits only; use '@' for Text
This formats every cell in those columns up to row last_cell. It's not exactly what you need, but it's close enough.
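
If the goal is for values such as phone numbers to be kept exactly as typed, Excel's Text format ('@') may be a better fit than '#'. The sketch below pre-formats a block of cells, including empty ones; the helper name preformat_as_text and the chosen bounds are assumptions for illustration only.

```python
from openpyxl import Workbook


def preformat_as_text(ws, first_col, last_col, last_row):
    """Set the Text number format on every cell in the given block.

    Hypothetical helper: ws.cell() creates cells that do not exist yet,
    so empty cells are formatted as well.
    """
    for col in range(first_col, last_col + 1):
        for row in range(1, last_row + 1):
            ws.cell(row=row, column=col).number_format = "@"


wb = Workbook()
ws = wb.active
# Columns D and E (4 and 5), rows 1 to 50; the bounds are arbitrary.
preformat_as_text(ws, first_col=4, last_col=5, last_row=50)
```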
Conditional formatting is a workaround for putting a number format on an entire column. To apply a thousands separator to a whole column, this worked for me:
from openpyxl.formatting.rule import Rule
from openpyxl.styles.differential import DifferentialStyle
from openpyxl.styles.numbers import NumberFormat
diff_style = DifferentialStyle(numFmt = NumberFormat(numFmtId='4',formatCode='#,##0.00'))
rule1 = Rule(type="expression", dxf=diff_style)
rule1.formula = ["=NOT(ISBLANK($H2))"]  # column on which the thousands separator is to be applied
work_sheet.conditional_formatting.add("$H2:$H500001", rule1)  # provide a range of cells

Excel Writer Python Separate Sheet For Each Row/Index In DataFrame

I have a dataframe with 14 columns and about 300 rows. What I want to do is create an xlsx with multiple sheets, each sheet holding a single row of the main dataframe. I'm setting it up like this because I want to append to these individual sheets every day for a new instance of the same row to see how the column values for the unique rows change over time. Here is some code.
tracks_df = pd.read_csv('final_outputUSA.csv')
writer2 = pd.ExcelWriter('please.xlsx', engine='xlsxwriter')
for track in tracks_df:
    tracks_df.to_excel(writer2, sheet_name="Tracks", index=False, header=True)
writer2.save()
writer2.close()
Right now this just outputs the exact same format as the CSV that I'm reading in. I know that I'm going to need to change the sheet_name dynamically based on an indexed value; I would like each sheet to be named after that row's value in df['Col1']. How do I output an xlsx with a separate sheet for each row in my dataframe?
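Try this:
writer2 = pd.ExcelWriter('please.xlsx', engine='xlsxwriter')
# x is one row (a Series), so x['Col1'] is a scalar: convert it with str(), not .astype()
tracks_df.apply(lambda x: x.to_frame().T.to_excel(writer2, sheet_name=str(x['Col1']), index=True, header=True), axis=1)
# close() also saves the file; ExcelWriter.save() was removed in pandas 2.0
writer2.close()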
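
The same idea can be written as an explicit loop with the writer used as a context manager, which closes the file even if a row fails. This is a sketch, not the answerer's code: write_row_sheets is a made-up name, the values in the naming column are assumed to be unique and free of characters Excel forbids in sheet names, and names are truncated to Excel's 31-character limit. The engine is left to pandas' default here; engine='xlsxwriter' could be passed to pd.ExcelWriter as in the question.

```python
import pandas as pd


def write_row_sheets(df, path, name_col):
    """Write each row of df to its own sheet, named after row[name_col].

    Hypothetical helper; assumes the values in name_col are unique.
    """
    with pd.ExcelWriter(path) as writer:
        for _, row in df.iterrows():
            sheet = str(row[name_col])[:31]  # Excel caps sheet names at 31 chars
            # A row is a Series; turn it back into a one-row DataFrame.
            row.to_frame().T.to_excel(writer, sheet_name=sheet, index=False)
```

Usage would look like write_row_sheets(tracks_df, 'please.xlsx', 'Col1').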

Cleaning dataframe- assign value in one cell to column

I am reading multiple CSV files from a folder into a dataframe. I loop for all the files in the folder and then concat the dataframes to obtain the final dataframe.
However the CSV file has one summary row from which I want to extract the date, and then add as a new column for all the rows in that csv/dataframe.
df=pd.read_csv(f,header=None,names=['Inverter',"Day Yield",'month Yield','Year Yield','SpecificYieldDay','SYMth','SYYear','Power'],sep=';', **kwargs)
df['date']=df.loc[[0],['Day Yield']]
df
I expect the 'date' column to be filled with that file's date for every row of that particular CSV, but it gets filled correctly only for the first row.
I want all the rows of the 'date' column to show 7/25/2019 instead of only the first row.
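If I understood correctly, the value that you want to add as a new column for all rows is in row 0 of the 'Day Yield' column. df.loc[[0],['Day Yield']] returns a one-row DataFrame, and assigning it aligns on the index, which is why only row 0 gets a value.
If that is correct, select the value as a scalar instead; pandas broadcasts a scalar to every row:
df = df.assign(date=df.loc[0, 'Day Yield'])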
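
To illustrate the scalar broadcast, here is a small self-contained sketch. The CSV text and the shortened column list are invented sample data, not the asker's file; only the 7/25/2019 date and the 'Day Yield' column name come from the question.

```python
import io

import pandas as pd

# Invented sample: the first line is the summary row that carries the date.
raw = "Summary;7/25/2019;\nINV1;10.5;300\nINV2;11.0;310\n"

df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=["Inverter", "Day Yield", "month Yield"],
    sep=";",
)

# .loc with a scalar row label and a scalar column label returns one value,
# which pandas then repeats for every row of the new column.
df["date"] = df.loc[0, "Day Yield"]
print(df)
```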

Python 3 Pandas - Column with extra \n\t\t\t\t\t

I have a dataframe from a .xls spreadsheet and I print off the columns print(df.columns.values) and the output contains a column with the name: Poll Responses\n\t\t\t\t\t.
I look in the excel sheet and in the cell column header, there's no additional spaces or tabs.
So in order to get the data from those columns, I have to use print(df['Poll Responses\n\t\t\t\t\t'])
Is this just how it is, or am I doing something wrong?
Use .str.strip:
df.columns = df.columns.str.strip()
This strips leading and trailing whitespace (including \n and \t) from the column headings in the dataframe.
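
A quick way to see the effect, using a made-up two-column frame whose first header mimics the one in the question:

```python
import pandas as pd

# Invented sample frame; the first header has the trailing "\n\t\t\t\t\t".
df = pd.DataFrame({"Poll Responses\n\t\t\t\t\t": [1, 2], "Name": ["a", "b"]})

df.columns = df.columns.str.strip()
print(list(df.columns))  # ['Poll Responses', 'Name']
```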
