I'm using Pandas to edit an Excel file that other people also use. But when I save it with df.to_excel, Pandas adds an ugly black border to the cells in the header and in the index. I want it written in a plain format, the way a CSV file looks when opened in Excel. It would be even better if it were written back using the same styles it was read in with.
Is there any way to make df.to_excel write without styling, or with the original styles?
Thanks.
Try this trick:
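import io
import pandas as pd
# Round-trip through CSV so the header and index become ordinary data rows
plain = pd.read_csv(io.StringIO(df.to_csv()), header=None)
plain.to_excel("output.xlsx", header=False, index=False)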
If you still want index and header values - but without styling, you can use this (requires openpyxl):
import pandas as pd

def insert_dataframe(df,sheet,start_row=1,start_col=1):
    """Inserts a dataframe into an openpyxl sheet at the given (row,col) position.
    Parameters
    ----------
    df : pandas.DataFrame
        Any dataframe
    sheet : openpyxl.worksheet.worksheet.Worksheet
        The sheet where the dataframe should be inserted
    start_row : int
        The row where the dataframe should be inserted (default is 1)
    start_col : int
        The column where the dataframe should be inserted (default is 1)
    """
    #iterate dataframe index names and insert
    for name_idx, name in enumerate(df.index.names):
        label_col=start_col+name_idx
        sheet.cell(row=start_row, column=label_col, value=name)
        #for each name iterate values as rows in the current index name column
        value_row=start_row+1
        for i_value in list(df.index.values):
            if isinstance(df.index, pd.MultiIndex):
                val=i_value[name_idx]
            else:
                val=i_value
            sheet.cell(row=value_row, column=label_col, value=val)
            #goto next row
            value_row+=1
    #data columns start right after the last index column
    col_idx=label_col+1
    #insert values
    for label,content in df.items():
        sheet.cell(row=start_row, column=col_idx, value=label)
        for row_idx,value_ in enumerate(content):
            sheet.cell(row=start_row+row_idx+1, column=col_idx, value=value_)
        col_idx+=1
Gist: https://gist.github.com/Aer0naut/094ff1b6838b2177a4222591ace8f6bf
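As another option, the workbook can be built directly with openpyxl so that pandas' header styling is never applied. This is only a minimal sketch, not a drop-in replacement for df.to_excel: the helper name write_plain is made up for illustration, and it assumes single-level (non-MultiIndex) column labels.

```python
import io

import openpyxl
import pandas as pd


def write_plain(df, target, index=False):
    """Write df to an .xlsx target (path or binary buffer) without cell styling.

    Hypothetical helper; assumes single-level column labels.
    """
    out = df.reset_index() if index else df
    wb = openpyxl.Workbook()
    ws = wb.active
    # Header row is appended as plain values, so no bold font or border is set.
    ws.append([str(c) for c in out.columns])
    # itertuples yields one tuple of values per DataFrame row.
    for row in out.itertuples(index=False):
        ws.append(list(row))
    wb.save(target)


df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})
buffer = io.BytesIO()  # a file path such as "output.xlsx" works the same way
write_plain(df, buffer)

# Read the result back to inspect the styling of the header cell.
buffer.seek(0)
header_cell = openpyxl.load_workbook(buffer).active["A1"]
```

Passing index=True writes the index as the first column by way of reset_index.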
Related
I'm exporting some data to Excel and I've successfully implemented formatting each populated cell in a column when exporting into Excel file with this:
import openpyxl
from openpyxl.utils import get_column_letter
wb = openpyxl.Workbook()
ws = wb.active
# Add rows to worksheet
for row in data:
    ws.append(row)
# Disable formatting numbers in columns from column `D` onwards
# Need to do this manually for every cell
for col in range(3, ws.max_column+1):
    for cell in ws[get_column_letter(col)]:
        cell.number_format = '#'
# Export data to Excel file...
But this only formats the populated cells in each column. The other cells in the column still have General formatting.
How can I set all the empty cells in these columns to # as well, so that anyone who edits them in the exported Excel file can enter, let's say, phone numbers as actual numbers without problems?
For openpyxl you must always set the styles for every cell individually. If you set them for the column, then Excel will apply them when it creates new cells, but styles are always still applied to individual cells.
Because you iterate only over the existing cells in each column (up to the last populated row), those are the only cells that get reformatted. While you can't reformat a whole column this way, you can apply the format to every cell up to a chosen row:
last_cell = 100
for col in range(3, ws.max_column+1):
    for row in range(1, last_cell):
        ws.cell(column=col, row=row).number_format = '#'  # '#' is a numeric digit placeholder; use '@' for Text
This formats every cell in each of those columns up to, but not including, row last_cell. It's not exactly what you need, but it's close enough.
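Combining the two points above (a column-level style for cells that Excel creates later, plus per-cell styles for cells that already exist) might look like the sketch below. The sample rows, the starting column, and the last_row bound are illustrative assumptions, not values from the question.

```python
import openpyxl
from openpyxl.utils import get_column_letter

wb = openpyxl.Workbook()
ws = wb.active
# Illustrative data: four columns, two rows.
for row in [["id", "name", "city", "phone"], [1, "Ann", "Oslo", 5551234]]:
    ws.append(row)

number_format = "#"  # use "@" instead to have entries stored as Text
first_col = 4        # column D
last_row = 100       # assumed upper bound for rows that users may fill in

for col in range(first_col, ws.max_column + 1):
    letter = get_column_letter(col)
    # Column-level default, intended for cells Excel creates later.
    ws.column_dimensions[letter].number_format = number_format
    # Existing and pre-created cells still carry their own style.
    for r in range(1, last_row + 1):
        ws.cell(row=r, column=col).number_format = number_format
```

Columns to the left of first_col are left at General.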
Conditional formatting can work around this by applying a number format across an entire column range. To apply a thousands separator to a whole column, this worked for me:
from openpyxl.formatting.rule import Rule
from openpyxl.styles.differential import DifferentialStyle
from openpyxl.styles.numbers import NumberFormat
diff_style = DifferentialStyle(numFmt=NumberFormat(numFmtId='4', formatCode='#,##0.00'))
rule1 = Rule(type="expression", dxf=diff_style)
rule1.formula = ["=NOT(ISBLANK($H2))"]  # column on which the thousands separator is to be applied
work_sheet.conditional_formatting.add("$H2:$H500001", rule1)  # provide a range of cells
I have a Pandas DataFrame with a bunch of rows and labeled columns.
I also have an excel file which I prepared with one sheet which contains no data but only
labeled columns in row 1 and each column is formatted as it should be: for example if I
expect percentages in one column then that column will automatically convert a raw number to percentage.
What I want to do is fill the raw data from my DataFrame into that Excel sheet in such a way
that row 1 remains intact so the column names remain. The data from the DataFrame should fill
the excel rows starting from row 2 and the pre-formatted columns should take care of converting
the raw numbers to their appropriate type, hence filling the data should not override the column format.
I tried using openpyxl but it ended up creating a new sheet and overriding everything.
Any help?
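If you're certain the column order is the same, you can try this, where writer is a pd.ExcelWriter opened on the file with the openpyxl engine (startrow is zero-based, so startrow=1 starts writing at Excel row 2):
df.to_excel(writer, startrow=1, index=False, header=False)
If the number and order of columns are the same, you may also try xlsxwriter and specify the name of the sheet you want to refresh. Note that xlsxwriter can only create new files, so this overwrites filename.xlsx rather than updating it in place:
df.to_excel('filename.xlsx', engine='xlsxwriter', sheet_name='sheetname', index=False)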
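Another way to keep row 1 and the existing cell formats is to open the template with openpyxl and assign only cell values. The sketch below builds a stand-in template in memory so that it runs on its own; the helper name fill_sheet and the sample columns are assumptions for illustration. With a real file you would call openpyxl.load_workbook on the template path and save the workbook afterwards.

```python
import io

import openpyxl
import pandas as pd


def fill_sheet(ws, df, start_row=2):
    """Write df values into ws starting at start_row, leaving row 1 untouched.

    Hypothetical helper; assumes df columns are in the same order as the sheet's.
    """
    for r_offset, row in enumerate(df.itertuples(index=False)):
        for c_offset, value in enumerate(row):
            # Assigning .value leaves the cell's existing number format alone.
            ws.cell(row=start_row + r_offset, column=1 + c_offset).value = value


# Stand-in template: headers in row 1, column B pre-formatted as a percentage.
template = openpyxl.Workbook()
sheet = template.active
sheet.append(["name", "share"])
for r in range(2, 4):
    sheet.cell(row=r, column=2).number_format = "0.00%"

df = pd.DataFrame({"name": ["a", "b"], "share": [0.25, 0.5]})
fill_sheet(sheet, df)

buffer = io.BytesIO()  # stands in for saving to a real .xlsx path
template.save(buffer)
buffer.seek(0)
result = openpyxl.load_workbook(buffer).active
```

One caveat: this relies on the template cells themselves carrying the format. If the template formats whole columns rather than individual cells, the format may need to be set on each written cell as well.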
We have a data system that creates tables of data as Excel files. I'm trying to import this Excel file into a pandas dataframe.
In the Excel file, row 1 is some metadata I don't want, while row 2 is the column header. By default, Pandas correctly uses column 1 (a lot number) as the index. The second column is a production date, but for whatever reason it has no header in row 2.
Pandas seems to create a MultiIndex by default. Is there a way to suppress this behavior? It seems to happen because there is no column header in row 2, column 2 (cell B2). If I manually edit the Excel file to add a label, it imports as I want.
import pandas as pd
xlsx01 = pd.ExcelFile("C:/Users/maherp/Desktop/JunkFiles/Book1.xlsx")
df_01 = pd.read_excel(xlsx01, header=1)
I get an error that I cannot decipher when I try:
df_01 = pd.read_excel(xlsx01, header=1, index_col=0)
As suggested by @Peej1226, here is the final solution that worked:
df_01 = pd.read_excel(xlsx01, sheet_name='Discrete', skiprows=1, header=0,index_col=0)
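To try the same call without the original file, the sketch below builds a small stand-in workbook in memory with a metadata row and a blank B2 header. The sheet contents and the replacement label ProductionDate are made up for illustration.

```python
import io

import openpyxl
import pandas as pd

# Stand-in for the exported file: row 1 is metadata, row 2 is the header (B2 blank).
wb = openpyxl.Workbook()
ws = wb.active
ws.title = "Discrete"
ws.append(["exported by data system"])
ws.append(["Lot", None, "Value"])
ws.append(["L001", "2020-01-01", 1.5])
ws.append(["L002", "2020-01-02", 2.5])
buffer = io.BytesIO()
wb.save(buffer)
buffer.seek(0)

# Skip the metadata row, take the next row as header, use the first column as index.
df_01 = pd.read_excel(buffer, sheet_name="Discrete", skiprows=1, header=0, index_col=0)

# Optionally give the unlabeled column a real name (hypothetical label).
df_01 = df_01.rename(
    columns=lambda c: "ProductionDate" if str(c).startswith("Unnamed") else c
)
```

Renaming after the read avoids having to edit the Excel file by hand.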
I'm trying to add a new column to a pandas DataFrame. I'm also trying to give the index a name so that it is printed in Excel when I export the data.
import pandas as pd
import csv
#read csv file
file='RALS-04.csv'
df=pd.read_csv(file)
#select the columns that I want
column1=df.iloc[:,0]
column2=df.iloc[:,2]
column3=df.iloc[:,3]
column1.index.name="items"
column2.index.name="march2012"
column3.index.name="march2011"
df=pd.concat([column1, column2, column3], axis=1)
#create a new column with 'RALS' as a default value
df['comps']='RALS'
#writing them back to a new CSV file
with open('test.csv','a') as f:
    df.to_csv(f, index=False, header=True)
In the output, the 'RALS' value that I added goes down to row 2000, while the data stops at row 15. How can I constrain 'RALS' so that it doesn't extend beyond the length of the exported data? I would prefer an elegant, automated way over specifying the row at which the default value should stop.
My second question: the labels that I assigned using index.name do not appear in the output; they are replaced by a 0 and a 1. Please advise.
Thanks so much for your input.
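One possible reading of the two issues, offered as a hedged sketch rather than a confirmed diagnosis: index.name labels the row index of a Series, not the column itself, so column labels are better set through the columns attribute; and if the CSV contains empty trailing rows (an assumption about RALS-04.csv, which is not shown), dropping them before adding the constant column keeps 'RALS' from extending past the data. The inline sample data is invented for illustration.

```python
import io

import pandas as pd

# Invented sample: two data rows followed by two empty rows.
raw = io.StringIO(
    "item,skip,y2012,y2011\n"
    "a,x,1,2\n"
    "b,y,3,4\n"
    ",,,\n"
    ",,,\n"
)
df = pd.read_csv(raw)

# Select columns 0, 2 and 3 in one step, then drop rows that are entirely empty.
subset = df.iloc[:, [0, 2, 3]].dropna(how="all").copy()

# Column labels are set on the frame, not through index.name.
subset.columns = ["items", "march2012", "march2011"]

# The constant now spans only the rows that hold data.
subset["comps"] = "RALS"

csv_text = subset.to_csv(index=False)
```

Opening the output file in 'a' (append) mode also adds to whatever the file already contains, so a fresh write with 'w' may be preferable when testing.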
Here is the code to process and save a CSV file, along with the raw input CSV and the output CSV. I'm using pandas on Python 2.7 and wondering why there is an additional column at the beginning of the saved file. Thanks.
c_a,c_b,c_c,c_d
hello,python,pandas,0.0
hi,java,pandas,1.0
ho,c++,numpy,0.0
sample = pd.read_csv('123.csv', header=None, skiprows=1,
                     dtype={0:str, 1:str, 2:str, 3:float})
sample.columns = pd.Index(data=['c_a', 'c_b', 'c_c', 'c_d'])
sample['c_d'] = sample['c_d'].astype('int64')
sample.to_csv('saved.csv')
Here is the saved file, there is an additional column at the beginning, whose values are 0, 1, 2.
cat saved.csv
,c_a,c_b,c_c,c_d
0,hello,python,pandas,0
1,hi,java,pandas,1
2,ho,c++,numpy,0
The additional column is the index of the DataFrame, which is created when you read the CSV file. You can use this index to slice, select, or sort your DataFrame efficiently.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.html
http://pandas.pydata.org/pandas-docs/stable/indexing.html
If you want to avoid this index, set the index argument to False when you save your DataFrame with DataFrame.to_csv. Also, you are skipping the header and adding it back later, but you can let read_csv use the CSV's own header to avoid this step.
sample = pd.read_csv('123.csv', dtype={0:str, 1:str, 2:str, 3:float})
sample.to_csv('output.csv', index=False)
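The difference is easy to see by writing to a string instead of a file. This small sketch reuses the sample data from the question, with io.StringIO standing in for 123.csv:

```python
import io

import pandas as pd

raw = (
    "c_a,c_b,c_c,c_d\n"
    "hello,python,pandas,0.0\n"
    "hi,java,pandas,1.0\n"
    "ho,c++,numpy,0.0\n"
)
sample = pd.read_csv(io.StringIO(raw))
sample["c_d"] = sample["c_d"].astype("int64")

# Without a path argument, to_csv returns the CSV text.
with_index = sample.to_csv()
without_index = sample.to_csv(index=False)

print(with_index.splitlines()[0])     # header with a leading empty index label
print(without_index.splitlines()[0])  # header without the index column
```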
Hope it helps :)