How do I stop the pandas to_excel() function from creating an extra column containing the index? If I run the following:
import pandas as pd
df = pd.read_excel('in.xlsx')
#do some stuff to the dataframe
writer = pd.ExcelWriter('out.xlsx')
df.to_excel(writer)
writer.save()
.. the newly created file (out.xlsx) has an additional column which I don't want. I just want the columns identified in df.columns outputting without the additional indexes column.
This is a small step in a larger process, so I can't just manually delete the column. Also, I don't want to use any other Excel writing packages such as XlsxWriter.
Many thanks!
You need to set the index argument to False, like this:
df.to_excel(writer, index=False)
As described in the pandas documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html
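For completeness, here is a minimal end-to-end sketch of the same idea. It assumes an Excel engine such as openpyxl is installed, and the sample data and temporary output path are made up for illustration. It uses ExcelWriter as a context manager because writer.save() is no longer available in recent pandas releases (it was removed in 2.0); the with block saves the file on exit.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the DataFrame read from in.xlsx
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# Write to a temporary location so the sketch does not touch real files
out_path = os.path.join(tempfile.mkdtemp(), "out.xlsx")

# The context manager saves and closes the workbook when the block ends
with pd.ExcelWriter(out_path) as writer:
    df.to_excel(writer, index=False)  # index=False leaves out the index column

# Read the file back to confirm that only the columns in df.columns were written
round_trip = pd.read_excel(out_path)
print(list(round_trip.columns))
```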
Related
I have Python code that takes certain data from an Excel file and works with it. At the end of my code, I want to create a new column named XY in the already existing Excel table. What would be your approach to this?
If you're using pandas to perform operations on the data, and you have it loaded as a df, just add:
import pandas as pd
import numpy as np
# Generating df with random numbers to show example
df = pd.DataFrame(np.random.randint(
0, 100, size=(15, 4)), columns=list('ABCD'))
print(df.head())
# Adding the empty column
df['xy'] = ''
print(df.head())
#exporting to excel
df.to_excel('FileName.xlsx', sheet_name='sheet1')
This will add an empty column to the df, with the top cell labelled xy. If you want any values in the column, you can replace the empty '' with a list of whatever.
Hope this helps!
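To illustrate the last point about filling the new column with values, here is a small sketch. The column names and numbers are made up; the only real constraint is that a list assigned to a column must have one entry per row.

```python
import pandas as pd

# Hypothetical data standing in for whatever was loaded from Excel
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Start with an empty column, as in the answer above
df["XY"] = ""

# Replace it with a list of values; the list needs one entry per row
df["XY"] = [10, 20, 30]

# A column can also be computed from existing columns
df["total"] = df["A"] + df["B"]

print(df)
```

Writing the result back with df.to_excel(..., index=False) then produces the sheet with the extra columns.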
The easiest way to get the right code is to record a macro in Excel. Go to your table in Excel, choose 'Record macro' and perform the required actions manually. Then choose 'Stop recording' and open the VBA editor to see the generated code. Then use the equivalent code in your Python app.
I was looking at the code related to xlsxwriter when using pandas' DataFrame.to_excel command.
I ended up adding some formatting to the files, but the column widths don't seem to work. Ideally, I was hoping to dynamically set column widths to fit the content.
I saw there was a command called set_column, which I thought might do the trick. However, https://xlsxwriter.readthedocs.io/worksheet.html#set_column showed me that the width needs to be a number.
To me, that number needs to be the length of the longest string in that column (including the column name itself). While I can compute that, it seemed a bit extreme. I figured there might be a wrap command I could use that auto-formats the columns, or something similar.
Some Simple Code I was using:
import pandas as pd
from pandas import DataFrame
df = DataFrame({"aadsfasdfasdfasdfasdf":[1,2,3]})
writer = pd.ExcelWriter(filename, engine='xlsxwriter')
_base_sheet = "Sheet1"
df.to_excel(writer, sheet_name=_base_sheet, header=HEADERS)
workbook = writer.book
worksheet = writer.sheets[_base_sheet]
...
# Here I would want do set all columns to have some sort of auto-width
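One way to do the "longest string per column" calculation described above is sketched below. The helper names column_widths and autofit_columns are hypothetical, not part of pandas or XlsxWriter; the only XlsxWriter call used is worksheet.set_column(first_col, last_col, width). The width is a rough character count plus padding, so it will not match Excel's rendering exactly.

```python
import pandas as pd


def column_widths(df, padding=2):
    """Return one width per column: longest header or cell text, plus padding."""
    widths = []
    for name in df.columns:
        # Compare the header length with the length of every cell rendered as text
        lengths = [len(str(name))] + [len(str(value)) for value in df[name]]
        widths.append(max(lengths) + padding)
    return widths


def autofit_columns(df, worksheet, first_col=1, padding=2):
    """Apply the computed widths to a worksheet that has a set_column method.

    first_col=1 assumes to_excel wrote the index into column 0;
    pass first_col=0 if the sheet was written with index=False.
    """
    for offset, width in enumerate(column_widths(df, padding)):
        col = first_col + offset
        worksheet.set_column(col, col, width)


# Example widths for a frame like the one in the question
df = pd.DataFrame({"aadsfasdfasdfasdfasdf": [1, 2, 3]})
print(column_widths(df))
```

With the code from the question, this would be called as autofit_columns(df, worksheet) after df.to_excel(...) and before the writer is saved. Recent XlsxWriter releases also offer a worksheet.autofit() method, which may be simpler if your installed version has it.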
We are trying to read a simple sample CSV file using pandas in Python as follows -
df = pd.read_csv('example.csv')
print(df)
We need the df without the index column highlighted in red below -
We have tried multiple ways of passing parameters, but no luck.
Please help me with this issue!!
A dataframe requires having some kind of index as part of the structure.
If you want to simply print the output without the index you can use the approach suggested here, with Python 3 syntax:
print(df.to_string(index=False))
but it will not have the nice dataframe rendering in Jupyter as you have in your example.
If you want to avoid pandas outputting the index when writing to a CSV file you can use the option index=False, for example:
df.to_csv('example.csv', index=False)
This will avoid creating the index column in the saved CSV file.
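A quick way to see the difference is to compare the text that pandas produces with and without the index. This is a small sketch with made-up data; calling to_csv() without a path returns the CSV content as a string.

```python
import pandas as pd

# Hypothetical data standing in for the contents of example.csv
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# The index stays part of the DataFrame; it is only left out of the output
text_without_index = df.to_string(index=False)
csv_without_index = df.to_csv(index=False)

# Default behaviour for comparison: the index becomes an unnamed first column
csv_with_index = df.to_csv()

print(text_without_index)
print(csv_without_index)
print(csv_with_index)
```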
Add index_col=False when reading the file:
pd.read_csv('path.csv',index_col=False)
Or reset the index of the dataframe to the default integer index:
df.reset_index(drop=True, inplace=True)
Total newbie and this is my first ever question so apologies in advance for any inadvertent faux pas.
I have a large(ish) dataset in Excel xlsx format that I would like to import into a pandas dataframe. The data has column headers except for the first column which does not have a header label. Here is what the excel sheet looks like:
[Image: raw data]
I am using read_excel() in Pandas to read in the data. The code I am using is:
df = pd.read_excel('Raw_Data.xlsx', sheetname=0, labels=None, header=0, index_col=None)
(I have tried index_col=False or index_col=0 but, for obvious reasons, it doesn't change anything)
The headers for the columns are picked up fine but the first column, circled in red in the image below, is assigned as the index.
[Image: wrong index]
What I am trying to get from the read_excel command is as follows with the index circled in red:
[Image: correct index]
I have other Excel sheets that I have imported into pandas with read_excel(), and pandas automatically adds an incremental numerical index rather than inferring one of the columns as an index.
None of those Excel sheets had a missing label in the column header, though, which might be the issue here, although I am not sure.
I understand that I can use the reset_index() command after the import to get the correct index.
I am wondering if it can be done within the read_excel() command, without having to call reset_index(), i.e. is there any way to prevent an index being inferred, or to force pandas to add the index column like it normally does?
Thank you in advance!
I don't think you can do it with only the read_excel function because of the missing value in cell A1. If you want to insert something into that cell prior to reading the file with pandas, you could consider using openpyxl as below.
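from openpyxl import load_workbook as load

path = 'Raw_Data.xlsx'
col_name = 'not_index'
cell = 'A1'

def write_to_cell(path, col_name, cell):
    wb = load(path)
    # Fill the header cell on every sheet where it is empty
    for sheet in wb.sheetnames:
        ws = wb[sheet]
        if ws[cell].value is None:
            ws[cell] = col_name
    wb.save(path)

# Fill in the missing header, then read the file with pandas as usual
write_to_cell(path, col_name, cell)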
I am trying to create a database and fill it with values gotten from an excel sheet.
My code:
new_db = pd.DataFrame()
workbook = pd.ExcelFile(filename)
df = workbook.parse('Sheet1')
print(df)
new_db.append(df)
print(new_db.head())
But whenever I seem to do this, I get an empty dataframe back.
My excel sheet however is packed with values. When it is printed(print(df)) it prints it out with ID values and all the correct columns and rows.
My knowledge with Pandas-Dataframes is limited so excuse me if I do not know something I should. All help is appreciated.
I think pandas.read_excel is what you're looking for. Here is an example:
import pandas as pd
df = pd.read_excel(filename)
print(df.head())
df will have the type pandas.DataFrame.
By default, read_excel reads the first sheet in the Excel file; check the documentation for more options. (If you provide a list of sheets to read via the sheet_name parameter, called sheetname in older pandas versions, df will be a dictionary with sheet names as keys and their corresponding DataFrames as values.) Depending on the version of Python you're using and its distribution, you may need to install the xlrd module, which you can do using pip (recent pandas versions use openpyxl instead for .xlsx files).
You need to reassign the result after appending, as @ayhan pointed out in the comments:
new_db = new_db.append(df)
According to the pandas documentation, append returns a new DataFrame with the rows appended rather than modifying the original, which means you need to assign the result to a variable.
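Note that DataFrame.append was deprecated and then removed in pandas 2.0, so on current versions the same idea is usually written with pd.concat. Below is a minimal sketch; the two small frames are made up and stand in for data parsed from Excel sheets.

```python
import pandas as pd

# Hypothetical frames standing in for sheets parsed from the workbook
df_first = pd.DataFrame({"ID": [1, 2], "value": ["a", "b"]})
df_second = pd.DataFrame({"ID": [3, 4], "value": ["c", "d"]})

# Collect the pieces in a list and concatenate once at the end
frames = [df_first, df_second]

# Like append, concat returns a new DataFrame, so the result must be assigned
new_db = pd.concat(frames, ignore_index=True)

print(new_db.head())
```

Collecting frames in a list and concatenating once also avoids copying the data on every step, which matters when many sheets are combined.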