I was looking at the xlsxwriter-related code that runs when using pandas' DataFrame.to_excel method.
I ended up adding some formatting to the files, but the column widths don't seem to work. Ideally I was hoping to dynamically set column widths to fit the content.
I saw there is a method called set_column which I thought might do the trick, but https://xlsxwriter.readthedocs.io/worksheet.html#set_column shows that the width needs to be a number.
To me, that number needs to be the length of the longest string in that column (including the column name itself). While I can compute that, it seemed a bit extreme. I figured there might be a wrap command I could use that auto-formats, or something similar.
Some simple code I was using:
import pandas as pd
from pandas import DataFrame
df = DataFrame({"aadsfasdfasdfasdfasdf":[1,2,3]})
writer = pd.ExcelWriter(filename, engine='xlsxwriter')
_base_sheet = "Sheet1"
df.to_excel(writer, sheet_name=_base_sheet, header=HEADERS)
workbook = writer.book
worksheet = writer.sheets[_base_sheet]
...
# Here I would want to set all columns to have some sort of auto-width
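A minimal sketch of the "longest string per column" idea described above, assuming the sheet was written by df.to_excel with the default index column in front. The helper names (text_width, column_widths, autofit_columns) and the padding value are made up for illustration; only worksheet.set_column comes from XlsxWriter. Character counts are only an approximation of rendered width, so treat the result as a starting point.

```python
def text_width(value):
    """Length of the longest line in the string form of a value."""
    return max(len(line) for line in str(value).split("\n"))


def column_widths(header, rows, padding=2):
    """Return one width per column: longest header or cell text plus padding."""
    widths = [text_width(name) for name in header]
    for row in rows:
        for i, value in enumerate(row):
            widths[i] = max(widths[i], text_width(value))
    return [width + padding for width in widths]


def autofit_columns(df, worksheet, index=True, padding=2):
    """Apply the computed widths to an XlsxWriter worksheet.

    Assumes the DataFrame was written starting at column A. If the index
    was written too (index=True), the data columns are shifted right by
    one; the index column itself is left at its default width.
    """
    offset = 1 if index else 0
    rows = df.itertuples(index=False, name=None)
    for i, width in enumerate(column_widths(list(df.columns), rows, padding)):
        worksheet.set_column(i + offset, i + offset, width)


# Hypothetical usage, after df.to_excel(writer, sheet_name=_base_sheet):
#   autofit_columns(df, writer.sheets[_base_sheet])
```

Recent XlsxWriter releases also provide worksheet.autofit(), which may be simpler if your installed version has it.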
I am trying to export a dataframe I've generated in Pandas to an Excel Workbook. I have been able to get that part working, but unfortunately no matter what I try, the dataframe goes into the workbook as a brand new worksheet.
What I am ultimately trying to do here is create a program that pulls API data from a website and imports it in an existing Excel sheet in order to create some sort of "live updating excel workbook". This means that the worksheet already has proper formatting, vba, and other calculated columns applied, and all of this would ideally stay the same except for the basic data in the dataframe I'm importing.
Is there any way to go about this? Any direction at all would be quite helpful. Thanks.
Here is my current code:
file='testbook.xlsx'
writer = pd.ExcelWriter(file, engine = 'xlsxwriter')
df.to_excel(writer, sheet_name="Sheet1")
workbook = writer.book
worksheet = writer.sheets["Sheet1"]
writer.save()
If your existing Excel file and your DataFrame have the same format, you can simply read the existing file into another DataFrame, concatenate the two, and save the result to a new Excel file or to the existing one.
df1 = pd.read_excel('testbook.xlsx')  # existing data
df2 = df  # your new DataFrame
combined = pd.concat([df1, df2], ignore_index=True)
combined.to_excel('testbook.xlsx', index=False)
There are multiple ways of doing this, but this one works if you want to stay entirely within pandas. Note that to_excel writes a fresh workbook, so formatting and VBA in the existing file are not carried over.
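If the existing formatting has to survive, one option is to open the workbook with openpyxl and overwrite only the cell values (XlsxWriter can only create new files, it cannot modify existing ones). The sketch below assumes the data block starts at a known cell; write_frame, start_row and start_col are illustrative names, not part of pandas or openpyxl.

```python
def write_frame(ws, df, start_row=1, start_col=1):
    """Write a DataFrame's header and rows into an openpyxl worksheet.

    Only cell values are assigned, so styles already set on those cells
    and everything outside the written block are left alone.
    """
    # Header row
    for col, name in enumerate(df.columns, start=start_col):
        ws.cell(row=start_row, column=col, value=name)
    # Data rows, directly below the header
    rows = df.itertuples(index=False, name=None)
    for row_idx, row in enumerate(rows, start=start_row + 1):
        for col, value in enumerate(row, start=start_col):
            ws.cell(row=row_idx, column=col, value=value)


# Hypothetical usage (file and sheet names are placeholders):
#   from openpyxl import load_workbook
#   wb = load_workbook("testbook.xlsx")   # add keep_vba=True for .xlsm files
#   write_frame(wb["Sheet1"], df)
#   wb.save("testbook.xlsx")
```

Calculated columns that sit outside the written block keep their formulas, since those cells are never touched.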
Using the Openpyxl engine for Pandas via pd.ExcelWriter, I'd like to know if there is a way to specify a (custom) Excel duration format for elapsed time.
The format I would like to use is: [hh]:mm:ss which should give a time like: 01:01:01 for 1 hour, 1 minute, 1 second.
I want to write from a DataFrame into this format so that Excel can recognize it when I open the spreadsheet file in the Excel application, after writing the file.
Here is my current demo code, taking a duration of two datetime.now() timestamps:
import pandas as pd
from time import sleep
from datetime import datetime
start_time = datetime.now()
sleep(1)
end_time = datetime.now()
elapsed_time = end_time - start_time
df = pd.DataFrame([[elapsed_time]], columns=['Elapsed'])
with pd.ExcelWriter('./sheet.xlsx') as writer:
    df.to_excel(writer, engine='openpyxl', index=False)
Note that in this implementation, type(elapsed_time) is <type 'datetime.timedelta'>.
The code will create an Excel file with approximately the value 0.0000116263657407407 in the column of "Elapsed". In Excel's time/date format, the value 1.0 equals 1 full day, so this is roughly 1 second of that 1 day.
If I under Format > Cells > Number (CMD + 1) select the Custom Category and specify the custom format [hh]:mm:ss for the cell, I will now see:
This is the format I want to see every time I open the file in Excel after writing it.
However, I have looked around for solutions, and I cannot find a way to inherently tell pd.ExcelWriter, df.to_excel, or Openpyxl how to format the datetime.timedelta object in this way.
The Openpyxl documentation gives some very sparse indications:
Handling timedelta values Excel users can use number formats
resembling [h]:mm:ss or [mm]:ss to display time interval durations,
which openpyxl considers to be equivalent to timedeltas in Python.
openpyxl recognizes these number formats when reading XLSX files and
returns datetime.timedelta values for the corresponding cells.
When writing timedelta values from worksheet cells to file, openpyxl
uses the [h]:mm:ss number format for these cells.
How can I accomplish my goal of writing Excel-interpretable time (durations) in the format [hh]:mm:ss?
To achieve this, I do not need to keep the current method of creating a datetime.timedelta object via datetime.now(). If it's possible to achieve this objective by using or converting to a datetime object (or similar) and formatting it, I would like to know how.
NB: I am using Python 2 with its latest pandas version 0.24.2 (and the openpyxl version installed with pip is the latest, 2.6.4). I hope that is not a problem as I cannot upgrade to Python 3 and later versions of pandas right now.
It was some time ago I worked on this, but the below solution worked for me in Python 2.7.18 using Pandas 0.24.2 and Openpyxl 2.6.4 from PyPi.
As stated in the question comments, later versions may solve this more elegantly (and there might furthermore be a more elegant way to do it in the old versions I use):
If writing to a new Excel file:
writer = pd.ExcelWriter('./sheet.xlsx', engine='openpyxl')
# Writes dataFrame to Writer Sheet, including column header
df.to_excel(writer, sheet_name='Sheet1', index=False)
# Selects which Sheet in Writer to manipulate
sheet = writer.sheets['Sheet1']
# Formats specific cell with desired duration format
cell = 'A2'
sheet[cell].number_format = '[hh]:mm:ss'
# Writes to file on disk
writer.save()
writer.close()
If writing to an existing Excel file:
from openpyxl import load_workbook
file = './sheet.xlsx'
writer = pd.ExcelWriter(file, engine='openpyxl')
# Loads content from existing Sheet in file
workbook = load_workbook(file)
writer.book = workbook #writer.book potentially needs to be explicitly stated like this
writer.sheets = {sheet.title: sheet for sheet in workbook.worksheets}
sheet = writer.sheets['Sheet1']
# Writes dataFrame to Writer Sheet, below the last existing row, excluding column header
df.to_excel(writer, sheet_name='Sheet1', startrow=sheet.max_row, index=False, header=False)
# Updates the row count again, and formats specific cell with desired duration format
# (the last cell in column A)
cell = 'A' + str(sheet.max_row)
sheet[cell].number_format = '[hh]:mm:ss'
# Writes to file on disk
writer.save()
writer.close()
The above code can of course easily be abstracted into one function handling writing to both new files and existing files, and extended to managing any number of different sheets or columns, as needed.
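As a rough sketch of that kind of abstraction, the helpers below format a whole column instead of a single cell and show how a timedelta maps to Excel's day-based value. The function names are hypothetical; the only openpyxl features relied on are sheet.max_row, sheet['A2'] style indexing and the number_format attribute used above. format_duration is just a Python-side preview of what [hh]:mm:ss displays for non-negative durations.

```python
from datetime import timedelta

SECONDS_PER_DAY = 24 * 60 * 60


def to_excel_days(delta):
    """Convert a timedelta to Excel's representation, where 1.0 is one day."""
    return delta.total_seconds() / SECONDS_PER_DAY


def format_duration(delta):
    """Preview of the [hh]:mm:ss display for a non-negative timedelta."""
    hours, remainder = divmod(int(delta.total_seconds()), 3600)
    minutes, seconds = divmod(remainder, 60)
    return "%02d:%02d:%02d" % (hours, minutes, seconds)


def format_column(sheet, column_letter, number_format="[hh]:mm:ss", first_row=2):
    """Set the number format on every cell of one column, skipping the header."""
    for row in range(first_row, sheet.max_row + 1):
        sheet[column_letter + str(row)].number_format = number_format


# Hypothetical usage, after df.to_excel(writer, sheet_name='Sheet1', index=False):
#   format_column(writer.sheets['Sheet1'], 'A')
```

String concatenation is used instead of f-strings so the sketch also runs on the Python 2.7 setup mentioned above.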
Total newbie and this is my first ever question so apologies in advance for any inadvertent faux pas.
I have a large(ish) dataset in Excel xlsx format that I would like to import into a pandas dataframe. The data has column headers except for the first column which does not have a header label. Here is what the excel sheet looks like:
[Image: raw data]
I am using read_excel() in Pandas to read in the data. The code I am using is:
df = pd.read_excel('Raw_Data.xlsx', sheetname=0, labels=None, header=0, index_col=None)
(I have tried index_col=False or 0 but, for obvious reasons, it doesn't change anything.)
The headers for the columns are picked up fine but the first column, circled in red in the image below, is assigned as the index.
[Image: wrong index]
What I am trying to get from the read_excel command is as follows with the index circled in red:
[Image: correct index]
I have other excel sheets that I have used read_excel() to import into pandas and pandas automatically adds in a numerical incremental index rather than inferring one of the columns as an index.
None of those excel sheets had missing label in the column header though which might be the issue here though I am not sure.
I understand that I can use the reset_index() command after the import to get the correct index.
I am wondering if it can be done within the read_excel() command, without having to call reset_index(). That is, is there any way to prevent an index from being inferred, or to force pandas to add the index column like it normally does?
Thank you in advance!
I don't think you can do it with only the read_excel function because of the missing value in cell A1. If you want to insert something into that cell prior to reading the file with pandas, you could consider using openpyxl as below.
from openpyxl import load_workbook as load
path = 'Raw_Data.xlsx'
col_name = 'not_index'
cell = 'A1'
def write_to_cell(path, col_name, cell):
    wb = load(path)
    for sheet in wb.sheetnames:
        ws = wb[sheet]
        # Only fill the cell if it is empty
        if ws[cell].value is None:
            ws[cell] = col_name
    wb.save(path)

write_to_cell(path, col_name, cell)
I am trying to create a database and fill it with values gotten from an excel sheet.
My code:
new_db = pd.DataFrame()
workbook = pd.ExcelFile(filename)
df = workbook.parse('Sheet1')
print(df)
new_db.append(df)
print(new_db.head())
But whenever I seem to do this, I get an empty dataframe back.
My Excel sheet, however, is packed with values. When it is printed (print(df)), it prints out with ID values and all the correct columns and rows.
My knowledge with Pandas-Dataframes is limited so excuse me if I do not know something I should. All help is appreciated.
I think pandas.read_excel is what you're looking for. Here is an example:
import pandas as pd
df = pd.read_excel(filename)
print(df.head())
df will have the type pandas.DataFrame.
With its default parameters, read_excel reads the first sheet in the Excel file; check the documentation for more options. (If you provide a list of sheets via the sheet_name parameter, called sheetname in older pandas versions, df will be a dictionary with the sheet names as keys and the corresponding DataFrames as values.) Depending on your Python version and distribution, you may need to install the xlrd module (or openpyxl, which newer pandas versions use for .xlsx files), which you can do using pip.
You need to reassign new_db after appending to it, as @ayhan pointed out in the comments:
new_db = new_db.append(df)
According to the pandas documentation for append, it returns a new DataFrame with the rows appended rather than modifying the original, which means you need to assign the result to a variable.
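For newer pandas versions, note that DataFrame.append was deprecated in 1.4 and removed in 2.0, so the same "returns a new object" pattern is written with pd.concat. A minimal sketch, where append_rows is a made-up helper name:

```python
def append_rows(new_db, df):
    """Return a new DataFrame with the rows of df added below new_db.

    Neither input is modified, so the result must be assigned by the caller.
    """
    import pandas as pd  # imported here so the sketch has no import-time dependency

    return pd.concat([new_db, df], ignore_index=True)


# Hypothetical usage:
#   new_db = append_rows(new_db, df)
#   print(new_db.head())
```

As with append, calling append_rows(new_db, df) without assigning the result leaves new_db empty.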
I have a pandas DataFrame that I would like to write to Excel. For one column, I have data values that are comma-delimited strings, like "val1,val2" or "val1,val2,val3". When I write that column, I would like to replace the commas with the equivalent of pressing ALT-ENTER in Excel, so that there are line breaks between the values. So, my first example would display as val1, then a break within the cell, then val2, and so forth. You can also do this in Excel by making the cell a formula and putting &CHAR(10)& between each value.
I see how I could do this by coding up formulas via XLSXWriter and writing cells individually. However I'm hopefully (or lazily) wondering whether there's a way to encode the breaks right into the data so that they would all come out via a simple call to to_excel() on the DataFrame.
In XlsxWriter you can use newlines in the string and the text_wrap format property to wrap it onto separate lines. See this section of the Format docs.
wrap_format = workbook.add_format({'text_wrap': True})
worksheet.write(0, 0, "Val1\nval2", wrap_format)
To do it from Pandas you could convert the commas in the strings to \n and then apply the text_wrap format to the column in the target spreadsheet.
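A rough sketch of that pandas route, assuming the column holds plain comma-delimited strings. The helper names commas_to_newlines and write_wrapped are invented for illustration; the XlsxWriter calls used are add_format and set_column, and passing None as the width leaves the column width unchanged.

```python
def commas_to_newlines(value):
    """Turn 'val1,val2' into 'val1\\nval2'; non-string values pass through."""
    if not isinstance(value, str):
        return value
    return "\n".join(part.strip() for part in value.split(","))


def write_wrapped(df, path, column, sheet_name="Sheet1"):
    """Write df to path with line breaks and text wrapping in one column."""
    import pandas as pd  # requires pandas and xlsxwriter to be installed

    out = df.copy()
    out[column] = out[column].map(commas_to_newlines)
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        out.to_excel(writer, sheet_name=sheet_name, index=False)
        wrap_format = writer.book.add_format({"text_wrap": True})
        # index=False above, so DataFrame position equals sheet column
        col = out.columns.get_loc(column)
        writer.sheets[sheet_name].set_column(col, col, None, wrap_format)


# Hypothetical usage:
#   write_wrapped(df, "test.xlsx", "Data")
```

Without the text_wrap format the newlines are still stored in the cell, but Excel shows the text on one line until wrapping is turned on.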
I still had to do a bit of research after reading the answers here, so I wanted to post a fully working example:
import pandas as pd
df = pd.DataFrame({'Data': ['bla\nbla', 'blubb\nblubb']})
with pd.ExcelWriter('test.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Sheet1')
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']
    cell_format = workbook.add_format({'text_wrap': True})
    worksheet.set_column('A:Z', cell_format=cell_format)
In particular, I had to figure out that after creating the format object I still need to set it on the respective cells.
Note: You need to pip install xlsxwriter before doing it.
Here, again, is the link to the Format Class documentation.
For me, this sort of code worked perfectly:
liste=['adsf','asdfas','dasdsas','asdfsdaf']
text=''
for elem in liste:
    text=text+elem+'\n'
ws.Cells(1,1).Value=text