Hi, I am trying to create a table in Excel from a dataframe read from another Excel spreadsheet, and to write that table to a new workbook. I believe my code is correct, but the table isn't being written to the new spreadsheet. Can someone take a look at my code and tell me what's wrong?
import xlsxwriter
import pandas as pd
import openpyxl as pxl
import xlsxwriter
import numpy as np
from openpyxl import load_workbook
path = '/Users/benlong/Downloads/unemployment.xlsx'
df = pd.read_excel(path)
rows = df.shape[0]
columns = df.shape[1]
wb = xlsxwriter.Workbook('UE2.xlsx')
ws = wb.add_worksheet('Sheet1')
ws.add_table(0,0,rows,columns, {'df': df})
wb.close()
You should convert your dataframe to a list of lists with df.values.tolist() and pass it under the data key; add_table has no df option.
In your case you should also set the table headers from the dataframe's columns, and guard against the error raised for NaN values.
For example:
import xlsxwriter as xlw
# With nan_inf_to_errors enabled, NaN/Inf values in the dataframe are written as Excel errors (NaN becomes '#NUM!') instead of raising an exception
wb = xlw.Workbook('UE2.xlsx', {'nan_inf_to_errors': True})
ws = wb.add_worksheet('Sheet1')
# rows and columns come from df.shape as in the question; the last column index is columns - 1
cell_range = xlw.utility.xl_range(0, 0, rows, columns - 1)
header = [{'header': str(di)} for di in df.columns.tolist()]
ws.add_table(cell_range, {'header_row': True, 'first_column': False, 'columns': header, 'data': df.values.tolist()})
wb.close()
Possible duplicate: How to use xlsxwriter .add_table() method with a dataframe?
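You can try converting the dataframe to a list of lists (one inner list per row) and passing it with the data keyword. Note that the range is zero-indexed, so the last column index is columns - 1.
ws.add_table(0, 0, rows, columns - 1, {'data': df.values.tolist()})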
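To make the range arithmetic and the shape of the data option concrete, here is a minimal sketch. The helper name table_args and the sample dataframe are made up for illustration; the block only builds the arguments and does not call xlsxwriter itself.

```python
import pandas as pd


def table_args(df):
    """Build the range and options arguments for add_table from a DataFrame.

    Hypothetical helper for illustration; not part of xlsxwriter or pandas.
    """
    n_rows, n_cols = df.shape
    options = {
        # One {'header': ...} dict per dataframe column.
        "columns": [{"header": str(name)} for name in df.columns],
        # One inner list per dataframe row.
        "data": df.values.tolist(),
    }
    # Row 0 holds the header, so the data ends on row n_rows.
    # Columns are zero-indexed, so the last column is n_cols - 1.
    return 0, 0, n_rows, n_cols - 1, options


# Made-up sample data standing in for the unemployment spreadsheet.
df = pd.DataFrame({"State": ["Ohio", "Utah"], "Rate": [4.1, 2.9]})
first_row, first_col, last_row, last_col, options = table_args(df)
```

With an xlsxwriter worksheet ws, the call would then be ws.add_table(first_row, first_col, last_row, last_col, options).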
Related
I have this output data as a pandas DataFrame and I want to write it back to Excel.
This is the Excel sheet format.
I want to write each dataframe row to an Excel cell. For example, Kosten EK should go in cell D4 and IRR mit Finanzierung in cell D5. I have a similar dataframe in which the Soll-SOC 1-12 value is a single value rather than an array, and that works properly, but in this case I cannot write the data because of the array. How can I solve this?
I am using xlwings and xlwriter to write data back to Excel.
import xlwings as xw
wb = xw.Book(file_path) # wb = xw.Book(filename) would open an existing file
Working_Sheet = wb.sheets["sheet_name"] # activating working sheet
Working_Sheet.range('D4:D15').options(index=False,header=False).value = Data[20000][0.25]
You should try converting the data to a pd.DataFrame object first.
import pandas as pd
import xlwings as xw
df = pd.DataFrame(...)
wb = xw.Book(file_path) # wb = xw.Book(filename) would open an existing file
Working_Sheet = wb.sheets["sheet_name"] # activating working sheet
Working_Sheet.range('D4:D15').options(convert=pd.DataFrame, index=False,header=False).value = Data[20000][0.25]
I have written some content to an xlsx file using xlsxwriter:
workbook = xlsxwriter.Workbook(file_name)
worksheet = workbook.add_worksheet()
worksheet.write(row, col, value)
workbook.close()
I'd like to add a dataframe after the existing rows in this file with to_excel:
df.to_excel(file_name,
            startrow=len(existing_content),
            engine='xlsxwriter')
However, this does not seem to work: the dataframe is not inserted into the file. Does anyone know why?
Since the question does not show the full code, let's walk through XlsxWriter and to_excel with an example.
using xlsxwriter
import xlsxwriter
# Create a new Excel file and add a worksheet
workbook = xlsxwriter.Workbook('example.xlsx')
worksheet = workbook.add_worksheet()
# Add some data to the worksheet
worksheet.write('A1', 'Language')
worksheet.write('B1', 'Score')
worksheet.write('A2', 'Python')
worksheet.write('B2', 100)
worksheet.write('A3', 'Java')
worksheet.write('B3', 98)
worksheet.write('A4', 'Ruby')
worksheet.write('B4', 88)
# Save the file
workbook.close()
Using the above code, we have saved a table like the one below to an Excel file.
Language    Score
Python      100
Java        98
Ruby        88
Next, to add rows from a dataframe using DataFrame.to_excel:
using to_excel
import pandas as pd
# Load an existing Excel file
existing_file = pd.read_excel('example.xlsx')
# Create a new DataFrame to append
df = pd.DataFrame({
    'Language': ['C++', 'Javascript', 'C#'],
    'Score': [78, 97, 67]
})
# Append the new DataFrame to the existing file
result = pd.concat([existing_file, df])
# Write the combined DataFrame to the existing file
result.to_excel('example.xlsx', index=False)
The reason for using pandas concat:
Appending requires pandas.ExcelWriter in append mode, but XlsxWriter does not support append mode in ExcelWriter.
The task could once be accomplished with pandas.DataFrame.append(), but that method was deprecated in pandas 1.4 and removed in pandas 2.0, so we use concat instead.
The OP is using xlsxwriter in the engine parameter. Per XlsxWriter documentation "XlsxWriter is designed only as a file writer. It cannot read or modify an existing Excel file." (link to XlsxWriter Docs).
Below I've provided a fully reproducible example of how you can go about modifying an existing .xlsx workbook using the openpyxl module (link to Openpyxl Docs).
For demonstration purposes, I'll first create a workbook called test.xlsx using pandas:
import pandas as pd
df = pd.DataFrame({'Col_A': [1, 2, 3, 4],
                   'Col_B': [5, 6, 7, 8],
                   'Col_C': [0, 0, 0, 0],
                   'Col_D': [13, 14, 15, 16]})
df.to_excel('test.xlsx', index=False)
This is the Expected output at this point:
Using openpyxl, you can load the existing workbook ('test.xlsx') and overwrite the third column with data from a new dataframe while preserving the other existing data. In this example, for simplicity, I update it with a one-column dataframe, but you could extend it to update or add more data.
from openpyxl import load_workbook
import pandas as pd
df_new = pd.DataFrame({'Col_C': [9, 10, 11, 12]})
wb = load_workbook('test.xlsx')
ws = wb['Sheet1']
for index, row in df_new.iterrows():
    # index starts at 0 and sheet row 1 holds the header, so data begins on row 2
    cell = 'C%d' % (index + 2)
    ws[cell] = row['Col_C']
wb.save('test.xlsx')
With the expected output at the end: Col_C now holds 9, 10, 11, 12, and the other columns are unchanged.
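If the goal is to add rows after the existing ones rather than overwrite a column, openpyxl's ws.append may be enough. The following is a minimal sketch, not the answerer's code: the scratch file path and the sample rows are made up for illustration, and it assumes pandas and openpyxl are installed.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

# Hypothetical scratch file standing in for the existing workbook.
path = os.path.join(tempfile.mkdtemp(), "example.xlsx")

# Step 1: create the "existing" file with a header and two data rows.
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
for row in [["Language", "Score"], ["Python", 100], ["Java", 98]]:
    ws.append(row)
wb.save(path)

# Step 2: reopen the file and append the dataframe rows below the existing ones.
new_rows = pd.DataFrame({"Language": ["C++", "C#"], "Score": [78, 67]})
wb = load_workbook(path)
ws = wb["Sheet1"]
for row in new_rows.values.tolist():
    ws.append(row)  # writes to the first row after the last used row
wb.save(path)

# Step 3: read everything back to check the result.
written = [list(r) for r in load_workbook(path)["Sheet1"].iter_rows(values_only=True)]
```

Because the workbook is loaded and saved by openpyxl, the rows that were already in the sheet are kept, which is what XlsxWriter cannot do.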
I'm trying to read spreadsheets in an xlsb file in Python and I've used the code below to do so. I found the code on Stack Overflow, and I'm sure that it reads every single column in each row of a spreadsheet and appends it to a dataframe. How can I modify this code so that it only reads/appends specific columns of the spreadsheet? For example, I only want to append the data in columns B through D to my dataframe.
Any help would be appreciated.
import pandas as pd
from pyxlsb import open_workbook as open_xlsb
df = []
with open_xlsb('some.xlsb') as wb:
    with wb.get_sheet(1) as sheet:
        for row in sheet.rows():
            df.append([item.v for item in row])
df = pd.DataFrame(df[1:], columns=df[0])
pyxlsb itself cannot do it, but it is doable with the help of xlwings.
import pandas as pd
import xlwings as xw
from pyxlsb import open_workbook as open_xlsb
with open_xlsb(r"W:\path\filename.xlsb") as wb:
    Data = xw.Range('B:D').value
# Creates a dataframe using the first list of elements as columns
Data_df = pd.DataFrame(Data[1:], columns=Data[0])
Just do:
import pandas as pd
from pyxlsb import open_workbook as open_xlsb
df = []
with open_xlsb('some.xlsb') as wb:
    with wb.get_sheet(1) as sheet:
        for row in sheet.rows():
            df.append([item.v for item in row if 0 < item.c < 4])
df = pd.DataFrame(df[1:], columns=df[0])
item.c refers to the column number starting at 0, so columns B through D are 1 through 3.
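The filter on item.c can be tried without an xlsb file. This is a rough sketch in which Cell is a stand-in namedtuple that models only the attributes used above (the column index c and the value v, plus a row index r); the helper select_columns and the sample rows are made up for illustration.

```python
from collections import namedtuple

# Stand-in for the cells yielded by sheet.rows(); an assumption for illustration.
Cell = namedtuple("Cell", ["r", "c", "v"])


def select_columns(row, first, last):
    """Return the values of cells whose 0-based column index is in [first, last]."""
    return [cell.v for cell in row if first <= cell.c <= last]


# Made-up two-row sheet with five columns (A through E).
rows = [
    [Cell(0, c, name) for c, name in enumerate(["A_hdr", "B_hdr", "C_hdr", "D_hdr", "E_hdr"])],
    [Cell(1, c, c * 10) for c in range(5)],
]

# Keep columns B through D, i.e. indices 1 through 3.
table = [select_columns(row, 1, 3) for row in rows]
```

The first inner list can then serve as the column names and the rest as the data, as in pd.DataFrame(table[1:], columns=table[0]).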
I have been searching for a way to write to an existing Excel sheet starting from a specific row and column, but methods like dataframe_to_rows do not write from a specific cell position.
I am now using a custom loop to do this, but was wondering if there is a better approach.
The loop works like this:
import pandas as pd
import numpy as np
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter
df = pd.DataFrame(np.random.randn(20, 4), columns=list('ABCD'))
file = "C:\\somepath\\some_existing_file.xlsx"
wb = load_workbook(filename=file, read_only=False)
ws = wb['some_existing_sheet']
## Fill in the starting row and column
stcol = 5
strow = 5
## Writing the column header
for c in range(0, len(df.columns)):
    ws[get_column_letter(c + stcol) + str(strow)].value = df.columns[c]
## Writing the data
for r in range(0, len(df)):
    for c in range(0, len(df.columns)):
        ws[get_column_letter(c + stcol) + str(strow + r + 1)].value = df.iloc[r, c]
wb.save(file)
Please let me know if there is a better way to write to a specific position in a sheet. If this turns out to be a duplicate question, I'm happy to merge it into the original thread.
I do have another approach using XlsxWriter, but it removes all other data from the existing sheet:
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application') # opens Excel
writer = pd.ExcelWriter(file, engine='xlsxwriter')
df.to_excel(writer, sheet_name='abc', startrow=5, startcol=5, index=False)
writer.close()  # writer.save() was removed in pandas 2.0
Instead of
ws[get_column_letter(c+stcol)+str(strow)]
you can use
ws.cell(column=c+stcol, row=strow)
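As a rough sketch of how the whole loop might look with ws.cell, the block below writes a header and some rows with the top-left corner at a chosen position. The helper name write_block and the sample values are made up for illustration, and it uses an in-memory workbook rather than the existing file from the question.

```python
from openpyxl import Workbook


def write_block(ws, header, rows, start_row, start_col):
    """Write header plus rows with the top-left cell at (start_row, start_col).

    Both indices are 1-based, as ws.cell expects. Hypothetical helper.
    """
    for c, name in enumerate(header):
        ws.cell(row=start_row, column=start_col + c, value=name)
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row):
            ws.cell(row=start_row + r, column=start_col + c, value=value)


wb = Workbook()  # in-memory workbook; nothing is saved to disk here
ws = wb.active
write_block(ws, ["A", "B"], [[1.5, 2.5], [3.5, 4.5]], start_row=5, start_col=5)
```

For a dataframe df, the same helper could be called as write_block(ws, list(df.columns), df.values.tolist(), strow, stcol).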
I am using Python 3.4 and xlrd. I want to sort the Excel sheet based on the primary column before processing it. Is there any library to do this?
There are a couple of ways to do this. The first option is to use xlrd, as you have this tagged. The biggest downside is that it doesn't natively write to XLSX format (and xlrd 2.0 and later can only read .xls files).
These examples use an excel document with this format:
Utilizing xlrd and a few modifications from this answer:
import xlwt
from xlrd import open_workbook
target_column = 0 # This example only has 1 column, and it is 0 indexed
book = open_workbook('test.xlsx')
sheet = book.sheets()[0]
data = [sheet.row_values(i) for i in range(sheet.nrows)]  # range, since xrange does not exist in Python 3
labels = data[0] # Don't sort our headers
data = data[1:] # Data begins on the second row
data.sort(key=lambda x: x[target_column])
bk = xlwt.Workbook()
sheet = bk.add_sheet(sheet.name)
for idx, label in enumerate(labels):
    sheet.write(0, idx, label)
for idx_r, row in enumerate(data):
    for idx_c, value in enumerate(row):
        sheet.write(idx_r+1, idx_c, value)
bk.save('result.xls') # Notice this is xls, not xlsx like the original file is
This outputs the following workbook:
Another option (and one that can utilize XLSX output) is to utilize pandas. The code is also shorter:
import pandas as pd
xl = pd.ExcelFile("test.xlsx")
df = xl.parse("Sheet1")
df = df.sort(columns="Header Row")
writer = pd.ExcelWriter('output.xlsx')
df.to_excel(writer,sheet_name='Sheet1',columns=["Header Row"],index=False)
writer.save()
This outputs:
In the to_excel call, the index is set to False, so that the Pandas dataframe index isn't included in the excel document. The rest of the keywords should be self explanatory.
I just wanted to refresh the answer as the Pandas implementation has changed a bit over time. Here's the code that should work now (pandas 1.1.2).
import pandas as pd
xl = pd.ExcelFile("test.xlsx")
df = xl.parse("Sheet1")
df = df.sort_values(by="Header Row")
...
The sort function is now called sort_values, and the columns argument is replaced by by.
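A small self-contained check of the renamed call, using made-up data in place of the sheet read from test.xlsx (the "Other" column is an assumption added for illustration):

```python
import pandas as pd

# Made-up data standing in for the parsed sheet.
df = pd.DataFrame({"Header Row": [3, 1, 2], "Other": ["c", "a", "b"]})

# sort_values returns a new, sorted DataFrame; reset_index renumbers the rows.
sorted_df = df.sort_values(by="Header Row").reset_index(drop=True)
```

Note also that writer.save() was removed in pandas 2.0; calling df.to_excel('output.xlsx', index=False) directly, or using pd.ExcelWriter as a context manager, avoids it.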