Pandas Not Reading Excel Properly

I am trying to use an Add-In for Excel that gets removed when I use win32com.client, forcing me to restart my computer. I have found a workaround using xlrd, openpyxl, and pandas, but I have run into a completely new issue.
I first open the Excel file with pandas and read through it, extracting the information that I require.
xl = pandas.ExcelFile(xlsx)
sheets = xl.sheet_names
df = xl.parse(sheets[2])
I then have to go into the same workbook and update the Meter Name and the date.
for i, value in enumerate(dataList):
    wb = openpyxl.load_workbook(xlsx)
    worksheets = wb.sheetnames
    worksheet = wb.get_sheet_by_name(worksheets[0])
    rowCoordinate = i
    meterName = value[0]
    creationDate = value[1]
    units = value[2]
    worksheet.cell(row=1, column=2).value = meterName
    wb.save(copyXlsx)
    dateList = []
    for k, dateRange in enumerate(value[3]):
        sDate = dateRange[0]
        eDate = dateRange[1]
        wb = openpyxl.load_workbook(copyXlsx)
        worksheets = wb.sheetnames
        worksheet = wb.get_sheet_by_name(worksheets[0])
        worksheet.cell(row=2, column=2).value = sDate
        worksheet.cell(row=3, column=2).value = eDate
        wb.save(copyXlsx1)
        print meterName, dateRange
        xl1 = pandas.ExcelFile(copyXlsx1)
        sheets = xl1.sheet_names
        df = xl.parse(sheets[0])
        print df
My issue is that the Excel file opens and the information is written perfectly, but in pandas only the header information is updated; the numbers are the same as in the original document. I have opened and explored the intermediate Excel document, and its contents don't match the numbers pandas shows.
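
No answer is recorded for this question, so the following is only a suggestion of what to check. First, the last parse call in the code above reads from `xl` (the original workbook) rather than `xl1` (the copy that was just saved). Second, openpyxl edits the file on disk without running Excel, so formulas and add-in results are not recalculated when the input cells change. Below is a minimal sketch of the second point; the file name and cell values are made up for illustration.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Hypothetical demo file in a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "formula_demo.xlsx")

# Write two inputs and a formula with openpyxl only (Excel never opens the file).
wb = Workbook()
ws = wb.active
ws["A1"] = 2
ws["A2"] = 3
ws["A3"] = "=A1+A2"
wb.save(path)

# Default load: the formula text is returned, not a computed number.
formula_text = load_workbook(path).active["A3"].value

# data_only=True asks for the value Excel cached at its last save.
# Excel never saved this file, so no cached result is available.
cached_value = load_workbook(path, data_only=True).active["A3"].value

print("formula:", formula_text)
print("cached value:", cached_value)
```

If the numbers in the workbook come from formulas or from the add-in, they would only change after Excel itself opens, recalculates, and saves the file.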

Related

Excel file corrupt or wrong extension error openpyxl & writerxlsx

I am using the following code to create an Excel file with xlsxwriter and then edit it with openpyxl, as I may need to read from other Excel files later down the line. However, when I try to open the file, Excel reports that the file is corrupt or the extension is incorrect. When the source file is saved as .xlsm this error is not present, and I would like to know why that is.
import xlsxwriter
import openpyxl
# Create a workbook and add a worksheet.
workbook = xlsxwriter.Workbook('Dabble dabble.xlsx')
worksheet = workbook.add_worksheet()
workbook.close()
target_file = 'Dabble dabble.xlsx'
i = 2
No = i - 1
Company = "Panasonic"
Location = "California"
Store_type = "Hyper Market"
Date = "1/1/2020"
No_loc = "A" + str(i)
company_loc = "C" + str(i)
location_loc = "B" + str(i)
store_type_loc = "D" + str(i)
date_loc = "E" + str(i)
srcfile = openpyxl.load_workbook(target_file, read_only=False,
keep_vba=True)
sheetname = srcfile['Sheet1']
sheetname[No_loc] = No
sheetname[company_loc] = Company
sheetname[location_loc] = Location
sheetname[store_type_loc] = Store_type
sheetname[date_loc] = Date
# Table headers
sheetname["B1"] = "Location"
sheetname["C1"] = "Company"
sheetname["D1"] = "Store Type"
sheetname["E1"] = "Date"
i = i + 1
srcfile.save(target_file) #Saving data to file
import pandas as pd
target_file = "Dabble dabble.xlsx"
df= pd.read_excel(target_file)
print (df)
However, when I parse the file using pandas it reads the data, which shows me that the file was created and written.
Unnamed: 0 Location Company Store Type Date
0 1 California Panasonic Hyper Market 1/1/2020

The issue is that you are setting keep_vba=True, but the file you are dealing with isn't an .xlsm file and doesn't contain a vbaProject file. Just set it to False or omit the option.
srcfile = openpyxl.load_workbook(target_file,
read_only=False,
keep_vba=False)
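
If the same script has to handle both macro-enabled and plain workbooks, one option is to derive the flag from the file extension. This is only a sketch: the helper name `wants_keep_vba` and the set of extensions are assumptions, not part of openpyxl.

```python
import os

# Extensions treated as macro-enabled in this sketch (an assumption).
MACRO_EXTENSIONS = {".xlsm", ".xltm"}


def wants_keep_vba(path):
    """Return True only when the file extension is macro-enabled."""
    extension = os.path.splitext(path)[1].lower()
    return extension in MACRO_EXTENSIONS


for name in ("Dabble dabble.xlsx", "Dabble dabble.xlsm"):
    print(name, wants_keep_vba(name))
```

The result can then be passed on, for example `openpyxl.load_workbook(target_file, keep_vba=wants_keep_vba(target_file))`.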

Read and Write multiple excel data into one excel file using openpyxl

I am trying to copy the data from multiple Excel files into one Excel file. I am a novice with Python and openpyxl, so I have opened one file, gone through it row by row, and copied each cell. I want to do this with multiple files. How do I loop through the rows and columns and copy the data, given that the columns are in the same order in all the files?
import openpyxl as xl
from openpyxl import workbook
incident_wb = xl.load_workbook('incident resolved yesterday.xlsx')
incident_sheet = incident_wb['Page 1']
combined_wb = xl.Workbook()
combined_sheet = combined_wb.active
combined_sheet.title = "combined_sheet"
combined_wb.save('combined_sheet.xlsx')
for row in range(1, incident_sheet.max_row+1):
    incident_no = incident_sheet.cell(row,1)
    opened_date = incident_sheet.cell(row,2)
    shrt_desc = incident_sheet.cell(row,3)
    requester = incident_sheet.cell(row,4)
    incdnt_type = incident_sheet.cell(row,5)
    priority = incident_sheet.cell(row,6)
    assgn_grp = incident_sheet.cell(row,7)
    assgn_to = incident_sheet.cell(row,8)
    updated = incident_sheet.cell(row,9)
    status = incident_sheet.cell(row,10)
    sub_status = incident_sheet.cell(row,11)
    ##copy the data into the new sheet
    incident_no_1 = combined_sheet.cell(row,1)
    incident_no_1.value = incident_no.value
    opened_date_1 = combined_sheet.cell(row,2)
    opened_date_1.value = opened_date.value
    shrt_desc_1 = combined_sheet.cell(row,3)
    shrt_desc_1.value = shrt_desc.value
    requester_1 = combined_sheet.cell(row,4)
    requester_1.value = requester.value
    incdnt_type_1 = combined_sheet.cell(row,5)
    incdnt_type_1.value = incdnt_type.value
    priority_1 = combined_sheet.cell(row,6)
    priority_1.value = priority.value
    assgn_grp_1 = combined_sheet.cell(row,7)
    assgn_grp_1.value = assgn_grp.value
    assgn_to_1 = combined_sheet.cell(row,8)
    assgn_to_1.value = assgn_to.value
    updated_1 = combined_sheet.cell(row,9)
    updated_1.value = updated.value
    status_1 = combined_sheet.cell(row,10)
    status_1.value = status.value
    sub_status_1 = combined_sheet.cell(row,11)
    sub_status_1.value = sub_status.value
    ##print(f"The incident resolved yesterday {incident_no.value}")
combined_wb.save('combined_sheet.xlsx')

An alternative approach would be to build a list of data from the multiple Excel files and then write it to another file.
As a proof of concept:
import openpyxl as xl
from openpyxl import workbook
def provide_data(workbookName, sheetName):
    wb = xl.load_workbook(workbookName)
    sheet = wb[sheetName]
    # This creates an array of rows, which contain an array of cell values.
    # It will be much better to provide mapping for cells and return business object.
    return [[y.value for y in x] for x in sheet.iter_rows()]

def save_data(list_of_sheets):
    combined_wb = xl.Workbook()
    combined_sheet = combined_wb.active
    combined_sheet.title = "combined_sheet"
    for sheet in list_of_sheets:
        for row in sheet:
            combined_sheet.append(row) # combining multiple rows.
    combined_wb.save('combined_sheet.xlsx')

workSheetsToCopy = [['incident resolved yesterday.xlsx', 'Page 1'], ['other.xlsx', 'Page 1']]
workSheetsToCopy = [provide_data(x[0], x[1]) for x in workSheetsToCopy]
save_data(workSheetsToCopy)

How to stop openpyxl - python from clearing my excel file every time I re-run the program?

I wrote a simple test program with openpyxl that opens the .xlsx file, writes data into a certain cell, and exits. I then run it again, writing data into a different cell, but when I open the .xlsx after running the program for the second time, the data from the first run is gone.
My assumption is that openpyxl clears the entire .xlsx file every time you open it again. Is there a way to avoid this?
Here is my code:
from openpyxl import Workbook
wb = Workbook()
dest_filename = 'teste.xlsx'
ws = wb.active
ws.title = "2017"
Row = int(input('row: '))
Column = int(input('column: '))
data = input('data: ')
ws.cell(row = Row, column = Column).value = data
wb.save(filename = dest_filename)
Here is the .xlsx file after running the program for the first time
Here is the .xlsx file after running the program for the second time

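You are not reading the existing Excel file at all: Workbook() always creates a new, empty workbook, and saving it overwrites the old file.
Use this to load the existing workbook when it is already there: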
from openpyxl import Workbook,load_workbook
import os
dest_filename = 'teste.xlsx'
if os.path.isfile(dest_filename):
    wb = load_workbook(filename = dest_filename)
else:
    wb = Workbook()
ws = wb.active
ws.title = "2017"
Row = int(input('row: '))
Column = int(input('column: '))
data = input('data: ')
ws.cell(row = Row, column = Column).value = data
wb.save(filename = dest_filename)
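
To check that the earlier data survives, the same load-or-create logic can be wrapped in a function and called twice. This is a sketch only: the function name `write_cell` and the temporary file location are assumptions, and the interactive input() calls are replaced by fixed values.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def write_cell(path, row, column, data):
    """Load the workbook at path if it exists, else create it, then set one cell."""
    if os.path.isfile(path):
        wb = load_workbook(path)
    else:
        wb = Workbook()
        wb.active.title = "2017"
    wb.active.cell(row=row, column=column).value = data
    wb.save(path)


# Simulate two separate runs of the program against the same file.
demo_path = os.path.join(tempfile.mkdtemp(), "teste.xlsx")
write_cell(demo_path, 1, 1, "first run")
write_cell(demo_path, 2, 1, "second run")

ws = load_workbook(demo_path).active
values = [ws.cell(row=1, column=1).value, ws.cell(row=2, column=1).value]
print(values)
```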

Importing Multiple Excel Files using OpenPyXL

I am trying to read in multiple excel files and append the data from each file into one master file. Each file will have the same headers (So I can skip the import of the first row after the initial file).
I am pretty new to both Python and the OpenPyXL module. I am able to import the first workbook without problem. My problem comes in when I need to open the subsequent file and copy the data to paste into the original worksheet.
Here is my code so far:
# Creating blank workbook
from openpyxl import Workbook
wb = Workbook()
# grab active worksheet
ws = wb.active
# Read in excel data
from openpyxl import load_workbook
wb = load_workbook('first_file.xlsx') #explicitly loading workbook, will automate later
# grab active worksheet in current workbook
ws = wb.active
#get max columns and rows
sheet = wb.get_sheet_by_name('Sheet1')
print ("Rows: ", sheet.max_row) # for debugging purposes
print ("Columns: ", sheet.max_column) # for debugging purposes
last_data_point = ws.cell(row = sheet.max_row, column = sheet.max_column).coordinate
print ("Last data point in current worksheet:", last_data_point) #for debugging purposes
#import next file and add to master
append_point = ws.cell(row = sheet.max_row + 1, column = 1).coordinate
print ("Start new data at:", append_point)
wb = load_workbook('second_file.xlsx')
sheet2 = wb.get_sheet_by_name('Sheet1')
start = ws.cell(coordinate='A2').coordinate
print("New data start: ", start)
end = ws.cell(row = sheet2.max_row, column = sheet2.max_column).coordinate
print ("New data end: ", end)
# write a value to selected cell
#sheet[append_point] = 311
#print (ws.cell(append_point).value)
#save file
wb.save('master_file.xlsx')
Thanks!

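I don't really understand your code. It looks too complicated. When copying between worksheets you probably want to iterate over the rows with ws.rows or ws.iter_rows().
from openpyxl import load_workbook
wb1 = load_workbook('master.xlsx')
ws1 = wb1.active  # master worksheet that receives the rows
for f in files:  # files: list of workbook paths to merge
    wb2 = load_workbook(f)
    ws2 = wb2['Sheet1']
    # min_row=2 skips the header row; ws.rows is a generator in
    # recent openpyxl versions and cannot be sliced with [1:]
    for row in ws2.iter_rows(min_row=2):
        ws1.append([cell.value for cell in row])
wb1.save('master.xlsx')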
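
For a version that can be run end to end, here is a sketch that builds two small workbooks in a temporary directory and appends the second to the first, skipping the repeated header. The helper names and sample rows are assumptions for illustration, and `values_only=True` needs a reasonably recent openpyxl.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def make_workbook(path, rows):
    """Write rows (lists of values) to a sheet named 'Sheet1' and save it."""
    wb = Workbook()
    ws = wb.active
    ws.title = "Sheet1"
    for row in rows:
        ws.append(row)
    wb.save(path)


def merge_into_master(master_path, other_paths):
    """Append every data row (header skipped) from other_paths to the master."""
    master_wb = load_workbook(master_path)
    master_ws = master_wb["Sheet1"]
    for path in other_paths:
        source_ws = load_workbook(path)["Sheet1"]
        # min_row=2 skips the header; values_only yields plain tuples of values.
        for row in source_ws.iter_rows(min_row=2, values_only=True):
            master_ws.append(row)
    master_wb.save(master_path)


tmp_dir = tempfile.mkdtemp()
first = os.path.join(tmp_dir, "first_file.xlsx")
second = os.path.join(tmp_dir, "second_file.xlsx")
make_workbook(first, [["id", "name"], [1, "a"]])
make_workbook(second, [["id", "name"], [2, "b"], [3, "c"]])

merge_into_master(first, [second])
merged_ws = load_workbook(first)["Sheet1"]
merged = [list(row) for row in merged_ws.iter_rows(values_only=True)]
print(merged)
```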

updating excel using python

I have a workbook with multiple sheets in it. I am trying to read the data from one sheet and match it against column fields in another sheet; where they match, I want to update a column in that other sheet. This is what I have tried, but as I understand it, xlrd can't be used to write. Can anyone point me to a Python library or module that can both read and write at the same time?
#!/usr/bin/python
import xlrd, xlwt
workbook = xlrd.open_workbook('nagios.xlsx')
workbook1 = xlwt.Workbook()
worksheet1 = workbook.sheet_by_name('contacts_users')
worksheet2 = workbook.sheet_by_name('contact_group_nagios')
for row in range(1, worksheet2.nrows):
    print "value: ", worksheet2.cell(row,0).value
    print "value: ", worksheet2.cell(row,1).value
    s = worksheet2.cell(row,1).value
    grp_name = worksheet2.cell(row,0).value
    members = s.split(",")
    for member in members:
        for row1 in range(1, worksheet1.nrows):
            if member == worksheet1.cell(row1,0).value:
                s1 = worksheet1.cell(row1,3).value
                s1 += grp_name
                worksheet1.append(row1,3, s1)
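
openpyxl can both read and modify an .xlsx workbook, so it is one candidate for this task. Below is a rough sketch, assuming the layout implied by the code above: group name and comma-separated members in the first two columns of contact_group_nagios, and user name in the first column and group list in the fourth column of contacts_users. The function name `add_groups_to_users`, the sample data, and the comma used to join group names are assumptions.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def add_groups_to_users(path):
    """Append each group's name to column 4 of every member's row, then save."""
    wb = load_workbook(path)
    users = wb["contacts_users"]
    groups = wb["contact_group_nagios"]
    for group_name, member_list in groups.iter_rows(
            min_row=2, max_col=2, values_only=True):
        members = [m.strip() for m in (member_list or "").split(",")]
        # max_col=4 guarantees the fourth cell exists in every row.
        for user_row in users.iter_rows(min_row=2, max_col=4):
            if user_row[0].value in members:
                current = user_row[3].value
                # Assumption: group names are joined with a comma.
                user_row[3].value = ",".join(filter(None, [current, group_name]))
    wb.save(path)


# Build a small sample workbook (made-up data) to exercise the function.
path = os.path.join(tempfile.mkdtemp(), "nagios.xlsx")
wb = Workbook()
users = wb.active
users.title = "contacts_users"
users.append(["user", "email", "phone", "groups"])
users.append(["alice", "a@example.com", "1", None])
users.append(["bob", "b@example.com", "2", "ops"])
groups = wb.create_sheet("contact_group_nagios")
groups.append(["group", "members"])
groups.append(["admins", "alice,bob"])
groups.append(["dev", "alice"])
wb.save(path)

add_groups_to_users(path)
updated = load_workbook(path)["contacts_users"]
result = {row[0]: row[3] for row in updated.iter_rows(min_row=2, values_only=True)}
print(result)
```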
