Creating a new sheet every month with openpyxl - python

Hi, I'm involved with tourist lodges in Namibia. We record water readings etc. every day, enter them into an Excel file, and calculate consumption per pax. The problem is that not every staff member understands Excel, so I wrote a simple Python program to enter the readings into Excel automatically. It works; the only problem is that I want to save each month in a new sheet and have all the data grouped by month (e.g. January (all readings), February (all readings)). I can create a new sheet, but I cannot write data to it: it just overwrites my data from the previous months. The code looks as follows:
import tkinter
from openpyxl import load_workbook
from openpyxl.styles import Font
import time
import datetime
book = load_workbook('sample.xlsx')
#sheet = book.active
Day = datetime.date.today().strftime("%B")
x = book.get_sheet_names()
list= x
if Day in list: # this checks if the sheet exists to stop the creation of multiple sheets with the same name
    sheet = book.active
else:
    book.create_sheet(Day)
    sheet = book.active
#sheet = book.active
And to write to the sheet I use an Entry widget, then save the value as follows:
Bh1=int(Bh1In.get())
if Bh1 == '0':
    import Error
else:
    sheet.cell(row=Day , column =4).value = Bh1
    number_format = 'Number'
Maybe I'm being stupid but please help!!

You're depending on getting the active worksheet instead of accessing it by name. Simply using something like:
try:
    sheet = book[Day]
except KeyError:
    sheet = book.create_sheet(Day)
is probably all you need.

Try
if Day in list: # this checks if the sheet exists to stop the creation of multiple sheets with the same name
    sheet = book.get_sheet_by_name(Day)
else:
    book.create_sheet(Day)
    book.save('sample.xlsx')
    sheet = book.get_sheet_by_name(Day)
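Putting the two answers together, a minimal sketch might look like the following. It assumes a reasonably recent openpyxl, where `book.sheetnames` and `book[name]` are available. The helper names `get_month_sheet` and `record_reading`, the two-column row layout, and the in-memory `Workbook()` standing in for `load_workbook('sample.xlsx')` are illustrative assumptions, not part of the original program. Note that `strftime("%B")` returns the month name in the current locale.

```python
import datetime

from openpyxl import Workbook


def get_month_sheet(book, when):
    """Return the sheet named after the month of `when`, creating it if missing."""
    name = when.strftime("%B")
    if name in book.sheetnames:
        return book[name]  # look the sheet up by name, not via book.active
    return book.create_sheet(name)


def record_reading(book, value, when):
    """Append one (date, value) row to the sheet for that month (hypothetical layout)."""
    sheet = get_month_sheet(book, when)
    sheet.append([when.isoformat(), value])
    return sheet


book = Workbook()  # stand-in for load_workbook('sample.xlsx')
record_reading(book, 120, datetime.date(2024, 1, 30))
record_reading(book, 135, datetime.date(2024, 1, 31))
record_reading(book, 98, datetime.date(2024, 2, 1))
```

Because `append` always writes to the next free row of the sheet it is called on, January's rows are left alone when February's first reading arrives. In the real program, `book.save('sample.xlsx')` would follow each write.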

Related

How to remove a large number of rows from Excel spreadsheet based on time?

I have an Excel spreadsheet with over 44,000 rows (sensor readings taken each minute for a month). I want to reduce them to every 15 minutes.
I want to remove rows where the Time column does not end in :01, :16, :31, or :46.
We can do this in two main steps:
Use the openpyxl package to retrieve the data from the Excel sheet:
pip3 install openpyxl
Use a string operation to check whether each Time value ends in one of the minutes you want to keep and, if it does, insert the value into a list.
If you want to append the result to the Excel sheet again, please check the reference on writing into Excel using openpyxl.
# importing openpyxl module
import openpyxl
# Give the location of the file
path = "C:\\Users\\Admin\\Desktop\\demo.xlsx"
# workbook object is created
wb_obj = openpyxl.load_workbook(path)
sheet_obj = wb_obj.active
m_row = sheet_obj.max_row
aList = []
# Loop over the values of the first column, skipping the header row
for i in range(2, m_row + 1):
    cell_obj = sheet_obj.cell(row = i, column = 1)
    # Assumes the Time cell is text ending in the minutes, e.g. "12:16"
    if str(cell_obj.value)[-2:] in ("01", "16", "31", "46"):
        aList.append(cell_obj.value)
For more information about openpyxl, please see: How to read from Excel sheet using openpyxl
I managed to find a solution using Python, so this is no longer an issue. Thank you.
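# data is a pandas DataFrame whose 'Time' column holds datetimes;
# '15min' is the spelling of the 15-minute frequency accepted across pandas versions
data2 = data.set_index('Time').resample('15min').mean()
data2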
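Note that `resample(...).mean()` averages each 15-minute window rather than keeping the original rows. If the goal is to keep only the readings taken at :01, :16, :31 and :46, a plain-Python filter is one possible sketch. The function name `keep_quarter_hour_rows`, the row layout (timestamp first), and the `"%Y-%m-%d %H:%M"` text format are assumptions for illustration; rows read with openpyxl may already hold `datetime` objects, which the function also accepts.

```python
from datetime import datetime

KEEP_MINUTES = {1, 16, 31, 46}  # minutes named in the question


def keep_quarter_hour_rows(rows, time_index=0, fmt="%Y-%m-%d %H:%M"):
    """Return only the rows whose timestamp falls on one of KEEP_MINUTES."""
    kept = []
    for row in rows:
        stamp = row[time_index]
        if isinstance(stamp, str):
            stamp = datetime.strptime(stamp, fmt)  # assumed text format
        if stamp.minute in KEEP_MINUTES:
            kept.append(row)
    return kept


readings = [
    ("2019-05-01 00:00", 20.1),
    ("2019-05-01 00:01", 20.2),
    ("2019-05-01 00:02", 20.3),
    ("2019-05-01 00:16", 20.4),
    ("2019-05-01 00:17", 20.5),
]
kept = keep_quarter_hour_rows(readings)
```

The kept rows can then be written to a fresh sheet with `sheet.append(row)`, which avoids deleting tens of thousands of rows in place.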

How to paste into cell based on value - Openpyxl

Good morning all.
I'm in a situation where I have two Excel workbooks. The first has my source data, and the second I'm trying to paste the source data into.
My code searches the first workbook for a particular cell containing today's date, finds the cells of data I require associated with it, then tries to paste that range of data into a second workbook.
The code is able to currently iterate over the first workbook and find the correct data, but the issue comes to when I try and paste the data into the second workbook.
If, for example, the data found is from A40:C40, it will paste into the second workbook at the same location (A40:C40). I need the code to iterate over the second workbook and find the correct location to paste the data in, based on another cell's value.
To be clear, the location of the copy and the paste varies every day. I cannot use a fixed cell reference.
from openpyxl import Workbook
import openpyxl
import datetime
wb = openpyxl.load_workbook('Online Log.xlsx')
wb1 = openpyxl.load_workbook('Blank.xlsx')
sheet = wb['Weather']
sheet1 = wb1['Sheet2']
# Find yesterday in date format
today = datetime.date.today()
yesterday = str(today - datetime.timedelta(days=1))
# Find position of midnight position on today's DPR
for row in sheet.iter_rows():
    for cell in row:
        if str(cell.value) == (str(today) + ' 00:00:00'):
            Start_Coord = sheet.cell(row=cell.row, column=3).coordinate
            End_Coord = sheet.cell(row=cell.row + 3, column=9).coordinate
for row in sheet[Start_Coord:End_Coord]:
    for cell in row:
        sheet1[cell.coordinate].value = cell.value
wb1.save('file2.xlsx')
I've tried incorporating the following code to search for the relevant place to paste into the second workbook, but that doesn't work either.
for rows in sheet1.iter_rows():
    for cell in rows:
        if str(cell.value) == 'Paste Cell Below':
            Start_Coord_2 = sheet1.cell(row=cell.row, column=3).coordinate
            End_Coord_2 = sheet1.cell(row=cell.row + 3, column=9).coordinate
for rows in sheet1[Start_Coord_2:End_Coord_2]:
    for cell in rows:
        sheet1[cell.coordinate].value = cell.value
        print(cell.coordinate)
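The second attempt copies each destination cell onto itself, because both the loop and the assignment use `sheet1`. One way to approach this is to copy by row and column offsets instead of reusing `cell.coordinate`. The sketch below is only an illustration: the helpers `find_cell` and `copy_block` are hypothetical names, the workbooks are built in memory instead of loading 'Online Log.xlsx' and 'Blank.xlsx', and it assumes the block should land one row below the 'Paste Cell Below' marker, starting in column 3.

```python
from openpyxl import Workbook


def find_cell(sheet, text):
    """Return the first cell whose value, as a string, equals `text`, else None."""
    for row in sheet.iter_rows():
        for cell in row:
            if str(cell.value) == text:
                return cell
    return None


def copy_block(src, dst, src_top, src_left, n_rows, n_cols, dst_top, dst_left):
    """Copy an n_rows x n_cols block of values from src to dst at a new position."""
    for r in range(n_rows):
        for c in range(n_cols):
            value = src.cell(row=src_top + r, column=src_left + c).value
            dst.cell(row=dst_top + r, column=dst_left + c, value=value)


# In-memory stand-ins for the two workbooks in the question.
src = Workbook().active
src["A40"] = "2024-01-02 00:00:00"
for r in range(40, 44):
    for c in range(3, 10):
        src.cell(row=r, column=c, value=f"r{r}c{c}")

dst = Workbook().active
dst["A10"] = "Paste Cell Below"

start = find_cell(src, "2024-01-02 00:00:00")
marker = find_cell(dst, "Paste Cell Below")
# 4 rows x 7 columns (C..I), as in the question; pasted one row below the marker.
copy_block(src, dst, start.row, 3, 4, 7, marker.row + 1, 3)
```

The source and destination positions are independent here, so the data found at row 40 can be written at row 11 or anywhere else the marker happens to be that day.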

How can you create a copy of a worksheet by iterating through another while excluding certain rows in an array?

I'm comparing two workbooks using openpyxl. I have it incrementing a counter for later use and keeping track of rows that should be removed from the initial workbook. How do I go about getting rid of these rows from that workbook, or creating a new sheet (with the original then deleted) or a new workbook with those rows removed?
I've written the code up to this point, but I haven't found much about writing or deleting rows in a workbook and haven't had any concrete luck. Someone advised me to create a copy of the workbook instead, but I have had no success with that either.
from openpyxl import load_workbook
from tkinter import Tk
from tkinter.filedialog import askopenfilename
import datetime
import time
class ManualReporter:
    def __init__(self):
        '''
        Initializes Variables for use within the Class
        Hides the tkinter pop-up when using the file dialog
        '''
        Tk().withdraw()
        self.sap_file = None
        self.tracker_file = None
        self.wb_sap = None
        self.wb_wt = None
        self.XT = 0
        self.deadrows = []
    def open_sapfile(self):
        '''
        Sets the sap_file variable to be the first directory to the SAP Report based on what the User Selects in the File Dialog
        Sets that directory and the file as the current workbook under the variable self.wb_sap
        Creates a Backup of the SAP Report so that if Errors Occur a Fresh Clean Copy is Present
        '''
        self.sap_file = askopenfilename()
        self.wb_sap = load_workbook(filename=self.sap_file)
        # Code to create a backup File in-case of Error or Fault
        copyfile = "Untimed_Report_SAP_" + str(datetime.date.today())+".xlsx"
        self.wb_sap.save(copyfile)
        print(self.sap_file)
    def open_tracker(self):
        '''
        Same as Above, sets self.tracker_file as a filedialog which retrieves the file's directory (User Inputted)
        Loads the File Workbook as self.wb_wt
        Creates a Backup of the Second SAP Report so that if Error Occurs a Clean Copy is Present.
        '''
        self.tracker_file = askopenfilename()
        self.wb_wt = load_workbook(filename=self.tracker_file)
        print(self.tracker_file)
    def check_rows(self):
        '''
        Sets the Active Sheets in Both the Workbook Variables,
        Creates a New Sheet in the Newest Report to Contain the Modified Data,
        Iterates through the Rows of the Two Sheets checking for a Comparison in Part Number,
        OpCode and then Compares the X/T/P Classification and Adjusts Data in Second Sheet
        '''
        start = time.time()
        sap = self.wb_sap.worksheets[0] #Sets The First Sheet in the Excel Workbook as the variable sap
        wt = self.wb_wt.worksheets[0]#Sets the First Sheet in the Second Report as the var wt
        ws1 = self.wb_sap.create_sheet("Sheet1", 1)#Create a Spare Sheet in the First Report to place the Adjusted Data
        ws1 = self.wb_sap.worksheets[1]#Sets ws1 as the Active Second Sheet for New Data
        for saprow in sap.iter_rows():
            for wtrow in wt.iter_rows():
                if (saprow[3].value == wtrow[4].value and int(saprow[2].value) == int(wtrow[5].value)):# IF Material NUM & OPCode MATCH DO:
                    if wtrow[7].value in ("T","P"): #WT Entry is Marked as T/P
                        if saprow[4].value is "X": #SAP Report Entry is Marked as X
                            self.XT += 1#Increment X->Ts Counts
                            #print("X->T")
                            self.deadrows.append(saprow)
                        else:
                            if saprow not in self.deadrows:
                                ws1.append(saprow)
        end = time.time()
        #print("Finished, Total X->Ts: ", self.XT)
        print("Time Taken: ", (end - start))
x = ManualReporter()
x.open_sapfile()
x.open_tracker()
x.check_rows()
My expectation is that the output would be an exact copy of workbook one, but with the rows that had a certain change in values removed. I expected to be able to delete them, but nothing I've tried has achieved anything other than broken code or issues.
    self.deadrows.append(saprow)
else:
    if saprow not in self.deadrows:
        for i in saprow:
            #Code to Create a row in ws1.
            #Code to Append value of saprow[i] to current ws1 rows
EDIT 1: I included my attempts to append the rows to a copied worksheet.
EDIT 2: I thought about manually iterating through saprow and appending the data to the rows of the new sheet, but I've stumped myself thinking about it.
After ample help I have reached the conclusion that, to copy data from one sheet to another, you can copy the data over row by row like this:
self.workbook = load_workbook(filename="filepath")
sheet1 = self.workbook.worksheets[0]
sheet2 = self.workbook.create_sheet("Sheet 2")
sheet2 = self.workbook.worksheets[1]
for row in sheet1.iter_rows():
    sheet2.append([cell.value for cell in row])
I also figured out that if you want to filter out data, you can add if statements inside the for-loop above to limit which rows have their cells written into the new worksheet.
# self.RowsToExclude is some list containing the rows that will be excluded.
for row in sheet1.iter_rows():
    if row not in self.RowsToExclude:
        sheet2.append([cell.value for cell in row])
Finally, I'd like to thank all those who contributed towards me reaching this conclusion.
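A variation on the same idea, sketched under a few assumptions: instead of keeping whole row tuples in a list, it records the row numbers to drop in a set, which makes the membership test cheap and unambiguous. The function name `copy_rows_except` and the sample data are illustrative only, and the workbook is created in memory instead of being loaded from a file.

```python
from openpyxl import Workbook


def copy_rows_except(source, target, excluded_rows):
    """Append every row of `source` to `target`, skipping the given row numbers."""
    for row in source.iter_rows():
        if row[0].row in excluded_rows:  # 1-based row number of this row
            continue
        target.append([cell.value for cell in row])


book = Workbook()
sheet1 = book.worksheets[0]
for record in (["part", "op", "flag"], ["A1", 10, "X"], ["B2", 20, "T"], ["C3", 30, "X"]):
    sheet1.append(record)

sheet2 = book.create_sheet("Sheet 2")
copy_rows_except(sheet1, sheet2, excluded_rows={3})  # drop the "B2" row

copied = [[cell.value for cell in row] for row in sheet2.iter_rows()]
```

In `check_rows`, this would mean collecting `saprow[0].row` rather than `saprow` itself while comparing, then doing the copy once after the nested loops have finished.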

Python 3x win32com: Copying used cells from worksheets in workbook

I have 6 work sheets in my workbook. I want to copy data (all used cells except the header) from 5 worksheets and paste them into the 1st. Snippet of code that applies:
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(mergedXL)
wsSIR = wb.Sheets(1)
sheetList = wb.Sheets
for ws in sheetList:
    used = ws.UsedRange
    if ws.Name != "1st sheet":
        print ("Copying cells from "+ws.Name)
        used.Copy()
used.Copy() will copy ALL used cells, however I don't want the first row from any of the worksheets. I want to be able to copy from each sheet and paste it into the first blank row in the 1st sheet. So when cells from the first sheet (that is NOT the sheet I want to copy to) are pasted in the 1st sheet, they will be pasted starting in A3. Every subsequent paste needs to happen in the first available blank row. I probably haven't done a great job of explaining this, but would love some help. Haven't worked with win32com a ton.
I also have this code from one of my old scripts, but I don't understand exactly how it's copying stuff and how I can modify it to work for me this time around:
ws.Range(ws.Cells(1,1),ws.Cells(ws.UsedRange.Rows.Count,ws.UsedRange.Columns.Count)).Copy()
wsNew.Paste(wsNew.Cells(wsNew.UsedRange.Rows.Count,1))
If I understand your problem correctly, I think this code will do the job:
import win32com.client
# create an instance of Excel
excel = win32com.client.gencache.EnsureDispatch('Excel.Application')
# Open the workbook
file_name = r'path_to_your\file.xlsx'  # raw string, so that \f is not read as an escape sequence
wb = excel.Workbooks.Open(file_name)
# Select the first sheet on which you want to write your data from the other sheets
ws_paste = wb.Sheets('Sheet1')
# Loop over all the sheets
for ws in wb.Sheets:
    if ws.Name != 'Sheet1': # Not the first sheet
        used_range = ws.UsedRange.SpecialCells(11) # 11 = xlCellTypeLastCell from VBA Range.SpecialCells Method
        # used_range.Row and used_range.Column give the last used row and column
        # Copy the Range from the cell A2 to the last row/col
        ws.Range("A2", ws.Cells(used_range.Row, used_range.Column)).Copy()
        # Get the last row used in your first sheet
        # NOTE: +1 to go to the next line so the pasted data does not overlap
        row_copy = ws_paste.UsedRange.SpecialCells(11).Row + 1
        # Paste on the first sheet starting the first empty row and column A(1)
        ws_paste.Paste(ws_paste.Cells(row_copy, 1))
# Save and close the workbook
wb.Save()
wb.Close()
# Quit excel instance
excel.Quit()
I hope it helps you to understand your old code as well.
Have you considered using pandas?
import pandas as pd
# create a list of pandas dataframes, one per sheet (data starts at E6)
dfs=[pd.read_excel("source.xlsx",sheet_name=n,skiprows=5,usecols="E:J") for n in range(0,4)]
# concatenate the dataframes
df=pd.concat(dfs)
# write the dataframe to another spreadsheet; the with-block saves and closes the file
with pd.ExcelWriter('merged.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')

Can't save excel file using openpyxl

I'm having an issue with saving an Excel file in openpyxl.
I'm trying to create a processing script which would grab data from one excel file, dump it into a dump excel file, and after some tweaking around with formulas in excel, I will have all of the processed data in the dump excel file. My current code is as so.
from openpyxl import load_workbook
import os
import datetime
from openpyxl.cell import get_column_letter, Cell, column_index_from_string, coordinate_from_string
dump = dumplocation
desktop = desktoplocation
date = datetime.datetime.now().strftime("%Y-%m-%d")
excel = load_workbook(dump+date+ ".xlsx", use_iterators = True)
sheet = excel.get_sheet_by_name("Sheet1")
try:
    query = raw_input('How many rows of data is there?\n')
except ValueError:
    print 'Not a number'
#sheetname = raw_input('What is the name of the worksheet in the data?\n')
for filename in os.listdir(desktop):
    if filename.endswith(".xlsx"):
        print filename
        data = load_workbook(filename, use_iterators = True)
        ws = data.get_sheet_by_name(name = '17270115')
        #copying data from excel to data excel
        n=16
        for row in sheet.iter_rows():
            for cell in row:
                for rows in ws.iter_rows():
                    for cells in row:
                        n+=1
                        if (n>=17) and (n<=32):
                            cell.internal_value = cells.internal_value
#adding column between time in UTC and the data
column_index = 1
new_cells = {}
sheet.column_dimensions = {}
for coordinate, cell in sheet._cells.iteritems():
    column_letter, row = coordinate_from_string(coordinate)
    column = column_index_from_string(column_letter)
    # shifting columns
    if column >= column_index:
        column += 1
    column_letter = get_column_letter(column)
    coordinate = '%s%s' % (column_letter, row)
    # it's important to create new Cell object
    new_cells[coordinate] = Cell(sheet, column_letter, row, cell.value)
sheet.cells = new_cells
#setting columns to be hidden
for coordinate, cell in sheet._cells.iteritems():
    column_letter, row = coordinate_from_string(coordinate)
    column = column_index_from_string(column_letter)
    if (column<=3) and (column>=18):
        column.set_column(column, options={'hidden': True})
A lot of my code is messy I know since I just started Python two or three weeks ago. I also have a few outstanding issues which I can deal with later on.
It doesn't seem like a lot of people are using openpyxl for my purposes.
I tried using the normal Workbook module but that didn't seem to work because you can't iterate in the cell items. (which is required for me to copy and paste relevant data from one excel file to another)
UPDATE: I realised that openpyxl can only create workbooks but can't edit current ones. So I have decided to change tack and edit the new workbook after I have transferred the data into it. I have resorted to going back to Workbook to transfer the data:
from openpyxl import Workbook
from openpyxl import worksheet
from openpyxl import load_workbook
import os
from openpyxl.cell import get_column_letter, Cell, column_index_from_string, coordinate_from_string
dump = "c:/users/y.lai/desktop/data/201501.xlsx"
desktop = "c:/users/y.lai/desktop/"
excel = Workbook()
sheet = excel.add_sheet
try:
    query = raw_input('How many rows of data is there?\n')
except ValueError:
    print 'Not a number'
#sheetname = raw_input('What is the name of the worksheet in the data?\n')
for filename in os.listdir(desktop):
    if filename.endswith(".xlsx"):
        print filename
        data = load_workbook(filename, use_iterators = True)
        ws = data.get_sheet_by_name(name = '17270115')
        #copying data from excel to data excel
        n=16
        q=0
        for x in range(6,int(query)):
            for s in range(65,90):
                for cell in Cell(sheet,chr(s),x):
                    for rows in ws.iter_rows():
                        for cells in rows:
                            q+=1
                            if q>=5:
                                n+=1
                                if (n>=17) and (n<=32):
                                    cell.value = cells.internal_value
But this doesn't seem to work still
Traceback (most recent call last):
  File "xxx\Desktop\xlspostprocessing.py", line 40, in <module>
    for cell in Cell(sheet,chr(s),x):
  File "xxx\AppData\Local\Continuum\Anaconda\lib\site-packages\openpyxl\cell.py", line 181, in __init__
    self._shared_date = SharedDate(base_date=worksheet.parent.excel_base_date)
AttributeError: 'function' object has no attribute 'parent'
Went through the API but..I'm overwhelmed by the coding in there so I couldn't make much sense of the API. To me it looks like I have used the Cell module wrongly. I read the definition of the Cell and its attributes, thus having the chr(s) to give the 26 alphabets A-Z.
You can iterate using the standard Workbook mode. use_iterators=True has been renamed read_only=True to emphasise what this mode is used for (on demand reading of parts).
Your code as it stands cannot work with this method as the workbook is read-only and cell.internal_value is always a read only property.
However, it looks like you're not getting that far because there is a problem with your Excel files. You might want to submit a bug with one of the files. Also the mailing list might be a better place for discussion.
You could try using xlrd and xlwt instead of openpyxl, but you might find exactly what you are looking to do already available in xlutils; all are from python-excel.
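To illustrate the point that the standard (non-read-only) mode can edit an existing file, here is a small sketch against a recent openpyxl release, where `Worksheet.insert_cols` and `column_dimensions[...].hidden` are available. It is not a port of the code in the question: the file name, the sample rows, and the choice of which columns to hide are made up for the example, and a temporary directory stands in for the real dump location.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook
from openpyxl.utils import get_column_letter

# Create a small file first so there is something existing to edit.
path = os.path.join(tempfile.mkdtemp(), "dump.xlsx")
seed = Workbook()
seed.active.append(["time_utc", "value_a", "value_b"])
seed.active.append(["00:00", 1.5, 2.5])
seed.save(path)

# Re-open it in the normal mode (no read_only=True), so cells can be changed.
book = load_workbook(path)
sheet = book.active

# Insert an empty column after the time column; existing data shifts right.
sheet.insert_cols(2)

# Hide the two data columns, which now sit at positions 3 and 4.
for index in (3, 4):
    sheet.column_dimensions[get_column_letter(index)].hidden = True

book.save(path)
reloaded = load_workbook(path).active
```

This replaces the manual rebuilding of `sheet._cells` and the direct construction of `Cell` objects, both of which rely on internals rather than on the public worksheet methods.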
