Here is my process:
Step 1: Open File1
Step 2: Load Sheet1
Step 3: Load OutputFile
Step 4: Create a new sheet in OutputFile
Step 5: Copy contents cell by cell from Step 2 to paste in the sheet created in Step 4
Step 6: Repeat the process 'n' number of times
I have created a Python script to achieve this, but the program is insanely slow: it takes an hour to complete. Here is a snippet of the code that does the copying:
import xlsxwriter as xlsx
import openpyxl as xl
for i in range(6, k):
    # get the file location/name from source file
    filename = sheet.cell_value(i, 3)
    # get the sheetname from the sheet read in the above statement
    sheetname = sheet.cell_value(i, 4)
    # print the file name to verify
    print(filename)
    # get output sheet name
    outputsheetname = sheet.cell_value(i, 5)
    # load the source workbook
    wb1 = xl.load_workbook(filename=filename, data_only=True)
    # get the index of sheet to be copied
    wb1_sheet_index = wb1.sheetnames.index(sheetname)
    # load the sheet
    ws1 = wb1.worksheets[wb1_sheet_index]
    # load the output workbook
    wb2 = xl.load_workbook(filename=output_loc)
    # create a new sheet in output workbook
    ws2 = wb2.create_sheet(outputsheetname)
    # print(ws2, ":", outputsheetname)
    for row in ws1:
        for cell in row:
            ws2[cell.coordinate].value = cell.value
    wb2.save(output_loc)
wb2.save(output_loc)
The filename, sheetname and outputsheetname come from a master Excel sheet where I keep the file locations and sheet names. I load this file before the loop.
Also, I want the contents of the cell to be copied. If the source sheet has any formula, I do not want that to be copied over. And if there is a value 500 in Cell A5, I want the value to be in cell A5 in the output sheet.
Maybe I am approaching this the wrong way. Any help is appreciated.
openpyxl is one of the slower modules for working with Excel files. You can try doing this with xlwings instead, or, if you're okay with using an Excel add-in, you may prefer RDB Merge; it is comparatively fast and does the job.
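If you want to stay with openpyxl, much of the time in the loop from the question probably goes into reloading and re-saving the output workbook for every sheet. Here is a minimal sketch that loads the output workbook once and saves it once. The function name `copy_sheets` and the `jobs` list of `(filename, sheetname, outputsheetname)` tuples are my own placeholders for the values read from the master sheet; I have not timed this against the original files.

```python
import openpyxl as xl


def copy_sheets(jobs, output_loc):
    """Copy cell values for each (filename, sheetname, outputsheetname) job.

    `jobs` is a hypothetical stand-in for the rows read from the master sheet.
    """
    # Load the output workbook once, not once per source sheet
    wb_out = xl.load_workbook(output_loc)
    for filename, sheetname, outputsheetname in jobs:
        # read_only=True avoids building the full workbook in memory;
        # data_only=True returns cached values instead of formulas
        wb_src = xl.load_workbook(filename, read_only=True, data_only=True)
        ws_src = wb_src[sheetname]
        ws_out = wb_out.create_sheet(outputsheetname)
        for row in ws_src.iter_rows():
            for cell in row:
                if cell.value is not None:
                    # Same row and column as in the source, so A5 stays in A5
                    ws_out.cell(row=cell.row, column=cell.column, value=cell.value)
        wb_src.close()  # read-only workbooks keep the file open until closed
    # Save once, after all sheets have been copied
    wb_out.save(output_loc)
```

Note that `data_only=True` only returns a value for a formula cell if the file was last saved by an application that stored the calculated result (Excel does this).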
Related
I'm trying to copy and paste some data from one sheet to another sheet. The code works fine, but I only need the values.
original_wb = xl.load_workbook(filename1)
copy_to_wb = xl.load_workbook(filename1)
source_sheet = original_wb.worksheets[0]  # The first worksheet
copy_to_sheet = copy_to_wb.create_sheet(source_sheet.title + "_copy")
for row in source_sheet:
    for cell in row:
        copy_to_sheet[cell.coordinate].value = cell.value
copy_to_wb.save(str(filename1))
Can this be done in pandas instead?
If you just want the values to be read and copied to a new sheet, try the pandas read_excel and to_excel functions.
import pandas as pd
file_name = r"path"
# Read
df = pd.read_excel(io=file_name, sheet_name='name')
# process required data
# write to new workbook or sheet
# (writing to an existing path this way replaces the whole file)
df.to_excel(file_name, sheet_name='name')
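To add the copy as a new sheet in the same workbook without replacing the file, one option is `pd.ExcelWriter` in append mode. This is only a sketch: the function name `copy_sheet_values` and its arguments are made up for illustration, and it assumes a reasonably recent pandas with the openpyxl engine installed.

```python
import pandas as pd


def copy_sheet_values(path, source_sheet, target_sheet):
    """Copy the values of one sheet into a new sheet of the same workbook."""
    # header=None treats the first row as data rather than as column labels
    df = pd.read_excel(path, sheet_name=source_sheet, header=None)
    # mode="a" opens the existing workbook and adds a sheet to it
    with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
        # header=False and index=False write only the cell values
        df.to_excel(writer, sheet_name=target_sheet, header=False, index=False)
```

Usage would be something like `copy_sheet_values(filename1, 0, "Sheet1_copy")`. Cell formatting is not carried over, since pandas only deals with the data.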
I have to copy data from different workbooks and paste it into a master workbook. All the workbooks are located in a folder: C:\Users\f65651\data transfer. The copied data should be merged into one and then overwritten into the Master wkbk cells. Subsequently also, data from updated workbooks should be overwritten in the Master wkbk.
After some help, I have been able to combine all the Excel workbooks:
import openpyxl as xl
import os
path1 = r'C:\Users\f65651\Rresult.xlsx'  # Master workbook
directory = r'C:\Users\f65651\data transfer'  # folder holding the source workbooks
wb1 = xl.load_workbook(filename=path1)
ws1 = wb1.worksheets[0]
# iterating over the workbooks
for filename in os.listdir(directory):
    if filename.endswith(".xlsx"):
        g = os.path.join(directory, filename)
        f = xl.load_workbook(filename=g)
        f1 = f.worksheets[0]
        print(filename, f1)
        for row in f1:
            values = [cell.value for cell in row]
            ws1.append(values)
wb1.save(path1)
print('Process finished!')
However, with the code above, the data is appended below the existing table in the Master workbook instead of being written directly into its cells.
I have tried fixing this issue but I don't know how. I feel I am not copying the workbooks into the Master workbook correctly. I also don't want to lose the formatting in the Master sheet. Please help!
For a better understanding of the problem, I have attached a snippet of what I am trying to achieve. Data 1 and 2 are examples of the workbooks, and the Result file is the master sheet.
https://i.stack.imgur.com/0G4lM.png
from openpyxl import Workbook, load_workbook
import os
directory = "workbooks"
master = Workbook()
master_sheet = master.active
master_sheet.title = "master_sheet"
for filename in os.listdir(directory):
    if filename.endswith(".xlsx"):
        file_path = os.path.join(directory, filename)
        sheet = load_workbook(file_path).active
        # Copy the first two (header) rows in place, then append the data rows starting from row 3
        for index, row in enumerate(sheet.iter_rows()):
            if index <= 1:
                for cell in row:
                    master_sheet[cell.coordinate].value = cell.value
            else:
                # Key by column so append() puts each value in its original column
                row_dict = {cell.column: cell.value for cell in row}
                master_sheet.append(row_dict)
master.save("sheet3.xlsx")
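If the existing Master workbook and its formatting must be kept, a variation is to load that file and assign to `cell.value` at explicit row and column positions, which leaves each cell's style alone. The sketch below makes several assumptions of my own: the function name `overwrite_master`, a header occupying the first two rows (so data starts at row 3), and source workbooks living in a folder that does not contain the master file itself.

```python
import os
from openpyxl import load_workbook


def overwrite_master(master_path, directory, first_data_row=3):
    """Write data rows from every .xlsx in `directory` into the master's cells."""
    master = load_workbook(master_path)
    ws = master.worksheets[0]
    next_row = first_data_row  # where the next data row is written in the master
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith(".xlsx"):
            continue
        src = load_workbook(os.path.join(directory, filename), data_only=True).worksheets[0]
        # Skip the header rows of each source sheet and read plain values
        for row in src.iter_rows(min_row=first_data_row, values_only=True):
            for col, value in enumerate(row, start=1):
                # Assigning .value keeps whatever style the master cell already has
                ws.cell(row=next_row, column=col).value = value
            next_row += 1
    master.save(master_path)
    return next_row - first_data_row  # number of rows written
```

Because writing always restarts at `first_data_row`, running it again after the source workbooks change overwrites the earlier values instead of appending below them. Rows left over from a previous, longer run are not cleared by this sketch.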
I have an optimization problem that runs in a for loop. I want the results of each new iteration to be saved in a different tab in the same workbook.
This is what I'm doing. Instead of giving me multiple tabs in the same workbook, I'm getting multiple workbooks.
from openpyxl import Workbook
wb1 = Workbook()
for i in range(n):
    ws = wb1.active()
    ws.title = str(i)
    #code on formatting sheet, optimization problem
    wb1.save('outfile'+str(i)+'.xlsx')
Every iteration you are grabbing the same worksheet - ws = wb1.active() - and then simply saving your results to a different workbook.
You simply need to create a new sheet on each iteration. Something like this:
from openpyxl import Workbook
wb1 = Workbook()
for i in range(n):
    ws = wb1.create_sheet("run " + str(i))
    #code on formatting sheet, optimization problem
wb1.save('outfile.xlsx')
Notice that the save is moved out of the loop, so the file is saved only once, after all worksheets have been formatted. It is not necessary to save on each iteration. The saving operation can take time, especially as more tabs are added.
This code creates an Excel workbook containing as many worksheets as there are strings in a text file taken as input. Here I have a text file named 'sample.txt' containing 3 strings, so the code creates 3 worksheets in a workbook named 'reformatted.data.xlsx'.
I have also removed the default worksheet that is created automatically when the Workbook object is created.
from openpyxl import Workbook
wb1 = Workbook()
# Raw string so the backslashes in the Windows path are not treated as escapes
with open(r'C:\Desktop\Mytestcases\sample.txt') as f:
    lines = f.readlines()
for i in range(len(lines)):
    ws = wb1.create_sheet("worksheet" + str(i))
    ws.cell(row=1, column=1).value = lines[i]
# Remove the default sheet that comes with a new workbook
wb1.remove(wb1['Sheet'])
# openpyxl writes the .xlsx format, so use that extension
wb1.save('reformatted.data.xlsx')
I have 6 work sheets in my workbook. I want to copy data (all used cells except the header) from 5 worksheets and paste them into the 1st. Snippet of code that applies:
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(mergedXL)
wsSIR = wb.Sheets(1)
sheetList = wb.Sheets
for ws in sheetList:
    used = ws.UsedRange
    if ws.Name != "1st sheet":
        print("Copying cells from " + ws.Name)
        used.Copy()
used.Copy() will copy ALL used cells, however I don't want the first row from any of the worksheets. I want to be able to copy from each sheet and paste it into the first blank row in the 1st sheet. So when cells from the first sheet (that is NOT the sheet I want to copy to) are pasted in the 1st sheet, they will be pasted starting in A3. Every subsequent paste needs to happen in the first available blank row. I probably haven't done a great job of explaining this, but would love some help. Haven't worked with win32com a ton.
I also have this code from one of my old scripts, but I don't understand exactly how it's copying stuff and how I can modify it to work for me this time around:
ws.Range(ws.Cells(1,1),ws.Cells(ws.UsedRange.Rows.Count,ws.UsedRange.Columns.Count)).Copy()
wsNew.Paste(wsNew.Cells(wsNew.UsedRange.Rows.Count,1))
If I understand your problem correctly, I think this code will do the job:
import win32com.client
# create an instance of Excel
excel = win32com.client.gencache.EnsureDispatch('Excel.Application')
# Open the workbook (raw string, otherwise "\f" is read as a form-feed escape)
file_name = r'path_to_your\file.xlsx'
wb = excel.Workbooks.Open(file_name)
# Select the first sheet on which you want to write your data from the other sheets
ws_paste = wb.Sheets('Sheet1')
# Loop over all the sheets
for ws in wb.Sheets:
    if ws.Name != 'Sheet1':  # Not the first sheet
        used_range = ws.UsedRange.SpecialCells(11)  # 11 = xlCellTypeLastCell from VBA Range.SpecialCells Method
        # used_range.Row and used_range.Column give the last used row and column
        # Copy the Range from the cell A2 to the last row/col
        ws.Range("A2", ws.Cells(used_range.Row, used_range.Column)).Copy()
        # Get the last row used in your first sheet
        # NOTE: +1 to move to the next row so the pasted data does not overlap
        row_copy = ws_paste.UsedRange.SpecialCells(11).Row + 1
        # Paste on the first sheet starting at the first empty row, column A (1)
        ws_paste.Paste(ws_paste.Cells(row_copy, 1))
# Save and close the workbook
wb.Save()
wb.Close()
# Quit excel instance
excel.Quit()
I hope it helps you to understand your old code as well.
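For comparison, if only the values matter (no formatting) and you would rather not drive Excel through COM, roughly the same merge can be sketched with openpyxl. The function name `merge_into_first_sheet` is hypothetical, and the sketch assumes the target is the first sheet in the workbook and that every other sheet has a single header row.

```python
from openpyxl import Workbook


def merge_into_first_sheet(wb):
    """Append every non-header row of the other sheets to the first sheet."""
    target = wb.worksheets[0]
    copied = 0
    for ws in wb.worksheets[1:]:
        # min_row=2 skips the header row; values_only yields plain tuples
        for row in ws.iter_rows(min_row=2, values_only=True):
            # append() writes to the first row below the target's current data
            target.append(row)
            copied += 1
    return copied
```

With a file on disk this would be used as `wb = load_workbook(path)`, then `merge_into_first_sheet(wb)`, then `wb.save(path)`.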
Have you considered using pandas?
import pandas as pd
# create a list of pandas dataframes, one per sheet (data starts at E6)
dfs = [pd.read_excel("source.xlsx", sheet_name=n, skiprows=5, usecols="E:J") for n in range(0, 4)]
# concatenate the dataframes
df = pd.concat(dfs)
# write the dataframe to another spreadsheet; the file is saved when the block exits
with pd.ExcelWriter('merged.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')
I need to write to a different sheet in an Excel workbook each time my code runs. Suppose my Python script runs the first time and the data gets saved in sheet A; the next time some application runs my script, the data should be saved in sheet B. Sheet A should stay as it is in that workbook.
Is this possible? If yes, how?
Here is my code:
#!/usr/bin/env python
import subprocess
import xlwt
# universal_newlines=True makes communicate() return text rather than bytes
process = subprocess.Popen('Test_Project.exe', stdout=subprocess.PIPE, universal_newlines=True)
out, err = process.communicate()
wb = xlwt.Workbook()
sheet = wb.add_sheet('Sheet_A')  # next time it should save in Sheet_B
row = 0
for line in out.split('\n'):
    for i, wrd in enumerate(line.split()):
        if not wrd.startswith("***"):
            print(wrd)
            sheet.write(row, i, wrd)
    row = row + 1
wb.save('DDS.xls')
Any help is appreciated...
I would recommend using openpyxl. It can read and write xlsx files.
If needed, you can always convert them to xls with Excel or Open/LibreOffice,
assuming you have only one big file at the end.
This script creates a new Excel file if none exists and adds a new sheet every time it is run. I use the index + 1 as the sheet name (title), starting with 1. The numerical index starts at 0. You will end up with a file that has sheets named 1, 2, 3 etc. Each run writes your data into the last sheet.
import os
from openpyxl import Workbook, load_workbook
file_name = 'test.xlsx'
if os.path.exists(file_name):
    wb = load_workbook(file_name)
    last_sheet = wb.worksheets[-1]
    index = int(last_sheet.title)
    # create_sheet() takes the title first; new sheets go at the end by default
    ws = wb.create_sheet(title=str(index + 1))
else:
    wb = Workbook()
    ws = wb.worksheets[0]
    ws.title = '1'
ws['A2'] = 'new_value'
wb.save(file_name)
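If you prefer lettered names such as Sheet_A, Sheet_B as in the question, the next free title can be worked out from the titles already in the workbook. This is just a sketch; `next_sheet_title` and its `prefix` argument are names I made up, and it stops at Sheet_Z.

```python
import string


def next_sheet_title(existing_titles, prefix="Sheet_"):
    """Return the first title of the form prefix + letter that is not yet used."""
    # Collect the suffixes already taken, e.g. {"A", "B"}
    used = {t[len(prefix):] for t in existing_titles if t.startswith(prefix)}
    for letter in string.ascii_uppercase:
        if letter not in used:
            return prefix + letter
    raise ValueError("all 26 lettered sheet names are already in use")
```

With openpyxl it could be combined with the script above as `ws = wb.create_sheet(next_sheet_title(wb.sheetnames))`.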