Trouble copying an xlsx file to another using Python - python

This might look silly to some of you, but I am new to Python, so I don't quite know what is happening.
I need to delete the first column and the first 7 rows of an Excel sheet. After looking it up, I found on this website that opening another file and copying only what I need would be easier, so I tried something like this:
import openpyxl
#File to be copied
wb = openpyxl.load_workbook(r"C:\Users\gb2gaet\Nova pasta\old.xlsx") #Add file name
sheet = wb["Sheet1"]#Add Sheet name
#File to be pasted into
template = openpyxl.load_workbook(r"C:\Users\gb2gaet\Nova pasta\new.xlsx") #Add file name
temp_sheet = wb["Sheet1"] #Add Sheet name
#Takes: start cell, end cell, and sheet you want to copy from.
def copyRange(startCol, startRow, endCol, endRow, sheet):
    rangeSelected = []
    #Loops through selected Rows
    for i in range(startRow,endRow + 1,1):
        #Appends the row to a RowSelected list
        rowSelected = []
        for j in range(startCol,endCol+1,1):
            rowSelected.append(sheet.cell(row = i, column = j).value)
        #Adds the RowSelected List and nests inside the rangeSelected
        rangeSelected.append(rowSelected)
    return rangeSelected
#Paste data from copyRange into template sheet
def pasteRange(startCol, startRow, endCol, endRow, sheetReceiving, copiedData):
    countRow = 0
    for i in range(startRow,endRow+1,1):
        countCol = 0
        for j in range(startCol,endCol+1,1):
            sheetReceiving.cell(row = i, column = j).value = copiedData[countRow][countCol]
            countCol += 1
        countRow += 1
def createData():
    print("Processing...")
    selectedRange = copyRange(2,8,17,100000,sheet)
    pasteRange(1,1,16,100000,temp_sheet,selectedRange)
    wb.save("new.xlsx")
    print("Range copied and pasted!")
The program runs without any error, but when I look at the new file it is completely empty. What am I missing?
If you can think of an easier way to delete the rows and columns, I am open to changing all of the code.

I'd recommend doing this with pandas. Read the Excel file into a DataFrame with the pandas.read_excel() function, use the DataFrame.drop() function to remove the rows and columns you don't want, then export the DataFrame to a new Excel file with the to_excel() function.
The code would look something like this:
import pandas as pd
# header=None keeps every sheet row as data, so row positions match the sheet
df = pd.read_excel(r"C:\Users\gb2gaet\Nova pasta\old.xlsx", header=None)
# By default only the first sheet is read. With sheet_name=None the file
# is imported as a dictionary of DataFrames instead, where each
# key-value pair corresponds to one sheet.
df = df.drop(index=range(7))         # remove the first 7 rows
df = df.drop(columns=df.columns[0])  # remove the first column
df.to_excel('new.xlsx', index=False, header=False)
# a relative path like this is resolved against the current working directory
Here is some documentation on those functions:
read_excel(): https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html?highlight=read_excel
drop(): https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
to_excel(): https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html?highlight=to_excel#pandas.DataFrame.to_excel
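If you would rather stay with openpyxl, the worksheet methods delete_rows() and delete_cols() remove rows and columns in place. The sketch below is only an illustration: it builds a small in-memory workbook as stand-in data (the 10 x 4 grid and the cell labels are made up), whereas with the real file you would load old.xlsx with load_workbook() and save under a new name. As for the original script, note that temp_sheet is read from wb rather than template, that wb rather than template is saved, and that createData() is never called, which would leave new.xlsx untouched.

```python
from openpyxl import Workbook

# Stand-in data (hypothetical): a 10 x 4 grid whose values record
# the row and column they started in.
wb = Workbook()
ws = wb.active
for r in range(1, 11):
    for c in range(1, 5):
        ws.cell(row=r, column=c, value=f"r{r}c{c}")

ws.delete_rows(1, 7)  # remove 7 rows, starting at row 1
ws.delete_cols(1, 1)  # remove 1 column, starting at column 1

# What is left: original rows 8-10 and columns 2-4, now starting at A1.
remaining = [list(row) for row in ws.iter_rows(values_only=True)]

# wb.save("new.xlsx")  # uncomment to write the result to disk
```

Saving under a different file name keeps the original workbook intact.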

Related

Concatenating 2 Excel Sheets. - It finds an error and doesn't skip columns when none value?

Okay, it is a complicated sheet.
files = a list with the paths to two Excel sheets.
I want to attach the first Excel sheet to the next one.
It has 500+ columns.
But the end result is strange: instead of leaving None in a lot of columns, the new rows (200+) have just 15 columns, filled with every value that is not None.
from openpyxl import load_workbook
def concatenateOBMs(files):
    # Get first file as the main file
    wb_main = load_workbook(files[0])
    # Go through all the files that are selected and use each as a workbook to copy to the original sheet.
    for i in range(1, len(files)):
        wb = load_workbook(files[i])
        # Make the new sheet active
        ws = wb.active
        # Count how many telephones are in the new sheet
        row_count = ws.max_row
        n = (row_count - 8) / 48
        # Count at which row we should start to write
        index_row = 56 + ((n-1)*48)
        # Select Range
        n1 = '9'
        n2 = index_row
        rng = ws[n1:'200']
        # Loop through all the rows
        for cells in rng:
            index = 1
            index_row +=1
            # Loop through all the values per row
            for cell in cells:
                index +=1
                # Add the values to the right position in the sheet
                ws2 = wb_main.active
                ws2.cell(row = index_row, column = index).value = cell.value
    wb_main.save(filename = 'Complete' + '.xlsx')
When I open Complete.xlsx, Excel says: Repaired Part: /xl/worksheets/sheet1.xml part with XML error. Load error. Line 1, column 0.
And the new rows look like the following:
[image: screenshot of the new rows]
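One possible reading of the symptoms, offered as an assumption rather than a confirmed diagnosis: in Python 3, (row_count - 8) / 48 is float division, so index_row becomes a float and is then used as a row number. Separately, iterating with iter_rows(values_only=True) yields None for empty cells, so each value keeps its column position. The sketch below uses two small in-memory sheets as stand-ins for the two files (the headers and values are made up) and copies the rows of the second sheet below the first, using an integer row counter.

```python
from openpyxl import Workbook

# Stand-in for the main file (hypothetical data).
wb_main = Workbook()
ws_main = wb_main.active
ws_main.append(["id", "name", "phone"])
ws_main.append([1, "a", None])

# Stand-in for the file to attach (hypothetical data, with gaps).
wb_other = Workbook()
ws_other = wb_other.active
ws_other.append(["id", "name", "phone"])
ws_other.append([2, None, "555"])
ws_other.append([3, "c", None])

first_data_row = 2               # the question starts copying at row 9
next_row = ws_main.max_row + 1   # an int, so row numbers stay integers

for values in ws_other.iter_rows(min_row=first_data_row, values_only=True):
    # Empty cells arrive as None, so col still matches the source column.
    for col, value in enumerate(values, start=1):
        ws_main.cell(row=next_row, column=col, value=value)
    next_row += 1

result = [list(r) for r in ws_main.iter_rows(values_only=True)]
```

If the start row has to be computed from a block size, as in the question, integer division (//) keeps it an int.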

Is there a way to export a list of 100+ dataframes to excel?

So this is kind of weird but I'm new to Python and I'm committed to seeing my first project with Python through to the end.
So I am reading about 100 .xlsx files from a file path. I then trim each file and send only the important information to a list, as an individual and unique dataframe. So now I have a list of 100 unique dataframes, but iterating through the list and writing to Excel just overwrites the data in the file. I want to append to the end of the .xlsx file instead. The biggest catch in all of this is that I can only use Excel 2010; I do not have any other version of the application. The openpyxl library seems to have some interesting stuff, and I've tried something like this:
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
wb = load_workbook(outfile_path)
ws = wb.active
for frame in main_df_list:
    for r in dataframe_to_rows(frame, index = True, header = True):
        ws.append(r)
Note: In another post I was told it's not best practice to read dataframes line by line using loops, but when I started I didn't know that. I am however committed to this monstrosity.
Edit after reading Comments
So my code scrapes .xlsx files and stores specific data based on a keyword comparison into dataframes. These dataframes are stored in a list, I will list the entirety of the program below so hopefully I can explain what's in my head. Also, feel free to roast my code because I have no idea what is actual good python practices vs. not.
import os
import pandas as pd
from openpyxl import load_workbook
#the file path I want to pull from
in_path = r'W:\R1_Manufacturing\Parts List Project\Tool_scraping\Excel'
#the file path where row search items are stored
search_parameters = r'W:\R1_Manufacturing\Parts List Project\search_params.xlsx'
#the file I will write the dataframes to
outfile_path = r'W:\R1_Manufacturing\Parts List Project\xlsx_reader.xlsx'
#establishing my list that I will store looped data into
file_list = []
main_df = []
master_list = []
#open the file path to store the directory in files
files = os.listdir(in_path)
#database with terms that I want to track
search = pd.read_excel(search_parameters)
search_size = search.index
#searching only for files that end with .xlsx
for file in files:
    if file.endswith('.xlsx'):
        file_list.append(in_path + '/' + file)
#read in the files to a dataframe, main loop the files will be manipulated in
for current_file in file_list:
    df = pd.read_excel(current_file)
    #get columns headers and a range for total rows
    columns = df.columns
    total_rows = df.index
    #adding to store where headers are stored in DF
    row_list = []
    column_list = []
    header_list = []
    for name in columns:
        for number in total_rows:
            cell = df.at[number, name]
            if isinstance(cell, str) == False:
                continue
            elif cell == '':
                continue
            for place in search_size:
                search_loop = search.at[place, 'Parameters']
                #main compare, if str and matches search params, then do...
                #(insensitive_compare is a helper that is not shown in the post)
                if insensitive_compare(search_loop, cell) == True:
                    if cell not in header_list:
                        header_list.append(df.at[number, name]) #store data headers
                        row_list.append(number) #store row number where it is in that data frame
                        column_list.append(name) #store column number where it is in that data frame
                    else:
                        continue
                else:
                    continue
    for thing in column_list:
        df = pd.concat([df, pd.DataFrame(0, columns=[thing], index = range(2))], ignore_index = True)
    #turns the dataframe into a set of booleans where its true if
    #theres something there
    na_finder = df.notna()
    #create a new dataframe to write the output to
    outdf = pd.DataFrame(columns = header_list)
    for i in range(len(row_list)):
        k = 0
        while na_finder.at[row_list[i] + k, column_list[i]] == True:
            #I turn the dataframe into booleans and read until False
            if(df.at[row_list[i] + k, column_list[i]] not in header_list):
                #Store actual dataframe into my output dataframe, outdf
                outdf.at[k, header_list[i]] = df.at[row_list[i] + k, column_list[i]]
            k += 1
    main_df.append(outdf)
So main_df is a list that has 100+ dataframes in it. For this example I will only use 2 of them. I would like them to print out into excel like:
So the comment from Ashish really helped me, all of the dataframes had different column titles so my 100+ dataframes eventually concat'd to a dataframe that is 569X52. Here is the code that I used, I completely abandoned openpyxl because once I was able to concat all of the dataframes together, I just had to export it using pandas:
    # (still inside the per-file loop from the question)
    # what I want to do here is grab all the data in the same column as each
    # header, then move to the next column
    for i in range(len(row_list)):
        k = 0
        while na_finder.at[row_list[i] + k, column_list[i]] == True:
            if(df.at[row_list[i] + k, column_list[i]] not in header_list):
                outdf.at[k, header_list[i]] = df.at[row_list[i] + k, column_list[i]]
            k += 1
    main_df.append(outdf)
to_xlsx_df = pd.DataFrame()
for frame in main_df:
    to_xlsx_df = pd.concat([to_xlsx_df, frame])
to_xlsx_df.to_excel(outfile_path)
The output to excel ended up looking something like this:
Hopefully this can help someone else out too.
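As a side note, pd.concat() accepts the whole list at once, so the loop that grows to_xlsx_df one frame at a time is not strictly needed. A minimal sketch, with two made-up frames standing in for the 100+ in main_df: columns that exist in only some frames are kept, and the missing entries become NaN.

```python
import pandas as pd

# Hypothetical stand-ins for the frames collected in main_df.
frames = [
    pd.DataFrame({"Part": ["A1", "A2"], "Length": [10, 12]}),
    pd.DataFrame({"Part": ["B1"], "Diameter": [3.5]}),
]

# One call concatenates every frame; the result has the union of all columns.
combined = pd.concat(frames, ignore_index=True, sort=False)

# combined.to_excel("combined.xlsx", index=False)  # uncomment to export
```

Here ignore_index=True renumbers the rows from 0, which avoids repeated index labels in the exported file.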

Getting staircase output from openpyxl when importing data

I'm trying to import data from multiple sheets to another in excel, and in order to do this I need python to input the data into the first empty cell, instead of overwriting the data from the last file. It seems to almost work, however, each column is jumping to its "own" empty row, and not staying in the correct row with the rest of its matching data, creating a staircase type pattern.
This is my code
import os
import openpyxl
os.chdir('C:\\Users\\XX\\Desktop')
wb1 = openpyxl.load_workbook('Test file python.xlsx', data_only = True) #open source excel file
ws1 = wb1.worksheets[0]
wb2 = openpyxl.load_workbook('test3.xlsx', data_only = True) #destination excel file
ws2 = wb2.active
#row_offset = ws2.max_row + 1
for i in range(10,150):
    for j in range(3,13):
        c = ws1.cell(row = i, column = j)
        rowOffset = ws2.max_row + 1
        rowNum = rowOffset
        ws2.cell(row = rowNum, column = j-2).value = c.value
wb2.save('test3.xlsx')
Here is a screenshot of the output in Excel: [image: staircase output]
Every time you write a value into ws2 (i.e. ws2.cell(row = rowNum, column = j-2).value = c.value), ws2.max_row goes up by one, so each following column of the same source row lands one row lower. That is what creates the staircase effect.
Read ws2.max_row once, outside the nested loop (for example current_row = ws2.max_row), and compute the destination row from that value. That should fix your "staircase" issue.
Also, mind that max_row == 1 on the first iteration, which is why your data starts at row 2 and not at row 1.
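The fix described above can be sketched as follows. This is a minimal illustration with in-memory workbooks and made-up values instead of the two files from the question; the name row_offset is borrowed from the commented-out line in the question.

```python
from openpyxl import Workbook

# Stand-in source sheet (hypothetical data): a 3 x 3 block of numbers.
src_wb = Workbook()
ws1 = src_wb.active
for r in range(1, 4):
    for c in range(1, 4):
        ws1.cell(row=r, column=c, value=r * 10 + c)

# Stand-in destination sheet that already contains one row.
dst_wb = Workbook()
ws2 = dst_wb.active
ws2.append(["h1", "h2", "h3"])

# Read max_row once, before anything is written to ws2.
row_offset = ws2.max_row

for i in range(1, 4):
    for j in range(1, 4):
        value = ws1.cell(row=i, column=j).value
        # Every column of source row i goes to the same destination row.
        ws2.cell(row=row_offset + i, column=j, value=value)

rows = [list(r) for r in ws2.iter_rows(values_only=True)]
```

With the loop bounds from the question (rows from 10, columns from 3), the destination row would be row_offset + (i - 9) and the destination column j - 2.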

Copying data from Excel workbook to another workbook, specific rows and columns need to be selected

I have been able to open the workbook and save it, but I can't seem to copy and paste specific rows and columns. I would like to be able to use this for multiple sheets and append the data as the number of rows grows.
As a final product, I would like to select multiple Excel files, copy specific rows and columns from each, and append them all to one single Excel workbook. Right now I have to go through 20 workbooks and copy and paste everything into one workbook by hand.
I've tried a couple of different methods and searched on forums. I can only get to copy and paste sheets.
import openpyxl
#Prepare the spreadsheets to copy from and paste too.
#File to load
wb = openpyxl.load_workbook("Test_Book.xlsx")
# Get a sheet by name
sheet = wb['Sheet1']
#File to be pasted into
template = openpyxl.load_workbook("Copy of Test_Book.xlsx") #Add file name
temp_sheet = template['Sheet1'] #Add Sheet name
#Copy range of cells as a nested list
#Takes: start cell, end cell, and sheet you want to copy from.
def copyRange(startCol, startRow, endCol, endRow, sheet):
    rangeSelected = []
    #Loops through selected Rows
    #A 8 to BC 27
    for i in range(startRow,endRow + 1,1):
        #Appends the row to a RowSelected list
        rowSelected = []
        for j in range(startCol,endCol+ 1,1):
            rowSelected.append(sheet.cell(row = i, column = j).value)
        #Adds the RowSelected List and nests inside the rangeSelected
        rangeSelected.append(rowSelected)
    return rangeSelected
#Paste range
#Paste data from copyRange into template sheet
def pasteRange(startCol, startRow, endCol, endRow, sheetReceiving,copiedData):
    countRow = 0
    for i in range(startRow,endRow+1,1):
        countCol = 0
        for j in range(startCol,endCol+1,1):
            sheetReceiving.cell(row = i, column = j).value = copiedData[countRow][countCol]
            countCol += 1
        countRow += 1
def createData():
    print("Processing...")
    selectedRange = copyRange(1,2,4,14,sheet)
    pasteRange(1,2,4,14,temp_sheet,selectedRange)
    template.save("Copy of Test_Book.xlsx")
    print("Range copied and pasted!")
You can specify which row or column you want to loop through in your worksheet object.
import openpyxl
wb = openpyxl.load_workbook("your_excel_file")
ws = wb.active
some_column = [cell.value for cell in ws["A"]] # Change A to whichever column you want
some_row = [cell.value for cell in ws["1"]] # Change 1 to whichever row you want
You can then append the whole column/row to your new worksheet.
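For a rectangular block rather than a single row or column, iter_rows() takes row and column bounds, and append() writes each copied row below whatever the destination already holds. A minimal sketch with in-memory workbooks and made-up values; with real files you would use load_workbook() and save() as in the question.

```python
from openpyxl import Workbook

# Stand-in source sheet (hypothetical data): 5 rows, columns A-E,
# each cell holding its own coordinate as text.
src = Workbook()
ws_src = src.active
for r in range(1, 6):
    ws_src.append([f"{col}{r}" for col in "ABCDE"])

# Stand-in destination sheet that already has one row of content.
dst = Workbook()
ws_dst = dst.active
ws_dst.append(["existing", "row"])

# Copy rows 2-4, columns A-C, and append them after the existing content.
for values in ws_src.iter_rows(min_row=2, max_row=4,
                               min_col=1, max_col=3,
                               values_only=True):
    ws_dst.append(values)

copied = [list(r) for r in ws_dst.iter_rows(min_row=2, values_only=True)]
```

Running the same loop over each source workbook in turn keeps appending to the one destination sheet.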

Python/Pandas copy and paste from excel sheet

I found this syntax to copy and paste from a specific sheet in one workbook to another workbook. However, what I need help with is how to paste the copied information to a specific cell in the second workbook/sheet. For example, I need the information to be pasted starting at cell B3 instead of A1.
Thank you
import openpyxl as xl
path1 = "C:/Users/almur_000/Desktop/disandpopbyage.xlsx"
path2 = "C:/Users/almur_000/Desktop/disandpopbyage2.xlsx"
wb1 = xl.load_workbook(filename=path1)
ws1 = wb1.worksheets[0]
wb2 = xl.load_workbook(filename=path2)
ws2 = wb2.create_sheet(ws1.title)
for row in ws1:
    for cell in row:
        ws2[cell.coordinate].value = cell.value
wb2.save(path2)
wb2 is path2 "C:/Users/almur_000/Desktop/disandpopbyage2.xlsx"
Since the OP is using the openpyxl module I wanted to show a way to do this using that module. With this answer I demonstrate a way to move the original data to new column and row coordinates (there may be better ways to do this).
This fully reproducible example first creates a workbook for demonstration purposes called 'test.xlsx', with three sheets named 'test_1', 'test_2' and 'test_3'. Then, using openpyxl, it copies 'test_2' into a new workbook called 'new.xlsx', shifting the cells over 4 columns and down 3 rows. It makes use of the ord() and chr() functions.
import pandas as pd
import numpy as np
import openpyxl
# This section is sample code that creates a workbook in the current directory with 3 worksheets
df = pd.DataFrame(np.random.randn(10, 3), columns=list('ABC'))
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='test_1', index=False)
df.to_excel(writer, sheet_name='test_2', index=False)
df.to_excel(writer, sheet_name='test_3', index=False)
wb = writer.book
ws = writer.sheets['test_2']
writer.close()
# End of sample code that creates a workbook in the current directory with 3 worksheets
wb = openpyxl.load_workbook('test.xlsx')
ws_name_wanted = "test_2"
# Remove every sheet except the one we want to keep
for item in wb.sheetnames:
    if item != ws_name_wanted:
        wb.remove(wb[item])
ws = wb[ws_name_wanted]
for row in ws.iter_rows():
    for cell in row:
        cell_value = cell.value
        # chr()/ord() only handle single-letter columns (A-Z)
        new_col_loc = (chr(int(ord(cell.coordinate[0:1])) + 4))
        new_row_loc = cell.coordinate[1:]
        ws['%s%d' % (new_col_loc ,int(new_row_loc) + 3)] = cell_value
        ws['%s' % (cell.coordinate)] = ' '
wb.save("new.xlsx")
Here's what 'test.xlsx' looks like:
And here's what 'new.xlsx' looks like:
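A variation that avoids the chr()/ord() arithmetic is to work with numeric row and column indices, which also works past column Z. The sketch below is only an illustration: it uses in-memory workbooks with made-up values, and it assumes openpyxl 2.6 or later, where cell.column is an integer. The names row_shift and col_shift are hypothetical.

```python
from openpyxl import Workbook

# Stand-in source sheet (hypothetical data).
wb1 = Workbook()
ws1 = wb1.active
ws1.append(["age", "population"])
ws1.append([30, 1200])

# Destination workbook with a new sheet named after the source sheet.
wb2 = Workbook()
ws2 = wb2.create_sheet(ws1.title)

# Shifting by 2 rows and 1 column moves A1 to B3.
row_shift, col_shift = 2, 1

for row in ws1.iter_rows():
    for cell in row:
        ws2.cell(row=cell.row + row_shift,
                 column=cell.column + col_shift,
                 value=cell.value)

# wb2.save("new.xlsx")  # uncomment to write the result to disk
```

To paste at a different cell, only the two shift values change.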
Thank you to those who helped me.
I found the answer with a slight modification. I removed the last def statement and kept everything else as it is. It works fantastically: it copies and pastes in the place I need without removing anything from the template.
#! Python 3
# - Copy and Paste Ranges using OpenPyXl library
import openpyxl
#Prepare the spreadsheets to copy from and paste too.
#File to be copied
wb = openpyxl.load_workbook("foo.xlsx") #Add file name
sheet = wb["foo"] #Add Sheet name
#File to be pasted into
template = openpyxl.load_workbook("foo2.xlsx") #Add file name
temp_sheet = template["foo2"] #Add Sheet name
#Copy range of cells as a nested list
#Takes: start cell, end cell, and sheet you want to copy from.
def copyRange(startCol, startRow, endCol, endRow, sheet):
    rangeSelected = []
    #Loops through selected Rows
    for i in range(startRow,endRow + 1,1):
        #Appends the row to a RowSelected list
        rowSelected = []
        for j in range(startCol,endCol+1,1):
            rowSelected.append(sheet.cell(row = i, column = j).value)
        #Adds the RowSelected List and nests inside the rangeSelected
        rangeSelected.append(rowSelected)
    return rangeSelected
#Paste range
#Paste data from copyRange into template sheet
def pasteRange(startCol, startRow, endCol, endRow, sheetReceiving,copiedData):
    countRow = 0
    for i in range(startRow,endRow+1,1):
        countCol = 0
        for j in range(startCol,endCol+1,1):
            sheetReceiving.cell(row = i, column = j).value = copiedData[countRow][countCol]
            countCol += 1
        countRow += 1
def createData():
    print("Processing...")
    selectedRange = copyRange(1,2,4,14,sheet) #Change the 4 number values
    pastingRange = pasteRange(1,3,4,15,temp_sheet,selectedRange) #Change the 4 number values
    #You can save the template as another file to create a new file here too.
    template.save("foo.xlsx")
    print("Range copied and pasted!")
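To copy and paste an entire sheet from one workbook to another:
import pandas as pd
#path1 and path2 are the source and destination paths from the question
#change NameOfTheSheet to the name of the sheet that contains the data
data = pd.read_excel(path1, sheet_name="NameOfTheSheet")
#save it to the 'NewSheet' sheet in the destination file
#note: to_excel() with a path writes a new file, replacing any existing workbook at path2
data.to_excel(path2, sheet_name='NewSheet')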
