Copy Data from 1 Excel into another Excel - python

I am very new to programming and have tried a lot of things to solve my problem.
It feels like I know exactly what I have to do, but I cannot express it in Python.
I have two Excel workbooks: wb1 and wb2.
wb1 is a self-updating sheet built from downloaded CSV data; it contains one table with a growing number of rows and a fixed number of columns.
I want to insert the data from wb1 (columns A-J, all rows) into wb2 (sheet 3). Sheet 3 is a huge table with a lot of data, but only the part that comes from wb1 (same position as in the wb1 file) needs to be updated.
So I basically want to copy part of one workbook and insert it at a specific position in another workbook.
The data in workbook 2 is formatted as a table, so I guess I have to format the data in workbook 1 as a table as well before I can copy it. So far I have tried openpyxl, which seems to work well, but I can't specify the location in wb2.
I know this is a problem that can be solved within seconds, but I am really at a dead end right now.
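import openpyxl as xl

Workbook1 = "download.xlsx"
Workbook2 = "Git.xlsx"
wb1 = xl.load_workbook(filename=Workbook1)
ws1 = wb1.worksheets[0]
wb2 = xl.load_workbook(filename=Workbook2)
ws2 = wb2["Table3"]  # get_sheet_by_name() is deprecated
# ? (the missing step: copy columns A-J of ws1 into the same cells of ws2)
wb2.save(Workbook2)  # the original passed an undefined name, path2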

First, you have to read all the data from the first sheet in download.xlsx and append the values to a list of tuples; after that, write the values into the second file, Git.xlsx, using the same package, openpyxl.
I will not be able to write the whole code because my phone battery is running out, so I will share some useful links you can follow:
Reading using openpyxl
Writing using openpyxl
Lists in Python
Python List of tuples
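The steps described in this answer could be sketched roughly as follows. This is only a minimal sketch under assumptions: the helper name copy_columns is hypothetical, the default column range A-J (1-10) is taken from the question, and the two workbooks are created in memory as stand-ins for download.xlsx and Git.xlsx.

```python
from openpyxl import Workbook


def copy_columns(src_ws, dst_ws, min_col=1, max_col=10):
    """Copy the values of columns min_col..max_col (A-J by default)
    from src_ws into the same cells of dst_ws. Returns the row count."""
    # Read everything first: a list of tuples, one tuple per row.
    rows = list(src_ws.iter_rows(min_col=min_col, max_col=max_col,
                                 values_only=True))
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row, start=min_col):
            # Assign through .value so empty source cells also
            # overwrite whatever was in the target cell before.
            dst_ws.cell(row=r, column=c).value = value
    return len(rows)


# In-memory stand-ins for download.xlsx and Git.xlsx (assumption).
wb1 = Workbook()
ws1 = wb1.active
ws1.append(["id", "name"])
ws1.append([1, "alpha"])

wb2 = Workbook()
ws2 = wb2.active
ws2["L1"] = "keep me"  # data outside A-J must stay untouched

copied = copy_columns(ws1, ws2)
```

With real files, wb1 and wb2 would come from load_workbook() and the result would be written with wb2.save().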

Related

How to modify xml tables inside excel with openpyxl?

I'm trying to fill a table (inside an xlsx template for Dynamics NAV) with openpyxl, but when I open the file with Excel it prompts an alert: “We found a problem with some content in <Excel_filename>. Do you want us to try recovering the file as much as we can? If you trust the source of this workbook, then click Yes”
Then Excel 'repairs' the file and I can still see the data but the table /xl/tables/table1.xml is gone, and Navision can't accept the file.
This is my code in python:
import openpyxl

wb = openpyxl.load_workbook("data_source.xlsx", data_only=True)
sheet1 = wb.active
wb2 = openpyxl.load_workbook('template.xlsx')
sheet2 = wb2.active
filas = sheet1.max_row
# copy rows 3..filas; the + 1 makes range() include the last row
for fila in range(3, filas + 1):
    sheet2["A" + str(fila)] = sheet1["A" + str(fila)].value
    sheet2["B" + str(fila)] = sheet1["B" + str(fila)].value
    sheet2["C" + str(fila)] = "FRA"
    sheet2["D" + str(fila)] = "NAC"
wb2.save('tax1.xlsx')
wb2.close()
When I create a table from zero with the code they show in the openpyxl official site:
https://openpyxl.readthedocs.io/en/latest/worksheet_tables.html#creating-a-table
it works fine only if the table starts from row one (ref="A1:E5").
...but this template has a table that starts from row 3!
So when I try to make the table I need (ref="A3:D6") I get this: 'UserWarning: File may not be readable: column headings must be strings.' and as expected, I get the same alert and the same result when I open it with Excel.
Is there a way to modify/fill a table without corrupting the xlsx file?
Or, as a workaround:
Is there a way to create a table starting at A3 with no errors?
Thanks in advance
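For reference, the warning quoted above appears to be tied to the header cells of the table range: openpyxl expects the first row of ref to contain strings. Below is a minimal sketch of a table that starts at row 3. The header names and the table name Table1 are hypothetical, since the real template's names are not shown in the question, and whether Navision accepts the result is not something this sketch can confirm.

```python
from openpyxl import Workbook
from openpyxl.worksheet.table import Table, TableStyleInfo

wb = Workbook()
ws = wb.active

# The header row of the table sits in row 3 and must hold strings.
headers = ["Code", "Amount", "Type", "Scope"]  # hypothetical names
for col, text in enumerate(headers, start=1):
    ws.cell(row=3, column=col, value=text)

# Three data rows below the header (rows 4-6).
for row in range(4, 7):
    ws.cell(row=row, column=1, value="X{}".format(row))
    ws.cell(row=row, column=2, value=row * 10)
    ws.cell(row=row, column=3, value="FRA")
    ws.cell(row=row, column=4, value="NAC")

# ref covers the header row plus the data rows.
table = Table(displayName="Table1", ref="A3:D6")
table.tableStyleInfo = TableStyleInfo(name="TableStyleMedium9",
                                      showRowStripes=True)
ws.add_table(table)
```

If rows are added later, the ref of the table would need to grow with them so that it still covers every filled row.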

How to change sheet names without losing graph reference with openpyxl

I'm trying to write some code that will change sheet names in an excel file based on the data in another excel file.
At first this worked fine:
import openpyxl

Sheet_names = "A", "B", "C", "D"
wb = openpyxl.load_workbook("file_name.xlsx")
# Sheet1 gets the first name, Sheet2 the second, and so on
for i, new_name in enumerate(Sheet_names, start=1):
    Name_change = wb["Sheet{}".format(i)]
    wb.active = Name_change
    Name_change.title = new_name
wb.save("file_name.xlsx")
wb.close()
But for some reason (I'm unsure what I've changed) a graph within the excel file isn't updating. So the data reference is still to Sheet1, Sheet2 etc.
I'm also getting a warning message that there are external links. I guess it's assuming the sheet references are external? The Excel file comes from a template that's copied across.
Changing it manually isn't an option. Is having openpyxl recreate the graph for every sheet the only option?
Out of ideas, help!

python : Get Active Sheet in xlrd? and help for reading and validating excel file in Python

2 Questions to ask:
Ques 1:
I just started studying about xlrd for reading excel file in python.
I was wondering if there is a method in xlrd similar to get_active_sheet() in openpyxl, or any other way to get the active sheet?
get_active_sheet() works like this in openpyxl:
import openpyxl
wb = openpyxl.load_workbook('example.xlsx')
active_sheet = wb.get_active_sheet()
output : Worksheet "Sheet1"
I had found methods in xlrd for retrieving the names of sheets, but none of them could tell me the active sheet.
Ques 2:
Is xlrd the best package in Python for reading Excel files? I also came across this, which had info about other Python packages (xlsxwriter, xlwt, xlutils) for reading and writing Excel files.
Which of the above would be best for making an app that reads an Excel file and applies different validations to different columns?
For example: a column with the header 'ID' should have unique values, and a column with the header 'Country' should contain valid countries.
The "active sheet" you're referring to seems to be the last sheet selected when the workbook was saved/closed. You can get this sheet via the sheet_visible value.
import xlrd
xl = xlrd.open_workbook("example.xls")
for sht in xl.sheets():
    # a sht.sheet_visible value of 1 marks the "active sheet"
    print(sht.name, sht.sheet_selected, sht.sheet_visible)
Usually only one sheet is selected at a time, so it may look like sheet_visible and sheet_selected are the same, but multiple sheets can be selected at a time (ctrl+click multiple sheet tabs, for example).
Another reason this may seem confusing is because Excel uses "visible" in terms of hidden/visible sheets. In xlrd, this is instead sheet.visibility (see https://stackoverflow.com/a/44583134/4258124)
Welcome to Stack Overflow.
I have been working with Excel files in Python for a while now, so I could help you with your question, I think.
openpyxl and xlrd solve different problems: the former is for xlsx files (Excel 2007+), whereas the latter is for xls files (Excel 1997-2003).
Xenon said in his answer that Excel doesn't recognize the concept of an active sheet, which is not totally true. If you open an Excel document, go to some other sheet (that isn't the first one) and save and close the document, the next time you open it, Excel will open the document on the last sheet you were on.
However, xlrd does not support this kind of workflow, i.e. asking for the active sheet. If you know the sheet name, then you could use the method sheet_by_name, or if you know the sheet index, you could use the method sheet_by_index.
I don't know if the xlrd is the best package around, but it is pretty solid, and I have had nary a problem using it.
The example given could be solved by first iterating through the first row and building a dictionary that maps each header to its column index. Then store all the values of the ID column in a list and compare the length of that list with the length of a set created from it, i.e. len(values) == len(set(values)). Following that, you could iterate through the column with the header Country and check whether each value is in a dictionary you previously made with all the valid countries.
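The checks described above could look roughly like this. It is a minimal sketch that works on a plain list of rows (as could be built from a sheet with xlrd); the function name find_problems, the sample data and the use of a set for the valid countries are assumptions made for illustration.

```python
def find_problems(rows, valid_countries):
    """rows[0] is the header row; the remaining rows are data."""
    # Map each header to its column index.
    header_to_col = {name: idx for idx, name in enumerate(rows[0])}
    id_col = header_to_col["ID"]
    country_col = header_to_col["Country"]

    # IDs are unique when the list and the set have the same length.
    ids = [row[id_col] for row in rows[1:]]
    ids_unique = len(ids) == len(set(ids))

    # Collect every country value that is not in the valid collection.
    bad_countries = [row[country_col] for row in rows[1:]
                     if row[country_col] not in valid_countries]
    return ids_unique, bad_countries


rows = [
    ["ID", "Country"],
    [1, "France"],
    [2, "Spain"],
    [2, "Atlantis"],
]
ids_unique, bad_countries = find_problems(rows, {"France", "Spain"})
```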
I hope this answer suits your needs.
Summary: Stick with xlrd because it is mature enough.
You can see all worksheets in a given workbook with the sheet_names() function. Excel has no concept of an "active sheet", but if my assumption that you are referring to the first sheet is correct, you can get the first element of sheet_names() to get the "active sheet."
With regards to your second question, it's not easy to say that a package is better than another package objectively. However, xlrd is widely used, and the most popular Python library for what it does.
I would recommend sticking with it.

Copy columns from workbook, paste in second sheet of second workbook, openPyXL

I'm new to openpyxl and am developing a tool that requires copying & pasting columns.
I have a folder containing two sets of excel files. I need the script to iterate through the files, find the ones that are named "GenLU_xx" (xx represents name of a place such as Calgary) and copy Columns C & E (3 & 5). It then needs to find the corresponding file which is named as "LU_Summary_xx" (xx again represents name of place such as Calgary) and paste the copied columns to the second sheet of that workbook. It needs to match GenLU_Calgary with LUZ_Summary_Calgary and so forth for all the files. So far I have not been able to figure out code for copying and pasting columns and the seemingly double iteration is confusing me. My python skills are beginner although I'm usually able to figure out code by looking at examples. In this case I'm having some trouble locating sample code. Just started using openpyxl. I have completed the script except for the parts pertaining to excel. Hopefully someone can help out. Any help would be much appreciated!
EDIT: New to StackOverflow as well so not sure why I got -2. Maybe due to lack of any code?
Here is what I have so far:
import os, openpyxl, glob
from openpyxl import Workbook
Tables = r"path"
os.chdir(Tables)
for file in glob.glob("LUZ*"):
    wb = openpyxl.load_workbook(file)
    ws = wb.active
    ws["G1"] = "GEN_LU_ZN"
    wb.create_sheet(title="Sheet2")
    wb.save(file)
This just adds a value to G1 of every file starting with LUZ and creates a second sheet.
As I mentioned previously, I have yet to even figure out the code for copying the values of an entire column.
I am thinking I could iterate through all files starting with "GenLU*" using glob and then store the values of Columns 3 & 5 but I'm still having trouble figuring out how to access values for columns. I do not have a range of rows as each workbook will have a different number of rows for the two columns.
EDIT 2: I am able to access cell values for a particular column using this code:
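for file in glob.glob("GenLU_Airdrie*"):
    # read_only=True replaces the old use_iterators=True flag
    wb = openpyxl.load_workbook(file, read_only=True)
    ws = wb.active
    # column C, rows 1-200 (older releases took the string 'C1:C200')
    for row in ws.iter_rows(min_row=1, max_row=200, min_col=3, max_col=3):
        for cell in row:
            values = cell.value
            print(values)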
However I'm not sure how I would go about 'pasting' these values in column A of the other sheet.
Charlie's code worked for me by changing 'col=4' to 'column=4' using openpyxl-2.3.3
ws.cell(row=idx, column=4).value = cell.value
If you really do want to work with columns then you can use the .columns property when reading files.
To copy the values from one sheet to another you just assign them. The following will copy the value of A1 from one worksheet to another.
ws1 = wb1.active
ws2 = wb2.active
ws2['A1'] = ws1['A1'].value
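To copy column D, the code could look something like this:
col_d = list(ws1.columns)[3]  # 0-indexing; .columns is a generator in current openpyxl
for idx, cell in enumerate(col_d, 1):
    # 1-indexing; older releases spelled the keyword col=4
    ws2.cell(row=idx, column=4).value = cell.value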
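The "double iteration" from the question can be avoided by building a lookup of the summary files first. The following is a minimal sketch, not a full solution: the helper name pair_files is hypothetical, and the prefixes are parameters because the question uses both "LU_Summary_" and "LUZ_Summary_" for the target files.

```python
def pair_files(filenames, src_prefix="GenLU_", dst_prefix="LU_Summary_"):
    """Map each source file to the summary file with the same place name."""
    # Index the summary files by whatever follows the prefix,
    # e.g. "Calgary.xlsx" for "LU_Summary_Calgary.xlsx".
    summaries = {name[len(dst_prefix):]: name
                 for name in filenames if name.startswith(dst_prefix)}
    pairs = {}
    for name in filenames:
        if name.startswith(src_prefix):
            place = name[len(src_prefix):]
            if place in summaries:  # skip sources without a summary file
                pairs[name] = summaries[place]
    return pairs


names = ["GenLU_Calgary.xlsx", "LU_Summary_Calgary.xlsx",
         "GenLU_Airdrie.xlsx", "LU_Summary_Airdrie.xlsx",
         "GenLU_Banff.xlsx"]
pairs = pair_files(names)
```

In the real script the names would come from os.listdir() or glob.glob(); each pair could then be opened with load_workbook() and columns C and E copied cell by cell as in the snippets above.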

xlsxwriter: is there a way to open an existing worksheet in my workbook?

I'm able to open my pre-existing workbook, but I don't see any way to open pre-existing worksheets within that workbook. Is there any way to do this?
You cannot append to an existing xlsx file with xlsxwriter.
There is a module called openpyxl which allows you to read and write to preexisting excel file, but I am sure that the method to do so involves reading from the excel file, storing all the information somehow (database or arrays), and then rewriting when you call workbook.close() which will then write all of the information to your xlsx file.
Similarly, you can use a method of your own to "append" to xlsx documents. I recently had to append to a xlsx file because I had a lot of different tests in which I had GPS data coming in to a main worksheet, and then I had to append a new sheet each time a test started as well. The only way I could get around this without openpyxl was to read the excel file with xlrd and then run through the rows and columns...
i.e.
cells = []
for row in range(sheet.nrows):
    cells.append([])
    for col in range(sheet.ncols):
        # cell() is a method of the xlrd sheet, not of the workbook
        cells[row].append(sheet.cell(row, col).value)
You don't need arrays, though. For example, this works perfectly fine:
import xlrd
import xlsxwriter
from os.path import expanduser
home = expanduser("~")
# this writes test data to an excel file
wb = xlsxwriter.Workbook("{}/Desktop/test.xlsx".format(home))
sheet1 = wb.add_worksheet()
for row in range(10):
    for col in range(20):
        sheet1.write(row, col, "test ({}, {})".format(row, col))
wb.close()
# open the file for reading
wbRD = xlrd.open_workbook("{}/Desktop/test.xlsx".format(home))
sheets = wbRD.sheets()
# open the same file for writing (just don't write yet)
wb = xlsxwriter.Workbook("{}/Desktop/test.xlsx".format(home))
# run through the sheets and store sheets in workbook
# this still doesn't write to the file yet
for sheet in sheets: # write data from old file
    newSheet = wb.add_worksheet(sheet.name)
    for row in range(sheet.nrows):
        for col in range(sheet.ncols):
            newSheet.write(row, col, sheet.cell(row, col).value)
for row in range(10, 20): # write NEW data
    for col in range(20):
        newSheet.write(row, col, "test ({}, {})".format(row, col))
wb.close() # THIS writes
However, I found that it was easier to read the data and store it in a 2-dimensional array, because I was manipulating the data and receiving input over and over again and did not want to write to the Excel file until the test was over (which you could just as easily do with xlsxwriter, since that is probably what it does anyway until you call .close()).
After searching a bit for a way to open an existing sheet in xlsx, I discovered
existingWorksheet = wb.get_worksheet_by_name('Your Worksheet name goes here...')
existingWorksheet.write_row(0,0,'xyz')
You can now append/write any data to the open worksheet.
You can use the workbook.get_worksheet_by_name() feature:
https://xlsxwriter.readthedocs.io/workbook.html#get_worksheet_by_name
According to https://xlsxwriter.readthedocs.io/changes.html the feature has been added on May 13, 2016.
"Release 0.8.7 - May 13 2016
-Fix for issue when inserting read-only images on Windows. Issue #352.
-Added get_worksheet_by_name() method to allow the retrieval of a worksheet from a workbook via its name.
-Fixed issue where internal file creation and modification dates were in the local timezone instead of UTC."
Although the last two answers mention it with its documentation link, and from the documentation it does seem there are new methods for working with worksheets, I wasn't able to find these methods in the latest package, "xlsxwriter==3.0.3".
"xlrd" has now removed support for anything other than xls files.
Hence I worked it out with "openpyxl", which gives you the expected functionality, as mentioned in the first answer above.
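A minimal sketch of that openpyxl route, assuming a worksheet named "Data" (a hypothetical name) and an in-memory buffer that stands in for an existing xlsx file on disk:

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a small workbook in memory to stand in for an existing file.
buffer = BytesIO()
wb = Workbook()
ws = wb.active
ws.title = "Data"
ws.append(["test", 1])
wb.save(buffer)

# Reopen it, fetch the existing worksheet by name and add a row.
buffer.seek(0)
wb = load_workbook(buffer)
existing = wb["Data"]
existing.append(["test", 2])

# Collect the sheet contents to see the old and the new row together.
rows = [tuple(cell.value for cell in row) for row in existing.iter_rows()]
```

With a real file, load_workbook() would take the file path and wb.save() would write the changes back.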
