results to be written in excel column B - python

The code below translates each word in column A of an Excel sheet. The results are currently printed in the editor, but I want the translated output written to column B of the same Excel sheet. The code below gives me an error.
Please help me with the code to write the results to column B in Excel.
import xlrd
import goslate
loc = r"C:\path\fruits.xlsx"
gs = goslate.Goslate()
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)
for i in range(sheet.nrows):
    print(gs.translate(sheet.cell_value(i, 0), 'de'))
    print(sheet.cell_value(i, 1))
I am receiving the below error
return self._cell_values[rowx][colx]
IndexError: list index out of range
Could someone please help me write my output to column B of the same Excel file?

The error occurs because column B is empty, so the package does not read that column from the sheet and sheet.cell_value(i, 1) is out of range.
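One way to guard against this IndexError is to check the row length before indexing. The sketch below is only an illustration: cell_or_default is a hypothetical helper (not part of xlrd), and the list stands in for what sheet.row_values(i) would return when only column A holds data.

```python
def cell_or_default(row_values, col, default=""):
    """Return row_values[col], or default when the row has no such column."""
    if 0 <= col < len(row_values):
        return row_values[col]
    return default


# Stand-in for sheet.row_values(i) on a sheet where only column A is filled.
row = ["apple"]
print(cell_or_default(row, 0))        # value from column A
print(repr(cell_or_default(row, 1)))  # column B is missing, so the default is used
```

With xlrd this would be called as cell_or_default(sheet.row_values(i), 1) inside the loop.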
xlrd cannot write to a sheet, so you will need the xlwt package. Install it with pip install xlwt.
To write the translated result to a sheet, you can do something like this:
import xlrd
import xlwt  # this package is going to write to the sheet
import goslate
loc = "dummy.xlsx"
translated = "dummy2.xls"  # where to store the modified sheet; xlwt writes the older .xls format
gs = goslate.Goslate()
# create a workbook using the xlwt package in order to write to it
wbt = xlwt.Workbook()
ws = wbt.add_sheet('A Test Sheet')  # change this to your sheet name
rwb = xlrd.open_workbook(loc)
sheet = rwb.sheet_by_index(0)
for i in range(sheet.nrows):
    ws.write(i, 0, sheet.cell_value(i, 0))  # this will write the A column value
    ws.write(i, 1, gs.translate(sheet.cell_value(i, 0), 'de'))  # this will write the B column value
wbt.save(translated)  # this will save the sheet
As for making changes in the same file, in my opinion you should not do that. The file is already opened by another process in read mode, and changing it can result in unexpected behavior. But if you intend to do that, back up your file and use the same loc for both reading and saving.

You're getting the error because xlrd column indexes are zero-based and, per the documentation, xlrd ignores cells with no data, so column index 1 (column B) does not exist in your sheet.
So you could access column A by doing
sheet.cell_value(i, 0)
and write to column B by doing
sheet._cell_types[i][1] = xlrd.XL_CELL_TEXT
sheet._cell_values[i][1] = source
However, xlrd is only for reading, so you'd have to use xlwt to save any changes.
Saving changes brings up another issue: your source file has the ".xlsx" extension, and while xlrd (before version 2.0) does read this format, xlwt only writes the older ".xls" format.
To read and write the ".xlsx" format with one library you can use openpyxl. Using this library, your code would look like this:
import openpyxl
import goslate
loc = r"C:\path\fruits.xlsx"
gs = goslate.Goslate()
wb = openpyxl.load_workbook(loc)
sheet = wb.active
for i in range(2, sheet.max_row + 1):  # starts at row 2, skipping row 1; use 1 if there is no header row
    original = sheet.cell(row=i, column=1).value
    translated = gs.translate(original, 'de')
    sheet.cell(row=i, column=2).value = translated
wb.save(loc)
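The loop above passes every cell value straight to the translator, but openpyxl returns None for empty cells. A minimal sketch of skipping such values follows; translate_values and fake_translate are hypothetical names, and fake_translate merely stands in for gs.translate(text, 'de') so the example runs without network access.

```python
def translate_values(values, translate):
    """Translate each non-empty string; pass every other value through unchanged."""
    results = []
    for value in values:
        if isinstance(value, str) and value.strip():
            results.append(translate(value))
        else:
            results.append(value)  # None, numbers and blank strings stay as they are
    return results


def fake_translate(text):
    """Stand-in for a real translator call; it only upper-cases the text."""
    return text.upper()


column_a = ["apple", None, "", "pear"]
column_b = translate_values(column_a, fake_translate)
print(column_b)  # ['APPLE', None, '', 'PEAR']
```

In the openpyxl loop, the same check could be applied to original before calling gs.translate.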

Related

Can't see csv file (converted from df) in files

After saving my dataframe to a csv in a specific location, the csv file doesn't appear in the location I saved it to. Is there any reason why it might not be showing?
Here is the code to save my dataframe to csv:
df.to_csv(r'C:\Users\gibso\OneDrive\Documents\JOSEPH\export_dataframe.csv', index = False)
Even saving an empty df does not seem to work.
import pandas as pd
olympics={}
df = pd.DataFrame(olympics)
df.to_csv(r'C:\Users\gibso\OneDrive\Documents\JOSEPH\export_dataframe.csv', index = False)
Thanks for the help!
I would rather use the module openpyxl. Example of saving:
import openpyxl
workbook = openpyxl.Workbook()
sheet = workbook.active
# Work on your workbook. Once finished:
workbook.save(file_name) # file_name is a variable you must define
Don't forget to install openpyxl with pip first!
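Whichever library does the writing, it can help to confirm where the file actually landed right after saving. The following is a rough sketch, not a diagnosis of the question's setup: it uses the standard csv module in place of pandas so that it runs anywhere, writes to the system temp directory rather than the question's path, and write_csv_and_confirm is a hypothetical helper name.

```python
import csv
import tempfile
from pathlib import Path


def write_csv_and_confirm(path, rows):
    """Write rows to a CSV file, then report its absolute path, existence and size."""
    path = Path(path).resolve()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return path, path.exists(), path.stat().st_size


# Demo target in the temp directory (an assumption made for this example).
target = Path(tempfile.gettempdir()) / "export_dataframe_demo.csv"
saved_path, exists, size = write_csv_and_confirm(target, [["sport", "medals"], ["rowing", 3]])
print(saved_path, exists, size)
```

The same existence check can be run on the path passed to df.to_csv; if it reports True, the file was written and the question becomes which folder the file browser is showing.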

Excel Error when using pandas and openpyxl: Repaired Part: /xl/worksheets/sheet1.xml part with XML error. HRESULT 0x8000ffff Line 1, column 0

I get an error when opening an Excel file after writing to it. This is what I have so far:
#locate source document
Path = Path(r'C:\Users\username\Test\EXCEL_Test.xlsx')
# open doc and go to active sheet
wb = load_workbook(filename = Path)
ws = wb.active
#add drop down list to each cell in a certain column
dv_v = DataValidation(type="list", formula1='"Y,N"', allow_blank=True)
for cell in ws['D']:
    cell = ws.add_data_validation(dv_v)
wb.save(Path)
And these are the two errors that come up on opening the Excel file:
First error popup:
"We found a problem with some content in 'EXCEL_Test.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes."
Second error popup:
"Repaired Part: /xl/worksheets/sheet1.xml part with XML error. HRESULT 0x8000ffff Line 1, column 0."
My data validation is not showing up, and the file produces the above errors when I open it to view the openpyxl changes.
Can someone help me find out why these errors are popping up even though Python finishes with exit code 0, and why the data validation comes up blank in the recovered file?
I think you are using ws.add_data_validation(dv) incorrectly. The validation object is added to the worksheet once, and then the cells or ranges it applies to are added to the validation object.
Try doing it like this.
import openpyxl
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation
#locate source document
Path = 'C:/Users/username/test/Excel_test.xlsx'
# open doc and go to active sheet
wb = openpyxl.load_workbook(filename = Path)
ws = wb['Sheet1']
#add drop down list to each cell in a certain column
dv = DataValidation(type="list", formula1='"Y,N"', allow_blank=True)
ws.add_data_validation(dv)
# This is the same as for the whole of column D
dv.add('D1:D1048576')
wb.save(Path)
Take a look at the Docs here: https://openpyxl.readthedocs.io/en/stable/validation.html
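If the validation should cover only the rows in use rather than the whole column, the range string can be built from the sheet's row count. This is a small sketch under that assumption; column_range is a hypothetical helper, not an openpyxl function.

```python
def column_range(column_letter, first_row, last_row):
    """Build an A1-style range such as 'D2:D50' for a single column."""
    if first_row < 1 or last_row < first_row:
        raise ValueError("expected 1 <= first_row <= last_row")
    return f"{column_letter}{first_row}:{column_letter}{last_row}"


print(column_range("D", 2, 50))  # D2:D50
```

With the answer's code this could be used as dv.add(column_range("D", 1, ws.max_row)).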

How do i read from excel into python at runtime?

Here is sample code where I am trying to print the number of rows in the Excel file each time a new row is inserted. The code does not work because, I believe, it's not interacting with the Excel file at run time.
import xlrd
loc = r'C:\Users\dell\Desktop\sample2.xlsx'
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)
k = sheet.nrows
while True:
    wb = xlrd.open_workbook(loc)
    sheet = wb.sheet_by_index(0)
    k1 = sheet.nrows
    if k1 > k:
        print(k1)
        k = k1
I think you misunderstand how the xlrd library works. It gives you an interface to Excel files on disk, not to a live Excel session that somebody is working in at the same time. Nothing you do in Excel is written to the corresponding file until you save the workbook. Hence your code sees updated cells only after a save, not at the moment the cells are changed.
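Since changes only reach the file on save, one option is to wait for the file's modification time to change and reopen the workbook only then, instead of reopening it in a tight loop. The sketch below uses only the standard library; wait_for_save and its parameters are hypothetical names chosen for this example.

```python
import os
import time


def wait_for_save(path, last_mtime, poll_seconds=1.0, timeout=None):
    """Block until the file's modification time differs from last_mtime.

    Returns the new modification time, or None if the timeout expires first.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        current = os.path.getmtime(path)
        if current != last_mtime:
            return current
        if deadline is not None and time.monotonic() >= deadline:
            return None
        time.sleep(poll_seconds)


# Hypothetical usage with the question's file (needs xlrd and the workbook):
# mtime = os.path.getmtime(loc)
# while True:
#     mtime = wait_for_save(loc, mtime)
#     print(xlrd.open_workbook(loc).sheet_by_index(0).nrows)
```

Polling the modification time is cheap compared with parsing the workbook on every iteration, and the sleep keeps the loop from using a full CPU core.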

Overwriting existing cells in an XLSX file using Python

I am trying to find a library that overwrites an existing cell to change its contents using Python.
What I want to do:
read from an .xlsx file
compare cell data to determine if a change is needed
change data in a cell, e.g. overwrite the date in cell 'O2'
save the file
I have tried the following libraries:
xlsxwriter
combination of:
xlrd
xlwt
xlutils
openpyxl
xlsxwriter: only writes to a new Excel sheet and file
combination (xlrd, xlwt, xlutils): works to read from .xlsx but only writes to .xls
openpyxl: reads from an existing file but doesn't write to existing cells; it can only create new rows and cells, or create an entire new workbook
Any suggestions would be greatly appreciated. Are there other libraries? How can I use the libraries above to overwrite data in an existing file?
from win32com.client import Dispatch
import os
xl = Dispatch("Excel.Application")
xl.Visible = True # otherwise excel is hidden
# newest excel does not accept forward slash in path
wbs_path = r'C:\path\to\a\bunch\of\workbooks'
for wbname in os.listdir(wbs_path):
    if not wbname.endswith(".xlsx"):
        continue
    wb = xl.Workbooks.Open(wbs_path + '\\' + wbname)
    sh = wb.Worksheets("name of sheet")
    sh.Range("A1").Value = "some new value"
    wb.Save()
    wb.Close()
xl.Quit()
Alternatively you can use xlwings, which (if I had to guess) seems to use this approach under the hood.
>>> import xlwings as xw
>>> wb = xw.Book() # this will create a new workbook
>>> wb = xw.Book('FileName.xlsx') # connect to an existing file in the current working directory
>>> wb = xw.Book(r'C:\path\to\file.xlsx') # on Windows: use raw strings to escape backslashes
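For completeness, openpyxl can also overwrite an existing cell: load the workbook, assign to the cell, and save under the same name. The sketch below assumes openpyxl is installed and works on a throwaway workbook in a temp directory; overwrite_cell is a hypothetical helper, and the date strings are made-up sample values.

```python
import os
import tempfile

import openpyxl


def overwrite_cell(path, cell_ref, new_value):
    """Overwrite one cell on the active sheet, saving only if the value differs."""
    wb = openpyxl.load_workbook(path)
    ws = wb.active
    changed = ws[cell_ref].value != new_value
    if changed:
        ws[cell_ref] = new_value
        wb.save(path)  # saving under the same name replaces the file
    return changed


# Build a throwaway workbook so the example is self-contained.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
demo_wb = openpyxl.Workbook()
demo_wb.active["O2"] = "2019-01-01"
demo_wb.save(demo_path)

print(overwrite_cell(demo_path, "O2", "2020-06-30"))  # True, the value was different
print(openpyxl.load_workbook(demo_path).active["O2"].value)
```

One caveat: openpyxl does not read every kind of item in a workbook, so content such as charts and images can be lost when an existing file is loaded and saved this way. The win32com and xlwings approaches above avoid that because Excel itself does the saving.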

Accessing worksheets using xlwt 'get_sheet' method

I would like to access worksheets of a spreadsheet. I've copied the main workbook to another workbook using xlutils.copy(). But don't know the right way to access worksheets using xlwt module.
My sample code:
import xlrd
import xlwt
from xlutils.copy import copy
wb1 = xlrd.open_workbook('workbook1.xls', formatting_info=True)
wb2 = copy(wb1)
worksheet_name = 'XYZ'  # worksheet_name is an iterative parameter
worksheet = wb2.get_sheet(worksheet_name)
Could someone please tell me the right way to access the existing worksheets in a workbook using the xlwt module? I know we can use the 'add_sheet' method to add a worksheet to an existing workbook using xlwt.
Any help is appreciated.
You can do sheets = wb1.sheets() to get a list of sheet objects, then call .name on each to get their names. To find the index of your sheet, use
[s.name for s in sheets].index(sheetname)
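Note that list.index raises ValueError when no sheet has that name. A small sketch of a lookup that returns None instead is shown below; sheet_index_by_name is a hypothetical helper, and FakeSheet is a stand-in for xlrd sheet objects so the example runs without a workbook.

```python
from collections import namedtuple


def sheet_index_by_name(sheets, name):
    """Return the position of the first sheet called name, or None if absent."""
    for index, sheet in enumerate(sheets):
        if sheet.name == name:
            return index
    return None


# Stand-in objects exposing only the .name attribute that the lookup needs.
FakeSheet = namedtuple("FakeSheet", "name")
sheets = [FakeSheet("Summary"), FakeSheet("XYZ")]

print(sheet_index_by_name(sheets, "XYZ"))      # 1
print(sheet_index_by_name(sheets, "missing"))  # None
```

Assuming the copied workbook keeps the same sheet order, the resulting index can then be passed to wb2.get_sheet(index).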
The sheets() method is curiously absent from the xlwt.Workbook class, so the other answer using that method will not work - only xlrd.book (for reading XLS files) has a sheets() method. Because all the class attributes are private, you have to do something like this:
import itertools
def get_sheet_by_name(book, name):
    """Get a sheet by name from xlwt.Workbook, a strangely missing method.
    Returns None if no sheet with the given name is present.
    """
    # Note, we have to use exceptions for flow control because the
    # xlwt API is broken and gives us no other choice.
    try:
        for idx in itertools.count():
            sheet = book.get_sheet(idx)
            if sheet.name == name:
                return sheet
    except IndexError:
        return None
If you don't need it to return None for a non-existent sheet then just remove the try/except block. If you want to access multiple sheets by name repeatedly it would be more efficient to put them in a dictionary, like this:
sheets = {}
try:
    for idx in itertools.count():
        sheet = book.get_sheet(idx)
        sheets[sheet.name] = sheet
except IndexError:
    pass
Well, here is my answer. Let me take it step by step.
Considering the previous answers, xlrd is the right module for getting the worksheets.
An xlrd.Book object is returned by open_workbook.
rb = open_workbook('sampleXLS.xls',formatting_info=True)
nsheets is an integer attribute that holds the total number of sheets in the workbook.
numberOfSheets=rb.nsheets
Copy this into a new workbook wb, which is the object you use to write to and modify the Excel file.
wb = copy(rb)
There are two ways to get the sheet information:
a. if you just want to read the sheets, use sheet=rb.sheet_by_index(sheetNumber)
b. if you want to edit the sheet, use ws = wb.get_sheet(sheetNumber) (this is what the question asks for)
Now you know how many sheets are in the Excel workbook and how to get them individually.
Putting it all together:
Sample Code:
reference: http://www.simplistix.co.uk/presentations/python-excel.pdf
from xlrd import open_workbook
from xlutils.copy import copy
from xlwt import Workbook
rb = open_workbook('sampleXLS.xls',formatting_info=True)
numberOfSheets=rb.nsheets
wb = copy(rb)
for each in range(numberOfSheets):
    sheet = rb.sheet_by_index(each)
    ws = wb.get_sheet(each)
    # both prints will give you the same thing
    print(sheet.name)
    print(ws.name)
