openoffice: duplicating rows of a table in writer - python

I need to programmatically duplicate rows of a Table in openoffice writer.
It's not difficult to add rows via table.Rows.insertByIndex(idx, count), which adds empty rows, and it's easy to put text in such a row by assigning a DataArray to the CellRange. Done this way, however, you lose control over the style of the cells: specifically, if a cell has words with different styles (bold/italic), they get flattened to the same face. What I need is to duplicate a row in a way that preserves the style of each word in the cell/row.
This is the last step of a Python template system that uses openoffice (http://oootemplate.argolinux.org). I access the document via uno interface in Python but any language would do to explain the logic behind it.

The solution is to use the controller's .getTransferable() method to get data from the ViewCursor. That in turn requires that you control your view cursor and position it in every single cell (I was not able to make the ViewCursor span multiple cells). Once you have acquired the transferable, you place the cursor in the destination and insert it.
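import os

# `context` is assumed to be a UNO component context already obtained from a running office instance
desktop = context.ServiceManager.createInstanceWithContext("com.sun.star.frame.Desktop", context)
document = desktop.loadComponentFromURL("file://%s/template-debug.odt" % os.getcwd(), "_blank", 0, ())
controller = document.getCurrentController()
table = document.TextTables.getByIndex(0)
view_cursor = controller.getViewCursor()
# src_name and dst_name are the names of the source and destination cells
src = table.getCellByName(src_name)
dst = table.getCellByName(dst_name)
# Select the content of the source cell and grab it, formatting included
view_cursor.gotoRange(src.Text, False)
txt = controller.getTransferable()
# Move to the destination cell and insert the content there
view_cursor.gotoRange(dst.Text, False)
controller.insertTransferable(txt)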
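To duplicate a whole row with this approach, one option is to insert an empty row below the source row and repeat the copy once per cell. The following is only a sketch under stated assumptions: the table is simple (no merged or split cells, at most 26 columns, so cells are named A1, B1, ...), and `duplicate_row` and `column_label` are hypothetical helper names, not part of the UNO API.

```python
def column_label(index):
    """Map a 0-based column index to a cell-name letter (0 -> 'A', 25 -> 'Z')."""
    if not 0 <= index < 26:
        raise ValueError("this sketch only handles tables with up to 26 columns")
    return chr(ord("A") + index)


def duplicate_row(controller, table, src_row):
    """Insert a row below src_row (0-based) and copy every cell into it.

    Each cell is copied through the view cursor and a transferable,
    so character formatting inside the cell is carried along.
    """
    # New empty row directly below the source row
    table.Rows.insertByIndex(src_row + 1, 1)
    view_cursor = controller.getViewCursor()
    for col in range(table.Columns.getCount()):
        # Cell names are 1-based: row index 0 is row "1"
        src = table.getCellByName("%s%d" % (column_label(col), src_row + 1))
        dst = table.getCellByName("%s%d" % (column_label(col), src_row + 2))
        view_cursor.gotoRange(src.Text, False)
        data = controller.getTransferable()
        view_cursor.gotoRange(dst.Text, False)
        controller.insertTransferable(data)
```

With the objects from the snippet above, `duplicate_row(controller, table, 0)` would copy the first row into a new second row.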

Related

Google doc Python API: How to modify the content of a specific cell

Thanks for your help & time. Here is my code: I am accessing a specific cell of a table, and my goal is to modify the text content of this specific cell, that is, to overwrite the existing text in that cell with a new string value. How do I do that?
def main():
    credentials = get_creds()
    service = build("docs", "v1", credentials=credentials).documents()
    properties_req = service.get(documentId=REQ_DOCUMENT_ID).execute()
    doc_content_req = properties_req.get('body').get('content')
    properties_des = service.get(documentId=DES_DOCUMENT_ID).execute()
    doc_content_des = properties_des.get('body').get('content')
    reqs = find_requirements(doc_content_req)
    for (req, row) in zip(reqs, req_table.get('tableRows')):
        loc = search_structural_elements(doc_content_des, req)
        cell = get_cell(row, design_col)
        print(f"Requirement {req} is located in section {loc} of the design doc.")
        print(cell)  # Need to modify the text content of this specific cell
You can modify the contents of a particular cell with the Documents.batchUpdate() method. First, I strongly recommend familiarizing yourself with the Table structure inside a Doc. There you can see how the table is first declared, then divided into rows, and finally formatted with some styles. Once you have written the desired change for the desired cell (in the desired row), you can send it with Documents.batchUpdate() and reach your goal.
As a tip, there is a small trick for finding the desired cell easily, which can help with your first cell edit. Open the Doc in a browser and write a recognizable string (like 123ABC) in the desired cell. Then use Documents.get() to receive the Document object. You can search for the 123ABC string in that object without difficulty, change it to the desired value, and use that object as a template for the batch update. Please ask me any additional questions about this answer.
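One way to express "overwrite the text of this cell" as batchUpdate requests is a deleteContentRange followed by an insertText at the same index. The sketch below assumes that `cell` is the tableCell dictionary taken from a Documents.get() response (with a `content` list whose elements carry `startIndex` and `endIndex`); `build_replace_cell_requests` and the example cell are hypothetical, made up for illustration.

```python
def build_replace_cell_requests(cell, new_text):
    """Build Docs API requests that replace the text inside one table cell."""
    content = cell["content"]
    start = content[0]["startIndex"]
    # The last newline of a cell cannot be deleted, so stop one index before it
    end = content[-1]["endIndex"] - 1
    requests = []
    if end > start:  # an empty range would be rejected, so skip it
        requests.append({
            "deleteContentRange": {
                "range": {"startIndex": start, "endIndex": end}
            }
        })
    if new_text:
        requests.append({
            "insertText": {"location": {"index": start}, "text": new_text}
        })
    return requests


# Hypothetical cell holding the five characters "hello" plus its final newline
example_cell = {"content": [{"startIndex": 10, "endIndex": 16, "paragraph": {}}]}
requests = build_replace_cell_requests(example_cell, "Section 3.2")

# With the `service` object from the question, the requests would be sent as:
# service.batchUpdate(documentId=DES_DOCUMENT_ID, body={"requests": requests}).execute()
```

Because every insertion or deletion shifts the indexes of everything after it, it is probably safest to update several cells from the end of the document towards the beginning, or to fetch the document again between updates.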

How to extract image from table in MS Word document with docx library?

I am working on a program that needs to extract two images from a MS Word document to use them in another document. I know where the images are located (first table in the document), but when I try to extract any information from the table (even just plain text), I get empty cells.
Here is the Word document that I want to extract the images from. I want to extract the 'Rentel' images from the first page (first table, row 0 and 1, column 2).
I have tried the following code:
from docxtpl import DocxTemplate
source_document = DocxTemplate("Source document.docx")
# It doesn't really matter which rows or columns I use for the cells, everything is empty
print(source_document.tables[0].cell(0,0).text)
Which just gives me empty lines...
I have read in this discussion and this one that the problem might be that the content is "contained in a wrapper element that Python Docx cannot read". They suggest altering the source document, but I want to be able to select any document that was previously created with the same template as a source document (so those documents also contain the same problem and I cannot change every document separately). So a Python-only solution is really the only way I can think of to solve the problem.
Since I also only want those two specific images, extracting any random image from the xml by unzipping the Word file doesn't really suit my solution, unless I know which image name I need to extract from the unzipped Word file folders.
I really want this to work as it is part of my thesis (and I'm just an electromechanical engineer, so I don't know that much about software).
[EDIT]: Here is the xml code for the first image (source_document.tables[0].cell(0,2)._tc.xml) and here it is for the second image (source_document.tables[0].cell(1,2)._tc.xml). I noticed however that taking (0,2) as row and column value, gives me all the rows in column 2 within the first "visible" table. Cell (1,2) gives me all the rows in column 2 within the second "visible" table.
If the problem isn't directly solvable with Python Docx, is it a possibility to search for the image name or ID or something within the XML code and then add the image using this ID/name with Python Docx?
Well, the first thing that jumps out is that both of the cells (w:tc elements) you posted each contain a nested table. This is perhaps unusual, but certainly a valid composition. Maybe they did that so they could include a caption in a cell below the image or something.
To access the nested table you'd have to do something like:
outer_cell = source_document.tables[0].cell(0,2)
nested_table = outer_cell.tables[0]
inner_cell_1 = nested_table.cell(0, 0)
print(inner_cell_1.text)
# ---etc....---
I'm not sure that solves your whole problem, but it strikes me that this is two or more questions in the end, the first being: "Why isn't my table cell showing up?" and the second perhaps being "How do I get an image out of a table cell?" (once you've actually found the cell in question).
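When it is not known in advance which cells hold nested tables, a small recursive walk can list them. This is a sketch that relies only on the `.tables`, `.rows` and `.cells` attributes used above; `iter_tables` is a hypothetical helper name, not a python-docx function.

```python
def iter_tables(parent):
    """Yield every table under parent, including tables nested inside cells.

    parent can be a document or a cell: anything with a `.tables` attribute.
    """
    for table in parent.tables:
        yield table
        for row in table.rows:
            for cell in row.cells:
                # A cell can itself contain tables, so recurse into it
                for nested in iter_tables(cell):
                    yield nested


# Possible usage with the document from the question:
# for table in iter_tables(source_document):
#     print(len(table.rows))
```

If the outer table has merged cells, the same cell may be visited more than once, so nested tables could be reported repeatedly.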
For the people who have the same problem, this is the code that helped me solve it:
First I extract the nested cell from the table using the following method:
@staticmethod
def get_nested_cell(table, outer_row, outer_column, inner_row, inner_column):
    """
    Returns the nested cell (table inside a table) of the *document*
    :argument
        table: [docx.Table] outer table from which to get the nested table
        outer_row: [int] row of the outer table in which the nested table is
        outer_column: [int] column of the outer table in which the nested table is
        inner_row: [int] row in the nested table from which to get the nested cell
        inner_column: [int] column in the nested table from which to get the nested cell
    :return
        inner_cell: [docx.Cell] nested cell
    """
    # Get the global first cell
    outer_cell = table.cell(outer_row, outer_column)
    nested_table = outer_cell.tables[0]
    inner_cell = nested_table.cell(inner_row, inner_column)
    return inner_cell
Using this cell, I can get the xml code and extract the image from that xml code. Note:
I didn't set the image width and height because I wanted it to be the same
In the replace_logos_from_source method I know that the table where I want to get the logos from is 'tables[0]' and that the nested table is in outer_row and outer_column '0', so I just filled it in the get_nested_cell method without adding extra arguments to replace_logos_from_source
def replace_logos_from_source(self, source_document, target_document, inner_row, inner_column):
    """
    Replace the employer and client logo from the *source_document* to the *target_document*. Since the table
    in which the logos are placed are nested tables, the source and target cells with *inner_row* and
    *inner_column* are first extracted from the nested table.
    :argument
        source_document: [DocxTemplate] document from which to extract the image
        target_document: [DocxTemplate] document to which to add the extracted image
        inner_row: [int] row in the nested table from which to get the image
        inner_column: [int] column in the nested table from which to get the image
    :return
        Nothing
    """
    # Get the target and source cell (I know that the table where I want to get the logos from is
    # 'tables[0]' and that the nested table is in outer_row and outer_column '0', so I just filled
    # it in without adding extra arguments to the method)
    target_cell = self.get_nested_cell(target_document.tables[0], 0, 0, inner_row, inner_column)
    source_cell = self.get_nested_cell(source_document.tables[0], 0, 0, inner_row, inner_column)
    # Get the xml code of the inner cell
    inner_cell_xml = source_cell._tc.xml
    # Get the image from the xml code
    image_stream = self.get_image_from_xml(source_document, inner_cell_xml)
    # Add the image to the target cell
    paragraph = target_cell.paragraphs[0]
    if image_stream:  # If not None (image exists)
        run = paragraph.add_run()
        run.add_picture(image_stream)
    else:
        # Set the target cell text equal to the source cell text
        paragraph.add_run(source_cell.text)
# Module-level imports this method needs:
#   from io import BytesIO
#   from xml.dom import minidom
@staticmethod
def get_image_from_xml(source_document, xml_code):
    """
    Returns the first image referenced in the *xml_code* as a stream
    :argument
        source_document: [DocxTemplate] document that holds the image parts
        xml_code: [string] xml code from which to extract the image
    :return
        image_stream: [BytesIO stream] the image to find
        None if no image exists in the xml_code
    """
    # Parse the xml code for the blip
    xml_parser = minidom.parseString(xml_code)
    items = xml_parser.getElementsByTagName('a:blip')
    # Check if an image exists
    if items:
        # Extract the rId of the image
        rId = items[0].attributes['r:embed'].value
        # Get the blob of the image
        source_document_part = source_document.part
        image_part = source_document_part.related_parts[rId]
        image_bytes = image_part._blob
        # Wrap the image bytes in a BytesIO stream that can be fed to add_picture()
        image_stream = BytesIO(image_bytes)
        return image_stream
    # If no image exists
    else:
        return None
To call the method, I used:
# Replace the employer and client logos
self.replace_logos_from_source(self.source_document, self.template_doc, 0, 2) # Employer logo
self.replace_logos_from_source(self.source_document, self.template_doc, 1, 2) # Client logo

Unsetting Header Row in python-docx or setting first column as title column

I am generating multiple tables with python-docx, but some of the tables are rotated, that is, they have headers in the first column instead of the first row.
Is there a way to uncheck the Header Row option with python-docx or by accessing the XML directly?
I did find a workaround that achieves what I needed. It is possible to directly access each paragraph in each cell and set it to bold manually during creation.
This example makes the first column bold and leaves the rest of the table normal:
table = document.add_table(rows=0, cols=len(data))
table.style = 'Table Grid'
row_cells = table.add_row().cells
# Some fake data
row_cells[0].text = data[0]
row_cells[1].text = data[1]
row_cells[2].text = data[2]
# ...
# This gets Heading Row 1 paragraph and sets it to bold
row_cells[0].paragraphs[0].runs[0].font.bold = True
These options specify whether the corresponding part of a Table Style definition is applied to a table.
In the Word Open XML, these are listed as attributes of the w:tblLook element. The heading row is the attribute w:firstRow. It's an on/off | true/false setting so to turn it off the attribute needs to be set to 0 (false).
I can't tell you whether python-docx supports this (the Open XML SDK does). But this information will let you change it in the XML directly and possibly search the language reference for python-docx.
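As a rough sketch of changing this in the XML directly: the function below flips w:firstRow and w:firstColumn on a w:tblLook element and, if the older w:val bitmask attribute is present, keeps it in step (0x0020 for the first row, 0x0080 for the first column). It is demonstrated on a standalone XML fragment with the standard library; `set_table_look` is a hypothetical helper name, and using it on a python-docx table through the private `table._tbl.tblPr` element is an assumption rather than a documented API.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
FIRST_ROW_BIT = 0x0020     # bit for firstRow in the legacy w:val bitmask
FIRST_COLUMN_BIT = 0x0080  # bit for firstColumn in the legacy w:val bitmask


def set_table_look(tbl_pr, first_row, first_column):
    """Switch header-row / first-column formatting on a w:tblPr element.

    Returns False if the element has no w:tblLook child to modify.
    """
    look = tbl_pr.find("{%s}tblLook" % W_NS)
    if look is None:
        return False
    look.set("{%s}firstRow" % W_NS, "1" if first_row else "0")
    look.set("{%s}firstColumn" % W_NS, "1" if first_column else "0")
    val_key = "{%s}val" % W_NS
    val = look.get(val_key)
    if val is not None:
        bits = int(val, 16)
        for enabled, bit in ((first_row, FIRST_ROW_BIT), (first_column, FIRST_COLUMN_BIT)):
            bits = (bits | bit) if enabled else (bits & ~bit)
        look.set(val_key, "%04X" % bits)
    return True


# Standalone demonstration on an XML fragment
fragment = (
    '<w:tblPr xmlns:w="%s">'
    '<w:tblLook w:val="04A0" w:firstRow="1" w:firstColumn="1"/>'
    '</w:tblPr>' % W_NS
)
tbl_pr = ET.fromstring(fragment)
changed = set_table_look(tbl_pr, first_row=False, first_column=True)
```

With python-docx, the equivalent call would presumably be `set_table_look(table._tbl.tblPr, first_row=False, first_column=True)`, which leaves the first column styled as a title column and the first row plain.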

Arcpy, select features based on part of a string

So for my example, I have a large shapefile of state parks where some of them are actual parks and others are just trails. However there is no column defining which are trails vs actual parks, and I would like to select those that are trails and remove them. I DO have a column for the name of each feature, that usually contains the word "trail" somewhere in the string. It's not always at the beginning or end however.
I'm only familiar with Python at a basic level and while I could go through manually selecting the ones I want, I was curious to see if it could be automated. I've been using arcpy.Select_analysis and tried using "LIKE" in my where_clause and have seen examples using slicing, but have not been able to get a working solution. I've also tried using the 'is in' function but I'm not sure I'm using it right with the where_clause. I might just not have a good enough grasp of the proper terms to use when asking and searching. Any help is appreciated. I've been using the Python Window in ArcMap 10.3.
Currently I'm at:
arcpy.Select_analysis ("stateparks", "notrails", ''trail' is in \"SITE_NAME\"')
Although using the Select tool is a good choice, the syntax for the SQL expression can be a challenge. Consider using an Update Cursor to tackle this problem.
import arcpy
stateparks = r"C:\path\to\your\shapefile.shp"
notrails = r"C:\path\to\your\shapefile_without_trails.shp"
# Make a copy of your shapefile
arcpy.CopyFeatures_management(stateparks, notrails)
# Check if "trail" exists in the string--delete row if so
with arcpy.da.UpdateCursor(notrails, "SITE_NAME") as cursor:
    for row in cursor:
        # row[0] is the "SITE_NAME" value of the current row; compare in lower case
        # so that "Trail", "trail" and "Trails" all match
        if row[0] and "trail" in row[0].lower():
            cursor.deleteRow()  # Delete the row if condition is true
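If you would rather stay with the Select tool, the where_clause can be built as a plain string first and checked before it is passed on. This is a sketch: `build_exclude_clause` is a hypothetical helper, and the clause assumes a data source such as a shapefile, where field names are wrapped in double quotes, LIKE is case sensitive, and UPPER() is available.

```python
def build_exclude_clause(field, word):
    """Return a where clause that keeps rows whose field does NOT contain word."""
    # Double any single quote so the word is a valid SQL string literal
    escaped = word.upper().replace("'", "''")
    return "UPPER(\"{0}\") NOT LIKE '%{1}%'".format(field, escaped)


where_clause = build_exclude_clause("SITE_NAME", "trail")
print(where_clause)

# In the ArcMap Python window this would then be used as:
# arcpy.Select_analysis("stateparks", "notrails", where_clause)
```

Building the string separately makes quoting mistakes easier to spot than writing the clause inline inside the tool call.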

Some simple examples of Smartsheet API using the Python SDK

I am a newbie to the Smartsheet Python SDK. Using the sample code from the Smartsheet API docs as a starting point:
action = smartsheet.Sheets.list_sheets(include_all=True)
sheets = action.data
This code returns a response just fine.
I am now looking for some simple examples to iterate over the sheets ie:
for sheet in sheets:
then select a sheet by name
then iterate over the rows in the selected sheet and select a row.
for row in rows:
then retrieve a cell value from the selected row in the selected sheet.
I just need some simple samples to get started. I have searched far and wide and have been unable to find any simple examples of how to do this.
Thanks!
As Scott said, a sheet could return a lot of data, so make sure that you use filters judiciously. Here is an example of some code I wrote to pull two rows but only one column in each row:
action = smartsheet.Sheets.get_sheet(SHEET_ID, column_ids=COL_ID, row_numbers="2,4")
Details on the available filters can be found here.
UPDATE: more code added in order to follow site etiquette and provide a complete answer.
The first thing I did while learning the API is display a list of all my sheets and their corresponding sheetId.
action = MySS.Sheets.list_sheets(include_all=True)
for single_sheet in action.data:
    print("%s %s" % (single_sheet.id, single_sheet.name))
From that list I determined the sheetId for the sheet I want to pull data from. In my example, I actually needed to pull the primary column, so I used this code to determine the Id of the primary column (and also saved the non-primary column Ids in a list because at the time I thought I might need them):
PrimaryCol = 0
NonPrimaryCol = []
MyColumns = MySS.Sheets.get_columns(SHEET_ID)
for MyCol in MyColumns.data:
    if MyCol.primary:
        print("Found primary column %s" % MyCol.id)
        PrimaryCol = MyCol.id
    else:
        NonPrimaryCol.append(MyCol.id)
Lastly, keeping in mind that retrieving an entire sheet could return a lot of data, I used a filter to return only the data in the primary column:
MySheet = MySS.Sheets.get_sheet(SHEET_ID, column_ids=PrimaryCol)
for MyRow in MySheet.rows:
    for MyCell in MyRow.cells:
        print("%s %s" % (MyRow.id, MyCell.value))
Below is a very simple example. Most of this is standard Python, but one somewhat non-intuitive thing is that the sheet objects in the list returned from smartsheet.Sheets.list_sheets don't include the rows and cells. As this could be a lot of data, the call returns only information about each sheet, which you can use to retrieve the sheet's complete data by calling smartsheet.Sheets.get_sheet.
To better understand things such as this, be sure to keep the Smartsheet REST API reference handy. Since the SDK is really just calling this API under the covers, you can often find more information by looking at that documentation as well.
action = smartsheet.Sheets.list_sheets(include_all=True)
sheets = action.data
for sheetInfo in sheets:
    if sheetInfo.name == 'WIP':
        sheet = smartsheet.Sheets.get_sheet(sheetInfo.id)
        for row in sheet.rows:
            if row.row_number == 2:
                for c in range(0, len(sheet.columns)):
                    print(row.cells[c].value)
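To pick a cell by column title instead of by position, the column titles can first be matched to column ids. The sketch below assumes a sheet object as returned by get_sheet, with `columns` (each having `title` and `id`) and `rows` (each having `row_number` and `cells`, where a cell has `column_id` and `value`); `column_values` is a hypothetical helper, not an SDK function.

```python
def column_values(sheet, column_title):
    """Return (row_number, value) pairs for the column with the given title."""
    column_id = None
    for column in sheet.columns:
        if column.title == column_title:
            column_id = column.id
            break
    if column_id is None:
        raise KeyError("no column titled %r" % column_title)
    values = []
    for row in sheet.rows:
        for cell in row.cells:
            # Cells only carry the id of their column, not its title
            if cell.column_id == column_id:
                values.append((row.row_number, cell.value))
    return values


# Possible usage with the `sheet` object from the example above
# ("Status" is a made-up column title):
# for row_number, value in column_values(sheet, "Status"):
#     print("%s %s" % (row_number, value))
```

Matching on the column id avoids relying on the order of `row.cells`, which may not line up with `sheet.columns` when a column filter is used.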
I started working with the Python API for Smartsheet. Because we used Smartsheet to back some of our Rio 2016 Olympic Games operations, every now and then we had to delete the oldest sheets to stay within licence limits. Doing that by hand was a burden: log in, pick each sheet among some 300, check every field, and so on. Thanks to the Smartsheet API 2.0, we could easily learn how many sheets we had used so far, get all the 'modified' dates, sort by that column from the oldest to the most recent date, and then write the result to a CSV file on disk. I am not sure this is the best approach, but it worked as I expected. I use Idle-python2.7 on Debian 8.5. Here you are:
#!/usr/bin/python
# -*- coding: utf-8 -*-
'''
Create an instance of the Smartsheet client.
Then populate a list with the name and modified date of every sheet.
A token is necessary to access Smartsheet.
We create a list of all sheets with the fields aforesaid.
'''
# The library
import smartsheet, csv
'''
Token long var. This token can be obtained in
Account->Settings->Apps...->API
from a valid Smartsheet account.
'''
xMyToken = "xxxxxxxxxxxxxxxxxxxxxx"
# Smartsheet client, created with the token
xSheet = smartsheet.Smartsheet(xMyToken)
# Result object holding all sheets
xResult = xSheet.Sheets.list_sheets(include_all=True)
# The list
xList = []
'''
For each sheet element we choose two fields, namely name and date of modification.
As most of our vocabulary has special characters, we encode the name of each
sheet as utf-8. So for each sheet read from the result object:
'''
for sheet1 in xResult.data:
    xList.append((sheet1.name.encode('utf-8'), sheet1.modified_at))
# Sort the list by the modification date
xNList = sorted(xList, key=lambda x: x[1])
# Print the list
for key, value in xNList:
    print("%s %s" % (key, value))
# Finally write to disk ("wb" is the mode the csv module expects under Python 2)
with open("listofsmartsh.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(xNList)
Hope you enjoy.
regards
