Access STEP Instance ID's with PythonOCC

Let's suppose I'm using this STEP file data as input:
#417=ADVANCED_FACE('face_1',(#112),#405,.F.);
#418=ADVANCED_FACE('face_2',(#113),#406,.F.);
#419=ADVANCED_FACE('face_3',(#114),#407,.F.);
I'm using pythonocc-core to read the STEP file.
Then the following code will print the names of the ADVANCED_FACE instances (face_1,face_2 and face_3):
from OCC.Core.STEPControl import STEPControl_Reader
from OCC.Core.TopExp import TopExp_Explorer
from OCC.Core.TopAbs import TopAbs_FACE
from OCC.Core.StepRepr import StepRepr_RepresentationItem
reader = STEPControl_Reader()
tr = reader.WS().TransferReader()
reader.ReadFile('model.stp')
reader.TransferRoots()
shape = reader.OneShape()
exp = TopExp_Explorer(shape, TopAbs_FACE)
while exp.More():
    s = exp.Current()
    exp.Next()
    item = tr.EntityFromShapeResult(s, 1)
    item = StepRepr_RepresentationItem.DownCast(item)
    name = item.Name().ToCString()
    print(name)
How can I access the identifiers of the individual shapes? (#417,#418 and #419)
Minimal reproduction
https://github.com/flolu/step-occ-instance-ids

Create a STEP model after reader.TransferRoots() like this:
model = reader.StepModel()
And access the ID like this in the loop:
id = model.IdentLabel(item)
The full code looks like this and can also be found on GitHub:
from OCC.Core.STEPControl import STEPControl_Reader
from OCC.Core.TopExp import TopExp_Explorer
from OCC.Core.TopAbs import TopAbs_FACE
from OCC.Core.StepRepr import StepRepr_RepresentationItem
reader = STEPControl_Reader()
tr = reader.WS().TransferReader()
reader.ReadFile('model.stp')
reader.TransferRoots()
model = reader.StepModel()
shape = reader.OneShape()
exp = TopExp_Explorer(shape, TopAbs_FACE)
while exp.More():
    s = exp.Current()
    exp.Next()
    item = tr.EntityFromShapeResult(s, 1)
    item = StepRepr_RepresentationItem.DownCast(item)
    label = item.Name().ToCString()
    id = model.IdentLabel(item)
    print('label', label)
    print('id', id)
Thanks to temurka1 for pointing this out!
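As a quick cross-check that does not involve pythonocc at all, the IDs can also be read straight from the STEP text. The following is only a minimal sketch: it assumes each ADVANCED_FACE entity sits on a single line, as in the sample above (real STEP files may wrap entities across lines), and face_ids_by_name is a hypothetical helper name. The values it returns should agree with what model.IdentLabel(item) reports.

```python
import re

# The three sample lines from the question.
STEP_LINES = """\
#417=ADVANCED_FACE('face_1',(#112),#405,.F.);
#418=ADVANCED_FACE('face_2',(#113),#406,.F.);
#419=ADVANCED_FACE('face_3',(#114),#407,.F.);
"""

# Captures the instance number after '#' and the quoted face name.
FACE_PATTERN = re.compile(
    r"^#(\d+)\s*=\s*ADVANCED_FACE\s*\(\s*'([^']*)'", re.MULTILINE
)


def face_ids_by_name(step_text):
    """Map each ADVANCED_FACE name to its STEP instance number (hypothetical helper)."""
    return {name: int(ident) for ident, name in FACE_PATTERN.findall(step_text)}


ids = face_ids_by_name(STEP_LINES)
print(ids)
```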

I was unable to run your code due to issues installing the pythonocc module. However, I suspect you can inspect the StepRepr_RepresentationItem object (before any string conversion) by looking at its __dict__ to discover whatever attributes, properties, or methods you need:
entity = tr.EntityFromShapeResult(s, 1)
item = StepRepr_RepresentationItem.DownCast(entity)
print(entity.__dict__)
print(item.__dict__)
If necessary the inspect module exists to pry deeper into the object.
References
https://docs.python.org/3/library/stdtypes.html#object.__dict__
https://docs.python.org/3/library/inspect.html
https://github.com/tpaviot/pythonocc-core/blob/66d6e1ef6b7552a1110a90e86a1ed34eb12ecf16/src/SWIG_files/wrapper/StepElement.pyi

Related

Reset index name in elasticsearch dsl

I'm trying to create an ETL that extracts from Mongo, processes the data, and loads it into Elasticsearch. I will do a daily load, so I thought of naming my index with the current date. This will help with later processing I need to do on this first index.
I used elasticsearch dsl guide: https://elasticsearch-dsl.readthedocs.io/en/latest/persistence.html
The problem that I have comes from my little experience with working with classes. I don't know how to reset the Index name from the class.
Here is my code for the class (custom_indices.py):
from elasticsearch_dsl import Document, Date, Integer, Keyword, Text
from elasticsearch_dsl.connections import connections
from elasticsearch_dsl import Search
import datetime
class News(Document):
    title = Text(analyzer='standard', fields={'raw': Keyword()})
    manual_tagging = Keyword()
    class Index:
        name = 'processed_news_'+datetime.datetime.now().strftime("%Y%m%d")
    def save(self, **kwargs):
        return super(News, self).save(**kwargs)
    def is_published(self):
        return datetime.datetime.now() >= self.processed
And this is the part of the code where I create the instance to that class:
from custom_indices import News
import elasticsearch
import elasticsearch_dsl
from elasticsearch_dsl.connections import connections
import pandas as pd
import datetime
connections.create_connection(hosts=['localhost'])
News.init()
for index, doc in df.iterrows():
    new_insert = News(meta={'id': doc.url_hashed},
                      title=doc.title,
                      manual_tagging=doc.customTags,
                      )
    new_insert.save()
Every time I call the "News" class I would expect to have a new name. However, the name doesn't change even if I load the class again (from custom_indices import News). I know this is only a problem I have when testing but I'd like to know how to force that "reset". Actually, I originally wanted to change the name outside the class with this line right before the loop:
News.Index.name = "NEW_NAME"
However, that didn't work. I was still seeing the name defined on the class.
Could anyone please assist?
Many thanks!
PS: this must be just an object oriented programming issue. Apologies for my ignorance on the subject.

Maybe you could take advantage of the fact that Document.init() accepts an index keyword argument. If you want the index name to get set automatically, you could implement init() in the News class and call super().init(...) in your implementation.
A simplified example (python 3.x):
from elasticsearch_dsl import Document
from elasticsearch_dsl.connections import connections
import datetime
class News(Document):
    @classmethod
    def init(cls, index=None, using=None):
        index_name = index or 'processed_news_' + datetime.datetime.now().strftime("%Y%m%d")
        return super().init(index=index_name, using=using)

You can override the index when you call save() by passing the index keyword argument:
new_insert.save(index='processed_news_' + datetime.datetime.now().strftime("%Y%m%d"))
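A class body runs only once, when the module is first imported, so a name built inside class Index is fixed at that moment. One way around this is to compute the name at call time. The sketch below assumes nothing beyond the standard library; daily_index_name is a hypothetical helper, not part of elasticsearch_dsl.

```python
import datetime


def daily_index_name(prefix="processed_news_", day=None):
    """Return prefix + YYYYMMDD for the given day (today if omitted)."""
    if day is None:
        day = datetime.date.today()
    return prefix + day.strftime("%Y%m%d")


# A fixed date makes the result reproducible.
example = daily_index_name(day=datetime.date(2020, 1, 2))
print(example)  # processed_news_20200102
```

The result could then be passed along as new_insert.save(index=daily_index_name()).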

An example follows.
# coding: utf-8
import datetime
from elasticsearch_dsl import Keyword, Text, \
    Index, Document, Date
from elasticsearch_dsl.connections import connections
HOST = "localhost:9200"
index_names = [
    "foo-log-",
    "bar-log-",
]
default_settings = {"number_of_shards": 4, "number_of_replicas": 1}
index_settings = {
    "foo-log-": {
        "number_of_shards": 40,
        "number_of_replicas": 1
    }
}
class LogDoc(Document):
    level = Keyword(ignore_above=256)
    date = Date(format="yyyy-MM-dd'T'HH:mm:ss.SSS")
    hostname = Text(fields={'fields': Keyword(ignore_above=256)})
    message = Text()
    createTime = Date(format="yyyy-MM-dd'T'HH:mm:ss.SSS")
def auto_create_index():
    '''Automatically create the ES indices'''
    connections.create_connection(hosts=[HOST])
    for day in range(3):
        dt = datetime.datetime.now() + datetime.timedelta(days=day)
        for index in index_names:
            name = index + dt.strftime("%Y-%m-%d")
            settings = index_settings.get(index, default_settings)
            idx = Index(name=name)
            idx.document(LogDoc)
            idx.settings(**settings)
            try:
                idx.create()
            except Exception as e:
                print(e)
                continue
            print("create index %s" % name)
if __name__ == '__main__':
    auto_create_index()

Assign a short name to a class attribute

I am using a Python package that reads a certain type of data. From the data, it creates attributes that give easy access to meta-information about the data.
How can I create a short name for an attribute?
Basically, let's assume the package is named read_data and the objects it returns have an attribute named data_header_infomation_x_location:
import read_data
my_data = read_data(file_path)
How can I create a short name for this attribute instead?
x = "data_header_infomation_x_location"
my_data[1].x gives an error: no attribute.
Here is a full example from my case:
from obspy.io.segy.core import _read_segy
file_path = "some_file_in_my_pc"
sgy = _read_segy(file_path, unpack_trace_headers=True)
sgy[1].stats.segy.trace_header.x_coordinate_of_ensemble_position_of_this_trace
The last line gives a number, e.g. the x location.
What I want is to replace this long nested attribute, stats.segy.trace_header.x_coordinate_of_ensemble_position_of_this_trace, with a short name.
Trying, for example,
attribute = "stats.segy.trace_header.x_coordinate_of_ensemble_position_of_this_trace"
getattr(sgy[1], attribute )
does not work.

How about:
from obspy.io.segy.core import _read_segy
attribute_tree_x = ['stats', 'segy', 'trace_header', 'x_coordinate_of_ensemble_position_of_this_trace']
def get_nested_attribute(obj, attribute_tree):
    for attr in attribute_tree:
        obj = getattr(obj, attr)
    return obj
file_path = "some_file_in_my_pc"
sgy = _read_segy(file_path, unpack_trace_headers=True)
sgy[1].stats.segy.trace_header.x_coordinate_of_ensemble_position_of_this_trace
x = get_nested_attribute(sgy[1], attribute_tree_x) # should be the same as the line above
You cannot request the attribute of the attribute in one go, but this loops through the layers to obtain the final value you are looking for.
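While getattr does not split a name on dots, operator.attrgetter from the standard library does accept dotted names and performs the same layer-by-layer lookup. The sketch below uses a SimpleNamespace stand-in for sgy[1] so it runs without obspy; the value 1234 is made up for illustration, and the approach should carry over to the real trace object as long as each level is reachable through ordinary attribute access.

```python
from operator import attrgetter
from types import SimpleNamespace

# Stand-in for sgy[1]; the nesting mirrors the attribute path in the question.
trace = SimpleNamespace(
    stats=SimpleNamespace(
        segy=SimpleNamespace(
            trace_header=SimpleNamespace(
                x_coordinate_of_ensemble_position_of_this_trace=1234
            )
        )
    )
)

# Build the short name once, then reuse it on any trace.
get_x = attrgetter(
    "stats.segy.trace_header.x_coordinate_of_ensemble_position_of_this_trace"
)

x = get_x(trace)
print(x)  # 1234
```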

Using Python's docx library, how can a table be indented?

How can a docx table be indented? I am trying to line a table up with a tab stop set at 2cm. The following script creates a header, some text and a table:
import docx
from docx.shared import Cm
doc = docx.Document()
style = doc.styles['Normal']
style.paragraph_format.tab_stops.add_tab_stop(Cm(2))
doc.add_paragraph('My header', style='Heading 1')
doc.add_paragraph('\tText is tabbed')
# This indents the paragraph inside, not the table
# style = doc.styles['Table Grid']
# style.paragraph_format.left_indent = Cm(2)
table = doc.add_table(rows=0, cols=2, style="Table Grid")
for rowy in range(1, 5):
    row_cells = table.add_row().cells
    row_cells[0].text = 'Row {}'.format(rowy)
    row_cells[0].width = Cm(5)
    row_cells[1].text = ''
    row_cells[1].width = Cm(1.2)
doc.save('output.docx')
It produces a table with no indent as follows:
How can the table be indented as follows?
(preferably without having to load an existing document):
If for example left-indent is added to the Table Grid style (by uncommenting the lines), it will be applied at the paragraph level, not the table level resulting in the following (which is not wanted):
In Microsoft Word, this can be done on the table properties by entering 2.0 cm for Indent from left.

Based on Fred C's answer, I came up with this solution:
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
def indent_table(table, indent):
    # noinspection PyProtectedMember
    tbl_pr = table._element.xpath('w:tblPr')
    if tbl_pr:
        e = OxmlElement('w:tblInd')
        e.set(qn('w:w'), str(indent))
        e.set(qn('w:type'), 'dxa')
        tbl_pr[0].append(e)
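Because the type is set to 'dxa', the indent value is expressed in twentieths of a point (1440 per inch) rather than in centimetres. A small conversion, sketched here with the hypothetical helper name cm_to_dxa, gives the number to pass for the 2 cm tab stop from the question:

```python
def cm_to_dxa(cm):
    """Convert centimetres to dxa units (twentieths of a point, 1440 per inch)."""
    return round(cm / 2.54 * 1440)


indent = cm_to_dxa(2.0)
print(indent)  # 1134
```

With that, the call would look like indent_table(table, cm_to_dxa(2.0)).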

This feature is not yet supported by python-docx. It looks like this behavior is produced by the w:tblInd child of the w:tbl element. It's possible you could develop a workaround function to add an element like this using lxml calls on the w:tbl element, which should be available on the ._element attribute of a Table object.
You can find examples of other workaround functions by searching on 'python-docx workaround function' and similar ones by searching on 'python-pptx workaround functions'.

Here's how I did it:
import docx
import lxml.etree
mydoc = docx.Document()
mytab = mydoc.add_table(3,3)
nsmap=mytab._element[0].nsmap # For namespaces
searchtag='{%s}tblPr' % nsmap['w'] # w:tblPr
mytag='{%s}tblInd' % nsmap['w'] # w:tblInd
myw='{%s}w' % nsmap['w'] # w:w
mytype='{%s}type' % nsmap['w'] # w:type
for elt in mytab._element:
    if elt.tag == searchtag:
        myelt=lxml.etree.Element(mytag)
        myelt.set(myw,'1000')
        myelt.set(mytype,'dxa')
        elt.append(myelt)

How to use python diff_match_patch to create a patch and apply it

I'm looking for a Pythonic way to compare two files, file1 and file2, obtain the differences in the form of a patch file, and merge their differences into file2. The code should do something like this:
diff file1 file2 > diff.patch
apply the patch diff.patch to file2 // this must be doing something like git apply.
I have seen the post "Implementing Google's DiffMatchPatch API for Python 2/3" on using Google's Python API diff_match_patch to find the differences, but I'm looking for a solution to create and apply a patch.

First you need to install diff_match_patch.
Here is my code:
import sys
import diff_match_patch as dmp_module
def readFileToText(filePath):
    # Read the whole file into a single string
    with open(filePath, "r") as file:
        return file.read()
dmp = dmp_module.diff_match_patch()
origin = sys.argv[1]
latest = sys.argv[2]
originText = readFileToText(origin)
latestText = readFileToText(latest)
# Build the patch and serialize it to text
patch = dmp.patch_make(originText, latestText)
patchText = dmp.patch_toText(patch)
# folder = sys.argv[1]
folder = '/Users/test/Documents/patch'
print(folder)
patchFilePath = folder
with open(patchFilePath, "w") as patchFile:
    patchFile.write(patchText)
print(patchText)
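The code above only creates the patch. Applying it is the reverse trip: patch_fromText parses the saved text back into patch objects, and patch_apply applies them to the original text and returns the new text together with one success flag per patch. The sketch below works on in-memory strings instead of files; make_patch_text and apply_patch_text are hypothetical helper names. Note that, as far as I know, this patch text is specific to diff_match_patch and is not meant to be fed to git apply.

```python
import diff_match_patch as dmp_module


def make_patch_text(old_text, new_text):
    """Serialize the changes from old_text to new_text as patch text."""
    dmp = dmp_module.diff_match_patch()
    return dmp.patch_toText(dmp.patch_make(old_text, new_text))


def apply_patch_text(patch_text, old_text):
    """Apply patch text to old_text; returns (new_text, list of success flags)."""
    dmp = dmp_module.diff_match_patch()
    patches = dmp.patch_fromText(patch_text)
    return dmp.patch_apply(patches, old_text)


old = "The quick brown fox.\n"
new = "The quick red fox.\n"

patch_text = make_patch_text(old, new)
patched, results = apply_patch_text(patch_text, old)

# Every flag in results should be True if all hunks applied cleanly.
print(patched == new, all(results))
```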

How to instantiate an ontology using rdflib?

I have an ontology where I have defined series of classes, subclasses and properties. Now I want to automatically instantiate the ontology with Python code and save it in RDF/XML again and load it in Protege. I have written the following code:
from rdflib import Graph, Namespace, URIRef, BNode
def instantiating_ontology(rdf_address):
    g = Graph()
    input_RDF = g.parse(rdf_address)
    #input_RDF = g.open(rdf_address, create=False)
    myNamespace="http://www.semanticweb.org/.../ontologies/2015/3/RNO_V5042_RDF"
    rno = Namespace(myNamespace+"#")
    nodeClass = URIRef(rno+"Node")
    arcClass = URIRef(rno+"Arc")
    #owlNamespace = 'http://www.w3.org/2002/07/owl#NamedIndividual'
    namedIndividual = URIRef('http://www.w3.org/2002/07/owl#NamedIndividual')
    rdftype = URIRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
    for i in range(0,100):
        individualName = rno + "arc_"+str(arcID)
        #arc_individual= BNode(individualName)
        arc_individual = BNode()
        #g.add()
        #g.add((arc_individual,rdftype, namedIndividual))
        g.add((arc_individual,rdftype, arcClass))
        g.add((arc_individual,rdftype, arcClass))
        #g.commit()
    output_address ="RNO_V5042_RDF.owl"
    g.serialize(destination = output_address)
The file contains the added triples to the rdf/xml:
<rdf:Description rdf:nodeID="N0009844208f0490887a02160fbbf8b98">
    <rdf:type rdf:resource="http://www.semanticweb.org/ehsan.abdolmajidi/ontologies/2015/3/RNO_V5042#Arc"/>
</rdf:Description>
but when I open the file in Protege there are no instances for the classes.
Can someone tell me if the way I defined instances is wrong or I should use different tags?

After playing around with the code and the results, I realized that rdf:nodeID should be replaced with rdf:about, which means the individuals need to be URI references rather than blank nodes. To do so, I only needed to change
for i in range(0,100):
    individualName = rno + "arc_"+str(arcID)
    #arc_individual= BNode(individualName)
    arc_individual = BNode() #---> remove this one
    arc_individual = URIRef(individualName) #----> add this one
    g.add((arc_individual,rdftype, arcClass))
That might seem easy, but it took me some time to understand. I hope this can help others. :D
