How are 'ea_localid' values distributed in the Enterprise Architect API? - python

While working with the Enterprise Architect API I have noticed that when you export an EA project to XMI, several different kinds of elements get an attribute called ea_localid. That means that in the XMI you'll find tags that carry ea_localid as an attribute. This attribute seems to be used to reference the source and target of connecting elements (at least this is the case for 'transitions', as we are working with State Machine diagrams).
So far, so good. Now the problem for my intended usage is that these values seem to be redistributed every time you do an import and an export. EDIT: it is not quite clear to me when exactly during this process that happens. EDIT #2: It seems to happen on import.
That means that after exporting your project, reimporting it, changing nothing and exporting it again, the generated XMI document contains a different set of ea_localid values. Moreover, a value that used to belong to one element can now be assigned to an entirely different one.
Does anybody know anything about the distribution mechanism? Or, even better, a way of emulating it? Or a way to reset all counters?
As far as I've seen, there generally seem to be different classes of elements, and within each class the ea_localid for the next element is generated by counting up by one: the first one gets the value 1, the next one 2, and so on.
My goal here is to do 'roundtrips' (XMI --> project --> XMI ...) and always get the same ea_localid values, possibly by editing the XMI document after export. Any help or suggestions would be highly appreciated. Cheers

The ea_localid represents the elementID of elements (or the AttributeID for attributes, etc.).
In EA each "thing" has two IDs: a numeric ID and a GUID.
The numeric ID (e.g. t_object.Object_ID) is used as a key in most relations, but it is not stable.
Things like importing XMI files can reset the numeric IDs. This explains why the ea_localid changes.
If you are looking for a stable ID you should use the GUID. This one is guaranteed to stay the same, even after exporting and importing to other models (as long as you don't set the Strip GUIDs flag when importing).
In the XMI file you'll find those stable IDs in the attribute xmi.id,
e.g.
<UML:Class name="Aannemer" xmi.id="EAID_04A526DF_7F07_4475_8E65_16D2D88CEECD" visibility="public" namespace="EAPK_0345C8A9_9E8F_42c5_9931_CB842233B11B" isRoot="false" isLeaf="false" isAbstract="false" isActive="false">
This value corresponds to the ea_guid column in each of the tables.

So, after some testing I have found out that, regarding the aforementioned goal of doing a roundtrip (XMI --> import to EA --> XMI) and always getting the exact same document, the easiest solution is ...
running a filter over the XMI that simply deletes all nodes containing ea_localid, ea_sourceID (sic!) and ea_targetID values.
On reimport EA will just assign them new values. The information about the source and target of 'transitions' and other connecting elements is also stored with the GUID, so there is no loss of information. A sketch of such a filter is shown below.
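For reference, here is a minimal sketch of such a filter in Python using lxml. It assumes these values live in elements carrying a tag attribute (as in EA's XMI 1.1 tagged values); the file names are placeholders, and you may need to adjust the matching if your export stores the values differently.

from lxml import etree

UNSTABLE_TAGS = {"ea_localid", "ea_sourceID", "ea_targetID"}

def strip_local_ids(in_path, out_path):
    tree = etree.parse(in_path)
    # Collect first, then remove, so the tree isn't modified while iterating
    doomed = [node for node in tree.iter() if node.get("tag") in UNSTABLE_TAGS]
    for node in doomed:
        node.getparent().remove(node)
    tree.write(out_path, xml_declaration=True, encoding="UTF-8")

strip_local_ids("export.xml", "export_clean.xml")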


AWS S3 boto3: Server-side filtering for top-level keys

I am developing a GUI for the analysts in my company so they can browse a series of S3 buckets (and the files therein, named according to different hierarchies of keys) just as if they were working with any other hierarchical FS. The idea is that they can work without needing to know any details about how (or where) the data is actually stored.
I am aware that S3 does not natively support folders and that this can be (to some extent) simulated by proper use of key naming and delimiters. For my use case, let's suppose the content of my bucket is:
asset1/property1/fileA.txt
asset1/property2/fileB.txt
asset2/property1/fileA.txt
asset2/property2/fileB.txt
configX.txt
configY.txt
configZ.txt
I have so far written a simple GUI navigator that enables interactive navigation through the different levels of S3 keys as if they were folders (using the CommonPrefixes key of the dictionary returned by the S3 client or the paginator). The problem I have is when I land in an example like the one above. Obviously, CommonPrefixes is not going to return the file basenames under the requested key, but I also want to display them to the user (they are files contained in that "folder"!).
One thing that I have tried is, for every "inspection" of a requested key level (when the user clicks on a list item (key substring) as if it were a folder), to retrieve the first, say, 1000 item basenames under the passed prefix and search for any file matching exactly the {prefix}/{basename} key. If any of the first 1000 files matches the criterion, it means that "there are files contained directly in that very folder", so I can then use the paginator to query them all (eventually there are more than 1000 in total) and have their keys returned to me "for displaying them in the folder".
The problem I have in the above situation is that the paginator will (logically) recursively inspect all the contents under the passed key, but, unlike in the toy example above, the 'asset' or 'property' "folders" can contain tens or hundreds of thousands of files, thereby significantly slowing down the search "just" to be able to show the extra top-level 'config' files that share a path with the asset "folders".
I can filter the results myself to display at each level what has to be displayed, thereby resembling the appearance of a hierarchical folder structure. Nevertheless, the loss of speed and interactivity is massive, rendering the GUI solution for a comfortable analyst experience somewhat pointless. I have been searching for a way to filter results server-side beyond the standard prefix and delimiter (e.g. passing a suffix or a regular expression), but I have been unable to find any solution that does not imply the (slow) full-key retrieval and client-side filtering.
Is there any way to approach this that I am not seeing? What would be the correct way to go about this problem? Since I do not think my use case is a very specific corner case, apologies if this is a basic question that has already been solved. I have tried to find an answer, but chances are I am simply unable to google it.
Thanks very much in advance.
D.
P.S.: BTW, I have read that boto2 delivers the query results by levels, as I would expect them, but I am not certain that it wouldn't query the whole bucket anyway (which is what actually costs time).
If you look at the boto3 docs closely, besides the CommonPrefixes being returned, there is also a Contents key, which contains:
A list of metadata about each object returned.
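For example, a single "folder" level can be listed in one paginated call by combining a Delimiter with the prefix; a minimal sketch (the bucket name and prefix are placeholders) might look like this:

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

files, folders = [], []
for page in paginator.paginate(Bucket="my-bucket", Prefix="asset1/", Delimiter="/"):
    # Contents holds only the objects directly under the prefix...
    files.extend(obj["Key"] for obj in page.get("Contents", []))
    # ...while CommonPrefixes holds the immediate "subfolders"
    folders.extend(cp["Prefix"] for cp in page.get("CommonPrefixes", []))

print("files:", files)
print("folders:", folders)

With the Delimiter set, the listing stops at the first "/" after the prefix, so the keys living deeper in the hierarchy are never returned, which avoids the full recursive retrieval described in the question.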

DELETE and CREATE Cypher statements in one transaction

I'm writing a Python 3.6 application that uses Neo4j as a backend with the official Python driver. I'm new to Neo4j and Cypher. Data coming into the database needs to replace previous 'versions' of that data. I'm attempting to do this by creating a root node indicating the version. E.g.:
MATCH (root:root_node)-[*..]->(any_node:) DETACH DELETE root, any_node
CREATE(root:new_root_node)
...
...
... represents all the new data I'm attaching to the new_root_node.
The above doesn't work. How do I incorporate DELETE and CREATE statements in one transaction?
Thanks!
There's no problem with having DELETE and CREATE statements in a single transaction.
There are two problems here that need fixing.
The first is with (any_node:). The : separates the variable from the node label. If the : is present, then a node label must also be present, and because no label is supplied here you're getting an error. To fix this, remove the : completely, like so: (any_node).
The second problem is with CREATE(root:new_root_node). The root variable is already in scope from your previous MATCH, so you'll need to use a different variable.
Also, your :new_root_node label doesn't seem useful, as any previously written queries that read data from the root node would have to change to use the new label. I have a feeling you might be misunderstanding something about Neo4j labels, so a quick review of the relevant documentation may be helpful.
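For illustration, here is a minimal sketch of the corrected statements run in a single transaction with the official Python driver. The connection details and the attachment of the new data are placeholders; the point is simply that both statements execute inside one write transaction, with a fresh variable name for the newly created root.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def replace_root(tx):
    # Delete the old root node and everything reachable from it
    tx.run("MATCH (root:root_node)-[*..]->(any_node) DETACH DELETE root, any_node")
    # Create the new root (note the fresh variable name) and attach the new data here
    tx.run("CREATE (new_root:root_node)")

with driver.session() as session:
    session.write_transaction(replace_root)  # execute_write in newer driver versions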

How to keep multiple QAbstractItemModel classes in sync

I've been working really hard trying to find a solution to this for the past few weeks. I have committed to a direction now, but I am still not entirely satisfied with what I have come up with. I'm asking this now purely out of curiosity and in the hope of a more proper solution next time.
How on earth do I keep multiple QAbstractItemModel classes in sync that are referring to the same source data but displaying in different ways in the tree view?
One of the main reasons for using model/view is to keep multiple views in sync with one another. However, if each of my views requires different data to be displayed in the same column, then as far as I can tell I need to subclass my model into two different models with different implementations that cater to each of those unique view displays of the same items.
The underlying source items are the same, but the data displayed is different. Maybe the flags are different as well, so that the user can only select top-level items in one view and only child items in the other view.
I'll try to give an example:
Let's say my TreeItem has three properties: a, b, c.
I have two tree views: TreeView1, TreeView2. Each has two columns.
TreeView1 displays data as follows: column1 -> a, column2 -> b
TreeView2 displays data as follows: column1 -> a, column2 -> c
I then need to create two different models, one for TreeView1 and one for TreeView2, and override the data and flags methods appropriately for each.
Since they are now different models, even though they both refer to the same TreeItem in the background, they no longer stay in sync. I have to manually trigger a refresh of TreeView2 whenever I change data in TreeView1, and vice versa.
Consider that column1, i.e. property a, is editable and allows the user to set the name of the TreeItem. The desired behaviour would be for an edit done in TreeView1 to be instantly reflected in TreeView2.
I feel like I am missing some important design pattern or something when approaching this. Can anyone out there see where I am going wrong and correct me? Or is this a correct interpretation?
Thanks!
One way to do it is to use view models. Have one QAbstractItemModel adapter to your underlying data model; all interaction must pass through that model. When you need to further adapt the data for a particular view, use a proxy view model class that refers to the adapter above and reformats/adapts the data for that view. All the view models will then be automagically synchronized. They can derive from QAbstractProxyModel, although that's not strictly necessary.
There is no other way of doing it if the underlying data source doesn't provide change notifications, both for the contents and for the structure. If the underlying data source does provide the relevant notifications, it might as well be a QAbstractItemModel to begin with :)
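As a rough sketch of that idea in Python (PyQt5), assuming a single base tree model whose internal pointers are the shared TreeItems, each view could get a thin QIdentityProxyModel that only changes what the second column shows. The class and property names here are illustrative, not from the question.

from PyQt5.QtCore import QIdentityProxyModel, Qt

class PropertyColumnProxy(QIdentityProxyModel):
    """Presents the shared TreeItems, but column 1 shows a chosen property."""

    def __init__(self, property_name, parent=None):
        super().__init__(parent)
        self._property_name = property_name  # "b" for TreeView1, "c" for TreeView2

    def data(self, index, role=Qt.DisplayRole):
        if role == Qt.DisplayRole and index.column() == 1:
            item = self.mapToSource(index).internalPointer()  # the shared TreeItem
            return getattr(item, self._property_name, None)
        return super().data(index, role)

# Usage: both proxies wrap the same base model, so edits made through either
# view reach the same TreeItems, and the base model's dataChanged signals keep
# both views up to date.
# proxy1 = PropertyColumnProxy("b"); proxy1.setSourceModel(base_model); tree_view1.setModel(proxy1)
# proxy2 = PropertyColumnProxy("c"); proxy2.setSourceModel(base_model); tree_view2.setModel(proxy2)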

Same Kind Entities Groups

Let's take an example in which I run a blog that automatically updates its posts.
I would like to keep entities of the class (=model) BlogPost in two different "groups", one called "FutureBlogPosts" and one called "PastBlogPosts".
This is a reasonable division that would allow me to work with my blog posts efficiently (query them separately, etc.).
Basically, the problem is that the "kind" of my model will always be "BlogPost". So how can I separate it into two different groups?
Here are the options I found so far:
Duplicating the same model class code twice (once as a FutureBlogPost class and once as a PastBlogPost class, so that their kinds will be different) -- seems quite ridiculous.
Putting them under different ancestors (FutureBlogPost, "SomeConstantValue", BlogPost, #id), but this method also has its implications (1 write per second per entity group?) and the whole ancestor-child relationship doesn't really fit here. (And why do I have to use "SomeConstantValue" if I choose that option?)
Using different namespaces -- seems too radical for such a simple separation.
What is the right way to do it?
Well, it seems like I finally found the relevant article.
As I understand it, pulling all entities of a specific kind and pulling them by a specific property makes no difference; both require the same kind of work in the background.
(However, querying by a specific full key is still faster.)
So basically, adding a property named "Type" (or whatever other property you want to use to split your entities into groups) is just as useful as giving them a different kind.
Read more here: https://developers.google.com/appengine/articles/storage_breakdown
As you can see, both EntitiesByKind and EntitiesByProperty are nothing but index tables pointing to the original key.
Finally, an answer.
Why not just put a boolean in your "BlogPost" entity, 0 if it's past, 1 if it's future? That will let you query them separately easily.
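A minimal sketch of that suggestion with ndb (the property names are just examples):

from google.appengine.ext import ndb

class BlogPost(ndb.Model):
    title = ndb.StringProperty()
    publish_at = ndb.DateTimeProperty()
    is_future = ndb.BooleanProperty(default=True)  # the "group" flag

# Query each group separately:
future_posts = BlogPost.query(BlogPost.is_future == True).fetch()
past_posts = BlogPost.query(BlogPost.is_future == False).fetch()

Querying on the flag uses an ordinary single-property index, which, as the storage-breakdown article above describes, is the same kind of index lookup as querying by kind.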

Using DVCS for an RDBMS audit trail

I'm looking to implement an audit trail for a reasonably complicated relational database, whose schema is prone to change. One avenue I'm thinking of is using a DVCS to track changes.
(The benefits I can imagine are: schemaless history, snapshots of the entire system's state, standard tools for analysis, playback and migration, efficient storage, a separate system, and keeping the DB clean. The database is not write-heavy and history is not a core feature; it's more for the sake of having an audit trail. Oh, and I like trying crazy new approaches to problems.)
I'm not an expert with these systems (I only have basic git familiarity), so I'm not sure how difficult it would be to implement. I'm thinking of taking Mercurial's approach, but possibly storing the file contents/manifests/changesets in a key-value data store rather than in actual files.
Data rows would be serialised to JSON, and each "file" could be a row. Alternatively, an entire table could be stored in a "file", with each row residing on the line number equal to its primary key (assuming the tables aren't too big; I'm expecting all of them to have fewer than 4000 or so rows). This might mean that the changesets could be generated automatically, without consulting the rest of the table "file".
(But I doubt it, because I think we need a SHA-1 hash of the whole file. The files could perhaps be split up by a predictable number of lines, e.g. 0 < primary key < 1000 in file 1, 1000 < primary key < 2000 in file 2, etc., keeping them smallish.)
Is there anyone familiar with the internals of DVCS' or data structures in general who might be able to comment on an approach like this? How could it be made to work, and should it even be done at all?
I guess there are two aspects to a system like this: 1) mapping SQL data to a DVCS model and 2) storing the DVCS data in a key/value data store (rather than files) for efficiency.
(NB: the JSON serialisation bit is covered by my ORM.)
I've looked into this a little on my own, and here are some comments to share.
Although I had thought that using Mercurial from Python would make things easier, DVCSs have a lot of functionality that isn't necessary here (especially branching and merging). I think it would be easier to simply steal some design decisions and implement a basic system for my needs. So, here's what I came up with.
Blobs
The system makes a JSON representation of the record to be archived and generates a SHA-1 hash of it (a "node ID", if you will). This hash represents the state of that record at a given point in time and is the equivalent of git's "blob".
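In Python this step is tiny; a sketch (the record dict and the sort_keys choice are just illustrative, any stable serialisation would do):

import hashlib
import json

def blob_id(record):
    # Stable JSON serialisation of the row, then SHA-1, git-style
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()

print(blob_id({"id": 42, "title": "example", "version": 3}))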
Changesets
Changes are grouped into changesets. A changeset takes note of some metadata (timestamp, committer, etc) and links to any parent changesets and the current "tree".
Trees
Instead of using Mercurial's "Manifest" approach, I've gone for git's "tree" structure. A tree is simply a list of blobs (model instances) or other trees. At the top level, each database table gets its own tree. The next level can then be all the records. If there are lots of records (there often are), they can be split up into subtrees.
Doing this means that if you only change one record, you can leave the untouched trees alone. It also allows each record to have its own blob, which makes things much easier to manage.
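Purely as an illustration of the shape (all names and hashes below are made up), a tree can be thought of as nested mappings from table names and key ranges down to blob IDs:

tree = {
    "blog_post": {
        "0-999": {
            "42": "9f3a...",   # blob ID for blog_post 42
            "43": "c71b...",   # blob ID for blog_post 43
        },
        "1000-1999": "e02d...",  # subtree ID, left unexpanded because nothing in it changed
    },
    "comment": "5a6f...",        # subtree ID for an untouched table
}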
Storage
I like Mercurial's revlog idea, because it allows you to minimise the data storage (storing only changesets) while keeping retrieval quick (all changesets are in the same data structure). This is done on a per-record basis.
I think a system like MongoDB would be best for storing the data (it has to be key-value, and I think Redis is too focused on keeping everything in memory, which is not important for an archive). It would store changesets, trees and revlogs. A few extra keys for the current HEAD etc. and the system is complete.
Because we're using trees, we probably don't need to explicitly link foreign keys to the exact "blob" they refer to. Just using the primary key should be enough. I hope!
Use case 1: Archiving a change
As soon as a change is made, the current state of the record is serialised to JSON and a hash is generated for that state. The same is done for all other related changes, and everything is packaged into a changeset. When complete, the relevant revlogs are updated, new trees and subtrees are generated with the new object ("blob") hashes, and the changeset is "committed" with its meta information.
Use case 2: Retrieving an old state
After finding the relevant changeset (a MongoDB search?), the tree is traversed until we find the blob ID we're looking for. We go to the revlog and retrieve the record's state, or generate it from the available snapshots and changesets. The user will then have to decide whether the foreign keys need to be retrieved too, but doing so will be easy (using the same changeset we started with).
Summary
None of these operations should be too expensive, and we get a space-efficient description of all changes to the database. The archive is kept separate from the production database, allowing the latter to do its thing and allowing changes to the database schema to take place over time.
If the database is not write-heavy (as you say), why not just implement the actual database tables in a way that achieves your goal? For example, add a "version" column. Then never update or delete rows, except for this special column, which you can set to NULL to mean "current", 1 to mean "the oldest known", and go up from there. When you want to update a row, set its version to the next higher number and insert a new one with no version. Then when you query, just select the rows with an empty version.
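A minimal sketch of that scheme (using sqlite3 purely for illustration; the table and column names are assumptions):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blog_post (id INTEGER, title TEXT, version INTEGER)")
conn.execute("INSERT INTO blog_post (id, title, version) VALUES (1, 'first draft', NULL)")

def update_title(conn, post_id, new_title):
    # Archive the current row: give it the next version number for this id
    next_version = conn.execute(
        "SELECT COALESCE(MAX(version), 0) + 1 FROM blog_post WHERE id = ?",
        (post_id,),
    ).fetchone()[0]
    conn.execute(
        "UPDATE blog_post SET version = ? WHERE id = ? AND version IS NULL",
        (next_version, post_id),
    )
    # Insert the new current state (version left NULL to mean "current")
    conn.execute(
        "INSERT INTO blog_post (id, title, version) VALUES (?, ?, NULL)",
        (post_id, new_title),
    )

update_title(conn, 1, "second draft")
print(conn.execute("SELECT * FROM blog_post WHERE version IS NULL").fetchall())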
Take a look at CQRS and Greg Young's event sourcing. I also have a blog post about working with meta events that pinpoint schema changes within the river of business events.
http://adventuresinagile.blogspot.com/2009/09/rewind-button-for-your-application.html
If you look through my blog, you'll also find version script schemes, and you can put those under source control.
