I have a very difficult problem with a ROS 2 topic that for some reason keeps more than one message. My project is rather simple: I have a planner in which I can create targets and edit them. The planner consists of several nodes, one for changing each value of a target. List of my nodes:
/add_target
/change_comment
/change_target_index
/clear_state
/remove_target
/rename_target
/set_target
/toggle_select_target
/toggle_visible
Each node extends StateNode (see implementation below), which helps to keep the same state for each node.
The idea is simple: a node receives a service call, for example /planner/rename_target, finds the specific target in the node's state, modifies it, and publishes the new state to /planner/state. Each node is subscribed to /planner/state and sets its state to the message received. The idea is to keep the state consistent across all nodes, so that each node has access to all state data and can modify it.
I have set my quality of service profile to keep ONLY the latest message. However, my problem is that after using service calls to different nodes, sometimes when running for example
ros2 topic echo --qos-history keep_last --qos-depth 1 --qos-durability transient_local --qos-reliability reliable /planner/state
I receive multiple messages. The order of the messages changes randomly. The state of each node seems to be the same, BUT there seem to be old messages "floating around" in the topic. My QoS should allow only the latest message to persist.
For example, if I first call this service twice:
ros2 service call /planner/add_target mtms_interfaces/srv/AddTarget "{target: {position:{x: 0.0,y: 0.0,z: 0.0}, orientation: {alpha: 0.0,beta: 0.0,gamma: 0.0}}}"
my topic echo looks normal, but if I then
ros2 service call /planner/rename_target mtms_interfaces/srv/RenameTarget "{name: 'Target-0', new_name: 'example'}"
suddenly my topic echo shows two messages. In one of the messages the target has not been modified, and in the other the target has been modified.
What could be the problem here?
Here are some examples of my nodes
StateNode implementation:
from rclpy.node import Node
from rclpy.qos import QoSProfile, DurabilityPolicy, HistoryPolicy, ReliabilityPolicy

from mtms_interfaces.msg import PlannerState  # assumed message package


class StateNode(Node):
    def __init__(self, name):
        super().__init__(name)

        # Persist the latest sample.
        qos = QoSProfile(
            depth=1,
            durability=DurabilityPolicy.TRANSIENT_LOCAL,
            history=HistoryPolicy.KEEP_LAST,
            reliability=ReliabilityPolicy.RELIABLE
        )

        self._state_publisher = self.create_publisher(
            PlannerState,
            "/planner/state",
            qos
        )

        self._state_subscriber = self.create_subscription(
            PlannerState,
            '/planner/state',
            self.state_updated,
            10
        )

        self._state = None

    def state_updated(self, msg):
        self._state = msg
RenameTargetNode implementation:
from mtms_interfaces.srv import RenameTarget  # assumed service package


class RenameTargetNode(StateNode):
    def __init__(self):
        super().__init__('rename_target')
        self.create_service(RenameTarget, '/planner/rename_target', self.rename_target_callback)

    def rename_target_callback(self, request, response):
        state = self._state
        if state is None:
            response.success = False
            return response

        self.get_logger().info('Renaming {} to {}'.format(request.name, request.new_name))

        i = 0
        for target in state.targets:
            # Name already exists
            if target.name == request.new_name:
                response.success = False
                return response
            # Save index of target in case new_name is unique
            if target.name == request.name:
                i = state.targets.index(target)

        state.targets[i].name = request.new_name
        self._state_publisher.publish(state)

        response.success = True
        return response
AddTargetNode implementation:
from mtms_interfaces.msg import PlannerState, Target  # assumed message package
from mtms_interfaces.srv import AddTarget             # assumed service package


class AddTargetNode(StateNode):
    def __init__(self):
        super().__init__('add_target')
        self.create_service(AddTarget, '/planner/add_target', self.add_target_callback)

    def first_available_target_name(self):
        if self._state is None:
            return "Target-0"

        target_names = [target.name for target in self._state.targets]

        idx = 0
        while True:
            target_name = "Target-{}".format(idx)
            if target_name not in target_names:
                break
            idx += 1

        return target_name

    def create_new_target(self, pose):
        target = Target()
        target.name = self.first_available_target_name()
        target.type = "Target"
        target.comment = ""
        target.selected = False
        target.target = False  # XXX: Misnomer
        target.pose = pose
        target.intensity = 100.0
        target.iti = 100.0
        return target

    def add_target_callback(self, request, response):
        self.get_logger().info('Incoming request')

        target = self.create_new_target(
            pose=request.target  # XXX: Misnomer
        )

        if self._state is None:
            msg = PlannerState()
            msg.targets = [
                target
            ]
        else:
            msg = self._state
            msg.targets.append(target)

        self._state_publisher.publish(msg)

        response.success = True
        return response
System information:
Ubuntu 20.04, kernel 5.14.0-1042-oem, x86_64
I'm running the ROS nodes in one Docker container created from osrf/ros:galactic-desktop.
The problem here was that I had several publishers on the same topic with DurabilityPolicy.TRANSIENT_LOCAL, which is described as follows: "the publisher becomes responsible for persisting samples for 'late-joining' subscriptions." In practice this means that when a new subscriber joins, each publisher sends it its own last message, so the subscriber receives multiple messages.
There are several solutions to this, for example creating a master node that subscribes to an inner state topic updated by each node, while only the master node is responsible for publishing the state "outside"; a sketch of this is shown below.
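A minimal sketch of that master-node approach, assuming a hypothetical internal topic /planner/_state_internal that the worker nodes publish to with default (volatile) QoS; only the master node publishes to /planner/state with TRANSIENT_LOCAL, so a late-joining subscriber receives exactly one latched sample:
from rclpy.node import Node
from rclpy.qos import QoSProfile, DurabilityPolicy, HistoryPolicy, ReliabilityPolicy

from mtms_interfaces.msg import PlannerState  # assumed message package


class PlannerStateMaster(Node):
    """Hypothetical single owner of /planner/state."""

    def __init__(self):
        super().__init__('planner_state_master')

        # Only this node latches the state, so late joiners get exactly one sample.
        latched = QoSProfile(
            depth=1,
            history=HistoryPolicy.KEEP_LAST,
            durability=DurabilityPolicy.TRANSIENT_LOCAL,
            reliability=ReliabilityPolicy.RELIABLE
        )
        self._state_publisher = self.create_publisher(PlannerState, '/planner/state', latched)

        # Worker nodes publish their updates here with default (volatile) QoS.
        self._internal_subscriber = self.create_subscription(
            PlannerState,
            '/planner/_state_internal',  # hypothetical internal topic
            self._republish,
            10
        )

    def _republish(self, msg):
        self._state_publisher.publish(msg)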
I am using Python 3.7 with SimPy 4. I have 4 Resources (say "First Level") with a capacity of 5, and each of them has an associated Resource (say "Second Level") with a capacity of 1 (so 4 "First Level" Resources and 4 "Second Level" Resources in total). When an agent arrives, it requests any of the "First Level" Resources; once it gets access, it requests the associated "Second Level" Resource.
I am using AnyOf to choose any of the "First Level" Resources. It works, but I need to know which Resource is chosen by which agent. How can I do that?
Here is a representation of what I am doing so far:
import simpy
from simpy.events import AnyOf, Event

env = simpy.Environment()

num_FL_Resources = 4
capacity_FL_Resources = 5
FL_Resources = [simpy.Resource(env, capacity=capacity_FL_Resources) for i in range(num_FL_Resources)]

# inside an agent process:
events = [FirstLevelResource.request() for FirstLevelResource in FL_Resources]
yield AnyOf(env, events)
Note 1: I didn't use a Store or FilterStore for the "First Level", randomly putting the agent into one of the available ones, because the agents keep coming and all of the Stores might be in use; they need to queue up. Also, please let me know if there is a good way of using a Store here.
Note 2: Resource.users gives me <Request() object at 0x...> so it isn't helpful.
Note 3: I am using a nested dictionary for the "First Level" and "Second Level" Resources, like below. However, for convenience I didn't add my longer code here.
{'Resource1': {'FirstLevel1': <simpy.resources.resource.Resource at 0x121f45690>,
'SecondLevel1': <simpy.resources.resource.Resource at 0x121f45710>},
'Resource2': {'FirstLevel2': <simpy.resources.resource.Resource at 0x121f457d0>,
'SecondLevel2': <simpy.resources.resource.Resource at 0x121f458d0>},
'Resource3': {'FirstLevel3': <simpy.resources.resource.Resource at 0x121f459d0>,
'SecondLevel3': <simpy.resources.resource.Resource at 0x121f45a90>},
'Resource4': {'FirstLevel4': <simpy.resources.resource.Resource at 0x121f47750>,
'SecondLevel4': <simpy.resources.resource.Resource at 0x121f476d0>}}
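(As an aside, one way to see which first-level request was granted is to keep the request events in a dict and compare them against the AnyOf result. This is only a sketch built on the snippet above, inside an agent process, not code from my project.)
# Inside an agent process; FL_Resources is the list defined above.
events = {resource.request(): resource for resource in FL_Resources}
result = yield AnyOf(env, list(events.keys()))

# The AnyOf result maps the events that have fired to their values,
# so any key in it is a request that was granted.
granted_request = list(result.keys())[0]
chosen_resource = events[granted_request]

# Tidy up the requests that are not needed.
for request, resource in events.items():
    if request is not granted_request:
        if request.triggered:
            resource.release(request)
        else:
            request.cancel()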
So I did it with a Store. In the Store I have groups of first-level objects that share a common second-level resource. Here is the code:
"""
example of a two stage resource grab using a store and resouces
A agent will queue up to get a first level resource object
and then use this object to get a second level rescource
However groups of the frist level resouce have one common second level resource
so there will also be a queue for the second level resource.
programer: Michael R. Gibbs
"""
import simpy
import random
class FirstLevel():
"""
A frist level object, a group of these objects will make a type of resource
each object in the group will have the same second level resource
"""
def __init__(self, env, groupId, secondLevel):
self.env = env
self.groupId = groupId
self.secondLevel = secondLevel
def agent(env, agentId, firstLevelStore):
"""
sims a agent/entity that will first grab a first level resource
then a second level resource
"""
print(f'agent {agentId} requesting from store with {len(firstLevelStore.items)} and queue {len(firstLevelStore.get_queue)}')
# queue and get first level resouce
firstLevel = yield firstLevelStore.get()
print(f"agent {agentId} got first level resource {firstLevel.groupId} at {env.now}")
# use the first level resource to queue and get the second level resource
with firstLevel.secondLevel.request() as req:
yield req
print(f"agent {agentId} got second level resource {firstLevel.groupId} at {env.now}")
yield env.timeout(random.randrange(3, 10))
print(f"agent {agentId} done second level resource {firstLevel.groupId} at {env.now}")
# put the first level resource back into the store
yield firstLevelStore.put(firstLevel)
print(f"agent {agentId} done first level resource {firstLevel.groupId} at {env.now}")
def agentGen(env, firstLevelStore):
"""
creates a sequence of agents
"""
id = 1
while True:
yield env.timeout(random.randrange(1, 2))
print(f"agent {id} arrives {env.now}")
env.process(agent(env,id, firstLevelStore))
id += 1
if __name__ == '__main__':
print("start")
num_FL_Resources = 4 # number of first level groups/pools
capacity_FL_Resources = 5 # number of first level in each group/pool
env = simpy.Environment()
# store of all first level, all mixed togethers
store = simpy.Store(env, capacity=(num_FL_Resources * capacity_FL_Resources))
for groupId in range(num_FL_Resources):
# create the second level resource for each group os first level resources
secondLevel = simpy.Resource(env,1)
for cap in range(capacity_FL_Resources):
# create the individual first level objects for the group
firstLevel = FirstLevel(env,groupId,secondLevel)
store.items.append(firstLevel)
env.process(agentGen(env, store))
env.run(200)
print("done")
Situation:
I have 2 google cloud functions, let's call them gcf-composer and gcf-action.
I have a list of 70,000 unique dicts for which I want to execute the gcf-action.
I use the gcf-composer to loop over all dicts and publish a message per dict to the gcf-action topic, containing the dict as payload.
I need the gcf-composer because running the gcf-action directly for all dicts would take more than the 9-minute threshold.
I start off the gcf-composer using Google Cloud Scheduler.
Problem
When firing off the gcf-composer in the cloud, after some number of seconds it stops and returns the following:
'connection error'
These are the results of 4 separate tries.
Why does it give me "finished with status: 'connection error'", and how do I solve it?
When I run this locally, i.e. sending messages to the topic, it works.
Please let me know if you need any more code or information!
Code of gcf-composer
from mlibs.pubsub import PubSubConnection
pubsub = PubSubConnection()
TOPIC_NAME = 'gcf-action-topic'
def gcf_composer(period, run_method='local', list_x, names_y):
"""Run composer given run method (local or cloud_fn)"""
for k, row in names_y.iterrows():
# get dict of identifiers
y = row.to_dict()
for x in list_x:
parameters = {'x': x, 'y': y}
if run_method == 'local':
c.test_local(x=x, y=y)
elif run_method == 'cloud-fn':
pubsub.publish_to_topic(topic_name=TOPIC_NAME, params={'params': parameters})
else:
print(f'Unknown run method {run_method} used. Please try again.')
PubSubConnection:
"""Interaction with the Pub/Sub Engine"""
from google.oauth2 import service_account
from google.cloud import pubsub_v1
from mlibs.utils import json
from mlibs.utils import decode
from mlibs.utils import files as fs
class PubSubConnection:
def __init__(self):
"""Initiate a PubSub connection"""
self.project_id = None
self.publisher = None
self.count = 0
self.init_connection()
def init_connection(self):
"""Initiates a connection given the service account"""
self.publisher = pubsub_v1.PublisherClient(credentials=*****)
self.project_id = credentials.project_id
def publish_to_topic(self, topic_name, params):
# Define the topic path
topic_path = self.publisher.topic_path(self.project_id, topic_name)
# Convert to ByteString
params_bytes = json.dumps(params).encode('utf-8')
# Publish and handle the Future
cbl = Callable(self.count, params)
message_future = self.publisher.publish(topic_path, data=params_bytes, test='pubsub')
# Done callback
message_future.add_done_callback(cbl.callback)
# https://googleapis.dev/python/pubsub/latest/publisher/index.html#futures
# block result
# message_id = message_future.result()
self.count = self.count + 1
print(f'[pubsub][{self.count}][{topic_name}]')
class Callable:
def __init__(self, count, params):
self.count = count
self.params = params
def callback(self, message_future):
if message_future.exception(timeout=30):
print(f'[pubsub-except] Publishing message threw an Exception {message_future.exception()}')
else:
print(f'[pubsub][{self.count}][{message_future.result()}] {self.params}')
How can I select a node via Python before the one currently selected?
For example, I want to add a "Clamp" node exactly before all "Write" ones.
This code snippet allows you to find the nodes upstream of an existing Write node.
import nuke

iNode = nuke.toNode('Write1')

def upstream(iNode, maxDeep=-1, found=None):
    if found is None:
        found = set()
    if maxDeep != 0:
        willFind = set(z for z in iNode.dependencies() if z not in found)
        found.update(willFind)
        for depth in willFind:
            # decrement the depth limit on each recursion (-1 means unlimited)
            upstream(depth, maxDeep - 1, found)
    return found
Then call the method upstream(iNode).
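For example, a quick sanity check of what it returns (a sketch, assuming the script contains a node named Write1):
# print the names of all nodes feeding Write1
writeNode = nuke.toNode('Write1')
for node in upstream(writeNode):
    print(node.name())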
And the script snippet you sent me earlier should look like this:
allWrites = nuke.allNodes('Grade')
depNodes = nuke.selectedNode().dependencies()

for depNode in depNodes:
    depNode.setSelected(True)

queueElem = len(allWrites)
trigger = -1

for i in range(1, queueElem + 1):
    trigger += 1
    for write in allWrites[(0 + trigger):(1 + trigger)]:
        write.setSelected(True)
        nuke.createNode("Clamp")
        for all in nuke.allNodes():
            all.setSelected(False)
I'm using Storm to process messages off of Kafka in real-time and using streamparse to build my topology. For this use case, it's imperative that we have 100% guarantee that any message into Storm is processed and ack'd. I have implemented logic on my bolt using try/catch (see below), and I would like to have Storm replay these messages in addition to writing this to another "error" topic in Kafka.
In my KafkaSpout, I assigned the tup_id to equal the offset id from the Kafka topic that my consumer is feeding from. However, when I force an error in my Bolt using a bad variable reference, I'm not seeing the message be replayed. I do see one write to the 'error' Kafka topic, but only once, meaning that the tuple is never resubmitted into my bolt(s). My setting is TOPOLOGY_MESSAGE_TIMEOUT_SEC=60, and I'm expecting Storm to keep replaying the failed message once every 60 seconds and have my error catch keep writing to the error topic, perpetually.
KafkaSpout.py
import json

from pykafka import KafkaClient      # assumed Kafka client library
from streamparse import Spout


class kafkaSpout(Spout):

    def initialize(self, stormconf, context):
        self.kafka = KafkaClient(str("host:6667"))  # ,offsets_channel_socket_timeout_ms=60000)
        self.topic = self.kafka.topics[str("topic-1")]
        self.consumer = self.topic.get_balanced_consumer(consumer_group=str("consumergroup"), auto_commit_enable=False, zookeeper_connect=str("host:2181"))

    def next_tuple(self):
        for message in self.consumer:
            self.emit([json.loads(message.value)], tup_id=message.offset)
            self.log("spout emitting tuple ID (offset): " + str(message.offset))
            self.consumer.commit_offsets()

    def fail(self, tup_id):
        self.log("failing logic for consumer. resubmitting tup id: " + str(tup_id))
        # NOTE: `message` is not defined in this scope; this is the code as posted.
        self.emit([json.loads(message.value)], tup_id=message.offset)
processBolt.py
import json
from collections import Counter

import requests
from pykafka import KafkaClient      # assumed Kafka client library
from streamparse import Bolt


class processBolt(Bolt):
    auto_ack = False
    auto_fail = False

    def initialize(self, conf, ctx):
        self.counts = Counter()

        self.kafka = KafkaClient(str("host:6667"), offsets_channel_socket_timeout_ms=60000)
        self.topic = self.kafka.topics[str("topic-2")]
        self.producer = self.topic.get_producer()

        self.failKafka = KafkaClient(str("host:6667"), offsets_channel_socket_timeout_ms=60000)
        self.failTopic = self.failKafka.topics[str("topic-error")]
        self.failProducer = self.failTopic.get_producer()

    def process(self, tup):
        try:
            self.log("found tup.")
            docId = tup.values[0]
            url = "solrserver.host.com/?id=" + str(docId)

            thisIsMyForcedError = failingThisOnPurpose  ####### this is what I'm using to fail my bolt consistently

            data = json.loads(requests.get(url).text)
            if len(data['response']['docs']) > 0:
                self.producer.produce(json.dumps(docId))
                self.log("record FOUND {0}.".format(docId))
            else:
                self.log('record NOT found {0}.'.format(docId))
            self.ack(tup)

        except:
            docId = tup.values[0]
            self.failProducer.produce(json.dumps(docId), partition_key=str("ERROR"))
            self.log("TUP FAILED IN PROCESS BOLT: " + str(docId))
            self.fail(tup)
I would appreciate any help with how to correctly implement the custom fail logic for this case. Thanks in advance.
import time

import celery
from celery import current_task


def temptask(n):
    header = list(tempsubtask.si(i) for i in range(n))
    callback = templink.si('printed at last?')
    r = celery.chord(celery.group(header))(callback)
    return r


@task()
def tempsubtask(i):
    print(i)
    for x in range(i):
        time.sleep(2)
        current_task.update_state(
            state='PROGRESS', meta={'completed': x, 'total': i})


@task()
def templink(x):
    print('this should be run at last %s' % x)


# executing temptask
r = temptask(100)
I want access to the progress status updated by tempsubtask. How can I go about achieving it?
I've had a similar question. Most examples on the net are outdated and the docs didn't help much, but the docs link to the sources, and reading those did help me.
My objective was to organize parallel tasks in groups. The groups would have to be executed sequentially in order.
So I decided to generate the task ids separately, before starting any tasks, and only assign them. I'm using Celery 4.3.0.
Here's a brief example.
Firstly I needed a dummy task to make execution sequential and to be able to check the state of a certain group. As this is used as a callback, it will complete only after all other tasks in the group.
@celery.task(bind=True, name="app.tasks.dummy_task")
def dummy_task(self, results=None, *args, **kwargs):
    return results
My comments here explain how I assign ids.
import celery
from celery.utils import uuid
from celery import group, chord, chain

# Generating task ids,
# which can be saved to a db, sent to the client and so on
#
# This is done before executing any tasks

task_id_1 = uuid()
task_id_2 = uuid()

chord_callback_id_1 = uuid()
chord_callback_id_2 = uuid()

workflow_id = None

# Generating groups, using signatures;
# the group may contain any number of tasks
group_1 = group(
    [
        celery.signature(
            'app.tasks.real_task',
            args=(),
            kwargs={'email': some_email, 'data': some_data},
            options=({'task_id': task_id_1})
        )
    ]
)

group_2 = group(
    [
        celery.signature(
            'app.tasks.real_task',
            args=(),
            kwargs={'email': some_email, 'data': some_data},
            options=({'task_id': task_id_2})
        )
    ]
)

# Creating the callback task which will simply relay the result,
# using the task id which has been generated before
#
# The dummy task starts after all tasks in its group are completed.
# This way we know that the group is completed.
chord_callback = celery.signature(
    'app.tasks.dummy_task',
    options=({'task_id': chord_callback_id_1})
)

chord_callback_2 = celery.signature(
    'app.tasks.dummy_task',
    options=({'task_id': chord_callback_id_2})
)

# we can monitor each step's status
# by its chord callback id

# the id of the chord callback
step1 = chord(group_1, body=chord_callback)

# the id of the chord callback
step2 = chord(group_2, body=chord_callback_2)

# start the workflow execution;
# the steps will execute sequentially
workflow = chain(step1, step2)()

# the id of the last chord callback
workflow_id = workflow.id

# return any ids you need
print(workflow_id)
That's how I can check the status of any task in my app.
# This is a simplified example
# some code is omitted
from celery.result import AsyncResult


def task_status(task_id=None):
    # Possible states:
    # PENDING
    # RECEIVED
    # STARTED
    # SUCCESS
    # FAILURE
    # REVOKED
    # RETRY
    task = AsyncResult(task_id)

    response = {
        'state': task.state,
    }

    return jsonify(response), 200
After hours of googling I stumbled upon http://www.manasupo.com/2012/03/chord-progress-in-celery.html . Though the solution there didn't work for me out of the box, it did inspire me to try something similar.
from celery.utils import uuid
from celery import chord


class ProgressChord(chord):

    def __call__(self, body=None, **kwargs):
        _chord = self.type
        body = (body or self.kwargs['body']).clone()
        kwargs = dict(self.kwargs, body=body, **kwargs)
        if _chord.app.conf.CELERY_ALWAYS_EAGER:
            return self.apply((), kwargs)
        callback_id = body.options.setdefault('task_id', uuid())
        r = _chord(**kwargs)
        return _chord.AsyncResult(callback_id), r
and instead of executing celery.chord I use ProgressChord as follows:
def temptask(n):
    header = list(tempsubtask.si(i) for i in range(n))
    callback = templink.si('printed at last?')
    r = ProgressChord(celery.group(header))(callback)
    return r
The returned value of r contained a tuple holding both the callback's AsyncResult and a GroupResult. So success looked something like this:
In [3]: r
Out[3]:
(<AsyncResult: bf87507c-14cb-4ac4-8070-d32e4ff326a6>,
<GroupResult: af69e131-5a93-492d-b985-267484651d95 [4672cbbb-8ec3-4a9e-971a-275807124fae, a236e55f-b312-485c-a816-499d39d7de41, e825a072-b23c-43f2-b920-350413fd5c9e, e3f8378d-fd02-4a34-934b-39a5a735871d, c4f7093b-9f1a-4e5e-b90d-66f83b9c97c4, d5c7dc2c-4e10-4e71-ba2b-055a33e15f02, 07b1c6f7-fe95-4c1f-b0ba-6bc82bceaa4e, 00966cb8-41c2-4e95-b5e7-d8604c000927, e039c78e-6647-4c8d-b59b-e9baf73171a0, 6cfdef0a-25a2-4905-a40e-fea9c7940044]>)
I inherited and overrode celery.chord instead of celery.task.chords.Chord because I couldn't find its source anywhere.
This is an old problem, and I wasted several days looking for a better and more modern solution. In my current project I must track group progress separately and release a lock in the final callback.
And the current solution is much simpler (but harder to guess); the relevant lines are commented at the end:
@celery_app.task(name="_scheduler", track_started=True, ignore_result=False)
def _scheduler():
    lock = cache.lock("test_lock")
    if not lock.acquire(blocking=False):
        return {"Error": "Job already in progress"}
    lock_code = lock.local.token.decode("utf-8")

    tasks = []
    for x in range(100):
        tasks.append(calculator.s())

    _group = group(*tasks)
    _chord = chord(_group)(_get_results.s(token=lock_code))
    group_results = _chord.parent  # This is the actual group inside the chord
    group_results.save()           # I am saving it to the usual results backend, and can track progress inside.
    return _chord                  # can return anything, I need only the chord
I am working with Celery 5.1.
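For completeness, a minimal sketch of how the saved group could be read back elsewhere to report progress, assuming the same Celery app and results backend as above; group_id here is a placeholder for the saved GroupResult's id:
from celery.result import GroupResult


def group_progress(group_id):
    # Restore the GroupResult saved by group_results.save() above.
    group_results = GroupResult.restore(group_id, app=celery_app)
    if group_results is None:
        return None
    return {
        "completed": group_results.completed_count(),
        "total": len(group_results.results),
        "ready": group_results.ready(),
    }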