I know this question has been asked before, but I found no answer that addressed my particular problem.
I have a Python script to create Kubernetes clusters and nodes in Azure, which takes anywhere between 5 and 10 minutes. There's a function (get_cluster_end) to get the cluster endpoint, but it fails because the endpoint is not yet ready when the function is called. The function I wrote (wait_for_endpoint) does not seem to be correct.
def wait_for_endpoint(timeout=None):
    endpoint = None
    start = time.time()
    while not endpoint:
        if timeout is not None and (time.time() - start > timeout):
            break
        endpoint = get_cluster_end()
        time.sleep(5)
    return endpoint
My main function:
def main():
    create_cluster()
    start = time.time()
    job.set_progress("Waiting for cluster IP address...")
    endpoint = wait_for_endpoint(timeout=TIMEOUT)
    if not endpoint:
        return ("FAILURE",
                "No IP address returned after {} seconds".format(TIMEOUT),
                "")
The script fails because no endpoint has yet been created. How do I add a delay after the cluster has been created and before wait_for_endpoint() is called?
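For reference, a minimal sketch of one way to fold that delay into the polling loop itself rather than sleeping before the call (initial_delay and poll_interval are placeholder values; get_cluster_end is the function from the question):

import time

def wait_for_endpoint(timeout=None, initial_delay=30, poll_interval=5):
    # Give Azure time to finish provisioning before the first poll
    # (initial_delay is a guess; tune it to your cluster creation time)
    time.sleep(initial_delay)
    start = time.time()
    while True:
        endpoint = get_cluster_end()
        if endpoint:
            return endpoint
        if timeout is not None and (time.time() - start) > timeout:
            return None
        time.sleep(poll_interval)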
I am trying to connect with the IB API to download some historical data. I have noticed that my client connects to the API, but then disconnects automatically within a very short period (~a few seconds).
Here's the log on the server:
socket connection for client{10} has closed.
Connection terminated.
Here's my main code for starting the app:
class TestApp(TestWrapper, TestClient):
    def __init__(self):
        TestWrapper.__init__(self)
        TestClient.__init__(self, wrapper=self)
        self.connect(config.ib_hostname, config.ib_port, config.ib_session_id)
        self.session_id = int(config.ib_session_id)
        self.thread = Thread(target=self.run)
        self.thread.start()
        setattr(self, "_thread", self.thread)
        self.init_error()

    def reset_connection(self):
        pass

    def check_contract(self, name, exchange_name, security_type, currency):
        self.reset_connection()
        ibcontract = IBcontract()
        ibcontract.secType = security_type
        ibcontract.symbol = name
        ibcontract.exchange = exchange_name
        ibcontract.currency = currency
        return self.resolve_ib_contract(ibcontract)

    def resolve_contract(self, security):
        self.reset_connection()
        ibcontract = IBcontract()
        ibcontract.secType = security.security_type()
        ibcontract.symbol = security.name()
        ibcontract.exchange = security.exchange()
        ibcontract.currency = security.currency()
        return self.resolve_ib_contract(ibcontract)

    def get_historical_data(self, security, duration, bar_size, what_to_show):
        self.reset_connection()
        resolved_ibcontract = self.resolve_contract(security)
        data = self.get_IB_historical_data(resolved_ibcontract.contract, duration, bar_size, what_to_show)
        return data


def create_app():
    test_app = TestApp()
    return test_app
Any suggestions on what could be the problem? I can show more error messages from the debug output if needed.
If you can connect without issue only by changing the client ID, that typically indicates that the previous connection was not properly closed and TWS thinks it's still open. To disconnect an API client you should call the EClient.disconnect function explicitly, which in your example would be:
test_app.disconnect()
Though it's not necessary to disconnect/reconnect after every task; you can just leave the connection open for extended periods.
You may sometimes encounter problems if an API function, such as reqHistoricalData, is called immediately after connecting. It's best to have a small pause after initiating a connection and wait for a callback such as nextValidId to ensure the connection is complete before proceeding.
http://interactivebrokers.github.io/tws-api/connection.html#connect
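For example, a minimal sketch of that wait, assuming the standard ibapi EWrapper/EClient classes (the ready event, the paper-trading port 7497, and the client ID are my assumptions):

import threading
from ibapi.client import EClient
from ibapi.wrapper import EWrapper

class App(EWrapper, EClient):
    def __init__(self):
        EClient.__init__(self, wrapper=self)
        self.ready = threading.Event()  # set once TWS confirms the connection

    def nextValidId(self, orderId):
        # TWS sends this once the connection handshake is complete
        self.ready.set()

app = App()
app.connect("127.0.0.1", 7497, clientId=10)
threading.Thread(target=app.run, daemon=True).start()
# Block until the connection is confirmed (or give up after 10 s)
if not app.ready.wait(timeout=10):
    raise RuntimeError("Connection to TWS was not confirmed")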
I'm not sure what the function init_error() is intended for in your example, since it is always called when a TestApp object is created (whether or not there is an error).
Installing the latest version of TWS API (v 9.76) solved the problem.
https://interactivebrokers.github.io/#
I have an app deployed on GKE, separated into different microservices.
One of the microservices, let's call it "worker", receives tasks to execute from Pub/Sub messages.
The tasks can take up to 1 hour to execute. Since the regular acknowledgement deadline for Google Pub/Sub messages is quite short, we renew the deadline 10 seconds before it expires. Here is the piece of code responsible for that:
def watchdog(businessDoneEvent, subscription, ack_deadline, message, ack_id):
    '''
    Prevents the message from being republished as long as the computation is
    running
    '''
    while True:
        # Wait (ack_deadline - 10) seconds before renewing if ack_deadline
        # is > 10 seconds; renew every second otherwise
        sleepTime = ack_deadline - 10 if ack_deadline > 10 else 1
        startTime = time.time()
        while time.time() - startTime < sleepTime:
            LOGGER.info('Sleeping time: {} - ack_deadline: {}'.format(time.time() - startTime, ack_deadline))
            if businessDoneEvent.isSet():
                LOGGER.info('Business done!')
                return
            time.sleep(1)
        subscriber = SubscriberClient()
        LOGGER.info('Modifying ack deadline for message ' +
                    str(message.data) + ' processing to ' +
                    str(ack_deadline))
        subscriber.modify_ack_deadline(subscription, [ack_id],
                                       ack_deadline)
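For context, the watchdog runs in its own thread next to the business logic, presumably along these lines (run_business_logic is a hypothetical stand-in for the actual task; the other names come from the code above):

import threading

def process_with_watchdog(subscription, ack_deadline, message, ack_id):
    businessDoneEvent = threading.Event()
    watchdogThread = threading.Thread(
        target=watchdog,
        args=(businessDoneEvent, subscription, ack_deadline, message, ack_id))
    watchdogThread.start()
    try:
        run_business_logic(message)  # hypothetical stand-in for the actual task
    finally:
        businessDoneEvent.set()      # tells the watchdog to stop renewing
        watchdogThread.join()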
Once the execution is over, we reach this piece of code:
def callbackWrapper(callback,
                    subscription,
                    message,
                    ack_id,
                    endpoint,
                    context,
                    subscriber,
                    postAcknowledgmentCallback=None):
    '''
    Pub/Sub message acknowledgment if everything ran correctly
    '''
    try:
        callback(message.data, endpoint, context, **message.attributes)
    except Exception as e:
        LOGGER.info(message.data)
        LOGGER.error(traceback.format_exc())
        raise e
    else:
        LOGGER.info("Trying to acknowledge...")
        my_retry = Retry(predicate=if_exception_type(ServiceUnavailable), deadline=3600)
        subscriber.acknowledge(subscription, [ack_id], retry=my_retry)
        LOGGER.info(str(ack_id) + ' has been acknowledged')
        if postAcknowledgmentCallback is not None:
            postAcknowledgmentCallback(message.data,
                                       **message.attributes)
Note that we also use this code in most of our microservices and it works just fine.
My problem is that even though I do not get any error from this code, and the acknowledgement request appears to be sent properly, the message is actually acknowledged much later. For example, according to the GCP console, right now I have 8 unacknowledged messages when I should only have 3, and earlier it showed 12 for over an hour when I should only have had 5.
I have a horizontal pod autoscaler using the Pub/Sub metric. When the pods are done, they are not scaled down, or only an hour or more later. This creates some useless costs that I would like to avoid.
Does anyone have an idea about why this is happening?
I am totally new to Kafka and Docker, and have been handed a problem to fix. Our continuous integration tests for (Apache) Kafka queues run just fine on local machines, but occasionally fail on the Jenkins CI server with this sort of error:
%3|1508247800.270|FAIL|art#producer-1| [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Connect to ipv4#127.0.0.1:9092 failed: Connection refused
%3|1508247800.270|ERROR|art#producer-1| [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Connect to ipv4#127.0.0.1:9092 failed: Connection refused
%3|1508247800.270|ERROR|art#producer-1| [thrd:localhost:9092/bootstrap]: 1/1 brokers are down
The working theory is that the Docker image takes time to get started, by which time the Kafka producer has given up. The offending code is
producer_properties = {
    'bootstrap.servers': self._job_queue.bootstrap_server,
    'client.id': self._job_queue.client_id
}
try:
    self._producer = kafka.Producer(**producer_properties)
except:
    print("Bang!")
with the error lines above appearing during the creation of the producer. However, no exception is raised, and the call returns an otherwise valid-looking producer, so I can't programmatically test for the presence of the broker endpoint. Is there an API to check the status of a broker?
It seems the client doesn't throw an exception if the connection to the broker fails. It only actually tries to connect to the bootstrap servers the first time the producer tries to send a message. If the connection fails, it repeatedly tries to connect to any of the brokers passed in the bootstrap list. Eventually, if the brokers come up, the send will happen (and we can check the status in the callback function).
The confluent-kafka Python library uses the librdkafka library, and this client doesn't seem to have proper documentation. Some of the Kafka producer options specified by the Kafka protocol do not seem to be supported by librdkafka.
Here is the sample code with a callback that I used:
from confluent_kafka import Producer

def notifyme(err, msg):
    print(err, msg.key(), msg.value())

p = Producer({'bootstrap.servers': '127.0.0.1:9092',
              'retry.backoff.ms': 100,
              'message.send.max.retries': 20,
              'reconnect.backoff.jitter.ms': 2000})

try:
    p.produce(topic='sometopic', value='this is data', on_delivery=notifyme)
except Exception as e:
    print(e)

p.flush()
Also, to check for the presence of the broker, you can just telnet to the broker IP on its port (in this example, 9092). And on the ZooKeeper used by the Kafka cluster, you can check the contents of the znodes under /brokers/ids.
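For instance, a minimal sketch of that reachability check from Python rather than telnet (the host and port here are assumptions for a local broker):

import socket

def broker_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """TCP-level check that something is listening on the broker's port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(broker_reachable('127.0.0.1', 9092))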
Here is the code that seems to work for me. If it looks a bit Frankenstein, then you are right, it is! If there is a clean solution, I look forward to seeing it:
import time
import uuid
from threading import Event
from typing import Dict

import confluent_kafka as kafka
# pylint: disable=no-name-in-module
from confluent_kafka.cimpl import KafkaError
# more imports...

LOG = # ...

# Default number of times to retry connection to Kafka Broker
_DEFAULT_RETRIES = 3

# Default time in seconds to wait between connection attempts
_DEFAULT_RETRY_DELAY_S = 5.0

# Number of times to scan for an error after initiating the connection. It appears that calling
# flush() once on a producer after construction isn't sufficient to catch the 'broker not available'
# error. At least twice seems to work.
_NUM_ERROR_SCANS = 2


class JobProducer(object):
    def __init__(self, connection_retries: int = _DEFAULT_RETRIES,
                 retry_delay_s: float = _DEFAULT_RETRY_DELAY_S) -> None:
        """
        Constructs a producer.
        :param connection_retries: how many times to retry the connection before raising a
                                   RuntimeError. If 0, retry forever.
        :param retry_delay_s: how long to wait between retries in seconds.
        """
        self.__error_event = Event()
        self._job_queue = JobQueue()
        self._producer = self.__wait_for_broker(connection_retries, retry_delay_s)
        self._topic = self._job_queue.topic

    def produce_job(self, job_definition: Dict) -> None:
        """
        Produce a job definition on the queue
        :param job_definition: definition of the job to be executed
        """
        value = ...  # Conversion to JSON
        key = str(uuid.uuid4())
        LOG.info('Produced message: %s', value)
        self.__error_event.clear()
        self._producer.produce(self._topic,
                               value=value,
                               key=key,
                               on_delivery=self._on_delivery)
        self._producer.flush(self._job_queue.flush_timeout)

    @staticmethod
    def _on_delivery(error, message):
        if error:
            LOG.error('Failed to produce job %s, with error: %s', message.key(), error)

    def __create_producer(self) -> kafka.Producer:
        producer_properties = {
            'bootstrap.servers': self._job_queue.bootstrap_server,
            'error_cb': self.__on_error,
            'client.id': self._job_queue.client_id,
        }
        return kafka.Producer(**producer_properties)

    def __wait_for_broker(self, retries: int, delay: float) -> kafka.Producer:
        retry_count = 0
        while True:
            self.__error_event.clear()
            producer = self.__create_producer()
            # Need to call flush() several times with a delay between to ensure errors are caught.
            if not self.__error_event.is_set():
                for _ in range(_NUM_ERROR_SCANS):
                    producer.flush(0.1)
                    if self.__error_event.is_set():
                        break
                    time.sleep(0.1)
                else:
                    # Success: no errors.
                    return producer
            # If we get to here, the error callback was invoked.
            retry_count += 1
            if retries == 0:
                msg = '({})'.format(retry_count)
            else:
                if retry_count <= retries:
                    msg = '({}/{})'.format(retry_count, retries)
                else:
                    raise RuntimeError('JobProducer timed out')
            LOG.warn('JobProducer: could not connect to broker, will retry %s', msg)
            time.sleep(delay)

    def __on_error(self, error: KafkaError) -> None:
        LOG.error('KafkaError: %s', error.str())
        self.__error_event.set()
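Usage is then just the following (assuming JobQueue supplies the bootstrap server, client ID, topic, and flush timeout as the class expects):

producer = JobProducer(connection_retries=5, retry_delay_s=2.0)
producer.produce_job({'job': 'definition'})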
I have a pipeline where a request hits one server, and that server calls another server, which executes a job for two seconds, then returns to the main server for some minor computation, and finally returns to the client. The problem is that my current setup blocks if the number of concurrent requests is greater than the number of workers, and I don't know how to use Python threading to make it asynchronous. Any ideas on how to implement this?
Main server -> Outside server -> 2 seconds -> Main server
Edit:
The line that takes 2 seconds is the one with the find_most_similar_overall(image_name, classifier, labels) call. That function takes 2 seconds, which means the worker is blocked right there.
@app.route("/shoes/<_id>")
def classify_shoe(_id):
    if request.method == 'GET':
        unique_user = request.cookies.get('uniqueuser')
        shoe = Shoe.query.filter_by(id=_id)
        if shoe.count() != 0:
            shoe = shoe.first()
            image_name = shoe.image_name
            shoes, category = find_most_similar_overall(image_name, classifier, labels)
            return render_template('category.html', same_shoes=similar_shoes, shoes=shoes, main_shoe=shoe, category=category, categories=shoe_categories)
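Not the poster's code, but one hedged sketch of an approach: hand the slow call to a shared thread pool and let the client poll for the result, so a request worker is never pinned for the full two seconds. The endpoint names and the find_most_similar_overall stub here are assumptions:

import time
import uuid
from concurrent.futures import ThreadPoolExecutor
from flask import Flask, jsonify

app = Flask(__name__)
executor = ThreadPoolExecutor(max_workers=4)  # shared pool, created once
jobs = {}  # job_id -> Future; use an expiring store in production

def find_most_similar_overall(image_name):
    # Stand-in for the real 2-second classification call
    time.sleep(2)
    return 'category for ' + image_name

@app.route('/classify/<image_name>')
def start_classification(image_name):
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(find_most_similar_overall, image_name)
    # 202 Accepted: tell the client where to poll for the result
    return jsonify({'job_id': job_id}), 202

@app.route('/result/<job_id>')
def get_result(job_id):
    future = jobs.get(job_id)
    if future is None:
        return jsonify({'error': 'unknown job'}), 404
    if not future.done():
        return jsonify({'status': 'pending'}), 202
    return jsonify({'result': future.result()})

The same idea scales to a proper task queue (e.g. Celery) once jobs need to survive restarts.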
I'm working on a small Flask app.
The idea is that when a user clicks "update all", a utility class submits all defined servers to the updater, which calls each server's get() method.
route.py:
@servers.route('/update/<id>')
def update(id):
    servers = UcxServer.query.filter(id != None)

    def update_server(server):
        server.create_ucx()
        server.update_ucx()
        return threading.current_thread().name, server

    with Pool(max_workers=5) as executor:
        start = timer()
        for name, server in executor.map(update_server, servers):
            print("%s %s" % (name, server))  # print results
            db.session.add(server)
        db.session.commit()
        print('time:', timer() - start)

    flash('All servers have been updated')
    return redirect(url_for('servers.index'))
The problem is that each time this button is used, it keeps spawning new threads. If I have 5 servers, the first time I use the button I get 5 threads; the next time, 10; and so on.
What is the proper way to manage these threads so that I don't end up with a million of them after the app has been up for a while?
Thanks
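A minimal sketch of one fix (my suggestion, not from the question): create the executor once at module level and reuse it across requests, so the same 5 worker threads are recycled instead of new ones being spawned per click. UcxServer, db, the servers blueprint, flash, redirect, and url_for are as in the question:

from concurrent.futures import ThreadPoolExecutor

# Created once at import time; every request reuses the same 5 worker threads
executor = ThreadPoolExecutor(max_workers=5)

@servers.route('/update/<id>')
def update(id):
    all_servers = UcxServer.query.all()

    def update_server(server):
        server.create_ucx()
        server.update_ucx()
        return server

    for server in executor.map(update_server, all_servers):
        db.session.add(server)
    db.session.commit()

    flash('All servers have been updated')
    return redirect(url_for('servers.index'))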