I am running MLflow where several users can instantiate experiments.
Is there any way to override user_id, which is set to the system username?
I am looking for a solution that works with the start_run block:
with mlflow.start_run():
start_run doesn't accept user_id as an argument, and its kwargs are actually run tags, so you need to work with lower-level functions.
To do what you need, you have to create the run yourself from a FileStore object, then wrap this new run in an ActiveRun wrapper, which will automatically finish the run passed as argument when the with-block exits.
The code should look like this:
import time

import mlflow
from mlflow import ActiveRun
from mlflow.store.tracking.file_store import FileStore

now = int(time.time() * 1000)  # mlflow default when start_time is None

fs = FileStore()
run = fs.create_run(
    experiment_id=None,       # defaults to "0"
    user_id="your_user_id",   # your actual user id
    start_time=now,
    tags={},
)

with ActiveRun(run) as arun:
    ...

# after the with-block, the run is finished
I have a situation like this: I am creating a Python package. That package needs to use Redis, so I want to allow the user of the package to define the Redis URL.
Here's how I attempted to do it:
bin/main.py
from logging import basicConfig, DEBUG

from my_package.main import run
from my_package.config import config

basicConfig(filename='logs.log', level=DEBUG)
# the user defines the redis url
config['redis_url'] = 'redis://localhost:6379/0'
run()
my_package/config.py
config = {
    "redis_url": None
}
my_package/main.py
from .config import config
def run():
    print(config["redis_url"])  # prints None instead of what I want
Unfortunately, it doesn't work. In main.py the value of config["redis_url"] is None instead of the url defined in bin/main.py file. Why is that? How can I make it work?
I could pass the config to the run() function, but then if I run some other function I would need to pass the config to that function as well. Ideally I'd like to set it once.
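For what it's worth, a module-level dict like this is shared as long as every module mutates the same object rather than rebinding the name. A minimal stdlib sketch of the pattern (with the package modules collapsed into one file, so `config` here stands in for `my_package.config.config`):

```python
# Sketch: a shared module-level dict works when it is mutated in place.
config = {"redis_url": None}

def run():
    # reads from the same dict object the caller mutated
    return config["redis_url"]

# mutating in place: every importer of the module sees the change
config["redis_url"] = "redis://localhost:6379/0"
print(run())  # redis://localhost:6379/0

# by contrast, rebinding the name (config = {...}) in another module would
# create a new object and leave other importers pointing at the old dict
```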
I am using gunicorn with multiple workers for my machine learning project. But the problem is that when I send a train request, only the worker that handles the training request gets updated with the latest model after training is done. It is worth mentioning here that, to make inference faster, I have programmed it to load the model once after each training. That is why only the worker used for the current training operation loads the latest model, while the other workers keep the previously loaded model. Right now the model file (in binary format) is loaded once after each training into a global dictionary variable, where the key is the model name and the value is the model object. Obviously, this problem would not occur if I loaded the model from disk for every prediction, but I cannot do that, as it would make prediction slower.
I studied global variables further, and the investigation shows that in a multi-processing environment all the workers (processes) create their own copies of global variables. Apart from the binary model file, I also have some other global variables (of dictionary type) that need to be synced across all processes. So, how do I handle this situation?
TL;DR: I need some approach that can help me store variables that are common across all the processes (workers). Is there any way to do this, e.g. with multiprocessing.Manager, dill, etc.?
Update 1: I have multiple machine learning algorithms in my project, each with its own model file; these are loaded into memory in a dictionary where the key is the model name and the value is the corresponding model object. I need to share all of them (in other words, I need to share the dictionary). But some of the models are not pickle-serializable, such as FastText. So, when I try to use a proxy variable (in my case a dictionary to hold the models) with multiprocessing.Manager, I get an error for those non-pickle-serializable objects while assigning the loaded model file to this dictionary, like: can't pickle fasttext_pybind.fasttext objects. More information on multiprocessing.Manager can be found here: Proxy Objects
Following is a summary of what I have done:
import multiprocessing

import fasttext

mgr = multiprocessing.Manager()
model_dict = mgr.dict()
model_file = fasttext.load_model("path/to/model/file/which/is/in/.bin/format")
model_dict["fasttext"] = model_file  # This line throws the error below
Error:
can't pickle fasttext_pybind.fasttext objects
I printed the model_file which I am trying to assign, it is:
<fasttext.FastText._FastText object at 0x7f86e2b682e8>
Update 2:
According to this answer I modified my code a little bit:
import fasttext
from multiprocessing.managers import SyncManager
def Manager():
    m = SyncManager()
    m.start()
    return m
# As the model file has a type of "<fasttext.FastText._FastText object at 0x7f86e2b682e8>" so, using "fasttext.FastText._FastText" as the class of it
SyncManager.register("fast", fasttext.FastText._FastText)
# Now this is the Manager as a replacement of the old one.
mgr = Manager()
ft = mgr.fast() # This line gives error.
This gives me EOFError.
Update 3: I tried using dill, both with multiprocessing and multiprocess. The summary of the changes is as follows:
import multiprocessing
import multiprocess
import dill
# Any one of the following two lines
mgr = multiprocessing.Manager() # Or,
mgr = multiprocess.Manager()
model_dict = mgr.dict()
... ... ...
... ... ...
model_file = dill.dumps(model_file) # This line throws the error
model_dict["fasttext"] = model_file
... ... ...
... ... ...
# During loading
model_file = dill.loads(model_dict["fasttext"])
But still getting the error: can't pickle fasttext_pybind.fasttext objects.
Update 4:
This time I am using another library called jsonpickle. Serialization and deserialization seem to occur properly (no issue is reported while running). But surprisingly enough, after deserialization, whenever I make a prediction it hits a segmentation fault. More details and the steps to reproduce it can be found here: Segmentation fault (core dumped)
Update 5: I tried cloudpickle and srsly, but couldn't get the program working.
For the sake of completeness I am providing the solution that worked for me. All the approaches I tried for serializing FastText were in vain. Finally, as @MedetTleukabiluly mentioned in the comment, I managed to share the message of loading the model from disk with the other workers via redis-pubsub. Obviously, it does not actually share the model from the same memory space; rather, it just sends a message to the other workers to inform them that they should reload the model from disk (as a new training just happened). Following is the general solution:
# redis_pubsub.py
import logging
import os
import socket
import threading
import time

import fasttext

"""The whole purpose of GLOBAL_NAMESPACE is to keep the whole pubsub mechanism separate,
as another service might also be publishing on the same channel.
"""
GLOBAL_NAMESPACE = "SERVICE_0"

def get_ip():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # doesn't even have to be reachable
        s.connect(('10.255.255.255', 1))
        IP = s.getsockname()[0]
    except Exception:
        IP = '127.0.0.1'
    finally:
        s.close()
    return IP

class RedisPubSub:
    def __init__(self):
        self.redis_client = get_redis_client()  # TODO: a sample method which returns your Redis client (you have to implement it)
        # The unique ID identifies which worker on which server is the publisher,
        # so a worker can avoid acting on a message it sent itself.
        self.unique_id = "IP_" + get_ip() + "__" + str(GLOBAL_NAMESPACE) + "__" + "PID_" + str(os.getpid())

    def listen_to_channel_and_update_models(self, channel):
        try:
            pubsub = self.redis_client.pubsub()
            pubsub.subscribe(channel)
        except Exception as exception:
            logging.error(f"REDIS_ERROR: Model Update Listening: {exception}")
        while True:
            try:
                message = pubsub.get_message()
                # A successful operation gives 1 and an unsuccessful one gives 0;
                # we are not interested in receiving these flags
                if message and message["data"] != 1 and message["data"] != 0:
                    message = message["data"].decode("utf-8")
                    message = str(message)
                    splitted_msg = message.split("__SEPERATOR__")
                    # Make sure the message comes from another worker, and also that the
                    # sender and receiver (i.e. both workers) are under the same namespace
                    if (splitted_msg[0] != self.unique_id) and (splitted_msg[0].split('__')[1] == GLOBAL_NAMESPACE):
                        algo_name = splitted_msg[1]
                        model_path = splitted_msg[2]
                        # Fasttext
                        if "fasttext" in algo_name:
                            try:
                                # TODO: the freshly loaded model ends up in model_file; use it to update the old one
                                model_file = fasttext.load_model(model_path + '.bin')
                            except Exception as exception:
                                logging.error(exception)
                            else:
                                logging.info(f"{algo_name} model is updated for process with unique_id: {self.unique_id} by process with unique_id: {splitted_msg[0]}")
                time.sleep(1)  # sleep for 1 second to avoid hammering the CPU too much
            except Exception as exception:
                time.sleep(1)
                logging.error(f"PUBSUB_ERROR: Model or component update: {exception}")

    def publish_to_channel(self, channel, algo_name, model_path):
        def _publish_to_channel():
            try:
                message = self.unique_id + '__SEPERATOR__' + str(algo_name) + '__SEPERATOR__' + str(model_path)
                time.sleep(3)
                self.redis_client.publish(channel, message)
            except Exception as exception:
                logging.error(f"PUBSUB_ERROR: Model or component publishing: {exception}")
        # As the delay before publishing would pause unrelated activities, the publishing is done in another thread.
        thread = threading.Thread(target=_publish_to_channel)
        thread.start()
Also, you have to start the listener:

import threading

from redis_pubsub import RedisPubSub

pubsub = RedisPubSub()

# start the listener
thread = threading.Thread(target=pubsub.listen_to_channel_and_update_models, args=("sync-ml-models", ))
thread.start()
From the fasttext training module, when training finishes, publish a message to the other workers so that they get a chance to reload the model from disk:

# fasttext_api.py
from redis_pubsub import RedisPubSub

pubsub = RedisPubSub()
pubsub.publish_to_channel(channel="sync-ml-models",  # a sample name for the channel
                          algo_name="fasttext",
                          model_path="path/to/fasttext/model")
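For clarity, the wire format the publisher and listener agree on is just a delimited string. A stdlib sketch of encoding and decoding it, keeping the module's "__SEPERATOR__" spelling and namespace check (the IDs and paths below are illustrative):

```python
SEP = "__SEPERATOR__"  # spelling kept to match the module above
GLOBAL_NAMESPACE = "SERVICE_0"

def encode(unique_id, algo_name, model_path):
    # same layout as publish_to_channel: sender id, algorithm, model path
    return SEP.join([unique_id, str(algo_name), str(model_path)])

def decode(message, own_unique_id):
    sender, algo_name, model_path = message.split(SEP)
    # accept only messages from other workers in the same namespace
    if sender != own_unique_id and sender.split("__")[1] == GLOBAL_NAMESPACE:
        return algo_name, model_path
    return None

msg = encode("IP_10.0.0.5__SERVICE_0__PID_41", "fasttext", "path/to/fasttext/model")
print(decode(msg, own_unique_id="IP_10.0.0.5__SERVICE_0__PID_42"))
# ('fasttext', 'path/to/fasttext/model')
print(decode(msg, own_unique_id="IP_10.0.0.5__SERVICE_0__PID_41"))  # None (own message)
```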
I want to use Luigi to manage workflows in OpenStack. I am new to Luigi. As a starter, I just want to authenticate myself to OpenStack and then fetch the image list, flavor list, etc. using Luigi. Any help will be appreciated.
I am not good with Python, but I tried the code below. I am not able to list images. Error: glanceclient.exc.HTTPNotFound: The resource could not be found. (HTTP 404)
import luigi
import os_client_config
import glanceclient.v2.client as glclient
from luigi.mock import MockFile
import sys
import os
def get_credentials():
    d = {}
    d['username'] = 'X'
    d['password'] = 'X'
    d['auth_url'] = 'X'
    d['tenant_name'] = 'X'
    d['endpoint'] = 'X'
    return d

class LookupOpenstack(luigi.Task):
    d = []

    def requires(self):
        pass

    def output(self):
        gc = glclient.Client(**get_credentials())
        images = gc.images.list()
        print("images", images)
        for i in images:
            print(i)
        return MockFile("images", mirror_on_stderr=True)

    def run(self):
        pass

if __name__ == '__main__':
    luigi.run(["--local-scheduler"], LookupOpenstack())
The general approach is just to write Python code that performs the tasks you want using the OpenStack API. https://docs.openstack.org/user-guide/sdk.html The error you are getting is addressed on the OpenStack site. https://ask.openstack.org/en/question/90071/glanceclientexchttpnotfound-the-resource-could-not-be-found-http-404/
You would then wrap this code in Luigi Tasks as appropriate; there's nothing special about doing this with OpenStack, except that you must define the output() of your Luigi tasks to match up with an output that indicates the task is done. Right now the work is being done in the output() method, which should be in the run() method; output() should just describe what to look for to indicate that run() is complete, so the task doesn't run() again when required by another task if it is already done.
It's really impossible to say more without understanding more details of your workflow.
How do I check whether the screen is off due to the Energy Saver settings in System Preferences under Mac/Python?
Quick and dirty solution: call ioreg and parse the output.
import re
import subprocess

POWER_MGMT_RE = re.compile(r'IOPowerManagement.*{(.*)}')

def display_status():
    # check_output returns bytes on Python 3, so decode before matching
    output = subprocess.check_output(
        'ioreg -w 0 -c IODisplayWrangler -r IODisplayWrangler'.split()).decode()
    status = POWER_MGMT_RE.search(output).group(1)
    return dict((k[1:-1], v) for (k, v) in (x.split('=') for x in
                                            status.split(',')))
On my computer, the value of CurrentPowerState is 4 when the screen is on and 1 when the screen is off.
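Since the exact ioreg output varies by machine, the split-and-strip logic can be exercised on a canned sample line (the values below are made up for illustration):

```python
import re

POWER_MGMT_RE = re.compile(r'IOPowerManagement.*{(.*)}')

# An illustrative sample of the relevant ioreg line; real output differs per machine.
sample = '    "IOPowerManagement" = {"DevicePowerState"=0,"CurrentPowerState"=4,"CapabilityFlags"=32768}'

status = POWER_MGMT_RE.search(sample).group(1)
# split the "key"=value pairs and strip the quotes from each key
info = dict((k[1:-1], v) for (k, v) in (x.split('=') for x in status.split(',')))
print(info["CurrentPowerState"])  # 4
```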
Better solution: use ctypes to get that information directly from IOKit.
The only way I can think of is by using the macOS pmset power management command-line tool:
DESCRIPTION
pmset changes and reads power management settings such as idle sleep timing, wake on administrative access, automatic restart on power loss, etc.
Refer to the following link; it provides a great deal of information that should help you accomplish exactly what you are looking for.
http://managingamac.blogspot.com/2012/12/power-assertions-in-python.html
I will include the code provided by the link for "saving and documentation" purposes:
#!/usr/bin/python
import ctypes
import subprocess
import time

import CoreFoundation
import objc

def SetUpIOFramework():
    # load the IOKit library
    framework = ctypes.cdll.LoadLibrary(
        '/System/Library/Frameworks/IOKit.framework/IOKit')
    # declare parameters as described in IOPMLib.h
    framework.IOPMAssertionCreateWithName.argtypes = [
        ctypes.c_void_p,                  # CFStringRef
        ctypes.c_uint32,                  # IOPMAssertionLevel
        ctypes.c_void_p,                  # CFStringRef
        ctypes.POINTER(ctypes.c_uint32)]  # IOPMAssertionID
    framework.IOPMAssertionRelease.argtypes = [
        ctypes.c_uint32]                  # IOPMAssertionID
    return framework

def StringToCFString(string):
    # we'll need to convert our strings before use
    return objc.pyobjc_id(
        CoreFoundation.CFStringCreateWithCString(
            None, string,
            CoreFoundation.kCFStringEncodingASCII).nsstring())

def AssertionCreateWithName(framework, a_type, a_level, a_reason):
    # this method creates an assertion using the IOKit library
    a_id = ctypes.c_uint32(0)
    a_type = StringToCFString(a_type)
    a_reason = StringToCFString(a_reason)
    a_error = framework.IOPMAssertionCreateWithName(
        a_type, a_level, a_reason, ctypes.byref(a_id))
    # we get back 0 on success or an error code, along with a unique c_uint
    # representing the assertion ID so we can release it later
    return a_error, a_id

def AssertionRelease(framework, assertion_id):
    # releasing the assertion is easy; it also returns 0 on
    # success, or an error code otherwise
    return framework.IOPMAssertionRelease(assertion_id)

def main():
    # let's create a no-idle assertion for 30 seconds
    no_idle = 'NoIdleSleepAssertion'
    reason = 'Test of Pythonic power assertions'
    # first, we'll need the IOKit framework
    framework = SetUpIOFramework()
    # next, create the assertion and save the ID!
    ret, a_id = AssertionCreateWithName(framework, no_idle, 255, reason)
    print('\n\nCreating power assertion: status %s, id %s\n\n' % (ret, a_id))
    # subprocess a call to pmset to verify the assertion worked
    subprocess.call(['pmset', '-g', 'assertions'])
    time.sleep(5)
    # finally, release the assertion of the ID we saved earlier
    AssertionRelease(framework, a_id)
    print('\n\nReleasing power assertion: id %s\n\n' % a_id)
    # verify the assertion has been removed
    subprocess.call(['pmset', '-g', 'assertions'])

if __name__ == '__main__':
    main()
https://opensource.apple.com/source/PowerManagement/PowerManagement-211/pmset/pmset.c
The code relies on IOPMLib, which provides functions to create assertions, schedule power events, measure thermals, and more.
https://developer.apple.com/documentation/iokit/iopmlib_h
To call these functions through Python, we must go through the IOKit Framework.
https://developer.apple.com/library/archive/documentation/DeviceDrivers/Conceptual/IOKitFundamentals/Introduction/Introduction.html
In order for us to manipulate C data types in Python, we'll use a foreign function interface called ctypes.
http://python.net/crew/theller/ctypes/
Here's the wrapper the author describes on the page, written by Michael Lynn. The code I posted from the author's link above is a rewrite of this code intended to make it more understandable.
https://github.com/pudquick/pypmset/blob/master/pypmset.py
I have a python script that sets up several gearman workers. They call into some methods on SQLAlchemy models I have that are also used by a Pylons app.
Everything works fine for an hour or two, then the MySQL thread gets lost and all queries fail. I cannot figure out why the thread is getting lost (I get the same results on 3 different servers) when I am defining such a low value for pool_recycle. Also, why wouldn't a new connection be created?
Any ideas of things to investigate?
import gearman
import json
import ConfigParser
import sys

from sqlalchemy import create_engine

class JSONDataEncoder(gearman.DataEncoder):
    @classmethod
    def encode(cls, encodable_object):
        return json.dumps(encodable_object)

    @classmethod
    def decode(cls, decodable_string):
        return json.loads(decodable_string)
# get the ini path and load the gearman server ips:ports
try:
    ini_file = sys.argv[1]
    lib_path = sys.argv[2]
except Exception:
    raise Exception("ini file path or anypy lib path not set")
# get the config
config = ConfigParser.ConfigParser()
config.read(ini_file)
sqlachemy_url = config.get('app:main', 'sqlalchemy.url')
gearman_servers = config.get('app:main', 'gearman.mysql_servers').split(",")
# add anypy include path
sys.path.append(lib_path)
from mypylonsapp.model.user import User, init_model
from mypylonsapp.model.gearman import task_rates
# sqlalchemy setup, recycle connection every hour
engine = create_engine(sqlachemy_url, pool_recycle=3600)
init_model(engine)
# Gearman Worker Setup
gm_worker = gearman.GearmanWorker(gearman_servers)
gm_worker.data_encoder = JSONDataEncoder()
# register the workers
gm_worker.register_task('login', User.login_gearman_worker)
gm_worker.register_task('rates', task_rates)
# work
gm_worker.work()
I've seen this across the board for Ruby, PHP, and Python, regardless of the DB library used. I couldn't find how to fix this the "right" way, which is to use mysql_ping, but there is a SQLAlchemy solution, as explained better here: http://groups.google.com/group/sqlalchemy/browse_thread/thread/9412808e695168ea/c31f5c967c135be0
As someone in that thread points out, setting the recycle option to equal True is equivalent to setting it to 1. A better solution might be to find your MySQL connection timeout value and set the recycle threshold to 80% of it.
You can get that value from a live server by looking up this variable: http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_connect_timeout
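As a sketch of that sizing rule (the 80% figure is this answer's heuristic, and 28800 seconds is only a common MySQL default; check your server's actual value):

```python
# Hypothetical helper: size pool_recycle safely below the server's timeout.
def recycle_seconds(server_timeout_seconds, safety_factor=0.8):
    """Return a pool_recycle value at 80% of the server-side timeout."""
    return int(server_timeout_seconds * safety_factor)

mysql_timeout = 28800  # e.g. the value read from the server variable
print(recycle_seconds(mysql_timeout))  # 23040

# then, for instance:
# engine = create_engine(sqlachemy_url, pool_recycle=recycle_seconds(mysql_timeout))
```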
Edit:
It took me a bit to find the authoritative documentation on using pool_recycle:
http://www.sqlalchemy.org/docs/05/reference/sqlalchemy/connections.html?highlight=pool_recycle