I need to run a Kedro (v0.17.4) pipeline with a node that is supposed to process data with a different logic depending on the load version of the input.
As a simple and crude example assuming there is a catalog.yml file with this entry:
test_data_set:
type: pandas.CSVDataSet
filepath: data/01_raw/test.csv
versioned: true
and there are multiple versions of test.csv (say '1' and '2') and I want to use the Catalog from the config file and run the following node/pipeline:
from kedro.config import ConfigLoader
from kedro.io import DataCatalog
conf_loader = ConfLoader(['conf/base'])
conf_catalog = conf_loader.get('catalog*', 'catalog/**')
io = DataCatalog.from_config(conf_catalog)
def my_node(my_data_set):
#if version_of_my_data_set == '1': # how to do this?
# print("do something with version 1")
# ... do something else
return
my_pipeline = Pipeline([node(func=my_node, inputs="test_data_set", outputs=None, name="process_versioned_data")])
SequentialRunner().run(my_pipeline, catalog=io)
I understand that runtime parameters or the load version are supposed to be separated from the logic in a node by design, but in my specific case it would still be useful to find a way to do this.
In general the pipeline will be executed via the API but also via the command line with the --load_version flag.
Solutions that I have considered but discarded:
store the load version somehow in the Kedro session and access it within the node via "get_current_session" (how?)
add load_version as a required input parameter for the node (would probably break compatibility with some upstream pipeline)
In short:
Is there a good way to pass the information of the user specified load version of a dataset to a kedro node?
I am extreme beginner in writing python scripts as I am learning it currently.
I am writing a code where I am going to extract the branches which I have which are named something like tobedeleted_branch1 , tobedeleted_branch2 etc with the help of python-gitlab module.
With so much of research and everything, I was able to extract the names of the branch with the script I have given bellow, but now what I want is, I need to delete the branches which are getting printed.
So I had a plan that, I will go ahead and store the print output in a variable and will delete them in a go, but I am still not able to store them in a variable.
Once I store the 'n' number of branches in that variable, I want to delete them.
I went through the documentation but I couldn't figure out how can I make use of it in python script.
Module: https://python-gitlab.readthedocs.io/en/stable/index.html
Delete branch with help of module REF: https://python-gitlab.readthedocs.io/en/stable/gl_objects/branches.html#branches
Any help regarding this is highly appreciated.
import gitlab, os
TOKEN = "MYTOKEN"
GITLAB_HOST = 'MYINSTANCE'
gl = gitlab.Gitlab(GITLAB_HOST, private_token=TOKEN)
# set gitlab group id
group_id = 6
group = gl.groups.get(group_id, lazy=True)
#get all projects
projects = group.projects.list(include_subgroups=True, all=True)
#get all project ids
project_ids = []
for project in projects:
project_ids.append((project.id))
print(project_ids)
for project in project_ids:
project = gl.projects.get(project)
branches = project.branches.list()
for branch in branches:
if "tobedeleted" in branch.attributes['name']:
print(branch.attributes['name'])
Also, I am very sure this is not the clean way to write the script. Can you please drop your suggestions on how to make it better ?
Thanks
Branch objects have a delete method.
for branch in project.branches.list(as_list=False):
if 'tobedeleted' in branch.name:
branch.delete()
You can also delete a branch by name if you know its exact name already:
project.branches.delete('exact-branch-name')
As a side note:
The other thing you'll notice I've done is add the as_list=False argument to .list(). This will make sure that you paginate through all branches. Otherwise, you'll only get the first page (default 20 per page) of branches. The same is true for most list methods.
Just learning how to use Meson and want to generate protobuf source/headers for multiple languages - C++, Python, Java, Javascript. C++ was simple enough using the generator function in my meson.build file:
project('MesonProtobufExample', 'cpp')
protoc = find_program('protoc', required : true)
deps = dependency('protobuf', required : true)
gen = generator(protoc, \
output : ['#BASENAME#.pb.cc', '#BASENAME#.pb.h'],
arguments : ['--proto_path=#CURRENT_SOURCE_DIR#', '--cpp_out=#BUILD_DIR#', '#INPUT#'])
generated = gen.process('MyExample.proto')
ex = executable('my_example', 'my_example.cpp', generated, dependencies : deps)
Which produces the MyExample.pb.cc and MyExample.pb.h files. I figured Python would be just as easy but I'm a bit stumped since there's no executable() step for my Python script since it doesn't need to be compiled. I noticed meson (and CMake it turns out) don't actually generate the protobuf files until you call executable() so I can't just skip this step or the MyExample_pb2.py file will not be generated. I have found no example for using meson/python/GPB together after several hours of searching. Shouldn't there be a simple way to 'link' the generated sources to a python file/module like the way CMake does?
protobuf_generate_python(PROTO_PY MyExample.proto)
# This command causes the protobuf python binding to be generated
add_custom_target(my_example.py ALL DEPENDS ${PROTO_PY})
You can use trick with custom_target() and "fake compiler" in the form of cp or cat tools (in -nix environments, of course, if you want to support Windows then you can use conditional find_program()). Here is the example with cp:
py_gen = generator( ... )
py_generated = gen.process('MyExample.proto')
py_proc = custom_target('py_proto',
command: [ 'cp', '#INPUT#', '#OUTPUT#' ],
input : py_generated,
output : 'MyExample_pb2.py',
build_by_default : true)
I added buid_by_default flag assuming that you need to generate it as a part of standard build process (of course, enabling this target can be conditional too).
It's the first time i have come across a recipe file written in python and it's giving me an error. The error is:
../meta-intel/recipes-rt/images/core-image-rt.bb: Error executing a python function in <code>:
This is a recipe which is coming from the meta-intel branch "[master] intel-vaapi-driver: 2.1.0 -> 2.2.0".
My poky version is" [morty] documentation: Updated manual revision table for 2.2.4 release date.
My BITBAKE version is: "BitBake Build Tool Core version 1.32.0"
The contents of core-image-rt.bb are:
require recipes-core/images/core-image-minimal.bb
# Skip processing of this recipe if linux-intel-rt is not explicitly specified as the
# PREFERRED_PROVIDER for virtual/kernel. This avoids errors when trying
# to build multiple virtual/kernel providers.
python () {
if d.getVar("PREFERRED_PROVIDER_virtual/kernel") != "linux-intel-rt":
raise bb.parse.SkipPackage("Set PREFERRED_PROVIDER_virtual/kernel to linux-intel-rt to enable it")
}
DESCRIPTION = "A small image just capable of allowing a device to boot plus a \
real-time test suite and tools appropriate for real-time use."
DEPENDS += "linux-intel-rt"
IMAGE_INSTALL += "rt-tests hwlatdetect"
LICENSE = "MIT"
If you need any additional information please let me know and i'll try and supply it.
I can normally build images on my ubuntu machine but don't believe have ever had to build an image in which the recipes were written in python
You are using incompatible API of using g.getVar method. In morty release as the last one with old way of using second parameter, there is still need to provide boolean parameter:
...
if d.getVar("PREFERRED_PROVIDER_virtual/kernel", True) != "linux-intel-rt":
...
Please take a look at one of the commit, that remove this in next releases.
I'm attempting to generate Python bindings for a Thrift service definition using Bazel. As far as I've been able to tell, there is no existing .bzl for doing this so I'm somewhat on my own here. I've written .bzl rules in the past but the situation I'm running into in this case is different.
The general issue is that I don't know the names of the output files from the thrift command before the build starts which means that I can't generate a py_library rule with a srcs attribute set correctly since I don't have the names of the files. I've tried to follow examples whereby the output files are known ahead of time by way of generating a .zip file, but the py_library rule only allows .py files as srcs so this doesn't work.
The only thing I can think of would be to use a repository_rule to generate the code and BUILD files but what I'm trying to accomplish doesn't seem like much of a stretch and should be supported.
Someone has attempted this before.
Discussion here:
https://groups.google.com/forum/#!topic/bazel-dev/g3DVmhVhNZs
Code Here:
https://github.com/wt/bazel_thrift
I would start there.
Edit:
I started there. I did not get as far as I hoped. Bazel is being extended to support having multiple outputs generated by one input, but it does not allow that very easily just yet, per:
groups.google.com/forum/#!topic/bazel-discuss/3WQhHm194yU
Regardless, I did attempt something for C++ thrift bindings, which have the same issue. The Java example got around this by using the source jar as a build source, which won't work for us. To make it work, I passed in the list of source files I cared about that would be created by the thrift generator. I then reported these files as the output that would be generated in the impl. That seems to work. It is a bit nasty in that you have to know what files you are looking for before you build, but it does work. It would also be possible to have a small program read the thift file and determine the output files it would make. That would be nicer, but I don't have the time. Plus, the current approach is nice, in that it explicitly defines what files you are looking for thrift to generate, which makes the BUILD file a little bit easier to understand for a newbie like me.
First pass at some code, maybe I will clean it up and submit it as a patch (maybe not):
###########
# CPP gen
###########
# Create Generated cpp source files from thrift idl files.
#
def _gen_thrift_cc_src_impl(ctx):
out = ctx.outputs.outs
if not out:
# empty set
# nothing to do, no inputs to build
return DefaultInfo(files=depset(out))
# Used dir(out[0]) to see what
# we had available in the object.
# dirname attribute tells us the directory
# we should be putting stuff in, works nicely.
# ctx.genfile_dir is not the output directory
# when called as an external repository
target_genfiles_root = out[0].dirname
thrift_includes_root = "/".join(
[ target_genfiles_root, "thrift_includes"])
gen_cpp_dir = "/".join([target_genfiles_root,"." ])
commands = []
commands.append(_mkdir_command_string(target_genfiles_root))
commands.append(_mkdir_command_string(thrift_includes_root))
thrift_lib_archive_files = ctx.attr.thrift_library._transitive_archive_files
for f in thrift_lib_archive_files:
commands.append(
_tar_extract_command_string(f.path, thrift_includes_root))
commands.append(_mkdir_command_string(gen_cpp_dir))
thrift_lib_srcs = ctx.attr.thrift_library.srcs
for src in thrift_lib_srcs:
commands.append(_thrift_cc_compile_command_string(
thrift_includes_root, gen_cpp_dir, src))
inputs = (
list(thrift_lib_archive_files) + thrift_lib_srcs )
ctx.action(
inputs = inputs,
outputs = out,
progress_message = "Generating CPP sources from thift archive %s" % target_genfiles_root,
command = " && ".join(commands),
)
return DefaultInfo(files=depset(out))
thrift_cc_gen_src= rule(
_gen_thrift_cc_src_impl,
attrs = {
"thrift_library": attr.label(
mandatory=True, providers=['srcs', '_transitive_archive_files']),
"outs" : attr.output_list(mandatory=True, non_empty=True),
},output_to_genfiles = True,
)
# wraps cc_library to generate a library from one or more .thrift files
# provided as a thrift_library bundle.
#
# Generates all src and hdr files needed, but you must specify the expected
# files. This is a bug in bazel: https://groups.google.com/forum/#!topic/bazel-discuss/3WQhHm194yU
#
# Instead of src and hdrs, requires: cpp_srcs and cpp_hdrs. These are required.
#
# Takes:
# name: The library name, like cc_library
#
# thrift_library: The library of source .thrift files from which our
# code will be built from.
#
# cpp_srcs: The expected source that will be generated and built. Passed to
# cc_library as src.
#
# cpp_hdrs: The expected header files that will be generated. Passed to
# cc_library as hdrs.
#
# Rest of options are documented in native.cc_library
#
def thrift_cc_library(name, thrift_library,
cpp_srcs=[],cpp_hdrs=[],
build_skeletons=False,
deps=[], alwayslink=0, copts=[],
defines=[], include_prefix=None,
includes=[], linkopts=[],
linkstatic=0, nocopts=None,
strip_include_prefix=None,
textual_hdrs=[],
visibility=None):
# from our thrift_library tarball source bundle,
# create a generated cpp source directory.
outs = []
for src in cpp_srcs:
outs.append("//:"+src)
for hdr in cpp_hdrs:
outs.append("//:"+hdr)
thrift_cc_gen_src(
name = name + 'cc_gen_src',
thrift_library = thrift_library,
outs = outs,
)
# Then make the library for the given name.
native.cc_library(
name = name,
deps = deps,
srcs = cpp_srcs,
hdrs = cpp_hdrs,
alwayslink = alwayslink,
copts = copts,
defines=defines,
include_prefix=include_prefix,
includes=includes,
linkopts=linkopts,
linkstatic=linkstatic,
nocopts=nocopts,
strip_include_prefix=strip_include_prefix,
textual_hdrs=textual_hdrs,
visibility=visibility,
)