I am using the Jenkins Python API to create a parameterized job. I can create a job with one parameter using the following config.xml:
<?xml version='1.0' encoding='UTF-8'?>
<project>
  <actions/>
  <description>A build that explores the wonderous possibilities of parameterized builds.</description>
  <keepDependencies>false</keepDependencies>
  <properties>
    <hudson.model.ParametersDefinitionProperty>
      <parameterDefinitions>
        <hudson.model.StringParameterDefinition>
          <name>B</name>
          <description>B, like buzzing B.</description>
          <defaultValue></defaultValue>
        </hudson.model.StringParameterDefinition>
      </parameterDefinitions>
    </hudson.model.ParametersDefinitionProperty>
  </properties>
  <scm class="hudson.scm.NullSCM"/>
  <canRoam>true</canRoam>
  <disabled>false</disabled>
  <blockBuildWhenDownstreamBuilding>false</blockBuildWhenDownstreamBuilding>
  <blockBuildWhenUpstreamBuilding>false</blockBuildWhenUpstreamBuilding>
  <triggers class="vector"/>
  <concurrentBuild>false</concurrentBuild>
  <builders>
    <hudson.tasks.Shell>
      <command>ping -c 1 localhost | tee out.txt
echo $A > a.txt
echo $B > b.txt</command>
    </hudson.tasks.Shell>
  </builders>
  <publishers>
    <hudson.tasks.ArtifactArchiver>
      <artifacts>*</artifacts>
      <latestOnly>false</latestOnly>
    </hudson.tasks.ArtifactArchiver>
    <hudson.tasks.Fingerprinter>
      <targets></targets>
      <recordBuildArtifacts>true</recordBuildArtifacts>
    </hudson.tasks.Fingerprinter>
  </publishers>
  <buildWrappers/>
</project>
What I really want is to create a job with multiple parameters. I have tried adding a second <name> tag in parallel within this XML, but the new job still ends up with only one parameter. Am I changing the XML incorrectly?
Also, is it possible to pre-populate parameter fields through the API? For example, so that parameter B already has the value b after the job is created.
Each parameter needs its own hudson.model.StringParameterDefinition section. It sounds like you're trying to put multiple <name> tags inside one StringParameterDefinition section, which won't work.
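For example, two parameters side by side, each in its own section and each with a default value, would look like this (an untested sketch based on your XML; the defaultValue tag is also how you pre-populate a field, e.g. giving B the value b):
<parameterDefinitions>
  <hudson.model.StringParameterDefinition>
    <name>A</name>
    <description>First parameter.</description>
    <defaultValue>a</defaultValue>
  </hudson.model.StringParameterDefinition>
  <hudson.model.StringParameterDefinition>
    <name>B</name>
    <description>B, like buzzing B.</description>
    <defaultValue>b</defaultValue>
  </hudson.model.StringParameterDefinition>
</parameterDefinitions>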
If in doubt, create the job by hand, then go to the page for that job and append '/config.xml'. You will get back the XML for the job. Get your Python to replicate that, and you're all set.
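If you're using the python-jenkins package, pushing that XML back is a single call (a sketch; the URL, credentials, and job name are placeholders):
import jenkins

# Placeholders: point these at your own Jenkins instance.
server = jenkins.Jenkins('http://localhost:8080', username='user', password='token')

with open('config.xml') as f:
    config_xml = f.read()

server.create_job('parameterized-job', config_xml)
# For an existing job, server.reconfig_job('parameterized-job', config_xml) updates it in place.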
Currently I'm consuming XML data from a Kafka source and processing it with Spark Structured Streaming.
To get the needed information out of the XML I am using xpath. As I want to make the pipeline more dynamic, I tried to implement a dictionary that holds each column name to be extracted together with the xpath expression itself. In a future version the dictionary could get filled from configuration files without touching the Python job.
Unfortunately it does not seem to work as desired (I am a Python noob, maybe that's why...).
The XML can be described as follows:
<root>
    <header eventId="1234" .../>
    <.../>
</root>
My Python code looks like this:
df = spark.readStream.format("kafka")...load()
df = df.selectExpr("CAST(timestamp AS STRING)", "CAST(value AS STRING)")
xml_data = df \
    .selectExpr("xpath(value, './root/header/@eventId') event_id", ...) \
    .selectExpr("explode(arrays_zip(event_id, ...)) value") \
    .select('value.*')
My next step was defining the dict:
mapping_dict = {
    'event_id': './root/header/@eventId',
    ...
}
I tried to rebuild the expression like this:
event_id = "\"xpath(value,'" + mapping_dict.get('event_id') + "') event_id\""
xml_data = df.selectExpr(event_id, ...) \
    .selectExpr("explode(arrays_zip(event_id, ...)) value") \
    .select('value.*')
Now when I try to use the dict value in the selectExpr it fails with the error:
org.apache.spark.sql.AnalysisException: cannot resolve '`event_id`' given input columns: [xpath(value, './root/header/@eventId') event_id ...
So this is my first problem. The second one is that I want to iterate over this dict and extract each entry from the XML. I don't know if I can do that so easily with Structured Streaming or if I would have to use a UDF. And if so, what could a UDF for this purpose look like?
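For reference, building the full expression list from the dictionary, without the extra embedded quotes, is what I'm ultimately aiming for; something like this untested sketch:
exprs = ["xpath(value, '{}') {}".format(xpath, col)
         for col, xpath in mapping_dict.items()]

xml_data = df \
    .selectExpr(*exprs) \
    .selectExpr("explode(arrays_zip({})) value".format(", ".join(mapping_dict))) \
    .select('value.*')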
Cheers
Is there someplace that fully describes use of config data in snakemake rules?
There is an example in the user guide of this in a yaml file:
samples:
A: data/samples/A.fastq
B: data/samples/B.fastq
Then, it is used in a rule like this:
bam=expand("sorted_reads/{sample}.bam", sample=config["samples"]),
It seems like the above would replace {sample} with "data/samples/A.fastq" rather than with "A" (and "B", etc.), as it apparently does.
What is the right way to make use of config data in output rules, e.g. to help form the output filename? This form doesn't work:
output: "{config.dataFolder}/{ID}/{ID}.yyy"
I'm looking for syntax guidance: if I define complex structured data in the YAML file, how do I make use of it in the Snakemake rules? When do I use Python syntax and when do I use Snakemake syntax?
The YAML and JSON config files are severely limited in that they cannot use values defined earlier in the file to define new values, right? That is something that would often be done when setting configuration parameters.
What is the advantage of using a configfile? Why not instead just use include: with a Python file that defines the parameters?
A useful thing would be a reference manual that describes the details of Snakemake thoroughly. The current website is rather scattered; it takes a while to find things you remember seeing somewhere in it.
How should config data be used in "output" rules? I found that the output string cannot contain {config.} values. However, they can be included using Python code, as follows:
output: config["OutputDir"] + "/myfile.txt"
But, this method does NOT work (in either output: or input:):
params: config=config
output: "{params.config[OutputDir]}/myfile.txt"
However, it DOES work in "shell:":
params: config=config
output: config["OutputDir"] + "/myfile.txt"
shell: 'echo "OutputDir is {params.config[OutputDir]}" > {output}'
Notice that there are no quotes around OutputDir inside the [] in the shell cmd. The {} method of expanding values in a string does not use quotes around the keys.
Can config data be defined snakefile-wise OR python-wise? YES!
Parameters can be defined in a .yaml file included using 'configfile', or via a regular Python file included using 'include'. The latter is IMHO superior, since .yaml files don't allow definitions to reference previous ones, something that would be common in all but the simplest configuration files.
To define the "OutputDir" parameter above using yaml:
xxx.yaml:
OutputDir: DATA_DIR
snakefile:
configfile: 'xxx.yaml'
To define it using Python to be exactly compatible with above:
xxx.py:
config['OutputDir'] = "DATA_DIR"
snakefile:
include: 'xxx.py'
Or, to define a simple variable 'OutputDir' in a Python included configuration file and then use it in a rule:
xxx.py:
OutputDir = "DATA_DIR"
snakefile:
include: 'xxx.py'
rule:
    output: OutputDir + "/myfile.txt"
Multi-nested dictionaries and lists can be easily defined and accessed, both via .yaml files and python files. Example:
MACBOOK> cat cfgtest.yaml
cfgtestYAML:
    A: 10
    B: [1, 2, 99]
    C:
        nst1: "hello"
        nst2: ["big", "world"]
MACBOOK> cat cfgtest.py
cfgtestPY = {
    'X': -2,
    'Y': range(4, 7),
    'Z': {
        'nest1': "bye",
        'nest2': ["A", "list"]
    }
}
MACBOOK> cat cfgtest
configfile: "cfgtest.yaml"
include: "cfgtest.py"
rule:
    output: 'cfgtest.txt'
    params: YAML=config["cfgtestYAML"], PY=cfgtestPY
    shell:
        """
        echo "params.YAML[A]: {params.YAML[A]}" >{output}
        echo "params.YAML[B]: {params.YAML[B]}" >>{output}
        echo "params.YAML[B][2]: {params.YAML[B][2]}" >>{output}
        echo "params.YAML[C]: {params.YAML[C]}" >>{output}
        echo "params.YAML[C][nst1]: {params.YAML[C][nst1]}" >>{output}
        echo "params.YAML[C][nst2]: {params.YAML[C][nst2]}" >>{output}
        echo "params.YAML[C][nst2][1]: {params.YAML[C][nst2][1]}" >>{output}
        echo "" >>{output}
        echo "params.PY[X]: {params.PY[X]}" >>{output}
        echo "params.PY[Y]: {params.PY[Y]}" >>{output}
        echo "params.PY[Y][2]: {params.PY[Y][2]}" >>{output}
        echo "params.PY[Z]: {params.PY[Z]}" >>{output}
        echo "params.PY[Z][nest1]: {params.PY[Z][nest1]}" >>{output}
        echo "params.PY[Z][nest2]: {params.PY[Z][nest2]}" >>{output}
        echo "params.PY[Z][nest2][1]: {params.PY[Z][nest2][1]}" >>{output}
        """
MACBOOK> snakemake -s cfgtest
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 1
1
rule 1:
output: cfgtest.txt
jobid: 0
Finished job 0.
1 of 1 steps (100%) done
MACBOOK> cat cfgtest.txt
params.YAML[A]: 10
params.YAML[B]: 1 2 99
params.YAML[B][2]: 99
params.YAML[C]: {'nst1': 'hello', 'nst2': ['big', 'world']}
params.YAML[C][nst1]: hello
params.YAML[C][nst2]: big world
params.YAML[C][nst2][1]: world
params.PY[X]: -2
params.PY[Y]: range(4, 7)
params.PY[Y][2]: 6
params.PY[Z]: {'nest1': 'bye', 'nest2': ['A', 'list']}
params.PY[Z][nest1]: bye
params.PY[Z][nest2]: A list
params.PY[Z][nest2][1]: list
YAML Configuration
This has to do with the nesting of YAML files; see an example here.
The config["samples"] request will return both 'A' and 'B'. In my head I think of it as returning a list, but I am not positive about the variable type.
By using the configfile as listed here:
https://snakemake.readthedocs.io/en/latest/tutorial/advanced.html
you can link in YAML configuration files like the following:
settings/config.yaml:
samples:
  - A
  - B
OR
settings/config.yaml:
sampleID:
  - 123
  - 124
  - 125
baseDIR: data
Resulting call with YAML config access
Snakefile:
configfile: "settings/config.yaml"

rule all:
    input:
        expand("{baseDIR}/{ID}.bam", baseDIR=config["baseDIR"], ID=config["sampleID"]),

rule fastq2bam:
    input:
        expand("{{baseDIR}}/{{ID}}.{readDirection}.fastq", readDirection=['1','2'])
    output:
        "{baseDIR}/{ID}.bam"
    # Note the different number of {}: 1 pair for wildcards not in expand.
    # An equivalent line with a 'useless' expand call would be:
    # expand("{{baseDIR}}/{{ID}}.bam")
    shell:
        """
        bwa mem {input[0]} {input[1]} > {output}
        """
These are dummy examples, just trying to exemplify the use of different strings and config variables. I use wildcards in the fastq2bam rule. Typically I only use config variables to set things in my rule all; when possible this is best practice. I cannot say whether the shell call actually works for bwa mem, but I think you get the idea of what I'm implying.
A larger version of a Snakefile can be seen here
Once the configfile is setup, to reference anything in it, use 'config'. It can be used to access deep into a YAML as needed. Here I'll go down 3 hypothetical levels, like so:
hypothetical_var = config["yamlVarLvl1"]["yamlVarLvl2"]["yamlVarLvl3"]
Equates to (I'm not POSITIVE about the typing, I think it converts to strings)
hypothetical_var = ['124', '125', '126', '127', '128', '129']
If the YAML is:
yamlVarLvl1:
yamlVarLvl2:
yamlVarLvl3:
'124'
'125'
'126'
'127'
'128'
'129'
Code Organization
Python and Snakemake code can, for the most part, be interleaved in certain places. I would advise against this, as it will make the code difficult to maintain. It's up to the user to decide how to implement this. E.g., using the run or the shell directive changes how you access the variables.
YAML and JSON files are the preferred configuration variable files, as I believe they provide some support for editing and for command-line overriding of variables. This would not be as clean if it were implemented using externally imported Python variables. It also helps my brain: Python files do things, YAML files store things.
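For example, with a configfile in place, individual values can be overridden on the command line without editing the file (the value here is illustrative):
snakemake --config baseDIR=data2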
YAML is always an external file, but...
If you are using a single Snakefile, put the supporting python at the top?
If you are using a multi-file system, consider having the supporting python scripts externalized.
Tutorials
I think a perfect vignette is difficult to design. I'm trying to teach my group about Snakemake: I have over 40 pages of personally written documentation, I've given three 1 hr+ presentations with PowerPoint slideshows, I've read nearly the entire ReadTheDocs.io manual for Snakemake, and I just recently finished going through the list of additional resources. Yet I'm still learning too!
Side note, I found this tutorial to be very nice as well.
Does that provide enough context?
Is there someplace that fully describes use of config data in snakemake rules?
There is no limit to what you can put in the config file, provided it can be parsed into python objects. Basically, "your imagination is the limit".
What is the right way to make use of config data in output rules, e.g. to help form the output filename?
I extract things from the config outside the rules, in plain python.
Instead of output: "{config.dataFolder}/{ID}/{ID}.yyy", I would do:
data_folder = config["dataFolder"]

rule name_of_the_rule:
    output:
        os.path.join(data_folder, "{ID}", "{ID}.yyy")
I guess that with what you tried, Snakemake has problems formatting the string when there is a mix of things coming from the wildcards and from elsewhere. But maybe the following works in Python 3.6, using formatted string literals: output: f"{config.dataFolder}/{ID}/{ID}.yyy". I haven't checked.
I'm looking for syntax guidance if I define complex structured data in the yaml file - how do I make use of it in the snake rules? When do I use Python syntax and when do I use SnakeMake syntax?
In the snakefile, I typically read the config file to extract configuration information before the rules. This is essentially pure Python, except that a config object is made available directly by Snakemake for convenience. You could probably do the same in plain standard Python using config = json.load(open("config.json")) or config = yaml.safe_load(open("config.yaml")).
In the snakefile, outside the rules, you can do whatever computations you want in python. This can be before reading the config as well as after. You can define functions that can be used in rules (for instance to generate rule's inputs), compute lists of things that will be used as wildcards. I think the only thing is that an object needs to be defined before the rules that use it.
Snakemake syntax seems mainly a means of describing the rules. Within the run part of a rule, you can use whatever Python you want, knowing that you have access to a wildcards object to help you. Input and output of rules are lists of file paths, and you can use Python to build them.
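For example, an input-generating function used by a rule might look like this (just a sketch; it assumes the config defines dataFolder and that samtools indexing is the step you want):
import os

configfile: "config.yaml"   # assumed to define dataFolder

def bam_for_id(wildcards):
    # Plain Python: build the input path from the config plus a wildcard.
    return os.path.join(config["dataFolder"], wildcards.ID + ".bam")

rule index_bam:
    input: bam_for_id
    output: os.path.join(config["dataFolder"], "{ID}.bam.bai")
    shell: "samtools index {input}"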
I am looking to convert a Python object into XML data. I've tried lxml, but I eventually had to write custom code to save my object as XML, which isn't ideal.
I'm looking for something more like pyxser. Unfortunately, pyxser's XML output looks different from what I need.
For instance I have my own class Person
class Person:
    name = ""
    age = 0
    ids = []
and I want to convert it into XML looking like
<Person>
    <name>Mike</name>
    <age>25</age>
    <ids>
        <id>1234</id>
        <id>333333</id>
        <id>999494</id>
    </ids>
</Person>
I didn't find any method in lxml.objectify that takes object and returns xml code.
Best is rather subjective and I'm not sure it's possible to say what's best without knowing more about your requirements. However Gnosis has previously been recommended for serializing Python objects to XML so you might want to start with that.
From the Gnosis homepage:
Gnosis Utils contains several Python modules for XML processing, plus other generally useful tools:
xml.pickle (serializes objects to/from XML; API compatible with the standard pickle module)
xml.objectify (turns arbitrary XML documents into Python objects)
xml.validity (enforces XML validity constraints via DTD or Schema)
xml.indexer (full text indexing/searching)
many more...
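Since xml.pickle is advertised as API-compatible with pickle, usage should look roughly like this (an untested sketch; Gnosis dates from the Python 2 era):
import gnosis.xml.pickle

p = Person()
p.name, p.age, p.ids = "Mike", 25, [1234, 333333, 999494]

xml = gnosis.xml.pickle.dumps(p)  # like pickle.dumps, but emits XML
print(xml)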
Another option is lxml.objectify.
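If neither produces the exact shape you want, hand-rolling the mapping with lxml.etree only takes a few lines (a sketch based on the Person class from the question):
from lxml import etree

def person_to_xml(person):
    root = etree.Element("Person")
    etree.SubElement(root, "name").text = person.name
    etree.SubElement(root, "age").text = str(person.age)
    ids = etree.SubElement(root, "ids")
    for i in person.ids:
        etree.SubElement(ids, "id").text = str(i)
    return etree.tostring(root, pretty_print=True).decode()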
Mike,
you can either implement object rendering into XML:
class Person:
    ...
    def toXml(self):
        print('<Person>')
        print('\t<name>...</name>')
        ...
        print('</Person>')
or you can transform Gnosis or pyxser output using XSLT.
I'm using OpenLDAP on Mac OS X Server 10.6 and need to generate a vCard for all the users in a given group. Using ldapsearch I can list all the memberUids for users in that group. I found a Perl script, Advanced LDAP Search (ALS), that will generate the vCards easily. ALS can be found here: http://www.ldapman.org/tools/als.gz
So what I need is a wrapper script (in Python or Perl) that loops through the memberUids, runs the ALS command to create each vCard, and appends it to the file.
This command provides the memberUids:
ldapsearch -x -b 'dc=ldap,dc=server,dc=com' '(cn=testgroup)'
Then running ALS gives the vcard:
als -b dc=ldap,dc=server,dc=com -V uid=aaronh > vcardlist.vcf
If it's easier to do this using Perl since ALS is already using it that would be fine. I've done more work in python but I'm open to suggestions.
Thanks in advance,
Aaron
EDIT:
Here is a link to the Net:LDAP code that I have to date. So far it pulls down the ldap entries with all user information. What I'm missing is how to capture just the UID for each user and then push it into ALS.
http://www.queencitytech.com/net-ldap
Here is an example entry (after running the code from the above link):
#-------------------------------
DN: uid=aaronh,cn=users,dc=ldap,dc=server,dc=com
altSecurityIdentities : Kerberos:aaronh@LDAP.SERVER.COM
apple-generateduid : F0F9DA73-70B3-47EB-BD25-FE4139E16942
apple-imhandle : Jabber:aaronh@ichat.server.com
apple-mcxflags : <?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>simultaneous_login_enabled</key>
<true/>
</dict>
</plist>
authAuthority : ;ApplePasswordServer;0x4c11231147c72b59000001f800001663,1024 35 131057002239213764263627099108547501925287731311742942286788930775556419648865483768960345576253082450228562208107642206135992876630494830143899597135936566841409094870100055573569425410665510365545238751677692308677943427807426637133913499488233527734757673201849965347880843479632671824597968768822920700439 root@ldap.server.com:192.168.1.175;Kerberosv5;0x4c11231147c72b59000001f800001663;aaronh@LDAP.SERVER.COM;LDAP.SERVER.COM;1024 35 131057002239213764263627099108547501925287731311742942286788930775556419648865483768960345576253082450228562208107642206135992876630494830143899597135936566841409094870100055573569425410665510365545238751677692308677943427807426637133913499488233527734757673201849965347880843479632671824597968768822920700439 root@ldap.server.com:192.168.1.170
cn : Aaron Hoffman
gidNumber : 20
givenName : Aaron
homeDirectory : 99
loginShell : /bin/bash
objectClass : inetOrgPerson posixAccount shadowAccount apple-user extensibleObject organizationalPerson top person
sn : Hoffman
uid : aaronh
uidNumber : 2643
userPassword : ********
#-------------------------------
My language of choice would be Perl - but only because I've done similar operations using Perl and LDAP.
If I remember correctly, that ldapsearch command will give you the full LDIF entry for each uid in the testgroup cn. If that's the case, then you'll need to clean it up a bit before it's ready for the als part. Though it's definitely not the most elegant solution, a quick and dirty method is to use backticks and run the command's output through a grep. This will return a nice list of all the memberUids. From there it's just a simple foreach loop and you're done. Without any testing or knowing for sure what your LDAP output looks like, I'd go with something like this:
#!/usr/bin/perl
use strict;
use warnings;

# should return a list of "memberUid: name" entries
my @uids = `ldapsearch -x -b 'cn=testgroup,cn=groups,dc=ldap,dc=server,dc=com' | grep memberUid:`;

foreach (@uids) {
    s/memberUid: //;  # get rid of the "memberUid: " part, leaving just the name
    chomp;            # get rid of the pesky newline
    system "als -b \"dc=ldap,dc=server,dc=com\" -V uid=$_ >> vcardlist.vcf";
}
As I said, I haven't tested this, and I'm not exactly sure what the output of your ldapsearch looks like, so you may have to tweak it a bit to fit your exact needs. That should be enough to get you going though.
If anyone has a better idea I'd love to hear it too.
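If you'd rather stay in Python, the same loop might look roughly like this (an untested sketch using the same DNs as above):
import subprocess

# Grab the group entry and keep just the memberUid values.
search = subprocess.run(
    ["ldapsearch", "-x", "-b", "dc=ldap,dc=server,dc=com", "(cn=testgroup)"],
    capture_output=True, text=True, check=True)

uids = [line.split(":", 1)[1].strip()
        for line in search.stdout.splitlines()
        if line.startswith("memberUid:")]

# Run ALS once per uid, appending each vCard to the output file.
with open("vcardlist.vcf", "a") as out:
    for uid in uids:
        subprocess.run(["als", "-b", "dc=ldap,dc=server,dc=com", "-V", "uid=" + uid],
                       stdout=out, check=True)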
I have an XML file with the following data format:
<net NetName="abc" attr1="123" attr2="234" attr3="345".../>
<net NetName="cde" attr1="456" attr2="567" attr3="678".../>
....
Can anyone tell me how I could mine data from this XML file using an awk one-liner? For example, I would like to know attr3 of abc; it should return 345.
In general, you don't. XML/HTML parsing is hard enough without trying to do it concisely, and while you may be able to hack together a solution that succeeds with a limited subset of XML, eventually it will break.
Besides, there are many great languages with great XML parsers already written, so why not use one of them and make your life easier?
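For instance, the whole lookup is a few lines of standard-library Python (assuming the file is wrapped in a root element, as discussed below):
import xml.etree.ElementTree as ET

root = ET.parse("test.xml").getroot()
for net in root.iter("net"):
    if net.get("NetName") == "abc":
        print(net.get("attr3"))  # prints 345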
I don't know whether or not there's an XML parser built for awk, but I'm afraid that if you want to parse XML with awk you're going to get a lot of "hammers are for nails, screwdrivers are for screws" answers. I'm sure it can be done, but it's probably going to be easier for you to write something quick in Perl that uses XML::Simple (my personal favorite) or some other XML parsing module.
Just for completeness, I'd like to note that if your snippet is an example of the entire file, it is not valid XML. Valid XML should have start and end tags, like so:
<netlist>
<net NetName="abc" attr1="123" attr2="234" attr3="345".../>
<net NetName="cde" attr1="456" attr2="567" attr3="678".../>
....
</netlist>
I'm sure invalid XML has its uses, but some XML parsers may whine about it, so unless you're dead set on using an awk one-liner to try to half-ass "parse" your "XML," you may want to consider making your XML valid.
In response to your edits, I still won't do it as a one-liner, but here's a Perl script that you can use:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

sub usage {
    die "Usage: $0 [NetName] ([attr])\n";
}

my $file = XMLin("file.xml", KeyAttr => { net => 'NetName' });

usage() if @ARGV == 0;

exists $file->{net}{$ARGV[0]}
    or die "$ARGV[0] does not exist.\n";

if(@ARGV == 2) {
    exists $file->{net}{$ARGV[0]}{$ARGV[1]}
        or die "NetName $ARGV[0] does not have attribute $ARGV[1].\n";
    print "$file->{net}{$ARGV[0]}{$ARGV[1]}.\n";
} elsif(@ARGV == 1) {
    print "$ARGV[0]:\n";
    print "    $_ = $file->{net}{$ARGV[0]}{$_}\n"
        for keys %{ $file->{net}{$ARGV[0]} };
} else {
    usage();
}
Run this script from the command line with 1 or 2 arguments. The first argument is the 'NetName' you want to look up, and the second is the attribute you want to look up. If no attribute is given, it should just list all the attributes for that 'NetName'.
I have written a tool called xml_grep2, based on XML::LibXML, the perl interface to libxml2.
You would find the value you're looking for by doing this:
xml_grep2 -t '//net[@NetName="abc"]/@attr3' to_grep.xml
The tool can be found at http://xmltwig.com/tool/
xmlgawk can use XML very easily.
$ xgawk -lxml 'XMLATTR["NetName"]=="abc"{print XMLATTR["attr3"]}' test.xml
This one liner can parse XML and print "345".
If you do not have xmlgawk and your XML format is fixed, plain awk can do the job.
$ nawk -F '[ ="]+' '/abc/{for(i=1;i<=NF;i++){if($i=="attr3"){print $(i+1)}}}' test.xml
This script can return "345".
But I think it is very dangerous, because plain awk cannot actually parse XML.
You might try this nifty little script: http://awk.info/?doc/tools/xmlparse.html