I have a Linux command that shows the output of disk usage
Filesystem Size Used Avail Use% Mounted on
/dev/sdb1 2.8T 1.1T 1.7T 39% /data/disk2
/dev/sdc1 2.8T 1.1T 1.7T 41% /data/disk3
/dev/sdd1 2.8T 1.1T 1.7T 40% /data/disk4
I need help with continue this in a script (Python or Bash) to let me know if the disks are more than 5% different of each other. If they are over 5% imbalanced, I will write code to email the results. Any help would be appreciated.
How about this:
import re
s = """
Filesystem Size Used Avail Use% Mounted on
/dev/sdb1 2.8T 1.1T 1.7T 39% /data/disk2
/dev/sdc1 2.8T 1.1T 1.7T 41% /data/disk3
/dev/sdd1 2.8T 1.1T 1.7T 40% /data/disk4
"""
regex = re.compile(r'\d{1,2}%')
result = [int(a[:-1]) for a in regex.findall(s)]
# [39, 41, 40]
If you want to compare them at the end...
if max(result) - min(result) > 5:
print("Imbalanced!")
else:
print("Balanced!")
Of course, you can call os-level functions and get their output like this:
command_output = subprocess.check_output(['df', '-h']).decode('utf-8')
Can use awk, too:
df | awk '\
BEGIN { \
max=0; \
min=2000; \
} \
NR>1 { \
pf = $5; \
sub( /\%/, "", pf ); \
pf = pf + 0; \
if ( pf > max ) max = pf; \
if ( pf < min ) min = pf; \
} \
END { \
diff = ( max - min ); \
print diff \
}'
awk solution:
df -h | awk 'NR>1{ a[NR-1]=substr($5,1,length($5)-1) }
END{ asort(a); print ((a[length(a)]-a[1]) > 5? "Not good!" : "Good!") }'
The output (for your input):
Good!
Related
My spark setting is like that :
spark_conf = SparkConf().setAppName('app_name') \
.setMaster("local[4]") \
.set('spark.executor.memory', "8g") \
.set('spark.executor.cores', 4) \
.set('spark.task.cpus', 1)
sc = SparkContext.getOrCreate(conf=spark_conf)
sc.setCheckpointDir(dirName='checkpoint')
When I do not have any checkpoint in the spark chain and my program is like this:
result = sc.parallelize(group, 4) \
.map(func_read, preservesPartitioning=True)\
.map(func2,preservesPartitioning=True) \
.flatMap(df_to_dic, preservesPartitioning=True) \
.reduceByKey(func3) \
.map(func4, preservesPartitioning=True) \
.reduceByKey(func5) \
.map(write_to_db) \
.count()
Running time is about 8 hours.
But when I use checkpoint and cache RDD like this:
result = sc.parallelize(group, 4) \
.map(func_read, preservesPartitioning=True)\
.map(func2,preservesPartitioning=True) \
.flatMap(df_to_dic, preservesPartitioning=True) \
.reduceByKey(func3) \
.map(func4, preservesPartitioning=True) \
.reduceByKey(func5) \
.map(write_to_db)
result.cache()
result.checkpoint()
result.count()
The program run in about 3 hours. Would you please guide how it is possible that after caching RDD and using checkpoint the program run faster?
Any help would be really appreciated.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 1 year ago.
Improve this question
I need to merge two sets of files together. I don't mind what language is used as long as it works.
For example:
I have a folder with both sets of files
N-4A-2A_S135_L001_R1_001.fastq.gz
N-4A-2A_S135_L001_R2_001.fastq.gz
N-4A-2A_S135_L001_R1_001-2.fastq.gz
N-4A-2A_S135_L001_R2_001-2.fastq.gz
N-4-Bmp-1_S20_L001_R1_001.fastq.gz
N-4-Bmp-1_S20_L001_R2_001.fastq.gz
N-4-Bmp-1_S20_L001_R1_001-2.fastq.gz
N-4-Bmp-1_S20_L001_R2_001-2.fastq.gz
I need them merged like this:
N-4A-2A-S135-L001_R1_001.fastq.gz + N-4A-2A-S135-L001_R1_001-2.fastq.gz > N-4A-2A-S135-L001_R1_001-cat.fastq.gz
N-4A-2A-S135-L001_R2_001.fastq.gz + N-4A-2A-S135-L001_R2_001-2.fastq.gz > N-4A-2A-S135-L001_R2_001-cat.fastq.gz
N-4-Bmp-1_S20_L001_R1_001.fastq.gz + N-4-Bmp-1_S20_L001_R1_001-2.fastq.gz > N-4-Bmp-1_S20_L001_R1_001-cat.fastq.gz
N-4-Bmp-1_S20_L001_R2_001.fastq.gz + N-4-Bmp-1_S20_L001_R2_001-2.fastq.gz > N-4-Bmp-1_S20_L001_R2_001-cat.fastq.gz
I have a total of ~180 files so any way to automate this would be amazing.
They all follow the same format of:
(sample name) + "R1_001" or "R2_001" + "" or "-2" + .fastq.gz
According to example you provided, your file names look sortable. If this is not the case, this will not work and this answer will be invalid.
Idea of the following snippet is really simple - sort all the filenames and then join adjacent tuples.
This should work in bash:
#!/bin/bash
files=( $( find . -type f | sort -r) )
files_num=${#files[#]}
for (( i=0 ; i < ${files_num} ; i=i+2 )); do
f1="${files[i]}"
f2="${files[i+1]}"
fname=$(basename -s '.gz' "$f1")".cat.gz"
echo "$f1 and $f2 -> $fname"
cat "$f1" "$f2" > "$fname"
done
or simpler version in ZShell (zsh)
#!/bin/zsh
files=( $( find . -type f | sort -r) )
for f1 f2 in $files; do
fname=$(basename -s '.gz' "$f1")".cat.gz"
echo "$f1 and $f2 -> $fname"
cat "$f1" "$f2" > "$fname"
done
You did say any language, so ... I prefer perl
#!/usr/bin/perl
# concatenate files
master(#ARGV);
exit(0);
# master -- master control
sub master
{
my(#argv) = #_;
my($tail,#tails);
local(%tails);
# scan argv for -opt
while (1) {
my($opt) = $argv[0];
last unless (defined($opt));
last unless ($opt =~ s/^-/opt_/);
$$opt = 1;
shift(#argv);
}
# load up the current directory
#tails = dirload(".");
foreach $tail (#tails) {
$tails{$tail} = 0;
}
# look for the base name of a file
foreach $tail (#tails) {
docat($tail)
if ($tail =~ /R[12]_001\.fastq\.gz$/);
}
# default mode is "dry run"
unless ($opt_go) {
printf("\n");
printf("rerun with -go to actually do it\n");
}
}
# docat -- process a pairing
sub docat
{
my($base) = #_;
my($tail);
my($out);
my($cmd);
my($code);
# all commands are joining just two files
$tail = $base;
$tail =~ s/\.fastq/-2.fastq/;
# to an output file
$out = $base;
$out =~ s/\.fastq/-cat.fastq/;
$cmd = "cat $base $tail > $out";
if ($opt_v) {
printf("\n");
printf("IN1: %s\n",$base);
printf("IN2: %s\n",$tail);
printf("OUT: %s\n",$out);
}
else {
printf("%s\n",$cmd);
}
die("docat: duplicate IN1\n")
if ($tails{$base});
$tails{$base} = 1;
die("docat: duplicate IN2\n")
if ($tails{$tail});
$tails{$tail} = 1;
die("docat: duplicate OUT\n")
if ($tails{$out});
$tails{$out} = 1;
{
last unless ($opt_go);
# execute the command and get error code
system($cmd);
$code = $? >> 8;
exit($code) if ($code);
}
}
# dirload -- get list of files in a directory
sub dirload
{
my($dir) = #_;
my($xf);
my($tail);
my(#tails);
# open the directory
opendir($xf,$dir) or
die("dirload: unable to open '$dir' -- $!\n");
# get list of files in the directory excluding "." and ".."
while ($tail = readdir($xf)) {
next if ($tail eq ".");
next if ($tail eq "..");
push(#tails,$tail);
}
closedir($xf);
#tails = sort(#tails);
#tails;
}
Here's the program output with the -v option:
IN1: N-4-Bmp-1_S20_L001_R1_001.fastq.gz
IN2: N-4-Bmp-1_S20_L001_R1_001-2.fastq.gz
OUT: N-4-Bmp-1_S20_L001_R1_001-cat.fastq.gz
IN1: N-4-Bmp-1_S20_L001_R2_001.fastq.gz
IN2: N-4-Bmp-1_S20_L001_R2_001-2.fastq.gz
OUT: N-4-Bmp-1_S20_L001_R2_001-cat.fastq.gz
IN1: N-4A-2A_S135_L001_R1_001.fastq.gz
IN2: N-4A-2A_S135_L001_R1_001-2.fastq.gz
OUT: N-4A-2A_S135_L001_R1_001-cat.fastq.gz
IN1: N-4A-2A_S135_L001_R2_001.fastq.gz
IN2: N-4A-2A_S135_L001_R2_001-2.fastq.gz
OUT: N-4A-2A_S135_L001_R2_001-cat.fastq.gz
rerun with -go to actually do it
I have 5 item types that I have to parse thousands of files (approximately 20kb - 75kb) for:
Item Types
SHA1 hashes
ip addresses
domain names
urls (full thing if possible)
email addresses
I currently use regex to find any items of these nature in thousands of files.
python regex is taking a really long time and I was wondering if there is a better method to identify these item types anywhere in any of my text based flat files?
reSHA1 = r"([A-F]|[0-9]|[a-f]){40}"
reIPv4 = r"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|\[\.\])){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
reURL = r"[A-Z0-9\-\.\[\]]+(\.|\[\.\])(XN--CLCHC0EA0B2G2A9GCD|XN--HGBK6AJ7F53BBA|" \
r"XN--HLCJ6AYA9ESC7A|XN--11B5BS3A9AJ6G|XN--MGBERP4A5D4AR|XN--XKC2DL3A5EE0H|XN--80AKHBYKNJ4F|" \
r"XN--XKC2AL3HYE2A|XN--LGBBAT1AD8J|XN--MGBC0A9AZCG|XN--9T4B11YI5A|XN--MGBAAM7A8H|XN--MGBAYH7GPA|" \
r"XN--MGBBH1A71E|XN--FPCRJ9C3D|XN--FZC2C9E2C|XN--YFRO4I67O|XN--YGBI2AMMX|XN--3E0B707E|XN--JXALPDLP|" \
r"XN--KGBECHTV|XN--OGBPF8FL|XN--0ZWM56D|XN--45BRJ9C|XN--80AO21A|XN--DEBA0AD|XN--G6W251D|XN--GECRJ9C|" \
r"XN--H2BRJ9C|XN--J6W193G|XN--KPRW13D|XN--KPRY57D|XN--PGBS0DH|XN--S9BRJ9C|XN--90A3AC|XN--FIQS8S|" \
r"XN--FIQZ9S|XN--O3CW4H|XN--WGBH1C|XN--WGBL6A|XN--ZCKZAH|XN--P1AI|MUSEUM|TRAVEL|AERO|ARPA|ASIA|COOP|" \
r"INFO|JOBS|MOBI|NAME|BIZ|CAT|COM|EDU|GOV|INT|MIL|NET|ORG|PRO|TEL|XXX|AC|AD|AE|AF|AG|AI|AL|AM|AN|AO|AQ|" \
r"AR|AS|AT|AU|AW|AX|AZ|BA|BB|BD|BE|BF|BG|BH|BI|BJ|BM|BN|BO|BR|BS|BT|BV|BW|BY|BZ|CA|CC|CD|CF|CG|CH|CI|CK|" \
r"CL|CM|CN|CO|CR|CU|CV|CW|CX|CY|CZ|DE|DJ|DK|DM|DO|DZ|EC|EE|EG|ER|ES|ET|EU|FI|FJ|FK|FM|FO|FR|GA|GB|GD|GE|" \
r"GF|GG|GH|GI|GL|GM|GN|GP|GQ|GR|GS|GT|GU|GW|GY|HK|HM|HN|HR|HT|HU|ID|IE|IL|IM|IN|IO|IQ|IR|IS|IT|JE|JM|JO|" \
r"JP|KE|KG|KH|KI|KM|KN|KP|KR|KW|KY|KZ|LA|LB|LC|LI|LK|LR|LS|LT|LU|LV|LY|MA|MC|MD|ME|MG|MH|MK|ML|MM|MN|MO|" \
r"MP|MQ|MR|MS|MT|MU|MV|MW|MX|MY|MZ|NA|NC|NE|NF|NG|NI|NL|NO|NP|NR|NU|NZ|OM|PA|PE|PF|PG|PH|PK|PL|PM|PN|PR|" \
r"PS|PT|PW|PY|QA|RE|RO|RS|RU|RW|SA|SB|SC|SD|SE|SG|SH|SI|SJ|SK|SL|SM|SN|SO|SR|ST|SU|SV|SX|SY|SZ|TC|TD|TF|" \
r"TG|TH|TJ|TK|TL|TM|TN|TO|TP|TR|TT|TV|TW|TZ|UA|UG|UK|US|UY|UZ|VA|VC|VE|VG|VI|VN|VU|WF|WS|YE|YT|ZA|ZM|ZW)" \
r"(/\S+)"
reDomain = r"[A-Z0-9\-\.\[\]]+(\.|\[\.\])(XN--CLCHC0EA0B2G2A9GCD|XN--HGBK6AJ7F53BBA|XN--HLCJ6AYA9ESC7A|" \
r"XN--11B5BS3A9AJ6G|XN--MGBERP4A5D4AR|XN--XKC2DL3A5EE0H|XN--80AKHBYKNJ4F|XN--XKC2AL3HYE2A|" \
r"XN--LGBBAT1AD8J|XN--MGBC0A9AZCG|XN--9T4B11YI5A|XN--MGBAAM7A8H|XN--MGBAYH7GPA|XN--MGBBH1A71E|" \
r"XN--FPCRJ9C3D|XN--FZC2C9E2C|XN--YFRO4I67O|XN--YGBI2AMMX|XN--3E0B707E|XN--JXALPDLP|XN--KGBECHTV|" \
r"XN--OGBPF8FL|XN--0ZWM56D|XN--45BRJ9C|XN--80AO21A|XN--DEBA0AD|XN--G6W251D|XN--GECRJ9C|XN--H2BRJ9C|" \
r"XN--J6W193G|XN--KPRW13D|XN--KPRY57D|XN--PGBS0DH|XN--S9BRJ9C|XN--90A3AC|XN--FIQS8S|XN--FIQZ9S|" \
r"XN--O3CW4H|XN--WGBH1C|XN--WGBL6A|XN--ZCKZAH|XN--P1AI|MUSEUM|TRAVEL|AERO|ARPA|ASIA|COOP|INFO|JOBS|" \
r"MOBI|NAME|BIZ|CAT|COM|EDU|GOV|INT|MIL|NET|ORG|PRO|TEL|XXX|AC|AD|AE|AF|AG|AI|AL|AM|AN|AO|AQ|AR|AS|AT" \
r"|AU|AW|AX|AZ|BA|BB|BD|BE|BF|BG|BH|BI|BJ|BM|BN|BO|BR|BS|BT|BV|BW|BY|BZ|CA|CC|CD|CF|CG|CH|CI|CK|CL|CM|" \
r"CN|CO|CR|CU|CV|CW|CX|CY|CZ|DE|DJ|DK|DM|DO|DZ|EC|EE|EG|ER|ES|ET|EU|FI|FJ|FK|FM|FO|FR|GA|GB|GD|GE|GF|GG|" \
r"GH|GI|GL|GM|GN|GP|GQ|GR|GS|GT|GU|GW|GY|HK|HM|HN|HR|HT|HU|ID|IE|IL|IM|IN|IO|IQ|IR|IS|IT|JE|JM|JO|JP|" \
r"KE|KG|KH|KI|KM|KN|KP|KR|KW|KY|KZ|LA|LB|LC|LI|LK|LR|LS|LT|LU|LV|LY|MA|MC|MD|ME|MG|MH|MK|ML|MM|MN|MO|MP" \
r"|MQ|MR|MS|MT|MU|MV|MW|MX|MY|MZ|NA|NC|NE|NF|NG|NI|NL|NO|NP|NR|NU|NZ|OM|PA|PE|PF|PG|PH|PK|PL|PM|PN|PR|" \
r"PS|PT|PW|PY|QA|RE|RO|RS|RU|RW|SA|SB|SC|SD|SE|SG|SH|SI|SJ|SK|SL|SM|SN|SO|SR|ST|SU|SV|SX|SY|SZ|TC|TD|TF" \
r"|TG|TH|TJ|TK|TL|TM|TN|TO|TP|TR|TT|TV|TW|TZ|UA|UG|UK|US|UY|UZ|VA|VC|VE|VG|VI|VN|VU|WF|WS|YE|YT|ZA|" \
r"ZM|ZW)\b"
reEmail = r"\b[A-Za-z0-9._%+-]+(#|\[#\])[A-Za-z0-9.-]+(\.|\[\.\])(XN--CLCHC0EA0B2G2A9GCD|XN--HGBK6AJ7F53BBA|" \
r"XN--HLCJ6AYA9ESC7A|XN--11B5BS3A9AJ6G|XN--MGBERP4A5D4AR|XN--XKC2DL3A5EE0H|XN--80AKHBYKNJ4F|" \
r"XN--XKC2AL3HYE2A|XN--LGBBAT1AD8J|XN--MGBC0A9AZCG|XN--9T4B11YI5A|XN--MGBAAM7A8H|XN--MGBAYH7GPA|" \
r"XN--MGBBH1A71E|XN--FPCRJ9C3D|XN--FZC2C9E2C|XN--YFRO4I67O|XN--YGBI2AMMX|XN--3E0B707E|XN--JXALPDLP|" \
r"XN--KGBECHTV|XN--OGBPF8FL|XN--0ZWM56D|XN--45BRJ9C|XN--80AO21A|XN--DEBA0AD|XN--G6W251D|XN--GECRJ9C|" \
r"XN--H2BRJ9C|XN--J6W193G|XN--KPRW13D|XN--KPRY57D|XN--PGBS0DH|XN--S9BRJ9C|XN--90A3AC|XN--FIQS8S|" \
r"XN--FIQZ9S|XN--O3CW4H|XN--WGBH1C|XN--WGBL6A|XN--ZCKZAH|XN--P1AI|MUSEUM|TRAVEL|AERO|ARPA|ASIA|COOP|" \
r"INFO|JOBS|MOBI|NAME|BIZ|CAT|COM|EDU|GOV|INT|MIL|NET|ORG|PRO|TEL|XXX|AC|AD|AE|AF|AG|AI|AL|AM|AN|AO|AQ|" \
r"AR|AS|AT|AU|AW|AX|AZ|BA|BB|BD|BE|BF|BG|BH|BI|BJ|BM|BN|BO|BR|BS|BT|BV|BW|BY|BZ|CA|CC|CD|CF|CG|CH|CI|CK" \
r"|CL|CM|CN|CO|CR|CU|CV|CW|CX|CY|CZ|DE|DJ|DK|DM|DO|DZ|EC|EE|EG|ER|ES|ET|EU|FI|FJ|FK|FM|FO|FR|GA|GB|GD|GE" \
r"|GF|GG|GH|GI|GL|GM|GN|GP|GQ|GR|GS|GT|GU|GW|GY|HK|HM|HN|HR|HT|HU|ID|IE|IL|IM|IN|IO|IQ|IR|IS|IT|JE|JM|" \
r"JO|JP|KE|KG|KH|KI|KM|KN|KP|KR|KW|KY|KZ|LA|LB|LC|LI|LK|LR|LS|LT|LU|LV|LY|MA|MC|MD|ME|MG|MH|MK|ML|MM|MN" \
r"|MO|MP|MQ|MR|MS|MT|MU|MV|MW|MX|MY|MZ|NA|NC|NE|NF|NG|NI|NL|NO|NP|NR|NU|NZ|OM|PA|PE|PF|PG|PH|PK|PL|PM|" \
r"PN|PR|PS|PT|PW|PY|QA|RE|RO|RS|RU|RW|SA|SB|SC|SD|SE|SG|SH|SI|SJ|SK|SL|SM|SN|SO|SR|ST|SU|SV|SX|SY|SZ|TC" \
r"|TD|TF|TG|TH|TJ|TK|TL|TM|TN|TO|TP|TR|TT|TV|TW|TZ|UA|UG|UK|US|UY|UZ|VA|VC|VE|VG|VI|VN|VU|WF|WS|YE|YT|" \
r"ZA|ZM|ZW)\b"
I am using a
with open(file, 'r') as f:
for m in re.finditer(key, text, re.IGNORECASE):
try:
m = str(m).split('match=')[-1].split("'")[1]
new_file.write(m + '\n')
except:
pass
method to open, find and output to a new file.
Any assistance with speeding up this item and making it more efficient would be grateful.
You probably want:
text = m.group(0)
print(text, file=new_file)
I have this simple script to check the memory usage of virtual machines managed by libvirt.
How can I convert the integer for state from dom.info() to a human readable string?
import libvirt
import re
import sys
def mem_total_kb():
meminfo = open('/proc/meminfo').read()
matched = re.search(r'^MemTotal:\s+(\d+)', meminfo)
return int(matched.groups()[0])
def main():
conn = libvirt.openReadOnly(None)
if conn == None:
print 'Failed to open connection to the hypervisor'
sys.exit(1)
used_mem_sum = 0
for domain_id in conn.listDomainsID():
dom = conn.lookupByID(domain_id)
state, max_mem, used_mem, vcpus, cpu_time_used = dom.info()
print(
'name=%s state=%s max_mem=%s used_mem=%s vcpus=%s cpu_time_used=%s' % (dom.name(), state, max_mem, used_mem, vcpus, cpu_time_used))
used_mem_sum += used_mem
print('Sum of used mem: %s KiB' % used_mem_sum)
mem_total = mem_total_kb()
print('Sum of physical mem: %s KiB' % mem_total)
if used_mem_sum > mem_total:
print('########## VMs use more RAM than available!')
return
mem_left=mem_total - used_mem_sum
print('Memory left: %s KiB' % (mem_left))
mem_left_should=4000000
if mem_left<mem_left_should:
print('less than mem_left_should=%sKiB left!' % mem_left_should)
if __name__ == '__main__':
main()
Docs: https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainInfo
state: the running state, one of virDomainState
enum virDomainState {
VIR_DOMAIN_NOSTATE = 0
no state
VIR_DOMAIN_RUNNING = 1
the domain is running
VIR_DOMAIN_BLOCKED = 2
the domain is blocked on resource
VIR_DOMAIN_PAUSED = 3
the domain is paused by user
VIR_DOMAIN_SHUTDOWN = 4
the domain is being shut down
VIR_DOMAIN_SHUTOFF = 5
the domain is shut off
VIR_DOMAIN_CRASHED = 6
the domain is crashed
VIR_DOMAIN_PMSUSPENDED = 7
the domain is suspended by guest power management
VIR_DOMAIN_LAST = 8
NB: this enum value will increase over time as new events are added to the libvirt API. It reflects the last state supported by this version of the libvirt API.
}
Looks like at least the enum constant names are exposed in the libvirt module. The following works for me...
import libvirt
import pprint
import re
d = {}
for k, v in libvirt.__dict__.iteritems():
if re.match('VIR_DOMAIN_[A-Z]+$', k):
d[v] = k
pprint.pprint(d)
...and prints...
{0: 'VIR_DOMAIN_NOSTATE',
1: 'VIR_DOMAIN_RUNNING',
2: 'VIR_DOMAIN_BLOCKED',
3: 'VIR_DOMAIN_PAUSED',
4: 'VIR_DOMAIN_SHUTDOWN',
5: 'VIR_DOMAIN_SHUTOFF',
6: 'VIR_DOMAIN_CRASHED',
7: 'VIR_DOMAIN_PMSUSPENDED'}
The descriptions, which are likely just comments in the original source code, are probably not exposed. One of the examples defines its own dictionary on line 106...
state_names = { libvirt.VIR_DOMAIN_RUNNING : "running",
libvirt.VIR_DOMAIN_BLOCKED : "idle",
libvirt.VIR_DOMAIN_PAUSED : "paused",
libvirt.VIR_DOMAIN_SHUTDOWN : "in shutdown",
libvirt.VIR_DOMAIN_SHUTOFF : "shut off",
libvirt.VIR_DOMAIN_CRASHED : "crashed",
libvirt.VIR_DOMAIN_NOSTATE : "no state" }
...so you could just take it from there, although it seems to be incomplete.
This bash script will fetch the list of virDomainState values (excluding VIR_DOMAIN_LAST) and their descriptions in JSON format:
#!/bin/bash
HEADER=include/libvirt/libvirt-domain.h
get_header_file()
{
curl -s https://raw.githubusercontent.com/libvirt/libvirt/master/"$HEADER"
}
select_virDomainState_block()
{
sed -n '/virDomainState:/,/^\} virDomainState;/ p'
}
join_multiline_comments()
{
sed -n '\%/\*[^*]*$% N; \%/\*.*\*/$% { s/\s*\n\s*/ /; p; }'
}
enum_values_to_json_map()
{
echo '{'
sed "s/\s*VIR_DOMAIN\S\+\s*=\s*//; s^,\s*/\*\s*^ : '^; s^\s*\*/^',^;"
echo '}'
}
get_header_file \
| select_virDomainState_block \
| join_multiline_comments \
| grep 'VIR_DOMAIN\S\+\s*=' \
| enum_values_to_json_map
Example usage:
$ ./get_libvirt_domain_states
{
0 : 'no state',
1 : 'the domain is running',
2 : 'the domain is blocked on resource',
3 : 'the domain is paused by user',
4 : 'the domain is being shut down',
5 : 'the domain is shut off',
6 : 'the domain is crashed',
7 : 'the domain is suspended by guest power management',
}
Note that the script downloads a ~150KB file from the libvirt mirror repository at GitHub. It is intended for facilitating staying up-to-date with the libvirt code. Of course you can call it from your python code, but personally I wouldn't do that.
what is the easiest way to decompress a data name?
For example, change compressed form:
abc[3:0]
into decompressed form:
abc[3]
abc[2]
abc[1]
abc[0]
preferable 1 liner :)
In Perl:
#!perl -w
use strict;
use 5.010;
my #abc = qw/ a b c d /;
say join( " ", reverse #abc[0..3] );
Or if you wanted them into separate variables:
my( $abc3, $abc2, $abc1, $abc0 ) = reverse #abc[0..3];
Edit: Per your clarification:
my $str = "abc[3:0]";
$str =~ /(abc)\[(\d+):(\d+)\]/;
my $base = $1;
my $from = ( $2 < $3 ? $2 : $3 );
my $to = ( $2 > $3 ? $2 : $3 );
my #strs;
foreach my $num ( $from .. $to ) {
push #strs, $base . '[' . $num . ']';
}
This is a little pyparsing exercise that I've done in the past, adapted to your example (also supports multiple ranges and unpaired indexes, all separated by commas - see the last test case):
from pyparsing import (Suppress, Word, alphas, alphanums, nums, delimitedList,
Combine, Optional, Group)
LBRACK,RBRACK,COLON = map(Suppress,"[]:")
ident = Word(alphas+"_", alphanums+"_")
integer = Combine(Optional('-') + Word(nums))
integer.setParseAction(lambda t : int(t[0]))
intrange = Group(integer + COLON + integer)
rangedIdent = ident("name") + LBRACK + delimitedList(intrange|integer)("indexes") + RBRACK
def expandIndexes(t):
ret = []
for ind in t.indexes:
if isinstance(ind,int):
ret.append("%s[%d]" % (t.name, ind))
else:
offset = (-1,1)[ind[0] < ind[1]]
ret.extend(
"%s[%d]" % (t.name, i) for i in range(ind[0],ind[1]+offset,offset)
)
return ret
rangedIdent.setParseAction(expandIndexes)
print rangedIdent.parseString("abc[0:3]")
print rangedIdent.parseString("abc[3:0]")
print rangedIdent.parseString("abc[0:3,7,14:16,24:20]")
Prints:
['abc[0]', 'abc[1]', 'abc[2]', 'abc[3]']
['abc[3]', 'abc[2]', 'abc[1]', 'abc[0]']
['abc[0]', 'abc[1]', 'abc[2]', 'abc[3]', 'abc[7]', 'abc[14]', 'abc[15]', 'abc[16]', 'abc[24]', 'abc[23]', 'abc[22]', 'abc[21]', 'abc[20]']