I have an application which writes/concatenates data into JSON and then displays/graphs it via dygraphs. At times, various events can push values out of range. That range is user-subjective, so clamping it at run time inside the application is not the direction I want to go.
I believe jq can help here. Ideally I would be able to search for a field > x and, if it is > x, replace it with x. I've gone searching for jq examples and not really found anything that makes sense to me yet.
I have spent a fair amount of time on this but have not been able to make anything do what I think it should do, at all. I don't even have bad code to show, because I've not made it do anything yet. I sincerely hope what I am asking is narrowed down enough for someone to show me an example in context, so I can extend it for the larger project.
Here's a line which I would expect to be able to modify:
{"cols":[{"type":"datetime","id":"Time","label":"Time"},{"type":"number","id":"Room1Temp","label":"Room One Temp"},{"type":"number","id":"Room1Set","label":"Room One Set"},{"type":"string","id":"Annot1","label":"Room One Note"},{"type":"number","id":"Room2Temp","label":"Room Two Temp"},{"type":"number","id":"Room2Set","label":"Room Two Set"},{"type":"string","id":"Annot2","label":"Room Two Note"},{"type":"number","id":"Room3Temp","label":"Room Three Temp"},{"type":"number","id":"State","label":"State"},{"type":"number","id":"Room4Temp","label":"Room Four Temp"},{"type":"number","id":"Quality","label":"Quality"}],"rows":[
{"c":[{"v":"Date(2019,6,4,20,31,13)"},{"v":68.01},{"v":68.0},null,{"v":62.02},{"v":55.89},null,null,{"v":4},{"v":69.0},{"v":1.052}]}]}
I'd want to do something like:
if JSONFile.Room2Set < 62
set Room2Set = 62
Here's a larger block of JSON which is the source of the chart shown below:
[larger JSON block and example chart omitted]
With clamp functions defined like so (in your ~/.jq file or inline):
def clamp_min($minInc): if . < $minInc then $minInc else . end;
def clamp_max($maxInc): if . > $maxInc then $maxInc else . end;
def clamp($minInc; $maxInc): clamp_min($minInc) | clamp_max($maxInc);
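For example, 55.89 | clamp(62; 72) evaluates to 62, and 73.5 | clamp(62; 72) evaluates to 72.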
With that data, you'll want to find the corresponding cell in each row and modify its value:
$ jq --arg col "Room2Set" --argjson max '62' '
def clamp_max($maxInc): if . > $maxInc then $maxInc else . end;
(INDEX(.cols|to_entries[]|{id:.value.id,index:.key};.id)) as $cols
| .rows[].c[$cols[$col].index] |= (objects.v |= clamp_max($max))
' input.json
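Note that clamp_max only lowers values above 62; for the rule in the question (raise Room2Set to 62 whenever it is below 62), you would define and apply clamp_min with a min argument instead.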
With an invocation such as:
jq --arg col Room2Set --argjson mx 72 --argjson mn 62 -f clamp.jq input.json
where clamp.jq contains:
def clamp: if . > $mx then $mx elif . < $mn then $mn else . end;
(.cols | map(.id) | index($col)) as $ix
| .rows[].c[$ix].v |= clamp
the selected cells should be "clamped".
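On the sample row above, the Room2Set cell's value of 55.89 is below $mn, so it would be rewritten to 62; a value above 72 would likewise be lowered to 72.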
I've decided to try to make a nice cmd menu on Windows in Python, but I got stuck on one of the first things: I want to create a list of commands and then display them in a table. I am using prettytable to create the tables.
So I would like my output to look like this:
+---------+-------------------------------+
| Command | Usage |
+---------+-------------------------------+
| Help | /help |
| Help2 | /help 2 |
| Help3 | /help 3 |
+---------+-------------------------------+
But I cannot figure out how to create and work with the list. The code currently looks like this:
from prettytable import PrettyTable
_cmdTable = PrettyTable(["Command", "Usage"])
#Here I create the commands
help = ['Help','/help']
help2 = ['Help2','/help2']
help3 = ['Help3','/help3']
#And here I add rows and print it
_cmdTable.add_row([help[0], help[1]])
_cmdTable.add_row([help2[0], help2[1]])
_cmdTable.add_row([help3[0], help3[1]])
print(_cmdTable)
But this is way too much work. I would like to make it easier, but I cannot figure out how. I'd imagine it to look something like this:
from prettytable import PrettyTable
_cmdTable = PrettyTable(["Command", "Usage"])
commands = {["Help", "/help"], ["Help2", "/help2"], ["Help3", "/help3"]}
for cmd in commands:
    _cmdTable.add_row([cmd])
print(_cmdTable)
I know it's possible, I just don't know how. It doesn't have to use the same module for tables; if you know one that's better or fits this request more, use it.
I basically want to make the process easier, not do it manually every time I add a new command. Hope I explained it clearly. Thanks!
You can have more manual control using string formatting:
header = ['Command', 'Usage']
rows = [['Help', '/help'], ['Help2', '/help 2'], ['Help3', '/help 3']]
spacer1 = 10
spacer2 = 20
line = '+' + '-'*spacer1 + '+' + '-'*spacer2 + '+\n'
header = f'| {header[0]:<{spacer1-1}}|{header[1]:^{spacer2}}|\n'
table_rows = ''
for row in rows:
    table_rows += f'| {row[0]:<{spacer1-1}}|{row[1]:^{spacer2}}|\n'
print(line + header + line + table_rows + line)
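For reference, with those spacer values this should print:
+----------+--------------------+
| Command  |       Usage        |
+----------+--------------------+
| Help     |       /help        |
| Help2    |      /help 2       |
| Help3    |      /help 3       |
+----------+--------------------+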
Edit: Added spacing control with variables.
You can't put lists in a set. commands should either be a list of lists, or a set of tuples. Using a list is probably more appropriate in this application, because you may want the table items in a specific order.
You shouldn't put cmd inside another list. Each element of commands is already a list.
commands = [["Help", "/help"], ["Help2", "/help2"], ["Help3", "/help3"]]
for cmd in commands:
    _cmdTable.add_row(cmd)
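If your prettytable version is recent enough (1.0 or later, if I remember correctly), you may also be able to skip the loop with add_rows, which takes the whole list of rows at once; a quick sketch:
from prettytable import PrettyTable
_cmdTable = PrettyTable(["Command", "Usage"])
commands = [["Help", "/help"], ["Help2", "/help2"], ["Help3", "/help3"]]
_cmdTable.add_rows(commands)  # adds every row in one call
print(_cmdTable)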
How can I delete multiple bins from all the records of a set in Aerospike using an Aerospike Python Client UDF? I tried passing one bin at a time to the UDF and using a scan to delete that bin from all the records, but this was very inefficient, as expected. I also tried creating a list of bins in Python and passing the list to the UDF. The following is the code for reference:
Suppose I have 2000 records and 200 bins with names '1', '2', '3', etc., and I want to delete bins '1' through '99'. The namespace in use is testns and the set in use is udfBins. testUdf.lua is the Lua file containing the UDF, and my_udf is the Lua function name.
test.py
scan = client.scan("testns", "udfBins")
bins = [str(i) for i in range(1, 366)]
# for i in range(1,100):
scan.apply("testUdf", "my_udf", [bins])
job_id = scan.execute_background()
while True:
    response = client.job_info(job_id, aerospike.JOB_SCAN)
    if response["status"] != aerospike.JOB_STATUS_INPROGRESS:
        break
print("job done")
testUdf.lua
function my_udf(rec, bins)
    info(bins)
    for bin in python.iter(bins)
    do
        rec[bin] = nil
    end
    aerospike:update(rec)
end
The above code doesn't work, and I'm unable to figure out the reason or the correct way to solve the problem at hand. Any help is highly appreciated.
Thanks a lot in advance
This is a bit of a tricky problem to solve. We have to pass an array from Python to Lua as an argument to the Lua function. Here is the pertinent part of the code that I used to make it work:
1 - pass the array as a string like so:
bins = '{"1","2"}'
# print(bins)
self.client.scan_apply("test", "users", "testUdf", "my_udf", [bins])
Note: in scan_apply (the function name has an underscore), args are passed as a list; here there is just one arg, the string bins, which we convert to a table type in Lua and iterate over.
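If you are starting from a Python list of bin names, you can build that string programmatically; here is a small sketch (to_lua_table_literal is a hypothetical helper, and it assumes the bin names contain no quote characters):
def to_lua_table_literal(bin_names):
    # serialize the names into the '{"1","2",...}' form that the Lua UDF expects
    return '{' + ','.join('"%s"' % b for b in bin_names) + '}'
bins = to_lua_table_literal(str(i) for i in range(1, 100))  # '{"1","2",...,"99"}'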
Then in your testUdf.lua, do:
function my_udf(rec, bins_list)
    bins_list = load("return "..bins_list)()
    for i,bin in ipairs(bins_list)
    do
        -- debug("bins_list_item: "..bin)
        rec[bin] = nil
    end
    aerospike:update(rec)
end
I used logging at debug level (you had info) to check what the Lua code was doing.
This worked for me.
I created 3 records with bins "1", "2" and "3" and then deleted bins "1" and "2" using scan udf per above.
Here is sample output on one record after running the scan:
{'3': 1, '1': 1, '2': 1} <-- initial bins, 3 records, same bins, same values
{"1","2"} <--list that I passed as a string for setting these bins to nil
{'3': 1} <-- final bins
I checked with AQL, all 3 records had their bins "1" and "2" deleted.
aql> select * from test.users
+---+
| 3 |
+---+
| 1 |
| 1 |
| 1 |
+---+
3 rows in set (0.123 secs)
This is a good link for further reading: https://discuss.aerospike.com/t/what-is-the-syntax-to-pass-2d-array-values-to-the-record-udf-using-aql/4378
Well, I ran across a piece of code and I could not quite figure out its function.
It goes like this:
thresholds = (image[:,:,0] < rgbThreshold[0]) \
    | (image[:,:,1] < rgbThreshold[1]) \
    | (image[:,:,2] < rgbThreshold[2])
It's this bit, the \ followed by a line break ("return") and then | (image[:,:,1] < ...., that I can't quite figure out.
If anyone is wondering what this code is meant to do: there is a set of RGB thresholds (redThreshold, green, ...) and an image "image".
I just select all the pixels that are below the specified thresholds. Then I access them with colorSelect[thresholds] = [0,0,0], i.e. blacken them (colorSelect is a numpy array that represents an image by its RGB pixel values).
| means (bitwise) or, and \ is a line continuation: it just splits a long expression over several lines, per the PEP 8 guidelines, so the reader can follow the code better.
Here:
thresholds = (image[:,:,0] < rgbThreshold[0]) \
    | (image[:,:,1] < rgbThreshold[1]) \
    | (image[:,:,2] < rgbThreshold[2])
is the same as:
thresholds = (image[:,:,0] < rgbThreshold[0]) | (image[:,:,1] < rgbThreshold[1]) | (image[:,:,2] < rgbThreshold[2])
The \ breaks it onto new lines for easier reading, and the pipe character is a bitwise or; applied to NumPy boolean arrays, it combines the three comparison masks element-wise.
image[:,:,1] means image[(slice(None), slice(None), 1)], i.e. image.__getitem__((slice(None), slice(None), 1))
The commas build a tuple with one index per axis, and each : becomes slice(None), which selects everything along that axis.
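To see the whole thing in action, here is a minimal sketch with a made-up 2x2 RGB image (the names mirror the question; the data and thresholds are invented):
import numpy as np
image = np.array([[[10, 200, 200], [200, 200, 200]],
                  [[200, 10, 200], [200, 200, 10]]], dtype=np.uint8)
rgbThreshold = [50, 50, 50]
# True wherever at least one channel falls below its threshold
thresholds = (image[:, :, 0] < rgbThreshold[0]) \
    | (image[:, :, 1] < rgbThreshold[1]) \
    | (image[:, :, 2] < rgbThreshold[2])
print(thresholds)
# [[ True False]
#  [ True  True]]
colorSelect = image.copy()
colorSelect[thresholds] = [0, 0, 0]  # blacken the selected pixels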
I have a directory tree containing html files called slides. Something like:
slides_root
|
|_slide-1
| |_slide-1.html
| |_slide-2.html
|
|_slide-2
| |
| |_slide-1
| | |_slide-1.html
| | |_slide-2.html
| | |_slide-3.html
| |
| |_slide-2
| |_slide-1.html
...and so on. They could go even deeper. Now imagine I have to replace some slides in this structure by merging it with another tree which is a subset of this.
WITH AN EXAMPLE: say that I want to replace slide-1.html and slide-3.html inside "slides_root/slide-2/slide-1" merging "slides_root" with:
slide_to_change
|
|_slide-2
|
|_slide-1
|_slide-1.html
|_slide-3.html
I would merge "slide_to_change" into "slides_root". The structure is the same, so everything goes fine. But I have to do it on a Python object representation of this scheme.
So the two trees are represented by two instances, slide1 and slide2, of the same "Slide" class, which is structured as follows:
class Slide(object):
    def __init__(self, path):
        self.path = path
        self.slides = []  # child Slide objects
Both slide1 and slide2 contain a path and a list of other Slide objects, each with its own path and list of Slide objects, and so on.
The rule is that if the relative path is the same, then I replace the slide object in slide1 with the one in slide2.
How can I achieve this result? It is really difficult and I can see no way out. Ideally something like:
for slide_root in slide1.slides:
    for slide_dest in slide2.slides:
        if slide_root.path == slide_dest.path:
            slide_root = slide_dest
            # now restart the loop at a deeper level
            # repeat
Thanks everyone for any answer.
Sounds not so complicated.
Just use a recursive function for walking the to-be-inserted tree while keeping hold of the corresponding place in the old tree. If the parts match:
- If the parts are both leaves (html thingies): insert (overwrite) the value.
- If the parts are both nodes (slides): call yourself with the sub-slides (here's the recursion).
I know this is just a hint, a sketch of how to do it, but maybe you want to start from this. In Python it could look something like the following (here fleshed out with a small helper for finding a matching child):
def find_sub_slide_position_by_path(old_slide, path):
    # return the index of the child whose path matches, or -1 if there is none
    for position, candidate in enumerate(old_slide.slides):
        if candidate.path == path:
            return position
    return -1

def merge_slide(slide, old_slide):
    for sub_slide in slide.slides:
        sub_slide_position_in_old_slide = find_sub_slide_position_by_path(old_slide, sub_slide.path)
        if sub_slide_position_in_old_slide >= 0:  # we found a match!
            sub_slide_in_old_slide = old_slide.slides[sub_slide_position_in_old_slide]
            if sub_slide.slides:  # this is a node!
                merge_slide(sub_slide, sub_slide_in_old_slide)  # here we recurse
            else:  # this is a leaf! so we replace it:
                old_slide.slides[sub_slide_position_in_old_slide] = sub_slide
        else:  # nothing like this in old_slide
            pass  # ignore (you might want to consider this case!)
Maybe that gives you an idea on how I would approach this.
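For example, a tiny hand-built pair of trees (assuming the Slide class from the question; old_tree and new_tree are made-up names and the data is purely illustrative) could be merged like this:
old_tree = Slide("slide-2")
old_tree.slides = [Slide("slide-1.html"), Slide("slide-2.html")]
new_tree = Slide("slide-2")
new_tree.slides = [Slide("slide-1.html")]  # the replacement leaf
merge_slide(new_tree, old_tree)
# old_tree.slides[0] is now the replacement Slide object from new_tree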
I am attempting to rewrite some of my old bash scripts that I think are very inefficient (not to mention inelegant) and use some horrid piping... Perhaps somebody with real Python skills can give me some pointers...
The script makes use of multiple temp files... another thing I think is bad style and probably can be avoided...
It essentially manipulates INPUT-FILE by first cutting a certain number of lines from the top (discarding the heading).
Then it pulls out one of the columns and:
- calculates the number of rows = N;
- throws out all duplicate entries from this single-column file (I use sort -u -n FILE > S-FILE).
After that, I create a sequential integer index from 1 to N and paste this new index column into the original INPUT-FILE using the paste command.
My bash script then generates Percentile Ranks for the values we wrote into S-FILE.
I believe Python could leverage scipy.stats, while in bash I determine the number of duplicate lines (dupline) for each unique entry in S-FILE and then calculate per-rank=$((100*($counter+$dupline/2)/$length)), where $length is the length of FILE, not S-FILE. I then print the results into a separate one-column file (repeating the same per-rank as many times as there are duplines).
I would then paste this new column with percentile ranks back into INPUT-FILE (since I would sort INPUT-FILE by the column used for calculation of percentile ranks - everything would line up perfectly in the result).
After this, it goes into the ugliness below...
sort -o $INPUT-FILE $INPUT-FILE
awk 'int($4)>2000' $INPUT-FILE | awk -v seed=$RANDOM 'BEGIN{srand(seed);} {print rand()"\t"$0}' | sort -k1 -k2 -n | cut -f2- | head -n 500 > 2000-$INPUT-FILE
diff $INPUT-FILE 2000-$INPUT-FILE | sed '/^[0-9][0-9]*/d; s/^. //; /^---$/d' | awk 'int($4)>1000' | awk -v seed=$RANDOM 'BEGIN{srand(seed);} {print rand()"\t"$0}' | sort -k1 -k2 -n | cut -f2- | head -n 500 > 1000-$INPUT-FILE
cat 2000-$INPUT-FILE 1000-$INPUT-FILE | sort > merge-$INPUT-FILE
diff merge-$INPUT-FILE $INPUT-FILE | sed '/^[0-9][0-9]*/d; s/^. //; /^---$/d' | awk 'int($4)>500' | awk -v seed=$RANDOM 'BEGIN{srand(seed);} {print rand()"\t"$0}' | sort -k1 -k2 -n | cut -f2- | head -n 500 > 500-$INPUT-FILE
rm merge-$INPUT-FILE
Essentially, this is a very inelegant bash way of doing the following:
RANDOMLY select 500 lines from $INPUT-FILE where the value in column 4 is greater than 2000 and write them out to file 2000-$INPUT-FILE
For all REMAINING lines in $INPUT-FILE, randomly select 500 lines where the value in column 4 is greater than 1000 and write them out to file 1000-$INPUT-FILE
For all REMAINING lines in $INPUT-FILE after 1) and 2), randomly select 500 lines where the value in column 4 is greater than 500 and write them out to file 500-$INPUT-FILE
Again, I am hoping somebody can help me rework this ugly piping thing into a thing of Python beauty! :) Thanks!
Two crucial points in the comments:
(A) The file is ~50k lines of ~100 characters. Small enough to comfortably fit in memory on modern desktop/server/laptop systems.
(B) The author's main question is about how to keep track of lines that have already been chosen, and don't choose them again.
I suggest three steps.
(1) Go through the file, making three separate lists -- call them u, v, w -- of the line numbers which satisfy each of the criteria. Each list may hold more than 500 line numbers, and the same line number may appear in more than one list, but we will get rid of these problems in step (2).
u = []
v = []
w = []
with open(filename, "r") as f:
    for linenum, line in enumerate(f):
        x = int(line.split()[3])
        if x > 2000:
            u.append(linenum)
        if x > 1000:
            v.append(linenum)
        if x > 500:
            w.append(linenum)
(2) Choose line numbers. You can use Random.sample() from the built-in random module to pick a sample of k elements from a population. We want to remove elements that have previously been chosen, so keep track of such elements in a set. (The "chosen" collection is a set instead of a list because the membership test "if x not in chosen" is O(1) on average for a set, but O(n) for a list. Change it to a list and you'll see a slowdown if you measure the timings precisely, though it might not be a noticeable delay for a data set of "only" 50k data points / 500 samples / 3 categories.)
import random
rand = random.Random() # change to random.Random(1234) for repeatable results
chosen = set()
s0 = rand.sample(u, 500)
chosen.update(s0)
s1 = rand.sample([x for x in v if x not in chosen], 500)
chosen.update(s1)
s2 = rand.sample([x for x in w if x not in chosen], 500)
chosen.update(s2)
(3) Do another pass through the input file, putting lines whose numbers are in s0 into your first output file, lines whose numbers are in s1 into your second output file, and lines whose numbers are in s2 into your third output file. It's pretty trivial in any language, but here's an implementation which uses Python "idioms":
linenum2sample = dict([(x, 0) for x in s0]+[(x, 1) for x in s1]+[(x, 2) for x in s2])
outfile = [open("-".join([x, filename]), "w") for x in ["2000", "1000", "500"]]
try:
    with open(filename, "r") as f:
        for linenum, line in enumerate(f):
            s = linenum2sample.get(linenum)
            if s is not None:
                outfile[s].write(line)
finally:
    for f in outfile:
        f.close()
Break it up into easy pieces.
Read the file using csv.DictReader, or csv.reader if the headers are unusable. As you iterate through the lines, check the value of column 4 and insert each line into a dictionary of lists, where the dictionary keys are something like 'gt_2000', 'gt_1000', 'gt_500'.
Iterate through your dictionary keys; for each one, create a file and loop 500 times, on each iteration using random.randint(0, len(the_list)-1) to get a random index into the list, writing that line to the file, then deleting the item at that index from the list. If there could ever be fewer than 500 items in any bucket, this will require a tiny bit more care.
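A minimal sketch of that approach, assuming whitespace-separated fields rather than csv, invented file names, and each line landing only in its highest matching bucket:
import random
buckets = {'2000': [], '1000': [], '500': []}
with open('INPUT-FILE') as f:
    for line in f:
        value = int(line.split()[3])
        if value > 2000:
            buckets['2000'].append(line)
        elif value > 1000:
            buckets['1000'].append(line)
        elif value > 500:
            buckets['500'].append(line)
for name, lines in buckets.items():
    with open(name + '-INPUT-FILE', 'w') as out:
        for _ in range(min(500, len(lines))):
            i = random.randint(0, len(lines) - 1)  # random index, as described above
            out.write(lines.pop(i))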