Hi All,
I have a requirement where a client application sends data through a REST API endpoint that we provide. The client application is expected to send the data as query parameters.
curl -L -X POST "http://endpointurl:port/data?result=PASS&c=ADD_RECORD&attribute1=test1&attribute2=test2&attribute3=test3&attribute4=test4&signature=62759010b083d8fcf6e7ec18e6582bc07789d6dda17efbff6f474635c63db6afcbb3a0c25cf0d4c5bb1ba0ab772124edb9ba064d1530c2848fc160546263c86a2ba0cc26dd0073bb6344a1abb7475bcb1cd9f1c2b6af750db043a3da807ca356ab2d0959719dfff28af16246ce242a71d9fc99e5c383edfa90f6426568e1b6e9f8510871e40a05f6debaa6d9eee72eb9f6e0691ec625b1b24bb49cb3840940e7f83a13cdc0022e4a8ac35866f9b74418dcbeb232962113ad765cce334f431108866753c767098c363f97c056fa5f377b04094436629e9ede71b3074766c5b7492e4d7d5f4f52af0bee1683af68bb70f3cda4fef78cf5f98ce8765fd5d0e12280"
Along with all the elements/columns of the actual data (result, attribute1, attribute2, attribute3 and attribute4), the client application also sends one additional parameter called signature, which is created by hashing the key parameters (not all the query parameters) and signing the digest with the client-side private key:
Signature: echo -n 'attribute1attribute3' | openssl sha1 -sign id_rsa -hex
The client application has also provided us the public key file so that we can validate the signature against the actual data before we process the records.
I am using Apache NiFi on HDP with the high-level flow below. I have also used a few other processors for validations and to allow other HTTP requests.
HandleHttpRequest-->>AttributesToJSON-->>RouteOnAttribute-->>JoltTransformJSON-->>ReplaceText-->>PutKafka-->>HandleHttpResponse
Basically, I am extracting all the http.query.param values from the posted data, and if the parameter c=ADD_RECORD, I concatenate the key attributes (attribute1, attribute3), which is the actual data that should be verified against the signature value.
I tried the HashContent processor with SHA1, but the hash value I get is very short and is not derived from the client-provided public key (it is a plain digest, not an RSA signature).
I also tried Python scripts using the Crypto package, but I have not been able to verify the signature against the actual data. On top of that, I am not sure how I can call a Python script inside NiFi.
Below are the commands that I can use manually to validate the signature against the data:
echo -n 'attribute1attribute3' | openssl sha1 -sign id_rsa -hex>signature.hex
xxd -r -p signature.hex > signature.bin
echo -n 'attribute1attribute3'>keyattribute.txt
openssl dgst -sha1 -verify /tmp/test.pub -signature signature.bin keyattribute.txt
This verifies the digital signature in signature.bin against keyattribute.txt. But in my actual requirement, I would be getting all of this data as query parameters.
I need help with the following:
Can HashContent be used to generate/verify the signature based on the public key file? If so, I think we can use RouteOnAttribute to compare the computed value with the signature and take the necessary actions.
Any guidance on achieving this with a Python/Groovy/Jython script, and on how such a script can be called within a NiFi pipeline?
Is there any possibility of building a custom processor to meet this requirement?
Appreciate any help on this.
Thanks,
Vish
====================================
Hi All,
Following up on my earlier query: I finally got the Python script below up and running. It takes three arguments:
the pub.key file location,
the signature value (hex dump) from the client side,
the concatenated key columns over which the signature was generated,
and it reports whether the signature verifies or fails.
from __future__ import print_function, unicode_literals
import sys
import binascii
from os import path
from Crypto.PublicKey import RSA
from Crypto.Signature import PKCS1_v1_5
from Crypto.Hash import SHA

pubfile = sys.argv[1]   # path to the client's public key file
sig_hex = sys.argv[2]   # signature as a hex string
data = sys.argv[3]      # concatenated key attributes the signature was made over

if not path.isfile(pubfile):
    sys.stderr.write('public key file not found\n')
    sys.exit(1)

def verifier(pubkey, sig, data):
    rsakey = RSA.importKey(pubkey)
    signer = PKCS1_v1_5.new(rsakey)
    digest = SHA.new()
    digest.update(data)
    return signer.verify(digest, sig)

with open(pubfile, 'rb') as f:
    key = f.read()

sig = binascii.unhexlify(sig_hex.strip())

if verifier(key, sig, data):
    print("Verified OK")
else:
    print("Verification Failure")
Now I need to know: how can this be called within NiFi? How can I pass the flowfile attributes as arguments to the script (ExecuteScript processor)? And how can I get the verification status message as an additional attribute on the flowfile?
Any help is greatly appreciated.
Thanks,
Vish
============================================================
You can do this in a few ways. Since you already have a working series of shell commands, you can use the ExecuteStreamCommand processor to execute the series of commands (or wrap them in a shell script) and stream the incoming flowfile content to STDIN and stream STDOUT to the outgoing flowfile content.
If you prefer, you can use ExecuteScript or InvokeScriptedProcessor to run the script you've written. I would recommend switching to Groovy, as it is handled much better in current Apache NiFi releases; the Python support is actually Jython, which is much slower and cannot use native CPython libraries (such as PyCrypto).
Finally, you could write a custom processor to do this if it's a repeated task you'll need in future flows. There are many guides for writing a custom processor available online.
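For the ExecuteScript route, below is a rough, untested Jython sketch of the attribute handling asked about above: it reads the query-parameter attributes that HandleHttpRequest creates, verifies the signature with the JVM's own java.security classes (so no native PyCrypto is needed, which Jython could not load anyway), and writes the result into a new signature.status attribute. The attribute names, the key path, and the SHA1withRSA assumption mirror the question and may need adjusting.

# ExecuteScript (Script Engine: python/Jython) -- illustrative sketch only.
# Assumes the client's public key is a PEM SubjectPublicKeyInfo file (the kind
# `openssl dgst -verify` accepts) and that the query parameters are already
# flowfile attributes with the names used below.
import base64
import binascii
from java.security import Signature, KeyFactory
from java.security.spec import X509EncodedKeySpec

PUB_KEY_PATH = '/tmp/test.pub'   # adjust to the real location of the public key

def load_public_key(path):
    pem = open(path).read()
    der = base64.b64decode(''.join(l for l in pem.splitlines() if 'KEY' not in l))
    return KeyFactory.getInstance('RSA').generatePublic(X509EncodedKeySpec(bytearray(der)))

flowFile = session.get()
if flowFile is not None:
    # Concatenate the key attributes exactly as the client did before signing.
    data = (flowFile.getAttribute('http.query.param.attribute1') +
            flowFile.getAttribute('http.query.param.attribute3'))
    sig = binascii.unhexlify(flowFile.getAttribute('http.query.param.signature'))
    verifier = Signature.getInstance('SHA1withRSA')   # matches `openssl dgst -sha1 -verify`
    verifier.initVerify(load_public_key(PUB_KEY_PATH))
    verifier.update(bytearray(data.encode('utf-8')))
    status = 'Verified OK' if verifier.verify(bytearray(sig)) else 'Verification Failure'
    flowFile = session.putAttribute(flowFile, 'signature.status', status)
    session.transfer(flowFile, REL_SUCCESS)

A RouteOnAttribute processor downstream can then route on ${signature.status} to accept or reject the record.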
Related
I have a simple SQL file that I'd like to read and execute using a Python Script Snap in SnapLogic. I created an expression library file to reference the Redshift account and have included it as a parameter in the pipeline.
I have the code below from another post. Is there a way to reference the pipeline parameter to connect to the Redshift database, read the uploaded SQL file and execute the commands?
fd = open('shared/PythonExecuteTest.sql', 'r')
sqlFile = fd.read()
fd.close()

sqlCommands = sqlFile.split(';')
for command in sqlCommands:
    try:
        # The cursor `c` and OperationalError come from the database connection
        # set up elsewhere in the original post (not shown here).
        c.execute(command)
    except OperationalError, msg:
        print "Command skipped: ", msg
You can access pipeline parameters in scripts using the $_ prefix.
Let's say, you have a pipeline parameter executionId. Then to access it in the script you can do $_executionId.
A test pipeline was set up with a pipeline parameter named executionId and some sample test data. The following is the script:
# Import the interface required by the Script snap.
from com.snaplogic.scripting.language import ScriptHook
import java.util

class TransformScript(ScriptHook):
    def __init__(self, input, output, error, log):
        self.input = input
        self.output = output
        self.error = error
        self.log = log

    # The "execute()" method is called once when the pipeline is started
    # and allowed to process its inputs or just send data to its outputs.
    def execute(self):
        self.log.info("Executing Transform script")
        while self.input.hasNext():
            try:
                # Read the next document, wrap it in a map and write out the wrapper
                in_doc = self.input.next()
                wrapper = java.util.HashMap()
                wrapper['output'] = in_doc
                wrapper['output']['executionId'] = $_executionId
                self.output.write(in_doc, wrapper)
            except Exception as e:
                errWrapper = {
                    'errMsg' : str(e.args)
                }
                self.log.error("Error in python script")
                self.error.write(errWrapper)
        self.log.info("Finished executing the Transform script")

# The Script Snap will look for a ScriptHook object in the "hook"
# variable. The snap will then call the hook's "execute" method.
hook = TransformScript(input, output, error, log)
In the output, you can see that the executionId was read from the pipeline parameters.
Note: Accessing pipeline parameters from scripts is a valid scenario, but accessing other external systems from a script is complicated (you would need to load the required libraries) and not recommended. Use the snaps provided by SnapLogic to access external systems. Also, if you want to use other libraries inside scripts, consider sticking to JavaScript rather than Python, because there are many open-source CDN-hosted libraries you can pull into your scripts.
Also, you can't access a configured expression library directly from a script. If you need some logic in the script, keep it in the script rather than somewhere else. And there is no point in accessing account names in a script (or in mappers): even if you know the account name, you can't use the credentials/configuration stored in that account directly; that is handled by SnapLogic. Use the provided snaps and mappers as much as possible.
Update #1
You can't access the account directly. Accounts are managed and used internally by the snaps. You can only create and set accounts through the accounts tab of the relevant snap.
Avoid using the Script snap as much as possible, especially if you can do the same thing with regular snaps.
Update #2
The simplest solution to this requirement would be as follows.
Read the file using a file reader
Split based on ;
Execute each SQL command using the Generic JDBC Execute Snap
I'm writing a custom SaltStack runner to wrap accepting and rejecting minions. How do I call salt-key from my Python runner, equivalent to this command line:
salt-key -a {minion_name}
I can't provide you a definite answer, but here are my two cents:
The source code of the salt-key script is this one. Following the call chain, I reached this module which contains several classes to do key processing.
The module's documentation reads:
The Salt Key backend API and interface used by the CLI. The Key class can be
used to manage salt keys directly without interfacing with the CLI.
This is the mentioned class.
Based on this code, I presume it's used like:
import salt.client
import salt.key
client = salt.client.LocalClient()
key_manager = salt.key.Key(client.opts)
key_manager.accept('web*')
I know it has been a while since this question was answered, but I would like to add my two cents on this subject.
To do key interactions programmatically, we use the Wheel system in Salt. Usage is rather simple and clear:
from salt import config
from salt import wheel
masterOpts = config.master_config('/etc/salt/master')
wheelClient = wheel.WheelClient(masterOpts)
wheelClient.cmd('key.accept', ['minionId1'])
A bunch of other operations can be found in the SaltStack documentation here.
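As a further illustration, the same client can drive the other key operations. The wheel function names below mirror the salt-key CLI options and may vary by Salt version, so treat this as a sketch:

wheelClient.cmd('key.list_all', [])             # like `salt-key -L`
wheelClient.cmd('key.reject', ['minionId2'])    # like `salt-key -r minionId2`
wheelClient.cmd('key.delete', ['minionId3'])    # like `salt-key -d minionId3`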
I've used boto to interact with S3 with no problems, but now I'm attempting to connect to the AWS Support API to pull back info on open tickets, Trusted Advisor results, etc. It seems that the boto library has different connect methods for each AWS service. For example, with S3 it is:
conn = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
According to the boto docs, the following should work to connect to AWS Support API:
>>> from boto.support.connection import SupportConnection
>>> conn = SupportConnection('<aws access key>', '<aws secret key>')
However, there are a few problems I see after digging through the source code. First, boto.support.connection doesn't actually exist. boto.connection does, but it doesn't contain a class SupportConnection. boto.support.layer1 exists and DOES have the class SupportConnection, but it doesn't accept key arguments as the docs suggest. Instead it takes one argument: an AWSQueryConnection object. That class is defined in boto.connection. AWSQueryConnection takes one argument: an AWSAuthConnection object, a class also defined in boto.connection. Lastly, AWSAuthConnection takes a generic object, with requirements defined in __init__ as:
class AWSAuthConnection(object):
    def __init__(self, host, aws_access_key_id=None,
                 aws_secret_access_key=None,
                 is_secure=True, port=None, proxy=None, proxy_port=None,
                 proxy_user=None, proxy_pass=None, debug=0,
                 https_connection_factory=None, path='/',
                 provider='aws', security_token=None,
                 suppress_consec_slashes=True,
                 validate_certs=True, profile_name=None):
So, for kicks, I tried creating an AWSAuthConnection by passing keys, followed by AWSQueryConnection(awsauth), followed by SupportConnection(awsquery), with no luck. This was inside a script.
Last item of interest is that, with my keys defined in a .boto file in my home directory, and running python interpreter from the command line, I can make a direct import and call to SupportConnection() (no arguments) and it works. It clearly is picking up my keys from the .boto file and consuming them but I haven't analyzed every line of source code to understand how, and frankly, I'm hoping to avoid doing that.
Long story short, I'm hoping someone has some familiarity with boto and connecting to AWS API's other than S3 (the bulk of material that exists via google) to help me troubleshoot further.
This should work:
import boto.support
conn = boto.support.connect_to_region('us-east-1')
This assumes you have credentials in your boto config file or in an IAM Role. If you want to pass explicit credentials, do this:
import boto.support
conn = boto.support.connect_to_region('us-east-1', aws_access_key_id="<access key>", aws_secret_access_key="<secret key>")
This basic incantation should work for all services in all regions. Just import the correct module (e.g. boto.support or boto.ec2 or boto.s3 or whatever) and then call its connect_to_region method, supplying the name of the region you want as a parameter.
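For example, once connected you can call the Support operations directly on the connection object. The method names below come from boto's Support layer1 API; double-check them against your boto version, so treat this as a sketch:

import boto.support

conn = boto.support.connect_to_region('us-east-1')

# List the available Trusted Advisor checks; the Support API requires a language code.
checks = conn.describe_trusted_advisor_checks('en')

# List support cases (resolved cases are excluded unless you ask for them).
cases = conn.describe_cases()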
I am new to JMeter. My HTTP Request sampler call looks like this:
Path= /image/**image_id**/list/
Header = "Key" : "Key_Value"
Key value is generated by calling a python script which uses the image_id to generate a unique key.
Before each sampler I wanted to generate the key using python script which will be passed as a header to the next HTTP Request sampler.
I know I have to use some kind of preprocessor to do that. Can anyone help me do it using a preprocessor in JMeter?
I believe that Beanshell PreProcessor is what you're looking for.
Example Beanshell code will look as follows:
import java.io.BufferedReader;
import java.io.InputStreamReader;

Runtime r = Runtime.getRuntime();
Process p = r.exec("/usr/bin/python /path/to/your/script.py");
p.waitFor();

BufferedReader b = new BufferedReader(new InputStreamReader(p.getInputStream()));
String line = "";
StringBuilder response = new StringBuilder();
while ((line = b.readLine()) != null) {
    response.append(line);
}
b.close();

vars.put("ID", response.toString());
The code above will execute the Python script and put its output into the ID variable.
You will be able to refer to it in your HTTP Request as
/image/${ID}/list/
See How to use BeanShell: JMeter's favorite built-in component guide for more information on Beanshell scripting in Apache JMeter and a kind of Beanshell cookbook.
You can also put your request under a Transaction Controller to exclude the PreProcessor execution time from the load report.
A possible solution posted by Eugene Kazakov here:
JSR223 sampler has good possibility to write and execute some code,
just put jython.jar into /lib directory, choose in "Language" pop-up
menu jython and write your code in this sampler.
Sadly there is a bug in Jython, but there are some suggestions on that page.
More here.
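For illustration, a minimal Jython JSR223 PreProcessor along those lines might look like the sketch below. The image_id variable name and the SHA-1 derivation are only placeholders for your real key-generation script:

# JSR223 PreProcessor, Language: jython -- illustrative sketch only.
import hashlib

image_id = vars.get("image_id")            # assumes image_id is already a JMeter variable
key = hashlib.sha1(image_id).hexdigest()   # stand-in for the real key-generation logic
vars.put("Key", key)                       # reference it as ${Key} in the Header Manager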
You can use a BSF PreProcessor.
First download the Jython Library and save to your jmeter's lib directory.
On your HTTP sampler, add a BSF PreProcessor, choose Jython as the language, and perform whatever you need to obtain the ID. As an example I used this:
import random

randImageString = ""
for i in range(16):
    randImageString = randImageString + chr(random.randint(ord('A'), ord('Z')))
vars.put("randimage", randImageString)
Note the vars.put("randimage", randImageString), which makes the variable available to JMeter later on.
Now in your test you can use ${randimage} wherever you need it.
Every request will now be different, changing with the value the Python script puts into randimage.
I'm trying to set up a twisted xmlrpc server, which will accept files from a client, process them, and return a file and result dictionary back.
I've used Python before, but never the Twisted libraries. For my purposes security is a non-issue, and the SSH protocol seems like overkill; it also has problems on the Windows server, since termios is not available.
So all of my research points to XML-RPC being the best way to accomplish this. However, there are two methods of file transfer available: embedding the file as XML binary data, or using a plain HTTP request.
Files can be up to a few hundred megs either way, so which method should I use? Sample code is appreciated, since I could find no documentation for file transfers over XML-RPC with Twisted.
Update:
So it seems that serializing the file with xmlrpclib.Binary does not work for large files, or I'm using it wrong. Test code below:
from twisted.web import xmlrpc, server

class Example(xmlrpc.XMLRPC):
    """
    An example object to be published.
    """

    def xmlrpc_echo(self, x):
        """
        Return all passed args.
        """
        return x

    def xmlrpc_add(self, a, b):
        """
        Return sum of arguments.
        """
        return a + b

    def xmlrpc_fault(self):
        """
        Raise a Fault indicating that the procedure should not be used.
        """
        raise xmlrpc.Fault(123, "The fault procedure is faulty.")

    def xmlrpc_write(self, f, location):
        with open(location, 'wb') as fd:
            fd.write(f.data)

if __name__ == '__main__':
    from twisted.internet import reactor
    r = Example(allowNone=True)
    reactor.listenTCP(7080, server.Site(r))
    reactor.run()
And the client code:
import xmlrpclib

s = xmlrpclib.Server('http://localhost:7080/')
with open('test.pdf', 'rb') as fd:
    f = xmlrpclib.Binary(fd.read())
s.write(f, 'output.pdf')
I get xmlrpclib.Fault: <Fault 8002: "Can't deserialize input: "> when I test this. Is it because the file is a PDF?
XML-RPC is a poor choice for file transfers. It requires the file content to be encoded in a way that XML supports, which is expensive in both runtime and network resources. Instead, try just POSTing or PUTting the file using plain old HTTP.
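For illustration, here is a rough sketch (the port, URL path, and file names are made up) of the plain-HTTP alternative with twisted.web: the server writes the uploaded request body to disk, and the client PUTs the file using only the Python 2 standard library, with no XML encoding of the content.

# Server: accept a file over plain HTTP instead of XML-RPC (illustrative only).
from twisted.web import resource, server
from twisted.internet import reactor

class Upload(resource.Resource):
    isLeaf = True

    def render_PUT(self, request):
        # request.content is a file-like object with the request body;
        # Twisted spools large bodies to a temporary file for you.
        with open('output.pdf', 'wb') as fd:
            fd.write(request.content.read())
        return 'stored\n'

reactor.listenTCP(7080, server.Site(Upload()))
reactor.run()

# Client: PUT the file with the standard library.
import httplib

conn = httplib.HTTPConnection('localhost', 7080)
with open('test.pdf', 'rb') as fd:
    conn.request('PUT', '/upload', fd.read())
print conn.getresponse().status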