I'm trying to build a bridge between a Redis server and MQTT, so that when the Redis database is updated, these updates are dispatched via MQTT to clients.
For this a client (only one, the bridge) connects to the Redis database and starts to monitor it.
My issue is with parsing the commands, more specifically the arguments contained in them, which are a whitespace-separated list of strings.
For example, when I store the following hash in Redis
data = {
    "key-3-1-json": "value-1",
    "key-3-2-json": 'this "this is \'quoted\' text"',
}
print r18.hmset("test-hash", {
    "key-1": "value-1",
    "key-2": 'this "this is \'quoted\' text"',
    "key-3": json.dumps(data),
})
the client receives the following
1549578825.1 0 HMSET test-hash "key-3" "{\"key-3-1-json\": \"value-1\", \"key-3-2-json\": \"this \\\"this is 'quoted' text\\\"\"}" "key-2" "this \"this is 'quoted' text\"" "key-1" "value-1"
As you can see, I'm already parsing the timestamp, database id, command and key, but I don't know how to turn the last part into a list of strings.
This message would then be sent over MQTT as
mqtt.publish("redis/mon/0/HMSET/test-hash", json.dumps(args))
where args would be
[
"key-3",
"{\"key-3-1-json\": \"value-1\", \"key-3-2-json\": \"this \\\"this is 'quoted' text\\\"\"}",
"key-2",
"this \"this is 'quoted' text\"",
"key-1",
"value-1"
]
This would probably be the most complex case; usually the args would be a single string, e.g. when r18.set is used instead of r18.hmset.
I think there must be some built-in module in Python which can do this, as it is like parsing a command-line string.
The subprocess module's documentation points to shlex.split() (shlex: Simple lexical analysis) for splitting command-line-style strings like this.
Calling shlex.split(args_str) effectively converts the arguments string into the desired list of substrings.
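For example, a minimal sketch (args_str is assumed to hold the tail of the MONITOR line after the key):

import shlex

args_str = '"key-1" "value-1" "key-2" "this \\"quoted\\" text"'
print shlex.split(args_str)
# ['key-1', 'value-1', 'key-2', 'this "quoted" text']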
I'm using Protobuf with the C++ API. I have a standard message I send between two different programs, and I want to add a raw nested message as data.
So I added a message like this:
message main {
  string id = 1;
  string data = 2;
}
I tried serializing some nested messages I made to a string and sending the result as "data" in the "main" message, but it doesn't parse well on the receiving side.
How can I send a serialized nested message inside a message using the C++ and Python APIs?
Basically, use bytes:
message main {
  string id = 1;
  bytes data = 2;
}
In addition to not corrupting the data (string is strictly UTF-8), as long as the payload is a standard message, this is also compatible with changing it later (at either end, or both) to the known type:
message main {
  string id = 1;
  TheOtherMessageType data = 2;
}

message TheOtherMessageType {...}
(or even using both versions at different times depending on which is most convenient)
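For illustration, here is a minimal Python sketch of the round trip with the bytes field; the generated module name my_pb2 is an assumption based on the .proto above:

import my_pb2  # hypothetical module generated from the .proto above

inner = my_pb2.TheOtherMessageType()
# ... fill in inner's fields ...

outer = my_pb2.main()
outer.id = "msg-1"
outer.data = inner.SerializeToString()  # raw bytes are safe in a bytes field
wire = outer.SerializeToString()

# Receiving side: parse the outer message, then the nested payload.
received = my_pb2.main()
received.ParseFromString(wire)
inner_again = my_pb2.TheOtherMessageType()
inner_again.ParseFromString(received.data)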
As we know, we can send a key with the Kafka producer, which is hashed internally to determine which partition in the topic the data goes to.
I have a producer where I am sending data in JSON format.
kafka-console-producer --broker-list 127.0.0.1:9092 --topic USERPROFILE << EOF
{"user_id" : 100, "firstname":"Punit","lastname":"Gupta", "countrycode":"IN", "rating":4.9 }
{"user_id" : 101, "firstname":"eli","lastname":"eli", "countrycode":"GB", "rating":3.0 }
EOF
Now I want to use "countrycode" as my key while sending data.
With normal delimited data we can specify two properties:
--property "parse.key=true"
--property "key.separator=:
But how do I do it when sending JSON data?
I am using Confluent's Python API for Kafka. If there is anything that I have to write in terms of classes or functions to achieve this, I would be grateful if you could show it in Python.
JSON is just a string. The console producer doesn't parse JSON, only the Avro console producer does.
I would avoid key.separator=: since JSON contains :. You could use the | character instead; then you just type out
countrycode|{"your":"data"}
In Python, the produce function takes a key, yes. You can parse your data like this in order to extract the value to use as the key.
from confluent_kafka import Producer
import json

producer = Producer({'bootstrap.servers': '127.0.0.1:9092'})  # broker from the question

key = 'countrycode'
records = [
    {"user_id": 100, "firstname": "Punit", "lastname": "Gupta", key: "IN", "rating": 4.9},
    {"user_id": 101, "firstname": "eli", "lastname": "eli", key: "GB", "rating": 3.0},
]

for r in records:
    # first record will send a record containing ('IN', { ... 'countrycode': 'IN'})
    producer.produce('topic', key=r[key], value=json.dumps(r))

producer.flush()
I am trying to understand Avro serialization on Confluent Kafka along with Schema Registry usage. It was all going well till the end, but the final expectations from Avro caused a lot of confusion for me. As per my reading and understanding, Avro serialization gives us the flexibility that when we have a change in schema, we can simply manage it without impacting the older producer/consumer.
Following that, I have developed a Python producer which will check for the schema's existence in the Schema Registry, create it if absent, and start producing the JSON messages shown below. When I need to change the schema, I simply update it in my producer and it produces messages with the new schema.
My Old Schema :
data = '{"schema":"{\\"type\\":\\"record\\",\\"name\\":\\"value\\",\\"namespace\\":\\"my.test\\",\\"fields\\":[{\\"name\\":\\"fname\\",\\"type\\":\\"string\\"},{\\"name\\":\\"lname\\",\\"type\\":\\"string\\"},{\\"name\\":\\"email\\",\\"type\\":\\"string\\"},{\\"name\\":\\"principal\\",\\"type\\":\\"string\\"},{\\"name\\":\\"ipaddress\\",\\"type\\":\\"string\\"},{\\"name\\":\\"mobile\\",\\"type\\":\\"long\\"},{\\"name\\":\\"passport_make_date\\",\\"type\\":[\\"string\\",\\"null\\"],\\"logicalType\\":\\"timestamp\\",\\"default\\":\\"None\\"},{\\"name\\":\\"passport_expiry_date\\",\\"type\\":\\"string\\",\\"logicalType\\":\\"date\\"}]}"}'
Sample Data from Producer-1 :
{u'mobile': 9819841242, u'lname': u'Rogers', u'passport_expiry_date': u'2026-05-21', u'passport_make_date': u'2016-05-21', u'fname': u'tom', u'ipaddress': u'208.103.236.60', u'email': u'tom_Rogers#TEST.co.nz', u'principal': u'tom#EXAMPLE.COM'}
My New Schema:
data = '{"schema":"{\\"type\\":\\"record\\",\\"name\\":\\"value\\",\\"namespace\\":\\"my.test\\",\\"fields\\":[{\\"name\\":\\"fname\\",\\"type\\":\\"string\\"},{\\"name\\":\\"lname\\",\\"type\\":\\"string\\"},{\\"name\\":\\"email\\",\\"type\\":\\"string\\"},{\\"name\\":\\"principal\\",\\"type\\":\\"string\\"},{\\"name\\":\\"ipaddress\\",\\"type\\":\\"string\\"},{\\"name\\":\\"mobile\\",\\"type\\":\\"long\\"},{\\"name\\":\\"new_passport_make_date\\",\\"type\\":[\\"string\\",\\"null\\"],\\"logicalType\\":\\"timestamp\\",\\"default\\":\\"None\\"},{\\"name\\":\\"new_passport_expiry_date\\",\\"type\\":\\"string\\",\\"logicalType\\":\\"date\\"}]}"}'
Sample Data from Producer-2 :
{u'mobile': 9800647004, u'new_passport_make_date': u'2011-05-22', u'lname': u'Reed', u'fname': u'Paul', u'new_passport_expiry_date': u'2021-05-22', u'ipaddress': u'134.124.7.28', u'email': u'Paul_Reed#nbc.com', u'principal': u'Paul#EXAMPLE.COM'}
Case 1: When I have the two producers with the above two schemas running together, I can successfully consume messages with the code below. All is well till here.
while True:
    try:
        msg = c.poll(10)
    except SerializerError as e:
        xxxxx
        break
    print msg.value()
Case 2: When I go a little deeper into the JSON fields, things mix up and break.
At first, say I have one producer running with ‘My Old Schema’ above and one consumer consuming these messages successfully.
print msg.value()["fname"] , msg.value()["lname"] , msg.value()["passport_make_date"], msg.value()["passport_expiry_date"]
When I run the 2nd producer with 'My New Schema' mentioned above, my old consumer breaks, as there are no fields passport_expiry_date and passport_make_date, which is true.
Question:
Sometimes I think this is expected, as it's me (the developer) who is using field names which are not in the message. But how can Avro help here? Shouldn't the missing field be handled by Avro? I saw examples in Java where this situation was handled properly, but did not find any example in Python. For example, the GitHub repo below has a perfect example of handling this scenario: when the field is not present, the consumer simply prints 'None'.
https://github.com/LearningJournal/ApacheKafkaTutorials
Case 3: When I run combinations like the old producer with the old consumer, and in other terminals the new producer with the new consumer, the producers/consumers mix up and things break saying there is no such JSON field.
Old Consumer ==>
print msg.value()["fname"] , msg.value()["lname"] , msg.value()["passport_make_date"], msg.value()["passport_expiry_date"]
New Consumer ==>
print msg.value()["fname"] , msg.value()["lname"] , msg.value()["new_passport_make_date"], msg.value()["new_passport_expiry_date"]
Question:
Again, I think this is expected. But then Avro makes me think the right consumer should get the right message with the right schema. If I use msg.value() and always parse the fields on the consumer side programmatically, without any role for Avro, then where is the benefit of using Avro? What is the benefit of sending the schema with the messages/storing it in the Schema Registry?
Lastly, is there any way to check the schema attached to a message? I understand that in Avro a schema ID is attached to the message, which is then used with the Schema Registry while reading and writing messages. But I never see it with the messages.
Thanks much in advance.
It's not clear what compatibility setting you're using on the registry, but I will assume backwards, which means you would have needed to add a field with a default.
Sounds like you're getting a Python KeyError because those keys don't exist.
Instead of msg.value()["non-existing-key"], you can try
option 1: treat it like a dict()
msg.value().get("non-existing-key", "Default value")
option 2: check individually for all the keys that might not be there
some_var = None  # What you want to parse
val = msg.value()
if "non-existing-key" not in val:
    some_var = "Default Value"
else:
    some_var = val["non-existing-key"]
Otherwise, you must "project" the newer schema over the older data, which is what the Java code is doing by using a SpecificRecord subclass. That way, the older data would be parsed with the newer schema, which has the newer fields with their defaults.
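As a rough illustration of that projection on the Python side (the helper below is hypothetical and assumes msg.value() returns a plain dict):

NEW_FIELDS = ["fname", "lname", "new_passport_make_date", "new_passport_expiry_date"]

def project(record, fields=NEW_FIELDS, default=None):
    # Keep only the fields the new schema knows about, filling gaps with a default.
    return {f: record.get(f, default) for f in fields}

row = project(msg.value())
print row["new_passport_make_date"]  # None for records written with the old schema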
If you used GenericRecord in Java instead, you would have similar problems. I'm not sure there is an equivalent to Java's SpecificRecord in Python.
By the way, I don't think the string "None" can be applied for a logicalType=timestamp.
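On the last question about seeing the schema attached to a message: with the Confluent serializer, the raw (undeserialized) message value starts with a magic byte (0) followed by the 4-byte big-endian schema ID, which the Avro deserializer strips before handing you msg.value(). A small sketch of reading it from a raw value (the helper name is mine):

import struct

def schema_id_of(raw_value):
    # Confluent wire format: magic byte (0) + 4-byte schema ID + Avro payload.
    magic, schema_id = struct.unpack('>bI', raw_value[:5])
    assert magic == 0
    return schema_id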
I'm working on an API written in Python that accepts JSON payloads from clients, applies some validation and stores the payloads in MongoDB so that they can be processed asynchronously.
However, I'm running into some trouble with payloads that (legitimately) include keys that start with $ and/or contain the . character. According to the MongoDB documentation, my best bet is to escape these characters:
In some cases, you may wish to build a BSON object with a user-provided key. In these situations, keys will need to substitute the reserved $ and . characters. Any character is sufficient, but consider using the Unicode full width equivalents: U+FF04 (i.e. “$”) and U+FF0E (i.e. “.”).
Fair enough, but here's where it gets interesting. I'd like this process to be transparent to the application, so:
The keys should be unescaped when retrieving the documents...
...but only keys that needed to be escaped in the first place.
For example, suppose a (nefarious) user sends a JSON payload that includes a key like \uff04mixed.chars. When the application gets this document from the storage backend, this key should be converted back into \uff04mixed.chars, not $mixed.chars.
My primary concern here is information leakage; I don't want somebody to discover that the application requires special treatment for $ and . characters. The bad guys probably know how to exploit MongoDB way better than I know how to secure it, and I don't want to take any chances.
Here's the approach I ended up going with:
Before inserting a document into Mongo, run it through a SONManipulator that searches for and escapes any illegal keys in the document.
The original keys get stored as a separate attribute in the document so that we can restore them later.
After retrieving a document from Mongo, run it through the SONManipulator to extract the original keys and restore them.
Here's an abbreviated example:
# Example of a document with naughty keys.
document = {
    '$foo': 'bar',
    '$baz': 'luhrmann'
}
##
# Before inserting the document, we must first run it through our
# SONManipulator.
manipulator = KeyEscaper()
escaped = manipulator.transform_incoming(document, collection.name)
# Now we can insert the document.
document_id = collection.insert_one(escaped).inserted_id
##
# Later, we retrieve the document.
raw = collection.find_one({'_id': document_id})
# Run the document through our KeyEscaper to restore the original
# keys.
unescaped = manipulator.transform_outgoing(raw, collection.name)
assert unescaped == document
The actual document stored in MongoDB looks like this:
{
    "_id": ObjectId('582cebe5cd9b344c814d98e3'),
    "__escaped__1": "luhrmann",
    "__escaped__0": "bar",
    "__escaped__": {
        "__escaped__1": ["$baz", {}],
        "__escaped__0": ["$foo", {}]
    }
}
Note the __escaped__ attribute that contains the original keys so that they can be restored when the document is retrieved.
This makes querying against the escaped keys a little tricky, but that's infinitely preferable to not being able to store the document in the first place.
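For reference, a much-simplified sketch of the escape/unescape idea (hypothetical helpers, top-level keys only; this is not the gist's actual code):

def transform_incoming(doc):
    # Replace keys MongoDB would reject and remember the originals.
    escaped, originals, i = {}, {}, 0
    for key, value in doc.items():
        if key.startswith('$') or '.' in key:
            placeholder = '__escaped__%d' % i
            escaped[placeholder] = value
            originals[placeholder] = key
            i += 1
        else:
            escaped[key] = value
    if originals:
        escaped['__escaped__'] = originals
    return escaped

def transform_outgoing(doc):
    # Restore the original keys; untouched keys (e.g. _id) pass through.
    originals = doc.pop('__escaped__', {})
    return {originals.get(k, k): v for k, v in doc.items()}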
Full code with unit tests and example usage:
https://gist.github.com/todofixthis/79a2f213989a3584211e49bfba582b40
I have an Android app that originally posted some strings in JSON format to a Python CGI script, which all worked fine. The problem is when the JSON object contains lists: Python (using simplejson) still treats them as one big string when it receives them.
Here is a text dump of the JSON once it reaches Python, before I parse it:
{"Prob1":"[1, 2, 3]","Name":"aaa","action":1,"Prob2":"[20, 20, 20]","Tasks":"[1 task, 2 task, 3 task]","Description":""}
If we look at the "Tasks" key, the list after it is clearly a single string with the elements all treated as one string (i.e. no quotes around each element). It's the same for Prob1 and Prob2; action, Name etc. are all fine. I'm not sure if this is what Python is expecting, but I'm guessing not.
Just in case the Android data was to blame, I added quotes around each element of the ArrayList like this:
Tasks.add('"'+row.get(1).toString()+'"'); instead of Tasks.add(row.get(1).toString());
On the webserver it's now received as
{"Prob1":"[1, 2, 3]","Name":"aaa","action":1,"Prob2":"[20, 20, 20]","Tasks":"[\"1 task\", \"2 task\", \"3 task\"]","Description":""}
but I still get the same problem; when I iterate through "Tasks" in a loop, it loops through each individual character as if the whole thing were a string :/
Since I don't know what the JSON structure should look like before it gets to Python, I'm wondering whether it's a problem with the Android side sending the data or with my Python interpreting it... though from the looks of it, my guess is the sending.
In the Android app I'm sending one big JSONObject containing "Tasks" and the associated ArrayList as one of the key/value pairs... is this correct? Or should JSONArray be involved anywhere?
Thanks for any help everyone. I'm new to the whole JSON thing as well as to Android/Java (and only really a novice at Python too). I can post additional code if anyone needs it; I just didn't want to lengthen the post too much.
EDIT:
when I add
json_data=json_data.replace(r'"[','[')
json_data=json_data.replace(r']"',']')
json_data=json_data.replace(r'\"','"')
to the Python it WORKS! But that strikes me as a bit nasty and just papering over a crack.
Tasks is just a big string. To be a valid list, it would have to be ["1 task", "2 task", "3 task"].
The same goes for Prob1 and Prob2: to be valid lists, the brackets should not be enclosed in quotes.
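To illustrate, a small sketch of what the consumer sees, plus a possible Python-side workaround if the sending side can't easily be changed to use a real JSON array (this assumes the quoted "Tasks" variant from the edit, since only that inner string is itself valid JSON):

import json

# Payload shaped like the second dump in the question (elements quoted).
payload = '{"Prob1":"[1, 2, 3]","Name":"aaa","Tasks":"[\\"1 task\\", \\"2 task\\"]"}'
data = json.loads(payload)

# data["Tasks"] is still a string, not a list; decode it a second time.
tasks = json.loads(data["Tasks"])
print tasks  # a real list: ['1 task', '2 task'] (unicode strings in Python 2)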