Python can't make sense of C++ string sent over Winsock

Goal:
I am writing a socket server/client program (c++ is the server, python is the client) to send xml strings that carry data. My goal is to be able to receive an xml message from c++ in Python via socket.
Method:
VS2013 pro
Python 2.7.2 via Vizard 4.1
1) socket communication is created just fine, no problems. I can send/receive stuff
2) after communications are initialized, c++ begins creating xml objects using Cmarkup
3) c++ converts the xml object to std::string type
4) c++ sends the std::string over the stream to Python
Problem:
The "string" received in Python from C++ is interpreted as garbage symbols (not trying to offend; someone may have strong feelings for them, I do not ;) that look like the symbols you'd see in Notepad if you opened a binary file. This is not surprising, since data sent over the stream is binary.
What I cannot figure out is how to get Python to make sense of the stream.
Failed Attempts to fix:
1) made sure that VS2013 project uses Unicode characters
2) tried converting the stream to a Python string and decoding it with string.decode()
3) tried using unicode()
4) also tried using binascii() methods to get something useful, small improvement but still not the same characters I sent from c++
If anyone can lend some insight on why this is happening I'd be most grateful. I have read several forums about the way data is sent over sockets, but this aspect of encoding and decoding is still spam-mackerel-casserole to my mind.
Here's the server code that creates xml, converts to string, then sends
MCD_CSTR rootname("ROOT");//initializes name for root node
MCD_CSTR Framename("FRAME");//creates name for child node
CMarkup xml;//initializes xml object using Cmarkup method
xml.AddElem(rootname);//create the root node
xml.IntoElem();//move into it
xml.AddElem(Framename, MyClient.GetFrameNumber().FrameNumber);//create child node with data from elsewhere, FrameNumber is an int
CStringA strXML = xml.GetDoc();//convert the xml object to a string using Cmarkup method
std::string test(strXML);//convert the CstringA to a std::string type
std::cout << test << '\n';//verify that the xml as a string looks right
std::cout << typeid(test).name() << '\n';//make sure it is the right type
iSendResult = send(ClientSocket, (char *)&test, sizeof(test), 0);//send the string to the client
Here is the code to receive the xml string in Python:
while 1:
    data = s.recv(1024)  # receive the stream with a larger-than-required buffer
    print(data)  # see what is in there
    if not data: break  # if no data then stop listening for more

Since test is a string, this cannot work:
iSendResult = send(ClientSocket, (char *)&test, sizeof(test), 0);//send the string
The std::string is not a character array. It is an object, and all that line does is send nonsensical bytes to the socket. You want to send the data, not the object.
iSendResult = send(ClientSocket, (char *)test.c_str(), test.length(), 0);//send the string

You can't just write the memory at the location of a std::string and think that's serialization. Depending on how the C++ library implemented it, std::string is likely to be a structure containing a pointer to the actual character data. If you transmit the pointer, not only will you fail to send the character data, but the pointer value is meaningless in any other context than the current instance of the program.
Instead, serialize the important contents of the string. Send the length, then send the character data itself. Something like this:
uint32_t len = test.length();
send(..., &len, sizeof(uint32_t), ...);
send(..., test.c_str(), len, ...);
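On the Python side you then need to read the 4-byte length first and keep calling recv() until the whole payload has arrived, because recv() may return fewer bytes than requested. Here is a minimal sketch of the receiving end, assuming the sender writes the uint32_t in native little-endian byte order (use '!I' in Python and htonl() in C++ if you prefer network byte order); recv_exact and recv_xml_string are just illustrative helper names:
    import struct

    def recv_exact(sock, n):
        # keep reading until exactly n bytes have arrived
        buf = b''
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise EOFError('socket closed early')
            buf += chunk
        return buf

    def recv_xml_string(sock):
        (length,) = struct.unpack('<I', recv_exact(sock, 4))  # 4-byte length prefix
        return recv_exact(sock, length)                       # the XML text itself
With that in place, data = recv_xml_string(s) returns exactly the XML string the C++ side sent, regardless of how the stream was split into packets.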

Related

Send already serialized message inside message

I'm using Protobuf with the C++ API and I have a standard message I send between 2 different programs, and I want to add a raw nested message as data.
So I added a message like this:
message main {
    string id = 1;
    string data = 2;
}
I tried to serialize some nested messages I made to a string and send them as "data" inside the "main" message, but it doesn't work well on the parser side.
How can I send a nested serialized message inside a message using the C++ and Python APIs?
Basically, use bytes:
message main {
    string id = 1;
    bytes data = 2;
}
In addition to not corrupting the data (string is strictly UTF-8), as long as the payload is a standard message, this is also compatible with changing it later (at either end, or both) to the known type:
message main {
    string id = 1;
    TheOtherMessageType data = 2;
}
message TheOtherMessageType { ... }
(or even using both versions at different times depending on which is most convenient)
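For illustration, here is a minimal Python sketch of the bytes approach, assuming generated modules named main_pb2 and other_pb2 (placeholder names; your protoc output will differ):
    # Hypothetical generated modules; substitute your own protoc output.
    import main_pb2
    import other_pb2

    # Sender: serialize the inner message and store it in the bytes field.
    inner = other_pb2.TheOtherMessageType()
    # ... fill in inner's fields ...
    outer = main_pb2.main()
    outer.id = "msg-1"
    outer.data = inner.SerializeToString()   # raw bytes are safe in a bytes field
    wire = outer.SerializeToString()

    # Receiver: parse the outer message, then parse the nested payload.
    received = main_pb2.main()
    received.ParseFromString(wire)
    inner_again = other_pb2.TheOtherMessageType()
    inner_again.ParseFromString(received.data)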

Deserializing a Streamed Protocol Buffer Message With Header and Repeated fields

I am working on deserializing a log file that has been serialized in C using protocol buffers (and NanoPB).
The log file has a short header composed of: entity, version, and identifier. After the header, the stream of data should be continuous and should log the fields from the sensors but not the header values (those should only occur once, at the beginning). The same .proto file was used to serialize the file. I do not have separate .proto files for the header and for the streamed data.
After my implementation, I assume it should look like this:
firmware "1.0.0"
GUID "1231214211321" (example)
Timestamp 123123
Sens1 2343
Sens2 13123
Sens3 13443
Sens4 1231
Sens5 190
Timestamp 123124
Sens1 2345
Sens2 2312
...
I posted this question to figure out how to structure the .proto file initially, when I was implementing the serialization in C. In the end I used a similar approach, but did not include the [(nanopb).max_count = 1]; option.
Finally I opted for the following .proto for Python (there can be more than 5 sensors):
syntax = "proto3";
import "timestamp.proto";

message SessionLogs {
    int32 Entity = 1;
    string Version = 2;
    string GUID = 3;
    repeated SessionLogsDetail LogDetail = 4;
}

message SessionLogsDetail
{
    int32 DataTimestamp = 1; // internal counter to identify the order of session logs

    // Sensor data, there can be X amount of sensors.
    int32 sens1 = 2;
    int32 sens2 = 3;
    int32 sens3 = 4;
    int32 sens4 = 5;
}
At this point, I can serialize a message as I log with my device, and judging by the file size the logging seems to work, but I have not been able to deserialize it offline in Python to check whether my implementation is correct. I can't do it in C since it's an embedded application, and I want to do the post-processing offline with Python.
Also, I have checked this online protobuf deserializer, where I can pass the serialized file and get it deserialized without needing the .proto file. In it I can see the header values (field 3 is empty so it's not shown) and the logged information. So this makes me think that the serialization is correct but that I am deserializing it wrongly in Python.
This is my current code used to deserialize the message in Python:
import PSessionLogs_pb2
with open('$PROTOBUF_LOG_FILENAME$', 'rb') as f:
    read_metric = PSessionLogs_pb2.PSessionLogs()
    read_metric.ParseFromString(f.read())
Besides this, I've used protoc to generate the .py equivalent of the .proto file to deserialize offline.
It looks like you've serialized a header, then serialized some other data immediately afterwards; meaning: instead of serializing a SessionLogs that has some SessionLogsDetail records, you've serialized a SessionLogs and then, separately, a SessionLogsDetail. Does that sound about right? If so: yes, that will not work correctly. There are ways to do what you're after, but it isn't quite as simple as serializing one after the other, because the root protobuf object is never terminated; what actually happens is that the later fields overwrite the root object's fields, matched by field number.
There are two ways of addressing this, depending on the data volume. If the size (including all of the detail rows) is small, you can just change the code so that it is a true parent/child relationship, i.e. so that the rows are all inside the parent. When writing the data, this does not mean that you need to have all the rows before you start writing; there are ways of appending child rows so that you are sending data as it becomes available. However, when deserializing, it will want to load everything in one go, so this approach is only useful if you're OK with that, i.e. you don't have obscene, open-ended numbers of rows.
If you have large numbers of rows, you'll need to add your own framing, essentially. This is often done by adding a length prefix before each payload, so that you can read a single message at a time. Some of the libraries include helper methods for this; for example, in the Java API this is parseDelimitedFrom and writeDelimitedTo. However, my understanding is that the Python API does not currently support this utility, so you'd need to do the framing yourself :(
To summarize, you currently have:
{header - SessionLogs}
{row 0 - SessionLogsDetail}
{row 1 - SessionLogsDetail}
option 1 is:
{header - SessionLogs
{row 0 - SessionLogsDetail}
{row 1 - SessionLogsDetail}
}
option 2 is:
{length prefix of header}
{header - SessionLogs}
{length prefix of row0}
{row 0 - SessionLogsDetail}
{length prefix of row1}
{row 1 - SessionLogsDetail}
(where the length prefix is something simple like a raw varint, or just a 4-byte integer in some agreed endianness)
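To make option 2 concrete, here is a minimal Python sketch of the reading side, assuming the writer puts a plain 4-byte big-endian length in front of every message (a varint prefix works the same way, just with different decoding); it only works if the C/NanoPB writer uses the exact same framing, and the name read_frames is purely illustrative:
    import struct
    import PSessionLogs_pb2  # generated by protoc from the .proto above

    def read_frames(stream):
        # yield the raw bytes of each length-prefixed frame
        while True:
            prefix = stream.read(4)
            if len(prefix) < 4:
                return                              # end of file
            (size,) = struct.unpack('>I', prefix)   # 4-byte big-endian length
            yield stream.read(size)

    with open('$PROTOBUF_LOG_FILENAME$', 'rb') as f:
        frames = read_frames(f)
        header = PSessionLogs_pb2.SessionLogs()
        header.ParseFromString(next(frames))        # first frame: the header
        for raw in frames:                          # remaining frames: detail rows
            row = PSessionLogs_pb2.SessionLogsDetail()
            row.ParseFromString(raw)
            print(row.DataTimestamp, row.sens1)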

safe and fast method for communicating a mixed list of integers and booleans between pyserial and Arduino

I want to send and receive a mixed list of integers and booleans between pyserial and an Arduino. I have figured out the Arduino-to-pyserial direction, well, partly:
Arduino code:
...
bool var1 = ...;
int var2 = ...;
...

void setup() {
    ...
    Serial.begin(9600);
    ...
}

void loop() {
    ...
    // here I send the data in CSV format
    Serial.print(var1);
    Serial.print(", ");
    Serial.print(var2);
    Serial.print(", ");
    ...
    Serial.println(var_n);
    ...
}
and the pyserial side:
import serial
serPort = serial.Serial("COM7")
dataRaw = serPort.readline().strip()
dataList = dataRaw.decode('ascii').split(', ')
data = [int(d) for d in dataList]
serPort.close()
However, the first issue is that Python blocks at readline. What I want to have is:
1) to be sure the serial buffers on both sides do not overflow; basically, if the receiver buffer is full, the sender keeps the data in its send buffer until the receiver buffer has some vacancy.
2) readline to run only when the receive buffer has actually received a new line; basically some form of interrupt that is triggered when a new line arrives in the receive buffer.
3) to read lines completely, i.e. all the text between two \n (newline) characters must be read.
Although I'm kind of cheating here and getting the boolean information as an integer in Python, it does the job for me. However, I have no idea how I should send a similar mixed list from pyserial to the Arduino. Consider the Python list:
data = [12345, True, 67890, False, ...]
How can I send the list above to the Arduino in a similarly safe and fast way? I don't think the Arduino has regex or can afford complicated split/strip string operations. What is the best way to send a list like the one above and then receive it on the Arduino side?
P.S. I did not ask this question on arduino.stackexchange because it is more general, in that it involves serial communication and Python.
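To illustrate the opposite direction with the same CSV convention the Arduino code above already prints (not necessarily the fastest possible scheme), booleans can simply be sent as 0/1 and the line terminated with '\n', so the Arduino can collect it with something like Serial.readStringUntil('\n') or a simple character loop. A minimal Python-side sketch, reusing the port name and baud rate from the question:
    import serial

    data = [12345, True, 67890, False]

    ser = serial.Serial("COM7", 9600, timeout=1)
    # booleans become 0/1, values are joined with ", " to mirror the Arduino output
    line = ", ".join(str(int(v)) for v in data) + "\n"
    ser.write(line.encode("ascii"))
    ser.close()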

AMF serialization for python3

I am trying to write a python3 encoder/decoder for AMF.
The reason I'm doing it is because I didn't find a suitable library that works on python3 (I'm looking for a non-obtrusive library - one that will provide me with the methods and let me handle the gateway myself)
Available libraries I tested for Python are amfast, pyamf and amfy. While the first two are for Python 2 (several forks of pyamf suggest that they support Python 3, but I couldn't get them to work), amfy was designed for Python 3 but lacks some features that I need (specifically object serialization).
Reading through the specification of AMF0 and AMF3, I was able to add a package encoder/decoder but I stumbled on object serialization and the available documentation was not enough (would love to see some examples). Existing libraries were of no help either.
Using remoteObject (in flex), I managed to send the following request to my parser:
b'\x00\x03\x00\x00\x00\x01\x00\x04null\x00\x02/1\x00\x00\x00\xe0\n\x00\x00\x00\x01\x11
\n\x81\x13Mflex.messaging.messages.CommandMessage\x13operation\x1bcorrelationId\x13
timestamp\x11clientId\x15timeToLive\tbody\x0fheaders\x17destination\x13messageId\x04\x05
\x06\x01\x04\x00\x01\x04\x00\n\x0b\x01\x01\n\x05\tDSId\x06\x07nil%DSMessagingVersion\x04
\x01\x01\x06\x01\x06I03ACB769-9733-6A6C-0923-79F667AE8249'
(notice that newlines were introduced to make the request more readable)
The headers are parsed OK, but when I get to the first object (the \n near the end of the first line), it is marked as a reference (LSB = 0) even though there is no earlier object it could reference.
Am I reading this wrong? Is this a malformed request?
Any help decoding these bytes will be welcomed.
From the AMF3 spec, section 4.1 NetConnection and AMF3:
The format of this messaging structure is AMF 0 (See [AMF0]). A context header value or message body can switch to AMF 3 encoding using the special avmplus-object-marker type.
What this means is that by default, the message body must be parsed as AMF0. Only when encountering an avmplus-object-marker (0x11) should you switch to AMF3. As a result, the 0x0a type marker in your value is not actually an AMF3 object-marker, but an AMF0 strict-array-marker.
Looking at section 2.12 Strict Array Type in the AMF0 spec, we can see that this type is simply defined as an u32 array-count, followed that number of value-types.
In your data, the array-count is 0x00, 0x00, 0x00, 0x01 (i.e. 1), and the value following that has a type marker of 0x11 - which is the avmplus-object-marker mentioned above. Thus, only after starting to parse the AMF0 array contents should you actually switch to AMF3 to parse the following object.
In this case, the object then is an actual AMF3 object (type marker 0x0a), followed by a non-dynamic U29O-traits with 9 sealed members. But I'm sure you can take it from here. :)
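To illustrate that dispatch, here is a rough Python 3 sketch; read_amf3_value stands in for your own AMF3 decoder, and only the two markers relevant here are handled:
    import struct

    AMF0_STRICT_ARRAY   = 0x0a
    AMF0_AVMPLUS_OBJECT = 0x11   # switch to AMF3 for the wrapped value

    def read_amf0_value(buf, pos):
        # decode one AMF0 value starting at pos, return (value, new_pos)
        marker = buf[pos]
        pos += 1
        if marker == AMF0_STRICT_ARRAY:
            (count,) = struct.unpack_from('>I', buf, pos)   # u32 array-count
            pos += 4
            items = []
            for _ in range(count):
                item, pos = read_amf0_value(buf, pos)
                items.append(item)
            return items, pos
        if marker == AMF0_AVMPLUS_OBJECT:
            return read_amf3_value(buf, pos)   # hand over to the AMF3 decoder
        raise NotImplementedError('AMF0 marker 0x%02x' % marker)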

Sending structured data over a network [closed]

I'm a beginner in network programming, so sorry if my questions appear a little obvious.
I'm trying to send some data from a Qt application to a Python server, which will process it and send back an answer.
The methods that allow me to send data in the QTcpSocket class are:
// ...
write(const QByteArray &)
write(const char *)
// ...
My application will manage authentication, and sending and receiving complex data like structs and files.
I have many questions about this situation:
Are the methods mentioned above sufficient to send complex data, and how?
How do I deal with the data types on the server side (with Python)?
Do you think I should use another protocol like HTTP (with the QNetworkAccessManager class)?
Trying to answer your questions:
Are the methods mentioned above sufficient to send complex data, and how?
Well, yes, sending a raw byte array is the lowest-level format. However, you need something that can get you uniquely from your complex data to the byte array and from the byte array back to your complex data.
This process goes by different names: encoding, serializing, marshalling... But in general it just means defining a scheme for encoding complex structures into a sequence of bytes or characters.
There are many you can choose from: ASN.1, JSON, XML, Google's protocol buffers or MIME...
You can even design your own. For example, a simple scheme is TLV (Tag-Length-Value), where the Tag identifies the type, the Length indicates how many bytes/characters are used to encode the Value, and the Value is either a basic type (you have to define a representation for each type you consider basic) or again one or more TLVs; a small sketch of this follows below.
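Purely as an illustration of such a home-grown TLV scheme (the tag numbers are made up for this sketch), the Python side could encode values like this:
    import struct

    TAG_INT = 0x01   # made-up tag numbers, only for this sketch
    TAG_STR = 0x02

    def tlv(tag, value_bytes):
        # 1-byte tag, 4-byte big-endian length, then the value itself
        return struct.pack('>BI', tag, len(value_bytes)) + value_bytes

    def encode_int(n):
        return tlv(TAG_INT, struct.pack('>i', n))

    def encode_str(s):
        return tlv(TAG_STR, s.encode('utf-8'))

    message = encode_int(42) + encode_str("hello")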
What to choose depends a lot on where you encode (language/platform) and where you decode (language/platform), and on your requirements for speed, bandwidth usage, transport, whether messages should be inspectable, etc.
If you're dealing with heterogeneous architectures you might also need to think about endianness.
Finally, you should distinguish between the format (i.e. how the complex structure is expressed as a sequence of bytes in the line) and the library used for encoding (or the library used for decoding). Sometimes they will be linked, sometimes for the same format you will have a choice of libraries to use.
How do I deal with the data types on the server side (with Python)?
So, here you have a requirement... if you're going for an externally provided format, you must make sure it has a Python library able to decode it.
If you're going for a home-grown solution, one of the things you should define is the expression of your complex C++ structures as Python structures.
An additional possibility is to do everything in C++, and for the Python server side use one of the systems for creating Python extensions in C++ (e.g. boost-python or SWIG).
Do you think I should use another protocol like HTTP (with the QNetworkAccessManager class)?
It depends on what you're trying to do.
There are many HTTP libraries widely available that you can use on different languages and different architectures.
You still need to solve the problem of deciding how to format your information (although HTTP has some established practices).
In addition HTTP is clearly biased towards the communication of a client with a server, with the action always initiated by the client.
Things get complex (or less widely supported) when the server is the one that needs to initiate the communication or to send spontaneous information.
I do not think it is the language's data structure type that should be the focus; it is more about the data you send. Different languages may have different data structures and so on, but that is really a low-level detail. What is more important is what you send.
You could look at the following example of how serialization/deserialization works with the JSON format in QtCore. JSON is also supported quite well in Python by the json module, so you would have no issue deserializing it on the server side:
JSON Save Game Example
This is basically the important part that should give you a hint for the client side. Do not get lost in the fact that it saves to a file; it is basically writing the raw bytes to a file, which you would replace with sending them over the network:
void Game::write(QJsonObject &json) const
{
    QJsonObject playerObject;
    mPlayer.write(playerObject);
    json["player"] = playerObject;

    QJsonArray levelArray;
    foreach (const Level level, mLevels) {
        QJsonObject levelObject;
        level.write(levelObject);
        levelArray.append(levelObject);
    }
    json["levels"] = levelArray;
}
... and then you would do something like this on the server side; again, instead of reading from a file you would read from the network, but that is not a biggie as both are IO.
import json
from pprint import pprint

json_data = open(file_directory).read()
data = json.loads(json_data)
pprint(data)
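As a rough sketch of the "read from the network instead of a file" part, here is a tiny Python server that expects one JSON document per connection; the port number is arbitrary, and closing the connection is used as the simplest possible end-of-document marker (a length prefix or newline delimiter would do equally well):
    import json
    import socket

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('0.0.0.0', 5000))      # port chosen arbitrarily for this sketch
    server.listen(1)

    conn, addr = server.accept()
    chunks = []
    while True:
        chunk = conn.recv(4096)
        if not chunk:                   # client closed the socket: document complete
            break
        chunks.append(chunk)
    conn.close()

    data = json.loads(b''.join(chunks).decode('utf-8'))
    print(data.get('player'), len(data.get('levels', [])))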
You could use a raw protocol and design your own, or just use an existing one. I would suggest going with something standard, like HTTP (TCP/UDP). Then you would only need to define the JSON format for your own data and not deal with all the rest, like one-way or two-way communication, transaction identifiers against replay attacks, timestamps, data sizes and so on.
This would allow you to truly concentrate on the stuff that is important for you. Once you have your own JSON format defined, you could look into the QtNetwork module to send POST, GET, PUT and DELETE requests as you wish.
You would probably work closely with the QNetworkAccessManager and QNetworkReply classes, and so on. Here is a simple client implementation in Qt using QtCore's JSON support, for simple pastebin functionality:
#include <QSslError>
#include <QNetworkAccessManager>
#include <QNetworkRequest>
#include <QNetworkReply>
#include <QTcpSocket>
#include <QJsonDocument>
#include <QJsonObject>
#include <QJsonParseError>
#include <QFile>
#include <QScopedPointer>
#include <QTextStream>
#include <QStringList>
#include <QCoreApplication>
#include <QDebug>
int main(int argc, char **argv)
{
    QCoreApplication application{argc, argv};
    application.setOrganizationName(R"("CutePaste")");
    application.setApplicationName(R"("CutePaste Desktop Console Frontend")");

    QTextStream standardOutputStream{stdout};
    QFile dataFile;
    QString firstArgument{QCoreApplication::arguments().size() < 2 ? QString() : QCoreApplication::arguments().at(1)};
    if (!firstArgument.isEmpty()) {
        dataFile.setFileName(firstArgument);
        dataFile.open(QIODevice::ReadOnly);
    } else {
        dataFile.open(stdin, QIODevice::ReadOnly);
    }

    QByteArray pasteTextByteArray{dataFile.readAll()};

    QJsonObject requestJsonObject;
    requestJsonObject.insert(QStringLiteral("data"), QString::fromUtf8(pasteTextByteArray));
    requestJsonObject.insert(QStringLiteral("language"), QStringLiteral("text"));

    QJsonDocument requestJsonDocument{requestJsonObject};

    QString baseUrlString{QStringLiteral(R"("http://pastebin.kde.org")")};

    QNetworkRequest networkRequest;
    networkRequest.setAttribute(QNetworkRequest::DoNotBufferUploadDataAttribute, true);
    networkRequest.setHeader(QNetworkRequest::ContentTypeHeader, R"("application/json")");
    networkRequest.setUrl(QUrl(baseUrlString + R"("/api/json/create")"));

    QNetworkAccessManager networkAccessManager;
    QScopedPointer<QNetworkReply> networkReplyScopedPointer(networkAccessManager.post(networkRequest, requestJsonDocument.toJson()));

    QObject::connect(networkReplyScopedPointer.data(), &QNetworkReply::finished, [&] {
        QJsonParseError jsonParseError;
        QByteArray replyJsonByteArray{networkReplyScopedPointer->readAll()};
        QJsonDocument replyJsonDocument{QJsonDocument::fromJson(replyJsonByteArray, &jsonParseError)};
        if (jsonParseError.error != QJsonParseError::NoError) {
            qDebug() << R"("The json network reply is not valid json:")" << jsonParseError.errorString();
            QCoreApplication::quit();
        }

        if (!replyJsonDocument.isObject()) {
            qDebug() << R"("The json network reply is not an object")";
            QCoreApplication::quit();
        }

        QJsonObject replyJsonObject{replyJsonDocument.object()};
        QJsonValue resultValue{replyJsonObject.value(QStringLiteral("result"))};
        if (!resultValue.isObject()) {
            qDebug() << R"("The json network reply does not contain an object for the "result" key")";
            QCoreApplication::quit();
        }

        QJsonValue identifierValue{resultValue.toObject().value(QStringLiteral("id"))};
        if (!identifierValue.isString()) {
            qDebug() << R"("The json network reply does not contain a string for the "id" key")";
            QCoreApplication::quit();
        }

        endl(standardOutputStream << baseUrlString << '/' << identifierValue.toString());
        QCoreApplication::quit();
    });

    QObject::connect(networkReplyScopedPointer.data(), static_cast<void (QNetworkReply::*)(QNetworkReply::NetworkError)>(&QNetworkReply::error), [&](QNetworkReply::NetworkError networkReplyError) {
        if (networkReplyError != QNetworkReply::NoError)
            endl(standardOutputStream << networkReplyScopedPointer->errorString());
    });

    QObject::connect(networkReplyScopedPointer.data(), &QNetworkReply::sslErrors, [&](QList<QSslError> networkReplySslErrors) {
        if (!networkReplySslErrors.isEmpty()) {
            for (const auto &networkReplySslError : networkReplySslErrors)
                endl(standardOutputStream << networkReplySslError.errorString());
        }
    });

    int returnValue{application.exec()};

    dataFile.close();
    if (dataFile.error() != QFileDevice::NoError)
        endl(standardOutputStream << dataFile.errorString());

    return returnValue;
}
The JSON is defined in here:
http://sayakb.github.io/sticky-notes/pages/api/
For sure, it is not the only way of doing it, e.g. if you need efficiency, you may well look into a binary format like capnproto.
