Trying to send string variable via Python socket - python

I'm in a CTF competition and I'm stuck on a challenge where I have to retrieve a string from a socket, reverse it and get it back. The string changes too fast to do it manually. I'm able to get the string and reverse it but am failing at sending it back. I'm pretty sure I'm either trying to do something that's not possible or am just too inexperienced at Python/sockets/etc. to kung fu my way through.
Here's my code:
import socket
aliensocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
aliensocket.connect(('localhost', 10000))
aliensocket.send('GET_KEY'.encode())
key = aliensocket.recv(1024)
truncKey = str(key)[2:16]
revKey = truncKey[::-1]
print(truncKey)
print(revKey)
aliensocket.send(bytes(revKey.encode('UTF-8')))
print(aliensocket.recv(1024))
aliensocket.close()
And here is the output:
F9SIJINIK4DF7M
M7FD4KINIJIS9F
b'Server expects key to unlock or GET_KEY to retrieve the reversed key'

key is received as a byte string. The b'' wrapped around it when printed only indicates that it is a byte string; it is not part of the data. Calling str(key) turns that printed form, b'' included, into text, which is why the [2:16] slice was needed. .encode() turns a Unicode string into a byte string, but for a literal you can write the byte string directly by prefixing it with b.
Just do:
aliensocket.send(b'GET_KEY')
key = aliensocket.recv(1024)
truncKey = key[:14] # the first 14 bytes, matching the [2:16] slice of str(key)
revKey = truncKey[::-1]
print(truncKey) # or do truncKey.decode() if you don't want to see b''
print(revKey)
aliensocket.send(revKey)
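Putting those pieces together, the whole exchange could be sketched roughly as below. This is only a sketch under assumptions taken from the question: the key is the first 14 bytes of the reply, and the server accepts the reversed key as raw bytes. The function name solve_challenge and the key_len parameter are made up for illustration.

```python
import socket

def solve_challenge(sock, key_len=14):
    """Request the key, reverse it as bytes, send it back, and return the reversed key."""
    sock.sendall(b'GET_KEY')
    # Assumption: the key is the first key_len bytes of the reply.
    key = sock.recv(1024)[:key_len]
    # Slicing works directly on bytes, so no str() round trip is needed.
    rev_key = key[::-1]
    sock.sendall(rev_key)
    return rev_key
```

Against the server from the question, this would be called with a socket from socket.create_connection(('localhost', 10000)), followed by one more recv() to read the server's verdict.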

data = b''
while True:
    chunk = aliensocket.recv(1)
    data += chunk
    if not chunk:
        # recv() returned b'': the server has stopped sending, so send back the reversed data
        rev = data[::-1]
        aliensocket.sendall(rev)
        break

Related

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 398: invalid start byte || book python for everyone

Hi, I am trying to pull an image from a web server using socket programming in Python. While going through the Python for Everyone book, I found an example in the networked programming chapter and copied the code from that example, urljpeg.py:
import socket
import time
#HOST = 'data.pr4e.org'
#PORT = 80
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
mysock.sendall(b'GET http://data.pr4e.org/cover3.jpg HTTP/1.0\r\n\r\n')
count = 0
picture = b""
while True:
    data = mysock.recv(5120)
    if len(data) < 1: break
    # time.sleep(0.25)
    count = count + len(data)
    print(len(data), count)
    picture = picture + data
mysock.close()
# look for the end of the header (2crlf)
pos = picture.find(b"r\n\r\n")
print("Header length ", pos)
print(picture[:pos].decode())
# skip past the header and save the picture data
picture = picture[pos+4:]
fhand = open("stuff.jpg","wb")
fhand.write(picture)
fhand.close()
The error message indicates that you are trying to decode data which is not utf-8. So why is this happening? Let's take a step back and look at what the code is doing:
# look for the end of the header (2crlf)
pos = picture.find(b"r\n\r\n")
print("Header length ", pos)
print(picture[:pos].decode())
We're trying to find a sequence of \r\n\r\n, i.e. CR LF CR LF in the data. This would be the empty line that separates the HTTP header (which should be in ASCII, which is a subset of UTF-8) from the actual image data. Then we try to decode everything up to that point as a string. So why does it fail? The program conveniently prints the header length, and in the bit you posted earlier we could see that this was -1, which means that the picture.find call did not find anything! Why not? Well, look carefully at what the code actually does:
# look for the end of the header (2crlf)
pos = picture.find(b"r\n\r\n")
It should be looking for \r\n\r\n, but it is actually looking for r\n\r\n!
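To see the difference without touching the network, here is a small sketch that runs both searches on a made-up response. The response bytes are a hypothetical stand-in for what the loop accumulates in picture; only the two find() patterns come from the code above.

```python
# Hypothetical stand-in for the bytes collected in `picture`:
# an ASCII header, a blank line, then binary JPEG data.
response = b"HTTP/1.1 200 OK\r\nContent-Type: image/jpeg\r\n\r\n\xff\xd8\xff\xe0"

wrong = response.find(b"r\n\r\n")    # missing backslash: searches for the letter 'r'
right = response.find(b"\r\n\r\n")   # CR LF CR LF, the real end of the header

print("wrong pattern:", wrong)       # -1 means "not found"
header = response[:right].decode()   # safe: the header part is plain ASCII
body = response[right + 4:]          # skip the 4 delimiter bytes
print(header)
print(len(body), "bytes of image data")
```

With wrong equal to -1, the slice response[:wrong] covers everything except the last byte, so the binary image data ends up in the decode() call, which is where the UnicodeDecodeError comes from.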

How can combine few base64 audio chunks (from microphone)

I receive base64-encoded audio chunks from a microphone.
I need to concatenate them and send them to the Google API as a single base64 string for speech recognition. Roughly speaking, the first chunk encodes the word Hello and the second encodes world!. I need to glue the two chunks together, send them to the Google API as one string, and receive Hello world! in response.
You can see Google Speech-to-Text as an example. Google also sends the microphone data as a base64 string over websockets (see Network).
Unfortunately, I don't have a microphone at hand, so I can't test this, and it has to be done now.
Suppose I get
chunk1 = "TgvsdUvK ...."
chunk2 = "UZZxgh5V ...."
Do I understand correctly that this would be enough:
base64.b64encode(chunk1 + chunk2)
Or is there something else I need to know? Unfortunately, I can't test anything without a microphone.
Your example of encoding chunk1 + chunk2 wouldn't work, since base64 strings can have padding at the end. If you just concatenate two base64 strings, the result cannot be decoded correctly.
For example, the strings StringA and StringB, when their ascii or utf-8 representations are encoded in base64, are the following: U3RyaW5nQQ== and U3RyaW5nQg==. Each one of those can be decoded fine. But if you concatenated them, your result would be U3RyaW5nQQ==U3RyaW5nQg==, which is invalid:
import base64
concatenated_b64_strings = 'U3RyaW5nQQ==U3RyaW5nQg=='
concatenated_b64_strings_bytes = concatenated_b64_strings.encode('ascii')
decoded_strings = base64.b64decode(concatenated_b64_strings_bytes)
print(decoded_strings.decode('ascii')) # just outputs 'StringA', which is incorrect
So, in order to take those two strings (which I'm using as an example in place of binary data) and concatenate them together, starting with only their base64 representations, you have to decode them:
import base64
string1_base64 = 'U3RyaW5nQQ=='
string2_base64 = 'U3RyaW5nQg=='
# need to convert the strings to bytes first in order to decode them
base64_string1_bytes = string1_base64.encode('ascii')
base64_string2_bytes = string2_base64.encode('ascii')
# now, decode them into the actual bytes the base64 represents
base64_string1_bytes_decoded = base64.decodebytes(base64_string1_bytes)
base64_string2_bytes_decoded = base64.decodebytes(base64_string2_bytes)
# combine the bytes together
combined_bytes = base64_string1_bytes_decoded + base64_string2_bytes_decoded
# now, encode these bytes as base64
combined_bytes_base64 = base64.encodebytes(combined_bytes)
# finally, decode these bytes so you're left with a base64 string:
combined_bytes_base64_string = combined_bytes_base64.decode('ascii')
print(combined_bytes_base64_string) # output: U3RyaW5nQVN0cmluZ0I=
# let's prove that it concatenated successfully (you wouldn't do this in your actual code)
base64_combinedstring_bytes = combined_bytes_base64_string.encode('ascii')
base64_combinedstring_bytes_decoded_bytes = base64.decodebytes(base64_combinedstring_bytes)
base64_combinedstring_bytes_decoded_string = base64_combinedstring_bytes_decoded_bytes.decode('ascii')
print(base64_combinedstring_bytes_decoded_string) # output: StringAStringB
In your case, you'd be combining more than just two input base64 strings, but the process is the same. Take all the strings, encode each one to ascii bytes, decode them via base64.decodebytes(), and then add them all together via the += operator:
import base64
input_strings = ['U3RyaW5nQQ==', 'U3RyaW5nQg==']
input_strings_bytes = [input_string.encode('ascii') for input_string in input_strings]
input_strings_bytes_decoded = [base64.decodebytes(input_string_bytes) for input_string_bytes in input_strings_bytes]
combined_bytes = bytes()
for decoded in input_strings_bytes_decoded:
    combined_bytes += decoded
combined_bytes_base64 = base64.encodebytes(combined_bytes)
combined_bytes_base64_string = combined_bytes_base64.decode('ascii')
print(combined_bytes_base64_string) # output: U3RyaW5nQVN0cmluZ0I=
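The same steps can be folded into a small helper. This is a sketch, not a drop-in for the Google API: the name combine_base64_chunks is made up, and it swaps in base64.b64decode and base64.b64encode for decodebytes and encodebytes, because encodebytes inserts a newline after every 76 output characters and at the end, which is usually unwanted in a single string sent over an API.

```python
import base64

def combine_base64_chunks(chunks):
    """Decode each base64 chunk, join the raw bytes, and re-encode them as one string."""
    raw = b"".join(base64.b64decode(chunk) for chunk in chunks)
    # b64encode produces one unbroken line, unlike encodebytes
    return base64.b64encode(raw).decode("ascii")

combined = combine_base64_chunks(["U3RyaW5nQQ==", "U3RyaW5nQg=="])
print(combined)
print(base64.b64decode(combined))
```

Whether joined audio chunks are accepted by the recognizer also depends on the audio format itself (for example, whether each chunk carries its own file header), which base64 handling alone does not solve.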

Python String Prefix by 4 Byte Length

I'm trying to write a server in Python to communicate with a pre-existing client whose message packets are ASCII strings, but prepended by four-byte unsigned integer values representative of the length of the remaining string.
I've done a receiver, but I'm sure there's a more Pythonic way. Until I find it, I haven't done the sender. I can easily calculate the message length, convert it to bytes and transmit the message. The bit I'm struggling with is creating an integer which is an array of four bytes.
Let me clarify: If my string is 260 characters in length, I wish to prepend a big-endian four byte integer representation of 260. So, I don't want the ASCII string "0260" in front of the string, rather, I want four (non-ASCII) bytes representative of 0x00000104.
My code to receive the length prepended string from the client looks like this:
sizeBytes = 4 # size of the integer representing the string length
# receive big-endian 4 byte integer from client
data = conn.recv(sizeBytes)
if not data:
    break
dLen = 0
for i in range(sizeBytes):
    # each byte is worth 256**i, counting from the last (least significant) byte
    dLen = dLen + pow(256, i) * data[sizeBytes-i-1]
data = str(conn.recv(dLen),'UTF-8')
I could simply do the reverse. I'm new to Python and feel that what I've done is probably longhand!!
1) Is there a better way of receiving and decoding the length?
2) What's the "sister" method to encode the length for transmission?
Thanks.
The struct module is helpful here.
For writing:
import struct
msg = 'some message containing 260 ascii characters'
length = len(msg)
encoded_length = struct.pack('>I', length)
encoded_length will be a bytes object of 4 bytes; for a length of 260 its value is b'\x00\x00\x01\x04'.
For reading:
length = struct.unpack('>I', received_msg[:4])[0]
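For a plain blocking socket, the two struct calls could be wrapped as in the sketch below. The helper names send_msg, recv_exact and recv_msg are made up for illustration. The recv_exact loop is there because recv(n) may return fewer than n bytes on a stream socket, so a single recv call is not guaranteed to deliver the whole prefix or the whole message.

```python
import socket
import struct

def send_msg(sock, text):
    """Send text as ASCII, prefixed with its length as a big-endian 4-byte unsigned int."""
    payload = text.encode("ascii")
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_exact(sock, n):
    """Keep calling recv() until exactly n bytes have arrived."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)

def recv_msg(sock):
    """Read the 4-byte length prefix, then that many bytes of ASCII text."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length).decode("ascii")
```

On the server side, recv_msg(conn) would replace the manual byte arithmetic, and send_msg(conn, reply) is the matching sender.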
An example using asyncio:
import asyncio
import struct
def send_message(writer, message):
    data = message.encode()
    size = struct.pack('>L', len(data))
    writer.write(size + data)

async def receive_message(reader):
    data = await reader.readexactly(4)
    size = struct.unpack('>L', data)[0]
    data = await reader.readexactly(size)
    return data.decode()

Converting string to int in serial connection

I am trying to read a line from serial connection and convert it to int:
print arduino.readline()
length = int(arduino.readline())
but getting this error:
ValueError: invalid literal for int() with base 10: ''
I looked up this error and means that it is not possible to convert an empty string to int, but the thing is, my readline is not empty, because it prints it out.
The print statement reads and prints one line, and the next call to readline() reads the following line. You should probably do:
num = arduino.readline()
length = int(num)
Since you mentioned that the Arduino is returning C style strings, you should strip the NULL character.
num = arduino.readline()
length = int(num.strip('\0'))
Every call to readline() reads a new line, so your first statement has already consumed a line; the next time you call readline(), that data is no longer available.
Try this:
s = arduino.readline()
if len(s) != 0:
    print s
    length = int(s)
When you say
print arduino.readline()
you have already read the currently available line. So, the next readline() might not get any data. You might want to store the line in a variable like this:
data = arduino.readline()
print data
length = int(data)
As the data seems to contain a null character (\0), you might want to strip it like this:
data = arduino.readline().rstrip('\0')
The problem is that when the Arduino starts sending serial data, it initially sends empty strings, so pyserial picks up an empty string '' which cannot be converted to an integer. You can add a delay before arduino.readline(), like this:
while True:
    time.sleep(1.5)
    pos = arduino.readline().rstrip().decode()
    print(pos)
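The advice from these answers (read once, strip line endings and null characters, guard against an empty read) can be combined in one helper. This is a Python 3 sketch, while most snippets above are Python 2; the name parse_length is made up, and it takes the raw bytes of one line so it can be tried without a serial port.

```python
def parse_length(raw):
    """Turn one raw serial line (bytes) into an int, or None if the line was empty."""
    # Strip NUL characters, line endings and surrounding whitespace.
    text = raw.decode("ascii", errors="replace").strip("\x00\r\n \t")
    if not text:
        return None   # nothing arrived yet, e.g. a read timeout
    return int(text)  # still raises ValueError for non-numeric input

print(parse_length(b"42\r\n"))
print(parse_length(b""))
```

With pyserial this would be called as parse_length(arduino.readline()), reading the line exactly once and printing the result afterwards if needed.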

How do I 'declare' an empty bytes variable?

How do I initialize ('declare') an empty bytes variable in Python 3?
I am trying to receive chunks of bytes, and later change that to a
utf-8 string.
However, I'm not sure how to initialize the initial variable that will
hold the entire series of bytes. This variable is called msg.
I can't initialize it as None, because you can't add a bytes and a
NoneType. I can't initialize it as a unicode string, because then
I will be trying to add bytes to a string.
Also, as the receiving program evolves it might get me in to a mess
with series of bytes that contain only parts of characters.
I can't do without a msg initialization, because then msg would be
referenced before assignment.
The following is the code in question
def handleClient(conn, addr):
    print('Connection from:', addr)
    msg = ?
    while 1:
        chunk = conn.recv(1024)
        if not chunk:
            break
        msg = msg + chunk
    msg = str(msg, 'UTF-8')
    conn.close()
    print('Received:', unpack(msg))
Just use an empty byte string, b''.
However, concatenating to a string repeatedly involves copying the string many times. A bytearray, which is mutable, will likely be faster:
msg = bytearray() # New empty byte array
# Append data to the array
msg.extend(b"blah")
msg.extend(b"foo")
To decode the byte array to a string, use msg.decode(encoding='utf-8').
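As a sketch of how this fits the receive loop, the two helpers below take a list of chunks in place of repeated conn.recv() calls; the names join_chunks and decode_chunks are made up. The first collects everything in a bytearray and decodes once at the end. The second uses the standard library's incremental decoder, which may help with the concern from the question about chunks that end in the middle of a multi-byte character.

```python
import codecs

def join_chunks(chunks):
    """Collect all chunks in a bytearray, then decode the whole buffer once."""
    msg = bytearray()
    for chunk in chunks:
        msg.extend(chunk)
    return msg.decode("utf-8")

def decode_chunks(chunks):
    """Decode chunk by chunk; the decoder holds back incomplete characters."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))  # errors out if bytes are left over
    return "".join(parts)

# 'é' is two bytes in UTF-8 (0xC3 0xA9) and is split across the chunks here.
chunks = [b"h\xc3", b"\xa9llo"]
print(join_chunks(chunks))
print(decode_chunks(chunks))
```

Decoding once at the end is simpler when the whole message is needed anyway; the incremental decoder is only worth it if text has to be processed while chunks are still arriving.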
bytes() works for me;
>>> bytes() == b''
True
Use msg = bytes('', encoding='your encoding here').
If you don't need to name an encoding, simply use msg = b''. Both give the same empty bytes object; the encoding only matters later, when the accumulated bytes are decoded, and decoding with the wrong one will garble the whole buffer.
As per the documentation:
socket.recv(bufsize[, flags])
Receive data from the socket. The return value is a string representing the data received.
So, I think msg="" should work just fine (in Python 2, that is, where recv() returns a str; in Python 3 it returns a bytes object, so use msg = b"" there):
>>> msg = ""
>>> msg
''
>>> len(msg)
0
>>>
To allocate bytes of some arbitrary length do
bytes(bytearray(100))
