Python - splitting string into individual bytes and putting them back together

Here is a part of a python script I have:
textString = raw_input('')
text = list(textString)
print textString
try:
    for i in range (0, len(text)):
        chat_client.sock.send(text[i])
        i = i + 1
    chat_client.sock.send(0)
except:
    Exception
try:
    for i in range (0, len(text)):
        chat_server.conn.send(text[i])
        i = i + 1
    chat_server.conn.send(0)
except:
    Exception
I then am hoping to put it back together when it is received, using the int delimiter 0. Just for testing purposes, I have got:
byte = self.conn.recv(1024)
if byte:
    print byte
else:
    break
just to show each byte that has been received individually.
However, when I insert a string, some of it is split into more than one character:
e.g. The quick brown fox jumps over the lazy dog -->
T
h
e
q
u
i
ck
b
r
o
wn
f
ox j
umps ov
er the
lazy dog
I wondered if anyone could figure out why this might be going on.
Thank you in advance.
Also, in case you are wondering why I am trying to split text like this, it is due to a suggestion from this post:
Python P2P socket chat script - only fully working on home network; connects at school but does not work

This is by design for stream sockets. From the Wikipedia page: a stream socket is a type of internet socket which provides a connection-oriented, sequenced, and unique flow of data without record boundaries. If several messages have already arrived when you read, they may be concatenated; recv(1024) returns whatever is available, up to 1024 bytes.
All that the specification guarantees is that you receive all of the data, in order.
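Since the stream has no record boundaries, the receiver has to rebuild messages itself. Here is a minimal sketch of one way to do that, written for Python 3 (the question uses Python 2). It assumes a NUL byte as the delimiter and that messages never contain one; send_message and iter_messages are hypothetical helper names, and socket.socketpair() only stands in for the real chat connection.

```python
import socket

DELIMITER = b"\0"  # assumption: a NUL byte marks the end of each message


def send_message(sock, text):
    """Send one message followed by the delimiter."""
    sock.sendall(text.encode("utf-8") + DELIMITER)


def iter_messages(sock, bufsize=1024):
    """Yield complete messages, however recv() happens to chunk the stream."""
    buffer = b""
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # the peer closed the connection
            break
        buffer += chunk
        # A single chunk may hold zero, one, or several complete messages.
        while DELIMITER in buffer:
            raw, buffer = buffer.split(DELIMITER, 1)
            yield raw.decode("utf-8")


# Local demonstration with a connected pair of sockets.
a, b = socket.socketpair()
send_message(a, "The quick brown fox")
send_message(a, "jumps over the lazy dog")
a.close()
received = list(iter_messages(b))
b.close()
print(received)
```

With this approach there is no need to send one character at a time: the sender can pass the whole string to sendall, and the delimiter alone tells the receiver where each message ends.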

Related

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 398: invalid start byte || book python for everyone

Hey, I am trying to pull an image from a web server using socket programming in Python. While going through the Python for Everyone book, I found an example in the networked programming chapter and copied the code from that example, urljpeg.py:
import socket
import time
#HOST = 'data.pr4e.org'
#PORT = 80
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
mysock.sendall(b'GET http://data.pr4e.org/cover3.jpg HTTP/1.0\r\n\r\n')
count = 0
picture = b""
while True:
    data = mysock.recv(5120)
    if len(data) < 1: break
    # time.sleep(0.25)
    count = count + len(data)
    print(len(data), count)
    picture = picture + data
mysock.close()
# look for the end of the header (2crlf)
pos = picture.find(b"r\n\r\n")
print("Header length ", pos)
print(picture[:pos].decode())
# skip past the header and save the picture data
picture = picture[pos+4:]
fhand = open("stuff.jpg","wb")
fhand.write(picture)
fhand.close()
The error message indicates that you are trying to decode data which is not utf-8. So why is this happening? Let's take a step back and look at what the code is doing:
# look for the end of the header (2crlf)
pos = picture.find(b"r\n\r\n")
print("Header length ", pos)
print(picture[:pos].decode())
We're trying to find a sequence of \r\n\r\n, i.e. CR LF CR LF in the data. This would be the empty line that separates the HTTP header (which should be in ASCII, which is a subset of UTF-8) from the actual image data. Then we try to decode everything up to that point as a string. So why does it fail? The program conveniently prints the header length, and in the bit you posted earlier we could see that this was -1, which means that the picture.find call did not find anything! With pos equal to -1, picture[:pos] is picture[:-1], i.e. everything except the last byte, so the binary JPEG data gets decoded as well, and that is not valid UTF-8. Why did find come up empty? Well, look carefully at what the code actually does:
# look for the end of the header (2crlf)
pos = picture.find(b"r\n\r\n")
It should be looking for \r\n\r\n, but it is actually looking for r\n\r\n!
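A small sketch of the corrected search, assuming the whole response is already in memory. split_http_response is a hypothetical helper name, and the sample bytes are made up for illustration rather than taken from the real server.

```python
def split_http_response(raw):
    """Split a raw HTTP response into (header text, body bytes)."""
    pos = raw.find(b"\r\n\r\n")  # note the leading backslash in \r
    if pos == -1:
        # Fail loudly instead of slicing with -1 and decoding image bytes.
        raise ValueError("no blank line found; header is incomplete")
    header = raw[:pos].decode("ascii", errors="replace")
    body = raw[pos + 4:]  # skip the four bytes of CR LF CR LF
    return header, body


# Made-up response: two header lines followed by four bytes of binary data.
sample = b"HTTP/1.0 200 OK\r\nContent-Type: image/jpeg\r\n\r\n\xff\xd8\xff\xe0"
header, body = split_http_response(sample)
print(header)
print(len(body))
```

Checking for -1 before slicing is the design choice that matters here: it turns a confusing UnicodeDecodeError into a message that points at the real problem.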

(Beginner) Why my code skips some of the functions ? (Python 3)

I'm trying to write code that receives two texts and thereafter I am writing code that finds common words among the texts. This is put in a new list. sys.stdin.read() is being used instead of input() because I need to process a long piece of text.
Below is what I have written so far. When I run the code it seems to hang: it only asks for input for text 1 and never reaches the input request for text 2.
What is going on and how can I fix it?
size text 1 = approx. 500.000 chars.
size text 2 = approx. 500.000 chars.
import sys
text1 = sys.stdin.read()
print(text1)
text2 = sys.stdin.read()
print(text2)
# ... snippet ... compare code here ...
I think this will work
text1 = input("Input some text: ")
text2 = input("Input some text: ")
I don't actually know what's going wrong in yours, though.
The code below uses two different kinds of input loop. For text 1 you can enter any amount of text; then press Enter, type "EOF!" and press Enter again to move on to text 2. For text 2 the loop is tied to the number of characters (likely ASCII or UTF-8; as written it accepts anything): it keeps calling input() until it receives a line of at least 500.000 characters. At that point text 2 is done and you can go on to the compare step, which you might have written by now. If that gives problems too, post a new question for it.
The print statements show what was entered for each text, minus the "EOF!" marker for text 1. Any marker would do, but End Of File seemed the obvious choice to explain the stop marker.
import sys
text1 = ''
text2 = ""
nr_lines = 2
while True:
    text1 += input()+'\n'
    # print(text1[-5:-1])
    if text1[-5:-1] == 'EOF!':
        break
print(f'\nlen: {len(text1)}, text1 {text1}\n\n'[:-7])
while True:
    text2 = input()
    if len(text2) >= 500000:
        break
print(f'\nlen: {len(text2)}, text2 {text2}\n\n')
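sys.stdin.read() does not work here because it was written for a different purpose: called without an argument, it reads everything up to end-of-file rather than a single entry. The first call therefore keeps waiting for more input until stdin is closed, and the program never reaches the second read while the stream is still open.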
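Because read() without an argument consumes the stream up to end-of-file, another option is to read everything once and split it on a marker line. The sketch below assumes both texts arrive on the same stream separated by a line containing only "EOF!"; read_two_texts and common_words are hypothetical names, and io.StringIO stands in for sys.stdin so the example runs without typing anything.

```python
import io

MARKER = "EOF!"  # assumption: a line containing only this separates the texts


def read_two_texts(stream):
    """Read the whole stream once and split it at the marker line."""
    data = stream.read()
    first, _, second = data.partition("\n" + MARKER + "\n")
    return first, second


def common_words(a, b):
    """Return the words that occur in both texts, sorted alphabetically."""
    return sorted(set(a.split()) & set(b.split()))


# io.StringIO stands in for sys.stdin here.
fake_stdin = io.StringIO("the quick brown fox\nEOF!\nthe lazy brown dog\n")
text1, text2 = read_two_texts(fake_stdin)
shared = common_words(text1, text2)
print(shared)
```

In the real script you would pass sys.stdin instead of fake_stdin and end the input with Ctrl-D (Ctrl-Z then Enter on Windows), or redirect a file into the program.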

I'm reading into a 256 byte string. I want to skip it, if it's all binary zeros (\x00) Is there a single test?

Totally new to python. Trying to parse a file but not all records contain data. I want to skip the records that are all hex 00.
I tried if record == ('\x00' * 256): (adapted from a sample of print("-"*80)),
but it gave a syntax error. Hey, I said I was new. :)
Thanks for the reply, I'm using 2.7 and reading like this....
with open(testfile, "rb") as f:
    counter = 0
    while True:
        record = f.read(256)
        counter += 1
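Your example looks to be very close. In Python 3 you must mark the literal as bytes with a b prefix, because data read from a file opened in "rb" mode is bytes, not str. Python 2.7 accepts the b prefix as well (there it is simply a plain str), so the same comparison works in both.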
I would do something like:
empty = b'\x00' * 256
if record == empty:
    print('skipped this line')
Remember that Python 2 uses print statements, so you should do print 'skipped this line' instead.
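Putting the read loop and the comparison together might look like the sketch below (Python 3). nonempty_records is a hypothetical helper name, and io.BytesIO stands in for the real file opened in "rb" mode.

```python
import io

RECORD_SIZE = 256


def nonempty_records(f, size=RECORD_SIZE):
    """Yield fixed-size records from f, skipping those that are all zero bytes."""
    empty = b"\x00" * size
    while True:
        record = f.read(size)
        if not record:  # end of file
            break
        if record == empty:
            continue  # skip the all-zero record
        yield record


# Four records: zeros, data, zeros, data.
data = b"\x00" * 256 + b"A" * 256 + b"\x00" * 256 + b"B" * 256
records = list(nonempty_records(io.BytesIO(data)))
print(len(records))
```

Note that the loop needs the end-of-file check: f.read(256) returns an empty bytes object once the file is exhausted, and without a break the while True loop would never end.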

Search for multiple string in another string using reg-exp in python with substitution

I have a line as given below
line = 00000001: 5869379 AB 0 B CCC_NSE hello how GO_A ELLLEIILKEIII8888**
I want to check whether 00000001, CCC_NSE and GO_A all exist in a line. The catch is that the number/string 00000001 can vary, meaning I want to search for multiple patterns.
I tried using the code below:
if re.search(r'(%s)(.*)CCC_NSE(.*)GO_A(.*)'%(temp[i][3]), lines, re.M|re.I|re.U) #temp[i][3] just array with multiple number/string.
But it errors out with a syntax error. Can anyone let me know whether the above expression is a correct way to find out if multiple strings exist in a line?
Example Code Given Below:
linez = "00004944 helo how are APPLE helloo.log.gz you MANGO_REQUEST life is cool and as usual SeaPort"
print linez
blue = "00004944"
print blue
if re.search(r"%d(.*)how(.*)you(.*)"%blue, linez, re.M|re.I|re.U)
    print "Exists!"
else:
    print "Nope"
Thanks !
Try this:
if len(re.findall(r'0+1|CCC_NSE|GO_A',line)) >= 3:
    print "ok"
else:
    print "wrong"
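Two things in the question's snippets are worth noting: both if lines lack the trailing colon, which by itself is a syntax error, and %d expects a number while blue is a string. Here is a sketch of an alternative for a varying first token, written for Python 3. contains_all is a hypothetical helper name; it builds the pattern with re.escape and requires the parts to appear in the given order, which is an assumption taken from the question's own regex.

```python
import re


def contains_all(line, parts):
    """Return True if every part occurs in line, in the given order."""
    # re.escape keeps characters such as '.' or '*' in a part literal.
    pattern = ".*".join(re.escape(part) for part in parts)
    return re.search(pattern, line, re.IGNORECASE) is not None


line = "00000001: 5869379 AB 0 B CCC_NSE hello how GO_A ELLLEIILKEIII8888**"
found = contains_all(line, ["00000001", "CCC_NSE", "GO_A"])
missing = contains_all(line, ["00004944", "CCC_NSE", "GO_A"])
print(found, missing)
```

If the order does not matter, a plain all(part in line for part in parts) would do the same job without a regular expression.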

read and store various data from various usb devices in python

I am a beginner in python, and I am trying to read the data from several sensors (humidity, temperature, pressure sensors...) that I connect with a usb hub to my computer. My main goal is to record every five minutes the different values of those sensors and then store it to analyse it.
I have got all the data sheets and manuals of my sensors (which are from Hygrosens Instruments), I know how they work and what kind of data they are sending. But I do not know how to read them. Below is what I tried, using pyserial.
import serial #import the serial library
from time import sleep #import the sleep command from the time library
import binascii
output_file = open('hygro.txt', 'w') #create a file and allow you to write in it only. The name of this file is hygro.txt
ser = serial.Serial("/dev/tty.usbserial-A400DUTI", 9600) #load into a variable 'ser' the information about the usb you are listening. /dev/tty.usbserial.... is the port after plugging in the hygrometer, 9600 is for bauds, it can be diminished
count = 0
while 1:
    read_byte = ser.read(size=1)
So now I want to find the end of each line of data, as the measurements I need are in a line that begins with 'V', and the data sheet of my sensor says that a line ends with <cr>. So I want to read one byte at a time and look for '<', then 'c', then 'r', then '>'. I wanted to do this:
while 1:
    read_byte = ser.read(size=8) #read a byte
    read_byte_hexa =binascii.hexlify(read_byte) #convert the byte into hexadecimal
    trad_hexa = int(read_byte_hexa , 16) #convert the hexadecimal into an int in purpose to compare it with another int
    trad_firstcrchar = int('3c' , 16) #convert the hexadecimal of the '<' into a int to compare it with the first byte
    if (trad_hexa == trad_firstcrchar ): #compare the first byte with the '<'
        read_byte = ser.read(size=1) #read the next byte (I am not sure if that really works)
        read_byte_hexa =binascii.hexlify(read_byte)# from now I am doing the same thing as before
        trad_hexa = int(read_byte_hexa , 16)
        trad_scdcrchar = int('63' , 16)
        print(trad_hexa, end='/')# this just show me if it gets in the condition
        print(trad_scdcrchar)
        if (trad_hexa == trad_scdcrchar ):
            read_byte = ser.read(size=1) #read the next byte
            read_byte_hexa =binascii.hexlify(read_byte)
            trad_hexa = int(read_byte_hexa , 16)
            trad_thirdcrchar = int('72' , 16)
            print(trad_hexa, end='///')
            print(trad_thirdcrchar)
            if (trad_hexa == trad_thirdcrchar ):
                read_byte = ser.read(size=1) #read the next byte
                read_byte_hexa =binascii.hexlify(read_byte)
                trad_hexa = int(read_byte_hexa , 16)
                trad_fourthcrchar = int('3e' , 16)
                print(trad_hexa, end='////')
                print(trad_fourthcrchar)
                if (trad_hexa == trad_fourthcrchar ):
                    print ('end of the line')
But I am not sure that it works, I mean I think it does not have the time to read the second one, the second byte I am reading, it's not exactly the second one. So that's why I want to use a buffer, but I don't really get how I can do that. I am going to look for it, but if someone knows an easier way to do what I want, I am ready to try it!
Thank you
You seem to be under the impression that the end-of-line character for that sensor's communication protocol is 4 different characters: <, c, r and >. However, what is being referred to is the carriage return, often denoted by <cr> and in many programming languages just by \r (even though it looks like 2 characters, it represents just one character).
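To make the single-character point concrete, here is a minimal sketch that collects bytes until a carriage return arrives. read_line_cr is a hypothetical helper name, and io.BytesIO stands in for the serial port; it only needs to offer read(1) returning bytes, and the sample data is made up in the style of the sensor's value lines.

```python
import io


def read_line_cr(port):
    """Read bytes one at a time until a carriage return (or no more data)."""
    buf = bytearray()
    while True:
        byte = port.read(1)
        if not byte:  # timeout or end of data
            break
        if byte == b"\r":  # a single byte, 0x0D, ends the line
            break
        buf += byte
    return bytes(buf)


# io.BytesIO stands in for the serial port in this demonstration.
fake_port = io.BytesIO(b"V0109470D\rV021BB55C\r")
first = read_line_cr(fake_port)
second = read_line_cr(fake_port)
print(first, second)
print(len("\r"))  # 1: the escape sequence is one character
```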
You could simplify your code greatly by reading in the data from the sensors line by line, as the protocol is structured. Here's something to help you get started:
import time
def parse_info_line(line):
    # implement to your own liking
    logical_channel, physical_probe, hardware_id, crc = [line[index:index+2] for index in (1, 3, 5, 19)]
    serialno = line[7:19]
    return physical_probe
def parse_value_line(line):
    channel, crc = [line[ind:ind+2] for ind in (1,7)]
    encoded_temp = line[3:7]
    return twos_comp(int(encoded_temp, 16), 16)/100.
def twos_comp(val, bits):
    """compute the 2's complement of int value `val`"""
    if (val & (1 << (bits - 1))) != 0: # if sign bit is set e.g., 8bit: 128-255
        val = val - (1 << bits) # compute negative value
    return val # return positive value as is
def listen_on_serial(ser):
    ser.readline() # do nothing with the first line: you have no idea when you start listening to the data broadcast from the sensor
    while True:
        line = ser.readline()
        try:
            first_char = line[0]
        except IndexError: # got no data from sensor
            break
        else:
            if first_char == '#': # begins a new sensor record
                in_record = True
            elif first_char == '$': # ends the current sensor record
                in_record = False
            elif first_char == 'I':
                parse_info_line(line)
            elif first_char == 'V':
                print(parse_value_line(line))
            else:
                print("Unexpected character at the start of the line:\n{}".format(line))
        time.sleep(2)
The twos_comp function was written by travc and you are encouraged to upvote his answer when you have enough reputation and if you intend to use his code (and even if you won't, it's still a good answer, I upvoted it just now). The listen_on_serial could be improved as well (many Python programmers will recognize the switch-structure and implement it with a dictionary rather than if... elif... elif...), but this is only intended to get you started.
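For illustration, the dictionary-based dispatch mentioned above could be sketched roughly as follows. twos_comp and parse_value_line are copied from the code above; HANDLERS and dispatch are hypothetical names, and the handlers for '#', '$' and 'I' are placeholders that return nothing.

```python
def twos_comp(val, bits):
    """Interpret val as a signed two's complement number of the given width."""
    if (val & (1 << (bits - 1))) != 0:  # sign bit is set
        val = val - (1 << bits)
    return val


def parse_value_line(line):
    """Decode the four hex digits of a 'V' line into a value in hundredths."""
    return twos_comp(int(line[3:7], 16), 16) / 100.0


# Map the first character of a line to the function that handles it.
HANDLERS = {
    "#": lambda line: None,  # start of a sensor record (placeholder)
    "$": lambda line: None,  # end of a sensor record (placeholder)
    "I": lambda line: None,  # probe info line (placeholder)
    "V": parse_value_line,
}


def dispatch(line):
    """Look up the handler for this line; unknown or empty lines give None."""
    handler = HANDLERS.get(line[:1])
    if handler is None:
        return None
    return handler(line)


value = dispatch("V0109470D\r")
print(value)
```

Using line[:1] instead of line[0] means an empty line simply finds no handler, so the try/except IndexError is no longer needed for the lookup itself.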
As a test, the following code extract simulates the sensor sending some data (which is line-delimited, using the carriage return as the end-of-line marker), which I copied from the pdf you linked to (FAQ_terminalfenster_E.pdf).
>>> import serial
>>> import io
>>>
>>> ser = serial.serial_for_url('loop://', timeout=1)
>>> serio = io.TextIOWrapper(io.BufferedRWPair(ser, ser), newline='\r', line_buffering=True)
>>> serio.write(u'A1A0\r' # simulation of starting to listen halfway between 2 records
... '$\r' # marks the end of the previous record
... '#\r' # marks the start of a new sensor record
... 'I0101010000000000001B\r' # info about a sensor's probe
... 'V0109470D\r' # data matching that probe
... 'I0202010000000000002B\r' # other probe, same sensor
... 'V021BB55C\r') # data corresponding with 2nd probe
73L
>>>
>>> listen_on_serial(serio)
23.75
70.93
>>>
Note that it is recommended by the pyserial docs to be using TextIOWrapper when the end-of-line character is not \n (the linefeed character), as was also answered here.
