As I understand it, files like /dev/urandom provide just a constant stream of bits. The terminal emulator then tries to interpret them as strings, which results in a mess of unrecognised characters.
How would I go about doing the same thing in python, send a string of ones and zeros to the terminal as "raw bits"?
edit
I may have to clarify:
Say for example the string I want to "print" is 1011100. On an ascii system, the output should be "\". If I cat /dev/urandom, it provides a constant stream of bits. Which get printed like this: "���c�g/�t]+__��-�;". That's what I want.
Stephano: the key is the incomplete answer by "#you" above - the chr function :
import random, sys
for i in xrange(500):
    sys.stdout.write(chr(random.randrange(256)))
Use the chr function. It takes an input between 0 and 255 and returns a string containing the character corresponding to that value.
And from another question on StackOverflow you can get a _bin function.
def _bin(x, width):
    return ''.join(str((x>>i)&1) for i in xrange(width-1,-1,-1))
Then simply call _bin(ord(x), 8), where x is a character (a string of length one).
import sys, random
while True:
    sys.stdout.write(chr(random.getrandbits(8)))
    sys.stdout.flush()
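For the literal case in the question (a string of ones and zeros such as 1011100), a minimal Python 3 sketch could look like the following. The helper name bits_to_bytes is hypothetical, and the sketch assumes the bit string should be left-padded with zeros to a whole number of bytes.

```python
import sys

def bits_to_bytes(bits):
    """Convert a string of '0'/'1' characters into raw bytes (hypothetical helper)."""
    # Left-pad with zeros so the length is a whole number of bytes.
    padded = bits.zfill((len(bits) + 7) // 8 * 8)
    # Interpret each group of 8 characters as one base-2 integer.
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))

if __name__ == "__main__":
    raw = bits_to_bytes("1011100")  # 0b1011100 == 92, the ASCII backslash
    # sys.stdout.buffer writes bytes without any text encoding step.
    sys.stdout.buffer.write(raw + b"\n")
    sys.stdout.flush()
```

Writing through sys.stdout.buffer sends the bytes as-is, so the terminal decides how to display them, just as it does for cat /dev/urandom.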
I am communicating with a piece of equipment over RS232, and it seems to interpret commands correctly only when they are issued in the following format:
b'\xXX'
for example:
equipment_ser.write(b'\xE1')
The argument is variable, and so I convert to hex before formatting the command. I'm having trouble coming up with a consistent way to ensure only 1 backslash while preserving the hex command. I need the entire range - \x00 to \xFF.
One approach was to use 'unicode escape':
setpoint_command_INT = 1
setpoint_command_HEX = "{0:#0{1}x}".format(setpoint_command_INT,4)
setpoint_command_HEX_partially_formatted = r'\x' + setpoint_command_HEX[2:4]
setpoint_command_HEX_fully_formatted = setpoint_command_HEX_partially_formatted.encode('utf_8').decode('unicode_escape')
works ok for the above example:
Out[324]: '\x01'
but not for larger numbers, where the decode step turns the escape into the corresponding character:
setpoint_command_INT = 240
Out[332]: 'ð'
How can I format this command so that I have the single backslash while preserving the ability to command across the full range 0-255?
Thanks
Edit:
The correct way to do this is as said by Michael below:
bytes((240,))
Thank you for the prompt responses.
In your code, you are sending a single byte
equipment_ser.write(b'\xE1')
In other words, you're sending decimal 225 but as a single byte.
For any integer value in the range 0-255 you can create its byte equivalent by:
import sys
N = 225 # for example
b = N.to_bytes(1, sys.byteorder)
equipment_ser.write(b)
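As a hedged illustration of the two approaches mentioned in this thread, the sketch below checks that bytes((n,)) and n.to_bytes(1, ...) produce the same single byte for every value from 0 to 255. The helper name command_byte is hypothetical, and no serial port is opened here.

```python
def command_byte(value):
    """Return a one-byte bytes object for an integer in 0..255 (hypothetical helper)."""
    if not 0 <= value <= 255:
        raise ValueError("value must be in the range 0..255")
    return bytes([value])

# Same result as the literal used in the question.
print(command_byte(0xE1) == b"\xE1")

# For a single byte the byte order is irrelevant, so both forms agree.
agree = all(command_byte(n) == n.to_bytes(1, "big") == bytes((n,)) for n in range(256))
print(agree)
```

With a real port object, the call would then be something like equipment_ser.write(command_byte(240)).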
The following is a function that my professor made, and we are to use this function in our code:
import os
def rando_num(num_bytes):
    return ord(os.urandom(num_bytes))
Instructions say to make a create_list function, call rando_num within create_list, and to do some checks on the return. I keep getting errors, and out of curiosity I tested rando_num by itself and got the same errors.
If I call
rando_num('5')
I get "'str' object cannot be interpreted as an integer"
If I call
rando_num(5)
I get "ord() expected a character, but string of length 5 found"
If I try
rando_num('a') I get "'str' object cannot be interpreted as an integer"
I read up on os.urandom and ord, so I'm confused about what I need to put in the function. I thought urandom returns a string, but then the error says ord expects a character. I can't alter the professor's code, so how in the world do I use it?
As Python's documentation for os.urandom says, urandom returns "a string of n random bytes." And as Python's documentation for ord says, ord must be "given a string representing one Unicode character."
The only way I can see to reconcile those two without modifying your professor's code is to have urandom return only one byte which Python interprets as a string with one Unicode character. That character will be limited to have the ord result of 0 through 255, but it will work.
So set up your code to always call rando_num with an argument of 1. Calling that repeatedly in effect gives a sequence of random numbers between 0 and 255, but generated by the operating system rather than by the programming language (which is what random does). This does work in my Python 3.7.
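A minimal sketch of that advice, assuming Python 3 and assuming create_list simply collects a given number of values (the assignment's actual checks are not shown in the question, so the count parameter is hypothetical):

```python
import os

def rando_num(num_bytes):
    # The professor's function, unchanged.
    return ord(os.urandom(num_bytes))

def create_list(count):
    """Hypothetical create_list: gather `count` random values, one byte each."""
    # Always ask for exactly 1 byte so ord() receives a length-1 object.
    return [rando_num(1) for _ in range(count)]

values = create_list(10)
print(values)
```

Each element is an integer between 0 and 255, because a single byte cannot hold anything larger.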
I have a program in Python which analyses file headers and decides which file type it is. (https://github.com/LeoGSA/Browser-Cache-Grabber)
The problem is the following:
I read first 24 bytes of a file:
with open (from_folder+"/"+i, "rb") as myfile:
    header=str(myfile.read(24))
then I look for pattern in it:
if y[1] in header:
    shutil.move (from_folder+"/"+i,to_folder+y[2]+i+y[3])
where y = ['/video', r'\x47\x40\x00', '/video/', '.ts']
y[1] is the pattern and = r'\x47\x40\x00'
the file has it inside, as you can see from the picture below.
the program does NOT find this pattern (r'\x47\x40\x00') in the file header.
so, I tried to print header:
You see? Python shows it as 'G@' instead of '\x47\x40'.
And if I search for 'G@'+r'\x00' in header, everything is OK. It finds it.
Question: What am I doing wrong? I want to look for r'\x47\x40\x00' and find it, not some strange 'G@'+r'\x00'.
OR
Why does Python show the first two bytes as 'G@' and not as '\x47\x40', though it shows the rest of the header in hex? Is there a way to fix it?
import binascii
with open (from_folder+"/"+i, "rb") as myfile:
    header=myfile.read(24)
header = str(binascii.hexlify(header))[2:-1]
The result I get is the following, and I can work with it:
4740001b0000b00d0001c100000001efff3690e23dffffff
P.S. But anyway, if anybody will explain what was the problem with 2 first bytes - I would be grateful.
In Python 3 you'll get bytes from a binary read, rather than a string.
No need to convert it to a string by str.
Print will try to convert bytes to something human readable.
If you don't want that, convert your bytes to e.g. hex representations of the integer values of the bytes by:
aBytes = b'\x00\x47\x40\x00\x13\x00\x00\xb0'
print (aBytes)
print (''.join ([hex (aByte) for aByte in aBytes]))
Output as redirected from the console:
b'\x00G@\x00\x13\x00\x00\xb0'
0x00x470x400x00x130x00x00xb0
You can search in aBytes directly with the in operator only if the pattern is itself a bytes object, such as b'\x47\x40\x00'. A str pattern will not work, since aBytes isn't a string but an array of bytes.
If you want to apply a string search on '\x00\x47\x40', use:
aBytes = b'\x00\x47\x40\x00\x13\x00\x00\xb0'
print (aBytes)
print (r'\x'.join ([''] + ['%0.2x'%aByte for aByte in aBytes]))
Which will give you:
b'\x00G@\x00\x13\x00\x00\xb0'
\x00\x47\x40\x00\x13\x00\x00\xb0
So there are a number of separate issues at play here:
print tries to print something human readable, which succeeds only for bytes that are printable ASCII characters (here 0x47 and 0x40, shown as 'G' and '@').
You can't search for a str pattern in a bytes object with in. Either make the pattern a bytes object too, or convert the data to a string containing fixed-length hex representations as substrings, as shown.
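As a hedged Python 3 sketch of these points, the example below rebuilds the 24-byte header from the hex dump given in the question and compares a bytes search with a search over the str() representation. The variable names are illustrative only.

```python
# Header bytes reconstructed from the hex dump shown in the question.
header = bytes.fromhex("4740001b0000b00d0001c100000001efff3690e23dffffff")

# A bytes pattern can be searched for directly in bytes.
pattern = b"\x47\x40\x00"
found = pattern in header
at_start = header.startswith(pattern)

# Searching the str() form for a raw-string pattern fails, because the
# printable bytes 0x47 and 0x40 are rendered as 'G' and '@' in the repr.
found_in_repr = r"\x47\x40\x00" in str(header)

# bytes.hex() gives the same fixed-width text as binascii.hexlify.
hex_text = header.hex()

print(found, at_start, found_in_repr)
print(hex_text)
```

Keeping both the data and the pattern as bytes avoids the conversion to text altogether.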
I am communicating with a power supply through rs232. I can communicate no problem when I send for example:
port.write("\x31")
but if instead I have a string as a variable
teststring='"\\x31"'
(which prints out as "\x31")
and I try:
port.write(teststring)
it does not send the command to the supply. I have tried:
port.write(bytes(teststring,'utf-8'))
and
port.write(teststring.encode('utf-8'))
But it still is somehow not sending the same as just entering the text. I need to be able to change this variable, so I cannot just code the text in.
Any help is appreciated!
Using comments below, I am now using an integer
testint=31
and if I print
chr(testint) I get an odd box with 00 in the top row and 1F in the bottom. What I now need to be able to do is convert the 31 to 0x31, so I can use chr(0x31), which when printed produces 1. Hopefully the .write command will treat chr(0x31) the same as "\x31"?
teststring in your example is escaping the backslash; you have "\\x31" instead of "\x31". "\x31" is a length-1 string containing the byte 0x31; "\\x31" is a length-4 string containing the characters \, x, 3 and 1. Dropping the first slash in teststring should behave exactly like using port.write("\x31").
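The difference can be checked without any hardware. The sketch below assumes Python 3 and assumes the variable part of the command arrives as hex digits such as "31"; the helper name hex_text_to_byte is hypothetical.

```python
escaped = "\\x31"  # four characters: backslash, x, 3, 1
single = "\x31"    # one character, the digit "1"
print(len(escaped), len(single))

def hex_text_to_byte(text):
    """Turn hex digits such as "31" into a single byte (hypothetical helper)."""
    # int(text, 16) reads the digits as base 16, so "31" becomes 0x31 == 49.
    return bytes([int(text, 16)])

payload = hex_text_to_byte("31")
print(payload)

# chr(31) is a control character, whereas chr(0x31) is the digit "1".
print(chr(0x31) == "1", chr(31) == "1")
```

With a real port object, port.write(payload) would then send the same single byte as the hard-coded literal.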
Conclusion: It's impossible to override or disable Python's built-in escape sequence processing so that you can skip using the raw prefix specifier. I dug into Python's internals to figure this out. So if anyone tries designing objects that work on complex strings (like regex) as part of some kind of framework, make sure to specify in the docstrings that string arguments to the object's __init__() MUST include the r prefix!
Original question: I am finding it a bit difficult to force Python to not "change" anything about a user-inputted string, which may contain among other things, regex or escaped hexadecimal sequences. I've already tried various combinations of raw strings, .encode('string-escape') (and its decode counterpart), but I can't find the right approach.
Given an escaped, hexadecimal representation of the Documentation IPv6 address 2001:0db8:85a3:0000:0000:8a2e:0370:7334, using .encode(), this small script (called x.py):
#!/usr/bin/env python
class foo(object):
    __slots__ = ("_bar",)
    def __init__(self, input):
        if input is not None:
            self._bar = input.encode('string-escape')
        else:
            self._bar = "qux?"
    def _get_bar(self): return self._bar
    bar = property(_get_bar)
#
x = foo("\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34")
print x.bar
Will yield the following output when executed:
$ ./x.py
\x01\r\xb8\x85\xa3\x00\x00\x00\x00\x8a.\x03ps4
Note the \x20 got converted to an ASCII space character, along with a few others. This is basically correct due to Python processing the escaped hex sequences and converting them to their printable ASCII values.
This can be solved if the initializer to foo() was treated as a raw string (and the .encode() call removed), like this:
x = foo(r"\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34")
However, my end goal is to create a kind of framework that can be used and I want to hide these kinds of "implementation details" from the end user. If they called foo() with the above IPv6 address in escaped hexadecimal form (without the raw specifier) and immediately print it back out, they should get back exactly what they put in w/o knowing or using the raw specifier. So I need to find a way to have foo's __init__() do whatever processing is necessary to enable that.
Edit: Per this SO question, it seems it's a defect of Python, in that it always performs some kind of escape sequence processing. There does not appear to be any kind of facility to completely turn off escape sequence processing, even temporarily. Sucks. I guess I am going to have to research subclassing str to create something like rawstr that intelligently determines what escape sequences Python processed in a string, and convert them back to their original format. This is not going to be fun...
Edit2: Another example, given the sample regex below:
"^.{0}\xcb\x00\x71[\x00-\xff]"
If I assign this to a var or pass it to a function without using the raw specifier, the \x71 gets converted to the letter q. Even if I add .encode('string-escape') or .replace('\\', '\\\\'), the escape sequences are still processed, resulting in this output:
"^.{0}\xcb\x00q[\x00-\xff]"
How can I stop this, again, without using the raw specifier? Is there some way to "turn off" the escape sequence processing or "revert" it after the fact thus that the q turns back into \x71? Is there a way to process the string and escape the backslashes before the escape sequence processing happens?
I think you have an understandable confusion about the difference between Python string literals (the source code representation), Python string objects in memory, and how those objects can be printed (in what format they can be represented in the output).
If you read some bytes from a file into a bytestring you can write them back as is.
r"" exists only in source code; there is no such thing at runtime. That is, r"\x" and "\\x" are equal, and they may even be the exact same string object in memory.
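A small sketch (written for Python 3, though the same comparisons hold in Python 2) shows that the r prefix only changes how the literal is read, not what kind of object results:

```python
raw = r"\x71"        # literal read without escape processing
escaped = "\\x71"    # same four characters, written with an escaped backslash
processed = "\x71"   # escape processed at compile time: the single character "q"

print(raw == escaped)            # both are backslash, x, 7, 1
print(len(raw), len(processed))  # four characters versus one
print(type(raw) is type(processed))
```

By the time __init__() receives the argument, only the resulting string object exists, so there is no trace of which literal form was used.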
To see that input is not corrupted, you could print each byte as an integer:
print " ".join(str(ord(c)) for c in raw_input("input something"))
Or just echo as is (there could be a difference but it is unrelated to your "string-escape" issue):
print raw_input("input something")
Identity function:
def identity(obj):
    return obj
If you do nothing to the string, then your users will receive the exact same object back. You can provide examples in the docs of what you consider a concise, readable way to represent the input string as Python literals. If you find it confusing to work with binary strings such as "\x20\x01", then you could accept an ASCII hex representation instead: "2001" (you could use binascii.hexlify/unhexlify to convert one to the other).
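The hexlify/unhexlify round trip mentioned above can be sketched as follows, assuming Python 3 and using the IPv6 address bytes from the question:

```python
import binascii

# The 16 address bytes from the question, as a bytes literal.
packed = b"\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34"

# hexlify returns bytes in Python 3, so decode to get ordinary text.
text = binascii.hexlify(packed).decode("ascii")
print(text)

# unhexlify reverses the conversion exactly.
restored = binascii.unhexlify(text)
print(restored == packed)
```

A hex string like this contains no backslashes, so users never have to think about escape processing or raw prefixes.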
The regex case is more complex because there are two languages:
Escape sequences are interpreted by Python according to its string literal syntax.
The regex engine interprets the string object as a regex pattern, which also has its own escape sequences.
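To illustrate the two layers, the following Python 3 sketch compiles the sample regex from the question both with and without the raw prefix. The sample input string is made up for the demonstration.

```python
import re

# Without r: Python turns \x71 into "q" before the regex engine sees it.
pattern_text = "^.{0}\xcb\x00\x71[\x00-\xff]"
# With r: the backslashes survive and the regex engine interprets \x71 itself.
raw_pattern_text = r"^.{0}\xcb\x00\x71[\x00-\xff]"

# Hypothetical input: the three fixed characters followed by one more character.
sample = "\xcb\x00q\x42"

plain_match = re.match(pattern_text, sample)
raw_match = re.match(raw_pattern_text, sample)

print(pattern_text == raw_pattern_text)  # the two strings differ
print(plain_match is not None, raw_match is not None)
```

The two pattern strings are different objects with different lengths, yet both describe the same match, because the regex engine understands \x escapes on its own.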
I think you will have to go the join route.
Here's an example:
>>> m = {chr(c): '\\x{0}'.format(hex(c)[2:].zfill(2)) for c in xrange(0,256)}
>>>
>>> x = "\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34"
>>> print ''.join(map(m.get, x))
\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34
I'm not entirely sure why you need that though. If your code needs to interact with other pieces of code, I'd suggest that you agree on a defined format, and stick to it.
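For reference, the same join idea can be sketched in Python 3, where iterating over a bytes object yields integers directly. The function name to_escaped_hex is hypothetical.

```python
def to_escaped_hex(data):
    """Render each byte as a backslash-x escape with two hex digits (hypothetical helper)."""
    return "".join("\\x{:02x}".format(b) for b in data)

packed = b"\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34"
text = to_escaped_hex(packed)
print(text)

# Stripping the "\x" markers leaves plain hex digits, so the step is reversible.
print(bytes.fromhex(text.replace("\\x", "")) == packed)
```

This reproduces a canonical escaped form, not necessarily the exact characters the user typed, since that information is gone once the literal has been processed.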