The following is a function that my professor made, and we are to use this function in our code:
import os
def rando_num(num_bytes):
    return ord(os.urandom(num_bytes))
Instructions say to make a create_list function, call rando_num within create_list, and to do some checks on the return. I keep getting errors, and out of curiosity I tested rando_num by itself and got the same errors.
If I call
rando_num('5')
I get "'str' object cannot be interpreted as an integer"
If I call
rando_num(5)
I get "ord() expected a character, but string of length 5 found"
If I try
rando_num('a')
I get "'str' object cannot be interpreted as an integer"
I read up on os.urandom and ord, so I'm confused about what I need to put in the function. I thought urandom returns a string, but then the error says ord expects a character. I can't alter the professor's code, so how in the world do I use it?
As Python's documentation for os.urandom says, urandom returns "a string of n random bytes." And as Python's documentation for ord says, ord must be "given a string representing one Unicode character."
The only way I can see to reconcile those two without modifying your professor's code is to have urandom return only one byte, which ord accepts as a single character. The result will be limited to the range 0 through 255, but it will work.
So set up your code to always call rando_num with an argument of 1. Calling that repeatedly in effect gives a sequence of random numbers between 0 and 255, generated by the operating system rather than by the programming language (which is what random does). This does work in my Python 3.7.
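As a rough sketch of that advice, assuming Python 3: the create_list signature and its count parameter are my own guesses, since the assignment's exact requirements and "checks" are not given here. rando_num is the professor's function, unchanged.

```python
import os


def rando_num(num_bytes):
    # The professor's function, unchanged.
    return ord(os.urandom(num_bytes))


def create_list(count):
    # Hypothetical helper: always ask for exactly one byte, so that ord()
    # receives a single character and returns a value from 0 to 255.
    return [rando_num(1) for _ in range(count)]


values = create_list(10)
print(values)
```

Any checks the assignment asks for can then be applied to each element of the returned list.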
I am taking a byte as input
b'\xe2I4\xdd\r\xe5\xfcy^4\xd5'
but it gets converted into a string.
So when I try to convert this string to bytes, it gets escaped again and I get this output:
b"b'\\xe2I4\\xdd\\r\\xe5\\xfcy^4\\xd5'"
My desired output is that when I provide b'\xe2I4\xdd\r\xe5\xfcy^4\xd5' it is converted to bytes as is, without altering it or adding any characters or symbols.
Any resource or reference will be helpful.
Thanks In Advance
The BAD idea
You could pass the value of input() to the eval() function, which evaluates a string as if it were executed directly in Python. It might look like a great feature at first, but for that very reason it is unsafe in any production-level application: the user can execute arbitrary code through it, which can cause a lot of problems.
Better alternative
You can use a safer alternative to eval(), which is ast.literal_eval(). This function evaluates a given string as a Python literal (strings, numbers, bytes objects, etc.). If the string contains anything other than a literal (function calls, object creation, assignment, etc.), the function raises an error. Enough about that; let's see how you could get this working.
import ast
user_input = input()
eval_expr = ast.literal_eval(user_input)
If you want to check whether the input is a bytes literal, you could use the isinstance function and then perform the required action.
# Optional: Handle if `eval_expr` is not a `bytes` literal
if not isinstance(eval_expr, bytes):
    ...
So, all you need to do is import the module ast first. Then take the user's input. Thereafter pass this input string to ast.literal_eval() function to evaluate the string.
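Putting those steps together, here is a minimal sketch assuming Python 3. The helper name parse_bytes_literal is hypothetical, and the raw string stands in for what the user would type at the input() prompt.

```python
import ast


def parse_bytes_literal(text):
    # Hypothetical helper: evaluate the text as a Python literal and
    # accept the result only if it is a bytes object.
    value = ast.literal_eval(text)
    if not isinstance(value, bytes):
        raise ValueError("expected a bytes literal, got " + type(value).__name__)
    return value


# Stand-in for input(): the characters exactly as the user would type them.
typed = r"b'\xe2I4\xdd\r\xe5\xfcy^4\xd5'"
parsed = parse_bytes_literal(typed)
print(parsed)
print(len(parsed))
```

Anything that is not a plain literal, such as a function call, makes ast.literal_eval raise an error instead of executing it.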
Conclusion: It's impossible to override or disable Python's built-in escape sequence processing, such that, you can skip using the raw prefix specifier. I dug into Python's internals to figure this out. So if anyone tries designing objects that work on complex strings (like regex) as part of some kind of framework, make sure to specify in the docstrings that string arguments to the object's __init__() MUST include the r prefix!
Original question: I am finding it a bit difficult to force Python to not "change" anything about a user-inputted string, which may contain among other things, regex or escaped hexadecimal sequences. I've already tried various combinations of raw strings, .encode('string-escape') (and its decode counterpart), but I can't find the right approach.
Given an escaped, hexadecimal representation of the Documentation IPv6 address 2001:0db8:85a3:0000:0000:8a2e:0370:7334, using .encode(), this small script (called x.py):
#!/usr/bin/env python
class foo(object):
    __slots__ = ("_bar",)
    def __init__(self, input):
        if input is not None:
            self._bar = input.encode('string-escape')
        else:
            self._bar = "qux?"
    def _get_bar(self): return self._bar
    bar = property(_get_bar)
#
x = foo("\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34")
print x.bar
Will yield the following output when executed:
$ ./x.py
 \x01\r\xb8\x85\xa3\x00\x00\x00\x00\x8a.\x03ps4
Note the \x20 got converted to an ASCII space character, along with a few others. This is basically correct due to Python processing the escaped hex sequences and converting them to their printable ASCII values.
This can be solved if the initializer to foo() was treated as a raw string (and the .encode() call removed), like this:
x = foo(r"\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34")
However, my end goal is to create a kind of framework that can be used and I want to hide these kinds of "implementation details" from the end user. If they called foo() with the above IPv6 address in escaped hexadecimal form (without the raw specifier) and immediately print it back out, they should get back exactly what they put in w/o knowing or using the raw specifier. So I need to find a way to have foo's __init__() do whatever processing is necessary to enable that.
Edit: Per this SO question, it seems it's a defect of Python, in that it always performs some kind of escape sequence processing. There does not appear to be any kind of facility to completely turn off escape sequence processing, even temporarily. Sucks. I guess I am going to have to research subclassing str to create something like rawstr that intelligently determines what escape sequences Python processed in a string, and convert them back to their original format. This is not going to be fun...
Edit2: Another example, given the sample regex below:
"^.{0}\xcb\x00\x71[\x00-\xff]"
If I assign this to a var or pass it to a function without using the raw specifier, the \x71 gets converted to the letter q. Even if I add .encode('string-escape') or .replace('\\', '\\\\'), the escape sequences are still processed, resulting in this output:
"^.{0}\xcb\x00q[\x00-\xff]"
How can I stop this, again, without using the raw specifier? Is there some way to "turn off" the escape sequence processing or "revert" it after the fact thus that the q turns back into \x71? Is there a way to process the string and escape the backslashes before the escape sequence processing happens?
I think you have an understandable confusion about the difference between Python string literals (the source code representation), Python string objects in memory, and how those objects can be printed (in what format they can be represented in the output).
If you read some bytes from a file into a bytestring you can write them back as is.
r"" exists only in source code; there is no such thing at runtime, i.e., r"\x" and "\\x" are equal, and they may even be the exact same string object in memory.
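A small illustration of that point, written for Python 3 (the variable names are mine). The raw prefix only changes how the source text is read. Once the string object exists, nothing records which spelling produced it.

```python
raw = r"\x71"        # four characters: backslash, x, 7, 1
escaped = "\\x71"    # the same four characters, spelled without the r prefix
processed = "\x71"   # one character: the escape was resolved when the source was parsed

print(raw == escaped)
print(len(raw), len(processed))
print(processed == "q")
```

This is why a function receiving "q" cannot tell whether the caller wrote "q" or "\x71" in the source.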
To see that input is not corrupted, you could print each byte as an integer:
print " ".join(str(ord(c)) for c in raw_input("input something"))
Or just echo as is (there could be a difference but it is unrelated to your "string-escape" issue):
print raw_input("input something")
Identity function:
def identity(obj):
    return obj
If you do nothing to the string then your users will receive the exact same object back. In the docs, you can provide examples of what you consider a concise, readable way to represent an input string as a Python literal. If you find it confusing to work with binary strings such as "\x20\x01" then you could accept an ASCII hex representation instead: "2001" (you could use binascii.hexlify/unhexlify to convert from one to the other).
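For instance, a quick sketch of the hex round trip using the address bytes from the question, assuming Python 3 (where hexlify returns bytes, hence the decode step):

```python
import binascii

# The 16 address bytes from the question, as a bytes object.
packed = b"\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34"

# bytes -> plain ASCII hex text, which contains no backslashes to escape
hex_text = binascii.hexlify(packed).decode("ascii")
print(hex_text)

# hex text -> the original bytes
round_trip = binascii.unhexlify(hex_text)
print(round_trip == packed)
```

With this format, users pass plain hex digits and never need to think about escape sequences or the raw prefix.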
The regex case is more complex because there are two languages:
Escape sequences are interpreted by Python according to its string literal syntax.
The regex engine interprets the resulting string object as a regex pattern, which also has its own escape sequences.
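To illustrate the two layers, here is a small Python 3 sketch using the regex from the question (the sample string is my own). The two pattern strings differ as objects, yet both match the same input, because the re module resolves \x escapes itself when the literal did not.

```python
import re

# Python resolves the escapes: the pattern string contains the actual characters.
literal_pattern = "^.{0}\xcb\x00\x71[\x00-\xff]"
# Python leaves the backslashes alone: the regex engine resolves the escapes.
raw_pattern = r"^.{0}\xcb\x00\x71[\x00-\xff]"

sample = "\xcb\x00qA"

print(literal_pattern == raw_pattern)
print(re.match(literal_pattern, sample) is not None)
print(re.match(raw_pattern, sample) is not None)
```

So for matching purposes, it often does not matter which layer handled the escape, only how the pattern looks when printed back.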
I think you will have to go the join route.
Here's an example:
>>> m = {chr(c): '\\x{0}'.format(hex(c)[2:].zfill(2)) for c in xrange(0,256)}
>>>
>>> x = "\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34"
>>> print ''.join(map(m.get, x))
\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34
I'm not entirely sure why you need that though. If your code needs to interact with other pieces of code, I'd suggest that you agree on a defined format, and stick to it.
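For what it's worth, the same join idea can be sketched for Python 3, where iterating over a bytes object yields integers, so no lookup table is needed. The function name to_hex_escapes is hypothetical.

```python
def to_hex_escapes(data):
    # Render every byte as a \xNN escape, including printable ones.
    return "".join("\\x{:02x}".format(byte) for byte in data)


x = b"\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34"
print(to_hex_escapes(x))
```

Note that this produces a uniform escaped form of the bytes. It cannot recover how the caller originally spelled the literal.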
I am trying to write a small Python 2.x API to support fetching a
job by jobNumber, where jobNumber is provided as an integer.
Sometimes the users provide a jobNumber as an integer literal
beginning with 0, e.g. 037537. (This is because they have been
coddled by R, a language that sanely considers 037537==37537.)
Python, however, considers integer literals starting with "0" to
be OCTAL, thus 037537!=37537, instead 037537==16223. This
strikes me as a blatant affront to the principle of least
surprise, and thankfully it looks like this was fixed in Python
3---see PEP 3127.
But I'm stuck with Python 2.7 at the moment. So my users do this:
>>> fetchJob(037537)
and silently get the wrong job (16223), or this:
>>> fetchJob(038537)
  File "<stdin>", line 1
    fetchJob(038537)
                  ^
SyntaxError: invalid token
where Python is rejecting the octal-incompatible digit.
There doesn't seem to be anything provided via __future__ to
allow me to get the Py3K behavior---it would have to be built-in
to Python in some manner, since it requires a change to the lexer
at least.
Is anyone aware of how I could protect my users from getting the
wrong job in cases like this? At the moment the best I can think
of is to change that API so it takes a string instead of an int.
At the moment the best I can think of is to change that API so it takes a string instead of an int.
Yes, and I think this is a reasonable option given the situation.
Another option would be to make sure that all your job numbers contain at least one digit greater than 7 so that adding the leading zero will give an error immediately instead of an incorrect result, but that seems like a bigger hack than using strings.
A final option could be to educate your users. It will only take five minutes or so to explain not to add the leading zero and what can happen if you do. Even if they forget or accidentally add the zero due to old habits, they are more likely to spot the problem if they have heard of it before.
Perhaps you could take the input as a string, strip leading zeros, then convert back to an int?
test = "001234505"
test = int(test.lstrip("0") or "0") # 1234505; the fallback covers an all-zero string
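Here is a rough sketch of the string-based approach, written for Python 3. The helper name normalize_job_number is hypothetical and is not part of the original API. int() with base 10 already ignores leading zeros, so no stripping is needed.

```python
def normalize_job_number(job_number):
    # Hypothetical helper: accept a job number as text and return an int.
    # Ints are passed through, but a leading-zero int literal has already
    # been misread by the time it arrives here, so text input is safer.
    if isinstance(job_number, int):
        return job_number
    text = str(job_number).strip()
    if not text.isdigit():
        raise ValueError("job number must contain only digits: %r" % (job_number,))
    return int(text, 10)


print(normalize_job_number("037537"))  # read as decimal
print(int("037537", 8))                # what the Python 2 octal literal means
```

A fetchJob that accepts text could call this helper first, so "037537" and "37537" fetch the same job.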
My SDK comes with code appearing with rows like this
id=str(profile["id"])
It makes me wonder why something like the following shouldn't work
id=profile["id"]
Casting I believe is expensive so either the same type can be used or polymorphism at the method called. Can you tell why I must cast the id to a string?
Thank you
There is no casting in Python. str(67) does not cast. It calls the __str__ method on the integer object, which generates a string representation of itself.
This is necessary to make sure that profile['id'] is a string.
It turns profile["id"] into a string. Python doesn't do this automatically, and further along in the code, the program probably checks profile["id"] against a string. Without this conversion, an equality check against a string would silently be False, and operations such as concatenating it with a string would raise a TypeError.
Python does not do arbitrary run time type conversion. You can't use an integer as a string.
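A short demonstration of these points, assuming Python 3 and a made-up profile dictionary whose id happens to be an integer:

```python
profile = {"id": 67}  # hypothetical data; the real SDK's profile is not shown

as_text = str(profile["id"])      # builds a new string via the int's __str__
same_as_dunder = profile["id"].__str__()

equal_without_conversion = (profile["id"] == "67")  # an int never equals a str

try:
    label = "user-" + profile["id"]
    concatenation_failed = False
except TypeError:
    concatenation_failed = True

print(as_text, same_as_dunder, equal_without_conversion, concatenation_failed)
```

The str() call is cheap, and it guarantees a string no matter what type the SDK stored under "id".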
As I understand it, files like /dev/urandom provide just a constant stream of bits. The terminal emulator then tries to interpret them as strings, which results in a mess of unrecognised characters.
How would I go about doing the same thing in python, send a string of ones and zeros to the terminal as "raw bits"?
edit
I may have to clarify:
Say for example the string I want to "print" is 1011100. On an ascii system, the output should be "\". If I cat /dev/urandom, it provides a constant stream of bits. Which get printed like this: "���c�g/�t]+__��-�;". That's what I want.
Stephano: the key is the incomplete answer by "#you" above - the chr function :
import random, sys
for i in xrange(500):
    sys.stdout.write(chr(random.randrange(256)))
Use the chr function. It takes an input between 0 and 255 and returns a string containing the character corresponding to that value.
And from another question on StackOverflow you can get a _bin function.
def _bin(x, width):
    return ''.join(str((x>>i)&1) for i in xrange(width-1,-1,-1))
Then simply call _bin(ord(x), 8), where x is a character (a string of length one).
import sys, random
while True:
    sys.stdout.write(chr(random.getrandbits(8)))
    sys.stdout.flush()
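For the question's own example, a string of ones and zeros, here is a possible Python 3 sketch. The names bits_to_bytes and write_raw are hypothetical. It assumes the bit string should be left-padded with zeros to a whole number of bytes.

```python
import sys


def bits_to_bytes(bits):
    # Convert a string of '0'/'1' characters to raw bytes,
    # left-padding with zeros to a multiple of 8 bits.
    if not bits:
        return b""
    width = -(-len(bits) // 8) * 8  # round up to a whole number of bytes
    padded = bits.zfill(width)
    return int(padded, 2).to_bytes(width // 8, "big")


def write_raw(bits):
    # Send the bytes to the terminal without any text encoding step.
    sys.stdout.buffer.write(bits_to_bytes(bits))
    sys.stdout.buffer.flush()


print(bits_to_bytes("1011100"))
```

Calling write_raw("1011100") should then send the single byte 92 to the terminal, which an ASCII terminal shows as a backslash.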