Changing a python variable based on its content - python

I'm writing some stuff with Django and it's becoming a bit of a mess. I'm trying to create an app that can run and return data from up-loadable python modules with a text input field for calling functions.
Most of it's working and I can call functions with no arguments.
The problem is with entering functions which require arguments, as any input that's passed to python is in string form, meaning if I enter: param('a',[foo]) to call the function:
def param(a,b):
return "Hello world" + a + b
The function will return:
"Hello world'a'[foo]"
Basically I need a way to take a string and convert it to literal python code... if that's possible so that I can treat this input box as if it were the python console.
Any ideas? Any help would be greatly welcome! I do realize this isn't a very pythonic way of doing things.
EDIT due to concerns about security: I am not too worried about security issues as this will only ever be a local project.

You can use ast.literal_eval, it is more limited than eval but does not pose the risks:
Safely evaluate an expression node or a Unicode or Latin-1 encoded string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.
from ast import literal_eval
def param(a,b):
return "Hello world" + literal_eval(a) + literal_eval(b)
In [9]: param('" foo"', "'bar'")
Out[9]: 'Hello world foobar'
You can use a try/except to catch when you want a or b are just meant to be strings:

You can use eval for that.
def param(a, b):
return "Hello world" + eval(a) + eval(b)
It will surely produce an error if you try to concatenate a string and a list:
>>> param('", an unsafe world!"', '"!!!"')
'Hello world, an unsafe world!!!!'
>>> param('", an unsafe world!"', '[]')
TypeError: cannot concatenate 'str' and 'list' objects
As of getting rid of the quotation marks, I guess you can either add the quotation marks yourself (like eval('"{0}"'.format(a))) or use an ast.literal_eval() as proposed by Padraic.

Related

How to convert a regular string to a raw string? [duplicate]

I have a string s, its contents are variable. How can I make it a raw string? I'm looking for something similar to the r'' method.
i believe what you're looking for is the str.encode("string-escape") function. For example, if you have a variable that you want to 'raw string':
a = '\x89'
a.encode('unicode_escape')
'\\x89'
Note: Use string-escape for python 2.x and older versions
I was searching for a similar solution and found the solution via:
casting raw strings python
Raw strings are not a different kind of string. They are a different way of describing a string in your source code. Once the string is created, it is what it is.
Since strings in Python are immutable, you cannot "make it" anything different. You can however, create a new raw string from s, like this:
raw_s = r'{}'.format(s)
As of Python 3.6, you can use the following (similar to #slashCoder):
def to_raw(string):
return fr"{string}"
my_dir ="C:\data\projects"
to_raw(my_dir)
yields 'C:\\data\\projects'. I'm using it on a Windows 10 machine to pass directories to functions.
raw strings apply only to string literals. they exist so that you can more conveniently express strings that would be modified by escape sequence processing. This is most especially useful when writing out regular expressions, or other forms of code in string literals. if you want a unicode string without escape processing, just prefix it with ur, like ur'somestring'.
For Python 3, the way to do this that doesn't add double backslashes and simply preserves \n, \t, etc. is:
a = 'hello\nbobby\nsally\n'
a.encode('unicode-escape').decode().replace('\\\\', '\\')
print(a)
Which gives a value that can be written as CSV:
hello\nbobby\nsally\n
There doesn't seem to be a solution for other special characters, however, that may get a single \ before them. It's a bummer. Solving that would be complex.
For example, to serialize a pandas.Series containing a list of strings with special characters in to a textfile in the format BERT expects with a CR between each sentence and a blank line between each document:
with open('sentences.csv', 'w') as f:
current_idx = 0
for idx, doc in sentences.items():
# Insert a newline to separate documents
if idx != current_idx:
f.write('\n')
# Write each sentence exactly as it appared to one line each
for sentence in doc:
f.write(sentence.encode('unicode-escape').decode().replace('\\\\', '\\') + '\n')
This outputs (for the Github CodeSearchNet docstrings for all languages tokenized into sentences):
Makes sure the fast-path emits in order.
#param value the value to emit or queue up\n#param delayError if true, errors are delayed until the source has terminated\n#param disposable the resource to dispose if the drain terminates
Mirrors the one ObservableSource in an Iterable of several ObservableSources that first either emits an item or sends\na termination notification.
Scheduler:\n{#code amb} does not operate by default on a particular {#link Scheduler}.
#param the common element type\n#param sources\nan Iterable of ObservableSource sources competing to react first.
A subscription to each source will\noccur in the same order as in the Iterable.
#return an Observable that emits the same sequence as whichever of the source ObservableSources first\nemitted an item or sent a termination notification\n#see ReactiveX operators documentation: Amb
...
Just format like that:
s = "your string"; raw_s = r'{0}'.format(s)
With a little bit correcting #Jolly1234's Answer:
here is the code:
raw_string=path.encode('unicode_escape').decode()
s = "hel\nlo"
raws = '%r'%s #coversion to raw string
#print(raws) will print 'hel\nlo' with single quotes.
print(raws[1:-1]) # will print hel\nlo without single quotes.
#raws[1:-1] string slicing is performed
The solution, which worked for me was:
fr"{orignal_string}"
Suggested in comments by #ChemEnger
I suppose repr function can help you:
s = 't\n'
repr(s)
"'t\\n'"
repr(s)[1:-1]
't\\n'
Just simply use the encode function.
my_var = 'hello'
my_var_bytes = my_var.encode()
print(my_var_bytes)
And then to convert it back to a regular string do this
my_var_bytes = 'hello'
my_var = my_var_bytes.decode()
print(my_var)
--EDIT--
The following does not make the string raw but instead encodes it to bytes and decodes it.

How do I convert a string into a formatted string literal at run-time?

I am working my way through PEP 498. Maybe I do not quite get the concept, but how could I possibly convert a regular Python string into a formatted string literal and evaluate it at run-time?
Imagine you have a string:
some_string = 'Name: {name}'
If it was a formatted string literal, the following would work:
name = 'Demo'
some_string_literal = f'Name: {name}'
some_string_literal == 'Name: Demo' # returns True
Ignoring the question of whether this makes sense at all, how could I e.g. read the contents of some_string from a file at run-time, turn it into some_string_literal and evaluate it? My understanding of the underlying CPython implementation is that string literals are interpreted at compile-time (translation into byte code).
I am aware of the more "explicit" option of just using ...
some_string = 'Name: {name}'
some_string.format(name = 'Demo') == 'Name: Demo' # returns True
... but this is not what I am looking for.
EDIT: It was noted in the comments that the "explicit" option is what I should be looking for. I'd agree that what I am asking here is definitely insecure in a number of ways. Nevertheless, I am interested in whether there is a way to do this or not.
PEP 498 says
Because the compiler must be involved in evaluating the expressions contained in the interpolated strings, there must be some way to denote to the compiler which strings should be evaluated.
In other words, the f has an effect on the way the string literal is parsed, just as r does. So you have to invoke the parser for the interpolation to work.
You want to read data into some_string and turn it into a string literal that can be parsed as an f-string. That is not very different from wanting to read data into some_lambda_expression and turning it into a lambda expression. You can do it, of course, using eval, because eval is a way to turn the contents of a string variable into code, by invoking the parser. I know that is not what you wanted. But you can't parse some_string as an f-string without invoking the parser, so you can't have what you want.

Is it possible to suppress Python's escape sequence processing on a given string without using the raw specifier?

Conclusion: It's impossible to override or disable Python's built-in escape sequence processing, such that, you can skip using the raw prefix specifier. I dug into Python's internals to figure this out. So if anyone tries designing objects that work on complex strings (like regex) as part of some kind of framework, make sure to specify in the docstrings that string arguments to the object's __init__() MUST include the r prefix!
Original question: I am finding it a bit difficult to force Python to not "change" anything about a user-inputted string, which may contain among other things, regex or escaped hexadecimal sequences. I've already tried various combinations of raw strings, .encode('string-escape') (and its decode counterpart), but I can't find the right approach.
Given an escaped, hexadecimal representation of the Documentation IPv6 address 2001:0db8:85a3:0000:0000:8a2e:0370:7334, using .encode(), this small script (called x.py):
#!/usr/bin/env python
class foo(object):
__slots__ = ("_bar",)
def __init__(self, input):
if input is not None:
self._bar = input.encode('string-escape')
else:
self._bar = "qux?"
def _get_bar(self): return self._bar
bar = property(_get_bar)
#
x = foo("\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34")
print x.bar
Will yield the following output when executed:
$ ./x.py
\x01\r\xb8\x85\xa3\x00\x00\x00\x00\x8a.\x03ps4
Note the \x20 got converted to an ASCII space character, along with a few others. This is basically correct due to Python processing the escaped hex sequences and converting them to their printable ASCII values.
This can be solved if the initializer to foo() was treated as a raw string (and the .encode() call removed), like this:
x = foo(r"\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34")
However, my end goal is to create a kind of framework that can be used and I want to hide these kinds of "implementation details" from the end user. If they called foo() with the above IPv6 address in escaped hexadecimal form (without the raw specifier) and immediately print it back out, they should get back exactly what they put in w/o knowing or using the raw specifier. So I need to find a way to have foo's __init__() do whatever processing is necessary to enable that.
Edit: Per this SO question, it seems it's a defect of Python, in that it always performs some kind of escape sequence processing. There does not appear to be any kind of facility to completely turn off escape sequence processing, even temporarily. Sucks. I guess I am going to have to research subclassing str to create something like rawstr that intelligently determines what escape sequences Python processed in a string, and convert them back to their original format. This is not going to be fun...
Edit2: Another example, given the sample regex below:
"^.{0}\xcb\x00\x71[\x00-\xff]"
If I assign this to a var or pass it to a function without using the raw specifier, the \x71 gets converted to the letter q. Even if I add .encode('string-escape') or .replace('\\', '\\\\'), the escape sequences are still processed. thus resulting in this output:
"^.{0}\xcb\x00q[\x00-\xff]"
How can I stop this, again, without using the raw specifier? Is there some way to "turn off" the escape sequence processing or "revert" it after the fact thus that the q turns back into \x71? Is there a way to process the string and escape the backslashes before the escape sequence processing happens?
I think you have an understandable confusion about a difference between Python string literals (source code representation), Python string objects in memory, and how that objects can be printed (in what format they can be represented in the output).
If you read some bytes from a file into a bytestring you can write them back as is.
r"" exists only in source code there is no such thing at runtime i.e., r"\x" and "\\x" are equal, they may even be the exact same string object in memory.
To see that input is not corrupted, you could print each byte as an integer:
print " ".join(map(ord, raw_input("input something")))
Or just echo as is (there could be a difference but it is unrelated to your "string-escape" issue):
print raw_input("input something")
Identity function:
def identity(obj):
return obj
If you do nothing to the string then your users will receive the exact same object back. You can provide examples in the docs what you consider a concise readable way to represent input string as Python literals. If you find confusing to work with binary strings such as "\x20\x01" then you could accept ascii hex-representation instead: "2001" (you could use binascii.hexlify/unhexlify to convert one to another).
The regex case is more complex because there are two languages:
Escapes sequences are interpreted by Python according to its string literal syntax
Regex engine interprets the string object as a regex pattern that also has its own escape sequences
I think you will have to go the join route.
Here's an example:
>>> m = {chr(c): '\\x{0}'.format(hex(c)[2:].zfill(2)) for c in xrange(0,256)}
>>>
>>> x = "\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34"
>>> print ''.join(map(m.get, x))
\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34
I'm not entirely sure why you need that though. If your code needs to interact with other pieces of code, I'd suggest that you agree on a defined format, and stick to it.

Python str() - specify which kind of quotes to add/use?

Is there a way to influence the kind of quotes that python uses when casting a tuple/list to string?
For some NLP software I get tuples somewhat like this:
("It", ("isn't", "true"))
I want to cast it to a string and simply remove all double quotes and commas:
(It (Isn't true))
However, python is having its way with the quotes, it seems to prefer single quotes:
>>> print str(("It", ("Isn't" ,"true")))
('It', ("Isn't", 'true'))
, making my life more difficult. Of course I could write my own function for printing it out part-by-part, but there is so much similarity between the representation and native python tuples.
You can't rely on the exact representation that repr uses. I'd just do as you thought and write your own function -- I don't see it being more than a handful of lines of code. This should get you going.
def s_exp(x):
if isinstance(x, (tuple, list)):
return '(%s)' % (' '.join(map(s_exp, x)))
return str(x)
Writing your own function may be inevitable: if your strings contain brackets "(", ")" or spaces " " then you'll need some form of escaping to produce well-formed s-expressions.
Perhaps you can use json instead
>>> import json
>>> print json.dumps(("It", ("isn't", "true")))
["It", ["isn't", "true"]]
Python objects have a __str__ method that converts them into a string representation. This is what does the conversion and it's intelligent enough to use one kind of quote when the other is used in the string and also to do escaping if both are used.
In your example, the It got single quoted since that's what Python "prefers". The double quote was used for Isn't since it contains a `.
You should roll out your own converter really. Using a little recursion, it should be quite small.

Python Command Line "characters" returns 'characters'

Thanks in advance for your help.
When entering "example" at the command line, Python returns 'example'. I can not find anything on the web to explain this. All reference materials speaks to strings in the context of the print command, and I get all of the material about using double quotes, singles quotes, triple quotes, escape commands, etc.
I can not, however, find anything explaining why entering text surrounded by double quotes at the command line always returns the same text surrounded by single quotes. What gives? Thanks.
In Python both 'string' and "string" are used to represent string literals. It's not like Java where single and double quotes represent different data types to the compiler.
The interpreter evaluates each line you enter and displays this value to you. In both cases the interpreter is evaluating what you enter, getting a string, and displaying this value. The default way of displaying strings is in single quotes so both times the string is displayed enclosed in single quotes.
It does seem odd - in that it breaks Python's rule of There should be one - and preferably only one - obvious way to do it - but I think disallowing one of the options would have been worse.
You can also enter a string literal using triple quotes:
>>> """characters
... and
... newlines"""
'characters\nand\nnewlines'
You can use the command line to confirm that these are the same thing:
>>> type("characters")
<type 'str'>
>>> type('characters')
<type 'str'>
>>> "characters" == 'characters'
True
The interpreter uses the __repr__ method of an object to get the display to print to you. So on your own objects you can determine how they are displayed in the interpreter. We can't change the __repr__ method for built in types, but we can customise the interpreter output using sys.displayhook:
>>> import sys
>>> def customoutput(value):
... if isinstance(value,str):
... print '"%s"' % value
... else:
... sys.__displayhook__(value)
...
>>> sys.displayhook = customoutput
>>> 'string'
"string"
In python, single quotes and double quotes are semantically the same.
It struck me as strange at first, since in C++ and other strongly-typed languages single quotes are a char and doubles are a string.
Just get used to the fact that python doesn't care about types, and so there's no special syntax for marking a string vs. a character. Don't let it cloud your perception of a great language
Don't get confused.
In python single quotes and double quotes are same. The creates an string object.

Categories

Resources