Python str() - specify which kind of quotes to add/use? - python

Is there a way to influence the kind of quotes that python uses when casting a tuple/list to string?
For some NLP software I get tuples somewhat like this:
("It", ("isn't", "true"))
I want to cast it to a string and simply remove all double quotes and commas:
(It (Isn't true))
However, python is having its way with the quotes, it seems to prefer single quotes:
>>> print str(("It", ("Isn't" ,"true")))
('It', ("Isn't", 'true'))
This makes my life more difficult. Of course I could write my own function to print it out part by part, but the representation is so similar to native Python tuples.

You can't rely on the exact representation that repr uses. I'd just do as you thought and write your own function -- I don't see it being more than a handful of lines of code. This should get you going.
def s_exp(x):
    if isinstance(x, (tuple, list)):
        return '(%s)' % (' '.join(map(s_exp, x)))
    return str(x)
Writing your own function may be inevitable: if your strings contain brackets "(", ")" or spaces " " then you'll need some form of escaping to produce well-formed s-expressions.
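A minimal sketch of the escaping idea, assuming that atoms containing spaces, parentheses, double quotes or backslashes should be wrapped in double quotes with backslash escapes. That quoting convention and the name s_exp_escaped are assumptions for illustration; the tool that reads the output may expect something different.

```python
def s_exp_escaped(x):
    """Render nested tuples/lists as an s-expression string."""
    if isinstance(x, (tuple, list)):
        return "(%s)" % " ".join(s_exp_escaped(item) for item in x)
    text = str(x)
    # Assumed convention: quote any atom that is empty or contains a
    # space, parenthesis, double quote, or backslash.
    if text == "" or any(ch in ' ()"\\' for ch in text):
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return '"%s"' % escaped
    return text


print(s_exp_escaped(("It", ("isn't", "true"))))  # (It (isn't true))
print(s_exp_escaped(("a b", "(c)")))             # ("a b" "(c)")
```

Atoms without special characters come out unchanged, so the common case still looks like the output asked for in the question.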

Perhaps you can use json instead
>>> import json
>>> print json.dumps(("It", ("isn't", "true")))
["It", ["isn't", "true"]]

Python objects have a __str__ method that converts them into a string representation. For a tuple or list, that representation is built from the repr() of each element, which is why the quotes show up. The conversion is intelligent enough to use one kind of quote when the other is used in the string, and to escape if both are used.
In your example, It got single-quoted since that's what Python "prefers". Double quotes were used for Isn't since it contains a '.
You should roll out your own converter really. Using a little recursion, it should be quite small.
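As a small illustration of where the quotes come from (Python 3 syntax, the variable names are only for this example): str() on a string gives the bare text, while str() on a tuple is assembled from the repr() of each element.

```python
data = ("It", ("Isn't", "true"))

# str() on the string itself returns the bare text, with no quotes.
print(str("Isn't"))

# str() on a tuple builds its text from repr() of each element.
print(str(data))

# Rebuilding the outer tuple by hand from repr() gives the same text.
rebuilt = "(" + ", ".join(repr(item) for item in data) + ")"
print(rebuilt == str(data))
```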

Related

How to convert a regular string to a raw string? [duplicate]

I have a string s, its contents are variable. How can I make it a raw string? I'm looking for something similar to the r'' method.
I believe what you're looking for is the str.encode("unicode_escape") function. For example, if you have a variable that you want to 'raw string':
a = '\x89'
a.encode('unicode_escape')
b'\\x89'
Note: on Python 3 the result is a bytes object, so call .decode() on it to get a str. Use string-escape on Python 2.x.
I was searching for a similar solution and found the solution via:
casting raw strings python
Raw strings are not a different kind of string. They are a different way of describing a string in your source code. Once the string is created, it is what it is.
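Since strings in Python are immutable, you cannot "make it" anything different. You can create a new string from s like this, but it holds exactly the same characters as s, because the r prefix only affects how the literal '{}' itself is parsed: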
raw_s = r'{}'.format(s)
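A short check of the point that raw strings are only a notation for literals (Python 3, the variable names are illustrative): a raw literal and its escaped spelling are the same str, and formatting an existing string through a raw template does not change its contents.

```python
# Two spellings of the same text: a raw literal and an escaped literal.
literal_raw = r"C:\new\table"
literal_escaped = "C:\\new\\table"
print(literal_raw == literal_escaped)  # True
print(type(literal_raw))               # <class 'str'>

# An existing string keeps its real newline when passed through r'{}'.
s = "line one\nline two"
copy = r"{}".format(s)
print(copy == s)     # True
print("\n" in copy)  # True
```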
As of Python 3.6, you can use the following (similar to @slashCoder):
def to_raw(string):
    return fr"{string}"
my_dir = "C:\data\projects"
to_raw(my_dir)
yields 'C:\\data\\projects'. I'm using it on a Windows 10 machine to pass directories to functions.
Raw strings apply only to string literals. They exist so that you can more conveniently express strings that would be modified by escape sequence processing. This is especially useful when writing out regular expressions, or other forms of code in string literals. If you want a unicode string without escape processing in Python 2, just prefix it with ur, like ur'somestring' (the ur prefix was removed in Python 3).
For Python 3, the way to do this that doesn't add double backslashes and simply preserves \n, \t, etc. is:
a = 'hello\nbobby\nsally\n'
escaped = a.encode('unicode-escape').decode().replace('\\\\', '\\')
print(escaped)
Which gives a value that can be written as CSV:
hello\nbobby\nsally\n
There doesn't seem to be a solution for other special characters, however, that may get a single \ before them. It's a bummer. Solving that would be complex.
For example, to serialize a pandas.Series containing a list of strings with special characters into a text file in the format BERT expects, with a newline between each sentence and a blank line between each document:
with open('sentences.csv', 'w') as f:
    current_idx = 0
    for idx, doc in sentences.items():
        # Insert a blank line to separate documents
        if idx != current_idx:
            f.write('\n')
        # Write each sentence exactly as it appeared, one per line
        for sentence in doc:
            f.write(sentence.encode('unicode-escape').decode().replace('\\\\', '\\') + '\n')
This outputs (for the Github CodeSearchNet docstrings for all languages tokenized into sentences):
Makes sure the fast-path emits in order.
@param value the value to emit or queue up\n@param delayError if true, errors are delayed until the source has terminated\n@param disposable the resource to dispose if the drain terminates
Mirrors the one ObservableSource in an Iterable of several ObservableSources that first either emits an item or sends\na termination notification.
Scheduler:\n{@code amb} does not operate by default on a particular {@link Scheduler}.
@param the common element type\n@param sources\nan Iterable of ObservableSource sources competing to react first.
A subscription to each source will\noccur in the same order as in the Iterable.
@return an Observable that emits the same sequence as whichever of the source ObservableSources first\nemitted an item or sent a termination notification\n@see ReactiveX operators documentation: Amb
...
Just format like that:
s = "your string"; raw_s = r'{0}'.format(s)
With a small correction to @Jolly1234's answer, here is the code:
raw_string = path.encode('unicode_escape').decode()
s = "hel\nlo"
raws = '%r' % s  # conversion to the repr of the string
# print(raws) will print 'hel\nlo' with single quotes.
print(raws[1:-1])  # slicing strips the quotes and prints hel\nlo
The solution that worked for me was:
fr"{original_string}"
Suggested in comments by @ChemEnger.
I suppose repr function can help you:
>>> s = 't\n'
>>> repr(s)
"'t\\n'"
>>> repr(s)[1:-1]
't\\n'
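The repr() slice and the unicode_escape codec agree on simple input but differ once the text contains both kinds of quote, because repr() also escapes the quote it chose as delimiter. A small comparison on Python 3 (the helper names are made up for this sketch):

```python
def escape_with_repr(s):
    # Take the repr and drop the surrounding quote characters.
    return repr(s)[1:-1]


def escape_with_codec(s):
    # Escape via the codec; the resulting bytes are pure ASCII.
    return s.encode("unicode_escape").decode("ascii")


simple = "hel\nlo"
quoted = "a'b\"c"  # the text a'b"c

print(escape_with_repr(simple), escape_with_codec(simple))
print(escape_with_repr(quoted))   # a\'b"c  (extra backslash from repr)
print(escape_with_codec(quoted))  # a'b"c
```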
Just simply use the encode function.
my_var = 'hello'
my_var_bytes = my_var.encode()
print(my_var_bytes)
And then to convert it back to a regular string do this:
my_var_bytes = b'hello'
my_var = my_var_bytes.decode()
print(my_var)
--EDIT--
The above does not make the string raw; it only encodes it to bytes and decodes it back.

Python 2.X adding single quotes around a string

Currently to add single quotes around a string, the best solution I came up with was to make a small wrapper function.
def foo(s1):
    return "'" + s1 + "'"
Is there an easier more pythonic way of doing this?
Here's another (perhaps more pythonic) option, using format strings:
def foo(s1):
    return "'{}'".format(s1)
What about:
def foo(s1):
    return "'%s'" % s1
Just wanted to highlight what @metatoaster said in the comment above, as I missed it at first.
Using repr(string) will add single quotes, then double quotes outside of that, then single quotes outside of that with escaped inner single quotes, then onto other escaping.
Using repr(), as a built-in, is more direct, unless there are other conflicts..
s = 'strOrVar'
print s, repr(s), repr(repr(s)), ' ', repr(repr(repr(s))), repr(repr(repr(repr(s))))
# prints: strOrVar 'strOrVar' "'strOrVar'" '"\'strOrVar\'"' '\'"\\\'strOrVar\\\'"\''
The docs state that repr(), i.e. representation, is basically the reverse of eval():
"For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(),.."
Backquotes would be shorter, but are removed in Python 3+.
Interestingly, StackOverflow uses backquotes to specify code spans, instead of highlighting a code block and clicking the code button - it has some interesting behavior though.
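One difference between the wrapper functions and repr() is worth seeing side by side: repr() picks and escapes quotes based on the content, while plain concatenation or formatting always adds single quotes. A small sketch in Python 3 syntax (quote_naive is a name chosen for this example):

```python
def quote_naive(s):
    # Wraps the text in single quotes without looking at its content.
    return "'{}'".format(s)


plain = "strOrVar"
tricky = "it's"

# Same result when the text contains no quotes.
print(quote_naive(plain), repr(plain))

# Different result when the text contains a single quote.
print(quote_naive(tricky), repr(tricky))
```

Which one is "right" depends on whether the goal is literally single quotes around the text or a valid Python literal.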
This works on Python 3.5+
def foo2(char):
    return "'{}'".format(char)

Why is escaping of single quotes inconsistent on file read in Python?

Given two nearly identical text files (plain text, created in MacVim), I get different results when reading them into a variable in Python. I want to know why this is and how I can produce consistent behavior.
For example, f1.txt looks like this:
This isn't a great example, but it works.
And f2.txt looks like this:
This isn't a great example, but it wasn't meant to be.
"But doesn't it demonstrate the problem?," she said.
When I read these files in, using something like the following:
f = open("f1.txt","r")
x = f.read()
I get the following when I look at the variables in the console. f1.txt:
>>> x
"This isn't a great example, but it works.\n\n"
And f2.txt:
>>> y
'This isn\'t a great example, but it wasn\'t meant to be. \n"But doesn\'t it demonstrate the problem?," she said.\n\n'
In other words, f1 comes in with only escaped newlines, while f2 also has its single quotes escaped.
repr() shows what's going on. first for f1:
>>> repr(x)
'"This isn\'t a great example, but it works.\\n\\n"'
And f2:
>>> repr(y)
'\'This isn\\\'t a great example, but it wasn\\\'t meant to be. \\n"But doesn\\\'t it demonstrate the problem?," she said.\\n\\n\''
This kind of behavior is driving me crazy. What's going on and how do I make it consistent? If it matters, I'm trying to read in plain text, manipulate it, and eventually write it out so that it shows the properly escaped characters (for pasting into Javascript code).
Python is giving you a string literal which, if you gave it back to Python, would result in the same string. This is known as the repr() (short for "representation") of the string. This may not (probably won't, in fact) match the string as it was originally specified, since there are so many ways to do that, and Python does not record anything about how it was originally specified.
It uses double quotes around your first example, which works fine because it doesn't contain any double quotes. The second string contains double quotes, so it can't use double quotes as a delimiter without escaping them. Instead it uses single quotes and uses backslashes to escape the single quotes in the string (the double quotes then need no escaping). The goal is to avoid escaping quotes whenever one choice of delimiter makes that possible.
There is no reason for this behavior to drive you crazy and no need to try to make it consistent. You only get the repr() of a string when you are peeking at values in Python's interactive mode. When you actually print or otherwise use the string, you get the string itself, not a reconstituted string literal.
If you want to get a JavaScript string literal, the easiest way is to use the json module:
import json
print json.dumps('I said, "Hello, world!"')
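The same idea in Python 3 syntax, as a rough sketch: json.dumps() produces a double-quoted literal with quotes and newlines escaped, and json.loads() recovers the original text. The sample text is made up for the example.

```python
import json

text = 'I said, "Hello, world!"\nIt isn\'t hard.'

# Double quotes and the newline are escaped; the single quote is left alone.
js_literal = json.dumps(text)
print(js_literal)

# Decoding the literal gives back exactly the original string.
print(json.loads(js_literal) == text)
```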
Both f1 and f2 contain perfectly normal, unescaped single quotes.
The fact that their repr looks different is meaningless.
There are a variety of different ways to represent the same string. For example, these are all equivalent literals:
"abc'def'ghi"
'abc\'def\'ghi'
'''abc'def'ghi'''
r"abc'def'ghi"
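A quick way to convince yourself that these spellings are interchangeable is to compare them at runtime; the list name below is only for this check.

```python
# Four spellings of the same text in source code.
literals = [
    "abc'def'ghi",
    'abc\'def\'ghi',
    '''abc'def'ghi''',
    r"abc'def'ghi",
]

# Every spelling produces an identical str object value.
print(all(item == literals[0] for item in literals))
print(len(set(literals)))
```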
The repr function on a string always just generates some literal that is a valid representation of that string, but you shouldn't depend on exactly which one it generates. (In fact, you should rarely use it for anything but debugging purposes in the first place.)
Since the language doesn't define anywhere what algorithm it uses to generate a repr, it could be different for each version of each implementation.
Most of them will try to be clever, using single or double quotes to avoid as many escaped internal quotes as possible, but even that isn't guaranteed. If you really want to know the algorithm for a particular implementation and version, you pretty much have to look at the source. For example, in CPython 3.3, inside unicode_repr, it counts the number of quotes of each type; then if there are single quotes but no double quotes, it uses " instead of '.
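The rule described for CPython can be approximated in a few lines. This is only a sketch of the delimiter choice, assuming CPython; guess_repr_quote is a hypothetical helper and other implementations or versions may behave differently.

```python
def guess_repr_quote(s):
    # Double quotes only when there are single quotes and no double quotes.
    if "'" in s and '"' not in s:
        return '"'
    return "'"


samples = ["plain", "isn't", 'say "hi"', 'isn\'t "both"']
for s in samples:
    # Compare the guess with the delimiter repr() actually used.
    print(repr(s), guess_repr_quote(s) == repr(s)[0])
```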
If you want "the" representation of a string, you're out of luck, because there is no such thing. But if you want some particular representation of a string, that's no problem. You just have to know what format you want; most formats, someone's already written the code, and often it's in the standard library. You can make C literal strings, JSON-encoded strings, strings that can fit into ASCII RFC822 headers… But all of those formats have different rules from each other (and from Python literals), so you have to use the right function for the job.

Is there an easy way to convert a string containing a string literal into the string it represents?

I'm trying to (slightly) improve a script that does a quick-and-hacky parse of some config files.
Upon recognising "an item" read from the file, I need to try to convert it into a simple python value. The value could be a number or a string.
To convert strings read from the file into Python numbers I can just use int or float and catch the ValueError if it wasn't actually a number. Is there something similar for Python strings? i.e.
s1 = 'Goodbye World. :('
s2 = repr(s1)
s3 = ' "not a string literal" '
s4 = s3.strip()
v1 = parse_string_literal(s1) # throws ValueError
v2 = parse_string_literal(s2) # returns 'Goodbye World. :('
v3 = parse_string_literal(s3) # throws ValueError
v4 = parse_string_literal(s4) # returns 'not a string literal'
In the file, string values are represented very similarly to Python string literals; they can be quoted with either ' or ", and could contain backslash escapes, etc. I could roll my own parser with regexes, but if there's something already existing I'd rather not re-invent the wheel.
I could use eval of course, but that's always somewhat dangerous.
... And sure enough, I just found the answer after I posted.
Even better than what I was looking for is ast.literal_eval: ast — Abstract Syntax Trees
It can evaluate any Python expression consisting solely of literals, which makes it safe. It also means I can recognise items from the config file that are potentially numbers or strings without having to attempt multiple conversions, falling back to the next conversion on a ValueError exception. I don't even have to figure out what type the item is.
It's even way more flexible than I need, which could be a problem if I cared about making sure the item was only a number or a string, but I don't:
>>> import ast
>>> ast.literal_eval('{"foo": [23.8, 170, (1, 2, 3)]}')
{'foo': [23.8, 170, (1, 2, 3)]}
ast.literal_eval() handles all simple Python literals, and most compound literals.
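If only strings should be accepted, the parse_string_literal function from the question could be sketched on top of ast.literal_eval roughly as follows. The whitespace check and the error messages are assumptions made here to match the behaviour the question asks for; they are not part of ast.

```python
import ast


def parse_string_literal(text):
    """Return the str that text spells as a Python literal, else raise ValueError."""
    # Reject surrounding whitespace explicitly so behaviour does not depend
    # on how a given Python version treats leading spaces.
    if text != text.strip():
        raise ValueError("surrounding whitespace: %r" % (text,))
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise ValueError("not a string literal: %r" % (text,)) from exc
    if not isinstance(value, str):
        raise ValueError("literal is not a string: %r" % (text,))
    return value


s1 = 'Goodbye World. :('
s3 = ' "not a string literal" '
print(parse_string_literal(repr(s1)))    # Goodbye World. :(
print(parse_string_literal(s3.strip()))  # not a string literal
for bad in (s1, s3, "42"):
    try:
        parse_string_literal(bad)
    except ValueError as exc:
        print("rejected:", exc)
```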

python string good practise: ' vs " [duplicate]

Possible Duplicate:
Single quotes vs. double quotes in Python
I have seen that when I work with strings in Python, both of the following syntaxes are accepted:
mystring1 = "here is my string 1"
mystring2 = 'here is my string 2'
Is there any difference between them?
Is there any reason to prefer one over the other?
Cheers,
No, there isn't. When the string contains a single quote, it's easier to enclose it in double quotes, and vice versa. Other than this, my advice would be to pick a style and stick to it.
Another useful type of string literals are triple-quoted strings that can span multiple lines:
s = """string literal...
...continues on second line...
...and ends here"""
Again, it's up to you whether to use single or double quotes for this.
Lastly, I'd like to mention "raw string literals". These are enclosed in r"..." or r'...' and prevent escape sequences (such as \n) from being parsed as such. Among other things, raw string literals are very handy for specifying regular expressions.
Read more about Python string literals here.
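To make the raw-literal point concrete, here is a small regular-expression sketch: the raw spelling and the doubled-backslash spelling produce the same pattern, the raw one is just easier to read. The sample sentence is invented for the example.

```python
import re

# Same pattern written as a raw literal and as an escaped literal.
pattern_raw = r"\bisn't\b"
pattern_plain = "\\bisn't\\b"
print(pattern_raw == pattern_plain)  # True

# Without the r prefix, "\b" would mean a backspace character instead.
print(len("\b"), len(r"\b"))  # 1 2

print(re.findall(pattern_raw, "It isn't true, isn't it?"))
```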
While it's true that there is no difference between one and the other, I have often seen the following convention in the open-source community:
" for text that is supposed to be read (email, feedback, exception, etc)
' for data text (key dict, function arguments, etc)
triple " for any docstring or text that includes " and '
No. A matter of style only. Just be consistent.
I tend to use " simply because that's what most other programming languages use.
So, habit, really.
There's no difference.
What's better is arguable. I use "..." for text strings and '...' for characters, because that's consistent with other languages and may save you some keypresses when porting to or from a different language. For regexps and SQL queries, I always use r'''...''', because they frequently end up containing backslashes and both types of quotes.
Python is all about the least amount of code to get the most effect. The shorter the better. And ' is, in a way, one dot shorter than " which is why I prefer it. :)
As everyone's pointed out, they're functionally identical. However, PEP 257 (Docstring Conventions) suggests always using """ around docstrings just for the purposes of consistency. No one's likely to yell at you or think poorly of you if you don't, but there it is.
