How do I enclose a variable within single quotes in Python? It's probably very simple, but I can't seem to get it! I need to URL-encode the variable term. The term is entered in a form by a user and passed to a function where it is URL-encoded with term = urllib.quote(term). If the user entered "apple computer" as their term, after URL-encoding it would be "apple%20computer". What I want is to have the term surrounded by single quotes before URL-encoding, so that it will be "'apple computer'" and then, after URL-encoding, "%27apple%20computer%27". I need to pass the term to a URL, and it won't work unless I use this syntax. Any suggestions?
Sample Code:
import urllib2
import requests

def encode():
    query = avariable  # the value of this variable is to be enclosed in single quotes
    query = urllib2.quote(query)
    return dict(query=query)

def results():
    bing = "https://api.datamarket.azure.com/Data.ashx/Bing/SearchWeb/Web?Query=%(query)s&$top=50&$format=json"
    API_KEY = 'akey'
    r = requests.get(bing % encode(), auth=('', API_KEY))
    return r.json
There are four ways:
string concatenation
term = urllib.quote("'" + term + "'")
old-style string formatting
term = urllib.quote("'%s'" % (term,))
new-style string formatting
term = urllib.quote("'{}'".format(term))
f-string formatting (Python 3.6+, where quote lives in urllib.parse)
term = urllib.parse.quote(f"'{term}'")
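For Python 3, where quote lives in urllib.parse, the same idea might look like the sketch below. The helper name quote_wrapped is made up for illustration and is not part of any library. Note that the single quote encodes as %27.

```python
from urllib.parse import quote

def quote_wrapped(term):
    # Hypothetical helper: wrap the term in single quotes, then URL-encode it.
    return quote(f"'{term}'")

print(quote_wrapped("apple computer"))  # %27apple%20computer%27
```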
You can just use string interpolation:
>>> term = "foo"
>>> "'%s'" % term
"'foo'"
For those who come here while googling something like "python surround string" and are time-conscious (or just looking for the "best" solution):
I was going to add that there are now f-strings, which in Python 3.6+ environments are much easier to use and (from what I have read) are said to be faster.
#f-string approach
term = urllib.parse.quote(f"'{term}'")
I decided to do a timeit of each method of "surrounding" a string in python.
import timeit
results = {}
results["concat"] = timeit.timeit("\"'\" + 'test' + \"'\"")
results["%s"] = timeit.timeit("\"'%s'\" % ('test',)")
results["format"] = timeit.timeit("\"'{}'\".format('test')")
results["f-string"] = timeit.timeit("f\"'{'test'}'\"")  # must be using Python 3.6+
results["join"] = timeit.timeit("'test'.join((\"'\", \"'\"))")
for n, t in sorted(results.items(), key = lambda nt: nt[1]):
print(f"{n}, {t}")
Results:
concat, 0.009532792959362268
f-string, 0.08994143106974661
join, 0.11005984898656607
%s, 0.15808712202124298
format, 0.2698059631511569
Oddly enough, concatenation comes out faster than the f-string every time I run it, but you can copy and paste the code to see whether your string or use case behaves differently. There may also be a better way to pass these statements to timeit than backslash-escaping all the quotes, so let me know.
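One possible explanation for the concat result, offered as a guess rather than a measurement: CPython folds a concatenation of string literals into a single constant at compile time, so that statement may not be timing any concatenation at all. A variant that passes the string in through timeit's setup argument avoids this. The variable name s and the iteration count are arbitrary choices for this sketch.

```python
import timeit

# Bind the test string to a name in setup so no statement is a pure constant expression.
setup = "s = 'test'"
statements = {
    "concat": "\"'\" + s + \"'\"",
    "%s": "\"'%s'\" % (s,)",
    "format": "\"'{}'\".format(s)",
    "f-string": "f\"'{s}'\"",
    "join": "s.join((\"'\", \"'\"))",
}

timings = {
    name: timeit.timeit(stmt, setup=setup, number=100_000)
    for name, stmt in statements.items()
}

# Print the methods from fastest to slowest.
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}, {seconds}")
```

The ordering will vary by machine and Python version, so treat any single run as indicative only.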
def wrap_and_encode(x):
    return encode("'%s'" % x)
Should be what you are looking for.
What's wrong with adding the single quotes after it has been URL-encoded? Or just adding them beforehand in your encode function above?
I just stumbled upon some code doing it this way:
term = urllib.quote(term.join(("'", "'")))
(In this case join() uses term as a separator to combine all elements that were given in the iterable parameter into one string. Since there are only two elements, they are simply wrapped around one instance of term.)
Although it is fairly readable, I would still consider it a hack and less clear than the other options. Therefore, I recommend the use of string formatting as mentioned by others:
term = urllib.quote("'{}'".format(term))
Related
I have a string s, its contents are variable. How can I make it a raw string? I'm looking for something similar to the r'' method.
I believe what you're looking for is the str.encode("string-escape") function. For example, if you have a variable that you want to turn into a 'raw string':
a = '\x89'
a.encode('unicode_escape')
'\\x89'
Note: Use string-escape for python 2.x and older versions
I was searching for a similar solution and found the solution via:
casting raw strings python
Raw strings are not a different kind of string. They are a different way of describing a string in your source code. Once the string is created, it is what it is.
Since strings in Python are immutable, you cannot "make it" anything different. You can, however, create a new raw string from s, like this:
raw_s = r'{}'.format(s)
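A quick check of the point that "raw" is a property of literals, not of string objects. This is only a sketch with arbitrary sample values; it suggests that formatting through r'{}' returns the same characters as s.

```python
# The r prefix changes how the literal is parsed, nothing more.
assert len('\n') == 1      # one newline character
assert len(r'\n') == 2     # a backslash followed by the letter n

s = 'line one\nline two'   # s already contains a real newline
raw_s = r'{}'.format(s)    # r'{}' and '{}' are the same two-character string

print(raw_s == s)          # True: formatting does not re-escape anything
```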
As of Python 3.6, you can use the following (similar to @slashCoder):
def to_raw(string):
    return fr"{string}"
my_dir ="C:\data\projects"
to_raw(my_dir)
yields 'C:\\data\\projects'. I'm using it on a Windows 10 machine to pass directories to functions.
Raw strings apply only to string literals. They exist so that you can more conveniently express strings that would be modified by escape sequence processing. This is especially useful when writing out regular expressions, or other forms of code, in string literals. If you want a unicode string without escape processing in Python 2, just prefix it with ur, like ur'somestring' (the ur prefix does not exist in Python 3).
For Python 3, the way to do this that doesn't add double backslashes and simply preserves \n, \t, etc. is:
a = 'hello\nbobby\nsally\n'
a = a.encode('unicode-escape').decode().replace('\\\\', '\\')
print(a)
Which gives a value that can be written as CSV:
hello\nbobby\nsally\n
There doesn't seem to be a solution for other special characters, however, that may get a single \ before them. It's a bummer. Solving that would be complex.
For example, to serialize a pandas.Series containing lists of strings with special characters into a text file in the format BERT expects, with a CR between each sentence and a blank line between each document:
with open('sentences.csv', 'w') as f:
    current_idx = 0
    for idx, doc in sentences.items():
        # Insert a newline to separate documents
        if idx != current_idx:
            f.write('\n')
        # Write each sentence exactly as it appeared, one per line
        for sentence in doc:
            f.write(sentence.encode('unicode-escape').decode().replace('\\\\', '\\') + '\n')
This outputs (for the Github CodeSearchNet docstrings for all languages tokenized into sentences):
Makes sure the fast-path emits in order.
#param value the value to emit or queue up\n#param delayError if true, errors are delayed until the source has terminated\n#param disposable the resource to dispose if the drain terminates
Mirrors the one ObservableSource in an Iterable of several ObservableSources that first either emits an item or sends\na termination notification.
Scheduler:\n{#code amb} does not operate by default on a particular {#link Scheduler}.
#param the common element type\n#param sources\nan Iterable of ObservableSource sources competing to react first.
A subscription to each source will\noccur in the same order as in the Iterable.
#return an Observable that emits the same sequence as whichever of the source ObservableSources first\nemitted an item or sent a termination notification\n#see ReactiveX operators documentation: Amb
...
Just format like that:
s = "your string"; raw_s = r'{0}'.format(s)
With a small correction to @Jolly1234's answer, here is the code:
raw_string=path.encode('unicode_escape').decode()
s = "hel\nlo"
raws = '%r' % s  # conversion to raw string
#print(raws) will print 'hel\nlo' with single quotes.
print(raws[1:-1]) # will print hel\nlo without single quotes.
#raws[1:-1] string slicing is performed
The solution that worked for me was:
fr"{original_string}"
Suggested in the comments by @ChemEnger.
I suppose the repr function can help you:
s = 't\n'
repr(s)
"'t\\n'"
repr(s)[1:-1]
't\\n'
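The repr slicing trick and the unicode_escape codec mentioned in these answers can be compared side by side. This is a rough sketch; the helper names escape_with_repr and escape_with_codec are invented for illustration.

```python
def escape_with_repr(s):
    # Hypothetical helper: strip the surrounding quotes that repr() adds.
    return repr(s)[1:-1]

def escape_with_codec(s):
    # Hypothetical helper: the escaped bytes are pure ASCII, so decoding is safe.
    return s.encode('unicode_escape').decode('ascii')

print(escape_with_repr('t\n'))    # t\n
print(escape_with_codec('t\n'))   # t\n

# The two differ once the text contains both kinds of quote:
# repr() escapes the single quote, the codec leaves it alone.
tricky = 'a\'b"c'
print(escape_with_repr(tricky))   # a\'b"c
print(escape_with_codec(tricky))  # a'b"c
```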
Just simply use the encode function.
my_var = 'hello'
my_var_bytes = my_var.encode()
print(my_var_bytes)
And then, to convert it back to a regular string, do this:
my_var_bytes = b'hello'
my_var = my_var_bytes.decode()
print(my_var)
--EDIT--
The following does not make the string raw but instead encodes it to bytes and decodes it.
import re
string="b'#DerkGently #seanferg85 #Umbertobaggio #EL4JC and he already had Popular support.. most people know this already. A\xe2\x80\xa6 '"
print(re.findall(r"\x[0-9a-z]{2}",string))
The list returned by the findall() function is empty :(
The problem here is that your string is the Python representation of a Python bytes object, which is pretty much useless.
Most likely, you had a bytes object, like this:
b = b'#DerkGently #seanferg85 #Umbertobaggio #EL4JC and he already had Popular support.. most people know this already. A\xe2\x80\xa6 '
… and you converted it to a string, like this:
s = str(b)
Don't do that. Instead, decode it:
s = b.decode('utf-8')
That will get you the actual characters, which you can then match easily, instead of trying to match the characters in the string representation of the bytes representation and then reconstructing the actual characters laboriously from the results.
However, it's worth noting that \xe2\x80\xa6 is not an emoji, it's a horizontal ellipsis character, …. If that isn't what you wanted, you already corrupted the data before this point.
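A small sketch of the decode-first approach, using a shortened version of the bytes value above. It also shows one way to recover when only the str(b) text is left, which is an assumption about the situation rather than something the question confirms.

```python
import ast
import re

b = b'A\xe2\x80\xa6 '

# Decoding gives real characters; the three bytes become one ellipsis.
s = b.decode('utf-8')
print(re.findall('\u2026', s))   # ['…']

# If only the str(b) text is left, ast.literal_eval can turn the
# printed bytes literal back into a bytes object before decoding.
printed = str(b)                 # "b'A\\xe2\\x80\\xa6 '"
recovered = ast.literal_eval(printed).decode('utf-8')
print(recovered == s)            # True
```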
Not a regexp per se, but might help you out any way.
def emojis(s):
    # The Emoticons block spans U+1F600 to U+1F64F inclusive
    return [c for c in s if ord(c) in range(0x1F600, 0x1F650)]
print(emojis("hello world 😊")) # sample usage
You need to use re.compile(ur'A\xe2\x80\xa6', re.UNICODE).
Compile a Unicode regex and use that pattern for your find, findall, sub, and other calls.
Try this. I joined the string in your question with that in your title to make the final search string
import re
k = r"#DerkGently #seanferg85 #Umbertobaggio #EL4JC and he already had Popular support.. most people know this already. A\xe2\x80\xa6 for a string like \x60\xe2\x4b(indicating a emoticon) using regular expression in python"
print(k)
print()
p = re.findall(r"((\\x[a-z0-9]{1,}){1,})", k)
for each in p:
    print(each[0])
Output
#DerkGently #seanferg85 #Umbertobaggio #EL4JC and he already had Popular support.. most people know this already. A\xe2\x80\xa6 for a string like \x60\xe2\x4b(indicating a emoticon) using regular expression in python
\xe2\x80\xa6
\x60\xe2\x4b
Currently to add single quotes around a string, the best solution I came up with was to make a small wrapper function.
def foo(s1):
    return "'" + s1 + "'"
Is there an easier more pythonic way of doing this?
Here's another (perhaps more pythonic) option, using format strings:
def foo(s1):
    return "'{}'".format(s1)
What about:
def foo(s1):
    return "'%s'" % s1
Just wanted to highlight what @metatoaster said in the comment above, as I missed it at first.
Using repr(string) will add single quotes. Applying it again adds double quotes outside of those, then single quotes outside of that with the inner single quotes escaped, and so on with further escaping.
Using repr(), as a built-in, is more direct, unless there are other conflicts.
s = 'strOrVar'
print s, repr(s), repr(repr(s)), ' ', repr(repr(repr(s))), repr(repr(repr(repr(s))))
# prints: strOrVar 'strOrVar' "'strOrVar'" '"\'strOrVar\'"' '\'"\\\'strOrVar\\\'"\''
The docs state that repr(), i.e. representation, is basically the reverse of eval():
"For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(),.."
Backquotes would be shorter, but are removed in Python 3+.
Interestingly, StackOverflow uses backquotes to specify code spans, instead of highlighting a code block and clicking the code button - it has some interesting behavior though.
This works on Python 3.5+
def foo2(char):
    return "'{}'".format(char)
Very simple, I know, but the docs aren't too helpful. I'm trying to hash a simple string. I was following this guide. The example given therein is:
import hashlib
hash_object = hashlib.md5(b'Hello World')
print(hash_object.hexdigest())
And then you have a hash representation. Suppose I want to take this one step further. I have four strings I want to concatenate together, the result of which needs to be converted to byte sequence, in order to be passed to the hashlib.md5() function. However, I'm curious how I can replicate the b'Hello World' syntax using a variable instead of a hard-coded string. Docs seem to suggest you can pass in a format to the built-in format function, so for my use-case something like:
my_string = '%s%s%s%s' % (first, second, third, fourth)
byte_string = format(my_string, 'b')
This doesn't quite work, though. How do I do this?
Strings in Python are a sequence of characters. To convert a string to a sequence of bytes, you encode it using some character set. For example:
my_string = '%s%s%s%s' % (first, second, third, fourth)
byte_string = my_string.encode('utf-8')
Instead of my_string.encode('utf-8') you could also use bytes(my_string, 'utf-8'); the two are equivalent. You can also use a different encoding if you like, but UTF-8 is generally a good choice because it can represent any code point (character) and is fairly compact, especially for ASCII data.
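Putting the pieces together for the hashing use case, a minimal sketch might look like this. The four sample values are placeholders chosen so the result matches the b'Hello World' example from the question.

```python
import hashlib

# Placeholder values; the question does not say what the four strings are.
first, second, third, fourth = 'Hello', ' ', 'Wor', 'ld'

my_string = '%s%s%s%s' % (first, second, third, fourth)
byte_string = my_string.encode('utf-8')

# md5 accepts the encoded bytes just as it accepts a b'...' literal.
hash_object = hashlib.md5(byte_string)
print(hash_object.hexdigest())
```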
my_string = '%s%s%s%s' % (first, second, third, fourth)
byte_string = bytes(my_string, 'utf-8')  # Python 3 requires an explicit encoding
Is there a way to influence the kind of quotes that python uses when casting a tuple/list to string?
For some NLP software I get tuples somewhat like this:
("It", ("isn't", "true"))
I want to cast it to a string and simply remove all double quotes and commas:
(It (isn't true))
However, python is having its way with the quotes, it seems to prefer single quotes:
>>> print str(("It", ("Isn't" ,"true")))
('It', ("Isn't", 'true'))
, making my life more difficult. Of course I could write my own function for printing it out part-by-part, but there is so much similarity between the representation and native python tuples.
You can't rely on the exact representation that repr uses. I'd just do as you thought and write your own function -- I don't see it being more than a handful of lines of code. This should get you going.
def s_exp(x):
    if isinstance(x, (tuple, list)):
        return '(%s)' % (' '.join(map(s_exp, x)))
    return str(x)
Writing your own function may be inevitable: if your strings contain brackets "(", ")" or spaces " " then you'll need some form of escaping to produce well-formed s-expressions.
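One way the escaping could be handled, as a sketch only: quote any atom that contains a space, a parenthesis, or a double quote. The quoting convention and the helper name atom are assumptions made for this example, not a format the NLP software is known to accept.

```python
def atom(x):
    # Hypothetical escaping rule: wrap "unsafe" atoms in double quotes,
    # backslash-escaping any backslashes and double quotes inside them.
    text = str(x)
    if any(c in text for c in ' ()"'):
        return '"' + text.replace('\\', '\\\\').replace('"', '\\"') + '"'
    return text

def s_exp(x):
    # Recurse into tuples and lists; everything else is treated as an atom.
    if isinstance(x, (tuple, list)):
        return '(' + ' '.join(s_exp(item) for item in x) + ')'
    return atom(x)

print(s_exp(("It", ("isn't", "true"))))  # (It (isn't true))
print(s_exp(("a b", "(c)")))             # ("a b" "(c)")
```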
Perhaps you can use json instead
>>> import json
>>> print json.dumps(("It", ("isn't", "true")))
["It", ["isn't", "true"]]
Python objects have __str__ and __repr__ methods that convert them into a string representation. For a tuple or list, the conversion uses the repr of each element, and it's intelligent enough to use one kind of quote when the other is used in the string and also to do escaping if both are used.
In your example, It got single quotes since that's what Python "prefers". Double quotes were used for Isn't since it contains a single quote (').
You should roll out your own converter really. Using a little recursion, it should be quite small.