for each_ID, each_Title in zip(Id, Title):
    url="http://www.zjjsggzy.gov.cn/%E6%96%B0%E6%B5%81%E7%A8%8B/%E6%8B%9B%E6%8A%95%E6%A0%87%E4%BF%A1%E6%81%AF/jyxx_1.html?iq=x&type=%E6%8B%9B%E6%A0%87%E5%85%AC%E5%91%8A&tpid=%s&tpTitle=%s"%(each_ID,each_Title)
"each_ID" and "each_Title" come from the website's Unicode parameters, so why does this line raise a "float" error? Isn't %s a string placeholder?
You have loads of % formatters in your string. %E formats a float object. You have several of those in your string, including at the start:
"http://www.zjjsggzy.gov.cn/%E6
#                           ^^
You'd need to double up every single % used in a URL character escape:
"http://www.zjjsggzy.gov.cn/%%E6%%96%%B0%%E6%%B5%%81%%E7%%A8%%8B/..."
That'd be a lot of work; you'd be better off using a different string formatting style. Use str.format():
url = (
    "http://www.zjjsggzy.gov.cn/"
    "%E6%96%B0%E6%B5%81%E7%A8%8B/%E6%8B%9B%E6%8A%95%E6%A0%87%E4%BF%A1%E6%81%AF"
    "/jyxx_1.html?iq=x&type=%E6%8B%9B%E6%A0%87%E5%85%AC%E5%91%8A&"
    "tpid={}&tpTitle={}".format(
        each_ID, each_Title)
)
I broke the string up into multiple chunks to make it easier to read; the {} brackets delineate the placeholders.
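To make the failure and both fixes concrete, here is a minimal sketch. The shortened example.invalid URL and the sample values for each_ID and each_Title are made up for illustration; only the formatting behaviour is the point.

```python
# Hypothetical sample values standing in for the scraped parameters.
each_ID = "12345"
each_Title = "demo"

# "%E" is the float conversion of printf-style formatting.
formatted_float = "%E" % 1234.5  # '1.234500E+03'

# With string arguments, the "%E" inside the URL escape raises TypeError.
template = "http://example.invalid/%E6%96%B0?tpid=%s&tpTitle=%s"
try:
    template % (each_ID, each_Title)
    raised = False
except TypeError:
    raised = True

# Fix 1: double every literal percent sign.
url_percent = "http://example.invalid/%%E6%%96%%B0?tpid=%s&tpTitle=%s" % (
    each_ID, each_Title)

# Fix 2: str.format() leaves percent signs alone.
url_format = "http://example.invalid/%E6%96%B0?tpid={}&tpTitle={}".format(
    each_ID, each_Title)

print(raised, url_percent == url_format)
```

Both fixes produce the same URL; str.format() simply avoids having to touch the existing escapes.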
Try using the format method on the string. The existing '%' characters conflict with your %s placeholders:
for each_ID, each_Title in zip(Id, Title):
    url="http://www.zjjsggzy.gov.cn/%E6%96%B0%E6%B5%81%E7%A8%8B/%E6%8B%9B%E6%8A%95%E6%A0%87%E4%BF%A1%E6%81%AF/jyxx_1.html?iq=x&type=%E6%8B%9B%E6%A0%87%E5%85%AC%E5%91%8A&tpid={}&tpTitle={}".format(each_ID, each_Title)
I have a string s, its contents are variable. How can I make it a raw string? I'm looking for something similar to the r'' method.
I believe what you're looking for is str.encode() with an escape codec. For example, if you have a variable that you want to turn into its "raw" form:
a = '\x89'
a.encode('unicode_escape')
b'\\x89'
Note: on Python 2.x, use the 'string-escape' codec instead; there the result is the str '\\x89'.
I was searching for a similar solution and found the solution via:
casting raw strings python
Raw strings are not a different kind of string. They are a different way of describing a string in your source code. Once the string is created, it is what it is.
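Since strings in Python are immutable, you cannot "make it" anything different. You can, however, build a new string from s like this (the result has the same characters as s, because the r prefix only affects how the literal '{}' itself is parsed):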
raw_s = r'{}'.format(s)
As of Python 3.6, you can use the following (similar to @slashCoder):
def to_raw(string):
    return fr"{string}"
my_dir = "C:\data\projects"  # \d and \p are not escape sequences, so the backslashes survive
to_raw(my_dir)
yields 'C:\\data\\projects'. I'm using it on a Windows 10 machine to pass directories to functions.
Raw strings apply only to string literals. They exist so that you can more conveniently express strings that would be modified by escape sequence processing. This is especially useful when writing out regular expressions, or other forms of code, in string literals. If you want a unicode string without escape processing, just prefix it with ur, like ur'somestring' (Python 2 only; in Python 3 a plain r prefix is enough).
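A short sketch of that point: the r prefix changes only how a literal in source code is parsed, and an existing string can at best be converted into an escaped representation. The sample strings are made up for illustration.

```python
# The r prefix only changes how the source literal is parsed.
plain = "C:\\data\\new"   # escapes processed: each \\ becomes one backslash
raw = r"C:\data\new"      # no escape processing
same = plain == raw       # both literals describe the same characters

# A raw literal keeps the backslash and the letter as two characters.
two_chars = len(r"\n")    # 2, whereas len("\n") is 1

# For a string that already exists, the escapes have been processed;
# you can only build an escaped *representation* of it.
s = "tab\there"
escaped = s.encode("unicode_escape").decode("ascii")

# Formatting through an r'' template copies s unchanged.
unchanged = r"{}".format(s) == s

print(same, two_chars, escaped, unchanged)
```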
For Python 3, the way to do this that doesn't add double backslashes and simply preserves \n, \t, etc. is:
a = 'hello\nbobby\nsally\n'
escaped = a.encode('unicode-escape').decode().replace('\\\\', '\\')
print(escaped)
Which gives a value that can be written as CSV:
hello\nbobby\nsally\n
There doesn't seem to be a solution for other special characters, however, that may get a single \ before them. It's a bummer. Solving that would be complex.
For example, to serialize a pandas.Series containing lists of strings with special characters into a text file in the format BERT expects, with a line break between each sentence and a blank line between each document:
with open('sentences.csv', 'w') as f:
    current_idx = 0
    for idx, doc in sentences.items():
        # Insert a newline to separate documents
        if idx != current_idx:
            f.write('\n')
            current_idx = idx
        # Write each sentence exactly as it appeared, one per line
        for sentence in doc:
            f.write(sentence.encode('unicode-escape').decode().replace('\\\\', '\\') + '\n')
This outputs (for the Github CodeSearchNet docstrings for all languages tokenized into sentences):
Makes sure the fast-path emits in order.
@param value the value to emit or queue up\n@param delayError if true, errors are delayed until the source has terminated\n@param disposable the resource to dispose if the drain terminates
Mirrors the one ObservableSource in an Iterable of several ObservableSources that first either emits an item or sends\na termination notification.
Scheduler:\n{@code amb} does not operate by default on a particular {@link Scheduler}.
@param the common element type\n@param sources\nan Iterable of ObservableSource sources competing to react first.
A subscription to each source will\noccur in the same order as in the Iterable.
@return an Observable that emits the same sequence as whichever of the source ObservableSources first\nemitted an item or sent a termination notification\n@see ReactiveX operators documentation: Amb
...
Just format like that:
s = "your string"; raw_s = r'{0}'.format(s)
A slight correction to @Jolly1234's answer:
raw_string = path.encode('unicode_escape').decode()
s = "hel\nlo"
raws = '%r' % s  # conversion to the repr form of the string
# print(raws) prints 'hel\nlo' with single quotes.
print(raws[1:-1])  # slices off the quotes and prints hel\nlo
The solution that worked for me was:
fr"{original_string}"
Suggested in the comments by @ChemEnger.
I suppose the repr function can help you:
s = 't\n'
repr(s)
"'t\\n'"
repr(s)[1:-1]
't\\n'
Just simply use the encode function.
my_var = 'hello'
my_var_bytes = my_var.encode()
print(my_var_bytes)
And then, to convert it back to a regular string, do this:
my_var_bytes = b'hello'
my_var = my_var_bytes.decode()
print(my_var)
--EDIT--
Note that the above does not make the string raw; it only encodes it to bytes and decodes it again.
I am trying to format the string below with variables. For readability, I would like to break it up so it is easier to read; right now it takes up 199 characters on a single line of the script. Every attempt I make to break it up leaves large gaps in the printed output. Can anyone shed some light? I tried wrapping it in """ triple quotes and putting \ at the end of each line, but it still has spaces when printed or logged.
copy_sql = "COPY {0} FROM 's3://{1}/{2}' CREDENTIALS 'aws_access_key_id={3};aws_secret_access_key={4}' {5}; ".format(table_name,bucket,key,aws_access_key_id,aws_secret_access_key,options)
The desired result would be something to this effect:
copy_sql = "COPY {0} FROM 's3://{1}/{2}' \
CREDENTIALS 'aws_access_key_id={3};aws_secret_access_key={4}' {5}; \
".format(table_name,bucket,key, \
aws_access_key_id,aws_secret_access_key,options)
However when I print it I get large spaces between .gz and credentials:
COPY analytics.table FROM 's3://redshift-fake/storage/2017-11-02/part-00000.gz' CREDENTIALS 'aws_access_key_id=SECRET;aws_secret_access_key=SECRET' DELIMITER '\t' dateformat 'auto' fillrecord removequotes gzip;
I am thinking this would still work but I would like to clean it up for logging readability.
You can use string literal concatenation:
Multiple adjacent string literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation. Thus, "hello" 'world' is equivalent to "helloworld".
In your case, something like this:
copy_sql = ("COPY {0} FROM 's3://{1}/{2}' "
            "CREDENTIALS 'aws_access_key_id={3};aws_secret_access_key={4}' {5};"
            ).format(table_name, bucket, key,
                     aws_access_key_id, aws_secret_access_key, options)
Note the extra parentheses to make it parse correctly. As long as a line ending is inside at least one pair of parentheses, Python will always treat it as a line continuation, without the need for backslashes.
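A small sketch of the difference, using shortened SQL and made-up values for table_name, bucket and key: a backslash continuation inside one literal keeps the next line's indentation as part of the string, while adjacent literals inside parentheses do not.

```python
# Hypothetical values for illustration only.
table_name = "analytics.table"
bucket = "my-bucket"
key = "path/part-00000.gz"

# Backslash continuation inside a single literal: the indentation of the
# following line becomes part of the string.
with_backslash = "COPY {0} FROM 's3://{1}/{2}' \
    CREDENTIALS".format(table_name, bucket, key)

# Adjacent literals inside parentheses are joined at compile time,
# so only the spaces written inside the quotes end up in the string.
concatenated = (
    "COPY {0} FROM 's3://{1}/{2}' "
    "CREDENTIALS"
).format(table_name, bucket, key)

print(repr(with_backslash))
print(repr(concatenated))
```

Printing with repr() makes the stray run of spaces in the first version easy to spot in a log.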
s='s=%r;print(s%%s)';print(s%s)
I understand that % substitutes s into a string, but what exactly gets replaced?
Maybe more intriguing: why does print(s%%s) automatically become print(s%s) after the placeholder is replaced by s itself?
The "%%" you see in that code is a "conversion specifier" for the older printf-style of string formatting.
Most conversion specifiers tell Python how to convert an argument that is passed into the % format operator (for instance, "%d" says to convert the next argument to a decimal integer before inserting it into the string).
"%%" is different, because it directly converts to a single "%" character without consuming an argument. This conversion is needed in the format string specification, since otherwise any "%" would be taken as the first part of some other code and there would be no easy way to produce a string containing a percent sign.
The code you show is a quine (a program that produces its own code as its output). When it runs print(s%s), it does a string formatting operation where both the format string, and the single argument are the same string, s.
The "%r" in the string is a conversion specifier that does a repr of its argument. repr on a string produces the string with quotes around it. This is where the quoted string comes from in the output.
The "%%" produces the % operator that appears between the two s's in the print call. If only one "%" was included in s, you'd get an error about the formatting operation expecting a second argument (since %s is another conversion specifier).
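The explanation above can be checked directly. This sketch compares the result of s % s with the one-line source (written here as a separate string literal) and shows the error for the single-% variant.

```python
s = 's=%r;print(s%%s)'

# The original one-line program, written out as a plain string.
source = "s='s=%r;print(s%%s)';print(s%s)"

# %r inserts repr(s) (with quotes); %% collapses to a single %.
output = s % s
is_quine = output == source

# With a single %, "%s" becomes a second conversion and needs another argument.
try:
    's=%r;print(s%s)' % s
    single_percent_fails = False
except TypeError:
    single_percent_fails = True

print(is_quine, single_percent_fails)
```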
print('% %s' % '')   # wrong
print('%% %s' % '')  # correct: prints '% '
Think about \\ and \.
I'm writing a common library to set up an automation test suite with the Python WebDriver bindings for Selenium 2.0.
def verify_error_message_present(self, message):
    try:
        self.driver.find_element_by_xpath("//span[@class='error'][contains(.,'%s')]" % message)
        self.assertTrue(True, "Found an error message containing %s" % message)
    except Exception as e:
        self.logger.exception(e)
I would like to escape the message before passing it to XPath query, so it can support if 'message' is something like "The number of memory slots used (32) exceeds the number of memory slots that are available (16)"
Without escaping, the xpath query won't work since it contains '(' and ')'
Which library can we use to do this in Python?
I know that this is a simple question, but I don't have so much experience in Python (just started).
Thanks in advance.
Additional info:
During testing in Firebug, the query below returns no result:
//span[@class='error'][contains(.,'The number of memory slots used (32) exceeds the number of memory slots that are available (16)')]
While the query below returns the desired component:
//span[@class='error'][contains(.,'The number of memory slots used \(32\) exceeds the number of memory slots that are available \(16\)')]
Logically, this problem can be solved by replacing ) with \) for this particular string literal, but then there are still other characters that need to be escaped. So is there any library to do this in a proper way?
Parentheses should be fine there. They're inside an XPath string literal delimited by apostrophe, so they do not prematurely end the contains condition.
The problem is what happens when you have apostrophes in your string, since those do end the string literal, breaking the expression. Unfortunately there is no string escaping scheme for XPath string literals, so you have to work around it using expressions to generate the troublesome characters, typically in the form concat('str1', "'", 'str2').
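To make the concat() workaround concrete, here is a sketch that builds such a literal by hand for one made-up message containing both an apostrophe and double quotes. The message and variable names are hypothetical.

```python
# Hypothetical message containing both quote characters.
message = "can't allocate \"slot\""

# Split at each apostrophe, wrap every piece in single quotes, and join
# the pieces with a double-quoted apostrophe as a separate concat() argument.
parts = message.split("'")
literal = "concat(" + ",\"'\",".join("'%s'" % p for p in parts) + ")"

xpath = "//span[@class='error'][contains(.,%s)]" % literal
print(xpath)
```

Each piece is a valid XPath string literal on its own, so the expression no longer breaks at the apostrophe.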
Here's a Python function to do that:
def toXPathStringLiteral(s):
    if "'" not in s: return "'%s'" % s
    if '"' not in s: return '"%s"' % s
    return "concat('%s')" % s.replace("'", "',\"'\",'")
"//span[@class='error'][contains(.,%s)]" % toXPathStringLiteral(message)
I'm working on something in Python and I need to escape from %20, the space in a URL. For example:
"%20space%20in%20%d--" % ordnum
So I need to use %20 for a URL but then %d for a number. But I get this error:
TypeError: not enough arguments for format string
I know what the problem is I just don't know how to escape from a %20 and fix it.
One way would be to double the % characters:
"%%20space%%20in%%20%d--" % ordnum
but a probably better way is using urllib.quote_plus() (urllib.parse.quote_plus() in Python 3), which encodes each space as + rather than %20:
urllib.quote_plus(" space in %d--" % ordnum)
The %20 should look like %%20 when Python's formatter sees it. To Python, %% formats out to %.
>>> import urllib
>>> unquoted = urllib.unquote("%20space%20in%20%d--")
>>> ordnum = 15
>>> print unquoted % ordnum
space in 15--
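For Python 3, where these helpers live in urllib.parse, the approaches above can be compared side by side. This is a sketch with ordnum set to 15 as in the session above.

```python
from urllib.parse import quote, quote_plus, unquote

ordnum = 15

# Doubling the percent signs keeps the literal %20 escapes.
doubled = "%%20space%%20in%%20%d--" % ordnum

# Format the readable text first, then let quote() produce the escapes.
quoted = quote(" space in %d--" % ordnum)

# quote_plus() encodes spaces as + instead of %20.
plus = quote_plus(" space in %d--" % ordnum)

# unquote() reverses the escaping.
roundtrip = unquote(doubled)

print(doubled, quoted, plus, repr(roundtrip))
```

Formatting before quoting avoids mixing URL escapes with format specifiers in the first place.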
I see three ways to solve this:
Escape the %.
"%d%%20dogs" % 11
Use the new .format syntax.
"{}%20dogs".format(11)
Use the + sign instead of %20, as I think that's possible as well.
"%d+dogs" % 11