How does Python's triple-quote string work? - python

How should this function be changed to return "123456"?
def f():
    s = """123
    456"""
    return s
UPDATE: Everyone, the question is about how to avoid the \t (or other whitespace) that ends up inside a multiline string, not how to use the re module.

Don't use a triple-quoted string when you don't want extra whitespace, tabs and newlines.
Use implicit line continuation inside parentheses instead; it's more elegant:
def f():
    s = ('123'
         '456')
    return s

def f():
    s = """123\
456"""
    return s
Don't indent any of the string's lines after the first line; end every line except the last with a backslash.

Adjacent string literals are concatenated, so you can use:
def f():
    s = ("123"
         "456")
    return s
This will allow you to keep the indentation as you like.

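import textwrap
textwrap.dedent("""\
    123
    456""")
This is from the standard library. The leading "\" is necessary because the function works by removing the whitespace common to the start of every line, so the first line of text has to begin on its own indented line.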
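As a rough sketch of how this could be applied to the original question: textwrap.dedent only strips the shared indentation and leaves the newline in place, so the lines still need to be joined afterwards. The use of str.splitlines for the joining step is an assumption borrowed from another answer here, not part of this one.

```python
import textwrap

def f():
    # The backslash after the opening quotes keeps the first line
    # from being empty, so both lines share the same indentation.
    s = textwrap.dedent("""\
        123
        456""")
    # dedent removes the common leading whitespace but keeps the
    # newline between the lines, so join the lines explicitly.
    return "".join(s.splitlines())

print(f())  # 123456
```

This keeps the source nicely indented at the cost of a little runtime work; implicit concatenation of adjacent literals avoids that work entirely.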

Maybe I'm missing something obvious but what about this:
def f():
    s = """123456"""
    return s
or simply this:
def f():
    s = "123456"
    return s
or even simpler:
def f():
    return "123456"
If that doesn't answer your question, then please clarify what the question is about.

You might want to check this str.splitlines([keepends])
Return a list of the lines in the string, breaking at line boundaries.
This method uses the universal newlines approach to splitting lines.
Line breaks are not included in the resulting list unless keepends is
given and true.
Python recognizes "\r", "\n", and "\r\n" as line boundaries for 8-bit strings.
So, for the problem at hand, we could do something like this:
>>> s = """123
... 456"""
>>> s
'123\n456'
>>> ''.join(s.splitlines())
'123456'

re.sub(r'\D+', '', s)
will return a string; if you want an integer, convert this string with int.

Try
import re
and then
return re.sub(r"\s+", "", s)

My guess is:
def f():
    s = """123
    456"""
    return u'123456'
Minimum change and does what is asked for.

Related

How to "render" \b in a string in python

I have a string with "\b" characters.
Is there a way to "render" the string, that is, to "apply" the escape sequences, so that the string looks the way it does when printed with the print() function?
How it looks: Test..\b\b! 12344\b5
How it should look: Test! 12345
Do you have an idea to solve my problem?
One way would be simply to use the replace method of the string object:
st = 'Test.\b!'
st.replace('.\b','')
# Out: 'Test!'
I found a solution with regex.
import re
def b(a):
    # Repeatedly delete one character together with the backspace that follows it.
    while '\b' in a:
        a = re.sub('[^\b]\b', '', a)
    return a
b('Test..\b\b! 12344\b5')
# Out: 'Test! 12345'
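For comparison, here is a minimal sketch of the same idea without regular expressions. It assumes each "\b" should simply erase the previous character, and that a "\b" with nothing before it is dropped; the function name apply_backspaces is made up for illustration.

```python
def apply_backspaces(text):
    """Treat each backspace character as 'delete the previous character'."""
    out = []
    for ch in text:
        if ch == "\b":
            # Drop the last kept character, if there is one.
            if out:
                out.pop()
        else:
            out.append(ch)
    return "".join(out)

print(apply_backspaces("Test..\b\b! 12344\b5"))  # Test! 12345
```

Because it walks the string once and never loops on a leftover "\b", this version also terminates on input that starts with a backspace.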

In Python how to strip dollar signs and commas from dollar related fields only

I'm reading in a large text file with lots of columns, dollar related and not, and I'm trying to figure out how to strip the dollar fields ONLY of $ and , characters.
so say I have:
a|b|c
$1,000|hi,you|$45.43
$300.03|$MS2|$55,000
where a and c are dollar-fields and b is not.
The output needs to be:
a|b|c
1000|hi,you|45.43
300.03|$MS2|55000
I was thinking that regex would be the way to go, but I can't figure out how to express the replacement:
f=open('sample1_fixed.txt','wb')
for line in open('sample1.txt', 'rb'):
    new_line = re.sub(r'(\$\d+([,\.]\d+)?k?)',????, line)
    f.write(new_line)
f.close()
Anyone have an idea?
Thanks in advance.
Unless you are really tied to the idea of using a regex, I would suggest doing something simple, straight-forward, and generally easy to read:
def convert_money(inval):
    # Only fields that start with '$' are candidates for conversion.
    if inval.startswith('$'):
        test_val = inval[1:].replace(",", "")
        try:
            # Accept the stripped value only if it parses as a number.
            float(test_val)
        except ValueError:
            pass
        else:
            inval = test_val
    return inval
def convert_string(s):
    return "|".join(map(convert_money, s.split("|")))
a = '$1,000|hi,you|$45.43'
b = '$300.03|$MS2|$55,000'
print(convert_string(a))
print(convert_string(b))
OUTPUT
1000|hi,you|45.43
300.03|$MS2|55000
A simple approach:
>>> import re
>>> exp = r'\$\d+(,|\.)?\d+'
>>> s = '$1,000|hi,you|$45.43'
>>> '|'.join(i.replace('$', '').replace(',', '') if re.match(exp, i) else i for i in s.split('|'))
'1000|hi,you|45.43'
It sounds like you are addressing the entire line of text at once. I think your first task would be to break up your string by columns into an array or some other variables. Once you've done that, your solution for converting strings of currency into numbers doesn't have to worry about the other fields.
Once you've done that, I think there is probably an easier way to do this task than with regular expressions. You could start with this SO question.
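A minimal sketch of that column-first idea, assuming the first line is a header and that the caller knows which column names hold dollar amounts. The function name clean_dollar_columns and its parameters are hypothetical, chosen only for this illustration.

```python
def clean_dollar_columns(lines, dollar_columns, sep="|"):
    """Strip '$' and ',' only from the named columns; the first line is the header."""
    lines = iter(lines)
    header = next(lines, None)
    if header is None:
        return  # empty input, nothing to yield
    header = header.rstrip("\n")
    names = header.split(sep)
    # Positions of the columns that hold dollar amounts.
    targets = {i for i, name in enumerate(names) if name in dollar_columns}
    yield header
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        for i in targets:
            if i < len(fields):
                fields[i] = fields[i].replace("$", "").replace(",", "")
        yield sep.join(fields)

sample = ["a|b|c", "$1,000|hi,you|$45.43", "$300.03|$MS2|$55,000"]
for row in clean_dollar_columns(sample, {"a", "c"}):
    print(row)
```

Since the non-dollar column is never touched, values such as hi,you and $MS2 pass through unchanged without any pattern matching.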
If you really want to use regex though, then this pattern should work for you:
/[$,]/g
Demo on regex101
Replace matches with empty strings. The pattern gets a little more complicated if you have other kinds of currency present.
Try this regex; adapt it if necessary.
\$(\d+)[\,]*([\.]*\d*)
SEE DEMO : http://regex101.com/r/wM0zB6/2
Use the regex
((?<=\d),(?=\d))|(\$(?=\d))
e.g.
>>> import re
>>> x="$1,000|hi,you|$45.43"
>>> re.sub(r'((?<=\d),(?=\d))|(\$(?=\d))', r'', x)
'1000|hi,you|45.43'
Try the below regex and then replace the matched strings with \1\2\3
\$(\d+(?:\.\d+)?)(?:(?:,(\d{2}))*(?:,(\d{3})))?
DEMO
Defining a black list and checking if the characters are in it, is an easy way to do this:
blacklist = ("$", ",")  # define characters to remove
with open('sample1_fixed.txt', 'w') as f:
    for line in open('sample1.txt', 'r'):
        clean_line = "".join(c for c in line if c not in blacklist)
        f.write(clean_line)
\$(?=(?:[^|]+,)|(?:[^|]+\.))
Try this. Replace with the empty string. Use the re.M option. See the demo:
http://regex101.com/r/gT6kI4/6

Returning NoneType vs. returning "" in re.search

I did some searching and did not see this specific issue, but let me know if it's a duplicate.
I wrote a function called find_results that searches a string for a separator character and then returns anything between the separator and a new line:
def find_results(findme, separator, string):
    linelist=string.split('\n')
    for line in linelist:
        if re.search(findme, line):
            #Split based on the separator we were sent, but only on the first occurrence
            line = line.split(separator, 1)
            return line[1].strip()
        #End if line.find
    #end for loop
    return ""
#end find_results
The function works great, but I'm sure there's a more Pythonic way to accomplish the same task, and frankly I feel a little silly calling a custom function for such a simple thing.
I recently learned how to use Sets in regular expression, so I've been able to replace the function with an re.search call in some cases. If the separator is a colon, for example:
re.search("Preceeding\ Text:(.*)$", string).group(1)
The problem with this is that when there are no results, I get a "NoneType" crash because there is no attribute "group" on a "NoneType". I can check the results with an if or try / except statement, but that defeats the purpose of the change from using find_results to begin with.
My questions are:
Is there a way to suspend the NoneType crash and just have it return "" (blank)?
Is there a different one-line way to accomplish this?
If I have to use a custom function, is there a more Pythonic (and less embarrassing) way to write it?
Use str.partition:
def find_results(findme, separator, s):
    tgt=s[s.find(findme):]
    return tgt.partition(separator)[2]
>>> find_results('Text', ':', 'Preceding Text:the rest')
'the rest'
>>> find_results('Text', ';', 'Preceding Text:the rest')
''
>>> find_results('text', ':', 'Preceding Text:the rest')
''
Since partition always returns a 3 element tuple with the final element being '' for not found, that can probably even be your one liner:
>>> s='Preceding Text:the rest'
>>> s[s.find('Text'):].partition(':')[2]
'the rest'
>>> s[s.find('Text'):].partition(';')[2]
''
If the findme part or separator parts are only useful if they are regular expressions, use re.split with try/except:
def find_re_results(findme, separator, s):
    p1=re.compile(findme)
    p2=re.compile(separator)
    m=p1.search(s)
    if m:
        li=p2.split(s[m.start():], maxsplit=1)
    else:
        return ''
    try:
        return li[1]
    except IndexError:
        return ''
Demo:
>>> find_re_results('\d+', '\t', 'Preceding 123:;[]\\:the rest')
''
>>> find_re_results('\d+', '\W+', 'Preceding 123:;[]\\:the rest')
'the rest'
>>> find_re_results('\t', '\W+', 'Preceding 123:;[]\\:the rest')
''
The one liner you are looking for is:
return re.findall(r'Preceeding\ Text:(.*)$', text) or ''
If there are no matches, findall() returns an empty list, which is falsy, so the or yields ''. When there is a match, the expression evaluates to the list of captured strings rather than a single string.
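If a single string is wanted in both cases, one possible sketch is a small wrapper around re.search that falls back to a default. The helper name find_after and the sample text are made up for this illustration; only re.search, re.MULTILINE and Match.group come from the standard library.

```python
import re

def find_after(pattern, text, default=""):
    """Return group 1 of the first match of pattern in text, or default if none."""
    # re.MULTILINE lets '$' match at the end of each line, not just the string.
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(1).strip() if match else default

log = "Name: demo\nPreceding Text: the rest\n"
print(find_after(r"Preceding Text:(.*)$", log))   # the rest
print(repr(find_after(r"Missing:(.*)$", log)))    # ''
```

The conditional expression keeps the call site to one line while avoiding the AttributeError that comes from calling group on None.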
Don't use string as a variable name; it shadows the standard-library string module.
re.findall is a great way for searching for multiple instances of a pattern:
r = re.compile("^[^:]*:(.*)$", re.MULTILINE)
r.findall("a: b\nc: d")
Here is the one-liner you want. Functional programming is really amazing.
#!/usr/bin/env python
#-*- coding:utf-8 -*-
import re
if __name__ == '__main__':
    findme = 'abc'
    sep = ','
    stringa = '1,2,3,4,5,abc'
    print(list(map(lambda line, findme=findme, sep=sep: line.split(sep, 1)[1].strip() if re.search(findme, line) else "", stringa.split('\n'))))

Python complex regex replace

I'm trying to write a simple VB6-to-C translator to help me port an open source game to C.
I want to be able to get "Npclist[NpcIndex]" from "With Npclist[NpcIndex]" using regex and to substitute it everywhere it needs to go. ("With" works like a macro in VB6 that prepends Npclist[NpcIndex] wherever it is needed until it finds "End With".)
Example:
With Npclist[NpcIndex]
.goTo(245) <-- it should be replaced with Npclist[NpcIndex].goTo(245)
End With
Is it possible to use regex to do the job?
I've tried using a function to perform another regex replace between the "With" and the "End With", but I have no way of knowing the text the "With" stands for (Npclist[NpcIndex]).
Thanks in advance
I personally wouldn't trust any single-regex solution to get it right the first time, nor would I feel like debugging it. Instead, I would parse the code line by line and cache each With expression, using it to replace any . directly preceded by whitespace or by any type of opening bracket (add use-cases as needed):
(?<=[\s[({])\. - positive lookbehind for any character from the set + escaped literal dot
(?:(?<=[\s[({])|^)\. - use this non-capturing alternatives list if to-be-replaced . can occur on the beginning of line
import re
def convert_vb_to_c(vb_code_lines):
    c_code = []
    current_with = ""
    for line in vb_code_lines:
        if re.search(r'^\s*With', line) is not None:
            # Cache the With expression (everything after "With ") plus the dot.
            current_with = line.strip()[5:] + "."
            continue
        elif re.search(r'^\s*End With', line) is not None:
            # Any leading dot found outside a With block gets this marker.
            current_with = "{error_outside_with_replacement}"
            continue
        line = re.sub(r'(?<=[\s[({])\.', current_with, line)
        c_code.append(line)
    return "\n".join(c_code)
example = """
With Npclist[NpcIndex]
    .goTo(245)
End With
With hatla
    .matla.tatla[.matla.other] = .matla.other2
    dont.mind.me(.do.mind.me)
    .next()
End With
"""
# use file_object.readlines() in real life
print(convert_vb_to_c(example.split("\n")))
You can pass a function to the sub method:
# just to give the idea of the regex
regex = re.compile(r'''With (.+)
(the-regex-for-the-VB-expression)+?
End With''')
def repl(match):
    beginning = match.group(1) # NpcList[NpcIndex] in your example
    return ''.join(beginning + line for line in match.group(2).splitlines())
re.sub(regex, repl, the_string)
In repl you can obtain all the information about the matching from the match object, build whichever string you want and return it. The matched string will be replaced by the string you return.
Note that you must be really careful when writing the regex above. In particular, (.+) as I used it matches the whole line up to (but excluding) the newline, which may or may not be what you want (I don't know VB, so I have no idea which regex could go there instead to catch only what you want).
The same goes for the (the-regex-for-the-VB-expression)+?. I have no idea what code could be in those lines, so I leave the details of implementing it to you. Maybe taking the whole line is okay, but I wouldn't trust something this simple (expressions can probably span multiple lines, right?).
Also, doing everything in one big regular expression is, in general, error-prone and slow.
I'd strongly consider regexes only to find With and End With and use something else to do the replacements.
This may do what you need in Python 2.7. I'm assuming you want to strip out the With and End With, right? You don't need those in C.
>>> import re
>>> search_text = """
... With Np1clist[Npc1Index]
... .comeFrom(543)
... End With
...
... With Npc2list[Npc2Index]
... .goTo(245)
... End With"""
>>>
>>> def f(m):
... return '{0}{1}({2})'.format(m.group(1), m.group(2), m.group(3))
...
>>> regex = r'With\s+([^\s]*)\s*(\.[^(]+)\(([^)]+)\)[^\n]*\nEnd With'
>>> print re.sub(regex, f, search_text)
Np1clist[Npc1Index].comeFrom(543)
Npc2list[Npc2Index].goTo(245)

How do you filter a string to only contain letters?

How do I make a function where it will filter out all the non-letters from the string? For example, letters("jajk24me") will return back "jajkme". (It needs to be a for loop) and will string.isalpha() function help me with this?
My attempt:
def letters(input):
    valids = []
    for character in input:
        if character in letters:
            valids.append( character)
    return (valids)
If it needs to be in that for loop, and a regular expression won't do, then this small modification of your loop will work:
def letters(input):
    valids = []
    for character in input:
        if character.isalpha():
            valids.append(character)
    return ''.join(valids)
(The ''.join(valids) at the end takes all of the characters that you have collected in a list, and joins them together into a string. Your original function returned that list of characters instead)
You can also filter out characters from a string:
def letters(input):
    return ''.join(filter(str.isalpha, input))
or with a list comprehension:
def letters(input):
    return ''.join([c for c in input if c.isalpha()])
or you could use a regular expression, as others have suggested.
import re
valids = re.sub(r"[^A-Za-z]+", '', my_string)
EDIT: If it needs to be a for loop, something like this should work:
output = ''
for character in input:
    if character.isalpha():
        output += character
See re.sub, for performance consider a re.compile to optimize the pattern once.
Below is a short version that matches every run of characters outside the range A to Z and replaces it with the empty string. The re.I flag ignores case, so lowercase (a-z) characters are kept as well.
import re
def charFilter(myString):
    return re.sub('[^A-Z]+', '', myString, flags=re.I)
If you really need that loop, there are many answers explaining that specifically. However, you might want to give a reason why you need a loop.
If you want to operate on the number sequences, and that's the reason for the loop, consider replacing the replacement-string parameter with a function like:
import re
def numberPrinter(match):
    # Called once per removed run; match.group(0) is the removed text.
    print(match.group(0))
    return ''
def charFilter(myString):
    return re.sub('[^A-Z]+', numberPrinter, myString, flags=re.I)
The method string.isalpha() checks whether string consists of alphabetic characters only. You can use it to check if any modification is needed.
As to the other part of the question, pst is just right. You can read about regular expressions in the python doc: http://docs.python.org/library/re.html
They might seem daunting but are really useful once you get the hang of them.
Of course you can use isalpha. Also, valids can be a string.
Here you go:
def letters(input):
    valids = ""
    for character in input:
        if character.isalpha():
            valids += character
    return valids
Not using a for-loop. But that's already been thoroughly covered.
Might be a little late, and I'm not sure about performance, but I just thought of this solution which seems pretty nifty:
set(x).intersection(y)
You could use it like:
from string import ascii_letters
def letters(string):
    return ''.join(set(string).intersection(ascii_letters))
NOTE:
This will not preserve linear order. Which in my use case is fine, but be warned.
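If order and repeated letters do matter, a possible variation keeps the set only for the membership test and walks the string in a for loop. This is a sketch under the assumption that "letters" means ASCII letters; the names ALLOWED and letters_in_order are invented for the example.

```python
from string import ascii_letters

ALLOWED = frozenset(ascii_letters)  # hypothetical constant: the characters to keep

def letters_in_order(text):
    """Keep only ASCII letters, preserving their original order and repeats."""
    kept = []
    for character in text:
        if character in ALLOWED:
            kept.append(character)
    return "".join(kept)

print(letters_in_order("jajk24me"))  # jajkme
```

Unlike str.isalpha, the ASCII set rejects accented and other non-ASCII letters, so which test to use depends on the input you expect.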
