I have the following string. I need to remove whitespace only in the parts that are between single quotes; the rest of the line should stay intact:
+amk=0 nog = 0 nf=1 par=1 mg =0.34e-6 sd='((nf != 1) ? (nf-1)) :0)' sca=0 scb=0 scc=0 pj='2* ((w+7.61e-6) + (l+8.32e-6 ))'
So the output should be
+amk=0 nog = 0 nf=1 par=1 mg =0.34e-6 sd='((nf!=1)?(nf-1)):0)' sca=0 scb=0 scc=0 pj='2*((w+7.61e-6)+(l+8.32e-6))'
Is it possible to do this with a single regex, or does it need multiple statements?
As an alternative, you might want to consider a finite state machine. I always forget which library provides one, but it's very simple to write your own. Something like this:
def remove_quoted_whitespace(input_str):
    """
    Remove space if it is quoted.
    Examples
    --------
    >>> remove_quoted_whitespace("mg =0.34e-6 sd='((nf != 1) ? (nf-1)) :0)'")
    "mg =0.34e-6 sd='((nf!=1)?(nf-1)):0)'"
    """
    output = []
    is_quoted = False
    quotechars = ["'"]
    ignore_chars = [' ']
    for c in input_str:
        if (c in ignore_chars and not is_quoted) or c not in ignore_chars:
            output.append(c)
        if c in quotechars:
            is_quoted = not is_quoted
    return ''.join(output)
See also: Is list join really faster than string concatenation in python?
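To the original question of whether a single regex call can do it: one possible sketch is re.sub with a replacement function, which matches each quoted run and strips the spaces inside that match only. The name strip_quoted_spaces is made up for illustration, and the sketch assumes the quotes are balanced and never escaped.

```python
import re

def strip_quoted_spaces(line):
    # Match each '...' run and delete the spaces inside that match only;
    # text outside the quotes is never touched.
    return re.sub(r"'[^']*'", lambda m: m.group(0).replace(" ", ""), line)

line = "mg =0.34e-6 sd='((nf != 1) ? (nf-1)) :0)' sca=0 pj='2* ((w+7.61e-6) + (l+8.32e-6 ))'"
print(strip_quoted_spaces(line))
# mg =0.34e-6 sd='((nf!=1)?(nf-1)):0)' sca=0 pj='2*((w+7.61e-6)+(l+8.32e-6))'
```

Compared with the state machine above, this is shorter, but an unbalanced quote simply leaves the tail of the line unchanged instead of treating it as quoted.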
For context, this is the camel case #4 question from HackerRank.
I am given a multi-line string such as test:
test = '''S;M;plasticCup()
C;V;mobile phone
C;C;coffee machine
'''
and I need to manipulate it based on several conditions (not relevant). Once I have successfully manipulated test, I need to return my own multi-line string output, which I am trying to accomplish below.
def feeder(user_input):
    split_total = user_input.splitlines()
    output = ''
    for element in split_total:
        output += f"{camelCase(element)} \n"
        # camelCase() is the user-defined function that applies the "several conditions" mentioned earlier
        # code snippet for camelCase() provided at bottom of the post, in case it matters
    return output

feeder(test)
However, the newline characters (\n) in my output variable seem to be ignored completely:
'plastic cup \nmobilePhone \nCoffeeMachine'
What gives?
I tried the newline on its own, and it works here:
chr_list = ['a', 'b', 'c']
output = ''
for element in chr_list:
    output += f"{element}\n"
print(output)
OUTPUT
a
b
c
CamelCase code
def camelCase(user_input):
    split_ls = user_input.split(';')
    if (split_ls[0] == 'S'): #splitting
        index_of_upr, split_of_upr = [], []
        split_ls[2] = split_ls[2].replace('()', '') #remove method's ()
        for index, character in enumerate(split_ls[2]):
            if (character.isupper()):
                split_ls[2] = split_ls[2].replace(character, ' ' + character.lower())
        split_ls[2] = split_ls[2].strip()
        return split_ls[2]
    else: #combining
        split_ls[2] = split_ls[2].split(' ')
        for i in range(len(split_ls[2])):
            split_ls[2][i] = split_ls[2][i].title()
        if (split_ls[1] == 'C'): #class
            return ''.join(split_ls[2])
        else: #method,variable
            split_ls[2][0] = split_ls[2][0].lower()
            if (split_ls[1] == 'V'): #variable
                return ''.join(split_ls[2])
            else: #method
                return ''.join(split_ls[2]) + '()'
Thanks everyone for having a look at my code :)
There is a difference between returning the string and printing it.
When you print it, the \n character is rendered as an actual newline.
When the string is only returned from a function (and, for example, echoed by the interactive interpreter), it keeps being displayed as \n until you call print(string).
Your method returns the string. You have to print the string to view the new lines. Here:
print(feeder(test))
Outputs:
plastic cup
mobilePhone
CoffeeMachine
Print will evaluate the new line character and reflect it as such.
Hope this helped :) Cheers!
The difference here is between looking at the string's repr and looking at its printed form.
In the repr, the newlines are shown escaped as \n:
'plastic cup \nmobilePhone \nCoffeeMachine'
If you print it, each \n is written out as a line break, giving you something like this:
plastic cup
mobilePhone
CoffeeMachine
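The distinction drawn in these answers can be checked directly. This is a small sketch with a made-up sample string, showing that the newline characters are present in the string either way and only the display differs.

```python
s = "plastic cup \nmobilePhone \n"

print(repr(s))  # the escaped form the interactive prompt echoes for a returned value
print(s)        # print() writes the newlines as real line breaks

# The string holds real newline characters in both cases.
newline_count = s.count("\n")
lines = s.splitlines()
print(newline_count)  # 2
print(lines)          # ['plastic cup ', 'mobilePhone ']
```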
I have to write a function that takes a string of full names and prints it in reverse order. It also removes unnecessary spaces and commas. Some of the expected outputs are as follows:
>>> reverse_name("Techie, Teddy")
'Teddy Techie'
>>> reverse_name("Scumble, Arnold")
'Arnold Scumble'
>>> reverse_name("Fortunato,Frank")
'Frank Fortunato'
>>> reverse_name("von Grünbaumberger, Herbert")
'Herbert von Grünbaumberger'
>>> reverse_name(" Duck, Donald ")
'Donald Duck'
>>> reverse_name("X,")
'X'
>>> reverse_name(",X")
'X'
>>> reverse_name(" , Y ")
'Y'
I wrote the following code.
def main():
    name=input()
    reverse_name(name)
    print(reverse_name(name))

def reverse_name(string1):
    i = 0
    for index in string1:
        if index != ",":
            i += 1
        else:
            last = string1[i + 1:]
            first = string1[0:i]
            result = last + " " + first
            return result

if __name__ == "__main__":
    main()
P.S. I must implement a function that takes a string as a parameter and returns a string. The input will also contain a comma, which must not appear in the output.
You could combine split and join after having inverted the output of split:
def reverse_name(s):
    return ' '.join([e.strip() for e in s.split(', ')][::-1])
>>> reverse_name('Techie, Teddy')
'Teddy Techie'
>>> reverse_name(' Duck, Donald ')
'Donald Duck'
Here is another option using the re module:
import re

def reverse_name(s):
    return re.sub(r'\s*(.+),\s*(.*\S)\s*', r'\2 \1', s)
If a comma is guaranteed to always be there, string1.split(",") gives you a list of the separate parts of the string; then simply filter out the empty ones and remove the surrounding whitespace with .strip, and that will do the trick:
>>> def reverse_name(text):
...     # str.strip removes the surrounding whitespace; "if w" filters out the empty parts
...     return " ".join(w for w in map(str.strip, reversed(text.split(","))) if w)
...
>>> reverse_name("Techie, Teddy")
'Teddy Techie'
>>> reverse_name("Scumble, Arnold")
'Arnold Scumble'
>>> reverse_name("von Grünbaumberger, Herbert")
'Herbert von Grünbaumberger'
>>> reverse_name("Fortunato,Frank")
'Frank Fortunato'
>>> reverse_name("X,")
'X'
>>> reverse_name(",X")
'X'
>>> reverse_name(" , Y ")
'Y'
>>> reverse_name(" Duck, Donald ")
'Donald Duck'
>>>
Use split() to separate the input at the commas. Strip spaces from each element to remove the extraneous spaces, and then reverse the list.
def reverse_names(string1):
    names = string1.split(',')  # split at commas
    names = [name.strip() for name in names]  # remove extra spaces
    return " ".join(names[::-1])  # return reversed names as a string
You can use a regular expression to replace all the intermediate non-letter characters with a single space (then remove leading/trailing spaces). Then use a regular rsplit() to separate the first and last name (assuming that only the last name can be composite). Reassemble the inverted split result using join():
import re

def reverse_name(name):
    name = re.sub(r'\W+',' ',name).strip()
    return " ".join(name.rsplit(' ',1)[::-1])
print(reverse_name("Techie, Teddy"))
print(reverse_name("Scumble, Arnold"))
print(reverse_name("von Grünbaumberger, Herbert"))
print(reverse_name("Fortunato,Frank"))
print(reverse_name("X,"))
print(reverse_name(",X"))
print(reverse_name(" , Y "))
print(reverse_name(" Duck, Donald "))
Teddy Techie
Arnold Scumble
Herbert von Grünbaumberger
Frank Fortunato
X
X
Y
Donald Duck
Of course, this leaves the problem of composite first names such as John-Paul Smith, which create an ambiguity about which words belong to the first name and which to the last name. If there is always going to be a comma, then the solution would be different (but you would have to state that explicitly in your question).
Solution based on systematic presence of a comma between last and first name:
def reverse_name(name):
    names = re.sub(r'[^,\w]+',' ',name).split(',',1)
    # reverse the [last, first] pair before joining
    return " ".join(map(str.strip,names[::-1])).strip()
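Several of the variants above differ on the edge cases from the question ("X,", ",X", " , Y "), so it helps to run whichever one you pick against the full list. The sketch below is one more variant, built on str.partition rather than taken from any answer here, together with a table of the question's expected outputs; it assumes at most one meaningful comma per name.

```python
def reverse_name(name):
    last, _, first = name.partition(",")   # split at the first comma only
    # Trim each half and collapse any inner runs of whitespace.
    parts = [" ".join(p.split()) for p in (first, last)]
    return " ".join(p for p in parts if p)  # skip an empty first or last name

# Expected outputs copied from the question.
cases = {
    "Techie, Teddy": "Teddy Techie",
    "Scumble, Arnold": "Arnold Scumble",
    "Fortunato,Frank": "Frank Fortunato",
    "von Grünbaumberger, Herbert": "Herbert von Grünbaumberger",
    " Duck, Donald ": "Donald Duck",
    "X,": "X",
    ",X": "X",
    " , Y ": "Y",
}
for raw, expected in cases.items():
    assert reverse_name(raw) == expected, (raw, reverse_name(raw))
print("all cases pass")
```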
I have a grammar for parsing some log files using pyparsing but am running into an issue where only the first match is being returned. Is there a way to ensure that I get exhaustive matches? Here's some code:
from pyparsing import Literal, Optional, oneOf, OneOrMore, ParserElement, Regex, restOfLine, Suppress, ZeroOrMore
ParserElement.setDefaultWhitespaceChars(' ')
dt = Regex(r'''\d{2} (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) 20\d\d \d\d:\d\d:\d\d\,\d{3}''')
# TODO maybe add a parse action to make a datetime object out of the dt capture group
log_level = Suppress('[') + oneOf("INFO DEBUG ERROR WARN TRACE") + Suppress(']')
package_name = Regex(r'''(com|org|net)\.(\w+\.)+\w+''')
junk_data = Optional(Regex(r'\(.*?\)'))
guid = Regex('[A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12}')
first_log_line = dt.setResultsName('datetime') + \
                 log_level('log_level') + \
                 guid('guid') + \
                 junk_data('junk') + \
                 package_name('package_name') + \
                 Suppress(':') + \
                 restOfLine('message') + \
                 Suppress('\n')
additional_log_lines = Suppress('\t') + package_name + restOfLine
log_entry = (first_log_line + Optional(ZeroOrMore(additional_log_lines)))
log_batch = OneOrMore(log_entry)
In my mind, the last two lines are sort of equivalent to
log_entry := first_log_line | first_log_line additional_log_lines
additional_log_lines := additional_log_line | additional_log_line additional_log_lines
log_batch := log_entry | log_entry log_batch
Or something of the sort. Am I thinking about this wrong? I only see a single match with all of the expected tokens when I do print(log_batch.parseString(data).dump()).
Your scanString behavior is a strong clue. Suppose I wrote an expression to match one or more items, and erroneously defined my expression such that the second item in my list did not match. Then OneOrMore(expr) would fail, while expr.scanString would "succeed", in that it would give me more matches, but would still overlook the match I might have wanted, but just mis-parsed.
import pyparsing as pp
data = "AAA _AB BBB CCC"
expr = pp.Word(pp.alphas)
print(pp.OneOrMore(expr).parseString(data))
Gives:
['AAA']
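At first glance, this looks like OneOrMore is failing: parseString stops quietly at the first text it cannot match (pass parseAll=True to get a ParseException at that point instead), whereas scanString skips ahead and shows more matches: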
['AAA']
['AB'] <- really wanted '_AB' here
['BBB']
['CCC']
Here is a loop using scanString which prints not the matches, but the gaps between the matches, and where they start:
# loop to find non-matching parts in data
last_end = 0
for t,s,e in expr.scanString(data):
    gap = data[last_end:s]
    print(s, ':', repr(gap))
    last_end = e
Giving:
0 : ''
5 : ' _' <-- AHA!!
8 : ' '
12 : ' '
Here's another way to visualize this.
# print markers where each match begins in input string
markers = [' ']*len(data)
for t,s,e in expr.scanString(data):
    markers[s] = '^'
print(data)
print(''.join(markers))
Prints:
AAA _AB BBB CCC
^    ^  ^   ^
Your code would be a little more complex since your data spans many lines, but using pyparsing's line, lineno and col methods, you could do something similar.
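The gap-finding idea carries over to multi-line input. The sketch below is a standard-library stand-in, not pyparsing: it uses re.finditer in place of scanString and works out line and column by hand, which is the arithmetic pyparsing's lineno and col helpers do for you. The function name report_gaps and the sample data are made up for illustration.

```python
import re

def report_gaps(pattern, data):
    """Return (lineno, col, gap) for each non-blank gap between matches."""
    gaps = []
    last_end = 0
    for m in re.finditer(pattern, data):
        gap = data[last_end:m.start()]
        if gap.strip():  # ignore gaps that are whitespace only
            # 1-based line and column of the position where the gap starts
            lineno = data.count("\n", 0, last_end) + 1
            col = last_end - (data.rfind("\n", 0, last_end) + 1) + 1
            gaps.append((lineno, col, gap))
        last_end = m.end()
    return gaps

data = "AAA BBB\nCCC _AB DDD\n"
print(report_gaps(r"[A-Za-z]+", data))
# [(2, 4, ' _')]
```

Each reported gap marks text that the expression skipped over, which is where a OneOrMore-style parse of the same input would stop.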
So, there's a workaround that seems to do the trick. For whatever reason, scanString does iterate through them all appropriately, so I can very simply get my matches in a generator with:
matches = (m for m, _, _ in log_batch.scanString(data))
Still not sure why parseString isn't working exhaustively, though, and still a bit worried that I've misunderstood something about pyparsing, so more pointers are welcome here.
I am trying to take user-supplied text that contains regex wildcards (so I need to keep them as they are) and search for it in a string. In the example below, which works, I combine two patterns with the pipe |.
I figured out how to make this work:
dual = 'a bunch of stuff and a bunch more stuff!'
reobj = re.compile('b(.*?)f|\s[a](.*?)u', re.IGNORECASE)
result = reobj.findall(dual)
for link in result:
    print link[0] +' ' + link[1]
which returns:
unch o
nd a b
I also tried this:
dual2 = 'a bunch of stuff and a bunch more stuff!'
#So I want to now send in the regex codes of my own.
userin1 = 'b(.*?)f'
userin2 = '\s[a](.*?)u'
reobj = re.compile(userin1, re.IGNORECASE)
result = reobj.findall(dual2)
for link in result:
    print link[0] +' ' + link[1]
Which returns:
u n
u n
I don't understand what it is doing, because if I remove everything except link[0] from the print I get:
u
u
I can, however, pass in a single user-supplied regex string:
dual = 'a bunch of stuff and a bunch more stuff!'
userinput = 'b(.*?)f'
reobj = re.compile(userinput, re.IGNORECASE)
result = reobj.findall(dual)
print(result)
but when I try to update this to two user strings with the pipe:
dual = 'a bunch of stuff and a bunch more stuff!'
userin1 = 'b(.*?)f'
userin2 = '\s[a](.*?)u'
reobj = re.compile(userin1|userin2, re.IGNORECASE)
result = reobj.findall(dual)
print(result)
I get the error:
reobj = re.compile(userin1|userin2, re.IGNORECASE)
TypeError: unsupported operand type(s) for |: 'str' and 'str'
I get this error in many variations, for example when I put brackets () or [] around userin1|userin2.
I have found the following:
Python regular expressions OR
but cannot get it to work.
What I would like is to understand how to pass in these regex variables combined with OR and get back all the matches of both, and likewise with something like AND. In the end this will operate on files and tell me which files contain particular words under the various logical relations (OR, AND, etc.).
Thanks much for your thoughts,
Brian
Although I couldn't get the answer from A. Rodas to work, it gave me the idea of using .join. The example I worked out, although slightly different, returns the desired results (in link[0] and link[1]).
userin1 = '(T.*?n)'
userin2 = '(G.*?p)'
list_patterns = [userin1,userin2]
swaplogic = '|'
string = 'What is a Torsion Abelian Group (TAB)?'
theresult = re.findall(swaplogic.join(list_patterns), string)
print theresult
for link in theresult:
    print link[0]+' '+link[1]
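Two things from the question are worth spelling out. First, the TypeError arises because | is applied to two Python strings; the pipe has to end up inside the pattern text. Second, findall with a single group returns plain strings, so link[0] and link[1] index characters, which is where "u n" comes from. Below is a minimal Python 3 sketch of OR and AND over a list of user patterns; search_any and contains_all are made-up names, and the patterns are assumed to be valid regexes without numbered backreferences.

```python
import re

def search_any(patterns, text):
    """OR: return every match of any pattern, in order of appearance."""
    # (?:...) keeps each user pattern self-contained inside the alternation.
    combined = "|".join(f"(?:{p})" for p in patterns)
    return [m.group(0) for m in re.finditer(combined, text, re.IGNORECASE)]

def contains_all(patterns, text):
    """AND: True only if every pattern matches somewhere in the text."""
    return all(re.search(p, text, re.IGNORECASE) for p in patterns)

dual = 'a bunch of stuff and a bunch more stuff!'
patterns = [r'b(.*?)f', r'\s[a](.*?)u']

print(search_any(patterns, dual))                  # ['bunch of', ' and a bu']
print(contains_all(patterns, dual))                # True
print(contains_all(patterns + [r'zebra'], dual))   # False

# With one group, findall returns strings, so link[0] is a single character.
print(re.findall(r'b(.*?)f', dual, re.IGNORECASE))  # ['unch o', 'unch more stu']
```

For the file use case, contains_all can be called once per file's contents to decide whether the file satisfies the AND condition.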
I have the following string which forces my Python script to quit:
"625 625 QUAIL DR UNIT B"
I need to delete the extra spaces in the middle of the string so I am trying to use the following split join script:
import arcgisscripting
import logging
logger = logging.getLogger()
gp = arcgisscripting.create(9.3)
gp.OverWriteOutput = True
gp.Workspace = "C:\ZP4"
fcs = gp.ListWorkspaces("*","Folder")
for fc in fcs:
    print fc
    rows = gp.UpdateCursor(fc + "//Parcels.shp")
    row = rows.Next()
    while row:
        Name = row.GetValue('SIT_FULL_S').join(s.split())
        print Name
        row.SetValue('SIT_FULL_S', Name)
        rows.updateRow(row)
        row = rows.Next()
    del row
    del rows
Your source code and your error do not match: the error states that you didn't define the variable SIT_FULL_S.
I am guessing that what you want is:
Name = ' '.join(row.GetValue('SIT_FULL_S').split())
Use the re module...
>>> import re
>>> s = 'A    B   C'
>>> re.sub(r'\s+', ' ', s)
'A B C'
I believe you should use a regular expression to match every place where there are two or more spaces and replace each occurrence with a single space.
This can be done with a short piece of code:
re.sub(r'\s{2,}', ' ', your_string)
It's a bit unclear, but I think what you need is:
" ".join(row.GetValue('SIT_FULL_S').split())
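The two approaches suggested in this thread differ slightly at the ends of the string. The sketch below uses a plain string in place of the cursor value; the exact spacing in address is an assumption for illustration.

```python
import re

# Hypothetical spacing: a run of spaces in the middle and one trailing space.
address = "625 625 QUAIL DR      UNIT B "

joined = " ".join(address.split())     # collapses runs and trims both ends
subbed = re.sub(r"\s+", " ", address)  # collapses runs, keeps one space at the end

print(repr(joined))          # '625 625 QUAIL DR UNIT B'
print(repr(subbed))          # '625 625 QUAIL DR UNIT B '
print(repr(subbed.strip()))  # '625 625 QUAIL DR UNIT B'
```

If the cleaned value is written back to a field, the split/join form is usually the safer choice because it also removes leading and trailing whitespace.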