Remove extra spaces in middle of string split join Python - python

I have the following string which forces my Python script to quit:
"625 625 QUAIL DR UNIT B"
I need to delete the extra spaces in the middle of the string so I am trying to use the following split join script:
import arcgisscripting
import logging
logger = logging.getLogger()
gp = arcgisscripting.create(9.3)
gp.OverWriteOutput = True
gp.Workspace = "C:\ZP4"
fcs = gp.ListWorkspaces("*","Folder")
for fc in fcs:
    print fc
    rows = gp.UpdateCursor(fc + "//Parcels.shp")
    row = rows.Next()
    while row:
        Name = row.GetValue('SIT_FULL_S').join(s.split())
        print Name
        row.SetValue('SIT_FULL_S', Name)
        rows.updateRow(row)
        row = rows.Next()
    del row
    del rows

Your source code and your error do not match: the error states that you didn't define the variable SIT_FULL_S.
I am guessing that what you want is:
Name = ' '.join(row.GetValue('SIT_FULL_S').split())

Use the re module:
>>> import re
>>> s = 'A  B   C'
>>> re.sub(r'\s+', ' ', s)
'A B C'

I believe you should use a regular expression to match every run of two or more whitespace characters and replace each occurrence with a single space.
This takes only a short piece of code:
re.sub(r'\s{2,}', ' ', your_string)
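The split/join idiom and the regular-expression substitutions behave slightly differently at the ends of the string. The following is a minimal sketch of that difference; the helper names normalize_spaces and collapse_with_regex are made up for illustration, and the extra spaces in the sample address are an assumption about what the original data looked like.

```python
import re

def normalize_spaces(text):
    # str.split() with no argument splits on any run of whitespace and
    # ignores leading/trailing whitespace, so joining the pieces with a
    # single space collapses every run and trims both ends.
    return " ".join(text.split())

def collapse_with_regex(text):
    # re.sub also collapses every run of whitespace, but a run at the
    # start or end of the string is reduced to one space, not removed.
    return re.sub(r"\s+", " ", text)

address = "  625 625  QUAIL DR    UNIT B "  # hypothetical spacing
print(repr(normalize_spaces(address)))
print(repr(collapse_with_regex(address)))
```

If the field may carry leading or trailing padding, the split/join version is probably the more convenient of the two, since it needs no extra strip() call.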

It's a bit unclear, but I think what you need is:
" ".join(row.GetValue('SIT_FULL_S').split())

Related

How to print a string which is a name with two words in reverse order without commas and unnecessary spaces? [closed]

I have to write a function that takes a string containing a full name and prints it in reverse order. It also removes unnecessary spaces and commas. Some of the expected output is as follows:
>>> reverse_name("Techie, Teddy")
'Teddy Techie'
>>> reverse_name("Scumble, Arnold")
'Arnold Scumble'
>>> reverse_name("Fortunato,Frank")
'Frank Fortunato'
>>> reverse_name("von Grünbaumberger, Herbert")
'Herbert von Grünbaumberger'
>>> reverse_name(" Duck, Donald ")
'Donald Duck'
>>> reverse_name("X,")
'X'
>>> reverse_name(",X")
'X'
>>> reverse_name(" , Y ")
'Y'
I wrote the following code.
def main():
    name=input()
    reverse_name(name)
    print(reverse_name(name))
def reverse_name(string1):
    i = 0
    for index in string1:
        if index != ",":
            i += 1
        else:
            last = string1[i + 1:]
            first = string1[0:i]
            result = last + " " + first
            return result
if __name__ == "__main__":
    main()
P.S.: I must implement a function that takes a string as a parameter and returns a string. The input will also contain a comma, which must not appear in the output.
You could combine split and join after having inverted the output of split:
def reverse_name(s):
    return ' '.join([e.strip() for e in s.split(', ')][::-1])
>>> reverse_name('Techie, Teddy')
'Teddy Techie'
>>> reverse_name(' Duck, Donald ')
'Donald Duck'
Here is another option using the re module:
import re
def reverse_name(s):
    return re.sub(r'\s*(.+),\s*(.*\S)\s*', r'\2 \1', s)
If a comma is guaranteed to always be present, string1.split(",") gives you a list of the separate parts of the string. Stripping the surrounding whitespace from each part with .strip() and filtering out the empty parts does the trick:
>>> def reverse_name(text):
...     # str.strip removes surrounding whitespace; "if w" filters out empty parts
...     return " ".join(w for w in map(str.strip, reversed(text.split(","))) if w)
...
>>> reverse_name("Techie, Teddy")
'Teddy Techie'
>>> reverse_name("Scumble, Arnold")
'Arnold Scumble'
>>> reverse_name("von Grünbaumberger, Herbert")
'Herbert von Grünbaumberger'
>>> reverse_name("Fortunato,Frank")
'Frank Fortunato'
>>> reverse_name("X,")
'X'
>>> reverse_name(",X")
'X'
>>> reverse_name(" , Y ")
'Y'
>>> reverse_name(" Duck, Donald ")
'Donald Duck'
>>>
Use split() to separate the input at the commas. Strip spaces from each element to remove the extraneous spaces, and then reverse the list.
def reverse_names(string1):
    names = string1.split(',')  # split at commas
    names = [name.strip() for name in names]  # remove extra spaces
    return " ".join(names[::-1])  # return reversed names as a string
You can use a regular expression to replace every run of non-word characters with a single space (then remove leading/trailing spaces). Then use rsplit() to separate the first and last name (assuming that only the last name can be composite). Reassemble the inverted split result using join():
import re
def reverse_name(name):
    name = re.sub(r'\W+',' ',name).strip()
    return " ".join(name.rsplit(' ',1)[::-1])
print(reverse_name("Techie, Teddy"))
print(reverse_name("Scumble, Arnold"))
print(reverse_name("von Grünbaumberger, Herbert"))
print(reverse_name("Fortunato,Frank"))
print(reverse_name("X,"))
print(reverse_name(",X"))
print(reverse_name(" , Y "))
print(reverse_name(" Duck, Donald "))
Teddy Techie
Arnold Scumble
Herbert von Grünbaumberger
Frank Fortunato
X
X
Y
Donald Duck
Of course, this leaves the problem of composite first names such as John-Paul Smith, which create an ambiguity about which words belong to the first name and which to the last name. If there is always going to be a comma, then the solution would be different (but you would have to state that explicitly in your question).
Solution based on the systematic presence of a comma between last and first name:
def reverse_name(name):
    names = re.sub(r'[^,\w]+',' ',name).split(',',1)
    return " ".join(map(str.strip,names[::-1])).strip()
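One way to compare the answers above is to run a candidate against the expected outputs listed in the question. The sketch below does this for the split/strip/filter approach. The EXPECTED table is copied from the question, and the helper name check is hypothetical.

```python
def reverse_name(text):
    # Split at commas, strip each part, then join the non-empty parts
    # in reverse order with a single space.
    parts = [part.strip() for part in text.split(",")]
    return " ".join(part for part in reversed(parts) if part)

# Expected results copied from the question.
EXPECTED = {
    "Techie, Teddy": "Teddy Techie",
    "Scumble, Arnold": "Arnold Scumble",
    "Fortunato,Frank": "Frank Fortunato",
    "von Grünbaumberger, Herbert": "Herbert von Grünbaumberger",
    " Duck, Donald ": "Donald Duck",
    "X,": "X",
    ",X": "X",
    " , Y ": "Y",
}

def check(func):
    # Return the inputs for which func disagrees with the expected output.
    return [name for name, want in EXPECTED.items() if func(name) != want]

print(check(reverse_name))  # an empty list means every case passed
```

Passing any of the other reverse_name variants to check() shows which of the question's edge cases (such as "X," or ",X") that variant does not handle.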

Replacing string by using regex and for loop value in python

I want to replace each value in the first variable with the corresponding value from the second variable, but I want to keep the commas. I used a regex, but I don't know if this is possible because I'm still learning it. Here is my code:
import re
names = 'Mat,Rex,Jay'
nicknames = 'AgentMat LegendRex KillerJay'
split_nicknames = nicknames.split(' ')
for a in range(len(split_nicknames)):
    replace = re.sub('\\w+', split_nicknames[a], names)
print(replace)
My output is:
KillerJay,KillerJay,KillerJay
and I want output like this:
AgentMat,LegendRex,KillerJay
I suspect what you are looking for should resemble something like this:
import re
testString = 'This is my complicated test string where Mat, Rex and Jay are all having a lark, but MatReyRex is not changed'
mapping = { 'Mat' : 'AgentMat',
            'Jay' : 'KillerJay',
            'Rex' : 'LegendRex'
          }
reNames = re.compile(r'\b('+'|'.join(mapping)+r')\b')
res = reNames.sub(lambda m: mapping[m.group(0)], testString)
print(res)
Executing this results in the mapped result:
This is my complicated test string where AgentMat, LegendRex and KillerJay are all having a lark, but MatReyRex is not changed
We can build the mapping with zip and then use a lambda to apply it:
import re
names = 'Mat,Rex,Jay'
nicknames = 'AgentMat LegendRex KillerJay'
my_dict = dict(zip(names.split(','), nicknames.split(' ')))
replace = re.sub(r'\b\w+\b', lambda m: my_dict[m[0]], names)
print(replace)
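A dictionary lookup inside the lambda raises KeyError as soon as the text contains a word that has no nickname. The sketch below is one possible way around that, using dict.get with the matched word as the fallback. The function name apply_nicknames and the extra name Bob are assumptions made for illustration.

```python
import re

def apply_nicknames(text, names, nicknames):
    # Pair each name with its nickname by position.
    mapping = dict(zip(names.split(","), nicknames.split()))
    # Replace every word that has a nickname; leave other words unchanged.
    return re.sub(
        r"\b\w+\b",
        lambda m: mapping.get(m.group(0), m.group(0)),
        text,
    )

names = "Mat,Rex,Jay"
nicknames = "AgentMat LegendRex KillerJay"
print(apply_nicknames(names, names, nicknames))
print(apply_nicknames("Mat,Rex,Jay,Bob", names, nicknames))
```

Because the commas are never matched by \w+, they pass through untouched.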

How to remove white spaces only in that part between quotation marks

I have the following string. I need to remove whitespace only in the parts that are between single quotes; the rest of the line should stay intact:
+amk=0 nog = 0 nf=1 par=1 mg =0.34e-6 sd='((nf != 1) ? (nf-1)) :0)' sca=0 scb=0 scc=0 pj='2* ((w+7.61e-6) + (l+8.32e-6 ))'
So the output should be
+amk=0 nog = 0 nf=1 par=1 mg =0.34e-6 sd='((nf!=1)?(nf-1)):0)' sca=0 scb=0 scc=0 pj='2*((w+7.61e-6)+(l+8.32e-6))'
Is it possible to do this with a single regex statement, or does it need multiple lines?
As an alternative, you might want to consider a finite state machine. I always forget the library, but it's super simple to write one on your own. Something like this:
def remove_quoted_whitespace(input_str):
    """
    Remove space if it is quoted.
    Examples
    --------
    >>> remove_quoted_whitespace("mg =0.34e-6 sd='((nf != 1) ? (nf-1)) :0)'")
    "mg =0.34e-6 sd='((nf!=1)?(nf-1)):0)'"
    """
    output = []
    is_quoted = False
    quotechars = ["'"]
    ignore_chars = [' ']
    for c in input_str:
        # keep every non-space character, and keep spaces only outside quotes
        if (c in ignore_chars and not is_quoted) or c not in ignore_chars:
            output.append(c)
        # toggle the state each time a quote character is seen
        if c in quotechars:
            is_quoted = not is_quoted
    return ''.join(output)
See also: Is list join really faster than string concatenation in python?
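As for the single-regex part of the question, one possible sketch is to match each single-quoted segment with re.sub and strip the whitespace inside it in a replacement function. This assumes the quotes are balanced and never escaped, and the function name strip_quoted_spaces is made up for illustration.

```python
import re

def strip_quoted_spaces(line):
    # Match each '...' segment (assumes balanced, unescaped single quotes)
    # and delete all whitespace inside it; text outside quotes is untouched.
    return re.sub(
        r"'[^']*'",
        lambda m: re.sub(r"\s+", "", m.group(0)),
        line,
    )

line = ("+amk=0 nog = 0 nf=1 par=1 mg =0.34e-6 "
        "sd='((nf != 1) ? (nf-1)) :0)' sca=0 scb=0 scc=0 "
        "pj='2* ((w+7.61e-6) + (l+8.32e-6 ))'")
print(strip_quoted_spaces(line))
```

The outer pattern only locates the quoted segments; the inner substitution does the actual whitespace removal, so spaces such as those in "nog = 0" are left alone.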

Exhaustively parse file for all matches

I have a grammar for parsing some log files using pyparsing but am running into an issue where only the first match is being returned. Is there a way to ensure that I get exhaustive matches? Here's some code:
from pyparsing import Literal, Optional, oneOf, OneOrMore, ParserElement, Regex, restOfLine, Suppress, ZeroOrMore
ParserElement.setDefaultWhitespaceChars(' ')
dt = Regex(r'''\d{2} (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) 20\d\d \d\d:\d\d:\d\d\,\d{3}''')
# TODO maybe add a parse action to make a datetime object out of the dt capture group
log_level = Suppress('[') + oneOf("INFO DEBUG ERROR WARN TRACE") + Suppress(']')
package_name = Regex(r'''(com|org|net)\.(\w+\.)+\w+''')
junk_data = Optional(Regex('\(.*?\)'))
guid = Regex('[A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12}')
first_log_line = dt.setResultsName('datetime') + \
    log_level('log_level') + \
    guid('guid') + \
    junk_data('junk') + \
    package_name('package_name') + \
    Suppress(':') + \
    restOfLine('message') + \
    Suppress('\n')
additional_log_lines = Suppress('\t') + package_name + restOfLine
log_entry = (first_log_line + Optional(ZeroOrMore(additional_log_lines)))
log_batch = OneOrMore(log_entry)
In my mind, the last two lines are sort of equivalent to
log_entry := first_log_line | first_log_line additional_log_lines
additional_log_lines := additional_log_line | additional_log_line additional_log_lines
log_batch := log_entry | log_entry log_batch
Or something of the sort. Am I thinking about this wrong? I only see a single match with all of the expected tokens when I do print(log_batch.parseString(data).dump()).
Your scanString behavior is a strong clue. Suppose I wrote an expression to match one or more items, and erroneously defined my expression such that the second item in my list did not match. Then OneOrMore(expr) would fail, while expr.scanString would "succeed", in that it would give me more matches, but would still overlook the match I might have wanted, but just mis-parsed.
import pyparsing as pp
data = "AAA _AB BBB CCC"
expr = pp.Word(pp.alphas)
print(pp.OneOrMore(expr).parseString(data))
Gives:
['AAA']
At first glance, this looks like the OneOrMore is failing, whereas scanString shows more matches:
['AAA']
['AB'] <- really wanted '_AB' here
['BBB']
['CCC']
Here is a loop using scanString which prints not the matches, but the gaps between the matches, and where they start:
# loop to find non-matching parts in data
last_end = 0
for t,s,e in expr.scanString(data):
    gap = data[last_end:s]
    print(s, ':', repr(gap))
    last_end = e
Giving:
0 : ''
5 : ' _' <-- AHA!!
8 : ' '
12 : ' '
Here's another way to visualize this.
# print markers where each match begins in input string
markers = [' ']*len(data)
for t,s,e in expr.scanString(data):
    markers[s] = '^'
print(data)
print(''.join(markers))
Prints:
AAA _AB BBB CCC
^    ^  ^   ^
Your code would be a little more complex since your data spans many lines, but using pyparsing's line, lineno and col methods, you could do something similar.
So, there's a workaround that seems to do the trick. For whatever reason, scanString does iterate through them all appropriately, so I can very simply get my matches in a generator with:
matches = (m for m, _, _ in log_batch.scanString(data))
Still not sure why parseString isn't working exhaustively, though, and still a bit worried that I've misunderstood something about pyparsing, so more pointers are welcome here.

Pythonic way to search a list using keywords

I am attempting to search for text between two keywords. My solution so far uses split() to turn the string into a list. It works, but I was wondering if there is a more efficient or elegant way to achieve this. Below is my code:
words = "Your meeting with Dr Green at 8pm"
list_words = words.split()
before = "with"
after = "at"
title = list_words[list_words.index(before) + 1]
name = list_words[list_words.index(after) - 1]
if title != name:
    var = title + " " + name
    print(var)
else:
    print(title)
Results:
>>> Dr Green
I'd prefer a solution that is configurable, as the text I'm searching for can be dynamic: Dr Green could be replaced by a name with 4 words or 1 word.
Sounds like a job for regular expressions. This uses the pattern (?:with)(.*?)(?:at) to look for 'with', and 'at', and lazily match anything in-between.
import re
words = 'Your meeting with Dr Green at 8pm'
start = 'with'
end = 'at'
pattern = r'(?:{})(.*?)(?:{})'.format(start, end)
match = re.search(pattern, words).group(1).strip()
print(match)
Outputs:
Dr Green
Note that the regex does actually match the spaces on either side of Dr Green, so I've included a simple .strip() to remove the surrounding whitespace.
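One caveat with a pattern built from bare keywords is that a short keyword like "at" can also match inside a longer word, such as a name like Matthews. The sketch below adds \b word boundaries and re.escape to guard against that. It assumes the keywords are plain words, and both the function name text_between and the sample name Matthews are made up for illustration.

```python
import re

def text_between(text, start, end):
    # \b keeps "at" from matching inside a longer word; re.escape makes
    # the keywords safe to embed in the pattern. The surrounding \s*
    # keeps the captured group free of leading/trailing spaces.
    pattern = r"\b{}\b\s*(.*?)\s*\b{}\b".format(re.escape(start), re.escape(end))
    match = re.search(pattern, text)
    # Return None when either keyword is missing.
    return match.group(1) if match else None

print(text_between("Your meeting with Dr Matthews at 8pm", "with", "at"))
print(text_between("Your meeting with Dr Jebediah Caruseum Green at 8pm", "with", "at"))
```

Returning None when nothing matches avoids the AttributeError that calling .group(1) directly on a failed re.search would raise.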
Using RE
import re
words = "Your meeting with Dr Green at 8pm"
before = "Dr"
after = "at"
result = re.search('%s(.*)%s' % (before, after), words).group(1)
print(before + result)
Output :
Dr Green
How about slicing the string at start and end, then just splitting it?
words = "Your meeting with Dr Jebediah Caruseum Green at 8pm"
start = "with"
end = "at"
list_of_stuff = words[words.index(start):words.index(end)].replace(start, '', 1).split()
list_of_stuff
['Dr', 'Jebediah', 'Caruseum', 'Green']
You can do anything you like with the list. For example I would parse for title like this:
list_of_titles = ['Dr', 'Sr', 'GrandMaster', 'Pleb']
try:
    title = [i for i in list_of_stuff if i in list_of_titles][0]
except IndexError:
    # title not found, skipping
    title = ''
name = ' '.join([x for x in list_of_stuff if x != title])
print(title, name)
