Regex to replace single delimiter - python

I'm looking to find a delimiter " - " in a string. The problem is that there may be other occurrences of "-", which makes the regex a bit more complex:
a-b
a - b
a-b - c-d
-a-b- - -c-d-
should end up as
a - b
a - b
a b - c d
a b - c d
One " - " delimiter only.
Other hyphen characters to be replaced by spaces.
import re
def clean_delimiter(s):
    # replaces every hyphen (plus one optional space on each side) with " - "
    regx = re.sub(r"(\s?-\s?)", " - ", s)
    print(regx)
    return regx
myList = [
    "a-b",
    "a - b",
    "a-b - c-d",
    "-a-b- - -c-d-"
]
for item in myList:
    clean_delimiter(item)
My regex changes all hyphens, which is not what I'm after. I don't know how to tell the regex to find the delimiter
\s*-|\s*-\s*|-\s*
and then replace the other occurrences of "-" with " ".

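Since your data is symmetrical, it can be done without a regex (re is only used to collapse runs of whitespace). Note that lines with a single hyphen, such as a-b, are left unchanged:
import re
result = []
with open('data', 'r') as f:
    for line in f:
        line = line.strip()
        if line.count('-') > 1:
            # number of extra hyphens on each side of the central delimiter
            n = (line.count('-') - 1) // 2
            # replace the first n hyphens
            line = line.replace('-', ' ', n)
            # reverse, so the last n hyphens come first
            line = line[::-1]
            line = line.replace('-', ' ', n)
            # reverse to the original order
            line = line[::-1]
            # collapse repeated spaces and trim the ends
            line = re.sub(r'\s+', ' ', line).strip()
        result.append(line)
for line in result:
    print(line)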
a-b
a - b
a b - c d
a b - c d
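A regex-based sketch that also handles the single-hyphen lines. It assumes that a hyphen with whitespace on both sides is the delimiter when one exists, and that otherwise the middle hyphen is the delimiter (the same symmetry assumption as above). The name clean_delimiter is borrowed from the question; the fallback rule is an assumption about the data, not something the question guarantees.

```python
import re

def clean_delimiter(s):
    # Assumption: a hyphen with whitespace on both sides is the delimiter.
    m = re.search(r"\s-\s", s)
    if m:
        left, right = s[:m.start()], s[m.end():]
    else:
        # Assumption: with no spaced hyphen, the middle hyphen is the delimiter.
        hyphens = [i for i, ch in enumerate(s) if ch == "-"]
        if not hyphens:
            return s
        mid = hyphens[len(hyphens) // 2]
        left, right = s[:mid], s[mid + 1:]

    def tidy(part):
        # every remaining hyphen becomes a space; runs of spaces collapse to one
        return " ".join(part.replace("-", " ").split())

    return tidy(left) + " - " + tidy(right)

for item in ["a-b", "a - b", "a-b - c-d", "-a-b- - -c-d-"]:
    print(clean_delimiter(item))
```

With the four sample lines this prints a - b, a - b, a b - c d and a b - c d. Splitting around the delimiter first means the hyphen-to-space replacement never has to distinguish the delimiter from the other hyphens.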

Related

How to concatenate lines of a string in Python

I have a multi-line string and want to get it back as a single line, with the text "\n" between the items. When I write this code:
txt = """a
b
c
d"""
txt = " ".join(txt.splitlines())
x = txt.split()
s = ""
for item in x:
    s += item + "\ n"
print(s)
it gives me a single line, because there is a space between \ and n:
a\ nb\ nc\ nd\ n
But if I remove the space between \ and n, I get the response back as:
a
b
c
d
I want the response as a single blob on one line.
Great, thank you. I just had to use two backslashes (\\) and it worked.
txt = """a
b
c
d"""
txt = " ".join(txt.splitlines())
x = txt.split()
s = ""
for item in x:
    s += item + "\\n"   # "\\n" is a literal backslash followed by n
print(s)
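A shorter sketch of the same idea: since the goal is the two characters backslash and n rather than a newline, a raw string r"\n" (equivalent to "\\n") can serve as the separator in a single str.join call, replacing the manual loop. The trailing separator is appended only to reproduce the output above.

```python
txt = """a
b
c
d"""
# r"\n" is a backslash followed by n, not a newline character
s = r"\n".join(txt.split()) + r"\n"
print(s)
```

txt.split() with no argument splits on any whitespace, newlines included, so the splitlines/join step is not needed.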

How to add a string after a count of occurrences of multiple specified strings

I would like to add a string (not replace) based on the nth occurrence of multiple strings.
Example:
tablespecs = "l c d r c l"
Now, assuming I want to insert the string "here: " before the third appearance of any of l, c and r, the desired output should be:
desired_tablespecs = "l c d here: r c l"
That is, only l, c and r are counted; d is ignored.
The closest I came is the following, but instead of inserting the string, the code replaces the match, so it delivers "l c d here: c l".
import re
tablespecs = "l c d r c l"
def replacenth(string, sub, wanted, n):
    pattern = re.compile(sub)
    # the n-th match of the pattern (1-based)
    where = [m for m in pattern.finditer(string)][n-1]
    before = string[:where.start()]
    after = string[where.end():]   # starts after the match, so the match is dropped
    newString = before + wanted + after
    return newString
# Source of code chunk https://stackoverflow.com/questions/35091557/replace-nth-occurrence-of-substring-in-string
replacenth(tablespecs, "[lcr]", "[here:]", 3)
# wrong_output: 'l c d [here:] c l'
You can make after start at where.start() instead of where.end(), so that it includes the matched character.
import re
tablespecs = "l c d r c l"
def replacenth(string, sub, wanted, n):
    pattern = re.compile(sub)
    where = [m for m in pattern.finditer(string)][n-1]
    before = string[:where.start()]
    after = string[where.start():]   # keeps the matched character
    newString = before + wanted + after
    return newString
replacenth(tablespecs, "[lcr]", "here: ", 3)
This outputs 'l c d here: r c l'.
A slightly different approach:
tablespecs = "l c d r c l"
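def replacenth(string, sub, wanted, n):
    count = 0
    out = ""
    for c in string:
        if c in sub:           # c is one of the characters being counted
            count += 1
            if count == n:
                out += wanted  # insert before the n-th match
                count = 0      # reset, so the insertion repeats every n matches
        out += c
    return out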
res = replacenth(tablespecs, "lcr", "here: ", 3)
assert res == "l c d here: r c l", res
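A variant of the first answer, sketched under the assumption that you want the input returned unchanged when there are fewer than n matches (the list-indexing version raises IndexError instead). It uses itertools.islice to skip to the n-th match without building the whole list; insert_before_nth is a hypothetical name.

```python
import re
from itertools import islice

def insert_before_nth(string, pattern, wanted, n):
    """Insert `wanted` before the n-th (1-based) match of `pattern`.

    Returns `string` unchanged if there are fewer than n matches.
    """
    # islice skips the first n-1 matches; next() yields the n-th or None
    match = next(islice(re.finditer(pattern, string), n - 1, None), None)
    if match is None:
        return string
    return string[:match.start()] + wanted + string[match.start():]

print(insert_before_nth("l c d r c l", "[lcr]", "here: ", 3))
```

Because the pattern is a regex, multi-character alternatives such as "l|c|r|ll" work the same way.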

Concatenate the single characters in texts

I have a list of company names, some of which contain abbreviations, e.g.:
compNames = ['Costa Limited', 'D B M LTD']
I need to convert compNames to a matrix of token counts using the code below, but it does not output columns for B, D and M in 'D B M LTD' (CountVectorizer's default token_pattern only keeps tokens of two or more word characters).
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer(analyzer='word')
count_vect.fit_transform(compNames).toarray()
What is the best way to concatenate the single characters in a text?
e.g. 'D B M LTD' to 'DBM LTD'
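import re
string = 'D B M LTD'
# inner sub: add an extra space before every word of 2+ characters
# outer sub: delete the space after every remaining character
print(re.sub(r"([^ ]) ", r"\1", re.sub(r" ([^ ]{2,})", r"  \1", string)))
Awkward, but it works. It introduces an additional space in front of LTD and then replaces "D " with "D", "B " with "B" and so on.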
Here is a short function that splits a string on whitespace into a list and iterates over it, collecting single-character elements in a temporary string. When an element longer than one character is encountered, the temporary string (if any) and then the element are appended to a new list.
import re
a = 'D B M LTD'
def single_concat(s):
    out = []
    tmp = ''
    for x in re.split(r'\s+', s):
        if len(x) == 1:
            tmp += x          # collect consecutive single characters
        else:
            if tmp:
                out.append(tmp)
            out.append(x)
            tmp = ''
    if tmp:                   # flush single characters at the end
        out.append(tmp)
    return ' '.join(out)
single_concat(a)
# returns:
'DBM LTD'
import re
s = "D B M LTD"
first_part = ''
# split on "<capital letter><whitespace>", keeping each letter as its own chunk
for chunk in re.compile(r"([A-Z]{1})\s").split(s):
    if len(chunk) == 1:
        first_part += chunk
    elif len(chunk) > 1:
        last_part = chunk
print(first_part + " " + last_part)
Prints DBM LTD.
import re
string = 'D B M LTD'
print(re.sub(r"\+", r"", re.sub(r"\+(\w\B)", r" \1", re.sub(r"(\b\w) ", r"\1+", string))))
I'm using the + character as a temporary marker, assuming there are no + characters in the string. If there are, use another character that doesn't occur.
Look, no re:
def mingle(s):
    """ SO: 49692941 """
    l = s.split()
    r = []
    t = []
    for e in l:
        if len(e) == 1:
            t.append(e)
        else:
            if t:                 # avoid adding an empty string
                r.append("".join(t))
            r.append(e)
            t = []
    if t:                         # flush trailing single characters
        r.append("".join(t))
    return " ".join(r)
print(mingle('D B M LTD'))
prints
DBM LTD
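A single re.sub with a lookahead can also do this, assuming the single characters are separated by exactly one space. A one-character word is glued to the next word only when that next word is also one character long; join_single_chars is a hypothetical name, and the cleaned names could then be passed to CountVectorizer.

```python
import re

def join_single_chars(text):
    # \b(\w) followed by a space: a one-character word.
    # (?=\w\b): the next word is also one character, so drop the space.
    return re.sub(r"\b(\w) (?=\w\b)", r"\1", text)

compNames = ['Costa Limited', 'D B M LTD']
print([join_single_chars(name) for name in compNames])
```

The lookahead consumes nothing, so consecutive single characters (D, B, M) are merged in one pass, and multi-letter words such as Costa are never touched.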

Python: how to replace characters from i-th to j-th matches?

For example, if I have:
"+----+----+---+---+--+"
is it possible to replace the second through fourth + with -?
If I have
"+----+----+---+---+--+"
and I want to have
"+-----------------+--+"
I have to replace the 2nd through 4th + with -. Is it possible to achieve this with a regex, and how?
If you can assume the first character is always a +:
string = '+' + re.sub(r'\+', r'-', string[1:], count=3)
Lop off the first character of your string and sub() the first three + characters, then add the initial + back on.
If you can't assume the first + is the first character of the string, find it first:
prefix = string.index('+') + 1
string = string[:prefix] + re.sub(r'\+', r'-', string[prefix:], count=3)
I would rather iterate over the string, and then replace the pluses according to what I found.
secondIndex = 0
fourthIndex = 0
count = 0
for i, c in enumerate(string):
    if c == '+':
        count += 1
        # remember the positions of the 2nd and 4th '+'
        if count == 2 and secondIndex == 0:
            secondIndex = i
        elif count == 4 and fourthIndex == 0:
            fourthIndex = i
string = string[:secondIndex] + '-'*(fourthIndex-secondIndex+1) + string[fourthIndex+1:]
Test:
+----+----+---+---+--+
+-----------------+--+
I split the string into a list of strings, using the character to replace as the separator, then rejoin the list in sections using the required separators. The middle slice runs to jth_match + 1 so that the j-th separator is replaced too.
example_str = "+----+----+---+---+--+"
swap_char = "+"
repl_char = '-'
ith_match = 2
jth_match = 4
list_of_strings = example_str.split(swap_char)
# separators 1..i-1 stay swap_char, separators i..j become repl_char
new_string = (swap_char.join(list_of_strings[0:ith_match]) + repl_char +
              repl_char.join(list_of_strings[ith_match:jth_match + 1]) +
              swap_char + swap_char.join(list_of_strings[jth_match + 1:]))
print(example_str)
print(new_string)
running it gives:
$ python ./python_example.py
+----+----+---+---+--+
+-----------------+--+
With a regex? Yes, that's possible.
^(\+-+){1}((?:\+[^+]+){3})
explanation:
^
(\+-+){1}            # read + and some -'s until the 2nd +
(                    # group 2 start
  (?:\+[^+]+){3}     # read +, followed by non-pluses, 3 times in total
)                    # group 2 end
Testing:
$ cat test.py
import re
pattern = r"^(\+-+){1}((?:\+[^+]+){3})"
tests = ["+----+----+---+---+--+"]
for test in tests:
    m = re.search(pattern, test)
    if m:
        # overwrite everything matched by group 2 with '-'
        print(test[0:m.start(2)] +
              "-" * (m.end(2) - m.start(2)) +
              test[m.end(2):])
Adjusting is simple:
^(\+-+){1}((?:\+[^+]+){3})
        ^              ^
The 1 is the number of leading "+ and dashes" segments that are kept, so the replacement starts at the 2nd +.
The 3 is the number of + characters that are replaced, here the 2nd through the 4th.
These are the only two changes you need to make; the group number stays the same.
Run it:
$ python test.py
+-----------------+--+
This is Pythonic:
import re
s = "+----+----+---+---+--+"
# positions of the 2nd, 3rd and 4th '+'
idx = [i.start() for i in re.finditer(r'\+', s)][1:4]
print(''.join([j if i not in idx else '-' for i, j in enumerate(s)]))
However, if your string is fixed and you want to keep it simple:
print(s)
print('+' + re.sub(r'\+---', '----', s)[1:])
Output:
+----+----+---+---+--+
+-----------------+--+
Using only list comprehensions:
s1="+----+----+---+---+--+"
indexes = [i for i,x in enumerate(s1) if x=='+'][1:4]
s2 = ''.join([e if i not in indexes else '-' for i,e in enumerate(s1)])
print(s2)
+-----------------+--+
I see you've already found a solution, but I'm not a fan of regex, so maybe this will help someone else! :-)
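To answer the "by regex" part in general: re.sub accepts a function as the replacement, so a counter can decide, match by match, whether to replace. This is a sketch assuming 1-based, inclusive positions i and j; replace_ith_to_jth is a hypothetical helper name, and itertools.count supplies the running match number.

```python
import re
from itertools import count

def replace_ith_to_jth(s, pattern, repl, i, j):
    """Replace matches number i..j (1-based, inclusive) of `pattern` with `repl`."""
    counter = count(1)

    def sub(m):
        n = next(counter)  # number of the current match
        return repl if i <= n <= j else m.group(0)

    return re.sub(pattern, sub, s)

print(replace_ith_to_jth("+----+----+---+---+--+", r"\+", "-", 2, 4))
```

A fresh counter is created on every call, so the function can be reused on several strings.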

Python: Converting word to list of letters, then returning indexes of the letters against lower case alphabet

I have already completed the task, but only in its most basic form. I'm looking for help shortening it so that it works for any word, not just one with eight letters. Here's what I've got so far (a bit long for what it does):
alpha = list(map(chr, range(97, 123)))   # ['a', 'b', ..., 'z']
word = "computer"
word_list = list(word)
one = word[0]
two = word[1]
three = word[2]
four = word[3]
five = word[4]
six = word[5]
seven = word[6]
eight = word[7]
one_index = str(alpha.index(one))
two_index = str(alpha.index(two))
three_index = str(alpha.index(three))
four_index = str(alpha.index(four))
five_index = str(alpha.index(five))
six_index = str(alpha.index(six))
seven_index = str(alpha.index(seven))
eight_index = str(alpha.index(eight))
print (one + "=" + one_index)
print (two + "=" + two_index)
print (three + "=" + three_index)
print (four + "=" + four_index)
print (five + "=" + five_index)
print (six + "=" + six_index)
print (seven + "=" + seven_index)
print (eight + "=" + eight_index)
What you are probably looking for is a for-loop.
Using a for-loop your code could look like this:
word = "computer"
for letter in word:
    index = ord(letter) - 97   # 97 is ord('a')
    if (index < 0) or (index > 25):
        print("'{}' is not in the lowercase alphabet.".format(letter))
    else:
        print("{}={}".format(letter, index + 1))  # +1 to make a=1
If you use
for letter in word:
    # code
the indented code is executed once for every letter in word (or for every element, if word is a list, for example).
A good start to learn more about loops is here: https://en.wikibooks.org/wiki/Python_Programming/Loops
There are plenty of other resources online covering this topic.
Use a for loop:
alpha = list(map(chr, range(97, 123)))
word = "computer"
for l in word:
    print('{} = {}'.format(l, alpha.index(l.lower())))
Result
c = 2
o = 14
m = 12
p = 15
u = 20
t = 19
e = 4
r = 17
Start with a dict that maps each letter to its number.
import string
word = "computer"
d = {c: ord(c) - ord('a') for c in string.ascii_lowercase}
Then pair each letter of your string with the appropriate index.
result = [(c, d[c]) for c in word]
Thanks for the help. I managed to solve it myself in a different way, using a function and a while loop. It's not as short, but it works for all lowercase words:
alpha = list(map(chr, range(97, 123)))
word = "computer"
count = 0
y = 0
def indexfinder(number):
    o = word[number]
    i = str(alpha.index(o))
    print(o + "=" + i)
while count < len(word):
    count = count + 1
    indexfinder(y)
    y = y + 1
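A compact sketch that works for any lowercase word and keeps the 0-based numbering used in the question (c=2). It relies on string.ascii_lowercase instead of the hand-built alpha list; letter_indexes is a hypothetical name, and it raises ValueError for characters outside a to z.

```python
import string

def letter_indexes(word):
    """Return (letter, 0-based alphabet index) pairs for a lowercase word."""
    return [(c, string.ascii_lowercase.index(c)) for c in word]

for letter, index in letter_indexes("computer"):
    print("{}={}".format(letter, index))
```

Returning the pairs instead of printing inside the function keeps the lookup reusable, for example to build a list or dict of results.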
