Python replace oneliner without using regexp - python

I have my code here:
a = u"\n".join(my_array).replace(u"\n\n", u"\n")
The problem is that if there are "\n\n\n\n" you are left with "\n\n" and I just want one "\n"
So I've come up with:
a = u"\n".join(my_array)
while a.find(u"\n\n")>=0:
    a = a.replace(u"\n\n", u"\n")
I was wondering if there's a more elegant way / maybe oneliner without using regexp to do this in Python?

If you really want to do this in one line and without using regular expressions, one way to reduce all the sequences of multiple \n to a single \n would be to first split by \n and then join all the non-empty segments with a single \n.
>>> a = "foo\n\nbar\n\n\nblub\n\n\n\nbaz"
>>> "\n".join(x for x in a.split("\n") if x)
'foo\nbar\nblub\nbaz'
Here, a is the entire string, i.e. after you did "\n".join(my_array), and depending on what my_array originally is, there may be better solutions, e.g. stripping \n from the individual lines prior to joining, but this will work nonetheless.
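The alternative mentioned above, stripping \n from the individual items before joining, could look like the following minimal sketch. It assumes my_array is a list of strings that may carry leading or trailing newlines; the sample data is hypothetical.

```python
# Hypothetical input: items may carry stray newlines or be empty.
my_array = ["foo\n", "\n", "bar\n\n", "", "baz"]

# Strip newlines from both ends of each item, then drop items that end up empty.
stripped = (item.strip("\n") for item in my_array)
a = "\n".join(item for item in stripped if item)

print(repr(a))
```

This only handles newlines at the ends of each item; repeated newlines inside a single item would still need the split/join step shown above.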

To convert sequences of newlines to single newlines you can split the string on newlines and then filter out the empty strings before re-joining. For example:
mystring = u"this\n\nis a\ntest string\n\nwith embedded\n\n\nnewlines\n"
a = u'\n'.join(filter(None, mystring.split(u'\n')))
print '{0!r}\n{1!r}'.format(mystring, a)
output
u'this\n\nis a\ntest string\n\nwith embedded\n\n\nnewlines\n'
u'this\nis a\ntest string\nwith embedded\nnewlines'
Note that this eliminates any trailing newlines, but that shouldn't be a big deal.
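If the trailing newline does matter, one possible variation is sketched below in Python 3. The helper name collapse_newlines is hypothetical, and the rule of keeping exactly one trailing newline is an assumption about what the caller wants.

```python
def collapse_newlines(text):
    """Collapse runs of newlines, keeping one trailing newline if present."""
    collapsed = "\n".join(filter(None, text.split("\n")))
    # Assumption: a single trailing newline should survive if the input had one.
    if collapsed and text.endswith("\n"):
        collapsed += "\n"
    return collapsed


mystring = "this\n\nis a\ntest string\n\nwith embedded\n\n\nnewlines\n"
print(repr(collapse_newlines(mystring)))
```

Leading newlines are still dropped entirely, as in the original one-liner.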

Using reduce should work:
reduce(lambda x,y: (x+y).replace('\n\n', '\n'), x)
However, regular expressions would be more elegant:
re.sub('\n+', '\n', x)
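In Python 3, reduce is no longer a builtin and has to be imported from functools. The following sketch runs both variants on a sample string (the variable names and the sample text are illustrative) and checks that they agree.

```python
import re
from functools import reduce  # reduce is not a builtin in Python 3

text = "foo\n\nbar\n\n\nblub\n\n\n\nbaz"

# Build the result one character at a time, collapsing any "\n\n" that appears.
# The "" initial value keeps the call safe for an empty input string.
via_reduce = reduce(lambda acc, ch: (acc + ch).replace("\n\n", "\n"), text, "")

# The regular-expression version of the same idea.
via_regex = re.sub(r"\n+", "\n", text)

print(repr(via_reduce))
print(via_reduce == via_regex)
```

The reduce version rebuilds the string for every character, so it is likely to be much slower than re.sub on long inputs.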

perhaps this can help:
u"\n".join(s.replace(u'\n', '') for s in my_array)

Related

Remove characters after matching two conditions

I have the Python code below and I would like the output to be a string: "P-1888" discarding all numbers after the 2nd "-" and removing the leading 0's after the 1st "-".
So far all I have been able to do in the following code is to remove the trailing 0's:
import re
docket_no = "P-01888-000"
doc_no_rgx1 = re.compile(r"^([^\-]+)\-(0+(.+))\-0[\d]+$")
massaged_dn1 = doc_no_rgx1.sub(r"\1-\2", docket_no)
print(massaged_dn1)
You can use the split() method to split the string on the "-" character and then use the join() method to join the first and second elements of the resulting list with a "-" character. Additionally, you can use the lstrip() method to remove the leading 0's after the 1st "-". Try this.
docket_no = "P-01888-000"
docket_no_list = docket_no.split("-")
docket_no_list[1] = docket_no_list[1].lstrip("0")
massaged_dn1 = "-".join(docket_no_list[:2])
print(massaged_dn1)
First way is to use capturing groups. You have already defined three of them using brackets. In your example the first capturing group will get "P", and the third capturing group will get numbers without leading zeros. You can get captured data by using re.match:
match = doc_no_rgx1.match(docket_no)
print(f'{match.group(1)}-{match.group(3)}') # Outputs 'P-1888'
Second way is to not use regex for such a simple task. You could split your string and reassemble it like this:
parts = docket_no.split('-')
print(f'{parts[0]}-{parts[1].lstrip("0")}')
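Note that the original pattern requires at least one leading zero, so an input such as "P-1888-000" would not match it. If you want to stay with re.sub, a variation along the following lines might work. The pattern and the helper name massage_docket_no are assumptions for illustration, not part of the original code.

```python
import re

# Hypothetical pattern: a prefix, optional leading zeros, the digits to keep,
# then a final "-digits" part that is dropped.
DOCKET_RE = re.compile(r"^([^-]+)-0*(\d+)-\d+$")


def massage_docket_no(docket_no):
    # Strings that do not match the pattern are returned unchanged.
    return DOCKET_RE.sub(r"\1-\2", docket_no)


print(massage_docket_no("P-01888-000"))
print(massage_docket_no("P-1888-000"))
```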
It seems like a sledgehammer/nut situation but if you do want to use re then you could use:
doc_no_rgx1 = ''.join(re.findall(r'([A-Z]-)0+(\d+)-', docket_no)[0])
I don't think I'd use a regular expression for this purpose. Your use case can be handled by standard string manipulation, so using a regular expression would be overkill. Instead, consider doing this:
docket_nos = "P-01888-000".split('-')[:-1]
docket_nos[1] = docket_nos[1].lstrip('0')
docket_no = '-'.join(docket_nos)
print(docket_no) # P-1888
This might seem a little bit verbose but it does exactly what you're looking for. The first line splits docket_no by '-' characters, producing substrings P, 01888 and 000; and then discards the last substring. The second line strips leading zeros from the second substring. And the third line joins all these back together using '-' characters, producing your desired result of P-1888.
Functionally this is no different than other answers suggesting that you split on '-' and lstrip the zero(s), but personally I find my code more readable when I use multiple assignment to clarify intent vs. using indexes:
def convert_docket_no(docket_no):
    letter, number, *_ = docket_no.split('-')
    return f'{letter}-{number.lstrip("0")}'
_ is used here for a "throwaway" variable, and the * makes it accept all elements of the split list past the first two.
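One edge case worth keeping in mind with lstrip("0") is a number made up only of zeros, which would be stripped to an empty string. A small variation is sketched here; falling back to "0" in that case is an assumption about the desired result.

```python
def convert_docket_no(docket_no):
    letter, number, *_ = docket_no.split("-")
    # Assumption: an all-zero number should become "0" rather than "".
    return f'{letter}-{number.lstrip("0") or "0"}'


print(convert_docket_no("P-01888-000"))
print(convert_docket_no("P-000-000"))
```

An input without any "-" raises ValueError at the unpacking step, since at least two parts are required.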

Removing parts of a string after certain chars in Python

New to Python.
I'd like to remove the substrings between the word AND and the comma character in the following string:
MyString = ' x.ABC AND XYZ, \ny.DEF AND Type, \nSome Long String AND Qwerty, \nz.GHI AND Tree \n'
The result should be:
MyString = ' x.ABC,\ny.DEF,\nSome Long String,\nz.GHI\n'
I'd like to do it without using regex.
I have tried various methods with splits and joins and indexes to no avail.
Any direction appreciated.
Thanks.
While Moses's answer is really good, I have a feeling this is a homework question where you are not meant to use any imports. Here is an answer with no imports; it is not as efficient as the other answers, such as Moses's or a regex-based one, but it works.
MyString = 'x.ABC AND XYZ, \ny.DEF AND Type, \nSome Long String AND Qwerty, \nz.GHI AND Tree \n'
new_string = ''
for each in [[y for y in x.split(' AND ')][0] for x in MyString.split('\n')]:
    new_string+=each
    new_string+='\n'
print(new_string)
You can split the string into lines, and further split the lines into words and use itertools.takewhile to drop all words after AND (itself included):
from itertools import takewhile
''.join(' '.join(takewhile(lambda x: x != 'AND', line.split())) + ',\n'
        for line in MyString.splitlines())
Notice that the newline character and a comma are manually added after each line is reconstructed with str.join.
All the lines are then finally joined using str.join.
Now it is working, and avoiding repeated append calls in an explicit loop probably makes it faster:
In [19]: ',\n'.join([x.split('AND')[0].strip() for x in MyString.split('\n')])
Out[19]: 'x.ABC,\ny.DEF,\nSome Long String,\nz.GHI,\n'
You can check this answer to understand why...
Comparing list comprehensions and explicit loops (3 array generators faster than 1 for loop)
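The answers above either drop the commas or leave one after the last line. A sketch that reproduces the exact string asked for in the question, using only str.partition and str.join, might look like this; it assumes every separator is written exactly as ' AND ' with surrounding spaces.

```python
MyString = ' x.ABC AND XYZ, \ny.DEF AND Type, \nSome Long String AND Qwerty, \nz.GHI AND Tree \n'

# Keep only the text before " AND " on each line.
heads = [line.partition(' AND ')[0] for line in MyString.splitlines()]

# Join with ",\n" so the last line gets no comma, then restore the final newline.
result = ',\n'.join(heads) + '\n'

print(repr(result))
```

str.partition returns the whole line as its first element when ' AND ' is absent, so lines without the keyword pass through unchanged.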

Python split a string at an underscore

How do I split a string at the second underscore in Python so that I get something like this
name = "this_is_my_name_and_its_cool"
Split name so that I get this: ["this_is", "my_name_and_its_cool"]
the following statement will split name into a list of strings
a=name.split("_")
you can combine whatever strings you want using join, in this case using the first two words
b="_".join(a[:2])
c="_".join(a[2:])
maybe you can write a small function that takes as argument the number of words (n) after which you want to split
def func(name, n):
    a=name.split("_")
    b="_".join(a[:n])
    c="_".join(a[n:])
    return [b,c]
Assuming that you have a string with multiple instances of the same delimiter and you want to split at the nth delimiter, ignoring the others.
Here's a solution using just split and join, without complicated regular expressions. This might be a bit easier to adapt to other delimiters and particularly other values of n.
def split_at(s, c, n):
    words = s.split(c)
    return c.join(words[:n]), c.join(words[n:])
Example:
>>> split_at('this_is_my_name_and_its_cool', '_', 2)
('this_is', 'my_name_and_its_cool')
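A variation on the same idea uses the maxsplit argument of str.split, so the part after the nth delimiter is never split and re-joined. This is only a sketch; the function name split_at_maxsplit is made up, and returning an empty second element when there are fewer than n delimiters is an assumption chosen to match split_at.

```python
def split_at_maxsplit(s, sep, n):
    # Split at most n times: up to n leading pieces plus the untouched remainder.
    parts = s.split(sep, n)
    if len(parts) <= n:
        # Fewer than n separators: nothing to split off (assumed behaviour).
        return s, ''
    return sep.join(parts[:n]), parts[n]


print(split_at_maxsplit('this_is_my_name_and_its_cool', '_', 2))
```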
I think you're trying to split the string at the second underscore. If so, you could use the findall function.
>>> import re
>>> s = "this_is_my_name_and_its_cool"
>>> re.findall(r'^[^_]*_[^_]*|[^_].*$', s)
['this_is', 'my_name_and_its_cool']
>>> [i for i in re.findall(r'^[^_]*_[^_]*|(?!_).*$', s) if i]
['this_is', 'my_name_and_its_cool']
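Try this:
print re.split(r"(^[^_]+_[^_]+)_","this_is_my_name_and_its_cool")
Note that the resulting list starts with an empty string, because the match begins at the very start of the input.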
Here's a quick & dirty way to do it:
s = 'this_is_my_name_and_its_cool'
i = s.find('_'); i = s.find('_', i+1)
print [s[:i], s[i+1:]]
output
['this_is', 'my_name_and_its_cool']
You could generalize this approach to split on the nth separator by putting the find() into a loop.
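The generalization described above could be sketched as follows in Python 3. The function name split_at_nth and the choice to return the string unsplit when it contains fewer than n separators are assumptions.

```python
def split_at_nth(s, sep, n):
    """Split s at the nth occurrence of sep (n >= 1) using str.find."""
    if n < 1:
        raise ValueError("n must be at least 1")
    start = 0
    for _ in range(n):
        i = s.find(sep, start)
        if i == -1:
            # Assumption: with fewer than n separators, return s unsplit.
            return [s]
        start = i + len(sep)
    # i is the index of the nth separator; start is the index just past it.
    return [s[:i], s[start:]]


print(split_at_nth('this_is_my_name_and_its_cool', '_', 2))
```

Advancing by len(sep) rather than 1 lets the same loop work with multi-character separators.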

Python Regular Expression Escape or not

I need to write a regular expression to get all the characters in the list below..
(remove all the characters not in the list)
allow_characters = "#.-_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
I don't know how to do it, should I even use re.match or re.findall or re.sub...?
Thanks a lot in advance.
Don't use regular expressions at all, first convert allow_characters to a set and then use ''.join() with a generator expression that strips out the unwanted characters. Assuming the string you are transforming is called s:
allow_char_set = set(allow_characters)
s = ''.join(c for c in s if c in allow_char_set)
That being said, here is how this might look with regex:
s = re.sub(r'[^#.\-_a-zA-Z0-9]+', '', s)
You could convert your allow_characters string into this regex, but I think the first solution is significantly more straightforward.
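As a rough sketch of that conversion, the character class can be built from allow_characters with re.escape, so that characters such as "-" are treated literally. The sample input s and the names disallowed and cleaned are hypothetical.

```python
import re

allow_characters = "#.-_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

# re.escape makes characters such as "-" literal inside the character class.
disallowed = re.compile('[^' + re.escape(allow_characters) + ']+')

s = 'He!!o, W*rld_#1.-'  # hypothetical input
cleaned = disallowed.sub('', s)
print(cleaned)
```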
Edit: As pointed out by DSM in comments, str.translate() is often a very good way to do something like this. In this case it is slightly complicated but you can still use it like this:
import string
allow_characters = "#.-_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
all_characters = string.maketrans('', '')
delete_characters = all_characters.translate(None, allow_characters)
s = s.translate(None, delete_characters)
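The snippet above is written for Python 2. In Python 3, str.translate takes a single table argument and string.maketrans is not available, so a rough equivalent might look like the sketch below. The helper name keep_allowed is hypothetical, and building the table from the characters of each input string is just one possible approach.

```python
allow_characters = "#.-_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
allow_char_set = set(allow_characters)


def keep_allowed(s):
    # Map every disallowed character that occurs in s to None, which deletes it.
    table = {ord(c): None for c in set(s) - allow_char_set}
    return s.translate(table)


print(keep_allowed('He!!o, W*rld_#1.-'))
```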

How do I strip a string given a list of unwanted characters? Python

Is there a way to pass in a list instead of a char to str.strip() in python? I have been doing it this way:
unwanted = [c for c in '!##$%^&*(FGHJKmn']
s = 'FFFFoFob*&%ar**^'
for u in unwanted:
    s = s.strip(u)
print s
Desired output, this output is correct but there should be some sort of a more elegant way than how i'm coding it above:
oFob*&%ar
Strip and friends take a string representing a set of characters, so you can skip the loop:
>>> s = 'FFFFoFob*&%ar**^'
>>> s.strip('!##$%^&*(FGHJKmn')
'oFob*&%ar'
(The downside of this is that calls like fn.rstrip(".png") seem to work for many filenames but are not really correct, because the argument is treated as a set of characters to strip rather than as a suffix.)
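To illustrate the pitfall, here is a small Python 3 sketch that contrasts rstrip with a suffix-aware helper. The helper name strip_suffix is made up for this example; on Python 3.9 and later, the built-in str.removesuffix does the same job.

```python
def strip_suffix(name, suffix):
    # Remove suffix only when name really ends with it.
    if suffix and name.endswith(suffix):
        return name[:-len(suffix)]
    return name


# rstrip removes any trailing run of the characters '.', 'p', 'n', 'g'.
print("map.png".rstrip(".png"))
print(strip_suffix("map.png", ".png"))
```

Here rstrip also eats the final "p" of "map", because that character belongs to the set being stripped.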
Since you are not looking to delete characters from the middle, you can just use strip:
>>> 'FFFFoFob*&%ar**^'.strip('!##$%^&*(FGHJKmn')
'oFob*&%ar'
Otherwise, use str.translate().
>>> 'FFFFoFob*&%ar**^'.translate(None, '!##$%^&*(FGHJKmn')
'oobar'
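The two-argument translate call above is Python 2 syntax. In Python 3, a deletion table can be built with str.maketrans, as in this sketch; the variable names are illustrative.

```python
unwanted = '!##$%^&*(FGHJKmn'
s = 'FFFFoFob*&%ar**^'

# The third argument of str.maketrans lists the characters to delete.
table = str.maketrans('', '', unwanted)

print(s.translate(table))  # removes unwanted characters everywhere
print(s.strip(unwanted))   # removes them only at both ends
```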
