How can I split a string in Python and keep the delimiter in the result?

I have code like this:
a = "*abc*bbc"
a.split("*")  # ['', 'abc', 'bbc']
# I need ["*", "abc", "*", "bbc"]
a = "abc*bbc"
a.split("*")  # ['abc', 'bbc']
# I need ["abc", "*", "bbc"]
How can I get a list that includes the delimiter, using split, a regex, or partition?
I am using Python 2.7 on Windows.

You need to use a regex with the delimiter in a capturing group and filter out the empty strings, like this:
>>> import re
>>> [item for item in re.split(r"(\*)", "abc*bbc") if item]
['abc', '*', 'bbc']
>>> [item for item in re.split(r"(\*)", "*abc*bbc") if item]
['*', 'abc', '*', 'bbc']
Note 1: You need to escape * with a backslash, because * is a quantifier in regex syntax. Writing \* tells the regex engine to treat it as a literal character.
Note 2: re.split returns an empty string when the delimiter is at the very beginning or the very end of the input, which is why the list comprehension filters empty items out. Check this question to understand the reason behind it.
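The escaping in Note 1 can be automated with re.escape, so the same idea works for any literal delimiter. This is a minimal sketch in Python 3; split_keep_delimiter is a hypothetical helper name, not something from the answer above.

```python
import re

def split_keep_delimiter(text, delimiter):
    """Split text on a literal delimiter and keep each delimiter as its own item."""
    # re.escape makes characters such as * or . match literally.
    pattern = "(" + re.escape(delimiter) + ")"
    # The capturing group makes re.split return the delimiters as well;
    # the filter drops the empty strings produced at the edges.
    return [item for item in re.split(pattern, text) if item]

print(split_keep_delimiter("*abc*bbc", "*"))  # ['*', 'abc', '*', 'bbc']
print(split_keep_delimiter("abc*bbc", "*"))   # ['abc', '*', 'bbc']
```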

You have to use re.split and group the delimiter:
import re
x = "*abc*bbc"
print([part for part in re.split(r"(\*)", x) if part])
Or use re.findall:
x = "*abc*bbc"
print(re.findall(r"[^*]+|\*", x))

Use partition(). Note that it splits only at the first occurrence of the separator and returns a 3-tuple:
a = "abc*bbc"
print(a.partition("*"))
# ('abc', '*', 'bbc')
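Because str.partition splits only at the first occurrence of the separator, a string with several delimiters needs a loop. A minimal sketch in Python 3, assuming empty pieces should be dropped as in the question; split_with_partition is a hypothetical helper name.

```python
def split_with_partition(text, sep):
    """Apply str.partition repeatedly, keeping each separator as its own item."""
    parts = []
    while text:
        # head: text before the first sep; found: sep itself, or '' if absent.
        head, found, text = text.partition(sep)
        if head:
            parts.append(head)
        if found:
            parts.append(found)
    return parts

print(split_with_partition("*abc*bbc", "*"))  # ['*', 'abc', '*', 'bbc']
print(split_with_partition("abc*bbc", "*"))   # ['abc', '*', 'bbc']
```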

Related

How to set end of string in Python regex?

I have the following list
lst = ['BILL_FROM:', 'MyCompany', '525._S._Lexington_Ave.', 'Burlington._NC._2725', 'United_States', 'musicjohnliofficial#gmail.com', 'BILL_TO:', 'O.Relly', '343._S._Lexington_Ave.', 'Burlington._NC._2725', 'United_States', 'musicjohnliofficial#gmail.com', 'INVOICE_number', '01', 'INVOICE_DATE', '2022-12-27', 'AMOUNT_DUE', '1.128', 'SUBTOTAL', '999.00', 'TAX_(13.0%)', '129.87', 'TOTAL', '1.128']
And I want to get its BILL_TO: field using regex.
I'm trying to do
>>> bill_to = re.compile("(\w+)to$", re.IGNORECASE)
>>> list(filter(bill_to.match, lst))
to get only the ['BILL_TO:'] field, but instead I am getting
['Burlington._NC._2725', 'BILL_TO:', 'Burlington._NC._2725', 'SUBTOTAL']
Why is the $ symbol not working here? Or am I doing something else wrong?
Thank you
The $ matches the end of the string, but BILL_TO: ends with a :, which the pattern also needs to match before the $:
(\w+)to:$
Also, it's recommended to use a raw string so that Python does not interpret the \ as an escape character (notice the r):
bill_to = re.compile(r"(\w+)to:$", re.IGNORECASE)
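To see the corrected pattern in action, here is a small sketch in Python 3. The list is a shortened version of the one in the question (an assumption made for brevity), and the capturing group is omitted because only the filtering matters here.

```python
import re

# Shortened sample of the list from the question.
lst = ['BILL_FROM:', 'MyCompany', 'Burlington._NC._2725', 'BILL_TO:', 'SUBTOTAL']

# \w+ then "to", then the literal colon, then the end of the string.
bill_to = re.compile(r"\w+to:$", re.IGNORECASE)

# Pattern.match anchors at the start of each string; $ anchors at its end.
matches = [s for s in lst if bill_to.match(s)]
print(matches)  # ['BILL_TO:']
```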

How to split a string by tabs but only once per occurrence

I have a string structured like this:
"I\thave\ta\t\tstring"
To split it by tabs I used this:
text = [splits for splits in row.split("\t") if splits != ""]
Now this method removes all tabs from the string but I want it to remove only the first occurrence of a tab after a word so it would end up like this:
"Ihavea\tstring"
Is there a way of doing this?
Using re.split with a negative lookbehind assertion should do:
import re
row = "I\thave\ta\t\tstring"
s = ''.join(re.split(r'(?<!\t)\t', row))
print(s)
# 'Ihavea\tstring'
The assertion (?<!\t) prevents a split on a \t which was preceded by another \t.
You can use re.sub if you do not actually need the items from the split:
s = re.sub(r'(?<!\t)\t', '', row)
print(s)
# 'Ihavea\tstring'
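The effect of the lookbehind is easier to see on a longer run of tabs: only the first tab of each run is removed. A minimal sketch in Python 3; drop_first_tab_of_each_run is a hypothetical helper name.

```python
import re

def drop_first_tab_of_each_run(text):
    # (?<!\t)\t matches a tab only when the preceding character is not a tab,
    # that is, the first tab of every run of consecutive tabs.
    return re.sub(r'(?<!\t)\t', '', text)

print(repr(drop_first_tab_of_each_run("I\thave\ta\t\tstring")))  # 'Ihavea\tstring'
print(repr(drop_first_tab_of_each_run("a\t\t\tb")))              # 'a\t\tb'
```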
A list comprehension also works if you want to avoid importing the re module:
row = "I\thave\ta\t\tstring"
text = [splits if splits else "\t" for splits in row.split("\t")]
"".join(text)
#'Ihavea\tstring'
An empty string is false in a boolean context, and str.split produces an empty list element between every pair of consecutive split characters ("\t" in this case), so each empty element is replaced by a single tab.
To keep it simple you can use re.split
from re import split
text = "I\thave\ta\t\tstring"
split_string = split(r'\t+', text) #Gives ['I', 'have', 'a', 'string']
The regular expression r'\t+' matches one or more consecutive tabs, so each run of tabs is treated as a single delimiter.

Using parentheses as delimiter in re or str.split() python

I am trying to split a string such as: add(ten)sub(one) into add(ten) sub(one).
I can't figure out how to match the closing parenthesis. I have tried re.sub(r'\\)', '\\) ') and every variation of escaping the parentheses I can think of. It is hard to tell in this font, but I am trying to add a space between these commands so I can split the string into a list later.
There's no need to escape ) in the replacement string. ) has a special meaning only in the regex pattern, so it must be escaped there to match it literally, but in an ordinary string it can be used as is.
>>> import re
>>> strs = "add(ten)sub(one)"
>>> re.sub(r'\)(?=\S)',r') ', strs)
'add(ten) sub(one)'
As @StevenRumbalski pointed out in the comments, the same result can be obtained with just str.replace and str.strip:
>>> strs.replace(')',') ').strip()
'add(ten) sub(one)'
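Since the goal in the question is to end up with a list, the substitution and the split can be combined. A minimal sketch in Python 3, assuming the commands themselves contain no whitespace; split_commands is a hypothetical helper name.

```python
import re

def split_commands(text):
    # Insert a space after each ')' that is directly followed by a
    # non-whitespace character, then split on whitespace.
    spaced = re.sub(r'\)(?=\S)', ') ', text)
    return spaced.split()

print(split_commands("add(ten)sub(one)"))  # ['add(ten)', 'sub(one)']
```

The lookahead (?=\S) means the final ) gets no trailing space, so no stripping is needed before the split.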
d = ')'
my_str = 'add(ten)sub(one)'
result = [t+d for t in my_str.split(d) if len(t) > 0]
# result is ['add(ten)', 'sub(one)']
Create a list of all substrings:
import re
a = 'add(ten)sub(one)'
print(re.findall(r'(.+?\(.+?\))', a))
Output:
['add(ten)', 'sub(one)']

Python - removing characters from a list

I have list of elements like this:
['1:{test}', '2:{test}', '4:{1989}', '9:{test}', '']
My question is:
How can I remove specific characters from the elements of this list?
As a result I want to have:
['test', 'test', '1989', 'test', '']
Any suggestions or solutions?
Thanks in advance.
>>> import re
>>> re.findall(r'\{(.*)\}', '1:{test}')
['test']
Just make a loop with it:
[(re.findall(r'\{(.*)\}', i) or [''])[0] for i in your_list]
or maybe:
[''.join(re.findall(r'\{(.*)\}', i)) for i in your_list]
You could use a regular expression, like so:
import re
s = re.compile(r"\d+:\{(.*)\}")
data = ['1:{test}', '2:{test}', '4:{1989}', '9:{test}', '']
result = [s.match(d).group(1) if s.match(d) else d for d in data]
results in
['test', 'test', '1989', 'test', '']
Python's strip() function does exactly that -- remove specific characters from the ends of a string -- but there are probably better ways to do what you want.
You haven't said exactly what the pattern is, or what you want if there are no braces, but this will work on your example:
stripped = []
for x in my_data:
    m = re.search(r"\{(.*)\}", x)  # capture the text between the braces
    stripped.append(m.group(1) if m else x)
t = ['1:{test}', '2:{test}', '4:{1989}', '9:{test}', '']
def inside_braces(string):
    m = re.search(r'(?<=\{).+(?=\})', string)
    return m.group(0) if m else string  # '' has no braces, so keep it as is
list(map(inside_braces, t))
Granted, this is not the most well-formatted or easiest to read of answers. This maps a function that finds and returns what is inside the braces to each element of the list, returning the whole list.
(?<=...) means "match only if this comes immediately before, but don't include it in the result"
(?=...) means "match only if this comes immediately after, but don't include it in the result"
.+ means "at least one character of any kind"
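The same extraction can be written with a capturing group instead of lookarounds. This is a minimal sketch in Python 3 that leaves items without braces unchanged; the non-greedy .*? is an assumption about the desired behaviour when an item contains more than one closing brace, and strip_to_braces is a hypothetical helper name.

```python
import re

BRACED = re.compile(r'\{(.*?)\}')  # non-greedy: stop at the first closing brace

def strip_to_braces(items):
    """Return the text inside the first {...} of each item, or the item unchanged."""
    result = []
    for item in items:
        match = BRACED.search(item)
        result.append(match.group(1) if match else item)
    return result

data = ['1:{test}', '2:{test}', '4:{1989}', '9:{test}', '']
print(strip_to_braces(data))  # ['test', 'test', '1989', 'test', '']
```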

Protect commas on consecutive string.join() and string.split()

Suppose the following code (notice the commas inside the strings):
>>> a = ['1',",2","3,"]
I need to concatenate the values into a single string. Naive example:
>>> b = ",".join(a)
>>> b
'1,,2,3,'
And later I need to split the resulting object again:
>>> b.split(',')
['1', '', '2', '3', '']
However, the result I am looking for is the original list:
['1', ',2', '3,']
What's the simplest way to protect the commas in this process? The best solution I came up with looks rather ugly.
Note: the comma is just an example. The strings can contain any character. And I can choose other characters as separators.
The strings can contain any character.
If, no matter which delimiter you choose, there is a chance that an item contains the delimiter character, then use the csv module:
import csv
class PseudoFile(object):
    # http://stackoverflow.com/a/8712426/190597
    def write(self, string):
        return string
writer = csv.writer(PseudoFile())
This concatenates the items in a using commas:
a = ['1',",2","3,"]
line = writer.writerow(a)
print(line)
# 1,",2","3,"
This recovers a from line:
print(next(csv.reader([line])))
# ['1', ',2', '3,']
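The same csv round trip can also be written with an in-memory buffer instead of a custom file-like class. A minimal sketch in Python 3; join_csv and split_csv are hypothetical helper names, and stripping the line terminator is a choice made here so the result is a single line.

```python
import csv
import io

def join_csv(items):
    """Serialize a list of strings into one CSV line, without the line terminator."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(items)
    return buffer.getvalue().rstrip("\r\n")

def split_csv(line):
    """Parse one CSV line back into a list of strings."""
    return next(csv.reader([line]))

a = ['1', ',2', '3,']
line = join_csv(a)
print(line)             # 1,",2","3,"
print(split_csv(line))  # ['1', ',2', '3,']
```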
Do you have to use commas to separate the items? If not, you could use another symbol that does not occur in the items of the list.
In [1]: '|'.join(['1', ',2', '3,']).split('|')
Out[1]: ['1', ',2', '3,']
Edit: The string may apparently contain any character. Is it an option to use the json module? You could just dump and load the list.
In [3]: json.dumps(['1', ',2', '3,'])
Out[3]: '["1", ",2", "3,"]'
In [4]: json.loads('["1", ",2", "3,"]')
Out[4]: [u'1', u',2', u'3,']
Edit #2: If you cannot use json, you could use str.encode('string-escape') (a Python 2 codec) to escape the characters in each string, then enclose the escaped version in single quotes and separate those with commas:
In [10]: print "'example'".encode('string-escape')
\'example\'
In [11]: print r"\'example\'".decode('string-escape')
'example'
Edit #3: Running example of str.encode('string-escape'):
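import re
def list_to_str(items):
    # Escape each item, wrap it in single quotes, and join with commas.
    return ','.join("'{}'".format(s.encode('string-escape')) for s in items)
def str_to_list(text):
    # Allow escaped characters (such as \') inside the quotes, then undo the escaping.
    quoted = re.findall(r"'((?:[^'\\]|\\.)*)'", text)
    return [s.decode('string-escape') for s in quoted]
if __name__ == '__main__':
    a = ['1', ',2', '3,']
    b = list_to_str(a)
    print 'It is {} that this works.'.format(str_to_list(b) == a)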
When you serialize a list to a string, you need to choose as the separator a character that doesn't appear in the list items. Can't you just replace the comma with another character?
b = ";".join(a)
b.split(';')
Does the delimiter need to be a single character? If not, you can use a delimiter made up of a sequence of characters that definitely won't appear in your strings, like |#| or something similar.
You need to escape the comma and also escape the escape character itself. Here's one way, using % as the escape character (%25 stands for a literal % and %2c for a comma):
>>> a = ['1',",2","3,"]
>>> b = ','.join(s.replace('%', '%25').replace(',', '%2c') for s in a)
>>> [s.replace('%2c', ',').replace('%25', '%') for s in b.split(',')]
['1', ',2', '3,']
>>> b
'1,%2c2,3%2c'
I would join and split using another character than ",", e.g. ";":
>>> b = ";".join(a)
>>> b.split(';')
['1', ',2', '3,']
