I have a string structured like this:
"I\thave\ta\t\tstring"
To split it by tabs I used this:
text = [splits for splits in row.split("\t") if splits != ""]
Now this method removes all tabs from the string but I want it to remove only the first occurrence of a tab after a word so it would end up like this:
"Ihavea\tstring"
Is there a way of doing this?
Using re.split with a negative lookbehind assertion should do it:
import re
row = "I\thave\ta\t\tstring"
s = ''.join(re.split(r'(?<!\t)\t', row))
print(s)
# 'Ihavea\tstring'
The assertion (?<!\t) prevents a split on a \t which was preceded by another \t.
You can use re.sub if you do not actually need the items from the split:
s = re.sub(r'(?<!\t)\t', '', row)
print(s)
# 'Ihavea\tstring'
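As a small sketch of how the lookbehind behaves on longer runs of tabs (the helper name drop_first_tab and the second sample string are made up for illustration): only the tab that directly follows a non-tab character is removed, so a run of three tabs becomes a run of two.

```python
import re

def drop_first_tab(row):
    # Remove every tab that is NOT preceded by another tab,
    # i.e. the first tab of each run.
    return re.sub(r'(?<!\t)\t', '', row)

first = drop_first_tab("I\thave\ta\t\tstring")
second = drop_first_tab("a\t\t\tb")

print(repr(first))   # 'Ihavea\tstring'
print(repr(second))  # 'a\t\tb'
```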
A list comprehension is also an option if you want to avoid importing the re module:
row = "I\thave\ta\t\tstring"
text = [splits if splits else "\t" for splits in row.split("\t")]
"".join(text)
#'Ihavea\tstring'
This works because an empty string is false in a boolean context, and str.split produces one empty list element for every additional consecutive split character ("\t" in this case).
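To see where the empty elements come from, it may help to look at the intermediate list. This is only a sketch using the sample string from the question; the variable names parts and rebuilt are illustrative.

```python
row = "I\thave\ta\t\tstring"

# Two adjacent tabs yield one empty string between them.
parts = row.split("\t")
print(parts)  # ['I', 'have', 'a', '', 'string']

# Each empty string stands for one extra tab, so put a tab back in its place.
rebuilt = "".join(p if p else "\t" for p in parts)
print(repr(rebuilt))  # 'Ihavea\tstring'
```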
To keep it simple you can use re.split
from re import split
text = "I\thave\ta\t\tstring"
split_string = split(r'\t+', text) #Gives ['I', 'have', 'a', 'string']
The regular expression r'\t+' basically just groups all consecutive tabs together.
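One caveat, shown as a rough sketch: since a whole run of tabs is treated as a single separator, the number of tabs in each run is lost, and a leading tab still produces an empty first element. The second input string is made up for illustration.

```python
import re

# Runs of tabs collapse into a single separator.
words = re.split(r'\t+', "I\thave\ta\t\tstring")
print(words)  # ['I', 'have', 'a', 'string']

# A leading tab still yields an empty string at the start.
with_leading = re.split(r'\t+', "\tI\thave")
print(with_leading)  # ['', 'I', 'have']
```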
I am trying to split a string based on a particular pattern in an effort to rejoin it later after adding a few characters.
Here's a sample of my string: "123\babc\b:123" which I need to convert to "123\babc\\"b\":123". I need to do it several times in a long string. I have tried variations of the following:
regex = r"(\\b[a-zA-Z]+)\\b:"
test_str = "123\\babc\\b:123"
x = re.split(regex, test_str)
but it doesn't split at the right positions for me to join. Is there another way of doing this/another way of splitting and joining?
You're right, you can do it with re.split. Split by \b and then rebuild your output with a specific separator (keeping the \b where you want to).
Here is an example:
# Import module
import re
string = "123\\babc\\b:123"
# Split by "\b"
list_sliced = re.split(r'\\b', string)
print(list_sliced)
# ['123', 'abc', ':123']
# Define your custom separator
custom_sep = '\\\\"b\\"'
# Build your new output
output = list_sliced[0]
# Iterate over each word
for i, word in enumerate(list_sliced[1:]):
    # Choose the separator according to parity (since we don't want to change the first "\b")
    sep = "\\\\b"
    if i % 2 == 1:
        sep = custom_sep
    # Update output
    output += sep + word
print(output)
# 123\\babc\\"b\":123
Maybe, the following expression,
^([\\]*)([^\\]+)([\\]*)([^\\]+)([\\]*)([^:]+):(.*)$
and a replacement of,
\1\2\3\4\5\\"\6\\":\7
with a re.sub might return our desired output.
The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.
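A minimal sketch of that substitution in Python, assuming the input contains single backslashes as in the question's test_str. The variable names pattern, replacement and result are illustrative.

```python
import re

# Pattern and replacement copied from the answer above, written as raw strings.
pattern = r'^([\\]*)([^\\]+)([\\]*)([^\\]+)([\\]*)([^:]+):(.*)$'
replacement = r'\1\2\3\4\5\\"\6\\":\7'

test_str = "123\\babc\\b:123"   # the characters 123\babc\b:123

result = re.sub(pattern, replacement, test_str)
print(result)  # 123\babc\\"b\":123
```

Note that the anchored pattern handles one such occurrence per string; for several occurrences in a long string it would need to be adapted.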
I have a string like this:
['过\r\n啤酒\r\n小心\r\n照顾\r\n锻炼\r\n过去\r\n忘记\r\n哭\r\n包\r\n个子\r\n瘦\r\n选择\r\n奶奶\r\n突然\r\n节目\r\n']
How do I remove all of the "\r\n", and then turn the string into a list like so:
[过, 啤酒, 小心, 照顾, 过去, etc...]
Called without arguments, str.split splits on runs of whitespace, which includes \r and \n:
A = ['过\r\n啤酒\r\n小心\r\n照顾\r\n锻炼\r\n过去\r\n忘记\r\n哭\r\n包\r\n个子\r\n瘦\r\n选择\r\n奶奶\r\n突然\r\n节目\r\n']
res = A[0].split()
print(res)
['过', '啤酒', '小心', '照顾', '锻炼', '过去', '忘记', '哭', '包', '个子', '瘦', '选择', '奶奶', '突然', '节目']
As described in the str.split docs:
If sep is not specified or is None, a different splitting
algorithm is applied: runs of consecutive whitespace are regarded as a
single separator, and the result will contain no empty strings at the
start or end if the string has leading or trailing whitespace.
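The difference between the two splitting algorithms can be seen in a short sketch. The sample string is a shortened version of the one in the question.

```python
text = '过\r\n啤酒\r\n小心\r\n'

# No separator: whitespace runs are one separator, no empty strings at the ends.
no_sep = text.split()
print(no_sep)  # ['过', '啤酒', '小心']

# Explicit separator: the trailing '\r\n' delimits an empty final element.
explicit = text.split('\r\n')
print(explicit)  # ['过', '啤酒', '小心', '']
```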
To split only at line boundaries (such as \r\n) rather than at all whitespace, you can use .splitlines():
>>> li=['过\r\n啤酒\r\n小心\r\n照顾\r\n锻炼\r\n过去\r\n忘记\r\n哭\r\n包\r\n个子\r\n瘦\r\n选择\r\n奶奶\r\n突然\r\n节目\r\n']
>>> li[0].splitlines()
['过', '啤酒', '小心', '照顾', '锻炼', '过去', '忘记', '哭', '包', '个子', '瘦', '选择', '奶奶', '突然', '节目']
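The two approaches only differ when a line itself contains other whitespace. A quick sketch with a made-up string that has a space inside the first line:

```python
text = "a b\r\nc\nd\r\n"

# splitlines() breaks only at line boundaries and keeps the inner space.
lines = text.splitlines()
print(lines)  # ['a b', 'c', 'd']

# split() with no arguments also breaks on the space.
words = text.split()
print(words)  # ['a', 'b', 'c', 'd']
```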
Try this:
s = "['过\r\n啤酒\r\n小心\r\n照顾\r\n锻炼\r\n过去\r\n忘记\r\n哭\r\n包\r\n个子\r\n瘦\r\n选择\r\n奶奶\r\n突然\r\n节目\r\n']"
s = s.replace('\r\n', ',').replace("'", '')
print(s)
Output:
[过,啤酒,小心,照顾,锻炼,过去,忘记,哭,包,个子,瘦,选择,奶奶,突然,节目,]
The first replace turns each \r\n into a comma, and the second removes the single quotes, which gives the output you expected. Note that the result is still a string, not a list.
So I have a string which I need to parse. The string contains a number of words, separated by a hyphen (-). The string also ends with a hyphen.
For example one-two-three-.
Now, if I want to look at the words on their own, I split up the string to a list.
wordstring = "one-two-three-"
wordlist = wordstring.split('-')
for i in range(0, len(wordlist)):
    print(wordlist[i])
Output
one
two
three
#empty element
What I don't understand is, why in the resulting list, the final element is an empty string.
How can I omit this empty element?
Should I simply truncate the list or is there a better way to split the string?
You have an empty string because the split on the last - character produces an empty string on the RHS. You can strip all '-' characters from the string before splitting:
wordlist = wordstring.strip('-').split('-')
If the final character is always a -, you can omit it by using [:-1], which takes all the characters of the string except the last one.
Then, proceed to split it as you did:
wordlist = wordstring[:-1].split('-')
print(wordlist)
['one', 'two', 'three']
You can use a regex to do this:
import re
wordlist = re.findall("[a-zA-Z]+(?=-)", wordstring)
Output :
['one', 'two', 'three']
You should use Python's built-in str.strip method before splitting your string. E.g.:
wordstring = "one-two-three-"
wordlist = wordstring.strip('-').split('-')
I believe .split() assumes there is another element after the last -, and that element is an empty string.
Are you open to removing the trailing dash from wordstring before splitting it?
wordstring = "one-two-three-"
wordlist = wordstring[:-1].split('-')
print(wordlist)
OUT: ['one', 'two', 'three']
This is explained in the docs:
...
If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']).
...
If you know your strings will always end in '-', then just remove the last one by doing wordlist.pop().
If you need something more complicated you may want to learn about regular expressions.
Just for the variety of options:
wordlist = [x for x in wordstring.split('-') if x]
Note that the above also handles cases such as: wordstring = "one-two--three-" (double hyphen)
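A short sketch comparing the approaches from this thread on the double-hyphen case. The variable names stripped, sliced and filtered are illustrative.

```python
wordstring = "one-two--three-"

# strip() and slicing only deal with the trailing hyphen,
# so the inner double hyphen still yields an empty string.
stripped = wordstring.strip('-').split('-')
sliced = wordstring[:-1].split('-')
print(stripped)  # ['one', 'two', '', 'three']
print(sliced)    # ['one', 'two', '', 'three']

# Filtering out falsy (empty) items removes every empty string.
filtered = [x for x in wordstring.split('-') if x]
print(filtered)  # ['one', 'two', 'three']
```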
First strip() then split()
wordstring = "one-two-three-"
x = wordstring.strip('-')
y = x.split('-')
for word in y:
    print(word)
Strip/trim the string before splitting. With strip('-') you remove the trailing "-", and you should be fine.
I am trying to delete a certain portion of a string if a match is found in it, as below:
string = 'Newyork, NY'
I want to delete the comma and all the characters after it, if a comma is present in the string.
Can anyone let me know how to do this?
Use .split():
string = string.split(',', 1)[0]
We split the string on the comma only once, to save Python the work of splitting on any further commas.
Alternatively, you can use .partition():
string = string.partition(',')[0]
Demo:
>>> 'Newyork, NY'.split(',', 1)[0]
'Newyork'
>>> 'Newyork, NY'.partition(',')[0]
'Newyork'
.partition() is the faster method:
>>> import timeit
>>> timeit.timeit("'one, two'.split(',', 1)[0]")
0.52929401397705078
>>> timeit.timeit("'one, two'.partition(',')[0]")
0.26499605178833008
You can split the string with the delimiter ",":
string.split(",")[0]
Example:
'Newyork, NY'.split(",") # ['Newyork', ' NY']
'Newyork, NY'.split(",")[0] # 'Newyork'
Try this :
s = "this, is"
m = s.index(',')
l = s[:m]
A few options:
string[:string.index(",")]
This will raise a ValueError if , cannot be found in the string. Here, we find the position of the character with .index then use slicing.
string.split(",")[0]
The split method gives you a list of the substrings that were separated by a comma, and you just take the first element of the list. This works even if no comma is present in the string (there is nothing to split in that case, so string.split(...) == [string]).
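How the options behave when the comma is missing can be checked with a small sketch. The helper name before_comma is hypothetical; it uses str.partition, which, like split, returns the whole string when the separator is absent.

```python
def before_comma(text):
    # partition returns (head, separator, tail); head is the whole
    # string when no comma is found.
    head, _sep, _tail = text.partition(',')
    return head

with_comma = before_comma('Newyork, NY')
without_comma = before_comma('Newyork')
print(with_comma)     # Newyork
print(without_comma)  # Newyork

# The index-based slice raises ValueError instead.
try:
    'Newyork'[:'Newyork'.index(',')]
    index_failed = False
except ValueError:
    index_failed = True
print(index_failed)   # True
```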
I have the following line :
CommonSettingsMandatory = #<Import Project="[\\.]*Shared(\\vc10\\|\\)CommonSettings\.targets," />#,true
and I want the following output:
['commonsettingsmandatory', '<Import Project="[\\\\.]*Shared(\\\\vc10\\\\|\\\\)CommonSettings\\.targets," />', 'true']
If I split on the comma with a simple regex, it also splits on any comma inside the value; for example, the comma I wrote after targets causes a split there.
So I want to ignore the text between the # signs to make sure no splitting happens there.
I really don't know how to do this!
http://docs.python.org/library/re.html#re.split
import re
string = 'CommonSettingsMandatory = #toto,tata#, true'
splitlist = re.split(r'\s?=\s?#(.*?)#,\s?', string)
Then splitlist contains ['CommonSettingsMandatory', 'toto,tata', 'true'].
While you might be able to use split with a lookbehind, I would use the groups captured by this expression.
(\S+)\s*=\s*#([^#]+)#,\s*(.*)
m = re.search(expression, myString). Use m.group(1) for the first string, m.group(2) for the second, etc.
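A minimal sketch of the capture-group approach applied to the line from the question, assuming the value is delimited by single # characters and never contains a # itself. The names line, m and fields are illustrative, and the first field is lowercased to match the desired output.

```python
import re

line = r'CommonSettingsMandatory = #<Import Project="[\\.]*Shared(\\vc10\\|\\)CommonSettings\.targets," />#,true'

# Group 1: the key, group 2: everything between the # signs, group 3: the rest.
m = re.match(r'(\S+)\s*=\s*#([^#]+)#,\s*(.*)', line)

fields = [m.group(1).lower(), m.group(2), m.group(3)]
print(fields[0])  # commonsettingsmandatory
print(fields[1])
print(fields[2])  # true
```

The comma inside the # delimiters stays in the second field because [^#]+ consumes everything up to the closing #.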
If I understand you correctly, you're trying to split the string using spaces as delimiters, but you want to also remove any text between pound signs?
If that's correct, why not simply remove the pound sign-delimited text before splitting the string?
import re
myString = re.sub(r'#.*?#', '', myString)
myArray = myString.split(' ')
EDIT: (based on revised question)
import re
myArray = re.findall(r'^(.*?) = #(.*?)#,(.*?)$', myString)
That will actually return a list of tuples containing your matches (the key keeps its original case), in the form of:
[
    (
        'CommonSettingsMandatory',
        '<Import Project="[\\\\.]*Shared(\\\\vc10\\\\|\\\\)CommonSettings\\.targets," />',
        'true'
    )
]
(spacing added to illustrate the format better)