How do you split all of a certain character in Python [duplicate]

This question already has answers here:
In Python, how do I split a string and keep the separators?
(19 answers)
Closed 5 years ago.
string="i-want-all-dashes-split"
print(split(string,"-"))
So I want the output to be:
string=(I,-,want,-,all,-,dashes,-,split)
I basically want to partition all the "-"'s.

>>> import re
>>> string = "i-want-all-dashes-split"
>>> string.split('-') # without the dashes
['i', 'want', 'all', 'dashes', 'split']
>>> re.split('(-)', string) # with the dashes
['i', '-', 'want', '-', 'all', '-', 'dashes', '-', 'split']
>>> ','.join(re.split('(-)', string)) # as a string joined by commas
'i,-,want,-,all,-,dashes,-,split'
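
The capturing-group trick above can be wrapped for any delimiter. The following is only a sketch: the helper name split_keep_delimiter is made up for illustration, and it assumes a non-empty delimiter. re.escape is used so that delimiters such as '.' or '*' are matched literally.

```python
import re

def split_keep_delimiter(text, delimiter):
    """Split text on delimiter and keep each delimiter as its own item.

    Hypothetical helper for illustration; assumes delimiter is non-empty.
    """
    # re.escape makes characters such as '.' or '*' match literally;
    # the capturing group tells re.split to keep the matched delimiters.
    pattern = "(" + re.escape(delimiter) + ")"
    return re.split(pattern, text)

parts = split_keep_delimiter("i-want-all-dashes-split", "-")
print(parts)
# ['i', '-', 'want', '-', 'all', '-', 'dashes', '-', 'split']

dotted = split_keep_delimiter("a.b.c", ".")
print(dotted)
# ['a', '.', 'b', '.', 'c']
```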

string="i-want-all-dashes-split"
print('string='+str(string.split('-')).replace('[','(').replace(']',')').replace(' ','-,'))
>>>string=('i',-,'want',-,'all',-,'dashes',-,'split')

Use the split method of the str class:
text = "i-want-all-dashes-split"
splitted = text.split('-')
The value of splitted will be a list like the one below:
['i', 'want', 'all', 'dashes', 'split']
If you want the output as a tuple, convert it as in the code below:
t = tuple(splitted)
('i', 'want', 'all', 'dashes', 'split')

string="i-want-all-dashes-split"
print(string.split('-'))
# Output:
['i', 'want', 'all', 'dashes', 'split']
string.split()
Inside the () you can put your delimiter ('-'); if you don't pass anything, the string is split on runs of whitespace by default.
You can make a function:
def spliter(string, delimiter=','): # delimiter has a default argument (',')
    string = string.split(delimiter)
    result = []
    for x, y in enumerate(string):
        result.append(y)
        if x != len(string)-1: result.append(delimiter)
    return result
Output:
['i', '-', 'want', '-', 'all', '-', 'dashes', '-', 'split']
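
For reference, here is a small sketch of how str.split() behaves with and without an explicit delimiter. The sample text is made up for illustration.

```python
text = "i want  all-dashes,split"

# No argument: split on runs of whitespace and drop empty strings.
by_whitespace = text.split()
print(by_whitespace)
# ['i', 'want', 'all-dashes,split']

# Explicit delimiter: split on every occurrence of exactly that string.
by_comma = text.split(",")
print(by_comma)
# ['i want  all-dashes', 'split']

# An explicit single space is not the same as the default:
# the two consecutive spaces produce an empty string.
by_space = text.split(" ")
print(by_space)
# ['i', 'want', '', 'all-dashes,split']
```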

You can use this function too:
Code:
def split_keep(s, delim):
    s = s.split(delim)
    result = []
    for i, n in enumerate(s):
        result.append(n)
        if i == len(s)-1: pass
        else: result.append(delim)
    return result
Usage:
split_keep("i-want-all-dashes-split", "-")
Output:
['i', '-', 'want', '-', 'all', '-', 'dashes', '-', 'split']

Related

How do I make a list of strings without invalid characters in any of the strings? [duplicate]

This question already has answers here:
Best way to strip punctuation from a string
(32 answers)
Closed 4 months ago.
For example, if I had this list of invalid characters:
invalid_char_list = [',', '.', '!']
And this list of strings:
string_list = ['Hello,', 'world.', 'I', 'am', 'a', 'programmer!!']
I would want to get this new list:
new_string_list = ['Hello', 'world', 'I', 'am', 'a', 'programmer']
without , or . or ! in any of the strings in the list, because those are the characters in my list of invalid characters.

You can use a regex: build the character class [,.!] from the list and replace every match with ''.
import re
re_invalid = re.compile(f"([{''.join(invalid_char_list)}])")
# re_invalid <-> re.compile(r'([,.!])', re.UNICODE)
new_string_list = [re_invalid.sub(r'', s) for s in string_list]
print(new_string_list)
Output:
['Hello', 'world', 'I', 'am', 'a', 'programmer']
[.,!] : matches any one of the characters in the set (',', '.', '!')
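
Joining the characters directly into a character class works for ',', '.' and '!', but characters such as '-', ']' or '^' have a special meaning inside [...]. A hedged variant of the same idea, escaping each character with re.escape first, could look like this. The extra invalid characters and the sample strings are assumptions added for illustration.

```python
import re

# '-', ']' and '^' are added here only to show why escaping matters.
invalid_char_list = [',', '.', '!', '-', ']', '^']
string_list = ['Hello,', 'wor-ld.', '[I]', '^am^', 'a', 'programmer!!']

# re.escape keeps '-', ']' and '^' from being read as character-class syntax.
pattern = re.compile("[" + "".join(re.escape(c) for c in invalid_char_list) + "]")

cleaned = [pattern.sub("", s) for s in string_list]
print(cleaned)
# ['Hello', 'world', '[I', 'am', 'a', 'programmer']
```

Note that '[' is kept in '[I' because it is not in the list of invalid characters.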

You can try looping through the string_list and replacing each invalid char with an empty string.
invalid_char_list = [',', '.', '!']
string_list = ['Hello,', 'world.', 'I', 'am', 'a', 'programmer!!']
for invalid_char in invalid_char_list:
    string_list=[x.replace(invalid_char,'') for x in string_list]
print(string_list)
The Output:
['Hello', 'world', 'I', 'am', 'a', 'programmer']

We can loop over each string in string_list and each invalid character, and use str.replace to replace any invalid character with '' (nothing).
invalid_char_list = [',', '.', '!']
string_list = ['Hello,', 'world.', 'I', 'am', 'a', 'programmer!!']
formatted_string_list = []
for string in string_list:
    for invalid in invalid_char_list:
        string = string.replace(invalid, '')
    formatted_string_list.append(string)

You can use strip() if the invalid characters only appear at the start or end of each string:
string_list = ['Hello,', ',world.?', 'I', 'am?', '!a,', 'programmer!!?']
new_string_list = [c.strip(',.!?') for c in string_list]
print(new_string_list)
#['Hello', 'world', 'I', 'am', 'a', 'programmer']
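
strip() only removes characters from the two ends of a string. If invalid characters can also appear in the middle, one option is str.translate with a deletion table. The sketch below compares the two; the sample data (including the inner '.' in 'wor.ld.') is made up for illustration.

```python
invalid_chars = ",.!"
string_list = ['Hello,', 'wor.ld.', 'I', 'am', 'a', 'programmer!!']

# strip() leaves the inner '.' of 'wor.ld.' in place.
stripped = [s.strip(invalid_chars) for s in string_list]
print(stripped)
# ['Hello', 'wor.ld', 'I', 'am', 'a', 'programmer']

# The third argument of str.maketrans lists the characters to delete.
table = str.maketrans("", "", invalid_chars)
translated = [s.translate(table) for s in string_list]
print(translated)
# ['Hello', 'world', 'I', 'am', 'a', 'programmer']
```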

remove all the special chars from a list [duplicate]

This question already has answers here:
Removing punctuation from a list in python
(2 answers)
Closed last year.
I have a list of strings in which some items are special characters. What would be the approach to exclude them from the resulting list?
list = ['ben','kenny',',','=','Sean',100,'tag242']
expected output = ['ben','kenny','Sean',100,'tag242']
Please guide me on how to achieve this. Thanks

The string module has a list of punctuation marks that you can use and exclude from your list of words:
import string
punctuations = list(string.punctuation)
input_list = ['ben','kenny',',','=','Sean',100,'tag242']
output = [x for x in input_list if x not in punctuations]
print(output)
Output:
['ben', 'kenny', 'Sean', 100, 'tag242']
This list of punctuation marks includes the following characters:
['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '_', '`', '{', '|', '}', '~']

This can be done with the isalnum() string method. isalnum() returns True if the string is non-empty and contains only letters or digits; if it contains any other character, it returns False. (No imports are needed; isalnum() is a built-in str method.)
code:
items = ['ben','kenny',',','=','Sean',100,'tag242']
olist = []
for a in items:
    if str(a).isalnum():
        olist.append(a)
print(olist)
output:
['ben', 'kenny', 'Sean', 100, 'tag242']

my_list = ['ben', 'kenny', ',' ,'=' ,'Sean', 100, 'tag242']
stop_words = [',', '=']
filtered_output = [i for i in my_list if i not in stop_words]
The list with stop words can be expanded if you need to remove other characters.
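
The approaches above differ on edge cases: a membership test against single punctuation characters keeps an item like '==', while isalnum() would also drop an item like 'tag_242'. A possible middle ground, shown here only as a sketch, is to drop an item when it is a string made up entirely of punctuation. The helper name is_only_punctuation and the extra sample items are assumptions for illustration.

```python
import string

def is_only_punctuation(item):
    """True for non-empty strings that consist solely of punctuation characters."""
    return (
        isinstance(item, str)
        and item != ""
        and all(ch in string.punctuation for ch in item)
    )

items = ['ben', 'kenny', ',', '=', '==', 'Sean', 100, 'tag_242', '']
filtered = [x for x in items if not is_only_punctuation(x)]
print(filtered)
# ['ben', 'kenny', 'Sean', 100, 'tag_242', '']
```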

Python, Split the input string on elements of other list and remove digits from it

I have had some trouble with this problem, and I need your help.
I have to write a Python function, mySplit(x), which takes an input list (containing a single string) and splits that string on the elements of another list and on digits.
I use Python 3.6
So here is an example:
l=['I am learning']
l1=['____-----This4ex5ample---aint___ea5sy;782']
banned=['-', '+' , ',', '#', '.', '!', '?', ':', '_', ' ', ';']
The returned lists should be like this:
mySplit(l)=['I', 'am', 'learning']
mySplit(l1)=['This', 'ex', 'ample', 'aint', 'ea', 'sy']
I have tried the following, but I always get stuck:
def mySplit(x):
    l=['-', '+' , ',', '#', '.', '!', '?', ':', '_', ';'] #Banned chars
    l2=[i for i in x if i not in l] #Removing chars from input list
    l2=",".join(l2)
    l3=[i for i in l2 if not i.isdigit()] #Removes all the digits
    l4=[i for i in l3 if i is not ',']
    l5=[",".join(l4)]
    l6=l5[0].split(' ')
    return l6
and
mySplit(l1)
mySplit(l)
returns:
['T,h,i,s,e,x,a,m,p,l,e,a,i,n,t,e,a,s,y']
['I,', ',a,m,', ',l,e,a,r,n,i,n,g']

Use re.split() for this task:
import re
w_list = [i for i in re.split(r'[^a-zA-Z]',
                              '____-----This4ex5ample---aint___ea5sy;782') if i]
print(w_list)
# ['This', 'ex', 'ample', 'aint', 'ea', 'sy']

I would import the punctuation marks from string and proceed with regular expressions as follows.
l=['I am learning']
l1=['____-----This4ex5ample---aint___ea5sy;782']
import re
from string import punctuation
punctuation # to see the punctuation marks.
>>> '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
' '.join([re.sub('[!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\\d]',' ', w) for w in l]).split()
Here is the output:
>>> ['I', 'am', 'learning']
Notice the \d (written as \\d in a non-raw string) appended to the punctuation marks; it also replaces any digits.
Similarly,
' '.join([re.sub('[!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\\d]',' ', w) for w in l1]).split()
Yields
>>> ['This', 'ex', 'ample', 'aint', 'ea', 'sy']
You can also modify your function as follows:
def mySplit(x):
    banned = ['-', '+' , ',', '#', '.', '!', '?', ':', '_', ';'] + list('0123456789') #Banned chars
    return ''.join([word if not word in banned else ' ' for word in list(x[0])]).split()
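
The banned list from the question can also be turned into a regular expression directly, instead of typing the punctuation out by hand. The following is a minimal sketch under that assumption; the name my_split and the extra banned parameter are illustrative, not part of the original function.

```python
import re

def my_split(x, banned):
    """Split the single string in x on runs of banned characters and digits."""
    # Character class made of the escaped banned characters plus \d for digits;
    # the trailing + treats a run of separators as one split point.
    pattern = "[" + "".join(re.escape(c) for c in banned) + r"\d]+"
    # Drop the empty strings produced by separators at the start or end.
    return [piece for piece in re.split(pattern, x[0]) if piece]

banned = ['-', '+', ',', '#', '.', '!', '?', ':', '_', ' ', ';']
first = my_split(['I am learning'], banned)
second = my_split(['____-----This4ex5ample---aint___ea5sy;782'], banned)
print(first)
# ['I', 'am', 'learning']
print(second)
# ['This', 'ex', 'ample', 'aint', 'ea', 'sy']
```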

how to turn string of words into list

I have turned a list of words into a string.
Now I want to turn it back into a list, but I don't know how. Please help.
temp = ['hello', 'how', 'is', 'your', 'day']
temp_string = str(temp)
temp_string will then be "['hello', 'how', 'is', 'your', 'day']"
I want to turn this back into a list now, but when I do list(temp_string), this happens:
['[', "'", 'h', 'e', 'l', 'l', 'o', "'", ',', ' ', "'", 'h', 'o', 'w', "'", ',', ' ', "'", 'i', 's', "'", ',', ' ', "'", 'y', 'o', 'u', 'r', "'", ',', ' ', "'", 'd', 'a', 'y', "'", ']']
Please help

You can do this easily by evaluating the string. That's not something I'd normally suggest but, assuming you control the input, it's quite safe:
>>> temp = ['hello', 'how', 'is', 'your', 'day'] ; type(temp) ; temp
<class 'list'>
['hello', 'how', 'is', 'your', 'day']
>>> tempstr = str(temp) ; type(tempstr) ; tempstr
<class 'str'>
"['hello', 'how', 'is', 'your', 'day']"
>>> temp2 = eval(tempstr) ; type(temp2) ; temp2
<class 'list'>
['hello', 'how', 'is', 'your', 'day']
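
If the string might not be fully under your control, the standard library's ast.literal_eval is a commonly suggested alternative to eval: it only accepts Python literals such as lists, strings and numbers. A short sketch, reusing the data from the question:

```python
import ast

temp = ['hello', 'how', 'is', 'your', 'day']
temp_string = str(temp)

# literal_eval parses Python literals only (lists, strings, numbers, ...).
restored = ast.literal_eval(temp_string)
print(restored == temp)  # True

# Anything that is not a plain literal, such as a function call, is rejected.
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
print(rejected)  # True
```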

Duplicate question? Converting a String to a List of Words?
Working code below (Python 3)
import re
sentence_list = ['hello', 'how', 'are', 'you']
sentence = ""
for word in sentence_list:
    sentence += word + " "
print(sentence)
#output: "hello how are you "
word_list = re.sub(r"[^\w]", " ", sentence).split()
print(word_list)
#output: ['hello', 'how', 'are', 'you']

You can strip the brackets, split on the comma separators, and strip the quotes from each piece:
temp = ['hello', 'how', 'is', 'your', 'day']
temp_string = str(temp)
temp_new = [word.strip("'") for word in temp_string.strip('[]').split(', ')]
strip('[]') removes the surrounding brackets, split(', ') breaks the remaining text into one quoted piece per word, and strip("'") removes the quotes from each piece. This works only when the words themselves contain no commas or quotes.

Separating strings (list elements) with many splitters without losing the splitters from the list

I want to split list elements that contain any value from
list_operators = ['+', '-', '*', '(', ')']
without losing operator from list and without using regex.
For instance:
my_list = ['a', '=', 'x+y*z', '//', 'moo']
Wanted output :
['a', '=', 'x', '+', 'y', '*', 'z', '//', 'moo']
and x, y and z can be whole words, not just single characters:
['john+doe/12*5']
['john','+','doe','/','12','*','5']

You can use itertools.groupby() to achieve this:
from itertools import groupby
operators = {'+', '-', '*', '(', ')'}
fragments = ['a', '=', 'x+y*z', '//', 'moo', '-', 'spam*(eggs-ham)']
separated = []
for fragment in fragments:
    for is_operator, group in groupby(fragment, lambda c: c in operators):
        if is_operator:
            separated.extend(group)
        else:
            separated.append(''.join(group))
>>> separated
['a', '=', 'x', '+', 'y', '*', 'z', '//', 'moo', '-',
'spam', '*', '(', 'eggs', '-', 'ham', ')']
Note that I've changed the names of your variables to be a little more meaningful, and made operators a set because we only care about membership, not order (although the code would work just as well, if a little more slowly, with a list).
groupby() returns an iterable of (key, group) pairs, starting a new group whenever key changes. Since I've chosen a key function (lambda c: c in operators) that just tests for a character's membership in operators, the result of the groupby() call looks something like this:
[
(False, ['s', 'p', 'a', 'm']),
(True, ['*', '(']),
(False, ['e', 'g', 'g', 's']),
(True, ['-']),
(False, ['h', 'a', 'm']),
(True, [')'])
]
(groupby() actually returns a groupby object made up of (key,grouper object) tuples - I've converted those objects to lists in the example above for clarity).
The rest of the code is straightforward: if is_operator is True, the characters in group are used to extend separated; if it's False, the characters in group are joined back into a string and appended to separated.
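
The same grouping logic can be written without itertools as a plain loop that accumulates non-operator characters. This is only a sketch; the function name split_on_operators is made up for illustration, and the last fragment is taken from the question to show that '/' stays attached because it is not in the operator set.

```python
def split_on_operators(fragment, operators):
    """Split fragment into operand strings and single operator characters."""
    pieces = []
    current = ""  # operand characters collected so far
    for ch in fragment:
        if ch in operators:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(ch)
        else:
            current += ch
    if current:
        pieces.append(current)
    return pieces

operators = {'+', '-', '*', '(', ')'}
fragments = ['a', '=', 'x+y*z', '//', 'moo', 'john+doe/12*5']
separated = [p for f in fragments for p in split_on_operators(f, operators)]
print(separated)
# ['a', '=', 'x', '+', 'y', '*', 'z', '//', 'moo', 'john', '+', 'doe/12', '*', '5']
```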

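This is an easy way of doing it when every operand is a single character (operands such as 'john' would be split into individual letters):
slist = []
for x in my_list:
    if len(set(list_operators) & set(list(x)))!=0:
        for i in list(x):
            slist.append(i)
    else:
        slist.append(x)
slist
['a', '=', 'x', '+', 'y', '*', 'z', '//', 'moo']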

You can also do something like this:
import re
from itertools import chain
list_operators = ['+', '-', '*', '(', ')']
tokenizer = re.compile(r"[{}]|\w+".format("".join(map(re.escape, list_operators))))
my_list = ['a', '=', 'x+y*z', '//', 'moo', 'john+doe/12*5']
parsed = list(chain.from_iterable(map(tokenizer.findall, my_list)))
parsed result:
['a', 'x', '+', 'y', '*', 'z', 'moo', 'john', '+', 'doe', '12', '*', '5']
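
The findall pattern above only returns operator characters and \w+ runs, so tokens such as '=' and '//' do not appear in its result. If those should be kept, one hedged alternative is re.split with a capturing group, filtering out the empty strings it produces. The names splitter and split_keep_all are illustrative assumptions.

```python
import re

list_operators = ['+', '-', '*', '(', ')']
# The capturing group makes re.split keep each operator it splits on.
splitter = re.compile("([" + "".join(map(re.escape, list_operators)) + "])")

def split_keep_all(fragment):
    """Split on operators, keep them, and drop empty strings between adjacent operators."""
    return [piece for piece in splitter.split(fragment) if piece]

my_list = ['a', '=', 'x+y*z', '//', 'moo', 'spam*(eggs-ham)']
parsed = [piece for fragment in my_list for piece in split_keep_all(fragment)]
print(parsed)
# ['a', '=', 'x', '+', 'y', '*', 'z', '//', 'moo', 'spam', '*', '(', 'eggs', '-', 'ham', ')']
```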
