How do you split all of a certain character in Python [duplicate] - python

This question already has answers here:
In Python, how do I split a string and keep the separators?
(19 answers)
Closed 5 years ago.
string="i-want-all-dashes-split"
print(split(string,"-"))
So I want the output to be:
string=(I,-,want,-,all,-,dashes,-,split)
I basically want to partition all the "-"'s.

>>> import re
>>> string = "i-want-all-dashes-split"
>>> string.split('-') # without the dashes
['i', 'want', 'all', 'dashes', 'split']
>>> re.split('(-)', string) # with the dashes
['i', '-', 'want', '-', 'all', '-', 'dashes', '-', 'split']
>>> ','.join(re.split('(-)', string)) # as a string joined by commas
'i,-,want,-,all,-,dashes,-,split'
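The capturing group in '(-)' is what makes re.split keep the separators. A minimal sketch that generalizes this to any delimiter string; the helper name split_keep_delimiter is made up for illustration and is not part of the answer above:

```python
import re

def split_keep_delimiter(text, delimiter):
    # Hypothetical helper: re.escape makes characters such as "." or "+"
    # match literally, and the parentheses form a capturing group, so
    # re.split returns the separators along with the pieces.
    pattern = "(" + re.escape(delimiter) + ")"
    return re.split(pattern, text)

print(split_keep_delimiter("i-want-all-dashes-split", "-"))
print(split_keep_delimiter("a.b.c", "."))
```

Without re.escape, a delimiter like "." would be treated as "any character" by the regex engine.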

string="i-want-all-dashes-split"
print('string=' + str(string.split('-')).replace('[', '(').replace(']', ')').replace(' ', '-,'))
>>>string=('i',-,'want',-,'all',-,'dashes',-,'split')

Use the split function from str class:
text = "i-want-all-dashes-split"
splitted = text.split('-')
The value of splitted will be a list like the one below:
['i', 'want', 'all', 'dashes', 'split']
If you want the output as a tuple, do it as in the code below:
t = tuple(splitted)
('i', 'want', 'all', 'dashes', 'split')

string="i-want-all-dashes-split"
print(string.split('-'))
# Output:
['i', 'want', 'all', 'dashes', 'split']
string.split()
Inside the () you can put your delimiter ('-'); if you don't put anything, the string is split on runs of whitespace by default.
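A quick check of what split() does with and without an argument may help here. This is only a sketch; the sample strings and variable names are made up for illustration:

```python
text = "i want-all dashes"

# No argument: split on runs of whitespace.
default_parts = text.split()
# Explicit delimiter: split on every "-".
dash_parts = text.split("-")
# A comma is not a default delimiter, so this string stays in one piece.
comma_parts = "a,b,c".split()

print(default_parts)
print(dash_parts)
print(comma_parts)
```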
You can make a function:
def spliter(string, delimiter=','):  # delimiter has a default value (',')
    string = string.split(delimiter)
    result = []
    for x, y in enumerate(string):
        result.append(y)
        if x != len(string) - 1:
            result.append(delimiter)
    return result
Output:
['i', '-', 'want', '-', 'all', '-', 'dashes', '-', 'split']

You can use this function too:
Code:
def split_keep(s, delim):
    s = s.split(delim)
    result = []
    for i, n in enumerate(s):
        result.append(n)
        if i != len(s) - 1:
            result.append(delim)
    return result
Usage:
split_keep("i-want-all-dashes-split", "-")
Output:
['i', '-', 'want', '-', 'all', '-', 'dashes', '-', 'split']

Related

How do I make a list of strings without invalid characters in any of the strings? [duplicate]

This question already has answers here:
Best way to strip punctuation from a string
(32 answers)
Closed 4 months ago.
For example, if I had this list of invalid characters:
invalid_char_list = [',', '.', '!']
And this list of strings:
string_list = ['Hello,', 'world.', 'I', 'am', 'a', 'programmer!!']
I would want to get this new list:
new_string_list = ['Hello', 'world', 'I', 'am', 'a', 'programmer']
without , or . or ! in any of the strings in the list, because those are the characters in my list of invalid characters.
You can use regex and create this pattern : [,.!] and replace with ''.
import re
re_invalid = re.compile(f"([{''.join(invalid_char_list)}])")
# re_invalid <-> re.compile(r'([,.!])', re.UNICODE)
new_string_list = [re_invalid.sub(r'', s) for s in string_list]
print(new_string_list)
Output:
['Hello', 'world', 'I', 'am', 'a', 'programmer']
[,.!] : matches only these characters (',', '.', '!') in the set
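If the list of invalid characters can contain regex-special characters such as ']', '^' or '-', joining them straight into a character class may break the pattern. A hedged sketch that escapes each character first; the extended invalid_char_list and the sample strings are assumptions for illustration only:

```python
import re

# Hypothetical list that includes characters with a special meaning inside [...]
invalid_char_list = [',', '.', '!', ']', '^', '-']
string_list = ['Hello,', 'wor-ld.', '[x]^', 'programmer!!']

# re.escape turns each character into a literal before it goes into the class.
pattern = re.compile("[" + "".join(re.escape(c) for c in invalid_char_list) + "]")
cleaned = [pattern.sub("", s) for s in string_list]
print(cleaned)
```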
You can try looping through the string_list and replacing each invalid char with an empty string.
invalid_char_list = [',', '.', '!']
string_list = ['Hello,', 'world.', 'I', 'am', 'a', 'programmer!!']
for invalid_char in invalid_char_list:
    string_list = [x.replace(invalid_char, '') for x in string_list]
print(string_list)
The Output:
['Hello', 'world', 'I', 'am', 'a', 'programmer']
We can loop over each string in string_list and each invalid character, and use str.replace to replace any invalid characters with '' (nothing).
invalid_char_list = [',', '.', '!']
string_list = ['Hello,', 'world.', 'I', 'am', 'a', 'programmer!!']
formatted_string_list = []
for string in string_list:
    for invalid in invalid_char_list:
        string = string.replace(invalid, '')
    formatted_string_list.append(string)
You can use strip(), which removes the given characters from both ends of each string (but not from the middle):
string_list = ['Hello,', ',world.?', 'I', 'am?', '!a,', 'programmer!!?']
new_string_list = [c.strip(',.!?') for c in string_list]
print(new_string_list)
#['Hello', 'world', 'I', 'am', 'a', 'programmer']
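Since strip() only trims the ends, it leaves invalid characters in the middle of a word alone. A small sketch comparing it with str.translate, which removes them everywhere; the sample word is made up for illustration:

```python
invalid_chars = ",.!?"
word = "pro,gram!mer!!"

# strip() only removes matching characters from the two ends.
stripped = word.strip(invalid_chars)

# str.maketrans with a third argument builds a table that deletes those characters.
table = str.maketrans("", "", invalid_chars)
translated = word.translate(table)

print(stripped)
print(translated)
```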

remove all the special chars from a list [duplicate]

This question already has answers here:
Removing punctuation from a list in python
(2 answers)
Closed last year.
I have a list of strings, some of which are just special characters. What would be the approach to exclude them from the resulting list?
list = ['ben','kenny',',','=','Sean',100,'tag242']
expected output = ['ben','kenny','Sean',100,'tag242']
Please guide me on how to achieve this. Thanks.
The string module has a list of punctuation marks that you can use and exclude from your list of words:
import string
punctuations = list(string.punctuation)
input_list = ['ben','kenny',',','=','Sean',100,'tag242']
output = [x for x in input_list if x not in punctuations]
print(output)
Output:
['ben', 'kenny', 'Sean', 100, 'tag242']
This list of punctuation marks includes the following characters:
['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '_', '`', '{', '|', '}', '~']
It can simply be done using the isalnum() string method. isalnum() returns True if the string is non-empty and contains only letters or digits; if the string contains any other character, it returns False. (No modules need to be imported for isalnum(); it is a built-in str method.)
code:
input_list = ['ben','kenny',',','=','Sean',100,'tag242']  # renamed from list to avoid shadowing the built-in
olist = []
for a in input_list:
    if str(a).isalnum():
        olist.append(a)
print(olist)
output:
['ben', 'kenny', 'Sean', 100, 'tag242']
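One thing to keep in mind with the isalnum() approach is that it also drops strings that merely contain a non-alphanumeric character, not only strings that are pure punctuation. A short sketch with made-up sample values:

```python
# Hypothetical samples: an underscore, a space, or an empty string all fail the test.
samples = ['tag242', 'tag_242', 'two words', '', 100]
results = [str(s).isalnum() for s in samples]

for sample, ok in zip(samples, results):
    print(repr(sample), ok)
```

So 'tag_242' would be filtered out as well, which may or may not be what you want.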
my_list = ['ben', 'kenny', ',' ,'=' ,'Sean', 100, 'tag242']
stop_words = [',', '=']
filtered_output = [i for i in my_list if i not in stop_words]
The list with stop words can be expanded if you need to remove other characters.

Python, Split the input string on elements of other list and remove digits from it

I have had some trouble with this problem, and I need your help.
I have to make a Python method (mySplit(x)) which takes an input list (which has only one string as its element) and splits that element on the elements of another list and on digits.
I use Python 3.6
So here is an example:
l=['I am learning']
l1=['____-----This4ex5ample---aint___ea5sy;782']
banned=['-', '+' , ',', '#', '.', '!', '?', ':', '_', ' ', ';']
The returned lists should be like this:
mySplit(l)=['I', 'am', 'learning']
mySplit(l1)=['This', 'ex', 'ample', 'aint', 'ea', 'sy']
I have tried the following, but I always get stuck:
def mySplit(x):
    l=['-', '+' , ',', '#', '.', '!', '?', ':', '_', ';'] #Banned chars
    l2=[i for i in x if i not in l] #Removing chars from input list
    l2=",".join(l2)
    l3=[i for i in l2 if not i.isdigit()] #Removes all the digits
    l4=[i for i in l3 if i is not ',']
    l5=[",".join(l4)]
    l6=l5[0].split(' ')
    return l6
and
mySplit(l1)
mySplit(l)
returns:
['T,h,i,s,e,x,a,m,p,l,e,a,i,n,t,e,a,s,y']
['I,', ',a,m,', ',l,e,a,r,n,i,n,g']
Use re.split() for this task:
import re
w_list = [i for i in re.split(r'[^a-zA-Z]', '____-----This4ex5ample---aint___ea5sy;782') if i]
print(w_list)
# ['This', 'ex', 'ample', 'aint', 'ea', 'sy']
I would import the punctuation marks from string and proceed with regular expressions as follows.
l=['I am learning']
l1=['____-----This4ex5ample---aint___ea5sy;782']
import re
from string import punctuation
punctuation # to see the punctuation marks.
>>> '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
' '.join([re.sub('[!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\d]',' ', w) for w in l]).split()
Here is the output:
>>> ['I', 'am', 'learning']
Notice the \d attached at the end of the punctuation marks to remove any digits.
Similarly,
' '.join([re.sub('[!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\d]',' ', w) for w in l1]).split()
Yields
>>> ['This', 'ex', 'ample', 'aint', 'ea', 'sy']
You can also modify your function as follows:
def mySplit(x):
    banned = ['-', '+', ',', '#', '.', '!', '?', ':', '_', ';'] + list('0123456789')  # Banned chars
    return ''.join([word if word not in banned else ' ' for word in x[0]]).split()
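Another way to get the same result is to build the regex character class from the banned list itself, rather than typing the punctuation out by hand. This is just a sketch; my_split is a made-up name, and it assumes the banned list from the question plus "any digit":

```python
import re

banned = ['-', '+', ',', '#', '.', '!', '?', ':', '_', ' ', ';']

def my_split(x):
    # Character class made of every banned character (escaped) plus any digit;
    # the trailing + treats a run of separators as a single split point.
    pattern = "[" + "".join(re.escape(c) for c in banned) + r"\d]+"
    # Leading or trailing separators produce empty strings, so filter them out.
    return [part for part in re.split(pattern, x[0]) if part]

print(my_split(['I am learning']))
print(my_split(['____-----This4ex5ample---aint___ea5sy;782']))
```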

how to turn string of words into list

I have turned a list of words into a string.
Now I want to turn it back into a list, but I don't know how. Please help.
temp = ['hello', 'how', 'is', 'your', 'day']
temp_string = str(temp)
temp_string will then be "['hello', 'how', 'is', 'your', 'day']"
I want to turn this back into a list now, but when I do list(temp_string), this happens:
['[', "'", 'h', 'e', 'l', 'l', 'o', "'", ',', ' ', "'", 'h', 'o', 'w', "'", ',', ' ', "'", 'i', 's', "'", ',', ' ', "'", 'y', 'o', 'u', 'r', "'", ',', ' ', "'", 'd', 'a', 'y', "'", ']']
Please help
You can do this easily by evaluating the string. That's not something I'd normally suggest but, assuming you control the input, it's quite safe:
>>> temp = ['hello', 'how', 'is', 'your', 'day'] ; type(temp) ; temp
<class 'list'>
['hello', 'how', 'is', 'your', 'day']
>>> tempstr = str(temp) ; type(tempstr) ; tempstr
<class 'str'>
"['hello', 'how', 'is', 'your', 'day']"
>>> temp2 = eval(tempstr) ; type(temp2) ; temp2
<class 'list'>
['hello', 'how', 'is', 'your', 'day']
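If you do not fully control the input, ast.literal_eval from the standard library is a more cautious alternative to eval: it only accepts Python literals (lists, strings, numbers and so on) and refuses anything else. Note that this is a different technique from the eval call above, shown as a sketch:

```python
import ast

temp = ['hello', 'how', 'is', 'your', 'day']
temp_string = str(temp)

# Parses the string as a Python literal without executing arbitrary code.
restored = ast.literal_eval(temp_string)
print(restored == temp)

# A function call is not a literal, so literal_eval rejects it.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except (ValueError, SyntaxError):
    rejected = True
print(rejected)
```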
Duplicate question? Converting a String to a List of Words?
Working code below (Python 3)
import re
sentence_list = ['hello', 'how', 'are', 'you']
sentence = ""
for word in sentence_list:
    sentence += word + " "
print(sentence)
#output: "hello how are you "
word_list = re.sub(r"[^\w]", " ", sentence).split()
print(word_list)
#output: ['hello', 'how', 'are', 'you']
You can split on commas and join them back together:
temp = ['hello', 'how', 'is', 'your', 'day']
temp_string = str(temp)
temp_new = ''.join(temp_string.split(','))
The join() function takes a list, which is created by the split() function using ',' as the delimiter. join() will then construct a string from the list. Note that temp_new is therefore still a string (the original text with the commas removed), not a list.
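If you control how the list is turned into a string in the first place, a simpler round trip is to join with a separator and split on it again. A minimal sketch, assuming none of the words contain whitespace:

```python
temp = ['hello', 'how', 'is', 'your', 'day']

# Build the string with a known separator instead of str(temp)...
temp_string = ' '.join(temp)
# ...so that split() can recover the original list.
restored = temp_string.split()

print(temp_string)
print(restored)
```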

Separating strings (list elements) with many splitters without losing the splitters from the list

I want to separate list elements if list element contain any value from
list_operators = ['+', '-', '*', '(', ')']
without losing operator from list and without using regex.
For instance:
my_list = ['a', '=', 'x+y*z', '//', 'moo']
Wanted output :
['a', '=', 'x', '+', 'y', '*', 'z', '//', 'moo']
and x y z are words not one character:
['john+doe/12*5']
['john','+','doe','/','12','*','5']
You can use itertools.groupby() to achieve this:
from itertools import groupby
operators = {'+', '-', '*', '(', ')'}
fragments = ['a', '=', 'x+y*z', '//', 'moo', '-', 'spam*(eggs-ham)']
separated = []
for fragment in fragments:
    for is_operator, group in groupby(fragment, lambda c: c in operators):
        if is_operator:
            separated.extend(group)
        else:
            separated.append(''.join(group))
>>> separated
['a', '=', 'x', '+', 'y', '*', 'z', '//', 'moo', '-',
'spam', '*', '(', 'eggs', '-', 'ham', ')']
Note that I've changed the names of your variables to be a little more meaningful, and made operators a set because we only care about membership, not order (although the code would work just as well, if a little more slowly, with a list).
groupby() returns an iterable of (key, group) pairs, starting a new group whenever key changes. Since I've chosen a key function (lambda c: c in operators) that just tests for a character's membership in operators, the result of the groupby() call looks something like this:
[
(False, ['s', 'p', 'a', 'm']),
(True, ['*', '(']),
(False, ['e', 'g', 'g', 's']),
(True, ['-']),
(False, ['h', 'a', 'm']),
(True, [')'])
]
(groupby() actually returns a groupby object made up of (key,grouper object) tuples - I've converted those objects to lists in the example above for clarity).
The rest of the code is straightforward: if is_operator is True, the characters in group are used to extend separated; if it's False, the characters in group are joined back into a string and appended to separated.
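This is an easy way of doing it (note that it splits an element into single characters, so it only works when the operands are one character long):
slist = []
for x in my_list:
    if set(list_operators) & set(x):
        # the element contains an operator: add its characters one by one
        slist.extend(x)
    else:
        slist.append(x)
print(slist)
['a', '=', 'x', '+', 'y', '*', 'z', '//', 'moo']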
You can also do something like this:
import re
from itertools import chain
list_operators = ['+', '-', '*', '(', ')']
tokenizer = re.compile(r"[{}]|\w+".format("".join(map(re.escape, list_operators))))
my_list = ['a', '=', 'x+y*z', '//', 'moo', 'john+doe/12*5']
parsed = list(chain.from_iterable(map(tokenizer.findall, my_list)))
parsed result:
['a', 'x', '+', 'y', '*', 'z', 'moo', 'john', '+', 'doe', '12', '*', '5']
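As the parsed result shows, the \w+ pattern silently drops '=', '//' and '/', because they are neither in list_operators nor word characters. A variant sketch that keeps everything by matching "an operator, or a run of non-operators"; this is an assumption about what is wanted, not the answerer's code:

```python
import re
from itertools import chain

list_operators = ['+', '-', '*', '(', ')']
ops = "".join(map(re.escape, list_operators))

# Either a single operator, or the longest run of characters that are not operators.
tokenizer = re.compile("[{0}]|[^{0}]+".format(ops))

my_list = ['a', '=', 'x+y*z', '//', 'moo', 'john+doe/12*5']
parsed = list(chain.from_iterable(tokenizer.findall(s) for s in my_list))
print(parsed)
```

Here 'doe/12' stays in one piece because '/' is not in list_operators; add it to the list if it should be split out too.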
