To split on the basis of space and special character in python


v = "vi nod-u"
I want to split this string to obtain
l = ['vi', 'nod', 'u']
v.split(" ") splits only on spaces.
And I don't know how to use the functions of the regular expression (re) module properly.
Could anyone explain how to do that?

Are you trying to split the string to get words? If so, try the following:
>>> import re
>>> pattern = re.compile(r'\W+')
>>> pattern.split('vi nod-u')
['vi', 'nod', 'u']
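If you would rather name the separators explicitly (this sketch assumes they are whitespace and hyphens), a character class gives finer control than \W+. The second input string is a made-up example that shows why empty strings may need filtering out:

```python
import re

v = "vi nod-u"

# Split only on runs of whitespace or hyphens: an explicit character class
# is an alternative to \W+, which splits on every non-word character
parts = re.split(r"[\s\-]+", v)
print(parts)  # ['vi', 'nod', 'u']

# re.findall with \w+ collects the words instead of splitting on separators
print(re.findall(r"\w+", v))  # ['vi', 'nod', 'u']

# With a separator at either end, re.split returns empty strings there,
# so filter them out
messy = "-vi nod-u!"
words = [p for p in re.split(r"\W+", messy) if p]
print(words)  # ['vi', 'nod', 'u']
```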

Related

I have a string "hello\n1hello123\n2yahoo" and want to split it on \n[integer value]

I have a string in Python:
str1 = "hello\n1hello123\n2yahoo"
I would like to split it on \n[integer value] to get a list that looks like:
['hello', 'hello123', 'yahoo']
Can anyone please help?

As someone who goes too far in avoiding regular expressions (avoiding them whenever possible rather than simply avoiding them when they are inappropriate), I would lean towards splitting on \n and processing the resulting list element by element:
from string import digits
result = [x.lstrip(digits) for x in str1.split("\n")]
If you are less regex-averse than I am, here is the regex version recommended in the comments:
from re import split
from string import digits
results = split(f'\n[{digits}]*', str1)

Using the regular expression package:
import re
str1 = "hello\n1hello123\n2yahoo"
# \d+ matches one or more digits (including 0 and multi-digit numbers), unlike [1-9]
print(re.split(r"\n\d+", str1))
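A quick side-by-side check of the lstrip approach and a regex that matches \n followed by any number of digits. The input is a hypothetical variant of str1 with a two-digit counter; note that both approaches would also strip digits that legitimately begin a line's text:

```python
import re
from string import digits

# Hypothetical input with a multi-digit counter ("10")
str1 = "hello\n1hello123\n10yahoo"

# String-method approach: split on newlines, then strip leading digits
by_strip = [x.lstrip(digits) for x in str1.split("\n")]

# Regex approach: \n followed by zero or more digits; \d* also covers
# a line that has no number after the newline
by_regex = re.split(r"\n\d*", str1)

print(by_strip)  # ['hello', 'hello123', 'yahoo']
print(by_regex)  # ['hello', 'hello123', 'yahoo']
```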

How to get all the string after and before two specific words?

I want to replace everything after "my;encoded;image:" (which is the base64 data of the image) up to, but not including, the word "END". However, the following code also replaces the two markers "my;encoded;image:" and "END". Any suggestions?
import re
re.sub("my;encoded;image:.*END","random_words",image,flags=re.DOTALL)
NB: a simpler way would be to use plain string replacement, but I want to use regex in my case. Thanks!

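You can use a regex with a non-greedy middle group to capture three groups, then keep the first and third groups and replace the second with your string (re.DOTALL lets . match newlines, which base64 data may contain):
import re
x = re.sub(r'(.*my;encoded;image:)(.*?)(END.*)', r"\1my string\3", image, flags=re.DOTALL)
print(x)
With Python 3.6 and higher you can build the replacement with an f-string. Writing \g<1> instead of \1 keeps the group reference unambiguous even if the replacement text starts with a digit:
replacement = "hello"
x = re.sub(r'(.*my;encoded;image:)(.*?)(END.*)', fr'\g<1>{replacement}\g<3>', image, flags=re.DOTALL)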
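Another option, sketched here with made-up sample data, is to wrap the markers in a lookbehind and a lookahead. These are zero-width: the markers must be present, but they are not part of the match, so only the text in between is replaced:

```python
import re

# Hypothetical sample data standing in for the real base64 payload
image = "header my;encoded;image:iVBORw0KGgo=\nAAAA END footer"

# (?<=...) and (?=...) check for the markers without consuming them.
# .*? stops at the first END, and re.DOTALL lets . match newlines.
result = re.sub(r"(?<=my;encoded;image:).*?(?=END)", "random_words", image, flags=re.DOTALL)
print(result)  # header my;encoded;image:random_wordsEND footer
```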

Replacing when a word is in another word but with special circumstances

My program replaces tokens in a file with their values. It gets stuck on a certain line; here is an example:
1.1.1.1.1.1.1.1.1.1 Token100.1 1.1.1.1.1.1.1Token100a
The two tokens in the example are Token100 and Token100a. I need a way to replace only Token100 with its data, without turning Token100a into Token100's data followed by an "a". I can't look for spaces before and after the token because tokens sometimes appear in the middle of a line with no spaces around them. Any thoughts are appreciated. Thanks.

You can use regex:
import re
line = "1.1.1.1.1.1.1.1.1.1 Token100.1 1.1.1.1.1.1.1Token100a"
result = re.sub("Token100a", "data", line)  # re.sub returns the new string
print(result)
Outputs:
1.1.1.1.1.1.1.1.1.1 Token100.1 1.1.1.1.1.1.1data
More about regex here:
https://www.w3schools.com/python/python_regex.asp

You can use a regular expression with a negative lookahead to ensure that the following character is not an "a":
>>> import re
>>> test = '1.1.1.1.1.1.1.1.1.1 Token100.1 1.1.1.1.1.1.1Token100a'
>>> re.sub(r'Token100(?!a)', 'data', test)
'1.1.1.1.1.1.1.1.1.1 data.1 1.1.1.1.1.1.1Token100a'
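If the file contains several tokens, one possible approach (a sketch, with a hypothetical mapping of token names to values) is to build a single pattern from all tokens and look each match up in a dict:

```python
import re

# Hypothetical token-to-value mapping
values = {"Token100": "data", "Token100a": "other"}

line = "1.1.1.1.1.1.1.1.1.1 Token100.1 1.1.1.1.1.1.1Token100a"

# Sorting longest-first is a common safeguard so a longer token is tried
# before its prefix; the (?![A-Za-z0-9]) lookahead also rejects a match
# that runs on into more letters or digits (e.g. Token1000). The text
# before a token is not checked, since tokens can follow digits directly.
alternatives = "|".join(map(re.escape, sorted(values, key=len, reverse=True)))
pattern = re.compile("(" + alternatives + ")(?![A-Za-z0-9])")

# Replace each matched token with its value from the dict
result = pattern.sub(lambda m: values[m.group(1)], line)
print(result)  # 1.1.1.1.1.1.1.1.1.1 data.1 1.1.1.1.1.1.1other
```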

Separating Strings and other values with comma as a delimiter

I'm working on a project where a variable holds values of any data type, separated by commas.
I need to separate all these values, and I also need to determine which type each one is.
For example:
data='"Hello, Hey",123,10.04'
I used split(',') to separate them, but it also splits on the comma inside "Hello, Hey", outputting:
['"Hello', ' Hey"', '123', '10.04']
I don't want that; I only need to split on the commas that are not inside quotes. The output should look like this:
['"Hello, Hey"','123','10.04']
I've racked my brain, but I'm still stuck because I'm a beginner.
Thanks in advance.

I'm struggling to understand your question - it seems you have a string with data inside the string, separated by commas:
data='"Hello, Hey",123,10.04'
You can use the shlex module to split it while respecting the quotes:
>>> import shlex
>>> s = shlex.shlex(data)
>>> s.whitespace = ','
>>> s.wordchars += '.'
>>> print(list(s))
['"Hello, Hey"', '123', '10.04']

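You may use the re module like so. This version drops the surrounding quotes from the quoted field, and the `if m.group(0)` filter skips the empty match that finditer reports at the end of the string, so the result is ['Hello, Hey', '123', '10.04']:
import re
[m.group(1) or m.group(2) for m in re.finditer(r'"([^"]*)",?|([^,]*),?', '"Hello, Hey",123,10.04') if m.group(0)]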

You can use re.findall with the regex pattern "[^"]+"|[^,]+, which matches either a double-quoted string or a run of non-comma characters:
import re
print(re.findall(r'"[^"]+"|[^,]+', '"Hello, Hey",123,10.04'))
This outputs:
['"Hello, Hey"', '123', '10.04']

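Just use the shlex module. Note that shlex.split() on its own splits on whitespace, so it would return the whole string as one token here; instead, configure a shlex.shlex object to split on commas:
import shlex
data = '"Hello, Hey",123,10.04'
lexer = shlex.shlex(data, posix=True)
lexer.whitespace = ','          # treat commas as the only separators
lexer.whitespace_split = True   # split only on those separators
print(list(lexer))
Output:
['Hello, Hey', '123', '10.04']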
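The standard-library csv module also understands double-quoted fields. A minimal sketch; note that it removes the quotes and returns every field as a string:

```python
import csv

data = '"Hello, Hey",123,10.04'

# csv.reader takes an iterable of lines; by default '"' is the quote
# character, so the comma inside "Hello, Hey" is not treated as a separator
row = next(csv.reader([data]))
print(row)  # ['Hello, Hey', '123', '10.04']
```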

You can use re.split to split on either a comma preceded by a double quote or a comma followed by a digit:
import re
data='"Hello, Hey",123,10.04'
re.split(r'(?<="),|,(?=\d)', data)
['"Hello, Hey"', '123', '10.04']
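For the second part of the question (identifying each value's type), one possible approach, assuming the fields are written as Python literals, is to split with a quote-aware pattern and pass each field to ast.literal_eval. parse_fields is a hypothetical helper name:

```python
import ast
import re


def parse_fields(data):
    """Split on commas outside double quotes, then infer each field's type."""
    fields = re.findall(r'"[^"]*"|[^,]+', data)
    values = []
    for field in fields:
        try:
            # '"Hello, Hey"' -> str, '123' -> int, '10.04' -> float
            values.append(ast.literal_eval(field.strip()))
        except (ValueError, SyntaxError):
            # Fields that are not valid Python literals stay plain strings
            values.append(field)
    return values


values = parse_fields('"Hello, Hey",123,10.04')
print(values)                               # ['Hello, Hey', 123, 10.04]
print([type(v).__name__ for v in values])   # ['str', 'int', 'float']
```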

Print only alphabetics in a string using Regular Expression

Goal: I want only the letters in a string to be printed
#Input
#======
string = ' 529Wind3#. '
#Neededoutput
#============
'Wind'
I tried the code below:
import re
string=re.sub('[^a-z]+[^A-Z]',' ',string)
print(string)
The output I'm getting is:
nd
So this code only seems to work for lowercase letters.
Can you please tell me how to write it so it handles both uppercase and lowercase?

Try using a list comprehension to check whether each character is in string.ascii_letters; only the characters that are get kept:
import string
String = ' 529Wind3#. '
print(''.join([i for i in String if i in string.ascii_letters]))
Output:
Wind

I agree with U8-Forward's point, but I think you may also want to know why your regular expression isn't working. This
[^a-z]+[^A-Z]
doesn't do what you want: W is not a lowercase letter, so it matches [^a-z]+ and gets removed.
Put all of the characters you don't want in a single character class:
[^a-zA-Z]+

You need to write [^a-zA-Z] instead of [^a-z]+[^A-Z]. The + quantifier means "one or more of the preceding item"; it does not combine conditions. Also replace the matches with an empty string rather than a space, so that only the letters remain:
import re
string = ' 529Wind3#. '
string = re.sub('[^a-zA-Z]', '', string)
print(string)

You can use re.findall:
import re
String = ' 529Wind3#. '
string = re.findall('[a-zA-Z]+', String)
print(''.join(string))

import re
print(re.sub('[^a-zA-Z]', '', string))
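The regex answers keep only ASCII letters. As a sketch (the accented sample string is a made-up example), str.isalpha is an alternative if non-ASCII letters should count too:

```python
import re

text = ' 529Wind3#. '

# Delete every run of characters that is not an ASCII letter
letters_only = re.sub(r'[^a-zA-Z]+', '', text)
print(letters_only)  # Wind

# str.isalpha also accepts non-ASCII letters such as 'é'
mixed = ' 529Café3#. '
accented = ''.join(ch for ch in mixed if ch.isalpha())
print(accented)  # Café
```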
