I'm working on a project where a variable holds values of any data type, separated by commas.
I need to split these values apart and also determine the type of each one.
For example:
data='"Hello, Hey",123,10.04'
I used the split() function, but it also splits on the comma inside "Hello, Hey", outputting:
['"Hello', ' Hey"', '123', '10.04']
I don't want this. I need to split the values on commas, but not on commas inside quotes. The output should look like this:
['"Hello, Hey"','123','10.04']
I've racked my brain over this, but I'm a beginner and still can't solve it.
Thanks in advance.
I'm struggling to understand your question - it seems you have a string with data inside the string, separated by commas:
data='"Hello, Hey",123,10.04'
You can use the shlex module to split it while respecting the quotes:
>>> import shlex
>>> s = shlex.shlex(data)
>>> s.whitespace = ','
>>> s.wordchars += '.'
>>> print(list(s))
['"Hello, Hey"', '123', '10.04']
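You may use the re module like so. This drops the surrounding quotes, and the final condition skips the empty match that the pattern produces at the end of the string:
import re
[m.group(1) or m.group(2) for m in re.finditer(r'"([^"]*)",?|([^,]*),?', '"Hello, Hey",123,10.04') if m.group(0)]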
You can use re.findall with regex pattern "[^"]+"|[^,]+:
import re
print(re.findall(r'"[^"]+"|[^,]+', '"Hello, Hey",123,10.04'))
This outputs:
['"Hello, Hey"', '123', '10.04']
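Just use the shlex module. Note that shlex.split() splits on whitespace only, so configure a lexer to split on commas instead:
import shlex
data = '"Hello, Hey",123,10.04'
lexer = shlex.shlex(data, posix=True)  # posix mode strips the quotes
lexer.whitespace = ','                 # treat commas as the separators
lexer.whitespace_split = True          # split only on those separators
print(list(lexer))
Output:
['Hello, Hey', '123', '10.04']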
You can use re.split to split on a comma that either follows a double quote or precedes a digit:
import re
data='"Hello, Hey",123,10.04'
re.split(r'(?<="),|,(?=\d)', data)
['"Hello, Hey"', '123', '10.04']
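Since the question also asks for the type of each value, here is a minimal sketch that uses the standard csv module to do the splitting and then tries int and float conversions in turn. The function name parse_values is made up for illustration. It assumes the quotes only protect commas: csv drops them, so a quoted number such as "123" would also be converted to an int.

```python
import csv

def parse_values(line):
    """Split one comma-separated line and convert each field to int, float, or str."""
    # csv.reader handles commas inside double-quoted fields and strips the quotes.
    fields = next(csv.reader([line]))
    values = []
    for field in fields:
        for convert in (int, float):
            try:
                values.append(convert(field))
                break
            except ValueError:
                continue
        else:
            # Neither conversion worked, so keep the field as a string.
            values.append(field)
    return values

data = '"Hello, Hey",123,10.04'
for value in parse_values(data):
    print(repr(value), type(value).__name__)
```

If the quoted form needs to be preserved, one of the regex or shlex approaches above that keeps the quotes is probably a better fit.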
Input string:
str = "(\"Cardinal\", \"Tom B. Erichsen\", \"Skagen 21\",)"
The output string should look like:
("Cardinal", "Tom B. Erichsen", "Skagen 21")
The comma at the end should be removed. How can I do this in Python?
I tried str.rstrip(","), but it didn't work.
You can use a regex. For example, replace (.*),([^,]+)$ with \1\2:
import re
result = re.sub(r"(.*),([^,]+)$", r"\1\2", yourstring)
Check this code
str = str.replace('",)', '")')
You can chain several str.replace() calls:
str.replace(", )",")").replace(",)",")")
That will work for your string.
You can do this in the following way:
str = "(\"Cardinal\", \"Tom B. Erichsen\", \"Skagen 21\",)"
str = str[:-2] + str[-1]  # drop the second-to-last character (the trailing comma)
You could use the re module:
import re
s = "INSERT INTO Customers (CustomerName, ContactName, Address, ) VALUES (\"Cardinal\", \"Tom B. Erichsen\", \"Skagen 21\",)"
# Remove any comma (plus optional whitespace) that sits directly before a closing parenthesis.
print(re.sub(r',\s*\)', ')', s))
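As for why rstrip(",") did nothing in the question: rstrip only removes characters from the very end of the string, and here the last character is ")", not ",". A small sketch showing that, next to a regex anchored to the end of the string (the variable names are just for illustration):

```python
import re

s = '("Cardinal", "Tom B. Erichsen", "Skagen 21",)'

# rstrip only looks at the end of the string, and the last character is ')'.
unchanged = s.rstrip(",")

# Remove a comma (plus optional whitespace) that directly precedes the final ')'.
fixed = re.sub(r",\s*\)$", ")", s)

print(unchanged == s)
print(fixed)
```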
I have a set of strings as follows:
text:u'MUC-EC-099_SC-Memory-01_TC-25'
text:u'MUC-EC-099_SC-Memory-01_TC-26'
text:u'MUC-EC-099_SC-Memory-01_TC-27'
I extracted this data from an XLS file and converted it to strings.
Now I have to extract the text inside the single quotes and put it in a list.
Expected output:
[MUC-EC-099_SC-Memory-01_TC-25, MUC-EC-099_SC-Memory-01_TC-26,MUC-EC-099_SC-Memory-01_TC-27]
Thanks in advance.
Use re.findall:
>>> import re
>>> strs = """text:u'MUC-EC-099_SC-Memory-01_TC-25'
text:u'MUC-EC-099_SC-Memory-01_TC-26'
text:u'MUC-EC-099_SC-Memory-01_TC-27'"""
>>> re.findall(r"'(.*?)'", strs, re.DOTALL)
['MUC-EC-099_SC-Memory-01_TC-25',
'MUC-EC-099_SC-Memory-01_TC-26',
'MUC-EC-099_SC-Memory-01_TC-27'
]
You can use the following expression:
(?<=')[^']+(?=')
This matches one or more characters that are not ', enclosed between two ' characters.
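Python code:
import re
quoted = re.compile("(?<=')[^']+(?=')")
# row and i are assumed to be defined elsewhere: row[1] is the cell, i is the result list
for value in quoted.findall(str(row[1])):
    i.append(value)
print(i)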
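One caveat with the lookaround pattern above: it works when applied to one quoted value at a time, as in the loop, but on a single string holding several lines it also matches the text between one closing quote and the next opening quote. A minimal sketch comparing it with a capturing group, using two of the sample lines from the question:

```python
import re

strs = "text:u'MUC-EC-099_SC-Memory-01_TC-25'\ntext:u'MUC-EC-099_SC-Memory-01_TC-26'"

# Lookarounds do not consume the quotes, so the gap between values matches too.
lookaround = re.findall(r"(?<=')[^']+(?=')", strs)

# A capturing group consumes both quotes and returns only what is inside them.
captured = re.findall(r"'([^']*)'", strs)

print(lookaround)
print(captured)
```

The first list contains an extra '\ntext:u' entry between the two values, while the second contains only the two values.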
That text: prefix seems a little familiar. Are you using xlrd to extract it? In that case, the reason you have the prefix is because you're getting the wrapped Cell object, not the value in the cell. For example, I think you're doing something like
>>> sheet.cell(2,2)
number:4.0
>>> sheet.cell(3,3)
text:u'C'
To get the unwrapped object, use .value:
>>> sheet.cell(3,3).value
u'C'
(Remember that the u here is simply telling you the string is unicode; it's not a problem.)
All I want to do is remove the dollar sign '$'. This seems simple, but I really don't know why my code isn't working.
import re
input = '$5'
if '$' in input:
    input = re.sub(re.compile('$'), '', input)
print input
Input still is '$5' instead of just '5'! Can anyone help?
Try using replace instead:
input = input.replace('$', '')
As Madbreaks has stated, $ means match the end of the line in a regular expression.
Here is a handy link to regular expressions: http://docs.python.org/2/library/re.html
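In this case, I'd use str.translate (shown here with the Python 2 signature, where the second argument lists the characters to delete):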
>>> '$$foo$$'.translate(None,'$')
'foo'
And for benchmarking purposes:
>>> def repl(s):
...     return s.replace('$','')
...
>>> def trans(s):
...     return s.translate(None,'$')
...
>>> import timeit
>>> s = '$$foo bar baz $ qux'
>>> print timeit.timeit('repl(s)','from __main__ import repl,s')
0.969965934753
>>> print timeit.timeit('trans(s)','from __main__ import trans,s')
0.796354055405
There are a number of differences between str.replace and str.translate. The most notable is that str.translate maps single characters to other characters (or deletes them), whereas str.replace replaces one substring with another. So for problems like "I want to delete all the characters a, b, c" or "I want to change a to d", I suggest str.translate. Conversely, problems like "I want to replace the substring abc with def" are well suited to str.replace.
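The translate(None, '$') calls above use the Python 2 signature. In Python 3, str.translate takes a single table, which can be built with str.maketrans. A rough sketch of the same delete, map, and replace cases (variable names are illustrative):

```python
s = '$$foo bar baz $ qux'

# Third argument to str.maketrans: characters to delete.
delete_dollar = str.maketrans('', '', '$')
deleted = s.translate(delete_dollar)

# Character-for-character mapping: every 'a' becomes 'd'.
swapped = 'banana'.translate(str.maketrans('a', 'd'))

# Substring replacement is still a job for str.replace.
replaced = 'abcabc'.replace('abc', 'def')

print(repr(deleted))
print(swapped)
print(replaced)
```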
Note that your example doesn't work because $ has a special meaning in regex (it matches at the end of a string). To get it to work with regex, you need to escape the $:
>>> re.sub(r'\$','',s)
'foo bar baz  qux'
$ is a special character in regular expressions that means 'end of the string'.
You need to escape it if you want to use it literally.
Try this:
import re
input = "$5"
if "$" in input:
    input = re.sub(re.compile(r'\$'), '', input)
print(input)
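To see what the unescaped pattern actually does, it can help to substitute something visible instead of an empty string. A small sketch (the variable names are just for illustration); re.escape is a standard helper that escapes special characters for you:

```python
import re

price = '$5'

# Unescaped, '$' matches the empty position at the end of the string.
anchored = re.sub('$', '!', price)

# Escaped by hand, it matches a literal dollar sign.
escaped = re.sub(r'\$', '', price)

# re.escape builds the escaped pattern automatically.
auto = re.sub(re.escape('$'), '', price)

print(anchored, escaped, auto)
```

Here anchored comes out as '$5!', which shows that the original code was replacing an empty match at the end of the string with nothing.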
You need to escape the dollar sign, otherwise Python treats it as an anchor: http://docs.python.org/2/library/re.html
import re
fred = "$hdkhsd%$"
print(re.sub(r"\$", "!", fred))
>> !hdkhsd%!
Aside from the other answers, you can also use strip(), which only removes dollar signs at the start and end of the string:
input = input.strip('$')
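A quick sketch of how strip differs from replace when a dollar sign sits in the middle of the string (the sample values are made up):

```python
simple = '$5'.strip('$')

# strip only trims the ends, so the inner '$' stays.
edges_only = '$1$2$'.strip('$')

# replace removes every occurrence.
everywhere = '$1$2$'.replace('$', '')

print(simple, edges_only, everywhere)
```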
I wish to split a long list of email addresses into lines, such as:
test01#testing.com; test02#testing.com; test03#testing1.com; test04#testing2.com,
I wish to split them into lines:
test01#testing.com (carriage return)
test02#testing.com (carriage return)
test03#testing1.com (carriage return)
test04#testing2.com
Can anyone please help? Thanks.
You can use split to split by semi-colon and strip to remove any spaces:
>>> s = "test01#testing.com; test02#testing.com; test03#testing1.com"
>>> [(e.strip() + '\n') for e in s.split(';')]
['test01#testing.com\n', 'test02#testing.com\n', 'test03#testing1.com\n']
You can use split to break the string on whitespace, strip to remove the commas and semicolons, and join to put the addresses on separate lines.
>>> s = 'test01#testing.com; test02#testing.com; test03#testing1.com; test04#testing2.com,'
>>> print('\n'.join(m.strip(',;') for m in s.split()))
test01#testing.com
test02#testing.com
test03#testing1.com
test04#testing2.com
How about using regex:
import re
# long_list is the string of addresses from the question
elist = re.findall(r'[^;,\s]+', long_list)
print("\n".join(elist))
If you just want to maintain them as a single multiline string, a simple string substitution will do the trick:
long_list.replace('; ', '\n')
A more flexible solution could use regex:
import re
re.sub(r'\s*;\s*', '\n', long_list)
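Putting these ideas together, here is one possible sketch that splits on any run of semicolons, commas, or whitespace and drops the empty piece left by the trailing comma. The function name split_addresses is hypothetical, and it assumes the addresses themselves never contain those characters.

```python
import re

def split_addresses(text):
    """Return the addresses in text, split on runs of ';', ',' or whitespace."""
    # re.split leaves an empty string after a trailing separator, so filter it out.
    return [part for part in re.split(r'[;,\s]+', text) if part]

long_list = 'test01#testing.com; test02#testing.com; test03#testing1.com; test04#testing2.com,'
print('\n'.join(split_addresses(long_list)))
```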
v = 'vi nod-u'
I want to split this string to obtain
l = ['vi', 'nod', 'u']
v.split(" ") splits only on spaces.
I don't know how to use the regular expression functions properly.
Could anyone explain how to do that?
Are you trying to split the string to get words? If so, try the following:
>>> import re
>>> pattern = re.compile(r'\W+')
>>> pattern.split('vi nod-u')
['vi', 'nod', 'u']
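Two things to be aware of with \W+, sketched below with made-up inputs: a separator at the start or end of the string produces empty strings in the result, and the underscore counts as a word character, so it does not split. If the empty strings are a problem, re.findall with \w+ collects the words directly.

```python
import re

# Leading and trailing separators produce empty strings with split.
with_edges = re.split(r'\W+', '-vi nod-u!')

# findall with \w+ returns only the words themselves.
words_only = re.findall(r'\w+', '-vi nod-u!')

# '_' is a word character, so nothing is split here.
underscore = re.split(r'\W+', 'vi_nod')

print(with_edges)
print(words_only)
print(underscore)
```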