how to find special characters in a string

I have some Python code and want to find the vowels in a string.
The code I have written is below. I tried different combinations of for loops, but I keep getting one of two errors:
'int' object is not iterable
string indices must be integers, not str
How can I find all the vowels in a line?
str1 = 'sator arepo tenet opera rotas'
vow1 = [str1[i] for i in str1 if str1[i] is 'a' | 'e' | 'o']

What about:
vowels = [c for c in str1 if c in 'aeo']
You're getting errors for two reasons. First, when you loop over a string you get its characters, not its indices, so str1[i] tries to index the string with a character. Second, 'a' | 'e' | 'o' doesn't make sense for strings, because they don't support the | operator.
>>> str1 = 'sator arepo tenet opera rotas'
>>> vowels = [c for c in str1 if c in 'aeo']
>>> print(vowels)
['a', 'o', 'a', 'e', 'o', 'e', 'e', 'o', 'e', 'a', 'o', 'a']
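One final comment: you shouldn't use is to test for equality. is tests for identity (whether two names refer to the same object), while == tests for equal values. A simple test:
a = 565
b = 565
print(a == b)  # True
print(a is b)  # can be False (!), e.g. in an interactive CPython session
The reason is that a and b can reference different objects that have the same value; whether two equal integers share one object is an implementation detail.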
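A deterministic way to see the difference is to compare two separately created lists, which are always distinct objects. This small sketch also shows the kind of test the original question actually needs (in or ==, not is):

```python
a = [5, 6, 5]
b = [5, 6, 5]

same_value = a == b      # True: the lists hold equal elements
same_object = a is b     # False: two separate list objects were created
alias = a
alias_is_a = alias is a  # True: both names refer to the very same object

# For the original question, compare characters by value, not identity
char = 'e'
is_vowel = char in 'aeo'  # True

print(same_value, same_object, alias_is_a, is_vowel)  # True False True True
```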

Try this code:
str1 = 'sator arepo tenet opera rotas'
vowl = ''
for char in str1:
    if char in 'aeiouAEIOU':
        vowl = vowl + char + ','
vowl = vowl[:-1]  # drop the trailing comma
print(vowl)
The output is:
a,o,a,e,o,e,e,o,e,a,o,a
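The manual comma handling (appending ',' after every vowel and slicing the last one off) can be avoided with str.join, which only places the separator between items. A minimal equivalent sketch:

```python
str1 = 'sator arepo tenet opera rotas'
# Keep every vowel (either case) and join them with commas
vowl = ','.join(char for char in str1 if char in 'aeiouAEIOU')
print(vowl)  # a,o,a,e,o,e,e,o,e,a,o,a
```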

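In [1]: str1 = 'sator arepo tenet opera rotas'
In [2]: ''.join(filter(lambda x: x in 'aeiou', str1))
Out[2]: 'aoaeoeeoeaoa'
In Python 2, filter() on a string returns a string directly; in Python 3 it returns an iterator, so the ''.join() is needed to get a string back.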
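The same filtering idea covers the title's "special characters" if you decide what counts as special. The sketch below assumes (this is an assumption, the question itself only asks about vowels) that a special character is anything that is neither alphanumeric nor whitespace; the function name special_chars is hypothetical:

```python
def special_chars(s):
    """Return the characters of s that are neither alphanumeric nor whitespace.

    Assumption: "special" means not str.isalnum() and not str.isspace().
    Both methods are Unicode-aware, so letters such as 'é' are not special.
    """
    return [c for c in s if not (c.isalnum() or c.isspace())]

print(special_chars('sator, arepo! tenet?'))  # [',', '!', '?']
```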

Related

Check if a string contains a string except a list

I have a string as follows:
f = 'ATCTGTCGTYCACGT'
I want to check whether the string contains any characters except: A, C, G or T, and if so, print them.
for i in f:
    if i != 'A' and i != 'C' and i != 'G' and i != 'T':
        print(i)
Is there a way to achieve this without looping through the string?

You can use a set difference to get the desired output.
f = 'ATCTGTCGTYCACGTXYZ'
valid = {'A', 'C', 'G', 'T'}
unique = set(f)
print(unique - valid)
Output (set order may vary):
{'Y', 'X', 'Z'}  # characters in f that are not 'A', 'C', 'G' or 'T'

Depending on the size of your input string, the for loop might be the most efficient solution.
However, since you explicitly ask for a solution without an explicit loop, this can be done with a regex.
import re
f = 'ABCDEFG'
print(*re.findall('[^ABC]', f), sep='\n')
Output:
D
E
F
G
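Applied to the question's DNA alphabet, the negated character class [^ACGT] matches every character that is not one of the four allowed bases. A short sketch that also collects the distinct offenders with a set:

```python
import re

f = 'ATCTGTCGTYCACGTXYZ'
# [^ACGT] matches any single character other than A, C, G or T
invalid = re.findall(r'[^ACGT]', f)
print(invalid)               # ['Y', 'X', 'Y', 'Z']
print(sorted(set(invalid)))  # ['X', 'Y', 'Z']
```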

Just do:
l = ['A', 'C', 'G', 'T']
for i in f:
    if i not in l:
        print(i)
This checks, for each character of f, whether it is in the list l, and prints it if it isn't.
If you don't want to loop through the string yourself, you can use a regex:
import re
f = 'ATCTGTCGTYCACGT'
l = ['A', 'C', 'G', 'T']
# "[^ACGT]" matches any character that is not in l
contains_invalid = bool(re.search("[^" + "".join(l) + "]", f))
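If you only need a yes/no answer, a set comparison avoids the regex altogether: set(f) <= allowed is True exactly when every character of f is one of the allowed ones. A sketch (is_valid_sequence is a hypothetical helper name):

```python
allowed = {'A', 'C', 'G', 'T'}

def is_valid_sequence(f):
    """Return True if f contains only the characters A, C, G and T."""
    return set(f) <= allowed  # subset test

print(is_valid_sequence('ATCTGTCGTACACGT'))  # True
print(is_valid_sequence('ATCTGTCGTYCACGT'))  # False
```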

Technically this still loops, but converting the input string to a set first removes duplicate characters, so each distinct character is checked only once.
accepted_values = ['a', 't', 'c', 'g']
sequence = 'ATCTGTCGTYCACGT'
print([i for i in set(sequence.lower()) if i not in accepted_values])  # ['y']

Multiply an integer in a list by a word in the list

I'm not sure how to multiply a string by the number that follows it. I want to find the RMM of a compound, so I started by making a dictionary of RMMs and then adding them together. My issue is with compounds such as H2O.
import re
name = input("Insert the name of a molecule/atom to find its RMM/RAM: ")
compound = re.sub('([A-Z])', r' \1', name)
Compound = compound.split(' ')
r = re.split(r'(\d+)', compound)
For example:
When name = H2O
Compound = ['', 'H2', 'O']
r = ['H', '2', 'O']
I want to multiply H by 2, giving ['H', 'H', 'O'].
TLDR: I want integers following names in a list to print the previously listed object 'x' amount of times (e.g. [O, 2] => O O, [C, O, 2] => C O O)
The question is somewhat complicated, so let me know if I can clarify it. Thanks.

How about the following, after you define compound:
test = re.findall(r'([a-zA-Z]+)(\d*)', compound)
expand = [a*int(b) if len(b) > 0 else a for (a, b) in test]
This matches one or more letters followed by an optional run of digits. If there is no digit, we just return the letters; if there is, we repeat the letters that many times. This doesn't quite return what you expected: it returns ['HH', 'O'] instead, so please let me know if this suits.
EDIT: assuming your compounds use elements consisting of either a single capital letter or a single capital followed by lowercase letters, you can add the following:
final = re.findall('[A-Z][a-z]*', ''.join(expand))
This returns each element as a separate entry in the list, e.g. ['H', 'H', 'O'].
EDIT 2: with the assumption of my previous edit, we can actually reduce the whole thing down to just a few lines:
import re
name = input("Insert the name of a molecule/atom to find its RMM/RAM: ")
test = re.findall(r'([A-Z][a-z]*)(\d*)', name)
final = re.findall('[A-Z][a-z]*', ''.join([a*int(b) if len(b) > 0 else a for (a, b) in test]))

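You could probably do something like this (it only handles single-digit counts):
compound = 'h2o'
final = []
for x in range(len(compound)):
    if compound[x].isdigit() and x != 0:
        # the previous character was already appended once, so add count - 1 more copies
        for count in range(int(compound[x]) - 1):
            final.append(compound[x-1])
    else:
        final.append(compound[x])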

Use a regex and a generator function:
import re
def multiply_string(seq):
    regex = re.compile("([a-zA-Z][0-9])|([a-zA-Z])")
    for alnum, alpha in regex.findall(''.join(seq)):
        if alnum:
            # a letter followed by a single digit: repeat the letter
            for char in alnum[0] * int(alnum[1]):
                yield char
        else:
            yield alpha
l = ['C', 'O', '2']
print(list(multiply_string(l)))  # ['C', 'O', 'O']
We join your list back together using ''.join. Then we compile a regex pattern that matches two kinds of substrings. If a letter is followed by a digit, the pair is captured in the first group; if a letter stands alone, it is captured in the second group. We then iterate over the matches, and whichever group is non-empty tells us what to yield.
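The pattern above only reads one digit after a letter, so a count such as 12 would be split. A variation (a sketch, not the answer's original code; repeat_letters is a hypothetical name) captures a whole run of digits with \d* and defaults the count to 1:

```python
import re

def repeat_letters(seq):
    """Yield each letter of ''.join(seq), repeated by the number after it (default 1)."""
    for letter, digits in re.findall(r'([A-Za-z])(\d*)', ''.join(seq)):
        count = int(digits) if digits else 1
        for _ in range(count):
            yield letter

print(list(repeat_letters(['C', 'O', '2'])))  # ['C', 'O', 'O']
print(len(list(repeat_letters(['H', '12']))))  # 12
```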

Here are a few nested comprehensions to get it done in two lines (after import re; this assumes single-letter element symbols):
In [1]: groups = [h*int(''.join(t)) if len(t) else h for h, *t in re.findall(r'[A-Z]\d*', 'H2O')]
In [2]: [c for cG in groups for c in cG]
Out[2]: ['H', 'H', 'O']
Note: I am deconstructing and reconstructing strings, so this is probably not the most efficient method.
Here is a longer example:
In [3]: def findElements(molecule):
   ...:     groups = [h*int(''.join(t)) if len(t) else h for h, *t in re.findall(r'[A-Z]\d*', molecule)]
   ...:     return [c for cG in groups for c in cG]
In [4]: findElements("H2O5S7D")
Out[4]: ['H', 'H', 'O', 'O', 'O', 'O', 'O', 'S', 'S', 'S', 'S', 'S', 'S', 'S', 'D']

In Python (2 and 3 alike) you can simply multiply a string by an integer, for example:
print("H"*2)  # HH
print(2*"H")  # HH
Applied to your list:
r = ['H', '2', 'O']
replacements = [(index, int(ch)) for index, ch in enumerate(r) if ch.isdigit()]
for position, times in replacements:
    # the element is already in the list once, so the digit becomes times - 1 more copies
    r[position] = (times - 1) * r[position - 1]
# flatten the result
r = [ch for s in r for ch in s]
print(r)  # ['H', 'H', 'O']
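Since the end goal is the RMM, the expanded list is not strictly needed: you can read each (element, count) pair and multiply the element's mass by the count directly. This is a sketch under stated assumptions: element symbols are one capital letter optionally followed by one lowercase letter, there are no brackets, and the ATOMIC_MASS dictionary with its rounded values is an illustrative stand-in for your own dictionary of RMMs (an unknown symbol raises KeyError):

```python
import re

# Illustrative, rounded relative atomic masses (replace with your own dictionary)
ATOMIC_MASS = {'H': 1.008, 'C': 12.011, 'O': 15.999, 'Na': 22.99, 'Cl': 35.45}

def rmm(formula):
    """Sum the atomic masses for a simple formula such as 'H2O' or 'NaCl'."""
    total = 0.0
    # Each match is (symbol, digits); digits is '' when the count is 1
    for symbol, digits in re.findall(r'([A-Z][a-z]?)(\d*)', formula):
        count = int(digits) if digits else 1
        total += ATOMIC_MASS[symbol] * count
    return total

print(round(rmm('H2O'), 3))  # 18.015
```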

How to simplify the IF statement in Python 3

I have the IF statement as follows:
...
if word.endswith('a') or word.endswith('e') or word.endswith('i') or word.endswith('o') or word.endswith('u'):
...
Here I had to use 4 ORs to cover all the cases. Is there any way I can simplify this? I'm using Python 3.4.

Use any
>>> word = 'fa'
>>> any(word.endswith(i) for i in ['a', 'e', 'i', 'o', 'u'])
True
>>> word = 'fe'
>>> any(word.endswith(i) for i in ['a', 'e', 'i', 'o', 'u'])
True
>>>

Try:
if word[-1] in ['a','e','i','o','u']:
where word[-1] is the last character (note that this raises IndexError if word is empty).

Simply:
>>> "apple"[-1] in 'aeiou'
True
>>> "boy"[-1] in 'aeiou'
False

For a single character c and a non-empty word, word.endswith(c) is the same as word[-1] == c, so:
VOWELS = 'aeiou'
if word[-1] in VOWELS:
    print('{} ends with a vowel'.format(word))
will do. There is no need to construct a list, tuple, set, or other data structure: just test membership in a string, in this case VOWELS.
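If word might be empty, str.endswith is the safer tool: it accepts a tuple of suffixes, which keeps the original method call while dropping the chain of ors, and it returns False for an empty string instead of raising IndexError. A minimal sketch (ends_with_vowel is only an illustrative name):

```python
VOWELS = ('a', 'e', 'i', 'o', 'u')

def ends_with_vowel(word):
    """True if word ends with a lowercase vowel; False for an empty string."""
    return word.endswith(VOWELS)  # endswith accepts a tuple of suffixes

print(ends_with_vowel('apple'), ends_with_vowel('boy'), ends_with_vowel(''))  # True False False
```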

How to sort the letters in a string alphabetically in Python

Is there an easy way to sort the letters in a string alphabetically in Python?
So for:
a = 'ZENOVW'
I would like to return:
'ENOVWZ'

You can do:
>>> a = 'ZENOVW'
>>> ''.join(sorted(a))
'ENOVWZ'

>>> a = 'ZENOVW'
>>> b = sorted(a)
>>> print(b)
['E', 'N', 'O', 'V', 'W', 'Z']
sorted returns a list, so you can make it a string again using join:
>>> c = ''.join(b)
which joins the items of b together with an empty string '' in between each item.
>>> print(c)
ENOVWZ

The sorted() solution can give unexpected results with other strings, for example ones containing spaces or mixed case. Here are some other solutions:
Sort letters and make them distinct:
>>> s = "Bubble Bobble"
>>> ''.join(sorted(set(s.lower())))
' belou'
Sort letters and make them distinct while keeping caps:
>>> s = "Bubble Bobble"
>>> ''.join(sorted(set(s)))
' Bbelou'
Sort letters and keep duplicates:
>>> s = "Bubble Bobble"
>>> ''.join(sorted(s))
' BBbbbbeellou'
If you want to get rid of the space in the result, call strip() on the result in any of these cases:
>>> s = "Bubble Bobble"
>>> ''.join(sorted(set(s.lower()))).strip()
'belou'

Python's sorted() orders the characters of a string by code point (ASCII value for ASCII text), so all uppercase letters sort before all lowercase letters.
INCORRECT: In the example below, d and e come after H and W because of their ASCII values.
>>> a = "Hello World!"
>>> "".join(sorted(a))
' !HWdellloor'
CORRECT: To sort without regard to case while keeping each letter's original case, use a key function:
>>> a = "Hello World!"
>>> "".join(sorted(a, key=lambda x: x.lower()))
' !deHllloorW'
OR (Ref: https://docs.python.org/3/library/functions.html#sorted)
>>> a = "Hello World!"
>>> "".join(sorted(a, key=str.lower))
' !deHllloorW'
If you also want to remove spaces, punctuation and digits, keep only the alphabetic characters:
>>> a = "Hello World!"
>>> "".join(filter(lambda x: x.isalpha(), sorted(a, key=lambda x: x.lower())))
'deHllloorW'

You can use reduce (in Python 3 it must be imported from functools):
>>> from functools import reduce
>>> a = 'ZENOVW'
>>> reduce(lambda x, y: x + y, sorted(a))
'ENOVWZ'

This code sorts a string alphabetically without using sorted() or join() (it still relies on basic built-ins such as input(), len() and range()):
k = input("Enter any string again ")
li = []
x = len(k)
for i in range(0, x):
    li.append(k[i])
print("List is : ", li)
for i in range(0, x):
    for j in range(0, x):
        if li[i] < li[j]:
            # swap the two characters
            temp = li[i]
            li[i] = li[j]
            li[j] = temp
result = ""
for i in range(0, x):
    result = result + li[i]
print("After sorting String is : ", result)
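For comparison, here is a hand-written insertion sort over the characters. It is a different, more conventional algorithm than the exchange loop above; like that loop it avoids sorted(), though it still uses basic built-ins such as list(), len() and ''.join(). The name sort_string is hypothetical:

```python
def sort_string(text):
    """Sort the characters of text with a hand-written insertion sort."""
    chars = list(text)
    for i in range(1, len(chars)):
        current = chars[i]
        j = i - 1
        # Shift larger characters one slot to the right
        while j >= 0 and chars[j] > current:
            chars[j + 1] = chars[j]
            j -= 1
        chars[j + 1] = current  # drop current into the gap
    return ''.join(chars)

print(sort_string('ZENOVW'))  # ENOVWZ
```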

Really liked the answer with the reduce() function. Here's another way to sort the string using accumulate().
from itertools import accumulate
s = 'mississippi'
print(tuple(accumulate(sorted(s)))[-1])
sorted(s) -> ['i', 'i', 'i', 'i', 'm', 'p', 'p', 's', 's', 's', 's']
tuple(accumulate(sorted(s))) -> ('i', 'ii', 'iii', 'iiii', 'iiiim', 'iiiimp', 'iiiimpp', 'iiiimpps', 'iiiimppss', 'iiiimppsss', 'iiiimppssss')
Index [-1] selects the last element of the tuple, which is the complete sorted string 'iiiimppssss'.

range over character in python

Is there a way to range over characters? Something like this:
for c in xrange('a', 'z'):
    print c
I hope you guys can help.

This is a great use for a custom generator:
Python 2:
def char_range(c1, c2):
    """Generates the characters from `c1` to `c2`, inclusive."""
    for c in xrange(ord(c1), ord(c2)+1):
        yield chr(c)
then:
for c in char_range('a', 'z'):
    print c
Python 3:
def char_range(c1, c2):
    """Generates the characters from `c1` to `c2`, inclusive."""
    for c in range(ord(c1), ord(c2)+1):
        yield chr(c)
then:
for c in char_range('a', 'z'):
    print(c)

import string
for char in string.ascii_lowercase:
    print(char)
See the string module's constants for the other possibilities, including uppercase letters, digits and locale-dependent characters. You can join them together, e.g. string.ascii_uppercase + string.ascii_lowercase, if you want the characters from several sets.
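As a sketch of combining constants, this builds one pool of the 62 ASCII letters and digits and iterates over part of it:

```python
import string

# Concatenate several constants into a single pool of characters
alphanumerics = string.ascii_uppercase + string.ascii_lowercase + string.digits

for char in alphanumerics[:3]:
    print(char)  # A, B, C on separate lines

print(len(alphanumerics))  # 62
```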

You have to convert the characters to numbers and back again.
for c in xrange(ord('a'), ord('z')+1):
    print chr(c)  # resp. print unichr(c)
For the sake of beauty and readability, you can wrap this in a generator:
def character_range(a, b, inclusive=False):
    back = chr
    if isinstance(a, unicode) or isinstance(b, unicode):
        back = unichr
    for c in xrange(ord(a), ord(b) + int(bool(inclusive))):
        yield back(c)
for c in character_range('a', 'z', inclusive=True):
    print(c)
This generator can be called with inclusive=False (the default) to imitate Python's usual behaviour of excluding the end element, or with inclusive=True to include it. So with the default inclusive=False, 'a', 'z' would only span the range from a to y, excluding z.
If either a or b is unicode, it returns unicode characters (via unichr); otherwise it uses chr.
As written it only works in Python 2, since xrange, unicode and unichr do not exist in Python 3.

There are other good answers here (personally I'd probably use string.lowercase), but for the sake of completeness, you could use map() and chr() on the lowercase ASCII values:
for c in map(chr, xrange(97, 123)):
    print c

If you have a short fixed list of characters, just iterate over a string, since Python treats strings as sequences of characters:
for x in 'abcd':
    print(x)
or
[x for x in 'abcd']

I like an approach which looks like this:
base64chars = list(chars('AZ', 'az', '09', '++', '//'))
It could certainly be implemented more comfortably, but this is quick, easy and very readable.
Python 3
Generator version:
def chars(*args):
    for a in args:
        for i in range(ord(a[0]), ord(a[1])+1):
            yield chr(i)
Or, if you like list comprehensions:
def chars(*args):
    return [chr(i) for a in args for i in range(ord(a[0]), ord(a[1])+1)]
The first yields:
print(chars('ĀĈ'))
<generator object chars at 0x7efcb4e72308>
print(list(chars('ĀĈ')))
['Ā', 'ā', 'Ă', 'ă', 'Ą', 'ą', 'Ć', 'ć', 'Ĉ']
while the second yields:
print(chars('ĀĈ'))
['Ā', 'ā', 'Ă', 'ă', 'Ą', 'ą', 'Ć', 'ć', 'Ĉ']
It is really convenient:
base64chars = list(chars('AZ', 'az', '09', '++', '//'))
for a in base64chars:
    print(repr(a), end='')
print('')
for a in base64chars:
    print(repr(a), end=' ')
outputs
'A''B''C''D''E''F''G''H''I''J''K''L''M''N''O''P''Q''R''S''T''U''V''W''X''Y''Z''a''b''c''d''e''f''g''h''i''j''k''l''m''n''o''p''q''r''s''t''u''v''w''x''y''z''0''1''2''3''4''5''6''7''8''9''+''/'
'A' 'B' 'C' 'D' 'E' 'F' 'G' 'H' 'I' 'J' 'K' 'L' 'M' 'N' 'O' 'P' 'Q' 'R' 'S' 'T' 'U' 'V' 'W' 'X' 'Y' 'Z' 'a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j' 'k' 'l' 'm' 'n' 'o' 'p' 'q' 'r' 's' 't' 'u' 'v' 'w' 'x' 'y' 'z' '0' '1' '2' '3' '4' '5' '6' '7' '8' '9' '+' '/'
Why the list()? Without it, base64chars might be a generator (depending on which implementation you chose), and a generator can only be consumed once, i.e. in the very first loop.
Python 2
Something similar can be achieved with Python 2, but it is far more complex if you want to support Unicode too. To encourage you to stop using Python 2 in favor of Python 3, I do not bother to provide a Python 2 solution here ;)
Try to avoid Python 2 for new projects today. Also try to port old projects to Python 3 before extending them; in the long run it will be worth the effort!
Proper handling of Unicode in Python 2 is extremely complex, and it is nearly impossible to add Unicode support to Python 2 projects if it was not built in from the beginning.
Hints on how to backport this to Python 2:
Use xrange instead of range
Create a 2nd function (unicodes?) for handling of Unicode:
Use unichr instead of chr to return unicode instead of str
Always pass unicode strings as arguments so that ord and indexing work properly

for character in map(chr, xrange(ord('a'), ord('c')+1)):
    print character
prints:
a
b
c

# generating 'a to z' small_chars.
small_chars = [chr(item) for item in range(ord('a'), ord('z')+1)]
# generating 'A to Z' upper chars.
upper_chars = [chr(item).upper() for item in range(ord('a'), ord('z')+1)]

For Uppercase Letters:
for i in range(ord('A'), ord('Z')+1):
    print(chr(i))
For Lowercase letters:
for i in range(ord('a'), ord('z')+1):
    print(chr(i))

Inspired by the top answer above, I came up with this (in Python 3, map returns an iterator, so wrap it in list() if you need a list):
list(map(chr, range(ord('a'), ord('z')+1)))

Using Ned Batchelder's answer here, I'm amending it a bit for Python 3:
def char_range(c1, c2):
    """Generates the characters from `c1` to `c2`, inclusive."""
    # Using range instead of xrange, as xrange was removed in Python 3
    for c in range(ord(c1), ord(c2)+1):
        yield chr(c)
Then the same thing as in Ned's answer:
for c in char_range('a', 'z'):
    print(c)
Thanks Ned!

I had the same need and I used this:
import string
chars = string.ascii_lowercase
letters = list(chars)[chars.find('a'):chars.find('k')+1]
Hope this will help someone.

Use "for count in range" together with chr() and ord():
print([chr(ord('a')+i) for i in range(ord('z')-ord('a')+1)])

Use a list comprehension:
for c in [chr(x) for x in range(ord('a'), ord('z')+1)]:
    print(c)

Another option (it operates like range: stop is exclusive, so pass the next letter if you want your last letter included):
>>> import string
>>> def crange(arg, *args):
...     """character range, crange(stop) or crange(start, stop[, step])"""
...     if len(args):
...         start = string.ascii_letters.index(arg)
...         stop = string.ascii_letters.index(args[0])
...     else:
...         start = string.ascii_letters.index('a')
...         stop = string.ascii_letters.index(arg)
...     step = 1 if len(args) < 2 else args[1]
...     for index in range(start, stop, step):
...         yield string.ascii_letters[index]
...
>>> [_ for _ in crange('d')]
['a', 'b', 'c']
>>>
>>> [_ for _ in crange('d', 'g')]
['d', 'e', 'f']
>>>
>>> [_ for _ in crange('d', 'v', 3)]
['d', 'g', 'j', 'm', 'p', 's']
>>>
>>> [_ for _ in crange('A', 'G')]
['A', 'B', 'C', 'D', 'E', 'F']

Depending on how complex the range of characters is, a regular expression may be convenient:
import re
import string
re.findall("[a-f]", string.printable)
# --> ['a', 'b', 'c', 'd', 'e', 'f']
re.findall("[n-qN-Q]", string.printable)
# --> ['n', 'o', 'p', 'q', 'N', 'O', 'P', 'Q']
This works around the pesky issue of accidentally including the punctuation characters in between numbers, uppercase and lowercase letters in the ASCII table.
