Here is my code. I'm not exactly sure if I need a counter for this to work. The answer should be 'iiii'.
def eliminate_consonants(x):
    vowels= ['a','e','i','o','u']
    vowels_found = 0
    for char in x:
        if char == vowels:
            print(char)
eliminate_consonants('mississippi')
Correcting your code
The line if char == vowels: is wrong. It has to be if char in vowels:, because you need to check whether that particular character is present in the list of vowels; == compares the character with the whole list, which is never true. Apart from that, you need print(char, end='') (in Python 3) to print the output as iiii all on one line.
The final program will be like
def eliminate_consonants(x):
    vowels= ['a','e','i','o','u']
    for char in x:
        if char in vowels:
            print(char,end = "")
eliminate_consonants('mississippi')
And the output will be
iiii
Other ways include
Using in a string
def eliminate_consonants(x):
    for char in x:
        if char in 'aeiou':
            print(char,end = "")
As simple as it looks, the statement if char in 'aeiou' checks if char is present in the string aeiou.
A list comprehension
''.join([c for c in x if c in 'aeiou'])
This list comprehension will return a list that will contain the characters only if the character is in aeiou
A generator expression
''.join(c for c in x if c in 'aeiou')
This generator expression will return a generator that yields a character only if the character is in aeiou
Regular Expressions
You can use re.findall to find only the vowels in your string (remember to import re first). The code
re.findall(r'[aeiou]',"mississippi")
will return a list of the vowels found in the string, i.e. ['i', 'i', 'i', 'i']. Joining that list with str.join gives the result:
''.join(re.findall(r'[aeiou]',"mississippi"))
str.translate and maketrans
For this technique you need a map that matches each of the non-vowels to None. For this you can use string.ascii_lowercase (import string first). The code to make the map is
m = str.maketrans({i:None for i in string.ascii_lowercase if i not in "aeiou"})
This returns the mapping, stored here in the variable m (for map).
"mississippi".translate(m)
This will remove all the non aeiou characters from the string.
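Putting the pieces of this technique together, a minimal self-contained sketch might look like the following. The function name strip_non_vowels is only illustrative, and because the map covers just the lowercase ASCII letters, any other character (uppercase letters, spaces, digits) passes through unchanged:

```python
import string

# Map every lowercase ASCII consonant to None so that translate() drops it.
m = str.maketrans({c: None for c in string.ascii_lowercase if c not in "aeiou"})

def strip_non_vowels(text):
    # Hypothetical helper: characters missing from the map are kept as they are.
    return text.translate(m)

print(strip_non_vowels("mississippi"))  # iiii
```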
Using dict.fromkeys
You can use dict.fromkeys along with sys.maxunicode. But remember to import sys first!
m = dict.fromkeys(i for i in range(sys.maxunicode+1) if chr(i) not in 'aeiou')
Every key maps to None, so m can now be passed to str.translate.
'mississippi'.translate(m)
Using bytearray
As mentioned by J.F.Sebastian in the comments below, you can create a bytearray of every byte value except the lowercase vowels by using
non_vowels = bytearray(set(range(0x100)) - set(b'aeiou'))
Using this we can translate the word,
'mississippi'.encode('ascii', 'ignore').translate(None, non_vowels)
which will return b'iiii'. This can easily be converted to str by using decode i.e. b'iiii'.decode("ascii").
Using bytes
bytes returns a bytes object, which is the immutable version of bytearray. (It is Python 3 specific.)
non_vowels = bytes(set(range(0x100)) - set(b'aeiou'))
Using this we can translate the word ,
'mississippi'.encode('ascii', 'ignore').translate(None, non_vowels)
which will return b'iiii'. This can easily be converted to str by using decode i.e. b'iiii'.decode("ascii").
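Both byte-based variants follow the same three steps: encode, translate with a delete set, decode. A small sketch wrapping them in one function (the name keep_vowels_bytes is an assumption for illustration; note that the 'ignore' error handler silently drops any non-ASCII characters before the translation happens):

```python
# Every byte value except the lowercase vowels; these are the bytes to delete.
non_vowels = bytes(set(range(0x100)) - set(b'aeiou'))

def keep_vowels_bytes(text):
    # Hypothetical helper: encode to ASCII, delete the unwanted bytes, decode back to str.
    return text.encode('ascii', 'ignore').translate(None, non_vowels).decode('ascii')

print(keep_vowels_bytes('mississippi'))  # iiii
```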
Timing comparison
Python 3
python3 -m timeit -s "text = 'mississippi'*100; non_vowels = bytes(set(range(0x100)) - set(b'aeiou'))" "text.encode('ascii', 'ignore').translate(None, non_vowels).decode('ascii')"
100000 loops, best of 3: 2.88 usec per loop
python3 -m timeit -s "text = 'mississippi'*100; non_vowels = bytearray(set(range(0x100)) - set(b'aeiou'))" "text.encode('ascii', 'ignore').translate(None, non_vowels).decode('ascii')"
100000 loops, best of 3: 3.06 usec per loop
python3 -m timeit -s "text = 'mississippi'*100;d=dict.fromkeys(i for i in range(127) if chr(i) not in 'aeiou')" "text.translate(d)"
10000 loops, best of 3: 71.3 usec per loop
python3 -m timeit -s "import string; import sys; text='mississippi'*100; m = dict.fromkeys(i for i in range(sys.maxunicode+1) if chr(i) not in 'aeiou')" "text.translate(m)"
10000 loops, best of 3: 71.6 usec per loop
python3 -m timeit -s "text = 'mississippi'*100" "''.join(c for c in text if c in 'aeiou')"
10000 loops, best of 3: 60.1 usec per loop
python3 -m timeit -s "text = 'mississippi'*100" "''.join([c for c in text if c in 'aeiou'])"
10000 loops, best of 3: 53.2 usec per loop
python3 -m timeit -s "import re;text = 'mississippi'*100; p=re.compile(r'[aeiou]')" "''.join(p.findall(text))"
10000 loops, best of 3: 57 usec per loop
The timings in sorted order
translate (bytes) | 2.88
translate (bytearray)| 3.06
List Comprehension | 53.2
Regular expressions | 57.0
Generator exp | 60.1
dict.fromkeys | 71.3
translate (unicode) | 71.6
As you can see the final method using bytes is the fastest.
Python 3.5
python3.5 -m timeit -s "text = 'mississippi'*100; non_vowels = bytes(set(range(0x100)) - set(b'aeiou'))" "text.encode('ascii', 'ignore').translate(None, non_vowels).decode('ascii')"
100000 loops, best of 3: 4.17 usec per loop
python3.5 -m timeit -s "text = 'mississippi'*100; non_vowels = bytearray(set(range(0x100)) - set(b'aeiou'))" "text.encode('ascii', 'ignore').translate(None, non_vowels).decode('ascii')"
100000 loops, best of 3: 4.21 usec per loop
python3.5 -m timeit -s "text = 'mississippi'*100;d=dict.fromkeys(i for i in range(127) if chr(i) not in 'aeiou')" "text.translate(d)"
100000 loops, best of 3: 2.39 usec per loop
python3.5 -m timeit -s "import string; import sys; text='mississippi'*100; m = dict.fromkeys(i for i in range(sys.maxunicode+1) if chr(i) not in 'aeiou')" "text.translate(m)"
100000 loops, best of 3: 2.33 usec per loop
python3.5 -m timeit -s "text = 'mississippi'*100" "''.join(c for c in text if c in 'aeiou')"
10000 loops, best of 3: 97.1 usec per loop
python3.5 -m timeit -s "text = 'mississippi'*100" "''.join([c for c in text if c in 'aeiou'])"
10000 loops, best of 3: 86.6 usec per loop
python3.5 -m timeit -s "import re;text = 'mississippi'*100; p=re.compile(r'[aeiou]')" "''.join(p.findall(text))"
10000 loops, best of 3: 74.3 usec per loop
The timings in sorted order
translate (unicode) | 2.33
dict.fromkeys | 2.39
translate (bytes) | 4.17
translate (bytearray)| 4.21
Regular expressions | 74.3
List Comprehension | 86.6
Generator exp | 97.1
You can try a Pythonic way like this:
In [1]: s = 'mississippi'
In [3]: [char for char in s if char in 'aeiou']
Out[3]: ['i', 'i', 'i', 'i']
As a function:
In [4]: def eliminate_consonants(x):
   ...:     return ''.join(char for char in x if char in 'aeiou')
   ...:
In [5]: print(eliminate_consonants('mississippi'))
iiii
== tests for equality. What you want is to check whether each character of the string is in your list vowels. To do that, you can simply use in, as below.
Additionally, I see you have a vowels_found variable but are not using it. Below is one example of how you can solve this:
def eliminate_consonants(x):
    vowels= ['a','e','i','o','u']
    vowels_found = 0
    for char in x:
        if char in vowels:
            print(char)
            vowels_found += 1
    print("There are", vowels_found, "vowels in", x)
eliminate_consonants('mississippi')
Your output would then be:
i
i
i
i
There are 4 vowels in mississippi
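To see why the original comparison never printed anything, here is a small sketch contrasting == with in. The helper count_vowels is a hypothetical variant that returns the count instead of printing, which is one way to answer the "do I need a counter" part of the question:

```python
vowels = ['a', 'e', 'i', 'o', 'u']

# == compares the single character with the whole list, so it is never equal.
print('i' == vowels)   # False
# in checks whether the character is one of the list's elements.
print('i' in vowels)   # True

def count_vowels(x):
    # Hypothetical helper: counts vowels instead of printing them.
    return sum(1 for char in x if char in vowels)

print(count_vowels('mississippi'))  # 4
```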
Related
In both statements, I am appending a character "a" to the string s:
s += "a"
s = s + "a"
Which statement has the better time complexity in Python?
They have the same time complexity.
In the general case, going only by what the Python language guarantees: they both have the same time complexity of O(n), where n is the length of the string s.
In practice, with the CPython implementation: They can in some cases both be O(1), because of an optimization that the interpreter can do when it detects that s is the only reference to the string in question.
Demo (using Python 3.10.1):
O(1) (optimization in play):
String of length 10⁹, using +=:
$ python -m timeit --setup='s = "s" * 10**9' 's += "a"'
5000000 loops, best of 5: 96.6 nsec per loop
String of length 10⁹, using +:
$ python -m timeit --setup='s = "s" * 10**9' 's = s + "a"'
5000000 loops, best of 5: 95.5 nsec per loop
String of length 1, using +=:
$ python -m timeit --setup='s = "s"' 's += "a"'
5000000 loops, best of 5: 97 nsec per loop
String of length 1, using +:
$ python -m timeit --setup='s = "s"' 's = s + "a"'
5000000 loops, best of 5: 97.9 nsec per loop
O(n) (optimization doesn't apply):
String of length 10⁹, optimization doesn't apply, using +=:
$ python -m timeit --setup='s = "s" * 10**9; b = s' 's += "a"'
1 loop, best of 5: 440 msec per loop
String of length 10⁹, optimization doesn't apply, using +:
$ python -m timeit --setup='s = "s" * 10**9; b = s' 's = s + "a"'
1 loop, best of 5: 445 msec per loop
String of length 1, optimization doesn't apply, using +=:
$ python -m timeit --setup='s = "s"; b = s' 's += "a"'
5000000 loops, best of 5: 85.8 nsec per loop
String of length 1, optimization doesn't apply, using +:
$ python -m timeit --setup='s = "s"; b = s' 's = s + "a"'
5000000 loops, best of 5: 84.8 nsec per loop
More info about the time complexity of string concatenation:
https://stackoverflow.com/a/37133870/9835872
https://stackoverflow.com/a/34008199/9835872
Strings in Python are immutable, so whenever you "append" to s, Python makes a copy of s and appends the new character, effectively taking O(n) time complexity for both.
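The immutability mentioned here can be observed directly. A small sketch, reusing the names s and b from the timing commands above (where the extra reference b = s is what prevents the in-place optimization):

```python
s = "s"
b = s          # second reference to the same string object
s += "a"       # builds a new string; the object b refers to is untouched

print(s)  # sa
print(b)  # s
```

Because b still sees the old value, the interpreter cannot simply extend the existing string and has to copy it.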
Which is a faster way to calculate number of characters without whitespaces?
Which is more Pythonic?
def sent_length(sentence):
    return sum(1 for c in sentence if c != ' ')
or
def sent_length(sentence):
    return len(sentence.replace(" ", ""))
or
import re
pattern = re.compile(r'\s+')
def sent_length(sentence):
    return len(re.sub(pattern, '', sentence))
You can get timings from python -m timeit:
[matt tmp] python -m timeit "sum(1 for c in 'blah blah blah' if c != ' ')"
100000 loops, best of 3: 2.96 usec per loop
[matt benchmark] python -m timeit -s "import re; pattern = re.compile(r'\s+')" "len(pattern.sub('', 'blah blah blah'))"
100000 loops, best of 3: 2.2 usec per loop
[matt tmp] python -m timeit "len(''.join('blah blah blah'.split()))"
1000000 loops, best of 3: 0.785 usec per loop
[matt tmp] python -m timeit "len('blah blah blah'.replace(' ', ''))"
1000000 loops, best of 3: 0.437 usec per loop
[matt tmp] python -m timeit "len('blah blah blah') - 'blah blah blah'.count(' ')"
1000000 loops, best of 3: 0.384 usec per loop
This will help you determine what is the quickest. More pythonic? I'd go with the fastest as performance is always important.
As for which is faster, you can test that yourself.
from datetime import datetime
start = datetime.now()
# some method
end = datetime.now()
diff = end-start
In terms of what is more pythonic, I don't believe any of them are more pythonic. They are each a valid implementation that most people would accept. It's just a slightly different approach. In general regular expressions will take slightly longer to run.
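For very short operations a single datetime difference is noisy, so the timeit module is usually a better fit. A rough sketch follows; the function names are illustrative, number=10000 is an arbitrary choice, and the absolute timings will differ from machine to machine. Note also that the regex variant strips all whitespace, not just spaces, so the three functions only agree on input like the one below:

```python
import re
import timeit

pattern = re.compile(r'\s+')

def count_with_sum(sentence):
    return sum(1 for c in sentence if c != ' ')

def count_with_replace(sentence):
    return len(sentence.replace(' ', ''))

def count_with_regex(sentence):
    return len(pattern.sub('', sentence))

sentence = 'blah blah blah'
for func in (count_with_sum, count_with_replace, count_with_regex):
    # Run each candidate many times and report the total elapsed seconds.
    elapsed = timeit.timeit(lambda: func(sentence), number=10000)
    print(func.__name__, func(sentence), elapsed)
```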
I need to have a 100000 characters long string. What is the most efficient and shortest way of producing such a string in python?
The content of the string is not of importance.
Something like:
'x' * 100000 # or,
''.join('x' for x in xrange(100000)) # or,
from itertools import repeat
''.join(repeat('x', times=100000))
Or for a bit of a mixup of letters:
from string import ascii_letters
from random import choice
''.join(choice(ascii_letters) for _ in xrange(100000))
Or, for some random data:
import os
s = os.urandom(100000)
You can simply do
s = 'a' * 100000
Since efficiency is important, here's a quick benchmark for some of the approaches mentioned so far:
$ python -m timeit "" "'a'*100000"
100000 loops, best of 3: 4.99 usec per loop
$ python -m timeit "from itertools import repeat" "''.join(repeat('x', times=100000))"
1000 loops, best of 3: 2.24 msec per loop
$ python -m timeit "import array" "array.array('c',[' ']*100000).tostring()"
100 loops, best of 3: 3.92 msec per loop
$ python -m timeit "" "''.join('x' for x in xrange(100000))"
100 loops, best of 3: 5.69 msec per loop
$ python -m timeit "import os" "os.urandom(100000)"
100 loops, best of 3: 6.17 msec per loop
Not surprisingly, of the ones posted, using string multiplication is the fastest by far.
Also note that it is more efficient to multiply a single char than a multi-char string (to get the same final string length).
$ python -m timeit "" "'a'*100000"
100000 loops, best of 3: 4.99 usec per loop
$ python -m timeit "" "'ab'*50000"
100000 loops, best of 3: 6.02 usec per loop
$ python -m timeit "" "'abcd'*25000"
100000 loops, best of 3: 6 usec per loop
$ python -m timeit "" "'abcdefghij'*10000"
100000 loops, best of 3: 6.03 usec per loop
Tested on Python 2.7.3
Strings can use the multiplication operator:
"a" * 100000
Try making an array of blank characters.
import array
longCharArray = array.array('c',[' ']*100000)
This will allocate an array of ' ' characters of size 100000
longCharArray.tostring()
Will convert to a string.
Just pick some character and repeat it 100000 times:
"a"*100000
Why you would want this is another question. . .
You can try something like this:
"".join(random.sample(string.lowercase * 385,10000))
As a one liner:
''.join([chr(random.randint(32, 126)) for x in range(30)])
Change the range() value to get a different length of string; change the bounds of randint() to get a different set of characters.
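The two knobs described above can be turned into parameters. A minimal sketch, where random_string and its parameter names are hypothetical and the defaults mirror the one-liner (printable ASCII, code points 32 to 126 inclusive):

```python
import random

def random_string(length, low=32, high=126):
    # Hypothetical helper: low and high are inclusive code points passed to randint().
    return ''.join(chr(random.randint(low, high)) for _ in range(length))

s = random_string(100000)
print(len(s))  # 100000
```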
I need to check that some text only contains lower-case letters a-z and a comma (",").
What is the best way to do this in Python?
import re
def matches(s):
    return re.match("^[a-z,]*$", s) is not None
Which gives you:
>>> matches("tea and cakes")
False
>>> matches("twiddledee,twiddledum")
True
You can optimise a bit with re.compile:
import re
matcher = re.compile("^[a-z,]*$")
def matches(s):
    return matcher.match(s) is not None
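One caveat with this pattern: $ also matches just before a newline at the very end of the string, so "abc\n" would pass the check. If that matters, a possible variant uses fullmatch (available on compiled patterns since Python 3.4). The name matches_strict is hypothetical:

```python
import re

matcher = re.compile(r"[a-z,]*")

def matches_strict(s):
    # fullmatch succeeds only if the whole string, trailing newline included, fits the pattern.
    return matcher.fullmatch(s) is not None

print(matches_strict("twiddledee,twiddledum"))  # True
print(matches_strict("tea and cakes"))          # False
print(matches_strict("abc\n"))                  # False
```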
import string
allowed = set(string.ascii_lowercase + ',')
if set(text) - allowed:
    pass  # you know it has forbidden characters
else:
    pass  # it doesn't have forbidden characters
Doing it with sets will be faster than doing it with for loops (especially if you want to check more than one text) and is altogether cleaner than regexes for this situation.
An alternative that might be faster than two sets is
allowed = string.ascii_lowercase + ','
if not all(letter in allowed for letter in text):
    pass  # you know it has forbidden characters
Here are some (fairly meaningless) timeit results. one is the generator expression and two is the set-based solution.
$ python -mtimeit -s'import scratch3' 'scratch3.one("asdfas2423452345sdfadf34")'
100000 loops, best of 3: 3.98 usec per loop
$ python -mtimeit -s'import scratch3' 'scratch3.two("asdfas2423452345sdfadf34")'
100000 loops, best of 3: 4.39 usec per loop
$ python -mtimeit -s'import scratch3' 'scratch3.two("asdfasasdfadsfasdfasdfdaf")'
100000 loops, best of 3: 3.51 usec per loop
$ python -mtimeit -s'import scratch3' 'scratch3.one("asdfasasdfadsfasdfasdfdaf")'
100000 loops, best of 3: 7.7 usec per loop
You can see that the set-based one is significantly faster than the generator expression with a small expected alphabet and success conditions. The generator expression is faster with failures because it can bail out early. This is pretty much what's to be expected, so it's interesting to see the numbers back it up.
Another possibility that I forgot about is the hybrid approach.
not all(letter in allowed for letter in set(text))
$ python -mtimeit -s'import scratch3' 'scratch3.three("asdfasasdfadsfasdfasdfdaf")'
100000 loops, best of 3: 5.06 usec per loop
$ python -mtimeit -s'import scratch3' 'scratch3.three("asdfas2423452345sdfadf34")'
100000 loops, best of 3: 6.71 usec per loop
It slows down the best case-ish but speeds up the worst case-ish. All in all, you'd have to test the different possibilities over a sample of your expected input. The broader the sample, the better.
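The scratch3 module used in these timings is not shown, so the following is only a guess at what its three functions might look like. The names one, two and three come from the commands above; the bodies, and the choice to return True when the text is allowed, are assumptions:

```python
import string

allowed_chars = string.ascii_lowercase + ','
allowed_set = set(allowed_chars)

def one(text):
    # Generator expression: stops at the first forbidden character.
    return all(letter in allowed_chars for letter in text)

def two(text):
    # Set difference: empty when every character is allowed.
    return not (set(text) - allowed_set)

def three(text):
    # Hybrid: deduplicate first, then test each distinct character.
    return all(letter in allowed_chars for letter in set(text))

for text in ("asdfasasdfadsfasdfasdfdaf", "asdfas2423452345sdfadf34"):
    print(text, one(text), two(text), three(text))
```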
import re
if not re.search(r'[^a-z,]', yourString):
    # condition is True: yourString contains only a-z and comma
    # condition is False: it also contains something else
    pass
Not sure what you mean by "contain", but this should point you in the right direction:
reobj = re.compile(r"[a-z,]+")
match = reobj.search(subject)
if match:
    result = match.group()
else:
    result = ""
Just:
def alllower(s):
    if ',' in s:
        s=s.replace(',','a')
    return s.isalpha() and s.islower()
This is efficient and simple.
Or in one line:
lambda s: s.replace(',','a').isalpha() and s.replace(',','a').islower()
#!/usr/bin/env python
import string
text = 'aasdfadf$oih,234'
for letter in text:
    if letter not in string.ascii_lowercase and letter != ',':
        print(letter)
Characters a-z are represented by bytes 97-122, and ord(char) returns the byte value of the character. Reading the file in binary and making the match should suffice.
retVal = False
lowerAlphabets = range(97, 123)  # byte values of 'a' through 'z'
with open("myfile", "rb") as f:
    byte = f.read(1)
    while byte:
        # Check the byte just read before moving on to the next one.
        if ord(byte) not in lowerAlphabets:
            retVal = True
            break
        byte = f.read(1)
if retVal:
    print("characters not from a - z")
else:
    print("characters from a - z")
How can I remove all characters except numbers from string?
Use re.sub, like so:
>>> import re
>>> re.sub(r'\D', '', 'aas30dsa20')
'3020'
\D matches any non-digit character, so the code above is essentially replacing every non-digit character with the empty string.
Or you can use filter, like so (in Python 2):
>>> filter(str.isdigit, 'aas30dsa20')
'3020'
Since in Python 3, filter returns an iterator instead of a list, you can use the following instead:
>>> ''.join(filter(str.isdigit, 'aas30dsa20'))
'3020'
In Python 2.*, by far the fastest approach is the .translate method:
>>> x='aaa12333bb445bb54b5b52'
>>> import string
>>> all=string.maketrans('','')
>>> nodigs=all.translate(all, string.digits)
>>> x.translate(all, nodigs)
'1233344554552'
>>>
string.maketrans makes a translation table (a string of length 256) which in this case is the same as ''.join(chr(x) for x in range(256)) (just faster to make;-). .translate applies the translation table (which here is irrelevant since all essentially means identity) AND deletes characters present in the second argument -- the key part.
.translate works very differently on Unicode strings (and strings in Python 3 -- I do wish questions specified which major-release of Python is of interest!) -- not quite this simple, not quite this fast, though still quite usable.
Back to 2.*, the performance difference is impressive...:
$ python -mtimeit -s'import string; all=string.maketrans("", ""); nodig=all.translate(all, string.digits); x="aaa12333bb445bb54b5b52"' 'x.translate(all, nodig)'
1000000 loops, best of 3: 1.04 usec per loop
$ python -mtimeit -s'import re; x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 7.9 usec per loop
Speeding things up by 7-8 times is hardly peanuts, so the translate method is well worth knowing and using. The other popular non-RE approach...:
$ python -mtimeit -s'x="aaa12333bb445bb54b5b52"' '"".join(i for i in x if i.isdigit())'
100000 loops, best of 3: 11.5 usec per loop
is 50% slower than RE, so the .translate approach beats it by over an order of magnitude.
In Python 3, or for Unicode, you need to pass .translate a mapping (with ordinals, not characters directly, as keys) that returns None for what you want to delete. Here's a convenient way to express this for deletion of "everything but" a few characters:
import string
class Del:
    def __init__(self, keep=string.digits):
        self.comp = dict((ord(c),c) for c in keep)
    def __getitem__(self, k):
        return self.comp.get(k)
DD = Del()
x='aaa12333bb445bb54b5b52'
x.translate(DD)
also emits '1233344554552'. However, putting this in xx.py we have...:
$ python3.1 -mtimeit -s'import re; x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 8.43 usec per loop
$ python3.1 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
10000 loops, best of 3: 24.3 usec per loop
...which shows the performance advantage disappears, for this kind of "deletion" tasks, and becomes a performance decrease.
s=''.join(i for i in s if i.isdigit())
Another generator variant.
You can use filter:
filter(lambda x: x.isdigit(), "dasdasd2313dsa")
On python3.0 you have to join this (kinda ugly :( )
''.join(filter(lambda x: x.isdigit(), "dasdasd2313dsa"))
You can easily do it using a regex
>>> import re
>>> re.sub(r"\D","","£70,000")
'70000'
along the lines of bayer's answer:
''.join(i for i in s if i.isdigit())
The op mentions in the comments that he wants to keep the decimal place. This can be done with the re.sub method (as per the second and IMHO best answer) by explicitly listing the characters to keep e.g.
>>> re.sub(r"[^0123456789.]","","poo123.4and5fish")
'123.45'
In Python 2,
x.translate(None, string.digits)
will delete all digits from the string. To delete letters and keep the digits, do this:
x.translate(None, string.letters)
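The two-argument form above only works on Python 2 byte strings. On Python 3, a roughly equivalent sketch passes the characters to delete as the third argument of str.maketrans (the variable names are illustrative; only ASCII letters and digits are covered, so punctuation and other characters would be left in place):

```python
import string

x = 'aaa12333bb445bb54b5b52'

# The third argument of str.maketrans lists the characters to delete.
delete_letters = str.maketrans('', '', string.ascii_letters)
delete_digits = str.maketrans('', '', string.digits)

print(x.translate(delete_letters))  # 1233344554552
print(x.translate(delete_digits))   # aaabbbbbb
```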
Use a generator expression:
>>> s = "foo200bar"
>>> new_s = "".join(i for i in s if i in "0123456789")
A fast version for Python 3:
# xx3.py
from collections import defaultdict
import string
_NoneType = type(None)
def keeper(keep):
    table = defaultdict(_NoneType)
    table.update({ord(c): c for c in keep})
    return table
digit_keeper = keeper(string.digits)
Here's a performance comparison vs. regex:
$ python3.3 -mtimeit -s'import xx3; x="aaa12333bb445bb54b5b52"' 'x.translate(xx3.digit_keeper)'
1000000 loops, best of 3: 1.02 usec per loop
$ python3.3 -mtimeit -s'import re; r = re.compile(r"\D"); x="aaa12333bb445bb54b5b52"' 'r.sub("", x)'
100000 loops, best of 3: 3.43 usec per loop
So it's a little bit more than 3 times faster than regex, for me. It's also faster than class Del above, because defaultdict does all its lookups in C, rather than (slow) Python. Here's that version on my same system, for comparison.
$ python3.3 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
100000 loops, best of 3: 13.6 usec per loop
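For reference, a short usage sketch of the keeper approach. The function is repeated so the snippet runs on its own, and decimal_keeper is a hypothetical second table added for illustration. One thing to be aware of is that a defaultdict stores every missing key it is asked for, so the table grows by one entry per distinct deleted character:

```python
from collections import defaultdict
import string

def keeper(keep):
    # Missing keys default to None, which tells translate() to delete the character.
    table = defaultdict(type(None))
    table.update({ord(c): c for c in keep})
    return table

digit_keeper = keeper(string.digits)
# Hypothetical extra table that also keeps the decimal point.
decimal_keeper = keeper(string.digits + '.')

print('aaa12333bb445bb54b5b52'.translate(digit_keeper))  # 1233344554552
print('poo123.4and5fish'.translate(decimal_keeper))      # 123.45
```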
Try:
import re
string = '1abcd2XYZ3'
string_without_letters = re.sub(r'[a-z]', '', string.lower())
this should give:
123
Ugly but works:
>>> s
'aaa12333bb445bb54b5b52'
>>> a = ''.join(filter(lambda x : x.isdigit(), s))
>>> a
'1233344554552'
>>>
$ python -mtimeit -s'import re; x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 2.48 usec per loop
$ python -mtimeit -s'import re; x="aaa12333bab445bb54b5b52"' '"".join(re.findall("[a-z]+",x))'
100000 loops, best of 3: 2.02 usec per loop
$ python -mtimeit -s'import re; x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 2.37 usec per loop
$ python -mtimeit -s'import re; x="aaa12333bab445bb54b5b52"' '"".join(re.findall("[a-z]+",x))'
100000 loops, best of 3: 1.97 usec per loop
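I observed that join with findall is faster than sub here, though note that the [a-z]+ pattern keeps the letters rather than the digits, so the two commands are not doing the same job.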
You can read each character. If it is a digit, include it in the answer. The str.isdigit() method tells you whether a character is a digit.
your_input = '12kjkh2nnk34l34'
your_output = ''.join(c for c in your_input if c.isdigit())
print(your_output) # '1223434'
You can use join + filter + lambda:
''.join(filter(lambda s: s.isdigit(), "20 years ago, 2 months ago, 2 days ago"))
Output: '2022'
Not a one liner but very simple:
buffer = ""
some_str = "aas30dsa20"
for char in some_str:
    # Keep only the digits; everything else is skipped.
    if char.isdigit():
        buffer += char
print( buffer )
I used this. 'letters' should contain all the letters that you want to get rid of:
Output = Input.translate({ord(i): None for i in 'letters'})
Example:
Input = "I would like 20 dollars for that suit"
Output = Input.translate({ord(i): None for i in 'abcdefghijklmnopqrstuvwxyzI '})
print(Output)
Output:
20
my_string="sdfsdfsdfsfsdf353dsg345435sdfs525436654.dgg("
my_string=''.join((ch if ch in '0123456789' else '') for ch in my_string)
print('output: ' + my_string)
output: 353345435525436654
Another one:
import re
re.sub('[^0-9]', '', 'ABC123 456')
Result:
'123456'