I'm looking for a way to find words that contain a given character an exact number of times.
For example:
If we have this input: ['teststring1','strringrr','wow','strarirngr'] and we are looking for 4 r characters,
it should return only ['strringrr','strarirngr'], because those are the words that contain exactly 4 letters r.
I decided to use regex, but after reading the documentation I can't find a function that does this.
I tried [r{4}], but it apparently matches any word that has an r in it.
Please help.
Something like this:
import collections

def map_characters(string):
    # count how often each character occurs in the string
    characters = collections.defaultdict(int)
    for char in string:
        characters[char] += 1
    return characters

items = ['teststring1','strringrr','wow','strarirngr']
for item in items:
    characters_map = map_characters(item)
    # if the string contains exactly 4 'r' letters,
    # we print it
    if characters_map['r'] == 4:
        print(item)
# in the result it outputs strringrr and strarirngr
# because these words have exactly 4 r letters
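The same counting can be done with collections.Counter from the standard library, which replaces the hand-written map_characters helper. A minimal sketch, assuming we want exactly 4 'r' characters; letter and amount are just illustrative variable names:

```python
from collections import Counter

items = ['teststring1', 'strringrr', 'wow', 'strarirngr']
letter = 'r'
amount = 4

# Counter(item) maps each character to its count; a missing key counts as 0
matches = [item for item in items if Counter(item)[letter] == amount]
print(matches)  # ['strringrr', 'strarirngr']
```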
You can use str.count() to count the occurrences of a character, combined with a list comprehension to create a new list:
myArray = ['teststring1','strringrr','wow','strarirngr']
letter = "r"
amount = 4
filtered = [item for item in myArray if item.count(letter) == amount]
print(filtered) # ['strringrr', 'strarirngr']
If you wanted to make this reusable (to look for different letters or different amounts), you could pack it into a function:
def filterList(stringList, pattern, occurrences):
    return [item for item in stringList if item.count(pattern) == occurrences]
myArray = ['teststring1','strringrr','wow','strarirngr']
letter = "r"
amount = 4
print(filterList(myArray, letter, amount)) # ['strringrr', 'strarirngr']
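The square brackets define a character class, which matches a single character from the set, e.g. [abc] matches one a, b or c. Inside a class, {4} is not a quantifier, so [r{4}] matches any one of the characters r, {, 4 or }; that is why any word with a single r is a match. Removing the brackets gives r{4}, but that means four r's in a row, so it would miss both strringrr and strarirngr (neither has four consecutive r's) and would also match words with five or more r's in a row.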
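To see the difference, here is a small check of both patterns. The extra word 'brrrrr' is a hypothetical input added only to show that r{4} also matches a run of more than four r's:

```python
import re

words = ['teststring1', 'strringrr', 'wow', 'strarirngr', 'brrrrr']

# [r{4}] is a character class: it matches ONE character that is r, {, 4 or }
class_hits = [w for w in words if re.search(r'[r{4}]', w)]
# r{4} means four r's in a row, so scattered r's are never found
run_hits = [w for w in words if re.search(r'r{4}', w)]

print(class_hits)  # ['teststring1', 'strringrr', 'strarirngr', 'brrrrr']
print(run_hits)    # ['brrrrr']
```

Neither pattern counts the r's across the whole word, which is why counting (for example with str.count()) is the more direct tool here.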
Since you asked about using regex, you could use the following:
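import re
l = ['teststring1', 'strringrr', 'wow', 'strarirngr']
# [^r]* skips non-r characters; each (?:r[^r]*) consumes exactly one r,
# and fullmatch forces the whole word to be consumed, so there are exactly 4
print([word for word in l if re.fullmatch(r'[^r]*(?:r[^r]*){4}', word)])
output: ['strringrr', 'strarirngr']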
So I need to write a program which, given for example the input 3[a]2[b], prints "aaabb", or given 3[ab]2[c], prints "abababcc" (basically it prints each bracketed string the given number of times, in the given order). I tried to use a for loop to iterate over the input and detect "[" characters in it, so it knows what to print repeatedly, but I don't know how to make it also understand where that string ends.
This is as far as I could get, which probably isn't very useful:
string = input()
string = string[::-1]
bulundu = 6
lst = []  # collects the characters found
for i in string:
    if i != "]":
        if i != "[":
            lst.append(i)
    if i == "[":
        break
The approach I took is to remove the brackets, split the items into a list, then walk the list, and if the item is a number, add that many repeats of the next item to the result for output:
import re
data = "3[a]2[b]"
# Remove brackets and convert to a list
data = re.sub(r'[\[\]]', ' ', data).split()
result = []
for i, item in enumerate(data):
    # If item is a number, add that many copies of the next item
    if item.isdigit():
        result.append(data[i+1] * int(item))
print(''.join(result))
# aaabb
A different approach, inspired by Subbu's use of re.findall. This approach finds all 'pairs' of numbers and letters using match groups, then multiplies them to produce the required text:
import re
data = "3[a]2[b]"
matches = re.findall(r'(\d+)\[([a-zA-Z]+)\]', data)
# [('3', 'a'), ('2', 'b')]
for x in matches:
    print(x[1] * int(x[0]), end='')
# aaabb
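Instead of collecting the matches and printing them, re.sub can take a function as the replacement and expand each number/bracket pair in place. A sketch (the name expand is just illustrative); unlike the [a-zA-Z]+ pattern above, [^\]]* also accepts digits inside the brackets, as in 5[12]:

```python
import re

def expand(data):
    # Each match is "<digits>[<text>]"; the replacement function returns
    # the text repeated <digits> times. [^\]]* allows digits inside brackets.
    return re.sub(r'(\d+)\[([^\]]*)\]',
                  lambda m: m.group(2) * int(m.group(1)),
                  data)

print(expand("3[a]2[b]"))   # aaabb
print(expand("3[ab]2[c]"))  # abababcc
print(expand("5[12]6[c]"))  # 1212121212cccccc
```

Text that is not part of a number/bracket pair is left unchanged in the output, rather than dropped as with findall.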
Lengthy and documented version using NO regex, only simple string and list manipulation:
first, split the input into parts that are numbers and texts,
then recombine them again.
I opted to document with inline comments.
This could be done like so:
# testcases are tuples of input and correct result
testcases = [("3[a]2[b]", "aaabb"),
             ("3[ab]2[c]", "abababcc"),
             ("5[12]6[c]", "1212121212cccccc"),
             ("22[a]", "a" * 22)]

# now we use our algorithm for all those testcases
for inp, res in testcases:
    split_inp = []   # list that takes the split values of the input
    num = 0          # accumulator for multi-digit numbers
    in_text = False  # bool that tells us if we are currently collecting letters

    # go over all characters: O(n)
    for c in inp:
        # when a [ is reached our num is complete and we need to store it;
        # we collect all further characters until the next ] in a list that
        # we add at the end of our split_inp
        if c == "[":
            split_inp.append(num)  # add the completed number
            num = 0                # and reset it to 0
            in_text = True         # now in text
            split_inp.append([])   # add a list to collect letters
        # done collecting letters
        elif c == "]":
            in_text = False                         # no longer collecting, convert letters
            split_inp[-1] = ''.join(split_inp[-1])  # to text
        # between [ and ] ... simply add the character to the list at the end
        elif in_text:
            split_inp[-1].append(c)  # add letter
        # currently collecting digits
        else:
            num *= 10      # shift the current number one decimal place
            num += int(c)  # add the newest digit

    print(repr(inp), split_inp, sep="\n")  # debugging output for parsing part

    # now we need to build the string from our parsed data
    amount = 0
    result = []  # intermediate list to join ['aaa', 'bb']
    # iterate the list: if int, remember it; if text, build the repeated part
    for part in split_inp:
        if isinstance(part, int):
            amount = part
        else:
            result.append(part * amount)

    # join the parts
    result = ''.join(result)

    # check if all worked out
    if result == res:
        print("CORRECT:", result + "\n")
    else:
        print(f"INCORRECT: should be '{res}' but is '{result}'\n")
Result:
'3[a]2[b]'
[3, 'a', 2, 'b']
CORRECT: aaabb
'3[ab]2[c]'
[3, 'ab', 2, 'c']
CORRECT: abababcc
'5[12]6[c]'
[5, '12', 6, 'c']
CORRECT: 1212121212cccccc
'22[a]'
[22, 'a']
CORRECT: aaaaaaaaaaaaaaaaaaaaaa
This will also handle cases like '5[12]', which some of the other solutions won't.
You can capture both the number of repetitions n and the string to repeat v in one go using the pattern below. It matches a sequence of digits (the first group we need to capture, which is why \d+ is wrapped in parentheses), followed by a [, followed by anything, matched non-greedily with .*? (this is the second group of interest, so it is also wrapped in parentheses), which is then followed by a ].
findall will find all these matches in the passed line and return them as (n, v) tuples of strings; the first group, the number, is cast to an int and used as a multiplier for the string v. The resulting repeated strings are then joined with an empty string. Malformed patterns may throw exceptions or return nothing.
Anyway, in code:
import re
pattern = re.compile(r"(\d+)\[(.*?)\]")
def func(x):
    return "".join(v * int(n) for n, v in pattern.findall(x))
print(func("3[a]2[b]"))
print(func("3[ab]2[c]"))
OUTPUT
aaabb
abababcc
FOLLOW UP
Another solution which achieves the same result without using regular expressions (OK, not nice at all, I get it...):
def func(s): return "".join([int(x[0])*x[1] for x in map(lambda x:x.split("["), s.split("]")) if len(x) == 2])
I'm not much more than a beginner myself, and looking at the other answers, I thought regex might be a challenge for a new contributor such as yourself, since I haven't really dealt with regex either.
The beginner-friendly way to do this might be to loop through the input string and use string methods like isnumeric() and isalpha():
data = "3[a]2[b]"
chars = []
nums = []
substrings = []
for char in data:
    if char.isnumeric():
        nums.append(char)
    if char.isalpha():
        chars.append(char)
for i, char in enumerate(chars):
    substrings.append(char * int(nums[i]))
string = "".join(substrings)
print(string)
OUTPUT:
aaabb
And on trying different values for data:
data = "0[a]2[b]3[p]"
OUTPUT bbppp
data = "1[a]1[a]2[a]"
OUTPUT aaaa
NOTE: In case you're not familiar with the above functions, they are string methods, and fairly self-explanatory. They are used as <your_string_here>.isalpha(), which returns True if and only if the string is non-empty and every character in it is a letter (whitespace, digits, and symbols make it return False).
And similarly, isnumeric() returns True only if the string is non-empty and every character in it is numeric.
For example,
"]".isnumeric() and "]".isalpha() return False
"a".isalpha() returns True
If you need any clarification on a function used, please do not hesitate to leave a comment.
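Note that the version above pairs the i-th digit with the i-th letter, so it only works when every count is a single digit and every bracket holds a single letter: "22[a]" gives "aa", and "3[ab]2[c]" raises an IndexError. Below is a sketch that keeps the same beginner-friendly style (only basic string methods, no regex) but collects whole numbers and whole bracket contents; the function name decode is hypothetical:

```python
def decode(data):
    result = []
    count = ""      # digits of the current repeat count, e.g. "22"
    text = ""       # characters collected between "[" and "]"
    inside = False  # True while we are between "[" and "]"
    for char in data:
        if char == "[":
            inside = True
        elif char == "]":
            # the bracket is closed: repeat the collected text
            result.append(text * int(count))
            count, text, inside = "", "", False
        elif inside:
            text += char
        elif char.isnumeric():
            count += char
    return "".join(result)

print(decode("3[ab]2[c]"))     # abababcc
print(decode("22[a]"))         # 22 copies of "a"
print(decode("0[a]2[b]3[p]"))  # bbppp
```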
Let me keep it simple,
I have a string that I want to split, turning "10fo22baar" into either ["1022","fobaar"] or ["10","fo","22","baar"].
Is there a way to do something like that in Python 3 or 2?
Part 1: You can use filter with str.isdigit() to keep only the numeric characters:
>>> my_str = "10fo22baar"
>>> ''.join(filter(str.isdigit, my_str))
'1022'
To get the non-numeric characters, you can use itertools.filterfalse():
>>> from itertools import filterfalse
>>> ''.join(filterfalse(str.isdigit, my_str))
'fobaar'
# OR, with a generator expression (works in any version; in Python 2, filterfalse is called itertools.ifilterfalse):
# ''.join(c for c in my_str if not c.isdigit())
Store the above values in a list to get your desired format.
Alternatively, you can use regex to extract the digits and the letters separately:
import re
my_str = "10fo22baar"
# - To extract digits, use the expression r"\d+"
# - To extract letters, use the expression "[a-zA-Z]+"
digits = ''.join(re.findall(r'\d+', my_str))
# where the `digits` variable will hold the string:
# '1022'
alphabets = ''.join(re.findall('[a-zA-Z]+', my_str))
# where the `alphabets` variable will hold the string:
# 'fobaar'
# Create your desired list from above variables:
# my_list = [digits, alphabets]
You can condense the above logic into one line:
my_regex = [r'\d+', '[a-zA-Z]+']
my_list = [''.join(re.findall(r, my_str)) for r in my_regex]
# where `my_list` will give you:
# ['1022', 'fobaar']
Part 2: You can use itertools.groupby() to get your second format, with digits and letters grouped together in a single list, maintaining the order:
from itertools import groupby
my_list = [''.join(x) for _, x in groupby(my_str, str.isdigit)]
# where `my_list` will give you:
# ['10', 'fo', '22', 'baar']
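If you are already using re, the second format can also come from a single findall with an alternation. A sketch:

```python
import re

my_str = "10fo22baar"
# \d+ matches a run of digits, \D+ a run of non-digits;
# findall returns the runs in the order they appear
parts = re.findall(r'\d+|\D+', my_str)
print(parts)  # ['10', 'fo', '22', 'baar']
```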
You could use a for loop, like this:
my_str = "10fo22baar"
nums = []
chars = []
for char in my_str:
    try:
        # int() succeeds only for digit characters
        int(char)
        nums.append(char)
    except ValueError:
        chars.append(char)
sep = ["".join(nums), "".join(chars)]
print(sep)
Output would be: ['1022', 'fobaar']
Using string methods:
s = "10fo22baar"
num = ""
string = ""
for char in s:
    if char.isnumeric():
        num += char
    else:
        string += char
print((num, string))
Gives ('1022', 'fobaar')
Here's a simple and easy to understand solution.
mystring = '10fo22baar'
nums = []
chars = []
for char in mystring:
    if char in ['0','1','2','3','4','5','6','7','8','9']:
        nums.append(char)
    else:
        chars.append(char)
How it works:
We start with mystring set to the string we want to read.
We define two new lists for our numbers and regular chars.
We loop through each char in mystring.
If the current char of the loop iteration is a number, we append it to the number list.
If it's not a number, it must be a normal char. We append it to chars.
That's it
1- First, we use a for loop to go through the entire string.
2- Then, to check whether each character is a number or not, we can use either of two methods:
string.isnumeric()
or
string.isalpha()
3- After checking, we separate the characters into lists and format them to our liking.
Our code looks like this:
myString = '10fo22baar'
charString = []
charNum = []
for char in myString:
    if char.isnumeric():
        charNum.append(char)
    else:
        charString.append(char)
myString = [''.join(charNum), ''.join(charString)]
print(myString)
I am trying to collapse words that consist of a single repeated character using regex in Python, for example:
good => good
gggggggg => g
What I have tried so far is the following:
re.sub(r'([a-z])\1+', r'\1', 'ffffffbbbbbbbqqq')
The problem with the above solution is that it changes good to god, and I only want to collapse words that consist of a single repeated character.
A better approach here is to use a set:
def modify(s):
    # Create a set from the string
    c = set(s)
    # If you have only one character in the set, convert set to string
    if len(c) == 1:
        return ''.join(c)
    # Else return original string
    else:
        return s
print(modify('good'))
print(modify('gggggggg'))
If you want to use regex, mark the start and end of the string in the regex with ^ and $ (inspired by @bobblebubble's comment):
import re
def modify(s):
    # Create the substring with a regex which only matches if a single character is repeated,
    # marking the start and end of the string as well
    out = re.sub(r'^([a-z])\1+$', r'\1', s)
    return out
print(modify('good'))
print(modify('gggggggg'))
The output will be
good
g
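The ^ and $ anchors only work when s is a single word. If the input is a whole sentence, word boundaries (\b) can be used instead, so each word is checked on its own. A sketch, assuming lowercase words as in the [a-z] pattern above (collapse_words is an illustrative name):

```python
import re

def collapse_words(text):
    # \b...\b makes the match span one whole word made of a single repeated letter
    return re.sub(r'\b([a-z])\1+\b', r'\1', text)

print(collapse_words('good gggggggg food zzz'))  # good g food z
```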
If you do not want to use a set in your method, this should do the trick:
def simplify(s):
    l = len(s)
    if l > 1 and s.count(s[0]) == l:
        return s[0]
    return s
print(simplify('good'))
print(simplify('abba'))
print(simplify('ggggg'))
print(simplify('g'))
print(simplify(''))
output:
good
abba
g
g
Explanations:
You compute the length of the string.
You count the number of characters that are equal to the first one and compare that count with the string length.
Depending on the result, you return the first character or the whole string.
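In C#, you can use the Trim method.
Take a look at this example:
"ggggggg".Trim('g');
(Note that Trim removes every leading and trailing g, so this returns an empty string rather than "g".)
Update:
For characters which are in the middle of the string, use this function, thanks to this answer.
In C#:
// requires `using System.Linq;` for Distinct()
public static string RemoveDuplicates(string input)
{
    return new string(input.ToCharArray().Distinct().ToArray());
}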
In Python:
used = set()
# mylist is the input sequence (e.g. a string); keeps the first occurrence of each item
unique = [x for x in mylist if x not in used and (used.add(x) or True)]
But I think none of these answers handles a case like aaaaabbbbbcda: this string has an a at the end which does not appear in the result (abcd). For this kind of situation, use this function, which I wrote:
In:
def unique(s):
    used = set()
    ret = list()
    s = list(s)
    for x in s:
        if x not in used:
            ret.append(x)
            used = set()
            used.add(x)
    return ret
print(unique('aaaaabbbbbcda'))
out:
['a', 'b', 'c', 'd', 'a']
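The same run-collapsing can be written with itertools.groupby, which groups consecutive equal characters; keeping one key per group gives the collapsed string. A sketch (collapse_runs is an illustrative name):

```python
from itertools import groupby

def collapse_runs(s):
    # groupby yields (character, run) pairs, one per run of identical characters
    return ''.join(char for char, _ in groupby(s))

print(collapse_runs('aaaaabbbbbcda'))  # abcda
print(collapse_runs('gggggggg'))       # g
```

Like the asker's original re.sub(r'([a-z])\1+', r'\1', ...), this collapses runs everywhere in the string, so good becomes god.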
I'm trying to create a basic program to pick out the positions of words in a quote. So far, I've got the following code:
print("Your word appears in your quote at position(s)", string.index(word))
However, this only prints the first position where the word appears, which is fine if the quote only contains the word once; but if the word appears multiple times, it still prints only the first position and none of the others.
How can I make it so that the program will print every position in succession?
Note: very confusingly, string here stores a list. The program is supposed to find the positions of words stored within this list.
It seems that you're trying to find occurrences of a word inside a string: the re library has a function called finditer that is ideal for this purpose. We can use this along with a list comprehension to make a list of the indexes of a word:
>>> import re
>>> word = "foo"
>>> string = "Bar foo lorem foo ipsum"
>>> [x.start() for x in re.finditer(word, string)]
[4, 14]
This function will find matches even if the word is inside another, like this:
>>> [x.start() for x in re.finditer("foo", "Lorem ipsum foobar")]
[12]
If you don't want this, surround your word with word boundaries (\b), escaping it in case it contains special characters:
[x.start() for x in re.finditer(r"\b" + re.escape(word) + r"\b", string)]
Probably not the fastest/best way, but it will work. I used in rather than == in case there are quotation marks or other unexpected punctuation as well! Hope this helps!
def getWord(string, word):
    index = 0
    data = []
    for i in string.split(' '):
        # check whether the word occurs inside this token, e.g. 'foo' in '"foo,'
        if word.lower() in i.lower():
            data.append(index)
        index += 1
    return data
Here is some code I quickly put together that should work:
string = "Hello my name is Amit and I'm answering your question".split(' ')
indices = [index for (index, word) in enumerate(string) if word == "QUERY"]
That returns the index of the word in the list, not of its first letter. To get the character position, add up the lengths of all the words before it, plus one for each separating space.
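A sketch of that calculation, assuming the sentence is split on single spaces (so each separator adds exactly one character); query is a placeholder for the word being searched:

```python
sentence = "Hello my name is Amit and I'm answering your question"
query = "is"

offsets = []
pos = 0  # character position where the current word starts
for word in sentence.split(' '):
    if word == query:
        offsets.append(pos)
    pos += len(word) + 1  # skip the word and the one space after it

print(offsets)  # [14]
```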
I have a string of characters with no specific pattern. I have to look for some specific words and then extract some information.
Currently I am stuck on finding the position of the last digit in a string.
So, for example if:
mystring="The total income from company xy was 12320 for the last year and 11932 in the previous year"
I want to find out the position of the last number in this string.
So the result should be "2" at position "70" (counting from 1).
You can do this with a regular expression; here's a quick attempt:
>>> import re
>>> mo = re.match('.+([0-9])[^0-9]*$', mystring)
>>> print(mo.group(1), mo.start(1))
2 69
This is a 0-based position, of course.
You can use a generator expression that loops over enumerate() from the end, inside next():
>>> next(i for i,j in list(enumerate(mystring,1))[::-1] if j.isdigit())
70
Or using regex :
>>> import re
>>>
>>> m=re.search(r'(\d)[^\d]*$',mystring)
>>> m.start()+1
70
Save all the numbers from the string in a list and pop the last one:
array = [int(s) for s in mystring.split() if s.isdigit()]
lastdigit = array.pop()
This may read more simply than a regex, but note that it gives the last number itself (11932), not its position in the string, and it only finds numbers that are separated by spaces.
def find_last(s):
    # walk the (index, character) pairs from the end of the string
    temp = list(enumerate(s))
    temp.reverse()
    for pos, ch in temp:
        try:
            return (pos, int(ch))
        except ValueError:
            continue
You could reverse the string and get the first match with a simple regex:
import re
s = mystring[::-1]
m = re.search(r'\d', s)
pos = len(s) - m.start(0)  # 1-based position in the original string (70 here)
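Without regex or reversing, str.rfind can be asked for the last occurrence of each of the ten digits; the largest index among them is the last digit overall (rfind returns -1 for digits that never occur). This sketch gives a 0-based index:

```python
mystring = ("The total income from company xy was 12320 "
            "for the last year and 11932 in the previous year")

# last index of each digit character; -1 if that digit is absent
last = max(mystring.rfind(d) for d in '0123456789')

if last == -1:
    print("no digits found")
else:
    print(mystring[last], last)  # 2 69
```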