find all possible overlapping prefixes in a word using python [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
Many natural languages have prefixes that add some meaning to a word.
For example: anti in antivirus, co in coordinator, counter in counterpart.
Detecting the stem requires these prefixes to be separated. Suppose we have a list of prefixes for a certain language:
prefix_list = ['c', 'ca', 'ata', 'de']
How can I match all possible overlapping occurrences in the word "catastrophic"?
The result should be:
['c', 'ca']
Trials:
The | (alternation) character doesn't support overlapping matches.
Otto's solution doesn't match overlaps at the beginning of the word.
I tried using a look-behind assertion instead in the previous solution, but look-behind requires a fixed-width pattern.
Notes:
ata can't be a result, because the word doesn't start with ata.
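A small sketch of why the single-alternation attempt reports only one prefix. The variable names are illustrative. At position 0 the regex engine accepts the first alternative that matches and then moves on, so it never reports a second, overlapping prefix; testing each escaped prefix separately does find both.

```python
import re

prefix_list = ['c', 'ca', 'ata', 'de']
word = 'catastrophic'

# One anchored alternation: at position 0 the engine accepts the first
# alternative that matches ('c') and never reports 'ca' as well.
alternation = '^(?:' + '|'.join(map(re.escape, prefix_list)) + ')'
print(re.findall(alternation, word))  # ['c']

# Testing each (escaped) prefix on its own does report overlapping prefixes.
separate = [p for p in prefix_list if re.match(re.escape(p), word)]
print(separate)  # ['c', 'ca']
```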

Don't use a regular expression. Use a list comprehension instead:
[prefix for prefix in prefix_list if word.startswith(prefix)]
This creates a list of all entries in prefix_list that are a prefix of word.
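A runnable version of the list comprehension using the question's data. The second half is an assumption added only for illustration: if the goal is stemming, one possible (hypothetical) policy is to strip the longest matching prefix.

```python
prefix_list = ['c', 'ca', 'ata', 'de']
word = 'catastrophic'

# Each prefix is tested independently, so overlapping prefixes are all kept.
matches = [prefix for prefix in prefix_list if word.startswith(prefix)]
print(matches)  # ['c', 'ca']

# Hypothetical stemming policy (not part of the answer): strip the longest
# matching prefix; default='' covers words with no known prefix.
longest = max(matches, key=len, default='')
print(word[len(longest):])  # tastrophic
```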

Related

Finding a combination of characters and digits in a string python [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 months ago.
I have a list of lists, and I'm trying to loop through it and check whether the string at a specific index in each inner list contains "XY" followed immediately by 4 digits. The "XY" can appear anywhere in the string, so I'm struggling with the syntax beyond just using "XY" in row[5]. How do I add the digits after the "XY" to the check? Something that combines the "XY" and isdigit()? Am I stuck using the find function to return an index and then going from there?
You can use Python's regex module re with this pattern, which matches XY followed by four digits anywhere in the string.
import re

pattern = r'XY\d{4}'
my_list = [['XY0'], ['XY1234', 'AB1234'], ['XY1234', 'ABC123XY5678DEF6789']]
elem_to_check = 1
for row in my_list:
    if len(row) > elem_to_check:
        for found in re.findall(pattern, row[elem_to_check]):
            print(f'{found} found in {row[elem_to_check]}')
Output:
XY5678 found in ABC123XY5678DEF6789
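If you only need to know which rows contain such a code (a yes/no check rather than every match), re.search is enough. A sketch under two assumptions: the helper name rows_with_code is hypothetical, and "4 numbers" means exactly four. The pattern XY\d{4} alone would also match the first four digits of XY12345, so a negative look-ahead (?!\d) is added here to reject a fifth digit; drop it if longer runs are acceptable.

```python
import re

# XY followed by exactly four digits (the look-ahead forbids a fifth digit).
XY_CODE = re.compile(r'XY\d{4}(?!\d)')

def rows_with_code(rows, index):
    """Hypothetical helper: rows whose string at `index` contains an XY code."""
    return [row for row in rows
            if len(row) > index and XY_CODE.search(row[index])]

my_list = [['XY0'], ['XY1234', 'AB1234'], ['XY1234', 'ABC123XY5678DEF6789']]
print(rows_with_code(my_list, 1))  # [['XY1234', 'ABC123XY5678DEF6789']]
```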

Split a string into segments in python [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 6 years ago.
I'm trying to split a molecule, given as a string, into its individual atom components. Each atom starts with a capital letter and ends after its last digit.
For example, 'SO4' would become ['S', 'O4'].
And 'C6H12O6' would become ['C6', 'H12', 'O6'].
Pretty sure I need to use the regex module. This answer is close to what I'm looking for: Split a string at uppercase letters
Use re.findall() with the pattern:
[A-Z][a-z]?\d*
[A-Z] matches any uppercase character
[a-z]? matches zero or one lowercase character
\d* matches zero or more digits
Based on your examples this should work, although you may want to look for a dedicated library for this purpose.
Example:
>>> import re
>>> re.findall(r'[A-Z][a-z]?\d*', 'C6H12O6')
['C6', 'H12', 'O6']
>>> re.findall(r'[A-Z][a-z]?\d*', 'SO4')
['S', 'O4']
>>> re.findall(r'[A-Z][a-z]?\d*', 'HCl')
['H', 'Cl']
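If you also need the number of each atom, putting the symbol and the digits in separate capture groups makes re.findall return (symbol, digits) tuples. A minimal sketch: the atom_counts helper is hypothetical, a missing count is treated as 1, and, like the pattern above, it does not handle parentheses such as Ca(OH)2.

```python
import re

# Group 1: element symbol; group 2: its (possibly empty) count.
ATOM = re.compile(r'([A-Z][a-z]?)(\d*)')

def atom_counts(formula):
    """Hypothetical helper: map each element symbol to its total count."""
    counts = {}
    for symbol, digits in ATOM.findall(formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts

print(atom_counts('C6H12O6'))  # {'C': 6, 'H': 12, 'O': 6}
print(atom_counts('SO4'))      # {'S': 1, 'O': 4}
```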

How to split array of strings into sub-arrays according to special starting character? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
Can you translate this Python code into Ruby? Let's say I have this data:
data = ['hello', 'person', ';hello', 'otherperson']
print([x.split("#") for x in "#".join(data).split(";")])
When I print it it prints this:
[['hello', 'person', ''], ['hello', 'otherperson']]
Is there something like this in Ruby? If it can be done in one line I would prefer that, but mainly I just want to know how it's done.
Literally translated:
data.join(?#).split(?;).map { |x| x.split(?#) }
But you might want a different approach entirely: this will misbehave if any of the strings contain #.
This produces the intended output, but note that it modifies the original strings, so ideally data is a deep clone (or altering the contained strings is otherwise not a problem):
data.slice_before { |s| s.gsub!(/^;/,'') }.to_a
=> [["hello", "person"], ["hello", "otherperson"]]
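For comparison, the slice_before idea can also be written in Python without the join/split sentinel, so strings containing # are safe and the input list is not modified. A sketch under assumptions: the function name split_on_marker is hypothetical, and unlike the original one-liner it does not produce the trailing '' in the first group.

```python
def split_on_marker(items, marker=';'):
    """Hypothetical helper: start a new group at each item beginning with marker."""
    groups = []
    for item in items:
        if item.startswith(marker):
            # Start a new group; store the item without its marker.
            groups.append([item[len(marker):]])
        elif groups:
            groups[-1].append(item)
        else:
            # Items before the first marker form the first group.
            groups.append([item])
    return groups

data = ['hello', 'person', ';hello', 'otherperson']
print(split_on_marker(data))  # [['hello', 'person'], ['hello', 'otherperson']]
```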

regex for a character pattern that is scattered throughout the text [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I'm a Python and regex noob. I managed to get a full page of HTML source into the command line with the following statement:
print(driver.page_source.encode('utf-8'))
Cool. But there are some predictable strings in that text that I need to extract and store in an array. The pattern I'm looking for is 4 digits, followed by a hyphen, followed by between 1 and 5 digits, e.g.:
2013-80324 or 2013-03 but not 2013-832888
Thanks for any help.
(?:^|(?<=\D))\d{4}-\d{1,5}(?=\D|$)
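(?:...) denotes a non-capturing group
^ matches at the start of the string (though that is unlikely for HTML input)
(?<=\D) is a look-behind requiring the preceding character to be a non-digit
(?=\D|$) is a look-ahead requiring a non-digit or the end of the string to follow
$ matches at the end of the string
\d denotes a digit (in Python 3 str patterns, any Unicode decimal digit, not only [0-9]) and \D a non-digit
{n} is a quantifier for exactly n repetitions
{m,n} quantifies a range of m to n repetitions (both inclusive)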
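A sketch of using the pattern with re.findall to collect the matches into a list. The HTML string is a made-up stand-in for driver.page_source. The second pattern is an equivalent, shorter form: a negative look-behind (?<!\d) and a negative look-ahead (?!\d) both also succeed at the string boundaries, so they replace the ^ and $ alternatives.

```python
import re

# The answer's pattern and an equivalent form with negative look-arounds.
ANSWER_PATTERN = r'(?:^|(?<=\D))\d{4}-\d{1,5}(?=\D|$)'
SHORT_PATTERN = r'(?<!\d)\d{4}-\d{1,5}(?!\d)'

# Hypothetical sample text standing in for driver.page_source.
html = '<td>2013-80324</td><td>2013-03</td><td>2013-832888</td>'

print(re.findall(ANSWER_PATTERN, html))  # ['2013-80324', '2013-03']
print(re.findall(SHORT_PATTERN, html))   # ['2013-80324', '2013-03']
```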

“Indentation Error: unindent does not match any outer indentation level” [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
Can you please tell me what is wrong with this code?
def insert_sequence(str1, str2, index):
    '''The first two parameters are DNA sequences and the third parameter
    is an index. Return the DNA sequence obtained by inserting the second
    DNA sequence into the first DNA sequence at the given index.
    >>>insert_sequence('CCGG', 'AT',2)
    CCATGG
    '''
   str1 = str1[0:index] + str2 + str1[index:len(str1)]
   return str1
Your docstring is indented one space further than the rest of the function body. Either dedent the docstring one space or indent the rest one space (probably the latter, since that would make it four spaces, if I'm counting right).
Python is very strict about indentation, so you need to make sure all your blocks are aligned correctly. The problem here is that the docstring is indented one space further than the two lines below it. Align them and you should be good.
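A sketch of the function with every line of the body, docstring included, at the same four-space indentation. Two further details are fixed so the docstring example actually runs under the standard doctest module: doctest rejects a prompt with no blank after >>>, and the expected output needs quotes because the interactive shell shows the repr of the returned string. str1[index:] is simply a shorter spelling of str1[index:len(str1)].

```python
def insert_sequence(str1, str2, index):
    """Return DNA sequence str1 with str2 inserted at position index.

    >>> insert_sequence('CCGG', 'AT', 2)
    'CCATGG'
    """
    # Same 4-space indentation as the docstring above.
    return str1[:index] + str2 + str1[index:]


if __name__ == '__main__':
    import doctest
    doctest.testmod()
```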
