I have a list of strings that all follow a format of parts of the name divided by underscores. Here is the format:
string="somethingX_somethingY_one_two"
What I want to know is how to remove "one_two" from each string in the list and rebuild the list so that each entry only has "somethingX_somethingY". I know that in C there is a strtok function that is useful for splitting into tokens, but I'm not sure whether there is a method like that, or a strategy to get the same effect, in Python. Help me please?
You can use split and a list comprehension:
l = ['_'.join(s.split('_')[:2]) for s in l]
If you're literally trying to remove "_one_two" from the end of the strings, then you can do this:
tail_len = len("_one_two")
strs = [s[:-tail_len] for s in strs]
If you want to remove the last two underscore-separated components, then you can do this:
strs = ["_".join(s.split("_")[:-2]) for s in strs]
If neither of these is what you want, then please update the question with more details.
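As a possible variant of the second approach, here is a minimal sketch using str.rsplit with maxsplit=2, which splits from the right so that the first element is everything before the last two components. The name strs is taken from the answer above; the sample values are made up for illustration.

```python
# Sample data (hypothetical); each entry ends in two extra components.
strs = ["somethingX_somethingY_one_two", "a_b_c_d_e"]

# rsplit with maxsplit=2 cuts at the last two underscores only,
# so element 0 is the part we want to keep.
trimmed = [s.rsplit("_", 2)[0] for s in strs]
print(trimmed)  # ['somethingX_somethingY', 'a_b_c']
```

Note that for a string with fewer than two underscores this keeps the first component, whereas the join-based version returns an empty string.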
I think this does what you're asking for.
s = "somethingX_somethingY_one_two"
splitted = s.split( "_" )
splitted = [ x for x in splitted if "something" in x ]
print("_".join(splitted))
I have some string which contains parts separated by commas and need to add some part to each and assign all to array of variables.
the string looks like
chp_algos = 'AES256_SSE','AES128_CBC','AES64_CBC','AES33_CBC'
I want to put in array which looks like:
arr = [
[AES128_CBC],
[AES128_CBC_fer],
[AES128_SSE],
[AES128_SSE_fer],
[AES64_CBC],
[AES64_CBC_fer],
[AES33_CBC],
[AES33_CBC_fer]
]
and I want to map the following final result to db
f = 'AES128_CBC_fer AES128_SSE_fer AES64_CBC_fer AES33_CBC_fer'
As written in the question, chp_algos is a tuple, not a string, so it is already "split".
I'd recommend not using a list of lists. Just create a list of strings.
from itertools import chain
arr = list(chain.from_iterable([x, x + '_fer'] for x in chp_algos))
Output
['AES256_SSE',
'AES256_SSE_fer',
'AES128_CBC',
'AES128_CBC_fer',
'AES64_CBC',
'AES64_CBC_fer',
'AES33_CBC',
'AES33_CBC_fer']
With that, you can filter.
But you could also just skip arr and build a new list from concatenating to the original string values
f = ' '.join(x for x in arr if x.endswith('_fer'))
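To illustrate the remark about skipping arr, here is a minimal sketch that appends the suffix to each original value directly. It assumes chp_algos is the tuple from the question.

```python
chp_algos = 'AES256_SSE', 'AES128_CBC', 'AES64_CBC', 'AES33_CBC'

# Append the suffix to every original value and join with spaces.
f = ' '.join(x + '_fer' for x in chp_algos)
print(f)  # AES256_SSE_fer AES128_CBC_fer AES64_CBC_fer AES33_CBC_fer
```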
You can do this by sorting chp_algos then using f strings in a generator expression
>>> ' '.join(f'{i}_fer' for i in sorted(chp_algos))
'AES128_CBC_fer AES256_SSE_fer AES33_CBC_fer AES64_CBC_fer'
Could you clarify your string chp_algos? The way you wrote it now it is not compatible with python.
Anyway, what you can do in your case, assuming that chp_algos is a string of the form chp_algos= "'AES256_SSE','AES128_CBC','AES64_CBC','AES33_CBC'", then you can split the string into a list of strings via chp_algos.split(",").
The argument of split() is the delimiter which should be used to split the string.
Now you have something like array = ["'AES256_SSE'", "'AES128_CBC'", "'AES64_CBC'", "'AES33_CBC'"].
To get the array that you want you can just do a simple loop through your array:
arr = []
for element in array:
    arr.append([element])
    arr.append([element + "_fer"])
Now you might have some issues with the quotes (this depends on what your data looks like). You can remove them by slicing element: replace element in the code above with element[1:-1], which drops the first and the last character of the string.
To get the f string in the very end, you can just loop through arr[1::2] which returns every 2nd element of arr starting at the second one (index 1).
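Put together, the steps of this answer might look like the sketch below. It assumes the input really is one string with quoted, comma-separated names, and it uses strip("'") rather than index slicing to drop the quotes.

```python
# Assumption: the input is a single string with quoted, comma-separated names.
chp_algos = "'AES256_SSE','AES128_CBC','AES64_CBC','AES33_CBC'"

arr = []
for element in chp_algos.split(","):
    name = element.strip("'")  # drop the surrounding quotes
    arr.append([name])
    arr.append([name + "_fer"])

# Every second entry, starting at index 1, holds a suffixed name.
f = " ".join(item[0] for item in arr[1::2])
print(f)  # AES256_SSE_fer AES128_CBC_fer AES64_CBC_fer AES33_CBC_fer
```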
Say we have a string:
s = 'AES256_SSE,AES128_CBC,AES64_CBC,AES33_CBC'
In order to replace commas with a space and append a suffix to each part:
' '.join(f'{p}_fer' for p in s.split(','))
As for an array:
def g(s):
    for s in s.split(','):
        yield s
        yield f'{s}_fer'
arr = [*g(s)]
a = "aajfkdfvf_valid_name0"
b = "gdhdhsdsdeeeeex_valid_name1"
How do I remove the gibberish from my string before valid so that I have something like this -
valid_name0
valid_name1
If your strings always contain the word valid, then you can try something like this:
a = "aajfkdfvf_valid_name0"
b = "gdhdhsdsdeeeeex_valid_name1"
for s in (a, b):
    print(s[s.rfind('valid'):])
This works even if the prefix contains _ or the substring valid, because rfind locates the last occurrence. However, if the part you want to keep contains the word valid more than once, it will not work.
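A small sketch of that behaviour, using made-up inputs and a hypothetical helper name: a prefix containing the word is harmless, but a wanted part that repeats it is cut too short.

```python
def tail_from_last_valid(s):
    # Slice from the last occurrence of 'valid' to the end of the string.
    return s[s.rfind('valid'):]

print(tail_from_last_valid("aajfkdfvf_valid_name0"))   # valid_name0
print(tail_from_last_valid("invalid_x_valid_name1"))   # valid_name1
print(tail_from_last_valid("junk_valid_name_valid2"))  # valid2
```

If 'valid' is missing altogether, rfind returns -1 and the slice yields only the last character, so a check for -1 may be worth adding.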
We can try using re.sub here:
import re
a = "aajfkdfvf_valid_name0"
b = "gdhdhsdsdeeeeex_valid_name1"
inp = [a, b]
output = [re.sub(r'^[^_]+_', '', i) for i in inp]
print(output) # ['valid_name0', 'valid_name1']
You can use a split join approach for this.
Try this:
a = "aajfkdfvf_valid_name0"
valid_a = '_'.join(a.split('_')[1:])
# 'valid_name0'
# can use maxsplit to split only once at the first _ and then take the remaining part of the string
another_valid_a = a.split('_',1)[1]
# valid_name0
Basically what this is doing is that it is splitting the original string at the _, then ignoring the first element and joining the remaining part again using _.
The other approaches seem a bit too over-engineered for this task, at least in my opinion.
If you already know that the gibberish comes before the first underscore _ character, you can just do a single str.split and discard the first split result:
a = "aajfkdfvf_valid_name0"
b = "gdhdhsdsdeeeeex_valid_name1"
def clean_string(s: str) -> str:
    return s.split('_', 1)[1]
print(clean_string(a)) # valid_name0
print(clean_string(b)) # valid_name1
If you're sure that splitting on '_' is all you need, a string split will help:
fixed_a = '_'.join(a.split('_')[1:])
In the worst case, this pattern is not the only one you're dealing with. Then consider this:
You need to know exactly what your 'valid_name' looks like; then you can write a regex to extract it.
Check for standards, patterns and the like.
I'm pretty sure that if there is a pattern, a regex can handle it.
I recommend this site to do so.
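As a rough sketch of the regex idea: the pattern below assumes the wanted part is the literal valid_name followed by digits at the end of the string. That pattern is an assumption based on the two examples in the question, and extract_valid is a hypothetical helper name.

```python
import re

# Assumed pattern: the literal 'valid_name' followed by digits at the end.
VALID_NAME = re.compile(r'valid_name\d+$')

def extract_valid(s):
    match = VALID_NAME.search(s)
    # Return None when the string does not contain the expected pattern.
    return match.group() if match else None

print(extract_valid("aajfkdfvf_valid_name0"))        # valid_name0
print(extract_valid("gdhdhsdsdeeeeex_valid_name1"))  # valid_name1
print(extract_valid("no_match_here"))                # None
```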
I have a scenario where I want to split a string partially and pick up only one portion of it.
Say the string could be like aloha_maui_d0_b0 or new_york_d9_b10. Note: what follows d is numeric and can be any length.
I want to strip everything before _d*, i.e. keep only _d0_b0 or _d9_b10.
I tried the code below, but obviously it removes the split term as well.
print(("aloha_maui_d0_b0").split("_d"))
#Output is : ['aloha_maui', '0_b0']
#But Wanted : _d0_b0
Is there any other way to get the partial portion? Do I need to try out in regexp?
How about just
stArr = "aloha_maui_d0_b0".split("_d")
st2 = '_d' + stArr[1]
This should do the trick if the string always has a '_d' in it
You can use index() to split in 2 parts:
s = 'aloha_maui_d0_b0'
idx = s.index('_d')
l = [s[:idx], s[idx:]]
# l = ['aloha_maui', '_d0_b0']
Edit: You can also use this if you have multiple _d in your string:
s = 'aloha_maui_d0_b0_d1_b1_d2_b2'
idxs = [n for n in range(len(s)) if n == 0 or s.find('_d', n) == n]
parts = [s[i:j] for i,j in zip(idxs, idxs[1:]+[None])]
# parts = ['aloha_maui', '_d0_b0', '_d1_b1', '_d2_b2']
I have two suggestions.
partition()
Use the method partition() to get a tuple containing the delimiter as one of the elements and use the + operator to get the String you want:
teste1 = 'aloha_maui_d0_b0'
partitiontest = teste1.partition('_d')
print(partitiontest)
print(partitiontest[1] + partitiontest[2])
Output:
('aloha_maui', '_d', '0_b0')
_d0_b0
The partition() method returns a tuple with the first element being what is before the delimiter, the second being the delimiter itself and the third being what is after the delimiter.
The method does that to the first case of the delimiter it finds on the String, so you can't use it to split in more than 3 without extra work on the code. For that my second suggestion would be better.
replace()
Use the method replace() to insert an extra character (or characters) right before your delimiter (_d) and use these as the delimiter on the split() method.
teste2 = 'new_york_d9_b10'
replacetest = teste2.replace('_d', '|_d')
print(replacetest)
splitlist = replacetest.split('|')
print(splitlist)
Output:
new_york|_d9_b10
['new_york', '_d9_b10']
Since it replaces every occurrence of _d in the string with |_d, it can be used to split into more than two parts.
Problem?
One situation where you need to be careful is unwanted splits caused by _d appearing in more places than anticipated.
Following the apparent logic of your examples with city names and numericals, you might have something like this:
teste3 = 'rio_de_janeiro_d3_b32'
replacetest = teste3.replace('_d', '|_d')
print(replacetest)
splitlist = replacetest.split('|')
print(splitlist)
Output:
rio|_de_janeiro|_d3_b32
['rio', '_de_janeiro', '_d3_b32']
Assuming you always have the numerical on the end of the String and _d won't happen inside the numerical, rpartition() could be a solution:
rpartitiontest = teste3.rpartition('_d')
print(rpartitiontest)
print(rpartitiontest[1] + rpartitiontest[2])
Output:
('rio_de_janeiro', '_d', '3_b32')
_d3_b32
Since rpartition() starts the search on the String's end and only takes the first match to separate the terms into a tuple, you won't have to worry about the first term (city's name?) causing unexpected splits.
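One way to wrap this up, as a sketch: a small helper (the name tail_from_last_d is hypothetical) that also copes with strings that lack the delimiter, relying on the fact that rpartition() returns two empty strings followed by the original string when nothing is found.

```python
def tail_from_last_d(s, sep='_d'):
    # rpartition returns ('', '', s) when sep is absent.
    head, found, rest = s.rpartition(sep)
    # Re-attach the delimiter only if it was actually found.
    return found + rest if found else ''

print(tail_from_last_d('rio_de_janeiro_d3_b32'))  # _d3_b32
print(tail_from_last_d('aloha_maui_d0_b0'))       # _d0_b0
print(repr(tail_from_last_d('plainname')))        # ''
```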
Use regex's split and keep delimiters capability:
import re
patre = re.compile(r"(_d\d)")
#👆 👆
# note the surrounding parentheses - they're what drives "keep"
for line in """aloha_maui_d0_b0 new_york_d9_b10""".split():
    parts = patre.split(line)
    print("\n", line)
    print(parts)
    p1, p2 = parts[0], "".join(parts[1:])
    print(p1, p2)
output:
aloha_maui_d0_b0
['aloha_maui', '_d0', '_b0']
aloha_maui _d0_b0
new_york_d9_b10
['new_york', '_d9', '_b10']
new_york _d9_b10
credit due: https://stackoverflow.com/a/15668433
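If only the tail is needed, a shorter regex alternative might be re.search with a pattern that requires at least one digit after _d. This is a sketch, not part of the linked answer, and tail_after_d is a hypothetical helper name.

```python
import re

# '_d' followed by one or more digits, then the rest of the string.
TAIL = re.compile(r'_d\d+.*')

def tail_after_d(s):
    match = TAIL.search(s)
    # None signals that no '_d<digits>' part was found.
    return match.group() if match else None

print(tail_after_d('aloha_maui_d0_b0'))       # _d0_b0
print(tail_after_d('new_york_d9_b10'))        # _d9_b10
print(tail_after_d('rio_de_janeiro_d3_b32'))  # _d3_b32
```

Because a digit is required after _d, a name part such as _de is skipped.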
How do I split a string at the second underscore in Python so that I get something like this
name = "this_is_my_name_and_its_cool"
split name so I get this ["this_is", "my_name_and_its_cool"]
The following statement will split name into a list of strings:
a = name.split("_")
You can combine whatever strings you want using join, in this case the first two words and the rest:
b = "_".join(a[:2])
c = "_".join(a[2:])
Maybe you can write a small function that takes as an argument the number of words (n) after which you want to split:
def func(name, n):
    a = name.split("_")
    b = "_".join(a[:n])
    c = "_".join(a[n:])
    return [b, c]
This assumes that you have a string with multiple instances of the same delimiter and want to split at the nth one, ignoring the others.
Here's a solution using just split and join, without complicated regular expressions. This might be a bit easier to adapt to other delimiters and particularly other values of n.
def split_at(s, c, n):
    words = s.split(c)
    return c.join(words[:n]), c.join(words[n:])
Example:
>>> split_at('this_is_my_name_and_its_cool', '_', 2)
('this_is', 'my_name_and_its_cool')
I think you're trying to split the string at the second underscore. If so, you could use the re.findall function.
>>> import re
>>> s = "this_is_my_name_and_its_cool"
>>> re.findall(r'^[^_]*_[^_]*|[^_].*$', s)
['this_is', 'my_name_and_its_cool']
>>> [i for i in re.findall(r'^[^_]*_[^_]*|(?!_).*$', s) if i]
['this_is', 'my_name_and_its_cool']
import re
print(re.split(r"(^[^_]+_[^_]+)_", "this_is_my_name_and_its_cool"))
# ['', 'this_is', 'my_name_and_its_cool']  (note the leading empty string)
Try this.
Here's a quick & dirty way to do it:
s = 'this_is_my_name_and_its_cool'
i = s.find('_'); i = s.find('_', i+1)
print([s[:i], s[i+1:]])
output
['this_is', 'my_name_and_its_cool']
You could generalize this approach to split on the nth separator by putting the find() into a loop.
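A minimal sketch of that generalization follows. The function name split_at_nth and the choice to return the unsplit string when there are fewer than n separators are assumptions, not part of the answer above.

```python
def split_at_nth(s, sep, n):
    # n counts separators starting from 1.
    if n < 1:
        raise ValueError("n must be at least 1")
    start = 0
    for _ in range(n):
        i = s.find(sep, start)
        if i == -1:
            # Fewer than n separators: nothing to split off.
            return [s]
        start = i + len(sep)
    # i is the index of the nth separator, start is just past it.
    return [s[:i], s[start:]]

print(split_at_nth('this_is_my_name_and_its_cool', '_', 2))
# ['this_is', 'my_name_and_its_cool']
```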
Is there a way to pass in a list instead of a char to str.strip() in python? I have been doing it this way:
unwanted = [c for c in '!##$%^&*(FGHJKmn']
s = 'FFFFoFob*&%ar**^'
for u in unwanted:
    s = s.strip(u)
print(s)
Desired output (this output is correct, but there should be a more elegant way than what I'm coding above):
oFob*&%ar
Strip and friends take a string representing a set of characters, so you can skip the loop:
>>> s = 'FFFFoFob*&%ar**^'
>>> s.strip('!##$%^&*(FGHJKmn')
'oFob*&%ar'
(The downside is that a call like fn.rstrip(".png") seems to work for many filenames but does not really remove the suffix: it strips any trailing run of the characters ., p, n and g.)
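A short sketch of that pitfall, with a made-up filename and a hypothetical helper drop_suffix that removes a literal suffix instead:

```python
fn = "map.png"
# rstrip treats ".png" as the set of characters {'.', 'p', 'n', 'g'}
# and removes any of them from the right end, not the literal suffix.
stripped = fn.rstrip(".png")
print(stripped)  # ma

def drop_suffix(name, suffix):
    # Remove suffix only when the string actually ends with it.
    if suffix and name.endswith(suffix):
        return name[:-len(suffix)]
    return name

print(drop_suffix("map.png", ".png"))  # map
```

On Python 3.9 and later, str.removesuffix() does the same job as the helper.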
Since you do not want to delete characters from the middle, you can just use strip():
>>> 'FFFFoFob*&%ar**^'.strip('!##$%^&*(FGHJKmn')
'oFob*&%ar'
Otherwise, use str.translate(). In Python 3, build the deletion table with str.maketrans():
>>> 'FFFFoFob*&%ar**^'.translate(str.maketrans('', '', '!##$%^&*(FGHJKmn'))
'oobar'