I have:
f = imgString.split('medias/')[1]
g = f.split('?')[0]
print(g)
but I'd prefer it on one line. How can I split the string 'media/Clearance.png?sometexthere' into multiple parts? Ideally I'd like just Clearance.png, so if I were splitting it, the parts would be 'media/', 'Clearance.png' and '?sometexthere'.
string = 'media/Clearance.png?sometexthere'
string.split("/")[1].split("?")[0]
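If you want all three pieces from the question ('media/', 'Clearance.png' and '?sometexthere') rather than just the file name, one option is str.partition, which splits at the first occurrence of a separator and hands the separator back as well. This is only a minimal sketch, and the variable names are illustrative:

```python
s = 'media/Clearance.png?sometexthere'

# partition returns (before, separator, after) for the first occurrence
head, slash, rest = s.partition('/')
name, qmark, query = rest.partition('?')

# Re-attach each separator to the side the question asked for
parts = [head + slash, name, qmark + query]
print(parts)  # ['media/', 'Clearance.png', '?sometexthere']
print(name)   # Clearance.png
```

Unlike indexing into the result of split, partition does not raise if a separator is missing; it just returns empty strings for the separator and the remainder.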
If the string always has the same format, you can use a regex like this one:
([a-zA-Z]*)\/(.*)\?([a-zA-Z]*)
and then read each part of your string from the match object with match.group() :)
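As a rough sketch of how that pattern could be used with Python's re module (the names pattern, folder, filename and rest are made up for this example):

```python
import re

s = 'media/Clearance.png?sometexthere'

# Three capture groups: letters before '/', everything up to the last '?',
# and letters after it. '/' needs no escaping in Python regexes.
pattern = re.compile(r'([a-zA-Z]*)/(.*)\?([a-zA-Z]*)')

m = pattern.match(s)
if m is not None:  # match() returns None when the format differs
    folder, filename, rest = m.groups()
    print(filename)    # Clearance.png
    print(m.group(2))  # same value, accessed by group number
```

Note that the groups capture the text around the separators, so the '/' and '?' themselves are not part of any group.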
I need to import a CSV file that contains all of its values in one column, although they should be in 3 different columns.
The value I want to split looks like this: "2020-12-30 13:17:00Mojito5.5". I want it to look like this: "2020-12-30 13:17:00 Mojito 5.5".
I tried different approaches to splitting it, but I get either the error "DataFrame object has no attribute 'split'" or something similar.
Any ideas how I can split this?
Assuming you always want to add spaces around a run of letters (no digits or special characters), you can use this regex:
import re

def add_spaces(m):
    # Wrap the matched word in single spaces
    return f' {m.group(0)} '

s = "2020-12-30 13:17:00Mojito5.5"
re.sub('[a-zA-Z]+', add_spaces, s)  # '2020-12-30 13:17:00 Mojito 5.5'
We could use a regex approach here:
import re
inp = "2020-12-30 13:17:00Mojito5.5"
m = re.findall(r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(\w+?)(\d+(?:\.\d+)?)', inp)
print(m) # [('2020-12-30 13:17:00', 'Mojito', '5.5')]
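Since the values come from a file with many rows, the same idea can be wrapped in a small helper and applied to each row. The following is a sketch under assumptions: the helper name split_row, the compiled pattern ROW, and the second sample row are all made up for illustration, and it assumes the product name contains no digits.

```python
import re

# timestamp, then a name made of non-digits, then an integer or decimal number
ROW = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(\D+)(\d+(?:\.\d+)?)")

def split_row(raw):
    """Return (timestamp, name, price) or None if the row does not fit."""
    m = ROW.fullmatch(raw.strip())
    if m is None:
        return None
    timestamp, name, price = m.groups()
    return timestamp, name, float(price)

rows = ["2020-12-30 13:17:00Mojito5.5", "2020-12-30 13:20:00Beer3"]
parsed = [split_row(r) for r in rows]
print(parsed[0])  # ('2020-12-30 13:17:00', 'Mojito', 5.5)
```

Returning None for rows that do not match makes malformed lines easy to spot instead of silently producing wrong columns.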
a = "aajfkdfvf_valid_name0"
b = "gdhdhsdsdeeeeex_valid_name1"
How do I remove the gibberish from my string before valid so that I have something like this -
valid_name0
valid_name1
If your strings always contain the word valid, then you can try something like this:
a = "aajfkdfvf_valid_name0"
b = "gdhdhsdsdeeeeex_valid_name1"
for s in (a, b):
    print(s[s.rfind('valid'):])
This works even if the prefix itself contains _ or the substring valid, because rfind locates the last occurrence. However, if the part you want to keep contains the word valid more than once, this will not work.
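To make that caveat concrete, here is a small sketch with a made-up input in which valid appears twice, plus a hypothetical helper (keep_from_last is not from the answer) that guards against the marker being absent, since rfind returns -1 in that case:

```python
s = "abc_valid_valid_name0"  # made-up input where 'valid' appears twice

print(s[s.rfind('valid'):])  # valid_name0 (starts at the LAST 'valid')
print(s[s.find('valid'):])   # valid_valid_name0 (starts at the FIRST 'valid')

def keep_from_last(s, marker='valid'):
    # rfind returns -1 when the marker is missing; slicing with -1 would
    # silently return only the last character, so return s unchanged instead
    i = s.rfind(marker)
    return s if i == -1 else s[i:]

print(keep_from_last("no_marker_here"))  # no_marker_here
```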
We can try using re.sub here:
import re
a = "aajfkdfvf_valid_name0"
b = "gdhdhsdsdeeeeex_valid_name1"
inp = [a, b]
output = [re.sub(r'^[^_]+_', '', i) for i in inp]
print(output) # ['valid_name0', 'valid_name1']
You can use a split join approach for this.
Try this:
a = "aajfkdfvf_valid_name0"
valid_a = '_'.join(a.split('_')[1:])
# 'valid_name0'
# can use maxsplit to split only once at the first _ and then take the remaining part of the string
another_valid_a = a.split('_',1)[1]
# valid_name0
The first version splits the original string at every _, discards the first element, and joins the remaining parts back together with _.
The other approaches seem a bit too over-engineered for this task, at least in my opinion.
If you already know that the gibberish comes before the first underscore _ character, you can just do a single str.split and discard the first split result:
a = "aajfkdfvf_valid_name0"
b = "gdhdhsdsdeeeeex_valid_name1"
def clean_string(s: str) -> str:
    return s.split('_', 1)[1]
print(clean_string(a)) # valid_name0
print(clean_string(b)) # valid_name1
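One thing to keep in mind is that s.split('_', 1)[1] raises an IndexError when the string has no underscore at all. If that can happen in your data, a variant based on str.partition avoids the exception. This is just a sketch, and clean_string_safe is a made-up name:

```python
def clean_string_safe(s: str) -> str:
    # partition returns (before, separator, after); the separator is ''
    # when no underscore was found
    head, sep, tail = s.partition('_')
    return tail if sep else s

print(clean_string_safe("aajfkdfvf_valid_name0"))  # valid_name0
print(clean_string_safe("nounderscore"))           # nounderscore
```

Returning the input unchanged is only one possible choice here; raising a clearer error may suit some cases better.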
If you're sure that splitting at the first '_' is all you need, a string split will help:
fixed_a = '_'.join(a.split('_')[1:])
In the worst case, this pattern is not the only one you have to deal with. Then you need to know exactly what your 'valid_name' looks like, so that you can write a regex for it.
Check for standards, patterns and the like.
I'm pretty sure that if there is a pattern, a regex can handle it.
I recommend this site to do so.
Let's say I have a bunch of strings, and they can only be in the following formats:
format1 = 'substring1#substring2'
format2 = 'substring1$substring2'
format3 = 'substring1'
Let me explain. The strings are sometimes divided using the # or $ character. However other times, they are not.
I want to remove the part that appears after the # or $, if it exists. If it was just one special character, that is #, I could have done this:
string = string.split('#')[0]
But how can I do it for the 2 special characters in a quick and elegant way? Also assume the following things:
Only one special character can appear in the string.
The special characters will not appear in any other part of the string.
Thanks.
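If you want to avoid regex, one possibility is the following. It works because both split results are prefixes of the original string, so the shorter one always compares as the smaller: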
string = min(string.split('#')[0], string.split('$')[0])
Regular expressions.
import re
re.sub('[$#].*', '', string_to_modify)
Use regex!
import re
# [#$] matches a literal # or $; a bare $ outside a character class is the end-of-string anchor
new_string = re.sub(r"[#$].*$", "", string)
Use re.split() for it.
import re
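# Inside a character class, $ is a literal character rather than the end-of-string anchor
print(re.split(r"[#$]", "STRING#OTHER_STRING#OTHER_STRING_2"))  # ['STRING', 'OTHER_STRING', 'OTHER_STRING_2']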
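Because $ is a regex metacharacter (it anchors the end of the string), delimiters like it have to be escaped or placed in a character class before they match literally. One way to avoid getting this wrong is to build the pattern with re.escape. The helper below is a sketch; strip_after and its delims parameter are made-up names:

```python
import re

def strip_after(text, delims="#$"):
    # re.escape makes every delimiter literal, so '$' no longer means
    # "end of string" and characters like ']' or '^' are safe too
    pattern = "[" + re.escape(delims) + "]"
    # maxsplit=1: we only need the part before the first delimiter
    return re.split(pattern, text, maxsplit=1)[0]

for value in ['substring1#substring2', 'substring1$substring2', 'substring1']:
    print(strip_after(value))  # substring1 each time
```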
Use replace before split:
format1 = 'substring1#substring2'
format2 = 'substring1$substring2'
format3 = 'substring1'
print(format1.replace('#', '$').split('$')[0])
print(format2.replace('#', '$').split('$')[0])
print(format3.replace('#', '$').split('$')[0])
Output
substring1
substring1
substring1
You can use a for loop to split by an arbitrary number of delimiters. Regular expressions are typically slower than Python str methods for simple cases like this.
def converter(x, delims='#$'):
    for delim in delims:
        # Keep only the part before the first occurrence of each delimiter
        x = x.split(delim, maxsplit=1)[0]
    return x
format1 = 'substring1#substring2'
format2 = 'substring1$substring2'
format3 = 'substring1'
for value in [format1, format2, format3]:
    print(converter(value))
# substring1
# substring1
# substring1
I think you can keep the allowed special characters in a list and check whether each of them is present in the string. When you find one, split on it and keep only the left part, like so:
delimiters = ["#", "$"]
left_part = string1  # fall back to the whole string if no delimiter is present
for symbol in delimiters:
    if symbol in string1:
        left_part = string1.split(symbol)[0]
This approach has some disadvantages, but it is the simplest in my opinion. The problem is that if you have more than one string, you need nested loops.
I have the following text:
ABC=ABC.2016.001.02.Yomama.01234
How do I lowercase just the Yomama part? I'd like it to look like this:
ABA.2016.001.02.yomama.01234
How can I accomplish this with Python?
Any help would be appreciated. Thanks.
Assuming that you want a generic solution (otherwise you could just use str.replace() with a hard coded string) you can split the string on the ., lowercase the string in the appropriate field, and then stitch it back together with str.join():
s = 'ABC=ABC.2016.001.02.Yomama.01234'
fields = s.split('.')
fields[4] = fields[4].lower()
print('.'.join(fields))
An alternative solution, provided the field you want to lowercase does not also appear elsewhere in the text (ABC here is the variable holding the string):
tmp = ABC.split('.')[-2]
ABC = ABC.replace(tmp, tmp.lower())
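The caveat about repeated text matters because str.replace changes every occurrence, not just the field you picked. A sketch that targets a field by position instead (lower_field is a hypothetical helper, and the second input is made up to show the difference):

```python
def lower_field(s, index, sep='.'):
    # Lowercase exactly one field, identified by its position
    fields = s.split(sep)
    fields[index] = fields[index].lower()
    return sep.join(fields)

print(lower_field('ABC.2016.001.02.Yomama.01234', -2))
# ABC.2016.001.02.yomama.01234

# Made-up input where the same text appears in two fields
text = 'Yomama.2016.Yomama.01234'
tmp = text.split('.')[-2]
print(text.replace(tmp, tmp.lower()))  # yomama.2016.yomama.01234 (both changed)
print(lower_field(text, -2))           # Yomama.2016.yomama.01234 (only one changed)
```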
How do I split a string at the second underscore in Python, so that I get something like this?
name = 'this_is_my_name_and_its_cool'
Split name so I get ["this_is", "my_name_and_its_cool"]
The following statement splits name into a list of strings:
a = name.split("_")
You can then combine whichever strings you want using join, in this case the first two words and the rest:
b = "_".join(a[:2])
c = "_".join(a[2:])
You could also write a small function that takes as an argument the number of words (n) after which you want to split:
def func(name, n):
    a = name.split("_")
    b = "_".join(a[:n])
    c = "_".join(a[n:])
    return [b, c]
This assumes that you have a string with multiple instances of the same delimiter and that you want to split at the nth delimiter, ignoring the others.
Here's a solution using just split and join, without complicated regular expressions. This might be a bit easier to adapt to other delimiters and particularly other values of n.
def split_at(s, c, n):
    words = s.split(c)
    return c.join(words[:n]), c.join(words[n:])
Example:
>>> split_at('this_is_my_name_and_its_cool', '_', 2)
('this_is', 'my_name_and_its_cool')
I think you're trying to split the string at the second underscore. If so, you could use the findall function.
>>> import re
>>> s = "this_is_my_name_and_its_cool"
>>> re.findall(r'^[^_]*_[^_]*|[^_].*$', s)
['this_is', 'my_name_and_its_cool']
>>> [i for i in re.findall(r'^[^_]*_[^_]*|(?!_).*$', s) if i]
['this_is', 'my_name_and_its_cool']
Try this:
import re
print(re.split(r"(^[^_]+_[^_]+)_", "this_is_my_name_and_its_cool")[1:])  # [1:] drops the empty string before the match
Here's a quick & dirty way to do it:
s = 'this_is_my_name_and_its_cool'
i = s.find('_'); i = s.find('_', i+1)
print([s[:i], s[i+1:]])
output
['this_is', 'my_name_and_its_cool']
You could generalize this approach to split on the nth separator by putting the find() into a loop.
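A possible version of that generalization is sketched below. The function name split_at_nth and the choice to return the whole string in a one-element list when there are fewer than n separators are assumptions made for this example:

```python
def split_at_nth(s, sep, n):
    """Split s into two parts at the nth occurrence of sep (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    start = 0
    for _ in range(n):
        i = s.find(sep, start)
        if i == -1:
            # Fewer than n separators: nothing to split on
            return [s]
        start = i + len(sep)
    # i is the index of the nth separator, start is just past it
    return [s[:i], s[start:]]

print(split_at_nth('this_is_my_name_and_its_cool', '_', 2))
# ['this_is', 'my_name_and_its_cool']
```

Because it only scans up to the nth separator, this avoids splitting and re-joining the whole string, though for short strings the difference is unlikely to matter.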