Get substring between strings from a Python list

How can I get the content between the strings &quot and autoRefresh, which here would be /commander/link/jobDetails/jobs/a2537f238-8622-11ee-a1a0-f0921c14c828?, from a list like the one below? I only need the first match (there could be multiple matches).
['something', 'something', ' something top.window.location.href = "/commander/link/jobDetails/jobs/a2537f238-8622-11ee-a1a0-f0921c14c828?autoRefresh=0&s=Jobs";">','something']
I tried:
link = re.search('"(.*?)autoRefresh', big_list)
print link.group(1)
and got TypeError: expected string or buffer

You need to iterate over the list, checking each string:
import re

big_list = ['something', 'something', ' something top.window.location.href = "/commander/link/jobDetails/jobs/a2537f238-8622-11ee-a1a0-f0921c14c828?autoRefresh=0&s=Jobs";">','something']

def get_all_subs(lst, pat, grp=0):
    patt = re.compile(pat)
    for s in lst:
        m = patt.search(s)  # search() takes a start position as its 2nd argument, not a group
        if m:
            yield m.group(grp)

print(list(get_all_subs(big_list, '"(.*?)autoRefresh', 1)))
Or call str.join on the list and use findall:
print(re.findall('"(.*?)autoRefresh', "".join(big_list)))
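Because the question only needs the first match, the per-string search can stop as soon as something is found. A minimal sketch using next() over a lazy generator; first_match is a hypothetical helper name, not part of the re module:

```python
import re

big_list = [
    'something',
    'something',
    ' something top.window.location.href = "/commander/link/jobDetails/jobs/a2537f238-8622-11ee-a1a0-f0921c14c828?autoRefresh=0&s=Jobs";">',
    'something',
]

# The lazy (.*?) stops at the first "autoRefresh" after a double quote.
pattern = re.compile(r'"(.*?)autoRefresh')


def first_match(strings, pat):
    """Return group 1 from the first string that matches pat, or None (hypothetical helper)."""
    # map() and the generator are lazy, so strings after the first hit are never searched.
    hits = (m for m in map(pat.search, strings) if m)
    m = next(hits, None)
    return m.group(1) if m else None


print(first_match(big_list, pattern))
```

Unlike joining the list first, searching element by element cannot produce a match that spans two list items.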

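If the text contains the literal characters &quot (as in the raw HTML), you may use lookarounds instead of a capturing group:
m = re.search(r'(?<=&quot).*?(?=autoRefresh)', ''.join(YourList))
With the decoded list shown above, which contains " rather than &quot, use (?<=") as the lookbehind instead.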

Related

How to split a single value list to multiple values in same list

I tried some workarounds, but none of them worked, so here is my question: how can I split a value in a list based on a keyword and update the same list?
Here is my data:
result_list = ['48608541\ncsm_radar_main_dev-7319-userdevsigned\nLogd\nG2A0P3027145002X\nRadar\ncompleted 2022-10-25T10:43:01\nPASS: 12FAIL: 1SKIP: 1\n2:25:36']
I want to remove the '\n' separators and get something like this:
result_list = ['48608541', 'csm_radar_main_dev-7319-userdevsigned', 'Logd', 'G2A0P3027145002X', .....]
You need to split each string on \n, which gives a list of lists that you then need to flatten. You can use a list comprehension:
>>> [x for item in result_list for x in item.split('\n') ]
# output:
['48608541', 'csm_radar_main_dev-7319-userdevsigned', 'Logd', 'G2A0P3027145002X', 'Radar', 'completed 2022-10-25T10:43:01', 'PASS: 12FAIL: 1SKIP: 1', '2:25:36']
This splits each element of your list at \n and assigns the flattened result back to result_list:
result_list = [item for i in result_list for item in i.split('\n') ]
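Strictly speaking, the assignment above builds a new list and rebinds the name. If other references to the original list must see the change, slice assignment replaces the contents of the existing list object instead. A small sketch (alias is only there to show that the same object is updated):

```python
result_list = ['48608541\ncsm_radar_main_dev-7319-userdevsigned\nLogd\nG2A0P3027145002X\nRadar\ncompleted 2022-10-25T10:43:01\nPASS: 12FAIL: 1SKIP: 1\n2:25:36']
alias = result_list  # a second reference to the same list object

# The comprehension is fully evaluated first; result_list[:] = ... then
# swaps the contents of the existing list in place.
result_list[:] = [part for item in result_list for part in item.split('\n')]

print(alias is result_list)  # True
print(alias)
```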
Solution using regex (re); note that it only handles the first element of the list:
import re
result_list = re.split('\n', result_list[0])
#output:
['48608541', 'csm_radar_main_dev-7319-userdevsigned', 'Logd', 'G2A0P3027145002X', 'Radar', 'completed 2022-10-25T10:43:01', 'PASS: 12FAIL: 1SKIP: 1', '2:25:36']
The split() method of the str object does this:
Return a list of the words in the string, using sep as the delimiter string.
>>> '1,2,3'.split(',')
['1', '2', '3']
So here the answer is:
string_object = "48608541\ncsm_radar_main_dev-7319-userdevsigned\nLogd\nG2A0P3027145002X\nRadar\ncompleted 2022-10-25T10:43:01\nPASS: 12FAIL: 1SKIP: 1\n2:25:36"
result_list = string_object.split(sep='\n')
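For line-oriented data, str.splitlines() is a close alternative to split('\n'): it also recognizes \r\n and other line boundaries, and it does not return a trailing empty string when the text ends with a newline. A quick comparison on a short, illustrative sample string:

```python
text = "48608541\nLogd\r\nRadar\n"

# split('\n') leaves the '\r' behind and yields a trailing '' for the final newline.
by_split = text.split('\n')
# splitlines() treats '\n' and '\r\n' alike and drops the trailing newline.
by_lines = text.splitlines()

print(by_split)  # ['48608541', 'Logd\r', 'Radar', '']
print(by_lines)  # ['48608541', 'Logd', 'Radar']
```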

Python Split Strings While Preserving Order?

I have a list of strings in python, where I need to preserve order and split some strings.
A string should be split when, after the first :, there is some character other than a space, newline, or tab.
For example, this must be split:
'example: Test' becomes ['example:', 'Test']
while these stay the same: 'example: ' and 'IGNORE_ME_EXAMPLE'
Given an input like this:
['example: Test', 'example: ', 'IGNORE_ME_EXAMPLE']
I'm expecting:
['example:', 'Test', 'example: ', 'IGNORE_ME_EXAMPLE']
Note that the split parts stay next to each other and keep the original order.
Also, once I split a string I don't want to check the resulting parts again. In other words, I don't want to check 'Test' after I split it off.
To make it clearer, given an input like this:
['example: Test::YES']
I'm expecting:
['example:', 'Test::YES']
You can use regular expressions for that:
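import re

lines = ['example: Test', 'example: ', 'IGNORE_ME_EXAMPLE']
# Everything up to and including the FIRST colon, optional whitespace, then non-whitespace content.
pattern = re.compile(r"([^:]*:)\s*(\S.*)", re.DOTALL)
result = []
for line in lines:
    match = pattern.match(line)
    if match:
        result.extend(match.groups())  # prefix and remainder, in original order
    else:
        result.append(line)  # nothing to split: keep the string unchanged
print(result)  # ['example:', 'Test', 'example: ', 'IGNORE_ME_EXAMPLE']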
You can use nested loop comprehension for the input list:
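l = ['example: Test::YES']
# split(":", 1) splits at the first colon only; empty or whitespace-only parts are dropped
l1 = [j.strip() for i in l for j in i.split(":", 1) if j.strip() != '']
print(l1)
Output:
['example', 'Test::YES']
Note that this drops the colon from the first part, whereas the question expects 'example:'.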
You need to iterate over your list of words and check whether each word contains a :. If it does, split the word into two parts, the part up to and including the : and the part after it, and append both to the final list. If there is no : in the word, append the word unchanged to the result list and skip the remaining steps for that word.
wordlist = ['example:', 'Test', 'example: ', 'IGNORE_ME_EXAMPLE']
result = []
for word in wordlist:
    index = -1
    part1, part2 = None, None
    if ':' in word:
        index = word.index(':')
    else:
        result.append(word)
        continue
    part1, part2 = word[:index+1], word[index+1:]
    if part1 is not None and len(part1) > 0:
        result.append(part1)
    if part2 is not None and len(part2) > 0:
        result.append(part2)
print(result)
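Output:
['example:', 'Test', 'example:', ' ', 'IGNORE_ME_EXAMPLE']
Note that this splits 'example: ' into 'example:' and ' ', whereas the question wants 'example: ' left unchanged.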
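A regex-free sketch with str.partition, which splits at the first ':' only. It reads the split condition as "some non-whitespace text follows the first colon", which is an interpretation of the question; split_first_colon is a hypothetical helper name:

```python
def split_first_colon(strings):
    """Split each string at its first ':' if non-whitespace text follows it (hypothetical helper)."""
    result = []
    for s in strings:
        head, sep, tail = s.partition(':')  # partition() splits at the first ':' only
        if sep and tail.strip():
            # Keep the colon on the first part; trim whitespace around the rest.
            result.append(head + sep)
            result.append(tail.strip())
        else:
            result.append(s)  # no colon, or only whitespace after it: keep unchanged
    return result


print(split_first_colon(['example: Test', 'example: ', 'IGNORE_ME_EXAMPLE']))
print(split_first_colon(['example: Test::YES']))
```

Each split part is appended directly and never re-examined, so 'Test::YES' is not split again.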

Replace multiple characters in a string

Is there a simple way in python to replace multiples characters by another?
For instance, I would like to change:
name1_22:3-3(+):Pos_bos
to
name1_22_3-3_+__Pos_bos
So basically, replace every "(", ")" and ":" with "_".
I only know how to do it one character at a time:
str.replace(":","_")
str.replace(")","_")
str.replace("(","_")
You could use re.sub to replace multiple characters with one pattern:
import re
s = 'name1_22:3-3(+):Pos_bos '
re.sub(r'[():]', '_', s)
Output
'name1_22_3-3_+__Pos_bos '
Use a translation table. In Python 2, maketrans is defined in the string module.
>>> import string
>>> table = string.maketrans("():", "___")
In Python 3, it is a str class method.
>>> table = str.maketrans("():", "___")
In both, the table is passed as the argument to str.translate.
>>> 'name1_22:3-3(+):Pos_bos'.translate(table)
'name1_22_3-3_+__Pos_bos'
In Python 3, you can also pass a single dict mapping input characters to output characters to maketrans:
table = str.maketrans({"(": "_", ")": "_", ":": "_"})
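When many characters share one replacement, dict.fromkeys can build that mapping instead of spelling out each pair. A small Python 3 sketch:

```python
# Every character in `chars` maps to the same replacement string.
chars = "():"
table = str.maketrans(dict.fromkeys(chars, "_"))

s = "name1_22:3-3(+):Pos_bos"
print(s.translate(table))  # name1_22_3-3_+__Pos_bos
```

To replace a different set of characters, only the chars string needs to change.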
Sticking to your current approach of using replace():
s = "name1_22:3-3(+):Pos_bos"
for e in ((":", "_"), ("(", "_"), (")", "_")):
    s = s.replace(*e)  # str.replace returns a new string, so reassign it
print(s)
OUTPUT:
name1_22_3-3_+__Pos_bos
EDIT: (for readability)
s = "name1_22:3-3(+):Pos_bos"
replaceList = [(":", "_"), ("(", "_"), (")", "_")]
for elem in replaceList:
    print(*elem)  # : _, ( _, ) _ (one pair per iteration)
    s = s.replace(*elem)
print(s)
OR
repList = [':', '(', ')']  # list of all the chars to replace
rChar = '_'  # the char to replace with
for elem in repList:
    s = s.replace(elem, rChar)
print(s)
Another possibility is a list comprehension combined with a conditional expression (the ternary operator):
text = 'name1_22:3-3(+):Pos_bos '
out = ''.join(['_' if i in ':)(' else i for i in text])
print(out) #name1_22_3-3_+__Pos_bos
Because the comprehension produces a list of characters (strs of length 1), I use ''.join to turn it back into a str.

How to replace string to the other string in list (python)

What is the best way to replace every string in the list?
For example if I have a list:
a = ['123.txt', '1234.txt', '654.txt']
and I would like to have:
a = ['123', '1234', '654']
Assuming that sample input is similar to what you actually have, use os.path.splitext() to remove file extensions:
>>> import os
>>> a = ['123.txt', '1234.txt', '654.txt']
>>> [os.path.splitext(item)[0] for item in a]
['123', '1234', '654']
Use a list comprehension as follows:
a = ['123.txt', '1234.txt', '654.txt']
answer = [item.replace('.txt', '') for item in a]
print(answer)
Output
['123', '1234', '654']
Assuming that all your strings end with '.txt', just slice the last four characters off.
>>> a = ['123.txt', '1234.txt', '654.txt']
>>> a = [x[:-4] for x in a]
>>> a
['123', '1234', '654']
This will also work if you have some weird names like 'some.txtfile.txt'
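If some names might not end in '.txt', slicing would cut characters off them anyway. str.removesuffix() (Python 3.9+) strips the suffix only when it is present. In this sketch, 'notes.md' is a hypothetical extra name added to show that case:

```python
a = ['123.txt', '1234.txt', '654.txt', 'some.txtfile.txt', 'notes.md']

# removesuffix() strips '.txt' only from strings that end with it;
# all other strings are returned unchanged.
stripped = [name.removesuffix('.txt') for name in a]
print(stripped)  # ['123', '1234', '654', 'some.txtfile', 'notes.md']
```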
You could split on the . separator and take the first item:
In [486]: [x.split('.')[0] for x in a]
Out[486]: ['123', '1234', '654']
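Splitting on '.' keeps only the text before the first dot, whereas os.path.splitext removes only the last extension; the difference shows up with multi-dot names. The sample names below are hypothetical:

```python
import os

names = ['123.txt', 'report.v2.txt', 'archive.tar.gz']  # hypothetical sample names

# split('.')[0] keeps everything before the FIRST dot.
first_dot = [n.split('.')[0] for n in names]
# os.path.splitext() removes only the LAST extension.
last_ext = [os.path.splitext(n)[0] for n in names]

print(first_dot)  # ['123', 'report', 'archive']
print(last_ext)   # ['123', 'report.v2', 'archive.tar']
```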
Another way to do this:
a = [x[: -len("txt")-1] for x in a]
What is the best way to replace every string in the list?
That completely depends on how you define 'best'. I, for example, like regular expressions:
import re
a = ['123.txt', '1234.txt', '654.txt']
answer = [re.sub(r'^(\w+)\..*', r'\g<1>', item) for item in a]
#print(answer)
#['123', '1234', '654']
Depending on the content of the strings, you could adjust it:
use [0-9]+ instead of \w+ to match only digits;
use \.txt instead of \..* if all strings end with .txt.
data.colname = [item.replace('anythingtoreplace', 'desiredoutput') for item in data.colname]
Note that here data is the DataFrame and colname is a column in that DataFrame. Spaces can be removed the same way, by replacing ' ' with ''. This was quite useful for me. Also, this does not change the dtype of the column, so you may have to convert it separately if required.

Splitting a string based on a certain set of words

I have a list of strings like this:
['happy_feet', 'happy_hats_for_cats', 'sad_fox_or_mad_banana','sad_pandas_and_happy_cats_for_people']
Given a keyword list like ['for', 'or', 'and'], I want to build another list in which any string that contains one of the keywords is split into multiple parts at that keyword.
For example, the above set would be split into
['happy_feet', 'happy_hats', 'cats', 'sad_fox', 'mad_banana', 'sad_pandas', 'happy_cats', 'people']
Currently I split each string on underscores, use a for loop to find the index of a keyword, and then rejoin the pieces with underscores. Is there a quicker way to do this?
>>> [re.split(r"_(?:f?or|and)_", s) for s in l]
[['happy_feet'],
['happy_hats', 'cats'],
['sad_fox', 'mad_banana'],
['sad_pandas', 'happy_cats', 'people']]
To combine them into a single list, you can use:
result = []
for s in l:
    result.extend(re.split(r"_(?:f?or|and)_", s))
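>>> import itertools, re
>>> pat = re.compile("_(?:%s)_" % "|".join(sorted(split_list, key=len, reverse=True)))
>>> list(itertools.chain.from_iterable(pat.split(line) for line in data))
will give you the desired output for the example dataset provided (split_list is the keyword list and data is the input list; sorting longest-first stops a shorter keyword from matching before a longer one that starts the same way).
Actually, with the _ delimiters you don't really need to sort by length, so you could just do:
>>> pat = re.compile("_(?:%s)_" % "|".join(split_list))
>>> list(itertools.chain.from_iterable(pat.split(line) for line in data))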
You could use a regular expression:
from itertools import chain
import re
pattern = re.compile(r'_(?:{})_'.format('|'.join([re.escape(w) for w in keywords])))
result = list(chain.from_iterable(pattern.split(w) for w in input_list))
The pattern is dynamically created from your list of keywords. The string 'happy_hats_for_cats' is split on '_for_':
>>> re.split(r'_for_', 'happy_hats_for_cats')
['happy_hats', 'cats']
but because we actually produced a set of alternatives (using the | metacharacter) you get to split on any of the keywords:
>>> re.split(r'_(?:for|or|and)_', 'sad_pandas_and_happy_cats_for_people')
['sad_pandas', 'happy_cats', 'people']
Each split result gives you a list of strings (just one if there was nothing to split on); using itertools.chain.from_iterable() lets us treat all those lists as one long iterable.
Demo:
>>> from itertools import chain
>>> import re
>>> keywords = ['for', 'or', 'and']
>>> input_list = ['happy_feet', 'happy_hats_for_cats', 'sad_fox_or_mad_banana','sad_pandas_and_happy_cats_for_people']
>>> pattern = re.compile(r'_(?:{})_'.format('|'.join([re.escape(w) for w in keywords])))
>>> list(chain.from_iterable(pattern.split(w) for w in input_list))
['happy_feet', 'happy_hats', 'cats', 'sad_fox', 'mad_banana', 'sad_pandas', 'happy_cats', 'people']
Another way of doing this, using only built-in methods, is to replace every occurrence of the keywords in ['for', 'or', 'and'] (wrapped in underscores) in each string with a placeholder string, for example _1_ (any string that does not otherwise occur in the data will do), and then, at the end of each iteration, split on that placeholder:
l = ['happy_feet', 'happy_hats_for_cats', 'sad_fox_or_mad_banana','sad_pandas_and_happy_cats_for_people']
replacement_s = '_1_'
lookup = ['for', 'or', 'and']
lookup = [x.join('_'*2) for x in lookup]  # x.join('__') puts x between two underscores: ['_for_', '_or_', '_and_']
results = []
for i, item in enumerate(l):
    for s in lookup:
        if s in item:
            l[i] = l[i].replace(s, replacement_s)  # note: this overwrites the entries of l
    results.extend(l[i].split(replacement_s))
print(results)
OUTPUT:
['happy_feet', 'happy_hats', 'cats', 'sad_fox', 'mad_banana', 'sad_pandas', 'happy_cats', 'people']
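The approach described in the question (split on underscores, find the keywords, rejoin) can also be written without manual index bookkeeping by grouping consecutive tokens with itertools.groupby. A sketch; split_on_keywords is a hypothetical helper name:

```python
from itertools import groupby

keywords = ['for', 'or', 'and']
input_list = ['happy_feet', 'happy_hats_for_cats', 'sad_fox_or_mad_banana', 'sad_pandas_and_happy_cats_for_people']


def split_on_keywords(strings, keywords):
    """Split each string at underscore-delimited keywords (hypothetical helper)."""
    keyword_set = set(keywords)
    result = []
    for s in strings:
        tokens = s.split('_')
        # groupby yields consecutive runs of tokens that are all keywords or all non-keywords.
        for is_keyword, run in groupby(tokens, key=lambda t: t in keyword_set):
            if not is_keyword:
                result.append('_'.join(run))  # rejoin each non-keyword run
    return result


print(split_on_keywords(input_list, keywords))
```

Unlike the regex versions, this also drops a keyword at the very start or end of a string (for example 'for_cats' becomes 'cats'), which may or may not be what you want.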
