Remove double quotes and special characters from string list python - python

I'm pretty new to Python.
I have a list like this:
['SACOL1123', "('SA1123', 'AAW38003.1')"]
['SACOL1124', "('SA1124', 'AAW38004.1')"]
And I want to remove the extra double quotes and parentheses, so it looks like this:
['SACOL1123', 'SA1123', 'AAW38003.1']
['SACOL1124', 'SA1124', 'AAW38004.1']
This is what I managed to do:
newList = [s.replace('"(', '') for s in list]
newList = [s.replace(')"', '') for s in newList]
But the output is exactly like the input list. How can I do it?

This is possible using ast.literal_eval. The second element of each list is the string representation of a valid Python tuple, which you can safely evaluate:
[[x[0]] + list(ast.literal_eval(x[1])) for x in lst]
Code:
import ast
lst = [['SACOL1123', "('SA1123', 'AAW38003.1')"],
       ['SACOL1124', "('SA1124', 'AAW38004.1')"]]
output = [[x[0]] + list(ast.literal_eval(x[1])) for x in lst]
# [['SACOL1123', 'SA1123', 'AAW38003.1'],
# ['SACOL1124', 'SA1124', 'AAW38004.1']]
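If some rows might not hold a valid tuple string, the same idea can be wrapped in a small helper. This is only a sketch: flatten_row is a hypothetical name, and falling back to the raw string on a parse error is an assumption about what you would want in that case.

```python
import ast

def flatten_row(row):
    """Return [key, *items], where row[1] is the string form of a tuple."""
    key, packed = row
    try:
        values = ast.literal_eval(packed)
    except (ValueError, SyntaxError):
        # Not a valid Python literal: keep the raw string instead of failing.
        return [key, packed]
    if not isinstance(values, tuple):
        # A single literal such as "'SA1123'" evaluates to a plain value.
        values = (values,)
    return [key, *values]

lst = [['SACOL1123', "('SA1123', 'AAW38003.1')"],
       ['SACOL1124', "('SA1124', 'AAW38004.1')"]]
output = [flatten_row(row) for row in lst]
print(output)
```

ast.literal_eval only accepts Python literals, so it will not execute arbitrary code found in the string.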

This can be done by converting each item in the list to a string, substituting the punctuation with an empty string, and splitting the result on whitespace:
import re
List = [['SACOL1123', "('SA1123', 'AAW38003.1')"],
        ['SACOL1124', "('SA1124', 'AAW38004.1')"]]
New_List = []
for Item in List:
    # Remove brackets, quotes and commas, then split on the remaining spaces.
    New_List.append(re.sub(r'[\(\)\"\'\[\]\,]', '', str(Item)).split())
print(New_List)
Output: [['SACOL1123', 'SA1123', 'AAW38003.1'],
['SACOL1124', 'SA1124', 'AAW38004.1']]
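As for why the replace() attempt in the question left the list unchanged: the double quotes are only how Python displays a string that itself contains single quotes. They are not characters stored in the string, so '"(' is never found. A minimal sketch with plain string methods, assuming every packed value looks like the ones in the question (unpack is a hypothetical helper name):

```python
row = ['SACOL1123', "('SA1123', 'AAW38003.1')"]

# The stored string starts with "(" and contains no double-quote character.
print(row[1][0])       # (
print('"' in row[1])   # False

def unpack(text):
    """Split "('a', 'b')" into ['a', 'b'] using strip and split only."""
    inner = text.strip("()")
    return [part.strip(" '") for part in inner.split(",")]

flat = [row[0]] + unpack(row[1])
print(flat)
```

This breaks if a value itself contains a comma or a quote, which is where ast.literal_eval is the safer choice.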

Related

How to split a single value list to multiple values in same list

I tried a few workarounds, but none of them worked, so here is my question: how can I split a value in a list on a keyword and update the same list?
Here is my code:
result_list = ['48608541\ncsm_radar_main_dev-7319-userdevsigned\nLogd\nG2A0P3027145002X\nRadar\ncompleted 2022-10-25T10:43:01\nPASS: 12FAIL: 1SKIP: 1\n2:25:36']
I want to remove the '\n' and get something like this:
result_list = ['48608541', 'csm_radar_main_dev-7319-userdevsigned', 'Logd', 'G2A0P3027145002X', .....]
You need to split each string on \n, which gives a list of lists that you then need to flatten. You can use a list comprehension:
>>> [x for item in result_list for x in item.split('\n') ]
# output:
['48608541', 'csm_radar_main_dev-7319-userdevsigned', 'Logd', 'G2A0P3027145002X', 'Radar', 'completed 2022-10-25T10:43:01', 'PASS: 12FAIL: 1SKIP: 1', '2:25:36']
This will split each element of your list at \n and assign the result back to the same name:
result_list = [item for i in result_list for item in i.split('\n') ]
Solution using regex (re):
import re
result_list = re.split('\n', result_list[0])
#output:
['48608541', 'csm_radar_main_dev-7319-userdevsigned', 'Logd', 'G2A0P3027145002X', 'Radar', 'completed 2022-10-25T10:43:01', 'PASS: 12FAIL: 1SKIP: 1', '2:25:36']
The split() method of the str object does this:
Return a list of the words in the string, using sep as the delimiter string.
>>> '1,2,3'.split(',')
['1', '2', '3']
so here we have the answer as follows:
string_object = "48608541\ncsm_radar_main_dev-7319-userdevsigned\nLogd\nG2A0P3027145002X\nRadar\ncompleted 2022-10-25T10:43:01\nPASS: 12FAIL: 1SKIP: 1\n2:25:36"
result_list = string_object.split(sep='\n')
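A variation on the answers above, in case the text may also contain Windows-style \r\n line endings: str.splitlines() splits on any line boundary, and itertools.chain.from_iterable does the flattening. This is a sketch, not a replacement for the split('\n') answers; the name flat is only for illustration.

```python
from itertools import chain

result_list = ['48608541\ncsm_radar_main_dev-7319-userdevsigned\nLogd\nG2A0P3027145002X\nRadar\ncompleted 2022-10-25T10:43:01\nPASS: 12FAIL: 1SKIP: 1\n2:25:36']

# Split every element into lines, then flatten the pieces into one list.
flat = list(chain.from_iterable(item.splitlines() for item in result_list))
print(flat)
```

Unlike split('\n'), splitlines() does not produce an empty string at the end when the text finishes with a newline.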

Not finding a good regex pattern to substitute the strings in a correct order(python)

I have a list of column names that are in string format like below:
lst = ["plug", "[plug+wallet]", "(wallet-phone)"]
Now I want to wrap each column name in df['...'] using regex. My attempt does that, but when the list has a string like (wallet-phone), it gives output like df[('wallet']-df['phone')]. How do I get (df['wallet']-df['phone']) instead? Is my pattern wrong? Please see below:
import re
lst = ["plug", "[plug+wallet]", "(wallet-phone)"]
x=[]
y=[]
for l in lst:
    x.append(re.sub(r"([^+\-*\/'\d]+)", r"'\1'", l))
for f in x:
    y.append(re.sub(r"('[^+\-*\/'\d]+')", r'df[\1]',f))
print(x)
print(y)
gives:
x:["'plug'", "'[plug'+'wallet]'", "'(wallet'-'phone)'"]
y:["df['plug']", "df['[plug']+df['wallet]']", "df['(wallet']-df['phone)']"]
Is the pattern wrong?
Expected output:
x:["'plug'", "['plug'+'wallet']", "('wallet'-'phone')"]
y:["df['plug']", "[df['plug']+df['wallet']]", "(df['wallet']-df['phone'])"]
I also tried the pattern ([^+\-*\/()[]'\d]+), but it doesn't exclude () or [].
It might be easier to locate the words and wrap each one in a df[...] lookup:
import re
lst = ["plug", "[plug+wallet]", "(wallet-phone)"]
z = [re.sub(r"(\w+)",r"df['\1']",w) for w in lst]
print(z)
["df['plug']", "[df['plug']+df['wallet']]", "(df['wallet']-df['phone'])"]
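One thing to keep in mind with \w+ is that it also matches digits, so a numeric constant in an expression would be wrapped too. A sketch that only wraps identifier-like tokens, assuming column names start with a letter or underscore (wrap_columns and the "plug*2" input are hypothetical additions for illustration):

```python
import re

# Assumption: column names look like identifiers, so a bare number such
# as 2 is treated as a constant and left alone.
COLUMN = re.compile(r"\b([A-Za-z_]\w*)")

def wrap_columns(expr):
    """Wrap every identifier-like token in expr as df['token']."""
    return COLUMN.sub(r"df['\1']", expr)

lst = ["plug", "[plug+wallet]", "(wallet-phone)", "plug*2"]
print([wrap_columns(e) for e in lst])
```

The last item comes out as df['plug']*2, whereas the plain \w+ pattern would also turn the 2 into df['2'].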

How to extract strings between two markers for each object of a list in python

I have a list of strings. Each string contains both markers. I would like to extract the text between the two markers for each string in the list.
Example:
markers 'XXX' and 'YYY' --> therefore I want to extract 78665786 and 6866
['XXX78665786YYYjajk', 'XXX6866YYYz6767'....]
You can just loop over your list and grab the substring. You can do something like:
import re
my_list = ['XXX78665786YYYjajk', 'XXX6866YYYz6767']
output = []
for item in my_list:
    output.append(re.search('XXX(.*)YYY', item).group(1))
print(output)
Output:
['78665786', '6866']
import re
l = ['XXX78665786YYYjajk', 'XXX6866YYYz6767'....]
l = [re.search(r'XXX(.*)YYY', i).group(1) for i in l]
This should work
Another solution, assuming the part between the markers is the first run of digits in each string and you want it as an int:
import re
test_string=['XXX78665786YYYjajk','XXX78665783336YYYjajk']
int_val=[int(re.search(r'\d+', x).group()) for x in test_string]
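The split() method splits a string into parts, so you can split on each marker in turn:
list1 = ['XXX78665786YYYjajk', 'XXX6866YYYz6767']
list2 = []
for i in list1:
    # Take the text after the first "XXX", then the text before the next "YYY".
    after_start = i.split("XXX", 1)[1]
    list2.append(after_start.split("YYY", 1)[0])
print(list2)
The results are saved in list2.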
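The regex answers above use a greedy (.*) and assume every string contains both markers. A sketch that handles arbitrary marker text and missing markers, under the assumption that you want the shortest match after the first start marker (between is a hypothetical helper name, and the third list item is made up for illustration):

```python
import re

def between(text, start, end):
    """Return the text between the first start marker and the next end
    marker, or None if the markers are not found in that order."""
    # re.escape keeps markers containing characters like "." or "(" literal.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, text)
    return match.group(1) if match else None

my_list = ['XXX78665786YYYjajk', 'XXX6866YYYz6767', 'no markers here']
print([between(s, 'XXX', 'YYY') for s in my_list])
```

The non-greedy (.*?) stops at the first end marker, which matters when the end marker can appear more than once in a string.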

Does string contain any of the words in my list?

I want to check a string to see if it contains any of the words I have in my list.
The list has around 100 individual words.
I have tried using regex but can't get it to work...
string = '<div class="header_links">$$ - $$$, Dansk, Veganske retter, Glutenfri retter</div>'
list = ['Café','Afrikansk','............','Sushi','Svensk','Sydamerikansk','Syditaliensk','Szechuan','Taiwansk','Thai','Tibetansk','Østeuropæisk','Dansk']
In this case the string has 'Dansk' in it. The string could contain more than one of the words in the list.
I want to write a piece of code that prints the words in the list that are also in the string.
In this case the output should be: Dansk
If there were more than one word in the string it should be: Dansk, ...., ....
I hope someone can help.
>>> list = ['Café','Afrikansk','............','Sushi','Svensk','Sydamerikansk','Syditaliensk','Szechuan','Taiwansk','Thai','Tibetansk','Østeuropæisk','Dansk']
>>> string = """<div class="header_links">$$ - $$$, Dansk, Veganske retter, Glutenfri retter</div>"""
>>> [x for x in list if x in string]
['Dansk']
I recommend not using list as a variable name, as it shadows the built-in type list (like str or int).
Use a list comprehension with a membership check:
[x for x in lst if x in string]
Note that I have renamed your list to lst, as list is built-in.
Example:
string = '<div class="header_links">$$ - $$$, Dansk, Veganske retter, Glutenfri retter</div>'
lst = ['Café','Afrikansk','Sushi','Svensk','Sydamerikansk','Syditaliensk','Szechuan','Taiwansk','Thai','Tibetansk','Østeuropæisk','Dansk']
print([x for x in lst if x in string])
# ['Dansk']
In your case you can also use a set intersection (my_list here stands for your list of words):
string_intersection = set(string.replace(',', '').split()).intersection(my_list)
print(*string_intersection, sep =',')
output:
Dansk
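Note that `x in string` is a substring test, so a short word also matches inside a longer one. If only whole words should count, a regex with lookarounds is one option. This is a sketch: words_in_text is a hypothetical helper, the shortened word list is for illustration, and 'Thailandsk mad' is a made-up example text.

```python
import re

def words_in_text(words, text):
    """Return the entries of words that occur in text as whole words."""
    return [
        w for w in words
        # No word character directly before or after the match.
        if re.search(r"(?<!\w)" + re.escape(w) + r"(?!\w)", text)
    ]

string = '<div class="header_links">$$ - $$$, Dansk, Veganske retter, Glutenfri retter</div>'
lst = ['Café', 'Sushi', 'Thai', 'Østeuropæisk', 'Dansk']

print(', '.join(words_in_text(lst, string)))
print('Thai' in 'Thailandsk mad')                # True: substring match
print(words_in_text(['Thai'], 'Thailandsk mad')) # []: not a whole word
```

For about 100 words, one search per word should be fast enough; the order of the result follows the order of the word list.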

Splitting string into two by comma using python

I have the following data in a list; it is a hex number:
['aaaaa955554e']
I would like to split this into ['aaaaa9,55554e'] with a comma.
I know how to split a string when there are delimiters in it, but how should I do it in this case?
Thanks
This will do what I think you are looking for:
yourlist = ['aaaaa955554e']
new_list = [','.join([x[i:i+6] for i in range(0, len(x), 6)]) for x in yourlist]
It will put a comma after every sixth character in each item in your list. (I am assuming you will have more than just one item in the list, and that the items are of unknown length. Not that it matters.)
I assume you want to split the string every 6 characters.
Using regex:
import re
lst = ['aaaaa955554e']
newlst = re.findall(r'\w{6}', lst[0])
# ['aaaaa9', '55554e']
Using a list comprehension, which works for multiple items in lst:
lst = ['aaaaa955554e']
newlst = [item[i:i+6] for item in lst for i in range(0, len(item), 6)]
# ['aaaaa9', '55554e']
This could be done using a regular expression substitution as follows:
import re
print(re.sub(r'([a-zA-Z]+\d)(.*?)', r'\1,\2', 'aaaaa955554e', count=1))
Giving you:
aaaaa9,55554e
This splits after the first digit that follows a run of letters.
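The fixed-width approaches above differ in how they treat leftovers: re.findall with \w{6} drops a trailing piece shorter than six characters, while slicing keeps it. A sketch of the slicing version as a reusable helper, assuming a fixed width of 6 as in the answers (split_every is a hypothetical name, and the 14-character input is made up to show the leftover case):

```python
def split_every(text, size):
    """Split text into consecutive pieces of at most size characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]

lst = ['aaaaa955554e']

# Join the pieces of each item with a comma, as asked in the question.
print([','.join(split_every(item, 6)) for item in lst])

# A trailing piece shorter than 6 characters is kept, not dropped.
print(split_every('aaaaa955554e12', 6))
```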
