Adding special characters to column names - python

I have a list of column names that are in string format like below:
lst = ["plug", "plug+wallet", "wallet-phone"]
I want to wrap each name in df[] along with quotes (" ' "), so that plug becomes df['plug'].
I am using a regex substitution for this. The regex I am using works fine when the list is like this:
lst = [" 'plug'", "'plug'+'wallet'", "'wallet'-'phone'"]
import re
x=[]
for l in lst: x.append(re.sub(r"('[^+\-*\/'\d]+')", r'df[\1]',l))
print(x)
The result is as expected:
x: [" df['plug']", "df['plug']+df['wallet']", "df['wallet']-df['phone']"]
But when the list is like this:
lst = ["plug", "plug+wallet", "wallet-phone"]
x=[]
y=[]
for l in lst: x.append(re.sub(r"('[^+\-*\/'\d]+')", r'\1',l))
for f in x: y.append(re.sub(r"('[^+\-*\/'\d]+')", r'df[\1]',f))
print(x)
print(y)
This gives:
['plug', 'plug+wallet', 'wallet-phone']
['plug', 'plug+wallet', 'wallet-phone']
Where am I going wrong? Am I missing anything in the first regex pattern or not passing the r'\1' properly?
Expected output:
x: [" 'plug'", "'plug'+'wallet'", "'wallet'-'phone'"]
y: [" df['plug']", "df['plug']+df['wallet']", "df['wallet']-df['phone']"]

This works:
import re
lst = ["plug", "plug+wallet", "wallet-phone"]
x = [re.sub(r"([^+\-*\/'\d]+)", r"'\1'", l) for l in lst]
y = [re.sub(r"('[^+\-*\/'\d]+')", r"df[\1]", l) for l in x]
print(x)
print(y)
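Your first regular expression only matched names that were already enclosed in quotes (the ' characters in the pattern), so on the bare input it matched nothing, and its replacement r'\1' would not have added the quotes anyway. The fixed first pattern leaves the quotes out of the match and adds them in the replacement r"'\1'".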
Tested under Python 3.8.0.
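The two substitutions from the answer can be wrapped in one helper so each column expression goes straight from bare names to df['...'] references. A minimal sketch; the name to_df_expr is hypothetical, and it keeps the answer's assumption that column names contain no operators, quotes or digits:

```python
import re

# A bare name: anything that is not an operator, a quote or a digit
# (the same character class the answer uses).
NAME = r"[^+\-*\/'\d]+"

def to_df_expr(expr):
    """Hypothetical helper: 'plug+wallet' -> "df['plug']+df['wallet']"."""
    quoted = re.sub(f"({NAME})", r"'\1'", expr)      # step 1: add quotes
    return re.sub(f"('{NAME}')", r"df[\1]", quoted)  # step 2: wrap in df[...]

lst = ["plug", "plug+wallet", "wallet-phone"]
print([to_df_expr(s) for s in lst])
```

Keeping the class in one constant means both steps stay in sync if you later widen or narrow what counts as a name.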

Related

Remove the part with a character and numbers connected together in a string

How do I remove an underscore and the digits that follow it from a string using Python?
For example,
Input: ['apple_3428','red_458','D30','green']
Expected output: ['apple','red','D30','green']
Thanks!
This should work:
my_list = ['apple_3428','red_458','D30','green']
new_list = []
for el in my_list:
    new_list.append(el.split('_')[0])
new_list will be ['apple', 'red', 'D30', 'green'].
This splits every element of my_list (assumed to be strings) on _ and keeps the first piece, i.e. the part before the first _. If an element contains no _, split returns it unchanged as the only piece. Note that everything after the first _ is dropped, whether or not it is a number.
Using regular expressions with re.sub:
import re
[re.sub(r"_\d+$", "", x) for x in ['apple_3428','red_458','D30','green']]
# ['apple', 'red', 'D30', 'green']
This will strip an underscore followed by only digits from the end of a string.
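The split answer and the regex answer agree on the sample list, but they diverge once a name has an underscore of its own. A small comparison, using the made-up inputs 'big_red_12' and 'green_new' alongside the original ones:

```python
import re

items = ['apple_3428', 'red_458', 'D30', 'green', 'big_red_12', 'green_new']

# split: keeps everything before the FIRST underscore, digits or not
by_split = [s.split('_')[0] for s in items]

# regex: removes only a trailing underscore followed by digits
by_regex = [re.sub(r"_\d+$", "", s) for s in items]

print(by_split)  # ['apple', 'red', 'D30', 'green', 'big', 'green']
print(by_regex)  # ['apple', 'red', 'D30', 'green', 'big_red', 'green_new']
```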
I am not sure which behavior is needed, so here are a few options.
Also, a list comprehension is generally considered more Pythonic than map + lambda (see "List comprehension vs map").
\d+ matches at least one digit.
\d* matches zero or more digits.
>>> import re
>>> list(map(lambda x: re.sub(r'_\d+$', '', x), ['green_', 'green_458aaa']))
['green_', 'green_458aaa']
>>> list(map(lambda x: re.sub(r'_\d*', '', x), ['green_', 'green_458aaa']))
['green', 'greenaaa']
>>> list(map(lambda x: re.sub(r'_\d+', '', x), ['green_', 'green_458aaa']))
['green_', 'greenaaa']
>>> list(map(lambda x: x.split('_', 1)[0], ['green_', 'green_458aaa']))
['green', 'green']
Try this:
input_list = ['apple_3428','red_458','D30','green']
output_list = [x.split('_')[0] for x in input_list]
Or, with an explicit loop:
output_list = []
for i in input_list:
    output_list.append(i.split('_', 1)[0])
You can simply split the string.
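If you want to avoid regular expressions but still remove only a numeric suffix, str.rpartition can do it. A sketch with a hypothetical helper name, strip_numeric_suffix; it uses str.isdecimal because that accepts the same Unicode decimal digits (category Nd) that \d matches in a str pattern:

```python
def strip_numeric_suffix(s):
    """Hypothetical helper: drop a final '_<digits>' part, if there is one."""
    head, sep, tail = s.rpartition('_')
    # sep is '' when there is no underscore; ''.isdecimal() is False,
    # so 'green_' is left alone
    if sep and tail.isdecimal():
        return head
    return s

input_list = ['apple_3428', 'red_458', 'D30', 'green']
print([strip_numeric_suffix(s) for s in input_list])  # ['apple', 'red', 'D30', 'green']
```

Unlike split, this keeps earlier underscores, so 'big_red_12' becomes 'big_red'.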

Not finding a good regex pattern to substitute the strings in the correct order (python)

I have a list of column names that are in string format like below:
lst = ["plug", "[plug+wallet]", "(wallet-phone)"]
Now I want to wrap each column name in df[] with quotes (" ' ") using a regex. My substitution works for plain names, but when the list contains a string like (wallet-phone), the brackets end up inside the quotes: df['(wallet']-df['phone)']. How do I get (df['wallet']-df['phone']) instead? Is my pattern wrong? Here is my code:
import re
lst = ["plug", "[plug+wallet]", "(wallet-phone)"]
x=[]
y=[]
for l in lst:
    x.append(re.sub(r"([^+\-*\/'\d]+)", r"'\1'", l))
for f in x:
    y.append(re.sub(r"('[^+\-*\/'\d]+')", r'df[\1]',f))
print(x)
print(y)
gives:
x:["'plug'", "'[plug'+'wallet]'", "'(wallet'-'phone)'"]
y:["df['plug']", "df['[plug']+df['wallet]']", "df['(wallet']-df['phone)']"]
Is the pattern wrong?
Expected output:
x:["'plug'", "['plug'+'wallet']", "('wallet'-'phone')"]
y:["df['plug']", "[df['plug']+df['wallet']]", "(df['wallet']-df['phone'])"]
I also tried the pattern ([^+\-*\/()[]'\d]+), but it still doesn't exclude () or [].
It might be easier to locate the words and wrap them in the column lookup df['...'] directly:
import re
lst = ["plug", "[plug+wallet]", "(wallet-phone)"]
z = [re.sub(r"(\w+)",r"df['\1']",w) for w in lst]
print(z)
["df['plug']", "[df['plug']+df['wallet']]", "(df['wallet']-df['phone'])"]
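The question's own pattern can also be repaired. Inside a character class an unescaped ] closes the class, so in ([^+\-*\/()[]'\d]+) the class ends right after the [, and the rest, '\d]+, is parsed outside the class. A sketch that escapes both brackets and otherwise keeps the question's two-step approach:

```python
import re

lst = ["plug", "[plug+wallet]", "(wallet-phone)"]

# Escaping \[ and \] keeps both brackets inside the negated class,
# so brackets and parentheses are never part of a name.
NAME = r"[^+\-*\/()\[\]'\d]+"

x = [re.sub(f"({NAME})", r"'\1'", s) for s in lst]   # quote the names
y = [re.sub(f"('{NAME}')", r"df[\1]", s) for s in x]  # wrap quoted names
print(x)
print(y)
```

Compared with \w+, this version still treats digits as separators, which matters if expressions such as plug*2 should keep the 2 as a literal.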

How to reverse a sublist in python

Given the following list:
a = ['aux iyr','bac oxr','lmn xpn']
c = []
for i in a:
    x = i.split(" ")
    b = x[1][::-1]  # got stuck after this line
Can anyone help me join it back into the list and get the expected output?
output = ['aux ryi','bac rxo','lmn npx']
I believe you need two lines of code. First, split the values:
b = [x.split() for x in a]
Which returns:
[['aux', 'iyr'], ['bac', 'oxr'], ['lmn', 'xpn']]
Then reverse the second word of each pair:
output = [x[0] +' '+ x[1][::-1] for x in b]
Which returns:
['aux ryi', 'bac rxo', 'lmn npx']
You can use the following simple comprehension:
[" ".join((x, y[::-1])) for x, y in map(str.split, a)]
# ['aux ryi', 'bac rxo', 'lmn npx']
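For completeness, the loop from the question only needs one more line to append the rebuilt string to c. A sketch that assumes, like the answers above, that every string holds exactly two space-separated words:

```python
a = ['aux iyr', 'bac oxr', 'lmn xpn']
c = []
for i in a:
    x = i.split(" ")
    b = x[1][::-1]            # reverse the second word
    c.append(x[0] + " " + b)  # rebuild "first reversed-second"
print(c)  # ['aux ryi', 'bac rxo', 'lmn npx']
```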

Ignore a sublist when it contains an empty string or a "$none"

I would like to completely ignore the sublists that contain an empty string or a "$none" string (by the way, why does this "$none" appear and what does it mean?). In my program, I return an empty string using this:
Code:
aaa = ["mom", "is", "king"]
example = ["buying", "mom", "is", "spending"]
Below code:
for x in aaa:
    if xx in example:
        if x in xx:
            return ""
        else:
            return xx
I only know how to return an empty string; I don't know another way to skip this case when the if is triggered.
If that cannot be done, then my main question is below.
My code:
a = [['checking-$none', ''],
['', 'checking-some'],
['checking-people', 'checking-might'],
['-checking-too', 'checking-be']]
for x in a:
    f = filter(None, x)
    for ff in f:
        print(ff)
Current output:
checking-$none
checking-some
checking-people
checking-might
-checking-too
checking-be
Expected output:
checking-people
checking-might
-checking-too
checking-be
Is there a way to do so?
You can use a list comprehension like this:
[item for lst in a if all(item and '$none' not in item for item in lst) for item in lst]
With your sample input a, this returns:
['checking-people', 'checking-might', '-checking-too', 'checking-be']
Alternatively, if you only want to print, the following nested for loop will do:
for lst in a:
    for item in lst:
        if not item or '$none' in item:
            break
    else:
        print(*lst, sep='\n')
This outputs:
checking-people
checking-might
-checking-too
checking-be
The minimal change to your code is to filter out the sublists that contain an empty string or "checking-$none":
a = [['checking-$none', ''],
['', 'checking-some'],
['checking-people', 'checking-might'],
['-checking-too', 'checking-be']]
f = filter(lambda y: '' not in y and "checking-$none" not in y, a)
for x in sum(f, []):
print(x)
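A variant of the filter answer, sketched under the assumption that any item merely containing $none (not only the exact string "checking-$none") should disqualify its sublist. It flattens with itertools.chain.from_iterable instead of sum(f, []), which builds a new list on every addition and so slows down quadratically as the number of sublists grows; keep is a hypothetical helper name:

```python
from itertools import chain

a = [['checking-$none', ''],
     ['', 'checking-some'],
     ['checking-people', 'checking-might'],
     ['-checking-too', 'checking-be']]

def keep(sublist):
    # Reject a sublist if any item is empty or contains "$none" anywhere
    return all(item and '$none' not in item for item in sublist)

# chain.from_iterable walks the kept sublists one after another
result = list(chain.from_iterable(filter(keep, a)))
print(result)  # ['checking-people', 'checking-might', '-checking-too', 'checking-be']
```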

Convert a List to a String in Python

I want to convert list data to a string.
My list data looks like this:
[['data1'],['data2'],['data3']]
I want to convert it to a string like this:
"[data1] [data2] [data3]"
I tried to use join like this:
data=[['data1'],['data2'],['data3']]
string=" ".join(data)
But I get this error:
string= " ".join(data)
TypeError: sequence item 0: expected string, list found
Can somebody help me?
Depending on how closely you want the output to conform to your sample, you have a few options, shown here in ascending order of complexity:
>>> data=[['data1'],['data2'],['data3']]
>>> str(data)
"[['data1'], ['data2'], ['data3']]"
>>> ' '.join(map(str, data))
"['data1'] ['data2'] ['data3']"
>>> ' '.join(map(str, data)).replace("'", '')
'[data1] [data2] [data3]'
Keep in mind that, if your given sample of data doesn't match your actual data, these methods may or may not produce the desired results.
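One way to avoid depending on what str() prints is to build each bracketed piece yourself. The replace("'", '') approach above would, for example, also delete the apostrophe in a value such as "it's". A sketch with a hypothetical helper, format_sublists, which also converts non-string items with str():

```python
data = [['data1'], ['data2'], ['data3']]

def format_sublists(rows, sep=","):
    """Hypothetical helper: [['a', 'b'], [1]] -> '[a,b] [1]'."""
    # map(str, row) lets rows contain ints or other non-string items
    return " ".join("[" + sep.join(map(str, row)) + "]" for row in rows)

print(format_sublists(data))                  # [data1] [data2] [data3]
print(format_sublists([["it's", 2], ['x']]))  # [it's,2] [x]
```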
Have you tried this?
data=[['data1'],['data2'],['data3']]
t = map(lambda x : str(x), data)
print(" ".join(t))
Live demo - https://repl.it/BOaS
In Python 3.x, the elements of the iterable passed to str.join() have to be strings.
The error you are getting (TypeError: sequence item 0: expected string, list found) occurs because the elements of the list you pass to str.join() are lists, since data is a list of lists.
If each sublist has only a single element, you can simply do:
" ".join(['[{}]'.format(x[0]) for x in data])
Demo -
>>> data=[['data1'],['data2'],['data3']]
>>> " ".join(['[{}]'.format(x[0]) for x in data])
'[data1] [data2] [data3]'
If the sublists can have multiple elements and you want those elements separated by a comma in the output, you can use a list comprehension inside str.join() to build the list of strings you want. Example:
" ".join(['[{}]'.format(','.join(x)) for x in data])
For a delimiter other than ',', use it in '<delimiter>'.join(x) instead.
Demo -
>>> data=[['data1'],['data2'],['data3']]
>>> " ".join(['[{}]'.format(','.join(x)) for x in data])
'[data1] [data2] [data3]'
For multiple elements in sublist -
>>> data=[['data1','data1.1'],['data2'],['data3','data3.1']]
>>> " ".join(['[{}]'.format(','.join(x)) for x in data])
'[data1,data1.1] [data2] [data3,data3.1]'
>>> import re
>>> l = [['data1'], ['data2'], ['data3']]
>>> s = ""
>>> for i in l:
...     s += re.sub(r"\'", "", str(i))
...
>>> s
'[data1][data2][data3]'
How about this?
data = [['data1'], ['data2'], ['data3']]
result = " ".join('[' + a[0] + ']' for a in data)
print(result)
How about this:
In [13]: a = [['data1'],['data2'],['data3']]
In [14]: import json
In [15]: temp = " ".join([json.dumps(x) for x in a]).replace("\"", "")
In [16]: temp
Out[16]: '[data1] [data2] [data3]'
Try the following. With functools.reduce you can concatenate the sublists into one flat list (note that this gives a list, not the requested string):
from functools import reduce
data = [['data1'], ['data2'], ['data3']]
print(list(reduce(lambda x,y : x+y, data)))
output: ['data1', 'data2', 'data3']
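Because reduce only flattens, the list still has to be turned into the requested string. A sketch of that last step, using operator.add in place of the lambda; it assumes each sublist holds a single item, since flattening loses the grouping:

```python
from functools import reduce
import operator

data = [['data1'], ['data2'], ['data3']]

# reduce(operator.add, ...) concatenates the sublists into one flat list
flat = reduce(operator.add, data)  # ['data1', 'data2', 'data3']
# the flat list still has to be formatted to get the string from the question
result = " ".join("[" + item + "]" for item in flat)
print(result)  # [data1] [data2] [data3]
```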
