How to remove everything from a list except words? - python

I have a list like this:
my_list=["'-\\n'",
"'81\\n'",
"'-\\n'",
"'0913\\n'",
"'Assistant nursing\\n'",
"'0533\\n'",
"'0895 Astronomy\\n'",
"'0533\\n'",
"'Astrophysics\\n'",
"'0532\\n'"]
Is there any way to delete everything from this list except the words?
Expected output:
my_list=['Assistant nursing',
'Astronomy',
'Astrophysics',]
I know that if I want to remove integers in string form, for example, I can do this:
no_integers = [x for x in my_list if not (x.isdigit()
or x[0] == '-' and x[1:].isdigit())]
but it doesn't work well enough.

The non-regex solution:
You can start by stripping off the characters '-\\n, then keep only the characters that are letters (using str.isalpha) or a space, then filter out the substrings that are empty (''). You may need to strip off the whitespace characters at the end, which str.strip does.
>>> list(filter(lambda x: x!='', (''.join(j for j in i.strip('\'-\\\\n') if j.isalpha() or j==' ').strip() for i in my_list)))
['Assistant nursing', 'Astronomy', 'Astrophysics']
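The one-liner above is dense, and because str.strip treats its argument as a set of characters, it would also eat a trailing letter n from a word that ends in one (for example 'Nutrition'). A more readable sketch of the same idea follows. keep_words is a hypothetical helper name, and the sketch assumes the unwanted suffix is always the literal two-character sequence backslash + n:

```python
def keep_words(items):
    """Return only the alphabetic parts of each item (hypothetical helper)."""
    words = []
    for item in items:
        # Assumption: the suffix is a literal backslash followed by 'n'.
        text = item.replace("\\n", "")
        # Keep letters and spaces, then trim leftover spaces at the edges.
        letters = "".join(ch for ch in text if ch.isalpha() or ch == " ").strip()
        if letters:
            words.append(letters)
    return words


sample = ["'-\\n'", "'81\\n'", "'Assistant nursing\\n'",
          "'0895 Astronomy\\n'", "'Nutrition\\n'"]
print(keep_words(sample))
```

Removing the suffix with str.replace instead of str.strip leaves the final n of a word such as 'Nutrition' intact.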
If you want to use regex, you can use the pattern '([A-Za-z].*?)\\\\n' with re.findall, then filter out the results that are empty lists; finally, you can flatten the list.
>>> import re
>>> list(filter(lambda x: x, [re.findall('([A-Za-z].*?)\\\\n', i) for i in my_list]))
[['Assistant nursing'], ['Astronomy'], ['Astrophysics']]
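The result above is still a list of one-element lists. One way to flatten it is sketched here with itertools.chain.from_iterable from the standard library; a shortened copy of my_list is used for brevity, and the pattern is written as a raw string, which is equivalent to the doubled backslashes above:

```python
import re
from itertools import chain

my_list = ["'-\\n'", "'81\\n'", "'Assistant nursing\\n'",
           "'0895 Astronomy\\n'", "'Astrophysics\\n'"]

# findall returns [] for items without letters, so chaining drops them.
matches = (re.findall(r"([A-Za-z].*?)\\n", item) for item in my_list)
flat = list(chain.from_iterable(matches))
print(flat)
```

Because empty lists contribute nothing when chained, the separate filter step is no longer needed.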

With regular expressions:
import re
# my_list is the list defined in the question above
# remove the literal \n sequence, -, digits, and ' symbols, then trim spaces
my_new_list = [re.sub(r"\\n|[\d\-']", '', s).strip() for s in my_list]
# remove empty strings
my_new_list = [s for s in my_new_list if s != '']
print(my_new_list)
Output
['Assistant nursing', 'Astronomy', 'Astrophysics']

Related

How can I keep the dot in a string while removing letters from the alphabet

I have a string: lst = 'sbs1.23444nroen'
I'm using lst2 = ''.join(filter(str.isdigit, lst)) to remove all the letters, so the result is lst2 = '123444'.
Is there any way to include the "." so that the result would be '1.23444', without the letters but keeping the dot?
A solution that is easier on the eye, and extendable if you want to include more characters:
s = 'sbs1.23444nroen'
toKeep = set('0123456789.')
s = ''.join(ch for ch in s if ch in toKeep)
print(s)
lst2 = ''.join(filter(lambda x: str.isdigit(x) or x=='.', lst))
An alternative solution would be to use a regular expression, although the set solution is the best so far.
>>> import re
>>> re.findall(r"\d+\.\d*", lst)
['1.23444']
With the added benefit of grabbing other groups of numbers as well:
>>> re.findall(r"\d+\.?\d*", "sbs1.23444nroe631n")
['1.23444', '631']
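The two approaches behave differently once the string holds more than one number. A small sketch of the difference, reusing the example string from the answer above (to_keep, filtered and groups are illustrative names):

```python
import re

s = "sbs1.23444nroe631n"

# Character filtering keeps every allowed character and joins them together.
to_keep = set("0123456789.")
filtered = "".join(ch for ch in s if ch in to_keep)

# findall keeps each run of digits (with an optional dot) as its own item.
groups = re.findall(r"\d+\.?\d*", s)

print(filtered)  # 1.23444631
print(groups)    # ['1.23444', '631']
```

If the input is known to contain a single number, the set-based filter is enough; if separate numbers must stay separate, the regular expression is probably the safer choice.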

Remove the part with a character and numbers connected together in a string

How to remove the part with "_" and numbers connected together in a string using Python?
For example,
Input: ['apple_3428','red_458','D30','green']
Expected output: ['apple','red','D30','green']
Thanks!
This should work:
my_list = ['apple_3428','red_458','D30','green']
new_list = []
for el in my_list:
    new_list.append(el.split('_')[0])
new_list will be ['apple', 'red', 'D30', 'green'].
Basically you split every element of my_list (which are supposed to be strings) and then you take the first, i.e. the part before the _. If _ is not present, the string will not be split.
Using regular expressions with re.sub:
import re
[re.sub(r"_\d+$", "", x) for x in ['apple_3428','red_458','D30','green']]
# ['apple', 'red', 'D30', 'green']
This will strip an underscore followed by only digits from the end of a string.
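The split and re.sub approaches agree on the example input but diverge when a name contains an underscore that is not followed by a trailing number. A short sketch, using a made-up list (names is an assumption for illustration, not data from the question):

```python
import re

names = ['apple_3428', 'dark_red_458', 'D30', 'green_tea']

# split keeps only the text before the FIRST underscore.
by_split = [n.split('_')[0] for n in names]

# re.sub removes only a trailing underscore-plus-digits suffix.
by_regex = [re.sub(r'_\d+$', '', n) for n in names]

print(by_split)  # ['apple', 'dark', 'D30', 'green']
print(by_regex)  # ['apple', 'dark_red', 'D30', 'green_tea']
```

Which result is wanted depends on whether underscores can appear inside the part that should be kept.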
I am not sure which behaviour is needed, so here are a few options.
Also, a list comprehension is preferable to map + lambda and is more Pythonic (see "List comprehension vs map").
\d+ stands for at least one digit
\d* stands for zero or more digits
>>> import re
>>> list(map(lambda x: re.sub(r'_\d+$', '', x), ['green_', 'green_458aaa']))
['green_', 'green_458aaa']
>>> list(map(lambda x: re.sub(r'_\d*', '', x), ['green_', 'green_458aaa']))
['green', 'greenaaa']
>>> list(map(lambda x: re.sub(r'_\d+', '', x), ['green_', 'green_458aaa']))
['green_', 'greenaaa']
>>> list(map(lambda x: x.split('_', 1)[0], ['green_', 'green_458aaa']))
['green', 'green']
Try this:
output_list = [x.split('_')[0] for x in input_list]
input_list = ['apple_3428','red_458','D30','green']
output_list = []
for i in input_list:
    output_list.append(i.split('_', 1)[0])
You can simply split the string.

Splitting a string into two with a comma using Python

I have the following data in a list; it is a hex number:
['aaaaa955554e']
I would like to split this into ['aaaaa9,55554e'], with a comma.
I know how to split this when there are delimiters in between, but how should I do it in this case?
Thanks
This will do what I think you are looking for:
yourlist = ['aaaaa955554e']
new_list = [','.join([x[i:i+6] for i in range(0, len(x), 6)]) for x in yourlist]
It will put a comma at every sixth character in each item in your list. (I am assuming you will have more than just one item in the list, and that the items are of unknown length. Not that it matters.)
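The same slicing idea can be wrapped in a small function so that the chunk width is not hard-coded. This is only a sketch; insert_commas and its width parameter are hypothetical names:

```python
def insert_commas(text, width=6):
    """Join fixed-width slices of text with commas (hypothetical helper)."""
    chunks = [text[i:i + width] for i in range(0, len(text), width)]
    return ','.join(chunks)


yourlist = ['aaaaa955554e', 'abcdefgh']
new_list = [insert_commas(x) for x in yourlist]
print(new_list)  # ['aaaaa9,55554e', 'abcdef,gh']
```

A final chunk shorter than width is kept as-is, so no characters are lost when the length is not a multiple of 6.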
I assume you want to split at every 6th character.
Using regex:
import re
lst = ['aaaaa955554e']
newlst = re.findall(r'\w{6}', lst[0])
# ['aaaaa9', '55554e']
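One caveat with an exact-width pattern: any leftover characters that do not fill a whole group of six are silently dropped. A sketch of the difference, using a slightly longer made-up string:

```python
import re

s = 'aaaaa955554e12'  # 14 characters: two full groups plus a remainder

exact = re.findall(r'\w{6}', s)       # remainder '12' is dropped
flexible = re.findall(r'\w{1,6}', s)  # remainder is kept as a short group

print(exact)     # ['aaaaa9', '55554e']
print(flexible)  # ['aaaaa9', '55554e', '12']
```

The {1,6} quantifier is greedy, so it still takes six characters whenever six are available.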
Using a list comprehension, which works for multiple items in lst:
lst = ['aaaaa955554e']
newlst = [item[i:i+6] for item in lst for i in range(0, len(item), 6)]
# ['aaaaa9', '55554e']
This could be done using a regular expression substitution as follows:
import re
print(re.sub(r'([a-zA-Z]+\d)(.*?)', r'\1,\2', 'aaaaa955554e', count=1))
Giving you:
aaaaa9,55554e
This splits after seeing the first digit.

Delete Extra Space from Values In List

I have a list of numbers in Python, all with an extra space at the front. How do I remove the extra space (so that only the numbers are left)? A short example is below (note the extra space):
List = [' 5432', ' 23421', ' 43242', .......]
For your case, with a list, you can use str.strip()
l = [x.strip() for x in List]
This will strip both trailing and leading spaces. If you only need to remove leading spaces, go with Alex' solution.
Use str.lstrip here, as the white-space is only at the front:
List = [s.lstrip() for s in List]
# ['5432', '23421', '43242', ...]
Or in this case, seeing as you know how many spaces there are you can just do:
List = [s[1:] for s in List]
list(map(str.strip, List))
or
list(map(lambda l: l.strip(), List))
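A side-by-side sketch of the options above, on a made-up list that also has trailing whitespace and a tab (values is an illustrative name). If the goal is real numbers rather than cleaned strings, int() already ignores surrounding whitespace:

```python
values = [' 5432', ' 23421 ', '\t43242']

both_sides = [s.strip() for s in values]    # removes leading and trailing whitespace
leading_only = [s.lstrip() for s in values]  # removes leading whitespace only
as_numbers = [int(s) for s in values]        # int() tolerates surrounding whitespace

print(both_sides)    # ['5432', '23421', '43242']
print(leading_only)  # ['5432', '23421 ', '43242']
print(as_numbers)    # [5432, 23421, 43242]
```

Slicing with s[1:] only makes sense when every item is known to start with exactly one unwanted character.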

python regular expression, pulling all letters out

Is there a better way to pull A and F from this: A13:F20
a="A13:F20"
import re
pattern = re.compile(r'\D+\d+\D+')
matches = re.search(pattern, a)
num = matches.group(0)
print(num[0])
print(num[-1])
output
A
F
note: the digits are of unknown length
You don't have to use regular expressions, or re at all. Assuming you want just letters to remain, you could do something like this:
a = "A13:F20"
a = ''.join(filter(str.isalpha, a))  # 'AF'
I'd do it like this:
>>> re.findall(r'[a-z]', a, re.IGNORECASE)
['A', 'F']
Use a simple list comprehension as a filter to get only the letters from the string.
print([char for char in input_string if char.isalpha()])
# ['A', 'F']
You could use re.sub:
>>> a="A13.F20"
>>> re.sub(r'[^A-Z]', '', a) # Remove everything apart from A-Z
'AF'
>>> re.sub(r'[A-Z]', '', a) # Remove A-Z
'13.20'
If you're working with strings that all have the same format, you can just cut out substrings:
a="A13:F20"
print(a[0], a[4])
More on python slicing in this answer:
Is there a way to substring a string in Python?
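Fixed positions such as a[4] break as soon as the digits have a different length, which the question says can happen. A sketch that captures the letter and digit parts separately with re.fullmatch, assuming the input always has the shape letters-digits-colon-letters-digits (the variable names are illustrative):

```python
import re

a = "A13:F20"

# Four groups: start letters, start digits, end letters, end digits.
match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", a)
if match:
    start_col, start_row, end_col, end_row = match.groups()
    print(start_col, end_col)  # A F
```

Since each part is matched with +, the same pattern also handles inputs such as "AB1:ZZ1000".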
