I have very messy data and I am trying to remove the elements that contain only letters or words, keeping the elements that are alphanumeric or numeric. I tried .isalpha(), but it is not working. How do I remove them?
lista = ['A8817-2938-228','12421','12323-12928-A','12323-12928',
'-','A','YDDEWE','hello','world','testing_purpose','testing purpose',
'A8232-2938-228','N7261-8271']
lista
Tried:
[i.isalnum() for i in lista] # gives boolean, but opposite of what I need.
Desired output:
['A8817-2938-228','12421','12323-12928-A','12323-12928','-','A8232-2938-228','N7261-8271']
Thanks!
You can add a condition to a list comprehension. A first attempt:
new_list = [i for i in lista if not i.isalnum()]
print(new_list)
Output:
['A8817-2938-228', '12323-12928-A', '12323-12928', '-', 'testing_purpose', 'testing purpose', 'A8232-2938-228', 'N7261-8271']
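Note that isalnum returns False if the string contains spaces, underscores, or hyphens, which is why 'testing_purpose' and 'testing purpose' are kept, while '12421' is dropped because it is purely alphanumeric. One option is to remove the spaces and underscores before checking, and to use isalpha instead of isalnum: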
new_list_2 = [i for i in lista if not i.replace(" ", "").replace("_", "").isalpha()]
print(new_list_2)
Output:
['A8817-2938-228', '12421', '12323-12928-A', '12323-12928', '-', 'A8232-2938-228', 'N7261-8271']
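The same filter can also be sketched with a regular expression instead of chained replace calls. This is only an illustration: the names WORDS_ONLY and drop_words are made up for this example, and the pattern assumes that "words" means ASCII letters, spaces, and underscores only (str.isalpha also accepts non-ASCII letters).

```python
import re

lista = ['A8817-2938-228', '12421', '12323-12928-A', '12323-12928',
         '-', 'A', 'YDDEWE', 'hello', 'world', 'testing_purpose',
         'testing purpose', 'A8232-2938-228', 'N7261-8271']

# Hypothetical pattern: one or more ASCII letters, spaces, or underscores.
WORDS_ONLY = re.compile(r"[A-Za-z _]+")

def drop_words(items):
    """Return the items that do not consist solely of letters, spaces, or underscores."""
    # fullmatch succeeds only if the whole string fits the pattern
    return [s for s in items if not WORDS_ONLY.fullmatch(s)]

filtered = drop_words(lista)
print(filtered)
```

With the sample list this should keep the same seven elements as the isalpha version above.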
Alternatively, it seems you can just test whether at least one character is a digit, or whether the element equals '-':
res = [i for i in lista if any(ch.isdigit() for ch in i) or i == '-']
print(res)
['A8817-2938-228', '12421', '12323-12928-A', '12323-12928',
'-', 'A8232-2938-228', 'N7261-8271']
What type is the data in your list?
If the elements are not all strings, you can try converting them first:
[str(i).isalnum() for i in lista]
Related
For a given string s='ab12dc3e6', I want to put the letter runs and the digit runs into two different lists. That is, the output I am trying to achieve is temp1=['ab','dc','e'] and temp2=['12','3','6'].
I am not able to do so with the following code. Can someone provide an efficient way to do it?
S = "ab12dc3e6"
temp=list(S)
x=''
temp1=[]
temp2=[]
for i in range(len(temp)):
    while i<len(temp) and (temp[i] and temp[i+1]).isdigit():
        x+=temp[i]
        i+=1
    temp1.append(x)
    if not temp[i].isdigit():
        break
You can also solve this without any imports:
S = "ab12dc3e6"
def get_adjacent_by_func(content, func):
    """Returns a list of runs of adjacent elements from content that fulfill func(...)"""
    result = [[]]
    for c in content:
        if func(c):
            # add to last inner list
            result[-1].append(c)
        elif result[-1]: # last inner list is filled
            # add new inner list
            result.append([])
    # return only non empty inner lists
    return [''.join(r) for r in result if r]
print(get_adjacent_by_func(S, str.isalpha))
print(get_adjacent_by_func(S, str.isdigit))
Output:
['ab', 'dc', 'e']
['12', '3', '6']
You can use a regex that groups letters and digits, then append the non-empty groups to the lists:
import re
S = "ab12dc3e6"
pattern = re.compile(r"([a-zA-Z]*)(\d*)")
temp1 = []
temp2 = []
for match in pattern.finditer(S):
    # extract words
    # don't append empty match
    if match.group(1):
        temp1.append(match.group(1))
        print(match.group(1))
    # extract numbers
    # don't append empty match
    if match.group(2):
        temp2.append(match.group(2))
        print(match.group(2))
print(temp1)
print(temp2)
Your code does nothing for isalpha - you also run into IndexError on
while i<len(temp) and (temp[i] and temp[i+1]).isdigit():
for i == len(temp)-1.
You can use itertools.takewhile and the correct string methods of str.isdigit and str.isalpha to filter your string down:
from itertools import takewhile, cycle
S = "ab12dc3e6"
# switch between the two test methods
c = cycle([str.isalpha, str.isdigit])
r = {}
# note: this loop assumes S contains only letters and digits;
# any other character would never be consumed
while S:
    what = next(c) # get next method to use
    k = ''.join(takewhile(what, S))
    S = S[len(k):]
    r.setdefault(what.__name__, []).append(k)
print(r)
Output:
{'isalpha': ['ab', 'dc', 'e'],
'isdigit': ['12', '3', '6']}
This essentially creates a dictionary where each separate list is stored under the function's name.
To get the lists, use r["isalpha"] or r["isdigit"].
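Another way to collect the runs is itertools.groupby, which groups consecutive characters that share the same key. The following is a rough sketch rather than a definitive solution; classify and split_runs are hypothetical helper names, and the choice to silently skip characters that are neither letters nor digits is an assumption.

```python
from itertools import groupby

def classify(ch):
    """Label a character as 'alpha', 'digit', or 'other'."""
    if ch.isalpha():
        return "alpha"
    if ch.isdigit():
        return "digit"
    return "other"

def split_runs(text):
    """Return (letter_runs, digit_runs) found in text, in order of appearance."""
    letters, digits = [], []
    # groupby yields one (key, iterator) pair per run of equal keys
    for kind, run in groupby(text, key=classify):
        chunk = "".join(run)
        if kind == "alpha":
            letters.append(chunk)
        elif kind == "digit":
            digits.append(chunk)
        # runs of any other character are skipped (assumption)
    return letters, digits

temp1, temp2 = split_runs("ab12dc3e6")
print(temp1)  # ['ab', 'dc', 'e']
print(temp2)  # ['12', '3', '6']
```

Unlike the takewhile loop, this version does not care whether the string starts with a letter or a digit.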
I have a list like this:
my_list=["'-\\n'",
"'81\\n'",
"'-\\n'",
"'0913\\n'",
"'Assistant nursing\\n'",
"'0533\\n'",
"'0895 Astronomy\\n'",
"'0533\\n'",
"'Astrophysics\\n'",
"'0532\\n'"]
Is there any way to delete everything from this list except words?
Desired output:
my_list=['Assistant nursing',
'Astronomy',
'Astrophysics',]
I know, for example, that if I want to remove integers in string form I can do this:
no_integers = [x for x in my_list if not (x.isdigit()
or x[0] == '-' and x[1:].isdigit())]
but it doesn't work well enough.
The non-regex solution:
You can start by stripping off the characters '-\\n, then keep only the characters that are letters (using str.isalpha) or a space, then filter out the sub-strings that are empty (''). You may need to strip off the white space characters in the end, whic
>>> list(filter(lambda x: x!='', (''.join(j for j in i.strip('\'-\\\\n') if j.isalpha() or j==' ').strip() for i in my_list)))
['Assistant nursing', 'Astronomy', 'Astrophysics']
If you want to use regex, you can use the pattern '([A-Za-z].*?)\\\\n' with re.findall, then filter out the results that are empty lists. Finally, you can flatten the list.
>>> import re
>>> list(filter(lambda x: x, [re.findall('([A-Za-z].*?)\\\\n', i) for i in my_list]))
[['Assistant nursing'], ['Astronomy'], ['Astrophysics']]
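The flattening step mentioned above is not shown, so here is one possible sketch using itertools.chain.from_iterable. It assumes the same my_list as in the question, where each element contains a literal backslash followed by n rather than a real newline; the variable names pattern and words are made up for this example.

```python
import re
from itertools import chain

my_list = ["'-\\n'", "'81\\n'", "'-\\n'", "'0913\\n'",
           "'Assistant nursing\\n'", "'0533\\n'", "'0895 Astronomy\\n'",
           "'0533\\n'", "'Astrophysics\\n'", "'0532\\n'"]

# Capture text that starts with a letter and runs up to the literal \n sequence.
pattern = re.compile(r"([A-Za-z].*?)\\n")

# findall returns a list per element (empty when nothing matches);
# chain.from_iterable concatenates those lists, so empty ones vanish.
words = list(chain.from_iterable(pattern.findall(s) for s in my_list))
print(words)  # ['Assistant nursing', 'Astronomy', 'Astrophysics']
```

Because empty lists contribute nothing when chained, the separate filter step is not needed here.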
With regular expressions:
import re
# my_list as defined above
# remove the literal \n sequence, '-', digits, and ' symbols
# (\\n is matched as a unit; inside a character class it would also delete every letter n)
my_new_list = [re.sub(r"\\n|[\d\-']", '', s).strip() for s in my_list]
# remove empty strings
my_new_list = [s for s in my_new_list if s != '']
print(my_new_list)
Output:
['Assistant nursing', 'Astronomy', 'Astrophysics']
Given an input string, check whether the reverse of any word also appears in the same string. If so, print that word; otherwise print $.
I split the string into a list of words and then reversed each word in that list. After that, I wasn't able to compare the two lists.
str = input()
x = str.split()
for i in x: # printing i shows the words in the list
    str1 = i[::-1] # printing str1 shows the reverse of words in a new list
    # now how to check if any word of the new list matches to any word of the old list
    if(i==str):
        print(i)
        break
    else:
        print('$')
Input: suman is a si boy.
Output: is ( since reverse of 'is' is present in the same string)
You almost have it; you just need to add another loop to compare each word against each reversed word. Try the following:
str = input()
x = str.split()
for i in x:
    str1 = i[::-1]
    for j in x: # <-- this is the new nested loop you are missing
        if j == str1: # compare each inverted word against each regular word
            if len(str1) > 1: # Potential condition if you would like to not include single letter words
                print(i)
Update
To only print the first occurrence of a match, you could, in the second loop, only check the elements that come after. We can do this by keeping track of the index:
str = input()
x = str.split()
for index, i in enumerate(x):
    str1 = i[::-1]
    for j in x[index+1:]: # <-- only consider words that are ahead
        if j == str1:
            if len(str1) > 1:
                print(i)
Note that I used index+1 in order to not consider single word palindromes a match.
a = 'suman is a si boy'
# Construct the list of words
words = a.split(' ')
# Construct the list of reversed words
reversed_words = [word[::-1] for word in words]
# Get an intersection of these lists converted to sets
print(set(words) & set(reversed_words))
will print:
{'si', 'is', 'a'}
Another way to do this is just in a list comprehension:
string = 'suman is a si boy'
output = [x for x in string.split() if x[::-1] in string.split()]
print(output)
The split on string creates a list split on spaces. Then the word is included only if the reverse is in the string.
Output is:
['is', 'a', 'si']
One note: you have a variable named str. It is best not to do that, because str is a Python built-in type and shadowing it could cause other issues in your code later on.
If you only want words that are more than one letter long, you can do:
string = 'suman is a si boy'
output = [x for x in string.split() if x[::-1] in string.split() and len(x) > 1]
print(output)
this gives:
['is', 'si']
Final Answer...
And for the final thought, in order to get just the 'is':
string = 'suman is a si boy'
seen = []
output = [x for x in string.split() if x[::-1] not in seen and not seen.append(x) and x[::-1] in string.split() and len(x) > 1]
print(output)
output is:
['is']
BUT, I don't believe this is necessarily a good way to do it. Basically, the list comprehension stores information in seen as a side effect AND reads from that same list while it runs. :)
This answer doesn't output 'a', and it outputs 'is' only once rather than both 'is' and 'si'.
str = input() #get input string
x = str.split() #returns list of words
y = [] #list of words
while len(x) > 0 :
    a = x.pop(0) #removes first item from list and returns it, then assigns it to a
    if a[::-1] in x: #checks if the reversed word is in the list of words
        #the list doesn't contain that word anymore so 'a' that doesn't show twice wouldn't be returned
        #and 'is' that is present with 'si' will be evaluated once
        y.append(a)
print(y) # ['is']
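None of the answers above prints $ when there is no match, which the question asks for. A minimal sketch that covers that case, under stated assumptions: first_reversed_match is a hypothetical name, only the first matching word is returned, and a palindrome (such as 'a') counts as a match only if it occurs at least twice.

```python
from collections import Counter

def first_reversed_match(sentence):
    """Return the first word whose reverse also occurs in sentence, else '$'."""
    words = sentence.split()
    counts = Counter(words)  # how often each word occurs
    for word in words:
        rev = word[::-1]
        # A palindrome equals its own reverse, so require a second copy (assumption).
        needed = 2 if rev == word else 1
        if counts[rev] >= needed:
            return word
    return "$"

print(first_reversed_match("suman is a si boy"))  # is
print(first_reversed_match("hello world"))        # $
```

Counting the words once up front avoids rescanning the list for every word, which the nested-loop versions do.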
The list ['a','a #2','a(Old)'] should become {'a'} because '#' and '(Old)' are to be excised and a list of duplicates isn't needed. I struggled to develop a list comprehension with a generator and settled on this since I knew it'd work and valued time more than looking good:
l = []
groups = ['a','a #2','a(Old)']
for i in groups:
    if ('#') in i: l.append(i[:i.index('#')].strip())
    elif ('(Old)') in i: l.append(i[:i.index('(Old)')].strip())
    else: l.append(i)
groups = set(l)
What's the slick way to get this result?
Here is a general solution if you want to strip the parts listed in wastes from the elements of the list lst:
lst = ['a','a #2','a(Old)']
wastes = ['#', '(Old)']
cleaned_set = {
    # each candidate is a prefix of element (up to whitespace), so min picks the shortest cut
    min([element.split(waste)[0].strip() for waste in wastes])
    for element in lst
}
You could write this whole expression in a single set comprehension
>>> groups = ['a','a #2','a(Old)']
>>> {i.split('#')[0].split('(Old)')[0].strip() for i in groups}
{'a'}
This will get everything preceding a # and everything preceding '(Old)', then trim off whitespace. The remainder is placed into a set, which only keeps unique values.
You could define a helper function to apply all of the splits and then use a set comprehension.
For example:
lst = ['a','a #2','a(Old)', 'b', 'b #', 'b(New)']
splits = {'#', '(Old)', '(New)'}
def split_all(a):
    for s in splits:
        a = a.split(s)[0]
    return a.strip()
groups = {split_all(a) for a in lst}
#{'a', 'b'}
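If the list of markers grows, a single regular expression built from them may be easier to maintain than repeated split calls. This is a sketch only: clean_names is a hypothetical helper, and it assumes every marker is a non-empty literal string (hence re.escape).

```python
import re

def clean_names(names, markers):
    """Cut each name at the first marker it contains and return the unique results."""
    # re.escape makes characters such as '(' and '#' match literally.
    # Assumption: markers is non-empty and contains no empty strings.
    pattern = re.compile("|".join(re.escape(m) for m in markers))
    # maxsplit=1 splits at the first marker only; [0] is the part before it.
    return {pattern.split(name, maxsplit=1)[0].strip() for name in names}

lst = ['a', 'a #2', 'a(Old)', 'b', 'b #', 'b(New)']
groups = clean_names(lst, ['#', '(Old)', '(New)'])
print(groups)
```

For the sample list this should produce the set containing 'a' and 'b'; set ordering in the printed output is not guaranteed.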
I have two lists containing strings of different lengths.
I want to check whether a string from one list is a substring of a string in the other list,
and create a new list with the same length as string_list.
string_list = ['expensive phone', 'big house', 'shiny key', 'wooden door']
substring_list = ['phone','door']
What I have done so far
newlist=[]
for i in string_list:
for j in substring_list:
if i in j:
newlist.append(j)
print newlist
So it gives me
newlist = ['phone', 'door']
But what I am trying to achieve is a list as following
newlist = ['phone', '-', '-', 'door']
for loops can take an else block. You can use this else block to append the '-' in the case where the string is not found:
newlist=[]
for i in string_list:
    for j in substring_list:
        if j in i:
            newlist.append(j)
            break
    else:
        newlist.append('-')
print(newlist)
# ['phone', '-', '-', 'door']
If you want the result to be same length as the first list, you need to put a break in the if so that if the two items are contained in one of the strings (e.g. 'expensive phone door'), you won't make two appends which will skew the resulting list length.
The break also ensures the else block of the for is not executed when an item has been found.
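The for/else logic can also be written as a list comprehension using next() with a default value. This is just an alternative sketch, not a replacement for the loop above; it likewise takes only the first matching substring for each string.

```python
string_list = ['expensive phone', 'big house', 'shiny key', 'wooden door']
substring_list = ['phone', 'door']

# next() returns the first substring found in s, or '-' if the generator is empty.
newlist = [
    next((sub for sub in substring_list if sub in s), '-')
    for s in string_list
]
print(newlist)  # ['phone', '-', '-', 'door']
```

Because next() stops at the first hit, a string such as 'expensive phone door' would still contribute exactly one entry, which keeps the result the same length as string_list.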