How to get string list index? - python

I have a list text_lines = ['asdf','kibje','ABC','beea'] and I need to find the index where the string 'ABC' appears.
ABC = [s for s in text_lines if "ABC" in s]
ABC is now ['ABC'].
How do I get the index?

First match only (raises StopIteration if not found):
index = next(i for i, s in enumerate(text_lines) if "ABC" in s)
Or, collect all of them:
indices = [i for i, s in enumerate(text_lines) if "ABC" in s]
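If a missing match should not raise, next accepts a default as its second argument. A minimal sketch, assuming None is an acceptable "not found" marker (the helper name first_index_containing is hypothetical, not from the answer above):

```python
def first_index_containing(lines, needle):
    """Return the index of the first string containing needle, or None."""
    # The second argument to next() is returned instead of raising StopIteration.
    return next((i for i, s in enumerate(lines) if needle in s), None)


text_lines = ['asdf', 'kibje', 'ABC', 'beea']
found = first_index_containing(text_lines, "ABC")
missing = first_index_containing(text_lines, "XYZ")
print(found, missing)  # 2 None
```

Note the extra parentheses around the generator expression, which are required once next is given a second argument.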

text_lines = ['asdf','kibje','ABC','beea']
abc_index = text_lines.index('ABC')
If 'ABC' appears only once, the above code works, because index gives the index of the first occurrence.
For multiple occurrences, see wim's answer.

Simple python list function.
index = text_lines.index("ABC")
If the match is more complicated, you may need to combine this with a regex, but for an exact match this simple solution is best.
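For the more complicated case, one possible way to combine enumerate with a regex is sketched below. The pattern and the extra sample entry 'abc-123' are made up for illustration and are not part of the question.

```python
import re

text_lines = ['asdf', 'kibje', 'ABC', 'beea', 'abc-123']
# Hypothetical pattern: "abc" in any letter case, optionally followed by a dash and digits.
pattern = re.compile(r'abc(-\d+)?', re.IGNORECASE)

# fullmatch requires the whole string to match the pattern.
regex_indices = [i for i, s in enumerate(text_lines) if pattern.fullmatch(s)]
print(regex_indices)  # [2, 4]
```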

Did you mean the index of "ABC"?
If there is just one "ABC", you can use the built-in index() method of list:
text_lines.index("ABC")
Otherwise, if there is more than one "ABC", you can use enumerate over the list:
indices = [idx for idx,val in enumerate(text_lines) if val == "ABC"]
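Keep in mind that list.index raises ValueError when the value is not in the list. A small sketch of guarding against that, assuming None is a suitable fallback (the helper name index_or_none is hypothetical):

```python
def index_or_none(items, value):
    """Return the index of the first occurrence of value, or None if absent."""
    try:
        return items.index(value)
    except ValueError:
        # list.index signals "not found" with ValueError.
        return None


text_lines = ['asdf', 'kibje', 'ABC', 'beea']
print(index_or_none(text_lines, "ABC"))  # 2
print(index_or_none(text_lines, "XYZ"))  # None
```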

Related

check if a nested list contains a substring

How to check if a nested list contains a substring?
strings = [[],["one", "two", "three"]]
substring = "wo"
strings_with_substring = [string for string in strings if substring in string]
print(strings_with_substring)
This script just prints:
[]
How do I fix it? The output should be:
two
==
Sayse, the solution you provided doesn't work for me. I am new to Python, so I am sure I am missing something here. Any thoughts?
import re
s = [[],["one", "two", "three"]]
substring = "wo"
# strings_with_substring = [string for string in strings if substring in string]
strings_with_substring = next(s for sl in strings for s in sl if substring in s)
print(strings_with_substring)
You are missing another level of iteration. Here is the looping logic without using a comprehension:
for sublist in strings:
    for item in sublist:
        if substring in item:
            print(item)
Roll that up to a comprehension:
[item for sublist in strings for item in sublist if substring in item]
You're looking for
next(s for sl in strings for s in sl if substring in s)
This outputs "two". If you want a list of all matching elements, use your list comprehension with the same two loops instead of next; likewise, change next to any if you just want a boolean result.
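The three variants mentioned here can be put side by side. This is only a sketch using the question's data; the variable names are chosen for illustration.

```python
strings = [[], ["one", "two", "three"]]
substring = "wo"

# All matches, as a list.
all_matches = [s for sl in strings for s in sl if substring in s]
# First match only; the default avoids StopIteration when nothing matches.
first_match = next((s for sl in strings for s in sl if substring in s), None)
# Boolean result: is there any match at all?
has_match = any(substring in s for sl in strings for s in sl)

print(all_matches, first_match, has_match)  # ['two'] two True
```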
Since you said it should just print the string, you could use itertools to flatten your list and run it through a filter that you loop over.
from itertools import chain
strings = [[], ['one', 'two', 'three']]
substring = 'wo'
for found in filter(lambda s: substring in s, chain.from_iterable(strings)):
    print(found)

list comprehension with substring replacement not working as intended

I have a list as follows:
alist = ['xx_comb', 'xx_combined', 'xxx_rrr', '123_comb']
I want to replace all occurrences of '_comb' with '_eeee' .
But not 'xx_combined'. Only if the word ends with '_comb', then the replacement should happen.
I tried
[sub.replace('_comb', '_eeee') for sub in alist if '_combined' not in sub]
But this does not work.
Only if the word ends with _comb, then the replacement should occur.
This is a job for .endswith rather than in (which tests for a substring). You should also use a conditional expression (ternary if) rather than the comprehension's trailing if, which filters elements out. That is:
alist = ['xx_comb', 'xx_combined', 'xxx_rrr', '123_comb']
result = [i.replace('_comb', '_eeee') if i.endswith('_comb') else i for i in alist]
print(result) # ['xx_eeee', 'xx_combined', 'xxx_rrr', '123_eeee']
The way your condition is written means that any value with _combined in it is not in your output list. Instead, you need to make the replace conditional on _combined not being in the value:
alist = ['xx_comb', 'xx_combined', 'xxx_rrr', '123_comb']
print([sub.replace('_comb', '_eeee') if '_combined' not in sub else sub for sub in alist])
Output:
['xx_eeee', 'xx_combined', 'xxx_rrr', '123_eeee']
Based on the wording of your question though, you might be better off using re.sub to replace _comb at the end of the string with _eeee:
import re
alist = ['xx_comb', 'xx_combined', 'xxx_rrr', '123_comb']
print([re.sub(r'_comb$', '_eeee', sub) for sub in alist])
Output:
['xx_eeee', 'xx_combined', 'xxx_rrr', '123_eeee']
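One caveat with str.replace is that it replaces every occurrence, so a value that contains '_comb' both in the middle and at the end would have both changed even when guarded by endswith. A sketch that touches only the suffix by slicing; the helper name swap_suffix and the extra value '_comb_x_comb' are made up for illustration.

```python
def swap_suffix(value, old, new):
    """Replace old with new only when value ends with old."""
    if old and value.endswith(old):
        # Cut off exactly the suffix, leaving earlier occurrences alone.
        return value[:-len(old)] + new
    return value


alist = ['xx_comb', 'xx_combined', 'xxx_rrr', '123_comb', '_comb_x_comb']
result = [swap_suffix(s, '_comb', '_eeee') for s in alist]
print(result)
# ['xx_eeee', 'xx_combined', 'xxx_rrr', '123_eeee', '_comb_x_eeee']
```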

Extracting the first word from every value in a list

So I have a long list of column headers. All are strings, some are several words long. I've yet to find a way to write a function that extracts the first word from each value in the list and returns a list of just those singular words.
For example, this is what my list looks like:
['Customer ID', 'Email','Topwater -https:', 'Plastics - some uml']
And I want it to look like:
['Customer', 'Email', 'Topwater', 'Plastics']
I currently have this:
def first_word(cur_list):
    my_list = []
    for word in cur_list:
        my_list.append(word.split(' ')[:1])
and it returns None when I run it on a list.
You can use a list comprehension to build a list of the first element of each string after splitting it on whitespace.
my_list = [x.split()[0] for x in your_list]
To address "and it returns None when I run it on a list."
You didn't return my_list. The function builds a new list and leaves the original cur_list unchanged, so without a return statement it returns None.
To extract the first word from every value in a list
Following @dfundako, you can simplify it to
my_list = [x.split()[0] for x in cur_list]
The final code would be
def first_word(cur_list):
    my_list = [x.split()[0] for x in cur_list]
    return my_list
Here is a demo. Please note that some punctuation may be left behind especially if it is right after the last letter of the name:
names = ["OMG FOO BAR", "A B C", "Python Strings", "Plastics: some uml"]
first_word(names) would be ['OMG', 'A', 'Python', 'Plastics:']
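If the trailing punctuation is unwanted, one option is to take only the leading run of word characters with a regex. This is a sketch under the assumption that "word" means letters, digits and underscores; the function name first_word_clean is hypothetical, and the empty string was added to the demo list to show that it does not raise.

```python
import re


def first_word_clean(values):
    """Return the leading run of word characters of each string ('' if none)."""
    result = []
    for value in values:
        # Skip leading whitespace, then capture letters/digits/underscores.
        match = re.match(r'\s*(\w+)', value)
        result.append(match.group(1) if match else '')
    return result


names = ["OMG FOO BAR", "A B C", "Python Strings", "Plastics: some uml", ""]
print(first_word_clean(names))  # ['OMG', 'A', 'Python', 'Plastics', '']
```

By contrast, x.split()[0] raises IndexError on an empty string, because split() returns an empty list there.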
>>> l = ['Customer ID', 'Email','Topwater -https://karls.azureedge.net/media/catalog/product/cache/1/image/627x470/9df78eab33525d08d6e5fb8d27136e95/f/g/fgh55t502_web.jpg', 'Plastics - https://www.bass.co.za/1473-thickbox_default/berkley-powerbait-10-power-worm-black-blue-fleck.jpg']
>>> list(next(zip(*map(str.split, l))))
['Customer', 'Email', 'Topwater', 'Plastics']
[column.split(' ')[0] for column in my_list] should do the trick.
and if you want it in a function:
def first_word(my_list):
    return [column.split(' ')[0] for column in my_list]
(?<=\d\d\d)\d* try using this in a loop to extract the words using regex

Splitting string into two by comma using python

I have the following data in a list, and it is a hex number:
['aaaaa955554e']
I would like to split this into ['aaaaa9,55554e'] with a comma.
I know how to split this when there are delimiters in between, but how should I do it in this case?
Thanks
This will do what I think you are looking for:
yourlist = ['aaaaa955554e']
new_list = [','.join([x[i:i+6] for i in range(0, len(x), 6)]) for x in yourlist]
It will put a comma at every sixth character in each item in your list. (I am assuming you will have more than just one item in the list, and that the items are of unknown length. Not that it matters.)
I assume you want to split after every 6th character.
Using regex:
import re
lst = ['aaaaa955554e']
newlst = re.findall(r'\w{6}', lst[0])
# ['aaaaa9', '55554e']
Using a list comprehension, which also works for multiple items in lst:
lst = ['aaaaa955554e']
newlst = [item[i:i+6] for item in lst for i in range(0, len(item), 6)]
# ['aaaaa9', '55554e']
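The regex and slicing approaches differ when the length is not a multiple of 6. A short sketch of that difference, using a made-up input that is two characters longer than the one in the question:

```python
import re

value = 'aaaaa955554e12'  # made-up input whose length is not a multiple of 6

# \w{6} only matches complete groups of six, so the trailing '12' is lost.
full_groups = re.findall(r'\w{6}', value)
# \w{1,6} is greedy, so it takes six characters where it can and keeps the rest.
with_remainder = re.findall(r'\w{1,6}', value)
# Slicing keeps the remainder as well.
sliced = [value[i:i + 6] for i in range(0, len(value), 6)]

print(full_groups)       # ['aaaaa9', '55554e']
print(with_remainder)    # ['aaaaa9', '55554e', '12']
print(','.join(sliced))  # aaaaa9,55554e,12
```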
This could be done using a regular expression substitution as follows:
import re
print(re.sub(r'([a-zA-Z]+\d)(.*?)', r'\1,\2', 'aaaaa955554e', count=1))
Giving you:
aaaaa9,55554e
This splits after seeing the first digit.

Removing character in list of strings

If I have a list of strings such as:
[("aaaa8"),("bb8"),("ccc8"),("dddddd8")...]
What should I do in order to get rid of all the 8s in each string? I tried using strip or replace in a for loop, but it doesn't work the way it does on a normal string (one that is not in a list). Does anyone have a suggestion?
Try this:
lst = [("aaaa8"),("bb8"),("ccc8"),("dddddd8")]
print([s.strip('8') for s in lst]) # remove the 8 from the string borders
print([s.replace('8', '') for s in lst]) # remove all the 8s
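The difference between the string methods matters once an 8 appears somewhere other than the end. A quick sketch with a made-up value:

```python
s = '8aa8a88'  # made-up value with 8s at the start, middle and end

both_ends = s.strip('8')        # removes 8s from both borders only
right_end = s.rstrip('8')       # removes 8s from the right border only
everywhere = s.replace('8', '')  # removes every 8

print(both_ends)   # aa8a
print(right_end)   # 8aa8a
print(everywhere)  # aaa
```

All three return a new string, since Python strings are immutable. A loop that calls them without storing the result leaves the list unchanged, which may be what happened in the question's attempt.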
Besides using a loop or a list comprehension, you could also use map (in Python 3, wrap it in list() to get a list back):
lst = [("aaaa8"),("bb8"),("ccc8"),("dddddd8")]
mylst = list(map(lambda each: each.strip("8"), lst))
print(mylst)
Another way is to join the list, replace 8, and split the new string (this assumes none of the strings contain whitespace):
mylist = [("aaaa8"),("bb8"),("ccc8"),("dddddd8")]
mylist = ' '.join(mylist).replace('8','').split()
print(mylist)
mylist = [("aaaa8"),("bb8"),("ccc8"),("dddddd8")]
print(mylist)
j = 0
for i in mylist:
    # Overwrite each element in place with its trailing 8s removed
    mylist[j] = i.rstrip("8")
    j += 1
print(mylist)
Here's a short one-liner using regular expressions (it needs import re first):
import re
print([re.compile(r"8").sub("", m) for m in mylist])
If we separate the regex operations and improve the naming:
pattern = re.compile(r"8") # Create the regular expression to match
res = [pattern.sub("", match) for match in mylist] # Remove match on each element
print(res)
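Another option not mentioned in the answers above is str.translate, which deletes characters through a translation table. A minimal sketch, assuming the list holds plain strings as in the question:

```python
mylist = ["aaaa8", "bb8", "ccc8", "dddddd8"]

# The third argument of str.maketrans lists characters to delete.
table = str.maketrans('', '', '8')
res = [s.translate(table) for s in mylist]
print(res)  # ['aaaa', 'bb', 'ccc', 'dddddd']
```

The same table can delete several characters at once, for example str.maketrans('', '', '89').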
lst = [("aaaa8"),("bb8"),("ccc8"),("dddddd8")...]
msg = filter(lambda x : x != "8", lst)
print(list(msg))
EDIT:
For anyone who comes across this post: the code above only removes elements of the list that are equal to "8".
With the example above, the first element ("aaaa8") is not equal to "8", so it is kept unchanged and its 8 is not removed.
To make this work the way the question intended, the filter has to run over the characters of each string rather than over the list elements, something like this:
msg = map(lambda y: list(filter(lambda x: x != "8", y)), lst)
I am not in an interpreter at the moment, so mileage may vary. Note that ("aaaa8") is just a string (the parentheses do not make a tuple), so no indexing such as y[0] is needed.
What this does is split each element of the list into its characters, so ("aaaa8") becomes ["a", "a", "a", "a", "8"], and the "8" is then filtered out.
This results in data that looks like this:
msg = [["a", "a", "a", "a"], ["b", "b"]...]
So finally, to wrap that up, we have to join each group of characters back into a string:
msg = list(map(lambda y: ''.join(filter(lambda x: x != "8", y)), lst))
I would absolutely not recommend it, but if you really want to play with map and filter, that is how I think you could do it in a single line.
