Python iterator question - python

I have this list:
names = ['john','John','james','James','Jardel']
I want to loop over the list and handle consecutive names with a case-insensitive match in the same iteration. So in the first iteration I would do something with 'john' and 'John', and I want the next iteration to start at 'james'.
I can't think of a way to do this using Python's for loop, any suggestions?

This would be one for itertools.groupby, which groups consecutive equal elements from a list or other iterable. You can specify a key function to do the comparison, so that, in your case, the same name in different cases still counts as the same thing.
import itertools
for k, g in itertools.groupby(names, lambda s: s.lower()):
    # Example: in the first iteration:
    # k = "john"
    # g = an iterator over ["john", "John"]
    # Process them as you like
    print(k, list(g))
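As a minimal runnable sketch of the groupby approach described above (the variable name grouped is only illustrative), each group can be materialized into a list so it can be inspected or processed in one iteration:

```python
import itertools

names = ['john', 'John', 'james', 'James', 'Jardel']

# Group consecutive names whose lowercase forms are equal.
# Each group is an iterator, so it is turned into a list before being stored.
grouped = [(key, list(group))
           for key, group in itertools.groupby(names, key=str.lower)]

for key, members in grouped:
    print(key, members)
```

Note that groupby only merges neighbouring items, so a later 'JOHN' separated from the first two by other names would start a new group.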

names = ['john','John','james','James']
for name, capitalized_name in zip(names[::2], names[1::2]):
    print(name, capitalized_name)
Note that you need an even number of items for this to work properly.
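If the list may have an odd length, one possible workaround (a sketch, not part of the original answer) is itertools.zip_longest, which pads the shorter slice instead of dropping the unpaired item:

```python
from itertools import zip_longest

names = ['john', 'John', 'james', 'James', 'Jardel']

# Pair items at even and odd positions; the unpaired last item
# is matched with None instead of being silently dropped.
pairs = list(zip_longest(names[::2], names[1::2], fillvalue=None))

for first, second in pairs:
    print(first, second)
```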
Or (maybe better; hard to tell with little context) use a set to filter the list to contain only unique names (note that this loses order):
>>> names = ['john','John','james','James','Jardel']
>>> unique_names = set([x.lower() for x in names])
>>> for unique_name in unique_names:
...     print(unique_name)
...
jardel
james
john

You could just use a while loop:
i = 0
while i < len(names):
    # compare names[i] with names[i + 1]
    i = i + 2 # or + 1 if names not equal, for example
Or are you looking for something a bit more involved?
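One way to flesh out the while-loop idea is sketched below. The function name group_case_insensitive is hypothetical, and the sketch assumes that "matching" means equal after lower():

```python
def group_case_insensitive(names):
    """Collect runs of consecutive names that are equal ignoring case."""
    groups = []
    i = 0
    while i < len(names):
        j = i + 1
        # Advance j past every following name that matches names[i].
        while j < len(names) and names[j].lower() == names[i].lower():
            j += 1
        groups.append(names[i:j])
        i = j  # the next iteration starts at the first non-matching name
    return groups


print(group_case_insensitive(['john', 'John', 'james', 'James', 'Jardel']))
```

Here the step size is not fixed at 1 or 2: the index jumps over however many consecutive matches were found.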

As you iterate through the loop, you could try keeping track of the previous name in the list. At the same time, when you store the names, you can call lower() or capitalize() to make the formatting of each name consistent so that you can compare them more easily.
e.g.
first = True
prev = ""
for name in names:
    if first: # First iteration
        prev = name.lower() # Need to get the first elem
        do_something_to(name)
        first = False
    else:
        if prev == name.lower():
            print("do nothing")
        else:
            do_something_to(name)
            prev = name.lower()
May not be the most efficient, but works.

My $0.02:
def byPairs(li):
    for i in range(1, len(li), 2):
        yield (li[i-1], li[i])
for a,b in byPairs(names):
    if a.lower()==b.lower():
        doSomething(a,b)
I'm not sure I understood the question exactly; what are you trying to accomplish?

Related

Which item in list - Python

I am making a console game using python and I am checking if an item is in a list using:
if variable in list:
I want to check which variable in that list it was like list[0] for example. Any help would be appreciated :)
You can do it using the list method index, as follows:
list.index(variable)
Index gives you an integer that matches the location of the first appearance of the value you are looking for, and it will throw an error if the value is not found.
If you are already checking if the value is in the list, then within the if statement you can get the index by:
if variable in list:
    variable_at = list.index(variable)
Example:
foo = ['this','is','not','This','it','is','that','This']
if 'This' in foo:
    print(foo.index('This'))
Outputs:
3
Take a look at the answer below, which has more complete information.
Finding the index of an item in a list
We can take inspiration from other languages such as JavaScript and create a function that returns the index if the item exists, or -1 otherwise.
from typing import Any
list_ = [5, 6, 7, 8]
def check_element(alist: list, item: Any) -> int:
    if item in alist:
        return alist.index(item)
    else:
        return -1
and the usage is
check1 = check_element(list_, 5)
check2 = check_element(list_, 9)
and this one is for one line lovers
check_element_one_liner = lambda alist, item: alist.index(item) if item in alist else -1
alternative_check1 = check_element_one_liner(list_, 5)
alternative_check2 = check_element_one_liner(list_, 9)
and a bit shorter version :)
check_shorter = lambda a, i: a.index(i) if i in a else -1
Using a library, you could use NumPy's np.where(arr == variable), provided arr is a NumPy array rather than a plain list.
In vanilla Python, I can think of something like:
idx = [idx for idx, item in enumerate(list) if item == variable][0]
But this solution is not foolproof: for instance, if there are no matching results, it will raise an IndexError. You could guard it with an if right before:
if variable in list:
    idx = [idx for idx, item in enumerate(list) if item == variable][0]
else:
    idx = None
I understand that you want to get a sublist containing only the elements of the original list that match a certain condition (in your example case, you want to extract all the elements that are equal to the first element of the list).
You can do that by using the built-in filter function which allows you to produce a new list containing only the elements that match a specific condition.
Here's an example:
a = [1,1,1,3,4]
variable = a[0]
b = list(filter(lambda x : x == variable, a)) # [1,1,1]
This answer assumes that you only search for one (the first) matching element in the list.
Using the index method of a list should be the way to go. You just have to wrap it in a try-except statement. Here is an alternative version using next.
def get_index(data, search):
    return next((index for index, value in enumerate(data) if value == search), None)
my_list = list('ABCDEFGH')
print(get_index(my_list, 'C'))
print(get_index(my_list, 'X'))
The output is
2
None
Assuming that you want to check that it exists and get its index, the most efficient way is to use list.index. It returns the index of the first matching item, or raises a ValueError if there is none, so it can be used as follows:
items = [1,2,3,4,5]
item_index = None
try:
    item_index = items.index(3) # look for 3 in the list
except ValueError:
    # do item not found logic
    print("item not found") # example
else:
    # do item found logic knowing item_index
    print(items[item_index]) # example, prints 3
Also, please avoid naming variables list, as it shadows the built-in list type.
If you simply want to check whether the number is in the list and print it or print its index, you could try this:
ls = [1,2,3]
num = 2
if num in ls:
    # to print the num
    print(num)
    # to print the index of num
    print(ls.index(num))
else:
    print('Number not in the list')
animals = ['cat', 'dog', 'rabbit', 'horse']
index = animals.index('dog')
print(index)

Create a list from dictionary values and some conditions in python

I need some help with python and dictionary.
So the basic idea is to create a list that will contain several values from a Python dictionary.
I go through each key of the dict, and if the number of values is > 1, I check whether these values start with a particular prefix; if so, I put the values that do not have the prefix into a list.
Here is the dic:
defaultdict(<class 'list'>, {'ACTGA': ['YP_3878.3', 'HUYUII.1'], 'ACTGT': ['XP_46744.1', 'JUUIIL.2'], 'ACCTTGG': ['YP_8990.1'], 'ACCTTTT': ['YP_8992.1'], 'ATGA': ['YP_9000.1', 'YP_3222.1'], 'AAATTTGG': ['ORAAA.8', 'OTTTAX']})
and here is the prefix_list = ["XP_","YP_"]
Let me explain it better:
I would actually like to create a new sequence_list built from the values.
So the basic idea is to go through each key and, if there are > 1 values, put n-1 of the values into the sequence_list depending on some conditions.
Here is an example:
The first key is 'ACTGA', which has 2 values: YP_3878.3 and HUYUII.1. Because HUYUII.1 does not start with any prefix in the prefix_list, I put it into the sequence_list:
print(sequence_list):
["HUYUII.1"]
The second key is 'ACTGT', which has 3 values: XP_46744.1, JUUIIL.2 and JUUIIL.3. Because JUUIIL.2 and JUUIIL.3 do not start with any prefix in the prefix_list, I put them into the sequence_list:
print(sequence_list):
["HUYUII.1","JUUIIL.2","JUUIIL.3"]
The third key with more than one value is 'ATGAAA', which has 3 values: 'YP_9000.1', 'YP_3222.1' and 'HUU3222.1'. Because HUU3222.1 does not start with any prefix in the prefix_list, I put it into the sequence_list, AND because the 2 remaining values both have a prefix, I also put the first of them into the sequence_list:
print(sequence_list):
["HUYUII.1","JUUIIL.2","JUUIIL.3","YP_9000.1","HUU3222.1"]
The fourth key with more than one value is 'AAATTTGG', which has 2 values: 'ORLOP.8' and 'OTTTAX'. Because neither has a prefix from the prefix_list, I put the first one into the sequence_list:
print(sequence_list):
["HUYUII.1","JUUIIL.2","JUUIIL.3","YP_9000.1","HUU3222.1","ORAAA.8"]
So at the end I should get a sequence_list such as:
["HUYUII.1","JUUIIL.2","JUUIIL.3","YP_9000.1","HUU3222.1","ORAAA.8"]
Does someone have an idea? I tried something but it is quite difficult and maybe totally messy:
sequence_list=[]
for value in dedup_records.items():
    if(len(value[1]))>1:
        try:
            length=len(value[1])
            liste=value[1]
            print("liste1",liste)
            r = re.compile("YP_*.|XP_*.")
            newlist = list(filter(r.match, liste))
            if len(newlist)!=0:
                print(newlist)
                for i in newlist:
                    if i in liste:
                        liste.remove(i)
                while len(newlist)>1:
                    liste.remove(newlist[0])
            else:
                while len(liste)>1:
                    liste.pop(0)
            print(liste)
        except :
            continue
        for i in liste:
            sequence_list.append(i)
You can make your code much cleaner by using a function, so that it is easier to read what is happening inside the loop.
Also, as a personal preference, I'd suggest using list_ as a variable name instead of liste, as the misspelling can be tough to work with.
The approach is to first split every list into two groups: one with a prefix and one without. After that, we just need to check whether there is at least one item with a prefix. If so, append every prefixed item except the last one, and append all non-prefixed items. Otherwise, leave out one non-prefixed item (the last) and append all the others.
dedup_records = {'ACTGA': ['YP_3890.3', 'HUYUII.1'], 'ACTGT': ['XP_46744.1', 'JUUIIL.2','JUUIIL.3'], 'ACCTTGG': ['YP_8990.1'], 'ACCTTTT': ['YP_8992.1'], 'ATGAAA': ['YP_9000.1', 'YP_3222.1','HUU3222.1'], 'AAATTTGG': ['ORLOP.8', 'OTTTAX']}
prefix_list = ["XP_","YP_"]
def separate_items_with_prefix(list_, prefix_list):
    '''separates a list into two lists based on prefix
    returns two lists: one for items with prefix
    another for items without prefix
    '''
    with_prefix = []
    without_prefix = []
    for item in list_:
        if any(item.startswith(prefix) for prefix in prefix_list):
            with_prefix.append(item)
        else:
            without_prefix.append(item)
    return with_prefix, without_prefix
sequence_list = []
for val in dedup_records.values():
    if len(val) <= 1:
        continue #skip items with only upto 1 value in them
    with_prefix, without_prefix = separate_items_with_prefix(val, prefix_list)
    if with_prefix: #So there is at least 1 item in the list with prefix
        sequence_list.extend(with_prefix[:-1])
        sequence_list.extend(without_prefix)
    else: #there are no items with a prefix in the list
        sequence_list.extend(without_prefix[:-1])
Output:
print(sequence_list)
['HUYUII.1', 'JUUIIL.2', 'JUUIIL.3', 'YP_9000.1', 'HUU3222.1', 'ORLOP.8']
If I get your code right, you want to achieve this:
prefix_list = ["XP_", "YP_"]
sequence_list = []
have_interesting_prefix = lambda v: any(
    v.startswith(prefix) for prefix in prefix_list
)
for values in dedup_records.values():
    if len(values) > 1:
        sequence_list.extend(v for v in values if not have_interesting_prefix(v))
        # build a list, since filter() returns an iterator without len() in Python 3
        prefixed = [v for v in values if have_interesting_prefix(v)]
        if len(prefixed) > 1:
            sequence_list.append(prefixed[0])

If list index exists, do X

In my program, user inputs number n, and then inputs n number of strings, which get stored in a list.
I need to code such that if a certain list index exists, then run a function.
This is made more complicated by the fact that I have nested if statements about len(my_list).
Here's a simplified version of what I have now, which isn't working:
n = input ("Define number of actors: ")
count = 0
nams = []
while count < n:
    count = count + 1
    print "Define name for actor ", count, ":"
    name = raw_input ()
    nams.append(name)
if nams[2]: #I am trying to say 'if nams[2] exists, do something depending on len(nams)
    if len(nams) > 3:
        do_something
    if len(nams) > 4:
        do_something_else
if nams[3]: #etc.
Could it be more useful for you to use the length of the list len(n) to inform your decision rather than checking n[i] for each possible length?
I need to code such that if a certain list index exists, then run a function.
This is the perfect use for a try block:
ar=[1,2,3]
try:
    t=ar[5]
except IndexError:
    print('sorry, no 5')
# Note: this only is a valid test in this context
# with absolute (ie, positive) index
# a relative index is only showing you that a value can be returned
# from that relative index from the end of the list...
However, by definition, all items in a Python list between 0 and len(the_list)-1 exist (i.e., there is no need for a try block if you know 0 <= index < len(the_list)).
You can use enumerate if you want the indexes between 0 and the last element:
names=['barney','fred','dino']
for i, name in enumerate(names):
    print(str(i) + ' ' + name)
    if i in (3,4):
        # do your thing with the index 'i' or value 'name' for each item...
        pass
If you are looking for some defined 'index' though, I think you are asking the wrong question. Perhaps you should consider using a mapping container (such as a dict) versus a sequence container (such as a list). You could rewrite your code like this:
def do_something(name):
    print('some thing 1 done with ' + name)
def do_something_else(name):
    print('something 2 done with ' + name)
def default(name):
    print('nothing done with ' + name)
something_to_do={
    3: do_something,
    4: do_something_else
}
n = int(input("Define number of actors: "))
names = []
for count in range(n):
    name = input("Define name for actor {}: ".format(count+1))
    names.append(name)
for name in names:
    try:
        something_to_do[len(name)](name)
    except KeyError:
        default(name)
Runs like this:
Define number of actors: 3
Define name for actor 1: bob
Define name for actor 2: tony
Define name for actor 3: alice
some thing 1 done with bob
something 2 done with tony
nothing done with alice
You can also use .get method rather than try/except for a shorter version:
>>> something_to_do.get(3, default)('bob')
some thing 1 done with bob
>>> something_to_do.get(22, default)('alice')
nothing done with alice
It can be done simply using the following code:
if index < len(my_list):
    print(index, 'exists in the list')
else:
    print(index, "doesn't exist in the list")
len(nams) should be equal to n in your code. All indexes 0 <= i < n "exist".
Using the length of the list would be the fastest solution to check if an index exists:
def index_exists(ls, i):
    return (0 <= i < len(ls)) or (-len(ls) <= i < 0)
This also tests for negative indices, and most sequence types (Like ranges and strs) that have a length.
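As a short sketch of how such a check behaves on different sequence types (the two ranges above are merged into a single chained comparison here, which accepts the same indices):

```python
def index_exists(seq, i):
    """Return True if seq[i] would succeed for a sequence with a length."""
    # Valid indices run from -len(seq) up to len(seq) - 1 inclusive.
    return -len(seq) <= i < len(seq)


print(index_exists([5, 6, 7], 2))    # last valid positive index
print(index_exists([5, 6, 7], 3))    # one past the end
print(index_exists('abc', -3))       # negative index reaching the first item
print(index_exists(range(4), 3))     # ranges have a length too
```

An empty sequence has no valid index at all, so the function returns False for every i in that case.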
If you need to access the item at that index afterwards anyway, it is easier to ask forgiveness than permission, and it is also faster and more Pythonic. Use try/except:
try:
    item = ls[i]
    # Do something with item
except IndexError:
    # Do something without the item
    pass
This would be as opposed to:
if index_exists(ls, i):
    item = ls[i]
    # Do something with item
else:
    # Do something without the item
    pass
I need to code such that if a certain list index exists, then run a function.
You already know how to test for this and in fact are already performing such tests in your code.
The valid indices for a list of length n are 0 through n-1 inclusive.
Thus, a list has an index i if and only if the length of the list is at least i + 1.
If you want to iterate the inserted actors data:
for i in range(n):
    if len(nams[i]) > 3:
        do_something
    if len(nams[i]) > 4:
        do_something_else
OK, so I think it's actually possible (for the sake of argument):
>>> your_list = [5,6,7]
>>> 2 in list(zip(*enumerate(your_list)))[0]
True
>>> 3 in list(zip(*enumerate(your_list)))[0]
False
You can try something like this
list = ["a", "b", "C", "d", "e", "f", "r"]
for i in range(0, len(list), 2):
    print(list[i])
    if len(list) % 2 == 1 and i == len(list)-1:
        break
    print(list[i+1])
Oneliner:
do_X() if len(your_list) > your_index else do_something_else()
Full example:
In [10]: def do_X():
    ...:     print(1)
    ...:
In [11]: def do_something_else():
    ...:     print(2)
    ...:
In [12]: your_index = 2
In [13]: your_list = [1,2,3]
In [14]: do_X() if len(your_list) > your_index else do_something_else()
1
Just for info: IMHO, try ... except IndexError is a better solution.
Here's a simple, if computationally inefficient, way that I solved this problem today:
Just create a list of available indices in my_list with:
indices = [index for index, _val in enumerate(my_list)]
Then you can test before each block of code:
if 1 in indices:
    "do something"
if 2 in indices:
    "do something more"
but anyone reading this should really just take the correct answer from: #user6039980
Do not leave any space before your parentheses.
Example:
n = input ()
         ^
Tip:
Put comments above and/or below your code, not at the end of a code line.
Have a nice day.

How to remove duplicates only if consecutive in a string? [duplicate]

This question already has answers here:
Removing elements that have consecutive duplicates
(9 answers)
Closed 3 years ago.
For a string such as '12233322155552', by removing the duplicates, I can get '1235'.
But what I want to keep is '1232152', only removing the consecutive duplicates.
import re
# Only repeated numbers
answer = re.sub(r'(\d)\1+', r'\1', '12233322155552')
# Any repeated character
answer = re.sub(r'(.)\1+', r'\1', '12233322155552')
You can use itertools; here is the one-liner:
>>> import itertools
>>> s = '12233322155552'
>>> ''.join(i for i, _ in itertools.groupby(s))
'1232152'
Microsoft / Amazon job interview type of question:
This is the pseudocode; the actual code is left as an exercise.
for each char in the string do:
    if the current char is equal to the next char:
        delete next char
    else
        continue
return string
At a higher level, try (not an actual implementation):
for s in string:
    if s == s+1: ## check until the end of the string
        delete s+1
Hint: the itertools module is super-useful. One function in particular, itertools.groupby, might come in really handy here:
itertools.groupby(iterable[, key])
    Make an iterator that returns consecutive keys and groups from
    the iterable. The key is a function computing a key value for each
    element. If not specified or is None, key defaults to an identity
    function and returns the element unchanged. Generally, the iterable
    needs to already be sorted on the same key function.
So since strings are iterable, what you could do is:
use groupby to collect neighbouring elements
extract the keys from the iterator returned by groupby
join the keys together
which can all be done in one clean line.
First of all, you can't remove anything from a string in Python (google "Python immutable string" if this is not clear).
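A small sketch of what immutability means in practice: item assignment on a str raises TypeError, so "removing" a character really means building a new string (the variable names here are only illustrative):

```python
text = '12233322155552'

try:
    text[1] = 'x'  # item assignment is not supported on str
except TypeError as error:
    print('cannot modify in place:', error)

# Slicing builds a new string without the character at index 1;
# the original string is left unchanged.
shorter = text[:1] + text[2:]
print(shorter)
print(text)
```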
My first approach would be:
foo = '12233322155552'
bar = ''
for ch in foo:
    if bar == '' or ch != bar[-1]:
        bar += ch
or, using the itertools hint from above:
from itertools import groupby
''.join(k for k, _ in groupby(foo))
+1 for groupby. Off the cuff, something like:
from itertools import groupby
def remove_dupes(arg):
    # create generator of distinct characters, ignore grouper objects
    unique = (i[0] for i in groupby(arg))
    return ''.join(unique)
Works for me in Python 2.7.2
number = '12233322155552'
temp_list = []
for item in number:
    if len(temp_list) == 0:
        temp_list.append(item)
    elif len(temp_list) > 0:
        if temp_list[-1] != item:
            temp_list.append(item)
print(''.join(temp_list))
This would be a way:
def fix(a):
    chars = []
    for element in a:
        # fill the list if the list is empty
        if len(chars) == 0:
            chars.append(element)
        # check with the last element of the list
        if chars[-1] != element:
            chars.append(element)
    print(''.join(chars))
a = 'GGGGiiiiniiiGinnaaaaaProtijayi'
fix(a)
# output => GiniGinaProtijayi
import re
t = '12233322155552'
for i in t:
    dup = i+i
    t = re.sub(dup, i, t)
The final value of t is 1232152.

python look-and-say sequence improved

I would like to introduce the look-and-say sequence first. It goes like 1, 11, 21, 1211, 111221, ...
The rule is that each term describes the previous one by counting its runs of identical digits.
1 = one 1 (so = 11)
11 = two 1 (so = 21)
21 = one 2 one 1 (so = 1211)
As a rule of the sequence, no digit can go beyond 3, so creating a translation table could fit. But it is not semantic; I don't like it.
What I want is a script which evaluates the given value and returns a look-and-say-like string.
However, to go beyond those limits, I want it to evaluate characters too, so it can return 1A2b41.
I have been trying to make it work for hours, the logic went bad and I am having a brain freeze at the moment.
Here is the script, which doesn't actually work (it returns wrong results), but it can give you the idea, at least.
def seq(a):
    k,last,result,a = 1,'','',str(a)
    for i in range(len(a)):
        if last==a[i]:k+=1
        else:
            result = result+str(k)+a[i]
            k=1
        last = a[i]
    return result
You can use groupby, it's just what you want:
from itertools import groupby
def lookandsay(n):
    return ''.join( str(len(list(g))) + k for k, g in groupby(n))
>>> lookandsay('1')
'11'
>>> lookandsay('1A2b41')
'111A121b1411'
>>> lookandsay(lookandsay('1A2b41'))
'311A1112111b111421'
groupby returns consecutive keys and groups from an iterable object. The key is a function computed for each element, or an identity function if not specified (as above). The group is an iterator - a new group is generated when the value of the key function changes. So, for instance, according to the documentation:
# [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
# [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
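To see how the groupby behaviour described above produces the sequence itself, here is a small sketch that applies the step repeatedly. The helper names look_and_say and sequence are hypothetical and not from the answer:

```python
from itertools import groupby


def look_and_say(term):
    """Describe each run of identical characters as <count><character>."""
    return ''.join(str(len(list(group))) + key for key, group in groupby(term))


def sequence(start, count):
    """Return the first `count` terms, beginning with `start`."""
    terms = [start]
    while len(terms) < count:
        terms.append(look_and_say(terms[-1]))
    return terms


print(sequence('1', 5))
```

Starting from '1', this reproduces the terms listed in the question: 1, 11, 21, 1211, 111221.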
I can see two issues with your code:
The result is expanded by k and a[i] although the counter k does not count chars a[i] but chars last. Replace a[i] by last here (you may not want to add anything in the first round).
After the loop you have to add the last value of the counter together with the last character again (this was not yet done), i.e. add another result = result+str(k)+last after the loop.
In total it looks like
def seq(a):
    a = str(a)
    k,last,result = 1,a[0],''
    for i in range(1,len(a)):
        if last==a[i]:k+=1
        else:
            result = result+str(k)+last
            k=1
        last = a[i]
    result = result+str(k)+last
    return result
I think part of why you got stumped is your use of meaningless variable names. You described the problem quite well and called it by name, but didn't even use that name for your function.
If you think of the string you start with as "look", and the one you end up with as "say", that is a start. result is probably fine but a and k have confused you. last is, I think, misleading, because it can mean either previous or final.
Also, Python's for is really foreach for a reason -- you're taking each character in the "look" one at a time, so do it explicitly in the loop.
def looksay(look):
    look = str(look)
    prev, count, say = look[0], 1, ''
    for char in look[1:]:
        if char == prev:
            count += 1
            continue
        say += str(count) + prev
        prev = char
        count = 1
    return say + str(count) + prev
The spacing is less important, but Python does have a standard coding style, and it does help readability to use it. The less mental time you have to spend parsing your code, the more focus you have for the problem.
