I am working on a task where I need to sort and remove the duplicate letters in a string. I ended up with the function doing what I wanted, but out of sheer luck. I don't know why these versions of the code produce different outputs. Could someone help me understand?
def format_string(string1):
    sorted1 = sorted(string1)
    print(sorted1)
    i = 0
    while i < len(sorted1) - 1:
        if sorted1[i] == sorted1[i + 1]:
            del sorted1[i + 1]
        else:
            i += 1
    return sorted1
print(format_string("aretheyhere"))
['a', 'e', 'e', 'e', 'e', 'h', 'h', 'r', 'r', 't', 'y']
['a', 'e', 'h', 'r', 't', 'y']
This did what I wanted, but these seemingly similar versions don't:
def format_string(string1):
    sorted1 = sorted(string1)
    print(sorted1)
    i = 0
    j = i + 1
    while i < len(sorted1) - 1:
        if sorted1[i] == sorted1[j]:
            del sorted1[j]
        else:
            i += 1
    return sorted1
print(format_string("aretheyhere"))
['a', 'e', 'e', 'e', 'e', 'h', 'h', 'r', 'r', 't', 'y']
['a', 'y']
def format_string(string1):
    sorted1 = sorted(string1)
    print(sorted1)
    i = 0
    while i < len(sorted1) - 1:
        if sorted1[i] == sorted1[i + 1]:
            del sorted1[i + 1]
        i += 1
    return sorted1
print(format_string("aretheyhere"))
['a', 'e', 'e', 'e', 'e', 'h', 'h', 'r', 'r', 't', 'y']
['a', 'e', 'e', 'h', 'r', 't', 'y']
What are the crucial differences here that change the output?
The variable j doesn't increment because it's never updated inside your while loop: j = i + 1 runs once, before the loop, and stores the value 1. Changing i afterwards does not change j. So once i reaches 1, sorted1[1] is compared with itself, which is always equal, and the loop keeps deleting index 1 until only two elements are left. For example, this function gives the same result as the first one, because j is recomputed inside the while loop:
def format_string(string1):
    sorted1 = sorted(string1)
    print(sorted1)
    i = 0
    while i < len(sorted1) - 1:
        j = i + 1
        if sorted1[i] == sorted1[j]:
            del sorted1[j]
        else:
            i += 1
    return sorted1
print(format_string("aretheyhere"))
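The third version goes wrong for a different reason: it increments i even after a deletion, so the element that slides into position i + 1 is never compared against sorted1[i], and runs of three or more equal letters are not fully collapsed. A minimal sketch that re-runs that version and records each comparison (the name `third_version_traced` is hypothetical); the last line shows `sorted(set(...))`, a common idiomatic alternative when you only need the sorted unique letters:

```python
def third_version_traced(string1):
    """Re-run of the third format_string version that records each comparison."""
    sorted1 = sorted(string1)
    comparisons = []
    i = 0
    while i < len(sorted1) - 1:
        comparisons.append((i, sorted1[i], sorted1[i + 1]))
        if sorted1[i] == sorted1[i + 1]:
            del sorted1[i + 1]  # the next element now slides into position i + 1
        i += 1  # advances even after a deletion, so that shifted element is never checked
    return sorted1, comparisons

result, comparisons = third_version_traced("aretheyhere")
for step in comparisons:
    print(step)
print(result)                      # ['a', 'e', 'e', 'h', 'r', 't', 'y']
print(sorted(set("aretheyhere")))  # ['a', 'e', 'h', 'r', 't', 'y']
```

The trace shows index 2 being compared only after one 'e' was already removed, which is why two 'e's survive.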
I have this code:
steps = [['A', 'B', 'C', 'C', 'C'], ['D', 'E', 'F', 'F', 'F']]
for step in steps:
    while True:
        last_item = ""
        for item in step:
            if item != last_item:
                print(item)
                last_item = item
            else:
                break
The desired result is for the loop to print A, then B, then C, but when hitting the first duplicate C it should move on to printing D, then E, then F, and then stop when hitting the first duplicate F.
This is a minimal reproducible example of a loop to be used in a web scraping job, so solutions that involve doing set(steps) or other operations on the example steps as such will not solve it. My question has to do with the architecture of the loop.
steps = [['A', 'B', 'C', 'C', 'C'], ['D', 'E', 'F', 'F', 'F']]
for step in steps:
    last_item = ""
    for item in step:
        if item != last_item:
            print(item)
            last_item = item
        else:
            break
When you keep while True, the break in the inner for loop only exits that for loop; the while True loop then starts it again on the same step. Control never passes back to the outer for loop to fetch the next item in the outer list (['D', 'E', 'F', 'F', 'F']), so you get an infinite loop.
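To see that break leaves only the innermost loop, here is a bounded version of the original code in which a counter stands in for `while True` (the cap of 3 passes and the `printed` list are illustrative additions, not part of the question). Each pass over the first step stops at the duplicate C and is immediately restarted by the while loop; with a real `while True` there is no cap, so D, E and F are never reached:

```python
steps = [['A', 'B', 'C', 'C', 'C'], ['D', 'E', 'F', 'F', 'F']]
printed = []
for step in steps:
    passes = 0
    while passes < 3:  # stand-in for `while True`, capped so the demo terminates
        passes += 1
        last_item = ""
        for item in step:
            if item != last_item:
                printed.append(item)
                last_item = item
            else:
                break  # exits only this inner for loop; the while loop runs it again
print(printed)
```

`printed` holds A, B, C three times before D, E, F appear, one sequence per pass of the while loop.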
Option with while loop, accessing objects by index:
steps = [['A', 'B', 'C', 'C', 'C'], ['D', 'E', 'F', 'F', 'F']]
i = 0
ii = 0
memo = []
res = []
while True:
    if i == len(steps): break
    e = steps[i][ii]
    if e in memo:
        res.append(memo)
        memo = []
        ii = 0
        i += 1
    else:
        memo.append(e)
        print(e)
        ii += 1
It prints out:
# A
# B
# C
# D
# E
# F
And the value of res is:
print(res) #=> [['A', 'B', 'C'], ['D', 'E', 'F']]
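The index-based loop above assumes every sublist contains a repeated item; if one doesn't, `steps[i][ii]` runs past the end of that sublist and raises IndexError. A sketch with a bounds check added, which treats the end of a sublist like a repeat (the third sublist `['G', 'H']` is a hypothetical addition to exercise that case):

```python
steps = [['A', 'B', 'C', 'C', 'C'], ['D', 'E', 'F', 'F', 'F'], ['G', 'H']]
i = 0
ii = 0
memo = []
res = []
while i < len(steps):
    # Close the current group on a repeat OR when the sublist runs out.
    if ii == len(steps[i]) or steps[i][ii] in memo:
        res.append(memo)
        memo = []
        ii = 0
        i += 1
    else:
        memo.append(steps[i][ii])
        print(steps[i][ii])
        ii += 1
print(res)  # [['A', 'B', 'C'], ['D', 'E', 'F'], ['G', 'H']]
```

Note that `in memo` checks against every earlier item of the sublist, not only the previous one, which matches the answer's behavior.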
You do not need while True. Apart from that, your code works as expected:
steps = [['A', 'B', 'C', 'C', 'C'], ['D', 'E', 'F', 'F', 'F']]
for step in steps:
    # while True:
    last_item = ""
    for item in step:
        if item != last_item:
            print(item)
            last_item = item
        else:
            break
Output:
A
B
C
D
E
F
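Remove the while loop from your code. The break below only exits the loop it sits in (the inner for loop), so with the while loop present, control just goes back to while True. To achieve your desired output, break needs to end the inner for loop and hand control back to the outer for loop above.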
steps = [['A', 'B', 'C', 'C', 'C'], ['D', 'E', 'F', 'F', 'F']]
for step in steps:
    # while True:
    last_item = ""
    for item in step:
        if item != last_item:
            print(item)
            last_item = item
        else:
            break
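Since the real loop belongs to a scraping job, one option for the loop's architecture is to move the stop-at-first-repeat rule into a small generator so the outer loop stays flat and break never needs to cross loop levels. A sketch, where `until_repeat` is a hypothetical helper name; it accepts any iterable, so the items could also arrive lazily:

```python
def until_repeat(items):
    """Yield items until one equals the item just before it."""
    last_item = object()  # sentinel that compares unequal to any real item
    for item in items:
        if item == last_item:
            return  # first consecutive duplicate: stop this sublist only
        yield item
        last_item = item

steps = [['A', 'B', 'C', 'C', 'C'], ['D', 'E', 'F', 'F', 'F']]
for step in steps:
    for item in until_repeat(step):
        print(item)
```

Using `object()` instead of `""` as the starting value avoids a false stop if an empty string is ever a legitimate first item.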
How can I remove all occurrences of a specific value in a list except for the first occurrence?
E.g. I have a list:
letters = ['a', 'b', 'c', 'c', 'c', 'd', 'c', 'a', 'a', 'c']
And I need a function that looks something like this:
preserve_first(letters, 'c')
And returns this:
['a', 'b', 'c', 'd', 'a', 'a']
That is, remove all but the first occurrence of the given value while otherwise preserving the order. If there is a way to do this with a pandas.Series, that would be even better.
You want to remove duplicates of 'c' only, so filter for elements that are either not a duplicate (i.e. the first occurrence of their value) or not equal to 'c'. I like to use pd.Series.ne in place of != because it avoids the wrapping parentheses that != would need next to |, which (in my opinion) adds to readability.
import pandas as pd

s = pd.Series(letters)
s[s.ne('c') | ~s.duplicated()]
0 a
1 b
2 c
5 d
7 a
8 a
dtype: object
To do exactly what was asked for:
def preserve_first(letters, letter):
    s = pd.Series(letters)
    return s[s.ne(letter) | ~s.duplicated()].tolist()
preserve_first(letters, 'c')
['a', 'b', 'c', 'd', 'a', 'a']
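The pandas mask keeps an element when it differs from the target or when it is the first time that value appears. A plain-Python sketch of the same mask, handy for checking what the pandas expression computes (`keep_mask` is a hypothetical name; like `Series.duplicated`, it assumes hashable values):

```python
def keep_mask(values, target):
    """Mirror of s.ne(target) | ~s.duplicated() for a plain list."""
    seen = set()
    mask = []
    for v in values:
        is_duplicate = v in seen  # like Series.duplicated() with its default keep='first'
        seen.add(v)
        mask.append(v != target or not is_duplicate)
    return mask

letters = ['a', 'b', 'c', 'c', 'c', 'd', 'c', 'a', 'a', 'c']
mask = keep_mask(letters, 'c')
print([v for v, keep in zip(letters, mask) if keep])  # ['a', 'b', 'c', 'd', 'a', 'a']
```

The repeated 'a's survive because they fail only the "not a duplicate" test, while still passing the "not equal to 'c'" test.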
A general Python solution:
def keep_first(iterable, value):
    it = iter(iterable)
    for val in it:
        yield val
        if val == value:
            yield from (el for el in it if el != value)
This yields all items up to and including the first value if found, then yields the rest of the iterable filtering out items matching the value.
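Because `keep_first` only pulls items from the iterator as they are requested, it also works on generators and even unbounded iterators. A usage sketch; the function body is repeated so the snippet runs on its own, `itertools.cycle` and `itertools.islice` are only used to build an endless input and take a finite prefix of the result, and `yield from` needs Python 3.3 or later:

```python
from itertools import cycle, islice

def keep_first(iterable, value):
    it = iter(iterable)
    for val in it:
        yield val
        if val == value:
            # Everything after the first match, minus further matches.
            yield from (el for el in it if el != value)

letters = ['a', 'b', 'c', 'c', 'c', 'd', 'c', 'a', 'a', 'c']
print(list(keep_first(letters, 'c')))                # ['a', 'b', 'c', 'd', 'a', 'a']
print(list(islice(keep_first(cycle('abc'), 'c'), 7)))  # ['a', 'b', 'c', 'a', 'b', 'a', 'b']
```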
You can try this using generators:
def conserve_first(l, s):
    last_seen = False
    for i in l:
        if i == s and not last_seen:
            last_seen = True
            yield i
        elif i != s:
            yield i
letters = ['a', 'b', 'c', 'c', 'c', 'd', 'c', 'a', 'a', 'c']
print(list(conserve_first(letters, "c")))
Output:
['a', 'b', 'c', 'd', 'a', 'a']
Late to the party, but here is a loop-based version:
letters = ['a', 'b', 'c', 'c', 'c', 'd', 'c', 'a', 'a', 'c']
def preserve_first(data, letter):
    new = []
    count = 0
    for i in data:
        if i not in new:
            if i == letter and count == 0:
                new.append(i)
                count += 1
            elif i == letter and count == 1:
                continue
            else:
                new.append(i)
        else:
            if i == letter and count == 1:
                continue
            else:
                new.append(i)
    return new
l = preserve_first(letters, "c")
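You can use filter and list slices:
def preserve_first(letters, elem):
    if elem in letters:
        index = letters.index(elem)
        # filter() returns an iterator in Python 3, so convert it before concatenating
        return letters[:index + 1] + list(filter(lambda a: a != elem, letters[index + 1:]))
    return letters[:]  # elem not present: nothing to remove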
Doesn't use pandas but this is a simple algorithm to do the job.
def preserve_firsts(letters, target):
    firsts = []
    seen = False
    for letter in letters:
        if letter == target:
            if not seen:
                firsts.append(letter)
                seen = True
        else:
            firsts.append(letter)
    return firsts
>>> letters = ['a', 'b', 'c', 'c', 'c', 'd', 'c', 'a', 'a']
>>> preserve_firsts(letters, 'c')
['a', 'b', 'c', 'd', 'a', 'a']
Simplest solution I could come up with.
letters = ['a', 'b', 'c', 'c', 'c', 'd', 'c', 'a', 'a', 'c']
key = 'c'
def preserve_first(letters, key):
    first_occurrence = letters.index(key)
    return [item for i, item in enumerate(letters) if i == first_occurrence or item != key]
I have a list of tokens that I want to use for accessing an API. I'd like to always be able to select the next token in the list for use, and when the end of the list is reached, start over.
I have this now, which works, but I find it to be pretty messy and unreadable.
class tokenz:
    def __init__(self):
        self.tokens = ['a', 'b', 'c', 'd', 'e']
        self.num_tokens = len(self.tokens)
        self.last_token_used = 0

    def select_token(self):
        if self.last_token_used == 0:
            self.last_token_used += 1
            return self.tokens[0]
        elif self.last_token_used < (self.num_tokens - 1):
            self.last_token_used += 1
            return self.tokens[self.last_token_used - 1]
        elif self.last_token_used == (self.num_tokens - 1):
            self.last_token_used = 0
            return self.tokens[self.num_tokens - 1]
Any thoughts on making this more pythonic?
Use itertools.cycle() to get an iterator that repeats a list of items infinitely.
In [13]: tokens = ['a', 'b', 'c', 'd', 'e']
In [14]: import itertools
In [15]: infinite_tokens = itertools.cycle(tokens)
In [16]: [next(infinite_tokens) for _ in range(13)]
Out[16]: ['a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e', 'a', 'b', 'c']
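Applied to the class from the question, `itertools.cycle` removes the index bookkeeping entirely. A sketch that keeps the question's class and method names and its placeholder tokens; the `_cycle` attribute name is an illustrative choice:

```python
import itertools

class tokenz:
    def __init__(self):
        self.tokens = ['a', 'b', 'c', 'd', 'e']
        # cycle() tracks its own position and restarts after the last token.
        self._cycle = itertools.cycle(self.tokens)

    def select_token(self):
        return next(self._cycle)

t = tokenz()
print([t.select_token() for _ in range(7)])  # ['a', 'b', 'c', 'd', 'e', 'a', 'b']
```

cycle() keeps an internal copy of the items it has already produced, so this fits a fixed token list; if tokens are added or removed at runtime, the modular-arithmetic version is easier to keep in sync.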
If you really want to make your posted code simpler, use modular arithmetic.
self.last_token_used = (self.last_token_used + 1) % len(self.tokens)
Also, you can use negative indexes in Python lists, so your if statements are unnecessary:
In [26]: for n in range(len(tokens)):
...: print('{}: tokens[{}] = {}'.format(n, n-1, tokens[n-1]))
...:
0: tokens[-1] = e
1: tokens[0] = a
2: tokens[1] = b
3: tokens[2] = c
4: tokens[3] = d
And then your code becomes:
class tokenz:
    def __init__(self):
        self.tokens = ['a', 'b', 'c', 'd', 'e']
        self.num_tokens = len(self.tokens)
        self.last_token_used = 0

    def select_token(self):
        self.last_token_used = (self.last_token_used + 1) % self.num_tokens
        return self.tokens[self.last_token_used - 1]
Given a very large (gigabytes) list of arbitrary objects (I've seen a similar solution to this for ints), can I easily group it into sublists by equivalence, either in place or with a generator that consumes the original list?
l0 = [A,B, A,B,B, A,B,B,B,B, A, A, A,B] #spaces for clarity
Desired result:
[['A', 'B'], ['A', 'B', 'B'], ['A', 'B', 'B', 'B', 'B'], ['A'], ['A'], ['A', 'B']]
I wrote a looping version like so:
# find boundaries
b0 = []
prev = A
group = A
for idx, elem in enumerate(l0):
    if elem == group:
        b0.append(idx)
    prev = elem
b0.append(len(l0)-1)
l1 = []  # output: list of groups
for idx, b in enumerate(b0):
    try:
        c = b0[idx+1]
    except IndexError:
        break
    if c == len(l0)-1:
        l1.append(l0[b:])
    else:
        l1.append(l0[b:c])
Can this be done as a generator gen(l) that will work like:
for g in gen(l0):
    print g
....
['A', 'B']
['A', 'B', 'B']
['A', 'B', 'B', 'B', 'B']
....
etc?
EDIT: using Python 2.6 or 2.7
EDIT: preferred solution, mostly based on the accepted answer:
def gen_group(f, items):
    out = [items[0]]
    while items:
        for elem in items[1:]:
            if f(elem, out[0]):
                break
            else:
                out.append(elem)
        for _i in out:
            items.pop(0)
        yield out
        if items:
            out = [items[0]]
g = gen_group(lambda x, y: x == y, l0)
for out in g:
    print out
Maybe something like this:
def subListGenerator(f,items):
    i = 0
    n = len(items)
    while i < n:
        sublist = [items[i]]
        i += 1
        while i < n and not f(items[i]):
            sublist.append(items[i])
            i += 1
        yield sublist
Used like:
>>> items = ['A', 'B', 'A', 'B', 'B', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'B']
>>> g = subListGenerator(lambda x: x == 'A',items)
>>> for x in g: print(x)
['A', 'B']
['A', 'B', 'B']
['A', 'B', 'B', 'B', 'B']
['A']
['A']
['A', 'B']
I assume that A is your breakpoint.
>>> A, B = 'A', 'B'
>>> x = [A,B, A,B,B, A,B,B,B,B, A, A, A,B]
>>> map(lambda arr: [i for i in arr[0]], map(lambda e: ['A'+e], ''.join(x).split('A')[1:]))
[['A', 'B'], ['A', 'B', 'B'], ['A', 'B', 'B', 'B', 'B'], ['A'], ['A'], ['A', 'B']]
Here's a simple generator to perform your task:
def gen_group(L):
    DELIMETER = "A"
    out = [DELIMETER]
    while L:
        for elem in L[1:]:
            if elem == DELIMETER:
                break
            else:
                out.append(elem)
        # remove exactly the items that went into this group
        for i in range(len(out)):
            L.pop(0)
        yield out
        out = [DELIMETER]
The idea is to cut down the list and yield the sublists until there is nothing left. This assumes the list starts with "A" (DELIMETER variable).
Sample output:
for out in gen_group(l0):
    print out
Produces
['A', 'B']
['A', 'B', 'B']
['A', 'B', 'B', 'B', 'B']
['A']
['A']
['A', 'B']
Comparative Timings:
timeit.timeit(s, number=100000) is used to test each of the current answers, where s is the multiline string of the code (listed below). Times are in seconds for all 100000 runs:
                         Trial 1   Trial 2   Trial 3   Trial 4 | Avg
This answer (s1):        0.08247   0.07968   0.08635   0.07133 | 0.07995
Dilara Ismailova (s2):   0.77282   0.72337   0.73829   0.70574 | 0.73506
John Coleman (s3):       0.08119   0.09625   0.08405   0.08419 | 0.08642
This answer is the fastest, but it is very close. I suspect the difference is the additional argument and anonymous function in John Coleman's answer.
s1="""l0 = ["A","B", "A","B","B", "A","B","B","B","B", "A", "A", "A","B"]
def gen_group(L):
    out = ["A"]
    while L:
        for ind, elem in enumerate(L[1:]):
            if elem == "A":
                break
            else:
                out.append(elem)
        for i in range(ind + 1):
            L.pop(0)
        yield out
        out = ["A"]
out = gen_group(l0)"""
s2 = """A, B = 'A', 'B'
x = [A,B, A,B,B, A,B,B,B,B, A, A, A,B]
map(lambda arr: [i for i in arr[0]], map(lambda e: ['A'+e], ''.join(x).split('A')[1:]))"""
s3 = """def subListGenerator(f,items):
    i = 0
    n = len(items)
    while i < n:
        sublist = [items[i]]
        i += 1
        while i < n and not f(items[i]):
            sublist.append(items[i])
            i += 1
        yield sublist
items = ['A', 'B', 'A', 'B', 'B', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'B']
g = subListGenerator(lambda x: x == 'A',items)"""
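As written, s1 and s3 only define a function and create a generator object; nothing iterates over it, so those timings leave out the grouping work itself. A sketch of a comparison that consumes each generator with `list(...)`, assuming Python 3; `number=10000` is an arbitrary choice, and the s1 generator here counts its pops with `len(out)`:

```python
import timeit

def gen_group(L):
    # Generator from this answer; it empties L as it goes.
    out = ["A"]
    while L:
        for elem in L[1:]:
            if elem == "A":
                break
            out.append(elem)
        for _ in range(len(out)):
            L.pop(0)
        yield out
        out = ["A"]

def subListGenerator(f, items):
    # John Coleman's generator, unchanged; it does not modify items.
    i = 0
    n = len(items)
    while i < n:
        sublist = [items[i]]
        i += 1
        while i < n and not f(items[i]):
            sublist.append(items[i])
            i += 1
        yield sublist

l0 = ["A", "B", "A", "B", "B", "A", "B", "B", "B", "B", "A", "A", "A", "B"]

# gen_group consumes its argument, so every run gets a fresh copy of l0.
t_s1 = timeit.timeit(lambda: list(gen_group(list(l0))), number=10000)
t_s3 = timeit.timeit(lambda: list(subListGenerator(lambda x: x == "A", l0)), number=10000)
print(t_s1, t_s3)
```

The `list(l0)` copy adds a little overhead to the s1 timing only; results will vary by machine and Python version.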
The following works in this case. The l[0] != boundary condition can be changed to whatever you need; here the boundary is passed as an argument so that you can reuse the function somewhere else.
def gen(l_arg, boundary):
    l = list(l_arg)  # copy keeps l_arg intact; use l = l_arg instead to save memory
    while l:
        sub_list = [l.pop(0)]
        while l and l[0] != boundary:  # Here boundary = 'A'
            sub_list.append(l.pop(0))
        yield sub_list
It assumes that there is an 'A' at the beginning of your list. It also copies the list, which isn't great when the list is in the gigabyte range. You could remove the copy to save memory if you don't care about keeping the original list.
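For a list in the gigabyte range, the answers that call `pop(0)` also pay a time cost: each `pop(0)` shifts every remaining element left, so emptying a list that way is quadratic overall. A sketch of a generator that walks any iterable once and yields each group as soon as the next boundary item arrives; `group_at` and its `is_boundary` parameter are hypothetical names, and the code avoids Python 3-only features so it should also run on 2.6/2.7:

```python
def group_at(iterable, is_boundary):
    """Yield lists that each start at an item where is_boundary(item) is true."""
    group = []
    for item in iterable:
        if is_boundary(item) and group:
            yield group  # previous group is complete
            group = []
        group.append(item)
    if group:
        yield group  # last group has no following boundary

l0 = ['A', 'B', 'A', 'B', 'B', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'B']
for g in group_at(l0, lambda x: x == 'A'):
    print(g)
```

It does not delete anything from the list, so the list itself stays in memory; the saving comes when the source is already a stream (a file or another generator), because then only one group is held at a time. Items before the first boundary, if any, come out as a leading group of their own.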