Comparing adjacent elements together without using zip method - python

How would you go about comparing two adjacent elements in a list in python? How would save or store the value of that item while going through a for loop? I'm trying not to use the zip method and just using an ordinary for loop.
comparing_two_elements = ['Hi','Hello','Goodbye','Does it really work finding longest length of string','Jet','Yes it really does work']
longer_string = ''
for i in range(len(comparing_two_elements)-1):
if len(prior_string) < len(comparing_two_elements[i + 1]):
longer_string = comparing_two_elements[i+1]
print(longer_string)

The below works simply by 'saving' the first element of your list as the longest element, as it will be the first time you loop over your list, and then on subsequent iterations it will compare the length of that item to the length of the next item in the list.
longest_element = None
for element in comparing_two_elements:
if not longest_element:
longest_element = element
continue
if len(longest_element) < len(element):
longest_element = element
If you want to go the "interesting" route, you could do it with combination of other functions, for eg
length_map = map(len, comparing_two_elements)
longest_index = length_map.index(max(length_map))
longest_element = comparing_two_elements[longest_index]

Use the third, optional step argument to range - and don't subtract 1 from len(...) ! Your logic is incomplete: what if the first of a pair of strings is longer? you don't do anything in that case.
It's not clear what you're trying to do. This for loop runs through i = 0, 2, 4, ... up to but excluding len(comparing_two_elements) (assumed to be even!), and prints the longer of each adjacent pair:
for i in range(0, len(comparing_two_elements), 2):
if len(comparing_two_elements[i]) < len(comparing_two_elements[i + 1]):
idx = i
else:
idx = i + 1
print(comparing_two_elements[idx])
This may not do exactly what you want, but as several people have observed, it's unclear just what that is. At least it's something you can reason about and adapt.
If you just want the longest string in a sequence seq, the whole adjacent pairs rigamarole is pointless; simply use:
longest_string = max(seq, key=len)

Related

Python list first n entries in a custom base number system

I am sorry if the title is a misnomer and/or doesn't properly describe what this is all about, you are welcome to edit the title to make it clear once you understand what this is about.
The thing is very simple, but I find it hard to describe, this thing is sorta like a number system, except it is about lists of integers.
So we start with a list of integers with only zero, foreach iteration we add one to it, until a certain limit is reached, then we insert 1 at the start of the list, and set the second element to 0, then iterate over the second element until the limit is reached again, then we add 1 to the first element and set the second element 0, and when the first element reaches the limit, insert another element with value of 1 to the start of the list, and zero the two elements after it, et cetera.
And just like this, when a place reaches limit, zero the place and the places after it, increase the place before it by one, and when all available places reach limit, add 1 to the left, for example:
0
1
2
1, 0
1, 1
1, 2
2, 0
2, 1
2, 2
1, 0, 0
The limit doesn't have to be three.
This is what I currently have that does something similar to this:
array = []
for c in range(26):
for b in range(26):
for a in range(26):
array.append((c, b, a))
I don't want leading zeroes but I can remove them, but I can't figure out how to do this with a variable number of elements.
What I want is a function that takes two arguments, limit (or base) and number of tuples to be returned, and returns the first n such tuples in order.
This must be very simple, but I just can't figure it out, and Google returns completely irrelevant results, so I am asking for help here.
How can this be done? Any help will truly be appreciated!
Hmm, I was thinking about something like this, but very unfortunately I can't make it work, please help me figure out why it doesn't work and how to make it work:
array = []
numbers = [0]
for i in range(1000):
numbers[-1] += 1
while 26 in numbers:
index = numbers.index(26)
numbers[index:] = [0] * (len(numbers) - index)
if index != 0:
numbers[index - 1] += 1
else:
numbers.insert(0, 1)
array.append(numbers)
I don't quite understand it, my testing shows everything inside the loop work perfectly fine outside the loop, the results are correct, but it just simply magically will not work in a loop, I don't know the reason for this, it is very strange.
I discovered the fact that if I change the last line to print(numbers) then everything prints correctly, but if I use append only the last element will be added, how so?
from math import log
def number_to_base(n,base):
number=[]
for digit in range(int(log(n+0.500001,base)),-1,-1):
number.append(n//base**digit%base)
return number
def first_numbers_in_base(n,base):
numbers=[]
for i in range(n):
numbers.append(tuple(number_to_base(i,base)))
return numbers
#tests:
print(first_numbers_in_base(10,3))
print(number_to_base(1048,10))
print(number_to_base(int("10201122110212",3),3))
print(first_numbers_in_base(25,10))
I finally did it!
The logic is very simple, but the hard part is to figure out why it won't work in a loop, turns out I need to use .copy(), because for whatever reason, doing an in-place modification to a list directly modifies the data reside in its memory space, such behavior modifies the same memory space, and .append() method always appends the latest data in a memory space.
So here is the code:
def steps(base, num):
array = []
numbers = [0]
for i in range(num):
copy = numbers.copy()
copy[-1] += 1
while base in copy:
index = copy.index(base)
copy[index:] = [0] * (len(copy) - index)
if index != 0:
copy[index - 1] += 1
else:
copy.insert(0, 1)
array.append(copy)
numbers = copy
return array
Use it like this:
steps(26, 1000)
For the first 1000 lists in base 26.
Here is a a function, that will satisfy original requirements (returns list of tuples, first tuple represents 0) and is faster than other functions that have been posted to this thread:
def first_numbers_in_base(n,base):
if n<2:
if n:
return [(0,)]
return []
numbers=[(0,),(1,)]
base-=1
l=-1
num=[1]
for i in range(n-2):
if num[-1]==base:
num[-1]=0
for i in range(l,-1,-1):
if num[i]==base:
num[i]=0
else:
num[i]+=1
break
else:
num=[1]+num
l+=1
else:
num[-1]+=1
numbers.append(tuple(num))#replace tuple(num) with num.copy() if you want resutl to contain lists instead of tuples.
return numbers

Finding locations of repeated 0 in a binary list

I've got a binary list returned from a k means classification with k = 2, and I am trying to 1) identify the number of 0,0,0,... substrings of a given length - say a minimum of length 3, and 2) identify the start and end locations of these sublists, so in a list: L = [1,1,0,0,0,0,0,1,1,1,0,0,1,0,0,0], the outputs would ideally be: number = 2 and start_end_locations = [[2,6],[13,15]].
The lists I'm working with are tens of thousands of elements long, so I need to find a computationally fast way of performing this operation. I've seen many posts using groupby from itertools, but I can't find a way to apply them to my task.
Thanks in advance for your suggestions!
Thanks in advance for your suggestions!
craft a regular expression that matches your pattern: three or more zeros
concatenate the list items to a string
using re.finditer and match object start() and end() methods construct a list of indices
Concatenating the lists to a string could be the most expensive part - you won't know till you try; finditer should be pretty quick. Requires more than one pass through the data but probably low effort to code.
This will probably be better - a single pass through the list but you need to pay attention to the logic - more effort to code.
iterate over the list using enumerate
when you find a zero
capture its index and
set a flag indicating you are tracking zeros
when you find a one
if you are tracking zeros
capture the index
if the length of consecutive zeros meets your criteria capture the start and end indices for that run of zeros
reset flags and intermediate variables as necessary
A bit different than the word version:
def g(a=a):
y = []
criteria = 3
start,end = 0,0
prev = 1
for i,n in enumerate(a):
if not n: # n is zero
end = i
if prev: # previous item one
start = i
else:
if not prev and end - start + 1 >= criteria:
y.append((start,end))
prev = n
return y
You can use zip() to detect indexes of the 1,0 and 0,1 breaks in sequence. Then use zip() on the break indexes to form ranges and extract the ones that start with a zero and span at least 3 positions.
def getZeroStreaks(L,minSize=3):
breaks = [i for i,(a,b) in enumerate(zip(L,L[1:]),1) if a!=b]
return [[s,e-1] for s,e in zip([0]+breaks,breaks+[len(L)])
if e-s>=minSize and not L[s]]
output:
L = [1,1,0,0,0,0,0,1,1,1,0,0,1,0,0,0]
print(getZeroStreaks(L))
[[2, 6], [13, 15]]
from timeit import timeit
t = timeit(lambda:getZeroStreaks(L*1000),number=100)/100
print(t) # 0.0018 sec for 16,000 elements
The function can be generalized to find streaks of any value in a list:
def getStreaks(L,N=0,minSize=3):
breaks = [i for i,(a,b) in enumerate(zip(L,L[1:]),1) if (a==N)!=(b==N)]
return [[s,e-1] for s,e in zip([0]+breaks,breaks+[len(L)])
if e-s>=minSize and L[s]==N]

Longest repeated substring in massive string

Given a long string, find the longest repeated sub-string.
The brute-force approach of course is to find all substrings and check the substrings of the remaining string, but the string(s) in question have millions of characters (like a DNA sequence, AGGCTAGCT etc) and I'd like something that finishes before the universe collapses in on itself.
Tried a number of approaches, and I have one solution that works quite fast on strings of up to several million, but takes literally forever (6+ hours) for larger strings, particularly when the length of the repeated sequence gets really long.
def find_lrs(text, cntr=2):
sol = (0, 0, 0)
del_list = ['01','01','01']
while len(del_list) != 0:
d = defaultdict(list)
for i in range(len(text)):
d[text[i:i + cntr]].append(i)
del_list = [(item, d[item]) for item in d if len(d[item]) > 1]
# if list is empty, we're done
if len(del_list) == 0:
return sol
else:
sol = (del_list[0][1][0], (del_list[0][1][1]),len(del_list[0][0]))
cntr += 1
return sol
I know it's ugly, but hey, I'm a beginner, and I'm just happy I got something to work. Idea is to go through the string starting out with length-2 substrings as the keys, and the index the substring is at the value. If the text was, say, 'BANANA', after the first pass through, the dict would look like this:
{'BA': [0], 'AN': [1, 3], 'NA': [2, 4], 'A': [5]}
BA shows up only once - starting at index 0. AN and NA show up twice, showing up at index 1/3 and 2/4, respectively.
I then create a list that only includes keys that showed up at least twice. In the example above, we can remove BA, since it only showed up once - if there's no substring of length 2 starting out with 'BA', there won't be an substring of length 3 starting with BA.
So after the first past through the pruned list is:
[('AN', [1, 3]), ('NA', [2, 4])]
Since there is at least two possibilities, we save the longest substring and indices found so far and increment the substring length to 3. We continue until no substring was repeated.
As noted, this works on strings up to 10 million in about 2 minutes, which apparently is reasonable - BUT, that's with the longest repeated sequence being fairly short. On a shorter string but longer repeated sequence, it takes -hours- to run. I suspect that it has something to do with how big the dictionary gets, but not quite sure why.
What I'd like to do of course is keep the dictionary short by removing the substrings that clearly aren't repeated, but I can't delete items from the dict while iterating over it. I know there are suffix tree approaches and such that - for now - are outside my ken.
Could simply be that this is beyond my current knowledge, which of course is fine, but I can't help shaking the idea that there is a solution here.
I forgot to update this. After going over my code again, away from my PC - literally writing out little diagrams on my iPad - I realized that the code above wasn't doing what I thought it was doing.
As noted above, my plan of attack was to start out by going through the string starting out with length-2 substrings as the keys, and the index the substring is at the value, creating a list that captures only length-2 substrings that occured at least twice, and only look at those locations.
All well and good - but look closely and you'll realize that I'm never actually updating the default dictionary to only have locations with two or more repeats! //bangs head against table.
I ultimately came up with two solutions. The first solution used a slightly different approach, the 'sorted suffixes' approach. This gets all the suffixes of the word, then sorts them in alphabetical order. For example, the suffixes of "BANANA", sorted, would be:
A
ANA
ANANA
BANANA
NA
NANA
We then look at each adjacent suffix and find how many letters each pair start out having in common. A and ANA have only 'A' in common. ANA and ANANA have "ANA" in common, so we have length 3 as the longest repeated substring. ANANA and BANANA have nothing in common at the start, ditto BANANA and NA. NA and NANA have "NA" in common. So 'ANA', length 3, is the longest repeated substring.
I made a little helper function to do the actual comparing. The code looks like this:
def longest_prefix(suf1, suf2, mx=None):
min_len = min(len(suf1), len(suf2))
for i in range(min_len):
if suf1[i] != suf2[i]:
return (suf1[0:i], len(suf1[0:i]))
return (suf1[0:i], len(suf1[0:i]))
def longest_repeat(txt):
lst = sorted([text[i:] for i in range(len(text))])
print(lst)
mxLen = 0
mx_string = ""
for x in range(len(lst) - 1):
temp = longest_prefix(lst[x], lst[x + 1])
if temp[1] > mxLen:
mxLen = temp[1]
mx_string = temp[0]
first = txt.find(mx_string)
last = txt.rfind(mx_string)
return (first, last, mxLen)
This works. I then went back and relooked at my original code and saw that I wasn't resetting the dictionary. The key is that after each pass through I update the dictionary to -only- look at repeat candidates.
def longest_repeat(text):
# create the initial dictionary with all length-2 repeats
cntr = 2 # size of initial substring length we look for
d = defaultdict(list)
for i in range(len(text)):
d[text[i:i + cntr]].append(i)
# find any item in dict that wasn't repeated at least once
del_list = [(d[item]) for item in d if len(d[item]) > 1]
sol = (0,0,0)
# Keep looking as long as del_list isn't empty,
while len(del_list) > 0:
d = defaultdict(list) # reset dictionary
cntr += 1 # increment search length
for item in del_list:
for i in item:
d[text[i:i + cntr]].append(i)
# filter as above
del_list = [(d[item]) for item in d if len(d[item]) > 1]
# if not empty, update solution
if len(del_list) != 0:
sol = (del_list[0][0], del_list[0][1], cntr)
return sol
This was quite fast, and I think it's easier to follow.

Make the code run faster without nested loop

I am trying to solve assignment problem, the code I wrote takes extremely long to run. I think it's due to nested loop I used. Is there another way to rewrite the code to make it more efficient.
The question I am trying to solve. Basically, starting at first element to compare with every element to its right. if it is larger than the rest, it will be "dominator". Then the second element to compare with every element to its right again. All the way to the last element which will be automatically become "dominator"
def count_dominators(items):
if len(items) ==0:
return len(items)
else:
k = 1
for i in range(1,len(items)):
for j in items[i:]:
if items[i-1]>j:
k = k+1
else:
k = k+0
return k
You can use a list comprehension to check if each item is a "dominator", then take the length - you have to exclude the final one to avoid taking max of an empty list, then we add 1 because we know that the final one is actually a dominator.
num_dominators = len([i for i in range(len(items) - 1) if items[i] > max(items[i + 1:])]) + 1
This is nice because it fits on one line, but the more efficient (single pass through the list) way to do it is to start at the end and every time we find a new number bigger than any we have seen before, count it:
biggest = items[-1]
n = 1
for x in reversed(items):
if x > biggest:
biggest = x
n+=1
return n

Simple selection sort with repeated elements?

Given a list x, I want to sort it with selection sort, and then count the number of swaps made within the sort. So I came out with something like this:
count=0
a=0
n=len(x)
while (n-a)>0:
#please recommend a better way to swap.
i = (min(x[a:n]))
x[i], x[a] = x[a], x[i]
a += 1
#the count must still be there
count+=1
print (x)
Could you help me to find a way to manage this better? It doesn't work that well.
The problem is NOT about repeated elements. Your code doesn't work for lists with all elements distinct, either. Try x = [2,6,4,5].
i = (min(x[a:n]))
min() here gets the value of the minimum element in the slice, and then you use it as an index, that doesn't make sense.
You are confusing the value of an element, with its location. You must use the index to identify the location.
seq = [2,1,0,0]
beg = 0
n = len(seq)
while (n - beg) > 0:
jdx = seq[beg:n].index((min(seq[beg:n]))) # use the remaining unsorted right
seq[jdx + beg], seq[beg] = seq[beg], seq[jdx + beg] # swap the minimum with the first unsorted element.
beg += 1
print(seq)
print('-->', seq)
As the sorting progresses, the left of the list [0:beg] is sorted, and the right side [beg:] is being sorted, until completion.
jdx is the location (the index) of the minimum of the remaining of the list (finding the min must happen on the unsorted right part of the list --> [beg:])

Categories

Resources