Iterate with last element repeated as first in next iteration - python

I have a list of objects that looks like this:
oldList=[a,b,c,d,e,f,g,h,i,j,...]
What I need is to create a new list of nested lists that looks like this:
newList=[[a,b,c,d],[d,e,f,g],[g,h,i,j]...]
In other words, the last element of each nested list is the first element of the next one.

One way of doing it is:
>>> l = ['a','b','c','d','e','f','g','h','i','j']
>>> [l[i:i+4] for i in range(0,len(l),3)]
[['a', 'b', 'c', 'd'], ['d', 'e', 'f', 'g'], ['g', 'h', 'i', 'j'], ['j']]
Here:
l[i:i+4] takes a chunk of 4 values starting at position i
range(0,len(l),3) steps through the list in jumps of 3
So the basic idea is that we move through the list 3 elements at a time, but each slice is one element longer than the step, so it also includes the first element of the next chunk. That is how every chunk gets 4 elements, with its last element repeated as the first element of the next chunk.
Small note: the initialization oldList=[a,b,c,d,e,f,g,h,i,j,...] is invalid unless a, b, c, d, etc. are already defined. You were perhaps looking for oldList = ['a','b','c','d','e','f','g','h','i','j'].
Alternatively, if you want only full chunks of 4 (dropping a shorter leftover chunk at the end), you could try this code:
>>> [l[i:i+4] for i in range(0,len(l)-3,3)]
[['a', 'b', 'c', 'd'], ['d', 'e', 'f', 'g'], ['g', 'h', 'i', 'j']]
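Both variants can be wrapped in one small function. This is a sketch; the name overlapping_chunks and its size, overlap and full_only parameters are hypothetical. The step between chunk starts is size - overlap, so the defaults reproduce the stride of 3 used above, and full_only stops early enough that every chunk has exactly size items.

```python
def overlapping_chunks(seq, size=4, overlap=1, full_only=False):
    """Hypothetical helper: split seq into chunks of `size` items where
    consecutive chunks share `overlap` items."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    # With full_only=True, the last start index is len(seq) - size,
    # so every chunk has exactly `size` items.
    stop = len(seq) - size + 1 if full_only else len(seq)
    return [seq[i:i + size] for i in range(0, stop, step)]


l = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
print(overlapping_chunks(l))                  # includes the leftover ['j']
print(overlapping_chunks(l, full_only=True))  # full chunks of 4 only
```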

Related

Efficiency question: how to compare two huge nested lists and make changes based on criteria

I want to compare two huge identical nested lists by iterating over both of them. I'm looking for nested lists where list_a[0] is equal to list_b[1]. In that case I want to merge those lists (the order is important). I also want the non-matching lists in the output.
rows_a = [['a', 'b', 'z'], ['b', 'e', 'f'], ['g', 'h', 'i']]
rows_b = [['a', 'b', 'z'], ['b', 'e', 'f'], ['g', 'h', 'i']]
data = []
for list_a in rows_a:
    for list_b in rows_b:
        if list_a[0] == list_b[1]:
            list_b.extend(list_a)
            data.append(list_b)
        else:
            data.append(list_b)
#print(data): [['a', 'b', 'z', 'b', 'e', 'f'], ['b', 'e', 'f'], ['g', 'h', 'i'], ['a', 'b', 'z', 'b', 'e', 'f'], ['b', 'e', 'f'], ['g', 'h', 'i'], ['a', 'b', 'z', 'b', 'e', 'f'], ['b', 'e', 'f'], ['g', 'h', 'i']]
Above is the output that I do NOT want, because it is way too much data. All this unnecessary data is caused by the double loop over both rows. A solution would be to slice an element off rows_b on every iteration of the for loop over rows_a. This would avoid many duplicate comparisons. Question: How do I skip the first element of a list every time it has looped from start to end?
To show the desired outcome, I correct the result below by removing the duplicates:
res=[]
for i in data:
    if tuple(i) not in res:
        res.append(tuple(i))
print(res)
#Output: [('a', 'b', 'z', 'b', 'e', 'f'), ('b', 'e', 'f'), ('g', 'h', 'i')]
This is the output I want! But faster...And preferably without removing duplicates.
I managed to get what I want when I work with a small data set. However, I am using this for a very large data set and it gives me a 'MemoryError'. Even if it didn't give me the error, I realise it is a very inefficient script and it takes a lot of time to run.
Any help would be greatly appreciated.
tuple(i) not in res is not efficient, since it scans the whole list again on every iteration (linear time), resulting in quadratic execution time (O(n²)). You can speed this up using a set:
list({tuple(e) for e in data})
This does not preserve the order. If you want to preserve it, you can use a dict instead (this requires a fairly recent version of Python: dicts are guaranteed to keep insertion order since Python 3.7):
list({tuple(e): None for e in data}.keys())
This should be significantly faster. An alternative solution is to convert the elements to tuples, sort them, and compare adjacent values to remove duplicates. Note that you can also merge two sets or two dicts with the update method.
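A minimal sketch comparing the two deduplication approaches on the data from the question. It uses dict.fromkeys, which builds the same insertion-ordered dict as the comprehension above; relying on that order assumes Python 3.7 or later.

```python
data = [['a', 'b', 'z', 'b', 'e', 'f'], ['b', 'e', 'f'], ['g', 'h', 'i'],
        ['a', 'b', 'z', 'b', 'e', 'f'], ['b', 'e', 'f'], ['g', 'h', 'i']]

# Lists are unhashable, so each row is converted to a tuple first.
unordered = list({tuple(row) for row in data})               # O(n), arbitrary order
ordered = list(dict.fromkeys(tuple(row) for row in data))    # O(n), first-seen order

print(ordered)
```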
As for the memory usage, there is not much you can do. The problem is CPython itself, which is clearly not designed for computing on large data with such data structures (only native data structures like Numpy arrays are efficient). Each character is encoded as a Python object taking 24-32 bytes. Lists contain references to objects, taking 8 bytes each on a 64-bit architecture. This means about 40 bytes per character, while only 1 byte is actually needed (and this is what a native C/C++ program can use in practice). That being said, CPython caches 1-byte (Latin-1) characters, so it uses "only" 8 bytes per character in this specific case (which is still 8 times more than required). If you use lists of characters in your real-world application, please consider using strings instead. Otherwise, please consider using another language.
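A rough way to observe the overhead described above is sys.getsizeof. Exact byte counts vary between Python versions and platforms, and for a list it measures only the list's own array of references, not the objects they point to, so the sketch below only compares relative sizes.

```python
import sys

text = 'abcdefghij' * 1000   # 10,000 characters stored as one str
chars = list(text)           # the same data as a list of 1-character strs

# For an ASCII str, getsizeof is roughly 1 byte per character plus a small header.
print(sys.getsizeof(text))
# For the list, getsizeof counts only the array of references (8 bytes each on
# a 64-bit build); in CPython the 1-character strings themselves are cached and shared.
print(sys.getsizeof(chars))
```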
I solved this by using a LEFT JOIN in SQL. You can do the same thing with pandas DataFrames in Python.
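The same LEFT JOIN idea can be sketched in plain Python with a dictionary used as an index, which replaces the nested loop with two linear passes. This is not the answerer's SQL or pandas code, just an illustration of the technique (a hash join); the function name left_join_rows is hypothetical. It assumes the matching rule from the question: a row of rows_b is extended with every row of rows_a whose first element equals that row's second element, and unmatched rows are kept as they are.

```python
from collections import defaultdict


def left_join_rows(rows_a, rows_b):
    """Hypothetical helper: hash join of rows_b (left) with rows_a."""
    # Build the index once: first element of a row_a -> all rows with that key.
    index = defaultdict(list)
    for row_a in rows_a:
        index[row_a[0]].append(row_a)

    result = []
    for row_b in rows_b:
        merged = list(row_b)  # copy, so rows_b itself is not mutated
        # .get avoids inserting empty keys into the defaultdict
        for row_a in index.get(row_b[1], []):
            merged.extend(row_a)
        result.append(merged)
    return result


rows_a = [['a', 'b', 'z'], ['b', 'e', 'f'], ['g', 'h', 'i']]
rows_b = [['a', 'b', 'z'], ['b', 'e', 'f'], ['g', 'h', 'i']]
print(left_join_rows(rows_a, rows_b))
```

Each row is visited a constant number of times, so the work grows linearly with the input instead of with the product of the two list lengths, and no duplicates are produced that would need removing afterwards.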

Using a list of lists containing indexes to pull strings by index from another list

I have a list of lists that contain integers. These integers are indexes into a list of strings. I need to use the lists of indexes to select the corresponding strings from the list of strings, creating a new list of lists of just the selected strings. Sorry for the tongue twister of an explanation; I tried to make it as clear as possible. Below is a simple example of what I am trying to achieve.
list_of_indexes=[[0,2,4],[5,7,6],[1,9]]
list_of_text=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
desired_output = [['a','c','e'], ['f', 'h', 'g'], ['b', 'j']]
[[list_of_text[idx] for idx in indices] for indices in list_of_indexes]
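A quick check against desired_output, plus an equivalent spelling with map. Both assume the indexes are 0-based, as in the example, so each index is used as-is.

```python
list_of_indexes = [[0, 2, 4], [5, 7, 6], [1, 9]]
list_of_text = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

# Python indexes are 0-based, so each stored index is used directly.
result = [[list_of_text[idx] for idx in indices] for indices in list_of_indexes]

# Same thing with map: list_of_text.__getitem__ looks up one index at a time.
result_map = [list(map(list_of_text.__getitem__, indices)) for indices in list_of_indexes]

print(result)
```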

How to keep the first element of the list unchanged in python?

Here is my list,
a = [['a','b','c','d'],['e','f','g','h'],['i','j','k','l'],['m','n','o','p']]
and here is my function:
def add(change,unchange):
    a = change
    b = unchange
    a[0].insert(a[0].index(a[0][2]),"high_range")
    a[0].insert(a[0].index(a[0][3]),"low_range")
    print(a)
    print(b)
When I call this function:
add(a,a[0])
I get this output:
[['a', 'b', 'high_range', 'low_range', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'j', 'k', 'l'], ['m', 'n', 'o', 'p']]
['a', 'b', 'high_range', 'low_range', 'c', 'd']
But my expected output is:
[['a', 'b', 'high_range', 'low_range', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'j', 'k', 'l'], ['m', 'n', 'o', 'p']]
['a', 'b', 'c', 'd']
How can I keep the first element of the list unchanged in the second variable? Sorry, I'm a newbie.
Since a list is a mutable type, when you insert values into a[0], the change is also visible through b, because a[0] and b refer to the same list object. You can either print b before inserting values into the list, or make b a copy of unchange like this:
def add(change,unchange):
    a = change
    b = unchange[:]
    a[0].insert(2, "high_range")
    a[0].insert(3, "low_range")
    print(a)
    print(b)
Also, a[0].index(a[0][2]) is redundant: you already know that the index is 2.
The main problem is in this line:
add(a, a[0])
Since you are mutating a[0] inside the function, the second argument changes as well, because both point to the same list. You need to design your program accordingly. See the question "How to clone or copy a list?".
Depending on your requirements, you can either supply a copy when calling the function:
add(a, a[0][:])
or read alec's answer.
Your function is fine, but call it like this:
add(a,a[0][:])
This passes a copy of a[0] as the second argument, so that copy is left unchanged.
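A short demonstration of the aliasing these answers describe: any name bound to a[0] is the same object as a[0], while a[0][:] is a new list (a shallow copy) that later inserts do not affect.

```python
a = [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h']]

alias = a[0]        # another name for the same inner list
snapshot = a[0][:]  # a new list holding the same elements (shallow copy)

print(alias is a[0])     # True
print(snapshot is a[0])  # False

a[0].insert(2, "high_range")
print(alias)     # ['a', 'b', 'high_range', 'c', 'd']
print(snapshot)  # ['a', 'b', 'c', 'd']
```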

How can I split a list in two unique lists in Python?

Hi, I have the following list:
listt = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o']
15 members.
I want to turn it into 3 lists. I used this code and it works, but I want unique lists: this gives me 3 lists that have members in common.
import random
listt = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o']
print(random.sample(listt,5))
print(random.sample(listt,5))
print(random.sample(listt,5))
Try this:
from random import shuffle
def randomise():
    listt = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o']
    shuffle(listt)
    return listt[:5], listt[5:10], listt[10:]
print(randomise())
This will print (for example, since it is random):
(['i', 'k', 'c', 'b', 'a'], ['d', 'j', 'h', 'n', 'f'], ['e', 'l', 'o', 'g', 'm'])
If it doesn't matter to you which items go in each list, then you're better off partitioning the list into thirds:
In [23]: L = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o']
In [24]: size = len(L)
In [25]: L[:size//3]
Out[25]: ['a', 'b', 'c', 'd', 'e']
In [26]: L[size//3:2*size//3]
Out[26]: ['f', 'g', 'h', 'i', 'j']
In [27]: L[2*size//3:]
Out[27]: ['k', 'l', 'm', 'n', 'o']
If you want them to have random elements from the original list, you'll just need to shuffle the input first:
random.shuffle(L)
Instead of sampling your list three times, which will always give you three independent results where individual members may be selected for more than a single list, you could just shuffle the list once and then split it in three parts. That way, you get three random subsets that will not share any items:
>>> random.shuffle(listt)
>>> listt[0:5]
['b', 'a', 'f', 'e', 'h']
>>> listt[5:10]
['c', 'm', 'g', 'j', 'o']
>>> listt[10:15]
['d', 'l', 'i', 'n', 'k']
Note that random.shuffle will shuffle the list in place, so the original list is modified. If you don’t want to modify the original list, you should make a copy first.
If your list is larger than the desired result set, then of course you can also sample your list once with the combined result size and then split the result accordingly:
>>> sample = random.sample(listt, 5 * 3)
>>> sample[0:5]
['h', 'm', 'i', 'k', 'd']
>>> sample[5:10]
['a', 'b', 'o', 'j', 'n']
>>> sample[10:15]
['c', 'l', 'f', 'e', 'g']
This solution will also avoid modifying the original list, so you will not need a copy if you want to keep it as it is.
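The shuffle-then-split idea can be generalised to any number of groups. This is a sketch; the name random_partition and its k and seed parameters are hypothetical. It draws a shuffled copy with sample(seq, len(seq)), so the input list is not modified, and cuts it at the boundaries i * n // k, which also works when the length is not divisible by k (group sizes then differ by at most one).

```python
import random


def random_partition(seq, k, seed=None):
    """Hypothetical helper: split seq into k disjoint random groups."""
    rng = random.Random(seed)              # separate generator; a seed makes runs repeatable
    shuffled = rng.sample(seq, len(seq))   # shuffled copy; seq itself is untouched
    n = len(shuffled)
    # Cut points 0, n//k, 2n//k, ..., n give near-equal, non-overlapping slices.
    return [shuffled[i * n // k:(i + 1) * n // k] for i in range(k)]


listt = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o']
groups = random_partition(listt, 3)
print(groups)
```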
Use [:] to slice all members out of the list, which copies everything into a new object. Alternatively, just use list(<list>), which also makes a copy:
print(random.sample(listt[:],5))
In case you want to sample only once, store the sample in a variable and copy it later:
output = random.sample(listt,5)
first = output[:]
second = output[:]
print(first is second, first is output) # False, False
Then the original list (output) can be modified without first or second being modified.
For nested lists you might want to use copy.deepcopy().
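To illustrate the difference for nested lists: [:] copies only the outer list, so the inner lists are still shared, while copy.deepcopy() copies the inner lists as well.

```python
import copy

nested = [['a', 'b'], ['c', 'd']]
shallow = nested[:]           # new outer list, same inner lists
deep = copy.deepcopy(nested)  # new outer list and new inner lists

nested[0].append('x')
print(shallow[0])  # ['a', 'b', 'x']: the inner list is shared
print(deep[0])     # ['a', 'b']: the inner list was copied
```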

Search for an item of a sublist in another list of lists by position

I have a list of lists created like this:
biglist=[['A'], ['C', 'T'], ['A', 'T']]
and I will have another list like this:
smalllist=[['C'], ['T'], ['A', 'T']]
I want to check whether each item of a sublist in smalllist is contained in the sublist at the same index of biglist, and if not, append it.
So the result would be
biglist=[['A','C'], ['C', 'T'], ['A', 'T']]
i.e., 'C' from the first sublist of smalllist was added to the first sublist of biglist, but nothing was added to the second and third.
I tried this:
dd=zip(biglist, smalllist)
for each in dd:
    ll=each[0].extend(each[1])
    templist.append(list(set(ll)))
but I get this error:
templist.append(list(set(ll)))
TypeError: 'NoneType' object is not iterable
How to do it?
Thank you
You could try this (it only works if smalllist is not longer than biglist):
SCRIPT:
biglist = [['A'], ['C', 'T'], ['A', 'T']]
smalllist = [['C'], ['T'], ['A', 'T']]
for i, group in enumerate(smalllist):
    for item in group:
        if item not in biglist[i]:
            biglist[i].append(item)
DEMO:
print(biglist)
# [['A', 'C'], ['C', 'T'], ['A', 'T']]
[list(set(s+b)) for (s,b) in zip(smalllist,biglist)]
list.extend modifies the list in place and returns None rather than the list itself, so ll in your case is None. Just call each[0].extend(each[1]) on its own and then set ll=each[0] on the next line in the loop, and your solution should start working.
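A sketch of the question's loop with that fix applied. Because extend works in place, biglist itself also ends up containing the duplicates; templist holds the deduplicated sublists, whose element order is arbitrary since they come from sets.

```python
biglist = [['A'], ['C', 'T'], ['A', 'T']]
smalllist = [['C'], ['T'], ['A', 'T']]

templist = []
for big, small in zip(biglist, smalllist):
    big.extend(small)  # mutates big in place and returns None
    ll = big           # so refer to the list itself afterwards
    templist.append(list(set(ll)))

print(templist)
```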
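Still, I don't see why you don't keep your elements in sets in the first place. That would save you from having to convert from list to set and then back.
I would just union the sets (with |) instead of appending to the list and then filtering out duplicates by converting to a set and back to a list. (itertools.izip exists only in Python 2; in Python 3 use the built-in zip. Also note that the order of elements inside each resulting sublist is arbitrary, since sets are unordered.)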
>>> templist = []
>>> for els1, els2 in zip(biglist, smalllist):
...     joined = list(set(els1) | set(els2))
...     templist.append(joined)
...
>>> templist
[['A', 'C'], ['C', 'T'], ['A', 'T']]
Keeping the elements in sets in the first place seems to be the fastest option in Python 3, even with so few elements in each set (see comments):
biglist=[set(['A']), set(['C', 'T']), set(['A', 'T'])]
smalllist=[set(['C']), set(['T']), set(['A', 'T'])]
for els1,els2 in zip(biglist,smalllist):
    els1.update(els2)
print(biglist)
Output:
[{'A', 'C'}, {'C', 'T'}, {'A', 'T'}]
