Iterate through list on each for loop iteration - python

I'm working on the following code:
mylist = [1,2,3,4,5,6,7,8,9,10.....]
for x in range(0, len(mylist), 3):
    value = mylist[x:x + 3]
    print(value)
Basically, I take 3 items from mylist at a time. The real code is bigger than that: I do a lot of things with those items and return a value, then take the next 3 items from mylist, and so on until the end of the list.
But now I have a problem. I need to identify each iteration, and the identifiers follow a rule:
The first chunk is from A, the second is from B and the third is from C.
After the third, it starts over with A, so what I'm trying to do is something like this:
mylist[0:3] are from A
mylist[3:6] are from B
mylist[6:9] are from C
mylist[9:12] are from A
mylist[12:15] are from B......
The initial idea was to implement an identifier that goes from A to C. On each iteration it jumps to the next identifier, and when it reaches C it goes back to A.
So the output looks like this:
[1,2,3] from A
[4,5,6] from B
[7,8,9] from C
[10,11,12] from A
[13,14,15] from B
[16,17,18] from C
[19,20,21] from A.....
My bad solution:
Create identifiers = [A,B,C] and multiply it by the length of mylist -> identifiers = [A,B,C]*len(mylist)
That way there are at least as many A's, B's and C's as there are chunks of mylist to identify. Then inside my for loop I add a counter that increments by 1 and use it as an index into the identifier list.
mylist = [1,2,3,4,5,6,7,8,9,10.....]
identifier = ['A','B','C']*len(mylist)
counter = -1
for x in range(0, len(mylist), 3):
    value = mylist[x:x + 3]
    counter += 1
    print(value, identifier[counter])
But it's too ugly and not fast at all. Does anyone know a faster way to do it?

Cycle, zip, and unpack:
import itertools
mylist = [1,2,3,4,5,6,7,8,9,10]
# itertools.cycle takes a single iterable, so the labels go in one tuple
for value, iden in zip(mylist, itertools.cycle(('A', 'B', 'C'))):
    print(value, iden)
Output:
1 A
2 B
3 C
4 A
5 B
6 C
7 A
8 B
9 C
10 A

You can always use a generator to iterate over your identifiers:
def infinite_generator(seq):
    while True:
        for item in seq:
            yield item
Initialise the identifiers:
identifier = infinite_generator(['A', 'B', 'C'])
Then in your loop:
print(value, next(identifier))
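Put together with the loop from the question, the generator approach could look like the sketch below. The sample data and the labelled list are assumptions added for illustration; they are not part of the original answer.

```python
def infinite_generator(seq):
    """Yield the items of seq over and over, forever."""
    while True:
        for item in seq:
            yield item

mylist = list(range(1, 11))  # sample data, assumed for illustration
identifier = infinite_generator(['A', 'B', 'C'])

labelled = []  # hypothetical container so the result can be inspected later
for x in range(0, len(mylist), 3):
    value = mylist[x:x + 3]
    # next() advances the generator by exactly one label per chunk
    labelled.append((value, next(identifier)))

for value, label in labelled:
    print(value, label)
```

The last chunk holds only one item because 10 is not a multiple of 3; it still gets the next label in the rotation.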

Based on Ignacio's answer, adapted to your problem.
You can first reshape your list into a list of arrays containing 3 elements:
import numpy as np
import itertools
mylist = [1,2,3,4,5,6,7,8,9,10]
_reshaped = np.reshape(mylist[:len(mylist)-len(mylist)%3],(-1,3))
print(_reshaped)
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Note that np.reshape only works when the number of elements is a multiple of 3, so the trailing elements are dropped first with mylist[:len(mylist)-len(mylist)%3] (see Understanding slice notation).
See the UPDATE section for a reshape that keeps every element, as your question requires.
Then apply Ignacio's solution to the reshaped list:
for value, iden in zip(_reshaped, itertools.cycle(('A', 'B', 'C'))):
    print(value, iden)
[1 2 3] A
[4 5 6] B
[7 8 9] C
UPDATE
You can use @NedBatchelder's chunk generator to split your list as expected:
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
mylist = [1,2,3,4,5,6,7,8,9,10]
_reshaped = list(chunks(mylist, 3))
print(_reshaped)
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
Then:
for value, iden in zip(_reshaped, itertools.cycle(('A', 'B', 'C'))):
    print(value, iden)
[1, 2, 3] A
[4, 5, 6] B
[7, 8, 9] C
[10] A
Performance
Your solution : 1.32 ms ± 94.3 µs per loop
With a reshaped list : 1.32 ms ± 84.6 µs per loop
As you can see, there is no significant difference in performance for an equivalent result.

You could build an iterator over the groups of 3 (seq is your list; zip drops any incomplete trailing group):
grouped_items = zip(*[seq[i::3] for i in range(3)])
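As a rough illustration of how the slicing trick behaves, the sketch below combines it with itertools.cycle for the labels. The sample data and the result name are assumptions made for this example.

```python
import itertools

seq = list(range(1, 11))  # sample data, assumed; the 10 has no complete group

# seq[0::3], seq[1::3] and seq[2::3] hold the 1st, 2nd and 3rd item of each group;
# zip stops at the shortest slice, so the incomplete trailing group is dropped.
grouped_items = zip(*[seq[i::3] for i in range(3)])

result = list(zip(grouped_items, itertools.cycle('ABC')))
for group, label in result:
    print(group, label)
```

Each group comes out as a tuple rather than a list. If the leftover items matter, slicing with range(0, len(seq), 3) as in the question keeps them.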

Related

Stacking Items of a list with np.vstack in one Matrix

I am pretty new to Python, trying to tackle the following problem:
Imagine having the following numpy arrays (either empty or n x 2 Arrays):
a = np.array([])
b = np.array([1,2])
c = np.array([[5,6],[7,8]])
d = np.array([[11,12],[13,14],[15,16]])
These are now put in a list E like so:
E = [a,b,c,d]
Now I want the items stacked into an m x 2 matrix. I tried using:
F = np.vstack(E)
but this gives me an error, because of the dimension problems of the empty array.
I want the Output to look like this:
output = [[1 2]
          [5 6]
          [7 8]
          [11 12]
          [13 14]
          [15 16]]
As per the comments from @Karina, you must remove the empty arrays beforehand. So just change how you declare E to include a filter:
E = [i for i in (a, b, c, d) if len(i) > 0]
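A complete version of this idea might look like the following sketch. It filters on the array's size attribute instead of len(), which is an assumption on my part; either test works for the arrays in the question.

```python
import numpy as np

a = np.array([])
b = np.array([1, 2])
c = np.array([[5, 6], [7, 8]])
d = np.array([[11, 12], [13, 14], [15, 16]])

# Keep only the arrays that actually contain elements
E = [i for i in (a, b, c, d) if i.size > 0]

# vstack promotes the 1-D array b to a single row of shape (1, 2)
F = np.vstack(E)
print(F)
print(F.shape)
```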

A Problem in understanding of some fundamentals in python

I'm a beginner in python and I have a problem with some basics of it.
m = [[1,2,3],[4,5,6],[7,8,9]]
b = []
for col in range(3):
    b.append(sum(i[col] for i in m))
print(b)
This code sums the columns of m.
But if I pull the for i in m out and write it as follows:
for col in range(3):
    for i in m:
        b.append(sum(i[col]))
print(b)
it doesn't work and gives me an error. All I've done is move the loop out of the parentheses.
What is the problem and what should I do?
Note that the original code makes a single call to append each time around the outer for loop.
Try:
m = [[1,2,3],[4,5,6],[7,8,9]]
b = []
for col in range(3):
    toadd = 0
    for i in m:
        toadd += i[col]  # i[col] is already an int, so add it directly
    b.append(toadd)
print(b)
for col in range(3):  # col is an int
    for i in m:  # each i is a list, e.g. [1, 2, 3]
        b.append(sum(i[col]))  # i[col] is an int, e.g. i[0] = 1 or i[1] = 2
The problem is that sum() takes an iterable as its argument, not an int: https://docs.python.org/3/library/functions.html#sum
If in doubt, use some print statements to see what's going on within the for loops:
m = [[1,2,3],[4,5,6],[7,8,9]]
b = []
for col in range(3):
    print([i[col] for i in m])
print("----")
for col in range(3):
    print("new col")
    for i in m:
        print(i[col])
outputs:
[1, 4, 7]
[2, 5, 8]
[3, 6, 9]
----
new col
1
4
7
new col
2
5
8
new col
3
6
9
So in the first for loop, the comprehension [i[col] for i in m] builds a list of numbers, while in the second for loop each i[col] is a single int.
This explains why sum(i[col]) raises an error in the second for loop.
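For comparison, here is a short alternative sketch that avoids indexing altogether. It uses zip(*m) to transpose the rows into columns; this is not from the answers above, just another common way to write the same sum.

```python
m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# zip(*m) yields one tuple per column: (1, 4, 7), (2, 5, 8), (3, 6, 9).
# Each tuple is an iterable, so sum() accepts it.
b = [sum(column) for column in zip(*m)]
print(b)  # [12, 15, 18]
```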

Calculate how many items in one list are in another

Let's say I have two very large lists (e.g. 10 million rows) with some values or strings. I would like to figure out how many items from list1 are in list2.
This can be done with:
true_count = 0
false_count = 0
for i, x in enumerate(list1):
    print(i)
    if x in list2:
        true_count += 1
    else:
        false_count += 1
print(true_count)
print(false_count)
This does the trick, but with 10 million rows it can take quite some time. Is there some function I don't know about that can do this much faster, or an entirely different approach?
Using Pandas
Here's how to do it with a pandas DataFrame.
import pandas as pd
import random
list1 = [random.randint(1,10) for i in range(10)]
list2 = [random.randint(1,10) for i in range(10)]
df1 = pd.DataFrame({'list1':list1})
df2 = pd.DataFrame({'list2':list2})
print (df1)
print (df2)
print (all(df2.list2.isin(df1.list1).astype(int)))
I am just picking 10 rows and generating 10 random numbers:
List 1:
list1
0 3
1 5
2 4
3 1
4 5
5 2
6 1
7 4
8 2
9 5
List 2:
list2
0 2
1 3
2 2
3 4
4 3
5 5
6 5
7 1
8 4
9 1
The final print statement reports whether every item of list2 appears in list1:
True
The random lists I checked against are:
list1 = [random.randint(1,100000) for i in range(10000000)]
list2 = [random.randint(1,100000) for i in range(5000000)]
I ran a test with 10 million random numbers in list1 and 5 million in list2; on my Mac the result came back in 2.207757880999999 seconds.
Using Set
Alternatively, you can convert each list into a set and check whether one set is a subset of the other.
set1 = set(list1)
set2 = set(list2)
print (set2.issubset(set1))
Comparing the two runs, the set approach is also fast: it came back in 1.6564296570000003 seconds.
You can convert the lists to sets and compute the length of the intersection between them.
len(set(list1) & set(list2))
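Note that the intersection counts distinct common values, whereas the loop in the question counts every item of list1, duplicates included. A minimal sketch of the difference follows; the small sample lists and the names lookup and distinct_common are assumptions for illustration.

```python
list1 = [1, 7, 0, 6, 2, 5, 6]
list2 = [1, 8, 0, 6, 2, 4, 6]

# Build the set once; membership tests on a set take constant time on average
lookup = set(list2)

# Same counts as the loop in the question, duplicates in list1 included
true_count = sum(1 for x in list1 if x in lookup)
false_count = len(list1) - true_count

# Number of distinct values the two lists share
distinct_common = len(set(list1) & set(list2))

print(true_count, false_count, distinct_common)
```

Here the value 6 appears twice in list1, so the per-item count is one higher than the number of distinct common values.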
You can also use NumPy and convert the lists with np.array().
Both lists then become np.array objects, and because they have only one dimension you can use np.intersect1d() and count the common items with .size
import numpy as np
lst = [1, 7, 0, 6, 2, 5, 6]
lst2 = [1, 8, 0, 6, 2, 4, 6]
a_list=np.array(lst)
b_list=np.array(lst2)
c = np.intersect1d(a_list, b_list)
print (c.size)

Why does vstack change the type of the elements? And how do I solve this?

I have some lists such as
list1 = ['hi',2,3,4]
list2 = ['hello', 7,1,8]
list3 = ['morning',7,2,1]
Where 'hi', 'hello' and 'morning' are strings, while the rest are numbers.
However, when I try to stack them up as:
matrix = np.vstack((list1,list2,list3))
the types of the numbers become string. In particular they become numpy.str_.
How do I solve this? I tried replacing the items and I tried changing their type, but nothing works.
edit
I made a mistake above! In my original problem, the first list is actually a list of headings, so for example
list1 = ['hi', 'number of hours', 'number of days', 'ideas']
So the first column (in the vertically stacked array) is a column of strings. The other columns have a string as their first element and then numbers.
You could use Pandas DataFrames, they allow for heterogeneous data:
>>> pandas.DataFrame([list1, list2, list3])
         0  1  2  3
0       hi  2  3  4
1    hello  7  1  8
2  morning  7  2  1
If you want to name the columns, you can do that too (here list0 is the list of headings):
pandas.DataFrame([list1, list2, list3], columns=list0)
        hi  nb_hours  nb_days  ideas
0       hi         2        3      4
1    hello         7        1      8
2  morning         7        2      1
Since numbers can be written as strings but strings cannot be written as numbers, every element of your matrix ends up as a string.
If you want a matrix of integers, you can:
1- Extract the submatrix corresponding to your numbers and then convert it to integers
2- Or directly extract only the numbers from your lists and stack them.
import numpy as np
list1 = ['hi',2,3,4]
list2 = ['hello', 7,1,8]
list3 = ['morning',7,2,1]
matrix = np.vstack((list1,list2,list3))
# First
m = matrix[:, 1:].astype(np.int32)  # in Python 3, map() would only return a lazy iterator
# [[2 3 4] [7 1 8] [7 2 1]]
# Second
m = np.vstack((list1[1:],list2[1:],list3[1:]))
# [[2 3 4] [7 1 8] [7 2 1]]
edit (Answer to comment)
I'll call the title list list0:
list0 = ['hi', 'nb_hours', 'nb_days', 'ideas']
It's basically the same idea:
1- Stack everything, then extract the submatrix (here we take neither the first row nor the first column: [1:,1:])
matrix = np.vstack((list0,list1,list2,list3))
matrix_nb = matrix[1:, 1:].astype(np.int32)
2- Don't stack list0 at all, and stack all the other lists (without their first element, [1:]):
m = np.vstack((list1[1:],list2[1:],list3[1:]))
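If the strings and numbers need to live in one array, another option (not taken from the answers above) is an object array, which stores the original Python objects instead of converting them to a common type. A minimal sketch, with the names matrix and numbers chosen for illustration:

```python
import numpy as np

list1 = ['hi', 2, 3, 4]
list2 = ['hello', 7, 1, 8]
list3 = ['morning', 7, 2, 1]

# dtype=object keeps each element as the Python object it was
matrix = np.array([list1, list2, list3], dtype=object)
print(type(matrix[0, 0]), type(matrix[0, 1]))

# The numeric block can still be converted to a regular integer array
numbers = matrix[:, 1:].astype(int)
print(numbers)
```

Object arrays give up most of NumPy's speed advantages, so for mixed data a pandas DataFrame is usually the more convenient choice.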

Matrix as a dictionary; is it safe?

I know that the order of the keys is not guaranteed and that's OK, but what exactly does it mean that the order of the values is not guaranteed as well*?
For example, I am representing a matrix as a dictionary, like this:
signatures_dict = {}
M = 3
for i in range(1, M + 1):
    row = []
    for j in range(1, 6):
        row.append(j)
    signatures_dict[i] = row
print(signatures_dict)
Are the columns of my matrix correctly constructed? Let's say I have 3 rows, and at the signatures_dict[i] = row line, row always holds 1, 2, 3, 4, 5. What will signatures_dict be?
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
or something like
1 2 3 4 5
1 4 3 2 5
5 1 3 4 2
? I am worried about cross-platform support.
In my application, the rows are words and the columns documents, so can I say that the first column is the first document?
*Are order of keys() and values() in python dictionary guaranteed to be the same?
You are guaranteed to have 1 2 3 4 5 in each row; the elements will not be reordered. The lack of ordering of values() means that if you call signatures_dict.values(), the rows could come out in any order. But the values are the rows, not the elements of each row. Each row is a list, and lists maintain their order.
If you want a dict which maintains order, Python has that too: https://docs.python.org/2/library/collections.html#collections.OrderedDict
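A small sketch of the OrderedDict variant, assuming 3 rows of 5 columns as described in the question:

```python
from collections import OrderedDict

M = 3  # number of rows, as in the question
signatures_dict = OrderedDict()
for i in range(1, M + 1):
    # Each row is a list, so its elements keep the order they were added in
    signatures_dict[i] = [j for j in range(1, 6)]

# OrderedDict also iterates over the keys in insertion order
for key, row in signatures_dict.items():
    print(key, row)
```

Since Python 3.7 a plain dict preserves insertion order as well, so OrderedDict mainly matters for older versions such as the Python 2 used in the question.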
Why not use a list of lists as your matrix? It would have whatever order you gave it:
In [1]: matrix = [[i for i in range(4)] for _ in range(4)]
In [2]: matrix
Out[2]: [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]
In [3]: matrix[0][0]
Out[3]: 0
In [4]: matrix[3][2]
Out[4]: 2
