I am looking at a very large structured dataset that I would like to make unstructured. Here is the example…
x1 x2 x3 day id
1 5 9 2 A
9 7 9 3 B
3 1 4 1 A
2 6 5 1 B
3 5 8 2 B
3 2 3 2 C
The rows above are presented in a random order. Another way to think of this example is as follows…
x = [[1, 5, 9, 2, "A"],
     [9, 7, 9, 3, "B"],
     [3, 1, 4, 1, "A"],
     [2, 6, 5, 1, "B"],
     [3, 5, 8, 2, "B"],
     [3, 2, 3, 2, "C"]]
Once processed, the desired output is…
[[[3, 1, 4, 1], [1, 5, 9, 2]],
[[2, 6, 5, 1], [3, 5, 8, 2], [9, 7, 9, 3]],
[[3, 2, 3, 2]]],
[[1, A], [1,B], [2,C]]
The first list has the x variables, and the second list has the start date with each identifier.
I have an idea of how to achieve this, but it is in O(n^3). Is there a more efficient method, maybe in O(nlogn)?
Edit: Although it was mentioned in my previous post, I have made it clearer that the rows are presented in random order. I have also removed a redundant column from the code example.
Try:
x = [
    [3, 1, 4, 1, 1, "A"],
    [1, 5, 9, 2, 2, "A"],
    [2, 6, 5, 1, 1, "B"],
    [3, 5, 8, 2, 2, "B"],
    [9, 7, 9, 3, 3, "B"],
    [3, 2, 3, 2, 2, "C"],
]
out = {}
for row in x:
    out.setdefault(row[-1], []).append(row[:-1])
print(list(out.values()) + [[[v[0][-1], k] for k, v in out.items()]])
Prints:
[
[[3, 1, 4, 1, 1], [1, 5, 9, 2, 2]],
[[2, 6, 5, 1, 1], [3, 5, 8, 2, 2], [9, 7, 9, 3, 3]],
[[3, 2, 3, 2, 2]],
[[1, "A"], [1, "B"], [2, "C"]],
]
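Note that the snippet above takes the start day from the first row seen for each id, so it relies on the rows already being ordered by day. Since the question says the rows arrive in random order, here is a minimal sketch that sorts first (which is where the O(n log n) comes from) and then groups. It uses the 5-column rows from the question; the names rows, groups and starts are just my own choices for illustration.

```python
from itertools import groupby
from operator import itemgetter

x = [[1, 5, 9, 2, "A"],
     [9, 7, 9, 3, "B"],
     [3, 1, 4, 1, "A"],
     [2, 6, 5, 1, "B"],
     [3, 5, 8, 2, "B"],
     [3, 2, 3, 2, "C"]]

# Sort by (id, day) so that each id is contiguous and ordered by day.
rows = sorted(x, key=lambda r: (r[-1], r[-2]))

groups = []  # x variables plus day, one sub-list per id
starts = []  # [first day, id] for each id
for key, grp in groupby(rows, key=itemgetter(-1)):
    grp = [r[:-1] for r in grp]       # drop the id column
    groups.append(grp)
    starts.append([grp[0][-1], key])  # first row has the earliest day

print(groups)
print(starts)
```

The sort dominates the cost; the grouping pass itself is linear.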
Related
I have a matrix that I want to convert to 3D so that I can print the element list[i][j][k].
a = [[[0, 1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6], [2, 3, 2, 3, 4,5], [3, 2, 3, 4, 3, 4], [4, 3, 4, 5, 4, 3], [5, 4, 5, 6, 5, 4]]]
print(a[5][5][0])  # I want to be able to print in 3D
I get the right output if I do a[5][5], but the wrong one when I add the [0]. Is there any way of converting my matrix so that this works?
I tried simply wrapping the list in brackets, [list], but it did not work. I also tried:
b = [[i] for i in a]
which gave me [[[0,1,2,3,4,5]],[[1,2,3,4,5,6]],...
and it still did not work!
NOTE: I want i to be the row, j to be the column, and k to be 0 or 1: with k = 0 the value is the row index of the cell it is pointing to, and with k = 1 the value is the column index.
I tried to reproduce your issue. It works if you use the right indices. For instance, this works fine:
print(a[0][0][5])  # I want to be able to print in 3D
For the list a = [[[0, 1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6], [2, 3, 2, 3, 4, 5], [3, 2, 3, 4, 3, 4], [4, 3, 4, 5, 4, 3], [5, 4, 5, 6, 5, 4]]], the outermost list has a single element, so the only valid indices are a[0][n][n]. You can try a[0][5][5].
The indices are laid out like this:
a = [
    # i = 0
    [
        # j = 0
        [0, 1, 2, 3, 4, 5],
        # j = 1
        [1, 2, 3, 4, 5, 6],
        # j = 2
        [2, 3, 2, 3, 4, 5],
        # j = 3
        [3, 2, 3, 4, 3, 4],
        # j = 4
        [4, 3, 4, 5, 4, 3],
        # j = 5
        [5, 4, 5, 6, 5, 4]
    ]
]
print(a[0][5][5])  # a[i][j][k]
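If the goal really is for a[5][5][0] to work, one option is to wrap each individual value (rather than each row) in its own list. This is only a sketch, assuming that "3D" here means row i, column j, and a trailing axis of length 1; the names matrix and cube are mine.

```python
a = [[[0, 1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6], [2, 3, 2, 3, 4, 5],
      [3, 2, 3, 4, 3, 4], [4, 3, 4, 5, 4, 3], [5, 4, 5, 6, 5, 4]]]

matrix = a[0]  # the 6x6 matrix sitting inside the outer list

# Wrap every value in a one-element list, so the indexing is cube[i][j][k].
cube = [[[value] for value in row] for row in matrix]

print(cube[5][5][0])  # same value as matrix[5][5]
```

Compare this with b = [[i] for i in a] from the question, which wraps whole rows and so adds the extra level in the wrong place.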
I am trying to generate a list of unique lists, each 5 elements long. The order is not important, but there can't be any repeated elements. The first 3 elements need to be from [1,2,3,4] and elements 4 and 5 from [5,6,7,8]. For example, [1,2,3,7,8] is valid, but [1,2,2,7,8] is not, nor is [1,2,7,8,9].
The code below works, but I am wondering whether there is a better way of using the product function, something like d = product([L1, repeat=3][L4,repeat=2). From reading the docs, the repeat keyword can only be used once, like this: d = product(L1,L4,repeat=2).
Any ideas how I could do this?
Thanks
from itertools import product
L1 = [1,2,3,4]
L2 = [1,2,3,4]
L3 = [1,2,3,4]
L4 = [5,6,7,8]
L5 = [5,6,7,8]
d = product(L1,L2,L3,L4,L5)
result=[]
for x in d:
    if x.count(1)<2 and x.count(2)<2 and x.count(3)<2 and x.count(4)<2 and x.count(5)<2 and x.count(6)<2 and x.count(7)<2 and x.count(8)<2:
        result.append(sorted(x))
result2 = []
for x in result:
    if x not in result2:
        result2.append(x)
print(result2)
result2
[[1, 2, 3, 5, 6],
[1, 2, 3, 5, 7],
[1, 2, 3, 5, 8],
[1, 2, 3, 6, 7],
[1, 2, 3, 6, 8],
[1, 2, 3, 7, 8],
[1, 2, 4, 5, 6],
[1, 2, 4, 5, 7],
[1, 2, 4, 5, 8],
[1, 2, 4, 6, 7],
[1, 2, 4, 6, 8],
[1, 2, 4, 7, 8],
[1, 3, 4, 5, 6],
[1, 3, 4, 5, 7],
[1, 3, 4, 5, 8],
[1, 3, 4, 6, 7],
[1, 3, 4, 6, 8],
[1, 3, 4, 7, 8],
[2, 3, 4, 5, 6],
[2, 3, 4, 5, 7],
[2, 3, 4, 5, 8],
[2, 3, 4, 6, 7],
[2, 3, 4, 6, 8],
[2, 3, 4, 7, 8]]
I would instead use itertools.combinations in combination with itertools.product:
from itertools import chain, combinations, product
result = list(
    map(
        list,
        map(
            chain.from_iterable,
            product(
                combinations([1, 2, 3, 4], 3),
                combinations([5, 6, 7, 8], 2),
            ),
        ),
    ),
)
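In case anyone is wondering about repeat: in product(L1, L4, repeat=2) it repeats the whole argument list twice, so the call is equivalent to product(L1, L4, L1, L4).
product accepts any number of iterables; repeat is an optional keyword argument.
For more details: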
https://blog.teclado.com/python-itertools-part-1-product/
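A quick way to see what repeat does is to compare it with the spelled-out call. This is just a small check using L1 and L4 from the question; with_repeat and spelled_out are names I made up.

```python
from itertools import product

L1 = [1, 2, 3, 4]
L4 = [5, 6, 7, 8]

# repeat=2 repeats the whole argument list, not each iterable separately.
with_repeat = list(product(L1, L4, repeat=2))
spelled_out = list(product(L1, L4, L1, L4))

print(with_repeat == spelled_out)
print(len(with_repeat))  # 4 * 4 * 4 * 4 tuples
```

So repeat cannot express "L1 three times, then L4 twice"; for that you either list the iterables explicitly or combine separate iterators.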
Consider a variable setSize (it can take value 2 or 3), and a numpy array v.
The number of columns in v is divisible by setSize. Here's a small sample:
import numpy as np
setSize = 2
# the array spaces are shown to emphasize that the rows
# are made up of sets having, in this case, 2 elements each.
v = np.array([[2,5, 3,5, 1,8],
[4,6, 2,7, 5,9],
[1,8, 2,3, 1,4],
[2,8, 1,4, 3,5],
[5,7, 2,3, 7,8],
[1,2, 4,6, 3,5],
[3,5, 2,8, 1,4]])
PROBLEM: For the rows that have all elements unique, I need to ALPHABETIZE the sets.
For example: set 1,14 would precede set 3,5, which would precede set 5,1.
As a final step, I need to eliminate any duplicated rows that may result.
In the example above, the array rows with indices 1, 3, 5, and 6 have unique elements,
so these rows must be alphabetized. The other rows are not changed.
Further, the rows v[3] and v[6], after alphabetization, are now identical. One of them may be dropped.
The final output looks like:
v = [[2,5, 3,5, 1,8],
[2,7, 4,6, 5,9],
[1,8, 2,3, 1,4],
[1,4, 2,8, 3,5],
[5,7, 2,3, 7,8],
[1,2, 3,5, 4,6]]
I can identify the rows having unique elements with code like the one below, but I am stuck on the alphabetization code.
s = np.sort(v,axis=1)
v[(s[:,:-1] != s[:,1:]).all(1)]
Assuming you have already selected the rows with all-unique elements (dropping the unsuitable ones) with:
s = np.sort(v, axis=1)
idx = (s[:,:-1] != s[:,1:]).all(1)
w = v[idx]
Then you can get orders of each row with np.lexsort on a reshaped array:
w = w.reshape(-1,3,2)
s = np.lexsort((w[:,:,1], w[:,:,0]))
Then you can apply fancy indexing and reshape it back:
rows, orders = np.repeat(np.arange(len(s)), 3), s.flatten()
v[idx] = w[rows, orders].reshape((-1,6))
If you need to drop duplicated rows, you can do it like so:
u, idx = np.unique(v, return_index=True, axis=0)
output = v[np.sort(idx)]
Sample run:
>>> s
array([[1, 0, 2],
[1, 0, 2],
[0, 2, 1],
[2, 1, 0]], dtype=int64)
>>> rows
array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3])
>>> orders
array([1, 0, 2, 1, 0, 2, 0, 2, 1, 2, 1, 0], dtype=int64)
>>> v[idx]
array([[2, 7, 4, 6, 5, 9],
[1, 4, 2, 8, 3, 5],
[1, 2, 3, 5, 4, 6],
[1, 4, 2, 8, 3, 5]])
>>> v
array([[2, 5, 3, 5, 1, 8],
[2, 7, 4, 6, 5, 9],
[1, 8, 2, 3, 1, 4],
[1, 4, 2, 8, 3, 5],
[5, 7, 2, 3, 7, 8],
[1, 2, 3, 5, 4, 6],
[1, 4, 2, 8, 3, 5]])
>>> output
array([[2, 5, 3, 5, 1, 8],
[2, 7, 4, 6, 5, 9],
[1, 8, 2, 3, 1, 4],
[1, 4, 2, 8, 3, 5],
[5, 7, 2, 3, 7, 8],
[1, 2, 3, 5, 4, 6]])
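The steps above hard-code 3 sets of 2 elements. Here is a sketch of the same idea wrapped in a function that takes setSize as a parameter. The function name sort_sets_in_unique_rows is hypothetical, and I swapped the repeat/flatten fancy indexing for np.take_along_axis, which should do the same reordering; it also works on a copy instead of modifying v in place.

```python
import numpy as np

def sort_sets_in_unique_rows(v, set_size):
    """Sort the sets within each all-unique row, then drop duplicate rows."""
    v = v.copy()
    n_cols = v.shape[1]
    s = np.sort(v, axis=1)
    unique_mask = (s[:, :-1] != s[:, 1:]).all(axis=1)

    # View each selected row as (number of sets, set_size).
    w = v[unique_mask].reshape(-1, n_cols // set_size, set_size)
    # np.lexsort treats the LAST key as the primary one, so pass the
    # set positions in reverse order.
    keys = tuple(w[:, :, k] for k in range(set_size - 1, -1, -1))
    order = np.lexsort(keys)
    w = np.take_along_axis(w, order[:, :, None], axis=1)
    v[unique_mask] = w.reshape(-1, n_cols)

    # Keep the first occurrence of each row, in the original order.
    _, first = np.unique(v, axis=0, return_index=True)
    return v[np.sort(first)]

v = np.array([[2, 5, 3, 5, 1, 8],
              [4, 6, 2, 7, 5, 9],
              [1, 8, 2, 3, 1, 4],
              [2, 8, 1, 4, 3, 5],
              [5, 7, 2, 3, 7, 8],
              [1, 2, 4, 6, 3, 5],
              [3, 5, 2, 8, 1, 4]])
output = sort_sets_in_unique_rows(v, 2)
print(output)
```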
I have a function written in Python that performs some consecutive operations on two lists. The problem is that at random times during execution, it gives a wrong answer. The code inside the function is
def temp(c, p):
    random.seed(0)
    x = random.randint(0 , len(c)-1)
    y = random.randint(0 , len(c)-1)
    s_1 = c[x][0]
    s_2 = c[y][0]
    p[x] += [s_1]
    p[y] += [s_2]
    p[x].remove(s_2)
    p[y].remove(s_1)
    c[x], c[y] = c[y], c[x]
    return c, p
def anotherFunction():
    iter = 1000
    for i in iter:
        c_main, p_main = temp(c, p)
I have a list of lists with numbers ranging from 0 to n. For example, c contains the following:
c = [[7], [6], [1], [2], [5], [4], [0], [3]]
And p is also a list of lists; the list at each index contains all the numbers from 0 to n except the one stored at that index in c.
p = [[0, 2, 4, 6, 5, 1, 3]
[0, 1, 2, 3, 4, 5, 7]
[0, 2, 3, 4, 6, 7, 5]
[0, 1, 3, 5, 6, 7, 4]
[0, 2, 4, 6, 7, 3, 1]
[0, 1, 2, 3, 6, 7, 5]
[1, 2, 3, 4, 5, 6, 7]
[0, 1, 2, 4, 5, 6, 7]]
This is how the values should look at any point in the function. That is, the value at idx in c should not be present in the list at idx in p.
But sometimes during the execution of the function, the values selected by x and y are swapped and one other value also gets affected. This is what the two lists sometimes look like:
c = [[3], [1], [4], [5], [7], [0], [2], [6]]
p = [[0, 1, 2, 4, 5, 6, 7]
[0, 2, 3, 4, 5, 6, 7]
[0, 1, 2, 3, 6, 7, 4]
[0, 1, 2, 3, 6, 5, 5]
[0, 1, 2, 3, 4, 6, 7]
[1, 2, 3, 4, 6, 7, 5]
[0, 1, 3, 4, 5, 6, 7]
[0, 1, 2, 3, 4, 5, 7]]
I'm unable to understand how these consecutive operations are getting affected by one another. This function gets called inside a loop of another function.
UPDATE:
I debugged my code more carefully and realized that at some iterations of the for loop, two more values get swapped in c in addition to x and y. Because these values get swapped but are not updated in p, in some executions I get a faulty output. Any ideas why two more values are getting swapped?
Your code is not complete.
You seem to call your function like this:
c = [[3], [1], [4], [5], [7], [0], [2], [6]]
p = [[0, 1, 2, 4, 5, 6, 7],
[0, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 6, 7, 4],
[0, 1, 2, 3, 6, 5, 5],
[0, 1, 2, 3, 4, 6, 7],
[1, 2, 3, 4, 6, 7, 5],
[0, 1, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 7]]
for i in range(1000):
c_main, p_main = temp(c, p)
Note: I fixed your code by adding range and a comma at the end of each line in p.
But inside your temp() function, you are modifying the contents of p in place.
Because you reuse the same p at each iteration, you may not get what you expect, and p can sometimes become inconsistent.
What you want is probably something like this:
import random
def temp(c):
    # -- raw matrix
    p = [[col for col in range(8)] for row in range(len(c))]
    # -- drop a number
    for p_row, c_row in zip(p, c):
        p_row.pop(c_row[0])
    # -- shuffle
    for row in p:
        random.shuffle(row)
    return p
You can use it like this:
cols = [[3], [1], [4], [5], [7], [0], [2], [6]]
print(temp(cols))
You get something like this (the order within each row is random):
[[6, 1, 4, 5, 7, 2, 0],
[3, 0, 5, 4, 2, 7, 6],
[2, 7, 0, 3, 6, 5, 1],
[1, 2, 7, 4, 6, 3, 0],
[3, 1, 4, 6, 2, 0, 5],
[4, 5, 6, 3, 7, 1, 2],
[0, 6, 1, 5, 7, 3, 4],
[4, 3, 7, 0, 1, 5, 2]]
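To find out at which iteration things go wrong, it may help to check the invariant from the question (the value at idx in c must not appear in the list at idx in p) after every call. Below is a small sketch; find_violations is a hypothetical helper of mine, and I also treat a repeated value inside a row of p as a violation. As a side note, calling random.seed(0) inside temp resets the generator on every call, so x and y come out the same each time.

```python
def find_violations(c, p):
    """Return the indices where p[i] contains c[i][0] or has repeated values."""
    bad = []
    for i, (c_row, p_row) in enumerate(zip(c, p)):
        if c_row[0] in p_row or len(set(p_row)) != len(p_row):
            bad.append(i)
    return bad

# The inconsistent state shown in the question.
c = [[3], [1], [4], [5], [7], [0], [2], [6]]
p = [[0, 1, 2, 4, 5, 6, 7],
     [0, 2, 3, 4, 5, 6, 7],
     [0, 1, 2, 3, 6, 7, 4],
     [0, 1, 2, 3, 6, 5, 5],
     [0, 1, 2, 3, 4, 6, 7],
     [1, 2, 3, 4, 6, 7, 5],
     [0, 1, 3, 4, 5, 6, 7],
     [0, 1, 2, 3, 4, 5, 7]]

violations = find_violations(c, p)
print(violations)
```

Calling this inside the loop and stopping as soon as the returned list is non-empty should point at the first step that breaks the two lists.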
I have a list of lists in a variable named lists, something like this:
[7, 6, 1, 8, 3]
[1, 7, 2, 4, 2]
[5, 6, 4, 2, 3]
[0, 3, 3, 1, 6]
[3, 5, 2, 14, 3]
[3, 11, 9, 1, 1]
[1, 10, 2, 3, 1]
When I write lists[1], I get the values vertically:
6
7
6
3
5
11
10
but when I loop over it:
for i in list:
    print(i)
I get the values horizontally:
7
6
1
8
3
etc...
So, how does this work? How can I modify the loop so that it gives me everything vertically?
Short answer:
for l in lists:
    print(l[1])
Lists of lists
list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for inner in list_of_lists:
    for x in inner:
        print(x)
Here is how you would print out the columns of the list of lists.
lists = [[7, 6, 1, 8, 3],
[1, 7, 2, 4, 2],
[5, 6, 4, 2, 3],
[0, 3, 3, 1, 6],
[3, 5, 2, 14, 3],
[3, 11, 9, 1, 1],
[1, 10, 2, 3, 1]]
for i in range(len(lists[0])):
    for j in range(len(lists)):
        # print the whole column on one line
        print(lists[j][i], end=" ")
    print("\n")
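Another way to get at the columns, assuming all rows have the same length, is to transpose with zip(*lists). This is only a sketch; columns is a name I picked.

```python
lists = [[7, 6, 1, 8, 3],
         [1, 7, 2, 4, 2],
         [5, 6, 4, 2, 3],
         [0, 3, 3, 1, 6],
         [3, 5, 2, 14, 3],
         [3, 11, 9, 1, 1],
         [1, 10, 2, 3, 1]]

# zip(*lists) pairs up the i-th elements of every row, i.e. the columns.
columns = list(zip(*lists))

# Print the second column vertically, one value per line.
for value in columns[1]:
    print(value)
```

Each entry of columns is a tuple; wrap it in list(...) if you need to modify it.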