I am trying to create this data structure in Python:
2-d array structure
There have to be column keys and row keys that I will be using later.
Column keys and row keys are random numbers.
For now I have this code:
import random
cols, rows = 5, 5
Matrix = [[0 for x in range(cols)] for y in range(rows)]
set_col = 0
for row in Matrix:
    row[set_col] = random.randint(1,2)
columnKeys = random.sample(range(1,5), 4)
Matrix[0] = columnKeys
for row in Matrix:
    print(row)
Output:
[3, 1, 2, 4]
[2, 0, 0, 0, 0]
[1, 0, 0, 0, 0]
[2, 0, 0, 0, 0]
[1, 0, 0, 0, 0]
This is not quite what I want. For now each cell value is zero, but later each cell will hold relevant data, and I will use that data along with the corresponding row and column keys. I don't know how to organize this data structure correctly so that I can use cell values together with their row/column keys.
How can I do this without Pandas and NumPy so that I can use column and row keys?
It depends on what you want.
The best way is probably not to use nested lists, but to use dictionaries instead. Since you mentioned pandas: pandas DataFrame objects have a to_dict method that converts a DataFrame into a dictionary, and it offers several output formats depending on what you prefer.
I see from your example that you are trying to create your data structure with duplicate indices. The best option here is likely to use the structure created by running df.to_dict("split").
Say your DataFrame (df) looks like this:
3 1 2 4
2 0 0 0 0
1 0 0 0 0
2 0 0 0 0
1 0 0 0 0
Running `df.to_dict("split")` will then produce this:
d = df.to_dict("split")
{
'columns': [3, 1, 2, 4],
'data': [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
'index': [2, 1, 2, 1]
}
Accessing data in this scenario, and in the one shown by @Makiflow, is tricky. Even within pandas, having duplicate index or column labels on your DataFrame makes operations more interesting. In this case, selecting d['data'][3][1] picks the second element of the fourth list under the 'data' key, i.e. the 4th row and the 2nd column of your matrix, not the column named 3 and the row named 1. If you want to reference items by column name, you have to do a little more legwork.
You can run col_num = d['columns'].index(3), which gives you the position of the column named 3, but d['index'].index(2) will always give you 0, even if you wanted the second 2 (at position 2). That's because index() returns the position of the first matching value. You can of course simply select by (row, column) positions, but that defeats the purpose of having column names and index values in the first place.
If you want to generate this structure without pandas, you can run:
import random
COLS, ROWS = 5, 5
columns = [random.randint(0, COLS) for _ in range(COLS)]
rows = [random.randint(1, 2) for _ in range(ROWS)]
d = {"columns": columns,
     "index": rows,
     "data": [[0 for _ in range(COLS)] for _ in range(ROWS)]
     }
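Because `list.index()` only returns the first match, a key-based lookup on this structure has to collect every position whose key matches. A minimal sketch, assuming the `{"columns", "index", "data"}` layout above; `get_cells` is a hypothetical helper name, not part of pandas:

```python
def get_cells(d, row_key, col_key):
    """Return every value whose row key is row_key and whose column key is col_key."""
    # Keys may repeat, so collect all matching positions instead of using list.index()
    row_positions = [i for i, k in enumerate(d["index"]) if k == row_key]
    col_positions = [j for j, k in enumerate(d["columns"]) if k == col_key]
    return [d["data"][i][j] for i in row_positions for j in col_positions]


d = {
    "columns": [3, 1, 2, 4],
    "index": [2, 1, 2, 1],
    "data": [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
}
d["data"][2][0] = 7  # row key 2 (its second occurrence), column key 3

print(get_cells(d, 2, 3))  # [0, 7]
```

The result is a list because, with duplicate keys, a (row key, column key) pair can name more than one cell.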
IMHO, a better solution would be to force your data structure to have unique index and column values. The default output of to_dict() is then a very simple dictionary:
d = df.to_dict() # also the same as df.to_dict("dict")
{
1: {1: 0, 2: 0},
2: {1: 0, 2: 0},
3: {1: 0, 2: 0},
4: {1: 0, 2: 0}
}
In this configuration, each key of the dictionary is the name of a column. Each of those keys points to another dictionary that represents the information in that column: each key is an index value, mapped to the cell value.
This likely makes the most intuitive sense because if you wanted to get the value at the column named 3 at the index named 1, you would do:
d = df.to_dict()
d[3][1]
# 0
You can create this data structure without using Pandas quite simply:
COLS, ROWS = 5, 5
rows = list(range(ROWS))
columns = list(range(COLS))
d = {c: {i: 0 for i in rows} for c in columns}
# {
# 0: {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
# 1: {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
# 2: {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
# 3: {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
# 4: {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}
# }
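A short usage sketch of this dict-of-dicts layout with random keys, as in the question. It assumes the keys on each axis must be unique, which `random.sample` guarantees because it draws without replacement; the key range 1 to 99 and the value 42 are arbitrary illustration values:

```python
import random

COLS, ROWS = 5, 5
# random.sample draws without replacement, so the keys on each axis are unique
column_keys = random.sample(range(1, 100), COLS)
row_keys = random.sample(range(1, 100), ROWS)

# Outer dict: column key -> inner dict of row key -> cell value
table = {c: {r: 0 for r in row_keys} for c in column_keys}

# Write and read a cell by its (column key, row key) pair
c, r = column_keys[2], row_keys[4]
table[c][r] = 42
print(table[c][r])  # 42

# Walk every cell together with both of its keys, keeping the non-zero ones
nonzero = [(col_key, row_key)
           for col_key, column in table.items()
           for row_key, value in column.items()
           if value != 0]
print(nonzero == [(c, r)])  # True
```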
It's really dependent on the constraints/requirements that you have.
import random
COLS, ROWS = 5, 5
Matrix = [[0 for x in range(COLS)] for y in range(ROWS)]
set_col = 0
for row in Matrix:
    row[set_col] = random.randint(1,2)
columnKeys = random.sample(range(1,5), 4)
Matrix[0] = [0] + columnKeys
for row in Matrix:
    print(row)
Output
[0, 3, 1, 2, 4]
[2, 0, 0, 0, 0]
[1, 0, 0, 0, 0]
[2, 0, 0, 0, 0]
[1, 0, 0, 0, 0]
I am trying to "map a list of elements to a range of an element from another list to create unique matrices." Let me explain with a drawing.
Kickstart-inspired question
I hope that it makes sense.
This is inspired by the Google Kickstart competition, but it is not an actual contest problem.
But I thought of this question and I think that it is worth exploring.
But I am stuck and can't make much progress.
Here is the code I have, which obviously is not a correct solution.
values = input("please enter your input: ")
values = values.split()
values = [int(i) for i in values]
>>> please enter your input: 2 4 3 1 0 0 1 0 1 1 0 0 1 1 0 6 4 1 0 0 0 1 0 0 1 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 0
rows_columns = []
matrix = []
for i in values:
    if i > 1:
        rows_columns[:1].append(i) # The "2" at the very beginning indicates how many matrices should be formed
    elif i <= 1:
        matrix.append(i)
rows_columns[:1]
>>> [4, 3, 6, 4]
matrix_all = []
for i in range(1, len(rows_columns)):
    matrix_sub = []
    for j in range(rows_columns[i]):
        matrix_sub.append(matrix[j])
    if matrix_sub not in matrix_all:
        matrix_all.append(matrix_sub)
>>> [[1, 0, 0, 1], [1, 0, 0], [1, 0, 0, 1, 0, 1], [1, 0, 0, 1]]
I really wonder if the nested loop is a good idea to solve this question. This is the best way I could think of for the last couple of hours. What I want to get as a final result looks like below.
Final expected output
Given one list that says how many rows and columns each matrix should have, and another list with exactly enough elements to fill them, how can I map (or create) the two matrices from the second list, based on the dimension information in the first?
I hope that it is clear, let me know when it is not.
Thanks!
Without using numpy, here is one working solution, based on the input found in your code snippet, and the expected result listed in your final expected result link:
values = [2, 4, 3, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 6, 4, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1,
          0, 1, 0, 1, 0, 1, 1, 1, 0]
v_idx = 1
# As per the example, the number of matrices desired is the first element of the input list.
# In the values list above we want 2 matrices, so the for loop below executes exactly 2 times.
for matrix_nr in range(values[0]):
    # The nr of rows and nr of columns are the next two elements in the values list
    nr_rows = values[v_idx]
    nr_cols = values[v_idx + 1]
    # Calculate the start index for the next matrix specifications
    new_idx = v_idx+2+(nr_rows*nr_cols)
    # Slice the values list to extract the values for the current matrix to format
    sub_elements = values[v_idx+2: new_idx]
    matrix = []
    # Append elements to the matrix by slicing values according to nr_rows and nr_cols
    for r in range(nr_rows):
        start_idx = r*nr_cols
        end_idx = (r+1)*nr_cols
        matrix.append(sub_elements[start_idx:end_idx])
    print(matrix)
    v_idx = new_idx
This gives the expected result:
[[1, 0, 0], [1, 0, 1], [1, 0, 0], [1, 1, 0]]
[[1, 0, 0, 0], [1, 0, 0, 1], [1, 1, 1, 1], [1, 0, 1, 0], [1, 0, 1, 0], [1, 1, 1, 0]]
As mentioned, NumPy could very likely make this a lot more efficient.
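For comparison, a sketch that consumes the flat list with a single iterator instead of tracking v_idx by hand. It assumes the same layout as above (the count, then rows, columns and values for each matrix); `parse_matrices` is a hypothetical name:

```python
from itertools import islice


def parse_matrices(values):
    """Split [count, rows, cols, v..., rows, cols, v..., ...] into a list of matrices."""
    it = iter(values)
    count = next(it)  # first element: how many matrices follow
    matrices = []
    for _ in range(count):
        nr_rows, nr_cols = next(it), next(it)
        # each row takes the next nr_cols values from the shared iterator
        matrices.append([list(islice(it, nr_cols)) for _ in range(nr_rows)])
    return matrices


values = [2, 4, 3, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 6, 4, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1,
          0, 1, 0, 1, 0, 1, 1, 1, 0]
for m in parse_matrices(values):
    print(m)
```

Because every `islice` call pulls from the same iterator, each row automatically starts where the previous one stopped, so no index arithmetic is needed.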
Let's say I have an array like this
grid:
[[1, 1, 0, 0, 0],
[1, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 1]]
I want to isolate the groups of "items", in this case 1s, where the 0s act as separators between groups, like intersections. So this example has 3 groups of 1s.
If you know how to do this in Python: the first question I'd be asked is what I've tried, as proof that I'm not handing my homework to the community. My idea was to iterate down and to the left, but that would likely miss some numbers, since it would form a cross emanating from the top left. This group is here to learn.
So, for me and others who are interested in this data-science-like problem, please be considerate.
If you do not need to know which sets are duplicates, you can use Python's built-in set to find the unique items in a list. This is a bit tricky because set doesn't work on a list of lists (lists are not hashable). However, you can convert the rows to tuples, put those in a set, and then take the len of the result to find out how many unique rows there are.
grid = [[1, 1, 0, 0, 0],
[1, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 1]]
unique = [list(x) for x in set(tuple(x) for x in grid)]
unique_count = len(unique) # this returns 3 for this grid (the number of distinct rows)
A relatively straightforward depth-first-search implementation of connected-component labeling:
def get_components(grid, indicator=1):
    def label(g, row, col, group):
        if row >= 0 and col >= 0 and row < len(g) and col < len(g[row]) and g[row][col] == -1:
            # only label if currently unlabeled
            g[row][col] = group
            # attempt to label neighbors with same label
            label(g, row + 1, col, group)
            label(g, row, col + 1, group)
            label(g, row - 1, col, group)
            label(g, row, col - 1, group)
            return True
        else:
            return False
    # initialize label grid as -1 for entries that need to be labeled
    label_grid = [[-1 if gc == indicator else 0 for gc in gr] for gr in grid]
    group_count = 0
    for row, grid_row in enumerate(grid):
        for col in range(len(grid_row)):
            if label(label_grid, row, col, group_count + 1):
                group_count += 1
    return label_grid, group_count
The results of label_grid, group_count = get_components(grid) for your example inputs are
label_grid = [[1, 1, 0, 0, 0],
[1, 1, 0, 0, 0],
[0, 0, 2, 0, 0],
[0, 0, 0, 3, 3]]
group_count = 3
And for cases like the following
grid = [[1, 0, 1],
        [1, 1, 1]]
we get group_count = 1.
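The recursive version makes one call per cell in a group, so a large connected region can exceed Python's default recursion limit (1000, see sys.getrecursionlimit()). A sketch of the same 4-neighbor labeling using an explicit queue (breadth-first search with collections.deque) instead of recursion; `get_components_iterative` is a hypothetical name:

```python
from collections import deque


def get_components_iterative(grid, indicator=1):
    """Label 4-connected groups of `indicator` cells; return (label_grid, group_count)."""
    n_rows = len(grid)
    label_grid = [[0] * len(row) for row in grid]
    group_count = 0
    for r in range(n_rows):
        for c in range(len(grid[r])):
            if grid[r][c] != indicator or label_grid[r][c] != 0:
                continue  # not part of a group, or already labeled
            group_count += 1
            label_grid[r][c] = group_count
            queue = deque([(r, c)])
            while queue:
                row, col = queue.popleft()
                for nr, nc in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
                    if (0 <= nr < n_rows and 0 <= nc < len(grid[nr])
                            and grid[nr][nc] == indicator and label_grid[nr][nc] == 0):
                        label_grid[nr][nc] = group_count
                        queue.append((nr, nc))
    return label_grid, group_count


grid = [[1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 1, 1]]
labels, count = get_components_iterative(grid)
print(count)   # 3
print(labels)  # [[1, 1, 0, 0, 0], [1, 1, 0, 0, 0], [0, 0, 2, 0, 0], [0, 0, 0, 3, 3]]
```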
I have created an empty array which I want to fill.
The array is 10 by 10. I want the first row and column to display text names, which I have in a list of 9. I want the inner 9 by 9 cells to contain another matrix, which I already have filled in with the values.
Here is how I made the matrix and tried to fill in the names so far:
rows, cols = (10, 10)
array = [[0 for i in range (cols)] for j in range (rows)]
array [0][1:9] = photographs
array [1:9][0] = photographs
where photographs is my list of 9 words.
This gives me an array where the first row is as desired, but the first column is still all displaying 0.
This is what my array looks like:
[[0, 'DSC001 \n', 'DSC4587 \n', 'DSC3948 \n', 'DSC98798 \n', 'DSC44 \n', 'DSC098098d \n', 'DSC098734a-796876 \n', 'DSC8976 \n', 'DSC098707-a-b \n', 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
I tried to make the cell in the first row and first column display - or just a space, but got this error back:
array [0][0] = -
^
SyntaxError: invalid syntax
I have also tried to fill in my array with the values from my 9 by 9 matrix like this:
array [1:9][1:9] = matrix
But this did not work at all.
Filling in the first row should be
array[0][1:10] = photographs
In Python, list slices go from the start index up to, but not including, the end index, just like range.
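A quick illustration of the slice rule: the end index is excluded, and assigning a 9-item list to an 8-item slice grows the list, which is why the first row in the question ended up with 11 elements. The names below are placeholders:

```python
names = ["n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8", "n9"]  # 9 placeholder names

grown_row = [0] * 10
print(len(grown_row[1:9]))  # 8: indices 1..8, the end index 9 is excluded
grown_row[1:9] = names      # 8 slots replaced by 9 items, so the list grows
print(len(grown_row))       # 11

fixed_row = [0] * 10
fixed_row[1:10] = names     # 9 slots for 9 names: length stays 10
print(len(fixed_row))       # 10
```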
You can't use array[1:9][0] to refer to the first column.
array[1:9] is a list containing the rows indexed 1 to 8 (so the 2nd to the 9th row), so array[1:9][0] is just the second row. You could use a for loop to insert the names into the first column instead, like:
for i, row in enumerate(array[1:10]):
    row[0] = photographs[i]
Also, to insert a value into the first cell you want:
array[0][0] = '-'
just like how you would assign a variable.
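Putting the pieces together for the 10 by 10 case, a sketch in which `photographs` and `matrix` are placeholder data standing in for the 9 names and the already-filled 9 by 9 matrix from the question:

```python
# Placeholder data: 9 names and a 9x9 matrix of values
photographs = ["DSC%03d" % n for n in range(1, 10)]
matrix = [[r * 9 + c for c in range(9)] for r in range(9)]

rows, cols = 10, 10
array = [[0] * cols for _ in range(rows)]

array[0][0] = "-"              # corner cell
array[0][1:10] = photographs   # first row: 9 names into 9 slots
for i, row in enumerate(array[1:10]):
    # array[1:10] is a new list, but it holds the same row objects,
    # so changing `row` changes `array`
    row[0] = photographs[i]    # first column: the names again
    row[1:10] = matrix[i]      # remaining cells: one row of the 9x9 matrix

for row in array:
    print(row)
```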
nrows = 4
ncols = 4
# Initialize an empty list of lists.
# NB this is a list of lists, not an array. Think of the outer list as a list of rows. Each row is an inner list of 1 element per column.
array = [[0] * ncols for _ in range(nrows)]
# Note that array[n] gets the nth row. array[n][m] gets the element at (n, m).
# But to get the mth column, you need to do [array[row][m] for row in range(nrows)].
# This is reason enough to start thinking about numpy or pandas for an application like this.
headers = ["A", "B", "C"]
# Add the column headers (the first row) to your 'array'
array[0][1:] = headers
# remember that array[0] gets the first row. It is a list. You can get all the elements except the first by slicing it with [1:]
# Add the row headers (the first column) to your 'array'
for row_number, row in enumerate(array[1:]):
    row[0] = headers[row_number]
# in this case we need a loop as we want to change the first element of each of the inner lists. A loop over array gives us a row at each iteration. row[0] is then the first column of that row.
# put - in the corner
array[0][0] = "-"
# fill the array with another list
data = [[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]
# because both data and array are lists of rows, we do this row by row, skipping the first row
for data_row_number, array_row in enumerate(array[1:]):
    array_row[1:] = data[data_row_number]
gives the following output for array:
[['-', 'A', 'B', 'C'], ['A', 1, 2, 3], ['B', 4, 5, 6], ['C', 7, 8, 9]]
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I have a large number of arrays of zeros and ones, like so -
Array length for each array is 812.
a = [1, 0, 0, 1, 0,....]
b = [0, 1, 0, 0, 1,....]
.
.
.
x = [0, 1, 0,..........]
What I would like to do is count the number of times 1 and 0 appear at the first, second,...812th position. Any thoughts or ideas are appreciated.
What I would like is an array like:
array = [(32,56), (78,89)....] where the tuple's first element gives number of 1s at the first position(index) and second element, the number of 0s. The arrays are used to store 812 features for Naive Bayes classifier implementation.
The sum idea would work too, but I had the idea of transposing the list and then running collections.Counter on each row:
import numpy as np
import collections
nums = [
[1, 0, 0, 1, 0],
[0, 1, 0, 0, 1],
[0, 1, 0, 1, 1]
]
np_nums = np.array(nums)
transposed = np.transpose(np_nums)
for i, row in enumerate(transposed):
    print("Index {} {}".format(i, dict(collections.Counter(row))))
This outputs:
Index 0 {1: 1, 0: 2}
Index 1 {0: 1, 1: 2}
Index 2 {0: 3}
Index 3 {1: 2, 0: 1}
Index 4 {0: 1, 1: 2}
This means that at index 0 there is one 1 and two 0s, at index 1 there is one 0 and two 1s, and so on.
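Without NumPy, zip(*rows) performs the same transposition, and the (ones, zeros) tuples the question asks for can be counted directly. A sketch, assuming all lists have the same length and contain only 0s and 1s:

```python
nums = [
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
]

# zip(*nums) yields one tuple per position, i.e. one column of the original lists
counts = [(column.count(1), column.count(0)) for column in zip(*nums)]
print(counts)  # [(1, 2), (2, 1), (0, 3), (2, 1), (2, 1)]
```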
This is what I understood based on your question
def arrayCount(arrays, index):
    data = {}
    for x in arrays:
        if data.get(x[index]) is None:
            data[x[index]] = 1
        else:
            data[x[index]] += 1
    return data
a = [1, 0, 0, 1, 0]
b = [0, 1, 0, 0, 1]
x = [0, 1, 0, 1, 1]
y = [0, 1, 0, 1, 1]
z = [0, 2, 0, 1, 1]
arrays = [a, b, x, y, z]
print(arrayCount(arrays, 1))
# Output:
# {0: 1, 1: 3, 2: 1}
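The counting loop in arrayCount is exactly what collections.Counter does; a shorter sketch with the same result, where `array_count` is a hypothetical name:

```python
from collections import Counter


def array_count(arrays, index):
    """Count how often each value occurs at position `index` across all arrays."""
    return dict(Counter(arr[index] for arr in arrays))


a = [1, 0, 0, 1, 0]
b = [0, 1, 0, 0, 1]
x = [0, 1, 0, 1, 1]
y = [0, 1, 0, 1, 1]
z = [0, 2, 0, 1, 1]
arrays = [a, b, x, y, z]
print(array_count(arrays, 1))  # {0: 1, 1: 3, 2: 1}
```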
Here is a general solution (you can use it for any value in an array, not just 0 and 1). Stack all your lists into a 2-D NumPy array, since they all have the same length:
import numpy as np
concat_array = np.vstack((a, b, c, ..., x))  # one row per list; "..." stands for your remaining lists
Count the occurrences of each value along axis=0 (column-wise) and combine the counts into tuples:
# use a loop if you have more unique values
n_ones = (concat_array == 1).sum(axis=0)
n_zeros = (concat_array == 0).sum(axis=0)
# zip the counts into (ones, zeros) tuples
result = list(zip(n_ones, n_zeros))
print(result)
[(1, 2), (2, 1), (1, 2), (0, 3)] #a dummy result
I am using Python to write a program that looks through lists of lists and changes values.
In my list of lists I have 3's and I want to find their index. Right now I can only get it to work on the first row. I want it to find 3's on any of the lists in "numbers."
Here is some sample code to wash away the mud:
numbers = [
[3, 3, 3, 5, 3, 3, 3, 3, 6],
[8, 0, 0, 0, 4, 7, 5, 0, 3],
[0, 5, 0, 0, 0, 3, 0, 0, 0],
[0, 7, 0, 8, 0, 0, 0, 0, 9],
[0, 0, 0, 0, 1, 0, 0, 0, 0],
[9, 0, 0, 0, 0, 4, 0, 2, 0],
[0, 0, 0, 9, 0, 0, 0, 1, 0],
[7, 0, 8, 3, 2, 0, 0, 0, 5],
[3, 0, 0, 0, 0, 8, 0, 0, 0],
]
a = -1
while a:
    try:
        for row in numbers:
            a = row[a+1:].index(3) + a + 1
            print("Found 3 at index", a)
    except ValueError:
        break
When I run it I get:
Found 3 at index 0
Found 3 at index 1
Found 3 at index 2
Found 3 at index 4
Found 3 at index 5
Found 3 at index 6
Found 3 at index 8
Which shows that it is working but only on the first row.
Thanks!
Here's a little code snippet to get you started:
>>> for i, row in enumerate(numbers):
...     if 3 in row:
...         print(i, row.index(3))
...
0 0
1 8
2 5
7 3
8 0
>>> numbers[1][8]
3
>>> numbers[2][5]
3
>>> numbers[7][3]
3
>>> numbers[8][0]
3
If you only want the row index, iterate over numbers using enumerate and test whether 3 is in the list using in:
for index, row in enumerate(numbers):
    if 3 in row:
        print("3 found in row %i" % index)
For row and column index, iterate over both lists:
for index, row in enumerate(numbers):
    for col, value in enumerate(row):
        if value == 3:
            print("3 found in row %i at position %i" % (index, col))
If you just want the indexes in a new list, you can use a list comprehension:
indexes = [(row, col) for row, r in enumerate(numbers) for col, val in enumerate(r) if val == 3]
Try the following:
[[i for i, v in enumerate(row) if v == 3] for row in numbers]
This will result in a list of lists where each entry in the inner lists is an index of a 3 in the corresponding row from the original list:
[[0, 1, 2, 4, 5, 6, 7], [8], [5], [], [], [], [], [3], [0]]
You said you were looking for 3 but your code appears to be looking for 0, which do you want?
You could use it like this:
threes = [[i for i, v in enumerate(row) if v == 3] for row in numbers]
for row, indices in enumerate(threes):
    for col in indices:
        print("3 found in row %d, column %d" % (row, col))
Instead of displaying the information, let's actually build up a data structure that gives us the "coordinates" of every 3:
x = [
(r, c)
for r, row in enumerate(numbers)
for c, cell in enumerate(row)
if cell == 3
]
And we can easily verify it:
assert all(numbers[r][c] == 3 for r, c in x)
But if you want to replace values, it's silly to build up this list and then use it to go back in and replace things manually. It is much cleaner to produce the desired output directly, that is, "a list of lists in which each value is None (let's say, for the sake of argument) if the corresponding original value is 3, and the original value otherwise".
That's spelled like
[[None if cell == 3 else cell for cell in row] for row in numbers]
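If the original list must be modified in place rather than copied (for example, because other names refer to the same list), a sketch that overwrites each matching cell. It uses a small sample grid, and None is again just the placeholder replacement value:

```python
numbers = [[3, 0, 3],
           [1, 3, 2]]
same_object = numbers  # another name bound to the same list

for row in numbers:
    for c, cell in enumerate(row):
        if cell == 3:
            row[c] = None  # replace in place; the row's length does not change

print(same_object)  # [[None, 0, None], [1, None, 2]]
```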
Try using NumPy to do this:
import numpy as np
matrix = np.array(numbers)  # convert the list of lists to a NumPy array
x, y = np.where(matrix == 3)  # row indices and column indices of every 3
for i in range(len(x)):
    print(x[i], y[i])