Getting a list of arrays into a Pandas dataframe


So I have a list of arrays in Python: [[0, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1]]. I would like to make this list of arrays into a Pandas dataframe, with each array being a row. Is there a way to do this quickly and easily in Python? I tried values = np.split(values, len(values)) to split the list of arrays into multiple arrays (well, I tried). And then tried to create a dataframe with df = pd.DataFrame(values). But this is where my error came from. I got a "must pass 2-d input" error message. Any idea what I'm doing wrong and how to fix it? Or an easier way to go about this? Thanks!

No need to do all that splitting, etc. If you have it as a list of lists that is two dimensional (meaning all rows have the same number of elements), you can simply pass it to the DataFrame constructor:
import pandas as pd
data = [[0, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1]]
pd.DataFrame(data)
which generates the expected result:
>>> pd.DataFrame(data)
   0  1  2  3
0  0  1  0  1
1  1  0  1  1
2  0  1  1  1
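The "must pass 2-d input" error in the question most likely comes from the np.split step itself. A minimal sketch, assuming NumPy and pandas are installed, that shows the shape np.split actually produces and two ways to end up with a proper 2-D frame (the column labels "a" to "d" are hypothetical):

```python
import numpy as np
import pandas as pd

values = [[0, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1]]

# np.split along axis 0 returns a list of (1, 4) arrays, not flat rows
pieces = np.split(np.asarray(values), len(values))
print(np.array(pieces).shape)  # three dimensions, which pandas cannot turn into a frame

# Stacking the pieces back into one 2-D array gives pandas what it expects
df = pd.DataFrame(np.vstack(pieces))

# Passing the original list of lists directly works as well; labels are optional
df_named = pd.DataFrame(values, columns=["a", "b", "c", "d"])
print(df_named)
```

In other words, skipping the split (or undoing it with np.vstack) keeps the data two-dimensional, and each inner list becomes one DataFrame row.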

Related

Python How to replace values in specific columns (defined by an array) with zero

I'm trying to replace values in specific columns with zero with python, and the column numbers are specified in another array.
Given the following 2 numpy arrays
a = np.array([[1, 2, 3, 4],
              [1, 2, 1, 2],
              [0, 3, 2, 2]])
and
b = np.array([1,3])
b indicates column numbers in array "a" where values need to be replaced with zero.
So the expected output is
[[1, 0, 3, 0],
 [1, 0, 1, 0],
 [0, 0, 2, 0]]
Any ideas on how I can accomplish this? Thanks.

Your question is:
I'm trying to replace values in specific columns with zero with python, and the column numbers are specified in another array.
This can be done like this:
a[:,b] = 0
Output:
[[1 0 3 0]
[1 0 1 0]
[0 0 2 0]]
The Integer array indexing section of Indexing on ndarrays in the numpy docs has some similar examples.
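Note that a[:, b] = 0 overwrites a in place. If the original array has to be kept, here is a sketch of two non-mutating variants (the names result, result2 and mask are just illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [1, 2, 1, 2],
              [0, 3, 2, 2]])
b = np.array([1, 3])

# Variant 1: copy first, then use the same integer array indexing
result = a.copy()
result[:, b] = 0

# Variant 2: boolean column mask; np.where broadcasts the (4,) mask across every row
mask = np.zeros(a.shape[1], dtype=bool)
mask[b] = True
result2 = np.where(mask, 0, a)

print(result2)
```

Both leave a untouched; the copy-then-assign version is usually the more readable of the two.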

A simple for loop will accomplish this.
for column in b:
    for row in range(len(a)):
        a[row][column] = 0
print(a)
[[1 0 3 0]
[1 0 1 0]
[0 0 2 0]]

Mapping a list of elements to a range of an element from another list to create unique matrices

I am trying to "map a list of elements to a range of an element from another list to create unique matrices." Let me explain with a drawing.
[Image: Kickstart-inspired question]
I hope that it makes sense.
This is inspired by the Google Kickstart competition, but it is not a problem actually posed in the contest.
I thought of this question myself and think it is worth exploring.
However, I am stuck and cannot make much progress.
Here is the code I have, which obviously is not a correct solution.
values = input("please enter your input: ")
values = values.split()
values = [int(i) for i in values]
>>> please enter your input: 2 4 3 1 0 0 1 0 1 1 0 0 1 1 0 6 4 1 0 0 0 1 0 0 1 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 0
rows_columns = []
matrix = []
for i in values:
    if i > 1:
        rows_columns[:1].append(i) # The "2" at the very beginning indicates how many matrices should be formed
    elif i <= 1:
        matrix.append(i)
rows_columns[:1]
>>> [4, 3, 6, 4]
matrix_all = []
for i in range(1, len(rows_columns)):
    matrix_sub = []
    for j in range(rows_columns[i]):
        matrix_sub.append(matrix[j])
    if matrix_sub not in matrix_all:
        matrix_all.append(matrix_sub)
>>> [[1, 0, 0, 1], [1, 0, 0], [1, 0, 0, 1, 0, 1], [1, 0, 0, 1]]
I wonder whether a nested loop is a good way to solve this. It is the best approach I could come up with in the last couple of hours. What I want as a final result looks like this:
[Image: final expected output]
Given one list that says how many rows and columns each matrix should have, and another with just enough elements to fill those matrices, how can I map (or create) the two matrices from the second list based on that dimension information?
I hope that is clear; let me know if it is not.
Thanks!

Without using NumPy, here is one working solution, based on the input in your code snippet and the expected result shown in your final expected output image:
values = [2, 4, 3, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 6, 4, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1,
          0, 1, 0, 1, 0, 1, 1, 1, 0]
v_idx = 1
# As per the example, the number of matrices desired is the first input list element.
# In the values list above we want 2 matrices, so the for loop below runs exactly 2 times.
for matrix_nr in range(values[0]):
    # The nr of rows and nr of columns are the next two elements in the values list
    nr_rows = values[v_idx]
    nr_cols = values[v_idx + 1]
    # Calculate the start index for the next matrix specifications
    new_idx = v_idx + 2 + (nr_rows * nr_cols)
    # Slice the values list to extract the values for the current matrix to format
    sub_elements = values[v_idx + 2:new_idx]
    matrix = []
    # Append elements to the matrix by slicing values according to nr_rows and nr_cols
    for r in range(nr_rows):
        start_idx = r * nr_cols
        end_idx = (r + 1) * nr_cols
        matrix.append(sub_elements[start_idx:end_idx])
    print(matrix)
    v_idx = new_idx
This gives the expected result:
[[1, 0, 0], [1, 0, 1], [1, 0, 0], [1, 1, 0]]
[[1, 0, 0, 0], [1, 0, 0, 1], [1, 1, 1, 1], [1, 0, 1, 0], [1, 0, 1, 0], [1, 1, 1, 0]]
As mentioned, NumPy could very likely make this a lot more efficient.
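A minimal NumPy sketch of that idea, assuming the same flat layout (matrix count first, then rows, columns and data for each matrix in turn). parse_matrices is a hypothetical helper name, and the input is not validated:

```python
import numpy as np


def parse_matrices(values):
    """Split [count, rows1, cols1, data1..., rows2, cols2, data2...] into 2-D arrays."""
    flat = np.asarray(values)
    idx = 1
    matrices = []
    for _ in range(int(flat[0])):
        nr_rows, nr_cols = int(flat[idx]), int(flat[idx + 1])
        start = idx + 2
        stop = start + nr_rows * nr_cols
        # reshape fills row by row (C order), just like the slicing loop above
        matrices.append(flat[start:stop].reshape(nr_rows, nr_cols))
        idx = stop
    return matrices


values = [2, 4, 3, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 6, 4, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1,
          0, 1, 0, 1, 0, 1, 1, 1, 0]
for m in parse_matrices(values):
    print(m)
```

The outer loop is still needed, because each header's position depends on where the previous matrix ended, but reshape replaces the inner row-slicing loop.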

Novice Python question: How to create crosstabs across multiple predictor variables and outcome variable

Using the following test data frame containing binary 0/1 variables:
test_df = pd.DataFrame([
[0, 0, 0, 1],
[1, 0, 1, 1],
[0, 0, 0, 1],
[1, 0, 1, 0],
[0, 0, 0, 0],
[1, 0, 1, 0]], columns=["y", "age_catg", "race_catg", "sex_catg"])
I'd like to use the pd.crosstab() function to create two-way tables of y vs. age_catg, race_catg, sex_catg in order to check for complete separation of y values among the predictor categories.
My actual data frame contains several thousand predictors, so rather than explicitly naming the age, race, and sex predictors I'd prefer to use column numbers. However, I'm still confused by row and column references in Python; for example, the following code doesn't work:
desc_tab = pd.crosstab(test_df[:,1], test_df[:,2:4])
desc_tab

To use integer indexes you need the iloc method:
pd.crosstab(test_df.iloc[:, 1], test_df.iloc[:, 2])
Output:
race_catg  0  1
age_catg
0          3  3
You can pass several arrays/series to either columns or rows if you put them in a list:
pd.crosstab(test_df.iloc[:, 1], [test_df.iloc[:, 2], test_df.iloc[:, 3]])
race_catg  0     1
sex_catg   0  1  0  1
age_catg
0          1  2  2  1
EDIT
If you want to define the columns in bulk by their indices (note that list is a built-in name in Python, so please don't use it as a variable name):
cols = [test_df.iloc[:, i] for i in [2, 3]]
pd.crosstab(test_df.iloc[:, 1], cols)
Output:
race_catg  0     1
sex_catg   0  1  0  1
age_catg
0          1  2  2  1
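Since the question mentions thousands of predictors, here is a sketch, assuming the outcome y is always the first column, that loops over the remaining columns by position, builds one crosstab against y per predictor, and collects the predictors whose table has an empty cell. An empty cell is only a warning sign of (quasi-)complete separation, not proof of it; a constant column such as age_catg, for instance, is not flagged even though it carries no information:

```python
import pandas as pd

test_df = pd.DataFrame([
    [0, 0, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0]], columns=["y", "age_catg", "race_catg", "sex_catg"])

y = test_df.iloc[:, 0]  # assumption: the outcome is the first column
tables = {}
flagged = []
for i in range(1, test_df.shape[1]):  # every other column, by position
    name = test_df.columns[i]
    tab = pd.crosstab(test_df.iloc[:, i], y)
    tables[name] = tab
    # An empty cell means some predictor level never occurs with some y value
    if (tab.to_numpy() == 0).any():
        flagged.append(name)

print(flagged)  # ['race_catg'], since race_catg is identical to y here
```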

find nonzero indices as an array

I know numpy.where gives a tuple of the array coordinates where the condition applies. But what if I want an array?
Assume the following 2D array:
a = np.array([[1, 1, 1, 1, 0],
              [1, 1, 1, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 1, 1, 1],
              [1, 0, 0, 1, 0]])
Now what I want is only the first occurrence of zero, but for every row, even where there is none (giving -1 in that case), something like indexOf() in Java. So the output looks like:
array([-1,2,2,1,0])
I need to cut pieces of an ndarray, and it would be much easier to reduce a dimension than to work with a tuple and try to regenerate the missing rows.

Is this what you are looking for?
import numpy as np
a = np.array([[1, 1, 1, 1, 0],
              [1, 1, 1, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 1, 1, 1],
              [1, 0, 0, 1, 0]])
np.argmax(a==0, axis=0) - ~np.any(a==0, axis=0)
Output:
array([-1, 2, 2, 1, 0], dtype=int64)
The idea here is that np.argmax returns the index of the first True value in each column of a == 0 (axis=0 works column-wise, which appears to match your expected output; if you actually want rows, use axis=1). Because np.argmax also returns 0 for columns with no match at all, subtracting ~np.any(a == 0, axis=0) (where True counts as 1) subtracts 1 from exactly the columns that contain no 0.
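The same trick can be wrapped in a small helper that works along either axis. This is a sketch; first_index_of is a hypothetical name, and it uses np.where instead of the boolean subtraction to make the -1 fallback explicit:

```python
import numpy as np


def first_index_of(arr, value, axis=0):
    """Index of the first element equal to value along axis, or -1 if absent."""
    mask = arr == value
    first = mask.argmax(axis=axis)  # argmax on booleans returns the first True
    return np.where(mask.any(axis=axis), first, -1)


a = np.array([[1, 1, 1, 1, 0],
              [1, 1, 1, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 1, 1, 1],
              [1, 0, 0, 1, 0]])

print(first_index_of(a, 0, axis=0))  # per column: -1, 2, 2, 1, 0
print(first_index_of(a, 0, axis=1))  # per row: 4, 3, 1, 1, 1
```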

Here is a less crafty but arguably easier-to-understand solution.
It first finds all matches in each column, then builds an array holding the first match, or -1 if a column has no match.
import numpy as np
a = np.array([[1, 1, 1, 1, 0],
              [1, 1, 1, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 1, 1, 1],
              [1, 0, 0, 1, 0]])
matches = [np.where(np.array(i) == 0)[0] for i in a.T]
np.array([i[0] if len(i) else -1 for i in matches])  # first occurrence, else -1
array([-1, 2, 2, 1, 0])

creating a sparse matrix from dictionary with tuple keys

I have the following problem and I was hoping that someone could help me out here:
I have a dictionary with tuples as keys d={(1,2):1, (2,1):1, (1,3):1, (3,1):1, (2,3):1, (3,2):1, (1,4):1, (4,1):1, (2,4):1, (4,2):1, (3,4):1, (4,3):1}
Now, I would like to create a matrix that corresponds to the tuples, with four rows and four columns. In my head, I imagine it looks like this (sorry if that seems a little messy):
   1 2 3 4
1: 0 1 1 1
2: 1 0 1 1
3: 1 1 0 1
4: 1 1 1 0
Here the four numbers at the top (1 2 3 4) represent the columns corresponding to the numbers in the tuples, and the same goes for the numbers on the left (from top to bottom 1 2 3 4) for the rows.
The output should look like this:
array([[0, 1, 1, 1],
       [1, 0, 1, 1],
       [1, 1, 0, 1],
       [1, 1, 1, 0]])
Unfortunately, I have absolutely no idea how to turn the sparse matrix in my head into proper code (Python 3). It seems I have reached the end of my wisdom, although I am sure there must be a simple answer to this.
If anybody could help me out here, I would really appreciate it.
Thanks in advance

A one-line solution, assuming you want to fill your array with 0's in the places not specified in the dictionary:
np.asarray([[(d[(x,y)] if (x,y) in d else 0) for y in range(1,5)] for x in range(1,5)])
Output:
array([[0, 1, 1, 1],
       [1, 0, 1, 1],
       [1, 1, 0, 1],
       [1, 1, 1, 0]])
A more laborious but maybe easier-to-understand approach would be to initialize an empty array of zeros, and then go through it and replace 0 with 1 at each position that appears in the dictionary.
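A sketch of that more explicit approach, assuming the tuple labels start at 1 and the matrix size can be taken from the largest label in the dictionary. It writes each stored value (here always 1) rather than a hard-coded 1, and also shows a vectorized fill that turns all keys into index arrays at once:

```python
import numpy as np

d = {(1, 2): 1, (2, 1): 1, (1, 3): 1, (3, 1): 1, (2, 3): 1, (3, 2): 1,
     (1, 4): 1, (4, 1): 1, (2, 4): 1, (4, 2): 1, (3, 4): 1, (4, 3): 1}

# Size inferred from the largest label (assumption: labels run from 1 to n)
n = max(max(key) for key in d)

# Loop version: start from zeros, write each dictionary entry
m = np.zeros((n, n), dtype=int)
for (row, col), value in d.items():
    m[row - 1, col - 1] = value  # labels start at 1, NumPy indices at 0

# Vectorized version: keys -> (12, 2) array -> shift to 0-based -> row and column index arrays
rows, cols = (np.array(list(d.keys())) - 1).T
m2 = np.zeros((n, n), dtype=int)
m2[rows, cols] = list(d.values())

print(m)
```

For a genuinely large, mostly empty matrix, scipy.sparse (for example coo_matrix, which is built from exactly this kind of row, column and value data) avoids storing all the zeros.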
