Numpy Array to Pandas Data Frame of X Y Coordinates - python

I have a two dimensional numpy array:
arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
How would I go about converting this into a pandas DataFrame that has the x coordinate, y coordinate, and the corresponding array value at that index, like this:
x y val
0 0 1
0 1 4
0 2 7
1 0 2
1 1 5
1 2 8
...

With stack and reset_index:
df = pd.DataFrame(arr).stack().rename_axis(['y', 'x']).reset_index(name='val')
df
Out:
y x val
0 0 0 1
1 0 1 2
2 0 2 3
3 1 0 4
4 1 1 5
5 1 2 6
6 2 0 7
7 2 1 8
8 2 2 9
If ordering is important:
df.sort_values(['x', 'y'])[['x', 'y', 'val']].reset_index(drop=True)
Out:
x y val
0 0 0 1
1 0 1 4
2 0 2 7
3 1 0 2
4 1 1 5
5 1 2 8
6 2 0 3
7 2 1 6
8 2 2 9
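For reuse, the same idea fits in a small helper (a minimal sketch; the name to_xy_frame is just for illustration):
import numpy as np
import pandas as pd

def to_xy_frame(arr):
    # x is the column index, y is the row index, so val == arr[y, x]
    df = pd.DataFrame(arr).stack().rename_axis(['y', 'x']).reset_index(name='val')
    return df.sort_values(['x', 'y'])[['x', 'y', 'val']].reset_index(drop=True)

to_xy_frame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))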

Here's a NumPy method -
>>> arr
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> shp = arr.shape
>>> r,c = np.indices(shp)
>>> pd.DataFrame(np.c_[r.ravel(), c.ravel(), arr.ravel('F')],
...              columns=['x', 'y', 'val'])
x y val
0 0 0 1
1 0 1 4
2 0 2 7
3 1 0 2
4 1 1 5
5 1 2 8
6 2 0 3
7 2 1 6
8 2 2 9
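A variant that avoids the Fortran-order ravel is to index the transposed array, so plain C-order ravels already line up with the x/y labels (a sketch using the same arr; it works for non-square arrays too):
>>> x, y = np.indices(arr.T.shape)
>>> pd.DataFrame({'x': x.ravel(), 'y': y.ravel(), 'val': arr.T.ravel()})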

Related

Pandas display all index labels in jupyter notebook despite repetition

When displaying a DataFrame in a Jupyter notebook, the index is shown in a hierarchical ("sparsified") way, so repeated labels are not shown in the following row. E.g. a DataFrame with a MultiIndex with the following labels
[1, 1, 1, 1]
[1, 1, 0, 1]
will be displayed as
1 1 1 1 ...
0 1 ...
Can I change this behaviour so that all index values are shown despite repetition? Like this:
1 1 1 1 ...
1 1 0 1 ...
?
import pandas as pd
import numpy as np
import itertools
N_t = 5
N_e = 2
classes = tuple(list(itertools.product([0, 1], repeat=N_e)))
N_c = len(classes)
noise = np.random.randint(0, 10, size=(N_c, N_t))
df = pd.DataFrame(noise, index=classes)
df
0 1 2 3 4
0 0 5 9 4 1 2
1 2 2 7 9 9
1 0 1 7 3 6 9
1 4 9 8 2 9
# should be shown as
0 1 2 3 4
0 0 5 9 4 1 2
0 1 2 2 7 9 9
1 0 1 7 3 6 9
1 1 4 9 8 2 9
Use -
with pd.option_context('display.multi_sparse', False):
    print(df)
Output
0 1 2 3 4
0 0 8 1 4 0 2
0 1 0 1 7 4 7
1 0 9 6 5 2 0
1 1 2 2 7 2 7
And globally:
pd.options.display.multi_sparse = False
Or, thanks @Kyle:
print(df.to_string(sparsify=False))
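If the option was changed globally, it can be put back to the default later with pd.reset_option:
pd.reset_option('display.multi_sparse')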

How do you add an array to each previous row in pandas?

If I have an array [1, 2, 3, 4, 5] and a pandas DataFrame
df = pd.DataFrame([[1,1,1,1,1], [0,0,0,0,0], [0,0,0,0,0], [0,0,0,0,0]])
0 1 2 3 4
0 1 1 1 1 1
1 0 0 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0
How do I iterate through the Pandas DataFrame adding my array to each previous row?
The expected result would be:
0 1 2 3 4
0 1 1 1 1 1
1 2 3 4 5 6
2 3 5 7 9 11
3 4 7 10 13 16
The array is added n times to the nth row. You can build those increments with np.arange(len(df))[:,None] * a, which broadcasts the shape (4, 1) row counter against the length-5 array, and then add the first row of df:
df
# 0 1 2 3 4
#0 1 1 1 1 1
#1 0 0 0 0 0
#2 0 0 0 0 0
#3 0 0 0 0 0
a = np.array([1, 2, 3, 4, 5])
np.arange(len(df))[:,None] * a
# array([[ 0,  0,  0,  0,  0],
#        [ 1,  2,  3,  4,  5],
#        [ 2,  4,  6,  8, 10],
#        [ 3,  6,  9, 12, 15]])
df[:] = df.iloc[0].values + np.arange(len(df))[:,None] * a
df
# 0 1 2 3 4
#0 1 1 1 1 1
#1 2 3 4 5 6
#2 3 5 7 9 11
#3 4 7 10 13 16
df = pd.DataFrame([
[1,1,1],
[0,0,0],
[0,0,0],
])
s = pd.Series([1,2,3])
# add s to every row, then restore the original first row before the cumulative sum
result = df.add(s, axis=1)
result.iloc[0] = df.iloc[0]
result.cumsum()
Or if you want a one-liner:
pd.concat([df[:1], df[1:].add(s, axis=1)]).cumsum()
Either way, result:
0 1 2
0 1 1 1
1 2 3 4
2 3 5 7
Using cumsum and assignment, with l = [1, 2, 3, 4, 5]:
df[1:] = (df + l).cumsum()[:-1].values
0 1 2 3 4
0 1 1 1 1 1
1 2 3 4 5 6
2 3 5 7 9 11
3 4 7 10 13 16
Or using concat:
pd.concat((df[:1], (df+l).cumsum()[:-1]))
0 1 2 3 4
0 1 1 1 1 1
0 2 3 4 5 6
1 3 5 7 9 11
2 4 7 10 13 16
After cumsum, you can shift and add back to the original df:
a = [1,2,3,4,5]
updated = df.add(pd.Series(a), axis=1).cumsum().shift().fillna(0)
df.add(updated)
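Note that df.add returns a new DataFrame rather than modifying df in place, so assign it back (df = df.add(updated)) if you want to keep it; the result should match the other answers:
# 0 1 2 3 4
#0 1 1 1 1 1
#1 2 3 4 5 6
#2 3 5 7 9 11
#3 4 7 10 13 16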

How to reshape multiple time-series signals for use with sns.tsplot?

I'm trying to reshape data that looks like this:
t y0 y1 y2
0 0 -1 0 1
1 1 0 1 2
2 2 1 2 3
3 3 2 3 4
4 4 3 4 5
into something like this:
t trial signal value
0 0 0 y -1
1 0 1 y 0
2 0 2 y 1
3 1 0 y 0
4 1 1 y 1
5 1 2 y 2
6 2 0 y 1
7 2 1 y 2
8 2 2 y 3
9 3 0 y 2
10 3 1 y 3
11 3 2 y 4
12 4 0 y 3
13 4 1 y 4
14 4 2 y 5
so that I can feed it into sns.tsplot.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
fig = plt.figure()
num_points = 5
# Create some dummy line signals and assemble a data frame
t = np.arange(num_points)
y0 = t - 1
y1 = t
y2 = t + 1
df = pd.DataFrame(np.vstack((t, y0, y1, y2)).transpose(), columns=['t', 'y0', 'y1', 'y2'])
print(df)
# Do some magic transformations
df = pd.melt(df, id_vars=['t'])
print(df)
# Plot the time-series data
sns.tsplot(time="t", value="value", unit="trial", condition="signal", data=df, ci=[68, 95])
plt.savefig("dummy.png")
plt.close()
I'm hoping to achieve line plots like the examples shown in these references:
https://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.tsplot.html
http://pandas.pydata.org/pandas-docs/stable/reshaping.html
I think you can use melt for reshaping, extract the first and second characters of the variable names with the str accessor, and finally sort_values and reorder the columns:
df1 = pd.melt(df, id_vars=['t'])
#create helper Series
variable = df1['variable']
#extract second char, convert to int
df1['trial'] = variable.str[1].astype(int)
#extract first char
df1['signal'] = variable.str[0]
#sort values by column t, reset index
df1 = df1.sort_values('t').reset_index(drop=True)
#reorder columns
df1 = df1[['t','trial','signal','value']]
print(df1)
t trial signal value
0 0 0 y -1
1 0 1 y 0
2 0 2 y 1
3 1 0 y 0
4 1 1 y 1
5 1 2 y 2
6 2 0 y 1
7 2 1 y 2
8 2 2 y 3
9 3 0 y 2
10 3 1 y 3
11 3 2 y 4
12 4 0 y 3
13 4 1 y 4
14 4 2 y 5
Another solution, if all values in column signal are only y:
# strip the y prefix from the column names; the first column name (t) stays the same
df.columns = df.columns[:1].tolist() + [int(col[1]) for col in df.columns[1:]]
print(df)
t 0 1 2
0 0 -1 0 1
1 1 0 1 2
2 2 1 2 3
3 3 2 3 4
4 4 3 4 5
df1 = pd.melt(df, id_vars=['t'], var_name='trial')
# all values in the signal column are y
df1['signal'] = 'y'
#sort values by column t, reset index
df1 = df1.sort_values('t').reset_index(drop=True)
#reorder columns
df1 = df1[['t','trial','signal','value']]
print(df1)
t trial signal value
0 0 0 y -1
1 0 1 y 0
2 0 2 y 1
3 1 0 y 0
4 1 1 y 1
5 1 2 y 2
6 2 0 y 1
7 2 1 y 2
8 2 2 y 3
9 3 0 y 2
10 3 1 y 3
11 3 2 y 4
12 4 0 y 3
13 4 1 y 4
14 4 2 y 5
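Note that sns.tsplot was later deprecated and eventually removed from seaborn; with the long-format frame df1 built above, sns.lineplot covers the same use case (a minimal sketch assuming a reasonably recent seaborn version):
sns.lineplot(data=df1, x='t', y='value', hue='signal')
plt.savefig("dummy.png")
By default lineplot aggregates the repeated trial observations at each t and draws a confidence band, much like tsplot did with unit and condition.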

Pandas dataframe: how to group by values in a column and create new columns out of grouped values

I have a dataframe with two columns:
x y
0 1
1 1
2 2
0 5
1 6
2 8
0 1
1 8
2 4
0 1
1 7
2 3
What I want is:
x val1 val2 val3 val4
0 1 5 1 1
1 1 6 8 7
2 2 8 4 3
I know that the values in column x repeat in the same order N times.
You could use groupby/cumcount to assign column numbers and then call pivot:
import pandas as pd
df = pd.DataFrame({'x': [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
'y': [1, 1, 2, 5, 6, 8, 1, 8, 4, 1, 7, 3]})
df['columns'] = df.groupby('x')['y'].cumcount()
# x y columns
# 0 0 1 0
# 1 1 1 0
# 2 2 2 0
# 3 0 5 1
# 4 1 6 1
# 5 2 8 1
# 6 0 1 2
# 7 1 8 2
# 8 2 4 2
# 9 0 1 3
# 10 1 7 3
# 11 2 3 3
result = df.pivot(index='x', columns='columns')
print(result)
yields
y
columns 0 1 2 3
x
0 1 5 1 1
1 1 6 8 7
2 2 8 4 3
Or, if you can really rely on the values in x being repeated in order N times,
N = 3
result = pd.DataFrame(df['y'].values.reshape(-1, N).T)
yields
0 1 2 3
0 1 5 1 1
1 1 6 8 7
2 2 8 4 3
Using reshape is quicker than calling groupby/cumcount and pivot, but it
is less robust since it relies on the values in y appearing in the right order.
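If you want the flat val1 ... val4 column names from the desired output, the pivoted frame's MultiIndex columns can be flattened afterwards (a sketch applied to the result of the pivot-based approach; the val prefix just mirrors the asker's naming):
result.columns = ['val{}'.format(i + 1) for i in range(result.shape[1])]
print(result.reset_index())
This should print the x, val1, val2, val3, val4 layout shown in the question.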

Merge dataframes on index

So I have two dataframes, in pandas:
x = pd.DataFrame([[1,2],[3,4]])
>>> x
0 1
0 1 2
1 3 4
y = pd.DataFrame([[7,8],[5,6]])
>>> y
0 1
0 7 8
1 5 6
Clearly they are the same size. Now, it seems you can do merges and joins on a selected column, but I can't seem to do it on an index. I want the outcome to be:
0 1 2 3
0 7 8 1 2
1 5 6 3 4
How about:
>>> x = pd.DataFrame([[1,2],[3,4]])
>>> y = pd.DataFrame([[7,8],[5,6]])
>>> df = pd.concat([y,x],axis=1,ignore_index=True)
>>> df
0 1 2 3
0 7 8 1 2
1 5 6 3 4
[2 rows x 4 columns]
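If you specifically want a merge keyed on the index rather than concat, merge accepts left_index/right_index directly (a sketch; the suffixes are only needed because both frames share the column labels 0 and 1):
>>> y.merge(x, left_index=True, right_index=True, suffixes=('_y', '_x'))
This lines the rows up by index and gives the same 2 x 4 result, with suffixed column names instead of 0-3.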
