Pandas display all index labels in jupyter notebook despite repetition - python

When displaying a DataFrame in a Jupyter notebook, a MultiIndex is rendered hierarchically ("sparsified"), so that repeated labels are not shown in the following row. E.g. a DataFrame with a MultiIndex with the labels
[1, 1, 1, 1]
[1, 1, 0, 1]
will be displayed as
1 1 1 1 ...
    0 1 ...
Can I change this behaviour so that all index values are shown despite repetition? Like this:
1 1 1 1 ...
1 1 0 1 ...
?
import pandas as pd
import numpy as np
import itertools
N_t = 5
N_e = 2
classes = tuple(list(itertools.product([0, 1], repeat=N_e)))
N_c = len(classes)
noise = np.random.randint(0, 10, size=(N_c, N_t))
df = pd.DataFrame(noise, index=classes)
df
     0  1  2  3  4
0 0  5  9  4  1  2
  1  2  2  7  9  9
1 0  1  7  3  6  9
  1  4  9  8  2  9
# should be shown as
     0  1  2  3  4
0 0  5  9  4  1  2
0 1  2  2  7  9  9
1 0  1  7  3  6  9
1 1  4  9  8  2  9

Use -
with pd.option_context('display.multi_sparse', False):
print (df)
Output
     0  1  2  3  4
0 0  8  1  4  0  2
0 1  0  1  7  4  7
1 0  9  6  5  2  0
1 1  2  2  7  2  7
And globally:
pd.options.display.multi_sparse = False
or (thanks @Kyle):
print(df.to_string(sparsify=False))
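The same option also controls the HTML rendering in the notebook, so for the Jupyter case you can set it around the display call. A minimal sketch (display is IPython's helper; df is the frame built in the question):
from IPython.display import display

with pd.option_context('display.multi_sparse', False):
    display(df)  # the HTML table now repeats the index labels on every row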


Fill gaps between 1's in Pandas dataframe column with increment values that reset when next 1 is reached

Apparently this is a more complicated problem than I thought.
All I want to do is fill the zeros with increments of +1 until the next 1.
My dataset is 1M+ rows, so I'm trying to vectorize this operation if possible.
Here's a sample column:
# Define the input dataframe
import pandas as pd
df = pd.DataFrame({'col': [1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0]})
0 1
1 0
2 1
3 0
4 1
5 1
6 0
7 0
8 0
9 0
10 1
11 0
12 1
13 1
14 0
Goal Result:
0 1
1 2
2 1
3 2
4 1
5 1
6 2
7 3
8 4
9 5
10 1
11 2
12 1
13 1
14 2
I've tried a number of different methods with ffill() and cumsum(), but the issue with cumsum() tends to be that it doesn't reset the increment.
Group by the cumulative sum of column col and apply cumcount: every 1 starts a new group, and cumcount numbers the rows within each group from 0, hence the + 1:
df['col'] = df.groupby(df['col'].cumsum())['col'].cumcount() + 1
col
0 1
1 2
2 1
3 2
4 1
5 1
6 2
7 3
8 4
9 5
10 1
11 2
12 1
13 1
14 2
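To see why the grouping works, it can help to inspect the intermediate group labels. A small self-contained sketch (rebuilding the sample column, since df['col'] is overwritten above):
import pandas as pd

s = pd.Series([1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0])
groups = s.cumsum()  # each 1 bumps the counter: 1 1 2 2 3 4 4 4 4 4 5 5 6 7 7
result = s.groupby(groups).cumcount() + 1  # position within each group, starting at 1
print(pd.DataFrame({'col': s, 'group': groups, 'result': result}))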
Temporarily replace 0 with 1, create a group for each real 1 and its consecutive 0s, then apply a cumulative sum within each group:
df['col2'] = df['col'].replace(0, 1).groupby(df['col'].cumsum()).cumsum()
print(df)
# Output
col col2
0 1 1
1 0 2
2 1 1
3 0 2
4 1 1
5 1 1
6 0 2
7 0 3
8 0 4
9 0 5
10 1 1
11 0 2
12 1 1
13 1 1
14 0 2

Python Pandas: populate a column until a different value appears?

I am working on an algo trading project using the pandas and numpy libraries and would like to achieve the following result:
Current output:
1
0
0
2
0
2
0
0
4
0
0
0
5
desired output:
1
1
1
2
2
2
2
2
4
4
4
4
5
How do I go about this?
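For reference, a minimal frame matching the values above (the column name col1 is an assumption, chosen to match the answer below):
import pandas as pd

df = pd.DataFrame({'col1': [1, 0, 0, 2, 0, 2, 0, 0, 4, 0, 0, 0, 5]})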
Replace 0 with NA, then forward-fill:
df['col1'] = df['col1'].replace(0, pd.NA).ffill()
print(df)
# Output
col1
0 1
1 1
2 1
3 2
4 2
5 2
6 2
7 2
8 4
9 4
10 4
11 4
12 5
You can try the method argument of pandas.DataFrame.replace
df['col'] = df['col'].replace(to_replace=0, method='ffill')
print(df)
col
0 1
1 1
2 1
3 2
4 2
5 2
6 2
7 2
8 4
9 4
10 4
11 4
12 5
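Note that the method argument of replace is deprecated in recent pandas versions (2.1 and later), so the replace(0, pd.NA).ffill() approach above is the more future-proof option.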

How to add a DataFrame to some columns of another DataFrame

I want to add a DataFrame a (containing a load profile) to some of the columns of another DataFrame b (also containing one load profile per column), so that some columns (load profiles) of b are overlaid with the load profile of a.
So let's say my DataFrames look like:
a:
P[kW]
0 0
1 0
2 0
3 8
4 8
5 0
b:
P1[kW] P2[kW] ... Pn[kW]
0 2 2 2
1 3 3 3
2 3 3 3
3 4 4 4
4 2 2 2
5 2 2 2
Now I want to overlay some columns of b:
b.iloc[:, [1]] += a.iloc[:, 0]
I would expect this:
b:
P1[kW] P2[kW] ... Pn[kW]
0 2 2 2
1 3 3 3
2 3 3 3
3 4 12 4
4 2 10 2
5 2 2 2
but what I actually get:
b:
P1[kW] P2[kW] ... Pn[kW]
0 2 nan 2
1 3 nan 3
2 3 nan 3
3 4 nan 4
4 2 nan 2
5 2 nan 2
That's not exactly what my code and data look like, but the principle is the same as in this abstract example.
Any guesses what the problem could be?
Many thanks for any help in advance!
EDIT:
I actually have to overlay more than one column. Another example:
import pandas as pd

load = [0, 0, 0, 0, 0, 0, 0]
data = pd.DataFrame(load)
for i in range(1, 10):
    data[i] = data[0]  # duplicate the base column, so data has columns 0..9
data
overlay = pd.DataFrame([0,0,0,0,6,6,0])
overlay
data.iloc[:, [1,2,4,5,7,8]] += overlay.iloc[:, 0]
data
WHAT??! The result is completely crazy. Columns 1 and 2 aren't changed at all. Columns 4 and 5 are changed, but in every row. Columns 7 and 8 are NaN. What am I missing?
This is what I would expect the result to look like: the overlay values added only in rows 4 and 5 of the selected columns.
Do not pass the column index 1 of DataFrame b as a list; pass it as a scalar. With b.iloc[:, [1]] the result is a DataFrame, and DataFrame + Series aligns the Series against the column labels, which yields NaN everywhere; with b.iloc[:, 1] you get a Series, and the addition aligns on the row index as intended.
Code
b.iloc[:, 1] += a.iloc[:, 0]
b
Output
P1[kW] P2[kW] Pn[kW]
0 2 2 2
1 3 3 3
2 3 3 3
3 4 12 4
4 2 10 2
5 2 2 2
Edit
It seems this is what we are looking for, i.e. adding the overlay df to certain columns of the data df.
Two Options
Option 1
cols=[1,2,4,5,7,8]
data[cols] = data[cols] + overlay.values
data
Option 2, if we want to use iloc
cols=[1,2,4,5,7,8]
data[cols] = data.iloc[:,cols] + overlay.iloc[:].values
data
Output
0 1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0
4 0 6 6 0 6 6 0 6 6 0
5 0 6 6 0 6 6 0 6 6 0
6 0 0 0 0 0 0 0 0 0 0
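As an aside (not part of the original answer), the same row-wise alignment can be expressed with DataFrame.add and axis=0, which avoids dropping down to .values; a minimal sketch using the data and overlay frames from the question:
cols = [1, 2, 4, 5, 7, 8]
data[cols] = data[cols].add(overlay[0], axis=0)  # align overlay's single column on the row index
data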

Pandas increment integer at certain index points?

Say I have a list of index points where I want to increase an integer value by 1,
for example Int64Index([5, 10]) (not necessarily evenly spaced like that), and I have a DataFrame like
val new_col
0 0.729726564 1
1 0.067509062 1
2 0.943927114 1
3 0.037718436 1
4 0.512142908 1
5 0.767198655 2
6 0.202230787 2
7 0.343767479 2
8 0.540026305 2
9 0.256425022 2
10 0.403845023 3
11 0.444475008 3
12 0.464677745 3
I want to create new_col as an integer column that increases by one at the above index rows.
Edit:
import pandas as pd
import numpy as np
df = pd.DataFrame({'val': np.random.rand(14)})
df['new_col'] = 1
How to increase the value of new_col by one at each index point (5, 10)?
I see from your comment that the positions are arbitrary, so you can space them as you wish with bins.
example:
bins = [-1,3,5,12,14] #space as you wish
labels = [1,2,3,4] #labels or in your case values that you want
df['new_col'] = pd.cut(list(df.index.values), bins=bins, labels=labels)
val new_col
0 0.509742 1
1 0.081701 1
2 0.990583 1
3 0.813398 1
4 0.905022 2
5 0.951973 2
6 0.702487 3
7 0.916432 3
8 0.647568 3
9 0.955188 3
10 0.875067 3
11 0.284496 3
12 0.393931 3
13 0.341115 4
Use numpy.split with enumerate:
import pandas as pd
import numpy as np

indices = [5, 10]
df['add_col'] = pd.concat([s + n for n, s in enumerate(np.split(df['new_col'], indices))])
print(df)
Output:
val new_col add_col
0 0.953431 1 1
1 0.929134 1 1
2 0.548343 1 1
3 0.080713 1 1
4 0.465212 1 1
5 0.290549 1 2
6 0.570886 1 2
7 0.232350 1 2
8 0.036968 1 2
9 0.455084 1 2
10 0.385177 1 3
11 0.811477 1 3
12 0.802502 1 3
13 0.001847 1 3
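Another way to get the same stepping counter, shown here only as a sketch and not taken from the answers above, is to mark the breakpoints and take a cumulative sum:
import pandas as pd
import numpy as np

df = pd.DataFrame({'val': np.random.rand(14)})
bump = pd.Series(0, index=df.index)
bump.loc[[5, 10]] = 1              # 1 at every index where the counter should step
df['new_col'] = 1 + bump.cumsum()  # 1,1,1,1,1,2,2,2,2,2,3,3,3,3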

Numpy Array to Pandas Data Frame of X Y Coordinates

I have a two-dimensional numpy array:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
How would I go about converting this into a pandas DataFrame that has the x coordinate, the y coordinate, and the corresponding array value at that index, like this:
x y val
0 0 1
0 1 4
0 2 7
1 0 2
1 1 5
1 2 8
...
With stack and reset index:
df = pd.DataFrame(arr).stack().rename_axis(['y', 'x']).reset_index(name='val')
df
Out:
y x val
0 0 0 1
1 0 1 2
2 0 2 3
3 1 0 4
4 1 1 5
5 1 2 6
6 2 0 7
7 2 1 8
8 2 2 9
If ordering is important:
df.sort_values(['x', 'y'])[['x', 'y', 'val']].reset_index(drop=True)
Out:
x y val
0 0 0 1
1 0 1 4
2 0 2 7
3 1 0 2
4 1 1 5
5 1 2 8
6 2 0 3
7 2 1 6
8 2 2 9
Here's a NumPy method -
>>> arr
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
>>> shp = arr.shape
>>> r,c = np.indices(shp)
>>> pd.DataFrame(np.c_[r.ravel(), c.ravel(), arr.ravel('F')],
...              columns=['x', 'y', 'val'])
x y val
0 0 0 1
1 0 1 4
2 0 2 7
3 1 0 2
4 1 1 5
5 1 2 8
6 2 0 3
7 2 1 6
8 2 2 9
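A slightly more compact NumPy variant, shown as a sketch equivalent to the answers above (it builds the frame in row-major order and then sorts into the question's x-then-y ordering):
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
y, x = np.indices(arr.shape)               # y = row index, x = column index
df = pd.DataFrame({'x': x.ravel(), 'y': y.ravel(), 'val': arr.ravel()})
df = df.sort_values(['x', 'y']).reset_index(drop=True)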
