Can I use a pivot table to create a heatmap table in pandas - python

I have this data frame and the result dataframe:
df= pd.DataFrame(
{
"I": ["I1", "I2", "I3", "I4", "I5", "I6", "I7"],
"A": [1, 1, 0, 0, 0, 0, 0],
"B": [0, 1, 1, 0, 0, 1, 1],
"C": [0, 0, 0, 0, 0, 1, 1],
"D": [1, 1, 1, 1, 1, 0, 1],
"E": [1, 0, 0, 1, 1, 0, 1],
"F": [0, 0, 0, 1, 1, 0, 0],
"G": [0, 0, 0, 0, 1, 0, 0],
"H": [1, 1, 0, 0, 0, 1, 1],
})
result=pd.DataFrame(
{
"I": ["A", "B", "C", "D", "E", "F", "G", "H"],
"A": [2, 1, 0, 2, 1, 0, 0, 2],
"B": [1, 4, 2, 3, 1, 0, 0, 3],
"C": [0, 2, 2, 1, 1, 0, 0, 2],
"D": [2, 3, 1, 6, 4, 2, 1, 3],
"E": [1, 1, 1, 4, 4, 2, 1, 2],
"F": [0, 0, 0, 2, 2, 2, 1, 0],
"G": [0, 0, 0, 1, 1, 1, 1, 0],
"H": [2, 3, 2, 3, 2, 0, 0, 4],
})
print('input dataframe')
print(df)
print('result dataframe')
print(result)
The result data frame is square (it has the same number of rows and columns), and the value in each cell is the number of rows that have a 1 in both columns.
For example, the cell at A:B is the number of rows with a 1 in column A and a 1 in column B. In this case the result is 1, since only row I2 has a 1 in both columns.
I can write a nested for loop to calculate these values, but I am looking for a better way to do so.
Can I use a pivot table for this?
My implementation, which doesn't use a pivot table, is as follows:
df=df.astype(bool)
r=pd.DataFrame(index=df.columns[1:], columns=df.columns[1:])
for c1 in df.columns[1:]:
    for c2 in df.columns[1:]:
        # rows where both columns are True
        tmp=df[c1] & df[c2]
        # a single .loc call avoids chained assignment
        r.loc[c1, c2]=tmp.sum()
print(r)
running this code generates:
A B C D E F G H
A 2 1 0 2 1 0 0 2
B 1 4 2 3 1 0 0 3
C 0 2 2 1 1 0 0 2
D 2 3 1 6 4 2 1 3
E 1 1 1 4 4 2 1 2
F 0 0 0 2 2 2 1 0
G 0 0 0 1 1 1 1 0
H 2 3 2 3 2 0 0 4

Yes, but you'd be better off with matrix multiplication:
df.iloc[:,1:].T @ df.iloc[:,1:]
Output:
A B C D E F G H
A 2 1 0 2 1 0 0 2
B 1 4 2 3 1 0 0 3
C 0 2 2 1 1 0 0 2
D 2 3 1 6 4 2 1 3
E 1 1 1 4 4 2 1 2
F 0 0 0 2 2 2 1 0
G 0 0 0 1 1 1 1 0
H 2 3 2 3 2 0 0 4
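To see why the matrix product works: with X holding only the 0/1 columns, entry (a, b) of X.T @ X is the sum over rows of X[a] * X[b], and that product is 1 only where both columns are 1. A minimal sketch on the question's data follows; the names X and co are illustrative, and set_index("I") is used here instead of iloc[:, 1:] to drop the label column.

```python
import pandas as pd

# Same input as in the question.
df = pd.DataFrame({
    "I": ["I1", "I2", "I3", "I4", "I5", "I6", "I7"],
    "A": [1, 1, 0, 0, 0, 0, 0],
    "B": [0, 1, 1, 0, 0, 1, 1],
    "C": [0, 0, 0, 0, 0, 1, 1],
    "D": [1, 1, 1, 1, 1, 0, 1],
    "E": [1, 0, 0, 1, 1, 0, 1],
    "F": [0, 0, 0, 1, 1, 0, 0],
    "G": [0, 0, 0, 0, 1, 0, 0],
    "H": [1, 1, 0, 0, 0, 1, 1],
})

X = df.set_index("I")  # keep only the 0/1 indicator columns
co = X.T @ X           # co.loc[a, b] = number of rows with 1 in both a and b
print(co)
```

The diagonal holds the number of ones in each column, which matches the expected result (for example 6 for D).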

Related

Replace value after continuous same value in column

I have a data frame like this:
df = pd.DataFrame({'A': [1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1]})
Suppose a run of n consecutive ones (here n = 8) is followed by a gap of zeros and then another run of n consecutive ones. If the gap is at most m zeros long (here m = 4), how can I replace those zeros with 1?
My ideal output would look like this:
df = pd.DataFrame({'A': [1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1], 'Fill_Gap': [1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0,0, 0, 0, 0, 0, 0, 1, 1]})
Only the four zeros at index 13-16 are replaced by 1, because they have 8 consecutive ones both before and after them.
Any advice would be much appreciated!
This will work for series of any length:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0,
                         0, 0, 0, 0, 0, 0, 1, 1]})
#Check for runs of 8 (1's): True where a row and the 7 rows before it are all 1
lst1=(df.shift(periods=0).A==1)
for x in range(1,8):
    lst1=lst1&(df.shift(periods=x).A==1)
#Check for runs of 4 (0's): True where a row and the 3 rows before it are all 0
lst0=(df.shift(periods=0).A==0)
for x in range(1,4):
    lst0=lst0&(df.shift(periods=x).A==0)
#Get index (assumes the default 0..n-1 index, so labels equal positions)
ones=np.array(list(lst1.index))[lst1]
#Fill Gaps
for x in range(1, len(ones)):
    if lst0.iloc[ones[x-1]:ones[x]].any():
        lst1.iloc[ones[x-1]:ones[x]]=True
#Apply to data frame
df.loc[lst1, 'A']=1
Output:
A
0 1
1 1
2 1
3 0
4 0
5 1
6 1
7 1
8 1
9 1
10 1
11 1
12 1
13 1
14 1
15 1
16 1
17 1
18 1
19 1
20 1
21 1
22 1
23 1
24 1
25 0
26 1
27 1
28 1
29 0
30 0
31 0
32 0
33 0
34 0
35 0
36 0
37 0
38 1
39 1
You can use a regex if you join the column into a string. Search for a run of 1 to 4 zeros with 0{1,4}, and use a lookbehind and a lookahead for 8 ones: (?<=1{8})...(?=1{8}). I don't think this is an efficient solution.
import re
df['fill_gap'] = df['A']
col = df.columns.get_loc('fill_gap')
for i in re.finditer(r'(?<=1{8})0{1,4}(?=1{8})', ''.join(df['A'].astype(str))):
    # i.span() gives the positional (start, end) of the matched zeros
    df.iloc[slice(*i.span()), col] = 1
df
Output
A fill_gap
0 1 1
1 1 1
2 1 1
3 0 0
4 0 0
5 1 1
6 1 1
7 1 1
8 1 1
9 1 1
10 1 1
11 1 1
12 1 1
13 0 1
14 0 1
15 0 1
16 0 1
17 1 1
18 1 1
19 1 1
20 1 1
21 1 1
22 1 1
23 1 1
24 1 1
25 0 0
26 1 1
27 1 1
28 1 1
29 0 0
30 0 0
31 0 0
32 0 0
33 0 0
34 0 0
35 0 0
36 0 0
37 0 0
38 1 1
39 1 1
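A run-length formulation is another option that avoids both the explicit shifts and the string round trip. The sketch below is not taken from either answer; the function name fill_short_gaps and the parameters n and m are hypothetical, and it assumes the rule is "fill a run of at most m zeros when the runs of ones directly before and after it are each at least n long".

```python
import pandas as pd

def fill_short_gaps(s, n=8, m=4):
    """Hypothetical helper: fill zero-runs of length <= m that sit between
    two one-runs of length >= n."""
    # Label each run of equal consecutive values with an increasing id.
    run_id = (s != s.shift()).cumsum()
    # One row per run: its value and its length.
    runs = pd.DataFrame({
        "val": s.groupby(run_id).first(),
        "len": s.groupby(run_id).size(),
    })
    prev_ok = (runs["val"].shift() == 1) & (runs["len"].shift() >= n)
    next_ok = (runs["val"].shift(-1) == 1) & (runs["len"].shift(-1) >= n)
    fill_run = (runs["val"] == 0) & (runs["len"] <= m) & prev_ok & next_ok
    out = s.copy()
    # Broadcast the per-run decision back to every element of that run.
    out[run_id.map(fill_run).to_numpy()] = 1
    return out

s = pd.Series([1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1,
               1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
filled = fill_short_gaps(s)
print(filled.tolist())
```

On the question's data only the zeros at index 13-16 change; the single zero at index 25 stays because the run of ones after it is only 3 long.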

Explode column into columns

I have a pandas train df which looks like this:
image
1 [[0, 0, 0], [1, 0, 1], [0, 1, 1]]
2 [[1, 1, 1], [0, 0, 1], [0, 0, 1]]
2 [[0, 0, 1], [0, 1, 1], [1, 1, 1]]
Is there any way to "explode" it, but into columns instead of rows?
1 2 3 4 5 6 7 8 9
1 0, 0, 0, 1, 0, 1, 0, 1, 1
2 1, 1, 1, 0, 0, 1, 0, 0, 1
2 0, 0, 1, 0, 1, 1, 1, 1, 1
Apply np.vstack to the Series of lists of lists, then reshape:
pd.DataFrame(np.vstack(df['image']).reshape(len(df), -1))
0 1 2 3 4 5 6 7 8
0 0 0 0 1 0 1 0 1 1
1 1 1 1 0 0 1 0 0 1
2 0 0 1 0 1 1 1 1 1
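For a self-contained version, the sketch below rebuilds the three sample rows and flattens each 3x3 image into 9 columns. It assumes every image has the same shape; it uses np.array on the list of images rather than np.vstack (either gives the same result after the reshape), and the 1-based column labels are added only to match the layout asked for in the question.

```python
import numpy as np
import pandas as pd

# The three sample rows from the question (index 1, 2, 2).
df = pd.DataFrame(
    {"image": [
        [[0, 0, 0], [1, 0, 1], [0, 1, 1]],
        [[1, 1, 1], [0, 0, 1], [0, 0, 1]],
        [[0, 0, 1], [0, 1, 1], [1, 1, 1]],
    ]},
    index=[1, 2, 2],
)

# Stack into a (rows, 3, 3) array, then flatten each image to one row.
flat = np.array(df["image"].tolist()).reshape(len(df), -1)
wide = pd.DataFrame(flat, index=df.index,
                    columns=range(1, flat.shape[1] + 1))
print(wide)
```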

How to load Pandas dataframe into Surprise dataset?

I am building a recommender system based on user's ratings for 11 different items.
I started with a dictionary (user_dict) of user ratings:
{'U1': [3, 4, 2, 5, 0, 4, 1, 3, 0, 0, 4],
'U2': [2, 3, 1, 0, 3, 0, 2, 0, 0, 3, 0],
'U3': [0, 4, 0, 5, 0, 4, 0, 3, 0, 2, 4],
'U4': [0, 0, 2, 1, 4, 3, 2, 0, 0, 2, 0],
'U5': [0, 0, 0, 5, 0, 4, 0, 3, 0, 0, 4],
'U6': [2, 3, 4, 0, 3, 0, 3, 0, 3, 4, 0],
'U7': [0, 4, 3, 5, 0, 5, 0, 0, 0, 0, 4],
'U8': [4, 3, 0, 3, 4, 2, 2, 0, 2, 3, 2],
'U9': [0, 2, 0, 3, 1, 0, 1, 0, 0, 2, 0],
'U10': [0, 3, 0, 4, 3, 3, 0, 3, 0, 4, 4],
'U11': [2, 2, 1, 2, 1, 0, 2, 0, 1, 0, 2],
'U12': [0, 4, 4, 5, 0, 0, 0, 3, 0, 4, 5],
'U13': [3, 3, 0, 2, 2, 3, 2, 0, 2, 0, 3],
'U14': [0, 3, 4, 5, 0, 5, 0, 0, 0, 4, 0],
'U15': [2, 0, 0, 3, 0, 2, 2, 3, 0, 0, 3],
'U16': [4, 4, 0, 4, 3, 4, 0, 3, 0, 3, 0],
'U17': [0, 2, 0, 3, 1, 0, 2, 0, 1, 0, 3],
'U18': [2, 3, 1, 0, 3, 2, 3, 2, 0, 2, 0],
'U19': [0, 5, 0, 4, 0, 3, 0, 4, 0, 0, 5],
'U20': [0, 0, 3, 0, 3, 0, 4, 0, 2, 0, 0],
'U21': [3, 0, 2, 4, 2, 3, 0, 4, 2, 3, 3],
'U22': [4, 4, 0, 5, 3, 5, 0, 4, 0, 3, 0],
'U23': [3, 0, 0, 0, 3, 0, 2, 0, 0, 4, 0],
'U24': [4, 0, 3, 0, 3, 0, 3, 0, 0, 2, 2],
'U25': [0, 5, 0, 3, 3, 4, 0, 3, 3, 4, 4]}
I then loaded the dictionary into a Pandas dataframe by using this code:
df= pd.DataFrame(user_dict)
userRatings_df = df.T
print(userRatings_df)
This prints the data like so:
0 1 2 3 4 5 6 7 8 9 10
U1 3 4 2 5 0 4 1 3 0 0 4
U2 2 3 1 0 3 0 2 0 0 3 0
U3 0 4 0 5 0 4 0 3 0 2 4
U4 0 0 2 1 4 3 2 0 0 2 0
U5 0 0 0 5 0 4 0 3 0 0 4
U6 2 3 4 0 3 0 3 0 3 4 0
U7 0 4 3 5 0 5 0 0 0 0 4
U8 4 3 0 3 4 2 2 0 2 3 2
U9 0 2 0 3 1 0 1 0 0 2 0
U10 0 3 0 4 3 3 0 3 0 4 4
U11 2 2 1 2 1 0 2 0 1 0 2
U12 0 4 4 5 0 0 0 3 0 4 5
U13 3 3 0 2 2 3 2 0 2 0 3
U14 0 3 4 5 0 5 0 0 0 4 0
U15 2 0 0 3 0 2 2 3 0 0 3
U16 4 4 0 4 3 4 0 3 0 3 0
U17 0 2 0 3 1 0 2 0 1 0 3
U18 2 3 1 0 3 2 3 2 0 2 0
U19 0 5 0 4 0 3 0 4 0 0 5
U20 0 0 3 0 3 0 4 0 2 0 0
U21 3 0 2 4 2 3 0 4 2 3 3
U22 4 4 0 5 3 5 0 4 0 3 0
U23 3 0 0 0 3 0 2 0 0 4 0
U24 4 0 3 0 3 0 3 0 0 2 2
U25 0 5 0 3 3 4 0 3 3 4 4
When I attempt to load it into a Surprise dataset, I run this code:
reader = Reader(rating_scale=(1,5))
userRatings_data=Dataset.load_from_df(userRatings_df[[1,2,3,4,5,6,7,8,9,10]],
reader)
I get this error:
ValueError: too many values to unpack (expected 3)
Can anyone help me to fix this error?
The problem comes from the way you convert your dictionary into a pandas dataframe. For Dataset to be able to process a pandas dataframe, it must have exactly three columns: the first is the user ID, the second is the item ID, and the third is the actual rating.
This is how I would build a dataframe that works with Dataset:
DF = pd.DataFrame()
for key in user_dict.keys():
    df = pd.DataFrame(columns=['User', 'Item', 'Rating'])
    df['Rating'] = pd.Series(user_dict[key])
    df['Item'] = pd.DataFrame(df.index)
    df['User'] = key
    DF = pd.concat([DF, df], axis = 0)
DF = DF.reset_index(drop=True)
Note that I take every key from the dictionary, which is essentially a user ID, and turn it into a pandas column, alongside the ratings and the ratings' indices, which serve as the raw item IDs. For every key I build a temporary dataframe, and these are stacked on top of each other in the final, main dataframe.
Hopefully this helps.
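The same long (User, Item, Rating) layout can also be produced without a loop by melting the wide frame. This is a sketch rather than the answer's method: it uses only the first two users from the question to stay short, and the filtered frame rated rests on the assumption that a 0 means "not rated" (the question's rating scale is 1 to 5). Either frame should then be usable as Dataset.load_from_df(long_df[['User', 'Item', 'Rating']], reader).

```python
import pandas as pd

# First two users from the question's user_dict.
user_dict = {
    "U1": [3, 4, 2, 5, 0, 4, 1, 3, 0, 0, 4],
    "U2": [2, 3, 1, 0, 3, 0, 2, 0, 0, 3, 0],
}

# One row per user, one column per item position (0..10).
wide = pd.DataFrame.from_dict(user_dict, orient="index")

# Melt to one row per (user, item) pair.
long_df = (
    wide.rename_axis("User")
        .reset_index()
        .melt(id_vars="User", var_name="Item", value_name="Rating")
)

# Assumption: a rating of 0 means the user did not rate the item.
rated = long_df[long_df["Rating"] > 0].reset_index(drop=True)
print(rated.head())
```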

percentage histogram with matplotlib, One input to the axis is a combination of two columns

I have a data frame with a class value I am trying to predict. I am interested in label 1.
I am trying to determine if turn plays a role for a given key value.
For a given key value of say 1 and a turn number of 1, what percentage of turns have a class value of 1?
For example for the given data
key=1,turn=1,8/11 have a class label 1
key=1,turn=2,5/6 have a class label 1
How can I plot a percentage histogram for this type of data?
I know how to make a normal histogram using matplotlib:
import matplotlib
matplotlib.use('PS')
import matplotlib.pyplot as plt
plt.hist()
but what values would I use to get the percentage histogram?
Sample columns from the dataframe
key=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
turn=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
class=[0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
Since the concepts from the linked question are apparently not what you need, an alternative would be to produce pie charts as shown below.
key=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 ]
turn=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
clas=[0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df=pd.DataFrame({"key":key, "turn":turn, "class":clas})
# the default aggregation is the mean, i.e. the fraction of rows with class 1
piv = pd.pivot_table(df, values="class", index="key", columns="turn")
print(piv)
fig, axes = plt.subplots(ncols=4, nrows=2)
for i in range(2):
    axes[i,0].set_ylabel("key {}".format(i+1))
    for j in range(4):
        pie = axes[i,j].pie([piv.values[i,j],1.-piv.values[i,j]], autopct="%.1f%%")
        axes[i,j].set_aspect("equal")
        axes[0,j].set_title("turn {}".format(j+1))
plt.legend(pie[0],["class 1","class 0"], bbox_to_anchor=(1,0.5), loc="right",
           bbox_transform=plt.gcf().transFigure)
plt.show()
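If a bar chart is closer to the "percentage histogram" asked for, the same per-group fractions can be plotted as grouped bars. This is a sketch, not part of the answer above: the name pct is illustrative, and the Agg backend is selected only so the example runs without a display (drop that line and call plt.show() for interactive use).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question (66 rows).
key = [1] * 37 + [2] * 29
turn = [1] * 11 + [2] * 6 + [3] * 14 + [4] * 6 + [1] + [2] * 6 + [3] * 11 + [4] * 11
clas = [0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1,
        0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1,
        0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
df = pd.DataFrame({"key": key, "turn": turn, "class": clas})

# Mean of a 0/1 column is the fraction of ones; scale to percent.
pct = df.groupby(["key", "turn"])["class"].mean().unstack("turn") * 100

# One group of bars per turn, one bar per key.
ax = pct.T.plot(kind="bar")
ax.set_xlabel("turn")
ax.set_ylabel("% of rows with class 1")
ax.set_ylim(0, 100)
```

For key=1 this reproduces the fractions quoted in the question: 8/11 for turn 1 and 5/6 for turn 2.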

Displaying python 2d list without commas, brackets, etc. and newline after every row

I'm trying to display a python 2D list without the commas, brackets, etc., and I'd like to display a new line after every 'row' of the list is over.
This is my attempt at doing so:
ogm = repr(ogm).replace(',', ' ')
ogm = repr(ogm).replace('[', ' ')
ogm = repr(ogm).replace("'", ' ')
ogm = repr(ogm).replace('"', ' ')
print repr(ogm).replace(']', ' ')
This is the input:
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 0, 0, 1, 1, 1, 0], [0, 0, 0, 1, 1, 0, 1, 1, 1, 1], [0, 0, 1, 1, 0, 0, 1, 1, 1, 1], [0, 1, 0, 0, 0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 0, 0, 1, 1, 0, 0], [1, 0, 1, 1, 1, 1, 0, 0, 0, 0]]
This is the output:
"' 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 1 1 0 1 1 1 1 0 0 1 1 0 0 1 1 1 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 1 0 0 1 0 1 1 1 1 0 0 0 0 '"
I'm encountering two problems:
There are stray " and ' which I can't get rid of
I have no idea how to do a newline
Simple way:
for row in list2D:
    print(" ".join(map(str,row)))
Maybe join is appropriate for you:
print("\n".join(" ".join(str(el) for el in row) for row in ogm))
0 0 0 0 0 0 0 0 0 0
1 1 0 1 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0
0 1 1 0 0 0 0 0 1 1
0 0 0 0 0 0 1 1 1 0
0 0 0 1 1 0 1 1 1 1
0 0 1 1 0 0 1 1 1 1
0 1 0 0 0 0 0 1 1 0
0 0 0 0 0 0 1 1 0 0
1 0 1 1 1 1 0 0 0 0
print("\n".join(" ".join(map(str, line)) for line in ogm))
If you want the rows and columns transposed:
print("\n".join(" ".join(map(str, line)) for line in zip(*ogm)))
for row in list2D:
    print(*row)
To make the display even more readable you can use tabs or fill the cells with spaces to align the columns.
def printMatrix(matrix):
    for lst in matrix:
        for element in lst:
            print(element, end="\t")
        print("")
It will display
6       8       99
999     7       99
3       7       99
instead of
6 8 99
999 7 99
3 7 99
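Tabs only line up while every cell is shorter than a tab stop. A padding-based variant is sketched below; the function name format_matrix is hypothetical, and right-alignment is a choice made here rather than something the answer specifies.

```python
def format_matrix(matrix):
    """Return the matrix as text with every cell right-aligned to the
    width of the longest cell."""
    width = max(len(str(x)) for row in matrix for x in row)
    return "\n".join(
        " ".join(str(x).rjust(width) for x in row) for row in matrix
    )

text = format_matrix([[6, 8, 99], [999, 7, 99], [3, 7, 99]])
print(text)
```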
ogm = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 0, 0, 1, 1, 1, 0], [0, 0, 0, 1, 1, 0, 1, 1, 1, 1], [0, 0, 1, 1, 0, 0, 1, 1, 1, 1], [0, 1, 0, 0, 0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 0, 0, 1, 1, 0, 0], [1, 0, 1, 1, 1, 1, 0, 0, 0, 0]]
s1 = str(ogm)
s2 = s1.replace('], [','\n')
s3 = s2.replace('[','')
s4 = s3.replace(']','')
s5= s4.replace(',','')
print(s5)
By the way, the " in the output is actually two ' characters with no gap between them.
I have been learning Python for a week. You have all given some excellent solutions; the replace-based version above is how I did it, and it works too. :)
