creating histograms in pandas - python

I'm trying to create a histogram based on the output of the following groupby, dfm.groupby(['ID', 'Readings', 'Condition']).size():
578871001 20110603 True 1
20110701 True 1
20110803 True 1
20110901 True 1
20110930 True 1
..
324461897 20130214 False 1
20130318 False 1
20130416 False 1
20130516 False 1
20130617 False 1
532674350 20110616 False 1
20110718 False 1
20110818 False 1
20110916 False 1
20111017 False 1
20111115 False 1
20111219 False 1
However, I'm trying to format the output by Condition and, for each number of Readings, count how many IDs have that many readings. Something like this:
True
# of Readings: # of ID
1 : 5
2 : 8
3 : 15
4 : 10
5 : 4
I've tried grouping just by ID and Readings, and transforming by Condition, but have not gotten very far.
Edit:
This is what the dataframe looked like before the groupby:
CustID Condtion Month Reading Consumption
0 108000601 True June 20110606 28320.0
1 108007000 True July 20110705 13760.0
2 108007000 True August 20110804 16240.0
3 108008000 True September 20110901 12560.0
4 108008000 True October 20111004 12400.0
5 108000601 False November 20111101 9440.0
6 108090000 False December 20111205 12160.0

Is this what you are trying to achieve with your groupby? I've used Counter to tally how many CustIDs share each reading count. For example, for Condtion = False there are two CustIDs with a single reading each, so the first row of the output is:
Condtion
False 1 2 # Two customers with one reading each.
Then, for Condtion = True, there is one customer with one reading (108000601) and two customers with two readings each. The output for this group is:
Condtion
True 1 1 # One customer with one reading.
2 2 # Two customers with two readings each.
from collections import Counter

# Count readings per customer within each Condtion.
gb = df.groupby(['Condtion', 'CustID'], as_index=False).Reading.count()
>>> gb
Condtion CustID Reading
0 False 108000601 1
1 False 108090000 1
2 True 108000601 1
3 True 108007000 2
4 True 108008000 2
>>> gb.groupby('Condtion').Reading.apply(lambda group: Counter(group))
Condtion
False 1 2
True 1 1
2 2
dtype: float64
Or, chained together as a single statement:
gb = (df
      .groupby(['Condtion', 'CustID'], as_index=False)['Reading']
      .count()
      .groupby('Condtion')['Reading']
      .apply(lambda group: Counter(group))
      )
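Since the original goal was a histogram, here is a minimal plotting sketch (assuming matplotlib is installed and gb is the Series produced above):
import matplotlib.pyplot as plt

# Unstack so each Condtion becomes a column, then draw a grouped bar chart
# of "# of Readings" against "# of CustID".
gb.unstack(level=0).plot(kind='bar')
plt.xlabel('# of Readings')
plt.ylabel('# of CustID')
plt.show()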

Related

Create a column Counting the consecutive True values on multi-index

Let df be a dataframe of boolean values with a two-column index (id, Week). I want to calculate the running count of consecutive True values for every id. For example, this is how it would look in this specific case:
value consecutive
id Week
1 1 True 1
1 2 True 2
1 3 False 0
1 4 True 1
1 5 True 2
2 1 False 0
2 2 False 0
2 3 True 1
This is my solution:
def func(id, week):
    M = df.loc[id][:week + 1]
    consecutive_list = list()
    S = 0
    for index, row in M.iterrows():
        if row['value']:
            S += 1
        else:
            S = 0
        consecutive_list.append(S)
    return consecutive_list[-1]
Then we generate the column "consecutive" as a list in the following way:
Consecutive_list = list()
for k in df.index:
    id = k[0]
    week = k[1]
    Consecutive_list.append(func(id, week))
df['consecutive'] = Consecutive_list
I would like to know if there is a more Pythonic way to do this.
EDIT: I wrote the "consecutive" column in order to show what I expect it to be.
If you are trying to add the consecutive column to the df, this should work:
df.assign(consecutive=df['value'].groupby(df['value'].diff().ne(0).cumsum()).cumsum())
Output:
value consecutive
1 a True 1
b True 2
2 a False 0
b True 1
3 a True 2
b False 0
4 a False 0
b True 1
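To see the grouping trick in isolation, here is a standalone sketch; s.ne(s.shift()) plays the same role as diff().ne(0) and sidesteps boolean-diff quirks:
import pandas as pd

s = pd.Series([True, True, False, True, True, False, False, True])
blocks = s.ne(s.shift()).cumsum()  # new block id each time the value changes
out = s.groupby(blocks).cumsum()   # running count of True inside each block; False rows stay 0
print(out.tolist())                # [1, 2, 0, 1, 2, 0, 0, 1]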

Python - Check values in consecutive columns(i.e across rows)

I have a 6x4 dataframe containing numerical values. I would like to check whether the value in the current column is the same as the next column's value, i.e. are there any equal values in consecutive columns per row? How do I record the result of this check in a new column?
import itertools as it
import pandas as pd

x_list = list(set(it.permutations([1, 1, 0, 0])))
x_df = pd.DataFrame(x_list)
x_df.columns = ['one', 'two', 'three', 'four']
If I understood you correctly:
x = x_df.diff(periods=-1, axis=1)         # each column minus the next column
x['four'] = x_df['four'] - x_df['three']  # the last column has no successor, so compare it with its predecessor
print(x == 0)
Input:
one two three four
0 1 0 1 0
1 1 1 0 0
2 1 0 0 1
3 0 1 1 0
4 0 1 0 1
5 0 0 1 1
Output:
one two three four
0 False False False False
1 True False True True
2 False True False False
3 False True False False
4 False False False False
5 True False True True
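If the check is wanted as a single new column, the per-column result above can be collapsed with any(axis=1); the column name has_equal_neighbors is just illustrative:
# True if any pair of consecutive columns in the row holds equal values.
x_df['has_equal_neighbors'] = (x == 0).any(axis=1)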

Count number of consecutive True in column, restart when False

I work with the following column in a pandas df:
A
True
True
True
False
True
True
I want to add a column B that counts the number of consecutive "True" values in A, restarting every time a "False" comes up. Desired output:
A B
True 1
True 2
True 3
False 0
True 1
True 2
Using cumsum, identify the blocks of rows where the value in column A stays True, then group column A by these blocks and take the cumulative sum within each block to assign the ordinal numbers:
df['B'] = df['A'].groupby((~df['A']).cumsum()).cumsum()
A B
0 True 1
1 True 2
2 True 3
3 False 0
4 True 1
5 True 2
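To see why this works, inspect the intermediate group ids for the sample column (a quick sketch, assuming the df above):
print((~df['A']).cumsum().tolist())
# [0, 0, 0, 1, 1, 1] -> each False opens a new group, and the cumsum of A restarts inside it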
Using a simple, native approach (for a small code sample it worked fine):
import pandas as pd

df = pd.DataFrame({'A': [True, False, True, True, True, False, True, True]})

class ToNums:
    counter = 0

    @staticmethod
    def convert(bool_val):
        if bool_val:
            ToNums.counter += 1
        else:
            ToNums.counter = 0
        return ToNums.counter

df['B'] = df.A.map(ToNums.convert)
df
A B
0 True 1
1 False 0
2 True 1
3 True 2
4 True 3
5 False 0
6 True 1
7 True 2
Here's an example:
v = 0
for i, val in enumerate(df['A']):
    if val:  # val is already a boolean, so don't compare it with the string "True"
        df.loc[i, "C"] = v = v + 1
    else:
        df.loc[i, "C"] = v = 0
df.head()
This will give the desired output:
A C
0 True 1
1 True 2
2 True 3
3 False 0
4 True 1
You can use a combination of groupby, cumsum, and cumcount:
df['B'] = (df.groupby((df['A'] &
                       ~df['A'].shift(1).fillna(False)  # row is True and the previous row is False
                       ).cumsum()  # make group ids: each True-run start opens a new group
                      )
           .cumcount().add(1)  # cumulative count within each group
           * df['A']  # multiply by 0 where A is False, 1 where it is True
           )
Output:
A B
0 True 1
1 True 2
2 True 3
3 False 0
4 True 1
5 True 2

Checking if values of a pandas Dataframe are between two lists. Adding a boolean column

I am trying to add a new column (True/False) to a pandas DataFrame, which reflects whether the value lies between two datapoints from another file.
I have two files which give the following info:
File A: (x)          File B: (y)
   time                 time_A  time_B
0     1              0       1       3
1     3              1       5       6
2     5              2       8      10
3     7
4     9
5    11
6    13
I tried to do it with the .map function; however, it gives True and False for each event, not one column.
x['Event'] = x['time'].map((lambda x: x < y['time_A']), (lambda x: x > y['time_B']))
This would be the expected result:
File A:
'time' 'Event'
0 1 True
1 3 True
2 5 True
3 7 False
4 9 True
5 11 False
6 13 False
However, what I get is something like this:
File A:
'time'
0 1 "0 True
1 True
2 True"
Name:1, dtype:bool"
2 3 "0 True
1 True
2 True
Name:1, dtype:bool"
This should do it:
(x.assign(key=1)
 .merge(y.assign(key=1), on='key')  # cross join via a constant key
 .drop('key', axis=1)
 .assign(Event=lambda v: (v['time_A'] <= v['time']) &
                         (v['time'] <= v['time_B']))
 .groupby('time', as_index=False)['Event']
 .any())
time Event
0 1 True
1 3 True
2 5 True
3 7 False
4 9 True
5 11 False
6 13 False
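In pandas 1.2+ the constant-key trick is no longer needed; the same logic can be written with a direct cross join (a sketch producing the same output as above):
(x.merge(y, how='cross')
 .assign(Event=lambda v: v['time'].between(v['time_A'], v['time_B']))
 .groupby('time', as_index=False)['Event']
 .any())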
Use pd.IntervalIndex here:
idx = pd.IntervalIndex.from_arrays(B['time_A'], B['time_B'], closed='both')
# idx -> IntervalIndex([[1, 3], [5, 6], [8, 10]], closed='both', dtype='interval[int64]')
A['Event'] = B.set_index(idx).reindex(A['time']).notna().all(1).to_numpy()
print(A)
time Event
0 1 True
1 3 True
2 5 True
3 7 False
4 9 True
5 11 False
6 13 False
One-liner:
A['Event'] = sum(A.time.between(b.time_A, b.time_B) for _, b in B.iterrows()) > 0
Explanation:
For each row b of dataframe B, A.time.between(b.time_A, b.time_B) returns a boolean Series indicating whether each time lies between time_A and time_B.
sum(list_of_boolean_series) > 0 then acts as an elementwise OR across those Series.
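For larger frames, the iterrows loop can be avoided entirely with NumPy broadcasting; a sketch, assuming both frames fit comfortably in memory:
t = A['time'].to_numpy()[:, None]  # column vector, compared against every interval at once
A['Event'] = ((t >= B['time_A'].to_numpy()) &
              (t <= B['time_B'].to_numpy())).any(axis=1)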

Adding a count to prior cell value in Pandas

In pandas I am looking to add a value in column 'B' depending on the boolean values in column 'A': when 'A' is True, start counting at 1; keep adding one on each new line while 'A' is False; when 'A' is True again, reset and count from 1. I managed to do this with a 'for' loop, but it is very time consuming. I am wondering if there is a more efficient solution?
The result should look like this:
Date A B
01.2010 False 0
02.2010 True 1
03.2010 False 2
04.2010 False 3
05.2010 True 1
06.2010 False 2
You can use cumsum with groupby and cumcount:
print(df)
Date A
0 1.201 False
1 1.201 True
2 1.201 False
3 2.201 True
4 3.201 False
5 4.201 False
6 5.201 True
7 6.201 False
roll = df.A.cumsum()
print(roll)
0 0
1 1
2 1
3 2
4 2
5 2
6 3
7 3
Name: A, dtype: int32
df['B'] = df.groupby(roll).cumcount() + 1
# if the first values are False, B should stay 0 there
df.loc[roll == 0, 'B'] = 0
print(df)
Date A B
0 1.201 False 0
1 1.201 True 1
2 1.201 False 2
3 2.201 True 1
4 3.201 False 2
5 4.201 False 3
6 5.201 True 1
7 6.201 False 2
Thanks, I got the solution from another post similar to this:
rolling_count = 0

def set_counter(val):
    # Note: this keeps module-level state, so reset rolling_count before re-running,
    # and rows before the first True come out as 1, 2, ... rather than 0.
    global rolling_count
    if not val:  # False keeps counting
        rolling_count += 1
    else:        # True resets the counter
        rolling_count = 1
    return rolling_count

df['B'] = df['A'].map(set_counter)
