I want to check whether a column contains at least one run of 5 consecutive days with nonzero values. In the following example, this would be false for column '1' (result=0) and true for column '2' (result=1). This code does the job:
import pandas as pd
days=pd.date_range('1900-1-1',periods=14,freq='D')
df = pd.DataFrame({'1': [0,1,0,1,1,0,1,0,1,1,0,1,1,0], '2':[0,1,0,1,1,0,1,0,1,1,1,1,1,0]},index=days)
col='2' #Select any column (i.e., the result for col1 should be 0 and for col2 should be 1)
nday=5 #Number of consecutive days with nonzero values
result=0 #If nonzero values lasted for 5 consecutive days, then result=1
for index, row in df.iterrows():
    if row[col] == 0:  # Restart counting if nonzero values are not continuous for five days
        nday = 5
    elif row[col] == 1:  # Check for continuous nonzero values
        nday -= 1
        if nday == 0:
            result = 1
            break
print(result)
Is there an easier way than this long code?
The code seems good in terms of complexity and the number of lines. Just a few suggestions, see below.
def has_continuous(col, n_days=5) -> bool:
    days_left = n_days
    for value in col:  # iterate over the values of the column (a Series)
        if not value:  # Restart counting
            days_left = n_days
        else:
            # I assume that all values are non-negative. If it is not zero, it is positive
            days_left -= 1
            if not days_left:
                return True
    return False
result = has_continuous(df['2'], 5)
If you are always checking for 0, you can use rolling with min:
col='2'
nday=5
print (df[col].rolling(nday).min().ge(1).any())
# True
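The rolling-min check above relies on every nonzero value being at least 1. Here is a minimal sketch for the case where the column may hold any numeric values, including fractions or negatives: turn the column into a 0/1 "is nonzero" flag, then look for a window whose sum equals the window length. The helper name `has_nonzero_run` is hypothetical and not part of the original answers.

```python
import pandas as pd

def has_nonzero_run(series: pd.Series, n_days: int = 5) -> bool:
    """Return True if `series` has at least `n_days` consecutive nonzero values."""
    # 1 where the value is nonzero, 0 otherwise
    nonzero = series.ne(0).astype(int)
    # A window of n_days rows sums to n_days only if every row in it is nonzero
    return bool(nonzero.rolling(n_days).sum().eq(n_days).any())

days = pd.date_range('1900-1-1', periods=14, freq='D')
df = pd.DataFrame({'1': [0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0],
                   '2': [0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]}, index=days)

print(has_nonzero_run(df['1']))  # False
print(has_nonzero_run(df['2']))  # True
```

The first `n_days - 1` windows are incomplete, so their rolling sum is NaN and never equals `n_days`.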
I have a DataFrame similar to the table below, and I want to find a specific value inside it, for example the value 2. For this I use np.where. In the next step I want to check whether there is a next value and, if so, whether it is smaller, bigger, or equal.
My current solution would be to print out the np.where result and hardcode the index with [-x] for each value after the 2, so I am looking for a smarter solution for cases with, for example, 100 values.
The output should be: 2 is bigger, 2 is smaller, 2 is the last number.
Value
1
2
1
2
3
2
If I understand your question correctly you could try this code
import pandas as pd
frame = pd.DataFrame([1, 2, 1, 2, 3, 2], columns=['num'])
def find_values(df, number):
    # Note: this relies on the default integer index (0, 1, 2, ...)
    for index, row in df.iterrows():
        if row['num'] == number:
            if len(df) == index+1:
                print(number, 'is the last number')
            else:
                next_num = df.loc[index+1, 'num']
                if next_num > number:
                    print(number, '<', next_num)
                elif next_num < number:
                    print(number, '>', next_num)
                else:
                    print(number, '==', next_num)
find_values(frame, 2)
output:
2 > 1
2 < 3
2 is the last number
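The loop above looks up the next row with `index+1`, which only works with the default integer index. As a hedged alternative, the sketch below compares by position instead of by label and returns the messages rather than printing them. The function name `describe_next` and the letter index are assumptions made for illustration.

```python
import pandas as pd

def describe_next(series: pd.Series, number) -> list:
    """For every occurrence of `number`, describe how the following value compares."""
    values = series.tolist()  # positional access, independent of the index labels
    messages = []
    for pos, current in enumerate(values):
        if current != number:
            continue
        if pos == len(values) - 1:
            messages.append(f"{number} is the last number")
            continue
        following = values[pos + 1]
        if following > number:
            messages.append(f"{number} < {following}")
        elif following < number:
            messages.append(f"{number} > {following}")
        else:
            messages.append(f"{number} == {following}")
    return messages

# A non-default index, to show that the lookup does not depend on the labels
frame = pd.DataFrame({'num': [1, 2, 1, 2, 3, 2]}, index=list('abcdef'))
print(describe_next(frame['num'], 2))
# ['2 > 1', '2 < 3', '2 is the last number']
```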
Below is the code to find the missing numbers, but I need to select only the first 3 consecutive numbers from missings.
array=[0,1,4,5,9,10]
start=0
end=15
missings=[]
for i in range(start,end):
    if i not in array:
        missings.append(i)
Desired output: [6,7,8]
Here you go:
array=[0,1,4,5,9,10]
start=0
end=15
missings=[]
for i in range(start,end-2):  # stop at end-2 so that i+2 stays inside range(start,end)
    if i not in array:
        if i+1 not in array:
            if i+2 not in array:
                missings.append(i)
                missings.append(i+1)
                missings.append(i+2)
                break
Sort the list in ascending order, then compare the values in the array with their neighbor to determine if there is a gap >3.
def find_missing(arr):
    sorted_list = sorted(arr)
    # Set our lowest value for comparing
    curr_value = sorted_list[0]
    for i in range(1,len(sorted_list)):
        # Compare the previous value to the next value to determine if there is a difference of at least 4 (6-3 = 3, but then only the numbers 4 and 5 are missing)
        if (sorted_list[i] - curr_value) > 3:
            # Return on the first 3 consecutive missing numbers
            return [curr_value+1, curr_value+2, curr_value+3]
        curr_value = sorted_list[i]
    # Return an empty array if there are not 3 consecutive missing numbers
    return []
This function works based on the length of the array and the largest number. If there is a need for a specified end value in case all elements in the array do not have a gap of three except for the largest element and the end value, it can be passed as a parameter with some minor modifications.
def find_missing(arr, start_val=0, end_val=0):
    # sorted() returns a new list, so the original list is not altered
    sorted_list = sorted(arr)
    curr_value = sorted_list[0]
    last_value = sorted_list[-1]
    # If start_val lies at least 3 below the lowest number, start_val itself is the first missing number
    if (curr_value - start_val) >= 3:
        return [start_val, start_val+1, start_val+2]
    for i in range(1,len(sorted_list)):
        # Compare the previous value to the next value to determine if there is a difference of at least 4 (6-3 = 3, but then only the numbers 4 and 5 are missing)
        if (sorted_list[i] - curr_value) > 3:
            # Return on the first 3 consecutive missing numbers
            return [curr_value+1, curr_value+2, curr_value+3]
        curr_value = sorted_list[i]
    # If an end_val is set and there is a gap between the largest number and end_val
    # (end_val is treated as exclusive here, like the end of range())
    if end_val > last_value and (end_val - last_value) > 3:
        return [last_value+1, last_value+2, last_value+3]
    else:
        # Return an empty array if there are not 3 consecutive missing numbers
        return []
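The gap-based functions above hardcode a run of three. As a rough sketch of a more general approach, and not the method of either answer, the code below walks `range(start, end)` once and keeps a running list of consecutive missing numbers. The name `first_missing_run` and the `run_length` parameter are hypothetical, and `end` is assumed to be exclusive, as in the question's loop.

```python
def first_missing_run(values, start, end, run_length=3):
    """Return the first `run_length` consecutive numbers in range(start, end)
    that are absent from `values`, or [] if there is no such run."""
    present = set(values)  # set membership tests are O(1)
    run = []
    for i in range(start, end):
        if i in present:
            run = []  # the run of missing numbers is broken
        else:
            run.append(i)
            if len(run) == run_length:
                return run
    return []

array = [0, 1, 4, 5, 9, 10]
print(first_missing_run(array, 0, 15))  # [6, 7, 8]
```

Here 2 and 3 are missing as well, but that run is only two long, so the first run of three is 6, 7, 8.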
I am trying to count the number of times a value is greater than the previous value by 2.
I have tried
df['new'] = df.ms.gt(df.ms.shift())
and other similar lines but none give me what I need.
Might be less than elegant, but:
import numpy as np
df['new_ms'] = df['ms'].shift(1)  # value of the previous row
df['new'] = np.where((df['ms'] - df['new_ms']) >= 2, 1, 0)
df['new'].sum()
Are you looking for diff? Find the difference between consecutive values and check that their difference is greater than, or equal to 2, then count rows that are True:
(df.ms.diff() >= 2).sum()
If you need to check if the difference is exactly 2, then change >= to ==:
(df.ms.diff() == 2).sum()
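To make the two variants concrete, here is a small self-contained sketch. The sample values are made up for illustration and are not the asker's data; the column name `ms` follows the question.

```python
import pandas as pd

# Illustrative data only
df = pd.DataFrame({'ms': [11, 13, 15, 35, 55, 66, 68]})

# Difference to the previous row: NaN, 2, 2, 20, 20, 11, 2
steps = df['ms'].diff()

# Comparisons with NaN are False, so the first row is never counted
at_least_two = int((steps >= 2).sum())
exactly_two = int((steps == 2).sum())

print(at_least_two)  # 6
print(exactly_two)   # 3
```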
Since you need a specific difference, gt won't work. You could simply subtract and see if the difference is bigger than 2:
(df.ms - df.ms.shift() > 2).sum()
edit: changed to get you your answer instead of creating a new column. sum works here because it converts booleans to 1 and 0.
Your question was ambiguous, but since you wanted a program that counts the number of times a value is greater than the previous value by 2 in pandas, here it is:
import pandas as pd
lst2 = [11, 13, 15, 35, 55, 66, 68] #list of int
dataframe = pd.DataFrame(list(lst2)) #converting into dataframe
count = 0 #we will count how many time n+1 is greater than n by 2
d = dataframe[0][0] #storing first index value to d
for i in range(len(dataframe)):
    #print(dataframe[0][i])
    d = d+2 #incrementing d by 2 to check if it is equal to the next index value
    if(d == dataframe[0][i]):
        count = count+1 #if n is less than n+1 by 2 then keep counting
    d = dataframe[0][i] #update index
print("total count ",count) #printing how many times n was less than n+1 by 2
I've got a pandas df, which has one column with either positive or negative float values:
snapshot 0 (column name)
2018-06-21 00:00:00 -60.18
2018-06-21 00:00:15 43.78
2018-06-21 00:00:30 -22.08
Now I want to append the positive values to a list that's called:
excessSupply=[]
and the negative values to:
excessLoad=[]
by
for row in self.dfenergyBalance:
    if self.dfenergyBalance['0'] < 0:
        self.excessLoad.append(self.dfenergyBalance['0'])
    else:
        self.excessLoad.append(0)
(For excessSupply, the if condition is self.dfenergyBalance > 0.)
The outcome is a KeyError for the column name '0'.
In my opinion no loops (slow) are necessary. Also, it seems the column name is the number 0, not the string '0':
mask = dfenergyBalance[0] < 0
# Negative values go to excessLoad, the remaining values to excessSupply
excessLoad = dfenergyBalance.loc[mask, 0].tolist()
excessSupply = dfenergyBalance.loc[~mask, 0].tolist()
print (excessLoad)
[-60.18, -22.08]
print (excessSupply)
[43.78]
EDIT:
For a list containing only 0, with one entry per positive value:
excessLoad = [0] * (~mask).sum()
print (excessLoad)
[0]
If you need only one list, with the positive values replaced by 0:
import numpy as np
L = np.where(mask, dfenergyBalance[0], 0).tolist()
print (L)
[-60.18, 0.0, -22.08]
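For reference, here is a self-contained sketch that rebuilds the three rows from the question and produces both lists with the same length as the frame, using `Series.clip` instead of `np.where`. It assumes the column label is the integer 0, which would explain the KeyError for '0'. The snake_case names `excess_load` and `excess_supply` are illustrative.

```python
import pandas as pd

index = pd.to_datetime(['2018-06-21 00:00:00',
                        '2018-06-21 00:00:15',
                        '2018-06-21 00:00:30'])
dfenergyBalance = pd.DataFrame({0: [-60.18, 43.78, -22.08]}, index=index)

# The column label is the integer 0, so dfenergyBalance['0'] raises a KeyError
print(dfenergyBalance.columns.tolist())  # [0]

balance = dfenergyBalance[0]
# Keep negative values, replace everything above 0 with 0
excess_load = balance.clip(upper=0).tolist()
# Keep positive values, replace everything below 0 with 0
excess_supply = balance.clip(lower=0).tolist()

print(excess_load)    # [-60.18, 0.0, -22.08]
print(excess_supply)  # [0.0, 43.78, 0.0]
```

Keeping both lists the same length as the frame means each entry still lines up with its timestamp.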
def tableCheck(elev, n, m):
    tablePosCount = 0
    rowPosCount = 0
    for r in range(1, n):
        for c in range(1, m):
            if elev[r][c] > 0:
                tablePosCount = tablePosCount + 1
                rowPosCount = rowPosCount + 1
                print 'Number of positive entries in row ', r , ' : ', rowPosCount
    print 'Number of positive entries in table :', tablePosCount
    return tablePosCount
elev = [[1,0,-1,-3,2], [0,0,1,-4,-1], [-2,2,8,1,1]]
tableCheck(elev, 3, 5)
I'm having some difficulty getting this code to run properly. Can anyone tell me why it might be giving me this output?
Number of positive entries in row 1 : 1
Number of positive entries in row 2 : 2
Number of positive entries in row 2 : 3
Number of positive entries in row 2 : 4
Number of positive entries in row 2 : 5
Number of positive entries in table : 5
There are three things in your code that I suspect are errors, though since you don't describe the behavior you expect, it's possible that one or more of these is working as intended.
The first issue is that you print out the "row" number every time that you see a new value that is greater than 0. You probably want to unindent the print 'Number of positive entries in row ' line by two levels (to be even with the inner for loop).
The second issue is that you don't reset the count for each row, so the print statement I suggested you move will not give the right output after the first row. You probably want to move the rowPosCount = 0 line inside the outer loop.
The final issue is that you're skipping the first row and the first value of each later row. This is because your ranges go from 1 to n or m. Python indexing starts at 0, and ranges exclude their upper bound. You probably want for r in range(n) and for c in range(m), though iterating on the table values themselves (or an enumeration of them) would be more Pythonic.
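Putting the three suggestions together, a corrected version might look like the sketch below. It is written for Python 3 (`print` as a function, unlike the Python 2 code in the question) and iterates over the rows directly, so the `n` and `m` parameters are no longer needed. The name `table_check` and the extra per-row list in the return value are assumptions added for illustration.

```python
def table_check(elev):
    """Count positive entries per row and in the whole table."""
    table_pos_count = 0
    row_counts = []
    for r, row in enumerate(elev):  # starts at row 0, covers every row
        # The count starts fresh for each row
        row_pos_count = sum(1 for value in row if value > 0)
        # Printed once per row, after the whole row has been scanned
        print('Number of positive entries in row', r, ':', row_pos_count)
        row_counts.append(row_pos_count)
        table_pos_count += row_pos_count
    print('Number of positive entries in table :', table_pos_count)
    return table_pos_count, row_counts

elev = [[1, 0, -1, -3, 2], [0, 0, 1, -4, -1], [-2, 2, 8, 1, 1]]
total, per_row = table_check(elev)
# total is 7 and per_row is [2, 1, 4]
```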