Use pandas to count values greater than the previous value - python

I am trying to count the number of times a value is greater than the previous value by 2.
I have tried
df['new'] = df.ms.gt(df.ms.shift())
and other similar lines but none give me what I need.

It might be less than elegant, but:
import numpy as np
df['prev_ms'] = df['ms'].shift()  # previous row's value (NaN in the first row)
df['new'] = np.where((df['ms'] - df['prev_ms']) >= 2, 1, 0)
df['new'].sum()

Are you looking for diff? Find the difference between consecutive values, check whether it is greater than or equal to 2, then count the rows that are True:
(df.ms.diff() >= 2).sum()
If you need to check if the difference is exactly 2, then change >= to ==:
(df.ms.diff() == 2).sum()
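A small self-contained run of the diff approach. The question did not include its data, so the values below are made up; only the column name ms comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name "ms" comes from the question
df = pd.DataFrame({"ms": [11, 13, 15, 35, 55, 66, 68]})

step = df.ms.diff()  # current value minus previous value; NaN for the first row

# NaN compares as False, so the first row is never counted
at_least_2 = int((step >= 2).sum())
exactly_2 = int((step == 2).sum())

print(at_least_2, exactly_2)
```

Here the steps are 2, 2, 20, 20, 11 and 2, so six of them are at least 2 and three are exactly 2.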

Since you need a specific difference, comparing the series directly with gt won't work. You could simply subtract the shifted series and check whether the difference is greater than 2:
(df.ms - df.ms.shift() > 2).sum()
Edit: changed to give you the count directly instead of creating a new column. sum works here because it treats True as 1 and False as 0.
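The choice of operator matters when a step is exactly 2. A sketch comparing the strict comparison above with a non-strict one, written with the Series methods sub, gt and ge (the data is hypothetical):

```python
import pandas as pd

s = pd.Series([11, 13, 15, 35, 55, 66, 68], name="ms")  # hypothetical data

step = s.sub(s.shift())  # same as s - s.shift()

strictly_more = int(step.gt(2).sum())  # only the steps of 20, 20 and 11
two_or_more = int(step.ge(2).sum())    # also the three steps of exactly 2

print(strictly_more, two_or_more)
```

If "greater than the previous value by 2" should include a step of exactly 2, use ge (>=) rather than gt (>).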

Your question was ambiguous, but since you wanted a program that counts the number of times a value is greater than the previous value by exactly 2 in pandas, here it is:
import pandas as pd
lst2 = [11, 13, 15, 35, 55, 66, 68]  # list of ints
dataframe = pd.DataFrame(lst2)  # converting into a single-column dataframe (column label 0)
count = 0  # we will count how many times value n+1 is greater than value n by 2
d = dataframe[0][0]  # storing the first value in d
for i in range(len(dataframe)):
    d = d + 2  # incrementing d by 2 to check if it is equal to the current value
    if d == dataframe[0][i]:
        count = count + 1  # the current value is exactly 2 more than the previous one
    d = dataframe[0][i]  # move d on to the current value
print("total count ", count)  # printing how many times n was less than n+1 by 2
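The same "exactly 2 more than the previous value" count can be written without tracking d, by pairing each value with its predecessor. A sketch in plain Python using zip (Python 3.10+ also offers itertools.pairwise for the same pairing):

```python
lst2 = [11, 13, 15, 35, 55, 66, 68]

# zip(lst2, lst2[1:]) yields (previous, current) pairs: (11, 13), (13, 15), ...
count = sum(1 for prev, cur in zip(lst2, lst2[1:]) if cur - prev == 2)

print("total count", count)
```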

Related

Fastest way to count event occurrences in a Pandas dataframe?

I have a Pandas dataframe with ~100,000,000 rows and 3 columns (Names str, Time int, and Values float), which I compiled from ~500 CSV files using glob.glob(path + '/*.csv').
Given that two different names alternate, the job is to go through the data and count the number of times a value associated with a specific name ABC deviates from its preceding value by ±100, given that the previous 50 values for that name did not deviate by more than ±10.
I initially solved it with a for loop function that iterates through each row, as shown below. It checks for the correct name, then checks the stability of the previous values of that name, and finally adds one to the count if there is a large enough deviation.
import numpy as np
# assumed: names and values hold the Names and Values columns
count = 0
stabilityTime = 0
i = 0
if names[0] == "ABC":
    j = values[0]
    stability = np.full(50, values[0])
else:
    j = values[1]
    stability = np.full(50, values[1])
for name in names:
    value = values[i]
    if name == "ABC":
        if j - 10 < value < j + 10:
            stabilityTime += 1
        if stabilityTime >= 50 and np.std(stability) < 10:
            if value > j + 100 or value < j - 100:
                stabilityTime = 0
                count += 1
        stability = np.roll(stability, -1)
        stability[-1] = value
        j = value
    i += 1
Naturally, this process takes a very long computing time. I have looked at NumPy vectorization, but do not see how I can apply it in this case. Is there some way I can optimize this?
Thank you in advance for any advice!
Bonus points if you can give me a way to concatenate all the data from every CSV file in the directory that is faster than glob.glob(path + '/*.csv').
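One way to move most of this work into pandas is to select the ABC rows once, take diff for the jump, and use a rolling standard deviation over the previous 50 values for the stability test. This is a partial sketch, not a drop-in replacement: it implements the rule as described in words (previous 50 ABC values with a population standard deviation below 10) and does not reproduce the loop's stabilityTime counter or its reset after a jump, so counts can differ. The function name count_jumps and its parameters are made up for this sketch; the column names come from the question.

```python
import pandas as pd


def count_jumps(df, name="ABC", window=50, jump=100.0, max_std=10.0):
    """Approximate count of large jumps that follow a stable stretch.

    Assumes columns "Names" and "Values" as described in the question.
    Does NOT reproduce the original loop's stabilityTime counter/reset.
    """
    # Keep only one name, so "previous value" means the previous value of that name
    s = df.loc[df["Names"] == name, "Values"].reset_index(drop=True)

    big_jump = s.diff().abs() > jump  # |value - previous value| > jump

    # Std of the `window` values before each row (shift(1) excludes the row itself).
    # ddof=0 matches np.std in the loop; the first rows give NaN, which compares False.
    prev_std = s.shift(1).rolling(window).std(ddof=0)
    stable = prev_std < max_std

    return int((big_jump & stable).sum())


# Hypothetical data: two alternating names, one jump of 200 after 60 flat ABC values
abc = [0.0] * 60 + [200.0] * 10
xyz = [5.0] * 70
names = ["ABC", "XYZ"] * 70
values = [v for pair in zip(abc, xyz) for v in pair]
df = pd.DataFrame({"Names": names, "Values": values})

print(count_jumps(df))
```

Boolean masks, diff and rolling all run in compiled code, which is usually much faster than a Python loop over 100 million rows.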

Get list of "n" unique random numbers from the given range

I need 3 unique random numbers between 1 and 20.
I have a loop generating 3 random numbers from 1-20. And I don't want there to be any repeats. I have something that works for the first two numbers. But in my code the first and third numbers can still be the same which needs to be fixed.
import random
from random import choice
i = 1
while i <= 3:
    x = random.randint(1, 20)
    print(choice([i for i in range(1, 20) if i != [x]]))
    i += 1
Is there a better way to achieve this in Python?
You can use random.sample(). The example below returns 3 unique random numbers between 1 and 20.
>>> import random
>>> sample_count = 3 # count of required unique numbers
>>> start_range, end_range = 1, 20 # start and end range
>>> random.sample(range(start_range, end_range+1), sample_count)
[4, 1, 14]
Please refer to the random.sample() documentation for more details.
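The same call wrapped in a small helper (the name unique_randints is made up for this sketch). The + 1 matters because range excludes its upper bound, which is also why range(1, 20) in the question can never produce 20.

```python
import random


def unique_randints(count, start, end):
    """Return `count` distinct integers from start..end inclusive."""
    # range(start, end + 1) includes `end`; random.sample never picks the same
    # element twice and raises ValueError if count exceeds the size of the range.
    return random.sample(range(start, end + 1), count)


numbers = unique_randints(3, 1, 20)
print(numbers)
```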

Python pandas DataFrame: Check if n elements have continuous values?

I want to check whether a column has at least one instance of 5 consecutive days with nonzero values. In the following example, this should be false for column '1' (result=0) and true for column '2' (result=1). This code does the job:
import pandas as pd
days = pd.date_range('1900-1-1', periods=14, freq='D')
df = pd.DataFrame({'1': [0,1,0,1,1,0,1,0,1,1,0,1,1,0], '2': [0,1,0,1,1,0,1,0,1,1,1,1,1,0]}, index=days)
col = '2'  # Select any column (the result for column '1' should be 0 and for column '2' should be 1)
nday = 5  # Number of consecutive days with nonzero values
result = 0  # If nonzero values lasted for 5 consecutive days, then result=1
for index, row in df.iterrows():
    if row[col] == 0:  # Restart counting if nonzero values are not continuous for five days
        nday = 5
    elif row[col] == 1:  # Count consecutive nonzero values
        nday -= 1
        if nday == 0:
            result = 1
            break
print(result)
Is there an easier way than this long code?
The code seems good in terms of complexity and the number of lines. Just a few suggestions, see below.
def has_continuous(col, n_days=5) -> bool:
    days_left = n_days
    for value in col:
        if not value:  # Restart counting
            days_left = n_days
        else:
            # I assume that all values are non-negative. If it is not zero, it is positive
            days_left -= 1
            if not days_left:
                return True
    return False
result = has_continuous(df['2'], 5)
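If your values are only ever 0 or at least 1 (as in the sample data), you can use rolling with min; a window of nday values contains no zero exactly when its minimum is at least 1: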
col='2'
nday=5
print (df[col].rolling(nday).min().ge(1).any())
# True
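Another option, which also works when nonzero values are not all 1 (or are negative), is to measure the longest run of nonzero values directly: number the runs by taking a cumulative sum over the zeros, then count the nonzero entries in each run. The helper name longest_nonzero_run is made up for this sketch.

```python
import pandas as pd

days = pd.date_range('1900-1-1', periods=14, freq='D')
df = pd.DataFrame({'1': [0,1,0,1,1,0,1,0,1,1,0,1,1,0],
                   '2': [0,1,0,1,1,0,1,0,1,1,1,1,1,0]}, index=days)


def longest_nonzero_run(s):
    """Length of the longest run of consecutive nonzero values in s."""
    nonzero = s.ne(0)
    # The counter increases at every zero, so all values of one nonzero run share an id
    run_id = (~nonzero).cumsum()
    return int(nonzero.groupby(run_id).sum().max())


nday = 5
result = {col: longest_nonzero_run(df[col]) >= nday for col in df.columns}
print(result)
```

For the sample data the longest run is 2 in column '1' and 5 in column '2'.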

subtract n values from input python

I haven't found anything even relevant to my question, so I may be asking it wrong.
I am working on an exercise where I am given sequential values starting at 1 and going to n, but not in order. I must find a missing value from the list.
My method is to sum the full range 1 to n in a for loop, but I can't figure out how to read the n - 1 non-sequential values, each on its own line of input, so that I can subtract them from the full total to get the missing one.
I have been searching for modifications to for loops, or just for how to read n inputs of non-sequential numbers. If I am simply asking the wrong question, I am happy to do my own research if someone could point me in the right direction.
total = 0
for i in range(1, int(input()) + 1):
    total += i
print(total)
for s in **?????(int(input()))**:
    total -= s
print(total)
sample input:
5
3
2
5
1
expected output: 4
To fill in the approach you're using in your example code:
total = 0
n = int(input("How long is the sequence? "))
for i in range(1, n+1):
    total += i
for i in range(1, n):
    total -= int(input("Enter value {}: ".format(i)))
print("Missing value is: " + str(total))
That first for loop is unnecessary though. First of all, your loop is equivalent to the sum function:
total = sum(range(1,n+1))
But you can do away with any iteration altogether by using the formula:
total = n * (n + 1) // 2  # floor division keeps the result an int (n*(n+1) is always even)
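Putting the formula together with the question's sample input gives a short function; the name missing_value is made up for this sketch. The comments show how the same values could be read one per line from input().

```python
def missing_value(n, values):
    """Return the one number from 1..n that is missing from `values`."""
    expected = n * (n + 1) // 2  # sum of 1..n without a loop
    return expected - sum(values)


# Reading the question's input from stdin could look like:
#   n = int(input())
#   values = [int(input()) for _ in range(n - 1)]

# The question's sample input: n = 5, then the values 3, 2, 5, 1
print(missing_value(5, [3, 2, 5, 1]))
```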
I don't know if you are supposed to create the initial data (with the missing item), so I added some lines to generate this sequence:
import random
n = 12 # or n = int(input('Enter n: ')) to get user input
# create a shuffled numeric sequence with one missing value
data = list(range(1,n+1))
data.remove(random.randrange(1,n+1))
random.shuffle(data)
print(data)
# create the corresponding reference sequence (without missing value)
data2 = list(range(1,n+1))
# find missing data with your algorithm
print("Missing value =", sum(data2)-sum(data))
Here is the output:
[12, 4, 11, 5, 2, 7, 1, 6, 8, 9, 10]
Missing value = 3
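The sum trick only identifies a single missing value. If more than one value could be missing, a set difference against the reference sequence still works; a sketch that removes two values instead of one:

```python
import random

n = 12
data = list(range(1, n + 1))
for missing in random.sample(data, 2):  # remove two distinct values this time
    data.remove(missing)
random.shuffle(data)

# Everything in the reference sequence 1..n that does not appear in data
missing_values = sorted(set(range(1, n + 1)) - set(data))
print(data)
print("Missing values =", missing_values)
```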

Python tablecheck error

def tableCheck(elev, n, m):
    tablePosCount = 0
    rowPosCount = 0
    for r in range(1, n):
        for c in range(1, m):
            if elev[r][c] > 0:
                tablePosCount = tablePosCount + 1
                rowPosCount = rowPosCount + 1
                print 'Number of positive entries in row ', r , ' : ', rowPosCount
    print 'Number of positive entries in table :', tablePosCount
    return tablePosCount
elev = [[1,0,-1,-3,2], [0,0,1,-4,-1], [-2,2,8,1,1]]
tableCheck(elev, 3, 5)
I'm having some difficulty getting this code to run properly. Can anyone tell me why it might be giving me this output?
Number of positive entries in row 1 : 1
Number of positive entries in row 2 : 2
Number of positive entries in row 2 : 3
Number of positive entries in row 2 : 4
Number of positive entries in row 2 : 5
Number of positive entries in table : 5
There are three things in your code that I suspect are errors, though since you don't describe the behavior you expect, it's possible that one or more of these is working as intended.
The first issue is that you print out the "row" number every time that you see a new value that is greater than 0. You probably want to unindent the print 'Number of positive entries in row ' line by two levels (to be even with the inner for loop).
The second issue is that you don't reset the count for each row, so the print statement I suggested you move will not give the right output after the first row. You probably want to move the rowPosCount = 0 line inside the outer loop.
The final issue is that you're skipping the first row and the first value of each later row. This is because your ranges go from 1 to n or m. Python indexing starts at 0, and ranges exclude their upper bound. You probably want for r in range(n) and for c in range(m), though iterating on the table values themselves (or an enumeration of them) would be more Pythonic.
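For comparison, a sketch of the function with all three suggested changes applied (one print per row, the row count reset for every row, and no skipped entries), iterating over the rows with enumerate. It is written in Python 3, whereas the question uses Python 2 print statements; the name table_check and the extra row_counts return value are additions for this sketch.

```python
def table_check(elev):
    """Count positive entries per row and in the whole table."""
    table_pos_count = 0
    row_counts = []
    for r, row in enumerate(elev):
        # Counted fresh for every row, so earlier rows do not leak into later ones
        row_pos_count = sum(1 for value in row if value > 0)
        print('Number of positive entries in row', r, ':', row_pos_count)
        row_counts.append(row_pos_count)
        table_pos_count += row_pos_count
    print('Number of positive entries in table:', table_pos_count)
    return table_pos_count, row_counts


elev = [[1, 0, -1, -3, 2], [0, 0, 1, -4, -1], [-2, 2, 8, 1, 1]]
table_check(elev)
```

With every row and column included, the table has 2, 1 and 4 positive entries per row, 7 in total.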
