I have this dataframe:
grade type
0 402 A
1 312 B
2 321 C
...
If the type is A and the value in the grade column is higher than 100, I want to multiply it by 0.7 repeatedly until it gets below 100. I didn't find a good way to do that; right now I'm using the code below:
df.loc[(df['type'] == 'A') & (df['grade'] > 100),'grade'] = df['grade']*0.7
(I repeat that 100 times and cross my fingers for 'grade' to be below 100)
I could just do that a few times and, if it still hadn't reached 100, force it to 100; however, I don't want a lot of equal values in the df, and I also can't introduce a random component.
Is there a better way to do that (preferably with Pandas)?
You can calculate the power needed using np.log (natural logarithm), from which you can derive the multiplier that brings the value under 100:
df.loc[df.type.eq('A') & df.grade.gt(100), 'grade'] = df.grade * np.power(0.7, np.floor(np.log(100 / df.grade) / np.log(0.7)) + 1)
df
# grade type
#0 96.5202 A
#1 312.0000 B
#2 321.0000 C
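As a quick sanity check of the formula on the first row (a verification added here, not part of the original answer):

```python
import numpy as np

grade = 402
# smallest integer k with grade * 0.7**k < 100
k = np.floor(np.log(100 / grade) / np.log(0.7)) + 1
print(k, round(grade * 0.7 ** k, 4))  # 4.0 96.5202
```

Four multiplications by 0.7 are needed, and 402 * 0.7**4 matches the 96.5202 in the output above.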
This should work:
for index, row in df.iterrows():
    if row['type'] == 'A':
        grade = row['grade']
        while grade > 100:
            grade = grade * 0.7
        df.loc[index, 'grade'] = grade
@Psidom's suggestion is interesting and it certainly works, but I wanted something simpler, and I also wanted to avoid reaching for NumPy.
Using @ykrueng's suggestion as inspiration, I found a way to do exactly what I wanted:
while len((df.loc[(df['type'] == 'A') & (df['grade'] > 100)]).index) > 0:
    df.loc[(df['type'] == 'A') & (df['grade'] > 100), 'grade'] = df['grade'] * 0.7
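For reference, this converges quickly on the sample data. Here is a self-contained sketch of the same approach on a made-up frame, using `.any()` on the mask as a slightly terser loop condition:

```python
import pandas as pd

df = pd.DataFrame({'grade': [402.0, 312.0, 321.0], 'type': ['A', 'B', 'C']})

# repeat the masked multiplication until no 'A' grade exceeds 100
while ((df['type'] == 'A') & (df['grade'] > 100)).any():
    df.loc[(df['type'] == 'A') & (df['grade'] > 100), 'grade'] *= 0.7

print(df['grade'].round(4).tolist())  # [96.5202, 312.0, 321.0]
```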
I have a Pandas dataframe with ~100,000,000 rows and 3 columns (Names str, Time int, and Values float), which I compiled from ~500 CSV files using glob.glob(path + '/*.csv').
Given that two different names alternate, the job is to go through the data and count the number of times a value associated with a specific name ABC deviates from its preceding value by more than ±100, given that the previous 50 values for that name did not deviate by more than ±10.
I initially solved it with a for loop function that iterates through each row, as shown below. It checks for the correct name, then checks the stability of the previous values of that name, and finally adds one to the count if there is a large enough deviation.
count = 0
stabilityTime = 0
i = 0
if names[0] == "ABC":
    j = values[0]
    stability = np.full(50, values[0])
else:
    j = values[1]
    stability = np.full(50, values[1])
for name in names:
    value = values[i]
    if name == "ABC":
        if j - 10 < value < j + 10:
            stabilityTime += 1
        if stabilityTime >= 50 and np.std(stability) < 10:
            if value > j + 100 or value < j - 100:
                stabilityTime = 0
                count += 1
        stability = np.roll(stability, -1)
        stability[-1] = value
        j = value
    i += 1
Naturally, this process takes a very long computing time. I have looked at NumPy vectorization, but do not see how I can apply it in this case. Is there some way I can optimize this?
Thank you in advance for any advice!
Bonus points if you can give me a way to concatenate all the data from every CSV file in the directory that is faster than glob.glob(path + '/*.csv').
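On the bonus question: glob.glob itself only lists filenames, so the cost is in the reads and in how the frames are combined. The usual pattern is to read each file once and concatenate in a single pd.concat call, rather than appending to a growing frame in a loop, which recopies the data on every iteration. A sketch (the directory and file contents here are stand-ins):

```python
import glob
import os
import tempfile

import pandas as pd

# stand-in for the real directory of ~500 CSV files
path = tempfile.mkdtemp()
for i in range(2):
    with open(os.path.join(path, f'part{i}.csv'), 'w') as fh:
        fh.write('Names,Time,Values\nABC,1,0.5\n')

# one concat over a generator of per-file reads
files = glob.glob(path + '/*.csv')
df = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
print(df.shape)  # (2, 3)
```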
I have a list of data frames that I'm opening in a for loop. For each data frame I want to query a portion of it and find the average.
This is what I have so far:
k = 0
for i in open('list.txt', 'r'):
    k = k + 1
    i_name = i.strip()
    df = pd.read_csv(i_name, sep='\t')
    # Create queries
    A = df.query('location == 1' and '1000 >= start <= 120000000')
    B = df.query('location == 10' and '2000000 >= start <= 60000000')
    print(A)
    print(B)
    # Find average
    avgA = sum(A['height']) / len(A['height'])
    print(avgA)
    avgB = sum(B['height']) / len(B['height'])
    print(avgB)
The problem is I'm not getting the average values I'm expecting (when doing it manually by excel). Printing the query results in the entire data frame being printed so I'm not sure if there's a problem with how I'm querying the data.
Am I correctly assigning the values A and B to the queries? Is there another way to do this that doesn't involve saving every data frame as a csv? I have many queries to create and don't want to save each intermediate query for hundreds of samples as I'm only interested in the average.
This does not do what you expect:
A = df.query('location == 1' and '1000 >= start <= 120000000')
B = df.query('location == 10' and '2000000 >= start <= 60000000')
You are applying Python's "and" to two strings. Since the first string is non-empty, and therefore truthy, the result of that expression is the second string, "1000 >= start <= 120000000".
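A quick way to see this in plain Python:

```python
# 'and' between two non-empty strings evaluates to the second operand
expr = 'location == 1' and '1000 >= start <= 120000000'
print(expr)  # 1000 >= start <= 120000000
```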
You want the "and" to be inside the query:
A = df.query('location == 1 and 1000 >= start <= 120000000')
B = df.query('location == 10 and 2000000 >= start <= 60000000')
Secondly, you have the inequality operators backwards. The first one is only going to get values less than or equal to 1000. What you really want is:
A = df.query('location == 1 and 1000 <= start <= 120000000')
B = df.query('location == 10 and 2000000 <= start <= 60000000')
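With the operators flipped, the chained comparison behaves as intended. A minimal check on made-up data:

```python
import pandas as pd

df = pd.DataFrame({'location': [1, 1, 10],
                   'start': [500, 2000, 3000000],
                   'height': [10.0, 20.0, 30.0]})

# only the row with location 1 and start in [1000, 120000000] survives
A = df.query('location == 1 and 1000 <= start <= 120000000')
print(A['height'].mean())  # 20.0
```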
I'm new to coding and I'm using python pandas to practice making an algo-trading bot. This is my code.
for date in BTCtest.index:
    if BTCtest.loc[date, 'Shares'] == 0:
        BTCtest.loc[date, 'Shares'] = max(0, -5)
    if BTCtest.loc[date, 'MA10'] > BTCtest.loc[date, 'MA50']:
        BTCtest.loc[date, 'Shares'] = 1
    elif BTCtest.loc[date, 'MA10'] < BTCtest.loc[date, 'MA50']:
        BTCtest.loc[date, 'Shares'] = -1
BTCtest['Position'] = BTCtest['Shares'].cumsum()
BTCtest['Close1'] = BTCtest['Close'].shift(-1)
BTCtest['Profit'] = [BTCtest.loc[date, 'Close1'] - BTCtest.loc[date, 'Close'] if BTCtest.loc[date, 'Shares']==1 else 0 for date in BTCtest.index]
BTCtest['Profit'].plot()
print (BTCtest)
plt.axhline(y=0, color='red')
I'm trying not to add shares when the position is 0.
I tried
if BTCtest.loc[date, 'Shares'] == 0:
    BTCtest.loc[date, 'Shares'] = 0

if BTCtest.loc[date, 'Shares'] == 0:
    max(BTCtest.loc[date, 'Shares'], -1)
The result so far is shown in a screenshot (omitted here).
I don't want my position to go below 0.
I don't understand your code, but I understand your title.
To convert a negative value into a positive one, we can multiply it by -1 when it is less than 0. So, write
numberr = int(input("Enter a number: "))
if numberr < 0:
    numberr *= -1
print(numberr)
Hope this helps.
You should use the .apply function instead of iterating over the index. It will really simplify your code, and it is the best practice.
Additionally, if you want to operate only on the rows where 'Position' is 0, do this:
BTCtest[BTCtest['Position'] == 0].apply(myfunction, axis=1)
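As a sketch of what that could look like, with myfunction as a hypothetical per-row signal on a toy frame (not from the original post):

```python
import pandas as pd

BTCtest = pd.DataFrame({'MA10': [3.0, 1.0, 5.0],
                        'MA50': [2.0, 4.0, 5.0],
                        'Position': [0, 0, 1]})

def myfunction(row):
    # hypothetical signal: long when the fast MA is above the slow one
    if row['MA10'] > row['MA50']:
        return 1
    elif row['MA10'] < row['MA50']:
        return -1
    return 0

# apply only to the rows where Position is 0
signals = BTCtest[BTCtest['Position'] == 0].apply(myfunction, axis=1)
print(signals.tolist())  # [1, -1]
```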
I have a homework assignment in which we have to write a program that outputs the change to be given by a vending machine using the lowest number of coins. E.g. £3.67 can be dispensed as 1x£2 + 1x£1 + 1x50p + 1x10p + 1x5p + 1x2p.
However, my program is outputting the wrong numbers. I know there will probably be rounding issues, but I think the current issue is to do with my method of coding this.
change = float(input("Input change"))
twocount = 0
onecount = 0
halfcount = 0
pttwocount = 0
ptonecount = 0
while change != 0:
    if change - 2 > -1:
        change = change - 2
        twocount += 1
    else:
        if change - 1 > -1:
            change = change - 1
            onecount += 1
        else:
            if change - 0.5 > -1:
                change = change - 0.5
                halfcount += 1
            else:
                if change - 0.2 > -1:
                    change = change - 0.2
                    pttwocount += 1
                else:
                    if change - 0.1 > -1:
                        change = change - 0.1
                        ptonecount += 1
                    else:
                        break
print(twocount, onecount, halfcount, pttwocount, ptonecount)
RESULTS:
Input: 2.3
Output: 11010
i.e. 3.2
Input: 3.2
Output:20010
i.e. 4.2
Input: 2
Output: 10001
i.e. 2.1
All your comparisons use >-1, so you give out change as long as you have more than -1 balance.
This would be correct if you were only dealing with integers, since there >-1 is equal to >=0.
For floating point numbers however, we have for example -0.5>-1, so we will give out change for negative balance (which we do not want).
So the correct way would be to replace all >-1 comparisons by >=0 (larger or equal to 0) comparisons.
The problem is how your if/else chain calculates the change. Walking through the first example: change - 2 > -1 is true, so change becomes 0.3; but on the next pass you expect change - 1 > -1 to be false, and it isn't, since 0.3 - 1 is -0.7, which is still greater than -1. One of the best ways to do this is with Python's floor division // and mod % operators. You have to round some of the calculations because of the way Python handles floats:
change = float(input("Input change"))
twocount = 0
onecount = 0
halfcount = 0
pttwocount = 0
ptonecount = 0
twocount = int(change // 2)
change = round(change % 2, 1)
if change // 1 > 0:
    onecount = int(change // 1)
    change = round(change % 1, 1)
if change // 0.5 > 0:
    halfcount = int(change // 0.5)
    change = round(change % 0.5, 1)
if change // 0.2 > 0:
    pttwocount = int(change // 0.2)
    change = round(change % 0.2, 1)
if change // 0.1 > 0:
    ptonecount = int(change // 0.1)
    change = round(change % 0.1, 1)
print(twocount, onecount, halfcount, pttwocount, ptonecount)
But given the inputs this produces
Input: 2.3
Output: 1 0 0 1 1
Input: 3.2
Output:1 1 0 1 0
Input: 2
Output: 1 0 0 0 0
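A way to sidestep the round calls entirely (a common alternative, not from the answer above) is to convert to integer pence first, where floor division and modulo are exact:

```python
def make_change(amount):
    # work in integer pence so // and % carry no floating-point error
    pence = round(amount * 100)
    counts = []
    for coin in (200, 100, 50, 20, 10):  # coin values in pence
        counts.append(pence // coin)
        pence %= coin
    return counts

print(make_change(2.3))  # [1, 0, 0, 1, 1]
print(make_change(3.2))  # [1, 1, 0, 1, 0]
print(make_change(2))    # [1, 0, 0, 0, 0]
```

This reproduces the outputs listed above for all three inputs.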
I need to create a 'Response' column based on an existing 'Time' column. My response variable has to display 'No' for Time values from 1s to 60s and from 240s to 300s, and display 'Yes' for all the remaining values.
I tried the code below but it simply displays 'No' for all the 'Time' values, disregarding the given condition.
Dataset:
dataset['Y'] = np.where(dataset["Time"] > 60 & (dataset["Time"] < 240 ), 'yes', 'no')
def label(row):
    if row['Time'] >= 1 and row['Time'] < 60:
        return "no"
    elif row['Time'] >= 240 and row['Time'] < 300:
        return "no"
    else:
        return "yes"

dataset['Y'] = dataset.apply(lambda row: label(row), axis=1)
In your code, the condition is wrong, so it won't work: & binds more tightly than >, so without parentheses around each comparison the expression is evaluated as dataset["Time"] > (60 & (...)).
I'd be inclined to do the following:
dataset['Y'] = (dataset['Time'] >= 1) & (dataset['Time'] <= 4) | (dataset['Time'] > 5)
Note that this will fill your 'Y' column with bool values. If it's paramount that they are Yes/No, then you can change it to
dataset['Y'] = ((dataset['Time'] >= 1) & (dataset['Time'] <= 4) | (dataset['Time'] > 5)).replace({True: 'Yes', False: 'No'})
Also note that rather than converting your time column into seconds, I converted your time intervals into minutes; you may want to do this differently for readability.
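If staying in seconds, the same rule can also be written with Series.between (inclusive at both ends by default) and numpy.where, shown here on made-up times:

```python
import numpy as np
import pandas as pd

dataset = pd.DataFrame({'Time': [30, 100, 250, 310]})

# 'No' inside [1, 60] or [240, 300] seconds, 'Yes' everywhere else
is_no = dataset['Time'].between(1, 60) | dataset['Time'].between(240, 300)
dataset['Y'] = np.where(is_no, 'No', 'Yes')
print(dataset['Y'].tolist())  # ['No', 'Yes', 'No', 'Yes']
```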