Is there a way to make a value non-negative in a DataFrame? - Python

I'm new to coding and I'm using python pandas to practice making an algo-trading bot. This is my code.
for date in BTCtest.index:
    if BTCtest.loc[date,'Shares'] == 0:
        BTCtest.loc[date,'Shares'] = max(0,-5)
    if BTCtest.loc[date, 'MA10'] > BTCtest.loc[date, 'MA50']:
        BTCtest.loc[date, 'Shares'] = 1
    elif BTCtest.loc[date, 'MA10'] < BTCtest.loc[date, 'MA50']:
        BTCtest.loc[date, 'Shares'] = -1
BTCtest['Position'] = BTCtest['Shares'].cumsum()
BTCtest['Close1'] = BTCtest['Close'].shift(-1)
BTCtest['Profit'] = [BTCtest.loc[date, 'Close1'] - BTCtest.loc[date, 'Close'] if BTCtest.loc[date, 'Shares']==1 else 0 for date in BTCtest.index]
BTCtest['Profit'].plot()
print (BTCtest)
plt.axhline(y=0, color='red')
I'm trying not to add shares when the position is 0.
I tried
if BTCtest.loc[date,'Shares'] == 0:
    BTCtest.loc[date,'Shares'] = 0
if BTCtest.loc[date,'Shares'] == 0:
    max(BTCtest.loc[date,'Shares'],-1)
Below is the result so far.
[image: plot of the result]
I don't want my position to go below 0.

I don't understand your code, but I understand your title.
To convert a negative value into a positive one, multiply it by -1 if it is less than 0. For example:
numberr = int(input("Enter a number: "))
if numberr < 0:
    numberr *= -1
print(numberr)
Hope this helps.
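For a whole DataFrame column, the same idea can be applied without a loop. Here is a minimal sketch, assuming a hypothetical 'Position' column with made-up values (neither comes from the question's data). Series.abs() flips negative values to positive, while Series.clip(lower=0) replaces them with 0, which is closer to "never below zero".

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"Position": [2, -1, 0, -3]})

# abs() turns every negative value into its positive counterpart.
df["Abs"] = df["Position"].abs()

# clip(lower=0) leaves non-negative values alone and raises negatives to 0.
df["Clipped"] = df["Position"].clip(lower=0)

print(df)
```

Note that clipping only changes the displayed numbers. It does not stop the underlying sell signals from being generated.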

Consider using the .apply function instead of iterating over the index. It will simplify your code considerably.
Additionally, if you want to operate only on the rows where the position is zero, do this (myfunction stands for your own row function):
BTCtest[BTCtest['Position'] == 0].apply(myfunction, axis=1)
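Because each day's signal depends on the position built up so far, one possible approach is to track the running position while generating the signals, and skip a sell whenever nothing is held. The sketch below is only one way to read the question's intent. The function name non_negative_signals is hypothetical, and it takes plain sequences of the two moving averages.

```python
def non_negative_signals(ma10, ma50):
    """Return +1/-1/0 signals whose running sum never drops below zero."""
    position = 0
    signals = []
    for fast, slow in zip(ma10, ma50):
        if fast > slow:
            signal = 1            # buy one share
        elif fast < slow and position > 0:
            signal = -1           # sell only when something is held
        else:
            signal = 0            # nothing to sell, or averages are equal/NaN
        position += signal
        signals.append(signal)
    return signals


# Made-up moving averages for illustration.
print(non_negative_signals([1, 1, 3, 1, 1, 1], [2, 2, 2, 2, 2, 2]))
```

With the DataFrame from the question, this could be used as BTCtest['Shares'] = non_negative_signals(BTCtest['MA10'], BTCtest['MA50']), followed by the existing cumsum() line to build 'Position'.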

Related

I Tried To Add Booleans, But It Gave The Wrong Answer. Advent Of Code 2022, Day 4

For this challenge I had to parse a text file that looks like this:
1-6,7-9
10-11,11-11
I had to check whether one range in a pair fully contains the other. In the example above, the second range of the second pair is fully contained in the first.
So I built this code.
with open("input4.txt") as f:
    text_array = f.read().split("\n")
total = 0
for i in text_array:
    parts = i.split(",")
    pair1 = parts[0].split("-")
    pair2 = parts[1].split("-")
    if (pair1[0] <= pair2[0] and pair1[1] >= pair2[1]) and (pair2[0] <= pair1[0] and pair2[1] >= pair1[1]):
        total += 1
print(total)
(Ignore all print statements except the last one.) It gave me 596, which Advent of Code says is too high. I am wondering whether some cases slipped in that shouldn't have. I even made custom input, and it gave the correct answer. Does anyone know where I went wrong?
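Try this.
I'm not sure, but comparing strings may cause problems when the numbers have several digits, because strings are compared character by character (for example, "10" < "9" is True). Convert the values to integers first: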
pair1 = [int(x) for x in parts[0].split("-")]
pair2 = [int(x) for x in parts[1].split("-")]
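A quick way to see the difference between comparing the text and comparing the numbers (the variable names here are only for illustration):

```python
# Strings are compared character by character, so "1" < "9" decides the result.
as_text = "10" <= "9"

# Integers are compared by value.
as_numbers = int("10") <= int("9")

print(as_text, as_numbers)  # True False
```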
It looks like the problem is in your "if": the two containment checks should be joined with or, not and.
Maybe you can try this code:
with open("input4.txt") as f:
    text_array = f.read().split("\n")
total = 0
for i in text_array:
    parts = i.split(",")
    pair1 = parts[0].split("-")
    pair2 = parts[1].split("-")
    if (pair1[0] <= pair2[0] and pair1[1] >= pair2[1]) or (pair2[0] <= pair1[0] and pair2[1] >= pair1[1]):
        total += 1
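Putting both suggestions together (integer conversion and joining the two checks with or), a minimal sketch could look like the following. The function name count_full_containment is hypothetical, and skipping blank lines is an assumption made to cope with a trailing newline in the input file.

```python
def count_full_containment(lines):
    """Count pairs in which one range fully contains the other."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: ignore blank lines such as a trailing newline
        first, second = line.split(",")
        a_lo, a_hi = (int(x) for x in first.split("-"))
        b_lo, b_hi = (int(x) for x in second.split("-"))
        first_contains_second = a_lo <= b_lo and a_hi >= b_hi
        second_contains_first = b_lo <= a_lo and b_hi >= a_hi
        if first_contains_second or second_contains_first:
            total += 1
    return total


# The two sample pairs from the question, plus an empty trailing line.
print(count_full_containment(["1-6,7-9", "10-11,11-11", ""]))
```

To run it on the real input, the lines could be read with open("input4.txt") and f.read().split("\n"), as in the question.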

Need to Compare DataFrames in Pandas using < and > operators for specific data in a column

I am trying to compare the following dataframes:
I have a pair of Z Scores with a specific ENST number here :
Z_SCORE_Raw
ENST00000547849 ENST00000587894
0 -1.3099506 21.56600492
I have to compare each of these numbers to their corresponding ENST code in this dataFrame:
df_new
ENST00000547849High_Avg ENST00000587894 High_Avg
ENST00000547849 Low_Avg ENST00000587894 Low_Avg
0.0026421609368421000 -0.0457525087368421
-0.040015074588235300 -0.04140853107142860
I am given the following formula:
if Z_Score[given ENSTCode] > Avg_High[ENSTCode]:
    return 1
elif Z_Score[given ENSTCode] < Avg_Low[ENSTCode]:
    return 0
elif Avg_High > Z_Score > Avg_Low:
    return 0.5
I currently have the following code to gather the correct ENST code and compare that ZScore to the corresponding High and Low average of each ENST Code:
for x in Z_score_raw:
    if Z_score_raw[x].any() > df_new[x + ' High_Avg'].any():
        print('1')
    elif Z_score_raw[x].any() < df_new[x + ' Low_Avg'].any():
        print('0')
    elif df_new[x + ' High_Avg'].any() > Z_score_raw[x].any() > df_new[x + ' Low_Avg']:
        print('0.5')
The expected output would be for
ENST00000547849: 0 (as -1.309 < -0.0400150745882353)
ENST00000587894: 1 (as 21.56600492 > -0.0457525087368421)
My current code gives me no results and skips by all of the checks. How can I get this to work properly?
The problem is that, although you are iterating correctly, you are then comparing the boolean values returned by .any() using > or <.
What would True > False or True < True even mean?
That comparison doesn't tell you anything about the numbers themselves.
If you only have one value per column, just use [0] to select the value at index 0.
Also, make sure your column naming pattern is consistent (e.g. the same spacing everywhere).
Your Example:
ENST00000547849High_Avg ENST00000587894 High_Avg
My Correction (no Space):
ENST00000547849High_Avg ENST00000587894High_Avg
This will provide your desired result:
import pandas as pd
d = {"ENST00000547849": [-1.3099506], "ENST00000587894": [21.56600492]}
d_2 = {"ENST00000547849High_Avg": [0.0026421609368421000], "ENST00000587894High_Avg" : [-0.0457525087368421], "ENST00000547849Low_Avg" : [-0.040015074588235300], "ENST00000587894Low_Avg": [-0.04140853107142860]}
Z_score_raw = pd.DataFrame(data = d)
df_new = pd.DataFrame(data = d_2)
for x in Z_score_raw:
    if Z_score_raw[x][0] > df_new[x + 'High_Avg'][0]:
        print(f"{x}: 1")
    elif Z_score_raw[x][0] < df_new[x + 'Low_Avg'][0]:
        print(f"{x}: 0")
    elif df_new[x + 'High_Avg'][0] > Z_score_raw[x][0] > df_new[x + 'Low_Avg'][0]:
        print(f"{x}: 0.5")
Output:
ENST00000547849: 0
ENST00000587894: 1
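If the scores are needed as values rather than printed text, the same comparison can be wrapped in small functions. This is a sketch under a few assumptions: the names classify and score_columns are hypothetical, the column names follow the corrected pattern without spaces, and a value exactly equal to a boundary is treated as 0.5 because the formula in the question does not say what should happen there.

```python
import pandas as pd


def classify(z, high, low):
    """Return 1 above the high average, 0 below the low average, else 0.5."""
    if z > high:
        return 1
    if z < low:
        return 0
    return 0.5  # assumption: values equal to a boundary also count as 0.5


def score_columns(z_scores, averages):
    """Score the first row of every column in z_scores against averages."""
    scores = {}
    for code in z_scores.columns:
        z = z_scores[code].iloc[0]
        high = averages[f"{code}High_Avg"].iloc[0]
        low = averages[f"{code}Low_Avg"].iloc[0]
        scores[code] = classify(z, high, low)
    return scores


z_score_raw = pd.DataFrame(
    {"ENST00000547849": [-1.3099506], "ENST00000587894": [21.56600492]}
)
df_new = pd.DataFrame(
    {
        "ENST00000547849High_Avg": [0.0026421609368421],
        "ENST00000587894High_Avg": [-0.0457525087368421],
        "ENST00000547849Low_Avg": [-0.0400150745882353],
        "ENST00000587894Low_Avg": [-0.0414085310714286],
    }
)
result = score_columns(z_score_raw, df_new)
print(result)
```

Using .iloc[0] selects the first row by position, so it keeps working even if the index does not start at 0.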

How to create a new column in Python, based on an existing column

Response:
I need to create a 'Response' column based on an existing 'Time' column. My response variable has to display 'No' for Time values from 1s to 60s and from 240s to 300s, and display 'Yes' for all the remaining values.
I tried the code below, but it simply displays 'No' for all the 'Time' values, disregarding the given condition.
Dataset:
dataset['Y'] = np.where(dataset["Time"] > 60 & (dataset["Time"] < 240 ), 'yes', 'no')
def label(row):
    if row['Time'] >= 1 and row['Time'] < 60:
        return("no")
    elif row['Time'] >= 240 and row['Time'] < 300:
        return("no")
    else:
        return("yes")
dataset['Y'] = dataset.apply(lambda row: label(row), axis=1)
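In your np.where call the condition is wrong because of operator precedence: & binds more tightly than >, so 60 & (dataset["Time"] < 240) is evaluated first. Each comparison needs its own parentheses.
I'd be inclined to do the following: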
dataset['Y'] = (dataset['Time'] >= 1) & (dataset['Time'] <= 4) | (dataset['Time'] > 5)
Note that this will fill your 'Y' column with bool values. If it's paramount that they are Yes/No, then you can change it to
dataset['Y'] = ((dataset['Time'] >= 1) & (dataset['Time'] <= 4) | (dataset['Time'] > 5)).replace({True: 'Yes', False: 'No'})
Also note that rather than converting your time column into seconds, I converted your time intervals into minutes. You may want to do this differently for readability.
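For comparison, here is a sketch that keeps the time in seconds and builds the Yes/No labels directly. It assumes both intervals are inclusive at each end (the question does not say), and the sample 'Time' values are made up. Series.between is inclusive on both sides by default.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data in seconds.
dataset = pd.DataFrame({"Time": [1, 30, 60, 61, 150, 239, 240, 300, 301]})

# Assumption: 1-60 s and 240-300 s are both inclusive "No" intervals.
in_first_interval = dataset["Time"].between(1, 60)
in_second_interval = dataset["Time"].between(240, 300)

dataset["Response"] = np.where(in_first_interval | in_second_interval, "No", "Yes")
print(dataset)
```

Building each mask separately avoids the precedence problem with & and the comparison operators.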

How do I return the middle of a python array?

Here is the prompt:
On the first line display the first, last and middle element of the list separated by the , character.
I have been trying to get this figured out for a few hours now, but do not know the correct process to return the middle of the array. Here is my code so far:
primary = []
length = 0
i = ("MORE")
while i != "NOMORE":
    i = str(input("?"))
    print(i)
    if i == "NOMORE":
        break
    primary.append(i)
    length = length + 1
mid = (length/2)
print(primary[0]," , ", primary[-1]," , ",primary.pop([mid]))
The code works to get the correct inputs from the program, but as the lists will be variable lengths I assume some form of a loop will be used. The primary.pop([mid]) was my poor attempt at getting the median printed. I know that the mid will not be printed as it is the wrong variable type, but how would I replace this?
Any help is appreciated.
You're unnecessarily calling the pop() method on primary with [mid] when you should simply be indexing primary with mid. You should also use the // operator instead of / to obtain an integer value for the index. Since the index is 0-based, the mid point should be (length - 1) // 2 instead:
primary = []
length = 0
i = ("MORE")
while i != "NOMORE":
    i = str(input("?"))
    print(i)
    if i == "NOMORE":
        break
    primary.append(i)
    length = length + 1
mid = (length - 1) // 2
print(primary[0]," , ", primary[-1]," , ",primary[mid])
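The indexing part can also be isolated from the input loop, which makes it easier to check on lists of different lengths. A minimal sketch, where the function name first_last_middle is hypothetical and an even-length list is assumed to use the lower of its two middle elements (the same choice as (length - 1) // 2 above):

```python
def first_last_middle(items):
    """Return the first, last and middle element of a non-empty list."""
    if not items:
        raise ValueError("items must not be empty")
    mid = (len(items) - 1) // 2  # lower middle for even lengths
    return items[0], items[-1], items[mid]


print(" , ".join(first_last_middle(["a", "b", "c"])))
print(" , ".join(first_last_middle(["a", "b", "c", "d"])))
```

Using len(items) also removes the need to maintain a separate length counter.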

Repeat function indefinitely in a column until values reach certain level

I have this dataframe:
grade type
0 402 A
1 312 B
2 321 C
...
If the type is A and the value in the grade column is higher than 100, I want to multiply it by 0.7 repeatedly until it gets to a value below 100. I didn't find a good way to do that; right now I'm using the code below:
df.loc[(df['type'] == 'A') & (df['grade'] > 100),'grade'] = df['grade']*0.7
(I repeat that 100 times and cross my fingers for 'grade' to be below 100)
I could just do that a few times and if it didn't reach I would force to be 100, however I don't want to have a lot of equal values in the df and also I can't put a random component in it.
Is there a better way to do that (preferably with Pandas)?
You can calculate the number of multiplications needed using np.log (the natural logarithm), and from that the overall multiplier that brings the value under 100:
df.loc[df.type.eq('A') & df.grade.gt(100), 'grade'] = df.grade * np.power(0.7, np.floor(np.log(100 / df.grade) / np.log(0.7)) + 1)
df
# grade type
#0 96.5202 A
#1 312.0000 B
#2 321.0000 C
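To see why the logarithm works: multiplying n times by 0.7 gives grade * 0.7**n, and solving grade * 0.7**n < 100 for n gives n > log(100 / grade) / log(0.7). The sketch below compares that closed form with a plain loop on single numbers. The function names are hypothetical, and only the standard math module is used.

```python
import math


def shrink_by_loop(grade, factor=0.7, limit=100):
    """Multiply by factor until the grade is no longer above the limit."""
    while grade > limit:
        grade *= factor
    return grade


def shrink_closed_form(grade, factor=0.7, limit=100):
    """Compute the number of multiplications with logarithms instead of looping."""
    if grade <= limit:
        return grade
    steps = math.floor(math.log(limit / grade) / math.log(factor)) + 1
    return grade * factor ** steps


print(shrink_by_loop(402), shrink_closed_form(402))
```

The two versions can differ when a product lands exactly on the limit: the loop stops at 100, while the closed form applies one more multiplication.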
This should work:
for index, row in df.iterrows():
    if row['type'] == 'A':
        grade = row['grade']
        while grade > 100:
            grade = grade*.7
        df.loc[index, 'grade'] = grade
@Psidom's suggestion is interesting and it certainly works; however, I wanted something simpler and I also wanted to avoid using NumPy.
Using @ykrueng's suggestion as inspiration, I found a way to do exactly what I wanted:
while len((df.loc[(df['type'] == 'A') & (df['grade'] > 100)]).index)>0:
    df.loc[(df['type'] == 'A') & (df['grade'] > 100),'grade'] = df['grade']*0.7
