I am new to python and I've been trying to wrap my head around this code:
stop = int(input())
result = 0
for a in range(4):
    print(a, end=': ')
    for b in range(2):
        result += a + b
        if result > stop:
            print('-', end=' ')
            continue
        print(result, end=' ')
    print()
When I input 6, the output is
0: 0 1
1: 2 4
2: 6 -
3: - -
why isn't it
0: 0 1
1: 3 4 --> since we're starting out with a = 1 and b = 2 so result is 1 + 2 = 3.
etc
I feel like I'm missing something fundamental.
The value of b will never be 2. Each iteration of the outer loop restarts the inner loop, so b only ever takes the values 0 and 1.
The value of result, on the other hand, is defined outside both loops, so it is cumulative: each addition builds on the value left over from the previous iteration.
iteration   a   b   result   output so far
1           0   0   0        0: 0..
2           0   1   1        0: 0 1
3           1   0   2        1: 2..
4           1   1   4        1: 2 4
5           2   0   6        2: 6..
6           2   1   9        2: 6 -
7           3   0   12       3: -..
8           3   1   16       3: - -
(Once result exceeds stop = 6, the continue branch prints - instead of result, but result itself keeps growing.)
When a = 0 and b = 1, result = 0 + 0 + 1 = 1.
So for a = 1 and b = 0, result = 1 + 1 + 0 = 2,
and when a = 1 and b = 1, result = 2 + 1 + 1 = 4.
That is why it prints 1: 2 4.
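If you wanted the behaviour you expected, result would have to be recomputed from scratch for each (a, b) pair rather than accumulated. A minimal sketch of that variant (not the original code):

```python
stop = 6
rows = []
for a in range(4):
    parts = []
    for b in range(2):
        result = a + b  # recomputed each time, not accumulated
        parts.append('-' if result > stop else str(result))
    rows.append(f"{a}: {' '.join(parts)}")
print('\n'.join(rows))
```

Here each line only depends on the current a and b, which is the mental model you had; the original code instead carries result across all iterations.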
I have a dataframe that looks like this:
print(df)
Out[1]:
Numbers
0 0
1 1
2 1
3 1
4 1
5 0
6 0
7 1
8 0
9 1
10 1
I want to transform it to this:
print(dfo)
Out[2]:
Numbers
0 0
1 1
2 2
3 3
4 4
5 0
6 0
7 1
8 0
9 1
10 2
The solution, I thought, could be a loop with two ifs:
Check if the value in df is 1; if true, then check whether the value at i-1 is also 1; if so, take the value at i-1 in dfo and add 1; otherwise just put 0 (or 1) in dfo.
I've tried this:
# Import pandas library
import pandas as pd
# initialize list elements
list = [0,1,1,1,1,0,0,1,0,1,1]
# Create the pandas DataFrame with column name is provided explicitly
df = pd.DataFrame(list, columns=['Numbers'])
# print dataframe.
df
data1c = df.copy()
for j in df:
    for i in range(len(df)):
        if df.loc[i, j] == 1:
            if df.loc[i-1, j] == 1:
                data1c.loc[i, j] = data1c.loc[i-1, j]+1
            elif df.loc[i-1, j] == 0:
                data1c.loc[i, j] = 1
        elif df.loc[i, j] == 0:
            data1c.loc[i, j] = 0
print(data1c)
Numbers
0 0
1 1
2 2
3 3
4 4
5 0
6 0
7 1
8 0
9 1
10 2
and for a dataframe of 1 column it works, but when I tried it with a dataframe with 2 columns:
input = {'A': [0,1,1,1,1,0,0,1,0,1,1,0,1,1],
'B': [1,1,0,0,1,1,1,0,0,1,0,1,1,0]}
df = pd.DataFrame(input)
# Print the output.
df
data2c = df.copy()
for j in dfo:
    for i in range(len(dfo)):
        if dfo.loc[i, j] == 1:
            if dfo.loc[i-1, j] == 1:
                data2c.loc[i, j] = data2c.loc[i-1, j]+1
            elif dfo.loc[i-1, j] == 0:
                data2c.loc[i, j] = 1
        elif dfo.loc[i, j] == 0:
            data2c.loc[i, j] = 0
I get :
File "C:\Users\talls\.conda\envs\Spyder\lib\site-packages\pandas\core\indexes\range.py", line 393, in get_loc
raise KeyError(key) from err
KeyError: -1
Why do I get this error and how do I fix it?
Or: is there another way to get my desired output?
I know this is not the answer to the question "How to use iloc to get reference from the rows above?", but it is the answer to your proposed question.
df = pd.DataFrame([0,1,1,1,1,0,0,1,0,1,1], columns=['Numbers'])
df['tmp'] = (~(df['Numbers'] == df['Numbers'].shift(1))).cumsum()
df['new'] = df.groupby('tmp')['Numbers'].cumsum()
print(df)
Numbers tmp new
0 0 1 0
1 1 2 1
2 1 2 2
3 1 2 3
4 1 2 4
5 0 3 0
6 0 3 0
7 1 4 1
8 0 5 0
9 1 6 1
10 1 6 2
How does this work? The inner part ~(df['Numbers'] == df['Numbers'].shift(1)) checks whether the previous row is the same as the current row. This works for the first row as well, because comparing a number to NaN always yields False. I then negate the result, so the start of each new run is marked with True. Taking a cumulative sum of those marks gives each run its own number in the tmp column; grouping by tmp and taking a cumulative sum of Numbers then yields the required answer in the new column.
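On a tiny series you can watch the intermediate steps (a minimal sketch, separate from the answer's code above):

```python
import pandas as pd

s = pd.Series([0, 1, 1, 0, 1])
starts = ~(s == s.shift(1))    # True wherever a new run of values begins
tmp = starts.cumsum()          # run number for each row: 1, 2, 2, 3, 4
new = s.groupby(tmp).cumsum()  # cumulative sum within each run: 0, 1, 2, 0, 1
print(list(tmp), list(new))
```

Each run of equal values gets its own group label, so the cumulative sum restarts at every change.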
For the two-column version, you'd do exactly the same... for both columns A and B:
df = pd.DataFrame(input)
df['tmp'] = (~(df['A'] == df['A'].shift(1))).cumsum()
df['newA'] = df.groupby('tmp')['A'].cumsum()
#Just reusing column tmp
df['tmp'] = (~(df['B'] == df['B'].shift(1))).cumsum()
df['newB'] = df.groupby('tmp')['B'].cumsum()
print(df)
A B tmp newA newB
0 0 1 1 0 1
1 1 1 1 1 2
2 1 0 2 2 0
3 1 0 2 3 0
4 1 1 3 4 1
5 0 1 3 0 2
6 0 1 3 0 3
7 1 0 4 1 0
8 0 0 4 0 0
9 1 1 5 1 1
10 1 0 6 2 0
11 0 1 7 0 1
12 1 1 7 1 2
13 1 0 8 2 0
To answer the question you originally posed (I mentioned it in a comment already): you need to put in a safeguard against i == 0. You can do that in two ways:
for j in dfo:
    for i in range(len(dfo)):
        if i == 0:
            continue
        elif dfo.loc[i, j] == 1:
            if dfo.loc[i-1, j] == 1:
                data2c.loc[i, j] = data2c.loc[i-1, j]+1
            elif dfo.loc[i-1, j] == 0:
                data2c.loc[i, j] = 1
        elif dfo.loc[i, j] == 0:
            data2c.loc[i, j] = 0
or start at 1 instead of 0:
for j in dfo:
    for i in range(1, len(dfo)):
        if dfo.loc[i, j] == 1:
            if dfo.loc[i-1, j] == 1:
                data2c.loc[i, j] = data2c.loc[i-1, j]+1
            elif dfo.loc[i-1, j] == 0:
                data2c.loc[i, j] = 1
        elif dfo.loc[i, j] == 0:
            data2c.loc[i, j] = 0
The resulting dataframe:
A B
0 0 1
1 1 2
2 2 0
3 3 0
4 4 1
5 0 2
6 0 3
7 1 0
8 0 0
9 1 1
10 2 0
11 0 1
12 1 2
13 2 0
I have the following code in R :
N = 100 # number of data points
unifvec = runif(N)
d1 = rpois(sum(unifvec < 0.5),la1);d1
[1] 3 1 1 0 0 0 0 2 1 1 1 0 2 1 0 1 2 0 1 0 1 1 0 0 1 1 0 1 1 3 0
[32] 2 2 1 4 0 1 0 1 1 1 1 3 0 0 2 0 1 1 1 1 3
Trying to translate it into Python, I am doing:
la1 = 1
N = 100 # number of data points
unifvec = np.random.uniform(0,1,N)
d1 = np.random.poisson(la1,sum(la1,unifvec < 0.5))
but I receive an error :
TypeError: 'int' object is not iterable
How can I reproduce the same result in Python?
The sum function receives its arguments in the wrong order: in sum(la1, unifvec < 0.5) the integer la1 ends up as the iterable argument, which is why you get the TypeError. Note also that in R, rpois(sum(unifvec < 0.5), la1) uses the count as the number of draws and la1 as the rate, so in Python the count should be computed on its own: passing la1 to sum as well would add it as the start value and inflate the count by 1.
import numpy as np
la1 = 1
N = 100 # number of data points
unifvec = np.random.uniform(0, 1, N)
d1 = np.random.poisson(la1, (unifvec < 0.5).sum())
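A quick way to sanity-check the translation is to verify that the number of Poisson draws equals the number of uniforms below 0.5. A sketch using numpy's Generator API (the seed value here is arbitrary, just for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, arbitrary seed
la1 = 1
N = 100  # number of data points
unifvec = rng.uniform(0, 1, N)
n_draws = int((unifvec < 0.5).sum())
d1 = rng.poisson(la1, n_draws)
print(len(d1) == n_draws)
```

As in the R original, the number of samples is data-dependent, so len(d1) varies from run to run unless the generator is seeded.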
Following is the Dataframe I am starting from:
import pandas as pd
import numpy as np
d= {'PX_LAST':[1,2,3,3,3,1,2,1,1,1,3,3],'ma':[2,2,2,2,2,2,2,2,2,2,2,2],'action':[0,0,1,0,0,-1,0,1,0,0,-1,0]}
df_zinc = pd.DataFrame(data=d)
df_zinc
Now, I need to add a column called 'buy_sell', which:
when 'action'==1, populates with 1 if 'PX_LAST' >'ma', and with -1 if 'PX_LAST'<'ma'
when 'action'==-1, populates with the opposite of the previous non-zero value that was populated
FYI: in my data, the row that needs to be filled with the opposite of the previous non-zero item is always at the same distance from the previous non-zero item (i.e., 2 in the current example). This should facilitate making the code.
The code that I made so far is the following. It seems right to me. Do you have any fixes to propose?
index = 0
while index < df_zinc.shape[0]:
    if df_zinc['action'][index] == 1:
        if df_zinc['PX_LAST'][index] < df_zinc['ma'][index]:
            df_zinc.loc[index,'buy_sell'] = -1
        else:
            df_zinc.loc[index,'buy_sell'] = 1
    elif df_zinc['action'][index] == -1:
        df_zinc['buy_sell'][index] = df_zinc['buy_sell'][index-3]*-1
    index = index + 1
df_zinc
the resulting dataframe would look like this:
df_zinc['buy_sell'] = [0,0,1,0,0,-1,0,-1,0,0,1,0]
df_zinc
So, this would be my suggestion according to the example output (and assuming I understood the question properly):
def buy_sell(row):
    if row['action'] == 0:
        return 0
    if row['PX_LAST'] > row['ma']:
        return 1 * (-1 if row['action'] == 0 else 1)
    else:
        return -1 * (-1 if row['action'] == 0 else 1)
    return 0

df_zinc = df_zinc.assign(buy_sell=df_zinc.apply(buy_sell, axis=1))
df_zinc
This should behave as expected by the rules. It does not take into account the possibility of 'PX_LAST' being equal to 'ma', returning 0 by default, as it was not clear what rule to follow in that scenario.
EDIT
Ok, after the new logic explained, I think this should do the trick:
def assign_buysell(df):
    last_nonzero = None
    def buy_sell(row):
        nonlocal last_nonzero
        if row['action'] == 0:
            return 0
        if row['action'] == 1:
            if row['PX_LAST'] < row['ma']:
                last_nonzero = -1
            elif row['PX_LAST'] > row['ma']:
                last_nonzero = 1
        elif row['action'] == -1:
            last_nonzero = last_nonzero * -1
        return last_nonzero
    return df.assign(buy_sell=df.apply(buy_sell, axis=1))

df_zinc = assign_buysell(df_zinc)
This solution is independent of how long ago the nonzero value was seen: it simply remembers the last nonzero value and returns its opposite when action is -1.
You can use np.select, and use np.nan as a label for the rows that satisfy the third condition:
c1 = df_zinc.action.eq(1) & df_zinc.PX_LAST.gt(df_zinc.ma)
c2 = df_zinc.action.eq(1) & df_zinc.PX_LAST.lt(df_zinc.ma)
c3 = df_zinc.action.eq(-1)
df_zinc['buy_sell'] = np.select([c1,c2, c3], [1, -1, np.nan])
Now in order to fill NaNs with the value from n rows above (in this case 3), you can fillna with a shifted version of the dataframe:
df_zinc['buy_sell'] = df_zinc.buy_sell.fillna(df_zinc.buy_sell.shift(3)*-1)
Output
PX_LAST ma action buy_sell
0 1 2 0 0.0
1 2 2 0 0.0
2 3 2 1 1.0
3 3 2 0 0.0
4 3 2 0 0.0
5 1 2 -1 -1.0
6 2 2 0 0.0
7 1 2 1 -1.0
8 1 2 0 0.0
9 1 2 0 0.0
10 3 2 -1 1.0
11 3 2 0 0.0
I would use np.select for this, since you have multiple conditions:
conditions = [
    (df_zinc['action'] == 1) & (df_zinc['PX_LAST'] > df_zinc['ma']),
    (df_zinc['action'] == 1) & (df_zinc['PX_LAST'] < df_zinc['ma']),
    (df_zinc['action'] == -1) & (df_zinc['PX_LAST'] > df_zinc['ma']),
    (df_zinc['action'] == -1) & (df_zinc['PX_LAST'] < df_zinc['ma'])
]
choices = [1, -1, 1, -1]
df_zinc['buy_sell'] = np.select(conditions, choices, default=0)
result
print(df_zinc)
PX_LAST ma action buy_sell
0 1 2 0 0
1 2 2 0 0
2 3 2 1 1
3 3 2 0 0
4 3 2 0 0
5 1 2 -1 -1
6 2 2 0 0
7 1 2 1 -1
8 1 2 0 0
9 1 2 0 0
10 3 2 -1 1
11 3 2 0 0
Here is my solution, using shift() to grab the value from 3 rows above:
df_zinc['buy_sell'] = 0
df_zinc.loc[(df_zinc['action'] == 1) & (df_zinc['PX_LAST'] < df_zinc['ma']), 'buy_sell'] = -1
df_zinc.loc[(df_zinc['action'] == 1) & (df_zinc['PX_LAST'] > df_zinc['ma']), 'buy_sell'] = 1
df_zinc.loc[df_zinc['action'] == -1, 'buy_sell'] = -df_zinc['buy_sell'].shift(3)
df_zinc['buy_sell'] = df_zinc['buy_sell'].astype(int)
print(df_zinc)
output:
PX_LAST ma action buy_sell
0 1 2 0 0
1 2 2 0 0
2 3 2 1 1
3 3 2 0 0
4 3 2 0 0
5 1 2 -1 -1
6 2 2 0 0
7 1 2 1 -1
8 1 2 0 0
9 1 2 0 0
10 3 2 -1 1
11 3 2 0 0
matrix = []
for index, value in enumerate(['A','C','G','T']):
    matrix.append([])
    matrix[index].append(value + ':')
    for i in range(len(lines[0])):
        total = 0
        for sequence in lines:
            if sequence[i] == value:
                total += 1
        matrix[index].append(total)

unity = ''
for i in range(len(lines[0])):
    column = []
    for row in matrix:
        column.append(row[1:][i])
    maximum = column.index(max(column))
    unity += ['A', 'C', 'G', 'T'][maximum]

print("Unity: " + unity)
for row in matrix:
    print(' '.join(map(str, row)))
OUTPUT:
Unity: GGCTACGC
A: 1 2 0 2 3 2 0 0
C: 0 1 4 2 1 3 2 4
G: 3 3 2 0 1 2 4 1
T: 3 1 1 3 2 0 1 2
With this code I get this matrix but I want to form the matrix like this:
A C G T
G: 1 0 3 3
G: 2 1 3 1
C: 0 4 2 1
T: 2 2 0 3
A: 3 1 1 2
C: 2 3 2 0
G: 0 2 4 1
C: 0 4 1 2
But I don't know how. I hope someone can help me. Thanks in advance for any answers.
The sequences are:
AGCTACGT
TAGCTAGC
TAGCTACG
GCTAGCGC
TGCTAGCC
GGCTACGT
GTCACGTC
You need to transpose your matrix. I've added comments in the code below to explain what has been changed to make the table.
matrix = []
for index, value in enumerate(['A','C','G','T']):
    matrix.append([])
    # Don't put colons in column headers
    matrix[index].append(value)
    for i in range(len(lines[0])):
        total = 0
        for sequence in lines:
            if sequence[i] == value:
                total += 1
        matrix[index].append(total)

unity = ''
for i in range(len(lines[0])):
    column = []
    for row in matrix:
        column.append(row[1:][i])
    maximum = column.index(max(column))
    unity += ['A', 'C', 'G', 'T'][maximum]

# Transpose matrix
matrix = list(map(list, zip(*matrix)))
# Print header with tabs to make it look pretty
print('\t' + '\t'.join(matrix[0]))
# Print rows in matrix
for row, unit in zip(matrix[1:], unity):
    print(unit + ':\t' + '\t'.join(map(str, row)))
The following will be printed:
A C G T
G: 1 0 3 3
G: 2 1 3 1
C: 0 4 2 1
T: 2 2 0 3
A: 3 1 1 2
C: 2 3 2 0
G: 0 2 4 1
C: 0 4 1 2
Alternatively, you can convert your matrix to a pandas DataFrame and then use its transpose function:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.transpose.html
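A minimal sketch of that approach. It assumes matrix is the list of lists built by the code in the question, with the base label first and the counts after it (the counts below are just the first four positions from the question's output, for illustration):

```python
import pandas as pd

# Each row: base label followed by per-position counts (truncated example data)
matrix = [['A', 1, 2, 0, 2],
          ['C', 0, 1, 4, 2],
          ['G', 3, 3, 2, 0],
          ['T', 3, 1, 1, 3]]

df = pd.DataFrame([row[1:] for row in matrix],
                  index=[row[0] for row in matrix])
print(df.T)  # transpose: positions become rows, bases become columns
```

After transposing, the base labels become the column headers, which matches the desired layout.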
I have a df like so:
Count
1
0
1
1
0
0
1
1
1
0
and I want to return a 1 in a new column if there are two or more consecutive occurrences of 1 in Count, and a 0 if not. So each row in the new column gets a 1 when this criterion is met in Count. My desired output would then be:
Count New_Value
1 0
0 0
1 1
1 1
0 0
0 0
1 1
1 1
1 1
0 0
I am thinking I may need to use itertools, but I have been reading about it and haven't come across what I need yet. I would also like to be able to use this method to count any number of consecutive occurrences, not just 2. For example, sometimes I need to count 10 consecutive occurrences; I just use 2 in the example here.
You could:
df['consecutive'] = df.Count.groupby((df.Count != df.Count.shift()).cumsum()).transform('size') * df.Count
to get:
Count consecutive
0 1 1
1 0 0
2 1 2
3 1 2
4 0 0
5 0 0
6 1 3
7 1 3
8 1 3
9 0 0
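Unpacking that one-liner into steps on the question's data (a sketch, separate from the answer's code):

```python
import pandas as pd

df = pd.DataFrame({'Count': [1, 0, 1, 1, 0, 0, 1, 1, 1, 0]})
change = df.Count != df.Count.shift()                  # True at the start of each run
run_id = change.cumsum()                               # label each run of equal values
run_len = df.Count.groupby(run_id).transform('size')   # length of the run each row belongs to
df['consecutive'] = run_len * df.Count                 # zero out the runs of 0s
print(df['consecutive'].tolist())
```

Multiplying by Count leaves the run length only on rows that are 1, which is exactly the intermediate column shown above.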
From here you can, for any threshold:
threshold = 2
df['consecutive'] = (df.consecutive >= threshold).astype(int)
to get:
Count consecutive
0 1 0
1 0 0
2 1 1
3 1 1
4 0 0
5 0 0
6 1 1
7 1 1
8 1 1
9 0 0
or, in a single step:
(df.Count.groupby((df.Count != df.Count.shift()).cumsum()).transform('size') * df.Count >= threshold).astype(int)
In terms of efficiency, using pandas methods provides a significant speedup when the size of the problem grows:
df = pd.concat([df for _ in range(1000)])
%timeit (df.Count.groupby((df.Count != df.Count.shift()).cumsum()).transform('size') * df.Count >= threshold).astype(int)
1000 loops, best of 3: 1.47 ms per loop
compared to:
%%timeit
l = []
for k, g in groupby(df.Count):
    size = sum(1 for _ in g)
    if k == 1 and size >= 2:
        l = l + [1]*size
    else:
        l = l + [0]*size
pd.Series(l)
10 loops, best of 3: 76.7 ms per loop
Not sure if this is optimized, but you can give it a try:
from itertools import groupby
import pandas as pd

l = []
for k, g in groupby(df.Count):
    size = sum(1 for _ in g)
    if k == 1 and size >= 2:
        l = l + [1]*size
    else:
        l = l + [0]*size
df['new_Value'] = pd.Series(l)
df
df
Count new_Value
0 1 0
1 0 0
2 1 1
3 1 1
4 0 0
5 0 0
6 1 1
7 1 1
8 1 1
9 0 0