Python Pandas DataFrame: Difference Between Masking the Left Side Only and Both Sides

Let's say we have a pandas DataFrame like the one below:
import pandas as pd
df = pd.DataFrame([{'col1':'a', 'col2':'b'}, {'col1':None, 'col2':'d'}, {'col1':'e', 'col2':'f'}, {'col1':None, 'col2':'1'}])
Is there any difference between these two snippets?
df.loc[~df['col1'].isnull(), 'col1'] = df['col1'].str.upper()
print(df)
vs
mask = ~df['col1'].isnull()
df.loc[mask, 'col1'] = df[mask]['col1'].str.upper()
print(df)
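A minimal sketch, assuming a recent pandas version, that runs both forms on fresh copies of the question's data and compares the results. In the first form str.upper() is computed for every row, but .loc aligns the right-hand Series on the index, so only the masked rows are written. Reading with df[mask]['col1'] is fine; df.loc[mask, 'col1'] is used here as the single-step equivalent. make_df is a hypothetical helper introduced only for this comparison.

```python
import pandas as pd


def make_df():
    # Hypothetical helper: rebuilds the question's data so each variant starts fresh
    return pd.DataFrame([{'col1': 'a', 'col2': 'b'}, {'col1': None, 'col2': 'd'},
                         {'col1': 'e', 'col2': 'f'}, {'col1': None, 'col2': '1'}])


# Variant 1: mask only the left side; str.upper() runs on every row,
# but .loc aligns the right-hand Series on the index and writes only masked rows
df1 = make_df()
df1.loc[~df1['col1'].isnull(), 'col1'] = df1['col1'].str.upper()

# Variant 2: mask both sides; only the non-null rows are upper-cased
df2 = make_df()
mask = ~df2['col1'].isnull()
df2.loc[mask, 'col1'] = df2.loc[mask, 'col1'].str.upper()

print(df1.equals(df2))  # True
```

With a Series on the right-hand side, the difference is mostly that the first form does unnecessary work on the null rows. If the right-hand side were a plain NumPy array, there would be no index alignment, and the first form would typically raise a length-mismatch error (4 values for 2 selected rows).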

Related

DataFrame returns empty after .update()

I am trying to create a new DataFrame that contains a calculation based on an original DataFrame.
For that purpose, I run a for loop that does the calculation for each column, but I still get the original empty DataFrame and I can't see where the error comes from.
Could someone help?
import yfinance as yf
import pandas as pd
df = yf.download(["YPFD.BA", "GGAL.BA"], period='6mo')
df2 = pd.DataFrame()
for i in ["YPFD.BA", "GGAL.BA"]:
    df2.update(df["Volume"][i] * df["Close"][i])
df2
I expected to get a new DataFrame that keeps the original index but holds the values calculated from the original DataFrame.
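DataFrame.update() modifies the caller in place and returns None, and it only overwrites cells whose index and column labels already exist in the caller; because df2 starts out empty, there is nothing for it to update. I think this is what you are looking to do, assigning each product as a new column instead: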
import yfinance as yf
import pandas as pd
df = yf.download(["YPFD.BA", "GGAL.BA"], period='6mo')
df2 = pd.DataFrame()
for i in ["YPFD.BA", "GGAL.BA"]:
    df2[i] = df["Volume"][i] * df["Close"][i]
df2
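A minimal offline sketch of the column-assignment answer, plus a loop-free variant. It assumes, as the question's df["Volume"][i] indexing implies, that the multi-ticker download has two-level (field, ticker) columns; the prices and volumes below are made-up stand-ins so the example runs without network access. Selecting a top-level field returns a frame with one column per ticker, and multiplying two such frames aligns on both index and columns.

```python
import pandas as pd

# Made-up stand-in for the yf.download(...) result: columns are (field, ticker)
tickers = ["YPFD.BA", "GGAL.BA"]
index = pd.date_range("2023-01-02", periods=3, freq="D")
columns = pd.MultiIndex.from_product([["Close", "Volume"], tickers])
df = pd.DataFrame([[10.0, 20.0, 100, 200],
                   [11.0, 21.0, 110, 210],
                   [12.0, 22.0, 120, 220]], index=index, columns=columns)

# Loop version: each assignment adds a new column, aligned on df's index
df2 = pd.DataFrame()
for t in tickers:
    df2[t] = df["Volume"][t] * df["Close"][t]

# Loop-free version: Volume frame times Close frame, matched ticker by ticker
df3 = df["Volume"] * df["Close"]

print(df3.equals(df2))  # True
```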

Creating an empty Pandas DataFrame column with a fixed first value then filling it with a formula

I'd like to create an empty column in an existing DataFrame whose first value is set to 100. After that, I'd like to iterate and fill the rest of the column with a formula such as row[C][t-1] * (1 + row[B][t]).
This is very similar to "Creating an empty Pandas DataFrame, then filling it?", but the difference is that the first value of column 'C' is fixed at 100 instead of everything being computed by formulas.
import datetime
import pandas as pd
import numpy as np
todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date-datetime.timedelta(10), periods=10, freq='D')
columns = ['A','B','C']
df_ = pd.DataFrame(index=index, columns=columns)
df_ = df_.fillna(0)
data = np.array([np.arange(10)]*3).T
df = pd.DataFrame(data, index=index, columns=columns)
df['B'] = df['A'].pct_change()
df['C'] = df['C'].shift() * (1+df['B'])
## how do I set 2016-10-03 in Column 'C' to equal 100 and then calc consecutively from there?
df
Try this. Unfortunately, something like a for loop is likely needed, because each row is calculated from the prior row's value, which has to be saved in a variable as the loop moves down the rows (c_column in my example):
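c_column = []
c_column.append(100)  # first value of C is fixed at 100
for x, i in enumerate(df['B']):
    if x > 0:  # row 0 is already set, so skip its B value
        c_column.append(c_column[x-1] * (1 + i))  # C[t] = C[t-1] * (1 + B[t])
df['C'] = c_column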
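The recurrence C[t] = C[t-1] * (1 + B[t]) with C[0] = 100 unrolls to 100 times the running product of (1 + B), so Series.cumprod() can build the column without an explicit loop. This is a swapped-in technique, not the answer's; a minimal sketch with made-up returns (the question's own data gives an infinite second return, because column A starts at 0). The first B value is NaN, as pct_change() produces, so its growth factor is set to 1 before taking the product.

```python
import numpy as np
import pandas as pd

# Made-up daily returns; row 0 is NaN, like the first value of pct_change()
df = pd.DataFrame({"B": [np.nan, 0.10, -0.05, 0.02]},
                  index=pd.date_range("2016-10-03", periods=4, freq="D"))

# Compact form of the answer's loop: C[0] = 100, C[t] = C[t-1] * (1 + B[t])
c_column = [100.0]
for b in df["B"].iloc[1:]:
    c_column.append(c_column[-1] * (1 + b))
df["C_loop"] = c_column

# Vectorized: growth factor 1 for the first row, then a running product
growth = (1 + df["B"]).fillna(1)
df["C_vec"] = 100 * growth.cumprod()

print(np.allclose(df["C_loop"], df["C_vec"]))  # True
```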

Pandas: Fill new column by condition row-wise

import pandas as pd
import numpy as np
df = pd.DataFrame([np.random.rand(100),100*[0.1],100*[0.3]]).T
df.columns = ["value","lower","upper"]
df.head()
How can I create a new column which indicates whether value lies between lower and upper?
You can use Series.between for this purpose; with Series bounds it compares row by row, and both bounds are inclusive by default:
df['new_col'] = df['value'].between(df['lower'], df['upper'])
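A small deterministic sketch (made-up values instead of random ones) showing the default inclusive behaviour at the edges, the equivalent explicit comparison, and the inclusive parameter. String values for inclusive such as "neither" assume pandas 1.3 or later.

```python
import pandas as pd

# Made-up values; the scalar bounds are broadcast to every row by the constructor
df = pd.DataFrame({"value": [0.05, 0.1, 0.2, 0.3, 0.5],
                   "lower": 0.1, "upper": 0.3})

# Default: both bounds inclusive, compared element-wise row by row
df["new_col"] = df["value"].between(df["lower"], df["upper"])

# Same check written out with comparison operators
manual = (df["value"] >= df["lower"]) & (df["value"] <= df["upper"])

# Exclude both bounds
df["strict"] = df["value"].between(df["lower"], df["upper"], inclusive="neither")

print(df["new_col"].tolist())  # [False, True, True, True, False]
print(df["strict"].tolist())   # [False, False, True, False, False]
```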

concat a DataFrame with a Series in Pandas

Can someone explain what is wrong with this pandas concat code, and why the DataFrame remains empty? I am using the Anaconda distribution, and as far as I remember it was working before.
You want to use this form:
result = pd.concat([dataframe, series], axis=1)
pd.concat(...) does not modify the original DataFrame in place; it returns the concatenated result, so you need to assign that result somewhere, e.g.:
>>> import pandas as pd
>>> s = pd.Series([1,2,3])
>>> df = pd.DataFrame()
>>> df = pd.concat([df, s], axis=1) # We assign the result back into df
>>> df
   0
0  1
1  2
2  3
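A short sketch contrasting the original object with the returned one; the column names and values are made up. Naming the Series beforehand gives the new column a label instead of 0.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
s = pd.Series([10, 20, 30], name="b")  # the Series name becomes the column label

result = pd.concat([df, s], axis=1)  # returns a new DataFrame; df is not modified

print(df.columns.tolist())      # ['a']
print(result.columns.tolist())  # ['a', 'b']
```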

Pandas dataframe: Can you assign a label for the column names and/or the df values?

When you define a DataFrame in pandas in the following manner:
df = pd.DataFrame([['07-Dec-2015', 1,2],
['08-Dec-2015', 3,4],
['09-Dec-2015', 5,6]],
columns=['Date','FR','UK'])
df.set_index('Date')
Out[1]:
             FR  UK
Date
07-Dec-2015   1   2
08-Dec-2015   3   4
09-Dec-2015   5   6
Is there a way to assign a label to the columns (let's say 'Country') and another label to the DataFrame values (let's say 'Hits')? I would like it to look like the output of the unstack() example below.
As a side note: that target DataFrame was created as follows:
df = pd.DataFrame()
df['Date'] = ['07-Dec-2015','07-Dec-2015','08-Dec-2015','08-Dec-2015','09-Dec-2015','09-Dec-2015']
df['Country'] = ['UK','FR','UK','FR','UK','FR']
df['Hits'] = [2,1,4,3,6,5]
df = df.set_index(['Date','Country'])
df.unstack()
However, this is not good enough for my purpose, because in my Python application the DataFrame constructor receives a NumPy array for the data and a datetime vector for the index, so broadly speaking the call looks like: pd.DataFrame(numpy.ndarray, columns=columnNames, index=DatetimeIndex)
Thanks in advance
You could:
import numpy as np
import pandas as pd
from datetime import date
df = pd.DataFrame(np.random.random((10, 2)), index=pd.date_range(start=date(2015, 1, 1), periods=10, freq='D'))
df.index.name = 'Date'
df.columns = pd.MultiIndex.from_product([['Hits'], ['UK', 'FR']], names=['', 'Country'])
See MultiIndex docs.
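A sketch that follows the constructor shape described in the question (a NumPy array plus a DatetimeIndex), reusing the question's three dates and values. Naming a plain column Index gives the column axis the 'Country' label; passing a two-level MultiIndex to the constructor adds the 'Hits' label above it, so no reassignment of df.columns is needed afterwards.

```python
import numpy as np
import pandas as pd

values = np.array([[1, 2], [3, 4], [5, 6]])
index = pd.DatetimeIndex(["2015-12-07", "2015-12-08", "2015-12-09"], name="Date")

# Option 1: label only the column axis; a plain Index can carry a name
by_country = pd.DataFrame(values, index=index,
                          columns=pd.Index(["FR", "UK"], name="Country"))

# Option 2: two-level column index so the values are labelled 'Hits' as well
hits = pd.DataFrame(values, index=index,
                    columns=pd.MultiIndex.from_product([["Hits"], ["FR", "UK"]],
                                                       names=[None, "Country"]))

print(hits)
print(hits["Hits"]["UK"].tolist())  # [2, 4, 6]
```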
