Dataframe split columns value, how to solve error message?

Dataframe split columns value, how to solve error message? - python

I have a panda dataframe with the following columns:
Stock ROC5 ROC20 ROC63 ROCmean
0 IBGL.SW -0.59 3.55 6.57 3.18
0 EHYA.SW 0.98 4.00 6.98 3.99
0 HIGH.SW 0.94 4.22 7.18 4.11
0 IHYG.SW 0.56 2.46 6.16 3.06
0 HYGU.SW 1.12 4.56 7.82 4.50
0 IBCI.SW 0.64 3.57 6.04 3.42
0 IAEX.SW 8.34 18.49 14.95 13.93
0 AGED.SW 9.45 24.74 28.13 20.77
0 ISAG.SW 7.97 21.61 34.34 21.31
0 IAPD.SW 0.51 6.62 19.54 8.89
0 IASP.SW 1.08 2.54 12.18 5.27
0 RBOT.SW 10.35 30.53 39.15 26.68
0 RBOD.SW 11.33 30.50 39.69 27.17
0 BRIC.SW 7.24 11.08 75.60 31.31
0 CNYB.SW 1.14 4.78 8.36 4.76
0 FXC.SW 5.68 13.84 19.29 12.94
0 DJSXE.SW 3.11 9.24 6.44 6.26
0 CSSX5E.SW -0.53 5.29 11.85 5.54
How can I write in the dataframe a new columns "Symbol" with the stock without ".SW".
Example first row result should be IBGL (modified value IBGL.SW).
Example last row result should be CSSX5E (splited value SSX5E.SW).
If I send the following command:
new_df['Symbol'] = new_df.loc[:, ('Stock')].str.split('.').str[0]
Than I receive an error message:
:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
new_df['Symbol'] = new_df.loc[:, ('Stock')].str.split('.').str[0]
How can I solve this problem?
Thanks a lot for your support.

METHOD 1:
You can do a vectorized operation by str.get(0) -
df['SYMBOL'] = df['Stock'].str.split('.').str.get(0)
METHOD 2:
You can do another vectorized operation by using expand=True in str.split() and then getting the first column.
df['SYMBOL'] = df['Stock'].str.split('.', expand = True)[0]
METHOD 3:
Or you can write a custom lambda function with apply (for more complex processes). Note, this is slower but good if you have your own UDF.
df['SYMBOL'] = df['Stock'].apply(lambda x:x.split('.')[0])

This is not an error, but a warning as you may have probably noticed your script finishes its execution.
edite: Given your comments it seems your issues generate previously in the code, therefore I suggest you use the following:
new_df = new_df.copy(deep=False)
And then proceed to solve it with:
new_df.loc['Symbol'] = new_df['Stock'].str.split('.').str[0]

new_df = new_df.copy()
new_df['Symbol'] = new_df.Stock.str.replace('.SW','')

Related

Combine a row with column in dataFrame and show the corresponding values

So I want to show this data in just two columns. For example, I want to turn this data
Year Jan Feb Mar Apr May Jun
1997 3.45 2.15 1.89 2.03 2.25 2.20
1998 2.09 2.23 2.24 2.43 2.14 2.17
1999 1.85 1.77 1.79 2.15 2.26 2.30
2000 2.42 2.66 2.79 3.04 3.59 4.29
into this
Date Price
Jan-1977 3.45
Feb-1977 2.15
Mar-1977 1.89
Apr-1977 2.03
....
Jan-2000 2.42
Feb-2000 2.66
So far, I have read about how to combine two columns into another dataframe using .apply() .agg(), but no info how to combine them as I showed above.
import pandas as pd
df = pd.read_csv('matrix-A.csv', index_col =0 )
matrix_b = ({})
new = pd.DataFrame(matrix_b)
new["Date"] = df['Year'].astype(float) + "-" + df["Dec"]
print(new)
I have tried this way, but it of course does not work. I have also tried using pd.Series() but no success
I want to ask whether there is any site where I can learn how to do this, or does anybody know correct way to solve this?

Another possible solution, which is based on pandas.DataFrame.stack:
out = df.set_index('Year').stack()
out.index = ['{}_{}'.format(j, i) for i, j in out.index]
out = out.reset_index()
out.columns = ['Date', 'Value']
Output:
Date Value
0 Jan_1997 3.45
1 Feb_1997 2.15
2 Mar_1997 1.89
3 Apr_1997 2.03
4 May_1997 2.25
....
19 Feb_2000 2.66
20 Mar_2000 2.79
21 Apr_2000 3.04
22 May_2000 3.59
23 Jun_2000 4.29

You can first convert it to long-form using melt. Then, create a new column for Date by combining two columns.
long_df = pd.melt(df, id_vars=['Year'], var_name='Month', value_name="Price")
long_df['Date'] = long_df['Month'] + "-" + long_df['Year'].astype('str')
long_df[['Date', 'Price']]
If you want to sort your date column, here is a good resource. Follow those instructions after melting and before creating the Date column.

You can use pandas.DataFrame.melt :
out = (
df
.melt(id_vars="Year", var_name="Month", value_name="Price")
.assign(month_num= lambda x: pd.to_datetime(x["Month"] , format="%b").dt.month)
.sort_values(by=["Year", "month_num"])
.assign(Date= lambda x: x.pop("Month") + "-" + x.pop("Year").astype(str))
.loc[:, ["Date", "Price"]]
)
# Output :
print(out)

Date Price
0 Jan-1997 3.45
4 Feb-1997 2.15
8 Mar-1997 1.89
12 Apr-1997 2.03
16 May-1997 2.25
.. ... ...
7 Feb-2000 2.66
11 Mar-2000 2.79
15 Apr-2000 3.04
19 May-2000 3.59
23 Jun-2000 4.29
[24 rows x 2 columns]

how to subtract first value with all the rest value

I have data like this:
timestamp high windSpeed windDir windU windV
04/05/2019 10:02 100 4.39 179.1 -0.14 8.53
150 2.44 164.5 -1.26 4.57
200 4.29 180.9 0.12 8.32
04/05/2019 10:03 100 4.39 179.1 -0.15 8.53
150 2.44 164.5 -1.26 4.57
200 4.29 180.9 0.12 8.32
04/05/2019 10:04 100 4.52 179.1 -0.16 8.79
150 2.15 162.8 -1.24 4
200 3.34 181.9 0.21 6.49
04/05/2019 10:05 100 4.52 179.1 -0.17 8.79
150 2.15 162.8 -1.24 4
200 3.34 181.9 0.21 6.49
and I want to subtract the value from higher level with lower level in each time.This is what I got so far, but this one only give me 1 value. Anyone can help me please? thank you.
for timestamp, group in grouped:
HeightIndices = group["high"].keys()
for heightIndex in range(HeightIndices[0], HeightIndices[0] + len(HeightIndices) - 1):
windMag = sqrt(group["windU"] ** 2 + group["windV"] ** 2)
diffMag = windMag[heightIndex+1]-windMag[heightIndex]

I'm not sure if I'm accomplishing what you're asking, but based on my looking at your code, it seems you are trying to get the difference between the i-th and i+1-th index in the column "high" and call that variable diffMag. If that's the case you can probably use one of the two methods.
Solution 1:
diff_mag = []
for i in range(len(wind['height'])-1):
diff_mag[i] = wind['height'][i+1] - wind['height'][i]
Solution 2:
Use numpy diff.
np.diff(wind['height'])
I made the assumption you're using pandas here based on what your code block looks like. Hope that helps.
EDIT
Okay..I think I understand what you are saying now.
I think this should work:
windMag = []
for timestamp, group in grouped:
HeightIndices = group["high"].keys()
for heightIndex in range(HeightIndices[0], HeightIndices[0] + len(HeightIndices) - 1):
windMag.append(sqrt(group["windU"] ** 2 + group["windV"] ** 2))
diffMag = np.diff(windMag)

Convert fractional values to decimal in Pandas

I have a dataframe with messy data.
df:
1 2 3
-- ------- ------- -------
0 123/100 221/100 103/50
1 49/100 333/100 223/50
2 153/100 81/50 229/100
3 183/100 47/25 31/20
4 2.23 3.2 3.04
5 2.39 3.61 2.69
I want the fractional values to be converted to decimal with the conversion formula being
e.g:
123/100 = (123/100 + 1) = 2.23
333/100 = (333/100 +1) = 4.33
The calculation is fractional value + 1
And of course leave the decimal values as is.
How can I do it in Pandas and Python?

A simple way to do this is to first define a conversion function that will be applied to each element in a column:
def convert(s):
if '/' in s: # is a fraction
num, den = s.split('/')
return 1+(int(num)/int(den))
else:
return float(s)
Then use the .apply function to run all elements of a column through this function:
df['1'] = df['1'].apply(convert)
Result:
df['1']:
0 2.23
1 1.49
2 2.53
3 2.83
4 2.23
5 2.39
Then repeat on any other column as needed.

If you trust the data in your dataset, the simplest way is to use eval or better, suggested by #mozway, pd.eval:
>>> df.replace(r'(\d+)/(\d+)', r'1+\1/\2', regex=True).applymap(pd.eval)
1 2 3
0 2.23 3.21 3.06
1 1.49 4.33 5.46
2 2.53 2.62 3.29
3 2.83 2.88 2.55
4 2.23 3.20 3.04
5 2.39 3.61 2.69

extract only certain rows in a dataframe

I have a dataframe like this:
Code Date Open High Low Close Volume VWAP TWAP
0 US_GWA_BTC 2014-04-01 467.28 488.62 467.28 479.56 74,776.48 482.76 482.82
1 GWA_BTC 2014-04-02 479.20 494.30 431.32 437.08 114,052.96 460.19 465.93
2 GWA_BTC 2014-04-03 437.33 449.74 414.41 445.60 91,415.08 432.29 433.28
.
316 MWA_XRP_US 2018-01-19 1.57 1.69 1.48 1.53 242,563,870.44 1.59 1.59
317 MWA_XRP_US 2018-01-20 1.54 1.62 1.49 1.57 140,459,727.30 1.56 1.56
I want to filter out rows where code which has GWA infront of it.
I tried this code but it's not working.
df.set_index("Code").filter(regex='[GWA_]*', axis=0)

Try using startswith:
df[df.Code.str.startswith('GWA')]

Cannot convert input to Timestamp, bday_range(...) - Pandas/Python

Looking to generate a number for the days in business days between current date and the end of the month of a pandas dataframe.
E.g. 26/06/2017 - 4, 23/06/2017 - 5
I'm having trouble as I keep getting a Type Error:
TypeError: Cannot convert input to Timestamp
From line:
result['bdaterange'] = pd.bdate_range(pd.to_datetime(result['dte'], unit='ns').values, pd.to_datetime(result['bdate'], unit='ns').values)
I have a Data Frame result with the column dte in a date format and I'm trying to create a new column (bdaterange) as a simple integer/float that I can use to see how far from month end in business days it has.
Sample data:
bid ask spread dte day bdate
01:49:00 2.17 3.83 1.66 2016-12-20 20.858333 2016-12-30
02:38:00 2.2 3.8 1.60 2016-12-20 20.716667 2016-12-30
22:15:00 2.63 3.12 0.49 2016-12-20 21.166667 2016-12-30
03:16:00 1.63 2.38 0.75 2016-12-21 21.391667 2016-12-30
07:11:00 1.46 2.54 1.08 2016-12-21 21.475000 2016-12-30
I've tried BDay() and using that the day cannot be 6 & 7 in the calculation but have not got anywhere. I came across bdate_range which I believe will be exactly what I'm looking for, but the closest I've got gives me the error Cannot convert input to Timestamp.
My attempt is:
result['bdate'] = pd.to_datetime(result['dte']) + BMonthEnd(0)
result['bdaterange'] = pd.bdate_range(pd.to_datetime(result['dte'], unit='ns').values, pd.to_datetime(result['bdate'], unit='ns').values)
print(result['bdaterange'])
Not sure how to solve the error though.

I think you need length of bdate_range for each row, so need custom function with apply:
#convert only once to datetime
result['dte'] = pd.to_datetime(result['dte'])
f = lambda x: len(pd.bdate_range(x['dte'], x['dte'] + pd.offsets.BMonthEnd(0)))
result['bdaterange'] = result.apply(f, axis=1)
print (result)
bid ask spread dte day bdaterange
01:49:00 2.17 3.83 1.66 2016-12-20 20.858333 9
02:38:00 2.20 3.80 1.60 2016-12-20 20.716667 9
22:15:00 2.63 3.12 0.49 2016-12-20 21.166667 9
03:16:00 1.63 2.38 0.75 2016-12-21 21.391667 8
07:11:00 1.46 2.54 1.08 2016-12-21 21.475000 8

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Dataframe split columns value, how to solve error message? - python

new_df = new_df.copy() new_df['Symbol'] = new_df.Stock.str.replace('.SW','')

Related

Combine a row with column in dataFrame and show the corresponding values

how to subtract first value with all the rest value

Convert fractional values to decimal in Pandas

extract only certain rows in a dataframe

Cannot convert input to Timestamp, bday_range(...) - Pandas/Python

Categories

Resources