Pandas DataFrame Output Format - python

I need help reformatting my DataFrame output for stock closing prices.
Currently the stock symbols appear as column headers, but I would like them displayed in rows. Current df_output: https://i.stack.imgur.com/u4jEk.png
I would like it displayed as below:
[image: results]
This is the code that currently builds df_output (not sure if this is the reason):
prices_df = pd.DataFrame({
    a: {x['formatted_date']: x['adjclose'] for x in data[a]['prices']} for a in assets})
FULL CODE:
import pandas as pd
import numpy as np
import yfinance as yf
from yahoofinancials import YahooFinancials
from datetime import datetime
import time
start_time = time.time()
# Read the portfolio sheet; the stock symbols are in the 'Stock Code' column
df = pd.read_excel(r'C:\Users\Ryan\Desktop\Stock Portfolio\My Portfolio.xlsx', sheet_name=0, skiprows=2)
list1 = list(df['Stock Code'])
assets = list1
yahoo_financials = YahooFinancials(assets)
# Daily prices from 1 January to 31 December of the current year
data = yahoo_financials.get_historical_price_data(
    start_date=str(datetime.now().date().replace(month=1, day=1)),
    end_date=str(datetime.now().date().replace(month=12, day=31)),
    time_interval='daily')
# One column per symbol, indexed by date
prices_df = pd.DataFrame({
    a: {x['formatted_date']: x['adjclose'] for x in data[a]['prices']} for a in assets})

Look at pandas functions such as wide_to_long (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.wide_to_long.html) and DataFrame.pivot_table (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot_table.html), which convert between long and wide formats.

Try this:
prices_df.rename_axis('Date').reset_index().melt('Date', var_name='Symbol', value_name='Price')
Output:
           Date Symbol       Price
0    2020-01-02     FB  209.779999
1    2020-01-03     FB  208.669998
2    2020-01-06     FB  212.600006
3    2020-01-07     FB  213.059998
4    2020-01-08     FB  215.220001
..          ...    ...         ...
973  2020-08-18   CDNS  109.150002
974  2020-08-19   CDNS  108.529999
975  2020-08-20   CDNS  111.260002
976  2020-08-21   CDNS  110.570000
977  2020-08-24   CDNS  111.260002
[978 rows x 3 columns]
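To make the wide-to-long step easier to follow, here is a minimal sketch on a tiny made-up frame. The prices and the names wide, long_df and wide_again are illustrative assumptions, not data from the question; only the layout (dates in the index, one column per symbol) mirrors prices_df.

```python
import pandas as pd

# Made-up prices in the same wide layout as prices_df:
# dates in the index, one column per stock symbol.
wide = pd.DataFrame(
    {"FB": [209.78, 208.67], "CDNS": [71.21, 70.96]},
    index=["2020-01-02", "2020-01-03"],
)

long_df = (
    wide.rename_axis("Date")  # name the index so reset_index creates a 'Date' column
    .reset_index()            # move the dates out of the index
    .melt(id_vars="Date", var_name="Symbol", value_name="Price")
)
print(long_df)

# pivot goes the other way: back to one column per symbol
wide_again = long_df.pivot(index="Date", columns="Symbol", values="Price")
print(wide_again)
```

melt stacks the symbol columns one after another, so each (date, symbol) pair ends up on its own row.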

Related

Python: convert str object to dataframe

I have a string object as follows:
time,exchange,fsym,tsym,close,high,low,open,volumefrom,volumeto
1660003200,NYSE,BTC,USD,100.1,103,99.1,100,30,10000
1660003260,NYSE,BTC,USD,101.3,104,100.1,102,39,12000
1660003320,NYSE,BTC,USD,100.9,103.2,98,100,32,100230
I am trying to convert this string object to a DataFrame. I have tried adding brackets "[]" around the data but that still didn't work. Any suggestions would be greatly appreciated.
Looks like your string is in CSV format. You can convert it into a pandas DataFrame by wrapping it in io.StringIO and passing it to read_csv:
from io import StringIO
import pandas as pd
data = StringIO("""time,exchange,fsym,tsym,close,high,low,open,volumefrom,volumeto
1660003200,NYSE,BTC,USD,100.1,103,99.1,100,30,10000
1660003260,NYSE,BTC,USD,101.3,104,100.1,102,39,12000
1660003320,NYSE,BTC,USD,100.9,103.2,98,100,32,100230""")
df = pd.read_csv(data)
print(df)
# Alternative without read_csv: split the string manually
import pandas as pd
string = """time,exchange,fsym,tsym,close,high,low,open,volumefrom,volumeto
1660003200,NYSE,BTC,USD,100.1,103,99.1,100,30,10000
1660003260,NYSE,BTC,USD,101.3,104,100.1,102,39,12000
1660003320,NYSE,BTC,USD,100.9,103.2,98,100,32,100230"""
str_list_with_comma = string.split("\n")
columns = []
data = []
for idx, item in enumerate(str_list_with_comma):
    if idx == 0:
        # The first line holds the column names
        columns = item.split(",")
    else:
        # Every other line is one row of values (kept as strings)
        data.append(item.split(","))
df = pd.DataFrame(data, columns=columns)
print(df)
Output:
         time  exchange  fsym  tsym  close   high    low  open  volumefrom  volumeto
0  1660003200      NYSE   BTC   USD  100.1    103   99.1   100          30     10000
1  1660003260      NYSE   BTC   USD  101.3    104  100.1   102          39     12000
2  1660003320      NYSE   BTC   USD  100.9  103.2     98   100          32    100230
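The two approaches print similar tables but do not produce the same column types, which matters as soon as you do arithmetic on the prices. The sketch below uses a trimmed copy of the sample string (three columns only); the variable names are illustrative.

```python
from io import StringIO

import pandas as pd

csv_text = """time,close,high
1660003200,100.1,103
1660003260,101.3,104"""

# read_csv parses the text and infers a numeric dtype for each column
parsed = pd.read_csv(StringIO(csv_text))

# Splitting by hand keeps every cell as a Python string
rows = [line.split(",") for line in csv_text.split("\n")]
manual = pd.DataFrame(rows[1:], columns=rows[0])

# Convert the string columns explicitly before doing arithmetic on them
manual_numeric = manual.apply(pd.to_numeric)

print(parsed.dtypes)
print(manual.dtypes)
print(manual_numeric.dtypes)
```

If you use the manual split, a conversion step such as apply(pd.to_numeric) is probably needed before any calculation.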

Convert yahoofinancials multidimensional dictionary output to dataframe

I'm creating a stock screener based on fundamental metrics using the yahoofinancials module.
The code below returns a nested dictionary that I am not able to convert into a DataFrame for further analysis.
import pandas as pd
from yahoofinancials import YahooFinancials
ticker = 'RELIANCE.NS'
yahoo_financials = YahooFinancials(ticker)
income_statement_data_qt = yahoo_financials.get_financial_stmts('quarterly', 'income')
income_statement_data_qt
Output:
Ideally, I'd like to have data in this format.
You can use a list comprehension to turn each dictionary for that ticker into a DataFrame, and pd.concat to join them along the columns axis (axis=1). Then use rename_axis and reset_index to turn the index into a column with the desired name. Finally, use insert to add a column with the ticker name in the first position.
import pandas as pd
from yahoofinancials import YahooFinancials
ticker = 'RELIANCE.NS'
yahoo_financials = YahooFinancials(ticker)
income_statement_data_qt = yahoo_financials.get_financial_stmts('quarterly', 'income')
dict_list = income_statement_data_qt['incomeStatementHistoryQuarterly'][ticker]
df = pd.concat([pd.DataFrame(i) for i in dict_list], axis=1)
df = df.rename_axis('incomeStatementHistoryQuarterly').reset_index()
df.insert(0, 'ticker', ticker)
print(df)
Output from df:
         ticker  incomeStatementHistoryQuarterly  ...     2021-03-31    2020-12-31
 0  RELIANCE.NS                    costOfRevenue  ...   1.034690e+12  7.224900e+11
 1  RELIANCE.NS           discontinuedOperations  ...            NaN           NaN
 2  RELIANCE.NS                             ebit  ...   1.571800e+11  1.490100e+11
 3  RELIANCE.NS        effectOfAccountingCharges  ...            NaN           NaN
...
...
18  RELIANCE.NS     sellingGeneralAdministrative  ...   3.976000e+10  4.244000e+10
19  RELIANCE.NS           totalOperatingExpenses  ...   1.338570e+12  1.029590e+12
20  RELIANCE.NS       totalOtherIncomeExpenseNet  ...  -1.330000e+09  2.020000e+09
21  RELIANCE.NS                     totalRevenue  ...   1.495750e+12  1.178600e+12
[22 rows x 6 columns]
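The same steps can be tried without a network call. The sketch below replaces the yahoofinancials result with a small hand-written dictionary. Its nesting is an assumption inferred from the answer's code (a list of {date: {metric: value}} dictionaries per ticker), and it holds only three of the metrics from the output above.

```python
import pandas as pd

ticker = "RELIANCE.NS"

# Hypothetical stand-in for get_financial_stmts('quarterly', 'income').
# The nesting is inferred from the answer's code, not taken from the library docs.
income_statement_data_qt = {
    "incomeStatementHistoryQuarterly": {
        ticker: [
            {"2021-03-31": {"costOfRevenue": 1.034690e12,
                            "ebit": 1.571800e11,
                            "totalRevenue": 1.495750e12}},
            {"2020-12-31": {"costOfRevenue": 7.224900e11,
                            "ebit": 1.490100e11,
                            "totalRevenue": 1.178600e12}},
        ]
    }
}

dict_list = income_statement_data_qt["incomeStatementHistoryQuarterly"][ticker]

# Each list item becomes a one-column DataFrame (column = quarter end date,
# index = metric name); concat with axis=1 places the quarters side by side.
df = pd.concat([pd.DataFrame(d) for d in dict_list], axis=1)

# Move the metric names from the index into a regular column
df = df.rename_axis("incomeStatementHistoryQuarterly").reset_index()

# Put the ticker in the first column
df.insert(0, "ticker", ticker)
print(df)
```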

how to draw graph? (x=datetime, y=price)

I wrote some code to show the relationship between price and datetime for Bitcoin. I want to draw a graph of them, but it fails and I don't know why. Please give me some tips, thanks a lot.
Below is my code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv(r'D:\downloads\BTC-USD.csv', date_parser = True)
df.columns = ['datetime','open','high','low','close','adj', 'vol']
print(df.head(5))
df.index = df['datetime']
df.datetime=pd.to_numeric(df.datetime,errors='coerce')
df.adj=pd.to_numeric(df.adj,errors='coerce')
print(df[['datetime', 'adj']].plot(kind = 'line', figsize=[20,5]))
Below is the terminal output:
PS D:\python> python test3.py
0 2020-10-30 13437.874023 ... 13546.522461 30581485201
1 2020-10-31 13546.532227 ... 13780.995117 30306464719
2 2020-11-01 13780.995117 ... 13737.109375 24453857900
3 2020-11-02 13737.032227 ... 13550.489258 30771455468
4 2020-11-03 13550.451172 ... 13950.300781 29869951617
[5 rows x 7 columns]
PS D:\python> python test3.py
datetime open ... adj vol
0 2020-10-30 13437.874023 ... 13546.522461 30581485201
1 2020-10-31 13546.532227 ... 13780.995117 30306464719
2 2020-11-01 13780.995117 ... 13737.109375 24453857900
3 2020-11-02 13737.032227 ... 13550.489258 30771455468
4 2020-11-03 13550.451172 ... 13950.300781 29869951617
[5 rows x 7 columns]
AxesSubplot(0.125,0.11;0.775x0.77)
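There's no need to wrap the call in print. That only prints the repr of the Axes object that plot returns, which is the AxesSubplot(...) line in your output. Just call: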
df[['datetime', 'adj']].plot(kind = 'line', figsize=[20,5])
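Two other things in the question's code may also get in the way: pd.to_numeric with errors='coerce' turns date strings such as '2020-10-30' into NaN, and a script run from the terminal usually needs plt.show() before a window appears. The following is a minimal sketch under those assumptions, using three made-up rows instead of BTC-USD.csv.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows shaped like two of the columns in BTC-USD.csv
df = pd.DataFrame({
    "datetime": ["2020-10-30", "2020-10-31", "2020-11-01"],
    "adj": [13546.52, 13780.99, 13737.11],
})

# Parse the date strings into timestamps and use them as the index,
# so pandas puts them on the x-axis.
df["datetime"] = pd.to_datetime(df["datetime"])
df = df.set_index("datetime")

ax = df["adj"].plot(kind="line", figsize=(20, 5))
ax.set_ylabel("adj")

# When running as a script (python test3.py), uncomment the next line
# so the figure window is actually displayed.
# plt.show()
```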

Download multiple stocks with pandas yahoo finance datareader and putting them in a DataFrame

I would like to download multiple stocks from Yahoo Finance using pandas.
I only need to keep the "Adj Close" column for each stock.
I would also like to create a DataFrame from all these "Adj Close" columns, with each column named after its stock ticker.
I tried this code, but I'm stuck:
import numpy as np
import pandas as pd
from datetime import datetime
import pandas_datareader.data as web
stocks = ['ORCL', 'TSLA', 'IBM','YELP', 'MSFT']
ls_key = 'Adj Close'
start = datetime(2014,1,1)
end = datetime(2015,1,1)
f = web.DataReader(stocks, 'yahoo',start,end)
f
I hope someone can help me.
df = f[[("Adj Close", s) for s in stocks]]
df.columns = df.columns.droplevel(level=0)
df
Output:
Symbols          ORCL       TSLA         IBM       YELP       MSFT
Date
2014-01-02  33.703285  30.020000  137.696884  67.919998  31.983477
2014-01-03  33.613930  29.912001  138.520721  67.660004  31.768301
2014-01-06  33.479893  29.400000  138.045746  71.720001  31.096956
2014-01-07  33.819431  29.872000  140.799240  72.660004  31.337952
2014-01-08  33.703274  30.256001  139.507858  78.419998  30.778502
...               ...        ...         ...        ...        ...
2014-12-24  41.679443  44.452000  123.015839  53.000000  42.568497
2014-12-26  41.562233  45.563999  123.411110  52.939999  42.338593
2014-12-29  41.120468  45.141998  122.019974  53.009998  41.958347
2014-12-30  40.877041  44.445999  121.670265  54.240002  41.578117
2014-12-31  40.543465  44.481998  121.966751  54.730000  41.074078
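The selection itself can be tried offline on a small synthetic frame. The sketch below assumes that DataReader returns columns as a two-level MultiIndex of (attribute, symbol), as the answer's code implies. The numbers and the level names are made up for illustration.

```python
import numpy as np
import pandas as pd

stocks = ["ORCL", "TSLA"]

# Synthetic stand-in for the DataReader result: two attributes per stock
columns = pd.MultiIndex.from_product(
    [["Adj Close", "Close"], stocks], names=["Attributes", "Symbols"]
)
dates = pd.date_range("2014-01-02", periods=3, freq="D", name="Date")
f = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4), index=dates, columns=columns
)

# Approach from the answer: select the (attribute, symbol) pairs,
# then drop the attribute level so only the tickers remain as column names.
via_tuples = f[[("Adj Close", s) for s in stocks]]
via_tuples.columns = via_tuples.columns.droplevel(level=0)

# Shorter equivalent: indexing by the first level returns the same sub-frame
adj_close = f["Adj Close"]
print(adj_close)
```

Both selections give one column per ticker, so either can feed later calculations.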

Problem with group by max period in dataframe pandas

I'm still a novice with Python, and I'm having problems grouping some data to show the record that has the latest (maximum) date. The DataFrame is as follows:
...
I am trying the following:
df_2 = df.max(axis = 0)
df_2 = df.periodo.max()
df_2 = df.loc[df.groupby('periodo').periodo.idxmax()]
And it gives me back:
Timestamp('2020-06-01 00:00:00')
periodo 2020-06-01 00:00:00
valor 3.49136
Although the value for 'periodo' is correct, the one for 'valor' is not: I need the complete corresponding record ('periodo' and 'valor'), not the maximum of each column. I have tried other ways but I can't get what I want ...
What do I need to do?
Thank you in advance, I will be attentive to your answers!
Regards!
# import packages we need, seed random number generator
import pandas as pd
import datetime
import random
random.seed(1)
Create example dataframe:
# start_date and day_count are not defined in the original snippet;
# these values are inferred from the example output below.
start_date = datetime.date(2020, 1, 1)
day_count = 10
dates = [start_date + datetime.timedelta(n) for n in range(day_count)]
values = [random.randint(1,1000) for _ in dates]
df = pd.DataFrame(zip(dates,values),columns=['dates','values'])
i.e. df will be:
        dates  values
0  2020-01-01     389
1  2020-01-02     808
2  2020-01-03     215
3  2020-01-04      97
4  2020-01-05     500
5  2020-01-06      30
6  2020-01-07     915
7  2020-01-08     856
8  2020-01-09     400
9  2020-01-10     444
Select rows with highest entry in each column
You can do:
df[df['dates'] == df['dates'].max()]
(Or, if you want to use idxmax: df.loc[[df['dates'].idxmax()]])
Returning:
        dates  values
9  2020-01-10     444
i.e. this is the row with the latest date.
Similarly:
df[df['values'] == df['values'].max()]
(Or, again with idxmax: df.loc[[df['values'].idxmax()]], as in Scott Boston's answer.)
Returning:
        dates  values
6  2020-01-07     915
i.e. this is the row with the highest value in the values column.
I think you need something like:
df.loc[[df['valor'].idxmax()]]
Where you use idxmax on the 'valor' column. Then use that index to select that row.
MCVE:
import pandas as pd
import numpy as np
np.random.seed(123)
df = pd.DataFrame({'periodo': pd.date_range('2018-07-01', periods=600, freq='D'),
                   'valor': np.random.random(600) + 3})
df.loc[[df['valor'].idxmax()]]
Output:
       periodo     valor
474 2019-10-18  3.998918
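If the goal is the complete record with the latest 'periodo', as the question describes, the same idxmax pattern can be applied to that column instead. The sketch below uses three made-up rows (only the column names come from the question) and shows why df.max(axis=0) mixes values from different rows.

```python
import pandas as pd

# Made-up data: the largest 'valor' is NOT on the latest 'periodo'
df = pd.DataFrame({
    "periodo": pd.to_datetime(["2020-04-01", "2020-05-01", "2020-06-01"]),
    "valor": [3.49, 2.10, 1.75],
})

# Column-wise maxima: each value may come from a different row
col_max = df.max(axis=0)
print(col_max)

# Complete record for the latest date: find its index label, then select the row
latest = df.loc[[df["periodo"].idxmax()]]
print(latest)
```

Here col_max pairs 2020-06-01 with 3.49, while latest returns the row that actually holds 2020-06-01 with its own valor of 1.75.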
