How to draw a graph (x=datetime, y=price) in Python?

I wrote some code to show the relationship between price and datetime for Bitcoin. I want to draw a graph of them, but it fails and I don't know why. Please give me some tips, thanks a lot.
Below is my code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv(r'D:\downloads\BTC-USD.csv', date_parser = True)
df.columns = ['datetime','open','high','low','close','adj', 'vol']
print(df.head(5))
df.index = df['datetime']
df.datetime=pd.to_numeric(df.datetime,errors='coerce')
df.adj=pd.to_numeric(df.adj,errors='coerce')
print(df[['datetime', 'adj']].plot(kind = 'line', figsize=[20,5]))
Below is the terminal output:
PS D:\python> python test3.py
datetime open ... adj vol
0 2020-10-30 13437.874023 ... 13546.522461 30581485201
1 2020-10-31 13546.532227 ... 13780.995117 30306464719
2 2020-11-01 13780.995117 ... 13737.109375 24453857900
3 2020-11-02 13737.032227 ... 13550.489258 30771455468
4 2020-11-03 13550.451172 ... 13950.300781 29869951617
[5 rows x 7 columns]
AxesSubplot(0.125,0.11;0.775x0.77)

There's no need to use the print function; it only prints the repr of the Axes object that plot() returns. Just call
df[['datetime', 'adj']].plot(kind = 'line', figsize=[20,5])
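Two further points in the question's code may also be worth checking: pd.to_numeric(df.datetime, errors='coerce') turns every date string into NaN, and date_parser expects a function rather than True (parse_dates is the option that selects date columns). The following is a minimal sketch under those assumptions, using the five rows from the terminal output as inline data instead of the CSV file. The output file name btc_adj.png is hypothetical.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line and call plt.show() to open a window
import pandas as pd

# Inline stand-in for BTC-USD.csv, reduced to the two columns being plotted.
csv_text = """datetime,adj
2020-10-30,13546.522461
2020-10-31,13780.995117
2020-11-01,13737.109375
2020-11-02,13550.489258
2020-11-03,13950.300781
"""

# parse_dates converts the column to real datetimes while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"])

# Pass the column names as x and y instead of plotting both as y values.
ax = df.plot(x="datetime", y="adj", kind="line", figsize=(20, 5))

# A script has to render the figure explicitly (savefig here, or plt.show()).
ax.figure.savefig("btc_adj.png")
```

When the script is run from a terminal with an interactive backend, plt.show() at the end plays the same role as savefig here.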

Related

Convert an array-like columns to fill each row (reshape)

This is some sample data:
import pandas as pd
import numpy as np
from datetime import date
data = pd.DataFrame([(date(2022,1,1), np.random.randint(10, size=30)),
(date(2022,2,1),np.random.randint(10, size=30)),
(date(2022,3,1),np.random.randint(10, size=30))],
columns=('month_begin','daily_sales'))
I want to (1) create a column filled with each day (so the column would be 2022-01-01, 2022-01-02, ... 2022-03-31); (2) expand the array-like column so that each value gets its own row.
I was thinking about creating a list of days between 2022-01-01 and 2022-03-01, but I got stuck on how to fill each row with the daily data. Any suggestion is appreciated!
import pandas as pd
import numpy as np
from datetime import date
data = pd.DataFrame([(date(2022,1,1), np.random.randint(10, size=30)),
(date(2022,2,1),np.random.randint(10, size=30)),
(date(2022,3,1),np.random.randint(10, size=30))],
columns=('month_begin','daily_sales'))
result = pd.DataFrame()
for index, row in data.iterrows():
    df = pd.DataFrame({'date': pd.date_range(row['month_begin'],
                                             periods=len(row['daily_sales'])),
                       'daily_sales': row['daily_sales'],
                       })
    result = pd.concat([result, df], ignore_index=True)
print(result)
Output:
date daily_sales
0 2022-01-01 9
1 2022-01-02 7
2 2022-01-03 7
3 2022-01-04 8
4 2022-01-05 4
.. ... ...
85 2022-03-26 3
86 2022-03-27 1
87 2022-03-28 9
88 2022-03-29 7
89 2022-03-30 0
[90 rows x 2 columns]
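As a possible alternative to the loop, the same reshape can be sketched with DataFrame.explode plus a per-month day counter. This is not the answer's method, only an illustration. It assumes each array holds consecutive days starting at month_begin, and it uses a seeded random generator so the run is repeatable.

```python
from datetime import date

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
data = pd.DataFrame(
    [(date(2022, 1, 1), rng.integers(10, size=30)),
     (date(2022, 2, 1), rng.integers(10, size=30)),
     (date(2022, 3, 1), rng.integers(10, size=30))],
    columns=("month_begin", "daily_sales"),
)

# One row per array element; month_begin is repeated for each element.
long = data.explode("daily_sales", ignore_index=True)

# 0, 1, 2, ... within each month gives the day offset from month_begin.
day_offset = long.groupby("month_begin").cumcount()
long["date"] = pd.to_datetime(long["month_begin"]) + pd.to_timedelta(day_offset, unit="D")

# explode leaves an object column, so cast the values back to integers.
long["daily_sales"] = long["daily_sales"].astype(int)
result = long[["date", "daily_sales"]]
print(result)
```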

Pandas DataFrame Output Format

I need help reformatting my DataFrame output for stock closing prices.
Currently my output has the stock symbols as column headers, whereas I would like them displayed in rows. Current df_output: https://i.stack.imgur.com/u4jEk.png
I would like to have it displayed as below:
(image: results)
This is my current df_output code (not sure if this is the reason):
prices_df = pd.DataFrame({
a: {x['formatted_date']: x['adjclose'] for x in data[a]['prices']} for a in assets})
(image: excel_list)
FULL CODE:
import pandas as pd
import numpy as np
import yfinance as yf
from yahoofinancials import YahooFinancials
from datetime import datetime
import time
start_time = time.time()
df = pd.read_excel(r'C:\Users\Ryan\Desktop\Stock Portfolio\\My Portfolio.xlsx', sheet_name=0, skiprows=2)
list1 = list(df['Stock Code'])
assets = list1
yahoo_financials = YahooFinancials(assets)
data = yahoo_financials.get_historical_price_data(start_date=str(datetime.now().date().replace(month=1, day=1)),
end_date=str(datetime.now().date().replace(month=12, day=31)),
time_interval='daily')
prices_df = pd.DataFrame({
a: {x['formatted_date']: x['adjclose'] for x in data[a]['prices']} for a in assets})
Check pandas functions such as https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.wide_to_long.html and https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot_table.html for operations for converting between long and wide formats.
Try this:
prices_df.rename_axis('Date').reset_index().melt('Date', var_name='Symbol', value_name='Price')
Output:
Date Symbol Price
0 2020-01-02 FB 209.779999
1 2020-01-03 FB 208.669998
2 2020-01-06 FB 212.600006
3 2020-01-07 FB 213.059998
4 2020-01-08 FB 215.220001
.. ... ... ...
973 2020-08-18 CDNS 109.150002
974 2020-08-19 CDNS 108.529999
975 2020-08-20 CDNS 111.260002
976 2020-08-21 CDNS 110.570000
977 2020-08-24 CDNS 111.260002
[978 rows x 3 columns]
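To make the melt step easier to try without the Yahoo download, here is a small self-contained sketch. The symbols AAA and BBB and their prices are made up for illustration; only the rename_axis / reset_index / melt chain comes from the answer.

```python
import pandas as pd

# Hypothetical wide frame: dates as the index, one column per symbol.
prices_df = pd.DataFrame(
    {"AAA": [10.0, 10.5], "BBB": [20.0, 19.5]},
    index=["2020-01-02", "2020-01-03"],
)

# Name the index, turn it into a column, then stack the symbol columns into rows.
long_df = (
    prices_df.rename_axis("Date")
    .reset_index()
    .melt("Date", var_name="Symbol", value_name="Price")
)
print(long_df)
```

Each (date, symbol) pair becomes one row, so a frame with 2 dates and 2 symbols yields 4 rows.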

How to select two columns to plot with dataframe?

apple is a DataFrame with the following structure:
apple
Date Open High Low Close Adj Close
0 2017-01-03 115.800003 116.330002 114.760002 116.150002 114.311760
1 2017-01-04 115.849998 116.510002 115.750000 116.019997 114.183815
2 2017-01-05 115.919998 116.860001 115.809998 116.610001 114.764473
3 2017-01-06 116.779999 118.160004 116.470001 117.910004 116.043915
4 2017-01-09 117.949997 119.430000 117.940002 118.989998 117.106812
5 2017-01-10 118.769997 119.379997 118.300003 119.110001 117.224907
6 2017-01-11 118.739998 119.930000 118.599998 119.750000 117.854782
7 2017-01-12 118.900002 119.300003 118.209999 119.250000 117.362694
8 2017-01-13 119.110001 119.620003 118.809998 119.040001 117.156021
9 2017-01-17 118.339996 120.239998 118.220001 120.000000 118.100822
Now I want to select two columns, Date and Close, with Date as the x axis and Close as the y axis. How do I plot it?
import pandas as pd
import matplotlib.pyplot as plt
x=pd.DataFrame({'key':apple['Date'],'data':apple['Close']})
x.plot()
plt.show()
I got a graph like the one below.
The x axis is not the Date column!
A new DataFrame is not necessary. Plot apple directly and use the parameters x and y:
#if not datetime column first convert
#apple['Date'] = pd.to_datetime(apple['Date'])
apple.plot(x='Date', y='Close')
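A runnable version of this, as a sketch: it rebuilds only the first three rows of apple from the question and writes the figure to a file. The file name apple_close.png is an assumption for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line and call plt.show() to open a window
import pandas as pd

# First three rows of the question's data, Date and Close only.
apple = pd.DataFrame({
    "Date": ["2017-01-03", "2017-01-04", "2017-01-05"],
    "Close": [116.150002, 116.019997, 116.610001],
})

# Convert the strings to datetimes so the x axis is treated as dates.
apple["Date"] = pd.to_datetime(apple["Date"])

# x and y pick the columns directly; no intermediate DataFrame is needed.
ax = apple.plot(x="Date", y="Close")
ax.figure.savefig("apple_close.png")
```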

interpolate/extrapolate missing dates in python?

Let's say I have the following DataFrame:
bb = pd.DataFrame(data = {'date' :['','','','2015-09-02', '2015-09-02', '2015-09-03','','2015-09-08', '', '2015-09-11','2015-09-14','','' ]})
bb['date'] = pd.to_datetime(bb['date'], format="%Y-%m-%d")
I want to interpolate and extrapolate linearly to fill the missing date values. I used the following code, but it doesn't change anything. I am new to pandas. Please help.
bb= bb.interpolate(method='time')
To fill the missing values at the ends, use bfill() and ffill(): each missing value is copied from the next (or previous) valid value.
To interpolate linearly, use interpolate(), but the dates first need to be converted to numbers:
import numpy as np
import pandas as pd
from datetime import datetime
bb = pd.DataFrame(data = {'date' :['','','','2015-09-02', '2015-09-02', '2015-09-03','','2015-09-08', '', '2015-09-11','2015-09-14','','' ]})
bb['date'] = pd.to_datetime(bb['date'], format="%Y-%m-%d")
# convert to seconds
tmp = bb['date'].apply(lambda t: (t-datetime(1970,1,1)).total_seconds())
# linear interpolation
tmp.interpolate(inplace=True)
# back convert to dates
bb['date'] = pd.to_datetime(tmp, unit='s')
bb['date'] = bb['date'].apply(lambda t: t.date())
# extrapolation for the first missing values
bb.bfill(inplace=True)
print(bb)
Result:
date
0 2015-09-02
1 2015-09-02
2 2015-09-02
3 2015-09-02
4 2015-09-02
5 2015-09-03
6 2015-09-05
7 2015-09-08
8 2015-09-09
9 2015-09-11
10 2015-09-14
11 2015-09-14
12 2015-09-14
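The same idea can be sketched without apply, working on the whole column at once. This is a variation on the answer, not its exact code: it assumes that filling the leading and trailing gaps with the nearest valid date is acceptable (as in the result above, this is an edge fill rather than a true linear extrapolation), and it relies on limit_direction='both' to do that in one call.

```python
import pandas as pd

raw = ['', '', '', '2015-09-02', '2015-09-02', '2015-09-03', '',
       '2015-09-08', '', '2015-09-11', '2015-09-14', '', '']

# Empty strings become NaT (errors='coerce' makes that explicit).
dates = pd.to_datetime(pd.Series(raw), format="%Y-%m-%d", errors="coerce")

# Seconds since the epoch as floats; NaT becomes NaN.
seconds = (dates - pd.Timestamp("1970-01-01")).dt.total_seconds()

# Linear interpolation inside the gaps; 'both' also fills the two ends
# with the nearest valid value.
filled = seconds.interpolate(method="linear", limit_direction="both")

# Back to datetimes, truncated to whole days.
result = pd.to_datetime(filled, unit="s").dt.floor("D")
print(result)
```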

pandas datetimeindex between_time function(how to get a not_between_time)

I have a pandas df, and I use between_time with a and b to clean the data. How do I
get a not_between_time behavior?
I know I can try something like:
df.between_time('00:00:00', a)
df.between_time(b, '23:59:59')
and then combine the results and sort the new df. That is very inefficient, and it doesn't work for me because I have data between 23:59:59 and 00:00:00.
Thanks
You could find the index locations for rows with time between a and b, and then use df.index.difference (called Index.diff in old pandas versions) to remove those from the index:
import pandas as pd
import io
text = '''\
date,time, val
20120105, 080000, 1
20120105, 080030, 2
20120105, 080100, 3
20120105, 080130, 4
20120105, 080200, 5
20120105, 235959.01, 6
'''
# parse_dates=[[0, 1]] merges the date and time columns (may warn in recent pandas releases)
df = pd.read_csv(io.StringIO(text), parse_dates=[[0, 1]], index_col=0)
index = df.index
# integer positions of the rows whose time falls between the two bounds
ivals = index.indexer_between_time('8:01:30','8:02')
print(df.reindex(index.difference(index[ivals])))
yields
val
date_time
2012-01-05 08:00:00 1
2012-01-05 08:00:30 2
2012-01-05 08:01:00 3
2012-01-05 23:59:59.010000 6
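A variation on the same approach, sketched with a boolean mask instead of an index difference, so it also keeps the original row order and tolerates duplicate timestamps. The sample index is modeled on the data above, with the last timestamp rounded to a whole second for simplicity.

```python
import numpy as np
import pandas as pd

index = pd.to_datetime([
    "2012-01-05 08:00:00",
    "2012-01-05 08:00:30",
    "2012-01-05 08:01:00",
    "2012-01-05 08:01:30",
    "2012-01-05 08:02:00",
    "2012-01-05 23:59:59",
])
df = pd.DataFrame({"val": [1, 2, 3, 4, 5, 6]}, index=index)

# Integer positions of the rows whose time of day lies in [08:01:30, 08:02:00].
inside = df.index.indexer_between_time("08:01:30", "08:02")

# Start with "keep everything", then switch off the rows inside the window.
keep = np.ones(len(df), dtype=bool)
keep[inside] = False

outside = df[keep]
print(outside)
```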
