pandas dataframe plotting 1 column over 2 - python

This is driving me nuts: I can't plot column 'b', it only plots column 'A'.
This is my code; I have no idea what I'm doing wrong, probably something silly.
The dataframe seems OK. The weird part is that I can access both df['A'] and df['b'], but only df['A'].plot() works. If I call df['b'].plot() I get this error:
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\IPython\core\interactiveshell.py", line 2883, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input>", line 1, in <module>
    df['b'].plot()
  File "C:\Python27\lib\site-packages\pandas\tools\plotting.py", line 2511, in plot_series
    **kwds)
  File "C:\Python27\lib\site-packages\pandas\tools\plotting.py", line 2317, in _plot
    plot_obj.generate()
  File "C:\Python27\lib\site-packages\pandas\tools\plotting.py", line 921, in generate
    self._compute_plot_data()
  File "C:\Python27\lib\site-packages\pandas\tools\plotting.py", line 997, in _compute_plot_data
    'plot'.format(numeric_data.__class__.__name__))
TypeError: Empty 'Series': no numeric data to plot
import sqlalchemy
import pandas as pd
import matplotlib.pyplot as plt
engine = sqlalchemy.create_engine(
    'sqlite:///C:/Users/toto/PycharmProjects/my_db.sqlite')
tables = engine.table_names()
dic = {}
for t in tables:
    sql = 'SELECT t."weight" FROM "' + t + '" t WHERE t."udl"="IBE SM"'
    dic[t] = (pd.read_sql(sql, engine)['weight'][0], pd.read_sql(sql, engine)['weight'][1])
df = pd.DataFrame.from_dict(dic, orient='index').sort_index()
df = df.set_index(pd.DatetimeIndex(df.index))
df.columns = ['A', 'b']
print(df)
print(df.info())
df.plot()
plt.show()
This is the output of the two prints:
A b
2014-08-05 1.81 3.39
2014-08-06 1.81 3.39
2014-08-07 1.81 3.39
2014-08-08 1.80 3.37
2014-08-11 1.79 3.35
2014-08-13 1.80 3.36
2014-08-14 1.80 3.35
2014-08-18 1.80 3.35
2014-08-19 1.79 3.34
2014-08-20 1.80 3.35
2014-08-27 1.79 3.35
2014-08-28 1.80 3.35
2014-08-29 1.79 3.35
2014-09-01 1.79 3.35
2014-09-02 1.79 3.35
2014-09-03 1.79 3.36
2014-09-04 1.79 3.37
2014-09-05 1.80 3.38
2014-09-08 1.79 3.36
2014-09-09 1.79 3.35
2014-09-10 1.78 3.35
2014-09-11 1.78 3.34
2014-09-12 1.78 3.34
2014-09-15 1.78 3.35
2014-09-16 1.78 3.35
2014-09-17 1.78 3.35
2014-09-18 1.78 3.34
2014-09-19 1.79 3.35
2014-09-22 1.79 3.36
2014-09-23 1.80 3.37
... ... ...
2014-12-10 1.73 3.29
2014-12-11 1.74 3.27
2014-12-12 1.74 3.25
2014-12-15 1.74 3.24
2014-12-16 1.74 3.27
2014-12-17 1.75 3.28
2014-12-18 1.76 3.29
2014-12-19 1.04 1.39
2014-12-22 1.04 1.39
2014-12-23 1.04 1.4
2014-12-24 1.04 1.39
2014-12-29 1.04 1.39
2014-12-30 1.04 1.4
2015-01-02 1.04 1.4
2015-01-05 1.04 1.4
2015-01-06 1.04 1.4
2015-01-07 NaN 1.39
2015-01-08 NaN 1.39
2015-01-09 NaN 1.39
2015-01-12 NaN 1.38
2015-01-13 NaN 1.38
2015-01-14 NaN 1.38
2015-01-15 NaN 1.38
2015-01-16 NaN 1.38
2015-01-19 NaN 1.39
2015-01-20 NaN 1.38
2015-01-21 NaN 1.39
2015-01-22 NaN 1.4
2015-01-23 NaN 1,4
2015-01-26 NaN 1.41
[107 rows x 2 columns]
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 107 entries, 2014-08-05 00:00:00 to 2015-01-26 00:00:00
Data columns (total 2 columns):
A 93 non-null float64
b 107 non-null object
dtypes: float64(1), object(1)
memory usage: 2.1+ KB
None
Process finished with exit code 0

Just got it: 'b' is of object dtype and not float64 because of this line:
2015-01-23 NaN 1,4
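For anyone hitting the same thing: that comma decimal makes pandas store the whole column as dtype object, and plot() silently drops non-numeric columns from a DataFrame (and, for a Series, raises the TypeError above). A minimal sketch of one way to coerce the column back, assuming comma decimals are the only bad values:

# normalise the decimal separator, then coerce the column back to float
df['b'] = pd.to_numeric(df['b'].astype(str).str.replace(',', '.'))
df['b'].plot()
plt.show()

After the conversion, df.info() should report both columns as float64.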

Related

How to merge two DataFrames with DatetimeIndex preserved in pandas?

I have two DataFrames, df1 and df2.
df1 has the following contents:
Adj Close Close High Low \
GBTC QQQ GBTC QQQ GBTC QQQ GBTC
Date
2019-01-29 4.02 159.342209 4.02 161.570007 4.07 163.240005 3.93
2019-01-30 4.06 163.395538 4.06 165.679993 4.09 166.279999 4.01
2019-01-31 3.99 165.841370 3.99 168.160004 4.06 168.990005 3.93
2019-02-01 4.02 165.141129 4.02 167.449997 4.07 168.600006 3.93
2019-02-04 3.96 167.192474 3.96 169.529999 4.00 169.529999 3.93
... ... ... ... ... ... ... ...
2019-02-25 4.65 171.127441 4.65 173.520004 4.78 174.660004 4.50
2019-02-26 4.36 171.304947 4.36 173.699997 4.74 174.250000 4.36
2019-02-27 4.30 171.196487 4.30 173.589996 4.50 173.800003 4.30
2019-02-28 4.46 170.802002 4.46 173.190002 4.65 173.809998 4.40
2019-03-01 4.58 171.985443 4.58 174.389999 4.64 174.649994 4.45
Open Volume
QQQ GBTC QQQ GBTC QQQ
Date
2019-01-29 160.990005 3.970 163.199997 975200 30784200
2019-01-30 162.889999 4.035 163.399994 770700 41346500
2019-01-31 166.470001 4.040 166.699997 1108700 37258400
2019-02-01 166.990005 4.000 167.330002 889100 32143700
2019-02-04 167.330002 3.990 167.479996 871800 26718800
... ... ... ... ... ...
2019-02-25 173.399994 4.625 174.210007 2891200 32608800
2019-02-26 172.809998 4.625 173.100006 2000100 21939700
2019-02-27 171.759995 4.400 172.899994 1537000 25162000
2019-02-28 172.699997 4.420 173.050003 1192600 25085500
2019-03-01 173.179993 4.470 174.440002 948500 31431200
[23 rows x 12 columns]
And here's the contents of df2:
Adj Close Close High Low \
GBTC QQQ GBTC QQQ GBTC QQQ GBTC
Date
2019-02-25 4.65 171.127441 4.65 173.520004 4.78 174.660004 4.50
2019-02-26 4.36 171.304947 4.36 173.699997 4.74 174.250000 4.36
2019-02-27 4.30 171.196487 4.30 173.589996 4.50 173.800003 4.30
2019-02-28 4.46 170.802002 4.46 173.190002 4.65 173.809998 4.40
2019-03-01 4.58 171.985443 4.58 174.389999 4.64 174.649994 4.45
... ... ... ... ... ... ... ...
2019-03-28 4.54 176.171432 4.54 178.309998 4.68 178.979996 4.51
2019-03-29 4.78 177.505249 4.78 179.660004 4.83 179.830002 4.55
2019-04-01 4.97 179.856705 4.97 182.039993 5.03 182.259995 4.85
2019-04-02 5.74 180.538437 5.74 182.729996 5.83 182.910004 5.52
2019-04-03 6.19 181.575836 6.19 183.779999 6.59 184.919998 5.93
Open Volume
QQQ GBTC QQQ GBTC QQQ
Date
2019-02-25 173.399994 4.625 174.210007 2891200 32608800
2019-02-26 172.809998 4.625 173.100006 2000100 21939700
2019-02-27 171.759995 4.400 172.899994 1537000 25162000
2019-02-28 172.699997 4.420 173.050003 1192600 25085500
2019-03-01 173.179993 4.470 174.440002 948500 31431200
... ... ... ... ... ...
2019-03-28 177.240005 4.650 178.360001 2104400 30368200
2019-03-29 178.589996 4.710 179.690002 2937400 35205500
2019-04-01 180.770004 4.850 181.509995 2733600 30969500
2019-04-02 181.779999 5.660 182.240005 6062000 22645200
2019-04-03 183.210007 5.930 183.759995 10002400 31633500
[28 rows x 12 columns]
As you can see from the above, df1 and df2 have overlapping Dates.
How can I create a merged DataFrame df that contains dates from 2019-01-29 to 2019-04-03 with no overlapping Date?
I've tried running df = df1.merge(df2, how='outer'). However, this command returns a DataFrame with the Date index dropped, which is not desirable.
> df
Adj Close Close High Low \
GBTC QQQ GBTC QQQ GBTC QQQ GBTC
0 4.02 159.342209 4.02 161.570007 4.07 163.240005 3.93
1 4.06 163.395538 4.06 165.679993 4.09 166.279999 4.01
2 3.99 165.841370 3.99 168.160004 4.06 168.990005 3.93
3 4.02 165.141129 4.02 167.449997 4.07 168.600006 3.93
4 3.96 167.192474 3.96 169.529999 4.00 169.529999 3.93
.. ... ... ... ... ... ... ...
41 4.54 176.171432 4.54 178.309998 4.68 178.979996 4.51
42 4.78 177.505249 4.78 179.660004 4.83 179.830002 4.55
43 4.97 179.856705 4.97 182.039993 5.03 182.259995 4.85
44 5.74 180.538437 5.74 182.729996 5.83 182.910004 5.52
45 6.19 181.575836 6.19 183.779999 6.59 184.919998 5.93
Open Volume
QQQ GBTC QQQ GBTC QQQ
0 160.990005 3.970 163.199997 975200 30784200
1 162.889999 4.035 163.399994 770700 41346500
2 166.470001 4.040 166.699997 1108700 37258400
3 166.990005 4.000 167.330002 889100 32143700
4 167.330002 3.990 167.479996 871800 26718800
.. ... ... ... ... ...
41 177.240005 4.650 178.360001 2104400 30368200
42 178.589996 4.710 179.690002 2937400 35205500
43 180.770004 4.850 181.509995 2733600 30969500
44 181.779999 5.660 182.240005 6062000 22645200
45 183.210007 5.930 183.759995 10002400 31633500
[46 rows x 12 columns]
It seems that I should find a way to merge df1.index and df2.index, and then attach the merged DatetimeIndex to df.
To reproduce the same data as mine for debugging, you can run the following code:
import yfinance as yf
symbols = ['QQQ', 'GBTC']
df1 = yf.download(symbols, start="2019-01-29", end="2019-03-01")
df2 = yf.download(symbols, start="2019-02-25", end="2019-04-03")
Taken from the docs:
The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.
So I believe that if you specify the index in the merge with on='Date', you should be OK.
df1.merge(df2, how='outer', on='Date')
However, for the problem you are trying to solve, merge is not the correct tool. What you need to do is append the dataframes together and then remove the duplicated days:
df1.append(df2).drop_duplicates()
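Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on recent versions the same idea is written with concat. A sketch, assuming both frames share the same columns, that keeps the first copy of each duplicated date:

import pandas as pd

# stack the two frames, then drop rows whose index label already appeared
df = pd.concat([df1, df2])
df = df[~df.index.duplicated(keep='first')].sort_index()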

Trying to use the BeautifulSoup Python module to pull individual elements from table data

I am new to Python and currently using BeautifulSoup with Python to try and pull some table data. I cannot get the individual elements out of the td. What I have so far is:
from bs4 import BeautifulSoup
import requests
source = requests.get('https://gol.gg/teams/list/season-ALL/split-ALL/region-ALL/tournament-LCS%20Summer%202020/week-ALL/').text
soup = BeautifulSoup(source, 'lxml')
td = soup.find_all('td', {'class': 'text-center'})
print(td)
This does display all of the td elements that I want to extract, but I am unable to figure out how to get each individual element out of the td.
Thank you in advance for the help; it is much appreciated.
Try this:
from bs4 import BeautifulSoup
import requests
source = requests.get('https://gol.gg/teams/list/season-ALL/split-ALL/region-ALL/tournament-LCS%20Summer%202020/week-ALL/').text
soup = BeautifulSoup(source, 'lxml')
td = soup.find_all('td', {'class': 'text-center'})
print(*[text.get_text(strip=True) + '\n' for text in td])
Prints:
S10
NA
14
35.7%
0.91
1744
-48
33:19
11.2
12.4
5.5
7.0
50.0
64.3
2.71
54.2
1.00
57.1
1.14
and so on....
The following script extracts the data and saves it to a CSV file.
import requests
from bs4 import BeautifulSoup
import pandas as pd
res = requests.get('https://gol.gg/teams/list/season-ALL/split-ALL/region-ALL/tournament-LCS%20Summer%202020/week-ALL/')
soup = BeautifulSoup(res.text, 'html.parser')
table = soup.find("table", class_="table_list playerslist tablesaw trhover")
columns = [i.get_text(strip=True) for i in table.find("thead").find_all("th")]
data = []
table.find("thead").extract()
for tr in table.find_all("tr"):
    data.append([td.get_text(strip=True) for td in tr.find_all("td")])
df = pd.DataFrame(data, columns=columns)
df.to_csv("data.csv", index=False)
Output:
Name Season Region Games Win rate K:D GPM GDM Game duration Kills / game Deaths / game Towers killed Towers lost FB% FT% DRAPG DRA% HERPG HER% DRA#15 TD#15 GD#15 NASHPG NASH% CSM DPM WPM VWPM WCPM
0 100 Thieves S10 NA 14 35.7% 0.91 1744 -48 33:19 11.2 12.4 5.5 7.0 50.0 64.3 2.71 54.2 1.00 57.1 1.14 0.4 -378 0.64 42.9 33.2 1937 3.0 1.19 1.31
1 CLG S10 NA 14 35.7% 0.81 1705 -120 35:25 10.6 13.2 4.9 7.9 28.6 28.6 1.93 31.5 0.57 28.6 0.64 -0.6 -1297 0.57 30.4 32.6 1826 3.2 1.17 1.37
2 Cloud9 S10 NA 14 78.6% 1.91 1922 302 28:52 15.0 7.9 8.3 3.1 64.3 64.3 3.07 72.5 1.43 71.4 1.29 0.7 2410 1.00 78.6 33.3 1921 3.0 1.10 1.26
3 Dignitas S10 NA 14 28.6% 0.86 1663 -147 32:44 8.9 10.4 3.9 8.1 42.9 35.7 2.14 41.7 0.57 28.6 0.79 -0.7 -796 0.36 25.0 32.5 1517 3.1 1.28 1.23
4 Evil Geniuses S10 NA 14 50.0% 0.85 1738 -0 34:09 11.1 13.1 6.5 6.0 64.3 57.1 2.36 48.5 1.00 53.6 1.00 0.5 397 0.50 46.5 32.3 1895 3.2 1.36 1.34
5 FlyQuest S10 NA 14 57.1% 1.28 1770 65 34:55 13.4 10.4 6.5 5.2 71.4 35.7 2.86 53.4 1.00 50.0 0.79 -0.1 69 0.71 69.2 32.7 1801 3.2 1.16 1.72
6 Golden Guardians S10 NA 14 50.0% 0.96 1740 6 36:13 10.7 11.1 6.3 6.1 50.0 35.7 3.29 62.8 0.86 42.9 1.43 0.1 711 0.50 43.6 33.7 1944 3.2 1.27 1.53
7 Immortals S10 NA 14 21.4% 0.54 1609 -246 33:54 7.5 14.0 4.3 7.9 35.7 35.7 2.29 39.9 1.00 53.6 0.79 -0.4 -1509 0.36 25.0 31.4 1734 3.3 1.37 1.47
8 Team Liquid S10 NA 14 78.6% 1.31 1796 135 35:07 11.4 8.6 7.9 4.4 42.9 64.3 2.36 43.6 0.93 50.0 1.14 0.2 522 1.21 78.6 33.1 1755 3.5 1.27 1.42
9 TSM S10 NA 14 64.3% 1.12 1768 52 34:20 11.6 10.4 7.2 5.7 50.0 78.6 2.79 51.9 1.21 64.3 0.93 0.1 -129 0.86 57.1 32.6 1729 3.2 1.33 1.33
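As an aside, pandas can often parse an HTML table in one step with read_html, which avoids walking the tags by hand. A sketch, assuming the stats table is the first one read_html finds on that page:

import requests
import pandas as pd

res = requests.get('https://gol.gg/teams/list/season-ALL/split-ALL/region-ALL/tournament-LCS%20Summer%202020/week-ALL/')
# read_html parses every <table> in the page and returns a list of DataFrames
tables = pd.read_html(res.text)
tables[0].to_csv("data.csv", index=False)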

Creating new df columns via iteration

I have a dataframe, df which looks like this
Open High Low Close Volume
Date
2007-03-22 2.65 2.95 2.64 2.86 176389
2007-03-23 2.87 2.87 2.78 2.78 63316
2007-03-26 2.83 2.83 2.51 2.52 54051
2007-03-27 2.61 3.29 2.60 3.28 589443
2007-03-28 3.65 4.10 3.60 3.80 1114659
2007-03-29 3.91 3.91 3.33 3.57 360501
2007-03-30 3.70 3.88 3.66 3.71 185787
I'm trying to create a new column that takes the df.Open value 5 rows ahead of each df.Open value and subtracts the current value from it.
So the loop I'm using is this:
for i in range(0, len(df.Open)):  # goes through the index values
    df['5days'][i] = df.Open[i+5] - df.Open[i]  # use those index values to locate
However, this loop yields an error:
KeyError: '5days'
I'm not sure why. I temporarily got it to work by removing the df['5days'][i] assignment, but it seems awfully slow. Is there a more efficient way to do this?
Thank you.
Using diff
df['5Days'] = df.Open.diff(5)
print(df)
Open High Low Close Volume 5Days
Date
2007-03-22 2.65 2.95 2.64 2.86 176389 NaN
2007-03-23 2.87 2.87 2.78 2.78 63316 NaN
2007-03-26 2.83 2.83 2.51 2.52 54051 NaN
2007-03-27 2.61 3.29 2.60 3.28 589443 NaN
2007-03-28 3.65 4.10 3.60 3.80 1114659 NaN
2007-03-29 3.91 3.91 3.33 3.57 360501 1.26
2007-03-30 3.70 3.88 3.66 3.71 185787 0.83
However, per your code, you may want to look ahead and align the results back. In that case
df['5Days'] = -df.Open.diff(-5)
print(df)
Open High Low Close Volume 5Days
Date
2007-03-22 2.65 2.95 2.64 2.86 176389 1.26
2007-03-23 2.87 2.87 2.78 2.78 63316 0.83
2007-03-26 2.83 2.83 2.51 2.52 54051 NaN
2007-03-27 2.61 3.29 2.60 3.28 589443 NaN
2007-03-28 3.65 4.10 3.60 3.80 1114659 NaN
2007-03-29 3.91 3.91 3.33 3.57 360501 NaN
2007-03-30 3.70 3.88 3.66 3.71 185787 NaN
I think you need shift with sub:
df['5days'] = df.Open.shift(5).sub(df.Open)
print (df)
Open High Low Close Volume 5days
Date
2007-03-22 2.65 2.95 2.64 2.86 176389 NaN
2007-03-23 2.87 2.87 2.78 2.78 63316 NaN
2007-03-26 2.83 2.83 2.51 2.52 54051 NaN
2007-03-27 2.61 3.29 2.60 3.28 589443 NaN
2007-03-28 3.65 4.10 3.60 3.80 1114659 NaN
2007-03-29 3.91 3.91 3.33 3.57 360501 -1.26
2007-03-30 3.70 3.88 3.66 3.71 185787 -0.83
Or maybe you need to subtract the shifted column from Open:
df['5days'] = df.Open.sub(df.Open.shift(5))
print (df)
Open High Low Close Volume 5days
Date
2007-03-22 2.65 2.95 2.64 2.86 176389 NaN
2007-03-23 2.87 2.87 2.78 2.78 63316 NaN
2007-03-26 2.83 2.83 2.51 2.52 54051 NaN
2007-03-27 2.61 3.29 2.60 3.28 589443 NaN
2007-03-28 3.65 4.10 3.60 3.80 1114659 NaN
2007-03-29 3.91 3.91 3.33 3.57 360501 1.26
2007-03-30 3.70 3.88 3.66 3.71 185787 0.83
df['5days'] = -df.Open.sub(df.Open.shift(-5))
print (df)
Open High Low Close Volume 5days
Date
2007-03-22 2.65 2.95 2.64 2.86 176389 1.26
2007-03-23 2.87 2.87 2.78 2.78 63316 0.83
2007-03-26 2.83 2.83 2.51 2.52 54051 NaN
2007-03-27 2.61 3.29 2.60 3.28 589443 NaN
2007-03-28 3.65 4.10 3.60 3.80 1114659 NaN
2007-03-29 3.91 3.91 3.33 3.57 360501 NaN
2007-03-30 3.70 3.88 3.66 3.71 185787 NaN
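Equivalently, a plain shift-and-subtract reads closest to the original loop; a sketch, where rows with no value 5 positions ahead simply get NaN instead of running past the end of the column as the loop would:

# Open 5 rows ahead minus today's Open, aligned to today's row
df['5days'] = df.Open.shift(-5) - df.Open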

Error while reading Boston data from UCI website using pandas

Any help, please, with reading this file from the URL below.
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data'
data = pandas.read_csv(url, sep=',', header=None)
I tried sep=',', sep=';' and sep='\t', but the data was not read correctly. However, with
data = pandas.read_csv(url, sep=' ', header = None)
I received an error:
pandas/parser.pyx in pandas.parser.TextReader.read (pandas/parser.c:7988)()
pandas/parser.pyx in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8244)()
pandas/parser.pyx in pandas.parser.TextReader._read_rows (pandas/parser.c:8970)()
pandas/parser.pyx in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8838)()
pandas/parser.pyx in pandas.parser.raise_parser_error (pandas/parser.c:22649)()
CParserError: Error tokenizing data. C error: Expected 30 fields in line 2, saw 31
A similar question was asked before, but the accepted answer does not help me. Any help reading this file from the URL provided would be appreciated.
BTW, I know there is Boston = load_boston() to read this data, but when I read it with that function, the attribute 'MEDV' does not come with the dataset.
There are multiple spaces used as the delimiter; that's why it's not working when you use a single space as the delimiter (sep=' ').
You can do it using sep='\s+':
In [171]: data = pd.read_csv(url, sep='\s+', header = None)
In [172]: data.shape
Out[172]: (506, 14)
In [173]: data.head()
Out[173]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296.0 15.3 396.90 4.98 24.0
1 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242.0 17.8 396.90 9.14 21.6
2 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242.0 17.8 392.83 4.03 34.7
3 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222.0 18.7 394.63 2.94 33.4
4 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222.0 18.7 396.90 5.33 36.2
or using delim_whitespace=True:
In [174]: data = pd.read_csv(url, delim_whitespace=True, header = None)
In [175]: data.shape
Out[175]: (506, 14)
In [176]: data.head()
Out[176]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296.0 15.3 396.90 4.98 24.0
1 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242.0 17.8 396.90 9.14 21.6
2 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242.0 17.8 392.83 4.03 34.7
3 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222.0 18.7 394.63 2.94 33.4
4 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222.0 18.7 396.90 5.33 36.2
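Two hedged follow-ups: delim_whitespace was later deprecated (pandas 2.2) in favour of sep=r'\s+', and if you want named columns instead of 0-13, the attribute order in the dataset's housing.names file runs CRIM through MEDV, so you can supply the names yourself:

# column names as listed in the UCI housing.names description
cols = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
        'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
data = pd.read_csv(url, sep=r'\s+', header=None, names=cols)

This also addresses the load_boston() aside: MEDV is the prediction target, which scikit-learn returns separately as Boston.target rather than inside Boston.data.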

Can't index by timestamp in pandas dataframe

I have an Excel sheet with dates and some values that I want to convert to a pandas DataFrame, and then select only the rows between certain dates.
For some reason I cannot select a row by its date index.
Raw Data in Excel file
MCU
Timestamp 50D 10P1 10P2 10P3 10P6 10P9 10P12
12-Feb-15 25.17 5.88 5.92 5.98 6.18 6.23 6.33
11-Feb-15 25.9 6.05 6.09 6.15 6.28 6.31 6.39
10-Feb-15 26.38 5.94 6.05 6.15 6.33 6.39 6.46
Code
xls = pd.ExcelFile('e:/Data.xlsx')
vols = xls.parse(asset.upper()+'VOL',header=1)
vols.set_index('Timestamp',inplace=True)
Data before set_index
Timestamp 50D 10P1 10P2 10P3 10P6 10P9 10P12 25P1 25P2 \
0 2015-02-12 25.17 5.88 5.92 5.98 6.18 6.23 6.33 2.98 3.08
1 2015-02-11 25.90 6.05 6.09 6.15 6.28 6.31 6.39 3.12 3.17
2 2015-02-10 26.38 5.94 6.05 6.15 6.33 6.39 6.46 3.01 3.16
Data after set_index
50D 10P1 10P2 10P3 10P6 10P9 10P12 25P1 25P2 25P3 \
Timestamp
2015-02-12 25.17 5.88 5.92 5.98 6.18 6.23 6.33 2.98 3.08 3.21
2015-02-11 25.90 6.05 6.09 6.15 6.28 6.31 6.39 3.12 3.17 3.32
2015-02-10 26.38 5.94 6.05 6.15 6.33 6.39 6.46 3.01 3.16 3.31
Output
>>> vols.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2015-02-12, ..., NaT]
Length: 1478, Freq: None, Timezone: None
>>> vols[date(2015,2,12)]
*** KeyError: datetime.date(2015, 2, 12)
I would expect this not to fail, and I should also be able to select a range of dates. I've tried so many combinations but can't get it to work.
Using a datetime.date instance to retrieve the index won't work; you just need a string representation of the date, e.g. '2015-02-12' or '2015/02/14'.
Secondly, vols[date(2015,2,12)] actually looks in your DataFrame's column headings, not the index. You can use loc to fetch rows by index label instead. For example, you could write vols.loc['2015-02-12'].
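For the date-range part of the question, string slicing with loc works on a sorted DatetimeIndex; a quick sketch using the dates above:

vols = vols.sort_index()                       # slicing needs a monotonic index
day = vols.loc['2015-02-12']                   # a single date
window = vols.loc['2015-02-10':'2015-02-12']   # inclusive date range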
