Access single cell of pandas dataframe? - python

I have the following data with some missing holes. I've looked over the 'how to handle missing data' but can't find anything that applies in this situation. Here is the data:
Species GearUsed AverageFishWeight(lbs) NormalRange(lbs) Caught
0 BlackBullhead Gillnet 0.11 0.8-7.7 0.18
1 BlackCrappie Trapnet 6.22 0.7-3.4 0.30
2 NaN Gillnet 1.00 0.6-3.5 0.30
3 Bluegill Trapnet 11.56 6.1-46.6 0.14
4 NaN Gillnet 1.44 NaN 0.21
5 BrownBullhead Trapnet 0.11 0.4-2.1 1.01
6 NorthernPike Trapnet 0.22 NaN 4.32
7 NaN Gillnet 2.22 3.5-10.5 5.63
8 Pumpkinseed Trapnet 0.89 2.0-8.5 0.23
9 RockBass Trapnet 0.22 0.5-1.8 0.04
10 Walleye Trapnet 0.22 0.3-0.7 0.28
11 NaN Gillnet 1.56 1.3-5.0 2.54
12 WhiteSucker Trapnet 0.33 0.3-1.4 2.76
13 NaN Gillnet 1.78 0.5-2.7 1.32
14 YellowPerch Trapnet 1.33 0.5-3.3 0.14
15 NaN Gillnet 27.67 3.4-43.6 0.14
I need the NaNs in the species column to just be the name above it, for example row 2 would be BlackCrappie. I would like to iterate through the frame and manually specify the species name but am not too sure of how, and also other answers recommend against iterating through the dataframe in the first place.
How do I access each cell of the frame individually? Thanks!
PS the column names are incorrect, there is not a 27lb yellow perch. :)

Do you want to fill the missing values in other rows as well? Seems to be what fillna() is for:
In [83]:
print df.fillna(method='pad')
Species GearUsed AverageFishWeight(lbs) NormalRange(lbs) Caught
0 BlackBullhead Gillnet 0.11 0.8-7.7 0.18
1 BlackCrappie Trapnet 6.22 0.7-3.4 0.30
2 BlackCrappie Gillnet 1.00 0.6-3.5 0.30
3 Bluegill Trapnet 11.56 6.1-46.6 0.14
4 Bluegill Gillnet 1.44 6.1-46.6 0.21
5 BrownBullhead Trapnet 0.11 0.4-2.1 1.01
6 NorthernPike Trapnet 0.22 0.4-2.1 4.32
7 NorthernPike Gillnet 2.22 3.5-10.5 5.63
8 Pumpkinseed Trapnet 0.89 2.0-8.5 0.23
9 RockBass Trapnet 0.22 0.5-1.8 0.04
10 Walleye Trapnet 0.22 0.3-0.7 0.28
11 Walleye Gillnet 1.56 1.3-5.0 2.54
12 WhiteSucker Trapnet 0.33 0.3-1.4 2.76
13 WhiteSucker Gillnet 1.78 0.5-2.7 1.32
14 YellowPerch Trapnet 1.33 0.5-3.3 0.14
15 YellowPerch Gillnet 27.67 3.4-43.6 0.14

Related

How to list the specific countries in a df which have NaN values?

I have created the df_nan below which shows the sum of NaN values from the main df, which, as seen, shows how many are in each specific column.
However, I want to create a new df, which has a column/index of countries, then another with the number of NaN values for the given country.
Country Number of NaN Values
Aruba 4
Finland 3
I feel like I have to use groupby, to create something along the lines of this below, but .isna is not an attribute of the groupby function. Any help would be great, thanks!
df_nan2= df_nan.groupby(['Country']).isna().sum()
Current code
import pandas as pd
import seaborn as sns
import numpy as np
from scipy.stats import spearmanr
# given dataframe df
df = pd.read_csv('countries.csv')
df.drop(columns= ['Population (millions)', 'HDI', 'GDP per Capita','Fish Footprint','Fishing Water',
'Urban Land','Earths Required', 'Countries Required', 'Data Quality'], axis=1, inplace = True)
df_nan= df.isna().sum()
Head of main df
0 Afghanistan Middle East/Central Asia 0.30 0.20 0.08 0.18 0.79 0.24 0.20 0.02 0.50 -0.30
1 Albania Northern/Eastern Europe 0.78 0.22 0.25 0.87 2.21 0.55 0.21 0.29 1.18 -1.03
2 Algeria Africa 0.60 0.16 0.17 1.14 2.12 0.24 0.27 0.03 0.59 -1.53
3 Angola Africa 0.33 0.15 0.12 0.20 0.93 0.20 1.42 0.64 2.55 1.61
4 Antigua and Barbuda Latin America NaN NaN NaN NaN 5.38 NaN NaN NaN 0.94 -4.44
5 Argentina Latin America 0.78 0.79 0.29 1.08 3.14 2.64 1.86 0.66 6.92 3.78
6 Armenia Middle East/Central Asia 0.74 0.18 0.34 0.89 2.23 0.44 0.26 0.10 0.89 -1.35
7 Aruba Latin America NaN NaN NaN NaN 11.88 NaN NaN NaN 0.57 -11.31
8 Australia Asia-Pacific 2.68 0.63 0.89 4.85 9.31 5.42 5.81 2.01 16.57 7.26
9 Austria European Union 0.82 0.27 0.63 4.14 6.06 0.71 0.16 2.04 3.07 -3.00
10 Azerbaijan Middle East/Central Asia 0.66 0.22 0.11 1.25 2.31 0.46 0.20 0.11 0.85 -1.46
11 Bahamas Latin America 0.97 1.05 0.19 4.46 6.84 0.05 0.00 1.18 9.55 2.71
12 Bahrain Middle East/Central Asia 0.52 0.45 0.16 6.19 7.49 0.01 0.00 0.00 0.58 -6.91
13 Bangladesh Asia-Pacific 0.29 0.00 0.08 0.26 0.72 0.25 0.00 0.00 0.38 -0.35
14 Barbados Latin America 0.56 0.24 0.14 3.28 4.48 0.08 0.00 0.02 0.19 -4.29
15 Belarus Northern/Eastern Europe 1.32 0.12 0.91 2.57 5.09 1.52 0.30 1.71 3.64 -1.45
16 Belgium European Union 1.15 0.48 0.99 4.43 7.44 0.56 0.03 0.28 1.19 -6.25
17 Benin Africa 0.49 0.04 0.26 0.51 1.41 0.44 0.04 0.34 0.88 -0.53
18 Bermuda North America NaN NaN NaN NaN 5.77 NaN NaN NaN 0.13 -5.64
19 Bhutan Asia-Pacific 0.50 0.42 3.03 0.63 4.84 0.28 0.34 4.38 5.27 0.43
Nan head
Country 0
Region 0
Cropland Footprint 15
Grazing Footprint 15
Forest Footprint 15
Carbon Footprint 15
Total Ecological Footprint 0
Cropland 15
Grazing Land 15
Forest Land 15
Total Biocapacity 0
Biocapacity Deficit or Reserve 0
dtype: int64
Suppose, you want to get Null count for each Country from "Cropland Footprint" column, then you can use the following code -
Unique_Country = df['Country'].unique()
Col1 = 'Cropland Footprint'
NullCount = []
for i in Unique_Country:
s = df[df['Country']==i][Col1].isnull().sum()
NullCount.append(s)
df2 = pd.DataFrame({'Country': Unique_Country,
'Number of NaN Values': NullCount})
df2 = df2[df2['Number of NaN Values']!=0]
df2
Output -
Country Number of NaN Values
Antigua and Barbuda 1
Aruba 1
Bermuda 1
If you want to get Null Count from another Column then just change the Value of Col1 variable.

Reading in a .txt file to get time series from rows of years and columns of monthly values

How could I read in a txt file like the one from
https://psl.noaa.gov/data/correlation/pna.data (example below)
1960 -0.16 -0.22 -0.69 -0.07 0.99 1.20 1.11 1.85 -0.01 0.48 -0.52 1.15
1961 1.16 0.17 0.28 -1.14 -0.25 1.84 -0.52 0.47 1.10 -1.94 -0.40 -1.54
1962 -0.74 -0.54 -0.71 -1.50 -1.11 -0.97 -0.36 0.57 -0.83 1.33 0.53 -0.38
1963 0.09 0.79 -2.04 -0.79 -0.95 0.50 -1.10 -1.01 0.87 0.93 -0.31 1.46
1964 -0.44 1.36 -1.31 -1.30 -2.27 0.27 0.20 0.83 0.92 0.80 -0.78 -2.03
1965 -0.92 -1.03 -0.80 -1.07 -0.42 1.89 -1.26 0.32 0.36 1.42 -0.81 -1.56
into a pandas dataframe to plot as a time series, for example from 1960-1965 with each value column (corresponding to months) being plotted? I rarely use .txt's
Here's what you can try:
import pandas as pd
import requests
import re
aa=requests.get("https://psl.noaa.gov/data/correlation/pna.data").text
aa=aa.split("\n")[1:-4]
aa=list(map(lambda x:x[1:],aa))
aa="\n".join(aa)
aa=re.sub(" +",",",aa)
with open("test.csv","w") as f:
f.write(aa)
df=pd.read_csv("test.csv", header=None, index_col=0).rename_axis('Year')
df.columns=list(pd.date_range(start='2021-01', freq='M', periods=12).month_name())
print(df.head())
df.to_csv("test.csv")
This is going to give you, in test.csv file:
Year
January
February
March.....
up to December
1948
73
67
67
773....
1949
73
67
67
773....
1950
73
67
67
773....
....
..
..
..
.......
....
..
..
..
.......
2021
73
88
84
733....
Use pd.read_fwf as suggested by #SanskarSingh
>>> pd.read_fwf('data.txt', header=None, index_col=0).rename_axis('Year')
1 2 3 4 5 6 7 8 9 10 11 12
Year
1960 -0.16 -0.22 -0.69 -0.07 0.99 1.20 1.11 1.85 -0.01 0.48 -0.52 1.15
1961 1.16 0.17 0.28 -1.14 -0.25 1.84 -0.52 0.47 1.10 -1.94 -0.40 -1.54
1962 -0.74 -0.54 -0.71 -1.50 -1.11 -0.97 -0.36 0.57 -0.83 1.33 0.53 -0.38
1963 0.09 0.79 -2.04 -0.79 -0.95 0.50 -1.10 -1.01 0.87 0.93 -0.31 1.46
1964 -0.44 1.36 -1.31 -1.30 -2.27 0.27 0.20 0.83 0.92 0.80 -0.78 -2.03
1965 -0.92 -1.03 -0.80 -1.07 -0.42 1.89 -1.26 0.32 0.36 1.42 -0.81 -1.56

How to fill missing timestamps in pandas

I have a CSV file as below:
t dd hh v.amm v.alc v.no2 v.cmo aqi
0 201811170000 17 0 0.40 0.41 1.33 1.55 2.45
1 201811170002 17 0 0.40 0.41 1.34 1.51 2.46
2 201811170007 17 0 0.40 0.37 1.35 1.45 2.40
Now I have to fill in the missing minutes by last observation carried forward. Expected output:
t dd hh v.amm v.alc v.no2 v.cmo aqi
0 201811170000 17 0 0.40 0.41 1.33 1.55 2.45
1 201811170001 17 0 0.40 0.41 1.33 1.55 2.45
2 201811170002 17 0 0.40 0.41 1.34 1.51 2.46
2 201811170003 17 0 0.40 0.41 1.34 1.51 2.46
2 201811170004 17 0 0.40 0.41 1.34 1.51 2.46
2 201811170005 17 0 0.40 0.41 1.34 1.51 2.46
2 201811170006 17 0 0.40 0.41 1.34 1.51 2.46
3 201811170007 17 0 0.40 0.37 1.35 1.45 2.40
I tried following this link but unable to achieve the expected output. Sorry I'm new to coding.
First create DatetimeIndex by to_datetime and DataFrame.set_index and then change frequency by DataFrame.asfreq:
df['t'] = pd.to_datetime(df['t'], format='%Y%m%d%H%M')
df = df.set_index('t').sort_index().asfreq('Min', method='ffill')
print (df)
dd hh v.amm v.alc v.no2 v.cmo aqi
t
2018-11-17 00:00:00 17 0 0.4 0.41 1.33 1.55 2.45
2018-11-17 00:01:00 17 0 0.4 0.41 1.33 1.55 2.45
2018-11-17 00:02:00 17 0 0.4 0.41 1.34 1.51 2.46
2018-11-17 00:03:00 17 0 0.4 0.41 1.34 1.51 2.46
2018-11-17 00:04:00 17 0 0.4 0.41 1.34 1.51 2.46
2018-11-17 00:05:00 17 0 0.4 0.41 1.34 1.51 2.46
2018-11-17 00:06:00 17 0 0.4 0.41 1.34 1.51 2.46
2018-11-17 00:07:00 17 0 0.4 0.37 1.35 1.45 2.40
Or use DataFrame.resample with Resampler.ffill:
df['t'] = pd.to_datetime(df['t'], format='%Y%m%d%H%M')
df = df.set_index('t').sort_index().resample('Min').ffill()

How to make histogram using pandas

In this problem, a .txt file is read using pandas. The number of genes needs to be calculated, and a histogram needs to be made for the specific sample and the amount of interaction with each gene.
I have tried using .transpose(), as well as, using value_counts() to access the appropriate information; however, because of it being in a row, and the way the table is set up, I cannot figure out how to get the appropriate histogram.
Use Pandas to read the file. Write a program to answer the following questions:
How many samples are in the data set?
How many genes are in the data set?
Which sample has the lowest average expression of genes?
Plot a histogram showing the distribution of the IL6 expression
across all samples.
Data:
protein M-12 M-24 M-36 M-48 M+ANDV-12 M+ANDV-24 M+ANDV-36 M+ANDV-48 M+SNV-12 M+SNV-24 M+SNV-36 M+SNV-48
ARG1 -11.67 -9.92 -4.37 -11.92 -3.62 -9.38 -11.54 -4.88 -3.59 -2.96 -4.95 -4.31
CASP3 0.05 -0.05 -0.18 0.02 0.04 0.14 -0.35 -0.41 0.24 0.23 -0.40 -0.36
CASP7 -1.40 -0.05 -0.78 -1.33 -0.43 0.63 -1.39 -0.95 0.81 1.45 0.09 0.11
CCL22 -0.96 1.47 0.37 -1.48 1.34 2.72 -11.12 -1.05 -0.63 1.42 0.30 0.12
CCL5 -5.59 -3.84 -4.64 -5.84 -5.19 -5.24 -5.45 -5.45 -2.86 -4.53 -4.80 -6.46
CCR7 -11.26 -9.50 -2.96 -11.50 -2.35 -2.31 -11.12 -3.66 -3.18 -1.31 -2.48 -2.84
CD14 2.85 4.14 3.87 4.33 1.16 3.28 3.68 3.74 1.20 2.80 3.23 2.79
CD200R1 -11.67 -9.92 -5.37 -11.92 -4.61 -9.38 -11.54 -11.54 -3.59 -2.96 -4.54 -4.89
CD274 -5.59 -9.92 -4.64 -5.84 -1.78 -3.30 -5.45 -5.45 -4.17 -10.61 -4.80 -4.48
CD80 -6.57 -9.50 -4.96 -6.82 -6.17 -4.28 -6.43 -6.43 -3.18 -5.51 -5.12 -4.16
CD86 0.14 0.94 0.87 1.12 -0.23 0.58 1.09 0.66 -0.15 0.42 0.74 0.49
CXCL10 -6.57 -2.85 -4.96 -6.82 -4.20 -2.31 -4.47 -4.47 -2.38 -2.74 -5.12 -4.67
CXCL11 -5.28 -9.50 -5.63 -11.50 -10.85 -8.97 -11.12 -11.12 -9.83 -10.20 -5.79 -6.14
IDO1 -5.02 -9.92 -4.37 -5.26 -4.61 -2.72 -4.88 -4.88 -2.60 -3.96 -4.54 -5.88
IFNA1 -11.67 -9.92 -5.37 -5.26 -11.27 -9.38 -11.54 -4.88 -3.59 -10.61 -6.52 -5.88
IFNB1 -11.67 -9.92 -6.35 -11.92 -11.27 -9.38 -11.54 -11.54 -10.25 -10.61 -12.19 -12.54
IFNG -2.09 -1.21 -1.66 -2.24 -2.75 -2.50 -2.83 -3.22 -2.48 -1.60 -2.13 -2.48
IFR3 -0.39 0.05 -0.21 0.15 -0.27 0.07 -0.01 -0.11 -0.28 0.28 0.04 -0.09
IL10 -1.53 -0.21 -0.51 0.45 -3.40 -1.00 -0.51 -0.04 -2.38 -1.55 -0.25 -0.72
IL12A -11.67 -9.92 -4.79 -11.92 -3.30 -3.71 -11.54 -11.54 -10.25 -3.38 -4.22 -4.09
IL15 -1.91 -2.53 -3.50 -3.85 -2.75 -9.38 -4.15 -4.15 -2.19 -2.09 -2.81 -3.16
IL1A -4.28 -2.53 -2.26 -3.39 -2.12 -0.51 -11.54 -2.67 -1.73 -1.75 -2.13 -1.84
IL1B -1.61 -2.53 -0.31 -0.16 0.77 -3.30 -1.95 -0.21 -1.73 -2.55 -0.65 -0.64
IL1RN 3.14 -0.40 -1.54 -3.53 3.95 0.76 0.15 -3.15 3.34 0.95 -1.23 -1.02
IL6 -4.60 -0.21 -1.82 -3.53 -1.25 0.76 -11.12 -2.47 -0.94 -0.60 -1.61 -1.74
IL8 5.43 5.04 4.57 4.22 5.67 5.06 4.30 4.53 4.84 4.53 4.25 3.79
IRF7 0.14 0.97 -0.13 -0.72 0.83 1.85 -0.19 -0.19 1.01 0.62 0.07 -0.03
ITGAM -1.68 0.91 0.28 -0.12 0.67 1.73 -0.30 -0.07 1.21 1.28 0.71 1.21
NFKB1 0.80 0.31 0.29 0.43 1.21 -0.74 0.39 0.02 0.15 -0.02 0.01 -0.09
NOS2 -11.26 -3.52 -4.50 -5.52 -4.87 -2.98 -5.14 -5.14 -3.85 -4.22 -5.79 -6.14
PPARG 0.68 0.23 0.02 -1.16 0.56 1.38 0.80 -0.95 1.17 1.04 1.09 0.94
TGFB1 3.99 3.21 2.41 2.62 4.05 3.48 2.87 2.15 3.68 2.97 2.46 2.31
TLR3 -3.61 -1.85 -1.72 -11.92 -2.40 -1.32 -11.54 -11.54 -0.57 0.09 -1.32 -1.60
TLR7 -3.80 -2.05 -1.64 -0.35 -6.17 -4.28 -2.47 -1.75 -3.18 -3.54 -1.86 -2.84
TNF 1.09 0.53 0.71 1.17 1.91 0.58 1.04 1.41 1.20 1.18 1.13 0.66
VEGFA -2.36 -2.85 -3.64 -3.53 -3.40 -4.28 -4.47 -4.47 -5.15 -5.51 -4.32 -4.67
df=pd.read_csv('../Data/virus_miniset0.txt', sep='\t')
len(df['Sample'])
df
Set the index, in order to properly transpose:
in tabular data, the top row should indicate the name of each column
in this data, the first header was named sample, with all the M prefixed names being the samples.
sample was renamed to protein to properly identify the column.
Current Data:
import pandas as pd
import matplotlib.pylot as plt
import seaborn as sns
df.set_index('protein', inplace=True)
Transpose:
df_sample = df.T
df_sample.reset_index(inplace=True)
df_sample.rename(columns={'index': 'sample'}, inplace=True)
df_sample.set_index('sample', inplace=True)
How many samples:
len(df_sample.index)
>>> 12
How many proteins / genes:
len(df_sample.columns)
>>> 36
Lowest average expression:
find the mean and then find the min
df_sample.mean().min() works, but doesn't include the protein name, just the value.
protein_avg = df_sample.mean()
protein_avg[protein_avg == df_sample.mean().min()]
>>> protein
IFNB1 -10.765
dtype: float64
The following boxplot of all genes, confirms IFNB1 as the protein with the lowest average expression across samples, and shows IL8 as the protein with highest average expression.
Boxplot:
seaborn to make your plots look nicer
plt.figure(figsize=(12, 8))
g = sns.boxplot(data=df_sample)
for item in g.get_xticklabels():
item.set_rotation(90)
plt.show()
Alternate Boxplot:
plt.figure(figsize=(8, 8))
sns.boxplot('IL6', data=df_sample, orient='v')
plt.show()
IL6 Histogram:
sns.distplot(df_sample.IL6)
plt.show()
Bonus Plot - Heatmap:
I thought you might like this
plt.figure(figsize=(20, 8))
sns.heatmap(df_sample, annot=True, annot_kws={"size": 7}, cmap='PiYG')
plt.show()
M-12 and M+SNV-48 are only half size in the plot. This will be resolved in the forthcoming matplotlib v3.1.2

Group by - select most recent 4 events

I have the following df in pandas:
df:
DATE STOCK DATA1 DATA2 DATA3
01/01/12 ABC 0.40 0.88 0.22
04/01/12 ABC 0.50 0.49 0.13
07/01/12 ABC 0.85 0.36 0.83
10/01/12 ABC 0.28 0.12 0.39
01/01/13 ABC 0.86 0.87 0.58
04/01/13 ABC 0.95 0.39 0.87
07/01/13 ABC 0.60 0.25 0.56
10/01/13 ABC 0.15 0.28 0.69
01/01/11 XYZ 0.94 0.40 0.50
04/01/11 XYZ 0.65 0.19 0.81
07/01/11 XYZ 0.89 0.59 0.69
10/01/11 XYZ 0.12 0.09 0.18
01/01/12 XYZ 0.25 0.94 0.55
04/01/12 XYZ 0.07 0.22 0.67
07/01/12 XYZ 0.46 0.08 0.54
10/01/12 XYZ 0.04 0.03 0.94
...
I want to group by the stocks, sort by date and then for specified columns (in this case DATA1 and DATA3), I want to get the last four items summed (TTM data).
The output would look like this:
DATE STOCK DATA1 DATA2 DATA3 DATA1_TTM DATA3_TTM
01/01/12 ABC 0.40 0.88 0.22 NaN NaN
04/01/12 ABC 0.50 0.49 0.13 NaN NaN
07/01/12 ABC 0.85 0.36 0.83 NaN NaN
10/01/12 ABC 0.28 0.12 0.39 2.03 1.56
01/01/13 ABC 0.86 0.87 0.58 2.49 1.92
04/01/13 ABC 0.95 0.39 0.87 2.94 2.66
07/01/13 ABC 0.60 0.25 0.56 2.69 2.39
10/01/13 ABC 0.15 0.28 0.69 2.55 2.70
01/01/11 XYZ 0.94 0.40 0.50 NaN NaN
04/01/11 XYZ 0.65 0.19 0.81 NaN NaN
07/01/11 XYZ 0.89 0.59 0.69 NaN NaN
10/01/11 XYZ 0.12 0.09 0.18 2.59 2.18
01/01/12 XYZ 0.25 0.94 0.55 1.90 2.23
04/01/12 XYZ 0.07 0.22 0.67 1.33 2.09
07/01/12 XYZ 0.46 0.08 0.54 0.89 1.94
10/01/12 XYZ 0.04 0.03 0.94 0.82 2.70
...
My approach so far has been to sort by date, then group, then iterate through each group and if there are 3 older events then the current event I sum. Also, I want to check to see if the dates fall within 1 year. Can anyone offer a better way in Python? Thank you.
Added: As a clarification for the 1 year part, let's say you take the last four dates and it goes 1/1/1993, 4/1/12, 7/1/12, 10/1/12 -- a data error. I wouldn't want to sum those four. I would want that one to say NaN.
For this I think you can use transform and rolling_sum. Starting from your dataframe, I might do something like:
>>> df["DATE"] = pd.to_datetime(df["DATE"]) # switch to datetime to ease sorting
>>> df = df.sort(["STOCK", "DATE"])
>>> rsum_columns = "DATA1", "DATA3"
>>> grouped = df.groupby("STOCK")[rsum_columns]
>>> new_columns = grouped.transform(lambda x: pd.rolling_sum(x, 4))
>>> df[new_columns.columns + "_TTM"] = new_columns
>>> df
DATE STOCK DATA1 DATA2 DATA3 DATA1_TTM DATA3_TTM
0 2012-01-01 00:00:00 ABC 0.40 0.88 0.22 NaN NaN
1 2012-04-01 00:00:00 ABC 0.50 0.49 0.13 NaN NaN
2 2012-07-01 00:00:00 ABC 0.85 0.36 0.83 NaN NaN
3 2012-10-01 00:00:00 ABC 0.28 0.12 0.39 2.03 1.57
4 2013-01-01 00:00:00 ABC 0.86 0.87 0.58 2.49 1.93
5 2013-04-01 00:00:00 ABC 0.95 0.39 0.87 2.94 2.67
6 2013-07-01 00:00:00 ABC 0.60 0.25 0.56 2.69 2.40
7 2013-10-01 00:00:00 ABC 0.15 0.28 0.69 2.56 2.70
8 2011-01-01 00:00:00 XYZ 0.94 0.40 0.50 NaN NaN
9 2011-04-01 00:00:00 XYZ 0.65 0.19 0.81 NaN NaN
10 2011-07-01 00:00:00 XYZ 0.89 0.59 0.69 NaN NaN
11 2011-10-01 00:00:00 XYZ 0.12 0.09 0.18 2.60 2.18
12 2012-01-01 00:00:00 XYZ 0.25 0.94 0.55 1.91 2.23
13 2012-04-01 00:00:00 XYZ 0.07 0.22 0.67 1.33 2.09
14 2012-07-01 00:00:00 XYZ 0.46 0.08 0.54 0.90 1.94
15 2012-10-01 00:00:00 XYZ 0.04 0.03 0.94 0.82 2.70
[16 rows x 7 columns]
I don't know what you're asking by "Also, I want to check to see if the dates fall within 1 year", so I'll leave that alone.

Categories

Resources