Convert multiple dataframes into a multi-index dataframe - python

I have a bunch of stock data downloaded from yahoo finance. Each dataframe looks like this:
Date Open High Low Close Adj Close Volume
0 2019-03-11 2.73 2.81 2.71 2.75 2.75 243900
1 2019-03-12 2.66 2.78 2.66 2.75 2.75 69200
2 2019-03-13 2.75 2.80 2.71 2.77 2.77 61200
3 2019-03-14 2.77 2.79 2.75 2.75 2.75 48800
4 2019-03-15 2.76 2.79 2.75 2.79 2.79 124400
.. ... ... ... ... ... ... ...
282 2020-04-22 3.61 3.75 3.61 3.71 3.71 312900
283 2020-04-23 3.74 3.77 3.66 3.76 3.76 99800
284 2020-04-24 3.78 3.78 3.63 3.63 3.63 89100
285 2020-04-27 3.70 3.70 3.55 3.64 3.64 60600
286 2020-04-28 3.70 3.74 3.64 3.70 3.70 248300
I need to concat the data so it looks like the multi-index format below, and I'm at a loss. I've tried a number of pd.concat([list of dfs], zip(cols,symbols), axis=[0,1]) combos with no luck, so any help is appreciated!
Adj Close Close High Low Open Volume
CHNR GNSS SGRP CHNR GNSS SGRP CHNR GNSS SGRP CHNR GNSS SGRP CHNR GNSS SGRP CHNR GNSS SGRP
Date
2019-04-30 1.85 3.08 0.69 1.85 3.08 0.69 1.94 3.10 0.70 1.74 3.05 0.67 1.74 3.07 0.70 24800 23900 30400
2019-05-01 1.81 3.15 0.65 1.81 3.15 0.65 1.85 3.17 0.69 1.75 3.06 0.62 1.76 3.09 0.67 15500 72800 85900
2019-05-02 1.80 3.12 0.66 1.80 3.12 0.66 1.87 3.16 0.66 1.76 3.10 0.65 1.80 3.16 0.65 12900 28100 97200
2019-05-03 1.85 3.14 0.67 1.85 3.14 0.67 1.89 3.19 0.69 1.74 3.06 0.62 1.74 3.12 0.62 43200 31300 27500
2019-05-06 1.85 3.13 0.66 1.85 3.13 0.66 1.89 3.25 0.69 1.75 3.11 0.65 1.79 3.11 0.67 37000 50200 31500
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2020-04-22 0.93 3.71 0.73 0.93 3.71 0.73 1.04 3.75 0.73 0.93 3.61 0.69 0.93 3.61 0.72 2600 312900 14600
2020-04-23 1.01 3.76 0.74 1.01 3.76 0.74 1.01 3.77 0.77 0.94 3.66 0.73 0.94 3.74 0.73 2500 99800 15200
2020-04-24 1.05 3.63 0.76 1.05 3.63 0.76 1.05 3.78 0.77 0.92 3.63 0.74 1.05 3.78 0.74 4400 89100 1300
2020-04-27 1.03 3.64 0.76 1.03 3.64 0.76 1.07 3.70 0.77 0.92 3.55 0.76 1.07 3.70 0.77 6200 60600 3500
2020-04-28 1.00 3.70 0.77 1.00 3.70 0.77 1.07 3.74 0.77 0.96 3.64 0.75 1.07 3.70 0.77 22300 248300 26100
EDIT per Quang Hoang's suggestion:
Tried:
ret = pd.concat(stock_data.values(), keys=stocks, axis=1)
ret = ret.swaplevel(0, 1, axis=1)
Got the following output which looks much closer but still off a bit:
Date Open High Low Close Adj Close Volume Date Open High Low Close Adj Close Volume Date Open High Low Close Adj Close Volume
CHNR CHNR CHNR CHNR CHNR CHNR CHNR GNSS GNSS GNSS GNSS GNSS GNSS GNSS SGRP SGRP SGRP SGRP SGRP SGRP SGRP
0 2010-04-29 11.39 11.74 11.39 11.57 11.57 3100 2019-03-11 2.73 2.81 2.71 2.75 2.75 243900.0 2010-04-29 0.79 0.79 0.79 0.79 0.79 0
1 2010-04-30 11.60 11.61 11.50 11.56 11.56 5400 2019-03-12 2.66 2.78 2.66 2.75 2.75 69200.0 2010-04-30 0.79 0.79 0.79 0.79 0.79 0
2 2010-05-03 11.95 11.95 11.22 11.44 11.44 19400 2019-03-13 2.75 2.80 2.71 2.77 2.77 61200.0 2010-05-03 0.79 0.79 0.79 0.79 0.79 0
3 2010-05-04 11.20 11.49 11.20 11.46 11.46 10700 2019-03-14 2.77 2.79 2.75 2.75 2.75 48800.0 2010-05-04 0.79 0.79 0.66 0.79 0.79 9700
4 2010-05-05 11.50 11.60 11.25 11.50 11.50 13400 2019-03-15 2.76 2.79 2.75 2.79 2.79 124400.0 2010-05-05 0.69 0.80 0.67 0.80 0.80 6700
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2512 2020-04-22 0.93 1.04 0.93 0.93 0.93 2600 NaT NaN NaN NaN NaN NaN NaN 2020-04-22 0.72 0.73 0.69 0.73 0.73 14600
2513 2020-04-23 0.94 1.01 0.94 1.01 1.01 2500 NaT NaN NaN NaN NaN NaN NaN 2020-04-23 0.73 0.77 0.73 0.74 0.74 15200
2514 2020-04-24 1.05 1.05 0.92 1.05 1.05 4400 NaT NaN NaN NaN NaN NaN NaN 2020-04-24 0.74 0.77 0.74 0.76 0.76 1300
2515 2020-04-27 1.07 1.07 0.92 1.03 1.03 6200 NaT NaN NaN NaN NaN NaN NaN 2020-04-27 0.77 0.77 0.76 0.76 0.76 3500
2516 2020-04-28 1.07 1.07 0.96 1.00 1.00 22300 NaT NaN NaN NaN NaN NaN NaN 2020-04-28 0.77 0.77 0.75 0.77 0.77 26100
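The frames above are being aligned on their default RangeIndex rather than on dates, which is why different dates land on the same row and the shorter frame gets NaT/NaN padding. A sketch of a fix, assuming stock_data is the dict of per-ticker frames used above: index each frame by Date before concatenating, then swap and sort the column levels.
import pandas as pd

# Align rows on dates rather than on row position.
frames = {sym: df.set_index('Date') for sym, df in stock_data.items()}
ret = pd.concat(frames, axis=1)  # dict keys become the outer column level

# Move the field names (Open, High, ...) to the outer level, tickers inner,
# and sort so the tickers group cleanly under each field.
ret = ret.swaplevel(0, 1, axis=1).sort_index(axis=1)
print(ret.head())
From there, ret['Close'] gives a plain Date-by-ticker frame for any single field.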

Related

How to list the specific countries in a df which have NaN values?

I have created df_nan below, which shows the sum of NaN values from the main df, i.e. how many there are in each specific column.
However, I want to create a new df with a column/index of countries and another column holding the number of NaN values for the given country.
Country Number of NaN Values
Aruba 4
Finland 3
I feel like I have to use groupby to create something along the lines of the below, but .isna is not an attribute of a groupby object. Any help would be great, thanks!
df_nan2= df_nan.groupby(['Country']).isna().sum()
Current code
import pandas as pd
import seaborn as sns
import numpy as np
from scipy.stats import spearmanr

# given dataframe df
df = pd.read_csv('countries.csv')
df.drop(columns=['Population (millions)', 'HDI', 'GDP per Capita', 'Fish Footprint', 'Fishing Water',
                 'Urban Land', 'Earths Required', 'Countries Required', 'Data Quality'], inplace=True)
df_nan = df.isna().sum()
Head of main df
0 Afghanistan Middle East/Central Asia 0.30 0.20 0.08 0.18 0.79 0.24 0.20 0.02 0.50 -0.30
1 Albania Northern/Eastern Europe 0.78 0.22 0.25 0.87 2.21 0.55 0.21 0.29 1.18 -1.03
2 Algeria Africa 0.60 0.16 0.17 1.14 2.12 0.24 0.27 0.03 0.59 -1.53
3 Angola Africa 0.33 0.15 0.12 0.20 0.93 0.20 1.42 0.64 2.55 1.61
4 Antigua and Barbuda Latin America NaN NaN NaN NaN 5.38 NaN NaN NaN 0.94 -4.44
5 Argentina Latin America 0.78 0.79 0.29 1.08 3.14 2.64 1.86 0.66 6.92 3.78
6 Armenia Middle East/Central Asia 0.74 0.18 0.34 0.89 2.23 0.44 0.26 0.10 0.89 -1.35
7 Aruba Latin America NaN NaN NaN NaN 11.88 NaN NaN NaN 0.57 -11.31
8 Australia Asia-Pacific 2.68 0.63 0.89 4.85 9.31 5.42 5.81 2.01 16.57 7.26
9 Austria European Union 0.82 0.27 0.63 4.14 6.06 0.71 0.16 2.04 3.07 -3.00
10 Azerbaijan Middle East/Central Asia 0.66 0.22 0.11 1.25 2.31 0.46 0.20 0.11 0.85 -1.46
11 Bahamas Latin America 0.97 1.05 0.19 4.46 6.84 0.05 0.00 1.18 9.55 2.71
12 Bahrain Middle East/Central Asia 0.52 0.45 0.16 6.19 7.49 0.01 0.00 0.00 0.58 -6.91
13 Bangladesh Asia-Pacific 0.29 0.00 0.08 0.26 0.72 0.25 0.00 0.00 0.38 -0.35
14 Barbados Latin America 0.56 0.24 0.14 3.28 4.48 0.08 0.00 0.02 0.19 -4.29
15 Belarus Northern/Eastern Europe 1.32 0.12 0.91 2.57 5.09 1.52 0.30 1.71 3.64 -1.45
16 Belgium European Union 1.15 0.48 0.99 4.43 7.44 0.56 0.03 0.28 1.19 -6.25
17 Benin Africa 0.49 0.04 0.26 0.51 1.41 0.44 0.04 0.34 0.88 -0.53
18 Bermuda North America NaN NaN NaN NaN 5.77 NaN NaN NaN 0.13 -5.64
19 Bhutan Asia-Pacific 0.50 0.42 3.03 0.63 4.84 0.28 0.34 4.38 5.27 0.43
Head of df_nan
Country 0
Region 0
Cropland Footprint 15
Grazing Footprint 15
Forest Footprint 15
Carbon Footprint 15
Total Ecological Footprint 0
Cropland 15
Grazing Land 15
Forest Land 15
Total Biocapacity 0
Biocapacity Deficit or Reserve 0
dtype: int64
Suppose you want to get the null count for each country from the "Cropland Footprint" column. Then you can use the following code:
Unique_Country = df['Country'].unique()
Col1 = 'Cropland Footprint'
NullCount = []
for i in Unique_Country:
    s = df[df['Country'] == i][Col1].isnull().sum()
    NullCount.append(s)
df2 = pd.DataFrame({'Country': Unique_Country,
                    'Number of NaN Values': NullCount})
df2 = df2[df2['Number of NaN Values'] != 0]
df2
Output:
Country Number of NaN Values
Antigua and Barbuda 1
Aruba 1
Bermuda 1
If you want to get the null count from another column, just change the value of the Col1 variable.
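A vectorized alternative (a sketch of the same idea) counts NaNs per country across every column at once by grouping the boolean frame that isna() returns:
# isna() yields a boolean frame; grouping it by the Country column
# and summing counts the missing cells per country and column.
nan_by_country = df.drop(columns='Country').isna().groupby(df['Country']).sum()

# Total NaNs per country, keeping only countries that have any:
totals = nan_by_country.sum(axis=1)
print(totals[totals > 0])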

Reading in a .txt file to get time series from rows of years and columns of monthly values

How could I read in a txt file like the one from
https://psl.noaa.gov/data/correlation/pna.data (example below)
1960 -0.16 -0.22 -0.69 -0.07 0.99 1.20 1.11 1.85 -0.01 0.48 -0.52 1.15
1961 1.16 0.17 0.28 -1.14 -0.25 1.84 -0.52 0.47 1.10 -1.94 -0.40 -1.54
1962 -0.74 -0.54 -0.71 -1.50 -1.11 -0.97 -0.36 0.57 -0.83 1.33 0.53 -0.38
1963 0.09 0.79 -2.04 -0.79 -0.95 0.50 -1.10 -1.01 0.87 0.93 -0.31 1.46
1964 -0.44 1.36 -1.31 -1.30 -2.27 0.27 0.20 0.83 0.92 0.80 -0.78 -2.03
1965 -0.92 -1.03 -0.80 -1.07 -0.42 1.89 -1.26 0.32 0.36 1.42 -0.81 -1.56
into a pandas dataframe to plot as a time series, for example from 1960-1965, with each value column (corresponding to a month) plotted? I rarely use .txt files.
Here's what you can try:
import pandas as pd
import requests
import re

aa = requests.get("https://psl.noaa.gov/data/correlation/pna.data").text
aa = aa.split("\n")[1:-4]            # drop the first line and the last four (header and footer)
aa = list(map(lambda x: x[1:], aa))  # drop each line's leading character
aa = "\n".join(aa)
aa = re.sub(" +", ",", aa)           # collapse runs of spaces into commas
with open("test.csv", "w") as f:
    f.write(aa)
df = pd.read_csv("test.csv", header=None, index_col=0).rename_axis('Year')
df.columns = list(pd.date_range(start='2021-01', freq='M', periods=12).month_name())
print(df.head())
df.to_csv("test.csv")
This is going to give you, in the test.csv file, a Year index with one column per month (January through December), with rows running from 1948 to 2021:
Year,January,February,March,...,December
1948,...
1949,...
1950,...
...
2021,...
Use pd.read_fwf as suggested by @SanskarSingh:
>>> pd.read_fwf('data.txt', header=None, index_col=0).rename_axis('Year')
1 2 3 4 5 6 7 8 9 10 11 12
Year
1960 -0.16 -0.22 -0.69 -0.07 0.99 1.20 1.11 1.85 -0.01 0.48 -0.52 1.15
1961 1.16 0.17 0.28 -1.14 -0.25 1.84 -0.52 0.47 1.10 -1.94 -0.40 -1.54
1962 -0.74 -0.54 -0.71 -1.50 -1.11 -0.97 -0.36 0.57 -0.83 1.33 0.53 -0.38
1963 0.09 0.79 -2.04 -0.79 -0.95 0.50 -1.10 -1.01 0.87 0.93 -0.31 1.46
1964 -0.44 1.36 -1.31 -1.30 -2.27 0.27 0.20 0.83 0.92 0.80 -0.78 -2.03
1965 -0.92 -1.03 -0.80 -1.07 -0.42 1.89 -1.26 0.32 0.36 1.42 -0.81 -1.56
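Either way you read it, stacking the wide year-by-month table into a single series with a DatetimeIndex makes the requested 1960-1965 plot straightforward. A sketch, assuming a local data.txt copy as in the read_fwf call above:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_fwf('data.txt', header=None, index_col=0).rename_axis('Year')
df.columns = range(1, 13)  # the twelve value columns are months 1..12

# Stack to one value per (year, month), then build a real DatetimeIndex.
s = df.stack()
s.index = pd.to_datetime([f'{y}-{m:02d}' for y, m in s.index])

s.loc['1960':'1965'].plot()
plt.show()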

How to make histogram using pandas

In this problem, a .txt file is read using pandas. The number of genes needs to be calculated, and a histogram needs to be made of a given gene's expression across the samples.
I have tried using .transpose() as well as value_counts() to access the appropriate information; however, because the values sit in a row, and because of the way the table is set up, I cannot figure out how to get the appropriate histogram.
Use Pandas to read the file. Write a program to answer the following questions:
How many samples are in the data set?
How many genes are in the data set?
Which sample has the lowest average expression of genes?
Plot a histogram showing the distribution of the IL6 expression
across all samples.
Data:
protein M-12 M-24 M-36 M-48 M+ANDV-12 M+ANDV-24 M+ANDV-36 M+ANDV-48 M+SNV-12 M+SNV-24 M+SNV-36 M+SNV-48
ARG1 -11.67 -9.92 -4.37 -11.92 -3.62 -9.38 -11.54 -4.88 -3.59 -2.96 -4.95 -4.31
CASP3 0.05 -0.05 -0.18 0.02 0.04 0.14 -0.35 -0.41 0.24 0.23 -0.40 -0.36
CASP7 -1.40 -0.05 -0.78 -1.33 -0.43 0.63 -1.39 -0.95 0.81 1.45 0.09 0.11
CCL22 -0.96 1.47 0.37 -1.48 1.34 2.72 -11.12 -1.05 -0.63 1.42 0.30 0.12
CCL5 -5.59 -3.84 -4.64 -5.84 -5.19 -5.24 -5.45 -5.45 -2.86 -4.53 -4.80 -6.46
CCR7 -11.26 -9.50 -2.96 -11.50 -2.35 -2.31 -11.12 -3.66 -3.18 -1.31 -2.48 -2.84
CD14 2.85 4.14 3.87 4.33 1.16 3.28 3.68 3.74 1.20 2.80 3.23 2.79
CD200R1 -11.67 -9.92 -5.37 -11.92 -4.61 -9.38 -11.54 -11.54 -3.59 -2.96 -4.54 -4.89
CD274 -5.59 -9.92 -4.64 -5.84 -1.78 -3.30 -5.45 -5.45 -4.17 -10.61 -4.80 -4.48
CD80 -6.57 -9.50 -4.96 -6.82 -6.17 -4.28 -6.43 -6.43 -3.18 -5.51 -5.12 -4.16
CD86 0.14 0.94 0.87 1.12 -0.23 0.58 1.09 0.66 -0.15 0.42 0.74 0.49
CXCL10 -6.57 -2.85 -4.96 -6.82 -4.20 -2.31 -4.47 -4.47 -2.38 -2.74 -5.12 -4.67
CXCL11 -5.28 -9.50 -5.63 -11.50 -10.85 -8.97 -11.12 -11.12 -9.83 -10.20 -5.79 -6.14
IDO1 -5.02 -9.92 -4.37 -5.26 -4.61 -2.72 -4.88 -4.88 -2.60 -3.96 -4.54 -5.88
IFNA1 -11.67 -9.92 -5.37 -5.26 -11.27 -9.38 -11.54 -4.88 -3.59 -10.61 -6.52 -5.88
IFNB1 -11.67 -9.92 -6.35 -11.92 -11.27 -9.38 -11.54 -11.54 -10.25 -10.61 -12.19 -12.54
IFNG -2.09 -1.21 -1.66 -2.24 -2.75 -2.50 -2.83 -3.22 -2.48 -1.60 -2.13 -2.48
IFR3 -0.39 0.05 -0.21 0.15 -0.27 0.07 -0.01 -0.11 -0.28 0.28 0.04 -0.09
IL10 -1.53 -0.21 -0.51 0.45 -3.40 -1.00 -0.51 -0.04 -2.38 -1.55 -0.25 -0.72
IL12A -11.67 -9.92 -4.79 -11.92 -3.30 -3.71 -11.54 -11.54 -10.25 -3.38 -4.22 -4.09
IL15 -1.91 -2.53 -3.50 -3.85 -2.75 -9.38 -4.15 -4.15 -2.19 -2.09 -2.81 -3.16
IL1A -4.28 -2.53 -2.26 -3.39 -2.12 -0.51 -11.54 -2.67 -1.73 -1.75 -2.13 -1.84
IL1B -1.61 -2.53 -0.31 -0.16 0.77 -3.30 -1.95 -0.21 -1.73 -2.55 -0.65 -0.64
IL1RN 3.14 -0.40 -1.54 -3.53 3.95 0.76 0.15 -3.15 3.34 0.95 -1.23 -1.02
IL6 -4.60 -0.21 -1.82 -3.53 -1.25 0.76 -11.12 -2.47 -0.94 -0.60 -1.61 -1.74
IL8 5.43 5.04 4.57 4.22 5.67 5.06 4.30 4.53 4.84 4.53 4.25 3.79
IRF7 0.14 0.97 -0.13 -0.72 0.83 1.85 -0.19 -0.19 1.01 0.62 0.07 -0.03
ITGAM -1.68 0.91 0.28 -0.12 0.67 1.73 -0.30 -0.07 1.21 1.28 0.71 1.21
NFKB1 0.80 0.31 0.29 0.43 1.21 -0.74 0.39 0.02 0.15 -0.02 0.01 -0.09
NOS2 -11.26 -3.52 -4.50 -5.52 -4.87 -2.98 -5.14 -5.14 -3.85 -4.22 -5.79 -6.14
PPARG 0.68 0.23 0.02 -1.16 0.56 1.38 0.80 -0.95 1.17 1.04 1.09 0.94
TGFB1 3.99 3.21 2.41 2.62 4.05 3.48 2.87 2.15 3.68 2.97 2.46 2.31
TLR3 -3.61 -1.85 -1.72 -11.92 -2.40 -1.32 -11.54 -11.54 -0.57 0.09 -1.32 -1.60
TLR7 -3.80 -2.05 -1.64 -0.35 -6.17 -4.28 -2.47 -1.75 -3.18 -3.54 -1.86 -2.84
TNF 1.09 0.53 0.71 1.17 1.91 0.58 1.04 1.41 1.20 1.18 1.13 0.66
VEGFA -2.36 -2.85 -3.64 -3.53 -3.40 -4.28 -4.47 -4.47 -5.15 -5.51 -4.32 -4.67
df = pd.read_csv('../Data/virus_miniset0.txt', sep='\t')
len(df['Sample'])
df
Set the index in order to properly transpose:
in tabular data, the top row should indicate the name of each column;
in this data, the first header was named sample, with all the M-prefixed names being the samples;
sample was renamed to protein to properly identify the column.
Current Data:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df.set_index('protein', inplace=True)
Transpose:
df_sample = df.T
df_sample.reset_index(inplace=True)
df_sample.rename(columns={'index': 'sample'}, inplace=True)
df_sample.set_index('sample', inplace=True)
How many samples:
len(df_sample.index)
>>> 12
How many proteins / genes:
len(df_sample.columns)
>>> 36
Lowest average expression:
find the mean and then find the min
df_sample.mean().min() works, but doesn't include the protein name, just the value.
protein_avg = df_sample.mean()
protein_avg[protein_avg == df_sample.mean().min()]
>>> protein
IFNB1 -10.765
dtype: float64
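As a side note, idxmin returns the label of the minimum directly, so the filter above can be collapsed to one call (a small sketch):
# idxmin gives the column label (the protein) with the smallest mean.
print(df_sample.mean().idxmin())   # IFNB1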
The following boxplot of all genes confirms IFNB1 as the protein with the lowest average expression across samples, and shows IL8 as the protein with the highest average expression.
Boxplot:
using seaborn to make your plots look nicer:
plt.figure(figsize=(12, 8))
g = sns.boxplot(data=df_sample)
for item in g.get_xticklabels():
    item.set_rotation(90)
plt.show()
Alternate Boxplot:
plt.figure(figsize=(8, 8))
sns.boxplot(y='IL6', data=df_sample)  # keyword form; newer seaborn no longer accepts positional data arguments
plt.show()
IL6 Histogram:
sns.distplot(df_sample.IL6)  # distplot is deprecated in newer seaborn; use sns.histplot(df_sample['IL6'])
plt.show()
Bonus Plot - Heatmap:
I thought you might like this
plt.figure(figsize=(20, 8))
sns.heatmap(df_sample, annot=True, annot_kws={"size": 7}, cmap='PiYG')
plt.show()
M-12 and M+SNV-48 are only half size in the plot. This will be resolved in the forthcoming matplotlib v3.1.2.

Creating a heatmap using python and csv file

I'm trying to create a heatmap, with the x axis being time, the y axis being detectors (it's for freeway speed detection), and the colour scheme and numbers on the graph showing the occupancy values the csv holds for each time and detector.
My first thought is to use matplotlib in conjunction with pandas and numpy.
I've been trying lots of different approaches and feel like I've hit a brick wall in terms of getting it working.
Does anyone have a good idea about using these tools?
Cheers!
Row Labels 14142OB_L1 14142OB_L2 14140OB_E1P0 14140OB_E1P1 14140OB_E2P0 14140OB_E2P1 14140OB_L1 14140OB_L2 14140OB_M1P0 14140OB_M1P1 14140OB_M2P0 14140OB_M2P1 14140OB_M3P0 14140OB_M3P1 14140OB_S1P0 14140OB_S1P1 14140OB_S2P0 14140OB_S2P1 14140OB_S3P0 14140OB_S3P1 14138OB_L1 14138OB_L2 14138OB_L3 14136OB_L1 14136OB_L2 14136OB_L3 14134OB_L1 14134OB_L2 14134OB_L3 14132OB_L1 14132OB_L2 14132OB_L3
00 - 01 hr 0.22 1.42 0.29 0.29 0.59 0.59 0.17 1.47 0.38 0.38 0.56 0.6 0.08 0.1 0.67 0.7 0.88 0.9 0.15 0.17 0.17 1.66 0.47 0.16 1.6 0.49 0.14 0.94 1.21 0.21 1.22 0.44
01 - 02 hr 0.08 0.77 0.08 0.07 0.24 0.24 0.1 0.73 0.08 0.09 0.21 0.23 0.05 0.06 0.21 0.23 0.29 0.29 0.1 0.1 0.08 0.83 0.17 0.1 0.77 0.18 0.08 0.4 0.57 0.07 0.64 0.18
02 - 03 hr 0.08 0.73 0.06 0.06 0.23 0.23 0.06 0.73 0.07 0.07 0.23 0.24 0.02 0.02 0.16 0.17 0.32 0.34 0.06 0.07 0.06 0.77 0.16 0.06 0.78 0.17 0.07 0.3 0.66 0.06 0.68 0.19
03 - 04 hr 0.05 0.85 0.06 0.06 0.22 0.23 0.04 0.86 0.05 0.05 0.2 0.21 0.1 0.11 0.11 0.12 0.32 0.33 0.15 0.16 0.03 0.93 0.14 0.03 0.89 0.15 0.03 0.41 0.61 0.02 0.73 0.21
04 - 05 hr 0.13 1.25 0.09 0.09 0.24 0.24 0.12 1.25 0.11 0.11 0.2 0.21 0.08 0.09 0.19 0.2 0.32 0.34 0.15 0.15 0.1 1.33 0.18 0.11 1.35 0.19 0.11 0.52 1 0.07 1.08 0.29
05 - 06 hr 0.91 2.87 0.08 0.08 0.66 0.69 0.8 2.96 0.15 0.17 0.43 0.45 0.32 0.33 0.39 0.41 0.76 0.82 0.47 0.49 0.59 3.27 0.51 0.58 3.19 0.56 0.45 1.85 2.19 0.43 2.52 0.79
06 - 07 hr 3.92 5.44 1.29 1.14 4.03 4.12 3.19 6.03 1.66 1.69 3.26 3.44 1.84 1.93 13.03 14.97 13.81 19.23 4.69 5.59 3.03 6.72 3.01 2.78 6.81 3.02 1.52 4.22 7.13 2.54 5.94 2.88
07 - 08 hr 4.68 6.35 1.67 1.8 5.69 5.95 4.01 6.81 2.69 2.78 3.84 4.03 3.27 4.05 24.25 24.39 28.07 36.5 15.39 15.38 3.79 7.91 4.28 3.58 7.91 4.33 1.67 6.16 8.3 3.17 6.59 3.74
08 - 09 hr 5.21 6.31 2.51 2.82 7.46 7.72 4.53 6.65 9.03 8.98 13.94 12.77 6.73 8.55 47 48.38 50.08 48.32 22.83 21.91 4.29 8.27 5.04 4.15 8.27 5.16 2.44 6.24 9.17 3.26 6.81 4.16
09 - 10 hr 4.05 6.17 1.01 0.99 4.47 4.55 3.45 6.53 1.68 1.74 3.12 3.24 1.82 1.98 16.49 16.22 15.58 20.36 4.31 5.2 3.36 7.24 3.55 3.03 7.36 3.73 1.89 5.64 6.75 2.24 5.94 3.26
10 - 11 hr 3.62 6.64 1.14 1.15 4.11 4.18 3.23 6.87 1.79 1.87 3.03 3.13 1.72 1.89 15.02 18.75 17.25 22.61 3.06 3.24 3.06 7.69 3.23 2.87 7.49 3.56 2.06 4.99 7.05 2.26 6.2 3.07
11 - 12 hr 4.31 6.74 1.29 1.3 4.91 4.97 3.79 6.88 2.25 2.35 3.97 4.29 1.84 1.98 19.58 22.5 24.92 23.14 3.27 3.46 3.65 7.67 3.96 3.43 7.74 4 2.39 5.4 7.67 2.57 6.42 3.22
12 - 13 hr 4.53 6.9 1.4 1.39 5.81 5.9 3.96 7.18 2.69 2.86 4.94 5.28 2.15 2.29 24.46 28.34 36.59 31.06 5.4 5.39 3.95 7.98 4.54 3.7 8.03 4.69 2.36 5.99 8.29 3.01 6.61 3.37
13 - 14 hr 6.13 7.29 1.57 1.55 6.02 6.11 5.34 7.74 2.67 2.76 5.2 5.56 2.04 2.16 23.74 28.31 31.01 36.89 4.15 4.6 5.22 8.83 4.77 4.96 8.84 4.92 2.65 6.56 9.77 3.96 7.23 3.88
14 - 15 hr 8.72 8.22 2.93 3.06 8.58 8.9 8.94 9.57 17.69 17.2 18.99 23.58 2.37 3.69 38.81 53.33 49.93 45.42 5.69 4.3 8.13 10.04 5.45 7.03 9.94 5.51 3.59 7.41 12.4 5.92 8.04 4.4
15 - 16 hr 13.26 9.75 15.68 18.3 22.21 23.25 10.8 9.06 35.31 37.1 36.27 35.89 3.14 2.91 47.93 54.86 51.96 50.74 6.27 5.77 11.82 12.78 7.62 12.03 12.5 6.55 4.71 9.21 17.87 9.06 9.33 4.5
16 - 17 hr 18.25 14.92 4.95 4.63 9.68 10.2 20.14 16.68 21.38 21.39 23.92 28.11 1.75 1.86 48.15 47.31 46.65 50.4 3.46 3.31 21.52 16.97 7.37 18.47 14.84 7.51 6.88 15.52 27.8 11.17 9.35 5.34
17 - 18 hr 13.82 9.76 31.23 31.46 34.89 36.06 13.72 11.14 41.24 44.5 42 47.07 1.6 1.62 57.4 58.92 57.23 62.92 3.41 8.01 20.26 20.35 15.25 21.49 20.5 9.31 12.27 17.3 34.46 22.89 20.56 12.04
18 - 19 hr 7.51 5.81 50.48 49.94 45.97 46.43 8.65 5.95 49.26 48.28 51.04 46.46 2 3.04 56.08 56.39 54.95 59.06 3.18 6.47 13.44 13.73 25.79 17.67 21.52 19.26 6.35 11.52 22.13 11.31 10.4 5.42
19 - 20 hr 3.96 5.01 2.77 2.71 6.62 6.87 3.65 5.19 7.72 7.86 9.5 10.44 1.17 1.44 23.6 30.16 28.82 30.87 1.73 1.76 3.6 6.52 4.04 3.38 6.51 4.03 1.88 5.05 7.15 2.99 5.44 3.1
20 - 21 hr 2.16 3.72 1.75 1.74 3.96 4.02 2.03 3.72 2.62 2.73 4.32 4.54 0.76 0.79 18.41 23.69 30.91 31.05 1.31 1.26 2.1 4.76 2.97 1.93 4.75 2.97 1.43 3.43 4.9 1.73 3.9 2.27
21 - 22 hr 2.03 3.81 1.49 1.47 2.97 2.99 2 3.79 2.11 2.15 3.07 3.27 0.37 0.4 12.96 14.05 15.49 17.93 0.64 0.67 1.86 4.87 2.35 1.75 4.88 2.29 1.14 3.4 4.44 1.57 3.89 1.92
22 - 23 hr 1.33 3.2 1.21 1.22 2.46 2.5 1.21 3.23 1.75 1.79 2.36 2.48 0.35 0.38 6.19 9.26 10.48 12.16 0.57 0.58 1.28 3.85 2 1.23 3.84 1.96 0.82 2.74 3.55 1.12 3.29 1.73
23 - 24 hr 0.65 2.43 0.49 0.49 1.41 1.44 0.69 2.35 0.69 0.7 1.3 1.38 0.19 0.21 1.51 1.66 2.46 2.45 0.41 0.42 0.71 2.63 1.06 0.59 2.73 1.04 0.4 1.8 2.25 0.58 2.28 0.94
Grand Total 4.57 5.26 5.23 5.32 7.64 7.85 4.36 5.56 8.54 8.73 9.83 10.29 1.49 1.74 20.68 23.05 23.71 25.17 3.78 4.1 4.84 6.98 4.5 4.79 7.21 3.98 2.39 5.29 8.59 3.84 5.63 2.97
Here is the current script I'm using.
read_occupancy = pd.read_csv(r'C:\Users\holborm\Desktop\Visualisation\dataaxisplotstuff.csv')  # 'r' prefix so backslashes in the path aren't treated as escapes
df = pd.DataFrame(read_occupancy)

# create time and detector name axis
time_axis = df.index
detector_axis = df.columns
plt.plot(df)
Using Seaborn
read_occupancy = pd.read_csv(r'C:\Users\holborm\Desktop\Visualisation\dataaxisplotstuff.csv')
df = pd.DataFrame(read_occupancy)

# create time and detector name axis
sns.heatmap(df)
Error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-79-33a3388e21cc> in <module>()
6 #create time and detector name axis
7
----> 8 sns.heatmap(df)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py in heatmap(data, vmin, vmax, cmap, center, robust, annot, fmt, annot_kws, linewidths, linecolor, cbar, cbar_kws, cbar_ax, square, xticklabels, yticklabels, mask, ax, **kwargs)
515 plotter = _HeatMapper(data, vmin, vmax, cmap, center, robust, annot, fmt,
516 annot_kws, cbar, cbar_kws, xticklabels,
--> 517 yticklabels, mask)
518
519 # Add the pcolormesh kwargs here
~\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py in __init__(self, data, vmin, vmax, cmap, center, robust, annot, fmt, annot_kws, cbar, cbar_kws, xticklabels, yticklabels, mask)
166 # Determine good default values for the colormapping
167 self._determine_cmap_params(plot_data, vmin, vmax,
--> 168 cmap, center, robust)
169
170 # Sort out the annotations
~\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py in _determine_cmap_params(self, plot_data, vmin, vmax, cmap, center, robust)
203 cmap, center, robust):
204 """Use some heuristics to set good defaults for colorbar and range."""
--> 205 calc_data = plot_data.data[~np.isnan(plot_data.data)]
206 if vmin is None:
207 vmin = np.percentile(calc_data, 2) if robust else calc_data.min()
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
You can use .set_index('Row Labels') to ensure your Row Labels column is interpreted as an axis for the heatmap, and transpose your DataFrame with .T so that you get time along the x-axis and the detectors on the y-axis.
sns.heatmap(df.set_index('Row Labels').T)
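Putting it together, a sketch of a full script; the filename is a placeholder, and the figure size and colormap are assumptions rather than requirements:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('dataaxisplotstuff.csv')  # placeholder path

plt.figure(figsize=(16, 8))
# Index by 'Row Labels' so seaborn sees only numeric data, then transpose
# so time runs along the x-axis and the detectors down the y-axis.
sns.heatmap(df.set_index('Row Labels').T, cmap='viridis')
plt.xlabel('Time')
plt.ylabel('Detector')
plt.show()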

Pandas: importing Date and 12 hour Time together

I have the following txt file:
Temp Hi Low Out Dew Wind Wind Wind Hi Hi Wind Heat THW THSW Rain Solar Solar Hi Solar UV UV Hi Heat Cool In In In In In In Air Wind Wind ISS Arc.
Date Time Out Temp Temp Hum Pt. Speed Dir Run Speed Dir Chill Index Index Index Bar Rain Rate Rad. Energy Rad. Index Dose UV D-D D-D Temp Hum Dew Heat EMC Density ET Samp Tx Recept Int.
01/01/16 12:30 a 13.8 13.8 13.6 88 11.9 0.0 --- 0.00 0.0 --- 13.8 13.8 13.8 12.4 1012.3 0.00 0.0 0 0.00 0 0.0 0.00 0.0 0.094 0.000 21.5 50 10.6 20.7 9.25 1.1823 0.00 702 1 100.0 30
01/01/16 1:00 a 13.6 13.8 13.2 88 11.7 0.0 --- 0.00 0.0 --- 13.6 13.6 13.6 12.2 1012.2 0.00 0.0 0 0.00 0 0.0 0.00 0.0 0.098 0.000 21.5 50 10.6 20.7 9.25 1.1823 0.00 702 1 100.0 30
01/01/16 1:30 a 14.5 14.5 13.6 81 11.3 0.0 --- 0.00 0.0 --- 14.5 14.4 14.4 12.9 1012.2 0.00 0.0 0 0.00 0 0.0 0.00 0.0 0.080 0.000 21.5 50 10.6 20.7 9.25 1.1822 0.00 703 1 100.0 30
01/01/16 2:00 a 15.2 15.2 14.5 75 10.8 0.0 --- 0.00 0.0 --- 15.2 14.9 14.9 13.4 1012.0 0.00 0.0 0 0.00 0 0.0 0.00 0.0 0.066 0.000 21.4 49 10.2 20.5 9.05 1.1829 0.00 702 1 100.0 30
01/01/16 2:30 a 14.4 15.2 14.0 79 10.8 0.0 --- 0.00 0.0 --- 14.4 14.2 14.2 12.8 1012.2 0.20 0.0 0 0.00 0 0.0 0.00 0.0 0.082 0.000 21.4 48 9.9 20.4 8.86 1.1834 0.00 703 1 100.0 30
01/01/16 3:00 a 15.1 15.1 14.1 76 10.9 0.0 --- 0.00 0.0 --- 15.1 14.8 14.8 13.4 1011.9 0.00 0.0 0 0.00 0 0.0 0.00 0.0 0.068 0.000 21.4 48 9.9 20.4 8.86 1.1830 0.00 700 1 100.0 30
01/01/16 3:30 a 14.9 15.2 14.9 73 10.1 0.0 --- 0.00 0.0 --- 14.9 14.6 14.6 13.2 1011.9 0.00 0.0 0 0.00 0 0.0 0.00 0.0 0.071 0.000 21.4 47 9.6 20.3 8.75 1.1833 0.00 702 1 100.0 30
01/01/16 4:00 a 15.2 15.3 14.9 68 9.4 0.0 --- 0.00 0.0 --- 15.2 14.8 14.8 13.3 1011.9 0.00 0.0 0 0.00 0 0.0 0.00 0.0 0.065 0.000 21.4 47 9.6 20.3 8.75 1.1833 0.00 700 1 100.0 30
01/01/16 4:30 a 14.9 15.2 14.6 72 9.9 0.0 --- 0.00 0.0 --- 14.9 14.6 14.6 13.1 1011.8 0.00 0.0 0 0.00 0 0.0 0.00 0.0 0.072 0.000 21.3 46 9.2 20.2 8.64 1.1838 0.00 703 1 100.0 30
01/01/16 5:00 a 14.1 15.1 14.0 76 9.9 0.0 --- 0.00 0.0 --- 14.1 13.8 13.8 12.3 1012.1 0.00 0.0 0 0.00 0 0.0 0.00 0.0 0.088 0.000 21.3 46 9.2 20.2 8.64 1.1842 0.00 702 1 100.0 30
and I want to import it into a DataFrame, but with one column containing the date and the time in 24-hour display together:
Time
01/01/16 12:30
.....
01/01/16 13:30
Is there an easy way to do this? Thank you!
try this:
For dd/mm/yy format:
def parse_dt(dt, tm, ap):
    return pd.to_datetime(dt + ' ' + tm + ap, dayfirst=True)
For mm/dd/yy format:
def parse_dt(dt, tm, ap):
    return pd.to_datetime(dt + ' ' + tm + ap)
Parse CSV:
df = pd.read_csv(filename, sep='\s+', skiprows=2, header=None,
                 parse_dates={'ts': [0, 1, 2]}, date_parser=parse_dt)
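Note that date_parser is deprecated in pandas 2.0+. A sketch of the same idea for newer versions (the filename is hypothetical; the dd/mm/yy and a/p conventions are taken from the sample above) reads the raw columns and combines them afterwards:
import pandas as pd

# Read the three raw columns (date, time, a/p marker), then build one
# datetime column; %I:%M with %p handles the 12-hour clock.
df = pd.read_csv('weather.txt', sep=r'\s+', skiprows=2, header=None)  # hypothetical filename
ts = df[0] + ' ' + df[1] + df[2].map({'a': 'am', 'p': 'pm'})
df.insert(0, 'ts', pd.to_datetime(ts, format='%d/%m/%y %I:%M%p'))
print(df['ts'].head())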
