Issue combining columns in Dataframe? - python

I have the following dataframe:
Obj BIT BIT BIT GAS GAS GAS OIL OIL OIL
Date
2007-01-03 18 7 0 184 35 2 52 14 0
2007-01-09 43 3 0 249 35 2 68 11 1
2007-01-16 60 6 0 254 35 5 72 13 1
2007-01-23 69 11 1 255 43 2 81 6 0
2007-01-30 74 8 0 263 29 4 69 9 0
2007-02-06 78 6 1 259 34 2 79 6 0
2007-02-14 76 9 1 263 24 2 70 10 1
2007-02-20 85 7 0 241 20 6 72 4 0
2007-02-27 79 6 0 242 35 3 68 7 0
2007-03-06 68 14 0 225 26 2 57 10 1
How can I sum the 9 columns into 3 columns, "BIT", "GAS" and "OIL"?
This is the code that produces the dataframe; it just takes a cross section of a larger df:
ABrigsA = ndfAB.xs(['BIT','GAS','OIL'],axis=1)
Any suggestions?

Assuming that you want to sum similarly named columns, you can use groupby:
>>> df.groupby(level=0, axis='columns').sum()
Obj BIT GAS OIL
Date
2007-01-03 25 221 66
2007-01-09 46 286 80
2007-01-16 66 294 86
2007-01-23 81 300 87
2007-01-30 82 296 78
2007-02-06 85 295 85
2007-02-14 86 289 81
2007-02-20 92 267 76
2007-02-27 85 280 75
2007-03-06 82 253 68
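In pandas 2.1 and later, the axis argument of DataFrame.groupby is deprecated, so the one-liner above emits a deprecation warning there. A minimal sketch of an equivalent that avoids it: transpose so the duplicate column labels become row labels, group the rows by label, and transpose back. It rebuilds only the first three rows of the question's frame.

```python
import pandas as pd

# First three rows of the question's frame; column labels repeat three times each
data = [
    [18, 7, 0, 184, 35, 2, 52, 14, 0],
    [43, 3, 0, 249, 35, 2, 68, 11, 1],
    [60, 6, 0, 254, 35, 5, 72, 13, 1],
]
cols = pd.Index(["BIT"] * 3 + ["GAS"] * 3 + ["OIL"] * 3, name="Obj")
idx = pd.Index(pd.to_datetime(["2007-01-03", "2007-01-09", "2007-01-16"]), name="Date")
df = pd.DataFrame(data, index=idx, columns=cols)

# Transpose, sum rows that share a label, then transpose back to Date x Obj
combined = df.T.groupby(level=0).sum().T
print(combined)
```

Each row of combined matches the first three rows of the output shown above (for example 25, 221, 66 for 2007-01-03).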

Related

How to plot multiple chart on one figure and combine with another?

import matplotlib.pyplot as plt

# Create an Axes object
axes = plt.gca()
# Pass the same Axes to every plot call so all three lines share one chart
df.plot(kind='line', x='鄉鎮別', y='男', ax=axes, figsize=(10, 8))
df.plot(kind='line', x='鄉鎮別', y='女', ax=axes, figsize=(10, 8))
df.plot(kind='line', x='鄉鎮別', y='合計(男+女)', ax=axes, figsize=(10, 8), title='hihii',
        xlabel='鄉鎮別', ylabel='人數')
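This is my data (columns: 鄉鎮別 township, 鄰數 neighborhoods, 戶數 households, 男 male, 女 female, 合計(男+女) total (male + female), 遷入 moved in, 遷出 moved out, 出生 births, 死亡 deaths, 結婚 marriages, 離婚 divorces; the last row, 總計, is the grand total):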
鄉鎮別 鄰數 戶數 男 女 合計(男+女) 遷入 遷出 出生 死亡 結婚 離婚
0 苗栗市 715 32517 42956 43362 86318 212 458 33 65 28 13
1 苑裡鎮 362 15204 22979 21040 44019 118 154 17 24 9 7
2 通霄鎮 394 11557 17034 15178 32212 73 113 5 33 3 3
3 竹南鎮 518 32061 44069 43275 87344 410 392 31 59 35 11
4 頭份市 567 38231 52858 52089 104947 363 404 39 69 31 19
5 後龍鎮 367 12147 18244 16274 34518 93 144 12 41 2 7
6 卓蘭鎮 176 5861 8206 7504 15710 29 51 1 11 2 0
7 大湖鄉 180 5206 7142 6238 13380 31 59 5 21 3 2
8 公館鄉 281 10842 16486 15159 31645 89 169 12 32 5 3
9 銅鑼鄉 218 6106 8887 7890 16777 57 62 7 13 4 1
10 南庄鄉 184 3846 5066 4136 9202 22 48 1 10 0 2
11 頭屋鄉 120 3596 5289 4672 9961 59 53 2 11 4 4
12 三義鄉 161 5625 8097 7205 15302 47 63 3 12 3 5
13 西湖鄉 108 2617 3653 2866 6519 38 20 1 17 3 0
14 造橋鄉 115 4144 6276 5545 11821 44 64 3 11 3 2
15 三灣鄉 93 2331 3395 2832 6227 27 18 2 9 0 2
16 獅潭鄉 98 1723 2300 1851 4151 28 10 1 4 0 0
17 泰安鄉 64 1994 3085 2642 5727 36 26 2 8 4 1
18 總計 4721 195608 276022 259758 535780 1776 2308 177 450 139 82
This is my output from df.plot.
First question: how can I display Chinese characters in the plot?
Second: can I plot a line chart without using df.plot?
Last question: I need four graphs (using subplots): a line graph of the male, female and total population (男, 女, 合計(男+女)) for each township; a line graph of in-migration and out-migration (遷入, 遷出); a bar graph of the number of households (戶數); and a line graph of births and deaths (出生, 死亡).
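A minimal sketch covering all three questions, using plain matplotlib (which DataFrame.plot wraps) on the first three townships of the table. ASSUMPTIONS: a font with Chinese glyphs is installed under one of the family names listed ("Microsoft JhengHei" and "Noto Sans CJK TC" are examples, not requirements), and a vertical bar chart is what "bar graph" means here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it to get an interactive window
import matplotlib.pyplot as plt

# Q1 (ASSUMPTION): one of these fonts with Chinese glyphs is installed;
# matplotlib tries the families in this order.
plt.rcParams["font.sans-serif"] = ["Microsoft JhengHei", "Noto Sans CJK TC", "DejaVu Sans"]
plt.rcParams["axes.unicode_minus"] = False  # ASCII hyphen for minus; some CJK fonts lack U+2212

# First three townships from the table above
towns = ["苗栗市", "苑裡鎮", "通霄鎮"]
male, female, total = [42956, 22979, 17034], [43362, 21040, 15178], [86318, 44019, 32212]
moved_in, moved_out = [212, 118, 73], [458, 154, 113]
households = [32517, 15204, 11557]
births, deaths = [33, 17, 5], [65, 24, 33]

# Q2 and Q3: a 2x2 grid of Axes, drawn with Axes methods instead of df.plot
fig, axs = plt.subplots(2, 2, figsize=(12, 8))

axs[0, 0].plot(towns, male, label="男")
axs[0, 0].plot(towns, female, label="女")
axs[0, 0].plot(towns, total, label="合計(男+女)")
axs[0, 0].set_title("男、女、合計(男+女)")

axs[0, 1].plot(towns, moved_in, label="遷入")
axs[0, 1].plot(towns, moved_out, label="遷出")
axs[0, 1].set_title("遷入、遷出")

axs[1, 0].bar(towns, households)  # use barh(...) instead for horizontal bars
axs[1, 0].set_title("戶數")

axs[1, 1].plot(towns, births, label="出生")
axs[1, 1].plot(towns, deaths, label="死亡")
axs[1, 1].set_title("出生、死亡")

for ax in axs.flat:
    if ax.get_legend_handles_labels()[0]:  # add a legend only where lines are labeled
        ax.legend()

fig.tight_layout()
# fig.savefig("townships.png") or plt.show()
```

If the Chinese labels still render as boxes, no listed font was found; check the installed fonts and put a matching name first in the list.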

How do I make each group within a dataframe the same size?

I have the following dataframe:
Patient HR 02 PaO2 Hgb
1 62 94 73 31
1 64 93 73 34
1 62 92 73 31
2 64 90 84 42
3 62 95 75 30
3 70 97 77 29
Each row for a patient indicates an hourly observation. So, patient 1 has three observations, patient 2 has one observation and patient 3 has two observations. I'm trying to find a way to pad each patient group so that they are the same size (the same number of observations) as I'm trying to use this data for an LSTM. I'm not sure what the best way to do this would be though. I was wondering if anyone had any ideas?
The output would hopefully look like this:
Patient HR 02 PaO2 Hgb
1 62 94 73 31
1 64 93 73 34
1 62 92 73 31
2 64 90 84 42
2 0 0 0 0
2 0 0 0 0
3 62 95 75 30
3 70 97 77 29
3 0 0 0 0
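Reindex your original data to a pandas.MultiIndex on Patient and a cumulative count within each patient:
import pandas as pd

# Second index level numbers each patient's rows 0, 1, 2, ...
df = df.set_index(["Patient", df.groupby("Patient").cumcount()])
# Every (Patient, count) combination
index = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)
# Missing combinations become rows of zeros; then drop the count level
output = df.reindex(index, fill_value=0).reset_index(level=1, drop=True).reset_index()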
>>> output
Patient HR 02 PaO2 Hgb
0 1 62 94 73 31
1 1 64 93 73 34
2 1 62 92 73 31
3 2 64 90 84 42
4 2 0 0 0 0
5 2 0 0 0 0
6 3 62 95 75 30
7 3 70 97 77 29
8 3 0 0 0 0
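The answer's code assumes df already exists. A self-contained sketch that rebuilds the question's data, applies the same reindexing, and then, as an assumption about how the padded frame will be fed to an LSTM, reshapes it into a (patients, timesteps, features) NumPy array:

```python
import numpy as np
import pandas as pd

# The question's data, rebuilt by hand
df = pd.DataFrame({
    "Patient": [1, 1, 1, 2, 3, 3],
    "HR": [62, 64, 62, 64, 62, 70],
    "02": [94, 93, 92, 90, 95, 97],
    "PaO2": [73, 73, 73, 84, 75, 77],
    "Hgb": [31, 34, 31, 42, 30, 29],
})

# Number each patient's observations 0, 1, 2, ... and use that as a second index level
indexed = df.set_index(["Patient", df.groupby("Patient").cumcount()])
# Every (patient, step) combination; missing steps are filled with zeros
full = pd.MultiIndex.from_product(indexed.index.levels, names=indexed.index.names)
padded = indexed.reindex(full, fill_value=0)

# Rows are grouped by patient, so a plain reshape gives (patients, timesteps, features)
n_patients = padded.index.levels[0].size
n_steps = padded.index.levels[1].size
X = padded.to_numpy().reshape(n_patients, n_steps, padded.shape[1])
print(X.shape)  # (3, 3, 4)
```

Because 0 is used as the padding value, a masking step (for example Keras' Masking layer with mask_value=0.0) is commonly used so the model skips the padded timesteps.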

From Matlab to Python Code [z,index]=sort(abs(z));

I am trying to convert code from MATLAB to Python.
Can you please help me convert this code?
In the MATLAB code, z is a vector of length 121:
z= 7.0502 5.8030 4.4657 3.0404 1.5416 0 -1.5416 -3.0404 -4.4657
-5.8030 -7.0502 7.5944 6.3059 4.8990 3.3662 1.7189 0 -1.7189 -3.3662 -4.8990 -6.3059 -7.5944 8.2427 6.9282 5.4611 3.8122 1.9735 0 -1.9735 -3.8122 -5.4611 -6.9282 -8.2427 9.0135 7.7027 6.2075 4.4590 2.3803 0 -2.3803 -4.4590 -6.2075 -7.7027 -9.0135 9.9185 8.6576 7.2038 5.4466 3.1530 0 -3.1530 -5.4466 -7.2038 -8.6576 -9.9185 10.9545 9.7980 8.4853 6.9282 4.8990 0 -4.8990 -6.9282 -8.4853 -9.7980 -10.9545 12.0986 11.0885 9.9947 8.8128 7.6119 -6.9282 -7.6119 -8.8128 -9.9947 -11.0885 -12.0986 13.3133 12.4632 11.5988 10.7649 10.0829 -9.7980 -10.0829 -10.7649 -11.5988 -12.4632 -13.3133 14.5583 13.8564 13.1842 12.5910 12.1612 -12.0000 -12.1612 -12.5910 -13.1842 -13.8564 -14.5583 15.8011 15.2238 14.6969 14.2594 13.9626 -13.8564 -13.9626 -14.2594 -14.6969 -15.2238 -15.8011 17.0207 16.5431 16.1227 15.7875 15.5684 -15.4919 -15.5684 -15.7875 -16.1227 -16.5431 -17.0207
MATLAB code: [z,index]=sort(abs(z));
After running it:
z = 0 0 0 0 0 0 1.5416 1.5416 1.7189 1.7189 1.9735 1.9735 2.3803 2.3803 3.0404 3.0404 3.1530 3.1530 3.3662 3.3662 3.8122 3.8122 4.4590 4.4590 4.4657 4.4657 4.8990 4.8990 4.8990 4.8990 5.4466 5.4466 5.4611 5.4611 5.8030 5.8030 6.2075 6.2075 6.3059 6.3059 6.9282 6.9282 6.9282 6.9282 6.9282 7.0502 7.0502 7.2038 7.2038 7.5944 7.5944 7.6119 7.6119 7.7027 7.7027 8.2427 8.2427 8.4853 8.4853 8.6576 8.6576 8.8128 8.8128 9.0135 9.0135 9.7980 9.7980 9.7980 9.9185 9.9185 9.9947 9.9947 10.0829 10.0829 10.7649 10.7649 10.9545 10.9545 11.0885 11.0885 11.5988 11.5988 12.0000 12.0986 12.0986 12.1612 12.1612 12.4632 12.4632 12.5910
12.5910 13.1842 13.1842 13.3133 13.3133 13.8564 13.8564 13.8564 13.9626 13.9626 14.2594 14.2594 14.5583 14.5583 14.6969 14.6969 15.2238 15.2238 15.4919 15.5684 15.5684 15.7875 15.7875 15.8011 15.8011 16.1227 16.1227 16.5431 16.5431 17.0207 17.0207
and index is
index = 6 17 28 39 50 61 5 7 16 18 27 29 38 40 4 8 49 51 15 19 26 30 37 41 3 9 14 20 60 62 48 52 25 31 2 10 36 42 13 21 24 32 59 63 72 1 11 47 53 12 22 71 73 35 43 23 33 58 64 46 54 70 74 34 44 57 65 83 45 55 69 75 82 84 81 85 56 66 68 76 80 86 94 67 77 93 95 79 87 92 96 91 97 78 88 90 98 105 104 106 103 107 89 99 102 108 101 109 116 115 117 114 118 100 110 113 119 112 120 111 121
So what is the Python equivalent of [z,index]?
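If you only need the sorted values, sorted(map(abs, z)) is enough. If you also need the index:
z = [abs(v) for v in z]  # element-wise absolute value; abs() does not accept a whole list
index = sorted(range(len(z)), key=lambda k: z[k])  # positions of the values in sorted order
z = [z[k] for k in index]  # the sorted values
Here index is 0-based, while MATLAB's is 1-based, so add 1 to each entry to get the numbers shown above.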
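If NumPy is available, np.argsort returns the index directly. A sketch on the first 11 values of the question's z; kind="stable" keeps tied values in their original order, which matches the MATLAB index shown above (the zeros come out as 6 17 28 ..., in input order), and adding 1 converts to MATLAB's 1-based indices:

```python
import numpy as np

# First 11 values of the question's z (the full vector has 121)
z = np.array([7.0502, 5.8030, 4.4657, 3.0404, 1.5416, 0,
              -1.5416, -3.0404, -4.4657, -5.8030, -7.0502])

abs_z = np.abs(z)
order = np.argsort(abs_z, kind="stable")  # stable: ties keep their original order
z_sorted = abs_z[order]                   # MATLAB's sorted z
index = order + 1                         # MATLAB's 1-based index

print(index.tolist())  # [6, 5, 7, 4, 8, 3, 9, 2, 10, 1, 11]
```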

Pandas pivot table with mean

I have a pandas data frame, df, that looks like this:
index New Old MAP Limit count
1 93 35 54 > 18 1
2 163 93 116 > 18 1
3 134 78 96 > 18 1
4 117 81 93 > 18 1
5 194 108 136 > 18 1
6 125 57 79 <= 18 1
7 66 39 48 > 18 1
8 120 83 95 > 18 1
9 150 98 115 > 18 1
10 149 99 115 > 18 1
11 148 85 106 > 18 1
12 92 55 67 <= 18 1
13 64 24 37 > 18 1
14 84 53 63 > 18 1
15 99 70 79 > 18 1
I need to create a pivot table that looks like this
Limit <=18 >18
New xx1 xx2
Old xx3 xx4
MAP xx5 xx6
where xx1 to xx6 are the means of New, Old and MAP for each Limit value.
How can I achieve this?
I tried the following without success.
table = df.pivot_table('count', index=['New', 'Old', 'MAP'], columns=['Limit'], aggfunc='mean')
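Solution: group by Limit, average the three columns (selected with a list, which newer pandas versions require), and transpose:
df.groupby('Limit')[['New', 'Old', 'MAP']].mean().T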
Limit <= 18 > 18
New 108.5 121.615385
Old 56.0 72.769231
MAP 73.0 88.692308
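The pivot_table attempt in the question averages the count column (which is always 1) for every distinct New/Old/MAP combination. A sketch of the same result with pivot_table: Limit goes in index, the three measures go in values, and a transpose puts the measures on the rows. The trailing reindex is there because pivot_table may not keep the order of the values list.

```python
import pandas as pd

# The question's data (index 1-15); rows 6 and 12 have Limit "<= 18"
df = pd.DataFrame({
    "New": [93, 163, 134, 117, 194, 125, 66, 120, 150, 149, 148, 92, 64, 84, 99],
    "Old": [35, 93, 78, 81, 108, 57, 39, 83, 98, 99, 85, 55, 24, 53, 70],
    "MAP": [54, 116, 96, 93, 136, 79, 48, 95, 115, 115, 106, 67, 37, 63, 79],
    "Limit": ["> 18"] * 5 + ["<= 18"] + ["> 18"] * 5 + ["<= 18"] + ["> 18"] * 3,
})

table = (
    df.pivot_table(index="Limit", values=["New", "Old", "MAP"], aggfunc="mean")
    .T                                # measures as rows, Limit values as columns
    .reindex(["New", "Old", "MAP"])   # fix the row order
)
print(table)
```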

Transpose .csv File: changing Header Time Stamps to Line TimeStamp

My data looks like this:
statnr datum ele h01 h02 h03 h04 h05 h06 h07 h08 h09 h10 h11 h12 h13 h14 h15 h16 h17 h18 h19 h20 h21 h22 h23 h24
----------- ----------- --- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
20101 20020401 D6K 103 126 115 114 105 101 118 118 130 129 126 128 132 133 131 130 130 131 130 130 125 117 122 124
20101 20020402 D6K 126 118 119 120 114 111 107 119 124 126 122 130 130 130 128 128 126 119 129 134 132 127 112 118
........
20101 20150909 D6K 72 82 75 76 82 93 91 96 99 101 108 108 103 100 94 90 82 92 88 79 77 89 94 92
20101 20020401 FLP 54 61 58 61 66 67 65 56 47 46 40 40 39 32 34 34 37 43 45 45 50 54 59 63
20101 20020402 FLP 64 61 67 66 68 69 67 56 50 46 42 39 33 32 33 34 39 48 55 58 61 62 65 68
........
20101 20150909 FLP 93 95 92 94 94 96 95 92 90 84 87 75 81 75 75 74 83 87 89 96 94 92 91 94
20101 20070906 GSE 32700 0 0 0 0 0 3 10 17 30 28 27 37 44 37 25 16 5 1 0 0 0 0 0
20101 20070907 GSE 0 0 0 0 0 0 11 48 72 107 257 264 290 216 255 178 122 57 6 0 0 0 0 0
........
20101 20150909 GSE 0 0 0 0 0 1 17 51 71 118 82 200 116 130 142 156 48 15 1 0 0 0 0 0
20101 20020101 SUV 0 0 0 0 0 0 0 0 9 10 10 10 10 10 10 10 2 0 0 0 0 0 0 0
........
20101 20150909 SUV 0 0 0 0 0 0 0 0 0 1 0 5 1 4 4 9 2 0 0 0 0 0 0 0
20101 20020401 TEX 30 18 21 18 9 10 18 42 69 91 114 117 126 135 133 127 114 87 58 47 39 33 27 24
........
20101 20150909 TEX 50 46 48 50 50 49 57 67 77 85 80 111 95 100 101 92 74 67 59 53 49 49 49 47
20101 20020401 QVX 6 10 9 8 13 25 19 15 16 19 24 24 19 23 24 22 24 23 19 13 12 16 16 18
........
20101 20150909 QVX 40 42 37 34 30 34 22 22 27 31 26 28 37 38 42 43 52 54 59 81 80 69 78 60
As you can see, it is a huge sheet with a statnr column, a date column (datum), an ele column that holds the parameter, and then h01 to h24, which, as you can imagine, are the hours.
I need to convert this sheet to the format of the other files I'm working with (for plotting and processing).
I'm currently trying to bring this file into this format:
Date Time D6K FLP GSE SUV TEX QVX
01.04.2002 01:00 103 54 0 30 6
.....
09.09.2015 23:59 92 94 0 0 47 60
So what I'm trying to do is:
1) Get rid of the statnr column (column 0)
2) Move the parameters from the ele column (column 2) into the header, and link their values to the new date/time format in the rows
3) Convert the date format from %Y%m%d (plus the hNN hour column) to %d.%m.%Y %H:%M
Since I'm new to Python and coding, I thought I'd ask whether there's a package that deals with this kind of problem, and whether there's a general term for it (switching the header with the rows). Thanks to Peter Wood, I changed the title to "Transpose".
Thanks for any suggestions.
For clarification:
the ........ lines indicate rows I left out
the ----------- line is in the file
Because you may have missing data, this isn't a simple case of transposing blocks. I think what you need to do is read the input file into a data structure from which you can then look up the values as required to generate your output. In Python you can use a dictionary whose key is a tuple of your element type, date, and hour:
mydict = {}
with open(r'F:\myfile.txt') as f:  # raw string so the backslash is not an escape
    f.readline()  # discard headings
    f.readline()  # discard row of dashes
    for line in f:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        date = fields[1]
        ele = fields[2]
        # hour is 0 for h01, ..., 23 for h24
        for hour, value in enumerate(fields[3:27]):
            mydict[(ele, date, hour)] = value
Now you have all the data in a big dictionary that's addressable by ele, date and hour. I'm going to guess that the ele values are fixed and you can hardcode them, but you'll want to build a list of the unique dates you actually found in the input file, and put them in ascending order:
dateset = set()
for k in mydict.keys():
    dateset.add(k[1])  # k is (ele, date, hour)
dates = list(dateset)
dates.sort()  # YYYYMMDD strings sort chronologically
Now you're ready to build your output file.
for date in dates:
    for hour in range(24):
        output = date + '\t' + str(hour)  # hour is an int, so convert it
        for ele in ['D6K', 'FLP', 'GSE', 'SUV', 'TEX', 'QVX']:
            output = output + '\t' + mydict.get((ele, date, hour), '')
        print(output)
Using the get method on the dictionary allows you to specify a default value to be returned if the key you supplied isn't in the dictionary.
I haven't dealt with the date formatting (note that 'hour' ranges from 0 to 23), or writing the output to a file, but the above should get you going.
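On the question of a package: pandas can do this reshape with melt (wide to long) followed by pivot (long to wide). A sketch that swaps in pandas for the dictionary approach, run on two rows from the question. ASSUMPTIONS: columns are whitespace-separated, and hNN is the value at NN:00, so h24 becomes 00:00 of the following day; the desired output in the question shows 23:59 for the last hour, so adjust that mapping to your convention.

```python
import io

import pandas as pd

hours = [f"h{h:02d}" for h in range(1, 25)]
# Two rows from the question, in the file's layout (header, dashed line, data)
raw = "\n".join([
    "statnr datum ele " + " ".join(hours),
    "----------- ----------- --- " + " ".join(["------"] * 24),
    "20101 20020401 D6K 103 126 115 114 105 101 118 118 130 129 126 128 132 133 131 130 130 131 130 130 125 117 122 124",
    "20101 20020401 FLP 54 61 58 61 66 67 65 56 47 46 40 40 39 32 34 34 37 43 45 45 50 54 59 63",
])

# Whitespace-separated columns; skiprows=[1] drops the dashed line under the header
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", skiprows=[1], dtype={"datum": str})

# Wide to long: one row per (date, element, hour); the statnr column is dropped
tidy = df.drop(columns="statnr").melt(id_vars=["datum", "ele"], var_name="hour", value_name="value")

# ASSUMPTION: hNN is the value at NN:00, so h24 becomes 00:00 on the following day
tidy["timestamp"] = pd.to_datetime(tidy["datum"], format="%Y%m%d") + pd.to_timedelta(
    tidy["hour"].str[1:].astype(int), unit="h"
)

# Long to wide: one column per element (raises if a timestamp/element pair repeats)
out = tidy.pivot(index="timestamp", columns="ele", values="value").reset_index()
out.columns.name = None
out.insert(0, "Date", out["timestamp"].dt.strftime("%d.%m.%Y"))
out.insert(1, "Time", out["timestamp"].dt.strftime("%H:%M"))
out = out.drop(columns="timestamp")
print(out.head())
```

Elements missing for some timestamps (GSE starts later than D6K in the question's file) simply show up as NaN after the pivot, which covers the missing-data concern above. To write the result, out.to_csv(...) can be used.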
