Last cell in a column dataframe from excel using pandas - python

I just had a quick question. How would one go about getting the last cell value of each column of an Excel spreadsheet when working with it as a dataframe using pandas? I'm having some difficulty with this. I know the index can be found with len(), but I can't quite wrap my head around it. Thank you, any help would be greatly appreciated.

If by the last cell of a dataframe you mean the bottom-right cell, then you can use .iloc:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(1,101).reshape((10,-1)))
df
Output:
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9 10
1 11 12 13 14 15 16 17 18 19 20
2 21 22 23 24 25 26 27 28 29 30
3 31 32 33 34 35 36 37 38 39 40
4 41 42 43 44 45 46 47 48 49 50
5 51 52 53 54 55 56 57 58 59 60
6 61 62 63 64 65 66 67 68 69 70
7 71 72 73 74 75 76 77 78 79 80
8 81 82 83 84 85 86 87 88 89 90
9 91 92 93 94 95 96 97 98 99 100
Use .iloc with -1 index selection on both rows and columns.
df.iloc[-1,-1]
Output:
100

DataFrame.head(n) gets the top n results from the dataframe. DataFrame.tail(n) gets the bottom n results from the dataframe.
If your dataframe is named df, you could use df.tail(1) to get the last row of the dataframe. The returned value is also a dataframe.
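Since the question asks for the last cell of every column, here is a minimal sketch of two ways to read that. The small frame below is made-up data standing in for a sheet loaded with pd.read_excel, and the names last_row and last_filled are only illustrative. The second variant assumes that "last cell" means the last non-empty cell, which matters when the Excel columns have different lengths.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for a sheet loaded with pd.read_excel(...).
# Column "b" is shorter than "a", so its trailing cells are NaN.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, 20.0, np.nan, np.nan],
})

# Last row as a Series: one entry per column, NaN where the cell is empty.
last_row = df.iloc[-1]

# Forward-fill each column, then take the last row:
# this gives the last non-missing value in every column.
last_filled = df.ffill().iloc[-1]

print(last_row)
print(last_filled)
```

Unlike df.tail(1), which returns a one-row dataframe, df.iloc[-1] returns a Series indexed by column name, so a single value can be read with last_row["a"].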

Related

Place data from a Pandas DF into a Grid or Template

I have a process whose end product is a Pandas DF. The output, which varies in both data and length, is structured like this example:
9 80340796
10 80340797
11 80340798
12 80340799
13 80340800
14 80340801
15 80340802
16 80340803
17 80340804
18 80340805
19 80340806
20 80340807
21 80340808
22 80340809
23 80340810
24 80340811
25 80340812
26 80340813
27 80340814
28 80340815
29 80340816
30 80340817
31 80340818
32 80340819
33 80340820
34 80340821
35 80340822
36 80340823
37 80340824
38 80340825
39 80340826
40 80340827
41 80340828
42 80340829
43 80340830
44 80340831
45 80340832
46 80340833
I need to get the numbers in the second column above, into the following grid format based on the numbers in the first column above.
1 2 3 4 5 6 7 8 9 10 11 12
A 1 9 17 25 33 41 49 57 65 73 81 89
B 2 10 18 26 34 42 50 58 66 74 82 90
C 3 11 19 27 35 43 51 59 67 75 83 91
D 4 12 20 28 36 44 52 60 68 76 84 92
E 5 13 21 29 37 45 53 61 69 77 85 93
F 6 14 22 30 38 46 54 62 70 78 86 94
G 7 15 23 31 39 47 55 63 71 79 87 95
H 8 16 24 32 40 48 56 64 72 80 88 96
So the end result in this example would be
Any advice on how to go about this would be much appreciated. I've been asked for this by a colleague, so the data is easy to read for their team (as it matches the layout of a physical test) but I have no idea how to produce it.
A pandas pivot table can do what you want in your question, but first you have to create two auxiliary columns: one determining which column the value has to go in, the other which row. You can get them as shown in the following example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'num': list(range(9, 28)), 'val': list(range(80001, 80020))})
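max_rows = 8
# zero-based row within the grid, filled top to bottom
df['row'] = (df['num']-1)%max_rows
# one-based grid column that the number falls into
df['col'] = np.ceil(df['num']/max_rows).astype(int)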
df.pivot_table(values=['val'], columns=['col'], index=['row'])
val
col 2 3 4
row
0 80001.0 80009.0 80017.0
1 80002.0 80010.0 80018.0
2 80003.0 80011.0 80019.0
3 80004.0 80012.0 NaN
4 80005.0 80013.0 NaN
5 80006.0 80014.0 NaN
6 80007.0 80015.0 NaN
7 80008.0 80016.0 NaN
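The same idea can be pushed a little further to reproduce the A-H by 1-12 layout from the question. This is only a sketch under assumptions: the column names num and val, the constants n_rows and n_cols, and the row labels are chosen here for illustration, and the sample values mimic the question's data. It uses pivot rather than pivot_table because each position holds exactly one value, then reindexes so that empty positions still appear.

```python
import pandas as pd

n_rows, n_cols = 8, 12          # assumed grid size (rows A-H, columns 1-12)
row_labels = list("ABCDEFGH")

# Sample data shaped like the question's output: position number and value.
df = pd.DataFrame({
    "num": range(9, 47),
    "val": range(80340796, 80340834),
})

pos = df["num"] - 1                                  # zero-based position
df["row"] = (pos % n_rows).map(lambda r: row_labels[r])
df["col"] = pos // n_rows + 1                        # one-based grid column

# One value per (row, col) pair, so a plain pivot is enough.
grid = df.pivot(index="row", columns="col", values="val")

# Show the full grid, including positions that received no value (NaN).
grid = grid.reindex(index=row_labels, columns=range(1, n_cols + 1))
print(grid)
```

Positions with no data come out as NaN, which also turns the values into floats; calling grid.fillna("") before printing or exporting is one way to show them as blanks.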

How to create 1-100 in 10 rows? [duplicate]

I'm trying to get the following exercise:
"Write a program containing a pair of nested while loops that displays the integer values 1-100, ten numbers per row, with the columns aligned as below."
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
So far I've come up with this:
lijst = list(range(1, 101))
i = 0
while i < 100:
    print(lijst[i],"\t", end=" ".format(">"))
    i = i+1
    if i % 10 == 0:
        print("")
Although it produces the numbers I need, the tabs aren't working. Whenever I try to add spaces instead of a tab, things move way too much on the second and further rows.
Furthermore I can't seem to find out why the .format(">") doesn't work. I've tried to apply .format(">3") but that didn't do anything at all.
You can use the {:>5d} format style to right-align integers in a field 5 characters wide:
lijst = list(range(1, 101))
i = 0
while i < 100:
    print("{:>5d}".format(lijst[i]), end=" ")
    i = i+1
    if i % 10 == 0:
        print("")
Output:
    1     2     3     4     5     6     7     8     9    10
   11    12    13    14    15    16    17    18    19    20
   21    22    23    24    25    26    27    28    29    30
   31    32    33    34    35    36    37    38    39    40
   41    42    43    44    45    46    47    48    49    50
   51    52    53    54    55    56    57    58    59    60
   61    62    63    64    65    66    67    68    69    70
   71    72    73    74    75    76    77    78    79    80
   81    82    83    84    85    86    87    88    89    90
   91    92    93    94    95    96    97    98    99   100
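The exercise asks for a pair of nested while loops, which the single-loop version above does not use. A possible sketch follows; the variable names row, col and lines, and the field width of 4, are choices made here rather than anything the exercise prescribes. As for the original attempt, " ".format(">") has no effect because the string " " contains no {} replacement field, so format returns it unchanged.

```python
lines = []           # kept only so the result can be inspected afterwards
row = 0
while row < 10:      # outer loop: one pass per output row
    col = 1
    line = ""
    while col <= 10:  # inner loop: ten numbers per row
        # right-align each number in a field 4 characters wide
        line += "{:>4d}".format(row * 10 + col)
        col += 1
    lines.append(line)
    print(line)
    row += 1
```

Building each row as a string and printing it once avoids the trailing separator that print(..., end=" ") leaves at the end of every row.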

How to select all rows which contain values greater than a threshold?

The request is simple: I want to select all rows which contain a value greater than a threshold.
If I do it like this:
df[(df > threshold)]
I get these rows, but values below that threshold are simply NaN. How do I avoid selecting these rows?
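There is absolutely no need for the double transposition - you can simply call any along the column axis (supplying axis=1 or axis='columns') on your Boolean matrix. Pass axis as a keyword, since recent pandas versions no longer accept it positionally.
df[(df > threshold).any(axis=1)]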
Example
>>> df = pd.DataFrame(np.random.randint(0, 100, 50).reshape(5, 10))
>>> df
0 1 2 3 4 5 6 7 8 9
0 45 53 89 63 62 96 29 56 42 6
1 0 74 41 97 45 46 38 39 0 49
2 37 2 55 68 16 14 93 14 71 84
3 67 45 79 75 27 94 46 43 7 40
4 61 65 73 60 67 83 32 77 33 96
>>> df[(df > 95).any(axis=1)]
0 1 2 3 4 5 6 7 8 9
0 45 53 89 63 62 96 29 56 42 6
1 0 74 41 97 45 46 38 39 0 49
4 61 65 73 60 67 83 32 77 33 96
Transposing as your self-answer does is just an unnecessary performance hit.
df = pd.DataFrame(np.random.randint(0, 100, 10**8).reshape(10**4, 10**4))
# standard way
%timeit df[(df > 95).any(axis=1)]
1 loop, best of 3: 8.48 s per loop
# transposing
%timeit df[df.T[(df.T > 95)].any()]
1 loop, best of 3: 13 s per loop
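For a result that can be checked by eye, here is a small deterministic sketch of the same row filter. The frame, the threshold value and the names mask and selected are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 50, 3], "b": [99, 2, 4], "c": [5, 6, 7]})
threshold = 40

# Boolean Series with one entry per row:
# True if any value in that row exceeds the threshold.
mask = (df > threshold).any(axis=1)

# Indexing with a row mask keeps whole rows, so no NaN values are introduced.
selected = df[mask]
print(selected)
```

Indexing with the two-dimensional df > threshold masks individual cells, which is where the NaN values in the question come from; reducing it to one Boolean per row selects complete rows instead.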
This is actually very simple:
df[df.T[(df.T > 0.33)].any()]

How to print directly above another print after a series of subsequent prints but without leaving a blank line in python 2?

Thanks beforehand.
I'm currently trying to get some values from an array in the format that another program requires as input. I'm iterating over i rows and j columns, since I need the value of i directly followed by the value of array[i,j] (if non-zero and i different from j) printed on the same line for each value of the first dimension. I also need to move to the next line only for a new value of i. I've achieved it with a normal line break "\n", but it leaves a blank line, and I need the next line to be directly under the previous one with no blank line in between. I know I could easily fix this in bash, but I'd like to know a method to do it in python.
This is what I'm trying and the result:
import numpy as np
z=np.arange(100).reshape(10,10)
z[5,4]=0
print z
for i in xrange(1,10,1):
    for j in xrange(1,10,1):
        if not (i==j):
            if not z[i,j]==0:
                print j, z[i,j],
    print "\n"
[[ 0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 23 24 25 26 27 28 29]
[30 31 32 33 34 35 36 37 38 39]
[40 41 42 43 44 45 46 47 48 49]
[50 51 52 53 0 55 56 57 58 59]
[60 61 62 63 64 65 66 67 68 69]
[70 71 72 73 74 75 76 77 78 79]
[80 81 82 83 84 85 86 87 88 89]
[90 91 92 93 94 95 96 97 98 99]]
2 12 3 13 4 14 5 15 6 16 7 17 8 18 9 19
1 21 3 23 4 24 5 25 6 26 7 27 8 28 9 29
1 31 2 32 4 34 5 35 6 36 7 37 8 38 9 39
1 41 2 42 3 43 5 45 6 46 7 47 8 48 9 49
1 51 2 52 3 53 6 56 7 57 8 58 9 59
1 61 2 62 3 63 4 64 5 65 7 67 8 68 9 69
1 71 2 72 3 73 4 74 5 75 6 76 8 78 9 79
1 81 2 82 3 83 4 84 5 85 6 86 7 87 9 89
1 91 2 92 3 93 4 94 5 95 6 96 7 97 8 98
A call to print automatically adds a newline at the end of what it prints. You can suppress the newline by adding a comma at the end as you have done. However in your call to print '\n' at the end of the loop, you are adding two newlines because print adds a newline to the end of your '\n'. Either end this print statement with a comma or print the empty string, either will work:
import numpy as np
z=np.arange(100).reshape(10,10)
z[5,4]=0
print z
for i in xrange(1,10,1):
    for j in xrange(1,10,1):
        if not (i==j):
            if not z[i,j]==0:
                print j, z[i,j],
    print "" # automatically adds newline to end of empty string.
    # print "\n", # <---- could use this alternatively. Note the comma at the end
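For readers on Python 3, where print is a function and the trailing-comma trick no longer applies, a rough equivalent is sketched below. It departs from the loop above by collecting each row into a string first (the names pairs and lines are illustrative), so exactly one newline separates consecutive rows.

```python
import numpy as np

z = np.arange(100).reshape(10, 10)
z[5, 4] = 0

lines = []
for i in range(1, 10):
    # "j value" pairs for this row, skipping the diagonal and zero entries
    pairs = [
        "{} {}".format(j, z[i, j])
        for j in range(1, 10)
        if i != j and z[i, j] != 0
    ]
    lines.append(" ".join(pairs))

# Joining with "\n" puts exactly one newline between rows: no blank lines.
print("\n".join(lines))
```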

Find column with the highest value (pandas)

I have a Pandas dataframe with several columns that range from 0 to 100. I would like to add a column on to the dataframe that contains the name of the column from among these that has the greatest value for each row. So:
one two three four COLUMN_I_WANT_TO_CREATE
5 40 12 19 two
90 15 58 23 one
74 95 34 12 two
44 81 22 97 four
10 59 59 44 [either two or three, selected randomly]
etc.
Bonus points if the solution can resolve ties randomly.
You can use idxmax with parameter axis=1:
print(df)
one two three four
0 5 40 12 19
1 90 15 58 23
2 74 95 34 12
3 44 81 22 97
df['COLUMN_I_WANT_TO_CREATE'] = df.idxmax(axis=1)
print(df)
one two three four COLUMN_I_WANT_TO_CREATE
0 5 40 12 19 two
1 90 15 58 23 one
2 74 95 34 12 two
3 44 81 22 97 four
With ties for the maximum value that must be resolved randomly, it is more complicated.
You can first find all max values with x[(x == x.max())]. Then you need their index values, to which you apply sample. But sample works only with a Series, so the index is converted to a Series by to_series. Finally, you can select the first value of the Series with iloc:
print(df)
one two three four
0 5 40 12 19
1 90 15 58 23
2 74 95 34 12
3 44 81 22 97
4 10 59 59 44
5 59 59 59 59
6 10 59 59 59
7 59 59 59 59
#first run
df['COL']=df.apply(lambda x:x[(x==x.max())].index.to_series().sample(frac=1).iloc[0], axis=1)
print(df)
one two three four COL
0 5 40 12 19 two
1 90 15 58 23 one
2 74 95 34 12 two
3 44 81 22 97 four
4 10 59 59 44 three
5 59 59 59 59 one
6 10 59 59 59 two
7 59 59 59 59 three
#one of next run
df['COL']=df.apply(lambda x:x[(x==x.max())].index.to_series().sample(frac=1).iloc[0], axis=1)
print(df)
one two three four COL
0 5 40 12 19 two
1 90 15 58 23 one
2 74 95 34 12 two
3 44 81 22 97 four
4 10 59 59 44 two
5 59 59 59 59 one
6 10 59 59 59 three
7 59 59 59 59 four
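The random tie-breaking can also be written with an explicit NumPy random generator, which makes runs reproducible when a seed is given. This is a sketch of an alternative, not the approach from the answer above: the helper random_argmax, the list value_cols, the output column name best and the seed 0 are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "one":   [5, 90, 74, 44, 10],
    "two":   [40, 15, 95, 81, 59],
    "three": [12, 58, 34, 22, 59],
    "four":  [19, 23, 12, 97, 44],
})
value_cols = ["one", "two", "three", "four"]

rng = np.random.default_rng(0)  # seeded only so that runs are repeatable

def random_argmax(row):
    """Return the label of a maximal entry, chosen at random among ties."""
    winners = row[row == row.max()].index.to_numpy()
    return rng.choice(winners)

# Restrict to the value columns so the new column never feeds back in.
df["best"] = df[value_cols].apply(random_argmax, axis=1)
print(df)
```

Rows with a unique maximum give the same answer as idxmax(axis=1); only the last row, where two and three tie at 59, depends on the random draw.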
