Pandas apply function on each row with a condition

Pandas apply function on each row with a condition - python

I have a pandas dataframe column, code that is of type int. I would like to extract the last 5 digits for rows where length of this field is > 5. Sample data and my attempt below:
df['Code']
144 602000
145 602000
146 602000
147 602000
148 602000
...
571 84410
572 84410
573 84410
574 84410
575 684410
df['Code5'] = df['Code'].apply(lambda row: row['Code'].astype(str).str[-5:] if len(row['Code']) > 5 else row['Code'])
Error:
TypeError: 'int' object is not subscriptable

Try:
df['Code5'] = df['Code'].astype(str).str[-5:]
>>> df
Code Code5
144 602000 02000
145 602000 02000
146 602000 02000
147 602000 02000
148 602000 02000
571 84410 84410
572 84410 84410
573 84410 84410
574 84410 84410
575 684410 84410
999 1234 1234 # <- I added this sample
Another option is to fill value with less than 5 digits:
>>> df['Code'].astype(str).str[-5:].str.zfill(5)
144 02000
145 02000
146 02000
147 02000
148 02000
571 84410
572 84410
573 84410
574 84410
575 84410
999 01234 # <- padded with fillchar '0'
Name: Code, dtype: object

Related

sort pivot/dataframe without All row pandas/python

I created a dataframe with the help of a pivot, and I have:
name x y z All
A 155 202 218 575
C 206 149 45 400
B 368 215 275 858
Total 729 566 538 1833
I would like sort by column "All" not taking into account row "Total". i am using:
df.sort_values(by = ["All"], ascending = False)
Thank you in advance!

If the Total row is the last one, you can sort other rows and then concat the last row:
df = pd.concat([df.iloc[:-1, :].sort_values(by="All"), df.iloc[-1:, :]])
print(df)
Prints:
name x y z All
C 206 149 45 400
A 155 202 218 575
B 368 215 275 858
Total 729 566 538 1833

You can try with the following, although it has a FutureWarning you should be careful of:
df = df.iloc[:-1,:].sort_values('All',ascending=False).append(df.iloc[-1,:])
This outputs:
name x y z All
2 B 368 215 275 858
0 A 155 202 218 575
1 C 206 149 45 400
3 Total 729 566 538 1833

You can get the sorted order without Total (assuming here the last row), then index by position:
import numpy as np
idx = np.argsort(df['All'].iloc[:-1])
df2 = df.iloc[np.r_[idx[::-1], len(df)-1]]
NB. as we are sorting only an indexer here this should be very fast
output:
name x y z All
2 B 368 215 275 858
0 A 155 202 218 575
1 C 206 149 45 400
3 Total 729 566 538 1833

you can just ignore the last column
df.iloc[:-1].sort_values(by = ["All"], ascending = False)

math operation in list of list

I have created 1000 lists containing some values. I wanted to have a math operation on elements of each list and saving in another list that I want to plot. Each list has the shape like below and I want to subtract the ith element from the i-1th element(distance between two consecutive elements)
[[ 4 29 73 111 130 140 167 231 248 267 284 298 320 333
379 404 421 433 475 510 523 534 544 558 575 602 617 630
661 672 685 698 711 731 742 764 780 828 842 854 874 885
903 916 944 961 985 996 1013 1032 1054 1064 1077 1109 1122 1138
1205 1233 1249 1282 1299 1311 1326 1337 1372 1409 1426 1437 1511 1549
1578 1591 1604 1646]]
I have written the code below but it does not work and I got the error of index out of range.
import numpy as np
import os
import matplotlib.pyplot as plt
import pandas as pd
from scipy.signal import find_peaks
Cases = [f for f in sorted(os.listdir('.')) if f.startswith('config')]
plt.rcParams.update({'font.size': 14})
maxnum = np.max([int(os.path.splitext(f)[0].split('_')[1]) for f in CASES])
CASES = ['configuration_%d.out' % i for i in range(maxnum)]
gg = []
my_l_h = []
for i, d in enumerate(CASES):
a = np.loadtxt(d).T
x = a[3]
peaks, _ = find_peaks(x, distance=10)
gg = [peaks]
L_h = np.array(gg)
for numbers in gg:
jp = L_h[:,i]-L_h[:,i-1]
my_l_h.append(jp)
print(my_l_h)
t = np.arange(0,len(my_l_h)
plt.plot(t,my_l_h)
plt.show()

ValueError: Axes instance argument was not found in a figure, Question with same name has no answer

I am trying to create a seaborn Facetgrid to plot the normality distribution of all columns in my dataFrame decathlon. The data looks as such:
P100m Plj Psp Phj P400m P110h Ppv Pdt Pjt P1500
0 938 1061 773 859 896 911 880 732 757 752
1 839 975 870 749 887 878 880 823 863 741
2 814 866 841 887 921 939 819 778 884 691
3 872 898 789 878 848 879 790 790 861 804
4 892 913 742 803 816 869 1004 789 854 699
... ... ... ... ... ... ... ... ... ...
7963 755 760 604 714 812 794 482 571 539 780
7964 830 845 524 767 786 783 601 573 562 535
7965 819 804 653 840 791 699 659 461 448 632
7966 804 720 539 758 830 782 731 487 425 729
7967 687 809 692 714 565 741 804 527 738 523
I am relatively new to python and I can't understand my error. My attempt to format the data and create the grid is as such:
import seaborn as sns
df_stacked = decathlon.stack().reset_index(1).rename({'level_1': 'column', 0: 'values'}, axis=1)
g = sns.FacetGrid(df_stacked, row = 'column')
g = g.map(plt.hist, "values")
However I recieve the following error:
ValueError: Axes instance argument was not found in a figure
Can anyone explain what exactly this error means and how I would go about fixing it?
EDIT
df_stacked looks as such:
column values
0 P100m 938
0 Plj 1061
0 Psp 773
0 Phj 859
0 P400m 896
... ...
7967 P110h 741
7967 Ppv 804
7967 Pdt 527
7967 Pjt 738
7967 P1500 523

I encountered this similar issue when running a Jupyter Notebook.
My solution involved:
Restart the notebook
Re-run the imports %matplotlib inline; import matplotlib.pyplot as plt

As you did not post a full working example its a bit of guessing.
What might go wrong is in the line where you have g = g.map(plt.hist, "values") because the error comes from deep within matplotlib. You can see this here in this SO question where its another function pylab.sca(axes[i]) outside matplotlib due to not being in that module available, is being triggered by matplotlib.
Likely you installed/updated something in your (conda?) environment (changes in environment paths?) and after the next reboot it was found.
I also wonder how you come up with plt.hist ... fully typed it should resemble matplotlib.pyplot.hist ... but guessing... (waiting for your updated example code).

Python print string alignment

I am printing some values in a loop in Python. My current output is as follows:
0 Data Count: 249 7348 249 4469 2768 261 20 126
1 Data Count: 288 11 288 48 2284 598 137 408
2 Data Count: 808 999 808 2896 32739 138 202 678
3 Data Count: 140 26 140 2688 8054 884 433 987
What I'd like is for all values in each column to align, despite differing character/number counts in some, to make it easier to read.
The pseudo code behind this is as follows:
for i in range(0,3):
print i, " Data Count: ", Count_A, " ", Count_B, " ", Count_C, " ", Count_D, " ", Count_E, " ", Count_F, " ", Count_G, " ", Count_H
Thanks in advance everyone!

You could use format string justification:
from random import randint
for i in range(5):
data = [randint(0, 1000) for j in range(5)]
print("{:5} {:5} {:5} {:5}".format(*data))
output:
92 460 72 630
837 214 118 677
906 328 102 320
895 998 177 922
651 742 215 938
According to the format specification from Python docs

With the % string formatting operator, the minimum width of output is specified in a placeholder as a number before the data type (the full format of a placeholder is %[key][flags][width][.precision][length type]conversion type). If the result is shorter, it will be left-padded to the specified length:
from random import randint
for i in range(5):
data = [randint(0, 1000) for j in range(5)]
print("%5d %5d %5d %5d %5d" % tuple(data))
gives:
946 937 544 636 871
232 860 704 877 716
868 849 851 488 739
419 381 695 909 518
570 756 467 351 537
(code adapted from #andreihondrari's answer)

Removing a recurrant regular expression in a string - Python

I have the following collection of items. I would like to add a comma followed by a space at the end of each item so I can create a list out of them. I am assuming the best way to do this is to form a string out of the items and then replace 3 spaces between each item with a comma, using regular expressions?
I would like to do this with python, which I am new to.
179 181 191 193 197 199 211 223 227 229
233 239 241 251 257 263 269 271 277 281
283 293 307 311 313 317 331 337 347 349
353 359 367 373 379 383 389 397 401 409
419 421 431 433 439 443 449 457 461 463

Instead of a regular expression, how about this (assuming you have it in a file somewhere):
items = open('your_file.txt').read().split()
If it's just in a string variable:
items = your_input.split()
To combine them again with a comma in between:
print ', '.join(items)

data = """179 181 191 193 197 199 211 223 227 229
233 239 241 251 257 263 269 271 277 281 """
To get the list out of it:
lst = re.findall("(\d+)", data)
print lst
To add comma after each item, replace multiple spaces with , and space.
data = re.sub("[ ]+", ", ", data)
print data

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Pandas apply function on each row with a condition - python

Related

sort pivot/dataframe without All row pandas/python

math operation in list of list

ValueError: Axes instance argument was not found in a figure, Question with same name has no answer

Python print string alignment

Removing a recurrant regular expression in a string - Python

Categories

Resources