Delete duplicate values in a row within a Pandas DataFrame (Python) - python
What is the expression to remove rows whose values are all identical in a pandas DataFrame like the following? (Note: the first column is the index (date), followed by four columns of data.)
1983-02-16 512 517 510 514
1983-02-17 513 520 513 517
1983-02-18 500 500 500 500   <-- all values identical
1983-02-21 505 505 496 496
Deleting the row of duplicate values should leave this:
1983-02-16 512 517 510 514
1983-02-17 513 520 513 517
1983-02-21 505 505 496 496
I could only find how to do this by columns, not rows. Many thanks in advance,
Peter
A slightly more elegant/dynamic (but perhaps less performant) version:
In [11]: msk = df1.apply(lambda col: df1[1] != col).any(axis=1)
In [12]: msk
Out[12]:
1983-02-16     True
1983-02-17     True
1983-02-18    False
1983-02-21     True
dtype: bool
Because msk is built from df1's own columns, it already shares df1's index and can be passed straight to .loc:
In [13]: df1.loc[msk]
Out[13]:
              1    2    3    4
1983-02-16  512  517  510  514
1983-02-17  513  520  513  517
1983-02-21  505  505  496  496
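The apply call can also be avoided: DataFrame.ne with axis=0 compares every column against the first data column (named 1 here) in a single vectorized step. This is a minimal sketch, assuming the same four-column frame with the dates as the index; the frame is rebuilt inline so the snippet runs on its own.

```python
import pandas as pd

# Rebuild the example frame: dates as the index, data columns named 1..4.
df1 = pd.DataFrame(
    [[512, 517, 510, 514],
     [513, 520, 513, 517],
     [500, 500, 500, 500],
     [505, 505, 496, 496]],
    index=pd.to_datetime(["1983-02-16", "1983-02-17", "1983-02-18", "1983-02-21"]),
    columns=[1, 2, 3, 4],
)

# True wherever a value differs from column 1 in the same row.
differs = df1.ne(df1[1], axis=0)
# Keep rows with at least one differing value, i.e. drop rows whose values are all identical.
result = df1.loc[differs.any(axis=1)]
print(result)
```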
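import io
import pandas as pd

content = '''\
1983-02-16 512 517 510 514
1983-02-17 513 520 513 517
1983-02-18 500 500 500 500
1983-02-21 505 505 496 496'''

# Column 0 (the date) becomes the index; columns 1-4 hold the data.
df = pd.read_csv(io.StringIO(content), sep=r'\s+', header=None,
                 parse_dates=[0], index_col=0)
# True for rows where all four values are equal to the first one.
index = (df[1] == df[2]) & (df[1] == df[3]) & (df[1] == df[4])
df = df.loc[~index]  # keep only the rows where index is False
print(df)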
yields
1 2 3 4
0
1983-02-16 512 517 510 514
1983-02-17 513 520 513 517
1983-02-21 505 505 496 496
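df.loc can be used to select rows with a boolean mask: df = df.loc[~index] selects all rows where index is False. (df.ix, used in older pandas versions for the same selection, was removed in pandas 1.0.)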
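The chained comparison above has to be extended by hand when the number of columns changes. A sketch of a column-count-independent alternative, using DataFrame.nunique(axis=1) (a different technique from the answers above): a row whose values are all identical has exactly one distinct value. The frame is rebuilt inline for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[512, 517, 510, 514],
     [513, 520, 513, 517],
     [500, 500, 500, 500],
     [505, 505, 496, 496]],
    index=pd.to_datetime(["1983-02-16", "1983-02-17", "1983-02-18", "1983-02-21"]),
    columns=[1, 2, 3, 4],
)

# nunique(axis=1) counts distinct values per row; keep rows with more than one.
df = df[df.nunique(axis=1) > 1]
print(df)
```

nunique skips NaN by default, so a row such as 500, NaN, 500, 500 would also be dropped; pass dropna=False if missing values should count as distinct.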
Related
sort pivot/dataframe without All row pandas/python
I created a DataFrame with the help of a pivot, and I have:
name     x    y    z   All
A      155  202  218   575
C      206  149   45   400
B      368  215  275   858
Total  729  566  538  1833
I would like to sort by column "All" without taking the "Total" row into account. I am using:
df.sort_values(by = ["All"], ascending = False)
Thank you in advance!
If the Total row is the last one, you can sort the other rows and then concat the last row:
df = pd.concat([df.iloc[:-1, :].sort_values(by="All"), df.iloc[-1:, :]])
print(df)
Prints:
name     x    y    z   All
C      206  149   45   400
A      155  202  218   575
B      368  215  275   858
Total  729  566  538  1833
You can try the following, although it raises a FutureWarning you should be careful of (DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions use pd.concat instead):
df = df.iloc[:-1, :].sort_values('All', ascending=False).append(df.iloc[-1, :])
This outputs:
   name    x    y    z   All
2     B  368  215  275   858
0     A  155  202  218   575
1     C  206  149   45   400
3 Total  729  566  538  1833
You can get the sorted order without Total (assuming here it is the last row), then index by position:
import numpy as np
idx = np.argsort(df['All'].iloc[:-1])
df2 = df.iloc[np.r_[idx[::-1], len(df)-1]]
NB. As we are sorting only an indexer here, this should be very fast.
Output:
   name    x    y    z   All
2     B  368  215  275   858
0     A  155  202  218   575
1     C  206  149   45   400
3 Total  729  566  538  1833
You can just ignore the last row:
df.iloc[:-1].sort_values(by=["All"], ascending=False)
Note that this drops the Total row from the result rather than keeping it at the bottom.
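A sketch that selects Total by label instead of by position, so it does not need to be the last row, and uses pd.concat in place of the removed DataFrame.append. It assumes "name" is the index of the pivot result, as in the question's printout; the frame is rebuilt here from those numbers.

```python
import pandas as pd

# Rebuild the pivot result from the question, with "name" as the index.
df = pd.DataFrame(
    {"x": [155, 206, 368, 729], "y": [202, 149, 215, 566],
     "z": [218, 45, 275, 538], "All": [575, 400, 858, 1833]},
    index=pd.Index(["A", "C", "B", "Total"], name="name"),
)

# Sort every row except "Total" by "All" (descending), then put "Total" back at the end.
body = df.drop(index="Total").sort_values(by="All", ascending=False)
df_sorted = pd.concat([body, df.loc[["Total"]]])
print(df_sorted)
```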
math operation in list of list
I have created 1000 lists containing some values. I want to apply a math operation to the elements of each list and save the results in another list that I want to plot. Each list has a shape like the one below, and I want to subtract the (i-1)th element from the ith element (the distance between two consecutive elements):
[[ 4 29 73 111 130 140 167 231 248 267 284 298 320 333 379 404 421 433 475 510 523 534 544 558 575 602 617 630 661 672 685 698 711 731 742 764 780 828 842 854 874 885 903 916 944 961 985 996 1013 1032 1054 1064 1077 1109 1122 1138 1205 1233 1249 1282 1299 1311 1326 1337 1372 1409 1426 1437 1511 1549 1578 1591 1604 1646]]
I have written the code below, but it does not work: I get an index out of range error.
import numpy as np
import os
import matplotlib.pyplot as plt
import pandas as pd
from scipy.signal import find_peaks

CASES = [f for f in sorted(os.listdir('.')) if f.startswith('config')]
plt.rcParams.update({'font.size': 14})
maxnum = np.max([int(os.path.splitext(f)[0].split('_')[1]) for f in CASES])
CASES = ['configuration_%d.out' % i for i in range(maxnum)]
gg = []
my_l_h = []
for i, d in enumerate(CASES):
    a = np.loadtxt(d).T
    x = a[3]
    peaks, _ = find_peaks(x, distance=10)
    gg = [peaks]
    L_h = np.array(gg)
    for numbers in gg:
        jp = L_h[:,i]-L_h[:,i-1]
        my_l_h.append(jp)
print(my_l_h)
t = np.arange(0,len(my_l_h))
plt.plot(t,my_l_h)
plt.show()
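In the code above, i counts files, not peaks, so L_h[:, i] likely indexes past the end of a single peaks array, which would explain the IndexError. np.diff computes exactly "element i minus element i-1" for a whole array at once. A minimal sketch, assuming each case yields a 1-D array of peak positions; the two short arrays below are hypothetical samples taken from the start of the question's list.

```python
import numpy as np

# Peak positions for two hypothetical cases (values from the question's list).
peak_lists = [
    np.array([4, 29, 73, 111, 130]),
    np.array([140, 167, 231, 248]),
]

# np.diff(a)[k] == a[k + 1] - a[k]: the gap between consecutive peaks.
distances = [np.diff(peaks) for peaks in peak_lists]
for d in distances:
    print(d)
```

Inside the question's loop this would become my_l_h.append(np.diff(peaks)); np.concatenate(my_l_h) then gives one flat array to plot.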
ValueError: Axes instance argument was not found in a figure, Question with same name has no answer
I am trying to create a seaborn FacetGrid to plot the normality distribution of all columns in my DataFrame decathlon. The data looks like this:
      P100m    Plj    Psp    Phj  P400m  P110h    Ppv    Pdt    Pjt  P1500
0       938   1061    773    859    896    911    880    732    757    752
1       839    975    870    749    887    878    880    823    863    741
2       814    866    841    887    921    939    819    778    884    691
3       872    898    789    878    848    879    790    790    861    804
4       892    913    742    803    816    869   1004    789    854    699
...     ...    ...    ...    ...    ...    ...    ...    ...    ...    ...
7963    755    760    604    714    812    794    482    571    539    780
7964    830    845    524    767    786    783    601    573    562    535
7965    819    804    653    840    791    699    659    461    448    632
7966    804    720    539    758    830    782    731    487    425    729
7967    687    809    692    714    565    741    804    527    738    523
I am relatively new to Python and I can't understand my error. My attempt to format the data and create the grid is as follows:
import seaborn as sns
df_stacked = decathlon.stack().reset_index(1).rename({'level_1': 'column', 0: 'values'}, axis=1)
g = sns.FacetGrid(df_stacked, row = 'column')
g = g.map(plt.hist, "values")
However, I receive the following error:
ValueError: Axes instance argument was not found in a figure
Can anyone explain what exactly this error means and how I would go about fixing it?
EDIT: df_stacked looks like this:
     column  values
0     P100m     938
0       Plj    1061
0       Psp     773
0       Phj     859
0     P400m     896
...     ...     ...
7967  P110h     741
7967    Ppv     804
7967    Pdt     527
7967    Pjt     738
7967  P1500     523
I encountered a similar issue when running a Jupyter Notebook. My solution was to:
1. Restart the notebook.
2. Re-run the imports: %matplotlib inline, then import matplotlib.pyplot as plt
As you did not post a full working example, this is a bit of a guess. What might be going wrong is the line g = g.map(plt.hist, "values"), because the error comes from deep within matplotlib. You can see this in this SO question, where a different function called from outside matplotlib, pylab.sca(axes[i]), triggers the same error inside matplotlib. Possibly you installed or updated something in your (conda?) environment (changed environment paths?), and it was only picked up after the next reboot. I also wonder where plt.hist comes from: written out in full it should be matplotlib.pyplot.hist. But I'm guessing (waiting for your updated example code).
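Separately from the error itself, the reshaping step in the question can also be written with DataFrame.melt, which yields the same two columns ("column" and "values"). One difference: melt orders the rows column by column and gives them a fresh 0..n-1 index, whereas the stack version keeps each original row label. The small frame below is a hypothetical subset (first two rows, first three columns) of the question's data.

```python
import pandas as pd

# Hypothetical subset of the decathlon data from the question.
decathlon = pd.DataFrame(
    {"P100m": [938, 839], "Plj": [1061, 975], "Psp": [773, 870]},
)

# melt turns every cell into its own row: one "column" label plus one "values" entry.
df_stacked = decathlon.melt(var_name="column", value_name="values")
print(df_stacked)
```

The result can then be passed to sns.FacetGrid(df_stacked, row="column") as in the question; g.map(plt.hist, "values") additionally needs import matplotlib.pyplot as plt.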
How can I get product of last 12 months from the current row using pandas
I wanted the product of last 12 months data from the current row. Date Open 21/06/11 839.9 22/06/11 853.35 23/06/11 846.55 24/06/11 874.15 27/06/11 866.7 28/06/11 878.9 29/06/11 875.7 30/06/11 888.7 01/07/11 907 04/07/11 874.4 05/07/11 869.3 06/07/11 848.85 07/07/11 858 08/07/11 873 11/07/11 854 12/07/11 847.5 13/07/11 853.05 14/07/11 863.3 15/07/11 867.7 18/07/11 871.9 19/07/11 867.5 20/07/11 886 21/07/11 875.95 22/07/11 866 25/07/11 892 26/07/11 888.25 27/07/11 875 28/07/11 855 29/07/11 840 01/08/11 838 02/08/11 827.55 03/08/11 826.75 04/08/11 828 05/08/11 799.5 08/08/11 776.7 09/08/11 753 10/08/11 785.35 11/08/11 768.35 12/08/11 783 16/08/11 760 17/08/11 760.5 18/08/11 757.7 19/08/11 731.05 22/08/11 731 23/08/11 760.35 24/08/11 764 25/08/11 761.6 26/08/11 751 29/08/11 731.1 30/08/11 765 02/09/11 796.7 05/09/11 794.5 06/09/11 783.2 07/09/11 824 08/09/11 833.5 09/09/11 852.15 12/09/11 810.35 13/09/11 813.2 14/09/11 813.9 15/09/11 833 16/09/11 850 19/09/11 825 20/09/11 823 21/09/11 850.9 22/09/11 823.95 23/09/11 773.9 26/09/11 769.2 27/09/11 774 28/09/11 799.75 29/09/11 790.5 30/09/11 803.5 03/10/11 791.2 04/10/11 784 05/10/11 772.55 07/10/11 786.7 10/10/11 804.25 11/10/11 835 12/10/11 829.4 13/10/11 850 14/10/11 842 17/10/11 867 18/10/11 825 19/10/11 825.5 20/10/11 834.85 21/10/11 840 24/10/11 848 25/10/11 855 26/10/11 879 28/10/11 899.7 31/10/11 898 01/11/11 870.5 02/11/11 855 03/11/11 867.75 04/11/11 905 08/11/11 879 09/11/11 890.05 11/11/11 859 14/11/11 891.4 15/11/11 871 16/11/11 859.1 17/11/11 845.05 18/11/11 800.3 21/11/11 800 22/11/11 788.1 23/11/11 789.9 24/11/11 775 25/11/11 769.7 28/11/11 765 29/11/11 782 30/11/11 756.7 01/12/11 799 02/12/11 797 05/12/11 808.35 07/12/11 807 08/12/11 802 09/12/11 769.9 12/12/11 760.55 13/12/11 723.9 14/12/11 738 15/12/11 731.9 16/12/11 749 19/12/11 719.2 20/12/11 741.7 21/12/11 727 22/12/11 741.35 23/12/11 760 26/12/11 747.05 27/12/11 766 28/12/11 757.7 29/12/11 733.65 30/12/11 713 02/01/12 696.8 03/01/12 712.25 04/01/12 
727.4 05/01/12 715 06/01/12 697.05 07/01/12 716.7 09/01/12 714.45 10/01/12 712 11/01/12 737.9 12/01/12 747.5 13/01/12 742 16/01/12 729.95 17/01/12 716 18/01/12 762 19/01/12 789 20/01/12 790 23/01/12 755.3 24/01/12 774.6 25/01/12 788.7 27/01/12 800 30/01/12 813.9 31/01/12 804.5 01/02/12 818.9 02/02/12 835 03/02/12 830 06/02/12 845.9 07/02/12 842 08/02/12 847 09/02/12 856.75 10/02/12 850.35 13/02/12 841.1 14/02/12 846.9 15/02/12 854.2 16/02/12 831 17/02/12 822.05 21/02/12 817.5 22/02/12 848 23/02/12 832 24/02/12 833.5 27/02/12 821.8 28/02/12 789.05 29/02/12 805.05 01/03/12 811.8 02/03/12 816.25 03/03/12 811 05/03/12 812.05 06/03/12 797 07/03/12 776.55 09/03/12 775.3 12/03/12 790 13/03/12 803.45 14/03/12 828 15/03/12 818 16/03/12 780 19/03/12 781 20/03/12 756.1 21/03/12 760 22/03/12 765.9 23/03/12 743.8 26/03/12 743.9 27/03/12 738 28/03/12 730 29/03/12 718 30/03/12 729.5 02/04/12 749.35 03/04/12 744.25 04/04/12 745 09/04/12 740.05 10/04/12 746 11/04/12 739 12/04/12 733.3 13/04/12 746.05 16/04/12 747.1 17/04/12 754.8 18/04/12 750 19/04/12 753.9 20/04/12 740.05 23/04/12 725.85 24/04/12 739 25/04/12 734.1 26/04/12 737.1 27/04/12 741.3 28/04/12 739.8 30/04/12 737.5 02/05/12 747.9 03/05/12 738.5 04/05/12 733.4 07/05/12 715 08/05/12 718 09/05/12 702 10/05/12 697.25 11/05/12 693 14/05/12 698 15/05/12 679 16/05/12 675 17/05/12 680.25 18/05/12 676.9 21/05/12 686.5 22/05/12 704.6 23/05/12 685.2 24/05/12 694 25/05/12 695 28/05/12 692 29/05/12 702.2 30/05/12 699.65 31/05/12 697 01/06/12 707.35 04/06/12 677 05/06/12 696 06/06/12 704.45 07/06/12 721.05 08/06/12 718 11/06/12 732.7 12/06/12 715 13/06/12 722.25 14/06/12 716 15/06/12 718.5 18/06/12 730.35 19/06/12 717 20/06/12 738 21/06/12 734 22/06/12 713.55 25/06/12 714.2 26/06/12 717.5 27/06/12 726.4 28/06/12 724.4 29/06/12 725.1 02/07/12 735.5 03/07/12 739.95 04/07/12 740 05/07/12 734.95 06/07/12 738 09/07/12 729 10/07/12 731.45 11/07/12 733.45 12/07/12 721.9 13/07/12 720 16/07/12 720 17/07/12 724.8 18/07/12 718 19/07/12 720.2 
20/07/12 722.3 23/07/12 715 24/07/12 721 25/07/12 720.4 26/07/12 720.9 27/07/12 719 30/07/12 723 31/07/12 731.6 01/08/12 740.25 02/08/12 742.1 03/08/12 735 06/08/12 748.05 07/08/12 786.05 08/08/12 785.05 09/08/12 788.9 10/08/12 777.65 13/08/12 779.5 14/08/12 787.9 16/08/12 802.05 17/08/12 817.9 21/08/12 816 22/08/12 809.2 23/08/12 810.55 24/08/12 791.75 27/08/12 786 28/08/12 786.85 29/08/12 791 30/08/12 779.75 31/08/12 780 03/09/12 768 04/09/12 763.95 05/09/12 775.25 06/09/12 766.3 07/09/12 778.7 08/09/12 793.5 10/09/12 800 11/09/12 789.5 12/09/12 793.5 13/09/12 798.1 14/09/12 813 17/09/12 848.1 18/09/12 870.2
I tried something along these lines, but it did not give a solution:
df['val'] = df['Open'].last('12M').transform('prod')
How can I get the result?
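If you just need the product of the last 12 values of df['Open'], you could do something like this (the dates are day-first, so pass dayfirst=True when parsing them):
import pandas as pd
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df.set_index(['Date'], inplace=True)
df.sort_index(inplace=True)
df.tail(12).prod()
This returns the product as a one-element Series labelled Open. Note that tail(12) takes the last 12 rows, not the last 12 months of data.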
I think you can adapt the following example to get what you need:
# example with 7 days
import pandas as pd
dates = pd.date_range('1/1/2018', periods=7, freq='D')
values = [4, 3, 7, 5, 3, 2, 3]
df = pd.DataFrame({'col1': values}, index=dates)
# get product of last 2 days
df['col1'].last('2d').prod()
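The question asks for the product over the period preceding each row, which is a rolling product, whereas both answers above compute a single number. A sketch using a time-based rolling window on the same toy data. rolling() only accepts fixed-length offsets, so "12 months" is approximated here as "365D" (an assumption). It also avoids Series.last, which is deprecated since pandas 2.1.

```python
import numpy as np
import pandas as pd

# Same toy data as above: seven consecutive days.
dates = pd.date_range("2018-01-01", periods=7, freq="D")
df = pd.DataFrame({"col1": [4, 3, 7, 5, 3, 2, 3]}, index=dates)

# For each row, multiply all values dated within (current date - 2 days, current date].
# Time-based windows need a sorted DatetimeIndex.
df["prod_2d"] = df["col1"].rolling("2D").apply(np.prod, raw=True)
print(df)

# For the question's data (Date parsed with dayfirst=True, set as a sorted index),
# a ~12-month window would be:
# df["val"] = df["Open"].rolling("365D").apply(np.prod, raw=True)
```

With prices around 800, a product over a year of trading days overflows float64 to inf; summing np.log values over the window is a common way to keep the result usable.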
How to display a sequence of numbers in column-major order?
Program description: Find all the prime numbers between 1 and 4,027 and print them in a table which "reads down", using as few rows as possible and as few sheets of paper as possible. (This is because I have to print them out on paper to turn it in.) All numbers should be right-justified in their column. The columns should all have the same height, except perhaps the last column, which might have a few blank entries towards its bottom.
The plan for my first function is to find all prime numbers in the range above and put them in a list. Then I want my second function to display the list in a table that reads top to bottom:
 2  23  59
 3  29  61
 5  31  67
 7  37  71
11  41  73
13  43  79
17  47  83
19  53  89
etc.
This is all I've been able to come up with myself:
def findPrimes(n):
    """ Adds calculated prime numbers to a list. """
    prime_list = list()
    for number in range(1, n + 1):
        prime = True
        for i in range(2, number):
            if(number % i == 0):
                prime = False
        if prime:
            prime_list.append(number)
    return prime_list

def displayPrimes():
    pass

print(findPrimes(4027))
I'm not sure how to make a row/column display in Python. I remember using Java in my previous class, and I believe we had to use a for loop inside a for loop. Do I have to do something similar to that?
Although I frequently don't answer questions where the original poster hasn't even made an attempt to solve the problem themselves, I decided to make an exception for yours, mostly because I found it an interesting (and surprisingly challenging) problem that required solving a number of somewhat tricky sub-problems. I also optimized your find_primes() function slightly by taking advantage of some relatively well-known computational shortcuts for calculating them.
For testing and demo purposes, I made the tables only 15 rows high to force more than one page to be generated, as shown in the output at the end.
from itertools import zip_longest
import locale
import math

locale.setlocale(locale.LC_ALL, '')  # enable locale-specific formatting

def zip_discard(*iterables, _NULL=object()):
    """ Like zip_longest() but doesn't fill out all rows to equal length.
        https://stackoverflow.com/questions/38054593/zip-longest-without-fillvalue
    """
    return [[entry for entry in iterable if entry is not _NULL]
                for iterable in zip_longest(*iterables, fillvalue=_NULL)]

def grouper(n, seq):
    """ Group elements in sequence into groups of "n" items. """
    for i in range(0, len(seq), n):
        yield seq[i:i+n]

def tabularize(width, height, numbers):
    """ Print list of numbers in column-major tabular form given the table's
        width (in characters) and height (in rows). Will create multiple
        tables if required to display all numbers.
    """
    # Determine number of chars needed to hold longest formatted numeric value
    gap = 2  # including space between numbers
    col_width = len('{:n}'.format(max(numbers))) + gap
    # Determine number of columns that will fit within the table's width.
    num_cols = width // col_width
    chunk_size = num_cols * height  # maximum numbers in each table

    for i, chunk in enumerate(grouper(chunk_size, numbers), start=1):
        print('---- Page {} ----'.format(i))
        num_rows = int(math.ceil(len(chunk) / num_cols))  # rounded up
        table = zip_discard(*grouper(num_rows, chunk))
        for row in table:
            print(''.join(('{:{width}n}'.format(num, width=col_width) for num in row)))

def find_primes(n):
    """ Create list of prime numbers from 1 to n. """
    prime_list = []
    for number in range(1, n+1):
        for i in range(2, int(math.sqrt(number)) + 1):
            if not number % i:  # Evenly divisible?
                break  # Not prime.
        else:
            prime_list.append(number)
    return prime_list

primes = find_primes(4027)
tabularize(80, 15, primes)
Output:
---- Page 1 ----
      1     47    113    197    281    379    463    571    659    761    863
      2     53    127    199    283    383    467    577    661    769    877
      3     59    131    211    293    389    479    587    673    773    881
      5     61    137    223    307    397    487    593    677    787    883
      7     67    139    227    311    401    491    599    683    797    887
     11     71    149    229    313    409    499    601    691    809    907
     13     73    151    233    317    419    503    607    701    811    911
     17     79    157    239    331    421    509    613    709    821    919
     19     83    163    241    337    431    521    617    719    823    929
     23     89    167    251    347    433    523    619    727    827    937
     29     97    173    257    349    439    541    631    733    829    941
     31    101    179    263    353    443    547    641    739    839    947
     37    103    181    269    359    449    557    643    743    853    953
     41    107    191    271    367    457    563    647    751    857    967
     43    109    193    277    373    461    569    653    757    859    971
---- Page 2 ----
    977  1,069  1,187  1,291  1,427  1,511  1,613  1,733  1,867  1,987  2,087
    983  1,087  1,193  1,297  1,429  1,523  1,619  1,741  1,871  1,993  2,089
    991  1,091  1,201  1,301  1,433  1,531  1,621  1,747  1,873  1,997  2,099
    997  1,093  1,213  1,303  1,439  1,543  1,627  1,753  1,877  1,999  2,111
  1,009  1,097  1,217  1,307  1,447  1,549  1,637  1,759  1,879  2,003  2,113
  1,013  1,103  1,223  1,319  1,451  1,553  1,657  1,777  1,889  2,011  2,129
  1,019  1,109  1,229  1,321  1,453  1,559  1,663  1,783  1,901  2,017  2,131
  1,021  1,117  1,231  1,327  1,459  1,567  1,667  1,787  1,907  2,027  2,137
  1,031  1,123  1,237  1,361  1,471  1,571  1,669  1,789  1,913  2,029  2,141
  1,033  1,129  1,249  1,367  1,481  1,579  1,693  1,801  1,931  2,039  2,143
  1,039  1,151  1,259  1,373  1,483  1,583  1,697  1,811  1,933  2,053  2,153
  1,049  1,153  1,277  1,381  1,487  1,597  1,699  1,823  1,949  2,063  2,161
  1,051  1,163  1,279  1,399  1,489  1,601  1,709  1,831  1,951  2,069  2,179
  1,061  1,171  1,283  1,409  1,493  1,607  1,721  1,847  1,973  2,081  2,203
  1,063  1,181  1,289  1,423  1,499  1,609  1,723  1,861  1,979  2,083  2,207
---- Page 3 ----
  2,213  2,333  2,423  2,557  2,687  2,789  2,903  3,037  3,181  3,307  3,413
  2,221  2,339  2,437  2,579  2,689  2,791  2,909  3,041  3,187  3,313  3,433
  2,237  2,341  2,441  2,591  2,693  2,797  2,917  3,049  3,191  3,319  3,449
  2,239  2,347  2,447  2,593  2,699  2,801  2,927  3,061  3,203  3,323  3,457
  2,243  2,351  2,459  2,609  2,707  2,803  2,939  3,067  3,209  3,329  3,461
  2,251  2,357  2,467  2,617  2,711  2,819  2,953  3,079  3,217  3,331  3,463
  2,267  2,371  2,473  2,621  2,713  2,833  2,957  3,083  3,221  3,343  3,467
  2,269  2,377  2,477  2,633  2,719  2,837  2,963  3,089  3,229  3,347  3,469
  2,273  2,381  2,503  2,647  2,729  2,843  2,969  3,109  3,251  3,359  3,491
  2,281  2,383  2,521  2,657  2,731  2,851  2,971  3,119  3,253  3,361  3,499
  2,287  2,389  2,531  2,659  2,741  2,857  2,999  3,121  3,257  3,371  3,511
  2,293  2,393  2,539  2,663  2,749  2,861  3,001  3,137  3,259  3,373  3,517
  2,297  2,399  2,543  2,671  2,753  2,879  3,011  3,163  3,271  3,389  3,527
  2,309  2,411  2,549  2,677  2,767  2,887  3,019  3,167  3,299  3,391  3,529
  2,311  2,417  2,551  2,683  2,777  2,897  3,023  3,169  3,301  3,407  3,533
---- Page 4 ----
  3,539  3,581  3,623  3,673  3,719  3,769  3,823  3,877  3,919  3,967  4,019
  3,541  3,583  3,631  3,677  3,727  3,779  3,833  3,881  3,923  3,989  4,021
  3,547  3,593  3,637  3,691  3,733  3,793  3,847  3,889  3,929  4,001  4,027
  3,557  3,607  3,643  3,697  3,739  3,797  3,851  3,907  3,931  4,003
  3,559  3,613  3,659  3,701  3,761  3,803  3,853  3,911  3,943  4,007
  3,571  3,617  3,671  3,709  3,767  3,821  3,863  3,917  3,947  4,013
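For comparison, here is a shorter sketch of the same column-major idea. It swaps in a different technique for finding primes, the Sieve of Eratosthenes, and computes each cell's position directly instead of going through zip_longest. Note that 1 is not a prime: find_primes above starts its range at 1 and therefore lists it, whereas this sketch starts at 2. The names sieve_primes, column_major_rows and format_table are hypothetical, and pagination is left out.

```python
import math

def sieve_primes(n):
    """Return all primes <= n using the Sieve of Eratosthenes (1 is excluded)."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # Cross off multiples of p, starting at p*p (smaller ones are already crossed off).
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]

def column_major_rows(numbers, num_cols):
    """Split numbers into rows so that the table reads top-to-bottom, then left-to-right."""
    num_rows = math.ceil(len(numbers) / num_cols)
    rows = []
    for r in range(num_rows):
        # The cell at (row r, column c) holds numbers[c * num_rows + r];
        # cells past the end of the list are left out, so only the last column can be short.
        rows.append([numbers[c * num_rows + r]
                     for c in range(num_cols)
                     if c * num_rows + r < len(numbers)])
    return rows

def format_table(numbers, num_cols):
    """Right-justify every number in a column wide enough for the largest one plus a gap of 2."""
    width = len(str(max(numbers))) + 2
    return "\n".join("".join(f"{n:>{width}}" for n in row)
                     for row in column_major_rows(numbers, num_cols))

primes = sieve_primes(4027)
print(format_table(sieve_primes(30), 3))
```

For the assignment, format_table(primes, 11) would lay out all 557 primes up to 4,027 in 11 columns of 51 rows.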