Count specific values in a pandas series - python

I have a pandas series in python.
Is there a function/easy way to construct a series which contains the number of appearances of given values?
For demonstration, suppose I have the following Series: 1, 3, 1, 5, 10.
I want to count how many appearances each value has, from the following list: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].
The series that should be returned is 2, 0, 1, 0, 1, 0, 0, 0, 0, 1

We can do value_counts + reindex:
import pandas as pd
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
l1 = [1, 3, 1, 5, 10]
pd.Series(l1).value_counts().reindex(l, fill_value=0).tolist()
[2, 0, 1, 0, 1, 0, 0, 0, 0, 1]
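The question asks for a Series, not a list. A minimal sketch, assuming you want the counts labelled by the queried values: dropping the final .tolist() keeps the reindexed Series, whose index is the list of values. The helper name count_values is hypothetical.

```python
import pandas as pd

def count_values(s, values):
    """Return a Series indexed by `values` holding how often each occurs in `s`."""
    # value_counts() counts every distinct value in s; reindex() keeps only the
    # requested values, in the requested order, filling absent ones with 0.
    return s.value_counts().reindex(values, fill_value=0)

counts = count_values(pd.Series([1, 3, 1, 5, 10]), [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(counts.tolist())  # [2, 0, 1, 0, 1, 0, 0, 0, 0, 1]
```

Passing fill_value=0 (rather than calling fillna afterwards) keeps the integer dtype, because no NaN is ever introduced.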

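Use numpy.bincount:
import numpy as np
import pandas as pd
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
s = pd.Series([1, 3, 1, 5, 10])
# bincount counts occurrences of 0..max(s); index the result with l itself, not [l]
out = np.bincount(s).take(l).tolist()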
out
[2, 0, 1, 0, 1, 0, 0, 0, 0, 1]
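np.bincount returns an array of length max(s) + 1, so looking up a value larger than max(s) raises an IndexError. A sketch, assuming every value is a non-negative integer (which bincount requires), that pads the count array with minlength:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 1, 5, 10])
values = [1, 2, 3, 10, 12]  # 12 is larger than max(s)

# minlength guarantees the count array has an entry for every queried value.
counts = np.bincount(s.to_numpy(), minlength=max(values) + 1)
out = counts[values].tolist()
print(out)  # [2, 0, 1, 1, 0]
```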

With map:
import pandas as pd
s = pd.Series([1, 3, 1, 5, 10])
inp_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pd.Series(inp_list).map(s.value_counts()).fillna(0).astype(int).tolist()
Or use a list comprehension with get:
c = s.value_counts()
[c.get(i,0) for i in inp_list]
#or [*map(lambda x: c.get(x,0),inp_list)]
[2, 0, 1, 0, 1, 0, 0, 0, 0, 1]
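If pandas is not otherwise needed, the same lookup can be sketched with collections.Counter from the standard library. A Counter returns 0 for keys it has never seen, so no get default is needed:

```python
from collections import Counter

data = [1, 3, 1, 5, 10]
inp_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

c = Counter(data)               # maps each value to its number of occurrences
out = [c[v] for v in inp_list]  # missing keys count as 0
print(out)  # [2, 0, 1, 0, 1, 0, 0, 0, 0, 1]
```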

Related

If value is greater than x, select all following values from the list until a value is less than y. Make all other values 0

I would like to loop through a list of values. If there is a value greater than 3, then select all the following values while the value is greater than or equal to 1 (or stop before it drops below 1). The rest of the values in the list should be zero until another value further down the list is greater than 3, and then the process repeats itself.
Example:
If I have the following list:
l = [1, 3, 2, 3, 2, 4, 1, 3, 5, 6, 7, 6, 7, 8, 1, 0, 1, 2, 1, 3, 4, 7, 8, 9, 7, 5, 2, 1, 2, 4, 7, 8, 1, 3]
I would like to get the following:
o = [0, 0, 0, 0, 0, 4, 1, 0, 5, 6, 7, 6, 7, 8, 1, 0, 0, 0, 0, 0, 4, 7, 8, 9, 7, 5, 2, 1, 0, 4, 7, 8, 1, 0]
So far I have managed to keep all values greater than 3 and set the rest to 0, but I don't know how to integrate the other condition:
l = [1, 3, 2, 3, 2, 4, 1, 3, 5, 6, 7, 6, 7, 8, 1, 0, 1, 2, 1, 3, 4, 7, 8, 9, 7, 5, 2, 1, 2, 4, 7, 8, 1, 3]
o = [0] * len(l)
for index in range(len(l)):
    if l[index] > 3:
        o[index] = l[index]
    else:
        o[index] = 0
output:
[0, 0, 0, 0, 0, 4, 0, 0, 5, 6, 7, 6, 7, 8, 0, 0, 0, 0, 0, 0, 4, 7, 8, 9, 7, 5, 0, 0, 0, 4, 7, 8, 0, 0]
I would use a flag to control which values to let through.
Also, I would use a generator:
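def a_filter(items, on=3, off=1):
    through = False  # are we currently letting values through?
    for item in items:
        if item > on:
            through = True   # a value above `on` opens the gate
        elif item < off:
            through = False  # a value below `off` is blocked immediately
        yield item if through else 0
        if item <= off:
            through = False  # after yielding, a value equal to `off` closes the gate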
l = [1, 3, 2, 3, 2, 4, 1, 3, 5, 6, 7, 6, 7, 8, 1, 0, 1, 2, 1, 3, 4, 7, 8, 9, 7, 5, 2, 1, 2, 4, 7, 8, 1, 3]
o = [0, 0, 0, 0, 0, 4, 1, 0, 5, 6, 7, 6, 7, 8, 1, 0, 0, 0, 0, 0, 4, 7, 8, 9, 7, 5, 2, 1, 0, 4, 7, 8, 1, 0]
print(l)
# [1, 3, 2, 3, 2, 4, 1, 3, 5, 6, 7, 6, 7, 8, 1, 0, 1, 2, 1, 3, 4, 7, 8, 9, 7, 5, 2, 1, 2, 4, 7, 8, 1, 3]
print(o)
# [0, 0, 0, 0, 0, 4, 1, 0, 5, 6, 7, 6, 7, 8, 1, 0, 0, 0, 0, 0, 4, 7, 8, 9, 7, 5, 2, 1, 0, 4, 7, 8, 1, 0]
print(list(a_filter(l)))
# [0, 0, 0, 0, 0, 4, 1, 0, 5, 6, 7, 6, 7, 8, 1, 0, 0, 0, 0, 0, 4, 7, 8, 9, 7, 5, 2, 1, 0, 4, 7, 8, 1, 0]
print(o == list(a_filter(l)))
# True

Replacing specific values of a 2d numpy array, but only at the edges

To illustrate my point, let's take this 2D numpy array:
array([[1, 1, 5, 1, 1, 5, 4, 1],
       [1, 5, 6, 1, 5, 4, 1, 1],
       [5, 1, 5, 6, 1, 1, 1, 1]])
I want to replace the value 1 with some other value, let's say 0, but only at the edges. This is the desired result:
array([[0, 0, 5, 1, 1, 5, 4, 0],
       [0, 5, 6, 1, 5, 4, 0, 0],
       [5, 1, 5, 6, 0, 0, 0, 0]])
Note that the 1's surrounded by other values are not changed.
I could implement this by iterating over every row and element, but I feel like that would be very inefficient. Normally I would use the np.where function to replace a specific value, but I don't think you can add positional conditions?
m = row!=1
w1 = m.argmax()-1
w2 = m.size - m[::-1].argmax()
These three lines give you, for one row, the index of the last leading 1 (w1) and the index where the trailing run of 1s starts (w2). The idea is borrowed from the usual trick for finding trailing zeros.
Try:
import numpy as np
arr = np.array([[1, 1, 5, 1, 1, 5, 4, 1],
                [1, 5, 6, 1, 5, 4, 1, 1],
                [5, 1, 5, 6, 1, 1, 1, 1]])
for row in arr:  # each row is a view, so assigning to it modifies arr in place
    m = row != 1
    w1 = m.argmax() - 1
    w2 = m.size - m[::-1].argmax()
    # print(w1, w2)
    row[0:w1+1] = 0
    row[w2:] = 0
    # print(row)
arr:
array([[0, 0, 5, 1, 1, 5, 4, 0],
       [0, 5, 6, 1, 5, 4, 0, 0],
       [5, 1, 5, 6, 0, 0, 0, 0]])
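The loop can also be vectorized. A sketch, assuming "edges" means runs of the target value touching the start or end of each row: np.logical_or.accumulate marks every position at or after the first non-target value, so its negation is the leading run; the same idea on the reversed columns gives the trailing run. Unlike the loop above, a row made entirely of 1s is fully replaced here (the loop leaves such a row unchanged, because argmax of an all-False mask is 0). The helper name replace_edges is hypothetical.

```python
import numpy as np

def replace_edges(arr, target=1, new=0):
    """Return a copy of arr where runs of `target` touching either row end become `new`."""
    not_target = arr != target
    # True from the first non-target value onwards, so ~ gives the leading run.
    leading = ~np.logical_or.accumulate(not_target, axis=1)
    # Same trick on the reversed columns, flipped back, gives the trailing run.
    trailing = ~np.logical_or.accumulate(not_target[:, ::-1], axis=1)[:, ::-1]
    out = arr.copy()
    out[leading | trailing] = new
    return out

arr = np.array([[1, 1, 5, 1, 1, 5, 4, 1],
                [1, 5, 6, 1, 5, 4, 1, 1],
                [5, 1, 5, 6, 1, 1, 1, 1]])
print(replace_edges(arr))
```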

Strange output error following example of matrix vector operation in python

I want to do this in Python; here is a small example:
number_of_payments = [
[0, 1, 0, 1, 1, 1, 0, 5, 1, 0, 2, 1],
[0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0],
[1, 3, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0]
]
NDD_month = [8, 7, 11]
dates = []
for i in range(len(number_of_payments)):
    dates.append([NDD_month[i]])
    for j in range(1, len(number_of_payments[i])):
        dates[i].append((dates[i][j-1] + 12 - number_of_payments[i][j-1]) % 12)
print(dates)
This gives me
[[8, 8, 7, 7, 6, 5, 4, 4, 11, 10, 10, 8], [7, 7, 7, 7, 7, 7, 7, 7, 5, 5, 5, 4], [11, 10, 7, 6, 6, 6, 5, 4, 3, 2, 2, 1]]
Now I try to do the same thing with the entire data set, but this is what I get (my whole code is below):
# Import modules
import numpy as np
import pandas as pd
import datetime
# Import data file
df = pd.read_csv("Paystring Data.csv")
df.head()
# Get column data into a list
x = list(df)
# Append column data into cpi, NDD, and as of dates
NDD = df['NDD 8/31']
cpi = df['Contractual PI']
as_of_date = pd.Series(pd.to_datetime(df.columns.str[:8], errors='coerce'))
as_of_date = as_of_date[1:13]
NDD_month = pd.to_datetime(NDD, errors = 'coerce').dt.month.tolist()
# print(as_of_date.dt.month)
# Get cash flows
cf = df.iloc[:,1:13].replace('[^0-9.]', '', regex=True).astype(float)
cf = cf.values
# Calculate number of payments
number_of_payments = []
for i in range(len(cpi)):
    number_of_payments.append((cf[:i + 1] / cpi[i]).astype(int))
np.vstack(number_of_payments).tolist()
# Calculate the new NDD dates
dates = []
for i in range(len(number_of_payments)):
    dates.append([NDD_month[i]])
    for j in range(1, len(number_of_payments[i])):
        dates[i].append((dates[i][j-1] + 12 - number_of_payments[i][j-1]) % 12)
print(dates[0])
This just gives me [8]
When it should be [8, 8, 7, 7, 6, 5, 4, 4, 11, 10, 10, 8].
Anyone know how to fix this?
In your "small example", number_of_payments is a list of list of ints:
number_of_payments = [
[0, 1, 0, 1, 1, 1, 0, 5, 1, 0, 2, 1],
[0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0],
[1, 3, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0]
]
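In your real code, number_of_payments is a list of 2-D arrays, because cf[:i + 1] slices the first i + 1 rows of cf instead of selecting row i:
number_of_payments = []
for i in range(len(cpi)):
    number_of_payments.append((cf[:i + 1] / cpi[i]).astype(int))
So number_of_payments[0] has shape (1, 12) and len(number_of_payments[0]) is 1; the inner range(1, 1) loop never runs and dates[0] stays [8]. Use cf[i] instead of cf[:i + 1] so that each element is a flat row of 12 payment counts, nested the same way as your sample list.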
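A sketch of the corrected computation, using the sample payment counts from the question. The cf and cpi arrays below are hypothetical stand-ins for the CSV data (cash flows built as count × P&I). Dividing each row by its own cpi value via broadcasting is equivalent to cf[i] / cpi[i] in the loop, i.e. selecting row i rather than the slice cf[:i + 1]. next_due_months is a hypothetical helper name.

```python
import numpy as np

def next_due_months(number_of_payments, ndd_month):
    """For each row, step the month back by the payments made in each previous period."""
    dates = []
    for row, start in zip(number_of_payments, ndd_month):
        months = [int(start)]
        for paid in row[:-1]:  # element j-1 determines month j, as in the original loop
            # % 12 maps December to 0, exactly like the original code
            months.append((months[-1] + 12 - int(paid)) % 12)
        dates.append(months)
    return dates

sample = np.array([[0, 1, 0, 1, 1, 1, 0, 5, 1, 0, 2, 1],
                   [0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0],
                   [1, 3, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0]])
cpi = np.array([100.0, 250.0, 80.0])  # hypothetical contractual P&I per loan
cf = sample * cpi[:, None]            # hypothetical cash flows, one row per loan

# Row i divided by cpi[i]: one 2-D array of shape (n_loans, 12).
number_of_payments = (cf / cpi[:, None]).astype(int)
dates = next_due_months(number_of_payments, [8, 7, 11])
print(dates[0])  # [8, 8, 7, 7, 6, 5, 4, 4, 11, 10, 10, 8]
```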

Why aren't my data being masked?

data = [[0, 1, 1, 5, 5, 5, 0, 2, 2, 2, 2, 2, 2, 2, 6, 6, 6, 6, 6, 6, 6, 6],
[1, 1, 1, 0, 5, 5, 5, 0, 2, 2, 0, 0, 2, 0, 0, 6, 6, 6, 0, 0, 6, 6],
[1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 0, 2, 2, 2, 0, 0, 2, 6, 0, 0, 6, 6]]
The data object I have is a <class 'numpy.ndarray'>.
Knowing data is a numpy object, I did the following:
data = np.array(data)
I want to set the numbers in a list I give as input to 0. What I tried:
data[~np.isin(data,[2,4])] = 0
I expect all the 2 and 4 occurrences in the matrix above to become 0 and the rest to keep their values. What I got:
TypeError: only integer scalar arrays can be converted to a scalar index
I also tried converting data to a numpy array with np.array, but that gave an error as well.
You should not negate the mask from np.isin if you intend to set the matching values to 0: ~np.isin(data, [2, 4]) selects every value that is not 2 or 4. Also, data must be a numpy array rather than a list of lists, because boolean-mask assignment does not work on a plain list. The code below works:
In [10]: data = np.array([[0, 1, 1, 5, 5, 5, 0, 2, 2, 2, 2, 2, 2, 2, 6, 6, 6, 6, 6, 6, 6, 6],
...: [1, 1, 1, 0, 5, 5, 5, 0, 2, 2, 0, 0, 2, 0, 0, 6, 6, 6, 0, 0, 6, 6],
...: [1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 0, 2, 2, 2, 0, 0, 2, 6, 0, 0, 6, 6]])
...:
In [11]: data[np.isin(data, [2, 4])] = 0
In [12]: data
Out[12]:
array([[0, 1, 1, 5, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 6, 6, 6, 6, 6, 6],
[1, 1, 1, 0, 5, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 6, 0, 0, 6, 6],
[1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 6, 6]])
Just to reproduce your error:
In [13]: data = [[0, 1, 1, 5, 5, 5, 0, 2, 2, 2, 2, 2, 2, 2, 6, 6, 6, 6, 6, 6, 6, 6],
...: [1, 1, 1, 0, 5, 5, 5, 0, 2, 2, 0, 0, 2, 0, 0, 6, 6, 6, 0, 0, 6, 6],
...: [1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 0, 2, 2, 2, 0, 0, 2, 6, 0, 0, 6, 6]]
...:
In [14]: data[np.isin(data, [2, 4])] = 0
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-14-06ee1662f1f2> in <module>()
----> 1 data[np.isin(data, [2, 4])] = 0
TypeError: only integer scalar arrays can be converted to a scalar index
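To make the difference between the two masks concrete, here is a small sketch on a hypothetical 2 × 4 array. np.isin(data, values) is True where an element is in values, so assigning through it zeroes those elements, while the negated mask zeroes everything else. np.where builds the same results as new arrays without modifying data.

```python
import numpy as np

data = np.array([[0, 2, 4, 6],
                 [2, 5, 4, 1]])
values = [2, 4]

mask = np.isin(data, values)          # True where the element is 2 or 4
zeroed = np.where(mask, 0, data)      # 2s and 4s -> 0, data left untouched
kept_only = np.where(~mask, 0, data)  # everything except 2s and 4s -> 0

print(zeroed.tolist())     # [[0, 0, 0, 6], [0, 5, 0, 1]]
print(kept_only.tolist())  # [[0, 2, 4, 0], [2, 0, 4, 0]]
```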

How would I display the values of a selected column in Python array?

How would I find the values from a certain column in an array? For example I have:
[1, 1, 2, 4, 1, 7, 1, 7, 6, 9]
[1, 2, 5, 3, 9, 1, 1, 1, 9, 1]
[7, 4, 5, 1, 8, 1, 2, 0, 0, 4]
[1, 4, 1, 1, 1, 1, 1, 1, 8, 5]
[9, 0, 0, 0, 0, 0, 1, 1, 9, 8]
[7, 4, 2, 1, 8, 2, 2, 2, 9, 7]
[7, 4, 2, 1, 7, 1, 1, 1, 0, 5]
[3, 4, 5, 3, 4, 5, 9, 1, 0, 9]
[0, 0, 5, 1, 1, 1, 9, 7, 7, 7]
If I wanted to list all of the values of column 5, how would I do this? I have figured out how to do this for the rows, but columns are tricky, since each row is a separate list. I have not been able to find anything about this, and I am very new to Python, so I don't really know what I don't know.
It's simple: l[i][4] is the value in the 5th column of row i (indices start at 0).
l = [
[1, 1, 2, 4, 1, 7, 1, 7, 6, 9],
[1, 2, 5, 3, 9, 1, 1, 1, 9, 1],
[7, 4, 5, 1, 8, 1, 2, 0, 0, 4],
[1, 4, 1, 1, 1, 1, 1, 1, 8, 5],
[9, 0, 0, 0, 0, 0, 1, 1, 9, 8],
[7, 4, 2, 1, 8, 2, 2, 2, 9, 7],
[7, 4, 2, 1, 7, 1, 1, 1, 0, 5],
[3, 4, 5, 3, 4, 5, 9, 1, 0, 9],
[0, 0, 5, 1, 1, 1, 9, 7, 7, 7]
]
for i in l:
    print(i[4])
# or simply use
[i[4] for i in l]  # as pointed out by COLDSPEED
# the above code creates a list with the values from the 5th column
For a two-dimensional array (a list of lists), you can use array[row][column].
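Two other ways to pull out a column, sketched on the first three rows of the list above: zip(*l) transposes a list of equal-length rows, and with NumPy a column is a slice along the second axis, arr[:, 4]. Both assume every row has the same length.

```python
import numpy as np

l = [
    [1, 1, 2, 4, 1, 7, 1, 7, 6, 9],
    [1, 2, 5, 3, 9, 1, 1, 1, 9, 1],
    [7, 4, 5, 1, 8, 1, 2, 0, 0, 4],
]

# zip(*l) yields the columns as tuples; index 4 is the 5th column.
col5 = list(list(zip(*l))[4])
print(col5)  # [1, 9, 8]

# With NumPy, arr[:, 4] selects column 4 of every row.
arr = np.array(l)
print(arr[:, 4].tolist())  # [1, 9, 8]
```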
