Create a new series with same index - python

I have a series with the following values:
2 [2, 2, 1, 2, 0, 0, 5, 8, 7, 1, 2, 1, 0, 8, 4, ...
5 [3, 1, 5, 0]
8 [9, 0, 0, 0, 9, 0, 6, 1, 7, 0, 1, 4, 6, 1, 3, ...
9 [1, 1, 0, 8, 0, 0, 2, 9, 8, 6, 0, 3, 0]
11 [1, 0, 0, 2, 0, 0, 0, 0, 1, 1, 8, 7, 5, 7, 5, ...
I want to create a new series that keeps the index (2, 5, 8, 9, 11), with values equal to the length of the list in each row.
The result would be:
2 25
5 4
8 20
9 13
11 18

list(map(lambda x: (x, len(object[x])), indices))
It's somewhat pseudocode because you haven't specified your data type or variable names, but the general approach is that you have an object holding data indexed by some index x. So loop over all the x values and take the length of the data structure stored at each one.
Edit: since you stated it is a pandas Series of integer lists, try this:
import pandas as pd
S = pd.Series([[1,2,3], [2,3]], index=[2,4])
print(S)
# 2 [1, 2, 3]
# 4 [2, 3]
lengths = list(map(lambda x: len(S[x]), S.index))
S2 = pd.Series(lengths, index=S.index)
print(S2)
# 2 3
# 4 2
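A shorter variant of the same idea, as a sketch: applying len to every element with Series.map keeps the original index automatically, so no second Series has to be built by hand. The sample data below is made up for illustration; Series.str.len() is an alternative that also works on list elements.

```python
import pandas as pd

# Made-up sample data: a Series of integer lists with a non-default index
S = pd.Series([[1, 2, 3], [2, 3]], index=[2, 4])

# map(len) applies len to each element and keeps the index of S
lengths = S.map(len)

# The .str accessor also computes element lengths for lists
lengths_alt = S.str.len()

print(lengths)
```

Both results are Series indexed by 2 and 4, with values 3 and 2.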


Copy an Array and delete doubles

I have this code:
A = [1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 8, 8, 9]
B = [0 for b in range(16)]
skipped = 0
for i in range(16):
    if A[i] == A[i-1]:
        skipped += 1
    else:
        B[i-skipped] = A[i]
print(B)
The output:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 0, 0, 0]
It eliminates the doubles. But if I have an array where the doubles sit at more random indices, it fails, for example:
Array #2:
A = [1, 1, 1, 2, 2, 2, 3, 4, 5, 2, 2, 2, 7, 8, 8, 9]
Output #2:
[1, 2, 3, 4, 5, 2, 7, 8, 9, 0, 0, 0, 0, 0, 0, 0]
In output #2 the value 2 appears at index 1 and at index 5, but I want to eliminate all the doubles.
Summary:
So basically my algorithm should copy the values from array A to array B and eliminate all doubles regardless of their index.
EDIT: I have to write it in pseudocode, so I can't use conversion methods or functions like set.
You can use set to do it:
A = [1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 8, 8, 9]
B = set(A)
print(B)
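This code produces a set, which does not guarantee the original order of the values. To convert the set to a list you can write some_list = list(B).
Another way to do what you need, which keeps the order of first appearance: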
A = [1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 8, 8, 9]
B = []
for x in A:
    if x not in B:
        B.append(x)
print(B)
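Since the question asks for something that translates directly to pseudocode, here is a sketch of the same approach with the membership test spelled out as an explicit inner loop, so it uses neither set nor the in operator. The function name remove_duplicates is made up for illustration.

```python
def remove_duplicates(values):
    """Copy values into a new list, keeping only the first occurrence of each."""
    result = []
    for value in values:
        # Scan what has been kept so far to see whether value is already there
        seen = False
        for kept in result:
            if kept == value:
                seen = True
                break
        if not seen:
            result.append(value)
    return result


A = [1, 1, 1, 2, 2, 2, 3, 4, 5, 2, 2, 2, 7, 8, 8, 9]
B = remove_duplicates(A)
print(B)  # [1, 2, 3, 4, 5, 7, 8, 9]
```

The inner scan makes this quadratic in the worst case, which is fine for short arrays like the ones in the question.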

Find the counts of unique values row-wise using numpy

I am trying to find the counts of unique values row-wise:
import numpy as np
np.random.seed(100)
arr = np.random.randint(1,11,size=(6, 10))
arr
output::
array([[ 9, 9, 4, 8, 8, 1, 5, 3, 6, 3],
[ 3, 3, 2, 1, 9, 5, 1, 10, 7, 3],
[ 5, 2, 6, 4, 5, 5, 4, 8, 2, 2],
[ 8, 8, 1, 3, 10, 10, 4, 3, 6, 9],
[ 2, 1, 8, 7, 3, 1, 9, 3, 6, 2],
[ 9, 2, 6, 5, 3, 9, 4, 6, 1, 10]])
My code for finding the counts of unique values row-wise:
row,col = np.shape(arr)
for i in range(row):
    a,c = np.unique(arr[i], return_counts = True)
    print(c)
output::
[1 2 1 1 1 2 2]
[2 1 3 1 1 1 1]
[3 2 3 1 1]
[1 2 1 1 2 1 2]
[2 2 2 1 1 1 1]
[1 1 1 1 1 2 2 1]
But I want my output to look like this, with one count per value from 1 to 10 in every row:
[[1, 0, 2, 1, 1, 1, 0, 2, 2, 0],
[2, 1, 3, 0, 1, 0, 1, 0, 1, 1],
[0, 3, 0, 2, 3, 1, 0, 1, 0, 0],
[1, 0, 2, 1, 0, 1, 0, 2, 1, 2],
[2, 2, 2, 0, 0, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 2, 0, 0, 2, 1]]
I think what you want is np.bincount:
row,col = np.shape(arr)
for i in range(row):
    c = np.bincount(arr[i], minlength=11)[1:]  # [1:] drops the count for 0, which never occurs
    print(c)
This gives you a bunch of 1D arrays, not the 2D array you wanted. However, have a look at, for example, the question "Can numpy bincount work with 2D arrays?".
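One possible way to get a single 2D array, sketched under the assumption that every value lies between 1 and 10 as in the question: shift each row into its own block of 11 bins, run one np.bincount over the flattened array, and reshape. The names n_bins, offsets and counts are made up for illustration, and only the first two rows of the question's array are used.

```python
import numpy as np

# First two rows of the array from the question
arr = np.array([
    [9, 9, 4, 8, 8, 1, 5, 3, 6, 3],
    [3, 3, 2, 1, 9, 5, 1, 10, 7, 3],
])

n_rows = arr.shape[0]
n_bins = 11  # bins for the values 0..10

# Give every row its own block of n_bins bins, then count everything at once
offsets = arr + n_bins * np.arange(n_rows)[:, None]
counts = np.bincount(offsets.ravel(), minlength=n_rows * n_bins)
counts = counts.reshape(n_rows, n_bins)[:, 1:]  # drop the unused bin for 0

print(counts)
# [[1 0 2 1 1 1 0 2 2 0]
#  [2 1 3 0 1 0 1 0 1 1]]
```

Stacking the per-row results, np.array([np.bincount(r, minlength=11)[1:] for r in arr]), gives the same array and may be easier to read when the number of rows is small.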

In Python, why did I get -9223372036854775808 when I set one row of a 2D array as np.nan?

In Python, if I define a 2D array, and set the second row as np.nan, the second row will become all -9223372036854775808 rather than missing values. An example is here:
b = np.array(
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
[0, 3, 4, 4, 6, 6, 6, 5, 4, 5],
[0, 0, 0, 3, 6, 6, 6, 6, 6, 6],
[0, 0, 3, 4, 6, 6, 6, 6, 6, 6],
[0, 1, 2, 4, 4, 4, 4, 4, 4, 4]])
b[1, :] = np.nan
print(b)
[[ 0 0 0
0 0 0
0 0 0
5]
[-9223372036854775808 -9223372036854775808 -9223372036854775808
-9223372036854775808 -9223372036854775808 -9223372036854775808
-9223372036854775808 -9223372036854775808 -9223372036854775808
-9223372036854775808]
[ 0 0 0
3 6 6
6 6 6
6]
[ 0 0 3
4 6 6
6 6 6
6]
[ 0 1 2
4 4 4
4 4 4
4]]
Does anyone have any idea why? And how should I correctly assign np.nan to one row?
For your reference, I am running this code in a Python 3.7.10 environment created by mamba on Ubuntu 16.04.7 LTS (GNU/Linux 4.15.0-132-generic x86_64).
np.nan is a special floating-point value that cannot be stored in integer arrays. Since b is an array of integers, the code b[1, :] = np.nan attempts to convert np.nan to an integer, which is undefined behavior. See this for a discussion of a similar issue.
You initialised your array with integers. Integers have no "nan" value, so the conversion ends up at the minimal integer value. A quick fix is to initialise your array with a float dtype, which can hold "nan":
b = np.array(
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
[0, 3, 4, 4, 6, 6, 6, 5, 4, 5],
[0, 0, 0, 3, 6, 6, 6, 6, 6, 6],
[0, 0, 3, 4, 6, 6, 6, 6, 6, 6],
[0, 1, 2, 4, 4, 4, 4, 4, 4, 4]], dtype=float)
b[1, :] = np.nan
print(b)
First of all, nan is a special value for float arrays only.
I tried running your code on Python 3.8 (64-bit) on 64-bit Windows:
b = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
[0, 3, 4, 4, 6, 6, 6, 5, 4, 5],
[0, 0, 0, 3, 6, 6, 6, 6, 6, 6],
[0, 0, 3, 4, 6, 6, 6, 6, 6, 6],
[0, 1, 2, 4, 4, 4, 4, 4, 4, 4]])
b[1, :] = np.nan
print(b)
This is what I got:
[[ 0 0 0 0 0 0
0 0 0 5]
[-2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648
-2147483648 -2147483648 -2147483648 -2147483648]
[ 0 0 0 3 6 6
6 6 6 6]
[ 0 0 3 4 6 6
6 6 6 6]
[ 0 1 2 4 4 4
4 4 4 4]]
With an int array I got the lower bound of the integer type in place of NaN, and you are getting the same thing; the exact value depends on your environment.
So instead of an int array you can use a float array:
b = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
[0, 3, 4, 4, 6, 6, 6, 5, 4, 5],
[0, 0, 0, 3, 6, 6, 6, 6, 6, 6],
[0, 0, 3, 4, 6, 6, 6, 6, 6, 6],
[0, 1, 2, 4, 4, 4, 4, 4, 4, 4]], dtype=float)
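If the integer array already exists, a sketch of the same fix is to convert it with astype(float) before assigning. The small array and the names b_int and b are made up for illustration.

```python
import numpy as np

# Made-up integer array standing in for the one in the question
b_int = np.array([[0, 0, 5],
                  [0, 3, 4],
                  [1, 2, 4]])

# astype returns a float64 copy; the integer original is left untouched
b = b_int.astype(float)
b[1, :] = np.nan

print(b)
print(np.isnan(b).any(axis=1))  # [False  True False]
```

np.isnan is the usual way to locate the missing entries afterwards, since nan never compares equal to anything, including itself.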

Count specific values in a pandas series

I have a pandas series in python.
Is there a function/easy way to construct a series which contains the number of appearances of given values?
For demonstration, suppose I have the following Series: 1, 3, 1, 5, 10.
I want to count how many appearances each value has, from the following list: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].
The series that should be returned is 2, 0, 1, 0, 1, 0, 0, 0, 0, 1.
Use value_counts + reindex:
import pandas as pd
l=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
l1=[1, 3, 1, 5, 10]
pd.Series(l1).value_counts().reindex(l,fill_value=0).tolist()
[2, 0, 1, 0, 1, 0, 0, 0, 0, 1]
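Use numpy.bincount:
import numpy as np
import pandas as pd
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
s = pd.Series([1, 3, 1, 5, 10])
# minlength guarantees a bin for every value in l, even if s never reaches max(l)
out = np.bincount(s, minlength=max(l) + 1)[l].tolist()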
out
[2, 0, 1, 0, 1, 0, 0, 0, 0, 1]
With map:
s = pd.Series([1, 3, 1, 5, 10])
inp_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pd.Series(inp_list).map(s.value_counts()).fillna(0).astype(int).tolist()
Or a list comprehension with get:
c = s.value_counts()
[c.get(i,0) for i in inp_list]
#or [*map(lambda x: c.get(x,0),inp_list)]
[2, 0, 1, 0, 1, 0, 0, 0, 0, 1]
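Since the question asks for a Series rather than a list, here is a self-contained sketch of the value_counts + reindex approach that keeps the result as a Series indexed by the wanted values. The names wanted and counts are made up for illustration.

```python
import pandas as pd

s = pd.Series([1, 3, 1, 5, 10])
wanted = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# value_counts counts the values that occur; reindex puts them in the order
# of wanted and fills in 0 for values that never appear in s
counts = s.value_counts().reindex(wanted, fill_value=0)

print(counts.tolist())  # [2, 0, 1, 0, 1, 0, 0, 0, 0, 1]
```

Keeping the Series means each count stays labelled with its value, so counts[3] reads the count for the value 3 directly.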

Strange output error following example of matrix vector operation in python

I want to do this in python, here is a small example:
number_of_payments = [
[0, 1, 0, 1, 1, 1, 0, 5, 1, 0, 2, 1],
[0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0],
[1, 3, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0]
]
NDD_month = [8, 7, 11]
dates = []
for i in range(len(number_of_payments)):
    dates.append([NDD_month[i]])
    for j in range(1, len(number_of_payments[i])):
        dates[i].append((dates[i][j-1] + 12 - number_of_payments[i][j-1]) % 12)
print(dates)
This gives me:
[[8, 8, 7, 7, 6, 5, 4, 4, 11, 10, 10, 8], [7, 7, 7, 7, 7, 7, 7, 7, 5, 5, 5, 4], [11, 10, 7, 6, 6, 6, 5, 4, 3, 2, 2, 1]]
Now I am trying to do the same thing with the entire data set, but I get a different result. My whole code is below:
# Import modules
import numpy as np
import pandas as pd
import datetime
# Import data file
df = pd.read_csv("Paystring Data.csv")
df.head()
# Get column data into a list
x = list(df)
# Append column data into cpi, NDD, and as of dates
NDD = df['NDD 8/31']
cpi = df['Contractual PI']
as_of_date = pd.Series(pd.to_datetime(df.columns.str[:8], errors='coerce'))
as_of_date = as_of_date[1:13]
NDD_month = pd.to_datetime(NDD, errors = 'coerce').dt.month.tolist()
# print(as_of_date.dt.month)
# Get cash flows
cf = df.iloc[:,1:13].replace('[^0-9.]', '', regex=True).astype(float)
cf = cf.values
# Calculate number of payments
number_of_payments = []
for i in range(len(cpi)):
    number_of_payments.append((cf[:i + 1] / cpi[i]).astype(int))
np.vstack(number_of_payments).tolist()
# Calculate the new NDD dates
dates = []
for i in range(len(number_of_payments)):
    dates.append([NDD_month[i]])
    for j in range(1, len(number_of_payments[i])):
        dates[i].append((dates[i][j-1] + 12 - number_of_payments[i][j-1]) % 12)
print(dates[0])
This just gives me [8],
when it should be [8, 8, 7, 7, 6, 5, 4, 4, 11, 10, 10, 8].
Does anyone know how to fix this?
In your "small example", number_of_payments is a list of lists of ints:
number_of_payments = [
[0, 1, 0, 1, 1, 1, 0, 5, 1, 0, 2, 1],
[0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0],
[1, 3, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0]
]
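In your real code, number_of_payments is not a list of lists of ints. Each element is a 2D array, because cf[:i + 1] slices the first i + 1 rows of cf instead of selecting row i:
number_of_payments = []
for i in range(len(cpi)):
    number_of_payments.append((cf[:i + 1] / cpi[i]).astype(int))
The first element therefore has length 1, so the inner loop never runs and dates[0] stays [8]. You need to make your real number_of_payments have the same nested shape as your sample one.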
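One way to get that nested shape, sketched under the assumption that cf is a 2D array with one row per loan and cpi holds one value per row: divide row i, cf[i], by cpi[i] instead of the slice cf[:i + 1]. The numbers below are made-up stand-ins for the CSV data, which is not shown in the question.

```python
import numpy as np

# Hypothetical stand-ins for the question's data (the real values come from the CSV)
cf = np.array([[0.0, 100.0, 0.0, 100.0],
               [0.0, 0.0, 0.0, 400.0]])
cpi = [100.0, 200.0]
NDD_month = [8, 7]

# cf[i] selects row i only, so each entry is a flat list of ints
number_of_payments = [
    (cf[i] / cpi[i]).astype(int).tolist() for i in range(len(cpi))
]

dates = []
for i, payments in enumerate(number_of_payments):
    row = [NDD_month[i]]
    for j in range(1, len(payments)):
        row.append((row[j - 1] + 12 - payments[j - 1]) % 12)
    dates.append(row)

print(number_of_payments)  # [[0, 1, 0, 1], [0, 0, 0, 2]]
print(dates)               # [[8, 8, 7, 7], [7, 7, 7, 7]]
```

With the nested shape in place, the date loop is the same as in the small example and produces one full row of months per loan.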
