I have this code:
A = [1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 8, 8, 9]
B = [0 for b in range(16)]
skipped = 0
for i in range(16):
    if A[i] == A[i-1]:
        skipped += 1
    else:
        B[i-skipped] = A[i]
print(B)
The output:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 0, 0, 0]
This eliminates consecutive duplicates. But it fails when the duplicates are not adjacent, for example:
Array #2:
A = [1, 1, 1, 2, 2, 2, 3, 4, 5, 2, 2, 2, 7, 8, 8, 9]
Output #2:
[1, 2, 3, 4, 5, 2, 7, 8, 9, 0, 0, 0, 0, 0, 0, 0]
In output #2 the value 2 appears at both index 1 and index 5, but I want to eliminate all duplicates.
Summary:
So basically my algorithm should copy the values from array A to array B and eliminate all duplicates, regardless of their index.
EDIT: I have to write this as pseudocode, so I can't use conversion methods or functions like set.
You can use set to do it:
A = [1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 8, 8, 9]
B = set(A)
print(B)
This code returns a set. To convert the set to a list, write some_list = list(B).
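Note that a set does not keep the original order of the elements. If the order matters, a minimal sketch (assuming Python 3.7 or later, where dicts preserve insertion order) is to use dict.fromkeys instead:

```python
A = [1, 1, 1, 2, 2, 2, 3, 4, 5, 2, 2, 2, 7, 8, 8, 9]

# dict keys are unique and keep first-seen order (guaranteed since Python 3.7)
B = list(dict.fromkeys(A))
print(B)  # [1, 2, 3, 4, 5, 7, 8, 9]
```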
Another way to do what you need:
A = [1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 8, 8, 9]
B = []
for x in A:
    if x not in B:
        B.append(x)
print(B)
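Since the question asks for something close to pseudocode, here is a sketch of the same idea using only indexed loops and a fixed-size B, as in the question. The names count and found are my own and do not come from the question.

```python
A = [1, 1, 1, 2, 2, 2, 3, 4, 5, 2, 2, 2, 7, 8, 8, 9]
B = [0 for _ in range(len(A))]
count = 0  # number of distinct values stored in B so far

for i in range(len(A)):
    # Linear search through the part of B that is already filled.
    found = False
    for j in range(count):
        if B[j] == A[i]:
            found = True
            break
    # Copy the value only if it has not been seen before.
    if not found:
        B[count] = A[i]
        count += 1

print(B)  # [1, 2, 3, 4, 5, 7, 8, 9, 0, 0, 0, 0, 0, 0, 0, 0]
```

This does not depend on where the duplicates sit in A, at the cost of a nested loop (quadratic time in the worst case).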
I am counting the unique values in each row:
import numpy as np
np.random.seed(100)
arr = np.random.randint(1,11,size=(6, 10))
arr
Output:
array([[ 9,  9,  4,  8,  8,  1,  5,  3,  6,  3],
       [ 3,  3,  2,  1,  9,  5,  1, 10,  7,  3],
       [ 5,  2,  6,  4,  5,  5,  4,  8,  2,  2],
       [ 8,  8,  1,  3, 10, 10,  4,  3,  6,  9],
       [ 2,  1,  8,  7,  3,  1,  9,  3,  6,  2],
       [ 9,  2,  6,  5,  3,  9,  4,  6,  1, 10]])
My code for counting the unique values row-wise:
row, col = np.shape(arr)
for i in range(row):
    a, c = np.unique(arr[i], return_counts=True)
    print(c)
Output:
[1 2 1 1 1 2 2]
[2 1 3 1 1 1 1]
[3 2 3 1 1]
[1 2 1 1 2 1 2]
[2 2 2 1 1 1 1]
[1 1 1 1 1 2 2 1]
But I want my output to look like this, with one count for each value from 1 to 10 in every row:
[[1, 0, 2, 1, 1, 1, 0, 2, 2, 0],
[2, 1, 3, 0, 1, 0, 1, 0, 1, 1],
[0, 3, 0, 2, 3, 1, 0, 1, 0, 0],
[1, 0, 2, 1, 0, 1, 0, 2, 1, 2],
[2, 2, 2, 0, 0, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 2, 0, 0, 2, 1]]
I think what you want is np.bincount:
row, col = np.shape(arr)
for i in range(row):
    # minlength=11 covers the values 0..10; [1:] drops the count for 0
    c = np.bincount(arr[i], minlength=11)[1:]
    print(c)
This gives you a series of 1D arrays, not the 2D array you wanted. For that, have a look at, for example, the question "Can numpy bincount work with 2D arrays?".
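One way to get the 2D result in a single bincount call is to shift every row into its own range of values before counting. This is only a sketch: the array is copied from the question, and the names n_rows, n_bins and shifted are my own.

```python
import numpy as np

# The 6x10 array from the question (values are in the range 1..10).
arr = np.array([[9, 9, 4, 8, 8, 1, 5, 3, 6, 3],
                [3, 3, 2, 1, 9, 5, 1, 10, 7, 3],
                [5, 2, 6, 4, 5, 5, 4, 8, 2, 2],
                [8, 8, 1, 3, 10, 10, 4, 3, 6, 9],
                [2, 1, 8, 7, 3, 1, 9, 3, 6, 2],
                [9, 2, 6, 5, 3, 9, 4, 6, 1, 10]])

n_rows = arr.shape[0]
n_bins = 11  # one bin per value 0..10

# Shift row r by r * n_bins so that each row occupies its own block of bins.
shifted = arr + n_bins * np.arange(n_rows)[:, None]
counts = np.bincount(shifted.ravel(), minlength=n_rows * n_bins)
counts = counts.reshape(n_rows, n_bins)[:, 1:]  # drop the column for value 0
print(counts)
```

If readability matters more than speed, np.stack over a per-row np.bincount gives the same array.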
In Python, if I define a 2D array, and set the second row as np.nan, the second row will become all -9223372036854775808 rather than missing values. An example is here:
b = np.array(
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
[0, 3, 4, 4, 6, 6, 6, 5, 4, 5],
[0, 0, 0, 3, 6, 6, 6, 6, 6, 6],
[0, 0, 3, 4, 6, 6, 6, 6, 6, 6],
[0, 1, 2, 4, 4, 4, 4, 4, 4, 4]])
b[1, :] = np.nan
print(b)
[[                   0                    0                    0
                     0                    0                    0
                     0                    0                    0
                     5]
 [-9223372036854775808 -9223372036854775808 -9223372036854775808
  -9223372036854775808 -9223372036854775808 -9223372036854775808
  -9223372036854775808 -9223372036854775808 -9223372036854775808
  -9223372036854775808]
 [                   0                    0                    0
                     3                    6                    6
                     6                    6                    6
                     6]
 [                   0                    0                    3
                     4                    6                    6
                     6                    6                    6
                     6]
 [                   0                    1                    2
                     4                    4                    4
                     4                    4                    4
                     4]]
Does anyone have any idea? And how should I correctly assign one row to np.nan?
For reference, I am running this code in a Python 3.7.10 environment created by mamba on Ubuntu 16.04.7 LTS (GNU/Linux 4.15.0-132-generic x86_64).
np.nan is a special floating-point value that cannot be stored in integer arrays. Since b is an array of integers, b[1, :] = np.nan tries to convert np.nan to an integer, which is undefined behavior. See this for a discussion of a similar issue.
You initialised your array with integers. Integers have no "nan" value, so the assignment cannot store one and you end up with the minimum integer instead. A quick fix is to create the array with a float dtype, which can hold "nan":
b = np.array(
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
[0, 3, 4, 4, 6, 6, 6, 5, 4, 5],
[0, 0, 0, 3, 6, 6, 6, 6, 6, 6],
[0, 0, 3, 4, 6, 6, 6, 6, 6, 6],
[0, 1, 2, 4, 4, 4, 4, 4, 4, 4]], dtype=float)
b[1, :] = np.nan
print(b)
First of all, nan is a special value for float arrays only.
I ran your code on Python 3.8 (64-bit) on 64-bit Windows:
b = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
[0, 3, 4, 4, 6, 6, 6, 5, 4, 5],
[0, 0, 0, 3, 6, 6, 6, 6, 6, 6],
[0, 0, 3, 4, 6, 6, 6, 6, 6, 6],
[0, 1, 2, 4, 4, 4, 4, 4, 4, 4]])
b[1, :] = np.nan
print(b)
This is what I got
[[          0           0           0           0           0           0
            0           0           0           5]
 [-2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648
  -2147483648 -2147483648 -2147483648 -2147483648]
 [          0           0           0           3           6           6
            6           6           6           6]
 [          0           0           3           4           6           6
            6           6           6           6]
 [          0           1           2           4           4           4
            4           4           4           4]]
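With an int array I got the lower bound of the integer type in place of NaN, and you are seeing the same thing. The exact value depends on your environment: -2147483648 is the minimum of a 32-bit integer, while your -9223372036854775808 is the minimum of a 64-bit integer.
So instead of an int array, use a float array: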
b = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
[0, 3, 4, 4, 6, 6, 6, 5, 4, 5],
[0, 0, 0, 3, 6, 6, 6, 6, 6, 6],
[0, 0, 3, 4, 6, 6, 6, 6, 6, 6],
[0, 1, 2, 4, 4, 4, 4, 4, 4, 4]], dtype=float)
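If the integer array already exists, one option is to convert it with astype(float) before assigning NaN. The sketch below uses a smaller, made-up array for brevity and also prints the two integer minimums seen in the outputs above; b_float is my own name.

```python
import numpy as np

b = np.array([[0, 0, 5],
              [3, 4, 5],
              [0, 3, 6]])        # integer dtype by default

b_float = b.astype(float)        # copy with a float dtype that can hold NaN
b_float[1, :] = np.nan

print(b_float)
print(np.isnan(b_float[1]).all())  # True
print(np.iinfo(np.int64).min)      # -9223372036854775808
print(np.iinfo(np.int32).min)      # -2147483648
```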
I have a pandas series in python.
Is there a function/easy way to construct a series which contains the number of appearances of given values?
For demonstration, suppose I have the following Series: 1, 3, 1, 5, 10.
I want to count how many appearances each value has, from the following list: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].
The series that should be returned is 2, 0, 1, 0, 1, 0, 0, 0, 0, 1.
Use value_counts + reindex:
import pandas as pd
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
l1 = [1, 3, 1, 5, 10]
pd.Series(l1).value_counts().reindex(l, fill_value=0).tolist()
[2, 0, 1, 0, 1, 0, 0, 0, 0, 1]
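Use numpy.bincount:
import numpy as np
import pandas as pd
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
s = pd.Series([1, 3, 1, 5, 10])
# minlength ensures every value in l is a valid index into the counts
out = np.bincount(s, minlength=max(l) + 1)[l].tolist()
out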
[2, 0, 1, 0, 1, 0, 0, 0, 0, 1]
With map:
s = pd.Series([1, 3, 1, 5, 10])
inp_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pd.Series(inp_list).map(s.value_counts()).fillna(0).astype(int).tolist()
Or a list comprehension with get:
c = s.value_counts()
[c.get(i, 0) for i in inp_list]
# or [*map(lambda x: c.get(x, 0), inp_list)]
[2, 0, 1, 0, 1, 0, 0, 0, 0, 1]
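The question asks for a Series rather than a list. A self-contained sketch of the value_counts + reindex approach that keeps the result as a Series, indexed by the values being counted, could look like this (lookup is my own name for the list of values):

```python
import pandas as pd

s = pd.Series([1, 3, 1, 5, 10])
lookup = list(range(1, 11))  # the values whose appearances should be counted

# Count every value present in s, then align to lookup; absent values get 0.
counts = s.value_counts().reindex(lookup, fill_value=0)
print(counts.tolist())  # [2, 0, 1, 0, 1, 0, 0, 0, 0, 1]
```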
I want to do this in Python. Here is a small example:
number_of_payments = [
[0, 1, 0, 1, 1, 1, 0, 5, 1, 0, 2, 1],
[0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0],
[1, 3, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0]
]
NDD_month = [8, 7, 11]
dates = []
for i in range(len(number_of_payments)):
    dates.append([NDD_month[i]])
    for j in range(1, len(number_of_payments[i])):
        dates[i].append((dates[i][j-1] + 12 - number_of_payments[i][j-1]) % 12)
print(dates)
This gives me
[[8, 8, 7, 7, 6, 5, 4, 4, 11, 10, 10, 8], [7, 7, 7, 7, 7, 7, 7, 7, 5, 5, 5, 4], [11, 10, 7, 6, 6, 6, 5, 4, 3, 2, 2, 1]]
Now I am trying to do the same thing with the entire data set, but I don't get the expected result (my whole code is below):
# Import modules
import numpy as np
import pandas as pd
import datetime
# Import data file
df = pd.read_csv("Paystring Data.csv")
df.head()
# Get column data into a list
x = list(df)
# Append column data into cpi, NDD, and as of dates
NDD = df['NDD 8/31']
cpi = df['Contractual PI']
as_of_date = pd.Series(pd.to_datetime(df.columns.str[:8], errors='coerce'))
as_of_date = as_of_date[1:13]
NDD_month = pd.to_datetime(NDD, errors = 'coerce').dt.month.tolist()
# print(as_of_date.dt.month)
# Get cash flows
cf = df.iloc[:,1:13].replace('[^0-9.]', '', regex=True).astype(float)
cf = cf.values
# Calculate number of payments
number_of_payments = []
for i in range(len(cpi)):
    number_of_payments.append((cf[:i + 1] / cpi[i]).astype(int))
np.vstack(number_of_payments).tolist()
# Calculate the new NDD dates
dates = []
for i in range(len(number_of_payments)):
    dates.append([NDD_month[i]])
    for j in range(1, len(number_of_payments[i])):
        dates[i].append((dates[i][j-1] + 12 - number_of_payments[i][j-1]) % 12)
print(dates[0])
This just gives me [8]
when it should be [8, 8, 7, 7, 6, 5, 4, 4, 11, 10, 10, 8].
Does anyone know how to fix this?
In your "small example", number_of_payments is a list of lists of ints:
number_of_payments = [
[0, 1, 0, 1, 1, 1, 0, 5, 1, 0, 2, 1],
[0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0],
[1, 3, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0]
]
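In your real code, each element of number_of_payments is a 2D array, because cf[:i + 1] slices rows 0 through i instead of selecting row i:
number_of_payments = []
for i in range(len(cpi)):
    number_of_payments.append((cf[:i + 1] / cpi[i]).astype(int))
So number_of_payments[0] has length 1, the inner loop over range(1, 1) never runs, and dates[0] stays [8]. You need to make your real number_of_payments look like your sample one, with one flat row of ints per entry; indexing a single row with cf[i] instead of cf[:i + 1] should do that.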
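A minimal sketch of that change, using made-up stand-ins for the CSV data (the cpi values and the way cf is built here are assumptions for illustration, chosen so that the payment counts match the small example):

```python
import numpy as np

# Hypothetical stand-ins for the CSV data: one payment amount per row (cpi)
# and one row of 12 cash flows per row (cf).
cpi = np.array([100.0, 200.0, 50.0])
payments = np.array([
    [0, 1, 0, 1, 1, 1, 0, 5, 1, 0, 2, 1],
    [0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0],
    [1, 3, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0],
])
cf = payments * cpi[:, None]
NDD_month = [8, 7, 11]

# cf[i] selects row i only, so each entry becomes a flat list of ints.
number_of_payments = [(cf[i] / cpi[i]).astype(int).tolist()
                      for i in range(len(cpi))]

dates = []
for i in range(len(number_of_payments)):
    dates.append([NDD_month[i]])
    for j in range(1, len(number_of_payments[i])):
        dates[i].append((dates[i][j-1] + 12 - number_of_payments[i][j-1]) % 12)

print(dates[0])  # [8, 8, 7, 7, 6, 5, 4, 4, 11, 10, 10, 8]
```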