Inspect all elements in a NumPy array (or pandas DataFrame) and change them selectively - python

Suppose we have a 2D NumPy array (or a pandas DataFrame) with an arbitrary number of rows and columns.
Is there a quick way to inspect all elements and clip any that exceed a pre-specified maximum value, in either a NumPy ndarray or a pandas DataFrame, whichever is simpler?

pandas - use DataFrame.clip_upper:
np.random.seed(2018)
df = pd.DataFrame(np.random.randint(10, size=(5,5)))
print (df)
   0  1  2  3  4
0  6  2  9  5  4
1  6  9  9  7  9
2  6  6  1  0  6
3  5  6  7  0  7
4  8  7  9  4  8
print (df.clip_upper(5))
   0  1  2  3  4
0  5  2  5  5  4
1  5  5  5  5  5
2  5  5  1  0  5
3  5  5  5  0  5
4  5  5  5  4  5
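Note: clip_upper was deprecated in pandas 0.24 and removed in pandas 1.0; on current pandas the same result comes from DataFrame.clip with only the upper bound set:
print (df.clip(upper=5))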
NumPy - use numpy.clip:
np.random.seed(2018)
arr = np.random.randint(10, size=(5,5))
print (arr)
[[6 2 9 5 4]
 [6 9 9 7 9]
 [6 6 1 0 6]
 [5 6 7 0 7]
 [8 7 9 4 8]]
print (np.clip(arr, None, 5)) # None means no lower bound
[[5 2 5 5 4]
 [5 5 5 5 5]
 [5 5 1 0 5]
 [5 5 5 0 5]
 [5 5 5 4 5]]
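If you need to change elements selectively under a condition more general than a simple bound, boolean-mask assignment works on both containers and modifies the data in place. A minimal sketch, reusing the threshold 5 from above:
import numpy as np
import pandas as pd

np.random.seed(2018)
arr = np.random.randint(10, size=(5, 5))
df = pd.DataFrame(arr.copy())

# the boolean mask selects the offending elements; assignment replaces them
arr[arr > 5] = 5
df[df > 5] = 5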

Related

Multiply each value in a column by a row python

I have a small subset of data here:
import pandas as pd
days = [1, 2, 3]
time = [2, 4, 2, 4, 2, 4, 2, 4, 2]
df1 = pd.DataFrame(days)
df2 = pd.Series(time)
df2 = df2.transpose()
df3 = df1*df2
df1 is a column of data and df2 is a row of data. I need a 3x9 dataframe in which the row is multiplied by each value in the column, making one large dataframe.
The end result should look like:
df3 = [2  4  2  4  2  4  2  4  2
       4  8  4  8  4  8  4  8  4
       6 12  6 12  6 12  6 12  6]
The way I currently have it, on my larger dataset only a few data points are multiplied correctly and most come out as NaN.
The dot product is one solution to this problem:
import pandas as pd
days = [1, 2, 3]
time = [2, 4, 2, 4, 2, 4, 2, 4, 2]
df1 = pd.DataFrame(days)
df2 = pd.DataFrame(time)
# use dot
df3 = df1.dot(df2.T)
df3
Output
   0   1  2   3  4   5  6   7  8
0  2   4  2   4  2   4  2   4  2
1  4   8  4   8  4   8  4   8  4
2  6  12  6  12  6  12  6  12  6
Try this (df2 here is still a Series, as in the question, so it must first become a one-row frame):
df1.dot(df2.to_frame().T)
Output:
   0   1  2   3  4   5  6   7  8
0  2   4  2   4  2   4  2   4  2
1  4   8  4   8  4   8  4   8  4
2  6  12  6  12  6  12  6  12  6
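Since the result is an outer product, NumPy can also build it directly; a minimal sketch that wraps the array back into a DataFrame:
import numpy as np
import pandas as pd

days = [1, 2, 3]
time = [2, 4, 2, 4, 2, 4, 2, 4, 2]

# outer product: result[i, j] = days[i] * time[j]
df3 = pd.DataFrame(np.outer(days, time))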

Python (numpy or pandas): How to enlarge a 2D array by repeating values from the closest boundary?

For example, I have a 2D array:
1 2 3
4 5 6
7 8 9
And I want it to become:
1 1 2 3 3
1 1 2 3 3
4 4 5 6 6
7 7 8 9 9
7 7 8 9 9
And then repeat this process until the size becomes a 9x9 2D array.
Thanks!
You're looking for NumPy's repeat function:
import numpy as np

initial_array = np.arange(1, 10).reshape((3, 3))
desired_shape = (9, 9)
number_of_repeat_axis0 = desired_shape[0] // initial_array.shape[0]
number_of_repeat_axis1 = desired_shape[1] // initial_array.shape[1]
tmp = np.repeat(initial_array, number_of_repeat_axis0, axis=0)
output = np.repeat(tmp, number_of_repeat_axis1, axis=1)
'''
returns:
[[1 1 1 2 2 2 3 3 3]
 [1 1 1 2 2 2 3 3 3]
 [1 1 1 2 2 2 3 3 3]
 [4 4 4 5 5 5 6 6 6]
 [4 4 4 5 5 5 6 6 6]
 [4 4 4 5 5 5 6 6 6]
 [7 7 7 8 8 8 9 9 9]
 [7 7 7 8 8 8 9 9 9]
 [7 7 7 8 8 8 9 9 9]]
'''
But this will repeat all your data, including the values in the middle of your array. If you only want the boundary values to be repeated, simply change it to:
tmp = np.repeat(initial_array, [4, 1, 4], axis=0)
output = np.repeat(tmp, [4, 1, 4], axis=1)
'''
returns:
[[1 1 1 1 2 3 3 3 3]
 [1 1 1 1 2 3 3 3 3]
 [1 1 1 1 2 3 3 3 3]
 [1 1 1 1 2 3 3 3 3]
 [4 4 4 4 5 6 6 6 6]
 [7 7 7 7 8 9 9 9 9]
 [7 7 7 7 8 9 9 9 9]
 [7 7 7 7 8 9 9 9 9]
 [7 7 7 7 8 9 9 9 9]]
'''
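The 5x5 intermediate array in the question is exactly an edge-padded copy, so numpy.pad with mode='edge' gets there without computing repeat counts; a minimal sketch:
import numpy as np

initial_array = np.arange(1, 10).reshape((3, 3))

# one ring of edge padding -> the 5x5 array from the question
step = np.pad(initial_array, pad_width=1, mode='edge')

# three rings at once -> the 9x9 result with only boundary values repeated
output = np.pad(initial_array, pad_width=3, mode='edge')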

How to replace or swap all values (largest with smallest) in python?

I want to swap all the values of my data frame. The largest value must be replaced with the smallest value (i.e. 7 with 1, 6 with 2, 5 with 3, 4 with 4, 3 with 5, and so on).
import numpy as np
import pandas as pd
import io
data = '''
Values
6
1
3
7
5
2
4
1
4
7
2
5
'''
df = pd.read_csv(io.StringIO(data))
Trial
First I want to get all the unique values from my data.
df1=df.Values.unique()
print(df1)
[6 1 3 7 5 2 4]
I have sorted it in ascending order:
sorted1 = list(np.sort(df1))
print(sorted1)
[1, 2, 3, 4, 5, 6, 7]
Then I reverse-sorted the list:
rev_sorted = list(reversed(sorted1))
print(rev_sorted)
[7, 6, 5, 4, 3, 2, 1]
Now I need to replace the max value with the min value, and so on, in my main data set (df). The old values can be replaced in place, or a new column can be added.
Expected Output:
Values,New_Values
6,2
1,7
3,5
7,1
5,3
2,6
4,4
1,7
4,4
7,1
2,6
5,3
Here's a vectorized one -
In [51]: m,n = np.unique(df['Values'], return_inverse=True)
In [52]: df['New_Values'] = m[n.max()-n]
In [53]: df
Out[53]:
    Values  New_Values
0        6           2
1        1           7
2        3           5
3        7           1
4        5           3
5        2           6
6        4           4
7        1           7
8        4           4
9        7           1
10       2           6
11       5           3
Translating to pandas with pandas.factorize -
m, n = pd.factorize(df.Values, sort=True)  # m: integer codes, n: sorted unique values
df['New_Values'] = n[m.max()-m]
Use Series.map with a dictionary built from the sorted and reverse-sorted lists:
df['New'] = df['Values'].map(dict(zip(sorted1,rev_sorted)))
print (df)
    Values  New
0        6    2
1        1    7
2        3    5
3        7    1
4        5    3
5        2    6
6        4    4
7        1    7
8        4    4
9        7    1
10       2    6
11       5    3
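When the unique values form a contiguous integer range, as the 1 through 7 here do, the swap is just a reflection around the midpoint, so simple arithmetic suffices. A minimal sketch with df as defined in the question, valid only under that assumption:
# v -> max + min - v reflects each value across the midpoint;
# only correct when the set of unique values is symmetric, e.g. 1..7
df['New_Values'] = df['Values'].max() + df['Values'].min() - df['Values']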

add incremental value for duplicates [duplicate]

This question already has answers here:
Add a sequential counter column on groups to a pandas dataframe
(4 answers)
Closed 3 years ago.
Suppose I have a dataframe looking something like
df = pd.DataFrame(np.array([[1, 2, 3, 2], [4, 5, 6, 3], [7, 8, 9, 5]]), columns=['a', 'b', 'c', 'repeater'])
   a  b  c  repeater
0  1  2  3         2
1  4  5  6         3
2  7  8  9         5
And I repeat every row based on df['repeater'], like df = df.loc[df.index.repeat(df['repeater'])]
So I end up with a data frame
   a  b  c  repeater
0  1  2  3         2
0  1  2  3         2
1  4  5  6         3
1  4  5  6         3
1  4  5  6         3
2  7  8  9         5
2  7  8  9         5
2  7  8  9         5
2  7  8  9         5
2  7  8  9         5
How can I add an incremental value based on the row index? So a new column df['incremental'] with the output:
   a  b  c  repeater  incremental
0  1  2  3         2            1
0  1  2  3         2            2
1  4  5  6         3            1
1  4  5  6         3            2
1  4  5  6         3            3
2  7  8  9         5            1
2  7  8  9         5            2
2  7  8  9         5            3
2  7  8  9         5            4
2  7  8  9         5            5
Try your code with an extra groupby and cumcount:
df = df.loc[df.index.repeat(df['repeater'])]
df['incremental'] = df.groupby(df.index).cumcount() + 1
print(df)
Output:
   a  b  c  repeater  incremental
0  1  2  3         2            1
0  1  2  3         2            2
1  4  5  6         3            1
1  4  5  6         3            2
1  4  5  6         3            3
2  7  8  9         5            1
2  7  8  9         5            2
2  7  8  9         5            3
2  7  8  9         5            4
2  7  8  9         5            5
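Since the repeat counts are known up front, the counter can also be built directly from them with NumPy instead of a groupby; a minimal sketch with df as defined in the question:
import numpy as np

out = df.loc[df.index.repeat(df['repeater'])].copy()
# one 1..n ramp per original row, concatenated in order
out['incremental'] = np.concatenate([np.arange(1, n + 1) for n in df['repeater']])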

Pandas Split DataFrame using row index

I want to split a dataframe into chunks with uneven numbers of rows, using row indices.
The code below:
groups = df.groupby((np.arange(len(df.index)) / l[1]).astype(int))
works only for a uniform number of rows per chunk.
df
a b c
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
l = [2, 5, 7]
df1
1 1 1
2 2 2
df2
3 3 3
4 4 4
5 5 5
df3
6 6 6
7 7 7
df4
8 8 8
You could use a list comprehension, after a small modification to your list l first.
print(df)
   a  b  c
0  1  1  1
1  2  2  2
2  3  3  3
3  4  4  4
4  5  5  5
5  6  6  6
6  7  7  7
7  8  8  8
l = [2, 5, 7]
l_mod = [0] + l + [len(df)]  # add the start and end boundaries (here len(df) == 8 == max(l)+1)
list_of_dfs = [df.iloc[l_mod[n]:l_mod[n+1]] for n in range(len(l_mod)-1)]
Output:
list_of_dfs[0]
   a  b  c
0  1  1  1
1  2  2  2
list_of_dfs[1]
   a  b  c
2  3  3  3
3  4  4  4
4  5  5  5
list_of_dfs[2]
   a  b  c
5  6  6  6
6  7  7  7
list_of_dfs[3]
   a  b  c
7  8  8  8
I think this is what you need:
df = pd.DataFrame({'a': np.arange(1, 8),
                   'b': np.arange(1, 8),
                   'c': np.arange(1, 8)})
df
   a  b  c
0  1  1  1
1  2  2  2
2  3  3  3
3  4  4  4
4  5  5  5
5  6  6  6
6  7  7  7
last_check = 0
dfs = []
for ind in [2, 5, 7]:
    dfs.append(df.loc[last_check:ind-1])
    last_check = ind
Although a list comprehension is more concise than a for loop, the explicit last_check is necessary when there is no pattern in your list of indices.
dfs[0]
   a  b  c
0  1  1  1
1  2  2  2
dfs[2]
   a  b  c
5  6  6  6
6  7  7  7
I think this is what you are looking for:
l = [2, 5, 7]
dfs = []
i = 0
for val in l:
    if i == 0:
        # the first chunk starts at the top of the frame
        temp = df.iloc[:val]
    else:
        # later chunks start at the previous boundary
        temp = df.iloc[l[i-1]:val]
    dfs.append(temp)
    i += 1
Output:
   a  b  c
0  1  1  1
1  2  2  2
   a  b  c
2  3  3  3
3  4  4  4
4  5  5  5
   a  b  c
5  6  6  6
6  7  7  7
Another Solution:
l = [2, 5, 7]
t = np.arange(l[-1])
l.reverse()
for val in l:
    # label every row before each boundary with that boundary value
    t[:val] = val
temp = pd.DataFrame(t)
temp = pd.concat([df, temp], axis=1)
for u, v in temp.groupby(0):
    print(v)
Output:
   a  b  c  0
0  1  1  1  2
1  2  2  2  2
   a  b  c  0
2  3  3  3  5
3  4  4  4  5
4  5  5  5  5
   a  b  c  0
5  6  6  6  7
6  7  7  7  7
You can create an array to use for indexing via NumPy:
import pandas as pd, numpy as np
df = pd.DataFrame(np.arange(24).reshape((8, 3)), columns=list('abc'))
L = [2, 5, 7]
idx = np.cumsum(np.isin(np.arange(len(df.index)), L))  # np.isin is the modern spelling of np.in1d
for _, chunk in df.groupby(idx):
    print(chunk, '\n')
   a  b  c
0  0  1  2
1  3  4  5

    a   b   c
2   6   7   8
3   9  10  11
4  12  13  14

    a   b   c
5  15  16  17
6  18  19  20

    a   b   c
7  21  22  23
Instead of defining a new variable for each dataframe, you can use a dictionary:
d = dict(tuple(df.groupby(idx)))
print(d[1]) # print second groupby value
    a   b   c
2   6   7   8
3   9  10  11
4  12  13  14
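All of these approaches slice between consecutive boundaries, which zip can express directly; a minimal sketch, assuming l holds the split points and the last chunk runs to the end of the frame:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape((8, 3)), columns=list('abc'))
l = [2, 5, 7]

# pair each start boundary with the next: (0,2), (2,5), (5,7), (7,8)
bounds = [0] + l + [len(df)]
list_of_dfs = [df.iloc[start:end] for start, end in zip(bounds, bounds[1:])]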
