ct_data['IM NO'] = ct_data['IM NO'].apply(lambda x: pyffx.Integer(b'dkrya#Jppl1994', length=20).encrypt(int(x)))
I am trying to encrypt the IM NO column. Here is the head of ct_data:
Unnamed: 0 IM NO CT ID
0 0 214281340 x1E5e3ukRyEFRT6SUAF6lg|d543d3d064da465b8576d87
1 1 214281244 -vf6738ee3bedf47e8acf4613034069ab0|aa0d2dac654
2 2 175326863 __g3d877adf9d154637be26d9a0111e1cd6|6FfHZRoiWs
3 3 299631931 __gbe204670ca784a01b7207b42a7e5a5d3|54e2c39cd3
4 4 214282320 773840905c424a10a4a31aba9d6458bb|__g1114a30c6e
But I get the following:
Unnamed: 0 ... CT ID
0 0 ... x1E5e3ukRyEFRT6SUAF6lg|d543d3d064da465b8576d87
1 1 ... aa0d2dac654d4154bf7c09f73faeaf62|-vf6738ee3bed
2 2 ... 6FfHZRoiWs2VO02Pruk07A|__g3d877adf9d154637be26
3 3 ... 54e2c39cd35044ffbd9c0918d07923dc|__gbe204670ca
4 4 ... __g1114a30c6ea548a2a83d5a51718ff0fd|773840905c
5 5 ... 9e6eb976075b4b189ae7dde42b67ca3d|WgpKucd28IcdE
The IM NO column should keep its header name, and its values should become 20-digit encrypted numbers.
Normally encryption is done as below
import pyffx
strEncrypt = pyffx.Integer(b'dkrya#Jppl1994', length=20)
strEncrptVal = strEncrypt.encrypt(int('9digit IM No'))
ct_data.iloc[:, 1] displays the following:
0 214281340
1 214281244
2 175326863
3 299631931
4 214282320
5 214279026
This should be a comment but it contains formatted data.
It is probably a mere display problem. With the initial sample of your dataframe, I executed your command and printed the returned values:
print(ct_data['IM NO'].apply(lambda x: pyffx.Integer(b'dkrya#Jppl1994', length=20).encrypt(int(x))))
0 88741194526272080902
1 2665012251053580165
2 18983388112345132770
3 85666027666173191357
4 78253063863998100367
Name: IM NO, dtype: object
So it is correctly executed. Let us go one step further:
ct_data['IM NO'] = ct_data['IM NO'].apply(lambda x: pyffx.Integer(b'dkrya#Jppl1994', length=20).encrypt(int(x)))
print(ct_data['IM NO'])
0 88741194526272080902
1 2665012251053580165
2 18983388112345132770
3 85666027666173191357
4 78253063863998100367
Name: IM NO, dtype: object
Again...
That means your command was successful, but since the IM NO column is now wider, your system can no longer display all the columns: it displays the first and last ones, with an ellipsis (...) in the middle.
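If you want to verify the values rather than trust the ellipsis, you can widen pandas' display options; a minimal sketch (the sample values mimic your frame, the option names are standard pandas settings):

```python
import pandas as pd

# Rebuild a small frame like ct_data after encryption: the values are wide
df = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "IM NO": [88741194526272080902, 2665012251053580165],
    "CT ID": ["x1E5e3ukRyEFRT6SUAF6lg|d543d3d064da465b8576d87",
              "-vf6738ee3bedf47e8acf4613034069ab0|aa0d2dac654"],
})

# Tell pandas to print every column instead of collapsing some into "..."
pd.set_option("display.max_columns", None)
pd.set_option("display.width", None)
print(df)
```

With these options set, no columns are hidden, so you can confirm IM NO really holds the 20-digit values.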
Related
I'm working on creating a sankey plot and have the raw data mapped so that I know the source and target nodes. I'm having an issue grouping by source & target and then counting the number of times each pair occurs, e.g. using the table below, finding out how many times 0 -> 4 occurs and recording that in the dataframe.
index event_action_num next_action_num
227926 0 6
227928 1 5
227934 1 6
227945 1 7
227947 1 6
227951 0 7
227956 0 6
227958 2 6
227963 0 6
227965 1 6
227968 1 5
227972 3 6
Where I want to end up is:
event_action_num next_action_num count_of
0 4 1728
0 5 2382
0 6 3739
etc
Have tried:
df_new_2 = df_new.groupby(['event_action_num', 'next_action_num']).count()
but it doesn't give me the result I'm looking for.
Thanks in advance
Try to use agg('size') instead of count():
df_new_2 = df_new.groupby(['event_action_num', 'next_action_num']).agg('size')
For your sample data, the output will be:
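To get the counts back as a count_of column like the desired output, size() can be combined with reset_index; a sketch using the sample rows from the question (the column name passed to reset_index is an assumption matching the desired output):

```python
import pandas as pd

df_new = pd.DataFrame({
    "event_action_num": [0, 1, 1, 1, 1, 0, 0, 2, 0, 1, 1, 3],
    "next_action_num":  [6, 5, 6, 7, 6, 7, 6, 6, 6, 6, 5, 6],
})

# size() counts rows per group; count() counts non-null values per column,
# which is why it did not produce the expected single count column
df_new_2 = (df_new.groupby(["event_action_num", "next_action_num"])
                  .size()
                  .reset_index(name="count_of"))
print(df_new_2)
```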
I am a beginner and this is my first project. I searched for the answer but it still isn't clear.
I have imported a worksheet from Excel using Pandas.
Rabbit Class:
Num Behavior Speaking Listening
0 1 3 1 1
1 2 1 1 1
2 3 3 1 1
3 4 1 1 1
4 5 3 2 2
5 6 3 2 3
6 7 3 3 1
7 8 3 3 3
8 9 2 3 2
What I want to do is write conditional logic, e.g. if a student's behavior is a "1" I want to print one string, else print a different string. How can I reference a particular cell of the worksheet to set up such a condition? I tried: val = df.at(1, "Behavior") but that clearly isn't working.
Here is the code I have so far..
import os
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
path = r"C:\Users\USER\Desktop\Python\rabbit_class.xls"
df = pd.read_excel(path)  # load the worksheet into a DataFrame
print("Rabbit Class:")
print(df)
Also you can do
dff = df.loc[df['Behavior'] == 1]
if not dff.empty:
    # do something with the matching rows
What you want is to find rows where df.Behavior is equal to 1. Use any of the following three methods.
# Method-1
df[df["Behavior"]==1]
# Method-2
df.loc[df["Behavior"]==1]
# Method-3
df.query("Behavior==1")
Output:
Num Behavior Speaking Listening LastColumn
0 0 1 3 1 1
Note: Dummy Data
Your sample data does not have a header for the last column, so I named it LastColumn and read the data in as a dataframe.
# Dummy Data
s = """
Num Behavior Speaking Listening LastColumn
0 1 3 1 1
1 2 1 1 1
2 3 3 1 1
3 4 1 1 1
4 5 3 2 2
5 6 3 2 3
6 7 3 3 1
7 8 3 3 3
8 9 2 3 2
"""
# Make Dataframe
import re
import numpy as np
import pandas as pd

ss = re.sub(r'\s+', ',', s)
ss = ss[1:-1]
sa = np.array(ss.split(',')).reshape(-1, 5)
df = pd.DataFrame(dict((k, v) for k, v in zip(sa[0, :], sa[1:, :].T)))
df = df.astype(int)
df
df
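As an aside on the question's df.at(1, "Behavior") attempt: .at is an indexer, so it takes square brackets rather than parentheses. A minimal sketch on similar dummy data:

```python
import pandas as pd

df = pd.DataFrame({"Num": [1, 2, 3], "Behavior": [3, 1, 3]})

# .at[row_label, column_label] reads a single cell; note the brackets
val = df.at[1, "Behavior"]

if val == 1:
    print("good behavior")
else:
    print("needs improvement")
```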
Hope the example below will help you:
import pandas as pd
df = pd.read_excel(r"D:\test_stackoverflow.xlsx")
print(df.columns)
def _filter(col, filter_):
return df[df[col]==filter_]
print(_filter('Behavior', 1))
Thank you all for your answers. I finally figured out what I was trying to do using the following code:
for i in df.index:
    student_number = df["Student Number"][i]
    print(student_number)
    student_name = student_list[int(student_number) - 1]
    behavior = df["Behavior"][i]
    if behavior == 1:
        print("%s's behavior is good" % student_name)
    elif behavior == 2:
        print("%s's behavior is average." % student_name)
    else:
        print("%s's behavior is poor" % student_name)
    speaking = df["Speaking"][i]
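The same loop can also be written with DataFrame.iterrows, which yields each index and row; a sketch with made-up student data (the names in student_list are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Student Number": [1, 2], "Behavior": [1, 3]})
student_list = ["Alice", "Bob"]  # hypothetical names

messages = []
for _, row in df.iterrows():
    name = student_list[int(row["Student Number"]) - 1]
    if row["Behavior"] == 1:
        messages.append("%s's behavior is good" % name)
    elif row["Behavior"] == 2:
        messages.append("%s's behavior is average." % name)
    else:
        messages.append("%s's behavior is poor" % name)

print(messages)
```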
Hello, I read an Excel file into a DataFrame whose rows contain multiple values. The shape of the df is like:
Welding
0 65051020 ...
1 66053510 66053550 ...
2 66553540 66553560 ...
3 67053540 67053505 ...
now I want to split each row and write each entry into an own row like
Welding
0 65051020
1 66053510
2 66053550
....
n 67053505
I have tried:
new = []
[new.append(df.loc[i, "Welding"].split()) for i in range(len(df))]
df2 = pd.DataFrame({"Welding": new})
print(df2)
Welding
0 66053510
1 66053550
2 66053540
3 66053505
4 66053551
5 [65051020, 65051010, 65051030, 65051035, 65051...
6 [66053510, 66053550, 66053540, 66053505, 66053...
7 [66553540, 66553560, 66553505, 66553520, 66553...
8 [67053540, 67053505, 67057505]
9 [65051020, 65051010, 65051030, 65051035, 65051...
10 [66053510, 66053550, 66053540, 66053505, 66053...
11 [66553540, 66553560, 66553505, 66553520, 66553...
12 [67053540, 67053505, 67057505]
13 [65051020, 65051010, 65051030, 65051035, 65051...
14 [66053510, 66053550, 66053540, 66053505, 66053...
15 [66553540, 66553560, 66553505, 66553520, 66553...
16 [67053540, 67053505, 67057505]
But this did not return the expected results. I'd appreciate any help!
Use split with stack and finally to_frame:
df = df['Welding'].str.split(expand=True).stack().reset_index(drop=True).to_frame('Welding')
print (df)
Welding
0 65051020
1 66053510
2 66053550
3 66553540
4 66553560
5 67053540
6 67053505
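On pandas 0.25 and later, the same result can be had with Series.explode, which some find clearer than stack; a sketch on a few sample rows:

```python
import pandas as pd

df = pd.DataFrame({"Welding": ["65051020",
                               "66053510 66053550",
                               "66553540 66553560"]})

# split each cell into a list, then explode one list element per row
df2 = (df["Welding"].str.split()
                    .explode()
                    .reset_index(drop=True)
                    .to_frame("Welding"))
print(df2)
```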
I'm trying to parse a logfile of our manufacturing process. Most of the time the process is run automatically but occasionally, the engineer needs to switch into manual mode to make some changes and then switches back to automatic control by the reactor software. When set to manual mode the logfile records the step as being "MAN.OP." instead of a number. Below is a representative example.
steps = [1,2,2,'MAN.OP.','MAN.OP.',2,2,3,3,'MAN.OP.','MAN.OP.',4,4]
ser_orig = pd.Series(steps)
which results in
0 1
1 2
2 2
3 MAN.OP.
4 MAN.OP.
5 2
6 2
7 3
8 3
9 MAN.OP.
10 MAN.OP.
11 4
12 4
dtype: object
I need to detect the 'MAN.OP.' and make them distinct from each other. In this example, the two regions with values == 2 should be one region after detecting the manual mode section like this:
0 1
1 2
2 2
3 Manual_Mode_0
4 Manual_Mode_0
5 2
6 2
7 3
8 3
9 Manual_Mode_1
10 Manual_Mode_1
11 4
12 4
dtype: object
I have code that iterates over this series and produces the correct result when the series is passed to my object. The setter is:
@step_series.setter
def step_series(self, ss):
    """
    On assignment, give the manual mode steps a unique name. Leave
    the steps done on recipe the same.
    """
    manual_mode = "MAN.OP."
    new_manual_mode_text = "Manual_Mode_{}"
    counter = 0
    continuous = False
    for i in ss.index:
        if continuous and ss.at[i] != manual_mode:
            continuous = False
            counter += 1
        elif not continuous and ss.at[i] == manual_mode:
            continuous = True
            ss.at[i] = new_manual_mode_text.format(str(counter))
        elif continuous and ss.at[i] == manual_mode:
            ss.at[i] = new_manual_mode_text.format(str(counter))
    self._step_series = ss
but this iterates over the entire dataframe and is the slowest part of my code other than reading the logfile over the network.
How can I detect these non-unique sections and rename them uniquely without iterating over the entire series? The series is a column selection from a larger dataframe so adding extra columns is fine if needed.
For the completed answer I ended up with:
@step_series.setter
def step_series(self, ss):
    pd.options.mode.chained_assignment = None
    newManOp = (ss == 'MAN.OP.') & (ss != ss.shift())
    ss[ss == 'MAN.OP.'] = 'Manual_Mode_' + (newManOp.cumsum() - 1).astype(str)
    self._step_series = ss
Here's one way:
steps = [1,2,2,'MAN.OP.','MAN.OP.',2,2,3,3,'MAN.OP.','MAN.OP.',4,4]
steps = pd.Series(steps)
newManOp = (steps=='MAN.OP.') & (steps != steps.shift())
steps[steps=='MAN.OP.'] += newManOp.cumsum().astype(str)
>>> steps
0 1
1 2
2 2
3 MAN.OP.1
4 MAN.OP.1
5 2
6 2
7 3
8 3
9 MAN.OP.2
10 MAN.OP.2
11 4
12 4
dtype: object
To get the exact format you listed (starting from zero instead of one, and changing from "MAN.OP." to "Manual_mode_"), just tweak the last line:
steps[steps=='MAN.OP.'] = 'Manual_Mode_' + (newManOp.cumsum()-1).astype(str)
>>> steps
0 1
1 2
2 2
3 Manual_Mode_0
4 Manual_Mode_0
5 2
6 2
7 3
8 3
9 Manual_Mode_1
10 Manual_Mode_1
11 4
12 4
dtype: object
There is a pandas enhancement request for a contiguous groupby, which would make this type of task simpler.
There is a function in matplotlib, mlab.contiguous_regions, that takes a boolean array and returns a list of (start, end) pairs. Each pair represents a contiguous region where the input is True.
import matplotlib.mlab as mlab
regions = mlab.contiguous_regions(ser_orig == manual_mode)
for i, (start, end) in enumerate(regions):
    ser_orig[start:end] = new_manual_mode_text.format(i)
ser_orig
0 1
1 2
2 2
3 Manual_Mode_0
4 Manual_Mode_0
5 2
6 2
7 3
8 3
9 Manual_Mode_1
10 Manual_Mode_1
11 4
12 4
dtype: object
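If mlab.contiguous_regions is unavailable in your matplotlib version (it has been deprecated in newer releases), the same (start, end) pairs can be computed with plain NumPy; a sketch:

```python
import numpy as np
import pandas as pd

def contiguous_regions(mask):
    """Return (start, end) index pairs for each run of True in mask."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with False on both sides so runs at the edges are detected
    diff = np.diff(np.concatenate(([False], mask, [False])).astype(int))
    starts = np.flatnonzero(diff == 1)
    ends = np.flatnonzero(diff == -1)
    return list(zip(starts, ends))

steps = [1, 2, 2, 'MAN.OP.', 'MAN.OP.', 2, 2, 3, 3, 'MAN.OP.', 'MAN.OP.', 4, 4]
ser = pd.Series(steps)

for i, (start, end) in enumerate(contiguous_regions(ser == 'MAN.OP.')):
    ser[start:end] = 'Manual_Mode_{}'.format(i)

print(ser)
```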
I am working with a program that generates a specific file format that I have to read and modify with python scripts. This file is supposed to be tab delimited, but I haven't been able to find the tab character. Is there a good way to read this kind of file and generate a new one in the same formatting?
1. Base Year Data for Calibration
1.1 Observed Data per Internal Zone
Sector Zone ExogProd InducedPro ExogDemand Price ValueAdded Attractor
1 1 5000 0 0 14409.8204 0 1
1 2 800 0 0 12628.4625 0 1
1 3 1100 0 0 12676.3341 0 1
2 1 0 3393.2241 0 13944.0613 0 1
2 2 0 732.1119 0 12340.4575 0 1
2 3 0 974.6630 0 12132.7666 0 1
3 1 0 4491.8722 0 2701.8266 0 1
3 2 0 12755.9657 0 2445.0556 0 1
3 3 0 4752.1604 0 2671.2305 0 1
4 1 0 1790.7874 0 3858.0189 0 1
4 2 0 3076.6366 0 3337.8784 0 1
4 3 0 11132.5806 0 3728.1412 0 1
5 1 0 69.5126 0 250000 250000 1
5 2 0 109.5081 0 120000 120000 1
5 3 0 124.2133 0 180000 180000 1
The problem is that when I read this with python using line.split('\t'), I end up with just the whole line.
As others have pointed out in the comments, this appears to be just a space separated file with a variable number of spaces between cells. If that is the case, you can extract the cells from a particular row like this:
cells = line.split()
As for regenerating it, you'll need to pad the various columns to different widths. One way would be with code like this:
widths = [12, 9, 11, 11, 11, 11, 11, 11]
paddedCells = [cell.rjust(widths[i]) for i, cell in enumerate(cells)]
line = ''.join(paddedCells)
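Putting split and rjust together, a round-trip sketch (the sample line and the column widths are assumptions for illustration, not values taken from the original file):

```python
# Parse one space-padded line into cells, then rebuild it right-justified
line = "     1    1     5000       0       0  14409.8204       0      1"
cells = line.split()

widths = [12, 9, 11, 11, 11, 11, 11, 11]
rebuilt = ''.join(cell.rjust(widths[i]) for i, cell in enumerate(cells))
print(rebuilt)
```

Since split() discards the original spacing, the rebuilt line matches the source format only as closely as the chosen widths do.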
Actually, I am using
%12d %8d %10.2f %10.2f %10.2f %10.2f %10.2f %10.1f\n
The problem seems to be how the files are generated. I am pretty sure they are not tab-delimited.