I am trying to sum the values in each cell of one column of a pandas DataFrame. The DataFrame was created as follows:
import pandas as pd
data = [['ID_123456', 'example=1(abc)'], ['ID_123457', 'example=1(def)'], ['ID_123458', 'example=1(try)'], ['ID_123459', 'example=1(try)'], ['ID_123460', 'example=1(try),2(test)'], ['ID_123461', 'example=1(try),2(test),9(yum)'], ['ID_123462', 'example=1(try)'], ['ID_123463', 'example=1(try),7(test)']]
df = pd.DataFrame(data, columns = ['ID', 'occ'])
display(df)
The table looks like this:
ID occ
ID_123456 example=1(abc)
ID_123457 example=1(def)
ID_123458 example=1(try)
ID_123459 example=1(test)
ID_123460 example=1(try),2(test)
ID_123461 example=1(try),2(test),9(yum)
ID_123462 example=1(test)
ID_123463 example=1(try),7(test)
The following link is related to it but I was unable to run the command on my dataframe.
Sum all integers in a PANDAS DataFrame "cell"
The command gives an error of "string index out of range".
The output should look like this:
ID occ count
ID_123456 example=1(abc) 1
ID_123457 example=1(def) 1
ID_123458 example=1(try) 1
ID_123459 example=1(test) 1
ID_123460 example=1(try),2(test) 3
ID_123461 example=1(try),2(test),9(yum) 12
ID_123462 example=1(test) 1
ID_123463 example=1(try),7(test) 8
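If you want to sum all the numbers in each cell of the occ column, use Series.str.extractall, convert the matches to integers, and sum them per original row:
# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
df['count'] = df['occ'].str.extractall(r'(\d+)')[0].astype(int).groupby(level=0).sum()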
print (df)
ID occ count
0 ID_123456 example=1(abc) 1
1 ID_123457 example=1(def) 1
2 ID_123458 example=1(try) 1
3 ID_123459 example=1(try) 1
4 ID_123460 example=1(try),2(test) 3
5 ID_123461 example=1(try),2(test),9(yum) 12
6 ID_123462 example=1(try) 1
7 ID_123463 example=1(try),7(test) 8
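If some cells might contain no digits at all, extractall produces no rows for them, so the assigned count ends up as NaN. A minimal sketch of an alternative with Series.str.findall, which returns an empty list (and therefore a sum of 0) for such cells. The shortened frame and its digit-free last row (ID_999999) are made up for illustration:

```python
import pandas as pd

# Hypothetical sample: the last row has no digits in 'occ'.
df = pd.DataFrame({
    "ID": ["ID_123456", "ID_123460", "ID_123461", "ID_999999"],
    "occ": [
        "example=1(abc)",
        "example=1(try),2(test)",
        "example=1(try),2(test),9(yum)",
        "example=",
    ],
})

# findall returns one list of digit strings per row; sum them as integers.
df["count"] = df["occ"].str.findall(r"\d+").apply(
    lambda nums: sum(int(n) for n in nums)
)
print(df)
```

Note that this counts every run of digits in the cell, including any that might appear inside the parentheses.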
I want to create a DataFrame indexed by date, where a single date can have one or more records.
So I want to create a DataFrame like this:
A B
2021-11-12 1 0 0
2 1 1
2021-11-13 1 0 0
2 1 0
3 0 1
Can I append a row with an existing date to this DataFrame and have the subindex increase automatically?
Or is there another way to store records that share the same date index in one DataFrame?
Use:
#remove counter level
df = df.reset_index(level=1, drop=True)
#add new row
#your code
#sort so that the new row is placed with the existing rows for its date
df = df.sort_index()
#add subindex
df = df.set_index(df.groupby(level=0).cumcount().add(1), append=True)
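The steps above can be sketched end to end as follows. This is only an illustration under assumptions: the sample values, the use of pd.concat for the "add new row" step, and the stable sort (so the appended row stays last within its date) are choices made here, not part of the original answer.

```python
import pandas as pd

# Hypothetical starting frame with a (date, counter) MultiIndex.
idx = pd.MultiIndex.from_tuples([
    (pd.Timestamp("2021-11-12"), 1),
    (pd.Timestamp("2021-11-12"), 2),
    (pd.Timestamp("2021-11-13"), 1),
])
df = pd.DataFrame({"A": [0, 1, 0], "B": [0, 1, 0]}, index=idx)

# Remove the counter level so only the date remains in the index.
df = df.reset_index(level=1, drop=True)

# Add a new row for an existing date (pd.concat is an assumption here).
new_row = pd.DataFrame({"A": [1], "B": [1]}, index=[pd.Timestamp("2021-11-12")])
df = pd.concat([df, new_row])

# Stable sort keeps the appended row after the older rows of the same date.
df = df.sort_index(kind="mergesort")

# Rebuild the per-date counter as the second index level.
df = df.set_index(df.groupby(level=0).cumcount().add(1), append=True)
print(df)
```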
I have a large dataset (df) with lots of columns and I am trying to get the number of records for each day.
|datetime|id|col3|col4|col...
1 |11-11-2020|7|col3|col4|col...
2 |10-11-2020|5|col3|col4|col...
3 |09-11-2020|5|col3|col4|col...
4 |10-11-2020|4|col3|col4|col...
5 |10-11-2020|4|col3|col4|col...
6 |07-11-2020|4|col3|col4|col...
I want my result to be something like this
|datetime|id|col3|col4|col...|Count
6 |07-11-2020|4|col3|col4|col...| 1
3 |5|col3|col4|col...| 1
2 |10-11-2020|5|col3|col4|col...| 1
4 |4|col3|col4|col...| 2
1 |11-11-2020|7|col3|col4|col...| 1
I tried grouping with pd.Grouper like this: df = df.groupby(['id','col3', pd.Grouper(key='datetime', freq='D')]).sum().reset_index() and this is my result. I am still new to programming and pandas; I have read the pandas docs but am still unable to do it.
|datetime|id|col3|col4|col...
6 |07-11-2020|4|col3|1|0.0
3 |07-11-2020|5|col3|1|0.0
2 |10-11-2020|5|col3|1|0.0
4 |10-11-2020|4|col3|2|0.0
1 |11-11-2020|7|col3|1|0.0
Try this:
df = df.groupby(['datetime','id','col3']).count()
If you want the count values for all columns based only on the date, then:
df.groupby('datetime').count()
You'll get a DataFrame that has the datetime as its index, with each cell holding the number of non-null entries in that column for that date.
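If a single Count column per (datetime, id) pair is what is wanted, groupby(...).size() may be a closer fit than count(), since it returns one number per group regardless of nulls. A small sketch using the dates and ids from the question; the day-first date format and the reduction to two columns are assumptions:

```python
import pandas as pd

df = pd.DataFrame({
    "datetime": ["11-11-2020", "10-11-2020", "09-11-2020",
                 "10-11-2020", "10-11-2020", "07-11-2020"],
    "id": [7, 5, 5, 4, 4, 4],
})

# Assumption: the strings are day-month-year.
df["datetime"] = pd.to_datetime(df["datetime"], format="%d-%m-%Y")

# One row per (datetime, id) group, with the group size in 'Count'.
counts = df.groupby(["datetime", "id"]).size().reset_index(name="Count")
print(counts)
```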
I am trying to add an underscore and incremental numbers to any repeating values ordered by index and within a group that is defined by another column.
For example, I would like the repeating values in the Chemistry column to have underscores and incremental numbers ordered by index and grouped by the Cycle column.
df = pd.DataFrame([[1,1,1,1,1,1,2,2,2,2,2,2], ['NaOH', 'H20', 'MWS', 'H20', 'MWS', 'NaOh', 'NaOH', 'H20', 'MWS', 'H20', 'MWS', 'NaOh']]).transpose()
df.columns = ['Cycle', 'Chemistry']
df
Original Table
So the output will look like the table in the link below:
Desired output table
IIUC:
Use pandas.Series.str.cat with cumcount:
df['Chemistry'] = df.Chemistry.str.cat(
df.groupby(['Cycle', 'Chemistry']).cumcount().add(1).astype(str),
sep='_'
)
df
Cycle Chemistry
0 1 NaOH_1
1 1 H20_1
2 1 MWS_1
3 1 H20_2
4 1 MWS_2
5 1 NaOh_1
6 2 NaOH_1
7 2 H20_1
8 2 MWS_1
9 2 H20_2
10 2 MWS_2
11 2 NaOH_2
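Note that the question's sample data spells the same chemical both as 'NaOH' and as 'NaOh', and grouping on the raw strings treats those as different values. If they should be counted together, one option is to group on an upper-cased key while keeping the original spelling in the output. A sketch on a shortened, made-up frame:

```python
import pandas as pd

df = pd.DataFrame({
    "Cycle": [1, 1, 1, 1],
    "Chemistry": ["NaOH", "H20", "H20", "NaOh"],
})

# Case-insensitive grouping key (assumption: case differences are typos).
key = df["Chemistry"].str.upper()

# Running count within each (Cycle, normalised name) group, starting at 1.
suffix = df.groupby(["Cycle", key]).cumcount().add(1).astype(str)

df["Chemistry"] = df["Chemistry"].str.cat(suffix, sep="_")
print(df)
```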
I have a pandas DataFrame which contains information in columns which I would like to extract into a new column.
It is best explained visually:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Number Type 1':[1,2,np.nan],
                   'Number Type 2':[np.nan,3,4],
                   'Info':list('abc')})
The table shows the initial DataFrame with the Number Type 1 and Number Type 2 columns.
I would like to extract the types and create a new Type column, refactoring the DataFrame accordingly.
Basically, the numbers are collapsed into a single Number column, and the types are extracted into the Type column. The information in the Info column stays bound to the numbers (e.g. 2 and 3 have the same information, b).
What is the best way to do this in Pandas?
Use melt with dropna:
df = df.melt('Info', value_name='Number', var_name='Type').dropna(subset=['Number'])
df['Type'] = df['Type'].str.extract(r'(\d+)')
df['Number'] = df['Number'].astype(int)
print (df)
Info Type Number
0 a 1 1
1 b 1 2
4 b 2 3
5 c 2 4
Another solution with set_index and stack:
df = df.set_index('Info').stack().rename_axis(('Info','Type')).reset_index(name='Number')
df['Type'] = df['Type'].str.extract(r'(\d+)')
df['Number'] = df['Number'].astype(int)
print (df)
Info Type Number
0 a 1 1
1 b 1 2
2 b 2 3
3 c 2 4
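A third possibility, offered only as a sketch, is pd.wide_to_long, which splits column names of the form "stub + separator + suffix" in one call. It assumes the Info values are unique per row (as in the sample), and unlike the solutions above it produces an integer Type column rather than strings:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Number Type 1": [1, 2, np.nan],
    "Number Type 2": [np.nan, 3, 4],
    "Info": list("abc"),
})

long_df = (
    pd.wide_to_long(df, stubnames="Number Type", i="Info", j="Type", sep=" ")
    .dropna()                                   # drop the missing numbers
    .reset_index()                              # Info and Type back to columns
    .rename(columns={"Number Type": "Number"})
)
long_df["Number"] = long_df["Number"].astype(int)
print(long_df)
```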
I am trying to lookup string values in two dataframes and I am using Pandas library.
The first dataframe - df_transactions has a list of error codes in the column 'ErrList'
The second dataframe - df_action has a list of error codes in one column 'CODE' and the corresponding action in the column 'ACTION'.
I am trying to compare the two strings from these dataframes as below:
ActionLookup_COL = []
ActionLookup = []
for index, transactions in df_transactions.iterrows():
    errorList = transactions['ErrList']
    for index, errorCode in df_action.iterrows():
        eCode = errorCode['Code']
        eAction = errorCode['Action']
        if eCode ==errorList:
            ActionLookup.append(eAction)
    ActionLookup_COL.append(ActionLookup)
df_results['ActionLookup'] = pd.Series(shipmentActionLookup_COL, index=df_results.index)
When I print df_results['ActionLookup'], I do not get the action corresponding to each error code. Please let me know how I can compare the strings in these dataframes.
Thanks for your time!
IIUC you need merge:
pd.merge(df_transactions, df_action, left_on='ErrList', right_on='Code')
Sample:
df_transactions = pd.DataFrame({'ErrList':['a','af','e','d'],
'col':[4,5,6,8]})
print (df_transactions)
ErrList col
0 a 4
1 af 5
2 e 6
3 d 8
df_action = pd.DataFrame({'Code':['a','af','u','m'],
'Action':[1,2,3,4]})
print (df_action)
Action Code
0 1 a
1 2 af
2 3 u
3 4 m
df_results = pd.merge(df_transactions, df_action, left_on='ErrList', right_on='Code')
print (df_results)
ErrList col Action Code
0 a 4 1 a
1 af 5 2 af
print (df_results['Action'])
0    1
1    2
Name: Action, dtype: int64
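The merge above is an inner join by default, so transactions whose error code has no match in df_action are dropped. If every transaction should be kept, with a missing Action where no code matches, how='left' is one way to do it. A sketch reusing the sample frames:

```python
import pandas as pd

df_transactions = pd.DataFrame({"ErrList": ["a", "af", "e", "d"],
                                "col": [4, 5, 6, 8]})
df_action = pd.DataFrame({"Code": ["a", "af", "u", "m"],
                          "Action": [1, 2, 3, 4]})

# Left join: keep all transactions; unmatched codes get NaN in Action.
df_results = df_transactions.merge(
    df_action, left_on="ErrList", right_on="Code", how="left"
)
print(df_results[["ErrList", "Action"]])
```

Because of the NaN values, the Action column becomes float in this case.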