I keep writing code like this when I want a specific column in a row, where I select the row first based on another column:
my_col = "Value I am looking for"
df.loc[df["Primary Key"] == "blablablublablablalblalblallaaaabblablalblabla"].iloc[0][
my_col
]
I don't know why, but it seems weird. Is there a more beautiful solution to this?
It would be helpful to have a complete minimal working example, since it is not clear what your data structure looks like. You could use the example given here:
import pandas as pd
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])
If you are then trying, for example, to select the viper row based on its max_speed and then obtain its shield value, like so:
my_col = "shield"
df.loc[df["max_speed"] == 4].iloc[0][my_col]
then I guess that is the way to do that - not a lot of fat in that command.
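If you would rather avoid the chained indexing, an equivalent one-liner - just a sketch against the example frame above - passes the row mask and the column label to a single .loc call and then takes the first match:

my_col = "shield"
# one indexing step: row mask and column label together
value = df.loc[df["max_speed"] == 4, my_col].iloc[0]
print(value)  # 5

Same result, one lookup instead of three.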
Situation:
1. all_task_usage_10_19
all_task_usage_10_19 is a file consisting of 29,229,472 rows × 20 columns.
There are multiple rows with the same ID in the machine_ID column, with different values in the other columns.
Columns:
'start_time_of_the_measurement_period', 'end_time_of_the_measurement_period',
'job_ID', 'task_index', 'machine_ID', 'mean_CPU_usage_rate',
'canonical_memory_usage', 'assigned_memory_usage',
'unmapped_page_cache_memory_usage', 'total_page_cache_memory_usage',
'maximum_memory_usage', 'mean_disk_I/O_time', 'mean_local_disk_space_used',
'maximum_CPU_usage', 'maximum_disk_IO_time', 'cycles_per_instruction_(CPI)',
'memory_accesses_per_instruction_(MAI)', 'sample_portion',
'aggregation_type', 'sampled_CPU_usage'
2. clustering code
I am trying to cluster multiple machine_ID records using the following code, referencing: How to combine multiple rows into a single row with pandas
3. Output
Output displayed using pd.option_context, as it allows better visualisation of the content.
My Aim:
I am trying to cluster multiple rows with the same machine_ID into a single record, so that I can apply algorithms like moving averages, LSTM and HW for predicting cloud workloads.
Something like this.
Maybe a Multi-Index is what you're looking for?
df.set_index(['machine_ID', df.index])
Note that by default set_index returns a new dataframe, and does not change the original.
To change the original (and return None) you can pass an argument inplace=True.
Example:
import pandas as pd

df = pd.DataFrame({'machine_ID': [1, 1, 2, 2, 3],
                   'a': [1, 2, 3, 4, 5],
                   'b': [10, 20, 30, 40, 50]})
new_df = df.set_index(['machine_ID', df.index]) # not in-place
df.set_index(['machine_ID', df.index], inplace=True) # in-place
For me, it does create a multi-index: first level is 'machine_ID', second one is the previous range index:
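With the example above, the frame prints roughly like this:

              a   b
machine_ID
1          0  1  10
           1  2  20
2          2  3  30
           3  4  40
3          4  5  50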
The below code worked for me:
all_task_usage_10_19.groupby('machine_ID')[[
    'start_time_of_the_measurement_period', 'end_time_of_the_measurement_period',
    'job_ID', 'task_index', 'mean_CPU_usage_rate', 'canonical_memory_usage',
    'assigned_memory_usage', 'unmapped_page_cache_memory_usage',
    'total_page_cache_memory_usage', 'maximum_memory_usage',
    'mean_disk_I/O_time', 'mean_local_disk_space_used', 'maximum_CPU_usage',
    'maximum_disk_IO_time', 'cycles_per_instruction_(CPI)',
    'memory_accesses_per_instruction_(MAI)', 'sample_portion',
    'aggregation_type', 'sampled_CPU_usage'
]].agg(list).reset_index()
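As a sanity check on a toy frame (hypothetical values, same shape of problem), agg(list) collapses each machine_ID group into a single row of lists:

import pandas as pd

# toy stand-in for all_task_usage_10_19
df = pd.DataFrame({'machine_ID': [1, 1, 2],
                   'mean_CPU_usage_rate': [0.1, 0.2, 0.3]})

out = df.groupby('machine_ID')[['mean_CPU_usage_rate']].agg(list).reset_index()
print(out)
#    machine_ID mean_CPU_usage_rate
# 0           1          [0.1, 0.2]
# 1           2               [0.3]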
I have a problem: I would like to split a column of a pandas dataframe into several columns that contain only binary values (0 or 1), so that I can continue working with them. The problem is as follows (very simplified):
import pandas as pd

df = pd.DataFrame({'c1': [1, 2, 3, 4, 5, 6],
                   'c2': [0, "A", "B", "A, B", "B", "C"]},
                  columns=['c1', 'c2'])
However, I cannot work with the second column in this form and would therefore like to split c2 into several columns: c2_A, c2_B and c2_C. If, for example, the second column contains "A", then c2_A should contain a 1 and otherwise a 0. If, as in the fourth row, it contains "A, B", then there should be a 1 in both c2_A and c2_B and a 0 in c2_C.
I have already tried a lot, for example with if/else, but I failed as soon as there is more than one letter in a row (e.g. "A, B").
It would be great if someone could help me, because I'm really running out of ideas.
PS: I am also rather a Python newbie.
Thanks in advance!
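One way to get exactly those indicator columns is pandas' built-in str.get_dummies - a minimal sketch, assuming the multi-valued entries are separated by ", " as in the example:

import pandas as pd

df = pd.DataFrame({'c1': [1, 2, 3, 4, 5, 6],
                   'c2': [0, "A", "B", "A, B", "B", "C"]})

# astype(str) makes the 0 entry safe to split; get_dummies then
# creates one 0/1 column per distinct value
dummies = df['c2'].astype(str).str.get_dummies(sep=', ')
# the "0" column is not a real category, so drop it and prefix the rest
dummies = dummies.drop(columns='0', errors='ignore').add_prefix('c2_')
result = pd.concat([df, dummies], axis=1)
print(result)

If your real data uses a bare comma without the space, pass sep=',' and strip the whitespace first.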
I want to filter a pandas data frame based on an exact string match.
I have a data frame as below:
import pandas as pd

df1 = pd.DataFrame({'vals': [1, 2, 3, 4, 5],
                    'ids': ['aball', 'bball', 'cnut', 'fball', 'aballl']})
I want to filter out all the rows except the one that has 'aball'. As you can see, I have one more entry with ids == 'aballl'; I want that filtered out too. Hence the code below does not work:
df1[df1['ids'].str.contains("aball")]
Even str.match does not work:
df1[df1['ids'].str.match("aball")]
Any help would be greatly appreciated.
Keeping it simple, this should work:
df1[df1['ids'] == "aball"]
You can try this:
df1[~(df1['ids'] == "aball")]
Essentially it will find all entries matching "aball" and then negate it.
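As an aside, contains and match fail here because they do substring and prefix matching respectively. If you do want a string method (for example, to ignore case), str.fullmatch - available since pandas 1.1 - anchors the pattern to the whole string:

import pandas as pd

df1 = pd.DataFrame({'vals': [1, 2, 3, 4, 5],
                    'ids': ['aball', 'bball', 'cnut', 'fball', 'aballl']})

# fullmatch requires the whole string to match, so 'aballl' is excluded
print(df1[df1['ids'].str.fullmatch('aball')])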
Is there any way in Excel or in DAX to check whether the values of a single column exist in another column?
Example - I have a column called Column 1 with some values, like 4, 5, 2, 1. Now I want to check how many of those values exist in Column 2.
As an output, I expect a cell to go green if its value exists, else red.
I have looked in a lot of places, but the only useful results I have found check for a single value, not for all the values in a column.
Does anyone know a way of doing this?
Since you mention Python, this is possible programmatically with the Pandas library:
import numpy as np
import pandas as pd

# define dataframe, or read in via df = pd.read_excel('file.xlsx')
df = pd.DataFrame({'col1': [4, 5, 2, 1] + [np.nan]*4,
                   'col2': [6, 8, 3, 4, 1, 6, 3, 4]})

# define highlighting logic: green if the value appears in col2, red otherwise
def highlight_cols(x):
    res = []
    for i in x:
        if np.isnan(i):
            res.append('')
        elif i in set(df['col2']):
            res.append('background: green')
        else:
            res.append('background: red')
    return res

# apply highlighting logic to the first column only
res = df.style.apply(highlight_cols, subset=pd.IndexSlice[:, ['col1']])
Result:
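If you need the highlighting in an actual Excel file rather than a notebook, the styled object can be written out directly - a sketch assuming openpyxl is installed (the Excel writer may want the explicit 'background-color' CSS property rather than the 'background' shorthand used above):

# hypothetical output path; requires openpyxl
res.to_excel('highlighted.xlsx', engine='openpyxl')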
Create an (optionally hidden) column adjacent to your search column (in my example, column C next to column B):
=IF(ISERROR(VLOOKUP(B1,$A$1:$A$4, 1, 0)), FALSE, TRUE)
This will determine whether the value is contained within the first data list (it returns TRUE if it is).
And then just use simple conditional formatting
Provides the result as expected:
You can do this easily without adding hidden columns, as below. This will update any time you change the numbers in column A.
Select column B
Conditional Formatting -> New Rule -> Use a formula to determine which cells to format
insert the formula =OR(B2=$A$2,B2=$A$3,B2=$A$4,B2=$A$5)=TRUE and format the cell as you wish (here in green)
Repeat steps 1 to 2
insert the formula =OR(B2=$A$2,B2=$A$3,B2=$A$4,B2=$A$5)=FALSE and format the cells as you wish (here in red)
Select the column name cell (to remove the formatting from the column heading)
Conditional Formatting -> Clear Rule -> Clear Rules from selected cells
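A more scalable variant of the same rule, in case the list in column A grows, is a single COUNTIF condition (the ranges match the example above and should be adjusted to your data):

=COUNTIF($A$2:$A$5,B2)>0

for the green rule, and

=COUNTIF($A$2:$A$5,B2)=0

for the red one.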