Split a column into multiple columns based on a string condition - python

This is what the data set looks like: [image]
I have a column of roughly 4000 values, which contains different values such as those shown in the image.
I want to split the data set based on a string comparison. My ultimate goal is to put all the values from W_LD(1) to W_LD(57) in one column, the values from R_LD(1) to R_LD(32) in another column, and so on.
I am creating a dataframe and trying to match the string: if the string matches a particular value, all the matching values should go into a separate column.
df = pd.DataFrame(data)
str_x = df.Device_names[56]

def my_split(df):
    return pd.Series({'W_LD': [i for i in df.Device_names if str_x == "^W_LD(57)"]})

df.apply(my_split, axis=1)
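One possible way to get one column per device family is to extract the prefix before the parenthesis and pivot on it. This is only a sketch: the sample values, the helper columns `group` and `pos`, and the assumption that every name has the form `PREFIX(number)` are illustrative and not taken from the real data.

```python
import pandas as pd

# Hypothetical sample standing in for the real Device_names column.
df = pd.DataFrame(
    {"Device_names": ["W_LD(1)", "W_LD(2)", "R_LD(1)", "W_LD(3)", "R_LD(2)"]}
)

# Take everything before the opening parenthesis, e.g. "W_LD" from "W_LD(57)".
df["group"] = df["Device_names"].str.extract(r"^([^(]+)\(", expand=False)

# Number the values within each group so they can share a row index.
df["pos"] = df.groupby("group").cumcount()

# One column per prefix; shorter groups are padded with NaN.
wide = df.pivot(index="pos", columns="group", values="Device_names")
print(wide)
```

Comparing a name to a literal such as "^W_LD(57)" with `==` never treats it as a pattern, which is why the sketch relies on `str.extract` instead.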

Related

Combining Rows in a Single Dataframe

I have a dataframe that looks like this, where there is a new row per ID if one of the following columns has a value. I'm trying to combine on the ID, and just consolidate all of the remaining columns. I've tried every groupby/agg combination and can't get the right output. There are no conflicting column values. So for instance if ID "1" has an email value in row 0, the remaining rows will be empty in the column. So I just need it to sum/consolidate, not concatenate or anything.
My current dataframe:
The output I'm looking to achieve:
# fill Nones in string columns with empty string
df[['email', 'status']] = df[['email', 'status']].fillna('')
df = df.groupby('id').agg('max')
If you still want the index as shown in your desired output:
df = df.reset_index(drop=False)
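To see the answer above at work, here is a small self-contained sketch. The sample rows are invented for illustration; only the column names `id`, `email`, and `status` come from the answer.

```python
import pandas as pd

# Hypothetical data: each id has its values spread over several rows.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "email": ["a@example.com", None, None, "b@example.com"],
    "status": [None, "active", "inactive", None],
})

# Empty strings sort below any non-empty string, so max keeps the real value.
df[["email", "status"]] = df[["email", "status"]].fillna("")
combined = df.groupby("id").agg("max").reset_index(drop=False)
print(combined)
```

This works only because there are no conflicting values per id, as stated in the question; with conflicts, max would silently pick one of them.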

Creating a new dataframe based on rows of an existing dataframe which contain only specific characters

In Python I am trying to create a new dataframe by appending all rows which do not contain certain
characters in a certain column of another dataframe. Afterwards I want to turn the generated list containing the results into a dataframe.
However, the result is only a one-column dataframe and does not include all the columns of the first dataframe for the rows which do not contain those characters, which is what I need.
Does anybody have a suggestion on how to add all the columns to the new dataframe?
%%time
newlist = []
for row in old_dataframe['column']:
    if row != (r'^[^\s]'):
        newlist.append(row)
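A boolean mask keeps every column of the matching rows, which a loop over a single column cannot do. The sketch below assumes the unwanted characters are `#` and `@` and uses invented sample data; the actual character set is not given in the question.

```python
import pandas as pd

# Hypothetical data; "other" stands in for the remaining columns.
old_dataframe = pd.DataFrame({
    "column": ["abc", "a#c", "x@y", "def"],
    "other": [1, 2, 3, 4],
})

# True where the value contains any of the assumed unwanted characters.
has_unwanted = old_dataframe["column"].str.contains(r"[#@]", regex=True, na=False)

# Invert the mask to keep the rows without those characters, all columns included.
new_dataframe = old_dataframe[~has_unwanted]
print(new_dataframe)
```

Note that `row != r'^[^\s]'` in the loop compares each value to the literal pattern text rather than matching it as a regular expression.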

Python Compare the last two non-null values in dataframe column

I have a dataframe with columns that are a string of blanks (null/nan set to 0) with sporadic number values.
I am trying to compare the last two non-zero values in a dataframe column.
Something like:
df['Column_c'] = df['column_a'].last_non_zero_value > df['column_a'].second_to_last_non_zero_value
This is what the columns look like in Excel
You could drop all the rows with missing data using DataFrame.dropna() (or, since the nulls were set to 0, filter out the zeros), then take the remaining values as an array, where the last two elements are easy to find.
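The suggestion above could be sketched as follows, assuming the blanks are already stored as 0 as described in the question. The sample values and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical column: zeros stand for the blanks described in the question.
df = pd.DataFrame({"column_a": [0, 3, 0, 0, 7, 0, 5, 0]})

# Keep only the non-zero entries, in their original order.
non_zero = df.loc[df["column_a"] != 0, "column_a"]

# The last two non-zero values: [second to last, last].
last_two = non_zero.tail(2).to_numpy()

# True when the last non-zero value exceeds the one before it.
last_is_greater = bool(len(last_two) == 2 and last_two[1] > last_two[0])
print(last_two, last_is_greater)
```

The length check guards against columns with fewer than two non-zero values, in which case the comparison is reported as False.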

Pandas - Find a column with a specific value in the entire dataframe

I have a DataFrame which has a few columns. There is a column with a value that only appears once in the entire dataframe. I want to write a function that returns the column name of the column with that specific value. I can manually find which column it is with the usual data exploration, but since I have multiple dataframes with the same properties, I need to be able to find that column for multiple dataframes. So a somewhat generalized function would be of better use.
The problem is that I don't know beforehand which column is the one I am looking for since in every dataframe the position of that particular column with that particular value is different. Also the desired columns in different dataframes have different names, so I cannot use something like df['my_column'] to extract the column.
Thanks
You'll need to iterate columns and look for the value:
def find_col_with_value(df, value):
    for col in df:
        if (df[col] == value).any():
            return col
This will return the name of the first column that contains value. If value does not exist, it will return None.
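Since the question mentions several dataframes with differently named columns, here is a short usage sketch that applies the function to each of them. The three dataframes and the dictionary keys are invented for illustration.

```python
import pandas as pd

def find_col_with_value(df, value):
    # Return the name of the first column containing value, or None.
    for col in df:
        if (df[col] == value).any():
            return col
    return None

# Hypothetical dataframes in which the value sits in different columns.
frames = {
    "first": pd.DataFrame({"a": [1, 2], "b": ["x", "secret_value"]}),
    "second": pd.DataFrame({"p": ["secret_value", "y"], "q": [3, 4]}),
    "third": pd.DataFrame({"m": [0, 1]}),
}

found = {name: find_col_with_value(frame, "secret_value")
         for name, frame in frames.items()}
print(found)
```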
Compare the entire DataFrame against the specific value, use any to see whether it ever appears in each column, then use the result to slice the columns (or the DataFrame, if you want the Series).
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.normal(0, 5, (100, 200)),
                  columns=[chr(i+40) for i in range(200)])
df.loc[5, 'Y'] = 'secret_value' # Secret value in column 'Y'
df.eq('secret_value').any().loc[lambda x: x].index
# or
df.columns[df.eq('secret_value').any()]
Index(['Y'], dtype='object')
I have another solution:
names = ds.columns
for i in names:
    for j in ds[i]:
        if j == 'your_value':
            print(i)
            break
This collects all the column names, then iterates over the values in each column until the value is found and prints the name of that column. The break only stops the scan of the current column, so the remaining columns are still checked.

Extract text enclosed between a delimiter and store it as a list in a separate column

I have a pandas dataframe with a text column in the format below. Some values are enclosed between ## delimiters. I want to find the text between each pair of ## and extract it into a separate column as a list.
##fare_curr.currency####based_fare_90d.price##
htt://www.abcd.lol/abcd-Search?from:##based_best_flight_fare_90d.air##,to:##mbased_90d.water##,departure:##mbased_90d.date_1##TANYT&pas=ch:0Y&mode=search
Consider the above two strings to be two rows of the same column. I want to get a new column with a list [fare_curr.currency, based_fare_90d.price] in the first row and [based_best_flight_fare_90d.air, mbased_90d.water, mbased_90d.date_1] in the second row.
Given this df
df = pd.DataFrame({'data':
    ['##fare_curr.currency####based_fare_90d.price##',
     'htt://www.abcd.lol/abcd-Search?from:##based_best_flight_fare_90d.air##,to:##mbased_90d.water##,departure:##mbased_90d.date_1##TANYT&pas=ch:0Y&mode=search']})
You can get desired result in a new column using
df['new'] = pd.Series(df.data.str.extractall('##(.*?)##').unstack().values.tolist())
You get
data new
0 ##fare_curr.currency####based_fare_90d.price## [fare_curr.currency, based_fare_90d.price, None]
1 htt://www.abcd.lol/abcd-Search?from:##based_be... [based_best_flight_fare_90d.air, mbased_90d.wa...
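As the output shows, extractall followed by unstack pads shorter rows with None so that every list has the same length. If unpadded lists are preferred, Series.str.findall is an alternative technique (not the one used in the answer above); a minimal sketch with the same two strings:

```python
import pandas as pd

df = pd.DataFrame({"data": [
    "##fare_curr.currency####based_fare_90d.price##",
    "htt://www.abcd.lol/abcd-Search?from:##based_best_flight_fare_90d.air##,"
    "to:##mbased_90d.water##,departure:##mbased_90d.date_1##TANYT&pas=ch:0Y&mode=search",
]})

# Non-greedy match between each pair of ## delimiters; one list per row.
df["new"] = df["data"].str.findall(r"##(.*?)##")
print(df["new"].tolist())
```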
