I have a Pandas dataframe and I need to strip out the components.schemas.Person.properties prefix and keep just id.
column                                  | data_type | data_description
components.schemas.Person.properties.id | string    | Unique Mongo Id generated for the person.
Like this?
df['column'] = df['column'].apply(lambda x: x.split('.')[-1])
or a more compact solution by @Chris Adams:
df['column'].str.split('.').str[-1]
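A runnable sketch of the vectorized version, with sample data reconstructed from the question:

```python
import pandas as pd

# Sample frame mirroring the question (data is illustrative)
df = pd.DataFrame({"column": ["components.schemas.Person.properties.id"]})

# Vectorized: split on '.' and keep the last component
df["column"] = df["column"].str.split(".").str[-1]
result = df["column"].tolist()
```

The `.str` accessor applies the split element-wise without a Python-level `apply` call, which is usually faster on large frames.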
I carefully read the post Select by partial string from a pandas DataFrame, but I don't think it addresses my problem. I need to filter the rows of a dataframe whose value is a substring of a given string.
Example, my table is:
Part_Number
A1127
A1347
I want to keep records whose column value occurs inside the string ZA1127B.48; the filtered dataframe should contain only the first row (A1127). (All the posts I found show the opposite: checking whether a row value contains a given string.)
You can use .apply + in operator:
s = "ZA1127B.48"
print(df[df.apply(lambda x: x.Part_Number in s, axis=1)])
Prints:
Part_Number
0 A1127
I think using the apply function will help you to do what you want.
Try this line:
df[df["Part_Number"].apply(lambda x: x in "ZA1127B.48")]
I am dealing with DNA sequencing data, and each value in the column looks something like "ACCGTGC". I would like to transform this into several columns, where each column contains only one character. How can I do this in Python pandas?
For performance, convert the values to lists and pass them to the DataFrame constructor:
df1 = pd.DataFrame([list(x) for x in df['col']], index=df.index)
If you need to add the result to the original:
df = df.join(df1)
I have three types of columns in my dataframe: numeric, string and datetime.
I need to append the character | to the end of every value as a separator.
I have tried:
df['column'] = (df['column']+ '|')
but it does not work for the datetime columns, and I have to add .astype(str) to the numeric columns, which may cause formatting issues later.
Any other suggestions?
You can use DataFrame.to_csv() with sep="|" if you want to create a CSV.
Further documentation:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
Not too sure why you would want to do this, but if you want to make a CSV file with | as the delimiter, you can set that with df.to_csv('out.csv', sep='|'). I think a cleaner way of doing this would be a lambda with an f-string:
df['column'] = df['column'].apply(lambda x: f"{x}|")
Note that the f-string already converts each value to a string, so the column ends up as str dtype either way.
This may help you in this case:
df['column'] = df['column'].astype(str) + "|"
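A minimal sketch applying the astype(str) approach uniformly across numeric, string and datetime columns (the sample data is made up):

```python
import pandas as pd

df = pd.DataFrame({
    "num": [1, 2],
    "txt": ["a", "b"],
    "when": pd.to_datetime(["2021-01-01", "2021-01-02"]),
})

# astype(str) works on every dtype, so the same line handles all columns
for col in df.columns:
    df[col] = df[col].astype(str) + "|"
```

Everything becomes object (string) dtype afterwards, which is unavoidable once you concatenate a separator character.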
As shown in the picture, how would you slice or extract 'id' from the 'user' column?
This should work; the new id column will contain the ids:
df['id'] = df['user'].apply(lambda x: x['id'])
The user column looks like JSON. Note that json.loads expects a single string, not a whole Series, so parse element-wise: df['user'].apply(lambda x: json.loads(x)['id']).
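Since the question's picture is not reproduced here, the column could hold either dicts or JSON strings; a sketch covering both cases (sample data is made up):

```python
import json
import pandas as pd

# Case 1: the column already holds dicts
df = pd.DataFrame({"user": [{"id": 1}, {"id": 2}]})
df["id"] = df["user"].apply(lambda x: x["id"])

# Case 2: the column holds JSON strings; parse each element first
df2 = pd.DataFrame({"user": ['{"id": 3}', '{"id": 4}']})
df2["id"] = df2["user"].apply(lambda x: json.loads(x)["id"])
```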
I have a data frame where all the columns are supposed to be numbers. While reading it, some of them were read with commas. I know a single column can be fixed by
df['x']=df['x'].str.replace(',','')
However, this works only for series objects and not for entire data frame. Is there an elegant way to apply it to entire data frame since every single entry in the data frame should be a number.
P.S.: To ensure I can use .str.replace, I first converted the data frame to strings with
df = df.astype(str)
I understand I will then have to convert everything back to numeric once the commas are removed.
Numeric columns contain no ',' , so converting to strings is not necessary; just use DataFrame.replace with regex=True for substring replacement:
df = df.replace(',','', regex=True)
Or:
df.replace(',','', regex=True, inplace=True)
And last, convert the string columns to numeric (thanks to @anki_91):
c = df.select_dtypes(object).columns
df[c] = df[c].apply(pd.to_numeric,errors='coerce')
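A runnable end-to-end sketch of the replace-then-convert approach (the sample data is made up):

```python
import pandas as pd

df = pd.DataFrame({"x": ["1,000", "2,500"], "y": ["3,141", "10"]})

# Remove commas in every string cell across the whole frame
df = df.replace(",", "", regex=True)

# Then convert the remaining object columns to numeric
c = df.select_dtypes(object).columns
df[c] = df[c].apply(pd.to_numeric, errors="coerce")
```

Using errors="coerce" turns any cell that still isn't a valid number into NaN instead of raising.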
Well, if every column is of string dtype, you can simply do:
df = df.apply(lambda x: x.str.replace(',', ''))
Hope it helps!
In case you want to manipulate just one column:
df.column_name = df.column_name.apply(lambda x : x.replace(',',''))