I have a pandas dataframe in which one column holds a list of hashtags. Now, I would like to delete all elements in that list except the first one, for each row.
Is there a way of doing this?
A simple way to do so:
df.hashtags = df.hashtags.map(lambda l: l[:1])
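For example, a quick sketch with a made-up hashtags column (the column name is taken from the question; the data is hypothetical):

import pandas as pd

# each cell holds a list of hashtags; [:1] keeps at most the first element
df = pd.DataFrame({'hashtags': [['#a', '#b', '#c'], ['#x'], []]})
df.hashtags = df.hashtags.map(lambda l: l[:1])
print(df.hashtags.tolist())  # [['#a'], ['#x'], []]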
I have a list containing Pandas Series objects, which I've created by doing something like this:
li = []
li.append(input_df.iloc[0])
li.append(input_df.iloc[4])
where input_df is a Pandas DataFrame.
I want to convert this list of Series objects back to a Pandas DataFrame, and was wondering if there is an easy way to do it.
Based on your post, you can do this with:
pd.DataFrame(li)
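As a minimal sketch (with a made-up input_df, since the full one isn't shown):

import pandas as pd

input_df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
li = [input_df.iloc[0], input_df.iloc[2]]

# each Series becomes one row; the Series names become the index labels
print(pd.DataFrame(li))
#    a  b
# 0  1  4
# 2  3  6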
To everyone suggesting pd.concat: this is not a Series anymore. The values are being appended to a list, so the data type of li is a list. To convert that list to a dataframe, use pd.DataFrame(<list name>).
Since the right answer got hidden in the comments, I thought it would be better to mention it as an answer:
pd.concat(li, axis=1).T
will convert the list li of Series to a DataFrame.
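For example, with a made-up input_df:

import pandas as pd

input_df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
li = [input_df.iloc[0], input_df.iloc[2]]

# concat along axis=1 makes each Series a column; .T flips them back to rows
print(pd.concat(li, axis=1).T)
#    a  b
# 0  1  4
# 2  3  6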
It seems that you wish to perform a customized melting of your dataframe.
Using the pandas library, you can do it with one line of code. Below, I create an example to replicate your problem:
import pandas as pd
input_df = pd.DataFrame(data={'1': [1, 2, 3, 4, 5],
                              '2': [1, 2, 3, 4, 5],
                              '3': [1, 2, 3, 4, 5],
                              '4': [1, 2, 3, 4, 5],
                              '5': [1, 2, 3, 4, 5]})
Using pd.DataFrame, you can create a new dataframe that melts your two selected rows together:
li = []
li.append(input_df.iloc[0])
li.append(input_df.iloc[4])
new_df = pd.DataFrame(li)
If what you want is for those two rows to end up under one column, I would not pass them as a list back to the DataFrame constructor.
Instead, you can just append one to the other, disregarding the column names of each:
new_df = input_df.iloc[0].append(input_df.iloc[4])
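Note that Series.append was deprecated in pandas 1.4 and removed in 2.0; on recent versions, a minimal equivalent (using the input_df defined above) is pd.concat:

# same stacked result on pandas >= 2.0, where Series.append no longer exists
new_df = pd.concat([input_df.iloc[0], input_df.iloc[4]])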
Let me know if this answers your question.
The answer was already mentioned, but I would like to share my version:
li_df = pd.DataFrame(li).T
If you want each Series to be a row of the dataframe, you should not use concat() followed by .T, unless all your values have the same datatype.
If your data has both numerical and string values, then the transpose will mangle the dtypes, likely turning them all into objects.
The right way to do this in general is:
Convert each Series to a dict()
Pass the list of dicts either into the pd.DataFrame() constructor directly, or use pd.DataFrame.from_records().
In your case the following should work:
my_list_of_dicts = [s.to_dict() for s in li]
my_df = pd.DataFrame(my_list_of_dicts)
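For illustration, a small sketch of the dtype difference, using a made-up mixed-type frame:

import pandas as pd

input_df = pd.DataFrame({'name': ['a', 'b'], 'value': [1, 2]})
li = [input_df.iloc[0], input_df.iloc[1]]

print(pd.concat(li, axis=1).T.dtypes)                  # name and value both object
print(pd.DataFrame([s.to_dict() for s in li]).dtypes)  # value comes back as int64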
I search a Pandas DataFrame with loc, for example like this:
x = df.loc[df.index.isin(['one','two'])]
But I need only the first row of the result. If I use
x = df.loc[df.index.isin(['one','two'])].iloc[0]
I get an error in the case that no row is found. Of course, I can select all the rows (as in the first example) and then check whether the result is empty or not. But I am looking for a more efficient way (the dataframe can be long). Is there one?
pandas.Index.duplicated
The pandas.Index object has a duplicated method that identifies all repeated values after the first occurrence.
x[~x.index.duplicated()]
If you wanted to combine it with your isin filter:
df[df.index.isin(['one', 'two']) & ~df.index.duplicated()]
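As a quick sketch with a hypothetical frame (labels and values are made up):

import pandas as pd

df = pd.DataFrame({'val': [1, 2, 3, 4]}, index=['one', 'one', 'two', 'three'])

# keeps the first row for each matching label
print(df[df.index.isin(['one', 'two']) & ~df.index.duplicated()])

# no match returns an empty frame instead of raising, unlike .iloc[0]
print(df[df.index.isin(['missing']) & ~df.index.duplicated()])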
I have a large string array which I store as a NumPy array named np_base: np.shape(np_base)
Out[32]: (65000000, 1)
What I intend to do is vertically slice the array in order to decompose it into multiple columns that I'll store later independently, so I tried to loop over the row indexes and append:
for i in range(65000000):
    INCDN.append(np_base[i, 0][0:5])
but this throws a memory error.
Could anybody please help me out with this issue? I've been searching for days for an alternative way to slice the string array.
Thanks,
There are many ways to apply a function to a numpy array, one of which is the following:
np_truncated = np.vectorize(lambda x: x[:5])(np_base)
Your approach of iteratively appending to a list is usually the worst-performing solution in most contexts.
Alternatively, if you intend to work with many columns, you might want to use pandas.
import pandas as pd
df = pd.DataFrame(np_base, columns=["Raw"])
truncated = df.Raw.str.slice(0,5)
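For example, a small sketch with a tiny stand-in for np_base (the real array would hold 65 million strings):

import numpy as np
import pandas as pd

np_base = np.array([['abcdefgh'], ['12345678']])

df = pd.DataFrame(np_base, columns=["Raw"])
print(df.Raw.str.slice(0, 5).tolist())  # ['abcde', '12345']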
So, I have a list with tuples, and a multi-index dataframe. I want to find the rows of the dataframe whose indices are NOT included in the list of tuples, and create a new dataframe with these elements. Any help? Thanks!
You can use isin with a negation to explicitly filter your DataFrame:
new_df = df[~df.index.isin(list_of_tuples)]
Alternatively, use drop to remove the tuples you don't want to be included in the new DataFrame.
new_df = df.drop(list_of_tuples)
From a couple of simple tests, using isin appears to be faster, although drop is a bit more readable.
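As a quick sketch with a made-up MultiIndex frame:

import pandas as pd

idx = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)])
df = pd.DataFrame({'val': [10, 20, 30]}, index=idx)
list_of_tuples = [('a', 1), ('b', 1)]

print(df[~df.index.isin(list_of_tuples)])  # keeps only ('a', 2)
print(df.drop(list_of_tuples))             # same result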