I have a pandas dataframe and I have to fill a new column based on the values of an existing column, associating the values of a dictionary.
mydict={'key1':'val1', 'key2':'val2'}
df['new_col']=df['keys'].map(mydict)
Now I have a similar problem, but the dictionary is now a defaultdict(list)
from collections import defaultdict
my_defdict=defaultdict(list, {'key1':['val1','item1'], 'key2':['val2','item2']})
and I need a new column with the second element of the list, something like
df['new_col2']=df['keys'].map(my_defdict()[1])
which is of course wrong. How can I perform this operation without creating another normal dictionary?
Assuming every list has at least two items, append .str[1] to the mapped result:
df['new_col2'] = df['keys'].map(my_defdict).str[1]
Or,
df['new_col2'] = df['keys'].map(my_defdict).str.get(1)
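To see the two steps end to end, here is a minimal sketch. The sample frame and its contents are made up for illustration; only the names my_defdict, keys and new_col2 come from the question.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical sample data; only the names mirror the question.
my_defdict = defaultdict(list, {'key1': ['val1', 'item1'],
                                'key2': ['val2', 'item2']})
df = pd.DataFrame({'keys': ['key1', 'key2', 'key1']})

# Step 1: map() replaces each key with its list.
# Step 2: .str[1] picks the element at position 1 of each list.
df['new_col2'] = df['keys'].map(my_defdict).str[1]
print(df)
```

Despite its name, the .str accessor indexes into any sequence stored in an object column, which is why it works on lists here.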
I have a pandas dataframe in which one column holds a list of hashtags. I would like to delete all elements of that list except the first one, in each row.
Is there a way of doing this?
A simple way to do so:
df.hashtags = df.hashtags.map(lambda l: l[:1])
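A small sketch of that one-liner on made-up data (the hashtag values are invented; only the column name hashtags comes from the question). It uses a slice rather than l[0] so each cell stays a list and empty lists do not raise an error.

```python
import pandas as pd

# Hypothetical sample data, including an empty list as an edge case.
df = pd.DataFrame({'hashtags': [['#a', '#b', '#c'], ['#d'], []]})

# l[:1] keeps at most the first element and always returns a list.
df['hashtags'] = df['hashtags'].map(lambda l: l[:1])
print(df)
```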
I have the following table:
I want to create a new data frame, or a new column in the same data frame, in which the unique values are listed, e.g.
I used the following code:
data.groupby('EMAIL')['Classification'].transform('nunique')
But it gives me the number of unique values (for CLASSIFICATION, it is 2).
However, I want the values themselves in list format, so that at the end I can remove the duplicate rows and keep a single row for each unique email id. Please advise on this.
Thanks!
For performance, build a set of the unique values inside a lambda function passed to GroupBy.agg; the order of the joined values may differ from the original:
df = data.groupby('EMAIL').agg(lambda x: ','.join(set(x))).reset_index()
To keep the same order as the original, use the dict.fromkeys trick:
f = lambda x: ','.join(dict.fromkeys(x).keys())
df = data.groupby('EMAIL').agg(f).reset_index()
Alternatively, pass as_index=False to groupby and join the unique values in agg:
data.groupby('EMAIL',as_index=False).agg(lambda x: ','.join(x.unique()))
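As a rough illustration of the order-preserving variant, here is a sketch on invented data. The email addresses and classification values are made up, and join_unique is a hypothetical helper name; only the column names EMAIL and Classification come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({
    'EMAIL': ['a@x.com', 'a@x.com', 'a@x.com', 'b@x.com'],
    'Classification': ['gold', 'silver', 'gold', 'bronze'],
})


def join_unique(values):
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return ','.join(dict.fromkeys(values))


# One row per email, with the unique classifications joined by commas.
df = data.groupby('EMAIL')['Classification'].agg(join_unique).reset_index()
print(df)
```

Selecting the Classification column before agg keeps the helper from being applied to columns that are not strings.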
I am trying to insert or add from one dataframe to another dataframe. I am going through the original dataframe looking for certain words in one column. When I find one of these terms I want to add that row to a new dataframe.
I get the row using:
entry = df.loc[df['A'] == item]
But when I try to add this row to another dataframe using .add, .insert, .update or other methods, I just get an empty dataframe.
I have also tried adding the column to a dictionary and turning that into a dataframe, but it writes data for the entire row rather than just the column value. So is there a way to add one specific row to a new dataframe from my existing variable?
So entry is a dataframe containing the rows you want to add?
If both dataframes have the same column names, you can simply concatenate them with the concat function:
import pandas as pd
entry = df.loc[df['A'] == item]
concat_df = pd.concat([new_df,entry])
pandas.concat reference:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
The append function expects a list of rows in this form:
[row_1, row_2, ..., row_N]
where each row is a list holding the value for each column.
So, assuming you're trying to add one row, you should use:
entry = df.loc[df['A'] == item]
df2=df2.append( [entry] )
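Note that, unlike Python's list.append, DataFrame.append returns a new object and does not modify the object it is called on. DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on current versions use pandas.concat instead.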
Not sure how large your operations will be, but from an efficiency standpoint, you're better off adding all of the found rows to a list, and then concatenating them together at once using pandas.concat, and then using concat again to combine the found entries dataframe with the "insert into" dataframe. This will be much faster than using concat each time. If you're searching from a list of items search_keys, then something like:
entries = []
for item in search_keys:
    # collect the matching rows for each search key
    entry = df.loc[df['A'] == item]
    entries.append(entry)
# concatenate once, after the loop
found_df = pd.concat(entries)
result_df = pd.concat([old_df, found_df])
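If the search is a plain equality test against a list of values, as in the loop above, a single boolean mask with Series.isin can replace the loop. This is a different technique from the one described in the answer, sketched here on made-up data; the frame contents are invented, while df, old_df, search_keys and column A follow the names used above.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'A': ['apple', 'pear', 'kiwi', 'apple'],
                   'B': [1, 2, 3, 4]})
old_df = pd.DataFrame({'A': ['plum'], 'B': [0]})
search_keys = ['apple', 'kiwi']

# Select every row whose 'A' value is one of the search keys.
found_df = df[df['A'].isin(search_keys)]

# Append the found rows to the existing frame in one concat call.
result_df = pd.concat([old_df, found_df], ignore_index=True)
print(result_df)
```

One difference to keep in mind: the mask keeps the rows in their original order, whereas the loop groups them by search key.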
Let's say I have a list of objects (in this instance, dataframes)
myList = [dataframe1, dataframe2, dataframe3 ...]
I want to loop over my list and create new objects based on the names of the list items. What I want is a pivoted version of each dataframe, called "dataframe[X]_pivot" where [X] is the identifier for that dataframe.
My pseudocode looks something like:
for d in myList:
d+'_pivot' = d.pivot_table(index='columnA', values=['columnB'], aggfunc=np.sum)
And my desired output looks like this:
myList = [dataframe1, dataframe2 ...]
dataframe1_pivoted # contains a pivoted version of dataframe1
dataframe2_pivoted # contains a pivoted version of dataframe2
dataframe3_pivoted # contains a pivoted version of dataframe3
Help would be much appreciated.
Thanks
John
You do not want to do that. Creating variables dynamically is almost always a very bad idea. The correct thing to do is to use an appropriate data structure to hold your data: either a list (as your elements are all just numbered, you can just as well access them by index) or a dictionary (if you really want to give a name to each individual thing):
pivoted_list = []
for df in mylist:
    # pivot call taken from the question; adjust to whatever pivot you need
    pivoted_df = df.pivot_table(index='columnA', values=['columnB'], aggfunc='sum')
    pivoted_list.append(pivoted_df)
# now access your results by index
do_something(pivoted_list[0])
do_something(pivoted_list[1])
The same thing can be expressed as a list comprehension. Assume pivot is a function that takes a dataframe and turns it into a pivoted frame, then this is equivalent to the loop above:
pivoted_list = [pivot(df) for df in mylist]
If you are certain that you want to have names for the elements, you can create a dictionary, by using enumerate like this:
pivoted_dict = {}
for index, df in enumerate(mylist):
    # pivot call taken from the question; adjust to whatever pivot you need
    pivoted_df = df.pivot_table(index='columnA', values=['columnB'], aggfunc='sum')
    dfname = "dataframe{}_pivoted".format(index + 1)
    pivoted_dict[dfname] = pivoted_df
# access results by name
do_something(pivoted_dict["dataframe1_pivoted"])
do_something(pivoted_dict["dataframe2_pivoted"])
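The dictionary approach can also be written as a comprehension. Below is a self-contained sketch on invented data: the two sample frames and the helper name pivot are assumptions for illustration, while the column names and the pivot_table arguments follow the question.

```python
import pandas as pd

# Hypothetical sample frames using the question's column names.
dataframe1 = pd.DataFrame({'columnA': ['x', 'x', 'y'], 'columnB': [1, 2, 3]})
dataframe2 = pd.DataFrame({'columnA': ['x', 'y', 'y'], 'columnB': [10, 20, 30]})
mylist = [dataframe1, dataframe2]


def pivot(df):
    # Sum columnB for each distinct value of columnA.
    return df.pivot_table(index='columnA', values=['columnB'], aggfunc='sum')


# Keys are numbered from 1 to match the names requested in the question.
pivoted_dict = {
    'dataframe{}_pivoted'.format(i): pivot(df)
    for i, df in enumerate(mylist, start=1)
}
print(pivoted_dict['dataframe1_pivoted'])
```

Looking a result up by key gives the same convenience as a named variable without touching the global namespace.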
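The literal way to achieve that is to assign into globals(), where the key must be a string (name below is a placeholder for the dataframe's name as a string):
globals()[name+'_pivot'] = d.pivot_table(...)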
[edit] after looking at your edit, I see that you may want to do something like this:
for i, d in enumerate(myList, start=1):
    # start=1 so the names begin at dataframe1_pivoted, as in the question
    globals()['dataframe%d_pivoted' % i] = d.pivot_table(...)
However, as others have suggested, this is inadvisable if it is going to create lots of global variables.
There are better ways (read: data structures) to do so.
I'm new to pandas and python, and could definitely use some help.
I have the code below, which almost does what I want. It creates dummy variables for the unique values in a field and indexes them by the unique combinations of the unique values in two other fields.
What I would like is only one row for each unique combination of the fields used for the index. Right now I get multiple rows for say 'asset subs end dt' = 10/30/2008 and 'reseller csn' = 55008 if the dummy variable comes up 3 times. I would rather have one row for the combination of index field values with a 3 in the dummy variable column.
Code:
df = data
df = df.set_index(['ASSET_SUBS_END_DT','RESELLER_CSN'])
Dummies=pd.get_dummies(df['EXPERTISE'])
Something like:
df.groupby(level=[0, 1]).EXPERTISE.count()
When you do this groupby, everything with the same index is grouped together. Assuming your data in EXPERTISE is not null, you will get back a new Series with unique index values and the count for each index. Try it out for yourself, play around with the results, and see how it can be combined with your existing DataFrame to get the final result you want.
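One possible way to combine the groupby with the dummy variables from the question is to group the dummies on the same index levels and sum them, which yields one row per index combination with a count in each dummy column. This is a sketch on invented data, not necessarily what the answer had in mind; the dates, reseller numbers and expertise values are made up, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data using the question's column names.
data = pd.DataFrame({
    'ASSET_SUBS_END_DT': ['10/30/2008', '10/30/2008', '10/30/2008', '11/15/2008'],
    'RESELLER_CSN': [55008, 55008, 55008, 55009],
    'EXPERTISE': ['cad', 'cad', 'cad', 'bim'],
})

df = data.set_index(['ASSET_SUBS_END_DT', 'RESELLER_CSN'])

# dtype=int gives 0/1 columns so that summing them produces counts.
dummies = pd.get_dummies(df['EXPERTISE'], dtype=int)

# Collapse duplicate index combinations by summing the indicator columns.
counts = dummies.groupby(level=[0, 1]).sum()
print(counts)
```

Here the combination ('10/30/2008', 55008) ends up as a single row with 3 in the cad column, which matches the shape of result described in the question.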