How can I create dynamic column names in Pandas? - python

I have a list of numerical columns, [NumColumns], and I want to bin all of them, but I also want to keep the original values in the data frame, so I don't want the values simply replaced. How can I do that?
I tried this (which works, but replaces the values) -
data[NumColumns] = pd.cut(data[NumColumns], bins=[0,30,70,100], labels=["Low", "Mid", "High"])
Instead of replacing, I'm hoping to add '_bin' to the end of each name, so I would end up with something like value_bin, revenue_bin, age_bin, etc.
I'm not sure how to do this, since I would have to declare the new names on the left-hand side. Is there a common way to do it?

I think the simplest is to use f-strings:
for c in NumColumns:
    data[f'{c}_bin'] = pd.cut(data[c], bins=[0,30,70,100], labels=["Low", "Mid", "High"])
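If you prefer to avoid the explicit loop, here is a sketch of an equivalent variant that bins all the columns at once and joins the new '_bin' columns back next to the originals (it assumes data and NumColumns exist as in the question):
import pandas as pd

# Bin every numeric column, add the '_bin' suffix, and join back to the frame
binned = data[NumColumns].apply(
    lambda s: pd.cut(s, bins=[0, 30, 70, 100], labels=["Low", "Mid", "High"])
).add_suffix('_bin')
data = data.join(binned)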

Related

How can I change one int value to another in a column?

The dataframe dataset has two columns, 'Review' and 'Label', and the dtype of 'Label' is int.
I would like to change the numbers in the 'Label' column, so I tried replace(), but it doesn't work well, as you can see in the picture below.
A simple and quick solution (besides replace) would be to use the Series.map() method. You could define a dictionary whose keys are the values you want to replace and whose values are the new values you wish to have. Then use an anonymous function (or a normal one) to replace your values:
d = {1: 0, 2: 0, 4: 1, 5: 1}
dataset['label'] = dataset['label'].map(lambda x: d[x])
This will replace 1 and 2 with 0, and 4 and 5 with 1.
I am not sure what your criterion for "well" is, as the replace method will work for you and essentially achieve the same result (and is more optimized than map for replacement purposes).
What might be causing the issue is that replace has the default argument inplace=False, so each call returns a new Series without modifying the original. You will either have to combine them into dataset['label'] = dataset['label'].replace([1,2,4,5],[0,0,1,1]) or use dataset['label'].replace([1,2,4,5],[0,0,1,1], inplace=True).
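As a quick illustration of both approaches on made-up data (only the column name 'label' is taken from the question):
import pandas as pd

dataset = pd.DataFrame({'label': [1, 2, 4, 5, 1]})
d = {1: 0, 2: 0, 4: 1, 5: 1}

# map with a lambda look-up, as in the answer above
mapped = dataset['label'].map(lambda x: d[x])

# replace, assigned back instead of relying on inplace=True
replaced = dataset['label'].replace([1, 2, 4, 5], [0, 0, 1, 1])

print(mapped.tolist())    # [0, 0, 1, 1, 0]
print(replaced.tolist())  # [0, 0, 1, 1, 0]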

Replace - with : in dict values in order to delete slices from a dataframe

I need to replace - with : in a dict. The reason is that I need this to automatically delete slices from a dataframe. I got the data from a CSV, so I can't change the inputs.
My dict looks like this:
alpha = {'a':'12-15,20-25','b':'10-15,100-250'}
In the end I want a dict where I can look up the name a and pick a slice, for example 12:15, to delete those rows in a dataframe called a.
for key in alpha:
    alpha[key] = alpha[key].replace('-', ':')
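The question also mentions using the resulting '12:15'-style strings to drop rows. A minimal sketch of how that could look, assuming the numbers are integer row positions and the target dataframe is called df_a (both are assumptions, not from the question):
import pandas as pd

alpha = {'a': '12:15,20:25', 'b': '10:15,100:250'}
df_a = pd.DataFrame({'x': range(300)})   # hypothetical dataframe for key 'a'

# collect every row position covered by the "start:stop" slices for 'a'
positions = []
for part in alpha['a'].split(','):
    start, stop = map(int, part.split(':'))
    positions.extend(range(start, stop))   # stop treated as exclusive here

df_a = df_a.drop(df_a.index[positions])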

Test Anova on multiple groups

I have the following dataframe:
I would like to use this code to compare the means across my entire dataframe:
F_statistic, pVal = stats.f_oneway(percentage_age_ss.iloc[:, 0:1],
                                   percentage_age_ss.iloc[:, 1:2],
                                   percentage_age_ss.iloc[:, 2:3],
                                   percentage_age_ss.iloc[:, 3:4])  # etc...
However, I don't want to write .iloc every time because it takes too much time. Is there another way to do it?
Thanks
Build one Series per column with a generator expression, then use star syntax to expand them into the argument list:
stats.f_oneway(*(percentage_age_ss[col] for col in percentage_age_ss.columns))
or, just
stats.f_oneway(*(percentage_age_ss.T.values))
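For completeness, a self-contained sketch of the star-expansion call (the column names and numbers below are invented):
import pandas as pd
from scipy import stats

percentage_age_ss = pd.DataFrame({
    'g1': [1.0, 2.0, 3.0, 4.0],
    'g2': [2.0, 3.0, 4.0, 5.0],
    'g3': [1.5, 2.5, 3.5, 4.5],
})

# one argument per column, expanded with *
F_statistic, pVal = stats.f_oneway(
    *(percentage_age_ss[col] for col in percentage_age_ss.columns)
)
print(F_statistic, pVal)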

Replace Value in Dataframe Column based upon value in another column within the same dataframe

I have a pandas dataframe in which some rows didn't pull in correctly so that the values were pushed over into the next column over. Therefore I have a column that is mostly null, but has a few instances where there is a value that should go in the previous column. Below is an example of what it looks like.
I need to replace the 12345 and 45678 in the Approver column with JJones in the NeedtoDelete column.
I am not sure if a for loop, or a regular expression is the right way to go. I also came across the replace function, but I'm not sure how I would set that up in this scenario. Below is the code I have tried thus far (Q1Q2 is the df name):
for Q1Q2['Approver'] in Q1Q2:
    Replacement = Q1Q2.loc[Q1Q2['Need to Delete'].notnull()]
    Q1Q2.loc[Replacement] = Q1Q2['Approver']

Q1Q2.loc[Q1Q2['Need to Delete'].notnull(), ['Approver'] == Q1Q2['Need to Delete']]
If you could help me fix either attempt above, or point me in the right direction, it would be greatly appreciated. Thanks in advance!
You can use boolean indexing:
r = Q1Q2['Need to Delete'].notnull()
Q1Q2.loc[r, 'Approver'] = Q1Q2.loc[r, 'Need to Delete']
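A small demonstration on made-up rows (only the column names are taken from the question):
import numpy as np
import pandas as pd

Q1Q2 = pd.DataFrame({
    'Approver': ['JSmith', 12345, 45678],
    'Need to Delete': [np.nan, 'JJones', 'JJones'],
})

r = Q1Q2['Need to Delete'].notnull()
Q1Q2.loc[r, 'Approver'] = Q1Q2.loc[r, 'Need to Delete']

# rows where 'Need to Delete' held the real approver now show it in 'Approver'
print(Q1Q2)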

How to convert Multilevel Dictionary with Irregular Data to Desired Format

Dict = {'Things' : {'Car':'Lambo', 'Home':'NatureVilla', 'Gadgets':{'Laptop':{'Programs':{'Data':'Excel', 'Officework': 'Word', 'Coding':{'Python':'PyCharm', 'Java':'Eclipse', 'Others': 'SublimeText'}, 'Wearables': 'SamsungGear', 'Smartphone': 'Nexus'}, 'clothes': 'ArmaaniSuit', 'Bags':'TravelBags'}}}}
d = {(i, j, k, l, m, n): Dict[i][j][k][l][m][n]
     for i in Dict.keys()
     for j in Dict[i].keys()
     for k in Dict[j].keys()
     for l in Dict[k].keys()
     for m in Dict[l].keys()
     for n in Dict[n].keys()
     }

mux = pd.MultiIndex.from_tuples(d.keys())
df = pd.DataFrame(list(d.values()), index=mux)
print(df)
What I have already done:
I tried to MultiIndex this irregular data using pandas, but I get a KeyError at 'Car'. Then I tried to handle the exception and pass on it, but that resulted in a SyntaxError. So maybe I have lost the direction. Is there any other module or way I can index this irregular data and put it in a table somehow? I have a chunk of raw data like this.
What I am trying to do:
I want to use this data for printing in a QTableView, which is from PyQt5 (I'm making a program with a GUI).
Conditions:
This data keeps updating every hour from an API.
What I have thought till now:
Maybe I can append all this data to MySQL. But when this data updates from the API, only the values will change; the keys will stay the same. It would also require more space.
References:
How to convert a 3-level dictionary to a desired format?
How to build a MultiIndex Pandas DataFrame from a nested dictionary with lists
Any help will be appreciated. Thanks for reading the question.
Your data is not actually a 6-level dictionary like the dictionary in the 3-level example you referenced. The difference is that your dictionary has data on multiple different levels, e.g. the value 'Lambo' is on the second level of the hierarchy with key ('Things','Car'), but the value 'Eclipse' is on the sixth level with key ('Things','Gadgets','Laptop','Programs','Coding','Java').
If you want to 'flatten' your structure, you will need to decide what to do with the 'missing' key values on deeper levels for values like 'Lambo'.
By the way, maybe this is not actually the solution to your problem; perhaps you need a more appropriate UI widget, like a TreeView, to work with this kind of hierarchical data. But I will try to address your exact question directly.
Unfortunately, there seems to be no easy way to reference all the different-level values uniformly in one simple dict or list comprehension.
Just look at your 'value extractor' (Dict[i][j][k][l][m][n]): no values of i, j, k, l, m, n exist that would give you 'Lambo', because to get to 'Lambo' you only need Dict['Things']['Car'] (ironically, in real life it can also be difficult to get a Lambo :-) ).
One straightforward way to solve your task is to extract the second-level data, then the third-level data, and so on, and combine them together.
E.g. to extract the second-level values you can write something like this:
val_level2 = {(k1, k2): Dict[k1][k2]
              for k1 in Dict
              for k2 in Dict[k1]
              if isinstance(Dict[k1], dict) and
              not isinstance(Dict[k1][k2], dict)}
but if you want to combine it later with the sixth-level values, you will need to add some padding to your key tuples:
val_level2 = {(k1, k2, '', '', '', ''): Dict[k1][k2]
              for k1 in Dict
              for k2 in Dict[k1]
              if isinstance(Dict[k1], dict) and
              not isinstance(Dict[k1][k2], dict)}
Later you can combine them all with something like:
d = {}
d.update(val_level2)
d.update(val_level3)
But usually the most natural way to work with hierarchical data is to use recursion, like this:
def flatten_dict(d, key_prefix, max_deep):
    return [(tuple(key_prefix + [k] + [''] * (max_deep - len(key_prefix))), v)
            for k, v in d.items() if not isinstance(v, dict)] + \
           sum([flatten_dict(v, key_prefix + [k], max_deep)
                for k, v in d.items() if isinstance(v, dict)], [])
And later, with code like this:
d = {k: v for k, v in flatten_dict(Dict, [], 5)}
mux = pd.MultiIndex.from_tuples(d.keys())
df = pd.DataFrame(list(d.values()), index=mux)
df.reset_index()
I actually get this result with your data:
P.S. According to https://www.python.org/dev/peps/pep-0008/#prescriptive-naming-conventions, lowercase_with_underscores is preferred for variable names; CapWords is for classes. So src_dict would be much better than Dict in your case.
Your information looks a lot like JSON, and that's probably what the API is returning. If that's the case, and you are turning it into a dictionary, then you might be better off using Python's json library or even pandas' built-in read_json.
Pandas read json
Python's json
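A rough sketch of what that could look like if the API response really is JSON (the payload below is an invented fragment of the question's data):
import json
import pandas as pd

payload = '{"Things": {"Car": "Lambo", "Home": "NatureVilla"}}'

as_dict = json.loads(payload)      # plain-Python route
df = pd.json_normalize(as_dict)    # pandas route: nested keys become dotted columns

print(df)
#   Things.Car  Things.Home
# 0      Lambo  NatureVilla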
