How to name subsets of a dataframe inside a loop


I'm having trouble naming the subsets I create inside a loop. I want to name each one after the first five letters of the condition (or even just the iteration number), but I haven't figured out how.
Here's my code
list_mun=list(ensud21.NOM_MUN.unique())
for mun in list_mun:
    name=ensud21[ensud21['NOM_MUN']== mun]
list_mun is a list of the unique values that one column of my dataframe can take. Inside the for loop, name marks the place where I want the naming described above. I am unable to give each dataframe a different name. Thank you!

You shouldn't try to set variable names dynamically. Use a container; a dictionary is perfect here:
list_mun=list(ensud21.NOM_MUN.unique())
out_dic = {}
for mun in list_mun:
    # here we set "mun" as key
    out_dic[mun] = ensud21[ensud21['NOM_MUN']== mun]
Then access the subsets with:
out_dic[the_mun_you_want]
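As a minimal sketch of the same idea on made-up data (the sample frame, the value column, and the five-letter keys are assumptions for illustration; only NOM_MUN comes from the question), iterating over groupby produces the subsets without filtering the frame once per value:

```python
import pandas as pd

# Hypothetical stand-in for ensud21; only the NOM_MUN column name is from the question.
ensud21 = pd.DataFrame({
    "NOM_MUN": ["Guadalajara", "Zapopan", "Guadalajara", "Tonala"],
    "value": [1, 2, 3, 4],
})

# One subset per municipality, keyed by the full name.
subsets = {mun: group for mun, group in ensud21.groupby("NOM_MUN")}

# The same subsets keyed by the first five letters, as the question asked.
# Caution: names that share their first five letters would overwrite each other.
short_subsets = {mun[:5]: group for mun, group in subsets.items()}

print(subsets["Guadalajara"])
print(list(short_subsets))
```

Keying by the full name is the safer default; the shortened keys are only worth it if they are known to be unique.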

Related

I need my values to not repeat their names in their categories

I am not sure how to fix this. This is the code I want, but I do not want it to continuously repeat the names of the rows in the output.

I'd suggest a few changes to your code.
Firstly, to answer your question, you can remove the repeated names by using:
select_merch = d.loc[df['Category'] == 'Merchandise'].sum()['Cost']
This selects only the sum of the Cost column for that category. The code is also very redundant and confusing. Instead, you can build a list of the categories and iterate over it.
list(df['Category'].unique()) gives you a list of all the unique categories. Store it in a list and then iterate over it. Also, you don't need to do d=pd.DataFrame(df) every time; you can use df itself.
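A small sketch of that loop, assuming a frame with Category and Cost columns (the sample rows are invented for illustration). The groupby line at the end is an alternative that is not part of the original answer and computes the same totals in one step:

```python
import pandas as pd

# Hypothetical data; only the Category and Cost column names come from the post.
df = pd.DataFrame({
    "Category": ["Merchandise", "Food", "Merchandise", "Food"],
    "Cost": [10.0, 4.5, 5.0, 1.5],
})

# Loop over the unique categories and sum only the Cost column for each one.
totals = {}
for category in df["Category"].unique():
    totals[category] = df.loc[df["Category"] == category, "Cost"].sum()

# Alternative: let pandas do the grouping.
grouped = df.groupby("Category")["Cost"].sum()

print(totals)
print(grouped)
```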

Splitting a DataFrame to filtered "sub - datasets"

So I have a DataFrame with several columns, some contain objects (string) and some are numerical.
I'd like to create new dataframes which are "filtered" to the combination of the objects available.
To be clear, those are my object type columns:
Index(['OS', 'Device', 'Design',
'Language'],
dtype='object')
["Design"] and ["Language"] have 3 options each.
I filtered ["OS"] and ["Device"] manually as I needed to match them.
However, now I want to create multiple variables, each containing a "filtered" dataframe.
For example:
I have
"android_fltr1_d1" to represent the next filter:
["OS"]=android, ["Device"]=1,["Design"]=1
and "android_fltr3_d2" to represent:
["OS"]=android, ["Device"]=3,["Design"]=2
I tried the next code (which works perfectly fine).
android_fltr1_d1 = android_fltr1[android_fltr1["Design"]==1].drop(["Design"],axis=1)
android_fltr1_d2 = android_fltr1[android_fltr1["Design"]==2].drop(["Design"],axis=1)
android_fltr1_d3 = android_fltr1[android_fltr1["Design"]==3].drop(["Design"],axis=1)
android_fltr3_d1 = android_fltr3[android_fltr3["Design"]==1].drop(["Design"],axis=1)
android_fltr3_d2 = android_fltr3[android_fltr3["Design"]==2].drop(["Design"],axis=1)
android_fltr3_d3 = android_fltr3[android_fltr3["Design"]==3].drop(["Design"],axis=1)
android_fltr5_d1 = android_fltr5[android_fltr5["Design"]==1].drop(["Design"],axis=1)
android_fltr5_d2 = android_fltr5[android_fltr5["Design"]==2].drop(["Design"],axis=1)
android_fltr5_d3 = android_fltr5[android_fltr5["Design"]==3].drop(["Design"],axis=1)
As you can guess, I don't find it efficient and would like to use a for loop to generate those variables, as I'd also need to match each ["Language"] option to each filter I created (about 60 variables in total).
Thought about using something similar to .format() in the loop in order to be some kind of a "place-holder", couldn't find a way to do it.
It would be probably the best to use a nested loop to create all the variables, though I'd be content even with a single loop for each column.
I find it difficult to build the for loop to execute it and would be grateful for any help or directions.
Thanks!
As suggested, I tried to find my answer in: How do I create variable variables?
Yet I failed to understand how to use the globals() function in my case. I also found that using '%' no longer works.
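One way to avoid globals() altogether is to store the filtered frames in a dictionary whose keys are built with an f-string. The sketch below assumes a small invented frame (the rows and the clicks column are hypothetical) and reuses the question's naming pattern for the keys:

```python
import pandas as pd

# Hypothetical data; the OS, Device, Design and Language column names come from the question.
df = pd.DataFrame({
    "OS": ["android"] * 4 + ["ios"] * 2,
    "Device": [1, 1, 3, 3, 1, 3],
    "Design": [1, 2, 1, 2, 1, 2],
    "Language": ["en", "fr", "en", "en", "fr", "en"],
    "clicks": [5, 7, 2, 9, 4, 1],
})

android = df[df["OS"] == "android"]

filtered = {}
for device in sorted(android["Device"].unique()):
    for design in sorted(android["Design"].unique()):
        mask = (android["Device"] == device) & (android["Design"] == design)
        # The key mirrors the question's variable names, e.g. "android_fltr1_d2".
        key = f"android_fltr{device}_d{design}"
        filtered[key] = android[mask].drop(columns=["Design"])

print(sorted(filtered))
print(filtered["android_fltr1_d2"])
```

A third nested loop over the Language values would extend the same pattern; tuple keys such as (device, design, language) are an alternative to formatted strings.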

How to 'zip' and 'unzip' dataframes

I am working on a project where I have several dataframes.
Basically, I have a function that returns 10 dataframes.
I would like to know whether my function could return all 10 dataframes in a single variable (this is what I mean by "zip").
I would then pass this variable (holding the 10 dataframes) to another function, and inside that function I would need to extract all the dataframes to use them.
I can put everything in a list, return it as a single variable, and pass it to the second function, but then I would need to access the dataframes by their list indices.
What I want is to extract all of them inside the second function without looping over each element of the list.

name = ["Mary","John","Alex","Maria","Xavi"]
age = [30,24,29,40,39]
result = list(zip(name,age))
print(result)
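One possible approach, sketched here under assumptions (the function names make_frames and use_frames and the sample frames are hypothetical), is to return the dataframes in a dictionary so the second function can pull them out by name, or to unpack them into separate variables in a single statement:

```python
import pandas as pd

def make_frames():
    # Hypothetical producer: bundles several DataFrames into one dict.
    return {
        "people": pd.DataFrame({"name": ["Mary", "John"], "age": [30, 24]}),
        "scores": pd.DataFrame({"name": ["Mary", "John"], "score": [8, 6]}),
    }

def use_frames(frames):
    # Hypothetical consumer: extracts each frame by name, no loop or index needed.
    people = frames["people"]
    scores = frames["scores"]
    return people.merge(scores, on="name")

bundle = make_frames()
merged = use_frames(bundle)

# Unpacking also works, because dicts preserve insertion order.
people, scores = bundle.values()

print(merged)
```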

issue with Storing dictionary with key and multi values for key in python

I have a loop over utterances (a list of statements):
predicted_utt_label=defaultdict(list)
for utt in test_utterances:
    # here some code to detect the label for each statement
Now I want to add each utterance and its label to a dictionary, so I wrote this inside the loop:
    predicted_utt_label[utt]=DA
But there is a problem when the same statement, such as 'I don't know', appears twice with different labels like 'sd' and 'qy'. Since the two occurrences have different labels, how can I store the statement as the dictionary key and the labels as multiple values for that key? A dictionary always has unique keys.

Using a list as the value for each key will allow you to store multiple values under that key. Look up the key you need, then append the new value to its list of values instead of assigning over it. Let us know if this is of any help.
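Because the question already creates predicted_utt_label as a defaultdict(list), appending is enough. A minimal sketch, where the list of (utterance, label) pairs is a hypothetical stand-in for the label-detection step and the 'aa' label is invented:

```python
from collections import defaultdict

# Hypothetical (utterance, label) pairs standing in for the detection step.
labelled = [("I don't know", "sd"), ("okay", "aa"), ("I don't know", "qy")]

predicted_utt_label = defaultdict(list)
for utt, label in labelled:
    # append() adds to the key's list; plain assignment would replace it.
    predicted_utt_label[utt].append(label)

print(dict(predicted_utt_label))
```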

How can I specify a CellRange of variable length in DataNitro?

I am creating a script and part of it requires a list of names from a cell range to be stored as a list. I need the list to store as many names as are added to the cellrange however it must not store the values of empty cells.
If I simply use a longer range than is necessary like so:
names = CellRange("C10:C99999").value
then my final script will iterate through all the empty values which is extremely inefficient.

After quite some searching through the DataNitro documentation, I found the .vertical property, which "returns the values of the cells starting with the cell it’s called from, and ending in the last non-empty cell in the same column."
So in my example this would mean:
names = Cell("C10").vertical
