How to 'zip' and 'unzip' dataframes - python

I am working on a project where I have several dataframes.
Basically, I have a function that returns 10 dataframes.
I would like to know whether my function can return all 10 dataframes in just one variable (this is what I mean by 'zip').
I would then pass this variable (holding the 10 dataframes) to another function, and inside that function I would need to extract all of the dataframes to use them.
I can put everything in a list, return it as a single variable, and pass it to the second function, but then I would need to access the dataframes by their list indices.
What I want is to extract all of them inside the second function, without looping over each element of the list.

name = ["Mary","John","Alex","Maria","Xavi"]
age = [30,24,29,40,39]
result = list(zip(name,age))
print(result)
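
One way to get the 'zip' and 'unzip' behaviour described above is to return the frames as a tuple and unpack that tuple inside the receiving function. The following is only a minimal sketch: the function names make_frames and use_frames and the three small frames are hypothetical, standing in for the ten dataframes in the question.

```python
import pandas as pd


def make_frames():
    # Hypothetical producer: three frames stand in for the ten in the question.
    people = pd.DataFrame({"name": ["Mary", "John"], "age": [30, 24]})
    cities = pd.DataFrame({"city": ["Lisbon", "Porto"]})
    scores = pd.DataFrame({"score": [0.5, 0.9, 0.7]})
    # Returning several values packs them into a single tuple.
    return people, cities, scores


def use_frames(frames):
    # Tuple unpacking restores one name per frame, with no loop or indexing.
    people, cities, scores = frames
    return len(people), len(cities), len(scores)


bundle = make_frames()  # one variable holding all the frames
row_counts = use_frames(bundle)
print(row_counts)
```

If the frames need names rather than positions, returning a dict keyed by name is a similar option; the tuple form simply keeps the unpacking to one line.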

Related

Use a list with function names to iteratively apply over a dataframe column

Context: I'm allowing a user to add specific methods for a cleaning process pipeline (appended to a main list with all the methods chosen). Each element from this list is the name of a function.
My question is:
Why does this work:
dataframe[cleanedCol] = dataframe[colToClean].apply(replace_contractions).apply(remove_links).apply(remove_emails)
But something like this doesn't?
pipeline = ['replace_contractions','remove_links','remove_emails']
for method in pipeline:
    dataframe[cleanedColumn] = dataframe[columnToClean].apply(method)
How could I iteratively apply each one of the methods from the list (by the order they are in the list) to the dataframe column?
Thank you in advance!
You would either have to convert those strings to actual function objects or, even better, store the function objects themselves instead of their names as strings:
pipeline = [replace_contractions, remove_links, remove_emails]
# Start from the raw column, then feed each step the previous step's result.
dataframe[cleanedColumn] = dataframe[columnToClean]
for method in pipeline:
    dataframe[cleanedColumn] = dataframe[cleanedColumn].apply(method)
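
If the pipeline has to stay a list of strings (for example because the names come from user input), a dictionary that maps each name to its function object can do the conversion. This is a sketch under assumptions: the two cleaning functions below are simplified hypothetical stand-ins, and the CLEANERS registry and the column names raw and clean are not from the question.

```python
import pandas as pd


# Hypothetical stand-ins for the cleaning functions named in the question.
def remove_links(text):
    return " ".join(w for w in text.split() if not w.startswith("http"))


def remove_emails(text):
    return " ".join(w for w in text.split() if "@" not in w)


# Registry mapping each user-facing name to its function object.
CLEANERS = {"remove_links": remove_links, "remove_emails": remove_emails}

dataframe = pd.DataFrame({"raw": ["see http://x.io now", "mail me@x.io please"]})
pipeline = ["remove_links", "remove_emails"]  # names chosen by the user

cleaned = dataframe["raw"]
for name in pipeline:
    # Look up the function by name and feed it the previous step's output.
    cleaned = cleaned.apply(CLEANERS[name])
dataframe["clean"] = cleaned
print(dataframe["clean"].tolist())
```

An unknown name raises a KeyError at the lookup, which is usually easier to diagnose than passing an arbitrary string to apply.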

How to name subsets of a dataframe inside a loop

I'm having trouble naming the subsets I create inside a loop. I want to name each one after the first five letters of the condition (or even just the iteration number), but I haven't figured out how.
Here's my code
list_mun=list(ensud21.NOM_MUN.unique())
for mun in list_mun:
    name=ensud21[ensud21['NOM_MUN']== mun]
list_mun is a list with the unique values that a column of my dataframe can take. Inside the for loop I wrote name where I want the name I described above. I am unable to give each dataframe a different name. Thank you!
You shouldn't try to set variable names dynamically. Use a container; a dictionary is perfect here:
list_mun=list(ensud21.NOM_MUN.unique())
out_dict = {}
for mun in list_mun:
    # here we set "mun" as key
    out_dict[mun] = ensud21[ensud21['NOM_MUN']== mun]
Then access a subset with:
out_dict[the_mun_you_want]
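
The same dictionary can also be built with groupby, which yields one (key, sub-frame) pair per unique value. A minimal sketch, assuming a small made-up frame in place of the real ensud21 data:

```python
import pandas as pd

# Hypothetical stand-in for the ensud21 frame from the question.
ensud21 = pd.DataFrame({
    "NOM_MUN": ["Leon", "Leon", "Toluca"],
    "value": [1, 2, 3],
})

# groupby yields (key, sub-frame) pairs, so a dict comprehension builds
# the same container as the loop without filtering once per value.
out_dict = {mun: group for mun, group in ensud21.groupby("NOM_MUN")}

print(sorted(out_dict))
print(len(out_dict["Leon"]))
```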

How do I return a dataframe from a function within a function?

I'm trying to write a function within a function; the outer function asks the user what dataset they wish to consolidate and the inner function consolidates datasets enclosed in a particular list. The list chosen is determined by the user's input.
def ConsolidateData():
    def AppendData(dataList):
        main_df = pd.read_excel(dataList[0])
        for f in dataList[1:]:
            df = pd.read_excel(f)
            main_df.append(df)
        return main_df
    consolInput = input('What dataset do you want to consolidate?: ')
    if 'Purchase Invoice' in consolInput:
        return AppendData(PI_data)
    else:
        print("Nada")
AppendData takes the first filename in a list, creates a dataframe and then appends the remaining files in the list to the created dataframe.
PI_data is one of the lists containing the names of 3 or 4 files. I plan to have other lists of filenames as well and have those be part of the 'if' statements that come after the consolInput code.
When I call the outer function and try to print main_df after the function has run, I get an 'Undefined name main_df' error.
I've tried placing the AppendData function outside the ConsolidateData function, defining it before and calling it within the ConsolidateData function, but that hasn't worked either.
I'm aware this is an issue of scope, but I can't figure out what I'm doing wrong/how to solve the issue. Thanks in advance for any help.
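
Two things in the code above are likely behind the error. First, main_df is local to AppendData, so it only exists in the calling scope if the caller binds the returned value to a name. Second, DataFrame.append returned a new frame instead of modifying main_df in place (and the method was removed in pandas 2.0), so the appended rows were discarded. The sketch below illustrates both points under assumptions: in-memory frames stand in for the Excel files, and the names append_data, consolidate_data, and the choice argument (replacing input()) are hypothetical.

```python
import pandas as pd


def append_data(frames):
    # pd.concat builds one new frame from the whole list in a single call.
    return pd.concat(frames, ignore_index=True)


def consolidate_data(choice, datasets):
    # `datasets` maps a dataset label to its list of frames (hypothetical).
    for label, frames in datasets.items():
        if label in choice:
            return append_data(frames)
    return None  # nothing matched the user's choice


# In-memory frames stand in for pd.read_excel(filename) results.
pi_data = [pd.DataFrame({"amount": [10, 20]}), pd.DataFrame({"amount": [30]})]

# The returned frame must be bound to a name in the calling scope.
main_df = consolidate_data("Purchase Invoice", {"Purchase Invoice": pi_data})
print(main_df["amount"].tolist())
```

Keeping the labels in a dictionary also means further datasets can be added without extra if branches.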

Splitting a DataFrame to filtered "sub - datasets"

So I have a DataFrame with several columns, some contain objects (string) and some are numerical.
I'd like to create new dataframes which are "filtered" to the combination of the objects available.
To be clear, those are my object type columns:
Index(['OS', 'Device', 'Design',
'Language'],
dtype='object')
["Design"] and ["Language"] have 3 options each.
I filtered ["OS"] and ["Device"] manually as I needed to match them.
However, now I want to create multiple variables, each containing a "filtered" dataframe.
For example:
I have
"android_fltr1_d1" to represent the following filter:
["OS"]=android, ["Device"]=1,["Design"]=1
and "android_fltr3_d2" to represent:
["OS"]=android, ["Device"]=3,["Design"]=2
I tried the following code (which works perfectly fine).
android_fltr1_d1 = android_fltr1[android_fltr1["Design"]==1].drop(["Design"],axis=1)
android_fltr1_d2 = android_fltr1[android_fltr1["Design"]==2].drop(["Design"],axis=1)
android_fltr1_d3 = android_fltr1[android_fltr1["Design"]==3].drop(["Design"],axis=1)
android_fltr3_d1 = android_fltr3[android_fltr3["Design"]==1].drop(["Design"],axis=1)
android_fltr3_d2 = android_fltr3[android_fltr3["Design"]==2].drop(["Design"],axis=1)
android_fltr3_d3 = android_fltr3[android_fltr3["Design"]==3].drop(["Design"],axis=1)
android_fltr5_d1 = android_fltr5[android_fltr5["Design"]==1].drop(["Design"],axis=1)
android_fltr5_d2 = android_fltr5[android_fltr5["Design"]==2].drop(["Design"],axis=1)
android_fltr5_d3 = android_fltr5[android_fltr5["Design"]==3].drop(["Design"],axis=1)
As you can guess, I don't find this efficient and would like to use a for loop to generate those variables (I'd need to match each ["Language"] option to each filter I created, for a total of about 60 variables).
I thought about using something similar to .format() in the loop as a kind of placeholder, but couldn't find a way to do it.
It would probably be best to use a nested loop to create all the variables, though I'd be content even with a single loop for each column.
I find it difficult to build the for loop to execute it and would be grateful for any help or directions.
Thanks!
As suggested, I tried to find my answer in: How do I create variable variables?
Yet I failed to understand how to use the globals() function in my case. I also found that using '%' no longer works.
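
Instead of creating about 60 variables through globals(), one option is a single dictionary keyed by the filter combination, built with a groupby over the filter columns. This is a sketch under assumptions: the data and the clicks column are made up, and it drops all of the key columns from each sub-frame, whereas the manual code in the question drops only "Design".

```python
import pandas as pd

# Hypothetical data using the column names from the question.
df = pd.DataFrame({
    "OS": ["android", "android", "ios", "android"],
    "Device": [1, 3, 1, 1],
    "Design": [1, 2, 1, 1],
    "clicks": [5, 7, 2, 9],
})

keys = ["OS", "Device", "Design"]
# Each key is a tuple such as ("android", 1, 1); the value is the matching
# sub-frame with the key columns dropped.
filtered = {key: group.drop(columns=keys) for key, group in df.groupby(keys)}

android_fltr1_d1 = filtered[("android", 1, 1)]
print(android_fltr1_d1["clicks"].tolist())
```

Adding "Language" to keys would extend the same dictionary to every language combination without any extra loops.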

How to write a python function that receives a list of function name-strings to evaluate, and stacks each function's output into one DataFrame

I have 3 Python functions (full-on def functions) that each output a 2d numpy array (row vector) of the same size as one another. These functions are alternative approaches/answers to some same core problem.
def fun1():
    return row_array
An identifier name (a string) corresponds to each of these functions, so that a full list of these function names would be
names = ['fun1','fun2','fun3']
How can I attach these names to the corresponding functions quickly as a pandas.DataFrame?
all = pd.DataFrame([['fun1', fun1], ['fun2', fun2]])
Finally, I would like to write a master function that lets the user send to it a list of names of desired functions to call, which could be all or a subset of them, and this master function evaluates each called function only, and stacks their respective row_arrays on top of each other into a squarish pandas DataFrame. In other words, aggregate all desired "alternative answers" into one data structure for viewing, so that at any later time I can repeat with a different subset of names.
What's the most efficient way to do all of the above without a series of if name is in list conditions?
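
One way to avoid the chain of if conditions is a dictionary that maps each name to its function, with the master function looking up only the requested names. A minimal sketch, assuming each function returns a 1 x 3 array; the function bodies, the FUNCS registry, and the evaluate name are hypothetical:

```python
import numpy as np
import pandas as pd


# Hypothetical stand-ins: each returns a 1 x 3 row array.
def fun1():
    return np.array([[1.0, 2.0, 3.0]])


def fun2():
    return np.array([[4.0, 5.0, 6.0]])


def fun3():
    return np.array([[7.0, 8.0, 9.0]])


# Registry from identifier string to function object.
FUNCS = {"fun1": fun1, "fun2": fun2, "fun3": fun3}


def evaluate(names):
    # Call only the requested functions and stack their rows vertically.
    rows = np.vstack([FUNCS[name]() for name in names])
    # Label each row with the name of the function that produced it.
    return pd.DataFrame(rows, index=list(names))


result = evaluate(["fun1", "fun3"])
print(result.shape)
```

Because the registry is a plain dict, calling evaluate again later with a different subset of names needs no change to the master function.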
