Before I start, a disclaimer: I'm very new to Python and have been building a Flask app in an effort to learn more, so my question might be silly, but please bear with me.
I have a pandas DataFrame created by reading a CSV or Excel document in a Flask app. The user uploads the document, so the DataFrame and its column names change with every upload.
The user also selects the columns they want to merge from an HTML multiselect element, which returns the selected columns to the app script as a Python list.
What I currently have is:
df=pd.read_csv(file)
columns=df.columns.values
and
selected_col=request.form.getlist('columns')
All of this works fine, but I'm now stuck. How can I merge the row values of this list of column names (selected_col) into a new column on the DataFrame, such that df["Merged"] = list of selected column values?
I've seen people use the merge function, which seems to work well for 2 columns, but in this case any number of columns could be merged. So I'm looking for a function that either takes a list of columns and merges them, or iterates through the list of columns and appends the values to a new column.
Sounds like what you want to do is more like an element-wise concatenation, not a merge.
If I understand you correctly, you can get your desired result with a list comprehension creating a nested list that is turned into a pandas Series by assigning it as a new DataFrame column:
df['Merged'] = [list(row) for row in df[selected_col].values]
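For illustration, here is a minimal, self-contained sketch of that approach. The sample data and the hard-coded selected_col are made up; in the Flask app the list would come from request.form.getlist('columns'). The Merged_str column is an optional extra, assuming a single joined string per row is sometimes more convenient than a list.

```python
import pandas as pd

# Hypothetical sample data standing in for the uploaded file.
df = pd.DataFrame({
    "first": ["Ada", "Alan", "Grace"],
    "last": ["Lovelace", "Turing", "Hopper"],
    "year": [1815, 1912, 1906],
})

# Stand-in for request.form.getlist('columns').
selected_col = ["first", "last"]

# One list per row, holding the values of the selected columns.
df["Merged"] = [list(row) for row in df[selected_col].values]

# Optional: one string per row instead of a list.
df["Merged_str"] = df[selected_col].astype(str).apply(" ".join, axis=1)
```

Because selected_col is just a list, the same two lines work for any number of selected columns.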
def check_duplication(excelfile, col_Date, col_Name):
    list_rows = []
Above is a bit of the code.
How do I make lists in Python from an Excel file? I want to collect every row of the sheet that contains a Date and a Name value and turn those rows into a list. I want a list because I later need to compare the rows with each other to check whether any of them are duplicates.
Dataframe Method
To compare Excel content, you do not need to make a list. But if you want one, a good starting point is a DataFrame, which you can inspect in Python. To make a DataFrame, use:
import pandas as pd
doc_path = r"the_path_of_excel_file"
sheets = pd.read_excel(doc_path, sheet_name=None, engine="openpyxl", header=None)
These lines read all sheets of the Excel document without headers. You may change the parameters.
(For more information: https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html)
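Assume Sheet1 is the sheet that holds our data. Because sheet_name=None returns a dictionary keyed by sheet name, select it by name:
d_frame = sheets["Sheet1"]
list_rows = [d_frame.iloc[i, :] for i in range(d_frame.shape[0])]
I assume you want to use all columns. The code above gives you the list of rows, each as a pandas Series.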
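If the end goal is only to find duplicate rows, a list may not be needed at all. The sketch below assumes the sheet was read with a header row and has columns named Date and Name; those names and the sample values are hypothetical. A small in-memory DataFrame stands in for the Excel sheet.

```python
import pandas as pd

# Hypothetical data standing in for the sheet read from Excel.
d_frame = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-02", "2021-01-01"],
    "Name": ["Ann", "Bob", "Ann"],
    "Amount": [10, 20, 30],
})

# Rows as plain Python lists of [Date, Name] values, if a list is still wanted.
list_rows = d_frame[["Date", "Name"]].values.tolist()

# Boolean mask: True for every row whose (Date, Name) pair occurs more than once.
is_dup = d_frame.duplicated(subset=["Date", "Name"], keep=False)

# The duplicated rows themselves, with all columns.
duplicates = d_frame[is_dup]
```

With keep=False every member of a duplicate group is flagged; the default keep="first" would leave the first occurrence unflagged.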
Below is the code that generates 5 dataframes, which I want to combine into one. Since they have different column headers, I think appending them to the list does not retain the header names and gives numbers instead.
Is there another way to combine the dataframes while keeping the header names as they are?
Thanks in advance!!
df_list = []  # renamed from "list" to avoid shadowing the built-in
i = 0
while i < 5:
    df = pytrend.interest_over_time()
    df_list.append(df)
    i = i + 1
df_concat = pd.concat(df_list, axis=1)
Do you have a common column in the dataframes that you can merge on? If so, use the DataFrame merge function.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
I've had to do this recently with two dataframes I had, and I merged on the date column.
Are you trying to add additional columns, or append each dataframe on top of each other?
https://www.datacamp.com/community/tutorials/joining-dataframes-pandas
This link will give you an overview of the different functions you might need to use.
You can also rename the columns, if they do contain the same sort of data. Without an example of the dataframe it's tricky to know.
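To make the two options concrete, here is a small sketch. The frames, the date column and the keyword columns are invented for illustration and only stand in for whatever pytrend.interest_over_time() actually returns.

```python
import pandas as pd

# Hypothetical frames standing in for two query results.
dates = pd.to_datetime(["2021-01-03", "2021-01-10"])
df_a = pd.DataFrame({"date": dates, "python": [80, 85]})
df_b = pd.DataFrame({"date": dates, "pandas": [40, 42]})

# Side by side: merge on the shared column; both header names are kept.
merged = df_a.merge(df_b, on="date", how="outer")

# Same idea with concat, after making the shared column the index.
side_by_side = pd.concat(
    [df_a.set_index("date"), df_b.set_index("date")], axis=1
)

# On top of each other: rows are stacked, columns missing from a frame become NaN.
stacked = pd.concat([df_a, df_b], ignore_index=True)
```

In all three cases the column names survive, so if numbers show up as headers it is worth checking what each individual frame looks like before combining.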
I imported a .csv file with a single column of data into a dataframe that I am trying to clean up by splitting the column based on various string occurrences within the cells. I've tried numerous means to split the column, but can't seem to get it to work. My latest attempt was using the following:
df.loc[:,'DataCol'] = df.DataCol.str.split(pat=':\n',expand=True)
df
The result is a dataframe that is still one column and completely unchanged. What am I doing wrong? This is my first time doing anything like this so please forgive the simple question.
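Try replacing the code below with df['DataCol'], which references the column in the original dataframe directly. Also note that with expand=True, str.split returns a DataFrame with one column per split piece, so the result generally needs to be assigned to new columns rather than back into the single original column.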
df.loc[:,'DataCol']
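A minimal sketch of splitting one column into several, assuming every cell contains the ':\n' separator exactly once. The sample strings and the new column names key and value are hypothetical.

```python
import pandas as pd

# Hypothetical single-column data with a ':\n' separator in each cell.
df = pd.DataFrame({"DataCol": ["name:\nAlice", "name:\nBob"]})

# expand=True returns a new DataFrame with one column per piece (labelled 0, 1, ...).
parts = df["DataCol"].str.split(pat=":\n", expand=True)

# Give the pieces readable names, then attach them to the original frame.
parts.columns = ["key", "value"]
df = df.join(parts)
```

If the cells look unchanged after splitting, printing repr(df["DataCol"].iloc[0]) can show whether the separator in the data really is a colon followed by a newline.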
I don't know whether this is a very simple question, but I would like to write a conditional statement based on two other columns.
I have two columns, age and SES, and another, empty column whose values should be based on these two. For example, when a person is 65 years old and their socio-economic status is high, the third column (the empty column, vitality class) gets a value of, say, 1. I have an idea of what I want to achieve, but no idea how to implement it in Python. I know I should use a for loop and I know how to write conditions, but because two columns determine what is written to the empty column, I don't know how to write that as a function,
and furthermore how to write the result back into the same CSV (in the respective empty column).
Use the pandas module to import the csv as a DataFrame object. Then you can do logical statements to fill empty columns:
import pandas as pd
df = pd.read_csv('path_to_file.csv')
df.loc[(df['age']==65) & (df['SES']=='high'), 'vitality_class'] = 1
df.to_csv('path_to_new_file.csv', index=False)
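When there are several rules rather than one, a possible variation is numpy.select, which picks a value per row from a list of conditions. The thresholds, class numbers and sample data below are assumptions made up for the sketch, not rules from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the CSV contents.
df = pd.DataFrame({
    "age": [65, 40, 70],
    "SES": ["high", "low", "low"],
})

# Each condition is a boolean Series built from both columns.
conditions = [
    (df["age"] >= 65) & (df["SES"] == "high"),
    (df["age"] >= 65) & (df["SES"] != "high"),
]
# Value to assign where the matching condition is True (made-up classes).
choices = [1, 2]

# Rows matching no condition get the default value.
df["vitality_class"] = np.select(conditions, choices, default=0)
```

No explicit for loop is needed; the comparisons are applied to whole columns at once, and the frame can then be written out with df.to_csv as in the answer.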
I am trying to combine two tables row wise (stack on top of each other, like using rbind in R). I've followed steps mentioned in:
Pandas version of rbind
how to combine two data frames in python pandas
But neither "append" nor "concat" is working for me.
About my data
I have two pandas DataFrame objects (type class 'pandas.core.frame.DataFrame'), both with 19 columns. When I print each dataframe, they look fine.
The problem
So I created another pandas dataframe using:
query_results = pd.DataFrame(columns=header_cols)
and then, in a loop (because sometimes I may be combining more than just 2 tables), I try to combine all the tables:
for CCC in CCCList:
    query_results.append(cost_center_query(cccode=CCC))
where cost_center_query is a custom function that returns pandas DataFrame objects with the same column names as query_results.
However, whenever I print query_results, I get an empty dataframe.
Any idea why this is happening? There is no error message either, so I am just confused.
Thank you so much for any advice!
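Consider the concat method on a list of dataframes, which avoids growing an object inside a loop with repeated append calls. Note that append returns a new DataFrame rather than modifying query_results in place, so the loop above discards each result, which is why query_results stays empty. You can even use a list comprehension: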
query_results = pd.concat([cost_center_query(cccode=CCC) for CCC in CCCList], ignore_index=True)
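Here is a runnable sketch of that pattern. The cost_center_query below is a hypothetical stand-in that returns a one-row frame, and header_cols and CCCList hold made-up values; the real function and data will differ.

```python
import pandas as pd

# Hypothetical column names and cost-center codes.
header_cols = ["cccode", "cost"]
CCCList = ["A1", "B2", "C3"]


def cost_center_query(cccode):
    """Hypothetical stand-in for the real query: returns a one-row DataFrame."""
    return pd.DataFrame([[cccode, 100.0]], columns=header_cols)


# Collect all frames first, then combine them once.
frames = [cost_center_query(cccode=CCC) for CCC in CCCList]
query_results = pd.concat(frames, ignore_index=True)
```

This also keeps working on pandas 2.0 and later, where DataFrame.append has been removed.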