I have two DataFrames that I am trying to merge using pandas. One table has 4 columns and the other has 3. I am attempting an inner join on an int64 column.
In the linked screenshot you can see that both columns named UPC are of type int64.
Just to make sure the DataFrames weren't empty, I have also added a picture of the first 20 rows of each table.
When I try to merge, I run the following command:
result = merge(MPA_COMMODITY, MDM_LINK_VIEW, on='UPC')
When I check the return value, it shows the column names but reports that the DataFrame is empty.
This is using Python 3.6.4 and Pandas version 0.22.0.
If any other information is needed, please let me know; I'm more than glad to update the post.
I think you want
MPA_COMMODITY.merge(MDM_LINK_VIEW, on='UPC')
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html
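A minimal sketch with toy stand-ins for the two frames (the UPC values are made up) showing that an inner merge on a shared int64 key returns the matching rows, plus two quick diagnostics for an unexpectedly empty result:

```python
import pandas as pd

# Toy stand-ins for MPA_COMMODITY and MDM_LINK_VIEW (hypothetical values)
mpa = pd.DataFrame({'UPC': [111, 222, 333], 'commodity': ['a', 'b', 'c']})
mdm = pd.DataFrame({'UPC': [222, 333, 444], 'link': ['x', 'y', 'z']})

result = mpa.merge(mdm, on='UPC')          # inner join by default
print(result)                              # rows for UPC 222 and 333

# If the result is empty, check the key dtypes on both sides:
# an int64 vs object mismatch silently yields no matches.
print(mpa['UPC'].dtype, mdm['UPC'].dtype)

# indicator=True on an outer merge shows which side each key came from
diag = mpa.merge(mdm, on='UPC', how='outer', indicator=True)
print(diag['_merge'].value_counts())
```

If `_merge` shows only `left_only` and `right_only`, the keys genuinely don't overlap (or their dtypes differ), which would explain an empty inner join despite both frames having data.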
I'm trying to select rows within a specific timeframe. The DataFrame has two indexes, one of which is made of datetimes (created with pd.to_datetime). When I try to select rows using df_pivot.loc[slice(None), '2021'] I get a KeyError: '2021'. Selecting rows by year should be possible with datetimes, right? What am I doing wrong? picture of the dataframe/indexes
Problem solved: I used reset_index() and then set_index('Datetime') to make the DataFrame easier to navigate.
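A sketch of that fix on toy data (the index names and values here are assumptions): flatten the MultiIndex, re-index on the datetime column alone, and partial-string indexing by year then works:

```python
import pandas as pd

# Toy frame with a (group, Datetime) MultiIndex -- names are assumed
idx = pd.MultiIndex.from_product(
    [['A', 'B'], pd.to_datetime(['2020-06-01', '2021-03-15'])],
    names=['group', 'Datetime'])
df_pivot = pd.DataFrame({'value': [1, 2, 3, 4]}, index=idx)

# Flatten, then index on the datetime column alone
flat = df_pivot.reset_index().set_index('Datetime')

# Partial-string indexing by year works on a plain DatetimeIndex
rows_2021 = flat.loc['2021']
print(rows_2021)
```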
Before I start, my disclaimer is that I'm very new to Python and I've been building a flask app in an effort to learn more, so my question might be silly but please oblige me.
I have a pandas DataFrame created by reading in a CSV or Excel doc in a Flask app. The user uploads the document, so the DataFrame and column names change with every upload.
The user also selects the columns they want to merge from an HTML multiselect element, which returns the selected columns to the app script as a Python list.
What I currently have is:
df=pd.read_csv(file)
columns=df.columns.values
and
selected_col=request.form.getlist('columns')
All of this works fine, but I'm now stuck. How can I merge the row values of this list of column names (selected_col) into a new column on the DataFrame, so that df["Merged"] holds a list of the selected columns' values for each row?
I've seen people use the merge function, which seems to work well for two columns, but here any number of columns could be merged. So I'm looking for a function that either takes a list of columns and merges them, or iterates through the list of columns and appends the values into a new column.
Sounds like what you want to do is more like an element-wise concatenation, not a merge.
If I understand you correctly, you can get the desired result with a list comprehension that builds a nested list, which becomes a new DataFrame column when you assign it:
df['Merged'] = [list(row) for row in df[selected_col].values]
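For example, with a toy frame and a hypothetical selection standing in for what request.form.getlist would return, the comprehension builds one list per row from the chosen columns:

```python
import pandas as pd

df = pd.DataFrame({'name': ['ann', 'bob'],
                   'city': ['oslo', 'rome'],
                   'age': [30, 40]})
selected_col = ['name', 'city']   # hypothetical user selection

# One list per row, drawn only from the selected columns
df['Merged'] = [list(row) for row in df[selected_col].values]
print(df['Merged'])
```

This works for any number of selected columns, which is the point: nothing here assumes exactly two.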
I am trying to combine two tables row-wise (stack them on top of each other, like rbind in R). I've followed the steps mentioned in:
Pandas version of rbind
how to combine two data frames in python pandas
But neither append nor concat is working for me.
About my data
I have two pandas DataFrame objects (type pandas.core.frame.DataFrame), both with 19 columns. When I print each DataFrame, they look fine.
The problem
So I created another pandas DataFrame using:
query_results = pd.DataFrame(columns=header_cols)
and then in a loop (because sometimes I may be combining more than just two tables) I try to combine all the tables:
for CCC in CCCList:
    query_results.append(cost_center_query(cccode=CCC))
where cost_center_query is a customized function that returns pandas DataFrame objects with the same column names as query_results.
However, whenever I print query_results, I get an empty DataFrame.
Any idea why this is happening? There is no error message either, so I am just confused.
Thank you so much for any advice!
Note that DataFrame.append does not modify the frame in place; it returns a new DataFrame, which your loop discards, so query_results stays empty. Instead, consider calling concat on a list of DataFrames, which also avoids repeated object expansion from multiple append calls inside a loop. You can even use a list comprehension:
query_results = pd.concat([cost_center_query(cccode=CCC) for CCC in CCCList], ignore_index=True)
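A minimal runnable sketch of that pattern, with a toy stand-in for cost_center_query (the column names and cost-center codes are made up):

```python
import pandas as pd

# Hypothetical stand-in for the real cost_center_query function
def cost_center_query(cccode):
    return pd.DataFrame({'ccc': [cccode, cccode], 'amount': [1, 2]})

CCCList = ['C100', 'C200']

# Build all per-center frames first, then stack them once
query_results = pd.concat(
    [cost_center_query(cccode=CCC) for CCC in CCCList],
    ignore_index=True)
print(query_results)
```

ignore_index=True renumbers the stacked rows 0..n-1 instead of keeping each source frame's original index.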
I have a df :
How can I remove duplicates based on all but one column? I have rows where every column matches except one; I want to ignore that column and keep unique rows based on the others.
This is what I tried, but I get an error:
data.drop_duplicates('asn','first_seen','incident_type','ip','uri')
Any idea?
What version of pandas are you running? Since 0.14 you should pass the list of columns to drop_duplicates() via the subset keyword, so try
data.drop_duplicates(subset=['asn','first_seen','incident_type','ip','uri'])
Also note that if you are not using inplace=True you will need to assign the returned value to a new dataframe.
Depending on your needs, you may also want to call reset_index() after dropping the duplicate rows.
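A small sketch with made-up values: the subset columns define the duplicate groups, the ignored column (here a hypothetical score) plays no part, and the first row of each group is kept:

```python
import pandas as pd

data = pd.DataFrame({
    'asn': [1, 1, 2],
    'first_seen': ['2020', '2020', '2021'],
    'incident_type': ['spam', 'spam', 'scan'],
    'ip': ['1.1.1.1', '1.1.1.1', '2.2.2.2'],
    'uri': ['/a', '/a', '/b'],
    'score': [10, 20, 30],   # the column being ignored
})

# Rows 0 and 1 match on every subset column, so only row 0 survives
deduped = data.drop_duplicates(
    subset=['asn', 'first_seen', 'incident_type', 'ip', 'uri']
).reset_index(drop=True)
print(deduped)
```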
I have a DataFrame ('main') that has about 300 columns. I created a smaller DataFrame ('public') and have been working on this.
I now want to delete the columns contained within 'public' from the larger DataFrame ('main').
I've tried the following instructions:
http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.drop.html
Python Pandas - Deleting multiple series from a data frame in one command
without any success, along with various other statements that have been unsuccessful.
The columns that make up 'public' are not consecutive - i.e. they are taken from various points in the larger DataFrame 'main'. All of the columns have the same Index. [Not sure if this is important, but 'public' was created using the 'join' function].
Yes, I'm being lazy - I don't want to have to type out the names of every column! I'm hoping there's a way to use the DataFrame 'public' in a statement that will allow deletion of these columns en masse. If anyone has any suggestions and/or guidance I'd be most grateful.
(Have Python 2.7 and am using Pandas, numpy, math, pylab etc.)
Thanks in advance.
Ignore my question - Murphy's Law prevails and I've just solved it.
I was using the statement from the Stack Overflow question linked above:
df.drop(df.columns[1:], axis=1)
and this was not working. I have instead used
df = df.drop(df2, axis=1)
and this worked (df = main, df2 = public). Simple really once you don't overthink it.
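A sketch of the same idea on toy frames (the names are assumptions standing in for main and public); passing the smaller frame's columns explicitly also works and makes the intent clearer:

```python
import pandas as pd

main = pd.DataFrame({'a': [1], 'b': [2], 'c': [3], 'd': [4]})
public = main[['b', 'd']]   # the smaller working frame

# Drop public's columns from main -- explicit about what is removed.
# drop returns a new frame; main itself is left untouched.
main_private = main.drop(public.columns, axis=1)
print(main_private.columns.tolist())
```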