Join two DataFrames in Python pandas

I have two Excel files that I loaded into DataFrames.
In the first frame, the states (State1, State2, State3, ...) are the column names.
In the second frame, State1, State2, State3, ... are column values.
I need to merge these two DataFrames based on the state value.
Kindly suggest how to do this.

Related

How do you turn duplicate rows from the same column into two separate columns?

After importing a CSV in pandas, I noticed that the column "subject" has two different categories. In the CSV file, the subject column holds 'Tot' and 'voluntary' in the same column but in different rows.
How would I make Tot and voluntary into two separate columns, so that the rows aren't duplicated?
Below is an image of the CSV in pandas.

Merge multiple DataFrames in Python and keep identical rows only once

I am trying to merge multiple DataFrames and create a new DataFrame that contains all the rows from each one, but with rows that are identical appearing only once. For example:
The DataFrames that I have as input: [image: input dataframes]
The DataFrame that I want to have as output: [image: output dataframe]
Do you know if there is a way to do that? If you could help me, I would be more than thankful!
Thanks,
Eleni
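One possible way to get this result, sketched under assumptions: since the input frames were only shown as images, the frames and column names below (df1, df2, df3, id, name) are made up for illustration. The idea is to stack all rows with pd.concat and then drop rows that are identical in every column with drop_duplicates.

```python
import pandas as pd

# Hypothetical input frames; the originals were only available as images.
df1 = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
df2 = pd.DataFrame({"id": [2, 3], "name": ["b", "c"]})
df3 = pd.DataFrame({"id": [3, 4], "name": ["c", "d"]})

# Stack every row from every frame, then keep one copy of each
# row that is identical across all columns.
combined = (
    pd.concat([df1, df2, df3], ignore_index=True)
    .drop_duplicates()
    .reset_index(drop=True)
)
print(combined)
```

If rows should count as "the same" based on only some columns, drop_duplicates accepts a subset argument naming those columns.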

How to merge two subsets of a pandas dataframe and sort by original index?

Let's say I have a pandas DataFrame: [image: original DataFrame]
I then performed two operations on this DataFrame to filter out some cells. This gives me the following two DataFrames, which are derived from the original: [images: first filtered DataFrame, second filtered DataFrame]
How can I merge these two DataFrames together and sort the result by the original index? Here is an example of the result I would like: [image: merged DataFrame example]
Since the two DataFrames share no index values, you can simply concatenate them and sort by index. Note that pandas.concat takes a list of DataFrames as its first argument:
merged = pandas.concat([first_filtered_df1, second_filtered_df2]).sort_index()
If Index is a regular column label rather than the real index, sort by that column instead:
merged = pandas.concat([first_filtered_df1, second_filtered_df2]).sort_values(by=['Index'])
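The concatenate-then-sort approach can be sketched end to end as follows. The frame and column names (original, value) and the two filters are assumptions made up for illustration, since the original data was only shown as images.

```python
import pandas as pd

# Hypothetical original frame with the default integer index 0..4.
original = pd.DataFrame({"value": [10, 20, 30, 40, 50]})

# Two filtered subsets whose index labels interleave and do not overlap.
first_filtered = original[original["value"] % 20 == 0]      # index 1, 3
second_filtered = original[original["value"].isin([10, 50])]  # index 0, 4

# concat takes a list; sort_index restores the original row order.
merged = pd.concat([first_filtered, second_filtered]).sort_index()
print(merged)
```

Because both subsets keep the index labels of the original frame, sorting by index is enough to recover the original ordering.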

Combining two pandas dataframes based on column AND row VALUES

First, I haven't found this asked before - probably because I'm not using the right words to ask it. So if it has been asked, please send me in that direction.
How can I combine two pandas DataFrames based on column AND row values? My main DataFrame has a column 'year' and a column 'county', among others. Ideally, I want to add another column, 'percent', from the second DataFrame below.
For example, I have this image of my first df:
and I have another data frame with the same 'year' column and every other column name is a string value in the original "main" dataframe's 'county' column:
How can I combine these two data frames in a way that adds another column to the 'main df'? It would be helpful to first put the second data frame in the format where there are three columns: 'year', 'county', and 'percent'. If anyone can help me with this part, I can merge it.
I think you will want to transform the second DataFrame so that it has one row for each year/county combination; then you can use a left join to combine the two. I believe the `melt` method will do this transformation. Try this:
melted_second_df = second_df.melt(id_vars=["year"], var_name="county", value_name="percent")
combined_df = first_df.merge(
    right=melted_second_df,
    on=["year", "county"],
    how="left",
)
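To make the reshaping step concrete, here is a small self-contained sketch of the same melt-then-merge idea. The sample data (county names Adams and Baker, the population column, and all numbers) is invented for illustration; only the column names year, county, and percent come from the question.

```python
import pandas as pd

# Hypothetical "main" frame: one row per year/county pair.
first_df = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020],
    "county": ["Adams", "Baker", "Adams", "Baker"],
    "population": [100, 200, 110, 210],
})

# Hypothetical wide frame: one column per county, one row per year.
second_df = pd.DataFrame({
    "year": [2019, 2020],
    "Adams": [1.5, 1.7],
    "Baker": [2.5, 2.7],
})

# Wide -> long: each county column becomes rows of (year, county, percent).
melted = second_df.melt(id_vars=["year"], var_name="county", value_name="percent")

# Left join keeps every row of first_df and adds the matching percent.
combined = first_df.merge(melted, on=["year", "county"], how="left")
print(combined)
```

With how="left", any year/county pair in the main frame that has no match in the melted frame would get NaN in percent rather than being dropped.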

Merging data from python list into one dataframe

I have the following files, AAMC_K.txt, AAU.txt, ACU.txt, and ACY.txt, in a folder called AMEX. I am trying to merge these text files into one DataFrame. I have tried to do so with pd.merge(), but I get an error saying that the merge function needs a right and a left parameter, and my data is in a Python list. How can I merge the data in data_list into one pandas DataFrame?
import pandas as pd
import os
textfile_names = os.listdir("AMEX")
textfile_names.sort()
data_list = []
for i in range(len(textfile_names)):
    data = pd.read_csv("AMEX/"+textfile_names[i], index_col=None, header=0)
    data_list.append(data)
frame = pd.merge(data_list, on='<DTYYYYMMDD>', how='outer')
AE.txt
<TICKER>,<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
AE,D,19970102,000000,12.6250,12.6250,11.7500,11.7500,144,0
AE,D,19970103,000000,11.8750,12.1250,11.8750,12.1250,25,0
AAU.txt
<TICKER>,<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
AAU,D,20020513,000000,0.4220,0.4220,0.4220,0.4220,0,0
AAU,D,20020514,000000,0.4177,0.4177,0.4177,0.4177,0,0
ACU.txt
<TICKER>,<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
ACU,D,19970102,000000,5.2500,5.3750,5.1250,5.1250,52,0
ACU,D,19970103,000000,5.1250,5.2500,5.0625,5.2500,12,0
ACY.txt
<TICKER>,<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
ACY,D,19980116,000000,9.7500,9.7500,8.8125,8.8125,289,0
ACY,D,19980120,000000,8.7500,8.7500,8.1250,8.1250,151,0
I want the rows to be matched on <DTYYYYMMDD> and put into one DataFrame, frame.
OUTPUT
<TICKER>,<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>,<TICKER>,<PER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
ACU,D,19970102,000000,5.2500,5.3750,5.1250,5.1250,52,0,AE,D,19970102,000000,12.6250,12.6250,11.7500,11.7500,144,0
ACU,D,19970103,000000,5.1250,5.2500,5.0625,5.2500,12,0,AE,D,19970103,000000,11.8750,12.1250,11.8750,12.1250,25,0
As @busybear says, pd.concat is the right tool for this job: frame = pd.concat(data_list).
merge is for when you're joining two dataframes which usually have some of the same columns and some different ones. You choose a column (or index or multiple) which identifies which rows in the two dataframes correspond to each other, and pandas handles making a dataframe whose rows are combinations of the corresponding rows in the two original dataframes. This function only works on 2 dataframes at a time; you'd have to do a loop to merge more in (it's uncommon to need to merge many dataframes this way).
concat is for when you have multiple dataframes and want to just append all of their rows or columns into one large dataframe. (Let's assume you're concatenating rows, as you want here.) It doesn't use an identifier to determine which rows correspond. All it does is create a new dataframe which has each row from each of the concatenated dataframes (all the rows from the first, then all from the second, etc.).
I think the above is a decent TLDR on merge vs concat but see here for a lengthy but much more comprehensive guide on using merge/join/concat with dataframes.
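The difference between the two can be sketched with tiny stand-ins for the ticker files. This is only an illustration under assumptions: the frames below keep just three of the original columns, and the tag_columns helper and the AE_/ACU_/AAU_ column prefixes are hypothetical names introduced here so that the merged columns do not collide. The merge loop is written with functools.reduce, which is one common way to do it, not the only one.

```python
from functools import reduce

import pandas as pd

# Hypothetical miniature versions of the per-ticker frames.
ae = pd.DataFrame({"<TICKER>": ["AE", "AE"],
                   "<DTYYYYMMDD>": [19970102, 19970103],
                   "<CLOSE>": [11.75, 12.125]})
acu = pd.DataFrame({"<TICKER>": ["ACU", "ACU"],
                    "<DTYYYYMMDD>": [19970102, 19970103],
                    "<CLOSE>": [5.125, 5.25]})
aau = pd.DataFrame({"<TICKER>": ["AAU"],
                    "<DTYYYYMMDD>": [20020513],
                    "<CLOSE>": [0.422]})
data_list = [ae, acu, aau]

# concat: stack all rows on top of each other, no row matching involved.
stacked = pd.concat(data_list, ignore_index=True)

def tag_columns(df):
    """Hypothetical helper: prefix value columns with the ticker so names stay unique."""
    ticker = df["<TICKER>"].iloc[0]
    return df.drop(columns="<TICKER>").rename(
        columns=lambda c: c if c == "<DTYYYYMMDD>" else f"{ticker}_{c}"
    )

# merge in a loop: pair frames up two at a time, matching rows on the date.
wide = reduce(
    lambda left, right: left.merge(right, on="<DTYYYYMMDD>", how="outer"),
    [tag_columns(df) for df in data_list],
)
print(stacked)
print(wide)
```

Here stacked has one row per original row, while wide has one row per date and one column per ticker, with NaN wherever a ticker has no data for that date.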
