I'm in awe that this isn't working because I've done this a hundred times. I want to stack two dataframes vertically and pandas is creating duplicate columns and refusing to put the data in the right columns.
df1 looks like this:
df2 looks like this:
then I run this:
frames = [df1,df2]
final = pd.concat(frames, ignore_index = True, axis = 0)
final
and get 6 columns instead of 3 like this:
I have no idea why two dataframes with identical column names and data types would not simply stack on top of each other. Any help appreciated.
Thanks.
update:
Big thanks to #Naveed: there was trailing whitespace in one of the dataframe's column names.
Here is the final fix:
df2.columns = [x.strip() for x in df2.columns]
frames = [df1,df2]
final = pd.concat(frames,ignore_index = True, axis = 0)
final
Try checking the column names. There might be whitespace in them that causes the columns to misalign after the concat.
display(df1.columns, df2.columns)
# make axis=0 and remove ignore_index
final = pd.concat(frames, axis = 0)
final
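To see why stray whitespace matters, here is a minimal sketch with made-up data (the column names `id`, `name` and `value` are assumptions, not the asker's real columns). It shows that `"id"` and `"id "` are treated as two different labels, and that stripping the names fixes the stacking:

```python
import pandas as pd

# Hypothetical data: df2 has a trailing space in one column name.
df1 = pd.DataFrame({"id": [1, 2], "name": ["a", "b"], "value": [10, 20]})
df2 = pd.DataFrame({"id ": [3], "name": ["c"], "value": [30]})

# repr() makes stray whitespace visible in each label.
print([repr(c) for c in df1.columns])
print([repr(c) for c in df2.columns])

# "id" and "id " are different labels, so concat keeps both columns.
broken = pd.concat([df1, df2], ignore_index=True)
print(broken.shape)

# Strip whitespace from every column name, then concatenate again.
df2.columns = df2.columns.str.strip()
final = pd.concat([df1, df2], ignore_index=True)
print(final.shape)
```

Printing the labels through `repr()` is just one way to spot the problem; comparing `list(df1.columns) == list(df2.columns)` would also flag it.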
Related
I have a pandas dataframe with the following data: (in csv)
#list1
poke_id,symbol
0,BTC
1,ETB
2,USDC
#list2
5,SOL
6,XRP
I am able to concatenate them into one dataframe using the following code:
df = pd.concat([df1, df2], ignore_index = True)
df = df.reset_index(drop = True)
df['poke_id'] = df.index
df = df[['poke_id','symbol']]
which gives me the output: (in csv)
poke_id,symbol
0,BTC
1,ETB
2,USDC
3,SOL
4,XRP
Is there any other way to do the same? I think rebuilding the whole data frame of ~4000 entries just to add ~100 more is a little pointless and cumbersome. How can I make it pick list 1 (or dataframe 1), find the highest poke_id, and then just do i + 1 for the later entries in list 2?
Your solution is good; it can be simplified:
df = pd.concat([df1, df2], ignore_index = True).rename_axis('poke_id').reset_index()
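For the "highest poke_id plus one" idea from the question, a possible sketch is below. It assumes both frames carry `poke_id` and `symbol` columns as in the CSV samples, and it only renumbers the new rows instead of touching the existing ones:

```python
import pandas as pd

# Data taken from the CSV samples in the question.
df1 = pd.DataFrame({"poke_id": [0, 1, 2], "symbol": ["BTC", "ETB", "USDC"]})
df2 = pd.DataFrame({"poke_id": [5, 6], "symbol": ["SOL", "XRP"]})

# Continue numbering from the highest existing poke_id.
start = int(df1["poke_id"].max()) + 1
df2 = df2.assign(poke_id=list(range(start, start + len(df2))))

# Only df2 was renumbered; df1 is stacked unchanged.
df = pd.concat([df1, df2], ignore_index=True)
print(df)
```

This relies on `poke_id` in the first frame already being consistent; if it has gaps, the new ids simply continue after the current maximum.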
Use indexes to get the data you want from the dataframe. Although this is not efficient if you want large amounts of data, this method lets you take specific pieces of data from the dataframe.
I want to concatenate two dataframes that have the shape
(261, 35) and (600,35). I expect in the end to get a df with the shape (861,35) but I get (861, 70). I used the following methods
dfs = [df1,df2]
conc_df = pd.concat(dfs)
and
df1.append(df2)
However I always get double the amount of columns. Can someone please help me out here?
Make sure the two data frames have the same column names:
df1.columns = df2.columns
conc_df = pd.concat([df1,df2])
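Before overwriting the names, it may help to check which labels actually differ. The sketch below uses small hypothetical frames whose names differ only in case; `Index.symmetric_difference` lists the labels that appear in just one of the two frames:

```python
import pandas as pd

# Hypothetical frames: same kind of data, but the column names differ in case.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"A": [5], "B": [6]})

# Labels that are present in only one of the two frames.
mismatched = df1.columns.symmetric_difference(df2.columns)
print(list(mismatched))

# With different labels, concat produces the union of the columns.
doubled = pd.concat([df1, df2])
print(doubled.shape)

# Positional rename: only safe when both frames list their columns in the same order.
df1.columns = df2.columns
conc_df = pd.concat([df1, df2], ignore_index=True)
print(conc_df.shape)
```

Note that assigning `df1.columns = df2.columns` renames by position, so it silently mislabels data if the column order differs between the frames.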
I want to concatenate two data frames of the same length, by adding a column to the first one (df).
But because certain df rows are being filtered, it seems the index isn't matching.
import io
import pandas as pd
# `uploaded` comes from an earlier upload step that is not shown here
df = pd.read_csv(io.StringIO(uploaded['customer.csv'].decode('utf-8')), sep=";")
df["Margin"] = df["Sales"]-df["Cost"]
df = df.loc[df["Margin"]>-100000]
df = df.loc[df["Sales"]> 1000]
df.reindex()
df
This returns:
So this operation:
customerCluster = pd.concat([df, clusters], axis = 1, ignore_index= True)
print(customerCluster)
Is returning:
So, I've tried reindex and the argument ignore_index = True as you can see in above code snippet.
Thanks for all the answers. If anyone encounters the same problem, the solution I found was this:
customerID = df["CustomerID"]
customerID = customerID.reset_index(drop=True)
df = df.reset_index(drop=True)
So, basically, the indexes of both data frames are now matching, thus:
customerCluster = pd.concat((customerID, clusters), axis = 1)
This concatenates the two data frames correctly.
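The underlying behaviour can be reproduced with a small sketch. The data is invented (only the `CustomerID` and `Sales` names come from the question, and `clusters` is assumed to have a fresh 0..n-1 index). With `axis=1`, `pd.concat` matches rows by index label, so a filtered frame that kept its old labels does not line up:

```python
import pandas as pd

# Hypothetical data; filtering keeps the original index labels 1 and 2.
df = pd.DataFrame({"CustomerID": [101, 102, 103, 104],
                   "Sales": [500, 2000, 3000, 800]})
df = df.loc[df["Sales"] > 1000]

# Assumed: clusters was built separately and has index labels 0 and 1.
clusters = pd.DataFrame({"cluster": [0, 1]})

# Rows are matched by label, so labels 0, 1, 2 produce three rows with NaN gaps.
misaligned = pd.concat([df, clusters], axis=1)
print(misaligned)

# Resetting the index makes both frames use labels 0 and 1.
aligned = pd.concat([df.reset_index(drop=True), clusters], axis=1)
print(aligned)
```

`ignore_index=True` does not help here, because with `axis=1` it relabels the columns rather than the rows.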
I am reading two dataframes looking at one column and then showing the difference in position between the two dataframe with a -1 or +1 etc.
I have tried the following code, but it only shows 0 in Position Change when there should be a difference between British Airways and Ryanair.
first = pd.read_csv("C:\\Users\\airma\\PycharmProjects\\Vatsim_Stats\\Vatsim_stats\\Base.csv", encoding='unicode_escape')
df1 = pd.DataFrame(first, columns=['airlines', 'Position'])
second = pd.read_csv("C:\\Users\\airma\\PycharmProjects\\Vatsim_Stats\\Vatsim_stats\\Base2.csv", encoding='unicode_escape')
df2 = pd.DataFrame(second, columns=['airlines', 'Position'])
df1['Position Change'] = np.where(df1['airlines'] == df2['airlines'], 0, df1['Position'] - df2['Position'])
I have also tried the following code, but I keep getting a ValueError: cannot reindex from a duplicate axis
df1.set_index('airlines', drop=False) # Set index to cross reference by (icao)
df2.set_index('airlines', drop=False)
df2['Position Change'] = df1[['Position']].sub(df2['Position'], axis=0)
df2 = df2.reset_index(drop=True)
pd.set_option('display.precision', 0)
Base csv looks like this -
and Base2 csv looks like this -
As you can see, British Airways is in position 3 in the Base csv and position 4 in the Base2 csv, but when I run the code it just shows 0 and does not do the math between the two dataframes.
Have been stuck on this for days now, would be so grateful for any help.
I would like to offer an easier way based on columns, values and an if-statement.
It is probably not very practical for a big dataframe, but it can give you the information you expect.
first = pd.read_csv("C:\\Users\\airma\\PycharmProjects\\Vatsim_Stats\\Vatsim_stats\\Base.csv", encoding='unicode_escape')
df1 = pd.DataFrame(first, columns=['airlines', 'Position'])
second = pd.read_csv("C:\\Users\\airma\\PycharmProjects\\Vatsim_Stats\\Vatsim_stats\\Base2.csv", encoding='unicode_escape')
df2 = pd.DataFrame(second, columns=['airlines', 'Position'])
I agree that my answer did not address your question.
Now, if I understand correctly, you want to create a new column in the DataFrame that gives -1 if the values in the same column of the two DataFrames differ and 1 if they match.
This should help:
key = "Name_Of_Column"
new = []
for i in range(len(df1)):
    if df1[key][i] != df2[key][i]:
        new.append(-1)
    else:
        new.append(1)
df3 = pd.DataFrame({"Diff": new})  # build a one-column DataFrame from a dictionary
df1 = pd.concat([df1, df3], axis=1)  # attach "Diff" as a new column (DataFrame.append was removed in pandas 2.0)
print(df1)
I am giving you an alternative. I am not sure whether it is what you want, but it is an idea.
After reading the two CSVs and selecting the columns you require, why don't you try joining the two dataframes on the column 'airlines'? That will merge the two dataframes with 'airlines' as the key.
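A rough sketch of that join is shown below. The airline names and positions are hypothetical stand-ins for Base.csv and Base2.csv (only the British Airways move from 3 to 4 is taken from the question), and the `_old`/`_new` suffixes are just illustrative names:

```python
import pandas as pd

# Hypothetical stand-ins for the contents of Base.csv and Base2.csv.
df1 = pd.DataFrame({"airlines": ["Ryanair", "easyJet", "British Airways", "Lufthansa"],
                    "Position": [1, 2, 3, 4]})
df2 = pd.DataFrame({"airlines": ["easyJet", "Ryanair", "Lufthansa", "British Airways"],
                    "Position": [1, 2, 3, 4]})

# Match rows by airline name instead of by row order.
# validate="one_to_one" raises an error if an airline appears twice in either frame.
merged = df1.merge(df2, on="airlines", suffixes=("_old", "_new"),
                   validate="one_to_one")

# Positive means the airline moved up the table, negative means it dropped.
merged["Position Change"] = merged["Position_old"] - merged["Position_new"]
print(merged)
```

Matching on the key also avoids the row-by-row comparison in the original `np.where` call, which only works when both files list the airlines in the same order.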
The goal is to essentially combine the two databases and keep the alphabetical headers from the Tk1P dataframe while integrating the data from the Tk1L dataframe. Unfortunately I am getting this unintended result when trying to merge. Please see the link below the code to giphy for the output screen which shows both databases and the concat result. If anyone has ideas it would be very helpful. Thanks in advance.
Tk1D = pd.read_excel('C:\\Users\\Sam\\Desktop\\DF2.xlsx',1)
Tk1D = Tk1D.dropna()
Tk1D.drop(Tk1D.columns[[0, 1, 10]], inplace=True, axis=1)
#print("Tk1D: ", len(Tk1D), 'X', len(Tk1D.columns))
print('----------------------------------------------------------')
Tk1P = Tk1D.drop(['NT', 'PT'], axis=1)
Tk1P = Tk1P.drop(Tk1P.index[2:10035])
print(Tk1P)
print("Tk1P: ", len(Tk1P), 'X', len(Tk1P.columns))
print('----------------------------------------------------------')
Tk1L = xw.Book('C:\\Users\\Sam\\Desktop\\DF2.xlsx').sheets[1]
Tk1L = Tk1L.range('A2:N2').value
Tk1L = pd.DataFrame([Tk1L])
Tk1L.drop(Tk1L.columns[[0, 1, 10, 11, 12]], inplace=True, axis=1)
print(Tk1L)
print("Tk1L: ", len(Tk1L), 'X', len(Tk1L.columns))
print('----------------------------------------------------------')
TKP = pd.DataFrame(Tk1P.iloc[0]).transpose()
TKP.columns = Tk1P.columns
TKP = pd.concat([Tk1L, TKP], ignore_index=True)
print(TKP)
Giphy Dataframe and Concat Output
From your output it seems that you concatenated the dataframes along the columns axis.
Try append instead of concat.
Tk1L.append(TKP)
EDIT
Look at the following two lines
TKP.columns = Tk1P.columns
TKP = pd.concat([Tk1L, TKP], ignore_index=True)
You set the column names of TKP equal to the column names of Tk1P in the first line. But in the second line, you append TKP to Tk1L (!). So the following should solve your problem:
TKP.columns = Tk1L.columns
TKP = pd.concat([Tk1L, TKP], ignore_index=True)
I therefore guess that you simply mixed this up.
EDIT TWO
There is another problem in your code.
TKP = pd.DataFrame(Tk1P.iloc[0]).transpose()
The transpose likely messes things up. Tk1P is a 2 x 9 data frame. But when you transpose this, you get a 9 x 2 data frame. So, if you delete the transpose, you should be fine.
EDIT THREE
If you want alphabetic column names, do
TKP.columns = Tk1P.columns
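As a minimal sketch of why the labels matter here: a frame built from a bare list, as `Tk1L` is, gets integer column labels, so it will not line up with a frame that has alphabetic headers. The values and the headers `A`, `B`, `C` below are hypothetical placeholders for the real data:

```python
import pandas as pd

# A frame built from a bare list gets integer column labels 0, 1, 2.
Tk1L = pd.DataFrame([[1.0, 2.0, 3.0]])

# Hypothetical alphabetic headers standing in for Tk1P's columns.
TKP = pd.DataFrame([[4.0, 5.0, 6.0]], columns=["A", "B", "C"])

# The labels differ, so concat keeps all six columns and fills the gaps with NaN.
mixed = pd.concat([Tk1L, TKP], ignore_index=True)
print(mixed.shape)

# Give both frames the same labels first, then stack them.
Tk1L.columns = TKP.columns
combined = pd.concat([Tk1L, TKP], ignore_index=True)
print(combined)
```

Whichever set of labels is kept, both frames need to share it before the `pd.concat` call for the rows to stack into the same columns.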