I want to merge a DataFrame with a CSV

I want to merge a DataFrame with a CSV - python

I need to merge 1 df with 1 csv.
df1 contains only 1 columns (id list of the product I want to update)
df2 contains 2 columns (id of all the products, quantity)
df1=pd.read_csv(id_file, header=0, index_col=False)
df2 = pd.DataFrame(data=result_q)
df3=pd.merge(df1, df2)
What I want: a dataframe that contains only id from csv/df1 merge with the quantities of df2 for the same id

if you want only the products that ya have in first data_frame you can use this:
df_1
Out[11]:
id
0 1
1 2
2 4
3 5
df_2
Out[12]:
id prod
0 1 a
1 2 b
2 3 c
3 4 d
4 5 e
5 6 f
6 7 g
7 8 h
df_3 = df_1.merge(df_2,on='id')
df_3
Out[14]:
id prod
0 1 a
1 2 b
2 4 d
3 5 e
you neede use the parameter on='column' so the will generate a new df only with the correspondent rows that have the same id.

you can use new_df= pd.merge(df1,df2, on=['Product_id'])

I've found the solution. I needed to reset the index for my df2
df1=pd.read_csv(id_file)
df2 = pd.DataFrame(data=result_q).reset_index()
df1['id'] = pd.to_numeric(df1['id'], errors = 'coerce')
df2['id'] = pd.to_numeric(df2['id'], errors = 'coerce')
df3=df1.merge(df2, on='id')
Thank you everyone!

Related

lookup value in the pandas dataframe using the muliple values in the row of another dataframe

I have dataframes:
df1:
| |A|B|C|D|E|
|0|1|2|3|4|5|
|1|1|3|4|5|0|
|2|3|1|2|3|5|
|3|2|3|1|2|6|
|4|2|5|1|2|3|
df2:
| |K|L|M|N|
|0|1|3|4|2|
|1|1|2|5|3|
|2|3|2|3|1|
|3|1|4|5|0|
|4|2|2|3|6|
|5|2|1|2|7|
What I need to do is match column A of df1 with column k of df2; column C of df1 with L of df2; and column D of df1 with column M of df2. If the values are matched the corresponding value of N in df2 should be assigned to a new column F in df1. The output should be:
| |A|B|C|D|E|F|
|0|1|2|3|4|5|2|
|1|1|3|4|5|0|0|
|2|3|1|2|3|5|1|
|3|2|3|1|2|6|7|
|4|2|5|1|2|3|7|

Use DataFrame.merge with left join and rename columns for match:
df = df1.merge(df2.rename(columns={'K':'A','L':'C','M':'D', 'N':'F'}), how='left')
print (df)
A B C D E F
0 1 2 3 4 5 2
1 1 3 4 5 0 0
2 3 1 2 3 5 1
3 2 3 1 2 6 7
4 2 5 1 2 3 7

df3 = df1.join(df2)
F = []
for _, row in df3.iterrows():
if row['A'] == row['K'] and row['C'] == row['L'] and row['D'] == row['M']:
F.append(row['N'])
else:
F.append(0)
df1['F'] = F
df1

Copy column value from one dataframe to another based on id in Pandas

I am trying to copy Name from df2 into df1 where ID is common between both dataframes.
df1:
ID Name
1 A
2 B
4 C
16 D
7 E
df2:
ID Name
1 X
2 Y
7 Z
Expected Output:
ID Name
1 X
2 Y
4 C
16 D
7 Z
I have tried like this, but it didn't worked. I am not able to understand how to assign value here. I am assigning =df2['Name'] which is wrong.
for i in df2["ID"].tolist():
df1['Name'].loc[(df1['ID'] == i)] = df2['Name']

Try with update
df1 = df1.set_index('ID')
df1.update(df2.set_index('ID'))
df1 = df1.reset_index()
df1
Out[476]:
ID Name
0 1 X
1 2 Y
2 4 C
3 16 D
4 7 Z

If the order of rows does not matter, then concatenate two dfs and drop_duplicates will achieve the result,
df2.append(df1).drop_duplicates(subset='ID')

another solution would be,
s = df1["Name"]
df1.loc[:,"Name"]=df1["ID"].map(df2.set_index("ID")["Name"].to_dict()).fillna(s)
o/P:
ID Name
0 1 X
1 2 Y
2 4 C
3 16 D
4 7 Z

One more for consideration
df,dg = df1,df2
df = df.set_index('ID')
dg = dg.set_index('ID')
df.loc[dg.index,:] = dg # All columns
#df.loc[dg.index,'Name'] = dg.Name # Single column
df = df.reset_index()
>>> df
ID Name
0 1 X
1 2 Y
2 4 C
3 16 D
4 7 Z
Or for a single column (index for both is 'ID'

Pandas: Force position of column in DataFrame (without knowing all columns)

Let's say I have a DataFrame and don't know the names of all columns. However, I know there's a column called "N_DOC" and I want this to be the first column of the DataFrame - (while keeping all other columns, regardless its order).
How can I do this?

You can reorder the columns of a datframe with reindex:
cols = df.columns.tolist()
cols.remove('N_DOC')
df.reindex(['N_DOC'] + cols, axis=1)

Use DataFrame.insert with DataFrame.pop for extract column:
df = pd.DataFrame({
'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'N_DOC':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')
})
c = 'N_DOC'
df.insert(0, c, df.pop(c))
Or:
df.insert(0, 'N_DOC', df.pop('N_DOC'))
print (df)
N_DOC A B C E F
0 1 a 4 7 5 a
1 3 b 5 8 3 a
2 5 c 4 9 6 a
3 7 d 5 4 9 b
4 1 e 5 2 2 b
5 0 f 4 3 4 b

Here's a simple, one line, solution using DataFrame masking:
import pandas as pd
# Building sample dataset.
cols = ['N_DOCa', 'N_DOCb', 'N_DOCc', 'N_DOCd', 'N_DOCe', 'N_DOC']
df = pd.DataFrame(columns=cols)
# Re-order columns.
df = df[['N_DOC'] + df.columns.drop('N_DOC').tolist()]
Before:
Index(['N_DOCa', 'N_DOCb', 'N_DOCc', 'N_DOCd', 'N_DOCe', 'N_DOC'], dtype='object')
After:
Index(['N_DOC', 'N_DOCa', 'N_DOCb', 'N_DOCc', 'N_DOCd', 'N_DOCe'], dtype='object')

How to join column values in pandas MultiIndex DataFrame?

How can I join values in columns with the same name in MultiIndex pandas DataFrame?
data = [['1','1','2','3','4'],['2','5','6','7','8']]
df = pd.DataFrame(data, columns=['id','A','B','A','B'])
df = df.set_index('id')
df.columns = pd.MultiIndex.from_tuples([('result','A'),('result','B'),('student','A'),('student','B')])
df
result student
A B A B
id
1 1 2 3 4
2 5 6 7 8
Desired results:
A B
id
1 "1 3" "2 4"
2 "5 7" "6 8"

I am not completely sure what you are asking. If you have two separate dataframes then you should be able to just use pd.concat.
pd.concat([df1, df2], axis=1)
If you have one dataframe then just drop the top level of the index.
df.columns = df.columns.droplevel(0)

New answer:
For join values by second level of MultiIndex in columns use groupby with agg:
#select columns define in list
df = df[['result','student']]
df1 = df.astype(str).groupby(level=1, axis=1).agg(' '.join)
print (df1)
A B
id
1 1 3 2 4
2 5 7 6 8
Old answer:
You can use sort_index for sorting columns and then droplevel for remove first level of MultiIndex.
But get duplicate columns names.
print (df)
result student col
A B A B A B
id
1 1 2 3 4 6 7
2 5 6 7 8 2 1
#select columns define in list
df = df[['result','student']]
print (df)
result student
A B A B
id
1 1 2 3 4
2 5 6 7 8
df = df.sort_index(axis=1, level=1)
df.columns = df.columns.droplevel(0)
print (df)
A A B B
id
1 1 3 2 4
2 5 7 6 8
So better, unique columns names can be created by map with join:
df = df.sort_index(axis=1, level=1)
df.columns = df.columns.map('_'.join)
print (df)
result_A student_A result_B student_B
id
1 1 3 2 4
2 5 7 6 8
df = pd.concat([df['result'],df['student']], axis=1).sort_index(axis=1)
print (df)
A A B B
id
1 1 3 2 4
2 5 7 6 8

pandas - multiple columns to "column name - value" columns

I have a pandas dataframe with multiple columns and I want to "flatten" it to just two columns - one with column name and the other with values. E.g.
df1 = pd.DataFrame({'A':[1,2],'B':[2,3], 'C':[3,4]})
How can I convert it to look like:
df2 = pd.DataFrame({'column name': ['A','A','B','B','C','C'], 'value': [1,2,2,3,3,4]})

You can stack to stack all column values into a single, column, then drop the first level index calling reset_index, overwrite the column names with the ones you desire and then finally sort using sort_values:
In [37]:
df2 = df1.stack().reset_index(level=0, drop=True).reset_index()
df2.columns = ['column name', 'value']
df2.sort_values(['column name', 'value'], inplace=True)
df2
Out[37]:
column name value
0 A 1
3 A 2
1 B 2
4 B 3
2 C 3
5 C 4

You can reshape by stack to MultiIndex Series and then reset_index with sort_values:
df2 = df1.stack().reset_index(level=0, drop=True).reset_index().sort_values('index')
df2.columns = ['column name','value']
print (df2)
column name value
0 A 1
3 A 2
1 B 2
4 B 3
2 C 3
5 C 4
One row solution with rename column index to column name:
df2 = df1.stack()
.reset_index(level=0, drop=True)
.reset_index(name='value')
.sort_values(['index'])
.rename(columns={'index':'column name'})
print (df2)
column name value
0 A 1
3 A 2
1 B 2
4 B 3
2 C 3
5 C 4
If need sort by both columns:
df2 = df1.stack().reset_index(level=0, drop=True).reset_index().sort_values(['index',0])
df2.columns = ['column name','value']
print (df2)
column name value
0 A 1
3 A 2
1 B 2
4 B 3
2 C 3
5 C 4

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

I want to merge a DataFrame with a CSV - python

you can use new_df= pd.merge(df1,df2, on=['Product_id'])

I've found the solution. I needed to reset the index for my df2 df1=pd.read_csv(id_file) df2 = pd.DataFrame(data=result_q).reset_index() df1['id'] = pd.to_numeric(df1['id'], errors = 'coerce') df2['id'] = pd.to_numeric(df2['id'], errors = 'coerce') df3=df1.merge(df2, on='id') Thank you everyone!

Related

lookup value in the pandas dataframe using the muliple values in the row of another dataframe

Copy column value from one dataframe to another based on id in Pandas

Pandas: Force position of column in DataFrame (without knowing all columns)

How to join column values in pandas MultiIndex DataFrame?

pandas - multiple columns to "column name - value" columns

Categories

Resources