I have a dataframe as below:
df1 = pd.DataFrame({'Name1':['A','Q','A','B','B','C','C','C','E','E','E'],
                    'Name2':['B','C','D','C','D','D','A','B','A','B','C'],
                    'Marks2':[10,20,6,50,88,23,140,9,60,65,70]})
df1
#created a new frame
new=df1.loc[(df1['Marks2'] <= 50)]
new
#created a pivot table
temp=new.pivot_table(index="Name1", columns="Name2", values="Marks2")
temp
I tried to re-index the pivot table.
new_value=['E']
order = new_value+list(temp.index.difference(new_value))
matrix=temp.reindex(index=order, columns=order)
matrix
But the values related to 'E' are not present in the pivot table, even though df1 contains rows for E. I need the values related to E to appear in the pivot_table.
Expected output:
Based on the comments, my understanding of the intended result is:
E A B C D
E NaN 60.0 65.0 70.0 NaN
A NaN NaN 10.0 NaN 6.0
C NaN NaN 9.0 NaN 23.0
Q NaN NaN NaN 20.0 NaN
Code:
Uncomment the included #print() statements to see what each step does.
In particular, the formatting steps at the end can be adapted to your needs.
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'Name1':['A','Q','A','B','B','C','C','C','E','E','E'],
'Name2':['B','C','D','C','D','D','A','B','A','B','C'],
'Marks2':[10,20,6,50, 88,23,140,9,60,65,70]})
df1['Marks2'] = np.where( (df1['Marks2'] >= 50) & (df1['Name1'] != 'E'),
np.nan, df1['Marks2'])
#print(df1)
temp=df1.pivot_table(index="Name1", columns="Name2", values="Marks2")
#print(temp)
name1_to_move = 'E'
# build new index with name1_to_move at the start (top in df idx)
idx=temp.index.tolist()
idx.pop(idx.index(name1_to_move))
idx.insert(0, name1_to_move)
# moving the row to top by reindex
temp=temp.reindex(idx)
#print(temp)
temp.insert(loc=0, column=name1_to_move, value=np.nan)
#print(temp)
temp.index.name = None
#print(temp)
temp = temp.rename_axis(None, axis=1)
print(temp)
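An equivalent sketch that keeps the E rows with a boolean mask instead of overwriting the other large marks with NaN (same output):

```python
import pandas as pd

df1 = pd.DataFrame({'Name1': ['A','Q','A','B','B','C','C','C','E','E','E'],
                    'Name2': ['B','C','D','C','D','D','A','B','A','B','C'],
                    'Marks2': [10,20,6,50,88,23,140,9,60,65,70]})

# keep a row if its mark is below 50 or it belongs to 'E'
mask = (df1['Marks2'] < 50) | (df1['Name1'] == 'E')
temp = df1[mask].pivot_table(index='Name1', columns='Name2', values='Marks2')

# move 'E' to the front of the index and add an empty 'E' column
order = ['E'] + [i for i in temp.index if i != 'E']
temp = (temp.reindex(index=order, columns=['E'] + temp.columns.tolist())
            .rename_axis(index=None, columns=None))
print(temp)
```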
I have two dataframes like below,
import numpy as np
import pandas as pd
df1 = pd.DataFrame({1: np.zeros(5), 2: np.zeros(5)}, index=['a','b','c','d','e'])
and
df2 = pd.DataFrame({'category': [1,1,2,2], 'value':[85,46, 39, 22]}, index=[0, 1, 3, 4])
The values from the second dataframe need to be assigned to the first dataframe so that the index and column relationship is maintained. The second dataframe's index is positional (iloc-based), and its category column contains column names of the first dataframe; value holds the values to assign.
Following is my solution with the expected output:
for _category in df2['category'].unique():
    rows = df2[df2['category'] == _category]
    df1.loc[df1.iloc[rows.index.tolist()].index, _category] = rows['value'].values
Is there a pythonic way of doing so without the for loop?
One option is to pivot and update:
df3 = df1.reset_index()
df3.update(df2.pivot(columns='category', values='value'))
df3 = df3.set_index('index').rename_axis(None)
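Run end-to-end on the question's data, the update approach looks like this:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({1: np.zeros(5), 2: np.zeros(5)}, index=['a','b','c','d','e'])
df2 = pd.DataFrame({'category': [1,1,2,2], 'value': [85,46,39,22]}, index=[0,1,3,4])

# reset_index gives df1 a 0..4 RangeIndex that matches df2's positional index
df3 = df1.reset_index()
# pivot spreads df2's values into columns 1 and 2 at rows 0, 1, 3, 4
df3.update(df2.pivot(columns='category', values='value'))
df3 = df3.set_index('index').rename_axis(None)
print(df3)
```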
Alternatively, reindex df2 (in two steps: numerically, then by label) and combine_first with df1:
df3 = (df2
       .pivot(columns='category', values='value')
       .reindex(range(max(df2.index) + 1))
       .set_axis(df1.index)
       .combine_first(df1)
       )
output:
1 2
a 85.0 0.0
b 46.0 0.0
c 0.0 0.0
d 0.0 39.0
e 0.0 22.0
Here's one way: replace the 0s in df1 with NaN, pivot df2, and fill the NaNs in df1 from df2:
out = (df1.replace(0, pd.NA).reset_index()
.fillna(df2.pivot(columns='category', values='value'))
.set_index('index').rename_axis(None).fillna(0))
Output:
1 2
a 85.0 0.0
b 46.0 0.0
c 0.0 0.0
d 0.0 39.0
e 0.0 22.0
I tried to merge two dataframes by appending the first line of the second df to the first line of the first df. I also tried to concatenate them, but both attempts failed.
The format of the Data is
1,3,N0128,Durchm.,5.0,0.1,5.0760000000000005,0.076,-----****--
2,0.000,,,,,,,
3,3,N0129,Position,62.2,0.376,62.238,0.136,***---
4,76.1,-36.000,0.300,-36.057,,,,
5,2,N0130,Durchm.,5.0,0.1,5.067,0.067,-----***---
6,0.000,,,,,,,
The expected format of the output should be
1,3,N0128,Durchm.,5.0,0.1,5.0760000000000005,0.076,-----****--,0.000,,,,,,,
2,3,N0129,Position,62.2,0.376,62.238,0.136,***---,76.1,-36.000,0.300,-36.057,,,,
3,2,N0130,Durchm.,5.0,0.1,5.067,0.067,-----***---,0.000,,,,,,,
I already split the dataframe above into two frames; the first contains only the odd indexes and the second the even ones.
My problem now is to merge/concatenate the two frames by appending the first row of the second df to the first row of the first df. I already tried several merge/concatenate methods, but all of them failed. The print functions are not necessary; I only use them for a quick overview in the console.
The code which I felt most comfortable with is:
import os
import pandas as pd

os.chdir(output)
csv_files = os.listdir('.')
for csv_file in csv_files:
    if csv_file.endswith(".asc.csv"):
        df = pd.read_csv(csv_file)
        keep_col = ['Messpunkt', 'Zeichnungspunkt', 'Eigenschaft', 'Position',
                    'Sollmass', 'Toleranz', 'Abweichung', 'Lage']
        new_df = df[keep_col]
        new_df = new_df[~new_df['Messpunkt'].isin(['**Teil'])]
        new_df = new_df[~new_df['Messpunkt'].isin(['**KS-Oben'])]
        new_df = new_df[~new_df['Messpunkt'].isin(['**KS-Unten'])]
        new_df = new_df[~new_df['Messpunkt'].isin(['**N'])]
        print(new_df)
        new_df.to_csv(output + csv_file)
        df1 = new_df[new_df.index % 2 == 1]
        df2 = new_df[new_df.index % 2 == 0]
        df1.reset_index()
        df2.reset_index()
        print(df1)
        print(df2)
        merge_df = pd.concat([df1, df2], axis=1)
        print(merge_df)
        merge_df.to_csv(output + csv_file)
I highly appreciate some help.
With this code, the output is:
1,3,N0128,Durchm.,5.0,0.1,5.0760000000000005,0.076,-----****--,,,,,,,,
2,,,,,,,,,0.000,,,,,,,
3,3,N0129,Position,62.2,0.376,62.238,0.136,***---,,,,,,,,
4,,,,,,,,,76.1,-36.000,0.300,-36.057,,,,
5,2,N0130,Durchm.,5.0,0.1,5.067,0.067,-----***---,,,,,,,,
6,,,,,,,,,0.000,,,,,,,
I get the expected result when I use reset_index() so that both DataFrames share the same index.
You may also need drop=True so the old index is not added as a new column:
pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
Minimal working example. I use io only to simulate a file in memory.
text = '''1,3,N0128,Durchm.,5.0,0.1,5.0760000000000005,0.076,-----****--
2,0.000,,,,,,,
3,3,N0129,Position,62.2,0.376,62.238,0.136,***---
4,76.1,-36.000,0.300,-36.057,,,,
5,2,N0130,Durchm.,5.0,0.1,5.067,0.067,-----***---
6,0.000,,,,,,,'''
import pandas as pd
import io
pd.options.display.max_columns = 20 # to display all columns
df = pd.read_csv(io.StringIO(text), header=None, index_col=0)
#print(df)
df1 = df[df.index % 2 == 1] # .reset_index(drop=True)
df2 = df[df.index % 2 == 0] # .reset_index(drop=True)
#print(df1)
#print(df2)
merge_df = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
print(merge_df)
Result:
1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
0 3.0 N0128 Durchm. 5.0 0.100 5.076 0.076 -----****-- 0.0 NaN NaN NaN NaN NaN NaN NaN
1 3.0 N0129 Position 62.2 0.376 62.238 0.136 ***--- 76.1 -36.000 0.300 -36.057 NaN NaN NaN NaN
2 2.0 N0130 Durchm. 5.0 0.100 5.067 0.067 -----***--- 0.0 NaN NaN NaN NaN NaN NaN NaN
EDIT:
It may need
merge_df.index = merge_df.index + 1
to correct index.
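An equivalent sketch that slices alternate rows positionally with iloc instead of the modulo test (same result on the sample data):

```python
import io
import pandas as pd

text = '''1,3,N0128,Durchm.,5.0,0.1,5.0760000000000005,0.076,-----****--
2,0.000,,,,,,,
3,3,N0129,Position,62.2,0.376,62.238,0.136,***---
4,76.1,-36.000,0.300,-36.057,,,,
5,2,N0130,Durchm.,5.0,0.1,5.067,0.067,-----***---
6,0.000,,,,,,,'''

df = pd.read_csv(io.StringIO(text), header=None, index_col=0)

# every other row, starting at positions 0 and 1 respectively
left = df.iloc[::2].reset_index(drop=True)
right = df.iloc[1::2].reset_index(drop=True)
merge_df = pd.concat([left, right], axis=1)
merge_df.index = merge_df.index + 1  # 1-based index as in the question
print(merge_df)
```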
I'm trying to merge 3 dataframes by index, so far unsuccessfully.
Here is the code:
import pandas as pd
from functools import reduce
#identifying csvs
x='/home/'
csvpaths = ("Data1.csv", "Data2.csv", "Data3.csv")
dfs = list() # an empty list
#creating dataframes based on number of csvs
for i in range(len(csvpaths)):
    dfs.append(pd.read_csv(str(x) + csvpaths[i], index_col=0))
print(dfs[1])
#creating suffix for each dataframe's columns
S=[]
for y in csvpaths:
    s = str(y).split('.csv')[0]
    S.append(s)
print(S)
#merging attempt
dfx = lambda a,b: pd.merge(a,b,on='SHIP_ID',suffixes=(S)), dfs
print(dfx)
print(dfx.columns)
If I try to export it as csv I get the following error (and a similar error when I try to print dfx.columns):
'tuple' object has no attribute 'to_csv'
The output I want is a merger of the 3 dataframes, with respective suffixes; please help.
[Note: the table below is very simplified; the original consists of dozens of columns and thousands of rows, hence a practical merging method is required.]
Try:
for s, el in zip(suffixes, dfs):
    el.columns = [str(col) + s for col in el.columns]
dfx = pd.concat(dfs, sort=False, axis=1)
For the test case I used:
import pandas as pd
dfs=[pd.DataFrame({"x": [1,2,7], "y": list("ghi")}), pd.DataFrame({"x": [5,6], "z": [4,4]}), pd.DataFrame({"x": list("acgjksd")})]
suffixes=["_1", "_2", "_3"]
for s, el in zip(suffixes, dfs):
    el.columns = [str(col) + s for col in el.columns]
>>> pd.concat(dfs, sort=False, axis=1)
x_1 y_1 x_2 z_2 x_3
0 1.0 g 5.0 4.0 a
1 2.0 h 6.0 4.0 c
2 7.0 i NaN NaN g
3 NaN NaN NaN NaN j
4 NaN NaN NaN NaN k
5 NaN NaN NaN NaN s
6 NaN NaN NaN NaN d
Edit:
for s, el in zip(suffixes, dfs):
    el.set_index('ID', inplace=True)
    el.columns = [str(col) + s for col in el.columns]
dfx = pd.concat(dfs, ignore_index=False, sort=False, axis=1).reset_index()
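If you want to stick with pd.merge and reduce (as the question's imports suggest), note that the original line built a tuple instead of calling reduce. A minimal sketch with made-up toy frames (SPEED is a hypothetical column; SHIP_ID is the shared key from the question):

```python
import pandas as pd
from functools import reduce

# toy stand-ins for the three CSVs
dfs = [pd.DataFrame({'SHIP_ID': [1, 2], 'SPEED': [10, 20]}),
       pd.DataFrame({'SHIP_ID': [1, 2], 'SPEED': [30, 40]}),
       pd.DataFrame({'SHIP_ID': [1, 2], 'SPEED': [50, 60]})]
suffixes = ['_Data1', '_Data2', '_Data3']

# suffix every non-key column up front, then fold the list with pd.merge
for s, el in zip(suffixes, dfs):
    el.columns = [c if c == 'SHIP_ID' else c + s for c in el.columns]
dfx = reduce(lambda a, b: pd.merge(a, b, on='SHIP_ID'), dfs)
print(dfx)
```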
I have a 2-D numpy array each row of which consists of three elements - ['dataframe_column_name', 'dataframe_index', 'value'].
Now, I tried populating the pandas dataframe using iloc in a for loop, but it is quite slow. Is there any faster way of doing this? I am a bit new to pandas, so apologies in case this is something very basic.
Here is the code snippet :
my_nparray = np.array([['a', 1, 123], ['b', 1, 230], ['a', 2, 321]], dtype=object)
for r in range(my_nparray.shape[0]):
    col, ind, value = my_nparray[r]
    df.loc[ind, col] = value
This takes a lot of time when my_nparray is large, is there any other way of doing this?
Initially assume that I can create this data frame :
'a' 'b'
1 NaN NaN
2 NaN NaN
I want the output as :
'a' 'b'
1 123 230
2 321 NaN
You can use from_records and then pivot:
df = pd.DataFrame.from_records(my_nparray, index=1).pivot(columns=0)
2
0 a b
1
1 123.0 230.0
2 321.0 NaN
This specifies that the index uses field 1 from your array, and pivot uses column 0 for the column labels.
Then we can reset the MultiIndex on the columns and the index:
df.columns = df.columns.droplevel(None)
df.columns.name = None
df.index.name = None
a b
1 123.0 230.0
2 321.0 NaN
Use DataFrame constructor with DataFrame.pivot and DataFrame.rename_axis:
df = pd.DataFrame(my_nparray).pivot(index=1, columns=0, values=2).rename_axis(index=None, columns=None)
print (df)
a b
1 123.0 230.0
2 321.0 NaN
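As a quick check, the same pivot run end-to-end on the sample list:

```python
import pandas as pd

my_nparray = [['a', 1, 123], ['b', 1, 230], ['a', 2, 321]]
# pivot field 1 into the index, field 0 into columns, field 2 into values
df = (pd.DataFrame(my_nparray)
        .pivot(index=1, columns=0, values=2)
        .rename_axis(index=None, columns=None))
print(df)
```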