How can I convert columns to rows in a pd.DataFrame? Currently my code is as below; instead of having my values returned in columns, I want them displayed in rows. I have tried using iterrows:
df = pd.DataFrame(columns=cleaned_array)
output = df.to_csv(index=False, mode='a', encoding="utf-8")
print(output)
Try this:
df = pd.DataFrame(columns=cleaned_array)
df = df.T  # .T returns a transposed copy, so assign it back
This will interchange your rows and columns.
You want to use the transpose function.
df.T or df.transpose()
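A minimal sketch of what transposing does (the data here is made up for illustration):
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
print(df.T)            # rows become columns: index is 'a', 'b'; columns are 0, 1
print(df.transpose())  # identical result; .T is an alias for .transpose()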
I have several data frames (with an equal number of columns but different names). I'm trying to create one data frame with the rows stacked below each other. I don't care about the column names for now (I can always rename them later). I saw different SO links but they don't address this problem completely.
Note I have 21 data frames and scalability is important. I was looking at this.
How I get df:
df = []
for f in files:
    data = pd.read_csv(f, usecols=[0, 1, 2, 3, 4])
    df.append(data)
Assuming your DataFrames are stored in some list df_l:
Rename the columns and concat:
df_l = [df1, df2, df3]
for df in df_l:
    df.columns = df_l[0].columns  # just choose any DataFrame's columns
pd.concat(df_l)  # columns named as above; index is preserved
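If you don't want the original indices preserved, a small variation is to pass ignore_index=True so the stacked result gets a fresh RangeIndex:
pd.concat(df_l, ignore_index=True)  # index is a new RangeIndex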
Or construct a new DataFrame:
import numpy as np

pd.DataFrame(np.vstack([df.to_numpy() for df in df_l]))  # columns and index are both RangeIndex
I would do it at read time, adding skiprows=1 to skip each file's header and names to assign common column names:
names = [0, 1, 2, 3, 4]  # whatever you want to call them
pd.concat([pd.read_csv(f, usecols=[0, 1, 2, 3, 4], skiprows=1, names=names) for f in files])
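If you'd rather keep the first file's header names instead of numeric ones, a sketch of a variation (assuming files is the same list as above):
first = pd.read_csv(files[0], usecols=[0, 1, 2, 3, 4])
rest = [pd.read_csv(f, usecols=[0, 1, 2, 3, 4], skiprows=1, names=first.columns)
        for f in files[1:]]
df = pd.concat([first] + rest, ignore_index=True)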
Once you put all the data frames into a list, try this code.
import pandas as pd

df_list = [df1, df2, df3]
result = pd.DataFrame(columns=df1.columns)
for df in df_list:  # don't shadow the list name with the loop variable
    df.columns = df1.columns  # rename takes a dict, so assign the columns directly
    result = pd.concat([result, df], ignore_index=True)
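Note that growing result inside the loop re-copies the accumulated data on every iteration. With 21 frames, a single concat over the whole list is usually faster; a minimal sketch:
for df in df_list:
    df.columns = df1.columns  # align the names first
result = pd.concat(df_list, ignore_index=True)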
Well, I have a simple csv that has 2 columns and about 50 rows.
The first column is ip and the other is cik, and I want to get how many ips there are for each distinct cik. This is my code that does that, and it works great:
code:
import pandas as pd
csv = pd.read_csv('test.csv')
df = pd.DataFrame(csv)
df = df.groupby('cik').count()
df = pd.DataFrame(df).to_csv('output.csv', index=False)
But the csv output is like:
ip
49
And I want it to look like what I see when I print the df value after the groupby and count, something like this:
So that I have the cik in the first column and the number of ips that have that cik in the other.
Your option index=False makes the method omit the row labels, which in your case are the cik values (e.g. 1515671). Save it simply with:
df.to_csv('output.csv')
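With the index kept, output.csv should then look roughly like this (the count 49 is the one from your output; the cik value is illustrative):
cik,ip
1515671,49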
Try adding reset_index before you write the output with to_csv.
import pandas as pd

csv = pd.read_csv('test.csv')
df = pd.DataFrame(csv)
df = df.groupby('cik').count().reset_index()  # reset_index creates a 0..n index and moves cik out of the index
df.to_csv('output.csv', index=False)
OR
set index=True (the default) when writing with to_csv:
df.to_csv('output.csv', index=True)
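As an aside (not from the answers above), groupby(...).size() is a common alternative that counts rows per group and lets you name the count column directly; ip_count is a made-up name:
counts = pd.read_csv('test.csv').groupby('cik').size().reset_index(name='ip_count')
counts.to_csv('output.csv', index=False)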
I have an Excel file with 100 sheets. I need to extract data from each sheet's column P beginning from row 7 and create a new file with all the extracted data in the same column. In my output file, the data is located in different columns, i.e. Sheet 2's data in column R, Sheet 3's in column B.
How can I make the data land in the same column in the new output Excel file? Thank you.
ps. Combining all sheets' column P data into a single column in a single sheet is enough for me.
import pandas as pd
import os

Flat_Price = "Flat Pricing.xlsx"
# sheet_name=None reads every sheet and returns a dict of DataFrames keyed by sheet name
dfs = pd.read_excel(Flat_Price, sheet_name=None, usecols="P", skiprows=6)
df = pd.concat(dfs)
print(df)

writer = pd.ExcelWriter("Output.xlsx")
df.to_excel(writer, "Sheet1")
writer.save()
print(os.path.abspath("Output.xlsx"))
You need the parameter header=None to get the default 0 column name:
dfs = pd.read_excel(Flat_Price,
                    sheet_name=None,
                    usecols="P",
                    skiprows=6,
                    header=None)
Then it is possible to extract the number from the first level of the MultiIndex (the sheet names), convert it to an integer, and sort with sort_index:
df = df.set_index([df.index.get_level_values(0).str.extract(r'(\d+)', expand=False).astype(int),
                   df.index.get_level_values(1)]).sort_index()
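The numeric extraction matters because sheet names sort as strings; a small illustration (sheet names made up):
import pandas as pd

names = pd.Index(['Sheet1', 'Sheet10', 'Sheet2'])
print(sorted(names))  # ['Sheet1', 'Sheet10', 'Sheet2'] -- string order puts 10 before 2
print(names.str.extract(r'(\d+)', expand=False).astype(int).sort_values())  # [1, 2, 10]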
This should work:
raw_data.drop('some_great_column', axis=1).compute()
But the column is not dropped. In pandas I use:
raw_data.drop(['some_great_column'], axis=1, inplace=True)
But inplace does not exist in Dask. Any ideas?
You can separate into two operations:
# dask operation
raw_data = raw_data.drop('some_great_column', axis=1)
# conversion to pandas
df = raw_data.compute()
Then export the Pandas dataframe to a CSV file:
df.to_csv(r'out.csv', index=False)
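Alternatively (assuming a reasonably recent Dask version), you can skip the pandas conversion and let Dask write the file itself; single_file=True merges the partitions into one CSV:
raw_data.drop('some_great_column', axis=1).to_csv('out.csv', index=False, single_file=True)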
I assume you want to keep "raw data" in a Dask DF. In that case the following will do the trick:
new_raw_df = raw_data.drop('some_great_column', axis=1).copy()
where type(new_raw_df) is dask.dataframe.core.DataFrame and you can delete the original DF.
I wonder how to save a new pandas Series into a csv file as a separate column. Suppose I have two csv files which both contain a column 'A'. I have applied some mathematical function to them and then created a new variable 'B'.
For example:
data = pd.read_csv('filepath')
data['B'] = data['A'] * 10
# and add the values of data.B into a list with B_list.append(data.B)
This continues until all the rows of the first and second csv files have been read.
I would like to save column B from both csv files into a new spreadsheet.
For example I need this result:
column1 (from csv1)    column2 (from csv2)
data.B.value           data.B.value
By using this code:
pd.DataFrame(np.array(B_list)).T.to_csv('file.csv', index=False, header=None)
I don't get my preferred result.
Since each column in a pandas DataFrame is a pandas Series, your B_list is actually a list of pandas Series, which you can pass to the DataFrame() constructor and then transpose (or, as @jezrael shows, merge horizontally with pd.concat(..., axis=1)):
finaldf = pd.DataFrame(B_list).T
finaldf.to_csv('output.csv', index=False, header=None)
And should the csvs have different numbers of rows, the shorter Series are filled with NaNs at the corresponding rows.
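A tiny illustration of that NaN filling (made-up Series of unequal length):
import pandas as pd

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([10, 20])
print(pd.DataFrame([s1, s2]).T)
#      0     1
# 0  1.0  10.0
# 1  2.0  20.0
# 2  3.0   NaN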
I think you need to concat the column from data1 with the column from data2 first:
df = pd.concat(B_list, axis=1)
df.to_csv('file.csv', index=False, header=None)
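Optionally, you can name the output columns at the same time by passing keys (the names here are hypothetical):
df = pd.concat(B_list, axis=1, keys=['csv1_B', 'csv2_B'])
df.to_csv('file.csv', index=False)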