I imported a table with 30 columns of data and pandas automatically generated an index for the rows from 0-232. I went to make a new dataframe with only 5 of the columns, using the below code:
df = pd.DataFrame(data=[data['Age'], data['FG'], data['FGA'], data['3P'], data['3PA']])
When I viewed the df, the rows and columns had been transposed, so that the index made 233 columns and there were 5 rows. How can I set the index back on the rows, or transpose the dataframe?
The correct approach is actually much simpler. You just need to select the columns all at once with a list of column names:
df = data[['Age', 'FG', 'FGA', '3P', '3PA']]
Paul's answer is the preferred way to do this. But, as you suggest, you could alternatively transpose the DataFrame after building it:
df = df.T
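For illustration, a minimal sketch (assuming `data` is the DataFrame produced by the original import, with its 233-row index):

import pandas as pd

# selecting with a list of names keeps rows as rows
df = data[['Age', 'FG', 'FGA', '3P', '3PA']]
print(df.shape)   # (233, 5)

# building from a list of Series treats each Series as a row,
# so transposing recovers the same orientation
df2 = pd.DataFrame(data=[data['Age'], data['FG'], data['FGA'], data['3P'], data['3PA']]).T
print(df2.shape)  # (233, 5)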
I have a Pandas DF and I need to:
Transpose my columns to rows,
Transform these rows to indexes,
Set the actual columns as titles for each column (and not as part of the rows)
How can I do that?
After transposing, use:
df.columns = df.iloc[0]
to set the column headers to the first row, then drop that row with df = df.iloc[1:] so it does not also appear in the data.
Then use the set_axis() function to set the indices for your rows. The documentation for that function is here:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.set_axis.html
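Putting the steps together, a minimal sketch with made-up column names and data:

import pandas as pd

df = pd.DataFrame({'metric': ['sales', 'costs'], 'Q1': [10, 4], 'Q2': [12, 5]})
df = df.T                       # transpose: columns become rows
df.columns = df.iloc[0]         # promote the first row ('sales', 'costs') to headers
df = df.iloc[1:]                # drop that row so it is not repeated in the data
df = df.set_axis(['first_qtr', 'second_qtr'], axis=0)  # relabel the row index
print(df)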
I have a big dataframe with 100 rows and the structure [qtr_dates<datetime.date>, sales<float>], and a small dataframe with the same structure and fewer than 100 rows. I want to merge these two dfs such that the merged df has all the rows from the small df, with the remaining rows taken from the big df.
Right now I am doing this
df = big_df.merge(small_df, on=big_df.columns.tolist(), how='outer')
But this is creating a df with duplicate qtr_dates.
Use pd.concat, then remove the duplicates with DataFrame.drop_duplicates:
pd.concat([small_df, big_df], ignore_index=True).drop_duplicates(subset=['qtr_dates'])
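For example, with made-up data. small_df goes first because drop_duplicates keeps the first occurrence by default, so overlapping quarters take their value from small_df:

import pandas as pd

big_df = pd.DataFrame({'qtr_dates': pd.to_datetime(['2020-03-31', '2020-06-30', '2020-09-30']),
                       'sales': [100.0, 200.0, 300.0]})
small_df = pd.DataFrame({'qtr_dates': pd.to_datetime(['2020-06-30']), 'sales': [250.0]})

merged = pd.concat([small_df, big_df], ignore_index=True).drop_duplicates(subset=['qtr_dates'])
print(merged.sort_values('qtr_dates'))  # 2020-06-30 keeps the small_df value, 250.0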
If I understand correctly, you want everything from the bigger dataframe, but if a date is present in the smaller dataframe you want its row replaced by the relevant row from the smaller one?
Hence I think you want to do this:
df = big_df.merge(small_df, on=['qtr_dates'], how='left', indicator=True, suffixes=('', '_small'))
df = df[df['_merge'] != 'both'].drop(columns=['sales_small', '_merge'])
df_out = pd.concat([df, small_df], ignore_index=True)
This removes from big_df any rows whose date exists in small_df (the second step), before then adding the small_df rows by concatenating rather than merging.
The suffixes and the drop handle the sales column, which is not part of the join; with more non-join columns you would extend that drop list.
Hope that's right.
Maybe try join instead of merge.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html
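join aligns on the index rather than on columns, so a sketch of that route (reusing the toy frames above; the '_small' suffix is just a name chosen here) might look like:

joined = big_df.set_index('qtr_dates').join(
    small_df.set_index('qtr_dates'), how='left', rsuffix='_small')
# where sales_small is present, the quarter exists in small_df; prefer it
joined['sales'] = joined['sales_small'].fillna(joined['sales'])
print(joined[['sales']].reset_index())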
Wondering if someone could please help me on this:
Have a pandas df with a rather large number of columns (over 50). I'd like to remove duplicates based on a subset (columns 2 to 50).
I've been trying df.drop_duplicates(subset=["col1","col2",....]), but I'm wondering if there is a way to pass the column indexes instead, so I don't have to write out all the column headers to consider for the drop; something along the lines of df.drop_duplicates(subset=[2:]).
Thanks upfront
You can slice df.columns like:
df.drop_duplicates(subset = df.columns[2:])
Or:
df.drop_duplicates(subset = df.columns[2:].tolist())
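A quick sketch with a small made-up frame, where duplicates are judged only on the columns from position 2 onward ('c' and 'd' here):

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 5], 'd': [6, 6]})
print(df.drop_duplicates(subset=df.columns[2:]))  # rows agree on 'c' and 'd', so only the first is kept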
I have an array of unique elements and a dataframe.
I want to find out if the elements in the array exist in every row of the dataframe.
P.S. I am new to Python.
This is the piece of code I've written.
for i in uniqueArray:
    for index, row in newDF.iterrows():
        if i in row['MKT']:
            # do something to find out if the element i exists in all rows
Also, this way of iterating is quite expensive; is there a better way to do the same?
Thanks in Advance.
Pandas lets you filter a whole column, much as you would in Excel:
import pandas
df = pandas.DataFrame(tableData)
Imagine your column names are "Column1", "Column2", etc.:
df2 = df[df["Column1"] == "ValueToFind"]
df2 now has only the rows that have "ValueToFind" in df["Column1"]. You can combine several filters with the logical operators & (AND) and | (OR).
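For instance, with hypothetical column names, and each condition wrapped in parentheses:

df2 = df[(df["Column1"] == "ValueToFind") & (df["Column2"] > 10)]  # AND
df3 = df[(df["Column1"] == "ValueToFind") | (df["Column2"] > 10)]  # OR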
You can try:
for i in uniqueArray:
    if newDF['MKT'].str.contains(i).any():
        # do your task
(A Series has no contains() method; the string version lives under the .str accessor. Use .all() in place of .any() if you need the element present in every row.)
You can use isin() method of pd.Series object.
Assuming you have a dataframe named df, you can check whether your column 'MKT' contains any items of your uniqueArray:
new_df = df[df.MKT.isin(uniqueArray)].copy()
new_df will only contain the rows where the value of MKT is contained in uniqueArray.
Now do your work on new_df, and join/merge/concat it back to the original df as you wish.
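A toy sketch (uniqueArray and the MKT values here are made up):

import pandas as pd

uniqueArray = ['US', 'EU']
df = pd.DataFrame({'MKT': ['US', 'EU', 'ASIA'], 'sales': [1, 2, 3]})

new_df = df[df.MKT.isin(uniqueArray)].copy()
print(new_df)                 # only the 'US' and 'EU' rows
print(df.MKT.eq('US').all())  # False: 'US' does not appear in every row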
I'm performing some operations on a df of 4000 columns and 17520 rows. I have to repeat these operations 100 times with 5 different randomly selected columns from the df. I am using the following loop:
for i in range(0, 100):
    rand_cols = np.random.permutation(df.columns)[0:5]
    df2 = df[rand_cols]
    df2.loc[:, :] *= 2
My question is the following:
Does the operation on df2, which holds the 5 random columns of df, affect those columns in the original df?
Thanks
No, it doesn't. As Valentino suggested in the comments, if you try it with a dummy DataFrame you can see that it doesn't change:
df = pd.DataFrame({'c': range(50)})
df2 = df.loc[df['c'] % 2 == 0, :]
df2 *= 10
If you look at df you can see it didn't change.
The reason is that df2 holds a copy of the selected data, not a view into df: boolean indexing returns a new object, so modifying df2 leaves df untouched.
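A minimal sketch confirming this, with an explicit .copy() to make the intent clear (and to silence the SettingWithCopyWarning pandas may raise in this situation):

import pandas as pd

df = pd.DataFrame({'c': range(50)})
df2 = df.loc[df['c'] % 2 == 0, :].copy()
df2 *= 10
print(df['c'].head().tolist())   # [0, 1, 2, 3, 4] -- unchanged
print(df2['c'].head().tolist())  # [0, 20, 40, 60, 80]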