I'm using Python and I have a DataFrame like the following:
I need to delete all the index labels (for both columns and rows) so that only the column names remain. How can I do this? By the way, I'm using Google Colab to run the code.
Thanks!
I think you want the first row as the header?
df = df.rename(columns=df.iloc[0]).drop(df.index[0])
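A minimal sketch of that one-liner on a small, hypothetical frame whose real column names sit in the first data row. The extra reset_index(drop=True) step is an optional addition, not part of the answer above; it only renumbers the remaining rows from 0.

```python
import pandas as pd

# Hypothetical frame: the intended column names are stored in row 0.
df = pd.DataFrame([["name", "age"], ["Ann", 31], ["Bob", 27]])

# Use row 0 as the column labels, then drop that row from the data.
df = df.rename(columns=df.iloc[0]).drop(df.index[0])

# Optional: renumber the remaining rows starting at 0.
df = df.reset_index(drop=True)

print(df)
```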
I am working on a Python data analytics project.
First, here is the raw data:
I want to get a result like this:
And my code is:
df_sellout.groupby("Brand")[:,0:4].sum()
But this doesn't work.
I want to use [:,0:4] because I have another massive dataset where I can't write out all the column names.
Can anyone help me please?
Try this:
df_sellout.groupby("Brand")[df_sellout.columns[2:]].sum()
Put the iloc indexing before groupby:
df_sellout.iloc[:,0:4].groupby("Brand").sum()
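Both answers can be checked on a small, hypothetical df_sellout (the column names and values below are made up for illustration). Note that with the iloc approach the positional slice must still include "Brand", otherwise groupby("Brand") has no column to group on.

```python
import pandas as pd

# Hypothetical sales data; "Brand" is the first column, the rest are numeric.
df_sellout = pd.DataFrame({
    "Brand": ["A", "B", "A", "B"],
    "Jan": [10, 20, 30, 40],
    "Feb": [1, 2, 3, 4],
    "Mar": [5, 5, 5, 5],
    "Apr": [0, 1, 0, 1],
})

# Option 1: group first, then select columns by label (position 2 onward).
by_label = df_sellout.groupby("Brand")[df_sellout.columns[2:]].sum()

# Option 2: slice columns by position first (keeps Brand, Jan, Feb, Mar), then group.
by_position = df_sellout.iloc[:, 0:4].groupby("Brand").sum()

print(by_label)
print(by_position)
```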
I want to create a new column (not derived from existing columns) of type MapType(StringType(), StringType()). How do I achieve this?
So far I have written the code below, but it's not working.
df = df.withColumn("normalizedvariation",lit(None).cast(MapType(StringType(),StringType())))
I would also like to know other ways to achieve the same thing. Thank you.
This is one way you can try; newMapColumn here would be the name of the map column.
You can see the output I got below. If that is not what you are looking for, please let me know. Thanks!
You will also need to import the functions with the line below:
from pyspark.sql.functions import col,lit,create_map
I am importing several CSV files into Python using a Jupyter notebook and pandas, and some of them were created without a proper index column. Instead, the first column, which contains data that I need to manipulate, is used as the index. How can I create a regular index column as the first column? This seems like a trivial matter, but I can't find any useful help anywhere.
What my dataframe looks like
What my dataframe should look like
Could you please try this:
df.reset_index(inplace=True)  # the default drop=False keeps the old index as a regular column
Let me know if this works.
When you read in the CSV, use pandas.read_csv(..., index_col=#), where # is the position of the column to use as the index. If the file doesn't have a proper index column, set index_col=False.
To change the index of an existing DataFrame df, try df = df.reset_index() or df = df.set_index(col), where col is the label of the column to use as the index.
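A minimal sketch of those two methods on a hypothetical frame in which a data column ("a") ended up as the index: reset_index() moves it back into the data and adds a fresh 0..n-1 index, and set_index() does the reverse.

```python
import pandas as pd

# Hypothetical frame in which the data column "a" was used as the index.
df = pd.DataFrame({"a": [1, 4], "b": [2, 5]}).set_index("a")

# reset_index() turns "a" back into a regular column and creates a RangeIndex.
restored = df.reset_index()

# set_index() does the reverse: it promotes a column to be the index.
reindexed = restored.set_index("a")

print(restored)
```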
When you imported your CSV, did you use the index_col argument? According to the documentation it defaults to None, so if you don't pass it you should be fine.
Either way, you can force pandas not to use any column as the index by passing index_col=False. From the docs:
Note: index_col=False can be used to force pandas to not use the first column as the index, e.g. when you have a malformed file with delimiters at the end of each line.
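To illustrate the quoted note, here is a small sketch using a hypothetical in-memory CSV whose data rows each end with a trailing comma, so every row has one more field than the header. By default pandas treats the extra leading field as the index and the values shift left; index_col=False keeps all columns as data and drops the empty trailing field.

```python
from io import StringIO

import pandas as pd

# Hypothetical malformed CSV: each data row ends with a trailing delimiter.
raw = "a,b,c\n1,2,3,\n4,5,6,\n"

# Default: the first field of each row becomes the index, shifting the values left.
shifted = pd.read_csv(StringIO(raw))

# index_col=False: every column stays as data and a 0..n-1 index is created.
fixed = pd.read_csv(StringIO(raw), index_col=False)

print(shifted)
print(fixed)
```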
Python 3.8.5
pandas==1.2.4
pd.read_csv('file.csv', header=None)
I found the solution in the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
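For comparison, a short sketch of what header=None does, using a hypothetical two-line CSV: pandas stops treating the first line as column names, labels the columns 0, 1, ... and reads the first line as an ordinary data row.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV whose first line would normally be read as the header.
raw = "x,y\n1,2\n"

# Default: the first line supplies the column names.
with_header = pd.read_csv(StringIO(raw))

# header=None: columns are numbered 0, 1, ... and the first line becomes data.
no_header = pd.read_csv(StringIO(raw), header=None)

print(no_header)
```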
So I have an Excel sheet with the following format:
What I'm looking to do is loop through each index cell in column A and assign all cells the same value until the next 0 is reached. For example:
I have tried importing the Excel file into a pandas DataFrame and then using for loops to do this, but I can't seem to make it work. Any suggestions or pointers to the appropriate method would be much appreciated!
Thank you for your time
Edit:
Using #wen-ben's method: s.index=pd.Series((s.index==0).cumsum()).map({1:'bananas',2:'cherries',3:'pineapples'})
just fills every cell in column A with the first element (bananas).
Assuming you have a DataFrame s whose index holds the column A values, you can use cumsum:
s.index=pd.Series((s.index==0).cumsum()).map({1:'bananas',2:'cherries',3:'pineapples'})
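The cumsum answer assumes the column A values are the index of s. If they are a regular column instead (for example after a plain import), the index may contain only one 0, which would explain every cell becoming bananas. A sketch of the same cumsum technique applied to a column, assuming a hypothetical column named "A" and made-up data:

```python
import pandas as pd

# Hypothetical data: column "A" is a counter that restarts at 0 for each group.
s = pd.DataFrame({"A": [0, 1, 2, 0, 1, 0, 1, 2], "B": list("abcdefgh")})

# True marks the start of each group; the running total numbers groups 1, 2, 3, ...
group_id = s["A"].eq(0).astype(int).cumsum()

# Replace the counter with a label per group (labels taken from the question).
s["A"] = group_id.map({1: "bananas", 2: "cherries", 3: "pineapples"})

print(s)
```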
I have a tab-separated file which I loaded into a pandas DataFrame as below:
import pandas as pd
data1 = pd.DataFrame.from_csv(r"C:\Users\Ashish\Documents\indeed_ml_dataset\train.tsv", sep="\t")
data1
Here is what data1 looks like:
Now I want to view the column named tags. I don't know whether I should call it a column or not, but I have tried accessing it the usual way:
data2=data1[['tags']]
but it raises an error. I have tried several other approaches using index and loc as well, but all of them fail. Any suggestions?
To fix this, you need to move description out of the index by resetting it. Try the following:
data2 = data1.reset_index()
data2['tags']
You'll then be able to select by "tags".
Try reading your data with pd.read_csv instead of pd.DataFrame.from_csv, because from_csv uses the first column as the index by default (from_csv was deprecated in pandas 0.21 and removed in 1.0).
For more info refer to this documentation on pandas website: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_csv.html
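A minimal sketch of both suggestions, using a hypothetical in-memory stand-in for train.tsv with only description and tags columns. Passing index_col=0 to read_csv is used here to mimic from_csv's default of using the first column as the index; reset_index() then makes description an ordinary column again.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for train.tsv: two tab-separated columns.
tsv = "description\ttags\nfirst job\tpart-time\nsecond job\tsalary\n"

# read_csv keeps every column as data by default, so "tags" is selectable.
data1 = pd.read_csv(StringIO(tsv), sep="\t")
print(data1[["tags"]])

# index_col=0 puts "description" in the index, as from_csv did by default;
# reset_index() moves it back into a regular column.
indexed = pd.read_csv(StringIO(tsv), sep="\t", index_col=0)
restored = indexed.reset_index()
print(restored["tags"])
```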