Issue with stacking in Pandas - python

I'm trying to stack my dataset with pandas and set the countries as the index.
import pandas as pd
url = 'https://raw.githubusercontent.com/cleibowitz/Module-6/main/Module%206%20Dataset%20-%20GDP%20TRANSPOSED.csv'
data = pd.read_csv(url, index_col = 'Year')
data.columns.name = 'Country'
data = pd.DataFrame(data.stack().rename('value'))
data.reset_index()
data = data.query('Year == 2020')
data.set_index('Country')
data
For some reason, I keep getting an error saying it can't find "Country", even though I know it is in the dataset. This is the output I'm looking for:
Would someone mind helping me with this? Thanks!

You must reset the index first (because after stack() the index is a MultiIndex of Year and Country):
data.reset_index().set_index('Country')
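For reference, here is the whole pipeline with the reassignments in place, sketched on a small made-up GDP table (the real data lives at the URL above; the numbers here are invented for illustration):

```python
import pandas as pd

# Small stand-in for the GDP CSV: years as the index, countries as columns.
data = pd.DataFrame(
    {"USA": [20.9, 21.4], "Canada": [1.6, 1.7]},
    index=pd.Index([2019, 2020], name="Year"),
)
data.columns.name = "Country"

# stack() builds a MultiIndex of (Year, Country); reset_index() moves both
# levels back into columns so Year can be queried and Country re-indexed.
result = (
    data.stack()
    .rename("value")
    .reset_index()
    .query("Year == 2020")
    .set_index("Country")
)
print(result)
```

Note that reset_index() returns a new dataframe rather than modifying in place, which is why the bare data.reset_index() call in the question had no effect.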

Related

How to drop duplicated rows in a dataframe and then merge rows with the same column values

Please show me how to convert this dataframe to the following format in Python code, thank you!
Since some rows appear repeatedly with the same column values (as shown in image 1), the result should contain fewer rows after merging and dropping the redundancies.
As mentioned, it is hard to answer without a usable data sample, but this solution based on a random sample might help:
import random
import pandas as pd

brands = ['jaguar', 'volvo', 'toyota']
years = [2019, 2020, 2021, 2022]
actions = ['showroom', 'testdrive', 'call']

# Random sample with deliberate duplicate (brand, year, action) rows.
df = pd.DataFrame()
df['brand'] = random.choices(brands, k=22)
df['year'] = random.choices(years, k=22)
df['action'] = random.choices(actions, k=22)

# Count duplicates per group, then pivot the actions into columns.
pd.DataFrame(df.groupby(['brand', 'year', 'action']).size()).unstack('action').fillna(0)
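An equivalent, arguably more direct, way to build the same count table is pivot_table with aggfunc="size". A sketch on a small fixed sample (made-up brands and actions) so the counts are predictable:

```python
import pandas as pd

# Fixed sample with one deliberately repeated (brand, year, action) row.
df = pd.DataFrame(
    {
        "brand": ["jaguar", "jaguar", "volvo", "volvo"],
        "year": [2020, 2020, 2021, 2021],
        "action": ["call", "call", "testdrive", "showroom"],
    }
)

# One row per (brand, year), one column per action, cells = duplicate counts.
counts = df.pivot_table(
    index=["brand", "year"], columns="action", aggfunc="size", fill_value=0
)
print(counts)
```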

Python Pandas Group by

I have the code below:
import pandas as pd
Orders = pd.read_excel(r"C:\Users\Bharath Shana\Desktop\Python\Sample.xls", sheet_name='Orders')
Returns = pd.read_excel(r"C:\Users\Bharath Shana\Desktop\Python\Sample.xls", sheet_name='Returns')
Sum_value = Orders['Sales'].sum()
Orders_Year = pd.DatetimeIndex(Orders['Order Date']).year
Orders.merge(Returns, how="inner", on="Order ID")
which gives the output below.
My requirement is to use groupby so that everything appears on a single line, as in the output below.
Can someone please help me use groupby in the code above?
Regards,
Bharath
You can do this by selecting each column and assigning it to a new dataframe:
grouped = pd.DataFrame()
groupby = ['Year', 'Segment', 'Sales']
for i in groupby:
    grouped[i] = Orders[i]
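Since the question explicitly asks for groupby, here is a minimal sketch on made-up order data (the column names Order Date, Segment and Sales are assumed to match the Sample.xls sheet):

```python
import pandas as pd

# Hypothetical stand-in for the Orders sheet of Sample.xls.
Orders = pd.DataFrame(
    {
        "Order Date": pd.to_datetime(["2019-03-01", "2019-07-15", "2020-02-10"]),
        "Segment": ["Consumer", "Consumer", "Corporate"],
        "Sales": [100.0, 50.0, 200.0],
    }
)

# Derive the year, then sum Sales per (Year, Segment) combination.
Orders["Year"] = pd.DatetimeIndex(Orders["Order Date"]).year
summary = Orders.groupby(["Year", "Segment"], as_index=False)["Sales"].sum()
print(summary)
```

With as_index=False the grouping keys stay as ordinary columns, which gives the flat one-row-per-group layout the question describes.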

Error when using .loc with a list of dates in pandas

I have the following code:
import pandas as pd
from pandas_datareader import data as web
df = web.DataReader('^GSPC', 'yahoo')
df['pct'] = df['Close'].pct_change()
dates_list = df.index[df['pct'].gt(0.002)]
df2 = web.DataReader('^GDAXI', 'yahoo')
df2['pct2'] = df2['Close'].pct_change()
I was trying to run this:
df2.loc[dates_list, 'pct2']
But I keep getting this error:
KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported,
I am guessing this is because there are missing data for dates in dates_list. To resolve this:
idx1 = df.index
idx2 = df2.index
missing = idx2.difference(idx1)
df.drop(missing, inplace = True)
df2.drop(missing, inplace = True)
However, I am still getting the same error. I don't understand why that is.
Note that dates_list was created from df, so it contains dates present
in df's index.
You then read df2 and attempt to retrieve pct2 from rows on
just these dates.
But there is a chance that the index of df2 does not contain
all the dates given in dates_list.
This is exactly the cause of your exception.
To avoid it, retrieve only rows whose dates are present in df2's index.
To narrow the row specification down to only such "allowed" dates,
you should pass:
dates_list[dates_list.isin(df2.index)]
Run this alone and you will see the "allowed" dates (some dates will
be eliminated).
So change the offending instruction to:
df2.loc[dates_list[dates_list.isin(df2.index)], 'pct2']
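The same filtering can also be written with Index.intersection instead of isin. A self-contained sketch with invented dates and returns:

```python
import pandas as pd

# Invented data: dates_list comes from a frame covering Jan 1-5, while
# df2 only covers Jan 3-5, so the indexes overlap only partially.
dates_list = pd.date_range("2021-01-01", periods=5)[[0, 2, 4]]  # Jan 1, 3, 5
df2 = pd.DataFrame(
    {"pct2": [0.01, -0.02, 0.03]},
    index=pd.date_range("2021-01-03", periods=3),  # Jan 3, 4, 5
)

# Keep only the labels df2 actually has, so .loc raises no KeyError.
common = dates_list.intersection(df2.index)  # Jan 3 and Jan 5
print(df2.loc[common, "pct2"])
```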

Is there any way to remove the index number in Python when using pandas?

This is simple code that extracts some rows of a dataframe between two input dates.
It works, but my issue has suddenly appeared once more.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import datetime
plt.rc('font', family = 'Malgun Gothic')
df = pd.read_csv('seoul.csv', encoding = 'cp949', index_col=False)
df.style.hide_index()
del df['지점']
a = input("날짜 입력 yyyy-mm-dd: ")
b = input("날짜 입력 yyyy-mm-dd: ")
df['날짜'] = pd.to_datetime(df['날짜'])
mask = (df['날짜']>=a) & (df['날짜']<=b)
df.loc[mask]
And this is the result.
How can I remove these numbers (the ones I pointed out with a red box)?
Edit: changing to index_col=0 does not work, since some of the rows are at a different level.
The index is the way the rows are identified. You can't remove it.
You can only reset it, if you make some selection and want to reindex your dataframe.
df = df.reset_index(drop=True)
If the argument drop is set to False, the old index values are kept in an additional column named index.
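A quick sketch of the difference between drop=True and drop=False on a filtered frame:

```python
import pandas as pd

# Filtering keeps the original row labels (here 1 and 2).
df = pd.DataFrame({"temp": [1.2, 3.4, 5.6]})
subset = df[df["temp"] > 2]

renumbered = subset.reset_index(drop=True)  # labels become 0, 1
kept = subset.reset_index(drop=False)       # old labels move to an 'index' column
print(renumbered)
print(kept)
```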
Try df.to_csv(filename, index=False) –
tbhaxor
Jan 14, 2020 at 9:27

Extracting specific columns from pandas.dataframe

I'm trying to use Python to read my CSV file, extract specific columns into a pandas DataFrame, and display that DataFrame. However, I don't see the dataframe; I receive Series([], dtype: object) as output. Below is the code I'm working with:
My document consists of:
product sub_product issue sub_issue consumer_complaint_narrative
company_public_response company state zipcode tags
consumer_consent_provided submitted_via date_sent_to_company
company_response_to_consumer timely_response consumer_disputed?
complaint_id
I want to extract :
sub_product issue sub_issue consumer_complaint_narrative
import pandas as pd
df=pd.read_csv("C:\\....\\consumer_complaints.csv")
df=df.stack(level=0)
df2 = df.filter(regex='[B-F]')
df[df2]
import pandas as pd
input_file = "C:\\....\\consumer_complaints.csv"
dataset = pd.read_csv(input_file)
df = pd.DataFrame(dataset)
cols = [1,2,3,4]
df = df[df.columns[cols]]
Specify the column numbers you want to select in cols = [...]; dataframe columns are numbered starting from index 0.
You can also select columns by name. Just use the following line:
df = df[["Column Name","Column Name2"]]
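If the wanted columns are known before reading the file, read_csv can also restrict them directly via its usecols parameter. A sketch using an in-memory CSV that stands in for consumer_complaints.csv:

```python
import io
import pandas as pd

# In-memory stand-in for the real consumer_complaints.csv file.
csv_text = (
    "product,sub_product,issue,sub_issue,consumer_complaint_narrative,state\n"
    "loan,payday,billing,late fee,some complaint text,NY\n"
)

wanted = ["sub_product", "issue", "sub_issue", "consumer_complaint_narrative"]
df = pd.read_csv(io.StringIO(csv_text), usecols=wanted)
print(df.columns.tolist())
```

Besides trimming the result, usecols skips parsing the unused columns, which helps on large files.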
A simple way to achieve this would be as follows:
df = pd.read_csv("C:\\....\\consumer_complaints.csv")
df2 = df.loc[:,'B':'F']  # label-based slice: every column from label 'B' through 'F' inclusive (assumes those labels exist)
Hope that helps.
This worked for me, using positional slicing with iloc (note that plain df[n1:n2] slices rows, not columns):
df = pd.read_csv("C:\\....\\consumer_complaints.csv")
df1 = df.iloc[:, n1:n2]
where n1 < n2 are both column positions in the range, e.g. for columns 3 and 4 use
df1 = df.iloc[:, 3:5]
For the first column, use
df1 = df.iloc[:, 0]
Though not sure how to select a discontinuous range of columns.
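For a discontinuous set of columns, iloc accepts an explicit list of positions. A small sketch:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["a", "b", "c", "d", "e"])

# Positions 0, 2 and 4: the columns do not need to be adjacent.
picked = df.iloc[:, [0, 2, 4]]
print(picked.columns.tolist())
```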
We can also use iloc. Given data in dataset2:
dataset2.iloc[:3, [1, 2]]
will return the top 3 rows of columns 2-3 (remember, numbering starts at 0).
