I'm trying to get a scope of certain rows in a specific timeframe. The dataframe has 2 indexes, one of which is made of datetimes (created with pd.to_datetime). When I try to select certain rows using df_pivot.loc[slice(None), '2021'] I get a KeyError: '2021'. Selecting rows by year should be possible with datetimes, right? What am I doing wrong? (picture of the dataframe/indexes)
Problem solved: I used reset_index() and then set_index('Datetime') to make the frame easier to navigate.
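A minimal sketch of that workaround, using a hypothetical two-level frame (the column and index names here are invented for illustration):

```python
import pandas as pd

# Hypothetical two-level frame standing in for df_pivot (names invented)
df_pivot = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=pd.MultiIndex.from_tuples(
        [
            ("a", pd.Timestamp("2020-06-01")),
            ("a", pd.Timestamp("2021-03-01")),
            ("b", pd.Timestamp("2021-07-01")),
            ("b", pd.Timestamp("2022-01-01")),
        ],
        names=["category", "Datetime"],
    ),
)

# Flatten the MultiIndex, then index on the datetime column alone
flat = df_pivot.reset_index().set_index("Datetime")

# Partial string indexing now works on the plain DatetimeIndex:
# every row from 2021
rows_2021 = flat.loc["2021"]
print(rows_2021)
```

Partial strings like '2021' work as labels on a plain DatetimeIndex, which is why flattening the MultiIndex first avoids the KeyError.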
I'm importing data from nasdaqdatalink api
Two questions from this:
(1) How is this already a pandas DataFrame without me needing to type df = pd.DataFrame?
(2) The 'Date' column doesn't appear to be a DataFrame column? If I try df.columns it doesn't show up in the index, and it obviously has no header. So I am confused about what's happening here.
Essentially, I wanted to select data from this DataFrame between two dates, but the only way I know how to do that is by selecting the column name first. However, I'm missing something here. I tried to rename the column in position [0], but that just created a new column named 'Date' with NaN values.
What am I not understanding? (I've only just begun learning Python, Pandas etc. ~1 month ago ! so this is about as far as I could go on my own without more help)
screenshot
There's actually a better way: keep Date as the index, and see the output of:
df.loc['2008-01-01':'2009-01-01']
df.reset_index() makes whatever the current index is into a column.
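A small runnable illustration of both points, with invented data standing in for the API result:

```python
import pandas as pd

# Invented stand-in for the API result: a frame indexed by date
df = pd.DataFrame(
    {"Value": [10.0, 11.5, 12.0, 13.2]},
    index=pd.to_datetime(["2007-06-01", "2008-03-15", "2008-11-30", "2009-05-01"]),
)
df.index.name = "Date"

# With a DatetimeIndex, date strings slice rows directly
subset = df.loc["2008-01-01":"2009-01-01"]
print(subset)  # the two 2008 rows

# reset_index() turns the current index into an ordinary column
as_column = df.reset_index()
print(as_column.columns.tolist())  # ['Date', 'Value']
```

This is also why 'Date' did not show up in df.columns in the question: it was the index, not a column, until reset_index() was called.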
I have an Excel file in the below format.
Note: values in the Column Name field are dynamic. The current example shows 10 records; another set of data can have a different number of column names.
I want to convert the rows into columns as below.
Is there an easy option in Python pandas to handle this scenario?
Thanks #juhat for the suggestion of a pivot table. I was able to achieve the intended result with this code:
import pandas as pd
fsdData = pd.read_csv("py_fsd.csv")
fsdData.pivot(index="msg Srl", columns="Column Name", values="Value")
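A self-contained version of that pivot, with made-up rows standing in for py_fsd.csv:

```python
import pandas as pd

# Made-up long-format rows standing in for py_fsd.csv
fsdData = pd.DataFrame(
    {
        "msg Srl": [1, 1, 2, 2],
        "Column Name": ["Name", "Age", "Name", "Age"],
        "Value": ["Alice", "30", "Bob", "25"],
    }
)

# One row per "msg Srl", one column per distinct "Column Name" entry
wide = fsdData.pivot(index="msg Srl", columns="Column Name", values="Value")
print(wide)
```

Because the columns come from the data itself, a dynamic number of column names is handled automatically, which matches the note in the question.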
I am new to Python and I want to access some rows of an already grouped dataframe (created with groupby).
However, I am unable to select the row I want and would like your help.
The code I used for the groupby is shown below:
language_conversion = house_ads.groupby(['date_served', 'language_preferred']).agg(
    {'user_id': 'nunique', 'converted': 'sum'})
language_conversion
Result shows:
For example, I want to access the number of Spanish-speaking users who received house ads using:
language_conversion[('user_id','Spanish')]
but this gives me KeyError: ('user_id', 'Spanish').
The same error appears when I try to create a new column.
Thanks for your help
Use this:
language_conversion.loc[(slice(None), 'Arabic'), 'user_id']
You can see the indices (in this case, tuples of length 2) using language_conversion.index.
You should use this:
language_conversion.loc[(slice(None),'Spanish'), 'user_id']
slice(None) here includes all rows in the date index.
If you have one particular date in mind, just replace slice(None) with that specific date.
The error you are getting is because you accessed the columns before the indexes, which is not the correct order. Follow the link to learn more about indexing.
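A runnable sketch of that selection, with invented data shaped like the question's house_ads frame:

```python
import pandas as pd

# Invented data shaped like house_ads in the question
house_ads = pd.DataFrame(
    {
        "date_served": ["2018-01-01", "2018-01-01", "2018-01-02", "2018-01-02"],
        "language_preferred": ["English", "Spanish", "Spanish", "Spanish"],
        "user_id": ["u1", "u2", "u3", "u2"],
        "converted": [1, 0, 1, 1],
    }
)

language_conversion = house_ads.groupby(
    ["date_served", "language_preferred"]
).agg({"user_id": "nunique", "converted": "sum"})

# Row selection comes first in .loc: (date level, language level), then the column
spanish_users = language_conversion.loc[(slice(None), "Spanish"), "user_id"]
print(spanish_users)
```

The tuple (slice(None), "Spanish") addresses the two row-index levels; 'user_id' then picks the column, which is the ordering the answer describes.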
I have two DataFrames that I am trying to merge using pandas. One table has 4 columns and the other has 3. I am attempting an inner join on an int64 column.
In the link you can see that both columns named UPC are int64.
Just to make sure the Dataframes weren't empty, I have added a picture of the first 20 rows for each table.
When I try to merge, I use the following command:
result = merge(MPA_COMMODITY, MDM_LINK_VIEW, on='UPC')
When I check the return value, it contains the column names but says the DataFrame is empty.
This is using Python 3.6.4 and Pandas version 0.22.0.
If any other information is needed, please let me know. More than glad to update the post if I have to.
I think you want
MPA_COMMODITY.merge(MDM_LINK_VIEW, on='UPC')
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html
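For reference, a minimal merge on a shared int64 key, with invented frames in place of the question's tables:

```python
import pandas as pd

# Invented stand-ins for the two tables in the question
MPA_COMMODITY = pd.DataFrame({"UPC": [111, 222, 333], "Commodity": ["A", "B", "C"]})
MDM_LINK_VIEW = pd.DataFrame({"UPC": [222, 333, 444], "Link": ["x", "y", "z"]})

# The method form and pd.merge are equivalent; the default join is inner
result = MPA_COMMODITY.merge(MDM_LINK_VIEW, on="UPC")
print(result)

# If a merge comes back empty, a common culprit is mismatched key dtypes,
# so it is worth printing both sides
print(MPA_COMMODITY["UPC"].dtype, MDM_LINK_VIEW["UPC"].dtype)
```

An inner join returns only rows whose UPC appears in both tables, so an empty result usually means the key values (or their dtypes) do not actually overlap.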
I am fairly new to Python and Pandas and I have not found an answer to this question while searching.
I have multiple csv data files that all contain a date-time column and corresponding data. I want to create a series/dataframe that covers a specific span of dates (all data is at 1-minute intervals, so if I wanted to look at July, for example, I would set the index to start at July and run to the end of the month).
Can I create a series or dataframe that contains only the date-time intervals as an index, with no column data? Or would I create an index (the row numbers) and then fill a column with the dates?
I am also unsure about using pd.merge vs. newdataframe = pd.merge. When I use just pd.merge, nothing appears in my variable explorer (I use Anaconda's Spyder IDE); it only appears when I use newdataframe = pd.merge.
Thanks in advance.