I have two CSV files, each with the columns id, product name, normal price, discount price, card price, and discount. The two files hold prices from different time periods, and some products appear in both. I want to plot a graph to see how the prices behave for each product, and I would like the repeated products to appear in the graph.
I tried converting to a dataframe, but with no success.
This is how the CSV file looks.
It would be helpful to see your second .csv file too.
From this answer.
Assuming both your files have the same column names, use Pandas' pandas.concat() method to make one dataframe from a list of dataframes.
Then, simply plot a graph of your new dataframe as usual.
import pandas as pd

# Read each CSV into its own DataFrame
df1 = pd.read_csv('csv1.csv')
df2 = pd.read_csv('csv2.csv')

# Stack the two DataFrames on top of each other with a fresh index
dfBoth = pd.concat([df1, df2], axis=0, ignore_index=True)
dfBoth.plot()
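Note that dfBoth.plot() draws every numeric column against the row index, which will not separate the products. As a minimal sketch, assuming the column spellings from the question ('product name', 'normal price'), you could draw one line per product instead:

import matplotlib.pyplot as plt

# Sketch only: one price line per repeated product.
# 'product name' and 'normal price' are assumed spellings; adjust to your headers.
fig, ax = plt.subplots()
for product, grp in dfBoth.groupby('product name'):
    grp.reset_index(drop=True)['normal price'].plot(ax=ax, label=product)
ax.legend()
plt.show()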
I have a crime dataset where each row is a recorded crime, and the relevant columns are Date, Crime Type, District.
Here is an example with only 2 Districts and 2 Crime Types over a week:
I want to expand it into a dataframe that can be used to run regressions. In this simple example, I need the columns to be Date, District, Murder, Theft. Each District would have a separate row for each date in the range, and each crime type column would hold the number of crimes of that type committed on that Date in that District.
Here is the final dataframe:
What I need is a time series where #Rows = #Districts * #Dates, with a column for each crime type.
Are there any good ways to make this without looping through the dataframes?
I can create the date list like this:
datelist = pd.date_range(start='01-01-2011', end='12-31-2015', freq='1d')
But how do I merge that with my other dataframe and create the columns described above?
I will try to answer my own question here. I think I figured it out, but I would appreciate any input on my method. I was able to do it without looping, using pivot_table and merge instead.
Import packages:
import pandas as pd
from datetime import datetime
import numpy as np
Import crime dataset:
crime_df = pd.read_csv("/Users/howard/Crime_Data.csv")
Create a list of dates in the range:
datelist = pd.date_range(start='01-01-2011', end='12-31-2015', freq='1d')
Create variables for the length of this date list and length of unique districts list:
nd = len(datelist)
nu = len(crime_df['District'].unique())
Create dataframe combining dates and districts:
date_df = pd.DataFrame({'District': crime_df['District'].unique().tolist() * nd,
                        'Date': np.repeat(datelist, nu)})
Now we turn to our crime dataset.
I added a column of 1s to have something to sum in the next step:
crime_df["ones"] = 1
Next we take our crime data and put it in wide form using Pandas pivot_table:
crime_df = pd.pivot_table(crime_df,index=["District","Date"], columns="Crime Type", aggfunc="sum")
This gave me stacked-level columns and an unnecessary index, so I removed them with the following:
crime_df.columns = crime_df.columns.droplevel()
crime_df.reset_index(inplace=True)
The final step is to merge the two datasets. I want to put date_df first and merge on that because it includes all the dates in the range and all the districts included for each date. Thus, this uses a Left merge.
final_df = pd.merge(date_df, crime_df, on=["Date", "District"],how="left")
Now I can finish by filling in the NaNs with 0s:
final_df.fillna(0, inplace=True)
Our final dataframe is in the correct form for time series analyses: regressions, plotting, etc. Many of the matplotlib.pyplot plots I use are easier to make if the date column is the index. This can be done like this:
final_df = final_df.set_index(['Date'])
That's it! Hope this helps others and please comment on any way to improve.
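Since the author asks for ways to improve: a more compact sketch of the same pipeline, starting from the raw crime_df (before the pivot step) and assuming the Date column has been parsed to datetimes (e.g. read_csv(..., parse_dates=['Date'])) so it lines up with datelist:

# Sketch only: pivot, then reindex onto the full (District, Date) grid;
# fill_value=0 replaces the separate merge and fillna steps.
full_grid = pd.MultiIndex.from_product(
    [crime_df['District'].unique(), datelist], names=['District', 'Date'])
final_df = (crime_df
            .pivot_table(index=['District', 'Date'], columns='Crime Type',
                         values='ones', aggfunc='sum', fill_value=0)
            .reindex(full_grid, fill_value=0)
            .reset_index())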
One ID can have multiple dates and results, and I want the date and result columns that sit side by side to be stacked into a single Date column and a single Result column. How can I transfer the columns of a table into rows?
[Image: the table that needs to be transposed]
[Image: the desired result]
This seems to work, not sure if it's the best solution:
df2 = pd.concat([df.loc[:,['ID','Date','Result']],
df.loc[:,['ID','Date1','Result1']].rename(columns={'Date1':'Date','Result1':'Result'}),
df.loc[:,['ID','Date2','Result2']].rename(columns={'Date2':'Date','Result2':'Result'})
]).dropna().sort_values(by = 'ID')
It just splits the dataframe into one piece per date/result pair, renames the columns to match, concatenates them back together, removes the NAs, and then sorts.
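If there are more Date*/Result* pairs, the same idea generalizes with a list comprehension; a minimal sketch (the pair names are assumed from the screenshots):

# Sketch: stack any number of Date*/Result* column pairs
pairs = [('Date', 'Result'), ('Date1', 'Result1'), ('Date2', 'Result2')]
frames = [df[['ID', d, r]].rename(columns={d: 'Date', r: 'Result'})
          for d, r in pairs]
df2 = pd.concat(frames, ignore_index=True).dropna().sort_values('ID')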
If you are looking to reshape data in pandas, you could use pandas.DataFrame.pivot; there are more examples of the syntax in its documentation.
I am trying to manipulate data from an Excel file, but it has merged column headings. I managed to transform them in pandas. Please see the example of the original data below.
So I transformed it to this format.
My final goal is to get the format below and to plot the brands' items with their sales quantities and prices over the period, but I don't know how to access the info in a multiindex dataframe. Could you please suggest something? Thanks.
My code:
import pandas as pd

# Read the sheet with its two merged header rows as a MultiIndex
df = pd.read_excel('path.xls', sheet_name='data', header=[0, 1])

# Blank out the auto-generated 'Unnamed: ...' labels in the top header level
a = df.columns.get_level_values(0).to_series()
b = a.mask(a.str.startswith('Unnamed')).fillna('')
df.columns = [b, df.columns.get_level_values(1)]

# Drop the leftover first data row
df.drop(0, inplace=True)
Try pandas groupby or pivot_table. pivot_table takes index, columns, values, and aggfunc arguments; it is really nice for summarizing data.
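For the "how do I access a MultiIndex dataframe" part of the question, a minimal sketch (the labels 'Brand A' and 'Qty' are hypothetical placeholders for whatever your two header levels actually contain):

# Sketch: common ways to pull data out of two-level columns
df[('Brand A', 'Qty')]         # one specific column, addressed by its full tuple
df.xs('Qty', axis=1, level=1)  # the 'Qty' column for every brand
long_df = df.stack(level=0)    # move the brand level into the rows for plotting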
Hello Stack Overflow community. I am having an issue while trying to do a simple merge between two dataframes which share the same date column. Sorry, I am new to Python and perhaps I am not expressing myself very clearly. I am working on a project related to stock price calculations. The first dataframe has date and closing price columns, while the second one only has a similar date column. My goal is to obtain a single date column with the matching closing prices next to it.
this is what I have done to merge two dataframes
inner_join = pd.merge(df.iloc[7:79],df1[['Ex-Date','FDX UN Equity']],on ='Ex-date',how ='inner')
inner_join
'Ex-date' refers to the date column and 'FDX UN Equity' refers to the column with closing prices.
I get this as a result:
KeyError: 'Ex-date'
Pandas was reading the date columns in different formats, so I set the same format for the date columns in the original Excel file, but that hasn't helped. I tried all sorts of merges, but none of them worked either.
Does anyone have any idea what is going on?
The code would look like this
import pandas as pd
inner_join = pd.merge_asof(df, df1, on = 'Ex-date')
Change both column header names to the same (lower) case and merge again. Check 'Ex-Date': the column name must be spelled identically in both frames before you merge. You can also use how='left'.
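A minimal sketch of that fix, reusing the names from the question (renaming to 'Ex-date' is just one consistent choice):

# Sketch: make the key column names match exactly, then merge
df1 = df1.rename(columns={'Ex-Date': 'Ex-date'})
inner_join = pd.merge(df.iloc[7:79],
                      df1[['Ex-date', 'FDX UN Equity']],
                      on='Ex-date', how='inner')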
Basically, I use an Excel file (exported to CSV) that contains thousands of rows, and I'm using pandas to read in the file.
import pandas as pd
agg = pd.read_csv('Station.csv', sep = ',')
Then what I did was group the data according to these categories:
month_station = agg.groupby(['month','StationName'])
The groupby will not be used for computing the mean, median, etc., but just for aggregating the data by month and station name; that's what the question asks for.
Now I want to output month_station to a file, so first I need to turn the groupby result back into a dataframe.
I've seen examples:
pd.DataFrame(month_station.size().reset_index(name = "Group_Count"))
but the thing is, I don't require the size/count of my data, just the grouping by month and station name, which doesn't involve a count or sort. I tried removing the size() and it gives me an error.
I just want the content of month_station to be put into a dataframe so I can proceed and output it as a CSV file, but it seems complicated.
The nature of groupby is such that you derive an aggregate calculation, such as mean, count, or sum. If you are merely trying to see one of each pair of month and station name, try this:
month_station = agg.groupby(['month', 'StationName'], as_index=False).count()
month_station = month_station[['month', 'StationName']]
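An equivalent sketch without the intermediate count: keep the unique pairs directly and write them out (the output filename is just an example):

# Sketch: unique (month, StationName) pairs, exported straight to CSV
month_station = agg[['month', 'StationName']].drop_duplicates()
month_station.to_csv('month_station.csv', index=False)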