Backstory: I'm fairly new to python, and have only ever done things in MATLAB prior.
I am looking to take a specific value from a table based off of data I have.
The data I have is
Temperatures = [0.8,0.1,-0.8,-1.4,-1.7,-1.5,-2,-1.7,-1.7,-1.3,-0.7,-0.2,0.3,1.4,1.4,1.5,1.2,1,0.9,1.3,1.7,1.7,1.6,1.6]
Hour of the Day =
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]
This is all data for a Monday.
My Monday table looks like this:
Temp           | Hr0  | Hr1  | Hr2 ...
-15 < t <= -10 | 0.01 | 0.02 | 0.06 ...
-10 < t <= -5  | 0.04 | 0.03 | 0.2  ...
with the temperature bins continuing in steps of 5 up to 30, and the hour columns running to Hr23. The values in the table are constants that I would like to look up based on the temperature and hour.
For example, I'd like monday(1,1) to return the value in the first row and first hour column, so that:
print(monday(1,1))  # prints 0.01
I would also be doing this for every day of the week for a mass data analysis, thus the need for it to be efficient.
What I've done so far:
So I have stored all of my tables as lists, one per column, that look kind of like this:
monday_hr0 = [0.01, 0.04, ...]
So first by column, then indexing into them by the temperature row.
What I have now is a bunch of loops that looks like this:
for i in range(0, 365):
    for j in range(0, 24):
        if Day[i] == monday:
            if hr[i + 24*j] == 0:
                if temp[i] == -15:
                    constant.append(monday_hr1[0])
                ...
            if hr[i + 24*j] == 1:
                if temp[i] == -15:
                    constant.append(monday_hr2[0])
                ...
            ...
        elif Day[i] == tuesday:
            if hr[i + 24*j] == 0:
                if temp[i] == -15:
                    constant.append(tuesday_hr1[0])
                ...
            if hr[i + 24*j] == 1:
                if temp[i] == -15:
                    constant.append(tuesday_hr2[0])
                ...
            ...
...
I'm basically saying here if it's a monday, use this table. Then if it's this hour use this column. Then if it's this temperature, use this cell. This is VERY VERY inefficient however.
I'm sure there's a quicker way but I can't wrap my head around it. Thank you very much for your help!
Okay, bear with me here, I'm on mobile. I'll try to write up a solution.
I am assuming the following:
you have a dictionary called day_data which contains the table of data for each day of the week.
you have a dictionary called days which maps 0-6 to a day of the week: 0 is Monday, 6 is Sunday.
you have a list of temperatures you want something done with
you have a time of the day you want to use to pick out the appropriate data from your day_data. You want to do this for each day of the year.
We should only have to iterate once through all 365 days and once through each hour of the day.
heat_load_days = {}
for day_index in range(365):
    day = days[day_index % 7]
    # day is now the day of the week
    data = day_data[day]
    heat_load = []
    for hour in range(24):
        # still unsure on how to select which temperature row from the data table
        heat_load.append(day_data_selected)
    heat_load_days[day] = heat_load
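To fill in the missing row selection, here is a runnable sketch of the whole lookup, assuming the temperature bins cover (-15, 30] in steps of 5 as in the question. The names `temp_row`, `day_data`, `days`, and `lookup` are illustrative, and the table values below are placeholders, not real data:

```python
import math

def temp_row(t):
    # Map a temperature to a row index: (-15, -10] -> 0, (-10, -5] -> 1, ...
    return math.ceil((t + 15) / 5) - 1

# day_data[day][row][hour]: one 9-row x 24-column table per day of the week
days = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]
day_data = {
    day: [[0.01 * (r + h + 1) for h in range(24)] for r in range(9)]
    for day in days  # placeholder values; substitute your real tables here
}

def lookup(day_index, hour, temperature):
    day = days[day_index % 7]
    return day_data[day][temp_row(temperature)][hour]

# -14.2 falls in the "-15 < t <= -10" row (row 0), hour column 1
print(lookup(0, 1, -14.2))
```

Replacing the chain of if statements with an arithmetic bin index and a dictionary lookup turns each query into constant-time work instead of a walk through every case.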
I am a new Python convert (from Matlab). I am using the pandas groupby function, and I am getting tripped up by a seemingly easy problem. I have written a custom function that I apply to the grouped df that returns 4 different values. Three of the values are working great, but the other value is giving me an error. Here is the original df:
Index,SN,Date,City,State,ID,County,Age,A,B,C
0,32,9/1/16,X,AL,360,BB County,29.0,negative,positive,positive
1,32,9/1/16,X,AL,360,BB County,1.0,negative,negative,negative
2,32,9/1/16,X,AL,360,BB County,10.0,negative,negative,negative
3,32,9/1/16,X,AL,360,BB County,11.0,negative,negative,negative
4,35,9/1/16,X,AR,718,LL County,67.0,negative,negative,negative
5,38,9/1/16,X,AR,728-13,JJ County,3.0,negative,negative,negative
6,38,9/1/16,X,AR,728-13,JJ County,8.0,negative,negative,negative
7,30,9/1/16,X,AR,728-13,JJ County,8.0,negative,negative,negative
8,30,9/1/16,X,AR,728-13,JJ County,14.0,negative,negative,negative
9,30,9/1/16,X,AR,728-13,JJ County,5.0,negative,negative,negative
...
This is the function that transforms the data. Basically, it counts the number of 'positive' values and the total number of observations in the group. I also want it to return the ID value, and this is where the problem is:
def _ct_id_pos(grp):
    return grp['ID'][0], grp[grp.A == 'positive'].shape[0], grp[grp.B == 'positive'].shape[0], grp.shape[0]
I apply the _ct_id_pos function to the data grouped by Date and SN:
FullMx_prime = FullMx.groupby(['Date', 'SN']).apply(_ct_id_pos).reset_index()
So, the method should return something like this:
Date SN ID 0
0 9/1/16 32 360 (360,2,1,4)
1 9/1/16 35 718 (718,0,0,1)
2 9/2/16 38 728 (728,1,0,2)
3 9/3/16 30 728 (728,2,0,3)
But, I keep getting the following error:
...
KeyError: 0
Obviously, it does not like this part of the function: grp['ID'][0] . I just want to take the first value of grp['ID'] because--if there are multiple values--they should all be the same (i.e., I could take the last, it does not matter). I have tried other ways to index, but to no avail.
Change grp['ID'][0] to grp.iloc[0]['ID']
The problem you are having is due to grp['ID'], which selects a column and returns a pandas.Series. That is straightforward enough, and you could reasonably expect [0] to select the first element. But [0] actually selects based on the index of the Series, and in this case the index comes from the dataframe that was grouped, so 0 is not always going to be a valid label.
Code:
def _ct_id_pos(grp):
    id = grp.iloc[0]['ID']
    a = grp[grp.A == 'positive'].shape[0]
    b = grp[grp.B == 'positive'].shape[0]
    sz = grp.shape[0]
    return id, a, b, sz
Test Code:
import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO(u"""
Index,SN,Date,City,State,ID,County,Age,A,B,C
0,32,9/1/16,X,AL,360,BB County,29.0,negative,positive,positive
1,32,9/1/16,X,AL,360,BB County,1.0,negative,negative,negative
2,32,9/1/16,X,AL,360,BB County,10.0,negative,negative,negative
3,32,9/1/16,X,AL,360,BB County,11.0,negative,negative,negative
4,35,9/1/16,X,AR,718,LL County,67.0,negative,negative,negative
5,38,9/1/16,X,AR,728-13,JJ County,3.0,negative,negative,negative
6,38,9/1/16,X,AR,728-13,JJ County,8.0,negative,negative,negative
7,30,9/1/16,X,AR,728-13,JJ County,8.0,negative,negative,negative
8,30,9/1/16,X,AR,728-13,JJ County,14.0,negative,negative,negative
9,30,9/1/16,X,AR,728-13,JJ County,5.0,negative,negative,negative
"""), header=0, index_col=0)
print(df.groupby(['Date', 'SN']).apply(_ct_id_pos).reset_index())
Results:
Date SN 0
0 9/1/16 30 (728-13, 0, 0, 3)
1 9/1/16 32 (360, 0, 1, 4)
2 9/1/16 35 (718, 0, 0, 1)
3 9/1/16 38 (728-13, 0, 0, 2)
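A variation worth knowing: if the function returns a pd.Series instead of a tuple, apply expands the values into separate named columns rather than one column of tuples. The column names here (A_pos, B_pos, n) are my own, not from the question:

```python
import pandas as pd
from io import StringIO

def _ct_id_pos(grp):
    # Returning a Series makes groupby().apply() produce named columns
    return pd.Series({
        'ID': grp.iloc[0]['ID'],
        'A_pos': (grp.A == 'positive').sum(),
        'B_pos': (grp.B == 'positive').sum(),
        'n': len(grp),
    })

# A trimmed-down version of the question's data
df = pd.read_csv(StringIO(
    "Index,SN,Date,City,State,ID,County,Age,A,B,C\n"
    "0,32,9/1/16,X,AL,360,BB County,29.0,negative,positive,positive\n"
    "1,32,9/1/16,X,AL,360,BB County,1.0,negative,negative,negative\n"
    "4,35,9/1/16,X,AR,718,LL County,67.0,negative,negative,negative\n"
), index_col=0)

out = df.groupby(['Date', 'SN']).apply(_ct_id_pos).reset_index()
print(out)
```

This gives you real columns you can filter and sort on directly, instead of unpacking tuples afterwards.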
I have .csv data that I want to sort by its date column. My date format is the following:
Week,Quarter,Year: So WK01Q12001 for example.
When I .sort() my dataframe on this column, the result is sorted like:
WK01Q12001, WK01Q12002, WK01Q12003, WK01Q22001, WK01Q22002, WK01Q22003, ... WK02Q12001, WK02Q12002...
for example. This makes sense because it's sorting the string in ascending order.
But I need my data sorted chronologically such that the result is like the following:
WK01Q12001, WK02Q12001, WK03Q12001, WK04Q12001, ... , WK01Q22001, WK02Q22001, ... WK01Q12002, WK02Q22002 ...
How can I sort it this way using pandas? Perhaps sorting the string in reverse? (right to left) or creating some kind of datetime object?
I have also tried using Series(): pd.Series([pd.to_datetime(d) for d in weeklyData['Date']])
But the result is the same as with the .sort() method above.
UPDATE:
My DataFrame is similar in format to an excel sheet and currently looks like the following. I want to sort chronologically by 'Date'.
Date Price Volume
WK01Q12001 32 500
WK01Q12002 43 400
WK01Q12003 55 300
WK01Q12004 58 350
WK01Q22001 33 480
WK01Q22002 40 450
.
.
.
WK13Q42004 60 400
You can add a new column to your dataframe containing the date components as a list.
e.g.
a = ["2001", "Q2", "WK01"]
b = ["2002", "Q2", "WK01"]
c = ["2002", "Q2", "WK02"]
So, you can apply a function to your data frame to do this...
import re

def tolist(x):
    g = re.match(r"(WK\d{2})(Q\d)(\d{4})", str(x))
    return [g.group(3), g.group(2), g.group(1)]
then...
df['datelist'] = df['Date'].apply(tolist)
which gives you your date as a list arranged in the order of importance...
Date Price Volume datelist
0 WK01Q12001 32 500 [2001, Q1, WK01]
1 WK01Q12002 22 400 [2002, Q1, WK01]
2 WK01Q12003 42 500 [2003, Q1, WK01]
When comparing lists of equal length in Python, the comparison operators work element by element, left to right, so the standard DataFrame sort will order your data correctly. Note that DataFrame.sort has been removed from recent pandas versions; use sort_values instead:
df.sort_values('datelist')
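A runnable version of this idea (same names as above; I pass a key to sort_values that converts each list to a tuple, which keeps the comparison robust across pandas versions):

```python
import pandas as pd

# Lists of equal length compare element by element, left to right
assert ["2001", "Q1", "WK02"] < ["2001", "Q2", "WK01"] < ["2002", "Q1", "WK01"]

df = pd.DataFrame({
    "Date": ["WK01Q12002", "WK02Q12001", "WK01Q22001", "WK01Q12001"],
    "datelist": [["2002", "Q1", "WK01"], ["2001", "Q1", "WK02"],
                 ["2001", "Q2", "WK01"], ["2001", "Q1", "WK01"]],
})

# Sort on the [year, quarter, week] lists via a tuple key
out = df.sort_values("datelist", key=lambda s: s.map(tuple))
print(out.Date.tolist())
```

The key parameter of sort_values requires pandas 1.1 or later.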
Use str.replace to change the order of the keys first:
s = "WK01Q12001, WK01Q12002, WK01Q12003, WK01Q22001, WK01Q22002, WK01Q22003, WK02Q12001, WK02Q12002"
date = [d.strip() for d in s.split(",")]
df = pd.DataFrame({"date": date, "value": range(len(date))})
df["date2"] = df.date.str.replace(r"WK(\d\d)Q(\d)(\d{4})", r"\3Q\2WK\1", regex=True)
df.sort_values("date2")
I was also able to accomplish this date reformatting very easily using SQL. When I first query my data, I do:
SELECT *,
       RIGHT([Date], 4) + SUBSTRING([Date], 5, 2) + LEFT([Date], 4) AS SortedDate
FROM [Table]
ORDER BY SortedDate ASC;
Use the right tool for the job!
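For completeness, the same RIGHT/SUBSTRING/LEFT recomposition translates directly to pandas with the .str slice accessor (the SortedDate column name just mirrors the SQL alias above):

```python
import pandas as pd

df = pd.DataFrame({"Date": ["WK01Q12002", "WK02Q12001", "WK01Q12001"]})

# RIGHT(Date, 4) + SUBSTRING(Date, 5, 2) + LEFT(Date, 4)
df["SortedDate"] = df["Date"].str[-4:] + df["Date"].str[4:6] + df["Date"].str[:4]
out = df.sort_values("SortedDate")
print(out.Date.tolist())
```

This builds a "YYYYQ#WK##" key whose plain string order is the chronological order.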