Here is the code I am using:
df['C'] = np.where((df['spread'] > 60) & (df['volume'] > df['Ma_mult_high']),'green','red')
fig = go.Figure()
# add OHLC trace
fig.add_trace(go.Candlestick(x=df.index,
open=df['open'],
high=df['high'],
low=df['low'],
close=df['close'],
showlegend=False))
# add moving average traces
fig.add_trace(go.Scatter(x=df.index,
y=df['ma'],
opacity=0.7,
line=dict(color='blue', width=2),
name='MA 5'))
fig.add_trace(go.Scatter(
x = df.index,
y = df['close'],
mode = 'markers',
marker_color=df.C
))
fig.update_layout(xaxis_rangeslider_visible=False).show()
The output:
In the image you can see that it plots both the True and the False signals, maybe because of marker_color = "C". But if I change that and use only color names, it plots nothing. Even if I change y = df['close'], I get the same problem.
data {'timeStamp': {0: 1657220400000, 1: 1657222200000, 2: 1657224000000, 3: 1657225800000, 4: 1657227600000}, 'open': {0: 21357.7, 1: 21495.84, 2: 21812.46, 3: 21641.56, 4: 21624.03}, 'high': {0: 21499.87, 1: 21837.74, 2: 21838.1, 3: 21659.99, 4: 21727.87}, 'low': {0: 21325.0, 1: 21439.13, 2: 21526.4, 3: 21541.96, 4: 21567.56}, 'close': {0: 21495.83, 1: 21812.47, 2: 21641.56, 3: 21624.03, 4: 21619.57}, 'volume': {0: 3663.2089, 1: 7199.91652, 2: 4367.94336, 3: 1841.10043, 4: 1786.17022}, 'quoteVolume': {0: 78386481.2224664, 1: 155885063.7202956, 2: 94605455.6190078, 3: 39756576.8814698, 4: 38684342.7232105}, 'tradesCount': {0: 59053, 1: 111142, 2: 81136, 3: 56148, 4: 53122}, 'date': {0: Timestamp('2022-07-07 19:00:00'), 1: Timestamp('2022-07-07 19:30:00'), 2: Timestamp('2022-07-07 20:00:00'), 3: Timestamp('2022-07-07 20:30:00'), 4: Timestamp('2022-07-07 21:00:00')}, 'Avg_Volume': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, 'Ma_mult_high': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, 'Ma_mult_mid': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, 'spread': {0: 78.9901069365825, 1: 79.43353152203923, 2: 54.82836060314386, 3: 14.85215623146836, 4: 2.782109662528346}, 'Marker': {0: 21502.87, 1: 21840.74, 2: 21523.4, 3: 21538.96, 4: 21564.56}, 'Symbol': {0: 'triangle-up', 1: 'triangle-up', 2: 'triangle-down', 3: 'triangle-down', 4: 'triangle-down'}, 'ma': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}, 'C': {0: 'red', 1: 'red', 2: 'red', 3: 'red', 4: 'red'}}
It seems to me that the issue is in your np.where() statement, most likely the nan values in Ma_mult_high. Any comparison against nan evaluates to False, so df['volume'] > df['Ma_mult_high'] is False for every row and every row ends up 'red'.
Try this:
df['C'] = np.where((df['spread'] > 60) & (df['volume'] > df['Ma_mult_high'].fillna(0)),'green','red')
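To see the effect in isolation, here is a minimal sketch using rounded values from the sample data in the question. The names raw_mask, filled_mask, colors_raw and colors_filled are only illustrative. Note that fillna(0) treats a missing threshold as 0, which is an assumption about what you want for the rows where the moving average is not yet defined.

```python
import numpy as np
import pandas as pd

# Rounded values taken from the sample data in the question
df = pd.DataFrame({
    "spread": [78.99, 79.43, 54.83, 14.85, 2.78],
    "volume": [3663.2, 7199.9, 4367.9, 1841.1, 1786.2],
    "Ma_mult_high": [np.nan] * 5,
})

# Comparing against NaN is False for every row
raw_mask = df["volume"] > df["Ma_mult_high"]

# With NaN replaced by 0 the comparison can succeed
filled_mask = df["volume"] > df["Ma_mult_high"].fillna(0)

colors_raw = np.where((df["spread"] > 60) & raw_mask, "green", "red")
colors_filled = np.where((df["spread"] > 60) & filled_mask, "green", "red")

print(colors_raw.tolist())
print(colors_filled.tolist())
```

If only the 'green' rows should get a marker, one option is to filter the frame first (for example df[df['C'] == 'green']) and pass that subset to the marker trace.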
I am trying to merge 2 dataframes and am having trouble figuring out how, as it is not straightforward.
One data frame has match results for over 25000 games and looks like this.
The second one has team performance metrics but only for around 1500 games.
As I am not allowed to post pictures yet, here are the column names of interest:
df_match['date', 'home_team_api_id', 'away_team_api_id']
df_team_attributes['date', 'team_api_id']
Both data frames have additional columns with results or performance metrics.
To be able to merge correctly, I need to merge by date and by looking if the 'team_api_id' matches either 'home...' or 'away_team_api_id'
This is what I have tried until now:
df_team_performance = pd.merge(df_team_attributes, df_match,
how = 'left',
left_on = ['date', 'team_api_id', 'team_api_id'],
right_on = ['date', 'home_team_api_id', 'home_team_api_id'])
I have also tried with only 2 columns, but without success.
What I would like to get is a new data frame with only the rows of the df_team_attributes and columns from both data frames.
Thank you in advance!
Added to request by Correlien:
output of print(df_match[['date', 'home_team_api_id', 'away_team_api_id', 'win_home', 'win_away', 'draw', 'win']].head(10).to_dict())
{'date': {0: '2008-08-17 00:00:00', 1: '2008-08-16 00:00:00', 2: '2008-08-16 00:00:00', 3: '2008-08-17 00:00:00', 4: '2008-08-16 00:00:00', 5: '2008-09-24 00:00:00', 6: '2008-08-16 00:00:00', 7: '2008-08-16 00:00:00', 8: '2008-08-16 00:00:00', 9: '2008-11-01 00:00:00'}, 'home_team_api_id': {0: 9987, 1: 10000, 2: 9984, 3: 9991, 4: 7947, 5: 8203, 6: 9999, 7: 4049, 8: 10001, 9: 8342}, 'away_team_api_id': {0: 9993, 1: 9994, 2: 8635, 3: 9998, 4: 9985, 5: 8342, 6: 8571, 7: 9996, 8: 9986, 9: 8571}, 'win_home': {0: 0, 1: 0, 2: 0, 3: 1, 4: 0, 5: 0, 6: 0, 7: 0, 8: 1, 9: 1}, 'win_away': {0: 0, 1: 0, 2: 1, 3: 0, 4: 1, 5: 0, 6: 0, 7: 1, 8: 0, 9: 0}, 'draw': {0: 1, 1: 1, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 0, 8: 0, 9: 0}, 'win': {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 0, 6: 0, 7: 1, 8: 1, 9: 1}}
output for print(df_team_attributes[['date', 'team_api_id', 'buildUpPlaySpeed', 'buildUpPlaySpeedClass']].head(10).to_dict())
{'date': {0: '2010-02-22 00:00:00', 1: '2014-09-19 00:00:00', 2: '2015-09-10 00:00:00', 3: '2010-02-22 00:00:00', 4: '2011-02-22 00:00:00', 5: '2012-02-22 00:00:00', 6: '2013-09-20 00:00:00', 7: '2014-09-19 00:00:00', 8: '2015-09-10 00:00:00', 9: '2010-02-22 00:00:00'}, 'team_api_id': {0: 9930, 1: 9930, 2: 9930, 3: 8485, 4: 8485, 5: 8485, 6: 8485, 7: 8485, 8: 8485, 9: 8576}, 'buildUpPlaySpeed': {0: 60, 1: 52, 2: 47, 3: 70, 4: 47, 5: 58, 6: 62, 7: 58, 8: 59, 9: 60}, 'buildUpPlaySpeedClass': {0: 'Balanced', 1: 'Balanced', 2: 'Balanced', 3: 'Fast', 4: 'Balanced', 5: 'Balanced', 6: 'Balanced', 7: 'Balanced', 8: 'Balanced', 9: 'Balanced'}}
Have you tried casting your date columns to datetime and then attempting the merge? The following worked for me based on the example that you provided:
# Casting to date
df_match["date"] = pd.to_datetime(df_match["date"])
df_team_attributes["date"] = pd.to_datetime(df_team_attributes["date"])
# Merging on the date field alone
df_team_performance = pd.merge(df_team_attributes, df_match,
how = 'left',
on = 'date')
# Filtering out the required rows
result = df_team_performance.query("(team_api_id == home_team_api_id) | (team_api_id == away_team_api_id)")
Please let me know if my understanding of your question is correct.
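Here is a small self-contained sketch of the same approach. The dates in your two samples do not overlap, so the rows below are made up purely for illustration (the ids are borrowed from your samples, the pairings are hypothetical).

```python
import pandas as pd

# Hypothetical toy data: dates chosen so that some rows overlap
df_match = pd.DataFrame({
    "date": ["2010-02-22 00:00:00", "2010-02-22 00:00:00", "2011-02-22 00:00:00"],
    "home_team_api_id": [9930, 8485, 8576],
    "away_team_api_id": [8576, 9987, 9930],
})
df_team_attributes = pd.DataFrame({
    "date": ["2010-02-22 00:00:00", "2010-02-22 00:00:00", "2011-02-22 00:00:00"],
    "team_api_id": [9930, 8485, 8485],
    "buildUpPlaySpeed": [60, 70, 47],
})

# Cast both date columns to datetime so the merge keys have the same dtype
df_match["date"] = pd.to_datetime(df_match["date"])
df_team_attributes["date"] = pd.to_datetime(df_team_attributes["date"])

# Merge on date alone, then keep rows where the team played home or away
merged = pd.merge(df_team_attributes, df_match, how="left", on="date")
result = merged.query(
    "(team_api_id == home_team_api_id) | (team_api_id == away_team_api_id)"
)
print(result)
```

Be aware that the filter also drops attribute rows that have no matching game on that date (the third attribute row here), so the result can have fewer rows than df_team_attributes.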
Essentially, my program finds the player with the most yards per carry, but the player it finds has only a couple of attempts.
I'm trying to filter out the rest of the players so that I only get people with more than 200 yards so far in the season.
All of the data comes from a CSV file, so it has to be done through pandas.
import pandas as pd
wide_receiver = pd.read_csv('nfl-flex.csv')
wide_receiver['ypc'] = wide_receiver.reyds / wide_receiver.rec
wr_ypc = wide_receiver[wide_receiver['pos'] == 'WR']['ypc'].max()
yards_leader = wide_receiver.loc[wide_receiver['ypc'] == wr_ypc]
print(yards_leader['name'])
I'm not quite sure how to filter out those players with less than 200 yards.
Output:
{'id': {0: 11706, 1: 11791, 2: 11792, 3: 11793, 4: 11810}, 'name': {0: 'Mark Ingram', 1: 'Rob Gronkowski', 2: 'Marcedes Lewis', 3: 'Jimmy Graham', 4: 'Jared Cook'}, 'fpts': {0: 100.5, 1: 90.8, 2: 26.1, 3: 21.8, 4: 90.1}, 'gp': {0: 11, 1: 6, 2: 12, 3: 9, 4: 11}, 'cmp': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}, 'att': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}, 'payds': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}, 'patd': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}, 'int': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}, 'ruatt': {0: 137, 1: 0, 2: 0, 3: 0, 4: 0}, 'ruyds': {0: 499, 1: 0, 2: 0, 3: 0, 4: 0}, 'rutd': {0: 2, 1: 0, 2: 0, 3: 0, 4: 0}, 'tar': {0: 31, 1: 39, 2: 17, 3: 12, 4: 55}, 'rec': {0: 24, 1: 29, 2: 14, 3: 6, 4: 33}, 'rzatt': {0: 22, 1: 0, 2: 0, 3: 0, 4: 0}, 'rztar': {0: 5, 1: 8, 2: 2, 3: 5, 4: 7}, 'reyds': {0: 156, 1: 378, 2: 121, 3: 98, 4: 371}, 'retd': {0: 0, 1: 4, 2: 0, 3: 1, 4: 3}, 'fuml': {0: 1, 1: 0, 2: 0, 3: 0, 4: 0}, 'putd': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}, 'krtd': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}, 'fumtd': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}, '2ptpa': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}, '2ptru': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}, '2ptre': {0: 0, 1: 0, 2: 0, 3: 0, 4: 1}, 'pct': {0: '0.00%', 1: '0.00%', 2: '0.00%', 3: '0.00%', 4: '0.00%'}, 'ruypc': {0: 3.64, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}, 'reypc': {0: 6.5, 1: 13.03, 2: 8.64, 3: 16.33, 4: 11.24}, 'tchs': {0: 161, 1: 29, 2: 14, 3: 6, 4: 33}, 'tyds': {0: 655, 1: 378, 2: 121, 3: 98, 4: 371}, 'team': {0: 'NOS', 1: 'TBB', 2: 'GBP', 3: 'CHI', 4: 'LAC'}, 'pos': {0: 'RB', 1: 'TE', 2: 'TE', 3: 'TE', 4: 'TE'}, 'ypc': {0: 6.5, 1: 13.03448275862069, 2: 8.642857142857142, 3: 16.333333333333332, 4: 11.242424242424242}}
You already filtered when you wrote yards_leader = wide_receiver.loc[wide_receiver['ypc'] == wr_ypc], so now just apply that same concept again.
import pandas as pd
sample_dict = {'id': {0: 11706, 1: 11791, 2: 11792, 3: 11793, 4: 11810}, 'name': {0: 'Mark Ingram', 1: 'Rob Gronkowski', 2: 'Marcedes Lewis', 3: 'Jimmy Graham', 4: 'Jared Cook'}, 'fpts': {0: 100.5, 1: 90.8, 2: 26.1, 3: 21.8, 4: 90.1}, 'gp': {0: 11, 1: 6, 2: 12, 3: 9, 4: 11}, 'cmp': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}, 'att': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}, 'payds': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}, 'patd': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}, 'int': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}, 'ruatt': {0: 137, 1: 0, 2: 0, 3: 0, 4: 0}, 'ruyds': {0: 499, 1: 0, 2: 0, 3: 0, 4: 0}, 'rutd': {0: 2, 1: 0, 2: 0, 3: 0, 4: 0}, 'tar': {0: 31, 1: 39, 2: 17, 3: 12, 4: 55}, 'rec': {0: 24, 1: 29, 2: 14, 3: 6, 4: 33}, 'rzatt': {0: 22, 1: 0, 2: 0, 3: 0, 4: 0}, 'rztar': {0: 5, 1: 8, 2: 2, 3: 5, 4: 7}, 'reyds': {0: 156, 1: 378, 2: 121, 3: 98, 4: 371}, 'retd': {0: 0, 1: 4, 2: 0, 3: 1, 4: 3}, 'fuml': {0: 1, 1: 0, 2: 0, 3: 0, 4: 0}, 'putd': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}, 'krtd': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}, 'fumtd': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}, '2ptpa': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}, '2ptru': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}, '2ptre': {0: 0, 1: 0, 2: 0, 3: 0, 4: 1}, 'pct': {0: '0.00%', 1: '0.00%', 2: '0.00%', 3: '0.00%', 4: '0.00%'}, 'ruypc': {0: 3.64, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}, 'reypc': {0: 6.5, 1: 13.03, 2: 8.64, 3: 16.33, 4: 11.24}, 'tchs': {0: 161, 1: 29, 2: 14, 3: 6, 4: 33}, 'tyds': {0: 655, 1: 378, 2: 121, 3: 98, 4: 371}, 'team': {0: 'NOS', 1: 'TBB', 2: 'GBP', 3: 'CHI', 4: 'LAC'}, 'pos': {0: 'RB', 1: 'TE', 2: 'TE', 3: 'TE', 4: 'TE'}, 'ypc': {0: 6.5, 1: 13.03448275862069, 2: 8.642857142857142, 3: 16.333333333333332, 4: 11.242424242424242}}
sample_df = pd.DataFrame(sample_dict)
filtered_sample_df = sample_df[sample_df['reyds'] > 200]
Output:
print(sample_df)
id name fpts gp cmp ... tchs tyds team pos ypc
0 11706 Mark Ingram 100.5 11 0 ... 161 655 NOS RB 6.500000
1 11791 Rob Gronkowski 90.8 6 0 ... 29 378 TBB TE 13.034483
2 11792 Marcedes Lewis 26.1 12 0 ... 14 121 GBP TE 8.642857
3 11793 Jimmy Graham 21.8 9 0 ... 6 98 CHI TE 16.333333
4 11810 Jared Cook 90.1 11 0 ... 33 371 LAC TE 11.242424
[5 rows x 33 columns]
print(filtered_sample_df)
id name fpts gp cmp ... tchs tyds team pos ypc
1 11791 Rob Gronkowski 90.8 6 0 ... 29 378 TBB TE 13.034483
4 11810 Jared Cook 90.1 11 0 ... 33 371 LAC TE 11.242424
[2 rows x 33 columns]
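Putting both filters together, a minimal sketch could look like the one below. It uses a few columns from the sample rows above. Since that sample has no 'WR' rows, it filters on 'TE' instead. The names players, eligible and leader are only illustrative.

```python
import pandas as pd

# A few columns from the sample rows above
players = pd.DataFrame({
    "name": ["Mark Ingram", "Rob Gronkowski", "Marcedes Lewis",
             "Jimmy Graham", "Jared Cook"],
    "pos": ["RB", "TE", "TE", "TE", "TE"],
    "rec": [24, 29, 14, 6, 33],
    "reyds": [156, 378, 121, 98, 371],
})
players["ypc"] = players["reyds"] / players["rec"]

# Apply the position filter and the yardage threshold in one boolean mask
eligible = players[(players["pos"] == "TE") & (players["reyds"] > 200)]

# idxmax returns the index label of the row with the highest ypc
leader = eligible.loc[eligible["ypc"].idxmax()]
print(leader["name"])
```

Without the yardage threshold the highest ypc in this sample belongs to Jimmy Graham, who has only 6 receptions, which is exactly the situation described in the question.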
I would like to save an Excel file to CSV, but with the empty cells filled in.
My code:
import pandas as pd
import openpyxl
in_xls = 'excel01.xlsx'
sheet = 'Arkusz1'
with pd.ExcelFile(in_xls, engine="openpyxl") as ex:
excel = pd.read_excel(ex, sheet, header=None)
excel.to_csv('excel_out.csv', index=False)
My excel file:
As a dictionary (use pd.DataFrame(d)):
d = {0: {0: 'TEST\nPpp666', 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 1: {0: 39191012, 1: 39191012, 2: 39191012, 3: 39191012, 4: 39191012, 5: 39191012}, 2: {0: 5906194003016, 1: 5906194003023, 2: 5906194003030, 3: 5906194003054, 4: 5906194003085, 5: 5906194003115}, 3: {0: 'DN-113H181-0019018', 1: 'DN-113H182-0019018', 2: 'DN-113H183-0019018', 3: 'DN-113H185-0019018', 4: 'DN-113H188-0019018', 5: 'DN-113H18T-K019018'}, 4: {0: 'Pierwszy, drugi\nTrzeci\nCzwarty', 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 5: {0: 'czarny', 1: 'czerwony', 2: 'niebieski', 3: 'biały ', 4: 'żółty', 5: 'Tęcza 5 kolorów'}, 6: {0: 100, 1: 100, 2: 100, 3: 100, 4: 100, 5: 20}, 7: {0: '19mmx18m', 1: '19mmx18m', 2: '19mmx18m', 3: '19mmx18m', 4: '19mmx18m', 5: '19mmx18m'}, 8: {0: '5,80', 1: '5,80', 2: '5,80', 3: '5,80', 4: '5,80', 5: '29,40'}}
My output file csv:
0,1,2,3,4,5,6,7,8
"TEST
Ppp666",39191012,5906194003016,DN-113H181-0019018,"Pierwszy, drugi
Trzeci
Czwarty",czarny,100,19mmx18m,5.8
,39191012,5906194003023,DN-113H182-0019018,,czerwony,100,19mmx18m,5.8
,39191012,5906194003030,DN-113H183-0019018,,niebieski,100,19mmx18m,5.8
,39191012,5906194003054,DN-113H185-0019018,,biały ,100,19mmx18m,5.8
,39191012,5906194003085,DN-113H188-0019018,,żółty,100,19mmx18m,5.8
,39191012,5906194003115,DN-113H18T-K019018,,Tęcza 5 kolorów,20,19mmx18m,29.4
I would like to get a CSV file
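The desired CSV is not shown here, so the following is only a guess at the intent: if the empty cells come from merged cells in Excel and the goal is to repeat the merged value on every row, a forward fill on those columns is one possible sketch. Treating columns 0 and 4 as the merged ones is an assumption based on the dictionary above, and only a subset of its rows and columns is used.

```python
import numpy as np
import pandas as pd

# A subset of the dictionary shown above (three rows, four columns)
excel = pd.DataFrame({
    0: ["TEST\nPpp666", np.nan, np.nan],
    1: [39191012, 39191012, 39191012],
    4: ["Pierwszy, drugi\nTrzeci\nCzwarty", np.nan, np.nan],
    5: ["czarny", "czerwony", "niebieski"],
})

# Assumption: columns 0 and 4 are the ones that were merged in Excel
merged_cols = [0, 4]

# Carry the last seen value down into the empty cells
excel[merged_cols] = excel[merged_cols].ffill()

csv_text = excel.to_csv(index=False)
print(csv_text)
```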
This is related to this SO question: read_excel in pandas giving error for no header and multiple index_col's
But instead of a workaround, I would like to know why this is happening. The data frame:
The data:
{0: {0: nan, 1: nan, 2: nan, 3: 'A', 4: 'A', 5: 'B', 6: 'B', 7: 'C', 8: 'C'},
1: {0: nan, 1: nan, 2: nan, 3: 1.0, 4: 2.0, 5: 1.0, 6: 2.0, 7: 1.0, 8: 2.0},
2: {0: 'AA1', 1: 'a', 2: 'ng/mL', 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1},
3: {0: 'AA2', 1: 'a', 2: nan, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1},
4: {0: 'BB1', 1: 'b', 2: nan, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1},
5: {0: 'BB2', 1: 'b', 2: 'mL', 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1},
6: {0: 'CC1', 1: 'c', 2: nan, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1},
7: {0: 'CC2', 1: 'c', 2: nan, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1}}
Reading the data like:
pd.read_excel(file_path, skiprows=3, index_col=[0, 1], header=None)
Does not work:
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
Why?
The explanation is given in the full traceback:
....
File "D:\Programme\Python36\lib\site-packages\pandas\io\excel\_base.py", line 473, in parse
    offset = 1 + header
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
header is set to None and in calculating the offset it tries to add 1 to None which results in a TypeError. I think it's simply a bug.
The following is absolutely without any warranty:
Line 473 of ...\Lib\site-packages\pandas\io\excel\_base.py should be changed from offset = 1 + header to offset = 1 + header if header is not None else -1 to make multiple index columns work with header=None.
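The failing expression can be reproduced without pandas, as in the sketch below. patched_offset is a hypothetical helper that only mirrors the one-line change suggested above; it is not pandas code. As an alternative to editing the installed library, the last lines show setting the index after reading. There raw is a small made-up stand-in for what pd.read_excel(file_path, skiprows=3, header=None) might return.

```python
import pandas as pd

# Reproduce the failing expression from the traceback
header = None
try:
    offset = 1 + header
except TypeError as exc:
    message = str(exc)
print(message)


def patched_offset(header):
    """Hypothetical helper mirroring the suggested one-line change."""
    return 1 + header if header is not None else -1


# Alternative without touching pandas: read without index_col,
# then build the two-level index afterwards.
# raw is a stand-in for pd.read_excel(file_path, skiprows=3, header=None)
raw = pd.DataFrame({
    0: ["A", "A", "B"],
    1: [1.0, 2.0, 1.0],
    2: [1, 1, 1],
    3: [1, 1, 1],
})
indexed = raw.set_index([0, 1])
print(indexed)
```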