How to plot only True signals with a Plotly candlestick chart - Python

Here is the code I am using:
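import numpy as np
import plotly.graph_objects as go
# 'green' where both conditions hold, 'red' everywhere else
df['C'] = np.where((df['spread'] > 60) & (df['volume'] > df['Ma_mult_high']),'green','red')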
fig = go.Figure()
# add OHLC trace
fig.add_trace(go.Candlestick(x=df.index,
open=df['open'],
high=df['high'],
low=df['low'],
close=df['close'],
showlegend=False))
# add moving average traces
fig.add_trace(go.Scatter(x=df.index,
y=df['ma'],
opacity=0.7,
line=dict(color='blue', width=2),
name='MA 5'))
fig.add_trace(go.Scatter(
x = df.index,
y = df['close'],
mode = 'markers',
marker_color=df.C
))
fig.update_layout(xaxis_rangeslider_visible=False).show()
The output:
In the image you can see that both True and False signals are plotted, maybe because marker_color is set to the C column. If I change that and use only a color name, nothing is plotted. Changing y = df['close'] gives the same problem.
data {'timeStamp': {0: 1657220400000, 1: 1657222200000, 2: 1657224000000, 3: 1657225800000, 4: 1657227600000}, 'open': {0: 21357.7, 1: 21495.84, 2: 21812.46, 3: 21641.56, 4: 21624.03}, 'high': {0: 21499.87, 1: 21837.74, 2: 21838.1, 3: 21659.99, 4: 21727.87}, 'low': {0: 21325.0, 1: 21439.13, 2: 21526.4, 3: 21541.96, 4: 21567.56}, 'close': {0: 21495.83, 1: 21812.47, 2: 21641.56, 3: 21624.03, 4: 21619.57}, 'volume': {0: 3663.2089, 1: 7199.91652, 2: 4367.94336, 3: 1841.10043, 4: 1786.17022}, 'quoteVolume': {0: 78386481.2224664, 1: 155885063.7202956, 2: 94605455.6190078, 3: 39756576.8814698, 4: 38684342.7232105}, 'tradesCount': {0: 59053, 1: 111142, 2: 81136, 3: 56148, 4: 53122}, 'date': {0: Timestamp('2022-07-07 19:00:00'), 1: Timestamp('2022-07-07 19:30:00'), 2: Timestamp('2022-07-07 20:00:00'), 3: Timestamp('2022-07-07 20:30:00'), 4: Timestamp('2022-07-07 21:00:00')}, 'Avg_Volume': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, 'Ma_mult_high': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, 'Ma_mult_mid': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, 'spread': {0: 78.9901069365825, 1: 79.43353152203923, 2: 54.82836060314386, 3: 14.85215623146836, 4: 2.782109662528346}, 'Marker': {0: 21502.87, 1: 21840.74, 2: 21523.4, 3: 21538.96, 4: 21564.56}, 'Symbol': {0: 'triangle-up', 1: 'triangle-up', 2: 'triangle-down', 3: 'triangle-down', 4: 'triangle-down'}, 'ma': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}, 'C': {0: 'red', 1: 'red', 2: 'red', 3: 'red', 4: 'red'}}

It seems to me that the issue is in your np.where() statement: the NaN values in Ma_mult_high make df['volume'] > df['Ma_mult_high'] evaluate to False, which results in 'red'.
Try this:
df['C'] = np.where((df['spread'] > 60) & (df['volume'] > df['Ma_mult_high'].fillna(0)),'green','red')
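To draw markers only where the signal is True, one option is to filter the rows first instead of colouring every row. The following is a minimal sketch with made-up Ma_mult_high values (the sample data in the question has only NaN there); the variable names signal and signals are my own.

```python
import numpy as np
import pandas as pd

# Small frame shaped like the question's data; Ma_mult_high values are invented
df = pd.DataFrame({
    "close": [21495.83, 21812.47, 21641.56],
    "spread": [78.99, 79.43, 54.83],
    "volume": [3663.2, 7199.9, 4367.9],
    "Ma_mult_high": [np.nan, 5000.0, 4000.0],
})

# Any comparison against NaN is False, so row 0 fails even though spread > 60
signal = (df["spread"] > 60) & (df["volume"] > df["Ma_mult_high"])

# Keep only the rows where the signal is True
signals = df[signal]

# These could then be passed to the marker trace, for example:
# fig.add_trace(go.Scatter(x=signals.index, y=signals["close"],
#                          mode="markers", marker_color="green"))
```

Because the trace only receives the filtered rows, no 'red' markers are drawn at all.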

Related

Data Manipulating one dataframe into another using for loops and dictionaries

I have a data set that I need to reformat so that I can plot it and work with it further. It is a sort of transpose operation, but I am struggling not to overwrite the data in the new dataframe. I sorted out the headings using dictionaries, and they map the fields from the original df to the new output df correctly. The loop just keeps overwriting the first entry instead of adding a new POLY/POLY_NAME.
Input dataframe:
Output dataframe:
Below is my code so far:
import pandas as pd
fractions = {"A": 1.35, "B": 1.40, "C": 1.45}
quality = {"POLY_NAME":"POLY", "AS":"Ash", "CV":"CV","FC":"FC","MS":"Moist","TS":"Tots","VM":"Vols","YL":"Yield"}
frac = list(fractions.values())
headers = list(quality.values())
df = pd.DataFrame(columns=headers, index=frac)
wash_dic = {'POLY_NAME': {0: 'Asset 1', 1: 'Asset 2', 2: 'Asset 3'},
'RD': {0: 1.63, 1: 1.63, 2: 1.57},
'SEAMTH': {0: 3.02, 1: 3.02, 2: 3.37},
'AAS': {0: 7.76, 1: 7.34, 2: 7.24},
'ACV': {0: 28.98, 1: 29.18, 2: 29.27},
'AFC': {0: 54.95, 1: 53.55, 2: 52.38},
'AMS': {0: 4.22, 1: 4.26, 2: 4.63},
'ATS': {0: 0.97, 1: 1.09, 2: 1.23},
'AVM': {0: 33.07, 1: 34.85, 2: 35.75},
'AYL': {0: 0.4, 1: 0.95, 2: 0.75},
'BAS': {0: 9.28, 1: 9.27, 2: 9.58},
'BCV': {0: 28.17, 1: 28.33, 2: 28.09},
'BFC': {0: 56.21, 1: 54.39, 2: 52.11},
'BMS': {0: 4.25, 1: 4.25, 2: 4.61},
'BTS': {0: 0.84, 1: 1.01, 2: 1.22},
'BVM': {0: 30.25, 1: 32.08, 2: 33.7},
'BYL': {0: 3.11, 1: 5.44, 2: 4.36},
'CAS': {0: 11.01, 1: 10.96, 2: 11.25},
'CCV': {0: 27.31, 1: 27.53, 2: 27.39},
'CFC': {0: 58.09, 1: 56.0, 2: 53.43},
'CMS': {0: 4.41, 1: 4.38, 2: 4.62},
'CTS': {0: 0.63, 1: 0.83, 2: 0.98},
'CVM': {0: 26.5, 1: 28.66, 2: 30.71},
'CYL': {0: 13.45, 1: 16.11, 2: 12.94}}
wash = pd.DataFrame(wash_dic)
wash
for label, content in wash.items():
    print('fraction:', fractions.get(label[0]), ' quality:', quality.get(label[-2:]))
    for c in content:
        try:
            df.loc[fractions.get(label[0]), quality.get(label[-2:])] = c
        except:
            pass
I have tried to add another for loop but the logic is escaping me currently.
Required outcome as dictionary
outcome = {'Unnamed: 0': {0: 1.35,
1: 1.4,
2: 1.45,
3: 1.35,
4: 1.4,
5: 1.45,
6: 1.35,
7: 1.4,
8: 1.45},
'POLY': {0: 'Asset 1',
1: 'Asset 1',
2: 'Asset 1',
3: 'Asset 2',
4: 'Asset 2',
5: 'Asset 2',
6: 'Asset 3',
7: 'Asset 3',
8: 'Asset 3'},
'Ash': {0: 7.76,
1: 9.28,
2: 11.01,
3: 7.34,
4: 9.27,
5: 10.96,
6: 7.24,
7: 9.58,
8: 11.25},
'CV': {0: 28.98,
1: 28.17,
2: 27.31,
3: 29.18,
4: 28.33,
5: 27.53,
6: 29.27,
7: 28.09,
8: 27.39},
'FC': {0: 54.95,
1: 56.21,
2: 58.09,
3: 53.55,
4: 54.39,
5: 56.0,
6: 52.38,
7: 52.11,
8: 53.43},
'Moist': {0: 4.22,
1: 4.25,
2: 4.41,
3: 4.26,
4: 4.25,
5: 4.38,
6: 4.63,
7: 4.61,
8: 4.62},
'Tots': {0: 0.97,
1: 0.84,
2: 0.63,
3: 1.09,
4: 1.01,
5: 0.83,
6: 1.23,
7: 1.22,
8: 0.98},
'Vols': {0: 33.07,
1: 30.25,
2: 26.5,
3: 34.85,
4: 32.08,
5: 28.66,
6: 35.75,
7: 33.7,
8: 30.71},
'Yield': {0: 0.4,
1: 3.11,
2: 13.45,
3: 0.95,
4: 5.44,
5: 16.11,
6: 0.75,
7: 4.36,
8: 12.94}}
Regards
I resolved the duplication/overwriting of values by first grouping the original wash DataFrame by POLY_NAME. Inside the loop, the data of each group is written into a blank DataFrame, which is added to the final DataFrame at the end of each iteration. For neatness I turned the index into a normal column and reordered the columns.
groups = wash.groupby("POLY_NAME")
frames = []
for name, group in groups:
    # one blank frame per asset, indexed by float fraction
    df = pd.DataFrame(columns=headers)
    for label, content in group.items():
        if quality.get(label[-2:]) in headers:
            for c in content:
                try:
                    df.loc[fractions.get(label[0]), "POLY"] = name
                    df.loc[fractions.get(label[0]), quality.get(label[-2:])] = c
                except:
                    pass
    frames.append(df)
# DataFrame.append was removed in pandas 2.0, so the frames are combined with pd.concat
df_final = pd.concat(frames)
df_final = df_final.reset_index().rename({'index':'FLOAT'}, axis = 'columns')
df_final = df_final.reindex(columns=["POLY","FLOAT","Ash","CV","FC","Moist","Tots","Vols","Yield"])
Might not be the neatest or fastest method but it gives the required results.
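A loop-free variant can be sketched with melt and pivot. This is not the code from the answer above, just one possible alternative; it uses a trimmed copy of the wash data (two assets, three qualities) and assumes every measurement column is named as one fraction letter followed by a two-letter quality code.

```python
import pandas as pd

fractions = {"A": 1.35, "B": 1.40, "C": 1.45}
quality = {"AS": "Ash", "CV": "CV", "YL": "Yield"}

# Trimmed copy of the question's data, kept short for illustration
wash = pd.DataFrame({
    "POLY_NAME": ["Asset 1", "Asset 2"],
    "RD": [1.63, 1.63],  # not a fraction/quality column, dropped below
    "AAS": [7.76, 7.34], "ACV": [28.98, 29.18], "AYL": [0.4, 0.95],
    "BAS": [9.28, 9.27], "BCV": [28.17, 28.33], "BYL": [3.11, 5.44],
})

# Wide to long: one row per (asset, column code)
long = wash.melt(id_vars="POLY_NAME", var_name="code", value_name="value")

# Split the code into fraction letter and quality suffix
long["FLOAT"] = long["code"].str[0].map(fractions)
long["quality"] = long["code"].str[1:].map(quality)

# Drop columns that match neither mapping, then go long to wide again
tidy = (
    long.dropna(subset=["FLOAT", "quality"])
        .pivot(index=["POLY_NAME", "FLOAT"], columns="quality", values="value")
        .reset_index()
        .rename(columns={"POLY_NAME": "POLY"})
)
tidy.columns.name = None
```

Each asset ends up with one row per float fraction, which is the layout shown in the required outcome.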

Pandas export Excel to CSV - merged cells

I would like to save an Excel sheet to CSV and fill in the cells that were merged.
My code:
import pandas as pd
import openpyxl
in_xls = 'excel01.xlsx'
sheet = 'Arkusz1'
with pd.ExcelFile(in_xls, engine="openpyxl") as ex:
excel = pd.read_excel(ex, sheet, header=None)
excel.to_csv('excel_out.csv', index=False)
My excel file:
The same data as a dictionary (load it with pd.DataFrame(d)):
d = {0: {0: 'TEST\nPpp666', 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 1: {0: 39191012, 1: 39191012, 2: 39191012, 3: 39191012, 4: 39191012, 5: 39191012}, 2: {0: 5906194003016, 1: 5906194003023, 2: 5906194003030, 3: 5906194003054, 4: 5906194003085, 5: 5906194003115}, 3: {0: 'DN-113H181-0019018', 1: 'DN-113H182-0019018', 2: 'DN-113H183-0019018', 3: 'DN-113H185-0019018', 4: 'DN-113H188-0019018', 5: 'DN-113H18T-K019018'}, 4: {0: 'Pierwszy, drugi\nTrzeci\nCzwarty', 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 5: {0: 'czarny', 1: 'czerwony', 2: 'niebieski', 3: 'biały ', 4: 'żółty', 5: 'Tęcza 5 kolorów'}, 6: {0: 100, 1: 100, 2: 100, 3: 100, 4: 100, 5: 20}, 7: {0: '19mmx18m', 1: '19mmx18m', 2: '19mmx18m', 3: '19mmx18m', 4: '19mmx18m', 5: '19mmx18m'}, 8: {0: '5,80', 1: '5,80', 2: '5,80', 3: '5,80', 4: '5,80', 5: '29,40'}}
My output file csv:
0,1,2,3,4,5,6,7,8
"TEST
Ppp666",39191012,5906194003016,DN-113H181-0019018,"Pierwszy, drugi
Trzeci
Czwarty",czarny,100,19mmx18m,5.8
,39191012,5906194003023,DN-113H182-0019018,,czerwony,100,19mmx18m,5.8
,39191012,5906194003030,DN-113H183-0019018,,niebieski,100,19mmx18m,5.8
,39191012,5906194003054,DN-113H185-0019018,,biały ,100,19mmx18m,5.8
,39191012,5906194003085,DN-113H188-0019018,,żółty,100,19mmx18m,5.8
,39191012,5906194003115,DN-113H18T-K019018,,Tęcza 5 kolorów,20,19mmx18m,29.4
I would like to get a CSV file
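The desired output is not shown, so the following is only a guess at the intent: if the goal is to repeat the value of a merged cell in every row it spans, a forward fill on those columns before to_csv might be enough. The choice of columns 0 and 4 as the merged ones is an assumption read off the sample data.

```python
import io

import numpy as np
import pandas as pd

# First three rows of a few columns from the question's dictionary
excel = pd.DataFrame({
    0: ["TEST\nPpp666", np.nan, np.nan],
    1: [39191012, 39191012, 39191012],
    4: ["Pierwszy, drugi\nTrzeci\nCzwarty", np.nan, np.nan],
    5: ["czarny", "czerwony", "niebieski"],
})

# Assumption: only these columns came from merged cells
merged_cols = [0, 4]

# A merged cell is read as one value followed by NaN, so copy the value down
excel[merged_cols] = excel[merged_cols].ffill()

# Written to a buffer here; pass a file name instead to write to disk
buf = io.StringIO()
excel.to_csv(buf, index=False, header=False)
csv_text = buf.getvalue()
```

Forward filling every column would also overwrite cells that are genuinely empty, which is why the sketch limits it to a named list.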

I am trying to plot countries on a world map. The problem is that only the USA is shown in the output.

fig = px.scatter_geo(df, locations="country", color = "country",
projection="natural earth")
fig.show()
On the output side, I am able to get the world map and in the legends, all the countries do appear. The problem is the countries are not shown on the map.
Here is the snap of the sample data:
{'id': {0: '72b83200-4881-4806-b910-af86905256c4',
1: '5db5df19-c06b-489a-b2f4-c2ffc26643ba',
2: '6c9e4f0d-ef87-497f-97af-df207a25331d',
3: '004bf779-368d-47ae-b3cc-07b0ecad2464',
4: '8a2265d9-1f81-4c47-953f-0d4bfab326c0'},
'name': {0: 'BALCO BRANDS PTY LTD',
1: 'Bambury',
2: 'Bata Shoe Company of Australia',
3: 'Bean Body Care',
4: 'Caprice Australia '},
'canonical_name': {0: 'balcobrands',
1: 'bambury',
2: 'batashoecompanyofaustralia',
3: 'beanbodycare',
4: 'capriceaustralia'},
'url': {0: 'http://www.balcobrands.com',
1: 'http://www.bambury.com.au',
2: 'http://www.bataindustrials.com.au',
3: 'https://global.beanbodycare.com',
4: 'http://www.caprice.com.au'},
'type': {0: 3, 1: 3, 2: 3, 3: 3, 4: 3},
'address': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
'city': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
'state': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
'country': {0: 'Australia',
1: 'Australia',
2: 'Australia',
3: 'Australia',
4: 'Australia'},
'country_code': {0: 'AU', 1: 'AU', 2: 'AU', 3: 'AU', 4: 'AU'},
'created_at': {0: '2020-04-01 20:52:38.098099',
1: '2020-04-01 20:52:38.364935',
2: '2020-04-01 20:52:38.636768',
3: '2020-04-01 20:52:38.951573',
4: '2020-04-01 20:52:39.271376'},
'created_by': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
'updated_at': {0: '2020-04-01 20:52:38.098099',
1: '2020-04-01 20:52:38.364935',
2: '2020-04-01 20:52:38.636768',
3: '2020-04-01 20:52:38.951573',
4: '2020-04-01 20:52:39.271376'},
'updated_by': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}}
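The data did not contain the three-letter (ISO 3166-1 alpha-3) country codes that px.scatter_geo expects in locations by default (locationmode='ISO-3'), so full country names such as 'Australia' are not matched. When the data was merged with another dataset that had three-letter country codes, the required output was obtained.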
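The merge step could look roughly like this. The lookup table ISO2_TO_ISO3 is a hand-written, illustrative subset rather than a real dataset, the column name iso_alpha3 is my own choice, and the 'Acme' row is invented so that more than one country appears.

```python
import pandas as pd

# Two rows from the question plus one invented row for a second country
df = pd.DataFrame({
    "name": ["Bambury", "Bean Body Care", "Acme"],
    "country_code": ["AU", "AU", "US"],
})

# Illustrative subset of a two-letter to three-letter code mapping
ISO2_TO_ISO3 = {"AU": "AUS", "US": "USA"}

# Add the three-letter code that the default locationmode understands
df["iso_alpha3"] = df["country_code"].map(ISO2_TO_ISO3)

# Optional: one row per country with the number of companies
per_country = df.groupby("iso_alpha3", as_index=False).size()

# The plot call would then point at the new column, for example:
# fig = px.scatter_geo(per_country, locations="iso_alpha3", size="size",
#                      projection="natural earth")
```

Codes missing from the lookup come back as NaN, so it is worth checking df["iso_alpha3"].isna() before plotting.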

Python - making scatterplot from non-numeric values

I have a csv file with data that I have imported into a dataframe.
RI_df = pd.read_csv("../Week15/police.csv")
Using .head() my data looks like this:
state stop_date stop_time county_name driver_gender driver_race violation_raw violation search_conducted search_type stop_outcome is_arrested stop_duration drugs_related_stop district
0 RI 2005-01-04 12:55 NaN M White Equipment/Inspection Violation Equipment False NaN Citation False 0-15 Min False Zone X4
1 RI 2005-01-23 23:15 NaN M White Speeding Speeding False NaN Citation False 0-15 Min False Zone K3
2 RI 2005-02-17 04:15 NaN M White Speeding Speeding False NaN Citation False 0-15 Min False Zone X4
3 RI 2005-02-20 17:15 NaN M White Call for Service Other False NaN Arrest Driver
RI_df.head().to_dict()
Out[55]:
{'state': {0: 'RI', 1: 'RI', 2: 'RI', 3: 'RI', 4: 'RI'},
'stop_date': {0: '2005-01-04',
1: '2005-01-23',
2: '2005-02-17',
3: '2005-02-20',
4: '2005-02-24'},
'stop_time': {0: '12:55', 1: '23:15', 2: '04:15', 3: '17:15', 4: '01:20'},
'county_name': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
'driver_gender': {0: 'M', 1: 'M', 2: 'M', 3: 'M', 4: 'F'},
'driver_race': {0: 'White', 1: 'White', 2: 'White', 3: 'White', 4: 'White'},
'violation_raw': {0: 'Equipment/Inspection Violation',
1: 'Speeding',
2: 'Speeding',
3: 'Call for Service',
4: 'Speeding'},
'violation': {0: 'Equipment',
1: 'Speeding',
2: 'Speeding',
3: 'Other',
4: 'Speeding'},
'search_conducted': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
'search_type': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
'stop_outcome': {0: 'Citation',
1: 'Citation',
2: 'Citation',
3: 'Arrest Driver',
4: 'Citation'},
'is_arrested': {0: False, 1: False, 2: False, 3: True, 4: False},
'stop_duration': {0: '0-15 Min',
1: '0-15 Min',
2: '0-15 Min',
3: '16-30 Min',
4: '0-15 Min'},
'drugs_related_stop': {0: False, 1: False, 2: False, 3: False, 4: False},
'district': {0: 'Zone X4',
1: 'Zone K3',
2: 'Zone X4',
3: 'Zone X1',
4: 'Zone X3'}}
RI_df['drugs_related_stop'].value_counts()
Out[27]:
False 90879
True 862
Name: drugs_related_stop, dtype: int64
I am trying to take the true value counts of "drug related stops" and put them on a line graph, in order to see if "drug related stops" have been increasing over time.
ax = RI_df['drugs_related_stop'].value_counts().plot(kind='line',
figsize=(10,8),
title="Drug stops")
ax.set_xlabel("drug stops")
ax.set_ylabel("number of stops")
You should just use groupby().count()
ax = df.groupby('stop_date', as_index=False).count().plot(kind='line',
figsize=(10,8), title="Drug stops", x='stop_date',
y='district')
Here is the complete code so you can double-check:
import pandas as pd
import numpy as np
df = pd.DataFrame({'state': {0: 'RI', 1: 'RI', 2: 'RI', 3: 'RI', 4: 'RI'},
'stop_date': {0: '2005-01-23',
1: '2005-01-23',
2: '2005-02-17',
3: '2005-02-17',
4: '2005-02-24'},
'stop_time': {0: '12:55', 1: '23:15', 2: '04:15', 3: '17:15', 4: '01:20'},
'county_name': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan},
'driver_gender': {0: 'M', 1: 'M', 2: 'M', 3: 'M', 4: 'F'},
'driver_race': {0: 'White', 1: 'White', 2: 'White', 3: 'White', 4: 'White'},
'violation_raw': {0: 'Equipment/Inspection Violation',
1: 'Speeding',
2: 'Speeding',
3: 'Call for Service',
4: 'Speeding'},
'violation': {0: 'Equipment',
1: 'Speeding',
2: 'Speeding',
3: 'Other',
4: 'Speeding'},
'search_conducted': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
'search_type': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan},
'stop_outcome': {0: 'Citation',
1: 'Citation',
2: 'Citation',
3: 'Arrest Driver',
4: 'Citation'},
'is_arrested': {0: False, 1: False, 2: False, 3: True, 4: False},
'stop_duration': {0: '0-15 Min',
1: '0-15 Min',
2: '0-15 Min',
3: '16-30 Min',
4: '0-15 Min'},
'drugs_related_stop': {0: False, 1: False, 2: False, 3: False, 4: False},
'district': {0: 'Zone X4',
1: 'Zone K3',
2: 'Zone X4',
3: 'Zone X1',
4: 'Zone X3'}})
ax = df.groupby('stop_date', as_index=False).count().plot(kind='line',
figsize=(10,8), title="Drug stops", x='stop_date',
y='district')
This is the plot I get with the code above.
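Note that counting district per stop_date counts every stop, not only the drug related ones. If the goal is the number of drug related stops over time, one option is to sum the boolean column per period. A minimal sketch with invented True/False values (the five sample rows in the question are all False), grouped by year as an assumed granularity:

```python
import pandas as pd

# Invented sample: the dates and flags are for illustration only
stops = pd.DataFrame({
    "stop_date": ["2005-01-04", "2005-01-23", "2006-02-17",
                  "2006-02-20", "2006-02-24"],
    "drugs_related_stop": [False, True, True, False, True],
})

# Parse the strings so the year can be extracted
stops["stop_date"] = pd.to_datetime(stops["stop_date"])

# Summing booleans counts the True values in each year
drug_stops_per_year = (
    stops.groupby(stops["stop_date"].dt.year)["drugs_related_stop"].sum()
)

# With matplotlib installed this can be drawn as a line:
# ax = drug_stops_per_year.plot(kind="line", figsize=(10, 8), title="Drug stops")
```

Grouping by year keeps the line readable; grouping by the raw date works the same way but gives one point per day.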

Applying a lambda function with three arguments within a Group By

Currently attempting to create a function where I divide columns in my DataFrame called DF_1 and group them by a dimension column in the same DataFrame.
The below code is attempting to achieve this by first grouping by the dimension column and applying a lambda function to each of the columns that I am trying to divide in order to get the average of each of the metrics i.e. cost per conversions, or cost per click.
Unfortunately, I am unsure how to accomplish this. The below code gives an error of TypeError: lambda() takes 2 positional arguments but 3 were given
calc_1 = DF_1[['Conversions_10D', 'Total_Revenue', 'Total_Revenue', 'Clicks', 'Spend']]
calc_2 = DF_1[['Impressions', 'Spend', 'Conversions_10D', 'Impressions', 'Clicks' ]]
def agg_avg(df, group_field, list_a, list_b):
    grouped = df.groupby(group_field, as_index = False).apply(lambda x, y: x/y, list_a, list_b)
    grouped = pd.DataFrame(grouped).reset_index(drop = True)
    return grouped
{'Date': {0: '2018-02-28', 1: '2018-02-28', 2: '2018-02-28', 3: '2018-02-28', 4: '2018-02-28'}, 'Audience_Category': {0: 'Affinity', 1: 'Affinity', 2: 'Affinity', 3: 'Affinity', 4: 'Affinity'},
'Demo': {0: 'F25-34', 1: 'F25-34', 2: 'F25-34', 3: 'F25-34', 4: 'F25-34'}, 'Gender': {0: 'Female', 1: 'Female', 2: 'Female', 3: 'Female', 4: 'Female'},
'Device': {0: 'Android', 1: 'Android', 2: 'Android', 3: 'Android', 4: 'Android'},
'Creative': {0: 'Bubble:15', 1: 'Bubble:30', 2: 'Wide :15', 3: 'Oscar :15', 4: 'Oscar :30'},
'Impressions': {0: 3834, 1: 3588, 2: 3831, 3: 3876, 4: 3676},
'Clicks': {0: 2.0, 1: 0.0, 2: 4.0, 3: 2.0, 4: 1.0},
'Conversions_10D': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'Total_Revenue': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}, 'Spend': {0: 28.600707059999991, 1: 25.95319236000001, 2: 28.29383795999998, 3: 29.287063200000013, 4: 26.514734159999968},
'Demo_Category': {0: 'Narrow', 1: 'Broad', 2: 'Narrow', 3: 'Broad', 4: 'Narrow'},
'CPM_Efficiency': {0: 'Low CPM', 1: 'Low CPM', 2: 'Low CPM', 3: 'Low CPM', 4: 'Low CPM'}}
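The TypeError comes from how groupby().apply() calls the function: each group is passed as the first argument and the extra arguments follow, so the two-argument lambda receives the group plus list_a and list_b. One way around it is to sum per group first and divide the totals afterwards. The sketch below assumes that "cost per click" means total Spend divided by total Clicks within each group; the function name agg_ratio and the output column names are my own.

```python
import pandas as pd

# A few columns from the question's sample, Spend rounded for readability
DF_1 = pd.DataFrame({
    "Demo_Category": ["Narrow", "Broad", "Narrow", "Broad", "Narrow"],
    "Impressions": [3834, 3588, 3831, 3876, 3676],
    "Clicks": [2.0, 0.0, 4.0, 2.0, 1.0],
    "Spend": [28.6, 25.95, 28.29, 29.29, 26.51],
})

def agg_ratio(df, group_field, numerators, denominators):
    """Sum per group, then divide each numerator total by its denominator total."""
    totals = df.groupby(group_field).sum(numeric_only=True)
    out = pd.DataFrame(index=totals.index)
    for num, den in zip(numerators, denominators):
        out[f"{num}_per_{den}"] = totals[num] / totals[den]
    return out.reset_index()

result = agg_ratio(
    DF_1,
    "Demo_Category",
    numerators=["Spend", "Clicks"],
    denominators=["Clicks", "Impressions"],
)
```

Dividing the group totals gives a weighted ratio; averaging row-by-row ratios would be a different metric and would hit division by zero on rows with 0 clicks.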
