Pandas export Excel to CSV - merg cell - python

I would like to save excel to CSV but make up the cells
My code:
import pandas as pd
import openpyxl
in_xls = 'excel01.xlsx'
sheet = 'Arkusz1'
with pd.ExcelFile(in_xls, engine="openpyxl") as ex:
excel = pd.read_excel(ex, sheet, header=None)
excel.to_csv('excel_out.csv', index=False)
My excel file:
enter link description here
as a dictionary use pd.DatFrame(d)
d = {0: {0: 'TEST\nPpp666', 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 1: {0: 39191012, 1: 39191012, 2: 39191012, 3: 39191012, 4: 39191012, 5: 39191012}, 2: {0: 5906194003016, 1: 5906194003023, 2: 5906194003030, 3: 5906194003054, 4: 5906194003085, 5: 5906194003115}, 3: {0: 'DN-113H181-0019018', 1: 'DN-113H182-0019018', 2: 'DN-113H183-0019018', 3: 'DN-113H185-0019018', 4: 'DN-113H188-0019018', 5: 'DN-113H18T-K019018'}, 4: {0: 'Pierwszy, drugi\nTrzeci\nCzwarty', 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 5: {0: 'czarny', 1: 'czerwony', 2: 'niebieski', 3: 'biały ', 4: 'żółty', 5: 'Tęcza 5 kolorów'}, 6: {0: 100, 1: 100, 2: 100, 3: 100, 4: 100, 5: 20}, 7: {0: '19mmx18m', 1: '19mmx18m', 2: '19mmx18m', 3: '19mmx18m', 4: '19mmx18m', 5: '19mmx18m'}, 8: {0: '5,80', 1: '5,80', 2: '5,80', 3: '5,80', 4: '5,80', 5: '29,40'}}
My output file csv:
0,1,2,3,4,5,6,7,8
"TEST
Ppp666",39191012,5906194003016,DN-113H181-0019018,"Pierwszy, drugi
Trzeci
Czwarty",czarny,100,19mmx18m,5.8
,39191012,5906194003023,DN-113H182-0019018,,czerwony,100,19mmx18m,5.8
,39191012,5906194003030,DN-113H183-0019018,,niebieski,100,19mmx18m,5.8
,39191012,5906194003054,DN-113H185-0019018,,biały ,100,19mmx18m,5.8
,39191012,5906194003085,DN-113H188-0019018,,żółty,100,19mmx18m,5.8
,39191012,5906194003115,DN-113H18T-K019018,,Tęcza 5 kolorów,20,19mmx18m,29.4
I would like to get a CSV file

Related

how to plot only True signal with plotly candlestick chart

Here is the code I am using:
df['C'] = np.where((df['spread'] > 60) & (df['volume'] > df['Ma_mult_high']),'green','red')
fig = go.Figure()
# add OHLC trace
fig.add_trace(go.Candlestick(x=df.index,
open=df['open'],
high=df['high'],
low=df['low'],
close=df['close'],
showlegend=False))
# add moving average traces
fig.add_trace(go.Scatter(x=df.index,
y=df['ma'],
opacity=0.7,
line=dict(color='blue', width=2),
name='MA 5'))
fig.add_trace(go.Scatter(
x = df.index,
y = df['close'],
mode = 'markers',
marker_color=df.C
))
fig.update_layout(xaxis_rangeslider_visible=False).show()`
the output
in the image, you can see that plot both True and false signal, maybe because the marker_color = "C" but if change that and use only color names it will plot noting even if i change the y = df['close'], i get the same problem
data {'timeStamp': {0: 1657220400000, 1: 1657222200000, 2: 1657224000000, 3: 1657225800000, 4: 1657227600000}, 'open': {0: 21357.7, 1: 21495.84, 2: 21812.46, 3: 21641.56, 4: 21624.03}, 'high': {0: 21499.87, 1: 21837.74, 2: 21838.1, 3: 21659.99, 4: 21727.87}, 'low': {0: 21325.0, 1: 21439.13, 2: 21526.4, 3: 21541.96, 4: 21567.56}, 'close': {0: 21495.83, 1: 21812.47, 2: 21641.56, 3: 21624.03, 4: 21619.57}, 'volume': {0: 3663.2089, 1: 7199.91652, 2: 4367.94336, 3: 1841.10043, 4: 1786.17022}, 'quoteVolume': {0: 78386481.2224664, 1: 155885063.7202956, 2: 94605455.6190078, 3: 39756576.8814698, 4: 38684342.7232105}, 'tradesCount': {0: 59053, 1: 111142, 2: 81136, 3: 56148, 4: 53122}, 'date': {0: Timestamp('2022-07-07 19:00:00'), 1: Timestamp('2022-07-07 19:30:00'), 2: Timestamp('2022-07-07 20:00:00'), 3: Timestamp('2022-07-07 20:30:00'), 4: Timestamp('2022-07-07 21:00:00')}, 'Avg_Volume': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, 'Ma_mult_high': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, 'Ma_mult_mid': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, 'spread': {0: 78.9901069365825, 1: 79.43353152203923, 2: 54.82836060314386, 3: 14.85215623146836, 4: 2.782109662528346}, 'Marker': {0: 21502.87, 1: 21840.74, 2: 21523.4, 3: 21538.96, 4: 21564.56}, 'Symbol': {0: 'triangle-up', 1: 'triangle-up', 2: 'triangle-down', 3: 'triangle-down', 4: 'triangle-down'}, 'ma': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}, 'C': {0: 'red', 1: 'red', 2: 'red', 3: 'red', 4: 'red'}}
It seems to me that the issue is in your np.where() statement, likely with the nan values in Ma_multi_high producing the false statement in df['volume'] > df['Ma_mult_high'] that result in 'red'.
Try this:
df['C'] = np.where((df['spread'] > 60) & (df['volume'] > df['Ma_mult_high'].fillna(0)),'green','red')

How to Fix Code to Avoid Stubnames Error (Python Pandas)?

dt = {'id': {0: 'x1', 1: 'x2', 2: 'x3', 3: 'x4', 4: 'x5', 5: 'x6', 6: 'x7', 7: 'x8', 8: 'x9', 9: 'x10'}, 'trt': {0: 'cnt', 1: 'cnt', 2: 'tr', 3: 'tr', 4: 'tr', 5: 'cnt', 6: 'tr', 7: 'tr', 8: 'cnt', 9: 'cnt'}, 'work.T1': {0: 0.6516556669957936, 1: 0.567737752571702, 2: 0.1135089821182191, 3: 0.5959253052715212, 4: 0.3580499750096351, 5: 0.4288094183430075, 6: 0.0519033221062272, 7: 0.2641776674427092, 8: 0.3987907308619469, 9: 0.8361341434065253}, 'play.T1': {0: 0.8647212258074433, 1: 0.6153524168767035, 2: 0.7751098964363337, 3: 0.3555686913896352, 4: 0.4058499720413238, 5: 0.7066469138953835, 6: 0.8382876652758569, 7: 0.2395891312044114, 8: 0.7707715332508087, 9: 0.3558977444190532}, 'talk.T1': {0: 0.5355970377568156, 1: 0.0930881295353174, 2: 0.169803041499108, 3: 0.8998324507847428, 4: 0.4226376069709658, 5: 0.7477464678231627, 6: 0.8226525799836963, 7: 0.9546536463312804, 8: 0.6854445093777031, 9: 0.5005032296758145}, 'work.T2': {0: 0.2754838624969125, 1: 0.2289039448369294, 2: 0.0144339059479534, 3: 0.7289645625278354, 4: 0.2498804717324674, 5: 0.1611832766793668, 6: 0.0170426501426845, 7: 0.4861003451514989, 8: 0.1029001718852669, 9: 0.8015470046084374}, 'play.T2': {0: 0.3543280649464577, 1: 0.9364325392525644, 2: 0.2458663922734558, 3: 0.4731414613779634, 4: 0.191560871200636, 5: 0.5832219698932022, 6: 0.4594731898978352, 7: 0.467434047954157, 8: 0.3998325555585325, 9: 0.5052855962421745}, 'talk.T2': {0: 0.0318881559651345, 1: 0.1144675880204886, 2: 0.468935475917533, 3: 0.3969867376144975, 4: 0.8336191941052675, 5: 0.7611217433586717, 6: 0.5733564489055425, 7: 0.447508045937866, 8: 0.0838020080700516, 9: 0.2191385473124683}}
mydt = pd.DataFrame(dt, columns = ['id', 'trt', 'work.T1', '', 'play.T1', 'talk.T1','work.T2', '', 'play.T2', 'talk.T2'])
So I have the above dataset and need to tidy it up. I have used the following code but it returns "ValueError: stubname can't be identical to a column name." How can I fix the code to avoid this problem?
names = ['play', 'talk', 'work']
activities = pd.wide_to_long(dt, stubnames=names, i='id', j='time', sep='.', suffix='T\d').sort_index().reset_index()
activities
Note: I am trying to get the dataframe to look like the following.
Changed :
activities = pd.wide_to_long(activities, stubnames=names, i='id', j='time', sep='.', suffix='T\d').sort_index().reset_index()
To:
activities = pd.wide_to_long(mydt, stubnames=names, i='id', j='time', sep='.', suffix='T\d').sort_index().reset_index()
and then it works.

Pandas Loc string giving KeyError

I am trying to pass into a url a date in the format 2015-12-20, search the pandas dataframe and do a model.predict on it.
The problem is that I am trying to convert a working code from the jupyter lab into the .py file in order to run everything on the flask server and following I can not transfer.
The following code only works if the 'Date' column is converted to datetime. If it is in object format, the following code also doesn't work.
data.loc[2015-12-06]
The above works but the following gives an error:
data.loc['2015-12-06']
KeyError: '2015-12-06'
How do I pass in the 2015-12-06 not as string for the .loc to work?
print(data.head(5).to_dict())
{'Date': {0: '2015-12-27', 1: '2015-12-20', 2: '2015-12-13', 3: '2015-12-06', 4: '2015-11-29'}, 'Total Volume': {0: 64236.62, 1: 54876.98, 2: 118220.22, 3: 78992.15, 4: 51039.6}, '4046': {0: 1036.74, 1: 674.28, 2: 794.7, 3: 1132.0, 4: 941.48}, '4225': {0: 54454.85, 1: 44638.81, 2: 109149.67, 3: 71976.41, 4: 43838.39}, '4770': {0: 48.16, 1: 58.33, 2: 130.5, 3: 72.58, 4: 75.78}, 'Total Bags': {0: 8696.87, 1: 9505.56, 2: 8145.35, 3: 5811.16, 4: 6183.95}, 'Small Bags': {0: 8603.62, 1: 9408.07, 2: 8042.21, 3: 5677.4, 4: 5986.26}, 'Large Bags': {0: 93.25, 1: 97.49, 2: 103.14, 3: 133.76, 4: 197.69}, 'XLarge Bags': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}, 'type': {0: 'conventional', 1: 'conventional', 2: 'conventional', 3: 'conventional', 4: 'conventional'}, 'year': {0: 2015, 1: 2015, 2: 2015, 3: 2015, 4: 2015}, 'region': {0: 'Albany', 1: 'Albany', 2: 'Albany', 3: 'Albany', 4: 'Albany'}}

Data Manipulating one dataframe into another using for loops and dictionaries

I have a data set that I need to reformat so that I can plot and work with it further. It is sort of an transpose action but I am struggling to not overwrite the data in the new dataframe. I sorted out the headings using dictionaries and it maps the fields from the original df to the new output df correctly. It is just overwriting the first entry and not adding a new POLY/POLY_NAME
Input dataframe:
Output dataframe:
Below is my code so far:
import pandas as pd
fractions = {"A": 1.35, "B": 1.40, "C": 1.45}
quality = {"POLY_NAME":"POLY", "AS":"Ash", "CV":"CV","FC":"FC","MS":"Moist","TS":"Tots","VM":"Vols","YL":"Yield"}
frac = list(fractions.values())
headers = list(quality.values())
df = pd.DataFrame(columns=headers, index=frac)
wash_dic = {'POLY_NAME': {0: 'Asset 1', 1: 'Asset 2', 2: 'Asset 3'},
'RD': {0: 1.63, 1: 1.63, 2: 1.57},
'SEAMTH': {0: 3.02, 1: 3.02, 2: 3.37},
'AAS': {0: 7.76, 1: 7.34, 2: 7.24},
'ACV': {0: 28.98, 1: 29.18, 2: 29.27},
'AFC': {0: 54.95, 1: 53.55, 2: 52.38},
'AMS': {0: 4.22, 1: 4.26, 2: 4.63},
'ATS': {0: 0.97, 1: 1.09, 2: 1.23},
'AVM': {0: 33.07, 1: 34.85, 2: 35.75},
'AYL': {0: 0.4, 1: 0.95, 2: 0.75},
'BAS': {0: 9.28, 1: 9.27, 2: 9.58},
'BCV': {0: 28.17, 1: 28.33, 2: 28.09},
'BFC': {0: 56.21, 1: 54.39, 2: 52.11},
'BMS': {0: 4.25, 1: 4.25, 2: 4.61},
'BTS': {0: 0.84, 1: 1.01, 2: 1.22},
'BVM': {0: 30.25, 1: 32.08, 2: 33.7},
'BYL': {0: 3.11, 1: 5.44, 2: 4.36},
'CAS': {0: 11.01, 1: 10.96, 2: 11.25},
'CCV': {0: 27.31, 1: 27.53, 2: 27.39},
'CFC': {0: 58.09, 1: 56.0, 2: 53.43},
'CMS': {0: 4.41, 1: 4.38, 2: 4.62},
'CTS': {0: 0.63, 1: 0.83, 2: 0.98},
'CVM': {0: 26.5, 1: 28.66, 2: 30.71},
'CYL': {0: 13.45, 1: 16.11, 2: 12.94}}
wash = pd.DataFrame(wash_dic)
wash
for label, content in wash.items():
print('fraction:', fractions.get(label[0]), ' quality:', quality.get(label[-2:]))
for c in content:
try:
df.loc[fractions.get(label[0]), quality.get(label[-2:])] = c
except:
pass
I have tried to add another for loop but the logic is escaping me currently.
Required outcome as dictionary
outcome = {'Unnamed: 0': {0: 1.35,
1: 1.4,
2: 1.45,
3: 1.35,
4: 1.4,
5: 1.45,
6: 1.35,
7: 1.4,
8: 1.45},
'POLY': {0: 'Asset 1',
1: 'Asset 1',
2: 'Asset 1',
3: 'Asset 2',
4: 'Asset 2',
5: 'Asset 2',
6: 'Asset 3',
7: 'Asset 3',
8: 'Asset 3'},
'Ash': {0: 7.76,
1: 9.28,
2: 11.01,
3: 7.34,
4: 9.27,
5: 10.96,
6: 7.24,
7: 9.58,
8: 11.25},
'CV': {0: 28.98,
1: 28.17,
2: 27.31,
3: 29.18,
4: 28.33,
5: 27.53,
6: 29.27,
7: 28.09,
8: 27.39},
'FC': {0: 54.95,
1: 56.21,
2: 58.09,
3: 53.55,
4: 54.39,
5: 56.0,
6: 52.38,
7: 52.11,
8: 53.43},
'Moist': {0: 4.22,
1: 4.25,
2: 4.41,
3: 4.26,
4: 4.25,
5: 4.38,
6: 4.63,
7: 4.61,
8: 4.62},
'Tots': {0: 0.97,
1: 0.84,
2: 0.63,
3: 1.09,
4: 1.01,
5: 0.83,
6: 1.23,
7: 1.22,
8: 0.98},
'Vols': {0: 33.07,
1: 30.25,
2: 26.5,
3: 34.85,
4: 32.08,
5: 28.66,
6: 35.75,
7: 33.7,
8: 30.71},
'Yiels': {0: 0.4,
1: 3.11,
2: 13.45,
3: 0.95,
4: 5.44,
5: 16.11,
6: 0.75,
7: 4.36,
8: 12.94}}
Regards
I resolved to duplicate/overwriting of the values by first grouping the original wash DF and then in the for loop and the data of each loop into a blank DF and at the end of the loop append it to the Final DF. Just for neatness I made the index column a normal column and reordered the columns.
groups = wash.groupby("POLY_NAME")
df_final = pd.DataFrame(columns=headers)
for name, group in groups:
df = pd.DataFrame(columns=headers)
for label, content in group.items():
if quality.get(label[-2:]) in headers:
#print(label)
#print(name)
#print(label, content)
for c in content:
try:
df.loc[fractions.get(label[0]), "POLY"] = name
df.loc[fractions.get(label[0]), quality.get(label[-2:])] = c
#print('Poly:', name, ' fraction:', fractions.get(label[0]), ' quality:', quality.get(label[-2:]))
except:
pass
df_final = df_final.append(df)
df_final = df_final.reset_index().rename({'index':'FLOAT'}, axis = 'columns')
df_final = df_final.reindex(columns=["POLY","FLOAT","Ash","CV","FC","Moist","Tots","Vols","Yield"])
Might not be the neatest or fastest method but it gives the required results.

Renaming key and sub key in python nested dictionary with iterations

I am trying to rename the key and subkey in python nested dictionary. However, I haven't got the result that I expected yet. Below is the original nested key that I have.
nested_dict = {
0: {0: 33.97, 1: 55.32, 2: 57.31, 3: 71.56},
1: {0: 27.31, 1: 23.32, 2: 32.25, 3: 60.21},
2: {0: 65.38, 1: 36.88, 2: 70.88, 3: 21.93},
3: {0: 35.44, 1: 21.21, 2: 40.72, 3: 51.35}
}
I am trying to change the key and subkey to another value into this.
nested_dict = {
4: {4: 33.97, 5: 55.32, 6: 57.31, 7: 71.56},
5: {4: 27.31, 5: 23.32, 6: 32.25, 7: 60.21},
6: {4: 65.38, 5: 36.88, 6: 70.88, 7: 21.93},
7: {4: 35.44, 5: 21.21, 6: 40.72, 7: 51.35}
}
What I have in mind is renaming the key using a list. I have tried to replace the key and subkey with a list below:
new_key = []
for i in range(4,8):
new_key.append(i)
However, I still haven't got it. Another idea is using pandas DataFrame to rename both key and subkey. I am not sure whether using lists or pandas is suitable for the given problem.
Code for renaming a key from here:
mydict[new_key] = mydict.pop(old_key)
You could use a (nested) dict comprehension ([Python]: PEP 274 -- Dict Comprehensions). Note that it generates a new dictionary (but you can assign it to the old variable):
>>> from pprint import pprint as pp
>>>
>>> nested_dict = {
... 0: {0: 33.97, 1: 55.32, 2: 57.31, 3: 71.56},
... 1: {0: 27.31, 1: 23.32, 2: 32.25, 3: 60.21},
... 2: {0: 65.38, 1: 36.88, 2: 70.88, 3: 21.93},
... 3: {0: 35.44, 1: 21.21, 2: 40.72, 3: 51.35}
... }
>>>
>>> pp(nested_dict)
{0: {0: 33.97, 1: 55.32, 2: 57.31, 3: 71.56},
1: {0: 27.31, 1: 23.32, 2: 32.25, 3: 60.21},
2: {0: 65.38, 1: 36.88, 2: 70.88, 3: 21.93},
3: {0: 35.44, 1: 21.21, 2: 40.72, 3: 51.35}}
>>>
>>> modified_nested_dict = {k0 + 4: {k1 + 4: v1 for k1, v1 in v0.items()} for k0, v0 in nested_dict.items()}
>>>
>>> pp(modified_nested_dict)
{4: {4: 33.97, 5: 55.32, 6: 57.31, 7: 71.56},
5: {4: 27.31, 5: 23.32, 6: 32.25, 7: 60.21},
6: {4: 65.38, 5: 36.88, 6: 70.88, 7: 21.93},
7: {4: 35.44, 5: 21.21, 6: 40.72, 7: 51.35}}
You can use Pandas Dataframe for the desired task, as follows:
import pandas as pd
nested_dict = {
0: {0: 33.97, 1: 55.32, 2: 57.31, 3: 71.56},
1: {0: 27.31, 1: 23.32, 2: 32.25, 3: 60.21},
2: {0: 65.38, 1: 36.88, 2: 70.88, 3: 21.93},
3: {0: 35.44, 1: 21.21, 2: 40.72, 3: 51.35}
}
print("Dictionary before renaming: ", nested_dict)
# Convert nested dictionary to Pandas Dataframe
my_dataframe = pd.DataFrame.from_dict(nested_dict)
new_keys = list(range(4, 8)) # List of new keys
my_dataframe.columns = new_keys # Set columns to the new keys
my_dataframe.set_index([new_keys], inplace=True) # Set index to the new keys
nested_dict = my_dataframe.to_dict() # Convert back to nested dictionary
print("Dictionary after renaming: ", nested_dict)
This gives you the following expected output:
Dictionary before renaming: {0: {0: 33.97, 1: 55.32, 2: 57.31, 3: 71.56}, 1: {0: 27.31, 1: 23.32, 2: 32.25, 3: 60.21}, 2: {0: 65.38, 1: 36.88, 2: 70.88, 3: 21.93}, 3: {0: 35.44, 1: 21.21, 2: 40.72, 3: 51.35}}
Dictionary after renaming: {4: {4: 33.97, 5: 55.32, 6: 57.31, 7: 71.56}, 5: {4: 27.31, 5: 23.32, 6: 32.25, 7: 60.21}, 6: {4: 65.38, 5: 36.88, 6: 70.88, 7: 21.93}, 7: {4: 35.44, 5: 21.21, 6: 40.72, 7: 51.35}}

Categories

Resources