DataFrames with strange structure with variables in even columns - python

I'm a beginner with Python and pandas, and I understand the basics.
A couple of days ago I received three strangely structured datasets in Excel.
They look like this:
import pandas as pd
dfinput = pd.DataFrame([
["uuid", "79876081-099b-474f-9e8f-ff917fd7394c", "uuid", "a96bc7cb-02b1-4d13-823a-908531cda095", "uuid",
"38bc7d20-10be-4774-973c-b3b00234a645", "uuid", "e7b12da6-a47f-4c24-8545-faa24e249a03", "uuid", "6b2c9426-bd6f-4bda-9c53-a86200e051f8"],
["variable 1", "value", "variable 1", "value", "variable 1",
"value", "variable 1", "value", "variable 1", "value"],
["variable 2", "value", "variable 2", "value", "variable 2",
"value", "variable 2", "value", "variable 2", "value"],
["variable 3", "value", "variable 3", "value", "variable 3",
"value", "variable 3", "value", "variable 3", "value"],
["variable 4", "value", "variable 4", "value", "variable 4",
"value", "variable 4", "value", "variable 4", "value"],
["variable 5", "value", "variable 5", "value", "variable 5",
"value", "variable 5", "value", "variable 5", "value"],
["variable 6", "value", "variable 6", "value", "variable 6",
"value", "variable 6", "value", "variable 6", "value"],
["variable 7", "value", "variable 7", "value", "variable 7",
"value", "variable 7", "value", "variable 7", "value"],
["variable 8", "value", "variable 8", "value", "variable 8",
"value", "variable 8", "value", "variable 8", "value"],
["variable 9", "value", "variable 9", "value", "variable 9",
"value", "variable 9", "value", "variable 9", "value"],
["variable 10", "value", "variable 10", "value", "variable 10",
"value", "variable 10", "value", "variable 10", "value"],
["variable A", "value", "variable B", "value", "variable A",
"value", "variable A", "value", "variable A", "value"],
["variable B", "value", "variable C", "value", "variable C",
"value", "variable B", "value", "variable B", "value"],
["variable C", "value", "variable D", "value", "variable D",
"value", "variable D", "value", "variable C", "value"],
["variable D", "value", "Variable E", "value", "Variable E",
"value", "Variable F", "value", "Variable E", "value"],
["Variable E", "value", "Variable F", "value", "Variable H",
"value", "Variable G", "value", "Variable F", "value"],
["Variable F", "value", "Variable H", "value", "",
"", "Variable H", "value", "Variable G", "value"],
["Variable G", "value", "", "", "", "", "", "", "Variable H", "value"]
])
I want the following result:
dfoutput = pd.DataFrame([["value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "null"],
["value", "value", "value", "value", "value", "value", "value", "value", "value",
"value", "null", "value", "value", "value", "value", "value", "null", "value"],
["value", "value", "value", "value", "value", "value", "value", "value", "value",
"value", "value", "null", "value", "value", "value", "null", "null", "value"],
["value", "value", "value", "value", "value", "value", "value", "value", "value",
"value", "value", "value", "null", "value", "null", "value", "value", "value"],
["value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "value", "null", "value", "value", "value", "value"]],
index=['79876081-099b-474f-9e8f-ff917fd7394c', 'a96bc7cb-02b1-4d13-823a-908531cda095',
'38bc7d20-10be-4774-973c-b3b00234a645', 'e7b12da6-a47f-4c24-8545-faa24e249a03', '6b2c9426-bd6f-4bda-9c53-a86200e051f8'],
columns=['variable 1', 'variable 2', 'variable 3', 'variable 4', 'variable 5', 'variable 6', 'variable 7', 'variable 8', 'variable 9', 'variable 10', 'variable A', 'variable B', 'variable C', 'variable D', 'Variable E', 'Variable F', 'Variable G', 'Variable H'])
I tried looping over the columns to build a new dataframe, but I got stuck and think I'm making it unnecessarily complex.
I can't get my head around it. Has anyone dealt with this before and can point me in a useful direction?

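You can restructure your data with a fairly simple manipulation, using the dataframe (dfinput) you posted. Note that the column labels are taken from the first uuid's variable names, so a value lines up with the right variable only where every uuid lists the same variables in the same order: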
# Change first row to headers and Transpose
headers = dfinput.iloc[0]
one = (pd.DataFrame(dfinput.values[1:], columns=headers)).T
# Change first row to headers again
one.columns = one.iloc[0]
# Keep only odd indexed rows
res = one.iloc[1::2, :]
res
uuid variable 1 variable 2 variable 3 variable 4 variable 5 variable 6 variable 7 variable 8 variable 9 variable 10 variable A variable B variable C variable D Variable E Variable F Variable G
79876081-099b-474f-9e8f-ff917fd7394c value value value value value value value value value value value value value value value value value
a96bc7cb-02b1-4d13-823a-908531cda095 value value value value value value value value value value value value value value value value
38bc7d20-10be-4774-973c-b3b00234a645 value value value value value value value value value value value value value value value
e7b12da6-a47f-4c24-8545-faa24e249a03 value value value value value value value value value value value value value value value value
6b2c9426-bd6f-4bda-9c53-a86200e051f8 value value value value value value value value value value value value value value value value value
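Because the uuids in the question do not all list the same variables, it may help to align each value by its own variable name instead of by position. The following is a minimal sketch, assuming every uuid occupies a (name, value) column pair with the uuid itself in the first row and empty strings as padding; the helper name pairs_to_wide and the small demo frame are hypothetical and not part of the original post:

```python
import pandas as pd

def pairs_to_wide(df):
    """Turn side-by-side (name, value) column pairs into one row per uuid."""
    rows, uuids = [], []
    for i in range(0, df.shape[1], 2):
        uuids.append(df.iloc[0, i + 1])  # first row holds "uuid", <uuid>
        names = df.iloc[1:, i]
        values = df.iloc[1:, i + 1]
        # Skip padding cells where the variable name is missing or empty
        rows.append({n: v for n, v in zip(names, values)
                     if pd.notna(n) and str(n).strip()})
    # Variables that a uuid does not have become NaN
    return pd.DataFrame(rows, index=uuids)

# Hypothetical miniature of the posted layout
demo = pd.DataFrame([
    ["uuid", "u1", "uuid", "u2"],
    ["variable A", "a1", "variable B", "b2"],
    ["variable B", "b1", "", ""],
])
wide = pairs_to_wide(demo)
print(wide)
```

Calling pairs_to_wide(dfinput) on the frame from the question should give one row per uuid, with NaN wherever a uuid lacks a variable. Names are matched exactly, so "variable D" and "Variable E" style capitalization differences are treated as distinct labels.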


Waterfall chart with Plotly - Update Traces

I'm creating a waterfall plot for three categories, as shown in below code:
import plotly.graph_objects as go
fig=go.Figure()
fig.add_trace(go.Waterfall(
x = [["Category 1", "Category 1", "Category 1", "Category 1", "Category 1", "Category 1", "Category 1",
"Category 2", "Category 2", "Category 2", "Category 2", "Category 2", "Category 2", "Category 2",
"Category 3", "Category 3", "Category 3", "Category 3", "Category 3", "Category 3", "Category 3"
],
["Gross Income", "Taxes", "Net Revenue", "CPV", "Variable Expenses", "Recurrent Capex", "EBITDA",
"Gross Income", "Taxes", "Net Revenue", "CPV", "Variable Expenses", "Recurrent Capex", "EBITDA",
"Gross Income", "Taxes", "Net Revenue", "CPV", "Variable Expenses", "Recurrent Capex", "EBITDA",
]
],
measure = ["absolute", "relative", "relative", "relative", "relative", "relative", "total",
"absolute", "relative", "relative", "relative", "relative", "relative", "total",
"absolute", "relative", "relative", "relative", "relative", "relative", "total"
],
y = [
1693,-296,1501,-897,-27,-45,532,
1439.05,-251.6,1275.85,-762.44,-22.95,-38.25,452.2,
1134.31,-198.32,1005.67,-600.99,-18.09,-30.150,356.44
]
))
The code returns this image: https://i.stack.imgur.com/bDf2g.png
Next, I want to change the color of the 'Gross Income' bars to green, so that only EBITDA has a different appearance.
I tried this:
fig.update_traces(marker_color="LightSeaGreen",selector=dict(x='Gross Income'))
It doesn't work, though. Does anyone know how to do it?
Thanks
This is very difficult because for waterfall charts in plotly, the marker colors are assigned based on whether they are increasing, decreasing or total and cannot be assigned colors based on their category.
However, with a very ugly hack, we can make the plot appear to have the desired color in the "gross income" category. We can plot the gross income bars separately for all three categories, assigning them the same value, and classifying them as "relative" so that we can use the argument increasing = {"marker":{"color":"lightseagreen"}} to make them all lightseagreen. Note: this only works because they all happen to be positive values.
Then, because we have to add each overlapping gross income as a separate trace, we will need to offset each of these bars to ensure they overlap the bars from your original waterfall figure. I just used trial and error to figure out that offset=-0.4 looks approximately correct. Since these additional bars are purely visual, I also disabled their hover info and prevented them from appearing in the legend.
import plotly.graph_objects as go
fig=go.Figure()
fig.add_trace(go.Waterfall(
x = [["Category 1", "Category 1", "Category 1", "Category 1", "Category 1", "Category 1", "Category 1",
"Category 2", "Category 2", "Category 2", "Category 2", "Category 2", "Category 2", "Category 2",
"Category 3", "Category 3", "Category 3", "Category 3", "Category 3", "Category 3", "Category 3"
],
["Gross Income", "Taxes", "Net Revenue", "CPV", "Variable Expenses", "Recurrent Capex", "EBITDA",
"Gross Income", "Taxes", "Net Revenue", "CPV", "Variable Expenses", "Recurrent Capex", "EBITDA",
"Gross Income", "Taxes", "Net Revenue", "CPV", "Variable Expenses", "Recurrent Capex", "EBITDA",
]
],
measure = ["absolute", "relative", "relative", "relative", "relative", "relative", "total",
"absolute", "relative", "relative", "relative", "relative", "relative", "total",
"absolute", "relative", "relative", "relative", "relative", "relative", "total"
],
y = [
1693,-296,1501,-897,-27,-45,532,
1439.05,-251.6,1275.85,-762.44,-22.95,-38.25,452.2,
1134.31,-198.32,1005.67,-600.99,-18.09,-30.150,356.44
]
))
# Add the gross income bars in each category.
# All three values must be positive so that they count as "increasing".
for category, value in zip(["Category 1", "Category 2", "Category 3"], [1693, 1439.05, 1134.31]):
    fig.add_trace(go.Waterfall(
        x = [[category], ["Gross Income"]],
        measure = ["relative"],
        y = [value],
        increasing = {"marker":{"color":"lightseagreen"}},
        offset=-0.4,
        connector={"visible":False},
        showlegend=False,
        hoverinfo='skip',
    ))
fig.show()

How do I iterate through a list of dictionaries?

I need to take a time entered by the user, for example "12:20", and print a 5x3 ASCII clock representation of it. But I don't know how to iterate through a list of dictionaries, which I think is the simplest way to solve this problem.
time = input("enter a time HH:MM")
my_list = [
{"0": "000", "1": " 1 ","2":"222","3":"333","4":"44","5":"555","6":"666","7":"777","8":"888","9":"999"},
{"0": "000", "1": "11 ", "2": " 2", "3":" 3","4":"4 4","5":"5 ","6":"6 ","7":" 7","8":"8 8","9":"9 9"},
{"0": "000", "1": " 1 ", "2": "222", "3":"333","4":"444","5":"555","6":"666","7":" 7","8":"888","9":"999"},
{"0": "000", "1": " 1 ", "2": "2 ", "3":" 3","4":" 4","5":" 5","6":"6 6","7":" 7","8":"8 8","9":" 9"},
{"0": "000", "1": "111", "2": "222", "3":"333","4":" 4","5":"555","6":"666","7":" 7","8":"888","9":" 9"}
]
for i in my_list:
    for l in my_list.keys():
        if l == time[i]:
            print(my_list[i][l])
I tried making a list of dictionaries with two for loops: one for iterating through the list and one for iterating through each dictionary. If the input is 12:20, I need to print a 5x3 12:20 like so:
 1  222   222 000
11    2 :   2 0 0
 1  222   222 0 0
 1  2   : 2   0 0
111 222   222 000
You were almost there; you just overlooked a few fundamentals: you have to build the entire line before you print it, names matter, and you can't print a colon if you don't include one in your segments.
import re
time_valid = re.compile(r'^\d{2}:\d{2}$')
while not time_valid.match((time := input("enter a time HH:MM: "))):
    # keep asking until the user gets it right
    pass
# one dictionary per output row; every digit glyph is 3 characters wide
segments = [
    {"0":"000", "1":" 1 ", "2":"222", "3":"333", "4":"4 4", "5":"555", "6":"666", "7":"777", "8":"888", "9":"999", ":":" "},
    {"0":"0 0", "1":"11 ", "2":"  2", "3":"  3", "4":"4 4", "5":"5  ", "6":"6  ", "7":"  7", "8":"8 8", "9":"9 9", ":":":"},
    {"0":"0 0", "1":" 1 ", "2":"222", "3":"333", "4":"444", "5":"555", "6":"666", "7":"  7", "8":"888", "9":"999", ":":" "},
    {"0":"0 0", "1":" 1 ", "2":"2  ", "3":"  3", "4":"  4", "5":"  5", "6":"6 6", "7":"  7", "8":"8 8", "9":"  9", ":":":"},
    {"0":"000", "1":"111", "2":"222", "3":"333", "4":"  4", "5":"555", "6":"666", "7":"  7", "8":"888", "9":"  9", ":":" "}
]
for segment in segments:
    line = ''
    for c in time:  # gather the entire line before printing
        line = f'{line} {segment[c]}'
    print(line)
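The same row-by-row idea can also be written with str.join, which avoids building each line by repeated concatenation. This is only a sketch under assumptions: the GLYPHS table below is a hypothetical, trimmed-down layout keyed by character (not the list of row dictionaries used above), and render is an illustrative helper name:

```python
# Hypothetical glyph table: each character maps to its five rows.
# Only the characters needed for the demo are included.
GLYPHS = {
    "0": ["000", "0 0", "0 0", "0 0", "000"],
    "1": [" 1 ", "11 ", " 1 ", " 1 ", "111"],
    "2": ["222", "  2", "222", "2  ", "222"],
    ":": [" ", ":", " ", ":", " "],
}

def render(text):
    """Return the five output rows for text, one string per row."""
    return [" ".join(GLYPHS[ch][row] for ch in text) for row in range(5)]

rows = render("12:20")
print("\n".join(rows))
```

Returning the rows instead of printing them directly makes the function easy to check or reuse, for example in the console clock that follows.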
With very little work this can be made into a console clock.
import threading
from datetime import datetime
from os import system, name
# repeating timer
class Poll(threading.Timer):
    def run(self):
        while not self.finished.wait(self.interval):
            self.function(*self.args, **self.kwargs)
segments = [
    {"0":"000", "1":" 1 ", "2":"222", "3":"333", "4":"4 4", "5":"555", "6":"666", "7":"777", "8":"888", "9":"999", ":":" "},
    {"0":"0 0", "1":"11 ", "2":"  2", "3":"  3", "4":"4 4", "5":"5  ", "6":"6  ", "7":"  7", "8":"8 8", "9":"9 9", ":":":"},
    {"0":"0 0", "1":" 1 ", "2":"222", "3":"333", "4":"444", "5":"555", "6":"666", "7":"  7", "8":"888", "9":"999", ":":" "},
    {"0":"0 0", "1":" 1 ", "2":"2  ", "3":"  3", "4":"  4", "5":"  5", "6":"6 6", "7":"  7", "8":"8 8", "9":"  9", ":":":"},
    {"0":"000", "1":"111", "2":"222", "3":"333", "4":"  4", "5":"555", "6":"666", "7":"  7", "8":"888", "9":"  9", ":":" "}
]
def display():
    # get time
    time = datetime.now().strftime("%H:%M:%S")
    # clear console
    system(('clear','cls')[name=='nt'])
    # draw console
    for segment in segments:
        line = ''
        for c in time:
            # illustrates a simple method to replace graphics
            line = f'{line} {segment[c].replace(c,chr(9608))}'
        print(line)
# start clock
Poll(.1, display).start()

Comparing two lists of tuples and looking for different values [duplicate]

I have two lists of tuples. For example:
The first one I keep:
[
("base", "first_val", "value 1", "1"),
("base", "first_val", "value 2", "0"),
("base", "first_val", "value 3", "2"),
("base", "second_val", "value 1", "10"),
("base", "second_val", "value 2", "10"),
("base", "third_val", "value 1", "100"),
("base", "third_val", "value 2", "1"),
("base", "fourth_val", "value 1", "param 1", "22"),
("base", "fourth_val", "value 2", "param 1", "222"),
("base", "fourth_val", "value 3", "12")
]
10 tuples, 4 parameters with subparameters.
The second one I get. This list may have other content:
[
("base", "first_val", "value 1", "10000"), #changed
("base", "first_val", "value 2", "5555"), #changed
("base", "first_val", "value 3", "2"), #not changed
("base", "fourth_val", "value 1", "param 1", "22"), #not changed
("base", "fourth_val", "value 2", "param 1", "100000"), #changed
("base", "fourth_val", "value 3", "12") #not changed
]
6 tuples, 2 parameters with subparameters.
In practice, these lists have hundreds of entries.
The contents of the second list change constantly, but the tuples always follow the same structure. What is the fastest way to get only the tuples that have changed?
You can use set in Python to determine changed values:
first_list = [("base", "first_val", "value 1", "1"),
("base", "first_val", "value 2", "0"),
("base", "first_val", "value 3", "2"),
("base", "second_val", "value 1", "10"),
("base", "second_val", "value 2", "10"),
("base", "third_val", "value 1", "100"),
("base", "third_val", "value 2", "1"),
("base", "fourth_val", "value 1", "param 1", "22"),
("base", "fourth_val", "value 2", "param 1", "222"),
("base", "fourth_val", "value 3", "12")]
second_list = [
("base", "first_val", "value 1", "10000"), #changed
("base", "first_val", "value 2", "5555"), #changed
("base", "first_val", "value 3", "2"), #not changed
("base", "fourth_val", "value 1", "param 1", "22"), #not changed
("base", "fourth_val", "value 2", "param 1", "100000"), #changed
("base", "fourth_val", "value 3", "12") #not changed
]
changed = list(set(second_list) - set(first_list))
print(changed)
This outputs (sets are unordered, so the order of the tuples may differ):
[('base', 'fourth_val', 'value 2', 'param 1', '100000'),
('base', 'first_val', 'value 1', '10000'),
('base', 'first_val', 'value 2', '5555')]
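If the changed tuples should come back in the order they appear in the second list, a set can still be used for the fast membership test while a list comprehension keeps the order. A minimal sketch, assuming that "changed" means "present in the new list but not in the old one"; the function name changed_tuples and the shortened sample lists are illustrative:

```python
def changed_tuples(old_list, new_list):
    """Return tuples from new_list that are absent from old_list, in order."""
    seen = set(old_list)  # set lookup is O(1) on average
    return [t for t in new_list if t not in seen]

old = [
    ("base", "first_val", "value 1", "1"),
    ("base", "first_val", "value 3", "2"),
    ("base", "fourth_val", "value 2", "param 1", "222"),
]
new = [
    ("base", "first_val", "value 1", "10000"),                # changed
    ("base", "first_val", "value 3", "2"),                    # not changed
    ("base", "fourth_val", "value 2", "param 1", "100000"),   # changed
]
changed = changed_tuples(old, new)
print(changed)
```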

Multiple List in One Full List (Not list in a list) [duplicate]

Each row of my dataframe holds a list, and I want to merge them all into one flat list.
data = {'row 1': ["text 1", "text 2", "text 3"],
'row 2': ["text 4", "text 5", "text 6"],'row 3':["text 7", "text 8", "text 9"]
}
dataframe = pd.DataFrame (data, columns = ['row 1','row 2','row 3'])
dataframe
expected output: ["text 1", "text 2", "text 3", "text 4", "text 5", "text 6", "text 7", "text 8", "text 9"]
I tried df.iterrows(), but I ended up with a list of lists:
total = []
for index, row in df.iterrows():
    total.extend(row)
output: [["text 1", "text 2", "text 3"],["text 4", "text 5", "text 6"],["text 7", "text 8", "text 9"]]
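Use .explode() on the column that holds the lists (col2 here stands for that column's name). Selecting the column first matters, because calling .values.tolist() on the whole frame returns one nested list per row:
df['col2'].explode().tolist()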
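The frame built in the question actually stores plain strings in three columns rather than a list per cell, so two cases are worth telling apart. The sketch below shows one possible way to flatten each; the variable names flat, lists and flat_from_lists are illustrative, and the second Series is a hypothetical stand-in for a column that really holds lists:

```python
import pandas as pd

data = {'row 1': ["text 1", "text 2", "text 3"],
        'row 2': ["text 4", "text 5", "text 6"],
        'row 3': ["text 7", "text 8", "text 9"]}
dataframe = pd.DataFrame(data, columns=['row 1', 'row 2', 'row 3'])

# Case 1: scalar cells. Read the values column by column (Fortran order)
# so that all of 'row 1' comes first, then 'row 2', then 'row 3'.
flat = dataframe.to_numpy().ravel(order="F").tolist()
print(flat)

# Case 2: a column whose cells are lists. explode() gives each list
# element its own row, and tolist() then returns a flat list.
lists = pd.Series([["text 1", "text 2"], ["text 3"]])
flat_from_lists = lists.explode().tolist()
print(flat_from_lists)
```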

Generating dictionary in for loop returning last iteration

I'm looping through the following JSON:
"item 1": {
"property 1": "value 1",
"property 2": "value 2",
"property 3": "value 3"
},
"item 2": {
"property 1": "value 1",
"property 2": "value 2",
"property 3": "value 3"
}
I'd like to make a dictionary with the values of the first two keys — property 1, property 2 — for each item (i.e. excluding property 3). The code that follows achieves the desired result, but only stores the most recent sequence:
for i in JSON:
value 1 = i["value 1"]
value 2 = i["value 2"]
...
JSON = json.dumps({'property 1':value 1,'property 2':value 2...})
return json.loads(JSON)
>> "item 2": {
"property 1": "value 1",
"property 2": "value 2" ...
# returns item 2, but I'd like item 1 also
How do I store the output of each item without overwriting the others?
Use a simple iteration that adds one entry per item to a dictionary created outside the loop.
For example:
data = {"item 1": {
"property 1": "value 1",
"property 2": "value 2",
"property 3": "value 3"
},
"item 2": {
"property 1": "value 1",
"property 2": "value 2",
"property 3": "value 3"
}
}
d = {}
for i in data:
    # one entry per item, so earlier items are not overwritten
    d[i] = {"property 1": data[i]["property 1"], "property 2": data[i]["property 2"]}
print(d)
Output:
{'item 1': {'property 1': 'value 1', 'property 2': 'value 2'}, 'item 2': {'property 1': 'value 1', 'property 2': 'value 2'}}
Here's another option using the items method and a dict comprehension:
data = {"item 1": {
"property 1": "value 1",
"property 2": "value 2",
"property 3": "value 3"
},
"item 2": {
"property 1": "value 1",
"property 2": "value 2",
"property 3": "value 3"
}
}
new_data = {}
for index, value in data.items():
    new_data.update({index: {k:v for k, v in value.items() if k != "property 3"}})
print(new_data)
Output:
{'item 1': {'property 1': 'value 1', 'property 2': 'value 2'},
'item 2': {'property 1': 'value 1', 'property 2': 'value 2'}}
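Both answers can be condensed into a single nested dict comprehension that names the keys to keep instead of the key to drop. A minimal sketch, assuming the parsed JSON is already a dict of dicts; the names wanted and subset are illustrative:

```python
data = {
    "item 1": {"property 1": "value 1", "property 2": "value 2", "property 3": "value 3"},
    "item 2": {"property 1": "value 1", "property 2": "value 2", "property 3": "value 3"},
}

# Keys to keep for every item (an allow-list rather than an exclusion)
wanted = ("property 1", "property 2")

# Outer comprehension: one entry per item; inner: only the wanted keys
subset = {item: {key: props[key] for key in wanted}
          for item, props in data.items()}
print(subset)
```

An allow-list keeps working unchanged if further properties are later added to the items, whereas an exclusion test would let them through.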
