Background
I have a set of financial data I am trying to analyze and display graphically. The data consists of high-resolution price data for a number of commodity contracts. These contracts are specified by having both a product (the actual commodity in question) and a tenor (a specified delivery date for the contract). The combination of a product and a tenor therefore gives a fully defined commodity contract, which has a current market price, as follows:
T          Product 1                     Product 2
---------  ----------------------------  ----------------------------
Tenor 1    $ Commodity contract price    $ Commodity contract price
           for product 1 at tenor 1      for product 2 at tenor 1
Tenor 2    $ Commodity contract price    $ Commodity contract price
           for product 1 at tenor 2      for product 2 at tenor 2
As example data, take the September, October and November contracts for two grades of crude oil, Brent and WTI. The current market prices would give us something like this:
T Brent WTI
----------- --------- ---------
Sep 2020 $37.25 $33.40
Oct 2020 $38.10 $33.75
Nov 2020 $38.85 $34.15
But of course these prices aren't static: they move with market forces, so we take a snapshot of them every t seconds. Let's say the above is our starting point, but at t+1 Brent prices have moved up a dollar and WTI by 50c, so our prices now look like this:
T + 1 Brent WTI
----------- --------- ---------
Sep 2020 $38.25 $33.90
Oct 2020 $39.10 $34.25
Nov 2020 $39.85 $34.65
This is broadly the structure of the dataset, though the scope is of course much larger.
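To make the structure concrete, here is one way the snapshots could be held in pandas (the DataFrame layout and variable names are my own illustrative assumptions, not part of the original dataset):

```python
import pandas as pd

# One snapshot of the price grid: rows = tenors, columns = products
t0 = pd.DataFrame(
    {"Brent": [37.25, 38.10, 38.85], "WTI": [33.40, 33.75, 34.15]},
    index=["Sep 2020", "Oct 2020", "Nov 2020"],
)

# At t+1, Brent has moved up a dollar and WTI by 50c
t1 = t0 + [1.00, 0.50]

# Stack the snapshots along a time axis
history = pd.concat({0: t0, 1: t1}, names=["t", "tenor"])
```

`history.loc[1]` then recovers the whole t+1 grid, and `history['Brent'].unstack('t')` gives one product's evolution over time.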
What I am trying to do and how far I have gotten
I have been working towards a specific visualization of this data (or subsets thereof). I am trying to produce what I would call a 3D or extruded polar chart, which would show contract values for different products as the distance from the centre of the polar chart, would extend along the Z-axis to represent the different tenors, and would animate to display the change in contract values over time. I have achieved two thirds of this using the polar chart function of Plotly Express, producing a 2D animated polar chart that displays the contract values for 6 products for 1 tenor over time. The following two images show two frames from this chart:
[Frame 1: June14 contract price for Brent, WTI, barges, 180, 380, GC on 2014-02-14]
[Frame 2: June14 contract price for Brent, WTI, barges, 180, 380, GC on 2014-03-27]
However, as far as I can tell this function cannot be extended to 3D, so I think I have to start afresh. A diagram of what I'm trying to achieve is as follows:
Desired result
I am aiming for a result similar to a very basic streamtube (see https://plotly.com/python/streamtube-plot/?_ga=2.134282552.895284899.1586457324-1038545846.1585071729 and https://medium.com/plotly/streamtubes-in-plotly-with-python-and-r-a30216ef20a3), but the cross-sectional polygon of a streamtube seems to always have a fixed radius from the centre at each point, rather than varying as in my use case, so I don't believe I can use this pre-built function. Additionally, streamtubes are drawn from vector fields, so producing the correct inputs would require some complex and unhelpful reverse engineering of my data, which suggests to me that this is the wrong route. I believe my best bet is contour maps in either matplotlib or Plotly, but I have hit a bit of a brick wall on how to do this.
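For what it's worth, the underlying geometry may be simpler than the streamtube machinery suggests: treat price as the radius, product as the angle, and tenor as the z-level, then convert to Cartesian grids that a surface plotter will accept. A minimal sketch of that conversion, using the Brent/WTI sample above (the array layout is an assumption of mine):

```python
import numpy as np

# Rows = tenors (Sep, Oct, Nov 2020), columns = products (Brent, WTI)
prices = np.array([
    [37.25, 33.40],
    [38.10, 33.75],
    [38.85, 34.15],
])
n_tenors, n_products = prices.shape

# One angle per product; repeat the first product at 2*pi so each
# polar cross-section closes into a polygon
theta = np.linspace(0, 2 * np.pi, n_products + 1)
r = np.hstack([prices, prices[:, :1]])   # shape (n_tenors, n_products + 1)

# Polar -> Cartesian; z extrudes one level per tenor
x = r * np.cos(theta)
y = r * np.sin(theta)
z = np.broadcast_to(np.arange(n_tenors)[:, None], r.shape)
```

These x, y, z grids can go straight into matplotlib's `ax.plot_surface(x, y, z)` or Plotly's `go.Surface(x=x, y=y, z=z)`; animating over time would then mean recomputing r from each snapshot.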
Any suggestions would be very appreciated.
Related
My problem looks like this: a movie theatre is showing a set of n films over a 5-day period. Each movie has a corresponding IMDb score. I want to watch one movie per day over the 5-day period, maximising the cumulative IMDb score while making sure that I watch the best movies first (i.e. Monday's movie will have a higher score than Tuesday's, Tuesday's higher than Wednesday's, etc.). An extra constraint is that the theatre doesn't show every movie every day. For example:
Showings:
Monday showings: Sleepless in Seattle, Multiplicity, Jaws, The Hobbit
Tuesday showings: Sleepless in Seattle, Kramer vs Kramer, Jack Reacher
Wednesday showings: The Hobbit, A Star is Born, Joker
etc.
Scores:
Sleepless in Seattle: 7.0
Multiplicity: 10
Jaws: 9.2
The Hobbit: 8.9
A Star is Born: 6.2
Joker: 5.8
Kramer vs Kramer: 8.7
etc.
The way I've thought about this is that each day represents a variable, a, b, c, d, e, and we are maximising (a+b+c+d+e). To make sure I watch the movies in descending order of IMDb score, I would add the constraint a > b > c > d > e. However, as far as I can tell, with the linear solver you cannot restrict a variable to discrete values, only to a continuous range. In an ideal world the problem would read: "solve for a, b, c, d, e maximising their cumulative sum, while ensuring a > b > c > d > e, where a is drawn from this set of possible values, b from this set of possible values, etc." I'm wondering if someone can point me in the right direction as to which OR-Tools solver (or other library) would be best for this problem?
I tried to use the GLOP linear solver, but failed: I was expecting it to solve for a, b, c, d and e, but I couldn't write the necessary constraints within that paradigm.
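To pin down the semantics of what I'm after, here is a brute-force sketch of the model (the integer scaling by 10 is my own choice to keep scores integral; Jack Reacher and the later days are omitted because I haven't listed their scores). The same structure, with a per-day discrete domain for each variable plus a strictly decreasing chain, is what I'd want a solver to express:

```python
from itertools import product

# IMDb scores scaled by 10 so everything stays integral
scores = {"Sleepless in Seattle": 70, "Multiplicity": 100, "Jaws": 92,
          "The Hobbit": 89, "Kramer vs Kramer": 87,
          "A Star is Born": 62, "Joker": 58}

# Each day's variable may only take values from that day's showings
showings = [
    ["Sleepless in Seattle", "Multiplicity", "Jaws", "The Hobbit"],  # Monday
    ["Sleepless in Seattle", "Kramer vs Kramer"],                    # Tuesday
    ["The Hobbit", "A Star is Born", "Joker"],                       # Wednesday
]

best = max(
    (choice for choice in product(*showings)
     # strictly decreasing scores day over day
     if all(scores[a] > scores[b] for a, b in zip(choice, choice[1:]))),
    key=lambda choice: sum(scores[m] for m in choice),
)
# best -> ('Multiplicity', 'Kramer vs Kramer', 'A Star is Born'), total 249
```

This enumeration obviously won't scale to the real problem, which is exactly why I'm after a solver that supports per-variable discrete domains.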
Let's say I have a dataset of scientific publications, their Country, and a score metric that evaluates the quality of the publications. Sort of like the following:
Paper    Country   Score
-------  --------  ------
Pub 1    USA       5
Pub 2    China     7
Pub 3    Japan     9
Pub 4    China     4
I want to generate a map that is colored based on total score per country. For example, China would have the highest color score of 11, Japan next with 9, and USA last with 5. Additionally, I would like to generate another map that is colored based on total paper counts per country. In this case, China would have the highest color with a count of 2 papers, and Japan/USA would be tied with a count of 1 paper.
The code to use is as follows:
fig = px.choropleth(df, locations = df['Country'], color=???)
My problem is that the color argument seems to require a column from my source data with no aggregation function (i.e. sum/count) applied.
Rather than base the color on a raw column value, I would like to base it on an aggregation of the column data. I know a workaround is to create a brand new dataframe with the data already aggregated and pass that, but I am wondering if this can be done natively in Dash without having to create a new dataframe per aggregation. I wasn't able to find any examples of this behavior in the Dash documentation. Any help is much appreciated!
How about something like
fig = px.choropleth(df[['Country', 'Score']].groupby('Country').sum().reset_index(),
                    locations='Country',
                    color='Score')
See the example from Plotly, where you only need to pass the column names.
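For the second map (paper counts), the same groupby trick should work with a count in place of the sum. One caveat worth checking: full country names like "China" may need `locationmode='country names'`, since `locations` otherwise defaults to ISO-3 codes, if I remember correctly. A sketch using the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Paper": ["Pub 1", "Pub 2", "Pub 3", "Pub 4"],
    "Country": ["USA", "China", "Japan", "China"],
    "Score": [5, 7, 9, 4],
})

# One row per country, with the number of papers published there
counts = df.groupby("Country", as_index=False).agg(Papers=("Paper", "count"))
```

Then something like `px.choropleth(counts, locations='Country', locationmode='country names', color='Papers')` should colour by paper count.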
I have a sample of companies with financial figures which I would like to compare. My data looks like this:
Cusip9 Issuer IPO Year Total Assets Long-Term Debt Sales SIC-Code
1 783755101 Ryerson Tull Inc 1996 9322000.0 2632000.0 633000.0 3661
2 826170102 Siebel Sys Inc 1996 995010.0 0.0 50250.0 2456
3 894363100 Travis Boats & Motors Inc 1996 313500.0 43340.0 23830.0 3661
4 159186105 Channell Commercial Corp 1996 426580.0 3380.0 111100.0 7483
5 742580103 Printware Inc 1996 145750.0 0.0 23830.0 8473
For every company I want to calculate a "similarity score" that indicates its comparability with other companies across several financial figures. The comparability should be expressed as the Euclidean distance, the square root of the sum of the squared differences between the financial figures, to the "closest" company. So I need to calculate the distance to every company that fits these conditions, but only keep the closest score: assets of company 1 minus assets of company 2, plus debt of company 1 minus debt of company 2, and so on.
√((x_1-y_1 )^2+(x_2-y_2 )^2)
This should only be computed for companies with the same SIC code, and the IPO year of the comparable companies should be smaller than that of the company for which the similarity score is computed: I only want to compare these companies with already-listed companies.
Hopefully my point is clear. Does anyone have an idea where I can start? I am just starting with programming and completely lost with this.
Thanks in advance.
I would first create different dataframes according to the SIC code, so every new dataframe only contains companies with the same SIC code. Then, for each of those dataframes, just double loop over the companies, compute the scores, and store them in a matrix. (So you'll end up with a symmetrical matrix of scores.)
Try this. Here I compare each company with companies whose IPO year is equal to or smaller, since you didn't give any company record with a strictly smaller IPO year; you can change it to strictly smaller (<) in the statement Group = df[...].
def closestCompany(companyRecord):
    # Candidates: same SIC code, IPO'd no later, and not the company itself
    Group = df[(df['SIC-Code'] == companyRecord['SIC-Code']) &
               (df['IPO Year'] <= companyRecord['IPO Year']) &
               (df['Issuer'] != companyRecord['Issuer'])]
    # Euclidean distance over the chosen financial figures; keep the minimum
    return (((Group['Total Assets'] - companyRecord['Total Assets'])**2 +
             (Group['Long-Term Debt'] - companyRecord['Long-Term Debt'])**2)**0.5).min()

df['Closest Company Similarity Score'] = df.apply(closestCompany, axis=1)
df
I have been working with a dataset which contains information about houses that have been sold on a particular market. There are two columns, 'price' and 'date'.
I would like to make a line plot to show how the prices on this market have changed over time.
The problem is, I see that some houses have been sold on the same date but at a different price.
So ideally I would need to get the mean/average price of the houses sold on each date before plotting.
So for example, if I had something like this:
DATE / PRICE
02/05/2015 / $100
02/05/2015 / $200
I would need to get a new row with the following average:
DATE / PRICE
02/05/2015 / $150
I just haven't been able to figure it out yet. I would appreciate anyone who could guide me on this matter. Thanks in advance.
Assuming you're using pandas:
df.groupby('DATE')['PRICE'].mean()
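A runnable version with the sample rows from the question (column names assumed to match the DATE/PRICE headings), with `as_index=False` so the result keeps DATE as a plain column for plotting:

```python
import pandas as pd

df = pd.DataFrame({
    "DATE": ["02/05/2015", "02/05/2015"],
    "PRICE": [100, 200],
})

# Average price per sale date
daily = df.groupby("DATE", as_index=False)["PRICE"].mean()
# daily -> one row: 02/05/2015 at 150
```

`daily` is then ready for `daily.plot(x='DATE', y='PRICE')`.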
I have lines of text containing multiple variables which correspond to a specific entry.
I have been trying to use regular expressions, such as the one below, with mixed success (lines are quite standardised but do contain typos and inconsistencies):
re.compile('matching factor').findall(input)
I was wondering what the best way to approach this case is, what data structures to use, and how to loop over multiple lines of text. Here is a sample of the text, with the data I would like to scrape:
CHINA: National Grain Trade Centre: in auction of state reserves, govt. sold 70,418 t wheat (equivalent to 3.5% of total volume offered) at an average price of CNY2,507/t ($378.19) and 4,359 t maize (4.7%), at an average price of CNY1,290/t ($194.39). Separately, sold 2,100 t of 2013 wheat imports (1.5%) at CNY2,617/t ($394.25). 23 Oct
I am interested to create a data set containing variable such as:
VOLUME - COMMODITY - PERCENTAGE SOLD - PRICE - DATE
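To show what I mean, here is a rough first attempt on the sample line above. The pattern is only a sketch tuned to this one sentence (the CNY anchor, the optional "of 2013" year skip, and the trailing-date pattern are all my assumptions) and would certainly need hardening against the typos and inconsistencies mentioned:

```python
import re

text = ("CHINA: National Grain Trade Centre: in auction of state reserves, "
        "govt. sold 70,418 t wheat (equivalent to 3.5% of total volume offered) "
        "at an average price of CNY2,507/t ($378.19) and 4,359 t maize (4.7%), "
        "at an average price of CNY1,290/t ($194.39). Separately, sold 2,100 t "
        "of 2013 wheat imports (1.5%) at CNY2,617/t ($394.25). 23 Oct")

# volume in tonnes, commodity, percentage sold, CNY price per tonne
entry = re.compile(
    r"([\d,]+)\s*t\s+(?:of\s+\d{4}\s+)?([a-z]+)"     # "70,418 t wheat", "2,100 t of 2013 wheat"
    r"[^(]*\((?:equivalent to\s+)?(\d+(?:\.\d+)?)%"  # "(equivalent to 3.5%", "(4.7%"
    r".*?CNY([\d,]+)/t",                             # nearest following "CNY2,507/t"
    re.DOTALL,
)
rows = entry.findall(text)
date = re.search(r"\d{1,2}\s+[A-Z][a-z]{2}\s*$", text)
```

`rows` comes back as `[('70,418', 'wheat', '3.5', '2,507'), ('4,359', 'maize', '4.7', '1,290'), ('2,100', 'wheat', '1.5', '2,617')]`, mapping onto VOLUME / COMMODITY / PERCENTAGE SOLD / PRICE, with the date picked up separately; a list of such tuples (or dicts, one per entry) built while looping over the lines seems like a natural data structure.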