Find intersection point between two curves [duplicate] - python

This question already has an answer here:
How to find the exact intersection of a curve (as np.array) with y==0?
(1 answer)
Closed last year.
I have two curves (supply and demand) and I want to find their intersection point (both x and y). I have not been able to find a simple solution to this problem. In the end I want my code to print the value of X and the value of Y.
supply = final['0_y']
demand = final['0_x']
price = final[6]
plt.plot(supply, price)
plt.plot(demand, price)
The main problem and challenge is that I have tried every method I could find, and every single time I get an empty set/list. Even when I try to visualize the intersection, the plot comes out empty.
GRAPH:

As the implementation of the duplicate is not straightforward, I will show you how you can adapt it to your case. First, you use pandas Series instead of NumPy arrays, so we have to convert them. Then, your x- and y-axes are switched, so we have to change their order for the function call:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
final = pd.DataFrame({'0_y': [0, 0, 0, 10, 10, 30],
                      '0_x': [20, 11, 10, 4, 1, 0],
                      "6": [-200, 50, 100, 200, 600, 1000]})
supply = final['0_y'].to_numpy()
demand = final['0_x'].to_numpy()
price = final["6"].to_numpy()
plt.plot(supply, price)
plt.plot(demand, price)
def find_roots(x, y):
    s = np.abs(np.diff(np.sign(y))).astype(bool)
    return x[:-1][s] + np.diff(x)[s]/(np.abs(y[1:][s]/y[:-1][s])+1)
z = find_roots(price, supply-demand)
x4z = np.interp(z, price, supply)
plt.scatter(x4z, z, color="red", zorder=3)
plt.title(f"Price is {z[0]} at supply/demand of {x4z[0]}")
plt.show()
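To see what find_roots is doing in isolation, here is a minimal sketch on a toy series (the data points are made up for illustration): the function finds where y changes sign between consecutive samples and linearly interpolates the x position of each zero crossing.

```python
import numpy as np

def find_roots(x, y):
    # True where y changes sign between consecutive samples
    s = np.abs(np.diff(np.sign(y))).astype(bool)
    # Linearly interpolate the crossing between the two bracketing samples
    return x[:-1][s] + np.diff(x)[s] / (np.abs(y[1:][s] / y[:-1][s]) + 1)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([-2.0, -1.0, 1.0, 3.0])  # crosses zero between x=1 and x=2
print(find_roots(x, y))  # [1.5]
```

In the answer above, y is the difference supply-demand, so the returned value is the price at which the two curves cross.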
Sample output:

Related

Plotly: How to adjust size of markers in a scatter geo map so that differences become more visible?

I used the Plotly library in Python (a bubble-map chart) to display the average_score of countries. However, average_score has values between 2-4, and therefore the sizes of the bubbles in the bubble-map chart do not differentiate much (the bubbles look very similar). How can I achieve a bigger difference among bubbles with values 2-4?
Here is my piece of code:
plot = px.scatter_geo(df, locations="country_code", color="country_code",
                      size_max=20, hover_name="country_code", size="avg_score",
                      animation_frame="year", projection="natural earth",
                      title="Bubble Map", labels={"country_code": "Country"})
I would raise your raw data to the power of a number that suits the visualization you're aiming to build. This makes larger numbers look disproportionately larger than small numbers. Compare the two plots below, where the first is the px.scatter_geo() example from the docs, and the second is the same data where the population df['pop'] has been replaced with df['pop']**1.6.
1. Raw data for df['pop']
2. Data for df['pop'] raised to the power of 1.6
Of course these numbers have no other business in the figure, so you will have to include the following in order to keep the correct hoverinfo:
fig.update_traces(hovertemplate = 'pop=%{text}<br>iso_alpha=%{location}<extra></extra>', text = df['pop'])
Complete code:
import plotly.express as px
df = px.data.gapminder().query("year == 2007")
df['pop_display'] = df['pop']**1.6
fig = px.scatter_geo(df, locations="iso_alpha",
                     size="pop_display")
fig.update_traces(hovertemplate = 'pop=%{text}<br>iso_alpha=%{location}<extra></extra>', text = df['pop'])
fig.show()
What you're asking is not really a Plotly question but a general math question.
Given the input [2, 3, 4] with step = 1, return corresponding numbers that have a step > 1. There are multiple ways you can accomplish this:
One way is to multiply all items by an integer.
[2,3,4] * 2 = [4, 6, 8] # step=2
[2,3,4] * 3 = [6, 9, 12] # step=3
...
In this case the differences between the new values scale linearly, meaning the step between all values remains constant.
If you want the step to grow in a non-linear way, you can square (or cube) the items:
[2,3,4]^2 = [4, 9, 16] # step=5, 7...
[2,3,4]^3= [8, 27, 64] # step=19,37...
...
The possibilities are really endless. It all depends on what kind of difference you want between the bubbles. In code, a quick-and-dirty solution will look something like this:
plot = px.scatter_geo(df,
                      locations="country_code",
                      color="country_code",
                      size_max=20,
                      hover_name="country_code",
                      size=df["avg_score"]**2,
                      animation_frame="year",
                      projection="natural earth",
                      title="Bubble Map",
                      labels={"country_code": "Country"}
                      )
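The effect of linear versus power scaling on the gaps can be checked numerically with NumPy (a quick sketch; the score values are made up):

```python
import numpy as np

scores = np.array([2.0, 3.0, 4.0])

linear = scores * 3    # [6, 9, 12] -> gaps stay constant
squared = scores ** 2  # [4, 9, 16] -> gaps grow
cubed = scores ** 3    # [8, 27, 64] -> gaps grow faster

print(np.diff(linear))   # [3. 3.]
print(np.diff(squared))  # [5. 7.]
print(np.diff(cubed))    # [19. 37.]
```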

I want to detect ranges with the same numerical boundaries of a dataset using matplotlib or pandas in python 3.7

I have a ton of ranges, all consisting of numbers. Each range has a minimum and a maximum which cannot be exceeded. Given two ranges where the maximum of one reaches above the minimum of the other, there is a small area covered by both, and you can write one range that includes the other.
I want to see if some ranges overlap, or if I can find ranges that cover most of the others. The goal is to see whether I can simplify them by using one smaller range that fits inside the others. For example, 7.8 - 9.6 and 7.9 - 9.6 can be covered with one range.
You can see my attempt to visualize them below. But when I use my entire dataset, consisting of hundreds of ranges, the graph is no longer useful.
I know that I can detect recurrent ranges using Python, but I don't want to know how often a range occurs. I want to know how many ranges lie within the same numerical boundaries, and whether I can have a couple of ranges covering all of them. Finally, my goal is to have the master ranges sorted into categories: range 1 covering 50 other ranges, range 2 covering 25 ranges, and so on.
My current program shows the overlap of the ranges, but I also want a printed output with the exact figures.
It would be nice if you could share some ideas to solve this problem, or if you have any suggestions on tools within Python 3.7.
import matplotlib.pyplot as plt
intervals = [[3.6,4.5],
[3.6,4.5],
[7.8,9.6],
[7.9,9.6],
[7.8,9.6],
[3.4,4.1],
[2.8,3.4],
[8.25,9.83],
[3.62,3.96],
[8.25,9.83],
[0.62,0.68],
[2.15,2.49],
[0.8,1.0],
[0.8,1.0],
[3.1,3.9],
[6.7,8.3],
[1,1.5],
[1,1.2],
[1.5,1.8],
[1.8,2.5],
[3,4.0],
[6.5,8.0],
[1.129,1.35],
[2.82,3.38],
[1.69,3.38],
[3.38,6.21],
[2.25,2.82],
[5.649,6.214],
[1.920,6.214]
]
for interval in intervals:
    plt.plot(interval, [0, 0], 'b', alpha=0.2, linewidth=100)
plt.show()
Here is an idea: make a pandas DataFrame from the array and subtract column 1 from column 2 (column 1 is x and column 2 is y). After that, create a histogram of the resulting range widths and their frequencies.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
intervals = [[3.6,4.5],
[3.6,4.5],
[7.8,9.6],
[7.9,9.6],
[7.8,9.6],
[3.4,4.1],
[2.8,3.4],
[8.25,9.83],
[3.62,3.96],
[8.25,9.83],
[0.62,0.68],
[2.15,2.49],
[0.8,1.0],
[0.8,1.0],
[3.1,3.9],
[6.7,8.3],
[1,1.5],
[1,1.2],
[1.5,1.8],
[1.8,2.5],
[3,4.0],
[6.5,8.0],
[1.129,1.35],
[2.82,3.38],
[1.69,3.38],
[3.38,6.21],
[2.25,2.82],
[5.649,6.214],
[1.920,6.214]]
intervals_ar = np.array(intervals)
df = pd.DataFrame({'Column1': intervals_ar[:, 0], 'Column2': intervals_ar[:, 1]})
df['Ranges'] = df['Column2'] - df['Column1']
print(df)
frequency_range = df['Ranges'].value_counts().sort_index()
print(frequency_range)
df['Ranges'].plot(kind='hist', bins=5)
plt.title("Histogram Frequency vs Range (column2 - column1)")
plt.show()
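The histogram above only looks at interval widths. If the goal is to find "master ranges" that cover several of the original intervals, one possible approach (a sketch, not part of the answer above; the variable names are my own) is to sort the intervals and merge any that overlap, counting how many originals fall into each merged range:

```python
# A few intervals taken from the question's data
intervals = [[3.6, 4.5], [7.8, 9.6], [7.9, 9.6], [3.4, 4.1], [1.0, 1.5], [1.2, 1.8]]

merged = []  # each entry is [low, high, count]
for low, high in sorted(intervals):
    if merged and low <= merged[-1][1]:
        # Overlaps the previous merged range: extend it
        merged[-1][1] = max(merged[-1][1], high)
        merged[-1][2] += 1
    else:
        merged.append([low, high, 1])

# Report the widest-covering ranges first
for low, high, count in sorted(merged, key=lambda m: -m[2]):
    print(f"{low} - {high} covers {count} interval(s)")
```

Running this on the full dataset would give the sorted "master range covers N others" listing described in the question.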

How to find the most frequent value in a column using np.histogram()

I have a DataFrame in which one column contains different numerical values. I would like to find the most frequently occurring value specifically using the np.histogram() function.
I know that this task can be achieved using functions such as column.value_counts().nlargest(1), however, I am interested in how the np.histogram() function can be used to achieve this goal.
With this task I am hoping to get a better understanding of the function and the resulting values, as the description from the documentation (https://numpy.org/doc/1.18/reference/generated/numpy.histogram.html) is not so clear to me.
Below I am sharing an example Series of values to be used for this task:
data = pd.Series(np.random.randint(1,10,size=100))
This is one way to do it:
import numpy as np
import pandas as pd
# Make data
np.random.seed(0)
data = pd.Series(np.random.randint(1, 10, size=100))
# Make bins
bins = np.arange(data.min(), data.max() + 2)
# Compute histogram
h, _ = np.histogram(data, bins)
# Find most frequent value
mode = bins[h.argmax()]
# Mode computed with Pandas
mode_pd = data.value_counts().nlargest(1).index[0]
# Check result
print(mode == mode_pd)
# True
You can also define bins as:
bins = np.unique(data)
bins = np.append(bins, bins[-1] + 1)
Or if your data contains only positive numbers you can directly use np.bincount:
mode = np.bincount(data).argmax()
Of course there is also scipy.stats.mode:
import scipy.stats
mode = scipy.stats.mode(data)[0][0]
It can be done with:
hist, bin_edges = np.histogram(data, bins=np.arange(0.5,10.5))
result = np.argmax(hist)
You just need to read the documentation more carefully. It says that if bins is [1, 2, 3, 4] then the first bin is [1, 2), the second is [2, 3) and the last is [3, 4] (the last bin includes the right edge).
We count how many numbers fall into the bins [0.5, 1.5), [1.5, 2.5), ..., [8.5, 9.5], specifically in your problem, and choose the index of the maximum one.
Just in case, it's worth using
np.unique(data)[np.argmax(hist)]
if we are not sure that your sorted data set np.unique(data) includes all the consecutive integers 0, 1, 2, 3, ...
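The half-open bin convention is easy to verify on a tiny example (made-up data):

```python
import numpy as np

data = np.array([1, 2, 2, 3, 9])
hist, edges = np.histogram(data, bins=np.arange(0.5, 10.5))

print(edges[:4])            # [0.5 1.5 2.5 3.5] -> bins [0.5, 1.5), [1.5, 2.5), ...
print(hist)                 # [1 2 1 0 0 0 0 0 1] -> both 2s land in [1.5, 2.5)
print(np.argmax(hist) + 1)  # 2, the most frequent value
```

The +1 works here only because the bins are centered on the consecutive integers 1..9; with gaps in the data, mapping back through the bin edges is the safer route.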

I want to calculate occurence of data over a range (period of 10) in Python

I have data in numeric form and I want to calculate the occurrence of data in ranges (periods of 10). I have created a Python script. The original script is very long because of the large dataset, so I am putting a sample code here.
In the actual code the malware_opcd_frq list size is approx. 19,000 and the bins list is [0, 11, 21, 31, ..., 13991, 14000]
opcode_frequency.py
import numpy as np
malware_opcd_frq = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,4,4,5,5,6,6,6,19,25,26,28,29,35,41,43,43,49,54,57,60,71,78,79,81,81,92,99,99,105,107,109,119,129,134,142,142,145,146,150,158,166,166,171,172,173,180,183,186,187,191,191,192,192,192,192,192,192,192,192,192,192,192,192,192,192,192,192,192,192,195,198,199,203,209,209,217,217,220,220,225,226,226,226,226,226,226,226,226,226,226,226,228,234,234,235,236,236,236,237,237,239,240,241,241,243,244,244,245,245,245,245,246,247,248,250,253,256,257,258,259,259,260,262,264,264,267,267,267,269,270,270,272,273,274,275,278,279,284,295,295,300]
frq = np.histogram(malware_opcd_frq, bins= [0,11,21,31,41,51,61,71,81,91,101,111,121,131,141,151,161,171,181,191,201,211,221,231,241,251,261,271,281,291,300])
print(frq)
After executing the actual code, I get output like this:
(array([29, 1, 4, ..., 5, 9, 7]), array([ 0, 11, 21, ..., 13981, 13991, 14000]))
In the above output I need the full arrays, but I'm not getting them. Please explain what I need to do.
The "..." that you see is just a way of telling you that the variable is very big, so you see only a small part of it.
frq is a tuple, where frq[0] are the amount of values in each bin and frq[1] are the bins that you use.
You can make the plot of all your data with:
import matplotlib.pyplot as plt
plt.plot(frq[1][1:], frq[0])
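If you want to print the full arrays rather than the truncated view, NumPy's summarization threshold can be raised (a sketch with synthetic data standing in for the real opcode frequencies):

```python
import numpy as np

# Synthetic stand-in for the real frequency data
data = np.arange(2000)
frq = np.histogram(data, bins=np.arange(0, 2001, 10))

# Temporarily disable NumPy's "..." summarization so both arrays print in full
with np.printoptions(threshold=np.inf):
    print(frq[0])  # counts per bin
    print(frq[1])  # bin edges
```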

Seaborn tsplot - y-axis scale

import seaborn as sns
import matplotlib.pyplot as plt
sns.tsplot(data = df_month, time = 'month', value = 'pm_local');
plt.show()
Using this code I get a blank plot, I presume because of the scale of the y-axis. I don't know why this is; here are the first 5 rows of my dataframe (which consists of 12 rows, one row for each month):
How can I fix this?
I think the problem is related to the unit field. When data is passed as a DataFrame, the function expects a unit indicating which subject each observation belongs to. This behavior is not obvious to me, but see this example.
import pandas as pd
# Test Data
df = pd.DataFrame({'month': [1, 2, 3, 4, 5, 6],
                   'value': [11.5, 9.7, 12, 8, 4, 12.3]})
# Added a custom unit with a value = 1
sns.tsplot(data=df, value='value', unit=[1]*len(df), time='month')
plt.show()
You can also extract a Series and plot it.
sns.tsplot(data=df.set_index('month')['value'])
plt.show()
I had this same issue. In my case it was due to incomplete data, such that every time point had at least one missing value, causing the default estimator to return NaN for every time point.
Since you only show the first 5 records of your data we can't tell if your data has the same issue. You can see if the following fix works:
from scipy import stats
sns.tsplot(data=df_month, time='month',
           value='pm_local',
           estimator=stats.nanmean)
