Attempting to make a multi-column graph - python

I am trying to make a column graph where the y-axis is the mean grain size, the x-axis is the distance along the transect, and each series is a date and/or number value (it doesn't really matter).
I have been trying a few different methods in Excel 2010 but I cannot figure it out. My hope is that, let's say, at the first location, 9, there will be three columns, and then at 12 there will be two columns. If it matters at all, let's say the total distance is 50. The result should have 7 sets of columns along the transect/x-axis.
I have tried to do this using python but my coding knowledge is close to nil. Here is my code so far:
import numpy as np
import matplotlib.pyplot as plt
grainsize = [0.7912, 0.513, 0.4644, 1.0852, 1.8515, 1.812, 6.371, 1.602, 1.0251, 5.6884, 0.4166, 24.8669, 0.5223, 37.387, 0.5159, 0.6727]
series = [2, 3, 4, 1, 4, 2, 3, 4, 1, 4, 1, 4, 1, 4, 1, 4]
distance = [9, 9, 9, 12, 12, 15, 15, 15, 17, 17, 25, 25, 32.5, 32.5, 39.5, 39.5]
If someone happens to know of code I could use, it would be very helpful. A recommendation for how to do this in Excel would be awesome too.

There's a plotting library called seaborn, built on top of matplotlib, that does this in one line. Your example:
import numpy as np
import seaborn as sns
from matplotlib.pyplot import show
grainsize = [0.7912, 0.513, 0.4644, 1.0852, 1.8515, 1.812, 6.371,
1.602, 1.0251, 5.6884, 0.4166, 24.8669, 0.5223, 37.387, 0.5159, 0.6727]
series = [2, 3, 4, 1, 4, 2, 3, 4, 1, 4, 1, 4, 1, 4, 1, 4]
distance = [9, 9, 9, 12, 12, 15, 15, 15, 17, 17, 25, 25, 32.5, 32.5, 39.5, 39.5]
ax = sns.barplot(x=distance, y=grainsize, hue=series, palette='muted')
ax.set_xlabel('distance')
ax.set_ylabel('grainsize')
show()
You will be able to do a lot even as a total newbie by editing the many examples in the seaborn gallery. Use them as training wheels: edit only one thing at a time and think about what changes.
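If you would rather stay with plain matplotlib, a similar grouped chart can be sketched by pivoting the data so that each series becomes a column. This is only a sketch: it assumes pandas is installed (pandas is not part of the original question), and the names df and table are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

grainsize = [0.7912, 0.513, 0.4644, 1.0852, 1.8515, 1.812, 6.371, 1.602,
             1.0251, 5.6884, 0.4166, 24.8669, 0.5223, 37.387, 0.5159, 0.6727]
series = [2, 3, 4, 1, 4, 2, 3, 4, 1, 4, 1, 4, 1, 4, 1, 4]
distance = [9, 9, 9, 12, 12, 15, 15, 15, 17, 17, 25, 25, 32.5, 32.5, 39.5, 39.5]

df = pd.DataFrame({"distance": distance, "series": series, "grainsize": grainsize})

# One row per distance, one column per series; missing combinations become NaN
table = df.pivot(index="distance", columns="series", values="grainsize")

# Each row of the table is drawn as one group of bars
ax = table.plot(kind="bar")
ax.set_xlabel("distance")
ax.set_ylabel("grainsize")
```

The pivoted table has one row per distinct distance, which gives the 7 sets of columns mentioned in the question. Combinations that were not measured simply leave a gap in the group.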


Why do these two numpy.divide operations give such different results?

I would like to correct the values in hyperspectral readings from a camera using the formula described over here:
the dark reference is subtracted from the captured data, and the result is divided by
the white reference minus the dark reference.
In the original example, the task is rather simple: the white and dark references have the same shape as the main data, so the formula is executed as:
corrected_nparr = np.divide(np.subtract(data_nparr, dark_nparr),
np.subtract(white_nparr, dark_nparr))
However, the main data is much larger in my case. The shapes are as follows:
$ white_nparr.shape, dark_nparr.shape, data_nparr.shape
((100, 640, 224), (100, 640, 224), (4300, 640, 224))
That's why I repeat the reference arrays:
white_nparr_rep = white_nparr.repeat(43, axis=0)
dark_nparr_rep = dark_nparr.repeat(43, axis=0)
return np.divide(np.subtract(data_nparr, dark_nparr_rep), np.subtract(white_nparr_rep, dark_nparr_rep))
And it works almost perfectly, as can be seen in the image on the left. But this approach requires an enormous amount of memory, so I decided to traverse the large array and replace the original values with corrected ones on the go instead:
ref_scale = dark_nparr.shape[0]
data_scale = data_nparr.shape[0]
for i in range(int(data_scale / ref_scale)):
    # correct one block of ref_scale rows in place
    data_nparr[i*ref_scale:(i+1)*ref_scale] = np.divide(
        np.subtract(data_nparr[i*ref_scale:(i+1)*ref_scale], dark_nparr),
        np.subtract(white_nparr, dark_nparr))
But that traversal approach gives me the ugliest of results, as can be seen on the right. I'd appreciate any idea that would help me fix this.
Note: I apply 20-times co-adding (mean of 20 readings) to obtain the images below.
EDIT: The dtype of each array is as follows:
$ white_nparr.dtype, dark_nparr.dtype, data_nparr.dtype
(dtype('float32'), dtype('float32'), dtype('float32'))
Your two methods don't agree because in the first method you used
white_nparr_rep = white_nparr.repeat(43, axis=0)
but the second method corresponds to using
white_nparr_rep = np.tile(white_nparr, (43, 1, 1))
If the first method is correct, you'll have to adjust the second method to act accordingly. With repeat, each reference row i is paired with 43 consecutive data rows, so perhaps
rep = data_nparr.shape[0] // dark_nparr.shape[0]  # 43 data rows per reference row
for i in range(dark_nparr.shape[0]):
    data_nparr[i*rep:(i+1)*rep] = np.divide(
        np.subtract(data_nparr[i*rep:(i+1)*rep], dark_nparr[i]),
        np.subtract(white_nparr[i], dark_nparr[i]))
A simple example with 2-d arrays that shows the difference between repeat and tile:
In [146]: z
Out[146]:
array([[ 1, 2, 3, 4, 5],
[11, 12, 13, 14, 15]])
In [147]: np.repeat(z, 3, axis=0)
Out[147]:
array([[ 1, 2, 3, 4, 5],
[ 1, 2, 3, 4, 5],
[ 1, 2, 3, 4, 5],
[11, 12, 13, 14, 15],
[11, 12, 13, 14, 15],
[11, 12, 13, 14, 15]])
In [148]: np.tile(z, (3, 1))
Out[148]:
array([[ 1, 2, 3, 4, 5],
[11, 12, 13, 14, 15],
[ 1, 2, 3, 4, 5],
[11, 12, 13, 14, 15],
[ 1, 2, 3, 4, 5],
[11, 12, 13, 14, 15]])
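To see which memory-friendly variant reproduces which construction, here is a small check on toy arrays. It is a sketch under assumptions: the sizes n_ref=4 and rep=3 stand in for 100 and 43, and all variable names are invented for the example. The chunked loop from the question matches tile, while a reshaped view with broadcasting matches repeat without building the repeated arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ref, rep = 4, 3  # toy stand-ins for 100 reference rows and 43 repeats
white = rng.uniform(2.0, 3.0, size=(n_ref, 5, 2)).astype(np.float32)
dark = rng.uniform(0.0, 1.0, size=(n_ref, 5, 2)).astype(np.float32)
data = rng.uniform(0.0, 3.0, size=(n_ref * rep, 5, 2)).astype(np.float32)

# Reference results built with explicit (memory-hungry) copies
dark_rep, white_rep = dark.repeat(rep, axis=0), white.repeat(rep, axis=0)
expected_repeat = (data - dark_rep) / (white_rep - dark_rep)
dark_til, white_til = np.tile(dark, (rep, 1, 1)), np.tile(white, (rep, 1, 1))
expected_tile = (data - dark_til) / (white_til - dark_til)

# Chunked loop as in the question: blocks of n_ref rows against the full references
chunked = data.copy()
for i in range(rep):
    block = slice(i * n_ref, (i + 1) * n_ref)
    chunked[block] = (chunked[block] - dark) / (white - dark)

# Broadcasting on a reshaped view: row i of the references meets rep data rows
grouped = data.copy()
view = grouped.reshape(n_ref, rep, 5, 2)  # a view of grouped, no copy
view -= dark[:, None]
view /= (white - dark)[:, None]

print(np.allclose(chunked, expected_tile))    # chunked loop behaves like tile
print(np.allclose(grouped, expected_repeat))  # reshaped view behaves like repeat
```

The in-place operations on the view write straight into grouped, so the only extra memory needed is for the small reference arrays.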
Off topic postscript: I don't know why the author of the page that you linked to writes NumPy expressions as (for example):
corrected_nparr = np.divide(
np.subtract(data_nparr, dark_nparr),
np.subtract(white_nparr, dark_nparr))
NumPy allows you to write that as
corrected_nparr = (data_nparr - dark_nparr) / (white_nparr - dark_nparr)
which looks much nicer to me.

What is plotted when string data is passed to the matplotlib API?

# first, some imports:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Let's say I want to make a scatter plot, using this data:
np.random.seed(42)
x=np.arange(0,50)
y=np.random.normal(loc=3000,scale=1,size=50)
Plot via:
plt.scatter(x,y)
I get this answer:
Ok, let's create a dataframe first:
df=pd.DataFrame.from_dict({'x':x,'y':y.astype(str)})
(I am aware that I am storing y as str - this is a reproducible example, and I do this to reflect the real use case.)
Then, if I do:
plt.scatter(df.x,df.y)
I get:
What am I seeing in this second plot? I thought that the second plot would show the x column plotted against the y column, with the strings converted to float. This is clearly not the case.
Matplotlib doesn't automatically convert str values to numbers, so your y values are treated as categorical. As far as Matplotlib is concerned, the difference between '1.0' and '0.9' is the same as the difference between '1.0' and '100.0'.
So, the y-axis on the plot will be the same as range(len(y)) (since the distance between all categorical values is the same), with labels assigned from the categorical values.
Since your x is a range equal to range(50), and now your y is a range too (also equal to range(50)), it plots x = y, with each y-label set to the respective str value.
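The categorical behaviour described above can be checked directly, and converting the column back to numbers restores the expected plot. This is a minimal sketch that reuses the data from the question; pd.to_numeric is just one way to do the conversion, and the names pts, str_positions and y_numeric are invented here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(42)
x = np.arange(0, 50)
y = np.random.normal(loc=3000, scale=1, size=50)
df = pd.DataFrame({"x": x, "y": y.astype(str)})

fig, (ax_str, ax_num) = plt.subplots(1, 2)

# String y values: each distinct string gets the next integer position
pts = ax_str.scatter(df.x, df.y)
str_positions = np.asarray(pts.get_offsets())[:, 1]
print(str_positions[:5])  # positions on the axis, not the original values

# Converting to numbers first gives the ordinary scatter plot
y_numeric = pd.to_numeric(df.y)
ax_num.scatter(df.x, y_numeric)
```

If the real column may contain unparsable entries, pd.to_numeric also accepts errors="coerce", which turns them into NaN rather than raising.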
As per the excellent answer by dm2, when you pass y as a string, y is simply being treated as arbitrary string labels, and being plotted one after the other in the order in which they appear. To demonstrate, here's an even simpler example.
from matplotlib import pyplot as plt
x = [1, 2, 3, 4]
y = [5, 25, 10, 1] # these are ints
plt.scatter(x, y)
So far so good. Now, different string y values.
y = list("abcd")
plt.scatter(x, y)
You can see how it just takes the y labels and just drops them on the axis one after another.
Finally,
y = ["5", "25", "10", "1"]
plt.scatter(x, y)
Compare this with the previous results and now it should become obvious what's going on.
It's more obvious once the tick labels and locations are extracted: the API plots the strings as labels, and the axis locations are 0-indexed numbers based on how many (len) categories exist.
.get_xticks() and .get_yticks() extract a list of the numeric locations.
.get_xticklabels() and .get_yticklabels() extract a list of matplotlib.text.Text, Text(x, y, text).
There are fewer numbers in the list for the y axis because there were duplicate values as a result of rounding.
This applies to any API that uses matplotlib as the backend, such as seaborn or pandas.
sns.scatterplot(data=df, x='x_num', y='y', ax=ax1)
ax1.scatter(data=df, x='x_num', y='y')
ax1.plot('x_num', 'y', 'o', data=df)
Labels, Locs, and Text
print(x_nums_loc)
print(y_nums_loc)
print(x_lets_loc)
print(y_lets_loc)
print(x_lets_labels)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
[Text(0, 0, 'A'), Text(1, 0, 'B'), Text(2, 0, 'C'), Text(3, 0, 'D'), Text(4, 0, 'E'),
Text(5, 0, 'F'), Text(6, 0, 'G'), Text(7, 0, 'H'), Text(8, 0, 'I'), Text(9, 0, 'J'),
Text(10, 0, 'K'), Text(11, 0, 'L'), Text(12, 0, 'M'), Text(13, 0, 'N'), Text(14, 0, 'O'),
Text(15, 0, 'P'), Text(16, 0, 'Q'), Text(17, 0, 'R'), Text(18, 0, 'S'), Text(19, 0, 'T'),
Text(20, 0, 'U'), Text(21, 0, 'V'), Text(22, 0, 'W'), Text(23, 0, 'X'), Text(24, 0, 'Y'),
Text(25, 0, 'Z')]
Imports, Data, and Plotting
import numpy as np
import string
import pandas as pd
import matplotlib.pyplot as plt
# sample data
np.random.seed(45)
x_numbers = np.arange(100, 126)
x_letters = list(string.ascii_uppercase)
y= np.random.normal(loc=3000, scale=1, size=26).round(2)
df = pd.DataFrame.from_dict({'x_num': x_numbers, 'x_let': x_letters, 'y': y}).astype(str)
# plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3.5))
df.plot(kind='scatter', x='x_num', y='y', ax=ax1, title='X Numbers', rot=90)
df.plot(kind='scatter', x='x_let', y='y', ax=ax2, title='X Letters')
x_nums_loc = ax1.get_xticks()
y_nums_loc = ax1.get_yticks()
x_lets_loc = ax2.get_xticks()
y_lets_loc = ax2.get_yticks()
x_lets_labels = ax2.get_xticklabels()
fig.tight_layout()
plt.show()

How to create an astrology chart in Python from planetary data (derived from a custom algorithm) that dynamically updates with user input?

Using my algorithm, I can take space-time data as an input list & generate accurate planetary & astrological data as 2D lists as follows:
Input_list: [Year, Month, Day, Hour, Minute, Second, Latitude, Longitude, Altitude]
Input_sample: [1990, 4, 9, 21, 12, 30, 22.51361111111111, 88.3411111111111, 9.14]
Output: two 2D lists, "GRAHA POSITIONS" & "HOUSE CUSPS"
--------------------GRAHA POSITIONS-----------------------------
[Planet, Z.sign, degree, minute, second, nakshatra, pada, angle_in_float]
['Sun', 'Aries', 19, 30, 30, 'Revati', 4, 19.508579713622044]
['Moon', 'Libra', 14, 36, 20, 'Chitra', 1, 14.60574883925213]
['Mercury', 'Taurus', 8, 14, 2, 'Bharani', 2, 8.23414995759422]
['Venus', 'Pisces', 3, 21, 55, 'Shatavishak', 2, 3.3653055864194243]
['Mars', 'Aquarius', 21, 33, 43, 'Dhanistha', 3, 21.562001016537806]
['Jupiter', 'Cancer', 3, 45, 55, 'Ardra', 3, 3.765369522224674]
['Saturn', 'Capricorn', 24, 49, 1, 'U.Ashadha', 3, 24.81699905482151]
['Rahu', 'Aquarius', 14, 21, 35, 'Dhanistha', 1, 14.359770567077646]
['Ketu', 'Leo', 14, 21, 35, 'Ashlesha', 3, 14.359770567077646]
----------------------HOUSE CUSPS-------------------------------
[Cusp, Z.sign, degree, minute, second, nakshatra, pada, angle_in_float]
[1, 'Scorpio', 29, 16, 11, 'Anuradha', 2, 29.269861999821757]
[2, 'Sagittarius', 28, 55, 44, 'Moola', 3, 28.929165427870714]
[3, 'Aquarius', 0, 37, 54, 'Sravana', 1, 0.6317628121900043]
[4, 'Pisces', 3, 49, 57, 'Shatavishak', 3, 3.8325043513459605]
[5, 'Aries', 5, 53, 38, 'U.Bhadrapada', 4, 5.893938324894634]
[6, 'Taurus', 4, 20, 36, 'Bharani', 1, 4.343341833871818]
[7, 'Taurus', 29, 16, 11, 'Krittika', 4, 29.269861999821728]
[8, 'Gemini', 28, 55, 44, 'Ardra', 1, 28.929165427870714]
[9, 'Leo', 0, 37, 54, 'Pushya', 3, 0.6317628121900043]
[10, 'Virgo', 3, 49, 57, 'P.Phalguni', 1, 3.832504351345932]
[11, 'Libra', 5, 53, 38, 'Hasta', 2, 5.893938324894634]
[12, 'Scorpio', 4, 20, 36, 'Svati', 3, 4.34334183387179]
Now I want to use this data to create visualizations as a South Indian horoscope chart. I've added an example chart to show what I want to achieve.
In the image, the chart background with zodiac symbols in the bottom right is static, I can simply use an image for that. The planets & asc symbol i.e. HOUSE CUSPS [0][0] (red slanted line in Scorpio) are vectors, which I'd like to import into my program & assign to planet names i.e. GRAHA POSITIONS[i][0] from the above 2D list.
The number above a planet is its angle in the rashi i.e. GRAHA POSITIONS[i][2,3,4] & the text below is its Nakshatra & pada i.e. GRAHA POSITIONS[i][5,6].
The little black numbers in the bottom left of a rashi are the cusps in that rashi i.e. HOUSE CUSPS[i][0].
Is it reasonable to try to build this idea in Python? If yes, then how? If no, then please guide me.
I'm an amateur programmer by need, with only a few months of experience in Python. I'm very eager to build this software for research in statistical astrology, aimed at predicting natural calamities decades ahead & at a fraction of the current cost.
Thank you for any help in advance.
Yes you can most certainly do this in Python and I would suggest looking into Pillow/PIL. Basically you need to look at this as more of image manipulation rather than a chart/graph. You start with a basic background image and add all details on top of it.
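As a rough illustration of the "start with a background and draw on top" idea, here is a minimal Pillow sketch. Several things are assumptions made for the example: a blank white canvas stands in for your static background image, plain text stands in for the planet vectors, the cell size and text offsets are arbitrary, and SIGN_CELLS encodes the usual South Indian layout (Pisces in the top-left corner, signs running clockwise), which you should check against your own chart. Only a few rows of your GRAHA POSITIONS list are included.

```python
from PIL import Image, ImageDraw

CELL = 150  # pixel size of one rashi box (arbitrary choice)

# (row, col) of each sign on a 4x4 grid; assumed South Indian layout
SIGN_CELLS = {
    "Pisces": (0, 0), "Aries": (0, 1), "Taurus": (0, 2), "Gemini": (0, 3),
    "Cancer": (1, 3), "Leo": (2, 3), "Virgo": (3, 3), "Libra": (3, 2),
    "Scorpio": (3, 1), "Sagittarius": (3, 0), "Capricorn": (2, 0),
    "Aquarius": (1, 0),
}

graha_positions = [
    ['Sun', 'Aries', 19, 30, 30, 'Revati', 4, 19.508579713622044],
    ['Moon', 'Libra', 14, 36, 20, 'Chitra', 1, 14.60574883925213],
    ['Mars', 'Aquarius', 21, 33, 43, 'Dhanistha', 3, 21.562001016537806],
    ['Rahu', 'Aquarius', 14, 21, 35, 'Dhanistha', 1, 14.359770567077646],
]

def draw_chart(grahas):
    """Draw the 12 outer boxes and write each graha into its sign's box."""
    img = Image.new("RGB", (4 * CELL, 4 * CELL), "white")
    draw = ImageDraw.Draw(img)
    for row, col in SIGN_CELLS.values():
        x0, y0 = col * CELL, row * CELL
        draw.rectangle([x0, y0, x0 + CELL - 1, y0 + CELL - 1], outline="black")
    used = {}  # how many grahas have already been written into each sign
    for name, sign, deg, minute, sec, nakshatra, pada, _angle in grahas:
        row, col = SIGN_CELLS[sign]
        slot = used.get(sign, 0)
        used[sign] = slot + 1
        x, y = col * CELL + 8, row * CELL + 8 + slot * 45
        draw.text((x, y), f"{deg}:{minute:02d}:{sec:02d}", fill="black")
        draw.text((x, y + 12), name, fill="red")
        draw.text((x, y + 24), f"{nakshatra} {pada}", fill="black")
    return img

chart = draw_chart(graha_positions)
```

Calling chart.save("chart.png") writes the result to disk. To use a real background, Image.open on your static image could replace Image.new, and Image.paste could place the planet symbols once they are rendered to bitmaps. Re-running draw_chart with new input is the simplest way to make it update.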
Just out of curiosity, how are you generating the planetary & astrological data?

How to Recurrently Transpose A Series/List/Array

I have an array/list/pandas series:
np.arange(15)
Out[11]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
What I want is:
[[0,1,2,3,4,5],
[1,2,3,4,5,6],
[2,3,4,5,6,7],
...
[10,11,12,13,14]]
That is, recurrently transpose this column into a 5-column matrix.
The reason is that I am doing feature engineering for a column of temperature data. I want to use the last 5 values as features and the next one as the target.
What's the most efficient way to do that? My data is large.
If the array is formatted like this:
arr = np.array([1,2,3,4,5,6,7,8,....])
You could try it like this (np.array rather than np.matrix, since the nested list is already two-dimensional):
recurr_transpose = np.array([arr[i:i+5] for i in range(len(arr)-4)])
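Since the data is large, a view-based alternative may be worth trying. The sketch below assumes NumPy 1.20 or newer, where numpy.lib.stride_tricks.sliding_window_view is available. It uses a window of 6 so that the first 5 columns are the features and the last column is the target, and the names n_features, windows, X and y are made up for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.arange(15)
n_features = 5

# Each row is a window of 6 consecutive values; no data is copied
windows = sliding_window_view(arr, n_features + 1)

X = windows[:, :n_features]  # the last 5 readings
y = windows[:, n_features]   # the reading that follows them

print(windows.shape)
print(X[0], y[0])
```

The result is a read-only view into arr, so copy it with np.array(windows) if you need to modify it.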

3d numpy record array

Is it possible to have a 3-D record array in numpy? (Maybe this is not possible, or there is simply an easier way to do things too -- I am open to other options).
Assume I want an array that holds data for 3 variables (say temp, precip, humidity), and each variable's data is actually a 2-d array of 2 years (rows) and 6 months of data (columns), I could create that like this:
>>> import numpy as np
>>> d = np.array(np.arange(3*2*6).reshape(3,2,6))
>>> d
#
# comments added for explanation...
# jan feb mar apr may Jun
array([[[ 0, 1, 2, 3, 4, 5], # yr1 temp
[ 6, 7, 8, 9, 10, 11]], # yr2 temp
[[12, 13, 14, 15, 16, 17], # yr1 precip
[18, 19, 20, 21, 22, 23]], # yr2 precip
[[24, 25, 26, 27, 28, 29], # yr1 humidity
[30, 31, 32, 33, 34, 35]]]) # yr2 humidity
I'd like to be able to type:
>>> d['temp']
and get this (the first "page" of the data):
>>> array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11]])
or:
>>> d['Jan'] # assume months are Jan-June
and get this
>>> array([[0,6],
[12,18],
[24,30]])
I have been through this: http://www.scipy.org/RecordArrays a number of times, but don't see how to set up what I am after.
Actually, you can do something similar to this with structured arrays, but it's generally more trouble than it's worth.
What you want is basically labeled axes.
Pandas (which is built on top of numpy) provides what you want, and is a better choice if you want this type of indexing. There's also Larry (for labeled array), but it's largely been superseded by Pandas.
Also, you should be looking at the numpy documentation for structured arrays for info on this, rather than an FAQ. The numpy documentation has considerably more information. http://docs.scipy.org/doc/numpy/user/basics.rec.html
If you do want to take a pure-numpy route, note that structured arrays can contain multidimensional arrays. (Note the shape argument when specifying a dtype.) This will rapidly get more complex than it's worth, though.
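For what it's worth, the pure-numpy route might look roughly like the sketch below. It is only an illustration: a 0-d structured array is one possible arrangement, and the names names, dt and rec are invented here. Each field holds a whole 2x6 sub-array, so indexing by variable name works, but indexing by month name does not.

```python
import numpy as np

d = np.arange(3 * 2 * 6).reshape(3, 2, 6)
names = ["temp", "precip", "humidity"]

# One field per variable; each field holds a (2, 6) sub-array
dt = np.dtype([(name, d.dtype, (2, 6)) for name in names])
rec = np.zeros((), dtype=dt)

# Field access returns a view, so this writes into rec
for page, name in zip(d, names):
    rec[name][...] = page

print(rec["temp"])
```

There is no equivalent of d['Jan'] here, which is part of why labeled axes are the more natural tool.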
In pandas terminology, what you want is a Panel. (Panel was deprecated in pandas 0.20 and removed in 0.25, so this only applies to older versions.) You should probably get familiar with DataFrames first, though.
Here's how you'd do it with Pandas:
import numpy as np
import pandas  # Panel only exists in pandas < 0.25
d = np.array(np.arange(3*2*6).reshape(3,2,6))
dat = pandas.Panel(d, items=['temp', 'precip', 'humidity'],
                   major_axis=['yr1', 'yr2'],
                   minor_axis=['jan', 'feb', 'mar', 'apr', 'may', 'jun'])
print(dat['temp'])
print(dat.major_xs('yr1'))
print(dat.minor_xs('may'))
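On pandas versions without Panel, a similar labeled lookup can be sketched with a DataFrame that has a two-level row index. This is one possible arrangement rather than a drop-in replacement, and the level names "variable" and "year" are made up for the example.

```python
import numpy as np
import pandas as pd

d = np.arange(3 * 2 * 6).reshape(3, 2, 6)
variables = ["temp", "precip", "humidity"]
years = ["yr1", "yr2"]
months = ["jan", "feb", "mar", "apr", "may", "jun"]

# Stack the three 2x6 pages into six rows labeled (variable, year)
index = pd.MultiIndex.from_product([variables, years], names=["variable", "year"])
df = pd.DataFrame(d.reshape(6, 6), index=index, columns=months)

temp = df.loc["temp"]  # the 2x6 "page" for one variable

# One month across all variables and years, as a 3x2 table
jan = df["jan"].unstack("year").reindex(index=variables, columns=years)

print(temp)
print(jan)
```

The reindex call only restores the original row and column order after unstack.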
