Python: Predicting series of numbers without INPUT to a NN - python

I have a random list of series (integers) along with dates in a csv like:
1/1/2019,34 44 57 62 70
12/28/2018,09 10 25 37 38
12/25/2018,02 08 42 43 50
12/21/2018,10 13 61 62 70
12/18/2018,13 22 32 60 69
12/14/2018,05 22 26 43 49
12/11/2018,04 38 39 54 59
12/7/2018,04 10 20 33 57
12/4/2018,28 31 41 42 50
The list goes all the way back to 1997. I am trying to predict the next series (or get as close as possible) based on this data.
The size of the list is 2336 rows.
What have I tried?
The approach I've used so far (e.g. for 1/1/2019,34 44 57 62 70):
1) Get the occurrence of each number in the list, i.e. the number 34 has occurred 170 times out of the total list (2336).
2) Find the percentage of occurrences for each number, i.e.
Chances(34) = Occurrence/TotalNo.
Chances(34) = 170/2336
Chances(34) ≈ 0.073, i.e. ~07%
One way to get the list would be to just pick the 5 numbers with the lowest percentages, but that wouldn't be very effective.
On the other hand, I now have data that holds each number, its occurrence and its percentage. Is there any way I can train a neural network that predicts the next series, or something close to it?
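Steps 1 and 2 above can be sketched roughly as follows. This is only an illustration: SAMPLE is a four-row excerpt of the data shown in the question, and the function name count_occurrences is made up for the example, not taken from the original script.

```python
import csv
import io
from collections import Counter

# A few rows in the same "date,n1 n2 n3 n4 n5" layout as comp_data.csv.
SAMPLE = """1/1/2019,34 44 57 62 70
12/28/2018,09 10 25 37 38
12/25/2018,02 08 42 43 50
12/21/2018,10 13 61 62 70
"""

def count_occurrences(lines):
    """Return (Counter: number -> draws containing it, total number of draws)."""
    counts = Counter()
    total = 0
    for _date, numbers in csv.reader(lines):
        counts.update(numbers.split())  # numbers stay strings, so "09" keeps its zero
        total += 1
    return counts, total

counts, total = count_occurrences(io.StringIO(SAMPLE))
# Chances(number) = Occurrence / TotalNo, as in step 2.
chances = {number: occurrences / total for number, occurrences in counts.items()}
print(counts["10"], total, chances["10"])
```

With the real file, the io.StringIO(SAMPLE) argument would be replaced by the opened comp_data.csv.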
Hierarchy:
Where comp_data.csv contains data like:
1/1/2019,34 44 57 62 70
12/28/2018,09 10 25 37 38
12/25/2018,02 08 42 43 50
12/21/2018,10 13 61 62 70
12/18/2018,13 22 32 60 69
12/14/2018,05 22 26 43 49
12/11/2018,04 38 39 54 59
12/7/2018,04 10 20 33 57
12/4/2018,28 31 41 42 50
and occurrence.csv contains:
34,170
44,197
57,36
62,38
70,37
09,186
10,210
25,197
37,185
38,206
02,217
08,185
and report.csv contains the number, occurrence and its percentage:
34,3,11
44,1,03
57,5,19
62,5,19
70,5,19
09,1,03
10,5,19
25,2,07
37,3,11
38,2,07
02,1,03
08,2,07
So I have the list of series, the occurrences over a period of time, and the percentages. Is there any way I can create a NN that takes some INPUTS, trains over the data and predicts the OUTPUT (a series in this case)?
The Problem:
Which values would be the input, given that this is a purely random problem? PS. I cannot provide any input since I need a series without INPUT. Perhaps an LSTM network for regression?
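One common way to frame a "no input" sequence problem is to let the previous draws be the input and the following draw be the target. The sketch below only shows that data framing (the function name make_windows and the window size are assumptions for illustration, and no network is trained). If the draws really are independent and random, past draws carry no information about the next one, so no model fed this way can be expected to beat chance.

```python
def make_windows(draws, window=3):
    """Pair each run of `window` consecutive draws with the draw that followed it."""
    samples = []
    for i in range(len(draws) - window):
        # Input: the `window` previous draws flattened into one list.
        history = [n for draw in draws[i:i + window] for n in draw]
        # Target: the draw that came right after them.
        target = draws[i + window]
        samples.append((history, target))
    return samples

# Oldest first. The CSV in the question is newest first, so it is reversed here.
draws = [
    [28, 31, 41, 42, 50],
    [4, 10, 20, 33, 57],
    [4, 38, 39, 54, 59],
    [5, 22, 26, 43, 49],
    [13, 22, 32, 60, 69],
]
samples = make_windows(draws, window=3)
for history, target in samples:
    print(history, "->", target)
```

Each (history, target) pair would then be one training example for whatever sequence model is tried, an LSTM included.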

Related

How do I make a simple contour chart of a Pandas DataFrame with numeric cell values as Z and labeled rows/columns as X and Y coordinates?

I have a Pandas DataFrame, luminance_df, that looks like this:
             barelyvisible  ultralight  light  abitlight  medium  abitdark  dark  evendarker  ultradark  almostblack
orange                  96          92     83         72      61        53    48          40         34           28
gold                    96          89     77         65      56        50    44          37         31           26
yellow                  95          88     77         64      53        47    40          33         29           26
chartreuse              95          89     80         67      55        44    35          27         23           20
green                   97          93     85         73      58        45    36          29         24           20
forest                  96          90     80         67      52        39    30          24         20           16
aqua                    97          89     78         64      50        40    32          26         22           19
teal                    96          90     82         69      53        43    36          31         27           24
lightblue               97          94     86         74      60        48    39          32         27           24
blue                    97          93     87         78      68        60    53          48         40           33
indigo                  97          94     89         82      74        67    59          51         41           34
purple                  98          95     92         85      76        66    58          50         42           35
royalpurple             98          95     92         85      75        65    56          47         39           32
magenta                 98          95     91         83      73        61    49          40         33           28
pink                    97          95     90         82      70        60    51          42         35           30
dustypink               97          95     90         82      71        60    50          41         35           30
red                     97          94     89         82      71        60    51          42         35           31
So far, I'm building a single multi-chart HTML file like this:
with open(os.path.join(cwd, 'testout.html'), 'w') as outfile:
    outfile.write("<p> </p><hr/><p> </p>".join([
        '<h1>Colors</h1>'+hex_styler.to_html(), '<h1>Hue</h1>'+hue_styler.to_html(),
        '<h1>Saturation</h1>'+saturation_styler.to_html(), '<h1>Luminance</h1>'+luminance_styler.to_html(),
        '<h1>Perceived Brightness</h1>'+perceived_brightness_pivot_styler.to_html(), '<h1>Base Data</h1>'+basic_df.to_html(),
    ]))
I'd like to display an elevation/contour style map of the Luminance right after luminance_styler.to_html(), a lot like this one that I produced in Excel:
I'd like the colors to stay sorted "top to bottom" as values on a y-axis and the darknesses to stay sorted "left to right" as values on an x-axis, just like in the example above.
Question
I'm not a data scientist, nor do I use Python terribly regularly. I'm proud of myself for having made luminance_df in the first place, but I am not, for the life of me, figuring out how to make Python simply ... treat numeric cell values in a DataFrame whose labels in both directions are strings ... as a z-axis and make a contour-chart of it.
Everything I Google leads to really complicated data science nuanced questions.
Could someone get me on the right track by giving me the basic "hello world" code to get at least as far with luminance_df's data in Python as I got with the "insert chart" button in Excel?
If you can get me so I've got a img = BytesIO() that's image_base64 = base64.b64encode(img.read()).decode("utf-8")-able, I can f'<img src="data:image/png;base64, {image_base64}" />' it myself into the string concatenation that makes testout.html.
I'm on Windows and have myself set up to be able to pip install.
Notes
To be fair, I find these contour charts much more attractive and much easier to read than the one Excel made, but I'm fine with something sort of "brutish"-looking like the Excel version, as long as it makes "rising" & "falling" obvious and as long as it uses a ROYIGBV rainbow to indicate "less" vs. "more" (pet peeve of mine about the default Excel colors -- yes, I know, it's probably an accessibility thing):
While I'd like my chart's colors to follow a "rainbow" of sorts (because personally I find them easy to read), any "rainbow shading" on the chart should completely ignore the fact that the labels of the y-axis happen to describe colors. No correlation whatsoever. I'm simply plotting number facts between 16 and 98; colors of the chart should just indicate the change in "elevation" between those two extremes.
Effort so far
The only other "simple" question I've found so far that seems similar is Convert pandas DataFrame to a 3d graph using Index and Columns as X,Y and values as Z?, but this code didn't work for me at all, so I don't even know what it outputs, visually, so I have no idea if it's even relevant:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
lumX = luminance_df.columns
lumY = luminance_df.index
lumZ = luminance_df.values
fig = plt.figure()
ax = plt.axes(projection = '3d')
ax.contour3D(lumX,lumY,lumZ)
My script errored out with a message: ValueError: could not convert string to float: 'orange', and I don't know what I'm doing enough to accommodate for the fact that this answer seems to have been written around a presumption of numeric X- and Y-axis keys. (Also, it might not generate the type of chart I'm hoping for -- as I said, can't tell because it doesn't even execute and there's no visual sample in the answer.)
Dataset
Ready for pandas.DataFrame():
{"barelyvisible":{"orange":96,"gold":96,"yellow":95,"chartreuse":95,"green":97,"forest":96,"aqua":97,"teal":96,"lightblue":97,"blue":97,"indigo":97,"purple":98,"royalpurple":98,"magenta":98,"pink":97,"dustypink":97,"red":97},"ultralight":{"orange":92,"gold":89,"yellow":88,"chartreuse":89,"green":93,"forest":90,"aqua":89,"teal":90,"lightblue":94,"blue":93,"indigo":94,"purple":95,"royalpurple":95,"magenta":95,"pink":95,"dustypink":95,"red":94},"light":{"orange":83,"gold":77,"yellow":77,"chartreuse":80,"green":85,"forest":80,"aqua":78,"teal":82,"lightblue":86,"blue":87,"indigo":89,"purple":92,"royalpurple":92,"magenta":91,"pink":90,"dustypink":90,"red":89},"abitlight":{"orange":72,"gold":65,"yellow":64,"chartreuse":67,"green":73,"forest":67,"aqua":64,"teal":69,"lightblue":74,"blue":78,"indigo":82,"purple":85,"royalpurple":85,"magenta":83,"pink":82,"dustypink":82,"red":82},"medium":{"orange":61,"gold":56,"yellow":53,"chartreuse":55,"green":58,"forest":52,"aqua":50,"teal":53,"lightblue":60,"blue":68,"indigo":74,"purple":76,"royalpurple":75,"magenta":73,"pink":70,"dustypink":71,"red":71},"abitdark":{"orange":53,"gold":50,"yellow":47,"chartreuse":44,"green":45,"forest":39,"aqua":40,"teal":43,"lightblue":48,"blue":60,"indigo":67,"purple":66,"royalpurple":65,"magenta":61,"pink":60,"dustypink":60,"red":60},"dark":{"orange":48,"gold":44,"yellow":40,"chartreuse":35,"green":36,"forest":30,"aqua":32,"teal":36,"lightblue":39,"blue":53,"indigo":59,"purple":58,"royalpurple":56,"magenta":49,"pink":51,"dustypink":50,"red":51},"evendarker":{"orange":40,"gold":37,"yellow":33,"chartreuse":27,"green":29,"forest":24,"aqua":26,"teal":31,"lightblue":32,"blue":48,"indigo":51,"purple":50,"royalpurple":47,"magenta":40,"pink":42,"dustypink":41,"red":42},"ultradark":{"orange":34,"gold":31,"yellow":29,"chartreuse":23,"green":24,"forest":20,"aqua":22,"teal":27,"lightblue":27,"blue":40,"indigo":41,"purple":42,"royalpurple":39,"magenta":33,"pink":35,"dustypink":35,"red":35},"almostblack":{"orange":28
,"gold":26,"yellow":26,"chartreuse":20,"green":20,"forest":16,"aqua":19,"teal":24,"lightblue":24,"blue":33,"indigo":34,"purple":35,"royalpurple":32,"magenta":28,"pink":30,"dustypink":30,"red":31}}
I believe you only need to do a contourf (df is your luminance_df here):
import matplotlib.pyplot as plt
plt.contourf(df, cmap='RdYlBu')
plt.xticks(range(df.shape[1]), df.columns, rotation=90)
plt.yticks(range(df.shape[0]), df.index)
plt.show()
Output:
Or a heatmap:
import seaborn as sns
sns.heatmap(df, cmap='RdYlBu')
Output:
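To get from a contourf plot to the base64 string the question asks for, something along these lines might work. It is a sketch under a few assumptions: df is a 3x3 stand-in built from values in the question's table rather than the full luminance_df, the 'rainbow' colormap is just one rainbow-like choice, and the Agg backend is selected so that no window is opened.

```python
import base64
from io import BytesIO

import matplotlib
matplotlib.use("Agg")  # render off-screen; no window is needed to save a PNG
import matplotlib.pyplot as plt
import pandas as pd

# Small stand-in for luminance_df (values copied from the question's table).
df = pd.DataFrame(
    {"light": [83, 77, 77], "medium": [61, 56, 53], "dark": [48, 44, 40]},
    index=["orange", "gold", "yellow"],
)

fig, ax = plt.subplots()
filled = ax.contourf(df.values, cmap="rainbow")
# Positions 0..n-1 are labelled with the string column and row labels.
ax.set_xticks(range(df.shape[1]))
ax.set_xticklabels(df.columns, rotation=90)
ax.set_yticks(range(df.shape[0]))
ax.set_yticklabels(df.index)
ax.invert_yaxis()  # keep the first row at the top, as in the table
fig.colorbar(filled, ax=ax)

img = BytesIO()
fig.savefig(img, format="png", bbox_inches="tight")
plt.close(fig)
img.seek(0)  # rewind, otherwise read() returns nothing
image_base64 = base64.b64encode(img.read()).decode("utf-8")
html_img = f'<img src="data:image/png;base64, {image_base64}" />'
```

html_img can then be added to the list of strings that is joined into testout.html.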

How to use numpy random choice to get progressively longer sequences with the same numbers?

What I tried was this:
import numpy as np
def test_random(nr_selections, n, prob):
    selected = np.random.choice(n, size=nr_selections, replace=False, p=prob)
    print(str(nr_selections) + ': ' + str(selected))
n = 100
prob = np.random.choice(100, n)
prob = prob / np.sum(prob)  # only for demonstration purposes
for i in np.arange(10, 100, 10):
    np.random.seed(123)
    test_random(i, n, prob)
The result was:
10: [68 32 25 54 72 45 96 67 49 40]
20: [68 32 25 54 72 45 96 67 49 40 36 74 46 7 21 20 53 65 89 77]
30: [68 32 25 54 72 45 96 67 49 40 36 74 46 7 21 20 53 62 86 60 35 37 8 48
52 47 31 92 95 56]
40: ...
Contrary to my expectation and hope, the 30 numbers selected do not contain all of the 20 numbers. I also tried using numpy.random.default_rng, but only strayed further away from my desired output. I also simplified the original problem somewhat in the above example. Any help would be greatly appreciated. Thank you!
Edit for clarification: I do not want to generate all the sequences in one loop (like in the example above) but rather use the related sequences in different runs of the same program. (Ideally, without storing them somewhere)
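One possible way to get nested sequences without storing anything is to always draw the full weighted order with the same seed and keep only the first nr_selections entries. The sketch below assumes strictly positive probabilities (the weights are invented for the demonstration) and the function name nested_selection is made up. The result should repeat across runs for the same seed and prob, presumably only as long as the NumPy version stays the same.

```python
import numpy as np

def nested_selection(nr_selections, prob, seed=123):
    """Return the first nr_selections items of one fixed weighted random order."""
    rng = np.random.default_rng(seed)
    # Draw ALL indices without replacement, so the order depends only on seed and prob.
    full_order = rng.choice(len(prob), size=len(prob), replace=False, p=prob)
    # Any shorter selection is a prefix of any longer one.
    return full_order[:nr_selections]

n = 100
weights = np.arange(1, n + 1, dtype=float)  # strictly positive, demonstration only
prob = weights / weights.sum()

ten = nested_selection(10, prob)
thirty = nested_selection(30, prob)
print(ten)
print(thirty)
```

The price is that every call generates all n indices, which is cheap for n = 100 but worth keeping in mind for large n.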

how to print unicode number series in python?

I am trying to print the numbers from 1 to 100 as Unicode digits in Python. I have searched a lot on StackOverflow, but no question answers my query.
So basically I want to print the Bengali numbers from ১ to ১০০. The corresponding English numbers are 1 to 100.
What I have tried is to take the Unicode code point of ১, which is '\u09E7'. Then I tried to increase it by 1, as shown in the following code:
x = '\u09E7'
print(x+1)
But the above code gives me the following error:
TypeError: can only concatenate str (not "int") to str
So what I want is to get a number series as following:
১, ২, ৩, ৪, ৫, ৬, ৭, ৮, ৯, ১০, ১১, ১২, ১৩, ............, ১০০
I hope there is a solution to this. Thank you.
Make a translation table. The function str.maketrans() takes a string of characters and a string of replacements and builds a translation dictionary of Unicode ordinals to Unicode ordinals. Then, convert a counter variable to a string and use the translate() function on the result to convert the string:
#coding:utf8
xlat = str.maketrans('0123456789','০১২৩৪৫৬৭৮৯')
for i in range(1,101):
    print(f'{i:3d} {str(i).translate(xlat)}',end=' ')
Output:
1 ১ 2 ২ 3 ৩ 4 ৪ 5 ৫ 6 ৬ 7 ৭ 8 ৮ 9 ৯ 10 ১০ 11 ১১ 12 ১২ 13 ১৩ 14 ১৪ 15 ১৫ 16 ১৬ 17 ১৭ 18 ১৮ 19 ১৯ 20 ২০ 21 ২১ 22 ২২ 23 ২৩ 24 ২৪ 25 ২৫ 26 ২৬ 27 ২৭ 28 ২৮ 29 ২৯ 30 ৩০ 31 ৩১ 32 ৩২ 33 ৩৩ 34 ৩৪ 35 ৩৫ 36 ৩৬ 37 ৩৭ 38 ৩৮ 39 ৩৯ 40 ৪০ 41 ৪১ 42 ৪২ 43 ৪৩ 44 ৪৪ 45 ৪৫ 46 ৪৬ 47 ৪৭ 48 ৪৮ 49 ৪৯ 50 ৫০ 51 ৫১ 52 ৫২ 53 ৫৩ 54 ৫৪ 55 ৫৫ 56 ৫৬ 57 ৫৭ 58 ৫৮ 59 ৫৯ 60 ৬০ 61 ৬১ 62 ৬২ 63 ৬৩ 64 ৬৪ 65 ৬৫ 66 ৬৬ 67 ৬৭ 68 ৬৮ 69 ৬৯ 70 ৭০ 71 ৭১ 72 ৭২ 73 ৭৩ 74 ৭৪ 75 ৭৫ 76 ৭৬ 77 ৭৭ 78 ৭৮ 79 ৭৯ 80 ৮০ 81 ৮১ 82 ৮২ 83 ৮৩ 84 ৮৪ 85 ৮৫ 86 ৮৬ 87 ৮৭ 88 ৮৮ 89 ৮৯ 90 ৯০ 91 ৯১ 92 ৯২ 93 ৯৩ 94 ৯৪ 95 ৯৫ 96 ৯৬ 97 ৯৭ 98 ৯৮ 99 ৯৯ 100 ১০০
You can try this. Convert the character to an integer, do the addition, and then convert it back to a character. If the number is 10 or more you have to convert each digit separately, which is why we use modulo (%).
def print_bengali(num):
    x = ord('\u09E6')  # code point of the Bengali digit zero
    if num < 10:
        print(chr(x + num))
    elif num < 100:
        mod = num % 10
        tens = (num - mod) // 10
        print(''.join([chr(x + tens), chr(x + mod)]))
    else:
        # only 100 is handled here
        print(''.join([chr(x + 1), '\u09E6', '\u09E6']))
for i in range(1, 101):
    print_bengali(i)
You can try running it here
https://repl.it/repls/GloomyBewitchedMultitasking
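The same ord/chr idea can be written so that it handles any number of digits, by converting each decimal digit on its own. This is just a sketch, and the names BENGALI_ZERO and to_bengali are made up for the example:

```python
BENGALI_ZERO = ord('\u09E6')  # code point of the Bengali digit zero

def to_bengali(number):
    """Convert a non-negative integer to Bengali digits, one digit at a time."""
    return ''.join(chr(BENGALI_ZERO + int(digit)) for digit in str(number))

print(', '.join(to_bengali(i) for i in range(1, 101)))
```

This relies on the ten Bengali digits occupying consecutive code points starting at U+09E6, which is also what the code above depends on.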
EDIT:
Providing also javascript code as asked in comments.
function getAsciiNum(num){
  const zero = "০".charCodeAt(0)
  if (num < 10){
    return String.fromCharCode(zero+num)
  }
  else if (num < 100) {
    const mod = num % 10
    const tens = Math.floor((num - mod) / 10)
    return String.fromCharCode(zero+tens) + String.fromCharCode(zero+mod)
  }
  else {
    return String.fromCharCode(zero+1) + "০০"
  }
}
console.log(getAsciiNum(88))

Calculating average/standard deviations of rows containing certain string in pandas dataframe

I have a large pandas dataframe read as table. I would like to calculate the means and standard deviations of the two different groups, CRPS and Age, so I can plot them in a bar plot with std deviations as the error bars.
I can get the mean calculated by just the Age column. I figured it's a for loop that I have to construct, but I don't know how to construct further than table["Age"].mean(), which just gives me the average of all data points' age values. This is where I need some guidance. I want to look in the group column, tell it to calculate the average and standard deviation for the ages of that group. So, an average and standard deviation value for the ages of the CRPS group, for example.
I have the first 25 rows down below just to show what the dataframe looks like. I also have imported numpy as np as well.
Group Age
0 CRPS 50
1 CRPS 59
2 CRPS 22
3 CRPS 48
4 CRPS 53
5 CRPS 48
6 CRPS 29
7 CRPS 44
8 CRPS 28
9 CRPS 42
10 CRPS 35
11 CONTROLS 54
12 CONTROLS 43
13 CRPS 50
14 CRPS 62
15 CONTROLS 64
16 CONTROLS 39
17 CRPS 40
18 CRPS 59
19 CRPS 46
20 CONTROLS 56
21 CRPS 21
22 CRPS 45
23 CONTROLS 41
24 CRPS 46
25 CONTROLS 35
I don't think you need a for-loop.
Instead, you might try something like:
table.loc[table['Group'] == 'CRPS', 'Age'].mean()
I haven't tested with your table, but I think that will work.
The idea is to first create a boolean array, which is true for rows where the group field contains 'CRPS', then to select all of those rows using loc (iloc does not accept a boolean Series as a mask), and finally to take the mean. You could iterate over all of the groups in the following way:
mean_age = dict()
for group in set(table['Group']):
    mean_age[group] = table.loc[table['Group'] == group, 'Age'].mean()
Maybe this is where you intended to use a for loop.
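For both the mean and the standard deviation per group, pandas' groupby might be the shorter route. A small sketch, with a five-row table made from the first rows of each group in the question standing in for the real data:

```python
import pandas as pd

# Five rows standing in for the real table.
table = pd.DataFrame({
    "Group": ["CRPS", "CRPS", "CRPS", "CONTROLS", "CONTROLS"],
    "Age": [50, 59, 22, 54, 43],
})

# One row per group, with the mean and the sample standard deviation (ddof=1) of Age.
stats = table.groupby("Group")["Age"].agg(["mean", "std"])
print(stats)
```

With matplotlib installed, stats["mean"].plot.bar(yerr=stats["std"]) should then give the bar plot with the standard deviations as error bars.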

seek a better design suggestion for a trial-and-error mechanism in python?

See the data matrix below, obtained from sensors: just INT numbers, nothing special.
A B C D E F G H I J K
1 25 0 25 66 41 47 40 12 69 76 1
2 17 23 73 97 99 39 84 26 0 44 45
3 34 15 55 4 77 2 96 92 22 18 71
4 85 4 71 99 66 42 28 41 27 39 75
5 65 27 28 95 82 56 23 44 97 42 38
…
10 95 13 4 10 50 78 4 52 51 86 20
11 71 12 32 9 2 41 41 23 31 70
12 54 31 68 78 55 19 56 99 67 34 94
13 47 68 79 66 10 23 67 42 16 11 96
14 25 12 88 45 71 87 53 21 96 34 41
The horizontal axis, A to K, is the sensor name, and each row holds the sensor readings for one timer tick.
Now I want to analyze this data with trial-and-error methods. I have defined some concepts to explain what I want:
o source
source is all the raw data I get
o entry
an entry is the set of all A to K sensor values in one row; taking the 1st row as an example, the entry is
25 0 25 66 41 47 40 12 69 76 1
o rules
a rule is a "suppose" function that returns an assertion value, so far just "true" or "false".
For example, I suppose the values of sensors A, E and F will never be the same in one entry; if an entry has A=E=F, it triggers a violation and this rule function returns false.
o range:
a range is a function for selecting entries vertically, for example, the first 5 entries
Then, the basic idea is:
o source + range = subsource(s)
o subsource + rules = validation(s)
The final result I want is a list that may look like this:
rangeID ruleID violation
1 1 Y
2 1 N
3 1 Y
1 2 N
2 2 N
3 2 Y
1 3 N
2 3 Y
3 3 Y
But the problem is that the rules and ranges I defined here get very complicated very quickly if you look deeper. There are too many possible combinations: take "A=E=F" for example, one can also define "B=E=F", "C=E=F", "C>F" ......
So soon I will need a rule/range generator that accepts "core parameters" such as "A=E=F" as input, perhaps even as regex strings later. That is complicated enough to defeat me, let alone that I may need to persist unique rule IDs, and deal with data storage and nested rule combinations ......
So my questions are:
Does anyone know of a module/software that fits this kind of trial-and-error calculation or the rule definitions I want?
Can anyone share a better design for the rules/ranges I described?
Thanks for any hints.
Rgs,
KC
If I understand what you're asking correctly, I probably wouldn't even venture down the NumPy path, as I don't think, given your description, that it's really required. Here's a sample implementation of how I might go about solving the specific issue that you presented:
l = [
    {'a':25, 'b':0, 'c':25, 'd':66, 'e':41, 'f':47, 'g':40, 'h':12, 'i':69, 'j':76, 'k':1},
    {'a':25, 'b':0, 'c':25, 'd':66, 'e':41, 'f':47, 'g':40, 'h':12, 'i':69, 'j':76, 'k':1}
]
r = ['a=g=i', 'a=b', 'a=c']
res = []
# test all given rules
for n in range(0, len(r)):
    # i'm assuming equality here - you'd have to change this to accept other operators if needed
    c = r[n].split('=')
    vals = []
    # build up a list of values given our current rule
    for e in c:
        vals.append(l[0][e])
    # using len(set(vals)) gives us the number of distinct values
    res.append({'rangeID': 0, 'ruleID':n, 'violation':'Y' if len(set(vals)) == 1 else 'N'})
print(res)
Output:
[{'rangeID': 0, 'ruleID': 0, 'violation': 'N'}, {'rangeID': 0, 'ruleID': 1, 'violation': 'N'}, {'rangeID': 0, 'ruleID': 2, 'violation': 'Y'}]
http://ideone.com/zbTZr
There are a few assumptions made here (such as equality being the only operator used in your rules) and some functionality left out (such as parsing your input into the list of dicts I used), but I'm hopeful that you can figure that out on your own.
Of course, there could be a simpler NumPy-based solution that I'm just not thinking of at the moment (it's late and I'm going to bed now ;)), but hopefully this helps you out anyway.
Edit:
Whoops, missed something else (forgot to add it prior to posting): I only test the first element in l (the given range). You'd just want to put that in another for loop rather than using the hard-coded 0 index.
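A rough sketch of what that extra loop could look like, with rules written as small functions and ranges as slices. Everything here is illustrative: the helper all_equal, the two rules and the two ranges are made up, and the entries are the first three rows of the question's table with only a few sensors kept.

```python
# First three entries from the question's table (sensors a, b, c, e, f only).
entries = [
    {'a': 25, 'b': 0, 'c': 25, 'e': 41, 'f': 47},
    {'a': 17, 'b': 23, 'c': 73, 'e': 99, 'f': 39},
    {'a': 34, 'b': 15, 'c': 55, 'e': 77, 'f': 2},
]

def all_equal(*sensors):
    """Build a rule that is violated when the named sensors all read the same value."""
    def violated(entry):
        return len({entry[s] for s in sensors}) == 1
    return violated

rules = {1: all_equal('a', 'e', 'f'), 2: all_equal('a', 'c')}
ranges = {1: slice(0, 1), 2: slice(1, 3)}  # hypothetical ranges: first entry, next two

results = []
for rule_id, rule in rules.items():
    for range_id, rows in ranges.items():
        # A range violates a rule if any entry inside it does.
        hit = any(rule(entry) for entry in entries[rows])
        results.append((range_id, rule_id, 'Y' if hit else 'N'))

for row in results:
    print(*row)
```

Other operators (such as "C>F") would just be more rule-building functions of the same shape, which keeps the rule IDs and the range IDs independent of each other.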
You want to look at NumPy's matrix type for data structures like matrices. It exposes a set of functions for matrix manipulation.
As for the rule/range generator, I am afraid you will have to build your own domain-specific language to achieve that.
