Dataframe with fixed length (over writing)

I am writing code that generates a large amount of data in each round, so I only need to store data for the last 10 rounds. How can I create a dataframe that erases the oldest object when I add a new one (overwriting)? The order of observations, from old to new, should be maintained. Is there a simple function or data format to do this?
Thanks in advance!

You could use this function:
def ins(arr, item):
    if len(arr) < 10:
        arr.insert(0, item)  # room left: just add the new item at the front
    else:
        arr.pop()            # list is full: drop the last (oldest) item first
        arr.insert(0, item)
ex = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ins(ex, 'a')
print(ex)
# ['a', 1, 2, 3, 4, 5, 6, 7, 8, 9]
ins(ex, 'b')
print(ex)
# ['b', 'a', 1, 2, 3, 4, 5, 6, 7, 8]
For this to work, you must pass a list as the first argument to ins(), so that the new item is inserted at the front and the last (oldest) item is removed once the list already holds 10 items. Note that this keeps the newest item first.
(I assumed that the question is not pandas-specific, but rather about a way to store a maximum number of items in an array.)
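The same fixed-length behaviour is also available in the standard library. The sketch below uses collections.deque with maxlen=10 and appends on the right, so the contents stay ordered from old to new as the question asks. The round numbers are made-up stand-ins for the real per-round data.

```python
from collections import deque

# Keep only the 10 most recent rounds; maxlen makes the deque drop
# the oldest entry automatically when a new one is appended.
last_rounds = deque(maxlen=10)

for round_number in range(1, 14):  # 13 hypothetical rounds
    last_rounds.append(round_number)

print(list(last_rounds))
# [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
```

If a dataframe is needed at some point, it can be built from list(last_rounds) on demand rather than being rewritten every round.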

Related

adding rows based on values of other rows

I have a list (in a dataframe) that looks like this:
oddnum = [1, 3, 5, 7, 9, 11, 23]
I want to create a new list that looks like this:
newlist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 23]
I want to test if the distance between two numbers is 2 (if oddnum[index+1]-oddnum[index] == 2)
If the distance is 2, then I want to add the number following oddnum[index] and create a new list (oddnum[index] + 1)
If the distance is greater than two, keep the list as is
I keep getting a KeyError because (I think) the loop runs past the end of the list: once it reaches the last element, [index+1] no longer exists. How do I do this?

To ignore the error at the last element, you can use try and except. Here's my code:
oddnum = [1, 3, 5, 7, 9, 11, 23]
res = []  # The new list
for i in range(len(oddnum)):
    res.append(oddnum[i])  # Always append the current value
    try:  # Tries to run the code
        if oddnum[i] + 2 == oddnum[i+1]:
            res.append(oddnum[i]+1)  # Appends if the condition is met
    except (IndexError, KeyError):
        pass  # No next element (IndexError for a list, KeyError for a pandas Series)
print(res)

oddnum = [1, 3, 5, 7, 9, 11, 23]
new_list = []
for pos, num in enumerate(oddnum):
    new_list.append(num)
    try:
        if num-oddnum[pos+1] in [2, -2]:
            new_list.append(num+1)
    except IndexError:  # the last element has no successor
        pass
print(new_list)
Use try/except to catch the exception raised at the last element and ignore it.
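The boundary case can also be handled without catching an exception at all. The following is a minimal sketch, assuming oddnum is a plain Python list (a pandas column could be converted with .tolist() first). The name filled is only illustrative.

```python
oddnum = [1, 3, 5, 7, 9, 11, 23]

filled = []
# zip pairs each value with its successor and stops before the last
# element, so there is no out-of-range lookup to guard against.
for current, following in zip(oddnum, oddnum[1:]):
    filled.append(current)
    if following - current == 2:
        filled.append(current + 1)
if oddnum:
    filled.append(oddnum[-1])  # the last value has no successor
print(filled)
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 23]
```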

How to (log) transform *args arguments without losing structure

I am attempting to apply statistical tests to some datasets with variable numbers of groups. This causes a problem when I try to perform a log transformation for said groups while maintaining the ability to perform the test function (in this case scipy's kruskal()), which takes a variable number of arguments, one for each group of data.
The code below is an idea of what I want. Naturally stats.kruskal([np.log(i) for i in args]) does not work, as kruskal() does not expect a list of arrays, but one argument for each array. How do I perform log transformation (or any kind of alteration, really), while still being able to use the function?
import scipy.stats as stats
import numpy as np
def t(*args):
    test = stats.kruskal([np.log(i) for i in args])
    return test
a = [11, 12, 4, 42, 12, 1, 21, 12, 6]
b = [1, 12, 4, 3, 14, 8, 8, 6]
c = [2, 2, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8]
print(t(a, b, c))

IIUC, * in front of the list you are forming while calling kruskal should do the trick:
test = stats.kruskal(*[np.log(i) for i in args])
The asterisk unpacks the list and passes each entry as a separate positional argument to the function being called, i.e. kruskal here.
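To see the difference the asterisk makes without needing scipy, here is a small sketch. count_groups is a hypothetical stand-in for any function that, like kruskal(), takes one positional argument per group; the data is shortened from the question.

```python
import math

def count_groups(*groups):
    # Hypothetical stand-in for a variadic function such as kruskal():
    # it reports how many positional arguments it received.
    return len(groups)

a = [11, 12, 4]
b = [1, 12, 4, 3]
c = [2, 2, 3]

logged = [[math.log(x) for x in group] for group in (a, b, c)]

print(count_groups(logged))   # 1: the whole list arrives as one argument
print(count_groups(*logged))  # 3: each inner list is its own argument
```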

The global variable declared in the class in Python

I have had this question for quite a while and would like to make sure that I understand it correctly. I am now working on an algorithm question:
Kth Largest Number in a Stream
Design a class to efficiently find the Kth largest element in a stream of numbers.
The class should have the following two things:
The constructor of the class should accept an integer array containing initial numbers from the stream and an integer K.
The class should expose a function add(int num) which will store the given number and return the Kth largest number.
The result is:
from heapq import heappush, heappop
class KthLargestNumberInStream:
    # minHeap = []
    def __init__(self, _input, _k):
        self.k = _k
        # the updated minHeap is kept on the instance and won't get cleared
        self.minHeap = []
        # rather than assigning values to input
        # call the add function to add
        for num in _input:
            self.add(num)
    def add(self, num):
        # minHeap is defined outside this function and within the class
        # need to use self.minHeap to access it
        heappush(self.minHeap, num)
        # keep only the top k
        if len(self.minHeap) > self.k:
            heappop(self.minHeap)
        # print(self.minHeap)
        return self.minHeap[0]
def main():
    kthLargestNumber = KthLargestNumberInStream([3, 1, 5, 12, 2, 11], 4)
    print("4th largest number is: " + str(kthLargestNumber.add(6)))
    print("4th largest number is: " + str(kthLargestNumber.add(13)))
    print("4th largest number is: " + str(kthLargestNumber.add(4)))
main()
The printout of all the elements visited in the min heap is:
[3]
[1, 3]
[1, 3, 5]
[1, 3, 5, 12]
[1, 2, 5, 12, 3]
[1, 2, 5, 12, 3, 11]
[1, 2, 5, 12, 3, 11, 6]
4th largest number is: 1
[1, 2, 5, 12, 3, 11, 6, 13]
4th largest number is: 1
[1, 2, 5, 4, 3, 11, 6, 13, 12]
4th largest number is: 1
I am curious why, once we have called kthLargestNumber = KthLargestNumberInStream([3, 1, 5, 12, 2, 11], 4), later calls do not create a new, empty heap and add values to it, but instead keep the elements from the previous calls to the add function.
However, in this question (Another question), max_sum gets reset each time.
Thanks for your help in advance.
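A small sketch may help separate the two behaviours described above. __init__ runs only once, when the object is created, so anything stored on self stays around for later method calls, whereas a variable assigned inside a method is recreated on every call. The Tracker class and its names are hypothetical, and the comparison with max_sum assumes it was a local variable in that other question, which is not shown here.

```python
class Tracker:
    def __init__(self):
        # Runs once, when Tracker() is called; the list lives on the instance.
        self.seen = []

    def add(self, value):
        local_total = 0          # local variable: recreated on every call
        self.seen.append(value)  # instance attribute: survives between calls
        local_total += value
        return local_total, len(self.seen)

tracker = Tracker()
print(tracker.add(5))  # (5, 1)
print(tracker.add(7))  # (7, 2)
```

Only a second call to Tracker() would produce a fresh, empty self.seen.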

Printing top n distinct values of a list

I want to print the top 10 distinct elements from a list:
top=10
test=[1,1,1,2,3,4,5,6,7,8,9,10,11,12,13]
for i in range(0,top):
    if test[i]==1:
        top=top+1
    else:
        print(test[i])
It is printing:
2,3,4,5,6,7,8
I am expecting:
2,3,4,5,6,7,8,9,10,11
What am I missing?

Using numpy
import numpy as np
top=10
test=[1,1,1,2,3,4,5,6,7,8,9,10,11,12,13]
test=np.unique(np.array(test))
test[test!=1][:top]
Output
array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])

Your code executes the loop only 10 times: range(0, top) is evaluated once, before the loop starts, so increasing top inside the loop has no effect. The first 3 iterations are spent skipping 1, so only the following 7 values are printed, which is exactly what happened here.
If you want to print the top 10 distinct values, I recommend doing this:
# The code of unique is adapted from [remove duplicates in list](https://stackoverflow.com/questions/7961363/removing-duplicates-in-lists)
def unique(l):
    return sorted(set(l))  # a set has no guaranteed order, so sort the result
def print_top_unique(List, top):
    ulist = unique(List)
    for i in range(0, top):
        print(ulist[i])
print_top_unique([1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], 10)

My Solution
test = [1,1,1,2,3,4,5,6,7,8,9,10,11,12,13]
uniqueList = sorted(set(test)) #creates a sorted list of unique values [1,2,3,4,5,6,7,8,9,10,11,12,13]
for num in range(0,11):
    if uniqueList[num] != 1: #skips one, since you wanted to start with two
        print(uniqueList[num])
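If the original order of the list matters, note that a set does not promise any particular order. A minimal order-preserving sketch follows; top_distinct and its skip parameter are hypothetical names introduced only for illustration.

```python
from itertools import islice

def top_distinct(values, n, skip=()):
    # dict.fromkeys keeps the first occurrence of each value in input order.
    distinct = dict.fromkeys(values)
    wanted = (v for v in distinct if v not in skip)
    return list(islice(wanted, n))  # stops early if fewer than n remain

test = [1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
print(top_distinct(test, 10, skip={1}))
# [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```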

How to define column headers when reading a csv file in Python

I have a comma separated value table that I want to read in Python. What I need to do is first tell Python not to skip the first row because that contains the headers. Then I need to tell it to read in the data as a list and not a string because I need to build an array out of the data and the first column is non-integer (row headers).
There are a total of 11 columns and 5 rows.
Here is the format of the table (except there are no row spaces):
col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11
w0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
w1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
w2, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
w3, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Is there a way to do this? Any help is greatly appreciated!

You can use the csv module for this sort of thing. It will read in each row as a list of strings representing the different fields.
How exactly you'd want to use it depends on how you're going to process the data afterwards, but you might consider making a Reader object (from the csv.reader() function), calling next() on it once to get the first row, i.e. the headers, and then iterating over the remaining lines in a for loop.
import csv
r = csv.reader(...)
headers = next(r)  # in Python 2 this was r.next()
for fields in r:
    pass  # do stuff with fields, a list of strings for one row
If you're going to wind up putting the fields into a dict, you'd use DictReader instead (that class automatically takes the field names from the first row, so you can just construct it and use it in a loop).
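As a concrete sketch of the reader approach, the snippet below parses a shortened version of the table (3 columns instead of 11) from an in-memory string. It assumes every column after the first holds integers, and it uses skipinitialspace=True to drop the blank after each comma.

```python
import csv
import io

# Stand-in for the file on disk; with a real file, use open(path, newline="").
sample = io.StringIO(
    "col1,col2,col3\n"
    "w0, 1, 2\n"
    "w1, 1, 2\n"
)

reader = csv.reader(sample, skipinitialspace=True)
headers = next(reader)  # first row: the column headers
# Keep the row label as a string and convert the remaining fields to int.
rows = [[fields[0]] + [int(x) for x in fields[1:]] for fields in reader]

print(headers)  # ['col1', 'col2', 'col3']
print(rows)     # [['w0', 1, 2], ['w1', 1, 2]]
```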
