Generating different experimental data sets [closed] - python

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 years ago.
I want to generate 5 experimental data sets from my dataset (heart.csv, https://www.kaggle.com/ronitf/heart-disease-uci). Then, for each experimental data set, I want to run 5-fold cross-validation.

Assuming you have read all the data into a DataFrame called df, try this:
import numpy as np
# Shuffle all rows; each call to sample(frac=1) returns a new random ordering
df_shuffled_1 = df.sample(frac=1)
df_shuffled_2 = df_shuffled_1.sample(frac=1)
...
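One possible way to combine the shuffling with the 5-fold cross-validation from the question is sketched below. It is only an illustration: the small DataFrame stands in for heart.csv, its column names are made up, and the folds are built by hand with NumPy rather than with any particular library's cross-validation helper.

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_csv("heart.csv"); these columns are hypothetical.
df = pd.DataFrame({"age": range(20), "target": [0, 1] * 10})

n_sets, n_folds = 5, 5
experiments = []
for seed in range(n_sets):
    # A different seed gives a different, reproducible row order.
    shuffled = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    # Split the row positions into 5 nearly equal folds.
    folds = np.array_split(np.arange(len(shuffled)), n_folds)
    splits = []
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate(
            [folds[j] for j in range(n_folds) if j != k]
        )
        splits.append((shuffled.iloc[train_idx], shuffled.iloc[test_idx]))
    experiments.append(splits)
```

Each entry of experiments then holds five (train, test) pairs for one shuffled copy of the data, and a model can be fitted on each pair in turn.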

Related

Alternatives to Write isnull().sum()? [closed]

Closed 19 hours ago.
import pandas as pd
Data = pd.read_csv('file name')
Data = Data.isnull().sum()
I understand that this tests whether each cell is empty and then returns the number of empty cells in each column.
Are there any other ways to rewrite the statement?
Can we store the True results in a data structure, add them up using a loop, and return the result?
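A few equivalent ways to count missing values per column are sketched below, including the explicit loop the question asks about. The sample DataFrame and the variable names are invented for illustration.

```python
import numpy as np
import pandas as pd

# Small made-up frame with some missing cells.
data = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 2.0]})

# isna() is an alias of isnull().
by_isna = data.isna().sum()

# count() returns the number of non-missing cells per column.
by_count = len(data) - data.count()

# Explicit loop: collect the True flags and add them up by hand.
by_loop = {}
for col in data.columns:
    total = 0
    for is_missing in data[col].isna():
        if is_missing:
            total += 1
    by_loop[col] = total
```

The loop version gives the same counts but is much slower on large frames, so the vectorized forms are usually preferable.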

Calculate the 3 sigma value using python [closed]

Closed 1 year ago.
I have a data frame in Python. How can I calculate the 3-sigma value for each column of the data frame? Please help me with this.
The command you're looking for is df.std() * 3
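As a small illustration with made-up numbers: df.std() uses the sample standard deviation (ddof=1) by default, so pass ddof=0 if the population value is wanted. The mean ± 3-sigma bounds at the end are an assumption about how the value might be used, not part of the question.

```python
import pandas as pd

# Hypothetical data; replace with your own DataFrame.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [10.0, 10.0, 10.0, 10.0, 10.0],
})

three_sigma = df.std() * 3            # sample std, ddof=1 (pandas default)
three_sigma_pop = df.std(ddof=0) * 3  # population std

# Possible use: per-column control limits around the mean.
upper = df.mean() + three_sigma
lower = df.mean() - three_sigma
```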

I'm trying to make an array work with user inputs [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
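I would like to make a number array with numpy based on user input, then find the mean, median, and mode of this array.
Here you go:
import numpy as np
# Read whitespace-separated integers from one line of input
array = np.asarray([int(i) for i in input().split()])
print(array.mean())
# NumPy arrays have no median() or mode() method
print(np.median(array))
# Mode: the value with the highest count (smallest one if there is a tie)
values, counts = np.unique(array, return_counts=True)
print(values[counts.argmax()])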
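NumPy arrays provide mean() as a method, but the median comes from np.median and there is no built-in mode. A minimal sketch that wraps the three statistics in a function and reports every tied mode follows; the function name summarize and the sample input are made up for illustration.

```python
import numpy as np

def summarize(text):
    """Return (mean, median, modes) for whitespace-separated integers."""
    array = np.asarray([int(token) for token in text.split()])
    # Count how often each distinct value occurs.
    values, counts = np.unique(array, return_counts=True)
    # Keep every value that reaches the highest count.
    modes = values[counts == counts.max()].tolist()
    return float(array.mean()), float(np.median(array)), modes

result = summarize("1 2 2 3 3 10")
print(result)
```

To use it with keyboard input, call summarize(input()).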

How to shuffle segments of data without changing the order in the segments? [closed]

Closed 2 years ago.
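I would like to shuffle the data in groups of 5 rows without changing the order within each group.
Adapted from: https://stackoverflow.com/a/44729807/7253453
You can achieve this with
import random
import pandas as pd
n = 5  # chunk size in rows
# Split df into consecutive chunks of n rows (the last chunk may be shorter)
list_df = [df[i:i+n] for i in range(0, df.shape[0], n)]
# Shuffle the chunks; rows inside each chunk keep their order
random.shuffle(list_df)
df = pd.concat(list_df)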
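A self-contained version of the same idea is sketched below on a made-up 12-row frame, so the last chunk has only 2 rows. The fixed seed and ignore_index=True are choices made for this illustration, not part of the original answer.

```python
import random
import pandas as pd

# Hypothetical data: values 0..11 in order.
df = pd.DataFrame({"value": range(12)})

n = 5  # chunk size in rows
random.seed(0)  # only to make the illustration reproducible

# Consecutive chunks of n rows; the final chunk holds the remainder.
chunks = [df.iloc[i:i + n] for i in range(0, len(df), n)]
random.shuffle(chunks)

# ignore_index=True renumbers the rows after the chunks are reordered.
shuffled = pd.concat(chunks, ignore_index=True)
```

Omit ignore_index=True to keep the original row labels, which makes it easy to see where each chunk came from.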

statsmodels data update frequency [closed]

Closed 9 years ago.
Consider a vector of hourly observations where the data is updated every 12 hours.
With R, I could do
ts(vector_with_R, frequency=12)
In statsmodels, "freq" controls the units of the time series, not the "data window". How can I change the "data window"?
For this you need pandas, the most popular Python package for working with time series and other analytical data. statsmodels works with it. Note that pd.TimeSeries has since been removed from pandas; a pd.Series with a DatetimeIndex takes its place:
import pandas as pd
# Hourly series indexed by timestamp
t = pd.Series(range(10), index=pd.date_range(start='2010-10-10 06:00:00', periods=10, freq='h'))
