Pandas data frame index - python

Suppose I have a Series:
s = pd.Series(1, index=[1,2,3,5,6,9,10])
I need the full index [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], with the values at the missing labels 4, 7 and 8 set to zero.
So I expect the updated series to be
s = pd.Series([1,1,1,0,1,1,0,0,1,1], index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
How should I update the series?
Thank you in advance!

Try reindex with fill_value (it returns a new Series, so assign the result if you want to keep it):
s.reindex(range(1, s.index.max() + 1), fill_value=0)
Output:
1 1
2 1
3 1
4 0
5 1
6 1
7 0
8 0
9 1
10 1
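As a minimal sketch of the same idea, the snippet below builds the full label range from the smallest to the largest existing label and keeps the result. The names full_index and filled are only illustrative, and starting at s.index.min() instead of a hard-coded 1 is an assumption about what "standard index" should mean.

```python
import pandas as pd

s = pd.Series(1, index=[1, 2, 3, 5, 6, 9, 10])

# Every integer label from the smallest to the largest existing one.
full_index = range(s.index.min(), s.index.max() + 1)

# reindex does not modify s in place; labels missing from s get fill_value.
filled = s.reindex(full_index, fill_value=0)

print(filled.tolist())  # [1, 1, 1, 0, 1, 1, 0, 0, 1, 1]
```

If the target range is known up front (for example always 1 to 10), passing range(1, 11) directly avoids depending on the data.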

Related

Is there any method to append test data with predicted data?

I have an array of test data like array=[[5, 6 ,7, 1], [5, 6 ,7, 4], [5, 6 ,7, 3]] and an array of predicted data like array_pred=[10, 3, 4], both of the same length. I want to append each prediction to its row, giving res_array = [[5, 6 ,7, 1, 10], [5, 6 ,7, 4, 3], [5, 6 ,7, 3, 4]]. I then need to store the result in a dataframe and generate an Excel file from it. Is this possible in Python?
Use numpy.hstack (or numpy.column_stack) to join the arrays, then convert to a Series and write it to Excel:
a = np.hstack((array, np.array(array_pred)[:, None]))
# alternative suggested by Ch3steR
a = np.column_stack([array, array_pred])
print(a)
[[ 5  6  7  1 10]
 [ 5  6  7  4  3]
 [ 5  6  7  3  4]]
s = pd.Series(a.tolist())
print (s)
0 [5, 6, 7, 1, 10]
1 [5, 6, 7, 4, 3]
2 [5, 6, 7, 3, 4]
dtype: object
s.to_excel(file, index=False)
Or, if you need flattened values (one value per cell), convert to a DataFrame and a Series and use concat:
df = pd.concat([pd.DataFrame(array), pd.Series(array_pred)], axis=1, ignore_index=True)
print(df)
   0  1  2  3   4
0  5  6  7  1  10
1  5  6  7  4   3
2  5  6  7  3   4
And then:
df.to_excel(file, index=False)
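The two approaches above can be combined: stack the prediction on as an extra column with NumPy, then wrap the result in a DataFrame. This is only a sketch; the column names (f0 to f3, pred) and the file name results.xlsx are hypothetical, and the Excel call is left commented out because it needs an Excel writer engine such as openpyxl to be installed.

```python
import numpy as np
import pandas as pd

array = [[5, 6, 7, 1], [5, 6, 7, 4], [5, 6, 7, 3]]
array_pred = [10, 3, 4]

# column_stack treats the 1-D prediction list as one extra column.
combined = np.column_stack([array, array_pred])

# Hypothetical column names, chosen only for readability.
columns = ["f0", "f1", "f2", "f3", "pred"]
df = pd.DataFrame(combined, columns=columns)

print(df["pred"].tolist())  # [10, 3, 4]

# Writing to Excel (file name is an assumption; requires e.g. openpyxl):
# df.to_excel("results.xlsx", index=False)
```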

How do I write code in accordance with the tasks

A matrix is given:
1 2 3 4 5 6 7 8,
8 7 6 5 4 3 2 1,
2 3 4 5 6 7 8 9,
9 8 7 6 5 4 3 2,
1 3 5 7 9 7 5 3,
3 1 5 3 2 6 5 7,
1 7 5 9 7 3 1 5,
2 6 3 5 1 7 3 2.
Define a structure for storing the matrix.
Write code that swaps the first and last rows of the matrix.
Write the code for creating a matrix of any size, filled with zeros (the size is set via the console).
Write code that counts how many times the number 3 occurs in the matrix.
I tried solving this, but my teacher says the following code is wrong. Where is my mistake?
matr = [[1, 2, 3, 4, 5, 6, 7, 8],
        [8, 7, 6, 5, 4, 3, 2, 1],
        [2, 3, 4, 5, 6, 7, 8, 9],
        [9, 8, 7, 6, 5, 4, 3, 2],
        [1, 3, 5, 7, 9, 7, 5, 3],
        [3, 1, 5, 3, 2, 6, 5, 7],
        [1, 7, 5, 9, 7, 3, 1, 5],
        [2, 6, 3, 5, 1, 7, 3, 2]]

def will_swap_first_and_last_rows(matr):
    matr[len(matr) - 1], matr[0] = matr[0], matr[len(matr) - 1]
    return matr

def will_craete_matrix_of_any_size_filled_with_zeros():
    m = int(input('Enter the number of rows of the matrix '))
    n = int(input('enter the number of columns of the matrix '))
    return [[0] * m for i in range(n)]

def will_count_how_many_times_the_number_3_occurs_in_the_matrix(matr):
    s = 0
    for row in matr:
        for elem in row:
            if elem == 3:
                s += 1
    return s

print(*will_swap_first_and_last_rows(matr), sep='\n')
print(will_craete_matrix_of_any_size_filled_with_zeros())
print(will_count_how_many_times_the_number_3_occurs_in_the_matrix(matr))
Your code has the rows (m) and columns (n) swapped: each inner list is one row and needs n elements, and there must be m such rows. Do it like this:
return [[0] * n for i in range(m)]
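To make the row/column order easier to check, here is a small sketch of the three tasks with the sizes passed as parameters instead of read from the console. The function names are hypothetical and the 3x3 demo matrix is only for illustration; this is one possible layout, not the only acceptable answer.

```python
def make_zero_matrix(rows, cols):
    # One inner list per row; each row holds `cols` zeros.
    # The comprehension builds a fresh list per row, so rows are independent.
    return [[0] * cols for _ in range(rows)]


def swap_first_and_last_rows(matrix):
    # Negative index -1 refers to the last row.
    matrix[0], matrix[-1] = matrix[-1], matrix[0]
    return matrix


def count_value(matrix, target):
    # list.count tallies occurrences within one row; sum adds up all rows.
    return sum(row.count(target) for row in matrix)


zeros = make_zero_matrix(2, 3)
print(zeros)  # [[0, 0, 0], [0, 0, 0]]

demo = [[1, 3, 3], [4, 5, 6], [3, 8, 9]]
print(swap_first_and_last_rows(demo))  # [[3, 8, 9], [4, 5, 6], [1, 3, 3]]
print(count_value(demo, 3))  # 3
```

Reading the sizes with input() can then stay in a thin wrapper, which keeps the matrix logic testable on its own.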

Use of index in pandas DataFrame for groupby and aggregation

I want to aggregate a single column DataFrame and count the number of elements. However, I always end up with an empty DataFrame:
pd.DataFrame({"A":[1, 2, 3, 4, 5, 5, 5]}).groupby("A").count()
Out[46]:
Empty DataFrame
Columns: []
Index: [1, 2, 3, 4, 5]
If I add a second column, I get the desired result:
pd.DataFrame({"A":[1, 2, 3, 4, 5, 5, 5], "B":[1, 2, 3, 4, 5, 5, 5]}).groupby("A").count()
Out[45]:
B
A
1 1
2 1
3 1
4 1
5 3
Can you explain the reason for this?
Give this a shot:
import pandas as pd
print(pd.DataFrame({"A":[1, 2, 3, 4, 5, 5, 5]}).groupby("A")["A"].count())
prints
A
1 1
2 1
3 1
4 1
5 3
You have to select the grouped-by column explicitly in your result:
import pandas as pd
pd.DataFrame({"A":[1, 2, 3, 4, 5, 5, 5]}).groupby("A").A.count()
Output:
A
1 1
2 1
3 1
4 1
5 3
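As for the reason: groupby("A") moves column A into the index of the result, and count() reports the number of non-null values in each of the remaining columns. With a single-column frame there are no remaining columns, hence the empty DataFrame. The sketch below shows two alternatives that count rows per group without needing a second column; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 5, 5]})

# size() counts rows per group, independent of which columns remain.
counts = df.groupby("A").size()
print(counts.to_dict())  # {1: 1, 2: 1, 3: 1, 4: 1, 5: 3}

# value_counts() gives the same numbers; sort_index() orders them by value.
also = df["A"].value_counts().sort_index()
print(also.to_dict())  # {1: 1, 2: 1, 3: 1, 4: 1, 5: 3}
```

Note that size() includes rows with missing values in other columns, whereas count() skips nulls in the column being counted.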

Pandas drop duplicated values partially

I have a dataframe as
df=pd.DataFrame({'A':[1, 3, 3, 4, 5, 3, 3],
'B':[0, 2, 3, 4, 5, 6, 7],
'C':[7, 2, 2, 5, 7, 2, 2]})
I would like to drop the duplicated values from columns A and C. However, I want it to work partially.
If I use
df.drop_duplicates(subset=['A','C'], keep='first')
That drops rows 2, 5, and 6. However, I only want to drop rows 2 and 6. The desired result is:
df=pd.DataFrame({'A':[1, 3, 4, 5, 3],
'B':[0, 2, 4, 5, 6],
'C':[7, 2, 5, 7, 2]})
Here's how you can do this, using shift:
df.loc[(df[["A", "C"]].shift() != df[["A", "C"]]).any(axis=1)].reset_index(drop=True)
Output:
   A  B  C
0  1  0  7
1  3  2  2
2  4  4  5
3  5  5  7
4  3  6  2
This question is a nice reference.
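The shift-based one-liner can be unpacked into steps, which may make it easier to see that it removes only rows repeating the (A, C) pair of the row directly above. The intermediate names (keys, changed, result) are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 3, 4, 5, 3, 3],
                   'B': [0, 2, 3, 4, 5, 6, 7],
                   'C': [7, 2, 2, 5, 7, 2, 2]})

keys = df[["A", "C"]]

# shift() moves every row down by one, so each row is compared with the
# previous one. The first row is compared with NaN and counts as changed.
changed = keys.ne(keys.shift()).any(axis=1)

# Keep rows where at least one key column differs from the previous row.
result = df[changed].reset_index(drop=True)

print(result["B"].tolist())  # [0, 2, 4, 5, 6]
```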
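Alternatively, you can keep every other occurrence of each (A, C) pair (the 1st, 3rd, and so on). This matches the desired result for this data, but it does not check whether the duplicates are adjacent: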
df=df.loc[df.groupby(["A", "C"]).cumcount()%2==0]
Outputs:
   A  B  C
0  1  0  7
1  3  2  2
3  4  4  5
4  5  5  7
5  3  6  2

Get top K values in pandas series include recurring values

I have a code snippet in Python. It returns the leading values of a list up to and including the K-th distinct value (K=5 here); repeated values are kept but do not increase the count.
For example, given [1, 3, 3, 5, 5, 6, 1, 4, 8, 9, 34, 66, 124] and K = 5, it should return
[1, 3, 3, 5, 5, 6, 1, 4]
A repeated value must not increment the count toward K. Here is the Python code. How can I do this with a pandas Series?
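from collections import defaultdict

def get_top_K_uniques(K, nums):
    ret = []
    presense = defaultdict(bool)  # value -> already seen?
    counter = 0  # number of distinct values seen so far
    for each in nums:
        if not presense[each]:
            presense[each] = True
            counter += 1
        ret.append(each)
        if counter == K:
            return ret
    return ret  # fewer than K distinct values: return everything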
Thanks in advance.
Using Series.unique() and Series.isin()
nums = pd.Series([1, 3, 3, 5, 5, 6, 1, 4, 8, 9, 34, 66, 124])
uniq = nums.unique()[:5]
nums[nums.isin(uniq)]
Output
0 1
1 3
2 3
3 5
4 5
5 6
6 1
7 4
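Using category (s is the same Series as nums above). Note that category codes follow the sorted order of the distinct values, so this keeps the rows whose value is among the 4 smallest distinct values rather than the first distinct values seen; that is why 6 (index 5) is missing below:
s[s.astype('category').cat.codes<4]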
Out[153]:
0 1
1 3
2 3
3 5
4 5
6 1
7 4
dtype: int64
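For a result that follows the original loop exactly (stop at the row where the K-th distinct value first appears), one option is to count first occurrences with duplicated() and cumsum(). This is a sketch, not one of the answers above, and the function name first_k_distinct is hypothetical. Unlike the isin approach, it does not pick up later repeats of an early value that occur after the cutoff.

```python
import pandas as pd


def first_k_distinct(nums, k):
    # True where a value appears for the first time in the Series.
    first_seen = ~nums.duplicated()
    # Running count of distinct values seen up to and including each row.
    distinct_so_far = first_seen.cumsum()
    # Keep rows before the k-th distinct value, plus the row introducing it.
    keep = (distinct_so_far < k) | ((distinct_so_far == k) & first_seen)
    return nums[keep]


nums = pd.Series([1, 3, 3, 5, 5, 6, 1, 4, 8, 9, 34, 66, 124])
result = first_k_distinct(nums, 5)
print(result.tolist())  # [1, 3, 3, 5, 5, 6, 1, 4]
```

If the Series has fewer than k distinct values, the mask is True everywhere and the whole Series is returned.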
