Pad rows with no data as NaN in a pandas DataFrame - python

I have an np array of timestamps:
ts = np.arange(5)
In [34]: ts
Out[34]: array([0, 1, 2, 3, 4])
and I have a pandas DataFrame:
data = pd.DataFrame([10, 10, 10], index=[0, 3, 4])
In [33]: data
Out[33]:
0
0 10
3 10
4 10
The index of data is guaranteed to be a subset of ts. I want to generate the following data frame:
res:
0 10
1 nan
2 nan
3 10
4 10
So I want the index to be ts and the values to come from data, with NaN for any timestamp that doesn't exist in data. How can I do this?

You are looking for the reindex function.
For example:
data.reindex(index=ts)
Output:
0
0 10
1 NaN
2 NaN
3 10
4 10
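For completeness, here is a minimal runnable version of the whole example; note that reindex also accepts a fill_value argument if you want a sentinel other than NaN:
import numpy as np
import pandas as pd

ts = np.arange(5)
data = pd.DataFrame([10, 10, 10], index=[0, 3, 4])

# Rows whose index is missing from data become NaN.
res = data.reindex(index=ts)

# Or pad with a chosen sentinel instead of NaN:
res_zero = data.reindex(index=ts, fill_value=0)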

Related

Find row with nan value and delete it

I have a dataframe with three columns: id, horstid, and date. The date column has one NaN value. The code below does what I want with pandas; I want the same with numpy.
First I want to transform my dataframe to a numpy array. Then I want to find all rows where the date is NaN, print them, and remove them. But how can I do this in numpy?
This is my dataframe
id horstid date
0 1 11 2008-09-24
1 2 22 NaN
2 3 33 2008-09-18
3 4 33 2008-10-24
This is my code. It works fine, but it uses pandas.
d = {'id': [1, 2, 3, 4], 'horstid': [11, 22, 33, 33], 'date': ['2008-09-24', np.nan, '2008-09-18', '2008-10-24']}
df = pd.DataFrame(data=d)
df['date'].isna()
[OUT]
0 False
1 True
2 False
3 False
df.drop(df.index[df['date'].isna() == True])
[OUT]
id horstid date
0 1 11 2008-09-24
2 3 33 2008-09-18
3 4 33 2008-10-24
What I want is the above code without pandas but with numpy.
npArray = df.to_numpy()
date = npArray[:, 2].astype(np.datetime64)
[OUT]
ValueError: Cannot create a NumPy datetime other than NaT with generic units
Here's a solution based on NumPy and pure Python:
df = pd.DataFrame.from_dict(dict(horstid=[11, 22, 33, 33], id=[1, 2, 3, 4], date=['2008-09-24', np.nan, '2008-09-18', '2008-10-24']))
a = df.values
# Keep rows whose date is not a float (the dates are strings; NaN is a float).
index = [not isinstance(x, float) for x in a[:, 2]]
print(a[index, :])
[[11 1 '2008-09-24']
[33 3 '2008-09-18']
[33 4 '2008-10-24']]
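A slightly more robust variant is to mask on NaN itself rather than on the float type, using the fact that NaN is the only value not equal to itself. A sketch built on the question's data:
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'horstid': [11, 22, 33, 33],
                   'date': ['2008-09-24', np.nan, '2008-09-18', '2008-10-24']})
a = df.to_numpy()

# NaN != NaN, so the mask is True exactly for the valid (non-NaN) dates.
mask = np.array([x == x for x in a[:, 2]])
print(a[mask, :])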

sum values in different rows and columns dataframe python

My Data Frame
A B C D
2 3 4 5
1 4 5 6
5 6 7 8
How do I add values from different rows and different columns?
Column A row 2 with column B row 1
Column A row 3 with column B row 2
and similarly for all rows.
If you only need to do this with two columns (and I understood your question correctly), I think you can use the shift function.
Your data frame (pandas?) is something like:
d = {'A': [2, 1, 5], 'B': [3, 4, 6], 'C': [4, 5, 7], 'D':[5, 6, 8]}
df = pd.DataFrame(data=d)
So it's possible to create a new Series with the B column shifted:
df2 = df['B'].shift(1)
which gives:
0 NaN
1 3.0
2 4.0
Name: B, dtype: float64
and then merge this new data with the original df and, for example, sum the values:
df = df.join(df2, rsuffix='shift')
df['out'] = df['A'] + df['Bshift']
The final output is in out column:
A B C D Bshift out
0 2 3 4 5 NaN NaN
1 1 4 5 6 3.0 4.0
2 5 6 7 8 4.0 9.0
But this is only my interpretation; I'm not sure I understood your question!
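If all you need is the summed column, the join can be skipped entirely; a one-step sketch of the same idea:
# Shift B down by one row and add to A; the first row has no
# shifted partner, so the result there is NaN.
df['out'] = df['A'] + df['B'].shift(1)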

pandas assign value based on mean

Let's say I have a dataframe column. I want to create a new column where the value for a given observation is 1 if the corresponding value in the old column is above average, and 0 if it is at or below average.
What's the fastest way of doing this?
Say you have the following DataFrame:
df = pd.DataFrame({'A': [1, 4, 6, 2, 8, 3, 7, 1, 5]})
df['A'].mean()
Out: 4.111111111111111
Comparison against the mean will get you a boolean vector. You can cast that to integer:
df['B'] = (df['A'] > df['A'].mean()).astype(int)
or use np.where:
df['B'] = np.where(df['A'] > df['A'].mean(), 1, 0)
df
Out:
A B
0 1 0
1 4 0
2 6 1
3 2 0
4 8 1
5 3 0
6 7 1
7 1 0
8 5 1
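A small stylistic variant: computing the mean once into a named variable keeps the threshold explicit (a sketch of the same idea):
mean_a = df['A'].mean()  # the threshold, computed once
df['B'] = (df['A'] > mean_a).astype(int)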

Adding a new column to a pandas dataframe with different number of rows

I'm not sure if pandas is made to do this... But I'd like to add a new column to my dataframe with more rows than the existing columns have.
Minimal example:
import pandas as pd
df = pd.DataFrame()
df['a'] = [0, 1]
df['b'] = [0, 1, 2]
Could someone please explain if this is possible? I'm using a dataframe to store long lists of data and they all have different lengths that I don't necessarily know at the start.
Absolutely possible. Use pd.concat.
Demonstration
df1 = pd.DataFrame([[1, 2, 3]])
df2 = pd.DataFrame([[4, 5, 6, 7, 8, 9]])
pd.concat([df1, df2])
df1 looks like
0 1 2
0 1 2 3
df2 looks like
0 1 2 3 4 5
0 4 5 6 7 8 9
pd.concat looks like
0 1 2 3 4 5
0 1 2 3 NaN NaN NaN
0 4 5 6 7.0 8.0 9.0
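For the original use case (columns of different lengths), the same idea works column-wise: build Series and concatenate along axis=1, letting pandas pad the shorter ones with NaN. A sketch based on the question's data:
import pandas as pd

a = pd.Series([0, 1], name='a')
b = pd.Series([0, 1, 2], name='b')

# axis=1 aligns on the index; 'a' has no row 2, so it is padded with NaN.
df = pd.concat([a, b], axis=1)
#      a  b
# 0  0.0  0
# 1  1.0  1
# 2  NaN  2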

pandas rearrange dataframe to have all values in ascending order per every column independently

The title should say it all. I want to turn this DataFrame:
A NaN 4 3
B 2 1 4
C 3 4 2
D 4 2 8
into this DataFrame:
A 2 1 2
B 3 2 3
C 4 4 4
D NaN 4 8
And I want to do it in a nice manner. The ugly solution would be to sort each column separately and assemble a new DataFrame.
To test, use:
d = {'one': [None, 2, 3, 4],
     'two': [4, 1, 4, 2],
     'three': [3, 4, 6, 8]}
df = pd.DataFrame(d, index=list('ABCD'))
The desired sort ignores the index values, so the operation appears to be more like a NumPy operation than a Pandas one:
import pandas as pd
d = {'one': [None, 2, 3, 4],
     'two': [4, 1, 4, 2],
     'three': [3, 4, 6, 8]}
df = pd.DataFrame(d, index=list('ABCD'))
# one three two
# A NaN 3 4
# B 2 4 1
# C 3 6 4
# D 4 8 2
arr = df.values
arr.sort(axis=0)
df = pd.DataFrame(arr, index=df.index, columns=df.columns)
print(df)
yields
one three two
A 2 3 1
B 3 4 2
C 4 6 4
D NaN 8 4
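The same result without mutating an intermediate array: np.sort returns a sorted copy and also places NaN last in each column (assuming the same df as above):
import numpy as np

res = pd.DataFrame(np.sort(df.to_numpy(), axis=0),
                   index=df.index, columns=df.columns)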
