Using pandas.DataFrame.at() in a for loop - python

pandas DataFrame I'm starting with:
pandas DataFrame I'm trying to build:
I'm very new to computer science, so I wasn't sure how to word my question without providing images. Basically, I want to build a one-row pandas DataFrame whose columns are named -3 to 3, where each value is the entry of the second column with the largest absolute value among the rows of the first DataFrame whose first-column value matches that column name.
I also have the same data in a list as shown here:
Here is what I've tried but I keep getting an error:

Here's the solution; looping over the dataframe to get what you want seems like overkill:
import pandas as pd

df = pd.DataFrame([[-1,1],[-2,2],[-2,1],[-2,2],[-1,6],[-1,2],[-1,1],
                   [1,-2],[2,-2],[1,-2],[2,-1],[6,-1],[2,-1],[1,-1]])
y = df.groupby(0)[1].max()  # per-group maximum
z = df.groupby(0)[1].min()  # per-group minimum
x = dict()
for i in range(-3, 4):
    try:
        # if even the maximum is negative, the minimum has the larger absolute value
        if y[i] < 0:
            x[i] = z[i]
        else:
            x[i] = y[i]
    except KeyError:
        x[i] = 0  # no rows with this key
x = pd.DataFrame(x, index=[0])
which gives the result
   -3 -2 -1  0  1  2  3
0   0  2  6  0 -2 -2  0
This results in a dataframe that also has a column for 0; that column should be easy to get rid of at any point.
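If you'd rather avoid the try/except, the same result can be produced with groupby plus reindex, letting fill_value handle the missing keys. A sketch on the same data, assuming the goal stated in the question (keep the signed value with the largest absolute value per group, 0 for absent keys):

```python
import pandas as pd

df = pd.DataFrame([[-1,1],[-2,2],[-2,1],[-2,2],[-1,6],[-1,2],[-1,1],
                   [1,-2],[2,-2],[1,-2],[2,-1],[6,-1],[2,-1],[1,-1]])

# for each key in column 0, keep the column-1 value with the largest absolute value
s = df.groupby(0)[1].agg(lambda g: g.loc[g.abs().idxmax()])

# reindex onto the full -3..3 range; keys with no rows become 0
result = s.reindex(range(-3, 4), fill_value=0).to_frame().T
```

Keys outside the requested range (such as 6 in this data) are dropped by the reindex automatically.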

Related

How to keep n characters of each row of a pd df, where n differs by row?

I have created a df, one column of which contains string values that I want to trim by a different integer amount on each row.
Ex.:
From:

length  String
-3      adcdef
-5      ghijkl

I want to get:

length  String
-3      def
-5      hijkl
What I tried is the following:
for i in range(len(df.index)):
    val = df['String'].iloc[i]
    n = df['length'].iloc[i]
    df['String'].iloc[i] = val[n:]
However, I keep getting this warning:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
Any ideas on how I can avoid getting it?
Thanks!
Try with apply:
df["String"] = df.apply(lambda x: x["String"][x["length"]:], axis=1)
>>> df
   length String
0      -3    def
1      -5  hijkl
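If you would rather not reach for apply, a plain list comprehension over the two columns does the same per-row trimming and also sidesteps the SettingWithCopyWarning, since it assigns a whole new column at once. A sketch assuming the columns are named String and length as in the question:

```python
import pandas as pd

df = pd.DataFrame({"length": [-3, -5], "String": ["adcdef", "ghijkl"]})

# slice each string by its own row's length value, then assign the whole column
df["String"] = [s[n:] for s, n in zip(df["String"], df["length"])]
```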

Select a range in Pandas based on the same value in 2 columns

I am trying to select a range within a data frame based on its values. I have the logic for what I am trying to implement in Excel and I just need to translate it into a Python script. I need to return a range of rows starting at the row where a value appears in column A and ending at the row where column B has that same value. Example below:
index  A        B        output range
0      dsdfsdf
1
2
3
4      quwfi
5               dsdfsdf  0:5
6               quwfi    4:6
One thing to note: the value in column B will always appear lower down the list than the matching value in column A.
So far I have tried to just grab the index of Column A and put it on the row in output range for Column B using,
df['output range'] = np.where(df['B'] != "", (df.index[df['A'] == df.at[df['B']].value]))
This gives me a ValueError: Invalid call for scalar access (getting)!
Removing the np.where portion of it does not change the result
This should give you the required behavior:
import pandas as pd

df = pd.DataFrame({'A': ['dsdfsdf', '', '', '', 'quwfi', '', ''],
                   'B': ['', '', '', '', '', 'dsdfsdf', 'quwfi']})

def get_range(x):
    if x != '':
        first_index = df[df['A'] == x].index.values[0]
        current_index = df[df['B'] == x].index.values[0]
        return f"{first_index}:{current_index}"
    return ''

df['output range'] = df['B'].apply(get_range)
df
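On a large frame, scanning the whole DataFrame inside apply for every row gets expensive. A dictionary built once from column A avoids the repeated lookups; a sketch under the same assumption as above (each non-empty value in B appears exactly once in A):

```python
import pandas as pd

df = pd.DataFrame({'A': ['dsdfsdf', '', '', '', 'quwfi', '', ''],
                   'B': ['', '', '', '', '', 'dsdfsdf', 'quwfi']})

# map each non-empty value in A to its row position, built once
a_pos = {v: i for i, v in enumerate(df['A']) if v != ''}

# look up the start row for every non-empty B value
df['output range'] = [f"{a_pos[v]}:{i}" if v != '' else ''
                      for i, v in enumerate(df['B'])]
```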

Dask item assignment. Cannot use loc for item assignment

I have a folder of parquet files that I can't fit in memory, so I am using dask to perform the data-cleansing operations. I have a function where I want to perform item assignment, but I can't find any solutions online that apply to this particular function. Below is the function that works in pandas. How do I get the same results in a dask dataframe? I thought delayed might help, but none of the solutions I've tried to write have been working.
def item_assignment(df):
    new_col = np.bitwise_and(df['OtherCol'], 0b110)
    df['NewCol'] = 0
    df.loc[new_col == 0b010, 'NewCol'] = 1
    df.loc[new_col == 0b100, 'NewCol'] = -1
    return df
TypeError: '_LocIndexer' object does not support item assignment
You can replace your loc assignments with dask.dataframe.Series.mask:
df['NewCol'] = 0
df['NewCol'] = df['NewCol'].mask(new_col == 0b010, 1)
df['NewCol'] = df['NewCol'].mask(new_col == 0b100, -1)
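Series.mask has the same semantics in plain pandas, so the rewrite can be sanity-checked without dask at all. A pandas-only sketch, using the sample data from the other answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"OtherCol": [0b010, 0b110, 0b100, 0b110, 0b100, 0b010]})

new_col = np.bitwise_and(df["OtherCol"], 0b110)
df["NewCol"] = 0
df["NewCol"] = df["NewCol"].mask(new_col == 0b010, 1)   # bit pattern 010 -> 1
df["NewCol"] = df["NewCol"].mask(new_col == 0b100, -1)  # bit pattern 100 -> -1
```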
Alternatively, you can use map_partitions, which lets you apply raw pandas functionality per partition. I.e.
ddf.map_partitions(item_assignment)
this operates on the individual pandas constituent dataframes of the dask dataframe
df = pd.DataFrame({"OtherCol":[0b010, 0b110, 0b100, 0b110, 0b100, 0b010]})
ddf = dd.from_pandas(df, npartitions=2)
ddf.map_partitions(item_assignment).compute()
And we see the result as expected:
   OtherCol  NewCol
0         2       1
1         6       0
2         4      -1
3         6       0
4         4      -1
5         2       1

Python, Pandas: Using isin()-like functionality but do not ignore duplicates in input list

I am trying to filter an input dataframe (df_in) against a list of indices. The indices list contains duplicates and I want my output df_out to contain all occurrences of a particular index. As expected, isin() gives me only a single entry for every index.
How do I try and not ignore duplicates and get output similar to df_out_desired?
import pandas as pd
import numpy as np
df_in = pd.DataFrame(index=np.arange(4), data={'A':[1,2,3,4],'B':[10,20,30,40]})
indices_needed_list = pd.Series([1,2,3,3,3])
# In the output df, I do not particularly care about the 'index' from the df_in
df_out = df_in[df_in.index.isin(indices_needed_list)].reset_index()
# With isin, as expected, I only get a single entry for each occurrence of an index in indices_needed_list
# What I am trying to get is an output df that has as many rows and occurrences of the df_in index as in indices_needed_list
temp = df_out[df_out['index'] == 3]
# This is what I would like to try and get
df_out_desired = pd.concat([df_out, df_out[df_out['index']==3], df_out[df_out['index']==3]])
Thanks!
Check reindex
df_out_desired = df_in.reindex(indices_needed_list)
df_out_desired
Out[177]:
   A   B
1  2  20
2  3  30
3  4  40
3  4  40
3  4  40
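loc with a list of labels behaves the same way here: it repeats a row once per occurrence of its label, so it is a drop-in alternative to reindex for this case. A quick check on the question's data:

```python
import numpy as np
import pandas as pd

df_in = pd.DataFrame(index=np.arange(4), data={'A': [1, 2, 3, 4], 'B': [10, 20, 30, 40]})
indices_needed_list = pd.Series([1, 2, 3, 3, 3])

# loc repeats each row once per occurrence of its label in the list
df_out = df_in.loc[indices_needed_list]
```

One difference worth noting: if a label is missing from df_in, reindex fills the row with NaN, while loc raises a KeyError, which can be the safer behaviour when every index is expected to exist.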

Fill a dataframe column with empty arrays

I need to fill a pandas dataframe column with empty numpy arrays. I mean that every row has to contain an empty array. Something like
df['ColumnName'] = np.empty(0,dtype=float)
but this doesn't work, because pandas tries to use each value of the array and assign one value per row.
I tried then
for k in range(len(df)):
    df['ColumnName'].iloc[k] = np.empty(0, dtype=float)
but still no luck. Any advice ?
You can repeat the np.empty array for the number of rows and then assign the resulting list to the column. Since the value isn't a scalar, it can't be directly assigned the way df['x'] = some_scalar is.
df = pd.DataFrame({'a':[0,1,2]})
df['c'] = [np.empty(0,dtype=float)]*len(df)
Output:
a c
0 0 []
1 1 []
2 2 []
You can also use a simple comprehension
df = pd.DataFrame({'a':[0,1,2]})
df['c'] = [[] for i in range(len(df))]
Output
a c
0 0 []
1 1 []
2 2 []
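One caveat worth knowing when choosing between the two: the [value] * len(df) approach stores the same array object in every row, so mutating the array through one row would affect all of them, while the comprehension creates a distinct object per row. A quick sketch of the difference:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2]})

# list multiplication: every row references the SAME empty array object
df['c'] = [np.empty(0, dtype=float)] * len(df)

# comprehension: each row gets its OWN empty array object
df['d'] = [np.empty(0, dtype=float) for _ in range(len(df))]
```

For empty arrays that are never resized in place this rarely matters, but it is an easy trap once the arrays start being filled.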
