Change the values in a column using Pandas - python

I have this csv
name,age,count
will,12,2
joe,22,4
tim,34,10
I want to reset all the values in the count column to 0
name,age,count
will,12,0
joe,22,0
tim,34,0
To change the values I tried:
df = pd.read_csv("./users.csv")
df["count"].replace({ to_replace: value }, inplace=True)
Is there a way to do it without mentioning to_replace, by setting the value directly?

In your case, just assign the value to the column:
df["count"] = 0
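For completeness, here is a minimal end-to-end sketch of that assignment. It assumes the CSV from the question, inlined through io.StringIO so it runs without users.csv; the names csv_text and reset_csv are mine, not from the question.

```python
import io

import pandas as pd

# Inline copy of the CSV from the question, so the sketch runs without users.csv.
csv_text = "name,age,count\nwill,12,2\njoe,22,4\ntim,34,10\n"
df = pd.read_csv(io.StringIO(csv_text))

# Assigning a scalar broadcasts it to every row of the column.
df["count"] = 0

# With no path given, to_csv returns the CSV text; pass a path to write a file instead.
reset_csv = df.to_csv(index=False)
print(reset_csv)
```

No replace() call is needed here because every value in the column gets the same new value.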

Related

Writing a loop with an integer in python

I have a dataframe as such:
data = [[0xD8E3ED, 2043441], [0xF7F4EB, 912788],[0x000000,6169]]
df = pd.DataFrame(data, columns=['c_code', 'occurence'])
I am attempting to loop through c_code to get an integer value. The following code works to obtain the integer
hex_val = '0xFF9B3B'
print(int(hex_val, 0))
16751419
But when I try to loop through the column I run into an issue. I currently have this running but am just overwriting every value.
for i in range(len(df)):
    df['value'] = int((df['c_code'].iloc[i]), 0)
The ideal output would be a df with a value column that reflects the integer value of each c_code. With the code above, the value is the same for all rows. I believe that I need to append rows, but I am unsure how to do that.
You can convert the c_code column to integers and assign the result to a new column:
import pandas as pd
data = [['0xD8E3ED', 2043441], ['0xF7F4EB', 912788],['0x000000',6169]]
df = pd.DataFrame(data, columns=['c_code', 'occurence'])
df['value'] = df['c_code'].apply(int, base=16)
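Note that I had to write the hexadecimal numbers as strings; otherwise Python evaluates the literals to int before pandas ever sees them.
I get this result:
     c_code  occurence     value
0  0xD8E3ED    2043441  14214125
1  0xF7F4EB     912788  16250091
2  0x000000       6169         0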
You are assigning a new value to the entire column at each step in the loop:
df["value"] = ...
To set a single row, use df.loc[i, "value"] = ... instead.
However, you shouldn't have to loop through each value in pandas. int() cannot take a whole Series, so apply it element-wise (this assumes c_code holds strings such as '0xFF9B3B'):
df["value"] = df["c_code"].apply(int, base=0)
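Here is a small runnable sketch comparing the two approaches from the answers above. It assumes c_code is stored as strings; the value_loop column is a name I made up so the two results can be compared.

```python
import pandas as pd

# Hex codes stored as strings, as in the first answer.
data = [["0xD8E3ED", 2043441], ["0xF7F4EB", 912788], ["0x000000", 6169]]
df = pd.DataFrame(data, columns=["c_code", "occurence"])

# Element-wise conversion; base=0 lets int() infer the base from the "0x" prefix.
df["value"] = df["c_code"].apply(int, base=0)

# Row-by-row equivalent, shown only for comparison with the original loop.
# df.loc[row, column] writes one cell instead of overwriting the whole column.
df["value_loop"] = 0
for i in df.index:
    df.loc[i, "value_loop"] = int(df.loc[i, "c_code"], 0)

print(df)
```

Both columns end up identical. The apply version is shorter and avoids the per-row indexing overhead.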

How to check pandas column names and then append to row data efficiently?

I have a dataframe with several columns, some of which have names that match the keys in a dictionary. I want to append each dictionary value to the non-null values of the column whose name matches its key. Hopefully that isn't too confusing.
example:
realms = {}
realms['email'] = '<email>'
realms['android'] = '<androidID>'
df = pd.DataFrame()
df['email'] = ['foo#gmail.com', '', 'foo#yahoo.com']
df['android'] = [1234567, None, 55533321]
How could I append '<email>' to 'foo#gmail.com' and 'foo#yahoo.com' without also appending it to the empty string or null value?
I'm trying to do this without using iteritems(), as I have about 200,000 records to apply this logic to.
The expected output would be like 'foo#gmail.com<email>',,'foo#yahoo.com<email>'
for column in df.columns:
    df[column] = df[column].astype(str) + realms[column]
>>> df
                  email              android
0  foo#gmail.com<email>   1234567<androidID>
1  foo#yahoo.com<email>  55533321<androidID>
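Note that the loop above appends the suffix to every value, including empty strings and nulls, and it raises a KeyError for any column that is missing from realms. A possible sketch that skips those cases with a boolean mask is below. Storing the android IDs as strings is my assumption (it keeps the missing value from turning them into floats), and has_value is just an illustrative name.

```python
import pandas as pd

realms = {"email": "<email>", "android": "<androidID>"}

df = pd.DataFrame({
    "email": ["foo#gmail.com", "", "foo#yahoo.com"],
    # Assumption: IDs kept as strings so the missing value does not force a float column.
    "android": ["1234567", None, "55533321"],
})

for column, suffix in realms.items():
    if column not in df.columns:
        continue  # no column with this name, nothing to do
    # True only for rows that hold a non-null, non-empty value.
    has_value = df[column].notna() & (df[column] != "")
    # Append the suffix to those rows only; all other rows are left untouched.
    df.loc[has_value, column] = df.loc[has_value, column].astype(str) + suffix

print(df)
```

The mask is computed once per column, so the work stays vectorized and should scale to a few hundred thousand rows without iterating over them.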

Pandas replace column values with a list

I have a dataframe df where some of the columns are strings and some are numeric. I am trying to convert all of them to numeric. So what I would like to do is something like this:
col = df.ix[:,i]
le = preprocessing.LabelEncoder()
le.fit(col)
newCol = le.transform(col)
df.ix[:,i] = newCol
but this does not work. Basically, my question is: how do I delete a column from a data frame and then create a new column with the same name, when I know only the column's index and not its name?
This should do it for you:
# Find the name of the column by index
n = df.columns[1]
# Drop that column
df.drop(n, axis=1, inplace=True)
# Put whatever series you want in its place
df[n] = newCol
...where 1 can be whatever the column's position is; axis=1 (drop a column rather than a row) should not change. Note that the re-added column ends up as the last column of the dataframe.
That answers your question very literally, since you asked to drop a column and then add one back in. In reality there is no need to drop the column: you can simply overwrite it with newCol.
newcol = [..,..,.....]
df['colname'] = newcol
This will keep the colname intact while replacing its contents with newcol.
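Putting the pieces together, here is a hedged sketch of replacing a column when only its position is known. Two things in it are my own assumptions: the tiny example frame is hypothetical, and pd.factorize(sort=True) stands in for LabelEncoder so the sketch needs nothing beyond pandas. Also note that .ix has since been removed from pandas; .iloc is the positional indexer to use instead.

```python
import pandas as pd

# Hypothetical data: one string column and one numeric column.
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo"],
    "sales": [10, 20, 30],
})

i = 0  # position of the column to encode

# Stand-in for LabelEncoder: integer codes assigned in sorted label order.
codes, labels = pd.factorize(df.iloc[:, i], sort=True)

# Look up the name from the position, then overwrite the column in place.
name = df.columns[i]
df[name] = codes

print(df)
```

Because the column is overwritten rather than dropped and re-added, it keeps its original position in the frame.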

Add values to bottom of DataFrame automatically with Pandas

I'm initializing a DataFrame:
columns = ['Thing','Time']
df_new = pd.DataFrame(columns=columns)
and then writing values to it like this:
counter = 0
for t in df.Thing.unique():
    df_temp = df[df['Thing'] == t] #filtering the df
    df_new.loc[counter,'Thing'] = t #writing the filter value to df_new
    df_new.loc[counter,'Time'] = df_temp['delta'].sum(axis=0) #summing and adding that value to the df_new
    counter += 1 #increment the row index
Is there a better way to add new values to the dataframe each time without explicitly incrementing the row index with 'counter'?
If I'm interpreting this correctly, I think this can be done in one line:
newDf = df.groupby('Thing')['delta'].sum().reset_index()
By grouping by 'Thing', you have the various "t-filters" from your for-loop. We then apply a sum() to 'delta', but only within the various "t-filtered" groups. At this point, the dataframe has the various values of "t" as the indices, and the sums of the "t-filtered deltas" as a corresponding column. To get to your desired output, we then bump the "t's" into their own column via reset_index().
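A quick runnable illustration of that one-liner, using made-up data (the values in Thing and delta are hypothetical). The rename step is my addition so the result has the same column names as df_new in the question.

```python
import pandas as pd

# Hypothetical input with repeated 'Thing' values.
df = pd.DataFrame({
    "Thing": ["a", "b", "a", "c", "b"],
    "delta": [1.0, 2.0, 3.0, 4.0, 5.0],
})

df_new = (
    df.groupby("Thing")["delta"].sum()   # one summed delta per distinct Thing
    .reset_index()                        # move Thing from the index back to a column
    .rename(columns={"delta": "Time"})    # match the column name used in the question
)

print(df_new)
```

The row index of df_new is generated by reset_index(), so no manual counter is needed.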

Read in data and set it to the index of a DataFrame with Pandas

I want to iterate through the rows of a DataFrame and assign values to a new DataFrame. I've accomplished that task indirectly like this:
# first I read the data from df1 and assign it to df2 if something happens
counter = 0 #line1
for index, row in df1.iterrows(): #line2
    value = row['df1_col'] #line3
    value2 = row['df1_col2'] #line4
    try:
        ...  # try unzipping a file (pseudo code)
        df2.loc[counter,'df2_col'] = value #line5
        counter += 1 #line6
    except Exception:
        print("Error, could not unzip {}") #line7
# then I set the desired index for df2
df2 = df2.set_index(['df2_col']) #line8
Is there a way to assign the values to the index of df2 directly in line5? Sorry my original question was unclear. I'm creating the index based on whether that something happens.
There are a bunch of ways to do this. According to your code, all you've done is create an empty df2 dataframe whose index holds the values from df1.df1_col. You could do this directly like this:
df2 = pd.DataFrame([], df1.df1_col)
#                  ^         ^
#                  |         |
# specifies no data, yet     |
#                    defines the index
If you are concerned about having to filter df1 then you can do:
# cond is some boolean mask representing a condition to filter on.
# I'll make one up for you.
cond = df1.df1_col > 10
df2 = pd.DataFrame([], df1.loc[cond, 'df1_col'])
No need to iterate. If df2 already has as many rows as df1, you can do:
df2.index = df1['df1_col']
If you really want to iterate, save the values to a list and set the index from it afterwards.
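If the loop has to stay because each row involves a step that can fail, the list approach might look like the sketch below. The try_unzip helper, the sample file names, and the "negative value means failure" rule are all hypothetical stand-ins for the real unzip step.

```python
import pandas as pd

# Hypothetical input; df1_col2 < 0 simulates a file that cannot be unzipped.
df1 = pd.DataFrame({
    "df1_col": ["a.zip", "b.zip", "c.zip"],
    "df1_col2": [1, -1, 3],
})


def try_unzip(row):
    """Hypothetical stand-in for the unzip step: raises on negative values."""
    if row["df1_col2"] < 0:
        raise ValueError("could not unzip")


succeeded = []
for _, row in df1.iterrows():
    try:
        try_unzip(row)
    except ValueError:
        print("Error, could not unzip {}".format(row["df1_col"]))
    else:
        # Only rows whose step succeeded contribute to the index.
        succeeded.append(row["df1_col"])

# Build df2 once, with the collected values as its index.
df2 = pd.DataFrame(index=pd.Index(succeeded, name="df2_col"))
print(df2.index.tolist())
```

Appending to a plain list and building the frame once at the end avoids both the counter and the repeated df2.loc enlargement, which copies data on every new row.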
