Writing a loop with an integer in python - python

I have a dataframe as such:
data = [[0xD8E3ED, 2043441], [0xF7F4EB, 912788],[0x000000,6169]]
df = pd.DataFrame(data, columns=['c_code', 'occurence'])
I am attempting to loop through c_code to get an integer value. The following code works to obtain the integer
hex_val = '0xFF9B3B'
print(int(hex_val, 0))
16751419
But when I try to loop through the column I run into an issue. I currently have this running but am just overwriting every value.
for i in range(len(df)):
    df['value'] = int((df['c_code'].iloc[i]), 0)
Ideal output would be a df with a value column that reflects the value of the c_code. The image below shows the desired format but notice that the value is the same for all rows. I believe that I need to append rows but I am unsure of how to do that

I believe that you can modify the type of your column c_code and assign this to a new column.
import pandas as pd
data = [['0xD8E3ED', 2043441], ['0xF7F4EB', 912788],['0x000000',6169]]
df = pd.DataFrame(data, columns=['c_code', 'occurence'])
df['value'] = df['c_code'].apply(int, base=16)
Also, I had to write the hexadecimal numbers as strings; otherwise Python evaluates the literals to integers before pandas ever sees them.
I get this result:
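A self-contained sketch of the conversion above, with the integers it produces shown in a comment (computed from the three sample codes in the question):

```python
import pandas as pd

data = [['0xD8E3ED', 2043441], ['0xF7F4EB', 912788], ['0x000000', 6169]]
df = pd.DataFrame(data, columns=['c_code', 'occurence'])

# Series.apply forwards extra keyword arguments to the function,
# so each string is parsed as int(s, base=16); the "0x" prefix is allowed.
df['value'] = df['c_code'].apply(int, base=16)

print(df['value'].tolist())  # [14214125, 16250091, 0]
```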

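You are assigning the entire column a new value at each step of the loop:
df["value"] = ...
To set a single row, change it to df.loc[i, "value"] = ... (the chained form df["value"][i] = ... is unreliable and triggers a warning in pandas).
However, you shouldn't have to loop through each value in pandas. int() cannot take a whole Series, so apply it element-wise instead (assuming c_code holds strings such as '0xD8E3ED'):
df["value"] = df["c_code"].apply(int, base=0)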

Related

How to add a value inside an numpy array to a python dictionary

I'm trying to loop through the columns of a pandas dataframe (which consist of 1's and 0's), group by each one and sum another column, then add the column name to an empty dictionary as the key and the summed value as the value. But my current code stores an array as the value instead of the actual number. Here is some sample code.
import pandas as pd
sample_dict = {'flag1':[0,1,1,1,1,0],
               'flag2':[1,1,1,0,0,1],
               'flag3':[0,0,0,0,0,1],
               'flag4':[1,1,1,1,0,0],
               'flag5':[1,0,1,0,1,0],
               'dollars':[100,200,300,400,500,600]}
sample_df = pd.DataFrame(sample_dict)
ecols = sample_df.columns[:5]
rate = .46
empty_dict = {}
for i in ecols:
    df = sample_df[sample_df[i] == 1]
    yield1 = df.groupby(i)['dollars'].sum().values*rate
    empty_dict[i] = yield1
empty_dict
That code yields the following output:
Out[223]:
{'flag1': array([644.]),
'flag2': array([552.]),
'flag3': array([276.]),
'flag4': array([460.]),
'flag5': array([414.])}
I would just like to have the actual integer as the value and not the array.
You consistently get an array with a single element, so just take its first element (if it has one):
...
    empty_dict[i] = yield1[0] if len(yield1) >= 1 else np.nan  # requires import numpy as np
...
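As an alternative sketch (not the approach in the answer above), the groupby can be skipped entirely: summing the filtered dollars column already returns a scalar. Note one difference in behavior: a flag with no matching rows gives 0.0 here rather than NaN. The names flag_cols and yields are illustrative.

```python
import pandas as pd

sample_df = pd.DataFrame({
    'flag1': [0, 1, 1, 1, 1, 0],
    'flag2': [1, 1, 1, 0, 0, 1],
    'flag3': [0, 0, 0, 0, 0, 1],
    'flag4': [1, 1, 1, 1, 0, 0],
    'flag5': [1, 0, 1, 0, 1, 0],
    'dollars': [100, 200, 300, 400, 500, 600],
})
rate = 0.46
flag_cols = sample_df.columns[:5]

# Select the dollars of the rows where the flag is 1 and sum them.
# Series.sum() returns a scalar, so there is no one-element array to unwrap.
yields = {
    col: float(sample_df.loc[sample_df[col] == 1, 'dollars'].sum() * rate)
    for col in flag_cols
}
print(yields)  # values are approximately 644, 552, 276, 460, 414
```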

Making a new column based on 2 other columns

I am trying to calculate a new column, labeled in the code as "Sulphide-S(calc)-C_%S". It can be calculated in one of two ways (see the code below). The two source columns won't both be filled for the same row, so I want to calculate from whichever column has data. Presently I have this, but the second equation overwrites the first.
df["Sulphide-S(calc)-C_%S"] = df["Total-S_%S"] - df["Sulphate-S(HCL Leachable)_%S"]
df.head()
df["Sulphide-S(calc)-C_%S"] = df["Total-S_%S"]- df["Sulphate-S_%S"]
df.head()
You can use the apply function in pandas to create a new column based on other columns; it returns a Series that you can add to your original dataframe. Without knowing what your dataframe looks like, the following code might not work directly until you replace the if condition with one that matches how empty cells appear in your data (pd.notna treats both None and NaN as missing).
def create_sulfide_col(row):
    # Use the HCL-leachable sulphate when it is present, otherwise the plain sulphate column
    if pd.notna(row["Sulphate-S(HCL Leachable)_%S"]):
        val = row["Total-S_%S"] - row["Sulphate-S(HCL Leachable)_%S"]
    else:
        val = row["Total-S_%S"] - row["Sulphate-S_%S"]
    return val
df["Sulphide-S(calc)-C_%S"] = df.apply(create_sulfide_col, axis='columns')
If I'm understanding what you're saying correctly, the second equation overwrites the first because they have the same column name. Try changing the column name in one or both of the "Sulphide-S(calc)-C_%S" to something else like "Sulphide-S(calc)-C_%S_A" and "Sulphide-S(calc)-C_%S_B":
df["Sulphide-S(calc)-C_%S_A"] = df["Total-S_%S"] - df["Sulphate-S(HCL Leachable)_%S"]
df.head()
df["Sulphide-S(calc)-C_%S_B"] = df["Total-S_%S"]- df["Sulphate-S_%S"]
df.head()
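If a single combined column is still wanted, one possible sketch is to merge the two sulphate columns with fillna before subtracting. The sample values below are hypothetical; only the column names come from the question, and it is assumed that empty cells are stored as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values; NaN marks an empty cell.
df = pd.DataFrame({
    "Total-S_%S": [2.0, 3.5, 1.2],
    "Sulphate-S(HCL Leachable)_%S": [0.5, np.nan, np.nan],
    "Sulphate-S_%S": [np.nan, 1.0, np.nan],
})

# Take the HCL-leachable value where present, else fall back to the other column.
sulphate = df["Sulphate-S(HCL Leachable)_%S"].fillna(df["Sulphate-S_%S"])

# Rows where both sulphate columns are empty stay NaN.
df["Sulphide-S(calc)-C_%S"] = df["Total-S_%S"] - sulphate
print(df["Sulphide-S(calc)-C_%S"].tolist())  # [1.5, 2.5, nan]
```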

I'm using Pandas in Python and wanted to know how to split a value in a column and search that value in the column

Normally when splitting a value which is a string, one would simply do:
string = 'aabbcc'
small = string[0:2]
And that's simply it. I thought it would be the same thing for a dataframe by doing:
df = df['Column'][Range][Length of Desired value]
df = df['Date'][0:4][2:4]
Note: Every string in the column has the same length, and all are integers stored as strings.
If I use the code above, the program just discards the first range and applies [2:4] as the row range, which is weird.
When doing this individually it works:
df2 = df['Column'][index][2:4]
So right now I had to make a loop that goes one by one and append it to a new Dataframe.
To do the operation element-wise, you can use apply:
df['Column'][0:4].apply(lambda x : x[2:4])
When you did df2 = df['Column'][0:4][2:4], you were doing the same as df2 = df['Column'][2:4].
You're taking rows 0 to 3 of the column and then rows 2 to 3 of that result; both slices select rows, and neither one slices the strings themselves.
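The same element-wise slice can also be written with the .str accessor, which avoids the lambda. A minimal sketch, assuming a Date column of equal-length strings (the values below are made up):

```python
import pandas as pd

# Hypothetical date-like strings; only the column name comes from the question.
df = pd.DataFrame({'Date': ['20190104', '20200215', '20210326', '20220437', '20230548']})

# .iloc[0:4] selects rows 0..3; .str[2:4] then slices each string.
part = df['Date'].iloc[0:4].str[2:4]
print(part.tolist())  # ['19', '20', '21', '22']

# The sliced values can be used to search the column as well.
matches = df[df['Date'].str[2:4] == '20']
print(matches['Date'].tolist())  # ['20200215']
```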

Store Value From df to Variable

I am trying to extract a value out of a dataframe and put it into a variable. Then later I will record that value into an Excel workbook.
First I run a SQL query and store into a df:
df = pd.read_sql(strSQL, conn)
I am looping through another list of items and looking them up in the df. They are connected by MMString in the df and MMConcat from the list of items I'm looping through.
dftemp = df.loc[df['MMString'] == MMConcat]
Category = dftemp['CategoryName'].item()
I get the following error at the last line of code above. ValueError: can only convert an array of size 1 to a Python scalar
In the debug console, when I run that last line of code but not store it to a variable, I get what looks like a string value. For example, 'Pickup Truck'.
How can I simply store the value that I'm looking up in the df to a variable?
Index by row and column with loc to return a series, then extract the first value via iat:
Category = df.loc[df['MMString'] == MMConcat, 'CategoryName'].iat[0]
Alternatively, get the first value from the NumPy array representation:
Category = df.loc[df['MMString'] == MMConcat, 'CategoryName'].values[0]
The docs aren't helpful, but pd.Series.item just calls np.ndarray.item and only works for a series with one value:
pd.Series([1]).item() # 1
pd.Series([1, 2]).item() # ValueError: can only convert an array of size 1
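Because the lookup runs inside a loop, it may help to handle the case where nothing matches, since .iat[0] raises IndexError on an empty result. A sketch with a hypothetical helper lookup_category and made-up data standing in for the SQL result:

```python
import pandas as pd

# Hypothetical stand-in for the query result; only the column names come from the question.
df = pd.DataFrame({
    'MMString': ['FordF150', 'FordF150', 'HondaCivic'],
    'CategoryName': ['Pickup Truck', 'Pickup Truck', 'Sedan'],
})

def lookup_category(frame, mm_concat, default=None):
    """Return the first CategoryName whose MMString equals mm_concat, else default."""
    matches = frame.loc[frame['MMString'] == mm_concat, 'CategoryName']
    return matches.iat[0] if not matches.empty else default

print(lookup_category(df, 'FordF150'))  # Pickup Truck (two rows match; the first is used)
print(lookup_category(df, 'Unknown'))   # None
```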

Reading values from Pandas dataframe rows into equations and entering result back into dataframe

I have a dataframe. For each row of the dataframe: I need to read values from two column indexes, pass these values to a set of equations, enter the result of each equation into its own column index in the same row, go to the next row and repeat.
After reading the responses to similar questions I tried:
import pandas as pd
DF = pd.read_csv("...")
Equation_1 = f(x, y)
Equation_2 = g(x, y)
for index, row in DF.iterrows():
    a = DF[m]
    b = DF[n]
    DF[p] = Equation_1(a, b)
    DF[q] = Equation_2(a, b)
Rather than iterating over DF, reading and entering new values for each row, this code iterates over DF and enters the same values for each row. I am not sure what I am doing wrong here.
Also, from what I have read it is actually faster to treat the DF as a NumPy array and perform the calculation over the entire array at once rather than iterating. Not sure how I would go about this.
Thanks.
Turns out that this is extremely easy. All that must be done is to define two variables and assign the desired columns to them. Then set the column to be filled equal to the equation containing the variables.
Pandas already knows that it must apply the equation to every row and return each value to its proper index. I didn't realize it would be this easy and was looking for more explicit code.
e.g.,
import pandas as pd
df = pd.read_csv("...") # df is a large 2D table
A = df[0]
B = df[1]
# define f(A, B) = .... using ordinary arithmetic on the columns
df[3] = f(A,B)
# If your equations are simple enough, do operations column-wise in Pandas:
import pandas as pd
test = pd.DataFrame([[1,2],[3,4],[5,6]])
test # Default column names are 0, 1
test[0] # This is column 0
test.iloc[:, 0] # The column at position 0, returned as a Series (icol was removed from pandas)
test.columns=(['S','Q']) # Column names are easier to use
test #Column names! Use them column-wise:
test['result'] = test.S**2 + test.Q
test # results stored in DataFrame
# For more complicated stuff, try apply with axis=1 (each row is passed to the function), as in Python pandas apply on more columns :
def toyfun(row):
    return row['S']-row['Q']**2
test['out2']=test[['S','Q']].apply(toyfun, axis=1)
# You can also define the column names when you generate the DataFrame:
test2 = pd.DataFrame([[1,2],[3,4],[5,6]],columns = (list('AB')))
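Tying this back to the question, here is a sketch of filling both result columns at once. The column names m, n, p, q are the question's placeholders used as literal labels, and equation_1 and equation_2 are hypothetical stand-ins for the real equations.

```python
import pandas as pd

# Hypothetical data; the real frame would come from pd.read_csv.
DF = pd.DataFrame({'m': [1.0, 2.0, 3.0], 'n': [4.0, 5.0, 6.0]})

def equation_1(x, y):
    # Hypothetical equation; arithmetic on Series is applied row by row.
    return x + 2 * y

def equation_2(x, y):
    # Hypothetical equation.
    return x * y

# Each call receives whole columns and returns a whole column,
# so no explicit loop over rows is needed.
DF['p'] = equation_1(DF['m'], DF['n'])
DF['q'] = equation_2(DF['m'], DF['n'])
print(DF)
```

Inside the original loop, DF[m] referred to the entire column rather than the current row, which is why every iteration wrote the same values.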
