I have a dataframe of information. One column is a rank. I just want to find the row with a rank of 2 and get the value of another column (called 'Name'). I can find the row and get the name, but it's not a plain text item that I can add to other text. It's an object.
How do I just get the name as text?
Code:
print "The name of the 2nd best is: " + groupDF.loc[(DF['Rank']==2),'Name']
This gives me the id of the row and the Name. I just want the Name
This is what I get:
4 The name of the 2nd best is: Hawthorne
Name: CleanName, dtype: object
I just can't figure out what to search on to get the answer. I get lots of other stuff but not this answer.
Thanks in advance.
In a little bit more detail:
I understand you have a data frame of the kind:
names = ["Almond","Hawthorn","Peach"]
groupDF = pd.DataFrame({'Rank':[1,2,3],'Name':names})
groupDF.loc[(groupDF['Rank']==2),'Name'] gives you a Series object, not a string. If the rank is unique, then either of the following two possibilities works:
groupDF.loc[(groupDF['Rank']==2),'Name'].item()
or
groupDF.loc[(groupDF['Rank']==2),'Name'].iloc[0]
result:
'Hawthorn'
If the rank is not unique, the second one still works and gives you the first hit, that is, the first element of the Series object created by the command.
You need to call the item() method of the resulting Series object. Note that item() raises a ValueError unless the Series contains exactly one element.
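The difference between the two calls can be sketched as follows. The frame below is made up for illustration, and rank 2 is duplicated on purpose (an assumption, not something from the question) to show how each call behaves when the match is not unique:

```python
import pandas as pd

# Hypothetical data: rank 2 appears twice on purpose.
groupDF = pd.DataFrame({'Rank': [1, 2, 2],
                        'Name': ['Almond', 'Hawthorn', 'Peach']})

# Unique match: .item() returns a plain Python str that can be concatenated.
unique_hit = groupDF.loc[groupDF['Rank'] == 1, 'Name']
name = unique_hit.item()
message = "The name of the best is: " + name

# Non-unique match: .iloc[0] still works and returns the first hit.
duplicate_hits = groupDF.loc[groupDF['Rank'] == 2, 'Name']
first_name = duplicate_hits.iloc[0]

# .item() refuses to choose when there is more than one element.
try:
    duplicate_hits.item()
    item_failed = False
except ValueError:
    item_failed = True
```

If the rank is supposed to be unique, item() doubles as a sanity check, since it fails loudly when that assumption does not hold.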
Related
I have a dataframe that might look like this:
print(df_selection_names)
name
0 fatty red meat, like prime rib
0 grilled
I have another dataframe, df_everything, with columns called name, suggestion and a lot of other columns. I want to find all the rows in df_everything with a name value matching the name values from df_selection_names so that I can print the values for each name and suggestion pair, e.g., "suggestion1 is suggested for name1", "suggestion2 is suggested for name2", etc.
I've tried several ways to get cell values from a dataframe and searching for values within a row including
# number of items in df_selection_names = df_selection_names.shape[0]
# so, in other words, we are looping through all the items the user selected
for i in range(df_selection_names.shape[0]):
    # get the cell value using the at accessor,
    # in the 'name' column and the row labelled i
    sel = df_selection_names.at[i, 'name']
    # this line finds the rows matching 'sel' in df_everything
    row = df_everything[df_everything['name'] == sel]
but everything I tried gives me ValueErrors. This post leads me to think I may be way off, but I'm feeling pretty confused about everything at this point!
https://pandas.pydata.org/docs/reference/api/pandas.Series.isin.html?highlight=isin#pandas.Series.isin
df_everything[df_everything['name'].isin(df_selection_names["name"])]
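As a rough sketch of how the isin filter could feed the printed sentences from the question: both frames below are hypothetical stand-ins (the suggestion values and the extra 'salad' row are invented), and the duplicate index labels mirror the printed df_selection_names.

```python
import pandas as pd

# Hypothetical stand-ins for the two frames in the question.
df_selection_names = pd.DataFrame(
    {'name': ['fatty red meat, like prime rib', 'grilled']},
    index=[0, 0],  # duplicate index labels, as in the printed frame
)
df_everything = pd.DataFrame({
    'name': ['grilled', 'salad', 'fatty red meat, like prime rib'],
    'suggestion': ['suggestion1', 'suggestion2', 'suggestion3'],
})

# isin compares values only, so the duplicate index labels do not matter.
matches = df_everything[df_everything['name'].isin(df_selection_names['name'])]

# Build one sentence per matching (name, suggestion) pair.
lines = [
    f"{suggestion} is suggested for {name}"
    for name, suggestion in zip(matches['name'], matches['suggestion'])
]
for line in lines:
    print(line)
```

Because isin works on the whole column at once, no loop over df_selection_names and no label-based lookup with at is needed.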
I would like to create a column in dataframe having name of an array. For example, the name of array is "customer" then name of the column should be "cust_prop" (initial 4 letters from array's name). Is there any way to get it?
Your question is a bit unclear, but presuming that you are asking how to turn the string "customer" into "cust_prop", that's easy enough:
Str = "customer"
NewStr = Str[0:4] + "_prop"
You might need to do some extra checking for shorter strings, but I don't know what behaviour you would want there.
If you mean something else, please post some code examples of what you have tried.
You didn't really describe from where you get an array name, so I'll just assume you have it in a variable:
array_name = 'customer'
To slice only the first four characters and use them:
new_col_name = f'{array_name[0:4]}_prop'
df[new_col_name] = 1
Here I "created" a new column in the existing dataframe df and set the entire column to 1. Instead, you can create a series with any values you want:
series = pd.Series(name=new_col_name, data=array_customer)
Here I created a series with the name as desired, and assumed you have an array_customer variable which holds the array
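Putting the pieces together, a minimal sketch might look like this. The frame df, the contents of array_customer, and the 'abc' example are all assumptions made up for illustration:

```python
import pandas as pd

array_name = 'customer'            # assumed to be available as a string
array_customer = [10, 20, 30]      # hypothetical array contents

# First four characters of the array's name plus the suffix.
new_col_name = f'{array_name[:4]}_prop'

df = pd.DataFrame({'id': [1, 2, 3]})   # hypothetical existing frame
df[new_col_name] = array_customer      # length must match the frame

# Slicing does not raise on strings shorter than four characters;
# it simply returns the whole string.
short_name = f"{'abc'[:4]}_prop"
```

Whether a shorter name such as 'abc' should be padded, rejected, or used as-is is a design choice the question leaves open.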
i'm a beginner using pandas to look at a csv. i'm using .iterrows() to see if a given record matches today's date, so far so good. however when calling (row.name) for a .csv with a column headed 'name' i get different output than if i rename the column and edit the (row."column-heading") to match. i can call it anything but "name" and get the right output. i tried (row.notthename) (row.fish) and (row.thisisodd) - which all worked fine - before coming here.
if the first column in birthdays.csv is "name" and i call print(row.name) it returns "2". if the first column is "notthename" and i call print(row.notthename) it returns the relevant name. what gives? i don't understand why arbitrarily renaming the column and the attribute access yields different output?
eg case A: column named "name"
birthdays.csv:
name,email,year,month,day
a test name,test#email.com,1961,12,21
testerito,blagh#sdgdg.com,1985,02,23
testeroonie,sihgfdb#sidkghsb.com,2022,01,17
import datetime as dt
import pandas

data = pandas.read_csv("birthdays.csv")
for (index, row) in data.iterrows():
    if (dt.datetime.now()).month == row.month and (dt.datetime.now()).day == row.day:
        print(row.name)
outputs "2"
whereas case B: column named "notthename"
data = pandas.read_csv("birthdays.csv")
for (index, row) in data.iterrows():
    if (dt.datetime.now()).month == row.month and (dt.datetime.now()).day == row.day:
        print(row.notthename)
outputs "testeroonie"
i'm missing something.... is there some special handling of "name" going on?
thanks for helping me learn!
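This happens because DataFrame.iterrows yields (index, Series) pairs, and every Series object has a built-in attribute called name. For a row produced by iterrows, that attribute holds the row's index label, which is why you get 2 instead of the value in the 'name' column. This is why using the attribute shortcut for column names, although convenient, can be dangerous. The dictionary notation doesn't have this issue: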
print(row['name'])
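A small sketch of the collision, using an in-memory frame instead of birthdays.csv (the 'day' filter is a simplification of the date check in the question):

```python
import pandas as pd

data = pd.DataFrame({
    'name': ['a test name', 'testerito', 'testeroonie'],
    'day': [21, 23, 17],
})

for index, row in data.iterrows():
    if row['day'] == 17:
        label = row.name      # Series.name: the row's index label
        value = row['name']   # the cell in the 'name' column
```

Here label ends up as the index label of the matching row, while value is the string stored in the 'name' column.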
I am parsing data row-wise. How can I update a data frame cell value in a loop (read a value, parse it, write it to another column)?
I have tried the below code
data = pd.read_csv("MyNames.csv")
data["title"] = ""
i = 0
for row in data.iterrows():
    name = (HumanName(data.iat[i,1]))
    print(name)
    data.ix['title',i] = name["title"]
    i = i + 1
data.to_csv('out.csv')
I would expect the following
name = "Mr John Smith"
| Title
Mr John Smith | Mr
All help appreciated!
Edit: I realise that I might not need to iterate. If I could call the function for all rows in a column and dump the results into another column that would be easier - like a SQL update statement. Thanks
Assuming that HumanName is a function (or similar) that takes a string and returns the dict you want. I'm not able to test this code from here, but you get the gist:
data['title'] = data['name'].apply(lambda name: HumanName(name)['title'])
EDIT: I used row[1] because of your data.iat[i,1]; that index might actually need to be 0 instead of 1, I'm not sure.
You can try .apply:
def name_parsing(name):
    """This function parses the name any way you want."""
    return HumanName(name)['title']
# with .apply, the function will be applied to every item in the column
# the return will be a series. In this case, the series will be attributed to 'title' column
data['title'] = data['name'].apply(name_parsing)
Another option, as we're discussing below, is to persist an instance of HumanName in the dataframe, so if you need other information from it later you don't need to instantiate and parse the name again (string manipulation can be very slow on big dataframes).
If so, part of the solution would be to create a new column. After that you would get the ['title'] entry from it:
# this line creates a HumanName instance column
data['HumanName'] = data['name'].apply(lambda x: HumanName(x))
# this line gets the 'title' from the HumanName object and assigns it to a 'title' column
data['title'] = data['HumanName'].apply(lambda x: x['title'])
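To try the .apply pattern without the name-parsing library installed, here is a self-contained sketch. parse_title is a hypothetical stand-in for HumanName(name)['title'], and its list of titles is an assumption for illustration only:

```python
import pandas as pd

def parse_title(full_name):
    """Hypothetical stand-in for HumanName(full_name)['title'].

    Returns the first word if it looks like a title, else ''.
    """
    known_titles = {'Mr', 'Mrs', 'Ms', 'Dr'}
    words = full_name.split()
    first_word = words[0] if words else ''
    return first_word if first_word.rstrip('.') in known_titles else ''

data = pd.DataFrame({'name': ['Mr John Smith', 'Dr Jane Doe', 'Alex Jones']})

# .apply calls parse_title once per value and returns a new Series,
# which is assigned to the 'title' column in one step (no explicit loop).
data['title'] = data['name'].apply(parse_title)
```

This is the "SQL update statement" style the question asks about: the whole column is computed from another column without iterating over rows by hand.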
I have a dataset in a relational database format (linked by ID's over various .csv files).
I know that each data frame contains only one value of an ID, and I'd like to know the simplest way to extract values from that row.
What I'm doing now:
# the group has only one element
purchase_group = purchase_groups.get_group(user_id)
price = list(purchase_group['Column_name'])[0]
The third line is bothering me as it seems ugly, but I'm not sure what the workaround is. The grouping (I guess) assumes that there might be multiple values and returns a <class 'pandas.core.frame.DataFrame'> object, while I'd like just a row returned.
If you want just the value and not a df/series then call values and index the first element [0] so just:
price = purchase_group['Column_name'].values[0]
will work.
If purchase_group has a single row, then purchase_group = purchase_group.squeeze() turns it into a series, so you can simply use purchase_group['Column_name'] to get your value.
Late to the party here, but purchase_group['Column_name'].item() is now available and is cleaner than some other solutions.
Another intuitive option is to convert the dataframe to a NumPy array; for example, to get the first row of values (like the first list in a list of lists):
np.array(df)[0]
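For comparison, the options above can be run side by side on a made-up frame. The data and the 'item' column are assumptions for illustration; only purchase_groups, purchase_group, and 'Column_name' come from the question:

```python
import pandas as pd

# Hypothetical data with exactly one row per user_id.
purchases = pd.DataFrame({
    'user_id': [1, 2, 3],
    'item': ['a', 'b', 'c'],
    'Column_name': [9.99, 4.50, 12.00],
})
purchase_groups = purchases.groupby('user_id')
purchase_group = purchase_groups.get_group(2)   # a one-row DataFrame

price_values = purchase_group['Column_name'].values[0]    # via the NumPy array
price_item = purchase_group['Column_name'].item()         # requires exactly one element
price_iloc = purchase_group['Column_name'].iloc[0]        # first element by position
price_squeeze = purchase_group.squeeze()['Column_name']   # one-row frame -> Series
```

All four read the same cell. item() is the only one that raises if the group unexpectedly contains more than one row, which may or may not be what you want.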