If statement over pandas.Series and append result to list - Python

I'm trying to build a few lists from results. Could you tell me why this result is empty?
I'm not looking for a solution with numpy; that's why I'll originally create > 50 lists and later save them to CSV.
import pandas as pd

df1 = pd.DataFrame(data={"Country": ["USA", "Germany", "Russia", "Poland"],
                         "Capital": ["Washington", "Berlin", "Moscow", "Warsaw"],
                         "Region": ["America", "Europe", "Europe", "Europe"]})
America = []
if (df1['Region'] == 'America').all():
    America.append(df1)
print(America)

Your expression df1['Region']=='America' gives a so-called boolean mask (docs on boolean indexing). A boolean mask is a pandas Series of True and False whose index is lined up with the index of df1.
It's easy to get your expected values once you get used to boolean indexing:
df1[df1['Region']=='America']
  Country     Capital   Region
0     USA  Washington  America
If you are interested in keeping entire rows, don't bother manually building a python list; that would complicate your work immensely compared to sticking to pandas. You can store the rows in a new DataFrame:
# Use df.copy() here so that changing America won't change df1
America = df1[df1['Region']=='America'].copy()
Why if (df1['Region']=='America').all(): didn't work
The Series.all() method checks whether all values in the Series are True. What you need to do here is to check each row for your condition df1['Region']=='America', and keep only those rows that match this condition (if I understand you correctly).
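To make that concrete, here is a small sketch (using the df1 from the question) of what the mask and .all() evaluate to, and why the if block never runs:
import pandas as pd

df1 = pd.DataFrame(data={"Country": ["USA", "Germany", "Russia", "Poland"],
                         "Capital": ["Washington", "Berlin", "Moscow", "Warsaw"],
                         "Region": ["America", "Europe", "Europe", "Europe"]})

mask = df1['Region'] == 'America'
print(mask.tolist())  # [True, False, False, False]
print(mask.all())     # False -> the if branch is skipped, so America stays []
print(mask.any())     # True  -> at least one row matches
print(df1[mask])      # keeps only the USA row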

I'm not sure what you want.
If you want to add the whole dataframe to the list when 'America' appears in Region:
for region in df1.Region:
    if region == 'America':
        America.append(df1)
If you want to add, from the Country and Capital columns, the elements at the same index as 'America' in the Region column:
count = 0
for region in df1.Region:
    if region == 'America':
        America.append(df1.Country[count])
        America.append(df1.Capital[count])
    count += 1
Does this answer the question?

Related

Replace values for each group

I want to replace values in the 'animal' column for each subid/group, based on a condition.
The values in the animal column are numbers (0-3) and vary for each subid, so a (where cond == 1) might look like [0,3] for one subid or [2,1] for another, and the same goes for b.
for s in sids:
    a = df[(df['subid'] == s) & (df['cond'] == 1)]['animal'].unique()
    b = df[(df['subid'] == s) & (df['cond'] == 0)]['animal'].unique()
    df["animal"].replace({a[0]: 0, a[1]: 1, b[0]: 2, b[1]: 3})
The thing is I think the dataframe overwrites entirely each time and uses only the last iteration of the for loop instead of saving the appropriate values for each group.
I tried specifying the subid at the beginning like so, df[df['subid']==s["animal"].replace({a[0]:0,a[1]:1,b[0]:2,b[1]:3}) but it didn't work.
Any pointers are appreciated, thanks!
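One likely reason it doesn't stick is that replace returns a new Series that is never assigned back, and the replacement is applied to the whole column rather than only that subid's rows. A hedged sketch of one possible fix, reusing the column names from the question and writing the result back with .loc:
# assumes df already exists with columns 'subid', 'cond', 'animal'
for s in df['subid'].unique():
    rows = df['subid'] == s
    a = df.loc[rows & (df['cond'] == 1), 'animal'].unique()
    b = df.loc[rows & (df['cond'] == 0), 'animal'].unique()
    # Write the mapping back, and only to this subject's rows,
    # so later iterations don't overwrite earlier ones.
    df.loc[rows, 'animal'] = df.loc[rows, 'animal'].replace(
        {a[0]: 0, a[1]: 1, b[0]: 2, b[1]: 3})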

Why can't a DataFrame Index be used for data dictionary lookup in .apply()?

I have a DataFrame and a Dictionary. I want to assign values to a new column in the DataFrame based on the Dictionary.
ContinentDictionary = {'United States': 'North America',
                       'Japan': 'Asia',
                       'United Kingdom': 'Europe',
                       'Australia': 'Australia',
                       'Argentina': 'South America'}

c1 = pd.Series({'Size': 'Large', 'Pi': 6, 'Pr': 160})
c2 = pd.Series({'Size': 'Small', 'Pi': 9, 'Pr': 235})
c3 = pd.Series({'Size': 'Large', 'Pi': 12, 'Pr': 300})
Countries = pd.DataFrame([c1, c2, c3], index=['United States', 'Japan', 'United Kingdom'])
Countries.index.name = 'Country'
This gets the job done, assigning a continent to each country in the Countries DataFrame:
Countries['Continent'] = Countries.index.map(lambda x: ContinentDictionary[x])
This also works, but I need to set the index 'Country' as a column beforehand to make .apply work:
Countries.reset_index(inplace=True)
Countries['Continent'] = Countries.apply(lambda x: ContinentDictionary[x['Country']], axis=1)
I'd like to get a better understanding as to why these two approaches don't work and would be grateful for an explanation:
Countries['Continent'] = Countries.apply(lambda x: ContinentDictionary[x.index], axis=1)
Countries['Continent'] = ContinentDictionary[Countries.index]
Both give:
TypeError: unhashable type: 'Index'
Of the two, I can imagine why #2 might not work but would still love to understand better.
This depends on the pandas version; in older versions it is necessary to add .get. The solution can also be simplified by removing the lambda and passing only the dictionary:
Countries['Continent'] = Countries.index.map(ContinentDictionary.get)
print(Countries)
                 Size  Pi   Pr      Continent
United States   Large   6  160  North America
Japan           Small   9  235           Asia
United Kingdom  Large  12  300         Europe
EDIT: From pandas 0.23+ it is possible to use a dictionary or Series as the mapper:
Index.map() can now accept Series and dictionary input objects (GH12756, GH18482, GH18509).
This is in response to a few of your comments.
You wrote: "I thought axis=1 doesn't make me pass the entire index as the key but a single index instead?" I'm not entirely sure I understand what you mean, so let me try to clarify something.
With axis=1, the function operates on rows. Each row is converted to a Series. Here is an example of what one of the rows looks like as a Series:
Size Large
Pi 6
Pr 160
Name: United States, dtype: object
When you call x.index, you're expecting to receive the index of row x in the DataFrame. In fact, you get the index of the Series x:
Index(['Size', 'Pi', 'Pr'], dtype='object')
So you're saying: "I just can't use an 'index' value as a key to a dictionary, and there is no way to convert an index to a string?" Index objects cannot be used as keys in a dictionary, or in any situation that requires a hashable object. The individual labels inside the Index, however, are plain strings, as shown above.
As #Jezrael mentions in the comments, calling .apply() on a Series gets you the individual elements, which in this case do not have an index.
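If you still want to go through .apply(axis=1) on the original (country-indexed) DataFrame, the row's own index label is available as x.name, which is hashable; a small sketch assuming the Countries and ContinentDictionary objects defined in the question:
# x is one row as a Series: x.index holds the column labels ('Size', 'Pi', 'Pr'),
# while x.name holds that row's index label ('United States', ...).
Countries['Continent'] = Countries.apply(lambda x: ContinentDictionary[x.name], axis=1)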

How do I get one column of an array into a new Array whilst applying a fancy indexing filter on it?

So basically I have an array that consists of 14 columns and 426 rows. Every column represents one property of a dog and every row represents one dog. Now I want to know the average heart frequency of an ill dog. The 14th column indicates whether the dog is ill or not (0 = healthy, 1 = ill), and the 8th column is the heart frequency. My problem is that I don't know how I can get the 8th column out of the whole array and use the boolean filter on it.
I am pretty new to Python. As I mentioned above, I think I know what I have to do (use a fancy indexing filter) but I don't know how. I tried doing it while still working in the original array, but that didn't work out, so I thought I need to get the information into another one and use the boolean filter on that one.
EDIT: Ok, so here is the code that I got right now:
import numpy as np
def average_heart_rate_for_pathologic_group(D):
    a = np.array(D[:, 13])  # gets information on whether the dogs are sick or not
    b = np.array(D[:, 7])   # gets the heart frequency
    R = (a >= 0)            # gets all the values that are from sick dogs
    amhr = np.mean(R)       # calculates the average heart frequency
    return amhr
I think boolean indexing is the way forward.
The shortcuts for this work like:
# Your data (must be a numpy array for this indexing to work):
data = [[0,1,2,3,4,5,6,7,8...],[..]...]
# This indexing chooses the rows whose 14th column (index 13) equals 1 and then takes
# their 8th-column (index 7) values. Any analysis can be done after this on the new variable.
heart_frequency_ill = data[data[:, 13] == 1, 7]
Probably you'll have to actually copy the data from the original array into a new one with the selected data.
Could you please share a sample with, let's say, 3 or 4 rows of your data?
I will give it a try though.
Let me build data with 4 columns here (but you could use 14 as in your problem)
data = [['c1a','c2a','c3a','c4a'], ['c1b','c2b','c3b','c4b']]
You could use numpy.array to get its nth column.
See how one can get the column at index 2:
import numpy as np
a = np.array(data)
a[:,2]
If you want to get the 8th column for all the dogs that are healthy, you can do the following:
# we use 7 for the column because the index starts at 0
# we use np.argwhere to get the row indices where the condition (healthy, i.e. column 13 == 0) is true
A[np.argwhere([A[:, 13] == 0])[:, 1], 7]
If you also want to compute the mean:
A[np.argwhere([A[:,13] == 0])[:,1],7].mean()
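Putting the pieces together for the original question, here is a minimal sketch; the column positions (13 for the illness flag, 7 for the heart frequency) come from the question, and the sample values are made up:
import numpy as np

def average_heart_rate_for_pathologic_group(D):
    ill = D[:, 13] == 1      # boolean mask: True for ill dogs
    return D[ill, 7].mean()  # mean heart frequency of the ill dogs only

# Tiny made-up example: 3 dogs, 14 columns
D = np.zeros((3, 14))
D[:, 7] = [80, 120, 140]   # heart frequency
D[:, 13] = [0, 1, 1]       # 0 = healthy, 1 = ill
print(average_heart_rate_for_pathologic_group(D))  # 130.0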

How to get the index of a value in a pandas series

What's the code to get the index of a value in a pandas Series data structure?
animals = pd.Series(['bear', 'dog', 'mammoth', 'python'],
                    index=['canada', 'germany', 'iran', 'brazil'])
What's the code to extract the index of "mammoth"?
You can just use boolean indexing:
In [8]: animals == 'mammoth'
Out[8]:
canada False
germany False
iran True
brazil False
dtype: bool
In [9]: animals[animals == 'mammoth'].index
Out[9]: Index(['iran'], dtype='object')
Note, indexes aren't necessarily unique for pandas data structures.
You have two options:
1) If you make sure that value is unique, or just want to get the first one, take the first element of the matching index:
animals[animals == 'mammoth'].index[0]  # retrieves index of first occurrence of value
2) If you would like to get all indices matching that value, as per @juanpa.arrivillaga's post:
animals[animals == 'mammoth'].index # retrieves indices of all matching values
You can also pick out a particular occurrence of the value by indexing into the result:
animals[animals == 'mammoth'].index[1]  # retrieves index of second occurrence of value, if there is one
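Another option, not mentioned above, is idxmax on the boolean mask, which returns the label of the first True value; guard it with .any(), since idxmax would otherwise return the first label even when nothing matches:
import pandas as pd

animals = pd.Series(['bear', 'dog', 'mammoth', 'python'],
                    index=['canada', 'germany', 'iran', 'brazil'])

mask = animals == 'mammoth'
if mask.any():
    print(mask.idxmax())  # 'iran', the label of the first match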

df.set_index returns key error python pandas dataframe

I have this pandas DataFrame and I have to convert some of the items into coordinates (meaning they have to be floats), but it includes the indexes while trying to convert them into floats. So I tried to set the index to the first thing in the DataFrame, but it doesn't work. I wonder if it has anything to do with the fact that it is a part of the whole DataFrame, only the section that is "Latitude" and "Longitude".
df = df_volc.iloc(axis = 0)[0:, 3:5]
df.set_index("hello", inplace = True, drop = True)
df
and I get a really long error, but this is the last part of it:
KeyError: '34.50'
If I don't do the set_index part I get:
   Latitude  Longitude
0     34.50     131.60
1    -23.30     -67.62
2     14.50     -90.88
I just want to know if it's possible to get rid of the indexes or set them.
The parameter you need to pass to the set_index() function is keys: a column label or a list of column labels/arrays. In your scenario, it seems like "hello" is not a column name.
I just wanna know if its possible to get rid of the indexes or set them.
It is possible to replace the 0, 1, 2 index with something else, though it doesn't sound like it's necessary for your end goal:
to convert some of the items into [...] floats
To achieve this, you could overwrite the existing values by using astype():
df['Latitude'] = df['Latitude'].astype('float')
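For completeness, a sketch of the whole flow, assuming df_volc's columns 3 and 4 really are Latitude and Longitude as the iloc slice in the question suggests:
# Take the two coordinate columns, drop the old row labels, and convert both to floats.
df = df_volc.iloc[:, 3:5].reset_index(drop=True)
df['Latitude'] = df['Latitude'].astype(float)
df['Longitude'] = df['Longitude'].astype(float)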
