I have a pandas.core.series.Series object that looks like this:
6 7
8 9
18 19
35 36
42 43
I want to get a list of 3 randomly chosen numbers from this list. I tried following the advice here and tried
sampled_list = random.sample(df['ID'], 3)
with no luck. Any suggestions?
I think you're looking for:
df['ID'].sample(3).tolist()
Docs: Series.sample()
sampled_list = random.sample(list(df['ID']), 3)
Related
I am relatively new to Python and I've run in to a problem that I cannot seem to search my way out of. I have written a function to query a third-party API. The function runs as expected and retrieves the correct results. However, my dataframe display returns the results with my columns and rows transposed. I have used this same function before without issue. I know I can transpose them to the correct position, but since I want to use this function as part of a larger function it is important that the query return the values with the columns and rows as intended.
I've included my snippet below as well as the results and desired outcome.
import pandas as pd
def get_tiering():
df = vendorAPI.getportfoliocustomcolumns('prod').as_dataframe()
records = df.to_dict('records')
return {rec['Patient']: rec for rec in records}
tiersdf = pd.DataFrame(get_tiering())
print(tiersdf)
[6 rows x 198 columns]
[198 rows x 6 columns]
I am wondering if there is some DataFrame setting that I inadvertently changed? I am using Spyder version 2.2 with Python 3.9. Any guidance you can provide would be appreciated.
Thank you for your time.
Did you try tiersdf.T?
This is my sample df
p1 p2 p3
height 65 66 5
weight 62 22 6
age 32 55 8
bp 12 44 6
hr 2 8 3
and i got this after doing transform
height weight age bp hr
p1 65 62 32 12 2
p2 66 22 55 44 8
p3 5 6 8 6 3
I would suggest you try to change
records = df.to_dict('records')
return {rec['Patient']: rec for rec in records}
to
return df.transpose().to_dict('series')
Please, let me know if it works. Otherwise, please let us know the exact output of vendorAPI.getportfoliocustomcolumns('prod')
I have generated a dataframe containing all the possible two combinations of electrocardiogram (ECG) leads using itertools using the code below
source = [ 'I-s', 'II-s', 'III-s', 'aVR-s', 'aVL-s', 'aVF-s', 'V1-s', 'V2-s', 'V3-s', 'V4-s', 'V5-s', 'V6-s', 'V1Long-s', 'IILong-s', 'V5Long-s', 'Information-s' ]
target = [ 'I-t', 'II-t', 'III-t', 'aVR-t', 'aVL-t', 'aVF-t', 'V1-t', 'V2-t', 'V3-t', 'V4-t', 'V5-t', 'V6-t', 'V1Long-t', 'IILong-t', 'V5Long-t', 'Information-t' ]
from itertools import product
test = pd.DataFrame(list(product(source, target)), columns=['source', 'target'])
The test dataframe contains 256 rows/lines containing all the two possible combinations.
The value for each combination is zero as follows
test['value'] = 0
The test df looks like this:
I have another dataframe called diagramDF that contains the combinations where the value column is non-zero. The diagramDF is significanntly smaller than the test dataframe.
source target value
0 I-s II-t 137
1 II-s I-t 3
2 II-s III-t 81
3 II-s IILong-t 13
4 II-s V1-t 21
5 III-s II-t 3
6 III-s aVF-t 19
7 IILong-s II-t 13
8 IILong-s V1Long-t 353
9 V1-s aVL-t 11
10 V1Long-s IILong-t 175
11 V1Long-s V3-t 4
12 V1Long-s aVF-t 4
13 V2-s V3-t 8
14 V3-s V2-t 6
15 V3-s V6-t 2
16 V5-s aVR-t 5
17 V6-s III-t 4
18 aVF-s III-t 79
19 aVF-s V1Long-t 235
20 aVL-s I-t 1
21 aVL-s aVF-t 16
22 aVR-s aVL-t 1
Note that the first two columns source and target have the same notations
I have tried to replace the zero values of the test dataframe with the nonzero values of the diagramDF using merge like below:
df = pd.merge(test, diagramDF, how='left', on=['source', 'target'])
However, I get an error informing me that:
ValueError: The column label 'source' is not unique. For a
multi-index, the label must be a tuple with elements corresponding to
each level
Is there something that I am getting wrong? Is there a more efficient and fast way to do this?
Might help,
pd.merge(test, diagramDF, how='left', on=['source', 'target'],right_index=True,left_index=True)
Check this:
test = test.reset_index()
diagramDF = diagramDF.reset_index()
new = pd.merge(test, diagramDF, how='left', on=['source', 'target'])
If I have multiple lists such that
hello = [1,3,5,7,9,11,13]
bye = [2,4,6,8,10,12,14]
and the user inputs 3
is there a way to get the output to go back 3 indexes in the list and start there to get:
9 10
11 12
13 14
with tabs \t between each space.
if the user would input 5
the expected output would be
5 6
7 8
9 10
11 12
13 14
I've tried
for i in range(user_input):
print(hello[-i-1], '\t', bye[-i-1])
Just use negative indexies that start from the end minus the user input (-user_input) and move to the the end (-1), something like:
for i in range(-user_input, 0):
print(hello[i], bye[i])
Another zip solution, but one-lined:
for h, b in zip(hello[-user_input:], bye[-user_input:]):
print(h, b, sep='\t')
Avoids converting the result of zip to a list, so the only temporaries are the slices of hello and bye. While iterating by index can avoid those temporaries, in practice it's almost always cleaner and faster to do the slice and iterate the values, as repeated indexing is both unpythonic and surprisingly slow in CPython.
Use negative indexing in the slice.
hello = [1,3,5,7,9,11,13]
print(hello[-3:])
print(hello[-3:-2])
output
[9, 11, 13]
[9]
You can zip the two lists and use itertools.islice to obtain the desired portion of the output:
from itertools import islice
print('\n'.join(map(' '.join, islice(zip(map(str, hello), map(str, bye)), len(hello) - int(input()), len(hello)))))
Given an input of 3, this outputs:
5 6
7 8
9 10
11 12
13 14
You can use zip to return a lists of tuple where the i-th element comes from the i-th iterable argument.
zip_ = list(zip(hello, bye))
for item in zip_[-user_input:]:
print(item[0], '\t' ,item[1])
then use negative index to get what you want.
If you want to analyze the data
I think using pandas.datafrme may be helpful.
INPUT_INDEX = int(input('index='))
df = pd.DataFrame([hello, bye])
df = df.iloc[:, len(df.columns)-INPUT_INDEX:]
for col in df.columns:
h_value, b_value = df[col].values
print(h_value, b_value)
console
index=3
9 10
11 12
13 14
I had just started with notebooks and Python. When I print my list will I just the list and then number how often they occur.
I would like to if I could get hour as a header above 12 and 8 and count as a header above 7 and 3.
x=df['hour'].value_counts()
print(x[0:2])
12 7
8 3
How I want it:
Hour . count
12 7
8 3
For the moment am I getting this below my results Name: hour, dtype:
int64
/Lisa
I am not sure, since the question is not so clear, but, as I understood here is my solution:
df[['hour', '_column_tobe_counted_']].groupby(['hour']).agg(['count']).reset_index()
Hope this helps.
I am a beginner using pandas.
I'm looking for mutations on several patients. I have 16 different conditions. I simply write a code about it but how can do this by for loop? I try to find the changes on MUT column and set them as True and False. Then try to count the True/False numbers. I have done for only 4.
Can you suggest a more simple way, instead of writing the same code 16 times?
s1=df["MUT"]
A_T= s1.str.contains("A:T")
ATnum= A_T.value_counts(sort=True)
s2=df["MUT"]
A_G=s2.str.contains("A:G")
AGnum=A_G.value_counts(sort=True)
s3=df["MUT"]
A_C=s3.str.contains("A:C")
ACnum=A_C.value_counts(sort=True)
s4=df["MUT"]
A__=s4.str.contains("A:-")
A_num=A__.value_counts(sort=True)
I'm not an expert with using Pandas, so don't know if there's a cleaner way of doing this, but perhaps the following might work?
chars = 'TGC-'
nums = {}
for char in chars:
s = df["MUT"]
A = s.str.contains("A:" + char)
num = A.value_counts(sort=True)
nums[char] = num
ATnum = nums['T']
AGnum = nums['G']
# ...etc
Basically, go through each unique character (T, G, C, -) then pull out the values that you need, then finally stick the numbers in a dictionary. Then, once the loop is finished, you can fetch whatever numbers you need back out of the dictionary.
Just use value_counts, this will give you a count of all unique values in your column, no need to create 16 variables:
In [5]:
df = pd.DataFrame({'MUT':np.random.randint(0,16,100)})
df['MUT'].value_counts()
Out[5]:
6 11
14 10
13 9
12 9
1 8
9 7
15 6
11 6
8 5
5 5
3 5
2 5
10 4
4 4
7 3
0 3
dtype: int64