Slicing large lists based on input

If I have multiple lists such as
hello = [1,3,5,7,9,11,13]
bye = [2,4,6,8,10,12,14]
and the user inputs 3,
is there a way to go back 3 indexes from the end of the lists and start there, to get:
9 10
11 12
13 14
with a tab (\t) between the two numbers on each line?
If the user inputs 5,
the expected output would be
5 6
7 8
9 10
11 12
13 14
I've tried
for i in range(user_input):
    print(hello[-i-1], '\t', bye[-i-1])
but that prints the pairs in reverse order, starting with 13 14.

Just use negative indices that start from the end minus the user input (-user_input) and move up to the end (-1), something like:
for i in range(-user_input, 0):
    print(hello[i], bye[i], sep='\t')
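A minimal sketch wrapping the range(-user_input, 0) idea in a reusable helper. The name tail_pairs is hypothetical, and the sketch assumes n is no larger than the list length (a larger n makes hello[i] raise IndexError).

```python
def tail_pairs(hello, bye, n):
    """Return the last n (hello, bye) pairs as tab-separated lines, in original order."""
    lines = []
    # Indices -n, -n+1, ..., -1 walk forward from n places before the end.
    for i in range(-n, 0):
        lines.append(f"{hello[i]}\t{bye[i]}")
    return lines

hello = [1, 3, 5, 7, 9, 11, 13]
bye = [2, 4, 6, 8, 10, 12, 14]
for line in tail_pairs(hello, bye, 3):
    print(line)
```

Because range(-0, 0) is empty, an input of 0 simply prints nothing.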

Another zip solution, as a two-liner:
for h, b in zip(hello[-user_input:], bye[-user_input:]):
    print(h, b, sep='\t')
This avoids converting the result of zip to a list, so the only temporaries are the slices of hello and bye. Iterating by index would avoid even those temporaries, but in practice it's almost always cleaner and faster to slice and iterate over the values, since repeated indexing is both unpythonic and surprisingly slow in CPython.
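One edge case with the slice approach: hello[-0:] is the same as hello[0:], so an input of 0 returns the whole list rather than nothing. A small sketch that computes a non-negative start index instead (last_n is a hypothetical helper name):

```python
def last_n(seq, n):
    """Return the last n items of seq; n == 0 gives an empty list."""
    # seq[-0:] equals seq[0:], i.e. the WHOLE list, so compute a
    # non-negative start index instead of negating n.
    start = max(len(seq) - n, 0)
    return seq[start:]

hello = [1, 3, 5, 7, 9, 11, 13]
bye = [2, 4, 6, 8, 10, 12, 14]

print(hello[-0:])        # [1, 3, 5, 7, 9, 11, 13]  (the gotcha)
print(last_n(hello, 0))  # []
for h, b in zip(last_n(hello, 3), last_n(bye, 3)):
    print(h, b, sep='\t')
```

The max(..., 0) also means an input larger than the list just returns the whole list.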

Use negative indexing in the slice.
hello = [1,3,5,7,9,11,13]
print(hello[-3:])
print(hello[-3:-2])
Output:
[9, 11, 13]
[9]
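A short sketch of why slicing is the safer tool here: slices clamp to the list bounds, while plain negative indexing does not.

```python
hello = [1, 3, 5, 7, 9, 11, 13]

print(hello[-3:])   # [9, 11, 13]: from 3 before the end up to the end
print(hello[-10:])  # [1, 3, 5, 7, 9, 11, 13]: slices clamp to the list bounds

try:
    hello[-10]      # plain indexing does not clamp
except IndexError as exc:
    print("IndexError:", exc)
```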

You can zip the two lists and use itertools.islice to obtain the desired portion of the output:
from itertools import islice
print('\n'.join(map('\t'.join, islice(zip(map(str, hello), map(str, bye)), len(hello) - int(input()), len(hello)))))
Given an input of 5, this outputs:
5 6
7 8
9 10
11 12
13 14
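The islice version needs len(hello) to compute the start, and islice raises ValueError if that start is negative (input larger than the list). If the inputs were plain iterators without a length, a collections.deque with maxlen is one alternative; this is a different technique from the answer above, and last_pairs is a hypothetical name.

```python
from collections import deque

def last_pairs(a, b, n):
    """Return the last n pairs from two iterables without needing len()."""
    if n <= 0:
        return []
    # A deque with maxlen=n discards older items as new ones arrive,
    # so after consuming zip(a, b) it holds only the final n pairs.
    return list(deque(zip(a, b), maxlen=n))

hello = [1, 3, 5, 7, 9, 11, 13]
bye = [2, 4, 6, 8, 10, 12, 14]
for h, b in last_pairs(iter(hello), iter(bye), 3):
    print(h, b, sep='\t')
```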

You can use zip, which returns an iterator of tuples where the i-th tuple contains the i-th element from each argument. In Python 3 you need to convert it to a list first, then use a negative slice to get what you want:
zip_ = list(zip(hello, bye))
for item in zip_[-user_input:]:
    print(item[0], item[1], sep='\t')

If you want to analyze the data further, I think a pandas.DataFrame may be helpful.
import pandas as pd
INPUT_INDEX = int(input('index='))
df = pd.DataFrame([hello, bye])  # row 0 is hello, row 1 is bye
df = df.iloc[:, len(df.columns)-INPUT_INDEX:]  # keep the last INPUT_INDEX columns
for col in df.columns:
    h_value, b_value = df[col].values
    print(h_value, b_value)
Console output:
index=3
9 10
11 12
13 14
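A variation on the pandas idea, sketched under the assumption that each list becomes a column rather than a row: then "the last n pairs" is just DataFrame.tail(n). The fixed n = 3 stands in for the user's input.

```python
import pandas as pd

hello = [1, 3, 5, 7, 9, 11, 13]
bye = [2, 4, 6, 8, 10, 12, 14]
n = 3  # stands in for int(input('index='))

# One column per list makes "the last n pairs" simply the last n rows.
df = pd.DataFrame({'hello': hello, 'bye': bye})
last = df.tail(n)

for h, b in last.itertuples(index=False):
    print(h, b, sep='\t')
```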

Related

Printing the number of different numbers in python

I would like to ask about printing the number of distinct numbers in a list in Python.
For example,
say I have the following list:
X = [5, 5, 5]
Since there is only one distinct number here, I want the code to recognize that, so the output must be:
1
The number is: 5
Now say I have the following list:
X = [5,4,5]
Since there are two distinct numbers here (5 and 4), I want the code to recognize that, so the output must be:
2
The numbers are: 4, 5
Finally, say I have the following list:
X = [24,24,24,24,24,24,24,24,26,26,26,26,26,26,26,26]
Since there are two distinct numbers here (24 and 26), I want the code to recognize that, so the output must be:
2
The numbers are: 24, 26

You could keep track of unique numbers with a set object:
X = [1,2,3,3,3]
S = set(X)
n = len(S)
print(n, S)  # 3 {1, 2, 3}
Bear in mind that sets are unordered, so if you need a particular order, convert back to a sorted list with sorted(S).
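A minimal sketch of the set approach producing exactly the output format asked for in the question. describe_unique is a hypothetical helper name; the singular/plural wording follows the examples in the question.

```python
def describe_unique(X):
    """Return the count line and the description line for the distinct values in X."""
    uniq = sorted(set(X))  # set removes duplicates; sorted gives a stable order
    label = "The number is" if len(uniq) == 1 else "The numbers are"
    return f"{len(uniq)}\n{label}: {', '.join(map(str, uniq))}"

print(describe_unique([5, 5, 5]))
print(describe_unique([5, 4, 5]))
print(describe_unique([24] * 8 + [26] * 8))
```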

You can convert the list into a set, which removes duplicates, and then convert it back into a list (the original order is not guaranteed):
list(set(X))

You can try numpy.unique, which returns the sorted distinct values, and use len() on the result.
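A short sketch of the numpy.unique suggestion, using a shortened version of the question's third list:

```python
import numpy as np

X = [24, 24, 24, 26, 26, 26]
uniq = np.unique(X)  # sorted array of the distinct values
print(len(uniq))     # 2
print("The numbers are:", ", ".join(str(v) for v in uniq))
```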

May I ask whether set() can be used to read the data in a specific column in pandas?
For example, I have the following DataFrame:
df1= [ 0 -10 2 5
1 24 5 10
2 30 3 6
3 30 2 1
4 30 4 5 ]
where the first column is the index.
I first tried to isolate the second column
[-10
24
30
30
30]
using the following: x = pd.DataFrame(df1, columns=[0])
Then I transposed the column: XX = x.T
Then I used the set() function.
However, instead of obtaining
[-10 24 30]
I got [0 1 2 3 4].
So set() read the index instead of the values in the column.
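The behaviour above comes from the fact that iterating over a DataFrame (which is what set() does) yields its column labels, not its values; after transposing, the column labels are the original row index 0..4. A sketch that rebuilds the example frame, assuming its columns are labelled 0, 1, 2 as the columns=[0] call suggests, and selects the column as a Series instead:

```python
import pandas as pd

# Rebuild the example frame; the column labels 0, 1, 2 are an assumption.
df1 = pd.DataFrame([[-10, 2, 5],
                    [24, 5, 10],
                    [30, 3, 6],
                    [30, 2, 1],
                    [30, 4, 5]])

print(set(df1.T))        # the row labels 0..4, which is what the question saw
print(set(df1[0]))       # the distinct values -10, 24 and 30
print(df1[0].unique())   # the same values, in order of first appearance
print(df1[0].nunique())  # 3
```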

Python subsetting and slicing with [index,:] format

I've seen Python DataFrames subsetted using the [index,:] notation even when [index] alone would suffice.
Using a simple toy example:
df = pd.DataFrame({'a':[1,5,10,15,20,50,88]})
idx = [2,4,6]
we can index with iloc using either of these:
df.iloc[idx,:]
df.iloc[idx]
to get the same result:
a
2 10
4 20
6 88
Are there any differences between the call methods? Should I prefer the use of one over the other?

In df.iloc[idx,:] the colon slices over the columns. In Python, [:] selects all the elements along that axis. For example:
df = pd.DataFrame({'a':[1,5,10,15,20,50,88], 'b':[1,5,10,15,20,50,88]})
idx = [2,4,6]
Without column slicing:
df.iloc[idx]
output:
a b
2 10 10
4 20 20
6 88 88
With column slicing:
df.iloc[idx,:1]
output:
a
2 10
4 20
6 88
In this case, the question is whether you want to slice over all the columns explicitly. In my opinion, the standard form df.iloc[idx] is just as clear.
http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-position

They are essentially the same. From the pandas documentation (section "Different Choices for Indexing"):
axes left out of the specification are assumed to be :, e.g. p.loc['a'] is equivalent to p.loc['a', :, :].
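A quick check of the equivalence, using the two-column toy frame from the first answer:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 5, 10, 15, 20, 50, 88],
                   'b': [1, 5, 10, 15, 20, 50, 88]})
idx = [2, 4, 6]

# Leaving out the column axis is the same as passing ':' for it.
print(df.iloc[idx].equals(df.iloc[idx, :]))  # True

# The explicit form only matters once you restrict the columns.
print(df.iloc[idx, :1].columns.tolist())     # ['a']

# For a single integer, both forms return the row as a Series.
print(df.iloc[2].equals(df.iloc[2, :]))      # True
```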

counting T/F values for several conditions

I am a beginner using pandas.
I'm looking for mutations in several patients, and I have 16 different conditions. I wrote code for it, but how can I do this with a for loop? I look for each change in the MUT column, mark it as True or False, and then count the True/False values. So far I have done this for only 4 conditions.
Can you suggest a simpler way than writing the same code 16 times?
s1=df["MUT"]
A_T= s1.str.contains("A:T")
ATnum= A_T.value_counts(sort=True)
s2=df["MUT"]
A_G=s2.str.contains("A:G")
AGnum=A_G.value_counts(sort=True)
s3=df["MUT"]
A_C=s3.str.contains("A:C")
ACnum=A_C.value_counts(sort=True)
s4=df["MUT"]
A__=s4.str.contains("A:-")
A_num=A__.value_counts(sort=True)

I'm not an expert with Pandas, so I don't know if there's a cleaner way of doing this, but perhaps the following might work?
chars = 'TGC-'
nums = {}
for char in chars:
    s = df["MUT"]
    A = s.str.contains("A:" + char)
    num = A.value_counts(sort=True)
    nums[char] = num
ATnum = nums['T']
AGnum = nums['G']
# ...etc
Basically, go through each character (T, G, C, -), pull out the values you need, and store the counts in a dictionary. Once the loop is finished, you can fetch whichever counts you need from the dictionary.

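Just use value_counts; it gives you a count of every unique value in the column, so there is no need to create 16 variables (the example assumes import numpy as np and import pandas as pd, and uses random data, so your counts will differ):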
In [5]:
df = pd.DataFrame({'MUT':np.random.randint(0,16,100)})
df['MUT'].value_counts()
Out[5]:
6 11
14 10
13 9
12 9
1 8
9 7
15 6
11 6
8 5
5 5
3 5
2 5
10 4
4 4
7 3
0 3
dtype: int64
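Note that value_counts counts exact values, whereas the question's str.contains matches substrings such as "A:T" inside a longer entry. A sketch that keeps the substring matching but loops over the patterns and collects everything into one table. The sample data and the pattern list are hypothetical; extend the list to all 16 conditions.

```python
import pandas as pd

# Hypothetical sample data; the real MUT column comes from the question's df.
df = pd.DataFrame({'MUT': ['A:T', 'A:G', 'C:T', 'A:T', 'A:-', 'G:C']})

# Hypothetical list of patterns; extend it to all 16 conditions.
patterns = ['A:T', 'A:G', 'A:C', 'A:-']

rows = {}
for pat in patterns:
    # regex=False treats the pattern literally (no special meaning for '-').
    hits = df['MUT'].str.contains(pat, regex=False)
    rows[pat] = {'True': int(hits.sum()), 'False': int((~hits).sum())}

counts = pd.DataFrame(rows).T  # one row per pattern, columns True/False
print(counts)
```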

Take multiple lists into dataframe

How do I take multiple lists and put them as different columns in a python dataframe? I tried this solution but had some trouble.
Attempt 1:
I have three lists; I zipped them together with res = zip(lst1,lst2,lst3)
This yields just one column
Attempt 2:
percentile_list = pd.DataFrame({'lst1Tite' : [lst1],
'lst2Tite' : [lst2],
'lst3Tite' : [lst3] },
columns=['lst1Tite','lst1Tite', 'lst1Tite'])
This yields either one row by 3 columns (as written above) or, if I transpose it, 3 rows by 1 column
How do I get a 100 row (length of each independent list) by 3 column (three lists) pandas dataframe?

I think you're almost there; try removing the extra square brackets around the lists. (Also, you don't need to specify the column names when creating a DataFrame from a dict like this):
import pandas as pd
lst1 = range(100)
lst2 = range(100)
lst3 = range(100)
percentile_list = pd.DataFrame(
    {'lst1Title': lst1,
     'lst2Title': lst2,
     'lst3Title': lst3
    })
percentile_list
lst1Title lst2Title lst3Title
0 0 0 0
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
6 6 6 6
...
If you need a more performant solution, you can use np.column_stack rather than zip as in your first attempt. This gives around a 2x speedup on the example here, but in my opinion it comes at a bit of a cost in readability:
import numpy as np
percentile_list = pd.DataFrame(np.column_stack([lst1, lst2, lst3]),
columns=['lst1Title', 'lst2Title', 'lst3Title'])

Adding to Aditya Guru's answer here: there is no need to use map. You can simply do:
pd.DataFrame(list(zip(lst1, lst2, lst3)))
This names the columns 0, 1, 2. To set your own column names, pass the columns keyword argument:
pd.DataFrame(list(zip(lst1, lst2, lst3)),
columns=['lst1_title','lst2_title', 'lst3_title'])

Adding one more scalable solution.
lists = [lst1, lst2, lst3, lst4]
df = pd.concat([pd.Series(x) for x in lists], axis=1)
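One practical difference of the concat-of-Series approach: the lists do not have to be the same length, because concat aligns the Series on their index and pads missing cells with NaN, while the dict constructor raises ValueError. A small sketch with deliberately mismatched lists:

```python
import pandas as pd

lst1 = [1, 2, 3]
lst2 = [4, 5]  # deliberately shorter

# concat aligns the Series on their index and pads missing cells with NaN.
df = pd.concat([pd.Series(lst1, name='lst1'), pd.Series(lst2, name='lst2')], axis=1)
print(df)

# The dict constructor, by contrast, refuses lists of different lengths.
try:
    pd.DataFrame({'lst1': lst1, 'lst2': lst2})
except ValueError as exc:
    print("ValueError:", exc)
```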

There are several ways to create a DataFrame from multiple lists:
import pandas as pd
list1=[1,2,3,4]
list2=[5,6,7,8]
list3=[9,10,11,12]
pd.DataFrame({'list1':list1, 'list2':list2, 'list3':list3})
pd.DataFrame(data=zip(list1,list2,list3),columns=['list1','list2','list3'])

Just adding that, using the first approach, it can be done as:
pd.DataFrame(list(map(list, zip(lst1,lst2,lst3))))

Adding to the answers above, you can also build the DataFrame on the fly, one column at a time:
import pandas as pd
df = pd.DataFrame()
list1 = list(range(10))
list2 = list(range(10,20))
df['list1'] = list1
df['list2'] = list2
print(df)
Hope it helps!

@oopsi used pd.concat() but didn't include the column names. You could do the following, which, unlike the first solution in the accepted answer, gives you explicit control over the column order without relying on dict ordering (dicts were unordered before Python 3.7):
import pandas as pd
lst1 = range(100)
lst2 = range(100)
lst3 = range(100)
s1=pd.Series(lst1,name='lst1Title')
s2=pd.Series(lst2,name='lst2Title')
s3=pd.Series(lst3,name='lst3Title')
percentile_list = pd.concat([s1,s2,s3], axis=1)
percentile_list
Out[2]:
lst1Title lst2Title lst3Title
0 0 0 0
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
6 6 6 6
7 7 7 7
8 8 8 8
...

You can simply use the following code:
train_data['labels']= train_data[["LABEL1","LABEL2","LABEL3","LABEL4","LABEL5","LABEL6","LABEL7"]].values.tolist()
train_df = pd.DataFrame(train_data, columns=['text','labels'])

I just did it like this (Python 3.9):
import pandas as pd
my_dict=dict(x=x, y=y, z=z)  # x, y, z are your lists; set column ordering here
my_df=pd.DataFrame.from_dict(my_dict)
This seems reasonably straightforward (in 2022, at least) unless I am missing something obvious.
In Python 2 one could have used a collections.OrderedDict().
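A sketch confirming the point about ordering, assuming Python 3.7+ and a reasonably recent pandas: the column order follows the dict's insertion order, and each list of length 100 becomes one column of a 100-row frame.

```python
import pandas as pd

lst1 = list(range(100))
lst2 = list(range(100, 200))
lst3 = list(range(200, 300))

# Since Python 3.7 dicts keep insertion order, and pandas uses that order
# for the columns, so no OrderedDict or columns= argument is needed.
df = pd.DataFrame({'lst3Title': lst3, 'lst1Title': lst1, 'lst2Title': lst2})
print(df.shape)             # (100, 3)
print(df.columns.tolist())  # ['lst3Title', 'lst1Title', 'lst2Title']
```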

Creating Simultaneous Loops in Python

I want to create a loop along these lines:
for i in xrange(0,10):
    for k in xrange(0,10):
        z=k+i
        print z
where the output should be
0
2
4
6
8
10
12
14
16
18

You can use zip to turn multiple lists (or iterables) into pairwise* tuples:
>>> for a,b in zip(xrange(10), xrange(10)):
...     print a+b
...
0
2
4
6
8
10
12
14
16
18
But in Python 2, zip builds the whole list in memory, so it will not scale as well as izip (which sth mentioned) on larger inputs. zip's advantage is that it is a built-in and you don't have to import itertools; whether that is actually an advantage is subjective.
*Not just pairwise, but n-wise. The tuples' length will be the same as the number of iterables you pass in to zip.

The itertools module contains an izip function that combines iterators in the desired way:
from itertools import izip
for (i, k) in izip(xrange(0,10), xrange(0,10)):
    print i+k

You can do this in Python; you just need correct indentation and the step argument of xrange:
for i in xrange(0, 20, 2):
    print i

What about this?
i = range(0,10)
k = range(0,10)
for x in range(0,10):
    z=k[x]+i[x]
    print z
0
2
4
6
8
10
12
14
16
18

What you want is two sequences and a single loop that walks over both at the same time, adding corresponding elements.
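The answers above are Python 2 code. A sketch of the same idea in Python 3, where range replaces xrange and the built-in zip is already lazy (itertools.izip no longer exists):

```python
# Python 3: zip is lazy (like Python 2's itertools.izip), and range replaces xrange.
totals = [i + k for i, k in zip(range(10), range(10))]
print(*totals, sep='\n')

# The step-based answer gives the same numbers directly.
assert totals == list(range(0, 20, 2))
```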
