I would like to ask how to print the number of distinct values in a list in Python.
For example:
Let us say that I have the following list:
X = [5, 5, 5]
Since there is only one distinct number here, I want code that recognizes this, so the output must be:
1
The number is: 5
Let us say that I have the following list:
X = [5,4,5]
Since there are two distinct numbers here (5 and 4), I want the code to recognize that there are only two, so the output must be:
2
The numbers are: 4, 5
Let us say that I have the following list:
X = [24,24,24,24,24,24,24,24,26,26,26,26,26,26,26,26]
Since there are two distinct numbers here (24 and 26), I want the code to recognize that there are only two, so the output must be:
2
The numbers are: 24, 26
You could keep track of unique numbers with a set object:
X = [1,2,3,3,3]
S = set(X)
n = len(S)
print(n, S) # 3 {1,2,3}
Bear in mind sets are unordered, so you would need to convert back to a list and sort them if needed.
You can convert the list into a set, which removes the duplicates, and then convert it back into a list:
list(set(X))
You can try numpy.unique, and use len() on the result
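To tie the suggestions above to the exact output format asked for in the question, here is a minimal sketch. The helper name describe_distinct is made up for illustration, and sorting is added because sets are unordered.

```python
def describe_distinct(values):
    """Return the count of distinct values and a line listing them.

    Hypothetical helper: the name and the return shape are assumptions,
    chosen only to reproduce the output format shown in the question.
    """
    distinct = sorted(set(values))  # set() drops duplicates, sorted() fixes the order
    count = len(distinct)
    label = "The number is" if count == 1 else "The numbers are"
    return count, f"{label}: {', '.join(map(str, distinct))}"


for X in ([5, 5, 5], [5, 4, 5], [24] * 8 + [26] * 8):
    count, line = describe_distinct(X)
    print(count)
    print(line)
```

For the three lists in the question this prints 1, 2 and 2, each followed by the matching "The number is" or "The numbers are" line.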
May I ask whether set() can be used to read the data in a specific column of a pandas DataFrame?
For example, I have the following DataFrame:
df1= [ 0 -10 2 5
1 24 5 10
2 30 3 6
3 30 2 1
4 30 4 5 ]
where the first column is the index.
I first tried to isolate the second column
[-10
24
30
30
30]
using the following: x = pd.DataFrame(df1, columns=[0])
Then I transposed the column using XX = x.T
Then I called set() on the result.
However, instead of obtaining
[-10 24 30]
I got the following [0 1 2 3 4]
So set() read the index instead of reading the first column
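A possible explanation, as a minimal sketch: iterating over a DataFrame yields its column labels, not its values, and after a transpose the column labels are the original row index. Selecting the column as a Series first avoids this. The DataFrame below is a reconstruction of the one in the question, so its exact layout is an assumption.

```python
import pandas as pd

# Assumed reconstruction of df1 from the question (default 0..4 row index).
df1 = pd.DataFrame([[-10, 2, 5], [24, 5, 10], [30, 3, 6], [30, 2, 1], [30, 4, 5]])

# A single column is a Series; unique() works on its values.
column = df1[0]
distinct = sorted(column.unique().tolist())
print(len(distinct), distinct)  # 3 [-10, 24, 30]

# Iterating a DataFrame yields column labels. After .T those labels are
# the original row labels, which is why set() appeared to read the index.
transposed = df1[[0]].T
labels = sorted(set(transposed))
print(labels)
```

set(df1[0]) also works, since iterating a Series yields its values.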
dict_with_series = {'Even':pd.Series([2,4,6,8,10]),'Odd':pd.Series([1,3,5,7,9])}
Data_frame_using_dic_Series = pd.DataFrame(dict_with_series)
# Data_frame_using_dic_Series = pd.DataFrame(dict_with_series, index=[1,2,3,4,5])  # gives NaN values, I don't know why
display(Data_frame_using_dic_Series)
I tried labeling the index, but when I do, it eliminates the first column and row and instead prints an extra column and row at the bottom with NaN values. Can anyone explain why it behaves like this? Have I done something wrong?
If I don't use the index argument, it works fine.
When you run:
Data_frame_using_dic_Series = pd.DataFrame(dict_with_series,index=[1,2,3,4,5])
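You ask pandas to use only the index labels 1-5 from the provided Series, but each Series is labelled from 0 by default, so the data is reindexed: the row labelled 0 is dropped, and the label 5, which does not exist in the Series, is filled with NaN.
If you want to change the index, do it afterwards: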
Data_frame_using_dic_Series = (pd.DataFrame(dict_with_series)
                               .set_axis([1, 2, 3, 4, 5])
                              )
Output:
   Even  Odd
1     2    1
2     4    3
3     6    5
4     8    7
5    10    9
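The reindexing described in this answer can be reproduced with a short sketch. The variable names reindexed and from_lists are made up for illustration. The second half shows an alternative that is not part of the answer above: plain lists carry no labels, so index= simply names the rows.

```python
import pandas as pd

dict_with_series = {
    "Even": pd.Series([2, 4, 6, 8, 10]),
    "Odd": pd.Series([1, 3, 5, 7, 9]),
}

# index= selects labels 1..5 from Series that are labelled 0..4:
# label 0 is dropped and label 5 has no data, so that row is NaN.
reindexed = pd.DataFrame(dict_with_series, index=[1, 2, 3, 4, 5])
print(reindexed)

# With plain lists there are no existing labels to align against,
# so the same index= argument only relabels the rows.
from_lists = pd.DataFrame(
    {"Even": [2, 4, 6, 8, 10], "Odd": [1, 3, 5, 7, 9]},
    index=[1, 2, 3, 4, 5],
)
print(from_lists)
```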
If I have multiple lists such that
hello = [1,3,5,7,9,11,13]
bye = [2,4,6,8,10,12,14]
and the user inputs 3
is there a way to get the output to go back 3 indexes in the list and start there to get:
9 10
11 12
13 14
with a tab (\t) between the two values on each line.
if the user would input 5
the expected output would be
5 6
7 8
9 10
11 12
13 14
I've tried
for i in range(user_input):
    print(hello[-i-1], '\t', bye[-i-1])
Just use negative indices that start at the end minus the user input (-user_input) and move towards the end (-1), something like:
for i in range(-user_input, 0):
    print(hello[i], bye[i], sep='\t')
Another zip solution, but one-lined:
for h, b in zip(hello[-user_input:], bye[-user_input:]):
    print(h, b, sep='\t')
This avoids converting the result of zip to a list, so the only temporaries are the slices of hello and bye. While iterating by index can avoid those temporaries, in practice it's almost always cleaner and faster to slice and iterate over the values, as repeated indexing is both unpythonic and surprisingly slow in CPython.
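One thing to watch with the slice-based solutions is an input of 0: hello[-0:] is the same as hello[0:], so it returns the whole list rather than nothing. A minimal sketch of a guard follows. The helper name last_pairs is hypothetical.

```python
hello = [1, 3, 5, 7, 9, 11, 13]
bye = [2, 4, 6, 8, 10, 12, 14]


def last_pairs(first, second, count):
    """Return the last `count` pairs from two lists (hypothetical helper)."""
    if count <= 0:
        # first[-0:] equals first[0:], i.e. the whole list, so handle 0 here.
        return []
    # A count larger than the lists simply yields every pair.
    return list(zip(first[-count:], second[-count:]))


for h, b in last_pairs(hello, bye, 3):
    print(h, b, sep='\t')
```

With an input of 3 this prints the pairs 9 10, 11 12 and 13 14, tab-separated.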
Use negative indexing in the slice.
hello = [1,3,5,7,9,11,13]
print(hello[-3:])
print(hello[-3:-2])
output
[9, 11, 13]
[9]
You can zip the two lists and use itertools.islice to obtain the desired portion of the output:
from itertools import islice
print('\n'.join(map(' '.join, islice(zip(map(str, hello), map(str, bye)), len(hello) - int(input()), len(hello)))))
Given an input of 5, this outputs:
5 6
7 8
9 10
11 12
13 14
You can use zip to build a list of tuples in which the i-th tuple holds the i-th element of each list, then use a negative slice to get the part you want:
zip_ = list(zip(hello, bye))
for item in zip_[-user_input:]:
    print(item[0], item[1], sep='\t')
If you want to analyze the data, I think using pandas.DataFrame may be helpful.
import pandas as pd
INPUT_INDEX = int(input('index='))
# Each list becomes one row, so each column holds one (hello, bye) pair.
df = pd.DataFrame([hello, bye])
# Keep only the last INPUT_INDEX columns.
df = df.iloc[:, len(df.columns)-INPUT_INDEX:]
for col in df.columns:
    h_value, b_value = df[col].values
    print(h_value, b_value)
Console:
index=3
9 10
11 12
13 14
I want to know if there is a math expression that I can use to find this relation between two numbers.
Some examples of the input and expected output are below:
Input  Multiple  Result
4      3         3
6      3         6
8      3         6
4      4         4
12     4         12
16     5         15
Also, the expressions below from Wolfram Alpha show me the expected result, but since they don't explain how the result is obtained, I can't learn from them:
Biggest multiple of 4 from 10
Biggest multiple of 4 from 12
Try the // and % operators!
With //, you would do
Result = (Input // Multiple) * Multiple
This way you get how many whole times Multiple fits into Input. That count is then multiplied by Multiple itself, which gives you the expected result.
EDIT: how to do it with modulo (%)?
Result = Input - (Input % Multiple)
(taken from MCO's answer)
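Both formulas follow from the identity Input = (Input // Multiple) * Multiple + (Input % Multiple). As a quick check, the sketch below runs them against the rows of the table in the question. The function names are made up for illustration.

```python
# Rows copied from the table in the question: (Input, Multiple, Result).
cases = [(4, 3, 3), (6, 3, 6), (8, 3, 6), (4, 4, 4), (12, 4, 12), (16, 5, 15)]


def by_floor_division(value, multiple):
    return (value // multiple) * multiple


def by_modulo(value, multiple):
    return value - (value % multiple)


for value, multiple, expected in cases:
    quotient, remainder = divmod(value, multiple)
    # The identity that links the two formulas.
    assert quotient * multiple + remainder == value
    assert by_floor_division(value, multiple) == expected
    assert by_modulo(value, multiple) == expected

print("all rows match")
```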
You can employ modulo for this. For example, to calculate the biggest multiple of 4 that is less than or equal to 13:
13 % 4 = 1
13 - 1 = 12
In Python, that could look like this:
def biggest_multiple(multiple_of, input_number):
    return input_number - input_number % multiple_of
So you use it as:
>>> biggest_multiple(4, 9)
8
>>> biggest_multiple(4, 12)
12
Here's how I would do it:
return int(input / multiple) * multiple
It truncates the division so that you get an integer, which you can multiply.
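For non-negative inputs of moderate size, this gives the same result as the // version. The two differ for negative inputs, because int() truncates toward zero while // rounds down, and for very large integers, because / goes through a float. A minimal sketch of the difference, with made-up function names:

```python
def floor_multiple(value, multiple):
    """Largest multiple of `multiple` that is <= value (for multiple > 0)."""
    return (value // multiple) * multiple


def truncated_multiple(value, multiple):
    """Same idea via float division; int() truncates toward zero."""
    return int(value / multiple) * multiple


print(floor_multiple(16, 5), truncated_multiple(16, 5))  # 15 15
print(floor_multiple(-7, 3), truncated_multiple(-7, 3))  # -9 -6

# Above 2**53 a float can no longer represent every integer exactly.
big = 10**17 + 1
print(floor_multiple(big, 1) == big)      # True
print(truncated_multiple(big, 1) == big)  # False
```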
This may be trivial, but it is very easy to understand. It also takes into account a multiple that is negative or zero:
Multiple=[3,3,3,4,4,5,0,-5]
Input=[4,6,8,4,12,16,1,8]
Result=[]
for input,multiple in zip(Input,Multiple):
    if(multiple):
        Result.append((range(multiple,input+1,abs(multiple)))[-1])
    else:
        Result.append(0)
print(Result)
Output:
[3, 6, 6, 4, 12, 15, 0, 5]
I am running a Python script (Kaggle script). It works in a 3.4.5 virtualenv, but not in 3.5.2
I am not sure why and I am not familiar with the [[0]] syntax. Below is the snippet.
import pandas as pd
data = pd.read_csv(r'path\train.csv')
labels_flat = data[[0]].values.ravel()
It should produce a list of values from the csv's first column.
In 3.5.2 I get this error:
KeyError: '[0] not in index'
I tried to replicate the value with
labels_flat = []
lf = data.values.tolist()
for row in lf:
    labels_flat.append(row[0])
But I don't think it is the same thing.
I don't think the problem is with the syntax; your DataFrame just does not contain the key you are looking for.
For me this works:
In [1]: data = pd.DataFrame({0:[1,2,3], 1:[4,5,6], 2:[7,8,9]})
In [2]: data[[0]]
Out[2]:
   0
0  1
1  2
2  3
I think what confuses you about the [[0]] syntax is that square brackets are used in Python for two completely different things, and the [[0]] expression uses both:
A. [] is used to create a list. In the above example, [0] creates a list with the single element 0.
B. [] is also used to access an element of a list (or dict, ...). So data[0] returns the element of data at index 0.
The next confusing thing is that while ordinary Python lists are indexed by numbers (e.g. data[4] is the element at index 4), pandas DataFrames can also be indexed by lists. This is syntactic sugar for accessing multiple columns of the DataFrame at once.
So in my example above, to get columns 0 and 1 you can do:
In [3]: data[[0, 1]]
Out[3]:
   0  1
0  1  4
1  2  5
2  3  6
Here the inner [0, 1] creates a list with the elements 0 and 1. The outer [ ] retrieves the columns of the DataFrame, using the inner list as the key.
For readability, look at this; it is exactly the same:
In [4]: l = [0, 1]
In [5]: data[l]
Out[5]:
   0  1
0  1  4
1  2  5
2  3  6
If you only want the first column (column 0) you get this:
In [6]: data[[0]]
Out[6]:
   0
0  1
1  2
2  3
Which is exactly what you were looking for.
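If the CSV has a header row, the columns are named by strings rather than by the integer 0, and data[[0]] then looks for a column literally labelled 0. A minimal sketch of selecting the first column by position instead follows. The column names below are an assumption, since the contents of train.csv are not shown in the question.

```python
import pandas as pd

# Hypothetical stand-in for train.csv: a header row gives string column names.
data = pd.DataFrame({"label": [1, 0, 4], "pixel0": [0, 0, 0]})

# Selecting by position does not depend on what the column is called.
labels_flat = data.iloc[:, 0].values
print(labels_flat.tolist())  # [1, 0, 4]

# Selecting by the label 0 fails because no column has that name.
try:
    data[[0]]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)
```

data[[data.columns[0]]].values.ravel() would be a label-based equivalent that keeps the original [[...]] form.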
I have some lists such as
list1 = ['hi',2,3,4]
list2 = ['hello', 7,1,8]
list3 = ['morning',7,2,1]
Where 'hi', 'hello' and 'morning' are strings, while the rest are numbers.
However, when I try to stack them up as:
matrix = np.vstack((list1,list2,list3))
the numbers become strings. In particular, they become numpy.str_.
How do I solve this? I tried replacing the items and changing their type, but nothing works.
edit
I made a mistake above! In my original problem, the first list is actually a list of headings, so for example
list1 = ['hi', 'number of hours', 'number of days', 'ideas']
So the first column (in the vertically stacked array) is a column of strings. The other columns have a string as their first element and then numbers.
You could use Pandas DataFrames, they allow for heterogeneous data:
>>> pandas.DataFrame([list1, list2, list3])
         0  1  2  3
0       hi  2  3  4
1    hello  7  1  8
2  morning  7  2  1
If you want to name the columns, you can do that too:
>>> pandas.DataFrame([list1, list2, list3], columns=list0)
        hi  nb_hours  nb_days  ideas
0       hi         2        3      4
1    hello         7        1      8
2  morning         7        2      1
Since numbers can be written as strings, but strings cannot be written as numbers, NumPy gives every element of your matrix the string type.
If you want to have a matrix of integers, you can:
1- Extract the submatrix corresponding to your numbers and then convert it to integers
2- Or directly extract only the numbers from your lists and stack them.
import numpy as np
list1 = ['hi',2,3,4]
list2 = ['hello', 7,1,8]
list3 = ['morning',7,2,1]
matrix = np.vstack((list1,list2,list3))
# First
m = matrix[:,1:].astype(np.int32)
# [[2 3 4] [7 1 8] [7 2 1]]
# Second
m = np.vstack((list1[1:],list2[1:],list3[1:]))
# [[2 3 4] [7 1 8] [7 2 1]]
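To see the type coercion directly, here is a minimal sketch that inspects the dtype of the stacked array and converts the numeric part back. The last part, an object array, is an alternative not mentioned above: it keeps the original Python objects, at the cost of losing fast numeric operations.

```python
import numpy as np

list2 = ['hello', 7, 1, 8]
list3 = ['morning', 7, 2, 1]

# Mixed strings and ints are coerced to one common type: Unicode strings.
stacked = np.vstack((list2, list3))
print(stacked.dtype.kind)  # U

# Convert only the numeric columns back to integers.
numbers = stacked[:, 1:].astype(np.int32)
print(numbers.tolist())  # [[7, 1, 8], [7, 2, 1]]

# dtype=object stores the original Python objects unchanged.
mixed = np.array([list2, list3], dtype=object)
print(type(mixed[0, 1]).__name__)  # int
```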
edit (Answer to comment)
I'll call the title list list0:
list0 = ['hi', 'nb_hours', 'nb_days', 'ideas']
It's basically the same idea:
1- Stack everything, then extract the submatrix (here we skip both the first row and the first column: [1:,1:])
matrix = np.vstack((list0,list1,list2,list3))
matrix_nb = matrix[1:,1:].astype(np.int32)
2- Leave list0 out of the stack and stack all the other lists (without their first element, [1:]):
m = np.vstack((list1[1:],list2[1:],list3[1:]))