Adding values from for-loop into an array - python

I'm trying to get the length of the values of the states array into a separate array then sort them by descending order, but I'm having trouble getting all the length values of the string into the array instead of having a single value after the iteration.
states = ["Abia", "Adamawa", "Anambra", "Akwa Ibom", "Bauchi", "Bayelsa", "Benue", "Borno", "Cross River", "Delta", "Ebonyi", "Enugu", "Edo", "Ekiti", "Gombe", "Imo", "Jigawa", "Kaduna", "Kano", "Katsina", "Kebbi", "Kogi", "Kwara", "Lagos", "Nasarawa", "Niger", "Ogun", "Ondo", "Osun", "Oyo", "Plateau", "Rivers", "Sokoto", "Taraba", "Yobe", "Zamfara"]
for i in states:
a = [len(i)]
print(a)

Since you want the lengths sorted in descending order, use sorted with reverse=True and list comprehension
states = ["Abia", "Adamawa", "Anambra", "Akwa Ibom", "Bauchi", "Bayelsa", "Benue", "Borno", "Cross River", "Delta", "Ebonyi", "Enugu", "Edo", "Ekiti", "Gombe", "Imo", "Jigawa", "Kaduna", "Kano", "Katsina", "Kebbi", "Kogi", "Kwara", "Lagos", "Nasarawa", "Niger", "Ogun", "Ondo", "Osun", "Oyo", "Plateau", "Rivers", "Sokoto", "Taraba", "Yobe", "Zamfara"]
a = sorted([len(i) for i in states], reverse=True)
print (a)
Output
[11, 9, 8, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3]
To get the indices of the sorted list without resorting to NumPy arrays, there are many ways: see here. I personally prefer to directly make use of NumPy's argsort. As the name suggests, it returns an array of indices corresponding to the sorted array/list in ascending order. To get the indices for descending order, you can just reverse the array returned by argsort by using [::-1]. Following is a solution to your problem:
import numpy as np
states = ["Abia", "Adamawa", "Anambra", "Akwa Ibom", "Bauchi", "Bayelsa", "Benue", "Borno", "Cross River", "Delta", "Ebonyi", "Enugu", "Edo", "Ekiti", "Gombe", "Imo", "Jigawa", "Kaduna", "Kano", "Katsina", "Kebbi", "Kogi", "Kwara", "Lagos", "Nasarawa", "Niger", "Ogun", "Ondo", "Osun", "Oyo", "Plateau", "Rivers", "Sokoto", "Taraba", "Yobe", "Zamfara"]
a = [len(i) for i in states]
indices_sorted = np.argsort(a)[::-1] # [::-1] gives you indices for decreasing order
Output
array([ 8, 3, 24, 35, 19, 1, 2, 30, 5, 4, 10, 16, 17, 33, 32, 31, 22,
13, 6, 7, 9, 11, 14, 25, 23, 20, 21, 26, 27, 34, 28, 18, 0, 12,
15, 29])
Now as you can see, the first index in the above output is 8 which means the 9th element of states which is Cross River. Similarly you can access and verify the other elements.

You can use a list comprehension:
lengths = [len(state) for state in states]
If you need to use a for loop, create a list and append to it:
lengths = []
for i in states:
lengths.append(len(i))

You can also do this using the map function without using a for loop:
a = list(map(len,states))

Through generator:
lens = [len(a) for a in states]

Related

Add new element in the next sublist depending in if it has been added or not (involves also a dictionary problem) python

Community of Stackoverflow:
I'm trying to create a list of sublists with a loop based on a random sampling of values of another list; and each sublist has the restriction of not having a duplicate or a value that has already been added to a prior sublist.
Let's say (example) I have a main list:
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
#I get:
[[1,13],[4,1],[8,13]]
#I WANT:
[[1,13],[4,9],[8,14]] #(no duplicates when checking previous sublists)
The real code that I thought it would work is the following (as a draft):
matrixvals=list(matrix.index.values) #list where values are obtained
lists=[[]for e in range(0,3)] #list of sublists that I want to feed
vls=[] #stores the values that have been added to prevent adding them again
for e in lists: #initiate main loop
for i in range(0,5): #each sublist will contain 5 different random samples
x=random.sample(matrixvals,1) #it doesn't matter if the samples are 1 or 2
if any(x) not in vls: #if the sample isn't in the evaluation list
vls.extend(x)
e.append(x)
else: #if it IS, then do a sample but without those already added values (line below)
x=random.sample([matrixvals[:].remove(x) for x in vls],1)
vls.extend(x)
e.append(x)
print(lists)
print(vls)
It didn't work as I get the following:
[[[25], [16], [15], [31], [17]], [[4], [2], [13], [42], [13]], [[11], [7], [13], [17], [25]]]
[25, 16, 15, 31, 17, 4, 2, 13, 42, 13, 11, 7, 13, 17, 25]
As you can see, number 13 is repeated 3 times, and I don't understand why
I would want:
[[[25], [16], [15], [31], [17]], [[4], [2], [13], [42], [70]], [[11], [7], [100], [18], [27]]]
[25, 16, 15, 31, 17, 4, 2, 13, 42, 70, 11, 7, 100, 18, 27] #no dups
In addition, is there a way to convert the sample.random results as values instead of lists? (to obtain):
[[25,16,15,31,17]], [4, 2, 13, 42,70], [11, 7, 100, 18, 27]]
Also, the final result in reality isn't a list of sublists, actually is a dictionary (the code above is a draft attempt to solve the dict problem), is there a way to obtain that previous method in a dict? With my present code I got the next results:
{'1stkey': {'1stsubkey': {'list1': [41,
40,
22,
28,
26,
14,
41,
15,
40,
33],
'list2': [41, 40, 22, 28, 26, 14, 41, 15, 40, 33],
'list3': [41, 40, 22, 28, 26, 14, 41, 15, 40, 33]},
'2ndsubkey': {'list1': [21,
7,
31,
12,
8,
22,
27,...}
Instead of that result, I would want the following:
{'1stkey': {'1stsubkey': {'list1': [41,40,22],
'list2': [28, 26, 14],
'list3': [41, 15, 40, 33]},
'2ndsubkey': {'list1': [21,7,31],
'list2':[12,8,22],
'list3':[27...,...}#and so on
Is there a way to solve both list and dict problem? Any help will be very appreciated; I can made some progress even only with the list problem
Thanks to all
I realize you may be more interested in finding out why your particular approach isn't working. However, if I've understood your desired behavior, I may be able to offer an alternative solution. After posting my answer, I will take a look at your attempt.
random.sample lets you sample k number of items from a population (collection, list, whatever.) If there are no repeated elements in the collection, then you're guaranteed to have no repeats in your random sample:
from random import sample
pool = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
num_samples = 4
print(sample(pool, k=num_samples))
Possible output:
[9, 11, 8, 7]
>>>
It doesn't matter how many times you run this snippet, you will never have repeated elements in your random sample. This is because random.sample doesn't generate random objects, it just randomly picks items which already exist in a collection. This is the same approach you would take when drawing random cards from a deck of cards, or drawing lottery numbers, for example.
In your case, pool is the pool of possible unique numbers to choose your sample from. Your desired output seems to be a list of three lists, where each sublist has two samples in it. Rather than calling random.sample three times, once for each sublist, we should call it once with k=num_sublists * num_samples_per_sublist:
from random import sample
pool = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
num_sublists = 3
samples_per_sublist = 2
num_samples = num_sublists * samples_per_sublist
assert num_samples <= len(pool)
print(sample(pool, k=num_samples))
Possible output:
[14, 10, 1, 8, 6, 3]
>>>
OK, so we have six samples rather than four. No sublists yet. Now you can simply chop this list of six samples up into three sublists of two samples each:
from random import sample
pool = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
num_sublists = 3
samples_per_sublist = 2
num_samples = num_sublists * samples_per_sublist
assert num_samples <= len(pool)
def pairwise(iterable):
yield from zip(*[iter(iterable)]*samples_per_sublist)
print(list(pairwise(sample(pool, num_samples))))
Possible output:
[(4, 11), (12, 13), (8, 15)]
>>>
Or if you really want sublists, rather than tuples:
def pairwise(iterable):
yield from map(list, zip(*[iter(iterable)]*samples_per_sublist))
EDIT - just realized that you don't actually want a list of lists, but a dictionary. Something more like this? Sorry I'm obsessed with generators, and this isn't really easy to read:
keys = ["1stkey"]
subkeys = ["1stsubkey", "2ndsubkey"]
num_lists_per_subkey = 3
num_samples_per_list = 5
num_samples = num_lists_per_subkey * num_samples_per_list
min_sample = 1
max_sample = 50
pool = list(range(min_sample, max_sample + 1))
def generate_items():
def generate_sub_items():
from random import sample
samples = sample(pool, k=num_samples)
def generate_sub_sub_items():
def chunkwise(iterable, n=num_samples_per_list):
yield from map(list, zip(*[iter(iterable)]*n))
for list_num, chunk in enumerate(chunkwise(samples), start=1):
key = f"list{list_num}"
yield key, chunk
for subkey in subkeys:
yield subkey, dict(generate_sub_sub_items())
for key in keys:
yield key, dict(generate_sub_items())
print(dict(generate_items()))
Possible output:
{'1stkey': {'1stsubkey': {'list1': [43, 20, 4, 27, 2], 'list2': [49, 44, 18, 8, 37], 'list3': [19, 40, 9, 17, 6]}, '2ndsubkey': {'list1': [43, 20, 4, 27, 2], 'list2': [49, 44, 18, 8, 37], 'list3': [19, 40, 9, 17, 6]}}}
>>>

Split a numpy array with several sorted sequences

I have a large numpy array (typically a few thousands of numbers) that is consisted of several sorted sequences,
for example:
arr = [12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11]
I would like to split it into subarrays - each one holds another sequence -
[12, 13, 14], [22, 23, 24, 25, 26], [9, 10, 11]
What is the fastest way to do that?
I would do it following way
import numpy as np
arr = np.array([12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11])
splits = np.flatnonzero(np.diff(arr)!=1)
sub_arrs = np.split(arr, splits+1)
print(sub_arrs)
output
[array([12, 13, 14]), array([22, 23, 24, 25, 26]), array([ 9, 10, 11])]
Explanation: I create array with differences between adjacent elements using numpy.diff (np.diff(arr)) then process it to get array with Trues where difference is 1 and Falses in every other case (np.diff(arr)!=1) then find indices of Trues in that array using np.flatnonzero (True is treated as 1 and False is treated as 0 in python) finally I use numpy.split to get list of subarrays made from arr at spllited at splits offseted by 1 (note that numpy.diff returns array which is shorter by 1 than its input).
Side note: I would call this finding sub-arrays with consecutive runs, rather than merely sorted as you might split your arr into [[12, 13, 14, 22, 23, 24, 25, 26], [9, 10, 11]] and full-fill requirement that every sub-array is sorted
First of all, the problem could be really complex, but based on your example I assume that the values in subarrays are increasing by 1.
Here is a one liner solution with plain numpy: np.array_split(a, np.where(np.diff(a) != 1)[0]+1)
Explanation: You can calculate the difference between consecutive values with np.diff.
>>> import numpy as np
>>> a
array([12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11])
>>> np.diff(a)
array([ 1, 1, 8, 1, 1, 1, 1, -17, 1, 1])
Then, get the indices of the values that represents the last element of the subarrays, that is the values that do no equal 1.
>>> np.where(np.diff(a) != 1)
(array([2, 7]),)
Finally, we add 1 to the boundaries to be able to use np.array_split() correctly to generate the subarrays.
>>> np.where(np.diff(a) != 1)[0]+1
array([3, 8])
>>> np.array_split(a, np.where(np.diff(a) != 1)[0]+1)
[array([12, 13, 14]), array([22, 23, 24, 25, 26]), array([ 9, 10, 11])]

How do i reference values from various ranges within a list?

What I want to do is reference several different ranges from within a list, i.e. I want the 4-6th elements, the 12 - 18th elements, etc. This was my initial attempt:
test = theList[4:7, 12:18]
Which I would expect to give do the same thing as:
test = theList[4,5,6,12,13,14,15,16,17]
But I got a syntax error. What is the best/easiest way to do this?
You can add the two lists.
>>> theList = list(range(20))
>>> theList[4:7] + theList[12:18]
[4, 5, 6, 12, 13, 14, 15, 16, 17]
You can also use itertools module :
>>> from itertools import islice,chain
>>> theList=range(20)
>>> list(chain.from_iterable(islice(theList,*t) for t in [(4,7),(12,18)]))
[4, 5, 6, 12, 13, 14, 15, 16, 17]
Note that since islice returns a generator in each iteration it performs better than list slicing in terms of memory use.
Also you can use a function for more indices and a general way .
>>> def slicer(iterable,*args):
... return chain.from_iterable(islice(iterable,*i) for i in args)
...
>>> list(slicer(range(40),(2,8),(10,16),(30,38)))
[2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15, 30, 31, 32, 33, 34, 35, 36, 37]
Note : if you want to loop over the result you don't need convert the result to list!
You can add the two lists as #Bhargav_Rao stated. More generically, you can also use a list generator syntax:
test = [theList[i] for i in range(len(theList)) if 4 <= i <= 7 or 12 <= i <= 18]

how to exclude elements from numpy matrix

Suppose we have a matrix:
mat = np.random.randn(5,5)
array([[-1.3979852 , -0.37711369, -1.99509723, -0.6151796 , -0.78780951],
[ 0.12491113, 0.90526669, -0.18217331, 1.1252506 , -0.31782889],
[-3.5933008 , -0.17981343, 0.91469733, -0.59719805, 0.12728085],
[ 0.6906646 , 0.2316733 , -0.2804641 , 1.39864598, -0.09113139],
[-0.38012856, -1.7230821 , -0.5779237 , 0.30610451, -1.30015299]])
Suppose also that we have an index array:
idx = np.array([0,4,3,1,3])
While we can extract elements from the matrix using the following:
mat[idx, range(len(idx))]
array([-1.3979852 , -1.7230821 , -0.2804641 , 1.1252506 , -0.09113139])
What I want to know is how we can use the index to exclude elements from matrix, i.e. how do I obtain the following result:
array([[0.12491113 , -0.37711369, -1.99509723, -0.6151796 , -0.78780951],
[-3.5933008 , 0.90526669, -0.18217331, -0.59719805, -0.31782889],
[0.6906646 , -0.17981343, 0.91469733, 1.39864598, 0.12728085],
[-0.38012856, 0.2316733 , -0.5779237 , 0.30610451, -1.30015299]])
Thought it would be as simple as doing mat[-idx, range(len(idx))] but that doesn't work. I've also tried np.delete() but that doesn't seem to do it either. Any solutions out there that don't require looping or list comprehensions? Would appreciate any insight. Thanks.
EDIT: data must be in the same columns post processing.
When you say 'delete' does not work, what do you mean? What does it do? That might be diagnostic.
Lets first look at the selection that does work:
In [484]: mat=np.arange(25).reshape(5,5) # I like this better than random
In [485]: mat[idx,range(5)]
Out[485]: array([ 0, 21, 17, 8, 19])
this can also be used on a flattened version of the file:
In [486]: mat.flat[idx*5+np.arange(5)]
Out[486]: array([ 0, 21, 17, 8, 19])
now try the same with the default flat delete:
In [487]: np.delete(mat,idx*5+np.arange(5)).reshape(5,4)
Out[487]:
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 9],
[10, 11, 12, 13],
[14, 15, 16, 18],
[20, 22, 23, 24]])
delete isn't an inplace operator; it returns a new matrix. And if you specify an axis, delete removes whole rows or columns, not selected items.
mat[-idx, range(len(idx))] isn't going to work since negative indexes already have a meaning - count from the end.
This delete ends up doing boolean indexing, thus:
In [498]: mat1=mat.ravel()
In [499]: idx1=idx*5+np.arange(5)
In [500]: ii=np.ones(mat1.shape, bool)
In [501]: ii[idx1]=False
In [502]: mat1[ii]
Out[502]:
array([ 1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24])
This sort of indexing/delete works even if you delete a different number of items from each row. Of course in that case you couldn't count on reshaping the matrix back to a rectangular matrix.
In general when dealing with different indexes for different rows, the operation ends up acting on the flat or raveled version of the matrix. 'Irregular' operations usually make more sense when dealing with 1d arrays than with 2d.
Looking more carefully at your example, I see that when you remove an item, you move the other column values up to fill the gap. In my version, I moved values along rows. Let's try this with F ordered.
In [523]: mat2=mat.flatten('F')
In [524]: np.delete(mat2,idx2).reshape(5,4).T
Out[524]:
array([[ 5, 1, 2, 3, 4],
[10, 6, 7, 13, 9],
[15, 11, 12, 18, 14],
[20, 16, 22, 23, 24]])
where I removed a value from each column:
In [525]: mat2[idx2]
Out[525]: array([ 0, 21, 17, 8, 19])

Intersection between all elements of same list where elements are set [duplicate]

This question already has answers here:
Best way to find the intersection of multiple sets?
(7 answers)
Closed 8 years ago.
I have a list-
list_of_sets = [{0, 1, 2}, {0}]
I want to calculate the intersection between the elements of the list. I have thought about this solution:
a = list_of_sets[0]
b = list_of_sets[1]
c = set.intersection(a,b)
This solution works as i know the number of the elements of the list. (So i can declare as many as variable i need like a,b etc.)
My problem is that i can't figure out a solution for the other case, where the number of the elements of the list is unknown.
N.B: the thought of counting the number of elements of the list using loop and than creating variables according to the result has already been checked. As i have to keep my code in a function (where the argument is list_of_sets), so i need a more generalized solution that can be used for any numbered list.
Edit 1:
I need a solution for all the elements of the list. (not pairwise or for 3/4 elements)
If you wanted the intersection between all elements of all_sets:
intersection = set.intersection(*all_sets)
all_sets is a list of sets. the set is the set type.
For pairwise calculations,
This calculates intersections of all unordered pairs of 2 sets from a list all_sets. Should you need for 3, then use 3 as the argument.
from itertools import combinations, starmap
all_intersections = starmap(set.intersection, combinations(all_sets, 2))
If you did need the sets a, b for calculations, then:
for a, b in combinations(all_sets, 2):
# do whatever with a, b
You want the intersection of all the set. Then:
list_of_sets[0].intersection(*list_of_sets[1:])
Should work.
Take the first set from the list and then intersect it with the rest (unpack the list with the *).
You can use reduce for this. If you're using Python 3 you will have to import it from functools. Here's a short demo:
#!/usr/bin/env python
n = 30
m = 5
#Find sets of numbers i: 1 <= i <= n that are coprime to each number j: 2 <= j <= m
list_of_sets = [set(i for i in range(1, n+1) if i % j) for j in range(2, m+1)]
print 'Sets in list_of_sets:'
for s in list_of_sets:
print s
print
#Get intersection of all the sets
print 'Numbers less than or equal to %d that are coprime to it:' % n
print reduce(set.intersection, list_of_sets)
output
Sets in list_of_sets:
set([1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29])
set([1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 20, 22, 23, 25, 26, 28, 29])
set([1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15, 17, 18, 19, 21, 22, 23, 25, 26, 27, 29, 30])
set([1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14, 16, 17, 18, 19, 21, 22, 23, 24, 26, 27, 28, 29])
Numbers less than or equal to 30 that are coprime to it:
set([1, 7, 11, 13, 17, 19, 23, 29])
Actually, we don't even need reduce() for this, we can simply do
set.intersection(*list_of_sets)

Categories

Resources