DataFrame of frequencies by list comprehension? - python

I'm trying to build a pandas DataFrame of chromatic frequencies between A1 (55Hz) and A8 (7040Hz). Essentially, I want it to look like this...
df = pd.DataFrame(columns=['A', 'A#', 'B', 'C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#'])
df.loc[0] = (55, 58.27, 61.74, 32.7, 34.65, 36.71, 38.89, 41.2, 43.65, 46.25, 49, 51.91)
But without having to manually assign all the frequencies to their respective notes and with an octave per row (octave 1 to 8).
Based on the site http://pages.mtu.edu/~suits/notefreqs.html, the space between each note (or a 'half-step') given a single note is...
def hz_stepper(fixed_note, steps):
    a = 2 ** (1/12)
    return fixed_note * a ** steps
Using that function hz_stepper, I can chromatically raise or lower a given note n times by passing 1 or -1 as the steps argument.
My question is, how do I create a DataFrame where all the rows look like how I did it manually, but using a list comprehension to form the rows?
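For reference, one octave row built with hz_stepper and a plain list comprehension might look like this (a sketch of the row-building step, not the full DataFrame):

```python
def hz_stepper(fixed_note, steps):
    a = 2 ** (1 / 12)
    return fixed_note * a ** steps

# One octave starting at A1 = 55 Hz: twelve half-steps up from the base note
row = [hz_stepper(55, steps) for steps in range(12)]
print([round(f, 2) for f in row])
```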

Just compute all the pitches at once and reshape the result afterwards:
import numpy as np
import pandas as pd
base = 55.
n_octave = 8
columns = ['A', 'A#', 'B', 'C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#']
factors = 2**(np.arange(12 * n_octave) / 12.)
pd.DataFrame(data=base * factors.reshape((n_octave, 12)), columns=columns)
Explanation
factors are the desired frequencies as 1d numpy array, but they are not in the tabular form required for the DataFrame. reshape creates a view of the array content, that has shape (n_octave, 12) such that rows are contiguous. E.g.
>>> np.arange(6).reshape((2, 3))
array([[0, 1, 2],
       [3, 4, 5]])
This is just the format needed for the DataFrame.
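As a quick sanity check against the note-frequency table, row 3 (the fourth octave when row 0 is octave 1) should start at A4 = 440 Hz:

```python
import numpy as np
import pandas as pd

base = 55.
n_octave = 8
columns = ['A', 'A#', 'B', 'C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#']
factors = 2 ** (np.arange(12 * n_octave) / 12.)
df = pd.DataFrame(data=base * factors.reshape((n_octave, 12)), columns=columns)

# Each row doubles the previous one, so row 3 starts at 55 * 2**3 = 440 Hz
print(df.loc[3, 'A'])
```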

Starting from your beginning:
import numpy as np
df = pd.DataFrame(columns=['A', 'A#', 'B', 'C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#'])
df.loc[0] = 55 * 2 ** (np.arange(12) / 12)
for i in range(7):  # doubling each row gives the next octave, for octaves 1-8
    df.loc[i + 1] = 2 * df.loc[i]

Related

How to efficiently filter two-dimensional np array by values given in a list (by many values)

I have a two-dimensional np array and I need to efficiently filter it by values given in a list.
b = np.array([['a', 'b', 'c', 'd'], ['b', 'a', 'c', 'd'], ['c', 'b', 'a', 'd'], ['a', 'd', 'c', 'b']])
values_to_stay_in_b = ['a', 'b']
I found a solution using set difference, but the positions in array b are important.
Is there any better solution than a simple list comprehension, as below:
output = []
for l in b:
    output.append([a for a in l if a in values_to_stay_in_b])
np.array(output)
Result:
array([['a', 'b'],
       ['b', 'a'],
       ['b', 'a'],
       ['a', 'b']], dtype='<U1')
Using pure NumPy, so at least removing the for loops.
For each entry in b, compare it with each entry in values_to_stay_in_b to get a boolean mask. This is done by adding an extra axis and comparing using broadcasting.
Any match along that new axis needs to be true.
Since you clarify that after filtering each row has the same number of columns, the result is reshaped based on the number of rows:
b[(b[..., None] == values_to_stay_in_b).any(axis=2)].reshape(b.shape[0], -1)
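To see the broadcasting at work, the intermediate mask from the answer above can be printed on its own (reusing the arrays from the question):

```python
import numpy as np

b = np.array([['a', 'b', 'c', 'd'],
              ['b', 'a', 'c', 'd'],
              ['c', 'b', 'a', 'd'],
              ['a', 'd', 'c', 'b']])
values_to_stay_in_b = ['a', 'b']

# b[..., None] has shape (4, 4, 1); comparing it to the length-2 list
# broadcasts to (4, 4, 2): one boolean per (cell, candidate value) pair.
mask = (b[..., None] == values_to_stay_in_b).any(axis=2)
print(mask)

# The mask selects the kept cells in row order; reshape restores 2-D form.
print(b[mask].reshape(b.shape[0], -1))
```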

Return a list with dataframe column values ordered based on another list

I have a df with columns a-h, and I wish to create a list of these column values, but in the order of values in another list (list1). list1 corresponds to the index value in df.
df
a b c d e f g h
list1
[3,1,0,5,2,7,4,6]
Desired list
['d', 'b', 'a', 'f', 'c', 'h', 'e', 'g']
You can just do df.columns[list1]:
import pandas as pd
df = pd.DataFrame([], columns=list('abcdefgh'))
list1 = [3,1,0,5,2,7,4,6]
print(df.columns[list1])
# Index(['d', 'b', 'a', 'f', 'c', 'h', 'e', 'g'], dtype='object')
First get a np.array of letters:
import numpy as np
arr = np.array(list('abcdefgh'))
Or, in your case, an array of your df columns:
arr = np.array(df.columns)
Then use your indices as an indexing mask:
arr[[3, 1, 0]]
out:
array(['d', 'b', 'a'], dtype='<U1')
Check
df.columns.to_series()[list1].tolist()

Split a list into various slices and concatenate it

I have seen similar questions to mine, but nothing I researched really fixed my issue.
So, basically I want to split a list, in order to remove some items and concatenate it back. Those items correspond to indexes that are given by a list of tuples.
import numpy as np
arr = ['x','y','z','a','b','c','d','e','f','g',2,3,4]
indices = [(2,4),(7,9)] #INDEXES THAT NEED TO BE CUT OUT
print([arr[0:s] + arr[s+1:e] for s, e in indices])
#Returns: [['x', 'y', 'a'], ['x', 'y', 'z', 'a', 'b', 'c', 'd', 'f']]
This code, which I got from one of the answers on this post, nearly does what I need, but when I tried to adapt it to loop over the indices once, it instead runs over the list twice and doesn't include the rest of the list.
I want my final list to split from zero index to the first item on first tuple and so on, using a for loop or some iterator.
Something like this,
final_arr = arr[0:indices[0][0]] + arr[indices[0][1]:indices[1][0]] + arr[indices[1][1]:]
#Returns: ['x', 'y', 'a', 'b', 'c', 'f', 'g', 2, 3, 4]
If someone could do it using for loops, it would be easier for me to see how you understand the problem, then after I can try to adapt to using shorter code.
Sort the indices using sorted and del the slices. You need reverse=True otherwise the indices of the later slices are incorrect.
for x, y in sorted(indices, reverse=True):
    del arr[x:y]
print(arr)
>>> ['x', 'y', 'b', 'c', 'd', 'g', 2, 3, 4]
This is the same result as you get with
print(arr[0:indices[0][0]] + arr[indices[0][1]:indices[1][0]] + arr[indices[1][1]:])
>>> ['x', 'y', 'b', 'c', 'd', 'g', 2, 3, 4]
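Wrapped in a small helper (the name delete_slices is mine, not from the question), the same delete-from-the-right idea handles any number of slices:

```python
def delete_slices(lst, slices):
    """Remove the half-open slices [s, e) from lst, in place.

    Deleting from the right first keeps the earlier indices valid.
    """
    for s, e in sorted(slices, reverse=True):
        del lst[s:e]
    return lst

arr = ['x', 'y', 'z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 2, 3, 4]
print(delete_slices(arr, [(2, 4), (7, 9)]))
# ['x', 'y', 'b', 'c', 'd', 'g', 2, 3, 4]
```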
arr = ['x','y','z','a','b','c','d','e','f','g',2,3,4]
indices = [(2,4),(7,9)] #INDEXES THAT NEED TO BE CUT OUT
import itertools
ignore = set(itertools.chain.from_iterable(map(lambda i: range(*i), indices)))
out = [c for idx, c in enumerate(arr) if idx not in ignore]
print(out)
print(arr[0:indices[0][0]] + arr[indices[0][1]:indices[1][0]] + arr[indices[1][1]:])
Output:
['x', 'y', 'b', 'c', 'd', 'g', 2, 3, 4]
['x', 'y', 'b', 'c', 'd', 'g', 2, 3, 4]
Like this:
import numpy as np
arr = ['x','y','z','a','b','c','d','e','f','g',2,3,4]
indices = [(2,4),(7,9)] #INDEXES THAT NEED TO BE CUT OUT
print([v for t in indices for i, v in enumerate(arr) if i not in range(t[0], t[1])])
Output:
['x', 'y', 'b', 'c', 'd', 'e', 'f', 'g', 2, 3, 4, 'x', 'y', 'z', 'a', 'b', 'c', 'd', 'g', 2, 3, 4]
Note that putting the tuples in the outer loop filters the whole list once per tuple and concatenates the results, so this does not give the single filtered list you want.
1- If you can remove the list items:
I'm using the example from JimithyPicker. I changed the index list (removed items), because every time an index is removed, the size of the list changes.
arr = ['x','y','z','a','b','c','d','e','f','g',2,3,4]
indices = [2, 5, 5] #INDEXES THAT NEED TO BE CUT OUT
for index in indices:
    arr.pop(index)
final_arr = [arr]
print(final_arr)
Output:
[['x', 'y', 'a', 'b', 'c', 'f', 'g', 2, 3, 4]]
2- If you can't remove items:
In this case it is necessary to change the second index of each tuple; the numbers don't match the output you want.
The indices = [(2,4),(7,9)] give the output: ['x', 'y', 'a', 'b', 'c', 'd', 'f', 'g', 2, 3, 4]
arr = ['x','y','z','a','b','c','d','e','f','g',2,3,4]
indices = [(2,4),(6,9)] #INDEXES THAT NEED TO BE CUT OUT
final_arr = arr[0:indices[0][0]] + arr[indices[0][1]-1:indices[1][0]] + arr[indices[1][1]-1:]
print(final_arr)
Output:
['x','y','a','b','c','f','g',2,3,4]

Is there a pandas method to do the opposite of "pandas.factorize" on dataframe columns?

Is there any pandas method to unfactor a dataframe column? I could not find any in the documentation, but was expecting something similar to unfactor in R language.
I managed to come up with the following code, for reconstructing the column (assuming none of the column values are missing), by using the labels array values as indices of uniques.
import numpy as np
import pandas as pd

orig_col = ['b', 'b', 'a', 'c', 'b']
labels, uniques = pd.factorize(orig_col)
recon_col = np.array([uniques[label] for label in labels]).tolist()
orig_col == recon_col  # True
orig_col = ['b', 'b', 'a', 'c', 'b']
labels, uniques = pd.factorize(orig_col)
# To get original list back
uniques[labels]
# array(['b', 'b', 'a', 'c', 'b'], dtype=object)
Yes, we can do it via np.vectorize after creating a dict that maps codes to values:
np.vectorize(dict(zip(range(len(uniques)),uniques)).get)(labels)
array(['b', 'b', 'a', 'c', 'b'], dtype='<U1')
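Worth noting: pandas can also rebuild the values directly from codes and categories via pd.Categorical.from_codes, which is effectively the inverse of factorize:

```python
import pandas as pd

orig_col = ['b', 'b', 'a', 'c', 'b']
labels, uniques = pd.factorize(orig_col)

# from_codes maps each code back to its category, undoing factorize
recon = pd.Categorical.from_codes(labels, uniques)
print(list(recon))  # ['b', 'b', 'a', 'c', 'b']
```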

Create a combined list of random order from a fixed number of items

Say I have 3 different items being A, B and C. I want to create a combined list containing NA copies of A, NB copies of B and NC copies of C in random orders. So the results should look like this:
finalList = [A, C, A, A, B, C, A, C,...]
Is there a clean, Pythonic way to do this using np.random? If not, any other packages besides numpy?
I don't think you need numpy for that. You can use the random builtin package:
import random
na = nb = nc = 5
l = ['A'] * na + ['B'] *nb + ['C'] * nc
random.shuffle(l)
list l will look something like:
['A', 'C', 'A', 'B', 'C', 'A', 'C', 'B', 'B', 'B', 'A', 'C', 'B', 'C', 'A']
You can define a list of tuples. Each tuple should contain a character and desired frequency. Then you can create a list where each element is repeated with specified frequency and finally shuffle it using random.shuffle
>>> import random
>>> l = [('A', 3), ('B', 5), ('C', 10)]
>>> a = [val for val, freq in l for i in range(freq)]
>>> random.shuffle(a)
>>> a
['A', 'B', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'A', 'C', 'B', 'C']
Yes, this is very much possible (and simple) with numpy. You'll have to create an array with your unique elements, repeat each element a specified number of times using np.repeat (using an axis argument makes this possible), and then shuffle with np.random.shuffle.
Here's an example with NA as 1, NB as 2, and NC as 3.
a = np.array([['A', 'B', 'C']]).repeat([1, 2, 3], axis=1).squeeze()
np.random.shuffle(a)
print(a)
array(['B', 'C', 'A', 'C', 'B', 'C'],
      dtype='<U1')
Note that it is simpler to use numpy, specifying an array of unique elements and repeats, versus a pure python implementation when you have a large number of unique elements to repeat.
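A minor variation on the answer above: np.repeat also works on a 1-D array, which avoids the extra axis and the squeeze:

```python
import numpy as np

# Repeat each unique element a given number of times, then shuffle in place
a = np.repeat(['A', 'B', 'C'], [1, 2, 3])
np.random.shuffle(a)
print(a)
```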
