Construct a list from dataset of values and value counts - python

To expand the data Score into a list of Scores based on the Count, is there a better way in pandas and numpy than the following?
import pandas as pd
import numpy as np
data = {
    "Count": [1, 3, 2],
    "Score": [2, 5, 8]
}
df = pd.DataFrame(data)
scores = []
for c, s in zip(df['Count'], df['Score']):
    for i in range(0, c):
        scores.append(s)
print(scores)
Expected output:
[2, 5, 5, 5, 8, 8]

IIUC, you can use pd.Series.repeat:
df['Score'].repeat(df['Count']).tolist()
Or np.repeat:
np.repeat(df['Score'],df['Count']).tolist()
Or pd.Index.repeat:
df.loc[df.index.repeat(df['Count']),'Score'].tolist()
[2, 5, 5, 5, 8, 8]
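As a cross-check on what the repeat calls above produce, here is a minimal standard-library sketch of the same expansion. It swaps in itertools for pandas/NumPy and assumes the counts and scores are available as plain lists; the variable names are illustrative only.

```python
from itertools import chain, repeat

counts = [1, 3, 2]
scores = [2, 5, 8]

# Repeat each score `count` times, then flatten the pieces into one list.
expanded = list(chain.from_iterable(repeat(s, c) for s, c in zip(scores, counts)))
print(expanded)
```

This is only meant to show the semantics; for data already in a DataFrame, the vectorized repeat calls are likely the more natural choice.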

Related

Creating lists with variable names in python [duplicate]

I have a text file in the following format:
a,b,c,d,
1,1,2,3,
4,5,6,7,
1,2,5,7,
6,9,8,5,
How can I read it into a list efficiently so as to get the following output?
list=[[1,4,1,6],[1,5,2,9],[2,6,5,8],[3,7,7,5]]
Let's assume that the file is named spam.txt:
$ cat spam.txt
a,b,c,d,
1,1,2,3,
4,5,6,7,
1,2,5,7,
6,9,8,5,
Using list comprehensions and the zip() built-in function, you can write a program such as:
>>> with open('spam.txt', 'r') as file:
...     file.readline() # skip the first line
...     rows = [[int(x) for x in line.split(',')[:-1]] for line in file]
...     cols = [list(col) for col in zip(*rows)]
...
'a,b,c,d,\n'
>>> rows
[[1, 1, 2, 3], [4, 5, 6, 7], [1, 2, 5, 7], [6, 9, 8, 5]]
>>> cols
[[1, 4, 1, 6], [1, 5, 2, 9], [2, 6, 5, 8], [3, 7, 7, 5]]
Additionally, zip(*rows) is based on unpacking argument lists, which unpacks a list or tuple so that its elements can be passed as separate positional arguments to a function. In other words, zip(*rows) is reduced to zip([1, 1, 2, 3], [4, 5, 6, 7], [1, 2, 5, 7], [6, 9, 8, 5]).
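To make the unpacking step concrete, here is a minimal sketch using the rows from the example above. The names unpacked and explicit are illustrative only.

```python
# Rows as parsed from spam.txt in the example above.
rows = [[1, 1, 2, 3], [4, 5, 6, 7], [1, 2, 5, 7], [6, 9, 8, 5]]

# zip(*rows) is the same call as passing each row as its own argument.
unpacked = list(zip(*rows))
explicit = list(zip(rows[0], rows[1], rows[2], rows[3]))

# zip yields tuples, so convert each one to a list to get the columns.
cols = [list(col) for col in unpacked]
print(cols)
```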
EDIT:
This is a version based on NumPy for reference:
>>> import numpy as np
>>> with open('spam.txt', 'r') as file:
...     ncols = len(file.readline().split(',')) - 1
...     data = np.fromiter((int(v) for line in file for v in line.split(',')[:-1]), int, count=-1)
...     cols = data.reshape(data.size // ncols, ncols).transpose()  # // keeps the shape an int in Python 3
...
>>> cols
array([[1, 4, 1, 6],
       [1, 5, 2, 9],
       [2, 6, 5, 8],
       [3, 7, 7, 5]])
You can try the following code:
x0 = []
with open('yourfile.txt') as f:
    for line in f:
        fields = line.split(',')  # the file is comma-separated
        x0.append(fields[0])      # first column, kept as a string
for x in x0:
    print(x)
Here the first column (including its header entry) is appended to x0. You can append the other columns in a similar fashion.
You can use the data_py package to read column-wise data from a file.
Install the package with:
pip install data-py==0.0.1
Example
from data_py import datafile
df1=datafile("C:/Folder/SubFolder/data-file-name.txt")
df1.separator=","
[Col1,Col2,Col3,Col4,Col5]=["","","","",""]
[Col1,Col2,Col3,Col4,Col5]=df1.read([Col1,Col2,Col3,Col4,Col5],lineNumber)
print(Col1,Col2,Col3,Col4,Col5)
For details, see https://www.respt.in/p/python-package-datapy.html

Convert html table to List of Lists in Python

I have this code that converts a specific html table data cell into a list:
import pandas as pd
import numpy as np
my_table = pd.read_html('https://kefirprobiotics.com/for_testing_only')
df = my_table[0]
my_list = [int(v) for v in '-'.join(df['Position']).split('-')]
print(my_list)
The code is fine, but what is an elegant way of converting the list from:
[1, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9, 10, 10, 11, 12, 13]
to this instead:
[[1, 2, 3, 4],[4, 5, 6, 7],[7, 8, 9, 10],[10, 11, 12, 13]]
Instead of joining the rows with '-'.join(df['Position']), just iterate over each row and create a sublist for each.
import pandas as pd
import numpy as np
my_table = pd.read_html('https://kefirprobiotics.com/for_testing_only')
df = my_table[0]
my_list = [[int(v) for v in row.split('-')] for row in df['Position']]
print(my_list)
This assumes you want to keep the rows/sublists from the original table in the question.
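The difference between the two approaches can be tried without fetching the page. The sketch below assumes the Position column holds one dash-separated string per row; the positions list is a hypothetical stand-in for df['Position'], chosen to match the lists shown in the question.

```python
# Hypothetical stand-in for df['Position']: one dash-separated string per table row.
positions = ['1-2-3-4', '4-5-6-7', '7-8-9-10', '10-11-12-13']

# Joining first loses the row boundaries and gives one flat list.
flat = [int(v) for v in '-'.join(positions).split('-')]

# Splitting row by row keeps one sublist per table row.
nested = [[int(v) for v in row.split('-')] for row in positions]

print(flat)
print(nested)
```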

Selecting a range of columns in Python without using numpy

I want to extract a range of columns. I know how to do that in numpy, but I don't want to use the numpy slicing operator.
import numpy as np
a = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
arr = np.array(a)
k = 0
print(arr[k:, k+1]) # --> [2 7]
print([[a[r][n+1] for n in range(0,k+1)] for r in range(k,len(a))][0]) # --> [2]
What's wrong with the second statement?
You're overcomplicating it. Get the rows with a[k:], then get a cell with row[k+1].
>>> [row[k+1] for row in a[k:]]
[2, 7]
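To see where the statement in the question goes wrong, it may help to look at the intermediate value before the trailing [0]. The sketch below reuses a and k from the question; attempt, single_column and column_range are illustrative names, and the last line is one possible way to take several adjacent columns with plain list slicing.

```python
a = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
k = 0

# The nested comprehension from the question, without the trailing [0]:
# it builds one single-element list per row.
attempt = [[a[r][n + 1] for n in range(0, k + 1)] for r in range(k, len(a))]
print(attempt)     # [[2], [7]]
print(attempt[0])  # [2]  (the [0] keeps only the first row's result)

# One cell per row gives the column directly.
single_column = [row[k + 1] for row in a[k:]]
print(single_column)  # [2, 7]

# A range of columns: slice each row (plain list slicing, no numpy).
column_range = [row[1:4] for row in a[k:]]
print(column_range)  # [[2, 3, 4], [7, 8, 9]]
```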
a = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
k = 0
print(list(list(zip(*a[k:]))[k+1])) # [2, 7]
Is this what you're looking for?
cols = [1,2,3] # extract middle 3 columns
cols123 = [[l[col] for col in cols] for l in a]
# [[2, 3, 4], [7, 8, 9]]

Adding values from one array depending on occurrences of values in another array

I can't really wrap my head around this problem I'm having.
Say I have two arrays:
A = [2, 7, 4, 3, 9, 4, 2, 6]
B = [1, 1, 1, 4, 4, 7, 7, 7]
What I'm trying to do is this: if a value is repeated in array B (like how 1 is repeated 3 times), the corresponding values in array A are added up and the sum is appended to another array (say C).
So, for the two arrays above, C would look like:
C = [13, 12, 12]
As a side note, the application I'd be using this code for uses timestamps from a database as array B, so once a day has passed, that value won't be repeated in the array.
Any help is appreciated!!
Here is a solution without pandas, using only itertools groupby:
from itertools import groupby
C = [sum(a for a, _ in g) for _, g in groupby(zip(A, B), key=lambda x: x[1])]
yields:
[13, 12, 12]
I would use pandas for this.
Say you put those arrays in a DataFrame. This does the job:
import pandas as pd

df = pd.DataFrame(
    {
        'A': [2, 7, 4, 3, 9, 4, 2, 6],
        'B': [1, 1, 1, 4, 4, 7, 7, 7]
    }
)
df.groupby('B').sum()
If you want a pure Python solution, you can use itertools.groupby:
from itertools import groupby
A = [2, 7, 4, 3, 9, 4, 2, 6]
B = [1, 1, 1, 4, 4, 7, 7, 7]
out = []
for _, g in groupby(zip(A, B), lambda k: k[1]):
    out.append(sum(v for v, _ in g))
print(out)
Prints:
[13, 12, 12]
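One caveat with the itertools.groupby answers: groupby only merges consecutive items with the same key. That fits the question, where equal values in B sit next to each other, but it would not sum correctly if they were scattered. The sketch below uses a hypothetical reordered B to show the difference, with a dictionary-based sum as one possible alternative for that case.

```python
from collections import defaultdict
from itertools import groupby

A = [2, 7, 4, 3, 9, 4, 2, 6]
B = [1, 4, 1, 7, 4, 7, 1, 7]  # hypothetical: same keys as above, but not adjacent

# groupby starts a new group every time the key changes,
# so with this B every element ends up in its own group.
runs = [sum(a for a, _ in g) for _, g in groupby(zip(A, B), key=lambda x: x[1])]
print(runs)  # [2, 7, 4, 3, 9, 4, 2, 6]

# Accumulating into a dict sums per key regardless of position.
# Dicts preserve insertion order, so keys appear in first-seen order.
totals = defaultdict(int)
for a, b in zip(A, B):
    totals[b] += a
C = list(totals.values())
print(C)  # [8, 16, 13]
```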

Remove an element from list of dictionaries with Pandas element

Consider "b", defined below as a list of dictionaries. How can I remove element 6 from the 'index' in the second element of b (b[1]['index'][6]) and save the new list to b?
import pandas as pd
import numpy as np
a = pd.DataFrame(np.random.randn(10))
b = [{'color':'red','index':a.index},{'color':'blue','index':a.index}]
output:
[{'color': 'red', 'index': Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')}, {'color': 'blue', 'index': Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')}]
I tried np.delete, and .pop or del as for lists, with no success. What is the best way to do it?
I think this will work for you:
import pandas as pd
import numpy as np
a = pd.DataFrame(np.random.randn(10))
print(a)
b = [{'color':'red','index':a.index},{'color':'blue','index':a.index}]
d = b[1]['index']
b[1]['index'] = d.delete(6)  # Index.delete returns a new Index without position 6
print(b[1]['index'])
Int64Index([0, 1, 2, 3, 4, 5, 7, 8, 9], dtype='int64')
