Construct a list from dataset of values and value counts

To expand the data Score into a list of Scores based on the Count, is there a better way in pandas and numpy than the following?
import pandas as pd
import numpy as np
data = {
"Count": [1, 3, 2],
"Score": [2, 5, 8]
}
df = pd.DataFrame(data)
scores = []
for c, s in zip(df['Count'], df['Score']):
    for i in range(0, c):
        scores.append(s)
print(scores)
Expected output:
[2, 5, 5, 5, 8, 8]

IIUC, you can use pd.Series.repeat:
df['Score'].repeat(df['Count']).tolist()
Or np.repeat:
np.repeat(df['Score'],df['Count']).tolist()
Or pd.Index.repeat:
df.loc[df.index.repeat(df['Count']),'Score'].tolist()
[2, 5, 5, 5, 8, 8]
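As a quick check, the sketch below runs the three variants on the sample data from the question and confirms that they agree. The names via_series, via_numpy, and via_index are illustrative only and do not come from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Count": [1, 3, 2], "Score": [2, 5, 8]})

# Series.repeat: each Score is repeated by the matching Count
via_series = df["Score"].repeat(df["Count"]).tolist()

# np.repeat on the underlying arrays gives the same values
via_numpy = np.repeat(df["Score"].to_numpy(), df["Count"].to_numpy()).tolist()

# Index.repeat builds the repeated row labels, then .loc selects the scores
via_index = df.loc[df.index.repeat(df["Count"]), "Score"].tolist()

print(via_series)
print(via_numpy == via_series and via_index == via_series)
```

All three avoid the explicit Python loop, so the choice between them is mostly a matter of taste.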

Related

Creating lists with variable names in python [duplicate]

I have a text file in the following format:
a,b,c,d,
1,1,2,3,
4,5,6,7,
1,2,5,7,
6,9,8,5,
How can I read it into a list efficiently to get the following output?
list=[[1,4,1,6],[1,5,2,9],[2,6,5,8],[3,7,7,5]]

Let's assume that the file is named spam.txt:
$ cat spam.txt
a,b,c,d,
1,1,2,3,
4,5,6,7,
1,2,5,7,
6,9,8,5,
Using list comprehensions and the zip() built-in function, you can write a program such as:
>>> with open('spam.txt', 'r') as file:
...     file.readline() # skip the first line
...     rows = [[int(x) for x in line.split(',')[:-1]] for line in file]
...     cols = [list(col) for col in zip(*rows)]
...
'a,b,c,d,\n'
>>> rows
[[1, 1, 2, 3], [4, 5, 6, 7], [1, 2, 5, 7], [6, 9, 8, 5]]
>>> cols
[[1, 4, 1, 6], [1, 5, 2, 9], [2, 6, 5, 8], [3, 7, 7, 5]]
Additionally, zip(*rows) is based on unpacking argument lists, which unpacks a list or tuple so that its elements can be passed as separate positional arguments to a function. In other words, zip(*rows) is reduced to zip([1, 1, 2, 3], [4, 5, 6, 7], [1, 2, 5, 7], [6, 9, 8, 5]).
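To make the unpacking step concrete, here is a small sketch that compares zip(*rows) with the same call written out by hand. It uses the rows from the session above; the names unpacked and explicit are made up for illustration.

```python
rows = [[1, 1, 2, 3], [4, 5, 6, 7], [1, 2, 5, 7], [6, 9, 8, 5]]

# Unpacking with * passes each row as a separate positional argument to zip()
unpacked = [list(col) for col in zip(*rows)]

# The same call with the four rows spelled out explicitly
explicit = [list(col) for col in zip(rows[0], rows[1], rows[2], rows[3])]

print(unpacked)
print(unpacked == explicit)
```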
EDIT:
This is a version based on NumPy for reference:
>>> import numpy as np
>>> with open('spam.txt', 'r') as file:
...     ncols = len(file.readline().split(',')) - 1
...     data = np.fromiter((int(v) for line in file for v in line.split(',')[:-1]), int, count=-1)
...     cols = data.reshape(data.size // ncols, ncols).transpose()  # integer division: reshape needs int dimensions
...
>>> cols
array([[1, 4, 1, 6],
       [1, 5, 2, 9],
       [2, 6, 5, 8],
       [3, 7, 7, 5]])

You can try the following code:
x0 = []
with open('yourfile.txt') as f:
    for line in f:
        fields = line.split(',')  # the file is comma-separated
        x0.append(fields[0])      # first column, still a string
for x in x0:
    print(x)
Here the first column (including the header line) is appended to x0. You can collect the other columns in a similar fashion.

You can use the data_py package to read column-wise data from a file.
Install the package with:
pip install data-py==0.0.1
Example:
from data_py import datafile
df1=datafile("C:/Folder/SubFolder/data-file-name.txt")
df1.separator=","
[Col1,Col2,Col3,Col4,Col5]=["","","","",""]
[Col1,Col2,Col3,Col4,Col5]=df1.read([Col1,Col2,Col3,Col4,Col5],lineNumber)
print(Col1,Col2,Col3,Col4,Col5)
For details please follow the link https://www.respt.in/p/python-package-datapy.html

Convert html table to List of Lists in Python

I have this code that converts a specific html table data cell into a list:
import pandas as pd
import numpy as np
my_table = pd.read_html('https://kefirprobiotics.com/for_testing_only')
df = my_table[0]
my_list = [int(v) for v in '-'.join(df['Position']).split('-')]
print(my_list)
The code is fine, but what is the elegant way of converting the list from:
[1, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9, 10, 10, 11, 12, 13]
to this instead:
[[1, 2, 3, 4],[4, 5, 6, 7],[7, 8, 9, 10],[10, 11, 12, 13]]

Instead of joining the rows with '-'.join(df['Position']), just iterate over each row and create a sublist for each.
import pandas as pd
import numpy as np
my_table = pd.read_html('https://kefirprobiotics.com/for_testing_only')
df = my_table[0]
my_list = [[int(v) for v in row.split('-')] for row in df['Position']]
print(my_list)
This assumes you want to keep the rows of the original table from your question as separate sublists.
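The page in the question may not be reachable, so the following sketch uses a hypothetical stand-in DataFrame. The assumption, inferred from the expected output, is that each cell of the Position column holds a string such as "1-2-3-4".

```python
import pandas as pd

# Hypothetical stand-in for the scraped table; the real page is not reproduced here
df = pd.DataFrame({"Position": ["1-2-3-4", "4-5-6-7", "7-8-9-10", "10-11-12-13"]})

# One sublist per table row: split each cell on '-' and convert the parts to int
my_list = [[int(v) for v in row.split("-")] for row in df["Position"]]

print(my_list)
```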

Selecting a range of columns in Python without using numpy

I want to extract a range of columns. I know how to do that in numpy, but I don't want to use the numpy slicing operator.
import numpy as np
a = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
arr = np.array(a)
k = 0
print(arr[k:, k+1]) # --> [2 7]
print([[a[r][n+1] for n in range(0,k+1)] for r in range(k,len(a))][0]) # --> [2]
What's wrong with the second statement?

You're overcomplicating it. Get the rows with a[k:], then get a cell with row[k+1].
>>> [row[k+1] for row in a[k:]]
[2, 7]
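To see why the expression in the question prints [2], it may help to evaluate it without the trailing [0]. The sketch below does that with the question's data; the names nested and flat are illustrative.

```python
a = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
k = 0

# The expression from the question, without the trailing [0].
# range(0, k + 1) is just [0], so each row yields a one-element list.
nested = [[a[r][n + 1] for n in range(0, k + 1)] for r in range(k, len(a))]
print(nested)

# The trailing [0] then keeps only the first row's list
print(nested[0])

# Flat version that matches arr[k:, k+1]
flat = [row[k + 1] for row in a[k:]]
print(flat)
```

In other words, the inner comprehension builds one list per row, and indexing with [0] discards every row except the first.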

a = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
k = 0
print(list(list(zip(*a[k:]))[k+1])) # [2, 7]

Is this what you're looking for?
cols = [1,2,3] # extract middle 3 columns
cols123 = [[l[col] for col in cols] for l in a]
# [[2, 3, 4], [7, 8, 9]]

Adding values from one array depending on occurrences of values in another array

I can't really wrap my head around this problem.
Say I have 2 arrays:
A = [2, 7, 4, 3, 9, 4, 2, 6]
B = [1, 1, 1, 4, 4, 7, 7, 7]
What I'm trying to do is this: if a value is repeated in array B (like how 1 is repeated 3 times), the corresponding values in array A are summed and appended to another array (say C).
So C would look like this (from the two arrays above):
C = [13, 12, 12]
Side note: the application I'd be using this code for uses timestamps from a database as array B (so once a day has passed, that value won't be repeated in the array).
Any help is appreciated!

Here is a solution without pandas, using only itertools groupby:
from itertools import groupby
C = [sum( a for a,_ in g) for _,g in groupby(zip(A,B),key = lambda x: x[1])]
yields:
[13, 12, 12]

I would use pandas for this.
Say you put those arrays in a DataFrame. This does the job:
import pandas as pd
df = pd.DataFrame(
    {
        'A': [2, 7, 4, 3, 9, 4, 2, 6],
        'B': [1, 1, 1, 4, 4, 7, 7, 7]
    }
)
df.groupby('B').sum()

If you want a pure Python solution, you can use itertools.groupby:
from itertools import groupby
A = [2, 7, 4, 3, 9, 4, 2, 6]
B = [1, 1, 1, 4, 4, 7, 7, 7]
out = []
for _, g in groupby(zip(A, B), lambda k: k[1]):
    out.append(sum(v for v, _ in g))
print(out)
Prints:
[13, 12, 12]
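One caveat: itertools.groupby only merges consecutive equal keys, so it fits the question because equal values in B sit next to each other. If that were not the case, a dictionary of running totals is one possible alternative. The sketch below uses a hypothetical, shuffled B to show the difference; the names run_sums, totals, and key_sums are made up for illustration.

```python
from itertools import groupby

A = [2, 7, 4, 3, 9, 4, 2, 6]
B = [1, 7, 1, 4, 4, 7, 1, 7]  # hypothetical: equal keys are no longer adjacent

# groupby starts a new group every time the key changes
run_sums = [sum(v for v, _ in g) for _, g in groupby(zip(A, B), key=lambda p: p[1])]

# Running totals per key; dicts keep first-seen key order (Python 3.7+)
totals = {}
for value, key in zip(A, B):
    totals[key] = totals.get(key, 0) + value
key_sums = list(totals.values())

print(run_sums)
print(key_sums)
```

With the sorted B from the question, both approaches give [13, 12, 12].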

Remove an element from list of dictionaries with Pandas element

Consider "b", defined below as a list of dictionaries. How can I remove element 6 from the 'index' in the second element of b (b[1]['index'][6]) and save the new list to b?
import pandas as pd
import numpy as np
a = pd.DataFrame(np.random.randn(10))
b = [{'color':'red','index':a.index},{'color':'blue','index':a.index}]
output:
[{'color': 'red', 'index': Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')}, {'color': 'blue', 'index': Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')}]
I tried np.delete, and .pop or del for lists, with no success. What is the best way to do it?

I think this will work for you
import pandas as pd
import numpy as np
a = pd.DataFrame(np.random.randn(10))
print(a)
b = [{'color':'red','index':a.index},{'color':'blue','index':a.index}]
d = b[1]['index']
b[1]['index'] = d.delete(6)  # delete() returns a new index; d itself is unchanged
print(b[1]['index'])
Int64Index([0, 1, 2, 3, 4, 5, 7, 8, 9], dtype='int64')
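This works because a pandas index is immutable: delete() returns a new index instead of modifying the existing one, which is also why in-place operations such as .pop fail. The sketch below illustrates that the first dictionary and the DataFrame keep the full index. The printed repr of an index depends on the pandas version, so the sketch prints plain lists instead.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.random.randn(10))
b = [{"color": "red", "index": a.index}, {"color": "blue", "index": a.index}]

# delete(6) drops the entry at position 6 and returns a new index object
b[1]["index"] = b[1]["index"].delete(6)

print(list(b[1]["index"]))  # position 6 removed
print(list(b[0]["index"]))  # untouched, still the DataFrame's full index
```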
