I have this code that converts a specific HTML table data cell into a list:
import pandas as pd
import numpy as np
my_table = pd.read_html('https://kefirprobiotics.com/for_testing_only')
df = my_table[0]
my_list = [int(v) for v in '-'.join(df['Position']).split('-')]
print(my_list)
The code is fine, but what is the elegant way of converting the list from:
[1, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9, 10, 10, 11, 12, 13]
to this instead:
[[1, 2, 3, 4],[4, 5, 6, 7],[7, 8, 9, 10],[10, 11, 12, 13]]
Instead of joining the rows with '-'.join(df['Position']), just iterate over each row and create a sublist for each:
import pandas as pd
import numpy as np
my_table = pd.read_html('https://kefirprobiotics.com/for_testing_only')
df = my_table[0]
my_list = [[int(v) for v in row.split('-')] for row in df['Position']]
print(my_list)
This assumes you want to keep the rows/sublists from the original table you put in the question.
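For illustration, here is the same comprehension on a small hand-built frame; the 'Position' strings are assumed, since the URL in the question is only a placeholder:
import pandas as pd

# Assumed sample data standing in for the table at the placeholder URL
df = pd.DataFrame({'Position': ['1-2-3-4', '4-5-6-7', '7-8-9-10', '10-11-12-13']})
my_list = [[int(v) for v in row.split('-')] for row in df['Position']]
print(my_list)  # [[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10], [10, 11, 12, 13]]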
Related
I have a text file in the following format:
a,b,c,d,
1,1,2,3,
4,5,6,7,
1,2,5,7,
6,9,8,5,
How can I read it into a list efficiently so as to get the following output?
list=[[1,4,1,6],[1,5,2,9],[2,6,5,8],[3,7,7,5]]
Let's assume that the file is named spam.txt:
$ cat spam.txt
a,b,c,d,
1,1,2,3,
4,5,6,7,
1,2,5,7,
6,9,8,5,
Using list comprehensions and the zip() built-in function, you can write a program such as:
>>> with open('spam.txt', 'r') as file:
... file.readline() # skip the first line
... rows = [[int(x) for x in line.split(',')[:-1]] for line in file]
... cols = [list(col) for col in zip(*rows)]
...
'a,b,c,d,\n'
>>> rows
[[1, 1, 2, 3], [4, 5, 6, 7], [1, 2, 5, 7], [6, 9, 8, 5]]
>>> cols
[[1, 4, 1, 6], [1, 5, 2, 9], [2, 6, 5, 8], [3, 7, 7, 5]]
Additionally, zip(*rows) is based on unpacking argument lists, which unpacks a list or tuple so that its elements can be passed as separate positional arguments to a function. In other words, zip(*rows) is reduced to zip([1, 1, 2, 3], [4, 5, 6, 7], [1, 2, 5, 7], [6, 9, 8, 5]).
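A quick standalone check of that unpacking step, using the rows from the session above:
rows = [[1, 1, 2, 3], [4, 5, 6, 7], [1, 2, 5, 7], [6, 9, 8, 5]]
# zip(*rows) is equivalent to zip([1, 1, 2, 3], [4, 5, 6, 7], [1, 2, 5, 7], [6, 9, 8, 5])
cols = [list(col) for col in zip(*rows)]
print(cols)  # [[1, 4, 1, 6], [1, 5, 2, 9], [2, 6, 5, 8], [3, 7, 7, 5]]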
EDIT:
This is a version based on NumPy for reference:
>>> import numpy as np
>>> with open('spam.txt', 'r') as file:
... ncols = len(file.readline().split(',')) - 1
... data = np.fromiter((int(v) for line in file for v in line.split(',')[:-1]), int, count=-1)
... cols = data.reshape(data.size // ncols, ncols).transpose()
...
>>> cols
array([[1, 4, 1, 6],
       [1, 5, 2, 9],
       [2, 6, 5, 8],
       [3, 7, 7, 5]])
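For a file this regular, np.loadtxt can also do the whole job in one call. This is just a sketch, assuming exactly four data columns plus the trailing comma shown above:
import numpy as np

# skiprows drops the 'a,b,c,d,' header; usecols ignores the empty field after the trailing comma
cols = np.loadtxt('spam.txt', delimiter=',', skiprows=1, usecols=(0, 1, 2, 3), dtype=int).T
print(cols)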
You can try the following code:
x0 = []
with open('yourfile.txt') as f:
    next(f)  # skip the 'a,b,c,d,' header line
    for line in f:
        fields = line.split(',')
        x0.append(int(fields[0]))  # first column
for i in range(len(x0)):
    print(x0[i])
Here the first column is appended onto x0. You can append the other columns in a similar fashion.
You can use the data_py package to read column-wise data from a file.
Install this package with:
pip install data-py==0.0.1
Example
from data_py import datafile
df1=datafile("C:/Folder/SubFolder/data-file-name.txt")
df1.separator=","
[Col1,Col2,Col3,Col4,Col5]=["","","","",""]
[Col1,Col2,Col3,Col4,Col5]=df1.read([Col1,Col2,Col3,Col4,Col5],lineNumber)
print(Col1,Col2,Col3,Col4,Col5)
For details please follow the link https://www.respt.in/p/python-package-datapy.html
Let's say I get a list from a dataframe, and the list goes
list = [1, 3, 2, 6, 4, 2, 7, 4, 2, 6, 8]
I want to get back a column that tracks the highest value seen so far. For example:
list2= [1, 3, 3, 6, 6, 6, 7, 7, 7, 7, 8]
As shown in the example above, a new list is generated that keeps the highest value found so far.
I need it returned as its own column in the dataframe.
My code, for reference:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# read data frame
df = pd.read_csv('file.csv')
# numbers list: [1, 3, 2, 6, 4, 2, 7, 4, 2, 6, 8]
df['numbers']
# here will go the code you guys give me ###
df['highestnumbers'] = #####################
the output should be a list that goes
[1, 3, 3, 6, 6, 6, 7, 7, 7, 7, 8]
df['highestnumbers'] = df['numbers'].expanding().max()
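A closely related option is cummax; here is a minimal sketch on a hand-built frame (the 'numbers' column is assumed, since file.csv isn't available). Note that expanding().max() returns floats, while cummax() keeps the integer dtype:
import pandas as pd

df = pd.DataFrame({'numbers': [1, 3, 2, 6, 4, 2, 7, 4, 2, 6, 8]})
df['highestnumbers'] = df['numbers'].cummax()
print(df['highestnumbers'].tolist())  # [1, 3, 3, 6, 6, 6, 7, 7, 7, 7, 8]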
Can't really wrap my mind around this problem I'm having:
say I have 2 arrays
A = [2, 7, 4, 3, 9, 4, 2, 6]
B = [1, 1, 1, 4, 4, 7, 7, 7]
What I'm trying to do: if a value is repeated in array B (like how 1 is repeated 3 times), the corresponding values in array A are summed up and appended to another array (say C),
so C would look like (from above two arrays):
C = [13, 12, 12]
Also, as a side note: the application I'd be using this code for uses timestamps from a database as array B (so that once a day has passed, that value in the array obviously won't be repeated).
Any help is appreciated!!
Here is a solution without pandas, using only itertools.groupby:
from itertools import groupby
C = [sum(a for a, _ in g) for _, g in groupby(zip(A, B), key=lambda x: x[1])]
yields:
[13, 12, 12]
I would use pandas for this
Say you put those arrays in a DataFrame. This does the job:
import pandas as pd

df = pd.DataFrame(
    {
        'A': [2, 7, 4, 3, 9, 4, 2, 6],
        'B': [1, 1, 1, 4, 4, 7, 7, 7]
    }
)
df.groupby('B').sum()
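To get exactly the list C from the question, pull the grouped column out and convert it. A minimal self-contained sketch on the same data (sort=False keeps the groups in order of appearance, which matters if B isn't already sorted):
import pandas as pd

df = pd.DataFrame({'A': [2, 7, 4, 3, 9, 4, 2, 6], 'B': [1, 1, 1, 4, 4, 7, 7, 7]})
C = df.groupby('B', sort=False)['A'].sum().tolist()
print(C)  # [13, 12, 12]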
If you want pure python solution, you can use itertools.groupby:
from itertools import groupby
A = [2, 7, 4, 3, 9, 4, 2, 6]
B = [1, 1, 1, 4, 4, 7, 7, 7]
out = []
for _, g in groupby(zip(A, B), lambda k: k[1]):
    out.append(sum(v for v, _ in g))
print(out)
Prints:
[13, 12, 12]
To expand the data Score into a list of Scores based on the Count, is there a better way in pandas and numpy than the following?
import pandas as pd
import numpy as np
data = {
    "Count": [1, 3, 2],
    "Score": [2, 5, 8]
}
df = pd.DataFrame(data)
scores = []
for c, s in zip(df['Count'], df['Score']):
    for i in range(0, c):
        scores.append(s)
print(scores)
Expected output:
[2, 5, 5, 5, 8, 8]
IIUC, you can use pd.Series.repeat:
df['Score'].repeat(df['Count']).tolist()
Or np.repeat:
np.repeat(df['Score'],df['Count']).tolist()
Or pd.Index.repeat:
df.loc[df.index.repeat(df['Count']),'Score'].tolist()
[2, 5, 5, 5, 8, 8]
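If the goal is to expand the whole frame rather than just the scores, the index-repeat pattern generalizes; a small sketch on the same data:
import pandas as pd

df = pd.DataFrame({"Count": [1, 3, 2], "Score": [2, 5, 8]})
expanded = df.loc[df.index.repeat(df['Count'])].reset_index(drop=True)
print(expanded['Score'].tolist())  # [2, 5, 5, 5, 8, 8]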
Considering "b" defined below as a list of dictionaries. How can I remove element 6 from the 'index' in second element of b (b[1]['index'][6]) and save the new list to b?
import pandas as pd
import numpy as np
a = pd.DataFrame(np.random.randn(10))
b = [{'color':'red','index':a.index},{'color':'blue','index':a.index}]
output:
[{'color': 'red', 'index': Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')}, {'color': 'blue', 'index': Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')}]
I tried np.delete, as well as .pop and del for lists (no success), but I do not know what the best way to do it is.
I think this will work for you:
import pandas as pd
import numpy as np
a = pd.DataFrame(np.random.randn(10))
print(a)
b = [{'color':'red','index':a.index},{'color':'blue','index':a.index}]
d = b[1]['index']
b[1]['index'] = d.delete(6)
print(b[1]['index'])
Int64Index([0, 1, 2, 3, 4, 5, 7, 8, 9], dtype='int64')
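As a side note, pandas Index objects are immutable, so .delete(6) returns a new index rather than modifying b in place. .drop does the same thing by label, as in this sketch (for the default integer index, label 6 and position 6 coincide):
import pandas as pd
import numpy as np

a = pd.DataFrame(np.random.randn(10))
b = [{'color': 'red', 'index': a.index}, {'color': 'blue', 'index': a.index}]

# drop by label instead of position; assign back because Index is immutable
b[1]['index'] = b[1]['index'].drop(6)
print(b[1]['index'])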