Remove an element from list of dictionaries with Pandas element - python

Consider "b", defined below as a list of dictionaries. How can I remove element 6 from the 'index' in the second element of b (b[1]['index'][6]) and save the new list to b?
import pandas as pd
import numpy as np
a = pd.DataFrame(np.random.randn(10))
b = [{'color':'red','index':a.index},{'color':'blue','index':a.index}]
output:
[{'color': 'red', 'index': Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')}, {'color': 'blue', 'index': Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')}]
I tried np.delete, as well as .pop and del for lists, without success. What is the best way to do this?

I think this will work for you
import pandas as pd
import numpy as np
a = pd.DataFrame(np.random.randn(10))
print(a)
b = [{'color':'red','index':a.index},{'color':'blue','index':a.index}]
d = b[1]['index']
# Index.delete returns a new Index with the item at position 6 removed
b[1]['index'] = d.delete(6)
print(b[1]['index'])
Int64Index([0, 1, 2, 3, 4, 5, 7, 8, 9], dtype='int64')
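One detail worth spelling out: a pandas Index is immutable, so delete returns a new Index instead of changing the existing one. Both dictionaries start out pointing at the same a.index object, but because b[1]['index'] is reassigned, b[0]['index'] should stay intact. A minimal sketch that checks this (the names first and second are only for illustration and are not part of the original answer):

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.random.randn(10))
b = [{'color': 'red', 'index': a.index}, {'color': 'blue', 'index': a.index}]

# Index.delete works by position and returns a new Index;
# the object still referenced by b[0] is left unchanged.
b[1]['index'] = b[1]['index'].delete(6)

first = b[0]['index'].tolist()   # still all ten labels
second = b[1]['index'].tolist()  # label at position 6 removed
print(first)
print(second)
```

Note that delete takes a position, while Index.drop takes a label. With the default 0..9 index the two coincide, but they would differ for a custom index.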

Related

Convert html table to List of Lists in Python

I have this code that converts a specific html table data cell into a list:
import pandas as pd
import numpy as np
my_table = pd.read_html('https://kefirprobiotics.com/for_testing_only')
df = my_table[0]
my_list = [int(v) for v in '-'.join(df['Position']).split('-')]
print(my_list)
The code works, but is there an elegant way to convert the list from:
[1, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9, 10, 10, 11, 12, 13]
to this instead:
[[1, 2, 3, 4],[4, 5, 6, 7],[7, 8, 9, 10],[10, 11, 12, 13]]
Instead of joining the rows with '-'.join(df['Position']), just iterate over each row and create a sublist for each.
import pandas as pd
import numpy as np
my_table = pd.read_html('https://kefirprobiotics.com/for_testing_only')
df = my_table[0]
my_list = [[int(v) for v in row.split('-')] for row in df['Position']]
print(my_list)
This assumes you want each row of the original table in the question to become its own sublist.
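To try the nested comprehension without fetching the page, here is a sketch on a hand-built DataFrame. The 'Position' strings are assumed to look like '1-2-3-4'; that shape is inferred from the expected output in the question, not taken from the actual page.

```python
import pandas as pd

# Hypothetical stand-in for the table returned by pd.read_html
df = pd.DataFrame({'Position': ['1-2-3-4', '4-5-6-7', '7-8-9-10', '10-11-12-13']})

# One sublist per row: split each string on '-' and convert the pieces to int
my_list = [[int(v) for v in row.split('-')] for row in df['Position']]
print(my_list)

# The same result using the pandas string accessor for the split step
alt_list = [[int(v) for v in parts] for parts in df['Position'].str.split('-')]
print(alt_list)
```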

python highest value in a csv data frame [duplicate]

This question already has an answer here:
Compute the running (cumulative) maximum for a series in pandas
(1 answer)
Closed 1 year ago.
Let's say I get a list from a dataframe, and the list is
list = [1, 3, 2, 6, 4, 2, 7, 4, 2, 6, 8]
I want to get back a column that tracks the highest value seen so far, e.g.:
list2= [1, 3, 3, 6, 6, 6, 7, 7, 7, 7, 8]
As shown in the example above, a new list is generated in which each entry is the highest value found up to that position.
I need it returned as its own column in the dataframe.
My code for reference:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# read data frame
df = pd.read_csv('file.csv')
# numbers list: [1, 3, 2, 6, 4, 2, 7, 4, 2, 6, 8]
df['numbers']
# here will go the code you guys give me ###
df['highestnumbers'] = #####################
The output should be a list like this:
[1, 3, 3, 6, 6, 6, 7, 7, 7, 7, 8]
df['highestnumbers'] = df['numbers'].expanding().max()
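A quick check of that one-liner on the numbers from the question, as a sketch with the column built in memory instead of read from file.csv. Series.cummax() is shown alongside as another way to get a running maximum (the column name highest_cummax is made up for this comparison). As far as I know, expanding().max() returns floats while cummax() keeps the integer dtype, so the latter may be preferable if the column should stay integer.

```python
import pandas as pd

# Stand-in for df = pd.read_csv('file.csv'); the values come from the question
df = pd.DataFrame({'numbers': [1, 3, 2, 6, 4, 2, 7, 4, 2, 6, 8]})

# Running maximum via an expanding window
df['highestnumbers'] = df['numbers'].expanding().max()

# Running maximum via cummax (extra column for comparison only)
df['highest_cummax'] = df['numbers'].cummax()

print(df)
```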

Getting link between two columns in pandas

I have a data frame with two columns. The two columns contain integer numbers. The second column contains numbers that are linked to the first column. In case there is no link between the two columns, the number in the second column will have zero value. Here is an example of the table.
The expected output is a list of connections between the two columns. Using the attached table as an example, the output will be
[[2, 3, 4, 5], [6, 7, 8]]
This question is similar but not the same as finding transitive relation between two columns in pandas.
You could approach this as a graph, treating the dataframe as an edge list. You can then retrieve the connected nodes with networkx:
import pandas as pd
import networkx as nx
df = pd.DataFrame({'a': range(1, 11), 'b': [0, 4, 2, 5, 0, 7, 8, 0, 0, 0]})
g = nx.from_pandas_edgelist(df[df['b'] != 0], source='a', target='b')
print(list(nx.connected_components(g)))
Output:
[{2, 3, 4, 5}, {8, 6, 7}]
Not really a pandas answer, but here's one approach (it borrows a technique for finding runs of consecutive integers):
import pandas as pd
from itertools import groupby
from operator import itemgetter
df = pd.DataFrame({'a': range(1, 11),
                   'b': [0, 4, 2, 5, 0, 7, 8, 0, 0, 0]})
# Row positions where 'b' is nonzero, i.e. where a link exists
nonzero_locs = df['b'].to_numpy().nonzero()[0]
connections = []
# Consecutive positions share the same (enumeration index - position) key
for k, g in groupby(enumerate(nonzero_locs), lambda x: x[0] - x[1]):
    group = list(map(int, map(itemgetter(1), g)))
    group.append(group[-1] + 1)  # include the row right after the run
    connections.append(df['a'].iloc[group].tolist())
connections # [[2, 3, 4, 5], [6, 7, 8]]
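If adding networkx as a dependency is not an option and the linked rows are not guaranteed to be adjacent, the graph idea can also be sketched with a small union-find (disjoint-set) structure. This is a swapped-in technique, not something either answer proposes; the helper names find and union are made up for the sketch.

```python
import pandas as pd

df = pd.DataFrame({'a': range(1, 11), 'b': [0, 4, 2, 5, 0, 7, 8, 0, 0, 0]})

parent = {}

def find(x):
    # Walk up to the representative of x's group, shortening the path as we go
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(x, y):
    # Merge the groups containing x and y
    parent[find(x)] = find(y)

# Every row with a nonzero 'b' is treated as an undirected edge a -- b
for a_val, b_val in zip(df['a'].tolist(), df['b'].tolist()):
    if b_val != 0:
        union(a_val, b_val)

# Collect nodes by representative, then sort for a stable, list-based result
groups = {}
for node in list(parent):
    groups.setdefault(find(node), []).append(node)

connections = sorted(sorted(g) for g in groups.values())
print(connections)
```

Unlike the run-based approach, this does not depend on row order, and it returns sorted lists instead of sets.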

Adding values from one array depending on occurrences of values in another array

I can't quite wrap my mind around this problem I'm having.
Say I have 2 arrays:
A = [2, 7, 4, 3, 9, 4, 2, 6]
B = [1, 1, 1, 4, 4, 7, 7, 7]
What I'm trying to do: if a value is repeated in array B (like how 1 is repeated 3 times), the corresponding values in array A are added up and appended to another array (say C).
So C would look like this (from the two arrays above):
C = [13, 12, 12]
As a side note, the application I'd be using this code for uses timestamps from a database as array B, so once a day has passed, that value in the array won't be repeated.
Any help is appreciated!!
Here is a solution without pandas, using only itertools groupby:
from itertools import groupby
C = [sum( a for a,_ in g) for _,g in groupby(zip(A,B),key = lambda x: x[1])]
yields:
[13, 12, 12]
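I would use pandas for this.
Say you put those arrays in a DataFrame. This does the job:
import pandas as pd
df = pd.DataFrame(
    {
        'A': [2, 7, 4, 3, 9, 4, 2, 6],
        'B': [1, 1, 1, 4, 4, 7, 7, 7]
    }
)
df.groupby('B').sum()
# For a plain list instead of a DataFrame: df.groupby('B')['A'].sum().tolist()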
If you want a pure Python solution, you can use itertools.groupby:
from itertools import groupby
A = [2, 7, 4, 3, 9, 4, 2, 6]
B = [1, 1, 1, 4, 4, 7, 7, 7]
out = []
for _, g in groupby(zip(A, B), lambda k: k[1]):
    out.append(sum(v for v, _ in g))
print(out)
Prints:
[13, 12, 12]
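One caveat on the itertools answers: itertools.groupby only merges adjacent equal keys, so they rely on equal values in B sitting next to each other, which should hold for increasing timestamps. If that is not guaranteed, a dictionary can accumulate the sums per key regardless of order. A minimal sketch (sums_by_key, A2 and B2 are made up for illustration):

```python
from itertools import groupby

A = [2, 7, 4, 3, 9, 4, 2, 6]
B = [1, 1, 1, 4, 4, 7, 7, 7]

def sums_by_key(values, keys):
    # Accumulate one total per distinct key, in order of first appearance
    totals = {}
    for v, k in zip(values, keys):
        totals[k] = totals.get(k, 0) + v
    return list(totals.values())

C = sums_by_key(A, B)
print(C)

# Hypothetical unsorted input: equal keys are not adjacent
A2 = [2, 3, 7, 4]
B2 = [1, 4, 1, 4]

# groupby treats each run separately, so nothing gets merged here
run_sums = [sum(v for v, _ in g) for _, g in groupby(zip(A2, B2), key=lambda p: p[1])]
# the dictionary version merges by key
dict_sums = sums_by_key(A2, B2)
print(run_sums, dict_sums)
```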

Construct a list from dataset of values and value counts

To expand the data Score into a list of Scores based on the Count, is there a better way in pandas and numpy than the following?
import pandas as pd
import numpy as np
data = {
"Count": [1, 3, 2],
"Score": [2, 5, 8]
}
df = pd.DataFrame(data)
scores = []
for c, s in zip(df['Count'], df['Score']):
    for i in range(0, c):
        scores.append(s)
print(scores)
Expected output:
[2, 5, 5, 5, 8, 8]
IIUC, you can use pd.Series.repeat:
df['Score'].repeat(df['Count']).tolist()
Or np.repeat:
np.repeat(df['Score'],df['Count']).tolist()
Or pd.Index.repeat:
df.loc[df.index.repeat(df['Count']),'Score'].tolist()
[2, 5, 5, 5, 8, 8]
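To confirm that the three variants agree, here is a small self-contained sketch using the data from the question. The variable names via_series, via_numpy and via_index are made up; the np.repeat call is given plain arrays here, which is an assumption of mine rather than what the answer wrote.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Count": [1, 3, 2], "Score": [2, 5, 8]})

# Series.repeat: each Score is repeated Count times (index labels repeat too)
via_series = df['Score'].repeat(df['Count']).tolist()

# np.repeat on the underlying arrays returns a plain ndarray
via_numpy = np.repeat(df['Score'].to_numpy(), df['Count'].to_numpy()).tolist()

# Index.repeat builds the repeated row labels, then .loc selects the scores
via_index = df.loc[df.index.repeat(df['Count']), 'Score'].tolist()

print(via_series)
```

If the repeated values are kept as a Series instead of a list, the original index labels are repeated as well, so reset_index(drop=True) may be needed afterwards.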
