I have created a for loop to "extract" a column from a list of lists, and now I want to assign that column to a variable. How do I do that?
I have the following for loop:
j = 1
for i in range(len(table)):
    row = table[i]
    print(row[j])
and the table looks like:
NAME
Bart First
Maria Great
Theresa Green
I would like to do some other "operations" on that column, and I guess it would be easy to use functions if that column were assigned to a variable... but I have no idea how to do it. (I must not use numpy or pandas for this.)
Solution with minimal change to the original: create a list before the loop and append to it, i.e.:
j = 1
list1 = []
for i in range(len(table)):
    row = table[i]
    list1.append(row[j])
print(list1)
Note that you can use for to access each element directly rather than going through an index, i.e. the loop can be replaced with
for row in table:
    list1.append(row[j])
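Not part of the original answer, but the whole extraction also collapses into a single list comprehension; a small sketch with made-up data standing in for the question's table:

```python
# hypothetical example data in the same shape as the question's `table`
table = [["1", "Bart First"], ["2", "Maria Great"], ["3", "Theresa Green"]]
j = 1  # index of the column to extract

# build the column list in one expression
list1 = [row[j] for row in table]
print(list1)  # ['Bart First', 'Maria Great', 'Theresa Green']
```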
Wrt "I guess would be easy to use functions if that column is assign to variable": You could use a namedtuple which is a Python built-in to give these things "names". Essentially, you would be converting each row into a tuple (instead of a list) and each part of that tuple would be accessible by name as well as index instead of just index.
For that you'd first need to assign each row of table to a namedtuple. Without more details in your post about what table contains, apart from NAME, I'll make assumptions about your data:
import collections

# example `table` data
table = [[1, 'Bart First', 30, 'UK'],
         [2, 'Maria Great', 25, 'US'],
         [3, 'Theresa Green', 20, 'PL']]

# the namedtuple "structure" which will hold each record:
Person = collections.namedtuple('Person', 'id name age country')

people = []  # list of records
for row in table:
    person = Person(*row)  # convert it to a namedtuple
    people.append(person)

# or the four lines above in one line:
people = [Person(*row) for row in table]
# or assign it back to `table`:
# table = [Person(*row) for row in table]
people would now look like:
[Person(id=1, name='Bart First', age=30, country='UK'),
 Person(id=2, name='Maria Great', age=25, country='US'),
 Person(id=3, name='Theresa Green', age=20, country='PL')]
Next, to get just the names:
all_names = []
for person in people:
    all_names.append(person.name)
print(all_names)
# output:
# ['Bart First', 'Maria Great', 'Theresa Green']
# or as a list comprehension:
all_names = [person.name for person in people]
Since you mentioned you can't use pandas or numpy, that would prevent you from doing certain things like sum(people.age) or people.age.sum() but you could instead do
>>> sum(person.age for person in people)
75
And if you still need to use the index, then you can get the country data (4th column, index=3):
>>> col = 3
>>> for person in people:
...     print(person[col])
...
UK
US
PL
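As an aside (not from the original answer): plain zip can transpose a list of lists, which gives you every column at once without numpy or pandas. A sketch using the same assumed data as above:

```python
# assumed example data, mirroring the answer above
table = [[1, 'Bart First', 30, 'UK'],
         [2, 'Maria Great', 25, 'US'],
         [3, 'Theresa Green', 20, 'PL']]

# zip(*table) pairs up the i-th element of every row, i.e. the columns
ids, names, ages, countries = map(list, zip(*table))
print(names)     # ['Bart First', 'Maria Great', 'Theresa Green']
print(sum(ages))  # 75
```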
I have two variables whose data I am trying to manipulate. The first is a list that has two items.
row = [['Toyyota', 'Cammry', '3000'], ['Foord', 'Muustang', '6000']]
And a dictionary that has submissions
submission = {
    'extracted1_1': 'Toyota', 'extracted1_2': 'Camry', 'extracted1_3': '1000',
    'extracted2_1': 'Ford', 'extracted2_2': 'Mustang', 'extracted2_3': '5000',
    'reportDate': '2022-06-01T08:30', 'reportOwner': 'John Smith'}
extracted1_1 would match up with the first value in the first item from row. extracted1_2 would be the 2nd value in the 1st item, and extracted2_1 would be the 1st value in the 2nd item and so on. I'm trying to update row with the corresponding submission and having a hard time getting it to work properly.
Here's what I have currently:
iter_bit = iter(submission.values())
for bit in row:
    i = 0
    for bits in bit:
        bit[i] = next(iter_bit)
        i += 1
While this somewhat works, I'm looking for an easier or more efficient way: can I loop through submission instead, and overwrite the corresponding values in row?
Iterate through submission, and check whether the key is in the format extractedX_Y. If it is, use X and Y as the indexes into row and assign the value there.
import re

regex = re.compile(r'^extracted(\d+)_(\d+)$')
for key, value in submission.items():
    m = regex.search(key)
    if m:
        x = int(m.group(1))
        y = int(m.group(2))
        row[x - 1][y - 1] = value
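For reference, here is the same regex approach as a self-contained run over the question's example data (tracing one key: 'extracted1_1' gives x=1, y=1, so row[0][0] becomes 'Toyota'):

```python
import re

row = [['Toyyota', 'Cammry', '3000'], ['Foord', 'Muustang', '6000']]
submission = {
    'extracted1_1': 'Toyota', 'extracted1_2': 'Camry', 'extracted1_3': '1000',
    'extracted2_1': 'Ford', 'extracted2_2': 'Mustang', 'extracted2_3': '5000',
    'reportDate': '2022-06-01T08:30', 'reportOwner': 'John Smith'}

regex = re.compile(r'^extracted(\d+)_(\d+)$')
for key, value in submission.items():
    m = regex.search(key)
    if m:  # non-matching keys like 'reportDate' are skipped
        x, y = int(m.group(1)), int(m.group(2))
        row[x - 1][y - 1] = value

print(row)  # [['Toyota', 'Camry', '1000'], ['Ford', 'Mustang', '5000']]
```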
It seems you are trying to convert the portion of the keys after "extracted" to indices into row. To do this, first slice off the portion you don't need (i.e. "extracted"), then split what remains on "_". Then convert each of these strings to an integer and subtract 1, because Python indices are zero-based.
for key, value in submission.items():
    # e.g. key = 'extracted1_1', value = 'Toyota'
    if not key.startswith("extracted"):
        continue
    indices = [int(i) - 1 for i in key[9:].split("_")]
    # e.g. indices = [0, 0]
    # Set the value
    row[indices[0]][indices[1]] = value
Now you have your modified row:
[['Toyota', 'Camry', '1000'], ['Ford', 'Mustang', '5000']]
No clue if it's faster, but it's a 2-liner hahaha
for n, val in zip(range(len(row) * 3), submission.values()):
    row[n // 3][n % 3] = val
That said, I would probably do something safer in a work environment, like parsing the key for its index.
Say I have a string column in pandas in which each row is made of a list of strings
Class    Student
One      [Adam, Kanye, Alice Stocks, Joseph Matthew]
Two      [Justin Bieber, Selena Gomez]
I want to get rid of all the names in each class wherever the length of the string is more than 8 characters.
So the resulting table would be:
Class    Student
One      Adam, Kanye
Most of the data would be gone because only Adam and Kanye satisfy the condition of len(StudentName)<8
I tried coming up with a filter myself, but it seems that the code is running at the character level instead of the word level. Can someone point out where I went wrong?
This is the code:
[[y for y in x if not len(y)>=8] for x in df['Student']]
Check the code below. It seems you are not defining what to split on, hence things are automatically getting split at the character level.
import pandas as pd

df = pd.DataFrame({'Class': ['One', 'Two'],
                   'Student': ['[Adam, Kanye, Alice Stocks, Joseph Matthew]',
                               '[Justin Bieber, Selena Gomez]']})
df['Filtered_Student'] = (df['Student'].str.replace(r'\[|\]', '', regex=True)
                                       .str.split(',')
                                       .apply(lambda x: ','.join([i for i in x if len(i) < 8])))
df[df['Filtered_Student'] != '']
Output:
  Class                                      Student Filtered_Student
0   One  [Adam, Kanye, Alice Stocks, Joseph Matthew]      Adam, Kanye
# If they're not actually lists, but strings:
if isinstance(df.Student[0], str):
    df.Student = df.Student.str[1:-1].str.split(', ')

# Apply your filtering logic:
df.Student = df.Student.apply(lambda s: [x for x in s if len(x) < 8])
Output:
  Class        Student
0   One  [Adam, Kanye]
1   Two             []
IIUC, this can be done in a one-liner with np.where:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Class': ['One', 'Two'],
                   'Student': [['Adam', 'Kanye', 'Alice Stocks', 'Joseph Matthew'],
                               ['Justin Bieber', 'Selena Gomez']]})
exploded = df.explode('Student')
exploded.iloc[np.where(exploded.Student.str.len() <= 8)].groupby('Class').agg(list).reset_index()
Output:
  Class        Student
0   One  [Adam, Kanye]
I'm importing a CSV into a dictionary, where a number of houses are labelled (i.e. 1A, 1B, ...).
Rows are labelled with items such as 'coffee' etc. The data in the table indicates how much of each item each household needs.
Excel screenshot
What I am trying to do is check the values of the key-value pairs in the dictionary for anything that isn't blank (containing either 1 or 2), and then take that key-value pair plus the 'PRODUCT NUMBER' (from the CSV) and append them to a new list.
I want to create a shopping list that will contain which item I need, in what quantity, for which household.
The column containing 'week' is not important for this.
I import the CSV into python as a dictionary like this:
import csv
import pprint
from typing import List, Dict
input_file_1 = csv.DictReader(open("DATA CWK SHOPPING DATA WEEK 1 FILE B.xlsb.csv"))
table: List[Dict[str, int]] = []  # list
for row in input_file_1:
    string_row: Dict[str, int] = {}  # dictionary
    for column in row:
        string_row[column] = row[column]
    table.append(string_row)
I found on 'geeksforgeeks' how to access a pair by its value. However, when I try this on my dictionary, it only seems to be able to search the last row.
# creating a new dictionary
my_dict = {"java": 100, "python": 112, "c": 11}
# list out keys and values separately
key_list = list(my_dict.keys())
val_list = list(my_dict.values())
# print key with val 100
position = val_list.index(100)
print(key_list[position])
I also tried to do a for in range loop, but that didn't seem to work either:
for row in table:
    if row["PRODUCT NUMBER"] == '1' and row["Week"] == '1':
        for i in range(8):
            if string_row.values() != ' ':
                print(row[i])
Please, if I am unclear anywhere, please let me know and I will clear it up!!
Here is a loop I made that should do what you want.
values = list(table.values())
keys = list(table.keys())
new_table = {}
index = -1
for i in range(values.count("")):
    index = values.index("", index + 1)
    new_table[keys[index]] = values[index]
If you want to remove those values from the original dict, you can just add table.pop(keys[index]) into the loop.
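Working from the question's own description (rather than the loop above), a sketch of building the shopping list directly from table, the list of per-product dicts built by the DictReader; the product rows and house column names here are hypothetical stand-ins for the real CSV:

```python
# Hypothetical rows mimicking the CSV: one dict per product,
# house columns hold '', '1' or '2'
table = [
    {'PRODUCT NUMBER': '101', 'Week': '1', '1A': '2', '1B': ''},
    {'PRODUCT NUMBER': '102', 'Week': '1', '1A': '', '1B': '1'},
]

shopping_list = []
for row in table:
    for column, quantity in row.items():
        if column in ('PRODUCT NUMBER', 'Week'):
            continue  # skip the non-house columns
        if quantity.strip():  # anything non-blank is an order
            shopping_list.append((row['PRODUCT NUMBER'], column, quantity))

print(shopping_list)  # [('101', '1A', '2'), ('102', '1B', '1')]
```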
I have defined 10 different DataFrames, A06_df, A07_df, etc., which pick up six different data point inputs in a daily time series for a number of years. To be able to work with them I need to do some formatting operations, such as
A07_df=A07_df.fillna(0)
A07_df[A07_df < 0] = 0
A07_df.columns = col # col is defined
A07_df['oil']=A07_df['oil']*24
A07_df['water']=A07_df['water']*24
A07_df['gas']=A07_df['gas']*24
A07_df['water_inj']=0
A07_df['gas_inj']=0
A07_df=A07_df[['oil', 'water', 'gas','gaslift', 'water_inj', 'gas_inj', 'bhp', 'whp']]
etc for a few more formatting operations
Is there a nice way to have a for loop or something so I don’t have to write each operation for each dataframe A06_df, A07_df, A08.... etc?
As an example, I have tried
list = [A06_df, A07_df, A08_df, A10_df, A11_df, A12_df, A13_df, A15_df, A18_df, A19_df]
for i in list:
    i = i.fillna(0)
But this does not do the trick.
Any help is appreciated
As i.fillna(0) returns a new object (an updated copy of your original dataframe), i = i.fillna(0) will update the content of i but not of the list contents A06_df, A07_df, ....
I suggest you copy the updated content in a new list like this:
list_raw = [A06_df, A07_df, A08_df, A10_df, A11_df, A12_df, A13_df, A15_df, A18_df, A19_df]
list_updated = []
for i in list_raw:
    i = i.fillna(0)
    # More code here
    list_updated.append(i)
To simplify your future processing I would recommend using a dictionary of dataframes instead of a list of named variables.
dfs = {}
dfs['A0'] = ...
dfs['A1'] = ...

dfs_updated = {}
for k, i in dfs.items():
    i = i.fillna(0)
    # More code here
    dfs_updated[k] = i
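One way to keep the shared cleanup in a single place is a function applied over that dictionary. A minimal sketch, assuming small example frames and an abbreviated version of the question's operations (the real column list and per-column scaling would go inside clean):

```python
import pandas as pd

def clean(df):
    """Apply the shared formatting steps and return the new frame."""
    df = df.fillna(0)
    df[df < 0] = 0
    df = df * 24  # e.g. scale daily rates; adapt per column as needed
    return df

# hypothetical stand-ins for A06_df, A07_df, ...
dfs = {
    'A06': pd.DataFrame({'oil': [1.0, None], 'water': [-2.0, 3.0]}),
    'A07': pd.DataFrame({'oil': [None, 4.0], 'water': [5.0, None]}),
}
dfs = {name: clean(df) for name, df in dfs.items()}
print(dfs['A06'])
```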
How can I combine three lists of lists into one, so that the first string of each second-level list (the item name) keys a row in the new list of lists, with the corresponding second strings (the prices) collected in that row, one column per source list?
Let's say there are three lists of lists that look as follows:
[['item_1', 'price_100'], ['item_2', 'price_200']] #from shop_a
[['item_1', 'price_120'], ['item_2', 'price_180']] #from shop_b
[['item_2', 'price_80'], ['item_3', 'price_220']] #from shop_c
I'd like to merge them into a single list of lists like this:
[['item_name', 'shop_a', 'shop_b', 'shop_c'], #should become the header of the DataFrame
['item_1', 'price_100', 'price_120', ''], #should become the 1st row of the DF
['item_2', 'price_200', 'price_180', 'price_80'], #should become the 2nd row of the DF
['item_3', '', '', 'price_220']] #should become the 3rd row of the DF
The idea is to get all prices for the same item in each row, so that the DataFrame constructed from the list would represent a convenient matrix to compare prices from different shops.
How can I do this? I would appreciate any suggestions...
PS: Please consider that the rows are not equal in length (the third list is different from the first two).
You can store them in a dictionary keyed by item name, recording each price in the slot for its shop, then sort the items alphabetically and create a df. For example:
import pandas as pd

a = [['item_1', 'price_100'], ['item_2', 'price_200']]  # from shop_a
b = [['item_1', 'price_120'], ['item_2', 'price_180']]  # from shop_b
c = [['item_2', 'price_80'], ['item_3', 'price_220']]   # from shop_c

data = {}
for shop_index, shop in enumerate([a, b, c]):
    for item_name, item_price in shop:
        # one slot per shop; prices missing from a shop stay as ''
        item_data = data.setdefault(item_name, ['', '', ''])
        item_data[shop_index] = item_price

sorted_rows = sorted(
    [[item_name] + item_data for item_name, item_data in data.items()],
    key=lambda item: item[0])
df = pd.DataFrame(sorted_rows, columns=['item_name', 'shop_a', 'shop_b', 'shop_c'])
print(df)
>>>
  item_name     shop_a     shop_b     shop_c
0    item_1  price_100  price_120
1    item_2  price_200  price_180   price_80
2    item_3                        price_220
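An alternative sketch using pandas directly: flatten the three lists into (item, shop, price) records and pivot, which avoids the manual dictionary bookkeeping. The shop labels are the ones assumed in the question:

```python
import pandas as pd

a = [['item_1', 'price_100'], ['item_2', 'price_200']]  # shop_a
b = [['item_1', 'price_120'], ['item_2', 'price_180']]  # shop_b
c = [['item_2', 'price_80'], ['item_3', 'price_220']]   # shop_c

# one (item, shop, price) record per entry
records = [(item, shop, price)
           for shop, shop_list in [('shop_a', a), ('shop_b', b), ('shop_c', c)]
           for item, price in shop_list]

# pivot: rows keyed by item, one column per shop, blanks for missing prices
df = (pd.DataFrame(records, columns=['item_name', 'shop', 'price'])
        .pivot(index='item_name', columns='shop', values='price')
        .fillna('')
        .reset_index())
df.columns.name = None
print(df)
```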