Print list of lists in matrix format - python

I have a list of lists:
[[15, 16, 18, 19, 12, 11], [13, 19, 23, 21, 16, 12], [12, 15, 17, 19, 20, 10], [10, 14, 16, 13, 9, 6]]
The length of each list in the list is the same.
I want to print out as rows and columns such as:
15 16 18 19 12 11
13 19 23 21 16 12
12 15 17 19 20 10
10 14 16 13  9  6
I know I can do it by using
lst = (' '.join(map(str,lst))),
But I want every integer aligned in its column: the 9 should sit below the 0 of 20, and the 6 should sit below the 0 of 10.

Given an input (list of lists) ll:
'\n'.join(' '.join('%2d' % x for x in l) for l in ll)
Result:
15 16 18 19 12 11
13 19 23 21 16 12
12 15 17 19 20 10
10 14 16 13  9  6
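The '%2d' format above hard-codes a width of two characters. As a minimal sketch, assuming a non-empty list of lists, the width could instead be derived from the widest entry; the helper name format_matrix is made up for this illustration:

```python
def format_matrix(rows):
    """Return rows as text with every entry right-aligned to a common width."""
    # Width of the widest entry once converted to text (assumes rows is not empty).
    width = max(len(str(x)) for row in rows for x in row)
    return "\n".join(
        " ".join(str(x).rjust(width) for x in row) for row in rows
    )


ll = [[15, 16, 18, 19, 12, 11],
      [13, 19, 23, 21, 16, 12],
      [12, 15, 17, 19, 20, 10],
      [10, 14, 16, 13, 9, 6]]
print(format_matrix(ll))
```

With three-digit entries the same function pads everything to three characters, so the columns stay aligned without changing the format string.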

Related

How to extract the index of an element that can be found on several rows

I am looking to extract column name and index of elements from a dataframe
import numpy as np
import pandas as pd
import random
lst = list(range(30))
segments = np.repeat(lst, 3)
random.shuffle(segments)
segments = segments.reshape(10, 9)
col_names = ['lz'+str(i) for i in range(95,104)]
rows_names = ['con'+str(i) for i in range(0,10)]
df = pd.DataFrame(segments, index=rows_names, columns=col_names)
lz95 lz96 lz97 lz98 lz99 lz100 lz101 lz102 lz103
con0 6 9 11 7 9 24 18 10 1
con1 24 5 21 15 18 25 24 7 29
con2 17 27 2 0 11 11 18 23 0
con3 16 22 20 22 20 14 14 0 8
con4 10 10 3 13 25 14 9 17 16
con5 3 28 22 2 27 12 16 21 4
con6 26 1 19 7 19 6 29 15 26
con7 26 28 4 13 23 23 1 25 19
con8 28 8 3 6 5 8 4 5 29
con9 2 15 21 27 17 13 12 12 20
For value=12, I am able to extract the column names:
lz_val = df.columns[df.isin([12]).any()]
But not the rows, as this extracts all the indexes:
con_val = df[df==12].index
Index(['con0', 'con1', 'con2', 'con3', 'con4', 'con5', 'con6', 'con7', 'con8',
'con9'],
dtype='object')
What about using np.where?
rows, cols = np.where(df == 12)
rows = df.index[rows]
cols = df.columns[cols]
>>> rows
Index(['con5', 'con9', 'con9'], dtype='object')
>>> cols
Index(['lz100', 'lz101', 'lz102'], dtype='object')
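To keep each match together as a (row label, column label) pair, the two index arrays can be zipped. This is a small self-contained sketch on a made-up 3x3 frame, not the random data from the question:

```python
import numpy as np
import pandas as pd

# Small illustrative frame (values chosen by hand, not the question's data).
df = pd.DataFrame(
    [[1, 12, 3],
     [4, 5, 6],
     [12, 8, 12]],
    index=["con0", "con1", "con2"],
    columns=["lz95", "lz96", "lz97"],
)

# np.where on a boolean array returns positional (row, column) index arrays.
row_pos, col_pos = np.where((df == 12).to_numpy())

# Translate positions back to labels, keeping each match as one pair.
matches = list(zip(df.index[row_pos], df.columns[col_pos]))
print(matches)
```

The pairs come out in row-major order, so a row that contains the value several times appears once per matching column.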

How can I split columns and values of whole dataframe?

I have a dataframe like this:
a\tb\tc d\te\tf g\th\ti
20\t21\t22 1\t2\t3 30\t31\t32
17\t18\t19 4\t5\t6 27\t28\t29
14\t15\t16 7\t8\t9 24\t25\t26
11\t12\t13 10\t11\t12 21\t22\t23
8\t9\t10 13\t14\t15 18\t19\t20
5\t6\t7 16\t17\t18 15\t16\t17
2\t3\t4 19\t20\t21 12\t13\t14
expected output:
a b c d e f g h i
0 20 21 22 1 2 3 30 31 32
1 17 18 19 4 5 6 27 28 29
2 14 15 16 7 8 9 24 25 26
3 11 12 13 10 11 12 21 22 23
4 8 9 10 13 14 15 18 19 20
5 5 6 7 16 17 18 15 16 17
6 2 3 4 19 20 21 12 13 14
My solution is:
l = list()
for column in df.columns:
    columns = column.split()
    d = df[column].str.split(expand=True)
    l.append(d.rename(columns=dict(zip(range(len(columns)),columns))))
pd.concat(l,axis=1)
But this looks so complex.
Is there a simpler way of doing this?
Your approach looks good. You can simplify the rename part by just assigning the new names to .columns attribute:
def expand(col):
    _df = df[col].str.split(expand=True)
    _df.columns = col.split()
    return _df
pd.concat(map(expand, df.columns), axis=1)
a b c d e f g h i
0 20 21 22 1 2 3 30 31 32
1 17 18 19 4 5 6 27 28 29
2 14 15 16 7 8 9 24 25 26
3 11 12 13 10 11 12 21 22 23
4 8 9 10 13 14 15 18 19 20
5 5 6 7 16 17 18 15 16 17
6 2 3 4 19 20 21 12 13 14
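For reference, here is a self-contained version of the same idea on a two-row sample, assuming the column names and the cells are tab-separated strings as in the question. The final astype(int) is an addition for illustration, since str.split leaves the values as strings:

```python
import pandas as pd

# Two-row sample in the question's layout: tab-separated names and values.
df = pd.DataFrame({
    "a\tb\tc": ["20\t21\t22", "17\t18\t19"],
    "d\te\tf": ["1\t2\t3", "4\t5\t6"],
})


def expand(col):
    # Split every cell on whitespace into separate columns ...
    part = df[col].str.split(expand=True)
    # ... and name them after the pieces of the combined column name.
    part.columns = col.split()
    return part


result = pd.concat([expand(col) for col in df.columns], axis=1)
result = result.astype(int)  # optional: convert the split strings to integers
print(result)
```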

What is the purpose of adding a list containing -1 and multiplying it with an integer?

Recently I was looking at some code related to a deep learning paper, repo here: https://github.com/wuyifan18/DeepLog/blob/master/LogKeyModel_predict.py
This has more to do with python so I will not mention anything else about it. Below is a file we need to parse:
5 5 5 22 11 9 11 9 11 9 26 26 26 23 23 23 21 21 21
5 5 22 5 11 9 11 9 11 9 26 26 26
5 22 5 5 11 9 11 9 11 9 26 26 26
5 22 5 5 11 9 11 9 11 9 26 26 26
5 22 5 5 11 9 11 9 11 9 26 26 26 23 23 23 21 21 21
22 5 5 5 11 9 11 9 11 9 26 26 26 23 23 23 21 21 21
5 22 5 5 11 9 11 9 11 9 26 26 26 23 23 23 21 21 21
5 5 5 22 11 9 11 9 11 9 26 26 26 2 23 23 23 21 21 21
5 22 5 5 11 9 11 9 11 9 26 26 26
The following function is supposed to parse it. First, we take each line and make a list of the numbers in that line, which are separated by spaces. Then we subtract one from each number on the line marked ###.
What happens next?
def generate(name):
    hdfs = set()
    # hdfs = []
    with open('data/' + name, 'r') as f:
        for ln in f.readlines():
            ln = list(map(lambda n: n - 1, map(int, ln.strip().split())))  ###
            ln = ln + [-1] * (window_size + 1 - len(ln))
            # print(ln)
            hdfs.add(tuple(ln))
    print('Number of sessions({}): {}'.format(name, len(hdfs)))
    return hdfs
I am not sure what the purpose of ln = ln + [-1] * (window_size + 1 - len(ln)) is exactly. What is it doing? I have not seen list multiplication being used in many places before, so I am not sure. When I try and print out more of it, it seems that -1 is not present in ln at all. Anyone have some idea?
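Without delving into the code, the idea is to make all the lines the same length, based on the window: each line is padded with -1 until it has window_size + 1 entries.
If window_size is 9 and a line contains only 5 entries, the list becomes [1, 2, 3, 4, 5, -1, -1, -1, -1, -1]; this is to deal with the statically sized window. When a line already has at least window_size + 1 entries, the multiplier is zero or negative, and multiplying a list by such a number gives an empty list, so no -1 is appended. That is probably why you do not see -1 in your output.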
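The padding step can be tried in isolation. This is a minimal sketch; pad_session is a made-up helper name, and window_size is passed as an argument here rather than read from a global as in the repository code:

```python
def pad_session(keys, window_size, pad_value=-1):
    """Pad keys with pad_value until it holds window_size + 1 entries."""
    missing = window_size + 1 - len(keys)
    # A list multiplied by zero or a negative number is the empty list,
    # so lines that are already long enough come back unchanged.
    return keys + [pad_value] * missing


print(pad_session([4, 4, 21], window_size=5))        # short line: padded
print(pad_session(list(range(10)), window_size=5))   # long line: unchanged
```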

Iterator produced by itertools.groupby() is consumed unexpectedly

I have written a small program based on iterators to display a multicolumn calendar.
In that code I am using itertools.groupby to group the dates by month by the function group_by_months(). There I yield the month name and the grouped dates as a list for every month. However, when I let that function directly return the grouped dates as an iterator (instead of a list) the program leaves the days of all but the last column blank.
I can't figure out why that might be. Am I using groupby wrong? Can anyone help me to spot the place where the iterator is consumed or its output is ignored? Why is it especially the last column that "survives"?
Here's the code:
import datetime
from itertools import zip_longest, groupby
def grouper(iterable, n, fillvalue=None):
    """\
    copied from the docs:
    https://docs.python.org/3.4/library/itertools.html#itertools-recipes
    """
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
def generate_dates(start_date, end_date, step=datetime.timedelta(days=1)):
    while start_date < end_date:
        yield start_date
        start_date += step
def group_by_months(seq):
    for k,v in groupby(seq, key=lambda x:x.strftime("%B")):
        yield k, v # Why does it only work when list(v) is yielded here?
def group_by_weeks(seq):
    yield from groupby(seq, key=lambda x:x.strftime("%2U"))
def format_month(month, dates_of_month):
    def format_week(weeknum, dates_of_week):
        def format_day(d):
            return d.strftime("%3e")
        weekdays = {d.weekday(): format_day(d) for d in dates_of_week}
        return "{0} {7} {1} {2} {3} {4} {5} {6}".format(
            weeknum, *[weekdays.get(i, " ") for i in range(7)])
    yield "{:^30}".format(month)
    weeks = group_by_weeks(dates_of_month)
    yield from map(lambda x:format_week(*x), weeks)
start, end = datetime.date(2016,1,1), datetime.date(2017,1,1)
dates = generate_dates(start, end)
months = group_by_months(dates)
formatted_months = map(lambda x: (format_month(*x)), months)
ncolumns = 3
quarters = grouper(formatted_months, ncolumns)
interleaved = map(lambda x: zip_longest(*x, fillvalue=" "*30), quarters)
formatted = map(lambda x: "\n".join(map(" ".join, x)), interleaved)
list(map(print, formatted))
This is the failing output:
January February March
09 1 2 3 4 5
10 6 7 8 9 10 11 12
11 13 14 15 16 17 18 19
12 20 21 22 23 24 25 26
13 27 28 29 30 31
April May June
22 1 2 3 4
23 5 6 7 8 9 10 11
24 12 13 14 15 16 17 18
25 19 20 21 22 23 24 25
26 26 27 28 29 30
July August September
35 1 2 3
36 4 5 6 7 8 9 10
37 11 12 13 14 15 16 17
38 18 19 20 21 22 23 24
39 25 26 27 28 29 30
October November December
48 1 2 3
49 4 5 6 7 8 9 10
50 11 12 13 14 15 16 17
51 18 19 20 21 22 23 24
52 25 26 27 28 29 30 31
This is the expected output:
January February March
00 1 2 05 1 2 3 4 5 6 09 1 2 3 4 5
01 3 4 5 6 7 8 9 06 7 8 9 10 11 12 13 10 6 7 8 9 10 11 12
02 10 11 12 13 14 15 16 07 14 15 16 17 18 19 20 11 13 14 15 16 17 18 19
03 17 18 19 20 21 22 23 08 21 22 23 24 25 26 27 12 20 21 22 23 24 25 26
04 24 25 26 27 28 29 30 09 28 29 13 27 28 29 30 31
05 31
April May June
13 1 2 18 1 2 3 4 5 6 7 22 1 2 3 4
14 3 4 5 6 7 8 9 19 8 9 10 11 12 13 14 23 5 6 7 8 9 10 11
15 10 11 12 13 14 15 16 20 15 16 17 18 19 20 21 24 12 13 14 15 16 17 18
16 17 18 19 20 21 22 23 21 22 23 24 25 26 27 28 25 19 20 21 22 23 24 25
17 24 25 26 27 28 29 30 22 29 30 31 26 26 27 28 29 30
July August September
26 1 2 31 1 2 3 4 5 6 35 1 2 3
27 3 4 5 6 7 8 9 32 7 8 9 10 11 12 13 36 4 5 6 7 8 9 10
28 10 11 12 13 14 15 16 33 14 15 16 17 18 19 20 37 11 12 13 14 15 16 17
29 17 18 19 20 21 22 23 34 21 22 23 24 25 26 27 38 18 19 20 21 22 23 24
30 24 25 26 27 28 29 30 35 28 29 30 31 39 25 26 27 28 29 30
31 31
October November December
39 1 44 1 2 3 4 5 48 1 2 3
40 2 3 4 5 6 7 8 45 6 7 8 9 10 11 12 49 4 5 6 7 8 9 10
41 9 10 11 12 13 14 15 46 13 14 15 16 17 18 19 50 11 12 13 14 15 16 17
42 16 17 18 19 20 21 22 47 20 21 22 23 24 25 26 51 18 19 20 21 22 23 24
43 23 24 25 26 27 28 29 48 27 28 29 30 52 25 26 27 28 29 30 31
As the itertools.groupby() documentation states:
when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list
That means a group iterator yields nothing once the groupby() object has moved past its group. Here the chunking and interleaving read the group iterators out of order, after the underlying groupby() has already been advanced.
We observe this specific pattern (only the last column is fully displayed) because of the order in which we iterate:
First, the month names for the first line are printed. To produce them, groupby() has to advance up to the last column's month, so the iterators for the earlier columns are consumed and their content is discarded. The groupby() object can only produce the last column's month name after passing over the first columns' data.
Then we print the first week line. The already exhausted iterators for the first columns are filled up automatically with the fill value passed to zip_longest(). Only the last column still provides actual data.
The same happens for each following row of months.
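The effect can be reproduced on a tiny input. This is a minimal sketch with made-up data, unrelated to the calendar code: the first version keeps the group iterators and reads them only after groupby() has moved on, the second stores each group as a list right away.

```python
from itertools import groupby

data = ["a1", "a2", "b1", "b2", "c1"]


def first_letter(s):
    return s[0]


# Lazy: keep the group iterators and read them after groupby() has advanced.
lazy_groups = [(key, group) for key, group in groupby(data, key=first_letter)]
lazy_first = list(lazy_groups[0][1])   # group "a" is no longer visible

# Eager: store each group as a list before asking groupby() for the next one.
eager_groups = [(key, list(group)) for key, group in groupby(data, key=first_letter)]

print(lazy_first)
print(eager_groups)
```

This is why yielding list(v) in group_by_months() fixes the calendar: each month's dates are stored before the next month is requested.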

python - replace last n columns with sum of all files

I am a novice in Python.
I have 8 CSV files, each with 26 columns and 600 rows. I want to take the last 4 columns of each file (columns 22 to 25), sum them element-wise across all the files, and replace those 4 columns in every file with the sums. For example (the data shown here is made up):
new-1.csv:
a b c d e f g h i j k
1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6 6 6
7 7 7 7 7 7 7 7 7 7 7
8 8 8 8 8 8 8 8 8 8 8
9 9 9 9 9 9 9 9 9 9 9
new-2.csv:
a b c d e f g h i j k
11 11 11 11 11 11 11 11 11 11 11
12 12 12 12 12 12 12 12 12 12 12
13 13 13 13 13 13 13 13 13 13 13
14 14 14 14 14 14 14 14 14 14 14
15 15 15 15 15 15 15 15 15 15 15
16 16 16 16 16 16 16 16 16 16 16
17 17 17 17 17 17 17 17 17 17 17
18 18 18 18 18 18 18 18 18 18 18
19 19 19 19 19 19 19 19 19 19 19
Now, I want to sum each element of columns "h, i, j, k" across these 2 files, then replace the last 4 columns of both files with this new sum.
Modified new-1.csv:
a b c d e f g h i j k
1 1 1 1 1 1 1 12 12 12 12
2 2 2 2 2 2 2 14 14 14 14
3 3 3 3 3 3 3 16 16 16 16
4 4 4 4 4 4 4 18 18 18 18
5 5 5 5 5 5 5 20 20 20 20
6 6 6 6 6 6 6 22 22 22 22
7 7 7 7 7 7 7 24 24 24 24
8 8 8 8 8 8 8 26 26 26 26
9 9 9 9 9 9 9 28 28 28 28
Modified new-2.csv:
a b c d e f g h i j k
11 11 11 11 11 11 11 12 12 12 12
12 12 12 12 12 12 12 14 14 14 14
13 13 13 13 13 13 13 16 16 16 16
14 14 14 14 14 14 14 18 18 18 18
15 15 15 15 15 15 15 20 20 20 20
16 16 16 16 16 16 16 22 22 22 22
17 17 17 17 17 17 17 24 24 24 24
18 18 18 18 18 18 18 26 26 26 26
19 19 19 19 19 19 19 28 28 28 28
I am assuming I should use pandas or NumPy for this, but I am not sure how to do it. Any suggestions or hints would be appreciated.
You can do this by just using numpy.
import numpy as np
# list of all the files
file_list = ['foo.csv','bar.csv','baz.csv'] # all 8 files
col_names = ['a','b','c','d','e','f'] # all the names till z if necessary as the first row, else skip this
# initializing a numpy array, for containing sum from last 4 columns
add_cols = np.zeros((600,4))
# iterating over all .csv files
for file in file_list:
    # skiprows will skip the first row and usecols will get values in last 4 cols
    temp = np.loadtxt(file, skiprows=1, delimiter=',', usecols=(22,23,24,25))
    add_cols = np.add(temp,add_cols)
# now again overwriting all the files, substituting the last 4 columns with the sum
for file in file_list:
    # loading the content from file in temp
    temp = np.loadtxt(file, skiprows=1, delimiter=',')
    temp[:,[22,23,24,25]] = add_cols
    # writing the column names first
    with open(file,'w') as p:
        p.write(','.join(col_names)+'\n')
    # now appending final values in temp to the file as csv
    with open(file,'a') as p:
        np.savetxt(p,temp,delimiter=",",fmt="%i")
Now if your file is space-separated rather than comma-separated, remove the delimiter option from all the functions, as the delimiter defaults to whitespace. Also join the header row accordingly.
After loading your csvs using read_csv, you can add the last 4 columns together and then overwrite them:
In [10]:
total = df[df.columns[-4:]].values + df1[df1.columns[-4:]].values
total
Out[10]:
array([[12, 12, 12, 12],
[14, 14, 14, 14],
[16, 16, 16, 16],
[18, 18, 18, 18],
[20, 20, 20, 20],
[22, 22, 22, 22],
[24, 24, 24, 24],
[26, 26, 26, 26],
[28, 28, 28, 28]], dtype=int64)
In [12]:
df[df.columns[-4:]] = total
df1[df1.columns[-4:]] = total
df
Out[12]:
a b c d e f g h i j k
0 1 1 1 1 1 1 1 12 12 12 12
1 2 2 2 2 2 2 2 14 14 14 14
2 3 3 3 3 3 3 3 16 16 16 16
3 4 4 4 4 4 4 4 18 18 18 18
4 5 5 5 5 5 5 5 20 20 20 20
5 6 6 6 6 6 6 6 22 22 22 22
6 7 7 7 7 7 7 7 24 24 24 24
7 8 8 8 8 8 8 8 26 26 26 26
8 9 9 9 9 9 9 9 28 28 28 28
In [13]:
df1
Out[13]:
a b c d e f g h i j k
0 11 11 11 11 11 11 11 12 12 12 12
1 12 12 12 12 12 12 12 14 14 14 14
2 13 13 13 13 13 13 13 16 16 16 16
3 14 14 14 14 14 14 14 18 18 18 18
4 15 15 15 15 15 15 15 20 20 20 20
5 16 16 16 16 16 16 16 22 22 22 22
6 17 17 17 17 17 17 17 24 24 24 24
7 18 18 18 18 18 18 18 26 26 26 26
8 19 19 19 19 19 19 19 28 28 28 28
We need the .values attribute here to get a NumPy array; otherwise pandas will try to align the two frames on their labels instead of adding them position by position.
Once you overwrite them call df.to_csv(file_path) and df1.to_csv(file_path)
In the case of your 8 dfs you can loop over them and aggregate whilst looping:
# take a copy of the first df's last 4 columns
total = df_list[0]
total = total[total.columns[-4:]].values.copy()
for df in df_list[1:]:
    total += df[df.columns[-4:]].values
Then just loop over your dfs again to overwrite:
for df in df_list:
    df[df.columns[-4:]] = total
And then write out again using to_csv.
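Put together, the loop version might look like the sketch below. It uses two small in-memory frames as stand-ins for the 8 files; in practice df_list would presumably be built with pd.read_csv, and each frame written back with to_csv.

```python
import pandas as pd

# Stand-ins for the loaded CSV files (only 5 columns and 2 rows for brevity).
df_list = [
    pd.DataFrame({"a": [1, 2], "h": [1, 2], "i": [1, 2], "j": [1, 2], "k": [1, 2]}),
    pd.DataFrame({"a": [11, 12], "h": [11, 12], "i": [11, 12], "j": [11, 12], "k": [11, 12]}),
]

# Sum the last four columns of every frame as plain NumPy arrays,
# so the addition is positional and no label alignment takes place.
total = sum(df.iloc[:, -4:].to_numpy() for df in df_list)

# Overwrite the last four columns of every frame with the summed values.
for df in df_list:
    df.iloc[:, -4:] = total

print(df_list[0])
print(df_list[1])
```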
