I'm trying to write a file with two rows: the first row holds numbers and the second row holds letters. As an example, I tried this with the alphabet.
list1=['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
list2=list1+list1
abcList = [[],[]]
for i in range(len(list2)):
    i+=1
    if i % 5 == 0:
        if i>=10:
            abcList[0].append(str(i) + ' ')
        else:
            abcList[0].append(str(i) + ' ')
    elif i<=1:
        abcList[0].append(str(i) + ' ')
    else:
        abcList[0].append(' ')
for i,v in enumerate(list2):
    i+=1
    if i > 10:
        abcList[1].append(' '+v+' ')
    else:
        abcList[1].append(v+' ')
print(''.join(abcList[0]))
print(''.join(abcList[1]))
with open('file.txt','w') as file:
    file.write(''.join(abcList[0]))
    file.write('\n')
    file.write(''.join(abcList[1]))
The problem with the above setup is that it's very "hacky" (I don't know if that's the right word). It "works", but it's really just modifying two lists to make sure they stack on top of one another properly. If the list becomes too long, the text wraps around and stacks on itself instead of under the numbers. I'm looking for something a bit less "hacky" that would work for a list of any size (I'm trying to do this without external libraries, so I don't want to use pandas or numpy).
Edit: The output would be:
1 5 10
A B C D E F G H I J...etc.
Edit 2:
Just thought I'd add that I've gotten this far, but I've only been able to make columns, not rows.
list1=['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
list2=list1*2
abcList = [[],[]]
for i in range(len(list2)):
    i+=1
    if i % 5 == 0:
        if i>=5:
            abcList[0].append(str(i))
    elif i<=1:
        abcList[0].append(str(i))
    else:
        abcList[0].append('')
for i,letter in enumerate(list2):
    abcList[1].append(letter)
for number, letters in zip(*abcList):
    print(number.ljust(5), letters)
However, this no longer has the wrapping issues, and the numbers line up with the letters perfectly. The only thing now is to get them from columns to rows.
Output of above is:
1 A
B
C
D
5 E
F
G
H
I
10 J
I mean, you could do something like this:
file_contents = """...""" # The file contents. I'm not the best at file manipulation
def parser(document): # This function will return a nested list
    temp = str(document).split('\n')
    return [[line] for line in temp] # List comprehension
parsed = parser(file_contents)
# And then do what you want with that
Your expected output is a bit inconsistent, since in the first one, you have 1, 6, 11, 16... and in the second: 1, 5, 10, 15.... So I have a couple of possible solutions:
print(''.join(['  ' if n%5 else str(n+1).ljust(2) for n in range(len(list2))]))
print(''.join([c.ljust(2) for c in list2]))
Output:
1 6 11 16 21 26 31 36 41 46 51
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
print(''.join(['  ' if n%5 else str(n).ljust(2) for n in range(len(list2))]))
print(''.join([c.ljust(2) for c in list2]))
Output:
0 5 10 15 20 25 30 35 40 45 50
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
print(''.join(['1 ']+['  ' if n%5 else str(n).ljust(2) for n in range(len(list2))][1:]))
print(''.join([c.ljust(2) for c in list2]))
Output:
1 5 10 15 20 25 30 35 40 45 50
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
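The same idea can be wrapped in a small helper so that the label interval and the column width are not hard-coded. This is only a sketch: `ruler_rows`, `step`, and `width` are names made up here for illustration, and it follows the third variant above (label the first item, then every fifth one).

```python
def ruler_rows(items, step=5, width=None):
    """Return (numbers_row, items_row) as two aligned strings.

    Positions are 1-based; position 1 and every multiple of `step`
    get a label, all other positions stay blank.
    """
    items = [str(item) for item in items]
    n = len(items)
    if width is None:
        # Wide enough for the longest item or the largest label, plus a gap.
        width = max([len(s) for s in items] + [len(str(n))]) + 1
    labels = [str(pos) if pos == 1 or pos % step == 0 else ""
              for pos in range(1, n + 1)]
    numbers_row = "".join(label.ljust(width) for label in labels)
    items_row = "".join(item.ljust(width) for item in items)
    return numbers_row, items_row


letters = list("ABCDEFGHIJKL")
numbers_row, items_row = ruler_rows(letters)
print(numbers_row.rstrip())
print(items_row.rstrip())
```

Because every cell is padded to the same width, a label always starts in the same column as the item it belongs to, however long the list is.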
If you want to keep variable-width strings aligned, you could use string formatting with a width equal to the maximum of the widths of the individual items in that position. (This example will work with any number of lists, by the way.)
list1 = ["", "5", "", "10", "", "4"]
list2 = ["A", "B", "C", "D", "EE", "F"]
lists = [list1, list2]
widths = [max(map(len, t)) for t in zip(*lists)]
for lst in lists:
    line = " ".join("{val:{width}s}".format(val=val, width=width)
                    for val, width in zip(lst, widths))
    print(line)
gives:
5 10 4
A B C D EE F
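The question also mentions that long rows wrap in the output. One possible way to handle that, sketched here on top of the per-column width technique above, is to cut the columns into blocks and print each block as its own group of rows. `format_aligned` and `per_line` are hypothetical names, and the sketch assumes all lists have the same length.

```python
def format_aligned(lists, per_line=None):
    """Format several equal-length lists as aligned rows.

    If `per_line` is given, the columns are split into blocks of that
    many columns, and each block becomes its own group of rows.
    """
    columns = list(zip(*lists))  # one tuple per column
    widths = [max(len(str(v)) for v in col) for col in columns]
    per_line = per_line or len(columns) or 1
    blocks = []
    for start in range(0, len(columns), per_line):
        stop = start + per_line
        rows = []
        for lst in lists:
            cells = [str(v).ljust(w)
                     for v, w in zip(lst[start:stop], widths[start:stop])]
            rows.append(" ".join(cells).rstrip())
        blocks.append("\n".join(rows))
    return "\n\n".join(blocks)


numbers = ["1", "", "", "", "5", "", "", "", "", "10"]
letters = list("ABCDEFGHIJ")
print(format_aligned([numbers, letters], per_line=5))
```

The function returns a single string, so the same text can be passed to `file.write` instead of `print`.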
I have a string column that I wish to split into three columns depending on the string. The column looks like this
full_string
x a b c
d e
m n o
y m n
y d e f
d e f
x and y are prefixes. I want to convert this column into three columns
prefix_string first_string last_string
x a c
d e
m o
y m n
y d f
d f
I have this code
df['first_string'] = df[df['full_string'].str.split().str.len() == 2]['full_string'].str.split().str[0]
df['first_string'] = df[df['full_string'].str.split().str.len() > 2]['full_string'].str.split().str[1]
df['last_string'] = df['full_string'].str.split().str[-1]
prefix_string = ['x', 'y']
df['prefix_string'] = df[df['full_string'].str.split().str[0].isin(prefix_string)]['full_string'].str.split().str[0]
This code isn't working correctly for first_string. Is there a way to extract the first string irrespective of prefix_string and the string length?
Try with numpy.where and pandas.Series.str.split:
import numpy as np
prefix_str = ["x", "y"]
res = df["full_string"].str.split(" ", expand=True).ffill(axis=1)
res["last_string"] = res.iloc[:, -1]
res["prefix_string"] = np.where(res[0].isin(prefix_str), res[0], "")
res["first_string"] = np.where(res["prefix_string"].ne(""), res[1], res[0])
res = res[["prefix_string", "first_string", "last_string"]]
Outputs:
prefix_string first_string last_string
0 x a c
1 d e
2 m o
3 y m n
4 y d f
5 d f
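The rule behind that output can also be spelled out in plain Python, which may help when checking individual rows by hand. This is not the answer's pandas code: `split_parts` is a hypothetical helper, and it assumes whitespace-separated tokens with `x` and `y` as the only prefixes.

```python
PREFIXES = {"x", "y"}  # assumed prefix tokens, taken from the question


def split_parts(full_string, prefixes=PREFIXES):
    """Return (prefix, first, last) for one whitespace-separated string."""
    tokens = full_string.split()
    if not tokens:
        return "", "", ""
    prefix = tokens[0] if tokens[0] in prefixes else ""
    rest = tokens[1:] if prefix else tokens
    if not rest:
        # Only a prefix was present; there is no first or last string.
        return prefix, "", ""
    return prefix, rest[0], rest[-1]


rows = ["x a b c", "d e", "m n o", "y m n", "y d e f", "d e f"]
for row in rows:
    print(split_parts(row))
```

The first string is simply the first token after an optional prefix, so it no longer depends on how many tokens the row has.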
Instead of these lines in your code above:
df['first_string'] = df[df['full_string'].str.split().str.len() == 2]['full_string'].str.split().str[0]
df['first_string'] = df[df['full_string'].str.split().str.len() > 2]['full_string'].str.split().str[1]
use the split(), contains() and fillna() methods:
df['first_string']=df['full_string'].str.split(expand=True).loc[~df['full_string'].str.split(expand=True)[0].str.contains('x|y'),0]
df['first_string']=df['first_string'].fillna(df['full_string'].str.split(expand=True)[1])
Output of df:
full_string first_string last_string prefix_string
0 x a b c a c x
1 d e d e NaN
2 m n o m o NaN
3 y m n m n y
4 y d e f d f y
5 d e f d f NaN
Here is what my dataframe looks like:
0 M M W B k a D G 247.719248 39.935064 12.983612 177.537373 214.337385 70.248041 78.162404 215.383443
1 n a Y j A N Q m 39.014265 64.053771 13.677425 169.164911 153.225780 31.095511 198.805600 179.653853
2 j z v I n N I X 152.177940 50.524997 79.063318 181.993409 51.367824 19.294708 217.844628 166.896151
3 n w a Y G B y O 243.468930 92.694170 200.305038 249.760627 156.588164 200.031428 146.933709 202.202242
4 R i h L J a q S 122.006004 34.979958 151.963992 116.795194 74.713682 252.979874 34.272430 45.334396
5 m Y n r u t t b 86.097651 229.911157 75.242197 214.069558 246.390175 235.507510 125.431980 90.467756
6 d i u d f Q a q 135.740363 13.388095 107.297373 10.520204 118.578496 101.770257 177.253815 78.800327
7 n F A x H u b y 55.497867 210.402998 191.356683 6.438180 85.967328 64.461602 157.265270 213.673103
8 q h w i S B h i 253.696469 168.964278 31.592088 160.404929 241.434909 232.280512 116.353252 11.540209
9 a z s d Y z l B 50.440346 80.492069 64.991017 88.663195 155.993675 85.967207 120.467390 71.219658
10 A U W m y R k K 156.153985 15.862058 95.013242 48.339397 235.440190 160.565380 236.421396 59.981690
11 z K K w o c n l 56.310181 210.101571 173.887020 181.040997 193.653296 250.875304 81.096499 234.868844
I want to append a row containing the sum of each numeric column, but the dataframe also contains string values.
I have tried this solution
df.loc['Total'] = df.select_dtypes(include=['float64', 'int64']).sum(axis=0)
But I am getting the sum in the string column as well like this
0 M M W B k a D G 247.719248 39.935064 12.983612 177.537373 214.337385 70.248041 78.162404 215.383443
1 n a Y j A N Q m 39.014265 64.053771 13.677425 169.164911 153.225780 31.095511 198.805600 179.653853
2 j z v I n N I X 152.177940 50.524997 79.063318 181.993409 51.367824 19.294708 217.844628 166.896151
3 n w a Y G B y O 243.468930 92.694170 200.305038 249.760627 156.588164 200.031428 146.933709 202.202242
4 R i h L J a q S 122.006004 34.979958 151.963992 116.795194 74.713682 252.979874 34.272430 45.334396
5 m Y n r u t t b 86.097651 229.911157 75.242197 214.069558 246.390175 235.507510 125.431980 90.467756
6 d i u d f Q a q 135.740363 13.388095 107.297373 10.520204 118.578496 101.770257 177.253815 78.800327
7 n F A x H u b y 55.497867 210.402998 191.356683 6.438180 85.967328 64.461602 157.265270 213.673103
8 q h w i S B h i 253.696469 168.964278 31.592088 160.404929 241.434909 232.280512 116.353252 11.540209
9 a z s d Y z l B 50.440346 80.492069 64.991017 88.663195 155.993675 85.967207 120.467390 71.219658
10 A U W m y R k K 156.153985 15.862058 95.013242 48.339397 235.440190 160.565380 236.421396 59.981690
11 z K K w o c n l 56.310181 210.101571 173.887020 181.040997 193.653296 250.875304 81.096499 234.868844
Total 1598.32 1211.31 1197.37 1604.73 1927.69 1705.08 1690.31 1570.02 1598.323248 1211.310187 1197.373003 1604.727974 1927.690905 1705.077334 1690.308374 1570.021673
Can I keep some other value in the string columns of the Total row? How should it be done?
Any help would be appreciated. I am new to pandas.
I have written the following code:
def contalpha(n):
    num = 65
    for i in range(0, n):
        for j in range(0, i+1):
            ch = chr(num)
            print(ch, end=" ")
            num = num +1
        print("\r")
n = 7
contalpha(n)
The output is:
A
B C
D E F
G H I J
K L M N O
P Q R S T U
V W X Y Z [ \
but what I want is:
A B C D E
A B C D
A B C
A B
A
How can I do that?
I'd advise against using chr. ASCII codes can be confusing; instead, just use a string of all the capital ASCII letters (a string is a sequence of characters, and this one can be handily found in the string module).
import string
def contalpha(n):
    for i in range(n, 0, -1):
        print(*string.ascii_uppercase[:i], sep=' ')
contalpha(5)
outputs:
A B C D E
A B C D
A B C
A B
A
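A variant that builds the rows as strings instead of printing them directly may be easier to test or to write to a file. `alphabet_rows` is a name chosen here for illustration, and the range check is an added assumption: without it, slicing would silently stop at Z for any n above 26.

```python
import string


def alphabet_rows(n):
    """Return rows such as 'A B C' of decreasing length, longest first."""
    if not 0 <= n <= len(string.ascii_uppercase):
        raise ValueError("n must be between 0 and 26")
    return [" ".join(string.ascii_uppercase[:i]) for i in range(n, 0, -1)]


print("\n".join(alphabet_rows(5)))
```

Using `" ".join` also avoids the trailing space that `print(ch, end=" ")` leaves at the end of every row.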
You need to reverse the range so that it starts from the longest row: range(0, n)[::-1]. Then you need to reset num = 65 every time you start a new row, so that each row starts from A.
There you go:
def contalpha(n):
    num = 65
    for i in range(0, n)[::-1]:
        for j in range(0, i+1):
            ch = chr(num)
            print(ch, end=" ")
            num = num +1
        num = 65
        print("\r")
n = 7
contalpha(n)
Output
A B C D E F G
A B C D E F
A B C D E
A B C D
A B C
A B
A
Try this:
def contalpha(n):
    for i in range(n, 0, -1):
        num = 65
        for j in range(0, i):
            ch = chr(num)
            print(ch, end=" ")
            num = num +1
        print("\r")
n = 7
contalpha(n)
First, you need to set num to 65 on every pass of the outer loop so that the letters start from A again. Second, you should reverse the outer loop's range so that it prints from the longest row to the shortest.
output:
A B C D E F G
A B C D E F
A B C D E
A B C D
A B C
A B
A
In Python I want to read from a large file:
def aggregate(file_input):
    import fileinput
    reviews = []
    with open(file_input.replace(".txt", "_aggregated.txt"), "w") as outp:
        currComp = ""
        outp.write("Business;Stars_In_Sequence")
        for line in fileinput.input(file_input):
            reviews.append(MyReview(line))
            if(currComp != reviews[-1].getCompany()):
                currComp = reviews[-1].getCompany()
                outp.write("\n" + currComp + ";" + reviews[-1].getStars())
                outp.flush()
            else:
                outp.write(reviews[-1].getStars())
                outp.flush()
The file looks like this:
Business;User;Review_Stars;Date;Length;Votes_Cool;Votes_Funny;Votes_Useful;
0DI8Dt2PJp07XkVvIElIcQ;jkrzTC5P5QGJRoKECzcleQ;5;2014-03-11;421;0;1;0
0DI8Dt2PJp07XkVvIElIcQ;cK78PTjb65kdmRL9BnEdoQ;5;2014-03-29;190;0;1;0
It works fine if I use only a small part of the file, returning the right output:
Business;Stars_In_Sequence
Business;R
0DI8Dt2PJp07XkVvIElIcQ;55555455555555515
LTlCaCGZE14GuaUXUGbamg;555555555
EDqCEAGXVGCH4FJXgqtjqg;3324133
However, if I use the original file, it returns this, and I can't figure out why:
Business;Stars_In_Sequence
ÿþB u s i n e s s ;
0 D I 8 D t 2 P J p 0 7 X k V v I E l I c Q ;
L T l C a C G Z E 1 4 G u a U X U G b a m g ;
E D q C E A G X V G C H 4 F J X g q t j q g ;
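The `ÿþ` at the start and the gap after every character are what UTF-16 little-endian text looks like when it is read as a single-byte encoding: `ÿþ` is the byte order mark, and each gap is a NUL byte. Assuming that is what is happening here, passing an explicit encoding when opening the file should help. The sketch below writes a small UTF-16 sample to a temporary file so that it can run on its own; `read_lines` and the sample content are illustrative and are not part of the original program.

```python
import os
import tempfile


def read_lines(path, encoding="utf-16"):
    """Yield lines from a text file, decoded with the given encoding.

    The "utf-16" codec consumes the byte order mark and uses it to pick
    the byte order, so no BOM characters leak into the first line.
    """
    with open(path, "r", encoding=encoding) as handle:
        for line in handle:
            yield line.rstrip("\n")


sample = (
    "Business;User;Review_Stars\n"
    "0DI8Dt2PJp07XkVvIElIcQ;jkrzTC5P5QGJRoKECzcleQ;5\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reviews.txt")
    # Stand-in for the large original file, saved as UTF-16 with a BOM.
    with open(path, "w", encoding="utf-16") as handle:
        handle.write(sample)
    lines = list(read_lines(path))

print(lines)
```

In `aggregate`, the equivalent change would be to iterate over `open(file_input, encoding="utf-16")` instead of `fileinput.input(file_input)`, again assuming the file really is UTF-16.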
My code looks like this:
import csv
mesta=["Ljubljana","Kranj","Skofja Loka","Trzin"]
opis=["ti","mene","ti mene","ne ti mene"]
delodajalci=["GENI","MOJEDELO","MOJADELNICA","HSE"]
ime=["domen","maja","andraz","sanja"]
datum=["2.1.2014","5.10.2014","11.12.2014","5.5.2014"]
with open('sth.csv','w',newline='') as csvfile:
    zapis = csv.writer(csvfile, delimiter=' ')
    dolzina=len(datum)
    i=0
    while i<dolzina:
        zapis.writerows([ime[i]+","+delodajalci[i]+","+opis[i]+","+datum[i]+","+mesta[i]])
        i+=1
For some strange reason, my result looks like this:
d o m e n G E N I t i 2 . 1 . 2 0 1 4 L j u b l j a n a
m a j a M O J E D E L O m e n e 5 . 1 0 . 2 0 1 4 K r a n j
a n d r a z M O J A D E L N I C A t i " " m e n e 1 1 . 1 2 . 2 0 1 4 S k o f j a " " L o k a
s a n j a H S E n e " " t i " " m e n e 5 . 5 . 2 0 1 4 T r z i n
So a 4 rows x 5 column table.
I would like advice on how to make these white spaces and " " go away, so that, for instance, d o m e n would be domen and S k o f j a " " L o k a would be Skofja Loka.
I would be forever grateful for your help. Oh, and if it is possible to do it without any other modules, that would be even better, since I have a problem installing them on this computer as well :(
Thank you for your time.
import csv
mesta=["Ljubljana","Kranj","Skofja Loka","Trzin"]
opis=["ti","mene","ti mene","ne ti mene"]
delodajalci=["GENI","MOJEDELO","MOJADELNICA","HSE"]
ime=["domen","maja","andraz","sanja"]
datum=["2.1.2014","5.10.2014","11.12.2014","5.5.2014"]
with open('sth.csv','w',newline='') as csvfile:  # newline='' as the csv docs recommend
    zapis = csv.writer(csvfile)
    zapis.writerows(zip(ime,delodajalci,opis,datum,mesta))
creates the output:
domen,GENI,ti,2.1.2014,Ljubljana
maja,MOJEDELO,mene,5.10.2014,Kranj
andraz,MOJADELNICA,ti mene,11.12.2014,Skofja Loka
sanja,HSE,ne ti mene,5.5.2014,Trzin
In what you had, writerows was getting a list containing a single string (with commas between the values you wanted). You wanted it to just print that string out, but a csv writer expects each row to be a sequence of fields and writes a separator between them. Given only a string, it treats the string as that sequence, so each character is written as its own field, followed by the separator.
In what I've done, writerows receives a series of tuples. Each tuple comes from zip, so it holds the i-th entries of the arguments zip was given.
Edit: removed dolzina and i=0, since they are no longer needed.
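The difference between the two calls can be checked in memory without touching a file. This is a small sketch using `io.StringIO` in place of the real file, with a shortened copy of the data from the question; the variable names `wrong` and `right` are illustrative.

```python
import csv
import io

names = ["domen", "maja"]
cities = ["Ljubljana", "Skofja Loka"]

# Passing a string as a row: csv iterates over it, so every character
# becomes its own field and the delimiter is written between them.
wrong = io.StringIO()
csv.writer(wrong, delimiter=" ").writerow(names[0] + "," + cities[0])

# Passing a sequence of fields: one field per element, one row per tuple.
right = io.StringIO()
csv.writer(right).writerows(zip(names, cities))

print(repr(wrong.getvalue()))
print(repr(right.getvalue()))
```

With a space as the delimiter, a field that is itself a space has to be quoted, which would explain the `" "` entries in the original output.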