More than the specified no. of columns are renamed with Pandas - python

I joined multiple files using Pandas join() but now want to rename a few of the duplicate columns. But when I specify the indices to rename a few columns, more than the specified no. of columns are being renamed.
Input CSV files have the format
F1.csv
A,B,C,D,E,F
1,4,5,6,7,8
2,1,3,4,5,6
3,4,1,5,1,8
4,5,1,5,6,7
F2.csv
A,B,C,M,N
1,4,5,6,7
2,1,3,4,5
3,4,1,5,1
4,5,1,5,6
F3.csv
A,B,C,X,Y,Z
1,4,5,6,7,8
2,1,3,4,5,6
3,4,1,5,1,8
4,5,1,5,6,7
F4.csv
A,B,C,T,Q,R
1,4,5,6,7,8
2,1,3,4,5,6
3,4,1,5,1,8
4,5,1,5,6,7
And my code:
import pandas

data = None
for f in filelist:
    if data is None:
        data = pandas.read_csv(f, index_col='A')
    else:
        data = data.join(pandas.read_csv(f, index_col='A'), lsuffix='_left', rsuffix='_right', how=join_type)
print(list(data))
new_names = ["HH", "XX"]
old_names = data.columns[[0, 1]]
data.rename(columns=dict(zip(old_names, new_names)), inplace=True)
print(list(data))
The first print gives the output
['B_left', 'C_left', 'D', 'E', 'F', 'B_right', 'C_right', 'M', 'N', 'B_left', 'C_left', 'X', 'Y', 'Z', 'B_right', 'C_right', 'T', 'Q', 'R']
And print after renaming gives
['HH', 'XX', 'D', 'E', 'F', 'B_right', 'C_right', 'M', 'N', 'HH', 'XX', 'X', 'Y', 'Z', 'B_right', 'C_right', 'T', 'Q', 'R']
My problem is that instead of renaming only the columns at indices 0 and 1, it also renames the columns at indices 9 and 10 (the second 'B_left' and 'C_left'). Could anyone help me with this? I am new to Pandas and not able to figure this out. Thanks,
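One likely explanation, offered as a sketch rather than as an answer from the thread: rename() maps columns by label, not by position, so every column whose label matches a key in the mapping is renamed. Because the repeated joins produced 'B_left' and 'C_left' twice, both copies change. Assigning a new list of labels by position avoids this. The small frame below is a hypothetical stand-in for the joined data.

```python
import pandas as pd

# Hypothetical frame that reproduces the duplicated labels from the joins
cols = ['B_left', 'C_left', 'D', 'B_right', 'C_right', 'M', 'B_left', 'C_left', 'X']
data = pd.DataFrame([[0] * len(cols)], columns=cols)

# Rename strictly by position: copy the labels, change only slots 0 and 1
new_cols = data.columns.tolist()
new_cols[0] = "HH"
new_cols[1] = "XX"
data.columns = new_cols

print(list(data))
```

A longer-term fix would be to give each joined file its own suffix, so that duplicate labels never arise in the first place.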

Related

Iterate over columns in a row in pandas

I have a csv file with following headers
question_no,question,A,B,C,D
where A,B,C,D are options for a question. The number of options for a question can vary from file to file(for eg. 4 - A,B,C,D 6 - A,B,C,D,E,F). I am trying to get the values of options in the row using the following code.
data = pd.read_csv(request.FILES['myfile'])
optioncodes = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
col_nos = len(data.columns)
opt_lmt = col_nos - 2
for (idx, row) in data.iterrows():
    print(row.question_no)
    for j in range(opt_lmt):
        print(row.optioncodes[j])
but I am getting the error
'Series' object has no attribute 'optioncodes'
How can I achieve this?
The dot accessor (df.col_name or series.index_value) is only a shortcut for the named element accessor (df['col_name'] or series['index_value']). It is valid only under two conditions:
- the name must be written literally in the code, whereas you want it to come from a variable
- the name must be a valid identifier (no spaces or special characters)
What you want here is just:
...
    for j in range(opt_lmt):
        print(row[optioncodes[j]])
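As a further sketch, the option columns can be taken straight from the header instead of from a hard-coded alphabet, which copes with files that have a different number of options. The frame below is a hypothetical stand-in for the uploaded CSV, and the name options_by_question is introduced only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the uploaded CSV file
data = pd.DataFrame({
    "question_no": [1, 2],
    "question": ["First?", "Second?"],
    "A": ["a1", "a2"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2"],
})

# Every column after the first two is treated as an option column
option_cols = list(data.columns[2:])

options_by_question = {}
for _, row in data.iterrows():
    # Bracket access works with a variable column name; dot access does not
    options_by_question[row["question_no"]] = [row[col] for col in option_cols]

print(options_by_question)
```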

Python script to generate a word with specific structure and letter combinations

I want to write a really short script that will help me generate a random/nonsense word with the following qualities:
-Has 8 letters
-First letter is "A"
-Second and Fourth letters are random letters
-Fifth letter is a vowel
-Sixth and Seventh letters are random letters and are the same
-Eighth letter is a vowel that's not "a"
This is what I have tried so far (using all the info I could find and understand online)
firsts = 'A'
seconds = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
thirds = ['a', 'e', 'i', 'o', 'u', 'y']
fourths = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
fifths = ['a', 'e', 'i', 'o', 'u', 'y']
sixths = sevenths = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
eighths = ['e', 'i', 'o', 'u', 'y']
print [''.join(first, second, third, fourth, fifth)
       for first in firsts
       for second in seconds
       for third in thirds
       for fourth in fourths
       for fifth in fifths
       for sixth in sixths
       for seventh in sevenths
       for eighth in eighths]
However it keeps showing a SyntaxError: invalid syntax after the for and now I have absolutely no idea how to make this work. If possible please look into this for me, thank you so much!
The function you need for picking a random letter is random.choice. Pass it a list and it returns a random element from that list. It also works with strings, because a string is a sequence of characters. To make your life easier, use the string module: string.ascii_lowercase is a string of all the letters from a to z, so you don't have to type them out. Lastly, you don't need loops to join strings together. Keep it simple and just add them together.
import string
from random import choice
first = 'A'
second = choice(string.ascii_lowercase)
third = choice(string.ascii_lowercase)
fourth = choice(string.ascii_lowercase)
fifth = choice("aeiou")
sixthSeventh = choice(string.ascii_lowercase)
eighth = choice("eiou")
word = first + second + third + fourth + fifth + sixthSeventh + sixthSeventh + eighth
print(word)
Try this:
import random
sixth=random.choice(sixths)
s='A'+random.choice(seconds)+random.choice(thirds)+random.choice(fourths)+random.choice(fifths)+sixth+sixth+random.choice(eighths)
print(s)
Output:
Awixonno
Ahiwojjy
etc
There are several things to consider. First, the str.join() method takes in an iterable (e.g. a list), not a bunch of individual elements. Doing
''.join([first, second, third, fourth, fifth])
fixes the program in this respect. If you are using Python 3, print() is a function, and so you should add parentheses around the entire list comprehension.
With the syntax out of the way, let's get to a more interesting problem: your program constructs every possible word (82255680 of them!). This takes a lot of time and memory. What you probably want is to pick just one. You could of course construct them all first and then pick one at random, but it is far cheaper to pick one letter at random from each of firsts, seconds, etc. and then join these. All together:
import random
firsts = ['A']
seconds = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
thirds = ['a', 'e', 'i', 'o', 'u', 'y']
fourths = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
fifths = ['a', 'e', 'i', 'o', 'u', 'y']
sixths = sevenths = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
eighths = ['e', 'i', 'o', 'u', 'y']
# Choose the sixth letter once and reuse it, so the sixth and seventh letters match
sixth = random.choice(sixths)
result = ''.join([
    random.choice(firsts),
    random.choice(seconds),
    random.choice(thirds),
    random.choice(fourths),
    random.choice(fifths),
    sixth,
    sixth,
    random.choice(eighths),
])
print(result)
To improve the code from here, try to:
Find a way to generate the "data" in a neater way than writing it out explicitly. As an example:
import string
seconds = list(string.ascii_lowercase) # you don't even need list()!
Instead of having separate variables firsts, seconds, etc., collect them into a single variable, e.g. a single list in which each original list becomes one str containing all of its characters.
This will implement what you describe. You can make the code neater by putting the choices into an overall list rather than have several different variables, but you will have to explicitly deal with the fact that the sixth and seventh letters are the same; they will not be guaranteed to be the same simply because there are the same choices available for each of them.
The list choices_list could contain sub-lists per your original code, but as you are choosing single characters it will work equally with strings when using random.choice and this also makes the code a bit neater.
import random
choices_list = [
    'A',
    'abcdefghijklmnopqrstuvwxyz',
    'aeiouy',
    'abcdefghijklmnopqrstuvwxyz',
    'aeiouy',
    'abcdefghijklmnopqrstuvwxyz',
    'eiouy'
]
letters = [random.choice(choices) for choices in choices_list]
word = ''.join(letters[:6] + letters[5:]) # here the 6th letter gets repeated
print(word)
Some example outputs:
Alaeovve
Aievellu
Ategiwwo
Aeuzykko
Here's the syntax fix:
print(["".join([first, second, third])
       for first in firsts
       for second in seconds
       for third in thirds])
This method might take up a lot of memory.
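If every combination really is needed, one way to keep memory use down is to generate the words lazily instead of building the whole list. The sketch below assumes itertools.product as a stand-in for the nested for clauses and uses shortened, hypothetical letter pools so it runs quickly.

```python
from itertools import islice, product

# Shortened, hypothetical letter pools (the real ones are much longer)
firsts = "A"
seconds = "abc"
thirds = "aeiouy"

# A generator expression yields one word at a time instead of storing them all
words = ("".join(parts) for parts in product(firsts, seconds, thirds))

# Take only the first five words; the rest are never constructed
first_five = list(islice(words, 5))
print(first_five)
```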

Unique elements of sublists depending on specific value in sublist

I am trying to select unique datasets from a very large, quite inconsistent list.
My dataset RawData consists of lists of strings of different lengths.
Some items occur many times, for example: ['a','b','x','15/30']
The key for comparing items is always the last string, for example '15/30'.
The goal is to get a list UniqueData in which each key occurs only once (I want to keep the order).
Dataset:
RawData = [['a','b','x','15/30'],['d','e','f','g','h','20/30'],['w','x','y','z','10/10'],['a','x','c','15/30'],['i','j','k','l','m','n','o','p','20/60'],['x','b','c','15/30']]
My desired solution Dataset:
UniqueData = [['a','b','x','15/30'],['d','e','f','g','h','20/30'],['w','x','y','z','10/10'],['i','j','k','l','m','n','o','p','20/60']]
I tried many possible solutions, for instance:
for index, elem in enumerate(RawData): and appending to a new list if.....
for element in list does not work, because the items are not exactly the same.
Can you help me find a solution to my problem?
Thanks!
A good way to detect duplicates is to track the keys you have already seen in a set. For each list, look at its last element: if it is already in the set unique, do nothing; otherwise add it to unique and append the list to the result list (called new here).
Try this.
new = []
unique = set()
for lst in RawData:
    if lst[-1] not in unique:
        unique.add(lst[-1])
        new.append(lst)
print(new)
# [['a', 'b', 'x', '15/30'],
#  ['d', 'e', 'f', 'g', 'h', '20/30'],
#  ['w', 'x', 'y', 'z', '10/10'],
#  ['i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', '20/60']]
You could set up a new list for the unique data and another to track the keys you have seen so far. Then, as you loop through the data, if you have not seen the last element of a list before, append the list to the unique data and add its last element to the seen list.
RawData = [['a', 'b', 'x', '15/30'], ['d', 'e', 'f', 'g', 'h', '20/30'], ['w', 'x', 'y', 'z', '10/10'],
           ['a', 'x', 'c', '15/30'], ['i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', '20/60'], ['x', 'b', 'c', '15/30']]
seen = []
UniqueData = []
for data in RawData:
    if data[-1] not in seen:
        UniqueData.append(data)
        seen.append(data[-1])
print(UniqueData)
OUTPUT
[['a', 'b', 'x', '15/30'], ['d', 'e', 'f', 'g', 'h', '20/30'], ['w', 'x', 'y', 'z', '10/10'], ['i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', '20/60']]
RawData = [['a','b','x','15/30'],['d','e','f','g','h','20/30'],['w','x','y','z','10/10'],['a','x','c','15/30'],['i','j','k','l','m','n','o','p','20/60'],['x','b','c','15/30']]
seen = []
seen_indices = []
for index, item in enumerate(RawData):
    # index -> position in RawData, item -> individual list
    if item[-1] not in seen:
        seen.append(item[-1])
    else:
        seen_indices.append(index)
# Delete from the end so that the earlier indices stay valid
for index in sorted(seen_indices, reverse=True):
    del RawData[index]
print(RawData)
Using a set to filter out entries for which the key has already been seen is the most efficient way to go.
Here's a one liner example using a list comprehension with internal side effects:
UniqueData = [rd for seen in [set()] for rd in RawData if not(rd[-1] in seen or seen.add(rd[-1])) ]
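A variation on the same idea, given here only as a sketch: a plain dict keyed on the last element can do the bookkeeping, because dicts preserve insertion order in Python 3.7 and later. The name first_by_key is introduced for illustration; setdefault stores a row only the first time its key is seen.

```python
RawData = [['a', 'b', 'x', '15/30'], ['d', 'e', 'f', 'g', 'h', '20/30'],
           ['w', 'x', 'y', 'z', '10/10'], ['a', 'x', 'c', '15/30'],
           ['i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', '20/60'], ['x', 'b', 'c', '15/30']]

# Map each key (the last string) to the first row that carries it
first_by_key = {}
for row in RawData:
    first_by_key.setdefault(row[-1], row)  # later duplicates are ignored

# Assumes Python 3.7+, where dict values come back in insertion order
UniqueData = list(first_by_key.values())
print(UniqueData)
```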

Using .join() function on a set incorrectly reorders it [duplicate]

I have a set of characters (x) that is ordered as I need it:
{'a',
'b',
'c',
'd',
'e',
'f',
'g',
'h',
'i',
'j',
'k',
'l',
'm',
'n',
'o',
'p',
'q',
'r',
's',
't',
'u',
'v',
'w',
'x',
'y',
'z'}
However, when I attempt to convert these back to a string using the .join() function:
return ' '.join(x)
The characters are being randomly reordered:
'c g e w i z n t l a q h p d f v m k b x u r j o y'
Any ideas as to what's going on here?
Sets don't promise to maintain order. Sometimes they appear to, but you shouldn't depend on it. If the order matters, consider using a list instead:
alpha = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
Then:
return " ".join(alpha)
However, if you only care about the result being in alphabetical order and want to keep using a set, you can sort it before calling join:
return " ".join(sorted(x))
Good luck!
Sets are unordered, and so were dictionaries before Python 3.7 (since 3.7, dicts preserve insertion order; sets still do not). Their implementation is based on hash tables and can be a little complicated. Suffice it to say that the order in which you put elements into a set does not determine the order in which they are stored.
You can use OrderedDict, or you can convert the set to a list, sort it, and go from there.
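The two options mentioned in these answers can be sketched as follows. The example strings are hypothetical; dict.fromkeys is used here as an order-preserving way to drop duplicates, assuming Python 3.7 or later.

```python
import string

# Option 1: keep the set, but sort it before joining
letters = set(string.ascii_lowercase)
alphabetical = " ".join(sorted(letters))
print(alphabetical)

# Option 2: when the original order matters (not just alphabetical order),
# deduplicate with dict keys, which keep insertion order on Python 3.7+
text = "hello world"
unique_in_order = "".join(dict.fromkeys(text))
print(unique_in_order)
```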

How to convert a numpy.ndarray type into a list?

I want to read a MAT-file in Python and then export the data to a database. To do this I need the data as a Python list. I wrote the code below:
import scipy.io as si
import csv
a = si.loadmat('matfilename')
b = a['variable']
list1=b.tolist()
The variable has 1 row and 15 columns. When I print list1, I get the output below. (It is indeed a list, but one that contains only a single element, so list1[0] gives the same nested result.)
[[array(['A'],
dtype='<U13'), array(['B'],
dtype='<U14'), array(['C'],
dtype='<U6'), array(['D'],
dtype='<U4'), array(['E'],
dtype='<U10'), array(['F'],
dtype='<U13'), array(['G'],
dtype='<U11'), array(['H'],
dtype='<U9'), array(['I'],
dtype='<U16'), array(['J'],
dtype='<U18'), array(['K'],
dtype='<U16'), array(['L'],
dtype='<U16'), array(['M'],
dtype='<U16'), array(['N'],
dtype='<U14'), array(['O'],
dtype='<U13')]]
While the form that I expect is:
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']
Does anyone know what the problem is?
In my experience, that is just how MATLAB files are structured when loaded: nothing but nested arrays.
You can create the list yourself:
>>> [x[0][0] for x in list1[0]]
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']
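The same flattening can be tried without a MAT-file. The sketch below builds a small object array by hand as a hypothetical stand-in for what loadmat returned in the question (a 1-by-N array whose elements are one-element string arrays), then flattens it with ravel().

```python
import numpy as np

# Hypothetical stand-in for a['variable']: a 1x3 object array of 1-element arrays
cell = np.empty((1, 3), dtype=object)
for i, label in enumerate(["A", "B", "C"]):
    cell[0, i] = np.array([label])

# ravel() walks the outer array; item[0] unwraps each inner one-element array
flat = [str(item[0]) for item in cell.ravel()]
print(flat)
```

scipy.io.loadmat also accepts squeeze_me=True, which may remove some of the singleton dimensions at load time; whether that alone yields a flat list depends on how the file was saved.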
