Currently I have a DataFrame that looks like this:

name   value  position  length
table  5.0    1234567   .25
chair  8.0    789012    5
couch  6.0    345678    5
bed    5.3    1901234   .05
What I need to do first is edit the position column by adding a "+" before the last two digits, so the first number should become 12345+67.
I think I would have to break up every number in position, measure its length, and then insert the "+" sign at the length of the value minus 2.
Adding the "+" sign causes the value to align left in Excel, so I also need to make sure it is aligned right.
I tried using df = df.style.set_properties(subset=["position"], **{'text-align': 'right'})
but this doesn't work, because it appears I need columns that have a similar name.
What would be another way to get both of these done?
Thank you in advance.
UPDATE
I was able to break the position column into two pieces, add a third column with the "+" symbol, combine the new columns to replace the position column, and finally delete the helper columns using the following:
df['col1'] = df['position'].astype(str)
df['col2'] = df['col1'].str[:-2]
df['col3'] = df['col1'].str[-2:]
df['col4'] = '+'
df['position'] = df[['col2', 'col4', 'col3']].apply(lambda row: ''.join(row.values.astype(str)), axis=1)
df = df.drop(['col1', 'col2', 'col3', 'col4'], axis=1)
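A shorter route, sketched on sample data matching the question (column names assumed from it), does the same split-and-join in one vectorized step, without helper columns:

```python
import pandas as pd

# Stand-in frame shaped like the question's data
df = pd.DataFrame({
    "name": ["table", "chair", "couch", "bed"],
    "position": [1234567, 789012, 345678, 1901234],
})

# Insert "+" before the last two digits in one pass
pos = df["position"].astype(str)
df["position"] = pos.str[:-2] + "+" + pos.str[-2:]
```

This produces "12345+67" for the first row; note the result is now a string column, which is why Excel will left-align it.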
The only thing left is to be able to align the new value to the right, because Excel aligns it left once the "+" sign is added.
Excel by default aligns numeric values to the right and text values to the left, so you won't be able to make the change in pandas; but after writing to Excel you can modify the alignment using something like openpyxl.
see here for an example
Another option, if you want the cells to retain their numeric value, is to leave them as numbers but format them to display with the '+'. You could do this by setting the number format to "#+##".
see here for number format codes documentation
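A minimal sketch of both options with openpyxl; the workbook and cell addresses are hypothetical stand-ins (in practice you would load_workbook() the file pandas wrote):

```python
from io import BytesIO

from openpyxl import Workbook
from openpyxl.styles import Alignment

# Hypothetical workbook standing in for the exported file
wb = Workbook()
ws = wb.active

# Option 1: keep the text value but force right alignment
ws["A2"] = "12345+67"                       # text, so Excel left-aligns by default
ws["A2"].alignment = Alignment(horizontal="right")

# Option 2: keep the cell numeric and only *display* the "+"
ws["B2"] = 1234567
ws["B2"].number_format = "#+##"

wb.save(BytesIO())                          # save to a buffer just to show the round trip
```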
Hi, I am trying to edit rows of the 'PM2.5' column using df.at['0', 'PM2.5'] = 10 for the first row, but instead of editing, it adds a new row. The columns have title headers but my rows are numbered; how do I get around this? I want to do this for 18 rows and manually add data to the PM2.5 column. Thanks!
The problem is that the index values are integers (a RangeIndex), so setting a value needs an integer label too.
So change the string '0' in
df.at['0', 'PM2.5'] = 10
to the integer 0:
df.at[0, 'PM2.5'] = 10
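A small sketch of the fix on a stand-in frame with a default RangeIndex; the loop covers the "do this for 18 rows" part (the sample values here are made up):

```python
import pandas as pd

# Stand-in frame: rows are labelled 0, 1, 2 by the default RangeIndex
df = pd.DataFrame({"PM2.5": [None, None, None]})

# Integer labels match the RangeIndex, so existing rows are edited in place
for i, value in enumerate([10, 12, 9]):
    df.at[i, "PM2.5"] = value
```

No new rows are appended, because each integer label already exists in the index.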
I found a solution to this:
df['Name'] = df['Name'].str.lstrip()
df['Parent'] = df['Parent'].str.lstrip()
I have this DataFrame df (there is a white space to the left of "A" and "C" in the second row, which doesn't show well here). I would like to remove that space.
Mark Name Parent age
10 A C 1
12 A C 2
13 B D 3
I tried
df['Name'].str.lstrip()
df['Parent'].str.lstrip()
then tried
df.to_excel('test.xlsx')
but the result in Excel didn't remove the white spaces
I then tried assigning to another variable
x = df['Name'].str.lstrip()
x.to_excel('test.xlsx')
That worked fine in Excel, but x is a new object that only has the one column.
I then tried repeating the same for 'Parent' and played around with joining multiple DataFrames back to the original, but I still couldn't get it to work, and that seems too convoluted anyway.
Finally, even if my first attempt had worked, I would like to be able to remove the white spaces in one go, without having to do a separate call for each column name.
You could try using
df['Name'].str.replace(" ", "")
though note this deletes all whitespace in the values, not just the leading spaces.
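Since .str.lstrip() returns a new Series rather than modifying the frame, the result has to be assigned back for it to stick; a sketch that strips both columns in one go, on stand-in data shaped like the question's:

```python
import pandas as pd

# Stand-in for the question's frame; row 2 has leading spaces
df = pd.DataFrame({
    "Mark": [10, 12, 13],
    "Name": ["A", " A", "B"],
    "Parent": ["C", " C", "D"],
    "age": [1, 2, 3],
})

# Strip leading whitespace from both columns and assign back
cols = ["Name", "Parent"]
df[cols] = df[cols].apply(lambda s: s.str.lstrip())
```

After this, df.to_excel('test.xlsx') would write the cleaned values, since the frame itself was updated.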
To preface: I'm new to using Python.
I'm working on cleaning up a file where data was spread across multiple rows. I'm struggling to find a solution that will concatenate multiple text strings to a single cell. The .csv data looks similar to this:
name,date,description
bundy,12-12-2017,good dog
,,smells kind of weird
,,needs to be washed
with one or two blank rows between each entry, too.
The amount of rows used for 'description' isn't consistent. Sometimes it's just one cell, sometimes up to about four. The ideal output turns these multiple rows into a single row of useful data, without all the wasted space. I thought maybe I could create a series of masks by copying the data across a few columns, shifted up, and then iterating in some way. I haven't found a solution that matches what I'm trying to do, though. This is where I'm at so far:
# Add columns of description data, shifted up a row, for concatenation
DogData['Z'] = DogData['Y'].shift(-1)
DogData['AA'] = DogData['Z'].shift(-1)
DogData['AB'] = DogData['AA'].shift(-1)
# Create boolean Series to determine how to concat values properly
YNAs = DogData['Y'].isnull()
ZNAs = DogData['Z'].isnull()
AANAs = DogData['AA'].isnull()
The idea here was basically that I'd iterate over column 'Y', check if the same row in column 'Z' was NA or had a value, and concat if it did. If not, just use the value in 'Y'. Carry that logic across but stopping if it encountered an NA in any subsequent columns. I can't figure out how to do that, or if there's a more efficient way to do this.
What do I have to do to get to my end result? I can't figure out the right way to iterate or concatenate in the way I was hoping to.
'''
name,date,description
bundy,12-12-2017,good dog
,,smells kind of weird
,,needs to be washed
'''
df = pd.read_clipboard(sep=',')
df.ffill().groupby([
    'name',
    'date'
]).description.apply(lambda x: ', '.join(x)).to_frame(name='description')
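Run against the sample inline (instead of the clipboard), the same forward-fill-and-groupby approach can be sketched as:

```python
from io import StringIO

import pandas as pd

# The question's sample data, read from a string instead of the clipboard
data = """name,date,description
bundy,12-12-2017,good dog
,,smells kind of weird
,,needs to be washed
"""
df = pd.read_csv(StringIO(data))

# Forward-fill name/date down into the blank rows, then join each
# entry's description fragments into a single string
out = (df.ffill()
         .groupby(["name", "date"])
         .description
         .apply(", ".join)
         .to_frame(name="description"))
```

The result has one row per (name, date) pair, with the description fragments joined by ", ".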
I'm not sure I follow exactly what you mean. I took that text, saved it as a csv file, and successfully read it into a pandas dataframe.
import pandas as pd
df = pd.read_csv('test.csv')
df
Output:
name date description
0 bundy 12-12-2017 good dog
1 NaN NaN smells kind of weird
2 NaN NaN needs to be washed
Isn't this the output you require?
pivot = pd.pivot_table(buliding_area_notnull, values=['BuildingArea', 'Landsize'], index=['Bedroom2', 'Bathroom', 'Car', 'Type'])
This is my code, which gives a pivot table like:
Bedroom2  Bathroom  Car  Type  Landsize
1         1         1         365.2
          0         2         555
                    1         666
Now I want to fill the NaN values in dataframe['Landsize'] using the pivot above. What is the syntax?
Note: the pivot table above is just a small part of it.
EDIT: So now I have a better idea of what you are doing.
What you need to do is flatten the multi index of the first dataframe with reset_index().
Then you want to join the two dataframes together on [Bedroom2, Bathroom, Car, Type].
This will give you an eight-column df (the four above plus BuildingArea and Landsize twice).
Then I would just create a new column, filled with BuildingArea from the second df where it is non-NaN and BuildingArea from the first df where it is NaN.
EDIT END:
Your output there does not match the code you typed at all. That being said, there is a fill_value parameter that you may find useful.
Docs below.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot_table.html
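A sketch of the flatten-and-join steps described in the edit above, with stand-in frames (lookup plays the role of pivot.reset_index(); it fills Landsize here, matching the question, and all values are made up):

```python
import numpy as np
import pandas as pd

# Stand-in for the original frame; Landsize is missing in some rows
df = pd.DataFrame({
    "Bedroom2": [1, 1, 2],
    "Bathroom": [1, 0, 1],
    "Car":      [1, 2, 1],
    "Type":     ["h", "h", "u"],
    "Landsize": [np.nan, 300.0, np.nan],
})

# Stand-in for pivot.reset_index(): per-group average Landsize
lookup = pd.DataFrame({
    "Bedroom2": [1, 1, 2],
    "Bathroom": [1, 0, 1],
    "Car":      [1, 2, 1],
    "Type":     ["h", "h", "u"],
    "Landsize": [365.2, 555.0, 666.0],
})

# Join on the four key columns; both frames carry Landsize, so the
# pivot's copy gets the "_pivot" suffix
merged = df.merge(lookup, on=["Bedroom2", "Bathroom", "Car", "Type"],
                  how="left", suffixes=("", "_pivot"))

# Keep the original value where present, fall back to the pivot's
merged["Landsize"] = merged["Landsize"].fillna(merged["Landsize_pivot"])
merged = merged.drop(columns="Landsize_pivot")
```

Existing values survive (row 1 keeps 300.0) while the NaN rows pick up the group averages from the pivot.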
I have an Excel file with values in row 1 from column 1 to column 15. Each cell value ends with a number.
I would like to create another row that merges cells based on that ending number and puts the corresponding text in the merged cell. The row values still need to maintain the order.
For example, A1=ABC3, B1=ABC5, C1=ABC4, and so on. In row 2 I would like to merge the first 3 cells and place ABC3 there, then merge the next 5 cells in the same row for ABC5, then the next 4 cells for ABC4, and so on. Any thoughts on how to implement this?
This can be accomplished with the openpyxl module. If you're not familiar with it yet, then doing some of the tutorials would be a good start.
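As a rough sketch of one way to do it with openpyxl (the regex pulls the trailing number off each header value; the sheet layout and values are assumed from the example in the question):

```python
import re

from openpyxl import Workbook

# Hypothetical sheet: row 1 holds values like "ABC3", whose trailing
# number says how many cells to merge for it in row 2
wb = Workbook()
ws = wb.active
for col, value in enumerate(["ABC3", "ABC5", "ABC4"], start=1):
    ws.cell(row=1, column=col, value=value)

# Walk row 1 left to right, merging that many cells in row 2 each time
start = 1
for cell in ws[1]:
    width = int(re.search(r"(\d+)$", cell.value).group(1))
    ws.merge_cells(start_row=2, start_column=start,
                   end_row=2, end_column=start + width - 1)
    ws.cell(row=2, column=start, value=cell.value)  # value goes in the top-left cell
    start += width
```

With the sample values this merges A2:C2 for ABC3, D2:H2 for ABC5, and I2:L2 for ABC4, preserving the original order.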