Shift non-periodic values in data - python

I have a data dictionary like this:
data = {
'new_value': [
'100', '100',
'250', '250',
'250', '50',
'90', '90',
'90', '90'
],
'prev_value': [
'None', 'None',
'None', 'None',
'None', 'None',
'None', 'None',
'None', 'None'
]
}
new_value prev_value
0 100 None
1 100 None
2 250 None
3 250 None
4 250 None
5 50 None
6 90 None
7 90 None
8 90 None
9 90 None
And I would expect to get another dictionary exp_result like this:
exp_result = {
'new_value': [
'100', '100',
'250', '250',
'250', '50',
'90', '90',
'90', '90'
],
'prev_value': [
'100', '100',
'100', '100',
'100', '250',
'50', '50',
'50', '50'
]
}
new_value prev_value
0 100 100
1 100 100
2 250 100
3 250 100
4 250 100
5 50 250
6 90 50
7 90 50
8 90 50
9 90 50
I tried the pandas.Series.shift() function, but my data isn't periodic (the runs of equal values have different lengths), so I don't know how to proceed.

The idea is to replace every value except the last one of each consecutive run with a missing value using Series.mask, move those values onto the next group with Series.shift, forward fill the remaining missing values with ffill, and finally replace the leading missing values with the originals using fillna:
import pandas as pd
df = pd.DataFrame(data)
m = df['new_value'].shift(-1).eq(df['new_value'])
df['prev_value'] = df['new_value'].mask(m).shift().ffill().fillna(df['new_value'])
print(df)
new_value prev_value
0 100 100
1 100 100
2 250 100
3 250 100
4 250 100
5 50 250
6 90 50
7 90 50
8 90 50
9 90 50
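Since the question starts from a plain dictionary and expects a dictionary back, the same result can also be sketched without pandas. This is an alternative technique (itertools.groupby over consecutive runs), not the method from the answer above; the names previous, fill and result are made up for illustration.

```python
from itertools import groupby

data = {
    "new_value": ["100", "100", "250", "250", "250", "50", "90", "90", "90", "90"],
}

prev_value = []
previous = None  # value of the preceding run; None before the first run
for value, run in groupby(data["new_value"]):
    run_length = len(list(run))
    # The first run has no predecessor, so it falls back to its own value.
    fill = value if previous is None else previous
    prev_value.extend([fill] * run_length)
    previous = value

result = {"new_value": data["new_value"], "prev_value": prev_value}
print(result["prev_value"])
```

groupby only merges adjacent equal items, which is exactly the "consecutive run" notion the pandas answer builds with shift and mask.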


Is there a way to do this in Python?

I have a data frame that looks like this:
import pandas as pd
data = {'State': ['24', '24', '24',
'24','24','24','24','24','24','24','24','24'],
'County code': ['001', '001', '001',
'001','002','002','002','002','003','003','003','003'],
'TT code': ['123', '123', '123',
'123','124','124','124','124','125','125','125','125'],
'BLK code': ['221', '221', '221',
'221','222','222','222','222','223','223','223','223'],
'Age Code': ['1', '1', '2', '2','2','2','2','2','2','1','2','1']}
df = pd.DataFrame(data)
Essentially, I want to keep only the TT codes where every age code is 2 and there are no 1's. So I just want the data frame where:
'State': ['24', '24', '24', '24'],
'County code': ['002','002','002','002',],
'TT code': ['124','124','124','124',],
'BLK code': ['222','222','222','222'],
'Age Code': ['2','2','2','2']
Is there a way to do this?
IIUC, you want to keep only the TT groups in which every Age Code has the value '2'?
You can use groupby.transform('all') on the boolean Series:
df[df['Age Code'].eq('2').groupby(df['TT code']).transform('all')]
output:
State County code TT code BLK code Age Code
4 24 002 124 222 2
5 24 002 124 222 2
6 24 002 124 222 2
7 24 002 124 222 2
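To make the steps visible, here is a minimal self-contained sketch of the same transform('all') idea, reduced to the two columns that matter. The intermediate names is_two, keep and filtered are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "TT code": ["123"] * 4 + ["124"] * 4 + ["125"] * 4,
    "Age Code": ["1", "1", "2", "2", "2", "2", "2", "2", "2", "1", "2", "1"],
})

# Row-level test: is this row's Age Code equal to '2'?
is_two = df["Age Code"].eq("2")

# Group-level test broadcast back to every row:
# True only if all rows sharing the same TT code passed the row-level test.
keep = is_two.groupby(df["TT code"]).transform("all")

filtered = df[keep]
print(filtered)
```

Because transform returns a Series aligned with the original index, it can be used directly as a boolean mask.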
This should work.
df111['Age Code'] = "2"
I am just wondering why strings were chosen for values that are integers.

Iterate a list of rows and split

I am trying to iterate through a list and split each row, so that I can apply a function to a specific element of the split.
What I want is something like this, so I can grab each element by position, e.g. x[2] = 220:
['2', '325', '220', '1.0']
What I get is this (split by character):
for row in range(len(pln)):
    for j in range(len(pln[row])):
        print( pln[row][j], end="")
        x = [pln[row][j].split()]
        print (x)
2[['2']]
[['', '']]
3[['3']]
2[['2']]
5[['5']]
[['', '']]
2[['2']]
2[['2']]
0[['0']]
[['', '']]
1[['1']]
.[['.']]
0[['0']]
[['\n']]
pln before iteration (as a list):
['2 325 220 1.0\n', '2 600 200 3.3\n', '2 325 100 3.3\n', '2 600 120 5.5\n', '2 600 125 5.5\n', '2 325 100 3.4']
pln after iteration:
2 325 220 1.0
2 600 200 3.3
2 325 100 3.3
2 600 120 5.5
2 600 125 5.5
2 325 100 3.4
Here is a solution:
lst = ['2 325 220 1.0\n', '2 600 200 3.3\n', '2 325 100 3.3\n', '2 600 120 5.5\n', '2 600 125 5.5\n', '2 325 100 3.4']
result_list = []
for i in lst:
    k = i.split()
    result_list.append(k)
print(result_list)
#Output:
[['2', '325', '220', '1.0'], ['2', '600', '200', '3.3'], ['2', '325', '100', '3.3'], ['2', '600', '120', '5.5'], ['2', '600', '125', '5.5'], ['2', '325', '100', '3.4']]
You can access an element like:
#result_list[row_number][element index]
print(result_list[2][3]) #fourth element in third row
#Output
3.3
You can iterate all rows like:
for row in result_list:
    print(row)
#Output:
['2', '325', '220', '1.0']
['2', '600', '200', '3.3']
['2', '325', '100', '3.3']
['2', '600', '120', '5.5']
['2', '600', '125', '5.5']
['2', '325', '100', '3.4']
You can iterate any column like:
for row in result_list:
    print(row[1]) #This will give second column
#Output:
325
600
325
600
600
325
Stop printing when doing your splitting. Print after the columns have been split.
pln = ['2 325 220 1.0\n', '2 600 200 3.3\n', '2 325 100 3.3\n', '2 600 120 5.5\n', '2 600 125 5.5\n', '2 325 100 3.4']
rows = [line.split() for line in pln]
for row in rows:
    print('\t'.join(row))
You can then access each row by its index
second_row = rows[1]
Then access each column by index
third_column = second_row[2]
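If the goal is to run a function on a specific element, the split fields usually need to be converted from strings to numbers first. A minimal sketch, assuming the first three fields are integers and the fourth is a float (the question does not say what the columns mean, so that typing is a guess):

```python
pln = ['2 325 220 1.0\n', '2 600 200 3.3\n', '2 325 100 3.3\n',
       '2 600 120 5.5\n', '2 600 125 5.5\n', '2 325 100 3.4']

rows = []
for line in pln:
    fields = line.split()  # split() with no argument also drops the trailing '\n'
    # Assumed layout: three integer fields followed by one float field.
    rows.append([int(f) for f in fields[:3]] + [float(fields[3])])

x = rows[0]
print(x[2])  # third element of the first row, now an int
```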
This code will store the individual characters in new_list. This is all I understood from your question; if there is anything else, do ask.
given_list = ['2','325','220','1.0']
new_list = list()
length = len(given_list)
for i in range(length):
    for j in given_list[i]:
        new_list.append(j)
    if i+1 != length:
        new_list.append(",")
print(new_list)
Output: ['2', ',', '3', '2', '5', ',', '2', '2', '0', ',', '1', '.', '0']

Is it possible to check a string comparing two regex then adding it to a dictionary?

Question
How can I run through the string so that when the locationRegex condition is met, its match is added to a dictionary, any subsequent numbers from numberRegex are added to the same entry, and a new entry is created when the next location arrives, as shown in the desired output?
Code
import re
# Text to check
text = "Italy Roma 20 40 10 4902520 10290" \
"Italy Milan 20 10 49 20 1030" \
"Germany Berlin 20 10 10 10 29 490" \
"Germany Frankfurt 20 0 0 0 0" \
"Luxemburg Luxemburg 20 10 49"
# regex to find location
locationRegex = re.compile(r'[A-Z]\w+\s[A-Z]\w+')
# regex to find numbers
numberRegex = re.compile(r'[0-9]+')
# Desired output
locations = {'Italy Roma': {'numbers': [10, 40, 10, 4902520]},
'Italy Milan': {'numbers': [20, 10, 49, 20, 1030]}}
What I have tried
I have run the regexes against the string with re.findall, but I cannot assign the numbers to the locations because they end up in two separate pots of locations and numbers.
Use a single regex to split the text into chunks, use groups within the regex to separate the data (note the parentheses), and finally use split to split the number string on the spaces:
import re
text = (
"Italy Roma 20 40 10 4902520 10290"
"Italy Milan 20 10 49 20 1030"
"Germany Berlin 20 10 10 10 29 490"
"Germany Frankfurt 20 0 0 0 0"
"Luxemburg Luxemburg 20 10 49"
)
line_regex = re.compile(r"([A-Z]\w+\s[A-Z]\w+) ([0-9 ]+)")
loc_dict = {}
for match in re.finditer(line_regex, text):
    print(match.group(1))
    print(match.group(2))
    loc_dict[match.group(1)] = {"numbers": match.group(2).split(" ")}
print(loc_dict)
The dict will be:
{'Italy Roma': {'numbers': ['20', '40', '10', '4902520', '10290']},
'Italy Milan': {'numbers': ['20', '10', '49', '20', '1030']},
'Germany Berlin': {'numbers': ['20', '10', '10', '10', '29', '490']},
'Germany Frankfurt': {'numbers': ['20', '0', '0', '0', '0']},
'Luxemburg Luxemburg': {'numbers': ['20', '10', '49']}}
Note that you should check for edge cases: no numbers, cities with a space in the name and so on.
Cheers!
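The desired output in the question holds integers rather than strings. A small variation of the approach above, sketched under the assumption that every captured token is a plain non-negative integer, converts them while building the dictionary:

```python
import re

text = (
    "Italy Roma 20 40 10 4902520 10290"
    "Italy Milan 20 10 49 20 1030"
    "Germany Berlin 20 10 10 10 29 490"
    "Germany Frankfurt 20 0 0 0 0"
    "Luxemburg Luxemburg 20 10 49"
)

# Group 1: two capitalised words (the location); group 2: the run of digits and spaces.
line_regex = re.compile(r"([A-Z]\w+\s[A-Z]\w+) ([0-9 ]+)")

locations = {}
for match in line_regex.finditer(text):
    # split() without an argument ignores repeated or trailing spaces.
    numbers = [int(n) for n in match.group(2).split()]
    locations[match.group(1)] = {"numbers": numbers}

print(locations["Italy Roma"])
```

Using split() instead of split(" ") avoids empty strings (and a ValueError from int) if a number run ever ends with a space.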

How print a list into separate lists python

I have a list:
grades= ['doe john 100 90 80 90', 'miller sally 70 90 60 100 80', 'smith jakob 45 55 50 58', 'white jack 85 95 65 80 75']
I want to be able to break that list so the output would be:
['doe john 100 90 80 90']
['miller sally 70 90 60 100 80']
['smith jakob 45 55 50 58']
['white jack 85 95 65 80 75']
Additionally, I would like to split the elements in the list so it looks like:
['doe', 'john', '100', '90', '80', '90']
['miller', 'sally', '70', '90', '60', '100', '80']
['smith', 'jakob', '45', '55', '50', '58']
['white', 'jack', '85', '95', '65', '80', '75']
I'm not really sure how to go about doing this or if this is even possible as I'm just starting to learn python. Any ideas?
for l in grades:
    print(l.split())
OR
final = [l.split() for l in grades]
See Split string on whitespace in Python
This can be done quickly with .split() in a list comprehension.
grades = ['doe john 100 90 80 90', 'miller sally 70 90 60 100 80', 'smith jakob 45 55 50 58', 'white jack 85 95 65 80 75']
grades = [grade.split() for grade in grades]
print (grades)
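Once each entry is split, the name parts and the scores can be separated as well. A minimal sketch, assuming the first two tokens of every entry are the last and first name and everything after them is an integer score (that layout is inferred from the sample data, not stated in the question):

```python
grades = ['doe john 100 90 80 90', 'miller sally 70 90 60 100 80',
          'smith jakob 45 55 50 58', 'white jack 85 95 65 80 75']

records = []
for entry in grades:
    # Star-unpacking collects all remaining tokens into the list `scores`.
    last, first, *scores = entry.split()
    records.append((last, first, [int(s) for s in scores]))

for last, first, scores in records:
    print(last, first, scores)
```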

Pandas - Resort Column Location

I have a pandas frame. When I print the columns (shown below), it turns out that my columns are out of order. Is there a way to sort only the first 30 columns so they are in order (30,60,90...900)?
[in] df.columns
[out] Index(['120', '150', '180', '210', '240', '270', '30', '300', '330', '360',
'390', '420', '450', '480', '510', '540', '570', '60', '600', '630',
'660', '690', '720', '750', '780', '810', '840', '870', '90', '900',
'Item', 'Price', 'Size', 'Time', 'Type', 'Unnamed: 0'],
dtype='object')
The fixed frame would be as follows:
[out] Index(['30','60','90','120', '150', '180', '210', '240', '270','300', '330', '360',
'390', '420', '450', '480', '510', '540', '570','600', '630',
'660', '690', '720', '750', '780', '810', '840', '870','900',
'Item', 'Price', 'Size', 'Time', 'Type', 'Unnamed: 0'],
dtype='object')
If you know that the columns will be named 30 through 900 in multiples of 30, you can generate that explicitly like this:
c = [str(i) for i in range(30, 901, 30)]
Then add it to the other columns:
c = c + ['Item', 'Price', 'Size', 'Time', 'Type', 'Unnamed: 0']
Then you should be able to access it as df[c]
You need to select the first 30 column names, convert them to int and sort them. Then convert back to str if necessary and use reindex (older pandas versions offered reindex_axis for this, but it has since been removed):
cols = (np.sort(df.columns[:30].astype(int)).astype(str).tolist() +
        df.columns[30:].tolist())
Sample:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(36).reshape(1,-1),
columns=['120', '150', '180', '210', '240', '270', '30', '300',
'330', '360','390', '420', '450', '480', '510', '540', '570', '60', '600', '630',
'660', '690', '720', '750', '780', '810', '840', '870', '90', '900',
'Item', 'Price', 'Size', 'Time', 'Type', 'Unnamed: 0'])
print (df)
120 150 180 210 240 270 30 300 330 360 ... 840 870 90 \
0 0 1 2 3 4 5 6 7 8 9 ... 26 27 28
900 Item Price Size Time Type Unnamed: 0
0 29 30 31 32 33 34 35
[1 rows x 36 columns]
df = df.reindex(np.sort(df.columns[:30].astype(int)).astype(str).tolist() +
                df.columns[30:].tolist(), axis=1)
print (df)
30 60 90 120 150 180 210 240 270 300 ... 810 840 870 \
0 6 17 28 0 1 2 3 4 5 7 ... 25 26 27
900 Item Price Size Time Type Unnamed: 0
0 29 30 31 32 33 34 35
[1 rows x 36 columns]
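The ordering itself can also be computed with plain Python by sorting the numeric labels with key=int, which avoids the round trip through NumPy. This is a sketch of an alternative, not the answer's method; it works on a list of column names, and the variable names are illustrative.

```python
columns = ['120', '150', '180', '210', '240', '270', '30', '300', '330', '360',
           '390', '420', '450', '480', '510', '540', '570', '60', '600', '630',
           '660', '690', '720', '750', '780', '810', '840', '870', '90', '900',
           'Item', 'Price', 'Size', 'Time', 'Type', 'Unnamed: 0']

# Sort only the first 30 labels, comparing them as integers instead of as text.
numeric_sorted = sorted(columns[:30], key=int)

# The remaining labels keep their original order.
ordered = numeric_sorted + columns[30:]
print(ordered[:5])
```

With a DataFrame, the list would come from df.columns.tolist(), and df[ordered] would then return the columns in that order.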
