How To Create A Dataframe From Another Dataframe In Pandas

25. Accessing and Changing values of DataFrames

By Bernd Klein. Last modified: 23 Nov 2021.

On this page ➤

Pandas Replacing Values

We have seen in the previous chapters of our tutorial many ways to create Series and DataFrames. We also learned how to access and replace complete columns. This chapter of our Pandas and Python tutorial will show various ways to access and change selectively values in Pandas DataFrames and Series. We will show ways how to change single value or values matching strings or regular expressions.

For this purpose we will learn to know the methods loc, at and replace

We will work with the following DataFrame structure in our examples:

              import              pandas              as              pd              first              =              (              'Mike'              ,              'Dorothee'              ,              'Tom'              ,              'Bill'              ,              'Pete'              ,              'Kate'              )              last              =              (              'Meyer'              ,              'Maier'              ,              'Meyer'              ,              'Mayer'              ,              'Meyr'              ,              'Mair'              )              job              =              (              'data analyst'              ,              'programmer'              ,              'computer scientist'              ,              'data scientist'              ,              'accountant'              ,              'psychiatrist'              )              language              =              (              'Python'              ,              'Perl'              ,              'Java'              ,              'Java'              ,              'Cobol'              ,              'Brainfuck'              )              df              =              pd              .              DataFrame              (              list              (              zip              (              last              ,              job              ,              language              )),              columns              =              [              'last'              ,              'job'              ,              'language'              ],              index              =              first              )              df

	last	job	language
Mike	Meyer	data analyst	Python
Dorothee	Maier	programmer	Perl
Tom	Meyer	computer scientist	Java
Bill	Mayer	data scientist	Java
Pete	Meyr	accountant	Cobol
Kate	Mair	psychiatrist	Brainfuck

Changing one value in DataFrame

Pandas provides two ways, i.e. loc and at, to access or change a single value of a DataFrame. We will experiment with the height of Bill in the following Python code:

              # accessing the job of Bill:              print              (              df              .              loc              [              'Bill'              ,              'job'              ])              # alternative way to access it with at:              print              (              df              .              at              [              'Bill'              ,              'job'              ])              # setting the job of Bill to 'data analyst' with 'loc'              df              .              loc              [              'Bill'              ,              'job'              ]              =              'data analyst'              # let us check it:              print              (              df              .              loc              [              'Bill'              ,              'job'              ])              # setting the job of Bill to 'computer scientist' with 'at'              df              .              at              [              'Pete'              ,              'language'              ]              =              'Python'

OUTPUT:

data scientist data scientist data analyst

The following image shows what we have done:

Replacing values in Pandas with loc and at

You will ask yourself now which one you should use? The help on the at method says the following: "Access a single value for a row/column label pair. Similar to loc, in that both provide label-based lookups. Use at if you only need to get or set a single value in a DataFrame or Series." loc on the other hand can be used to access a single value but also to access a group of rows and columns by a label or labels.

Another intestering question is about the speed of both methods in comparison. We will measure the time behaviour in the following code examples:

              %              timeit              df.loc['Bill', 'language'] = 'Python'

OUTPUT:

129 µs ± 2.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

              %              timeit              df.at['Bill', 'language'] = 'Python'

OUTPUT:

4.69 µs ± 209 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

When it comes to speed the answer is clear: we should definitely use at.

replace

The syntax of replace:

          replace(self,          to_replace=None,          value=None,          inplace=False,          limit=None,          regex=False,          method='pad')

This method replaces values given in to_replace with value. This differs from updating with .loc or .iloc, which requires you to specify a location to update with some value. With replace it is possible to replace values in a Series or DataFrame without knowing where they occur.

replace works both with Series and DataFrames.

We will explain the way of working of the method replace by discussing the different data types of the parameter to_replace individually:

to_replace can be numeric value, string or regular expression:

Data Type	processing
numeric	numeric values in the DataFrame (or Series) which are equal to `to_replace` will be replaced with parameter `value`
str	If the parameter `to_replace` consists of a string, all the strings of the DataFrame (or Series) which exactly match this string will be replaced by the value of the parameter `value`
regex	All strings inside of the DataFrame (or Series) which match the regular expression of `to_replace` will be replaced with `value`

Let's start with a Series example. We will change one value into another one. Be aware of the fact that replace by default creates a copy of the object in which all the values are replaced. This means that the parameter inplace is set to False by default.

              s              =              pd              .              Series              ([              27              ,              33              ,              13              ,              19              ])              s              .              replace              (              13              ,              42              )

OUTPUT:

0    27 1    33 2    42 3    19 dtype: int64

If we really want to change the object s is referencing, we should set the inplace parameter to True:

              s              =              pd              .              Series              ([              27              ,              33              ,              13              ,              19              ])              s              .              replace              (              13              ,              42              ,              inplace              =              True              )              s

OUTPUT:

0    27 1    33 2    42 3    19 dtype: int64

We can also change multiple values into one single value, as you can see in the following example.

              s              =              pd              .              Series              ([              0              ,              1              ,              2              ,              3              ,              4              ])              s              .              replace              ([              0              ,              1              ,              2              ],              42              ,              inplace              =              True              )              s

OUTPUT:

0    42 1    42 2    42 3     3 4     4 dtype: int64

We will show now, how replace can be used on DataFrames. For this purpose we will recreate the example from the beginning of this chapter of our tutorial. It will not be exactly the same though. More people have learned Python now, but unfortunately somebody has put in some spelling errors in the word Python. Pete has become a programmer now:

              import              pandas              as              pd              first              =              (              'Mike'              ,              'Dorothee'              ,              'Tom'              ,              'Bill'              ,              'Pete'              ,              'Kate'              )              last              =              (              'Meyer'              ,              'Maier'              ,              'Meyer'              ,              'Mayer'              ,              'Meyr'              ,              'Mair'              )              job              =              (              'data analyst'              ,              'programmer'              ,              'computer scientist'              ,              'data scientist'              ,              'programmer'              ,              'psychiatrist'              )              language              =              (              'Python'              ,              'Perl'              ,              'Java'              ,              'Pithon'              ,              'Pythen'              ,              'Brainfuck'              )              df              =              pd              .              DataFrame              (              list              (              zip              (              last              ,              job              ,              language              )),              columns              =              [              'last'              ,              'job'              ,              'language'              ],              index              =              first              )              df

	last	job	language
Mike	Meyer	data analyst	Python
Dorothee	Maier	programmer	Perl
Tom	Meyer	computer scientist	Java
Bill	Mayer	data scientist	Pithon
Pete	Meyr	programmer	Pythen
Kate	Mair	psychiatrist	Brainfuck

Both ambitious Dorothee and enthusiastic Pete have finished a degree in Computer Science in the meantime. So we have to change our DataFrame accodingly:

              df              .              replace              (              "programmer"              ,              "computer scientist"              ,              inplace              =              True              )              df

	last	job	language
Mike	Meyer	data analyst	Python
Dorothee	Maier	computer scientist	Perl
Tom	Meyer	computer scientist	Java
Bill	Mayer	data scientist	Pithon
Pete	Meyr	computer scientist	Pythen
Kate	Mair	psychiatrist	Brainfuck

to_replace can be a list consisting of str, regex or numeric objects:

to_replace can be a list consisting of str, regex or numeric objects. If both the values of to_replace and value are both lists, they must be the same length.

First, if to_replace and value are both lists, they must be the same length. Second, if regex=True then all of the strings in both lists will be interpreted as regexs otherwise they will match directly. We slightly changed our DataFrame for the next example. The first names build a column now!

              df              =              pd              .              DataFrame              (              list              (              zip              (              first              ,              last              ,              job              ,              language              )),              columns              =              [              'first'              ,              'last'              ,              'job'              ,              'language'              ])              df

	first	last	job	language
0	Mike	Meyer	data analyst	Python
1	Dorothee	Maier	programmer	Perl
2	Tom	Meyer	computer scientist	Java
3	Bill	Mayer	data scientist	Pithon
4	Pete	Meyr	programmer	Pythen
5	Kate	Mair	psychiatrist	Brainfuck

'Mike' and 'Tom' are abbreviations of the real first names, so we will change them now. We are also glad to announce that Dorothee finally migrated from Perl to Python:

              df              .              replace              (              to_replace              =              [              'Mike'              ,              'Tom'              ,              'Perl'              ],              value              =              [              'Michael'              ,              'Thomas'              ,              'Python'              ],              inplace              =              True              )              df

	first	last	job	language
0	Michael	Meyer	data analyst	Python
1	Dorothee	Maier	programmer	Python
2	Thomas	Meyer	computer scientist	Java
3	Bill	Mayer	data scientist	Pithon
4	Pete	Meyr	programmer	Pythen
5	Kate	Mair	psychiatrist	Brainfuck

Using Regular Expressions

Now it is time to do something about the surnames in our DataFrame. All these names sound the same - at least for German speakers. Let us assume that we just found out that they should all be spelled the same, i.e. Mayer. Now the time has come for regular expression. The parameter to_replace can also be used with regular expressions. We will use a regular expression to unify the surnames. We will also fix the misspellings of Python:

              df              .              replace              (              to_replace              =              [              r              'M[ea][iy]e?r'              ,              r              'P[iy]th[eo]n'              ],              value              =              [              'Mayer'              ,              'Python'              ],              regex              =              True              ,              inplace              =              True              )              df

	first	last	job	language
0	Michael	Mayer	data analyst	Python
1	Dorothee	Mayer	programmer	Python
2	Thomas	Mayer	computer scientist	Java
3	Bill	Mayer	data scientist	Python
4	Pete	Mayer	programmer	Python
5	Kate	Mayer	psychiatrist	Brainfuck

to_replace can be a dict

Dictionaries can be used to specify different replacement values for different existing values. For example, {'a': 'b', 'y': 'z'} replaces the value 'a' with 'b' and 'y' with 'z'. To use a dict in this way the value parameter should be None.

Ìf replace is applied on a DataFrame, a dict can specify that different values should be replaced in different columns. For example, {'a': 1, 'b': 'z'} looks for the value 1 in column 'a' and the value 'z' in column 'b' and replaces these values with whatever is specified in value. The value parameter should not be None in this case. You can treat this as a special case of passing two lists except that you are specifying the column to search in.

If we use nested dictionaries like {'A': {'foo': 'bar'}}, they are interpreted like this: look in column 'A' for the value 'foo' and replace it with 'bar'. The value parameter has to be None in this case.

              df              =              pd              .              DataFrame              ({              'A'              :              [              0              ,              1              ,              2              ,              3              ,              4              ],              'B'              :              [              'foo'              ,              'bar'              ,              'bloo'              ,              'blee'              ,              'bloo'              ],              'C'              :              [              'green'              ,              'red'              ,              'blue'              ,              'yellow'              ,              'green'              ]})              df              .              replace              (              to_replace              =              {              "A"              :              {              0              :              42              ,              3              :              33              },              'B'              :              {              'bloo'              :              'vloo'              }},              inplace              =              True              )              df

	A	B	C
0	42	foo	green
1	1	bar	red
2	2	vloo	blue
3	33	blee	yellow
4	4	vloo	green

The method parameter of replace:

When the parameter value is None and the parameter to_replace is a scalar, list or tuple, the method replace will use the parameter method to decide which replacement to perform. The possible values for method are pad, ffill, bfill, None. The default value is pad.

First of all, you have to know that pad and ffill are equivalent.

We will explain the way of working with the following Python example code. We will replace the values NN and LN of the following DataFrame in different ways by using the parameter method.

            import            pandas            as            pd            df            =            pd            .            DataFrame            ({            'name'            :[            'Ben'            ,            'Kate'            ,            'Agnes'            ,            'Ashleigh'            ,            'Tom'            ],            'job'            :[            'programmer'            ,            'NN'            ,            'NN'            ,            'engineer'            ,            'teacher'            ],            'language'            :[            'Java'            ,            'Python'            ,            'LN'            ,            'LN'            ,            'C'            ]})

We will start with changing the 'NN' values by using ffill in the parameter method:

Replacing values in Pandas with replace and the parameter method

The picture above shows how the following call to replace. In words: Every occurrence of NN in the job column will have to be replaced. The value is defined by the parameter method. ffill means 'forward fill', i.e. we will take the preceding value to the first NN as the fill value. This is why we will replace the values by 'programmer':

Replacing values in Pandas with replace and the parameter method

              # method is backfill              df              .              replace              (              to_replace              =              'NN'              ,              value              =              None              ,              method              =              'ffill'              )

	name	job	language
0	Ben	programmer	Java
1	Kate	programmer	Python
2	Agnes	programmer	LN
3	Ashleigh	engineer	LN
4	Tom	teacher	C

Instead of using a single value for to_replace we can also use a list or tuple. We will replace in the following all occurrences of NN and LN accordingly, as we can see in the following picture:

Replacing values in Pandas with replace and the parameter method

              df              .              replace              (              to_replace              =              [              'NN'              ,              'LN'              ],              value              =              None              ,              method              =              'ffill'              )

	name	job	language
0	Ben	programmer	Java
1	Kate	programmer	Python
2	Agnes	programmer	Python
3	Ashleigh	engineer	Python
4	Tom	teacher	C

We will show now what happens, if we use bfill (backward fill). Now the occurrences of 'LN' become 'C' instead of Python. We also turn the 'NN's into engineers instead of programmers.

Replacing values in Pandas with replace and the parameter method

              df              .              replace              ([              'NN'              ,              'LN'              ],              value              =              None              ,              method              =              'bfill'              )

	name	job	language
0	Ben	programmer	Java
1	Kate	engineer	Python
2	Agnes	engineer	C
3	Ashleigh	engineer	C
4	Tom	teacher	C

              df              .              replace              (              'NN'              ,              value              =              None              ,              inplace              =              True              ,              method              =              'bfill'              )              df              .              replace              (              'LN'              ,              value              =              None              ,              inplace              =              True              ,              method              =              'ffill'              )              df

	name	job	language
0	Ben	programmer	Java
1	Kate	engineer	Python
2	Agnes	engineer	Python
3	Ashleigh	engineer	Python
4	Tom	teacher	C

Other Examples

We will present some more examples, which are taken from the help file of loc:

              df              =              pd              .              DataFrame              ([[              1              ,              2              ],              [              4              ,              5              ],              [              7              ,              8              ]],              index              =              [              'cobra'              ,              'viper'              ,              'sidewinder'              ],              columns              =              [              'max_speed'              ,              'shield'              ])              df

	max_speed	shield
cobra	1	2
viper	4	5
sidewinder	7	8

Single label. Note this returns the row as a Series:

OUTPUT:

max_speed    4 shield       5 Name: viper, dtype: int64

List of labels. Note using [[]] returns a DataFrame:

              df              .              loc              [[              'viper'              ,              'sidewinder'              ]]

	max_speed	shield
viper	4	5
sidewinder	7	8

Single label for row and column

              df              .              loc              [              'cobra'              ,              'shield'              ]

OUTPUT:

Slice with labels for row and single label for column. As mentioned above, note that both the start and stop of the slice are included.

              df              .              loc              [              'cobra'              :              'viper'              ,              'max_speed'              ]

OUTPUT:

cobra    1 viper    4 Name: max_speed, dtype: int64

Boolean list with the same length as the row axis

              df              .              loc              [[              False              ,              False              ,              True              ]]

	max_speed	shield
sidewinder	7	8

Conditional that returns a boolean Series

	max_speed	shield
sidewinder	7	8

Conditional that returns a boolean Series with column labels specified

            df            .            loc            [            df            [            'shield'            ]            >            6            ,            [            'max_speed'            ]]

Callable that returns a boolean Series

              df              .              loc              [              lambda              df              :              df              [              'shield'              ]              ==              8              ]

	max_speed	shield
sidewinder	7	8

Set value for all items matching the list of labels

              df              .              loc              [[              'viper'              ,              'sidewinder'              ],              [              'shield'              ]]              =              50              df

	max_speed	shield
cobra	1	2
viper	4	50
sidewinder	7	50

Set value for an entire row

	max_speed	shield
cobra	10	10
viper	4	50
sidewinder	7	50

Set value for an entire column

              df              .              loc              [:,              'max_speed'              ]              =              30              df

	max_speed	shield
cobra	30	10
viper	30	50
sidewinder	30	50

Set value for rows matching callable condition

              df              .              loc              [              df              [              'shield'              ]              >              35              ]              =              0              df

	max_speed	shield
cobra	30	10
viper	0	0
sidewinder	0	0