loc and iloc - basics

How to use iloc and loc in pandas


Pandas is a popular Python library for data analysis and manipulation. It provides several ways to access and modify data. Two of the most commonly used are iloc and loc, indexers used with square brackets that let you select rows and columns by integer position or by label, respectively.

In this blog post, we will explain the difference between iloc and loc, how to use them effectively, and some common pitfalls to avoid.

iloc vs loc

iloc stands for integer location, and it allows you to select rows and columns by their integer position. For example, if you have a DataFrame df with 5 rows and 3 columns, you can use iloc to access the element in the second row and third column as follows:

df.iloc[1, 2]

Note that iloc uses zero-based indexing, meaning that the first row or column has index 0, the second has index 1, and so on.
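To make this concrete, here is a minimal sketch that builds a small DataFrame matching the description above and reads from it by position. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: 5 rows, 3 columns.
df = pd.DataFrame(
    {
        "A": [1, 12, 3, 14, 5],
        "B": [10, 20, 30, 40, 50],
        "C": [2, 4, 6, 8, 10],
    }
)

value = df.iloc[1, 2]       # second row, third column
first_row = df.iloc[0]      # the whole first row, returned as a Series
third_col = df.iloc[:, 2]   # the whole third column, returned as a Series
```

A single integer selects a whole row, and a colon in the row position selects every row.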

loc stands for label location, and it allows you to select rows and columns by their labels. For example, if you have a DataFrame df with 5 rows that carry the default integer labels 0 to 4, and the columns are named 'A', 'B', and 'C', you can use loc to access the element in the second row and third column as follows:

df.loc[1, 'C']

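Note that loc uses label-based indexing, meaning that you have to specify the exact label of the row or column you want to access. Here 1 is the label of the row, not its position; the two coincide only because df has the default integer index.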
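The difference between labels and positions is easier to see when the row labels are not integers. The sketch below assumes a hypothetical DataFrame whose rows are labelled 'x', 'y', and 'z'.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 12, 3], "B": [10, 20, 30], "C": [2, 4, 6]},
    index=["x", "y", "z"],  # hypothetical string row labels
)

by_label = df.loc["y", "C"]    # row labelled 'y', column labelled 'C'
by_position = df.iloc[1, 2]    # the same cell, addressed by position

# The integer 1 is not a row label in this index, so loc cannot find it.
try:
    df.loc[1, "C"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

Both lookups return the same cell, but only iloc can reach it with the integer 1.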

You can also use iloc and loc to select multiple rows or columns at once, by passing a list or a slice of indices or labels. For example, you can use iloc to select the first two rows and the last two columns as follows:

df.iloc[:2, -2:]

And you can use loc to select the rows with labels 1 and 3, and the columns with labels 'A' and 'C' as follows:

df.loc[[1, 3], ['A', 'C']]
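The two selections above can be tried on a small made-up DataFrame, as in the sketch below. It also shows one difference worth knowing when slicing: iloc excludes the stop position, as ordinary Python slices do, while loc includes the stop label.

```python
import pandas as pd

# Hypothetical sample data with the default integer index 0..4.
df = pd.DataFrame(
    {"A": [1, 12, 3, 14, 5], "B": [10, 20, 30, 40, 50], "C": [2, 4, 6, 8, 10]}
)

first_two_last_two = df.iloc[:2, -2:]    # rows 0-1, columns 'B' and 'C'
picked = df.loc[[1, 3], ["A", "C"]]      # rows labelled 1 and 3

iloc_slice = df.iloc[1:3]   # positions 1 and 2 (stop excluded)
loc_slice = df.loc[1:3]     # labels 1, 2 and 3 (stop included)
```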

How to use iloc and loc effectively

iloc and loc are flexible indexers that let you access and modify data in various ways. Here are some tips on how to use them effectively:

  • Use iloc when you know the exact position of the rows or columns you want to select, or when you want to select a range of positions by slicing.
  • Use loc when you know the exact label of the rows or columns you want to select, or when you want to select a range of labels by slicing. Unlike iloc, a loc slice includes both of its endpoints.
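  • You can also combine iloc and loc with boolean indexing, which allows you to select rows or columns based on a condition. loc accepts a boolean Series directly, but iloc does not, so the mask has to be converted to a plain boolean array first. For example, you can use iloc to select the rows where the value in column 'A' is greater than 10 as follows:
df.iloc[(df['A'] > 10).to_numpy()]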
  • And you can use loc to select the columns where the value in row 0 is less than 5 as follows:
df.loc[:, df.loc[0] < 5]
  • You can also use iloc and loc to assign new values to the selected rows or columns. For example, you can use iloc to set the value in the second row and third column to 100 as follows:
df.iloc[1, 2] = 100
  • And you can use loc to set the value in the rows with labels 1 and 3, and the column with label 'C' to 0 as follows:
df.loc[[1, 3], 'C'] = 0
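The boolean selections and assignments from this list can be sketched end to end on a small made-up DataFrame. The mask is converted to a NumPy array before it is handed to iloc, which is the form iloc reliably accepts; loc takes the boolean Series as it is.

```python
import pandas as pd

# Hypothetical sample data with the default integer index 0..4.
df = pd.DataFrame(
    {"A": [1, 12, 3, 14, 5], "B": [10, 20, 30, 40, 50], "C": [2, 4, 6, 8, 10]}
)

mask = df["A"] > 10                      # boolean Series aligned to df's index

rows_loc = df.loc[mask]                  # loc takes the Series directly
rows_iloc = df.iloc[mask.to_numpy()]     # iloc is given a plain boolean array

small_cols = df.loc[:, df.loc[0] < 5]    # columns whose value in row 0 is < 5

df.loc[[1, 3], "C"] = 0                  # assign by label
df.iloc[1, 2] = 100                      # assign by position (row 1, column 'C')
```

For row-wise conditions like this one, loc is usually the simpler choice because the mask needs no conversion.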

Some common pitfalls to avoid

While iloc and loc are very useful, they also have some limitations and potential pitfalls that you should be aware of. Here are some of them:

  • Negative integers behave differently in iloc and loc. iloc counts negative positions from the end, for single positions as well as for slices, so -1 refers to the last row or column. loc treats -1 as a label and raises a KeyError unless that label actually exists in the index. For example, if you want to access the last row using iloc, you can do it as follows:
df.iloc[-1]
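  • You cannot mix positions and labels in a single iloc or loc call. For example, if you pass a column label to iloc, you will get an error, and if you pass an integer to loc, it is interpreted as a label rather than a position. Chaining two lookups, as in df.iloc[1][2], is best avoided as well, because it performs two separate selections and is unreliable when used for assignment. Instead, you should use either iloc or loc consistently for both rows and columns in one call. For example, if you want to access the element in the second row and third column using iloc, you can do it as follows:
df.iloc[1, 2]
  • Or using loc, looking up the label of the third column first, as follows:
df.loc[1, df.columns[2]]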
  • You should be careful when modifying a selection made with iloc or loc. Depending on the operation and the pandas version, a selection may be a view that shares memory with the original DataFrame or a copy that is a separate object, so modifying it may or may not change the original, and pandas may emit a SettingWithCopyWarning. To avoid this ambiguity, you should use the copy method to explicitly create a copy of the data when you want to modify it. For example, if you want to create a new DataFrame by selecting the first two rows from another DataFrame using iloc, and then modify the new DataFrame, you can do it as follows:
new_df = df.iloc[:2].copy()
new_df['A'] = 0

This way, you can ensure that the original DataFrame is not affected by the modification.
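As a minimal sketch of two of the points above, the example below converts between a column label and its position so that a single indexer receives consistent arguments, and then checks that an explicit copy leaves the original untouched. The DataFrame is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with the default integer index 0..4.
df = pd.DataFrame(
    {"A": [1, 12, 3, 14, 5], "B": [10, 20, 30, 40, 50], "C": [2, 4, 6, 8, 10]}
)

# Row known by position, column known by label: convert one side.
col_pos = df.columns.get_loc("C")      # label -> position
via_iloc = df.iloc[1, col_pos]
via_loc = df.loc[df.index[1], "C"]     # position -> label

# Modify an explicit copy; the original keeps its values.
new_df = df.iloc[:2].copy()
new_df["A"] = 0
original_a = df["A"].tolist()
```

Here df.columns.get_loc and df.index[...] are the standard pandas ways to translate between labels and positions.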

Conclusion

In this blog post, we have explained the difference between iloc and loc, how to use them effectively, and some common pitfalls to avoid. We hope that this post has helped you understand and appreciate these two indexers for accessing and modifying data in pandas.

If you have any questions or feedback, please feel free to leave a comment below. Thank you for reading!
