Split
For this post, I want to look at an amazingly simple, elegant, and useful method: split, the pandas string method available as Series.str.split.
The goal is to take one column and spread its data across several columns, based on a delimiter of our choice. For example, if I have a column containing all the genres of a movie (action, thriller, adventure, etc.) delimited by “|”, I can use str.split with expand=True to give each genre its own column.
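Here is a minimal sketch of what expand=True changes. It uses a small Series of made-up genre strings, not the post's actual data. Without expand, each cell becomes a Python list; with expand=True, the pieces land in separate columns labeled 0, 1, 2, and so on.

```python
import pandas as pd

# Hypothetical sample data: pipe-delimited genres for three movies
genres = pd.Series([
    "Action|Thriller",
    "Adventure|Action|Comedy",
    "Drama",
])

# Without expand, each cell becomes a list of strings
as_lists = genres.str.split("|")

# With expand=True, each piece gets its own column (labeled 0, 1, 2, ...);
# rows with fewer pieces are padded with missing values
as_frame = genres.str.split("|", expand=True)

print(as_lists.tolist())
print(as_frame)
```

The number of columns matches the row with the most pieces, so here the result has three columns and the "Drama" row is mostly empty.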
Written as a method chain, the full code looks like this:
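(m                                            # the movies DataFrame
 .genres                                      # select the “|”-delimited genres column
 .str.split('|', expand=True)                 # one new column per genre, labeled 0, 1, 2, ...
 .rename(columns=lambda c: 'gen_' + str(c))   # prefix each label: 0 -> gen_0, 1 -> gen_1, ...
)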
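This gives us a DataFrame with one column for each delimited value. In this example, the result has eight columns, labeled 0 through 7: split creates as many columns as the row with the most genres needs, and rows with fewer genres are padded with missing values. I renamed the columns with the prefix "gen_" followed by the numeric index, so they become gen_0 through gen_7.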
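To try the chain end to end without the original dataset, here is a minimal sketch that builds a small, hypothetical stand-in for m. The titles and genres are invented for illustration; only the genres column and the chain itself come from the post.

```python
import pandas as pd

# Hypothetical stand-in for the post's movies DataFrame `m`;
# titles and genres are made up for illustration
m = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "genres": ["Action|Thriller", "Adventure|Action|Comedy", "Drama"],
})

genre_cols = (m
    .genres                                      # the pipe-delimited column
    .str.split('|', expand=True)                 # one column per genre: 0, 1, 2
    .rename(columns=lambda c: 'gen_' + str(c))   # 0 -> 'gen_0', 1 -> 'gen_1', ...
)

print(genre_cols)
# Column count equals the largest number of genres in any single row
print(list(genre_cols.columns))
```

With this toy data the result has three columns, gen_0 through gen_2, rather than the eight in the post. DataFrame.add_prefix('gen_') is a shorter alternative to the rename lambda that should yield the same labels.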