Split

For this post, I want to look at an amazingly simple, elegant, and useful pandas string method – split.

The goal is to take a single column and spread its values out into separate columns based on a delimiter of our choice. For example, if I have a column containing all the genres of a movie (action, thriller, adventure, etc.) delimited by a "|", I can call .str.split with expand=True to accomplish the task.

Written as a method chain, we end up with the following code:

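# m is the DataFrame that holds the genres column
(m
    .genres
    .str.split('|', expand=True)                # one column per delimited value
    .rename(columns=lambda c: 'gen_' + str(c))  # 0 -> gen_0, 1 -> gen_1, ...
)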

This gives us a wonderful DataFrame with one column for each delimited value. In this example, we get eight columns labeled 0 through 7, and any row with fewer than eight genres is padded with missing values in the leftover columns. Since the numeric labels are not very descriptive, the rename step adds the prefix "gen_" in front of each one.
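To try this without the original dataset, here is a minimal sketch on a tiny made-up table. The titles and genre strings are invented for illustration, and `m` only stands in for the real movie DataFrame, so the result has three columns instead of eight.

```python
import pandas as pd

# Hypothetical stand-in for the post's `m` DataFrame (made-up rows)
m = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "genres": ["Action|Thriller|Adventure", "Comedy", "Drama|Romance"],
})

genre_cols = (m
    .genres
    .str.split('|', expand=True)                # widest row has 3 genres -> 3 columns
    .rename(columns=lambda c: 'gen_' + str(c))  # 0 -> gen_0, 1 -> gen_1, 2 -> gen_2
)

print(genre_cols)
```

The number of columns is driven by the row with the most genres, so "Movie B" ends up with missing values in gen_1 and gen_2.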


There we go, we have our columns split out, ready to be joined back to our original table (if needed).  
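One way to do that join, sketched on the same kind of made-up data: because the split result keeps the original row index, DataFrame.join can line the new columns up with the source rows. The sample rows and the `joined` name are assumptions for illustration, not part of the original dataset.

```python
import pandas as pd

# Hypothetical stand-in for the post's `m` DataFrame (made-up rows)
m = pd.DataFrame({
    "title": ["Movie A", "Movie B"],
    "genres": ["Action|Thriller", "Comedy"],
})

genre_cols = (m
    .genres
    .str.split('|', expand=True)
    .rename(columns=lambda c: 'gen_' + str(c))
)

# join aligns on the row index, which str.split preserves
joined = m.join(genre_cols)

print(joined)
```

The original genres column is still there after the join, so it can be dropped afterwards if only the split columns are needed.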

