Parse text and exclude punctuation

09/04/25 18:44
What is the best practice for splitting text at spaces, commas, periods, semicolons, etc. so I get just a list of words without punctuation or spaces?

RE: Parse text and exclude punctuation
09/04/25 22:04 in reply to JL Pope.
Great question! The approach I would take is to make a couple of passes. First, replace any punctuation (and spaces) with a specific special character; then split at that character; then remove anything that is blank.

So if you have:
I am hungry; what's for lunch? Not again - I ate that for breakfast!! Fine, I'll have it again...
You can convert it to:
I_am_hungry__what's_for_lunch__Not_again___I_ate_that_for_breakfast___Fine__I'll_have_it_again___

I'm not replacing the apostrophe - that's a choice. There are a couple of ways we could replace these characters. We could run several loops to remove specific characters one at a time using the replace segment block (which is found under Advanced > Text). See image 1.
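For anyone who thinks better in code than in blocks, here is a rough Python sketch of that loop-based approach. It is not SmartBuilder code; the `PUNCTUATION` list and the `replace_one_at_a_time` name are just assumptions for illustration, and you would need to extend the list for any other characters in your own text.

```python
# Hypothetical list of characters to replace; not exhaustive.
PUNCTUATION = [" ", ";", "?", "-", "!", ",", "."]

def replace_one_at_a_time(text, placeholder="_"):
    """Replace each listed character with the placeholder, one pass per character."""
    for ch in PUNCTUATION:
        text = text.replace(ch, placeholder)
    return text

sentence = ("I am hungry; what's for lunch? Not again - "
            "I ate that for breakfast!! Fine, I'll have it again...")
marked = replace_one_at_a_time(sentence)
print(marked)
```

The drawback is visible right away: every character you want gone has to be listed by hand, and anything you forget stays in the text.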

A better way is to use the "pattern" option in the replace block, which allows you to use regular expressions. I'd read up on regex a little, and ask an LLM to generate the pattern that you want. For REPLACING content, remember you often want to search for the opposite of what you want to keep. So if you want to keep alphanumeric characters and apostrophes, you want the regex for anything that is NOT alphanumeric or an apostrophe, and you're going to replace that with some character, like | or _. See image 2.
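To make the "search for the opposite" idea concrete, here is a small Python sketch. The pattern `[^A-Za-z0-9']` is my assumption of what such a regex could look like (the `^` inside the brackets negates the set); the exact pattern you type into the replace block may differ, and `mark_separators` is just an illustrative name.

```python
import re

# Negated character class: matches any single character that is NOT
# an ASCII letter, a digit, or an apostrophe.
NOT_WORD_CHAR = re.compile(r"[^A-Za-z0-9']")

def mark_separators(text, placeholder="_"):
    """Replace every character we don't want to keep with the placeholder."""
    return NOT_WORD_CHAR.sub(placeholder, text)

sentence = ("I am hungry; what's for lunch? Not again - "
            "I ate that for breakfast!! Fine, I'll have it again...")
print(mark_separators(sentence))
```

Note that this class only keeps unaccented ASCII letters, so a word like "café" would be split; widen the set if your text needs it.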

From there, split and remove empty values. See image 3.
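In Python terms, that last step could look roughly like this. It is a sketch only: `split_words` is a made-up name, and it assumes `_` was the placeholder used in the earlier pass.

```python
def split_words(marked, placeholder="_"):
    """Split at the placeholder and drop the empty strings left by repeated placeholders."""
    return [piece for piece in marked.split(placeholder) if piece]

marked = ("I_am_hungry__what's_for_lunch__Not_again___"
          "I_ate_that_for_breakfast___Fine__I'll_have_it_again___")
words = split_words(marked)
print(words)
```

The filter matters because two placeholders in a row (for example from "; ") produce an empty string between them when you split.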

Here's a video that walks through it - https://share.smartbuilder.com/public/videotutorials/regex-regular-expression-in-SmartBuilder.mp4

Hope that helps!