<?xml version="1.0" encoding="UTF-8"?> <feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/"> <title>Parse text and exclude punctuation</title> <link rel="self" href="https://portal.smartbuilder.com/c/message_boards/find_thread?p_l_id=&amp;threadId=647585" /> <subtitle>Parse text and exclude punctuation</subtitle> <id>https://portal.smartbuilder.com/c/message_boards/find_thread?p_l_id=&amp;threadId=647585</id> <updated>2026-05-18T15:02:38Z</updated> <dc:date>2026-05-18T15:02:38Z</dc:date> <entry> <title>RE: Parse text and exclude punctuation</title> <link rel="alternate" href="https://portal.smartbuilder.com/c/message_boards/find_message?p_l_id=&amp;messageId=647640" /> <author> <name>Navdeep Dhillon</name> </author> <id>https://portal.smartbuilder.com/c/message_boards/find_message?p_l_id=&amp;messageId=647640</id> <updated>2025-04-09T22:04:29Z</updated> <published>2025-04-09T22:02:11Z</published> <summary type="html">Great question! The approach I would take is to make a couple of passes: first replace every punctuation mark and space with a specific special character, then split at that character, then remove anything that is blank.&lt;br /&gt;&lt;br /&gt;So if you have:&lt;br /&gt;&lt;span style="font-family: &amp;#x22;courier&amp;#x20;new&amp;#x22;&amp;#x2c;&amp;#x20;courier&amp;#x2c;&amp;#x20;monospace"&gt;&lt;strong&gt;I am hungry; what&amp;#039;s for lunch? Not again - I ate that for breakfast!! Fine, I&amp;#039;ll have it again...&lt;/strong&gt;&lt;br /&gt;&lt;/span&gt;You can convert it to:&lt;br /&gt;&lt;strong&gt;&lt;span style="font-family: &amp;#x22;courier&amp;#x20;new&amp;#x22;&amp;#x2c;&amp;#x20;courier&amp;#x2c;&amp;#x20;monospace"&gt;I_am_hungry__what&amp;#039;s_for_lunch__Not_again___I_ate_that_for_breakfast___Fine__I&amp;#039;ll_have_it_again___&lt;/span&gt;&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;I&amp;#039;m not replacing the apostrophe - that&amp;#039;s a choice. There are a couple of ways we could replace these characters. 
We could run several loops to replace specific characters one at a time using the replace segment block (which is found under &lt;strong&gt;Advanced &amp;gt; Text&lt;/strong&gt;). See image 1.&lt;br /&gt;&lt;br /&gt;A better way is to use the &amp;#034;pattern&amp;#034; option in the replace block, which allows you to use regular expressions. I&amp;#039;d read up on &lt;strong&gt;regex&lt;/strong&gt; a little, and ask an LLM to generate the code that you want. For REPLACING content, remember you often want to search for the opposite of what you want to keep. So if you want to keep alphanumeric characters and apostrophes, you want the regex for anything that is NOT alphanumeric or an apostrophe (for example, something like [^A-Za-z0-9&amp;#039;]), and you&amp;#039;re going to replace that with some character, like | or _. See image 2.&lt;br /&gt;&lt;br /&gt;From there, split and remove empty values. See image 3.&lt;br /&gt;&lt;br /&gt;Here&amp;#039;s a video that walks through it - &lt;a href="https&amp;#x3a;&amp;#x2f;&amp;#x2f;share&amp;#x2e;smartbuilder&amp;#x2e;com&amp;#x2f;public&amp;#x2f;videotutorials&amp;#x2f;regex-regular-expression-in-SmartBuilder&amp;#x2e;mp4"&gt;https://share.smartbuilder.com/public/videotutorials/regex-regular-expression-in-SmartBuilder.mp4&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Hope that helps!</summary> <dc:creator>Navdeep Dhillon</dc:creator> <dc:date>2025-04-09T22:02:11Z</dc:date> </entry> <entry> <title>Parse text and exclude punctuation</title> <link rel="alternate" href="https://portal.smartbuilder.com/c/message_boards/find_message?p_l_id=&amp;messageId=647584" /> <author> <name>JL Pope</name> </author> <id>https://portal.smartbuilder.com/c/message_boards/find_message?p_l_id=&amp;messageId=647584</id> <updated>2025-04-09T18:44:02Z</updated> <published>2025-04-09T18:44:02Z</published> <summary type="html">&lt;span style="color: inherit"&gt;&lt;span style="font-family: inherit"&gt;&lt;span style="font-size: 12px"&gt;&lt;span style="color: #006400"&gt;&lt;span style="font-family: 
Verdana&amp;#x2c;&amp;#x20;sans-serif"&gt;&lt;span style="font-size: 12px"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;What is the best practice for splitting text at spaces, commas, periods, semicolons, etc. so I get just a list of words without punctuation or spaces?&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;</summary> <dc:creator>JL Pope</dc:creator> <dc:date>2025-04-09T18:44:02Z</dc:date> </entry> </feed> 