Text analytics
Extracting insights from unstructured text using TrueState’s natural language tools
Text is one of the richest but most challenging types of business data. It lives in emails, support tickets, surveys, reviews, contracts, and more. To extract meaning from it at scale, you need the right mix of automation, structure, and interpretability.
TrueState supports a powerful set of text analytics tools, all accessible directly within the Pipeline canvas. These tools let you enrich, classify, tag, and summarise unstructured text—without writing code or managing external models.
This guide explains each of the supported methods, when to use them, and how to configure them inside a pipeline.
Supported methods:
- Automations from Pipelines – Flexible LLM chains, scraping, and external enrichment
- High-volume LLM inference – Fast summarisation and enrichment
- Text Classification – Direct label assignment using fine-tuned classifiers
- Universal Classification – Single-label classification using logic-based statements
- Tagging – Multi-label classification using pipe-separated tags
- Hierarchy Classification – Tree-based, MECE-aligned label selection
1. Calling automations from pipelines
For advanced or multi-step enrichment, you can call a full automation from within a pipeline.
Using the Automation step, you can run any automation once per record. Each input is passed through the automation, and the result is appended to the dataset.
This allows you to combine:
- Large Language Model (LLM) chains
- Web scraping
- Conditional logic
- External APIs
For more on how to build these, see the Automations guide.
Use cases:
- Summarising documents using GPT-4 or Claude
- Enriching lead records with web-sourced company metadata
- Running multi-hop reasoning on product reviews
Use Automations when you need control, orchestration, or hybrid workflows. If you only need high-throughput enrichment, prefer LLM inference.
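Since the Automation step is configured in the Pipeline canvas rather than in code, the following is only an illustrative sketch of its per-record semantics: every row is passed through the automation once, and the result is appended as a new column. The names `run_automation` and `records` are hypothetical stand-ins, not TrueState APIs.

```python
# Hypothetical sketch of the Automation step's per-record behaviour.
# run_automation() stands in for a full automation chain
# (LLM calls, scraping, external APIs); here it is a trivial placeholder.

def run_automation(record: dict) -> str:
    """Stand-in for an automation workflow applied to one record."""
    return record["text"].upper()  # placeholder enrichment

records = [{"text": "summarise this review"}, {"text": "enrich this lead"}]

# The step runs the automation once per record and appends the result.
for record in records:
    record["automation_result"] = run_automation(record)
```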
2. High-volume LLM inference
The LLM Inference step uses small, efficient models to enrich text quickly and cost-effectively. It’s ideal when you need structured output across thousands of records.
Capabilities include:
- Summarising product descriptions
- Extracting entities or topics
- Rewriting or simplifying text
- Light classification or tone detection
Use cases:
- Creating short-form summaries for a UI
- Extracting country and company mentions from survey responses
- Rephrasing raw notes into business-ready summaries
This step is optimised for performance and throughput—not deep reasoning or chain-of-thought logic.
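The pattern the LLM Inference step implements is simple: apply one prompt to every row and collect structured output. In this sketch, `tiny_llm` is a hypothetical stand-in for the small, efficient model the step actually uses; here it is a trivial keyword extractor so the row-wise flow is visible.

```python
# Hedged sketch: tiny_llm() stands in for a small, high-throughput model.
# The real step is a no-code pipeline node; this only shows the shape of
# "one prompt, many rows, structured output".

def tiny_llm(prompt: str, text: str) -> str:
    """Stand-in for a fast model; here a toy country-mention extractor."""
    countries = {"France", "Germany", "Japan"}
    found = [w.strip(".,") for w in text.split() if w.strip(".,") in countries]
    return ", ".join(found)

rows = ["Our office in France is growing.", "No mentions here."]
enriched = [tiny_llm("Extract country mentions", row) for row in rows]
```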
3. Text classification
Text Classification allows you to assign a label to each record by choosing from a predefined set of categories. It uses pre-trained or fine-tuned models behind the scenes to select the most appropriate label from your list.
Unlike Universal Classification or Tagging, this approach doesn’t require logic statements—it simply learns the mapping from text to labels based on examples or embeddings.
Use cases:
- Categorising feedback into predefined themes (e.g., UI, Pricing, Support)
- Assigning sentiment categories: Positive, Neutral, Negative
- Labeling intent in form submissions or queries
Configuration:
- You provide a list of possible labels (e.g., “Bug”, “Feature Request”, “General Inquiry”)
- The model selects the best match for each row
Use Text Classification when you already have a clear list of categories and don’t need explanation logic or flexible tagging.
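Conceptually, the step scores each label in your list against the input and returns the best match. The sketch below uses a toy keyword score as a stand-in for the model's learned similarity; the label list mirrors the example above, and everything else is an assumption for illustration.

```python
# Illustrative only: a keyword count stands in for the pre-trained or
# fine-tuned model's scoring. The real step needs no code, just a label list.

LABELS = {
    "Bug": ["crash", "error", "broken"],
    "Feature Request": ["add", "wish", "could you"],
    "General Inquiry": ["how", "what", "when"],
}

def classify_text(text: str) -> str:
    text_l = text.lower()
    # Score every candidate label, then return the best match for the row.
    scores = {label: sum(kw in text_l for kw in kws)
              for label, kws in LABELS.items()}
    return max(scores, key=scores.get)

label = classify_text("The app crashed with an error on startup")
```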
4. Universal classification
Universal classification uses a Natural Language Inference (NLI) model to classify a text input based on a statement you provide.
You define a set of labels, each paired with a statement. The model determines whether that statement is entailed by the input. If so, the corresponding label is applied.
Example:
- Input: “The customer asked for a refund after receiving a broken item.”
- Statement: “This message is a complaint.”
- Result: entailed → assign label "Complaint"
Use cases:
- Intent detection in messages or tickets
- Filtering for eligibility criteria in open-ended responses
- Auto-labeling short texts for downstream filtering
Write statements as plain-English factual assertions. Avoid ambiguous or compound phrasing.
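The label-to-statement flow can be sketched as below. Note that `entails()` is a hypothetical stand-in: the real step runs an NLI model that scores entailment, whereas this stub just checks cue words so the control flow is concrete.

```python
# Hedged sketch of universal classification. entails() is NOT a real NLI
# model, only a cue-word stub illustrating statement-based labelling.

def entails(text, statement):
    """Stand-in NLI check; a real model scores entailment probability."""
    cues = {"This message is a complaint.": ("refund", "broken", "unhappy")}
    return any(cue in text.lower() for cue in cues.get(statement, ()))

labels = {"Complaint": "This message is a complaint."}

def classify_nli(text):
    # Apply the label whose statement is entailed by the input.
    for label, statement in labels.items():
        if entails(text, statement):
            return label
    return None

result = classify_nli(
    "The customer asked for a refund after receiving a broken item.")
```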
5. Tagging (criteria-based)
Tagging is a multi-label extension of universal classification. Instead of assigning just one label, you define a set of tags, each with one or more criteria statements. If any statement is entailed by the input, the tag is applied.
Multiple tags can be assigned per row. They are returned as a pipe-separated ("|") string.
Example output: "Complaint|Urgent|Refund"
Use cases:
- Flagging multiple concerns in a support transcript
- Annotating user feedback with multiple themes
- Extracting overlapping topics from interviews
How to define:
Each tag is defined as:
TagName | Statement
Multiple criteria can be attached to the same tag.
Use tagging when you want broad annotation across multiple dimensions. For single-label classification, use universal classification instead.
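The any-criterion-matches rule and the pipe-separated output can be sketched as follows. As in the previous section, `entailed()` is a hypothetical cue-word stub standing in for the real NLI model.

```python
# Illustrative tagging sketch: each tag carries one or more criteria
# statements; if any is entailed, the tag applies. entailed() is a stub.

TAGS = {
    "Complaint": ["This message is a complaint."],
    "Urgent": ["This message is time-sensitive."],
    "Refund": ["This message mentions a refund."],
}

def entailed(text, statement):
    """Stand-in entailment check keyed on cue words, not a real model."""
    cue_map = {
        "This message is a complaint.": "broken",
        "This message is time-sensitive.": "immediately",
        "This message mentions a refund.": "refund",
    }
    return cue_map[statement] in text.lower()

def assign_tags(text):
    hits = [tag for tag, stmts in TAGS.items()
            if any(entailed(text, s) for s in stmts)]
    return "|".join(hits)  # multiple tags, pipe-separated

tags = assign_tags("Item arrived broken; please process a refund immediately.")
```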
6. Hierarchy classification
Hierarchy classification is for structured, multi-level label selection. You define a hierarchy of labels grouped by level; the model scores the entailment statement for each label at a level and selects the highest-scoring peer before descending to the next level.
This approach ensures mutually exclusive decisions at each level of a hierarchy.
Key rule: Labels at each level must be MECE (Mutually Exclusive, Collectively Exhaustive).
Example structure:
Level 1
- Product Feedback: “This message is about the product.”
- Support Request: “This message is asking for help.”
- General Comment: “This message is general commentary.”
Level 2 (under Product Feedback)
- Pricing Concern: “The message discusses the product’s pricing.”
- Feature Request: “The message asks for a new product feature.”
If a record is classified as “Product Feedback” at Level 1, it will be evaluated against the Level 2 options. Among any peer group, only the highest-scoring label is selected.
Use cases:
- Classifying tickets into department → topic → subtopic
- Routing forms through a business process hierarchy
- Multi-level content categorisation
Use hierarchy classification when your label set is nested or tree-structured. Ensure each group at a level has no overlaps in definition.
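The descent logic can be sketched as a recursive walk over the tree: score the peers at the current level, keep only the winner, and recurse into its children. `score()` is a hypothetical stand-in for the NLI model's entailment scores; the hierarchy mirrors the example structure above.

```python
# Hedged sketch of hierarchy classification: highest-scoring peer wins at
# each level, then descend. score() is a cue-word stub, not a real model.

HIERARCHY = {
    "Product Feedback": {"Pricing Concern": {}, "Feature Request": {}},
    "Support Request": {},
    "General Comment": {},
}

def score(text, label):
    """Stand-in scores; a real NLI model rates each label's statement."""
    cues = {
        "Product Feedback": "product",
        "Support Request": "help",
        "General Comment": "",
        "Pricing Concern": "price",
        "Feature Request": "feature",
    }
    cue = cues[label]
    return 1.0 if cue and cue in text.lower() else 0.0

def classify_hierarchy(text, level=HIERARCHY, path=()):
    if not level:  # leaf reached: return the selected path
        return list(path)
    best = max(level, key=lambda label: score(text, label))
    return classify_hierarchy(text, level[best], path + (best,))

path = classify_hierarchy("The product's price is too high for our team.")
```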
Choosing the right text analytics method
| Goal | Recommended Step | Notes |
|---|---|---|
| Flexible, high-quality enrichment | Automation step | Use for chains, scraping, or external APIs |
| Fast enrichment at scale | LLM Inference step | Best for summarisation and extraction |
| Simple multi-class prediction | Text Classification | Use when categories are known and unambiguous |
| Logic-based single-label classification | Universal Classification | Uses NLI to match statements |
| Multi-label annotation | Tagging step | Flexible tagging using pipe-separated output |
| Tree-based classification | Hierarchy Classification | Supports nested taxonomies with MECE label logic |
Glossary
- Automation step – Executes a full automation workflow for each row in a dataset.
- LLM Inference step – Applies fast, high-throughput language models to text.
- Text Classification – Assigns a best-match label from a list without needing logic statements.
- Universal Classification – Uses NLI to assign a single label based on entailed logic.
- Tagging – Assigns multiple tags using matching statements and pipe-separated outputs.
- Hierarchy Classification – Selects labels across multiple levels with MECE structure.
- MECE – A classification principle: Mutually Exclusive, Collectively Exhaustive.
Best practices
- Write clear, concise, and specific statements for classification
- Use Text Classification when labels are stable and fixed
- Don’t overload tagging steps—group by theme when possible
- Validate performance on a small batch before scaling
- Use Automations for advanced logic, but monitor cost and latency
Next steps
- Go to the Pipeline section in TrueState
- Upload a dataset with one or more text columns
- Add the appropriate text analytics node to your pipeline
- Configure using natural language, label lists, or tag templates
- Combine with downstream classification, enrichment, or reporting steps