Foundation Text Utilities Reference
Use when using Foundation or NaturalLanguage text utilities — NSRegularExpression, NSDataDetector, NLTagger, NLTokenizer, or NSString bridging.
Family: Text Model And Foundation Utilities
Use this skill when you need the exact Foundation or NaturalLanguage tool for a text-processing problem.
Documentation Scope
This page documents the apple-text-foundation-ref reference skill. Use it when the subsystem is already known and you need mechanics, behavior, or API detail.
Related
- apple-text-parsing: Use when deciding between Swift Regex and NSRegularExpression, bridging regex results to NSRange, or choosing a parsing strategy.
- apple-text-attributed-string: Use when choosing between AttributedString and NSAttributedString, defining custom attributes, or converting between them.
- apple-text-markdown: Use when working with Markdown in SwiftUI Text or AttributedString: what renders, PresentationIntent, or rendering gaps.
Full SKILL.md source
---
name: apple-text-foundation-ref
description: Use when using Foundation or NaturalLanguage text utilities (NSRegularExpression, NSDataDetector, NLTagger, NLTokenizer, or NSString bridging)
license: MIT
---
# Foundation Text Utilities Reference
Use this skill when you need the exact Foundation or NaturalLanguage tool for a text-processing problem.
## When to Use
- You need `NSRegularExpression`, `NSDataDetector`, or NaturalLanguage APIs.
- You are measuring text or bridging `String` and `NSString`.
- The question is about utility APIs, not parser choice alone.
## Quick Decision
- Need parser choice guidance -> `/skill apple-text-parsing`
- Need the exact utility API or compatibility details -> stay here
- Need attributed-text model guidance instead of utilities -> `/skill apple-text-attributed-string`
## Core Guidance
## NSRegularExpression
ICU-compatible regex engine. Reference type.
```swift
import Foundation

// Note: with .caseInsensitive, [A-Z] also matches lowercase letters,
// so every word in the sample text below matches.
let pattern = "\\b[A-Z][a-z]+\\b"
let regex = try NSRegularExpression(pattern: pattern, options: [.caseInsensitive])

// Find all matches
let text = "Hello World from Swift"
let fullRange = NSRange(text.startIndex..., in: text)
let matches = regex.matches(in: text, range: fullRange)

for match in matches {
    if let range = Range(match.range, in: text) {
        print(text[range])
    }
}

// First match only
let firstMatch = regex.firstMatch(in: text, range: fullRange)

// Number of matches
let count = regex.numberOfMatches(in: text, range: fullRange)

// Replace
let replaced = regex.stringByReplacingMatches(
    in: text,
    range: fullRange,
    withTemplate: "[$0]"
)

// Enumerate matches
regex.enumerateMatches(in: text, range: fullRange) { result, flags, stop in
    guard let result else { return }
    // Process match
}
```
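The `stop` parameter of `enumerateMatches` lets you end enumeration early. A minimal sketch, assuming a hypothetical helper named `firstMatches(of:in:limit:)` that is not part of Foundation:

```swift
import Foundation

// Hypothetical helper: returns at most `limit` matches of `pattern` in `text`.
// An invalid pattern yields an empty array here; real code may want to surface the error.
func firstMatches(of pattern: String, in text: String, limit: Int) -> [String] {
    guard limit > 0, let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    var found: [String] = []

    regex.enumerateMatches(in: text, options: [], range: fullRange) { result, _, stop in
        guard let result = result, let range = Range(result.range, in: text) else { return }
        found.append(String(text[range]))
        if found.count >= limit {
            stop.pointee = true // ends the enumeration early
        }
    }
    return found
}

let words = firstMatches(of: "\\b[A-Z][a-z]+\\b", in: "Hello World from Swift", limit: 2)
print(words) // ["Hello", "World"]
```

Stopping early avoids scanning the rest of a long string when only the first few matches matter.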
### Options
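```swift
// NSRegularExpression.Options
// .caseInsensitive              flag i
// .allowCommentsAndWhitespace   flag x
// .ignoreMetacharacters         treat the whole pattern as a literal string
// .dotMatchesLineSeparators     flag s
// .anchorsMatchLines            flag m
// .useUnixLineSeparators        only \n counts as a line separator
// .useUnicodeWordBoundaries     Unicode TR#29 word boundaries
```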
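As an illustration of one option, the sketch below uses `.anchorsMatchLines` so that `^` and `$` match at each line boundary. The sample text and the heading pattern are assumptions made for this example:

```swift
import Foundation

let notes = "# Title\nbody text\n## Section"

// Without .anchorsMatchLines, ^ and $ match only at the start and end of the whole string.
let headingRegex = try! NSRegularExpression(
    pattern: "^#+\\s*(.+)$",
    options: [.anchorsMatchLines]
)
let notesRange = NSRange(notes.startIndex..., in: notes)

// Collect the text of capture group 1 (the heading title) from every match.
let headings = headingRegex.matches(in: notes, range: notesRange).compactMap { match -> String? in
    guard let range = Range(match.range(at: 1), in: notes) else { return nil }
    return String(notes[range])
}
print(headings) // ["Title", "Section"]
```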
### Capture Groups
```swift
import Foundation

let regex = try NSRegularExpression(pattern: "(\\w+)@(\\w+\\.\\w+)")
let text = "user@example.com"

// Bind the ranges optionally instead of force-unwrapping them.
if let match = regex.firstMatch(in: text, range: NSRange(text.startIndex..., in: text)),
   let userRange = Range(match.range(at: 1), in: text),
   let domainRange = Range(match.range(at: 2), in: text) {
    // match.range(at: 0): full match
    // match.range(at: 1): first group ("user")
    // match.range(at: 2): second group ("example.com")
    let user = String(text[userRange])
    let domain = String(text[domainRange])
}
```
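When a group is optional and does not participate in the match, `match.range(at:)` has a location of `NSNotFound`, and `Range(_:in:)` returns `nil` for it. A minimal sketch, assuming a hypothetical helper `captureGroups(pattern:in:)` and an optional port group added to the pattern for illustration:

```swift
import Foundation

// Hypothetical helper: returns every capture group of the first match,
// with nil for groups that did not participate.
func captureGroups(pattern: String, in text: String) -> [String?]? {
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: text, range: NSRange(text.startIndex..., in: text))
    else { return nil }

    return (0..<match.numberOfRanges).map { index -> String? in
        // Range(_:in:) returns nil when the group's location is NSNotFound.
        Range(match.range(at: index), in: text).map { String(text[$0]) }
    }
}

// The third group, (:\d+)?, is optional and absent from this input.
let groups = captureGroups(pattern: "(\\w+)@(\\w+\\.\\w+)(:\\d+)?", in: "user@example.com")
```

Here `groups` holds the full match, `"user"`, `"example.com"`, and `nil` for the missing port group.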
### Modern Alternative: Swift Regex (iOS 16+)
```swift
import RegexBuilder

// Regex literals require Swift 5.7 or later.
let text = "user@example.com"
let regex = /(?<user>\w+)@(?<domain>\w+\.\w+)/
if let match = text.firstMatch(of: regex) {
    let user = match.user     // Substring
    let domain = match.domain // Substring
}

// With RegexBuilder
let pattern = Regex {
    Capture { OneOrMore(.word) }
    "@"
    Capture {
        OneOrMore(.word)
        "."
        OneOrMore(.word)
    }
}
```
**When to use NSRegularExpression vs Swift Regex:**

- NSRegularExpression: Dynamic patterns (user input), pre-iOS 16, NSRange-based APIs
- Swift Regex: Static patterns, type-safe captures, iOS 16+
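For dynamic patterns, the initializer throws on invalid input, so a user-supplied pattern is usually compiled inside `do`/`catch`. A minimal sketch, assuming a hypothetical helper `compileUserPattern(_:)`; `NSRegularExpression.escapedPattern(for:)` is the Foundation call for treating user text as a literal:

```swift
import Foundation

// Hypothetical helper: compiles a user-supplied pattern and returns nil
// (after logging the error) when the pattern is invalid.
func compileUserPattern(_ pattern: String) -> NSRegularExpression? {
    do {
        return try NSRegularExpression(pattern: pattern, options: [])
    } catch {
        print("Invalid pattern \"\(pattern)\": \(error.localizedDescription)")
        return nil
    }
}

let valid = compileUserPattern("[a-z]+")
let invalid = compileUserPattern("[a-z") // unbalanced bracket, fails to compile

// To search for user text literally rather than as a pattern, escape it first.
let literalPattern = NSRegularExpression.escapedPattern(for: "1+1=2?")
let literalRegex = compileUserPattern(literalPattern)
let sample = "Is 1+1=2? Yes."
let literalMatch = literalRegex?.firstMatch(in: sample, range: NSRange(sample.startIndex..., in: sample))
```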
## NSDataDetector
Detects semantic data in natural language text. Subclass of NSRegularExpression.
```swift
import Foundation

// Include every type that the switch below handles.
let types: NSTextCheckingResult.CheckingType = [.link, .phoneNumber, .address, .date, .transitInformation]
let detector = try NSDataDetector(types: types.rawValue)

let text = "Call 555-1234 on March 15, 2025 or visit https://apple.com"
let matches = detector.matches(in: text, range: NSRange(text.startIndex..., in: text))

for match in matches {
    // The force unwraps assume the property matching resultType is populated.
    switch match.resultType {
    case .link:
        print("URL: \(match.url!)")
    case .phoneNumber:
        print("Phone: \(match.phoneNumber!)")
    case .address:
        print("Address: \(match.addressComponents!)")
    case .date:
        print("Date: \(match.date!)")
    case .transitInformation:
        print("Flight: \(match.components!)")
    default:
        break
    }
}
```
### Supported Types
| Type | Properties | Example |
|------|-----------|---------|
| `.link` | `url` | "https://apple.com" |
| `.phoneNumber` | `phoneNumber` | "555-1234" |
| `.address` | `addressComponents` | "1 Apple Park Way, Cupertino" |
| `.date` | `date`, `duration`, `timeZone` | "March 15, 2025" |
| `.transitInformation` | `components` (airline, flight) | "UA 123" |
### Modern Alternative: DataDetection (iOS 18+)
```swift
import DataDetection
// New API with structured results and better accuracy
```
## NaturalLanguage Framework (iOS 12+)
Replaces deprecated `NSLinguisticTagger`.
### NLTagger
Tag text with linguistic information:
```swift
import NaturalLanguage

// Keep the text in a constant so ranges can be applied without force-unwrapping tagger.string.
let text = "Apple released new iPhones in Cupertino"
let tagger = NLTagger(tagSchemes: [.lexicalClass, .nameType, .lemma])
tagger.string = text

// Enumerate tags
tagger.enumerateTags(
    in: text.startIndex..<text.endIndex,
    unit: .word,
    scheme: .lexicalClass
) { tag, range in
    if let tag {
        print("\(text[range]): \(tag.rawValue)")
        // "Apple": Noun, "released": Verb, etc.
    }
    return true // continue enumerating
}
```
### Tag Schemes
| Scheme | Tags | Purpose |
|--------|------|---------|
| `.tokenType` | `.word`, `.punctuation`, `.whitespace` | Token classification |
| `.lexicalClass` | `.noun`, `.verb`, `.adjective`, `.adverb`, etc. | Part of speech |
| `.nameType` | `.personalName`, `.placeName`, `.organizationName` | Named entity recognition |
| `.lemma` | (base form string) | Word lemmatization |
| `.language` | (BCP 47 code) | Per-word language |
| `.script` | (ISO 15924 code) | Writing script |
### NLTokenizer
Segment text into tokens:
```swift
import NaturalLanguage

let text = "Hello, world! How are you?"
let tokenizer = NLTokenizer(unit: .word) // .word, .sentence, .paragraph, .document
tokenizer.string = text

tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { range, attrs in
    print(text[range])
    return true // continue enumerating
}
// Output: "Hello", "world", "How", "are", "you"
```
### NLLanguageRecognizer
Identify language of text:
```swift
import NaturalLanguage

let recognizer = NLLanguageRecognizer()
recognizer.processString("Bonjour le monde")
let language = recognizer.dominantLanguage // .french

// With probabilities
let hypotheses = recognizer.languageHypotheses(withMaximum: 3)
// e.g. [.french: 0.95, .italian: 0.03, .spanish: 0.02] (illustrative values)

// Constrain to specific languages
recognizer.languageConstraints = [.english, .french, .german]

// Language hints (prior probabilities)
recognizer.languageHints = [.french: 0.8, .english: 0.2]
```
### NLEmbedding
Word and sentence embeddings for semantic similarity:
```swift
import NaturalLanguage

// Built-in word embeddings
if let embedding = NLEmbedding.wordEmbedding(for: .english) {
    let distance = embedding.distance(between: "king", and: "queen")

    // Find nearest neighbors
    embedding.enumerateNeighbors(for: "swift", maximumCount: 5) { neighbor, distance in
        print("\(neighbor): \(distance)")
        return true
    }
}

// Sentence embedding (iOS 14+)
if let sentenceEmbedding = NLEmbedding.sentenceEmbedding(for: .english) {
    let distance = sentenceEmbedding.distance(
        between: "The cat sat on the mat",
        and: "A feline rested on the rug"
    )
}
```
### Custom NLModel (via Create ML)
```swift
// Load trained model
let model = try NLModel(mlModel: MyTextClassifier().model)

// Classify text
let label = model.predictedLabel(for: "This is great!")
// e.g., "positive"

// With confidence
let hypotheses = model.predictedLabelHypotheses(for: "This is great!", maximumCount: 3)
```
## NSStringDrawingContext
Controls text drawing behavior, especially scaling:
```swift
let context = NSStringDrawingContext()
context.minimumScaleFactor = 0.5 // Allow shrinking to 50%

let boundingRect = CGRect(x: 0, y: 0, width: 200, height: 50)
attributedString.draw(with: boundingRect, options: [.usesLineFragmentOrigin], context: context)

// Check what scale was actually used
print("Scale used: \(context.actualScaleFactor)")
// 1.0 = no shrinking needed, < 1.0 = text was shrunk
```
### Bounding Rect Calculation
```swift
// Calculate size needed for attributed string
let size = attributedString.boundingRect(
    with: CGSize(width: maxWidth, height: .greatestFiniteMagnitude),
    options: [.usesLineFragmentOrigin, .usesFontLeading],
    context: nil
).size

// Round up for pixel alignment
let ceilSize = CGSize(width: ceil(size.width), height: ceil(size.height))
```
**Options:**

- `.usesLineFragmentOrigin`: Multi-line text (ALWAYS include for multi-line)
- `.usesFontLeading`: Include font leading in height
- `.truncatesLastVisibleLine`: Truncate if exceeds bounds
## String / NSString Bridging
### Key Differences
| Aspect | String (Swift) | NSString (ObjC) |
|--------|---------------|-----------------|
| **Encoding** | UTF-8 internal | UTF-16 internal |
| **Indexing** | `String.Index` (Character) | `Int` (UTF-16 code unit) |
| **Count** | `.count` (Characters) | `.length` (UTF-16 units) |
| **Empty check** | `.isEmpty` | `.length == 0` |
| **Type** | Value type | Reference type |
### Bridging Cost
```swift
let swiftStr: String = "Hello"
let nsStr = swiftStr as NSString // Bridge (may defer copy)
let backStr = nsStr as String    // Bridge back

// NSRange ↔ Range conversion
let nsRange = NSRange(swiftStr.startIndex..., in: swiftStr)
let swiftRange = Range(nsRange, in: swiftStr)
```
**Performance note:** Bridging is NOT zero-cost. UTF-8 ↔ UTF-16 conversion may occur. For tight loops with Foundation APIs, consider working with NSString directly.
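One way to follow that advice is to create the `NSString` once, outside the loop, and extract substrings with `NSRange` directly. This is a sketch under assumptions: the sample input and the `key=value` pattern are made up, and whether it is measurably faster depends on the workload, so profile before relying on it.

```swift
import Foundation

let source = "alpha=1, beta=22, gamma=333"
let pairRegex = try! NSRegularExpression(pattern: "(\\w+)=(\\d+)")

// Create the NSString once, outside the loop, then stay in NSString/NSRange terms.
let nsSource = NSString(string: source)
let wholeRange = NSRange(location: 0, length: nsSource.length)

var values: [String: Int] = [:]
for match in pairRegex.matches(in: source, range: wholeRange) {
    // substring(with:) takes the NSRange as-is, so no Range<String.Index> conversion is needed.
    let key = nsSource.substring(with: match.range(at: 1))
    let number = nsSource.substring(with: match.range(at: 2))
    values[key] = Int(number)
}
print(values)
```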
### Common Pattern: NSRange from String
```swift
let text = "Hello 👋🏽 World"

// ✅ CORRECT: Using String for conversion
let nsRange = NSRange(text.range(of: "World")!, in: text)

// ✅ CORRECT: Full range
let fullRange = NSRange(text.startIndex..., in: text)

// ❌ WRONG: Assuming character count = NSString length
let badRange = NSRange(location: 0, length: text.count) // WRONG for emoji/CJK
```
**Why counts differ:** `"👋🏽".count` = 1 (one Character), `("👋🏽" as NSString).length` = 4 (four UTF-16 code units).
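The difference can be checked directly. The sketch below reuses the sample string from the snippet above (the constant names are chosen for this example) and shows what a range built from `count` actually covers:

```swift
import Foundation

let greeting = "Hello 👋🏽 World"

let characterCount = greeting.count                 // 13 Characters
let utf16Count = greeting.utf16.count               // 16 UTF-16 code units
let nsLength = NSString(string: greeting).length    // 16, same unit as utf16.count

// Correct: derive the NSRange from String indices.
let worldRange = NSRange(greeting.range(of: "World")!, in: greeting)
// worldRange.location == 11, worldRange.length == 5

// Wrong: a length taken from `count` stops 3 UTF-16 units short of the end.
let badRange = NSRange(location: 0, length: greeting.count)
let truncated = Range(badRange, in: greeting).map { String(greeting[$0]) }
print(truncated ?? "nil") // Hello 👋🏽 Wo
```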
## Quick Reference
| Need | API | Min OS |
|------|-----|--------|
| Pattern matching (dynamic) | NSRegularExpression | All |
| Pattern matching (static) | Swift Regex | iOS 16 |
| Detect links/phones/dates | NSDataDetector | All |
| Detect data (modern) | DataDetection | iOS 18 |
| Part of speech tagging | NLTagger (.lexicalClass) | iOS 12 |
| Named entity recognition | NLTagger (.nameType) | iOS 12 |
| Language detection | NLLanguageRecognizer | iOS 12 |
| Text segmentation | NLTokenizer | iOS 12 |
| Word similarity | NLEmbedding.wordEmbedding | iOS 13 |
| Sentence similarity | NLEmbedding.sentenceEmbedding | iOS 14 |
| Custom classifier | NLModel + Create ML | iOS 12 |
| Text measurement | NSAttributedString.boundingRect | All |
| Draw text with scaling | NSStringDrawingContext | All |
## Common Pitfalls
1. **Assuming String.count == NSString.length**: They use different counting units (Characters vs UTF-16 code units). Always convert ranges explicitly.
2. **Missing `.usesLineFragmentOrigin`**: Without this option, `boundingRect` calculates for single-line text.
3. **NSRegularExpression with user input**: Always `try` the initializer and handle the error, because invalid patterns throw.
4. **NLTagger requires enough text**: Very short strings produce unreliable linguistic analysis.
5. **Bridging in hot loops**: String ↔ NSString conversion has overhead. Keep one type in tight loops.
## Related Skills
- Use `/skill apple-text-parsing` for Swift Regex vs `NSRegularExpression` choice.
- Use `/skill apple-text-markdown` when parsing feeds Markdown-rendering workflows.
- Use `/skill apple-text-attributed-string` when utility output becomes attributed content.