Commit 8e535dd

Updating documentation with code examples.
1 parent f828918 commit 8e535dd

4 files changed

Lines changed: 325 additions & 4 deletions

README.md

Lines changed: 4 additions & 4 deletions
@@ -2,9 +2,9 @@
 
 A Python port of [Phileas (Java)](https://github.com/philterd/phileas) — a library to deidentify and redact PII, PHI, and other sensitive information from text.
 
-Check out the [documentation](https://philterd.github.io/phileas-python/) or details and code examples.
-
-Built by [Philterd](https://www.philterd.ai).
+* Check out the [documentation](https://philterd.github.io/phileas-python/) for details and code examples.
+* Built by [Philterd](https://www.philterd.ai).
+* Commercial support and consulting is available - [contact us](https://www.philterd.ai).
 
 ## Overview
 
@@ -504,7 +504,7 @@ pytest tests/ -v
 
 ## License
 
-Copyright 2027 Philterd, LLC.
+Copyright 2026 Philterd, LLC.
 
 Licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE) for details.
 
docs/api-reference.md

Lines changed: 148 additions & 0 deletions
@@ -8,21 +8,56 @@ The main entry point for filtering text. `FilterService` is stateless; a single
 
 ```python
 from phileas.services.filter_service import FilterService
+from phileas.policy.policy import Policy
 
+# Create with default in-memory context service
 service = FilterService()
+
+# Or provide a custom context service
+from phileas.services.context_service import InMemoryContextService
+ctx_svc = InMemoryContextService()
+service = FilterService(context_service=ctx_svc)
 ```
 
+### Constructor
+
+```python
+FilterService(context_service=None)
+```
+
+**Parameters**
+
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `context_service` | `AbstractContextService` or `None` | `None` | Context service implementation for managing referential integrity. If `None`, an `InMemoryContextService` is created automatically. |
+
 ### `filter(policy, context, document_id, text)`
 
 Apply the policy to the given text and return a `FilterResult`.
 
 ```python
+from phileas.services.filter_service import FilterService
+from phileas.policy.policy import Policy
+
+policy = Policy.from_dict({
+    "name": "example",
+    "identifiers": {
+        "emailAddress": {
+            "emailAddressFilterStrategies": [{"strategy": "REDACT"}]
+        }
+    }
+})
+
+service = FilterService()
 result = service.filter(
     policy=policy,
     context="my-app",
     document_id="doc-001",
     text="Contact john@example.com.",
 )
+
+print(result.filtered_text)
+# Contact {{{REDACTED-email-address}}}.
 ```
 
 **Parameters**
@@ -215,6 +250,119 @@ Serialise to a dict.
 
 ---
 
+## AbstractContextService
+
+`phileas.services.context_service.AbstractContextService`
+
+Abstract base class for context service implementations. Subclass this to provide a custom backend (e.g., Redis or a database).
+
+```python
+from phileas.services.context_service import AbstractContextService
+from phileas.services.filter_service import FilterService
+from phileas.policy.policy import Policy
+
+class RedisContextService(AbstractContextService):
+    """Example custom context service using Redis."""
+
+    def __init__(self, redis_client):
+        self.redis = redis_client
+
+    def put(self, context: str, token: str, replacement: str) -> None:
+        """Store token -> replacement mapping in Redis."""
+        key = f"phileas:{context}:{token}"
+        self.redis.set(key, replacement)
+
+    def get(self, context: str, token: str) -> str | None:
+        """Retrieve replacement from Redis, or None if not found."""
+        key = f"phileas:{context}:{token}"
+        value = self.redis.get(key)
+        return value.decode('utf-8') if value is not None else None
+
+    def contains(self, context: str, token: str) -> bool:
+        """Check if token exists in Redis."""
+        key = f"phileas:{context}:{token}"
+        return self.redis.exists(key) > 0
+
+# Usage example (requires redis package)
+# import redis
+# redis_client = redis.Redis(host='localhost', port=6379, db=0)
+# ctx_svc = RedisContextService(redis_client)
+# service = FilterService(context_service=ctx_svc)
+```
+
+### Methods
+
+| Method | Signature | Description |
+|---|---|---|
+| `put` | `(context, token, replacement) -> None` | Store a replacement value for a token under the given context |
+| `get` | `(context, token) -> str \| None` | Return the stored replacement, or `None` if not found |
+| `contains` | `(context, token) -> bool` | Return `True` if a replacement exists for the token in the given context |
+
+---
+
+## InMemoryContextService
+
+`phileas.services.context_service.InMemoryContextService`
+
+Default implementation of `AbstractContextService` backed by a `dict[str, dict[str, str]]`. Suitable for single-process, in-memory use.
+
+```python
+from phileas.services.context_service import InMemoryContextService
+from phileas.services.filter_service import FilterService
+from phileas.policy.policy import Policy
+
+# Create and pre-populate the context service
+ctx_svc = InMemoryContextService()
+ctx_svc.put("patient-123", "john@example.com", "EMAIL-001")
+ctx_svc.put("patient-123", "555-867-5309", "PHONE-001")
+
+# Use it with FilterService
+policy = Policy.from_dict({
+    "name": "medical",
+    "identifiers": {
+        "emailAddress": {
+            "emailAddressFilterStrategies": [{"strategy": "REDACT"}]
+        },
+        "phoneNumber": {
+            "phoneNumberFilterStrategies": [{"strategy": "REDACT"}]
+        }
+    }
+})
+
+service = FilterService(context_service=ctx_svc)
+
+# The pre-populated replacements will be used
+result1 = service.filter(
+    policy, "patient-123", "doc-1",
+    "Contact john@example.com or 555-867-5309."
+)
+print(result1.filtered_text)
+# Contact EMAIL-001 or PHONE-001.
+
+# The same replacements persist across documents in the same context
+result2 = service.filter(
+    policy, "patient-123", "doc-2",
+    "Patient called 555-867-5309 from john@example.com."
+)
+print(result2.filtered_text)
+# Patient called PHONE-001 from EMAIL-001.
+
+# Check what's stored
+print(ctx_svc.get("patient-123", "john@example.com"))  # EMAIL-001
+print(ctx_svc.contains("patient-123", "555-867-5309"))  # True
+print(ctx_svc.get("patient-123", "unknown@example.com"))  # None
+```
+
+### Methods
+
+| Method | Signature | Description |
+|---|---|---|
+| `put` | `(context, token, replacement) -> None` | Store a replacement for a token |
+| `get` | `(context, token) -> str \| None` | Retrieve a replacement, or `None` if not found |
+| `contains` | `(context, token) -> bool` | Check if a replacement exists |
+
+---
+
 ## Policy key reference
 
 The table below maps every JSON/YAML policy key to the `Identifiers` attribute it populates and the strategies key used in its filter config.
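
Review note: the `put` / `get` / `contains` contract documented in this file can be exercised without a running Phileas install. The sketch below is only an illustration under stated assumptions: `ContextStore` is a hypothetical stand-in that mirrors the three documented methods (it is not the library's `AbstractContextService`), and `LockedDictContextStore` and `replacement_for` are made-up names showing one way a backend and a get-or-create lookup might fit together.

```python
from abc import ABC, abstractmethod
from threading import Lock
from typing import Callable, Dict, Optional


class ContextStore(ABC):
    """Hypothetical stand-in mirroring the documented put/get/contains methods."""

    @abstractmethod
    def put(self, context: str, token: str, replacement: str) -> None: ...

    @abstractmethod
    def get(self, context: str, token: str) -> Optional[str]: ...

    @abstractmethod
    def contains(self, context: str, token: str) -> bool: ...


class LockedDictContextStore(ContextStore):
    """Dict-of-dicts store guarded by a lock so several threads can share it."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}
        self._lock = Lock()

    def put(self, context: str, token: str, replacement: str) -> None:
        with self._lock:
            self._data.setdefault(context, {})[token] = replacement

    def get(self, context: str, token: str) -> Optional[str]:
        with self._lock:
            return self._data.get(context, {}).get(token)

    def contains(self, context: str, token: str) -> bool:
        with self._lock:
            return token in self._data.get(context, {})


def replacement_for(
    store: ContextStore,
    context: str,
    token: str,
    make_replacement: Callable[[str], str],
) -> str:
    """Reuse the stored replacement for a token, or create and store a new one."""
    existing = store.get(context, token)
    if existing is not None:
        return existing
    replacement = make_replacement(token)
    store.put(context, token, replacement)
    return replacement


store = LockedDictContextStore()
first = replacement_for(store, "patient-123", "john@example.com", lambda t: "EMAIL-001")
second = replacement_for(store, "patient-123", "john@example.com", lambda t: "EMAIL-999")
print(first, second)  # EMAIL-001 EMAIL-001
```

The second call returns the value stored by the first, which is the behaviour the docs describe as referential integrity within a context. Whether `FilterService` performs the lookup in exactly this order is not confirmed by this diff.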

docs/examples.md

Lines changed: 151 additions & 0 deletions
@@ -341,6 +341,9 @@ ignoredPatterns:
 Requires a running [ph-eye](https://github.com/philterd/ph-eye) service.
 
 ```python
+from phileas.policy.policy import Policy
+from phileas.services.filter_service import FilterService
+
 policy = Policy.from_dict({
     "name": "ner-demo",
     "identifiers": {
@@ -355,10 +358,158 @@ policy = Policy.from_dict({
     }
 })
 
+service = FilterService()
 result = service.filter(
     policy, "app", "doc-12",
     "Dr. Alice Johnson reviewed the case."
 )
 print(result.filtered_text)
 # Dr. {{{REDACTED-person}}} reviewed the case.
 ```
+
+---
+
+## Redact custom patterns with regex
+
+Use custom pattern filters to detect domain-specific PII not covered by built-in filters.
+
+```python
+from phileas.policy.policy import Policy
+from phileas.services.filter_service import FilterService
+
+policy = Policy.from_dict({
+    "name": "custom-patterns",
+    "identifiers": {
+        "patterns": [
+            {
+                "pattern": r"EMP-\d{6}",
+                "label": "employee-id",
+                "patternFilterStrategies": [{"strategy": "REDACT"}]
+            },
+            {
+                "pattern": r"[A-Z]{2}\d{6}",
+                "label": "passport-number",
+                "patternFilterStrategies": [{"strategy": "MASK"}]
+            }
+        ]
+    }
+})
+
+service = FilterService()
+result = service.filter(
+    policy, "app", "doc-13",
+    "Employee EMP-123456 has passport AB123456."
+)
+print(result.filtered_text)
+# Employee {{{REDACTED-employee-id}}} has passport ********.
+```
+
+---
+
+## Redact terms from a dictionary
+
+Use the dictionaries filter to redact known names or sensitive terms.
+
+```python
+from phileas.policy.policy import Policy
+from phileas.services.filter_service import FilterService
+
+policy = Policy.from_dict({
+    "name": "dictionary-demo",
+    "identifiers": {
+        "dictionaries": [
+            {
+                "terms": ["John Smith", "Jane Doe", "confidential", "proprietary"],
+                "dictionaryFilterStrategies": [{"strategy": "REDACT"}]
+            }
+        ]
+    }
+})
+
+service = FilterService()
+result = service.filter(
+    policy, "app", "doc-14",
+    "John Smith shared confidential data with Jane Doe about the proprietary algorithm."
+)
+print(result.filtered_text)
+# {{{REDACTED-dictionary}}} shared {{{REDACTED-dictionary}}} data with {{{REDACTED-dictionary}}} about the {{{REDACTED-dictionary}}} algorithm.
+```
+
+---
+
+## Use conditions to filter selectively
+
+Apply different strategies based on the matched value using condition expressions.
+
+```python
+from phileas.policy.policy import Policy
+from phileas.services.filter_service import FilterService
+
+policy = Policy.from_dict({
+    "name": "conditional-filtering",
+    "identifiers": {
+        "phoneNumber": {
+            "phoneNumberFilterStrategies": [
+                # Redact phone numbers starting with 555 (test numbers)
+                {"strategy": "REDACT", "condition": 'token startswith "555"'},
+                # Mask all other phone numbers
+                {"strategy": "MASK"}
+            ]
+        }
+    }
+})
+
+service = FilterService()
+result = service.filter(
+    policy, "app", "doc-15",
+    "Test: 555-123-4567, Real: 800-867-5309"
+)
+print(result.filtered_text)
+# Test: {{{REDACTED-phone-number}}}, Real: ***-***-****
+```
+
+---
+
+## Maintain referential integrity across documents
+
+Use contexts to ensure the same PII value gets the same replacement across multiple documents.
+
+```python
+from phileas.policy.policy import Policy
+from phileas.services.filter_service import FilterService
+
+policy = Policy.from_dict({
+    "name": "context-demo",
+    "identifiers": {
+        "emailAddress": {
+            "emailAddressFilterStrategies": [{"strategy": "HASH_SHA256_REPLACE"}]
+        }
+    }
+})
+
+service = FilterService()
+
+# Filter multiple documents in the same context
+doc1 = service.filter(
+    policy, "patient-123", "note-1",
+    "Patient emailed from john@example.com"
+)
+doc2 = service.filter(
+    policy, "patient-123", "note-2",
+    "Follow-up: john@example.com responded"
+)
+
+# The hash will be identical in both documents
+print(doc1.filtered_text)
+# Patient emailed from 5bb8a5cbf6...
+
+print(doc2.filtered_text)
+# Follow-up: 5bb8a5cbf6... responded
+
+# Different context = different hash
+doc3 = service.filter(
+    policy, "patient-456", "note-1",
+    "Patient emailed from john@example.com"
+)
+# Hash will be different in patient-456 context
+```
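
Review note: the conditional-filtering example in this file implies that strategies are tried in order and that an entry without a `condition` acts as a fallback. The sketch below illustrates that reading only. `condition_matches` and `select_strategy` are hypothetical helpers, the mini-parser handles just the `token startswith "..."` form shown in the example, and neither the real condition grammar nor the library's resolution order is confirmed by this diff.

```python
import re
from typing import Dict, List, Optional

# Hypothetical parser: recognises only conditions of the form token startswith "<prefix>".
_STARTSWITH = re.compile(r'^token startswith "([^"]*)"$')


def condition_matches(condition: Optional[str], token: str) -> bool:
    """Return True when the strategy has no condition or its prefix check passes."""
    if condition is None:
        return True
    match = _STARTSWITH.match(condition.strip())
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    return token.startswith(match.group(1))


def select_strategy(strategies: List[Dict[str, str]], token: str) -> Optional[str]:
    """Pick the first strategy whose condition matches (assumed first-match-wins)."""
    for entry in strategies:
        if condition_matches(entry.get("condition"), token):
            return entry["strategy"]
    return None


strategies = [
    {"strategy": "REDACT", "condition": 'token startswith "555"'},
    {"strategy": "MASK"},
]

print(select_strategy(strategies, "555-123-4567"))  # REDACT
print(select_strategy(strategies, "800-867-5309"))  # MASK
```

Under this assumption the order of the list matters: placing the unconditional `MASK` entry first would shadow the `REDACT` rule.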
