EK9 Security: Input Sanitization and Sensitive Data

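EK9 provides a security framework with two complementary layers: input sanitization (the sanitized parameter modifier and the InputSanitizer class), which rejects malicious input at runtime, and the Sensitive type with compile-time secret detection, which prevents credential leakage.

Both layers follow a "Rejection at the Source" philosophy and are secure by default, with no configuration required. They remain extensible for organizations that need custom security policies or specialized logging formats.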

The sanitized Parameter Modifier

The simplest way to add input sanitization is the sanitized modifier on incoming String parameters. When a function or method is called with a sanitized parameter, EK9 automatically creates a defensive copy of the String at the call site and sanitizes it before passing it to the function.

#!ek9
defines module introduction

  defines function

    executeQuery()
      -> sql as sanitized String
      <- result as String?
      //The 'sql' parameter is automatically sanitized at the call site
      //before this function receives it - preventing SQL injection
      ...

    processUserInput()
      ->
        name as sanitized String
        comment as sanitized String
      <- success as Boolean?
      //Both parameters are sanitized before use
      ...
//EOF
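
From the caller's side nothing changes: arguments are passed as ordinary Strings, and the compiler inserts the sanitization step. The sketch below illustrates this under stated assumptions. The names greet() and CallSiteExample() are hypothetical, and the second call is expected (not verified here) to be rejected because it reuses the SQL injection pattern listed under Threat Types Detected.

```ek9
#!ek9
defines module introduction

  defines function

    //Hypothetical function used only for illustration
    greet()
      -> name as sanitized String
      <- message as String?
      //'name' arrives unset if a threat was detected at the call site
      if name?
        message: "Hello " + name

  defines program
    CallSiteExample()
      stdout <- Stdout()

      //No 'sanitized' keyword at the call site - the compiler adds the check
      accepted <- greet("Alice")
      if accepted?
        stdout.println(accepted)

      //Assumed to match the SQL_INJECTION pattern, leaving 'name' unset in greet()
      rejected <- greet("' OR '1'='1")
      if rejected?
        stdout.println(rejected)
      else
        stdout.println("Input rejected")
//EOF
```

Because the requirement lives on the parameter definition, every caller of greet() gets the same check without opting in.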

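How it works: at each call site the compiler makes a defensive copy of the argument and runs it through threat detection before the function body executes. If a threat is detected, it is logged and the parameter arrives unset, so the function should check the parameter with ? before using it.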

Important Restrictions

Target Sites vs Call Sites: The "Rejection at the Source" Philosophy

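The sanitized modifier is a target site annotation: it belongs on function/method parameter definitions, where untrusted data first enters the system. It cannot be used at call sites, on captures, or on variable declarations; each of these is a compile-time error, as the example further down shows.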

Why this design? Security reasoning becomes simpler when sanitization happens in one place:

  1. Entry points define where untrusted data enters
  2. Everything inside the trust boundary is already safe
  3. The compiler enforces sanitization automatically at call sites

This eliminates ambiguity about who is responsible for sanitization. The function/method author specifies requirements; the compiler enforces them at all call sites.

#!ek9
//WRONG: Trying to sanitize at call site
processInput(sanitized userInput)  //E07942: Caller can't specify sanitization

//WRONG: Trying to sanitize at capture
handler <- (sanitized var) extends BaseHandler  //E07941: Capture already trusted

//WRONG: Trying to sanitize at variable declaration
name <- sanitized getValue()  //E07943: Wrong location for sanitization

//CORRECT: Sanitize at the definition (entry point)
processInput()
  -> data as sanitized String  //OK: Target site annotation
  ...

Why Direct Assignment is Blocked

When you write local <- sanitizedParam, both variables point to the same sanitized copy in memory. This creates "hidden aliasing" where mutations to one variable affect the other — defeating the purpose of defensive copying.

#!ek9
//BAD: Creates hidden alias
processInput()
  -> input as sanitized String
  local <- input      // ERROR: 'local' is alias to 'input'
  local += " modified" // Also modifies 'input'!

//GOOD: Explicit copy
processInput()
  -> input as sanitized String
  local <- String(input)  // OK: Explicit copy constructor
  local += " modified"     // Only affects 'local'


InputSanitizer Class

The InputSanitizer class is the core threat detection engine in EK9. It provides programmatic access to the same threat detection used by the sanitized modifier, allowing explicit control over when and how sanitization occurs.

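This class is particularly useful when you need to sanitize values that do not arrive as sanitized parameters (such as environment variables or command-line arguments), check input without modifying it, find out which threat types were detected, or send threat logs to a custom destination.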

Constructors

#!ek9
//Default - logs threats to stderr
defaultSanitizer <- InputSanitizer()

//With an explicit stderr reporter
stderrSanitizer <- InputSanitizer(Stderr())

//With a custom reporter - log to a file
logFile <- TextFile("/var/log/security.log").output()
fileSanitizer <- InputSanitizer(logFile)
//EOF

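The InputSanitizer always logs detected threats; logging cannot be disabled from code. By default, threats are logged to stderr. You can optionally provide a custom StringOutput destination (file, stdout, etc.). The only way to suppress output is the SILENT log format, which is intended for testing.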

The log format is controlled by the EK9_SANITIZER_LOG_FORMAT environment variable (see Log Format Configuration).

Methods

#!ek9
defines module introduction

  defines program
    SanitizerExample()
      sanitizer <- InputSanitizer(Stderr())

      //Sanitize input - returns unset if threat detected
      userInput <- "some user input"
      result <- sanitizer.sanitize(userInput)
      if result?
        //Safe to use
        processData(result)
      else
        //Threat detected, handle appropriately
        handleThreatDetection()

      //Check safety without modification
      if sanitizer.isSafe(userInput)
        processData(userInput)

      //Get threat type(s) as comma-separated string
      threat <- sanitizer.detectThreat(userInput)
      if threat?
        logThreat(threat)  //e.g., "SQL_INJECTION,XSS"

      //For environment variables (skip path traversal checks)
      envValue <- EnvVars().get("CONFIG_PATH")
      if envValue?
        cleanEnvValue <- sanitizer.sanitizeWithoutPathChecks(envValue.get())
//EOF

Threat Types Detected

The InputSanitizer detects the following threat categories, aligned with the OWASP Top 10:

Threat Type        Description                     Example Pattern
SQL_INJECTION      SQL injection patterns          ' OR '1'='1, ; DROP TABLE
COMMAND_INJECTION  Shell/OS command injection      ; rm -rf /, | cat /etc/passwd
PATH_TRAVERSAL     Directory traversal attacks     ../../../etc/passwd
XSS                Cross-Site Scripting            <script>alert('xss')</script>
XXE                XML External Entity attacks     <!ENTITY xxe SYSTEM "file://">
SSTI               Server-Side Template Injection  {{7*7}}, ${T(java.lang.Runtime)}

When multiple threats are detected in a single input, the threat types are returned as a comma-separated string (e.g., "SQL_INJECTION,XSS").
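
As a rough illustration, the loop below passes a few of the example patterns listed above to detectThreat() and prints whatever comes back. The sample list and the program name are hypothetical, and the exact strings returned depend on the sanitizer's pattern set, so no output is shown.

```ek9
#!ek9
defines module introduction

  defines program
    ThreatTypeExample()
      stdout <- Stdout()
      sanitizer <- InputSanitizer()

      //Sample inputs taken from the threat list above, plus one benign value
      samples <- ["' OR '1'='1", "../../../etc/passwd", "<script>alert('xss')</script>", "plain text"]

      for sample in samples
        threat <- sanitizer.detectThreat(sample)
        if threat?
          //One or more threat types, comma-separated
          stdout.println(`Detected: ${threat}`)
        else
          stdout.println("No threat detected")
//EOF
```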


Log Format Configuration

The InputSanitizer log format is controlled by the EK9_SANITIZER_LOG_FORMAT environment variable. This allows you to integrate with your existing SIEM infrastructure without any code changes.

# Set log format (default: JSON)
export EK9_SANITIZER_LOG_FORMAT=JSON    # Simple JSON (universal, default)
export EK9_SANITIZER_LOG_FORMAT=ECS     # Elastic Common Schema
export EK9_SANITIZER_LOG_FORMAT=CEF     # Common Event Format
export EK9_SANITIZER_LOG_FORMAT=SIMPLE  # Simple bracket format [THREAT_TYPE]
export EK9_SANITIZER_LOG_FORMAT=SILENT  # Suppress all output (testing only)

JSON Format (Default)

Simple JSON format that works with any log shipper or monitoring tool:

{
  "timestamp": "2026-01-21T10:30:45.123Z",
  "level": "warn",
  "threat": "SQL_INJECTION",
  "message": "Dangerous input detected: ' OR '1'='1"
}

ECS Format (Elastic Common Schema)

ECS-aligned JSON for Elasticsearch, Splunk, Datadog, CloudWatch, and Google Cloud Logging:

{
  "@timestamp": "2026-01-21T10:30:45.123Z",
  "log.level": "warn",
  "event.category": "intrusion_detection",
  "event.type": "denied",
  "event.action": "input_rejected",
  "threat.indicator.type": "SQL_INJECTION",
  "message": "Dangerous input detected",
  "source.ip": "192.168.1.100",
  "service.name": "UserService",
  "ek9.field.name": "userId",
  "ek9.input.value": "' OR '1'='1",
  "ecs.version": "8.11"
}

CEF Format (Common Event Format)

CEF for ArcSight, Azure Sentinel, QRadar, and LogRhythm:

CEF:0|EK9|InputSanitizer|1.0|SQL_INJECTION|Dangerous input detected|8|src=192.168.1.100 svc=UserService cs1=userId cs1Label=FieldName msg=' OR '1'='1

CEF Severity Mapping:

SIMPLE Format

Minimal bracket format for simple scripts, debugging, or when JSON parsing is overkill:

[SQL_INJECTION] Dangerous input detected: ' OR '1'='1

No timestamp, no context fields — just the threat type and input value.

SILENT Format

Suppresses all sanitizer output. This is useful for testing where the sanitizer behavior is not the focus of the test. For example, when testing bytecode generation for code that uses the sanitized keyword, you may not want sanitizer logs polluting test output.

Warning: Do not use SILENT in production — you will lose visibility into attack attempts. This format is intended only for testing and development scenarios.


SanitizationContext

A record that captures metadata for security event logging. When you call InputSanitizer methods with a SanitizationContext, the context fields are included in the log output (for ECS and CEF formats).

#!ek9
defines module org.ek9.lang

  defines record
    SanitizationContext
      timestamp as DateTime     // When sanitization occurred
      service as String         // "UserService"
      operation as String       // "getUser"
      fieldName as String       // "userId"
      fieldSource as String     // "PATH", "QUERY", "HEADER", "CONTENT"
      sourceIp as String        // Client IP address
      traceId as String         // Request correlation ID
//EOF

Using SanitizationContext with InputSanitizer:

#!ek9
defines module introduction

  defines program
    ContextExample()
      sanitizer <- InputSanitizer()

      //Create context with rich metadata
      context <- SanitizationContext(
        DateTime(),
        "UserService",
        "getUser",
        "userId",
        "PATH",
        "192.168.1.100",
        "trace-abc-123"
      )

      userInput <- "some user input"

      //Sanitize with context - logs include all context fields
      result <- sanitizer.sanitize(userInput, context)
      if result?
        processData(result)

      //Check safety with context
      if sanitizer.isSafe(userInput, context)
        processData(userInput)
//EOF

When context is provided, the log output includes all set fields. Fields that are not set are simply omitted from the log output. This provides rich metadata for security teams to investigate incidents, correlate attacks, and identify patterns.
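
If only some metadata is available, the remaining fields can be left unset. The sketch below assumes that String() yields an unset String, as EK9's default constructors normally do, and that the seven-argument constructor shown above accepts unset values. PartialContextExample is a hypothetical name.

```ek9
#!ek9
defines module introduction

  defines program
    PartialContextExample()
      sanitizer <- InputSanitizer()

      //Only the service, field name and field source are known here.
      //operation, sourceIp and traceId are passed as unset Strings,
      //which are assumed to be omitted from the log output.
      context <- SanitizationContext(
        DateTime(),
        "UserService",
        String(),
        "userId",
        "QUERY",
        String(),
        String()
      )

      result <- sanitizer.sanitize("some user input", context)
      if result?
        Stdout().println("Input accepted")
//EOF
```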


Integration Examples

TextFile Automatic Sanitization

When reading files specified by user input, use sanitized paths to prevent path traversal:

#!ek9
defines module introduction

  defines function
    readUserFile()
      -> filename as sanitized String
      <- content as String?

      if filename?
        file <- TextFile(filename)
        if file.exists() and file.isReadable()
          content: file.readAll()
//EOF

Web Service Input Handling

In web services, use the sanitized modifier on path parameters, query parameters, and request body fields:

#!ek9
defines module introduction

  defines service

    UserService :/users

      getUser() :/{userId} as GET
        -> userId as sanitized String
        <- response as HTTPResponse?
        //userId is automatically sanitized before reaching this method
        ...

      searchUsers() :/search as GET
        -> query as sanitized String
        <- response as HTTPResponse?
        //query parameter is sanitized
        ...
//EOF

Command-Line Argument Processing

When processing command-line arguments that will be used in file operations or external commands:

#!ek9
defines module introduction

  defines program
    ProcessFiles()
      -> argv as List of String

      sanitizer <- InputSanitizer(Stderr())

      for arg in argv
        cleanArg <- sanitizer.sanitize(arg)
        if cleanArg?
          processFile(cleanArg)
        else
          Stderr().println("Rejected potentially dangerous argument")
//EOF


Best Practices

Use the sanitized Modifier by Default

For any function or method that accepts user-provided String input, add the sanitized modifier. This is the simplest and most effective defense:

#!ek9
//GOOD: Default to sanitized for user input
processUserData()
  -> input as sanitized String
  ...

//Only omit sanitized for trusted internal data
processInternalData()
  -> trustedInput as String
  ...

Handle Unset Results Gracefully

When sanitization detects a threat, the parameter becomes unset. Design your business logic to handle this case with appropriate error messages:

#!ek9
createUser()
  -> username as sanitized String
  <- result as Boolean?

  if username?
    //Safe to process
    result: doCreateUser(username)
  else
    //Threat detected - apply normal business validation message
    //This provides defense in depth without information leakage
    result: false
    logValidationError("Invalid username format")

Use InputSanitizer for Environment Variables

Environment variables may legitimately contain paths. Use sanitizeWithoutPathChecks() for these cases:

#!ek9
sanitizer <- InputSanitizer(Stderr())
envVars <- EnvVars()

//Path traversal patterns are legitimate in env vars
configPath <- envVars.get("CONFIG_PATH")
if configPath?
  cleanPath <- sanitizer.sanitizeWithoutPathChecks(configPath.get())
  ...

Enterprise Logging Integration

For production systems, configure the log format via the EK9_SANITIZER_LOG_FORMAT environment variable to integrate with your SIEM:

#Production deployment - route stderr to SIEM
export EK9_SANITIZER_LOG_FORMAT=ECS   #For Elasticsearch/Splunk/Datadog
#or
export EK9_SANITIZER_LOG_FORMAT=CEF   #For ArcSight/Azure Sentinel/QRadar

In production, route stderr to your log aggregator (Fluentd, Filebeat, etc.) and the security events will automatically flow to your SIEM.


Sensitive Type and Secret Detection

Input sanitization protects against malicious input at runtime. EK9 also provides a complementary layer that protects against leaked credentials — both at compile time and at runtime. Together, these two systems form a comprehensive security framework.

Compile-Time Secret Detection

The EK9 compiler automatically scans all string literals (including text segments within interpolated strings) for patterns matching known credential formats. If a hardcoded secret is detected, compilation fails immediately — the secret never reaches version control, build artifacts, or production.

Detected credential categories include:

Error Code  Category        Example Patterns
E11080      Cloud Provider  AWS keys (AKIA...), GCP API keys (AIza...), Azure connection strings
E11081      Platform Token  GitHub (ghp_), GitLab (glpat-), Slack (xoxb-), npm, Shopify, Heroku
E11082      Private Key     PEM headers: RSA, EC, DSA, PKCS8, OPENSSH private keys
E11083      Database URL    postgres://user:pass@host, MySQL, MongoDB, Redis, JDBC
E11084      JWT Token       eyJhbGci... three-part header.payload.signature structure
E11086      API Key         Stripe (sk_test_), Anthropic (sk-ant-), SendGrid (SG.), 50+ services

This detection covers 100+ distinct patterns across cloud providers, platform tokens, private keys, database URLs, JWT tokens, and API keys. The coverage is comparable to dedicated secret scanners such as GitGuardian and TruffleHog, but it is enforced at compile time rather than after the fact.
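
For illustration, the fragment below contrasts a hardcoded credential with loading the same value at runtime. The literal is the placeholder access key ID used in AWS documentation; it is assumed here to match the AKIA... pattern and trigger E11080. The variable names and the AWS_ACCESS_KEY_ID environment variable are hypothetical choices, and sensitiveGet() is the EnvVars method described under The Sensitive Type.

```ek9
#!ek9
//WRONG: credential embedded in a string literal
hardcodedKey <- "AKIAIOSFODNN7EXAMPLE"  //Expected to fail compilation (E11080, Cloud Provider)

//CORRECT: load the secret from the environment at runtime
runtimeKey <- EnvVars().sensitiveGet("AWS_ACCESS_KEY_ID")
```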

The Sensitive Type

Once secrets are removed from source code, they need to be loaded securely at runtime. The Sensitive built-in type wraps secret values with automatic protection:

#!ek9
defines module introduction

  defines class

    HttpClient with trait of Privileged
      apiKey as Sensitive?

      HttpClient()
        -> key as Sensitive
        apiKey: key

      sendRequest()
        <- response as String?
        //reveal() only works because HttpClient has the Privileged trait
        if apiKey?
          header <- apiKey.reveal()
          response: doHttpCall(header)

  defines function

    demo()
      env <- EnvVars()
      //sensitiveGet() is the ONLY way to create a set Sensitive value
      key <- env.sensitiveGet("API_KEY")
      if key?
        client <- HttpClient(key)
        result <- client.sendRequest()

        stdout <- Stdout()
        //Safe: printing key shows "***REDACTED***", not the actual secret
        stdout.println(`Key: ${key}`)
        if result?
          stdout.println(result)
//EOF

The Privileged trait creates an auditable access boundary: searching a codebase for "with trait of Privileged" gives a complete list of every class that can access raw secret values.

Two Layers Working Together

The sanitization and sensitive data systems complement each other:

Aspect            Input Sanitization                          Sensitive Type
Protects against  Malicious input (SQL injection, XSS, etc.)  Credential leakage (API keys, passwords, tokens)
When              Runtime (at function entry points)          Compile time (literals) + Runtime (redaction)
Mechanism         sanitized modifier, InputSanitizer          Sensitive type, Privileged trait
On failure        Parameter becomes unset + threat logged     Compilation fails (literals) or redacted output (runtime)

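Compiler errors relating to the sanitized modifier include E07941 (sanitized applied to a capture), E07942 (sanitized applied at a call site), and E07943 (sanitized applied to a variable declaration).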

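Compiler errors relating to secret detection and the Sensitive type include E11080 (Cloud Provider), E11081 (Platform Token), E11082 (Private Key), E11083 (Database URL), E11084 (JWT Token), and E11086 (API Key), as listed under Compile-Time Secret Detection.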

