How Certification Stays Current - Scientia Ex Machina

1Why Static Certification Is Broken

Most certification schemes test once and stamp forever. A skill certified in January could be vulnerable to an attack discovered in February, and nobody would know until something went wrong.

This creates a dangerous illusion. The compliance checkbox is ticked. The badge is displayed. Everyone assumes the skill is safe. Meanwhile, the threat landscape has moved on.

The security environment for AI skills changes faster than almost any other domain in technology. New prompt injection techniques surface weekly. Novel data exfiltration vectors appear in research papers. Production incidents reveal attack patterns nobody anticipated.

Annual audits cannot keep up. Quarterly reviews cannot keep up. The only approach that works is continuous evaluation against a continuously evolving standard.

That is what SXM does.

2The SXM Certification Lifecycle

Certification is not a single event. It is an ongoing relationship between the skill, the evaluator, and the threat landscape.

01

Submit

The skill author submits a manifest describing inputs, outputs, dependencies, and failure modes. This is the contract the skill will be tested against.

02

Three-Pillar Evaluation

Functional verification (40%), security audit (35%), and performance benchmarking (25%). Every dimension matters. You cannot trade off security for speed.

03

Scoring

Every test is documented. Every deduction is explained. The full evaluation report is public. There is nothing hidden in how we arrive at a score.

04

Certification

90+ overall with an 85+ security floor. If a skill meets the bar, it earns certification and a blockchain attestation on Polygon. Immutable, independently verifiable.

05

Ongoing Monitoring

Certified skills are continuously re-evaluated as the evaluator evolves. New test patterns are run against every certified skill, not just new submissions.

06

Re-certification

Pass the updated evaluator or get suspended. There are no exceptions, no grace periods, no grandfather clauses. The bar is the bar.

3Static Analysis + Live Endpoint Testing

SXM evaluation has two layers. The first is static analysis of the skill manifest — checking documentation quality, declared permissions, dependencies, and failure modes. This is the first gate.

The second, and more important layer, is live endpoint testing. When a skill provides a test_endpoint in its manifest, the evaluator sends real HTTP requests to the skill and measures actual behaviour.

What Live Testing Covers

Functional tests: Author-provided test cases plus auto-generated tests for edge cases, empty inputs, type mismatches, and missing fields.
Security tests: 20+ attack payloads across five categories: prompt injection, indirect injection, data exfiltration, system prompt extraction, and permission probing.
Performance tests: Sequential and concurrent request benchmarking with p50/p95/p99 latency measurement and memory leak detection.

Skills without a test endpoint are capped at 85/100. Manifest analysis alone cannot verify actual behaviour. To achieve full certification, skills must provide a live endpoint for real testing.

How Scoring Works

With live endpoint: Final score = 70% live testing + 30% static analysis. Live testing is more valuable because it measures real behaviour.
Without live endpoint: Score is based on static analysis only, capped at 85/100.
The attack library evolves weekly from published research, real-world incidents, and internal findings.

4The Living Evaluator

The SXM evaluator is not a fixed test suite. It evolves every week based on what is happening in the real world.

Weekly Pattern Ingestion

Every week, the evaluator ingests new patterns from three sources:

Published Research

New papers and advisories from arXiv, NIST, OWASP, and MITRE ATLAS. When researchers discover a new class of vulnerability, we add test patterns for it.

Real-World Incidents

Disclosed CVEs, security advisories, and production incidents affecting AI systems. When something breaks in the wild, we test for it.

Internal Findings

Patterns discovered during SXM evaluations that reveal new attack surfaces or failure modes. Our own evaluation process is a source of intelligence.

How the Evaluator Changes

New test patterns are added to the suite when a new vulnerability class is identified.
Existing patterns get weight adjustments based on real-world severity. A theoretical attack that becomes a practical exploit gets weighted more heavily.
Every change is logged and available via the public API at GET /api/evolution/history.

Real example: On 3 February 2026, CVE-2026-1847 revealed that Unicode homoglyph characters could bypass input validation in AI skill interfaces. Within 24 hours, we added homoglyph injection patterns to the evaluator. Every certified skill was re-evaluated. Two skills that failed the new pattern were suspended until their authors patched the vulnerability and resubmitted.

That is the point. A static certification would have left those skills marked as "certified" while they were vulnerable. Our living evaluator caught it within a day.

5What Happens When a Skill Fails Re-certification

When a certified skill fails against an updated evaluator, the process is straightforward and fully transparent:

Status changes to "suspended". The skill is no longer listed as certified.
Badge updates immediately. Anywhere the badge is embedded, it shows the suspended status.
Blockchain record stays. The original attestation is immutable history. It is not revoked. But the verification endpoint returns "suspended", so anyone checking programmatically gets the current status.
Author is notified with the specific test that caused the failure, including the full test details and expected behaviour.
Author can patch and resubmit. The skill goes through the full evaluation again.
Upon passing, certification is restored with an incremented reconfirmation count. The suspension and restoration are both part of the public record.

There is no back channel. There is no way to negotiate around a failing test. If the evaluator says a skill is vulnerable, the skill is suspended until it is fixed.

6The Compounding Trust Effect

Every time a skill passes re-certification, the certification becomes more valuable.

A skill that has been reconfirmed 12 times across 12 evaluator updates has survived 12 rounds of evolving threat patterns. It has been tested against prompt injection techniques that did not exist when it was first certified. It has weathered new CVEs, new research findings, new attack vectors.

That skill is demonstrably more trustworthy than one certified yesterday.

The reconfirmed_count field in every certification record is a signal of ongoing quality. Enterprise buyers can see exactly how battle-tested a skill is before deploying it.

Think of it like a credit score that improves with consistent good behaviour. Each successful re-certification is evidence that the skill's author maintains quality over time, not just at the moment of submission.

7Full Transparency

We publish everything. Not because regulations require it, but because trust requires it. If we hid our process, why would you trust our certifications?

Evaluation Reports

Every evaluation report is public. See exactly how a skill was tested and scored.

GET /api/skills/:id/report

Evolution History

Every change to the evaluator is logged. See what patterns were added, when, and why.

GET /api/evolution/history

Re-certification Status

Every skill's re-certification history is public. Suspensions and restorations are on the record.

/recertification

Blockchain Attestations

Every certification is attested on Polygon via EAS. Independently verifiable by anyone.

Polygon EAS Explorer

8For Enterprise Security Teams

If you are evaluating SXM for your organisation, here is the quick reference:

Methodology: Aligns with NIST AI RMF, OWASP LLM Top 10, and MITRE ATLAS frameworks.
Coverage: Three-pillar approach covering functional correctness, security posture, and runtime performance.
Threshold: 90/100 overall, 85/100 security floor, zero-exploit policy. No exceptions.
Immutable audit trail: Blockchain-attested credentials on Polygon via Ethereum Attestation Service.
Programmatic verification: Public API for automated compliance checking. No authentication required for read operations.
Ongoing compliance: Re-certification ensures skills meet the current threat landscape, not a point-in-time snapshot.
Full audit access: Every evaluation report, every evaluator change, every re-certification event is publicly available.

If you need something specific for your compliance review, get in touch. We are happy to walk your security team through the full process.