A certification that never changes is a certification you cannot trust. Here is how SXM keeps pace with the real world.
Most certification schemes test once and stamp forever. A skill certified in January could be vulnerable to an attack discovered in February, and nobody would know until something went wrong.
This creates a dangerous illusion. The compliance checkbox is ticked. The badge is displayed. Everyone assumes the skill is safe. Meanwhile, the threat landscape has moved on.
The security environment for AI skills changes faster than almost any other domain in technology. New prompt injection techniques surface weekly. Novel data exfiltration vectors appear in research papers. Production incidents reveal attack patterns nobody anticipated.
Annual audits cannot keep up. Quarterly reviews cannot keep up. The only approach that works is continuous evaluation against a continuously evolving standard.
That is what SXM does.
Certification is not a single event. It is an ongoing relationship between the skill, the evaluator, and the threat landscape.
The skill author submits a manifest describing inputs, outputs, dependencies, and failure modes. This is the contract the skill will be tested against.
Functional verification (40%), security audit (35%), and performance benchmarking (25%). Every dimension matters. You cannot trade off security for speed.
Every test is documented. Every deduction is explained. The full evaluation report is public. There is nothing hidden in how we arrive at a score.
90+ overall with an 85+ security floor. If a skill meets the bar, it earns certification and a blockchain attestation on Polygon. Immutable, independently verifiable.
Certified skills are continuously re-evaluated as the evaluator evolves. New test patterns are run against every certified skill, not just new submissions.
Pass the updated evaluator or get suspended. There are no exceptions, no grace periods, no grandfather clauses. The bar is the bar.
SXM evaluation has two layers. The first is static analysis of the skill manifest — checking documentation quality, declared permissions, dependencies, and failure modes. This is the first gate.
The second, and more important layer, is live endpoint testing. When a skill provides a test_endpoint in its manifest, the evaluator sends real HTTP requests to the skill and measures actual behaviour.
Skills without a test endpoint are capped at 85/100. Manifest analysis alone cannot verify actual behaviour. To achieve full certification, skills must provide a live endpoint for real testing.
The SXM evaluator is not a fixed test suite. It evolves every week based on what is happening in the real world.
Every week, the evaluator ingests new patterns from three sources:
New papers and advisories from arXiv, NIST, OWASP, and MITRE ATLAS. When researchers discover a new class of vulnerability, we add test patterns for it.
Disclosed CVEs, security advisories, and production incidents affecting AI systems. When something breaks in the wild, we test for it.
Patterns discovered during SXM evaluations that reveal new attack surfaces or failure modes. Our own evaluation process is a source of intelligence.
GET /api/evolution/history.Real example: On 3 February 2026, CVE-2026-1847 revealed that Unicode homoglyph characters could bypass input validation in AI skill interfaces. Within 24 hours, we added homoglyph injection patterns to the evaluator. Every certified skill was re-evaluated. Two skills that failed the new pattern were suspended until their authors patched the vulnerability and resubmitted.
That is the point. A static certification would have left those skills marked as "certified" while they were vulnerable. Our living evaluator caught it within a day.
When a certified skill fails against an updated evaluator, the process is straightforward and fully transparent:
There is no back channel. There is no way to negotiate around a failing test. If the evaluator says a skill is vulnerable, the skill is suspended until it is fixed.
Every time a skill passes re-certification, the certification becomes more valuable.
A skill that has been reconfirmed 12 times across 12 evaluator updates has survived 12 rounds of evolving threat patterns. It has been tested against prompt injection techniques that did not exist when it was first certified. It has weathered new CVEs, new research findings, new attack vectors.
That skill is demonstrably more trustworthy than one certified yesterday.
The reconfirmed_count field in every certification record is a signal of ongoing quality. Enterprise buyers can see exactly how battle-tested a skill is before deploying it.
Think of it like a credit score that improves with consistent good behaviour. Each successful re-certification is evidence that the skill's author maintains quality over time, not just at the moment of submission.
We publish everything. Not because regulations require it, but because trust requires it. If we hid our process, why would you trust our certifications?
Every evaluation report is public. See exactly how a skill was tested and scored.
GET /api/skills/:id/report
Every change to the evaluator is logged. See what patterns were added, when, and why.
GET /api/evolution/history
Every skill's re-certification history is public. Suspensions and restorations are on the record.
Every certification is attested on Polygon via EAS. Independently verifiable by anyone.
If you are evaluating SXM for your organisation, here is the quick reference:
If you need something specific for your compliance review, get in touch. We are happy to walk your security team through the full process.