This guide provides comprehensive instructions for testing the NerdCabalMCP server and all 17 specialized agents.
Before testing, ensure you have:
node --version # Should be v18.x.x or higher
cd mcp-server
npm install
npm run build
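The Node.js version requirement above can also be checked from a script. The following is a minimal Python sketch, not part of the NerdCabalMCP repository; the function names are hypothetical, and it assumes `node --version` prints a string of the form `vMAJOR.MINOR.PATCH`.

```python
import re
import shutil
import subprocess

MIN_NODE_MAJOR = 18  # taken from the prerequisite above


def node_major_version(version_text: str) -> int:
    """Extract the major version from text such as 'v18.17.0'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    if match is None:
        raise ValueError(f"unrecognized Node version string: {version_text!r}")
    return int(match.group(1))


def node_is_supported(version_text: str) -> bool:
    """Return True when the version meets the minimum major version."""
    return node_major_version(version_text) >= MIN_NODE_MAJOR


def installed_node_version() -> str:
    """Return the output of `node --version`; raise if node is not on PATH."""
    node = shutil.which("node")
    if node is None:
        raise FileNotFoundError("node was not found on PATH")
    result = subprocess.run(
        [node, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Calling `node_is_supported(installed_node_version())` before `npm install` gives an early, explicit failure instead of a confusing build error.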
Test that the server runs without errors:
cd mcp-server
node dist/index.js
Expected output:
MCP server running on stdio
If you see this, the server is working! Press Ctrl+C to stop.
Test that the server responds to basic requests:
cd mcp-server
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0.0"}}}' | node dist/index.js
Expected: JSON response with server capabilities
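The same check can be scripted. The sketch below builds the `initialize` request shown above and tests whether a line of server output looks like a successful reply. The function names are hypothetical, and the response check assumes the reply carries a `result` object with a `capabilities` key, which matches the "server capabilities" described here but is not confirmed against this server's actual output.

```python
import json


def build_initialize_request(request_id: int = 1) -> str:
    """Return the initialize request from this guide as one JSON line."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "test", "version": "1.0.0"},
        },
    }
    return json.dumps(request, separators=(",", ":"))


def looks_like_initialize_response(line: str, request_id: int = 1) -> bool:
    """Heuristic check: valid JSON-RPC reply with a capabilities object."""
    try:
        response = json.loads(line)
    except json.JSONDecodeError:
        return False  # e.g. a plain log line rather than a JSON reply
    if not isinstance(response, dict):
        return False
    result = response.get("result")
    return (
        response.get("jsonrpc") == "2.0"
        and response.get("id") == request_id
        and isinstance(result, dict)
        and "capabilities" in result
    )
```

To use it against the real server, the request line could be passed as `input` to `subprocess.run(["node", "dist/index.js"], cwd="mcp-server", capture_output=True, text=True, timeout=10)` and each stdout line fed to `looks_like_initialize_response`; the timeout is a precaution in case the server keeps running after stdin closes.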
- Type @ to see available tools
- Look for the nerdcabal tools
If you see tools like llm-rubric-architect, experimental-designer, etc., the MCP server is working!
Purpose: Creates evaluation rubrics for AI systems
Test Prompt in Claude Desktop:
@nerdcabal Use llm-rubric-architect to create a rubric for evaluating chatbot responses.
Include these dimensions: accuracy, helpfulness, safety, and tone. Use a 1-5 scale.
Expected Output:
Purpose: Designs controlled experiments
Test Prompt:
@nerdcabal Use experimental-designer to design an A/B test comparing two prompting strategies.
Hypothesis: Chain-of-thought prompting improves accuracy on math problems.
Baseline: Direct answer prompts
Intervention: Chain-of-thought prompts
Metric: Accuracy
Sample size: 1000
Expected Output:
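The tool's actual output is not reproduced in this guide. When reviewing the returned plan, it can help to know what analysis an accuracy-based A/B test typically ends in. The sketch below is a standard pooled two-proportion z-test in Python; it is not the agent's method, the function name is hypothetical, and the counts in the usage note are invented for illustration. It also assumes the sample size of 1000 applies to each arm, which the prompt does not specify.

```python
import math


def two_proportion_z_test(correct_a: int, n_a: int, correct_b: int, n_b: int):
    """Two-sided z-test for a difference in accuracy between two arms.

    Returns (z, p_value). Arm A is the baseline, arm B the intervention.
    """
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    # Pooled accuracy under the null hypothesis of no difference.
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # both arms all-correct or all-wrong
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, with hypothetical results of 600/1000 correct for direct answers and 650/1000 for chain-of-thought, `two_proportion_z_test(600, 1000, 650, 1000)` gives a positive z and a p-value below 0.05.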
Purpose: Creates financial budgets and projections
Test Prompt:
@nerdcabal Use budget-agent to create a budget for an AI research project.
Project: LLM fine-tuning research
Funding target: $500,000
Timeline: 18 months
Categories: personnel, compute, equipment
Format: NSF
Expected Output:
Purpose: Operations management and Iron Triangle optimization
Test Prompt:
@nerdcabal Use comptroller-agent to analyze the trade-offs for a project.
Project: Build a new feature for our app
Timeline: 2 weeks
Budget: $10,000
Quality requirements: Production-ready, full test coverage
Expected Output:
Purpose: Organizational design and SOPs
Test Prompt:
@nerdcabal Use administrator-agent to design an org chart for a distributed AI team.
Team size: 15 people
Timezones: US East, US West, EU, Asia
Roles needed: engineers, researchers, designers, product managers
Expected Output:
Purpose: Experiment tracking queries
Test Prompt:
@nerdcabal Use mlflow-agent to generate a query that finds the top 10 model runs
with the highest accuracy from the last 30 days.
Expected Output:
Note: Requires MLflow server running for full functionality
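For comparison with whatever the agent returns, here is a rough Python sketch of how such a query might be expressed as arguments to `mlflow.search_runs`. The helper name is hypothetical, the metric is assumed to be logged under the key `accuracy`, and filtering on `attributes.start_time` assumes a reasonably recent MLflow version.

```python
import time

MS_PER_DAY = 24 * 60 * 60 * 1000


def top_runs_search_kwargs(metric="accuracy", days=30, limit=10, now_ms=None):
    """Build keyword arguments for mlflow.search_runs.

    MLflow stores run start times as milliseconds since the Unix epoch,
    so the cutoff is computed in the same unit.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff_ms = now_ms - days * MS_PER_DAY
    return {
        "filter_string": f"attributes.start_time >= {cutoff_ms}",
        "order_by": [f"metrics.{metric} DESC"],
        "max_results": limit,
    }
```

With MLflow installed and `MLFLOW_TRACKING_URI` set, the result could be used as `mlflow.search_runs(search_all_experiments=True, **top_runs_search_kwargs())`, which returns a pandas DataFrame by default.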
Purpose: Creates training datasets for ML
Test Prompt:
@nerdcabal Use dataset-builder to create a supervised fine-tuning (SFT) dataset
for teaching a model to write Python code.
Include 5 examples with prompts and completions.
Output format: HuggingFace compatible
Expected Output:
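The tool's actual output is not reproduced in this guide. One way to check what comes back is a small validator such as the Python sketch below. It assumes the dataset is returned as JSON Lines with string fields named `prompt` and `completion`; those field names and the function name are assumptions, not something this guide confirms.

```python
import json

REQUIRED_FIELDS = ("prompt", "completion")  # assumed field names


def validate_sft_jsonl(text: str, expected_count: int = 5) -> list:
    """Return a list of problems found; an empty list means the data passed."""
    problems = []
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) != expected_count:
        problems.append(f"expected {expected_count} examples, found {len(lines)}")
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"example {number}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"example {number}: not a JSON object")
            continue
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"example {number}: missing or empty '{field}'")
    return problems
```

A file that passes this check can usually be loaded with the HuggingFace `datasets` library via `load_dataset("json", data_files="sft.jsonl")`, where the file name is a placeholder.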
Purpose: Security threat modeling
Test Prompt:
@nerdcabal Use ciso-agent to perform a STRIDE threat model for an LLM API.
Components: API gateway, model inference server, user database
Framework: STRIDE
Expected Output:
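The tool's actual output is not reproduced in this guide. A simple way to review it is to confirm that each component from the prompt has been assessed against all six STRIDE categories. The Python sketch below builds that coverage grid; the function name is hypothetical, and the reviewer is assumed to record by hand which (component, category) pairs the returned threat model addresses.

```python
from itertools import product

# The six STRIDE threat categories.
STRIDE = (
    "Spoofing",
    "Tampering",
    "Repudiation",
    "Information disclosure",
    "Denial of service",
    "Elevation of privilege",
)

# Components named in the test prompt.
COMPONENTS = ("API gateway", "model inference server", "user database")


def uncovered_pairs(covered, components=COMPONENTS, categories=STRIDE):
    """Return (component, category) pairs that are not in `covered`."""
    covered_set = set(covered)
    return [
        pair for pair in product(components, categories)
        if pair not in covered_set
    ]
```

Three components and six categories give 18 cells, so an empty result from `uncovered_pairs` means the review touched every combination.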
Purpose: Multi-agent workflow coordination
Test Prompt:
@nerdcabal Use orchestrator to create a sequential workflow:
1. Use experimental-designer to create an experiment plan
2. Use budget-agent to budget the experiment
3. Use administrator-agent to staff the team
Expected Output:
Purpose: Design systems and UI/UX
Test Prompt:
@nerdcabal Use creative-director to create a design system.
Style: cyberpunk-brutalist-bauhaus
Colors: black, white, red
Components: buttons, cards, navigation
Output format: CSS
Expected Output:
Purpose: Dataset visualization and quality analysis
Test Prompt:
@nerdcabal Use visual-inspector to generate a FiftyOne visualization script
for analyzing an image classification dataset.
Dataset: CIFAR-10
Tasks: Find mislabeled images, detect outliers
Expected Output:
Note: Requires FiftyOne installed for execution
Purpose: Neural forensics for LLM analysis
Test Prompt:
@nerdcabal Use forensic-analyst to analyze this transcript for hallucinations:
"User: What's the capital of France?
Assistant: The capital of France is Paris, which was founded in 1850 by Napoleon Bonaparte
and has a population of 50 million people."
Use DSMMD taxonomy to detect confabulation, metadata leakage, or semantic drift.
Expected Output:
Test Prompt:
@nerdcabal Use ip_analytics to analyze copyright infringement patterns.
IP type: copyright
Timeframe: last 90 days
Portfolio IDs: PORT-001, PORT-002
Jurisdiction: US
Expected Output:
Test Prompt:
@nerdcabal Use compliance_check to validate GDPR compliance.
Context:
- Processes personal data: yes
- Consent obtained: yes
- Data retention policy: 2 years
- Right to deletion: implemented
Jurisdiction: EU
Expected Output:
Test Prompt:
@nerdcabal Use archival_system to store evidence of IP infringement.
Evidence type: image
Source URL: https://example.com/infringement.jpg
Description: Unauthorized use of copyrighted work
Jurisdiction: US
Case ID: CASE-2026-001
Expected Output:
Scenario: Complete research project planning
Steps:
1. Use experimental-designer to create experiment plan
2. Use budget-agent to create budget
3. Use administrator-agent to design team structure
4. Use ciso-agent to assess security risks
Prompt:
@nerdcabal Let's plan a research project step by step:
1. First, use experimental-designer to design an experiment for testing a new prompting technique
2. Then use budget-agent to create a 6-month budget for $200k
3. Then use administrator-agent to design a team of 5 people
4. Finally, use ciso-agent to identify security risks
Scenario: Design and validate a UI component
Steps:
1. Use creative-director to create design system
2. Use ciso-agent to review for security issues
3. Use dataset-builder to create training data for UI testing
Scenario: Detect, validate, and archive infringement
Steps:
1. Use ip_analytics to detect infringement patterns
2. Use compliance_check to validate enforcement actions
3. Use archival_system to store evidence
Solutions:
- Verify that claude_desktop_config.json has the correct absolute path
- Check the MCP logs:
# macOS
cat ~/Library/Logs/Claude/mcp*.log
# Windows
type %APPDATA%\Claude\Logs\mcp*.log
# Linux
cat ~/.config/Claude/logs/mcp*.log
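The three log locations above can be wrapped in one cross-platform helper. This is a minimal Python sketch with hypothetical function names; the paths are the ones listed above, and the Windows fallback to `AppData/Roaming` when `APPDATA` is not supplied is an assumption.

```python
from pathlib import Path
from typing import List, Optional


def claude_log_dir(platform: str, home: Path, appdata: Optional[str] = None) -> Path:
    """Map a sys.platform value to the Claude Desktop log directory."""
    if platform == "darwin":
        return home / "Library" / "Logs" / "Claude"
    if platform.startswith("win"):
        # %APPDATA% normally points at <home>/AppData/Roaming (assumed fallback).
        base = Path(appdata) if appdata else home / "AppData" / "Roaming"
        return base / "Claude" / "Logs"
    # Anything else is treated as Linux.
    return home / ".config" / "Claude" / "logs"


def find_mcp_logs(log_dir: Path) -> List[Path]:
    """Return the mcp*.log files in log_dir, sorted by name."""
    return sorted(log_dir.glob("mcp*.log"))
```

A typical call would be `find_mcp_logs(claude_log_dir(sys.platform, Path.home(), os.environ.get("APPDATA")))`, after importing `sys` and `os`.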
Solutions:
cd mcp-server
node dist/index.js
npm run build
Solutions:
- Use the exact tool name (llm-rubric-architect, not rubric-architect)
- Run the server with debug logging:
LOG_LEVEL=debug node dist/index.js
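Misspelled tool names are easy to catch before a prompt is sent. The Python sketch below compares a name against the tools listed in this guide and suggests the closest match using the standard library's `difflib`. The function name is hypothetical, and the tool list is copied from this guide rather than queried from the server, so it may be incomplete.

```python
import difflib

# Tool names as they appear in this guide.
KNOWN_TOOLS = [
    "llm-rubric-architect", "experimental-designer", "budget-agent",
    "comptroller-agent", "administrator-agent", "mlflow-agent",
    "dataset-builder", "ciso-agent", "orchestrator", "creative-director",
    "visual-inspector", "forensic-analyst", "ip_analytics",
    "compliance_check", "archival_system",
]


def suggest_tool_name(name: str):
    """Return the exact or closest known tool name, or None if nothing is close."""
    if name in KNOWN_TOOLS:
        return name
    matches = difflib.get_close_matches(name, KNOWN_TOOLS, n=1, cutoff=0.6)
    return matches[0] if matches else None
```

This also catches the hyphen versus underscore mix-up, since some tools use `-` and others use `_`.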
MLflow:
# Start MLflow server
mlflow server --host 0.0.0.0 --port 5000
# Set environment variable
export MLFLOW_TRACKING_URI=http://localhost:5000
FiftyOne:
# Install FiftyOne
pip install fiftyone
# Start FiftyOne app
fiftyone app launch
HuggingFace:
# Login to HuggingFace
huggingface-cli login
# Verify
huggingface-cli whoami
Goal: Plan a research project for a grant proposal
Agents to use:
1. experimental-designer - Design the experiment
2. budget-agent - Create NSF grant budget
3. dataset-builder - Plan training data
4. mlflow-agent - Set up experiment tracking
Goal: Audit an AI system for security issues
Agents to use:
1. ciso-agent - STRIDE threat model
2. forensic-analyst - Analyze outputs for issues
3. ip_analytics - Check for IP compliance
4. compliance_check - Validate regulatory compliance
Goal: Design and launch a new feature
Agents to use:
1. creative-director - Create design system
2. comptroller-agent - Analyze speed/cost/quality tradeoffs
3. administrator-agent - Design team structure
4. ciso-agent - Security review
Goal: Monitor and protect intellectual property
Agents to use:
1. ip_analytics - Detect infringement patterns
2. compliance_check - Validate enforcement actions
3. archival_system - Store evidence
4. ciso-agent - Security audit
Use this checklist to verify all agents are working:
- Server builds (npm run build)
- Server starts (node dist/index.js)
- Claude Desktop integration works (@ shows nerdcabal tools)
- llm-rubric-architect creates rubrics
- experimental-designer creates experiment plans
- budget-agent creates budgets
- comptroller-agent analyzes Iron Triangle
- administrator-agent creates org charts
- mlflow-agent generates queries
- dataset-builder creates datasets
- ciso-agent performs threat modeling
- orchestrator coordinates workflows
- creative-director creates design systems
- visual-inspector generates FiftyOne scripts
- forensic-analyst detects hallucinations
- ip_analytics analyzes IP patterns
- compliance_check validates compliance
- archival_system stores evidence
Happy Testing! 🧪
Built with ❤️ by TUESDAY and the OG NerdCabal