Content moderation system
Learn how to build a content moderation system that can analyze user-generated content and provide feedback on its appropriateness.
Live example
You can play with this example in the Latitude Playground.
Overview
In this example, we will create a content moderation system that can analyze user-generated content and provide feedback on its appropriateness. The agent uses subagents to handle different aspects of content moderation efficiently.
Multi-Agent Architecture
The system uses specialized subagents for different responsibilities:
- main: Coordinates the moderation process by dispatching content to all subagents, gathering their evaluations, and generating the final decision based on their collective input.
- rule_checker: Runs deterministic, rule-based checks—such as profanity filters or length validation—against the content, ensuring compliance with explicitly defined policies.
- toxicity_analyzer: Analyzes content for toxicity and subtle forms of harm like harassment, hate speech, or threats, taking context and intent into account, even in ambiguous or nuanced cases.
- safety_scorer: Calculates comprehensive risk and safety scores for the content, highlighting any areas of concern, escalation potential, or need for human review.
All the tools used by the subagents have to be defined in the main prompt.
The prompts
Breakdown
Let’s break down the example step by step to understand how it works.
Main Prompt
The main prompt acts as the central coordinator. It receives user-generated content, delegates the moderation tasks to the specialized subagents, aggregates their results, and produces a structured final decision with confidence and reasoning.
rule_checker
The rule_checker agent checks for clear, rule-based violations—like banned words, excessive length, or explicit policy breaches—using programmatic filters and deterministic logic.
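To make this concrete, here is a minimal sketch of the kind of deterministic checks the rule_checker relies on. The blacklist, length limit, and function names are illustrative placeholders, not taken from the example repository.

```typescript
// Hypothetical deterministic checks; the blacklist and the limit are placeholders.
const BLACKLIST = ['badword1', 'badword2'];
const MAX_LENGTH = 2000;

// Returns the blacklisted words found in the text, if any.
export function containsBlacklistedWords(text: string): string[] {
  const lower = text.toLowerCase();
  return BLACKLIST.filter((word) => lower.includes(word));
}

// Flags content that exceeds the maximum allowed length.
export function exceedsMaxLength(text: string): boolean {
  return text.length > MAX_LENGTH;
}
```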
toxicity_analyzer
The toxicity_analyzer (or toxicity_evaluator) uses advanced AI to evaluate whether the content contains toxicity, harassment, hate speech, or other forms of harmful language, considering nuance, context, and potential for implicit harm.
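The exact output of this subagent is defined in its prompt; purely as an illustration, the evaluation the main agent receives could be shaped like this (field names are hypothetical):

```typescript
// Hypothetical shape of a toxicity evaluation; field names are illustrative.
interface ToxicityAnalysis {
  toxic: boolean;
  categories: Array<'harassment' | 'hate_speech' | 'threat' | 'other'>;
  severity: number;  // e.g. 0 (benign) to 1 (severe)
  reasoning: string; // short explanation of the judgement
}
```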
safety_scorer
The safety_scorer calculates various risk scores for the content, such as immediate harm, community impact, and escalation risk, and determines whether the situation requires human review or additional monitoring.
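As a rough illustration of the kind of scoring involved: the actual scorer is an LLM prompt, and the weights and threshold below are invented for the sketch.

```typescript
// Hypothetical aggregation of risk scores; weights and threshold are invented.
interface RiskScores {
  immediateHarm: number;   // 0..1
  communityImpact: number; // 0..1
  escalationRisk: number;  // 0..1
}

// Weighted combination of the individual risk dimensions.
export function overallRisk(scores: RiskScores): number {
  return (
    0.5 * scores.immediateHarm +
    0.3 * scores.communityImpact +
    0.2 * scores.escalationRisk
  );
}

// Content above the threshold is escalated to a human moderator.
export function needsHumanReview(scores: RiskScores, threshold = 0.6): boolean {
  return overallRisk(scores) >= threshold;
}
```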
Final Decision
The main agent synthesizes all subagent outputs, weighing rule violations, toxicity, and risk scores to make a final moderation decision. This decision includes a confidence score, explanation, and recommended action for handling the content.
Structured Output
The main prompt returns structured output because the moderation process must be machine-readable and reliable, allowing easy integration with other systems and clear auditing of every moderation decision.
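The actual schema lives in the main prompt's configuration (see the JSON Schema Output resource below). A TypeScript view of a decision with the fields mentioned above might look like this; the field names are assumptions chosen for illustration:

```typescript
// Illustrative shape of the final moderation decision; field names are assumptions.
interface ModerationDecision {
  decision: 'approve' | 'flag' | 'reject';
  confidence: number;        // 0..1, how certain the system is
  reasoning: string;         // explanation of the decision
  recommendedAction: string; // e.g. "publish", "send to human review"
  ruleViolations: string[];  // violations reported by rule_checker
}
```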
Code
The code prepares four examples of possible user input from different sources. The full code is available on GitHub; the idea is to run it with different types of input and see how the system behaves.
The important part is the use of tools. The handlers defined in the code respond to the tool calls declared in the main prompt. These tools are under your control and are things that usually don't need an LLM to answer, such as measuring the length of a text or checking whether it contains words from a blacklist.
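A condensed sketch of that wiring, assuming the TypeScript SDK and the `tools` option shown in the Tool call SDK example; the API key, project id, prompt path, and tool names are placeholders:

```typescript
import { Latitude } from '@latitude-data/sdk';

// Placeholders: replace with your own API key, project id, and prompt path.
const latitude = new Latitude('your-api-key', { projectId: 123 });

const BLACKLIST = ['badword1', 'badword2'];

await latitude.prompts.run('content-moderation/main', {
  parameters: { content: 'User-generated text to moderate' },
  // Tool handlers answer the tool calls declared in the main prompt.
  tools: {
    check_text_length: async ({ text }: { text: string }) => ({
      length: text.length,
    }),
    check_blacklist: async ({ text }: { text: string }) => ({
      matches: BLACKLIST.filter((w) => text.toLowerCase().includes(w)),
    }),
  },
});
```

Keeping these deterministic checks as local tool handlers means they stay cheap, testable, and fully auditable, while the LLM-based subagents handle the nuanced judgement calls.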
Resources
- Custom Tools - How to integrate with customer databases and CRM systems
- Tool call SDK example - A simple example of how to run a prompt with tools using the Latitude SDK.
- JSON Schema Output - Ensuring consistent response formatting