
Data Policy Filtering

Guidance on routing API requests to providers based on their data usage policies.

Introduction

When using Large Language Models (LLMs), understanding and respecting data privacy and usage policies is crucial. Different LLM providers have varying terms regarding how they handle the data sent to their APIs (prompts and completions), including whether they might use it for training future models.

This guide explains A4F's role in this context and provides strategies for you, the developer, to filter your API requests to underlying providers based on your data policy requirements, such as only using providers that commit not to train on your data.

Understanding Data Policies

Key aspects of provider data policies often include:

Training on Data: Whether the provider may use your prompts and completions to train or improve their models. Many enterprise offerings or specific APIs allow opting out.
Data Retention: How long the provider stores your data.
Confidentiality: Commitments regarding the privacy and security of your data.

It is your responsibility to review and understand the terms of service and data usage policies of the specific underlying LLM providers you intend to use via A4F.
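One way to keep that review actionable is to record each provider's stance as structured data. The sketch below is illustrative only: the `ProviderPolicy` class, its field names, and the example values are assumptions made for this guide, not an A4F schema or a statement about any real provider's terms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProviderPolicy:
    """Hypothetical record of what you learned from a provider's terms."""
    trains_on_data: bool            # may prompts/completions be used for training?
    retention_days: Optional[int]   # None means "unknown", treated as non-compliant
    confidentiality_reviewed: bool  # have you reviewed their confidentiality terms?


def meets_requirements(policy: ProviderPolicy, max_retention_days: int) -> bool:
    """Return True only if every recorded aspect satisfies your requirements."""
    if policy.trains_on_data:
        return False
    if policy.retention_days is None or policy.retention_days > max_retention_days:
        return False
    return policy.confidentiality_reviewed


# Placeholder values, not statements about any real provider.
example = ProviderPolicy(trains_on_data=False, retention_days=30,
                         confidentiality_reviewed=True)
print(meets_requirements(example, max_retention_days=30))
```

Treating an unknown retention period as non-compliant is a deliberate choice here: the check fails closed when your research is incomplete.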

A4F and Data Policies

A4F acts as a gateway, routing your API requests to the provider you specify via the model prefix (e.g., provider-1/model-name).

A4F itself does not train on your API prompt or completion data unless you explicitly opt in to features like prompt logging for debugging, as outlined in our Privacy Documentation.
A4F does not currently offer direct API parameters (e.g., a data_policy_preference field) to automatically filter providers based on their data policies.
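Because the provider is selected by the prefix of the model ID, a small helper can make that prefix explicit in your code. This is a minimal sketch that assumes the `provider-prefix/model-name` shape described above; the function names `build_model_id` and `split_model_id` are hypothetical and not part of any A4F SDK.

```python
from typing import Tuple


def build_model_id(provider_prefix: str, base_model: str) -> str:
    """Join a provider prefix and a base model name into 'prefix/model'."""
    if not provider_prefix or "/" in provider_prefix:
        raise ValueError(f"Invalid provider prefix: {provider_prefix!r}")
    if not base_model:
        raise ValueError("Base model name must not be empty")
    return f"{provider_prefix}/{base_model}"


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split 'prefix/model' at the first slash into (prefix, model)."""
    prefix, sep, base_model = model_id.partition("/")
    if not sep or not prefix or not base_model:
        raise ValueError(f"Expected 'provider-prefix/model-name', got {model_id!r}")
    return prefix, base_model


print(build_model_id("provider-1", "model-name"))
print(split_model_id("provider-1/model-name"))
```

Rejecting IDs without a prefix means a request can never be sent without an explicit provider choice, which is the property the rest of this guide relies on.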

Developer-Managed Filtering

You, as the developer, are responsible for implementing logic in your application to select A4F provider prefixes that align with your data policy requirements. A4F provides the routing mechanism; you provide the routing decision.

Client-Side Filtering Strategy

To ensure your API requests are only sent to providers meeting your data policy criteria (e.g., "no training on data"), you can implement the following strategy in your application:

1. Identifying Provider Policies

Research the data usage policies of the underlying providers accessible through A4F. You can usually find this information in their terms of service or specific API documentation. Maintain a list or configuration in your application that maps A4F provider prefixes (e.g., provider-1, provider-3) to their data policy stance (e.g., "trains_on_data: false").

A4F aims to provide transparency. Check our Provider Routing documentation for links to provider terms, and the Models page for any provider-specific notes.

2. Implementing the Filter in Your Application

Before making an API call to A4F, your application logic should:

Determine the base model needed (e.g., "gpt-4o", "claude-3-haiku").
Consult your internal list of providers and their data policies.
Select an A4F provider prefix that offers the desired base model AND meets your data policy criteria.
Construct the A4F model ID (e.g., "chosen-provider-prefix/base-model-name") and use it in your API request to A4F.

This gives you explicit control over where your data is routed via A4F.

import os
from typing import Optional

from openai import OpenAI

A4F_API_KEY = os.getenv("A4F_API_KEY", "YOUR_A4F_API_KEY")
A4F_BASE_URL = "https://api.a4f.co/v1"

a4f_client = OpenAI(api_key=A4F_API_KEY, base_url=A4F_BASE_URL)

# This is a simplified, illustrative list.
# In a real app, you'd maintain this based on your research of provider terms.
TRUSTED_PROVIDERS_NO_TRAIN = {
    "provider-1": True,   # Example: Assumed Provider 1 does not train on data
    "provider-2": False,  # Example: Assumed Provider 2 might train
    "provider-3": True,   # Example: Assumed Provider 3 does not train on data
    # ... and so on for other A4F providers
}

def get_a4f_model_id(base_model_name: str, preferred_provider_prefix: Optional[str] = None) -> str:
    """
    Selects an A4F model ID based on policy and optional preference.
    This is a very basic example. Real logic would be more complex.
    """
    if preferred_provider_prefix and TRUSTED_PROVIDERS_NO_TRAIN.get(preferred_provider_prefix, False):
        return f"{preferred_provider_prefix}/{base_model_name}"
    
    # Fallback to any trusted provider offering the base model
    # (This would require knowing which providers offer which base models)
    for prefix, is_trusted in TRUSTED_PROVIDERS_NO_TRAIN.items():
        if is_trusted:
            # Here, you'd ideally check if prefix/base_model_name is a valid A4F model.
            # For simplicity, we just return the first trusted one.
            print(f"Note: Selecting trusted provider '{prefix}' for '{base_model_name}'.")
            return f"{prefix}/{base_model_name}"
            
    raise ValueError(f"No trusted provider found for model '{base_model_name}' that meets 'no_train' policy.")

try:
    # Example: User wants GPT-4o, prefers Provider 1 if it's trusted
    model_to_use = get_a4f_model_id(base_model_name="gpt-4o", preferred_provider_prefix="provider-1")
    print(f"Using A4F model: {model_to_use}")

    completion = a4f_client.chat.completions.create(
        model=model_to_use,
        messages=[{"role": "user", "content": "Hello, world!"}]
    )
    print(completion.choices[0].message.content)

except ValueError as e:
    print(f"Error selecting model: {e}")
except Exception as e:
    print(f"API call failed: {e}")
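The fallback in the example above returns the first trusted prefix without checking that the provider actually serves the requested model. One possible refinement, sketched here, filters on both conditions. The `PROVIDER_MODELS` mapping is hypothetical: you would maintain it yourself (for instance from the Models page), and the prefixes and model lists shown are placeholders.

```python
from typing import Dict, Optional, Set

# Placeholder data: True means "assumed not to train on data".
NO_TRAIN: Dict[str, bool] = {
    "provider-1": True,
    "provider-2": False,
    "provider-3": True,
}

# Hypothetical catalog of which base models each prefix offers.
PROVIDER_MODELS: Dict[str, Set[str]] = {
    "provider-1": {"gpt-4o"},
    "provider-2": {"gpt-4o", "claude-3-haiku"},
    "provider-3": {"claude-3-haiku"},
}


def select_model_id(base_model: str, preferred: Optional[str] = None) -> str:
    """Pick a prefix that is both policy-compliant and offers the model."""
    candidates = [
        prefix for prefix, no_train in NO_TRAIN.items()
        if no_train and base_model in PROVIDER_MODELS.get(prefix, set())
    ]
    if not candidates:
        raise ValueError(f"No compliant provider offers '{base_model}'.")
    # Honor the preference only when it is itself a valid candidate.
    prefix = preferred if preferred in candidates else candidates[0]
    return f"{prefix}/{base_model}"


print(select_model_id("gpt-4o", preferred="provider-1"))
print(select_model_id("claude-3-haiku", preferred="provider-2"))
```

In the second call the preferred provider is ignored because it is marked as possibly training on data, so the function falls through to the remaining compliant candidate.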

Conceptual: Future API Enhancements

While not currently implemented, A4F may consider future enhancements to simplify data policy-based routing. This could hypothetically involve:

An API parameter to express data policy preferences (e.g., "data_usage_profile": "strict_no_train").
A4F maintaining and exposing structured data about provider policies that could be queried.

Such features would be announced in our changelog and API documentation. For now, the client-side filtering strategy is the recommended approach.

Important Considerations

Accuracy of Policy Information: Provider policies can change. Regularly review and update your internal mapping of providers to their data policies.
Model Availability: Restricting requests to policy-compliant providers may reduce the set of models you can reach through A4F. A model offered only by non-compliant providers becomes unavailable to you.
Fallback Logic: If your primary policy-compliant provider for a given model is unavailable, your application may need fallback logic to try another compliant provider or inform the user.
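The fallback consideration above can be sketched as a loop over compliant prefixes only. This is an assumption-laden illustration: `complete_with_fallback` and `fake_send` are hypothetical names, and `send` stands in for whatever function performs your real request (for example, a wrapper around `a4f_client.chat.completions.create`).

```python
from typing import Callable, Iterable, List


def complete_with_fallback(base_model: str,
                           compliant_prefixes: Iterable[str],
                           send: Callable[[str], str]) -> str:
    """Try each compliant provider in order; never leave the compliant set."""
    errors: List[str] = []
    for prefix in compliant_prefixes:
        model_id = f"{prefix}/{base_model}"
        try:
            return send(model_id)
        except Exception as exc:  # broad on purpose in this sketch
            errors.append(f"{model_id}: {exc}")
    raise RuntimeError("No policy-compliant provider succeeded: " + "; ".join(errors))


def fake_send(model_id: str) -> str:
    """Stand-in for a real API call; simulates one unavailable provider."""
    if model_id.startswith("provider-1/"):
        raise ConnectionError("provider unavailable")
    return f"response from {model_id}"


print(complete_with_fallback("gpt-4o", ["provider-1", "provider-3"], fake_send))
```

Passing the request function in as a parameter keeps the policy logic testable without network access, and raising once the list is exhausted lets the caller decide how to inform the user.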

Your Responsibility

Ultimately, ensuring compliance with your organization's data handling requirements when using third-party LLMs (even via A4F) rests with you. A4F provides the tools for access and routing; due diligence on provider policies is key.
