
Customizing the Pipeline

There are countless ways to customize the pipeline. The default MetadataPipeline can be constructed with any combination of Fetchers, Adapters, Parsers, and ContractCallers, and you can even define your own custom pipeline by extending the BasePipeline class.

In this guide, we'll cover a few ways you can customize the MetadataPipeline to best suit your needs.
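For example, a bare-bones custom pipeline might look something like the minimal sketch below. It is only an illustration: it assumes BasePipeline lives at offchain.metadata.pipelines.base_pipeline and exposes a run() method to override, and that Token exposes a uri field, so check the BasePipeline source in your installed version for the exact interface.

from typing import Optional

from offchain import Token
from offchain.metadata.pipelines.base_pipeline import BasePipeline


class URIOnlyPipeline(BasePipeline):
    """Hypothetical pipeline that skips parsing and just returns raw token URIs."""

    def run(self, tokens: list[Token]) -> list[Optional[str]]:
        # Replace this with your own fetch/adapt/parse logic
        return [token.uri for token in tokens]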

Custom RPC Provider

By default, the pipeline uses https://cloudflare-eth.com as the Ethereum JSON-RPC URL. This is a free, public RPC provider, so it is easy to exceed its rate limit. If you have a custom RPC provider URL you'd like to use, you can specify it like this:

from offchain import MetadataPipeline
from offchain.web3.contract_caller import ContractCaller
from offchain.web3.jsonrpc import EthereumJSONRPC

# MY_PROVIDER_URL is the URL of your own JSON-RPC provider
rpc = EthereumJSONRPC(provider_url=MY_PROVIDER_URL)
contract_caller = ContractCaller(rpc=rpc)
pipeline = MetadataPipeline(contract_caller=contract_caller)
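With the custom contract caller wired in, the pipeline is used the same way as before. A minimal usage sketch, with placeholder token values to replace with a real collection address and token ID (the Token model and run() call follow the library's quickstart):

from offchain import Token

token = Token(
    chain_identifier="ETHEREUM-MAINNET",
    collection_address="0x...",  # replace with a real collection address
    token_id=1,  # replace with a real token id
)

# run() returns one metadata result per token, in order
metadata = pipeline.run([token])[0]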

Custom Parsers

By default, the pipeline runs with all collection, schema, and catch-all parsers. That said, you may only want to run a subset of them, for example if you only care about parsing metadata for a specific collection.

If this is the case, you can pass in a list of specific parser instances to run. For example, the following configuration runs the pipeline using only the ENS collection parser.

from offchain import MetadataPipeline
from offchain.metadata import ENSParser

ens_parser = ENSParser()
pipeline = MetadataPipeline(parsers=[ens_parser])
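To see it in action, here is a usage sketch for the ENS-only pipeline. The collection address shown is the ENS Base Registrar, which is assumed here to be the collection the ENS parser targets, and the token ID is a placeholder; tokens from other collections will not be handled by this parser.

from offchain import Token

ens_token = Token(
    chain_identifier="ETHEREUM-MAINNET",
    collection_address="0x57f1887a8bf19b14fc0df6fd9b2acc9af147ea85",  # ENS Base Registrar
    token_id=1234,  # placeholder; use the uint256 token id of an ENS name
)

metadata = pipeline.run([ens_token])[0]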

View the full list of available parsers here.

Custom Adapters

By default, the pipeline is run with all available adapters. Each adapter has a default host prefix and is configured with the following args: {"pool_connections": 100, "pool_maxsize": 1000, "max_retries": 0}

You can also customize the pipeline to only use a subset of the adapters. For instance, if you want to build a metadata indexer that only indexes onchain metadata, you might opt to use only the IPFS, ARWeave, and DataURI adapters.

There are two ways to configure custom adapters for the pipeline:

Specifying Adapter Configs

from offchain.metadata.adapters import ARWeaveAdapter, DataURIAdapter, HTTPAdapter, IPFSAdapter
from offchain.metadata.pipelines.metadata_pipeline import AdapterConfig, MetadataPipeline


adapter_configs = [
    AdapterConfig(
        adapter_cls=ARWeaveAdapter,
        mount_prefixes=["ar://"],
        host_prefixes=["https://arweave.net/"],
        kwargs={"pool_connections": 100, "pool_maxsize": 1000, "max_retries": 0},
    ),
    AdapterConfig(adapter_cls=DataURIAdapter, mount_prefixes=["data:"]),
]

pipeline = MetadataPipeline(adapter_configs=adapter_configs)
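For the onchain-only scenario mentioned above, the same pattern extends to all three adapters. A sketch under those assumptions; the IPFS mount and host prefixes shown here are illustrative, so substitute whichever gateway you prefer:

from offchain.metadata.adapters import ARWeaveAdapter, DataURIAdapter, IPFSAdapter
from offchain.metadata.pipelines.metadata_pipeline import AdapterConfig, MetadataPipeline

onchain_adapter_configs = [
    AdapterConfig(
        adapter_cls=ARWeaveAdapter,
        mount_prefixes=["ar://"],
        host_prefixes=["https://arweave.net/"],
        kwargs={"pool_connections": 100, "pool_maxsize": 1000, "max_retries": 0},
    ),
    AdapterConfig(
        adapter_cls=IPFSAdapter,
        mount_prefixes=["ipfs://", "https://gateway.pinata.cloud/", "https://ipfs.io/"],
        host_prefixes=["https://ipfs.io/ipfs/"],  # assumed gateway; use your preferred IPFS gateway
        kwargs={"pool_connections": 100, "pool_maxsize": 1000, "max_retries": 0},
    ),
    AdapterConfig(adapter_cls=DataURIAdapter, mount_prefixes=["data:"]),
]

pipeline = MetadataPipeline(adapter_configs=onchain_adapter_configs)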

View the full list of available adapters here.

Mounting Custom Adapters

from offchain.metadata.adapters import IPFSAdapter
from offchain.metadata.pipelines.metadata_pipeline import MetadataPipeline

pipeline = MetadataPipeline()
# MY_CUSTOM_IPFS_HOST is the host prefix of your preferred IPFS gateway
pipeline.mount_adapter(
    adapter=IPFSAdapter(
        host_prefixes=[MY_CUSTOM_IPFS_HOST],
        pool_connections=100,
        pool_maxsize=1000,
        max_retries=0,
    ),
    # requests for URIs with these prefixes are routed through the adapter above
    url_prefixes=[
        "ipfs://",
        "https://gateway.pinata.cloud/",
        "https://ipfs.io/",
    ],
)