概述

RHI封装了众多图形API(DirectX、OpenGL、Vulkan、Metal)之间的差异,为Game和Renderer模块提供了简便且一致的概念、数据、资源和接口,实现一份渲染代码跑在多个平台的目标。

RHI线程的工作是将渲染线程生成的RHI中间指令转译到对应指定图形API的GPU指令。如果渲染线程是并行生成的RHI中间指令,那么RHI线程也会并行转译。

ParallelCommandlistGeneration

同步

UE的渲染流程中,最多存在4种工作线程:游戏线程(Game Thread)、渲染线程(Render Thread)、RHI线程和GPU。

游戏线程是整个引擎的驱动者,提供所有的源数据和事件,以驱动渲染线程和RHI线程。游戏线程领先渲染线程不超过1帧,更具体地说如果第N帧的渲染线程在第N+1帧的游戏线程的Tick结束时还没有完成,那么游戏线程会被渲染线程卡住。反之,如果游戏线程负载过重,没能及时发送事件和数据给渲染线程,也会导致渲染线程卡住。

渲染线程负责产生RHI的中间命令,在适当的时机派发、刷新指令到RHI线程。因此,渲染线程的卡顿也可能导致RHI的卡顿。

RHI线程负责派发、转译、提交指令,且渲染的最后一步需要SwapBuffer,这一步需要等待GPU完成渲染工作。因此,渲染GPU的繁忙也会导致RHI线程的卡顿。

MultiThreads

实现

基本类型

RHIClassesDiagram

RHIResource

FRHIResource抽象了GPU侧的资源,也是各类RHI资源类型的父类。FRHIResource提供了:引用计数、延迟删除及追踪、运行时数据和标记等功能。其子类包括各种State资源、Shader、Uniform Buffer、Texture等。

RHICommand

FRHICommand是RHI模块的渲染指令基类,这些指令通常由渲染线程通过命令队列Push到RHI线程,在合适的时机由RHI线程执行。

FRHICommandListBase定义了命令队列所需的基本数据(命令列表、设备上下文)和接口(命令的刷新、等待、入队、派发等)。

FRHIComputeCommandList继承自FRHICommandListBase,定义了计算着色器相关的接口、GPU资源状态转换和着色器部分参数的设置。

FRHICommandList继承自FRHIComputeCommandList,定义了普通渲染管线的接口,包含VS、PS、GS的绑定,图元绘制,更多着色器参数的设置和资源状态转换,资源创建、更新和等待等。

FRHICommandContext & FDynamicRHI

IRHICommandContext是RHI的命令上下文接口类,定义了一组图形API相关的操作。FDynamicRHI是由动态绑定的RHI实现的接口。

传统图形API(D3D11、OpenGL)除了继承FDynamicRHI,还需要继承IRHICommandContextPSOFallback,因为需要借助后者的接口处理PSO的数据和行为,以保证传统和现代API对PSO的一致处理行为。而现代图形API(D3D12、Vulkan、Metal)不需要继承IRHICommandContext的任何继承体系的类型,单单直接继承FDynamicRHI就可以处理RHI层的所有数据和操作。

RHI命令执行

  • 首先调用 FRHICommandList::DrawPrimitive,在其中根据是否开启Bypass进行不同处理。
    • 若开启ByPass,则FRHICommandList将直接调用图形API上下文的接口,相当于同步调用图形API,此时的图形API运行于渲染线程
    • 若未开启ByPass,则使用ALLOC_COMMAND宏构造 FRHICommandDrawPrimitive 命令,ALLOC_COMMAND展开后实际上是一个placement new,并且其中使用FRHICOMMAND_MACRO宏构造的FRHICommandDrawPrimitive命令,该命令并不会立即执行,而是通过placement new加入FRHICommandListBase维护的命令链表。
// Engine\Source\Runtime\RHI\Public\RHICommandList.h

class FRHICommandList : public FRHIComputeCommandList
{
public:
    (...)

    FORCEINLINE_DEBUGGABLE void DrawPrimitive(uint32 BaseVertexIndex, uint32 NumPrimitives, uint32 NumInstances)
    {
        //check(IsOutsideRenderPass());
        if (Bypass())
        {
            GetContext().RHIDrawPrimitive(BaseVertexIndex, NumPrimitives, NumInstances);
            return;
        }
        ALLOC_COMMAND(FRHICommandDrawPrimitive)(BaseVertexIndex, NumPrimitives, NumInstances);
    }

    (...)
};

FRHICOMMAND_MACRO(FRHICommandDrawPrimitive)
{
    uint32 BaseVertexIndex;
    uint32 NumPrimitives;
    uint32 NumInstances;
    FORCEINLINE_DEBUGGABLE FRHICommandDrawPrimitive(uint32 InBaseVertexIndex, uint32 InNumPrimitives, uint32 InNumInstances)
        : BaseVertexIndex(InBaseVertexIndex)
        , NumPrimitives(InNumPrimitives)
        , NumInstances(InNumInstances)
    {
    }
    RHI_API void Execute(FRHICommandListBase& CmdList);
};
  • FRHICommandDrawPrimitive::Execute
void FRHICommandDrawPrimitive::Execute(FRHICommandListBase& CmdList)
{
    RHISTAT(DrawPrimitive);
    INTERNAL_DECORATOR(RHIDrawPrimitive)(BaseVertexIndex, NumPrimitives, NumInstances);
}

其中调用对应IRHICommandContext的RHIDrawPrimitive函数,例如:

FD3D12CommandContextRedirector::RHIDrawPrimitive --> FD3D12CommandContext::RHIDrawPrimitive

并行执行

FParallelCommandListSet

// Engine\Source\Runtime\Renderer\Private\SceneRendering.h

class FParallelCommandListSet
{
public:
    const FRDGPass* Pass; // RDG Pass
    const FViewInfo& View; // 所属的视图
    FRHICommandListImmediate& ParentCmdList; // 父命令队列
    TStatId ExecuteStat;
    int32 Width;
    int32 NumAlloc;
    int32 MinDrawsPerCommandList;

private:
    // 命令队列列表
    TArray<FRHICommandListImmediate::FQueuedCommandList, SceneRenderingAllocator> QueuedCommandLists;

protected:
    // 派发, 须由子类调用
    void Dispatch(bool bHighPriority = false);
    FRHICommandList* AllocCommandList();
    bool bHasRenderPasses;

public:
    FParallelCommandListSet(const FRDGPass* InPass, TStatId InExecuteStat, const FViewInfo& InView, FRHICommandListImmediate& InParentCmdList, bool bHasRenderPasses = true);
    virtual ~FParallelCommandListSet();

    int32 NumParallelCommandLists() const // 获取命令队列数量
    {
        return QueuedCommandLists.Num();
    }

    FRHICommandList* NewParallelCommandList(); // 新建一个并行的命令队列

    FORCEINLINE FGraphEventArray* GetPrereqs() // 获取前序任务
    {
        return nullptr;
    }

    // 增加并行的命令队列
    void AddParallelCommandList(FRHICommandList* CmdList, FGraphEventRef& CompletionEvent, int32 InNumDrawsIfKnown = -1);

    virtual void SetStateOnCommandList(FRHICommandList& CmdList) {}
};
并行执行过程

以UE Mesh Drawing Pipeline中FDeferredShadingSceneRenderer::RenderPrePass为例

在其中构造FRDGParallelCommandListSet(FParallelCommandListSet的子类)实例,然后ParallelMeshDrawCommandPasses::DispatchDraw,以FRDGParallelCommandListSet实例为参数传入。

// Engine\Source\Runtime\Renderer\Private\DepthRendering.cpp

void FDeferredShadingSceneRenderer::RenderPrePass(FRDGBuilder& GraphBuilder, TArrayView<FViewInfo> InViews, FRDGTextureRef SceneDepthTexture, FInstanceCullingManager& InstanceCullingManager, FRDGTextureRef* FirstStageDepthBuffer)
{
    (...)

    const bool bParallelDepthPass = GRHICommandList.UseParallelAlgorithms() && CVarParallelPrePass.GetValueOnRenderThread();

    (...)

    auto RenderDepthPass = [&](uint8 DepthMeshPass)
    {
        check(DepthMeshPass == EMeshPass::DepthPass || DepthMeshPass == EMeshPass::SecondStageDepthPass);
        const bool bSecondStageDepthPass = DepthMeshPass == EMeshPass::SecondStageDepthPass;

        if (bParallelDepthPass)
        {
            RDG_WAIT_FOR_TASKS_CONDITIONAL(GraphBuilder, IsDepthPassWaitForTasksEnabled());

            // 遍历所有view,每个view都渲染一次深度Pass
            for (int32 ViewIndex = 0; ViewIndex < InViews.Num(); ++ViewIndex)
            {
                FViewInfo& View = InViews[ViewIndex];

                // 处理深度Pass的渲染资源和状态。
                RDG_GPU_MASK_SCOPE(GraphBuilder, View.GPUMask);
                RDG_EVENT_SCOPE_CONDITIONAL(GraphBuilder, InViews.Num() > 1, "View%d", ViewIndex);

                FMeshPassProcessorRenderState DrawRenderState;
                SetupDepthPassState(DrawRenderState);

                const bool bShouldRenderView = View.ShouldRenderView() && (bSecondStageDepthPass ? View.bUsesSecondStageDepthPass : true);
                if (bShouldRenderView)
                {
                    View.BeginRenderView();

                    FDepthPassParameters* PassParameters = GetDepthPassParameters(GraphBuilder, View, SceneDepthTexture);
                    View.ParallelMeshDrawCommandPasses[DepthMeshPass].BuildRenderingCommands(GraphBuilder, Scene->GPUScene, PassParameters->InstanceCullingDrawParams);

                    GraphBuilder.AddPass(
                        bSecondStageDepthPass ? RDG_EVENT_NAME("SecondStageDepthPassParallel") : RDG_EVENT_NAME("DepthPassParallel"),
                        PassParameters,
                        ERDGPassFlags::Raster | ERDGPassFlags::SkipRenderPass,
                        [this, &View, PassParameters, DepthMeshPass](const FRDGPass* InPass, FRHICommandListImmediate& RHICmdList)
                    {
                        // 构造绘制指令存储容器
                        FRDGParallelCommandListSet ParallelCommandListSet(InPass, RHICmdList, GET_STATID(STAT_CLP_Prepass), View, FParallelCommandListBindings(PassParameters));
                        ParallelCommandListSet.SetHighPriority();
                        // 触发并行绘制
                        View.ParallelMeshDrawCommandPasses[DepthMeshPass].DispatchDraw(&ParallelCommandListSet, RHICmdList, &PassParameters->InstanceCullingDrawParams);
                    });

                    RenderPrePassEditorPrimitives(GraphBuilder, View, PassParameters, DrawRenderState, DepthPass.EarlyZPassMode, InstanceCullingManager);
                }
            }
        }
        else
        {
            (...)
        }
    };

    (...)
}

在ParallelMeshDrawCommandPasses::DispatchDraw中,利用传入的FRDGParallelCommandListSet处理前序任务,并构建新的命令队列,用此作为参数之一,构建任务FDrawVisibleMeshCommandsAnyThreadTask,并获得事件对象,并添加事件到FRDGParallelCommandListSet实例中

// Engine\Source\Runtime\Renderer\Private\MeshDrawCommands.cpp

void FParallelMeshDrawCommandPass::DispatchDraw(FParallelCommandListSet* ParallelCommandListSet, FRHICommandList& RHICmdList, const FInstanceCullingDrawParams* InstanceCullingDrawParams) const
{
    (...)

    if (ParallelCommandListSet)
    {
        const ENamedThreads::Type RenderThread = ENamedThreads::GetRenderThread();

        // 处理前序任务
        FGraphEventArray Prereqs;
        if (ParallelCommandListSet->GetPrereqs())
        {
            Prereqs.Append(*ParallelCommandListSet->GetPrereqs());
        }
        if (TaskEventRef.IsValid())
        {
            Prereqs.Add(TaskEventRef);
        }

        // 构造与工作线程数量相同的并行绘制任务数
        const int32 NumThreads = FMath::Min<int32>(FTaskGraphInterface::Get().GetNumWorkerThreads(), ParallelCommandListSet->Width);
        const int32 NumTasks = FMath::Min<int32>(NumThreads, FMath::DivideAndRoundUp(MaxNumDraws, ParallelCommandListSet->MinDrawsPerCommandList));
        const int32 NumDrawsPerTask = FMath::DivideAndRoundUp(MaxNumDraws, NumTasks);

        // 遍历NumTasks次,构造NumTasks个绘制任务(FDrawVisibleMeshCommandsAnyThreadTask)实例
        for (int32 TaskIndex = 0; TaskIndex < NumTasks; TaskIndex++)
        {
            const int32 StartIndex = TaskIndex * NumDrawsPerTask;
            const int32 NumDraws = FMath::Min(NumDrawsPerTask, MaxNumDraws - StartIndex);
            checkSlow(NumDraws > 0);
            // 新建命令队列
            FRHICommandList* CmdList = ParallelCommandListSet->NewParallelCommandList();

            // 构造FDrawVisibleMeshCommandsAnyThreadTask实例并加入TaskGraph中
            // 其中TaskContext.MeshDrawCommands就是上一部分由FMeshPassProcessor生成的
            FGraphEventRef AnyThreadCompletionEvent = TGraphTask<FDrawVisibleMeshCommandsAnyThreadTask>::CreateTask(&Prereqs, RenderThread)
                .ConstructAndDispatchWhenReady(*CmdList, TaskContext.InstanceCullingContext, TaskContext.MeshDrawCommands, TaskContext.MinimalPipelineStatePassSet,
                    OverrideArgs,
                    TaskContext.InstanceFactor,
                    TaskIndex, NumTasks);

            // 将事件加入ParallelCommandListSet,以便追踪深度Pass的并行绘制是否完成。
            ParallelCommandListSet->AddParallelCommandList(CmdList, AnyThreadCompletionEvent, NumDraws);
        }
    }
    else
    {
        (...)
    }
}

当FRDGParallelCommandListSet被析构时,会调用其Dispatch函数真正地分发命令队列。

// Engine\Source\Runtime\Renderer\Private\SceneRendering.h

class FRDGParallelCommandListSet final : public FParallelCommandListSet
{
public:
    ~FRDGParallelCommandListSet() override
    {
        Dispatch(bHighPriority);
    }
    (...)
}

// Engine\Source\Runtime\Renderer\Private\SceneRendering.cpp

void FParallelCommandListSet::Dispatch(bool bHighPriority)
{
    (...)

    if (bActuallyDoParallelTranslate)
    {
        check(GRHISupportsParallelRHIExecute);

        auto Priority = bHighPriority
            ? FRHICommandListImmediate::ETranslatePriority::High
            : FRHICommandListImmediate::ETranslatePriority::Normal;

        NumAlloc -= QueuedCommandLists.Num();
        ParentCmdList.QueueAsyncCommandListSubmit(QueuedCommandLists, Priority, (MinDrawsPerCommandList * 4) / 3);

        // #todo-renderpasses PS4 breaks if this isn't here. Why?
        SetStateOnCommandList(ParentCmdList);
        
        if (bHasRenderPasses)
        {
            ParentCmdList.EndRenderPass();
        }
    }
    else
    {
        NumAlloc -= QueuedCommandLists.Num();
        ParentCmdList.QueueAsyncCommandListSubmit(QueuedCommandLists);
    }
    QueuedCommandLists.Reset();

    (...)
}

随后进入FRHICommandListImmediate::QueueAsyncCommandListSubmit,完成并行Translate任务的创建和入队

注:UE4.26中的FParallelTranslateSetupCommandList类(设置Translate的任务)和FParallelTranslateCommandList类(真正的Translate命令队列),在UE5中已被去除。UE4中FRHICommandListBase::QueueParallelAsyncCommandListSubmit依赖上述两个类完成工作,而在UE5的FRHICommandListImmediate::QueueAsyncCommandListSubmit中直接写成了Lambda

// Engine\Source\Runtime\RHI\Private\RHICommandList.cpp

void FRHICommandListImmediate::QueueAsyncCommandListSubmit(TArrayView<FQueuedCommandList> CommandLists, ETranslatePriority ParallelTranslatePriority, int32 MinDrawsPerTranslate)
{
    check(IsInRenderingThread());

    if (CommandLists.Num() == 0)
        return;

    for (FQueuedCommandList const& QueuedCmdList : CommandLists)
    {
        check(QueuedCmdList.CmdList);

        // Accumulate dispatch ready events into the WaitOutstandingTasks list.
        // This is used by FRHICommandListImmediate::WaitForTasks() when the render thread 
        // wants to block until all parallel RHICmdList recording tasks are completed.
        WaitOutstandingTasks.Add(QueuedCmdList.CmdList->DispatchEvent);
    }

    if (ParallelTranslatePriority != ETranslatePriority::Disabled && GRHISupportsParallelRHIExecute && IsRunningRHIInSeparateThread())
    {
        // The provided RHI command lists will be translated to platform command lists in parallel.
        
        // Commands may already be queued on the immediate command list. These need to be executed
        // first before any parallel commands can be inserted, otherwise commands will run out-of-order.
        ExecuteAndReset(false);
        InitializeImmediateContexts();

        struct FTask
        {
            FGraphEventRef Event;
            TArrayView<FRHICommandListBase*> InCmdLists;
            TArray<IRHIPlatformCommandList*, TInlineAllocator<GetRHIPipelineCount()>> OutCmdLists;
            FRHIDrawStats Stats;
        };

        uint32 NumTasks = 0;
        TArrayView<FTask> Tasks = AllocArrayUninitialized<FTask>(CommandLists.Num());

        const bool bMerge = !!CVarRHICmdMergeSmallDeferredContexts.GetValueOnRenderThread();
        for (int32 RangeStart = 0, RangeEnd = 0; RangeStart < CommandLists.Num(); RangeStart = RangeEnd)
        {
            (...)

            // Gather the list of active pipelines and prerequisites for this batch of command lists
            FGraphEventArray Prereqs;
            for (int32 Index = 0; Index < NumCmdListsInBatch; ++Index)
            {
                FRHICommandListBase* CmdList = CommandLists[RangeStart + Index].CmdList;

                Task.InCmdLists[Index] = CmdList;
                Prereqs.Add(CmdList->DispatchEvent);
            }

            (...)

            // Start a parallel translate task to replay the command list batch into the given pipeline contexts
            Task.Event = FFunctionGraphTask::CreateAndDispatchWhenReady(
                [&Task, GPUStatsInitial = PersistentState.Stats]()
                {
                    FOptionalTaskTagScope Scope(ETaskTag::EParallelRhiThread);
                    SCOPE_CYCLE_COUNTER(STAT_ParallelTranslate);
                    SCOPED_NAMED_EVENT(FParallelTranslateCommandList_DoTask, FColor::Magenta);

                    TRHIPipelineArray<IRHIComputeContext*> Contexts = {};

                    // Replay the recorded commands. The Contexts array accumulates any used
                    // contexts depending on the SwitchPipeline commands that were recorded.
                    for (FRHICommandListBase* RHICmdList : Task.InCmdLists)
                    {
                        // Redirect the output stats to this parallel task's copy
                        FPersistentState::FGPUStats Stats = GPUStatsInitial;
                        Stats.Ptr = &Task.Stats;

                        RHICmdList->Execute(Contexts, &Stats);
                        delete RHICmdList;
                    }

                    // Convert the completed contexts into IRHIPlatformCommandList instances.
                    // These are submitted by the RHI thread waiting on this translate task.
                    for (IRHIComputeContext* Context : Contexts)
                    {
                        if (Context)
                        {
                            IRHIPlatformCommandList* CommandList = GDynamicRHI->RHIFinalizeContext(Context);
                            if (CommandList)
                            {
                                Task.OutCmdLists.Add(CommandList);
                            }
                        }
                    }
                }
                , QUICK_USE_CYCLE_STAT(FParallelTranslateCommandList, STATGROUP_TaskGraphTasks)
                , &Prereqs
                , ParallelTranslatePriority == ETranslatePriority::High
                    ? CPrio_FParallelTranslateCommandListPrepass.Get()
                    : CPrio_FParallelTranslateCommandList.Get()
            );
        }

        // Resize the tasks array view to how many tasks we actually created after merging
        Tasks = TArrayView<FTask>(Tasks.GetData(), NumTasks);

        // Finally, add an RHI thread task to submit the completed platform command lists.
        // The task blocks for each parallel translate completion, in the order they will be submitted in.
        EnqueueLambda(TEXT("SubmitParallelCommandLists"), [Tasks](FRHICommandListBase& ExecutingCmdList)
        {
            TArray<IRHIPlatformCommandList*> AllCmdLists;

            for (FTask& Task : Tasks)
            {
                if (!Task.Event->IsComplete())
                {
                    SCOPE_CYCLE_COUNTER(STAT_ParallelTranslateWait);
                    Task.Event->Wait();
                }

                AllCmdLists.Append(Task.OutCmdLists);
                ExecutingCmdList.PersistentState.Stats.Ptr->Accumulate(Task.Stats);

                Task.~FTask();
            }

            if (AllCmdLists.Num())
            {
                GDynamicRHI->RHISubmitCommandLists(AllCmdLists, false);
            }
        });
    }
    else
    {
        // Commands will be executed directly on the RHI thread / default contexts
        (...)
    }
}

Reference

剖析虚幻渲染体系(10)- RHI

UE4之RHI图形API封装

01 UE5.1 C++引擎源码分析 渲染 RHI资源