Is this where multimodal models perform better?
