Eliminating common subexpression from machine code

The aim of the CSE algorithm is to eliminate common subexpressions to make machine code compact and remove unnecessary, duplicate code. Let's look at the code in the LLVM trunk to understand how it is implemented. The detailed code is in the lib/CodeGen/MachineCSE.cpp file.

How to do it…

  1. The MachineCSE class runs on a machine function, and hence it should inherit the MachineFunctionPass class. It has various members, such as TargetInstructionInfo, which is used to get information about the target instruction (used in performing CSE); TargetRegisterInfo, which is used to get information about the target register (whether it belongs to a reserved register class, or to more such similar classes; and MachineDominatorTree, which is used to get information about the dominator tree for the machine block:
    class MachineCSE : public MachineFunctionPass {
        const TargetInstrInfo *TII;
        const TargetRegisterInfo *TRI;
        AliasAnalysis *AA;
        MachineDominatorTree *DT;
        MachineRegisterInfo *MRI;
  2. The constructor for this class is defined as follows, which initializes the pass:
    public:
        static char ID; // Pass identification
        MachineCSE() : MachineFunctionPass(ID), LookAheadLimit(5), CurrVN(0) {
          initializeMachineCSEPass(*PassRegistry::getPassRegistry());
        }
  3. The getAnalysisUsage() function determines which passes will run before this pass to get statistics that can be used in this pass:
        void getAnalysisUsage(AnalysisUsage &AU) const override {
          AU.setPreservesCFG();
          MachineFunctionPass::getAnalysisUsage(AU);
          AU.addRequired<AliasAnalysis>();
          AU.addPreservedID(MachineLoopInfoID);
          AU.addRequired<MachineDominatorTree>();
          AU.addPreserved<MachineDominatorTree>();
        }
  4. Declare some helper functions in this pass to check for simple copy propagation and trivially dead definitions, check for the liveness of physical registers and their definition uses, and so on:
      private:
    …..
    …..
    
    bool PerformTrivialCopyPropagation(MachineInstr *MI,
                                       MachineBasicBlock *MBB);
    
    bool isPhysDefTriviallyDead(unsigned Reg,
             MachineBasicBlock::const_iterator I,
             MachineBasicBlock::const_iterator E) const;
    
    bool hasLivePhysRegDefUses(const MachineInstr *MI,
                         const MachineBasicBlock *MBB,
                       SmallSet<unsigned,8> &PhysRefs,
                  SmallVectorImpl<unsigned> &PhysDefs,
                              bool &PhysUseDef) const;
    
    bool PhysRegDefsReach(MachineInstr *CSMI, MachineInstr *MI,
                          SmallSet<unsigned,8> &PhysRefs, SmallVectorImpl<unsigned> &PhysDefs,
                          bool &NonLocal) const;
  5. Some more helper functions help to determine the legality and profitability of the expression being a CSE candidate:
        bool isCSECandidate(MachineInstr *MI);
        bool isProfitableToCSE(unsigned CSReg, unsigned Reg,
                         MachineInstr *CSMI, MachineInstr *MI);
    
    Actual CSE performing function
        bool PerformCSE(MachineDomTreeNode *Node);

Let's look at the actual implementation of a CSE function:

  1. The runOnMachineFunction() function is called first as the pass runs:
    bool MachineCSE::runOnMachineFunction(MachineFunction &MF){
      if (skipOptnoneFunction(*MF.getFunction()))
        return false;
    
      TII = MF.getSubtarget().getInstrInfo();
      TRI = MF.getSubtarget().getRegisterInfo();
      MRI = &MF.getRegInfo();
      AA = &getAnalysis<AliasAnalysis>();
      DT = &getAnalysis<MachineDominatorTree>();
      return PerformCSE(DT->getRootNode());
    }
  2. The PerformCSE() function is called next. It takes the root node of the DomTree, performs a DFS walk on the DomTree (starting from the root node), and populates a work list consisting of the nodes of the DomTree. After the DFS traverses through the DomTree, it processes the MachineBasicBlock class corresponding to each node in the work list:
    bool MachineCSE::PerformCSE(MachineDomTreeNode *Node) {
      SmallVector<MachineDomTreeNode*, 32> Scopes;
      SmallVector<MachineDomTreeNode*, 8> WorkList;
      DenseMap<MachineDomTreeNode*, unsigned> OpenChildren;
    
      CurrVN = 0;
    // DFS to populate worklist
      WorkList.push_back(Node);
      do {
        Node = WorkList.pop_back_val();
        Scopes.push_back(Node);
        const std::vector<MachineDomTreeNode*> &Children = Node->getChildren();
        unsigned NumChildren = Children.size();
        OpenChildren[Node] = NumChildren;
        for (unsigned i = 0; i != NumChildren; ++i) {
          MachineDomTreeNode *Child = Children[i];
          WorkList.push_back(Child);
        }
      } while (!WorkList.empty());
    
      // perform CSE.
      bool Changed = false;
      for (unsigned i = 0, e = Scopes.size(); i != e; ++i) {
        MachineDomTreeNode *Node = Scopes[i];
        MachineBasicBlock *MBB = Node->getBlock();
        EnterScope(MBB);
        Changed |= ProcessBlock(MBB);
        ExitScopeIfDone(Node, OpenChildren);
      }
    
      return Changed;
    }
  3. The next important function is the ProcessBlock() function, which acts on the machine basic block. The instructions in the MachineBasicBlock class are iterated and checked for legality and profitability if they can be a CSE candidate:
    bool MachineCSE::ProcessBlock(MachineBasicBlock *MBB) {
      bool Changed = false;
    
      SmallVector<std::pair<unsigned, unsigned>, 8> CSEPairs;
      SmallVector<unsigned, 2> ImplicitDefsToUpdate;
    
    // Iterate over each Machine instructions in the MachineBasicBlock
      for (MachineBasicBlock::iterator I = MBB->begin(), E = MBB->end(); I != E; ) {
        MachineInstr *MI = &*I;
        ++I;
    
    // Check if this can be a CSE candidate.
        if (!isCSECandidate(MI))
          continue;
    
    
        bool FoundCSE = VNT.count(MI);
        if (!FoundCSE) {
          // Using trivial copy propagation to find more CSE opportunities.
          if (PerformTrivialCopyPropagation(MI, MBB)) {
            Changed = true;
    
            // After coalescing MI itself may become a copy.
            if (MI->isCopyLike())
              continue;
    
            // Try again to see if CSE is possible.
            FoundCSE = VNT.count(MI);
          }
        }
    
        bool Commuted = false;
        if (!FoundCSE && MI->isCommutable()) {
          MachineInstr *NewMI = TII->commuteInstruction(MI);
          if (NewMI) {
            Commuted = true;
            FoundCSE = VNT.count(NewMI);
            if (NewMI != MI) {
              // New instruction. It doesn't need to be kept.
              NewMI->eraseFromParent();
              Changed = true;
            } else if (!FoundCSE)
              // MI was changed but it didn't help, commute it back!
              (void)TII->commuteInstruction(MI);
          }
        }
    
        // If the instruction defines physical registers and the values *may* be
        // used, then it's not safe to replace it with a common subexpression.
        // It's also not safe if the instruction uses physical registers.
        bool CrossMBBPhysDef = false;
        SmallSet<unsigned, 8> PhysRefs;
        SmallVector<unsigned, 2> PhysDefs;
        bool PhysUseDef = false;
    
    // Check if this instruction has been marked for CSE. Check if it is using physical register, if yes then mark as non-CSE candidate
     if (FoundCSE && hasLivePhysRegDefUses(MI, MBB, PhysRefs,
                                                    PhysDefs, PhysUseDef)) {
          FoundCSE = false;
    …
    …
        }
    
        if (!FoundCSE) {
          VNT.insert(MI, CurrVN++);
          Exps.push_back(MI);
          continue;
        }
    
        // Finished job of determining if there exists a common subexpression.
      // Found a common subexpression, eliminate it.
        unsigned CSVN = VNT.lookup(MI);
        MachineInstr *CSMI = Exps[CSVN];
        DEBUG(dbgs() << "Examining: " << *MI);
        DEBUG(dbgs() << "*** Found a common subexpression: " << *CSMI);
    
        // Check if it's profitable to perform this CSE.
        bool DoCSE = true;
        unsigned NumDefs = MI->getDesc().getNumDefs() +
                           MI->getDesc().getNumImplicitDefs();
    
        for (unsigned i = 0, e = MI->getNumOperands(); NumDefs && i != e; ++i) {
          MachineOperand &MO = MI->getOperand(i);
          if (!MO.isReg() || !MO.isDef())
            continue;
          unsigned OldReg = MO.getReg();
          unsigned NewReg = CSMI->getOperand(i).getReg();
    
          // Go through implicit defs of CSMI and MI, if a def is not dead at MI,
          // we should make sure it is not dead at CSMI.
          if (MO.isImplicit() && !MO.isDead() && CSMI->getOperand(i).isDead())
            ImplicitDefsToUpdate.push_back(i);
          if (OldReg == NewReg) {
            --NumDefs;
            continue;
          }
    
          assert(TargetRegisterInfo::isVirtualRegister(OldReg) &&
                 TargetRegisterInfo::isVirtualRegister(NewReg) &&
                 "Do not CSE physical register defs!");
    
          if (!isProfitableToCSE(NewReg, OldReg, CSMI, MI)) {
            DEBUG(dbgs() << "*** Not profitable, avoid CSE!
    ");
            DoCSE = false;
            break;
          }
    
          // Don't perform CSE if the result of the old instruction cannot exist
          // within the register class of the new instruction.
          const TargetRegisterClass *OldRC = MRI->getRegClass(OldReg);
          if (!MRI->constrainRegClass(NewReg, OldRC)) {
            DEBUG(dbgs() << "*** Not the same register class, avoid CSE!
    ");
            DoCSE = false;
            break;
          }
    
          CSEPairs.push_back(std::make_pair(OldReg, NewReg));
          --NumDefs;
        }
    
        // Actually perform the elimination.
        if (DoCSE) {
          for (unsigned i = 0, e = CSEPairs.size(); i != e; ++i) {
            MRI->replaceRegWith(CSEPairs[i].first, CSEPairs[i].second);
            MRI->clearKillFlags(CSEPairs[i].second);
          }
    
          // Go through implicit defs of CSMI and MI, if a def is not dead at MI,
          // we should make sure it is not dead at CSMI.
          for (unsigned i = 0, e = ImplicitDefsToUpdate.size(); i != e; ++i)
            CSMI->getOperand(ImplicitDefsToUpdate[i]).setIsDead(false);
    
          if (CrossMBBPhysDef) {
            // Add physical register defs now coming in from a predecessor to MBB
            // livein list.
            while (!PhysDefs.empty()) {
              unsigned LiveIn = PhysDefs.pop_back_val();
              if (!MBB->isLiveIn(LiveIn))
                MBB->addLiveIn(LiveIn);
            }
            ++NumCrossBBCSEs;
          }
    
          MI->eraseFromParent();
          ++NumCSEs;
          if (!PhysRefs.empty())
            ++NumPhysCSEs;
          if (Commuted)
            ++NumCommutes;
          Changed = true;
        } else {
          VNT.insert(MI, CurrVN++);
          Exps.push_back(MI);
        }
        CSEPairs.clear();
        ImplicitDefsToUpdate.clear();
      }
    
      return Changed;
    }
  4. Let's also look into the legality and profitability functions to determine the CSE candidates:
    bool MachineCSE::isCSECandidate(MachineInstr *MI) {
    // If Machine Instruction is PHI, or inline ASM or implicit defs, it is not a candidate for CSE.
    
      if (MI->isPosition() || MI->isPHI() || MI->isImplicitDef() || MI->isKill() ||
          MI->isInlineAsm() || MI->isDebugValue())
        return false;
    
      // Ignore copies.
      if (MI->isCopyLike())
        return false;
    
      // Ignore instructions that we obviously can't move.
      if (MI->mayStore() || MI->isCall() || MI->isTerminator() || MI->hasUnmodeledSideEffects())
        return false;
    
      if (MI->mayLoad()) {
        // Okay, this instruction does a load. As a refinement, we allow the target
        // to decide whether the loaded value is actually a constant. If so, we can
        // actually use it as a load.
        if (!MI->isInvariantLoad(AA))
          return false;
      }
      return true;
    }
  5. The profitability function is written as follows:
    bool MachineCSE::isProfitableToCSE(unsigned CSReg, unsigned Reg,
      MachineInstr *CSMI, MachineInstr *MI) {
    
      // If CSReg is used at all uses of Reg, CSE should not increase register
      // pressure of CSReg.
      bool MayIncreasePressure = true;
      if (TargetRegisterInfo::isVirtualRegister(CSReg) &&
          TargetRegisterInfo::isVirtualRegister(Reg)) {
        MayIncreasePressure = false;
        SmallPtrSet<MachineInstr*, 8> CSUses;
        for (MachineInstr &MI : MRI->use_nodbg_instructions(CSReg)) {
          CSUses.insert(&MI);
        }
        for (MachineInstr &MI : MRI->use_nodbg_instructions(Reg)) {
          if (!CSUses.count(&MI)) {
            MayIncreasePressure = true;
            break;
          }
        }
      }
      if (!MayIncreasePressure) return true;
    
      // Heuristics #1: Don't CSE "cheap" computation if the def is not local or in
      // an immediate predecessor. We don't want to increase register pressure and
      // end up causing other computation to be spilled.
      if (TII->isAsCheapAsAMove(MI)) {
        MachineBasicBlock *CSBB = CSMI->getParent();
        MachineBasicBlock *BB = MI->getParent();
        if (CSBB != BB && !CSBB->isSuccessor(BB))
          return false;
      }
    
      // Heuristics #2: If the expression doesn't not use a vr and the only use
      // of the redundant computation are copies, do not cse.
      bool HasVRegUse = false;
      for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
        const MachineOperand &MO = MI->getOperand(i);
        if (MO.isReg() && MO.isUse() &&
            TargetRegisterInfo::isVirtualRegister(MO.getReg())) {
          HasVRegUse = true;
          break;
        }
      }
      if (!HasVRegUse) {
        bool HasNonCopyUse = false;
        for (MachineInstr &MI : MRI->use_nodbg_instructions(Reg)) {
          // Ignore copies.
          if (!MI.isCopyLike()) {
            HasNonCopyUse = true;
            break;
          }
        }
        if (!HasNonCopyUse)
          return false;
      }
    
      // Heuristics #3: If the common subexpression is used by PHIs, do not reuse
      // it unless the defined value is already used in the BB of the new use.
      bool HasPHI = false;
      SmallPtrSet<MachineBasicBlock*, 4> CSBBs;
      for (MachineInstr &MI : MRI->use_nodbg_instructions(CSReg)) {
        HasPHI |= MI.isPHI();
        CSBBs.insert(MI.getParent());
      }
    
      if (!HasPHI)
        return true;
      return CSBBs.count(MI->getParent());
    }

How it works…

The MachineCSE pass runs on a machine function. It gets the DomTree information and then traverses the DomTree in the DFS way, creating a work list of nodes that are essentially MachineBasicBlocks. It then processes each block for CSE. In each block, it iterates through all the instructions and checks whether any instruction is a candidate for CSE. Then it checks whether it is profitable to eliminate the identified expression. Once it has found that the identified CSE is profitable to eliminate, it eliminates the MachineInstruction class from the MachineBasicBlock class. It also performs a simple copy propagation of the machine instruction. In some cases, the MachineInstruction may not be a candidate for CSE in its initial run, but may become one after copy propagation.

See more

To see more machine code optimization in the SSA form, look into the implementation of the machine dead code elimination pass in the lib/CodeGen/DeadMachineInstructionElim.cpp file.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.137.217.220