diff options
Diffstat (limited to 'docs/tutorial/LangImpl8.rst')
-rw-r--r-- | docs/tutorial/LangImpl8.rst | 462 |
1 files changed, 0 insertions, 462 deletions
diff --git a/docs/tutorial/LangImpl8.rst b/docs/tutorial/LangImpl8.rst deleted file mode 100644 index 3b0f443f08d54..0000000000000 --- a/docs/tutorial/LangImpl8.rst +++ /dev/null @@ -1,462 +0,0 @@ -====================================== -Kaleidoscope: Adding Debug Information -====================================== - -.. contents:: - :local: - -Chapter 8 Introduction -====================== - -Welcome to Chapter 8 of the "`Implementing a language with -LLVM <index.html>`_" tutorial. In chapters 1 through 7, we've built a -decent little programming language with functions and variables. -What happens if something goes wrong though, how do you debug your -program? - -Source level debugging uses formatted data that helps a debugger -translate from binary and the state of the machine back to the -source that the programmer wrote. In LLVM we generally use a format -called `DWARF <http://dwarfstd.org>`_. DWARF is a compact encoding -that represents types, source locations, and variable locations. - -The short summary of this chapter is that we'll go through the -various things you have to add to a programming language to -support debug info, and how you translate that into DWARF. - -Caveat: For now we can't debug via the JIT, so we'll need to compile -our program down to something small and standalone. As part of this -we'll make a few modifications to the running of the language and -how programs are compiled. This means that we'll have a source file -with a simple program written in Kaleidoscope rather than the -interactive JIT. It does involve a limitation that we can only -have one "top level" command at a time to reduce the number of -changes necessary. - -Here's the sample program we'll be compiling: - -.. code-block:: python - - def fib(x) - if x < 3 then - 1 - else - fib(x-1)+fib(x-2); - - fib(10) - - -Why is this a hard problem? -=========================== - -Debug information is a hard problem for a few different reasons - mostly -centered around optimized code. First, optimization makes keeping source -locations more difficult. In LLVM IR we keep the original source location -for each IR level instruction on the instruction. Optimization passes -should keep the source locations for newly created instructions, but merged -instructions only get to keep a single location - this can cause jumping -around when stepping through optimized programs. Secondly, optimization -can move variables in ways that are either optimized out, shared in memory -with other variables, or difficult to track. For the purposes of this -tutorial we're going to avoid optimization (as you'll see with one of the -next sets of patches). - -Ahead-of-Time Compilation Mode -============================== - -To highlight only the aspects of adding debug information to a source -language without needing to worry about the complexities of JIT debugging -we're going to make a few changes to Kaleidoscope to support compiling -the IR emitted by the front end into a simple standalone program that -you can execute, debug, and see results. - -First we make our anonymous function that contains our top level -statement be our "main": - -.. code-block:: udiff - - - auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>()); - + auto Proto = llvm::make_unique<PrototypeAST>("main", std::vector<std::string>()); - -just with the simple change of giving it a name. - -Then we're going to remove the command line code wherever it exists: - -.. code-block:: udiff - - @@ -1129,7 +1129,6 @@ static void HandleTopLevelExpression() { - /// top ::= definition | external | expression | ';' - static void MainLoop() { - while (1) { - - fprintf(stderr, "ready> "); - switch (CurTok) { - case tok_eof: - return; - @@ -1184,7 +1183,6 @@ int main() { - BinopPrecedence['*'] = 40; // highest. - - // Prime the first token. - - fprintf(stderr, "ready> "); - getNextToken(); - -Lastly we're going to disable all of the optimization passes and the JIT so -that the only thing that happens after we're done parsing and generating -code is that the llvm IR goes to standard error: - -.. code-block:: udiff - - @@ -1108,17 +1108,8 @@ static void HandleExtern() { - static void HandleTopLevelExpression() { - // Evaluate a top-level expression into an anonymous function. - if (auto FnAST = ParseTopLevelExpr()) { - - if (auto *FnIR = FnAST->codegen()) { - - // We're just doing this to make sure it executes. - - TheExecutionEngine->finalizeObject(); - - // JIT the function, returning a function pointer. - - void *FPtr = TheExecutionEngine->getPointerToFunction(FnIR); - - - - // Cast it to the right type (takes no arguments, returns a double) so we - - // can call it as a native function. - - double (*FP)() = (double (*)())(intptr_t)FPtr; - - // Ignore the return value for this. - - (void)FP; - + if (!F->codegen()) { - + fprintf(stderr, "Error generating code for top level expr"); - } - } else { - // Skip token for error recovery. - @@ -1439,11 +1459,11 @@ int main() { - // target lays out data structures. - TheModule->setDataLayout(TheExecutionEngine->getDataLayout()); - OurFPM.add(new DataLayoutPass()); - +#if 0 - OurFPM.add(createBasicAliasAnalysisPass()); - // Promote allocas to registers. - OurFPM.add(createPromoteMemoryToRegisterPass()); - @@ -1218,7 +1210,7 @@ int main() { - OurFPM.add(createGVNPass()); - // Simplify the control flow graph (deleting unreachable blocks, etc). - OurFPM.add(createCFGSimplificationPass()); - - - + #endif - OurFPM.doInitialization(); - - // Set the global so the code gen can use this. - -This relatively small set of changes get us to the point that we can compile -our piece of Kaleidoscope language down to an executable program via this -command line: - -.. code-block:: bash - - Kaleidoscope-Ch8 < fib.ks | & clang -x ir - - -which gives an a.out/a.exe in the current working directory. - -Compile Unit -============ - -The top level container for a section of code in DWARF is a compile unit. -This contains the type and function data for an individual translation unit -(read: one file of source code). So the first thing we need to do is -construct one for our fib.ks file. - -DWARF Emission Setup -==================== - -Similar to the ``IRBuilder`` class we have a -`DIBuilder <http://llvm.org/doxygen/classllvm_1_1DIBuilder.html>`_ class -that helps in constructing debug metadata for an llvm IR file. It -corresponds 1:1 similarly to ``IRBuilder`` and llvm IR, but with nicer names. -Using it does require that you be more familiar with DWARF terminology than -you needed to be with ``IRBuilder`` and ``Instruction`` names, but if you -read through the general documentation on the -`Metadata Format <http://llvm.org/docs/SourceLevelDebugging.html>`_ it -should be a little more clear. We'll be using this class to construct all -of our IR level descriptions. Construction for it takes a module so we -need to construct it shortly after we construct our module. We've left it -as a global static variable to make it a bit easier to use. - -Next we're going to create a small container to cache some of our frequent -data. The first will be our compile unit, but we'll also write a bit of -code for our one type since we won't have to worry about multiple typed -expressions: - -.. code-block:: c++ - - static DIBuilder *DBuilder; - - struct DebugInfo { - DICompileUnit *TheCU; - DIType *DblTy; - - DIType *getDoubleTy(); - } KSDbgInfo; - - DIType *DebugInfo::getDoubleTy() { - if (DblTy.isValid()) - return DblTy; - - DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float); - return DblTy; - } - -And then later on in ``main`` when we're constructing our module: - -.. code-block:: c++ - - DBuilder = new DIBuilder(*TheModule); - - KSDbgInfo.TheCU = DBuilder->createCompileUnit( - dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0); - -There are a couple of things to note here. First, while we're producing a -compile unit for a language called Kaleidoscope we used the language -constant for C. This is because a debugger wouldn't necessarily understand -the calling conventions or default ABI for a language it doesn't recognize -and we follow the C ABI in our llvm code generation so it's the closest -thing to accurate. This ensures we can actually call functions from the -debugger and have them execute. Secondly, you'll see the "fib.ks" in the -call to ``createCompileUnit``. This is a default hard coded value since -we're using shell redirection to put our source into the Kaleidoscope -compiler. In a usual front end you'd have an input file name and it would -go there. - -One last thing as part of emitting debug information via DIBuilder is that -we need to "finalize" the debug information. The reasons are part of the -underlying API for DIBuilder, but make sure you do this near the end of -main: - -.. code-block:: c++ - - DBuilder->finalize(); - -before you dump out the module. - -Functions -========= - -Now that we have our ``Compile Unit`` and our source locations, we can add -function definitions to the debug info. So in ``PrototypeAST::codegen()`` we -add a few lines of code to describe a context for our subprogram, in this -case the "File", and the actual definition of the function itself. - -So the context: - -.. code-block:: c++ - - DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(), - KSDbgInfo.TheCU.getDirectory()); - -giving us an DIFile and asking the ``Compile Unit`` we created above for the -directory and filename where we are currently. Then, for now, we use some -source locations of 0 (since our AST doesn't currently have source location -information) and construct our function definition: - -.. code-block:: c++ - - DIScope *FContext = Unit; - unsigned LineNo = 0; - unsigned ScopeLine = 0; - DISubprogram *SP = DBuilder->createFunction( - FContext, Name, StringRef(), Unit, LineNo, - CreateFunctionType(Args.size(), Unit), false /* internal linkage */, - true /* definition */, ScopeLine, DINode::FlagPrototyped, false); - F->setSubprogram(SP); - -and we now have an DISubprogram that contains a reference to all of our -metadata for the function. - -Source Locations -================ - -The most important thing for debug information is accurate source location - -this makes it possible to map your source code back. We have a problem though, -Kaleidoscope really doesn't have any source location information in the lexer -or parser so we'll need to add it. - -.. code-block:: c++ - - struct SourceLocation { - int Line; - int Col; - }; - static SourceLocation CurLoc; - static SourceLocation LexLoc = {1, 0}; - - static int advance() { - int LastChar = getchar(); - - if (LastChar == '\n' || LastChar == '\r') { - LexLoc.Line++; - LexLoc.Col = 0; - } else - LexLoc.Col++; - return LastChar; - } - -In this set of code we've added some functionality on how to keep track of the -line and column of the "source file". As we lex every token we set our current -current "lexical location" to the assorted line and column for the beginning -of the token. We do this by overriding all of the previous calls to -``getchar()`` with our new ``advance()`` that keeps track of the information -and then we have added to all of our AST classes a source location: - -.. code-block:: c++ - - class ExprAST { - SourceLocation Loc; - - public: - ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {} - virtual ~ExprAST() {} - virtual Value* codegen() = 0; - int getLine() const { return Loc.Line; } - int getCol() const { return Loc.Col; } - virtual raw_ostream &dump(raw_ostream &out, int ind) { - return out << ':' << getLine() << ':' << getCol() << '\n'; - } - -that we pass down through when we create a new expression: - -.. code-block:: c++ - - LHS = llvm::make_unique<BinaryExprAST>(BinLoc, BinOp, std::move(LHS), - std::move(RHS)); - -giving us locations for each of our expressions and variables. - -From this we can make sure to tell ``DIBuilder`` when we're at a new source -location so it can use that when we generate the rest of our code and make -sure that each instruction has source location information. We do this -by constructing another small function: - -.. code-block:: c++ - - void DebugInfo::emitLocation(ExprAST *AST) { - DIScope *Scope; - if (LexicalBlocks.empty()) - Scope = TheCU; - else - Scope = LexicalBlocks.back(); - Builder.SetCurrentDebugLocation( - DebugLoc::get(AST->getLine(), AST->getCol(), Scope)); - } - -that both tells the main ``IRBuilder`` where we are, but also what scope -we're in. Since we've just created a function above we can either be in -the main file scope (like when we created our function), or now we can be -in the function scope we just created. To represent this we create a stack -of scopes: - -.. code-block:: c++ - - std::vector<DIScope *> LexicalBlocks; - std::map<const PrototypeAST *, DIScope *> FnScopeMap; - -and keep a map of each function to the scope that it represents (an -DISubprogram is also an DIScope). - -Then we make sure to: - -.. code-block:: c++ - - KSDbgInfo.emitLocation(this); - -emit the location every time we start to generate code for a new AST, and -also: - -.. code-block:: c++ - - KSDbgInfo.FnScopeMap[this] = SP; - -store the scope (function) when we create it and use it: - - KSDbgInfo.LexicalBlocks.push_back(&KSDbgInfo.FnScopeMap[Proto]); - -when we start generating the code for each function. - -also, don't forget to pop the scope back off of your scope stack at the -end of the code generation for the function: - -.. code-block:: c++ - - // Pop off the lexical block for the function since we added it - // unconditionally. - KSDbgInfo.LexicalBlocks.pop_back(); - -Variables -========= - -Now that we have functions, we need to be able to print out the variables -we have in scope. Let's get our function arguments set up so we can get -decent backtraces and see how our functions are being called. It isn't -a lot of code, and we generally handle it when we're creating the -argument allocas in ``PrototypeAST::CreateArgumentAllocas``. - -.. code-block:: c++ - - DIScope *Scope = KSDbgInfo.LexicalBlocks.back(); - DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(), - KSDbgInfo.TheCU.getDirectory()); - DILocalVariable D = DBuilder->createParameterVariable( - Scope, Args[Idx], Idx + 1, Unit, Line, KSDbgInfo.getDoubleTy(), true); - - DBuilder->insertDeclare(Alloca, D, DBuilder->createExpression(), - DebugLoc::get(Line, 0, Scope), - Builder.GetInsertBlock()); - -Here we're doing a few things. First, we're grabbing our current scope -for the variable so we can say what range of code our variable is valid -through. Second, we're creating the variable, giving it the scope, -the name, source location, type, and since it's an argument, the argument -index. Third, we create an ``lvm.dbg.declare`` call to indicate at the IR -level that we've got a variable in an alloca (and it gives a starting -location for the variable), and setting a source location for the -beginning of the scope on the declare. - -One interesting thing to note at this point is that various debuggers have -assumptions based on how code and debug information was generated for them -in the past. In this case we need to do a little bit of a hack to avoid -generating line information for the function prologue so that the debugger -knows to skip over those instructions when setting a breakpoint. So in -``FunctionAST::CodeGen`` we add a couple of lines: - -.. code-block:: c++ - - // Unset the location for the prologue emission (leading instructions with no - // location in a function are considered part of the prologue and the debugger - // will run past them when breaking on a function) - KSDbgInfo.emitLocation(nullptr); - -and then emit a new location when we actually start generating code for the -body of the function: - -.. code-block:: c++ - - KSDbgInfo.emitLocation(Body); - -With this we have enough debug information to set breakpoints in functions, -print out argument variables, and call functions. Not too bad for just a -few simple lines of code! - -Full Code Listing -================= - -Here is the complete code listing for our running example, enhanced with -debug information. To build this example, use: - -.. code-block:: bash - - # Compile - clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy - # Run - ./toy - -Here is the code: - -.. literalinclude:: ../../examples/Kaleidoscope/Chapter8/toy.cpp - :language: c++ - -`Next: Conclusion and other useful LLVM tidbits <LangImpl9.html>`_ - |