## Unraveling the Code: The Magic of Instrumentation in Python

Santa Cruz de Tenerife, Oct 07, 2023

---

## Who is Federico Mon

[@gnufede](https://twitter.com/gnufede)

---

## 🔬 Instrumentation

The measure of a product's performance, in order to diagnose errors and to write trace information (Wikipedia)

---

## 📊🐶 Datadog's APM

---

## 📊🐶 Datadog's APM Python library

```shell
pip install ddtrace
```

[http://github.com/Datadog/dd-trace-py](http://github.com/Datadog/dd-trace-py)

---

## 🦠 Vuln detection (OWASP)

---

## 🎩 What's inside the hat?

---

## 📜 Story time

---

## our_query.py

```python
def inner_query():
    import sqlite3

    # Define the path to your SQLite database file
    db_file = ":memory:"

    # Predefine both names so the finally block can test them safely
    connection = cursor = None

    try:
        # Establish a connection to the SQLite database
        connection = sqlite3.connect(db_file)

        # Create a cursor object to interact with the database
        cursor = connection.cursor()
        ...
```

---

## our_query.py

```python
        # Write your SQL query
        sql_query = """
        SELECT type, name, sql
        FROM sqlite_master
        WHERE 1;
        """
        print("executing:", sql_query)

        # Execute the SQL query
        cursor.execute(sql_query)
```

---

## our_query.py

```python
        # Fetch the results (if the query returns data)
        results = cursor.fetchall()

    except sqlite3.Error as e:
        print("Error connecting to SQLite:", e)

    finally:
        # Close the cursor and the database connection
        if cursor:
            cursor.close()
        if connection:
            connection.close()
```

---

## 🤚 Measuring by hand

```python
import time

start_time = time.time()
result = func(*args, **kwargs)  # inner_query()
end_time = time.time()
print(
    f"Function took {end_time - start_time} seconds"
)
return result
```

---

## 📇 Techniques

---

## Inheritance (OOP)

---

## Inheritance (OOP)

```python
import time

class MeasuredDatabase(object):
    def inner_query(self):
        # Subclasses override this with the real query
        pass

    def query_wrapper(self):
        start_time = time.time()
        self.inner_query()
        end_time = time.time()
        print(
            f"Function took {end_time - start_time} seconds"
        )
```

---

## Inheritance (OOP)

```python
from interface import MeasuredDatabase

class MeasuredSQLite(MeasuredDatabase):
    def inner_query(self):
        ...
```

---

## Inheritance Pros

- Very straightforward
- Multiple inheritance is "easy"

---

## Inheritance Cons

- Code ownership required
- Only class/method "inheritance" supported (no functions)

---

## Decorators

---

## Decorators

```python
from interface import timing_decorator

@timing_decorator
def inner_query():
    ...
```

---

## Decorators

```python
import time

def timing_decorator(func):
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        end_time = time.time()
        print(
            f"Function took {end_time - start_time} seconds"
        )
        return result
    return wrapper
```

---

## Decorators Pros

- Quite straightforward
- Multiple decorators supported

---

## Decorators Cons

- Code ownership required

---

## More on decorators

- `from functools import wraps`
- `pip install wrapt`

---

## Monkeypatching

---

## Monkeypatching

```python
import time

import our_query

def inner_query_patched():
    start_time = time.time()
    # Call the original function
    result = our_query._inner_query()
    end_time = time.time()
    print(f"Function inner_query took {end_time - start_time} seconds")
    return result

# Keep the original under a new name, then install the patched version
setattr(our_query, "_inner_query", our_query.inner_query)
setattr(our_query, "inner_query", inner_query_patched)

# Call inner_query function to execute the patched function
query_results = our_query.inner_query()
```

---

## Uses monkeypatching techniques

- `import mock`
- `import pdb`

---

## Monkeypatching Pros

- Very low patching overhead

---

## Monkeypatching Cons

- Multiple monkeypatching is messy

---

## More on monkeypatching

- `pip install wrapt`
- `pip install forbiddenfruit`

---

## Tree of Life

---

## Abstract Syntax Tree

---

## AST

```python
>>> astpretty.pprint(ast.parse("start_time = time.time()"))
Module(
    body=[
        Assign(
            targets=[Name(id='start_time', ctx=Store())],
            value=Call(
                func=Attribute(
                    value=Name(id='time', ctx=Load()),
                    attr='time',
                    ctx=Load(),
                ),
                args=[],
                keywords=[],
            ),
        ),
    ],
)
```

---

## AST Patching

```python
import ast

class TimingVisitor(ast.NodeVisitor):
    def visit_FunctionDef(self, node):
        if node.name == "inner_query":
            import_time_code = ast.parse("import time")
            start_time_code = ast.parse("start_time = time.time()")
            end_time_code = ast.parse("end_time = time.time()")
            print_time_code = ast.parse(
                "print(f'Function took {end_time - start_time} seconds to execute')"
            )
            # Inserting at index 0 twice leaves "import time" first,
            # followed by the start_time assignment
            node.body.insert(0, start_time_code.body[0])
            node.body.insert(0, import_time_code.body[0])
            node.body.append(end_time_code.body[0])
            node.body.append(print_time_code.body[0])
        self.generic_visit(node)
```

---

## AST Patching

```python
import inspect

import astor

from our_query import inner_query

if __name__ == "__main__":
    # Apply AST patching to modify the inner_query function
    tree = ast.parse(inspect.getsource(inner_query))
    TimingVisitor().visit(tree)
    altered_code = astor.to_source(tree)
    compiled_code = compile(altered_code, "<string>", "exec")
    # Redefine inner_query in this module from the altered source
    exec(compiled_code, globals())

    # Call the inner_query function to execute the SQL query
    query_results = inner_query()
```

---

## Uses AST techniques

- `pip install flake8`

---

## AST Patching Pros

- Gives access to source code information (e.g. line numbers)
- Allows fine-grained modifications (e.g. changing a try/except)

---

## AST Patching Cons

- Multiple AST patching is messy
- Doesn't work on `.pyc` files
- Introduces patching overhead (at startup)

---

## More on AST

- `import ast`
- `pip install astpretty`
- `pip install astunparse`
- `pip install astor`
- https://greentreesnakes.readthedocs.io/en/latest/

---

## Bytecode

```
>>> def fun():
...     import time
...     start_time=time.time()
...
>>> import dis
```

---

## Bytecode

```
>>> dis.dis(fun)
  2           0 LOAD_CONST               1 (0)
              2 LOAD_CONST               0 (None)
              4 IMPORT_NAME              0 (time)
              6 STORE_FAST               0 (time)

  3           8 LOAD_FAST                0 (time)
             10 LOAD_METHOD              0 (time)
             12 CALL_METHOD              0
             14 STORE_FAST               1 (start_time)
             16 LOAD_CONST               0 (None)
             18 RETURN_VALUE
```

---

## Bytecode Patching

```python
from bytecode import Bytecode, Instr

# Opcode names match the Python 3.9 disassembly shown earlier; some of them
# (e.g. CALL_METHOD) no longer exist in Python 3.11+
top_bytecode = [
    # import time
    Instr("LOAD_CONST", 0),
    Instr("LOAD_CONST", arg=None),
    Instr("IMPORT_NAME", arg="time"),
    Instr("STORE_FAST", arg="time"),
    # start_time = time.time()
    Instr("LOAD_FAST", arg="time"),
    Instr("LOAD_METHOD", arg="time"),
    Instr("CALL_METHOD", arg=0),
    Instr("STORE_FAST", arg="start_time"),
    # Re-emit the two instructions that this list will overwrite
    Instr("LOAD_CONST", arg=0),
    Instr("LOAD_CONST", arg=None),
]
```

---

## Bytecode Patching

```python
bottom_bytecode = [
    # end_time = time.time()
    Instr("LOAD_FAST", arg="time"),
    Instr("LOAD_METHOD", arg="time"),
    Instr("CALL_METHOD", arg=0),
    Instr("STORE_FAST", arg="end_time"),
    # print(f"Function took {end_time - start_time} seconds to execute")
    Instr("LOAD_GLOBAL", arg="print"),
    Instr("LOAD_CONST", arg="Function took "),
    Instr("LOAD_FAST", arg="end_time"),
    Instr("LOAD_FAST", arg="start_time"),
    Instr("BINARY_SUBTRACT"),
    Instr("FORMAT_VALUE", arg=0),
    Instr("LOAD_CONST", arg=" seconds to execute"),
    Instr("BUILD_STRING", arg=3),
    Instr("CALL_FUNCTION", arg=1),
    Instr("POP_TOP"),
    # return None
    Instr("LOAD_CONST", arg=None),
    Instr("RETURN_VALUE"),
]
```

---

## Bytecode Patching
```python
# Function to add timing measurement to a function's bytecode
def add_timing_measurement(func):
    # Get the bytecode of the function
    code = func.__code__
    bytecode = Bytecode.from_code(code)
    # Replace the first two instructions with the timing prologue
    bytecode[:2] = top_bytecode
    # Replace the final two instructions (return None) with the epilogue
    bytecode[-2:] = bottom_bytecode
    func.__code__ = bytecode.to_code()
    return func
```

---

## Bytecode Patching

```python
from our_query import inner_query

# Apply bytecode patching to modify the inner_query function
inner_query = add_timing_measurement(inner_query)

# Call the inner_query function to execute the SQL query
query_results = inner_query()
```

---

## Uses Bytecode techniques

- `pip install coverage`
- `pip install slipcover`

---

## Bytecode Patching Pros

- Allows low-level access and modifications, e.g. inserting a return anywhere

---

## Bytecode Patching Cons

- Introduces patching overhead (low)
- Bytecode patching is quite messy
- Hard to maintain across Python versions

---

## More on Bytecode

- `import dis`
- `pip install bytecode`

---

## The Prestige

---

## ✨ Bytecode auto patcher

```python
from bytecode import Bytecode, Instr

def print_stmt_bytecode(fullname):
    # Equivalent to: print(fullname)
    return [
        Instr("LOAD_NAME", arg="print"),
        Instr("LOAD_CONST", arg=fullname),
        Instr("CALL_FUNCTION", arg=1),
        Instr("POP_TOP"),  # discard print's return value
    ]

def new_get_code(self, fullname):
    orig_code = self._orig_get_code(fullname)
    new_bytecode = Bytecode.from_code(orig_code)
    # Prepend the print statement to the module's code
    new_bytecode[:0] = print_stmt_bytecode(fullname)
    return new_bytecode.to_code()
```

---

## ✨ Bytecode auto patcher

```python
from importlib._bootstrap_external import (
    SourceLoader, SourcelessFileLoader
)

for i in SourceLoader, SourcelessFileLoader:
    setattr(i, "_orig_get_code", i.get_code)
    setattr(i, "get_code", new_get_code)
```

---

## ✨ AST auto patcher

```python
import ast

def patch_ast(module, module_name):
    # Open and read the source code of the module
    with open(module.__file__, "r") as source_file:
        source_code = source_file.read()

    # Parse the source code into an AST
    tree = ast.parse(source_code)

    # Insert your AST manipulation logic here
    insert_ast_print_stmt(tree, module_name)

    # Compile the modified AST into a code object
    return compile(tree, filename=module.__file__, mode="exec")
```

---

## ✨ AST auto patcher

```python
from importlib._bootstrap import _call_with_frames_removed
from importlib._bootstrap_external import SourceFileLoader

def exec_module(self, module):
    # Patch the AST here
    code = patch_ast(module, self.name)
    # Execute the modified code
    # within the module's namespace
    _call_with_frames_removed(exec, code, module.__dict__)

setattr(SourceFileLoader, "exec_module", exec_module)
```

---

## 📚 Sitecustomize

https://docs.python.org/3.10/library/site.html

> This module is automatically imported during initialization. [...]

> After these path manipulations, an attempt is made to import a module named sitecustomize, which can perform arbitrary site-specific customizations.

---

## 📚 PYTHONPATH

https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH

> Augment the default search path for module files. The format is the same as the shell’s PATH: one or more directory pathnames separated by os.pathsep (e.g. colons on Unix or semicolons on Windows).

---

## ✨ All together now

```shell
$ echo "import auto_patcher" > sitecustomize.py
$ PYTHONPATH=$PYTHONPATH:. python
Python 3.9.9 (main, Jun 22 2022, 12:48:18)
[Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
rlcompleter
>>> import pdb
string
cmd
fnmatch
opcode
dis
token
[...]
```

---

## Too much information

---

## Fede's Recommendations

---

1. For your own code, use decorators and OOP
2. For others' code, use monkeypatching
3. If you need to modify the source code itself, use AST patching
4. Avoid bytecode patching if you can

---

## Thanks!

[We're hiring: datadoghq.com/jobs-engineering](https://www.datadoghq.com/jobs-engineering/)

---

## Thanks!

[http://github.com/gnufede/pycones2023-the-magic-behind-python-instrumentation](https://github.com/gnufede/pycones2023-the-magic-behind-python-instrumentation)
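
---

## Appendix: recommendations 1 and 2, sketched

A minimal sketch of the first two recommendations, not code from the talk. It assumes `functools.wraps` (listed under "More on decorators") to keep the wrapped function's metadata, and it swaps `time.time()` for `time.perf_counter()`, which is better suited to measuring short intervals. The rows returned by `inner_query` are hypothetical, and `json.dumps` merely stands in for any function you don't own.

```python
import functools
import json
import time


def timing_decorator(func):
    """Wrap func so that every call prints how long it took."""

    @functools.wraps(func)  # keep func.__name__, __doc__, etc. on the wrapper
    def wrapper(*args, **kwargs):
        start_time = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # The finally block runs even if func raises, so failures are timed too
            elapsed = time.perf_counter() - start_time
            print(f"Function {func.__name__} took {elapsed:.6f} seconds")

    return wrapper


# Recommendation 1: code we own can simply be decorated.
@timing_decorator
def inner_query():
    # Hypothetical result rows, standing in for a real database query
    return [("table", "example", "CREATE TABLE example (id)")]


# Recommendation 2: code we don't own gets monkeypatched with the same wrapper.
original_dumps = json.dumps
json.dumps = timing_decorator(json.dumps)

rows = inner_query()
encoded = json.dumps(rows)

# Restore the original so the patch doesn't leak into other code
json.dumps = original_dumps
```

Reusing one wrapper for both cases keeps the measuring logic in a single place; only the way it is attached differs.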