{"id":3753,"date":"2022-02-09T12:02:14","date_gmt":"2022-02-09T12:02:14","guid":{"rendered":"https:\/\/wiki.thomasandsofia.com\/?p=3753"},"modified":"2022-02-09T12:02:14","modified_gmt":"2022-02-09T12:02:14","slug":"bqfbd-execution-plan-of-bigquery","status":"publish","type":"post","link":"https:\/\/wiki.thomasandsofia.com\/?p=3753","title":{"rendered":"BQFBD &#8211; Execution Plan of BigQuery"},"content":{"rendered":"<p><a href=\"\/bigquery-for-big-data\/\">Main Menu<\/a><\/p>\n<h1>Section 7: Execution Plan of BigQuery<\/h1>\n<h2>31. How BQ creates execution plan of a query<\/h2>\n<ul>\n<li>How does BQ decide what order to process query statements\n<ul>\n<li>What processes need to be performed before others<\/li>\n<li>What processes can be ran in parallel<\/li>\n<li>This is similar to EXPLAIN statements<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><strong>Example:<\/strong><\/p>\n<p>This query calculates the total scores of all employee ids present in both tables<\/p>\n<pre>select c.id, max(c.scores) as scr from (\r\n  select id, name from emp_table as emp\r\n  INNER JOIN (select id, scores from score_table) as scr\r\n  ON emp_id = scr.id\r\n) as c\r\nGROUP BY id<\/pre>\n<p><strong>Steps<\/strong><\/p>\n<ol>\n<li>Select required data from emp_table<\/li>\n<li>Select from score_table<\/li>\n<li>Join them together and aggregate the data<\/li>\n<\/ol>\n<p><strong>Optimized Execution Plan<\/strong><\/p>\n<ol>\n<li>Run steps 1 and 2 in parallel<\/li>\n<li>Run step 3<\/li>\n<\/ol>\n<p><strong>How<\/strong><\/p>\n<ul>\n<li>Steps communicate using in-memory shuffle architecture\n<ul>\n<li>BQ executes all queries completely in memory<\/li>\n<\/ul>\n<\/li>\n<li><strong>Shuffle<\/strong> is a key ingredient\n<ul>\n<li>required for execution of large and complex joins, aggregations and analytic operations<\/li>\n<li>Traditional distributed processing systems, such as MapReduce in (and?) Hadoop\n<ul>\n<li>Various mappers output was shuffled across nodes and placed under 1 node where the reduce step is executed.<\/li>\n<\/ul>\n<\/li>\n<li>Shuffling requires a lot of disk R\/W operations while consuming a lot of network bandwidth so it is slow and can be a bottleneck\n<ul>\n<li>The solution to this is in-memory shuffling<\/li>\n<\/ul>\n<\/li>\n<li>BQ stores intermediate data produced from various stages of query processing in memory in a set of nodes that are dedicated to hosting remote memory\n<ul>\n<li>This idea is not new and is common in many systems<\/li>\n<\/ul>\n<\/li>\n<li>BQ also uses memory for shuffle and repartitioning operations\n<ul>\n<li>This is different from MapReduce styles for shuffle\n<ul>\n<li>In MapReduce style shuffles, the rows in the repartitioned data can only be used after all rows have been shuffled and sorted<\/li>\n<li>In BQ, each shuffled row can be consumed by BQ workers as soon as it is created by the producers.<\/li>\n<li>This makes it possible to execute distributed operations in a pipeline.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>BQ breaks the declarative SQL statement into granular sets of stages and designs an optimized graph of execution.\n<ul>\n<li>This is dynamic and can be modified while the query is in flight.<\/li>\n<li>new stages can be introduced while a query is running\n<ul>\n<li>These stages are typically labeled Repartition stages<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>Viewing the query plan<\/h3>\n<ul>\n<li>Run a query<\/li>\n<li>Click the Executive Details option<\/li>\n<\/ul>\n<h3>Deep Dive 5:00<\/h3>\n<p>There is a LOT of good information here.\u00a0 Recommend revisiting<\/p>\n<ul>\n<li>Elapsed time = total execution time<\/li>\n<li>Slots required\n<ul>\n<li>Slots run in parallel<\/li>\n<li>Think of slots as tiny processors, memory and IO<\/li>\n<\/ul>\n<\/li>\n<li>Bytes shuffled, lower the better<\/li>\n<li>Bytes to disk = overflow<\/li>\n<\/ul>\n<h2>32. Understanding Execution Plan in UI Dashboard<\/h2>\n<p>https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22742749#questions<\/p>\n<p><a href=\"https:\/\/wiki.thomasandsofia.com\/wp-content\/uploads\/2022\/02\/stagetimings-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-3755\" src=\"https:\/\/wiki.thomasandsofia.com\/wp-content\/uploads\/2022\/02\/stagetimings-1.png\" alt=\"\" width=\"1000\" height=\"411\" srcset=\"https:\/\/wiki.thomasandsofia.com\/wp-content\/uploads\/2022\/02\/stagetimings-1.png 1000w, https:\/\/wiki.thomasandsofia.com\/wp-content\/uploads\/2022\/02\/stagetimings-1-300x123.png 300w, https:\/\/wiki.thomasandsofia.com\/wp-content\/uploads\/2022\/02\/stagetimings-1-768x316.png 768w, https:\/\/wiki.thomasandsofia.com\/wp-content\/uploads\/2022\/02\/stagetimings-1-150x62.png 150w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/a><\/p>\n<ul>\n<li>Many stages can be processed in parallel<\/li>\n<li>Some stages have to wait for others to complete before they can start.<\/li>\n<li>At each stage, workers spend their time in various actions, read, write, compute, etc.<\/li>\n<li>Can view the average time and long-tail slowest worker performance for a given classification<\/li>\n<li>Ave and Max further broken down into Absolute and Relative representations<\/li>\n<li>UI shows Relative<\/li>\n<li>If Average and Max are the same, stage was only using 1 worker.<\/li>\n<li>Wait\n<ul>\n<li>Waiting for IO access to source<\/li>\n<li>Waiting for another stage<\/li>\n<\/ul>\n<\/li>\n<li>Read\n<ul>\n<li>Includes filtering<\/li>\n<\/ul>\n<\/li>\n<li>Compute\n<ul>\n<li>Combines Compute and Aggregate<\/li>\n<li>Compute example: round(divide(multiply(subtract($2,32),5),9),1)<\/li>\n<li>Aggregate does grouping operations<\/li>\n<\/ul>\n<\/li>\n<li>Write\n<ul>\n<li>Output operations<\/li>\n<\/ul>\n<\/li>\n<li>The Execution Plan can help you understand which stages are dominating the resource utilization\n<ul>\n<li>JOIN stage generating far more output rows than input\n<ul>\n<li>This would indicate an opportunity to apply a filter before the join<\/li>\n<\/ul>\n<\/li>\n<li>By viewing the number of bytes, you can identify if the data is getting skewed somewhere.<\/li>\n<li>Good for troubleshooting\n<ul>\n<li>Queries can fail mid-execution<\/li>\n<li>Use this to identify at which stage the failure occurred.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Main Menu Section 7: Execution Plan of BigQuery 31. How BQ creates execution plan of a query How does BQ decide what order to process query statements What processes need to be performed before others What processes can be ran in parallel This is similar to EXPLAIN statements Example: This query calculates the total scores ..<\/p>\n<div class=\"clear-fix\"><\/div>\n<p><a href=\"https:\/\/wiki.thomasandsofia.com\/?p=3753\" title=\"read more...\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[77,78],"tags":[],"class_list":["post-3753","post","type-post","status-publish","format-standard","hentry","category-bigquery","category-bigquery-for-big-data-engineers"],"_links":{"self":[{"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/posts\/3753","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3753"}],"version-history":[{"count":1,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/posts\/3753\/revisions"}],"predecessor-version":[{"id":3756,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/posts\/3753\/revisions\/3756"}],"wp:attachment":[{"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3753"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3753"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3753"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}