{"id":3788,"date":"2022-02-13T23:55:48","date_gmt":"2022-02-13T23:55:48","guid":{"rendered":"https:\/\/wiki.thomasandsofia.com\/?p=3788"},"modified":"2022-02-18T21:27:59","modified_gmt":"2022-02-18T21:27:59","slug":"bqfbd-materialized-views-in-bigquery","status":"publish","type":"post","link":"https:\/\/wiki.thomasandsofia.com\/?p=3788","title":{"rendered":"BQFBD &#8211; Materialized Views in BigQuery"},"content":{"rendered":"<p><a href=\"\/bigquery-for-big-data\/\">Main Menu<\/a><\/p>\n<p>50% Complete!!<\/p>\n<h1>Section 12: Materialized Views in BigQuery<\/h1>\n<h2>51. What are Materialized Views<\/h2>\n<p>https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22758503#overview<\/p>\n<h3>Materialized Views<\/h3>\n<ul>\n<li>Stored physically on disk\n<ul>\n<li>These do occupy space and you will be charged<\/li>\n<li>They do not rely on the base table unless you re-run or refresh the view query<\/li>\n<\/ul>\n<\/li>\n<li>Must be in the same Dataset at the source tables!<\/li>\n<li>Inherit the expiration times of the source<\/li>\n<li>Provide the latest data including streaming tables<\/li>\n<li>Faster than normal views<\/li>\n<li><strong>Smart tuning<\/strong>\n<ul>\n<li>If a query or subquery on a table can be served with an already created Materialized View, BQ reroutes the query to use Materialized view for efficiency.<\/li>\n<li>SELECT name, SUM(count) as total FROM `proj.ds.table` GROUP BY name\n<ul>\n<li>If this or a subquery of this can use a materialized view (mv), it will<\/li>\n<\/ul>\n<\/li>\n<li>Uses less bytes<\/li>\n<\/ul>\n<\/li>\n<li>Use Cases\n<ul>\n<li>ETL or BI pipelines<\/li>\n<li>Highly computational queries like aggregations<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>Views vs. Materialized Views (Not part of the class)<\/h3>\n<h4>Views<\/h4>\n<ul>\n<li>A short\/simplified version of the query used to create the view.<\/li>\n<li>Every time you run a query against the view, you are actually running the query that created the view in the background, then running your query against those results.<\/li>\n<li>Your data is always up to date<\/li>\n<li>If the &#8220;creation query&#8221; is <strong>expensive or slow<\/strong>, so will all queries against the view.<\/li>\n<li>You can create a View in a different dataset than the original table(s)\n<ul>\n<li>This allows you to set different access permissions on the View<\/li>\n<li>Example: You can grant access to the View while blocking access to the actual table<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>Materialized Views<\/h3>\n<ul>\n<li>Also a short\/simplified version of the query used to create the view.<\/li>\n<li>Materialized Views are actual tables created using the &#8220;create query&#8221;<\/li>\n<li>Queries against the MV are ran against the table created, so are <strong>faster<\/strong> than Views<\/li>\n<li>Your data can get stale if the MV are not refreshed<\/li>\n<li>You cannot create a MV in a different dataset than the original tables.<\/li>\n<li>You can use Partitioned Tables and Clustering<\/li>\n<\/ul>\n<h2>52. Create a Materialized View<\/h2>\n<p>https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22758507#overview<\/p>\n<p>Note: Materialized Views are still in Beta and features and functionality are still under development<\/p>\n<h3>Create the view<\/h3>\n<ul>\n<li>Standard SQL\n<ul>\n<li>Cannot use Table Joins<\/li>\n<li>Limited set of aggregation functions\n<ul>\n<li>APPROX_COUNT_DISTINCT<\/li>\n<li>ARRAY_AGG<\/li>\n<li>HLL_COUNT.INIT<\/li>\n<li>MAX, MIN, SUM, COUNT, AVG<\/li>\n<\/ul>\n<\/li>\n<li>Stacked functions are not supported\n<ul>\n<li>ROUND(AVG(count))<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li><code>CREATE MATERIALIZED VIEW proj.ds.table_mv as SELECT name, SUM(count) as total FROM proj.ds.table group by name<\/code>\n<ul>\n<li>I don&#8217;t see any back-tics on the table names&#8230;.<\/li>\n<\/ul>\n<\/li>\n<li>You cannot use `HAVING` clause.\u00a0 To use this, use it on the MV.<\/li>\n<li>UNLESS Auto-refresh is disabled, an auto-refresh will start running as soon as the initial build query runs.<\/li>\n<li>Once it completes creation, it will appear in the table list.<\/li>\n<li>You can create Partitioned and Clustered MVs<\/li>\n<\/ul>\n<pre>CREATE MATERIALIZED VIEW \r\n proj.ds.mv \r\n PARTITION BY date AS\r\nSELECT \r\n name, \r\n SUM(count) as total,\r\n _PARTITIONDATE as date\r\nFROM \r\n proj.ds.partitioned_table \r\n WHERE _PARTITONDATE = \"YYYY_MM_DD\"\r\nGROUP BY\r\n 1,3<\/pre>\n<ul>\n<li>1, 3 refers to the position of the columns being returned<\/li>\n<\/ul>\n<h3>Smart Tuning Demonstration<\/h3>\n<ul>\n<li>Locate the code for the MV\n<ul>\n<li>MV &gt; Details &gt; Scroll down to Query<\/li>\n<li>Copy\/paste the query into the editor and run it.\n<ul>\n<li>Note that the source table is the original table and not an MV<\/li>\n<\/ul>\n<\/li>\n<li>Once the query completes, click the &#8220;Execution Details&#8221; tab near &#8220;Results&#8221;<\/li>\n<li>Scroll down and expand a job. You&#8217;ll see it used the MV and not the original source table<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h4>To prevent smart tuning:<\/h4>\n<ul>\n<li>Add a where clause to the query<\/li>\n<\/ul>\n<pre>SELECT \r\n name, \r\n SUM(code) as total_count\r\nFROM \r\n proj.ds.table\r\n  WHERE rand() &lt; 1\r\nGROUP BY \r\n name<\/pre>\n<h2>53. ALTER a Materialized View<\/h2>\n<p>https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22759071?start=0#overview<\/p>\n<h3>Allowed<\/h3>\n<ul>\n<li>CREATE MATERIALIZED VIEW&#8230;<\/li>\n<li>ALTER MATERIALIZED VIEW `proj.ds.mv` SET OPTIONS&#8230;\n<ul>\n<li>(enable_refresh = true)<\/li>\n<\/ul>\n<\/li>\n<li>DROP MATERIALIZED VIEW `proj.ds.mv`<\/li>\n<\/ul>\n<p>No others are allowed<\/p>\n<h3>Not allowed<\/h3>\n<ul>\n<li>Running COPY, IMPORT or EXPORT jobs where either the source or the destination is a materialized view<\/li>\n<li>Writing query results into a materialized view<\/li>\n<li>Using the BQ Storage API<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2>54. Design an optimized query for Materialized View<\/h2>\n<p>https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22759107#overview<\/p>\n<h3>How to design a query such that it always hits the MV<\/h3>\n<h4>Sub queries are supported<\/h4>\n<p>&nbsp;<\/p>\n<pre>CREATE MATERIALIZED VIEW\r\n  proj.ds.mv\r\nAS SELECT\r\n  name, \r\n  total_count\r\nFROM (\r\n  SELECT\r\n    name,\r\n    count(*) as total_count\r\n  FROM\r\n    proj.ds.table\r\n  GROUP BY\r\n    name\r\n)<\/pre>\n<h4>Use the grouping keys from the MV<\/h4>\n<ul>\n<li>Notice how you are actually calling the Table, not the Materialized View<\/li>\n<li>The field &#8216;total_count&#8217; does not exist on the table, so it will use the Materialized View instead.<\/li>\n<\/ul>\n<pre>SELECT\r\n  name,\r\n  total_count\r\nFROM\r\n  proj.ds.table<\/pre>\n<h4>Performing computation on grouping keys<\/h4>\n<pre>SELECT\r\n  name,\r\n  AVG(total_count) as ave_count\r\nFROM\r\n  proj.ds.table<\/pre>\n<h4>Ad-hoc query contains filters on grouping columns OR direct filters of MV definition<\/h4>\n<p><strong>Create MV<\/strong><\/p>\n<pre>CREATE MATERIALIZED VIEW\r\n  proj.ds.mv\r\nAS SELECT\r\n  DATE(shipdate) as shipdate,\r\n  SUM(price) as sum_price\r\nFROM \r\n  proj.ds.table\r\nWHERE\r\n  shipping_mode = \"land\"\r\nGROUP BY 1<\/pre>\n<p><strong>Ad-Hoc Query<\/strong><\/p>\n<pre>SELECT\r\n  shipdate,\r\n  SUM(price) as sum_price\r\nFROM\r\n  proj.ds.table\r\nWHERE\r\n  [optional filter AND ]\r\n  shipping_mode = \"land\"\r\nGROUP BY\r\n  1<\/pre>\n<h4>Ad-hoc query contains a filter which is a subset filter of the MV definition<\/h4>\n<ul>\n<li>While the filter is not exactly the same, it is a subset of the Create view.<\/li>\n<\/ul>\n<p><strong>Create MV<\/strong><\/p>\n<pre>CREATE MATERIALIZED VIEW\r\n  proj.ds.mv\r\nAS SELECT\r\n  price,\r\n  COUNT(*) as cnt\r\nFROM \r\n  proj.ds.table\r\nWHERE\r\n  price IS NOT NULL\r\nGROUP BY 1<\/pre>\n<p><strong>Ad-hoc Query<\/strong><\/p>\n<pre>SELECT\r\n  price,\r\n  COUNT(*) as cnt\r\nFROM\r\n  proj.ds.table\r\nWHERE\r\n  price &gt; 100\r\nGROUP BY 1<\/pre>\n<p>&nbsp;<\/p>\n<h2>55. Auto and Manual Refreshes of Materialized Views<\/h2>\n<p>https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22766733#overview<\/p>\n<h3>Updates<\/h3>\n<ul>\n<li>If an update (alter) is performed prior to the next refresh\n<ul>\n<li>The portions that have changed will be invalidated and reread<\/li>\n<li>If on a partition, only the partition will be invalidated and reread<\/li>\n<\/ul>\n<\/li>\n<li>If only inserts\n<ul>\n<li>The changes are added to the materialized view<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>Manual Refresh<\/h3>\n<ul>\n<li>REFRESH_MATERIALIZED_VIEW\n<ul>\n<li>BQ looks for the changes in the base table and applies those changes to the MV<\/li>\n<li><code>CALL BQ.REFRESH_MATERIALIZED_VIEW('proj.ds.mv_table_name')<\/code>\n<ul>\n<li>BQ is literal. It is not an abbreviation for &#8216;BigQuery&#8217; in this case<\/li>\n<li>The table name in the command is in single quotes, not back ticks.<\/li>\n<li>View &#8216;Details&#8217; to see the refresh time<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>Automatic<\/h3>\n<ul>\n<li>By default, Materialized Views are updated within 5 minutes of changes to the base table\n<ul>\n<li>Not &#8216;Every 5 minutes&#8217;<\/li>\n<\/ul>\n<\/li>\n<li>If constant changes are made to the base table\n<ul>\n<li>Refreshes are performed per Refresh Interval (ms) which is default 1,800,000 ms (30 minutes)<\/li>\n<li>&#8216;Details&#8217;<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>Setting the Refresh Interval<\/h3>\n<ul>\n<li>Minimum refresh cap = 1 minute<\/li>\n<li>Maximum = 7 days<\/li>\n<\/ul>\n<p><strong>At time of creation<\/strong><\/p>\n<pre>CREATE MATERIALIZED VIEW\r\n  proj.ds.mv\r\nOPTIONS (enable_refresh = true, refresh_interval_minutes = 60)\r\nAS SELECT ...\r\n<\/pre>\n<p><strong>Alter an existing MV<\/strong><\/p>\n<pre>ALTER MATERIALIZED VIEW \r\n  `proj.ds.mv` \r\nSET OPTIONS (enable_refresh = true, refresh_interval_minutes = 60)<\/pre>\n<h3>Keeping the MV up to date<\/h3>\n<ul>\n<li>BQ ensures your query replies are ALWAYS up to date<\/li>\n<\/ul>\n<p><strong>Appends Only<\/strong><\/p>\n<ol>\n<li>Fetch all materialized data<\/li>\n<li>Read the delta<\/li>\n<li>Adjust the aggregations as per delta<\/li>\n<li>Show results<\/li>\n<\/ol>\n<p><strong>Updates or Deletions<\/strong><\/p>\n<ul>\n<li>Materialized View might not be scanned<\/li>\n<li>Query will hit the base table only<\/li>\n<li>You might have to wait until the next refresh before the queries start using the MV again.<\/li>\n<\/ul>\n<h2>56. Limitations and Quotas of Materialized Views<\/h2>\n<p>https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22766737#overview<\/p>\n<ul>\n<li>You cannot copy a MV either as a source or destination of a copy job<\/li>\n<li>Cannot export data<\/li>\n<li>Cannot load data into MVs using load query<\/li>\n<li>Cannot write data to it &#8211; READ ONLY<\/li>\n<li>No DML statements<\/li>\n<li>SHOULD be in the same dataset location<\/li>\n<li>Supports limited aggregation functions in query definitions<\/li>\n<li>Only 1 reference table.\u00a0 No joins during creation<\/li>\n<li>Cannot be nested<\/li>\n<li>Only Standard SQL<\/li>\n<li>Maximum 20 MVs per base table<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2>57. Best Practices<\/h2>\n<p>https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22766743#overview<\/p>\n<ul>\n<li>Do not create Materialized views for every permutation of a query\n<ul>\n<li>Create them for broader use.<\/li>\n<\/ul>\n<\/li>\n<li>If your users often filter on a date range, add a date range filter on your query that includes that set of dates.<\/li>\n<li>if the base table is large and partitioned, your MV should be partitioned as well.\n<ul>\n<li>If the MV is about the same size or larger than a partition, then partition the MV<\/li>\n<\/ul>\n<\/li>\n<li>Refreshes are expensive.\u00a0 Do some analysis to determine the necessary refresh interval\n<ul>\n<li>if the base table is updated relatively infrequently, then refresh less often.<\/li>\n<\/ul>\n<\/li>\n<li>Batch DML statements and manual refresh your MVs after.<\/li>\n<\/ul>\n<h3>Work around for Joins on Materialized Views<\/h3>\n<p><a href=\"https:\/\/wiki.thomasandsofia.com\/wp-content\/uploads\/2022\/02\/mv-join.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-3797\" src=\"https:\/\/wiki.thomasandsofia.com\/wp-content\/uploads\/2022\/02\/mv-join.png\" alt=\"\" width=\"931\" height=\"481\" srcset=\"https:\/\/wiki.thomasandsofia.com\/wp-content\/uploads\/2022\/02\/mv-join.png 931w, https:\/\/wiki.thomasandsofia.com\/wp-content\/uploads\/2022\/02\/mv-join-300x155.png 300w, https:\/\/wiki.thomasandsofia.com\/wp-content\/uploads\/2022\/02\/mv-join-768x397.png 768w, https:\/\/wiki.thomasandsofia.com\/wp-content\/uploads\/2022\/02\/mv-join-150x77.png 150w\" sizes=\"auto, (max-width: 931px) 100vw, 931px\" \/><\/a><\/p>\n<ol>\n<li>Create the MV on the aggregated view<\/li>\n<li>Run your query as shown<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Main Menu 50% Complete!! Section 12: Materialized Views in BigQuery 51. What are Materialized Views https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22758503#overview Materialized Views Stored physically on disk These do occupy space and you will be charged They do not rely on the base table unless you re-run or refresh the view query Must be in the same Dataset at the ..<\/p>\n<div class=\"clear-fix\"><\/div>\n<p><a href=\"https:\/\/wiki.thomasandsofia.com\/?p=3788\" title=\"read more...\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[77,78],"tags":[],"class_list":["post-3788","post","type-post","status-publish","format-standard","hentry","category-bigquery","category-bigquery-for-big-data-engineers"],"_links":{"self":[{"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/posts\/3788","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3788"}],"version-history":[{"count":6,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/posts\/3788\/revisions"}],"predecessor-version":[{"id":3798,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/posts\/3788\/revisions\/3798"}],"wp:attachment":[{"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3788"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3788"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3788"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}