{"id":3768,"date":"2022-02-11T00:36:34","date_gmt":"2022-02-11T00:36:34","guid":{"rendered":"https:\/\/wiki.thomasandsofia.com\/?p=3768"},"modified":"2022-02-12T16:14:34","modified_gmt":"2022-02-12T16:14:34","slug":"bqfbd-clustered-tables-in-bigquery","status":"publish","type":"post","link":"https:\/\/wiki.thomasandsofia.com\/?p=3768","title":{"rendered":"BQFBD &#8211; Clustered Tables in BigQuery"},"content":{"rendered":"<p><a href=\"\/bigquery-for-big-data\/\">Main Menu<\/a><\/p>\n<h1>Section 9: Clustered Tables in BigQuery<\/h1>\n<h2>40. What is Clustering<\/h2>\n<p>https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22755755#questions<\/p>\n<ul>\n<li>Clustered tables\n<ul>\n<li>aka Bucketing in Hive<\/li>\n<\/ul>\n<\/li>\n<li>Further break down Partitioned Tables<\/li>\n<li>Work best for very large unpartitioned databases, or very large partitions.<\/li>\n<li>When a table or partitions is bucketed on a column, all data in that row will be moved to that bucket<\/li>\n<li>Buckets can be grouped similar to Integer partitions.\u00a0 You can specify more than one value for the Bucket column.<\/li>\n<li>Use filters on both the Partition and Bucket columns to significantly reduce the number of rows a query has to scan.<\/li>\n<li>If the table\/partition is &lt; 1 GB, then clustering isn&#8217;t required.<\/li>\n<li>Clustering can be done on a standard table or a partitioned table<\/li>\n<li>Table Options\n<ul>\n<li>Standard, non-partitioned, non clustered table<\/li>\n<li>Partitioned Table<\/li>\n<li>Clustered Table<\/li>\n<li>Partitioned + Clustered Table<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>Cluster Storage<\/h3>\n<ul>\n<li>Partitions are like Directories\n<ul>\n<li>Buckets are physical files stored in those directories<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2>41. When to use Clustering, Partitioning, or Both<\/h2>\n<p>https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22755771#questions<\/p>\n<ul>\n<li>Partitioning Alone\n<ul>\n<li>Need strict cost guarantees before running the query<\/li>\n<li>Need to manage Partition level Expirations<\/li>\n<li>Need to manage Partition level DML operations<\/li>\n<\/ul>\n<\/li>\n<li>Partitioning with Clusters\n<ul>\n<li>No need of strict cost guarantees before running the query<\/li>\n<li>Very large Partitions<\/li>\n<li>Frequent filter and aggregate queries on the table<\/li>\n<li>BQ automatically sorts the data in the table based on clustering columns<\/li>\n<\/ul>\n<\/li>\n<li>Clustering alone\n<ul>\n<li>Data is unique<\/li>\n<li>Average partition size is &gt; 1GB\n<ul>\n<li>(He actually said &lt; 1GB here, so which is it?)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2>42. Create Clustered Table<\/h2>\n<p>https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22756007#questions<\/p>\n<ul>\n<li>Schema\n<ul>\n<li>name string<\/li>\n<li>surname string<\/li>\n<li>id string<\/li>\n<li>count float<\/li>\n<li>date date<\/li>\n<\/ul>\n<\/li>\n<li>Partitioning\n<ul>\n<li>date<\/li>\n<\/ul>\n<\/li>\n<li>Clustering order\n<ul>\n<li>name, surname<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2>43, Dos and Don&#8217;ts for Clustering<\/h2>\n<p>https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22756029#questions<\/p>\n<ul>\n<li>Create clusters in the order that the fields are created for optimal performance\n<ul>\n<li>Example: name, surname NOT surname, name<\/li>\n<\/ul>\n<\/li>\n<li>Do not use clustered columns in complex queries.\u00a0 (Same as Partitioned Tables)\n<ul>\n<li>Example: Don&#8217;t do this: <code>select * from `proj.ds.table` where cast(count as string) = \"5\"<\/code><\/li>\n<\/ul>\n<\/li>\n<li>Do not compare cluster columns to other columns\n<ul>\n<li>Same as Partitioned tables<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>Quotas and Limitations<\/h3>\n<ul>\n<li>Only Standard SQL is supported for querying and writing<\/li>\n<li>Can modify\/remove clustering columns, but it will be for new data only\n<ul>\n<li>I guess you would need to delete all the data in the cluster to completely remove it.<\/li>\n<\/ul>\n<\/li>\n<li>Clustering must be top-level columns only\n<ul>\n<li>DATE, Boolean, Geography, Int64, Numeric, String, Timestamp<\/li>\n<li>No Struct or Array fields<\/li>\n<li>Same as Partition Tables<\/li>\n<\/ul>\n<\/li>\n<li>Max 4 clustering columns.<\/li>\n<li>All quotas that apply to Partitioned Tables apply to clustering<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Main Menu Section 9: Clustered Tables in BigQuery 40. What is Clustering https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22755755#questions Clustered tables aka Bucketing in Hive Further break down Partitioned Tables Work best for very large unpartitioned databases, or very large partitions. When a table or partitions is bucketed on a column, all data in that row will be moved to that ..<\/p>\n<div class=\"clear-fix\"><\/div>\n<p><a href=\"https:\/\/wiki.thomasandsofia.com\/?p=3768\" title=\"read more...\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[77,78],"tags":[],"class_list":["post-3768","post","type-post","status-publish","format-standard","hentry","category-bigquery","category-bigquery-for-big-data-engineers"],"_links":{"self":[{"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/posts\/3768","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3768"}],"version-history":[{"count":4,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/posts\/3768\/revisions"}],"predecessor-version":[{"id":3773,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/posts\/3768\/revisions\/3773"}],"wp:attachment":[{"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3768"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3768"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3768"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}