{"id":3731,"date":"2022-02-08T00:24:23","date_gmt":"2022-02-08T00:24:23","guid":{"rendered":"https:\/\/wiki.thomasandsofia.com\/?p=3731"},"modified":"2022-02-08T00:24:56","modified_gmt":"2022-02-08T00:24:56","slug":"bqfbd-using-bigquery-dashboard-options","status":"publish","type":"post","link":"https:\/\/wiki.thomasandsofia.com\/?p=3731","title":{"rendered":"BQFBD &#8211; Using BigQuery Dashboard Options"},"content":{"rendered":"<p><a href=\"\/bigquery-for-big-data\/\">Main Menu<\/a><\/p>\n<h2>Section 4: Using BigQuery Dashboard options<\/h2>\n<h2>19. Running queries with various query settings<\/h2>\n<p>https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22729977#questions<\/p>\n<h3>Select top 10 male names from the table<\/h3>\n<p><code>select name, count from `PROJECT.DATASET.TABLE` where gender = \"M\" order by count desc limit 10<br \/>\n<\/code><\/p>\n<p>Notes:<\/p>\n<ul>\n<li>&#8220;PROJECT.DATASET.TABLE&#8221; are enclosed in back tics (below the ~). Often these get converted to single quotes, so double check to make sure your query doesn&#8217;t fail.<\/li>\n<li>quantum-balm-340112.troberts_dataset_oo1.names<\/li>\n<\/ul>\n<h4>Query results<\/h4>\n<ul>\n<li>Query complete (0.3 sec elapsed, 36.5 KB processed)\n<ul>\n<li>This query would result in some billing because data was processed.<\/li>\n<\/ul>\n<\/li>\n<li>Running the query again will result in 0.0 sec with no bytes processed because the result was cached.\n<ul>\n<li>No billing on a cached reply!<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>More Options<\/h3>\n<h4>Format<\/h4>\n<ul>\n<li>Reformats your query string into a python-ish format, indenting the variables from the commands.<\/li>\n<\/ul>\n<pre>SELECT\r\n  name,\r\n  count\r\nFROM\r\n  `PROJECT.DATASET.TABLE`\r\nWHERE\r\n  ...<\/pre>\n<h4>Query Settings<\/h4>\n<p><strong>Query engine<\/strong><\/p>\n<ul>\n<li>BigQuery engine\n<ul>\n<li>BigQuery engine expects and processes the table data in form of batches.<\/li>\n<li>This is because the table already existed<\/li>\n<\/ul>\n<\/li>\n<li>Cloud Dataflow engine\n<ul>\n<li>Used on streaming data, such as data from Pub\/Sub.<\/li>\n<li>This will be covered in depth later.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><strong>Destination<\/strong><\/p>\n<ul>\n<li>By default, BQ saves all query results to a table, temp or permanent<\/li>\n<li>Temporary table\n<ul>\n<li>Randomly named saved in a special dataset<\/li>\n<li>Not available for sharing, querying and are not visible to users<\/li>\n<li>You are NOT charged for storing temporary tables<\/li>\n<li>Responsible for caching the query results<\/li>\n<li>Saved for ~24 hours<\/li>\n<\/ul>\n<\/li>\n<li>Permanent tables\n<ul>\n<li>Can be new or existing tables in any dataset that you have access to.\n<ul>\n<li>Provide the Project name, dataset name or table name<\/li>\n<\/ul>\n<\/li>\n<li>The table will automatically be created if it does not already exist<\/li>\n<li>Results will be stored in the table with the selected Write preference\n<ul>\n<li>Write if empty<\/li>\n<li>Append to table<\/li>\n<li>Overwrite table<\/li>\n<\/ul>\n<\/li>\n<li>You WILL be charged for this storage<\/li>\n<li>The table being queried must be in the same location as the dataset that contains the destination table.<\/li>\n<\/ul>\n<\/li>\n<li>Results size\n<ul>\n<li>Over-rides the maximum bytes the query can return\n<ul>\n<li>Default ~10GB<\/li>\n<\/ul>\n<\/li>\n<li>Checking this field will allow any number of bytes to be returned\/written to the destination table.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><strong>Resource Management<\/strong><\/p>\n<ul>\n<li>Job priority\n<ul>\n<li>Interactive (default)\n<div class=\"flex-none\"><\/div>\n<ul>\n<li>Start running immediately &#8211; basically on-demand<\/li>\n<\/ul>\n<\/li>\n<li>Batch\n<ul>\n<li>Start as soon as idle resources are available in the BQ shared resource pool which usually occurs within <strong>a few minutes<\/strong>. (WHAT???)<\/li>\n<li>Do not have concurrent query rate limits.\n<ul>\n<li>You can run any number of queries in a single shot which are then queued and ran as soon as the resources are available in the shared pool.<\/li>\n<\/ul>\n<\/li>\n<li>If it hasn&#8217;t started within 24 hrs, BQ changes the job priority to interactive.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Cache preference\n<ul>\n<li>Use cached results (default) allows you to reuse the cached results<\/li>\n<li>This means query attempts to reuse the results from a previous run of the same query unless the tables being queried have changed.<\/li>\n<li>If you don&#8217;t want to use cached results, uncheck this box.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2>20. Caching features &amp; limitations<\/h2>\n<p>https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22731425#questions<\/p>\n<p><em><strong>*This is really a continuation of Lesson 19. Running queries with various query settings.<\/strong><\/em><\/p>\n<ul>\n<li>To retrieve cached data, the query must be exactly the same as a previous query<\/li>\n<li>Data is NOT cached:\n<ul>\n<li>When the Destination table is specified to store the results<\/li>\n<li>If the tables\/views used in the query have changed since the last cache.<\/li>\n<li>If the tables have streaming ingestion.<\/li>\n<li>If the query uses non-deterministic functions\n<ul>\n<li>Examples:\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li>CURRENT_TIMESTAMP()<\/li>\n<li>NOW()<\/li>\n<li>CURRENT_USER()<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>If the query is ran against an external data source like BigTable or CloudStorage<\/li>\n<li>If the result set is greater than the maximum response size\n<ul>\n<li>10GB compressed by default.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>It is possible to bind queries to only look for cached results.\n<ul>\n<li>If none are found, these queries will fail.<\/li>\n<li>Very useful if you have a large team firing the same queries over and over.<\/li>\n<li>Not available in UI yet.<\/li>\n<li>Available via command line or Python client.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><strong>Additional Settings<\/strong><\/p>\n<ul>\n<li>SQL Dialect\n<ul>\n<li>https:\/\/learnsql.com\/blog\/what-sql-dialect-to-learn\/<\/li>\n<li>Standard\n<ul>\n<li>Used mostly<\/li>\n<\/ul>\n<\/li>\n<li>Legacy<\/li>\n<\/ul>\n<\/li>\n<li>Processing location\n<ul>\n<li>Where do you want to process the query?<\/li>\n<li>Keep Auto-select\n<ul>\n<li>Data is processed in the same location the data is stored.<\/li>\n<li>Attempting to change this will cause your query to fail&#8230;\n<ul>\n<li>Yup, this is a useless option.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h4>Advanced Options<\/h4>\n<p><strong>Encryption<\/strong><\/p>\n<ul>\n<li>Google-managed key<\/li>\n<li>Customer-managed key<\/li>\n<\/ul>\n<p><strong>Maximum Bytes billed<\/strong><\/p>\n<ul>\n<li>Specify the maximum number of bytes you can pay for the query<\/li>\n<li>If it will require more than the specified number, the query will fail.\n<ul>\n<li>You will not be charged for these.<\/li>\n<\/ul>\n<\/li>\n<li>Test by setting your limit really low. \ud83d\ude42<\/li>\n<\/ul>\n<h2>21. Querying Wildcard Tables<\/h2>\n<p>https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22729991#questions<\/p>\n<p>Additional Link: https:\/\/cloud.google.com\/bigquery\/docs\/querying-wildcard-tables<\/p>\n<ul>\n<li>Wildcard table queries allow you to query multiple tables using a wildcard.<\/li>\n<li>These are the union of all tables that match a wildcard expression.<\/li>\n<li>This works as long as you have multiple tables with similar names\n<ul>\n<li>This also requires that the tables have the same, or compatible schemas<\/li>\n<li>se_table_1, se_table_2, se_table_3 &#8230;<\/li>\n<li><code>select col1, col2 from `PROJECT.DATASET.se_table_*` <\/code><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>Example from BigQuery Example datasets<\/h3>\n<p><strong>NOAA GSOD<\/strong><\/p>\n<ul>\n<li>This dataset contains tables for each year from 1929 through today.\n<ul>\n<li>gsod1929<\/li>\n<li>gsod1930<\/li>\n<li>&#8230;<\/li>\n<li>gsod2022<\/li>\n<\/ul>\n<\/li>\n<li>We can query these all from the editor\n<ul>\n<li>Select the table, then click Preview in the bottom pane.<\/li>\n<\/ul>\n<\/li>\n<li>Without the ability to use wildcard tables this would be difficult to query\n<ul>\n<li><code>select col1, col2 from table_1 union select col1, col2 from table2 union select ...<\/code><\/li>\n<\/ul>\n<\/li>\n<li>Instead, we use wildcards<\/li>\n<\/ul>\n<pre>SELECT \r\n  max, \r\n  ROUND((max-32)*5\/9,1) celsius, \r\n  mo, da, year \r\nFROM \r\n  `bigquery-public-data.noaa_gsod.gsod195*` \r\nWHERE \r\n  max != 9999.9 #9999.9 represents missing data in these tables\r\nORDER BY\r\n  max DESC<\/pre>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li>Note this query limited the data to the 1950&#8217;s<\/li>\n<li>Query took 8.1 seconds on 138.8MB data<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>Using _TABLE_SUFFIX pseudo columns<\/h3>\n<ul>\n<li>Table wildcard queries include an automatic pseudo column for the results of the wildcard.<\/li>\n<li>These can then be filtered in the <code>WHERE<\/code> clause<\/li>\n<\/ul>\n<pre>WHERE\r\n  max != 9999.9\r\n  AND (\r\n    _TABLE_SUFFIX = '1' OR_TABLE_SUFFIX = '2' OR_TABLE_SUFFIX = '3'\r\n  )\r\nORDER BY\r\n  ...<\/pre>\n<ul>\n<li>You can also use the BETWEEN operator here\n<ul>\n<li>`_TABLE_SUFFIX BETWEEN &#8216;1&#8217; AND &#8216;3&#8217;`<\/li>\n<\/ul>\n<\/li>\n<li>Using Table Suffixes will reduce the number of queries scanned, reducing costs.<\/li>\n<li>Filters on _TABLE_SUFFIX that include subqueries querying datasets cannot be used to limit the number of tables scanned for a wildcard table.<\/li>\n<\/ul>\n<h4>Basic filter<\/h4>\n<pre>SELECT max FROM `bigquery-public-data.noaa_gsod.gsod19*` WHERE _TABLE_SUFFIX = '49'<\/pre>\n<ul>\n<li>3.4MB Data<\/li>\n<\/ul>\n<h4>Using a subquery<\/h4>\n<pre>SELECT max \r\nFROM `bigquery-public-data.noaa_gsod.gsod19*` \r\nWHERE \r\n  _TABLE_SUFFIX = (SELECT '49' AS A FROM <code>bigquery-public-data.noaa_gsod.gsod1949<\/code>)<\/pre>\n<ul>\n<li>634.2MB Data<\/li>\n<li>This scan limits the search to tables 1900 &#8211; 1999 only!<\/li>\n<\/ul>\n<p><strong>This, however, works:<\/strong><\/p>\n<pre>SELECT max \r\nFROM `bigquery-public-data.noaa_gsod.gsod19*` \r\nWHERE \r\n_TABLE_SUFFIX = (SELECT 49 FROM (SELECT '49' AS a))<\/pre>\n<p><span style=\"color: #ff0000;\">??<\/span><\/p>\n<h2>22. Wildcard Table Limitations<\/h2>\n<p>https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22734259#questions<\/p>\n<ul>\n<li>These only support native BigQuery storage\n<ul>\n<li>External tables and views are not allowed<\/li>\n<\/ul>\n<\/li>\n<li>Cached results are not supported.\n<ul>\n<li>You will be billed for each run.<\/li>\n<\/ul>\n<\/li>\n<li>Queries that use DML statements cannot use wildcard tables as the target\n<ul>\n<li>You cannot insert or update a wildcard table.<\/li>\n<li>Use the actual table name!<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>How BQ determines the schema<\/h3>\n<ul>\n<li>When executing an SQL query that uses a wildcard table, BQ automatically infers the schema of the most <strong>recently created<\/strong> wildcard table as the schema for ALL wildcard tables\n<ul>\n<li>Example:\n<ul>\n<li>Create tables <code>exam1<\/code>, <code>exam2<\/code>, <code>exam3<\/code><\/li>\n<li>Now create table <code>exam4<\/code><\/li>\n<\/ul>\n<\/li>\n<li>BQ will use the schema from <code>exam4<\/code> because it was the most recently created table.<\/li>\n<\/ul>\n<\/li>\n<li>If the schema of any of the tables does not include the columns in the most recent table, the query will throw an error\n<ul>\n<li>I think a better way of saying this is, if the query requests data from a column that does not exist in the schema of the most recent table, the query will fail\n<ul>\n<li>exam1 = &#8216;day&#8217;, &#8216;mo&#8217;<\/li>\n<li>exam4 = &#8216;day&#8217;, &#8216;year&#8217;<\/li>\n<li>Query for &#8216;mo&#8217; will fail, but query for &#8216;year&#8217; should be OK.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Long table prefixes perform better than shorter ones<\/li>\n<\/ul>\n<p>Long Prefix: Better performance<\/p>\n<pre>SELECT max FROM `PROJECT.DATASET.TABLE_NAME_195*` WHERE _TABLE_SUFFIX BETWEEN '0' AND '1'<\/pre>\n<p>Short Prefix: Worse performance<\/p>\n<pre>SELECT max FROM `PROJECT.DATASET.TABLE_NAME_*` WHERE _TABLE_SUFFIX BETWEEN '1950' AND '1951'<\/pre>\n<h2>23. Schedule, Save, Share a Query<\/h2>\n<p>https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22730003?start=0#questions<\/p>\n<ol>\n<li>Type the &#8216;standard&#8217; query into the Query Editor\n<ul>\n<li>Can include DDLs and DMLs\n<ul>\n<li>DDL: Data Definition Language\n<ul>\n<li>CREATE, ALTER, DROP, TRUNCATE<\/li>\n<\/ul>\n<\/li>\n<li>DML: Data Manipulation Language\n<ul>\n<li>SELECT, INSERT, UPDATE, DELETE<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Click Schedule Query\n<ul>\n<li>Note: Requires Transfer Service, so must be enabled!\n<ul>\n<li>Search transfer api\n<ul>\n<li>Select &#8220;Marketplace: BigQuery Transfer API&#8221;<\/li>\n<li>Click [ Enable ]<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Enable scheduled queries<\/li>\n<li>Update scheduled query<\/li>\n<li>Create new scheduled query\n<ul>\n<li><strong>Details and schedule<\/strong>\n<ul>\n<li>Name for the schedule query<\/li>\n<\/ul>\n<\/li>\n<li><strong>Schedule options<\/strong>\n<ul>\n<li>Repeats: Hourly, Daily, Weekly, Monthly, Custom (cron style), On demand\n<ul>\n<li>Shortest allowed period is 15 min.<\/li>\n<\/ul>\n<\/li>\n<li>Start date and run time<\/li>\n<\/ul>\n<\/li>\n<li>Destination for query results\n<ul>\n<li>DDL and DML queries do not require this since you&#8217;re updating the current table<\/li>\n<li>Destination must be in same Region as source table<\/li>\n<\/ul>\n<\/li>\n<li>Notification options\n<ul>\n<li>Emails are sent only to the user that created the schedule.<\/li>\n<li>This cannot be changed.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<h4>Back Fill<\/h4>\n<ul>\n<li>used to run specific or missed queries for a speficied date range.<\/li>\n<li>Times listed will be in UTC, so you&#8217;ll need to enter carefully<\/li>\n<\/ul>\n<h4>Save Results<\/h4>\n<ul>\n<li>After running the query, you can [Save results]\n<ul>\n<li>CSV:Google drive or local<\/li>\n<li>JSON: Google or local<\/li>\n<li>BQ Table<\/li>\n<li>Google sheets<\/li>\n<li>Clipboard<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h4>Save Query<\/h4>\n<ul>\n<li>Visible only to creater<\/li>\n<li>Visible to project members with appropriate permissions.<\/li>\n<\/ul>\n<h4>Deleting Data<\/h4>\n<ul>\n<li>If you don&#8217;t delete, eventually you will be charged.<\/li>\n<li>Click the dataset, then delete.<\/li>\n<\/ul>\n<h2>24. Schema Auto Detection<\/h2>\n<p>https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22730011#questions<\/p>\n<ul>\n<li>Only available for CSV and JSON file format\n<ul>\n<li>Other formats have the schema included, so it&#8217;s not required.<\/li>\n<\/ul>\n<\/li>\n<li>When auto detection is enabled\n<ul>\n<li>Selects a random file in the source<\/li>\n<li>Scans up to 100 rows as a sample<\/li>\n<li>Attempts to assign data type based on sample values.<\/li>\n<\/ul>\n<\/li>\n<li>After detection, it may change a field name to make it compatible with BQ SQL syntax<\/li>\n<li>Auto detect recognizes\n<ul>\n<li>Gzip compatible file compression<\/li>\n<li>Commas, Pipes, Tab delimeters<\/li>\n<li>Headers from the file based on other rows in the dataset\n<ul>\n<li>if the first line contains only strings, but other have numberic&#8230;<\/li>\n<li>it will use the header values if it can<\/li>\n<\/ul>\n<\/li>\n<li>End of line, unless the &#8220;\\n&#8221; is enquoted.<\/li>\n<li>Dates must be YYYY-MM-DD or it will just interpret as a string<\/li>\n<li>Timestamps are pretty flexable\n<ul>\n<li>yyyy-mm-dd hh:mm:ss, yyyy-mm-dd hh:mm:ss.mm, yyyy-mm-dd<\/li>\n<\/ul>\n<\/li>\n<li>Default nullable fields<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Always Confirm the Auto Detect was accurate and if not, fix it!<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Main Menu Section 4: Using BigQuery Dashboard options 19. Running queries with various query settings https:\/\/www.udemy.com\/course\/bigquery\/learn\/lecture\/22729977#questions Select top 10 male names from the table select name, count from `PROJECT.DATASET.TABLE` where gender = &#8220;M&#8221; order by count desc limit 10 Notes: &#8220;PROJECT.DATASET.TABLE&#8221; are enclosed in back tics (below the ~). Often these get converted to single ..<\/p>\n<div class=\"clear-fix\"><\/div>\n<p><a href=\"https:\/\/wiki.thomasandsofia.com\/?p=3731\" title=\"read more...\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[77,78],"tags":[],"class_list":["post-3731","post","type-post","status-publish","format-standard","hentry","category-bigquery","category-bigquery-for-big-data-engineers"],"_links":{"self":[{"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/posts\/3731","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3731"}],"version-history":[{"count":2,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/posts\/3731\/revisions"}],"predecessor-version":[{"id":3734,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=\/wp\/v2\/posts\/3731\/revisions\/3734"}],"wp:attachment":[{"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3731"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3731"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wiki.thomasandsofia.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3731"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}