BQFBD – Operations on Datasets & Tables

  BigQuery, BigQuery for Big Data Engineers

Section 6: BQFBD – Operations on Datasets & Tables

27. Copying Datasets

https://www.udemy.com/course/bigquery/learn/lecture/22736223#questions

Access

  • You cannot grant permissions on:
    • tables
    • views
    • individual rows (row level)
  • You can grant permissions on:
    • the dataset level
    • columns
      • BQ Column-level Security
  • Granting Access
    • Dataset > Share dataset
    • Enter the user or group and select the role
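  • Dataset-level access can also be granted with DDL instead of the UI; a minimal sketch, assuming a hypothetical project, dataset, and user:
grant `roles/bigquery.dataViewer` on schema `my-project.my_dataset` to "user:analyst@example.com"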

Copying Datasets

  • Once created, a dataset cannot be renamed or moved
    • To do either, you need to copy it.
    • Requires BQ Data Transfer Service
      • API and Services > Enable API & Services
      • Search BigQuery Data Transfer API
        • This needs to be enabled separately for each project (per-project basis).
  • Create a new dataset with a new name and new location
  • Click the Source Dataset > Copy Dataset
    • Select the Project and dataset name
    • Check [X] Overwrite destination tables if there is existing data that needs to be overwritten.
    • [ Copy ]
  • The copy was scheduled in my local UTC-6 time, with no completion estimate shown.
    • Note: it had completed by the next time I logged in.

28. Transfer Service for scheduling Copy Jobs

https://www.udemy.com/course/bigquery/learn/lecture/22738771?start=0#questions

  • You can use Transfer Service as an automated backup service.
  • GCP compresses the data before sending it, so you are billed for less than the total size of your dataset.
  • Limitations
    • In-region
      • 1,000 copy jobs per destination table per day
      • 100,000 copy jobs per project per day (including failures)
      • Source dataset can have at most 20,000 tables
        • All can be copied in parallel
    • Cross-region
      • 100 copy jobs per destination table per day
      • 2,000 copy jobs per project per day (including failures)
      • Source dataset can have at most 20,000 tables
        • 1,000 can be copied in parallel

29. Native operations on Table for Schema change

https://www.udemy.com/course/bigquery/learn/lecture/22738893#questions

There are two types of schema changes: native and manual.

Table metadata changes

  • Table > Details
    • Description
    • Labels (key:value tags)
    • Expiration date/time
  • Copy / Delete
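  • The same metadata can be set with DDL instead of the Details pane; a sketch with hypothetical names and values:
alter table `proj.ds.table`
set options (
  description = 'orders fact table',
  labels = [('env', 'dev')],
  expiration_timestamp = timestamp '2030-01-01 00:00:00 UTC'
)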

Table Schema changes

  • Native Schema
    • Only 2 native schema modifications are supported:
      1. Adding new columns to a schema definition
      2. Relaxing a column's mode from REQUIRED to NULLABLE
        • Relaxing to NULLABLE cannot be undone.
    • Changing column names and column types is unsupported natively.
    • Table > Schema > [Edit Schema]
      • You cannot add REQUIRED columns this way.
        • Only NULLABLE and REPEATED
      • Populate the new column using a DML statement, or load new data into the table.
        • Alternatively, create a new table with the same table name and upload your data.
      • If you are appending to an existing table, you’ll need to manually recreate the schema.
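  • Both native modifications now have DDL equivalents (added to BigQuery after this lecture was recorded); a sketch with hypothetical names:
-- add a new NULLABLE column
alter table `proj.ds.table` add column NEW_COLUMN STRING;
-- relax an existing column from REQUIRED to NULLABLE
alter table `proj.ds.table` alter column SOURCE_COLUMN drop not null;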

30. Manual operations on Tables

https://www.udemy.com/course/bigquery/learn/lecture/22738927#questions

Changing column names

  • Write a query that selects everything except the old column, then re-adds it under the new name:
select * EXCEPT(SOURCE_COLUMN), SOURCE_COLUMN as NEW_NAME from `proj.dataset.table`
  • Using query settings, when you run the query, set the destination as the name of the source table.
    • More > Query settings
  • This is expensive (the query scans the whole table), but it preserves all the existing data.
    • If the data can be reloaded easily, conserve costs by recreating the whole table with the desired column names instead.
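  • The same rewrite can be done in a single statement, and newer BigQuery versions can rename a column in place without a table scan; both postdate this lecture, so treat this as a sketch:
-- rewrite the table, renaming the column along the way
create or replace table `proj.dataset.table` as
select * EXCEPT(SOURCE_COLUMN), SOURCE_COLUMN as NEW_NAME from `proj.dataset.table`;
-- or rename in place (no scan) on newer BigQuery versions
alter table `proj.dataset.table` rename column SOURCE_COLUMN to NEW_NAME;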

Changing data types

  • Use a query to cast the data type while overwriting the same table.
select * EXCEPT(SOURCE_COLUMN), CAST(SOURCE_COLUMN as NEW_TYPE) as SOURCE_COLUMN from `proj.ds.table`
  • Not every conversion is allowed; for example, an arbitrary string cannot be cast to Boolean.
  • Use the same Query settings method to overwrite the table.
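  • If some values may not convert cleanly, SAFE_CAST returns NULL instead of failing the whole query; same pattern:
select * EXCEPT(SOURCE_COLUMN), SAFE_CAST(SOURCE_COLUMN as NEW_TYPE) as SOURCE_COLUMN from `proj.ds.table`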

Deleting a column

select * EXCEPT(COLUMN_TO_DELETE) from `proj.ds.table`
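  • As with the other rewrites, set the query destination to the source table and overwrite it.
  • Newer BigQuery versions can also drop a column directly with DDL (not covered in the lecture); a sketch:
alter table `proj.ds.table` drop column COLUMN_TO_DELETE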

Changing a column's mode (Nullable, Required…)

  1. Export the table to Cloud Storage
  2. Create a new table with the correct mode (schema)
  3. Load the data to the new table.
  • You will not be charged for either the Export job or the Load job!
    • Export and Load jobs are free
    • You will be charged for storage, but this should be minimal.
      • Don’t forget to delete the export after the load completes.
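  • The export/load round trip can also be written as SQL with EXPORT DATA and LOAD DATA; a sketch assuming a hypothetical bucket and a two-column schema carrying the corrected (NULLABLE) mode:
-- 1. export the table to Cloud Storage
export data options (
  uri = 'gs://my-bucket/export/table-*.csv',
  format = 'CSV',
  overwrite = true,
  header = true
) as
select * from `proj.ds.table`;
-- 2 + 3. create and load the new table with the relaxed schema
load data overwrite `proj.ds.table_new` (id INT64, name STRING)
from files (
  format = 'CSV',
  uris = ['gs://my-bucket/export/table-*.csv'],
  skip_leading_rows = 1
);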

Restoring Deleted Data

  • You can query a table's data as it was at any point in time within the last 7 days, which lets you restore deleted data
    • Table decorators (legacy SQL)
    • FOR SYSTEM_TIME AS OF in the FROM clause (standard SQL)

Example

Delete all rows from a table (a DELETE in BigQuery must have a WHERE clause, hence where 1=1)

delete from `proj.ds.table` where 1=1

Restore the data

  • Make sure you store the results in a table!
    • It’s OK to write back to the same table with overwrite enabled
select * from `proj.ds.table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
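  • A sketch that lands the snapshot in a separate recovery table first (name hypothetical), so nothing is overwritten until you've checked it:
create or replace table `proj.ds.table_recovered` as
select * from `proj.ds.table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)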
