Section 6: BQFBD – Operations on Datasets & Tables
27. Copying Datasets
https://www.udemy.com/course/bigquery/learn/lecture/22736223#questions
Access
- You cannot grant permissions at:
- the table level
- the view level
- the row level
- You can grant permissions at:
- the dataset level
- the column level (BigQuery column-level security)
- Granting Access
- Dataset > Share dataset
- Enter the user or group and select the role
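Dataset-level access can also be granted with SQL DCL instead of the console; a minimal sketch, assuming placeholder project, dataset, role, and user names:

```sql
-- Grant read access on a dataset to a single user (all names are placeholders):
GRANT `roles/bigquery.dataViewer`
ON SCHEMA `my_project.my_dataset`
TO "user:alice@example.com";
```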
Copying Datasets
- Once created, a dataset can be neither renamed nor moved.
- To achieve either, copy the dataset instead.
- Requires BQ Data Transfer Service
- API and Services > Enable API & Services
- Search BigQuery Data Transfer API
- This must be enabled on a per-project basis.
- Create a new dataset with a new name and new location
- Click the Source Dataset > Copy Dataset
- Select the Project and dataset name
- Check [X] Overwrite destination tables if there is existing data that needs to be overwritten.
- [ Copy ]
- The copy was scheduled for me in UTC-6; it had completed by the next time I logged in.
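The same copy can be started from the `bq` CLI via a Data Transfer Service config; a sketch, assuming placeholder project and dataset names:

```shell
# One-off dataset copy via the BigQuery Data Transfer Service (placeholder names).
# Requires the BigQuery Data Transfer API to be enabled in the project.
bq mk --transfer_config \
  --data_source=cross_region_copy \
  --target_dataset=sales_backup \
  --display_name="Copy sales dataset" \
  --params='{"source_project_id":"my-project","source_dataset_id":"sales","overwrite_destination_table":"true"}'
```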
28. Transfer Service for scheduling Copy Jobs
https://www.udemy.com/course/bigquery/learn/lecture/22738771?start=0#questions
- You can use Transfer Service as an automated backup service.
- GCP compresses the data before sending it, so you should be billed for less than the total size of your dataset.
- Limitations
- In Region
- 1,000 copy jobs per destination table per day
- 100,000 copy jobs per project per day (including failures)
- Source dataset can have at most 20,000 tables
- All tables can be copied in parallel
- Cross Region
- 100 copy jobs per destination table per day
- 2,000 copy jobs per project per day (including failures)
- Source dataset can have at most 20,000 tables
- 1,000 tables can be copied in parallel
29. Native operations on Table for Schema change
https://www.udemy.com/course/bigquery/learn/lecture/22738893#questions
There are two types of schema changes: native and manual.
Table metadata changes
- Table > Details
- Description
- Labels (key:value pairs)
- Expiration date/time
- Copy / Delete
Table Schema changes
- Native Schema
- Only 2 native schema modifications:
- Adding new columns to a schema definition
- Relaxing a column's mode from REQUIRED to NULLABLE
- Changing to NULLABLE cannot be undone.
- Changing column names and column types is unsupported natively.
- Table > Schema > [Edit Schema]
- You cannot add REQUIRED columns, only NULLABLE or REPEATED.
- Add data using a DML statement, or load new data into the table.
- Alternatively, create a new table with the same table name and upload your data.
- If you are appending to an existing table, you'll need to manually recreate the schema.
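Both native modifications are also available as DDL in newer BigQuery releases; a sketch, with placeholder table and column names:

```sql
-- Add a new NULLABLE column:
ALTER TABLE `proj.dataset.table` ADD COLUMN new_col STRING;

-- Relax a column's mode from REQUIRED to NULLABLE (cannot be undone):
ALTER TABLE `proj.dataset.table` ALTER COLUMN old_col DROP NOT NULL;
```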
30. Manual operations on Tables
https://www.udemy.com/course/bigquery/learn/lecture/22738927#questions
Changing column names
- Write a query that re-selects the column under its new name:
SELECT * EXCEPT(SOURCE_COLUMN), SOURCE_COLUMN AS NEW_NAME FROM `proj.dataset.table`
- In Query settings, set the destination table to the source table (overwriting it).
- More > Query settings
- This preserves all the existing data but is expensive, since it scans the whole table.
- To conserve costs, you could instead recreate the table with the desired column names and reload the data.
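If your BigQuery version supports it, RENAME COLUMN DDL avoids the full-table scan entirely; a sketch with placeholder names:

```sql
ALTER TABLE `proj.dataset.table` RENAME COLUMN SOURCE_COLUMN TO NEW_NAME;
```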
Changing data types
- Use a query to cast the data type while overwriting the same table.
SELECT * EXCEPT(SOURCE_COLUMN), CAST(SOURCE_COLUMN AS NEW_TYPE) FROM `proj.ds.table`
- Not every conversion is supported, and CAST errors on values it cannot parse; SAFE_CAST returns NULL instead.
- Use the same overwrite-the-table method as above.
Deleting a column
SELECT * EXCEPT(COLUMN_TO_DELETE) FROM `proj.ds.table`
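Newer BigQuery releases also support dropping a column via DDL, which avoids rewriting the table; a sketch with placeholder names:

```sql
ALTER TABLE `proj.ds.table` DROP COLUMN COLUMN_TO_DELETE;
```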
Changing a column's mode (NULLABLE, REQUIRED, …)
- Export the table to Cloud Storage
- Create a new table with the correct mode (schema)
- Load the data to the new table.
- You will not be charged for the export job or the load job; export and load jobs are free.
- You will be charged for storage, but this should be minimal.
- Don’t forget to delete the export after the load completes.
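The export/load round trip above can be scripted with the `bq` CLI; a sketch, assuming a placeholder bucket and a schema file with the relaxed mode:

```shell
# Export the table to Cloud Storage (export jobs are free):
bq extract --destination_format=NEWLINE_DELIMITED_JSON \
  'proj:ds.table' 'gs://my-bucket/table-*.json'

# Load into a new table using the relaxed schema (load jobs are free):
bq load --source_format=NEWLINE_DELIMITED_JSON \
  'proj:ds.new_table' 'gs://my-bucket/table-*.json' ./relaxed_schema.json

# Clean up the export afterwards to avoid storage charges:
gsutil rm 'gs://my-bucket/table-*.json'
```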
Restoring Deleted Data
- You can restore a table's deleted data from any point in time within the last 7 days.
- Table decorators (legacy SQL / bq tool)
- FOR SYSTEM_TIME AS OF in the FROM clause (standard SQL)
Example
Delete all rows from a table
DELETE FROM `proj.ds.table` WHERE 1=1
Restore the data
- Make sure you store the results in a table!
- It's OK to overwrite the same table.
SELECT * FROM `proj.ds.table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
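Alternatively, a snapshot decorator with `bq cp` restores the table without running a query (the offset is in milliseconds; names are placeholders):

```shell
# Copy the table as it was 1 hour ago (-3,600,000 ms) into a new table:
bq cp 'proj:ds.table@-3600000' 'proj:ds.table_restored'
```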