Section 3: Dataset and Table Creation
12. Setup a GCP account
https://www.udemy.com/course/bigquery/learn/lecture/22719631#overview
Steps to signup for a free Google cloud setup and get $300 for 12 months in your account
- Go to cloud.google.com
- Click on “Get started for free”
- Enter your Gmail id and password.
- Accept the Terms of service and click Continue.
- Select account type as ‘Individual’, Tax registration as ‘Unregistered Individual’ and other mandatory address details.
- Enter your credit/debit card details. A very small amount (for ex ₹1 or US $0.013) will be deducted from your bank card just to ensure its authenticity which will anyhow be refunded back to your card.
13: Note
https://www.udemy.com/course/bigquery/learn/lecture/28810020#overview
Once you open BigQuery, you may see a different UI for the initial screen. To get the relevant UI
- Click on “DISABLE EDITOR TABS”
- Click on “Opt Out”
14. Create a project
15. BigQuery UI Tour
https://www.udemy.com/course/bigquery/learn/lecture/22718867#overview
- Queries are a type of JOB
- Can be saved
- Jobs
- Load, export, copy and query jobs
- saved for 6 months
- Transfer
- Import data
- Scheduled Queries
- Reservations
- no info
- BI Engine
- no info
- Resources
- Pinned projects
- Add Data
- Public Data Sets for hands on…
- https://console.cloud.google.com/bigquery?project=bigquery-public-data&page=project
- Pin the project
- Select your project from the top menu
- All set!
- Public Data Sets for hands on…
16. Region vs Multi-region
https://www.udemy.com/course/bigquery/learn/lecture/22718881#overview
- Data is not directly stored into tables.
- Dataset (Top level container)
- Meta Data
- Zone
- Encryption keys
- Meta Data
- After the dataset has been create, you can add tables.
- Table1
- Table2…
- Dataset Names
- 1024 characters (aA,09,_)
- No spaces, hyphens, special chars
- Case sensitive
- DATASET1 and dataset1 can coexist
- 1024 characters (aA,09,_)
How to Create a Dataset
- Setup Billing
- [ Upgrade ]
- Won’t be billed, but not setting this up will restrict access to some features.
- [ Upgrade ]
- Select the Project > Create Dataset
- Provide unique name
- Select Location (US by default)
- Regions can ONLY be selected at time of creation!
Regions vs Multi-Region
- Region is a specific Geographic place.
- Examples
- Las Vegas
- Tokyo
- Select a region close to you
- Examples
- Multi-Regons are large geographic locations with multiple regions
- Examples
- United States
- European Union
- Use multi-region when you know your data is going to be access from overall US or overall EU.
- This will make copies of your data to multiple regions.
- Stored in one region but backed up to multiple regions
- Adds Resiliency
- Adds Cost
- Examples
- Select a region carefully
- Can affect cost
- Can affect performance
- If using data NOT in BigQuery (storage bucket?)
- Dataset MUST be in the same region as the external data.
17. Create a Dataset
https://www.udemy.com/course/bigquery/learn/lecture/22718883#questions
- Can create any number of Datasets in a project
- > 1000 Dashboard performance can degrade
- Unlimited tables
- ~ 50,000 enumerating slows
- This is the max the UI will display
- Max 2500 authorized views to a dataset’s control list
18. Create a Table
https://www.udemy.com/course/bigquery/learn/lecture/22718891#questions
- Download the names.zip file and extract.
- Name, Gender, SS-last 4
- Select Dataset > Create Table
- Create table from: Upload
- You do not need to first create a table for uploaded data.
- Select file
- < 10MB
- < 16,000 rows
- Multiple files should be uploaded individually
- File Format: CSV
- Destination
- Search for a project > Select project name
- Table type
- Native: Stored in BigQuery
- External: Backed by storage external to BigQuery, like Cloud storage
- Table name: “names”
- Schema
- Manual (Do not select Auto)
- + Add Field (column)
- Note:
- Columns must contain only letters, numbers or underscore
- Max Len: 128 chars
- Cannot use: _TABLE_, _FILE_, or _PARTITION
- Not case sensitive, “COL1” and “col1” CANNOT coexist.
- Name: “name”
- Type: String
- Mode: Nullable
- Nullable: allows field to be null
- Required: Null not allowed
- Repeated: Contains array of values of the specified type.
- Note:
- + Add Field: “gender” (string, nullable)
- + Add Field: “count” (integer, nullable)
- Advanced
- Write preference
- Write if empty
- Append to table
- Overwrite table
- Number of errors: 0
- Unknown values: [ _ ] Ignore unknown values
- When enabled, this would ignore rows with extra values in them
- Field delimiter: Comma
- Header rows to skip: 0
- Quoted newlines:
- Jagged rows: Check
- if last value is null and not specified as such ( …,””)
- and this is not checked, the row will be treated as bad data.
- Encryption: Google managed key
- Write preference
- [Create table]
- Create table from: Upload
After the table is created you can:
- View/Edit the schema
- View/Edit Details (expiration date)
- Preview – Use this!!!
- Similar to
select *but free!- You will be charged for `select *`
- Similar to
Query Charges
- How much data is stored in the tables and its partitions
- Data extracted from the query (not clear)
- Table creation is free
Limitations
- 1500 operations per table per day
- Includes the combined total of
- Appends and Truncates
- Load jobs, Copy and Query Jobs that append to or overwrite a destination table
- Example
- 600 copy jobs + 900 query jobs
- Includes the combined total of
- Max 10,000 columns in a Table, Query result or View