BQFBD – Dataset and Table Creation

  BigQuery, BigQuery for Big Data Engineers

Main Menu

Section 3: Dataset and Table Creation

12. Setup a GCP account

https://www.udemy.com/course/bigquery/learn/lecture/22719631#overview

Steps to signup for a free Google cloud setup and get $300 for 12 months in your account

  1. Go to cloud.google.com
  2. Click on “Get started for free”
  3. Enter your Gmail id and password.
  4. Accept the Terms of service and click Continue.
  5. Select account type as ‘Individual’, Tax registration as ‘Unregistered Individual’ and other mandatory address details.
  6. Enter your credit/debit card details. A very small amount (for ex ₹1 or US $0.013) will be deducted from your bank card just to ensure its authenticity which will anyhow be refunded back to your card.

13: Note

https://www.udemy.com/course/bigquery/learn/lecture/28810020#overview

Once you open BigQuery, you may see a different UI for the initial screen. To get the relevant UI

  1. Click on “DISABLE EDITOR TABS”
  2. Click on “Opt Out”

 

14. Create a project

 

15. BigQuery UI Tour

https://www.udemy.com/course/bigquery/learn/lecture/22718867#overview

  • Queries are a type of JOB
    • Can be saved
  • Jobs
    • Load, export, copy and query jobs
    • saved for 6 months
  • Transfer
    • Import data
  • Scheduled Queries
  • Reservations
    • no info
  • BI Engine
    • no info
  • Resources
    • Pinned projects
    • Add Data
      • Public Data Sets for hands on…
        • https://console.cloud.google.com/bigquery?project=bigquery-public-data&page=project
        • Pin the project
        • Select your project from the top menu
        • All set!

16. Region vs Multi-region

https://www.udemy.com/course/bigquery/learn/lecture/22718881#overview

  • Data is not directly stored into tables.
  • Dataset (Top level container)
    • Meta Data
      • Zone
      • Encryption keys
  • After the dataset has been create, you can add tables.
    • Table1
    • Table2…
  • Dataset Names
    • 1024 characters (aA,09,_)
      • No spaces, hyphens, special chars
    • Case sensitive
      • DATASET1 and dataset1 can coexist

How to Create a Dataset

  • Setup Billing
    • [ Upgrade ]
      • Won’t be billed, but not setting this up will restrict access to some features.
  • Select the Project > Create Dataset
    • Provide unique name
    • Select Location (US by default)
      • Regions can ONLY be selected at time of creation!

Regions vs Multi-Region

  • Region is a specific Geographic place.
    • Examples
      • Las Vegas
      • Tokyo
    • Select a region close to you
  • Multi-Regons are large geographic locations with multiple regions
    • Examples
      • United States
      • European Union
    • Use multi-region when you know your data is going to be access from overall US or overall EU.
    • This will make copies of your data to multiple regions.
      • Stored in one region but backed up to multiple regions
      • Adds Resiliency
      • Adds Cost
  • Select a region carefully
    • Can affect cost
    • Can affect performance
  • If using data NOT in BigQuery (storage bucket?)
    • Dataset MUST be in the same region as the external data.

17. Create a Dataset

https://www.udemy.com/course/bigquery/learn/lecture/22718883#questions

  • Can create any number of Datasets in a project
    • > 1000 Dashboard performance can degrade
  • Unlimited tables
    • ~ 50,000 enumerating slows
    • This is the max the UI will display
  • Max 2500 authorized views to a dataset’s control list

18. Create a Table

https://www.udemy.com/course/bigquery/learn/lecture/22718891#questions

  • Download the names.zip file and extract.
    • Name, Gender, SS-last 4
  • Select Dataset > Create Table
    • Create table from: Upload
      • You do not need to first create a table for uploaded data.
    • Select file
      • < 10MB
      • < 16,000 rows
      • Multiple files should be uploaded individually
    • File Format: CSV
    • Destination
      • Search for a project > Select project name
      • Table type
        • Native: Stored in BigQuery
        • External: Backed by storage external to BigQuery, like Cloud storage
    • Table name: “names”
    • Schema
      • Manual (Do not select Auto)
      • + Add Field (column)
        • Note:
          • Columns must contain only letters, numbers or underscore
          • Max Len: 128 chars
          • Cannot use: _TABLE_, _FILE_, or _PARTITION
          • Not case sensitive, “COL1” and “col1” CANNOT coexist.
        • Name: “name”
          • Type: String
          • Mode: Nullable
            • Nullable: allows field to be null
            • Required: Null not allowed
            • Repeated: Contains array of values of the specified type.
      • + Add Field: “gender” (string, nullable)
      • + Add Field: “count” (integer, nullable)
    • Advanced
      • Write preference
        • Write if empty
        • Append to table
        • Overwrite table
      • Number of errors: 0
      • Unknown values: [ _ ] Ignore unknown values
        • When enabled, this would ignore rows with extra values in them
      • Field delimiter: Comma
      • Header rows to skip: 0
      • Quoted newlines:
      • Jagged rows: Check
        • if last value is null and not specified as such ( …,””)
        • and this is not checked, the row will be treated as bad data.
      • Encryption: Google managed key
    • [Create table]

After the table is created you can:

  • View/Edit the schema
  • View/Edit Details (expiration date)
  • Preview – Use this!!!
    • Similar to select * but free!
      • You will be charged for `select *`

Query Charges

  • How much data is stored in the tables and its partitions
  • Data extracted from the query (not clear)
  • Table creation is free

Limitations

  • 1500 operations per table per day
    • Includes the combined total of
      • Appends and Truncates
      • Load jobs, Copy and Query Jobs that append to or overwrite a destination table
      • Example
        • 600 copy jobs + 900 query jobs
  • Max 10,000 columns in a Table, Query result or View

LEAVE A COMMENT