Free Google Data Engineer Assessment Test 
Assessment Test (25 Questions)
Q1: You want to use managed services when you move your machine learning operations to the Google Cloud Platform. You have been managing a Spark cluster because of your frequent usage of the MLlib library. What managed GCP service would you choose?
A. Cloud SQL B. BigQuery C. Cloud Dataproc D. Cloud CDN
Explanation: Cloud Dataproc is a fully managed, highly scalable service for running Apache Hadoop, Apache Spark, Apache Flink, Presto, and more than 30 other open source tools and frameworks.
Q2: Your organization is creating a database to store data from a product catalog. It has concluded that the database must support transactions and flexible schemas. Which service would you use?
A. Cloud Firestore B. Cloud Spanner C. Cloud Bigtable D. Cloud SDK
Explanation: Cloud Firestore is a managed document database that supports flexible schemas and transactions.
Q3: Your business has been losing market share because competitors’ e-commerce platforms offer a more personalized experience, including recommendations for products that may interest each customer. According to the CEO, your business will offer comparable services within 90 days. Which GCP service would you use to help achieve this goal?
A. Cloud Build B. Dataflow C. Cloud SQL D. AI Platform
Explanation: AI Platform is a managed service for machine learning, which is needed to generate recommendations.
Q4: Your company’s finance division has been storing data on-site. They are no longer interested in keeping up an expensive dedicated storage system. For ten years, they want to keep up to 300 TB of data. Most likely, no one will access the data at all. Additionally, they seek to reduce expenses. Which storage service would you suggest?
A. Cloud Storage for Firebase B. Cloud Storage Coldline storage C. Filestore D. Persistent Disk
Explanation: Of the options listed, Cloud Storage Coldline is the most affordable. It is designed for data that is accessed about once a quarter or less often. Archive storage, which is not among the options, is the class intended for data accessed less than once per year.
For more information about storage classes, please refer to the following link: https://cloud.google.com/storage/docs/storage-classes
Q5: You will create machine learning models using private data. Your business has a number of rules in place to safeguard sensitive data, one of which requires enhanced security for virtual machines (VMs) used to process sensitive data. Which GCP service would you turn to in order to meet this requirement?
A. Shielded VMs B. Cloud Data Loss Prevention C. Security Command Center D. Secret Manager
Explanation: Shielded VMs are Compute Engine instances with enhanced security. They are hardened by a set of security controls that help defend against rootkits and bootkits.
For more information about virtual machine instances, please refer to the following link: https://cloud.google.com/compute/docs/instances
Q6: You created a machine learning model that can recognize objects in photos. Users can upload images to your company’s mobile app and receive a list of the objects detected in them. You must put a system in place that detects when a new image is uploaded to Cloud Storage and calls the model to perform the analysis. Which GCP service would you choose for that?
A. Data Studio B. Data Catalog C. Cloud Functions D. Cloud Run
Explanation: Cloud Functions is a managed serverless service that can respond to cloud events such as the creation of a file in Cloud Storage.
For more information about Cloud Storage, please refer to the following link: https://cloud.google.com/storage
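As a rough illustration of this event-driven pattern, the sketch below shows what a Python handler for a Cloud Storage upload event might look like. It assumes the 1st gen background-function signature `(event, context)`, in which the event dictionary carries the bucket and object name. `detect_objects` is a hypothetical stand-in for the call to the deployed model, and the bucket and file names are made up.

```python
def detect_objects(bucket, name):
    """Hypothetical placeholder for the call to the image-recognition model."""
    return ["placeholder-label"]


def on_image_upload(event, context):
    """Handle a Cloud Storage 'object finalized' event (assumed event shape)."""
    bucket = event["bucket"]                     # bucket that received the upload
    name = event["name"]                         # object path inside the bucket
    content_type = event.get("contentType", "")

    # Ignore uploads that are not images.
    if not content_type.startswith("image/"):
        return None

    labels = detect_objects(bucket, name)
    print(f"gs://{bucket}/{name}: {labels}")
    return labels


# Local dry run with a hand-written event; no GCP resources are touched.
sample_event = {
    "bucket": "example-uploads",
    "name": "cat.jpg",
    "contentType": "image/jpeg",
}
result = on_image_upload(sample_event, context=None)
```

When deployed with a Cloud Storage trigger, the platform would call the handler once per uploaded object, so no polling code is needed.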
Q7: An IoT system streams data to a Cloud Pub/Sub topic for ingestion. The data is processed in a Cloud Dataflow pipeline and then written to Cloud Bigtable. Latency is growing as more data is ingested, even though the nodes are not fully utilized. What would you investigate first as a likely source of this issue?
A. During write operations, too many indexes are being changed. B. The cluster has too many nodes C. A surplus of column families D. A poorly designed row key
Answer: (D) A poorly designed row key
Explanation: A poorly designed row key can cause hotspotting, in which most reads or writes land on a small part of the key space and therefore on a few nodes, while the rest of the cluster sits idle.
For more information about hotspotting, please refer to the following link: https://cloud.google.com/blog/products/databases/hotspots-and-performance-debugging-in-cloud-bigtable
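The effect of row key design can be sketched with a toy model. The code below is not Bigtable itself: it assumes rows are kept sorted by key and split into four contiguous key ranges, then counts where new writes land for a timestamp-first key versus a device-first key. The device IDs, timestamps, and key formats are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def split_points(existing_keys, tablets):
    """Choose boundaries that split the existing sorted keys into equal ranges."""
    ordered = sorted(existing_keys)
    step = len(ordered) // tablets
    return [ordered[i * step] for i in range(1, tablets)]


def writes_per_range(new_keys, boundaries):
    """Count how many new writes fall into each contiguous key range."""
    return Counter(bisect_right(boundaries, key) for key in new_keys)


def timestamp_first(device, ts):
    return f"{ts:010d}#{device}"


def device_first(device, ts):
    return f"{device}#{ts:010d}"


devices = [f"device{d:03d}" for d in range(100)]  # hypothetical sensor IDs
old_times = range(1000, 1010)                     # readings already stored
new_times = range(2000, 2010)                     # incoming readings

results = {}
for make_key in (timestamp_first, device_first):
    existing = [make_key(d, t) for d in devices for t in old_times]
    incoming = [make_key(d, t) for d in devices for t in new_times]
    boundaries = split_points(existing, tablets=4)
    results[make_key.__name__] = dict(writes_per_range(incoming, boundaries))

print(results)
```

In this model every timestamp-first write falls into the last key range, while the device-first writes are spread evenly across all four ranges. That concentration on one range is the hotspot pattern the explanation refers to.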
Q8: A Canadian business focused on health and wellbeing has had better results than anticipated. The founders are being pushed by investors to open up new markets outside of North America. The CEO and CTO are discussing the possibility of growth in Europe. The startup’s software gathers user data and stores part of it locally on the user’s device and some of it in the cloud. What regulations must the company consider before entering the European market?
A. GDPR B. EU company law C. TEU D. EU consumer rights law
Explanation: The General Data Protection Regulation (GDPR) is a European Union law that protects the personal data of individuals in the EU. It applies to organizations that process such data, wherever they are based.
For more information about GDPR, please refer to the following link: https://gdpr.eu/what-is-gdpr/
Q9: Your organization has been gathering data on vehicle performance for the past year and currently has 500 TB of data. The company’s analysts want to examine the data to better understand performance variations between vehicle classes. The analysts are advanced SQL users, but not all of them have programming experience. If at all feasible, they would like to use a managed service to reduce administrative costs. What service would you suggest for a preliminary analysis of the data?
A. BigQuery B. Cloud Storage C. Dataflow D. Cloud CDN
Explanation: BigQuery is a SQL-compatible analytics database. It is a multicloud, serverless, cost-effective data warehouse that can assist you in turning large data into insightful business information.
Q10: An airline is moving its luggage-tracking applications to Google Cloud. There are many requirements, including strong consistency and SQL support. Users in Asia, Europe, and the United States will access the database. The database will initially hold about 50 TB of data and then grow by about 10% a year. Which managed database service would you suggest?
A. Cloud Bigtable B. Cloud Spanner C. Dataflow D. Datastream
Explanation: Cloud Spanner is a strongly consistent, globally scalable relational database that can be queried with SQL.
For more information about Google Cloud databases, please refer to the following link: https://cloud.google.com/products/databases
Q11: You are using Cloud Firestore to save information about users’ game states. The state information includes the player’s health score, a list of their possessions, and a list of their teammates. You have observed that although the database’s raw data takes up about 2 TB of space, Cloud Firestore consumes almost 5 TB. What could be using so much additional space?
A. The database cluster's nodes have incorrect configurations. B. A denormalized data model has been used. C. There are multiple indexes. D. Too many column families are in use.
Explanation: Cloud Firestore stores index entries in addition to the documents themselves, so the more indexes there are, the larger the storage footprint.
For more information regarding index types in Cloud Firestore, please refer to the following link: https://firebase.google.com/docs/firestore/query-data/index-overview
Q12: You have a BigQuery table with information on consumer purchases, such as the date of the transaction, the kind of products purchased, the product name, and a number of other descriptive features. The table holds three years’ worth of data. You tend to query data by month and then by customer. You want to scan as little data as possible. How should the table be set up?
A. Partition by customer and cluster by purchase date B. Partition by customer and cluster by product name C. Partition by purchase date and cluster by product features D. Partition by purchase date and cluster by customer
Explanation: When the table is partitioned by purchase date, each day’s data is kept in its own partition, so a query for one month reads only that month’s partitions. Clustering orders the data within each partition by customer. Together, these reduce the amount of data that must be scanned to answer a query that filters by purchase date and customer.
For more information on BigQuery tables, please refer to the following link: https://cloud.google.com/bigquery/docs/creating-clustered-tables
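The saving can be illustrated with a toy model in plain Python. This is not how BigQuery is implemented; it only assumes that a date filter lets whole partitions be skipped and that sorting by customer inside a partition lets the matching rows be located without reading the rest. The 90 days of data and 20 customers are invented for the example.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical data: 90 days, 20 customers, one purchase per customer per day.
start = date(2023, 1, 1)
rows = [
    (start + timedelta(days=d), f"cust{c:02d}")
    for d in range(90)
    for c in range(20)
]

# "Partition" by purchase date, then "cluster" each partition by customer.
partitions = defaultdict(list)
for purchase_date, customer in rows:
    partitions[purchase_date].append(customer)
for customers in partitions.values():
    customers.sort()


def rows_scanned(year, month, customer):
    """Return rows read with no pruning, with partition pruning, and with both."""
    full_scan = len(rows)
    pruned = 0
    clustered = 0
    for day, customers in partitions.items():
        if (day.year, day.month) != (year, month):
            continue  # whole partition skipped by the date filter
        pruned += len(customers)
        lo = bisect_left(customers, customer)
        hi = bisect_right(customers, customer)
        clustered += hi - lo  # only the block for this customer is read
    return full_scan, pruned, clustered


print(rows_scanned(2023, 2, "cust07"))
```

For February 2023 and one customer, the model reads 1,800 rows with no pruning, 560 with partition pruning alone, and 28 with partitioning and clustering combined.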
Q13: You are currently using Java to implement an ELT pipeline in Hadoop. You want to replace your Java applications with a managed service on GCP. Which one would you choose?
A. Cloud Dataflow B. Cloud Dataprep C. Cloud SQL D. Cloud Spanner
Explanation: Java ELT programs can be replaced with Cloud Dataflow, a managed service for stream and batch processing.
Q14: A group of lawyers has engaged you to help them sort through more than a million documents in an intellectual property dispute. The lawyers must isolate any documents related to a patent that the plaintiff claims has been infringed. You build a model using the attorneys’ 50,000 labeled example documents. When evaluated on the training data, the model does fairly well. On test data, however, it performs quite poorly. How might you try to improve its performance?
A. Validation B. Delete unnecessary data C. Regularization D. Imputation
Explanation: In this case, the model is overfitting the training data. Regularization is a set of methods that lower the chance of overfitting by penalizing model complexity.
For more information on how to prevent overfitting, please refer to the following link: https://cloud.google.com/bigquery-ml/docs/preventing-overfitting
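The question does not say which regularization method is intended. As one common example, the sketch below applies an L2 (ridge) penalty to a one-parameter linear model and shows how the penalty pulls the fitted weight toward zero. The data points and the penalty strength are made up for illustration.

```python
def fit_slope(xs, ys, l2=0.0):
    """Fit y ~ w * x by least squares with an optional L2 penalty l2 * w**2.

    Minimizing sum((y - w*x)**2) + l2 * w**2 gives the closed form
    w = sum(x*y) / (sum(x*x) + l2).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + l2)


xs = [1.0, 2.0, 3.0]  # hypothetical feature values
ys = [2.0, 4.0, 7.0]  # hypothetical targets

plain = fit_slope(xs, ys)               # no regularization
penalized = fit_slope(xs, ys, l2=5.0)   # assumed penalty strength

print(f"unregularized w = {plain:.3f}, regularized w = {penalized:.3f}")
```

A larger penalty gives a smaller weight. In models with many parameters, this shrinkage limits how closely the model can fit noise in the training set.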
Q15: Your business is shifting from an on-premises pipeline that stores data in MongoDB and ingests data using Apache Kafka. What two managed services would you suggest in their place?
A. Cloud Pub/Sub and Cloud Firestore B. Cloud Pub/Sub and Cloud Bigtable C. Cloud Bigtable and Cloud Dataprep D. Cloud Dataprep and Cloud Dataproc
Q16: A team of data scientists is using Hadoop to store and process IoT data. They have chosen to adopt GCP because managing the Hadoop cluster is taking too much time. They are especially interested in services that would let them move their machine learning workflows and models to other clouds. What service would you choose to replace their current platform?
A. Dataprep B. Dataflow C. Pub/Sub D. Cloud Dataproc
Explanation: Cloud Dataproc is a managed Hadoop and Spark service. Spark is an open-source platform that can also run in other clouds, and it includes a machine learning library called MLlib.
For more information on the machine learning library, please refer to the following link: https://spark.apache.org/docs/latest/ml-guide.html
Q17: You are analyzing datasets that you will likely use to build regression models. You will receive more datasets, so you need a workflow to transform the raw data into a form you can analyze. You also want to work with the data interactively using Python. Which GCP services would you use?
A. Cloud Dataflow and Cloud Datalab B. Cloud Dataprep and Cloud Dataproc C. Cloud Datalab and Cloud Dataflow D. Cloud Dataflow and Cloud Dataplex
Q18: You want to keep a large number of files in storage for several years. Users from all across the world will access the files often. You choose multi-regional Cloud Storage to store the data. You want users to be able to see the files in a Cloud Storage bucket and any related metadata. What role would you give to those users? (Assume you are following the principle of least privilege.)
A. roles/storage.objectViewer B. roles/storage.hmacKeyAdmin C. roles/storage.objectAdmin D. roles/storage.editor
Explanation: The roles/storage.objectViewer role allows users to view objects and their metadata without being able to change them.
For more information on IAM roles for Cloud Storage, please refer to the following link: https://cloud.google.com/storage/docs/access-control/iam-roles
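To make the role assignment concrete, the sketch below builds an IAM policy as a plain Python dictionary, using the usual `bindings` list of role and member entries. The `grant_role` helper and the group address are hypothetical, and nothing here calls a Google Cloud API.

```python
def grant_role(policy, role, member):
    """Add member to the binding for role, creating the binding if needed."""
    for binding in policy.setdefault("bindings", []):
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            return policy
    policy["bindings"].append({"role": role, "members": [member]})
    return policy


policy = {"bindings": []}
grant_role(policy, "roles/storage.objectViewer", "group:readers@example.com")
print(policy)
```

Granting only roles/storage.objectViewer keeps the binding read-only, which is what the least privilege principle in the question calls for.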
Q19: You created a deep learning neural network that performs multiclass classification. You discover that the model is overfitting. Which of the following would not be used to reduce overfitting?
A. Train with more data B. Cross-validation C. Logistic Regression D. Early stopping
For more information on creating and training models, please refer to the following link: https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create
Q20: Your firm would like to start experimenting with machine learning, but nobody in the organization has any ML expertise. The marketing division’s analysts have found some data in their relational database that they think might be helpful for training a model. How would you recommend they begin developing proof-of-concept models?
A. Vertex AI B. Dialogflow C. AutoML Tables D. Scikit-learn
Explanation: AutoML Tables is a service for building machine learning models from structured data.
Q21: Using TensorFlow, you have created a number of large deep learning networks. The models use only standard TensorFlow components. The models have been running on an n1-highcpu-64 VM, but training is taking longer than you would like. What would you use to accelerate model training?
A. TPUs B. CPUs C. Non-preemptible machines D. GPUs
Explanation: TPUs are designed specifically to accelerate TensorFlow models, so they are the ideal accelerator here.
For more information on TensorFlow models, please refer to the following link: https://cloud.google.com/ai-platform/training/docs/tensorflow-2
Q22: Your business wants to set up a data lake to keep data in its raw state for a long time. The data lake should offer access controls, almost unlimited storage, and the lowest possible cost. What GCP service would you advise?
A. Cloud Storage B. Cloud SQL C. Cloud Firestore D. Cloud SDK
Explanation: Cloud Storage is an object storage service that satisfies all the requirements. It offers highly durable, low-cost storage, with colder storage classes priced for data that is accessed infrequently.
Q23: Auditors have found that your organization’s processes for storing, processing, and sending sensitive data are insufficient. According to them, further precautions must be taken to ensure that private information, such as government-issued identification numbers, is not exposed. They advise masking or removing private information before sending it outside the organization. Which GCP service would you suggest?
A. Firebase Crashlytics B. Cloud Data Loss Prevention API C. Confidential Computing D. Endpoint Management
Explanation: The Cloud Data Loss Prevention API can detect and mask or remove many types of sensitive data, including government IDs.
For more information on Cloud Data Loss Prevention, please refer to the following link: https://cloud.google.com/dlp
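The idea of masking can be shown with a deliberately simple stand-in. The sketch below does not call the Cloud Data Loss Prevention API; it uses a regular expression to mask one assumed identifier format (a US Social Security number written as 123-45-6789). The managed service covers far more data types and formats than this.

```python
import re

# Assumed pattern: US Social Security number written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def mask_ssn(text):
    """Replace all but the last four digits of each SSN-like match."""
    return SSN_PATTERN.sub(r"***-**-\1", text)


masked = mask_ssn("Applicant SSN: 123-45-6789, phone 555-0100")
print(masked)
```

Hand-written patterns like this miss variations and produce false matches, which is the main reason to prefer a managed detection service for production data.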
Q24: As images are uploaded into Cloud Storage, Cloud Functions begins processing them. There have been spikes in the number of uploaded images in the past, and during those times a large number of Cloud Functions instances were created. What steps can you take to prevent a large number of instances from starting?
A. Use the --max-instances parameter when deploying the function. B. When executing the program, use the --max-limit parameter. C. In the resource hierarchy, configure the --max-instance parameter. D. Nothing. There is no option to limit the number of instances.
Explanation: The --max-instances parameter limits the number of function instances that can run concurrently.
For more information on using maximum instances, please refer to the following link: https://cloud.google.com/functions/docs/configuring/max-instances
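As a small sketch of where the limit is set, the code below assembles a `gcloud functions deploy` command with the `--max-instances` flag. The function name, bucket, runtime, and limit of 10 are hypothetical values chosen for illustration. The script only prints the command and does not run it.

```python
import shlex


def deploy_command(function_name, bucket, max_instances):
    """Build the argument list for deploying a storage-triggered function."""
    return [
        "gcloud", "functions", "deploy", function_name,
        "--runtime=python311",               # assumed runtime
        f"--trigger-bucket={bucket}",        # run on uploads to this bucket
        f"--max-instances={max_instances}",  # cap on concurrent instances
    ]


command = deploy_command("process-image", "example-uploads", 10)
print(shlex.join(command))
```

The printed line could be pasted into a shell, or the list could be passed to `subprocess.run` in a deployment script.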
Q25: Your company’s CTO is concerned that the maintenance costs of the business data warehouse are increasing. The data warehouse currently runs on a PostgreSQL instance. You want to move to GCP and use a managed service that lowers operational costs and can scale to as much as 3 PB to meet future demand. What kind of service would you suggest?
A. Cloud CDN B. Dataflow C. BigQuery D. Cloud SQL
Explanation: BigQuery is a managed service that scales to petabytes of storage and is ideal for data warehousing.