
Cloudera CCA-500 Cloudera Certified Administrator for Apache Hadoop (CCAH) Exam Practice Test

Page: 1 / 6
Total 60 questions

Cloudera Certified Administrator for Apache Hadoop (CCAH) Questions and Answers

Question 1

You want to understand more about how users browse your public website. For example, you want to know which pages they visit prior to placing an order. You have a server farm of 200 web servers hosting your website. Which is the most efficient process to gather these web server logs into your Hadoop cluster for analysis?

Options:

A.

Sample the web server logs from the web servers and copy them into HDFS using curl

B.

Ingest the server web logs into HDFS using Flume

C.

Channel these clickstreams into Hadoop using Hadoop Streaming

D.

Import all user clicks from your OLTP databases into Hadoop using Sqoop

E.

Write a MapReduce job with the web servers for mappers and the Hadoop cluster nodes for reducers
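Flume is the standard tool for streaming log ingestion at this scale. A minimal agent configuration sketch, tailing a web server access log into HDFS (the agent, source, channel, sink names and paths are illustrative):

```
# flume-agent.properties -- names and paths are illustrative
agent1.sources  = weblog
agent1.channels = mem
agent1.sinks    = hdfs-out

# Tail the access log (an exec source is the simplest sketch; a
# spooling-directory or taildir source is more robust in practice)
agent1.sources.weblog.type     = exec
agent1.sources.weblog.command  = tail -F /var/log/httpd/access_log
agent1.sources.weblog.channels = mem

agent1.channels.mem.type     = memory
agent1.channels.mem.capacity = 10000

agent1.sinks.hdfs-out.type                   = hdfs
agent1.sinks.hdfs-out.channel                = mem
agent1.sinks.hdfs-out.hdfs.path              = /flume/weblogs/%Y-%m-%d
agent1.sinks.hdfs-out.hdfs.fileType          = DataStream
agent1.sinks.hdfs-out.hdfs.useLocalTimeStamp = true
```

One such agent would run on each of the 200 web servers (or on a smaller tier of collector agents), all writing into the cluster.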

Question 2

Which YARN daemon or service negotiates map and reduce Containers from the Scheduler, tracking their status and monitoring their progress?

Options:

A.

NodeManager

B.

ApplicationMaster

C.

ApplicationManager

D.

ResourceManager

Question 3

You decide to create a cluster which runs HDFS in High Availability mode with automatic failover, using Quorum Storage. What is the purpose of ZooKeeper in such a configuration?

Options:

A.

It only keeps track of which NameNode is Active at any given time

B.

It monitors an NFS mount point and reports if the mount point disappears

C.

It both keeps track of which NameNode is Active at any given time, and manages the Edits file, which is a log of changes to the HDFS filesystem

D.

It only manages the Edits file, which is a log of changes to the HDFS filesystem

E.

Clients connect to ZooKeeper to determine which NameNode is Active
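In Quorum-based HA with automatic failover, ZooKeeper is used by the ZKFailoverController daemons for active-NameNode election; clients do not contact ZooKeeper, and the edit log is managed by the JournalNodes. A configuration sketch of the relevant properties (host names are illustrative):

```xml
<!-- hdfs-site.xml -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>

<!-- core-site.xml: the ZooKeeper ensemble the ZKFCs coordinate through -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```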

Question 4

You use the hadoop fs -put command to add a file "sales.txt" to HDFS. This file is small enough that it fits into a single block, which is replicated to three nodes in your cluster (with a replication factor of 3). One of the nodes holding this file (a single block) fails. How will the cluster handle the replication of the file in this situation?

Options:

A.

The file will remain under-replicated until the administrator brings that node back online

B.

The cluster will re-replicate the file the next time the system administrator reboots the NameNode daemon (as long as the file’s replication factor doesn’t fall below the configured minimum)

C.

The file will be immediately re-replicated and all other HDFS operations on the cluster will halt until the cluster’s replication values are restored

D.

The file will be re-replicated automatically after the NameNode determines it is under-replicated based on the block reports it receives from the DataNodes
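The NameNode notices missing replicas via DataNode block reports and schedules re-replication in the background; no administrator action is needed. On a live cluster you can observe this state with fsck (the file path is illustrative):

```
# Show block locations and replication for one file
hdfs fsck /user/alice/sales.txt -files -blocks -locations

# Cluster-wide summary, including the count of under-replicated blocks
hdfs fsck / | grep -i "under-replicated"
```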

Question 5

You are planning a Hadoop cluster and considering implementing 10 Gigabit Ethernet as the network fabric. Which workloads benefit the most from faster network fabric?

Options:

A.

When your workload generates a large amount of output data, significantly larger than the amount of intermediate data

B.

When your workload consumes a large amount of input data, relative to the entire capacity of HDFS

C.

When your workload consists of processor-intensive tasks

D.

When your workload generates a large amount of intermediate data, on the order of the input data itself

Question 6

Identify two features/issues that YARN is designed to address: (Choose two)

Options:

A.

Standardize on a single MapReduce API

B.

Single point of failure in the NameNode

C.

Reduce complexity of the MapReduce APIs

D.

Resource pressure on the JobTracker

E.

Ability to run frameworks other than MapReduce, such as MPI

F.

HDFS latency

Question 7

Which two are features of Hadoop’s rack topology? (Choose two)

Options:

A.

Configuration of rack awareness is accomplished using a configuration file. You cannot use a rack topology script.

B.

Hadoop gives preference to intra-rack data transfer in order to conserve bandwidth

C.

Rack location is considered in the HDFS block placement policy

D.

HDFS is rack aware but the MapReduce daemons are not

E.

Even for small clusters on a single rack, configuring rack awareness will improve performance
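Rack awareness is typically configured by pointing net.topology.script.file.name in core-site.xml at an executable that maps node addresses to rack paths. A minimal script sketch (the subnet-to-rack mapping below is invented for illustration):

```shell
#!/bin/sh
# Hadoop invokes this script with one or more hostnames/IPs as arguments
# and expects one rack path per argument on stdout.

rack_for() {
  case "$1" in
    10.1.1.*) echo "/rack1" ;;
    10.1.2.*) echo "/rack2" ;;
    *)        echo "/default-rack" ;;   # fallback for unknown nodes
  esac
}

for node in "$@"; do
  rack_for "$node"
done
```

The NameNode uses the returned rack paths when placing block replicas, which is why rack location figures in the block placement policy.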

Question 8

A slave node in your cluster has four 2 TB hard drives installed (4 x 2 TB). The DataNode is configured to store HDFS blocks on all disks. You set the value of the dfs.datanode.du.reserved parameter to 100 GB. How does this alter HDFS block storage?

Options:

A.

25GB on each hard drive may not be used to store HDFS blocks

B.

100GB on each hard drive may not be used to store HDFS blocks

C.

All hard drives may be used to store HDFS blocks as long as at least 100 GB in total is available on the node

D.

A maximum of 100 GB on each hard drive may be used to store HDFS blocks
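Note that dfs.datanode.du.reserved is applied per volume, so a 100 GB reservation is taken out of each of the four disks. A quick sanity check of the arithmetic (sizes in GB, using 1 TB = 1024 GB for illustration):

```shell
disks=4          # number of data disks on the node
disk_gb=2048     # 2 TB per disk, in GB
reserved_gb=100  # dfs.datanode.du.reserved, applied to EACH volume

usable_gb=$(( disks * (disk_gb - reserved_gb) ))
echo "Usable for HDFS blocks: ${usable_gb} GB"   # 4 * 1948 = 7792 GB
```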

Question 9

Your cluster implements HDFS High Availability (HA). Your two NameNodes are named nn01 and nn02. What occurs when you execute the command: hdfs haadmin -failover nn01 nn02?

Options:

A.

nn02 is fenced, and nn01 becomes the active NameNode

B.

nn01 is fenced, and nn02 becomes the active NameNode

C.

nn01 becomes the standby NameNode and nn02 becomes the active NameNode

D.

nn02 becomes the standby NameNode and nn01 becomes the active NameNode
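The failover subcommand names the NameNode to transition from first and the NameNode to transition to second; the first is fenced if necessary before the second becomes active. Command sketches (the service IDs nn01/nn02 come from the question):

```
# Fail over from nn01 to nn02
hdfs haadmin -failover nn01 nn02

# Verify each NameNode's resulting state
hdfs haadmin -getServiceState nn01
hdfs haadmin -getServiceState nn02
```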
