Can Containers Ease Cassandra Management Challenges?


This is a repost of my original blog post on the Robin Systems website.

Container-based virtualization and microservice architecture have taken the world by storm. Applications with a microservice architecture consist of a set of narrowly focused, independently deployable services, each of which is expected to fail. The advantages are increased agility and resilience: agility, because individual services can be updated and redeployed in isolation; resilience, because the distributed nature of microservices means they can be deployed across different platforms and infrastructures, and developers are forced to think about resilience from the ground up instead of as an afterthought. These are the defining principles of large web-scale and distributed applications, and web companies like Netflix, Twitter, Amazon, and Google have benefitted significantly from this paradigm.

Add containers to the mix. Containers are fast to deploy, allow bundling of all dependencies required for the application (breaking out of dependency hell), and are portable, which means you can truly write your application once and deploy it anywhere. Together, microservice architecture and containers make applications faster to build and easier to maintain, while delivering overall higher quality.

Image borrowed from Martin Fowler’s excellent blog:  http://martinfowler.com/articles/microservices.html

A major change forced by microservice architecture is decentralization of data. This means, unlike monolithic applications which prefer a single logical database for persistent data, microservices prefer letting each service manage its own database, either different instances of the same database technology, or entirely different database systems.

Unfortunately, databases are complex beasts: they have a strong dependence on storage, need customized solutions for HA, DR, and scaling, and if not tuned correctly will directly impact application performance. Consequently, the container ecosystem has largely ignored the heart of most applications: storage. This limits the benefits of container-based microservices, given the inability to containerize stateful, data-heavy services like databases.

The majority of container ecosystem vendors have focused on stateless applications. Why? Stateless applications are easy to deploy and manage; for example, they can respond to events by adding or removing instances of a service without needing to significantly change or reconfigure the application. For stateful applications, most container ecosystem vendors have focused on orchestration, which only solves the problem of deployment and scale, while existing storage vendors have tried to retrofit their current solutions for containers via volume plug-ins to orchestration solutions. Unfortunately, this is not sufficient.

To dive deeper into this, let’s take the example of Cassandra, a modern NoSQL database, and look at the scope of management challenges that need to be addressed.

Cassandra Management Challenges

While poor schema design and query performance remain the most prevalent problems, they are rather application and use case specific, and require an experienced database administrator to resolve. In fact, I would say most Cassandra admins, or any DBA for that matter, enjoy this task and pride themselves on being good at it.

The management tasks which database admins would rather avoid, or have automated, are:

  1. Low utilization and lack of consolidation
  2. Complex cluster lifecycle management
  3. Manual & cumbersome data management
  4. Costly scaling

 

Let’s look at these one by one.

1. Low utilization and lack of consolidation

Cassandra clusters are typically created per use case or SLA (read intensive, write intensive). In fact, the common practice is to give each team its own cluster. This would be an acceptable practice if clusters weren't deployed on dedicated physical servers. In order to avoid performance and noisy neighbor issues, most enterprises stay away from virtual machines. This unfortunately means that the underlying hardware has to be sized for peak workloads, leaving large amounts of spare capacity and idle hardware due to varying load profiles.

All this leads to poor utilization of infrastructure and very low consolidation ratios. This is a big issue for enterprises both on-premises and in the cloud.

Underutilized servers == Wasted money.

2. Complex Cluster Lifecycle Management

Given the need for physical infrastructure (compute, network, and storage), provisioning Cassandra clusters on premises can be time consuming and tedious. The hardest thing about this activity is estimating the read and write performance that will be delivered by the designed configuration, and hence it often involves extensive schema design and performance testing on individual nodes.

Besides initial deployment, enterprises also have to cater to node failures. Failures are the norm and have to be planned for from the get-go. Node failures can be temporary or permanent and can be caused by various factors – hardware faults, power failure, natural disaster, operator errors, etc. While Cassandra is designed to withstand node failures, a failure still has to be resolved by adding replacement nodes, and it places additional load on the remaining nodes to rebalance data – once after the failure and again after the new nodes are added, as the steps and the short sketch below illustrate.

cassandra-nodefailure

1. Node A fails

2. Other nodes take on the load of node A

3. Node A is replaced with A1

4. Other nodes are loaded again as they stream data to node A1
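For context, here is roughly what steps 3 and 4 look like operationally (the IP address of the failed node and the file paths are purely illustrative); the replacement node bootstraps by streaming data from the surviving replicas, which is the extra load described in step 4:

# on the replacement node A1, before its first startup, point it at dead node A
echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.11"' >> /etc/cassandra/cassandra-env.sh
sudo service cassandra start

# from any surviving node, watch the ring state and the streaming progress
nodetool status
nodetool netstats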

 

3. Manual Data Management

Unlike traditional databases like Oracle, Cassandra does not come with utilities that automatically back up the database. Cassandra offers backups in the form of snapshots and incremental copies, but they are quite limited in features. The most notable limitations of snapshots are listed below (a short command-level illustration follows the list):

  • Use hard-links to store point-in-time-copies
  • Use the same disk as data files (compaction makes this worse)
  • Are per node
  • Do not include Schema backup
  • No support for off-site relocation
  • Have to be removed manually
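To make these limitations concrete, here is a rough sketch of a manual snapshot cycle on a single node (the keyspace name, snapshot tag, and data directory are illustrative); every step has to be repeated on each node, and the schema export and any off-site copy have to be scripted separately:

# point-in-time snapshot of one keyspace, stored as hard-links on this node
nodetool snapshot -t nightly_20160801 my_keyspace

# snapshots live under the same data directory (and disk) as the live SSTables
ls /var/lib/cassandra/data/my_keyspace/*/snapshots/nightly_20160801

# the schema is not part of the snapshot and must be exported separately
cqlsh -e "DESCRIBE KEYSPACE my_keyspace" > my_keyspace_schema.cql

# snapshots are never cleaned up automatically
nodetool clearsnapshot -t nightly_20160801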

Similarly, data recovery is fairly involved. Data recovery can be performed for two reasons:

  1. to recover database from failures. For example, in case of data corruption, loss of data due to an incorrect `truncate table`, etc
  2. to create a copy of it for other uses. For example, to create a clone for dev/test environments, to test schema changes, etc.

Typical steps to recover a node from data failures

cassandra-restore

In order to optimize the space used for backups, most enterprises will retain the last 2 or 3 backups on the server but move the rest to a remote location. This means that, based on the data sample needed, you may be able to restore locally on the server or may have to move files around from a remote source.

While the Datastax Enterprise edition does provide a notion of scheduled backups via OpsCenter, it still involves careful planning and execution.

4. Costly Scaling

With Cassandra’s ability to scale linearly, most administrators are quite accustomed to adding nodes (or scale out) to expand the size of clusters. With each node you gain additional processing power and data capacity. But while node addition is required to cater to steady increase in database usage, how does one handle transient spikes?

Let’s look at a scenario. Typically, once a year, most retail enterprises go through the planning frenzy for Thanksgiving. Unfortunately, after that event, the majority of the infrastructure sits idle or requires administrators to break up the cluster and repurpose it for other uses. Wouldn’t it be interesting if there was a way to simply add/remove resources dynamically and scale the cluster up/down based on transient load demands?

Summary

Many enterprises have experimented with Docker containers and open source orchestrators like Mesos and Kubernetes, but they soon discover that these tools, along with their basic storage support in volume plugins, only solve the problem of deployment and scale; they are unable to address challenges with container failover, data and performance management, and transient workloads. In comes Robin Systems.

Robin is a container-based, application-centric, server and storage virtualization platform software which turns commodity hardware into a high-performance, elastic, and agile application/database consolidation platform. In particular, Robin is ideally suited for data applications such as databases and big data clusters, as it provides most of the benefits of hypervisor-based virtualization but with bare-metal performance (up to 40% better than VMs) and application-level IO resource management capabilities such as minimum IOPS guarantees and max IOPS caps. Robin also dramatically simplifies data lifecycle management with features such as 1-click database snapshots, clones, and time travel.

Join us for a webinar on Thursday, August 25 10am Pacific / 1pm Eastern to learn about how Robin’s converged, software-based infrastructure platform can make Cassandra management a breeze by addressing all aforementioned challenges.

 

Understanding Flash: Summary – NAND Flash Is A Royal Pain In The …

flashdba

chaos-order

So this is it – the last article in my mini-series on understanding flash. This is the bit where I draw it all together in a neat conclusion that makes you think, “Yes! That was worth reading”. No pressure eh?

So let me start with the conclusion first: as a storage medium, NAND flash is a royal pain in the ass.

Chaos

Why? Well, let’s look back at what we’ve learned in the previous 9 articles:


Oracle Private Database Cloud: Understanding the Resource Model


In public cloud parlance, consumers never provision to a specific server; instead they provision to a pool of infrastructure within a geographical region or data center. You will find this pattern in AWS, where you select Availability Zones, or in Oracle Cloud, where you select the desired data center.

EM12c’s private database cloud follows a similar paradigm. It offers a two-tier hierarchy of PaaS Infrastructure Zones and Software Pools.

PaaS Infrastructure Zone (or Zone)

A Zone is a logical grouping of cloud infrastructure resources (like servers, network, storage, etc) based on QoS, functional, departmental, or geographic boundaries. For example, Finance Zone, East Coast Zone, etc. Cloud users or consumers provision into a Zone. A Zone is also used to enforce access control and chargeback/showback.

Database Software Pool (or Pool)

A  group of homogeneous clustered or non-clustered database resources exhibiting common characteristics. For example,
– Pool of 11.2 Database Oracle Homes (for dedicated databases)
– Pool of 12c Container Databases (for PDBs)

Okay, now that we have covered some theory, let's take an example and walk through it. Our goal is to offer a database service that is highly available and redundant across multiple data centers. The image below captures the situation well.

cloud_serv_cat

So let's get modeling.

A. Modeling Zones

1. We have two data centers – say one on the East Coast and the other on the West Coast. With this information, we could model two Zones based on the location dimension – East Coast Zone and West Coast Zone. As easy as this sounds, I don't think putting all your hardware resources in a single grouping makes much sense.

2. It's likely that we have servers from different vendors, with different architectures, etc. Assume we own a few Exadata machines, some commodity servers, some SPARC servers, some VMs, etc. To accommodate this, we can update our model with the hardware dimension – East Coast-Exadata Zone, East Coast-Commodity Zone, East Coast-Sparc Zone, etc.

3. Now typically, hardware is rolled out to host applications, databases, etc. Applications have a lifecycle: they start in the development environment, then move to test, stage, performance, and finally production. Separate hardware is allocated for each of these environments, each with different characteristics – performance, cost, redundancy, etc. Again, we can update our model with this new lifecycle dimension – East Coast-Exadata-Development Zone, East Coast-Exadata-Production Zone, etc.

It is important to note that all of the above dimensions are derived based on my experience with a bunch of customers. You are not required to use all of them, and it always helps to keep things simple. Let's look at pools next.

B. Modeling Pools

Pools are more software and platform centric and thus can be modeled based on various dimensions. Common dimensions are:

  • Service Type: EM supports 3 service types – database, schema, and pluggable databases
  • Version: This is the database software / Oracle home version
  • Platform: This corresponds to operating systems like linux x86, solaris, hp-ux, etc
  • High Availability: This represents if the infrastructure is RAC or SI (single instance)
  • Disaster Recovery: Indicates if the pool will be used to host standby databases or just primary
So the naming format for Pools could be like this: 
<Service_Type>-<Version>-<Platform>-<HA>-<DR> 

Some examples: 
DB-11204-linux-RAC, DB-11204-linux-SI-STANDBY, PDB-12102-RAC, etc

Again, as I said before, these are mere suggestions, and it is really up to you to decide which works best for you.

Let's come back to our example from above. If I try to implement the resource model so as to deploy a highly available and redundant database service, it would look something like this:

cloud-res-model

The zone name should indicate its composition. In the pool name, I skip the platform part since I am using Exadata and hence assume Linux automatically. If you are using VMs or commodity hardware, you may want to specify the platform as well. So my pools are composed of GI clusters deployed on Exadata compute nodes, with the DB Oracle home pre-provisioned. Note that a pool can contain multiple clusters. At the time of provisioning, the placement algorithm ensures that the requested database is created on the cluster with the least utilized nodes. This is the power of automation provided by EM 12c.

In summary, with the EM12c resource model, cloud providers have the ability to organize and manage their infrastructure the way they would like, while keeping the consumer experience simple and intuitive.

References:

Screenwatch: Build Service Catalogs with EM12c DBaaS

Oracle Private Database Cloud: Defining Database Sizes in the Service Catalog


The latest release of the cloud plug-in (part of EM12c R4 Plug-in Update 1) brings the ability to define sizes for database cloud services (schema and PDB services already support definition of size). Prior to this, customers were required to define a new template for each size – small, medium, large, etc. This helps significantly reduce the number of templates required.

So let's see how to use them.

1. The CRUD operations for database size are performed via emcli verbs. To create a new size, we run:

./emcli create_database_size -name=Tiny -description="tiny size" 
        -attributes="cpu:2;storage:20;memory:2"

The 4 attributes supported for a size are cpu, memory, processes, and storage. All attributes are optional; at a minimum, a name and description need to be provided.

To list all sizes, we run:

./emcli list_database_sizes
 ____________________________________________
 Name:tiny
 Description:tiny size
 CPU(cores):2
 Memory(GB):2
 Processes(COUNT):Not Specified
 Storage(GB):20
 ____________________________________________

Had this size been assigned to any service templates, we could have provided the -details flag and gotten that list as well.

Editing this size is equally easy. We run:

./emcli update_database_size -name=tiny 
        -description="Tiny database size" 
        -attributes="storage:remove;processes:500"
 ____________________________________________
 Name:tiny
 Description:Tiny database size
 CPU(cores):2
 Memory(GB):2
 Processes(COUNT):500
 Storage(GB):Not Specified
 ____________________________________________

A few things to notice:

  • name is the only fixed identifier, both attributes and description can be changed
  • to remove an attribute, set its value to remove
  • to set new attribute values, or change existing one, simply specify the new values

Finally, to delete a size, we run:

./emcli delete_database_size -name=tiny
 Are you sure you want to to delete?(yes/no)
 yes
 Database size tiny has successfully been deleted

Since delete is a destructive operation, you get the “Are you sure?” prompt.

2. Once sizes are defined, we can see them while creating or editing service templates. Note that by default, no sizes are attached with the service template. The administrator needs to explicitly associate a size with the template, as shown in the screenshot below.

st-size

3. Once the association is complete and the template is saved, any cloud consumer accessing the template via the cloud portal will see size as an additional input.

request-size

If you are using REST APIs, then database size will have to be provided as an additional input to the POST request.

A commonly asked question is: how are these values enforced? The answer depends on the type of attribute (a small sketch of the effect follows the list):

  • cpu – this translates to cpu_count and is set as the DB init parameter
  • memory – this translates to memory_target and set as DB init parameter (This is a good time to mention that the memory attribute is only supported for DB version 11g and above. Other attributes work for 10g and above.)
  • storage – this is used only for monitoring purposes, and not enforced. So at the time of provisioning we ensure that the storage requirements for the service are within the quota limits of the consumer, but these limits are not set on the database.
  • processes – this is set as DB init parameter
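For reference, the effect of the cpu, memory, and processes attributes on the newly provisioned database is roughly equivalent to setting the corresponding initialization parameters yourself. Here is a sketch using the values from the 'tiny' size defined earlier; this shows the effect, not the exact mechanism EM uses:

ALTER SYSTEM SET cpu_count=2 SCOPE=BOTH;
ALTER SYSTEM SET memory_target=2G SCOPE=SPFILE;   -- 11g and above, as noted above
ALTER SYSTEM SET processes=500 SCOPE=SPFILE;      -- static parameter, takes effect on restart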

In summary, the ability to define size as an external entity will help drastically reduce the number of templates required to be defined by the cloud administrators.

Additional Resources:

EMCLI Documentation

Discover and Promote Oracle Homes as EM Targets


Typically, Oracle Homes are discovered and promoted as targets automatically as part of the guided flows for the addition of primary targets like databases, WebLogic domains, etc, but there might be instances (not very often) where you need to discover the Homes standalone.

There are two ways to do this – from the GUI and using EMCLI verbs

A. From GUI

The steps are as follows:

  1. Go to the Enterprise->Job->Activity menu item
  2. Here select the job type ‘Discover Promote Oracle Home Target’ and click Go
  3. Provide the obvious inputs like name, list of hosts, etc, but the most important tab is that of ‘Parameters’. Here you are required to provide 3 inputs:
    1. Path to Oracle Home/Inventory/Composite Home/Middleware Home you want to manage
    2. The type of entity you want to manage. Your options are – Oracle Home, Inventory, Composite Home, or Middleware Home
    3. The action – discover and promote, or just discover [I almost always select the former]

discover_oh

That’s it. All remaining tabs are optional. Once the job is submitted, it usually takes a few seconds to complete. The output of the job clearly lists all the Oracle Homes discovered and the target names created for these Homes in EM.

B. Using EMCLI

1. First we describe the job type to get the list of required inputs

./emcli describe_job_type 
        -job_type=discoverPromoteOHTargets  > inputs.prop

If the explanation of the required fields is not sufficient, you can additionally pass the -verbose flag to get more details.

Now update the inputs.prop file with the relevant values. My file looks like this (the values I changed are name, target_list, and the variable.* parameters):

# Description: (Optional) The user specified name of the job
name=promote_OH

# Description: (Optional) The job type for this job
type=discoverPromoteOHTargets

# Description: (Optional) The user specified description of the job
description=

# Description: The job owner. The job owner is the user who creates the job.
# Default: the logged in user
# The job owner information displayed here is for documentation only 
# and user is not expected to change it.
owner=

# Description: (Optional) The kind of job
# Legal Values: active, library
kind=active


# Fill in the target list before submitting.
# For Example:
#     target_list=MyTarget:cluster
target_list=abc.example.com:host

# Description: The type of targets to use for this job
targetType=host

# Description: (Required) Enter the action you want to perform.
# To run only discovery on the target, use : disc.To run discovery 
# and promotion on target to managed status, use : promote
variable.loc_action=promote

# Description: (Required) Enter the type of search to be performed. 
# All the homes in the Inventory/Middleware Home will be managed.
# For discovering Oracle Home, use : oh.For discovering Oracle Home's
# in inventory , use : inv.For discovering Oracle Home's in 
# middleware home, use : mwh
variable.loc_type=inv

# Description: (Optional) Enter Path to Oracle Home/Inventory/
# Composite Home/Middleware Home you want to manage.
variable.location=/u01/foo/oraInventory

# Description: (Optional) Notify the job owner when a selected state occurs
# Allowed Values:  SCHEDULED, RUNNING, ACTION_REQUIRED, SUSPENDED, SUCCEEDED, PROBLEMS
notification=

Note, I submit my job against a single host target, but you can provide a long comma-separated list of <target_name>:<target_type>. Now we submit the job by passing the above inputs.prop file as input.

emcli create_job -input_file=property_file:inputs.prop
Creation of job "PROMOTE_OH" was successful.

Since the EM12c job system is asynchronous, the emcli verb will submit the job and return control almost instantly. We need to run a different set of emcli verbs to get the job progress.

emcli get_jobs -name=PROMOTE_OH
Name        Type                      Job ID                            Execution ID
      Scheduled            Completed            TZ Offset  Status     Status ID  Owner   Target Type
  Target Name
PROMOTE_OH  discoverPromoteOHTargets  10FBD71FF8C70205E050F00A07B46726  10FBD71FF8C90205E050F00A07B46726  
2015-03-10 22:34:33  2015-03-10 22:34:37  GMT-07:00  Succeeded  5          SYSMAN  host
  abc.example.com

The output is ill-formatted, but the only 2 fields we care about are Execution ID and Status. Since this is a fairly quick job, the status is shown as Succeeded. If I wanted to view the output, I would run another emcli command, and for this we need the execution ID.

emcli get_job_execution_detail 
      -execution=10FBD71FF8C90205E050F00A07B46726 -xml -showOutput
<?xml version = '1.0' encoding = 'UTF-8'?>
<jobExecution jobOwner="SYSMAN" status="5" startTime="2015-03-11 05:34:34.0" id="10FBD71FF8C90205E05
0F00A07B46726" jobName="PROMOTE_OH" statusBucket="-5">
   <TargetList>
      <target name="abc.example.com" type="host" hostName="abc.example.com"/>
   </TargetList>
   <steps>
      <step command="DiscoverAndPromoteOH" status="5" name="RunCustomDiscovery" startTime="2015.03.1
0 22:34:34" endTime="2015.03.10 22:34:37" timezoneRegion="-07:00" stepId="825682" stepType="1" jobTy
pe="discoverPromoteOHTargets" stepNlsId="discoverPromoteOHTargets_RunCustomDiscovery" stepDefaultNam
e="RunCustomDiscovery" target="">
         <stepOutput>
            <output>Discovered Oracle Home Target 'OraDB12Home1_11_abc.example.com' with home
 location - /u01/db12/product/12.1.0/dbhome_1 in Inventory /u01/foo/oraInventory.
Successfully created 1 new Oracle Home Targets.
Succesfully added discovered homes matching given criteria.</output>
         </stepOutput>
      </step>
   </steps>
</jobExecution>

The output clearly states the outcome of the job, and if I check my All Targets page, I will find the new Oracle Home target.

In summary, while there may not be many reasons to just discover and promote standalone Oracle Homes, if you ever need to do it, this blog tells you how to do it both via the GUI and EMCLI.

Understanding Agent Resynchronization


Agent Resynchronization (resync) is an important topic but often misunderstood or misused. In this Q&A styled blog, I discuss how and when it is appropriate to use agent resynchronization.

What is Agent Resynchronization?

The Management Agent can be reconfigured using target information present in the Management Repository. Resynchronization pushes all targets and related information from the Management Repository to the Management Agent and then unblocks the Agent.

 Why do agents need to be re-synchronized?

There are two primary reasons why you may need to use agent resynchronization:

1. Agent is blocked

An agent is blocked whenever it is out of sync with the repository. This can typically happen due to a corrupt targets.xml, missing files and directories, or bugs in the code (they are rare but a few do exist 🙂) that can leave the plug-in inventories in a strange state. In this condition, the OMS rejects all heartbeat or upload requests from the blocked agent. This means the blocked agent will not be able to upload any alerts or metric data to the OMS, but it does continue to collect monitoring data. This is useful, as once the agent is resynchronized, no monitoring data is lost.

2. Agent is lost and has to be reinstalled

This could be considered a special case of the agent-blocked condition, but it is worth discussing separately. If an agent host or file system is ever lost, the recommended way to reinstall it is by cloning from a reference install. This not only recovers the agent, but avoids having to track and reapply customizations and patches.

Note: It is important to retain the same port when reinstalling the agent.

Agent resync, when run on a reinstalled agent, reconfigures it using target information present in the repository. The OMS detects that the agent has been re-installed and blocks it temporarily to prevent the auto-discovered targets in the re-installed agent from overwriting previous customizations.

Note: NEVER, NEVER, combine agent recovery with upgrade! If you lose your agent, recover it first using the original version, and then upgrade it to the new release.

Which interfaces are available for this operation?

There are two interfaces that will allow you to perform agent resync.

1. The Enterprise Manager Console

a. Navigate to Setup->Manage Cloud Control->Agents to view list of all agents

b. Select the desired agent and visit its home page

c. Finally, select the ‘Resynchronization…’ option from the agent menu

Agent Resynchronization Menu Item

2. EMCLI

The agent can also be resynchronized via EMCLI. The command is as follows:

>> emcli resyncAgent -agent="Agent Host:Port"

How long does it take to resynchronize an agent?

This is a popular question, but unfortunately there is no straight answer for it. The time for resynchronization depends on the amount of data stored in the repository about the agent. When this action is invoked, the OMS does not consult the agent – it just asks the agent to delete everything first, and then pushes the known state to it. The majority of the time is spent pushing the plug-in content, so the more plug-ins deployed to the agent, the longer it takes. Metric Extensions and Configuration Extensions deployed to the agent also contribute to the time.

Additional  Resources:

Upgrading  Oracle Management Agents

Back Up and Recover Enterprise Manager

EMCLI with Scripting Option: Embedded Jython Interpreter


If you have been using the classic Oracle Enterprise Manager command line interface (EMCLI), you are in for a treat. Oracle Enterprise Manager 12c R3 comes with a new EMCLI kit called 'EMCLI with Scripting Option'. Not my favorite name, as I would have preferred to call this EMSHELL, since it truly provides a shell similar to bash or cshell. Unlike the classic EMCLI, this new kit provides a Jython-based scripting environment along with the large collection of verbs to use. This scripting environment enables users to use established programming language constructs like loops (for, while) and conditional statements (if-else) in both interactive and scripting mode.

Benefits of ‘EMCLI with Scripting Option’

Some of the key benefits of the new EMCLI are:

  • Jython based scripting environment
  • Interactive and scripting mode
  • Standardized output format using JSON
  • Can connect to any EM environment (no need to run EMCLI setup …)
  • Stateless communication with OMS (no user data is stored with the client)
  • Generic list function for EM resources
  • Ability to run user-defined SQL queries to access published repository views

Before we go any further, there are two topics that warrant some discussion – Jython and JSON.

Jython

Jython is the Java implementation of the Python programming language. I have been working with Python (or CPython) and Jython for the last 10 years, and to me it is the best scripting language ever. It is fun and easy to learn, the syntax is simple, it is self-formatted, and it is dynamically typed. This comic from XKCD summarizes it best:

Python

There are numerous tutorials for Python/Jython on the web, so feel free to pick any one you like; just remember that the Jython version supported by the new kit is v2.5.1.

JSON

JSON stands for JavaScript Object Notation. It is a data interchange format, much like XML, which is easier to read and write for both humans and machines, but unlike XML it contains very little metadata (elements and attribute names). JSON format is quite simple; it basically represents data as a collection of name/value pairs. These pairs can be contained within arrays, lists, or maps. Here is a sample:

{"menu": {
    "id": "file",
    "value": "File",
    "popup": {
    "menuitem": [
        {"value": "New", "onclick": "CreateNewDoc()"},
        {"value": "Open", "onclick": "OpenDoc()"},
        {"value": "Close", "onclick": "CloseDoc()"}
        ]
    }
}}

JSON is quite popular. You will often find it used with REST based web services APIs or even with some modern databases like MongoDB. Most programming languages provide libraries to work with JSON.

The EMCLI kit uses JSON as its output format as well. Many of the verbs return output in JSON format for ease of programmatic use. I say many, since there are still some verbs that don't, but this is only a matter of time.

Now let’s get back to EMCLI.

Steps to setup the kit for ‘EMCLI with Scripting Option’

1. To download the new EMCLI kit, go to Setup->Command Line Interface. Here you will notice the new section for ‘EMCLI with Scripting Option’. Click on the link to download the kit to your desktop or desired server.

Download

You can also download the kit directly from the following url:

http(s)://<host>:<port>/em/public_lib_download/emcli/kit/emcliadvancedkit.jar

2. Copy the kit (emcliadvancedkit.jar) to a directory where you wish to install EMCLI

kit

3. To install, run the following command. Note that we need the Java version to be 1.6.0_43 or greater.

java -jar emcliadvancedkit.jar client -install_dir=<emcli_client_dir>

Verify Java version

4. The last step to complete the setup is to run ‘sync’. Before using EMCLI you have to connect to the OMS to install all verb-related command line help. In classic EMCLI, this happens automatically when you run the ‘setup’ command. But in the new EMCLI, since we do not run setup, we run the ‘sync’ command instead.

The ‘sync’ verb now accepts some additional arguments. Run the following command:

emcli sync -url=http(s)://<host>:<port>/em -username=<user> -trustall

It will prompt for the user password and then take a few minutes to download and install all the help content.

emcli sync

5. Now confirm the setup with a simple test. We do this using the interactive mode. Just run 'emcli', and once you see the prompt, run 'help()'. This will list all verbs along with their descriptions.

emcli interactive mode

With the setup complete, let’s now have some fun.

Using the ‘EMCLI with Scripting Option’

Connect to the interactive mode by running ‘emcli’ from the command prompt. Now try the following commands:

1. Basic Jython: Since EMCLI is built using the Jython interpreter, you can run
Jython commands at the EMCLI prompt. For example, you can try the following:

>>1+2

>>print "Hello Jython"

>>mylist = [1,2,3]

>>print mylist

Jython test

2. EMCLI Status: Next, print the status of the EMCLI session using the ‘status()’ command.

emcli status

You will notice that the EM URL and user are not set. To set them, we have to set the client properties. Run help('client_properties') for more details.

client properties

The help text instructs us to set the client properties to connect to a specific EM environment. The 4 properties of interest to us are the following:

EMCLI_OMS_URL      – The EM URL

EMCLI_USERNAME     – The EM user to connect as. We will use the login() function to set this.

EMCLI_TRUSTALL     – I like to set this to true, but the default is false.

EMCLI_OUTPUT_TYPE  – I like to set this to JSON even for interactive mode.

To set these properties run the following:

>>set_client_property('EMCLI_OMS_URL','http(s)://<host>:<port>/em')

>>set_client_property('EMCLI_TRUSTALL','TRUE')

>>set_client_property('EMCLI_OUTPUT_TYPE','JSON')

>>login(username="<em_user>",password="<password>")

You should see the message on successful login. Now we are connected to EM.

login

3. Understanding help and verb invocations: Most of the help text presented in EMCLI is tailored towards the classic interface. Since Jython is a programming language, verb invocations are done in the function form. There is a simple mechanism for converting the classic invocation format for use in both interactive and scripting mode. Let’s use the login() verb as an example.

The EMCLI help for login is as follows:

>>help('login')

emcli login

   -username=<EM Console Username>

   [-password=<EM Console Password>]

   [-force]

This means, when using classic EMCLI, you would invoke it as follows:

emcli login -username="foo" -password="bar" -force

Instead, in the interactive or script mode, the invocation would look like:

login(username="<em_user>",password="<password>",force=True)

Essentially, all verbs are now functions, and all arguments to the verb are now parameters passed to the function. Since the -force argument does not take any value, it is treated as a Boolean in Jython and takes the values True or False.

Note: The -force parameter in the login() function is not applicable to the interactive or script mode, but is being used in this example to explain the concept of passing Boolean values. Again, you should never use the -force parameter in the interactive or script mode.

Another such conversion that you may come across is for lists of values. In classic EMCLI, some verbs ask for the same attribute to be repeated with varying values to represent a list. For example:

emcli grant_privs -name='jan.doe'
         -privilege="USE_ANY_BEACON"
         -privilege="FULL_TARGET;TARGET_NAME=host1.acme.com:TARGET_TYPE=host"

In interactive or script mode, you can use native Jython lists instead and pass them as parameters. In Jython, lists are represented within square brackets ([]).

>>priv_list = ['USE_ANY_BEACON','FULL_TARGET;TARGET_NAME=host1.acme.com:TARGET_TYPE=host']
>>grant_privs(name='jan.doe',privilege=priv_list)

4. Sample Use Case: Let's take a very simple use case to demonstrate the interaction with EMCLI in the interactive mode. Our sample use case is: "list all targets of type oracle_database whose name starts with the characters 'db'".

For this use case, we will make use of the new generic ‘list’ verb. Traditionally, each feature in EM provided its own verbs for list, get, show, and describe. Rather than working with multiple such variants, the new generic ‘list’ verb takes a page from the REST web service specification and provides a generic action that can work against different EM resources.

To learn more about this verb, we run:

>>help('list')

help

The help text shows us the format of this verb. Essentially, there are 3 parameters that we care about:

  • resource = the EM resource which is to be queried
  • columns = specify the different resource attributes to display
  • search = filters to narrow down the result

First, we need to know the list of resources that are supported by this verb. For this we run

>>list('help')

list help

From the output, it is obvious that for our sample use case we want to query the Targets resource.

Second, we need to know which columns are supported by the Targets resource. For this, we run

>>list('help', resource="Targets")

help resourcesl

From the output, we can determine that we need the column related to target name and type. With this we have all the information we need to construct the final function call for our sample use case.

For ease of explanation, I will break down the process of determining the final function call into small incremental steps. Once you gain proficiency, you will be able to define this
function in a single pass.

       1. List all targets in the EM environment. For this we run,

>>list(resource="Targets")

This command will spew a lot of text on your screen as there are likely to be numerous targets in your EM environment. So instead of listing all of them on the screen, let’s just get a count. For this, we need to understand the output format of this verb.

Any function that you run in the interactive or script mode returns an object of class Response (<class ’emcli.response.Response’>). The Response class has 4 key methods:

out()       – Provides the verb execution output. The output can be text or JSON; the isJson() method on the Response object can be used to determine whether the output is JSON.

error()     – Provides the error text (if any) of the verb execution if there are any errors or exceptions during verb execution.

exit_code() – Provides the exit code of the verb execution. The exit code is zero for a successful execution and non-zero otherwise.

isJson()    – Provides details about the type of output. It returns True if response.out() can be parsed into a JSON object.

So let’s look at a code snippet.

snippet
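The code in the screenshot is roughly the following (reconstructed from the description below; 878 is simply the count returned in my environment):

>>all_tgts = list(resource="Targets")
>>print len(all_tgts.out()['data'])
878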

For the first function call to list all targets in EM, we store the results into a variable called ‘all_tgts’. This variable contains the response object. ‘all_tgts.out()’ will give us the actual output. The output returned is in JSON format which automatically gets converted into a Jython dictionary (collection of name-value pairs represented by curly brackets). The output dictionary has a key name called ‘data’ which contains all search results in the form of a Jython list as its value. Finally, len() is a native Jython function which returns the number of elements in a Jython list. As seen in the output, we found 878 targets in the EM environment which is clearly not what we desire.

 2. Now we add search parameters to filter our results. We add two search filters: first, the target type should be equal to oracle_database; second, the target name should be like 'db%'. You can add multiple search filters to the function call, but all these filters should be encapsulated in a Jython list. The search filter supports various operators: =, !=, >, <, >=, <=, like, null, and not null. Similar to a SQL query, you can also control which columns are displayed in the output.

So let’s run our final function.

>>search_filters=["TARGET_TYPE='oracle_database'","TARGET_NAME like 'db%'"]

>>list(resource="Targets", columns="TARGET_NAME,TARGET_TYPE", search=search_filters)

The formatted output looks like this. As mentioned before it is in the form of a Jython dictionary which can be easily accessed programmatically. The value of the ‘data’ key is a Jython list that contains all search results, while the other keys provide other metadata related to the result.

{
 'exceedsMaxRows': False,
 'columnHeaders': ['TARGET_NAME', 'TARGET_TYPE'],
 'columnLength': [256, 64],
 'columnNames': ['TARGET_NAME', 'TARGET_TYPE'],
 'data': [{'TARGET_NAME': 'db9328.acme.com', 'TARGET_TYPE': 'oracle_database'},
          {'TARGET_NAME': 'db3092.acme.com', 'TARGET_TYPE': 'oracle_database'}],
 'filler': '\n\n\n'
}

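Since the response is a plain Jython dictionary, acting on the results programmatically is straightforward. As a small sketch of what the same call looks like inside a script (assuming the same search_filters list defined above):

resp = list(resource="Targets", columns="TARGET_NAME,TARGET_TYPE", search=search_filters)
if resp.exit_code() == 0:
    # walk the result set and print just the target names
    for tgt in resp.out()['data']:
        print tgt['TARGET_NAME']
else:
    print resp.error()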
You must have noticed that I hardly talk about the scripting mode. This is on purpose, as I believe that interactive mode is the best interface to learn the new EMCLI. Once you master the interactive mode, converting your code snippets into a script is fairly easy. In future blog posts, I will cover scripting mode and numerous other use cases that seem like a perfect fit for the new EMCLI.

In summary, ‘EMCLI with Scripting Option’ is a new kit that is built on top of a Jython interpreter. It is much superior to the classic EMCLI, as it provides a complete programming environment with the ability to use native Jython functions and primitives. The output is presented in the JSON format which is both human and machine readable, and avoids the need for parsing text output. The client is completely stateless, which means no user data is stored with the client. This means numerous sessions can be launched from a single client, each connecting to a different EM environment, and as a different user.

I encourage you to play around with this new EMCLI kit, and post the different use cases that you found interesting and would benefit the community.

Additional Reading:

The EMCLI Documentation Guide

What is EM 12c DBaaS Snap Clone?


[This is a repost of my original post on the Oracle blog from about a year ago.]

Let's look at a relatively new feature in EM that has generated significant interest in a very short time – EM 12c DBaaS Snap Clone.

The 'Oracle Cloud Management Pack for Oracle Database', a.k.a. the Database as a Service (DBaaS) feature in EM 12c, has grown tremendously since its release two years ago. It started with basic single instance and RAC database provisioning, a technical service catalog, an out-of-box self service portal, metering and chargeback, etc. But since then we have added provisioning of schemas and pluggable databases, add/remove of Data Guard standby databases, full clones using RMAN backups, and Snap Clone. This video showcases the various EM12c DBaaS features.

This blog will cover one of the most exciting and popular features – Snap Clone. In one line, Snap Clone is a self service approach to creating rapid and space efficient clones of large (~TB) databases.

Self Service – empowers the end users (developers, testers, data analysts, etc) to get access to database clones whenever they need it.
Rapid – implies the time it takes to clone the database. This is in minutes and not hours, days, or weeks.
Space Efficient – represents the significant reduction in storage (>90%) required for cloning databases

Customer Scenario

To best explain the benefits of Snap Clone, let’s look at a Banking customer scenario:

  • 5 production databases totaling 30 TB of storage
  • All 5 production databases have a standby
  • Clones of the production database are required for data analysis and reporting
  • 6 total clones across different teams every quarter
  • For security reasons, sensitive data has to be masked prior to cloning

Based on the above scenario, the storage required, if using traditional cloning techniques, can be calculated as follows:

5 Prod DB               =  30 TB
5 Standby DB            =  30 TB
5 Masked DB             =  30 TB (these will be used for creating clones)
6 Clones (6 * 30 TB)    = 180 TB
---------------------------------
Total                   = 270 TB
Time                    = days to weeks

As the numbers indicate, this is quite horrible. Not only does 30 TB turn into 270 TB, but creating 6 clones of all production databases would take forever. In addition, there are other issues with data cloning:

  • Lack of automation. Scripts are good but often not a long-term solution.
  • Traditional cloning techniques are slow, while existing storage vendor solutions are DBA-unfriendly
  • Data explosion often outpaces storage capacity and hurts IT's ability to provide clones for dev and testing
  • Archaic processes that require multiple users to share a single clone, or only support fixed refresh cycles
  • Different priorities between DBAs and storage admins

Snap Clone to the Rescue

All of the above issues lead to slow turnaround times, and users have to wait for days and weeks to get access to their databases. Basically, we end up with competing priorities and requirements, where the user demands self service access, rapid cloning, and the ability to revert data changes, while IT demands standardization, better control, reduction in storage and administrative overhead, better visibility into the database stack, etc.

EM 12c DBaaS Snap Clone tries to address all these issues. It provides:

  • Rapid and space efficient cloning of databases by leveraging storage copy-on-write (or similar) technology
  • Supports all database versions from 10g to 12c
  • Supports various storage vendors and configurations, both NAS and SAN
  • Lineage and association tracking between clone master and its various clones and snapshots
  • ‘Time Travel’ capability to restore and access past data
  • Deep visibility into storage, OS, and database layer for easy triage of performance and configuration issues
  • Simplified access for end user via out-of-the-box self service portal
  • RESTful APIs to integrate with custom portals and third party products
  • Ability to meter and charge back on the clone databases

So how does Snap Clone work?

The secret sauce lies in the Storage Management Framework (SMF) plug-in. This plug-in sits between the storage system and the DBA, and provides the much needed layer of abstraction required to shield DBAs and users from the nuances of the different storage systems. At the storage level, Snap Clone makes use of storage copy-on-write (or similar) technology. There are two options in terms of using and interacting with storage:

1. Direct connection to storage: Here storage admins can register NetApp and ZFS storage appliance with EM, and then EM directly connects to the storage appliance and performs all required snapshot and clone operations. This approach requires you to license the relevant options on the storage appliance, but is the easiest and the most efficient and fault tolerant approach.

2. Connection to storage via Solaris file system (ZFS): This is a storage vendor agnostic solution and can be used by any customer. Here, instead of connecting to the storage directly, the storage admin mounts the volumes on a Solaris server and formats them with the ZFS file system. All snapshot and clone operations required on the storage are then conducted via the ZFS file system. The good thing about this approach is that it does not require thin cloning options to be licensed on the storage, since the ZFS file system provides these capabilities, as illustrated below.
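To make the second option concrete, the operations the framework drives on such a ZFS file system are conceptually the same copy-on-write primitives you could run by hand (the pool, file system, and snapshot names below are purely illustrative):

zfs snapshot dbpool/masked_db@clone_master                  # point-in-time, copy-on-write snapshot
zfs clone dbpool/masked_db@clone_master dbpool/dev_clone1   # writable clone, near-zero extra space
zfs list -t all -r dbpool                                   # clones share unchanged blocks with the source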

For more details on how to setup and use Snap Clone, refer to a previous blog post.

Now, let's go back to our banking customer scenario and see how Snap Clone helped them reduce their storage cost and time to clone.

5 Prod DB               =  30 TB
5 Standby DB            =  30 TB
5 Masked DB             =  30 TB
6 Clones (6 * 5 * 2 GB) =  60 GB   (instead of 6 * 30 TB = 180 TB)
---------------------------------
Total                   = ~90 TB   (down from 270 TB)
Time                    = minutes  (down from days to weeks)

Assuming the clone databases will have minimal writes, we allocate about 2 GB of write space per clone. For 5 production databases and 6 clones, this totals just 60 GB in required storage space. This is a whopping 99.97% savings in storage. Plus, these clones are created in a matter of minutes and not the usual days or weeks. The product has out-of-the-box charts that show the storage savings across all storage devices and cloned databases. See the screenshot below.

Snap Clone Savings

Where can you use Snap Clone databases?

As I said earlier, Snap Clone is most effective when cloning large (~TB) databases. Common scenarios where we see our customers best use Snap Clone are:

  • Application upgrade testing. For example, EBusiness suite upgrade to R12.
  • Functional testing. For example, testing using production datasets.
  • Agile development. For example, run parallel development sprints by giving each sprint its own cloned database.
  • Data Analysis and Reporting. For example, stock market analysis at the close of market everyday.

It's obvious that Snap Clone has a strong affinity to applications, since it's application data that you want to clone and use. Hence it is important to add that the Snap Clone feature, when combined with EM12c middleware-as-a-service (MWaaS), can provide a complete end-to-end self service application deployment experience. If you have existing portals or need to integrate Snap Clone with existing processes, then use our RESTful APIs for easy integration with third party systems.

In summary, Snap Clone is a new and exciting way of dealing with data cloning challenges. It shields DBAs from the nuances of different storage systems, while allowing end users to request and use clones in a rapid and self service fashion. All of this while saving storage costs. So try this feature out today, and your development and test teams will thank you forever.

Additional References

Cloud Management Page on OTN

Cloud Administration Guide (Documentation)

Enterprise Manager 12c Database-as-a-Service Snap Clone Overview (Presentation)

Limit User Access to Oracle Private Database Cloud Portal


When implementing Database as a Service (DBaaS) and/or Snap Clone, a common request was for a way to hide the other service types like IaaS, MWaaS, etc from the self service portal for the end users. Before EM12c R4, there was no way to restrict the portal view. Essentially, any user with the EM_SSA_USER role would be directed to the self service portal and would then be able to see all service types supported by EM12c.

Of course, you could always set Database as your default self service portal from the 'My Preferences' pop-up, but this only helps with the post-login experience. The end user still gets to see all the options, as shown in the screen above.

In EM12c R4, a new out-of-the-box role called EM_SSA_USER_BASE has been introduced. This role, by default, does not give access to any portal; that is an explicit selection. Here is how you use this role:

1. Create a custom role and add the EM_SSA_USER_BASE role to it.

2. Now in the Resource Privileges step, select the Resource Type ‘Cloud Self Service Portal for Database’, and edit it

3. Check the ‘Access the Cloud Self Service Portal for Database.’ privilege. Finish the rest of the wizard.

Now, when a user with this custom role accesses the self service portal, they can only do so for databases and nothing else.
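If you prefer the command line, the custom role can also be seeded with EMCLI. A sketch (the role name is hypothetical, and the database portal resource privilege itself still has to be granted through the role wizard, or via grant_privs with the appropriate privilege name):

./emcli create_role -name="DB_SSA_PORTAL_USER" -roles="EM_SSA_USER_BASE" -description="Self service users restricted to the database portal"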

While the EM_SSA_USER role will continue to work, we recommend you start using the new EM_SSA_USER_BASE role. For more details on DBaaS or Snap Clone roles, refer to the cloud admin guide chapter on roles and users.

Convert Crontab to Enterprise Manager Jobs


Surprisingly, a popular question posted on our internal forum is about the possibility of using the Enterprise Manager (EM) Job System to replace customers' numerous cron jobs. The answer is obviously YES! I say surprisingly because the EM Job System has been in existence for around 10 years (I believe since EM 10.2.0.1), and my hope was that, by now, customers would have moved to using more enterprise-class job schedulers instead of cron. So, here is a quick post on how to get started with this conversion from cron to EM Jobs for some of our new users.

Benefits of EM Job System

Before we learn about the how, let’s look at the why. The EM job system is:

  • Free – (Yes, I said free) It is included with the base EM at no cost.
  • Flexible – It supports multiple options for scheduling, notification, authentication, etc
  • Infinitely scalable – the job system seamlessly scales to every new Oracle Management Server (OMS). In fact, in case of OMS failures, the job steps are automatically picked up by the next available OMS without affecting the job execution.
  • General purpose – It provides numerous out-of-the-box job types like run OS command, start/stop, backup, SQL script, patch, etc. that span multiple target types. As of today, there are over 50 job types available in the product.
  • Enterprise grade – It allows users to automate multiple administrative tasks like backup, patching, cloning, etc across multiple targets. Customers have not only converted their cron jobs to EM, but have also replaced other enterprise tools like Autosys and migrated 1000s of jobs to EM Job System.
  • APIs – Jobs can be scheduled and managed from the UI and using EMCLI (the command line interface).

Now back to our topic.

The Conversion Process

Let’s start with a sample crontab that we want to convert.

Sample Crontab
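The entries in the screenshot are not legible here, so for illustration assume the first crontab entry looks something like this (the script path is hypothetical); this is the entry converted in the steps below, and its schedule field is the '00 0 * * Sun' expression revisited in step 6:

00 0 * * Sun /home/oracle/scripts/weekly_cleanup.sh > /tmp/weekly_cleanup.log 2>&1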

A cron expression consists of 6 fields, where the first 5 fields represent the schedule, while the last field represents the command or script to run.

 Field Name    | Mandatory? | Allowed Values   | Allowed special characters
 Minutes       | Yes        | 0-59             | * / , -
 Hours         | Yes        | 0-23             | * / , -
 Day of month  | Yes        | 1-31             | * / , - ? L W
 Month         | Yes        | 1-12 or JAN-DEC  | * / , -
 Day of week   | Yes        | 0-6 or SUN-SAT   | * / , - ? L #

Cron jobs run on the operating system, often using the native shell or other tools installed on the operating system. The equivalent of this capability in Enterprise Manager is the ‘OS Command’ job type. Here are the steps required to convert the first entry in the crontab to an EM job:

1. Navigate to the Job Activity page
Job activity menu

2. Select the ‘OS Command’ job and click Go
OS Command

A 5-tab wizard will appear. Let’s step through this one by one.

3. Select the first tab called ‘General’. Here provide a meaningful name and description for the job. Since this job will be run on the Host target, keep the target type selection as ‘Host’. Next, select all host targets in EM that you wish to run this script against.

While cron jobs are defined on a per-host basis, in EM a job definition can be run and managed across multiple hosts or groups of hosts. This avoids having to maintain the same crontab across multiple hosts.

General

4. Select the ‘Parameters’ tab. Here enter the command or script as specified in the last field of the crontab entry. When constructing the command, you can make use of the various target properties.
Parameters tab

5. Next select ‘Credentials’. Here we provide the credentials required to connect to the host and execute the required commands or scripts. Three options are presented to the user:

  • Preferred – default credential set for the host
  • Named – Credentials are stored within Enterprise Manager as “named” entities. Named credentials can be a username/password, or a public key-private key pair. Here we choose pre-created named credentials
  • New – This allows us to create and use new named credential

Note: If your OS user does not have the required privileges to execute the specified command, Named Credentials also support the use of sudo, PowerBroker, sesu, etc.

Credentials tab

6. Next, we set the schedule and this is where it gets interesting. As discussed before, crontab uses a textual representation for the schedule, while EM Job system has a graphical representation for the schedule.

Our sample schedule in the crontab is ‘00 0 * * Sun’. This translates to a weekly job at 12 midnight on every Sunday. To set this in EM, choose the ‘Repeating’ schedule type. The screenshot below shows all the other selections.
Schedule tab

The key here is to select the correct 'Frequency Type'; the rest of the selections are quite obvious. This also lets you choose the desired timezone for the schedule. Your options are to either start the job with respect to a fixed timezone, or start it in each individual target's timezone. The latter is very popular – for example, start a job at 2 AM local time in every region around the world.

Another selection of note is the 'Grace Period'. This is an extremely powerful feature, but often not used by many customers. Typically, we expect jobs to be started within a few seconds or minutes (based on the load on the system and the number of jobs scheduled) of the start time, but a job might not start on time for many reasons. The most common reasons are the Agent being down or a blackout. The grace period controls the latest start time for the job in case the job is delayed, else it is marked as skipped. By default, jobs are scheduled with indefinite grace periods, but I highly recommend setting a value for it. In the sample above, I set a 3 hr limit, which may seem large but given the weekly nature of the job seems reasonable. So the job system will wait until 3 am (the job start time is 12 am) to start the job, after which the iteration will be skipped. For repeating schedules, the grace period should always be less than the repeat interval. If the job starts on time, the grace period is ignored.

7. Finally, we navigate to the ‘Access’ tab. This tab has two parts:

  • Privilege assignment to roles and users: this allows you to control job level access for other users
  • Email notifications for the Job owner: this allows you to control the events for which you wish to receive notifications. Note, this only sets notification for the job owner, the other users can subscribe to emails by setting up notification and/or incident rules.

To prevent EM from sending a deluge of emails, I recommend the following settings in the notifications region:

  • Match status and severity: Both
  • Select severity of status: Critical
  • Select status: Problems & Action Required

You can always come back and modify these settings to suit your needs.

Access tab

Not all cron jobs need to be converted to the OS Command job type. For example, if you are taking Oracle database backups using cron, then you probably want to use the out-of-the-box job type for RMAN scripts: just provide the RMAN script, the list of databases to run it against, and the credentials required to connect to the databases. Similarly, if you run SQL scripts against numerous databases, you can leverage the SQL Script job type. There are over 50 job types available in EM12c, all available for use from the UI and EMCLI, as sketched below.
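If you would rather script the conversion itself, the same OS Command job can also be created from EMCLI using the describe/create pattern shown earlier for Oracle Home discovery (the internal job type name used here is an assumption; describe_job_type shows the exact fields for your release):

./emcli describe_job_type -job_type=OSCommand > os_cmd.prop
# edit os_cmd.prop: job name, target_list, the command to run, and credential settings
./emcli create_job -input_file=property_file:os_cmd.prop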

Finally, the best way to learn more about the EM Job System is to actually play with it. I also recommend blogs from Maaz, Kellyn, and other users on this topic. Good luck!!

References

Maaz Anjum: http://maazanjum.com/2013/12/30/create-a-simple-job-for-a-host-target-in-em12c/
Kellyn Pot’vin: http://dbakevlar.com/