How many Hyperion Administrators proactively monitor their application or system log files? Based on our experience, very few. Considering the time constraints put on a typical Hyperion Administrator, this is not surprising. We certainly understand the time and workload crunch these administrators are dealing with, but we’ve also found that the old Ben Franklin adage applies here: An ounce of prevention is worth a pound of cure. Learning to understand the logs is worth the investment of time and will pay dividends down the road.
Here is an outline of a good way for Hyperion Administrators to get up to speed on the logs without getting bogged down or overwhelmed.
Step One: Declare Logging Amnesty
Define a period of time to just look at the logs, and not act. Odds are the logs have been filling up with items for weeks and months. Starting at the beginning of the current logs and trying to tackle every item is a massive task with very little potential benefit. Instead, take four weeks, one full financial month or some other logical time period and just start to get an idea of what a normal day/week/month looks like in the logs. Use this time to start to see patterns and, if necessary, correlate errors with reported user trouble or errors in the application.
During this period, start to identify items reported as errors or warnings that aren’t important and can be ignored. This would also be a good time to look into some options to better view and filter the logs. Using Hyperion’s text-based logging is fine, but it limits your ability to filter out items that don’t need to be reviewed. There are many tools available to help with this task, and your IT department may already have a solution in place. Tools like Nagios can be used to report errors in logs, and software like SolarWinds’ Kiwi Syslog Server can be set up to automatically pull in logs and help filter out the noise.
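If no tooling is in place yet, even a short script can help show what a normal day looks like. The sketch below is only an illustration: it assumes each log entry starts with a date, a time and a severity word (for example `2015-03-02 08:05:13 ERROR ...`), which may not match the layout of your Hyperion logs, and the `daily_counts` name is our own.

```python
from collections import Counter

# Severity words to tally. Assumption: the level is the third
# whitespace-separated field, after the date and the time.
LEVELS = ("ERROR", "WARNING")

def daily_counts(lines):
    """Count ERROR and WARNING entries per calendar day."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip continuation lines (stack traces and the like).
        if len(fields) >= 3 and fields[2] in LEVELS:
            day = fields[0]
            counts[(day, fields[2])] += 1
    return counts

if __name__ == "__main__":
    # Made-up sample entries, for illustration only.
    sample = [
        "2015-03-02 08:05:13 ERROR Unable to connect to data source",
        "2015-03-02 09:00:00 WARNING Slow response from service",
        "2015-03-03 10:00:00 ERROR Unable to connect to data source",
    ]
    for (day, level), total in sorted(daily_counts(sample).items()):
        print(day, level, total)
```

Run over a few weeks of logs, totals like these give a rough baseline to compare unusual days against.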
Step Two: Reduce Noise
Over your first month with the logs, you will have started to see through a lot of the noise and put some tools in place to help you automatically filter out the most obvious information that you don’t need to see. Examples would be reports of successful completions of ordinary tasks, successful user logins, etc. This is all good information, but it doesn’t concern you. Your top priority now is to get rid of as much of this information as possible. Every time you get an alert or see an error that you don’t need to act upon, you build a subconscious inertia against paying attention when something important does pop up. This is like the story of the boy who cried ‘Wolf!’ Eventually, no one believed the boy, and likewise you might stop keeping an eye on your logging tools if there is a lot of noise.
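As a rough sketch of what this kind of filtering amounts to, the example below drops any line that matches a list of "known noise" patterns. The patterns and sample entries are hypothetical; the real list would come from what you decided to ignore during the amnesty period, and a dedicated log tool would normally do this job for you.

```python
import re

# Hypothetical noise patterns; replace them with the messages
# you have decided you do not need to see.
NOISE_PATTERNS = [
    re.compile(r"login successful", re.IGNORECASE),
    re.compile(r"completed successfully", re.IGNORECASE),
]

def is_noise(line):
    """Return True if the line matches any known-noise pattern."""
    return any(pattern.search(line) for pattern in NOISE_PATTERNS)

def filter_noise(lines):
    """Keep only the lines that still deserve a look."""
    return [line for line in lines if not is_noise(line)]

if __name__ == "__main__":
    # Made-up sample entries, for illustration only.
    sample = [
        "2015-03-02 08:00:01 INFO User jsmith login successful",
        "2015-03-02 08:05:13 ERROR Unable to connect to data source",
        "2015-03-02 08:10:45 INFO Nightly backup completed successfully",
    ]
    for line in filter_noise(sample):
        print(line)
```

Keeping the patterns in one list makes it easy to add a new entry each time you decide another message is safe to ignore.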
Step Three: Build a Database of Errors and What They Mean
By now, you’re probably seeing just a handful of items in your logs each week. Many of our client logs reflect no errors of import for days or weeks at a time, which is what we expect for a system that is, generally speaking, running well. When we do see an error, we look into it. Most Hyperion errors include an actual error code in angle brackets, like <BEA-XXXXXX>. We use those codes, plus the information included in the log items, to try to identify the cause of the error. Oracle’s website can be a great resource, as can more general search engines. When you identify a new error, make a note of it. If it’s something you can ignore, filter it out, but also keep a record of what it means for future reference.
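One simple way to start such a database is to pull the bracketed codes out of the logs and compare them with a list of codes you have already researched. The sketch below assumes codes look like `<BEA-` followed by six digits; the catalog contents, the code numbers and the function names are placeholders of our own, not real meanings.

```python
import re
from collections import Counter

# Matches bracketed codes such as <BEA-000001>.
# Assumption: codes are "BEA-" followed by exactly six digits.
CODE_RE = re.compile(r"<(BEA-\d{6})>")

# Hypothetical catalog of codes you have already looked up.
KNOWN_CODES = {
    "BEA-000001": {"meaning": "placeholder note", "ignore": True},
}

def count_codes(lines):
    """Count how often each error code appears."""
    counts = Counter()
    for line in lines:
        counts.update(CODE_RE.findall(line))
    return counts

def unknown_codes(counts):
    """Return the codes that are not yet in the catalog."""
    return sorted(code for code in counts if code not in KNOWN_CODES)

if __name__ == "__main__":
    # Made-up sample entries, for illustration only.
    sample = [
        "<Mar 2, 2015> <Error> <BEA-000001> <placeholder message>",
        "<Mar 2, 2015> <Warning> <BEA-000002> <placeholder message>",
    ]
    counts = count_codes(sample)
    print("Seen:", dict(counts))
    print("Still to research:", unknown_codes(counts))
```

Anything listed as still to research is a candidate for a new catalog entry, whether it turns out to be ignorable or not.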
An important correlation you can start making now is to clearly link user-facing problems, like application crashes or other issues within applications, to error codes in the logs. These links will be invaluable and save you a lot of time when working with support. When someone reports trouble, the first thing support asks for is the logs around the time the error occurred. Speed up the support process by including the error you already know is associated with the problem. This can save hours on any support request you create, and you’ll be making yourself that much more effective as a Hyperion administrator.
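Pulling "the logs around the time the error occurred" is easy to script. The sketch below assumes every entry begins with a timestamp in the form `YYYY-MM-DD HH:MM:SS`, which is an assumption about your log layout, and treats lines without a timestamp (such as stack traces) as part of the entry before them. The function names are our own.

```python
from datetime import datetime, timedelta

# Assumption: each entry starts with a timestamp in this format.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
TIMESTAMP_LENGTH = 19  # length of "2015-03-02 14:02:10"

def parse_timestamp(line):
    """Return the leading timestamp, or None if the line has none."""
    try:
        return datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
    except ValueError:
        return None

def lines_around(lines, incident_time, minutes=10):
    """Return entries logged within `minutes` of the incident time."""
    window = timedelta(minutes=minutes)
    selected = []
    in_window = False
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is not None:
            in_window = abs(stamp - incident_time) <= window
        # Lines without a timestamp follow the entry above them.
        if in_window:
            selected.append(line)
    return selected

if __name__ == "__main__":
    # Made-up sample entries, for illustration only.
    sample = [
        "2015-03-02 13:40:00 INFO Scheduled task started",
        "2015-03-02 14:02:10 ERROR Application stopped responding",
        "    at placeholder stack frame",
        "2015-03-02 14:30:00 INFO Scheduled task finished",
    ]
    reported = datetime(2015, 3, 2, 14, 0, 0)
    for line in lines_around(sample, reported, minutes=10):
        print(line)
```

The selected lines, together with any error code you have already matched to the problem, are what you would attach to the support request.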
Step Four: Relax
It took time, research and a good bit of hard work, but now you can be confident that you’re aware of what’s happening within your Hyperion system. Instead of regularly putting out fires, you’re on top of the system and can anticipate issues before they occur. Instead of waiting for the phone to ring with the next crisis, you can relax and trust that your system is working well. Be careful, though, not to be too relaxed. Keep an eye on the logs and keep reducing noise whenever possible. Take care of your system, and it will take care of you.