Thursday, April 11, 2013

Cisco Prime Infrastructure: Bug ID CSCud39395 Logins not Processed

A couple of weeks ago I ran into an issue with Prime Infrastructure 1.2 where it stopped responding to login requests.  The login just sat there working on the request and never came back with a success or a failure.

After I stopped the NCS software, it would not start again.  The start failed and I was told to check launchout.log.

So I ran backup-logs and untarred the resulting file.  Here is what I found in the log:

Starting Health Monitor as a primary
Checking for Port 8082 availability... OK
truststore used is /opt/CSCOlumos/conf/truststore
truststore used is /opt/CSCOlumos/conf/truststore
Starting Health Montior Web Server...
Health Monitor Web Server Started.
Starting Health Monitor Server...
Health Monitor Server Started.
Starting Remoting Service: Reporting Server
Checking for running servers.
00:00 Check complete. No servers running.
Starting Server ... 
 Start failed. Initiating shutdown. Please check logs/Startup.log.

I never did find the Startup.log file.  I searched the extracted directory with Notepad++ but didn't turn up many details.
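If you would rather search the extracted bundle from a shell than from an editor, something like the following is a minimal sketch.  The function name and the directory argument are my own placeholders, not anything from Cisco, and it assumes a grep that supports -r (GNU or BSD).  It lists every file that mentions an ORA- or RMAN- error code, with the number of matching lines in each.

```shell
# Hypothetical helper (not part of NCS/PI): list the files under an extracted
# backup-logs bundle that mention Oracle (ORA-) or RMAN (RMAN-) error codes.
# $1 = directory where the archive was extracted (placeholder path).
find_oracle_errors() {
    # -r recurses, -c prints "file:count" of matching lines, -E enables {5}.
    # The second grep drops the files that had zero matching lines.
    grep -rcE 'ORA-[0-9]{5}|RMAN-[0-9]{5}' "$1" | grep -v ':0$'
}
```

Usage would be along the lines of `find_oracle_errors ./extracted-logs`, where ./extracted-logs is wherever you untarred the bundle.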

After I emailed TAC, they pointed me to the following error:

04/05/13 16:00:00.706 ERROR [database] [ORACLE_BACKGROUND_TASK-1] Error while deleting db archivelogs:
Recovery Manager: Release - Production on Fri Apr 5 16:00:00 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database (not started)
RMAN> 2> using target database control file instead of recovery catalog
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of delete command at 04/05/2013 16:00:00
RMAN-06403: could not obtain a fully authorized session
ORA-01034: ORACLE not available
ORA-27102: out of memory
Linux-x86_64 Error: 12: Cannot allocate memory
Additional information: 1
Additional information: 32768
Additional information: 8
RMAN>
Recovery Manager complete.
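A message like that is easier to read once the error codes are pulled out of it.  Here is a small sketch that prints each distinct ORA- and RMAN- code it sees on standard input.  The function name is made up for illustration, and it assumes a grep that supports -o.

```shell
# Hypothetical helper: print each distinct ORA-/RMAN- error code read from
# standard input, one per line, sorted.
list_error_codes() {
    # -o prints only the matched text, so a long single-line message still
    # yields one code per output line; sort -u removes the repeats.
    grep -oE '(ORA|RMAN)-[0-9]{5}' | sort -u
}
```

Fed the message above, it picks out ORA-01034 (ORACLE not available) and ORA-27102 (out of memory), which sit at the bottom of the RMAN error stack.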

Aack, Oracle.  Isn't it enough they torture me with Java?  I was also given the following action plan, which completely fixed my issue.  Note:  I'm not responsible if you do this and it destroys your NCS or PI appliance.

Please clear the archivelogs in the oracle db/cache using the below procedure:

Clearing procedure:

1-  When running the procedure, make sure the NCS server is not up; only the database can be up.
# After logging in as admin, run the command:
        ncs stop

# Log in as root and run:
        /opt/CSCOlumos/bin/ start

2- Run the following to clean out the archivelogs:

        cd /opt/CSCOlumos/bin
        ./ --> this gets the DB password for the sqlplus login, if required
        su - oracle   
        . /opt/oracle/oracleenv
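        sqlplus / as sysdba
        startup mount
        exit     --> assumed step: the next three commands are RMAN commands, so leave sqlplus first
        rman     --> assumed step, see above
        connect target /
        crosscheck archivelog all;
        delete noprompt archivelog all;
        exit     --> assumed step: leave RMAN before going back to sqlplus
        sqlplus / as sysdba
        shutdown abort;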

Exit back to the NCS/PI shell and run "ncs stop".  Run "ncs status" to verify that everything is stopped, then "ncs start" to start the services.

When I asked TAC for the root cause, I was given Bug ID CSCud39395 and told that a fix is not yet in place.  Please check out the details for yourself.  I hope this saves someone some time when they run into this.