Recover from "All endpoints are blacklisted" exception

Apr 19, 2012 at 7:26 PM

I have a .NET application using Aquiles 1.0 that I use to test and manipulate data in several Cassandra clusters. My app.config therefore lists at least 6 clusters, each with its own "friendlyName", so I can choose a cluster at runtime and switch between them. This works nicely, except when one of the test systems is down for some reason. With multiple test systems, I should be able to work on another while one is down. However, when a cluster is down, the application throws an exception like:

All endpoints ['10.34.81.194:9160-6000'] are blacklisted, is cluster down?

Even if I catch the exception, it appears that initialization of all clusters fails if just one of them is down.  Thus, I have to manually remove (comment out) the down cluster from the config file to get my program to run.

Is there any way to get it to initialize all the valid clusters and fail only the ones that are down?

Less important, but "would be nice": is there a clean way to get a list of all the friendlyNames defined in the Aquiles configuration? (Currently the list to choose from is hard-coded.)

Coordinator
Apr 20, 2012 at 8:52 PM

Originally, I thought it would be odd to merely log an error saying that a cluster could not be created from the configuration. That is why I preferred to "crash" at startup rather than continue and let you discover the problem while your application is running.

Now, given your scenario, I think I can add a boolean setting that says whether a configured cluster is critical to the application (a "stopper") or can be skipped if no endpoints are available. I also need to look at what should happen if such a cluster becomes available again a few minutes after startup.
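The idea can be sketched in a language-neutral way. The Python below is only an illustration under stated assumptions, not Aquiles code: `ClusterSpec`, its `critical` flag, `ClusterRegistry`, and the `connect` factory are all hypothetical names. Critical clusters keep the fail-fast behavior, optional ones are recorded as failed, and a failed cluster is retried the next time it is requested.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class ClusterUnavailable(Exception):
    """Raised when no endpoint of a cluster can be reached."""


@dataclass
class ClusterSpec:
    friendly_name: str
    endpoints: List[str]
    critical: bool = True  # hypothetical flag: abort startup if unreachable


@dataclass
class ClusterRegistry:
    connect: Callable[[ClusterSpec], object]  # hypothetical connection factory
    specs: Dict[str, ClusterSpec] = field(default_factory=dict)
    live: Dict[str, object] = field(default_factory=dict)
    failed: Dict[str, Exception] = field(default_factory=dict)

    def initialize(self, specs):
        """Connect to every cluster; only critical failures stop startup."""
        for spec in specs:
            self.specs[spec.friendly_name] = spec
            try:
                self.live[spec.friendly_name] = self.connect(spec)
            except ClusterUnavailable as exc:
                if spec.critical:
                    raise  # critical clusters keep the fail-fast behavior
                self.failed[spec.friendly_name] = exc  # remember and move on

    def get(self, name):
        """Return a live cluster, retrying one that was down at startup."""
        if name not in self.live:
            # Raises ClusterUnavailable again if the cluster is still down.
            self.live[name] = self.connect(self.specs[name])
            self.failed.pop(name, None)
        return self.live[name]
```

With this shape, the application can show `registry.failed` to the user at startup and still work against the clusters that did come up.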

I will see what I can do about it. I will post any news about this matter.
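On the friendlyName question, a possible workaround until the library exposes such a list is to read the names straight from the configuration file, since app.config is plain XML. The sketch below is Python for illustration only; the element names in the sample are an assumption about the configuration layout, and the function deliberately depends only on the `friendlyName` attribute mentioned above.

```python
import xml.etree.ElementTree as ET

# Sample configuration; the element names here are assumed for illustration.
SAMPLE_CONFIG = """<configuration>
  <aquilesConfiguration>
    <clusters>
      <add friendlyName="Test1">
        <cassandraEndpoints>
          <add address="10.34.81.194" port="9160" />
        </cassandraEndpoints>
      </add>
      <add friendlyName="Test2">
        <cassandraEndpoints>
          <add address="10.34.81.195" port="9160" />
        </cassandraEndpoints>
      </add>
    </clusters>
  </aquilesConfiguration>
</configuration>"""


def friendly_names(config_xml):
    """Return every friendlyName attribute value, in document order."""
    root = ET.fromstring(config_xml)
    return [
        element.get("friendlyName")
        for element in root.iter()
        if element.get("friendlyName") is not None
    ]


print(friendly_names(SAMPLE_CONFIG))  # prints ['Test1', 'Test2']
```

The same approach carries over to .NET with any XML reader, and it removes the need to keep a hard-coded list in sync with the config file.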

Sep 12, 2012 at 11:08 PM

I have this problem too. Cassandra gives me memory problems after a bit of stress and becomes unresponsive after a while, at which point I get the same "no endpoint" error, timeouts, or memory-related exceptions.

Is there an API I can use to force Cassandra to do the equivalent of "nodetool flush"?

Is there an API I can use to kill the Cassandra process, restart it, and reconnect to it?

Thanks!

Coordinator
Sep 21, 2012 at 12:43 AM

From what I know, nodetool is only available through the command line and, although I am not sure, possibly through some administrative application distributed by DataStax. However, nodetool sends its commands through a JMX interface, so if you look at its code you may be able to achieve the same thing some other way.
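If shelling out is acceptable, one simple option is to invoke nodetool itself from code rather than speaking JMX directly. The following Python is a minimal sketch, assuming `nodetool` is on the PATH and that the node's JMX port is 7199 (the usual default in recent Cassandra versions; adjust it for your installation). The function names are made up for this example.

```python
import subprocess


def build_flush_command(host="localhost", jmx_port=7199, keyspace=None):
    """Build the nodetool command line for a flush (JMX port is assumed)."""
    command = ["nodetool", "-h", host, "-p", str(jmx_port), "flush"]
    if keyspace:
        command.append(keyspace)  # restrict the flush to a single keyspace
    return command


def flush(host="localhost", jmx_port=7199, keyspace=None, timeout=60):
    """Run nodetool flush; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(
        build_flush_command(host, jmx_port, keyspace),
        check=True,
        timeout=timeout,
        capture_output=True,
        text=True,
    )
```

Calling `flush("10.34.81.194", keyspace="MyKeyspace")` would run the command against that node; the keyspace name here is only a placeholder.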

As for killing your Cassandra nodes, the only way I know of on Linux is to issue a "kill" command and then start the process again. On Windows, you may need to restart the service. Nevertheless, I wouldn't suggest going in that direction: it is much more likely that you have an underlying problem, and fixing that problem is what will actually give you a stable system.