Read Performance Problem

Jan 18 at 7:46 PM

Hi all,

I am new to Aquiles 1.0 and Cassandra. I created a cluster of 15 nodes and inserted about 30 GB of data into it. The insert performance was good.

But then I tried to read the data with multiget_slice, using about 300-500 keys per multiget_slice request. The first problem I got was heap-space / out-of-memory exceptions, plus very bad response times, so I split the request into batches of 50 keys and looped through the results in a single thread. On average it takes about 17 seconds, regardless of read consistency level, to load the data for about 300 keys from the 15 machines. The data is not large: it is just text, a few kB per value. I think I am doing something wrong here, maybe in the configuration or in the code. If anyone has a clue or ideas to help, please share :)

Here is the problematic code:

var cParent = new ColumnParent()
{
    Column_family = ColumnFamilyNames.Term.ToString()
};

// Empty Start/Finish with Count = int.MaxValue requests every column of each row.
var sPredict = new SlicePredicate()
{
    Slice_range = new SliceRange() { Count = int.MaxValue, Reversed = false, Finish = new byte[] { }, Start = new byte[] { } }
};

var cluster = AquilesHelper.RetrieveCluster(ClusterName);

// Accumulates the rows returned by all batches.
var tempResult = new Dictionary<byte[], List<ColumnOrSuperColumn>>();

const int batchSize = 50;

// Request the rows in batches of 50 keys.
for (int i = 0; i < keys.Count; i += batchSize)
{
    // The last batch may be smaller than batchSize.
    int rcount = Math.Min(batchSize, keys.Count - i);
    List<byte[]> temp = keys.GetRange(i, rcount);

    var result =
        (Dictionary<byte[], List<ColumnOrSuperColumn>>)
        cluster.Execute(
            new ExecutionBlock(
                client => client.multiget_slice(temp, cParent, sPredict, ReadConsistencyLevel)),
            KeyspaceName);

    // Add this batch's rows directly instead of rebuilding the whole
    // dictionary with Concat(...).ToDictionary(...) on every iteration.
    foreach (var row in result)
        tempResult[row.Key] = row.Value;
}
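The loop above sends the batches one after another from a single thread, so the total time is roughly the sum of the individual multiget_slice calls. Here is a minimal sketch of issuing several batches concurrently, under stated assumptions: `ParallelMultiget`, `FetchAll` and `ByteArrayComparer` are hypothetical names, `fetchBatch` stands in for one `cluster.Execute(... multiget_slice ...)` call, and it is assumed (not confirmed here) that the Aquiles cluster object can safely be used from several threads at once. It also includes a content-based comparer for `byte[]` keys, because a `Dictionary<byte[], ...>` compares keys by reference, so looking up a row with a different array holding the same bytes would fail.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelMultiget
{
    // Splits `keys` into batches of at most `batchSize` keys and runs
    // `fetchBatch` on up to `maxParallel` batches at the same time.
    // `fetchBatch` stands in for one multiget_slice call and is assumed
    // to be safe to call from several threads.
    public static Dictionary<TKey, TValue> FetchAll<TKey, TValue>(
        IList<TKey> keys,
        int batchSize,
        int maxParallel,
        Func<List<TKey>, IDictionary<TKey, TValue>> fetchBatch,
        IEqualityComparer<TKey> comparer = null)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        if (maxParallel <= 0) throw new ArgumentOutOfRangeException(nameof(maxParallel));
        comparer = comparer ?? EqualityComparer<TKey>.Default;

        // Batches start at offsets 0, batchSize, 2 * batchSize, ...;
        // the last one may be shorter, and no key is skipped.
        var batches = new List<List<TKey>>();
        for (int offset = 0; offset < keys.Count; offset += batchSize)
        {
            int count = Math.Min(batchSize, keys.Count - offset);
            var chunk = new List<TKey>(count);
            for (int j = 0; j < count; j++)
                chunk.Add(keys[offset + j]);
            batches.Add(chunk);
        }

        // Rows coming back on different threads go into a thread-safe map.
        var merged = new ConcurrentDictionary<TKey, TValue>(comparer);
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(batches, options, batch =>
        {
            foreach (var row in fetchBatch(batch))
                merged[row.Key] = row.Value;
        });

        return new Dictionary<TKey, TValue>(merged, comparer);
    }
}

// Compares byte[] keys by content instead of by reference.
public sealed class ByteArrayComparer : IEqualityComparer<byte[]>
{
    public bool Equals(byte[] x, byte[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
            if (x[i] != y[i]) return false;
        return true;
    }

    public int GetHashCode(byte[] obj)
    {
        unchecked
        {
            int hash = 17;
            foreach (byte b in obj)
                hash = hash * 31 + b;
            return hash;
        }
    }
}
```

To use it with the code above, pass a lambda that wraps the existing `cluster.Execute(new ExecutionBlock(client => client.multiget_slice(batch, cParent, sPredict, ReadConsistencyLevel)), KeyspaceName)` call and casts the result, as the loop does. Keep `maxParallel` modest (for example 4): running batches in parallel only shortens wall-clock time if the cluster has spare capacity, and it will not help if each individual call is slow because the nodes are short of memory.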

And my cassandra.yaml:

cluster_name: 'name'
initial_token:
hinted_handoff_enabled: true
max_hint_window_in_ms: 3600000 # one hour
hinted_handoff_throttle_delay_in_ms: 50
authenticator: org.apache.cassandra.auth.AllowAllAuthenticator
authority: org.apache.cassandra.auth.AllowAllAuthority
partitioner: org.apache.cassandra.dht.RandomPartitioner
data_file_directories:
    - /data/data
commitlog_directory: /data/commitlog
saved_caches_directory: /data/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "164.8.7.53, 164.8.7.61, 164.8.7.63"
flush_largest_memtables_at: 0.75
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6
concurrent_reads: 32
concurrent_writes: 32
memtable_flush_queue_size: 4
sliced_buffer_size_in_kb: 64
storage_port: 7000
ssl_storage_port: 7001
listen_address: 164.8.7.63
rpc_address: 164.8.7.63
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
incremental_backups: false
snapshot_before_compaction: false
column_index_size_in_kb: 64
in_memory_compaction_limit_in_mb: 64
multithreaded_compaction: false
compaction_throughput_mb_per_sec: 16
compaction_preheat_key_cache: true
rpc_timeout_in_ms: 10000
endpoint_snitch: org.apache.cassandra.locator.SimpleSnitch
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
index_interval: 128
encryption_options:
    internode_encryption: none
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra

 

Thanks and best regards

Coordinator
Jan 20 at 12:49 AM

First, you say you have 15 nodes, but in the config you have only 3 nodes as seeds, so something is wrong there.

 

Also, can you post your app.config, or otherwise show how you are configuring Aquiles? How much memory are you giving the Cassandra instances?

Jan 20 at 8:10 PM

Hi,

Thanks for your reply, much appreciated! I never thought about configuring all nodes as seeds; I configured 3 and set the others to use those 3 as seeds. So the first thing I will do is set all of them as seeds.

I think I used the default memory allocation for the Cassandra instances. Giving the JVM more memory at startup is the next thing I will try.
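On the default allocation: as far as I recall, the stock `cassandra-env.sh` of the 1.0/1.1 era sizes the maximum heap automatically, when `MAX_HEAP_SIZE` is not set, as max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)). Below is a minimal C# sketch of that rule; the class and method names are hypothetical, and the formula should be checked against the script shipped with your Cassandra version.

```csharp
using System;

public static class CassandraHeapDefaults
{
    // Hypothetical helper reproducing, as recalled, the default max-heap rule
    // in cassandra-env.sh (used when MAX_HEAP_SIZE is not set):
    //   max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))
    public static int DefaultMaxHeapMb(int systemMemoryMb)
    {
        int half = Math.Min(systemMemoryMb / 2, 1024);    // 1/2 RAM, capped at 1 GB
        int quarter = Math.Min(systemMemoryMb / 4, 8192); // 1/4 RAM, capped at 8 GB
        return Math.Max(half, quarter);
    }
}
```

Under that assumption, a 4 GB node gets only a 1 GB heap (4096 MB gives 1024 MB) and an 8 GB node gets 2 GB, which could help explain heap errors on large multiget_slice requests.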

I still have some questions:

a) If I configure all nodes of the cluster as seeds and put those IPs in the Aquiles app.config, will Aquiles automatically send each request to the nodes that own the corresponding token range?

b) Why was the write performance good if the seed configuration is wrong?
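Regarding question a): assuming Aquiles's ROUNDROBIN endpoint manager does what its name suggests, it cycles through the addresses listed under cassandraEndpoints without looking at token ranges. Results are still correct, because in Cassandra whichever node receives a Thrift request acts as the coordinator and forwards the read to the replicas that own each key; it only means that the listed nodes do all of the coordination work. The class below is a hypothetical illustration of round-robin selection, not Aquiles's actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical illustration of round-robin endpoint selection (not Aquiles code).
public sealed class RoundRobinEndpoints
{
    private readonly IReadOnlyList<string> _endpoints;
    private int _next = -1;

    public RoundRobinEndpoints(IReadOnlyList<string> endpoints)
    {
        if (endpoints == null || endpoints.Count == 0)
            throw new ArgumentException("at least one endpoint is required");
        _endpoints = endpoints;
    }

    // Returns the next endpoint in order, wrapping around. The choice ignores
    // which node owns the requested keys; the receiving node coordinates.
    public string Next()
    {
        int n = Interlocked.Increment(ref _next);
        // Clear the sign bit so the index stays non-negative after overflow.
        return _endpoints[(n & int.MaxValue) % _endpoints.Count];
    }
}
```

With the three endpoints from the app.config below, successive calls return 164.8.7.61, 164.8.7.53, 164.8.7.63 and then start over, so only those three nodes ever receive client requests.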

 

My config:

  <aquilesConfiguration>
    <clusters>
      <add friendlyName="name">
        <connection poolType="SIZECONTROLLEDPOOL" factoryType="FRAMED">
        </connection>
        <endpointManager type="ROUNDROBIN" defaultTimeout="30000">
          <cassandraEndpoints>
            <add address="164.8.7.61" port="9160"/>
            <add address="164.8.7.53" port="9160"/>
            <add address="164.8.7.63" port="9160"/>
          </cassandraEndpoints>
        </endpointManager>
      </add>
    </clusters>
  </aquilesConfiguration>

Jan 23 at 9:14 PM

@javiercanillas, any new thoughts? Is this even the right place for this kind of question, or should I rather post it on a Cassandra forum?

I was also thinking about changing my hardware configuration from 15 nodes (each with 4 GB of RAM) to fewer nodes with more memory...

Coordinator
Jan 24 at 10:28 AM
Well, if your problem is the Cassandra configuration or its performance, you should ask on the Cassandra mailing list for better answers.

4 GB of RAM is pretty low; you should consider 8 GB or more, even though it means fewer nodes in the cluster.

Javier

On Monday, January 23, 2012, MarioD <notifications@codeplex.com> wrote:
> From: MarioD
>
> @javiercanillas any new thoughts? Is this even the right spot for this kind of question or should i post it reather to some cassandra forums?
>
> i was also thinking about to change my HW configuration. from the 15 nodes (each has 4 gigs of ram) to less nodes, but more memory...