Thoughts on a Cassandra issue?

I'm running into consistent problems when storing values larger than 15MB into Cassandra, and I was hoping for some help on tracking down what's going wrong. I've emailed the Cassandra list, but I thought my long-suffering readers might like a crack too. I promise I'll get another five short links post up there if I can crack this bug!

From the FAQ it seems like what I'm trying to do is possible, so I assume I'm messing something up with my configuration. Below is a minimal set of code that reproduces the issue, which I've run on the DataStax 0.8.1 AMI I'm using in production (ami-9996c4dc).

 

# To set up the test data structure on Cassandra:
cassandra-cli
connect localhost/9160;
create keyspace TestKeyspace with 
  placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and 
  strategy_options = [{replication_factor:3}];
use TestKeyspace;
create column family TestFamily with
  comparator = UTF8Type and
  column_metadata =
  [
    {column_name: test_column, validation_class: UTF8Type},
  ];
# From bash on the same machine, with Ruby and the Cassandra gem installed:
irb
require 'rubygems'
require 'cassandra/0.8'
client = Cassandra.new('TestKeyspace', 'localhost:9160', :retries => 5, :connect_timeout => 5, :timeout => 10)

# With data this size, the call works
column_value = 'a' * (14*1024*1024)
row_value = { 'column_name' => column_value }
client.insert(:TestFamily, 'SomeKey', row_value)
# With data this size, the call fails with the exception below
column_value = 'a' * (15*1024*1024)
row_value = { 'column_name' => column_value }
client.insert(:TestFamily, 'SomeKey', row_value)
# Results:
The first call, with a 14MB chunk of data, succeeds, but the second one, with 15MB, fails with this exception:
CassandraThrift::Cassandra::Client::TransportException: CassandraThrift::Cassandra::Client::TransportException
from /usr/lib/ruby/gems/1.8/gems/thrift-0.7.0/lib/thrift/transport/socket.rb:53:in `open'
from /usr/lib/ruby/gems/1.8/gems/thrift-0.7.0/lib/thrift/transport/framed_transport.rb:37:in `open'
from /usr/lib/ruby/gems/1.8/gems/thrift_client-0.7.1/lib/thrift_client/connection/socket.rb:11:in `connect!'
from /usr/lib/ruby/gems/1.8/gems/thrift_client-0.7.1/lib/thrift_client/abstract_thrift_client.rb:105:in `connect!'
from /usr/lib/ruby/gems/1.8/gems/thrift_client-0.7.1/lib/thrift_client/abstract_thrift_client.rb:144:in `handled_proxy'
from /usr/lib/ruby/gems/1.8/gems/thrift_client-0.7.1/lib/thrift_client/abstract_thrift_client.rb:60:in `batch_mutate'
from /usr/lib/ruby/gems/1.8/gems/cassandra-0.12.1/lib/cassandra/protocol.rb:7:in `_mutate'
from /usr/lib/ruby/gems/1.8/gems/cassandra-0.12.1/lib/cassandra/cassandra.rb:459:in `insert'
from (irb):6
from :0
Any suggestions on how to dig deeper? I'll be reaching out to the Cassandra gem folks too, of course.
[Update – Thanks to Matthew Russell and Daniel Lundin for pointing me towards the solution. The cassandra.yaml file defines a maximum frame size for Thrift API communication, which defaults to 15MB. Raising that and the max message length solved it for me.]
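For anyone hitting the same wall, here's a rough sketch of a client-side check that would have flagged the problem before the insert. It isn't part of the Cassandra gem: `fits_in_frame?` is a name I made up, the 15MB limit assumes the server is still on the default, and the 1KB allowance for the row key, column name and Thrift framing is a guess, not a measured number. If I remember right, the two settings in the 0.8 cassandra.yaml are `thrift_framed_transport_size_in_mb` and `thrift_max_message_length_in_mb`, but check your own file.

```ruby
# Hypothetical pre-flight check; not part of the cassandra gem.
# limit_mb assumes the server still uses the default 15MB frame size.
# envelope_bytes is a guessed allowance for the row key, column name,
# timestamp and Thrift framing that share the frame with the value.
def fits_in_frame?(value, limit_mb = 15, envelope_bytes = 1024)
  value.bytesize + envelope_bytes <= limit_mb * 1024 * 1024
end

fits_in_frame?('a' * (14*1024*1024))  # => true  (the size that worked)
fits_in_frame?('a' * (15*1024*1024))  # => false (the size that failed)
```

The point is that the value shares the frame with everything else in the request, so a value of exactly 15MB is already over a 15MB limit. That would explain why 14MB went through and 15MB didn't.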

 
