Home » Database Administration

Category Archives: Database Administration

Advertisements

Hunting Down Memory Issues In Ruby: A Definitive Guide


I’m sure there are some lucky Ruby developers out there who will never run into issues with memory, but for the rest of us, it’s incredibly challenging to hunt down where memory usage is getting out of hand and fix it. Fortunately, if you’re using a modern Ruby (2.1+), there are some great tools and techniques available for dealing with common issues. It could also be said that memory optimization can be fun and rewarding although I may be alone in that sentiment.

Hunting Down Memory Issues In Ruby

If you thought bugs were pesky, wait until you hunt for memory issues.

As with all forms of optimization, odds are that it will add code complexity, so it’s not worth doing unless there are measurable and significant gains.

Everything described here is done using the canonical MRI Ruby, version 2.2.4, although other 2.1+ versions should behave similarly.

It’s Not a Memory Leak!

When a memory issue is discovered, it’s easy to jump to the conclusion that there’s a memory leak. For example, in a web application, you may see that after you spin up your server, repeated calls to the same endpoint keep driving memory usage up higher with each request. There are certainly cases where legitimate memory leaks happen, but I’d wager they are vastly outnumbered by memory issues with this same appearance that aren’t actually leaks.

As a (contrived) example, let’s look at a bit of Ruby code that repeatedly builds a big array of hashes and discards it. First, here’s some code that’ll be shared throughout the examples in this post:

# common.rb
require "active_record"
require "active_support/all"
require "get_process_mem"
require "sqlite3"

ActiveRecord::Base.establish_connection(
  adapter: "sqlite3",
  database: "people.sqlite3"
)

class Person < ActiveRecord::Base; end

def print_usage(description)
  mb = GetProcessMem.new.mb
  puts "#{ description } - MEMORY USAGE(MB): #{ mb.round }"
end

def print_usage_before_and_after
  print_usage("Before")
  yield
  print_usage("After")
end

def random_name
  (0...20).map { (97 + rand(26)).chr }.join
end

And the array builder:

# build_arrays.rb
require_relative "./common"

ARRAY_SIZE = 1_000_000

times = ARGV.first.to_i

print_usage(0)
(1..times).each do |n|
  foo = []
  ARRAY_SIZE.times { foo << {some: "stuff"} }

  print_usage(n)
end

The get_process_mem gem is just a convenient way to get the memory being used by the current Ruby process. What we see is the same behavior that was described above, a continual increase in memory usage.

$ ruby build_arrays.rb 10
0 - MEMORY USAGE(MB): 17
1 - MEMORY USAGE(MB): 330
2 - MEMORY USAGE(MB): 481
3 - MEMORY USAGE(MB): 492
4 - MEMORY USAGE(MB): 559
5 - MEMORY USAGE(MB): 584
6 - MEMORY USAGE(MB): 588
7 - MEMORY USAGE(MB): 591
8 - MEMORY USAGE(MB): 603
9 - MEMORY USAGE(MB): 613
10 - MEMORY USAGE(MB): 621

However, if we run more iterations, we’ll eventually plateau.

$ ruby build_arrays.rb 40
0 - MEMORY USAGE(MB): 9
1 - MEMORY USAGE(MB): 323
...
32 - MEMORY USAGE(MB): 700
33 - MEMORY USAGE(MB): 699
34 - MEMORY USAGE(MB): 698
35 - MEMORY USAGE(MB): 698
36 - MEMORY USAGE(MB): 696
37 - MEMORY USAGE(MB): 696
38 - MEMORY USAGE(MB): 696
39 - MEMORY USAGE(MB): 701
40 - MEMORY USAGE(MB): 697

Hitting this plateau is the hallmark of not being an actual memory leak, or that the memory leak is so small that it’s not visible compared to the rest of the memory usage. What may not be intuitive is why memory usage continues to grow after the first iteration. After all, it built a big array, but then promptly discarded it and started building a new one of the same size. Can’t it just use the space freed up by the previous array? The answer, which explains our problem, is no. Aside from tuning the garbage collector, you don’t have control over when it runs, and what we’re seeing in the build_arrays.rb example is new memory allocations being made prior to garbage collection of our old, discarded objects.

Do not panic if you see a sudden rise in the memory usage of your app. Apps can run out of memory for all sorts of reasons – not just memory leaks.

I should point out that this isn’t some sort of horrible memory management issue specific to Ruby, but is generally applicable to garbage-collected languages. Just to reassure myself of this, I reproduced essentially the same example with Go and saw similar results. However, there are Ruby libraries that make it easy to create this sort of memory issue.

Divide and Conquer

So if we need to work with large chunks of data, are we doomed to just throw lots of RAM at our problem? Thankfully, that’s not the case. If we take the build_arrays.rb example and decrease the array size, we’ll see a decrease in the point where memory usage plateaus that’s roughly proportional to the array size.

This means that if we can break our work into smaller pieces to process and avoid having too many objects existing at one time, we can dramatically reduce the memory footprint. Unfortunately, that often means taking nice, clean code and turning it into more code that does the same thing, just in a more memory-efficient way.

Isolating Memory Usage Hotspots

In a real codebase, the source of a memory issue will likely not be as obvious as in the build_arrays.rbexample. Isolating a memory issue before trying to actually dig in and fix it is essential because it’s easy to make incorrect assumptions about what’s causing the problem.

I generally use two approaches, often in combination, to track down memory issues: leaving the code intact and wrapping a profiler around it, and monitoring memory usage of the process while disabling/enabling different parts of the code I suspect could be problematic. I’ll be using memory_profiler here for profiling, butruby-prof is another popular option, and derailed_benchmarks has some great Rails-specific capabilities.

Here’s some code that’ll use a bunch of memory, where it may not be immediately clear which step is pushing up memory usage the most:

# people.rb
require_relative "./common"

def run(number)
  Person.delete_all

  names = number.times.map { random_name }

  names.each do |name|
    Person.create(name: name)
  end

  records = Person.all.to_a

  File.open("people.txt", "w") { |out| out << records.to_json }
end

Using get_process_mem, we can quickly verify that it does use a lot of memory when there are a lot of Person records being created.

# before_and_after.rb
require_relative "./people"

print_usage_before_and_after do
  run(ARGV.shift.to_i)
end

Result:

$ ruby before_and_after.rb 10000
Before - MEMORY USAGE(MB): 37
After - MEMORY USAGE(MB): 96

Looking through the code, there are multiple steps that seem like good candidates for using a lot of memory: building a big array of strings, calling #to_a on an Active Record relation to make a big array of Active Record objects (not a great idea, but done for demonstration purposes), and serializing the array of Active Record objects.

We can then profile this code to see where memory allocations are happening:

# profile.rb
require "memory_profiler"
require_relative "./people"

report = MemoryProfiler.report do
  run(1000)
end
report.pretty_print(to_file: "profile.txt")

Note that the number being fed to run here is 1/10 of the previous example, since the profiler itself uses a lot of memory, and can actually lead to memory exhaustion when profiling code that already causes high memory usage.

The results file is rather lengthy and includes memory and object allocation and retention at the gem, file, and location levels. There’s a wealth of information to explore, but here are a couple of interesting snippets:

allocated memory by gem
-----------------------------------
  17520444  activerecord-4.2.6
   7305511  activesupport-4.2.6
   2551797  activemodel-4.2.6
   2171660  arel-6.0.3
   2002249  sqlite3-1.3.11

...

allocated memory by file
-----------------------------------
   2840000  /Users/bruz/.rvm/gems/ruby-2.2.4/gems/activesupport-4.2.6/lib/activ
e_support/hash_with_indifferent_access.rb
   2006169  /Users/bruz/.rvm/gems/ruby-2.2.4/gems/activerecord-4.2.6/lib/active
_record/type/time_value.rb
   2001914  /Users/bruz/code/mem_test/people.rb
   1655493  /Users/bruz/.rvm/gems/ruby-2.2.4/gems/activerecord-4.2.6/lib/active
_record/connection_adapters/sqlite3_adapter.rb
   1628392  /Users/bruz/.rvm/gems/ruby-2.2.4/gems/activesupport-4.2.6/lib/activ
e_support/json/encoding.rb

We see the most allocations happening inside Active Record, which would seem to point at either instantiating all the objects in the records array, or serialization with #to_json. Next, we can test our memory usage without the profiler while disabling these suspects. We can’t disable retrieving records and still be able to do the serialization step, so let’s try disabling serialization first.

  # File.open("people.txt", "w") { |out| out << records.to_json }

Result:

$ ruby before_and_after.rb 10000
Before: 36 MB
After: 47 MB

That does indeed seem to be where most of the memory is going, with before/after memory delta dropping 81% by skipping it. We can also see what happens if we stop forcing the big array of records to be created.

  # records = Person.all.to_a
  records = Person.all

  # File.open("people.txt", "w") { |out| out << records.to_json }

Result:

$ ruby before_and_after.rb 10000
Before: 36 MB
After: 40 MB

This reduces memory usage as well, although it’s an order of magnitude less reduction than disabling serialization. So at this point, we know our biggest culprits, and can make a decision about what to optimize based on this data.

Although the example here was contrived, the approaches are generally applicable. Profiler results may not point you at the exact spot in your code where the problem lies, and can also be misinterpreted, so it’s a good idea to follow up by looking at actual memory usage while turning sections of code on and off. Next, we’ll look at some common cases where memory usage becomes an issue and how to optimize them.

Deserialization

A common source of memory issues is deserializing large amounts of data from XML, JSON or some other data serialization format. Using methods like JSON.parse or Active Support’s Hash.from_xml is incredibly convenient, but when the data you’re loading is large, the resulting data structure that’s loaded in memory can be enormous.

If you have control over the source of the data, you can do things to limit the amount of data you’re receiving, like adding filtering or pagination support. But if it’s an external source or one you can’t control, another option is to use a streaming deserializer. For XML, Ox is one option, and for JSON yajl-ruby appears to operate similarly, although I don’t have much experience with it.

Just because you have limited memory doesn’t mean you cannot parse large XML or JSON documents safely. Streaming deserializers allow you to incrementally extract whatever you need from these documents and still keep the memory footprint low.

Here’s an example of parsing a 1.7MB XML file, using Hash#from_xml.

# parse_with_from_xml.rb
require_relative "./common"

print_usage_before_and_after do
  # From http://www.cs.washington.edu/research/xmldatasets/data/mondial/mondial-3.0.xml
  file = File.open(File.expand_path("../mondial-3.0.xml", __FILE__))
  hash = Hash.from_xml(file)["mondial"]["continent"]
  puts hash.map { |c| c["name"] }.join(", ")
end
$ ruby parse_with_from_xml.rb
Before - MEMORY USAGE(MB): 37
Europe, Asia, America, Australia/Oceania, Africa
After - MEMORY USAGE(MB): 164

111MB for a 1.7MB file! This clearly is not going to scale up well. Here’s the streaming parser version.

# parse_with_ox.rb
require_relative "./common"
require "ox"

class Handler < ::Ox::Sax
  def initialize(&block)
    @yield_to = block
  end

  def start_element(name)
    case name
    when :continent
      @in_continent = true
    end
  end

  def end_element(name)
    case name
    when :continent
      @yield_to.call(@name) if @name
      @in_continent = false
      @name = nil
    end
  end

  def attr(name, value)
    case name
    when :name
      @name = value if @in_continent
    end
  end
end

print_usage_before_and_after do
  # From http://www.cs.washington.edu/research/xmldatasets/data/mondial/mondial-3.0.xml
  file = File.open(File.expand_path("../mondial-3.0.xml", __FILE__))
  continents = []
  handler = Handler.new do |continent|
    continents << continent
  end
  Ox.sax_parse(handler, file)

  puts continents.join(", ")
end
$ ruby parse_with_ox.rb
Before - MEMORY USAGE(MB): 37
Europe, Asia, America, Australia/Oceania, Africa
After - MEMORY USAGE(MB): 37

This brings us down to a negligible memory increase and should be able to handle vastly larger files. However, the tradeoff is that we now have 28 lines of handler code we didn’t need before, which seems like it’d be error prone, and for production use it should have some tests around it.

Serialization

As we saw in the section about isolating memory usage hotspots, serialization can have high memory costs. Here’s the key part of people.rb from earlier.

# to_json.rb
require_relative "./common"

print_usage_before_and_after do
  File.open("people.txt", "w") { |out| out << Person.all.to_json }
end

Running this with 100,000 records in the database, we get:

$ ruby to_json.rb
Before: 36 MB
After: 505 MB

The issue with calling #to_json here is that it instantiates an object for every record, and then encodes to JSON. Generating the JSON record-by-record so that only one record object would need to exist at a time reduces the memory usage significantly. None of the popular Ruby JSON libraries appear to handle this, but a commonly recommended approach is to build the JSON string manually. There is a json-write-stream gem that provides a nice API for doing this, and converting our example to this looks like:

# json_stream.rb
require_relative "./common"
require "json-write-stream"

print_usage_before_and_after do
  file = File.open("people.txt", "w")
  JsonWriteStream.from_stream(file) do |writer|
    writer.write_object do |obj_writer|
      obj_writer.write_array("people") do |arr_writer|
        Person.find_each do |people|
          arr_writer.write_element people.as_json
        end
      end
    end
  end
end

Once again, we see optimization has given us more code, but the result seems worth it:

$ ruby json_stream.rb
Before: 36 MB
After: 56 MB

Being Lazy

A great feature added to Ruby starting with 2.0 is the ability to make enumerators lazy. This is great for improving memory usage when chaining methods on an enumerator. Let’s start with some code that isn’t lazy:

# not_lazy.rb
require_relative "./common"

number = ARGV.shift.to_i

print_usage_before_and_after do
  names = number.times
                .map { random_name }
                .map { |name| name.capitalize }
                .map { |name| "#{ name } Jr." }
                .select { |name| name[0] == "X" }
                .to_a
end

Result:

$ ruby not_lazy.rb 1_000_000
Before: 36 MB
After: 546 MB

What happens here is that at each step in the chain, it iterates over every element in the enumerator, producing an array that has the subsequent method in the chain invoked on it, and so forth. Let’s see what happens when we make this lazy, which just requires adding a call to lazy on the enumerator we get from times:

# lazy.rb
require_relative "./common"

number = ARGV.shift.to_i

print_usage_before_and_after do
  names = number.times.lazy
                .map { random_name }
                .map { |name| name.capitalize }
                .map { |name| "#{ name } Jr." }
                .select { |name| name[0] == "X" }
                .to_a
end

Result:

$ ruby lazy.rb 1_000_000
Before: 36 MB
After: 52 MB

Finally, an example that gives us a huge memory usage win, without adding a lot of extra code! Note that if we didn’t need to accumulate any results at the end, for instance, if each item was saved to the database and could then be forgotten, there would be even less memory usage. To make a lazy enumerable evaluate at the end of the chain, just add a final call to force.

Another thing to note about the example is that the chain starts with a call to times prior to lazy, which uses very little memory since it just returns an enumerator that will generate an integer each time it’s invoked. So if an enumerable can be used instead of a big array at the beginning of the chain, that will help.

Keeping everything in huge arrays and maps is convenient, but in real world scenarios, you rarely need to do that.

One real-world application of building an enumerable to lazily feed into some sort of processing pipeline is processing paginated data. So rather than requesting all the pages and putting them into one big array, they could be exposed through an enumerator that nicely hides all the pagination details. This could look something like:

def records
  Enumerator.new do |yielder|
    has_more = true
    page = 1

    while has_more
      response = fetch(page)
      response.records.each { |record| yielder record }

      page += 1
      has_more = response.has_more
    end
  end
end

Conclusion

We’ve done some characterization of memory usage in Ruby, and looked at some general tools for tracking down memory issues, as well as some common cases and ways to improve them. The common cases we explored are by no means comprehensive and are highly biased by the sort of issues I personally have encountered. However, the biggest gain may just be getting in the mindset of thinking about how the code will impact memory usage.

This article was written by Bruz Marzolf, a Toptal Ruby developer.

Advertisements

Oracle 11g Extended Statistics for SAP Tables


Extended Statistics is an attempt to fix one of the flaws in CBO–values of different columns are not correlated.

Let us take an example of two columns in a table. One of the column contain department code and the other contains employee name. Let us assume that there are 10 departments and 3000 employees in our example. In a real life scenario, all the employees do not belong to all the departments, but CBO assumes that is the case and hence it assumes that there are 3000*10 = 30000 combinations of employee name and department code that exist. In reality, it can be between 3000 and 30000 (assuming employee belongs to at least one department and can clock for multiple departments).
(more…)

Brief Description of SAP Executable


Here is a list of SAP executables that you may find on your SAP systems. If you know of SAP exe files missing in from list, please leave a comment:

R3check                     This is a tool that can check Cluster-Tables for errors.
R3ldctl                        The tool for exporting all table structures to the file system during an OS/DB-Migration.
R3load                        The table import & export tool of SAP during Installation, Upgrade and Migration.
R3szchk                     The tool for determine the sizes of the different tables in the target database during the import in an OS/DB-Migration.
R3ta                             Split large tables for export and import
R3trans                      This is the tool, that does the real work for tp. tp controls the import and export of changes and r3trans does them using scripts, that were generated from tp.
R3trans_164-20000978.SAR     R3trans_164-20000978.SAR’ is a compressed archive with the latest version of R3trans from the SAP Service Marketplace, used when we patched the kernel.
SAPCAR                         SAP Compression and Archiving program (more…)

DB2 Optimized for SAP – Set it and forget it


You may have read that the last release of DB2 (8.2.2) was specifically Optimized for SAP. What exactly does that mean? Well in this and the next few postings I’ll describe exactly what new features were in 8.2.2 and how, if you are using SAP (and even if you are not), you can take advantage of these new features.

The first is what I like to call the One SAP Knob.
If you administer an SAP database then you likely know that there are a number of settings SAP strongly recommends that you set. This is not a big issue when you initially get started but as new versions of DB2 and/or SAP come out it can be a burden to read through all the release notes to find out what new registry variables should be set and/or what changes to existing registry variables should be made. Well if you are on 8.2.2 your life has just gotten easier. You can turn the SAP Knob to the ON position and forget about it. What do I mean? Well there is a new registry variable in DB2 8.2.2 called DB2_WORKLOAD. If you run

db2set DB2_WORKLOAD=SAP

this will automatically set a number of registry variables based on what IBM and SAP have determined to be the optimal settings. Think of this as a parent setting which takes care of all the children registry variables underneath it. If a future fixpack of DB2 or release of SAP has new or more optimal (if that’s even correct grammar) registry settings, DB2 will automatically adjust them as long as the DB2_WORKLOAD setting is set to SAP. If you want to see what registry variables are controlled by the DB2_WORKLOAD variable, you can run

db2set -gd DB2_WORKLOAD=SAP

and you will see a list of 17 registry variables and their settings, specifically designed in cooperation with SAP to tune DB2 optimally for an SAP workload.

IBM DB2 10.1 has been supported by SAP


Finally, IBM DB2 10.1 has been supported by SAP. There are a lot of improvement whic had been done by IBM on DB2 10.1 such as Adaptive Compression (prior than Deep Compression which now mentioned as Static Compression), Log Compression, etc.

SAP has just certified DB2 10.1 this July 2012.

There are several SAP Notes need to be read and implement if you want to use DB2 10.1.

sapnote_0001700631_DB6: Using DB2 10.1 with SAP Applications

sapnote_0001701181_DB6: ABAP DDIC: Enhancements for DB2 10.1

sapnote_0001692571_DB6: DB2 10.1 Standard Parameter Settings

sapnote_0001365982_DB6: Current db6_update_db_db6_update_client script (V 27)

Right now, I also doing some research on this DB2 10.1 using SAP ERP 6 EHP 5.

Happy hacking, guys !!

Nice ebooks from SAP : Data Management Guide for SAP Business Suite


Hi folks,

I got this nice ebooks from SAP website.

You can download from my ziddu account here (http://www.ziddu.com/download/18524440/DataManagementGuideforSAPBusinessSuite.pdf.html)

It’s great books and every SAP Functional, Basis and Consultant should have it.

Oracle 9i : Some of limitation on Oracle 9i


Physical Database Limits

Item Type of Limit Limit Value
Database Block Size Minimum 2048 bytes; must be a multiple of operating system physical block size
Maximum Operating system dependent; never more than 32 KB
Database Blocks Minimum in initial extent of a segment. 2 blocks
Maximum per datafile Platform dependent; typically 222-1 blocks
Controlfiles Number of control files 1 minimum; 2 or more (on separate devices) strongly recommended
Size of a control file Dependent on operating system and database creation options; maximum of 20,000 x (database block size)
Database files Maximum per tablespace Operating system dependent; usually 1022
Maximum per database 65533

May be less on some operating systems

Limited also by size of database blocks and by the DB_FILES initialization parameter for a particular instance

Database extents Maximum 4 GB, regardless of the maximum file size allowed by the operating system
Database file size Maximum Operating system dependent. Limited by maximum operating system file size; typically 222 or 4M blocks
MAXEXTENTS Default value Derived from tablespace default storage or DB_BLOCK_SIZE initialization parameter
Maximum Unlimited
Redo Log Files Maximum number of logfiles Limited by value of MAXLOGFILES parameter in the CREATE DATABASE statement

Control file can be resized to allow more entries; ultimately an operating system limit

Maximum number of logfiles per group Unlimited
Redo Log File Size Minimum size 50 KB
Maximum size Operating system limit; typically 2 GB
Tablespaces Maximum number per database 64 K

Number of tablespaces cannot exceed the number of database files, as each tablespace must include at least one file

 

source (http://download.oracle.com/docs/cd/B10501_01/server.920/a96536/ch43.htm#287916)