Monthly Archives: January 2009

Advanced JVM Tuning for Low Pause

The standard Java Virtual Machine (JVM) is configured to optimize for throughput. But some systems care more about low pause times and reduced latency, and garbage collection (GC) can be one source of pauses. (You can read an interesting article about what latency means to your business.)

I found a post on the GigaSpaces forum with some possible JVM configurations that optimize for latency:

-Xms2g -Xmx2g -Xmn150m 
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode 
-XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=10 
-XX:CMSIncrementalDutyCycle=50 -XX:ParallelGCThreads=8 
-XX:+UseParNewGC -XX:MaxGCPauseMillis=2000 
-XX:GCTimeRatio=10 -XX:+DisableExplicitGC

Please note that -XX:+UseConcMarkSweepGC has the heaviest impact on performance: a decrease of 40%.

The following set of parameters shows 20% better performance than with -XX:+UseConcMarkSweepGC, while pause times still stay below 100 ms in an embedded test with a 10 KB payload and 100 threads:

-Xms2g -Xmx2g -Xmn150m 
-XX:GCTimeRatio=2 -XX:ParallelGCThreads=8 
-XX:+UseParNewGC -XX:MaxGCPauseMillis=2000 
-XX:+DisableExplicitGC

While I’m pretty sure that most applications do not need such an advanced VM configuration, it is interesting to see what strategies are employed when low latency is needed.

Option                          Details
-XX:+UseConcMarkSweepGC         Sets the garbage collector policy to the concurrent (low pause time) garbage collector (also known as CMS)
-XX:+CMSIncrementalMode         Enables the incremental mode (works only with -XX:+UseConcMarkSweepGC)
-XX:+CMSIncrementalPacing       Enables automatic adjustment of the incremental mode duty cycle based on statistics collected while the JVM is running
-XX:CMSIncrementalDutyCycleMin  The percentage (0-100) which is the lower bound on the duty cycle when CMSIncrementalPacing is enabled
-XX:CMSIncrementalDutyCycle     The percentage (0-100) of time between minor collections that the concurrent collector is allowed to run. If CMSIncrementalPacing is enabled, then this is just the initial value.
-XX:ParallelGCThreads           Sets the number of garbage collector threads
-XX:+UseParNewGC                Enables multithreaded young generation collection
-XX:MaxGCPauseMillis            A hint to the throughput collector that it’s desirable that the maximum pause time is lower than the given value (n.b. it looks like this value can also be used with the CMS garbage collector)
-XX:GCTimeRatio                 A hint to the virtual machine that it’s desirable that not more than 1 / (1 + GCTimeRatio) of the application execution time be spent in the collector
-XX:+DisableExplicitGC          Disables explicit garbage collection calls (System.gc())
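
To make the GCTimeRatio hint more concrete, here is a small Python sketch of the 1 / (1 + GCTimeRatio) formula from the table, applied to the two values used in the configurations above (10 and 2). The function name gc_time_fraction is made up for this illustration, and the result is only the target the JVM is asked to aim for, not a guarantee about actual GC time.

```python
def gc_time_fraction(gc_time_ratio):
    """Share of run time the collector is asked to stay under: 1 / (1 + N)."""
    if gc_time_ratio < 0:
        raise ValueError("GCTimeRatio must not be negative")
    return 1.0 / (1.0 + gc_time_ratio)


# The two values that appear in the configurations quoted in this post.
for ratio in (10, 2):
    print("-XX:GCTimeRatio=%d -> at most %.1f%% of time in GC"
          % (ratio, 100 * gc_time_fraction(ratio)))
```

So a smaller GCTimeRatio is the more permissive setting: a value of 2 tolerates up to a third of the run time in the collector, while 10 asks for roughly a tenth or less.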

There is no need to learn all these flags by heart, as you can find them covered in various documents.

If you still need help, you can try asking on the General Performance Forum.

Filed under technolog

Cloud Computing Coverage End of 2008

Here is a short list of cloud computing links that I found interesting during December (the list is by no means complete, and more links will most probably be added later). The end of the year is usually the time to check how last year's predictions worked out and to throw out new predictions for the year to come.

So, let’s start with a couple of predictions:

Other interesting reads for me were Microsoft miss the ship(ping container) and, definitely, this post about Capgemini trying to convince companies to move to the Amazon cloud.

For now, I’ll finish with the AWS migration blueprint article.


If you are interested in cloud computing, you can start by visiting DailyCloud: the daily coverage of the cloud computing market. DailyCloud, which is still in early beta, synthesizes the content, links and social stream on cloud computing and its adjacent topics: SaaS, PaaS, IaaS, HaaS, grid computing, virtualization, data centers.

Filed under links

Commenting Services Face to Face: Disqus vs IntenseDebate vs JS-Kit vs SezWho

There is a major update to this post: Update on Commenting Services Face-off. Make sure you read it before jumping to any conclusions!
I have received a lot of feedback from JS-Kit and SezWho, which led to changes in the evaluations. I am working on an update to the post. Meanwhile, you can read Jitendra’s comment, which provides more insight into the SezWho offering.

This is a long post, so I’ll jump directly to the main topic. I’ve run a face-to-face comparison of 4 existing commenting systems: Disqus, IntenseDebate, JS-Kit and SezWho. The rest of the post presents the criteria I’ve used, the winners and references to specific features.

Please feel free to comment and correct me if I got anything wrong!

Criteria

  • User-related features
    • Comment threading
    • Anonymous posts
    • Bookmarkability
    • Comment ranking and additional features (sort, most, filter, etc.)
    • Rich format comments
  • Owner-related features
    • Spam filtering
    • Comment moderation
    • Search Engine Friendliness
    • HTML/CSS customization
  • Data access
    • Widget (Javascript API)
    • Programming API
    • Data export

Evaluated products

Winners

After evaluating the criteria above, I decided that there should be 2 categories: Blogs and Custom sites. Even if the differences are not immediately visible, a programming API offers site owners a lot more extensibility and freedom, while for blog owners it might matter less than additional comment ranking features (sorting, filtering, recommendations, etc.).

Drum roll… The winners are:

There is a major update to this post: Update on Commenting Services Face-off. Make sure you read it before giving this ranking any further consideration.

Blog

  1. Winner: IntenseDebate
  2. Runner-up: Disqus

Site

  1. Winner: Disqus
  2. Runner-up: IntenseDebate

User Features

                               Disqus  IntenseDebate  JS-Kit  SezWho
Comment threading              Y       Y              Y       Y
Anonymous comments             Y       Y              Y       Y [7]
Bookmarkability                N       Y              N       Y
Comment ranking                Y       Y              Y       Y
Comment ranking functionality  – [12]  Y              Y       Y
Rich format comments           Y       Y              Y       Y

Owner Features

                            Disqus                IntenseDebate    JS-Kit           SezWho
Spam filtering              Custom + Akismet [1]  Custom [2]       Akismet          Akismet [3]
Comment moderation          Y (web + email)       Y (web + email)  Y (web + email)  Y [4]
Search engine friendliness  Y [8]                 N [9]            Y [10]           N [11]
HTML/CSS customization      Y                     Y                Y                Y

Data access

                 Disqus  IntenseDebate  JS-Kit  SezWho
Widget (JS API)  Y       Y              Y       Y
Programming API  Y       N [5]          N       N [6]
Data export      API     None           None    None

Other Criteria

                         Disqus                       IntenseDebate    JS-Kit                               SezWho
Costs                    Free                         Free             Free for 25k pageviews + Commercial  Free [13]
Documentation / Support  API docs, Low Traffic Forum  GetSatisfaction  Q&A Forum [14]                       FAQ Page [15]

Comment ranking functionality

Once the comments themselves can be rated, these services can offer a lot of new functionality on top. I am thinking of filtering the comment thread, sorting, most commented content, etc.

[12] I couldn’t find anything about Disqus support for additional functionality.

Spam Filtering

  1. [1] According to the following links, Disqus is employing a combination of custom filtering and Akismet integration
  2. [2] I couldn’t find any place in the IntenseDebate documentation detailing what solution is used, so I’ve concluded that some custom filtering is employed. Considering that IntenseDebate is now part of the WordPress universe, it might be possible to also integrate with Akismet.
  3. JS-Kit documentation is clear about this point: LINK
  4. [3] While some sources are mentioning the integration with Akismet for spam filtering, I couldn’t find this info in the SezWho documentation

Comment Moderation

All three of Disqus, IntenseDebate and JS-Kit support advanced moderation features, but my advice would be to actually test them if comment moderation is important for your site or blog.

[4] Unfortunately, I couldn’t find any info about SezWho’s support for comment moderation.

Search Engine Friendliness

This is probably the most debatable criterion used for evaluating the 4 services and unfortunately to cover it I will need a whole new post (which will come later on).

  • [8] Disqus is offering a hosted page for each comment thread, so search engines can index the comments.
  • [9] I’ve read about improvements implemented by IntenseDebate, but unfortunately these are not useful for the sites that do not use the IntenseDebate custom plugin
  • [10] JS-Kit employs the same solution as Disqus.
  • [11] No information available.

The Disqus and JS-Kit approach to this problem is quite good (even if a bit complex): with the help of some subdomain mapping, you can instruct search engines to treat the comment thread content as the site’s own content.

Programming API

  • [5] I am still investigating the possibility of accessing IntenseDebate data through a programming API (see thread)
  • [6] Even if there are a couple of sources mentioning a SezWho API (see Mashable and CenterNetworks), I couldn’t find any reference to it in SezWho’s documentation

Other notes

  • [13] SezWho:

    The basic SezWho rating, reputation and profile services are provided for free on the currently supported platforms. SezWho will be offering upgrades to premium products and services in the future, but basic SezWho integration for standard social media platforms will always be available as a free service.

  • [14] JS-Kit Q&A Forum is unfortunately pretty unusable as there is no way to perform any searches. It has over 450 pages of comments, so even if I am pretty sure there is a lot of information in there, it is a pity that you cannot get to it. Search Engine Friendliness is a major and extremely important feature that you should consider while integrating a 3rd party commenting system.
  • [15] SezWho FAQ Page contains minimal information and unfortunately I couldn’t find other sources.

More information

Here is a set of other features compared on RWW:

[Image: feature comparison from RWW]

Disclaimer

SezWho documentation is pretty scarce, so I haven’t been able to find detailed information on the evaluated set of features.


Filed under technolog, Tools

Internet History in 8 minutes

You’ve probably seen it already, but I thought it was worth re-sharing:

As you can imagine, condensing 50 years of the internet into only 8 minutes is not easy, but the video does a great job of explaining the major steps that led to what we take for granted today.


Filed under personalog

BeautifulSoup or SGMLParser Bug

If you are reading this, you probably already know what BeautifulSoup is and how useful it is when working with XML/HTML in Python (in case you are not familiar with it, I’d encourage you to read its documentation). So I’ll just skip to the main reason for this post: a bug in parsing <script> tags in HTML documents.

According to the documentation, BeautifulSoup knows how to handle the body of a <script> tag, meaning that it knows to treat its content as a pure string and not perform any additional parsing on it. Unfortunately, I’ve discovered a corner case where it behaves incorrectly.

Here is the sample HTML that will reveal the bug:

<html>
<head></head>
<body>
  <script type='text/javascript'>
    document.write('</script>');
    document.write('<div></div>');
  </script>
</body>
</html>

The problem is that the string ‘</script>’ tricks the parser into believing that the end of the <script> tag has been reached. So instead of getting a single Tag from the <script> HTML tag, you get 2 elements: a Tag and a NavigableString that contains the rest of the <script> tag (i.e. what comes after the ‘</script>’ string: '); document.write('<div></div>');).

This basically means that for any HTML that contains a similar fragment, rewriting it will lead to broken <script>s. Unfortunately, I haven’t been able to figure out a solution. My impression is that this parsing happens at a very low level, which makes me think that the bug might not be in BeautifulSoup but rather in SGMLParser.

The affected version is 3.0.7a. Meanwhile, it looks like a new release has seen the light, but I haven’t tested it yet. The new BeautifulSoup 3.1.0 has replaced SGMLParser with HTMLParser (in an attempt to make BeautifulSoup compatible with Python 3.0), so this bug might already be fixed.
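
The corner case can also be examined without BeautifulSoup. The sketch below is only an illustration under a few assumptions: it uses html.parser from the Python 3 standard library rather than BeautifulSoup 3.0.7a or SGMLParser, and the names ScriptCollector, script_bodies and PAGE are made up for the example. It feeds the sample page to the parser twice: once as shown above, and once with the string written as '<\/script>', which JavaScript still reads as '</script>' but which no longer contains a literal closing tag.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects the text body of every <script> element the parser sees."""

    def __init__(self):
        super().__init__()
        self.scripts = []      # finished script bodies
        self._buffer = None    # chunks of the script currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.scripts.append("".join(self._buffer))
            self._buffer = None


def script_bodies(html):
    """Return the list of script bodies found in the given HTML text."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return parser.scripts


# The sample page from this post, with the written string left as a parameter.
PAGE = """<html>
<head></head>
<body>
  <script type='text/javascript'>
    document.write('%s');
    document.write('<div></div>');
  </script>
</body>
</html>"""

broken = script_bodies(PAGE % "</script>")     # literal closing tag in the string
escaped = script_bodies(PAGE % "<\\/script>")  # same string for JavaScript, escaped

print(repr(broken[0]))
print(repr(escaped[0]))
```

With the literal string, the collected script body stops right after document.write(', which matches the split described above. With the escaped form, the whole body comes back as a single script. So if you control the HTML source, escaping the slash (or splitting the string in two) is one possible workaround; it does not help when you have to parse pages written by someone else.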

While we are on the subject of bugs, I’d also like to mention one in Python 2.5.2 on Mac OS:

MemoryError
Python(72261) malloc: *** mmap(size=2097152) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Exception exceptions.MemoryError: MemoryError() in  ignored

Things are much simpler with this one, even if the displayed information doesn’t offer enough details. The above bug is basically the result of adding strings to a list in an infinite loop (so a programming problem, but with no indication of the error).
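
The original loop is not shown here, so the following is only a sketch of the kind of code described above, together with one possible guard. All names (collect_lines, read_line, max_lines, endless_source) are hypothetical. The idea is to put a cap on the list so that a missing exit condition fails early with a readable message instead of a bare MemoryError.

```python
def collect_lines(read_line, max_lines=100000):
    """Append lines to a list until read_line() returns None, with a safety cap."""
    lines = []
    while True:
        line = read_line()
        if line is None:              # normal end of input
            return lines
        if len(lines) >= max_lines:
            # Fail early and say why, long before memory runs out
            raise RuntimeError("more than %d lines collected; "
                               "is the loop missing its exit condition?" % max_lines)
        lines.append(line)


def endless_source():
    """Stand-in for a buggy input source: it never signals the end."""
    return "same line again"


try:
    collect_lines(endless_source, max_lines=1000)
except RuntimeError as error:
    print(error)
```

The cap is arbitrary and has to be chosen per use case; the point is only that the error then names the loop that went wrong.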


Filed under technolog