The QCon London 2009 Week

It is once again time for QCon London and, as you might know, this is already the third year in a row that we are organizing the event (and indeed, there is a QCon San Francisco for our friends on that side of the world).

I know I may be accused of being subjective (I am one of the co-founders of InfoQ, which co-organizes the event), but people who know me and those who have attended QCon at least once will know that what I’m going to write holds true.

QCon is the event that will provide at least two things: the opportunity to learn something new (there are a lot of different tracks; you can find the ones for this year here) and the best opportunity to connect with renowned experts in various fields.

You might wonder why I think QCon is better than other events for connecting with people. The reason is that most of the people participating at QCon (speakers included) spend their time at the conference (as opposed to just flying in, delivering the presentation and leaving). This, combined with the fact that the conference is not , plus the venue and the parties, will offer you enough time to get to these guys. All you need to do is come and have the guts to walk up to whoever you want to meet and say Hi!

I really hope that next year you will be here and you’ll just say Hi (well, leave aside the fact that I don’t consider myself some sort of guru, I’d still appreciate it).

Filed under personalog

Apple: Can I run Safari4 and Safari3 side by side?

I know that Safari 4 was just released, but I couldn’t help wondering if there is any way I can try it out without having to give up my stable Safari 3. So, is there a way to run Safari4 and Safari3 side by side, or is there a portable Safari?

While I don’t know the final answer, I can tell you one thing: if you search for ways to run different IE versions side by side, you’ll find tons of answers. If you’re looking to run different Firefox versions side by side, you’ll find another pile of answers, plus various portable bundles, etc. But for Safari: almost nothing.

With the help of the guys from StackOverflow, I have found the only resource suggesting that it might be possible to run Safari 4 and Safari 3 side by side: Multi-Safari. Unfortunately, Safari 4 is not available there so far, but I really hope they will make it available soon.

Update: Here is an extract from another tutorial for setting this up:

  1. Download and install the Safari 4 beta. You’ll need to reboot after the install because of the system framework changes.
  2. After rebooting, rename the new Safari.app in your Applications folder to Safari4.app.
  3. Download the Webkit build from 11/22/2008.
    Safari 3.2.1 was released on 11/24/2008 so I’m guessing this build is very close to that version.
  4. Mount the Webkit disk image and copy the Webkit.app application to your desktop.
  5. Rename Webkit.app to Safari3.app and move it to your Applications folder. In your Applications folder you should now have Safari3.app and Safari4.app.
    The Safari 4 installer backs up the previous version as an invisible file located at /Library/Application Support/Apple/.Safari4PreviewArchive.tar.gz. We need the original Safari.app bundle as the old version of Webkit we downloaded will not work with the new Safari 4 bundle.
  6. Launch the terminal and change directories:
    cd "/Library/Application Support/Apple/"
  7. Expand the backed-up archive: tar -zxvf .Safari4PreviewArchive.tar.gz. This creates a few new folders in the current directory: Applications, System, and usr.
  8. You can now copy the old Safari.app, which is now available in the newly created Applications folder, to the top-level /Applications folder. You can do this either in the terminal (by running the command cp -R "/Library/Application Support/Apple/Applications/Safari.app" /Applications/) or by navigating to Library » Application Support » Apple » Applications in the Finder and copying the Safari application bundle that way.
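
For those who prefer scripting, steps 7 and 8 above could be sketched in Python roughly as follows. This is only an illustration that I have not run against a real Safari 4 install: the function name restore_old_safari is made up for this example, the paths are simply the ones quoted in the steps, and writing to these folders will most likely require administrator rights.

```python
import shutil
import tarfile
from pathlib import Path

# Paths quoted from the steps above; not verified on a real install.
ARCHIVE_DIR = Path("/Library/Application Support/Apple")
ARCHIVE_NAME = ".Safari4PreviewArchive.tar.gz"


def restore_old_safari(archive_dir=ARCHIVE_DIR, applications_dir=Path("/Applications")):
    """Expand the backup archive (step 7) and copy the old Safari.app (step 8).

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    archive_dir = Path(archive_dir)
    applications_dir = Path(applications_dir)

    # Step 7: equivalent of `tar -zxvf .Safari4PreviewArchive.tar.gz`
    with tarfile.open(archive_dir / ARCHIVE_NAME, "r:gz") as tar:
        tar.extractall(path=archive_dir)

    # Step 8: equivalent of `cp -R .../Applications/Safari.app /Applications/`
    source = archive_dir / "Applications" / "Safari.app"
    target = applications_dir / "Safari.app"
    # copytree refuses to overwrite, so the new Safari.app must already
    # have been renamed to Safari4.app (step 2).
    shutil.copytree(source, target, symlinks=True)
    return target
```

Calling restore_old_safari() with no arguments uses the paths from the tutorial; passing other directories makes it easy to try the logic on a scratch folder first.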

Unfortunately, it doesn’t sound so simple and I have not tried it myself (moreover on the original post there are commenters saying it doesn’t work).

So, if you try it out do let me know if you can run Safari4 and Safari3 side by side! Also, if you find a portable Safari I’d really appreciate any leads!

Filed under technolog, Tools

Python and Text Processing

During the Christmas vacation, I played some more with Python, as I really like its simplicity and consistency (as a side note, I really wish other languages had the same level of consistency).

I’ve put together a short list of Python resources for text processing. While I haven’t used all of them, in most cases they seemed to be exactly what I’ve been looking for.

Natural Language Processing

Tokenization

Tokenization is the process of demarcating and possibly classifying sections of a string of input characters. The resulting tokens are then passed on to some other form of processing. The process can be considered a sub-task of parsing input.

While I have found the following simple tokenizer, I’ve also written my own, which doesn’t use regexps:

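def tokenize(sentence):
  '''Tokenize the given `sentence`.'''
  words = []
  j = 0  # start index of the current token
  end = len(sentence) - 1
  for i in range(len(sentence)):
    if not sentence[i].isalnum():
      if (sentence[i] == '.' or sentence[i] == ',') and (i > 0 and i < end):
        # keep '.' and ',' when they sit between two digits (inside a number)
        if sentence[i - 1].isdigit() and sentence[i + 1].isdigit():
          continue
      # any other non-alphanumeric character ends the current token
      words.append(sentence[j:i])
      j = i + 1
  # append the trailing token, if any
  if j <= end:
    words.append(sentence[j:])
  # drop the empty strings produced by consecutive separators
  return [w for w in words if w]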

The only thing worth mentioning about the above tokenizer is that it does not break formatted numbers (but it will break dates separated by / or -).
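
For comparison, here is a rough sketch of how the same rule might look with a regular expression. The names TOKEN_RE and regex_tokenize are mine, and I have only reasoned through a handful of inputs, so treat it as an approximation of the hand-written version rather than a drop-in replacement.

```python
import re

# A token is a run of alphanumerics, optionally extended by '.' or ','
# when that separator sits directly between two digits (e.g. 3.14 or 1,000).
# [^\W_] means "word character except underscore", close to str.isalnum().
TOKEN_RE = re.compile(r"[^\W_]+(?:(?<=\d)[.,]\d[^\W_]*)*")


def regex_tokenize(sentence):
    """Return the tokens of `sentence`, keeping formatted numbers whole."""
    return TOKEN_RE.findall(sentence)


print(regex_tokenize("Pi is 3.14, and 1,000 people came on 12/25/2008."))
# ['Pi', 'is', '3.14', 'and', '1,000', 'people', 'came', 'on', '12', '25', '2008']
```

As with the loop-based version, the formatted numbers survive while the date is split on the slashes.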

Stemming

The original Porter Stemmer is also available in Python (it looks like a simple translation of the C version that doesn’t use any Python idioms).

Filed under technolog

Google Desktop Breaking Privacy and More ‘Good’ Things about Google

Google and Privacy

This past week, I decided to install Google Desktop for Mac, as I wasn’t very satisfied with how my PDFs are indexed by Spotlight and/or Yep.

While I’m pretty happy so far with the way Google Desktop has indexed my PDFs, I have noticed one thing that makes me feel completely insecure about Google products!

GoogleDesktop.png

Even though I have configured Google Desktop NOT to send any statistics (see above screenshot), the firewall caught Google Desktop repeatedly attempting to connect and submit private information without my consent!

I won’t rant about what this means, but this is a major security and privacy breach in Google Desktop.

Gmail: Multi-Inboxes

This is a brilliant idea that offers a way to have multiple ‘portlets’ on your GMail account, each configured to display mails according to specific rules.

While playing with it, I’ve noticed a couple of things:

  • The Lab feature is useful for displaying emails that are either Archived or configured to skip the Inbox. If you configure the portlets to match emails in the Inbox, things may get a bit confusing (duplication, actions, etc.).
  • The sidebar display option seemed to be the most appealing configuration. Remember, our screens are getting wider, not taller.
  • In case you categorize your emails using multiple labels or you are watching group emails (so you have real email threads), the sidebar display configuration is pretty unusable as the displayed information is unreadable (basically, the email subject is not visible)
    Picture 2.png

    Now, I am trying out the option to display the portlets underneath the main area, but so far I don’t really like it.

Offline Gmail

I suppose you’ve already read about this, as it was covered by all the major and not so major blogs, with everybody praising it. But I guess somebody has to be skeptical about it, so why shouldn’t that be me?

In my opinion, the offline GMail in the current form is useless.

But let me tell you my reasons. The offline support is auto-configured, meaning that you don’t have any control over what is made available offline and how. Frankly, I would rather access specific emails while offline than read what some statistical algorithm tells me to read.

I think there is an easy solution for it though: GMail should introduce a special Offline label that you can use to specify which emails you want for offline access. Then it can use this human-provided metadata to take offline those emails plus the last X days of the Inbox. That would make Offline GMail really useful!
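
To make the idea concrete, here is a tiny sketch of the selection rule I have in mind. Everything in it is hypothetical: the select_for_offline function, the dictionary layout of an email and the default of 7 days are invented for illustration and have nothing to do with how GMail actually works.

```python
from datetime import datetime, timedelta


def select_for_offline(emails, now, inbox_days=7):
    """Pick emails labeled 'Offline' plus Inbox emails from the last `inbox_days` days.

    Hypothetical rule: the email structure and the default window are assumptions.
    """
    cutoff = now - timedelta(days=inbox_days)
    selected = []
    for email in emails:
        labels = email["labels"]
        if "Offline" in labels:
            # explicitly marked by the user, regardless of age
            selected.append(email)
        elif "Inbox" in labels and email["received"] >= cutoff:
            # recent Inbox mail (the "last X days" part)
            selected.append(email)
    return selected


now = datetime(2009, 2, 1)
emails = [
    {"subject": "Flight tickets", "labels": {"Offline"}, "received": datetime(2008, 11, 3)},
    {"subject": "Lunch?", "labels": {"Inbox"}, "received": datetime(2009, 1, 30)},
    {"subject": "Old newsletter", "labels": {"Inbox"}, "received": datetime(2008, 12, 1)},
]
print([e["subject"] for e in select_for_offline(emails, now)])
# ['Flight tickets', 'Lunch?']
```

The old but explicitly labeled email is kept, the recent Inbox email is kept, and the old Inbox email is left out.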

Google Analytics Losing Data

While analyzing the monthly data for one of my Google Analytics accounts, I noticed a 10-day gap in the collected data.

It looks like Google Analytics completely missed collecting data for that period, and when I tried to get some support for this major problem, the answer I got back was along the lines of: “Don’t complain! It is a free product!”. I’ll let you judge for yourself how I feel about it.

Filed under personalog, technolog, Tools

Update on Commenting Services Face-off

After my post on the four commenting services, “Commenting Services Face to Face: Disqus vs IntenseDebate vs JS-Kit vs SezWho”, I started to receive a lot of feedback. I’d like to thank all the people who took the time to go through the article and send me their feedback.

Based on this feedback, I feel that an update is needed as the information might change the final evaluation. As a quick reminder my final ranking was something like:

Blog

  1. Winner: IntenseDebate
  2. Runner-up: Disqus

Site

  1. Winner: Disqus
  2. Runner-up: IntenseDebate

Now, if you check the different matrices in my initial post, you’ll notice that there is a question that isn’t really answered: why doesn’t JS-Kit show up in my final ranking? I have to confess that, compared with the rest of the article, which tried to be as objective as possible, the answer to this question was a bit more subjective, and it was heavily influenced by the fact that JS-Kit was offering the FREE widget for only 25k pageviews. But for some, JS-Kit’s freemium model may be more comfortable, as it may be seen as a guarantee that the initial investment will not go away any time soon.

I should also mention that there has been a major upgrade to the freemium model, and the free version is now available for up to 5 million pageviews (see more details about JS-Kit pricing).

IntenseDebate updates

Spam filtering

IntenseDebate offers integration with Akismet for spam filtering. The option is available on the account dashboard.

Data Access

There are 2 updates related to the data access in IntenseDebate:

  1. IntenseDebate offers through the account dashboard an export to XML function
  2. IntenseDebate is currently working on an API. It hasn’t been launched yet, so I haven’t had the chance to check it. I am in contact with their support team and hope to have a more detailed update on this topic.

IntenseDebate Matrix

Comment Threading: Y
Anonymous Comments: Y
Bookmarkability: Y
Comment ranking: Y
Comment ranking functionality: Y
Rich format comments: Y
Spam filtering: Custom + Akismet
Comment Moderation: Y (web + email)
Search Engine Friendliness: Y (for platforms supported by the plugin) / N for custom web sites
HTML/CSS Customization: Y
Widget (JS API): Y
Programming API: private (work in progress)
Data export: Export as XML
Costs: Free
Documentation/Support: GetSatisfaction

With these updates, it looks like the only piece missing from the IntenseDebate offering is an off-the-shelf SEO friendliness feature. Moreover, this missing feature applies only to custom web sites that are not able to use the IntenseDebate integration plugins.

JS-Kit updates

Model/Costs

While the JS-Kit FREE version was available for 25k pageviews at the time of my initial comparison, JS-Kit has pushed a major update and JS-Kit FREE now applies for up to 5 million pageviews. This is, in my opinion, a major change in their offering, one that makes me feel that JS-Kit has earned its place in my ranking.

Data access

In my initial matrices, JS-Kit is missing both a Programming API and Data Export functionality. Well, I have some good news about these: the JS-Kit folks are working on a public API, and I hope to get access to it very soon and add more details. Also, JS-Kit offers access to all the comments through RSS. While this is not optimal, your comments are not completely locked in, so JS-Kit cannot score 0 anymore.

Search Engine Friendliness

In the previous post, I complained about the fact that I wasn’t able to search the JS-Kit forum for more details. It looks like I was a bit wrong: the forum search functionality is in fact available, but a bit hidden under the Control link:

Picture 4.png

The guys from JS-Kit have promised to improve the widget UI so that the search functionality becomes more visible and easier to access.

Also, having in mind the SEO solution created by JS-Kit (and then re-used by other commenting services), search engines should be able to correctly index the comment threads.

JS-Kit Matrix

Comment Threading: Y
Anonymous Comments: Y
Bookmarkability: N
Comment ranking: Y
Comment ranking functionality: Y
Rich format comments: Y
Spam filtering: Custom + Akismet
Comment Moderation: Y (web + email)
Search Engine Friendliness: Y (for platforms supported by the plugin) / N for custom web sites
HTML/CSS Customization: Y
Widget (JS API): Y
Programming API: private (work in progress)
Data export: RSS
Costs: Freemium model (see JS-Kit pricing for more details)
Documentation/Support: Q&A Forum, on site documentation, PDF

Conclusion

Based on the new information, I’d say that the ranking has changed a bit, so without further ado, here is the new one:

Blog

  1. Winner: JS-Kit
  2. Runner-up: IntenseDebate

Site

  1. Winner: Disqus
  2. Runner-up: IntenseDebate and JS-Kit

More JS-Kit Features

I have received a ton of information on JS-Kit features, so I’m sharing it here with you. If the folks from Disqus and IntenseDebate are willing to share their complete feature lists, I’d be glad to publish them.

User Related Features: JS-Kit Additions

  • Ability to get replies to comments via email. Ability to respond with email. Response automatically inserted into the comment thread.
  • Private messaging between commentors.
  • Ability to embed and play YouTube videos within comments. (configurable through the JS-Kit admin interface)
  • Ability to upload photos up to 10MB in size with automatic thumbnail generation. (configurable through the JS-Kit admin interface)
  • Facebook Connect and OpenID support

Owner Related Features: JS-Kit Additions

  • Obscenity filters
  • Support for multiple administrators and sub-section moderation (eg. you can only moderate this \subdomain)
  • Community moderation. The blogger sets a “Mark as offensive” threshold N; once a comment reaches it, the comment is removed and placed in pre-moderation
  • Selective moderation (eg. Once the blogger approves a commentor N times, that commentor is no longer moderated)
  • JS-Kit also provides Ratings and Polls for bloggers using the same customization, support, and administration system
  • (integration with blogging platforms) Option to highlight Blogger comments with a different background color.

Data Access: JS-Kit Additions

  • (integration with blogging platforms) JS-Kit innovated “Sync” which automatically updates the base platform with all new comments.

If you are interested in cloud computing you can start visiting DailyCloud: the daily coverage of the cloud computing market. The DailyCloud, which is still in early beta, synthesizes the content, links and social stream on cloud computing and its adjacent topics: SaaS, PaaS, IaaS, HaaS, grid computing, virtualization, data centers.

Filed under technolog, Tools

Advanced JVM Tuning for Low Pause

The standard Java Virtual Machine (JVM) is configured to optimize for throughput. But some systems care more about low pause times and reduced latency, and garbage collection (GC) can be one source of pauses (you can read an interesting article about what latency means to your business).

I have found a post on the GigaSpaces forum providing some possible JVM configurations to optimize for latency:

-Xms2g -Xmx2g -Xmn150m 
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode 
-XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=10 
-XX:CMSIncrementalDutyCycle=50 -XX:ParallelGCThreads=8 
-XX:+UseParNewGC -XX:MaxGCPauseMillis=2000 
-XX:GCTimeRatio=10 -XX:+DisableExplicitGC

Please note that -XX:+UseConcMarkSweepGC has the heaviest impact on performance: a decrease of 40%.

The following set of parameters shows 20% better performance than with -XX:+UseConcMarkSweepGC, while the pause time is still below 100 ms in an embedded test with a 10KB payload and 100 threads:

-Xms2g -Xmx2g -Xmn150m 
-XX:GCTimeRatio=2 -XX:ParallelGCThreads=8 
-XX:+UseParNewGC -XX:MaxGCPauseMillis=2000 
-XX:+DisableExplicitGC

While I’m pretty sure that most applications do not need such an advanced VM configuration, it is interesting to see what strategies are employed when low latency is needed.

Option Details
-XX:+UseConcMarkSweepGC Sets the garbage collector policy to the concurrent (low pause time) garbage collector (also known as CMS)
-XX:+CMSIncrementalMode Enables the incremental mode. (works only with -XX:+UseConcMarkSweepGC)
-XX:+CMSIncrementalPacing Enables automatic adjustment of the incremental mode duty cycle based on statistics collected while the JVM is running
-XX:CMSIncrementalDutyCycleMin The percentage (0-100) which is the lower bound on the duty cycle when CMSIncrementalPacing is enabled
-XX:CMSIncrementalDutyCycle The percentage (0-100) of time between minor collections that the concurrent collector is allowed to run. If CMSIncrementalPacing is enabled, then this is just the initial value.
-XX:ParallelGCThreads Sets the number of garbage collector threads
-XX:+UseParNewGC Enables multi-threaded young generation collection.
-XX:MaxGCPauseMillis A hint to the throughput collector that it’s desirable that the maximum pause time is lower than the given value. (n.b. it looks like this value can also be used with the CMS garbage collector)
-XX:GCTimeRatio A hint to the virtual machine that it’s desirable that not more than 1 / (1 + GCTimeRatio) of the application execution time be spent in the collector
-XX:+DisableExplicitGC Disables explicit garbage collection calls (System.gc())
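
To get a feel for what the GCTimeRatio formula means for the two configurations above, here is a quick back-of-the-envelope calculation. The helper name max_gc_time_fraction is my own, and the numbers are only the target the JVM is hinted to aim for, not measured results.

```python
def max_gc_time_fraction(gc_time_ratio):
    """Upper bound on the share of execution time spent in GC: 1 / (1 + GCTimeRatio)."""
    return 1.0 / (1 + gc_time_ratio)


# The two values used in the configurations above.
for ratio in (10, 2):
    share = 100 * max_gc_time_fraction(ratio)
    print("GCTimeRatio=%d -> at most %.1f%% of the time in GC" % (ratio, share))
# GCTimeRatio=10 -> at most 9.1% of the time in GC
# GCTimeRatio=2 -> at most 33.3% of the time in GC
```

In other words, the second configuration tolerates a much larger share of time spent collecting (roughly a third) than the first one (roughly a tenth).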

There is no need to learn all these flags by heart as you can find them covered in various documents:

If you still need help, you can try asking on the General Performance Forum.

Filed under technolog

Cloud Computing Coverage End of 2008

Here is a short list of links about cloud computing that I found interesting during December (the list is by no means complete and most probably more links will be added later). The end of the year is usually the time to check how last year’s predictions worked out and to make new predictions for the year to come.

So, let’s start with a couple of predictions:

Other interesting reads for me were “Microsoft miss the ship(ping container)” and definitely this post about Capgemini trying to convince companies to move to the Amazon cloud.

For now, I’ll finish with the AWS migration blueprint article.


Filed under links