Category Archives: technolog

Apple: Can I run Safari4 and Safari3 side by side?

I know that Safari 4 was just released, but I couldn’t stop wondering if there is any way I can try it out without having to give up my stable Safari 3. So, is there a way to run Safari4 and Safari3 side by side or is there a portable Safari?

features-bookmarks-historyview-20090217.jpg

While I don’t know the final answer, I can tell you one thing: if you are searching for running different IE versions side by side you’ll find tons of answers. If you’re looking for running different Firefox versions side by side you’ll find another pile of answers, plus various portable bundles, etc. But for Safari: almost nothing.

With the help of the guys from StackOverflow, I have found the only resource suggesting that it might be possible to run Safari 4 and Safari 3 side by side: Multi-Safari. Unfortunately, Safari 4 is not available there so far, but I really hope they will make it available soon.

Update: Here is an extract from another tutorial for setting this up:

  1. Download and install the Safari 4 beta. You’ll need to reboot after the install because of the system framework changes.
  2. After rebooting, rename the new Safari.app in your Applications folder to Safari4.app.
  3. Download the Webkit build from 11/22/2008.
    Safari 3.2.1 was released on 11/24/2008 so I’m guessing this build is very close to that version.
  4. Mount the Webkit disk image and copy the Webkit.app application to your desktop.
  5. Rename Webkit.app to Safari3.app and move it to your Applications folder. In your Applications folder you should now have Safari3.app and Safari4.app.
    The Safari 4 installer backs up the previous version as an invisible file located at /Library/Application Support/Apple/.Safari4PreviewArchive.tar.gz. We need the original Safari.app bundle as the old version of Webkit we downloaded will not work with the new Safari 4 bundle.
  6. Launch the terminal and change directories:
    cd "/Library/Application Support/Apple/"
  7. Expand the backed up archive: tar -zxvf .Safari4PreviewArchive.tar.gz. This creates a few new folders in the current directory: Applications, System, and usr.
  8. You can now copy the old Safari.app, which is available in the newly created Applications folder, to the top-level /Applications folder. You can either use the terminal (running the command cp -R "/Library/Application Support/Apple/Applications/Safari.app" /Applications/) or navigate to Library » Application Support » Apple » Applications in the Finder and copy the Safari application bundle that way.
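
For those who would rather script steps 6 to 8 than type them, here is a minimal Python sketch of the same two operations (unpack the backup archive, then copy the old bundle out of it). The function name and its parameters are my own invention for illustration and are not part of the original tutorial; I have not run it against a real Safari 4 install.

```python
import shutil
import tarfile
from pathlib import Path


def restore_previous_safari(archive_path, extract_dir, applications_dir):
    """Unpack the backup archive and copy the old Safari.app bundle out of it.

    Hypothetical helper: the name and parameters are assumptions, not
    something provided by Apple or by the original tutorial.
    """
    archive_path = Path(archive_path)
    extract_dir = Path(extract_dir)
    applications_dir = Path(applications_dir)
    extract_dir.mkdir(parents=True, exist_ok=True)

    # Step 7: same effect as running `tar -zxvf <archive>` inside extract_dir.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(extract_dir)

    # Step 8: same effect as `cp -R <extract_dir>/Applications/Safari.app <applications_dir>/`.
    # copytree refuses to overwrite, which is fine here because step 2
    # already renamed the new bundle to Safari4.app.
    source = extract_dir / "Applications" / "Safari.app"
    target = applications_dir / "Safari.app"
    shutil.copytree(source, target)
    return target
```

With the paths from the tutorial, the call would be restore_previous_safari("/Library/Application Support/Apple/.Safari4PreviewArchive.tar.gz", "/Library/Application Support/Apple", "/Applications"), most likely run with administrator rights.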

Unfortunately, it doesn’t sound so simple and I have not tried it myself (moreover on the original post there are commenters saying it doesn’t work).

So, if you try it out do let me know if you can run Safari4 and Safari3 side by side! Also, if you find a portable Safari I’d really appreciate any leads!


2 Comments

Filed under technolog, Tools

Python and Text Processing

During the Christmas vacation, I’ve played some more with Python as I really like its simplicity and consistency (as a side note I really wish other languages would have the same level of consistency).

I’ve put together a short list of Python resources for text processing. While I haven’t used all of them, in most cases they seemed to be exactly what I’ve been looking for.

r30741m.jpg

Natural Language Processing

Tokenization

Tokenization is the process of demarcating and possibly classifying sections of a string of input characters. The resulting tokens are then passed on to some other form of processing. The process can be considered a sub-task of parsing input.

While I have found the following simple tokenizer, I’ve also written my own, which doesn’t use regexps:

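def tokenize(sentence):
  '''Tokenize the given `sentence`.'''
  words = []
  j = 0  # start index of the token currently being scanned
  end = len(sentence) - 1
  # range() works on both Python 2 and Python 3 (xrange is Python 2 only)
  for i in range(len(sentence)):
    if not sentence[i].isalnum():
      if sentence[i] in '.,' and 0 < i < end:
        # keep '.' and ',' when they sit between two digits (inside a number)
        if sentence[i - 1].isdigit() and sentence[i + 1].isdigit():
          continue
      # any other non-alphanumeric character ends the current token
      words.append(sentence[j:i])
      j = i + 1
  if j <= end:
    words.append(sentence[j:])
  # drop the empty strings produced by consecutive separators
  return [w for w in words if w]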

The only thing worth mentioning about the above tokenizer is that it does not break formatted numbers such as 1,234.56 (but it will break dates separated by / or -).
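
To make that behaviour concrete, here is a small self-contained sketch. It restates the tokenizer (using range so that it also runs on Python 3) and feeds it a sample sentence of my own containing both a formatted number and a slash-separated date.

```python
def tokenize(sentence):
    """Split `sentence` on non-alphanumeric characters, keeping formatted numbers whole."""
    words = []
    j = 0  # start index of the token currently being scanned
    end = len(sentence) - 1
    for i in range(len(sentence)):
        if not sentence[i].isalnum():
            if sentence[i] in '.,' and 0 < i < end:
                # '.' or ',' between two digits belongs to a number
                if sentence[i - 1].isdigit() and sentence[i + 1].isdigit():
                    continue
            words.append(sentence[j:i])
            j = i + 1
    if j <= end:
        words.append(sentence[j:])
    return [w for w in words if w]


# The sample sentence is made up for this illustration.
print(tokenize("It costs 1,234.56 on 2009/01/05."))
# -> ['It', 'costs', '1,234.56', 'on', '2009', '01', '05']
```

The number survives as a single token, while the date is split into three.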

Stemming

The original Porter Stemmer is also available in Python (it looks like a simple translation of the C version that doesn’t use any Python idioms).

2 Comments

Filed under technolog

Google Desktop Breaking Privacy and More ‘Good’ Things about Google

Google and Privacy

This last week, I’ve decided to install Google Desktop for Mac, as I wasn’t very satisfied with how my PDFs are indexed by Spotlight and/or Yep.

While so far I’m pretty happy with the way Google Desktop has indexed my PDFs, I have noticed one thing that makes me feel completely insecure about Google products!

GoogleDesktop.png

Even though I have configured Google Desktop NOT to send any statistics (see above screenshot), the firewall caught Google Desktop repeatedly attempting to connect and submit private information without my consent!

I won’t rant about what this means, but this is a major security and privacy breach in Google Desktop.

Gmail: Multi-Inboxes

This is a brilliant idea that offers a way to have multiple ‘portlets’ on your GMail account, each configured to display mails according to specific rules.

While playing with it, I’ve noticed a couple of things:

  • The Labs feature is useful for displaying emails that are either Archived or configured to skip the Inbox. If you configure the portlets to match emails in the Inbox then things may get a bit confusing (duplication, actions, etc).
  • The sidebar display option seemed to be the most appealing configuration. Remember that screens are getting wider, not taller.
  • In case you categorize your emails using multiple labels or you are watching group emails (so you have real email threads), the sidebar display configuration is pretty unusable as the displayed information is unreadable (basically, the email subject is not visible)
    Picture 2.png

    Now, I am trying out the option to display the portlets underneath the main area, but so far I don’t really like it.

Offline Gmail

I suppose you’ve already read about this, as it was covered by all the major and not so major blogs, with everybody praising it. But I guess somebody must be skeptical about it, so why shouldn’t that be me?

In my opinion, the offline GMail in the current form is useless.

But let me tell you my reasons. The offline support is auto-configured, meaning that you don’t have any control over what is made available offline or how. I frankly prefer to access specific emails while offline rather than read what some statistical algorithm is telling me to read.

I think there is an easy solution for it though: GMail should introduce a special Offline label that you can use to specify which emails you want for offline access. Then it can use this human-provided metadata to take offline those emails plus the last X days of the Inbox. That would make Offline GMail really useful!
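
The selection rule I have in mind is simple enough to sketch in a few lines of Python. Everything here is hypothetical: the message structure, the function name and the 7-day default are assumptions made for the illustration and have nothing to do with how GMail actually stores or syncs mail.

```python
from datetime import datetime, timedelta


def select_for_offline(messages, now, inbox_days=7, offline_label="Offline"):
    """Pick messages to sync: anything labeled Offline, plus recent Inbox mail.

    `messages` is a hypothetical list of dicts with "labels" and "received" keys.
    """
    cutoff = now - timedelta(days=inbox_days)
    selected = []
    for message in messages:
        labels = message["labels"]
        explicitly_wanted = offline_label in labels
        recent_inbox = "Inbox" in labels and message["received"] >= cutoff
        if explicitly_wanted or recent_inbox:
            selected.append(message)
    return selected


# Made-up sample data.
now = datetime(2009, 2, 1)
messages = [
    {"subject": "Contract draft", "labels": {"Offline", "Work"}, "received": datetime(2008, 11, 3)},
    {"subject": "Lunch?", "labels": {"Inbox"}, "received": datetime(2009, 1, 30)},
    {"subject": "Old newsletter", "labels": {"Inbox"}, "received": datetime(2008, 12, 1)},
]
print([m["subject"] for m in select_for_offline(messages, now)])
# -> ['Contract draft', 'Lunch?']
```

The old but explicitly labeled email is kept, the recent Inbox email is kept, and the stale Inbox email is left out.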

Google Analytics Losing Data

While analyzing the monthly data for one of my Google Analytics accounts, I’ve noticed a 10-day gap in the collected data.

It looks like Google Analytics completely missed collecting data for that period, and when I tried to get some support for this major problem, the answer I got back was along the lines of: “Don’t complain! It is a free product!”. I’ll let you judge for yourself how I feel about it.

Leave a comment

Filed under personalog, technolog, Tools

Update on Commenting Services Face-off

After my post on the 4 commenting services, Commenting Services Face to Face: Disqus vs IntenseDebate vs JS-Kit vs SezWho, I have started to receive a lot of feedback. I’d like to thank all the people who took the time to go through the article and send me their feedback.

Based on this feedback, I feel that an update is needed as the information might change the final evaluation. As a quick reminder my final ranking was something like:

Picture 5.jpg

Blog

  1. Winner: IntenseDebate
  2. Runner-up: Disqus

Site

  1. Winner: Disqus
  2. Runner-up: IntenseDebate

Now, if you check the different matrices in my initial post, you’ll notice that there is a question that isn’t really answered: why doesn’t JS-Kit show up in my final ranking? I have to confess that, compared with the rest of the article, which tried to be as objective as possible, the answer to this question was a bit more subjective and was heavily influenced by the fact that JS-Kit was offering the FREE widget for only 25k pageviews. But for some, JS-Kit’s freemium model may be more comfortable, as it may be seen as a guarantee that the initial investment will not go away any time soon.

Picture 1.png

I should also mention that there has been a major upgrade to the freemium model and now the free version is available for up to 5 million pageviews (see more details about JS-Kit pricing).

IntenseDebate updates

Spam filtering

IntenseDebate offers integration with Akismet for spam filtering. The option is available on the account dashboard.
IntenseDebate Akismet support

Data Access

There are 2 updates related to the data access in IntenseDebate:

  1. IntenseDebate offers through the account dashboard an export to XML function
  2. IntenseDebate is currently working on an API. It hasn’t launched yet and I haven’t had the chance to check it. I am in contact with their support team and hope to have a more detailed update on this topic.

IntenseDebate Matrix

Comment Threading: Y
Anonymous Comments: Y
Bookmarkability: Y
Comment ranking: Y
Comment ranking functionality: Y
Rich format comments: Y
Spam filtering: Custom + Akismet
Comment Moderation: Y (web + email)
Search Engine Friendliness: Y (for platforms supported by the plugin) / N (for custom web sites)
HTML/CSS Customization: Y
Widget (JS API): Y
Programming API: private (work in progress)
Data export: Export as XML
Costs: Free
Documentation/Support: GetSatisfaction

With these updates, it looks like the only piece missing from the IntenseDebate offering is an off-the-shelf SEO friendliness feature. Moreover, this gap applies only to custom web sites that are not able to use the IntenseDebate integration plugins.

JS-Kit updates

Model/Costs

While at the time of my initial comparison the JS-Kit FREE version was available for 25k pageviews, JS-Kit has pushed a major update and now JS-Kit FREE applies for up to 5 million pageviews. This is, in my opinion, a major change in their offering, one that makes me feel that JS-Kit has earned its place in my ranking.

Picture 2.png

Data access

In my initial matrices, JS-Kit is missing both a Programming API and Data Export functionality. Well, I have some good news about these: the JS-Kit folks are working on a public API and I hope to get access to it very soon and add more details. Also, JS-Kit offers access to all the comments through RSS. While this is not optimal, your comments are not completely locked in, and so JS-Kit cannot score 0 anymore.

Search Engine Friendliness

In the previous post, I’ve been complaining about the fact that I wasn’t able to search the JS-Kit forum for more details. It looks like I was a bit wrong and the forum search functionality is in fact available, but a bit hidden under the Control link:

Picture 4.png

The guys from JS-Kit have promised to improve the widget UI so that the search functionality becomes more visible and easier to access.

Also, having in mind the SEO solution created by JS-Kit (and then re-used by other commenting services), search engines should be able to correctly index the comment threads.

JS-Kit Matrix

Comment Threading: Y
Anonymous Comments: Y
Bookmarkability: N
Comment ranking: Y
Comment ranking functionality: Y
Rich format comments: Y
Spam filtering: Custom + Akismet
Comment Moderation: Y (web + email)
Search Engine Friendliness: Y (for platforms supported by the plugin) / N (for custom web sites)
HTML/CSS Customization: Y
Widget (JS API): Y
Programming API: private (work in progress)
Data export: RSS
Costs: Freemium model (see JS-Kit pricing for more details)
Documentation/Support: Q&A Forum, on site documentation, PDF

Conclusion

Based on the new information, I’d say that the top has changed a bit and without further ado, here is the new ranking:

Blog

  1. Winner: JS-Kit
  2. Runner-up: IntenseDebate

Site

  1. Winner: Disqus
  2. Runner-up: IntenseDebate and JS-Kit

More JS-Kit Features

I have received a ton of information on JS-Kit features, so I’m sharing it here with you. If the folks from Disqus and IntenseDebate are willing to share their complete feature lists, I’d be glad to publish them.

User Related Features: JS-Kit Additions

  • Ability to get replies to comments via email. Ability to respond with email. Response automatically inserted into the comment thread.
  • Private messaging between commentors.
  • Ability to embed and play YouTube videos within comments. (configurable through the JS-Kit admin interface)
  • Ability to upload photos up to 10MB in size with automatic thumbnail generation. (configurable through the JS-Kit admin interface)
  • Facebook Connect and OpenID support

Owner Related Features: JS-Kit Additions

  • Obscenity filters
  • Support for multiple administrators and sub-section moderation (eg. you can only moderate this \subdomain)
  • Community moderation. “Mark as offensive” is set by blogger to N, where N = remove comment and place in pre-moderation
  • Selective moderation (eg. Once the blogger approves a commentor N times, that commentor is no longer moderated)
  • JS-Kit also provides Ratings and Polls for bloggers using the same customization, support, and administration system
  • (integration with blogging platforms) Option to highlight Blogger comments with a different background color.

Data Access: JS-Kit Additions

  • (integration with blogging platforms) JS-Kit innovated “Sync” which automatically updates the base platform with all new comments.

If you are interested in cloud computing you can start visiting DailyCloud: the daily coverage of the cloud computing market. The DailyCloud, which is still in early beta, synthesizes the content, links and social stream on cloud computing and its adjacent topics: SaaS, PaaS, IaaS, HaaS, grid computing, virtualization, data centers.

8 Comments

Filed under technolog, Tools

Advanced JVM Tuning for Low Pause

The standard Java Virtual Machine (JVM) is configured to optimize for throughput. But some systems care more about low pause times and reduced latency, and garbage collection (GC) can be one source of pauses (you can read an interesting article about what latency means to your business).

I have found a post on the GigaSpaces forum providing some possible JVM configurations to optimize for latency:

-Xms2g -Xmx2g -Xmn150m 
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode 
-XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=10 
-XX:CMSIncrementalDutyCycle=50 -XX:ParallelGCThreads=8 
-XX:+UseParNewGC -XX:MaxGCPauseMillis=2000 
-XX:GCTimeRatio=10 -XX:+DisableExplicitGC

Please note that -XX:+UseConcMarkSweepGC has the heaviest impact on performance – decrease of 40%.

The following set of parameters shows 20% better performance than with -XX:+UseConcMarkSweepGC, while the pause time is still below 100msec in an embedded test with a 10KB payload and 100 threads:

-Xms2g -Xmx2g -Xmn150m 
-XX:GCTimeRatio=2 -XX:ParallelGCThreads=8 
-XX:+UseParNewGC -XX:MaxGCPauseMillis=2000 
-XX:+DisableExplicitGC

While I’m pretty sure that most applications do not need such an advanced VM configuration, it is interesting to see what strategies are employed when low latency is needed.

Option | Details
-XX:+UseConcMarkSweepGC | Sets the garbage collector policy to the concurrent (low pause time) garbage collector (also known as CMS)
-XX:+CMSIncrementalMode | Enables the incremental mode (works only with -XX:+UseConcMarkSweepGC)
-XX:+CMSIncrementalPacing | Enables automatic adjustment of the incremental mode duty cycle based on statistics collected while the JVM is running
-XX:CMSIncrementalDutyCycleMin | The percentage (0-100) which is the lower bound on the duty cycle when CMSIncrementalPacing is enabled
-XX:CMSIncrementalDutyCycle | The percentage (0-100) of time between minor collections that the concurrent collector is allowed to run. If CMSIncrementalPacing is enabled, then this is just the initial value.
-XX:ParallelGCThreads | Sets the number of garbage collector threads
-XX:+UseParNewGC | Enables multi-threaded young generation collection
-XX:MaxGCPauseMillis | A hint to the throughput collector that it’s desirable that the maximum pause time is lower than the given value (n.b. it looks like this value can also be used with the CMS garbage collector)
-XX:GCTimeRatio | A hint to the virtual machine that it’s desirable that not more than 1 / (1 + GCTimeRatio) of the application execution time be spent in the collector
-XX:+DisableExplicitGC | Disables explicit garbage collection calls (System.gc())
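
The GCTimeRatio formula is easier to read with numbers plugged in. The small Python sketch below (the function name is mine, purely for illustration) evaluates 1 / (1 + GCTimeRatio) for the two values used in the configurations above.

```python
def max_gc_fraction(gc_time_ratio):
    """Upper bound on the share of execution time spent in GC for a given -XX:GCTimeRatio."""
    return 1.0 / (1.0 + gc_time_ratio)


for ratio in (10, 2):
    share = 100 * max_gc_fraction(ratio)
    print("-XX:GCTimeRatio=%d -> at most %.1f%% of the time in the collector" % (ratio, share))
# -XX:GCTimeRatio=10 -> at most 9.1% of the time in the collector
# -XX:GCTimeRatio=2 -> at most 33.3% of the time in the collector
```

So the second configuration, with GCTimeRatio=2, tells the JVM it may spend up to a third of the time collecting, while the first one asks it to stay at or below roughly 9%. Keep in mind this is only a hint to the JVM, not a guarantee.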

There is no need to learn all these flags by heart as you can find them covered in various documents:

If you still need help you can try asking for help on the General Performance Forum.

Leave a comment

Filed under technolog

Commenting Services Face to Face: Disqus vs IntenseDebate vs JS-Kit vs SezWho

There is a major update to this post: Update on Commenting Services Face-off. Make sure you read it before jumping to any conclusions!
I have received a lot of feedback from JS-Kit and SezWho which led to changes in the evaluations. I am working on an update to the post. Meanwhile, you can read Jitendra’s comment, which provides more insight about the SezWho offering.

This is a long post, so I’ll jump directly to the main topic. I’ve run a face to face comparison for 4 existing commenting systems: Disqus, IntenseDebate, JS-Kit and SezWho. The rest of the post presents the criteria I’ve used, the winners and references to specific features.

Please feel free to comment and correct me if I got anything wrong!

Criteria

  • User related Features
    • Comment threading
    • Anonymous posts
    • Bookmarkability
    • Comment ranking and additional features (sort, most, filter, etc.)
    • Rich format comments
  • Owner related features:
    • Spam filtering
    • Comment moderation
    • Search Engine Friendliness
    • HTML/CSS customization
  • Data access
    • Widget (Javascript API)
    • Programming API
    • Data export

Evaluated products

Winners

After completing the evaluation of the mentioned criteria, I have decided that there should be 2 categories: Blogs and Custom sites. Even if there are no immediate visible differences, the existence of a programming API offers a lot more extensibility and freedom to site owners, while this feature might not be as important as features like additional comment ranking features (sorting, filtering, recommendations, etc.) for blog owners.

Drum roll… The winners are:

There is a major update to this post: Update on Commenting Services Face-off and you should make sure that you read it before further considering this top.

Blog

  1. Winner: IntenseDebate
  2. Runner-up: Disqus

Site

  1. Winner: Disqus
  2. Runner-up: IntenseDebate

User Features

Feature | Disqus | IntenseDebate | JS-Kit | SezWho
Comment Threading | Y | Y | Y | Y
Anonymous Comments | Y | Y | Y | Y [7]
Bookmarkability | N | Y | N | Y
Comment ranking | Y | Y | Y | Y
Comment ranking functionality | – [12] | Y | Y | Y
Rich format comments | Y | Y | Y | Y

Owner Features

Feature | Disqus | IntenseDebate | JS-Kit | SezWho
Spam filtering | Custom + Akismet [1] | Custom [2] | Akismet | Akismet [3]
Comment Moderation | Y (web + email) | Y (web + email) | Y (web + email) | Y [4]
Search Engine Friendliness | Y [8] | N [9] | Y [10] | N [11]
HTML/CSS Customization | Y | Y | Y | Y

Data access

Feature | Disqus | IntenseDebate | JS-Kit | SezWho
Widget (JS API) | Y | Y | Y | Y
Programming API | Y | N [5] | N | N [6]
Data export | API | None | None | None

Other Criteria

Criterion | Disqus | IntenseDebate | JS-Kit | SezWho
Costs | Free | Free | Free for 25k pageviews + Commercial | Free [13]
Documentation/Support | API docs, Low Traffic Forum | GetSatisfaction | Q&A Forum [14] | FAQ Page [15]

Comment ranking functionality

Basically, once comments can be rated themselves, there is a lot of new functionality that can be offered by these services. I am thinking of: filtering the comment thread, sorting, most commented content, etc.

[12] I couldn’t find anything about Disqus support for additional functionality.

Spam Filtering

  1. [1] According to the following links, Disqus is employing a combination of custom filtering and Akismet integration
  2. [2] I couldn’t find any place in the IntenseDebate documentation detailing what solution is used, so I’ve concluded that some custom filtering is employed. Considering that IntenseDebate is now part of the WordPress universe, it might be possible to also integrate with Akismet.
  3. JS-Kit documentation is clear about this point: LINK
  4. [3] While some sources are mentioning the integration with Akismet for spam filtering, I couldn’t find this info in the SezWho documentation

Comment Moderation

All three (Disqus, IntenseDebate and JS-Kit) support advanced moderation features. But my advice would be to actually test them if comment moderation is important for your site or blog.

[4] Unfortunately, I couldn’t find any info about SezWho’s support for comment moderation.

Search Engine Friendliness

This is probably the most debatable criterion used for evaluating the 4 services and unfortunately to cover it I will need a whole new post (which will come later on).

  • [8] Disqus is offering a hosted page for each comment thread, so search engines can index the comments.
  • [9] I’ve read about improvements implemented by IntenseDebate, but unfortunately these are not useful for the sites that do not use the IntenseDebate custom plugin
  • [10] JS-Kit employs the same solution as Disqus.
  • [11] No information available.

The Disqus and JS-Kit approach to this problem is quite good (even if a bit complex), as with the help of some subdomain mapping you can instruct the search engines to see the comment thread content as the site’s content.

Programming API

  • [5] I am still investigating the possibility to access IntenseDebate data through a programming API (see thread)
  • [6] Even though there are a couple of sources mentioning a SezWho API (see Mashable and CenterNetworks), I couldn’t find any reference to it in SezWho’s documentation

Other notes

  • [13] SezWho:

    The basic SezWho rating, reputation and profile services are provided for free on the currently supported platforms. SezWho will be offering upgrades to premium products and services in the future, but basic SezWho integration for standard social media platforms will always be available as a free service.

  • [14] JS-Kit Q&A Forum is unfortunately pretty unusable as there is no way to perform any searches. It has over 450 pages of comments, so even if I am pretty sure there is a lot of information in there, it is a pity that you cannot get to it. Search Engine Friendliness is a major and extremely important feature that you should consider while integrating a 3rd party commenting system.
  • [15] SezWho FAQ Page contains minimal information and unfortunately I couldn’t find other sources.

More information

Here is a set of other features compared on RWW:

2531532507_9464b2f583.jpg

Disclaimer

SezWho documentation is pretty scarce, so I haven’t been able to find detailed information on the evaluated set of features.


12 Comments

Filed under technolog, Tools

BeautifulSoup or SGMLParser Bug

If you are reading this, you already know what BeautifulSoup is and how useful it is while working with XML/HTML in Python (in case you are not familiar with it, I’d encourage you to read its documentation). So I’ll just skip to the main reason of this post: a bug in parsing the <script> tags in HTML documents.

10.1.jpg
According to the documentation, BeautifulSoup knows how to handle the body of a <script> tag, meaning that it knows to treat its content as a pure string and not perform any additional parsing on it. Unfortunately, I’ve discovered a corner case where it behaves incorrectly.

Here is the sample HTML that will reveal the bug:

<html>
<head></head>
<body>
  <script type='text/javascript'>
    document.write('</script>');
    document.write('<div></div>');
  </script>
</body>
</html>

The problem is that the string ‘</script>’ tricks the parser into believing that the end of the <script> tag has been reached, so instead of getting a single Tag from the <script> HTML tag it basically results in 2 elements: a Tag and a NavigableString that contains the rest of the <script> tag (i.e. what comes after the ‘</script>’ string: '); document.write('<div></div>');).

This basically means that for any HTML that contains a similar fragment, rewriting it will lead to broken <script>s. Unfortunately, I haven’t been able to figure out a solution. My impression is that this parsing happens at a very low level and this makes me think that the bug might not be one of BeautifulSoup but rather a bug in SGMLParser.
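
For what it’s worth, the same early termination can be reproduced without BeautifulSoup. The sketch below uses html.parser from the Python 3 standard library, which is an assumption on my side: it is a different parser from the SGMLParser that BeautifulSoup 3.0.7a relies on, so it only illustrates the behaviour and does not prove where the original problem lives. As far as I know, HTML parsers in general (browsers included) end a script element at the first ‘</script’ they meet, so a common workaround on the HTML side is to escape the closing tag inside the JavaScript string as ‘<\/script>’. The ScriptProbe class and probe function are names made up for this example.

```python
from html.parser import HTMLParser


class ScriptProbe(HTMLParser):
    """Records the tags seen outside <script> and the text seen inside it."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.script_text = []
        self.other_tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        else:
            self.other_tags.append(tag)

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.script_text.append(data)


def probe(html):
    """Parse `html` and return the probe so its recorded state can be inspected."""
    parser = ScriptProbe()
    parser.feed(html)
    parser.close()
    return parser


BROKEN = ("<body><script type='text/javascript'>"
          "document.write('</script>');document.write('<div></div>');"
          "</script></body>")
# Same document, with the closing tag inside the string literal escaped.
ESCAPED = BROKEN.replace("'</script>'", r"'<\/script>'")

print(probe(BROKEN).other_tags)   # -> ['body', 'div']
print(probe(ESCAPED).other_tags)  # -> ['body']
```

In the first case the <div> leaks out of the script and is reported as a real tag; in the escaped version the whole script body stays together as text.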

The affected version is 3.0.7a. Meanwhile it looks like a new release has seen the light, but I haven’t tested it yet. The new BeautifulSoup 3.1.0 has replaced SGMLParser with HTMLParser (in an attempt to make BeautifulSoup compatible with Python 3.0), so this bug might already be fixed.

While we’re on the subject of bugs, I’d also like to mention one in Python 2.5.2 on Mac OS:

MemoryError
Python(72261) malloc: *** mmap(size=2097152) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Exception exceptions.MemoryError: MemoryError() in  ignored

Things are much simpler with this one, even if the displayed information doesn’t offer enough details. The above bug is basically the result of adding strings to a list in an infinite loop (so a programming problem, but with no indication of the error).


3 Comments

Filed under technolog