Tuesday, May 31, 2011

Anatomy of Spam Business

This paper details how the spam business model works.

The paper presents the work in great detail, and it is astonishing to learn that spammers have an ecosystem in which they thrive. They set up their stores through affiliate programs and host them on bulletproof hosting servers, run by companies that do not comply with requests to take a site down.

Sunday, May 29, 2011

Elastic Search - an interesting search solution

ElasticSearch is an open source RESTful search server built on top of the Lucene library.
It boasts the following features:



  • JSON over HTTP

  • Schema-free search

  • Near Realtime search

  • Easy Distributed Index and Search

  • Multi-Tenancy

  • Ready for the cloud: very easy to set up on Amazon's cloud

  • Java API support

  • Support for facets

  • It uses a write-behind queue to store index updates and transaction logs to keep track of them.

  • Reads can be done on Shard Replicas.

However, it does not support XML.
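
To make the "JSON over HTTP" point concrete, here is a minimal sketch of what indexing one document could look like from Java. It only builds the request and does not send it. The host localhost:9200, the index name "blog", the type "post", the document fields and the /{index}/{type}/{id} path layout are all assumptions for illustration, and the code uses the JDK's own java.net.http classes (Java 11 or later) rather than any ElasticSearch client library.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class EsIndexRequestSketch {

    // Builds (but does not send) a PUT request that would index one JSON document.
    // The /{index}/{type}/{id} path layout is an assumption; check it against your ES version.
    public static HttpRequest buildIndexRequest(String baseUrl, String index, String type,
                                                String id, String jsonBody) {
        URI uri = URI.create(baseUrl + "/" + index + "/" + type + "/" + id);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical document; the field names are made up for this example.
        String doc = "{\"title\": \"Elastic Search\", \"tags\": [\"search\", \"lucene\"]}";
        HttpRequest request =
                buildIndexRequest("http://localhost:9200", "blog", "post", "1", doc);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request would then be a matter of handing it to java.net.http.HttpClient, and the response would come back as JSON as well.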


On the surface, it appears that this product is ready for the Web 2.0 world and is ideal for cloud deployment.


Its feature set is not as rich as Apache Solr's, but it does have decent support for facets, which are popular nowadays. It has very good data visualization support, which makes it ideal for monitoring tools.


How does it compare with Solr?



  • Solr has a richer feature set with respect to analyzers and facets.

  • Solr's distributed setup is not ideal and looks awkward. ElasticSearch's distributed design seems more robust.

  • Solr has been around for much longer and has a mature community behind it.

  • ElasticSearch is so far the work of a single committer.

  • ElasticSearch scores over Solr in terms of cloud readiness.

  • XML support is missing in ElasticSearch, which is not a big deal as JSON is the standard for the Web 2.0 world.

You can get more info from these slides.

When do you use ES?


  • Big Index or Realtime Search is needed

  • or, there are many indexes

  • or, there is a multi-tenancy requirement (Solr cores are okay for this too)

When should you not use ES?



  • If the team is comfortable with Solr, then stick to it

  • Justifying ES in a large corporation would be difficult

More info can be obtained from here.






Logstash: A Free/Open Source alternative to Splunk

Today I came across a wonderful presentation on logstash, an open source log archiver and analyzer that uses ElasticSearch to index and search log data.

What makes it interesting is its very good support for collecting events from different sources such as log files, syslog, sockets and message queues. It lets you apply different filters to those events and stores its index in ElasticSearch.

The use of ElasticSearch is interesting, as it uses JSON to index and read data and provides an easy way to search and visualize log data. ElasticSearch can scale better than Solr and is ready for the cloud.

This is a compelling package and offers a credible alternative to Splunk.

The logstash project URL is this.

Sunday, May 22, 2011

How to log httpclient using log4j and Java Util logging

There are times when HttpClient's trace output needs to be logged.

We can use a log4j properties file like this:

log4j.rootLogger=INFO, stdout

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p [%c] %m%n

log4j.logger.org.apache.http=DEBUG
log4j.logger.org.apache.http.wire=ERROR


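You can pass the location of this config file on the command line. log4j 1.x expects a URL here and otherwise falls back to a classpath lookup, so a Windows path is best written in file: form:
-Dlog4j.configuration=file:///C:/myworkspaces/Client/src/log4j.properties

If you are using java.util.logging instead, pass a plain file path:
-Djava.util.logging.config.file=C:\myworkspaces\Client\src\logging.properties

The java.util.logging properties file needs to look like this: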

.level = FINEST

handlers=java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter
java.util.logging.ConsoleHandler.level = ALL

org.apache.http.level = FINEST
org.apache.http.wire.level = SEVERE
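
If passing a system property is not convenient, roughly the same setup can be done in code. The following is a minimal sketch, assuming HttpClient's output is actually routed to java.util.logging (for example through Commons Logging with no log4j on the classpath). The class name HttpClientJulSetup is made up for this example, and unlike the properties file it leaves the root logger's level alone.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class HttpClientJulSetup {

    // Keep strong references: the LogManager holds loggers weakly, so a
    // configured logger could otherwise be garbage collected and lose its level.
    private static final Logger HTTP = Logger.getLogger("org.apache.http");
    private static final Logger WIRE = Logger.getLogger("org.apache.http.wire");

    // Mirrors the org.apache.http entries of the properties file above.
    public static void configure() {
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.ALL);                 // let the handler pass everything
        handler.setFormatter(new SimpleFormatter());

        HTTP.setLevel(Level.FINEST);                 // full trace for HttpClient
        HTTP.addHandler(handler);
        HTTP.setUseParentHandlers(false);            // avoid duplicate lines via the root handler

        WIRE.setLevel(Level.SEVERE);                 // silence the very chatty wire log
    }

    public static void main(String[] args) {
        configure();
        Logger.getLogger("org.apache.http.headers").fine("this line is printed");
        Logger.getLogger("org.apache.http.wire").fine("this line is suppressed");
    }
}
```

Call configure() once, before the first request is made. Child loggers such as org.apache.http.headers inherit the FINEST level, while the wire logger stays at SEVERE.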

Monday, May 16, 2011

We are in the price rise age

The days of cheap stuff seem to be gone. It looks like the price of commodities and everything else is about to go up or is already inching up.

According to Jeremy Grantham, times are changing and we are in for a rude shock.

Here is a summary of Grantham’s thoughtful newsletter (lifted from here):
  • Until about 1800, our species had no safety margin and lived, like other animals, up to the limit of the food supply, ebbing and flowing in population.
  • From about 1800 on the use of hydrocarbons allowed for an explosion in energy use, in food supply, and, through the creation of surpluses, a dramatic increase in wealth and scientific progress.
  • Since 1800, the population has surged from 800 million to 7 billion, on its way to an estimated 8 billion, at minimum.
  • The rise in population, the ten-fold increase in wealth in developed countries, and the current explosive growth in developing countries have eaten rapidly into our finite resources of hydrocarbons and metals, fertilizer, available land, and water.
  • Now, despite a massive increase in fertilizer use, the growth in crop yields per acre has declined from 3.5% in the 1960s to 1.2% today. There is little productive new land to bring on and, as people get richer, they eat more grain-intensive meat. Because the population continues to grow at over 1%, there is little safety margin.
  • The problems of compounding growth in the face of finite resources are not easily understood by optimistic, short-term-oriented, and relatively innumerate humans (especially the political variety).
  • The fact is that no compound growth is sustainable. If we maintain our desperate focus on growth, we will run out of everything and crash. We must substitute qualitative growth for quantitative growth.
  • But Mrs. Market is helping, and right now she is sending us the Mother of all price signals. The prices of all important commodities except oil declined for 100 years until 2002, by an average of 70%. From 2002 until now, this entire decline was erased by a bigger price surge than occurred during World War II.
  • Statistically, most commodities are now so far away from their former downward trend that it makes it very probable that the old trend has changed – that there is in fact a Paradigm Shift – perhaps the most important economic event since the Industrial Revolution.
  • Climate change is associated with weather instability, but the last year was exceptionally bad. Near term it will surely get less bad.
  • Excellent long-term investment opportunities in resources and resource efficiency are compromised by the high chance of an improvement in weather next year and by the possibility that China may stumble.
  • From now on, price pressure and shortages of resources will be a permanent feature of our lives. This will increasingly slow down the growth rate of the developed and developing world and put a severe burden on poor countries.
  • We all need to develop serious resource plans, particularly energy policies. There is little time to waste.
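
The compounding argument in the summary is easy to check with a few lines of arithmetic. This is only a sketch: the 211-year span (1800 to 2011) is an assumption based on the date of this post, and the class and method names are made up for the example.

```java
public class CompoundGrowthSketch {

    // Average annual growth rate that turns `start` into `end` over `years` years.
    public static double annualRate(double start, double end, double years) {
        return Math.pow(end / start, 1.0 / years) - 1.0;
    }

    // Years needed to double at a constant annual growth rate (0.01 means 1%).
    public static double doublingTimeYears(double rate) {
        return Math.log(2.0) / Math.log(1.0 + rate);
    }

    public static void main(String[] args) {
        // 800 million people in 1800, 7 billion in 2011 (span assumed, see above).
        double rate = annualRate(800e6, 7e9, 211);
        System.out.printf("implied population growth: %.2f%% per year%n", rate * 100);

        // Doubling times for the growth rates quoted in the summary.
        System.out.printf("doubling time at 1.0%%: %.1f years%n", doublingTimeYears(0.010));
        System.out.printf("doubling time at 1.2%%: %.1f years%n", doublingTimeYears(0.012));
        System.out.printf("doubling time at 3.5%%: %.1f years%n", doublingTimeYears(0.035));
    }
}
```

The population figures work out to an average of about 1% a year, and at 1% a quantity doubles in roughly 70 years, which is why even modest-sounding growth rates run into finite resources.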

Saturday, May 07, 2011

Urbanization of India - the road ahead

The recent controversy over the Lavasa project shows that India has a long way to go before it can carry out the urban renewal the country badly needs.

As India prospers and the agricultural sector languishes, farmers and workers are moving to cities to earn their livelihood. This is putting pressure on existing cities, and we need to build new ones. Lavasa could be a good model, were it not for our rotten system.

However, we have to keep working on our system so that new cities can be built.

You can read about Lavasa here.