Monday, November 8, 2010

Presentation on Tools for Web Crawling

We initially tried to pick two other topics but settled on this one because both had already been taken by other groups (we don't remember which topics they were).
At first we didn't like it, as we thought there wouldn't be much about this topic that we could learn and share with our classmates. Over the course of our research, we found out that we were VERY much wrong! There was so much to learn that we had to decide what could be shared with the class without confusing them, since understanding web crawling means understanding the code that crawlers use to search the web for information.

In our presentation, we covered the definition of a web crawler, the types of web crawlers and examples of known web crawlers, and we also gave a demonstration of two of them (one written by us).

We covered four types of web crawlers: search engine crawlers, email harvesting crawlers, corporate crawlers and specialised crawlers.
The three examples of known web crawlers that we looked at were WebSPHINX, Universe (which we decided not to cover in the presentation) and a Python crawler (which, as mentioned above, we wrote ourselves). We decided to write our crawler in Python because we had covered Python in class and found the logic behind the crawler easier to follow than if we had tried to do it in Java.
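
As an illustration only (this is not the crawler from the presentation), a minimal Python crawler could look like the sketch below. It assumes a simple breadth-first strategy: download a page, pull out its links, and queue any link not seen before. The names LinkParser, extract_links, fetch_page and crawl are hypothetical, and the sketch leaves out things a real crawler needs, such as respecting robots.txt and pausing between requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute URLs for the links found in html."""
    parser = LinkParser()
    parser.feed(html)
    # Resolve relative links and drop #fragments so the same page
    # is not queued twice under different names.
    return [urldefrag(urljoin(base_url, link)).url for link in parser.links]


def fetch_page(url):
    """Download a page as text (hypothetical default fetcher)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_page, max_pages=10):
    """Breadth-first crawl from start_url; returns URLs in visit order."""
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        visited.append(url)
        for link in extract_links(html, url):
            # Only follow web links (this skips mailto: and similar).
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling crawl("http://example.com/", max_pages=5) would visit at most five pages starting from that address. The fetch argument is there so the crawler can be tried on made-up pages without touching the network.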

In the end, we enjoyed making this presentation and practising amongst ourselves to deliver that perfect presentation, despite hurdles such as spending 2 weeks away from each other during the CWG break and working out how the web crawler works in Python.


Samiran Roy (2010073)
Shayan Lahiri (2010078)
Apoorv Saini (2010020)
