Alex Chudnovsky is the founder and managing director of Majestic-12, a UK-based firm that specializes in cross-platform .NET/C# development of scalable, high-performance data analysis applications, with a primary focus on building a World Wide Web search engine. Majestic also operates under the trade name Majestic-SEO, which publishes a backlinking tool that competes with SEOmoz’s Linkscape.
Alex previously worked for a number of well-known retail businesses, with a primary focus on maximizing sales from their respective retail web sites. Applying his extensive business and technical skills at Jungle.com (part of the Argos Retail Group), formerly a top 10 UK e-tail website that handled over a billion hits annually, Alex led many significant projects with a proven overall economic effect of over 15 million in additional online sales.
Eric Enge: Tell me a little bit about Majestic-SEO at a company level.
Alex Chudnovsky: We have a registered company in the UK, which is called Majestic-12 Limited, and we started a distributed search engine project four years ago. The goal is to build a viable alternative to Google. And because we were small and they were big, we had to find some ways of catching up. The way that we chose was to develop distributed computing on the internet.
Projects such as SETI@home and distributed.net were the basis of the approach we took. We created software and we started crawling the web using volunteers all around the world. This is our main project, and it has been running for four years now. About two years ago, when we used the data to create a full-text search index, we had one billion pages indexed. As we built it bigger and bigger, we realized that relevance was becoming a problem.
You can’t beat Google unless you are as relevant as Google. The solution for this was to look more closely at the web graph, look at the backlinks and analyze link text in order to be just as smart about it as Google is. You really have to do that, because when you rank in competitive categories, you have so many matches that you have to differentiate among them to decide which ones are the best and most relevant.
This is where backlinks come into play big time, because that’s really one of the key objective ways to differentiate between more popular and less popular sites. When we realized this two years ago, it became clear that we needed a separate index that would help us understand backlinks and link text better. So, we started working on the so-called “anchor index” and we’ve been doing it for two years with many index builds.
It was very, very difficult to build a large index that was close to that of Yahoo and Google. But, we built it, and early this year we launched a commercial offshoot to help us fund further R&D activities. This is what Majestic-SEO was designed for. It is the same company, but it’s our trading name that we use to position ourselves in the SEO industry.
So, what we have in Majestic-SEO is the biggest publicly available backlinks index. It allows webmasters to verify their sites and obtain extensive backlinking data for free. If you want information for your competitor websites, then you can pay to obtain reports and compare the websites. It’s essentially like Google Webmaster Tools, but you can get information on competitive sites and we show complete data.
Unlike Google, we show all data that we have, and we actually have quite a lot of sites with many millions of backlinks. We will show you the whole lot if you want it. And, we include a number of analytical options that allow you to focus on the areas you are most interested in. So, in a nutshell, this is what Majestic-SEO is about.
Eric Enge: How many web pages have you crawled?
Alex Chudnovsky: So far we have crawled about 114 billion URLs in total (this figure includes URLs that failed to crawl for various reasons: 404 Not Found, server down, etc.). The total crawled data size is over 2.5 petabytes. If you look at the number of unique pages that we include within our index in Majestic-SEO, we have over 52 billion unique crawled pages in our current index, and that number will grow again in January 2009. We show all these stats on our website. We consider a URL to be a page if it was successfully crawled. We analyze those URLs and pick up links from those pages, as well as other metrics.
If you look at our database in terms of unique URLs, then we have lots more of those than crawled pages. Google recently claimed to have one trillion unique URLs that they knew of, but they have not crawled them all yet. It’s the same with us. For us, the number of unique URLs is 346 billion, 52 billion of which are pages, meaning that these are the URLs that we crawled successfully at least once. Our aim is to catch up with Google by the end of next year.
Eric Enge: You’ve organized this in a product that people can explore and pull down link profiles for different domains? I presume you do things like pull the anchor text and that sort of stuff?
Alex Chudnovsky: Yes, we supply the link text, if it was present, the date when the backlink was found, and a number of flags, such as whether it was an image link, a redirect, or whether it was in a frame. The latter can be very useful because you can actually check backlinks for your own site. You can find the people who have embedded your site in a frameset, and you may not necessarily see this information in your log files, because if it’s in a frameset, the referrer may not be set and it may not be obvious to you that your site was quite literally framed.
We also have a measure of how important a page is, called ACRank. ACRank stands for “A Citation Rank.” It is a number from 0 to 15, with higher being better. A higher number shows that more referring external domains link into that page. For example, if both Google’s homepage and our site’s homepage linked to your site, we would rank the Google link higher than ours, because Google itself has a lot more referring domains pointing into it.
This allows our customers to focus on the most important links first, because they would know that those links are coming from pages that are themselves very heavily linked to.
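The ACRank idea described above can be sketched as a score derived from the count of unique external referring domains. Majestic's actual formula is not disclosed in the interview, so the log-scaled bucketing below is purely an illustrative assumption; only the 0 to 15 range and the "more referring domains means higher" behavior come from the text.

```python
import math

def acrank(referring_domains: int) -> int:
    """Illustrative ACRank-style score: map the number of unique
    external referring domains onto a 0-15 scale, higher = better.
    The real formula is proprietary; log2 bucketing is an assumption."""
    if referring_domains <= 0:
        return 0
    return min(15, int(math.log2(referring_domains + 1)))

# A page linked from many unique domains scores higher than one
# linked from only a few.
assert acrank(3) < acrank(100_000)
```

Sorting a backlink report by such a score is what lets customers "focus on the most important links first," as described above.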
Eric Enge: Right. You are doing that based on a proprietary calculation method?
Alex Chudnovsky: Yes, it is very simple at the moment. It’s basically an indication of how many unique referring domains link into the page which links to you.
Eric Enge: When did you release this product?
Alex Chudnovsky: We launched Majestic-SEO in February of this year. We were not selling data at the time we launched it because it was effectively a soft launch, a test that allowed webmasters to come to our site and verify their domains to get information for free. So, we were getting all this feedback. In July we launched a new option, which allowed our customers to actually buy reports on domains that they do not own. From the commercial point of view, we launched in July 2008.
Eric Enge: How many people have signed up so far?
Alex Chudnovsky: We have a lot. It’s exceeded our expectations definitely. We are gaining acceptance right now, and we are converting traffic really well. We get a lot of people who come to our site just to verify their own domain and to check out whether the service is good or not.
Then we convert them to actual paying customers because they see that they can look at their own domains and the information we have on their own sites. This is where they become believers in our information, because it’s the best way to check.
Eric Enge: What is the commercial model?
Alex Chudnovsky: We have different pricing for different domains. The fundamental issue for us is that some domains are a lot bigger than others. For example, if we take Google as a domain, then our database tells me that we have 3.7 billion external backlinks to google.com.
When we name this number, it means that we actually have that many backlinks that we can retrieve. This is quite a critical difference from some of our competition. They will often show you a limited number of backlinks, such as what you can get in Yahoo Site Explorer. But in our case, when you buy access to the domain, you get the whole lot, all the information you can retrieve at no extra charge.
So, we have very large domains like google.com, and we have small domains like our own site, www.MajesticSEO.com, for which we have one thousand external backlinks in our database at the moment, a number that is growing quite quickly. So, we have different domain pricing, which depends on how heavily linked the domain is.
We also offer some time-based options. You can subscribe to a domain’s data for 7 days, 1 month, 3 months, 6 months or 12 months. So for domains that you might just be curious about, it makes sense to buy them for 7 days, just to check out the information. For those that you want to keep an eye on for longer, it makes sense to buy 12 months, as the monthly price is reduced when you subscribe for longer periods of time.
Eric Enge: What’s the cost for a domain that has 10,000 links to it?
Alex Chudnovsky: Let’s take your site for example. On your site, we have 78,000 external backlinks coming from 2,500 referring domains as of now. If you look at the price, you can get it for 10 credits for 7 days. Now, we sell credits and we have different packages for credits. If you buy a bigger package, you get a bigger discount. For example, if you are our client and you want to use our service a lot, it makes sense to buy a thousand credits, because you would get a 30% discount on that.
So, if you are a big buyer, the actual price of the domains that you buy will be lower for you. In your case, it will be 10 credits for 7 days. In monetary terms, if you buy one thousand credits, it should cost about a dollar a credit. So that means that data on your site could be had for $10. That would include almost 79,000 external backlinks coming from 2,500 referring domains. So, you’ve got quite a popular website. We are also considering introducing a fixed-fee subscription model in Q1 2009.
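The credit arithmetic Alex walks through can be sketched as follows. Only the 30% discount at the thousand-credit tier, the roughly one dollar per credit after discount, and the 10-credit cost of the example domain come from the interview; the undiscounted base price of about $1.43 is an assumption chosen so the discounted price lands near the quoted figure.

```python
def order_cost(credits: int, base_price: float = 1.43) -> float:
    """Cost of a credit package in dollars.  A 30% volume discount
    applies at the 1,000-credit tier, per the interview; the base
    per-credit price of ~$1.43 is an assumption chosen so the
    discounted price is "about a dollar a credit"."""
    discount = 0.30 if credits >= 1000 else 0.0
    return credits * base_price * (1 - discount)

# At the 1,000-credit tier each credit costs about $1, so a
# 10-credit report on a domain runs roughly $10.
per_credit = order_cost(1000) / 1000
report_cost = 10 * per_credit
```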
Eric Enge: That’s interesting. Yahoo reports 94,800, by the way. Of course, it has its own accuracy issues, as we all know. When did you go live?
Alex Chudnovsky: Basically, we do a lot of research at Majestic. We first launched our index in February of this year, but we only started selling paid information in July. The reason for that is that as we were building different indexes, we were performing quantitative assessments to understand how close we were to Yahoo and Google.
To do this, we picked 20 URLs, some of which were from well-known websites such as Google, Wikipedia, CNN.com, etc. And, we took backlinks from last year that were reported by Google and Yahoo for these URLs.
Every time we made an index, we checked how many of the backlinks reported by Yahoo and Google we could find in our own index. So as we grew our index, we could see whether our quality was improving or not, and we found that we were matching more and more. What it showed was that our index is actually getting closer to that of Yahoo, and less so to Google’s. And I think this is interesting, because I don’t think our competition is doing something like this, at least not publicly.
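The quality check Alex describes, taking the backlinks Yahoo and Google report for a fixed set of test URLs and measuring how many of them appear in each new index build, is essentially a recall calculation. A minimal sketch, with hypothetical data, might look like this:

```python
def index_recall(reference_links: set, our_index: set) -> float:
    """Fraction of a reference engine's reported backlinks that are
    also present in our own index build (0.0 to 1.0)."""
    if not reference_links:
        return 0.0
    return len(reference_links & our_index) / len(reference_links)

# Hypothetical example: Yahoo reports 4 backlinks for a test URL,
# and our latest build contains 3 of them.
yahoo_links = {"a.com/1", "b.com/2", "c.com/3", "d.com/4"}
our_build = {"a.com/1", "b.com/2", "d.com/4", "e.com/5"}
assert index_recall(yahoo_links, our_build) == 0.75
```

Tracking this fraction across successive builds gives exactly the "are we matching more and more" signal described above.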
Eric Enge: You are continuing to run your own crawlers?
Alex Chudnovsky: Absolutely, yes.
Eric Enge: Does your client base currently skew towards Europe or other geographies?
Alex Chudnovsky: I would say we get clients from the United States, Canada and a lot from Europe. I would say maybe it’s 60% from Europe and 40% from America.
If you look at market size in real terms, it probably should be the other way around. We are not as strong in the United States as we are in Europe, but we are gaining more and more customers and definitely growing in North America.
Note that in your interview of Rand Fishkin about Linkscape, you asked Rand about the bots they make use of, and whether they leverage third-party crawls or do custom crawling for themselves. Rand said, in some cases but not all. At Majestic-12, we have our own crawler, we publish information about it, and we are very open about these things.
We are not asking others to crawl for us. We actually crawl the data ourselves, we have the URLs and we decide what we crawl. It’s a hundred percent our effort.
Eric Enge: So you must have a fairly substantial data center in order to be able to do that level of crawling?
Alex Chudnovsky: Because we have a distributed computing network, we can offload this complicated task to a lot of computers. So, we do not actually need the data centers you would imagine are required to sustain this sort of crawling. That’s the commercial advantage that gives us hope we can reach Google scale in terms of webgraph (backlinks) analysis.
Eric Enge: How do you acquire the access to the computers that are within your network?
Alex Chudnovsky: This is done by people who join our project, the Majestic-12 Distributed Search Engine project. They join it and they run our software on computers that they own. We are not actually installing it ourselves. It’s one hundred percent volunteer, and we have built quite a name in the distributed computing area. There are a number of projects out there, but we are fairly unique in that distributed computing projects are usually CPU-intensive.
Eric Enge: How do you recruit your participants?
Alex Chudnovsky: Well, we have a website, www.majestic12.co.uk, which is our main project site and they sign up there. We have more than 100 regular users who return results to us. In a full day they usually crawl more than 5 terabytes of data and around 200 million URLs. The first people who found us were the people who saw our bot in their log files.
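The throughput figures quoted above imply an average crawled page size, which a quick back-of-the-envelope check shows is plausible (the division below is illustrative arithmetic, not data from the interview):

```python
# Daily figures from the interview: ~5 terabytes crawled across
# ~200 million URLs (decimal terabytes assumed).
daily_bytes = 5 * 10**12
daily_urls = 200 * 10**6

# Implied average size per crawled URL, in kilobytes.
avg_page_kb = daily_bytes / daily_urls / 1024  # roughly 24 KB per URL
```

An average in the tens of kilobytes per page is consistent with typical HTML document sizes of the period.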
After they saw our bot, they searched and found our web page, read about our project and liked the idea, and then they joined it. This is how we started, and after some time we became known among the distributed computing community. We have active people who are also doing other distributed computing projects.
They talk about us and this helps increase interest in our project, so we have grown to a point where we sustain a high number of volunteers.
Eric Enge: What’s in it for them?
Alex Chudnovsky: Remember, our main objective as a company is to build a search engine which can rival Google in terms of relevance, speed and scale. As a part of this, we also need to understand the web better, and this is where backlinks come into play. It’s strictly volunteer; we do not pay them anything at the moment. What we will do is set up a separate company for our partners, which will own 20% of the shares in the main commercial company, which also owns the Majestic-SEO trading name. I have to stress here that money was not the main motivation for the people who took part in our project.
We don’t really want people to come to us with a short-term financial incentive in mind, as this can cause problems. In our case, many people who came naturally were interested in distributed computing in general and our project in particular. They like the project, they like the idea of trying to create a competitor to Google, and they don’t like monopolies.
They found that the administration of the project, the way we work, the direction in which we are trying to move, and the feedback that we give to them is good; so it’s worth sticking around. This is really how we retain the people who are taking part in this project.
Eric Enge: How many participants do you have?
Alex Chudnovsky: Today we have more than 100 active participants. However, if you look in terms of computers, we have about 150 machines crawling the Internet and analyzing data from different locations in the world.
Eric Enge: How do you get the service to perform acceptably well?
Alex Chudnovsky: That was very difficult. Let me just tell you what you can do in our index. First, you can search for an exact URL and get a quick answer. Or you can search for a domain by typing the domain name. Say you typed google.com; in this case, we would show search results with the top URLs from that site along with some basic statistics, such as how many referring backlinks are internal or external.
We also show how many referring domains it has, which is something Yahoo does not. I think our competition wants money to show this information, but we show it for free. A lot of effort was put into the design of the index to make sure that it can scale to the number of URLs that Google and Yahoo have.
Eric Enge: You must need some powerful hardware.
Alex Chudnovsky: It does use fairly powerful hardware.
Eric Enge: How many servers do you have that are involved in this process?
Alex Chudnovsky: One part is the crawling and analysis work, which is done by the distributed crawler. That is around 150 machines. Not all of these computers run 24/7, but many do, and they do a big chunk of the work. We have a lot of hardware involved; but because of the way we did it, we don’t need to have this hardware on the premises.
These computers do the analysis and the crawling, and they send the data back to the central servers. The servers also do quite a lot of work, but we don’t need that many; we have fewer than 10 servers doing the final processing and searching at the moment.
Eric Enge: Thanks a lot, Alex!
Alex Chudnovsky: Thank you very much, Eric!