How Big Is Khoras?

Started by avisarr, September 14, 2008, 12:04:03 AM

avisarr

I've often wondered how big the Khoras website is... specifically, how many words of ACTUAL content does it have, not counting HTML tags and such?

In the Khoras FAQ, I quoted "over 2 million words", but that was just a guesstimate based on some preliminary measurements and limited data. Probably not accurate though. So, I have wondered... what's the real number?

It's surprisingly hard to do a word count for an entire website while excluding HTML tags and such. I've looked all over. I eventually found some software online by a company called InSpyder which has a utility that can do a word count for an entire website. However, I don't want to spend $60 just to get a word count total. That's too expensive. For months, I avoided buying that software and often wondered if there was any free software out there that could do it.

Well, tonight I finally found that "other" option. I found a translator's utility that can count the content words (everything outside the HTML tags) in a series of HTML documents. By combining that with a Windows search on a local copy of the site to pull out all the HTML files (and compensating for HTML files that had the same name), I was finally able to get a fairly accurate count. At least, I think this is the most accurate count I've done to date.
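
For anyone who wants to try the same idea without special software, here is a minimal sketch in Python. It is not the translator's utility I used, just an illustration of the approach: strip the tags, count what's left, and add it up over every HTML file in a local copy of the site. The names `TextExtractor`, `count_words` and `count_site` are made up for this example, and I'm assuming the pages are UTF-8 (undecodable bytes get replaced rather than crashing).

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects text outside of tags, skipping <script> and <style> bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_words(html):
    """Count whitespace-separated words in the non-tag text of one page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with a space keeps words in adjacent elements apart,
    # at the cost of splitting a word that has a tag in the middle of it.
    return len(" ".join(parser.chunks).split())


def count_site(root):
    """Return (number of HTML files, total words) under a local folder."""
    files = 0
    words = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in {".htm", ".html"}:
            html = path.read_text(encoding="utf-8", errors="replace")
            words += count_words(html)
            files += 1
    return files, words
```

Because this walks the folders in place instead of copying everything into one directory, files with the same name in different folders don't collide. Something like `count_site("khoras_local_copy")` (a hypothetical folder name) would give back the file count and the word total. Note that it counts page titles too, so the number won't match other tools exactly.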

So, what was the final number? According to this utility, Khoras weighs in at 1,585 HTML files and 1,050,721 words.

Smaller than I thought... but at least we're over one million words. :)  By the way, this does not include the Khoras Forum or the Guest Book. I think I'm going to have to update the FAQ now that I have a fairly accurate number.

I would LOVE to do another word count with a different method or piece of software and see how they compare. I think using multiple methods would be best. So, if anyone out there can think of another way to count the total number of actual content words in a website, let me know. I'd like to try it. InSpyder InSite looks good, but it's $60. I'd like to find something free. Anyone have any ideas?

Golanthius

I don't know about word count, but I do know that the world content (just the left side bar, including updates) is 178 MB on my flash drive. It fills four 4-inch D-ring binders when printed out.
That's pretty impressive.

tanis

He who fights with monsters might take care lest he thereby become a monster. And if you gaze long into an abyss, the abyss gazes also into you.

Drul Morbok

Just wondering if you got a more recent number... it would be interesting to see how it has developed.

Also, I'm not sure I understand the challenge of counting words... with all the HTML files in a directory (or its subdirectories), I think it shouldn't be too difficult via bash/cygwin.
Getting those files into a directory in the first place... I don't know about that...

But maybe now, 13 years later, a Notepad++ plugin could do it...
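
To make the directory idea a bit more concrete, here is a rough sketch of what I have in mind, written in Python rather than bash so it runs the same on Windows. It just deletes anything that looks like a tag with a regular expression and counts what remains, which is roughly what a quick shell pipeline would do. The function names are invented for this example, and a regex is only an approximation of real HTML parsing, so treat the result as a cross-check and not as an exact figure.

```python
import html
import re
from pathlib import Path

# Whole <script>...</script> and <style>...</style> blocks, contents included.
BLOCK_RE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
# Anything else between angle brackets (tags and comments, approximately).
TAG_RE = re.compile(r"<[^>]*>")


def rough_word_count(markup):
    """Strip tags with regexes and count whitespace-separated words."""
    without_blocks = BLOCK_RE.sub(" ", markup)
    without_tags = TAG_RE.sub(" ", without_blocks)
    # Turn entities such as &amp; into plain characters before splitting.
    return len(html.unescape(without_tags).split())


def rough_tree_count(root):
    """Sum the rough word count over all *.htm* files below root."""
    total = 0
    # Note: the pattern is case-sensitive on Linux, so .HTM files are missed there.
    for path in Path(root).rglob("*.htm*"):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            total += rough_word_count(text)
    return total
```

Calling `rough_tree_count` on the folder that holds the local copy returns one total for the whole tree. If this number and a proper parser-based count come out close, that's a decent sign that both are in the right range.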

David Roomes

I decided to search the web again and see if I could find a tool that could count it up. And this time I was successful. It wasn't free, but it was cheap... only 12 euros. So, not bad. And worth it, in my opinion. This software tool was able to count every word in every HTML page in the site. Here are the totals:

Total Number of Pages:  1,622

Total Number of Words:  1,271,820

Those totals only include the actual content of the site. They do not count anything in this Forum. Due to interesting questions, in-depth conversations and verbose session summaries, this Forum would likely add another million words...
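
For the question of how the site has developed, here is the arithmetic against the 2008 count (1,585 files and 1,050,721 words), as a small Python sketch. Keep in mind the two counts came from different tools, which may not define a "word" the same way, so the growth figures are approximate. The helper name `compare` is just something made up for this example.

```python
def compare(old_pages, old_words, new_pages, new_words):
    """Return the change between two site counts as a dictionary."""
    return {
        "added_pages": new_pages - old_pages,
        "added_words": new_words - old_words,
        "word_growth_pct": 100 * (new_words - old_words) / old_words,
        "old_words_per_page": old_words / old_pages,
        "new_words_per_page": new_words / new_pages,
    }


# Figures quoted in this thread: the 2008 count and the newer count.
result = compare(1585, 1_050_721, 1622, 1_271_820)
for name, value in result.items():
    print(f"{name}: {value:,.1f}")
```

That works out to 37 more pages and 221,099 more words, or roughly 21% growth in word count, with the average page going from about 663 to about 784 words.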
David M. Roomes
Creator of the World of Khoras