- Path: senator-bedfellow.mit.edu!dreaderd!not-for-mail
- Message-ID: <usenet/creating-newsgroups/justification_1082376800@rtfm.mit.edu>
- Supersedes: <usenet/creating-newsgroups/justification_1079688903@rtfm.mit.edu>
- Expires: 3 Jun 2004 12:13:20 GMT
- X-Last-Updated: 2001/10/31
- Organization: none
- From: rob@alt-config.org (Rob Maxwell)
- Newsgroups: alt.config,news.groups,alt.answers,news.answers
- Subject: [FAQ] Gathering Traffic Data for Proposed Newsgroups
- Followup-To: alt.config,news.groups
- Approved: news-answers-request@MIT.Edu
- Originator: faqserv@penguin-lust.MIT.EDU
- Date: 19 Apr 2004 12:15:19 GMT
- Lines: 116
- NNTP-Posting-Host: penguin-lust.mit.edu
- X-Trace: 1082376919 senator-bedfellow.mit.edu 565 18.181.0.29
- Xref: senator-bedfellow.mit.edu alt.config:425632 news.groups:479719 alt.answers:72577 news.answers:270269
-
- Archive-name: usenet/creating-newsgroups/justification
- Last-modified: 10 June 2001
- Posting-Frequency: Monthly (on the 1st)
- URL: http://www.alt-config.org/justification.htm
- Maintainer: Rob Maxwell <rob@alt-config.org>
- Disclaimer: Approval for *.answers is based on form, not content.
-
- Gathering Traffic Data for Proposed Newsgroups
- Or
- How to use Google Groups
-
- The traditional expectation that a newsgroup justify its existence by virtue
- of existing Usenet traffic goes back to the earliest days. It precedes the
- birth of alt.*, the Great Renaming that brought forth the Big 7 (later the
- more familiar Big 8 with the creation of the humanities.* hierarchy in 1995),
- and even the rise and eventual fall of the backbone Cabal.
-
- In the early 1980s, if discussion of a topic became significant enough, a new
- newsgroup was created to centralize the discussion. With only a relatively
- few corporate and university mainframes providing the Unix Users' Network
- (Usenet) to a similarly small number of readers, it was fairly easy to see
- when a topic was worthy of receiving its own newsgroup. Today, over three
- gigabytes of text-only discussion are posted daily, and abuse of the alt.*
- newsgroup creation process has left a significant number of alt.* newsgroups
- uncarried on any given news server. Together, these make it effectively
- impossible to see by observation alone when a topic becomes popular enough
- to warrant a newsgroup of its own.
-
- This is where Google Groups comes into the picture. Its story starts in 1995,
- when Deja began archiving Usenet text postings. By 2000 the task had become
- too overwhelming and expensive. Deja tried different things, but its efforts
- were ultimately futile, and it sold its archive and name to the Internet
- search engine company Google. After a rough start, Google was finally able
- to combine Deja's massive archive with its own recent Usenet archiving
- efforts under the name Google Groups <http://groups.google.com/>.
-
- Getting started
-
- The journey to Justification begins at Google Groups' Advanced Group Search
- <http://groups.google.com/advanced_group_search>. What you will be looking
- for is how often the topic is discussed in English on Usenet. The customary
- method is to search for the keyword or phrase over the last ninety days.
- The recommended quantity of on-topic posts is ten (10) per day on average,
- or 900 over the ninety days. For the sake of this demonstration we will be
- trying to justify a newsgroup for the ABC television show "20/20".
-
- Start by typing 20/20 into "Find Messages with all of the words". Change the
- dropdown box from "10 messages" to "100 messages". Under Language, change
- Return messages written in "any language" to "English". Under Message
- Dates, select "Return messages posted between" and change the starting date
- from 29 Mar 1995 to the date three months before today's date. A visual
- example is available at: <http://www.alt-config.org/20-20a.gif>
-
- This search for "20/20", run on 27 May 2001, produced these results:
-
- Relevant English Messages for 20/20 from 28 Feb 2001 to 27 May 2001 Results
- 1- 100 of about 12,400. <http://www.alt-config.org/20-20b.gif>
-
- That averages out to 137.78 posts per day (12,400 / 90), which appears to
- clear the 10 per day recommendation easily. Or does it?
-
- Refining the search results
-
- Taking a closer look at the 20/20 example shows that the first on-topic
- mention of the show is the 14th search result.
- <http://www.alt-config.org/20-20c.gif>
-
- This is an extreme example. The results are badly contaminated by "%20",
- the way a space is represented in a URL (where literal spaces are not
- allowed). It often appears in search result URLs, one of which is seen in
- the third search result for 20/20.
-
- Repeating the search for 20/20 and adding "abc", the network it is on,
- produces radically different results:
-
- Relevant English Messages for 20/20 abc from 28 Feb 2001 to 27 May 2001
- Results 1-100 of about 374
-
- Three hundred seventy-four averages out to a mere 4.16 posts per day, less
- than half of the recommended ten. <http://www.alt-config.org/20-20d.gif>
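-
- The arithmetic behind the two averages above can be sketched in a few lines
- of Python. This is only an illustration: the function names are made up
- for this example, and the 90-day window and 10-per-day figure are simply
- the customary values described in this FAQ.
-
```python
def posts_per_day(total_hits: int, days: int = 90) -> float:
    """Average number of matching posts per day over the search window."""
    if days <= 0:
        raise ValueError("days must be positive")
    return total_hits / days


def meets_recommendation(total_hits: int, days: int = 90,
                         per_day: float = 10.0) -> bool:
    """True if the average reaches the recommended posts per day."""
    return posts_per_day(total_hits, days) >= per_day


# The two searches from the text: "20/20" alone, then "20/20 abc".
for label, hits in [("20/20", 12_400), ("20/20 abc", 374)]:
    avg = posts_per_day(hits)
    print(f"{label}: {avg:.2f} posts/day, "
          f"meets 10/day: {meets_recommendation(hits)}")
```
-
- The figures fed in are still only the raw "about" counts, so a passing
- result says nothing about how many of the hits are actually on-topic.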
-
- This is why your initial search results must be checked carefully before
- attempting to use them. To begin with, there is a known glitch in the
- software Google acquired from Deja, which usually makes a poor (sometimes
- comically poor) estimate of "about" how many results were found. A blatant
- example of this was a search for "infertility insurance":
-
- Relevant English Messages for "infertility insurance" from 18 Feb 2001 to 18
- May 2001 Results 1 - 4 of about 6. <http://www.alt-config.org/20-20e.gif>
-
- The quick way to see the actual total, or at least enough of it to see
- whether there is justification (which would be 900 on-topic messages over 90
- days), is to scroll down to the bottom of the page (or press the [End] key)
- and double-click the 9 under Goooooooooogle, which will take you to the 901st
- message if there is one. [Note: This is why "100 messages" is selected instead
- of the default "10 messages".] The glitch is meaningless if the top line is:
-
- Relevant English Messages for "_______" from 28 Feb 2001 to 27 May 2001
- Results 901-1000 of about #,###.
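-
- For anyone keeping notes on several searches, the check on the top line
- can be sketched as below. It assumes the line has been copied by hand from
- the results page as a single line and follows the "Results N-M of about X"
- wording shown in this FAQ; the function names and the regular expression
- are illustrative only and are not part of Google Groups.
-
```python
import re
from typing import Tuple

# Matches e.g. "Results 1- 100 of about 12,400." or "Results 1 - 4 of about 6."
RESULTS_LINE = re.compile(
    r"Results\s+(\d[\d,]*)\s*-\s*(\d[\d,]*)\s+of\s+about\s+(\d[\d,]*)"
)


def parse_results_line(line: str) -> Tuple[int, int, int]:
    """Return (first, last, about) from a copied results header line."""
    match = RESULTS_LINE.search(line)
    if match is None:
        raise ValueError(f"no results summary found in: {line!r}")
    first, last, about = (int(g.replace(",", "")) for g in match.groups())
    return first, last, about


def page_confirms_threshold(line: str, required: int = 900) -> bool:
    """True if the page shown starts beyond the required number of hits.

    Only the first result number is trusted; the "about" estimate is ignored
    because, as described above, it can be badly wrong.
    """
    first, _last, _about = parse_results_line(line)
    return first > required


print(parse_results_line("Results 1- 100 of about 12,400."))
print(page_confirms_threshold("Results 901-1000 of about 1,230."))
```
-
- The "1,230" in the last line is a made-up estimate used only to exercise
- the check. Reaching the 901st result shows that enough hits exist, not
- that they are all on-topic.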
-
- Things to avoid
-
- Most of the things that can falsely inflate results show up on the last
- pages. A weekly Frequently Asked Questions (FAQ) posting on the topic, or one
- containing a reference to it, will produce 12-14 identical results with only
- one being valid. Far worse than this is when the subject ends up in someone's
- signature: if they post a few messages per day, they can create a few hundred
- false hits in the 90-day period. A sig hit requires a search for the author
- over the same time frame to determine the total number of hits the sig has
- caused, followed by a count of the actual posts that author made on the
- subject being searched.
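-
- One way to read the corrections described above is as simple subtraction
- from the raw count. The sketch below is only that reading: the function
- name, its parameters, and the sample numbers are all hypothetical, and the
- counts themselves still have to be gathered by hand from the searches.
-
```python
def adjusted_hits(raw_hits: int, faq_copies: int = 0, sig_hits: int = 0,
                  sig_author_on_topic: int = 0) -> int:
    """Estimate on-topic hits after removing FAQ repeats and sig-only hits.

    faq_copies          -- identical FAQ postings found (only one is valid)
    sig_hits            -- total hits caused by one author's signature
    sig_author_on_topic -- that author's posts genuinely about the subject
    """
    if sig_author_on_topic > sig_hits:
        raise ValueError("on-topic posts cannot exceed the author's hits")
    faq_duplicates = max(faq_copies - 1, 0)   # keep a single copy of the FAQ
    false_sig_hits = sig_hits - sig_author_on_topic
    return raw_hits - faq_duplicates - false_sig_hits


# Hypothetical figures: 1,000 raw hits, 13 weekly FAQ copies, and a signature
# responsible for 250 hits of which 20 were real posts on the subject.
remaining = adjusted_hits(1000, faq_copies=13, sig_hits=250,
                          sig_author_on_topic=20)
print(remaining, "hits left,", round(remaining / 90, 2), "per day")
```
-
- With these made-up figures a search that looked comfortably above 900
- hits falls below the recommendation once the false hits are removed.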
-
- ... END ...