The Tao of Searching: accuracy


Monday, April 23, 2007

Skewed Surveys: or Lies, Damn Lies, and Statistics

MSNBC has put up a web page entitled "About our Live Votes and surveys: How 1,000 people can be more representative than 200,000"
http://www.msnbc.msn.com/id/3704453/

This is an important addition to the discussion about information literacy. It is a concise and informative article about how polls, surveys, and online votes can differ greatly in results even on the same topic and with the same questions. It is also gratifying to see a major media outlet not only be so circumspect about how it presents information, but also be open about it with the public. Some of the more interesting statements include:

One week in the middle of the Clinton-Lewinsky scandal, more than 200,000 people took part in an MSNBC Live Vote that asked whether President Clinton should leave office. Seventy-three percent said yes. That same week, an NBC News-Wall Street Journal poll found that only 34 percent of about 2,000 people who were surveyed thought so.

To explain the vast gap in the numbers in this and other similar cases, it is necessary to look at the difference in the two kinds of surveys.

While a poll of 100 people will be more accurate than a poll of 10, studies have shown that accuracy begins to improve less at about 500 people and increases only a minor amount beyond 1,000 people.

Random selection of those polled is necessary to ensure a broad representation of the population at large.

To begin with, the people who respond choose to do so — they are not randomly selected and asked to participate, but instead make the choice to read a story about a certain topic and then vote on a related question. There is thus no guarantee that the votes would reflect anything close to a statistical sample...

This is a good and brief explanation of statistical sampling and reliability that I think would be useful for everybody to review. It is a good reminder of things we tend to forget. To many of us these points may seem obvious; however, when reading an article, book, or web site, it is easy to just accept the statistics offered without considering the way they were collected and the context in which they are delivered. With a plethora of information providers, both ethical and less ethical, it is now more important than ever to check the sources and verify information with separate and independent resources.
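For anyone who wants to see the diminishing returns in numbers, here is a rough sketch. It is my own illustration, not something from the MSNBC article. It assumes a simple random sample, a 95% confidence level (z = 1.96), and the worst-case split of 50/50 (p = 0.5), and it uses the standard textbook approximation for the margin of error of a proportion. The function name and the list of sample sizes are made up for the example.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    Assumptions (mine, for illustration): respondents are randomly selected,
    p=0.5 is the worst case, and z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Sample sizes mentioned in the quoted passages, plus a few in between.
sample_sizes = [10, 100, 500, 1000, 2000, 200000]
margins = {n: margin_of_error(n) for n in sample_sizes}

for n, moe in margins.items():
    print(f"n = {n:>7,}: +/- {moe * 100:.1f} percentage points")
```

Under these assumptions the margin shrinks from roughly 31 points at 10 respondents to about 3 points at 1,000, and only a little more after that. The formula only holds for randomly selected respondents, so the tiny figure it gives for 200,000 says nothing about a self-selected online vote of that size.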

Tuesday, January 9, 2007

A Little More Wikipedia before we discuss research uses

From Shout Blog

Mediocre Top Ten List

I was reading an interesting bit in Chris Anderson’s The Long Tail about how top 10 lists of things that are spread thinly across multiple categories tend to be banal. His primary example is that of top 10 lists of artists from all genres. ... such a list–it’s simply a jumble of popular artists....

... I was reminded of a top 10 contest two my friends and I played ... in college. We had decided to create a list of the best songs of all time. We would each come up with 10 songs that we felt would make the list, and we would go down the line and eliminate songs from each others’ lists until we only had 10 left–presumably, the 10 best.

We were really excited, as if the final list would be a miracle list straight from the heavens.

We quickly went to the task of whittling down the 3 lists. One person would exclaim, “No way that song would ever be on my list. That’s out!” And thus, this went on ....

But, as the number of songs dwindled, we began to notice something. .... It started to look like a Billboard music chart.

We were horrified. .... Our tastes are eclectic and niche. The last thing we wanted was a final list that mirrored the pop charts.

The real problem underlying it all was that our tastes were different enough to cause a “graying” of the final list. Combining our tastes into one list resulted in a bland popular songs list, without any artists that delved deeply into a genre...."

I think this can be related to what Alexis de Tocqueville called "the tyranny of the majority" in chapters 15 and 16 of "Democracy in America." This rule would also come into effect in Wikipedia, where the majority opinion, as opposed to objective research, must ultimately rule. However, eventually it became the tyranny of the unemployed and of teenagers with a lot of time on their hands, who could follow entries and change them immediately after someone made a correction. One of the founders, Sanger, describes things this way: "It's a relatively few, difficult to deal with people that cause the problems, and once a quorum of such people were at work on the Wikipedia system,"

In chapter 16 de Tocqueville states, "When the central government which represents that majority has issued a decree, it must entrust the execution of its will to agents over whom it frequently has no control and whom it cannot perpetually direct. The townships, municipal bodies, and counties form so many concealed breakwaters, which check or part the tide of popular determination." Unfortunately, in Wikipedia these checks come in the form of uninformed individuals, special interest groups, or incompetents. Sanger also comments on the poor writing exhibited in some of the articles: "It's really that the skills to marshal an argument, and represent the facts correctly are all skills encouraged by a solid liberal arts education. It's a problem associated more with a lack of training in the liberal arts."

In an article in the Chronicle of Higher Education, Jimmy Wales, "founder" of Wikipedia, warned students not to refer to Wikipedia. He goes on to say, "For God sake, you're in college; don't cite the encyclopedia."

Ultimately, web sites reflect the organization or entity that builds them. Wikipedia represents a wide swath of Western democracy, or current Western civilization's "hive mind," clearly and effectively. As such, it is more a symptom of our culture, especially in the United States among those aged 14 to 44, than a simple problem of a few people writing some inaccurate information. That is why the debate is so heated. An indictment of Wikipedia is ultimately an indictment of "We the People" and of the entire Web 2.0 concept and the "hive mind." Unfortunately, the "People" who choose to participate in Wikipedia often appear to be ill-informed, biased, and able to shout louder (post more often) than the more informed and moderate voices in our society, or in the Wikipedia project.

R Philip Reynolds

Tuesday, December 5, 2006

Wikipedia Britannica and missing the point

The following article appeared in the academic journal "Nature":
http://www.nature.com/nature/journal/v438/n7070/full/438900a.html

Nature 438, 900-901 (15 December 2005) doi:10.1038/438900a

Special Report: Internet encyclopaedias go head to head

In the article, Nature makes some startling claims based on "research" it had done. The most contentious of these claims included statements such as:

"However, an expert-led investigation carried out by Nature — the first to use peer review to compare Wikipedia and Britannica's coverage of science....The exercise revealed numerous errors in both encyclopaedias, but among 42 entries tested, the difference in accuracy was not particularly great: the average science entry in Wikipedia contained around four inaccuracies; Britannica, about three."

"Yet Nature's investigation suggests that Britannica's advantage may not be great, at least when it comes to science entries. In the study, entries were chosen from the websites of Wikipedia and Encyclopaedia Britannica on a broad range of scientific disciplines and sent to a relevant expert for peer review....Only eight serious errors, such as misinterpretations of important concepts, were detected in the pairs of articles reviewed, four from each encyclopaedia. But reviewers also found many factual errors, omissions or misleading statements: 162 and 123 in Wikipedia and Britannica, respectively."

Michael Twidale was quoted as saying:

"People will find it shocking to see how many errors there are in Britannica," Twidale adds. "Print encyclopaedias are often set up as the gold standards of information quality against which the failings of faster or cheaper resources can be compared. These findings remind us that we have an 18-carat standard, not a 24-carat one."

Britannica, of course, responded to these assertions with a vehement response titled "Fatally Flawed":
http://corporate.britannica.com/britannica_nature_response.pdf

In its response Britannica cited several points that it took issue with in the article.

"Almost everything about the journal’s investigation, from the criteria for identifying inaccuracies to the discrepancy between the article text and its headline, was wrong and misleading."

"Nature’s research was invalid. As we demonstrate below, almost everything about the journal’s investigation, from the criteria for identifying inaccuracies to the discrepancy between the article text and its headline, was wrong and misleading. Dozens of inaccuracies attributed to the Britannica were not inaccuracies at all, and a number of the articles Nature examined were not even in the Encyclopædia Britannica. The study was so poorly carried out and its findings so error-laden that it was completely without merit. We have produced this document to set the record straight, to reassure Britannica’s readers about the quality of our content, and to urge that Nature issue a full and public retraction of the article."

"Anyone who read the article with even a modicum of care would have noticed a discrepancy between the headline and the data themselves. While the heading proclaimed that “Wikipedia comes close to Britannica in terms of the accuracy of its science entries,” the numbers buried deep in the body of the article said precisely the opposite: Wikipedia in fact had a third more inaccuracies than Britannica. (As we demonstrate below, Nature’s research grossly exaggerated Britannica’s inaccuracies, so we cite this figure only to point out the slanted way in which the numbers were presented.)"

"Nature reviewed text that was not even from the Encyclopædia Britannica. Several of the articles Nature sent its reviewers were not from our core encyclopedia, and in one case it was not from any Britannica publication at all."

"One Nature reviewer was sent only the 350-word introduction to Encyclopædia Britannica’s 6,000-word article on lipids. For Nature to have represented Britannica’s extensive coverage of the subject with this short squib was absurd, and it invalidated the findings of omissions alleged by the reviewer, since those matters were covered in sections of the article he or she never saw.
Other reviewers were sent only sections taken from longer articles. For example, what the Nature editors referred to as Britannica’s “articles” on “kin selection” and “punctuated equilibrium” are actually separate sections of our article on the theory of evolution, written by one of the foremost experts on evolution in the world. What they claimed to be an “article” on field-effect transistors was actually only one section of our article on integrated circuits. For Nature to have excerpted our articles in this way was irresponsible."

Britannica goes on to further attack the study's methodology and follows it with a 13-page appendix describing the errors made by the unnamed experts with unverified facts.

Nature replied with a reportedly point-by-point, three-page rebuttal to Britannica's twenty-page statement:
http://www.nature.com/nature/britannica/eb_advert_response_final.pdf

Nature starts out with the tone of a rebuttal, but the content essentially agrees with Britannica's accusations. Nature stated that it sent "numerical" information and "some samples of errors." Britannica then asked for the study's methodology, all of the errors, and the reviewers' reports. Nature sent the information and put it on its web site but refused to send the reviewers' comments because "We asked reviewers whether they wished to have their name attached to their reviews, but almost all declined." Apparently Nature's experts were not prepared to stake their reputations on, or even put their names to, the assertions they made.

Nature asserts that all of the information given to reviewers was from Britannica. This assertion, however, loses its force because the sentence that immediately follows the claim states, "This was deliberate: the aim of our story, as we made clear, was to compare the online material available from Britannica and Wikipedia."

Nature then states that the article title was not misleading: "The standfirst to the story read “Jimmy Wales’ Wikipedia comes close to Britannica in terms of the accuracy of its science entries, a Nature investigation finds.”"

They then cite the ratio of 4 Wikipedia errors to 3 Britannica errors per article as proof that the two come close. They completely fail to mention the 162 Wikipedia errors and 123 Britannica errors described in the original article. They do mention that Britannica addresses almost half of the 123 errors in its response, which, if accepted, would lower Britannica's error count and increase the ratio of Wikipedia errors to Britannica errors. How are we supposed to compare these numbers? It sounds as though both sides are playing games, or that Nature's methodology was poor in that it had no consistent or meaningful way to record and report its metrics in the context of the comparisons.
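To see how the same data can be made to sound "close" or "a third more," here is a small piece of arithmetic using only the figures quoted from the Nature article: 42 entries, 162 Wikipedia errors, and 123 Britannica errors. The variable names are my own, and the sketch takes Nature's counts at face value. It does not adjust for the errors Britannica disputes, since neither side gives an agreed corrected total.

```python
# Figures as quoted from the Nature article; nothing here is new data.
ENTRIES = 42
WIKIPEDIA_ERRORS = 162
BRITANNICA_ERRORS = 123

# Average errors per entry, the "about four" versus "about three" framing.
wikipedia_per_entry = WIKIPEDIA_ERRORS / ENTRIES
britannica_per_entry = BRITANNICA_ERRORS / ENTRIES

# Relative excess of Wikipedia errors over Britannica errors.
excess_from_totals = WIKIPEDIA_ERRORS / BRITANNICA_ERRORS - 1
excess_from_rounded = round(wikipedia_per_entry) / round(britannica_per_entry) - 1

print(f"Wikipedia:  {wikipedia_per_entry:.2f} errors per entry")
print(f"Britannica: {britannica_per_entry:.2f} errors per entry")
print(f"Excess from totals:           {excess_from_totals:.1%}")
print(f"Excess from rounded averages: {excess_from_rounded:.1%}")
```

Both framings come from the same counts. Saying "four versus three" sounds like a near tie, while saying "about a third more" sounds like a clear gap, which is part of why the two sides can argue past each other.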

The one statement that comes across as a bald-faced lie is when Nature says,

"Another part of Britannica’s criticism concerns the fact that we provided material from other Britannica publications, such as the Britannica Book of the Year. This was deliberate: the aim of our story, as we made clear, was to compare the online material available from Britannica and Wikipedia."

This is laughable. The article repeatedly fails to address the web issue and never mentions the Britannica web site.

"Internet encyclopaedias go head to head"

They do not say Wikipedia's site goes against Britannica's site or even Britannica.com.

Later they ask "...if Wikipedia is as accurate as established sources such as Encyclopaedia Britannica?" By established sources they seem to be referring to "...the oldest continuously published reference work in the English language" (Encyclopaedia Britannica), not to a web site that is a few years old. In fact they almost seem to go out of their way to say just "Britannica" and never say Britannica's web site or Britannica's publications. The Nature rebuttal and article also lack any kind of notation. Britannica, on the other hand, has a set of 10 extensive footnotes on the first eight pages of its response. The bibliographies in its encyclopedia provide further authoritative resources, not to mention things written by the article contributors. I wonder if Nature's 'study' would have been different if they had had to produce a bibliography of supporting sources or even had to identify their contributors? Nature also failed to mention whether they used the fee-based content available through the Britannica site and many libraries or just the free services.

Britannica, Nature, the reviewers, and the editors all agreed on one thing. "Reviewers told the journal that many of the Wikipedia articles were “poorly structured and confusing.”" Former Britannica editor Robert McHenry declared one Wikipedia entry — on US founding father Alexander Hamilton — as "what might be expected of a high-school student".

So what does this all mean? Can I use Wikipedia or Britannica? Which one should I choose? As my title indicates, both seem to miss the point. Each fails to adequately identify its user groups' habits or needs for its products, or its products' role in the research process. They miss the point and argue about mostly petty mistakes. The only books I have ever heard called perfect are the Quran, Bible and Book of Mormon.

Interestingly enough, the next year one of Wikipedia's founders announced the creation of a Wikipedia rival.
Wikipedia founder plans rival
By Richard Waters in San Francisco
Published: October 16 2006 21:08 Last updated: October 16 2006 21:08

"The latest venture from Larry Sanger, who helped create Wikipedia in 2001, is intended to bring more order to this creative chaos by drawing on traditional measures of authority. Though still open to submissions from anyone, the power to authorise articles will be given to editors who can prove their expertise, as well as a group of volunteer “constables”, charged with keeping the peace between warring interests.
Accusing Wikipedia of failing to control its writers and editors, he said: “The latest articles don't represent a consensus view – they tend to become what the most persistent ‘posters’ say.”"

Next time we will look at encyclopedias and Wikipedia and see what their role is in the research process and when and how they should be used.

Twisted Sage
Forever learning but never coming to a knowledge of the truth.

The Register
Nature mag cooked Wikipedia study

The Register
Wikipedia science 31% more cronky than Britannica's
