Saturday, December 9, 2006

The greatest crisis facing us...

Hi, everyone :),

I am going to postpone the second half of the encyclopedia thing for a groundbreaking speech I read.

"The greatest crisis facing us is not Russia, not the Atom Bomb, not corruption in government, not encroaching hunger, nor the morals of the young. It is a crisis in the organization and accessibility of human knowledge. We own an enormous "encyclopedia" - which isn't even arranged alphabetically. Our "file cards" are spilled on the floor, nor were they ever in order. The answers we want may be buried somewhere in the heap, but it might take a lifetime to locate two already known facts, place them side by side and derive a third fact, the one we urgently need.
Call it the crisis of the Librarian.
We need a new "specialist" who is not a specialist, but a synthesist. (n) We need a new science to be a perfect secretary to all other sciences."

Who wrote this? David Lynch? Google's founders Larry Page and Sergey Brin?
Nope

Robert A. Heinlein, in 1950, in a piece originally called "Where To?". In this piece Heinlein makes several predictions, or extrapolations, about the future. It was reprinted as "Pandora's Box" in the book The Worlds of Robert A. Heinlein, published by Ace Books, New York, in 1962. In the 1962 version, after twelve years of thought and history had gone by, he amended some of his predictions, but this one remained the same. Then in 1980 it was reprinted again by Ace Books in Robert A. Heinlein's Expanded Universe as "Where To?" (pp. 317-371). After thirty years of reflection and history he again had the opportunity to revise his statements. Many of them were changed, but this one was not. In the list of forerunners of these "synthesists" that he made in 1965, he mentions several job titles but does not include librarian as one of them.

In Snow Crash by Neal Stephenson, Bantam Books paperback, 1993 (p. 107), we read:

"The Librarian daemon looks like a pleasant, fiftyish,
silverhaired, bearded man with bright blue eyes, wearing
a V-neck sweater over a work shirt, with a coarsely
woven, tweedy-looking wool tie. The tie is loosened,
the sleeves pushed up. Even though he’s just a piece
of software, he has reason to be cheerful; he can move
through the nearly infinite stacks of information in the
Library with the agility of a spider dancing across a
vast web of cross references"
The "Librarian" is a piece of software that programs itself.
In her article "I, Librarian," Hilda Kruger writes: "Having an agent methodically crawling the Web, gathering the information you’ve specified, is a bit like having a full-time reference librarian residing in your PC."
(INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2005 p124).
She talks about these Intelligent Software Agents (ISAs) burrowing into our lives, but we are way beyond that. I have two different types of agents downloaded as add-ons to my Firefox browser.

One is "ClearForest Gnosis" (http://sws.clearforest.com/Blog/?page_id=32). Its stated function:
"ClearForest Gnosis uses advanced natural language processing techniques and ClearForest’s Semantic Web Service (SWS) to extract meaning from the content of any web page.

With a single click, Gnosis will identify the people, companies, organizations, geographies and products on the page you are viewing. Using the built-in navigation sidebar you can gain immediate understanding of the page’s contents."
The other is "read4me" XUL (http://read4me.sourceforge.net/wiki/index.php/Main_Page):
"Client front end to read4me server. read4me is a Python feed-reading web service. It reads RSS or atom feeds and, using Bayesian statistics, reports how much you will like the articles. This project includes a server and a Firefox extension client."
Then I organize it all with Zotero:
"Zotero [zoh-TAIR-oh] is a free, easy-to-use Firefox extension to help you collect, manage, and cite your research sources:

- Automatic capture of citation information from web pages

- Storage of PDFs, files, images, links, and whole web pages

- Flexible notetaking with autosave

- Fast, as-you-type search through your materials

- Playlist-like library organization, including saved searches (smart collections) and tags

- Platform for new forms of digital research that can be extended with other web tools and services

- Formatted citation export (style list to grow rapidly)"
In her abstract Kruger says, "One of the main concerns of this paper is the continued relevance of information professionals as infomediaries in our future society." Maybe it has already gone from continued relevance to no relevance? Out of all of these books and articles, there is a way forward offered in "Snow Crash." The Librarian states:
"I was not coded by a professional hacker, per se, but by a researcher at the Library of Congress who taught himself how to code. He devoted himself to sifting through vast amounts of irrelevant detail in order to find significant gems of information."

The hero of the story, "Hiro Protagonist," replies, "So he was kind of a meta-librarian." Is this our path? Most of these projects, including the two browser add-ons I described, are either open source or accept help from others in their development. The whole point behind Web 2.0 and open source software and publishing is group participation. Do we follow their lead? It appears that the direction of our profession has been to try to impose meta-data and proprietary databases upon the web and its users, and it was the wrong direction. Maybe it is time for a radical new direction. Maybe it is time for meta-librarians as programmers? If we can't take the lead in these new projects, maybe we can at least make a valuable contribution.

Let's talk it over, and I will make a list of new job duties for the meta-librarian and then get back to that encyclopedia mess.

R Philip Reynolds

Tuesday, December 5, 2006

Wikipedia Britannica and missing the point

The following article appeared in the academic journal "Nature":
http://www.nature.com/nature/journal/v438/n7070/full/438900a.html

Nature 438, 900-901 (15 December 2005) doi:10.1038/438900a

Special Report Internet encyclopaedias go head to head

In the article Nature makes some startling claims based on "research" it had done. The most polemical of these claims included statements such as:

"However, an expert-led investigation carried out by Nature — the first to use peer review to compare Wikipedia and Britannica's coverage of science....The exercise revealed numerous errors in both encyclopaedias, but among 42 entries tested, the difference in accuracy was not particularly great: the average science entry in Wikipedia contained around four inaccuracies; Britannica, about three."

"Yet Nature's investigation suggests that Britannica's advantage may not be great, at least when it comes to science entries. In the study, entries were chosen from the websites of Wikipedia and Encyclopaedia Britannica on a broad range of scientific disciplines and sent to a relevant expert for peer review....Only eight serious errors, such as misinterpretations of important concepts, were detected in the pairs of articles reviewed, four from each encyclopaedia. But reviewers also found many factual errors, omissions or misleading statements: 162 and 123 in Wikipedia and Britannica, respectively."

Michael Twidale was quoted as saying:

"People will find it shocking to see how many errors there are in Britannica," Twidale adds. "Print encyclopaedias are often set up as the gold standards of information quality against which the failings of faster or cheaper resources can be compared. These findings remind us that we have an 18-carat standard, not a 24-carat one."

Britannica, of course, responded to these assertions in a vehement response called "Fatally Flawed":
http://corporate.britannica.com/britannica_nature_response.pdf

In its response Britannica cited several points that it took issue with in the article.

"Almost everything about the journal’s investigation, from the criteria for
identifying inaccuracies to the discrepancy between the article text and its headline,
was wrong and misleading."
"Nature’s research was invalid. As we demonstrate below, almost everything about the journal’s investigation, from the criteria for identifying inaccuracies to the discrepancy between the article text and its headline, was wrong and misleading. Dozens of inaccuracies attributed to the Britannica were not inaccuracies at all, and a number of the articles Nature examined were not even in the Encyclopædia Britannica. The study was so poorly carried out and its findings so error-laden that it was completely without merit. We have produced this document to set the record straight, to reassure Britannica’s readers about the quality of our content, and to urge that Nature issue a full and public retraction of the article."

"Anyone who read the article with even a modicum of care would have noticed a discrepancy
between the headline and the data themselves. While the heading proclaimed that “Wikipedia comes close to Britannica in terms of the accuracy of its science entries,” the numbers buried deep in the body of the article said precisely the opposite: Wikipedia in fact had a third more inaccuracies than Britannica. (As we demonstrate below, Nature’s research grossly exaggerated Britannica’s inaccuracies, so we cite this figure only to point out the slanted way in which the numbers were presented.)"
"Nature reviewed text that was not even from the Encyclopædia Britannica. Several of the articles
Nature sent its reviewers were not from our core encyclopedia, and in one case it was not from
any Britannica publication at all."


"One Nature reviewer was sent only the 350-word introduction to Encyclopædia Britannica’s
6,000-word article on lipids. For Nature to have represented Britannica’s extensive coverage of
the subject with this short squib was absurd, and it invalidated the findings of omissions
alleged by the reviewer, since those matters were covered in sections of the article he or she
never saw.
Other reviewers were sent only sections taken from longer articles. For example, what the
Nature editors referred to as Britannica’s “articles” on “kin selection” and “punctuated equilibrium”
are actually separate sections of our article on the theory of evolution, written by one of
the foremost experts on evolution in the world. What they claimed to be an “article” on field-effect transistors was actually only one section of our article on integrated circuits. For Nature to have excerpted our articles in this way was irresponsible."

Britannica goes on to attack the study's methodology further and follows it with a 13-page appendix describing the errors made by the unnamed experts with unverified facts.

Nature answered Britannica's twenty-page statement with a three-page rebuttal that is reportedly point by point:
http://www.nature.com/nature/britannica/eb_advert_response_final.pdf

Nature starts out with the tone of a rebuttal, but the content essentially agrees with Britannica's accusations. Nature stated that it sent "numerical" information and "some samples of errors." Britannica then asked for the study's methodology, all of the errors and the reviewers' reports. Nature sent the information and put it on its web site but refused to send the reviewers' comments because "We asked reviewers whether they wished to have their name attached to their reviews, but almost all declined." Apparently Nature's experts were not prepared to stake their reputations on, or even claim, the assertions they made.

Nature asserts that all of the information given to reviewers was from Britannica. This assertion, however, loses its force because the sentence that immediately follows the claim states, "This was deliberate: the aim of our story, as we made clear, was to compare the online material available from Britannica and Wikipedia."

Nature then states that the article title was not misleading, noting that "The standfirst to the story read “Jimmy Wales’ Wikipedia comes close to Britannica in terms of the accuracy of its science entries, a Nature investigation finds.”"

They then cite the 4 Wikipedia errors to 3 Britannica errors per article as proof that they come close. They completely fail to mention the 162 Wikipedia errors to the 123 Britannica errors described in the original article. They do mention that Britannica addresses almost half of the 123 errors in its response, which, if Britannica is right, would lower the number of its errors and increase the ratio of errors in Wikipedia to Britannica. How are we supposed to compare these numbers? It sounds like both sides are playing games, or that Nature's methodology was poor in that it had no consistent or meaningful way to record and report its metrics in the context of the comparisons.
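
To see how much the comparison depends on which numbers you pick, here is a rough sketch in Python that uses only the figures quoted from the Nature article (42 entries, 162 and 123 errors). The function names and the 50% "disputed share" are my own assumptions for illustration: "almost half" is not an exact figure, and nobody has agreed that the disputed errors should be dropped.

```python
# Figures quoted from the Nature article; nothing here is new data.
ENTRIES = 42
WIKIPEDIA_ERRORS = 162
BRITANNICA_ERRORS = 123


def per_entry(total_errors: float, entries: int = ENTRIES) -> float:
    """Average number of errors per reviewed entry."""
    return total_errors / entries


def percent_more(a: float, b: float) -> float:
    """How many percent larger a is than b."""
    return (a / b - 1) * 100


wiki_avg = per_entry(WIKIPEDIA_ERRORS)
brit_avg = per_entry(BRITANNICA_ERRORS)

# Gap computed from the raw totals (162 vs 123).
raw_gap = percent_more(WIKIPEDIA_ERRORS, BRITANNICA_ERRORS)

# Gap computed from the rounded "four vs three" averages.
rounded_gap = percent_more(round(wiki_avg), round(brit_avg))

# Hypothetical scenario (my assumption, not a finding): half of the
# errors charged to Britannica are withdrawn, none of Wikipedia's are.
HYPOTHETICAL_DISPUTED_SHARE = 0.5
remaining_britannica = BRITANNICA_ERRORS * (1 - HYPOTHETICAL_DISPUTED_SHARE)
adjusted_gap = percent_more(WIKIPEDIA_ERRORS, remaining_britannica)

print(f"Per entry: Wikipedia {wiki_avg:.2f}, Britannica {brit_avg:.2f}")
print(f"Wikipedia has {raw_gap:.1f}% more errors by raw totals")
print(f"Wikipedia has {rounded_gap:.1f}% more errors by rounded averages")
print(f"Wikipedia has {adjusted_gap:.1f}% more errors in the hypothetical scenario")
```

The same data can be reported as "four versus three" or as roughly a third more errors, and the gap changes a great deal depending on how many disputed errors you count. That is why it matters that neither side agreed on how to count.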

The one statement that comes across as a bald-faced lie is when Nature says,

"Another part of Britannica’s criticism concerns the fact that we provided material from other Britannica publications, such as the Britannica Book of the Year. This was deliberate: the aim of our story, as we made clear, was to compare the online material available from Britannica and Wikipedia."
This is laughable. The article repeatedly fails to address the web issue and never mentions the Britannica web site.

"Internet encyclopaedias go head to head"

They do not say Wikipedia's site goes against Britannica's site or even Britannica.com.

Later they ask "...if Wikipedia is as accurate as established sources such as Encyclopaedia Britannica?" By established sources they seem to be referring to "...the oldest continuously published reference work in the English language" (Encyclopaedia Britannica), not to a web site that is a few years old. In fact they almost seem to go out of their way to say just "Britannica" and never say Britannica's web site or Britannica's publications. The Nature rebuttal and article also lack any kind of notation. Britannica, on the other hand, has a set of 10 extensive footnotes on the first eight pages of its response. The bibliographies in its encyclopedia provide further authoritative resources, not to mention things written by the article contributors. I wonder if Nature's 'study' would have been different if they had had to produce a bibliography of supporting sources or even had to identify their contributors? Nature also failed to mention whether they used the fee-based content available through the Britannica site and many libraries or just the free services.

Britannica, Nature, the reviewers and the editors all agreed on one thing. "Reviewers told the journal that many of the Wikipedia articles were “poorly structured and confusing.”" "Former Britannica editor Robert McHenry declared one Wikipedia entry — on US founding father Alexander Hamilton — as "what might be expected of a high-school student"."

So what does this all mean? Can I use Wikipedia or Britannica? Which one should I choose? As my title indicates, both seem to miss the point. Each fails to adequately identify its user groups' habits or needs for its product, or its product's role in the research process. They miss the point and argue about mostly petty mistakes. The only books I have ever heard called perfect are the Quran, Bible and Book of Mormon.

Interestingly enough, the next year one of Wikipedia's founders announced the creation of a Wikipedia rival.
Wikipedia founder plans rival
By Richard Waters in San Francisco
Published: October 16 2006 21:08 Last updated: October 16 2006 21:08

"The latest venture from Larry Sanger, who helped create Wikipedia in 2001, is intended to bring more order to this creative chaos by drawing on traditional measures of authority. Though still open to submissions from anyone, the power to authorise articles will be given to editors who can prove their expertise, as well as a group of volunteer “constables”, charged with keeping the peace between warring interests.
Accusing Wikipedia of failing to control its writers and editors, he said: “The latest articles don't represent a consensus view – they tend to become what the most persistent ‘posters’ say.”"

Next time we will look at encyclopedias and Wikipedia and see what their role is in the research process and when and how they should be used.

Twisted Sage
Forever learning but never coming to a knowledge of the truth.

The Register
Nature mag cooked Wikipedia study

The Register
Wikipedia science 31% more cronky than Britannica's