Planet Cataloging

March 27, 2015

OCLC Cataloging and Metadata News

January and February data update now available for the WorldCat knowledge base

The WorldCat knowledge base continues to grow with new providers and collections added monthly.  The details for January and February updates are now available in the full release notes.

March 27, 2015 08:00 PM

First Thus

ACAT linked data question

On 24/02/2015 21.31, David Bigwood wrote:
> I still think this distributed system might fail at searching. Sure, it can pull in data and display it when a record is selected. But when I search, will it follow the dozen links in each record? And then will it follow the links from each of those endpoints?
> A patron comes in and does a KW search for Project Apollo. Will it search all the TOC links for all records in the collection? And follow all the subject links to VIAF and then to Wikipedia and redo the search based on all the terms retrieved from those sites? Will it follow all the names in all the records to VIAF to see if VIAF or dbpedia match any of those to Project Apollo? Then will it follow all the links to the full-text of all the summary notes in all the records to see if it gets hits on Project Apollo in any of those remote resources? As it’s a KW search it could be anything, so maybe searching MESH, AAT, AGRICOLA, the NASA Thesaurus, GeoRef Thesaurus and so on would also need to happen. Any cross references from those sources would have to be searched again against the whole system. What if any of these resources are down? One KW search at a university could easily generate billions of links.

Systems can be built that will search all of that and display it how we want. And yes, it is easier said than done, especially if the purpose is to build something that is genuinely useful in practical terms for the public. But to see it at work, there is the Google Books API, which automatically searches Google’s database and returns different options. For instance, I can search Princeton University’s catalog for “Electronic funds transfers” as a subject and find a record with a Google Books link displayed alongside it. How did that work? In the background, the catalog searched Google’s database and automatically brought back the book cover and other information. It probably searched by ISBN, but you can search in all kinds of ways. It could do much more if the programmers wanted it to.

It is very possible that if I searched Google Books for “Electronic funds transfers” I would not find this book. It is also important to note that if the Google Books site goes down, you don’t see big empty boxes filled with question marks in Princeton’s catalog: you just don’t see anything at all. Google knows that broken boxes would be bad for them and prefers that things “fail gracefully”. The user never realizes that something has gone wrong. All in all, this system seems to work pretty well. Of course, it could be broadened widely if you could get back the full text of the book, but the Google Books–publishers’ agreement went down the tubes. For now.
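To make this concrete, here is a rough sketch in Python of the look-up-and-fail-gracefully pattern described above, using the public Google Books API volumes endpoint. The function names and the shape of the returned data are my own invention for illustration, not Princeton’s actual code:

```python
import json
import urllib.request

def google_books_info(isbn, timeout=3):
    """Look up a book on the Google Books API by ISBN.

    Returns a small dict (title, thumbnail URL) or None if anything
    goes wrong, so the caller can simply display nothing -- the
    "fail gracefully" behaviour described above.
    """
    url = f"https://www.googleapis.com/books/v1/volumes?q=isbn:{isbn}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = json.load(resp)
    except Exception:
        return None  # network or API trouble: show nothing at all
    return extract_info(data)

def extract_info(data):
    """Pull the title and cover thumbnail out of an API response."""
    items = data.get("items") or []
    if not items:
        return None
    info = items[0].get("volumeInfo", {})
    return {"title": info.get("title"),
            "thumbnail": info.get("imageLinks", {}).get("thumbnail")}
```

A catalog page would call `google_books_info()` while building the display and simply skip the cover box whenever it returns None, which is exactly what the user sees in Princeton’s catalog when Google is unreachable.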

One of the problems we are facing is that we are at the very beginning of what will undoubtedly be a long process, and it is almost impossible to imagine what the final product could be. Probably it will be something a lot of us would disagree with. For instance, fundamental changes are occurring right now in the process of “search” which will have profound impacts on what the public expects. One of the latest, and most disturbing, fads is the rise of “predictive search”: algorithms that rely on computers monitoring our every waking (and sleeping) moment, tools that tend to take on lives of their own in order to predict what we want before we even realize it ourselves.

But in defense of programmers, they would look at linked data in a completely different way and see it as a logical outcome of what they have been doing for decades. Let me try to explain quickly:

Almost any computer program is not a single entity but is actually a conglomeration of lots and lots of smaller programs (called scripts or APIs or other names) that the programmer brings together (“includes”) for his or her purposes. So, what seems to the user to be a single screen on a computer–such as the program you are reading this posting on–is actually composed of dozens (or more) of these smaller scripts, some determining the header, the footer, navigation, whether you can delete or print or save, and so on.

There are similar capabilities with “server-side includes” (SSI) where a web programmer can include specific pages or bits of other files depending on various criteria. As an example, a site may look different to you depending on whether you are logged-in or not. If you are logged-in to the system, you may have options to email things to yourself, to save, to see what other users are online, or whatever, but if you are not logged-in you see none of that. This is done with SSI, where the programmer has written something like “if this person is logged-in, then add this file (or include it) to the display or run this program. If not, do not add the file”. There can be many, many variations on this.
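The “if this person is logged-in, then include this file” idea can be sketched in a few lines. This is a toy illustration in Python; the fragment names and defaults are hypothetical:

```python
def render_page(body, user=None, fragments=None):
    """Assemble a page from smaller pieces, the way server-side
    includes do: which fragments get stitched in depends on a
    condition -- here, whether the user is logged in."""
    fragments = fragments or {}
    parts = [fragments.get("header", "<header>Site</header>")]
    if user is not None:  # the "if logged-in" test
        parts.append(fragments.get("toolbar", "<nav>email | save</nav>"))
    parts.append(body)
    parts.append(fragments.get("footer", "<footer>(c) 2015</footer>"))
    return "\n".join(parts)
```

An anonymous visitor and a logged-in member get different pages built from the same pieces, and neither one ever sees the conditional logic behind it.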

Linked data does something similar. If certain conditions are met, take a file from another site and use it to build the page for the user. The user doesn’t need to be aware of any of it. At base, the function is the same, it is the scale that is different.

In any case, this is what the library community and the cataloging community have been aiming at for quite a while now.

James Weinheimer
First Thus
First Thus Facebook Page
Cooperative Cataloging Rules
Cataloging Matters Podcasts


by James Weinheimer at March 27, 2015 03:30 PM

RDA-L RE: RE: Re: RE: RE: Re: publisher info

On 26/03/2015 22.16, Amanda Cossham wrote:

If a university has compulsory user education or information literacy classes, that’s a good opportunity to teach, but it’s usually a one-off. If users come into the library, there’s a chance to engage with them, but more and more they sit at home and go online; we’re just one of a range of competing options. And what about public, school, and special libraries? They are different situations again.

Of course there are exceptions to all this. It’s a varied situation across many different library types in many countries, with many different users who have widely differing needs. But, we need to be where users are, not expect that they will learn our way of doing things, regardless of the quality of library catalogues. Even sophisticated users find catalogues somewhat frustrating, if not a lot frustrating.

This reminds me of a controversial article that quoted a “Mr. Line” (I have not been able to find out any more about this person) who said that the term user education is “meaningless, inaccurate, pretentious and patronising and that if only librarians would spend the time and effort to ensure that their libraries are more user friendly then they wouldn’t have to spend so much time doing user education.” I did not like this statement at first, but slowly realized that Mr. Line only stated clearly what a lot of people were thinking. (I discussed this in a podcast.)

The amazing thing today is that catalogs are virtual, and that means they do not have to look and operate the same for everyone. Just as Google gives different choices to Amanda in New Zealand from what I see in Italy, depending on what the system knows of each of us, so too can a single catalog work differently for different users and different purposes: there can be a child’s interface, an undergraduate’s, a scholar’s. Even the expert in Enlightenment philosophy who wants the most detailed records possible for John Locke and David Hume, probably doesn’t want the same level of detail for all his or her searches on other subjects, e.g. people can also be interested in teaching methods, or jazz or chess and they do not need or want the same detail for all of it. While librarians need that detail for everything for their work, rarely do members of the public.

As Amanda mentioned, this is not catering to the lowest common denominator: it is tailoring the catalog to provide the most relevant information to each user. That is one of the powers that the new systems provide, and we experience it all the time with all of the “customization” on websites and apps. People no longer have to browse the same cards (records) that are in the same order (main entry). In the earlier environment, there was no choice and people had to be trained. Of course, people won’t do it today, and I have demonstrated that even when you know how to do it (as I do) it still doesn’t work!

The problem is that our catalogs are still fundamentally structured to provide information in the traditional ways, following Cutter’s Rules for a Dictionary Catalog. Note that word “dictionary”: it meant that people were expected to use the catalog like a 19th-century printed dictionary, browsing alphabetical lists of text. That fundamental structure has never changed. Alphabetical browsing lets people see the cross-references and notes that the 19th-century catalog designers knew were indispensable, but those days are long gone and people no longer see those cross-references and notes. I think people would *love* to see a cross-reference when they search, e.g. “wwi battles”, that told them:
“Search also under:
World War, 1914-1918–Aerial operations.
World War, 1914-1918–Campaigns.
World War, 1914-1918–Naval operations.”

or, when looking at something with the subject heading “Labor movement”, a note such as this:
“This subject split in [date]. Items cataloged before that date used Labor and laboring classes which then split into Labor movement and Working class. Therefore, to make a complete search on “Labor movement” you must also search “Labor and laboring classes” otherwise you will be missing older items.”

It’s a sad truth but a fact: although people were able to see these references in card catalogs, they do not today. And they haven’t for a long time.

So it should not be a surprise that people find the catalog terribly frustrating and turn to other options whenever possible. These are some of the reasons why I think that RDA, Bibframe and linked data will not make any difference to the public until these more basic problems are dealt with.


by James Weinheimer at March 27, 2015 01:45 PM

March 26, 2015

First Thus

ACAT linked data question

On 2/24/2015 4:26 PM, Williams, Ann wrote:
> I was just wondering how linked data will affect OPAC searching and discovery vs. a record with text approach. For example, we have various 856 links to publisher, summary and biographical information in our OPAC as well as ISBNs linking to ContentCafe. But none of that content is discoverable in the OPAC and it requires a further click on the part of patrons (many of whom won’t click).

People would immediately be able to see the difference in the linked data universe with those 856 links. In this case, the information at the end of the links will be able to display in your catalog without the need to click on them, *and* without the need of copying it all into your records. So, the publishers could change the information, update and so on, and you wouldn’t have to do anything for your users to benefit from their work.

As an example, see this record in the LC catalog. Right now, there are 856 links that you have to click on to see some biographical info and the publisher’s blurb, and as you point out, nobody does that. In a linked data universe, it will display in any way you want: just in the page, or it can appear only when you run the mouse over a special area of the window; it might appear in one of those horrible pop-up windows that have become the bane of the web, or it could all be in blood-red 24 point font–whatever you want.

And the information will *not* be in the catalog; the catalog will have only the link and the information will exist elsewhere. Right now, the information is on LC’s servers (in the enhancements folder), but this information can be anywhere.

When you begin to imagine new possibilities, all kinds of links can be used. For instance, what would people prefer? The link to the biography supplied by the publisher, which is pretty sparse: “Ted Jones is a writer and journalist who specializes in Travel and the Arts. He is the author of The French Riviera: A Literary Guide for Travellers. He currently resides in the South of France.” Or would they prefer a link to the author’s personal website? And if that website were coded correctly for linked data (it is easier to do this now than ever before), so many things could be done that it becomes difficult to even imagine the possibilities.

How will this affect OPAC searching, discovery and display? Nobody knows. Lots of people have their own ideas–me too–but nobody knows what will actually happen. I bet that most of it will be decided on by non-catalogers, and very possibly by non-librarians.

There definitely are great possibilities with linked data, but there are some very definite downsides as well, plus complexities will arise that catalogers have never really had to deal with before.

That is another discussion though!

James Weinheimer
First Thus
First Thus Facebook Page
Personal Facebook Page
Google+
Cooperative Cataloging Rules
Cataloging Matters Podcasts
The Library Herald


by James Weinheimer at March 26, 2015 06:49 PM


Moving to Wikidata

VIAF has long interchanged data with Wikipedia, and the resulting links between library authorities and Wikipedia are widely used.  Unfortunately, we only harvested data from the English Wikipedia, so we missed names, identifiers and other information in non-English Wikipedia pages.

Fortunately the problem VIAF had with Wikipedia was similar to the problems that Wikipedia itself had in sharing data across language versions.  Wikidata is Wikimedia's solution to the problem, and over the last year or two has grown from promising to useful.  In fact, from VIAF's point of view Wikidata now looks substantially better than just working with the English pages.  In addition to picking up many more titles for names, we are finding a million names that do not occur in the English pages, and the number of names that match those in other VIAF sources has nearly doubled, to 800 thousand from 440 thousand.

Since we (i.e., Jenny Toves) were reexamining the process, we took the opportunity to harvest corporate/organization names as well, something we have wanted for some time; some 300K of the increase comes from those.

We expect to have the new data in VIAF in mid to late April 2015, and it is visible now in our test system.

The advantages we see:

  • Much less bias towards English
  • More entities (people and organizations)
  • More coded information about the entities
  • More non-Latin forms of names
  • More links into Wikipedia

This will cause some changes in the data that are visible in the VIAF interface.  One of these is that VIAF will link to the Wikidata pages rather than the English Wikipedia pages, and we are changing the WKP icon to reflect that (Wikipedia (en) to Wikidata).  This means that Jane Austen's WKP identifier (VIAF's abbreviation for Wikipedia) will change from WKP|Jane_Austen to WKP|Q36322, and links to the WKP source page will change accordingly.


Although it is possible to jump from the Wikidata pages to Wikipedia pages in specific languages, we feel these links are important enough that we will be importing all the language specific Wikipedia page links we find in the Wikidata.  These will show up as 'external links' in the interface in the 'About' section of the display.
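As a sketch of how a harvester might pull those language-specific sitelinks: Wikidata publishes each item's data as JSON at its Special:EntityData endpoint. The helper names below are illustrative, not VIAF's actual harvesting code:

```python
import json
import urllib.request

WIKIDATA_ENTITY_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"

def fetch_entity(qid, timeout=5):
    """Download the JSON description of one Wikidata item, e.g. Q36322."""
    url = WIKIDATA_ENTITY_URL.format(qid=qid)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["entities"][qid]

def sitelinks(entity):
    """Return {wiki: page title} for the language-specific Wikipedia
    pages linked from the item -- the 'external links' described above."""
    return {name: link["title"]
            for name, link in entity.get("sitelinks", {}).items()
            if name.endswith("wiki") and name != "commonswiki"}
```

For an item like Jane Austen this yields one entry per language edition, which is exactly the set of links being imported into the 'About' section.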

A commonly used bulk file from VIAF is the 'links' file that shows all the links made between VIAF identifiers and source file identifiers (pointers to the bulk files can be found here).  The links file includes external links, so the individual Wikipedia pages will show up in the file along with the Wikidata WKP IDs.  Here are some of the current links in the file for Lorcan Dempsey:

BAV|ADV11117013
BNF|12276780
. . .
SUDOC|031580661
WKP|Lorcan_Dempsey
XA|2219

The new file will change to:

BAV|ADV11117013
BNF|12276780
. . .
WKP|Q6678817
WKP|
XA|2219


Lorcan only has one Wikipedia page, the English language one.  Jane Austen has more than a hundred, and all those links will be there.
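A rough sketch of how a consumer might group such links by cluster when processing the bulk file. I am assuming a simple 'VIAF_ID&lt;TAB&gt;SOURCE|SOURCE_ID' line layout for illustration; the real file's format may differ:

```python
from collections import defaultdict

def parse_links(lines):
    """Group source-file identifiers by VIAF cluster.

    Assumes each line looks like 'VIAF_ID<TAB>SOURCE|SOURCE_ID'
    (an assumed layout -- check the actual bulk file documentation).
    """
    clusters = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        viaf_id, _, pair = line.partition("\t")
        source, _, source_id = pair.partition("|")
        clusters[viaf_id].append((source, source_id))
    return dict(clusters)
```

With the new file, a single cluster would simply carry several WKP entries: the Wikidata ID plus one per Wikipedia language edition.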

Of course, this also means some changes to the RDF view of the data.  We're still working on that and will post more information when we get it closer to its final form.


by Thom at March 26, 2015 01:43 PM

Mod Librarian

5 Things Thursday: Coca-Cola Archives, DAM, Metadata

Here are five interesting things:

  1. Why librarians make perfect digital asset managers
  2. Cool DAM infographic
  3. Ted Ryan shows us around the Coca-Cola archives
  4. Erin Leach lobbies on behalf of library metadata
  5. Taxonomy versus controlled vocabulary

View On WordPress

March 26, 2015 12:06 PM

First Thus

ACAT RDA authority files

On 23/02/2015 20.58, Gene Fieg wrote:
> While doing some updating of authorized personal names here at home, I have > noticed that some authority records are coded as rda, but very few, > sometimes none, of the fields are used even when the data is in record. > Such data as the birth and death dates, affiliation, gender, etc. >
> If those fields are not included but the info is readily available, then > all the talk about linked data is empty. Just empty. But if we are > serious about it, let’s get on the bus and do the work when it comes of > “rda’ing” authority records.
It is important not to confuse RDA with linked data. They are separate and do not need one another. There are many sites that are fully compatible with linked data right now that have nothing to do with RDA–and never will. RDA can also exist without linked data, quite easily.

The idea of linked data is *not* to copy all information into a single “mother of all databases,” then put it on the web in some linked-data-friendly format such as BIBFRAME, and then expect everyone to use what you have made. Rather, the idea of linked data is to put the information you have on the web in specific ways so that others can leverage the information you have, taking what they want and ignoring other parts, and in this way make it easier for everybody. In return, you can then use information on other sites for your own purposes. As I have pointed out earlier: the purpose of linked data is *not* to make your data obsolete! It is to enhance everyone’s data.

Sites that utilize linked data can make use of information in Wikipedia or Google Maps and do not have to copy anything from Wikipedia or Google Maps. Linked data sites just add the link(s) to the relevant information they want, and then the machines work their magic. Therefore, in a correctly built system, if information is in the Wikipedia/dbpedia/Wikidata conglomeration, it is available to you just as easily as if it were in your own database. This saves everyone from copying everything over and over and over, which has always been a huge waste of resources and effort. (Or at least it has been portrayed that way.)

As an example, I just found a site which contains a database of literary magazines. Even though this site has been made for writers, it looks to me as if there is some nice genre information in there, as well as other information that could appeal to the public. *If* this database were made available in specific formats in specific ways (and it may be already), this information could be made available to the public using library tools, and to do it, catalogers would not have to copy any information at all, only the links. And if it turned out that the links could be automatically generated, e.g. by using ISSNs, there would be no need for catalogers to do anything at all.
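As a sketch of the "automatically generated by ISSN" idea: before generating any links, a program would want to validate candidate ISSNs against their check digit (weights 8 down to 2 over the first seven digits, summed mod 11, with 'X' standing for 10). A minimal version:

```python
def valid_issn(issn):
    """Validate an ISSN such as '0378-5955' against its check digit."""
    digits = issn.replace("-", "").upper()
    if len(digits) != 8 or not digits[:7].isdigit():
        return False
    # weights 8, 7, ..., 2 over the first seven digits
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return digits[7] == ("X" if check == 10 else str(check))
```

Only records whose ISSN passes this check would have a link generated, so a typo in the record never produces a dead link.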

One of the conundrums for catalogers is this: linked data is not their job. Building it is a job for programmers. Certainly, catalogers can be involved to help the programmers figure out which information should be equivalent to names, to titles, to subjects, and so on but this is only if programmers want the help. Often they prefer to do it entirely on their own.

So, when discussing linked data, instead of manually adding birth and death dates, affiliation, gender, etc. because RDA says to do so, we should be asking whether this information exists anywhere else on the web and if so, what it would take to make it usable through linked data.

James Weinheimer
First Thus
First Thus Facebook Page
Cooperative Cataloging Rules
Cataloging Matters Podcasts


by James Weinheimer at March 26, 2015 12:03 PM

March 23, 2015

TSLL TechScans

Libhub initiative launches sponsorship program

The Libhub Initiative, founded by Zepheira to encourage the use of Linked Data to expand library visibility on the Web, has announced a sponsorship program which now includes Atlas Systems, Innovative, and SirsiDynix. Through their sponsorship, each of these industry leaders acknowledges the critical importance of visibility for libraries and shows interest in the use of technologies that support libraries wherever their customers may be.

See details at (from Library Technology Guides, March 23, 2015).

by Marlene Bubrick at March 23, 2015 04:55 PM

Resource Description & Access (RDA)

RDA Cataloging Rules for Pseudonyms with MARC 21 Examples

Pseudonym: A name used by a person (either alone or in collaboration with others) that is not the person’s real name. (RDA Toolkit Glossary)
Pseudonym: A name assumed by an author to conceal or obscure his or her identity. (AACR2 Glossary)

RDA Cataloging Rules for Pseudonyms

Frequently Asked Questions (FAQ) on Library of Congress / Program for Cooperative Cataloging practice for creating Name Authority Records (NARs) for persons who use Pseudonyms:

Q1. How many NARs are needed for contemporary persons who use pseudonyms?

Q2. How do I decide which name to choose as the basic heading when creating NARs for a person with multiple pseudonyms? (DCM Z1 663— Complex see-also reference—Names)

Q3. Is it required to make a NAR for every pseudonym associated with a contemporary person? Some persons purport to have ten or more pseudonyms but in my catalog we only have works under one or two of those names – are there limits set on the number of NARs required? (DCM Z1 667 Cataloger Note section) 

Q4. Should the 663 note technique also be used in a corporate name NAR when providing 500 see-also references for the members of a group? (RDA Chapter 30) 

Q5. Can the 663 note be used without coding the 500 field with subfield $w nnnc? (MARC 21 Format for Authority Records, 663 field)

Q6. What about creating NARs for non-contemporary persons? Where is the guidance? 

Q7. What about different real names used concurrently by authors? 

Q8. How do I handle a situation when a pseudonym conflicts with another name and there is no information to add to either name to differentiate them? Do I create an undifferentiated NAR (or add the name to an undifferentiated NAR if it already exists)? Do I add the prescribed 663 note as well as the 500 coded “nnnc” to the undifferentiated NAR?

Q9. How do I handle LC classification numbers: Do I add the same LCC (053) on each NAR of a pseudonym? What if the pseudonym is used on non-literary works?

See also following Resource Description & Access (RDA) Blog posts:

by Salman Haider at March 23, 2015 04:43 AM

March 21, 2015

First Thus

Koha Cataloguing in Koha

On 19/02/2015 10.48, Diana Haderup wrote:
> I’m new to the list and have a question concerning cataloguing in Koha (version 3.16.04).
> It pertains to the MARC fields 110/111 or 710/711 respectively. All of them have the First Indicator in common, which is defined as follows: >
> First Indicator
> Type of corporate name entry element
> 0 – Inverted name
> 1 – Jurisdiction name
> 2 – Name in direct order
> (see:
> The LOC examples did not help much to interpret the meaning of the first indicator. Can somebody shed light on how to understand the numbers’ descriptions and when to choose which number? > I’d be grateful for examples as well.
I don’t know if this is the list for regular cataloging questions. Most of those go to lists such as Autocat. To answer your question:
The “2” is used in most cases. The “1” is used with some kind of government agency where the government agency controls a jurisdiction (national, down to local such as a city). Therefore, you will see in the examples:

110 1#$aUnited States.$bNational Technical Information Service
110 1#$aMinnesota.$bConstitutional Convention

because the US and Minnesota are jurisdictions, while any non-jurisdictional agency will have a 2, e.g.

110 2#$aInternational Labour Organisation.$bEuropean Regional Conference

because while the ILO is an inter-governmental agency, it is not a jurisdiction.

But note that there is also

110 2#$aCatholic Church.$bConcilium Plenarium Americae Latinae

where the Catholic Church gets a 2 because it is not a jurisdiction, but

110 1_ |a Papal States. |b Congregazione del buon governo and
110 1_ |a Vatican City. |b Comitato centrale del grande giubileo dell’anno duemila

these get a 1 because they are jurisdictions.

This also goes for references, e.g.
110 2_ |a New York Public Library
has a reference
410 1_ |a New York (N.Y.). |b Public Library
as a subbody of the jurisdiction of New York City.

Determining whether bodies should be entered subordinately to larger bodies or not can be difficult. It can be especially tough for government (jurisdictional) bodies. The rules for AACR2 are 24.17-24.19, plus the RIs at

Whenever I explain these things, I have to stop and discuss why cataloging rules can be so complicated. Often it makes sense, but in this case, I have never in my life seen how a 1 or 2 makes any difference in searching or display (I have never run across a 0). In theory, I guess you could have the “1” file differently from the “2” so that you could–in theory–get government bodies filed separately (if anybody would want it), but it wouldn’t work anyway because the rules for determining the form of name of a subordinate body are so obscure that the result would be almost totally arbitrary. For instance, as we saw, “New York Public Library” is *not* considered a subbody of the jurisdiction of New York! Although it is! It’s only a reference. Explain that to a member of the public!
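For what it is worth, reading the first indicator is trivial for a machine; deciding and explaining its value is the hard part. A small illustrative parser over the textual field format used in the examples above (function names are my own):

```python
# MARC 21 X10 first indicator: type of corporate name entry element
INDICATOR_MEANINGS = {
    "0": "Inverted name",
    "1": "Jurisdiction name",
    "2": "Name in direct order",
}

def first_indicator(field_text):
    """Read the first indicator from a textual 110/710 field such as
    '110 1#$aUnited States.$bNational Technical Information Service'."""
    tag, rest = field_text.split(" ", 1)
    ind1 = rest[0]  # the character right after the tag and space
    return ind1, INDICATOR_MEANINGS.get(ind1, "undefined")
```

The machine can always tell you *that* a field says "Jurisdiction name"; it cannot tell you *why* the Catholic Church is a 2 while the Papal States are a 1.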

And how these forms will be at all relevant in the world of linked data, I really cannot imagine. These are some of the issues I have hoped the world of cataloging would address, but they haven’t. Oh well, back to work!

James Weinheimer
First Thus
First Thus Facebook Page
Personal Facebook Page
Google+
Cooperative Cataloging Rules
Cataloging Matters Podcasts
The Library Herald


by James Weinheimer at March 21, 2015 06:29 PM

March 19, 2015

Mod Librarian

5 Things Thursday: Upcoming Events, DAM, Keywords

Here are five things including some great spring/summer events:

  1. Seattle taxonomists rejoice! Heather Hedden presents on Taxonomy Displays on April 28th.
  2. In NY, many fine speakers have been announced for DAM NY. Early bird ends tomorrow.
  3. In June at the SLA conference, a full day on taxonomy on June 13th.
  4. From David Riecks, Caption and Keywording Guidelines.
  5. Integrate your DAM, DAM is not an island.

View On WordPress

March 19, 2015 12:13 PM

March 18, 2015

TSLL TechScans

Feedback on Library of Congress' Recommended Format Specifications

Last year the Library of Congress released Recommended Format Specifications (see post here) to serve as a guide for long term preservation and access to both analog and digital materials. As they move forward, and to maintain the currency of the guide, an annual review of the formats is being implemented. To this end, they are requesting feedback before March 31, 2015 that will  be taken into consideration during this year's review. Feedback can be addressed to one of the email contacts here.

Additional information about the review process is available on The Signal.

by Lauren Seney at March 18, 2015 02:09 PM

Terry's Worklog

MarcEdit 6.0 Update

List of changes below:

** Bug Fix: Delimited Text Translator: Constant data, when used on a field that doesn’t exist, is not applied.  This has been corrected.
** Bug Fix: Swap Field Function: Swapping control field data (fields below 010) using position + length syntax (example 35:3 to take 3 bytes, starting at position 35) not functioning.  This has been corrected.
** Enhancement: RDA Helper: Checked options are now remembered.
** Bug Fix: RDA Helper: Abbreviation Expansion timing was moved in the last update.  Moved back to ensure expansion happens prior to data being converted.
** Enhancement: Validator error message refinement.
** Enhancement: RDA Helper Abbreviation Mapping table was updated.
** Enhancement: MarcEditor Print Records Per Page — program will print one bib record (or bib record + holdings records + authority records) per page.
** Bug Fix: Preferences Window: If MarcEdit attempts to process a font that isn’t compatible, then an error may be thrown.  A new error trap has been added to prevent this error.

You can get the new download either through MarcEdit’s automatic update tool, or by downloading the program directly from:


by reeset at March 18, 2015 02:20 AM

March 17, 2015

OCLC Cataloging and Metadata News

Moving from WorldCat Collection Sets to WorldShare Collection Manager

As you probably have heard, the process to manage MARC record delivery for your WorldCat Collection Sets for selected vendor sets is changing and will be managed in WorldShare Collection Manager.

March 17, 2015 03:00 PM

March 15, 2015

First Thus

ACAT LC Authorities down?

Posting to Autocat

On 2/12/2015 9:38 PM, Galen Charlton wrote:

On Thu, Feb 12, 2015 at 3:07 PM, J. McRee Elrod wrote:

Looking forward to linked data are we?

Actually, yes. Besides the consideration that any competent designer of a library discovery system that ingests RDF triples would of course implement local caching to guard against big SPARQL endpoints having service glitches, a world where all metadata of interest to libraries is web-accessible and has clear (and open) licensing terms, would make it /easier/ to maintain multiple archives of it.

For example, even today I can download a copy of the entire VIAF dataset [1] and do what I want with it, including keeping a copy of it for long-term preservation or building a new service using it. Various other library datasets are readily available [2].

This is a significant improvement over the days where it was necessary to drop a few thousand dollars to get one’s own copy of the NAF and SAF.

This is absolutely true, in theory. We must wait to see what happens in practice. Of course, while each library can keep a copy of VIAF, it is expensive to do anything with it. To implement linked data will demand a long-term and significant investment from different organizations to create a series of mirror sites, thereby easing the load on a single server, and creating redundancies for those times when something actually goes down.
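The local caching Galen describes can be sketched very simply: keep a timestamped copy of each response, and when the remote endpoint fails, serve the stale copy rather than an error. A toy version, with all names hypothetical:

```python
import time

class CachedFetcher:
    """Wrap a fetch function with a time-limited local cache, so a
    remote endpoint outage returns stale-but-usable data instead of
    an error -- the 'guard against service glitches' idea above."""

    def __init__(self, fetch, max_age=3600):
        self.fetch = fetch          # e.g. a SPARQL or HTTP GET function
        self.max_age = max_age      # seconds before a copy counts as stale
        self.cache = {}             # url -> (timestamp, payload)

    def get(self, url):
        now = time.time()
        hit = self.cache.get(url)
        if hit and now - hit[0] < self.max_age:
            return hit[1]           # fresh local copy, no network needed
        try:
            payload = self.fetch(url)
        except Exception:
            if hit:
                return hit[1]       # endpoint down: serve the stale copy
            raise                   # never fetched before: nothing to serve
        self.cache[url] = (now, payload)
        return payload
```

Real discovery systems would persist the cache and add size limits, but the principle is the same: the remote dataset being temporarily unreachable does not have to mean the catalog display breaks.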

All of the expertise and technology exist today and are being put to use by all kinds of organizations, but behind it all needs to be a long-term investment of money. This is something that is difficult to come by these days, especially the “long-term” aspect. For instance, in 2010 Google bought Freebase, which was considered to be one of the major parts of the linked data universe, and Google has decided to shut it down, apparently after taking what they needed, and move it all to Wikidata. Wikidata is a relative newcomer (it began in 2012) and is paid for by “community contributions”. In this way, Freebase joins the long list of tools discontinued by Google. It shouldn’t be any surprise that there is no button in Wikidata that says “Ingest Freebase data”, and it is turning out to be a lot of work to move Freebase into Wikidata. Having major parts of the linked-data universe be “community funded” should also give us pause. While Wikipedia may be fine, I hope that Wikidata will be too. Finally, what all of this means for dbpedia is still unclear (at least to me).

Putting all of these issues aside, however, it is still very positive that–at last–the NAF and SAF are available to the public. But it is clear that we are only at the beginning of a long, and expensive, process to create something useful in a practical way for the public.


by James Weinheimer at March 15, 2015 02:30 PM