Planet Cataloging

October 21, 2014

First Thus

ACAT Best way to catalog geographic information?

Posting to Autocat

On 20/10/2014 22.45, Julie Huddle wrote:

I will be starting an internship which will involve cataloging. I have been asked to help develop the best way to record the geographic coordinates of research items so that patrons can find resources about a geographic area of interest. After reading Bidney’s 2010 article, I now have the following questions:
1. How difficult and effective would the official form of geographic terms be for this?
2. If I record the geographic coordinates of a resource, should I use the center or corner of the area covered?
3. Would using a geographic search interface such as MapHappy or Yahoo!Map be worth the trouble?

This is the sort of problem where linked data should ride to the rescue.

Instead of adding coordinates to each and every bib record (a terrifying notion!), those records should contain links to "something else" where the coordinates exist. This would normally mean links from the bib records to authority records, but unfortunately, this information does not exist in many, many, many of our geographic records, e.g. there is nothing in the record for Herculaneum (Extinct city) http://lccn.loc.gov/sh85060358 (one of the greatest archaeological sites in Italy), nor in the record for the little town in New Mexico where I grew up: http://lccn.loc.gov/n80085226.

But all of this is in dbpedia, e.g. for the little town in New Mexico: http://dbpedia.org/page/Socorro,_New_Mexico. The ultimate way it can work can be seen in Wikipedia (where the dbpedia information comes from) http://en.wikipedia.org/wiki/Socorro,_New_Mexico.

Close to the top are the coordinates that you can click on http://tools.wmflabs.org/geohack/geohack.php?pagename=Socorro%2C_New_Mexico&params=34_3_42_N_106_53_58_W_region:US_type:city

and from here, there are maps of all kinds: weather, traffic, historic, terrain, etc. etc. I personally like Night Lights.

So, I think the solution to your problem is to add links from authority files to something(?!) and then see what can be built, using any of the tools Wikipedia uses, or something new. As we see, none of this needs MARC format and it may be more efficient to add links to dbpedia instead of any library tools. Otherwise, it is a huge amount of work.
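To make "see what can be built" concrete: once a record carries a dbpedia link, the coordinates are one HTTP request away. A minimal sketch in Ruby (the geo property URIs are the wgs84 vocabulary dbpedia normally uses; treat the details as assumptions):

require 'net/http'
require 'json'
require 'uri'

# Given a dbpedia resource URI stored in a record, fetch dbpedia's JSON data
# for it and pull out the WGS84 latitude/longitude, if present.
def coordinates_for(dbpedia_uri)
  data_uri = dbpedia_uri.sub('/resource/', '/data/') + '.json'
  data = JSON.parse(Net::HTTP.get(URI(data_uri)))
  resource = data[dbpedia_uri] or return nil
  lat  = resource['http://www.w3.org/2003/01/geo/wgs84_pos#lat']
  long = resource['http://www.w3.org/2003/01/geo/wgs84_pos#long']
  [lat.first['value'], long.first['value']] if lat && long
end

coordinates_for('http://dbpedia.org/resource/Socorro,_New_Mexico')
# => something like [34.05, -106.9] (latitude, longitude)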

There is a lot of information available on the web that we can use to help us.


by James Weinheimer at October 21, 2014 12:20 PM

October 20, 2014

Resource Description & Access (RDA)

What is FRBR?

What is FRBR? -- RDA Quiz on Google+ Community RDA Cataloging.

Join the RDA Cataloging online community / group / forum to share ideas on RDA and discuss issues related to Resource Description and Access cataloging.



Following are the comments received on this RDA Blog post

<<<<<---------->>>>>


Roger Hawcroft
Roger
Library Consultant
Salman, FRBR is an acronym for Functional Requirements for Bibliographic Records. It stems from recommendations made by IFLA in 1988. FRBR represents the departure of bibliographic description from the long-standing linear model as used in AACR... to a multi-tiered concept contemporaneous with current technology and the increasing development of digital formats and storage. These principles underpin RDA - Resource Description & Access.

You may find the following outline useful:
http://www.loc.gov/cds/downloads/FRBR.PDF

I have also placed a list of readings (not intended to be comprehensive or entirely up-to-date) in DropBox for you:
https://www.dropbox.com/s/quf7nhmcm43r530/Selected%20Readings%20on%20FRBR%20%C2%A0%28April%202014%29.pdf?dl=0

An online search should fairly easily find you the latest papers / articles / opinions on this approach to cataloguing, and I am sure that you will find many librarians on LI who have plenty to say for and against it!

<<<<<---------->>>>>


Sris Ponniahpillai
Sris
Library Officer at University of Technology, Sydney
Salman, Hope the article in the following link would help you to understand what FRBR stands for in library terms. Thanks & Best Regards, Sris

http://www.loc.gov/cds/downloads/FRBR.PDF



<<<<<---------->>>>>


Alan Danskin
Alan
Metadata Standards Manager at The British Library
FRBR (Functional Requirements for Bibliographic Records) is a model published by IFLA. RDA is an implementation of the FRBR and FRAD (Functional Requirements for Authority Data) models. The FRBR Review Group is currently working on consolidation of these models and the Functional Requirements for Subject Authority Data (FRSAD) model. See http://www.ifla.org/frbr-rg and http://www.ifla.org/node/2016


<<<<<---------->>>>>






Harshadkumar Patel
Harshadkumar
Deputy Librarian, C.U. Shah Medical College
Functional Requirements for Bibliographic Records is a conceptual entity-relationship model developed by the International Federation of Library Associations and Institutions that relates user tasks of retrieval and access in online library catalogues and bibliographic databases from a user's perspective.








by Salman Haider (noreply@blogger.com) at October 20, 2014 09:54 PM

Bibliographic Wilderness

ActiveRecord Concurrency in Rails4: Avoid leaked connections!

My past long posts about multi-threaded concurrency in Rails ActiveRecord are some of the most visited posts on this blog, so I guess I’ll add another one here; if you’re a “tl;dr” type, you should probably bail now, but past long posts have proven useful to people over the long-term, so here it is.

I’m in the middle of updating my app that uses multi-threaded concurrency in unusual ways to Rails4. The good news is that the significant bugs I ran into in Rails 3.1 etc., reported in the earlier post, have been fixed.

However, the ActiveRecord concurrency model has always made it too easy to accidentally leak orphaned connections, and in Rails4 there’s no good way to recover these leaked connections. Later in this post, I’ll give you a monkey patch to ActiveRecord that will make it much harder to accidentally leak connections.

Background: The ActiveRecord Concurrency Model

Is pretty much described in the header docs for ConnectionPool, and the fundamental architecture and contract hasn’t changed since Rails 2.2.

Rails keeps a ConnectionPool of individual connections (usually network connections) to the database. Each connection can only be used by one thread at a time, and needs to be checked out and then checked back in when done.

You can check out a connection explicitly using the `checkout` and `checkin` methods. Or, better yet, use the `with_connection` method to wrap database use.  So far so good.
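In code, the two styles look roughly like this (a minimal sketch; `connection_pool` is the pool described above):

pool = ActiveRecord::Base.connection_pool

# Explicit style: you own the checkin, even when something raises.
conn = pool.checkout
begin
  conn.execute("SELECT 1")
ensure
  pool.checkin(conn)
end

# Block style: the pool checks the connection back in when the block exits.
pool.with_connection do |c|
  c.execute("SELECT 1")
end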

But ActiveRecord also supports an automatic/implicit checkout. If a thread performs an ActiveRecord operation, and that thread doesn’t already have a connection checked out to it (ActiveRecord keeps track of whether a thread has a checked out connection in Thread.current), then a connection will be silently, automatically, implicitly checked out to it. It still needs to be checked back in.

And you can call `ActiveRecord::Base.clear_active_connections!`, and all connections checked out to the calling thread will be checked back in. (Why might there be more than one connection checked out to the calling thread? Mostly only if you have more than one database in use, with some models in one database and others in others.)

And that’s what ordinary Rails use does, which is why you haven’t had to worry about connection checkouts before.  A Rails action method begins with no connections checked out to it; if and only if the action actually tries to do some ActiveRecord stuff, does a connection get lazily checked out to the thread.

And after the request has been processed and the response delivered, Rails itself will call `ActiveRecord::Base.clear_active_connections!` inside the thread that handled the request, checking back in any connections that were checked out.

The danger of leaked connections

So, if you are doing “normal” Rails things, you don’t need to worry about connection checkout/checkin. (modulo any bugs in AR).

But if you create your own threads to use ActiveRecord (inside or outside a Rails app, doesn’t matter), you absolutely do.  If you proceed blithely to use AR like you are used to in Rails, but have created Threads yourself — then connections will be automatically checked out to you when needed…. and never checked back in.

The best thing to do in your own threads is to wrap all AR use in a `with_connection`. But if some code somewhere accidentally does an AR operation outside of a `with_connection`, a connection will get checked out and never checked back in.
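Concretely, in a thread you create yourself, that looks something like this (the model and helper are just placeholders):

Thread.new do
  # do any non-database work out here...

  ActiveRecord::Base.connection_pool.with_connection do
    # hypothetical model and method, purely for illustration
    Widget.where(:processed => false).each { |w| handle_widget(w) }
  end
  # the connection has been checked back in at this point
end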

And if the thread then dies, the connection will become orphaned or leaked, and in fact there is no way in Rails4 to recover it.  If you leak one connection like this, that’s one less connection available in the ConnectionPool.  If you leak all the connections in the ConnectionPool, then there’s no more connections available, and next time anyone tries to use ActiveRecord, it’ll wait as long as the checkout_timeout (default 5 seconds; you can set it in your database.yml to something else) trying to get a connection, and then it’ll give up and throw a ConnectionTimeout. No more database access for you.

In Rails 3.x, there was a method `clear_stale_cached_connections!`, that would  go through the list of all checked out connections, cross-reference it against the list of all active threads, and if there were any checked out connections that were associated with a Thread that didn’t exist anymore, they’d be reclaimed.   You could call this method from time to time yourself to try and clean up after yourself.

And in fact, if you tried to check out a connection, and no connections were available — Rails 3.2 would call clear_stale_cached_connections! itself to see if there were any leaked connections that could be reclaimed, before raising a ConnectionTimeout. So if you were leaking connections all over the place, you still might not notice, the ConnectionPool would clean em up for you.

But this was a pretty expensive operation, and in Rails4, not only does the ConnectionPool not do this for you, but the method isn’t even available to you to call manually.  As far as I can tell, there is no way using public ActiveRecord API to clean up a leaked connection; once it’s leaked it’s gone.

So this makes it pretty important to avoid leaking connections.

(Note: There is still a method `clear_stale_cached_connections` in Rails4, but it’s been redefined in a way that doesn’t do the same thing at all, and does not do anything useful for leaked connection cleanup.  That it uses the same method name, I think, is based on misunderstanding by Rails devs of what it’s doing. See Fear the Reaper below. )

Monkey-patch AR to avoid leaked connections

I understand where Rails is coming from with the ‘implicit checkout’ thing.  For standard Rails use, they want to avoid checking out a connection for a request action if the action isn’t going to use AR at all. But they don’t want the developer to have to explicitly check out a connection, they want it to happen automatically. (In no previous version of Rails, going back to when AR didn’t do concurrency right at all in Rails 1.0 and 2.0-2.1, has the developer had to manually check out a connection in a standard Rails action method.)

So, okay, it lazily checks out a connection only when code tries to do an ActiveRecord operation, and then Rails checks it back in for you when the request processing is done.

The problem is, for any more general-purpose usage where you are managing your own threads, this is just a mess waiting to happen. It’s way too easy for code to ‘accidentally’ check out a connection that never gets checked back in and gets leaked, with no API available anymore to even recover the leaked connections. It’s way too error prone.

That API contract of “implicitly checkout a connection when needed without you realizing it, but you’re still responsible for checking it back in” is actually kind of insane. If we’re doing our own `Thread.new` and using ActiveRecord in it, we really want to disable that entirely, and so code is forced to do an explicit `with_connection` (or `checkout`, but `with_connection` is a really good idea).

So, here, in a gist, is a couple-dozen-line monkey patch to ActiveRecord that lets you, on a thread-by-thread basis, disable the “implicit checkout”.  Apply this monkey patch (just throw it in a config/initializer, that works), and if you’re ever manually creating a thread that might (even accidentally) use ActiveRecord, the first thing you should do is:

Thread.new do 
   ActiveRecord::Base.forbid_implicit_checkout_for_thread!

   # stuff
end

Once you’ve called `forbid_implicit_checkout_for_thread!` in a thread, that thread will be forbidden from doing an ‘implicit’ checkout.

If any code in that thread tries to do an ActiveRecord operation outside a `with_connection` without a checked out connection, instead of implicitly checking out a connection, you’ll get an ActiveRecord::ImplicitConnectionForbiddenError raised — immediately, fail fast, at the point the code wrongly ended up trying an implicit checkout.

This way you can enforce your code to only use `with_connection` like it should.

Note: This code is not battle-tested yet, but it seems to be working for me with `with_connection`. I have not tried it with explicitly checking out a connection with ‘checkout’, because I don’t entirely understand how that works.

DO fear the Reaper

In Rails4, the ConnectionPool has an under-documented thing called the “Reaper”, which might appear to be related to reclaiming leaked connections.  In fact, what public documentation there is says: “the Reaper, which attempts to find and close dead connections, which can occur if a programmer forgets to close a connection at the end of a thread or a thread dies unexpectedly. (Default nil, which means don’t run the Reaper).”

The problem is, as far as I can tell by reading the code, it simply does not do this.

What does the reaper do?  As far as I can tell trying to follow the code, it mostly looks for connections which have actually dropped their network connection to the database.

A leaked connection hasn’t necessarily dropped its network connection. That really depends on the database and its settings — most databases will drop unused connections after a certain idle timeout, by default often hours long.  A leaked connection probably hasn’t yet had its network connection closed, and a properly checked out not-leaked connection can have its network connection closed (say, there’s been a network interruption or error; or a very short idle timeout on the database).

The Reaper actually, if I’m reading the code right, has nothing to do with leaked connections at all. It’s targeting a completely different problem (dropped network connections, not checked-out-but-never-checked-in leaked connections). Dropped network is a legit problem you want handled gracefully; I have no idea how well the Reaper handles it (the Reaper is off by default, I don’t know how much use it’s gotten, and I have not put it through its paces myself). But it’s got nothing to do with leaked connections.

Someone thought it did, they wrote documentation suggesting that, and they redefined `clear_stale_cached_connections!` to use it. But I think they were mistaken. (Did not succeed at convincing @tenderlove of this when I tried a couple years ago when the code was just in unreleased master; but I also didn’t have a PR to offer, and I’m not sure what the PR should be; if anyone else wants to try, feel free!)

So, yeah, Rails4 has redefined the existing `clear_stale_cached_connections!` method to do something entirely different than it did in Rails3, and it’s triggered in entirely different circumstances. Yeah, kind of confusing.

Oh, maybe fear ruby 1.9.3 too

When I was working on upgrading the app I’m working on, I was occasionally getting a mysterious deadlock exception:

ThreadError: deadlock; recursive locking:

In retrospect, I think I had some bugs in my code and wouldn’t have run into that if my code had been behaving well. However, the fact that my errors resulted in that exception rather than a more meaningful one may possibly have been a bug in ruby 1.9.3 that’s fixed in ruby 2.0.

If you’re doing concurrency stuff, it seems wise to use ruby 2.0 or 2.1.

Can you use an already loaded AR model without a connection?

Let’s say you’ve already fetched an AR model in. Can a thread then use it, read-only, without ever trying to `save`, without needing a connection checkout?

Well, sort of. You might think, oh yeah, what if I follow a not yet loaded association, that’ll require a trip to the db, and thus a checked out connection, right? Yep, right.

Okay, what if you pre-load all the associations, then are you good? In Rails 3.2, I did this, and it seemed to be good.

But in Rails4, it seems that even though an association has been pre-loaded, the first time you access it, some under-the-hood things need an ActiveRecord Connection object. I don’t think it’ll end up taking a trip to the db (it has been pre-loaded after all), but it needs the connection object. Only the first time you access it. Which means it’ll check one out implicitly if you’re not careful. (Debugging this is actually what led me to the forbid_implicit_checkout stuff again).

Didn’t bother trying to report that as a bug, because AR doesn’t really make any guarantees that you can do anything at all with an AR model without a checked out connection; it doesn’t really consider that one way or another.

Safest thing to do is simply don’t touch an ActiveRecord model without a checked out connection. You never know what AR is going to do under the hood, and it may change from version to version.
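If a worker thread really does need to read a model that was loaded elsewhere, the conservative pattern is to preload what you need where a connection is already checked out, and still wrap the access in a `with_connection`. A rough sketch, with made-up model names:

# In the main thread, where a connection is checked out as usual:
book = Book.includes(:chapters).find(book_id)   # hypothetical models

Thread.new do
  # from the monkey patch above; fail fast on any accidental implicit checkout
  ActiveRecord::Base.forbid_implicit_checkout_for_thread!

  ActiveRecord::Base.connection_pool.with_connection do
    # even though :chapters was preloaded, the first access may still want
    # a connection object in Rails4, so wrap it anyway
    book.chapters.each { |chapter| handle(chapter) }   # hypothetical helper
  end
end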

Concurrency Patterns to Avoid in ActiveRecord?

Rails has officially supported multi-threaded request handling for years, but in Rails4 that support is turned on by default — although there still won’t actually be multi-threaded request handling going on unless you have an app server that does that (Puma, Passenger Enterprise, maybe something else).

So I’m not sure how many people are using multi-threaded request dispatch to find edge case bugs; still, it’s fairly high profile these days, and I think it’s probably fairly reliable.

If you are actually creating your own ActiveRecord-using threads manually though (whether in a Rails app or not; say in a background task system), from prior conversations @tenderlove’s preferred use case seemed to be creating a fixed number of threads in a thread pool, making sure the ConnectionPool has enough connections for all the threads, and letting each thread permanently check out and keep a connection.

I think you’re probably fairly safe doing that too, and it is the way background task pools are often set up.

That’s not what my app does.  I wouldn’t necessarily design my app the same way today if I was starting from scratch (the app was originally written for Rails 1.0, which gives you a sense of how old some of its design choices are; although the concurrency-related stuff really only dates from the relatively recent Rails 2.1 (!)).

My app creates a variable number of threads, each of which is doing something different (using a plugin system). The things it’s doing generally involve HTTP interactions with remote APIs, which is why I wanted to do them in concurrent threads (huge wall time speedup even with the GIL, yep). The threads do need to occasionally do ActiveRecord operations to look at input or store their output (I tried to avoid concurrency headaches by making all inter-thread communications go through the database; this is not a low-latency-requirement situation; I’m not sure how much headache I’ve avoided though!)

So I’ve got an indeterminate number of threads coming into and going out of existence, each of which needs only occasional ActiveRecord access. Theoretically, AR’s concurrency contract can handle this fine: just wrap all the AR access in a `with_connection`.  But this is definitely not the sort of concurrency use case AR is designed for and happy about. I’ve definitely spent a lot of time dealing with AR bugs (hopefully no longer!), and with parts of AR’s concurrency design that are less than optimal for my (theoretically supported) use case.

I’ve made it work. And it probably works better in Rails4 than any time previously (although I haven’t load tested my app yet under real conditions, upgrade still in progress). But, at this point,  I’d recommend avoiding using ActiveRecord concurrency this way.

What to do?

What would I do if I had it to do over again? Well, I don’t think I’d change my basic concurrency setup — lots of short-lived threads still makes a lot of sense to me for a workload like I’ve got, of highly diverse jobs that all do a lot of HTTP I/O.

At first, I was thinking “I wouldn’t use ActiveRecord, I’d use something else with a better concurrency story for me.”  DataMapper and Sequel have entirely different concurrency architectures; while they use similar connection pools, they try to spare you from having to know about it (at the cost of lots of expensive under-the-hood synchronization).

Except if I had actually acted on that when I thought about it a couple years ago, when DataMapper was the new hotness, I probably would have switched to or used DataMapper, and now I’d be stuck with a large unmaintained dependency. And be really regretting it. (And yeah, at one point I was this close to switching to Mongo instead of an rdbms, also happy I never got around to doing it).

I don’t think there is or is likely to be a ruby ORM as powerful, maintained, and likely to continue to be maintained throughout the life of your project, as ActiveRecord. (although I do hear good things about Sequel).  I think ActiveRecord is the safe bet — at least if your app is actually a Rails app.

So what would I do different? I’d try to have my worker threads not actually use AR at all. Instead of passing in an AR model as input, I’d fetch the AR model in some other, safer main thread, convert it to a pure business object without any AR, and pass that to my worker threads.  Instead of having my worker threads write their output out directly using AR, I’d have a dedicated thread pool of ‘writers’ (each of which held onto an AR connection for its entire lifetime), and have the indeterminate number of worker threads pass their output through a threadsafe queue to the dedicated threadpool of writers.
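Very roughly, the shape I have in mind is something like this (a sketch only, with made-up class and method names, not code from my app):

require 'thread'

output_queue = Queue.new    # thread-safe queue from the standard library

# A small fixed pool of writers; each holds one AR connection for its lifetime.
writers = 2.times.map do
  Thread.new do
    ActiveRecord::Base.connection_pool.with_connection do
      while (result = output_queue.pop)
        StoredResult.create!(result)    # hypothetical AR model
      end
    end
  end
end

# Any number of short-lived workers; they never touch ActiveRecord at all.
workers = jobs.map do |job|             # `jobs` here are plain non-AR objects
  Thread.new do
    output_queue << { :job_id => job.id, :body => do_http_work(job) }   # hypothetical
  end
end

workers.each(&:join)
writers.size.times { output_queue << nil }   # tell each writer it can finish
writers.each(&:join)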

That would have seemed like huge over-engineering to me at some point in the past, but at the moment it sounds like just the right amount of engineering if it lets me avoid using ActiveRecord in concurrency patterns that, while officially supported, it isn’t very happy about.


Filed under: General

by jrochkind at October 20, 2014 03:35 AM

October 19, 2014

Coyle's InFormation

This is what sexism looks like

[Note to readers: sick and tired of it all, I am going to report these "incidents" publicly because I just can't hack it anymore.]

I was in a meeting yesterday about RDF and application profiles, in which I made some comments, and was told by the co-chair: "we don't have time for that now", and the meeting went on.

Today, a man who was not in the meeting but who listened to the audio sent an email that said:
"I agree with Karen, if I correctly understood her point, that this is "dangerous territory".  On the call, that discussion was postponed for a later date, but I look forward to having that discussion as soon as possible because I think it is fundamental."
And he went on to talk about the issue, how important it is, and at one point referred to it as "The requirement is that a constraint language not replace (or "hijack") the original semantics of properties used in the data."

The co-chair (I am the other co-chair, although reconsidering, as you may imagine) replied:
"The requirement of not hijacking existing formal specification languages for expressing constraints that rely on different semantics has not been raised yet."
"Has not been raised?!" The email quoting me stated that I had raised it the very day before. But an important issue is "not raised" until a man brings it up. This in spite of the fact that the email quoting me made it clear that my statement during the meeting had indeed raised this issue.

Later, this co-chair posted a link to a W3C document in an email to me (on list) and stated:
"I'm going on holidays so won't have time to explain you, but I could, in theory (I've been trained to understand that formal stuff, a while ago)"
That is so f*cking condescending. This happened after I quoted from W3C documents to support my argument, and I believe I had a good point.

So, in case you haven't experienced it, or haven't recognized it happening around you, this is what sexism looks like. It looks like dismissing what women say, but taking the same argument seriously if a man says it, and it looks like purposely demeaning a woman by suggesting that she can't understand things without the help of a man.

I can't tell you how many times I have been subjected to this kind of behavior, and I'm sure that some of you know how weary I am of not being treated as an equal no matter how equal I really am.

Quiet no more, friends. Quiet no more.

(I want to thank everyone who has given me support and acknowledgment, either publicly or privately. It makes a huge difference.) 

Some links about "'Splaining"
http://scienceblogs.com/thusspakezuska/2010/01/25/you-may-be-a-mansplainer-if/
http://geekfeminism.wikia.com/wiki/Splaining

by Karen Coyle (noreply@blogger.com) at October 19, 2014 01:39 PM

schema.org - where it works

In the many talks about schema.org, it seems that one topic that isn't covered, or isn't covered sufficiently, is "where do you do it?" That is, where does it fit into your data flow? I'm going to give a simple, typical example. Your actual situation may vary, but I think this will help you figure out your own case.

The typical situation is that you have a database with your data. Searches go against that database, the results are extracted, a program formats these results into a web page, and the page is sent to the screen. Let's say that your database has data about authors, titles and dates. These are stored in your database in a way that you know which is which. A search is done, and let's say that the results of the search are:
author:  Williams, R
title: History of the industrial sewing machine
date: 1996
This is where you are in your data flow:

The next thing that happens (and remember, I'm speaking very generally) is that the results then are fed into a program that formats them into HTML, probably within a template that has all your headers, footers, sidebars and branding and sends the data to the browser. The flow now looks like

Let's say that you will display this as a citation, that looks like:
Williams, R. History of the industrial sewing machine. 1996.
Without any fancy formatting, the HTML for this is:
<p>Williams, R. History of the industrial sewing machine. 1996.</p>
Now we can see the problem that schema.org is designed to fix. You started with an author, a title and a date, but what you are showing to the world is a string of characters that are undifferentiated. You have lost all the information about what these represent. To a machine, this is just another of many bazillions of paragraphs on the web. Even if you format your data like this:
<p>Author: Williams, R.</p>
<p>Title: History of the industrial sewing machine</p>
<p>Date: 1996</p>
What a machine sees is:
<p>blah: blah</p>
<p>blah: blah</p>
<p>blah: blah</p>  
What we want is for the program that is formatting the HTML to also include some metadata from schema.org that retains the meaning of the data you are putting on the screen. So rather than just adding HTML formatting, it will also add formatting from schema.org. Schema.org has metadata elements for many different types of data. Using our example, let's say that this is a book, and here's how you could mark that up in schema.org:
<div vocab="http://schema.org/">
  <div typeof="Book">
    <p>
      <span property="author">Williams, R.</span> <span property="name">History of the industrial sewing machine</span>. <span property="datePublished">1996</span>.
    </p>
  </div>
</div>
Again, this is a very simple example, but when we test this code in the Google Rich Snippet tool, we can see that even this very simple example has added rich information that a search engine can make use of:
To see a more complex example, this is what Dan Scott and I have done to enrich the files of the Bryn Mawr Classical Reviews.

The review as seen in a browser (includes schema.org markup)

The review as seen by a tool that reads the structured schema.org data.

From these you can see a couple of things. The first is that the schema.org markup does not change how your pages look to a user viewing your data in a browser. The second is that hidden behind that simple page is a wealth of rich information that was not visible before.
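To tie this back to the data flow above: the only change is in the step that turns search results into HTML, which now emits the schema.org attributes along with the markup it was already producing. A minimal sketch in Ruby (any scripting language would do; this is not the actual code behind these pages, and the field names are just the author/title/date example from earlier):

# The search result, as it comes out of the database step:
result = { :author => "Williams, R", :title => "History of the industrial sewing machine", :date => "1996" }

# The formatting step: the same citation as before, plus the schema.org attributes.
html = <<-HTML
<div vocab="http://schema.org/">
  <div typeof="Book">
    <p>
      <span property="author">#{result[:author]}</span>.
      <span property="name">#{result[:title]}</span>.
      <span property="datePublished">#{result[:date]}</span>.
    </p>
  </div>
</div>
HTML

puts html   # hand off to the template / send to the browser as before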

Now you are probably wondering: well, what's that going to do for me? Who will use it? At the moment, the users of this data are the search engines, and they use the data to display all of that additional information that you see under a link:


In this snippet, the information about stars, ratings, type of film and audience comes from schema.org mark-up on the page.

Because the data is there, many of us think that other users and uses will evolve. The reverse of that is that, of course, if the information isn't there then those as yet undeveloped possibilities cannot happen.



by Karen Coyle (noreply@blogger.com) at October 19, 2014 10:10 AM

October 18, 2014

Bibliographic Wilderness

Google Scholar is 10 years old

An article by Steven Levy about the guy who founded the service, and its history:

Making the world’s problem solvers 10% more efficient: Ten years after a Google engineer empowered researchers with Scholar, he can’t bear to leave it

“Information had very strong geographical boundaries,” he says. “I come from a place where those boundaries are very, very apparent. They are in your face. To be able to make a dent in that is a very attractive proposition.”

Acharya’s continued leadership of a single, small team (now consisting of nine) is unusual at Google, and not necessarily seen as a smart thing by his peers. By concentrating on Scholar, Acharya in effect removed himself from the fast track at Google….  But he can’t bear to leave his creation, even as he realizes that at Google’s current scale, Scholar is a niche.

…But like it or not, the niche reality was reinforced after Larry Page took over as CEO in 2011, and adopted an approach of “more wood behind fewer arrows.” Scholar was not discarded — it still commands huge respect at Google which, after all, is largely populated by former academics—but clearly shunted to the back end of the quiver.

…Asked who informed him of what many referred to as Scholar’s “demotion,” Acharya says, “I don’t think they told me.” But he says that the lower profile isn’t a problem, because those who do use Scholar have no problem finding it. “If I had seen a drop in usage, I would worry tremendously,” he says. “There was no drop in usage. I also would have felt bad if I had been asked to give up resources, but we have always grown in both machine and people resources. I don’t feel demoted at all.”


Filed under: General

by jrochkind at October 18, 2014 03:47 PM

October 17, 2014

TSLL TechScans

New report offers recommendations to improve usage, discovery and access of e-content in libraries


A group of professionals from libraries, content providers and OCLC have published Success Strategies for Electronic Content Discovery and Access, a white paper that identifies data quality issues in the content supply chain and offers practical recommendations for improved usage, discovery and access of e-content in libraries.


Success Strategies for Electronic Content Discovery and Access offers solutions for the efficient exchange of high-quality data among libraries, data suppliers and service providers, such as:
  • Improve bibliographic metadata and holdings data
  • Synchronize bibliographic metadata and holdings data
  • Use consistent data formats.

See the article at http://www.librarytechnology.org/ltg-displaytext.pl?RC=19772

by noreply@blogger.com (Marlene Bubrick) at October 17, 2014 05:39 PM

Terry's Worklog

MarcEdit LibHub Plug-in

As libraries begin to join and participate in systems to test Bibframe principles, my hope is that when possible, I can provide support through MarcEdit to give these communities a conduit to simplify the publishing of information into those systems.  The first of these test systems is the Libhub Initiative, and working with Eric Miller and the really smart folks at Zepheira (http://zepheira.com/), I have created a plug-in specifically for libraries and partners working with the LibHub initiative.  The plug-in provides a mechanism to publish a variety of metadata formats into the system – MARC, MARCXML, EAD, and MODS data – and the process will hopefully help users contribute content and help spur discussion around the data model Zepheira is employing with this initiative.

For the time being, the plug-in is private, and available to any library currently participating in the LibHub project.  However, my understanding is that as they continue to ramp up the system, the plugin will be made available to the general community at large.

For now, I’ve published a video talking about the plug-in and demonstrating how it works.  If you are interested, you can view the video on YouTube.

 

–tr

by reeset at October 17, 2014 03:19 AM

Automated Language Translation using Microsoft’s Translation Services

We hear the refrain over and over – we live in a global community.  Socially, politically, economically – the ubiquity of the internet and free/cheap communications has definitely changed the world that we live in.  For software developers, this shift has definitely been felt as well.  My primary domain tends to focus around software built for the library community, but I’ve participated in a number of open source efforts in other domains as well, and while it is easier than ever to make one’s project/source available to the masses, efforts to localize said projects are still largely overlooked.  And why?  Well, doing internationalization work is hard and oftentimes requires large numbers of volunteers proficient in multiple languages to provide quality translations of content in a wide range of languages.  It also tends to slow down the development process and requires developers to create interfaces and inputs that support language sets that they themselves may not be able to test or validate.

Options

If your project team doesn’t have the language expertise to provide quality internationalization support, you have a variety of options available to you (with the best ones reserved for those with significant funding).  These range from tools available to open source projects, like TranslateWiki (https://translatewiki.net/wiki/Translating:New_project), which provides a platform for volunteers to participate in crowd-sourced translation services, to some very good subscription services like Transifex (https://www.transifex.com/), which again works as both a platform and a match-making service between projects and translators.  Additionally, Amazon’s Mechanical Turk can be utilized to provide one-off translation services at a fairly low cost.  The main point, though, is that services do exist that cover a wide spectrum in terms of cost and quality.  The challenge, of course, is that many of the services above require a significant amount of match-making, either on the part of the service or the individuals involved with the project, and oftentimes money.  All of this ultimately takes time, sometimes a significant amount of time, making it a difficult cost/benefit analysis to determine which languages one should invest the time and resources to support.

Automated Translation

This is a problem that I’ve been running into a lot lately.  I work on a number of projects where the primary user community hails largely from North America; or, well, the community that I interact with most often are fairly English language centric.  But that’s changing — I’ve seen a rapidly growing international community and increasing calls for localized versions of software or utilities that have traditionally had very niche audiences. 

I’ll use MarcEdit (http://marcedit.reeset.net) as an example.  Over the past 5 years, I’ve seen the number of users working with the program steadily increase, with much of that increase coming from a growing international user community.  Today, 1/3-1/2 of each month’s total application usage comes from outside of North America, a number that I would have never expected when I first started working on the program in 1999.  But things have changed, and finding ways to support these changing demographics is challenging.

In thinking about ways to provide better support for localization, one area that I found particularly interesting was the idea of marrying automated language translation with human intervention.  The idea being that a localized interface could be automatically generated using an automated translation tool to provide a “good enough” translation, which could also serve as the template for human volunteers to correct and improve.  This would enable support for a wide range of languages where English really is a barrier but no human volunteer has been secured to provide localized translation, and would give established communities a “good enough” template to use as a jumping-off point to improve and speed up the process of human-enhanced translation.  Additionally, as interfaces change and are updated, or new services are added, automated processes could generate the initial localization until a local expert was available to provide a high quality translation of the new content, to avoid slowing down the development and release process.

This is an idea that I’ve been pursuing for a number of months now, and over the past week, have been putting into practice.  Utilizing Microsoft’s Translation Services, I’ve been working on a process to extract all text strings from a C# application and generate localized language files for the content.  Once the files have been generated, I’ve been having the files evaluated by native speakers to comment on quality and usability…and for the most part, the results have been surprising.  While I had no expectation that the translations generated through any automated service would be comparable to human-mediated translation, I was pleasantly surprised to hear that the automated data is very often good enough.  That isn’t to say that it’s without its problems; there are definitely problems.  The bigger question has been, do these problems impede the use of the application or utility?  In most cases, the most glaring issue with the automated translation services has been context.  For example, take the word Score.  Within the context of MarcEdit and library bibliographic description, we know score applies to musical scores, not points scored in a game…context.  The problem is that many languages do make these distinctions with distinct words, and if the translation service cannot determine the context, it tends to default to the most common usage of a term – and in the case of library bibliographic description, that would often be incorrect.  It’s made for some interesting conversations with volunteers evaluating the automated translations – which can range from very good to downright comical.  But by a large margin, evaluators have said that while the translations were at times very awkward, they would be “good enough” until someone could provide a better translation of the content.  And what is more, the service gets enough of the content right that it could be used as a template to speed the translation process.  And for me, this is kind of what I wanted to hear.

Microsoft’s Translation Services

There really aren’t a lot of options available for good free automated translation services, and I guess that’s for good reason.  It’s hard, and requires both resources and adequate content to learn how to read and output natural language.  I looked hard at the two services that folks would be most familiar with: Google’s Translation API (https://cloud.google.com/translate/) and Microsoft’s translation services (https://datamarket.azure.com/dataset/bing/microsofttranslator).  When I started this project, my intention was to work with Google’s Translation API – I’d used it in the past with some success, but at some point in the past few years, Google seems to have shut down its free API translation services and replaced them with a more traditional subscription service model.  While the costs for that subscription (which tend to be based on the number of characters processed) are certainly quite reasonable, my usage will always be fairly low and a little scattershot, making the monthly subscription costs hard to justify.  Microsoft’s translation service is also a subscription-based service, but it provides a free tier that supports 2 million characters of through-put a month.  Since that more than meets my needs, I decided to start here.

The service provides access to a wide range of languages, including Klingon (Qo’noS marcedit qaStaHvIS tlhIngan! nuq laH ‘oH Dunmo’?), which made working with the service kind of fun.  Likewise, the APIs are well-documented, though they can be slightly confusing due to a shift in authentication practice to an OAuth token-based process sometime in the past year or two.  While documentation on the new process can be found, most code samples found online still reference the now-defunct key/secret key process.

So how does it work?  Performance-wise, not bad.  In generating 15 language files, it took around 5-8 minutes per file, with each file requiring close to 1600 calls against the server.  As noted above, accuracy varies, especially when doing translations of one-word commands that could have multiple meanings depending on context.  It was actually suggested that some of these context problems might be overcome by using a language other than English as the source, which is a really interesting idea and one that might be worth investigating in the future.
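To give a sense of the shape of that process, here is a rough sketch in Ruby rather than the C# of the actual tool; the endpoint URLs and parameter names below follow the Azure Datamarket-era Translator documentation, but treat them (and the made-up file names and string keys) as assumptions and see the sample code linked below for the real thing:

require 'net/http'
require 'uri'
require 'json'
require 'cgi'

# Endpoints as documented for the Datamarket-era Translator API (assumptions;
# the linked C# sample is the authoritative version).
TOKEN_URI = URI('https://datamarket.accesscontrol.windows.net/v2/OAuth2-13')
TRANSLATE_URL = 'http://api.microsofttranslator.com/v2/Http.svc/Translate'

def access_token(client_id, client_secret)
  res = Net::HTTP.post_form(TOKEN_URI,
    'grant_type'    => 'client_credentials',
    'client_id'     => client_id,
    'client_secret' => client_secret,
    'scope'         => 'http://api.microsofttranslator.com')
  JSON.parse(res.body)['access_token']    # tokens expire; cache and refresh in real use
end

def translate(text, to_lang, token, from_lang = 'en')
  uri = URI("#{TRANSLATE_URL}?text=#{CGI.escape(text)}&from=#{from_lang}&to=#{to_lang}")
  req = Net::HTTP::Get.new(uri)
  req['Authorization'] = "Bearer #{token}"
  res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
  res.body[/<string[^>]*>(.*)<\/string>/m, 1]    # response body is a bare XML <string>
end

# One localized file per language; ~1600 strings means ~1600 calls per file.
token   = access_token(ENV['MT_CLIENT_ID'], ENV['MT_CLIENT_SECRET'])
strings = { 'menu.file' => 'File', 'menu.help' => 'Help' }    # extracted from the app (made up here)
['fr', 'de', 'es'].each do |lang|
  translated = Hash[strings.map { |key, value| [key, translate(value, lang, token)] }]
  File.write("strings.#{lang}.json", JSON.pretty_generate(translated))
end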

Seeing how it works

If you are interested in seeing how this works, you can download a sample program, which pulls together code copied or cribbed from the Microsoft documentation (and then cleaned for brevity) as well as code on how to use the service, from: https://github.com/reeset/C–Language-Translator.  I’m kicking around the idea of converting the C# code into a ruby gem (which is actually pretty straightforward), so if there is any interest, let me know.

–tr

by reeset at October 17, 2014 01:13 AM

October 16, 2014

OCLC Cataloging and Metadata News

September 2014 data update now available for the WorldCat knowledge base

The WorldCat knowledge base continues to grow with new providers and collections added monthly.  The details for September updates are now available in the full release notes.

October 16, 2014 08:00 PM

Collection Manager September 2014 release notes now available

The details for the September updates to Collection Manager are now available in the full release notes.

October 16, 2014 06:30 PM

Thingology (LibraryThing's ideas blog)

NEW: Annotations for Book Display Widgets

Our Book Display Widgets is getting adopted by more and more libraries, and we’re busy making it better and better. Last week we introduced Easy Share. This week we’re rolling out another improvement—Annotations!

Book Display Widgets is the ultimate tool for libraries to create automatic or hand-picked virtual book displays for their home page, blog, Facebook or elsewhere. Annotations allows libraries to add explanations for their picks.

Station Eleven

Some Ways to Use Annotations

1. Explain Staff Picks right on your homepage.
Colum McCann
2. Let students know if a book is reserved for a particular class.
Semiotics
3. Add context for special collections displays.
Blueberries for Sal

How it Works

Check out the LibraryThing for Libraries Wiki for instructions on how to add Annotations to your Book Display Widgets. It’s pretty easy.

Interested?

Watch a quick screencast explaining Book Display Widgets and how you can use them.

Find out more about LibraryThing for Libraries and Book Display Widgets. And sign up for a free trial of either by contacting ltflsupport@librarything.com.

by KJ at October 16, 2014 02:21 PM

Mod Librarian

5 Things Thursday: Dead People, Metadata Games, DAM

Here are five things for you!

  1. Are you in Seattle on October 28th? Go to the Central Library for a special program Voices from Beyond the Grave: Stories from Seattle Cemeteries.
  2. How much do I love metadata games?
  3. DAM user expectations
  4. Librarians' reactions to the Pew study on social media.
  5. Library research in the 1980’s.

View On WordPress

October 16, 2014 01:06 PM

October 14, 2014

Thingology (LibraryThing's ideas blog)

Send us a programmer, win $1,000 in books.

We just posted a new job post, Job: Library Developer at LibraryThing (Telecommute).

To sweeten the deal, we are offering $1,000 worth of books to the person who finds them. That’s a lot of books.

Rules! You get a $1,000 gift certificate to the local, chain or online bookseller of your choice.

To qualify, you need to connect us to someone. Either you introduce them to us—and they follow up by applying themselves—or they mention your name in their email (“So-and-so told me about this”). You can recommend yourself, but if you found out about it from someone else, we hope you’ll do the right thing and make them the beneficiary.

Small print: Our decision is final, incontestable, irreversible and completely dictatorial. It only applies when an employee is hired full-time, not part-time, contract or for a trial period. If we don’t hire someone for the job, we don’t pay. The contact must happen in the next month. If we’ve already been in touch with the candidate, it doesn’t count. Void where prohibited. You pay taxes, and the insidious hidden tax of shelving. Employees and their families are eligible to win, provided they aren’t work contacts. Tim is not.

» Job: Library Developer at LibraryThing (Telecommute)

by Tim at October 14, 2014 05:24 PM

Job: Library Developer at LibraryThing (Telecommute)

Code! Code! Code!

LibraryThing, the company behind LibraryThing.com and LibraryThing for Libraries, is looking to hire a top-notch developer/programmer.

We like to think we make “products that don’t suck,” as opposed to much of what’s developed for libraries. We’ve got new ideas and not enough developers to make them. That’s where you come in.

The Best Person

  • Work for us in Maine, or telecommute in your pajamas. We want the best person available.
  • If you’re junior, this is a “junior” position. If you’re senior, a “senior” one. Salary is based on your skills and experience.

Technical Skills

  • LibraryThing is mostly non-OO PHP. You need to be a solid PHP programmer or show us you can become one quickly.
  • You should be experienced in HTML, JavaScript, CSS and SQL.
  • We welcome experience with design and UX, Python, Solr, and mobile development.
The highly-photogenic LibraryThing staff only use stock photos ironically.

What We Value

  • Execution is paramount. You must be a sure-footed and rapid coder, capable of taking on jobs and finishing them with attention and expedition.
  • Creativity, diligence, optimism, and outspokenness are important.
  • Experience with library data and systems is favored.
  • LibraryThing is an informal, high-pressure and high-energy environment. This puts a premium on speed and reliability, communication and responsibility.
  • Working remotely gives you freedom, but also requires discipline and internal motivation.

Compensation

  • Gold-plated health insurance.
  • Cheese.

How To Apply

  • We have a simple quiz, developed back in 2011. If you can do it in under five minutes, you should apply for the job! If not, well, wasn’t that fun anyway?
  • To apply, send a resume. Skip the cover letter, and go through the blog post in your email, responding to the tangibles and intangibles bullet-by-bullet.
  • Also include your solution to the quiz, and how long it took you. Anything under five minutes is fine. If it takes you longer than five minutes, we won’t know. But the interview will involve lots of live coding.
  • Feel free to send questions to tim@librarything.com, or Skype chat Tim at LibraryThingTim.
  • Please put “Library developer” somewhere in your email subject line.

by Tim at October 14, 2014 05:04 PM

October 13, 2014

First Thus

RDA-L Re: Re: Changes in RDA

Posting to RDA-L

On 13/10/2014 10.24, vries***.nl wrote:

LC practice/PCC practice (and Anglo-American practice in general) is to apply the alternative

I am very surprised by this and there goes the international aspirations of RDA!

Doesn’t the basic instruction regarding the recording of title imply to record what’s in the source, neither change, supply, or omit?

Exactly, so why this alternative?

As I wrote in an earlier post, this entire “problem” can and should be avoided once the emphasis of cataloging stops being placed on the creation of left-anchored text and is placed instead on adding the correct link for our linked data. http://blog.jweinheimer.net/2014/10/acat-rda-l-changes-in-rda.html

The example I gave was the book (in English) “The Swiss Family Robinson” which drops the initial article in English, but retains it under some other cataloging rules, as we see in the VIAF record.
xR Extended Titles-test
100 1 _ ‎‡a Wyss, J. D.‏ ‎‡0 (viaf)102332157‏ ‎‡t Swiss family robinson‏
National Library of Spain
100 1 _ ‎‡a Wyss, Johann David‏ ‎‡d 1743-1818‏ ‎‡t Der schweizerische Robinson‏
National Library of Australia
100 1 _ ‎‡a Wyss, Johann David,‏ ‎‡d 1743-1818.‏ ‎‡t Schweizerische Robinson‏
National Library of France
240 _ _ ‎‡a Wyss‏ ‎‡b Johann David‏ ‎‡f 1743-1818‏ ‎‡t Der Schweizerische Robinson‏
Library of Congress/NACO
100 1 0 ‎‡a Wyss, Johann David,‏ ‎‡d 1743-1818.‏ ‎‡t Schweizerische Robinson‏

These records have different forms of personal names and different titles. In the wonderful world of linked data that the cataloging world is aiming for, there will be this link: http://viaf.org/viaf/176999342 and the display can be any, or even all, of these headings, displayed however someone could want.

We can compare this to even more forms found in dbpedia that use initial articles http://dbpedia.org/page/The_Swiss_Family_Robinson (scroll down to the section owl.sameAs) (I personally think we should always be comparing our practices to the more public tools–especially dbpedia–but that is another matter)

Once all of that is done, the next question is: how will the public find this title? Just as they do now in tools such as Wikipedia. If they search “swiss family robinson” or “the swiss family robinson” it is all keyword and it makes no real difference.

Of course, all of this assumes that the links are inserted–correctly, consistently and by everyone–(wow! What an assumption!)–and that systems exist that allow the public to use all of this. I admit that creating and managing all of that will take quite some time and will cost quite a bit of money.


by James Weinheimer at October 13, 2014 10:15 AM

Catalogue & Index Blog

cilipcig

Following on from Lynne Dyer’s post on our recent conference, we have heard about other blog posts that have started to appear. Links to those we know about are below, but please let us know about any more, either through the comments at the bottom of this page or via email.

Lynne Dyer – Metadata – Making an Impact, CILIP CIG Conference 

Karen Pierce – The Impact of Metadata at the CILIP CIG 2014 Conference

Richard Lamin – The Cataloguer’s Tale part 1


by cilipcig at October 13, 2014 07:27 AM

October 12, 2014

Resource Description & Access (RDA)

RDA Cataloging Example of Selections & Translations

CASE: Selected plays of a Panjabi language author translated into Hindi language.

Bibliographic Record


Authority Record

LC control no.: n 2012217312
LCCN permalink: http://lccn.loc.gov/n2012217312
HEADING: Gurasharana Siṅgha, 1929-2011. Plays. Selections. Hindi
000 00528cz a2200133n 450
001 9272751
005 20130618010111.0
008 130524n| azannaabn |a aaa
010__ |a n 2012217312
040__ |a DLC |b eng |c DLC |e rda
1000_ |a Gurasharana Siṅgha, |d 1929-2011. |t Plays. |k Selections. |l Hindi
4000_ |a Gurasharana Siṅgha, |d 1929-2011. |t Pratinidhi nāṭaka
4000_ |a Gurasharana Siṅgha, |d 1929-2011. |t Pratinidhi natak
670__ |a Pratinidhi nāṭaka, 2012: |b title page (Pratinidhi nāṭaka) title page verso (Pratinidhi natak)
[Source: Library of Congress]

by Salman Haider (noreply@blogger.com) at October 12, 2014 03:38 PM

October 10, 2014

First Thus

ACAT RDA-L Changes in RDA

Posting to Autocat, RDA-L

On 10/9/2014 9:46 PM, Adam L. Schiff wrote:

The examples are simply being changed to match the main instructions in RDA 6.2.1.7.
6.2.1.7 Initial Articles
When recording the title, include an initial article, if present.
EXAMPLE
The invisible man
Der seidene Faden
Eine kleine Nachtmusik
La vida plena
The most of P.G. Wodehouse

Following this main instruction, there is an alternative to omit an initial article “unless the title for a work is to be accessed under that article (e.g., a title that begins with the name of a person or place).”

LC practice/PCC practice (and Anglo-American practice in general) is to apply the alternative. However, examples throughout RDA illustrate the basic instructions, not the alternative or exceptions. When 6.2.1.7 was revised to make the basic instruction include initial articles, the examples throughout RDA were not changed at the same time. But they are being changed now. PCC and U.S. cataloging practice will continue to follow the alternative instruction in RDA though.

This is one of those perennial issues that I hoped would more or less disappear. Some cultures do not file under initial articles but others do. In any case, the purpose of omitting initial articles is for browsing left-anchored text (so that you don’t look for “The Swiss Family Robinson” under “T”) but I don’t know how many users do that any more. I haven’t seen users do it. I don’t do it. [Added to online version: with the exception of when I am cataloging something. That is when I have the book (or other item) in my hands. Of course, this is specific to the act of cataloging. Users do not do this because once they have the item in hand, they no longer need to look at the record] I confess that even I, when I look up names or titles, just throw words together, in any order that comes to me, e.g. “finn mark twain”. I’ve seen lots of people do that, and I recently discovered that this type of searching has even been considered a type of language, with names such as “Searchese” or even “Caveman” (which I like!).

The current trends in searching are more toward “conversational search” which is based on natural language. See: http://allthingsd.com/20130314/how-search-is-evolving-finally-beyond-caveman-queries/ There have been some impressive advances. Jeopardy’s Watson was simply incredible but now these methods are being introduced into everybody’s smartphones and browsers–for free. http://searchengineland.com/google-upgrades-conversational-search-mobile-apps-205535 and Google Chrome does it now.

From my own experience of the Google conversational search, it is simply bad and nowhere near as good as Watson, but who knows where it will be in just 5 years from now?

In any case, search technology is evolving and the public will become more and more accustomed to those methods, while our traditional methods will look increasingly strange. As a consequence, these debates over whether to add an initial article or not are becoming obsolete and irrelevant to what people really do, and are similar to arguing over how IBM punched cards should be used today. We should be adapting our methods to the public instead of expecting them to do left-anchored text browses, eliminating (or not) initial articles. That is a remnant of days long past.

Linked data means that everything will be based on URIs and that using text for authority purposes is going away, e.g. this one for Swiss Family Robinson which has one link but all kinds of forms http://www.viaf.org/viaf/176999342/#Wyss,_Johann_David,_1743-1818._|_Schweizerische_Robinson. There is also this from dbpedia which the public would possibly find more useful http://dbpedia.org/page/The_Swiss_Family_Robinson

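To make this a bit more concrete, here is a minimal Ruby sketch (my own illustration, not code from any library system) of what “following the link” could mean in practice: it asks dbpedia for the data behind the page above and prints the various forms of the title it holds. It assumes dbpedia’s JSON endpoint under /data/ and its usual key layout, and uses nothing beyond Ruby’s standard net/http and json libraries.

  require 'net/http'
  require 'json'
  require 'uri'

  # dbpedia publishes the data behind each /page/... URI as JSON under /data/...
  uri  = URI('http://dbpedia.org/data/The_Swiss_Family_Robinson.json')
  data = JSON.parse(Net::HTTP.get(uri))

  # The JSON is keyed by resource URI; the title forms sit under rdfs:label.
  resource = data['http://dbpedia.org/resource/The_Swiss_Family_Robinson'] || {}
  labels   = resource['http://www.w3.org/2000/01/rdf-schema#label'] || []

  labels.each { |label| puts "#{label['lang']}: #{label['value']}" }

The point is not this particular script, of course, but that the various forms live at the other end of the link instead of having to be typed into each bib record.
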
Once the links are input (if that ever starts to happen!) the task will then be to match a searcher interested in this title to these–and other–links, plus making everything coherent and relatively easy to use.

I think that is when things will become interesting!


by James Weinheimer at October 10, 2014 11:45 AM

A Gentleman's Guide to Cataloguing...

October 09, 2014

Mod Librarian

5 Things Thursday: Taxonomy, Linked Data and DAM

Here are five more things…

  1. How to organize WordPress posts using taxonomy.
  2. Awesome visual of the reality of taxonomy versus the expectation.
  3. Want to make your head spin? Check out this Linked Open Data Cloud Diagram.
  4. Friendlier DAM for small businesses.
  5. Tips for stock photography.


October 09, 2014 01:11 PM

Catalogue & Index Blog

Lynne Dyer Blog post

Below is a link to a blog post by Lynne Dyer on our recent conference.

Lynne Dyer Blog post

Many thanks to Lynne for allowing us to share her post. If anyone else has written anything on our conference they would like us to share then please get in touch!


by cilipcig at October 09, 2014 11:58 AM

First Thus

RDA-L Re: Date from Preface

Posting to RDA-L

On 10/9/2014 8:31 AM, Heidrun Wiesenmüller wrote:

According to the logic of RDA, I think you have to put this date in square brackets. … But I’m not at all happy with all the bracketed years which we get for resources which only show copyright dates. I think it must be highly confusing for our users. Has anybody already experienced how users react to all these bracketed dates?

While I am 100% in favor of discovering how the public reacts to our records, I am also 99% positive that no one in the public would care one bit about brackets in the dates. It must be stated that the brackets are not input for the public but for the sake of the poor, lonely, overworked and underpaid catalogers, to alert them to the fact that the date is not to be found in the usual places and that they will therefore have to work harder.

I would add that if you have any pity for your colleagues, then when you find the date in a place where it is really hidden, such as buried in a preface, please make a note of it (adding the page numbers, please!) so that your coworkers do not have to waste time trying to dig it out on their own. Again, the public doesn’t care, but other catalogers do. A lot. (See the hypothetical example below.)

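To illustrate (a purely hypothetical example, not taken from any real record), the bracketed date together with that kind of note for one’s colleagues might look something like this:

  264 _1 $a London : $b Example Press, $c [2014]
  500 __ $a Date of publication from preface, page vii.
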
I compare it to when I was talking with a mechanic once. I had noticed a strange little trapdoor in the side of my car. I asked the mechanic what it was, and he told me, “You don’t want to know. That’s there for me to do my work better and cheaper.” It’s the same thing with cataloging conventions such as brackets. The public doesn’t need to know, and if they are interested, they can ask. They will probably find out that they don’t want to know. But seeing the brackets doesn’t disturb anything they do.

Yet, if we really are concerned about how the public views our records, how about one of these? http://lccn.loc.gov/2013033237 Here are some of the high points:

Uniform title: 41 (Nelson and Perry)
Related names:
Nelson, Michael, 1949- editor of compilation.
Perry, Barbara A. (Barbara Ann), 1956- editor of compilation.
Nelson, Michael, 1949- George Bush. Contains (work):
White Burkett Miller Center, sponsoring body.

Content type: text
Media type: unmediated
Carrier type: volume

Who understands that? Each of these points is much harder for anyone (including librarians!) to understand than a tiny pair of brackets. But everything went through nevertheless. Still, I haven’t heard of people rising up in arms about not understanding these kinds of records. Why? Because people have a tendency to ignore what they don’t understand.

I am all for finding out how the public relates to catalog records. It is long past time.


by James Weinheimer at October 09, 2014 06:55 AM

October 08, 2014

Terry's Worklog

MarcEdit Sept. 2014 server log snapshot

Here’s a snapshot of the server log data as reported through Awstats for the marcedit.reeset.net subdomain. 

Server log stats for Sept. 2014:

  • Logged MarcEdit uses: ~190,000
  • Unique Users: ~17,000
  • Bandwidth Used: ~14 GB

Top 10 Countries by Bandwidth:

  1. United States
  2. Canada
  3. China
  4. India
  5. Australia
  6. Great Britain
  7. Mexico
  8. Italy
  9. Spain
  10. Germany

Countries by Use (with at least 100 reported uses)

United States, Canada, Australia, Italy, India, Great Britain, China, Finland, Poland, France, Germany, Ukraine, Philippines, Mexico, New Zealand, Brazil, Spain, Russian Federation, Hong Kong, Colombia, Taiwan, Egypt, Sweden, Denmark, Saudi Arabia, Turkey, Argentina, Greece, Belgium, Pakistan, Georgia, Malaysia, Czech Republic, Thailand, Netherlands, Japan, Bangladesh, Chile, Ireland, Switzerland, Vietnam, El Salvador, Venezuela, Kazakhstan, Romania, European country, Norway, Belarus, United Arab Emirates, South Africa, Estonia, Portugal, Singapore, Austria, Indonesia, South Korea, Kenya, Bolivia, Israel, Sudan, Ecuador, Qatar, Nepal, Slovak Republic, Algeria, Lithuania, Costa Rica, Rwanda, Guatemala, Peru, Slovenia, Iran, Morocco, Moldova, Mauritius, Croatia, Kuwait, Republic of Serbia, Armenia, Jordan, Cameroon, Sri Lanka, Puerto Rico, Dominican Republic, Jamaica, Cuba, Iraq, Oman, Zimbabwe, Tunisia, Benin, Uruguay, Honduras, Ivory Coast (Cote D’Ivoire), Syria, Hungary, Latvia, Cyprus, Macau, Papua New Guinea, Malawi, Nigeria, Netherlands Antilles, Zambia, Tanzania, Panama, Uganda, Palestinian Territories, Aland islands, Bosnia-Herzegovina, Ethiopia, Tadjikistan, Senegal, Ghana, Mongolia, Luxembourg

by reeset at October 08, 2014 02:23 AM

October 07, 2014

Bibliographic Wilderness

Catching HTTP OPTIONS /* request in a Rails app

Apache sometimes seems to send an HTTP “OPTIONS /*” request to Rails apps deployed under Apache Passenger.  (Or is it “OPTIONS *”? Not entirely sure). With User-Agent of “Apache/2.2.3 (CentOS) (internal dummy connection)”.

Apache does document that this happens sometimes, although I don’t understand it.

I’ve been trying to take my Rails error logs more seriously to make sure I handle any bugs revealed. 404’s can indicate a problem, especially when the referrer is my app itself. So I wanted to get all of those 404’s for Apache’s internal dummy connection out of my log.  (How I managed to fight with Rails logs enough to actually get useful contextual information on FATAL errors is an entirely different complicated story for another time).

How can I make a Rails app handle them?

Well, first, let’s do a standards check and see that RFC 2616 HTTP 1.1 Section 9 (I hope I have a current RFC that hasn’t been superseded) says:

If the Request-URI is an asterisk (“*”), the OPTIONS request is intended to apply to the server in general rather than to a specific resource. Since a server’s communication options typically depend on the resource, the “*” request is only useful as a “ping” or “no-op” type of method; it does nothing beyond allowing the client to test the capabilities of the server. For example, this can be used to test a proxy for HTTP/1.1 compliance (or lack thereof).

Okay, it sounds like we can basically reply with whatever we want to this request; it’s a “ping” or “no-op”. How about a 200 text/plain with “OK\n”?

Here’s a line I added to my Rails routes.rb file that seems to catch the “*” requests and just respond with such a 200 OK.

  # "*" is a glob character in Rails routes, so match a literal asterisk via a
  # constraint; the lambda is a tiny inline Rack app that returns 200 OK.
  match ':asterisk', via: [:options],
     constraints: { asterisk: /\*/ },
     to: lambda { |env| [200, { 'Content-Type' => 'text/plain' }, ["OK\n"]] }


Since “*” is a special glob character to Rails routing, it looks like you have to do that weird constraints trick to actually match it. (Thanks to mbklein for the tip; this does not seem to be documented and I never would have figured it out on my own.)

And then we can use a little “Rack app implemented in a lambda” trick to just return a 200 OK right from the routing file, without actually having to write a controller action somewhere else just to do this.

I have not yet tested this extensively, but I think it works? (I’m still worried that if Apache is really requesting “OPTIONS *” instead of “OPTIONS /*”, it might not be caught. Stay tuned.)

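As a quick sanity check (just a sketch, assuming the app is running locally on port 3000 and that the request really does arrive as “OPTIONS /*” rather than “OPTIONS *”), something like this from an irb session should come back with the 200 and the “OK” body:

  require 'net/http'

  # Assumes the Rails app is listening on localhost:3000.
  http = Net::HTTP.new('localhost', 3000)
  response = http.request(Net::HTTP::Options.new('/*'))

  puts response.code   # expect "200"
  puts response.body   # expect "OK"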

Filed under: General

by jrochkind at October 07, 2014 09:36 PM