
Showing posts with label Data.

Tuesday, April 8, 2014

Making it happen - linkeddata, bigdata, opendata and the semantic web

So, a question was asked in one of my linkeddata discussion groups: how do linkeddata, bigdata, opendata and the semantic web fit together?

My answer:
The semantic web is the interface that makes sense of all of the data (bigdata), which has to be accessible (opendata); linkeddata is the mechanism that makes the data connections the semantic web interprets.

A very simplified view, for sure, but I think it sums it up in a sentence.

Friday, March 14, 2014

OpenLibrary - what it is (besides trending on reddit)


In addition to being a digital library, it is a linked data experiment with strong accessibility support. It is often cited as an example of linking bib data.

It was created by Aaron Swartz (yes, the Aaron Swartz) and is now a project of the Internet Archive (woot! love the IA).

https://openlibrary.org/

Linked data info:
http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Open_Library_Data

Watch it blow up reddit:
http://www.reddit.com/domain/openlibrary.org/

Tuesday, January 7, 2014

Thoughts on CATALOGING, RDA, and metadata in Netflix

I have so many thoughts on this Netflix article, but they can all be summed up as humans and machines working together to organize, describe and provide relevance (the best of both worlds!): semantic cataloging. Of course, libraries have been organizing, categorizing, and describing materials from the beginning, but RDA is a big step forward. With the end of print card catalogs and record limits (for the most part), the amount of data within a library catalog record can be much more expansive. Other library databases, like repositories and digital libraries, generally have not faced record limits, nor have they been tied to MARC (which has its own pros/cons). Of course, quantity doesn't always equal quality, either, but under RDA, we can provide as much description as we would like.

Another aspect of RDA is breaking data up into smaller bits. Information that might have appeared only in a free text note field, or been omitted from a library catalog record entirely, may now be included -- in some cases as part of a controlled vocabulary, such as relator codes. These codes describe the relationship of a particular person to a work and can be used to build all kinds of linking and relevance (see the sketch below). Libraries could create mechanisms so that users and others can more easily use the data to dynamically build lists or collections that are relevant to them (there's the semantic aspect!). Of course, in order to use the data to make new things, it has to be open.
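As a minimal sketch of what those relator bits make possible, here's how name/relator pairs could be pulled out of MARC records with pymarc. The file name is hypothetical, and I'm assuming relators live in subfield $e (terms) or $4 (codes) of the 100/700 fields, which is where they usually appear:

```python
# Hedged sketch: extract name/relator pairs from 100 and 700 fields.
# "records.mrc" is a stand-in for any file of MARC records.
from pymarc import MARCReader

with open("records.mrc", "rb") as fh:
    for record in MARCReader(fh):
        for field in record.get_fields("100", "700"):
            for name in field.get_subfields("a"):
                relators = field.get_subfields("e", "4")
                print(name, "->", relators or ["(no relator)"])
```

Once the pairs are out of the records, they are exactly the kind of small, repackageable data bits discussed below: author vs. illustrator vs. editor relationships that a catalog could use for linking.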

Netflix has had a similar evolution in metadata. Thinking about what our next-gen library catalog systems could be like, let's look at what Netflix has done (and what a few folks have done with its data, which could only happen with at least some of the data being open).

Tagging/Data
It starts with people creating data and machines collecting data:

"They [workers] capture dozens of different movie attributes. They even rate the moral status of characters. When these tags are combined with millions of users viewing habits, they become Netflix's competitive advantage. "


Much like traditional cataloging work, tagging is only as good as the tagger. The advantage that libraries have had is that the staff who do this sort of work (cataloging) most likely have some sort of training or relevant education.

In most popular social media (Facebook, Twitter, etc.) and image gallery sites (Flickr, YouTube, etc.), sub-tags, if any, are limited. The most common are: geographic (GIS, frequently from phone or camera GPS coordinates in the EXIF metadata), subjects (topics as input by the uploader or tagger), names (the user who uploaded, or users tagged in the item), dates (when the item was uploaded), access (public/private/select user group), system file information (file format, name, etc.), and rights (copyright, permissions, etc.). For some image sites, EXIF data will be loaded in automatically -- most frequently the date, type of camera, file information and general image specs (size, resolution, etc.); other information, such as rights (copyright), is less likely to be picked up. Facebook's support of metadata is marginal* (EXIF metadata is stripped out), and while Flickr supports the most metadata for images*, it relies primarily on the user to fill out the forms correctly to describe and assign the metadata. (See photometadata.org for more information about EXIF and social media.)
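To see what kind of embedded metadata these sites are (or aren't) preserving, you can read the EXIF block out of a photo yourself. A minimal sketch with Pillow; the file name is a placeholder, and it assumes a JPEG whose EXIF hasn't been stripped:

```python
# Dump whatever EXIF tags survive in an image file.
from PIL import Image
from PIL.ExifTags import TAGS

img = Image.open("photo.jpg")  # placeholder path
for tag_id, value in img.getexif().items():
    print(TAGS.get(tag_id, tag_id), ":", value)
```

Run it on an original photo, then on the same photo downloaded back from a social media site, and you can see for yourself which tags survived the round trip.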

In terms of search, crowdsourced metadata can be a challenge. It is only as good (and complete!) as the user who creates it. If you have ever searched for hashtags on Twitter, or tags on Flickr, you will see they are used every way imaginable. Hashtags are used as statements (#fail #thisisstupid #greatread), duplicated (#ala -- multiple different things sharing the same keyword), or misspelled (#teh for "the"), with little in the way of quality control placed on them.

Structure
However, there is some structure in place, which facilitates searching by hashtag/tag vs. date.

While libraries have had better systems, in that the metadata was created by experts and experienced staff, much of the data in a traditional MARC record is unstructured. Funny, no? We think of MARC as being so structured, and it is in terms of field order and use, and in the fixed field (character placement is essential there), but it is not so structured within some fields, like the 5XX fields or even the 245 (title/statement of responsibility) field. As long as the indicators are correct and the subfields are input correctly, the content within that field is really a type of free text, albeit with some rules for inputting. For example, while the 245 was and remains under RDA a transcription field (key it as you see it), there are still "shortcuts" (i.e., ways to minimize data recorded) under RDA (see: a nice overview of changes between AACR2 and RDA). So, while it's transcription, it's not exactly ALWAYS word for word (albeit more so with RDA). The sketch below contrasts the two kinds of structure.
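Here's a hedged illustration of that contrast using pymarc (the Subfield style assumes pymarc 5.x; the record data is invented). In the 008 fixed field, meaning comes entirely from character position -- bytes 35-37 of a bib record hold the language code -- while the 245's indicators and subfields are a structured wrapper around free-text content:

```python
from pymarc import Record, Field, Subfield

record = Record()
# Control field: purely positional. Bytes 35-37 = language code.
record.add_field(Field(tag="008", data="140108s2014    gau     o     000 0 eng d"))
# Variable field: indicators + subfields wrap free-text content.
record.add_field(Field(
    tag="245",
    indicators=["1", "0"],
    subfields=[
        Subfield("a", "Semantic cataloging :"),
        Subfield("b", "notes toward a next-gen catalog /"),
        Subfield("c", "Robin Fay."),
    ],
))

print(record["008"].data[35:38])  # "eng" -- meaning from position alone
print(record["245"]["a"])         # structured wrapper, free-text inside
```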

The third major component is that the data is open, or at least partially open. With siloed data, this experiment would not have been possible; siloing data also decreases its ability to be used by others.



So, how was Netflix able to make this successful from a metadata standpoint?

  • a defined (controlled) vocabulary (subject headings, authorities): " The same adjectives appeared over and over. Countries of origin also showed up, as did a larger-than-expected number of noun descriptions like Westerns and Slasher..."  
  • a structure (for catalogers, similar to how subject headings are formatted in a traditional library catalog); in Netflix:  
    • Region, Awards named first (at least for Oscars)
    • Adjectives (Keywords, subject headings)
    • Dates and places named last (akin to a geographic subdivision)
"If a movie was both romantic and Oscar-winning, Oscar-winning always went to the left: Oscar-winning Romantic Dramas. Time periods always went at the end of the genre: Oscar-winning Romantic Dramas from the 1950s....
In fact, there was a hierarchy for each category of descriptor. Generally speaking, a genre would be formed out of a subset of these components:
Region + Adjectives + Noun Genre + Based On... + Set In... + From the... + About... + For Age X to Y"
 Akin to traditional subject headings:
651 0 $a Sardinia (Italy) $v Maps $v Early works to 1800
650 0 $a Beach erosion $z Florida $z Pensacola Beach $x History $y 20th century $v Bibliography.

  • data bits that can be repackaged: "little "packets of energy" that compose each movie.... "microtag."" (the smaller the data bits, the more ways they can be repackaged -- a toy version of the genre syntax follows below)
"Netflix's engineers took the microtags and created a syntax for the genres..... "


Thinking back to next-gen systems: RDA provides a fairly good foundation to go beyond the traditional catalog. When done right (more vs. less, quality AND quantity), cataloging will net structured data bits that can be repackaged, and relationship information that can provide links between previously unrelated items (at least within the catalog), provided the data is open to be used and mechanisms are built so that users can create their own catalog experience. In that world, cataloging truly becomes semantic.
 


References:
Open Bibliographic Data: http://opendefinition.org/bibliographic/
Photometadata.org: http://photometadata.org
AACR2 compared to RDA, field by field: http://www.rda-jsc.org/docs/5sec7rev.pdf
How Netflix reverse engineered Hollywood: http://www.theatlantic.com/technology/archive/2014/01/how-netflix-reverse-engineered-hollywood/282679/
 

*Disclaimer: I have no idea what the backend systems of sites do with metadata; my thoughts are based upon the user experience. 

Thursday, October 17, 2013

GLAMLOD: Linked data, semantic web group meetup

Some of you may remember this group was formed last year after GLA. Please excuse crossposting.

Interested in linked data? Interested in the semantic web? Not even sure what the heck that is or how it applies to libraries, archives, or museums? GLAMLOD: Georgia Libraries, Archives & Museums Linked Open Data (http://www.facebook.com/glamlod/) is hosting a meetup in Atlanta in November. Please join our discussion group at Google Groups or like us on Facebook for news and updates. If you're interested in the meetup, please contact a member of the group. Feel free to share this with colleagues who might be interested. Here is the proposed plan.

What: A GLAMLOD meet-up with presentations and information sharing on tools, training, demos, potential uses, or emerging practices regarding linked data.
When: (TBD) Sometime the week of November 11th, 2013. 6:30pm - 9:00pm
Where: Atlanta, GA (Manuel's Tavern)
Who: GLAMLOD members and guests
How: In person (and we can explore using Skype for remote attendance)

Please contact Laura Akerman (liblna@emory.edu), Robin Fay (georgiawebgurl@gmail.com), or Doug Goans (doug.goans@library.gatech.edu):
  • If you are interested in attending, and especially if you would like to attend virtually.
  • If you would like to give a short presentation or share information about linked data. (We are looking for anything from 2-minute lightning talks to about 15 minutes max per presentation.)
  • If you have other suggestions for programming.

Thursday, August 8, 2013

Linked data presentations

Reading list: linked data & ex-libris

  1. Linked data and Ex Libris products – introduction - Lukas Koster, University of Amsterdam, Netherlands
  2. Publishing Aleph data as linked open data - Silke Schomburg, HBZ, Germany
  3. Linked open dedup vectors – An experiment with RDFa in Primo - Corey Harper, New York University, USA
  4. Exploiting DBPedia for use in Primo - Ulrike Krabo, OBVSG, Austria
  5. Linking library and theatre data - Lukas Koster, University of Amsterdam, Netherlands
  6. Linked data and Ex Libris products – summary - Lukas Koster, University of Amsterdam, Netherlands
  7. Ex Libris – linked data outlook - Axel Kaschte, Ex Libris

Wednesday, August 7, 2013

RDA/FRBR reading list

Lots of RDA/FRBR in this list:

Friday, May 31, 2013

Mendeley news - Acquired by Elsevier, Open Data & more

Mendeley has been acquired by Elsevier -- that news & other tidbits via the Mendeley May Librarian Newsletter (the talk about Mendeley is very interesting):

--------------------
This month we're giving a look inside our relationship with open data, looking for your feedback on a new user resource, and inviting you to participate in upcoming programs.

1. Mendeley Vision
2. What do you think: New User Guide
3. Upcoming Mendeley Open Day
4. Supporting Researchers: It's what we do
http://us5.campaign-archive1.com/?u=5560fe5e9f52735e40444340c&id=00de5749db&e=59dee60172

Monday, May 20, 2013

Linked data, big data presentation archives

OCLC/Lyrasis discussion/presentation that Peter Murray and I facilitated:

Shared Data:

Linked Data:

Archive: http://tinyurl.com/cob9uur

Wednesday, May 8, 2013

Global Change Queue (Batch edit) @ELUNA 2013 notes


Global Data Change Queue Notes

http://works.bepress.com/julene/ (many batch edit presentations)

What can GDC do?
  • Can edit marc tags, fields
  • can delete, edit, add
  • can set preferences
  • can limit by user name, including the ability to create rules but not implement them - so one person could create rules while someone else has the authority to run them; can define by user role what can be edited (R note: could be useful for a review/test process)
Examples:
  • all records must have ____ (specific criteria; R note: in the case of POs, 910 = PA + lacking 245 indicators)
  • like a global find and replace (R note: YES! yes! So we could fix typos in 5xx fields, or invalid MARC tagging in POs; looks useful - see the sketch after this list)
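GDC is a vendor tool, so no code for it here, but the "global find and replace over a record set" idea is easy to sketch outside the ILS with pymarc. The file names and the typo are illustrative, and the Subfield style assumes pymarc 5.x:

```python
# Hedged sketch: batch-fix a known typo in 500 note fields across a
# record set, writing corrected records to a new file for review.
from pymarc import MARCReader, MARCWriter, Subfield

with open("recordset.mrc", "rb") as infile, open("fixed.mrc", "wb") as outfile:
    writer = MARCWriter(outfile)
    for record in MARCReader(infile):
        for field in record.get_fields("500"):
            field.subfields = [
                Subfield(s.code, s.value.replace(" teh ", " the "))
                for s in field.subfields
            ]
        writer.write(record)
    writer.close()
```

What GDC adds over a script like this is the rule/preview/role machinery described below; previewing changes before committing is the part you'd otherwise have to build yourself.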

How to do it:
  • create record set (R note: we could use old provisional records with incorrect MARC indicators as a test)
  • RULE: create a rule using if/then statements
    • further define rules through sets (R note: daisy chain them together) to edit multiple fields - one rule for each field, then chain them
  • Preview /Review before change
    • Will highlight changes
    • Jump through set of records (e.g., 10 records at a time - your choice)
    • If you find something that doesn't belong, you can remove it manually during preview
    • If rule doesn't work, you will get a notice
    • Update or review changes before you actually run
Run job or schedule

More powerful/easier to use than MarcEdit

More examples:
  • update authorized headings (RDA)
  • fixed fields
  • add OCLC #s
  • cleanup recon
  • add/remove standard notes
  • changed locations - use pick and scan for items, though (of course you have to have the barcode.... but you don't have to have the piece in hand - R note)

It doesn't interfere with cataloging work, because whoever has the record open holds it ("locked", sort of); jobs can be scheduled.

Friday, April 12, 2013

Cataloging: Cuttering resources

I put this together for someone else and thought I would share it with you too!
________________________________________
Cutter tables:
http://www.itsmarc.com/crs/mergedProjects/cutter/cutter/basic_table_cutter.htm

and the cataloging calculator is a pretty nifty tool:
http://calculate.alptown.com/

This is a good overall resource:
http://www.itsmarc.com/crs/mergedProjects/cutter/cutter/contents.htm

One of the main things to be aware of in cuttering is the local shelflist. ;-)

As for creating call numbers - for us, that would be LC classification - there is the subject analysis part to get the class, and then the cutter. LCSH can be browsed via this list:
http://www.biblio.tu-bs.de/db/lcsh/index.htm
I'm not sure how detailed it is, but it seems like a good overall tool.
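For a feel of how mechanical the first cut of a cutter is, here's a rough Python sketch of the two-character LC Cutter table (from the Classification and Shelflisting Manual, G 63). It's deliberately simplified: real cuttering expands on later letters, and, as noted above, always defers to the local shelflist.

```python
# Simplified two-character LC cutter: initial letter plus one digit
# chosen by the second letter. The fallback digit is a rough guess,
# not an official rule.
AFTER_VOWEL = {"b": "2", "d": "3", "l": "4", "m": "4", "n": "5",
               "p": "6", "r": "7", "s": "8", "t": "8", "u": "9", "y": "9"}
AFTER_S = {"a": "2", "c": "3", "e": "4", "h": "5", "i": "5",
           "m": "6", "p": "6", "t": "7", "u": "8", "w": "9"}
AFTER_QU = {"a": "3", "e": "4", "i": "5", "o": "6", "r": "7", "t": "8", "y": "9"}
AFTER_CONSONANT = {"a": "3", "e": "4", "i": "5", "o": "6",
                   "r": "7", "u": "8", "y": "9"}

def cutter(name: str) -> str:
    name = name.lower()
    first = name[0]
    if first in "aeiou":
        table, second = AFTER_VOWEL, name[1:2]
    elif name.startswith("qu"):
        table, second = AFTER_QU, name[2:3]
    elif first == "s":
        table, second = AFTER_S, name[1:2]
    else:
        table, second = AFTER_CONSONANT, name[1:2]
    return first.upper() + table.get(second, "5")

print(cutter("Fay"))       # F3 (a fuller table would expand to F39)
print(cutter("Sardinia"))  # S2
```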

Friday, December 21, 2012

Survey on Research practices of historians


Ithaka S+R’s Research Support Services for Scholars program has released the report of their NEH-funded study, Supporting the Changing Research Practices of Historians (http://www.sr.ithaka.org/news/understanding-historians-today-%E2%80%94-new-ithaka-sr-report). Here’s a brief description of the project from the report’s Executive Summary:
In 2011-2012, Ithaka S+R examined the changing research methods and practices of academic historians in the United States, with the objective of identifying services to better support them. Based on interviews with dozens of historians, librarians, archivists, and other support services providers, this project has found that the underlying research methods of many historians remain fairly recognizable even with the introduction of new tools and technologies, but the day to day research practices of all historians have changed fundamentally. Ithaka S+R researchers identified numerous opportunities for improved support and training, which are presented as recommendations to information services organizations including libraries and archives, history departments, scholarly societies, and funding agencies.

Monday, November 26, 2012

Bibliographic Framework Initiative (MARC replacement) update

New document released from LoC: Bibliographic Framework as a Web of Data: Linked Data Model and Supporting Services http://www.loc.gov/marc/transition/pdf/marcld-report-11-21-2012.pdf
The new, proposed model is simply called BIBFRAME, short for Bibliographic Framework. The new model is more than a mere replacement for the library community's current model/format, MARC. It is the foundation for the future of bibliographic description that happens on, in, and as part of the web and the networked world we live in. It is designed to integrate with and engage in the wider information community while also serving the very specific needs of its maintenance community - libraries and similar memory organizations. It will realize these objectives in several ways:
1. Differentiate clearly between conceptual content and its physical manifestation(s) (e.g., works and instances)
2. Focus on unambiguously identifying information entities (e.g., authorities)
3. Leverage and expose relationships between and among entities
In a web-scale world, it is imperative to be able to cite library data in a way that not only differentiates the conceptual work (a title and author) from the physical details about that work's manifestation (page numbers, whether it has illustrations) but also clearly identifies entities involved in the creation of a resource (authors, publishers) and the concepts (subjects) associated with a resource. Standard library description practices, at least until now, have focused on creating catalog records that are independently understandable, by aggregating information about the conceptual work and its physical carrier and by relying heavily on the use of lexical strings for identifiers, such as the name of an author. The proposed BIBFRAME model encourages the creation of clearly identified entities and the use of machine-friendly identifiers which lend themselves to machine interpretation for those entities.
And thus we start our march toward semanticizing our bibliographic data by looking to linked data, which will allow us more flexibility in constructing records (and relationships); better authority and bibliographic control (fix it in one place and the change propagates across records, which become aggregated data presented in a framework -- most likely, in the near future, fields); and the ability for our data bits to be harvested (if our data is open) and used outside of traditional library catalogs ...
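To make the work/instance split concrete, here's a hedged rdflib sketch of the kind of graph BIBFRAME describes. The namespace matches the BIBFRAME ontology LC later published; the URIs are invented, and the literal extent is a shortcut (BIBFRAME proper models extent as its own resource):

```python
# Minimal work/instance graph: identified entities plus an explicit
# relationship, instead of one flat record.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

BF = Namespace("http://id.loc.gov/ontologies/bibframe/")
g = Graph()
g.bind("bf", BF)

work = URIRef("http://example.org/works/moby-dick")
instance = URIRef("http://example.org/instances/moby-dick-1851")

g.add((work, RDF.type, BF.Work))          # conceptual content
g.add((instance, RDF.type, BF.Instance))  # physical manifestation
g.add((instance, BF.instanceOf, work))    # the relationship, made explicit
g.add((instance, BF.extent, Literal("635 p.")))  # carrier detail stays on the instance

print(g.serialize(format="turtle"))
```

Fixing a work-level fact now means changing one node, and every instance pointing at it picks up the change -- the "fix in one place" propagation described above.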

Wednesday, September 19, 2012

You can get there from here: AACR2 / MARC>RDA / FRBR / Semantic web

Although my graphics didn't turn out too nicely at SlideShare, overall I think this covers what I'd like my staff to understand about RDA and FRBR in terms of foundation knowledge. We build from here...

Friday, April 10, 2009

Survey on Cataloging/Metadata Training (please help!)

All~
Would you please share with appropriate lists and colleagues?
Many thanks.

--------------------
This message has been cross-posted to several lists. Please excuse duplication.

Metadata and Cataloging Training Survey link
http://www.surveymonkey.com/s.aspx?sm=S6LEGThhanvSpjVMeVzwWw_3d_3d

---------
We are conducting a survey on training in the cataloging field which for the purposes of this survey includes all forms of metadata generation or correction for bibliographic use.

Catalogers include professional and paraprofessional staff, as well as Library/Information Science students who are doing work as a practicum or field of study.

Background:
The idea for the survey came about in the course of discussion between two managers/trainers at the University of Georgia: Beth Thornton and Robin Fay. Robin has created a multi-media introduction to the UGA Cataloging Department and Beth has been experimenting with using .mp3 files for feedback on record review.

While post-it notes and red pencils still figure prominently in our training repertoire, we are looking for new ideas and things that people at other institutions have hit upon, whether tried-and-true or still in the experimental stages.

Please take a few minutes to respond to our survey
before May 6, 2009. It should take 10-15 minutes to complete. Many thanks.
http://www.surveymonkey.com/s.aspx?sm=S6LEGThhanvSpjVMeVzwWw_3d_3d

This survey is posted by Beth Thornton, Head, Serials Cataloging and Robin Fay, Head, Database Maintenance, University of Georgia Libraries. No information will be linked to any individual or institution.

Monday, October 27, 2008

The vanishing cataloger

In the past few years, I've been noticing a trend amongst many of the smart, talented cataloging colleagues that I know -- they are leaving the cataloging profession.
I think there are many factors at work, but I see a few trends:
  • The shifted job: Their jobs have shifted even if their job descriptions or titles have not.
  • The evolved job: Their jobs and official job descriptions have shifted from "Cataloger" to something involving bibliographic services, metadata, electronic resources, digital initiatives, etc. Their home department has also changed names and scope from traditional cataloging services, to encompass more digital projects, non-MARC metadata, and more.
  • The transition job: Their new jobs are no longer centered on traditional cataloging (and in some cases are not in a library). These librarians are leaving the cataloging profession to be reference librarians, systems librarians, school media librarians, web designers, information architects, technical writers, library science instructors, and more.

I do realize there are still library science students studying traditional cataloging, but I am not sure that they expect to find a job as a "Cataloger." I think many of them will be looking to digital libraries, IRs, consortia, the web, and elsewhere.

Perhaps, this is the reason:

Our catalogers began to disappear with the takeover of that function by OCLC, the nonprofit that aspires to be a corporation in this brave new retail library world. The standardized result of the effort is bypassed by patron and librarian alike, as they turn to the more friendly Amazons, Googles, et al., for the less precise, more watered-down “metadata” that has replaced what used to be cataloging. Apparently, users don’t miss the old catalog, except as a familiar artifact, which is testimony to how low this dumbing down has taken us.
In the new model, that most sacred of our professional duties, the selection of materials to build services and collections, is turned over to either small centralized teams of two or three librarians and clerks, or in extreme cases to an external vendor, usually a library book distributor.
from LJ's Vanishing Librarian

There is a lot of great stuff to be said for the Amazon/Google model, but if librarianship continues to move down that path, will we completely give up the expertise (and the respect for that expertise) that is inherent in the profession? Will our services become superficial, mediocre, and generic? Is this part of the larger trend coming out of web 2.0 technologies (everyone is an expert, so no one is an expert)?

I don't have a good answer to any of those questions, but it does seem like librarianship (especially cataloging) is changing.

Monday, September 8, 2008

Reference in transition (interesting Stephen Abram article)

Although a good portion of what I read in terms of change in libraries is cataloging/metadata/systems related (metadata for electronic & digital objects, next-gen catalogs, RDA, opensource ILSes, products that enhance the user experience and work with the existing catalog (e.g., vufind, etc.), changes in LC policy, etc.), occasionally an article which is more reference oriented wanders across my desk. I've certainly read a lot of articles and case studies about specific libraries using specific new technologies (IM, twitter, myspace, secondlife, etc.) to reach out to users.

Stephen Abram's Reference in Transition examines some common (and less common) reference scenarios: "traditional" aka status quo, information commons, learning commons, embedded librarians, partners in action, the remote librarian, team players, the retail librarian, shoulder to shoulder (teaching & training), avatar based/virtual worlds, virtual librarian (no onsite f2f presence), emergency/on-demand librarian, aggregation of user experience aka crowdsourcing, and finally, the mandatory -- all of these.

I think for the VL/avatar-based librarian to become a viable choice, those platforms are going to have to get easier to use. SecondLife is very computer resource intensive, and it's not easy to learn to navigate the SL world. Smallworlds seems like it may be a viable option (flash based with no software to download), but really, can anything take down SL at this point?

I definitely have to disagree with this statement, though:

For instance, the OPAC and ILS systems don’t suck for library workers. They were built to meet our specific needs — library management, transaction processing, inventory systems, etc. When we moved an internally oriented tool out of the backroom to make it accessible to the “public,” we did a good thing. The unintended consequence of public OPACs, however, has been to teach us that end users have different needs and processes for discovery and navigation than library workers — especially in the virtual digital world.


I haven't met an ILS (and its OPAC) that didn't "suck" in some way. None of them are perfect -- as anyone who has actually done work in one knows, and as anyone who has done a search knows. Just because you might know more about how the ILS works doesn't ensure success or a good experience. Ugly? Poorly designed? Lacking functionality? You betcha. I usually cringe when I go poking around in a library's OPAC. I also don't think our needs are so different from library users' -- because after all, we are library users, too.

Thursday, January 17, 2008

Flickr & the Library of Congress

Library of Congress announced today that it has partnered with Flickr, putting up 3,000 photos from two of its most popular collections. Only images for which no copyright restrictions are known to exist are included.

The LOC blog post about it is here:
http://www.loc.gov/blog/?p=233

The flickr page is here:
http://www.flickr.com/photos/library_of_congress/

Thursday, November 15, 2007

tagging the art world

A couple of interesting tagging (folksonomies, not graffiti!) projects from the arts/museum world:

Steve Museum
Steve Museum is a large-scale tagging project which is supported by an IMLS grant. Partners include the Met and the Guggenheim.

Article about the Steve Museum project and tagging:
http://www.archimuse.com/mw2006/papers/wyman/wyman.html

Steve Museum (tagging, with links to research, working papers, etc.)
http://www.steve.museum/

development site:
http://trac.steve.museum/

The Cleveland Museum of Art (a Steve Museum partner) has a "Help Others Find Me" button, which allows users to enter tags.

Here is a link to a de Kooning work from the museum:
http://tinyurl.com/yrtlml

Thursday, September 13, 2007

opensource alternatives to common commercial products

Although the introduction of this article is overly simplified in terms of the changes in library technology (I couldn't help but laugh a little in a couple of places), once you're past that, the rest is a good little overview of some of the more popular opensource products. I've hotlinked and listed the products below; the article gives a more in-depth overview.

The products are:
ubuntu (ms windows alternative based on linux)
firefox (web browser; ms internet explorer alternative)
openoffice (productivity suite with wordprocessing, presentation, and spreadsheets; ms office alternative)
thunderbird (e-mail + rss reader; ms outlook express alternative)
songbird (media player; windows media player alternative?)
gimpshop (image editing; adobe photoshop alternative)
pdfcreator (pdf creator; adobe acrobat alternative)
Audacity (audio burning software)
avidemux (video creation)

Other stuff (web publishing, etc.):
wordpress
drupal
mediawiki and also twiki.

As far as libraries go, there is
koha
evergreen
vufind
liblime

I've talked a little about evergreen and vufind here. At home, I still run MS for the operating system and commercial stuff for my server, but everything else is opensource or web-based services (Firefox, gimp, ghostwriter+pdf, openoffice, etc.). Setting up these products on a small personal computer is fairly easy (really!). I'm not sure how that would translate to a large network, which could be a hidden cost factor: installing these, configuring them as needed, and upgrading them. Of course, admins already have to do that for any programs that they support. Training issues (oh, the fun of trying to teach a group of web editors to use Drupal...) as well as the potential security risks that come with opensource would be other potential costs.


http://www.degreetutor.com/library/managing-expenses/open-source-library