
Saturday, August 25, 2012

ERIC removes PDFs due to privacy issues

Researchers have started hitting dead ends in the ERIC (Educational Resources Information Center) database when trying to download or view full-text PDFs. PDF downloads were suspended because of privacy concerns. I haven't seen much written about this beyond a brief post at Inside Higher Ed.

ERIC's statement included this message:
A limited number of ERIC full-text documents are available at this time due to privacy concerns about information contained in some of the collection. Although the documents in ERIC had been publicly available in microfiche for many years, the advent of the Internet has amplified the possibility that someone could make improper use of information in these ERIC documents.

On 8/23/2012, ERIC posted an update on restoring access to some PDFs, including a list of what has been restored:
"While searching for full text on the ERIC website at, note that there are some temporary search and display limitations caused by the recent, unexpected suspension of PDF downloading. As soon as the majority of full text is restored to the site, all features will return to normal operation. Limitations include:
  • PDF links continue to display in ERIC records, even when full text is currently not available.

  • The ability to limit search results to only records with associated full text may not return reliable results."

Hmmm... As if patrons aren't confused enough by the results they already get.

The Inside Higher Ed article (published before ERIC's update) notes:
There’s nothing on the site explaining exactly what that concern is, but some speculate that documents that once were distributed to libraries on microfiche (if the library could afford it) are much more exposed on the web, and it may be that someone discovered that one or more of the documents contained sensitive material that identified subjects in a study or possibly revealed a social security number.
This does seem to be the case, as a separate ERIC post states:
We apologize for the inconvenience, and we are working to isolate the affected documents to ensure that personally identifying information that authors may have inadvertently included is not displayed. Our goal is to return full text access to users as quickly as possible.
All of this raises interesting questions about pre-web materials being digitized or uploaded to web-based databases. Better tools and scanners have made it much easier to scan content and dump it into a database, yet reviewing those materials and assigning appropriate metadata for better search results still requires human oversight at some point. I'm not sure how materials get into ERIC, but I've worked on enough digital projects where content is dumped in with machine-generated or user-generated metadata to know that experienced staff need to be involved. Otherwise, things that should not be in the database end up there, materials are hard to find for lack of appropriate keywords and subject headings (metadata), and mistakes abound. Our machines just aren't able to apply context to data effectively (yet).
