Friday, April 11, 2008

PreCYdent Impresses: An Advance In Legal Case Research (with explanation)

I had briefly published this same post and then had some second thoughts about it (disclosure: PreCYdent is a client of the firm I work at). Since the bulk of it has more recently appeared on Tom Smith's blog, The Right Coast, attributed to a "litigation knowledge management attorney at an AmLaw 100 firm" (that's me), I reproduce the original article in full (and for good) here. OK, enough with the parentheticals.


The Post

Legal technology guru Robert Ambrogi posted on March 18 on’s legal technology blog about what he terms an “online legal revolution.” He had earlier posted on the subject on his own blog. He discussed four new web sites that are providing either search or direct access to databases of federal appellate caselaw. What’s happened is that in January 2008 obtained this caselaw through companies such as and has made it available in a fairly easy-to-work-with database format. Three other outfits have taken this data and have turned their search engines loose on it. I initially thought that none of these would be worth a legal researcher’s time, but as I explain in my detailed reviews below, one of them definitely is, for some types of work.

Ambrogi is plainly correct that the information in these databases does and should belong to the public. After all, the body of public caselaw is the highest and most enduring product of the combined wisdom, intellect, and conflict resolution skills of thousands of our best lawyers and judges over the course of more than two hundred years. As the cost of information goes down, and as consumers become more sophisticated about it, caselaw will inevitably become easier to search and access, for lawyers and the general public alike. I didn't get a clear sense from the article of which search engine he liked best, so I decided to find out for myself.

My Test

I decided to challenge all of the engines with the same search term: “in personam jurisdiction” (IPJ), a legal concept that arises on occasion in the case law. The term refers to a court’s power to decide a controversy involving a particular person, usually a corporation or individual from out of state. IPJ is restricted by statute and by the federal constitutional guarantee of procedural due process, the theory being that it’s not fair to haul someone before a court in a state with which they have no connection.

I chose IPJ because, although I am not a jurisdiction expert, I had studied the subject hard under U. of Mich. Law Prof. Mathias Reimann and had applied that learning several times in my practice as a litigator.

The Winner: PreCYdent

I was stunned by the results of my IPJ search on PreCYdent. The top six cases were the leading U.S. Supreme Court cases I studied in Prof. Reimann’s jurisdiction class, each of them fundamental to an understanding of how personal jurisdiction applies in federal courts. I have never seen such a highly relevant set of search results on any electronic case search engine. Not in Westlaw. Not in Lexis. Not anywhere.

When I drilled into the cases, each one clearly indicated where the page breaks were in the text, and citations to federal appellate cases within the text were hyperlinks to those cases. There also appear to be collaborative social features, such as tagging, rating how relevant a case was to your search, and displaying other users’ search terms that also returned the case. Judging from these results, those features have not yet been fully implemented or used, but as a dedicated tagger I like the idea very much.

This stellar result was no accident, nor the product of preprogrammed "best bets." I tested a few other terms (e.g., "abortion," "sodomy") and got comparably stellar results. The statute search also worked quite well. And this is the “Beta” release!

Although Ambrogi talked to the leader of PreCYdent, San Diego Law Prof. Thomas Smith, and apparently discussed “proprietary algorithms,” in my opinion he didn’t properly convey either the effectiveness of this search or how the tool might actually work. (In an earlier post on his own blog, Ambrogi did point to a YouTube video from PreCYdent CTO Antonio Tomarchio.)

The PreCYdent team list and firm description page offers a clue: “PreCYdent search technology is able to mine the information latent in the ‘Web of Law’, the network of citations among legal authorities. This means it is also able to retrieve legally relevant authorities, even if the search terms do not actually occur or occur frequently in the retrieved document.”

What this search engine must be doing is tracking how often a particular case is cited by other decisions. Perhaps some extra weight is given to cases and citations from higher courts such as the U.S. Supreme Court, and I would hope that more recent citations are weighted more heavily too. I imagine Westlaw and Lexis, which have also figured out how to automatically identify a case citation in a document and link it to the original authority (see this post on West KM), could have done something like this, but thus far they have not.
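To make that guess concrete, here is a minimal Python sketch of citation-count ranking with court and recency weighting. It illustrates my speculation only, not PreCYdent's actual method; the court weights, the ten-year half-life, the `Citation` record, and the case labels are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical weights: a citation from a higher court counts for more.
COURT_WEIGHT = {"U.S.": 3.0, "F.3d": 1.5, "F.2d": 1.0}
# Assumed half-life: a citation's weight halves every ten years.
HALF_LIFE_YEARS = 10.0


@dataclass
class Citation:
    citing_reporter: str  # reporter of the citing decision, e.g. "U.S."
    citing_year: int      # year the citing decision was issued
    cited_case: str       # label of the case being cited


def rank_by_citations(citations, current_year):
    """Return (case, score) pairs, highest score first."""
    scores = {}
    for c in citations:
        court_w = COURT_WEIGHT.get(c.citing_reporter, 1.0)
        age = max(0, current_year - c.citing_year)
        recency_w = 0.5 ** (age / HALF_LIFE_YEARS)
        scores[c.cited_case] = scores.get(c.cited_case, 0.0) + court_w * recency_w
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


cites = [
    Citation("U.S.", 2008, "Case A"),
    Citation("F.3d", 1998, "Case A"),
    Citation("F.3d", 2008, "Case B"),
    Citation("F.3d", 2008, "Case B"),
]
print(rank_by_citations(cites, 2008))  # [('Case A', 3.75), ('Case B', 3.0)]
```

Case A wins with fewer citations because one comes from the Supreme Court; a raw count alone would have tied them.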

Caveats and Thoughts

I am not suggesting that anyone stop using Westlaw and Lexis. This search, after all, covers only a limited amount of federal appellate caselaw: in most cases, decisions from after 1950 up to about July 2007 (I describe the volumes covered in more detail below). That may be the numerical majority of the relevant caselaw, but most clients paying large-firm rates don’t want the majority; they want it all researched.

Furthermore, case research, even with this outstanding tool, isn’t usually the best way to learn about a topic. Better to look at a good treatise written by someone who has lived with all of this caselaw and can digest it for you into useful nuggets. Finding a case through one of these searches also is no guarantee that the case is still valid. That pretty much requires the up-to-the-day checking that Shepard's and KeyCite services provide.

But this search finds the relevant caselaw within its bailiwick better than any other search I’ve seen. And, I have to note, one of the more common purposes of going to Wexis is to do caselaw research. How quickly can they adapt, and what additional value do they offer? Is it access to treatises? Information about the validity of a case? Support from their huge staffs of attorneys?

Compared to PreCYdent and the other engines, Wexis's distinguishing feature is not likely to be ease of use.

If I were researching the top federal appellate decisions on an issue over the last fifty years, and didn't care about the last six months, I'd start with PreCYdent.

The Problem With The Rest--Page Numbering and The Need for Pin Cites

The problem with using the other search sites in real litigation is that they mostly lack the indications of where the page breaks are in each case.

Why is this important?

For those of my few readers who aren’t lawyers, or who have been out of law school and in the corporate world so long that they’ve forgotten what F.3d refers to, let me quickly explain the citation system. (The rest of you Bluebookers can skip this paragraph.) A basic case citation, such as Pharmastem v. Viacell, 491 F.3d 1342 (Fed. Cir. 2007), has three parts. The first is the case name, consisting of the first plaintiff or appellant on one side of the “v.” and the first defendant or appellee on the other [Pharmastem v. Viacell]. The third identifies the court [Fed. Cir., formally known as the United States Court of Appeals for the Federal Circuit] and the date the decision was issued (usually just a year). Even if someone messes up the first and third bits, however, a lawyer can still find the case and understand the citation if the all-important second element is correct. That’s because it refers to a particular page (the thirteen hundred and forty-second) of a particular volume (the 491st) of a particular series of cases (the third series of the Federal Reporter, whose volumes contain the published decisions of the federal courts of appeals).
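For the technically inclined, that three-part structure can be pulled apart with a regular expression. This is a simplified sketch that assumes a single reporter and no pin cite or parallel citations; real citation extractors have to handle far messier input. The pattern and the `Cite` fields are my own assumptions.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern for "Name, VOLUME REPORTER PAGE (COURT YEAR)".
# Assumption: one reporter, no pin cite, no parallel citations.
CITE_RE = re.compile(
    r"(?P<name>.+?),\s+"
    r"(?P<volume>\d+)\s+(?P<reporter>[A-Za-z0-9. ]+?)\s+(?P<page>\d+)\s+"
    r"\((?P<court>[^)]*?)\s*(?P<year>\d{4})\)"
)


class Cite(NamedTuple):
    name: str
    volume: int
    reporter: str
    page: int
    court: str  # empty for U.S. Reports cites, where the reporter implies the court
    year: int


def parse_cite(text: str) -> Optional[Cite]:
    """Split a basic case citation into its parts, or return None."""
    m = CITE_RE.fullmatch(text.strip())
    if m is None:
        return None
    return Cite(m["name"], int(m["volume"]), m["reporter"],
                int(m["page"]), m["court"], int(m["year"]))


print(parse_cite("Pharmastem v. Viacell, 491 F.3d 1342 (Fed. Cir. 2007)"))
# Cite(name='Pharmastem v. Viacell', volume=491, reporter='F.3d', page=1342, court='Fed. Cir.', year=2007)
```

Note that volume, reporter, and page together form the unique key; the name and court are just there for the reader.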

The citation system was developed around a set of printed books, but that has not diminished its usefulness in the electronic age, since a cite is a unique identifier for a decision.

Beyond the reference to the starting page of a case, there is almost always a “pin cite,” which specifies, in addition to the volume and starting page of the case, the specific page where the quote or proposition of law dwells. Pin cites are necessary for any sophisticated legal work product. A case citation in a formal brief without a pin cite would stand out like a sore thumb and would reflect poorly on the author and the firm that submitted the brief. Lawyers care about pin cites because they make checking the reference much simpler: cases often have sections that are not germane to your particular topic, and having to slog through that material would make a hard job harder.
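Page-break markers are what make a pin cite recoverable from electronic text. The sketch below assumes a hypothetical star-paging convention, where `[*201]` in the text means page 201 begins there, and finds the page on which a quoted passage starts. The marker format, the sample text, and the citation numbers are invented for illustration.

```python
import re
from typing import Optional

# Hypothetical star-paging marker: "[*201]" means reporter page 201 starts here.
PAGE_MARK = re.compile(r"\[\*(\d+)\]")


def pin_page(opinion_text: str, first_page: int, quote: str) -> Optional[int]:
    """Return the page on which `quote` begins, or None if it isn't found."""
    pos = opinion_text.find(quote)
    if pos == -1:
        return None
    page = first_page
    # The last marker before the quote tells us which page it sits on.
    for m in PAGE_MARK.finditer(opinion_text, 0, pos):
        page = int(m.group(1))
    return page


def pin_cite(volume: int, reporter: str, first_page: int, pin: int) -> str:
    """Format a basic pin cite, e.g. '100 F.3d 200, 202'."""
    return f"{volume} {reporter} {first_page}, {pin}"


text = "Opening text. [*201] More discussion. [*202] The key holding appears here."
pin = pin_page(text, 200, "key holding")
print(pin_cite(100, "F.3d", 200, pin))  # 100 F.3d 200, 202
```

Without the markers, every quote would fall back to the first page, which is exactly the problem with the sites reviewed below. A quote that straddles a page break would really need a page range; this sketch reports only the starting page.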

I can imagine a world where citing the paragraph number of a quote or legal point would be acceptable and sufficient. Paragraph numbers are even more specific than page numbers, after all, and they can be embedded in a decision automatically based on format recognition. We are not there now, however, and I don’t see how that transition would come about. The books judges refer to don’t have paragraph numbers, except in the rare instance where a judge numbers his findings of fact and conclusions of law, as in a bench trial ruling, and decisions can run to hundreds of paragraphs.
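In its simplest form, that kind of format recognition might just treat blank lines as paragraph boundaries, as in this sketch. Real opinions, with headings, block quotes, and footnotes, would need much more careful handling; the blank-line rule and the `¶ n` prefix are assumptions.

```python
import re


def number_paragraphs(opinion_text: str) -> list[str]:
    """Treat blank lines as paragraph breaks and prefix each paragraph with ¶ n."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", opinion_text) if p.strip()]
    return [f"¶ {i} {p}" for i, p in enumerate(paragraphs, start=1)]


for line in number_paragraphs("First paragraph.\n\nSecond paragraph.\n   \nThird."):
    print(line)
```

The numbering is trivial to generate; the hard part, as noted above, is getting courts and reporters to adopt it.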

Altlaw

Altlaw was a pleasure to search. It had a friendly interface, nothing unexpected happened when I explored the basic features, and it returned some high-quality results. My search for “in personam jurisdiction” yielded 1403 results, with the top case being Kulko v. California Superior Court, 436 U.S. 84. Compared to PreCYdent, the results dropped off rapidly in relevance after that, but not too badly. The U.S. Supreme Court cases even had pagination, although the other appellate cases did not. The Supreme Court pagination was a little hard to read, as no distinguishing font or other graphical element marked the page breaks. Its Advanced Search Options, providing access to the Boolean and proximity searching so beloved by hardcore Lexis and Westlaw users, are nicely laid out and fairly easy to use.

Its federal statute search is elegant and native to the application.

Altlaw “is a joint project of Columbia Law School’s Program on Law and Technology, and the Silicon Flatirons Program at the University of Colorado Law School.”

Public.Resource.Org

The Public.Resource source does not have a built-in search engine. If you drill down, you can find a particular case by clicking on “Courts” and then “c” and then drilling down to the right volume of the Federal Reporter or the U.S. Reports, marked as F1, F2, F3, and “US.” The resulting “hit” is a view of the text of the case, without any of the commentary, Key Numbers, or page breaks that appear in the original West reporters. As noted in Ambrogi’s article, it is possible to search this database in Google using the "domain" advanced search feature.

Type in "" plus your search. The scope of the caselaw is impressive, though not comprehensive. The “last” federal appeals court case by volume and page is Pharmastem v. Viacell, 491 F.3d 1342 (Fed. Cir. 2007), and the most recent case in that volume dates from July 30, 2007. With Thomson having issued volume 508 (now 510) of the F.3d series, that's about a 17-volume lag. This site is probably more useful to people building their own caselaw collections than to practitioners, because getting to the cases is fairly laborious and the relevancy ranking can’t (yet?) take advantage of Google’s page-ranking algorithms.

The Public.Resource database contains no references to page numbers except for the first page of the case.

Public Library Of Law

Compared to the other two entries, the Public Library of Law (PLOL) entry was clunky, brazenly “commercial,” and promised some features it did not really have. The full-text search was quite fast, but the “advanced” search didn’t live up to its name. I was solicited to upgrade to the $95/month FastCase service at every turn, which I suppose might be less annoying if I were getting really good results.

But this search had the fewest and worst-ranked results. There were 487 results for “in personam jurisdiction.” There was a relevancy ranking allegedly starting at “100%” for my first case, U.S. v. One Lear Jet Aircraft, 836 F.2d 1571 (11th Cir. 1988). Compared to Altlaw, let alone PreCYdent, I was unimpressed.

When I clicked on "Search Federal Rules of Appellate Procedure," the site tried to download a .pdf to my computer. Worse, it was an old version that doesn't include the amendments effective December 1, 2007. (Fed. R. App. P. 25 was amended to implement the privacy provisions of § 205 of the E-Government Act of 2002, which concern Social Security numbers and similar information in court filings.)

The “Federal Regulations” tab displays the GPO website containing the federal regulations in a frame, so it sort of looks like you're still on the PLOL web site.

The only advantage of this site is that if you sign up for the FastCase service, you can then link to the state caselaw that is allegedly available there. In short, stay away from this one.


thomas said...

Hi David

Thanks for the kind words!

You express curiosity about our technology, so allow me to tell you a bit more about it. Obviously the details are proprietary, but I can give you the general idea.

Right now, all the other legal search engines are natural language based. They decide if a case is relevant based on what words are in the case, how often, and so on. This is potentially quite powerful technology, and Westlaw natural language search especially shows it can be quite successful.

But it has its limits. PreCYdent takes a different approach. We mine the information in the legal citation network, using metrics developed by computer scientist and MacArthur Prize winner Jon Kleinberg, that measure the authority of a case in the network of legal citations. However, we discovered that these measures don't work so well by themselves, so we modified them to adapt them to the idiosyncrasies of the legal citation network, which is unlike the Web in important respects. We also measure several different dimensions of the network around each case and combine all these measures together in a unique way. It's fair to say there is a lot of number crunching behind each search, but we have figured out ways to make it manageable on limited computational resources.

Our tests indicate our search results are (roughly speaking) about four times better than Westlaw natural language search (the best after us) in terms of recall and precision, standard measures in the search industry.

I hope that helps some! Feel free to email me anytime with feedback, comments etc at tacsmith at precydent dot com. That goes for your readers as well.
Tom Smith
PreCYdent Inc.
(& law prof, USD)
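Tom's reference to Kleinberg's authority metrics presumably points to Kleinberg's hubs-and-authorities algorithm (HITS). Below is a minimal sketch of textbook HITS on a citation graph. It is not PreCYdent's algorithm, which Tom says was modified and combined with other measures; the graph and case labels are hypothetical.

```python
import math


def hits(graph: dict[str, list[str]], iterations: int = 50):
    """Textbook HITS on a citation graph: graph[case] = cases that case cites."""
    nodes = set(graph)
    for cited in graph.values():
        nodes.update(cited)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: a case is authoritative if good hubs cite it.
        auth = {n: 0.0 for n in nodes}
        for citing, cited_list in graph.items():
            for cited in cited_list:
                auth[cited] += hub[citing]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: a case is a good hub if it cites authoritative cases.
        hub = {n: sum(auth[c] for c in graph.get(n, [])) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return auth, hub


# Hypothetical graph: decisions A, B, and C cite leading cases X and Y.
citations = {"A": ["X", "Y"], "B": ["X", "Y"], "C": ["X"]}
authority, hubness = hits(citations)
print(max(authority, key=authority.get))  # X
```

This also shows why such a search can surface a leading case even when the search terms barely appear in it: the ranking comes from the citation network, not from word counts.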

Unknown said...

Hi David

We just want to let you know that we upgraded portal.
Now you can find the most recent cases from federal and state jurisdictions, and we have also assigned official reporter citation codes to about 200,000 state opinions.

Thanks for your attention and interest

Best regards