The Recommendation

snippet.png
Each Persai recommendation looks a lot like an everyday search result.

The basic concept behind Persai.com is that you can "accept" or "reject" the content that is pushed to you.  Accepting a recommendation means you'd like to see more of that kind of content.  Rejecting a recommendation means you don't want to see more of that kind of content.

When you click through on the title of a recommendation, that is implicitly taken as an acceptance of the content and fed as a positive signal to our machine learning algorithm.  Clicking on the red "reject" button at the bottom of the recommendation has the opposite effect and provides our system with a negative signal.

So how does your Persai experience improve?  Use it.  Read the articles that are interesting to you and explicitly reject the irrelevant ones.  Don't fret over trying to manage the training of your interest; just use the thing. 

There is not a one-to-one correlation between positive and negative signals.  I mention this so that users don't assume every accepting click-through carries the same weight as a rejection.  It doesn't work like that, so read to your heart's content, reject the duds, and we'll figure out the rest.
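
To make the idea of unequal signals concrete, here is a minimal sketch. It is not how Persai's learner works; the `Interest` class, the per-term scoring, and both weight values are assumptions made up for illustration. The only point is that an accept and a reject do not have to cancel each other out.

```python
from dataclasses import dataclass, field

# Hypothetical weights: the post only says the two signals are not one-to-one.
ACCEPT_WEIGHT = 1.0   # assumed value for a title click-through
REJECT_WEIGHT = -3.0  # assumed value for the red "reject" button


@dataclass
class Interest:
    """Running per-term scores for one interest, updated from feedback."""
    term_scores: dict = field(default_factory=dict)

    def record(self, terms, accepted):
        """Add the accept or reject weight to every term in the article."""
        weight = ACCEPT_WEIGHT if accepted else REJECT_WEIGHT
        for term in terms:
            self.term_scores[term] = self.term_scores.get(term, 0.0) + weight

    def score(self, terms):
        """Sum the learned scores of the terms in a candidate article."""
        return sum(self.term_scores.get(term, 0.0) for term in terms)


interest = Interest()
interest.record(["economy", "fed", "rates"], accepted=True)   # click-through
interest.record(["economy", "celebrity"], accepted=False)     # explicit reject
```

With these made-up weights, one click-through does not undo one rejection: a term that appears in both articles ends up with a negative score.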

Finally, I'd like to explain the other features of the recommendation.  The green text on the bottom left displays the domain name of the article.  It is also a direct link to the article.  The key difference between this domain link and the title link is that the domain link doesn't register an acceptance in our system.  It serves as a way to obtain the raw link to the recommendation so you can send it to a friend or paste it in an email.  The green feed icon to the right of the domain is a link to the RSS/Atom feed we found the particular article in.  If you don't know what the terms RSS and Atom are, that's okay because it doesn't really matter.  Just enjoy the cool green icon.

What Is Persai? Part II

In December of last year, I wrote a post explaining a little bit about Persai.   I talked about Persai as a content recommendation system that learns from user feedback and can better recommend content.  We have since launched this as a web service in private beta, and are steadily inviting more and more people who signed up for an account.

This, however, is not the end of the story.  The next chapter, the one that starts today, is advertising.

The obvious way to "monetize" (turn a profit from, in Silicon Valley speak) a web service is to sell ad space.  Usually, when you want to do this, you need to go to one of the big players: the Googles, the Microsofts, the AdBrites.  We, however, have decided to go it alone.

Our system is designed to recommend textual content.  That content can be almost anything: a news article, a blog post, or an advertisement.  Starting today, we will be showing contextual product ads inline with recommended content.

For example, this is a screenshot from my Persai interest about "US Economy":


us_economy_ads.png
As you can see, the recommended books are very relevant to the context.  This targeting is done solely by Persai's recommendation system.  As a consequence of this, Persai will learn to show you better ads as you use it.

There is still more to this story, but those chapters are still in active development.

Demand

Wow.  The past few days have been kind of crazy.  My inbox is flooded with invite requests and email signups on Persai.com continue to roll in at an accelerated rate.  One thing is clear: people want to give Persai a try.

We have spent the past few days slowly expanding the beta and carefully watching server logs.  User feedback has been great so far and we've been able to fix the inevitable bugs that come with rolling out a production service.

I know there are thousands of you patiently waiting for invite codes.  We're working our hardest to handle the growth and scale this machine learning problem.

First Batch Of Invites Sent Out Tonight


Persai is finally ready for public consumption, albeit on a limited basis. We are sending out the first batch of invite codes tonight to a subset of the people who signed up at persai.com. This is a beta release, so there will be new features rolling out over the next few weeks.

If you are one of the people to get an invitation, we would appreciate all the feedback you can give us. More and more invitations will be sent in the coming days, so if you didn't get one tonight, sit tight.

It's been a long road to this point. Matt, Kyle, and I are all excited to finally have people beyond our friends and family using Persai. We hope you enjoy it.

Coming In January 2008


You heard it here first. Persai will start its public beta in January 2008.

Most of it is done, but we still have a couple of features to add. First and foremost, we are restructuring some of the data flows in the backend, as we have come to the stark realization that MySQL will beg for mercy when there are a nontrivial number of users in the system (Persai throws around a lot of data).

Secondly, we are implementing clustering, or, in common parlance, "related stories". The theory is that you don't want to read 30 stories about how Facebook flubbed the Beacon release; one will do. Of course, if you really are that interested, you can see all 30 stories, but we won't spam you with them.
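
For the curious, the "related stories" idea can be sketched in a few lines. This is only an illustration, not the clustering we are building: the Jaccard word-overlap measure, the greedy single pass, and the `threshold` value are all assumptions chosen to keep the example short.

```python
import re


def tokens(text):
    """Lowercased word set for a story title or body."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Overlap of two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_stories(stories, threshold=0.5):
    """Greedy single-pass clustering: each story joins the first cluster
    whose representative (its first story) is similar enough."""
    clusters = []  # list of lists of stories
    for story in stories:
        story_tokens = tokens(story)
        for cluster in clusters:
            if jaccard(story_tokens, tokens(cluster[0])) >= threshold:
                cluster.append(story)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([story])
    return clusters


stories = [
    "Facebook flubbed the Beacon release",
    "How Facebook flubbed the Beacon release",
    "Apple ships a new MacBook",
]
clusters = cluster_stories(stories)
```

Showing only the first story of each cluster, with the rest tucked behind a "related stories" link, gives the one-story-instead-of-30 behavior described above.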

So, we hope you all will have a good holiday & New Year's celebration. We'll meet you on the other side with an intelligent content aggregation service.

What Is Persai? Part I


There has been some speculation about what Persai is.

Well, I'm going to set the record straight, sort of. We're not ready to launch it yet, as the user interface is going through a few iterations and there are one or two features that are still failing JUnit tests, but I'll show you the idea.

Persai is a content aggregator that is specific to your interests. You specify a topic that you're interested in with a few words, and Persai will find new content relevant to that interest and recommend it to you. As you use Persai, it learns about you, and can better recommend content to you. Recommendations are based entirely on content; other users' feedback has no bearing on what Persai recommends to you. As an example of this, we have taken two "interests" in Persai, stylized them, and put them on their own domains.

These two sites are examples of content that you would see as a Persai user who is interested in Facebook and Apple. Of course, these sites are just for testing our classifiers; the persai.com consumer site will be much more interactive. Personally, I have been using it to follow news about the American subprime mortgage lending crisis and global warming.

I should note that the hardest part about creating these two sites was the CSS. Persai's classifiers were built in a few seconds, and in these cases, required no feedback to recommend relevant content. We've spent a lot of time on our bootstrapping process so that your recommendations are very relevant from the get-go.

We'll talk more about it and post some screenshots as things progress. It's in private beta right now, but we should be expanding the beta test in the coming months.

What Makes A Sentence?


If you do a standard Google search, you see a little snippet of text under each document title. How do they do that? I have no idea, but here's how we did it at Persai:

  • Step 1: Crawl some web pages. Protip: use Heritrix
  • Step 2: Parse the DOM of a web page. Considering some of the horrific HTML that's out there, this isn't as easy as it sounds. If you're in Java land, try HtmlCleaner. It will give you a clean DOM and is very configurable.
  • Step 3: Once you have all of the text nodes of the DOM, you need to figure out what the English sentences are. OpenNLP does an OK job at this, but it won't validate sentences as grammatically correct. As such, you can still get wonderful things like section headings in your sentences. However, many of the sentences it produces are correct.
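
Steps 2 and 3 can be sketched with nothing but the Python standard library. To be clear, this is a stand-in and not our pipeline: `html.parser` takes the place of HtmlCleaner, a naive punctuation split takes the place of OpenNLP's sentence detector, and the names `TextNodeCollector` and `candidate_sentences` are invented for the example.

```python
import re
from html.parser import HTMLParser


class TextNodeCollector(HTMLParser):
    """Collects visible text nodes, skipping <script> and <style> content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.text_nodes = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and self._skip_depth == 0:
            self.text_nodes.append(text)


def candidate_sentences(html):
    """Return candidate sentence strings from the text nodes of a page."""
    collector = TextNodeCollector()
    collector.feed(html)
    collector.close()
    candidates = []
    for node in collector.text_nodes:
        # Naive split after ., ! or ? followed by whitespace.
        candidates.extend(s for s in re.split(r"(?<=[.!?])\s+", node) if s)
    return candidates


page = (
    "<html><body><h2>Latest News</h2>"
    "<p>Rates fell today. Markets rallied!</p>"
    "<script>var x = 1;</script></body></html>"
)
candidates = candidate_sentences(page)
```

Note that the section heading comes out as a candidate alongside the two real sentences, which is exactly the problem described in Step 3.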

So if you've got a list of strings, some of which are valid sentences, some of which are not, how do you make the distinction? Well, you could volunteer your time to the OpenNLP project, but if you're not a linguist (fun fact: Persai doesn't have any linguists...yet), that doesn't really work.

Fortunately, we're a machine learning company, so we did here what we do best. Given a few thousand strings that are either sentences or not sentences, two guys, an afternoon, and a case of beer, we had a few thousand hand-classified training samples. Once we had these, we trained a classifier with the right feature set (secret sauce: sorry, can't give that away), and got a classifier that's in the ninetieth percentile for accuracy in k-fold cross-validation.
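
If you haven't seen k-fold cross-validation before, here is a toy version of the evaluation loop. None of it is the secret sauce: the three surface features, the lookup-table "classifier", and the ten labeled strings are all assumptions invented for this sketch. The part worth looking at is how each fold is held out in turn and scored against a model trained on the rest.

```python
from collections import Counter, defaultdict


def features(text):
    """Hypothetical surface features; not the real (secret) feature set."""
    stripped = text.strip()
    return (
        stripped[:1].isupper(),               # starts with a capital letter
        stripped.endswith((".", "!", "?")),   # ends with terminal punctuation
        len(stripped.split()) >= 4,           # at least four words
    )


def train(samples):
    """Toy classifier: majority label per feature tuple. Feature tuples
    never seen in training are assumed to be 'not a sentence'."""
    votes = defaultdict(Counter)
    for text, label in samples:
        votes[features(text)][label] += 1
    table = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}
    return lambda text: table.get(features(text), False)


def k_fold_accuracy(samples, k=5):
    """Average held-out accuracy; fold i holds out samples i, i+k, i+2k, ..."""
    accuracies = []
    for i in range(k):
        held_out = samples[i::k]
        training = [s for j, s in enumerate(samples) if j % k != i]
        predict = train(training)
        correct = sum(predict(text) == label for text, label in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


# Hand-labeled toy data: True means "this string is a sentence".
samples = [
    ("Rates fell sharply again today.", True),
    ("Latest News", False),
    ("Markets rallied after the announcement.", True),
    ("Home | About | Contact", False),
    ("The bank cut its forecast.", True),
    ("Related Stories", False),
    ("Persai learns from your feedback.", True),
    ("Posted in Economy", False),
    ("We crawl a lot of pages.", True),
    ("Sign In", False),
]
accuracy = k_fold_accuracy(samples, k=5)
```

Ten easy strings say nothing about real-world accuracy, of course. The same loop over a few thousand hand-labeled samples is where a number like ours comes from.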

Score one for classifiers.

We've been using Hadoop MapReduce extensively at Persai, and debugging MapReduce tasks has always been a problem area. Our best approach so far has been to tail log4j files. Today we tested a large processing job with around 10,000 records of input, and the reduce task ran much slower than expected. Tired of deciphering log statements, we decided to search for a better solution.

Success came in the form of YourKit, an awesome tool for Java profiling. You can find screenshots and more information about YourKit's features on their website. Suffice it to say, YourKit is easily the best Java profiler I've ever used, and a must-have tool for any serious Java developer. A personal license is only $99, and commercial licenses are reasonably priced as well.

When it comes to MapReduce, YourKit saves the day by allowing you to profile applications running on remote hosts. To configure your jobs for profiling, add the following line to your hadoop-env.sh (with the appropriate paths and platform for your deployment):

export LD_LIBRARY_PATH=/home/someguy/yjp-6.0.16/bin/linux-amd64

Then add the following property to your hadoop-site.xml:

<property>
  <name>mapred.child.java.opts</name>
  <value>-agentlib:yjpagent</value>
</property>

There you have it. After launching a MapReduce job, fire up the YourKit UI and select the remote profiling option. You may be surprised what your MapReduces have been doing.

Welcome to our startup

Hi, we're Persai, a fresh new internet company based in the Bay Area.  Persai stands for "personal ai" where AI stands for Artificial Intelligence.  Our goal is to become the matching and recommendation technology of the internet.  Our focus is on leveraging machine learning techniques to improve consumer and business user experiences with large corpora of data.  We live in a world of too much information and we're trying to create the solution.

The three of us are finally working on Persai full time after 7 months of development.  Stay tuned to this blog as we detail our development experiences and reveal our product.

The Team

{kyle,matt,ted}@persai.com
