In this age of social media, you may have heard the term “social graph” bandied about. But what is it, and how is it used? Simply put, a social graph is your online social footprint. More specifically, it is the historical record of the relations between you and your connections, locations, items and establishments.
Although social interrelation research has been done for years in research institutions and government agencies, its application in the business world is relatively new, partially driven by social sites such as Foursquare, Twitter and Facebook. These sites use your social graph to infer your interests and needs, both to tailor their services to you and to target advertising.
Business Implications:
Is social data powerful? Read the article, “Facebook Can Predict with Scary Accuracy If Your Relationship Will Last,” to find out just how powerful it is when stored and analyzed properly. With a wealth of information and applications at their fingertips, social sites have the ability to customize the experience to your needs, interests and life stage, thus increasing engagement. These insights may also be used for advanced advertiser targeting, which, in turn, helps the sites command a premium in advertising fees.
Although social sites predominantly are driving the charge on the business application of social data, there are plenty of business (and government) applications that could kick-start the evolution in the coming years. Those include:
- Telecommunications: These companies collect and manage data in much the same way as social sites. They know who you are connected to, where you travel, and even your interests, based on locations and other means. Telecommunications companies may use this data to enhance engagement and retention strategies to ensure that you are a long-term customer.
- Media Web Sites: Fantasy sports is extremely popular in the United States. It is estimated that more than 20% of the male population between 18 and 50 participates in fantasy sports annually (have you ever filled out a March Madness bracket?). These fantasy games generate significant advertiser revenue for media sites, and the revenue is directly related to the number of fantasy players on the site. Understanding the leagues and the relations between players allows these sites to focus their efforts on the primary influencers.
- National Security Agency (NSA): Ever hear of Edward Snowden? I’m not much into conspiracy theories, but I’m confident that the NSA may have the most in-depth social graph on a segment of the population. The NSA uses this data, among other means, to identify and locate criminals and terrorists.
Storing Graph Data:
Knowing your interests is valuable, but knowing how your social group and your interests have evolved over time is far more so. It is this historical view of you, and your resulting graph, that allows social sites not only to understand current interests and recent changes in interests but also to predict your future interests and actions.
Graph databases are designed for highly connected data that does not fit a relational schema well. As with other distributed NoSQL systems, performance needs must be considered when choosing one. Several graph databases exist that allow for the efficient storage of inter-relational / graph data, some of which are:
- Neo4j: The most widely used graph database. A free community edition is available and external ‘community’ support exists. An R (open source analytic solution) plug-in is available, so you may analyze Neo4j data with R. If you need a pure graph database, add Neo4j to your evaluation list.
- OrientDB: This document database is similar to MongoDB but adds graph database extensions. OrientDB currently does not have an R interface package (like Mongo’s R interface, RMongo), but it does support SQL-like queries, allowing for a quick analyst learning curve.
- Apache Giraph: A fully open source Apache solution, Giraph is strictly a graph processing framework rather than a database, and it is the system that Facebook has used to analyze its social graph at scale. The primary advantage of Giraph is that it runs within Hadoop clusters and integrates well with quickly evolving SQL querying solutions such as Hive and Impala.
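To make the storage idea concrete, here is a minimal sketch in plain Python. It is not the API of any of the products above. It keeps each relationship as a timestamped edge in an adjacency list, so that both your current connections and your connections as of an earlier date can be read back, which is the historical view described earlier. The class name `TinySocialGraph`, its methods and the sample data are all hypothetical.

```python
from collections import defaultdict
from datetime import date


class TinySocialGraph:
    """Hypothetical in-memory graph: node -> list of (neighbor, relation, since)."""

    def __init__(self):
        self._adj = defaultdict(list)

    def connect(self, a, b, relation, since):
        # Store the edge in both directions so lookups from either node are cheap.
        self._adj[a].append((b, relation, since))
        self._adj[b].append((a, relation, since))

    def neighbors(self, node, as_of=None):
        # Neighbors whose edge existed on or before `as_of` (all edges if None).
        return sorted(
            other
            for other, _relation, since in self._adj[node]
            if as_of is None or since <= as_of
        )


# Made-up sample data: people and an establishment are all just nodes.
graph = TinySocialGraph()
graph.connect("ann", "bob", "friend", date(2012, 3, 1))
graph.connect("ann", "cafe_roma", "checked_in", date(2013, 6, 15))
graph.connect("ann", "cara", "friend", date(2014, 1, 10))

current = graph.neighbors("ann")
snapshot_2013 = graph.neighbors("ann", as_of=date(2013, 12, 31))
print(current)
print(snapshot_2013)
```

A real graph database adds persistence, indexing and a query language on top of this idea, but the core shape is the same: nodes, typed relations between them, and properties such as dates attached to each relation.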
Analyzing Graph Data:
Analysis of graph data requires specialized data transformation and analytic techniques. A few examples include:
- Social Network Analysis (SNA), specifically applied to people networks: This type of analysis determines the interrelations between individuals and groups. For more detail on social network analyses, see this recent SNA blog post.
- Predictive Models: Your historical transactions and interests, social network makeup, and travel habits allow companies (and agencies) to predict future interests as well as locales.
- Social Segmentations: Graph data may be analyzed to determine segments or cliques of a population that have similar connections, interests and/or history. These cliques, and clique overlaps, may be further analyzed to determine the structure of the group. This has been done by police and government agencies to decipher the structure of criminal organizations.
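The three techniques above can be sketched on a toy network in plain Python. This is an illustration under stated assumptions, not how any particular site or agency does it: the names and friendships are made up, degree centrality stands in for the much broader field of SNA, the clique search is the basic Bron-Kerbosch recursion without pivoting, and the common-neighbor count is only a naive stand-in for a real predictive model.

```python
from itertools import combinations

# Hypothetical, undirected friendship edges.
edges = [
    ("ann", "bob"), ("ann", "cara"), ("bob", "cara"),
    ("cara", "dev"), ("dev", "eli"), ("dev", "fay"), ("eli", "fay"),
]

# Build an adjacency map: person -> set of direct connections.
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

# SNA: degree centrality, the share of other people each person is tied to.
n = len(adj)
centrality = {person: len(friends) / (n - 1) for person, friends in adj.items()}


def maximal_cliques(r, p, x, found):
    """Basic Bron-Kerbosch recursion (no pivoting); collects maximal cliques."""
    if not p and not x:
        found.append(sorted(r))
        return
    for v in list(p):
        maximal_cliques(r | {v}, p & adj[v], x & adj[v], found)
        p = p - {v}  # v is done: remove it from the candidates
        x = x | {v}  # and exclude it from later cliques at this level


# Segmentation: every group in which everyone knows everyone else.
cliques = []
maximal_cliques(set(), set(adj), set(), cliques)
cliques.sort()

# Prediction (naive): unconnected pairs scored by number of shared connections.
candidates = {
    (a, b): len(adj[a] & adj[b])
    for a, b in combinations(sorted(adj), 2)
    if b not in adj[a] and adj[a] & adj[b]
}

print(centrality)
print(cliques)
print(candidates)
```

In this toy network the two people who sit in more than one clique are also the ones with the highest centrality, which is the kind of overlap an analyst would look at when deciphering the structure of a group.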
Remember, you leave a social footprint any time you’re online, and the resulting graph data is incredibly valuable to social sites and brands alike. If used properly, the data could keep you engaged whether you like it or not (think Candy Crush!). Your social graph may even be used to hunt you down if you are a known criminal or terrorist.
There is good reason why Facebook called its social graph its third pillar. Research institutions, social sites and government agencies are predominantly driving the application of this social data, but other industries will benefit as well. Consider yourself forewarned!
Roman Lenzen is Director of Analytics at Quaero.