05-04-12 | Blog Post
Big data is just what it sounds like: an immense amount of data. From social networks to genomics to medical records, big data is everywhere and growing rapidly. The technology for managing it must keep advancing, because without the ability to analyze these large data sets efficiently and produce results, they would be of little use. Federal agencies have announced $200 million in research and development investments that will help them mine, process and store big data.
The National Cancer Institute is funding a $10.5 million project, managed by UC Santa Cruz, for a supercomputer that will store the genetic codes of tumors from 10,000 patients, with the aim of revealing the mutations that trigger uncontrolled cell growth. The Cancer Genomics Hub (CGHub), said to be the world’s largest repository for cancer genomes, is intended to make it easier to sift through this data for the gene mutations that cause tumors and to compare across data sets, significantly reducing the time it takes to analyze data sets and produce results.
To get an idea of why big data is so big: according to the Oakland Tribune, each tumor’s DNA record is 300 billion bytes (300 gigabytes), which has to be compared to a normal genome (also billions of bytes), plus the sequence data from RNA, all adding up to nearly a terabyte for each case.
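Byte counts like these are easy to mix up across units. Here is a minimal sketch of the arithmetic, assuming decimal (SI) units where 1 GB = 10^9 bytes, 1 TB = 10^12 bytes and 1 PB = 10^15 bytes, and treating "nearly a terabyte" per case as an upper bound of 1 TB. The helper name `to_unit` is hypothetical.

```python
# Decimal (SI) storage units: 1 GB = 10^9 bytes, 1 TB = 10^12, 1 PB = 10^15.
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15}


def to_unit(num_bytes: int, unit: str) -> float:
    """Convert a raw byte count to the given decimal unit (hypothetical helper)."""
    return num_bytes / UNITS[unit]


tumor_dna_bytes = 300 * 10**9      # 300 billion bytes per tumor DNA record (from the post)
per_case_upper_bytes = 10**12      # "nearly a terabyte" per case, taken as an upper bound
patients = 10_000                  # patient count cited for CGHub in the post

print(to_unit(tumor_dna_bytes, "GB"))                  # 300.0
print(to_unit(per_case_upper_bytes * patients, "PB"))  # 10.0
```

By this arithmetic, a single 300-billion-byte tumor record is about 300 GB, and 10,000 cases at up to a terabyte each comes to roughly 10 petabytes at most.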
Beyond scientific breakthroughs, aggregating and analyzing healthcare data sets can also improve patient care. Digital records stored in electronic medical record or electronic health record (EMR/EHR) systems can be mined to detect patterns in care. These patterns can help the healthcare industry automate parts of the patient-care workflow and bring it up to speed with the technological progress of other industries.
Hospitals and healthcare software companies also need storage-intensive hosting solutions for systems such as PACS (Picture Archiving and Communication Systems), which store and process medical images, including X-rays, MRIs, CAT/CT scans and others. A high-capacity, HIPAA-compliant cloud with a managed SAN (Storage Area Network) can offer a scalable solution to healthcare’s big data needs.
Social media produces a vast amount of user-generated data, collected from many sources including mobile phones, and managing and analyzing that content demands an intelligent approach. DataSift, a U.K.-based startup, was launched to handle this volume of social media data by analyzing feed data based on pairing related quantifiers and keyphrases.
The company intends to use monitoring and data analysis to measure customers’ intent to buy, helping sales teams and companies build financial models around customer conversations. DataSift’s Head of Client Services even declared the last week of April to be Big Data Week; sponsored by Oracle and EMC, it brought together meetups and communities in three countries to discuss big data innovations and startups.
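DataSift’s actual filtering engine and scoring method are not described here. As a rough illustration of the general idea only, the sketch below pairs product keyphrases with intent phrases and reports what share of product-related posts also express an intent to buy. The phrase lists, sample posts and function names (`mentions_any`, `intent_share`) are all hypothetical.

```python
import re

# Hypothetical phrase lists; a real system would curate or learn these.
PRODUCT_PHRASES = ["new phone", "laptop", "tablet"]
INTENT_PHRASES = ["want to buy", "going to buy", "thinking of getting", "where can i get"]


def mentions_any(text: str, phrases: list[str]) -> bool:
    """True if any phrase appears in the text as whole words (case-insensitive)."""
    return any(
        re.search(r"\b" + re.escape(p) + r"\b", text, re.IGNORECASE) for p in phrases
    )


def intent_share(posts: list[str]) -> float:
    """Fraction of product-related posts that also contain an intent phrase."""
    product_posts = [p for p in posts if mentions_any(p, PRODUCT_PHRASES)]
    if not product_posts:
        return 0.0  # no product mentions, so no measurable intent
    with_intent = [p for p in product_posts if mentions_any(p, INTENT_PHRASES)]
    return len(with_intent) / len(product_posts)


posts = [
    "I want to buy a new phone before my trip",
    "My laptop battery died again",
    "Thinking of getting a tablet for the kids",
    "Great game last night!",
]
print(intent_share(posts))  # 0.6666666666666666
```

Here two of the three product-related posts pair a product with an intent phrase. Simple phrase matching like this misses sarcasm, negation and misspellings, which is part of why more sophisticated analysis is needed at scale.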
While brands have long tracked social media for mentions and support-related issues, entrepreneurs are taking it a step further by developing new and more meaningful ways to analyze big data in social media to shape and influence business decisions.
Twitter recently announced its plan to team up with the UC Berkeley School of Information to develop and teach a class entirely about analyzing big data, aptly named Analyzing Big Data with Twitter. The course description details the topics, including applied natural language processing algorithms such as sentiment analysis, large-scale anomaly detection, real-time search and more. Students will receive guidance from Twitter engineers on programming-intensive projects that include building apps and analyzing social media data.
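To give a flavor of one course topic, anomaly detection, here is a minimal sketch that flags minutes whose mention count sits more than a chosen number of standard deviations above the series mean (a z-score threshold). This is not Twitter’s method. The mention counts, the threshold of 2.0 and the function name `spikes` are hypothetical, and a real system would likely need rolling baselines and far more data.

```python
from statistics import mean, pstdev


def spikes(counts: list[int], threshold: float = 2.0) -> list[int]:
    """Return indices whose value is more than `threshold` std devs above the mean."""
    mu = mean(counts)
    sigma = pstdev(counts)  # population standard deviation of the whole series
    if sigma == 0:
        return []  # a flat series has no outliers
    return [i for i, c in enumerate(counts) if (c - mu) / sigma > threshold]


# Hypothetical mentions per minute for a single keyword.
mentions = [12, 15, 11, 14, 13, 12, 95, 14, 13, 12]
print(spikes(mentions))  # [6]
```

The burst of 95 mentions at index 6 is about three standard deviations above the mean, so it is the only minute flagged.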
Beyond the hype, big data has the potential to put hard facts and real figures behind scientific research, business development and healthcare management.
Cancer Genome Data Center Raises Hope for Cures
DataSift Exploits Big Data for New Insights Into Customers
Twitter Teams Up with UC Berkeley to Teach Students About Big Data
White House Launches Government-Wide Investment in Big Data
World’s Largest Hub for Cancer Genomes Opens
Cancer Genomics Hub – UC Santa Cruz