00:00:18.240
coming so before I get started I want to just uh ask a quick question which is how many people in the audience right
00:00:24.320
now either already use big data or want to learn how to use big data so they can put it on their resume so they can get
00:00:30.320
three times more recruiter emails exactly Perfect all right that's
00:00:35.399
pretty much what I'm going to talk about today how to optimize your recruiter emails uh building data driven products
00:00:41.360
using Ruby so you're probably wondering who is this guy why should I listen to him
00:00:47.039
right he looks like he's about 12 years old his voice hasn't even broken yet so I studied uh computer science and
00:00:53.239
bioinformatics at UCSD before I eventually uh dropped out so I could kind of join the startup scene in San
00:00:58.320
Francisco and right now I'm currently a data scientist at Sharethrough which basically means I'm an engineer who also
00:01:04.360
happens to be good at math and since I live in Silicon Valley they decided to call me a data scientist the company I work for Share-
00:01:11.439
through is a native video advertising platform what that means is we consume
00:01:17.520
large amounts of data about users all over the web so that we can customize ad
00:01:22.560
experiences to hopefully make ads as a whole suck less right which is good because it means I basically get paid to
00:01:29.479
use data to improve my business's bottom line so really you want to listen to me because somebody pays me to do
00:01:35.520
this so I must kind of know what I'm talking about at least so my goal of this talk is to help
00:01:42.040
you answer the following four questions what is a data driven product what does the development cycle look like for a
00:01:47.560
data driven product where does Ruby fit in this new world of data science and how can Ruby be improved to stay
00:01:54.040
relevant in the age of big data but I just want to let you know I'm not going to talk about whether you should use
00:02:00.360
like support vector machines or some new type of regression or whether you want to use principal component analysis this
00:02:06.360
is really about talking about how Ruby fits in and how we build data-driven products as a
00:02:12.080
whole so before I get started I want to give you a couple warnings Ruby is not your only option
00:02:19.440
right the world of big data right now is pretty much a minefield you have so many things to choose from right we have Hive
00:02:25.319
Pig there's R Scala Cascading Python Java right all these tools in this
00:02:30.680
ecosystem right now and really you're not going to use just one of them so in my day-to-day job I actually use a
00:02:36.800
combination of Ruby Python Java R and even a little bit of Scala just to do what I do on a day-to-day basis right
00:02:43.360
the key is it's all about picking the right tool for the right job but since this is a ruby conference I'm really
00:02:48.640
going to talk about the ways I see Ruby fitting in and the places in the data driven product cycle where Ruby really
00:02:55.280
is a good fit so now we've got those warnings out of the way we'll first start with
00:03:01.280
what is a data driven product right a data driven product is really anything
00:03:06.480
that uses data to improve the bottom line of your business right so it could be a standalone product right where the
00:03:12.000
whole company just does data some examples of these are like Boundary Mixpanel right Google to a certain extent
00:03:18.959
but I think it's more interesting to look at the ways you can incorporate like particular data driven products
00:03:24.319
into a larger company offering right examples might be like ad targeting right uh product recommendations if
00:03:30.439
you're Amazon.com right which books you bought uh information aggregation and filtering so if you go on a news website
00:03:36.920
which articles you're likely to want to watch right we can see some examples of these all over the web at the moment
00:03:42.280
right we have GitHub the classic example right this kind of seems simple and
00:03:48.040
trivial when you really think about it what GitHub is doing is they're giving you data and they're letting you use data to understand how you interact with
00:03:54.680
their product and in understanding how you interact with their product they know you're going to become like more
00:03:59.920
attached to the platform as a whole right so using this simple graph I can do things like know when my engineering
00:04:05.680
team is most productive right and if I know when my engineering team is most productive I can make sure I don't schedule meetings during those times at
00:04:13.040
the same time I can check to make sure I'm not burning my engineers out making sure like they're not committing at 1:00
00:04:18.440
in the morning every single Saturday because nobody really wants that right the other classic example is
00:04:24.960
LinkedIn's People You May Know right so social networks realized a long time ago that a user's interaction with their
00:04:30.840
network was highly coupled to how dense their social network was on that given network so in order to help improve that
00:04:37.800
what LinkedIn did is they kind of pioneered this notion of People You May Know right so they use algorithms and they
00:04:43.360
use data based on how you interact with a network in order to target people they think you're likely to want to get to
00:04:51.240
know in doing this they're able to help improve engagement and overall improve the value of their product right because
00:04:57.039
LinkedIn makes money off advertising if users are on their site interacting with their friends they get more ad
00:05:02.080
impressions which means ultimately they make more money and the classic example is
00:05:07.479
advertising right like Google AdWords is probably one of the ultimate data driven products right they use data from your
00:05:13.800
search history they use the actual search term itself all to target advertising to you right and the cool
00:05:19.600
thing is by using this data they're able to create value for Google as a whole which lets them deliver the awesome uh
00:05:27.000
search results that we're so used to on a daily basis right using this data product they're able to make money for
00:05:33.560
Google but it doesn't have to be a product in the traditional sense another great thing you can do with data is
00:05:39.680
improve marketing for your company as a whole right this is a great example Facebook released this graphic back in
00:05:45.240
December of 2010 and they got tons of pickup right all the tech blogs covered it it was on TechCrunch Mashable and like
00:05:50.960
a bunch of people used it as their screen saver right just a simple product like this you wouldn't normally think of it as a product but when you kind of take
00:05:57.440
that step back what it's doing is it's giving Facebook a way to reach users right and if users see how connected
00:06:03.919
Facebook is as a whole they're more likely to want to join the network and most importantly Brands now realize how connected Facebook is because if a brand
00:06:10.680
manager sees this they're going to say like hey maybe I should spend some ad revenue like some of my ad budget on
00:06:17.160
Facebook so that's great you've told me exactly what a data driven product is filled it with lots of buzzwords but I'm
00:06:22.599
an engineer so I want to know how do I actually go about building something so I think building a data
00:06:28.280
driven product kind of comes down to this cycle which has four major steps right you start off with asking the
00:06:33.840
right question there's collecting and cleaning your data then you move on to building the predictive model and
00:06:39.560
finally you get to like publishing your results phase right the important thing to notice about this is it's not a
00:06:44.680
linear flow right it's not waterfall building data driven products is just like building every other product that
00:06:49.880
we're so used to building right like rails web apps we all do the agile thing right it's very much the same with data
00:06:55.879
but it's very hard to get out of that traditional like waterfall straight down model right like most people in a
00:07:01.000
traditional kind of enterprise background doing data stuff using traditional like business intelligence tools Teradata that kind of thing you
00:07:07.599
get in this kind of waterfall mode right you just like give me a question I'm going to answer it same technology all the time but I think it's important to
00:07:13.800
realize that to build good data products you need this cycle right because you might finally get to the published results phase like you print out this
00:07:20.080
graph and then the business says okay that's cool but we've kind of pivoted since then so your graph really has no
00:07:25.520
value now so you need to go back and do it again right same thing happens like you've built the Ultimate model tons of
00:07:31.199
work and then somebody in business development signs a new deal with a third party and all of a sudden you need
00:07:36.680
to integrate that new third-party data source because hey the business needs it we need to put it on our website we need
00:07:41.800
you to put it in your model all of a sudden you're going to have to go from that whole building a model phase back to cleaning and collecting data because
00:07:47.919
until you've done that you can't truly build the model so now that we have kind of the four main phases outlined we're
00:07:54.360
going to like take a step in and look at how we do each one
00:08:00.800
so the first phase is all about asking the right question right this seems really simple seems kind of trivial
00:08:06.120
you're probably wondering why I'm even talking about this right this is a Tech conference I said I was going to talk about Ruby but this is actually one of
00:08:11.400
the hardest phases right if you don't ask the right question no matter how awesome your Tech stack is no matter
00:08:17.039
what technology you pick it won't matter right if you're not answering a question that really helps the business you're
00:08:22.960
not delivering value and the whole point of like engineering and product is delivering value to your business right
00:08:28.520
which means you really need to focus on asking the right question and conveniently the only thing you need to
00:08:33.599
do that is English right you don't need Ruby you don't need python you don't need Java you don't need Hadoop to help
00:08:38.719
ask the right question what you need to do is you need to go out and you need to talk to the business right you need business context you need to know like
00:08:45.200
what makes a business run like what are Partnerships looking like what does the market look like as a whole right all these things are traditionally the
00:08:51.200
hardest things for engineers to do right we actually have to go out and talk to people which is quite challenging I don't personally like doing that that
00:08:56.800
much but we have to do them so that we can kind of get that first phase done so we can use Ruby to kind of help with
00:09:03.360
that do some exploratory stats but really at the end of the day that whole first phase you don't need any programming technology you just need to
00:09:09.000
talk to people so for a personal example right I kind of want to guide this whole process
00:09:15.519
through what I do on a day-to-day basis so the first example that I really have is the marketing department in the
00:09:21.680
business comes to me and they say okay Ryan I want a data dump of the percent of users on publisher X that I've also seen
00:09:27.480
on publisher Y right so I want to see how many users I've seen on Forbes that I later saw on The Awl or Business
00:09:34.720
Insider right I could do that that's simple that's like a very easy question but the problem is what value does that
00:09:40.920
really give them right a data dump is very simple so you kind of you take that step back which is the next box down
00:09:46.519
which is the thing they're really trying to ask the question they really want to know is what is the value of a user on an ad network right because if we can
00:09:53.399
determine the value of a user on an ad network we can better predict our revenue as a whole and we can better
00:09:59.680
gauge how much to charge an advertiser for each impression right but that's almost too big of a question right like
00:10:04.760
now I found this huge theme what do I actually do with that I can't really answer that so you maybe
00:10:10.079
take that one step down right which is in this case what is the supply of a user of a given type right so given a
00:10:16.839
user has been seen on publisher A what's the supply of users who will also be seen on publisher B right and most
00:10:23.800
importantly can we predict that given we've seen a user on one publisher that we'll see a user on another publisher
00:10:30.399
and that's kind of the question we chose to work on so once we have that question formulated we get down to the real code writing
00:10:36.480
phase which is phase two which is data collection and
00:10:41.959
cleaning so this is not very glamorous but you will spend 90% of your time doing data collection and cleaning right
00:10:47.560
no matter what anyone tells you right like in computer science class they're always like focus on math focus
00:10:53.279
on stats that's all great you're going to spend 90% of your time cleaning data data in the real world is very very
00:11:00.200
messy and the thing that really makes a difference between a good data product a good engineer a good data scientist and
00:11:06.279
someone who isn't very good is your ability to deal with and clean data so for example you would start off
00:11:13.320
with something like this right this is what my logs look like that I do most of my analysis on right you can kind of
00:11:19.800
read them but you can already see like there's some missing values we have the intentionally blank HTTP referer that's
00:11:25.880
awesome and the whole thing is just kind of messy right and what I need to do is I need to take this massive data I need to
00:11:32.160
output something that looks like this because I need this two-column CSV so I
00:11:37.839
can input it into my graph algorithms to actually determine what percentage of users are going to cross over to build the
00:11:43.279
product that the business really needs which is a predictive model for how many users are going to be seen across our
00:11:48.720
network right but for you guys how do you get your data right where does data come
00:11:54.519
from right now in the social web most data comes from these four sources right
00:12:01.079
we have server logs right your front-end boxes Rails boxes nginx boxes they're all producing tons and tons of logs you
00:12:07.880
have that you have third-party APIs right everyone loves to collect Twitter data right now everyone loves to collect
00:12:13.040
Facebook data you have web scraping right maybe they don't have an API or it's just a page you want to get some
00:12:19.040
information from so you're going to go out there and you're going to scrape that data and finally we have direct user input right you have like a
00:12:25.160
questionnaire you have a survey something where the user's directly giving you data the important thing to know about these
00:12:31.399
four sources that's very different from what we're used to is they all require programming skills right like none of this data is just conveniently handed to
00:12:39.120
you right like everything requires you to go out and write code to get it right and it could be even worse what happens
00:12:44.839
if someone gives you a PDF right they're like hey I gave you data you're like no you gave me a PDF I can't do anything with that but ultimately you're going to
00:12:51.199
have to pull data from all these sources in order to even start building a real product that helps your
00:12:56.720
business and this is really where Ruby comes in right Ruby has tons of tools to
00:13:02.600
make it possible for you to collect and clean data right this is just a sample of the tools I use on a day-to-day basis
00:13:07.880
right we have Nokogiri if you want to parse XML parse HTML right if you're doing any sort of scraping you're going
00:13:13.199
to spend tons of time using Nokogiri and even if you're dealing with old APIs that still use XML you're going to spend
00:13:18.440
tons of time using Nokogiri we have Savon which everyone loves right it's a SOAP client if you're
00:13:24.079
having to use an API that was built in the '90s and they haven't updated it to REST and you have to use SOAP Savon will make it a lot easier right we
00:13:30.480
have rest-client which makes it really easy to make HTTP requests because you're going to be making a lot of HTTP requests if you want to cross-reference
00:13:36.800
data sources right pretty much every API right now is HTTP-based especially if it's modern using REST so that will come
00:13:43.120
in very handy right we have pdf-reader because you will most likely have somebody come to you and want you to do
00:13:49.639
some analysis on a PDF like hey I have tons of data in a PDF I need you to extract it right Ruby makes that
00:13:56.759
possible and then finally we have Sinatra right like Sinatra is a great way if you just want to like quickly set
00:14:02.040
up a survey right like I could write a survey that I can put up on Amazon Mechanical Turk to like ask somebody their
00:14:07.079
opinion on the election in probably 10 minutes right
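Going back to the scraping side of that list, a minimal sketch using rest-client and Nokogiri might look like this — the URL and the CSS selector are hypothetical stand-ins for whatever page and markup you're actually after:

    require 'rest-client'   # gem install rest-client
    require 'nokogiri'      # gem install nokogiri

    # Fetch a page and parse it; the URL here is just a placeholder.
    html = RestClient.get('https://example.com/articles')
    doc  = Nokogiri::HTML(html)

    # Pull out every headline; the selector is an assumption about the page's markup.
    doc.css('h2.headline a').each do |link|
      puts [link.text.strip, link['href']].join("\t")
    end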
00:14:12.320
and then we also have Twitter right Twitter is a classic example and that's where we're going to look at our first piece of code so like maybe a simple thing you
00:14:19.680
want to do is you want to say what's the word frequency from Hurricane Sandy right and the awesome thing about Ruby
00:14:24.920
is we can pretty much do that in like 11 lines of code given that we've already like configured our TweetStream client
00:14:31.720
right and if you just take a second to read that
00:14:39.120
code what you can see is that it's really very basic right Ruby takes this
00:14:44.399
hard process of cleaning and collecting data and makes it much easier to work with right so here we're already able to
00:14:50.480
do like basically the two hardest things which is one go out and collect the data right we're able to track all the
00:14:56.399
keywords that have the hashtag Sandy in them but it doesn't just do that we're also getting the cleaning phase
00:15:01.480
done right because no algorithm that's going to do interesting things on natural language is going to take like
00:15:07.399
really sentences right most of the models fundamentally want words or n-grams or some sort of representation and we
00:15:13.720
can already get that right Ruby makes it so easy we're able to take that entire sentence split it and then remove
00:15:19.519
blank strings like just like that right which means now we can simply output a comma-separated list of words which
00:15:25.959
would be awesome to input into the next phase in our algorithm
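The slide itself isn't reproduced here, but a rough reconstruction of that kind of script, assuming the tweetstream gem is already configured with your Twitter credentials elsewhere, looks something like this:

    require 'tweetstream'   # assumes TweetStream.configure has already been run

    counts = Hash.new(0)

    TweetStream::Client.new.track('#sandy') do |status|
      # Split the tweet into lowercase words and drop the blanks.
      words = status.text.downcase.split(/\W+/).reject(&:empty?)
      words.each { |word| counts[word] += 1 }
      puts words.join(',')   # the comma-separated list that feeds the next phase
    end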
00:15:31.000
and when we talk about collecting data we can't not talk about Rails right Rails is so easy if you need to collect direct
00:15:37.319
user input right like if you want to use Mechanical Turk right for sentiment analysis anything like that Rails makes
00:15:42.680
that so easy there's an awesome open source project from Twitter called Clockwork Raven right which has gotten a
00:15:48.680
lot of press it's basically a way to submit jobs to Amazon Mechanical Turk get feedback and then refine which users
00:15:55.279
you think do the best job right so that's all written in Rails and there was a funny quote the other day from one of their data scientists he said I was
00:16:01.720
trained as a classical scientist but I spend most of my time writing Rails because what it comes down to is he
00:16:07.160
spends most of his time writing the Frameworks he needs to collect the data so that he can even do the complex
00:16:12.199
analysis that he spent all that time in school learning how to do like that's great but you said you're
00:16:18.040
going to talk about big data and my data is Big Data right everyone right now wants to say their data is bigger than everyone else's and it's kind of yeah we
00:16:26.120
have to get used to like this buzzword and realizing that just because your data is Big Data doesn't mean you can't
00:16:31.199
use Ruby right everyone says well Ruby can't scale it turns out some smart guys created this thing called Hadoop which
00:16:38.199
means you can make Ruby scale right so Hadoop is Java I don't really like writing Java personally so if I can get
00:16:44.519
away with it I'm going to not write Java and for a lot of tasks you don't really need Java in order to write Hadoop
00:16:52.240
jobs so if you remember those log lines from before when there's two of them it's pretty easy to see which pieces of
00:16:57.759
data are missing right we can see that it's pretty easy blank HTTP referer there's a couple missing values but in reality
00:17:04.280
in production if you're dealing with big data your logs basically look like that right it's an indiscernible mass of
00:17:09.520
text there's no way you can go through and manually inspect every single log line to see like hey somebody introduced
00:17:14.600
a bug in the client and all of a sudden my data is not being collected right and that's kind of where
00:17:19.760
Hadoop comes in and that's where we can really use the power of Ruby because Ruby makes it very easy to do simple
00:17:25.520
tasks across humongous clusters of nodes so here's an example from those log
00:17:31.480
lines I happen to know that at a given point there was a bug introduced in our client and lines were being passed back
00:17:37.720
to me without user IDs and without user IDs there's not really much interesting stuff you can do if you're an ad company
00:17:43.320
because you need to track users but in basically three lines of real Ruby plus a header we can basically take 10
00:17:51.520
billion log lines and just output the log lines that have missing user IDs right and that might only be 10,000
00:17:58.440
log lines right so what I can do is I can use Ruby to take this massive data that's basically indiscernible and distill
00:18:04.799
it down into something small right once I have those 10,000 log lines I can probably download those onto my
00:18:10.559
local machine see like what time did the log lines start appearing right what time was the bug introduced what time
00:18:16.280
was the bug solved right one of the best things you can try and do one of the things Ruby is so good at is taking big
00:18:22.000
data and making it small right because you really want to make your data much smaller to make it much easier to work with
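The actual script isn't in the transcript, but a minimal sketch of that kind of streaming filter might look like this — the tab delimiter and the position of the user ID field are assumptions about the log format:

    #!/usr/bin/env ruby
    # Hadoop Streaming pipes each log line to stdin; keep only the broken ones.
    STDIN.each_line do |line|
      fields = line.chomp.split("\t")             # assumption: tab-delimited logs
      puts line if fields[2].to_s.strip.empty?    # assumption: user ID is the third field
    end

You'd ship that as the mapper to the Hadoop Streaming jar with no reducer, and the job's output directory becomes the small file you pull down to your laptop.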
00:18:27.480
so when it comes to Ruby and Hadoop there's kind of three good options at
00:18:33.080
this point there might be more but these are the kind of three mature options that we've experimented with and used
00:18:38.600
right first off you have vanilla Hadoop streaming that's a script I just showed you basically Hadoop will take every
00:18:44.880
line that it sees as input and just output it to standard out and then your Ruby script can just read it in and do whatever you want with it right this
00:18:51.280
gives you ultimate flexibility and power but tons of boilerplate code to write so if your data is serialized you have to
00:18:57.360
deal with all the deserialization right trivial things like I want to do a group by and a count right that becomes hard
00:19:04.240
because all of a sudden you have to write your own Ruby code to do a group by and a count and it's a distributed count so it's very hard conveniently
00:19:11.320
the guys at Infochimps wrote this cool library called Wukong which is an abstraction on top of the vanilla Hadoop
00:19:17.640
streaming which makes it much easier to work with like uh Ruby on Hadoop if you
00:19:23.120
just want to use the streaming and regular Ruby and what they do is they give you this kind of
00:19:29.080
abstraction that makes it very easy to perform classic tuple operations right something like a group by a count a
00:19:35.320
distinct all those kind of things become very easy but streaming is not necessarily the most efficient way to
00:19:42.320
use Big Data it's definitely not the fastest way you can run a Hadoop job
00:19:47.480
really what you want is you want to drop down to Java right you want to drop down to the native libraries and there's cool Java libraries pretty heavily adopted
00:19:54.159
everyone really uses it these days called Cascading Cascading is all Java and though there's tons of verbosity it makes
00:19:59.799
it very easy to do the same kind of tuple operations right group by count those kind of things but the problem is
00:20:05.679
it's Java so I end up writing like 100 lines of constructors like AbstractFactoryFactoryGroupBy like
00:20:13.400
awesome so the guys at Etsy wrote this cool wrapper uh cascading.jruby right
00:20:20.440
and what it lets you do is it lets you write JRuby scripts that end up translating down
00:20:26.760
essentially to JVM bytecode right so really the power of Hadoop the power of
00:20:31.960
big data is JRuby right JRuby is an awesome project it gives you the full power of the JVM and it pretty much lets
00:20:38.200
you do anything with Hadoop you want to do in Ruby syntax right sure you'll take a performance penalty but most of us
00:20:44.200
aren't running on like Facebook and Google scale right we don't have like petabytes and zettabytes we have like maybe
00:20:50.640
terabytes right JRuby is fine for that you can use JRuby to write Pig UDFs right
00:20:55.919
maybe everyone already likes Pig maybe your company already has a Hive cluster you can use JRuby to write Hive UDFs
00:21:01.480
right like custom Hive functions say I want to do something like geocode an IP address right that becomes a lot
00:21:07.280
easier when you can write that using existing Ruby libraries and much like uh
00:21:12.559
much like the way Square rolls everything up before deploying it using their uh their framework you can kind of
00:21:18.559
do the same thing with Hadoop right you just bundle up your JRuby in a jar you just create a big uber jar and you just
00:21:23.640
ship it and Hadoop is fine to run your JRuby and we can really see the power of
00:21:29.039
that with this example so I'll let you read it and see if you can get the gist for what's going
00:21:43.240
on so yeah this script basically is the classic word count example right so to
00:21:48.760
give you a little bit of context the classic word count example if you use the raw Hadoop API I think ends up being
00:21:54.640
a couple hundred lines of code if you use just vanilla Cascading and you really compress it it'll probably look a
00:22:00.480
little bit shorter than this but it's a lot harder to understand right the cool thing here is this is all the Ruby code we're so used to doing right Ruby gives
00:22:06.960
us blocks we can pass them around it gives you anonymous functions things that Java doesn't have things that make it a pain in the ass to write Cascading
00:22:14.360
jobs right I have to create a class just for a custom function this script basically lets us take however much data
00:22:20.200
we want right split it into words and count them right something we all know
00:22:25.840
how to do on the command line JRuby gives you the ability to harness that
00:22:31.039
across a cluster of you know a billion nodes
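The cascading.jruby version from the slide isn't reproduced here, but for comparison, the same word-count logic expressed as plain Hadoop Streaming scripts in nothing but core Ruby is roughly:

    # mapper.rb -- emit "word<TAB>1" for every word that comes in on stdin
    STDIN.each_line do |line|
      line.downcase.split(/\W+/).reject(&:empty?).each { |word| puts "#{word}\t1" }
    end

    # reducer.rb -- Hadoop hands us the mapper output grouped and sorted by key
    counts = Hash.new(0)
    STDIN.each_line do |line|
      word, n = line.chomp.split("\t")
      counts[word] += n.to_i
    end
    counts.each { |word, n| puts "#{word}\t#{n}" }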
00:22:36.400
so Ruby as a whole is a powerful tool for data collection and cleaning right we
00:22:43.600
all know how to use Ruby to clean data in Unix right really the great thing about it is you can use those exact same
00:22:49.760
tools you're used to using every single day to build like data driven products right just because you have big data doesn't mean you can't do the stuff
00:22:55.720
you're so used to doing right you write a script you can run it on Hadoop you want to use JRuby you can harness the
00:23:00.919
full power of Hadoop right which just makes Ruby so powerful when you combine that with the ability to easily collect
00:23:06.120
data using things like Rails it's just such a natural fit for this hardest part of building a data driven product
00:23:12.679
because you will spend 90% of your time doing it so if you can use Ruby for the thing you spend 90% of your time doing
00:23:18.080
you already have a win really right so once it's all cleaned you kind of move on to the next phase right
00:23:25.240
that's the statistical modeling and the prediction that's kind of the glamorous phase right if you go to interview for a
00:23:30.640
job this is what they're going to ask you all the questions about right like what kind of distribution does my data have all that kind of stuff so it's
00:23:37.640
definitely the glamorous part and it turns out really what you're trying to do so in my personal example right I said
00:23:45.279
I need to be able to predict a user's likelihood of being on multiple Publishers right and what that really
00:23:50.799
means is I need a function that takes as input a user ID and a publisher X and I need to Output me the probability that
00:23:57.960
the user will be on publisher Y right so it's pretty basic right that's a pretty simple function
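As a toy illustration of that function, here's a minimal pure-Ruby sketch — it assumes the collection phase has already boiled the logs down to a hash of user ID to the set of publishers that user was seen on, which is a made-up intermediate format, not the real one:

    require 'set'

    # P(seen on publisher `to` | seen on publisher `from`), estimated from counts.
    def crossover_probability(users, from, to)
      seen_on_from = users.values.select { |pubs| pubs.include?(from) }
      return 0.0 if seen_on_from.empty?
      seen_on_from.count { |pubs| pubs.include?(to) }.to_f / seen_on_from.size
    end

    users = {
      'u1' => Set['forbes', 'business_insider'],
      'u2' => Set['forbes'],
      'u3' => Set['business_insider']
    }
    puts crossover_probability(users, 'forbes', 'business_insider')   # => 0.5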
00:24:03.960
but it's going to require me to do some stats right like that's where the kind of data stuff really comes in right I'm gonna have to do some
00:24:10.360
stats but Ruby sucks at statistical computing right like if you go on Stack Overflow and you post a question
00:24:16.000
about stats and Ruby somebody's gonna say don't be an idiot use R but it turns out who cares right like
00:24:24.120
most of your time is going to be spent cleaning and collecting your data anyway so the fact that Ruby might not be the best statistical programming language
00:24:30.799
doesn't really matter the other thing you can always say is have you actually tried running R in production it sucks right
00:24:38.440
like to take code and actually build a real product that your business is going to make money on you need to be able to monitor it you need to be able to deploy
00:24:44.679
it easily you need to know when an exception occurs all those things that Ruby already gives you you don't get when you try and
00:24:50.600
run languages such as R you can do it but it's definitely not the best way and
00:24:55.640
since we're all Rubyists we want to know how we can use Ruby to kind of do the statistical modeling so it turns out
00:25:02.240
while people may say Ruby sucks at Stats there's actually a pretty good selection
00:25:07.320
of libraries that let you do most of the stats you'll need to do to build the kind of products we're talking about
00:25:12.399
right statsample is an awesome library it basically goes through and implements most of the statistics functions you'd
00:25:18.360
ever really need right sure it will maybe lack some obscure genetics algorithm but at that point you should probably be
00:25:25.120
using another language anyway we also have SciRuby right which is a big push kind of an attempt to make Ruby
00:25:31.600
equivalent to SciPy and what it lets you do is that same stuff right it provides you with matrix libraries right all
00:25:37.600
these things that we need to do stats are there in Ruby right like we can use them and then most importantly we have
00:25:43.679
libsvm right so support vector machines are one of the most popular ways to do classification right now and
00:25:50.120
there's a pretty battle-hardened production-ready C implementation of SVMs called libsvm and there's a Ruby
00:25:56.760
binding for it right so if you go online and you Google Ruby libsvm you're going to find a good blog post by uh I think
00:26:03.960
Ilya Grigorik on how to use it and you'll just find tons of documentation right there's no reason we can't use these libraries because most of them are
00:26:10.000
in C there's Ruby wrappers on top of most of these hardcore libraries like libsvm
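As a taste, training and predicting with the Ruby binding looks roughly like this — this assumes the rb-libsvm gem and is adapted from memory of its README, so treat the exact accessors as approximate:

    require 'libsvm'   # gem install rb-libsvm

    parameter = Libsvm::SvmParameter.new
    parameter.cache_size = 1      # in megabytes
    parameter.eps = 0.001
    parameter.c = 10

    # Toy training set: two labeled feature vectors.
    examples = [[1, 0, 1], [-1, 0, -1]].map { |ary| Libsvm::Node.features(ary) }
    labels   = [1, -1]

    problem = Libsvm::Problem.new
    problem.set_examples(labels, examples)

    model = Libsvm::Model.train(problem, parameter)
    puts model.predict(Libsvm::Node.features([1, 1, 1]))   # which class does this look like?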
00:26:15.880
and if you really need to do some obscure stats we have libraries like RSRuby and RinRuby right that basically
00:26:21.720
lets you send Ruby objects over to R so if you need to do some obscure
00:26:26.760
complicated regression right like least angle regression that isn't in Ruby yet but you already have your data in Ruby
00:26:31.880
like from ActiveRecord say right you run a big query you have all these objects you can just pass those over to R and R can do the hard part the little bit
00:26:38.880
of statistical crunching you need and then give it back to you right
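A minimal sketch of that round trip with the rinruby gem, assuming R is installed locally — the regression is just a stand-in for whatever method you actually need, and the data would really come out of your ActiveRecord query:

    require 'rinruby'   # gem install rinruby; drives a local R process

    revenues = [120.0, 135.5, 150.2, 171.8]        # pretend this came from ActiveRecord

    R.assign 'revenues', revenues                  # push the Ruby array into R
    R.eval 'fit <- lm(revenues ~ seq_along(revenues))'
    R.eval 'slope <- coef(fit)[[2]]'
    puts R.pull('slope')                           # pull the result back as a Ruby number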
00:26:43.919
and again the other tool we really have is JRuby right I can't say enough
00:26:49.960
like JRuby really gives you the power to build data products in Ruby right you can harness the full power of the JVM
00:26:55.840
and when it comes to data stuff there's a JVM library for pretty much everything right like there's already a
00:27:00.960
Java library for hard matrix stuff sparse matrices pretty much every type of machine learning algorithm you want to use every kernel density method you
00:27:07.559
want to use they're already implemented in Java and the great thing about JRuby is it lets us write this nice wrapper on
00:27:14.080
the top of that stuff in a language we already all know right we already know how to use Ruby and since
00:27:19.559
the hard part has usually already been written there's no reason we can't put Ruby on top of it right there's no
00:27:24.799
reason we can't make it so that data products are approachable to all of our engineers
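For a flavor of what that wrapping looks like, here's a small sketch run under JRuby against Apache Commons Math — the jar path is hypothetical, and Commons Math is just one example of the JVM libraries you could lean on:

    # Run with JRuby; assumes you've downloaded the Commons Math jar somewhere.
    require 'java'
    require '/path/to/commons-math3-3.6.1.jar'     # hypothetical path
    java_import 'org.apache.commons.math3.stat.regression.SimpleRegression'

    regression = SimpleRegression.new
    [[1, 2.1], [2, 3.9], [3, 6.2]].each { |x, y| regression.add_data(x, y) }

    puts regression.slope        # JRuby exposes getSlope as plain old .slope
    puts regression.intercept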
00:27:30.159
so really although everyone will make fun of you for it there's no reason you can't do statistical modeling
00:27:35.960
in Ruby and once you have that you get to the final stage phase four which is
00:27:42.240
publishing results right this is where you actually make money if you don't get here you haven't really made the business any money which means you have
00:27:47.919
an academic research project not a product and this is really again where I
00:27:54.480
think Ruby is almost the perfect fit right so when it comes to publishing results if you really want to talk about a
00:27:59.919
product right you're going to end up building a web UI or a mobile app right
00:28:05.080
so here's an example that's uh Sharethrough's analytics dashboard right so that whole thing is all Ruby because
00:28:11.840
once you've done all that number crunching all that modeling you end up with actually really small data right and rails is awesome at taking something
00:28:18.679
from MySQL from Postgres and making it so that you can put it to the client right and once it's on the client you
00:28:24.799
have D3 right you have Highcharts all those things in JavaScript that we're all so used to using and we already know
00:28:30.960
how to use we just harness those right like once you have the data distilled once you've cleaned it you've modeled it
00:28:36.880
and you're ready to present it really it's just building a web app or a mobile app right here we have Yelp right
00:28:44.679
kind of the same thing they've done tons of behind-the-scenes coding so that they can give you good recommendations filter
00:28:49.840
out fraud all of those things are already there right we already have that
00:28:55.080
and the reason we have that is Rails right so just because we're talking about building data products and we're talking about big data doesn't mean we
00:29:02.440
can't use rails right when it comes to publishing your results often times you're going to end up building dashboards all those kind of things
00:29:09.039
Rails is great at them
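Just to show how small that last mile can be, here's a hypothetical Sinatra endpoint serving already-aggregated overlap numbers as JSON for a D3 or Highcharts front end — the data structure and publisher names are made up:

    require 'sinatra'   # gem install sinatra
    require 'json'

    # Pretend the modeling phase left us this small, pre-aggregated table.
    OVERLAP = { 'forbes' => { 'business_insider' => 0.18, 'the_awl' => 0.07 } }

    get '/overlap/:publisher' do
      content_type :json
      (OVERLAP[params[:publisher]] || {}).to_json   # chart this directly on the client
    end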
00:29:15.360
so to kind of close my personal example right I said I need to basically
00:29:21.000
predict user overlap so I'm not 100% done with that whole process right I've
00:29:27.200
been having to iterate on those last two phases I had a model people didn't really like how they had to interact with it so we've gone back but
00:29:34.039
we get something cool like this right this is a little bit of eye candy I thought I'd put on the slide for you and what this basically shows is a dense
00:29:41.760
network of publisher overlap right so like dense Publishers in the middle essentially have tons of crossover
00:29:47.679
because every single white line represents a user who you've seen on multiple publishers so kind of the nice thing
00:29:55.519
I'm kind of trying to get across is even though you're building a data product there's no reason we can't use
00:30:01.120
Ruby however I have to note unfortunately I did not generate that graph with Ruby I was a little bit dirty
00:30:07.919
and I used Python and that's because even though I love Ruby and I love the stuff you can
00:30:14.679
do with it to build data products it's not all roses yet right there's definitely things I think we as a
00:30:20.240
community need to improve and come work on to make it so that we get more people talking about doing data things in Ruby
00:30:28.600
and kind of here are the four things I think we really need to work on the first one is kind of a graphing library I know there's been attempts right like
00:30:34.120
there's Rubyvis attempting to port Protovis but there's no real Ruby
00:30:39.320
equivalent to the things people are used to in Python and R right in Python you have matplotlib in R you have ggplot2
00:30:44.440
right if Ruby could just have this centralized adopted graphing library it
00:30:49.760
would go a long way because you spend a lot of time doing graphs when you're doing kind of exploratory
00:30:55.799
analysis and kind of the second thing is a unified Matrix and Vector library
00:31:01.240
right machine learning if any of you guys do it you pretty much know what it comes down to most of the time is just matrix math right like you're trying to
00:31:07.760
do some matrix transformation so that you can get like a matrix that you can
00:31:12.960
work with or you're trying to like fill in missing gaps in a matrix right all that kind of stuff it would be great if Ruby
00:31:19.559
had kind of a more unified centralized matrix library right the guys at SciRuby are working on it I think there's
00:31:25.159
a couple of other libraries out there if you look on GitHub but there's no kind of centralized knowledge which is really what we need
00:31:32.639
right like it's not that Ruby can't do it it's really just that we need to like centralized discussion around it right
00:31:37.760
just like we have that centralized web framework which made it so that everyone started doing web stuff in rails and
00:31:43.279
kind of Rails to a certain extent helped us beat Python for web stuff we kind of
00:31:48.360
need that for Ruby and really it comes down to those last two things which is we just need more publishing right like
00:31:53.720
we need people talking about it more we spend a lot of time in the Ruby community talking about like TDD and OO and all those things are great but it
00:32:00.039
would be awesome if we started to get a little bit more publishing around kind of Ruby and machine learning and how we can really use Ruby and that kind of
00:32:06.679
ties into the last one which is academic buy-in right so like most of the time if you guys are going out to try and hire people to like build data products or do
00:32:13.279
stats or be analysts they're mostly academics right and academics tend to use Python and R simply because there's
00:32:20.440
not a lot of academic buy-in at least in California universities for Ruby if we could just kind of get the academics to
00:32:26.760
accept Ruby and see we can use it to do kind of computer-science-y things because that's what computer science programs are all about it would go a long way
00:32:33.519
because people would come out into the workforce already knowing that they could use
00:32:39.120
Ruby so even though it has its warts I definitely think Ruby plus data equals
00:32:44.480
agile data products right it gives you the ability to iterate really quickly you can harness the power of Hadoop you
00:32:49.639
can collect data really easily and then you can present the data that you've built using Ruby using rails using
00:32:56.279
Sinatra and finally my obligatory we're-hiring slide so if you want to work with
00:33:02.960
me on data things uh feel free to email me or visit that link thank you