How do we get more women into Tech | Data?

March 13, 2019

November 13, 2017

I am very excited to announce that Data! Data! Data! has several discount codes available for Data Events across the UK & Europe.

Having run lots of events across the year with AI Europe, Unicom, the Data! Data! Data! Meetup and the Weekend University, we have managed to network with lots of Data Lovers!

Feel free to click the link, enter the code and enjoy a discount on me! (or email me for a discount)

Data Science, AI and Deep Learning - November 16th - Novotel London West - Email: mstevenson@nakamalondon.com for 30% Discount!
AI Europe - November 20-21, 2017 - Queen Elizabeth II Conference Centre - Enter Discount Code - AIE17NAKAMAZP for 20% Discount
Data Visualisation Summit - December 6th - Brussels - Email: mstevenson@nakamalondon.com for 30% Discount! (5 Free Tickets Available)
Data Visualisation Summit - December 13th - Manchester - Email: mstevenson@nakamalondon.com for 30% Discount! (5 Free Tickets Available)
And finally, if you fancy being a Podcast Mogul you can join the Weekend University's A Crash Course on Podcasting held at Birkbeck University on November 25th - Enter Discount Code: nakama10

I'll see you there @TheDataAgent


July 6, 2016

The Future's Bright, The Future's Nakama!

May 12, 2016

It's with great pleasure that I've started my new role at the leading Global Digital Recruitment business NAKAMA. They say a change is as good as a rest, so I'm excited to be joining NAKAMA well rested and ready to go. The Data market in London is booming, as is New York's, so I'm looking forward to continuing to help talented data professionals secure cool jobs with the best businesses in Data globally!

If you are interested in learning more about what NAKAMA do, or how my team can help you with your career or team growth, drop me a line at mstevenson@nakamalondon.com or call me on 07814 397 783 / 0203 588 4572

Think Digital! Think Global! Think Nakama!

SALUTE - Disrupting Data Science

April 28, 2016

Imagine teaching a person a new magic trick. It's a simple enough formula: you show them how it's done, let them have a go and correct them when they make a mistake. Practice makes perfect. Now imagine that, instead of being able to show them the trick, you could only show them low quality photos of it being performed. These photos have no description of how the trick is done and barely show enough features to see the trick in action.

Our Latest Blog post comes from Richard Javis - Head of Cyber Analytics Engineering

This is the problem that is plaguing data science. We all know that data science is all the rage - the hottest job on the block. People train in complex, mathematical disciplines to apply cutting edge algorithms to glean new insights from data. The problem is, before the learning can take place, the low quality input has to be improved. This is of immense frustration to business stakeholders, data scientists and engineers, who spend much of their expert time performing monotonous data cleansing work.

The new open-source project Salute attempts to address some of this challenge and give data scientists some of their time back to focus on what's really important.

Salute's goal is to take any type of file (video, audio, image or text) and to recognise, analyse and transform it so that it is ready for consumption by Machine Learning routines. This involves recognising delimiters, data types, data distributions, categorical variables and much much more. The Salute framework is built in a way that allows new derivations from the data to be added easily so that experts can contribute in their own area of expertise.
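To make the idea concrete, here is a minimal sketch of that kind of recognition step for a small delimited text file. It is plain Python and is not taken from Salute (which is Java on Spark); the function names, the candidate delimiters and the `max_categories` threshold are all assumptions made for illustration.

```python
import csv


def detect_delimiter(lines, candidates=(",", ";", "\t", "|")):
    """Pick the candidate that occurs the same non-zero number of times on every line."""
    best, best_count = None, 0
    for cand in candidates:
        counts = {line.count(cand) for line in lines}
        if len(counts) == 1:
            (count,) = counts
            if count > best_count:
                best, best_count = cand, count
    if best is None:
        raise ValueError("no consistent delimiter found")
    return best


def infer_type(values):
    """Return 'int', 'float' or 'str' for a column of raw strings."""
    for name, caster in (("int", int), ("float", float)):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue
    return "str"


def profile(text, max_categories=3):
    """Detect the delimiter, then label each column with a guessed type.

    A string column with at most `max_categories` distinct values is
    labelled 'categorical' (the threshold is an arbitrary assumption).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    delimiter = detect_delimiter(lines)
    rows = list(csv.reader(lines, delimiter=delimiter))
    header, body = rows[0], rows[1:]
    report = {}
    for index, name in enumerate(header):
        column = [row[index] for row in body]
        kind = infer_type(column)
        if kind == "str" and len(set(column)) <= max_categories:
            kind = "categorical"
        report[name] = kind
    return delimiter, report


SAMPLE = "id;price;colour\n1;9.99;red\n2;12.50;blue\n3;7.00;red\n"
print(profile(SAMPLE))
```

A real framework would also have to cope with quoting, missing values and files too large for one machine, which is where running on Spark comes in.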

To achieve all this on even the largest files, Salute is based on Apache Spark and so can integrate with existing clusters and use their processing power. Using Java means that contributing is straightforward and is open to many coders.

Salute is a new project and much of it is immature. If you feel you can contribute - please do here.

We've Hired a Unicorn Data Scientist...Now What?

February 4, 2016

Everyone wants a Data Scientist. It's cool, it's sexy, it's the job of the century but what do they actually do?

Every day I speak with clients eager to grow out their Data Science function, hopeful that in hiring a “Data Scientist” their business will become enlightened and pass into commercial nirvana. It’s not always so easy.

All too often once the company have been successful in hiring the Ivy League “Unicorn” with cutting edge Machine Learning Algorithm development skills they hand them a bunch of dirty data and say: “Can you clean that please?”

When the Angel Round funding lands the Data Scientist is the “go-to” trophy hire, an injection of sexiness that will bring about the “Uberisation” of your disruptive Bricklaying App.

Sometimes it pays to have a horse before you buy a cart. Or is that a cart before the horse? It’s a chicken and egg situation. Perhaps buy lots of Chickens and Horses and somehow the cart will learn to pull itself. That’s Machine Learning in an eggshell.

Next Generation Data Architecture

November 10, 2015

Kevin Schmidt - CTO @ Century Technology gives his views on the Next Generation of Data Architecture

One of the benefits of coming to a greenfield job – like when I joined Mind Candy two years ago – was that you can jump several technological steps ahead as you don’t have any legacy to deal with. Essentially we could build from scratch based on lessons learned from traditional data architecture. One of the main ones was to establish a real-time path right away to avoid having to shoehorn it in afterwards. Another was to avoid physical hardware. And the most important one was to hold off on Hadoop as long as possible.

The last one might seem surprising: isn’t Hadoop the centre-piece of a data architecture? Unfortunately it creates a lot of admin overhead and it might be a full person’s (or more) workload to maintain. Not ideal in a small company where people resources are limited. AWS S3 can fulfil most of the storage function but requires no maintenance and is largely fast enough. Also while HDFS is important and will probably come back for us soon, MR1 or YARN is just not – there are better and more advanced execution systems that can use HDFS and we used one of those: Mesos.

Mesos is a universal execution engine for job and resource distribution. Unlike YARN it can not only run Spark but also Cassandra, Kafka, Docker containers and recently also HDFS. That works because Mesos just offers resources and lets the frameworks handle the starting and management of the jobs. This finally breaks the link between framework and execution engine: in Mesos you can run not only different frameworks but different versions of the same framework. No more waiting for your infrastructure to upgrade to the latest Hadoop or Spark version, you can run it right now even when all your other jobs run on older versions. Combine that with a robust architecture and simple upgrading and Mesos can easily be seen as the successor to YARN (for more details on why Mesos beats YARN, see Dean Wampler’s talk from Strata).
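The resource-offer idea can be illustrated with a toy simulation. This is plain Python and does not use the Mesos API; the `Offer` and `Framework` classes, the agent names and the two Spark version strings are all hypothetical. The only point it tries to show is that the "master" hands out resources and each framework decides for itself what to launch, so two versions of the same framework can share one cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Framework:
    """A toy framework scheduler that decides itself which offers to use."""
    name: str
    version: str
    cpus_per_task: float
    mem_per_task: int
    pending_tasks: int
    launched: list = field(default_factory=list)

    def consider(self, offer):
        """Launch as many pending tasks as fit, return the unused remainder."""
        cpus, mem = offer.cpus, offer.mem_mb
        while (self.pending_tasks > 0
               and cpus >= self.cpus_per_task
               and mem >= self.mem_per_task):
            cpus -= self.cpus_per_task
            mem -= self.mem_per_task
            self.pending_tasks -= 1
            self.launched.append(offer.agent)
        return Offer(offer.agent, cpus, mem)


def run_offer_cycle(offers, frameworks):
    """The 'master' only passes resources along; leftovers go to the next framework."""
    for offer in offers:
        remaining = offer
        for framework in frameworks:
            remaining = framework.consider(remaining)


# Two versions of the same framework sharing one small cluster (made-up numbers).
spark_old = Framework("spark", "1.4", cpus_per_task=1.0, mem_per_task=2048, pending_tasks=3)
spark_new = Framework("spark", "1.5", cpus_per_task=2.0, mem_per_task=2048, pending_tasks=2)
run_offer_cycle(
    [Offer("agent-1", 4.0, 8192), Offer("agent-2", 2.0, 4096)],
    [spark_old, spark_new],
)
for fw in (spark_old, spark_new):
    print(fw.name, fw.version, "launched on", fw.launched, "pending", fw.pending_tasks)
```

Real Mesos adds fair sharing between frameworks, offer declines with timeouts and failure handling, none of which is modelled here.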

For the real-time path the obvious processing solution is Spark Streaming (so we have a simpler code base) running on Mesos with Kafka to feed data in and with Cassandra to store the results. You now have a so-called SMACK stack (Spark, Mesos, Akka, Cassandra, Kafka) for data processing which the Mesos folks call Mesosphere Infinity for some reason (aka marketing).

The last bit of a data architecture is the SQL engine. Traditionally this was Hive but we all know Hive is slow. While there are several open-source solutions out there that improve on good old Hive (Impala, Spark SQL), in the end we decided on AWS Redshift. It’s a column-oriented SQL-based data warehouse with a PostgreSQL interface which fulfils most of the data analysis and data science needs while being reasonably fast and relatively easy to maintain with few people.

The resulting architecture looks like the above picture. We have an event receiver and an enricher/validator/cleaner, which were written in-house in Scala/Akka and are relatively simple programs using AWS SQS as a transport channel. The data is then sent to Kafka and S3. Spark uses data straight from S3 to aggregate and put the processed data back into either Redshift or S3. On the real-time side of things we have Kafka going into Spark Streaming with an output into Cassandra.
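As a rough illustration of the two stages on the batch side, the sketch below validates, cleans and enriches raw events and then rolls them up per day. It is plain Python standing in for the Scala/Akka programs and the Spark job described above, not the actual code; the event schema (`user_id`, `event`, `ts`) and the function names are assumptions.

```python
import json
from collections import Counter
from datetime import datetime, timezone

REQUIRED_FIELDS = ("user_id", "event", "ts")  # hypothetical event schema


def clean_and_enrich(raw):
    """Validate one raw JSON event; return an enriched dict, or None if invalid."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict):
        return None
    if any(name not in event for name in REQUIRED_FIELDS):
        return None
    if not isinstance(event["ts"], (int, float)):
        return None
    # Clean: normalise the event name.
    event["event"] = str(event["event"]).strip().lower()
    # Enrich: derive the UTC calendar day from the epoch timestamp.
    event["day"] = datetime.fromtimestamp(event["ts"], tz=timezone.utc).date().isoformat()
    return event


def aggregate(events):
    """Count events per (day, event name), the kind of roll-up a batch job might store."""
    return Counter((e["day"], e["event"]) for e in events)


RAW_MESSAGES = [
    '{"user_id": 1, "event": " Login ", "ts": 1447113600}',
    '{"user_id": 2, "event": "login", "ts": 1447117200}',
    '{"user_id": 1, "event": "purchase", "ts": 1447200000}',
    'not json',
    '{"user_id": 3, "event": "login"}',
]

valid = [e for e in map(clean_and_enrich, RAW_MESSAGES) if e is not None]
print(aggregate(valid))
```

In the architecture described here the transport between these stages would be SQS and Kafka, and the roll-up would end up in Redshift or S3 rather than in an in-memory counter.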

What can be improved here? HDFS is still better than S3 for certain large scale jobs and we want to bring it back running on Mesos. Redshift could be replaced with Spark SQL, hopefully soon. All in all, the switch from tightly coupled Hadoop to an open architecture based on Mesos gave us unprecedented freedom as to which kind of data jobs we want to run and which frameworks to use, allowing a small team to do data processing in ways previously only possible on a large budget.