Not that this is news (Cray’s business model has long been to cater to the needs of three-letter US agencies).
It is interesting, though, that even the Department of Energy recognizes the need to build CPUs for higher fault tolerance / resiliency, and that going forward this will become crucial. Could not agree more - it is only a matter of a few years before laptops and desktops have anywhere from 80 to 200 cores, carrying another order of magnitude of threads on top of them. Recovery and resilience will become crucial and are probably best handled at the silicon level.
Another interesting notion is better power consumption - probably a good thing, though it is surprising that the government feels the need to push manufacturers in that direction. More than funding, there will be a need for technological breakthroughs to improve the power economics of current chips; scaling via cores alone can only take one so far. Perhaps, as the interview alludes to, merging permanent storage with the computational space (with phase-change memory, for instance) is one of the levers.
Is R going to displace Excel as the de facto modeling tool in modern business?
Five to six years seems like an eternity. Though many may not realize it, huge strides have been made in the technological horsepower available at our fingertips.
For instance, back in 2005, this would have been deemed a high-end workstation:
Pentium 4 @ 3.2GHz w/ 2GB of DDR RAM (and, if necessary, a 10k rpm HDD)
In fact, the predominant use of such machines would have been in the graphics department, where Adobe’s suite would make use of all the floating-point horsepower and cores it could get.
Oddly enough, in those days all analytics, leveraging the likes of SAS or SPSS, would be run on UNIX servers or, as in the case of large clients, on old iron (aka mainframes). Sure, they had PC versions of their flagship programs, though those would be used for algorithm development (akin to expensive IDEs); serious work and production usage happened on traditional servers.
So what has changed?
Couple of things:
Computing power has come a long way. EMC made a presentation at one of its keynotes that captured how far things have progressed: since 2005, there has been a 20x (that is, twenty-fold!) increase in computing power. Admittedly, one has to be using software optimized to exploit the cores through parallelism and other tricks.
Availability of open source solutions like R, and their use in universities, has fueled the usage of advanced statistics and mathematical constructs like never before. In most corporate environments, the analytics tool tends to be Microsoft Excel and the odd plug-in. Now, for experimentation and investigation, all one needs to do is download and use R. Though gestapo-like IT policies are in force in most corporations, R is fairly self-contained and does not require the user to have administration privileges. In certain analytics-heavy professions, such as actuarial science, there are textbooks on the use of R for common problems.
The advent of cheap consumer-grade SSDs has changed the equation more than people realize. Popularized by Apple through the MacBook Air, it puts IO throughput in one’s lap that most enterprise SANs struggle to achieve (aside from critical Tier 1 applications, most NAS shares for office use are slow…). This helps with the integration and loading of disparate data sets, while speeding up prototyping by reducing the time for analysis runs. Most analytics jobs are bound by IO time, rather than CPU time, in the initial development phase.
Through the efforts of Amazon and others, IaaS (Infrastructure as a Service) has matured to a point where users, when the time is right, can tap into HPC power and run applications like R or Datameer at a scale that most corporations would struggle to match with internal resources. And given the sporadic need for such resources, the business rationale for owning them is questionable, especially if one can lease them from a third party like Rackspace or Amazon on an as-needed basis.
Nowadays, most desktops and laptops easily have 4GB of RAM, if not 8GB. Coupled with SSD drives, one can easily build fairly comprehensive prototypes using tools like R. Once ready to deploy, it is only a matter of leveraging Amazon EC2-like infrastructure to gain the needed computational scale.
All this allows one to use a desktop or high-end laptop to build and validate complex models using data sets that, a few years ago, would have needed formal IT support and infrastructure.
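To make this concrete, below is a minimal sketch (in R) of the kind of prototype such a machine can comfortably handle; the file name (claims.csv) and the column names (lapsed, age, premium, tenure) are hypothetical placeholders rather than a reference to any particular data set.

    # Minimal laptop-scale prototype in R; file and column names are hypothetical.
    claims <- read.csv("claims.csv", stringsAsFactors = FALSE)

    # Quick sanity checks before any modelling
    str(claims)
    summary(claims)

    # A simple logistic regression as a first pass;
    # in practice one would iterate on features and model choice.
    fit <- glm(lapsed ~ age + premium + tenure,
               data = claims, family = binomial())
    summary(fit)

    # Score the same extract to eyeball the fit
    claims$score <- predict(fit, newdata = claims, type = "response")

Once the approach is validated locally, the same script can be pointed at a much larger extract on rented infrastructure without rework.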
It will not be surprising to see, in the next few years, increasing sophistication in how we use data to support business decisions, with models acting as lenses that shape our perspectives. This also implies higher expectations of those providing such organizations with advice and expert opinions. Clients will begin to expect analytics infrastructure that supports comprehensive data sets, which consulting firms can leverage to enrich their internal information and develop insights and models for ongoing inference. Tools like R will become the norm in business, just as Excel is today the de facto lingua franca for business models.
[Day 2 of the ABA conference] It was interesting to note how attendees and some of the speakers were dealing with advancements in modern payment networks. One of the speakers was discussing Amex’s Serve platform and the implications of digital wallets in terms of compliance and money laundering.
Most of the statutes are geared towards the physical movement of funds and what may best be termed ‘managed point-to-point transfers’ such as wires and international ACH. A case in point is the CMIR form. Once the digital wallet is associated with a physical asset such as a mobile phone, the relevance of the CMIR is somewhat diminished in this new paradigm: the monetary value is dynamic, and so is the association with a physical device.
Some of the audience members were uneasy - it is new territory for most of them, whose backgrounds are in legal or investigative disciplines rather than technology. Indeed, within a decade it would not be surprising for compliance officers to need a strong technology background, augmented with operational experience for context.
As mobile payments gain traction and criminals leverage increasingly sophisticated technologies to funnel illicit funds, the battle will move from what has been mostly a physical domain to one where the virtual and physical worlds merge. This has been demonstrated by recent cases involving the use of virtual currencies (e.g., eGold) coupled with more traditional structuring approaches.
Law enforcement is beginning to acquire / build tools for this new paradigm, though much progress is needed before victory can be declared.
Advanced Analytics and Transaction Monitoring in Money Laundering Enforcement
I am currently attending the annual ABA / ABA Money Laundering Enforcement Conference in Washington DC.
An interesting theme this year is the validation of monitoring systems and ‘models’. Regulators are building the talent and wherewithal to validate models developed by financial institutions (mostly banks) to detect anomalous behaviour warranting further investigation.
Though most such models are really fancy means of pattern classification and peer benchmarking, it will not be surprising to see some institutions having to address weaknesses. Ironically, the weaknesses will lie not in some complex Bayesian inference logic, but in more mundane (though much harder to resolve) issues like the quality of data from source systems.
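For illustration, here is a deliberately simplified sketch of the peer-benchmarking idea in R; the data frame and column names (peer_group, wire_volume) are hypothetical, and real monitoring systems layer many more rules and typologies on top of this.

    # Simplified peer benchmarking: flag accounts whose monthly wire volume
    # sits far outside the distribution of their peer group.
    flag_outliers <- function(txns, threshold = 3) {
      # z-score each account's volume against its peer group
      txns$z <- ave(txns$wire_volume, txns$peer_group,
                    FUN = function(x) (x - mean(x)) / sd(x))
      txns$flagged <- abs(txns$z) > threshold
      txns
    }

    # Toy data purely for illustration
    set.seed(42)
    toy <- data.frame(
      account     = sprintf("A%03d", 1:300),
      peer_group  = sample(c("retail", "smb", "corporate"), 300, replace = TRUE),
      wire_volume = rlnorm(300, meanlog = 10, sdlog = 1)
    )
    flagged <- flag_outliers(toy)
    head(flagged[order(-abs(flagged$z)), ])

Even a toy like this makes the point above: the classification logic is straightforward, while sourcing clean, complete volume figures from upstream systems is where the real effort lies.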
In addition, the regulatory bar is rising, with regulators expecting testing to consider the black box nature of some of the vendor offerings and seeking details on the logic behind the rules (for the ‘white box’ versions).
It will not be surprising to see supporting IT departments expected to build processes and enhance their methodologies to test, validate and refine these models.
In addition, once the foundational issues of data sourcing and quality are surmounted, it would not be surprising to see the sophistication of the models increase. The evolution would be similar to that witnessed in the insurance marketplace with pricing and rating models.