Even the largest contemporary conventional HPC system architectures are optimized for the basic operations and access patterns of classical matrix and vector processing. These include an emphasis on FPU utilization, high data reuse exploiting temporal and spatial locality, and uniform-stride indexing through regular data structures. Systems in the 100-petaflops performance regime, such as the Chinese Sunway TaihuLight and the US CORAL Summit and Sierra machines to be deployed in 2018, remain bound by these design assumptions in spite of their innovations. Emerging classes of application problems in data analytics, machine learning, and knowledge management demand very different operational properties in response to their highly irregular, sparse, and dynamic behaviors, which exhibit little or no data reuse, random access patterns, and metadata-dominated parallel processing. At the core of these “big data” applications is dynamic adaptive graph processing, which is in some ways diametrically opposite to conventional matrix computing. Of immediate importance is the need to significantly improve efficiency and scalability as well as user productivity, performance portability, and energy efficiency. Key to success is the introduction of powerful runtime system strategies and software that exploit real-time system information to support dynamic adaptive resource management and task scheduling. But software alone will be insufficient at extreme scale, where near-fine-grained parallelism is necessary and software overheads bound efficiency and scalability. A new era of architecture research is beginning in the combined domain of accelerator hardware for graph processing and for runtime systems. This paper discusses the nature of the computational challenges, presents examples and experiments with the state-of-the-art HPX-5 runtime system software, and outlines future directions in hardware architecture support for exascale runtime-assisted big data computation.
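To make the contrast between the two workload classes concrete, the minimal C sketch below juxtaposes the unit-stride, high-reuse access pattern of a dense matrix-vector product with the data-dependent, low-reuse gathers that dominate traversal of a graph stored in compressed sparse row (CSR) form. It is purely illustrative and is not tied to HPX-5 or any other framework; the array names (row_ptr, adj, vals) are assumptions introduced here for exposition.

```c
#include <stddef.h>

/* Regular access pattern typical of dense linear algebra: unit-stride reads
 * of A, repeated temporal reuse of x across rows, and a predictable loop nest
 * that caches, prefetchers, and vector units exploit well. */
void dense_matvec(size_t n, const double *A, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (size_t j = 0; j < n; ++j)
            sum += A[i * n + j] * x[j];
        y[i] = sum;
    }
}

/* Irregular access pattern typical of graph analytics: each read of vals[] is
 * driven by a data-dependent index in adj[], neighbor lists vary in length,
 * and there is little spatial locality or reuse for the memory hierarchy to
 * exploit. (Hypothetical CSR layout chosen only for this illustration.) */
void neighbor_sum(size_t nverts, const size_t *row_ptr, const size_t *adj,
                  const double *vals, double *out)
{
    for (size_t v = 0; v < nverts; ++v) {
        double sum = 0.0;
        for (size_t e = row_ptr[v]; e < row_ptr[v + 1]; ++e)
            sum += vals[adj[e]];   /* gather through a data-dependent index */
        out[v] = sum;
    }
}
```

The inner loop of the second routine is the kind of fine-grained, metadata-driven work whose cost a conventional, FPU-centric machine amortizes poorly, and which motivates the runtime and hardware support discussed in the remainder of the paper.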