Open Data Science Conference (ODSC West)
Virtual and In-Person | October 27th – 30th, 2020
Natural Language Processing Track
Learn the latest models, advancements, and trends from the top practitioners and researchers behind NLP
Conference Website: https://live.odsc.com/
AGENDA
Thursday – 10/29/2020
09:00 AM – 10:30 AM – ODSC Keynotes
10:30 AM – 5:30 PM – ODSC Hands-on Trainings and Workshops
10:00 AM – 4:30 PM – Partner Demo Talks
10:30 AM – 5:00 PM – Breakout Talk Sessions
09:30 AM – 4:30 PM – Applied AI Free Virtual Event
12:00 PM – 2:00 PM – Women Ignite Session
1:00 PM – 1:45 PM – Virtual Networking Event
4:00 PM – 5:30 PM – AI Investors Reverse Pitch
3:30 PM – 4:30 PM – Meet the Expert
Friday – 10/30/2020
09:00 AM – 10:30 AM – ODSC Keynotes
10:30 AM – 5:30 PM – ODSC Hands-on Trainings and Workshops
10:30 AM – 5:00 PM – Breakout Talk Sessions
10:30 AM – 5:00 PM – Career Mentor Talks
11:30 AM – 12:00 PM – Meet the Speaker
4:00 PM – 5:30 PM – Learning from Failure
Are We Ready for the Era of Analytics Heterogeneity? Maybe… but the Data Says No
(PDT)
Marinela Profi | Global Strategist AI & Model Management | Data Science Evangelist | SAS | WOMEN TECH NETWORK
Type: Keynote
Keynote Session – Suchi Saria
(PDT)
Suchi Saria, PhD | Director, Machine Learning & Healthcare Lab | Johns Hopkins University
Type: Keynote
A Secure Collaborative Learning Platform
Raluca Ada Popa, PhD | Assistant Professor | Co-Founder | Berkeley | PreVeil
Type: Keynote
OCTOBER 29TH
Data for Good: Ensuring the Responsible Use of Data to Benefit Society
(PDT)
Jeannette M. Wing, PhD | Avanessians Director of the Data Science Institute and Professor of Computer Science | Columbia University
- Causal inference – estimating effects
- Over- and under-estimation with instrumental variables
- Confounders: model assigned causes – over- and under-estimation
- De-confounder: estimate substitute confounders – over- and under-estimation
- Convolutional neural network models
- Economics: monopsony, robo-advising
- History: topic modeling with NLP
- Trustworthy Computing vs Trustworthy AI: safety, fairness, robustness
- Classifiers: fair/unfair – make them more robust to a class of distributions
- Image recognition systems: DeepXplore – semantic perturbation
- DP and ML: PixelDP – STOP sign vs Yield sign
- Healthcare @ Columbia University: 600 million EHR records
- The Medical Deconfounder: treatment effects on A1c (type 2 diabetes)
Type: Keynote, Level: All Levels, Focus Area: AI for Good, Machine Learning
Keynote Session – Ben Taylor
(PDT)
Ben Taylor, PhD | Chief AI Evangelist | DataRobot
- Convolutional NN – clustering of countries: Latin America, Asia
- Storytelling
- Acceleration:
- GPT-3 from OpenAI – Q&A, Translation, grammar
- Image GPT
- Can AI Predict
Type: Keynote, Level: All Levels, Focus Area: Data Science Track
Applying AI to Real World Use Cases
(PDT)
John Montgomery | Corporate Vice President, Program Management, AI Platform | Microsoft
Type: Keynote
- Machine comprehension
- Massive ML models: vision model – ResNet
- Alternative to Azure: OpenAI (a Microsoft partner) released GPT-3 (175B parameters)
- Azure ML: create models, operationalize models, build models responsibly
- Model interpretability – data science, government regulation: feature-importance dashboard
- USE CASES
- Building accurate models
- Little Caesars Pizza: “Hot-N-Ready” – demand forecasting of pizza supply by combination of ingredients
Predict quantity X with AutoML
- Deploy and manage many models: MMM Accelerator: ten models at AGL – Australian renewable energy
Model for Responsible ML: fairness & interpretability
- EY – bank denies a loan
- Mitigation of detected bias for men and women in loan applications
Loan approval
- Explanation dashboard – aggregate model: top feature in loan approval: education level
- Fairness – performance disparity for accuracy: disparity in predictions by gender
ML is part of the Azure platform
Bonsai – reinforcement learning: simulation scenarios
AutoML – for when you know the standard algorithms vs when you do not
TALKS on 10/29/2020
NLP
(PDT)
Tian Zheng, PhD | Chair, Department of Statistics | Associate Director | Columbia University | Data Science Institute
Type: Track Keynote, Level: Intermediate, Focus Area: NLP
- Stochastic variational inference
- Case-control likelihood approximation
- Sampling node system
TEXT
- LDA – Latent Dirichlet Allocation (see the topic-model sketch after these notes)
Probability distribution over the vocabulary of words: topic assignment
LINKS
- MMSB – Mixed Membership Stochastic Blockmodel
Detect communities in networks
blockmodel – profile of social interaction in different nodes
- LMV – Pairwise-Link-LDA – same topic proportions have equal % for citing
Pairwise-Link-LDA
- Draw topic
- Draw Beta
- For each document
- For each document pair
Variational inference – fully factored model
- article visibility
Stochastic variational inference
- local (specific to each node) & global (across nodes)
- At each iteration a minibatch of nodes
Sampling document pairs
- Stratified sampling scheme – shorter link
- Informative set sampling [informative vs non-informative sets]
- this scheme – mean estimation problem: inclusion probability: all links are included
- Stochastic gradient updates for global parameters
- Comparison with alternative Approaches
- LDA + Regression
- Relational topic model
- Pairwise-Link-LDA combine LDA and MMB [Same priors]
- Predictive ranks (random guessing) and runtimes (compact, distinct, no overlap)
- evaluate model fit: average predictive rank of held-out documents – Top articles
Cora dataset
LMVS – better predictive performance than
KDD Dataset
Citation trends in HEP: Relevance of Topics vs Visibility
Article recommendation by Rank Topic Proportions
Visibility as a topic-adjusted measure
More recent are more visible
Citation is not a strong indicator for visibility
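As a minimal illustration of the LDA building block referenced in these notes (not the speaker's Pairwise-Link-LDA model or its stochastic variational inference), the sketch below fits scikit-learn's LatentDirichletAllocation on a tiny invented corpus; the documents and topic count are assumptions made purely for demonstration.

```python
# Toy LDA: learn topic-word distributions and per-document topic proportions
# from a tiny invented corpus. Only the text (LDA) component is shown here;
# the link/MMSB side of the talk would need a dedicated implementation.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "neural networks for image classification",
    "topic models for citation networks",
    "variational inference scales topic models",
    "convolutional networks classify images",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)                 # document-term count matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)                  # topic proportions per document

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_words = [terms[i] for i in topic.argsort()[-3:][::-1]]
    print(f"topic {k}: {top_words}")
print(doc_topics.round(2))
```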
Making Deep Learning Efficient
(PDT)
Kurt Keutzer, PhD | Professor, Co-founder, Investor | UC Berkeley, DeepScale
Type: Track Keynote
- ML – SubSets
- Deep Learning – TRAINING for classification – neural nets – LeNet vs AlexNet – 7 layers, 140x FLOPs – using parallelism
- Shallow learning – deterministic, linear classifiers used
- ML algorithms: Core ML, audio analysis (speech and audio recognition), multimedia
- NLP: translation,
- McKinsey & Co. – AI as a Service (AIaaS)
PROBLEMS to Solve
Image Classification
- Object Detection
- Semantic Segmentation
- Convolutional NN
Audio Enhancement at BabbleLabs
Video Sentiment Analysis – Recommendations to Watch or to search
Natural Language Processing & Speech
- Translation
- Document understanding
- Question answering
- general language understanding evaluation (GLUE)
BerkeleyDeepDrive (BDD)
BERT – Transformer – 7 seconds per sentence
- BERT-base
- Q-BERT
- Transformer
Computational Patterns of Deep NN (DNN) – TRAINING required for DNN
CLOUD PLATFORMS
- GRADIENT DESCENT (GD)
- Stochastic GRADIENT DESCENT (SGD)
Recommendation Models – DNN – Parallelism
- Facebook – 80% is recommendation = Advertisement
- No sharing of data by collectors: Alibaba, Facebook, Twitter
Considerations
- Latency – NETWORK WIFI
- Energy
- Computation power
- Privacy
- Quantization: fewer memory accesses (see the quantization sketch after these notes)
- Lower precision implies higher
- Flat loss landscape – precision layer by layer
- Move computation to the EDGE
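The quantization bullet above (fewer memory accesses, precision layer by layer) can be illustrated with PyTorch's post-training dynamic quantization; the tiny two-layer model below is a stand-in, not the BERT/Q-BERT setup from the talk, and the int8 choice is simply the library default.

```python
# Post-training dynamic quantization: Linear weights are stored as int8,
# shrinking the model and cutting memory traffic at inference time.
import os
import torch
import torch.nn as nn

model = nn.Sequential(            # stand-in model, not BERT
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(model(x).shape, quantized(x).shape)   # same interface, lower-precision weights

def size_mb(m, path="tmp_model.pt"):
    torch.save(m.state_dict(), path)
    size = os.path.getsize(path) / 1e6
    os.remove(path)
    return size

print(f"fp32: {size_mb(model):.2f} MB, int8: {size_mb(quantized):.2f} MB")
```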
Language Complexity and Volatility in Financial Markets: Using NLP to Further our Understanding of Information Processing
(PDT)
Type: Track Keynote, Level: All Levels, Focus Area: NLP
Intelligibility Throughout the Machine Learning Life Cycle
(PDT)
Jenn Wortman Vaughan, PhD | Senior Principal Researcher | Microsoft Research
Type: Talk, Level: Beginner-Intermediate, Focus Area: Machine Learning
- A Human-centered Agenda for Intelligibility
- Beyond the model: Data, objectives, performance metrics
- context of relevant stakeholders
- Properties of system design vs Properties of Human behavior
Learning with Limited Labels
(PDT)
Type: Talk, Level: Intermediate-Advanced, Focus Area: Deep Learning, Research frontiers
How AI is Changing the Shopping Experience
(PDT)
Sveta Kostinsky | Director of Sales Engineering | Samasource
Marcelo Benedetti | Senior Account Executive | Samasource
Type: Talk, Level: Intermediate, Focus Area: Machine Learning, Deep Learning
- quality rubric
- Internal QA Sampling
- Client QA Sampling
- Auto QA
Transfer Learning in NLP
(PDT)
Joan Xiao, PhD | Principal Data Scientist | Linc Global
Type: Talk, Level: Intermediate, Focus Area: NLP, Deep Learning
Transfer learning enables leveraging knowledge acquired from related data to improve performance on a target task. The advancement of deep learning and large amounts of labelled data such as ImageNet have made high-performing pre-trained computer vision models possible. Transfer learning, in particular fine-tuning a pre-trained model on a target task, has become far more common practice than training from scratch in computer vision.
In NLP, starting from 2018, thanks to the various large language models (ULMFiT, OpenAI GPT, the BERT family, etc.) pre-trained on large corpora, transfer learning has become a new paradigm, and new state-of-the-art results have been achieved on many NLP tasks.
In this session we’ll learn the different types of transfer learning, the architecture of these pre-trained language models, and how different transfer learning techniques can be used to solve various NLP tasks. We’ll also show a variety of problems that can be solved using these language models and transfer learning.
- Transfer learning: computer vision – ImageNet classification
- ResNet, GoogLeNet, ILSVRC – VGG, ILSVRC’12 – AlexNet
- Feature extractor vs fine-tune (see the sketch after these notes)
- Transfer learning: NLP
- Text-to-Text Transfer Transformer (T5)
- Word embeddings: no context is taken into account – Word2vec, GloVe
- ELMo – embeddings from language models: contextual
- BERT – Bidirectional Encoder Representations from Transformers
- MLM – Masked Language Model: forward, backward, masked
- Next Sentence Prediction
- Achieved SOTA on 11 tasks: GLUE, SQuAD 1.1
- Prediction models:
- Input
- Label – IsNext vs NotNext
GLUE Test score
BERT BASE vs BERT LARGE
- Featured-based approach
BERT variants – TinyBERT, ALBERT, RoBERTa, DistilBERT
Multi-lingual BERT, BERT other languages
A Primer in BERTology: How BERT Works
OpenAI built a text generator – too dangerous to release
OpenAI GPT-3 – trained on 300B tokens – THREE settings:
- Zero-shot – English to French – no training
- One-shot
- Few-shot – the GOAL – GPT-3
- GPT-3 is large-scale NLP
Examples – feature extraction
- English to SQL
- English to CSS
- English to LaTeX
Semantic textual similarity
NL inference
ULMFiT – Fine tuning – the larger the # of Training examples – the better the performance
- LM pre-training – start from scratch: BART, Big Bird, ELECTRA, Longformer
- LM fine-tuning
- Classifier fine-tuning
Data augmentation
Contextual Augmentation
- Original sentence
- masked
- augmented
Text generation
- boolean questions
- from structured data, i.e., RDF – Resource Description Framework
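A minimal sketch of the "feature extractor" flavour of transfer learning contrasted in these notes, using the Hugging Face Transformers library listed under TOOLS; the checkpoint name and sentences are illustrative assumptions, and fine-tuning would additionally unfreeze the encoder and train it on labelled target-task data.

```python
# Feature extraction with a frozen pre-trained BERT: sentences become fixed-size
# vectors that a small downstream classifier can consume. Fine-tuning, by
# contrast, would update the encoder weights end to end on the target task.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()                                   # weights stay frozen here

sentences = [
    "Transfer learning reuses knowledge from pre-training.",
    "Fine-tuning adapts the whole network to the target task.",
]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    out = encoder(**batch)

# Mean-pool token embeddings into one 768-d vector per sentence (BERT-base).
features = out.last_hidden_state.mean(dim=1)
print(features.shape)
```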
OCTOBER 30TH
Generalized Deep Reinforcement Learning for Solving Combinatorial Optimization Problems
(PDT)
Azalia Mirhoseini, PhD | Senior Research Scientist | Google Brain
Type: Keynote
- Learning Based Approaches vs branch & Bound, Hill climbing, ILP
- scale on distributed platforms
- Device Placement – too big to fit – PARTITION among multiple devices – evaluate run time per alternative placements
- Learn Placement on NMT – Profiling Placement on NMT
- CPU + layers encoder and decoders – overhead tradeoffs – parallelization for work balancing
- RL-based placement vs Expert placement
- Memory copying task
- Generalization to be achieved for device placement architecture
- Embeddings that transfer knowledge across graphs
- Graph partitioning: normalized cuts objective: volume, cuts (see the normalized-cut sketch after these notes)
- Learning-based approach: train a NN on the nodes of the graph to assign the probability of a node belonging to a given partition
- Continuous relaxation of normalized cuts
- Optimize expected normalized cuts
- Generalized Graph Partitioning Framework
- Placement optimization using AGENTS to place the nodes
- Train policy to be used for placement of ALL chips
- Compiling a Dataset of Chip Placements
- Policy/Value Model Architecture to save wire length used
- RISC-V: Placement Visualization: Training from Scratch (Human) 6-8 weeks vs Pre-Trained 24 hours
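As a hand-rolled illustration of the normalized-cuts objective these notes mention (not the learned partitioner or its continuous relaxation from the talk), the value for a fixed two-way partition can be computed directly from the cut size and the volumes of the two parts; the example graph and partition are invented.

```python
# Normalized cut of a two-way partition: cut(S, T)/vol(S) + cut(S, T)/vol(T).
# A learning-based partitioner would instead train a network to output, per
# node, the probability of each part and minimize a relaxation of this value.
import networkx as nx

G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 0),      # one tight cluster
                  (3, 4), (4, 5), (5, 3),      # another tight cluster
                  (2, 3)])                     # a single bridging edge

S, T = {0, 1, 2}, {3, 4, 5}

cut = sum(1 for u, v in G.edges if (u in S) != (v in S))
vol_S = sum(deg for _, deg in G.degree(S))
vol_T = sum(deg for _, deg in G.degree(T))

ncut = cut / vol_S + cut / vol_T
print(f"cut={cut}, vol(S)={vol_S}, vol(T)={vol_T}, normalized cut={ncut:.3f}")
```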
Keynote Session – Zoubin Ghahramani
(PDT)
Zoubin Ghahramani, PhD | Distinguished Scientist and Sr Research Director | Professor of Information Engineering | ex-Chief Scientist and VP of AI | Google | University of Cambridge | Uber
Type: Keynote
- Data – models – predictions – decisions – understanding
- AI & Games
- AI + ML
- Deep Learning! (DL)
- NN – tunable nonlinear functions with many parameters
- Parameters are weights of NN
- Optimization + Statistics
- DL – re-branding of NN
- Many layers – ReLUs, attention
- Cloud resources
- SW – TensorFlow, JAX
- Industry investment in DL
DL – very successful
- non-parametric statistics
- use huge data – simulated data
- automatic differentiation
- stay close to identity – keeps models deep: ReLUs, LSTMs, GRUs, ResNets
- Symmetry, parameter tying
Limitations of DL
- data hungry
- adversarial examples
- black-boxes – difficult to trust
- uncertainty – not easily incorporated
Beyond DL
- ML as Probabilistic Modeling: Data observed from a system
- uncertainty
- inverse probability
- Bayes rule: priors from measured quantities, inference for the posterior (see the Bayes-rule sketch after these notes)
- learning and predicting can be seen as forms of inference – likelihood
- approximations from estimation of Likelihoods
- Learning
- Prediction
- Model Comparison
- Sum rule: Product rule
Why do probabilities matter in AI and DS?
- Complexity control and structure learning
- exploration-exploitation trade-offs
- Building in prior knowledge, algorithms for small and large data sets
- BDL – Bayesian DL
- Gaussian Processes – linear and logistic regression, SVMs
- BDL – Bayesian NN / GP hybrids
- Deep Sum-Product Networks – discriminative programming
Probabilistic Programming Languages
Languages: Tensors, Turing,
Automatic Statistician –
- model discovery from data and explain the results
Probabilistic ML
- Learn from data, decision theory, probabilistic AI, BDL, probabilistic programming
Zoubin Ghahramani (2015), “Probabilistic machine learning and artificial intelligence,” Nature 521: 452–459
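As a toy numeric illustration of the "learning and prediction as inference" point above (Bayes rule: prior plus likelihood gives the posterior), a conjugate Beta-Bernoulli update fits in a few lines; the observations and prior are invented for the example.

```python
# Bayes rule on a coin: Beta(a, b) prior + Bernoulli likelihood -> Beta posterior.
# Learning = updating the posterior; prediction = the posterior predictive.
import numpy as np

data = np.array([1, 0, 1, 1, 0, 1, 1, 1])      # invented observations (1 = heads)
a, b = 1.0, 1.0                                 # uniform Beta(1, 1) prior

heads = data.sum()
tails = len(data) - heads
a_post, b_post = a + heads, b + tails           # conjugate update

p_next_heads = a_post / (a_post + b_post)       # posterior predictive P(next = 1)
print(f"posterior Beta({a_post:.0f}, {b_post:.0f}); P(next toss = heads) = {p_next_heads:.3f}")
```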
The Future of Computing is Distributed
Fri, October 30, 10:00 AM
Type: Keynote
- 1970 – ARPAnet – distributed
- 1980 – High-performance computing (HPC)
- 1990 – WEB – Amazon
- 2000 – Big data – Google
Distributed computing – Few courses at universities
- Rise of deep learning (DL)
- Application becomes AI centered: Healthcare, FIN, Manufacturing
- Moore’s law is dead: memory and processors
- Specialized hardware: CPU, GPU, TPU
- Memory dwarfed by demand
- Memory: Turing project 17B
- GPT-2 8.3B
- GPT-1
- Micro-services: Clusters of clouds – integrating with distributed workloads
- AI is overlapping with HPC
- AI and Big Data
AI Applications
- MPI,
- Stitching several existing systems
RAY, RISELab @ Berkeley – universal framework for distributed computing (Python and Java) across different libraries
- Asynchronous execution enables parallelism
- Function -> Task (API) (see the Ray sketch after these notes)
- Object ID – every task scheduled
- Library Ecosystem – Native Libraries 3rd Party Libraries
- Amazon and AZURE SPARK, MARS (Tensor)
ADOPTIONS
- Number of contributors increasing fast (N≈300)
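The "function -> task" note above is Ray's core API; a minimal sketch (assuming Ray is installed and run on a single local machine) looks like the following.

```python
# Minimal Ray example: a decorated function becomes a remote task; .remote()
# returns an object reference immediately, and tasks execute in parallel.
import ray

ray.init()                                     # start a local Ray runtime

@ray.remote
def square(x):
    return x * x

refs = [square.remote(i) for i in range(8)]    # scheduled asynchronously
print(ray.get(refs))                           # block until all results are ready

ray.shutdown()
```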
TALKS on 10/30/2020
Advances and Frontiers in Auto AI & Machine Learning – Lisa Amini
- Fri, Oct 30, 2020 1:30 PM – 2:15 PM EDT
- Auto AI – holistic approach
- Auto ML – Models: Feature creation, modeling, training & testing
AI AUTOmation for Enterprise
- Feature preprocessor -> feature transformer -> feature selector -> estimator (see the pipeline sketch after these notes)
- Joint optimization problem
- Method selection
- Hyperparameter optimization
- Black-box constraints
- Bias Mitigation Algorithms
- Pre-processing algo
- In-processing Algo
- Post-processing algo
- Automation for Data – READINESS for ML
- relational data –
- knowledge augmentation
- Data readiness reporting
- Labeling Automation: Enhance
Knowledge augmentation – Federated Learning
- External data sources
- existing data
- documents containing domain knowledge
- Automatically augmenting data with knowledge: feature-concept mapping
Modeling
- Time Series Forecasting
AI to decision Optimization
- Demand forecasting from standard AutoAI by ADDING historical decisions and historical business impact -> reinforcement learning – model created automatically from the past and AutoAI
Validation
- Meta-learning for performance prediction
- Train the META data
- Score production data with AI
Deployment
- staged deployment with contextual bandits
Monitoring
- Performance prediction meta model applied over windows of production traffic
INNOVATIONS:
- End-to-end AI life cycle
- Expanding scope of automation: domain knowledge and decision optimization
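A scaled-down sketch of the "preprocessor -> transformer -> selector -> estimator" pipeline and its joint hyperparameter optimization, using scikit-learn rather than the enterprise AutoAI tooling covered in the talk; the dataset and search grid are illustrative assumptions.

```python
# Joint optimization over a small pipeline: preprocessing, feature selection,
# and estimator hyperparameters are searched together, AutoML-style.
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),                     # feature preprocessor
    ("select", SelectKBest(score_func=f_classif)),   # feature selector
    ("clf", LogisticRegression(max_iter=5000)),      # estimator
])

param_grid = {
    "select__k": [5, 10, 20],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="roc_auc")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```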
The State of Serverless and Applications to AI
(PDT)
Joe Hellerstein, PhD | Chief Strategy Officer, Professor of Computer Science | Trifacta, Berkeley
The Cloud and practical AI have evolved hand-in-hand over the last decade. Looking forward to the next decade, both of these technologies are moving toward increased democratization, enabling the broad majority of developers to gain access to the technology.
Serverless computing is a relatively new abstraction for democratizing the task of programming the cloud at scale. In this talk I will discuss the limitations of first-generation serverless computing from the major cloud vendors, and ongoing research at Berkeley’s RISELab to push forward toward “stateful” serverless computing. In addition to system infrastructure, I will discuss and demonstrate applications including data science, model serving for machine learning, and cloud-bursted computing for robotics.
Bio:
Joseph M. Hellerstein is the Jim Gray Professor of Computer Science at the University of California, Berkeley, whose work focuses on data-centric systems and the way they drive computing. He is an ACM Fellow, an Alfred P. Sloan Research Fellow and the recipient of three ACM-SIGMOD “Test of Time” awards for his research. Fortune Magazine has included him in their list of 50 smartest people in technology, and MIT’s Technology Review magazine included his work on their TR10 list of the 10 technologies “most likely to change our world”. Hellerstein is the co-founder and Chief Strategy Officer of Trifacta, a software vendor providing intelligent interactive solutions to the messy problem of wrangling data. He has served on the technical advisory boards of a number of computing and Internet companies including Dell EMC, SurveyMonkey, Captricity, and Datometry, and previously served as the Director of Intel Research, Berkeley.
Type: Talk, Level: Intermediate, Focus Area: AI for Good, Machine Learning
- What happened with the Cloud – no app
- Parallelism – distributed computers – scale up or down, consistency and partial failure
- Serverless Computing: Functions-as-a-Service (FaaS)
- Developers outside AWS, Azure, Google can program the Cloud
- Python for the Cloud
- AutoScaling – yes
- Limitations of FaaS (AWS Lambda): I/O bottlenecks, 15-minute lifetime, no inbound network communication
- Program State: local data – managed across invocations
- Data Gravity – expensive to move
Distributed consistency – data replication: agree on the value of a mutable variable x [update took place]
- Two-phase commit [consensus – Paxos]
- Coordination avoidance: waiting for control – tail latency – distribution of performance
- Slowdown cascades: I/O
- Application semantics: programs require coordination
- Program must have the property of monotonicity
- MONOTONICITY: input grows / output grows – wait on information, not on coordination (see the monotonicity sketch after these notes)
CALM – infinitely-scalable systems – no coordination ->> parallelism and smooth scalability
Monotonicity syntactically in a logic language
Hydro: a Platform for Programming the Cloud
Anna Serverless KVS – Hydro Project
- shared-nothing at all scales (even across Threads)
- Fast under contention: 90% request handling
Cloudburst: A stateful Serverless Platform: CACHE close to compute: Cache consistency
Latency Python, Cloudburst, AWS, AWS Lambda:
- AWS Lambda is SLOW for AI vs Python, Cloudburst
Scalable AWS Lambda simultaneously
- Motion planning compute
- Cloudburst + Anna requirement
@joe_hellerstein
Bloom Lab
RiseLab
Hydro
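A tiny illustration of the monotonicity/CALM point in the notes above: a grow-only set computed by union gives the same answer under every message arrival order, which is why such programs need no coordination. The "replica updates" here are invented.

```python
# Monotone program: the output (a union) only grows as inputs arrive, and the
# final result is identical for every arrival order -- no coordination needed.
from itertools import permutations

updates = [{"a"}, {"b"}, {"a", "c"}]       # invented updates from different replicas

outcomes = set()
for order in permutations(updates):
    state = set()
    for u in order:                        # merge = set union, which is monotone
        state |= u
    outcomes.add(frozenset(state))

print(outcomes)                            # a single outcome: {'a', 'b', 'c'}
```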
Just Machine Learning
(PDT)
Tina Eliassi-Rad, PhD | Professor | Core Faculty | Northeastern University | Network Science Institute
Type: Talk, Level: All Levels, Focus Area: Machine Learning
In 1997, Tom Mitchell defined the well-posed learning problem as follows: “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.” In this talk, I will discuss current tasks, experiences, and performance measures as they pertain to fairness in machine learning. The most popular task thus far has been risk assessment. We know this task comes with impossibility results (e.g., see Kleinberg et al. 2016, Chouldechova 2016). I will highlight new findings in terms of these impossibility results. In addition, most human decision-makers seem to use risk estimates for efficiency purposes and not to make fairer decisions. I will present an alternative task definition whose goal is to provide more context to the human decision-maker. The problems surrounding experience have received the most attention. Joy Buolamwini (MIT Media Lab) refers to these as the “under-sampled majority” problem. The majority of the population is non-white, non-male; however, white males are overrepresented in the training data. Not being properly represented in the training data comes at a cost to the under-sampled majority when machine learning algorithms are used to aid human decision-makers. In terms of performance measures, a variety of definitions exist from group- to individual- to procedural-fairness. I will discuss our null model for fairness and demonstrate how to use deviations from this null model to measure favoritism and prejudice in the data.
Tasks:
- Assessing risk
- Ranking
- Statistical parity among classifiers (see the parity sketch after these notes)
PARITY vs an imperfect classifier – can’t satisfy all three conditions
- Precision
- True positive
- False parity
All classifiers fail to consider context or allow for uncertainty
- Learning to Place within existing cases
- Incentives/values of Human decision maker which incorporate in the decision external factors
- Game-theoretical framework
- How human exemplars make decisions
- Are algorithms value free?
Computational Ethics
- Logically consistent principle
- Camouflage – machine did not learn on the task but on the cloudiness of the sky
- Model Cards for Model Reporting
- The “undersampled majority”
- Experience: Demonstration: Should we learn from demonstrations or from simulations?
- Complex networks: guilt by association vs privilege and prejudice, individual fairness
- Datasheets for Datasets
- Algorithms are like prescription drugs: adverse events
Human vs Machine judgement
- Performance measure – FAIRNESS: Group, individual
- Normativity throughout the entire well-posed learning problem
- Incentive/values
- Human or machines to make decisions?
- Laws are needed if algorithms are used as expert witness
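As a small numeric illustration of the statistical-parity notion in these notes (not the speaker's null-model method), the parity gap is simply the difference in positive-prediction rates between groups; the predictions and group labels below are invented.

```python
# Statistical parity difference: P(yhat = 1 | group A) - P(yhat = 1 | group B).
# A value near zero means both groups receive positive predictions at similar rates.
import numpy as np

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])             # invented predictions
group = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

rate_a = y_pred[group == "A"].mean()
rate_b = y_pred[group == "B"].mean()
print(f"positive rate A={rate_a:.2f}, B={rate_b:.2f}, parity gap={rate_a - rate_b:+.2f}")
```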
Machine Learning for Biology and Medicine
Sriram Sankararaman, PhD | Professor, Computer Science | University of California – Los Angeles
Type: Talk, Focus Area: Machine Learning
Abstract:
Biology and medicine are deluged with data so that techniques from machine learning and statistics will increasingly play a key role in extracting insights from the vast quantities of data being generated. I will provide an overview of the modeling and inferential challenges that arise in these domains.
In the first part of my talk, I will focus on machine learning problems arising in the field of genomics. The cost of genome sequencing has decreased by over 100,000-fold over the last decade. The availability of genetic variation data from millions of individuals has opened up the possibility of using genetic information to identify the causes of diseases, develop effective drugs, predict disease risk and personalize treatment. While genome-wide association studies offer a powerful paradigm for discovering disease-causing genes, the hidden genetic structure of human populations can confound these studies. I will describe statistical models that can infer this hidden structure and show how these inferences lead to novel insights into the genetic basis of diseases.
In the second part of my talk, I will discuss how the availability of large-scale electronic medical records is opening up the possibility of using machine learning in clinical settings. These electronic medical records are designed to capture a wide range of data associated with a patient including demographic information, laboratory tests, images, medications and clinical notes. Using electronic records from around 60,000 surgeries over five years in the UCLA hospital, I will describe efforts to use machine learning algorithms to predict mortality after surgery. Our results reveal that these algorithms can accurately predict mortality from information available prior to surgery indicating that automated predictive systems have great potential to augment clinical care.
Bio:
Sriram Sankararaman is an assistant professor in the Departments of Computer Science, Human Genetics, and Computational Medicine at UCLA where he leads the machine learning and genomic lab. His research interests lie at the interface of computer science, statistics and biology and is interested in developing statistical machine learning algorithms to make sense of large-scale biomedical data and in using these tools to understand the interplay between evolution, our genomes and traits. He received a B.Tech. in Computer Science from the Indian Institute of Technology, Madras, a Ph.D. in Computer Science from UC Berkeley and was a post-doctoral fellow in Harvard Medical School before joining UCLA. He is a recipient of the Alfred P. Sloan Foundation fellowship (2017), Okawa Foundation grant (2017), the UCLA Hellman fellowship (2017), the NIH Pathway to Independence Award (2014), a Simons Research fellowship (2014), and a Harvard Science of the Human Past fellowship (2012) as well as the Northrop-Grumman Excellence in Teaching Award at UCLA (2019).
- ML & BioMedicine
BioMedical data: high D, heterogeneous, noisy data
- Clinical Data & DL
- Predict death after surgery – 1,000 deaths; complications: sepsis, acute kidney injury
- Mortality during and after surgery
- Collaboration: Anesthesiology, PeriOps, UCLA Health
- Data warehouse – EMR 4/2013 – 12/2018
- 60,000 patients in the data: age, height, weight, gender, ASA status – input from physician
Pre-operative mortality risk prediction – false positives, missing data: which lab data was collected, what were the values
2% of admissions are associated with mortality
SMOTE: over-sampling of cases associated with risk (see the class-imbalance sketch after these notes)
Learning setup: temporal training/testing split, hyperparameters
Models: logistic regression, random forest, gradient-boosted trees
Feature sets: ASA status, surrogate-ASA
- ASA status did not contribute – results were the same with and without it
- Lab values and the timing of labs are the most important features
- The RANDOM FOREST model was selected
- Precision/recall curve
- The model reduced the number of patients flagged by around 20x
Open problems: interoperability, learning over private data
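A compact sketch of the modeling recipe in these notes: oversample the rare positive class with SMOTE (on the training split only), fit a random forest, and inspect the precision/recall trade-off. It uses a synthetic imbalanced dataset rather than the UCLA EMR data and assumes the imbalanced-learn package is available.

```python
# Roughly 2% positives, as in the mortality-prediction setting described above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve, auc
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.98, 0.02], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)   # oversample minority

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_res, y_res)

probs = clf.predict_proba(X_te)[:, 1]
precision, recall, _ = precision_recall_curve(y_te, probs)
print(f"PR-AUC: {auc(recall, precision):.3f}")
```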
2. Epidemiological data and ML – social distancing in the COVID-19 pandemic
- Effectiveness of social distancing
- SEIR
- Average duration of infection
- Susceptible-Exposed-Infectious-Removed (SEIR) model (see the SEIR sketch after these notes)
- R-naught applied to social distancing: the ratio of Susceptible/Exposed is compared to Infectious/Removed – the lower the better
- Social-distancing relaxation – relaxation in 2022
- COVID spread – estimate when social distancing needs to end
- UK, NY, Spain, France, Germany, Denmark
- Hierarchical Bayesian model: Shared Global parameters, Location-specific, Observations
- Hierarchical Bayesian model SEIR Model: Data generation process
- Empirical Bayes: Maximize likelihood of the global parameters
- Trajectory based on Model Fit
- Estimation of uncertainty
- End of Social distancing – time distribution around a mean
- No seasonality, no infinite immunity, No vaccine
- Quantify Uncertainty
- Work with domain knowledge experts is great
- Single Exponential Smoothing
- ARIMA – long-memory models – Autoregressive AR
- Moving Average (MA) model – short memory
- Integrated AR + MA = ARIMA
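A bare-bones SEIR integration (not the hierarchical Bayesian model from the talk) shows how the compartment dynamics referenced above are typically simulated; every parameter value below is an invented placeholder.

```python
# SEIR compartments: Susceptible -> Exposed -> Infectious -> Removed.
# beta is scaled down to mimic social distancing; all numbers are illustrative.
import numpy as np
from scipy.integrate import odeint

def seir(y, t, beta, sigma, gamma, N):
    S, E, I, R = y
    dS = -beta * S * I / N
    dE = beta * S * I / N - sigma * E
    dI = sigma * E - gamma * I
    dR = gamma * I
    return dS, dE, dI, dR

N = 1_000_000
y0 = (N - 10, 0, 10, 0)                    # almost everyone starts susceptible
t = np.linspace(0, 180, 181)               # days

beta, sigma, gamma = 0.5, 1 / 5.0, 1 / 7.0 # contact, incubation, recovery rates
no_distancing = odeint(seir, y0, t, args=(beta, sigma, gamma, N))
distancing = odeint(seir, y0, t, args=(0.5 * beta, sigma, gamma, N))

print("peak infectious, no distancing:  ", int(no_distancing[:, 2].max()))
print("peak infectious, with distancing:", int(distancing[:, 2].max()))
```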
Learning Intended Reward Functions: Extracting all the Right Information from All the Right Places
Fri, October 30, 3:45 PM (PDT)
Type: Talk, Focus Area: Deep Learning


- Sequential decision making
- Defining what the robot’s goal is
- Autonomous cars
- AI = optimize intended rewards vs specified rewards
- Parametrization of the reward function
- Agents over-learn from specified rewards but under-learn from other sources
- Observing feedback and expressing the human feedback in an observation (human) model
- How can we model reward design/specification as a noisy and suboptimal process?
- Development vs deployment environment
- Robot trusts the development environment
- Good behavior incentivized by the reward
- Maximize winning, maximizing score, minimize winning, minimize score
- Model the demo as a reward-rational implicit choice
- Human feedback as a reward-rational implicit choice
- The state of the environment as a reward-rational implicit choice
- task specification –>> reward
SCHEDULE

TUESDAY, OCTOBER 27TH
Pre-conference Day
ODSC BootCamp – Bootcamp Kickoff (West Virtual)
- Morning sessions (from 10:00 am) – Fundamentals: choose from 6 foundation sessions in Programming, Mathematics for Data Science, and Statistics
- Afternoon sessions (from 2:00 pm) – Fundamentals: choose from 6 foundation sessions in Programming, Mathematics for Data Science, and Statistics

WEDNESDAY, OCTOBER 28TH
Day 1
ODSC Trainings, Workshops & AI Expo, AI x Keynotes
Tracks: Virtual Hands-on Training | Virtual AI x Expo & Demo Hall | Events
- Morning (from 10:00 am) and afternoon (from 2:00 pm) Hands-on Training and Workshops – choose from five 3.5-hour training sessions and six 90-minute workshop sessions

THURSDAY, OCTOBER 29TH
Day 2
ODSC Keynotes, Talks, Trainings, Workshops, AI Expo & Events
Tracks: Virtual Hands-on Training | Virtual AI x Expo & Demo Hall | Virtual Presentations
- Morning (from 10:00 am) and afternoon (from 2:00 pm) Hands-on Training and Workshops – choose from five 3.5-hour training sessions and six 90-minute workshop sessions
- Virtual Exhibitor Showcase & Partner Demo Talks – choose from 12 morning and 12 afternoon partner sessions & visit 25+ virtual partner booths

FRIDAY, OCTOBER 30TH
Day 3
ODSC Keynotes, Talks, Trainings, Workshops, Events, & Career Expo
Tracks: Virtual Hands-on Training | Virtual Presentations | Career Lab and Expo & Poster Sessions
- Morning (from 10:00 am) and afternoon (from 2:00 pm) Hands-on Training and Workshops – choose from five 3.5-hour training sessions and six 90-minute workshop sessions
SPEAKERS
- Nadja Herger, PhD – Data Scientist, Thomson Reuters
- Viktoriia Samatova – Head of Technology & Innovation, Thomson Reuters
- Nina Hristozova – Junior Data Scientist, Thomson Reuters
- Daniel Whitenack, PhD – Instructor, Data Scientist, Data Dan
- David Talby, PhD – CTO, Pacific AI, John Snow Labs
- Tian Zheng, PhD – Chair, Department of Statistics, Columbia University
- Phoebe Liu – Senior Data Scientist, Appen
- Frank Zhao – Senior Director, Quantamental Research, S&P Global Market Intelligence
TOPICS – trends in NLP, including pre-trained models, with use cases focusing on deep learning, speech-to-text, and semantic search.
- Natural Language Processing
- NLP Transformers
- Pre-trained Models
- Text Analytics
- Natural Language Understanding
- Sentiment Analysis
- Natural Language Generation
- Speech Recognition
- Named Entity Extraction
MODELS
- BERT
- XLNet
- GPT-2
- Transformers
- Word2Vec
- Deep Learning Models
- RNN & LSTM
- Machine Learning Models
- ULMFiT
- Transfer Learning
TOOLS
- TensorFlow 2.0
- Hugging Face Transformers
- PyTorch
- Theano
- SpaCy
- NLTK
- AllenNLP
- Stanford CoreNLP
- Keras
- FLAIR