Feature Engineering Techniques used by Facebook and LinkedIn


Optimal Notification Volume for Maximum Rewards

In this paper

  • Notification volume estimation is modelled as a constrained optimization problem
  • Organic and notification-driven user engagement are differentiated
  • The reward for a given notification volume is estimated from activity prediction, unsubscribe prediction and long-term unsubscribe effect models
  • The incremental reward of each notification is used for volume optimization with a hill-climbing algorithm
  • The optimal number of notifications is estimated at the user level, subject to global constraints
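
The volume optimization step in the bullets above can be illustrated with a small greedy hill-climbing sketch. It is not the paper's implementation: the function name `allocate_notifications`, the `reward_curves` input and the single `global_budget` constraint are assumptions made for illustration, and the per-user reward curves are taken as given here, whereas in the paper they come from the prediction models.

```python
import heapq


def allocate_notifications(reward_curves, global_budget):
    """Greedy hill climbing over per-user notification volumes.

    reward_curves: dict mapping user -> list, where entry v is the estimated
        total reward of sending v notifications (entry 0 is the organic
        baseline). Hypothetical input format.
    global_budget: maximum total notifications across all users (assumed to
        be the only global constraint in this sketch).
    """
    volume = {user: 0 for user in reward_curves}
    heap = []  # max-heap on incremental reward, stored as negative gains
    for user, curve in reward_curves.items():
        if len(curve) > 1:
            heapq.heappush(heap, (-(curve[1] - curve[0]), user))

    sent = 0
    while heap and sent < global_budget:
        neg_gain, user = heapq.heappop(heap)
        if -neg_gain <= 0:
            break  # the best available step no longer improves total reward
        volume[user] += 1
        sent += 1
        curve = reward_curves[user]
        nxt = volume[user] + 1
        if nxt < len(curve):
            # Queue this user's next incremental reward.
            heapq.heappush(heap, (-(curve[nxt] - curve[volume[user]]), user))
    return volume


curves = {
    "a": [0.0, 5.0, 7.0, 7.5],
    "b": [0.0, 3.0, 3.5, 3.0],
    "c": [0.0, -1.0, -2.0],
}
print(allocate_notifications(curves, global_budget=4))
```

Each step grants one more notification to the user with the largest positive incremental reward and stops when the budget is spent or no step helps. This greedy choice is only guaranteed to reach the best allocation when each reward curve has diminishing returns; otherwise it can stop at a local optimum.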

Outline

  • Introduction
  • Pinterest Notification System
  • Problem Formulation
  • Data
  • Proposed Algorithms
  • Experiment & Results

Introduction


Restricted Random Forests for interpretable predictions

Outline

  • Introduction
  • Data
  • Data Exploration
  • Proposed Technique — Restricted Random Forests
  • Interpretability
  • Performance Metrics
  • Experiment & Results

Introduction


Resizing an image discards information that is critical for image forgery detection; full-resolution networks are better suited to the task

Noiseprint (Noiseprint: a CNN-based camera model fingerprint https://arxiv.org/abs/1808.08396)

In this paper

  • Xception is used to extract features from patches of the full image without any resizing.
  • Feature aggregation is performed using various pooling techniques.
  • Fully connected layers are used for forgery detection at image level.
  • Noiseprint is tested as an additional input feature alongside the RGB bands.
  • Gradient checkpointing is used to manage the network's memory usage.
  • The network is trainable end-to-end (E2E).
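
The patch-then-pool idea in the bullets above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `toy_features` is a hypothetical stand-in for the Xception backbone (it returns just the mean and maximum of a patch), the helper names are invented, and gradient checkpointing and the fully connected classifier are not shown.

```python
def extract_patches(image, patch_size):
    """Split a 2-D image (list of rows) into non-overlapping square patches.

    The image is never resized; edge regions smaller than patch_size are
    simply dropped in this sketch.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patches.append([row[left:left + patch_size]
                            for row in image[top:top + patch_size]])
    return patches


def toy_features(patch):
    """Hypothetical stand-in for the CNN backbone: [mean, max] of the patch."""
    values = [v for row in patch for v in row]
    return [sum(values) / len(values), max(values)]


def pool(features, mode="max"):
    """Aggregate per-patch feature vectors into one image-level vector."""
    columns = list(zip(*features))
    if mode == "max":
        return [max(col) for col in columns]
    if mode == "avg":
        return [sum(col) / len(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode}")


image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 8, 4],
    [2, 2, 4, 4],
]
patch_features = [toy_features(p) for p in extract_patches(image, 2)]
print(pool(patch_features, "max"), pool(patch_features, "avg"))
```

The pooled vector has the same length whatever the image size, which is what lets an image-level classifier sit on top of a variable number of patches.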

Outline

  • Introduction
  • Data
  • Proposed Model
  • Loss Function
  • Performance Metrics
  • Results
  • Forgery Localization
  • Implementation

Introduction


Embeddings are trained on a bipartite graph derived from transactions and used in downstream tasks and visualization

Word2Vec — CBOW and Skip-Gram
Source: Efficient Estimation of Word Representations in Vector Space (https://arxiv.org/pdf/1301.3781.pdf)

In this paper

  • A bipartite graph is derived from credit card transactions. If two transactions of an account that fall within a specified time window are represented as {Merchant, Account, Merchant}, the two merchants constitute an edge in the bipartite graph.
  • This graph is used to train Skip-Gram embeddings.
  • Embeddings are found to cluster similar merchants together.
  • Brand-based embeddings perform better than raw merchant embeddings on downstream tasks.
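
The edge construction described in the first bullet can be sketched as follows. This is a minimal illustration, not the paper's code: the `(account, merchant, timestamp)` tuple layout, the function name `merchant_pairs`, and the choice to skip pairs where both transactions are at the same merchant are all assumptions.

```python
from collections import defaultdict


def merchant_pairs(transactions, window):
    """Collect merchant pairs linked through a shared account.

    transactions: iterable of (account, merchant, timestamp) tuples
        (hypothetical layout).
    window: maximum time gap between two transactions of the same account
        for their merchants to form a pair.
    """
    by_account = defaultdict(list)
    for account, merchant, ts in transactions:
        by_account[account].append((ts, merchant))

    pairs = []
    for visits in by_account.values():
        visits.sort()  # order each account's transactions by time
        for i, (ts_i, m_i) in enumerate(visits):
            for ts_j, m_j in visits[i + 1:]:
                if ts_j - ts_i > window:
                    break  # sorted, so later transactions are out of window too
                if m_i != m_j:  # assumption: ignore repeat visits to one merchant
                    pairs.append((m_i, m_j))
    return pairs


transactions = [
    ("acc1", "coffee", 0),
    ("acc1", "bakery", 5),
    ("acc1", "garage", 100),
    ("acc2", "coffee", 10),
    ("acc2", "grocer", 12),
]
print(merchant_pairs(transactions, window=10))
```

Pairs of this kind could then serve as (target, context) examples for Skip-Gram training, much as neighbouring words do in Word2Vec.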

Outline

  • Introduction
  • Data
  • DeepTrax Methodology for Embeddings
  • Loss Function
  • Performance Metrics
  • Results

Introduction


Explanation of the paper

Multi-stakeholder Systems and Multi-Objective Recommendations




Abhay Shukla

Lead Data Scientist@Airtel X Labs https://www.linkedin.com/in/shuklaabhay/ #DataScience #ML #AI #Statistics #Reading #Music #Running
