- What is XGBoost?
- What does XGBoost stand for?
- Who developed XGBoost and why?
- How is XGBoost different from Gradient Boosting Machines (GBMs)?
- What are the advantages of using XGBoost?
- Is XGBoost an ensemble method? Explain.
- What type of algorithms does XGBoost support?
- How does XGBoost perform regularization?
- What are some typical use cases for XGBoost?
- How is XGBoost different from Random Forest?
- What is boosting in machine learning?
- What is gradient boosting?
- How does boosting differ from bagging?
- How does XGBoost handle overfitting?
- What is shrinkage in XGBoost?
- What are boosting rounds?
- What is the base learner used in XGBoost?
- How does XGBoost calculate feature importance?
- What’s the difference between GBDT and XGBoost?
- What does objective function mean in XGBoost?
- How do you install XGBoost in Python?
- What platforms and languages does XGBoost support?
- How do you use XGBoost with scikit-learn API?
- How can you install XGBoost from source?
- Can XGBoost be run on GPU?
- What are hyperparameters in XGBoost?
- What does max_depth control?
- What is the role of learning_rate (eta)?
- What is n_estimators?
- What is subsample and how does it help?
- What does colsample_bytree do?
- What is gamma in XGBoost?
- What is the purpose of lambda and alpha parameters?
- How can you perform grid search on XGBoost?
- What’s the difference between booster=gbtree and booster=gblinear?
- How do you train a model using XGBoost?
- What is DMatrix in XGBoost?
- Why is DMatrix used instead of a standard DataFrame?
- How do you validate a model in XGBoost?
- What evaluation metrics are supported?
- How do you plot a learning curve in XGBoost?
- What is early_stopping_rounds?
- How do you perform k-fold cross-validation in XGBoost?
- How do you save and load a model?
- What’s the difference between xgb.train() and xgboost.XGBClassifier()?
- How does XGBoost handle missing values?
- What is tree pruning in XGBoost?
- How does XGBoost implement regularization?
- How is gain calculated in XGBoost?
- What are leaf-wise trees in XGBoost?
- How does XGBoost rank feature importance?
- What are different types of feature importance plots?
- How do you deal with categorical features in XGBoost?
- How do you visualize trees built by XGBoost?
- What is get_fscore() method?
- How do you perform binary classification using XGBoost?
- What objective function is used for classification?
- How do you evaluate classification performance in XGBoost?
- What is the multi:softmax objective used for?
- What’s the difference between multi:softmax and multi:softprob?
- How do you use XGBoost for regression?
- What is the default loss function for regression in XGBoost?
- How do you evaluate regression performance?
- What is RMSE and how is it used in XGBoost?
- How do you tune a regression model with XGBoost?
- Can XGBoost handle multi-class classification?
- How do you set up a multi-class task?
- What techniques can be used with XGBoost for imbalanced datasets?
- How do you apply SMOTE with XGBoost?
- How do you calculate class weights for imbalanced classes?
- How can XGBoost models be deployed in production?
- Can you convert XGBoost to ONNX or PMML?
- How do you integrate XGBoost with Flask or FastAPI?
- What are the options for XGBoost model serialization?
- How do you serve XGBoost models on cloud platforms?
- How do you interpret the output of XGBoost models?
- What libraries can be used with XGBoost for model explainability?
- How do you use SHAP with XGBoost?
- What are partial dependence plots?
- How do you visualize decision paths in XGBoost?
- XGBoost vs. LightGBM: Key differences?
- XGBoost vs. CatBoost: Which one is better?
- How does XGBoost compare to Random Forest?
- When should you prefer LightGBM over XGBoost?
- How does training time compare between XGBoost and CatBoost?
- What to do when the model overfits?
- How do you handle memory errors with large datasets?
- Why is the model performing poorly on validation data?
- How do you speed up training time?
- How can you improve model generalization?
- How would you apply XGBoost in fraud detection?
- Can XGBoost be used for time series forecasting?
- How do you preprocess data before using XGBoost?
- How would you approach a Kaggle competition with XGBoost?
- How do you use XGBoost with text or NLP data?
- What’s the latest version of XGBoost?
- Can XGBoost be used for unsupervised learning?
- What’s the difference between sklearn and native XGBoost APIs?
- Can you use XGBoost with TensorFlow or PyTorch?
- What are some common mistakes when using XGBoost?
- What is LightGBM?
- How does LightGBM differ from XGBoost?
- What are the key advantages of using LightGBM?
- What data formats does LightGBM accept?
- What is a histogram-based decision tree in LightGBM?
- What are the main components of the LightGBM framework?
- Explain the concept of leaf-wise tree growth in LightGBM.
- What is Gradient-based One-Side Sampling (GOSS)?
- What is Exclusive Feature Bundling (EFB)?
- Why is LightGBM faster than traditional GBM implementations?
- What types of problems can LightGBM solve?
- What is the default boosting type in LightGBM?
- What is the difference between “goss” and “gbdt” boosting types?
- What are the default values for key LightGBM hyperparameters?
- How does LightGBM handle missing values?
- How can you install LightGBM?
- Which programming languages are supported by LightGBM?
- What is the role of “num_leaves” in LightGBM?
- How do you control overfitting in LightGBM?
- What is “max_depth” used for in LightGBM?
- How does LightGBM handle categorical features?
- What is the “feature_fraction” parameter?
- What does “bagging_fraction” mean?
- How do you interpret “min_data_in_leaf”?
- What metrics are supported by LightGBM?
- What’s the difference between training and validation datasets in LightGBM?
- How do you early stop a training process in LightGBM?
- How do you save and load a trained LightGBM model?
- Can LightGBM be used for multi-class classification?
- How do you set class weights in LightGBM?
- What is the impact of increasing “num_leaves” in LightGBM?
- What are the consequences of a small “min_data_in_leaf” value?
- How do “lambda_l1” and “lambda_l2” regularization work in LightGBM?
- How is the importance of a feature computed in LightGBM?
- What are the different ways to evaluate feature importance?
- How does LightGBM handle imbalanced datasets?
- What strategies can be applied for hyperparameter tuning in LightGBM?
- Explain how cross-validation is performed in LightGBM.
- What is “early_stopping_round” in LightGBM?
- How do “boosting” and “objective” differ in LightGBM?
- Explain the concept of monotonic constraints in LightGBM.
- How does LightGBM handle parallel and GPU training?
- What is the difference between GPU and CPU training in LightGBM?
- How does LightGBM compare to CatBoost?
- What is the “metric” parameter used for?
- How can you use LightGBM for regression problems?
- What is “max_bin” and how does it affect model performance?
- What is the role of “learning_rate” in LightGBM?
- How can you deal with overfitting in LightGBM?
- Explain the role of “bagging_freq” in LightGBM.
- What’s the use of the “verbose” parameter?
- How can LightGBM be used for ranking tasks?
- What’s the difference between “lambdarank” and “rank_xendcg” objectives?
- How does LightGBM calculate gradients?
- What kind of loss functions does LightGBM support?
- What does the “linear_tree” option do?
- How can you visualize a LightGBM tree?
- What is the role of “num_boost_round”?
- What’s the benefit of using categorical features directly in LightGBM?
- What does “is_unbalance” do in LightGBM?
- How does LightGBM integrate with scikit-learn?
- How can you perform grid search with LightGBM?
- What are common performance metrics for regression using LightGBM?
- What are common classification metrics used with LightGBM?
- What happens if LightGBM has too many categorical features?
- How can you improve LightGBM model interpretability?
- How can SHAP values be used with LightGBM?
- What are common pitfalls when using LightGBM?
- What does LightGBM return when using predict()?
- How can LightGBM be deployed in a production system?
- Describe how LightGBM performs histogram construction.
- How does LightGBM maintain histogram efficiency with large datasets?
- Explain how data sampling affects model bias and variance in LightGBM.
- What’s the memory impact of increasing “max_bin” in LightGBM?
- How does LightGBM handle multi-threaded data loading?
- How is LightGBM optimized for distributed training?
- What is the role of “histogram_pool_size” in LightGBM?
- Can LightGBM be used in time series forecasting? If so, how?
- How do you tune LightGBM for latency-critical applications?
- How can LightGBM be used with Dask or Spark?
- What is the impact of training data size on LightGBM performance?
- How can you use Bayesian Optimization for tuning LightGBM?
- What are the key components of LightGBM’s native format (.bin)?
- How does LightGBM differ when used in ranking vs classification?
- What happens when LightGBM model is underfitting?
- How do you interpret the LightGBM decision paths?
- What is “force_col_wise” and “force_row_wise” in LightGBM?
- How can LightGBM be integrated into a CI/CD pipeline?
- How do “drop_rate” and “skip_drop” affect dart boosting?
- What are some tricks to speed up LightGBM training on GPU?
- How is data sharding handled in distributed LightGBM training?
- What’s the importance of the “monotone_constraints_method” parameter?
- What security measures should be taken while deploying LightGBM?
- How can you implement a custom objective function in LightGBM?
- What is the effect of “extra_trees” in LightGBM?
- How can you ensure LightGBM reproducibility across runs?
- What are the risks of feature leakage in LightGBM?
- How does LightGBM handle data with high cardinality features?
- How can LightGBM be used in ensemble learning?
- What are some best practices for productionizing LightGBM models?
- What is CatBoost and who developed it?
- What are the main features of CatBoost?
- How does CatBoost handle categorical variables differently than other gradient boosting methods?
- Explain the term “ordered boosting” in CatBoost.
- What are the advantages of using CatBoost over XGBoost and LightGBM?
- What programming languages are supported by CatBoost?
- How does CatBoost reduce prediction shift?
- What is target leakage and how does CatBoost handle it?
- What are symmetric trees in CatBoost?
- How does CatBoost handle missing values?
- What is the default loss function in CatBoost for classification tasks?
- What is the default loss function in CatBoost for regression tasks?
- Can CatBoost handle text features? How?
- Describe the internal preprocessing pipeline of CatBoost.
- What are oblivious trees in the context of CatBoost?
- How do you train a CatBoost model in Python?
- What is the role of the cat_features parameter?
- What types of tasks does CatBoost support (e.g., classification, regression)?
- What are the benefits of using GPU in CatBoost?
- How do you enable GPU training in CatBoost?
- What is the function of the depth parameter?
- How does the iterations parameter affect model performance?
- What is learning_rate in CatBoost? How do you tune it?
- How can you control overfitting in CatBoost?
- What is the role of l2_leaf_reg?
- How does CatBoost deal with class imbalance?
- What is border_count and how does it affect numerical features?
- Explain the grow_policy parameter.
- How is early stopping implemented in CatBoost?
- What’s the use of bootstrap_type in CatBoost?
- How are categorical features internally converted by CatBoost?
- What is a one-hot encoding strategy in CatBoost?
- How does CatBoost handle high-cardinality categorical features?
- What is the effect of using one_hot_max_size?
- How can you interpret feature importance in CatBoost?
- Explain permutation feature importance vs. loss function change in CatBoost.
- How does CatBoost support feature selection?
- Can you perform feature interaction analysis in CatBoost?
- What is a FeatureInteraction object?
- How can you use SHAP values with CatBoost?
- What is the impact of using text features with embeddings in CatBoost?
- How do you preprocess date/time features in CatBoost?
- What are text_features and how are they processed?
- How can you handle duplicate or correlated features in CatBoost?
- Can you use feature hashing in CatBoost?
- What are the most common evaluation metrics for CatBoost classification?
- How do you evaluate a regression model in CatBoost?
- What is the purpose of the eval_metric parameter?
- Can you define a custom evaluation metric in CatBoost?
- How is Logloss calculated in CatBoost?
- How does CatBoost support multi-class classification?
- What is AUC and how is it used in CatBoost evaluation?
- How do you visualize overfitting using evaluation plots in CatBoost?
- What is the difference between loss_function and eval_metric?
- What is the role of custom_metric in CatBoost?
- How can you monitor training progress in CatBoost?
- What is verbose used for in training logs?
- How do you evaluate CatBoost models on a holdout set?
- How is cross-validation done in CatBoost?
- What are the pros and cons of using CatBoost’s built-in cross-validation?
- What are the different boosting types available in CatBoost?
- Explain the concept of quantization in CatBoost.
- What is model compression in CatBoost?
- How does CatBoost implement monotonic constraints?
- How can you interpret CatBoost’s model trees?
- How can you export CatBoost models to ONNX?
- How can you implement hyperparameter tuning for CatBoost?
- What is the use of random_strength?
- How does CatBoost support time-series forecasting?
- What is the role of od_type and od_wait?
- How can you ensemble CatBoost models?
- How do you use CatBoost with Optuna or Hyperopt for tuning?
- Can CatBoost be integrated with sklearn pipelines?
- How can you convert a trained CatBoost model to CoreML or JSON?
- How do you tune CatBoost on multi-label data?
- How do you save and load CatBoost models?
- What are the ways to deploy CatBoost models in production?
- How does CatBoost support REST API-based inference?
- How do you perform batch prediction using CatBoost?
- How can you integrate CatBoost into a Flask or FastAPI service?
- How do you handle model versioning with CatBoost?
- How to perform online prediction with CatBoost?
- What are the file formats supported for model export/import?
- How to use CatBoost with MLflow?
- How does CatBoost support interpretable ML in deployment?
- What are the common pitfalls when using CatBoost?
- How to resolve GPU memory errors in CatBoost?
- What is a typical preprocessing pipeline for CatBoost models?
- Why might CatBoost training be slow and how can you speed it up?
- What causes NaN or Inf values in CatBoost output?
- How can you debug poor model performance in CatBoost?
- What are best practices for categorical encoding in CatBoost?
- How do you debug overfitting in a CatBoost model?
- How can you make CatBoost models smaller and faster?
- What logging and visualization tools are useful with CatBoost?
- How do you ensure fairness in CatBoost predictions?
- How do you address data drift in CatBoost deployments?
- What are alternatives to CatBoost for categorical-heavy datasets?
- What is the best way to use CatBoost with large datasets?
- How does CatBoost handle unseen categories during inference?
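The ordered-boosting and target-leakage questions above center on one idea: encode a categorical value for a row using only the labels of rows that come *before* it in a random permutation, so a row never sees its own target. The pure-Python sketch below illustrates that idea only; it is not CatBoost's actual implementation, and the function name and smoothing prior are illustrative.

```python
# Sketch of ordered target statistics (the idea behind CatBoost's
# categorical encoding): each row's category is encoded from the
# smoothed mean of targets seen *earlier* in a random permutation,
# which prevents target leakage.
import random

def ordered_target_stats(categories, targets, prior=0.5, seed=0):
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)      # random permutation of rows
    sums, counts = {}, {}
    encoded = [0.0] * len(categories)
    for i in order:
        c = categories[i]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        # Smoothed mean of earlier targets for this category; an unseen
        # category falls back to the prior (cf. unseen categories at inference).
        encoded[i] = (s + prior) / (n + 1)
        sums[c] = s + targets[i]            # only now include this row's label
        counts[c] = n + 1
    return encoded

cats = ["a", "b", "a", "a", "b"]
ys = [1, 0, 1, 0, 1]
enc = ordered_target_stats(cats, ys)
```

In the real library you would instead pass raw categorical columns via the `cat_features` parameter and let CatBoost compute these statistics internally over multiple permutations.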