This article describes a component in Azure Machine Learning designer.

Use this component to remove, replace, or infer missing values.

Data scientists often check data for missing values and then perform various operations to fix the data or insert new values. The goal of such cleaning operations is to prevent problems caused by missing data that can arise when training a model.

This component supports multiple types of operations for "cleaning" missing values, including:

- Replacing missing values with a placeholder, mean, or other value.
- Completely removing rows and columns that have missing values.

Using this component does not change your source dataset. Instead, it creates a new dataset in your workspace that you can use in the subsequent workflow. You can also save the new, cleaned dataset for reuse.

This component also outputs a definition of the transformation used to clean the missing values. You can reuse this transformation on other datasets that have the same schema by using the Apply Transformation component.

This component lets you define a cleaning operation. You can also save the cleaning operation so that you can apply it later to new data. See the following sections for how to create and save a cleaning process:

- To apply a cleaning transformation to new data

The cleaning method that you use for handling missing values can dramatically affect your results. We recommend that you experiment with different methods. Consider both the justification for use of a particular method, and the quality of the results.

Replace missing values

Each time that you apply the Clean Missing Data component to a set of data, the same cleaning operation is applied to all columns that you select. Therefore, if you need to clean different columns using different methods, use separate instances of the component.

1. Add the Clean Missing Data component to your pipeline, and connect the dataset that has missing values.

2. For Columns to be cleaned, choose the columns that contain the missing values you want to change. You can choose multiple columns, but you must use the same replacement method in all selected columns. Therefore, typically you need to clean string columns and numeric columns separately.

For example, to check for missing values in all numeric columns:

1. Select the Clean Missing Data component, and click on Edit column in the right panel of the component.

2. For Include, select Column types from the dropdown list, and then select Numeric.

Any cleaning or replacement method that you choose must be applicable to all columns in the selection.
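
To make the ideas above concrete outside the designer, here is a minimal Python sketch of the same pattern: select the numeric columns, learn one replacement value per column, produce a new cleaned dataset without changing the source, and reuse the learned values on new data with the same schema. This is not how the Clean Missing Data or Apply Transformation components are implemented. The function names (`numeric_columns`, `fit_mean_cleaning`, `apply_cleaning`), the row format, and the sample data are all hypothetical and chosen only for illustration, and the mean is used as one example of a replacement method.

```python
from statistics import mean

# Hypothetical data format: each row is a dict, and None marks a missing value.


def numeric_columns(rows):
    """Return the columns whose non-missing values are all int or float."""
    selected = []
    for col in rows[0].keys():
        present = [row[col] for row in rows if row[col] is not None]
        if present and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        ):
            selected.append(col)
    return selected


def fit_mean_cleaning(rows, columns):
    """Learn one replacement value (the column mean) per selected column.

    The returned dict plays the role of a saved, reusable transformation.
    """
    return {
        col: mean(row[col] for row in rows if row[col] is not None)
        for col in columns
    }


def apply_cleaning(rows, fill_values):
    """Return new rows with missing values replaced; the input is not modified."""
    return [
        {
            col: fill_values[col] if val is None and col in fill_values else val
            for col, val in row.items()
        }
        for row in rows
    ]


# Illustrative source dataset with missing numeric and string values.
source = [
    {"age": 30.0, "income": 100.0, "city": "Oslo"},
    {"age": None, "income": 200.0, "city": None},
    {"age": 50.0, "income": None, "city": "Lima"},
]

# The same method (mean) is applied to every selected column.
transform = fit_mean_cleaning(source, numeric_columns(source))
cleaned = apply_cleaning(source, transform)

# Reuse the learned transformation on new data that has the same schema.
new_data = [{"age": None, "income": None, "city": "Rome"}]
cleaned_new = apply_cleaning(new_data, transform)
```

In this sketch the string column `city` is left untouched because a mean does not apply to it. Cleaning it would take a second pass with a different method, which mirrors the advice to use separate instances of the component for columns that need different treatment.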