{"id":733,"date":"2019-12-08T15:47:12","date_gmt":"2019-12-08T15:47:12","guid":{"rendered":"https:\/\/www.danielparente.net\/en\/2019\/12\/08\/everything-but-the-kitchen-sink-feature-generation\/"},"modified":"2019-12-08T15:47:12","modified_gmt":"2019-12-08T15:47:12","slug":"everything-but-the-kitchen-sink-feature-generation","status":"publish","type":"post","link":"https:\/\/www.danielparente.net\/en\/2019\/12\/08\/everything-but-the-kitchen-sink-feature-generation\/","title":{"rendered":"Everything But the Kitchen Sink Feature Generation"},"content":{"rendered":"<div itemprop=\"articleBody\">\n<p><em>This blog is a part of a <a href=\"https:\/\/blogs.sas.com\/content\/tag\/data-science-pilot-explained\/\" target=\"_blank\" rel=\"noopener noreferrer\">series on the Data Science Pilot Action Set<\/a>. In my first blog we introduced the action set and the actions for building data understanding. In this blog we dive into feature generation and selection.<\/em><\/p>\n<p>The <a href=\"https:\/\/go.documentation.sas.com\/?docsetId=casactml&amp;docsetTarget=casactml_datasciencepilot_toc.htm&amp;docsetVersion=8.5&amp;locale=en\" target=\"_blank\" rel=\"noopener noreferrer\">Data Science Pilot Action Set<\/a> is included with SAS Visual Data Mining and Machine Learning (VDMML) and consists of actions that implement a policy-based, configurable, and scalable approach to automating data science workflows. 
The two actions we will examine in this blog are the featureMachine action and the selectFeatures action.<\/p>\n<div id=\"attachment_2715\" style=\"width: 987px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blogs.sas.com\/content\/subconsciousmusings\/files\/2019\/10\/dataSciencePilotActions.jpg\" target=\"_blank\" rel=\"noopener\"><img fetchpriority=\"high\" decoding=\"async\" aria-describedby=\"caption-attachment-2715\" class=\"size-full wp-image-2715\" src=\"https:\/\/blogs.sas.com\/content\/subconsciousmusings\/files\/2019\/10\/dataSciencePilotActions.jpg\" alt=\"\" width=\"977\" height=\"289\" srcset=\"https:\/\/blogs.sas.com\/content\/subconsciousmusings\/files\/2019\/10\/dataSciencePilotActions.jpg 977w, https:\/\/blogs.sas.com\/content\/subconsciousmusings\/files\/2019\/10\/dataSciencePilotActions-300x89.jpg 300w\" sizes=\"(max-width: 977px) 100vw, 977px\"\/><\/a><\/p>\n<p id=\"caption-attachment-2715\" class=\"wp-caption-text\">Data Science Pilot Actions<\/p>\n<\/div>\n<p>The <a href=\"https:\/\/go.documentation.sas.com\/?docsetId=casactml&amp;docsetTarget=casactml_datasciencepilot_details23.htm&amp;docsetVersion=8.5&amp;locale=en\" target=\"_blank\" rel=\"noopener noreferrer\">featureMachine action<\/a> not only generates new features, but also explores the data and screens variables, so it can take on data exploration as well. Besides pulling double duty, it can save you time by running tasks in parallel. When it comes to new features, this action creates everything but the kitchen sink. The list of transformations includes missing indicators, several types of imputation, several types of binning, and much more. Unfortunately, the featureMachine action doesn&#8217;t include any subject matter expertise. 
This action won&#8217;t replace domain knowledge in feature generation, but it has everything else covered.<\/p>\n<p>The featureMachine action includes three policies: the explorationPolicy, the screenPolicy, and the transformationPolicy. The explorationPolicy specifies how the data is grouped together, the screenPolicy controls how much data messiness is acceptable, and the transformationPolicy defines which types of features to create. The output of this action includes information about the generated features, the features generated using the input data, and an analytic store file for generating the features with new data.<\/p>\n<div class=\"wp_syntax\">\n<table>\n<tr>\n<td class=\"code\">\n<pre class=\"sas\" style=\"font-family:monospace;\"><span style=\"color: #006400; font-style: italic;\">\/* Create new features using featureMachine Action *\/<\/span>\n<span style=\"color: #000080; font-weight: bold;\">proc cas<\/span>;\n\tloadactionset <span style=\"color: #a020f0;\">\"dataSciencePilot\"<\/span>;\n\tdataSciencePilot.featureMachine\n\t\t\/\t<span style=\"color: #0000ff;\">table<\/span> \t\t\t= <span style=\"color: #a020f0;\">\"hmeq\"<\/span>\n\t\t\ttarget \t\t\t= <span style=\"color: #a020f0;\">\"BAD\"<\/span>\n\t\t\tcopyVars \t\t= <span style=\"color: #a020f0;\">\"BAD\"<\/span>\n\t\t\texplorationPolicy      \t= <span style=\"color: #66cc66;\">{<\/span>cardinality = <span style=\"color: #66cc66;\">{<\/span>lowMediumCutoff = <span style=\"color: #2e8b57; font-weight: bold;\">40<\/span><span style=\"color: #66cc66;\">}<\/span><span style=\"color: #66cc66;\">}<\/span>\n\t\t    \tscreenPolicy           \t= <span style=\"color: #66cc66;\">{<\/span>missingPercentThreshold=<span style=\"color: #2e8b57; font-weight: bold;\">35<\/span><span style=\"color: #66cc66;\">}<\/span>\n            \t\ttransformationPolicy   \t= <span style=\"color: #66cc66;\">{<\/span>entropy = True, iqv = True,  <span style=\"color: #0000ff;\">kurtosis<\/span> = True, Outlier = True<span 
style=\"color: #66cc66;\">}<\/span>\n            \t\ttransformationOut      \t= <span style=\"color: #66cc66;\">{<\/span>name = <span style=\"color: #a020f0;\">\"TRANSFORMATION_OUT\"<\/span>, <span style=\"color: #0000ff;\">replace<\/span> = True<span style=\"color: #66cc66;\">}<\/span>\n           \t\tfeatureOut             \t= <span style=\"color: #66cc66;\">{<\/span>name = <span style=\"color: #a020f0;\">\"FEATURE_OUT\"<\/span>, <span style=\"color: #0000ff;\">replace<\/span> = True<span style=\"color: #66cc66;\">}<\/span>\n            \t\tcasOut                 \t= <span style=\"color: #66cc66;\">{<\/span>name = <span style=\"color: #a020f0;\">\"CAS_OUT\"<\/span>, <span style=\"color: #0000ff;\">replace<\/span> = True<span style=\"color: #66cc66;\">}<\/span>\n            \t\tsaveState              \t= <span style=\"color: #66cc66;\">{<\/span>name = <span style=\"color: #a020f0;\">\"ASTORE_OUT\"<\/span>, <span style=\"color: #0000ff;\">replace<\/span> = True<span style=\"color: #66cc66;\">}<\/span>\n\t\t;\n\t<span style=\"color: #000080; font-weight: bold;\">run<\/span>;\n<span style=\"color: #000080; font-weight: bold;\">quit<\/span>;<\/pre>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<p>The <a href=\"https:\/\/go.documentation.sas.com\/?docsetId=casactml&amp;docsetTarget=casactml_datasciencepilot_details26.htm&amp;docsetVersion=8.5&amp;locale=en\" target=\"_blank\" rel=\"noopener noreferrer\">selectFeatures Action<\/a> will filter features based on a specified measure. Using the selectionPolicy, you can specify the measure you want to filter on and how many features you want. 
If no measure is specified, the Mutual Information criterion is used by default. Continuing my coding example, I fed the data generated from the featureMachine action into the selectFeatures action to select the best ten features.<\/p>\n<div class=\"wp_syntax\">\n<table>\n<tr>\n<td class=\"code\">\n<pre class=\"sas\" style=\"font-family:monospace;\"><span style=\"color: #006400; font-style: italic;\">\/* Select features using selectFeatures Action *\/<\/span> \n<span style=\"color: #000080; font-weight: bold;\">proc cas<\/span>;\n\tloadactionset <span style=\"color: #a020f0;\">\"dataSciencePilot\"<\/span>;\n\tdataSciencePilot.selectFeatures\n\t\t\/ \t<span style=\"color: #0000ff;\">table<\/span> \t\t= <span style=\"color: #a020f0;\">\"CAS_OUT\"<\/span>\n\t\t\tcasOut \t\t= <span style=\"color: #66cc66;\">{<\/span>name = <span style=\"color: #a020f0;\">\"SELECT_FEATURES_OUT\"<\/span>, <span style=\"color: #0000ff;\">replace<\/span> = True<span style=\"color: #66cc66;\">}<\/span>\n\t\t\ttarget \t\t= <span style=\"color: #a020f0;\">\"BAD\"<\/span>\n\t\t\tselectionPolicy = <span style=\"color: #66cc66;\">{<\/span>topk=<span style=\"color: #2e8b57; font-weight: bold;\">10<\/span><span style=\"color: #66cc66;\">}<\/span>\n\t\t;\n\t<span style=\"color: #000080; font-weight: bold;\">run<\/span>;\n<span style=\"color: #000080; font-weight: bold;\">quit<\/span>;<\/pre>\n<\/td>\n<\/tr>\n<\/table>\n<\/div>\n<p>In this blog, we took control over more aspects of the data science workflow. Using the featureMachine action, we were able to create many new features, and using the selectFeatures action, we narrowed our features down to a usable number. However, if you are looking for one piece of code that will do it all (or almost all), stay tuned for my next blog! 
In the <a href=\"https:\/\/blogs.sas.com\/content\/tag\/data-science-pilot-explained\/\" target=\"_blank\" rel=\"noopener noreferrer\">upcoming blog<\/a>, we will introduce the dsAutoMl action.<\/p>\n<\/p><\/div>\n<p><a href=\"https:\/\/blogs.sas.com\/content\/subconsciousmusings\/2019\/11\/06\/kitchen-sink-feature-generation\/\" target=\"_blank\" rel=\"noopener\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This blog is a part of a series on the Data Science Pilot Action Set. In my first blog we introduced the action set and the actions for building data understanding. In this blog we dive into feature generation and selection.\u00a0 The Data Science Pilot Action Set is included with SAS Visual Data Mining [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":734,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","jetpack_post_was_ever_published":false},"categories":[1],"tags":[],"class_list":["post-733","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"blocksy_meta":[],"jetpack_featured_media_url":"https:\/\/e928cfdc7rs.exactdn.com\/info\/uploads\/sites\/3\/2019\/12\/Everything-But-the-Kitchen-Sink-Feature-Generation.jpg?strip=all","jetpack_shortlink":"https:\/\/wp.me\/p2TFCd-bP","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/posts\/733","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/comments?post=733"}],"version-history":[{"count":0,"
href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/posts\/733\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/media\/734"}],"wp:attachment":[{"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/media?parent=733"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/categories?post=733"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/tags?post=733"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}