This assignment is based on the MicroArray Quality Control (MAQC-II) study, in which I participated
as a postdoctoral researcher. We will be using endpoints D, E, J, K, L, and M described in Table
1 The Data
Some of the key challenges of these data are that there are relatively few samples (hundreds) compared
to the number of features (tens of thousands), providing an example of the “curse of dimensionality.”
Another challenge is that the class labels, although binary, are not balanced: there may be 90% of the
samples in one class and only 10% in the other, depending on the endpoint.
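Because of the class imbalance, cross-validation folds should preserve the class ratio. The sketch below uses a synthetic stand-in (random numbers, not the MAQC-II data; the 200 × 10,000 shape and the 90/10 split are assumptions chosen to mimic the description above) to check class proportions with NumPy and to show that Scikit-Learn's StratifiedKFold keeps the minority class present in every validation fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for one endpoint (NOT the MAQC-II data):
# far more features than samples, and a 90/10 class split.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 10_000
X = rng.normal(size=(n_samples, n_features))
y = np.array([0] * 180 + [1] * 20)

# Check how unbalanced the labels are.
classes, counts = np.unique(y, return_counts=True)
proportions = counts / counts.sum()
print(dict(zip(classes.tolist(), proportions.tolist())))  # {0: 0.9, 1: 0.1}

# Stratified folds keep the 90/10 ratio in every validation fold. Unshuffled
# KFold on these sorted labels would put all 20 positives in the last fold.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_minority_counts = []
for train_idx, val_idx in skf.split(X, y):
    fold_minority_counts.append(int(y[val_idx].sum()))
print(fold_minority_counts)  # [4, 4, 4, 4, 4]
```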
2 The Code
You will submit one Python file named gene_expression.py that contains six functions. Each
function returns a Pipeline that is ready to be trained using its “fit” method:
# gene_expression.py
def endpoint_d():
    # return a Scikit-Learn Pipeline object
    return pipeline

def endpoint_e():
    # return a Scikit-Learn Pipeline object
    return pipeline

def endpoint_j():
    # return a Scikit-Learn Pipeline object
    return pipeline

def endpoint_k():
    # return a Scikit-Learn Pipeline object
    return pipeline

def endpoint_l():
    # return a Scikit-Learn Pipeline object
    return pipeline

def endpoint_m():
    # return a Scikit-Learn Pipeline object
    return pipeline
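As an illustration of what one filled-in function might look like, here is a hypothetical version of endpoint_d. The choice of steps (standardization, univariate selection with SelectKBest, and a class-weighted logistic regression) and the value k=100 are assumptions for the sketch, not a recommended or tuned model for endpoint D. The function returns the Pipeline untrained, so the caller trains it with fit.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def endpoint_d():
    # Hypothetical filled-in template: the steps and values (k=100, default C)
    # are illustrative placeholders, not tuned for endpoint D.
    pipeline = Pipeline([
        ("scale", StandardScaler()),                # standardize each feature
        ("select", SelectKBest(f_classif, k=100)),  # keep the 100 top-scoring features
        ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
    ])
    return pipeline  # returned untrained; the caller invokes fit


# Usage on synthetic data (NOT the MAQC-II data): 60 samples, 500 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))
y = np.array([0] * 50 + [1] * 10)

model = endpoint_d()
model.fit(X, y)
pred = model.predict(X)
```

Putting feature selection inside the Pipeline means it is refit on each training fold during cross-validation, so the selected features never see the held-out fold.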
3 Model Selection and Hyperparameter Tuning
Use the training data to select your model and tune its hyperparameters. Review chapters 2, 3, and
4 for guiding principles.
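One common way to tune a Pipeline on the training data alone is a cross-validated grid search. The sketch below uses GridSearchCV with stratified folds and the "balanced_accuracy" scorer. The synthetic data, the pipeline steps, and the grid values are all assumptions for illustration; hyperparameters of pipeline steps are addressed as "<step name>__<parameter>".

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a training set (NOT the MAQC-II data).
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 2000))
y = np.array([0] * 100 + [1] * 20)
X[y == 1, :20] += 1.0  # give the first 20 features a weak class signal

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif)),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Hypothetical search space over the number of features and regularization.
param_grid = {
    "select__k": [10, 50, 200],
    "clf__C": [0.01, 0.1, 1.0],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="balanced_accuracy",  # the evaluation metric named in this section
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

After fitting, search.best_estimator_ is the pipeline refit on all training data with the best grid values; those values could then be hard-coded into the corresponding endpoint function.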
We will use “balanced accuracy” to evaluate the models; it is the average of sensitivity and
specificity.
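With sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), balanced accuracy is (sensitivity + specificity) / 2. The sketch below computes it by hand from a confusion matrix and compares it with Scikit-Learn's balanced_accuracy_score. The label vectors are made up for illustration; they show why plain accuracy is misleading on a 90/10 endpoint.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, confusion_matrix


def balanced_accuracy(y_true, y_pred):
    # For binary labels {0, 1}, ravel() returns tn, fp, fn, tp in that order.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2


# 90 negatives and 10 positives, mimicking a 90/10 endpoint.
y_true = np.array([0] * 90 + [1] * 10)

# A classifier that always predicts the majority class.
y_majority = np.zeros_like(y_true)
print(np.mean(y_majority == y_true))          # 0.9 (plain accuracy)
print(balanced_accuracy(y_true, y_majority))  # 0.5

# A classifier that finds 8 of 10 positives but mislabels 9 negatives.
y_other = y_true.copy()
y_other[90:92] = 0  # 2 false negatives
y_other[:9] = 1     # 9 false positives
print(balanced_accuracy(y_true, y_other))        # about 0.85
print(balanced_accuracy_score(y_true, y_other))  # same value from Scikit-Learn
```

The majority-class predictor scores 90% plain accuracy but only 50% balanced accuracy, the same as random guessing, while the second classifier (sensitivity 0.8, specificity 0.9) scores about 0.85.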
You have been provided a training set. Web-CAT will report your performance on a validation set.
Your grade will be determined by your performance on the hidden test set relative to your peers.
4 Submitting on Web-CAT
Web-CAT will use your classifier with a variety of different hyperparameters to see if it performs
the same as the Scikit-Learn GaussianMixture model and produces the correct attribute values.
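This section does not say how Web-CAT changes the hyperparameters. One standard Scikit-Learn mechanism that any such harness could use is the get_params/set_params interface, which Pipeline exposes with "<step name>__<parameter>" keys. The minimal pipeline below is an assumption for illustration only.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Nested hyperparameters are exposed as "<step name>__<parameter>".
params = pipeline.get_params()
print(params["clf__C"])  # 1.0

# set_params updates the steps in place and returns the pipeline itself.
pipeline.set_params(clf__C=0.1, scale__with_mean=False)
print(pipeline.named_steps["clf"].C)            # 0.1
print(pipeline.named_steps["scale"].with_mean)  # False
```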