Deep Learning

1. Prove that the softmax and sigmoid functions are equivalent when the number of possible labels is two (i.e., for a binary classification problem).

2. Based on what you have learned about the BP algorithm, including the feed-forward and backward propagation processes of a neural network, answer Question 1 in the slide below: prove whether each of the 6 equations holds or does not hold.

3. Answer Question 2 in the slide "Homework" above ("implement an NN …"), using the 3-layer NN code I gave you as a reference example.
Requirements: Implement the NN algorithm in Python. Submit the code and make sure it can be run by our TAs or instructors with a simple "click"; you may submit a readme file if needed.
Note that the architecture (2 – 3 – 2) used in the given code is different from the one (2 – 2 – 2) in Question 2 of Homework Assignment 2, and that the activation functions are different as well.

4. Given the network architecture shown in the slide "Homework" above in Question 2, and assuming the loss function L(y, ŷ) = (1/2) Σ_{i=1}^{n} (y_i − ŷ_i)² and learning rate α = 0.5, answer the questions below (show the details step by step):
a. Following the forward propagation process, calculate the output: ŷ = ?
b. Calculate the loss (error): L(y, ŷ) = ?
c. Following the BP algorithm, calculate the updated values of the weights W1 after 1 iteration.

5. Given the same dataset as Question 5 in HA1:
• "on the street" 50 times;
• "on the table" 1000 times;
• "on the computer" 2000 times;
• "standing on the street" 40 times;
• "standing on the computer" 1 time;
• "standing on the table" 100 times;
• "on the sky" 5 times;
• "on the water" 10 times;
• "on the cloud" 2 times;
• "it is magic that the guy smiles standing on the sky" 1 time;
• "it is magic that the guy smiles standing on the water" 2 times;
• "it is magic that the guy smiles standing on the cloud" 3 times;
a. Use an RNN to perform the same task (build the language model) and guess the next word: I notice three guys standing on the ( ).
b. Compare this with the results obtained by the probabilistic language model in HA1.
c. Use a feedforward neural network for the same task and compare its result with that of the RNN.
d. If we use the corpus plus (shown in "HA1-Question 5 corpus plus.xlsx"), compare the results obtained by the language model, the feedforward neural network, and the RNN. Explain what you find.

6. Based on what you have learned about the RNN algorithm, do the homework in the slide below. Compare the results obtained by a feedforward neural network and an RNN for this task and explain what you find. (You can use a larger dataset, such as "shakespeare.txt", to further support your findings or to explain why an RNN is better, or worse, than a feedforward NN for sequence analysis.)

7. (Optional question) For a three-layer feedforward neural network, how do you determine the number of nodes/neurons in the hidden layer? For the dataset given below (Tables 1 and 2), how many neurons in the hidden layer is the best choice? (A small sketch evaluating these heuristics follows the tables.)
Hint: in general, the following 3 equations can be used:
Σ_{i=0}^{n} C(n_h, i) > k   (1)
where k is the number of samples, n_h is the number of nodes/neurons in the hidden layer, n is the number of inputs, and i ∈ [0, n];
n_h = √(n + m) + a   (2)
where n_h is the number of nodes/neurons in the hidden layer, n is the number of input units, m is the number of output units, and a ∈ [1, 10];
n_h = log₂ n   (3)
where n_h is the number of nodes/neurons in the hidden layer and n is the number of input units.

Table 1: Training dataset
No.   x1       x2       x3       x4       x5       x6       x7       x8       y
 1    -1.7817  -0.2786  -0.2954  -0.2394  -0.1842  -0.1572  -0.1584  -0.1998  1
 2    -1.8710  -0.2957  -0.3494  -0.2904  -0.1460  -0.1387  -0.1492  -0.2228  1
 3    -1.8347  -0.2817  -0.3566  -0.3476  -0.1820  -0.1435  -0.1778  -0.1849  1
 4    -1.8807  -0.2467  -0.2316  -0.2419  -0.1938  -0.2103  -0.2010  -0.2533  1
 5    -1.4151  -0.2282  -0.2124  -0.2147  -0.1271  -0.0680  -0.0872  -0.1684  2
 6    -1.2879  -0.2252  -0.2012  -0.1298  -0.0245  -0.0390  -0.0762  -0.1672  2
 7    -1.5239  -0.1979  -0.1094  -0.1402  -0.0094  -0.1394  -0.1673  -0.2810  2
 8    -1.6781  -0.2047  -0.1180  -0.1532  -0.1732  -0.1716  -0.1851  -0.2006  2
 9     0.1605  -0.0920  -0.0160   0.1246   0.1802   0.2087   0.2234   0.1003  3
10     0.2045   0.1078   0.2246   0.2031   0.2428   0.2050   0.0704   0.0403  3
11    -1.0242  -0.1461  -0.1018  -0.0778  -0.0363  -0.0476  -0.0160  -0.0253  3
12    -0.7915  -0.1018  -0.0737  -0.0945  -0.0955   0.0044   0.0467   0.0719  3

Table 2: Test dataset
No.   x1       x2       x3       x4       x5       x6       x7       x8       y
13    -1.4736  -0.2845  -3.0724  -0.2108  -0.1904  -0.1467  -0.1696  -0.2001  1
14    -1.6002  -0.2011  -0.1021  -0.1394  -0.1001  -0.1572  -0.1584  -0.2790  2
15    -1.0314  -0.1521  -0.1101  -0.0801  -0.0347  -0.0482  -0.0158  -0.0301  3
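As an illustration for Question 7, the short sketch below plugs the sizes of the dataset in Tables 1 and 2 into the three heuristic equations, just to show how they would be evaluated. It assumes n = 8 input units (x1..x8), m = 3 output units (one per class label, an encoding the question does not actually specify), and k = 12 training samples; the helper name min_hidden_by_eq1 is illustrative, not part of any provided code.

```python
from math import comb, log2, sqrt

# Dataset sizes read off Tables 1 and 2 above. Assumptions (not stated in the
# question): the 3 class labels are one-hot encoded, so m = 3 output units,
# and only the 12 training rows count towards k.
n = 8    # input units (x1..x8)
m = 3    # output units (class labels 1/2/3)
k = 12   # training samples

def min_hidden_by_eq1(n, k):
    """Smallest n_h with sum_{i=0}^{n} C(n_h, i) > k (C(n_h, i) = 0 when i > n_h)."""
    n_h = 1
    while sum(comb(n_h, i) for i in range(min(n, n_h) + 1)) <= k:
        n_h += 1
    return n_h

# Equation (2): n_h = sqrt(n + m) + a with a in [1, 10] gives a candidate range.
eq2_low, eq2_high = sqrt(n + m) + 1, sqrt(n + m) + 10

print("Eq (1): need n_h >=", min_hidden_by_eq1(n, k))      # -> 4
print(f"Eq (2): n_h in [{eq2_low:.1f}, {eq2_high:.1f}]")   # -> roughly 4 to 13
print("Eq (3): n_h = log2(n) =", log2(n))                  # -> 3.0
```

The three heuristics only bracket a candidate range; the actual "best" choice for this dataset should still be confirmed by training networks of different hidden sizes and comparing their test performance on Table 2.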
Part 2: Sequence Labelling and POS

Application of POS Tagging to Sentiment Analysis
POS tagging is a necessary step in sentiment analysis, because a word's part of speech has a great impact on its sentiment polarity. Design an algorithm in which different parts of speech are assigned different sentiment weights. For example, we assume that adjectives convey stronger sentiment information than verbs and nouns, so we assign larger sentiment weights to adjectives. Verbs and nouns may also convey sentiment information from time to time; for example, the verb "love" and the noun "congratulations" are often associated with positive sentiment. However, we believe that adjectives play a much more dominant role than verbs and nouns in expressing sentiment, so we assign smaller sentiment weights to verbs and nouns than to adjectives. Similarly, we should assign smaller or zero sentiment weights to determiners and prepositions, ….
A small dataset, "amazon_cells_labelled.csv", can be used as a case study for your homework.
1. Based on what we learned in class and what is described above, improve the Naïve Bayes method for sentiment classification and implement it (Python code is preferred): apply POS tagging to improve the Naïve Bayes method for sentiment classification.
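Below is a minimal sketch of one way to realize the POS-weighted Naïve Bayes described above, assuming NLTK is available for tokenization and POS tagging. The weight table, the class name WeightedNaiveBayes, and the column names in the commented-out usage are illustrative assumptions, not something specified in the assignment.

```python
import math
from collections import defaultdict

import nltk  # assumes nltk plus its 'punkt' and 'averaged_perceptron_tagger' data are installed

# Illustrative POS weights (Penn Treebank tags): adjectives dominate, verbs and
# nouns contribute less, determiners and prepositions contribute nothing.
# These values are assumptions to be tuned on a validation split.
POS_WEIGHTS = {
    "JJ": 2.0, "JJR": 2.0, "JJS": 2.0,                                      # adjectives
    "VB": 1.0, "VBD": 1.0, "VBG": 1.0, "VBN": 1.0, "VBP": 1.0, "VBZ": 1.0,  # verbs
    "NN": 1.0, "NNS": 1.0, "NNP": 1.0, "NNPS": 1.0,                         # nouns
    "DT": 0.0, "IN": 0.0,                                                   # determiners, prepositions
}
DEFAULT_WEIGHT = 0.5  # every other part of speech

def weighted_tokens(sentence):
    """Tokenize and POS-tag a sentence, returning (word, sentiment-weight) pairs."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence.lower()))
    return [(word, POS_WEIGHTS.get(tag, DEFAULT_WEIGHT)) for word, tag in tagged]

class WeightedNaiveBayes:
    """Multinomial Naive Bayes whose word counts are scaled by POS weights."""

    def fit(self, sentences, labels, alpha=1.0):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        counts = {c: defaultdict(float) for c in self.classes}
        vocab = set()
        for sent, y in zip(sentences, labels):
            for word, weight in weighted_tokens(sent):
                counts[y][word] += weight          # weighted instead of raw count
                vocab.add(word)
        self.log_lik, self.log_unseen = {}, {}
        for c in self.classes:
            total = sum(counts[c].values()) + alpha * len(vocab)
            self.log_lik[c] = {w: math.log((counts[c][w] + alpha) / total) for w in vocab}
            self.log_unseen[c] = math.log(alpha / total)
        return self

    def predict(self, sentence):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for word, weight in weighted_tokens(sentence):
                score += weight * self.log_lik[c].get(word, self.log_unseen[c])
            scores[c] = score
        return max(scores, key=scores.get)

# Example usage on the homework dataset (the column names are an assumption;
# adjust them to the actual header of amazon_cells_labelled.csv):
# import pandas as pd
# df = pd.read_csv("amazon_cells_labelled.csv")
# clf = WeightedNaiveBayes().fit(df["text"].tolist(), df["label"].tolist())
# print(clf.predict("The battery life is amazing"))
```

Scaling each word's contribution to both the class-conditional counts and the test-time log-likelihood by its POS weight keeps the standard Naïve Bayes training and decision rules intact while letting adjectives dominate the sentiment score, as described above.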