Please zip all your files for Homework 2, including the scripts, input files and output files if any, into a
single file called YourLastname_Firstname_HW2.zip (or .rar).
NOTE 1: You will need to add necessary comments in your program to explain your code. Examples of
commenting can be found in the textbook.
NOTE 2: Test your program with various test cases to ensure that it works properly.
1. Unknown Letters
Write a program to list which letters in the file seqs.txt are not A, T, C, or G. It should only list
each letter once. Hint: Start with an empty list for unknown letters. Then use two loops to scan
letters in each sequence.
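The hint above can be sketched roughly as follows. This is only one possible approach, not a required solution: the function names find_unknown_letters and read_seqs are made up for illustration, and the sketch assumes that seqs.txt holds one upper-case sequence per line, which the assignment does not state.

```python
def read_seqs(filename):
    """Read sequences from a file, assuming one sequence per line (an assumption)."""
    with open(filename) as handle:
        # Strip the newline from each line and skip blank lines.
        return [line.strip() for line in handle if line.strip()]


def find_unknown_letters(seqs):
    """Return every letter that is not A, T, C, or G, listing each letter once."""
    unknown = []                    # start with an empty list for unknown letters
    for seq in seqs:                # outer loop: one sequence at a time
        for letter in seq:          # inner loop: one letter at a time
            # Record the letter only if it is unknown and not recorded yet.
            if letter not in "ATCG" and letter not in unknown:
                unknown.append(letter)
    return unknown


# Small in-memory example, so the sketch runs without seqs.txt.
example = ["ATCGN", "GGRTA", "NNAT"]
print(find_unknown_letters(example))  # prints ['N', 'R']
```

With the real input, the call would look like find_unknown_letters(read_seqs("seqs.txt")).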
2. Sequence Properties
Write a program that 1) reads all sequences in seqs.txt and stores them in a list called seqs, 2)
prompts the user with a menu for selecting various properties of the sequences, and 3) shows the
corresponding results based on the user’s choice. The menu for selection should include:
1) Number of sequences in the input file
2) Number of occurrences of a specific sequence, e.g. GGATC (The program will then prompt
the user for the target sequence.)
3) Number of sequences that are longer than a particular length, e.g. 1000 bases (The
program will ask the user again for the minimum length.)
4) Number of sequences with GC content higher than a given value, e.g. 50% (The GC
content could be calculated as (num_of_G + num_of_C) / seq_total_len )
5) The combination of choices 3 and 4: Number of sequences longer than a particular
length and with GC content over a particular value
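As a rough illustration of choices 3 to 5, the GC content formula and the two counting steps might be sketched as below. The function names are hypothetical. The sketch assumes upper-case sequences, expresses GC content as a percentage, and uses >= for both thresholds to match the sample menu; a strict > would be an equally reasonable reading of "longer than" and "higher than".

```python
def gc_content(seq):
    """Return the GC content of seq as a percentage (0 to 100)."""
    if not seq:
        return 0.0  # avoid dividing by zero for an empty sequence
    # (num_of_G + num_of_C) / seq_total_len, scaled to a percentage
    return (seq.count("G") + seq.count("C")) / len(seq) * 100


def count_min_length(seqs, min_len):
    """Count the sequences whose length is at least min_len."""
    return sum(1 for seq in seqs if len(seq) >= min_len)


def count_min_gc(seqs, min_gc):
    """Count the sequences whose GC percentage is at least min_gc."""
    return sum(1 for seq in seqs if gc_content(seq) >= min_gc)


def count_min_length_and_gc(seqs, min_len, min_gc):
    """Count the sequences that pass both the length and the GC threshold."""
    return sum(
        1 for seq in seqs
        if len(seq) >= min_len and gc_content(seq) >= min_gc
    )


# Tiny made-up data set, used only to show how the functions are called.
seqs = ["GGCC", "ATAT", "GCATGC", "AT"]
print(count_min_length(seqs, 4))             # prints 3
print(count_min_gc(seqs, 50))                # prints 2
print(count_min_length_and_gc(seqs, 4, 50))  # prints 2
```

Choice 5 tests both conditions on the same sequence, so it cannot be obtained by combining the two separate counts.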
In your program, there should be separate functions for the analysis in options 1 to 4. Your
program should work like this:
Please select the sequences property that you want to display, or press 0 to
exit the program.
1) Total number of sequences
2) Number of pattern occurrences
3) Number of sequences with length >= min_len
4) Number of sequences with GC% >= min_GC
5) Number of sequences with length >= min_len and GC% >= min_GC
Enter the choice: 4
Enter the minimum GC content (min_GC): 50
Calculating …
There are 36 sequences with GC% >= 50%.
==
Please select the sequences property that you want to display, or press 0 to
exit the program.
GNBF5010 Homework 2
1) Total number of sequences
2) Number of pattern occurrences
3) Number of sequences with length >= min_len
4) Number of sequences with GC% >= min_GC
5) Number of sequences with length >= min_len and GC% >= min_GC