Water and Jug Problem

2016-09-22

Algorithm

LeetCode

Problem

You are given two jugs with capacities x and y litres. There is an infinite amount of water supply available. You need to determine whether it is possible to measure exactly z litres using these two jugs.

If z liters of water is measurable, you must have z liters of water contained within one or both buckets by the end.

Operations allowed:

Fill any of the jugs completely with water.
Empty any of the jugs.
Pour water from one jug into another till the other jug is completely full or the first jug itself is empty.

Example 1: (From the famous "Die Hard" example)

Input: x = 3, y = 5, z = 4
Output: True
Example 2:

Input: x = 2, y = 6, z = 5
Output: False

Solution

贝祖等式（Bézout’s identity / 定理）：
对任何整數 a、 b和它们的最大公约数 d，关于未知数 x 和 y 的线性方程（称为贝祖等式）：

ax + by = m

有整数解时当且仅当m是d的倍数。

所以，只要x、y的最大公约数d能整除m，并且x+y<=m，那么存在a、b满足 ax + by = m

public boolean canMeasureWater(int x, int y, int z) {
    if(x + y < z) return false;
    // 出现x=0或者y=0、x + y == z
    if( x == z || y == z || x + y == z ) return true;
    // 利用贝祖定理，求最大公约数
    return z%GCD(x, y) == 0;
}

public int GCD(int a, int b){
    while(b != 0 ){
        int temp = b;
        b = a%b;
        a = temp;
    }
    return a;
}

STL Heap

2016-09-19

Algorithm

Cpp, STL

堆

make_heap


void make_heap (RandomAccessIterator first, RandomAccessIterator last, Compare comp );



make_heap (v.begin(),v.end());

重新组织[first, last)，使之成为一个堆（默认为大顶堆）。

is_heap


bool is_heap (RandomAccessIterator first, RandomAccessIterator last, Compare comp);



is_heap(foo.begin(), foo.end());

检查[first, last)范围内的元素是否满足堆的定义

is_heap_until


RandomAccessIterator is_heap_until (RandomAccessIterator first, RandomAccessIterator last, Compare comp);



auto last = std::is_heap_until (foo.begin(),foo.end());

返回[first, last)范围内第一个不满足heap定义的迭代器。

pop_heap & push_heap


void pop_heap (RandomAccessIterator first, RandomAccessIterator last, Compare comp);



pop_heap (v.begin(), v.end());

v.pop_back();

pop, push完了之后，重新调整，pop默认最大值。

sort_heap

1 2	void sort_heap (RandomAccessIterator first, RandomAccessIterator last, Compare comp);

排序，heap的属性可能会丢失

排序

1 2	#include <algorithm>

|:———:|:————-:|:———:|:————-:|

sort


void sort (RandomAccessIterator first, RandomAccessIterator last, Compare comp);

sort (myvector.begin(), myvector.end());

将[first, last)范围内元素排序，排序方式由comp决定，comp是自定义函数，可以使用lambda表达式替代。


/* lambda表达式形式如下

capture：指定了在可见域范围内 lambda 表达式的代码内可见得外部变量的列表，可能取值如下：

    []             不截取任何变量

    [&]         截取外部作用域中所有变量，并作为引用在函数体中使用

    [=]         截取外部作用域中所有变量，并拷贝一份在函数体中使用

    [=, &foo]     截取外部作用域中所有变量，并拷贝一份在函数体中使用，但是对foo变量使用引用

    [bar]         截取bar变量并且拷贝一份在函数体重使用，同时不截取其他变量

    [x, &y]     x按值传递，y按引用传递

    [this]         截取当前类中的this指针。如果已经使用了&或者=就默认添加此选项

params：  指定 lambda 表达式的参数

ret:  返回类型

body: 函数体

*/



[ capture ] ( params ) -> ret { body }

sort (envelopes.begin(), envelopes.end(), [](const pair<int, int> &a, const pair<int, int> &b){

    if (a.first == b.first) return a.second > b.second;

    return a.first < b.first;

});

stable_sort

1 2	void stable_sort ( RandomAccessIterator first, RandomAccessIterator last, Compare comp );

函数用法与sort一致，它的排序结果是稳定的。

partial_sort

1 2	void partial_sort (RandomAccessIterator first, RandomAccessIterator middle, RandomAccessIterator last, Compare comp);

对[first, middle)范围的元素进行排序。

partial_sort_copy

1 2	void partial_sort_copy (InputIterator first,InputIterator last,RandomAccessIterator result_first,RandomAccessIterator result_last, Compare comp);

对[first, last)范围的元素进行排序，并将排序结果复制到[result_first, result_last)

is_sorted

1 2	bool is_sorted (ForwardIterator first, ForwardIterator last, Compare comp);

判断[first, last)范围的元素是否有序。

nth_element

1 2	void nth_element (RandomAccessIterator first, RandomAccessIterator nth, RandomAccessIterator last, Compare comp);

重新组织[first, last)范围的元素，使第n大元素处于第n位置，并且比这个元素小的元素都排在这个元素之前，比这个元素大的元素都排在这个元素之后，但不能保证他们是有序的

partition

1 2	BidirectionalIterator partition (BidirectionalIterator first, BidirectionalIterator last, UnaryPredicate pred);

将区间[first,last)中的元素重新排列，满足判断条件pred的元素会被放在区间的前段，不满足pred的元素会被放在区间的后段。该算法不能保证元素的初始相对位置，返回指向第二类第一个元素的迭代器。

用法：


bool IsOdd (int i) { //用语判断奇数

    return (i%2) == 1; //奇数返回true，偶数返回0

}



int main(void)

{

    std::vector<int> ivec;

    for (int i=1; i<10; ++i)

        ivec.push_back(i); // 1 2 3 4 5 6 7 8 9

    auto bound = partition (ivec.begin(), ivec.end(), IsOdd);

    std::cout << "odd elements:";

    for (auto it=ivec.begin(); it!=bound; ++it)

        std::cout << ' ' << *it;       //输出1,9,3,7,5

    std::cout << "even elements:";

    for (auto it=bound; it!=ivec.end(); ++it)

        std::cout << ' ' << *it;     //输出6,4,8,2

    return 0;

}

stable_partition

partition的稳定版本。

sort_heap

1 2	void sort_heap (RandomAccessIterator first, RandomAccessIterator last, Compare comp);

用法和上面的函数一致。

lower_bound & upper_bound


ForwardIter lower_bound(ForwardIter first, ForwardIter last, const _Tp& val);

ForwardIter upper_bound(ForwardIter first, ForwardIter last, const _Tp& val);

lower_bound返回一个非递减序列[first, last)中的第一个大于等于值val的位置.

upper_bound返回一个非递减序列[first, last)中第一个大于val的位置

这两个函数都是用二分查找方法实现的，复杂度为O(logn)

集合set

set作为一个容器也是用来存储同一数据类型的数据类型，并且能从一个数据集合中取出数据，在set中每个元素的值都唯一，而且元素的值是有序的。set中数元素的值不能直接被改变。C++ STL中标准关联容器set, multiset, map, multimap内部采用RB树(插入元素迭代器不会失效，删除元素时除了被删除的那个以外的迭代器不会失效)。

其包含的方法包括

begin(), end(), rbegin(), rend()
clear(), empty(), find(), insert(key_value)
inset(first,second);将迭代器first到second之间的元素插入到set中，返回值是void
size(), max_size()
equal_range()，返回一对迭代器，分别表示第一个大于或等于给定关键值的元素迭代器以及第一个大于给定关键值的元素的迭代器，这个返回值是一个pair类型，如果这一对定位器中哪个返回失败，就会等于end()的值。
erase(iterator) ，删除迭代器iterator指向的值
erase(first,second)，删除迭代器first和second之间的值
erase(key_value)，删除键值key_value的值
lower_bound(key):返回第一个大于等于key的迭代器指针
upper_bound(key):返回第一个大于key的迭代器指针

链表list

c++中的list容器是双向链表。


#include < list > 

list<int> new_list;



assign(iter1, iter2) 将两个迭代器之间的元素赋值给当前list

back() 返回最后一个元素

begin() 返回指向第一个元素的迭代器

clear() 删除所有元素

empty() 如果list是空的则返回true

end() 返回末尾的迭代器

erase() 删除一个元素 

front() 返回第一个元素 

get_allocator() 返回list的配置器 

insert(const_iterator position, const value_type& val) 插入一个元素到list中 

max_size() 返回list能容纳的最大元素数量 

merge(list& x, Compare comp) 合并当前list和x

pop_back() 删除最后一个元素 

pop_front() 删除第一个元素 

push_back() 在list的末尾添加一个元素 

push_front() 在list的头部添加一个元素 

rbegin() 返回指向第一个元素的逆向迭代器 

remove(const value_type& val) 从list删除元素元素val

remove_if(Predicate pred) 按指定条件删除所有满足条件的元素 

rend() 指向list末尾的逆向迭代器 

resize(size_type n, const value_type& val) 改变list的大小，val是增加空间用的填充值

reverse() 把list的元素倒转 

size() 返回list中的元素个数 

sort(Compare comp) 给list排序，复杂度大约是NlogN，可以用归并

splice() 合并两个list 

swap(list& x) 交换当前list和x

unique() 删除list中重复的元素

遍历使用迭代器

forward_list

forward_list是单向链表

deque

deque容器类与vector类似，支持随机访问和快速插入删除，它在容器中某一位置上的操作所花费的是线性时间。与vector不同的是，deque还支持从开始端插入数据：push_front()。其余类似vector操作方法的使用。

随机访问方便，即支持[ ] 操作符和vector.at() ，但性能没有vector 好；
可以在内部进行插入和删除操作，但性能不及list ；
可以在两端进行push 、pop ；
相对于verctor 占用更多的内存。

priority_queue

将任意类型的序列容器转换为一个优先级队列，一般使用vector作为底层存储方式。

只能访问第一个元素，不能遍历整个priority_queue。

模板声明带有三个参数，priority_queue<Type, Container, Functional>

其中Type 为数据类型， Container 为保存数据的容器，Functional 为元素比较方式。

Container 必须是用数组实现的容器，比如 vector, deque 但不能用list.

STL里面容器默认用的是 vector. 比较方式默认用 operator <

缺省时优先队列就是大顶堆，队头元素优先级最大。


#include <queue>



priority_queue <Elem> c;    // 创建一个空的queue

c.top();     // 返回队列头部数据

c.push(elem);     // 在队列尾部增加elem数据

c.pop();     // 队列头部数据出队

c.empty();

c.size();

First Missing Positive

2016-09-14

OJ

LeetCode

Problem

Given an unsorted integer array, find the first missing positive integer.

For example,
Given [1,2,0] return 3,
and [3,4,-1,1] return 2.

Your algorithm should run in O(n) time and uses constant space.

Solution

解题思路

把每个数放在它应该的位置，比如找到4，就把4和num[3]兑换
小于1的数忽略，因为题目说是正数
大于数组长度的数也不可能是，因为它前面至少有n个数，至少有一个缺失
第一遍扫面把数放在正确的地方，第二遍，把num[i]!=i+1的最小序号输出

class Solution
{
public:
    int firstMissingPositive(int A[], int n) {
        // 将数放在正确的地方
        for(int i = 0; i < n; ++ i)
            while(A[i] > 0 && A[i] <= n && A[A[i] - 1] != A[i]) swap(A[i], A[A[i] - 1]);
        // 找到缺失的数
        for(int i = 0; i < n; ++ i)
            if(A[i] != i + 1) return i + 1;

        return n + 1;
    }
};

Copy List with Random Pointer

2016-09-12

OJ

LeetCode

Copy List with Random Pointer

A linked list is given such that each node contains an additional random pointer which could point to any node in the list or null.

Return a deep copy of the list.
节点的数据结构如下：

class RandomListNode {
    int label;
    RandomListNode next, random;
    RandomListNode(int x) { this.label = x;}
};

Solution

本题的naive思想如下：

在原来list上，每个节点后面插入一个拷贝这个节点的节点
从第一个节点开始，若：nodei->random = nodek, 那么设置nodei->next.random = nodek->next即可
还原原来的list，提取复制的list

public RandomListNode copyRandomList(RandomListNode head) {
    RandomListNode iter = head, next;

    // 在原链表上每个节点后一位创建copy
    while (iter != null) {
        next = iter.next;

        RandomListNode copy = new RandomListNode(iter.label);
        iter.next = copy;
        copy.next = next;

        iter = next;
    }

    // random指针设置
    iter = head;
    while (iter != null) {
        if (iter.random != null) {
            iter.next.random = iter.random.next;
        }
        iter = iter.next.next;
    }

    // 还原原来的list，提取复制的list
    iter = head;
    RandomListNode pseudoHead = new RandomListNode(0);
    RandomListNode copy, copyIter = pseudoHead;

    while (iter != null) {

        // extract the copy
        copy = iter.next;
        copyIter.next = copy;
        copyIter = copy;

        // restore the original list
        next = iter.next.next;
        iter.next = next;

        iter = next;
    }

    return pseudoHead.next;
}

Reservoir Sampling

2016-09-12

Algorithm

LeetCode, random algorithm

算法介绍

Reservoir Sampling（水塘抽样）是一系列的随机算法，其目的在于从包含n个项目的集合S中选取k个样本，其中n为一很大或未知的数量，尤其适用于不能把所有n个项目都存放到主内存的情况。
算法思路如下：

从S中抽取首k项放入「水塘」中
对于每一个S[j]项（j ≥ k）：
    随机产生一个范围从0到j的整數r
    若 r < k 则把水塘中的第r项换成S[j]项

KMP

2016-09-12

Algorithm

String

KMP算法是是一种在线性时间内对字符串进行匹配的经典算法.
待匹配的字符串是s，模式串为p

匹配原理

算法的出发点很直接，如上图所示，当匹配过程中出现不匹配时，不是简单地将模式串向右移动一位再从头匹配。而是利用匹配中的信息，比如上图，应该直接将模式串移动4位，到达下图的位置：

那么算法如何确定应该移动几位呢？这样的信息其实隐藏在模式串p中。我们把这些信息放在数组中，称之为next数组。（后面再讨论next数组的求法）

例子中模式串p的next数组位
A B C D A B D
0 0 0 0 1 2 0

所以当第k位匹配失败时，模式串向前移动的位数满足：

移动位数 = 已匹配的字符数 - next[k-1]

next数组

next数组就是”前缀”和”后缀”的最长的共有元素的长度
next[i]表示的是p的子串p[0,i]的”前缀”和”后缀”的最长的共有元素的长度
next数组的计算方式依旧采用滑动匹配的方式，并且在计算next[j]时需要使用已经计算过的next值

首先p的第一个字符的最大相同前后缀长度为0
设next[q-1]=k，对于第q个
- 如果：p[q]==p[k]，那么，next[q] = ++k，break;
- 否则：因为此时p[q]和p[k]已经不匹配了，所以我们需要将匹配串右移next[k-1]=j位，此时p[0,j-1]和p[q-j,q-1]已经匹配上了，所以此时要看p[j]==p[q]?，所以回到第2步；

实现

#include<stdio.h>
#include<string.h>
void makeNext(const char P[],int next[]) {
    int q,k;
    int m = strlen(P);
    next[0] = 0;
    for (q = 1,k = 0; q < m; ++q) {
        // k=0时，无论P[q]和P[k]是否匹配，next[q]都next[k-1]值无关
        // 不匹配时，根据next[k-1]=j右移，并且看P[q]和P[j]的匹配情况
        while(k > 0 && P[q] != P[k])
            k = next[k-1];
        if (P[q] == P[k]) { // next值增长一
            k++;
        }
        next[q] = k;
    }
}

int kmp(const char T[],const char P[],int next[]) {
    int n,m;
    int i,q; // i指示T的索引，q指示P的索引
    n = strlen(T);
    m = strlen(P);
    makeNext(P,next);
    for (i = 0,q = 0; i < n; ++i) {
        // 这里的原理和上面一致
        while(q > 0 && P[q] != T[i])
            q = next[q-1];
        if (P[q] == T[i]) {
            q++;
        }
        if (q == m) { // 找到一个匹配
            printf("Pattern occurs with shift:%d\n",(i-m+1));
            // count++; // 计算模式串出现的次数
            q = next[q-1]; // 找到一个匹配后，寻找下一个匹配，移动p
        }
    }
}

int main() {
    int i;
    int next[20]={0};
    char T[] = "ababxbababcadfdsss";
    char P[] = "abcdabd";
    printf("%s\n",T);
    printf("%s\n",P );
    // makeNext(P,next);
    kmp(T,P,next);
    for (i = 0; i < strlen(P); ++i) {
        printf("%d ",next[i]);
    }
    printf("\n");

    return 0;
}

Word Search

2016-09-11

OJ

DFS, LeetCode

Word Search

Given a 2D board and a word, find if the word exists in the grid.

The word can be constructed from letters of sequentially adjacent cell, where “adjacent” cells are those horizontally or vertically neighboring. The same letter cell may not be used more than once.

For example,
Given board =

[
  ['A','B','C','E'],
  ['S','F','C','S'],
  ['A','D','E','E']
]
word = "ABCCED", -> returns true,
word = "SEE", -> returns true,
word = "ABCB", -> returns false.

解题思路

典型的DFS题
每到一个点，将其做标记（可以通过异或操作减少空间浪费），然后继续递归，递归完成后，把这个点的值还原

Word Search

在上一题的基础上，给定一个单词集合words，找到words中的能由board构建的全部单词。对每个单词，假设board的每个格子只能用1次。

For example,
Given words = ["oath","pea","eat","rain"] and board =

[
  ['o','a','a','n'],
  ['e','t','a','e'],
  ['i','h','k','r'],
  ['i','f','l','v']
]
Return ["eat","oath"].

解题思路

还是要用dfs的思想
关于words的形式，trie树显然是最natural的
trie树节点包含26个指针，指向其可能存在的子节点，还包含一个string属性，此属性只在根节点上有用，表示以这个根结点结尾的word
对board的每一个节点，调用dfs算法，dfs算法根据trie树经行搜索。遇到根结点的话，将对应的结果存储
结果对重复的word并不计算，所以要考虑去重

public List<String> findWords(char[][] board, String[] words) {
    List<String> res = new ArrayList<>();
    TrieNode root = buildTrie(words);
    for(int i = 0; i < board.length; i++) {
        for(int j = 0; j < board[0].length; j++) {
            dfs(board, i, j, root, res);
        }
    }
    return res;
}

public void dfs(char[][] board, int i, int j, TrieNode p, List<String> res) {
    char c = board[i][j];
    if(c == '#' || p.next[c - 'a'] == null) return;
    p = p.next[c - 'a'];
    if (p.word != null) {   // 叶节点
        res.add(p.word);
        p.word = null;     // 记录这个单词一次就够了，以后不需要记录它
    }

    board[i][j] = '#';
    if(i > 0) dfs(board, i - 1, j ,p, res);
    if(j > 0) dfs(board, i, j - 1, p, res);
    if(i < board.length - 1) dfs(board, i + 1, j, p, res);
    if(j < board[0].length - 1) dfs(board, i, j + 1, p, res); 
    board[i][j] = c;
}

public TrieNode buildTrie(String[] words) {
    TrieNode root = new TrieNode();
    for(String w : words) {
        TrieNode p = root;
        for(char c : w.toCharArray()) {
            int i = c - 'a';
            if(p.next[i] == null) p.next[i] = new TrieNode();
            p = p.next[i];
       }
       p.word = w; // 叶节点的word属性值为这个单词
    }
    return root;
}

// trie树的定义
class TrieNode {
    TrieNode[] next = new TrieNode[26];
    String word;
}

Word Break

2016-09-11

OJ

Backtracking, DP, LeetCode

Word Break

给定一些单词集合dic，问给定字符串s是否可以拆分成这些单词。
思路如下：

使用DP思想，f[i]表示s.substring(0,i)是否可以拆分
f[i] |= (f[k] && s.substring(k,i) in dic)对于所有的k in [0,i]

Word Break II

在上题的基础上，要求给出所有可能的拆分，单词之间用空格隔开

For example, given
s = "catsanddog",
dict = ["cat", "cats", "and", "sand", "dog"].

A solution is ["cats and dog", "cat sand dog"].

解题思路：

由于需要给出所有的结果，上面的DP就不行了
解法是使用递归求解，对于已经处理过的子串，要用哈希表保存其结果，再次遇到可以直接使用
代码如下：

public class Solution {
    HashMap<String, List<String> >map = new HashMap<String, List<String> >();
    public List<String> wordBreak(String s, Set<String> dict) {
        if (map.containsKey(s) == true) // 已经处理过的子串
            return map.get(s);
        List<String> ret = new ArrayList<String>();
        for (int i=1; i<=s.length(); i++) {
            String tmp1 = s.substring(0, i), tmp2 = s.substring(i);
            if (dict.contains(tmp1) == true) { // 第一段是word的话
                if (tmp2.length() == 0)
                    ret.add(tmp1);
                else {
                    List<String> l = wordBreak(tmp2, dict); // 获取第二段的组成方式
                    for (String str:l)
                        ret.add(tmp1+" "+str); //拼接
                }
            } // 否则进入下一次循环
        }
        map.put(s, ret);
        return ret;
    }
}

Subsets

2016-09-10

OJ

LeetCode, bit

Problem

Given a set of distinct integers, nums, return all possible subsets.

Note: The solution set must not contain duplicate subsets. <无重复假设>

For example,
If nums = [1,2,3], a solution is:

[
  [3],
  [1],
  [2],
  [1,2,3],
  [1,3],
  [2,3],
  [1,2],
  []
]

Solution

递归的做法很明显，不赘述了。
这里介绍使用bit思想的方法

一个长度为n的无重复数组，其子集的个数为2^n个，每一个元素可能出现或者不出现。
所以我们从0到2^n-1，逐个递增。对每个数，按位位移然后&1就可以得到每一位的值，然后根据每一位是0还是1确定对应的数组元素是否加入。这种方法的优势在于不用递归调用栈。代码略！

下面的代码是这样的思想

先生成2^n个空的集合
每一个元素都会在2^(n-1)个子集中出现
第一个元素每隔一个集合出现一次，第二个元素每4个元素出现连续2次，第三个元素每8个元素出现连续4次
所以第j个集合中，第i个元素是否出现可以由 j >> i & 1 决定

下面是例子[1,2,3]
[], [], [], [], [], [], [], []

[], [1], [], [1], [], [1], [], [1]

[], [1], [2], [1, 2], [], [1], [2], [1, 2]

[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]

class Solution {
public:
    vector<vector<int> > subsets(vector<int> &S) {
        sort (S.begin(), S.end());
        int elem_num = S.size();
        int subset_num = pow (2, elem_num);
        vector<vector<int> > subset_set (subset_num, vector<int>());
        for (int i = 0; i < elem_num; i++)
            for (int j = 0; j < subset_num; j++)
                if ((j >> i) & 1)
                    subset_set[j].push_back (S[i]);
        return subset_set;
    }
}

数组允许重复的情况

在数组允许重复时，使用递归做的话，可以先排序，然后在遇到num[i] == num[i-1]时，直接跳过去就好

public class Solution {
    public List<List<Integer>> subsetsWithDup(int[] nums) {
        Arrays.sort(nums);
        List<List<Integer>> res = new ArrayList<>();
        List<Integer> each = new ArrayList<>();
        helper(res, each, 0, nums);
        return res;
    }
    public void helper(List<List<Integer>> res, List<Integer> each, int pos, int[] n) {
        if (pos <= n.length) {
            res.add(each);
        }
        int i = pos;
        while (i < n.length) {
            each.add(n[i]);
            helper(res, new ArrayList<>(each), i + 1, n);
            each.remove(each.size() - 1);
            while (i+1 < n.length && n[i] == n[i + 1]) i++;
        }
        return;
    }
}

迭代算法

采用逐个插入元素的思想，对每一个新元素，将其插入到每个已存在的子集中
如果当前元素是重复元素，那么只能把它插入到上一次插入位置的后面
比如说1,2,3,3，在插入最后一个3时，不能在[1],[2],[]这些集合中插入，因为第一个3插入时已经找到了这些子集，只能在第一个包含3的集合开始，所以需要插入第二个3的集合有[3],[1,3],[2,3],[1,2,3]。如果有3个3也是一样，第三个3只能插进包含2个3的子集。

vector<vector<int> > subsetsWithDup(vector<int> &S) {
    sort(S.begin(), S.end());
    // 包含空集的集合
    vector<vector<int>> ret = {{}};
    int size = 0, startIndex = 0;
    for (int i = 0; i < S.size(); i++) {
        // startIndex从上一次开始插入的地方开始，不是重复元素则从0开始
        startIndex = i >= 1 && S[i] == S[i - 1] ? size : 0;
        size = ret.size();
        for (int j = startIndex; j < size; j++) {
            vector<int> temp = ret[j];
            temp.push_back(S[i]);
            ret.push_back(temp);
        }
    }
    return ret;
}

Repeated DNA Sequences

2016-09-10

OJ

LeetCode, Sliding window

Problem

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

Solution

滑动窗口和哈希表的思想很明显，不过直接将子串作为key放入哈希表中会超出内存限制
所以，关键是要减少内存使用量，一个可行的办法如下：

由于A,C,G,T这四个字母的二进制值的最后三位是不一样的，所以他们&111的值也是不同的，也就是说三位二进制就能区分出这四个字母
因为window的长度是10，所以表示长度为10的子串需要30位的长度，int型长度是32位(前两位需要设置为0，可以&3fffffff达到效果)，满足条件
随着窗口移动，左边的退出，右边的加入，我们使用的int数可以每次向左移动三位

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> ret = new ArrayList<String>();
        int res=0;
        HashMap<Integer,Integer> map = new HashMap<Integer,Integer>();
        for (int i=0; i<s.length(); i++) {
            // 3 bits can indicates one letter. so we total need 30 bits to represent a 10-letter string
            // while int is 32 bits long, so, &0x3FFFFFFF helps set the first 2 bits to 0.
            // res<<3 : move the header.
            // s.charAt(i)&7 : add a new letter.
            res = res << 3 & 0x3FFFFFFF | (s.charAt(i) & 7);   //get rid of the header and add the tailer.
            if (map.containsKey(res)==true) {
                if (map.get(res) == 1)
                    ret.add(s.substring(i-9, i+1));
                map.put(res, map.get(res)+1);
            } else
                map.put(res, 1);
        }
        return ret;
    }
}

更加节省空间的方法也有，区分四个数其实只需要2bit。先看下ACGT的二进制后三位

A : 001
C : 011
G : 111
T : 100

所以，对于一个字母先&100，再&010即可区分这四个数。（区分较难也没关系，可以写个函数搞定）

Problem

Solution

堆

make_heap

is_heap

is_heap_until

pop_heap & push_heap

sort_heap

排序

sort

stable_sort

partial_sort

partial_sort_copy

is_sorted

nth_element

partition

stable_partition

sort_heap

lower_bound & upper_bound

集合set

链表list

forward_list

deque

priority_queue

Problem

Solution

Copy List with Random Pointer

Solution

算法介绍

相关问题

可否在一未知大小的集合中，随机取出一元素？

在一个长度很大但未知的链表中，如何最高效地取出k个元素？

Random Pick Index

匹配原理

next数组

实现

Word Search

Word Search

Word Break

Word Break II

Problem

Solution

数组允许重复的情况

Problem

Solution