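I have recently been digging into the LZMA SDK from 7-zip. I wrote a short introduction to the SDK on this site a long time ago, but back then I only skimmed it; this time a project forced me to study it properly. Either the 7-zip folks are too busy or they simply don't enjoy writing manuals: I searched the whole SDK and found no complete usage guide, only two meagre files, 7zC.txt and lzma.txt. Frustrating -_-
Preparation
First, download the LZMA SDK package from http://www.7-zip.org/sdk.html (I used the 9.11 beta). We need the following files from the SDK: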
- sdk\C\Alloc.c
- sdk\C\LzFind.c
- sdk\C\LzFindMt.c
- sdk\C\LzmaDec.c
- sdk\C\LzmaEnc.c
- sdk\C\Threads.c
- sdk\CPP\7zip\Common\CWrappers.cpp
- sdk\CPP\7zip\Common\FileStreams.cpp
- sdk\CPP\7zip\Common\StreamUtils.cpp
- sdk\CPP\7zip\Compress\LzmaDecoder.cpp
- sdk\CPP\7zip\Compress\LzmaEncoder.cpp
- sdk\CPP\Common\StringConvert.cpp
- sdk\CPP\Windows\FileIO.cpp
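And, of course, the corresponding header files.
I only used this on Windows and have not tried it on Unix. Judging from the header definitions, on Unix it should be enough to replace sdk\CPP\Windows\FileIO.cpp with sdk\CPP\Common\C_FileIO.cpp.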
Decompressing an LZMA file
```cpp
#include <iostream>
#include "sdk/CPP/Common/MyInitGuid.h"
#include "sdk/CPP/7zip/Compress/LzmaDecoder.h"
#include "sdk/CPP/7zip/Common/FileStreams.h"
#include "sdk/CPP/Windows/NtCheck.h"

int main(int argc, char* argv[])
{
    NCompress::NLzma::CDecoder dec;
    CInFileStream InStm;
    COutFileStream OutStm;

    if (argc <= 2) {
        std::cout << "Usage:" << std::endl;
        std::cout << "  unLzma <lzma-file> <output-file>" << std::endl;
        return 0;
    }

    if (!InStm.Open(argv[1]))
        return -1;
    if (!OutStm.Create(argv[2], true))  // true: overwrite an existing file
        return -1;

    // The first 5 bytes of the file hold the decoder properties.
    const UInt32 kPropertiesSize = 5;
    BYTE properties[kPropertiesSize];
    InStm.Read(properties, kPropertiesSize, NULL);
    if (!SUCCEEDED(dec.SetDecoderProperties2(properties, kPropertiesSize)))
        return -1;

    // The next 8 bytes hold the uncompressed size, little-endian.
    UInt64 size = 0;
    for (int i = 0; i < 8; i++) {
        BYTE b;
        InStm.Read(&b, sizeof(b), NULL);
        size |= ((UInt64)b) << (8 * i);
    }

    // Everything after the 13-byte header is compressed data.
    return SUCCEEDED(dec.Code(&InStm, &OutStm, 0, &size, NULL)) ? 0 : -1;
}
```
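To compile this program, all the files listed at the beginning of this article must be added to the project or makefile and built together. On Windows you also have to link against the uuid library, otherwise the linker complains that IID_IUnknown cannot be found.
IID_IUnknown? Looks familiar? Anyone who has done COM programming will feel right at home: the SDK's C++ wrappers borrow the ideas of COM, and IUnknown interfaces are everywhere. The SDK even ships its own set of COM headers for non-Windows systems, plus a COM smart pointer, a VARIANT wrapper and so on.
Finally, a handy source of test data: pick any small .lzma file from the MinGW project on SourceForge at http://sourceforge.net/projects/mingw/files/. Alternatively, skip ahead to the LZMA compression part of this article and generate an LZMA file with it.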
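The header handling in the file decompression program can also be looked at in isolation. The sketch below decodes the 13 leading bytes of a .lzma file without touching the SDK. `LzmaHeader` and `ParseLzmaHeader` are names I made up for illustration; the layout assumed here is one packed byte for lc/lp/pb, a 4-byte little-endian dictionary size and an 8-byte little-endian uncompressed size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: the fields carried by the 13-byte .lzma header.
struct LzmaHeader {
    unsigned lc;               // literal context bits
    unsigned lp;               // literal position bits
    unsigned pb;               // position bits
    std::uint32_t dictSize;    // dictionary size in bytes
    std::uint64_t unpackSize;  // uncompressed size; all bits set means "unknown"
};

// Decodes the header from the first 13 bytes of a .lzma file.
// Returns false if the buffer is too short or the packed byte is out of range.
inline bool ParseLzmaHeader(const unsigned char *buf, std::size_t len, LzmaHeader &out)
{
    assert(buf != nullptr);
    if (len < 13)
        return false;

    // Byte 0 packs the three bit counts as (pb * 5 + lp) * 9 + lc.
    unsigned d = buf[0];
    if (d >= 9 * 5 * 5)
        return false;
    out.lc = d % 9;
    d /= 9;
    out.lp = d % 5;
    out.pb = d / 5;

    // Bytes 1..4: dictionary size, little-endian.
    out.dictSize = 0;
    for (int i = 0; i < 4; i++)
        out.dictSize |= static_cast<std::uint32_t>(buf[1 + i]) << (8 * i);

    // Bytes 5..12: uncompressed size, little-endian.
    out.unpackSize = 0;
    for (int i = 0; i < 8; i++)
        out.unpackSize |= static_cast<std::uint64_t>(buf[5 + i]) << (8 * i);
    return true;
}
```

A helper like this is handy for sanity-checking a file before handing the first 5 bytes to SetDecoderProperties2.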
Decompressing LZMA in memory
Take a look at the Code method of the NCompress::NLzma::CDecoder class:
```cpp
STDMETHODIMP CDecoder::Code(ISequentialInStream *inStream,
    ISequentialOutStream *outStream,
    const UInt64 * /* unnamed in the SDK */,
    const UInt64 *outSize,
    ICompressProgressInfo *progress);
```
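To read from or write to a custom data source, we only need to implement ISequentialInStream and ISequentialOutStream.
Below is a memory stream I wrote. It is for demonstration only; a better option is to use the ready-made Windows API SHCreateMemStream: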
```cpp
#include <algorithm>  // std::copy
#include <vector>

class CMemStream :
    public IInStream,
    public IOutStream,
    public IStreamGetSize,
    public CMyUnknownImp
{
public:
    MY_UNKNOWN_IMP3(IInStream, IOutStream, IStreamGetSize);

    // Copies up to `size` bytes from the current position; returns fewer at end of data.
    STDMETHOD(Read)(void *data, UInt32 size, UInt32 *processedSize)
    {
        size_t readed = size;
        if (m_offset >= m_data.size())
            readed = 0;
        else
        {
            if (readed > m_data.size() - m_offset)
                readed = m_data.size() - m_offset;
            char *pdata = static_cast<char*>(data);
            std::copy(m_data.begin() + m_offset, m_data.begin() + m_offset + readed, pdata);
            m_offset += readed;
        }
        if (processedSize)
            *processedSize = static_cast<UInt32>(readed);
        return S_OK;
    }

    // The new position is origin + offset, so SEEK_END normally takes a zero or negative offset.
    STDMETHOD(Seek)(Int64 offset, UInt32 seekOrigin, UInt64 *newPosition)
    {
        Int64 base;
        switch (seekOrigin)
        {
        case SEEK_SET: base = 0; break;
        case SEEK_CUR: base = static_cast<Int64>(m_offset); break;
        case SEEK_END: base = static_cast<Int64>(m_data.size()); break;
        default: return E_INVALIDARG;
        }
        if (base + offset < 0)
            return E_INVALIDARG;
        m_offset = static_cast<size_t>(base + offset);
        if (newPosition)
            *newPosition = m_offset;
        return S_OK;
    }

    STDMETHOD(GetSize)(UInt64 *size)
    {
        if (size)
            *size = m_data.size();
        return S_OK;
    }

    // Grows the buffer as needed, then copies `size` bytes to the current position.
    STDMETHOD(Write)(const void *data, UInt32 size, UInt32 *processedSize)
    {
        if (m_offset + size > m_data.size())
            m_data.resize(m_offset + size);
        const char *pdata = static_cast<const char*>(data);
        std::copy(pdata, pdata + size, m_data.begin() + m_offset);
        m_offset += size;
        if (processedSize)
            *processedSize = size;
        return S_OK;
    }

    STDMETHOD(SetSize)(Int64 newSize)
    {
        m_data.resize(static_cast<size_t>(newSize));
        return S_OK;
    }

    CMemStream() : m_offset(0) {}

    template <class InputIterator>
    CMemStream(InputIterator begin, InputIterator end)
        : m_data(begin, end), m_offset(0) {}

    virtual ~CMemStream() {}

    std::vector<char>::iterator begin() { return m_data.begin(); }
    std::vector<char>::iterator end() { return m_data.end(); }
    std::vector<char>::const_iterator begin() const { return m_data.begin(); }
    std::vector<char>::const_iterator end() const { return m_data.end(); }

protected:
    std::vector<char> m_data;
    size_t m_offset;
};
```
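CMyUnknownImp and MY_UNKNOWN_IMP3 are helpers that the LZMA SDK kindly provides to simplify COM-style programming. CMyUnknownImp holds an integer used as the reference count. The MY_UNKNOWN_IMP3 macro implements the three IUnknown methods AddRef, Release and QueryInterface. The trailing 3 is the number of arguments the macro takes, i.e. how many interfaces the class implements; MY_UNKNOWN_IMP, MY_UNKNOWN_IMP1, MY_UNKNOWN_IMP2 … MY_UNKNOWN_IMP5 are also available.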
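For readers who have never touched COM, the reference counting that these helpers take care of boils down to something like the sketch below. This is not the SDK's code: `RefCounted` and `Tracked` are hypothetical stand-ins, and QueryInterface is left out entirely.

```cpp
#include <cassert>

// Hypothetical stand-in for the counter that a COM-style base class maintains.
class RefCounted {
public:
    RefCounted() : m_refCount(0) {}
    virtual ~RefCounted() {}

    // Registers one more owner and returns the new count.
    unsigned long AddRef() { return ++m_refCount; }

    // Drops one owner; destroys the object when the last owner is gone.
    unsigned long Release()
    {
        assert(m_refCount > 0);
        if (--m_refCount != 0)
            return m_refCount;
        delete this;
        return 0;
    }

private:
    unsigned long m_refCount;
};

// Example object that records its own destruction in a caller-supplied flag.
class Tracked : public RefCounted {
public:
    explicit Tracked(bool *destroyed) : m_destroyed(destroyed) {}
    ~Tracked() { *m_destroyed = true; }

private:
    bool *m_destroyed;
};
```

Because the last Release deletes the object, classes like this are normally allocated with new and held through a smart pointer. Stack instances, as used in this article's examples, are only safe as long as nothing ever releases a final reference to them.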
The following program decompresses an LZMA file into our memory stream:
```cpp
#include <algorithm>
#include <iostream>
#include <vector>
#include <iterator>
#include "sdk/CPP/Common/MyInitGuid.h"
#include "sdk/CPP/7zip/Compress/LzmaDecoder.h"
#include "sdk/CPP/7zip/Common/FileStreams.h"
#include "sdk/CPP/Windows/NtCheck.h"

class CMemStream :
    public IInStream,
    public IOutStream,
    public IStreamGetSize,
    public CMyUnknownImp
{
public:
    ...
};

int main(int argc, char* argv[])
{
    NCompress::NLzma::CDecoder dec;
    CInFileStream InStm;
    CMemStream OutStm;

    if (argc <= 1) {
        std::cout << "Usage:" << std::endl;
        std::cout << "  unLzma <lzma-file>" << std::endl;
        return 0;
    }

    if (!InStm.Open(argv[1]))
        return -1;

    // 5 bytes of decoder properties.
    const UInt32 kPropertiesSize = 5;
    BYTE properties[kPropertiesSize];
    InStm.Read(properties, kPropertiesSize, NULL);
    if (!SUCCEEDED(dec.SetDecoderProperties2(properties, kPropertiesSize)))
        return -1;

    // 8 bytes of uncompressed size, little-endian.
    UInt64 size = 0;
    for (int i = 0; i < 8; i++) {
        BYTE b;
        InStm.Read(&b, sizeof(b), NULL);
        size |= ((UInt64)b) << (8 * i);
    }

    if (!SUCCEEDED(dec.Code(&InStm, &OutStm, 0, &size, NULL)))
        return -1;

    // Dump the decompressed bytes held by the memory stream to stdout.
    std::copy(OutStm.begin(), OutStm.end(), std::ostream_iterator<char>(std::cout));
    return 0;
}
```
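The position bookkeeping of a memory stream is the part that is easiest to get wrong, so it can be worth testing on its own. Here is a minimal SDK-free model, assuming the usual convention that a seek lands at origin + offset and that reading past the end returns a short count. `MemCursor` is a hypothetical name and not part of the SDK.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>   // SEEK_SET, SEEK_CUR, SEEK_END
#include <vector>

// Hypothetical SDK-free model of a read cursor over an in-memory buffer.
class MemCursor {
public:
    explicit MemCursor(const std::vector<char> &data) : m_data(data), m_offset(0) {}

    // Copies up to `size` bytes to `dst`; returns how many were actually copied.
    std::size_t Read(char *dst, std::size_t size)
    {
        assert(dst != nullptr || size == 0);
        if (m_offset >= m_data.size())
            return 0;
        std::size_t n = std::min(size, m_data.size() - m_offset);
        std::copy(m_data.begin() + m_offset, m_data.begin() + m_offset + n, dst);
        m_offset += n;
        return n;
    }

    // Moves the cursor to origin + offset.
    // Rejects unknown origins and negative positions, leaving the cursor unchanged.
    bool Seek(std::int64_t offset, int origin)
    {
        std::int64_t base;
        switch (origin) {
        case SEEK_SET: base = 0; break;
        case SEEK_CUR: base = static_cast<std::int64_t>(m_offset); break;
        case SEEK_END: base = static_cast<std::int64_t>(m_data.size()); break;
        default: return false;
        }
        if (base + offset < 0)
            return false;
        m_offset = static_cast<std::size_t>(base + offset);
        return true;
    }

    std::size_t Tell() const { return m_offset; }

private:
    std::vector<char> m_data;
    std::size_t m_offset;
};
```

With five bytes in the buffer, Seek(-1, SEEK_END) puts the cursor on the last byte, and a following Read of any size returns exactly one byte.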
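The ICompressProgressInfo interface
The ICompressProgressInfo interface reports how far decompression or compression has progressed, which makes it a good fit for driving a progress bar.
Below is my implementation of ICompressProgressInfo; it simply prints a percentage to the console:
```cpp
class CProgressInfoShow : public ICompressProgressInfo, public CMyUnknownImp
{
public:
    MY_UNKNOWN_IMP1(ICompressProgressInfo);

    // Receives the number of bytes consumed (inSize) and produced (outSize) so far.
    STDMETHOD(SetRatioInfo)(const UInt64 *inSize, const UInt64 *outSize)
    {
        // Skip the report if the input count is missing or the total is zero.
        if (inSize && m_TotalSize != 0)
            std::cout << *inSize * 100.0 / m_TotalSize << std::endl;
        return S_OK;
    }

    CProgressInfoShow(UInt64 inTotalSize) : m_TotalSize(inTotalSize) {}

private:
    UInt64 m_TotalSize;
};
```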
Now modify the earlier code slightly and create a CProgressInfoShow instance before decompressing:
```cpp
UInt64 totalsize;
InStm.GetSize(&totalsize);        // total size of the compressed input file
CProgressInfoShow pi(totalsize);
dec.Code(&InStm, &OutStm, 0, &size, &pi);
```
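If the callback fires often, printing on every call floods the console. One possible refinement is to report only when the whole-number percentage changes. The sketch below isolates that logic from the SDK; `PercentTracker` is a hypothetical helper, not something the SDK provides.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reports progress only when the whole percentage changes.
class PercentTracker {
public:
    explicit PercentTracker(std::uint64_t total) : m_total(total), m_last(-1) {}

    // Returns the new whole percentage (0..100) if it differs from the
    // previously reported one, or -1 if there is nothing new to show.
    int Update(std::uint64_t processed)
    {
        int pct = 100;  // an empty input counts as finished
        if (m_total != 0) {
            if (processed > m_total)
                processed = m_total;  // clamp overshoot
            pct = static_cast<int>(processed * 100.0 / m_total);
        }
        assert(pct >= 0 && pct <= 100);
        if (pct == m_last)
            return -1;
        m_last = pct;
        return pct;
    }

private:
    std::uint64_t m_total;
    int m_last;
};
```

Inside SetRatioInfo one would call Update(*inSize) and print only when the result is not -1.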
Compressing data with LZMA
Compression is much like decompression, except that this time we use NCompress::NLzma::CEncoder.
```cpp
#include <iostream>
#include "sdk/CPP/Common/MyInitGuid.h"
#include "sdk/CPP/7zip/Compress/LzmaEncoder.h"
#include "sdk/CPP/7zip/Common/FileStreams.h"
#include "sdk/CPP/Windows/NtCheck.h"

int main(int argc, char* argv[])
{
    NCompress::NLzma::CEncoder enc;
    CInFileStream InStm;
    COutFileStream OutStm;

    if (argc <= 2) {
        std::cout << "Usage:" << std::endl;
        std::cout << "  Lzma <source-file> <output-file>" << std::endl;
        return 0;
    }

    if (!InStm.Open(argv[1]))
        return -1;
    if (!OutStm.Create(argv[2], true))  // true: overwrite an existing file
        return -1;

    // No properties given: the encoder keeps its defaults.
    enc.SetCoderProperties(NULL, NULL, 0);
    // Write the 5-byte properties block.
    enc.WriteCoderProperties(&OutStm);

    // Write the uncompressed size as 8 little-endian bytes.
    UInt64 size;
    InStm.GetSize(&size);
    for (int i = 0; i < 8; i++) {
        BYTE b = BYTE(size >> (8 * i));
        OutStm.Write(&b, sizeof(b), NULL);
    }

    return SUCCEEDED(enc.Code(&InStm, &OutStm, NULL, NULL, NULL)) ? 0 : -1;
}
```
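If you have 7-zip installed, you can use it to open the .lzma file we just produced.
This code also gives a rough picture of the LZMA file format: properties (5 bytes) + original size (8 bytes) + compressed data. The five leading bytes store what the decoder needs to know: one byte packing the lc/lp/pb bit counts, followed by the 4-byte dictionary size. Other encoder settings, such as the compression mode or the number of threads, only influence how the data is encoded and are not stored in the header. In this example, enc.SetCoderProperties(NULL,NULL,0); is the call that sets the encoder parameters (here it leaves everything at the defaults). Its prototype is:
```cpp
STDMETHODIMP CEncoder::SetCoderProperties(const PROPID *propIDs,
    const PROPVARIANT *coderProps, UInt32 numProps)
```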
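- propIDs is an array of PROPIDs; each PROPID selects one LZMA compression parameter.
- coderProps is an array of PROPVARIANTs holding the value for the corresponding PROPID.
- numProps is the number of elements in the arrays.
For the LZMA algorithm, the accepted PROPIDs (in the NCoderPropID namespace) are listed in the table below: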
| PROPID | Description | Type | Range | Default |
|---|---|---|---|---|
| kDictionarySize | Dictionary size | VT_UI4 | 2^12 ~ 2^30 | 2^24 (16M) |
| kAlgorithm | Compression mode | VT_UI4 | 0 or 1 | 1 (maximum compression) |
| kEndMarker | Whether to write an end-of-stream marker | VT_BOOL | VARIANT_TRUE, VARIANT_FALSE | VARIANT_FALSE |
| kNumThreads | Number of worker threads | VT_UI4 | system dependent | system dependent |
| kPosStateBits | set number of pos bits | VT_UI4 | 0~4 | 2 |
| kLitContextBits | set number of literal context bits | VT_UI4 | 0~8 | 3 |
| kLitPosBits | set number of literal pos bits | VT_UI4 | 0~4 | 0 |
| kNumFastBytes | set number of fast bytes | VT_UI4 | 5~273 | 128 |
| kMatchFinder | set Match Finder | VT_BSTR | bt2, bt3, bt4, hc4 | bt4 |
| kMatchFinderCycles | set number of cycles for match finder | VT_UI4 | | |
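Note: the thread count can be chosen from the number of CPU cores returned by NWindows::NSystem::GetNumberOfProcessors(). The last six entries keep their English descriptions because, I'm embarrassed to admit, I don't really know what they mean.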
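To make the connection between the table and the file header more concrete, here is a sketch of how four of these values end up in the 5-byte properties block. It does not call the SDK. `BuildLzmaProps` is a hypothetical helper, and the real encoder may additionally normalise the dictionary size, which this sketch does not attempt. In the packing below, lc corresponds to kLitContextBits, lp to kLitPosBits and pb to kPosStateBits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: packs lc/lp/pb and the dictionary size into the
// 5-byte properties block found at the start of a .lzma file.
// Returns false when a value falls outside the ranges listed in the table.
inline bool BuildLzmaProps(unsigned lc, unsigned lp, unsigned pb,
                           std::uint32_t dictSize, unsigned char out[5])
{
    assert(out != nullptr);
    if (lc > 8 || lp > 4 || pb > 4)
        return false;
    if (dictSize < (1u << 12) || dictSize > (1u << 30))
        return false;

    // Byte 0: the three bit counts packed as (pb * 5 + lp) * 9 + lc.
    out[0] = static_cast<unsigned char>((pb * 5 + lp) * 9 + lc);

    // Bytes 1..4: dictionary size, little-endian.
    for (int i = 0; i < 4; i++)
        out[1 + i] = static_cast<unsigned char>(dictSize >> (8 * i));
    return true;
}
```

With the defaults from the table (lc=3, lp=0, pb=2, 16M dictionary) this yields the bytes 5D 00 00 00 01, which is what many .lzma files start with.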
The following example compresses with two threads in parallel, writes an end-of-stream marker and uses a 1 MB dictionary:
```cpp
#include <iostream>
#include "sdk/CPP/Common/MyInitGuid.h"
#include "sdk/CPP/7zip/Compress/LzmaEncoder.h"
#include "sdk/CPP/7zip/Common/FileStreams.h"
#include "sdk/CPP/Windows/NtCheck.h"

int main(int argc, char* argv[])
{
    NCompress::NLzma::CEncoder enc;
    CInFileStream InStm;
    COutFileStream OutStm;

    if (argc <= 2) {
        std::cout << "Usage:" << std::endl;
        std::cout << "  Lzma <source-file> <output-file>" << std::endl;
        return 0;
    }

    if (!InStm.Open(argv[1]))
        return -1;
    if (!OutStm.Create(argv[2], true))
        return -1;

    // propIDs[i] names the parameter, props[i] carries its value.
    PROPID propIDs[] = {
        NCoderPropID::kDictionarySize,
        NCoderPropID::kEndMarker,
        NCoderPropID::kNumThreads
    };
    PROPVARIANT props[3];
    props[0].vt = VT_UI4;  props[0].ulVal = 1 << 20;         // 1 MB dictionary
    props[1].vt = VT_BOOL; props[1].boolVal = VARIANT_TRUE;  // write an end marker
    props[2].vt = VT_UI4;  props[2].ulVal = 2;               // two threads

    enc.SetCoderProperties(propIDs, props, 3);
    enc.WriteCoderProperties(&OutStm);

    // Size field of all 0xFF bytes: size unknown, the end marker terminates the stream.
    UInt64 size = UInt64(-1);
    OutStm.Write(&size, 8, NULL);

    return SUCCEEDED(enc.Code(&InStm, &OutStm, NULL, NULL, NULL)) ? 0 : -1;
}
```
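The two compression examples in this section fill the 8-byte size field in different ways: the first stores the real file size, the second stores UInt64(-1) and relies on the end marker. A small SDK-free sketch of that convention follows; `PutSizeLE`, `GetSizeLE`, `IsUnknownSize` and `kUnknownSize` are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// All bits set in the size field means "size not recorded".
const std::uint64_t kUnknownSize = ~static_cast<std::uint64_t>(0);

// Stores `size` as 8 little-endian bytes.
inline void PutSizeLE(std::uint64_t size, unsigned char out[8])
{
    assert(out != nullptr);
    for (int i = 0; i < 8; i++)
        out[i] = static_cast<unsigned char>(size >> (8 * i));
}

// Reads 8 little-endian bytes back into a 64-bit value.
inline std::uint64_t GetSizeLE(const unsigned char in[8])
{
    assert(in != nullptr);
    std::uint64_t size = 0;
    for (int i = 0; i < 8; i++)
        size |= static_cast<std::uint64_t>(in[i]) << (8 * i);
    return size;
}

// True when the decoder has to rely on an end-of-stream marker instead.
inline bool IsUnknownSize(std::uint64_t size) { return size == kUnknownSize; }
```

Writing the size byte by byte, as the first example does, works regardless of the machine's endianness. Writing the raw UInt64 directly is only safe for the all-ones value, where every byte is 0xFF anyway.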
Other compression algorithms
Open the CPP\7zip\Compress directory of the SDK and you will find, besides LZMA, the Lzma2 and Ppmd algorithms. Want to give them a try? Their C++ interfaces are nearly the same, so the examples above need only small changes.
The following example compresses a file with Lzma2; only one header and one namespace change:
```cpp
#include <iostream>
#include "sdk/CPP/Common/MyInitGuid.h"
#include "sdk/CPP/7zip/Compress/Lzma2Encoder.h"
#include "sdk/CPP/7zip/Common/FileStreams.h"
#include "sdk/CPP/Windows/NtCheck.h"

int main(int argc, char* argv[])
{
    NCompress::NLzma2::CEncoder enc;
    CInFileStream InStm;
    COutFileStream OutStm;

    if (argc <= 2) {
        std::cout << "Usage:" << std::endl;
        std::cout << "  Lzma2 <source-file> <output-file>" << std::endl;
        return 0;
    }

    if (!InStm.Open(argv[1]))
        return -1;
    if (!OutStm.Create(argv[2], true))
        return -1;

    // Default encoder settings, then the encoder's property data.
    enc.SetCoderProperties(NULL, NULL, 0);
    enc.WriteCoderProperties(&OutStm);

    // Uncompressed size as 8 little-endian bytes.
    UInt64 size;
    InStm.GetSize(&size);
    for (int i = 0; i < 8; i++) {
        BYTE b = BYTE(size >> (8 * i));
        OutStm.Write(&b, sizeof(b), NULL);
    }

    return SUCCEEDED(enc.Code(&InStm, &OutStm, NULL, NULL, NULL)) ? 0 : -1;
}
```
To get it to compile, in addition to the files listed at the beginning of this article, add CPP\7zip\Compress\Lzma2Encoder.cpp and C\MtCoder.c to the project.
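One thing to watch out for if you want to read such a file back: as far as I can tell from the SDK sources, the Lzma2 encoder's WriteCoderProperties emits a single property byte that encodes the dictionary size, not the 5-byte block LZMA uses, so the header written by the program above is 1 + 8 bytes long. The sketch below decodes that byte under this assumption; `Lzma2DictSizeFromProp` is a hypothetical helper, not an SDK function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decodes the one-byte LZMA2 dictionary-size property.
// Values 0..39 alternate between 2^n and 3 * 2^(n-1), starting at 4 KiB;
// 40 stands for the maximum (0xFFFFFFFF). Anything larger is invalid.
inline bool Lzma2DictSizeFromProp(unsigned char prop, std::uint32_t &dictSize)
{
    if (prop > 40)
        return false;
    if (prop == 40) {
        dictSize = 0xFFFFFFFFu;
        return true;
    }
    const unsigned shift = prop / 2u + 11u;
    assert(shift <= 30);  // keeps the shifted value within 32 bits
    dictSize = (static_cast<std::uint32_t>(2) | (prop & 1u)) << shift;
    return true;
}
```

For example, a property byte of 24 corresponds to a 16 MiB dictionary.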