The Principle and Application of Protocol Buffer
LU Zhong-yang, GUO Zhen-bo
(College of Information Engineering, Qingdao University, Qingdao, Shandong 266071, China)
【Abstract】This paper introduces the principle of protobuf and evaluates its compression efficiency and data-parsing speed. The simulation results show that, compared with XML and JSON, protobuf produces the smallest serialized files and parses data the fastest. We therefore conclude that protobuf is better suited to applications that require high performance.
【Key words】Protobuf; Serialization; Parsing speed
1 Introduction to protobuf
Protocol Buffers (protobuf) is an efficient, fast serialization framework provided by Google[1]. Users compile a .proto file with the proto compiler to generate the corresponding classes, and then call methods of those classes to serialize and deserialize messages. Google provides the compiler and runtime libraries for three languages: Java, C++ and Python. Compared with XML and JSON, protobuf exchanges data more efficiently[2], and its serialized data is transferred in a binary format. Protobuf can be used for data exchange between heterogeneous environments, data communication in distributed applications, data storage, configuration files and other fields.
Protobuf is mainly used for object serialization and binary encoding. Because its serialized form is a compact binary format, protobuf data occupies little space on the network and can be parsed quickly. In socket communication, protobuf works as follows: before sending, the sender fills in a message according to its defined format and serializes it into bytes; the receiving side then deserializes the bytes back into a message.
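Protobuf's binary encoding does not record where a message ends, so on a stream socket the application has to delimit messages itself. A minimal sketch of one common convention follows, assuming a 4-byte big-endian length prefix; the helpers frame_message and unframe_message are hypothetical and not part of the protobuf library. (The Java runtime's writeDelimitedTo and parseDelimitedFrom address the same problem with a varint length prefix.)

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Sender side (hypothetical helper): prefix an already-serialized message
// with its length as 4 big-endian bytes, so the receiver can find its end.
std::string frame_message(const std::string& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);  // length must fit in 32 bits
    const auto n = static_cast<std::uint32_t>(payload.size());
    std::string out;
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<char>((n >> shift) & 0xFF));
    }
    out += payload;
    return out;
}

// Receiver side (hypothetical helper): try to take one framed message from
// `buffer` starting at `pos`. Returns false and leaves `pos` unchanged while
// the message is incomplete; otherwise stores the payload and advances `pos`.
bool unframe_message(const std::string& buffer, std::size_t& pos,
                     std::string& payload) {
    if (pos > buffer.size() || buffer.size() - pos < 4) return false;
    std::uint32_t n = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        n = (n << 8) | static_cast<unsigned char>(buffer[pos + i]);
    }
    if (buffer.size() - pos - 4 < n) return false;
    payload = buffer.substr(pos + 4, n);
    pos += 4 + n;
    return true;
}
```

On the sending side, payload would be the output of the generated class's serializer (for example SerializeToString in C++). On the receiving side, unframe_message can be called repeatedly as bytes arrive, and each complete payload is handed to the generated parser.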
2 Designing a protobuf file
Generally, writing a protobuf application requires the following steps:
(1) Define the message format in a file with the .proto suffix
Defining a message format is simple: each field has a tag, a modifier (required/optional/repeated) and a field type (for example bool, bytes, string, int32 or int64)[3]. A required field must be set; otherwise serialization fails an assertion when the program is built against the Debug-mode library, and parsing a message that lacks the field fails. An optional field that is not set takes a default value. A repeated field can hold any number of values; for example, a repeated field can record that a person has several mobile phones. The tag is a number that identifies each field in the binary stream. A .proto file looks like this:
syntax = "proto2";
message user {
  required string name = 1;
  required int32 age = 2;
}
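To make the role of the tag concrete, the sketch below hand-encodes the user message defined above following protobuf's wire-format rules: every field starts with a key equal to (tag << 3) | wire_type, integers are written as base-128 varints, and a string is written as its length followed by its bytes. It is an illustration, not the code protoc generates; the helper names append_varint, make_key and encode_user are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Wire types needed for the two fields of `user` (protobuf encoding rules).
enum WireType : std::uint32_t {
    kVarint = 0,           // int32, int64, bool, enum
    kLengthDelimited = 2,  // string, bytes, embedded messages
};

// Append `value` as a base-128 varint: 7 bits per byte, least significant
// group first, high bit set on every byte except the last.
void append_varint(std::vector<std::uint8_t>& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// The key written before each field: tag (field number) shifted left by 3,
// OR-ed with the wire type. This is how the tag identifies a field.
std::uint32_t make_key(std::uint32_t field_number, WireType type) {
    assert(field_number >= 1);  // tags start at 1
    return (field_number << 3) | type;
}

// Hand-written encoder for `message user { name = 1; age = 2; }` with both
// fields set, fields written in tag order.
std::vector<std::uint8_t> encode_user(const std::string& name, std::int32_t age) {
    std::vector<std::uint8_t> out;
    append_varint(out, make_key(1, kLengthDelimited));  // name: key
    append_varint(out, name.size());                    //       length
    out.insert(out.end(), name.begin(), name.end());    //       raw bytes
    append_varint(out, make_key(2, kVarint));           // age:  key
    // A negative int32 is sign-extended to 64 bits, so it takes 10 bytes.
    append_varint(out, static_cast<std::uint64_t>(static_cast<std::int64_t>(age)));
    return out;
}
```

For name "Jack" and age 18 this yields the 8 bytes 0A 04 4A 61 63 6B 10 12 (hex). Since the key of tags 1 to 15 fits in a single byte, frequently used fields are usually given small tags.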
(2) Generate code files with the protocol buffer compiler
Code files are generated automatically by the proto compiler[4]; they describe the message format in the idiom of a specific language. The .proto file is the input to the compiler, and the output is code in the target language, which gives protobuf good cross-language support. In this paper, a C++ program serializes a User message to a file, and a Java program deserializes the file back into a User object. Figure 1 shows the relationship between the two programs and protobuf.
Fig. 1 Protobuf cross-language communication
The proto file of the User object is shown below; from it, the protocol buffer compiler generates both the C++ and the Java files.
syntax = "proto2";
package msg;
option java_package = "com.lzy.msg.proto";
option java_outer_classname = "Userproto";
message User {
  required string name = 1;
  required int32 age = 2;
}
The compilation command is:
protoc -I=./ --cpp_out=./c++ --java_out=./java ./Userproto.proto
In this command, -I sets the directory searched for .proto files, --cpp_out and --java_out give the output directories for the C++ and Java files respectively, and ./Userproto.proto is the path of the input proto file. The C++ output consists of Userproto.pb.h and Userproto.pb.cc. For Java, the compiler creates a class named Userproto in the package com.lzy.msg.proto, and Userproto contains a public static nested class named User.
The C++ program includes the generated header Userproto.pb.h, declares an object user of class msg::User, and sets its fields with user.set_name("Jack") and user.set_age(18). The serialized data is stored in a file named a. The Java program then reads the file whose path is given as its command-line argument and deserializes the data into a User object:
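import com.lzy.msg.proto.Userproto.User;
import java.io.FileInputStream;
// Inside main(String[] args) throws IOException; args[0] is the file written by the C++ program.
User user = User.parseFrom(new FileInputStream(args[0]));
System.out.println(user.getName() + ", " + user.getAge());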
The deserialized results are Jack and 18. This example shows that a message serialized by a C++ program can be deserialized correctly by a Java program, i.e., protobuf supports cross-language communication well.
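The Java call User.parseFrom hides how the bytes in file a are turned back into fields. Assuming the file holds exactly one serialized User with name "Jack" and age 18, its contents are the 8 bytes 0A 04 4A 61 63 6B 10 12 (hex). The sketch below decodes them by hand: it reads each key, uses the tag to decide which field the value belongs to and the wire type to decide how many bytes to consume. It is not the generated parser, and the names DecodedUser, read_varint, skip_bytes and decode_user are hypothetical. Unlike the generated parser, it does not reject a message whose required fields are missing; absent fields keep their default values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Result of decoding a User message; absent fields keep their defaults.
struct DecodedUser {
    std::string name;      // tag 1
    std::int32_t age = 0;  // tag 2
};

// Read one base-128 varint at `pos` and advance `pos` past it.
std::uint64_t read_varint(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint64_t result = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= in.size()) throw std::runtime_error("truncated varint");
        const std::uint8_t byte = in[pos++];
        result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return result;
    }
    throw std::runtime_error("varint longer than 10 bytes");
}

// Skip `n` bytes, checking that they are present.
void skip_bytes(const std::vector<std::uint8_t>& in, std::size_t& pos, std::uint64_t n) {
    if (n > in.size() - pos) throw std::runtime_error("truncated field");
    pos += static_cast<std::size_t>(n);
}

// Walk the stream key by key. Unknown tags are skipped, not rejected.
DecodedUser decode_user(const std::vector<std::uint8_t>& in) {
    DecodedUser user;
    std::size_t pos = 0;
    while (pos < in.size()) {
        const std::uint64_t key = read_varint(in, pos);
        const std::uint64_t tag = key >> 3;
        switch (key & 0x7) {  // wire type
            case 0: {  // varint
                const std::uint64_t v = read_varint(in, pos);
                if (tag == 2) user.age = static_cast<std::int32_t>(v);
                break;
            }
            case 1: skip_bytes(in, pos, 8); break;  // fixed 64-bit
            case 2: {                               // length-delimited
                const std::uint64_t len = read_varint(in, pos);
                const std::size_t start = pos;
                skip_bytes(in, pos, len);
                if (tag == 1) user.name.assign(in.begin() + start, in.begin() + pos);
                break;
            }
            case 5: skip_bytes(in, pos, 4); break;  // fixed 32-bit
            default: throw std::runtime_error("unsupported wire type");
        }
    }
    assert(pos == in.size());  // every branch checks bounds before advancing
    return user;
}
```

For the 8 bytes above, decode_user returns name "Jack" and age 18. Because unknown tags are skipped, a reader built from an older .proto file can still read messages that carry newly added fields.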
3 Performance comparison of protobuf, XML and JSON
Protobuf, XML and JSON can all be used as serialization formats and for network transmission. JSON and XML are text-based data description languages[5]. Although they are more human-readable than protobuf, protobuf has clear advantages in compression efficiency (smaller serialized output) and data-parsing speed. This paper compares protobuf, XML and JSON in terms of serialized file size and data-parsing speed.
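To show where a size difference like the one in Fig. 2 comes from, the sketch below computes the size of a single record, name "Jack" and age 18, in each format. This is an illustration, not the paper's experiment: the JSON and XML layouts (including the element names) are assumed, no text escaping is performed, and the protobuf size is derived from the wire-format rules rather than produced by the library. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Number of bytes a base-128 varint needs for `v`.
std::size_t varint_size(std::uint64_t v) {
    std::size_t n = 1;
    while (v >= 0x80) {
        v >>= 7;
        ++n;
    }
    return n;
}

// Protobuf size of User{name, age}: each field costs a 1-byte key (tags 1 and
// 2 are below 16), plus a varint length and the raw bytes for the string, or
// a varint for the integer.
std::size_t protobuf_size(const std::string& name, std::int32_t age) {
    assert(age >= 0);  // negative int32 values take 10 bytes; not modeled here
    return 1 + varint_size(name.size()) + name.size()
         + 1 + varint_size(static_cast<std::uint64_t>(age));
}

// The same record as compact JSON (assumed layout, no escaping).
std::string as_json(const std::string& name, std::int32_t age) {
    return "{\"name\":\"" + name + "\",\"age\":" + std::to_string(age) + "}";
}

// The same record as XML (assumed element names, no escaping).
std::string as_xml(const std::string& name, std::int32_t age) {
    return "<user><name>" + name + "</name><age>" + std::to_string(age) +
           "</age></user>";
}
```

For this record the functions give 8 bytes for protobuf, 24 for JSON and 43 for XML: protobuf replaces each field name with a one-byte key and needs no quotes, braces or closing tags. Parsing speed cannot be read off this calculation; it has to be measured, as in the paper's experiment.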
In the experiment, protobuf, XML and JSON each serialize the same fields. The experimental environment is a dual-core P8800 CPU, 4 GB of memory and a 500 GB, 7200 rpm hard disk. Figures 2 and 3 show the results.
Fig. 2 Comparison of file size after serialization
Fig. 3 Comparison of parsing time
Figures 2 and 3 show that the file serialized by protobuf is smaller than those produced by JSON and XML, and that protobuf also parses faster than JSON and XML. Table 1 compares the three serialization tools side by side.
Tab. 1 Comparison of XML, JSON and protobuf
4 Conclusion
This paper briefly introduces the principle of protocol buffers, which work well across languages and platforms. A simulation experiment further shows that protobuf reduces data-parsing time and produces the smallest serialized file compared with XML and JSON.
【References】
[1]Kalman M. ProtoML: A rule-based validation language for Google protocol buffers[A]. 2013 8th International Conference for Internet Technology and Secured Transactions (ICITST)[C]. London: IEEE, 2013. 188-193.
[2]Kaur G, Fuad M M. An evaluation of protocol buffer[A]. Proceedings of the IEEE SoutheastCon 2010 (SoutheastCon)[C]. Concord, NC: IEEE, 2010. 459-462.
[3]Müller J, Lorenz M, Geller F, et al. Assessment of communication protocols in the EPC network - replacing textual SOAP and XML with binary Google protocol buffers encoding[A]. 2010 IEEE 17th International Conference on Industrial Engineering and Engineering Management (IE&EM)[C]. Xiamen: IEEE, 2010. 404-409.
[4]Feng J H, Li J H. Google protocol buffers research and application in online game[A]. Proceedings of 2011 13th IEEE Joint International Computer Science and Information Technology Conference[C]. Chongqing, China: IEEE, 2011. 1-4.
[5]Cui C, Ni H. Optimization simulation of XML performance based on JSON [J]. Communications Technology, 2009, 42(8):108-114.
[Editor in charge: LIU Zhan]