With the rapid development of information technology, providing high-quality network services places increasing demands and challenges on network analysis. Since all data on the Internet are encapsulated in and transferred by network packets, packets are widely used for a variety of network traffic analysis tasks, from application identification to intrusion detection. Because the choice of features and how they are represented can greatly affect the performance of downstream tasks, learning high-quality packet representations is critical. However, existing packet-level works pay little attention to packet representations and instead focus on achieving good performance by analyzing each classification task independently. In practice, although a packet may carry different class labels for different tasks, the representation learned from one task can also help capture its complex patterns in other tasks, which existing works fail to leverage. Motivated by this observation, we propose a novel framework for packet representation learning across various traffic classification tasks. We learn packet representations that preserve both the semantic and byte patterns of each packet, and use a contrastive loss with a sample selector to optimize the representations so that similar packets lie closer in the latent semantic space. In addition, the representations are jointly optimized with the class labels of multiple tasks through a reconstruction loss and a classification loss on the predicted class probabilities. Evaluations demonstrate that the packet representations learned by our framework outperform state-of-the-art baselines on a wide range of popular downstream classification tasks by a large margin in both closed-world and open-world scenarios.
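
To make the training objective concrete, the following is a minimal sketch, not the authors' implementation, of how a contrastive term, a reconstruction term, and per-task classification terms could be combined. The encoder, decoder, task heads, the `positive_mask` produced by a sample selector, the temperature, and the weights `alpha` and `beta` are all hypothetical placeholders used only for illustration.

```python
# Minimal sketch (assumed components, not the paper's code) of a joint objective:
# contrastive loss + reconstruction loss + multi-task classification losses.
import torch
import torch.nn.functional as F


def contrastive_loss(z, positive_mask, temperature=0.1):
    """InfoNCE-style loss: pairs marked as positives by a sample selector are
    pulled together in the latent space; all other pairs act as negatives."""
    positive_mask = positive_mask.float()
    z = F.normalize(z, dim=1)                      # unit-norm embeddings
    sim = z @ z.t() / temperature                  # pairwise similarities
    sim.fill_diagonal_(float('-inf'))              # exclude self-pairs
    log_prob = F.log_softmax(sim, dim=1)
    pos_count = positive_mask.sum(dim=1).clamp(min=1)
    return -(log_prob * positive_mask).sum(dim=1).div(pos_count).mean()


def joint_loss(encoder, decoder, task_heads, x, task_labels,
               positive_mask, alpha=1.0, beta=1.0):
    """Contrastive + reconstruction + multi-task classification objective."""
    z = encoder(x)                                 # packet representations
    recon = decoder(z)                             # reconstructed input/representation
    loss = contrastive_loss(z, positive_mask)
    loss = loss + alpha * F.mse_loss(recon, x)     # reconstruction term
    for head, y in zip(task_heads, task_labels):   # one classifier head per task
        loss = loss + beta * F.cross_entropy(head(z), y)
    return loss
```

Here `positive_mask[i, j]` is assumed to be 1 when the sample selector judges packets `i` and `j` in the batch to be similar; the relative weighting of the three terms is a design choice left open by this sketch.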