Flink Application Development Series (6): DataSet Development, Built-in Data Source Functions

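Earlier articles in this series explained that a DataSet reads its data from a fixed data source. This article lists the data source functions that ship with Flink, again in table form.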

No.	Function	Description	Example
1	readTextFile(path)	Reads a file line by line and returns each line as a String.	DataSet<String> localLines = env.readTextFile("file:///path/to/my/textfile"); DataSet<String> hdfsLines = env.readTextFile("hdfs://nnHost:nnPort/path/to/my/textfile");
2	readTextFileWithValue(path)	Reads a file line by line and returns each line as a StringValue. StringValues are mutable strings.	DataSet<StringValue> localLines = env.readTextFileWithValue("file:///path/to/my/textfile"); DataSet<StringValue> hdfsLines = env.readTextFileWithValue("hdfs://nnHost:nnPort/path/to/my/textfile");
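3	readCsvFile(path)	Parses files whose fields are delimited by commas (or another character). Returns a DataSet of tuples, case class objects, or POJOs. Supports the basic Java types and their Value counterparts as field types.	DataSet<Person> csvInput = env.readCsvFile("hdfs:///the/CSV/file").pojoType(Person.class, "name", "age", "zipcode");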
4	readSequenceFile(Key, Value, path)	Creates a JobConf and reads the file at the specified path using SequenceFileInputFormat with the given Key class and Value class, returning the records as Tuple2<Key, Value>.	DataSet<Tuple2<IntWritable, Text>> tuples = env.readSequenceFile(IntWritable.class, Text.class, "hdfs://nnHost:nnPort/path/to/file");
5	fromCollection(Collection)	Creates a DataSet from a java.util.Collection. All elements in the collection must be of the same type.	List<String> sources = new ArrayList<String>(); sources.add("From the age groups"); DataSet<String> text = env.fromCollection(sources);
6	fromElements(T ...)	Creates a DataSet from the given sequence of objects. All objects must be of the same type.	DataSet<String> text = env.fromElements("From the age groups","we can see that the largest group of citizens is the group in the age between 20-29","People in this period have had their own career. In this society of ever-quickening pace","working with computer has become a fashion. Furthermore");
7	generateSequence(from, to)	Generates the sequence of numbers in the given interval, in parallel.	DataSet<Long> numbers = env.generateSequence(1, 10000000);
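
The one-line examples in the table all assume an existing `env`. As a minimal sketch of how a few of these sources might be wired into a complete local program, the class below uses `readTextFile`, `fromCollection`, `fromElements`, and `generateSequence`. The class name `BuiltInSourcesSketch`, its helper methods, and the temporary file are hypothetical and exist only for illustration; the sketch also assumes the Flink DataSet API (`flink-java` and `flink-clients`) is on the classpath.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the names here are illustrative, not part of Flink.
public class BuiltInSourcesSketch {

    // generateSequence(from, to) includes both bounds.
    static long countSequence(long from, long to) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Long> numbers = env.generateSequence(from, to);
        return numbers.count(); // count() triggers execution of the job
    }

    // fromCollection(Collection): every element must have the same type.
    static long countCollection(List<String> input) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> text = env.fromCollection(input);
        return text.count();
    }

    // readTextFile(path): one String element per line of the file.
    static long countLines(Path file) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> lines = env.readTextFile(file.toUri().toString());
        return lines.count();
    }

    public static void main(String[] args) throws Exception {
        // A throwaway local file, used here instead of an HDFS path.
        Path tmp = Files.createTempFile("flink-source-sketch", ".txt");
        Files.write(tmp, Arrays.asList("first line", "second line", "third line"));

        System.out.println("lines read:    " + countLines(tmp));
        System.out.println("from list:     " + countCollection(Arrays.asList("a", "b")));
        System.out.println("sequence size: " + countSequence(1, 100));

        // fromElements(T ...) builds a DataSet directly from the listed objects.
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> words = env.fromElements("From", "the", "age", "groups");
        words.print(); // print() also triggers execution

        Files.deleteIfExists(tmp);
    }
}
```

The collection-based sources (`fromCollection`, `fromElements`, `generateSequence`) need no external files, which tends to make them convenient for quick local experiments before pointing a job at real `file://` or `hdfs://` paths.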