增加一篇由慕尼黑理工博士论文,其主要研究内容在查询处理和优化方面。慕尼黑理工在数据库方面有很深的造诣,其有很多vldb文章。同样,德国在数据库领域也出了一批牛人,他们在数据库查询处理和优化,事务等方面都有着非常非常高的水平。 也希望什么时候我们国内也有这样的人出现。
https://d-nb.info/1120584469/34
增加一篇由慕尼黑理工博士论文,其主要研究内容在查询处理和优化方面。慕尼黑理工在数据库方面有很深的造诣,其有很多vldb文章。同样,德国在数据库领域也出了一批牛人,他们在数据库查询处理和优化,事务等方面都有着非常非常高的水平。 也希望什么时候我们国内也有这样的人出现。
https://d-nb.info/1120584469/34
The list referenced papers are listed from. 《Hybrid Transactional/Analytical Processing: A Survey.》, from IBM Research.
[1] Apache Parquet. https://parquet.apache.org/.
[2] R. Appuswarmy, M. Karpathiotakis, D. Porobic, and A. Ailamaki. The Case For Heterogeneous HTAP. In
CIDR, 2017.
[3] M. Armbrust, R. S. Xin, C. Lian, Y. Huai, D. Liu,J. K. Bradley, X. Meng, T. Kaftan, M. J. Franklin,
A. Ghodsi, and M. Zaharia. Spark SQL: Relational Data Processing in Spark. In SIGMOD, pages 1383–1394, 2015.
[4] J. Arulraj, A. Pavlo, and P. Menon. Bridging the Archipelago Between Row-Stores and Column-Stores for Hybrid Workloads. In SIGMOD, pages 583–598, 2016.
[5] R. Barber, C. Garcia-Arellano, R. Grosman, R. Mueller, V. Raman, R. Sidle, M. Spilchen, A. Storm, Y. Tian, P. T¨ozun, D. Zilio, M. Huras, ¨
G. Lohman, C. Mohan, F. Ozcan, and H. Pirahesh. ¨Evolving Databases for New-Gen Big Data Applications. In CIDR, 2017.
[6] A. Boehm, J. Dittrich, N. Mukherjee, I. Pandis, and R. Sen. Operational analytics data management systems. PVLDB, 9:1601–1604, 2016.
[7] P. Boncz, M. Zukowski, and N. Nes. MonetDB/X100: Hyper-Pipelining Query Execution. In CIDR, 2005.
[8] Apache Cassandra. http://cassandra.apache.org.
[9] A. Costea, A. Ionescu, B. R˘aducanu, M. Switakowski, C. Bˆarca, J. Sompolski, A. Luszczak, M. Szafra´nski, G. de Nijs, and P. Boncz. Vectorh: Taking
sql-on-hadoop to the next level. In SIGMOD ’16, pages 1105–1117, 2016.
[10] Danial Abadi and Shivnath Babu and Fatma Ozcan ¨ and Ippokratis Pandis. Tutorial: SQL-on-Hadoop Systems. PVLDB, 8, 2015.
[11] IBM dashDB. http://www.ibm.com/analytics/us/en/technology/cloud-data-services/dashdb.
[12] DataStax Spark Cassandra Connector. https://github.com/datastax/spark-cassandra-connector.
[13] C. Diaconu, C. Freedman, E. Ismert, P.-˚A. Larson, P. Mittal, R. Stonecipher, N. Verma, and M. Zwilling. Hekaton: SQL Server’s memory-optimized OLTP
engine. In SIGMOD, pages 1243–1254, 2013.
[14] F. F¨arber, N. May, W. Lehner, P. Große, I. Muller, ¨ H. Rauhe, and J. Dees. The SAP HANA Database –An Architecture Overview. IEEE DEBull,35(1):28–33, 2012.
[15] S. Gray, F. Ozcan, H. Pereyra, B. van der Linden, and ¨A. Zubiri. IBM Big SQL 3.0: SQL-on-Hadoop without compromise. http://public.dhe.ibm.com/common/ssi/
ecm/en/sww14019usen/SWW14019USEN.PDF, 2014.
[16] SAP HANA Vora. http://go.sap.com/product/data-mgmt/hana-vora-hadoop.html.
[17] Apache HBase. https://hbase.apache.org/.
[18] Hive Transactions. http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk dataintegration/content/hive-013-feature-transactions.html.[19] A. Kemper and T. Neumann. HyPer – A Hybrid OLTP&OLAP Main Memory Database System Based on Virtual Memory Snapshots. In ICDE, pages 195–206, 2011.
[20] M. Kornacker, A. Behm, V. Bittorf, T. Bobrovytsky,C. Ching, A. Choi, J. Erickson, M. Grund, D. Hecht, M. Jacobs, I. Joshi, L. Kuff, D. Kumar, A. Leblang,
N. Li, I. Pandis, H. Robinson, D. Rorke, S. Rus, J. Russell, D. Tsirogiannis, S. Wanderman-Milne, and M. Yoder. Impala: A modern, open-source SQL engine for Hadoop. In CIDR, 2015.
[21] Apache Kudu. https://kudu.apache.org/.
[22] T. Lahiri, M.-A. Neimat, and S. Folkman. Oracle TimesTen: An In-Memory Database for Enterprise Applications. IEEE DEBull, 36(3):6{13, 2013.
[23] A. Lamb, M. Fuller, R. Varadarajan, N. Tran, B. Vandiver, L. Doshi, and C. Bear. The Vertica Analytic Database: C-store 7 Years Later. PVLDB, 5(12):1790{1801, 2012.
[24] MemSQL. http://www.memsql.com/.
[25] C. Mohan. History Repeats Itself: Sensible and NonsenSQL Aspects of the NoSQL Hoopla. In EDBT, 2013.
[26] B. Mozafari, J. Ramnarayan, S. Menon, Y. Mahajan, S. Chakraborty, H. Bhanawat, and K. Bachhav.SnappyData: A Unified Cluster for Streaming, Transactions and Interactice Analytics. In CIDR, 2017.
[27] Apache ORC. https://orc.apache.org/.
[28] A. Pavlo, J. Arulraj, L. Ma, P. Menon, T. C. Mowry, M. Perron, A. Tomasic, D. V. Aken, Z. Wang, and T. Zhang. Self-Driving Database Management
Systems. In CIDR, 2017.
[29] Apache Phoenix. http://phoenix.apache.org.
[30] V. Raman, G. Attaluri, R. Barber, N. Chainani, D. Kalmuk, V. KulandaiSamy, J. Leenstra, S. Lightstone, S. Liu, G. M. Lohman, T. Malkemus,
R. Mueller, I. Pandis, B. Schiefer, D. Sharpe, R. Sidle, A. Storm, and L. Zhang. DB2 with BLU Acceleration: So Much More than Just a Column Store. PVLDB, 6:1080{1091, 2013.
[31] RocksDB. http://rocksdb.org/.
[32] Roshan Sumbaly and others. Serving large-scale batch computed data with project Voldemort. In Proc. of the 10th USENIX conference on File and Storage Technologies, 2012.
[33] Splice Machine. http://www.splicemachine.com/.
[34] M. Stonebraker and U. Cetintemel. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In ICDE, pages 2{11, 2005.
[35] M. Stonebraker and A. Weisberg. The VoltDB Main
Memory DBMS. IEEE Data Eng. Bull., 36(2):21{27, 2013.
[36] A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Anthony, H. Liu, and R. Murthy. Hive –
A Petabyte Scale Data Warehouse Using Hadoop. In ICDE, 2010.
[37] S. Tu, W. Zheng, E. Kohler, B. Liskov, and S. Madden. Speedy Transactions in Multicore In-memory Databases. In SOSP, pages 18{32, 2013.
[38] Z. Zhang. Spark-on-HBase: Dataframe Based HBase Connector. http://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector.
本文为P. G. Selinger所著的《Access path selection in a relational Database Management System》,其中描述了System R的JOIN候选解的求解方法。通过对于我们发现在System R中所使用的JOIN候选解的求解方法与PostgreSQL所采用的后选解的求解方法特别相似。我们可以通过阅读Selinger的文章与PosrtgreSQL源码进行对比分析。
文章首先介绍了如何处理SQL语句,在这里我们就不对该部分进行详细介绍,因为都需要经过词法解释,语法解析等步骤,然后形成相应的查询引擎所需要的数据结构。而后在该基础上根据相应的条件:参与连接的基表;连接类型;连接条件等来选择最优的连接路径。对于一个给定的连接操作,有些连接顺序是合法的,而有些则是属于非法的连接顺序。那么如何判断某个连接顺序的合法性呢?Selinger文章则给出了相应的解释。
Join进行选择时,选择那些与与参与的内表相关的连接谓词来。 即: t1, t2, …., tn的基表, 我们仅仅考虑, ti1, ti2,…, tin 对于所有的j (j=2,…., n), 当 (1) tij 至少与其它基表 tik 间有一个连接谓词, 当 k < j ; (2)对于所有的k > j时候,如果 其于 ti1, ti2,…, ti(j-1) 没有连接谓词的话,我们使用笛卡尔乘积来计算。
“A heuristic is used to reduce the join order permutations which are considered. Whe’h possible, the search is reduced by consideration only of join orders which have join predicates relating the inner relation to the other relations already participating in the join. This means that in joining relations tl,tt,…,tn only those orderings til,ti2,…,tin are examined .in which for all j (j=2,…,n) either (1) tij has at least one join predicate ”
“with some relation tik, where k < j, or (2) for all k > j, tik has no join predicate with til,tit,…,or ti(j-1). This means that all joins requiring Cartesian products are performed as late in the join sequence as possible. For example, if Tl.T2,T3 are the three relations in a query block’s FROM list, and there are join predicates between Tl and T2 and between T2
and T3 on different columns than the Tl-T2 join, then the following not considered:
permutations are not considered: T l-T3-T2 , T3-T l-T2″
接下来: 我们考虑Merge Join, 例如对于 EMP中的DNO已经属于索引扫描,故而可以与DEP构成 merge join,而且无需要进行sort操作。