{"id":1497,"date":"2022-09-15T11:57:43","date_gmt":"2022-09-15T09:57:43","guid":{"rendered":"https:\/\/www.loicmathieu.fr\/wordpress\/?p=1497"},"modified":"2022-09-21T13:53:49","modified_gmt":"2022-09-21T11:53:49","slug":"apache-pinot-et-de-ses-differents-types-dindexes","status":"publish","type":"post","link":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/apache-pinot-et-de-ses-differents-types-dindexes\/","title":{"rendered":"Apache Pinot and its various types of indexes"},"content":{"rendered":"<p>Some time ago, I finally took the time to test Apache Pinot, you can find the story of my first experiments <a href=\"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/jai-enfin-pris-le-temps-de-tester-apache-pinot\/\">here<\/a>.<\/p>\n<p>Apache Pinot is a distributed real-time OnLine Analytical Processing (OLAP) datastore specifically designed to provide ultra-low latency analytics, even at extremely high throughput. If you don&#8217;t know it, start by reading <a href=\"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/jai-enfin-pris-le-temps-de-tester-apache-pinot\/\">my introductory article<\/a> before this one.<\/p>\n<p>One of the strengths of Pinot is its different types of indexes, it is these ones that we will explore in this article.<\/p>\n<h2>The Chicago Crimes dataset<\/h2>\n<p>We will use the Chicago Crimes dataset from Google Big Query which we will export to CSV. 
To retrieve this dataset, go to the BigQuery interface, search for the public project <code>bigquery-public-data<\/code>, then open the <code>chicago_crime<\/code> dataset; navigating to this URL should do the same: <a href=\"https:\/\/console.cloud.google.com\/bigquery?p=bigquery-public-data&amp;d=chicago_crime\" target=\"_blank\" rel=\"noopener\">https:\/\/console.cloud.google.com\/bigquery?p=bigquery-public-data&amp;d=chicago_crime<\/a>.<\/p>\n<p>You then need to copy this table to a dataset in your GCP project, then export it to CSV.\nThis will give you 6 CSVs of about 250MB each, so 1.5GB of data to analyze.<\/p>\n<p>This dataset contains crime data for the city of Chicago over several years; its description can be found here: <a href=\"https:\/\/console.cloud.google.com\/marketplace\/details\/city-of-chicago-public-data\/chicago-crime?filter=solution-type:dataset\" target=\"_blank\" rel=\"noopener\">https:\/\/console.cloud.google.com\/marketplace\/details\/city-of-chicago-public-data\/chicago-crime?filter=solution-type:dataset<\/a>.<\/p>\n<p>Once the CSVs have been retrieved, you will need to define a schema and a table, then create a job to import the data.<\/p>\n<p>Here is the schema we are going to use:<\/p>\n<pre>{\n  \"schemaName\": \"chicagoCrimes\",\n  \"dimensionFieldSpecs\": [\n    {\n      \"name\": \"unique_key\", \"dataType\": \"LONG\"\n    },\n    {\n      \"name\": \"case_number\", \"dataType\": \"STRING\"\n    },\n    {\n      \"name\": \"block\", \"dataType\": \"STRING\"\n    },\n    {\n      \"name\": \"iucr\", \"dataType\": \"STRING\"\n    },\n    {\n      \"name\": \"primary_type\", \"dataType\": \"STRING\"\n    },\n    {\n      \"name\": \"description\", \"dataType\": \"STRING\"\n    },\n    
{\n      \"name\": \"location_description\", \"dataType\": \"STRING\"\n    },\n    {\n      \"name\": \"district\", \"dataType\": \"INT\"\n    },\n    {\n      \"name\": \"ward\", \"dataType\": \"INT\"\n    },\n    {\n      \"name\": \"community_area\", \"dataType\": \"INT\"\n    },\n    {\n      \"name\": \"fbi_code\", \"dataType\": \"STRING\"\n    },\n    {\n      \"name\": \"year\", \"dataType\": \"INT\"\n    },\n    {\n      \"name\": \"location\", \"dataType\": \"STRING\"\n    },\n    {\n      \"name\": \"arrest\", \"dataType\": \"BOOLEAN\"\n    },\n    {\n      \"name\": \"domestic\", \"dataType\": \"BOOLEAN\"\n    },\n    {\n      \"name\": \"beat\", \"dataType\": \"BOOLEAN\"\n    }\n  ],\n  \"metricFieldSpecs\": [\n    {\n      \"name\": \"x_coordinate\", \"dataType\": \"FLOAT\"\n    },\n    {\n      \"name\": \"y_coordinate\", \"dataType\": \"FLOAT\"\n    },\n    {\n      \"name\": \"latitude\", \"dataType\": \"FLOAT\"\n    },\n    {\n      \"name\": \"longitude\", \"dataType\": \"FLOAT\"\n    }\n  ],\n  \"dateTimeFieldSpecs\": [\n    {\n      \"name\": \"date\",\n      \"dataType\": \"STRING\",\n      \"format\": \"1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss z\",\n      \"granularity\": \"1:SECONDS\"\n    },\n    {\n      \"name\": \"updated_on\",\n      \"dataType\": \"STRING\",\n      \"format\": \"1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss z\",\n      \"granularity\": \"1:SECONDS\"\n    }\n  ]\n}\n<\/pre>\n<p>And here is the associated table definition; it does not yet contain any index:<\/p>\n<pre>{\n  \"tableName\": \"chicagoCrimes\",\n  \"segmentsConfig\" : {\n    \"replication\" : \"1\",\n    \"schemaName\" : \"chicagoCrimes\"\n  },\n  \"tableIndexConfig\" : {\n    \"invertedIndexColumns\" : [],\n    \"loadMode\"  : \"MMAP\"\n  },\n  \"tenants\" : {\n    \"broker\":\"DefaultTenant\",\n    \"server\":\"DefaultTenant\"\n  },\n  \"tableType\":\"OFFLINE\",\n  \"metadata\": {}\n}\n<\/pre>\n<p>To ingest the data, you can use the following 
job:<\/p>\n<pre>\nexecutionFrameworkSpec:\n  name: 'standalone'\n  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'\n  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'\n  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'\njobType: SegmentCreationAndTarPush\ninputDirURI: '\/tmp\/pinot-quick-start'\nincludeFileNamePattern: 'glob:**\/*.csv'\noutputDirURI: '\/tmp\/pinot-quick-start\/segments\/'\noverwriteOutput: true\npinotFSSpecs:\n  - scheme: file\n    className: org.apache.pinot.spi.filesystem.LocalPinotFS\nrecordReaderSpec:\n  dataFormat: 'csv'\n  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'\n  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'\ntableSpec:\n  tableName: 'chicagoCrimes'\n  schemaURI: 'http:\/\/pinot-controller:9000\/tables\/chicagoCrimes\/schema'\n  tableConfigURI: 'http:\/\/pinot-controller:9000\/tables\/chicagoCrimes'\npinotClusterSpecs:\n  - controllerURI: 'http:\/\/pinot-controller:9000'\n<\/pre>\n<p>We will now start a Pinot cluster; the easiest way is to use the <a href=\"https:\/\/docs.pinot.apache.org\/basics\/getting-started\/running-pinot-in-docker#docker-compose\" target=\"_blank\" rel=\"noopener\">Docker Compose file from the Pinot documentation<\/a>.<\/p>\n<p>To start the cluster via Docker Compose, run the command <code>docker compose up<\/code>. 
After a few minutes, you will have a running Pinot cluster whose interface is accessible at <a href=\"http:\/\/localhost:9000\" target=\"_blank\" rel=\"noopener\">http:\/\/localhost:9000<\/a>.<\/p>\n<p>The Pinot image allows you to launch table creation and data ingestion jobs. The following commands assume that the necessary resources are in a directory <code>~\/dev\/pinot\/crime<\/code> which contains:<\/p>\n<ul><li>The table schema in the file <code>chicagoCrimes-schema.json<\/code><\/li>\n\n<li>The table configuration in the file <code>chicagoCrimes-table.json<\/code><\/li>\n\n<li>The 6 CSVs of data in a directory <code>data<\/code><\/li>\n<\/ul>\n<p>To create the table you can use the following docker command:<\/p>\n<pre>docker run --rm -ti \\\n    --network=pinot_default \\\n    -v ~\/dev\/pinot\/crime:\/tmp\/pinot-quick-start \\\n    --name pinot-batch-table-creation \\\n    apachepinot\/pinot:0.10.0 AddTable \\\n    -schemaFile \/tmp\/pinot-quick-start\/chicagoCrimes-schema.json \\\n    -tableConfigFile \/tmp\/pinot-quick-start\/chicagoCrimes-table.json \\\n    -controllerHost manual-pinot-controller \\\n    -controllerPort 9000 -exec\n<\/pre>\n<p>To ingest the data you can use the following docker command:<\/p>\n<pre>docker run --rm -ti \\\n    --network=pinot_default \\\n    -v ~\/dev\/pinot\/crime:\/tmp\/pinot-quick-start \\\n    --name pinot-data-ingestion-job \\\n    apachepinot\/pinot:0.10.0 LaunchDataIngestionJob \\\n    -jobSpecFile \/tmp\/pinot-quick-start\/job-ingest.yml\n<\/pre>\n<p>After ingestion, we will have 6,452,716 rows in our table.<\/p>\n<h2>Performance without indexes<\/h2>\n<p>To test the performance of Pinot without any index, we will run a few queries from the Pinot administration console:<\/p>\n<pre>\nselect count(*) from chicagoCrimes\n\nselect year, count(*) from chicagoCrimes where arrest = true group by year\n\nselect year, count(*) from chicagoCrimes where 
primary_type='NARCOTICS' group by year\n\nselect year, count(*) from chicagoCrimes where x_coordinate&gt;1180000 group by year\n\nselect year, count(*) from chicagoCrimes where ward=45 group by year\n\nselect year, sum(community_area) from chicagoCrimes group by year\n<\/pre>\n<p>Observation: the queries run in a few (tens of) milliseconds on a dataset of 1.5GB and 6.5 million rows.<\/p>\n<p>The segments take up 476MB on disk.<\/p>\n<p>The secret behind this good performance without indexes is that each field is stored in a <strong>Forward Index<\/strong>: <strong>dictionary<\/strong>-encoded by default for dimension columns, <strong>raw value<\/strong> otherwise.<\/p>\n<h3>Dictionary-encoded forward index with bit compression<\/h3>\n<p>In a dictionary-encoded forward index, an identifier is assigned to each unique value in a column, and a dictionary is built to map each identifier to its value. The forward index then stores the bit-compressed identifiers. If a column has few unique values, dictionary encoding can significantly improve storage efficiency.<\/p>\n<p>Source: <a href=\"https:\/\/docs.pinot.apache.org\/basics\/indexing\/forward-index\" target=\"_blank\" rel=\"noopener\">https:\/\/docs.pinot.apache.org\/basics\/indexing\/forward-index<\/a>.<\/p>\n<h2>Pinot indexes<\/h2>\n<h3>Inverted index<\/h3>\n<p>In an inverted index, a mapping is created for each field value. 
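As a toy illustration (plain Python with invented example values, not Pinot's actual on-disk format), building such a mapping looks like this:

```python
from collections import defaultdict

# Toy documents: document ID -> primary_type value
docs = {1: "MURDER", 2: "MURDER", 3: "DRUGS"}

# Build the inverted index: value -> list of document IDs containing it
inverted = defaultdict(list)
for doc_id, value in sorted(docs.items()):
    inverted[value].append(doc_id)

# A filter such as WHERE primary_type = 'MURDER' becomes a single lookup,
# with no scan of the column values
print(inverted["MURDER"])  # [1, 2]
```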
This mapping stores the list of documents that contain the value.<\/p>\n<p>For example, for the following documents:<\/p>\n<table><thead><tr><th>Document IDs<\/th>\n\n<th>primary_type<\/th>\n        <\/tr><\/thead><tbody><tr><td>1<\/td>\n\n<td>MURDER<\/td>\n        <\/tr><tr><td>2<\/td>\n\n<td>MURDER<\/td>\n        <\/tr><tr><td>3<\/td>\n\n<td>DRUGS<\/td>\n        <\/tr><\/tbody><\/table>\n<p>You will have the following inverted index:<\/p>\n<table><thead><tr><th>primary_type<\/th>\n\n<th>Document IDs<\/th>\n        <\/tr><\/thead><tbody><tr><td>MURDER<\/td>\n\n<td>1, 2<\/td>\n        <\/tr><tr><td>DRUGS<\/td>\n\n<td>3<\/td>\n        <\/tr><\/tbody><\/table>\n<h3>Bloom Filter<\/h3>\n<p>A Bloom filter makes it possible to exclude segments that do not contain any record matching an equality predicate.<\/p>\n<h3>Range index<\/h3>\n<p>Similar to an inverted index, but it creates an index entry for a range of values instead of one per value.<\/p>\n<p>This saves space for columns with many distinct values.<\/p>\n<p>For columns of type <code>TIMESTAMP<\/code>, a dedicated index exists that serves the same purpose: the <strong>Timestamp index<\/strong>.<\/p>\n<h3>Star-tree<\/h3>\n<blockquote>Star-Tree data structure offers a configurable trade-off between space and time and lets us achieve hard upper bound for query latencies for a given use case.<\/blockquote>\n<p>Source: <a href=\"https:\/\/docs.pinot.apache.org\/basics\/indexing\/star-tree-index\" target=\"_blank\" rel=\"noopener\">https:\/\/docs.pinot.apache.org\/basics\/indexing\/star-tree-index<\/a>.<\/p>\n<p>A star-tree index pre-computes and stores one or more pre-aggregations.<\/p>\n<p>It uses a tree data structure to pre-compute sub-aggregations on one dimension at certain nodes of the tree.<\/p>\n<p>At query time, Pinot selects the nodes involved in the query and aggregates the 
sub-aggregations of these nodes.<\/p>\n<img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/star-tree.png?resize=640%2C192&#038;ssl=1\" alt=\"Star-tree index structure\" width=\"640\" height=\"192\" class=\"alignnone size-full wp-image-1502\" srcset=\"https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/star-tree.png?w=899&amp;ssl=1 899w, https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/star-tree.png?resize=300%2C90&amp;ssl=1 300w, https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/star-tree.png?resize=768%2C231&amp;ssl=1 768w, https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/star-tree.png?resize=604%2C181&amp;ssl=1 604w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/>\n<p>A star-tree index is a tree structure that contains the following types of nodes:<\/p>\n<ul><li><strong>Root Node (Orange)<\/strong>: the single root node, from which the rest of the tree can be traversed.<\/li>\n\n<li><strong>Leaf Node (Blue)<\/strong>: can contain at most T records, where T is configurable.<\/li>\n\n<li><strong>Non-leaf Node (Green)<\/strong>: nodes with more than T records are further split into child nodes.<\/li>\n\n<li><strong>Star-Node (Yellow)<\/strong>: contains the pre-aggregated records after removing the dimension on which the data was split for this level.<\/li>\n\n<li><strong>Dimensions Split Order ([D1, D2])<\/strong>: ordered list of dimensions used to determine which dimension to split on at a given level of the tree.<\/li>\n<\/ul>\n<h2>The results<\/h2>\n<p>To test the different types of indexes, we will create a <code>chicagoCrimesWithIdx<\/code> table with a set of indexes and run the same queries on the table with indexes and the one without, to compare performance.<\/p>\n<p>The following table definition reuses the <code>chicagoCrimes<\/code> schema but adds indexes on some fields.<\/p>\n<pre>\n{\n 
 \"tableName\": \"chicagoCrimesWithIdx\",\n  \"segmentsConfig\" : {\n    \"replication\" : \"1\",\n    \"schemaName\" : \"chicagoCrimes\"\n  },\n  \"tableIndexConfig\" : {\n    \"invertedIndexColumns\" : [\"primary_type\"],\n    \"bloomFilterColumns\": [\"ward\"],\n    \"rangeIndexColumns\": [\"x_coordinate\"],\n    \"starTreeIndexConfigs\": [{\n      \"dimensionsSplitOrder\": [\"year\"],\n      \"skipStarNodeCreationForDimensions\": [],\n      \"functionColumnPairs\": [\"SUM__community_area\"],\n      \"maxLeafRecords\": 1\n    }],\n    \"loadMode\"  : \"MMAP\"\n  },\n  \"tenants\" : {\n    \"broker\":\"DefaultTenant\",\n    \"server\":\"DefaultTenant\"\n  },\n  \"tableType\":\"OFFLINE\",\n  \"metadata\": {}\n}\n<\/pre>\n<p>You can create the table via the following Docker command:<\/p>\n<pre>\ndocker run --rm -ti \\\n    --network=pinot_default \\\n    -v ~\/dev\/pinot\/crime:\/tmp\/pinot-quick-start \\\n    --name pinot-batch-table-creation \\\n    apachepinot\/pinot:0.10.0 AddTable \\\n    -schemaFile \/tmp\/pinot-quick-start\/chicagoCrimes-schema.json \\\n    -tableConfigFile \/tmp\/pinot-quick-start\/chicagoCrimes-table-with-idx.json \\\n    -controllerHost manual-pinot-controller \\\n    -controllerPort 9000 -exec\n<\/pre>\n<p>And load the data into the table via the following Docker command; the job <code>job-ingest-with-idx.yml<\/code> is the same as <code>job-ingest.yml<\/code> except that it uses the new table definition:<\/p>\n<pre>\ndocker run --rm -ti \\\n    --network=pinot_default \\\n    -v ~\/dev\/pinot\/crime:\/tmp\/pinot-quick-start \\\n    --name pinot-data-ingestion-job \\\n    apachepinot\/pinot:0.10.0 LaunchDataIngestionJob \\\n    -jobSpecFile \/tmp\/pinot-quick-start\/job-ingest-with-idx.yml\n<\/pre>\n<h3>Inverted index<\/h3>\n<pre>\nselect year, count(*) from chicagoCrimes where primary_type='NARCOTICS' group by year\n<\/pre>\n<table class=\"small\"><tr><th>Index 
Y\/N<\/th>\n\n<th>timeUsedMs<\/th>\n\n<th>numDocsScanned<\/th>\n\n<th>numEntriesScannedInFilter<\/th>\n\n<th>numEntriesScannedPostFilter<\/th>\n    <\/tr><tr><td>Without index<\/td>\n\n<td>25ms<\/td>\n\n<td>636118<\/td>\n\n<td>6452716<\/td>\n\n<td>636118<\/td>\n    <\/tr><tr><td>With index<\/td>\n\n<td>11ms<\/td>\n\n<td>0<\/td>\n\n<td>636118<\/td>\n    <\/tr><\/table>\n<p>We see here the advantage of an inverted index: the filter (the WHERE clause) did not scan any entries because it used the index.\nQuery execution time improved markedly, from 25ms to 11ms.<\/p>\n<h3>Range index<\/h3>\n<pre>\nselect year, count(*) from chicagoCrimes where x_coordinate&gt;1180000 group by year\n<\/pre>\n<table class=\"small\"><tr><th>Index Y\/N<\/th>\n\n<th>timeUsedMs<\/th>\n\n<th>numDocsScanned<\/th>\n\n<th>numEntriesScannedInFilter<\/th>\n\n<th>numEntriesScannedPostFilter<\/th>\n    <\/tr><tr><td>Without index<\/td>\n\n<td>29ms<\/td>\n\n<td>990885<\/td>\n\n<td>6452716<\/td>\n\n<td>990885<\/td>\n    <\/tr><tr><td>With index<\/td>\n\n<td>11ms<\/td>\n\n<td>990885<\/td>\n\n<td>641072<\/td>\n\n<td>990885<\/td>\n    <\/tr><\/table>\n<p>We see here the benefit of a range index: the filter (the WHERE clause) scanned far fewer entries because it used the index. 
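One way to picture range bucketing is the following toy Python sketch (invented values and bucket boundaries; Pinot's actual range index layout differs):

```python
import bisect

# Toy column: x_coordinate values, indexed by document ID
x_coordinate = [1100000, 1150000, 1179000, 1181000, 1200000, 1250000]

# Each range maps to the IDs of documents whose value falls inside it
boundaries = [1150000, 1200000]  # ranges: <1150000, [1150000, 1200000), >=1200000
buckets = {i: [] for i in range(len(boundaries) + 1)}
for doc_id, value in enumerate(x_coordinate):
    buckets[bisect.bisect_right(boundaries, value)].append(doc_id)

# For x_coordinate > 1180000: ranges entirely above the threshold match as a
# whole, but the range containing the threshold must still be scanned value
# by value, which is why part of the entries are scanned anyway
threshold = 1180000
matching = list(buckets[2])
matching += [d for d in buckets[1] if x_coordinate[d] > threshold]
print(sorted(matching))  # [3, 4, 5]
```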
As the index only has one entry per range, it still had to scan part of the entries (about 10% here).\nQuery execution time improved markedly, from 29ms to 11ms.<\/p>\n<h3>Bloom filter<\/h3>\n<pre>\nselect year, count(*) from chicagoCrimesWithIdx where ward=45 group by year\n<\/pre>\n<p>A Bloom filter filters out the segments to be processed (segment pruning). Although we have 6 segments here, they were created without any partitioning logic (purely technical segments), so the Bloom filter has no effect.<\/p>\n<p>To be able to use a Bloom filter on the ward field, the segments would have to be created based on the value of this field (one segment per value range, for example).<\/p>\n<h3>Star-tree index<\/h3>\n<pre>\nselect year, sum(community_area) from chicagoCrimes group by year\n<\/pre>\n<table class=\"small\"><tr><th>Index Y\/N<\/th>\n\n<th>timeUsedMs<\/th>\n\n<th>numDocsScanned<\/th>\n\n<th>numEntriesScannedInFilter<\/th>\n\n<th>numEntriesScannedPostFilter<\/th>\n    <\/tr><tr><td>Without index<\/td>\n\n<td>31ms<\/td>\n\n<td>6452716<\/td>\n\n<td>0<\/td>\n\n<td>12905432<\/td>\n    <\/tr><tr><td>With index<\/td>\n\n<td>6ms<\/td>\n\n<td>132<\/td>\n\n<td>0<\/td>\n\n<td>264<\/td>\n    <\/tr><\/table>\n<p>With a star-tree index, pre-aggregated documents are computed ahead of time.\nInstead of scanning the table documents, the index documents were scanned, hence the 132 documents scanned.\nThis query only used data from the index and therefore ran very quickly.\nQuery execution time improved markedly, from 31ms to 6ms.<\/p>\n<h2>Conclusion<\/h2>\n<p>Without any index, the performance of a Pinot query is already very good thanks to its optimized storage of fields in forward indexes. Adding specific indexes brings a significant performance gain, even for queries scanning a large part of the table data. 
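The pre-aggregation idea behind the star-tree index can be sketched as follows (a toy one-dimension example with invented rows, not Pinot's actual tree structure):

```python
from collections import defaultdict

# Toy rows: (year, community_area)
rows = [(2020, 5), (2020, 7), (2021, 3), (2021, 9), (2021, 1)]

# Pre-aggregation: one pre-built document per year holding SUM(community_area)
pre_agg = defaultdict(int)
for year, community_area in rows:
    pre_agg[year] += community_area

# SELECT year, SUM(community_area) ... GROUP BY year now reads 2 pre-built
# documents instead of scanning the 5 raw rows
print(dict(pre_agg))  # {2020: 12, 2021: 13}
```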
Please note that the query times given in this article come from local executions on a dataset that is small by Pinot&#8217;s standards, so they cannot be extrapolated to a real dataset on a production Pinot deployment.<\/p>\n<p>The star-tree index enables pre-aggregation without requiring much storage space; queries that use it become ultra-fast because they read a small number of pre-built documents instead of fully scanning the segments. To me, it is the most interesting and innovative index Pinot offers.<\/p>","protected":false},"excerpt":{"rendered":"<p>Some time ago, I finally took the time to test Apache Pinot, you can find the story of my first experiments here. Apache Pinot is a distributed real-time OnLine Analytical Processing (OLAP) datastore specifically designed to provide ultra-low latency analytics, even at extremely high throughput. If you don&#8217;t know it, start by reading my introductory article before this one. One of the strengths of Pinot is its different types of indexes, it is these ones that we will explore in&#8230;<p class=\"read-more\"><a class=\"btn btn-default\" href=\"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/apache-pinot-et-de-ses-differents-types-dindexes\/\"> Read More<span class=\"screen-reader-text\">  Read 
More<\/span><\/a><\/p><\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":4,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[9],"tags":[203,204,202],"class_list":["post-1497","post","type-post","status-publish","format-standard","hentry","category-informatique","tag-database","tag-olap","tag-pinot"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":1400,"url":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/jai-enfin-pris-le-temps-de-tester-apache-pinot\/","url_meta":{"origin":1497,"position":0},"title":"I finally took the time to test Apache Pinot","author":"admin","date":"Thursday January 20th, 2022","format":false,"excerpt":"I've been wanting to test Apache Pinot for a very long time and I finally took the time to do it! First, a quick description of Pinot Pinot is a real-time distributed OLAP datastore, purpose-built to provide ultra low-latency analytics, even at extremely high throughput. 
It can ingest directly from\u2026","rel":"","context":"In &quot;informatique&quot;","block_context":{"text":"informatique","link":"https:\/\/www.loicmathieu.fr\/wordpress\/category\/informatique\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/Pinot-architecture.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/Pinot-architecture.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/Pinot-architecture.jpg?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":1508,"url":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/4-ans-chez-zenika\/","url_meta":{"origin":1497,"position":1},"title":"(Fran\u00e7ais) 4 ans chez Zenika","author":"admin","date":"Tuesday September  6th, 2022","format":false,"excerpt":"Sorry, this entry is only available in Fran\u00e7ais.","rel":"","context":"In &quot;informatique&quot;","block_context":{"text":"informatique","link":"https:\/\/www.loicmathieu.fr\/wordpress\/category\/informatique\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1674,"url":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/devoxx-fr-2023-foundation-db-le-secret-le-mieux-garde-des-nouvelles-architectures-distribuees-par-pierre-zemb-et-steven-le-roux\/","url_meta":{"origin":1497,"position":2},"title":"(Fran\u00e7ais) Devoxx FR 2023 &#8211; FoundationDB : le secret le mieux gard\u00e9 des nouvelles architectures distribu\u00e9es ! 
par Pierre Zemb et Steven Le Roux","author":"admin","date":"Monday April 17th, 2023","format":false,"excerpt":"Sorry, this entry is only available in Fran\u00e7ais.","rel":"","context":"In &quot;informatique&quot;","block_context":{"text":"informatique","link":"https:\/\/www.loicmathieu.fr\/wordpress\/category\/informatique\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":419,"url":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/chti-jug-nosql\/","url_meta":{"origin":1497,"position":3},"title":"Ch&#8217;ti JUG : NoSQL","author":"admin","date":"Monday December 20th, 2010","format":false,"excerpt":"Le 2 d\u00e9cembre s'est tenu dans les locaux de l'IUT A de Lille une session du Ch'ti JUG sur les technologie NoSQL anim\u00e9 par Olivier Mallassi. L'intervenant a commenc\u00e9 la conf\u00e9rence par un bref historique de la mani\u00e8re dont les donn\u00e9es on \u00e9t\u00e9 stock\u00e9es dans le monde de l'informatique: Au\u2026","rel":"","context":"In &quot;informatique&quot;","block_context":{"text":"informatique","link":"https:\/\/www.loicmathieu.fr\/wordpress\/category\/informatique\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":566,"url":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/chti-jug-lili-et-cassandra\/","url_meta":{"origin":1497,"position":4},"title":"Ch&#8217;ti JUG : Lili et Cassandra","author":"admin","date":"Thursday December 15th, 2011","format":false,"excerpt":"Le 12 d\u00e9cembre s'est tenu dans les locaux de l'IUT A de Lille une session du Ch'ti JUG sur Lili et Cassandra deux outils autour des bases de donn\u00e9es NoSql. La pr\u00e9sentation de Lili a \u00e9t\u00e9 faite par Stevens Noel et celle sur Cassandra par J\u00e9r\u00e9my Sevellec. 
Ayant d\u00e9j\u00e0 \u00e9crit\u2026","rel":"","context":"In &quot;informatique&quot;","block_context":{"text":"informatique","link":"https:\/\/www.loicmathieu.fr\/wordpress\/category\/informatique\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1611,"url":"https:\/\/www.loicmathieu.fr\/wordpress\/informatique\/introduction-a-kestra\/","url_meta":{"origin":1497,"position":5},"title":"Introduction to Kestra","author":"admin","date":"Monday March  6th, 2023","format":false,"excerpt":"Kestra is an open-source data orchestrator and scheduler. With Kestra, data workflows, called flows, use the YAML format and are executed by its engine via an API call, the user interface, or a trigger (webhook, schedule, SQL query, Pub\/Sub message, ...). The important notions of Kestra are : The flow:\u2026","rel":"","context":"In &quot;informatique&quot;","block_context":{"text":"informatique","link":"https:\/\/www.loicmathieu.fr\/wordpress\/category\/informatique\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/kestra-01-1024x267.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/kestra-01-1024x267.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/loicmathieu.fr\/wordpress\/wp-content\/uploads\/kestra-01-1024x267.png?resize=525%2C300&ssl=1 
1.5x"},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/posts\/1497","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/comments?post=1497"}],"version-history":[{"count":22,"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/posts\/1497\/revisions"}],"predecessor-version":[{"id":1530,"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/posts\/1497\/revisions\/1530"}],"wp:attachment":[{"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/media?parent=1497"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/categories?post=1497"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.loicmathieu.fr\/wordpress\/wp-json\/wp\/v2\/tags?post=1497"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}