Thread: BUG #18320: Duplicate primary key records in table
The following bug has been logged on the website:
Bug reference: 18320
Logged by: vishnu ch
Email address: jaihind213@gmail.com
PostgreSQL version: 15.2
Operating system: aarch64-apple-darwin21.6.0
Description:
hi
i create table with primary key and inserted a few records but i am surprised to see duplicate records having same primary key ?
is this a bug ? or am i missing something in postgres ?
i noticed other conversations/bugs like BUG #16938
---------------------
1. create table
CREATE TABLE foo (
    id integer PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);
2. insert records:
insert into foo values (4, 'boo');
insert into foo values (6, 'asd');
insert into foo values (6, 'asd');
.....
3. select * from foo;
id|name|
--+----+
 6|xxx |
 5|yyy |
 5|yyy |
 6|xxx |
 4|boo |
 6|asd |
 6|asd |
 6|asd |
 4|boo |
---------------------------------------
Other details:
1. PostgreSQL 15.2 on aarch64-apple-darwin21.6.0, compiled by Apple clang version 14.0.0 (clang-1400.0.29.102), 64-bit
2. Using https://postgresapp.com/ to launch postgres 15 on mac m1 - macOs Monterey
Hi all,
I was using Spark to load the data.
Here is the sample code:
--------
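import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;
// The wrapper class name is hypothetical; the original post showed only the body.
public class Foo1Writer {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("testUpdateSegmentStatsCuesGenerated")
                .master("local[*]")
                .getOrCreate();
        // Three sample rows: (id, name)
        List<Row> rows = new ArrayList<>();
        rows.add(RowFactory.create(6, "xxx"));
        rows.add(RowFactory.create(5, "yyy"));
        rows.add(RowFactory.create(4, "boo"));
        // Schema: non-nullable integer id, nullable string name
        Dataset<Row> input = spark.createDataFrame(rows, new StructType()
                .add("id", DataTypes.IntegerType, false)
                .add("name", DataTypes.StringType));
        // Define PostgreSQL connection properties
        Properties connectionProperties = new Properties();
        connectionProperties.setProperty("user", "postgres");
        connectionProperties.setProperty("password", "");
        connectionProperties.setProperty("driver", "org.postgresql.Driver");
        // Define PostgreSQL table name
        String postgresTableName = "foo1";
        // Write data to PostgreSQL table, overwriting existing records
        input.write()
                .mode(SaveMode.Overwrite)
                .jdbc("jdbc:postgresql://localhost:5432/postgres", postgresTableName, connectionProperties);
        // Stop Spark session
        spark.stop();
    }
}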
--------
Steps to reproduce:
1. Create the table.
2. Run the Spark code with mode Overwrite. Do a select and you will see 3 rows.
3. Modify the code to change the record 'Row row3 = RowFactory.create( 4, "boo");' to 'Row row3 = RowFactory.create( 4, "boo--1");'.
4. Run the Spark code again with mode Overwrite. It will overwrite, and you will see 3 records, with id 4 being boo--1.
5. Now run the Spark code again with mode Append.
6. Do a select * from foo2; you will see 6 records and duplicate primary key rows.
-----------------
created a youtube video showing the bug:
thanking you.
On Thu, Feb 1, 2024 at 9:28 AM PG Bug reporting form <noreply@postgresql.org> wrote:
Not a PostgreSQL bug - we didn’t break unique constraints without anyone noticing during development.
On Wednesday, January 31, 2024, vishnu rao <jaihind213@gmail.com> wrote:
2. run spark code with mode overwrite. do a select u will see 3 rows.
An internet search leads me to believe overwrite operates by dropping and recreating the table. Apparently the new one doesn’t have the constraint.
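One way to check whether the constraint is still there after an overwrite is to ask PostgreSQL which constraints the table currently has. This is a minimal sketch, assuming the table is named foo1 as in the Spark code earlier in the thread; adjust the name if yours differs.

```sql
-- Assumption: the table written by Spark is named foo1 (taken from the sample code).
-- List the constraints on the table; contype = 'p' marks a primary key.
SELECT conname, contype, pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE conrelid = 'foo1'::regclass;

-- Show any id values that occur more than once.
SELECT id, count(*) AS copies
FROM foo1
GROUP BY id
HAVING count(*) > 1;
```

If the first query returns no row with contype = 'p', the table has no primary key, so nothing prevents the duplicate ids that the second query reports.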
I suggest you enable statement logging on the server and see exactly what your library is doing behind the scenes on your behalf, at least for the DDL-related statements.
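Statement logging can be switched on from a SQL session without editing postgresql.conf by hand. This is a minimal sketch, assuming superuser access and that logging only DDL is enough to see whether the table is being dropped and recreated:

```sql
-- Log every data definition statement (CREATE, ALTER, DROP) the server receives.
ALTER SYSTEM SET log_statement = 'ddl';
-- Ask the server to reload its configuration so the setting applies without a restart.
SELECT pg_reload_conf();
```

After re-running the Spark job, the server log should show the DDL issued against the table. Running ALTER SYSTEM RESET log_statement followed by another pg_reload_conf() removes the override again.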
David J.