Re: extracting location info from string - Mailing list pgsql-sql
From | Lew |
---|---|
Subject | Re: extracting location info from string |
Date | |
Msg-id | irgo1v$dbv$1@news.albasani.net Whole thread Raw |
In response to | Re: extracting location info from string (Tarlika Elisabeth Schmitz <postgresql3@numerixtechnology.de>) |
Responses |
Re: extracting location info from string
Re: extracting location info from string |
List | pgsql-sql |
Tarlika Elisabeth Schmitz wrote: > Lew wrote: >> That isn't a table structure, that's a freeform text structure. You >> didn't state your question, Tarlika, but your database structure is >> terrible. For example, "region" and "country" should be different >> columns. > I presume you are referring to my original post: > CREATE TABLE person > ( > id integer NOT NULL, > "name" character varying(256) NOT NULL, > "location" character varying(256), > CONSTRAINT person_pkey PRIMARY KEY (id) > ); > > Sorry, this was just a TEMPORARY table I created for quick analysis of > my CSV data (now renamed to temp_person). > > > > The target table is: > CREATE TABLE person > ( > id integer NOT NULL, > "name" character varying(100) NOT NULL, > country character varying(3), > county character varying(3), > town character varying(50), > CONSTRAINT trainer_pkey PRIMARY KEY (id), > CONSTRAINT country_person_fk FOREIGN KEY (country) > REFERENCES country (id) MATCH SIMPLE > ON UPDATE CASCADE ON DELETE RESTRICT, > CONSTRAINT county_person_fk FOREIGN KEY (country, county) > REFERENCES county (country, code) MATCH SIMPLE > ON UPDATE NO ACTION ON DELETE NO ACTION, > ); Ah, yes, that makes much more sense. Temporary tables such as you describe can be very convenient and effective. Thanks for the clarification. I think this problem is very widespread, namely how to get structured information out of freeform data. I've encountered it many times over the years, as have so many I know. I believe that human intervention will always be needed for this type of work, e.g., distinguishing place names that seem the same or correlating ones that seem distinct. I also don't know of any perfect approach. Perhaps the best one can find is a probabilistic promise that error will be less than some epsilon. That said, if you have a robust process to correct errors as the user population discovers them, then you can approach perfection asymptotically. Sometimes the best solution to a technical problem is a good human process. From an engineering standpoint, user feedbackis a vital element of homeostatic control. Edward W. Rouse's suggestion of a reference table to resolve different forms of address or region identification would fit well with such a process. -- Lew Honi soit qui mal y pense. http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg