Re: Recursive Arrays 101 - Mailing list pgsql-general

From: Adrian Klaver
Subject: Re: Recursive Arrays 101
Date:
Msg-id: 562EA0E6.5090500@aklaver.com
In response to: Re: Recursive Arrays 101 (David Blomstrom <david.blomstrom@gmail.com>)
Responses: Re: Recursive Arrays 101
List: pgsql-general
On 10/26/2015 01:51 PM, David Blomstrom wrote:
> I'm focusing primarily on vertebrates at the moment, which have a total
> of (I think) about 60,000-70,000 rows for all taxons (species, families,
> etc.). My goal is to create a customized database that does a really
> good job of handling vertebrates first, manually adding a few key
> invertebrates and plants as needed.
>
> I couldn't possibly repeat the process with invertebrates or plants,
> which are simply overwhelming. So, if I ever figure out the Catalogue of
> Life's database, then I'm simply going to modify its tables so they work
> with my system. My vertebrates database will override their vertebrate
> rows (except for any extra information they have to offer).
>
> As for "hand-entry," I do almost all my work in spreadsheets. I spent a
> day or two copying scientific names from the Catalogue of Life into my
> spreadsheet. Common names and slugs (common names in a URL format) are a
> project that will probably take years. I might type a scientific name or
> common name into Google and see where it leads me. If a certain
> scientific name is associated with the common name "yellow birch," then
> its slug becomes yellow-birch. If two or more species are called yellow
> birch, then I enter yellow-birch in a different table ("Floaters"),
> which leads to a disambiguation page.
>
> For organisms with two or more popular common names - well, I haven't
> really figured that out yet. I'll probably have to make an extra table
> for additional names. Catalogue of Life has common names in its
> database, but they all have upper-case first letters - like American
> Beaver. That works fine for a page title, but in regular text I need to
> make Beaver lowercase without changing American. So I'm just starting
> from square one and recreating all the common names from scratch.
I think there has to be a better way, as this is just a formatting issue.
Can't remember what programming language you are working in, but in Python:

In [13]: s = 'American Beaver'

In [14]: s.capitalize()
Out[14]: 'American beaver'

In [15]: s.lower()
Out[15]: 'american beaver'

> It gets still more complicated when you get into "specialist names." ;)
> But the system I've set up so far seems to be working pretty nicely.
>
> On Mon, Oct 26, 2015 at 1:41 PM, Rob Sargent <robjsargent@gmail.com
> <mailto:robjsargent@gmail.com>> wrote:
>
>     On 10/26/2015 02:29 PM, David Blomstrom wrote:
>
>         Sorry for the late response. I don't have Internet access at
>         home, so I only post from the library or a WiFi cafe.
>
>         Anyway, where do I begin?
>
>         Regarding my "usage patterns," I use spreadsheets (Apple's
>         Numbers program) to organize data. I then save it as a CSV file
>         and import it into a database table. It would be very hard to
>         break with that tradition, because I don't know of any other way
>         to organize my data.
>
>         On the other hand, I have a column (Rank) that identifies
>         different taxonomic levels (kingdom, class, etc.), so I can
>         easily sort a table into specific taxonomic levels and save one
>         level at a time for a database table.
>
>         There is one problem, though. I can easily put all the
>         vertebrate orders and even families into a table, but genera
>         might be harder, and species probably won't work; there are
>         simply too many. My spreadsheet program is almost overwhelmed by
>         fish species alone. The only solution would be if I could import
>         Mammals.csv, then import Birds.csv, Reptiles.csv, etc. But that
>         might be kind of tedious, especially if I have to make multiple
>         updates.
>
>     Yes, I suspect your spreadsheet will be limited in rows, but of
>     course you can send all the spreadsheets to a single table in the
>     database, if that's what you want. You don't have to, but you see
>     mention of tables with millions of records routinely.
>     On the other hand, if performance becomes an issue with the
>     single-table approach you might want to look at "partitioning". But
>     I would be surprised if you had to go there.
>
>     What is your data source? How much hand-entry are you doing? There
>     are tools which (seriously) upgrade the basic 'COPY into <table>'
>     command.
>
> As for "attributes," I'll post my table's schema, with a
> description, next.
>
> --
> David Blomstrom
> Writer & Web Designer (Mac, M$ & Linux)
> www.geobop.org <http://www.geobop.org>

--
Adrian Klaver
adrian.klaver@aklaver.com
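A follow-up thought on the capitalization issue: str.capitalize() handles the two-word case above, but it lowercases everything after the first character, so a name like "North American Beaver" would come out as "North american beaver". A minimal sketch of one possible workaround, using a small (purely illustrative, hypothetical) set of proper nouns to preserve:

```python
# Sketch: lowercase a Catalogue of Life common name for use in running text,
# preserving words found in an assumed proper-noun list. The PROPER_NOUNS
# contents here are illustrative only, not from any real dataset.
PROPER_NOUNS = {"American", "North", "South", "Atlantic"}

def display_name(common_name: str) -> str:
    # Lowercase each word unless it is a known proper noun.
    words = common_name.split()
    return " ".join(w if w in PROPER_NOUNS else w.lower() for w in words)

print(display_name("American Beaver"))        # American beaver
print(display_name("North American Beaver"))  # North American Beaver
print(display_name("Yellow Birch"))           # yellow birch
```

Maintaining such a list by hand would be its own project, of course; it only moves the manual work from per-name edits to a shared exception list.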
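On Rob's point about sending all the spreadsheets to a single table: the per-class CSV exports (Mammals.csv, Birds.csv, ...) can simply be appended into one table in a loop. A rough sketch below, using Python's built-in sqlite3 as a stand-in so it runs anywhere; with PostgreSQL you would instead use COPY (or psql's \copy). The column names (name, rank) are assumptions for illustration:

```python
import csv
import os
import sqlite3
import tempfile

def load_csvs(conn, paths):
    """Append every CSV in paths into one 'taxon' table (assumed columns)."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS taxon (name TEXT, rank TEXT)")
    for path in paths:
        with open(path, newline="") as f:
            rows = [(r["name"], r["rank"]) for r in csv.DictReader(f)]
        cur.executemany("INSERT INTO taxon VALUES (?, ?)", rows)
    conn.commit()

# Tiny demonstration with two temporary one-row CSV files.
tmp = tempfile.mkdtemp()
samples = [("Mammals.csv", "name,rank\nCastor canadensis,species\n"),
           ("Birds.csv", "name,rank\nTurdus migratorius,species\n")]
for fname, body in samples:
    with open(os.path.join(tmp, fname), "w") as f:
        f.write(body)

conn = sqlite3.connect(":memory:")
load_csvs(conn, [os.path.join(tmp, f) for f, _ in samples])
print(conn.execute("SELECT COUNT(*) FROM taxon").fetchone()[0])  # 2
```

The same loop structure works against PostgreSQL; the per-file insert would just become a COPY FROM, which is much faster for bulk loads.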