DaClyde
Posts: 1,318
Joined: Sep 2008
Monday, October 17, 2022 9:08 AM
If anyone would like to help add the Japanese checklists to TCDB, I am setting up a Google Space for collaborating on this. I have set up a copy of my Google Sheets document with the VLOOKUP for translating player and team names, included a sample of the TCDB standard import format, and uploaded all the official checklist PDFs from BBM and Epoch to a shared Google Drive folder.
Let me know if interested and I can send you an invite and walk you through my process, and maybe we can start getting these sets in the system more quickly. I use a combination of OpenOffice Calc, Google Sheets and EditPad for a lot of the pasting and formatting.
Here is the link to the Google Drive folder.
https://drive.google.com/drive/folders/1-_pX4Gp68iiysXTuau5s2d0xOYCf18gZ
I'm hoping this goes well enough that we can start loading non-baseball sets regularly, as well. And if we find the resources, we could easily adapt this for other languages.
Edited on: Oct 17, 2022 - 9:10AM
FiresNBeers
Posts: 436
Joined: Aug 2018
Tuesday, October 18, 2022 12:06 AM
Very interesting, I should be able to find some time to learn and see how I can help.
-------------------------------
I am one of the members that helps within the site. I work closely with the IRs and can answer most questions. Please send me a message if there is anything I can assist with. If I don't know, I can certainly try to figure it out or direct you to a resource.
DaClyde
Posts: 1,318
Joined: Sep 2008
Tuesday, October 18, 2022 8:19 AM
If you would like to try it out, I would suggest setting up your own new Google Sheets doc and copying the whole NPB VLOOKUPv1 document into it so you can edit it.
I have found that most of the BBM checklist PDFs copy well: you can copy a page's worth of checklist and paste it into an offline spreadsheet such as Excel or OpenOffice Calc. For some reason, it never pastes cleanly into a Google Sheet. In some cases, the Japanese names have spaces in them; when that happens, I do a find & replace to remove the space from all the names.
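For anyone who would rather script the cleanup than use find & replace, here is a minimal Python sketch of the space-removal step. It assumes the stray spaces are either ordinary ASCII spaces or the full-width ideographic space (U+3000) common in Japanese text; I haven't checked which one the BBM PDFs actually produce, so it removes both. The function names are made up for illustration.

```python
FULL_WIDTH_SPACE = "\u3000"  # ideographic space used in Japanese text


def strip_name_spaces(name: str) -> str:
    """Remove every ASCII and full-width space from a name."""
    return name.replace(" ", "").replace(FULL_WIDTH_SPACE, "")


def clean_column(names: list[str]) -> list[str]:
    """Apply the same find & replace to every cell in a pasted column."""
    return [strip_name_spaces(name) for name in names]


if __name__ == "__main__":
    pasted = ["新庄 剛志", "松井\u3000秀喜", "イチロー"]
    print(clean_column(pasted))  # ['新庄剛志', '松井秀喜', 'イチロー']
```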
In your offline spreadsheet, you can select the column of the player names in Japanese, and paste them into Column I in your copy of the NPB VLOOKUPv1 document, and it should automatically translate all the names and populate them into Column F. Any entries that still show N/A in them are likely not in the lookup list, so they need to be added.
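Roughly, the lookup behaves like the Python sketch below. It assumes an exact-match VLOOKUP (fourth argument FALSE) and uses a tiny, made-up lookup table; `vlookup_exact`, `translate_column` and `missing_names` are hypothetical helpers, not part of the shared sheet. Names that come back as `#N/A` are the ones to add to the lookup list.

```python
NOT_FOUND = "#N/A"  # what the sheet shows when there is no exact match

# Made-up lookup rows in sheet order: (Japanese name, English name).
LOOKUP_ROWS = [
    ("松井秀喜", "Hideki Matsui"),
    ("イチロー", "Ichiro"),
]


def vlookup_exact(key: str, rows: list[tuple[str, str]]) -> str:
    """Mimic VLOOKUP(key, range, 2, FALSE): scan top to bottom and
    return the English name from the first row whose key matches."""
    for japanese, english in rows:
        if japanese == key:
            return english
    return NOT_FOUND


def translate_column(names: list[str], rows: list[tuple[str, str]]) -> list[str]:
    """Fill the English column from the pasted Japanese column."""
    return [vlookup_exact(name, rows) for name in names]


def missing_names(names: list[str], rows: list[tuple[str, str]]) -> list[str]:
    """Names that came back #N/A and need to be added to the lookup list."""
    return [name for name in names if vlookup_exact(name, rows) == NOT_FOUND]


if __name__ == "__main__":
    pasted = ["松井秀喜", "新庄剛志"]
    print(translate_column(pasted, LOOKUP_ROWS))  # ['Hideki Matsui', '#N/A']
    print(missing_names(pasted, LOOKUP_ROWS))     # ['新庄剛志']
```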
mrackar
Posts: 38
Joined: Aug 2017
Tuesday, October 18, 2022 9:57 AM
Just wanted to note that your VLOOKUP sheet contains duplicate names, and each one has a different set of Japanese characters associated with it. VLOOKUP will only pull the first match it finds in the list.
Example:
Row 3082: Ryan Rupe - ループライアン
Row 7195: Ryan Rupe - ライアンループ
I don't know what the differences are between the two or which one you would need to pull, but that could be a potential issue. It's probably a matter of "Ryan Rupe" vs "Rupe Ryan" in Japanese.
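A quick way to audit the list is sketched below in Python, using the two rows above. It assumes the lookup is keyed on the Japanese column (Japanese names pasted in, English names out). In that case several Japanese spellings for one English name are harmless, and only a Japanese key that maps to two different English names hits VLOOKUP's first-match problem. The helper names are hypothetical.

```python
from collections import defaultdict

# The two rows quoted above: (sheet row, English name, Japanese name).
ROWS = [
    (3082, "Ryan Rupe", "ループライアン"),
    (7195, "Ryan Rupe", "ライアンループ"),
]


def spellings_per_english(rows):
    """English names that have more than one Japanese spelling."""
    groups = defaultdict(list)
    for row_no, english, japanese in rows:
        groups[english].append((row_no, japanese))
    return {english: found for english, found in groups.items() if len(found) > 1}


def conflicting_keys(rows):
    """Japanese keys that map to more than one English name.

    With a Japanese-keyed VLOOKUP these are the real problem: the
    formula silently returns whichever row comes first."""
    groups = defaultdict(set)
    for _row_no, english, japanese in rows:
        groups[japanese].add(english)
    return {japanese: sorted(names) for japanese, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    print(spellings_per_english(ROWS))
    print(conflicting_keys(ROWS))  # {}
```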
Edited on: Oct 18, 2022 - 10:00AM
BigEd76
Posts: 4,015
Joined: Nov 2016
Tuesday, October 18, 2022 10:20 AM
Yeah, the first one is "Rupe Ryan" and the second is "Ryan Rupe".
-------------------------------
* Ed * L8 * Cards in my personal Collection are unavailable *
tpxcards
Posts: 846
Joined: Jun 2019
Tuesday, October 18, 2022 11:10 AM
Word of caution: if the site rule is that card names should match the text on the card, then consider the following:
- Some Japanese cards have English names instead of Japanese names on the card front. This is true of BBM '91, '92 and '93 at least (I have most cards from those sets, but not from others), where the Japanese name appears only on the back.
- If the text on the card is in Japanese, you need to be certain that your name matches the actual printed text. Companies that print Japanese on products often do not use correct or modern-day translations; I have found this while transcribing video games. There are occasional instances where the text on an item is not "correct" and online translation tools will attempt to correct it. So even though you may have a list of names, IMO you should still verify every card rather than blindly use the name in your list.
- Google Translate often mixes up the small and full-size versions of some characters, for example:
ツ and ッ
イ and ィ
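If you want to catch these mix-ups in a pasted list, one option is to fold small kana to full size and flag spellings that become identical. The Python below is only a sketch: the small-to-full-size table covers the common small katakana and hiragana, and a flag means "check this against the card", not "this is wrong", since only the printed text decides which size is correct. The function names are illustrative.

```python
# Small kana and their full-size forms, position by position.
SMALL_KANA = "ァィゥェォッャュョヮヵヶぁぃぅぇぉっゃゅょゎ"
LARGE_KANA = "アイウエオツヤユヨワカケあいうえおつやゆよわ"
TO_LARGE = str.maketrans(SMALL_KANA, LARGE_KANA)


def fold_small_kana(text: str) -> str:
    """Replace each small kana with its full-size form (for comparison only)."""
    return text.translate(TO_LARGE)


def differs_only_in_kana_size(a: str, b: str) -> bool:
    """True when two spellings differ, but only in small vs full-size kana."""
    return a != b and fold_small_kana(a) == fold_small_kana(b)


if __name__ == "__main__":
    print(differs_only_in_kana_size("ジェフ", "ジエフ"))  # True
    print(differs_only_in_kana_size("ジェフ", "ジェフ"))  # False
```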
-------------------------------
TCDB Collection Leaderboard spots: #1 Alexei Zhamnov #1 Shane Doan #1 Phoenix Coyotes #1 Arizona Coyotes
DaClyde
Posts: 1,318
Joined: Sep 2008
Tuesday, October 18, 2022 12:48 PM
There are duplicate names in English because I've found multiple versions of those names in the checklists, and I want to catch all of them with minimal effort. Sometimes the checklists use just a last name, sometimes a first initial and last name, sometimes a full name. The last-name-only entries will be tricky, since context is needed to determine the correct Lopez, Ramirez or Matsui.
I'm not using Google Translate here; the names came directly from the checklists, the card fronts or the NPB database. It's surprising how often even those resources don't match. Even the NPB site vs the specific team sites sometimes use both Shinjo vs Shinjoh.
As to the site rules regarding what is on the card vs just identifying the players, the site is effectively based in English, so I don't see much use in going back and editing all of the old menko or Takara sets to change the main name to the Kanji.
Also, some players change how they write their name over the course of their career. I'm more concerned with identifying the player than with getting the exact characters used in that specific set in that specific year, as long as the card is linked to the correct player. That can be fixed after the fact. The larger issue is: can the card be found in the database? Yes. Daikan Yoh vs Dai-Kang Yang? Splitting hairs in the short term. Feel free to correct once someone loads the images. I'm trying to clear a 1000+ set backlog first. We can always make it better.
Edited on: Oct 18, 2022 - 1:03PM
tpxcards
Posts: 846
Joined: Jun 2019
Tuesday, October 18, 2022 1:40 PM
" Even the NPB site vs the specific team sites sometimes use both Shinjo vs Shinjoh. "
They are all correct; they are the result of different transliteration or romanisation schemes. Shinjo is correct, and Shinjoh (or Shinjiyo/Shinjyo) are also correct, because the latter versions include pronunciation hints. Romanised names like these should only be used where the card name is in Japanese and the "translated" name ends up in Note2, but you should decide which romanisation style you are going to use, so that every シンジョ or しんじょ is represented the same way (Shinjo, Shinjyo or Shinjoh) across the board.
The site I work on has a soft policy to not use any transliteration or romanisation that includes pronunciation hints, which include using ou instead of ō, etc.
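To check that a set uses one romanisation per name, you can group the romanised names by their kana spelling and flag any kana that came out more than one way. A minimal Python sketch, assuming you have (kana, romanised) pairs for a set; hiragana is folded to katakana first so that シンジョ and しんじょ count as the same spelling. The function names are illustrative only, and the check does not decide which romanisation is right, only that one was applied inconsistently.

```python
from collections import defaultdict


def to_katakana(text: str) -> str:
    """Fold hiragana to katakana; U+3041..U+3096 sit exactly 0x60 below
    their katakana counterparts."""
    return "".join(chr(ord(c) + 0x60) if "\u3041" <= c <= "\u3096" else c for c in text)


def inconsistent_romanisations(pairs):
    """Return each kana spelling that was romanised more than one way."""
    seen = defaultdict(set)
    for kana, romanised in pairs:
        seen[to_katakana(kana)].add(romanised)
    return {kana: sorted(forms) for kana, forms in seen.items() if len(forms) > 1}


if __name__ == "__main__":
    pairs = [("シンジョ", "Shinjo"), ("しんじょ", "Shinjoh"), ("マツイ", "Matsui")]
    print(inconsistent_romanisations(pairs))  # {'シンジョ': ['Shinjo', 'Shinjoh']}
```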
DaClyde
Posts: 1,318
Joined: Sep 2008
Tuesday, October 18, 2022 1:43 PM
If they appear in a published manufacturer's checklist, are used by the league, or are printed on a card, they will be attached to the player's ID as an alias. I would consider all of them valid finding aids to help a user find the player from the search box.
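As a rough model of how such aliases work as finding aids, here is a Python sketch of an alias index: every known spelling points at a player ID, and a bare last name can point at more than one player, which is where context such as team and year has to decide. The IDs and the alias table are hypothetical and do not reflect TCDB's actual data model.

```python
from collections import defaultdict

# Hypothetical alias table: player ID -> every spelling seen for that player.
ALIASES = {
    "P001": ["Tsuyoshi Shinjo", "Shinjo", "Shinjoh", "T. Shinjo"],
    "P002": ["Hideki Matsui", "Matsui", "H. Matsui"],
    "P003": ["Kazuo Matsui", "Matsui", "K. Matsui"],
}


def build_index(aliases):
    """Invert the table so any alias (case-insensitive) finds its player IDs."""
    index = defaultdict(set)
    for player_id, names in aliases.items():
        for name in names:
            index[name.casefold()].add(player_id)
    return index


def find_players(query, index):
    """Return every player ID the query matches.

    More than one result means the alias is ambiguous (e.g. a bare last
    name) and the caller must use context to pick the right player."""
    return sorted(index.get(query.casefold(), set()))


if __name__ == "__main__":
    idx = build_index(ALIASES)
    print(find_players("shinjoh", idx))  # ['P001']
    print(find_players("Matsui", idx))   # ['P002', 'P003']
```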