Ameblo API

Discussion in 'The STAGE48 Lobby' started by goratnik, Apr 19, 2016.

  1. goratnik

    goratnik Member

    Joined:
    Nov 27, 2011
    Oshimen:
    Akimoto Yasushi
    Twitter:
    goratnik
    Lately I've been bugged by idol blogs. I've scripted parsing for some of them, but now I've run into NMB48's: http://ameblo.jp/nmb48/ and I'm already scared of how many posts are missing (graduations, nooo).

    I could read the blog directly over HTTP, but with something like 40000 posts at, say, 75KB a page, that would be a lot to download (3GB on the post pages alone?!). So I thought: Ameblo is quite a popular social site, so maybe there is an API? Twitter has a very well documented one. Googling turned up some links:

    http://ameblo.jp/amebavisionapi/
    http://so-zou.jp/web-app/tech/web-api/ameba/now/
    https://developer.amebame.com/
    http://matome.naver.jp/odai/2138565026824196301

    http://www.ateliee.com/archives/2287

    The last link seems to be important, as it says something about discontinuation... It also has a template for a quite useful RSS feed, which is sadly limited to the last 20 entries(?), while I would want them all.

    Can someone take a look and tell me whether there is such a feature for Ameblo? I gather that Ameba Now is a different service; I want to know whether there is a way, for blogs only, to do this more nicely than by crawling bulky HTML pages.
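
    The download estimate above can be checked with a couple of lines. This is only a back-of-the-envelope sketch: the post count and the average page size are the rough guesses from the post, and `estimate_download_bytes` is a made-up helper name.

```python
def estimate_download_bytes(post_count, avg_page_kb):
    """Rough total transfer in bytes, counting 1 KB as 1000 bytes."""
    return post_count * avg_page_kb * 1000


# Guessed figures from the post: ~40000 posts, ~75 KB per post page.
total = estimate_download_bytes(40_000, 75)
print(f"{total / 1e9:.1f} GB")  # prints "3.0 GB"
```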
     
  2. foodlfg

    foodlfg Upcoming Girls

    Joined:
    Aug 2, 2011
    Location:
    Hungary
    Oshimen:
    yamadanoe
    Twitter:
    foodlfg
    I didn't have time to look into it that much and I'm not an API expert either, but if they have a smartphone app for Ameba / Ameblo, for example, then there is a good chance they have some kind of HTTP API too, hopefully one that is open to the public.
    I couldn't figure out from those links what kind of API they might or might not have. developer.amebame.com sounds useful, but I have no access to it.

    But even if they have an open API, what you are trying to do is probably restricted. At Twitter, for example, the API can only retrieve a limited number of tweets per request (up to 200 for user_timeline), and how many requests you can send is limited too. It also only reaches back about the last 3,200 tweets of a timeline... Not to mention that APIs nowadays require OAuth for user authentication, which further complicates the picture.
    https://dev.twitter.com/rest/reference/get/statuses/user_timeline

    Theoretically, you probably could save many of their ameblo posts but if their API is restricted too, then it would take time.

    I would use RSS even with its obvious disadvantages. This way at least you can save the new posts and their images (even the bigger ones)...
    http://feedblog.ameba.jp/rss/ameblo/nmb48/rss20.xml
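
    Reading such a feed needs nothing beyond the standard library. The sketch below assumes the feed is plain RSS 2.0 with un-namespaced `<item>`, `<title>`, `<link>` and `<pubDate>` elements (the file name suggests it, but this is not verified). `parse_rss_items` and `fetch_feed` are made-up helper names, and the sample XML, including its entry URL, is invented for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "http://feedblog.ameba.jp/rss/ameblo/nmb48/rss20.xml"


def parse_rss_items(xml_data):
    """Return one dict per <item> with its title, link and pubDate."""
    root = ET.fromstring(xml_data)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pubDate": item.findtext("pubDate", default=""),
        })
    return items


def fetch_feed(url=FEED_URL):
    """Download the raw feed bytes (not called here, needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


# Invented sample standing in for the real feed.
SAMPLE = """<rss version="2.0"><channel><title>demo</title>
<item><title>Post A</title><link>http://ameblo.jp/nmb48/entry-1.html</link>
<pubDate>Wed, 20 Apr 2016 00:00:00 +0900</pubDate></item>
</channel></rss>"""

for entry in parse_rss_items(SAMPLE):
    print(entry["pubDate"], entry["title"], entry["link"])
```

    With the real feed, `parse_rss_items(fetch_feed())` would be the call to try.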

    http://stat.ameba.jp/user_images/20160420/00/nmb48/ed/b9/j/t02200165_0640048013624427660.jpg
    http://stat.ameba.jp/user_images/20160420/00/nmb48/ed/b9/j/o0640048013624427660.jpg
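
    Comparing the two URLs above, the thumbnail name looks like the full-size name with the leading `o` replaced by `t`, eight digits (presumably the thumbnail's width and height) and an underscore. The sketch below rewrites one into the other on that assumption. The pattern is inferred from this single pair only, so it may not hold for every image; `to_original_url` is a made-up helper name.

```python
import re

# "t" + 8 digits + "_" in front of the numeric file name (assumed pattern).
THUMB_RE = re.compile(r"/t\d{8}_(\d+\.\w+)$")


def to_original_url(thumb_url):
    """Rewrite a thumbnail URL to the presumed full-size URL.

    URLs that do not match the assumed pattern are returned unchanged.
    """
    return THUMB_RE.sub(r"/o\1", thumb_url)


thumb = ("http://stat.ameba.jp/user_images/20160420/00/nmb48/ed/b9/j/"
         "t02200165_0640048013624427660.jpg")
print(to_original_url(thumb))
```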

    Anyways, it's an interesting "problem" to solve. I didn't mean to discourage you and I'm curious about this as well.


    What kind of programming languages do you use btw?
     
  3. goratnik

    goratnik Member

    Joined:
    Nov 27, 2011
    Oshimen:
    Akimoto Yasushi
    Twitter:
    goratnik
    Python3 with requests, re and sometimes API libraries. Just that! (My Google+ archiver is in Java, but honestly it's old crap I can't even maintain properly.)

    Twitter applies its API restrictions to its own site too, so it's not really a good example: Ameblo, on the contrary, doesn't limit access to any posts on the site itself. If something is missing, it's because the user has deleted it.
    If RSS is this limited, then there's not really a point in coding a parser for it, as just catching up after a day off might require direct HTML crawling anyway.
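
    One way to decide when the feed is enough: if at least one entry in the feed is already in the archive, the feed overlaps with what was saved and nothing was missed in between; if none is, there is a gap and crawling is needed. A minimal sketch of that check, with made-up helper names and placeholder link strings, assuming posts are identified by their link:

```python
def feed_covers_gap(feed_links, saved_links):
    """True if the feed overlaps the archive (at least one link already saved)."""
    saved = set(saved_links)
    return any(link in saved for link in feed_links)


def new_links(feed_links, saved_links):
    """Feed links that are not in the archive yet, in feed order."""
    saved = set(saved_links)
    return [link for link in feed_links if link not in saved]


# Placeholder identifiers, not real Ameblo URLs.
feed = ["entry-5", "entry-4", "entry-3"]
archive = ["entry-3", "entry-2", "entry-1"]

if feed_covers_gap(feed, archive):
    print("feed is enough, new posts:", new_links(feed, archive))
else:
    print("gap detected, fall back to crawling the HTML pages")
```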
     
  4. honeysenpai

    honeysenpai Kenkyuusei

    Joined:
    Jan 8, 2010
    Location:
    http://www.youtube.com/watch?v=VtwbMs_9WYk
    Twitter:
    missingno15
    If Ameblo has any APIs, then I would like to know as well.

    You are looking into preserving idol blogs right? I've been doing some research on using WARC files to create snapshots of a website during some point in time - basically what webarchive/wayback machine does for archiving a certain webpage.

    What I'd really like to do is understand how the entire archival process works and then port that into my language of choice. Unfortunately, my native language is Ruby and all the tools currently available seem to be in either Python or Java. And I don't fully understand how it works or how to implement it.
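
    For orientation, a WARC file is a sequence of records, each made of a `WARC/1.0` version line, named header fields, a blank line, the content block, and two trailing CRLFs. The sketch below builds a single `resource` record by hand with the standard library. It is a simplified illustration, not what Heritrix or Webrecorder do: those tools typically store `response` records that contain the full HTTP response, and they add further fields such as digests. `build_warc_resource_record` is a made-up helper name.

```python
import uuid
from datetime import datetime, timezone


def build_warc_resource_record(target_uri, payload, content_type="text/html"):
    """Serialize one WARC/1.0 'resource' record for the given payload bytes."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    head = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    # The content block is followed by two CRLFs that end the record.
    return head + payload + b"\r\n\r\n"


record = build_warc_resource_record("http://ameblo.jp/nmb48/", b"<html></html>")
print(record.decode("utf-8"))
```

    Appending such records to one file yields an (uncompressed) WARC; the format is simple enough that porting this part to Ruby should be straightforward.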

    I looked into using Heritrix (what the Wayback Machine uses, apparently), but the biggest challenge is archiving content generated dynamically via JS, and so far Webrecorder (as a service) seems to be the best modern solution that does this. Ameblo, though, is mostly just text, so in this case setting up Heritrix would be fine. My personal goal, however, was to archive anything and everything possible in order to do data analysis afterwards, so Google+ was something I took into heavy consideration. Perhaps you'll be able to work something out?

    https://webrecorder.io/
    https://github.com/webrecorder/webrecorder
     
