Scraping data | MATLAB | Forum

January 18, 2018
8:18 am
jacktheripper125

Silver
Members
Forum Posts: 18
Member Since:
January 18, 2018

Hi, I am trying to scrape the data from my temperature sensor. It is located at malcomp.mooo.com:1010.

I have put the temperature between span tags like the example, but it doesn't like it.

I'm sure I'm just missing something, but it just doesn't seem to get any data.

Any ideas?

 

[code]

% Scrape a website to identify the current temperature. The
% temperature is then written to another ThingSpeak channel.

% Specify the URL of the page that reports the current temperature.

url = 'http://malcomp.mooo.com:1010/index.htm';

% TODO - Replace the [] with channel ID to write data to:
writeChannelID = 405063;

% TODO - Enter the Write API Key between the '' below:
writeAPIKey = 'UGEUH8A5T1V9AFA9';

% Fetch data and parse it to find information of interest. Learn more about
% the URLFILTER function by going to the Documentation tab on the right
% side pane of this page.
temp = urlfilter(url,'Temp');

display(temp, 'Temperature');

[/code]

January 18, 2018
9:21 am
Vinod

MathWorks
Members
Forum Posts: 206
Member Since:
May 1, 2016

Looks like the site is loading some of the DOM dynamically. You will need to do this in two steps:

1) Create a ThingHTTP app

Set the URL to: http://malcomp.mooo.com:1010

Set the Parse String to: //span[2]/text()

Now you can hit this ThingHTTP from a device or from MATLAB, and the result will be the text in the second <span> on the page.

2) Create a MATLAB Analysis app with this code:

opts = weboptions('Timeout',15);
data = webread('INSERT API URL FROM STEP 1',opts)
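
A minimal sketch of how the text from step 2 might be turned into a number. The sample string is an assumption standing in for the ThingHTTP response (a number padded with whitespace), and the webread call is left commented so the snippet runs offline.

```matlab
% Assumed sample of the text ThingHTTP returns for //span[2]/text():
% a number surrounded by newlines and spaces.
raw = sprintf('\n  18\n  ');

% In the MATLAB Analysis app the text would come from the request instead:
% opts = weboptions('Timeout',15);
% raw  = webread('INSERT API URL FROM STEP 1',opts);

% Strip the surrounding whitespace and convert the remaining text to a number.
temperature = str2double(strtrim(raw));
disp(temperature)
```

If the text is not a plain number, str2double returns NaN, which is worth checking before writing anything to a channel.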

January 18, 2018
11:01 am
jacktheripper125

I have no idea how the second part works, but it does. Now I need to push the data to the fields, but the examples given don't make sense to me.

I have tried the code below, but it is so far off it is probably laughable.

I am trying to get the data into the field lounge_Temp, and once I have this I will try to get it to do the 4 other sensors.

Thank you for your help, I'm relieved it is at least scraping.

 

[code]

% Enter your MATLAB Code below

opts = weboptions('Timeout',15);
data = webread('https://api.thingspeak.com/apps/thinghttp/send_request?api_key=PC4EOCTMNDPPQOSW',opts)
writeChannelID = 405063;

% TODO - Enter the Write API Key between the '' below:
writeAPIKey = 'blahblahblah';

% Fetch data and parse it to find information of interest. Learn more about
% the URLFILTER function by going to the Documentation tab on the right
% side pane of this page.
lounge_Temp = urlfilter(numbers);

display(lounge_Temp, 'Temp');

% Write the temperature data to another channel specified by the
% 'writeChannelID' variable

display(['Note: To successfully write data to another channel, ',...
'assign the write channel ID and API Key to ''writeChannelID'' and ',...
'''writeAPIKey'' variables above. Also uncomment the line of code ',...
'containing ''thingSpeakWrite'' (remove ''%'' sign at the beginning of the line.)'])

% Learn more about the THINGSPEAKWRITE function by going to the Documentation tab on
% the right side pane of this page.

% thingSpeakWrite(writeChannelID, tempF, 'Writekey', writeAPIKey);

[/code]

January 18, 2018
11:01 am
Vinod

Just clarifying the reason you were unable to use WEBREAD to get to the data: the website was serving it on port 1010. This is a non-standard port for HTTP web servers, and MATLAB running in the cloud blocks access to non-standard ports.

By using the ThingHTTP app we were able to put a redirection from the normal port (port 80) to the non-standard port (1010) on the website serving the data.

January 18, 2018
11:30 am
jacktheripper125

Ohhh!

I only picked 1010 because I could remember it. 80 is being used, and a few others, so I just picked 1010 as it isn't used by any of my software.

What is the best port to use other than 80? I will move it to that port then.

Once I have done that I would like to scrape this data from the page into the 4 fields. Can you help with the code to do it?

My fields are: 1 lounge_temp, 2 Loft_temp, 3 Lounge_humid, 4 Loft_humid.

Thank you for your help. I think I shot myself in the foot!

The page source looks like this:


  The Fan is 0<br>
  <span>Temp: </span>
  <span>
  18
  </span>
  <span>Temp2: </span>
  <span>
  7
  </span>
  <span>humidity: </span>
  <span>
  60
  </span>
  <span>humidity2: </span>
  <span>
  81
  </span>
  <br>
  Click <a href="/H">here</a> turn the Fan on<br>
  Click <a href="/L">here</a> turn the Fan off<br>

January 18, 2018
11:43 am
Vinod

If you're not using port 80, you will need to use ThingHTTP to essentially proxy the data from port 1010 to port 80. As shown in the example above, the temperature, 18, is in the second span, which is why I set the Parse String to extract the value in the second <span>.

You can create any number of ThingHTTPs, one for each field.

Say you want humidity2: change the Parse String to

 //span[8]/text()

This parses the text and pulls the value within the 8th span.
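
The indexing rule can be sketched in a few lines: on the page source posted above, every label <span> is followed by its value <span>, so reading k sits in span 2*k. The variable names here are illustrative only and are not part of ThingHTTP.

```matlab
% Readings in the order they appear in the page source.
labels = {'Temp', 'Temp2', 'humidity', 'humidity2'};

% Each label <span> is followed by its value <span>, so reading k is in span 2*k.
parseStrings = cell(size(labels));
for k = 1:numel(labels)
    parseStrings{k} = sprintf('//span[%d]/text()', 2*k);
    fprintf('%-10s -> %s\n', labels{k}, parseStrings{k});
end
```

Each generated string would go into the Parse String box of its own ThingHTTP.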

 

Hope this helps explain the concept so you can modify accordingly.

January 18, 2018
12:02 pm
jacktheripper125

Super. 3 questions please, I'm nearly there.

1. How often will the ThingHTTP or the app poll for information, and where do I change it?

2. Can I remove the spans and just look for "temp:" before the temperature and "temp2:" before the second? This way I can remove the spans from the code, saving some software space, as I'm pushing my luck on it.

3. How do I get that info to the correct fields in the analysis?

I am aware of my stupidity with this, but I'm so upside down with C+ my brain isn't coping well.

Thank you for your patience and help.

January 18, 2018
1:02 pm
jacktheripper125

OK, I'm nearly, nearly there.

1. I set that, easy to answer.

2. Yes, I can.

3. Still can't figure it out.

My output is now as shown below, but I can't figure out the command to take the 4 numbers and fill them into the four fields.

Could anyone tell me what is needed? I have got part way with thingSpeakWrite(xxxxxx, analyzedData, 'WriteKey', xxxxxxxxxx);

It's the analyzedData bit I'm stuck on.

 

data =

'
17
6
61
83
'

January 18, 2018
1:50 pm
cstapels

Moderator
Members
Forum Posts: 258
Member Since:
March 7, 2017

Have a look at the documentation for thingSpeakWrite. There are some good examples there.

I think you want to use the name-value pair 'Fields':

thingSpeakWrite(xxxxxx,'values', analyzedData, 'WriteKey', xxxxxxxxxx,'fields',[1 2 3 4]);

You may have to transpose analyzedData, depending on what shape it is in. Use an apostrophe after the variable to transpose it: analyzedData'
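
As a rough sketch of that idea, assuming the ThingHTTP response is the four-line string posted above, the text can be split into a 1-by-4 numeric row vector before writing. The channel ID and key variables are placeholders, and the write call is commented out so the snippet runs without a channel.

```matlab
% Assumed response: four readings, one per line, with blank first and last lines.
data = sprintf('\n17\n6\n61\n83\n');

% Trim the ends, split on whitespace, then convert each piece to a number.
pieces = strsplit(strtrim(data));
analyzedData = str2double(pieces);   % 1-by-4 numeric row vector

% If the values ever end up in a column, the apostrophe turns them into a row:
% analyzedData = analyzedData';

% Placeholder channel ID and key; uncomment inside a MATLAB Analysis app.
% thingSpeakWrite(writeChannelID, 'Fields', [1 2 3 4], ...
%     'Values', analyzedData, 'WriteKey', writeAPIKey);
disp(analyzedData)
```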

January 18, 2018
1:59 pm
jacktheripper125

Hi, thank you for your reply.

I looked at the examples over and over and thought they would work, but they all seem to be of this sort:

thingSpeakWrite(17504,[2.3,1.2,3.2,0.1],'WriteKey','23ZLGOBBU9TWHG2H')

Perfect, I thought, so off I went to use it, and realised it was just filling in the literal 2.3 into the first field and so on, and I don't know how to turn the output into a string. I tried your command but it said:

Undefined function or variable 'analyzedData'.

Is no one else doing this, scraping data off a webpage and putting it into ThingSpeak?
A big thank you for your help.

January 19, 2018
1:29 pm
cstapels

Can you show the value, or some sample values, you have for analyzedData? Or show the command where you scrape the web data?

You need to be pretty explicit about format so that ThingSpeak can put stuff where you want it.

Perhaps consider this format:

thingSpeakWrite(17504,'Fields',[1,4,6],'Values',{2.3,'on','good'},'WriteKey','23ZLGOBBU9TWHG2H')

The result of the above will be an entry as follows:

Time          field1  field2  field3  field4  field5  field6  field7  field8  status  location
{write time}  2.3                     on              good

 

Is that the effect you are looking for?

January 19, 2018
1:48 pm
jacktheripper125

Kind of. I saw an example like that, but none of the examples I see take values from the output and put them into the channel. They all hard-code the values in the code, and I don't see the point in that. The output it sees is:

data =

'Lounge_Temp
17
Loft_Temp
6
Lounge_Humidity
64
Loft_Humidity
88'

I have fields with the names Lounge_Temp, Loft_Temp, etc., so I need each value put into its field. Every time I try to do this I get an error.

I would have thought this is a normal thing to do, but I can't find any examples.

Hopefully you can help?

January 19, 2018
4:31 pm
jacktheripper125

If anyone is interested, I fixed it. Rather than trying to do all the data in one go and feed it to the channel, I made 4 MATLAB apps and have set them to keep requesting data.

This is the code that worked.

 

% Template MATLAB code for reading numeric data from a webpage, analyzing
% the data and storing the analyzed data in a channel.

% Prior to running this MATLAB code template, assign the url for the
% webpage to scrape to the 'url' variable. Also assign the target string to
% search for in the web page to the 'targetString' variable.

% To store the scraped data, you will need to write it to a channel other
% than the one you are reading data from. Assign this channel ID to the
% 'writeChannelID' variable. Also assign the write API Key to the
% 'writeAPIKey' variable below. You can find the write API Key in the right
% side pane of this page as well.

% TODO - Specify URL of the page to read data from
url = 'https://api.thingspeak.com/apps/thinghttp/send_request?api_key=xxxxxxxxxxxxxxx';
% TODO - Specify the target string to search in webpage
targetString = 'Lounge_Temp';

% TODO - Replace the [] with channel ID to write data to:
writeChannelID = xxxxxx;
% TODO - Enter the Write API Key between the '' below:
writeAPIKey = 'xxxxxxxxxxx';

%% Scrape the webpage %%
data = urlfilter(url, targetString);
display(data);

%% Analyze Data %%
% Add code in this section to analyze data and store the result in the
% analyzedData variable.
analyzedData = data;

%% Write Data %%
thingSpeakWrite(writeChannelID, {analyzedData,'Lounge_Temp'}, 'WriteKey', writeAPIKey);

January 19, 2018
9:33 pm
Vinod

You can use a single MATLAB analysis to scrape your data, parse the different fields and write it to 4 fields of a channel, or 4 different channels, if that is what you wish.

I was trying to write the example for you, but your website at malcomp.mooo.com:1010 does not seem to be up.

January 19, 2018
11:39 pm
Vinod

I set up a ThingHTTP that reads the #T1 element on your page to get its data.

Here's a simple MATLAB Analysis app example that parses the data and writes it into fields 1 through 4 of your channel in a single write.

opts = weboptions('Timeout',18);
data = webread('https://api.thingspeak.com/apps/thinghttp/send_request?api_key=USE YOUR THINGHTTP API KEY HERE',opts);
d2 = strsplit(data, char(13));
LoungeTemp = str2double(regexprep(d2{2},'[^0-9]', ''));
LoftTemp = str2double(regexprep(d2{4},'[^0-9]', ''));
LoungeHumidity = str2double(regexprep(d2{6},'[^0-9]', ''));
LoftHumidity = str2double(regexprep(d2{8},'[^0-9]', ''));
thingSpeakWrite(YOURCHANNELIDHERE, [LoungeTemp, LoftTemp, LoungeHumidity, LoftHumidity], 'WriteKey', 'USE YOUR CHANNEL WRITE API KEY HERE');

 

If you're new to MATLAB, you can look up what functions like STRREP, STRSPLIT, STR2DOUBLE, REGEXPREP, etc. do in the MATLAB documentation.

February 6, 2018
6:10 pm
jacktheripper125

WOW!

Thank you.

It works. I would never have got to that point; too many syntax traps for me in that code!

Thank you, thank you, thank you!

February 28, 2018
5:04 pm
jacktheripper125

After trying this code I've found it doesn't show minus figures anymore.

Do you know why?

February 28, 2018
5:30 pm
Vinod

The example code splits the returned string at carriage returns (d2 = strsplit(data, char(13));) and then, in each piece, replaces every character that is not in [0-9] with ''. This converts '-123' into 123. To keep the '-' sign, change the regular expression to

regexprep(d2{4},'[^\-0-9]', '')

That replaces everything that is not '-', '0', '1', ..., '9' with an empty string, so the '-' sign survives. To keep decimal points as well, change it to

regexprep(d2{4},'[^\-.0-9]', '')
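
To see the difference between the three patterns, here is a small sketch run on a made-up reading. The sample string is an assumption for illustration, not output from the sensor page.

```matlab
% Hypothetical line containing a negative, fractional reading.
sample = '  -12.5 ';

% Original pattern: keeps digits only, so the sign and the point are lost.
digitsOnly = regexprep(sample, '[^0-9]', '');

% Keeps the minus sign but still drops the decimal point.
keepSign = regexprep(sample, '[^\-0-9]', '');

% Keeps the minus sign and the decimal point.
keepSignAndPoint = regexprep(sample, '[^\-.0-9]', '');

fprintf('%s | %s | %s\n', digitsOnly, keepSign, keepSignAndPoint);
value = str2double(keepSignAndPoint);
```

Note that these patterns keep a '-' or '.' wherever it appears in the line, so they only give sensible numbers when the line holds a single reading.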

March 27, 2018
6:55 pm
jacktheripper125

Amazing! How do you know this stuff?

Unfortunately, due to me being useless, I broke it.

I think it had something to do with me changing the name from LoungeTemp to Lounge_Temp. I have now changed both back, but I still get the error:

 

Index exceeds array bounds.

Error in Main_Sensor_Data (line 4)
LoungeTemp = str2double(regexprep(d2{2},'[^\-.0-9]', ''));

 

opts = weboptions('Timeout',18);
data = webread('https://api.thingspeak.com/apps/thinghttp/send_request?api_key=xxxxxxxxxxxxx',opts);
d2 = strsplit(data, char(13));
LoungeTemp = str2double(regexprep(d2{2},'[^\-.0-9]', ''));
LoftTemp = str2double(regexprep(d2{4},'[^\-.0-9]', ''));
LoungeHumidity = str2double(regexprep(d2{6},'[^\-.0-9]', ''));
LoftHumidity = str2double(regexprep(d2{8},'[^\-.0-9]', ''));
thingSpeakWrite(xxxxxx, [LoungeTemp, LoftTemp, LoungeHumidity, LoftHumidity], 'WriteKey', 'xxxxxxxxxxxx');
display(data);

 

My channel info is:

Name: Main_Sensor_Data
Channel ID: 4xxxxxxxxx
Access: Public
Read API Key: Oxxxxxxxxx
Write API Key: ZxxxxxxxxxF
Fields:
      1: LoungeTemp 
      2: LoftTemp 
      3: LoungeHumidity 
      4: LoftHumidity 

 

As usual, thank you for helping me out 🙂

March 27, 2018
11:20 pm
Vinod

I think there is something else going on. Your server at http://malcomp.mooo.com:1010 has become terribly slow and takes a long time to render. Additionally, there are changes in the server which cause it to sometimes return "-1" for the rendered text. To confirm this, open a new Chrome tab and hit F12. You will see the Chrome debugger. Now click on the "Network" tab of the debugger. Now, in the address bar, type: http://malcomp.mooo.com:1010. You will see that the page takes more than 15 s to render! This will likely cause your MATLAB code to time out, since as a free user your MATLAB code has to execute in less than 20 seconds.

I don't believe the solution to what you are looking for is on the ThingSpeak end. You would need to modify the code on your server that is rendering the page at http://malcomp.mooo.com:1010. 

Good luck!
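
On the MATLAB side, a guard can at least keep the analysis from failing with "Index exceeds array bounds" when the response comes back short. This is a minimal sketch, assuming the response format posted earlier in the thread; the two sample strings and the variable names are illustrative, and the network and write calls are commented out so it runs offline.

```matlab
% Two assumed responses: a complete one, and the short "-1" the server sometimes returns.
full = sprintf('Lounge_Temp\r17\rLoft_Temp\r6\rLounge_Humidity\r64\rLoft_Humidity\r88');
responses = {full, '-1'};
wrote = false(1, numel(responses));

for r = 1:numel(responses)
    data = responses{r};   % in the app: data = webread(thingHttpUrl, opts);

    % Split on CR, LF or CRLF and keep only the lines that are a bare number.
    lines = strtrim(regexp(data, '\r\n|\r|\n', 'split'));
    hits = regexp(lines, '^-?\d+(\.\d+)?$', 'match', 'once');
    values = str2double(lines(~cellfun(@isempty, hits)));

    if numel(values) == 4
        fprintf('Response %d: writing %s\n', r, mat2str(values));
        % thingSpeakWrite(writeChannelID, values, 'WriteKey', writeAPIKey);
        wrote(r) = true;
    else
        fprintf('Response %d: expected 4 readings, got %d; skipping write\n', r, numel(values));
    end
end
```

This does not make the server any faster; it only means a bad response is skipped instead of raising an error.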
