ShaggyHare67 could you send the console log that trains-agent outputs when you run it?
Now the trains-agent is running my code, but it is unable to import trains
Do you have the package "trains" listed under "installed packages" in your experiment?
EnthusiasticCoyote30 you can register an existing Model with:
from clearml import InputModel
model = InputModel.import_model(weights_url="...")
Oh, this is so that internally the background thread can signal it is not deferred. Are you saying there is a bug, or that the code is odd?
Yep, this is a limitation of the "low tier" G instances... I guess they want you to switch to the P instances?
Which G instance are you using?
Hi MammothGoat53
Basically, what you are missing is the header carrying the token you have:
https://blog.logrocket.com/secure-rest-api-jwt-authentication/
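A minimal sketch, assuming the API expects a standard JWT bearer token as in the linked post: the token travels in an "Authorization: Bearer <token>" header on every request. The endpoint URL and token below are placeholders, and build_authorized_request is a hypothetical helper; the request is only built here, not sent.

```python
import urllib.request


def build_authorized_request(url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries a JWT bearer token."""
    # "Authorization: Bearer <token>" is the conventional way to pass a JWT
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Placeholder endpoint and token, for illustration only
req = build_authorized_request("https://api.example.com/items", "eyJhbGciOi...")
print(req.get_header("Authorization"))
# urllib.request.urlopen(req) would actually send the request
```

Any HTTP client works the same way; the only requirement is that the header is present on each call.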
I think I found something, let me test my theory
BattyLion34 I have a theory: I think that any Task on the "default" queue will fail if a Task is running on the "service" queue.
Could you create a toy Task that just prints ".", sleeps for 5 seconds, and then prints again?
Then, while that Task is running, launch from the UI the Task that previously passed on the "default" queue. If my theory holds it should fail, and then we will be getting somewhere.
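A minimal sketch of such a toy Task. It assumes the print/sleep cycle is repeated so the Task stays alive long enough to launch the other one; the iteration count and the project/task names are hypothetical. The clearml import sits inside main() so the loop itself can be tried without clearml installed.

```python
import time


def toy_loop(iterations: int = 60, sleep_seconds: float = 5.0, sleep=time.sleep) -> int:
    """Print '.', sleep, and repeat; returns how many dots were printed."""
    for _ in range(iterations):
        print(".", flush=True)
        sleep(sleep_seconds)
    return iterations


def main() -> None:
    # Lazy import: only needed when running as an actual ClearML Task
    from clearml import Task

    # Hypothetical project/task names; any values work
    Task.init(project_name="debug", task_name="toy sleep task")
    toy_loop()
```

Call main() from the script's entry point, enqueue the resulting Task on the "service" queue, and launch the other Task on "default" while this one is still printing.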
The trend step artifact is used to keep track of the time index of the data, so we know the expected trend of the input data. For example, for the first data point (trend_step = 1) the trend value is 10; then, for trend_step = 10 (the tenth data point), our regressor predicts the trend value at the selected trend_step. This method is still being researched to make it more efficient, so that it doesn't need to upload an artifact on every request.
Makes sense! I would suggest opening a GitHub issue with a feature request...
Yes, as long as the client is served from http://app.something.com it will look for the api server at http://api.something.com
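To make the subdomain convention concrete, here is a small sketch that swaps a leading "app." host for "api.". The function name is hypothetical and this is not the web client's actual code; it only mirrors the mapping described above, keeping the scheme and port and dropping any path.

```python
from urllib.parse import urlsplit, urlunsplit


def api_url_from_app_url(app_url: str) -> str:
    """Map http(s)://app.<domain>[:port] to http(s)://api.<domain>[:port]."""
    parts = urlsplit(app_url)
    host = parts.hostname or ""
    if not host.startswith("app."):
        raise ValueError(f"expected an 'app.' host, got {host!r}")
    netloc = "api." + host[len("app."):]
    if parts.port:
        netloc += f":{parts.port}"  # keep a non-default port if one was given
    # Path, query and fragment are dropped: only the server root is derived
    return urlunsplit((parts.scheme, netloc, "", "", ""))


print(api_url_from_app_url("http://app.something.com"))
```

If the web app is served from a host that does not start with "app.", this derivation has nothing to work with, which is why the naming matters.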
Strange...
If you want to quickly test it:
pip install clearml-agent
Then, assuming the Task ID is aabbcc, run:
clearml-agent execute --id aabbcc
You should be able to trace whether the package was installed.
Mostly by using Task.create instead of Task.init.
UnevenDolphin73, now I'm confused. Task.create is not meant to be used as a replacement for Task.init; it lets you manually create an additional Task (not the current process's Task). How are you using it?
Regarding the second - I'm not doing anything per se. I'm running in offline mode and I'm trying to create a dataset, and this is the error I get...
I think the main thing we need to...
Hi MistyFly99
Notice that the files server needs to have an "address" that can be accessed from the browser. Data is stored in a federated manner, which means your browser accesses the files server directly, not through the API server. I'm assuming the address is not valid?
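A rough heuristic sketch for spotting a files server address that a remote browser probably cannot reach. The helper name is hypothetical, and the rules (loopback addresses, single-label hostnames such as a docker service name) are assumptions; the definitive check is simply opening the files server URL from the browser machine.

```python
import ipaddress
from urllib.parse import urlsplit


def files_server_warnings(files_server_url: str) -> list:
    """Return hints that the files server address may not be browser-reachable."""
    host = urlsplit(files_server_url).hostname
    if not host:
        return ["no hostname in URL"]
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # not an IP literal, treat it as a DNS name
    warnings = []
    if host == "localhost" or (ip is not None and ip.is_loopback):
        warnings.append("loopback address: only reachable from the server machine itself")
    elif ip is None and "." not in host:
        warnings.append("single-label hostname (e.g. a docker service name): "
                        "browsers on other machines usually cannot resolve it")
    return warnings


print(files_server_warnings("http://localhost:8081"))
```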
BoredGoat1 where exactly do you think that happens?
https://github.com/allegroai/trains/blob/master/trains/utilities/gpu/gpustat.py#L316 ?
https://github.com/allegroai/trains/blob/master/trains/utilities/gpu/gpustat.py#L202
Correct, the serving Task ID is the clearml-serving session. It is the instance that holds all the information about this specific setup and its models.
Hi IdealMole15
I'm assuming you mean on a remote machine with clearml-agent running?
If so, you can either use clearml-task to create a Task (Job) and specify the container and script, or click on "Create New Experiment" in the UI, fill in the git repo / script, and specify the docker image.
Makes sense?
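The same thing can be done from Python with Task.create, which creates an additional Task (it does not attach to the current process) that an agent can then pull from a queue. A minimal sketch: the project name, task name, repository URL, script and image are placeholders, and the clearml import is kept inside the function so the module loads without clearml installed.

```python
def create_docker_task(queue_name: str = "default"):
    """Create a Task from a repo/script with a container image, then enqueue it."""
    # Lazy import: clearml is only needed when the function is actually called
    from clearml import Task

    task = Task.create(
        project_name="examples",                     # placeholder project name
        task_name="remote docker job",               # placeholder task name
        repo="https://github.com/<org>/<repo>.git",  # placeholder repository
        script="train.py",                           # placeholder entry point
        docker="python:3.10",                        # placeholder container image
    )
    # An agent listening on this queue picks the Task up and runs it
    Task.enqueue(task, queue_name=queue_name)
    return task
```

For the image to actually be used, the agent serving that queue needs to run in docker mode.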
Does this file look familiar to you?
file not found: archive/constants.pkl
Hi ZanySealion18
ClearML (remote execution) sometimes doesn't "pick up" the GPU. After I rerun the task it picks it up.
What do you mean by "does not pick up"? Is it that the container is up but was not executed with --gpus, so there is no GPU access?
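One way to narrow this down is to log a few GPU visibility signals at the very start of the task and compare a "no GPU" run with a good rerun. A sketch under assumptions: the helper is hypothetical, and the signals chosen (nvidia-smi on PATH, /dev/nvidia* device nodes, the visible-devices environment variables) are typical indicators, not a ClearML feature. Missing device nodes in the bad run would be consistent with the container starting without --gpus.

```python
import glob
import os
import shutil


def gpu_visibility_report() -> dict:
    """Collect quick signals on whether this process can see NVIDIA GPUs."""
    return {
        # Path to nvidia-smi if it is on PATH, otherwise None
        "nvidia_smi": shutil.which("nvidia-smi"),
        # Device nodes such as /dev/nvidia0 normally exist only when GPUs are passed in
        "device_nodes": sorted(glob.glob("/dev/nvidia[0-9]*")),
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
    }


print(gpu_visibility_report())
```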
Oh, I think I understand what's going on, QuaintJellyfish58. Let me check how to update the cron scheduler while it is running (I really like this idea, so if this is not already supported I'd like us to add this capability)
Yeah, I think this kind of makes sense to me. Any chance you can open a GH issue for this feature request?
I'll make sure we have conda ignore git:// packages, and pass them to the second pip stage.
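The idea can be illustrated with a small sketch that splits a requirements list into entries conda could try to resolve and git-based entries that are deferred to the pip stage. This is a hypothetical helper showing the concept, not clearml-agent's implementation; the matching rules for git references are assumptions.

```python
def split_requirements(lines):
    """Return (conda_candidates, pip_only); git-based entries go to pip_only."""
    conda_candidates, pip_only = [], []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # git:// and git+ URLs, plus "name @ git+..." direct references, are not conda packages
        if line.startswith(("git://", "git+")) or "@ git+" in line or "@ git://" in line:
            pip_only.append(line)
        else:
            conda_candidates.append(line)
    return conda_candidates, pip_only
```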
BTW:
Just making sure: 74 was not supposed to be the last checkpoint? (In other words, it is not stuck on leaving the training process, but actually in the middle.)
Found the issue, fix in the next RC (soon to be out)
You can see in the log that it tries to download an artifact from a specific IP:URL. Is that link a valid one?
(this seems like the main cause of the error, first line in the screenshot)
Could it be there is a Task.init being called Before this code snippet ?
Also, for a single parameter you can use:
cloned_task.set_parameter(name="Args/artifact_name", value="test-artifact", description="my help text that will appear in the UI next to the value")
This way, you are not overwriting the other parameters, you are adding to them.
(Similar to update_parameters, only for a single parameter.)
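The difference between adding one parameter and overwriting the whole set can be modeled with plain dictionaries. This is only an illustration of the semantics described above, not ClearML code; both functions are hypothetical, and the "Args/epochs" entry is a made-up extra parameter.

```python
def overwrite_parameters(current: dict, new: dict) -> dict:
    """Model of replacing the whole parameter set: anything not in `new` is lost."""
    return dict(new)


def set_single_parameter(current: dict, name: str, value) -> dict:
    """Model of setting one parameter: every other existing parameter is kept."""
    updated = dict(current)
    updated[name] = value
    return updated


params = {"Args/artifact_name": "old-artifact", "Args/epochs": "10"}
print(set_single_parameter(params, "Args/artifact_name", "test-artifact"))
print(overwrite_parameters(params, {"Args/artifact_name": "test-artifact"}))
```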
You need trains-server support for this: if trains v0.15 is working with an older backend, it will revert to the "training" type.
How are you starting the agent?