DefiantHippopotamus88 you are sending the curl to the wrong port; it should be 9090 (based on what I remember from the unified docker compose) on your setup
DefiantHippopotamus88
HTTPConnectionPool(host='localhost', port=8081)
This will not work, because inside the container of the second docker compose "fileserver" is not defined:
CLEARML_FILES_HOST=" "
You have two options:
1. Configure the docker compose to use network host on all containers (as opposed to the isolated mode they are now running in).
2. Configure all of the CLEARML_* to point to the Host IP address (e.g. 192.168.1.55), then rerun the entire thing.
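For option 2, a minimal sketch of what that could look like from the Python side, assuming the default clearml-server ports (8008/8080/8081) and an illustrative host IP:
```python
import os

# Illustrative host IP -- replace with your machine's actual address.
# Point all CLEARML_* endpoints at the host IP instead of the isolated
# docker network (8008/8080/8081 are the default clearml-server ports).
os.environ["CLEARML_API_HOST"] = "http://192.168.1.55:8008"
os.environ["CLEARML_WEB_HOST"] = "http://192.168.1.55:8080"
os.environ["CLEARML_FILES_HOST"] = "http://192.168.1.55:8081"

from clearml import Task

task = Task.init(project_name="debug", task_name="connectivity-check")
```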
DeterminedToad86 I suspect that since it was executed on SageMaker it registered a specific package that is unique to SageMaker (not to worry, installed packages can be edited after you clone/reset the Task)
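If you'd rather do that from code than from the UI, a rough sketch (assuming a clearml SDK version that exposes Task.set_packages; the task ID and package list are illustrative):
```python
from clearml import Task

# Clone the original Task, then overwrite the recorded requirements
# so the SageMaker-specific package is dropped.
cloned = Task.clone(source_task="<original_task_id>", name="clone without sagemaker deps")
cloned.set_packages(["clearml", "scikit-learn==1.3.2"])  # illustrative list
```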
which to my understanding has to be given before a call to an argparser,
SmarmySeaurchin8 You can call argparse before Task.init, no worries; it will catch the arguments, and trains-agent will be able to override them :)
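A minimal sketch of that ordering (the argument itself is illustrative):
```python
import argparse

from clearml import Task

# argparse is called *before* Task.init
parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)
args = parser.parse_args()

# Task.init still picks the arguments up automagically, and an agent
# (trains-agent) can override them when the Task is executed remotely.
task = Task.init(project_name="examples", task_name="argparse-before-init")
```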
Yep it is the scale 🙂 and yes it should appear once you upgrade
basically it would allow blocking the machine from being scaled-in while a Task is still running
Oh this is what I was missing 🙂 That makes sense to me!
So what you are saying is: when the AWS autoscaler agent launches a Task, inside the container you will set the "protection flag", and when the Task ends, you will unset the "protection flag".
Is that correct?
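If that is the idea, a rough sketch of toggling such a flag with boto3 (the Auto Scaling group name and instance ID are illustrative, not taken from the actual autoscaler code; this assumes the agent machine belongs to an ASG):
```python
import boto3

autoscaling = boto3.client("autoscaling")

def set_scale_in_protection(protected: bool) -> None:
    # Illustrative identifiers -- replace with your real ASG / instance
    autoscaling.set_instance_protection(
        AutoScalingGroupName="clearml-agents-asg",
        InstanceIds=["i-0123456789abcdef0"],
        ProtectedFromScaleIn=protected,
    )

set_scale_in_protection(True)   # when the Task starts
# ... Task runs ...
set_scale_in_protection(False)  # when the Task ends
```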
If you passed the correct path it should work (if it fails it would have failed right at the beginning).
BTW: I think it is clearml-agent --config-file <file here> daemon ...
Click on the "k8s_schedule" queue, then on the right-hand side you should see your Task; click on it and it will open the Task page. There, click on the "Info" tab and look for "STATUS MESSAGE" and "STATUS REASON". What do you have there?
I'll make sure they get back to you
JitteryCoyote63 no, you should not (unless you already have the Task.init call in your code). clearml-data adds the Task.init call at the beginning of the code in the entry point.
This means you should be able to get Task.current_task() and get back the object.
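So from within the executed code, something like this should work:
```python
from clearml import Task

# The Task.init call injected at the entry point makes the Task
# available as the current task anywhere in the process:
task = Task.current_task()
print(task.id, task.name)
```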
What do you have under the "uncommitted changes" on the Task that was created?
UnevenDolphin73 clearml.config.get_remote_task_id() will return the Task ID, not the Task object. In order to get automagic to work, one h...
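If the Task object is what you need, a small sketch of bridging the two (using the standard Task.get_task lookup):
```python
from clearml import Task
from clearml.config import get_remote_task_id

# get_remote_task_id() only returns the ID string;
# fetch the full Task object from the server explicitly:
task = Task.get_task(task_id=get_remote_task_id())
```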
Hi DisturbedWalrus17
This is a bit of a hack, but it will work:
```python
from clearml.backend_interface.metrics.events import UploadEvent

UploadEvent._file_history_size = 10
```
Maybe we should expose it somewhere, what do you think?
If the load balancer / Gateway can do the computation and leverage caching,
Oh that's true. But unfortunately out of scope for the open-source (well, at the end of the day someone needs to pay our salaries 🙂)
I'd prefer not to have our EC2 instance directly exposed to the public Internet.
Yep, I tend to agree 🙂
Yes
Are you trying to upload_artifact to a Task that is already completed?
Hi UptightBeetle98
The hyperparameter example assumes you have agents (trains-agent) connected to your account. These agents will pull the jobs from the queue (which they are now, aka pending), set up the environment for the jobs (venv or docker+venv), and execute the job with the specific arguments the optimizer chose.
Make sense?
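For context, a minimal sketch of how such an optimization is typically wired up (the base task ID, parameter names, and metric are illustrative; it assumes agents are listening on the "default" queue):
```python
from clearml.automation import (
    DiscreteParameterRange,
    HyperParameterOptimizer,
    UniformParameterRange,
)

optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",  # the Task cloned for every job
    hyper_parameters=[
        UniformParameterRange("Args/lr", min_value=1e-4, max_value=1e-1),
        DiscreteParameterRange("Args/batch_size", values=[32, 64, 128]),
    ],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    execution_queue="default",  # the queue the agents pull jobs from
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```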
Hi IcySwallow94
Are you deploying the clearml server with the helm chart?
The second seems like a botocore issue:
https://github.com/boto/botocore/issues/2187
Awesome! Thank you so much!
1.0.2 will be out in an hour
RobustSnake79 this one seems like a scalar-type graph + summary table, correct?
BTW: I'm not sure how to include the "Recommendation" part 🙂
TenseOstrich47 makes sense 🙂
First, that is awesome to hear, PanickyFish98!
Can you send the full exception? You might be on to something...
2. Actually we thought of it, but could not find a use case, can you expand?
3. I'm not sure I follow, do you mean you expect the first execution to happen immediately?
481.2130692792125 seconds
This is very slow.
It makes no sense; it cannot be the network (this is basically an HTTP POST, and I'm assuming both machines are on the same LAN, correct?)
My guess is the filesystem on the clearml-server... Are you having any other performance issues?
(I'm thinking HD degradation, which could lead to slow write speeds, which would affect the Elastic/Mongo as well)
Thank you @<1523720500038078464:profile|MotionlessSeagull22>, always great to hear 🙂
btw, if you feel like sharing your thoughts with us, consider filling out our survey; it should not take more than 5 minutes
Hi @<1674588542971416576:profile|SmarmyGorilla62>
You mean on your elastic / mongo local disk storage?
Hmm, check if this one works:
```python
optimizer._get_child_tasks_ids(
    parent_task_id=optimizer._job_parent_id or optimizer._base_task_id,
    order_by=optimizer._objective_metric._get_last_metrics_encode_field(),
    additional_filters={'page_size': int(top_k), 'page': 0},
)
```
If it does, let's PR it as a dedicated function
do you suggest deleting those first?
It might make it easier on the server (I think there is some bug there: when deleting many tasks it tries to parallelize the delete process but fails to properly sync; anyhow, this is fixed and will be pushed with the next clearml-server version)
Hi CrookedWalrus33
I think this happens if you are already logged in and you pressed the "signup" tab instead of the "login" tab (the frontend team is working on a solution)
In the meantime just make sure you are clicking on the "login" tab
I want that last Python program to be executed with the environment that was created by the agent for this specific task
Well, basically they all inherit the Python environment that points to the venv they started from, so at least in theory it should be transparent when the agent spins up the initial process.
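One way to make that explicit when launching the last program yourself (the script name is illustrative): using sys.executable keeps the child process inside the venv the agent created.
```python
import subprocess
import sys

# sys.executable is the interpreter of the current venv (the one the
# agent built for this Task), so the child inherits that environment.
subprocess.check_call([sys.executable, "my_last_program.py"])
```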
I eventually found a different way of achieving what I needed
Now I'm curious, what did you end up doing?
TrickyRaccoon92 the title provided by write.scalars also serves as the identifying string for the specific metric. This is more than just a title on the plot itself.
It means that this will be the name of the scalar metric (title/series combination).
Is that your intention, or is it for viewing purpose only?
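For reference, a minimal sketch of how title/series name a metric when reporting explicitly (project/task names are illustrative):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="scalar-naming")
logger = task.get_logger()

# "title" + "series" together identify the scalar metric, not just the
# plot labels: reporting the same pair later appends to the same metric.
logger.report_scalar(title="loss", series="train", value=0.42, iteration=1)
```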
BTW: generally speaking, the default source dir inside a docker will be:
/root/.trains/venvs-builds/<python_version>/task_repository/<repository_name>/
for example:
/root/.trains/venvs-builds/3.6/task_repository/trains.git/