
The point is, "leap" is properly installed, this is the main issue. And although installed, it is missing the ".so"? What am I missing? What are you doing manually that does not show in the log?
In other words, how did you install it "manually" inside the docker, when you mentioned it worked for you when running without the agent?
Ok, the doc needs a fix
suggestion?
We do upload the final model manually.
If this is the case, just name it based on the parameters, no? Am I missing something?
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/model.py#L1229
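Not claiming this is how they upload, but a minimal sketch of naming the manually uploaded model after its parameters (the names, values, and file here are hypothetical), using OutputModel:
` from clearml import Task, OutputModel

task = Task.init(project_name='examples', task_name='manual model upload')
lr, batch_size = 0.001, 32  # hypothetical training parameters

# name the stored model after the parameters it was trained with
output_model = OutputModel(task=task, name='model_lr{}_bs{}'.format(lr, batch_size))
output_model.update_weights(weights_filename='final_model.pt') `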
I was just wondering if i can make the autologging usable.
It kind of assumes these are different "checkpoints" on the same experiment, and then stores them based on the file name
You can however change the model names later:
` Task.current_task().mo...
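And a minimal sketch of the rename-later route, assuming the truncated snippet above refers to Task.current_task().models and that the returned Model object exposes a settable name (see the model.py link above):
` from clearml import Task

task = Task.current_task()
# 'output' holds the models this task stored; rename the last checkpoint
last_model = task.models['output'][-1]
last_model.name = 'model_lr0.001_epoch10'  # assumption: name is settable `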
I have mounted my S3 bucket at the location /opt/clearml/data/fileserver/ but I can see my data is not being stored in S3; it is being stored in EBS. How so?
I'm assuming the mount was not successful
What you should see is a link to the files server inside clearml, and actual files in your S3 bucket
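A quick sanity check for the mount itself (not ClearML-specific, just a sketch using the path from the question):
` import os

# if the S3 bucket is FUSE-mounted (e.g. via s3fs), this should be a mount point
print(os.path.ismount('/opt/clearml/data/fileserver'))

# write a marker file and verify it shows up in the S3 bucket itself
with open('/opt/clearml/data/fileserver/mount_check.txt', 'w') as f:
    f.write('written through the mount') `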
Verified, you are correct: "." in label enumeration will break the clone.
I'll make sure this bug is passed to the backend guys to fix. Thanks TenseOstrich47!
Meanwhile, maybe "_" instead? 🙂
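A hypothetical illustration of that workaround, assuming the enumeration is set with Task.set_model_label_enumeration (the label names here are made up):
` from clearml import Task

task = Task.init(project_name='examples', task_name='label enumeration workaround')

labels = {'cat.tabby': 0, 'cat.siamese': 1, 'dog': 2}
# replace "." with "_" in label names so cloning the task does not break
safe_labels = {name.replace('.', '_'): value for name, value in labels.items()}
task.set_model_label_enumeration(safe_labels) `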
Hi ItchyHippopotamus18
The iteration reporting is automatically detected if you are using tensorboard, matplotlib, or explicitly with trains.Logger
I'm assuming there were no reports, so the monitoring falls back to reporting every 30 seconds, where the iterations are seconds-from-start (the thing is, this is a time series, so you have to have an X axis...)
Make sense ?
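If you want the X axis to be real iterations instead of seconds, a minimal sketch of explicit reporting (using the current clearml names; the loss value here is a stand-in):
` from clearml import Task

task = Task.init(project_name='examples', task_name='explicit iteration reporting')
logger = task.get_logger()

for iteration in range(100):
    loss = 1.0 / (iteration + 1)  # stand-in for a real training metric
    # an explicit iteration gives the scalar a proper X axis
    logger.report_scalar(title='train', series='loss', value=loss, iteration=iteration) `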
Logger.current_logger()
Will return the logger for the "main" Task.
The "Main" task is the task of this process, a singleton for the process.
All other instances create a Task object. You can have multiple Task objects and log different things to them, but you can only have a single "main" Task (the one created with Task.init).
All the auto-magic stuff is logged automatically to the "main" task.
Make sense ?
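A minimal sketch of the distinction (the task ID is hypothetical):
` from clearml import Task, Logger

# the "main" Task: a singleton for this process, target of all the auto-magic logging
main_task = Task.init(project_name='examples', task_name='main task demo')
logger = Logger.current_logger()  # returns the main Task's logger
logger.report_text('goes to the main task')

# other Task objects can be fetched and logged to explicitly
other_task = Task.get_task(task_id='aabbccdd11223344')  # hypothetical task ID
other_task.get_logger().report_text('logged to a different task') `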
I mean, the python package, not the trains-server version
then will clearml associate that image with my experiment and always use that image with it,
when you say "agent to use my docker image," I'm assuming you mean the configuration file or the --docker argument; in both cases this means the default container.
This means that if the Task does not specify a docker image, the agent will use the one you set in the conf/argument, but Tasks can always specify a different docker to use, and the agent will pull the requested docker based on the Task's entry.
Eve...
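For the per-Task override, a minimal sketch (the image name is hypothetical, and I'm assuming Task.set_base_docker for this):
` from clearml import Task

task = Task.init(project_name='examples', task_name='custom docker image')
# stored on the Task itself; the agent pulls this image instead of the default
task.set_base_docker('nvidia/cuda:11.0-runtime-ubuntu20.04') `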
Hmm, maybe a different numpy version? (numpy==1.22.1; maybe the Task needs a different version?) Can you post the Task log?
Copy paste it here 🙂
Looks great, let me see if I can understand what's missing, because it should have worked ...
If this is the case, there is nothing you need to change, just provide the docker image (no need to pass packages)
Ohh, then you do docker sibling:
Basically you map the docker socket into the agent's docker container; that lets the agent launch another docker on the host machine.
You can see an example here:
https://github.com/allegroai/clearml-server/blob/6434f1028e6e7fd2479b22fe553f7bca3f8a716f/docker/docker-compose.yml#L144
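The key part of the linked compose file is the socket mapping; a minimal sketch of the same idea (the service name and image are placeholders):
` services:
  agent:
    image: allegroai/clearml-agent-services:latest  # placeholder
    volumes:
      # sibling-docker setup: the host's docker socket inside the container
      - /var/run/docker.sock:/var/run/docker.sock `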
I'll make sure they get back to you
PompousBeetle71 could you check with 0.14.3, which was just released?
What I'd really want is the same behaviour in the console (one smooth progress bar) and one line per epoch in the logs; high hopes, right?
I think they send some "odd" character instead of CR, otherwise I cannot explain the difference.
Can you point to a toy example demonstrating the same issue ?
Also, I just tried the pytorch-lightning RichProgressBar (not yet released) instead of the default (which is unfortunately based on tqdm) and it works great.
Yey!
` from time import sleep
from clearml import Task
import tqdm

task = Task.init(project_name='debug', task_name='test tqdm cr cl')
print('start')
for i in tqdm.tqdm(range(100)):
    sleep(1)
print('done') `
The above example code will output a line every 10 seconds (with the default console_cr_flush_period=10). Can you verify it works for you?
My plan is to have an AWS Step Functions state machine (DAG) that treats running a ClearML job as one step (task) in the DAG.
...
Yep, that should work
That said, after you have that working, I would actually check pipelines + clearml aws autoscaler, easier setup, and possibly cheaper on the cloud (Lambda vs EC2 instance)
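A minimal sketch of the pipelines suggestion, assuming you already have ClearML tasks to chain (the project and task names are hypothetical):
` from clearml.automation.controller import PipelineController

# each step clones an existing ClearML task and runs it through an agent queue;
# the AWS autoscaler spins EC2 instances up/down to serve that queue
pipe = PipelineController(name='training pipeline', project='examples', version='1.0')
pipe.add_step(name='preprocess', base_task_project='examples', base_task_name='preprocess data')
pipe.add_step(name='train', parents=['preprocess'],
              base_task_project='examples', base_task_name='train model')
pipe.start() `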
If this works, we might be able to fully replace Metaflow with ClearML!
Can't wait for your blog post on it 🙂
GreasyPenguin14 I think the default is reporting on failed tasks only? Could that be?
` from clearml.backend_api.session.client import APIClient

c = APIClient()
c.projects.update(project="project-id-here", system_tags=[]) `
Okay, some progress, so what is the difference ?
Any chance the issue can be reproduced with a small toy code ?
Can you run the tqdm loop inside the code that exhibits the CR issue ? (maybe some initialization thing that is causing it to ignore the value?!)
I see what you mean.
` from clearml.automation import HyperParameterOptimizer

# aSearchStrategy is assumed to be defined earlier (e.g. GridSearch / OptimizerOptuna)
an_optimizer = HyperParameterOptimizer(
    base_task_id='39d2c27baa8145929b2e21f686a17046',
    hyper_parameters=[],
    objective_metric_title='epoch_accuracy',
    objective_metric_series='epoch_accuracy',
    objective_metric_sign='max',
    optimizer_class=aSearchStrategy,
    max_iteration_per_job=0,
    total_max_jobs=0,
    auto_connect_task=False,
)
print(an_optimizer.get_top_experiments(top_k=5)) `
I think task.init flag would be great!
🙂
WittyOwl57 this is what I'm getting on my console (notice the new lines! Not a single one is overwritten, as I would expect):
` Training: 10%|█         | 1/10 [00:00<?, ?it/s]
Training: 20%|██        | 2/10 [00:00<00:00, 9.93it/s]
Training: 30%|███       | 3/10 [00:00<00:00, 9.89it/s]
Training: 40%|████      | 4/10 [00:00<00:00, 9.87it/s]
Training: 50%|█████     | 5/10 [00:00<00:00, 9.87it/s]
Training: 60%|██████    | 6/10 [00:00<00:00, 9.88it/s]
Training: 70%|███████   | 7/10 [00:00<00...
Hi ReassuredTiger98
I do not want to share with the clearml-agent workstations.
Long story short, no 🙂
The agent is responsible for spinning up all jobs, regardless of users, so basically it has to have a read-only user for all the repositories. I "think" the enterprise version has a vault feature that allows you to store these kinds of secrets on the User itself.
What exactly is the use case?
RoughTiger69 how did you end up with a Task with just "origin" in the repo field?