Are you seeing the argparse arguments in the UI (when running locally)?
great :)
two things:
I'm not sure argparse supports dict as a type (I mean it will take anything, but I'm not sure it will parse your arguments as a dict).
I know there was an issue with argparse parsing, but I think it was solved.
BTW: basically, the way clearml-agent works, it does not actually pass the arguments on the command line but directly to the argparser at runtime.
What happens if you clone the Task (the one with Args showing and without the explicit task.connect(_args)
and send it to the age...
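To illustrate the point about argparse and dict values: argparse has no built-in dict type, but `type=` accepts any callable that converts the raw string. A minimal sketch, assuming the dict is passed as a JSON string (the `--params` flag name and its contents are made up for illustration):

```python
import argparse
import json

parser = argparse.ArgumentParser()
# json.loads turns the raw command-line string into a Python object;
# "--params" is a hypothetical flag name used only for this example.
parser.add_argument("--params", type=json.loads, default={})

# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = parser.parse_args(["--params", '{"lr": 0.01, "layers": [64, 32]}'])
print(type(args.params).__name__, args.params["lr"])
```

Passing `type=dict` instead would not work here, because `dict("...")` cannot convert an arbitrary string.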
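When I have:
import time
import datetime as dt
import numpy as np
import plotly.graph_objects as go

# 20 evenly spaced timestamps covering the next 1000 seconds
n = 20
duration = 1000
now = time.mktime(time.localtime())
timestamps = np.linspace(now, now + duration, n)
dates = [dt.datetime.fromtimestamp(ts) for ts in timestamps]
# one full sine period over the duration
values = np.sin((timestamps - now) / duration * 2 * np.pi)
fig = go.Figure(data=go.Scatter(x=dates, y=values, mode='markers'))
# task is the already-initialized Task object
task.get_logger().report_plotly(title="plotly", series="b", iteration=0, figure=fig)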
Everything looks okay
ThickDove42 Windows conda python3.6 was exactly what I was using,
started the jupyter with:
"python -m jupyter notebook"
Then opened / created a new notebook, everything worked.
Tested on latest clearml 0.17.2
Maybe it's something with the path to the repo that breaks it? Because obviously the issue is it is looking in the wrong folder.
I did not start with python -m, as a module. I'll try that
I do not think this is the issue.
It sounds like anything you do on your specific setup will end with the same error, which might point to a problem with the git/folder ?
You can try calling task._update_repository()
I'm still trying to figure out how to reproduce it...
What's the jupyter / notebook version you have?
Also from within the jupyter could you send me "sys.argv" ?
but this gives me an idea, I will try to check if the notebook is considered as trusted, perhaps it isn't and that causes issues?
This is exactly what I was thinking (communication with the jupyter service is done over http, to localhost, sometimes AV/Firewall software will block it, false-positive detection I assume)
ThickDove42 looking at the code, I suspect it fails interacting with the actual jupyter server (that is running on the same machine, but still).
Any chance you have a firewall on the Windows machine ?
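One quick way to check whether something is blocking local connections is to try opening a TCP connection to the Jupyter server yourself. A minimal sketch using only the standard library; the port 8888 in the usage line is an assumption (Jupyter's usual default), so substitute whatever port your notebook server reports:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8888 is assumed here; use the port shown in your Jupyter startup log.
    print(can_connect("127.0.0.1", 8888))
```

If this returns False while the notebook is open in the browser, a firewall or AV product interfering with localhost traffic becomes a more likely suspect.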
Nice! I'll see if we can have better error handling for it, or solve it altogether :)
Hi FancyChicken53
This is a noble cause you are after :)
Could you be more specific on what you had in mind, I'll try to find the best example once I have more understanding ...
an implementation of this kind is interesting for you or do you suggest to fork
You mean adding a config map storing a default trains.conf for the agent?
at that point we define a queue and the agents will take care of training
This is my preferred way as well :)
Questions
I want to trigger a retrain task when F1
That means that in inference you are reporting the F1 score, correct?
As part of the retraining I have to train all the models and then have to choose best one and deploy it
Are you passing output_uri to Task.init? Are you storing the model as an artifact?
You can tag your model/task with a "best" tag (and untag the previous one). Then, in production, look for the "best" task and get its model.
Thoughts?
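The tag-swapping idea can be sketched roughly as follows. This is only an illustration of the logic, written against a minimal interface: the `get_tags` / `set_tags` accessor names are assumptions here, and `promote_to_best` is a hypothetical helper, so check which tag accessors your SDK version actually exposes before adapting it.

```python
from typing import Iterable, List, Protocol


class TaggedTask(Protocol):
    """Minimal interface this sketch relies on (assumed accessor names)."""

    def get_tags(self) -> List[str]: ...

    def set_tags(self, tags: List[str]) -> None: ...


def promote_to_best(new_best: TaggedTask,
                    previous: Iterable[TaggedTask],
                    tag: str = "best") -> None:
    """Move `tag` from the previously tagged tasks to `new_best`."""
    # Untag every task that currently carries the tag.
    for task in previous:
        task.set_tags([t for t in task.get_tags() if t != tag])
    # Tag the new winner, avoiding a duplicate entry.
    tags = list(new_best.get_tags())
    if tag not in tags:
        tags.append(tag)
    new_best.set_tags(tags)
```

Untagging first and tagging second means that, at worst, there is a short moment with no "best" task rather than two of them.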
For classification it's the F1 score, but for other tasks it may be a different metric, and I don't think that's a problem. We just have to log it, right?
Correct :)
Give me a few days, I will work on your suggestions and then let you know if I am not able to do this
Sounds good!
BTW:
previous_tasks = Task.get_tasks(task_filter={'tags': 'best'})
local_model_file = previous_tasks[0].artifacts['my_model'].get_local_copy()
BTW:
Error response from daemon: cannot set both Count and DeviceIDs on device request.
Googling it points to a docker issue (which makes sense considering):
https://github.com/NVIDIA/nvidia-docker/issues/1026
What is the host OS?
Hi OutrageousGrasshopper93
Are you working with venv or docker mode?
Also notice that if you need all GPUs you can pass --gpus all
Okay, I'll make sure we always quote ", since it seems to work either way.
We will release an RC soon, with this fix.
Sounds good?
SmarmySeaurchin8
updated_tags = task.tags
updated_tags.remove(tag)
task.tags = updated_tags
SourSwallow36 it is possible.
Assuming you are not logging metrics by the same name, it should work.
try:
Task.init('examples', 'training', continue_last_task='<previous_task_id_here>')
Hi SmilingFrog76
Great question, sadly multi-node is never simple :)
Let's start with the basics: let's assume one worker is available and the other is not, what would you want to happen? (P.S. I'm not aware of flexible multi-node training frameworks, i.e. a framework that can detect another node is available and connect with it mid-training; that said, it might exist :) )
I failed to update the "STARTED AT" and the "COMPLETED AT" attributes in the "INFO" tab.
I'm not sure this can actually be overridden...
what's the error/reply ?