Hi StickyMonkey98
I'm (again) having trouble with the lack of documentation regarding Task.get_tasks(task_filter={STUFF}).
Yes we really have to add documentation there... Let me add that to the todo list
How do I filter tasks by time started? It seems there's a "started" property, and the web ui uses "started" as a keyword in the url query, but task_filter results in an error when I try that... Is there some other filter keyword for filtering by start-time?
last 10 started ...
BTW: StickyMonkey98 if you feel like writing a few examples I think it will be easy to push into the docs, so that at least we improve iteratively...
What I'm trying to do is to filter between two datetimes... Is that possible?
could you expand ?
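Until a server-side keyword for this is documented, one possible workaround is to filter on the client side. The snippet below is only a minimal sketch: started_between is a hypothetical helper (not part of clearml), and it assumes each task object exposes a datetime under task.data.started. The stand-in objects are there so it runs without a server.

```python
from datetime import datetime
from types import SimpleNamespace


def started_between(tasks, start, end):
    """Keep tasks whose `data.started` falls in the closed interval [start, end]."""
    selected = []
    for task in tasks:
        started = getattr(task.data, "started", None)
        # Tasks that never started (e.g. still pending) carry no timestamp; skip them.
        if started is not None and start <= started <= end:
            selected.append(task)
    return selected


# Stand-in objects so the sketch runs without a server. With ClearML these
# would presumably come from Task.get_tasks(project_name='example').
fake_tasks = [
    SimpleNamespace(name="a", data=SimpleNamespace(started=datetime(2021, 5, 1, 9, 0))),
    SimpleNamespace(name="b", data=SimpleNamespace(started=datetime(2021, 5, 3, 12, 0))),
    SimpleNamespace(name="c", data=SimpleNamespace(started=None)),
]
recent = started_between(fake_tasks, datetime(2021, 5, 2), datetime(2021, 5, 4))
print([t.name for t in recent])
```

If the server returns timezone-aware datetimes, start and end need to be timezone-aware as well, otherwise the comparison raises a TypeError.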
One way to circumvent this btw would be to also add/use the --python flag for virtualenv
Notice that when creating the venv, the cmd that is used is basically pythonx.y -m virtualenv ...
By definition this will create a new venv based on the python that executes the venv.
With all that said, it might be there is a bug in virtualenv and in some cases, it does Not adhere to this restriction
Hmm, could it be that the working dir is outside of the git repo?
Hi UnevenDolphin73
Took a long time to figure out that there was a specific Python version with a specific virtualenv that was old ...
NICE!
Then the task requested to use Python 3.7, and that old virtualenv version was broken.
Yes, if the Task is using a specific python version it will first try to find this one (i.e. which python3.7 ) then use it to create the new venv
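To make the lookup-then-create flow concrete, here is a rough sketch of how such a command could be assembled. build_venv_command is a hypothetical helper and the fallback to the current interpreter is an assumption; this is not the agent's actual code. It only builds the argument list and does not run anything.

```python
import shutil
import sys


def build_venv_command(version, target_dir):
    """Return the argv list for creating a venv with a specific interpreter."""
    # Look for e.g. `python3.7` on PATH; fall back to the current interpreter
    # (the fallback is an assumption made for this sketch).
    python_bin = shutil.which("python{}".format(version)) or sys.executable
    # Run virtualenv *with* that interpreter and also pin it via --python,
    # so an old virtualenv cannot silently pick a different Python.
    return [python_bin, "-m", "virtualenv", "--python", python_bin, target_dir]


cmd = build_venv_command("3.7", "/tmp/example-venv")
print(" ".join(cmd))
```

Passing the same interpreter both as the executable and as the --python value is redundant when virtualenv behaves correctly, which is the point: it guards against the case where it does not.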
As a result -> Could the agent maybe also output the virtualenv version used ...
was consistent, whereas for some reason this old virtualenv decided to use python2.7 otherwise
Yes,
This sounds like a virtualenv bug. I think it will not hurt to do both (obviously we have the information)
Thank you!!!
Hi LovelyHamster1
That is a good point, I think the safest / robust way is to configure both to use the same dns name/s so both (internal/external) are accessible.
Some background: the URL on the artifact is basically standalone. Once registered on the Task, the UI will not replace it but use it as is (the UI has no "understanding" of which server it is, it will just fetch the file).
Are you also using a diff port on the load balancer ?
(because the easiest fix is on your external ...
Ohh then YES!
the Task will be closed by the process, and since the process is inside the Jupyter and the notebook kernel is running, it is still running
Hi @<1533257278776414208:profile|SuperiorCockroach75>
ModuleNotFoundError: No module named 'statsmodels'
seems like this package is missing from the Task
Either import it manually with import statsmodels (so the automagic logs it)
Or add before task init:
Task.add_requirements("statsmodels")
task = Task.init(...)
ps: no need to @ so many people ...
web-server seems okay, could you send the logs from the api-server?
Also if you can, the console logs from your browser, when you get the blank screen. Thanks.
you can run md5 on the file as stored in the remote storage (nfs or s3)
s3 is implementation specific (i.e. MinIO, Weka, Wasabi etc. might not support it) and I'm actually not sure regarding nfs (I mean you can run it, but it actually means you are reading the data; that said, I'm assuming nfs is by definition relatively fast access)
wdyt?
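For the nfs case, where the file can be read directly, a checksum could be computed along these lines. This is just a sketch using the standard hashlib module; file_md5 is a hypothetical helper name, and the temporary file only stands in for a file on the mounted storage.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a temporary file standing in for a file on the mounted storage.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
    demo_path = tmp.name
checksum = file_md5(demo_path)
os.remove(demo_path)
print(checksum)
```

As noted above, this does read the whole file, so the cost scales with file size and storage throughput.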
Please feel free to do so (always better to get it from a user, not the team behind the product)
Hi StickyMonkey98
a very large number of running and pending tasks, and doing that kind of thing via the web-interface by clicking away one-by-one is not a viable solution.
Bulk operations are now supported, upgrade the clearml-server to 1.0.2
Is it possible to fetch a list of tasks via Task.get_tasks,
Sure: Task.get_tasks(project_name='example', task_filter=dict(system_tags=['-archived']))
Hi Martin, of course not,
Smart!
I was just wondering if it has been patched yet and if not what is the expected timeline for patching it
Yes, I believe the target is a patch version 1.15.1 to be released in a couple of weeks. This is not a major issue but it's always better to have it fixed. (btw: the enterprise version never had this issue to begin with, because it is of course authenticated, and it has an additional RBAC layer on top.)
what do you see in the console when you start the trains-agent , it should detect the cuda version
WobblyCrab70 sure, put a load-balancer in between. AWS has a solution for that: basically use the AMI from the GitHub and ask IT to add https on the 8080/8008/8081 ports
OHH nice, I thought that it was just some kind of job queue on up and running machines
It's much more than that, it's a way of life
But seriously now, it allows you to use any machine as part of your cluster, and send jobs for execution from the web UI (any machine, even just a standalone GPU machine under your desk, or any cloud GPU instance, and mixing the two together :)
Maybe I need to change something here: apiserver.conf
Not sure, I'm still waiting on answer... It...
It manages the scheduling process, so no need to package your code, or worry about building dockers etc. It also has an AWS autoscaler, that spins ec2 instances based on the amount of jobs you have in the execution queue, and the limit of your budget (obviously spinning down machines that are idle)
CooperativeFox72 btw, are you guys running those 20 experiments manually or through trains-agent ?
CooperativeFox72 yes, 20 experiments in parallel means that you always have at least 20 connections coming from different machines, and then you have the UI adding on top of it. I'm assuming the sluggishness you feel is the requests being delayed.
You can configure the API server to have more process workers, you just need to make sure the machine has enough memory to support it.
Let me check... I think you might need to docker exec
Anyhow, I would start by upgrading the server itself.
Sounds good?
GrievingTurkey78 short answer no
Long answer, the files are stored as differential sets (think change sets relative to the previous version(s)). The collection of files is then compressed and stored as a single zip. The zip itself can be stored on Google, but on their object storage (not the GDrive). Notice that the default storage for the clearml-data is the clearml-server, that said you can always mix and match (even between versions).
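To illustrate the change-set idea only (this is not how clearml-data is actually implemented), a version can be thought of as a map from file path to content hash, and the delta against the previous version is what gets packaged. change_set and the sample hashes below are hypothetical.

```python
def change_set(previous, current):
    """Compare two {relative_path: content_hash} maps and return what changed."""
    added = sorted(p for p in current if p not in previous)
    removed = sorted(p for p in previous if p not in current)
    modified = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return {"added": added, "removed": removed, "modified": modified}


# Two made-up dataset versions: one file changed, one file added.
v1 = {"train.csv": "aaa", "labels.csv": "bbb"}
v2 = {"train.csv": "aaa", "labels.csv": "ccc", "test.csv": "ddd"}
delta = change_set(v1, v2)
print(delta)
```

Only the added and modified files would need to go into the new version's archive; unchanged files are inherited from the parent version.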
if the first task failed - then the remaining tasks are not scheduled for execution, which is what I expect.
agreed
I'm just surprised that if the first task is aborted instead by the user,
How is that different from failed? The assumption is if a component depends on another one it needs its output, if it does not then they can run in parallel. What am I missing?
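The assumption described here can be sketched as a simple readiness check. This is an illustration of the reasoning, not the pipeline controller's actual logic; runnable_steps, the step names and the status strings are all made up for the example.

```python
def runnable_steps(parents, status):
    """Return pending steps whose parents have all completed."""
    ready = []
    for step, deps in parents.items():
        if status.get(step) != "pending":
            continue
        # A step needs the output of every parent, so any parent that is
        # failed or aborted (i.e. not completed) blocks it.
        if all(status.get(dep) == "completed" for dep in deps):
            ready.append(step)
    return sorted(ready)


parents = {"prepare": [], "train": ["prepare"], "report": []}
after_failure = runnable_steps(
    parents, {"prepare": "failed", "train": "pending", "report": "pending"}
)
after_abort = runnable_steps(
    parents, {"prepare": "aborted", "train": "pending", "report": "pending"}
)
print(after_failure, after_abort)
```

Under this model "failed" and "aborted" behave the same: the dependent step never becomes runnable, while an independent step can still run in parallel.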
Hi CooperativeFox72
I think the upload reporting (files over 5mb) was added post 0.17 version, hence the log.
The default upload chunk reporting is 5MB, but it is not configurable, maybe we should add it to the clearml.conf ? wdyt?
CooperativeFox72 I would think the easiest would be to configure it globally in the clearml.conf (rather than add more arguments to the already packed Task.init)
I'm with you on 60 messages being way too much..
Could you open a Github Issue on it, so we do not forget ?
The main reason to add the timeout is because the warning was annoying to users
The secondary was that clearml will start reporting based on seconds from start, then when iterations start it will revert back to iterations. But if the iterations are "epochs" the numbers are lower, so you end up with a graph that does not match the expected "iterations" x-axis. Makes sense?
This will set more time before the timeout right?
Correct.
task.freeze_monitor()
download()
task.defrost_monitor()
Currently there isn't, but that's a good idea.
What would be the argument of using it vs increasing the timeout ?
btw: setting the resource timeout to 99999 will basically mean that it will wait until the first reported iteration, Not that it will just sleep for 99999sec
Yes it is reproducible do you want a snippet?
Already fixed, please ping tomorrow, I think an RC should be out soon with the fix